[DRBD-user] drbd 9.1.1 whole cluster blocked
Andreas Pflug
pgadmin at pse-consulting.de
Thu May 27 13:47:46 CEST 2021
No ko-count set, so apparently something different...
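
For anyone wanting to double-check the same thing on their own nodes, here is a minimal sketch in Python that looks for an explicit ko-count setting in DRBD configuration text, for example the output of "drbdadm dump". The function name and the regular expression are illustrative assumptions, not part of DRBD. The sketch only reports values written in the configuration; an option that is absent falls back to the built-in default, which this sketch does not determine.

```python
import re
from typing import List

# Assumption: the option is written as "ko-count <number>;" as in drbd.conf.
# The lookbehind avoids matching inside a longer hyphenated word.
KO_COUNT_RE = re.compile(r"(?<![\w-])ko-count\s+(\d+)\s*;")


def explicit_ko_counts(config_text: str) -> List[int]:
    """Return every explicitly configured ko-count value found in the text."""
    values = []
    for line in config_text.splitlines():
        # Drop "#" comments so a commented-out setting is not reported.
        code = line.split("#", 1)[0]
        values.extend(int(m.group(1)) for m in KO_COUNT_RE.finditer(code))
    return values
```

An empty list means no ko-count was set explicitly in the text that was passed in, which matches the situation described above.
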
On 27.05.21 at 13:37, Rene Peinthor wrote:
> Could still be related to this fix:
>
> * fix timeout detection after idle periods and for configs with ko-count
> when a disk on a secondary stops delivering IO-completion events
>
> So if you have a ko-count set, this should be fixed.
> Or it is something completely different... ;)
>
> Cheers,
> Rene
>
> On Thu, May 27, 2021 at 1:25 PM Andreas Pflug
> <pgadmin at pse-consulting.de> wrote:
>
> I'm running a Proxmox cluster with 3 disk nodes and 3 diskless nodes
> with DRBD 9.1.1. The disk nodes have storage on md RAID6 (8 SSDs each)
> with a journal on an Optane device.
>
> Yesterday, the whole cluster was severely impacted when one node had
> write problems. There is no indication of any hardware problem, no
> events whatsoever. What happened, taken from the logs:
>
> - One diskless node reports "sending time expired" for some devices on a
> specific disk node. After 30 seconds, it disconnects those devices on
> that node.
> - The disk node logs a state change to Outdated.
> - After 80 seconds, the disk node logs "task blocked for more than 120
> seconds". These tasks are 8 drbd_r_xxx processes, but also md2_reclaim.
> - No more logging after that.
>
> After that, the whole cluster was severely impacted, with most VMs
> unresponsive. The node hosts were still accessible, with no more kernel
> logging.
>
> After analyzing the situation and assuming that a single node was
> blocking everything, I rebooted that node (a normal reboot was not
> possible; it needed "echo b >/proc/sysrq-trigger"). This helped, and
> everything went back to normal.
>
> So apparently there are situations when a backing storage problem might
> block all drbd processing in a way that prevents normal timeout
> detection and subsequent disconnection on other nodes. Reading the 9.1.2
> release notes, this doesn't seem to be addressed there.
>
> Regards,
> Andreas
>
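
The "task blocked for more than 120 seconds" entries in the quoted report are kernel hung-task messages. As a rough sketch, kernel log lines (for example from "dmesg" or "journalctl -k") can be scanned for them in Python as shown below. The function name is hypothetical, and the pattern assumes the usual "INFO: task <name>:<pid> blocked for more than <n> seconds" wording, which may differ on other kernel versions.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the usual kernel hung-task wording, e.g.
#   INFO: task md2_reclaim:812 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (task name, pid, seconds) for each hung-task line found."""
    found = []
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append(
                (match.group("name"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return found
```

Filtering the result for names that start with "drbd_r_" or "md" would show whether the DRBD receiver threads and the md layer were stuck at the same time, as in the report above.
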