[DRBD-user] Forcibly disconnect if secondary is responding too slow?

Lars Ellenberg lars.ellenberg at linbit.com
Fri Nov 7 09:46:42 CET 2014


On Thu, Nov 06, 2014 at 07:29:03PM +0100, Felix Zachlod wrote:
> Hello!
> 
> I have just noticed a strange behaviour with a drbd setup.
> 
> After rebooting a node, the secondary reconnected and started a resync.
> After a short time a disk on the secondary started to throw
> command timeouts... I don't know why the RAID controller did not
> remove the disk, but I got a lot of error messages and the I/O on the
> VD was stuck for a long time. Then it recovered and froze again.
> 
> I observed the behaviour and decided to disconnect the secondary, as
> I/O on the primary was frozen too. But disconnect worked neither on
> the primary nor on the secondary for the problematic drbd device, so
> I had to reset the secondary to resolve the problem. The primary
> continued without a problem after that. Unfortunately, I/O was
> stalled for at least 3-4 minutes, I assume, so I/O errors were
> thrown in different VMs.
> 
> So I wondered whether it is possible to configure a forced
> disconnect for unresponsive resources. I can see the disk-timeout
> option, which is described as dangerous and would detach the drbd device,
> and I can see the timeout option, which would disconnect if no packet
> was received from the peer, but it seems that DRBD either decided
> NOT to disconnect or could not disconnect for some reason.
> 
> I can see such messages in the log on the primary:
> 
> Nov  6 18:06:08 node-a kernel: [4315372.080011] block drbd1:
> [drbd1_worker/5623] sock_sendmsg time expired, ko = 4294967256

Right there.
The DRBD config setting you want is ko-count.
Also, please use DRBD 8.4 (where that would default to 7, iirc).
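
As a rough sketch of where this setting lives: ko-count goes in the net
section, next to timeout. The resource name r0 is a placeholder, and the
values shown are what I believe to be the 8.4 defaults, not tuned
recommendations. Since timeout is given in tenths of a second, a peer that
fails to complete a write request should be dropped after roughly
ko-count x timeout, here about 7 x 6 s = 42 s.

```
resource r0 {            # r0 is a hypothetical resource name
  net {
    timeout   60;        # unit is 0.1 s, so 60 means 6 seconds
    ko-count  7;         # expel the peer after 7 expired timeouts; 0 disables
  }
  # remaining sections (disk, on <host>, ...) omitted
}
```

Check drbd.conf(5) for your installed version before relying on these
numbers, since defaults may differ between releases.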

> But the node remained in connected / Uptodate/Inconsistent state
> until I reset the peer. How can such behaviour be avoided?

See above: ko-count.
Also, you could have used --force disconnect (if no other drbdsetup is
blocking yet), or, instead of resetting the other box, simply cut the TCP
connection (iptables or any other tool you may be more comfortable with).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


