[DRBD-user] disk-timeout and automatic reattach

Tue Sep 24 14:51:37 CEST 2013

Hello,

After experiencing several dual-node failures due to a faulty RAID
controller on one of the nodes (it is the famous 3ware 9650-16ML, which
may reset and not respond at all for 30 seconds because of a command
sent to its control channel), I noticed the 'disk-timeout' option, which
may help with this issue.

The visible symptoms are that monitor operations of the iSCSI target and
LUNs (ietd) time out because the block device (DRBD in 'Primary') stops
responding, since the 'Secondary' replica does not respond (protocol C)
due to the RAID reset. The target is then restarted.

Of course, I can increase the Pacemaker monitor timeouts, but I honestly
prefer the disk to be dropped very quickly if it does not answer. It
seems natural for DRBD to go to the 'Diskless' state if the block device
does not respond.

Before enabling it I'd like to get "Yep, that's correct" here ;)

One more thing I care about: it would be nice if DRBD tried to re-attach
the disk after some time (maybe with several attempts). Although this is
probably a very rare case (I was really surprised that hardware may do
that), it would be very helpful for such brain-dead devices (yes, I have
already ordered some new adapters and backplanes, but I still want to
reuse the existing ones for less mission-critical tasks, as they were
quite expensive).
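
To make the re-attach idea concrete, here is a rough sketch of the
behaviour I have in mind, done externally in Python rather than inside
DRBD. It simply retries `drbdadm attach` a few times with a pause in
between. The function name, the attempt count, the delay and the
resource name "r0" are all my own assumptions for illustration, and it
does not check the disk state before trying.

```python
import subprocess
import time


def try_reattach(resource, attempts=3, delay_s=60.0,
                 run=subprocess.call, sleep=time.sleep):
    """Retry `drbdadm attach <resource>` until it succeeds.

    Returns the 1-based number of the attempt that succeeded,
    or 0 if every attempt failed. `run` and `sleep` are injectable
    so the logic can be exercised without touching a real device.
    """
    for attempt in range(1, attempts + 1):
        # A zero exit status from drbdadm is treated as success.
        if run(["drbdadm", "attach", resource]) == 0:
            return attempt
        # Wait before the next try, but not after the last one.
        if attempt < attempts:
            sleep(delay_s)
    return 0


if __name__ == "__main__":
    # "r0" is a hypothetical resource name.
    print(try_reattach("r0", attempts=3, delay_s=60.0))
```

Something like this could be run from cron or a handler script once the
disk has gone 'Diskless', but I would much prefer DRBD to do it itself.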

Thank you,
Vladislav