Hello,

We have a highly available cluster using DRBD over an ARP-monitored bonded interface. Due to the system design, the bonded interface can take as long as 5 seconds to fail over.

If a bonding failover happens while the DRBD link is idle, a situation sometimes occurs where a DRBD ping is sent but is never received by the secondary (the packet is lost in the TCP layer). This results in a DRBD Network Failure, and cluster failover starts even though the link comes back up in the meantime.

From drbd.conf:

Net options:
  timeout = 8.0 sec
  connect-int = 10 sec (default)
  ping-int = 10 sec (default)
  max-epoch-size = 2048 (default)
  max-buffers = 2048 (default)
  unplug-watermark = 128 (default)
  sndbuf-size = 2097152
  ko-count = 2

Syncer options:
  rate = 8192 KB/sec
  group = 0 (default)
  al-extents = 127 (default)
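For reference, here is roughly how those settings would look as drbd.conf sections. This is a sketch assuming DRBD 0.7-style syntax (suggested by the `group` syncer option); the resource name `r0` is hypothetical, and the host sections and the rest of the resource definition are omitted.

```
resource r0 {                  # hypothetical resource name
  net {
    timeout          80;       # tenths of a second, i.e. 8.0 sec
    connect-int      10;       # seconds (default)
    ping-int         10;       # seconds (default)
    max-epoch-size   2048;     # default
    max-buffers      2048;     # default
    unplug-watermark 128;      # default
    sndbuf-size      2M;       # = 2097152 bytes
    ko-count         2;
  }
  syncer {
    rate       8192K;          # KB/sec
    group      0;              # default
    al-extents 127;            # default
  }
}
```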

When a bonding failover occurs while DRBD is active due to disk I/O, the DRBD Network Failure never seems to happen, because write requests are retransmitted when a write ack is not received within the timeout period.

Is there a way to make DRBD retry the ping a number of times before the link is assumed broken?

Thanks in advance for any feedback.

Alex

PS: Truly sorry if this appears twice, but my first post did not appear on the mailing list.