Hello,

We have a highly available cluster using DRBD over an ARP-monitored bonded interface. Due to the system design, the bonded interface can take as long as 5 seconds to fail over. If a bonding fail-over happens while the DRBD link is idle, a situation sometimes occurs where a DRBD ping is sent but never received by the secondary (the packet is lost at the TCP layer), so DRBD declares a Network Failure and fail-over starts even though the link comes back up in the meantime.

From drbd.conf:

Net options:
  timeout          = 8.0 sec
  connect-int      = 10 sec (default)
  ping-int         = 10 sec (default)
  max-epoch-size   = 2048 (default)
  max-buffers      = 2048 (default)
  unplug-watermark = 128 (default)
  sndbuf-size      = 2097152
  ko-count         = 2

Syncer options:
  rate       = 8192 KB/sec
  group      = 0 (default)
  al-extents = 127 (default)

When a bonding fail-over occurs while DRBD is active due to disk I/O, the Network Failure never seems to happen, because write requests are retransmitted when a write-ack is not received within the timeout period.

Is there a way to have the DRBD ping retried a number of times before the link is assumed broken?

Thanks in advance for any feedback.

Alex

PS Truly sorry if this appears again, but my first post did not appear on the mailing list.
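For reference, the settings above correspond roughly to the following drbd.conf fragment (DRBD 8.x syntax; note that `timeout` and `ping-timeout` are specified in tenths of a second, while `ping-int` and `connect-int` are in whole seconds). The `ping-timeout` line is a hypothetical illustration of stretching the ping tolerance past the 5-second bonding fail-over window, not a tested recommendation for this setup:

```
resource r0 {
  net {
    timeout          80;       # 8.0 s, in tenths of a second
    connect-int      10;       # seconds (default)
    ping-int         10;       # seconds (default)
    ping-timeout     60;       # hypothetical: 6.0 s (tenths of a second),
                               # raised above the 5 s bonding fail-over window
    ko-count         2;
    sndbuf-size      2097152;  # 2 MiB
    max-epoch-size   2048;     # default
    max-buffers      2048;     # default
    unplug-watermark 128;      # default
  }
  syncer {
    rate       8192K;          # 8192 KB/sec
    al-extents 127;            # default
  }
}
```

There appears to be no per-ping retry count in drbd.conf; `ping-timeout` (how long DRBD waits for the answer to a keep-alive packet) seems to be the closest knob, so whether raising it alone covers an idle-link outage like this is exactly the question above.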