[DRBD-user] help tuning drbd timeouts

paddy paddy at panici.net
Wed Aug 24 22:50:39 CEST 2005


A while back I had the following from drbd in my logs:

09:16:51 drbd0: PingAck did not arrive in time.
09:16:51 drbd0: drbd0_asender [11510]: cstate Connected --> NetworkFailure
09:16:51 drbd0: asender terminated
09:16:51 drbd0: drbd0_receiver [7390]: cstate NetworkFailure --> BrokenPipe
09:16:51 drbd0: short read expecting header on sock: r=-512
09:16:51 drbd0: worker terminated
09:16:51 drbd0: drbd0_receiver [7390]: cstate BrokenPipe --> Unconnected
09:16:51 drbd0: Connection lost.
09:16:51 drbd0: drbd0_receiver [7390]: cstate Unconnected --> WFConnection
09:17:11 drbd0: drbd0_receiver [7390]: cstate WFConnection --> WFReportParams
09:17:11 drbd0: sock_sendmsg returned -104
09:17:11 drbd0: drbd0_receiver [7390]: cstate WFReportParams --> BrokenPipe
09:17:11 drbd0: short sent HandShake size=80 sent=0
09:17:11 drbd0: Discarding network configuration.
09:17:11 drbd0: worker terminated
09:17:11 drbd0: drbd0_receiver [7390]: cstate BrokenPipe --> Unconnected
09:17:11 drbd0: Connection lost.
09:17:11 drbd0: drbd0_receiver [7390]: cstate Unconnected --> StandAlone
09:17:11 drbd0: receiver terminated

If I read this right, the reconnect fails before the handshake with
ECONNRESET (sock_sendmsg returned -104)?

My problem is that my network is beyond my control, and I occasionally see
congestion and outages of 2-3 minutes.  

What I would like to achieve is a system that waits for a short while
(if I understand correctly, this is the 'timeout' parameter) before
declaring the peer gone, then falls back to retrying the connection for
quite a long time (say 5 minutes). I'm hoping that all I need to do is
tweak some parameters, but I'm really not getting it. If I set
'on-disconnect reconnect' and make the heartbeat timeout 5 minutes,
does that not cover it?
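
To make the question concrete, this is roughly the net section I have in
mind. It is an untested sketch: the values simply repeat the defaults
reported by drbdsetup, and I am assuming that 'timeout' is given in
tenths of a second, that 'connect-int' and 'ping-int' are in seconds,
and that 'timeout' has to stay below both of them.

  net {
    timeout        60;         # 6 seconds (unit assumed to be 0.1 s)
    connect-int    10;         # retry the connection every 10 seconds
    ping-int       10;         # keep-alive ping every 10 seconds
    on-disconnect  reconnect;  # keep retrying rather than going StandAlone
  }

If that reading is right, the long wait I am after would come from
'on-disconnect reconnect' retrying every 'connect-int' seconds, rather
than from a single 5 minute timeout.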

This is the Debian drbd 0.7.10-3 module, built on Debian sarge against
kernel 2.6.8-2-k7.

I'm vaguely aware I probably need to upgrade to a 2.6.12.something kernel
for a bio bug, but I don't immediately see the connection here.  
Apologies if I'm wrong :)

I just took a quick look at the drbd changelog for later versions, but nothing
jumped out at me.  Again, apologies if I'm time-wasting here.

I believe my drbd.conf is fairly vanilla:

resource r0 {
  protocol C;
#  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  incon-degr-cmd "exit 1";
  startup {
    #wfc-timeout         0;  ## Infinite!
    #wfc-timeout      240;
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on homer.panici.net {
    device     /dev/drbd0;
    disk       /dev/vg0/maillog;
    address    x.x.x.x:7788;
    meta-disk  /dev/vg0/drbdmeta[0];
  }
  on marge.panici.net {
    device    /dev/drbd0;
    disk      /dev/vg0/maillog;
    address   x.x.x.x:7788;
    meta-disk /dev/vg0/drbdmeta[0];
  }
}

drbdsetup /dev/drbd0 show says:

Lower device: 254:08   (dm-8)
Meta device: 254:09   (dm-9)
Meta index: 0
Disk options:
 on-io-error = detach
Local address: x.x.x.x:7788
Remote address: x.x.x.x:7788
Wire protocol: C
Net options:
 timeout = 6.0 sec (default)
 connect-int = 10 sec (default)
 ping-int = 10 sec (default)
 max-epoch-size = 2048  (default)
 max-buffers = 2048  (default)
 sndbuf-size = 131070  (default)
 ko-count = 0  (default)
Syncer options:
 rate = 10240 KB/sec
 group = 1
 al-extents = 257


Perl 6 will give you the big knob. -- Larry Wall