Hi,

A while back I had:

09:16:51 drbd0: PingAck did not arrive in time.
09:16:51 drbd0: drbd0_asender [11510]: cstate Connected --> NetworkFailure
09:16:51 drbd0: asender terminated
09:16:51 drbd0: drbd0_receiver [7390]: cstate NetworkFailure --> BrokenPipe
09:16:51 drbd0: short read expecting header on sock: r=-512
09:16:51 drbd0: worker terminated
09:16:51 drbd0: drbd0_receiver [7390]: cstate BrokenPipe --> Unconnected
09:16:51 drbd0: Connection lost.
09:16:51 drbd0: drbd0_receiver [7390]: cstate Unconnected --> WFConnection
09:17:11 drbd0: drbd0_receiver [7390]: cstate WFConnection --> WFReportParams
09:17:11 drbd0: sock_sendmsg returned -104
09:17:11 drbd0: drbd0_receiver [7390]: cstate WFReportParams --> BrokenPipe
09:17:11 drbd0: short sent HandShake size=80 sent=0
09:17:11 drbd0: Discarding network configuration.
09:17:11 drbd0: worker terminated
09:17:11 drbd0: drbd0_receiver [7390]: cstate BrokenPipe --> Unconnected
09:17:11 drbd0: Connection lost.
09:17:11 drbd0: drbd0_receiver [7390]: cstate Unconnected --> StandAlone
09:17:11 drbd0: receiver terminated

If I read this right, it fails before the handshake with ECONNRESET?

My problem is that my network is beyond my control, and I occasionally
see congestion and outages of 2-3 minutes. What I would like to achieve
is a system that waits for a short while (if I understand correctly,
this is the 'timeout' parameter), then falls back to waiting to
reconnect for quite a long time (say 5 minutes). I'm hoping that all I
need to do is tweak some parameters, but I'm really not getting it. If I
have 'on-disconnect reconnect' and make the heartbeat timeout 5 minutes,
does this not cover it?

This is a Debian 0.7.10-3 module built on Debian sarge against
2.6.8-2-k7. I'm vaguely aware I probably need to upgrade to a
2.6.12.something kernel for a bio bug, but I don't immediately see the
connection here.
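
For reference, a sketch of the net section that the question above is
about. It assumes the DRBD 0.7 option names and units (timeout in tenths
of a second, connect-int and ping-int in seconds) and is untested;
whether 'on-disconnect reconnect' also covers a failed handshake like
the one logged at 09:17:11 is exactly the open question.

```
net {
  # Values below are the defaults reported by drbdsetup show.
  timeout      60;  # 6.0 sec, assuming the unit is 0.1 sec
  connect-int  10;  # seconds between connection attempts
  ping-int     10;  # seconds between keep-alive pings

  # Assumption: keeps retrying every connect-int seconds
  # instead of dropping to StandAlone.
  on-disconnect reconnect;
}
```

Raising connect-int would space the retries further apart. As far as I
understand, timeout is expected to stay below both connect-int and
ping-int.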
Apologies if I'm wrong :) I just took a quick look at the drbd changelog
for later versions, but nothing jumped out at me. Again, apologies if
I'm time-wasting here.

I imagine my drbd.conf to be fairly vanilla:

resource r0 {
  protocol C;
  # incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  incon-degr-cmd "exit 1";

  startup {
    #wfc-timeout 0;  ## Infinite!
    #wfc-timeout 240;
    degr-wfc-timeout 120;  # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on homer.panici.net {
    device    /dev/drbd0;
    disk      /dev/vg0/maillog;
    address   x.x.x.x:7788;
    meta-disk /dev/vg0/drbdmeta[0];
  }

  on marge.panici.net {
    device    /dev/drbd0;
    disk      /dev/vg0/maillog;
    address   x.x.x.x:7788;
    meta-disk /dev/vg0/drbdmeta[0];
  }
}

drbdsetup /dev/drbd0 show says:

Lower device: 254:08 (dm-8)
Meta device: 254:09 (dm-9)
Meta index: 0
Disk options:
 on-io-error = detach
Local address: x.x.x.x:7788
Remote address: x.x.x.x:7788
Wire protocol: C
Net options:
 timeout = 6.0 sec (default)
 connect-int = 10 sec (default)
 ping-int = 10 sec (default)
 max-epoch-size = 2048 (default)
 max-buffers = 2048 (default)
 sndbuf-size = 131070 (default)
 ko-count = 0 (default)
Syncer options:
 rate = 10240 KB/sec
 group = 1
 al-extents = 257

TIA

Regards
Paddy
-- 
Perl 6 will give you the big knob. -- Larry Wall