On Tue, Sep 01, 2009 at 12:28:01PM +0200, Massimo CtRiX Cetra wrote:
> Philipp Reisner wrote:
>> Hi,
>>
>> Please help testing this pre release...
>>
>>
>
> I see the same problems, here, that i see on 8.3.2 and that i described
> a couple of days ago resuming an old thread
> See "Re: [DRBD-user] bind before connect failed, err = -99"
>
> errno 99 is:
>
> #define EADDRNOTAVAIL 99 /* Cannot assign requested address */
>
> and that is definitely not true: when i start drbd, the IP addresses are
> already assigned, but drbd fails to bind() to the socket.
> As i wrote, i'm testing in a couple of KVM virtualized machines on
> debian, using 2.6.35.5.
>
> What i found is that if i "relax" the init scripts, that is i add a 1
> second delay when drbd starts (sleep 1 added in /etc/init.d/drbd), this
> error doesn't show up (though, there are other problems).

Then probably when the drbd init script is started,
the IP is _NOT_ yet available.

> My conclusion (enforced by the sleep1; test) is that on SMP machines,
> Network initialization takes some time and drbd, even if is started
> later in the init scripts, cannot bind to the correct ip.
> The proper solution in this case is to add a retry loop, waiting for a
> second before a bind and the next one.

Nope.
Proper solution is that the network initialisation scripts don't return
until they have properly initialized the network.

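To make the point above concrete, here is a minimal user-space sketch of
"don't return until the address is usable". It is not part of DRBD or of
any distribution's init scripts; the helper names (can_bind,
wait_for_address) and the address 192.0.2.10 are hypothetical placeholders.
The probe binds a throwaway TCP socket to the address with port 0, so it
only checks whether the kernel currently accepts the address as local
(no EADDRNOTAVAIL); it says nothing about DRBD's own in-kernel bind.

```python
import errno
import socket
import time


def can_bind(address, port=0):
    """Return True if a TCP socket can be bound to `address`.

    Returns False only for EADDRNOTAVAIL ("Cannot assign requested
    address"); any other error is re-raised so it is not hidden.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))  # port 0: let the kernel pick one
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True


def wait_for_address(address, timeout=30.0, interval=0.5,
                     probe=can_bind, sleep=time.sleep):
    """Poll until `probe(address)` succeeds or `timeout` seconds of
    sleeping have accumulated. Returns True on success, False on timeout.

    `probe` and `sleep` are injectable so the loop can be exercised
    without touching the network.
    """
    waited = 0.0
    while True:
        if probe(address):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Hypothetical replication address; replace with the one in drbd.conf.
    ok = wait_for_address("192.0.2.10", timeout=10.0)
    raise SystemExit(0 if ok else 1)
```

Run as the last step of whatever brings the interface up, something like
this would make that step block until the address can be bound, instead of
teaching every later service to retry.
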
> Let me use the logs to explain the problem:
>
> NODE A:
>
> version: 8.3.3rc1 (api:88/proto:86-91)
> GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
> phil at fat-tyre, 2009-08-28 15:07:52
>
> 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
> NODE B:
> wsos2:~# cat /proc/drbd
> version: 8.3.3rc1 (api:88/proto:86-91)
> GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
> phil at fat-tyre, 2009-08-28 15:07:52
>
> 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
> ns:0 nr:0 dw:0 dr:272 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
>
> Note that node B is synced, uptodate and in SECONDARY mode.
> This time (KVM is set to SMP=1, so it's using 1 processor) the virtual
> machine is rebooted.
>
> NODE A:
>
> version: 8.3.3rc1 (api:88/proto:86-91)
> GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
> phil at fat-tyre, 2009-08-28 15:07:52
>
> 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
> ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
> This correctly switches to WFConnection waiting for the other node to
> come back.
>
> When the other node reboots and the situation is:
>
>
> NODE A:
> wsos1:~# cat /proc/drbd
> version: 8.3.3rc1 (api:88/proto:86-91)
> GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
> phil at fat-tyre, 2009-08-28 15:07:52
>
> 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
> ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
>
> NODE B:
> version: 8.3.3rc1 (api:88/proto:86-91)
> GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
> phil at fat-tyre, 2009-08-28 15:07:52
>
> 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
> ns:0 nr:0 dw:0 dr:272 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Then fix the order of your init and shutdown scripts.
Probably the network is started either in parallel with or after DRBD,
explaining your first issue, and stopped in parallel with or before DRBD,
explaining this issue.

> dmesg shows the following:

Please avoid line wraps when pasting log files.

> NODE B (which was rebooted):
> [ 10.592528] drbd: initialized. Version: 8.3.3rc1 (api:88/proto:86-91)

The cause of your problem is already laid at shutdown,
not when the node reboots.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed