Hi,

On Tuesday 30 November 2004 13:34, Philipp Reisner wrote:
> > > > If I start again:
> > > >
> > > > Starting DRBD resources: [vol21][vol22]Child process does not
> > > > terminate! Exiting.
> > > >
> > > > and so on: one more volume is completed every time I launch the
> > > > script. Looking at the source code, I believe the timeout has to do
> > > > with the SLEEPS_LONG / SLEEPS_VERY_LONG values that probably aren't
> > > > enough in my case (but I'm not sure if I'm hitting the "long" or "very
> > > > long" timeout). Can these be made into module parameters, or at least
> > > > #define's?
> > > >
> > > > Eventually, after enough launches, all volumes are correctly
> > > > initialized, except...
> > >
> > > Yes, I have changed SLEEPS_LONG from 60 to 120 seconds with the
> > > drbd-0.7.6 release.
> >
> > This might not really solve the issue, since the "child process" (the one
> > which does not terminate) stays until the corresponding device is in sync.
> > Couldn't drbd check whether the child syncs and go on with the other
> > devices? In fact, drbd might first check which devices are in the highest
> > sync group (e.g. group=1) and start the corresponding child processes for
> > parallel sync. The others might be scheduled accordingly for sync. Just my
> > thoughts. Is there any necessity for waiting? I consider "Exiting" a
> > bug in an HA context. I stumbled on the same recently.
> >
>
> Renaud and I were referring to "drbdsetup /dev/drbdX disk ..." while you
> are probably referring to "drbdsetup /dev/drbdX wait_sync"...

I follow the same procedure Renaud described: /etc/init.d/drbd start,
which is essentially:

A) # /sbin/drbdadm wait_connect all

or (another, similar failure)

B) # /sbin/drbdadm up drbdX

After STONITH and a restart of fsrv1 (fsrv2 is active in the meantime),
drbd does not connect all devices as desired. This may work with a higher
timeout, but if the AL is not sparse, resyncing might take a few more
minutes on bigger volumes.

A)

fsrv1:~ # rcdrbd start
Starting DRBD resources: [drbd0][drbd1][drbd2]
Waiting until resources are connected (or timeouted)ioctl(,WAIT_*,) failed: Timer expired
drbdsetup exited with code 20
.

While waiting:

fsrv2:~ # cat /proc/drbd
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:0 nr:0 dw:460 dr:887846 al:2 bm:76 lo:1440 pe:0 ua:1440 ap:0
        [================>...] sync'ed: 83.7% (171652/1052676)K
        finish: 0:00:30 speed: 5,538 (5,275) K/sec
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:80 dw:140 dr:58362 al:2 bm:2 lo:0 pe:0 ua:0 ap:0
 2: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:60 dw:100 dr:58020 al:1 bm:1 lo:0 pe:0 ua:0 ap:0

fsrv2:/dev/vgs80a # cat /proc/drbd
version: 0.7-pre8 (api:74/proto:72)

And about two minutes later, after the timeout expired:

fsrv2:~ # cat /proc/drbd
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:0 nr:0 dw:460 dr:1053738 al:2 bm:94 lo:0 pe:0 ua:0 ap:0
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:80 dw:140 dr:58362 al:2 bm:2 lo:0 pe:0 ua:0 ap:0
 2: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:60 dw:100 dr:58020 al:1 bm:1 lo:0 pe:0 ua:0 ap:0

So the system comes up, but does not recover fully.

And B)

fsrv1:~ # /etc/init.d/drbd start
Starting DRBD resources: [drbd0]Child process does not terminate! Exiting.
Failed setting up drbd1
--
fsrv2:~ # cat /proc/drbd
version: 0.7-pre8 (api:74/proto:72)
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:324580 dw:1394488 dr:2111612 al:4 bm:277 lo:0 pe:1267 ua:0 ap:0
        [======>.............] sync'ed: 30.7% (730060/1052672)K
        finish: 0:02:13 speed: 5,474 (5,376) K/sec
 1: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:0 nr:3848 dw:1056588 dr:2111268 al:1 bm:395 lo:0 pe:0 ua:0 ap:0
 2: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:0 nr:3724 dw:1056400 dr:2111072 al:1 bm:396 lo:0 pe:0 ua:0 ap:0

Regards,
Andreas

>
> -Phil
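
In case someone wants to watch for this state from a script, here is a minimal Python sketch. It assumes the 0.7-style /proc/drbd text shown in this message; the names parse_proc_drbd and unconnected, the dictionary keys, and the SAMPLE text are illustrative and not part of DRBD. It lists the devices whose cs is StandAlone or WFConnection (only the states seen in this message) and estimates the remaining resync time as remaining KB divided by current speed. That division happens to agree with the finish: values in both outputs above, but this is an observation from these two samples, not a documented formula.

```python
import re

# One device header, e.g. "0: cs:SyncSource st:Primary/Secondary ld:Consistent"
DEVICE_RE = re.compile(r"(\d+): cs:(\S+) st:(\S+) ld:(\S+)")
# Resync progress, e.g. "(171652/1052676)K finish: 0:00:30 speed: 5,538"
SYNC_RE = re.compile(r"\((\d+)/(\d+)\)K\s+finish:\s+\S+\s+speed:\s+([\d,]+)")

# Copied from the "While waiting" output in this message.
SAMPLE = """\
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:0 nr:0 dw:460 dr:887846 al:2 bm:76 lo:1440 pe:0 ua:1440 ap:0
        [================>...] sync'ed: 83.7% (171652/1052676)K
        finish: 0:00:30 speed: 5,538 (5,275) K/sec
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:80 dw:140 dr:58362 al:2 bm:2 lo:0 pe:0 ua:0 ap:0
 2: cs:StandAlone st:Primary/Unknown ld:Consistent
    ns:0 nr:60 dw:100 dr:58020 al:1 bm:1 lo:0 pe:0 ua:0 ap:0
"""


def parse_proc_drbd(text):
    """Return {minor: info} for each device found in /proc/drbd-style text."""
    devices = {}
    headers = list(DEVICE_RE.finditer(text))
    for i, header in enumerate(headers):
        # The text belonging to this device runs up to the next device header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        segment = text[header.end():end]
        info = {"cs": header.group(2), "st": header.group(3), "ld": header.group(4)}
        sync = SYNC_RE.search(segment)
        if sync:
            remaining_k = int(sync.group(1))
            speed = int(sync.group(3).replace(",", ""))
            info["remaining_k"] = remaining_k
            info["speed_k_per_s"] = speed
            # Assumed estimate: remaining KB / current speed, in whole seconds.
            info["eta_s"] = remaining_k // speed if speed else None
        devices[int(header.group(1))] = info
    return devices


def unconnected(devices):
    """Minors whose connection state is StandAlone or WFConnection."""
    return sorted(m for m, d in devices.items()
                  if d["cs"] in ("StandAlone", "WFConnection"))


if __name__ == "__main__":
    devs = parse_proc_drbd(SAMPLE)
    print("not connected:", unconnected(devs))
    for minor, d in sorted(devs.items()):
        if "eta_s" in d:
            print("drbd%d: about %s s of resync left" % (minor, d["eta_s"]))
```

On a live node the text would come from reading /proc/drbd instead of SAMPLE. The parsing is whitespace-tolerant on purpose, since the exact line breaks of the status output are not something the script should depend on.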