[DRBD-user] Oopses/timeouts on 0.7.5 with many big volumes

Tue Nov 30 12:22:43 CET 2004

Hi,

On Tuesday 30 November 2004 11:54, Philipp Reisner wrote:
> On Tuesday 30 November 2004 11:18, Renaud Guerin wrote:
[..]
> > Problem 1:
> > ---------
> >
> > When I start drbd for the first time I get something like this :
> > /etc/init.d/drbd start
> >
> > Starting DRBD resources:    [vol21]Child process does not terminate!
> > Exiting.
> >
> > If I start again:
> >
> > Starting DRBD resources:    [vol21][vol22]Child process does not terminate!
> > Exiting.
> >
> > and so on : one more volume is completed every time I launch the script.
> > Looking at the source code, I believe the timeout has to do with the
> > SLEEPS_LONG / SLEEPS_VERY_LONG values that probably aren't enough in my
> > case (but i'm not sure if I'm hitting the "long" or "very long" timeout).
> > Can these be made into module parameters, or at least #define's ?
> >
> > Eventually, after enough launches, all volumes are correctly
> > initialized, except...
> 
> Yes, I have changed SLEEPS_LONG from 60 to 120 Seconds with the 
> drbd-0.7.6 release.

This might not really solve the issue, since the "child process" (the one which
does not terminate) stays around until the corresponding device is in sync.
Couldn't drbd check whether the child is syncing and go on with the other devices?
In fact, drbd might first check which devices are in the highest sync group
(e.g. group=1) and start the corresponding child processes for parallel
sync. The others might be scheduled for sync accordingly. Just my thoughts.
Is there any necessity for waiting? I consider "Exiting" a bug
in an HA context. I stumbled on the same thing recently.
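
To make the idea a bit more concrete, here is a rough sketch in Python. It is not
drbd code and does not reflect how drbdadm or the init script work. It assumes
that a lower group number means "sync first" (as with group=1 above), and the
names plan_sync_batches, start_resources and the start callback are
hypothetical, made up for illustration only.

```python
from collections import defaultdict


def plan_sync_batches(groups):
    """Batch resource names by sync group, lowest group number first.

    `groups` maps a resource name to its sync group number (assumption:
    a lower number means the group is synced earlier).
    """
    batches = defaultdict(list)
    for name, group in groups.items():
        batches[group].append(name)
    # One sorted list of names per group, ordered by group number.
    return [sorted(batches[g]) for g in sorted(batches)]


def start_resources(groups, start):
    """Start the first batch without waiting; return (started, deferred).

    `start` is a hypothetical non-blocking callback that kicks off one
    resource. Nothing here waits for a sync to finish, so a slow device
    cannot hold up the rest or make the caller give up.
    """
    plan = plan_sync_batches(groups)
    if not plan:
        return [], []
    first, rest = plan[0], plan[1:]
    for name in first:
        start(name)
    # Flatten the remaining batches, keeping their group order.
    deferred = [name for batch in rest for name in batch]
    return first, deferred
```

With vol21 and vol22 in group 1 and a third volume in group 2, the first two
would be started together and the third queued, instead of the script exiting
after the first timeout.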

Regards,
Andreas