[DRBD-user] Oopses/timeouts on 0.7.5 with many big volumes

Philipp Reisner philipp.reisner at linbit.com
Tue Nov 30 11:54:19 CET 2004


On Tuesday 30 November 2004 11:18, Renaud Guerin wrote:
> Hello,
>
> I'm trying to set up DRBD 0.7.5 on 2.6.10-rc1 with two 4GB RAM dual Xeon
> machines replicating ten ~500GB volumes.
> My drbd.conf is mostly standard stuff; I left all the timeouts,
> protocol, startup, and disk directives as default.
>
> I should probably mention that I had to patch the original 0.7.5 drbd
> because of a variable name change in 2.6.10-rc1 (can't remember which
> variable it was right now, but I doubt it has anything to do with my
> problem, it's a pretty trivial patch)
>

drbd-0.7.6 is known to compile on linux-2.6.10-rc2

> So I have 10 volumes on this machine going from vol21 to vol30.
>
> Problem 1:
> ---------
>
> When I start drbd for the first time I get something like this :
> /etc/init.d/drbd start
>
> Starting DRBD resources:    [vol21]Child process does not terminate!
> Exiting.
>
> If I start again:
>
> Starting DRBD resources:    [vol21][vol22]Child process does not terminate!
> Exiting.
>
> and so on : one more volume is completed every time I launch the script.
> Looking at the source code, I believe the timeout has to do with the
> SLEEPS_LONG / SLEEPS_VERY_LONG values that probably aren't enough in my
> case (but I'm not sure if I'm hitting the "long" or "very long" timeout).
> Can these be made into module parameters, or at least #defines?
>
> Eventually, after enough launches, all volumes are correctly
> initialized, except...

Yes, I have changed SLEEPS_LONG from 60 to 120 seconds with the 
drbd-0.7.6 release.
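Problem 1 is the classic pattern of a parent that gives a child command a fixed time budget and bails out when it is exceeded. The following is a minimal Python sketch of that pattern only, assuming drbdadm waits for its helper roughly like this; the function name `wait_for_child` and the use of a `SLEEPS_LONG` constant here are illustrative, not drbdadm's actual code. Attaching a large volume can take longer than the budget, so the parent gives up; raising the constant, as done for 0.7.6, simply widens that budget.

```python
import subprocess

# Hypothetical stand-in for a fixed wait such as SLEEPS_LONG (seconds).
# This mirrors the value mentioned for drbd-0.7.6; it is not drbdadm code.
SLEEPS_LONG = 120


def wait_for_child(cmd, timeout=SLEEPS_LONG):
    """Run cmd and wait at most `timeout` seconds for it to exit.

    Returns True if the child exited within the budget, False if the
    budget ran out (mirroring a "Child process does not terminate!" bail-out).
    """
    try:
        subprocess.run(cmd, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        print("Child process does not terminate!\nExiting.")
        return False
```

A slow child (for example one busy setting up a ~500 GB volume) fails this check even though nothing is actually broken; only a larger budget lets it through.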

> Problem 2:
> ---------
>
> It goes on like this until [vol27], where the init script hangs, and I
> have this in the kernel logs:
>
> drbd8: Creating state block
> allocation failed: out of vmalloc space - use vmalloc=<size> to increase
> size.
> drbd8: BM resizing failed. Leaving size unchanged at size = 488333312 KB
> drbd8: Assuming that all blocks are out of sync (aka FullSync)
> drbd8: drbd_bm_set_all: (!(b && b->bm)) in

The BitMap needs 32 KB of RAM per 1 GB of storage (one bit per 4 KB block);
in your case that is ~157 MB. This RAM needs to be mapped into the kernel's
virtual address space. [If I remember correctly] there is only 128 MB of
virtual address space available for this on an x86 highmem kernel.
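The arithmetic above can be checked with a short Python sketch. It assumes one bitmap bit per 4 KB block (which is what 32 KB per 1 GB implies) and takes the 128 MB vmalloc figure from the text; it ignores any rounding the real driver does and the fact that other kernel users share the same vmalloc area, so the true limit is lower. The names `bitmap_bytes` and `total_bitmap_bytes` are hypothetical helpers, not DRBD functions. The volume size is the one from the drbd8 log line above.

```python
BM_BLOCK_KB = 4                     # assumed: one bitmap bit per 4 KB block
VMALLOC_BYTES = 128 * 1024 * 1024   # ~128 MB vmalloc area quoted above


def bitmap_bytes(storage_kb):
    """Bytes of bitmap needed for a volume of storage_kb KB (rounded up)."""
    bits = -(-storage_kb // BM_BLOCK_KB)  # one bit per block, ceiling division
    return -(-bits // 8)                  # eight bits per byte, ceiling division


def total_bitmap_bytes(sizes_kb):
    """Bitmap bytes needed for all volumes together."""
    return sum(bitmap_bytes(kb) for kb in sizes_kb)


if __name__ == "__main__":
    print(bitmap_bytes(1024 * 1024) // 1024, "KB per GB")  # 32 KB per GB
    vol_kb = 488333312  # size of drbd8 from the kernel log
    per_vol = bitmap_bytes(vol_kb)
    total = total_bitmap_bytes([vol_kb] * 10)
    print(f"{per_vol / 2**20:.1f} MiB per volume, {total / 2**20:.1f} MiB for ten")
    print("at most", VMALLOC_BYTES // per_vol, "such bitmaps fit in 128 MiB")
```

With these numbers each bitmap is about 14.6 MiB, ten of them need about 145.5 MiB, and at most eight fit into 128 MiB even before any other vmalloc user is counted.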

Solution1: Get a 64-bit architecture like AMD Opteron.
Solution2: Ask LINBIT to rewrite the BitMap code to use unmapped 
           (highmem) pages.
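The idea behind Solution2 is that the bitmap does not have to live in one large contiguous mapping: it can be split into page-sized chunks, with each bit found by dividing its index by the number of bits per page. The following is a hypothetical Python sketch of that data structure only. In a kernel the chunks would be separately allocated (possibly highmem) pages that are mapped just while a bit is touched; here they are plain `bytearray`s, and the class name `PagedBitmap` is illustrative, not DRBD code.

```python
PAGE_SIZE = 4096               # bytes per chunk ("page")
BITS_PER_PAGE = PAGE_SIZE * 8  # bits held by one chunk


class PagedBitmap:
    """Bitmap stored as independent page-sized chunks (illustrative sketch)."""

    def __init__(self, nbits):
        self.nbits = nbits
        npages = -(-nbits // BITS_PER_PAGE)  # ceiling division
        # Each chunk is a separate allocation; no single large buffer exists.
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]

    def _locate(self, bit):
        """Map a bit index to (chunk index, byte offset, bit mask)."""
        if not 0 <= bit < self.nbits:
            raise IndexError(bit)
        page, offset = divmod(bit, BITS_PER_PAGE)
        return page, offset // 8, 1 << (offset % 8)

    def set(self, bit):
        page, byte, mask = self._locate(bit)
        self.pages[page][byte] |= mask

    def clear(self, bit):
        page, byte, mask = self._locate(bit)
        self.pages[page][byte] &= ~mask & 0xFF

    def test(self, bit):
        page, byte, mask = self._locate(bit)
        return bool(self.pages[page][byte] & mask)

    def count(self):
        """Number of set bits, e.g. blocks still out of sync."""
        return sum(bin(b).count("1") for chunk in self.pages for b in chunk)
```

Only the chunk being accessed ever needs to be addressable, which is why such a layout would sidestep the shortage of kernel virtual address space.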

[...]
>
>
> Problem 3 (most serious)
> ---------
>
> There's a new, more serious oops that happened on the target machine, a
> few minutes after I finally started synchronisation:
>
[...]

I have never seen this before, but I will not trust anything as long as
Problem 2 is not solved (at least reduce the number of volumes, so that
there is enough LOWMEM to hold the BitMaps for all volumes...)

...and switch to drbd-0.7.6

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
