[DRBD-user] System lockup with DRBD

Sun Nov 7 03:50:09 CET 2010

On 10-11-06 10:32 PM, chambal wrote:
> Not having much luck pursuing this - on a newer OS, I cannot even
> run DRBD - I get an immediate kernel panic.
> 
> I was hopeful that a newer OS on this VIA EPIA-M800 hardware
> (with OCZ Vertex-Turbo SSD) might solve the intermittent lockup
> problem.  I chose Fedora 13.  I loaded it from the DVD, put
> drbd-8.3.9.tar.gz on it and did:
> 
> ./configure --prefix=/usr/local --sbindir=/usr/local/sbin
> --localstatedir=/var --sysconfdir=/etc --without-heartbeat
> --without-pacemaker --without-xen
> make clean
> make
> make install
> chkconfig --add drbd
> 
> Cleared and initialized storage via:
> 
> dd if=/dev/zero bs=1M count=1 of=/dev/sda2
> drbdadm create-md r0
> 
> When I then did "service drbd start", I got a kernel panic.
> 
> I tried 8.3.7 instead, same result.  Went back to 8.3.9, added
> "--with-km" to configure, same result.  I played with the
> configuration file - if it didn't define a valid resource, no
> kernel panic, otherwise it crashes.  I also tried "yum update"
> which took the kernel from 2.6.33.3-85.fc13.i686.PAE to
> 2.6.34.7-61.fc13.i686.PAE and after again compiling/installing
> drbd, it still crashes.  The syslog never has a record of the
> panic details.
> 
> The strange thing is, the older CentOS 5.5 with drbd 8.3.9 on the
> same hardware works fine except for the within-a-day lockup
> problem.
> 
> Configuration: NetworkManager service turned off, network on,
> ifcfg-eth1 has IP=10.0.1.151, sysconfig/network has hostname set
> to f13-1.sync, hosts file has that IP and name.  Partner unit is
> not yet set up.  drbd.conf (minimal for testing):
> 
> resource r0 {
>     protocol C;
>     on f13-1.sync {
>         device     /dev/drbd1;
>         disk       /dev/sda2;
>         address    10.0.1.151:7788;
>         meta-disk  internal;
>     }
>     on f13-2.sync {
>         device    /dev/drbd1;
>         disk      /dev/sda2;
>         address   10.0.1.152:7788;
>         meta-disk internal;
>     }
> }

I missed the start of this thread, so apologies if I repeat someone else.

Can you open two extra terminal windows? In one, run 'watch cat
/proc/drbd' (if the 'drbd' module is not loaded yet, this file will not
exist). In the other, run 'clear; tail -f -n 0 /var/log/messages'. Now
you can watch output as you run through the following commands. Watch
for errors at each step.

If '/proc/drbd' doesn't exist, run:

modprobe drbd

Now, on each node, connect DRBD to its backing device with:

drbdadm attach r0

Now tell both nodes to connect to the other with:

drbdadm connect r0

If you're still alive, and assuming you're running primary/primary, run
the following on both nodes:

drbdadm primary r0

If you have the default sync rate (I didn't see your global config),
then try notching up the sync speed ~10M at a time to see if it's a
failure triggered by network or read/write speeds:

drbdsetup /dev/drbd1 syncer -r 10M (20M, 30M, ...)
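
If you would rather not type each rate by hand, the stepping could be
scripted roughly like this. It is only a sketch: the function name
step_sync_rate and the DEVICE, RATES, PAUSE and DRY_RUN variables are my
own placeholders, and the only real command in it is the drbdsetup call
shown above. By default it just prints what it would run; set DRY_RUN=0
to actually apply the rates.

```shell
#!/bin/sh
# Sketch: raise the DRBD 8.3 resync rate one step at a time.
# DEVICE, RATES, PAUSE and DRY_RUN are illustrative variables, not DRBD settings.
DEVICE="${DEVICE:-/dev/drbd1}"    # device from the quoted drbd.conf
RATES="${RATES:-10M 20M 30M}"     # rates to try, lowest first
PAUSE="${PAUSE:-60}"              # seconds to observe each rate before raising it
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands

step_sync_rate() {
    for rate in $RATES; do
        echo "drbdsetup $DEVICE syncer -r $rate"
        if [ "$DRY_RUN" = "0" ]; then
            # Stop at the first rate the system cannot survive or accept.
            drbdsetup "$DEVICE" syncer -r "$rate" || return 1
            sleep "$PAUSE"
        fi
    done
}

step_sync_rate
```

Keep the two watch terminals open while it runs, so that the last rate
printed before a lockup tells you roughly where the trouble starts.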

At this stage, you've effectively done everything that '/etc/init.d/drbd
start' does. When it fails, report at what step it failed and what, if
anything, was shown in either /proc/drbd or /var/log/messages.
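
To make it easier to report the failing step, the manual sequence above
could be wrapped in a small script along these lines. Again, this is a
sketch under assumptions: run_step, manual_start, RES, PROMOTE and
DRY_RUN are names I made up, and RES defaults to r0 because that is the
resource in the quoted drbd.conf. It defaults to a dry run that only
prints the steps; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: run the manual DRBD start-up steps one at a time and name the one that fails.
RES="${RES:-r0}"            # resource name from the quoted drbd.conf
PROMOTE="${PROMOTE:-0}"     # set to 1 only if you are running primary/primary
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the steps

run_step() {
    echo "== step: $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "FAILED at: $*" >&2; return 1; }
        # Show the DRBD state after each successful step.
        if [ -r /proc/drbd ]; then
            cat /proc/drbd
        fi
    fi
    return 0
}

manual_start() {
    # Load the module only if /proc/drbd is missing.
    [ -e /proc/drbd ] || run_step modprobe drbd || return 1
    run_step drbdadm attach "$RES" || return 1
    run_step drbdadm connect "$RES" || return 1
    if [ "$PROMOTE" = "1" ]; then
        run_step drbdadm primary "$RES" || return 1
    fi
}

manual_start
```

If the machine panics outright, the last "== step:" line on the screen
is the one to report, since nothing after it will have been printed.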

HTH

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org