[DRBD-user] System lockup with DRBD

chambal 2iow-li6l at dea.spamcon.org
Sun Nov 7 05:46:42 CET 2010



Digimer <linux at alteeve.com> wrote:

>On 10-11-06 10:32 PM, chambal wrote:
>> Not having much luck pursuing this - on a newer OS, I cannot even
>> run DRBD - I get an immediate kernel panic.
>> 
>> I was hopeful that a newer OS on this VIA EPIA-M800 hardware
>> (with OCZ Vertex-Turbo SSD) might solve the intermittent lockup
>> problem.  I chose Fedora 13.  I loaded it from the DVD, put
>> drbd-8.3.9.tar.gz on it and did:
>> 
>> ./configure --prefix=/usr/local --sbindir=/usr/local/sbin \
>>     --localstatedir=/var --sysconfdir=/etc --without-heartbeat \
>>     --without-pacemaker --without-xen
>> make clean
>> make
>> make install
>> chkconfig --add drbd
>> 
>> Cleared and initialized storage via:
>> 
>> dd if=/dev/zero bs=1M count=1 of=/dev/sda2
>> drbdadm create-md r0
>> 
>> When I then did "service drbd start", I got a kernel panic.
>> 
>> I tried 8.3.7 instead, same result.  Went back to 8.3.9, added
>> "--with-km" to configure, same result.  I played with the
>> configuration file - if it didn't define a valid resource, no
>> kernel panic, otherwise it crashes.  I also tried "yum update"
>> which took the kernel from 2.6.33.3-85.fc13.i686.PAE to
>> 2.6.34.7-61.fc13.i686.PAE and after again compiling/installing
>> drbd, it still crashes.  The syslog never has a record of the
>> panic details.
>> 
>> The strange thing is, the older CentOS 5.5 with drbd 8.3.9 on the
>> same hardware works fine except for the within-a-day lockup
>> problem.
>> 
>> Configuration: NetworkManager service turned off, network on,
>> ifcfg-eth1 has IP=10.0.1.151, sysconfig/network has hostname set
>> to f13-1.sync, hosts file has that IP and name.  Partner unit is
>> not yet set up.  drbd.conf (minimal for testing):
>> 
>> resource r0 {
>>     protocol C;
>>     on f13-1.sync {
>>         device     /dev/drbd1;
>>         disk       /dev/sda2;
>>         address    10.0.1.151:7788;
>>         meta-disk  internal;
>>     }
>>     on f13-2.sync {
>>         device    /dev/drbd1;
>>         disk      /dev/sda2;
>>         address   10.0.1.152:7788;
>>         meta-disk internal;
>>     }
>> }
>
>I missed the start of this thread, so apologies if I repeat someone else.
>
>Can you open two extra terminal windows. In one, run 'watch cat
>/proc/drbd' (if the 'drbd' module is not loaded yet, this file will not
>exist). In the other, run 'clear; tail -f -n 0 /var/log/messages'. Now
>you can watch output as you run through the following commands. Watch
>for errors at each step.
>
>If '/proc/drbd' doesn't exist, run:
>
>modprobe drbd
>
>Now, on either node, connect DRBD to its backing device with:
>
>drbdadm attach r1

Thanks for the help.  At the "modprobe drbd" step, the watch
window began showing:

version: 8.3.7 (api:88/proto:86-92)
srcversion: 582E47DEE6FD9EC45926ECF

And syslog showed:

Nov  6 21:31:35 f13-1 kernel: drbd: initialized. Version: 8.3.7 (api:88/proto:86-92)
Nov  6 21:31:35 f13-1 kernel: drbd: srcversion: 582E47DEE6FD9EC45926ECF
Nov  6 21:31:35 f13-1 kernel: drbd: registered as block device major 147
Nov  6 21:31:35 f13-1 kernel: drbd: minor_table @ 0xf6280d80

Then at the "drbdadm attach r0" step, it crashed.  Nothing
showed in the syslog or watch windows (it crashed instantly).
After reboot, nothing in syslog after the above lines, until the
new boot messages.


>
>Now tell both nodes to connect to the other with:
>
>drbdadm connect r1
>
>If you're still alive, and assuming you're running primary/primary, run
>the following on both nodes:
>
>drbdadm primary r1
>
>If you have the default sync rate (I didn't see your global config),
>then try notching up the sync speed ~10M at a time to see if it's a
>failure triggered by network or read/write speeds:
>
>drbdsetup /dev/drbd1 syncer -r 10M (20M, 30M, ...)
>
>At this stage, you've effectively done everything that '/etc/init.d/drbd
>start' does. When it fails, report at what step it failed and what, if
>anything, was shown in either /proc/drbd or /var/log/messages.
>
>HTH
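
Since the crash leaves nothing in syslog, one way to pin down the failing
step is to write each command to a log file and flush it to disk before
running it. The following is only a sketch of the manual sequence quoted
above, not part of DRBD: the helper names run_step and run_all, the log path
/var/tmp/drbd-steps.log, and the STEP_LOG and RES variables are all
hypothetical. It assumes the resource is called r0 as in the posted
drbd.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): log each step and sync it to disk
# before running it, so the last START line names the step that was running.
LOG="${STEP_LOG:-/var/tmp/drbd-steps.log}"   # assumed log location
RES="${RES:-r0}"                             # resource name from drbd.conf

run_step() {
    # Record the command and flush it to disk before executing it.
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG"
    sync
    rc=0
    "$@" || rc=$?
    # Record the exit status if the machine survived the command.
    printf '%s END rc=%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" "$*" >> "$LOG"
    sync
    return "$rc"
}

run_all() {
    # Same order as the manual steps; stops at the first failing command.
    run_step modprobe drbd &&
    run_step drbdadm attach "$RES" &&
    run_step drbdadm connect "$RES" &&
    run_step drbdadm primary "$RES"
}

# Only touch DRBD when explicitly asked to (run as root with --run).
if [ "${1:-}" = "--run" ]; then
    run_all
fi
```

After a reboot, a START line with no matching END line points at the
command that was running when the machine went down. The sync call only
improves the odds that the line reaches the disk; it is not a guarantee.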




