[DRBD-user] System lockup with DRBD

Sun Nov 7 03:32:12 CET 2010

Not having much luck pursuing this - on a newer OS, I cannot even
run DRBD - I get an immediate kernel panic.

I was hopeful that a newer OS on this VIA EPIA-M800 hardware
(with OCZ Vertex-Turbo SSD) might solve the intermittent lockup
problem.  I chose Fedora 13.  I loaded it from the DVD, put
drbd-8.3.9.tar.gz on it and did:

./configure --prefix=/usr/local --sbindir=/usr/local/sbin \
    --localstatedir=/var --sysconfdir=/etc --without-heartbeat \
    --without-pacemaker --without-xen
make clean
make
make install
chkconfig --add drbd

Cleared and initialized storage via:

dd if=/dev/zero bs=1M count=1 of=/dev/sda2
drbdadm create-md r0

When I then did "service drbd start", I got a kernel panic.

I tried 8.3.7 instead, same result.  Went back to 8.3.9, added
"--with-km" to configure, same result.  I played with the
configuration file - if it didn't define a valid resource, no
kernel panic, otherwise it crashes.  I also tried "yum update"
which took the kernel from 2.6.33.3-85.fc13.i686.PAE to
2.6.34.7-61.fc13.i686.PAE and after again compiling/installing
drbd, it still crashes.  The syslog never has a record of the
panic details.
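
Since the panic never reaches the syslog, one possible way to capture it
is the kernel's netconsole module, which sends kernel messages over UDP
to another machine. The sketch below only builds the module parameter
string. The function name, the receiver address 10.0.1.200 and the
port numbers are assumptions for illustration, not part of this setup.

```shell
# Build the parameter string for the netconsole kernel module.
# Format, per the kernel's netconsole documentation:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# The target MAC is left empty here, so the broadcast address is used.
netconsole_param() {
    nc_dev=$1            # local interface, e.g. eth1
    nc_src_ip=$2         # local IP address
    nc_tgt_ip=$3         # receiver IP address (hypothetical machine)
    nc_port=${4:-6666}   # receiver UDP port (assumed default)
    printf 'netconsole=6665@%s/%s,%s@%s/\n' \
        "$nc_src_ip" "$nc_dev" "$nc_port" "$nc_tgt_ip"
}

# Usage (as root), before "service drbd start":
#   modprobe netconsole "$(netconsole_param eth1 10.0.1.151 10.0.1.200)"
#   dmesg -n 8    # let all kernel messages through to the console
# On the receiver, listen for UDP on the chosen port, for example
# with netcat (the exact flags differ between netcat variants).
```

If the panic happens before the network path is usable, nothing will
arrive, so this is only worth a try, not a guaranteed capture.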

The strange thing is, the older CentOS 5.5 with drbd 8.3.9 on the
same hardware works fine except for the within-a-day lockup
problem.

Configuration: NetworkManager service turned off, network service on,
ifcfg-eth1 sets the IP address to 10.0.1.151, /etc/sysconfig/network
has the hostname set to f13-1.sync, and the hosts file has that IP and
name.  The partner unit is not yet set up.  drbd.conf (minimal for
testing):

resource r0 {
    protocol C;
    on f13-1.sync {
        device     /dev/drbd1;
        disk       /dev/sda2;
        address    10.0.1.151:7788;
        meta-disk  internal;
    }
    on f13-2.sync {
        device    /dev/drbd1;
        disk      /dev/sda2;
        address   10.0.1.152:7788;
        meta-disk internal;
    }
}
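
DRBD matches each "on <hostname>" section against the output of
"uname -n", so a quick sanity check is to confirm that the running
hostname really has a section in the file. A mismatch would not by
itself explain a kernel panic, but it rules out one variable. This is a
minimal sketch: the function names are made up for illustration, and
the parsing assumes the simple "on <host> {" layout used above.

```shell
# Print the hostname from every "on <host> {" line in a drbd.conf file.
# Assumes "on", the hostname and "{" are separated by whitespace.
drbd_conf_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

# Check that a host (default: this machine's "uname -n") has an
# "on" section in the given configuration file.
check_drbd_host() {
    chk_conf=$1
    chk_host=${2:-$(uname -n)}
    if drbd_conf_hosts "$chk_conf" | grep -Fqx -- "$chk_host"; then
        echo "ok: $chk_host has an 'on' section in $chk_conf"
    else
        echo "mismatch: no 'on $chk_host' section in $chk_conf"
        return 1
    fi
}

# Usage:
#   check_drbd_host /etc/drbd.conf
```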

chambal <2iow-li6l at dea.spamcon.org> wrote:

>"Robert Dunkley" <Robert at saq.co.uk> wrote:
>
>>Can you try with Intel Nics installed in those Via boards? NICs would be
>>my first choice if the problem is hardware related. I have used DRBD
>>with Intel SSDs, works fine.
>
>What hardware and OS/kernel did you use with the SSDs?
>
>Thanks for the NIC idea.  Found some Intel PCI Ethernet cards but
>they don't fit in the Mini-ITX, have ordered extenders so I can
>try them.
>
>In the meantime, checked the network drivers - the ones included
>in CentOS5.5 for the Via Velocity (VT6120/VT6121/VT6122) show
>V1.13 in the syslog startup messages.  Checking VIA's site, there
>were newer ones for this chipset, the Linux part is V1.30.
>Installed and made active, rebooted, verified it shows V1.30.
>
>Unfortunately these didn't solve the lockup problem.  They did
>solve a problem seen when I was doing intensive read/write
>testing on the DRBD shared partition, where I saw frequent:
>
>   eth1: excessive work at interrupt
>
>in both the Primary and Secondary syslogs.  This newer driver
>solved that, no more such messages.  But it doesn't solve the
>core problem.
>
>
>I am wondering if there is a combination of standard GNU/Linux
>command-line tools that could be used in a script to work with
>the disk and network to approximate how DRBD interacts with the
>system.  If this were possible, and I could trigger the problem
>this way, it would at least let me demonstrate that the problem
>is not "something with DRBD".
>
>>
>>-----Original Message-----
>>From: drbd-user-bounces at lists.linbit.com
>>[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of chambal
>>Sent: 29 October 2010 09:50
>>To: drbd-user at lists.linbit.com
>>Subject: Re: [DRBD-user] System lockup with DRBD
>>
>>chambal <2iow-li6l at dea.spamcon.org> wrote:
>>
>>>I have a pair of VIA M800 Mini-ITX with SSD (one OCZ
>>>Vertex-Turbo, one Intel), and CentOS 5.5 with current patches.
>>>
>>>When I have DRBD active on both units, at some random point but
>>>always within one day, one of the units has completely locked up.
>>>In all but one case, it's the Primary unit.
>>>
>>>When I say locked up, I mean the PC is completely frozen -
>>>keyboard is dead (can't toggle numlock, and Alt-SysRq - which is
>>>enabled - doesn't work), there's no kernel panic dump on the
>>>physical console, there's no response to tapping the power
>>>switch, and it can't be pinged.  There's nothing in the syslog
>>>after it's forcibly rebooted.
>>>
>>>Possibly important clue: the front panel LED for hard disk
>>>activity is solidly on when the failure occurs.
>>>
>>>When I have DRBD running on only the active (Primary) unit (did
>>>"service drbd stop" on the inactive (Secondary) unit), this
>>>lockup never occurs.
>>>
>>>There is not very much disk read/write activity on the shared
>>>partition.  Both units are on the same local private LAN segment.
>>>
>>>Originally I was using DRBD 8.0.1 (which didn't have this problem
>>>on different much older hardware and OS), then updated to DRBD
>>>8.0.16, then yesterday to 8.3.9.  No difference in the problem.
>>>Because the kernel is 2.6.18-194.17.1.el5 I still have to use a
>>>kernel module.
>>>
>>>I am rather lost on how to proceed in tracking down the cause of
>>>this problem or a solution.
>>
>>I received an email response from someone running the exact same CentOS
>>5.5 and kernel version, and DRBD 8.3.9.  So this would seem to point to
>>the hardware, or an interaction between the hardware and software.
>>
>>Has anyone run DRBD on a VIA EPIA-M800 Mini-ITX?
>>
>>Has anyone run DRBD on SSD?