[DRBD-user] DRBD 8.2 crashes CentOS 5.2 on rsync from remote host

Lee Christie Lee at titaninternet.co.uk
Sun Aug 17 19:37:38 CEST 2008



As an FYI, we're running a MySQL HA cluster and an NFS HA cluster.

All 4 boxes are running CentOS 5.2 (2.6.18-92.1.6.el5) and DRBD 8.2.6.

The MySQL cluster consists of a single DRBD resource, whereas the NFS box
has 8 resources running on top of LVM volumes.

We've done silly amounts of testing and the boxes are now in
production. Zero issues.

Out of interest, why are you running the CentOS Plus kernel? What do you
need that isn't in the stock one?

Hardware-wise we're a Supermicro house, so it's an X7DWN+ motherboard, a
pair of quad-core Xeons and Adaptec 5805 controllers. The MySQL servers
run 15K SAS disks, whereas the NFS box is a hybrid of 15K SAS in RAID 1
for the OS/metadata and SATA in RAID 10.

All I'd say (a bit of a sweeping generalisation, but hey) is that of
the 1000+ servers we've built, when it comes to high file I/O we always
seem to have issues with 3ware. I'm not saying that's the case here;
it's just a comment I'd make.

> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: 17 August 2008 17:52
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] DRBD 8.2 crashes CentOS 5.2 on rsync from remote host
> 
> On Wed, Aug 13, 2008 at 03:00:52PM -0700, Chris Miller wrote:
> >
> > I've got a pair of HA servers I'm trying to get into production.
> > Here are some specs :
> >
> > Xeon X3210 Quad Core (aka Core 2 Quad) 2.13GHz (four logical
> > processors, no Hyper-Threading)
> > 4GB memory
> > Hardware (3ware) RAID 1 mirror, 2 x Seagate 750GB SATA2
> > 650GB DRBD partition run on top of an LVM2 partition.
> >
> > CentOS 5.2 2.6.18-92.1.6.el5.centos.plus
> > DRBD 8.2 (drbd82-8.2.6-1.el5.centos)
> > Kernel Module kmod-drbd82-8.2.6-1.2.6.18_92.1.6.el5.centos.plus
> >
> > I've been trying to rsync data from a remote server and it's crashed
> > a couple of times now. It does not happen immediately, but over
> > time. I connected a serial console and got the below panic message.
> > The last file copied was ~1GB in size, but previous files up to 4GB
> > had been copied. I do not have kernel core dumping enabled, but
> > that's a possibility if needed. Not sure if this is a bug or is
> > caused by something I've done. This isn't my first DRBD install
> > (although it's the first on top of LVM) and I believe I've got
> > everything set up correctly. I did have a full sync rate (110M)
> > enabled over GbE, if that's relevant. Thoughts?
> >
> > Regards,
> > 	Chris
> 
> 
> > [root at haws1 ~]# BUG: unable to handle kernel paging request at
> > virtual address c
> >  printing eip:
> > c04e9291
> > *pde = 00000000
> > Oops: 0000 [#1]
> > SMP
> > last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
> > Modules linked in: softdog drbd(U) autofs4 hidp rfcomm l2cap
> > bluetooth sunrpc id
> > CPU:    0
> > EIP:    0060:[<c04e9291>]    Tainted: G      VLI
> > EFLAGS: 00010046   (2.6.18-92.1.6.el5.centos.plus #1)
> 
> > EIP is at list_del+0x25/0x5c 
> 
> in case this Oops can be trusted,
> this list_del apparently dereferences a NULL pointer.
> 
> > eax: fe187128   ebx: f04a6ab8   ecx: f04a6a8c   edx: f04a6a8c
> > esi: fe187128   edi: f4e355a0   ebp: f426c800   esp: f385df3c
> > ds: 007b   es: 007b   ss: 0068
> > Process drbd0_asender (pid: 2900, ti=f385d000 task=f4932000
> > task.ti=f385d000)
> > Stack: 000000e6 f8d1953b 00000000 f04a6a8c 000000e6 00000001 ee187b14 00000046
> >        f49e7bc0 f04a6ab8 f04a6a8c f426c800 fe187128 f4e355a0 0000349f f8d24805
> >        00000800 f426c800 f426c800 00000008 f426c9f4 f8d14d47 f385dfbc f8d15fbc
> > Call Trace:
> 
> >  [<f8d1953b>] _req_may_be_done+0x4ea/0x710 [drbd]
> 
> and according to this stack trace,
> it is the "list_del(&req->tl_requests)" in _req_is_done()
> 
> this list is protected by the req_lock spinlock.
> 
> "tl" is short for "transfer log", which is the main housekeeping list
> structure we have for replication requests, so it is modified all the
> time.  If we had a list corruption bug there, someone else should have
> noticed.  I have no idea how a NULL pointer could get there.
> 
> try to reproduce with a different kernel
> or with different hardware.
> 
> >  [<f8d24805>] tl_release+0x35/0x172 [drbd]
> >  [<f8d14d47>] got_BarrierAck+0x10/0x6b [drbd]
> >  [<f8d15fbc>] drbd_asender+0x3b1/0x4e7 [drbd]
> >  [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
> >  [<f8d24adb>] drbd_thread_setup+0x88/0x14e [drbd]
> >  [<f8d24a53>] drbd_thread_setup+0x0/0x14e [drbd]
> >  [<c0405c3b>] kernel_thread_helper+0x7/0x10
> >  =======================
> > Code: 89 c3 eb eb 90 90 53 89 c3 8b 40 04 8b 00 39 d8 74 17 50 53 68
> > 9b 9a 63 c
> 
> 
> 
> -- 
> : Lars Ellenberg                
> : LINBIT HA-Solutions GmbH
> : DRBD(r)/HA support and consulting    http://www.linbit.com
> 
> DRBD(r) and LINBIT(r) are registered trademarks
> of LINBIT Information Technologies GmbH
> __
> please don't Cc me, but send to list   --   I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 




