[DRBD-user] Kernel oops with drbd 8.0.14

Ronald Moesbergen R.Moesbergen at intercommit.nl
Mon Feb 9 16:32:15 CET 2009



> On Mon, Feb 09, 2009 at 12:28:18PM +0100, Ronald Moesbergen wrote:
> > Hello!
> > 
> > I'm using DRBD 8.0.14 on a Xen 3.3.1 x86_64 cluster for disk
> > replication. Over the last few weeks the systems have been crashing,
> > particularly under load. I have captured the following using
> > netconsole:
> > 
> > Unable to handle kernel paging request at 0000180017006702 RIP:
> >  [<ffffffff80262ceb>] put_page+0x0/0x2e PGD 0
> > Oops: 0000 [1] SMP
> > CPU 3
> > Modules linked in: netconsole ip_vs_wrr ip_vs xt_physdev 
> > iptable_filter ip_tables x_tables drbd bridge button ac battery 
> > ipmi_devintf ipmi_si ipmi_msghandler e1000e serio_raw pcsp$
> > Pid: 4044, comm: drbd1_receiver Tainted: GF     2.6.18.8-xen #1
> > RIP: e030:[<ffffffff80262ceb>]  [<ffffffff80262ceb>] put_page+0x0/0x2e
> > RSP: e02b:ffff88026548bba8  EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff880192b24880 RCX: 0000000000000027
> > RDX: ffff88026e63d680 RSI: ffff8801b1fe8380 RDI: 0000180017006702
> > RBP: 0000000000000001 R08: 010100004600f501 R09: 0000000000000018
> > R10: ffffffff8049dd80 R11: ffff8802672ba8f8 R12: 0000000000000018
> > R13: 0000000000000018 R14: 0000000000000000 R15: 0000000000000000
> > FS:  00002ac6a4f7e010(0000) GS:ffffffff804d8180(0000) knlGS:0000000000000000
> > CS:  e033 DS: 0000 ES: 0000
> > Process drbd1_receiver (pid: 4044, threadinfo ffff88026548a000, task ffff880265a017a0)
> > Stack:  ffffffff80395b3c ffff880192b24880 ffff880192b24880 ffff880192b24880
> >  ffffffff80395911 ffff88016fe34d80 ffffffff803c35e0 0000410000000008
> >  ffff88026548be40 0000001800000000 ffff88016fe35280 0000001800000000
> > Call Trace:
> >  [<ffffffff80395b3c>] skb_release_data+0x61/0x9c
> >  [<ffffffff80395911>] kfree_skbmem+0x9/0x75
> >  [<ffffffff803c35e0>] tcp_recvmsg+0x72e/0xb05
> >  [<ffffffff80392091>] sock_common_recvmsg+0x2d/0x42
> >  [<ffffffff80392091>] sock_common_recvmsg+0x2d/0x42
> >  [<ffffffff8038fee9>] sock_recvmsg+0x101/0x120
> >  [<ffffffff8038fee9>] sock_recvmsg+0x101/0x120
> >  [<ffffffff8024327a>] autoremove_wake_function+0x0/0x2e
> >  [<ffffffff802dd2d5>] generic_make_request+0x15f/0x174
> >  [<ffffffff880c52bf>] :dm_mod:__map_bio+0x47/0x9b
> >  [<ffffffff8817cb26>] :drbd:drbd_recv+0x7b/0x109
> >  [<ffffffff88180cd0>] :drbd:receive_DataRequest+0x72/0x575
> >  [<ffffffff8817d4da>] :drbd:drbdd+0x77/0x151
> >  [<ffffffff881801f2>] :drbd:drbdd_init+0xbe/0x1ab
> >  [<ffffffff88190440>] :drbd:drbd_thread_setup+0x11c/0x1c6
> >  [<ffffffff8020af54>] child_rip+0xa/0x12
> >  [<ffffffff88190324>] :drbd:drbd_thread_setup+0x0/0x1c6
> >  [<ffffffff8020af4a>] child_rip+0x0/0x12
> > 
> > Code: 8b 07 f6 c4 40 74 05 e9 62 f9 ff ff 8b 47 08 85 c0 75 0a 0f
> > RIP  [<ffffffff80262ceb>] put_page+0x0/0x2e  RSP <ffff88026548bba8>
> > CR2: 0000180017006702
> > 
> > Since a lot of drbd-related functions are listed, I suspect the
> > problem originates there. The logs show nothing interesting around
> > the time of the crash, just a spontaneous crash and reboot. The
> > 'drbd1_receiver' thread belongs to a 'zabbix' DomU, which runs a MySQL
> > database that causes most of the I/O load.
> > 
> > The exact DRBD version I use is:
> > version: 8.0.14 (api:86/proto:86)
> > GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by
> > beheer at xen03, 2008-11-20 11:25:39
> > And the drbd.conf file is attached.
> > 
> > As you can see, the kernel taint flags say 'GF'. I have searched for
> > where this comes from, but can't find anything. We only run
> > open-source software on these machines, so no scary binary-only modules.
> > Also, a dmesg|grep -i 'tainted' returns nothing, and cat
> > /proc/sys/kernel/tainted returns '0'. Go figure... My guess is that
> > some startup script uses 'insmod -f' while it's not needed.
> 
> then, how about grepping for that in those startup scripts?
> or in the initramdisk/initramfs?

I did but couldn't find anything. Also, after booting the machine and starting everything up, when I do a 'cat /proc/sys/kernel/tainted' it returns '0'. 
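
For reference, here is a rough sketch of how the number in /proc/sys/kernel/tainted relates to the letters printed in the oops. It only covers the two bits relevant to 'GF' (bit 0 prints 'P' when set and 'G' when clear, bit 1 prints 'F' for a forced module load); the function name decode_taint is made up for illustration, and any other bits are just reported as "other".

```python
# Sketch only: decode the two taint bits discussed in this thread.
# decode_taint is a hypothetical helper, not part of any kernel tooling.
TAINT_PROPRIETARY_MODULE = 1 << 0  # printed as 'P' when set, 'G' when clear
TAINT_FORCED_MODULE = 1 << 1       # printed as 'F' (e.g. after 'insmod -f')


def decode_taint(mask):
    """Return (flag letters without padding spaces, list of reasons)."""
    if mask == 0:
        return "Not tainted", []
    letters = "P" if mask & TAINT_PROPRIETARY_MODULE else "G"
    reasons = []
    if mask & TAINT_PROPRIETARY_MODULE:
        reasons.append("proprietary module loaded")
    if mask & TAINT_FORCED_MODULE:
        letters += "F"
        reasons.append("module load was forced")
    other = mask & ~(TAINT_PROPRIETARY_MODULE | TAINT_FORCED_MODULE)
    if other:
        reasons.append("other taint bits set: 0x%x" % other)
    return letters, reasons


if __name__ == "__main__":
    # On a live system the mask would come from /proc/sys/kernel/tainted.
    print(decode_taint(2))
    print(decode_taint(0))
```

So a 'GF' in the oops corresponds to a value of 2, and a '0' in /proc/sys/kernel/tainted means no taint bit was set at the time it was read.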

> > The kernel I'm using is here:
> > http://xenbits.xensource.com/linux-2.6.18-xen.hg. It's quite old, and
> > I'd love to upgrade to something recent, but this is the only kernel
> > that can run as Dom0 with a recent version of Xen that I can find.
> > 
> > My questions are:
> > - Is this related to DRBD
> 
> maybe "related".
> if so, then probably only in a "garbage in, garbage out" sort of way.
> 
> > or should I go and bug the xen guys?
> 
> yes please,
> and let us know how that goes.

Ok, I will try that then.

> > - Would upgrading to 8.2.x or even 8.3.x help?
> 
> as I don't think DRBD does anything wrong in this case, I 
> don't think that upgrading will resolve this issue.
> 
> the tcp stack, in tcp_recvmsg, thinks it should clean up a 
> bit of the skb memory, calls kfree_skbmem, skb_release_data, 
> which then finds some pages it no longer needs, calls 
> put_page on those.  at which point dereferencing the page 
> pointer apparently causes the oops.
> 
> drbd does not touch those page pointers, it only passes them on.
> 
> and if it was an invalid page pointer when passing it to the 
> tcp stack, it would have exploded right then and there.
> so it was a valid page pointer once, but is invalid now.
> 
> from which I conclude that something causes memory corruption 
> on your box.
> 
> a kernel module that was forced to be loaded, despite not
> matching the kernel, might do such a thing.
> as would a number of other things, including bad hardware.

Interesting. Since both boxes crash now and then, I think we can rule out bad hardware. It must be the ancient xen kernel that bugs out on me again. Thanks for your help!
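
The reading of the trace above can be double-checked mechanically. The sketch below is only an illustration under a few assumptions: it takes three lines copied from the oops, looks for a register holding the faulting address (CR2), and tests whether that address lies in the kernel half of the x86_64 address space (assumed here to start at 0xffff800000000000, the usual 48-bit layout). The helper names parse_registers and faulting_registers are made up for this example.

```python
import re

# Three lines copied from the oops in this thread.
OOPS = """\
RAX: 0000000000000000 RBX: ffff880192b24880 RCX: 0000000000000027
RDX: ffff88026e63d680 RSI: ffff8801b1fe8380 RDI: 0000180017006702
CR2: 0000180017006702
"""

# Assumption: 48-bit virtual addresses, kernel half starts here.
KERNEL_HALF_START = 0xFFFF800000000000


def parse_registers(text):
    """Map register names (RAX, R08, CR2, ...) to integer values."""
    pattern = r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b"
    return {name: int(value, 16) for name, value in re.findall(pattern, text)}


def faulting_registers(regs):
    """Names of registers whose value equals the faulting address in CR2."""
    cr2 = regs["CR2"]
    return sorted(name for name, value in regs.items()
                  if name != "CR2" and value == cr2)


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    print(faulting_registers(regs))            # which register held the bad address
    print(regs["CR2"] >= KERNEL_HALF_START)    # is it a kernel-half address?
```

With the values from this oops, the only match is RDI, which carries the first argument on x86_64, so the bad value is the page pointer handed to put_page. It is also not in the kernel half at all, which fits the "valid once, garbage now" explanation rather than a slightly-off pointer.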

> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 



