[DRBD-user] Kernel oops with drbd 8.0.14

Ronald Moesbergen R.Moesbergen at intercommit.nl
Mon Feb 9 12:28:18 CET 2009



Hello!

I'm using DRBD 8.0.14 on a Xen 3.3.1 x86_64 cluster for disk replication. Over the last few weeks the systems have been crashing, particularly under load. I have captured the following using netconsole:

Unable to handle kernel paging request at 0000180017006702 RIP:
 [<ffffffff80262ceb>] put_page+0x0/0x2e
PGD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in: netconsole ip_vs_wrr ip_vs xt_physdev iptable_filter ip_tables x_tables drbd bridge button ac battery ipmi_devintf ipmi_si ipmi_msghandler e1000e serio_raw pcsp$
Pid: 4044, comm: drbd1_receiver Tainted: GF     2.6.18.8-xen #1
RIP: e030:[<ffffffff80262ceb>]  [<ffffffff80262ceb>] put_page+0x0/0x2e
RSP: e02b:ffff88026548bba8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880192b24880 RCX: 0000000000000027
RDX: ffff88026e63d680 RSI: ffff8801b1fe8380 RDI: 0000180017006702
RBP: 0000000000000001 R08: 010100004600f501 R09: 0000000000000018
R10: ffffffff8049dd80 R11: ffff8802672ba8f8 R12: 0000000000000018
R13: 0000000000000018 R14: 0000000000000000 R15: 0000000000000000
FS:  00002ac6a4f7e010(0000) GS:ffffffff804d8180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process drbd1_receiver (pid: 4044, threadinfo ffff88026548a000, task ffff880265a017a0)
Stack:  ffffffff80395b3c ffff880192b24880 ffff880192b24880 ffff880192b24880
 ffffffff80395911 ffff88016fe34d80 ffffffff803c35e0 0000410000000008
 ffff88026548be40 0000001800000000 ffff88016fe35280 0000001800000000
Call Trace:
 [<ffffffff80395b3c>] skb_release_data+0x61/0x9c
 [<ffffffff80395911>] kfree_skbmem+0x9/0x75
 [<ffffffff803c35e0>] tcp_recvmsg+0x72e/0xb05
 [<ffffffff80392091>] sock_common_recvmsg+0x2d/0x42
 [<ffffffff80392091>] sock_common_recvmsg+0x2d/0x42
 [<ffffffff8038fee9>] sock_recvmsg+0x101/0x120
 [<ffffffff8038fee9>] sock_recvmsg+0x101/0x120
 [<ffffffff8024327a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff802dd2d5>] generic_make_request+0x15f/0x174
 [<ffffffff880c52bf>] :dm_mod:__map_bio+0x47/0x9b
 [<ffffffff8817cb26>] :drbd:drbd_recv+0x7b/0x109
 [<ffffffff88180cd0>] :drbd:receive_DataRequest+0x72/0x575
 [<ffffffff8817d4da>] :drbd:drbdd+0x77/0x151
 [<ffffffff881801f2>] :drbd:drbdd_init+0xbe/0x1ab
 [<ffffffff88190440>] :drbd:drbd_thread_setup+0x11c/0x1c6
 [<ffffffff8020af54>] child_rip+0xa/0x12
 [<ffffffff88190324>] :drbd:drbd_thread_setup+0x0/0x1c6
 [<ffffffff8020af4a>] child_rip+0x0/0x12

Code: 8b 07 f6 c4 40 74 05 e9 62 f9 ff ff 8b 47 08 85 c0 75 0a 0f
RIP  [<ffffffff80262ceb>] put_page+0x0/0x2e
 RSP <ffff88026548bba8>
CR2: 0000180017006702

Since many DRBD-related functions appear in the call trace, I suspect the problem originates there. The logs show nothing interesting around the time of the crash, just a spontaneous crash and reboot. The drbd1_receiver thread belongs to the 'zabbix' DomU, which runs a MySQL database that generates most of the I/O load.
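To make "many DRBD functions listed" concrete, the call trace can be tallied per module. The sketch below is a hypothetical helper (not part of DRBD or any kernel tool). It assumes the 2.6.18 x86_64 frame format seen above, where symbols inside a loadable module carry a ":module:" prefix and core-kernel symbols carry none; it labels the latter "vmlinux".

```python
import re
from collections import Counter

# Hypothetical helper: matches 2.6.18-style x86_64 trace frames such as
#   [<ffffffff8817cb26>] :drbd:drbd_recv+0x7b/0x109   (symbol in a module)
#   [<ffffffff80395b3c>] skb_release_data+0x61/0x9c    (symbol in the core kernel)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(text):
    """Return (address, module, symbol, offset, size) for each frame after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in text.splitlines():
        if line.strip().startswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line (e.g. the blank line before "Code:") ends the trace
        addr, module, symbol, offset, size = match.groups()
        frames.append(
            (int(addr, 16), module or "vmlinux", symbol, int(offset, 16), int(size, 16))
        )
    return frames


def module_counts(frames):
    """Count frames per module; frames without a ':module:' prefix count as 'vmlinux'."""
    return Counter(module for _, module, _, _, _ in frames)


# Short excerpt of the trace above, used as sample input.
SAMPLE = """Call Trace:
 [<ffffffff80395b3c>] skb_release_data+0x61/0x9c
 [<ffffffff803c35e0>] tcp_recvmsg+0x72e/0xb05
 [<ffffffff880c52bf>] :dm_mod:__map_bio+0x47/0x9b
 [<ffffffff8817cb26>] :drbd:drbd_recv+0x7b/0x109
 [<ffffffff88180cd0>] :drbd:receive_DataRequest+0x72/0x575

Code: 8b 07 f6 c4 40 74 05"""

if __name__ == "__main__":
    print(module_counts(parse_call_trace(SAMPLE)))
```

A tally like this only summarises the trace. On kernels of this era the x86_64 trace printer may also list stale return addresses it finds on the stack, so a module's frames appearing in the list does not by itself show that the module corrupted the page pointer passed to put_page.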

The exact DRBD version I use is:
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by beheer at xen03, 2008-11-20 11:25:39
And the drbd.conf file is attached.

As you can see, the kernel's taint flags read 'GF'. I have searched for where this comes from but can't find anything. We only run open-source software on these machines, so there are no scary binary-only modules. Also, dmesg | grep -i tainted returns nothing, and cat /proc/sys/kernel/tainted returns '0'. Go figure... My guess is that some startup script uses 'insmod -f' where it isn't needed.
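The 'GF' string is a rendering of the kernel's taint bitmask, one character per bit. The sketch below decodes that bitmask; the bit table is an assumption based on 2.6.18-era kernels (six flags, matching the six-character field in the oops above), and newer kernels define additional bits. The helper names are hypothetical.

```python
# Assumed taint-bit layout for a 2.6.18-era kernel, mirroring the six
# characters printed after "Tainted:" in an oops; newer kernels define more bits.
TAINT_FLAGS_2_6_18 = [
    # (bit, character when set, character when clear)
    (0, "P", "G"),  # TAINT_PROPRIETARY_MODULE: non-GPL module loaded
    (1, "F", " "),  # TAINT_FORCED_MODULE: a module was force-loaded
    (2, "S", " "),  # TAINT_UNSAFE_SMP
    (3, "R", " "),  # TAINT_FORCED_RMMOD: a module was force-unloaded
    (4, "M", " "),  # TAINT_MACHINE_CHECK
    (5, "B", " "),  # TAINT_BAD_PAGE
]


def taint_string(value: int) -> str:
    """Render a taint bitmask the way the oops 'Tainted:' field shows it."""
    return "".join(
        set_ch if value & (1 << bit) else clear_ch
        for bit, set_ch, clear_ch in TAINT_FLAGS_2_6_18
    )


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the taint bitmask of the currently running kernel."""
    with open(path) as f:
        return int(f.read().strip())
```

Under this assumed layout, taint_string(2) yields 'GF' followed by padding: no proprietary module, but the forced-module bit set, which fits the 'insmod -f' guess. Because the taint mask lives only in kernel memory, taint_string(read_taint()) run after the crash and reboot describes the new boot, not the one that oopsed, which would explain the '0'.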

The kernel I'm using is from http://xenbits.xensource.com/linux-2.6.18-xen.hg. It's quite old, and I'd love to upgrade to something recent, but it is the only kernel I can find that runs as Dom0 with a recent version of Xen.

My questions are:
- Is this related to DRBD, or should I go and bug the Xen guys?
- Would upgrading to 8.2.x or even 8.3.x help?

Thanks in advance for your help.

Kind Regards,
Ronald Moesbergen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd.conf
Type: application/octet-stream
Size: 4228 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090209/b3bd8c2d/attachment.obj>

