[DRBD-user] Kernel panic when using drbd together with dlm

Vladislav Bogdanov bubble at hoster-ok.com
Tue Aug 24 15:31:31 CEST 2010



Hi list,

I am investigating a kernel panic, probably caused by drbdadm, when using DRBD
together with DLM (OCFS2, GFS2, CLVM), corosync and pacemaker.
I use master-master resources with resource-level fencing.
If I enable DLM with the pcmk userspace stack, then when one node crashes
(IPMI power reset command), the other node goes into a kernel panic. Exactly
the same happens on BOTH nodes if I try to shut down one node gracefully.

The systems are Fedora 13 x86_64 SMP; all kernels from both the stable and
testing updates were tried.
DRBD is either the version bundled with the 2.6.33 and 2.6.34 kernels
(8.3.7, I think) or a self-built 8.3.8.1. It doesn't matter, the crash is
100% reproducible. drbd-utils from both Fedora and 8.3.8.1 were checked.

Corosync 1.2.3 - 1.2.7, pacemaker 1.0.8 - 1.1.3hg were tried.
openAIS 1.0.3.

I attach the crash output captured from the IPMI console inline. Could you
please tell me what additional information you need from me to find out
what is really happening?

Google is completely silent, so it seems only I have this problem. The
possible suspects are DRBD and DLM; everything works as it should if I
unload dlm. But the crash always happens while the kernel is running in the
drbdadm process, in the ioctl syscall path (vfs_ioctl).
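
For anyone who wants to check that claim against a longer console capture,
here is a minimal sketch in Python. It only assumes the line formats visible
in the oops output in this message ("Pid: ..., comm: ...", "RIP: ...",
"CR2: ... CR3: ..."). The function name summarize_oopses and the SAMPLE
string are made up for illustration and are not part of any DRBD or kernel
tool.

```python
import re

# A few lines copied from the oops output in this message (two of the oopses).
SAMPLE = """\
Pid: 13721, comm: drbdadm Tainted: G        W  2.6.34.3-37.fc13.x86_64
RIP: 0010:[<ffffffff8139460e>]  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
CR2: 0000000000000040 CR3: 0000000213a7c000 CR4: 00000000000406e0
Pid: 13726, comm: drbdadm Tainted: G      D W  2.6.34.3-37.fc13.x86_64
RIP: 0010:[<ffffffff8139460e>]  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
CR2: 0000000000000040 CR3: 0000000200e4e000 CR4: 00000000000406e0
"""

PID_RE = re.compile(r"^Pid: (\d+), comm: (\S+)")
RIP_RE = re.compile(r"^RIP: \S+\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x")
CR2_RE = re.compile(r"^CR2: ([0-9a-f]{16}) CR3:")


def summarize_oopses(text):
    """Return one dict per oops: pid, comm, faulting function, fault address."""
    reports = []
    current = None
    for line in text.splitlines():
        m = PID_RE.match(line)
        if m:
            # A "Pid:" line starts a new register dump.
            current = {"pid": int(m.group(1)), "comm": m.group(2)}
            reports.append(current)
            continue
        if current is None:
            continue
        m = RIP_RE.match(line)
        if m:
            current["function"] = m.group(1)
            continue
        m = CR2_RE.match(line)
        if m:
            current["fault_addr"] = int(m.group(1), 16)
    # Drop "Pid:" lines that were not followed by a register dump
    # (for example the one printed by the final panic message).
    return [r for r in reports if "function" in r]


if __name__ == "__main__":
    for r in summarize_oopses(SAMPLE):
        print(r["pid"], r["comm"], r["function"], hex(r["fault_addr"]))
```

Feeding it the full capture instead of SAMPLE should list every oops with
its process name and faulting function, which makes it easy to see whether
they all come from drbdadm in sock_ioctl.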

One more note: the DRBD resources are connected via loopback /32 interfaces,
routed through two real Ethernet segments for redundancy (one is a pair of
dedicated Intel 10Gbps adapters back-to-back, the other is a VLAN over a
second pair of 10Gbps adapters connected to a switch stack). Routing is
done with OSPF.
Corosync is configured to use a redundant ring over those two circuits as well.

I can provide a more detailed network schema and connectivity setup info
if needed.

Thanks for reading this and best regards,

Vladislav Bogdanov

====================
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
PGD 2153a6067 PUD 211c8e067 PMD 0
Oops: 0000 [#1]
block drbd2: PingAck did not arrive in time.
block drbd2: peer( Primary -> Unknown ) conn( Connected ->
NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
block drbd2: asender terminated
block drbd2: Terminating asender thread
block drbd2: short read expecting header on sock: r=-512
block drbd2: Creating new current UUID
block drbd2: Connection closed
block drbd2: helper command: /sbin/drbdadm fence-peer minor-2
block drbd3: PingAck did not arrive in time.
block drbd3: peer( Primary -> Unknown ) conn( Connected ->
NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
block drbd3: asender terminated
block drbd3: Terminating asender thread
block drbd3: Creating new current UUID
block drbd3: short read expecting header on sock: r=-512
block drbd3: Connection closed
block drbd3: helper command: /sbin/drbdadm fence-peer minor-3
SMP
last sysfs file: /sys/module/drbd/parameters/cn_idx
CPU 3
Modules linked in: sctp libcrc32c dlm configfs iscsi_trgt drbd
ebtable_nat ebtables ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler
sunrpc 8021q garp bridge stp llc dummy bonding xt_multiport ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm ixgbe
igb mdio i2c_i801 i5400_edac ioatdma edac_core i2c_core shpchp i5k_amb
dca serio_raw 3w_9xxx [last unloaded: scsi_wait_scan]

Pid: 13721, comm: drbdadm Tainted: G        W  2.6.34.3-37.fc13.x86_64
#1 X7DWN+/X7DW3
RIP: 0010:[<ffffffff8139460e>]  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
RSP: 0018:ffff880200e89e38  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff417f5970
RDX: 00007fff417f5970 RSI: 0000000000005401 RDI: ffff88020273ea80
RBP: ffff880200e89e68 R08: 00007fff417f59b0 R09: 00007f71565e58e0
R10: 00007fff417f5780 R11: 0000000000000202 R12: 0000000000005401
R13: 00007fff417f5970 R14: ffff8802240cb9c0 R15: 0000000000000000
FS:  00007f71567ea700(0000) GS:ffff8800020c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000213a7c000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process drbdadm (pid: 13721, threadinfo ffff880200e88000, task
ffff880216278000)
Stack:
 ffff880200e89ef8 ffffffff810e4dc5 ffff88020273ea80 0000000000005401
<0> 00007fff417f5970 00007fff417f5970 ffff880200e89e98 ffffffff8111a74b
<0> 00000000006245c0 ffff88020273ea80 ffff8802240cba08 00007fff417f5970
Call Trace:
 [<ffffffff810e4dc5>] ? handle_mm_fault+0x3ff/0x91c
 [<ffffffff8111a74b>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111acbe>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111ad5a>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10
0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d5 49 8b 46 38 <4c> 8b
60 40 8d 83 10 76 ff ff 83 f8 0f 76 0d 8d 83 00 75 ff ff
RIP  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
 RSP <ffff880200e89e38>
CR2: 0000000000000040
BUG: unable to handle kernel
---[ end trace 377dac38faa83a0d ]---
Kernel panic - not syncing: Fatal exception
Pid: 13721, comm: drbdadm Tainted: G      D W  2.6.34.3-37.fc13.x86_64 #1
Call Trace:
 [<ffffffff8144a6c3>] panic+0x78/0xf8
 [<ffffffff8144d9c8>] ? oops_end+0x73/0xc7
 [<ffffffff8144da0c>] oops_end+0xb7/0xc7
 [<ffffffff81030bf9>] no_context+0x1fc/0x20b
 [<ffffffff81030d8c>] __bad_area_nosemaphore+0x184/0x1a7
 [<ffffffff81030e0b>] bad_area+0x47/0x4e
 [<ffffffff8144fbf5>] do_page_fault+0x20b/0x2bb
 [<ffffffff8144ce75>] page_fault+0x25/0x30
 [<ffffffff8139460e>] ? sock_ioctl+0x24/0x21c
 [<ffffffff810e4dc5>] ? handle_mm_fault+0x3ff/0x91c
 [<ffffffff8111a74b>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111acbe>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111ad5a>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
NULL pointer dereference at 0000000000000040
IP: [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
PGD 20d5f1067 PUD 2025d6067 PMD 0
Oops: 0000 [#2] SMP
last sysfs file: /sys/module/drbd/parameters/cn_idx
CPU 2
Modules linked in: sctp libcrc32c dlm configfs iscsi_trgt drbd
ebtable_nat ebtables ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler
sunrpc 8021q garp bridge stp llc dummy bonding xt_multiport ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm ixgbe
igb mdio i2c_i801 i5400_edac ioatdma edac_core i2c_core shpchp i5k_amb
dca serio_raw 3w_9xxx [last unloaded: scsi_wait_scan]

Pid: 13726, comm: drbdadm Tainted: G      D W  2.6.34.3-37.fc13.x86_64
#1 X7DWN+/X7DW3
RIP: 0010:[<ffffffff8139460e>]  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
RSP: 0018:ffff8801ff5d5e38  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff0b07d740
RDX: 00007fff0b07d740 RSI: 0000000000005401 RDI: ffff88020273ea80
RBP: ffff8801ff5d5e68 R08: 00007fff0b07d780 R09: 00007f77465f78e0
R10: 00007fff0b07d550 R11: 0000000000000202 R12: 0000000000005401
R13: 00007fff0b07d740 R14: ffff8802240cb9c0 R15: 0000000000000000
FS:  00007f77467fc700(0000) GS:ffff880002080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000200e4e000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process drbdadm (pid: 13726, threadinfo ffff8801ff5d4000, task
ffff8802083d4650)
Stack:
 ffff8801ff5d5ef8 ffffffff810e4dc5 ffff88020273ea80 0000000000005401
<0> 00007fff0b07d740 00007fff0b07d740 ffff8801ff5d5e98 ffffffff8111a74b
<0> 00000000006245c0 ffff88020273ea80 ffff8802240cba08 00007fff0b07d740
Call Trace:
 [<ffffffff810e4dc5>] ? handle_mm_fault+0x3ff/0x91c
 [<ffffffff8111a74b>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111acbe>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111ad5a>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10
0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d5 49 8b 46 38 <4c> 8b
60 40 8d 83 10 76 ff ff 83 f8 0f 76 0d 8d 83 00 75 ff ff
RIP  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
 RSP <ffff8801ff5d5e38>
CR2: 0000000000000040
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
PGD 202735067 PUD 2151db067 PMD 0
Oops: 0000 [#3] SMP
last sysfs file: /sys/module/drbd/parameters/cn_idx
CPU 0
Modules linked in: sctp libcrc32c dlm configfs iscsi_trgt drbd
ebtable_nat ebtables ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler
sunrpc 8021q garp bridge stp llc dummy bonding xt_multiport ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm ixgbe
igb mdio i2c_i801 i5400_edac ioatdma edac_core i2c_core shpchp i5k_amb
dca serio_raw 3w_9xxx [last unloaded: scsi_wait_scan]

Pid: 13747, comm: drbdadm Tainted: G      D W  2.6.34.3-37.fc13.x86_64
#1 X7DWN+/X7DW3
RIP: 0010:[<ffffffff8139460e>]  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
RSP: 0018:ffff880200bf9e38  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fffb7953910
RDX: 00007fffb7953910 RSI: 0000000000005401 RDI: ffff88020273ea80
RBP: ffff880200bf9e68 R08: 00007fffb7953950 R09: 00007faf81b6b8e0
R10: 00007fffb7953720 R11: 0000000000000206 R12: 0000000000005401
R13: 00007fffb7953910 R14: ffff8802240cb9c0 R15: 0000000000000000
FS:  00007faf81d70700(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000211c27000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process drbdadm (pid: 13747, threadinfo ffff880200bf8000, task
ffff880216b89770)
Stack:
 ffff880200bf9ef8 ffffffff810e4dc5 ffff88020273ea80 0000000000005401
<0> 00007fffb7953910 00007fffb7953910 ffff880200bf9e98 ffffffff8111a74b
<0> 00000000006245c0 ffff88020273ea80 ffff8802240cba08 00007fffb7953910
Call Trace:
 [<ffffffff810e4dc5>] ? handle_mm_fault+0x3ff/0x91c
 [<ffffffff8111a74b>] vfs_ioctl+0x32/0xa6
 [<ffffffff8111acbe>] do_vfs_ioctl+0x483/0x4c9
 [<ffffffff8111ad5a>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10
0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d5 49 8b 46 38 <4c> 8b
60 40 8d 83 10 76 ff ff 83 f8 0f 76 0d 8d 83 00 75 ff ff
RIP  [<ffffffff8139460e>] sock_ioctl+0x24/0x21c
 RSP <ffff880200bf9e38>
CR2: 0000000000000040
================


