[DRBD-user] Kernel panic while drbd becomes primary

Radu Radutiu rradutiu at gmail.com
Tue Jun 17 15:47:25 CEST 2014

Hello,

I get the following kernel panic (followed by an automatic reboot) while the
node is becoming primary. The output below appears on the console and then
the node reboots immediately (I captured it over a serial console).
Here is the full output from tail -f /var/log/messages:

Jun 17 15:36:38 NODE-CONTB kernel: drbd: initialized. Version: 8.4.4
(api:1/proto:86-101)
Jun 17 15:36:38 NODE-CONTB kernel: drbd: GIT-hash:
599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6,
2013-10-14 15:33:06
Jun 17 15:36:38 NODE-CONTB kernel: drbd: registered as block device major
147
Jun 17 15:36:38 NODE-CONTB multipathd: drbd0: add path (uevent)
Jun 17 15:36:38 NODE-CONTB multipathd: drbd0: failed to get path uid
Jun 17 15:36:38 NODE-CONTB multipathd: uevent trigger error
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Starting worker thread
(from drbdsetup [7425])
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: disk( Diskless -> Attaching
)
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Method to ensure write
ordering: drain
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: max BIO size = 1048576
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: drbd_bm_resize called with
capacity == 209708728
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: resync bitmap:
bits=26213591 words=409588 pages=800
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: size = 100 GB (104854364 KB)
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: bitmap READ of 800 pages
took 30 jiffies
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: recounting of set bits took
additional 4 jiffies
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: 104 MB (26624 bits) marked
out-of-sync by on disk bit-map.
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: disk( Attaching -> UpToDate
)
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: attached to UUIDs
825D19B481BE18C5:0000000000000000:85A1FF27F6205E06:7A8DAA94E8F6092A
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: conn( StandAlone ->
Unconnected )
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Starting receiver thread
(from drbd_w_repdata [7427])
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: receiver (re)started
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: conn( Unconnected ->
WFConnection )
WARN: stdin/stdout is not a TTY; using /dev/console
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Handshake successful:
Agreed network protocol version 101
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Agreed to support TRIM on
protocol level
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Peer authenticated using
20 bytes HMAC
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: conn( WFConnection ->
WFReportParams )
Jun 17 15:36:38 NODE-CONTB kernel: drbd repdata: Starting asender thread
(from drbd_r_repdata [7442])
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: drbd_sync_handshake:
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: self
825D19B481BE18C4:0000000000000000:85A1FF27F6205E06:7A8DAA94E8F6092A
bits:26624 flags:0
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: peer
825D19B481BE18C4:0000000000000000:85A1FF27F6205E06:7A8DAA94E8F6092A bits:0
flags:0
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: uuid_compare()=1 by rule 40
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: peer( Unknown -> Secondary
) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 133(1), total 133; compression: 100.0%
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 133(1), total 133; compression: 100.0%
Jun 17 15:36:38 NODE-CONTB kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-source minor-0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff81446dd0>] sock_ioctl+0x30/0x280
PGD 900ece067 PUD 900eda067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/drbd0/range
CPU 25
Modules linked in: drbd(U) autofs4 sctp libcrc32c dlm configfs
dm_round_robin cpufreq_ondemand freq_table pcc_cpufreq bonding 8021q garp
stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i
cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad
ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
dm_multipath microcode power_meter iTCO_wdt iTCO_vendor_support hpilo hpwdt
osst st ch tg3 ptp pps_core sg serio_raw lpc_ich mfd_core ioatdma dca
shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom mptsas mptscsih
mptbase scsi_transport_sas hpsa pata_acpi ata_generic ata_piix dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 7451, comm: drbdadm Not tainted 2.6.32-431.5.1.el6.x86_64 #1 HP
ProLiant DL360p Gen8
RIP: 0010:[<ffffffff81446dd0>]  [<ffffffff81446dd0>] sock_ioctl+0x30/0x280
RSP: 0018:ffff880900f7de38  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff07020d00
RDX: 00007fff07020d00 RSI: 0000000000005401 RDI: ffff88092cbbe980
RBP: ffff880900f7de58 R08: ffffffff8166f200 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 00007fff07020d00
R13: 00007fff07020d00 R14: ffff8809295c8780 R15: 0000000000000000
FS:  00007fb831ebf700(0000) GS:ffff880049320000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 000000092acb2000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process drbdadm (pid: 7451, threadinfo ffff880900f7c000, task
ffff880914940080)
Stack:
 ffff88092cbbe980 ffff8809295c87c8 00007fff07020d00 0000000000000000
<d> ffff880900f7de98 ffffffff8119db32 ffff8809304c4880 ffff88092d02c780
<d> 0000000000000000 0000000000000000 0000000000000000 ffff88092cbbe980
Call Trace:
 [<ffffffff8119db32>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dcd4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e251>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f
1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38
8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
RIP  [<ffffffff81446dd0>] sock_ioctl+0x30/0x280
 RSP <ffff880900f7de38>
CR2: 0000000000000038
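
For anyone triaging the trace above, the key fields (fault address, faulting
function, and process) can be pulled out mechanically. This is only a minimal
sketch, assuming the console capture is available as text; summarize_oops and
its field names are hypothetical and not part of DRBD or any kernel tooling.

```python
import re

# Excerpt of the oops text above; in practice this would be the full
# serial-console capture read from a file.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff81446dd0>] sock_ioctl+0x30/0x280
Pid: 7451, comm: drbdadm Not tainted 2.6.32-431.5.1.el6.x86_64 #1 HP
"""


def summarize_oops(text):
    """Return fault address, faulting symbol/offset, and process from an oops.

    Hypothetical helper: it assumes the 2.6.32-style line layout seen in
    this report and raises AttributeError if a field is missing.
    """
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    ip = re.search(r"^IP: \[<[0-9a-f]+>\] (\w+)\+(0x[0-9a-f]+)/", text, re.M)
    proc = re.search(r"Pid: (\d+), comm: (\S+)", text)
    return {
        "fault_addr": int(fault.group(1), 16),  # address that was dereferenced
        "symbol": ip.group(1),                  # function containing the faulting IP
        "offset": int(ip.group(2), 16),         # byte offset into that function
        "pid": int(proc.group(1)),
        "comm": proc.group(2),                  # name of the process that faulted
    }


summary = summarize_oops(OOPS)
print(summary)
```

A small fault address such as 0x38 is consistent with reading a structure
member through a NULL pointer, which matches the "NULL pointer dereference"
wording in the trace.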


The system is running RHEL 6.5 with DRBD 8.4 from EPEL. In this setup DRBD
replicates between two SAN-backed Red Hat clusters (similar to the setup
described at
http://www.drbd.org/users-guide/s-pacemaker-floating-peers.html but with RH
cluster instead of Pacemaker). The kernel panic occurs on the second node
when it tries to start the drbd service after I forcefully reboot the first
cluster node, which was running DRBD. I can fail the DRBD service over
gracefully between the two nodes, but forcibly rebooting one node causes a
kernel panic on the other node while it starts DRBD.
I am not sure whether the problem is in DRBD or in the kernel, but the
faulting process appears to be drbdadm.
Have you seen something similar?

Best regards,

Radu