[DRBD-user] drbd kernel BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

France mailinglists at isg.si
Fri Mar 16 10:36:35 CET 2012



Hi,

I'm hitting a bug in DRBD 8.3.12 on the latest CentOS, with GFS2 on top,
using cman and rgmanager.

Here is the simplest way to reproduce it:
1. Start drbd on node s2
2. Start drbd on node s3
They sync up:
[root@s3 ~]# cat /proc/drbd
version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03
  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
     ns:0 nr:45060 dw:45056 dr:660 al:0 bm:11 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
3. Start cman on s2 and s3 so I can use GFS2. The cluster comes up OK:
[root@s3 ~]# cman_tool status
Version: 6.2.0
Config Version: 8
Cluster Name: stor
Cluster Id: 61164
Cluster Member: Yes
Cluster Generation: 140
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: s3alt.c.XX.si
Node ID: 3
Multicast addresses: 239.192.238.219 239.192.0.2
Node addresses: 192.168.168.3 10.31.0.42
4. Start gfs2 on both nodes:
Mar 16 10:29:41 s3 kernel: GFS2 (built Mar  7 2012 00:54:51) installed
Mar 16 10:29:41 s3 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "stor:drbdstor"
Mar 16 10:29:41 s3 kernel: dlm: Using SCTP for communications
Mar 16 10:29:41 s3 kernel: SCTP: Hash tables configured (established 65536 bind 65536)
Mar 16 10:29:41 s3 kernel: dlm: connecting to 2 sctp association 1
Mar 16 10:29:41 s3 kernel: GFS2: fsid=stor:drbdstor.1: Joined cluster. Now mounting FS...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1, already locked for use
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Looking at journal...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Done
5. Stop GFS2 on s3 (nothing was written to the DRBD mount from s2 or s3
while it was mounted)
6. Stop drbd on s3:
Mar 16 10:32:02 s3 kernel: block drbd0: role( Primary -> Secondary )
Mar 16 10:32:02 s3 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Mar 16 10:32:02 s3 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Mar 16 10:32:02 s3 kernel: block drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Mar 16 10:32:02 s3 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Mar 16 10:32:02 s3 kernel: block drbd0: asender terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating asender thread
Mar 16 10:32:02 s3 kernel: block drbd0: Connection closed
Mar 16 10:32:02 s3 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Mar 16 10:32:02 s3 kernel: block drbd0: receiver terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating receiver thread
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Outdated -> Failed )
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Failed -> Diskless )
Mar 16 10:32:02 s3 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Mar 16 10:32:02 s3 kernel: block drbd0: worker terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating worker thread
Mar 16 10:32:02 s3 kernel: drbd: module cleanup done
7. Start drbd on s3 -> the s3 kernel panics:
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:Oops: 0000 [#1] SMP
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:last sysfs file: /sys/devices/virtual/block/drbd0/removable
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:Stack:
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:Call Trace:
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:CR2: 0000000000000038
Message from syslogd@s3 at Mar 16 10:32:45 ...
  kernel:Kernel panic - not syncing: Fatal exception

cman on s2 then fences the hung s3.
Should I provide more info?
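
When scripting the steps above, it helps to confirm the state from step 2 before moving on to cman and GFS2. A minimal Python sketch, assuming the /proc/drbd device line has the layout shown in step 2; the function names are made up for illustration and are not part of any DRBD tooling:

```python
import re


def parse_drbd_status(line):
    """Return the cs/ro/ds fields of one /proc/drbd device line as a dict."""
    return dict(re.findall(r"\b(cs|ro|ds):(\S+)", line))


def is_dual_primary_in_sync(line):
    """True if the device is connected, Primary on both nodes, and UpToDate on both."""
    fields = parse_drbd_status(line)
    return (
        fields.get("cs") == "Connected"
        and fields.get("ro") == "Primary/Primary"
        and fields.get("ds") == "UpToDate/UpToDate"
    )


# Device line copied from the /proc/drbd output in step 2.
sample = "  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----"

if __name__ == "__main__":
    print(parse_drbd_status(sample))
    print(is_dual_primary_in_sync(sample))
```

In a real run the line would come from reading /proc/drbd on each node rather than from a hard-coded string.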

Here is one of the more detailed traces I managed to capture while testing:
Mar 16 09:39:50 s2 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
Mar 16 09:39:50 s2 kernel: IP: [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: PGD 238460067 PUD 229f59067 PMD 0
Mar 16 09:39:50 s2 kernel: Oops: 0000 [#1] SMP
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:Oops: 0000 [#1] SMP
Mar 16 09:39:50 s2 kernel: last sysfs file: /sys/module/drbd/parameters/cn_idx
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:last sysfs file: /sys/module/drbd/parameters/cn_idx
Mar 16 09:39:50 s2 kernel: CPU 0
Mar 16 09:39:50 s2 kernel: Modules linked in: gfs2 drbd(U) sctp libcrc32c dlm configfs sunrpc 8021q garp stp llc bonding ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ext2 raid0 serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci igb dca e1000e dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: drbd]
Mar 16 09:39:50 s2 kernel:
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G        W  ----------------   2.6.32-220.7.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
Mar 16 09:39:50 s2 kernel: RIP: 0010:[<ffffffff814185a0>]  [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP: 0018:ffff880229bf7e38  EFLAGS: 00010282
Mar 16 09:39:50 s2 kernel: RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: RDX: 00007fff226c2180 RSI: 0000000000005401 RDI: ffff880233a91980
Mar 16 09:39:50 s2 kernel: RBP: ffff880229bf7e58 R08: ffffffff8165fa40 R09: 00007fbd4cb1c940
Mar 16 09:39:50 s2 kernel: R10: 00007fff226c1f90 R11: 0000000000000206 R12: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: R13: 00007fff226c2180 R14: ffff88023ab51200 R15: 0000000000000000
Mar 16 09:39:50 s2 kernel: FS:  00007fbd4cd25700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Mar 16 09:39:50 s2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038 CR3: 0000000238d69000 CR4: 00000000000406f0
Mar 16 09:39:50 s2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 09:39:50 s2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 09:39:50 s2 kernel: Process drbdadm (pid: 4875, threadinfo ffff880229bf6000, task ffff8802299c3580)
Mar 16 09:39:50 s2 kernel: Stack:
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:Stack:
Mar 16 09:39:50 s2 kernel: ffff880233a91980 ffff88023ab51248 00007fff226c2180 0000000000000000
Mar 16 09:39:50 s2 kernel: <0> ffff880229bf7e98 ffffffff811892f2 ffff880229bf7e98 ffffffff814f253e
Mar 16 09:39:50 s2 kernel: <0> 0000000000000001 0000000000000003 0000000000627760 ffff880233a91980
Mar 16 09:39:50 s2 kernel: Call Trace:
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:Call Trace:
Mar 16 09:39:50 s2 kernel: [<ffffffff811892f2>] vfs_ioctl+0x22/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff814f253e>] ? do_page_fault+0x3e/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff81189494>] do_vfs_ioctl+0x84/0x580
Mar 16 09:39:50 s2 kernel: [<ffffffff81189a11>] sys_ioctl+0x81/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Mar 16 09:39:50 s2 kernel: Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Mar 16 09:39:50 s2 kernel: RIP  [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP <ffff880229bf7e38>
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:CR2: 0000000000000038
Mar 16 09:39:50 s2 kernel: ---[ end trace bf74669367969d52 ]---
Mar 16 09:39:50 s2 kernel: Kernel panic - not syncing: Fatal exception
Message from syslogd@s2 at Mar 16 09:39:50 ...
  kernel:Kernel panic - not syncing: Fatal exception
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G      D W  ----------------   2.6.32-220.7.1.el6.x86_64 #1
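
Reading the trace above: the byte marked <..> on the Code: line is where the faulting instruction starts. The Python sketch below decodes that instruction and the one before it, then checks the result against RAX and CR2 from the register dump. The helper names are hypothetical, and the decoder only handles the single "REX.W 8b /r disp8" load encoding that appears here; it is not a general disassembler.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Code: line, RAX and CR2 copied from the s2 oops above.
CODE = ("83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 "
        "0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 "
        "<4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff")
RAX = 0x0000000000000000
CR2 = 0x0000000000000038


def split_code_line(code):
    """Split the byte dump into bytes before, and bytes from, the <..> marker."""
    tokens = code.split()
    idx = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    before = [int(tok, 16) for tok in tokens[:idx]]
    at_fault = [int(tok.strip("<>"), 16) for tok in tokens[idx:]]
    return before, at_fault


def decode_load_disp8(insn):
    """Decode 'mov r64, [base + disp8]' (REX.W 8b /r, mod=01, no SIB byte)."""
    rex, opcode, modrm, disp = insn[:4]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    dst = REGS[reg + 8 * ((rex >> 2) & 1)]   # REX.R extends the reg field
    base = REGS[rm + 8 * (rex & 1)]          # REX.B extends the r/m field
    if disp >= 0x80:                         # disp8 is sign-extended
        disp -= 0x100
    return dst, base, disp


before, at_fault = split_code_line(CODE)
previous = decode_load_disp8(before[-4:])   # the four bytes just before the marker
faulting = decode_load_disp8(at_fault)

if __name__ == "__main__":
    for name, (dst, base, disp) in (("previous", previous), ("faulting", faulting)):
        print(f"{name}: mov {dst}, [{base}{disp:+#x}]")
    print("fault address matches CR2:", RAX + faulting[2] == CR2)
```

If I am decoding this correctly, the fault is a load from [rax+0x38] with RAX = 0, which matches CR2. RAX had just been loaded from offset 0x38 of the object R14 points to, so that pointer was already NULL when drbdadm issued the ioctl.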



