Hi,

I'm hitting a bug in drbd, with the latest CentOS and drbd 8.3.12, using GFS2 on top with cman and rgmanager. Here is the simplest way to make it occur.

1. Start drbd on node s2

2. Start drbd on node s3

They sync up:

[root@s3 ~]# cat /proc/drbd
version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:45060 dw:45056 dr:660 al:0 bm:11 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

3. Start cman on s2 & s3, so I can use gfs2. The cluster is up OK:

[root@s3 ~]# cman_tool status
Version: 6.2.0
Config Version: 8
Cluster Name: stor
Cluster Id: 61164
Cluster Member: Yes
Cluster Generation: 140
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: s3alt.c.XX.si
Node ID: 3
Multicast addresses: 239.192.238.219 239.192.0.2
Node addresses: 192.168.168.3 10.31.0.42

4. Start gfs2 on both nodes:

Mar 16 10:29:41 s3 kernel: GFS2 (built Mar 7 2012 00:54:51) installed
Mar 16 10:29:41 s3 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "stor:drbdstor"
Mar 16 10:29:41 s3 kernel: dlm: Using SCTP for communications
Mar 16 10:29:41 s3 kernel: SCTP: Hash tables configured (established 65536 bind 65536)
Mar 16 10:29:41 s3 kernel: dlm: connecting to 2 sctp association 1
Mar 16 10:29:41 s3 kernel: GFS2: fsid=stor:drbdstor.1: Joined cluster. Now mounting FS...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1, already locked for use
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Looking at journal...
Mar 16 10:29:42 s3 kernel: GFS2: fsid=stor:drbdstor.1: jid=1: Done

5. Stop gfs2 on s3 (nothing was written to the drbd mount on s2 or s3 while it was mounted)

6. Stop drbd on s3:

Mar 16 10:32:02 s3 kernel: block drbd0: role( Primary -> Secondary )
Mar 16 10:32:02 s3 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Mar 16 10:32:02 s3 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Mar 16 10:32:02 s3 kernel: block drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Mar 16 10:32:02 s3 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Mar 16 10:32:02 s3 kernel: block drbd0: asender terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating asender thread
Mar 16 10:32:02 s3 kernel: block drbd0: Connection closed
Mar 16 10:32:02 s3 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Mar 16 10:32:02 s3 kernel: block drbd0: receiver terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating receiver thread
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Outdated -> Failed )
Mar 16 10:32:02 s3 kernel: block drbd0: disk( Failed -> Diskless )
Mar 16 10:32:02 s3 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Mar 16 10:32:02 s3 kernel: block drbd0: worker terminated
Mar 16 10:32:02 s3 kernel: block drbd0: Terminating worker thread
Mar 16 10:32:02 s3 kernel: drbd: module cleanup done

7. Start drbd on s3 -> KABUM: the s3 kernel panicked:

Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Oops: 0000 [#1] SMP
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:last sysfs file: /sys/devices/virtual/block/drbd0/removable
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Stack:
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Call Trace:
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:CR2: 0000000000000038
Message from syslogd@s3 at Mar 16 10:32:45 ...
 kernel:Kernel panic - not syncing: Fatal exception

cman on s2 then fences off the hung s3.

Should I provide more info?

Here is one of the more detailed errors I managed to get while testing:

Mar 16 09:39:50 s2 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
Mar 16 09:39:50 s2 kernel: IP: [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: PGD 238460067 PUD 229f59067 PMD 0
Mar 16 09:39:50 s2 kernel: Oops: 0000 [#1] SMP
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:Oops: 0000 [#1] SMP
Mar 16 09:39:50 s2 kernel: last sysfs file: /sys/module/drbd/parameters/cn_idx
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:last sysfs file: /sys/module/drbd/parameters/cn_idx
Mar 16 09:39:50 s2 kernel: CPU 0
Mar 16 09:39:50 s2 kernel: Modules linked in: gfs2 drbd(U) sctp libcrc32c dlm configfs sunrpc 8021q garp stp llc bonding ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ext2 raid0 serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci igb dca e1000e dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: drbd]
Mar 16 09:39:50 s2 kernel:
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G W ---------------- 2.6.32-220.7.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
Mar 16 09:39:50 s2 kernel: RIP: 0010:[<ffffffff814185a0>] [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP: 0018:ffff880229bf7e38 EFLAGS: 00010282
Mar 16 09:39:50 s2 kernel: RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: RDX: 00007fff226c2180 RSI: 0000000000005401 RDI: ffff880233a91980
Mar 16 09:39:50 s2 kernel: RBP: ffff880229bf7e58 R08: ffffffff8165fa40 R09: 00007fbd4cb1c940
Mar 16 09:39:50 s2 kernel: R10: 00007fff226c1f90 R11: 0000000000000206 R12: 00007fff226c2180
Mar 16 09:39:50 s2 kernel: R13: 00007fff226c2180 R14: ffff88023ab51200 R15: 0000000000000000
Mar 16 09:39:50 s2 kernel: FS: 00007fbd4cd25700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Mar 16 09:39:50 s2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038 CR3: 0000000238d69000 CR4: 00000000000406f0
Mar 16 09:39:50 s2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 09:39:50 s2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 09:39:50 s2 kernel: Process drbdadm (pid: 4875, threadinfo ffff880229bf6000, task ffff8802299c3580)
Mar 16 09:39:50 s2 kernel: Stack:
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:Stack:
Mar 16 09:39:50 s2 kernel: ffff880233a91980 ffff88023ab51248 00007fff226c2180 0000000000000000
Mar 16 09:39:50 s2 kernel: <0> ffff880229bf7e98 ffffffff811892f2 ffff880229bf7e98 ffffffff814f253e
Mar 16 09:39:50 s2 kernel: <0> 0000000000000001 0000000000000003 0000000000627760 ffff880233a91980
Mar 16 09:39:50 s2 kernel: Call Trace:
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:Call Trace:
Mar 16 09:39:50 s2 kernel: [<ffffffff811892f2>] vfs_ioctl+0x22/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff814f253e>] ? do_page_fault+0x3e/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff81189494>] do_vfs_ioctl+0x84/0x580
Mar 16 09:39:50 s2 kernel: [<ffffffff81189a11>] sys_ioctl+0x81/0xa0
Mar 16 09:39:50 s2 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Mar 16 09:39:50 s2 kernel: Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:Code: 83 ec 20 48 89 1c 24 4c 89 64 24 08 4c 89 6c 24 10 4c 89 74 24 18 0f 1f 44 00 00 4c 8b b7 a0 00 00 00 89 f3 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff 83 f8 0f 76 51 8d 83 00 75 ff ff
Mar 16 09:39:50 s2 kernel: RIP [<ffffffff814185a0>] sock_ioctl+0x30/0x280
Mar 16 09:39:50 s2 kernel: RSP <ffff880229bf7e38>
Mar 16 09:39:50 s2 kernel: CR2: 0000000000000038
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:CR2: 0000000000000038
Mar 16 09:39:50 s2 kernel: ---[ end trace bf74669367969d52 ]---
Mar 16 09:39:50 s2 kernel: Kernel panic - not syncing: Fatal exception
Message from syslogd@s2 at Mar 16 09:39:50 ...
 kernel:Kernel panic - not syncing: Fatal exception
Mar 16 09:39:50 s2 kernel: Pid: 4875, comm: drbdadm Tainted: G D W ---------------- 2.6.32-220.7.1.el6.x86_64 #1
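
The register dump in the s2 oops can be cross-checked against the faulting instruction. The following is only a minimal sketch, not part of the original report: it takes an excerpt of the oops text, decodes the four bytes starting at the `<4c>` marker, and checks that base register plus displacement equals CR2. The helper names (`faulting_bytes`, `decode_mov_disp8`) are made up for illustration, and the decoder assumes a REX.W-prefixed `mov` load in the plain `[base + disp8]` form, which is the only encoding it handles.

```python
import re

# Excerpt of the s2 oops: two register lines and the tail of the Code: line.
OOPS = """\
RAX: 0000000000000000 RBX: 0000000000005401 RCX: 00007fff226c2180
CR2: 0000000000000038 CR3: 0000000238d69000 CR4: 00000000000406f0
Code: 49 89 d4 49 8b 46 38 <4c> 8b 68 38 8d 83 10 76 ff ff
"""

# x86-64 general purpose registers in encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(oops, count=4):
    """Return `count` bytes starting at the <xx> marker (the byte at RIP)."""
    tokens = re.search(r"Code: (.*)", oops).group(1).split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:start + count])


def decode_mov_disp8(code):
    """Decode REX.W + 8B /r with mod=01: mov disp8(base), dest."""
    rex, opcode, modrm, disp = code[:4]
    if (rex & 0xF8) != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W-prefixed mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:          # disp8 is a signed byte
        disp -= 0x100
    dest = REG64[reg + 8 * ((rex >> 2) & 1)]   # REX.R extends the reg field
    base = REG64[rm + 8 * (rex & 1)]           # REX.B extends the rm field
    return dest, base, disp


regs = {name: int(value, 16)
        for name, value in re.findall(r"\b(RAX|CR2): ([0-9a-f]{16})", OOPS)}
dest, base, disp = decode_mov_disp8(faulting_bytes(OOPS))

# Only RAX is captured above, which is enough for this particular oops.
fault_addr = regs[base.upper()] + disp
print(f"mov {disp:#x}(%{base}),%{dest}  ->  address {fault_addr:#x}, "
      f"CR2 {regs['CR2']:#x}")
```

Under these assumptions the instruction reads from RAX + 0x38 with RAX equal to zero, which matches the reported CR2 of 0x38 and the "NULL pointer dereference" message. It does not say which kernel structure the NULL pointer belonged to; that would need the matching kernel source or debuginfo.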