Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

I am experimenting with DRBD dual-primary with OCFS2, and DRBD clients as well, in the hope that every node can access the storage in a unified way. But I got a kernel call trace and a huge number of ASSERTION failures (*before* OCFS2 is mounted):

----<paste begins>----
[11160.192091] INFO: task drbdsetup:19442 blocked for more than 120 seconds.
[11160.192096] Tainted: G OE 4.1.12-37.2.2.el7uek.x86_64 #2
[11160.192097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11160.192099] drbdsetup D ffff88013fd17840 0 19442 1 0x00000084
[11160.192108] ffff8800addef8c8 0000000000000082 ffff88013a3d3800 ffff8800369eb800
[11160.192111] ffff8800addef938 ffff8800addf0000 ffff8800adb192c0 7fffffffffffffff
[11160.192113] ffff8800369eb800 0000000000000297 ffff8800addef8e8 ffffffff81712947
[11160.192116] Call Trace:
[11160.192128] [<ffffffff81712947>] schedule+0x37/0x90
[11160.192131] [<ffffffff8171596c>] schedule_timeout+0x20c/0x280
[11160.192134] [<ffffffff817158b6>] ? schedule_timeout+0x156/0x280
[11160.192148] [<ffffffffa05c2695>] ? drbd_destroy_path+0x15/0x20 [drbd]
[11160.192152] [<ffffffff817134b4>] wait_for_completion+0x134/0x190
[11160.192157] [<ffffffff810b1d90>] ? wake_up_state+0x20/0x20
[11160.192165] [<ffffffffa05c4d51>] _drbd_thread_stop+0xc1/0x110 [drbd]
[11160.192173] [<ffffffffa05dd84c>] del_connection+0x3c/0x140 [drbd]
[11160.192179] [<ffffffffa05e0bd3>] drbd_adm_down+0xc3/0x2c0 [drbd]
[11160.192184] [<ffffffff8162886d>] genl_family_rcv_msg+0x1cd/0x400
[11160.192186] [<ffffffff81628aa0>] ? genl_family_rcv_msg+0x400/0x400
[11160.192188] [<ffffffff81628b31>] genl_rcv_msg+0x91/0xd0
[11160.192190] [<ffffffff81627901>] netlink_rcv_skb+0xc1/0xe0
[11160.192192] [<ffffffff81627fec>] genl_rcv+0x2c/0x40
[11160.192193] [<ffffffff81626f86>] netlink_unicast+0x106/0x210
[11160.192195] [<ffffffff816274c4>] netlink_sendmsg+0x434/0x690
[11160.192199] [<ffffffff815d66ed>] sock_sendmsg+0x3d/0x50
[11160.192201] [<ffffffff815d6785>] sock_write_iter+0x85/0xf0
[11160.192205] [<ffffffff81209f6e>] __vfs_write+0xce/0x120
[11160.192207] [<ffffffff8120a619>] vfs_write+0xa9/0x1b0
[11160.192210] [<ffffffff8102587c>] ? do_audit_syscall_entry+0x6c/0x70
[11160.192213] [<ffffffff8120b505>] SyS_write+0x55/0xd0
[11160.192215] [<ffffffff81716aee>] system_call_fastpath+0x12/0x71
[11163.573075] __bm_op: 84153300 callbacks suppressed
[11163.573075] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] __bm_op: 84588466 callbacks suppressed
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
----<paste ends>----

'grep -c' shows tens of thousands of the ASSERTION errors shown above (the exact command is included below). The call trace happened on a DRBD client node, and the node got rebooted automatically.

Any insights? Thanks in advance.
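For reference, the count was taken roughly like this (the kernel log path here is an assumption and may differ per distribution; grepping the output of dmesg works the same way):

# grep -c 'ASSERTION bitmap->bm_pages FAILED' /var/log/messages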
# cat /proc/drbd
version: 9.0.7-1 (api:2/proto:86-112)

My DRBD resource configuration:

resource r0 {
  handlers {
    split-brain "/usr/lib64/drbd/notify-split-brain.sh root";
  }
  startup {
    become-primary-on both;
  }
  connection-mesh {
    hosts 10-0-149-20 10-0-147-191 10-0-218-14 10-0-183-69;
  }
  on 10-0-149-20 {
    node-id 0;
    address ipv4 10.0.149.20:7789;
    volume 0 {
      device minor 100;
      disk /dev/disk/by-id/wwn-0x000f5ab58042677f;
      meta-disk internal;
    }
  }
  on 10-0-147-191 {
    node-id 1;
    address ipv4 10.0.147.191:7789;
    volume 0 {
      device minor 100;
      disk /dev/disk/by-id/wwn-0x000f5ab58042677f;
      meta-disk internal;
    }
  }
  # DRBD client
  on 10-0-218-14 {
    node-id 2;
    address ipv4 10.0.218.14:7789;
    volume 0 {
      device minor 100;
      disk none;
      meta-disk internal;
    }
  }
  # DRBD client
  on 10-0-183-69 {
    node-id 3;
    address ipv4 10.0.183.69:7789;
    volume 0 {
      device minor 100;
      disk none;
      meta-disk internal;
    }
  }
  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    fencing resource-and-stonith;
    protocol C;
    allow-two-primaries yes;
    sndbuf-size 0;
  }
}

--
Thanks,
Li Qun
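P.S. In case it helps, here is roughly how the parsed configuration and the per-node state can be inspected with the standard drbd-utils commands (resource name r0 as above; exact output varies between versions):

# drbdadm dump r0
# drbdadm status r0
# drbdsetup events2 --now r0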