Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,
I am experimenting with DRBD dual-primary with OCFS2, plus DRBD clients
(diskless nodes), in the hope that every node can access the storage in a
unified way. However, I got a kernel call trace and a huge number of
ASSERTION failures (*before* OCFS2 was mounted):
----<paste begins>----
[11160.192091] INFO: task drbdsetup:19442 blocked for more than 120 seconds.
[11160.192096] Tainted: G OE 4.1.12-37.2.2.el7uek.x86_64 #2
[11160.192097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11160.192099] drbdsetup D ffff88013fd17840 0 19442 1 0x00000084
[11160.192108] ffff8800addef8c8 0000000000000082 ffff88013a3d3800 ffff8800369eb800
[11160.192111] ffff8800addef938 ffff8800addf0000 ffff8800adb192c0 7fffffffffffffff
[11160.192113] ffff8800369eb800 0000000000000297 ffff8800addef8e8 ffffffff81712947
[11160.192116] Call Trace:
[11160.192128] [<ffffffff81712947>] schedule+0x37/0x90
[11160.192131] [<ffffffff8171596c>] schedule_timeout+0x20c/0x280
[11160.192134] [<ffffffff817158b6>] ? schedule_timeout+0x156/0x280
[11160.192148] [<ffffffffa05c2695>] ? drbd_destroy_path+0x15/0x20 [drbd]
[11160.192152] [<ffffffff817134b4>] wait_for_completion+0x134/0x190
[11160.192157] [<ffffffff810b1d90>] ? wake_up_state+0x20/0x20
[11160.192165] [<ffffffffa05c4d51>] _drbd_thread_stop+0xc1/0x110 [drbd]
[11160.192173] [<ffffffffa05dd84c>] del_connection+0x3c/0x140 [drbd]
[11160.192179] [<ffffffffa05e0bd3>] drbd_adm_down+0xc3/0x2c0 [drbd]
[11160.192184] [<ffffffff8162886d>] genl_family_rcv_msg+0x1cd/0x400
[11160.192186] [<ffffffff81628aa0>] ? genl_family_rcv_msg+0x400/0x400
[11160.192188] [<ffffffff81628b31>] genl_rcv_msg+0x91/0xd0
[11160.192190] [<ffffffff81627901>] netlink_rcv_skb+0xc1/0xe0
[11160.192192] [<ffffffff81627fec>] genl_rcv+0x2c/0x40
[11160.192193] [<ffffffff81626f86>] netlink_unicast+0x106/0x210
[11160.192195] [<ffffffff816274c4>] netlink_sendmsg+0x434/0x690
[11160.192199] [<ffffffff815d66ed>] sock_sendmsg+0x3d/0x50
[11160.192201] [<ffffffff815d6785>] sock_write_iter+0x85/0xf0
[11160.192205] [<ffffffff81209f6e>] __vfs_write+0xce/0x120
[11160.192207] [<ffffffff8120a619>] vfs_write+0xa9/0x1b0
[11160.192210] [<ffffffff8102587c>] ? do_audit_syscall_entry+0x6c/0x70
[11160.192213] [<ffffffff8120b505>] SyS_write+0x55/0xd0
[11160.192215] [<ffffffff81716aee>] system_call_fastpath+0x12/0x71
[11163.573075] __bm_op: 84153300 callbacks suppressed
[11163.573075] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10968.421046] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] __bm_op: 84588466 callbacks suppressed
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
[10973.403026] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in __bm_op
----<paste ends>----
'grep -c' shows tens of thousands of the ASSERTION failures quoted above.
The call trace (and the automatic reboot of the node) happened on a DRBD
client node.
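For reference, the count came from a command along these lines (the exact
pattern is from memory, so it may differ slightly):

    dmesg | grep -c 'ASSERTION bitmap->bm_pages FAILED in __bm_op'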
Any insights?
Thanks in advance.
# cat /proc/drbd
version: 9.0.7-1 (api:2/proto:86-112)
My DRBD resource configuration:
resource r0 {
    handlers {
        split-brain "/usr/lib64/drbd/notify-split-brain.sh root";
    }

    startup {
        become-primary-on both;
    }

    connection-mesh {
        hosts 10-0-149-20 10-0-147-191 10-0-218-14 10-0-183-69;
    }

    on 10-0-149-20 {
        node-id 0;
        address ipv4 10.0.149.20:7789;
        volume 0 {
            device minor 100;
            disk /dev/disk/by-id/wwn-0x000f5ab58042677f;
            meta-disk internal;
        }
    }

    on 10-0-147-191 {
        node-id 1;
        address ipv4 10.0.147.191:7789;
        volume 0 {
            device minor 100;
            disk /dev/disk/by-id/wwn-0x000f5ab58042677f;
            meta-disk internal;
        }
    }

    # DRBD client
    on 10-0-218-14 {
        node-id 2;
        address ipv4 10.0.218.14:7789;
        volume 0 {
            device minor 100;
            disk none;
            meta-disk internal;
        }
    }

    # DRBD client
    on 10-0-183-69 {
        node-id 3;
        address ipv4 10.0.183.69:7789;
        volume 0 {
            device minor 100;
            disk none;
            meta-disk internal;
        }
    }

    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        fencing resource-and-stonith;
        protocol C;
        allow-two-primaries yes;
        sndbuf-size 0;
    }
}
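In case it is relevant, the resource was brought up in roughly the usual way;
the commands below are reconstructed from memory, so take them as a sketch
rather than a literal transcript:

    # on the two disk-backed nodes (10-0-149-20, 10-0-147-191)
    drbdadm create-md r0          # initialize internal metadata
    drbdadm up r0                 # attach the disk and start the connections
    drbdadm primary --force r0    # first promotion only, on one node

    # on the DRBD client nodes (10-0-218-14, 10-0-183-69)
    drbdadm up r0                 # no local disk; the node connects as a client

    # check the state on all nodes
    drbdadm status r0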
--
Thanks,
Li Qun