<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 12, 2017 at 5:45 PM, Lars Ellenberg <span dir="ltr"><<a href="mailto:lars.ellenberg@linbit.com" target="_blank">lars.ellenberg@linbit.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On Fri, Jun 09, 2017 at 11:39:05PM +0800, David Lee wrote:<br>
> Hi,<br>
><br>
> I am experimenting with DRBD dual-primary with OCFS2, and a DRBD client as<br>
> well,<br>
> with the hope that every node can access the storage in a unified way.<br>
> But I got a<br>
> kernel call trace and huge number of ASSERTION failure (*before* OCFS2 is<br>
> mounted):<br>
><br>
> ----<paste begins>----<br>
> [11160.192091] INFO: task drbdsetup:19442 blocked for more than 120 seconds.<br>
> [11160.192096] Tainted: G OE 4.1.12-37.2.2.el7uek.x86_64<br>
> #2<br>
> [11160.192097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables<br>
> this message.<br>
> [11160.192099] drbdsetup D ffff88013fd17840 0 19442 1<br>
> 0x00000084<br>
> [11160.192108] ffff8800addef8c8 0000000000000082 ffff88013a3d3800<br>
> ffff8800369eb800<br>
> [11160.192111] ffff8800addef938 ffff8800addf0000 ffff8800adb192c0<br>
> 7fffffffffffffff<br>
> [11160.192113] ffff8800369eb800 0000000000000297 ffff8800addef8e8<br>
> ffffffff81712947<br>
> [11160.192116] Call Trace:<br>
> [11160.192128] [<ffffffff81712947>] schedule+0x37/0x90<br>
> [11160.192131] [<ffffffff8171596c>] schedule_timeout+0x20c/0x280<br>
> [11160.192134] [<ffffffff817158b6>] ? schedule_timeout+0x156/0x280<br>
> [11160.192148] [<ffffffffa05c2695>] ? drbd_destroy_path+0x15/0x20 [drbd]<br>
> [11160.192152] [<ffffffff817134b4>] wait_for_completion+0x134/0x190<br>
> [11160.192157] [<ffffffff810b1d90>] ? wake_up_state+0x20/0x20<br>
> [11160.192165] [<ffffffffa05c4d51>] _drbd_thread_stop+0xc1/0x110 [drbd]<br>
> [11160.192173] [<ffffffffa05dd84c>] del_connection+0x3c/0x140 [drbd]<br>
> [11160.192179] [<ffffffffa05e0bd3>] drbd_adm_down+0xc3/0x2c0 [drbd]<br>
> [11160.192184] [<ffffffff8162886d>] genl_family_rcv_msg+0x1cd/0x400<br>
<br>
</div></div><span class="gmail-">> [11163.573075] __bm_op: 84153300 callbacks suppressed<br>
> [11163.573075] drbd r0/0 drbd100: ASSERTION bitmap->bm_pages FAILED in<br>
<br>
<br>
</span>The assertion is that the bitmap pages are supposed to be allocated<br>
when we do bitmap operations.<br>
<br>
Apparently in this case, they are not.<br>
<br>
So either the bitmap pages have never been allocated, and our error<br>
handling for that case sucks, or they are freed too early, while<br>
"something" still wants to flip or count some bits. But I would have<br>
expected someone to notice something like that before. Strange.<br>
<br>
Lars<br></blockquote></div><br></div><div class="gmail_extra"><br>Thanks for your comments, Lars.<br><br></div><div class="gmail_extra">I found some other interesting (and weird) behavior with OCFS2 and DRBD clients,<br></div><div class="gmail_extra">and have since moved in a different direction. The interesting findings are:<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">1. In a three-node OCFS2 cluster with dual-primary DRBD and one client node,<br> the whole cluster fences (every node reboots) when the DRBD client node goes down.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">2. If I add one more DRBD client node (with the drbd/o2cb/ocfs2 configs updated accordingly),<br></div><div class="gmail_extra"> then both client nodes constantly fail to join, with mount.ocfs2 failing.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">I've changed the experiment to drop OCFS2. But if any help is needed (for example,<br></div><div class="gmail_extra">to verify some configuration), please let me know.<br clear="all"></div><div class="gmail_extra"><br>-- <br><div class="gmail_signature">Thanks,<br>Li Qun<br></div>
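For readers following the thread: a minimal sketch of the kind of resource configuration being discussed, assuming DRBD 9 syntax, where a "DRBD client" is a node configured with `disk none` (diskless, I/O served over the network). All hostnames, addresses, and device paths below are hypothetical, not taken from the poster's setup:

```
resource r0 {
    net {
        protocol C;
        allow-two-primaries yes;   # needed for dual-primary use with OCFS2
    }
    on node-a {
        node-id 0;
        device  /dev/drbd100;
        disk    /dev/sdb1;
        address 192.168.1.1:7789;
    }
    on node-b {
        node-id 1;
        device  /dev/drbd100;
        disk    /dev/sdb1;
        address 192.168.1.2:7789;
    }
    on node-c {
        node-id 2;
        device  /dev/drbd100;
        disk    none;              # the "DRBD client": diskless node
        address 192.168.1.3:7789;
    }
    connection-mesh {
        hosts node-a node-b node-c;
    }
}
```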
</div></div>