<div dir="ltr"><div>Hi Philipp and Lars,</div><div> Any suggestions?</div><div><br></div>Thanx<br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>发件人: <strong class="gmail_sendername" dir="auto">Dongsheng Yang</strong> <span dir="auto"><<a href="mailto:dongsheng081251@gmail.com">dongsheng081251@gmail.com</a>></span><br>Date: 2020年2月5日周三 下午7:06<br>Subject: Bug Report : meet an unexcepted WFBitMapS status after restarting the primary<br>To: <<a href="mailto:joel.colledge@linbit.com">joel.colledge@linbit.com</a>><br>Cc: <<a href="mailto:drbd-dev@lists.linbit.com">drbd-dev@lists.linbit.com</a>>, <<a href="mailto:duan.zhang@easystack.cn">duan.zhang@easystack.cn</a>><br></div><br><br><div dir="ltr"><div>Hi guys,</div><div><br></div>Version: drbd-9.0.21-1<br><br>Layout: drbd.res within 3 nodes -- node-1(Secondary), node-2(Primary), node-3(Secondary)<br><br>Description: <br>a.reboot node-2 when cluster is working.<br>b.re-up the drbd.res on node-2 after it restarted.<br><a href="http://c.an" target="_blank">c.an</a> expected resync from node-3 to node-2 happens. When the resync is done, however,<br> node-1 raises an unexpected WFBitMapS repl status and can't recover to normal anymore.<br><br>Status output:<br><br>node-1: drbdadm status<br><br>drbd6 role:Secondary<br><br>disk:UpToDate<br><br>hotspare connection:Connecting<br><br>node-2 role:Primary<br><br>replication:WFBitMapS peer-disk:Consistent<br><br>node-3 role:Secondary<br><br>peer-disk:UpToDate<br><br><br>node-2: drbdadm status<br><br>drbd6 role:Primary<br><br>disk:UpToDate<br><br>hotspare connection:Connecting<br><br>node-1 role:Secondary<br><br>peer-disk:UpToDate<br><br>node-3 role:Secondary<br><br>peer-disk:UpToDate<br><br>I assume that there is a process sequence below according to my source code version:<br>node-1 node-2 node-3<br>                                         restarted with CRASHED_PRIMARY <br>                                         start sync with node-3 as target start sync with node-2 as source<br>                                         … …<br> end sync with node-3 end sync with node-2<br>                                         w_after_state_change<br>                                  loop 1 within for loop against node-1:(a)<br>receive_uuids10 send uuid with UUID_FLAG_GOT_STABLE&CRASHED_PRIMARY to node-1<br>receive uuid of node-2 with CRASHED_PRIMARY loop 2 within for loop against node-3:<br>                                         clear CRASHED_PRIMARY(b)<br>send uuid to node-2 with UUID_FLAG_RESYNC receive uuids10<br>sync_handshake to SYNC_SOURCE_IF_BOTH_FAILED sync_handshake to NO_SYNC<br>change repl state to WFBitMapS<br><br>The key problem is about the order of step(a) and step(b), that is, node-2 sends the<br>unexpected CRASHED_PRIMARY to node-1 though it's actually no longer a crashed primary<br>after syncing with node-3.<br>So may I have the below questions:<br>a.Is this really a BUG or just an expected result?<br>b.If there's already a patch fix within the newest verion?<br>c.If there's some workaround method against this kind of unexcepted status, since I really<br> meet so many other problems like that :( <br></div>