<div dir="ltr"><div>Hi guys,</div><div><br></div>Version: drbd-9.0.21-1<br><br>Layout: one drbd.res spanning 3 nodes -- node-1 (Secondary), node-2 (Primary), node-3 (Secondary)<br><br>Description:<br>a. Reboot node-2 while the cluster is working.<br>b. Re-up drbd.res on node-2 after it has restarted.<br>c. The expected resync from node-3 to node-2 takes place. Once the resync is done, however, node-1 enters an unexpected WFBitMapS replication status and can't recover to normal anymore.<br><br>Status output:<br><br>node-1: drbdadm status<br><pre>drbd6 role:Secondary
  disk:UpToDate
  hotspare connection:Connecting
  node-2 role:Primary
    replication:WFBitMapS peer-disk:Consistent
  node-3 role:Secondary
    peer-disk:UpToDate
</pre>node-2: drbdadm status<br><pre>drbd6 role:Primary
  disk:UpToDate
  hotspare connection:Connecting
  node-1 role:Secondary
    peer-disk:UpToDate
  node-3 role:Secondary
    peer-disk:UpToDate
</pre>Based on my source code version, I assume the sequence of events is the following:<br><br>1. node-2 restarts with CRASHED_PRIMARY set.<br>2. node-2 starts a resync as target, with node-3 as source.<br>3. The resync between node-2 and node-3 finishes on both sides.<br>4. node-2 runs w_after_state_change and iterates over its peers in a for loop:<br>&nbsp;&nbsp;&nbsp;- iteration 1, against node-1: node-2 sends its UUIDs with UUID_FLAG_GOT_STABLE and CRASHED_PRIMARY still set -- step (a)<br>&nbsp;&nbsp;&nbsp;- iteration 2, against node-3: node-2 clears CRASHED_PRIMARY -- step (b)<br>5. node-1, in receive_uuids10, receives node-2's UUIDs with CRASHED_PRIMARY set, sends its own UUIDs back with UUID_FLAG_RESYNC, resolves the sync handshake to SYNC_SOURCE_IF_BOTH_FAILED, and changes its replication state to WFBitMapS.<br>6. node-2 receives node-1's UUIDs in receive_uuids10 but resolves its handshake to NO_SYNC, so node-1 is left waiting in WFBitMapS.<br><br>The key problem is the order of steps (a) and (b): node-2 sends the stale CRASHED_PRIMARY flag to node-1 even though it is actually no longer a crashed primary after syncing with node-3.<br><br>So may I ask the questions below:<br>a. Is this really a bug, or is it an expected result?<br>b. Is there already a fix for this in the newest version?<br>c. Is there a workaround for this kind of unexpected status? I have run into quite a few other problems like it. :(<br></div>
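For completeness, the 3-node resource described above would look roughly like the sketch below (DRBD 9 resource-file syntax). The backing device, IP addresses, ports, and node-ids are made-up placeholders for illustration, and the hotspare connection visible in the status output is not modeled; the actual drbd.res may differ.

```
# Hedged sketch of a 3-node drbd6 resource (DRBD 9 syntax).
# Backing device, addresses, ports, and node-ids are assumptions.
resource drbd6 {
    device      /dev/drbd6;
    disk        /dev/sdb1;       # assumed backing device
    meta-disk   internal;

    on node-1 {
        node-id 0;
        address 192.168.1.11:7789;
    }
    on node-2 {
        node-id 1;
        address 192.168.1.12:7789;
    }
    on node-3 {
        node-id 2;
        address 192.168.1.13:7789;
    }

    # Full mesh so every node connects to every other node.
    connection-mesh {
        hosts node-1 node-2 node-3;
    }
}
```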