[Drbd-dev] Resync finished with SyncSource still have bits set

Nick Wang nwang at suse.com
Tue Mar 21 05:50:21 CET 2017


Hello,

I can reproduce an unexpected call to drbd_resync_finished()
while the SyncSource still has bits set, on drbd 9.0.6.
If a connected Secondary node detaches to diskless
during a resync, the SyncSource may treat the resync
as completed when it receives the resulting state change.

I think the peer disk state also needs to be checked before completing
the resync, at least on the node that is not weak, to avoid completing
a resync toward a diskless node. With the modification at the end of this
mail, I can no longer reproduce the issue.
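
To illustrate the intent of the extra check, here is a minimal standalone
sketch of the condition before and after the change. It is not kernel code:
the enums are reduced stand-ins, their ordering is assumed to follow the
DRBD 9 header (with Diskless and Failed below Negotiating), and the two
helper functions are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the kernel's disk state enum.
 * ASSUMPTION: the ordering mirrors the DRBD 9 header, so that
 * Diskless, Attaching, Detaching and Failed all sort below Negotiating. */
enum model_disk_state {
    D_DISKLESS,
    D_ATTACHING,
    D_DETACHING,
    D_FAILED,
    D_NEGOTIATING,
    D_INCONSISTENT,
    D_OUTDATED,
    D_UNKNOWN,
    D_CONSISTENT,
    D_UP_TO_DATE
};

/* Reduced, hypothetical subset of the replication states. */
enum model_repl_state { L_OFF, L_ESTABLISHED, L_SYNC_SOURCE };

/* Condition as it stands before the patch: only the peer's
 * replication state is considered. */
static bool may_finish_resync_old(enum model_repl_state peer_conn,
                                  enum model_disk_state peer_disk)
{
    (void)peer_disk; /* not consulted before the patch */
    return peer_conn == L_ESTABLISHED;
}

/* Condition with the proposed patch: the peer must also have a disk
 * state above Negotiating, which excludes Failed and Diskless peers. */
static bool may_finish_resync_new(enum model_repl_state peer_conn,
                                  enum model_disk_state peer_disk)
{
    assert(peer_disk <= D_UP_TO_DATE); /* sanity check on the model input */
    return peer_conn == L_ESTABLISHED && peer_disk > D_NEGOTIATING;
}
```

For the case in the log (peer reports Established while its disk goes to
Failed), the old condition would enter the "finish resync" branch and the
new one would not.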

Steps to reproduce:
1. Write errors occur on the Secondary node, which then detaches to diskless.
2. (Manually) down/up the resource on the Secondary node; a resync starts.
3. Write errors occur again on the Secondary; detach/down/up again.

Error log:
Mar 15 15:22:48 node1 kernel: [ 6909.187284] drbd drbd0/0 drbd0
node2: Began resync as SyncSource (will sync 9944 KB [2486 bits set]).
Mar 15 15:22:48 node1 failoverd: DRBD: Connecting, Primary/Unknown,
Consistent/Unknown -> Connected, Primary/Secondary, Consistent/Inconsistent
Mar 15 15:22:51 node1 kernel: [ 6912.101262] drbd drbd0/0 drbd0
node2: SyncSource still sees bits set!! FIXME
Mar 15 15:22:51 node1 kernel: [ 6912.101541] drbd drbd0/0 drbd0
node2: Resync done (total 2 sec; paused 0 sec; 4972 K/sec)
Mar 15 15:22:51 node1 kernel: [ 6912.101544] drbd drbd0/0 drbd0
node2: ASSERTION (n_oos - peer_device->rs_failed) == 0 FAILED in
drbd_resync_finished  <===============
Mar 15 15:22:51 node1 kernel: [ 6912.101555] drbd drbd0/0 drbd0
node2: pdsk( Inconsistent -> Failed ) repl( SyncSource -> Established )
Mar 15 15:22:51 node1 kernel: [ 6912.104755] drbd drbd0/0 drbd0
node2: pdsk( Failed -> Diskless )
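
A quick arithmetic check of the figures in the log above, as a small sketch.
It assumes each bitmap bit covers 4 KB, which is what the first log line
implies (9944 KB for 2486 bits). The helper names are hypothetical and are
not DRBD functions.

```c
#include <assert.h>

/* ASSUMPTION: one bitmap bit covers 4 KB, as implied by
 * "will sync 9944 KB [2486 bits set]" in the log. */
enum { KB_PER_BIT = 4 };

/* Convert a count of set bitmap bits to kilobytes to be synced. */
static unsigned long bits_to_kb(unsigned long bits)
{
    return bits * KB_PER_BIT;
}

/* Average rate in KB per second, using integer division. */
static unsigned long rate_kb_per_sec(unsigned long kb, unsigned long seconds)
{
    assert(seconds > 0); /* avoid division by zero */
    return kb / seconds;
}
```

With these helpers, bits_to_kb(2486) gives 9944 and
rate_kb_per_sec(9944, 2) gives 4972, so the logged rate of 4972 K/sec
equals the full initial amount divided by the reported 2 seconds, even
though the SyncSource still saw bits set.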

---
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 00e3e74..e33c94f 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -6240,7 +6240,8 @@ static int receive_state(struct drbd_connection *connection, struct packet_info
                /* if peer_state changes to connected at the same time,
                 * it explicitly notifies us that it finished resync.
                 * Maybe we should finish it up, too? */
-               else if (peer_state.conn == L_ESTABLISHED) {
+               else if (peer_state.conn == L_ESTABLISHED &&
+                        peer_disk_state > D_NEGOTIATING) {
                        bool finish_now = false;
 
                        if (old_peer_state.conn == L_WF_BITMAP_S) {

-- 


Best regards,
Nick


