[Drbd-dev] Resync finished while SyncSource still has bits set
Nick Wang
nwang at suse.com
Tue Mar 21 05:50:21 CET 2017
Hello,
I can reproduce an unexpected call to drbd_resync_finished()
while the SyncSource still has bits set, on drbd 9.0.6.
If a connected Secondary node detaches and becomes Diskless
during a resync, the SyncSource may treat the resync
as completed when it receives the resulting state change.
I think the code also needs to check the peer's disk state before completing
the resync, at least for a node that is not weak, to avoid completing
a resync towards a diskless node. With the modification at the end of this
mail, I can no longer reproduce the issue.
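To illustrate the proposed condition in isolation, here is a minimal
standalone sketch. It is not DRBD code: the enums are simplified stand-ins,
their ordering is an assumption that should be checked against the drbd
headers, and the helper name peer_may_have_finished_resync() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for DRBD's disk states.
 * ASSUMPTION: the relative ordering mirrors the real enum in the drbd
 * headers (Diskless < Failed < Negotiating < Inconsistent < ... < UpToDate).
 * Verify against the headers before relying on it. */
enum disk_state {
	D_DISKLESS,
	D_ATTACHING,
	D_FAILED,
	D_NEGOTIATING,
	D_INCONSISTENT,
	D_OUTDATED,
	D_UNKNOWN,
	D_CONSISTENT,
	D_UP_TO_DATE,
};

/* Simplified stand-in for the replication states used in this sketch. */
enum repl_state {
	L_OFF,
	L_ESTABLISHED,
	L_SYNC_SOURCE,
	L_SYNC_TARGET,
};

/* Hypothetical helper: returns true only if the peer reports L_ESTABLISHED
 * and its disk state is above D_NEGOTIATING, i.e. the peer still has a disk
 * that could hold the resynced data. */
bool peer_may_have_finished_resync(enum repl_state peer_conn,
				   enum disk_state peer_disk)
{
	return peer_conn == L_ESTABLISHED && peer_disk > D_NEGOTIATING;
}

/* Small self-check covering the transition seen in the log
 * (pdsk Inconsistent -> Failed -> Diskless). */
void self_check(void)
{
	assert(!peer_may_have_finished_resync(L_ESTABLISHED, D_FAILED));
	assert(!peer_may_have_finished_resync(L_ESTABLISHED, D_DISKLESS));
	assert(peer_may_have_finished_resync(L_ESTABLISHED, D_UP_TO_DATE));
}
```

With this check, a peer that reports L_ESTABLISHED while its disk is Failed
or Diskless is not taken as having finished the resync, which is the case
hit in the steps below.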
Steps to reproduce:
1. Write errors occur on the Secondary node, which then detaches and becomes Diskless.
2. (Manually) down/up the resource on the Secondary node; a resync starts.
3. Write errors occur again on the Secondary; detach/down/up again.
Error log:
Mar 15 15:22:48 node1 kernel: [ 6909.187284] drbd drbd0/0 drbd0
node2: Began resync as SyncSource (will sync 9944 KB [2486 bits set]).
Mar 15 15:22:48 node1 failoverd: DRBD: Connecting, Primary/Unknown,
Consistent/Unknown -> Connected, Primary/Secondary, Consistent/Inconsistent
Mar 15 15:22:51 node1 kernel: [ 6912.101262] drbd drbd0/0 drbd0
node2: SyncSource still sees bits set!! FIXME
Mar 15 15:22:51 node1 kernel: [ 6912.101541] drbd drbd0/0 drbd0
node2: Resync done (total 2 sec; paused 0 sec; 4972 K/sec)
Mar 15 15:22:51 node1 kernel: [ 6912.101544] drbd drbd0/0 drbd0
node2: ASSERTION (n_oos - peer_device->rs_failed) == 0 FAILED in
drbd_resync_finished <===============
Mar 15 15:22:51 node1 kernel: [ 6912.101555] drbd drbd0/0 drbd0
node2: pdsk( Inconsistent -> Failed ) repl( SyncSource -> Established )
Mar 15 15:22:51 node1 kernel: [ 6912.104755] drbd drbd0/0 drbd0
node2: pdsk( Failed -> Diskless )
---
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 00e3e74..e33c94f 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -6240,7 +6240,8 @@ static int receive_state(struct drbd_connection *connection, struct packet_info
/* if peer_state changes to connected at the same time,
* it explicitly notifies us that it finished resync.
* Maybe we should finish it up, too? */
- else if (peer_state.conn == L_ESTABLISHED) {
+ else if (peer_state.conn == L_ESTABLISHED &&
+ peer_disk_state > D_NEGOTIATING) {
bool finish_now = false;
if (old_peer_state.conn == L_WF_BITMAP_S) {
--
Best regards,
Nick