[Drbd-dev] [PATCH v2] drbd: fix a race condition in update_sync_bits() and receive_bitmap()

Rui Xu rui.xu at easystack.cn
Mon Sep 13 04:27:08 CEST 2021


There is a race condition between update_sync_bits() and receive_bitmap().
Consider this scenario:

Primary: node-3, Secondary: node-1, node-2

(1) A network failure happens on node-1.
(2) The network of node-1 recovers.
(3) node-1 connects to node-2 and starts a resync (node-1 is SyncTarget,
node-2 is SyncSource).
(4) Before the resync in (3) has finished, node-1 connects to node-3 and
starts a resync (node-1 is PauseSyncTarget, node-3 is PauseSyncSource).

While node-1 (SyncTarget) is resyncing with node-2 (SyncSource), node-1 may
set the bitmap for node-3 in receive_resync_read()->drbd_set_all_out_of_sync(),
and clear the bitmap for node-3 when it gets P_PEERS_IN_SYNC from node-2.

The following interleaving is then possible:

thread:ack_receiver (node-1)           thread:receiver (node-1)
update_sync_bits()                     receive_bitmap()

set the rs_is_done to 1
				       set the bitmap for node-3
				       set the repl_state to PauseSyncTarget
set RS_DONE flag

This causes the resync between node-1 and node-3 to finish in an unexpected
way, so we need to evaluate is_sync_target_state() before reading the total
bitmap weight in update_sync_bits().
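
The ordering problem can be illustrated with a small single-threaded model.
This is only a sketch, not DRBD code: struct peer_model, the model_*()
functions and the three-value state enum are hypothetical stand-ins, and the
receiver thread is simulated by calling model_receive_bitmap() at the point
where the race window is. It also assumes that is_sync_target_state() is
true for both the running and the paused sync target state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real DRBD types. */
enum model_repl_state { ESTABLISHED, SYNC_TARGET, PAUSED_SYNC_TARGET };

struct peer_model {
	enum model_repl_state repl_state;
	unsigned long bm_weight;	/* out-of-sync bits for this peer */
	unsigned long rs_failed;
	bool rs_done;			/* models the RS_DONE flag */
};

/* Assumption: both sync target states count as "sync target". */
static bool model_is_sync_target(const struct peer_model *p)
{
	return p->repl_state == SYNC_TARGET ||
	       p->repl_state == PAUSED_SYNC_TARGET;
}

/* What the receiver thread does in the scenario. */
static void model_receive_bitmap(struct peer_model *p, unsigned long bits)
{
	p->bm_weight = bits;
	p->repl_state = PAUSED_SYNC_TARGET;
}

/* Old ordering: the state is checked after the weight was sampled. */
static void model_update_old(struct peer_model *p, unsigned long bits)
{
	bool rs_is_done = p->bm_weight <= p->rs_failed;

	model_receive_bitmap(p, bits);	/* receiver runs in the window */

	if (rs_is_done && model_is_sync_target(p))
		p->rs_done = true;
}

/* New ordering: the state is sampled before the weight. */
static void model_update_new(struct peer_model *p, unsigned long bits)
{
	bool is_sync_target = model_is_sync_target(p);
	bool rs_is_done = p->bm_weight <= p->rs_failed;

	model_receive_bitmap(p, bits);	/* receiver runs in the window */

	if (rs_is_done && is_sync_target)
		p->rs_done = true;
}

/* Usage example: run the same interleaving through both orderings. */
void model_selfcheck(void)
{
	struct peer_model old_order = { ESTABLISHED, 0, 0, false };
	struct peer_model new_order = { ESTABLISHED, 0, 0, false };

	model_update_old(&old_order, 128);
	model_update_new(&new_order, 128);

	/* Old ordering flags the resync as done with 128 bits still set. */
	assert(old_order.rs_done && old_order.bm_weight == 128);
	/* New ordering leaves the flag alone. */
	assert(!new_order.rs_done && new_order.bm_weight == 128);
}
```

In the model, sampling the state first means both values used for the
RS_DONE decision are taken before the simulated receive_bitmap() runs, which
is the effect the patch aims for in update_sync_bits().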
---
changelog:
	-v1: fix typo in commit message
 drbd/drbd_actlog.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
index 841e5149..3d2fd399 100644
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -1044,11 +1044,11 @@ static bool lazy_bitmap_update_due(struct drbd_peer_device *peer_device)
 }
 
 static void maybe_schedule_on_disk_bitmap_update(struct drbd_peer_device *peer_device,
-						 bool rs_done)
+						 bool rs_done, bool is_sync_target)
 {
 	if (rs_done) {
 		if (peer_device->connection->agreed_pro_version <= 95 ||
-		    is_sync_target_state(peer_device, NOW))
+		    is_sync_target)
 			set_bit(RS_DONE, &peer_device->flags);
 
 		/* If sync source: rather wait for explicit notification via
@@ -1105,11 +1105,12 @@ static int update_sync_bits(struct drbd_peer_device *peer_device,
 	}
 	if (count) {
 		if (mode == SET_IN_SYNC) {
+			bool is_sync_target = is_sync_target_state(peer_device, NOW);
 			unsigned long still_to_go = drbd_bm_total_weight(peer_device);
 			bool rs_is_done = (still_to_go <= peer_device->rs_failed);
 			drbd_advance_rs_marks(peer_device, still_to_go);
 			if (cleared || rs_is_done)
-				maybe_schedule_on_disk_bitmap_update(peer_device, rs_is_done);
+				maybe_schedule_on_disk_bitmap_update(peer_device, rs_is_done, is_sync_target);
 		} else if (mode == RECORD_RS_FAILED) {
 			peer_device->rs_failed += count;
 		} else /* if (mode == SET_OUT_OF_SYNC) */ {
-- 
2.25.1