[Drbd-dev] [PATCH v2] drbd: fix a race condition in update_sync_bits() and receive_bitmap()
Rui Xu
rui.xu at easystack.cn
Mon Sep 13 04:27:08 CEST 2021
There is a race condition between update_sync_bits() and receive_bitmap().
Consider this scenario:
Primary: node-3; Secondary: node-1, node-2
(1) A network failure happens on node-1.
(2) The network on node-1 recovers.
(3) node-1 connects to node-2 and starts a resync (node-1 is SyncTarget,
node-2 is SyncSource).
(4) Before the resync in (3) finishes, node-1 connects to node-3 and starts
a resync (node-1 is PauseSyncTarget, node-3 is PauseSyncSource).
While node-1 (SyncTarget) is resyncing with node-2 (SyncSource), node-1 may
set the bitmap for node-3 in receive_resync_read()->drbd_set_all_out_of_sync(),
and clear the bitmap for node-3 when it receives P_PEERS_IN_SYNC from node-2.
The following interleaving is then possible:
thread:ack_receiver (node-1)        thread:receiver (node-1)
update_sync_bits()                  receive_bitmap()
  set the rs_is_done to 1
                                      set the bitmap for node-3
                                      set the repl_state to PauseSyncTarget
  set RS_DONE flag
This causes the resync between node-1 and node-3 to finish unexpectedly, so
we need to evaluate is_sync_target_state() before getting the total bitmap
weight in update_sync_bits().
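
For illustration only, the interleaving above can be replayed in a single
thread with a simplified userspace model. Everything in this sketch is
hypothetical (struct peer_model, the model_* helpers, replay_race(), the
NOT_SYNCING state and the value 128); it is not DRBD code, it omits the
agreed_pro_version check, and it assumes the peer is not yet in a sync
target state when the bitmap weight is read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for DRBD state; not the kernel structures. */
enum model_repl_state { NOT_SYNCING, SYNC_TARGET, PAUSE_SYNC_TARGET };

struct peer_model {
	enum model_repl_state repl_state; /* state towards node-3 */
	unsigned long bm_weight;          /* out-of-sync bits for node-3 */
	unsigned long rs_failed;
	bool rs_done;                     /* models the RS_DONE flag */
};

static bool model_is_sync_target(const struct peer_model *p)
{
	return p->repl_state == SYNC_TARGET ||
	       p->repl_state == PAUSE_SYNC_TARGET;
}

/* Models the receiver thread: bits get set, then the state changes. */
static void model_receive_bitmap(struct peer_model *p, unsigned long bits)
{
	assert(bits > 0);
	p->bm_weight += bits;
	p->repl_state = PAUSE_SYNC_TARGET;
}

/*
 * Replays the interleaving from the commit message.
 * snapshot_first == false: the state is checked after receive_bitmap()
 *                          has run (order before this patch).
 * snapshot_first == true:  the state is sampled before the bitmap weight
 *                          is read (order after this patch).
 * Returns whether the modelled RS_DONE flag ends up set.
 */
static bool replay_race(bool snapshot_first)
{
	/* Bitmap for node-3 was just cleared, so its weight is 0. */
	struct peer_model p = { NOT_SYNCING, 0, 0, false };
	bool is_target = false;
	bool rs_is_done;

	if (snapshot_first)
		is_target = model_is_sync_target(&p);

	/* ack_receiver: decides "done" from the current weight. */
	rs_is_done = (p.bm_weight <= p.rs_failed);

	/* receiver: runs in between and changes weight and state. */
	model_receive_bitmap(&p, 128);

	if (!snapshot_first)
		is_target = model_is_sync_target(&p);

	/* ack_receiver: acts on the (possibly stale) decision. */
	if (rs_is_done && is_target)
		p.rs_done = true;

	return p.rs_done;
}
```

In this model replay_race(false) sets the flag although 128 bits are still
out of sync, while replay_race(true) leaves it clear, because the state and
the weight are sampled on the same side of receive_bitmap().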
---
changelog:
-v1: fix typo in commit message
drbd/drbd_actlog.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
index 841e5149..3d2fd399 100644
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -1044,11 +1044,11 @@ static bool lazy_bitmap_update_due(struct drbd_peer_device *peer_device)
}
static void maybe_schedule_on_disk_bitmap_update(struct drbd_peer_device *peer_device,
- bool rs_done)
+ bool rs_done, bool is_sync_target)
{
if (rs_done) {
if (peer_device->connection->agreed_pro_version <= 95 ||
- is_sync_target_state(peer_device, NOW))
+ is_sync_target)
set_bit(RS_DONE, &peer_device->flags);
/* If sync source: rather wait for explicit notification via
@@ -1105,11 +1105,12 @@ static int update_sync_bits(struct drbd_peer_device *peer_device,
}
if (count) {
if (mode == SET_IN_SYNC) {
+ bool is_sync_target = is_sync_target_state(peer_device, NOW);
unsigned long still_to_go = drbd_bm_total_weight(peer_device);
bool rs_is_done = (still_to_go <= peer_device->rs_failed);
drbd_advance_rs_marks(peer_device, still_to_go);
if (cleared || rs_is_done)
- maybe_schedule_on_disk_bitmap_update(peer_device, rs_is_done);
+ maybe_schedule_on_disk_bitmap_update(peer_device, rs_is_done, is_sync_target);
} else if (mode == RECORD_RS_FAILED) {
peer_device->rs_failed += count;
} else /* if (mode == SET_OUT_OF_SYNC) */ {
--
2.25.1