[Drbd-dev] [PATCH v2] drbd: fix a race condition in update_sync_bits() and receive_bitmap()

Joel Colledge joel.colledge at linbit.com
Mon Sep 13 17:21:46 CEST 2021


Thanks for the contribution. The problem description and patch look valid to me.

Please add a brief comment to the code explaining why it is important
to evaluate is_sync_target_state at that point.

Here is a cleaned-up version of your commit message. Is it still
correct? In particular, I am not sure what the consequences of the bug
are. "resync finishes in an unexpected way" is vague. I wrote "even
though bitmap bits are still set", but maybe there are other effects
that I have not thought of.


drbd: fix a race condition in update_sync_bits() and receive_bitmap()

There was a race condition involving update_sync_bits() and
receive_bitmap(). Consider this scenario:

Primary: node-3; Secondaries: node-1, node-2

(1) Network failure occurs on node-1
(2) node-1 network recovers
(3) node-1 connects to node-2, and starts resync (node-1 is SyncTarget,
node-2 is SyncSource)
(4) Before resync in (3) finishes, node-1 connects to node-3 and starts
resync (node-1 is PausedSyncT, node-3 is PausedSyncS)

The following sequence can occur on node-1 while it is syncing from
node-2:

* ack_receiver thread processes P_PEERS_IN_SYNC
* ack_receiver: call update_sync_bits()
* ack_receiver: clear the last bitmap bits for node-3
* ack_receiver: set rs_is_done to 1
* receiver thread processes P_*BITMAP
* receiver: call receive_bitmap()
* receiver: set bitmap bits for node-3
* receiver: set the repl_state towards node-3 to PausedSyncT
* ack_receiver: set the RS_DONE flag

This causes the resync from node-3 to node-1 to finish even though
bitmap bits are still set. Fix this by evaluating is_sync_target_state
before reading the bitmap total weight in update_sync_bits().

Signed-off-by: Rui Xu <rui.xu at easystack.cn>
Signed-off-by: Joel Colledge <joel.colledge at linbit.com>

