[Drbd-dev] [PATCH 2/2] drbd: delay resync start unless source has transferred to L_SYNC_SOURCE

Philipp Reisner philipp.reisner at linbit.com
Fri Nov 20 10:13:22 CET 2020


Hi Zhang,

The explanation is sound and the patch looks good. I am going to apply this.

What I wanted to ask:
Are you using automated tests to find this kind of defect?
The tests we maintain are here: https://github.com/LINBIT/drbd9-tests
We would very much welcome it if you contributed your tests there as well.

best regards,
 Phil

On Wed, Nov 18, 2020 at 9:46 AM Zhang Duan <duan.zhang at easystack.cn> wrote:
>
> drbd_start_resync may be rescheduled when down_trylock fails, leaving the
> source in L_WF_BITMAP_S while the target is already in L_SYNC_TARGET and
> has sent its resync request. If the resync proceeds while the source is
> still in L_WF_BITMAP_S, data can be lost through the following sequence:
>
> L_WF_BITMAP_S                   L_SYNC_TARGET
>                                 resync request(sector A)
> reply old data(A)               read & write old data(A)
> new IO(A)
> send oos(A)                     set oos(A)
> A is at new version             resync write A done
>                                 set in sync(A), but A is at old version
>
> Signed-off-by: ZhangDuan <duan.zhang at easystack.cn>
> ---
>   drbd/drbd_receiver.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git drbd/drbd_receiver.c drbd/drbd_receiver.c
> index a31e44b2..7a9ce4d0 100644
> --- drbd/drbd_receiver.c
> +++ drbd/drbd_receiver.c
> @@ -3301,6 +3301,15 @@ static int receive_DataRequest(struct drbd_connection *connection, struct packet
>                 return ignore_remaining_packet(connection, pi->size);
>         }
> +       /* Tell the target to retry, waiting for the rescheduled
> +        * drbd_start_resync to complete. Otherwise the concurrency
> +        * of send oos and resync may lead to data loss. */
> +       if ((pi->cmd == P_RS_DATA_REQUEST || pi->cmd == P_CSUM_RS_REQUEST) &&
> +                       peer_device->repl_state[NOW] == L_WF_BITMAP_S) {
> +               drbd_send_ack_rp(peer_device, P_RS_CANCEL, p);
> +               return ignore_remaining_packet(connection, pi->size);
> +       }
> +
>         peer_req = drbd_alloc_peer_req(peer_device, GFP_TRY);
>         err = -ENOMEM;
>         if (!peer_req)
> --
> 2.24.0.windows.2
>
>
> --
> Sincerely Yours,
> Zhang Duan
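
The time sequence in the quoted commit message can be replayed with a small toy model. This is only a sketch and not DRBD code: the struct, the enum, and every function name below are hypothetical stand-ins for a single sector A, and the `guard` flag merely imitates the effect of answering with P_RS_CANCEL while the source is still in L_WF_BITMAP_S. It also assumes that once the source has reached L_SYNC_SOURCE, a retried request simply returns the current data.

```c
#include <stdbool.h>

/* Toy model of one sector A. All names are hypothetical, not DRBD's. */
enum repl_state { WF_BITMAP_S, SYNC_SOURCE };

struct node {
	int version;      /* data version stored for sector A */
	bool out_of_sync; /* out-of-sync bit for sector A */
};

struct sim {
	enum repl_state source_state;
	struct node source;
	struct node target;
};

/* Source side of a resync request. Returns false when the request is
 * refused (stands in for P_RS_CANCEL), true when *reply holds the data. */
bool source_handle_rs_request(const struct sim *s, bool guard, int *reply)
{
	if (guard && s->source_state == WF_BITMAP_S)
		return false;
	*reply = s->source.version;
	return true;
}

/* Application write on the source: "new IO(A)", "send oos(A)", "set oos(A)". */
void source_new_io(struct sim *s)
{
	s->source.version++;
	s->target.out_of_sync = true;
}

/* Target finishes the resync write: "resync write A done", "set in sync(A)". */
void target_resync_write_done(struct sim *s, int data)
{
	s->target.version = data;
	s->target.out_of_sync = false;
}

/* Replays the sequence. Returns true if the target ends up marked
 * in sync while holding a different (stale) version than the source. */
bool run_race(bool guard)
{
	struct sim s = { WF_BITMAP_S, { 1, false }, { 0, true } };
	int reply = 0;

	/* Target asks for A while the source is still in WF_BITMAP_S. */
	bool answered = source_handle_rs_request(&s, guard, &reply);

	/* A new write hits A before the resync write lands on the target. */
	source_new_io(&s);

	if (answered)
		target_resync_write_done(&s, reply);

	/* The rescheduled resync start finally completes on the source. */
	s.source_state = SYNC_SOURCE;

	/* A refused request is retried by the target. */
	if (!answered && source_handle_rs_request(&s, guard, &reply))
		target_resync_write_done(&s, reply);

	return !s.target.out_of_sync && s.target.version != s.source.version;
}
```

In this model `run_race(false)` reports the stale-but-in-sync outcome from the table, while `run_race(true)` does not, because the first request is refused and the retry only succeeds after the state change.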

