drbd: resync blocks
Joel Colledge
joel.colledge at linbit.com
Fri Oct 18 15:23:06 CEST 2024
Hi Zhengbing,
Thanks for investigating this. I don't have any immediate comments on
this. It may take a while until I have time to look into it in detail.
It would be very helpful to have a reproducer. Ideally using the
framework from https://github.com/LINBIT/drbd9-tests
I guess this hasn't been reported by other users because it occurs
during a resync in the less common direction. Normally 9.1 and 9.2 are
only mixed for a short time while upgrading from 9.1 to 9.2. In this
case, the resync with mixed versions has the SyncSource on 9.1 and the
SyncTarget on 9.2. Nevertheless, it is a valid scenario that we want
to fix.
Best regards,
Joel
On Thu, 17 Oct 2024 at 06:30, Zhengbing <zhengbing.huang at easystack.cn> wrote:
>
>
> Hi Joel,
>
> In testing, we found that the problem matches the one that commit "7a9ae1a208" solves.
> The comment from commit "7a9ae1a208" in the w_e_end_rsdata_req() function:
> /* DRBD versions without DRBD_FF_RESYNC_DAGTAG lock
> * 128MiB "resync extents" in the activity log whenever
> * they make resync requests. Some of these versions
> * also lock activity lock extents when receiving
> * P_DATA. In particular, DRBD 9.0 and 9.1. This can
> * cause a deadlock if we send resync replies in these
> * extents as follows:
> * * Node is SyncTarget towards us
> * * Node locks a resync extent and sends P_RS_DATA_REQUEST
> * * Node receives P_DATA write in this extent; write
> * waits for resync extent to be unlocked
> * * Node receives P_BARRIER (protocol A); receiver
> * thread blocks waiting for write to complete
> * * We reply to P_RS_DATA_REQUEST, but it is never
> * processed because receiver thread is blocked
> *
> * Break the deadlock by canceling instead. This is
> * sent on the control socket so it will be processed. */
>
> Then we found two suspicious pieces of code (patches below):
> 1. In the w_e_end_rsdata_req() function, there is no lock protection between al_resync_extent_active() and drbd_rs_reply(), and P_DATA may be sent through this gap (a small userspace sketch of this window follows the two patches below).
> @@ -180,9 +184,11 @@ struct lc_element *_al_get_nonblock(struct drbd_device *device, unsigned int enr
> {
> struct lc_element *al_ext;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
> al_ext = is_local ? lc_try_get_local(device->act_log, enr) : lc_try_get(device->act_log, enr);
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
>
> return al_ext;
> }
> @@ -192,9 +198,11 @@ struct lc_element *_al_get(struct drbd_device *device, unsigned int enr, bool is
> {
> struct lc_element *al_ext;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
> al_ext = is_local ? lc_get_local(device->act_log, enr) : lc_get(device->act_log, enr);
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
>
> return al_ext;
> }
> diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
> index e9d2c3914..95cf2bb48 100644
> --- a/drbd/drbd_int.h
> +++ b/drbd/drbd_int.h
> @@ -1588,6 +1588,7 @@ struct drbd_device {
>
> int next_barrier_nr;
> struct drbd_md_io md_io;
> + struct mutex resync_lock;
> spinlock_t al_lock;
> wait_queue_head_t al_wait;
> struct lru_cache *act_log; /* activity log */
> diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
> index 02f4ae5d1..f625e2e83 100644
> --- a/drbd/drbd_main.c
> +++ b/drbd/drbd_main.c
> @@ -4092,6 +4092,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
> #ifdef CONFIG_DRBD_TIMING_STATS
> spin_lock_init(&device->timing_lock);
> #endif
> + mutex_init(&device->resync_lock);
> spin_lock_init(&device->al_lock);
>
> spin_lock_init(&device->pending_completion_lock);
> diff --git a/drbd/drbd_req.c b/drbd/drbd_req.c
> index c4aa23a31..f05e21dd3 100644
> --- a/drbd/drbd_req.c
> +++ b/drbd/drbd_req.c
> @@ -2383,6 +2383,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
> bool made_progress = false;
> int err;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
>
> /* Don't even try, if someone has it locked right now. */
> @@ -2418,6 +2419,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
> }
> out:
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
> return made_progress;
> }
>
> diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
> index 738be16d5..ddea6230a 100644
> --- a/drbd/drbd_sender.c
> +++ b/drbd/drbd_sender.c
> @@ -2106,6 +2106,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
> if (peer_device->repl_state[NOW] == L_AHEAD) {
> err = drbd_send_ack(peer_device, P_RS_CANCEL, peer_req);
> } else if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) {
> + mutex_lock(&peer_device->device->resync_lock);
> if (unlikely(peer_device->disk_state[NOW] < D_INCONSISTENT)) {
> if (connection->agreed_features & DRBD_FF_RESYNC_DAGTAG) {
> drbd_err_ratelimit(peer_device,
> @@ -2154,6 +2155,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
> if (expect_ack)
> peer_req = NULL;
> }
> + mutex_unlock(&peer_device->device->resync_lock);
> } else {
> drbd_err_ratelimit(peer_device, "Sending NegRSDReply. sector %llus.\n",
> (unsigned long long)peer_req->i.sector);
>
>
> 2. The al_resync_extent_active() check does not cover extents that are currently being changed.
> --- a/drbd/drbd_actlog.c
> +++ b/drbd/drbd_actlog.c
> @@ -163,12 +163,16 @@ bool drbd_al_active(struct drbd_device *device, sector_t sector, unsigned int si
>
> spin_lock_irq(&device->al_lock);
> for (enr = first; enr <= last; enr++) {
> - struct lc_element *al_ext;
> - al_ext = lc_find(device->act_log, enr);
> - if (al_ext && al_ext->refcnt > 0) {
> + if (lc_is_used(device->act_log, enr)) {
> active = true;
> break;
> }
> + // struct lc_element *al_ext;
> + // al_ext = lc_find(device->act_log, enr);
> + // if (al_ext && al_ext->refcnt > 0) {
> + // active = true;
> + // break;
> + // }
> }
> spin_unlock_irq(&device->al_lock);
>
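> To make the window in point 1 concrete, here is a purely illustrative
> userspace sketch: pthreads stand in for the DRBD threads, extent_refcnt
> for the AL extent reference count, and the two thread functions for the
> w_e_end_rsdata_req() path and the application write path. All names are
> invented for illustration; none of this is DRBD code.
>
> #include <pthread.h>
> #include <stdio.h>
>
> static int extent_refcnt;               /* models the AL extent refcnt */
> static pthread_mutex_t al_lock = PTHREAD_MUTEX_INITIALIZER;
>
> /* Models w_e_end_rsdata_req(): check, then reply, with no lock held
>  * across the two steps. */
> static void *resync_reply_path(void *arg)
> {
>         int active;
>
>         pthread_mutex_lock(&al_lock);
>         active = extent_refcnt > 0;     /* like al_resync_extent_active() */
>         pthread_mutex_unlock(&al_lock);
>
>         /* window: an application write may activate the extent and send
>          * P_DATA right here, before the resync reply goes out */
>
>         if (!active)
>                 printf("resync reply sent (extent may have just become active)\n");
>         return NULL;
> }
>
> /* Models the application write path: lock the AL extent, send P_DATA. */
> static void *p_data_write_path(void *arg)
> {
>         pthread_mutex_lock(&al_lock);
>         extent_refcnt++;
>         pthread_mutex_unlock(&al_lock);
>         printf("P_DATA sent for the same extent\n");
>         return NULL;
> }
>
> int main(void)
> {
>         pthread_t reply, writer;
>
>         pthread_create(&reply, NULL, resync_reply_path, NULL);
>         pthread_create(&writer, NULL, p_data_write_path, NULL);
>         pthread_join(reply, NULL);
>         pthread_join(writer, NULL);
>         return 0;
> }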
>
> After we fixed these two pieces of code, the problem still recurred.
> The reason now is that the SyncSource sender thread is itself blocked, so it cannot send the P_RS_CANCEL ack either (a small userspace model of the cycle follows the stack traces below).
>
> SyncSource sender thread:
> [<0>] wait_woken+0x2c/0x60
> [<0>] sk_stream_wait_memory+0x2bb/0x340
> [<0>] do_tcp_sendpages+0x258/0x340
> [<0>] tcp_sendpage_locked+0x44/0x60
> [<0>] tcp_sendpage+0x37/0x50
> [<0>] inet_sendpage+0x52/0x90
> [<0>] dtt_send_page+0x93/0x140 [drbd_transport_tcp]
> [<0>] flush_send_buffer+0xd0/0x150 [drbd]
> [<0>] __send_command+0xf8/0x160 [drbd]
> [<0>] drbd_send_block+0xaa/0x230 [drbd]
> [<0>] drbd_rs_reply+0x26e/0x300 [drbd]
> [<0>] w_e_end_rsdata_req+0xd6/0x4b0 [drbd]
> [<0>] drbd_sender+0x13a/0x3d0 [drbd]
> [<0>] drbd_thread_setup+0x69/0x190 [drbd]
> [<0>] kthread+0x10a/0x120
> [<0>] ret_from_fork+0x1f/0x40
>
> SyncTarget receiver thread:
> [<0>] conn_wait_active_ee_empty_or_disconnect+0x7d/0xb0 [drbd]
> [<0>] receive_Barrier+0x16b/0x1f0 [drbd]
> [<0>] drbd_receiver+0x5af/0x7f0 [drbd]
> [<0>] drbd_thread_setup+0x5c/0x160 [drbd]
> [<0>] kthread+0x10a/0x120
> [<0>] ret_from_fork+0x1f/0x40
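>
> To make the cycle concrete, here is a purely illustrative userspace
> model: a pipe stands in for the TCP data socket, one pthread for the
> SyncSource sender thread and one for the SyncTarget receiver thread.
> All names are invented; none of this is DRBD code, and the program
> blocks by design.
>
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> static int datafd[2];               /* models the TCP data socket */
> static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
> static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
> static int write_done;              /* the P_DATA write the receiver waits for */
>
> /* Models the SyncSource sender thread: it keeps sending resync replies
>  * on the data socket until the send buffer is full, then blocks, like
>  * drbd_send_block() ending up in sk_stream_wait_memory(). */
> static void *sync_source_sender(void *arg)
> {
>         char reply[4096] = { 0 };
>
>         for (;;)
>                 if (write(datafd[1], reply, sizeof(reply)) < 0)
>                         break;
>         return NULL;
> }
>
> /* Models the SyncTarget receiver thread in receive_Barrier(): it waits
>  * for the pending write to finish before reading anything more from the
>  * data socket. The write waits for the resync extent, which is only
>  * released when a resync reply or cancel is processed, so write_done is
>  * never set and the cycle closes. */
> static void *sync_target_receiver(void *arg)
> {
>         char buf[4096];
>
>         pthread_mutex_lock(&lk);
>         while (!write_done)
>                 pthread_cond_wait(&cv, &lk);
>         pthread_mutex_unlock(&lk);
>
>         while (read(datafd[0], buf, sizeof(buf)) > 0)
>                 ;                   /* never reached */
>         return NULL;
> }
>
> int main(void)
> {
>         pthread_t s, r;
>
>         pipe(datafd);
>         pthread_create(&s, NULL, sync_source_sender, NULL);
>         pthread_create(&r, NULL, sync_target_receiver, NULL);
>
>         sleep(2);
>         printf("stuck: sender blocked in write(), receiver waiting for a write that depends on the sender\n");
>         return 0;
> }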
>
> Do you have any good solutions?
>
> Best regards,
> zhengbing
>
>
> From: Zhengbing <zhengbing.huang at easystack.cn>
> Date: 2024-10-16 20:03:27
> To: drbd-dev at lists.linbit.com
> Subject: drbd: resync blocks
>
> Hi Joel,
>
> I have a problem where the resync process blocks.
>
> First, I have a 2-node cluster: node-1 is running DRBD 9.1, node-2 is running DRBD 9.2, and the connection uses protocol C.
> The problem scenario is as follows:
> 1. node-2 always has application IO
> 2. node-1 network failure with node-2
> 3. the network is restored. node-1 is SyncTarget and node-2 is SyncSource
> 4. then the resync process blocks
>
> You solved the same problem in commit "7a9ae1a208", but I still see it.
>
> So, how do I solve this problem?
>
> Best regards,
> zhengbing
>
>
>