drbd: resync blocks
Joel Colledge
joel.colledge at linbit.com
Fri Oct 18 15:23:06 CEST 2024
Hi Zhengbing,
Thanks for investigating this. I don't have any immediate comments on
this. It may take a while until I have time to look into it in detail.
It would be very helpful to have a reproducer. Ideally using the
framework from https://github.com/LINBIT/drbd9-tests
I guess this hasn't been reported by other users because it occurs
during a resync in the less common direction. Normally 9.1 and 9.2 are
only mixed for a short time while upgrading from 9.1 to 9.2. In this
case, the resync with mixed versions has the SyncSource on 9.1 and the
SyncTarget on 9.2. Nevertheless, it is a valid scenario that we want
to fix.
Best regards,
Joel
On Thu, 17 Oct 2024 at 06:30, Zhengbing <zhengbing.huang at easystack.cn> wrote:
>
>
> Hi Joel,
>
> In testing, we found that the problem matches the one that commit "7a9ae1a208" solves.
> The comment from commit "7a9ae1a208" in the w_e_end_rsdata_req() function:
> /* DRBD versions without DRBD_FF_RESYNC_DAGTAG lock
> * 128MiB "resync extents" in the activity log whenever
> * they make resync requests. Some of these versions
> * also lock activity lock extents when receiving
> * P_DATA. In particular, DRBD 9.0 and 9.1. This can
> * cause a deadlock if we send resync replies in these
> * extents as follows:
> * * Node is SyncTarget towards us
> * * Node locks a resync extent and sends P_RS_DATA_REQUEST
> * * Node receives P_DATA write in this extent; write
> * waits for resync extent to be unlocked
> * * Node receives P_BARRIER (protocol A); receiver
> * thread blocks waiting for write to complete
> * * We reply to P_RS_DATA_REQUEST, but it is never
> * processed because receiver thread is blocked
> *
> * Break the deadlock by canceling instead. This is
> * sent on the control socket so it will be processed. */
>
> Then we found two suspicious pieces of code (patches below):
> 1. In the w_e_end_rsdata_req() function, there is no lock protection between al_resync_extent_active() and drbd_rs_reply(), and P_DATA may be sent through this gap (a small userspace sketch of this window follows the two patches below).
> @@ -180,9 +184,11 @@ struct lc_element *_al_get_nonblock(struct drbd_device *device, unsigned int enr
> {
> struct lc_element *al_ext;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
> al_ext = is_local ? lc_try_get_local(device->act_log, enr) : lc_try_get(device->act_log, enr);
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
>
> return al_ext;
> }
> @@ -192,9 +198,11 @@ struct lc_element *_al_get(struct drbd_device *device, unsigned int enr, bool is
> {
> struct lc_element *al_ext;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
> al_ext = is_local ? lc_get_local(device->act_log, enr) : lc_get(device->act_log, enr);
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
>
> return al_ext;
> }
> diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
> index e9d2c3914..95cf2bb48 100644
> --- a/drbd/drbd_int.h
> +++ b/drbd/drbd_int.h
> @@ -1588,6 +1588,7 @@ struct drbd_device {
>
> int next_barrier_nr;
> struct drbd_md_io md_io;
> + struct mutex resync_lock;
> spinlock_t al_lock;
> wait_queue_head_t al_wait;
> struct lru_cache *act_log; /* activity log */
> diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
> index 02f4ae5d1..f625e2e83 100644
> --- a/drbd/drbd_main.c
> +++ b/drbd/drbd_main.c
> @@ -4092,6 +4092,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
> #ifdef CONFIG_DRBD_TIMING_STATS
> spin_lock_init(&device->timing_lock);
> #endif
> + mutex_init(&device->resync_lock);
> spin_lock_init(&device->al_lock);
>
> spin_lock_init(&device->pending_completion_lock);
> diff --git a/drbd/drbd_req.c b/drbd/drbd_req.c
> index c4aa23a31..f05e21dd3 100644
> --- a/drbd/drbd_req.c
> +++ b/drbd/drbd_req.c
> @@ -2383,6 +2383,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
> bool made_progress = false;
> int err;
>
> + mutex_lock(&device->resync_lock);
> spin_lock_irq(&device->al_lock);
>
> /* Don't even try, if someone has it locked right now. */
> @@ -2418,6 +2419,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
> }
> out:
> spin_unlock_irq(&device->al_lock);
> + mutex_unlock(&device->resync_lock);
> return made_progress;
> }
>
> diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
> index 738be16d5..ddea6230a 100644
> --- a/drbd/drbd_sender.c
> +++ b/drbd/drbd_sender.c
> @@ -2106,6 +2106,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
> if (peer_device->repl_state[NOW] == L_AHEAD) {
> err = drbd_send_ack(peer_device, P_RS_CANCEL, peer_req);
> } else if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) {
> + mutex_lock(&peer_device->device->resync_lock);
> if (unlikely(peer_device->disk_state[NOW] < D_INCONSISTENT)) {
> if (connection->agreed_features & DRBD_FF_RESYNC_DAGTAG) {
> drbd_err_ratelimit(peer_device,
> @@ -2154,6 +2155,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
> if (expect_ack)
> peer_req = NULL;
> }
> + mutex_unlock(&peer_device->device->resync_lock);
> } else {
> drbd_err_ratelimit(peer_device, "Sending NegRSDReply. sector %llus.\n",
> (unsigned long long)peer_req->i.sector);
>
>
> 2. The al_resync_extent_active() check does not cover extents that are currently being changed.
> --- a/drbd/drbd_actlog.c
> +++ b/drbd/drbd_actlog.c
> @@ -163,12 +163,16 @@ bool drbd_al_active(struct drbd_device *device, sector_t sector, unsigned int si
>
> spin_lock_irq(&device->al_lock);
> for (enr = first; enr <= last; enr++) {
> - struct lc_element *al_ext;
> - al_ext = lc_find(device->act_log, enr);
> - if (al_ext && al_ext->refcnt > 0) {
> + if (lc_is_used(device->act_log, enr)) {
> active = true;
> break;
> }
> + // struct lc_element *al_ext;
> + // al_ext = lc_find(device->act_log, enr);
> + // if (al_ext && al_ext->refcnt > 0) {
> + // active = true;
> + // break;
> + // }
> }
> spin_unlock_irq(&device->al_lock);
>
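> To make the window in point 1 concrete, here is a purely illustrative
> userspace sketch: pthreads stand in for the DRBD threads, extent_refcnt
> for the AL extent reference count, and the two thread functions for the
> w_e_end_rsdata_req() path and the application write path. All names are
> invented for illustration; none of this is DRBD code.
>
> #include <pthread.h>
> #include <stdio.h>
>
> static int extent_refcnt;               /* models the AL extent refcnt */
> static pthread_mutex_t al_lock = PTHREAD_MUTEX_INITIALIZER;
>
> /* Models w_e_end_rsdata_req(): check, then reply, with no lock held
>  * across the two steps. */
> static void *resync_reply_path(void *arg)
> {
>         int active;
>
>         pthread_mutex_lock(&al_lock);
>         active = extent_refcnt > 0;     /* like al_resync_extent_active() */
>         pthread_mutex_unlock(&al_lock);
>
>         /* window: an application write may activate the extent and send
>          * P_DATA right here, before the resync reply goes out */
>
>         if (!active)
>                 printf("resync reply sent (extent may have just become active)\n");
>         return NULL;
> }
>
> /* Models the application write path: lock the AL extent, send P_DATA. */
> static void *p_data_write_path(void *arg)
> {
>         pthread_mutex_lock(&al_lock);
>         extent_refcnt++;
>         pthread_mutex_unlock(&al_lock);
>         printf("P_DATA sent for the same extent\n");
>         return NULL;
> }
>
> int main(void)
> {
>         pthread_t reply, writer;
>
>         pthread_create(&reply, NULL, resync_reply_path, NULL);
>         pthread_create(&writer, NULL, p_data_write_path, NULL);
>         pthread_join(reply, NULL);
>         pthread_join(writer, NULL);
>         return 0;
> }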
>
> After we fixed these two pieces of code, the problem still recurred.
> The reason now is that the SyncSource sender thread is itself blocked, so it cannot send the P_RS_CANCEL ack either (a small userspace model of the cycle follows the stack traces below).
>
> SyncSource sender thread:
> [<0>] wait_woken+0x2c/0x60
> [<0>] sk_stream_wait_memory+0x2bb/0x340
> [<0>] do_tcp_sendpages+0x258/0x340
> [<0>] tcp_sendpage_locked+0x44/0x60
> [<0>] tcp_sendpage+0x37/0x50
> [<0>] inet_sendpage+0x52/0x90
> [<0>] dtt_send_page+0x93/0x140 [drbd_transport_tcp]
> [<0>] flush_send_buffer+0xd0/0x150 [drbd]
> [<0>] __send_command+0xf8/0x160 [drbd]
> [<0>] drbd_send_block+0xaa/0x230 [drbd]
> [<0>] drbd_rs_reply+0x26e/0x300 [drbd]
> [<0>] w_e_end_rsdata_req+0xd6/0x4b0 [drbd]
> [<0>] drbd_sender+0x13a/0x3d0 [drbd]
> [<0>] drbd_thread_setup+0x69/0x190 [drbd]
> [<0>] kthread+0x10a/0x120
> [<0>] ret_from_fork+0x1f/0x40
>
> SyncTarget receiver thread:
> [<0>] conn_wait_active_ee_empty_or_disconnect+0x7d/0xb0 [drbd]
> [<0>] receive_Barrier+0x16b/0x1f0 [drbd]
> [<0>] drbd_receiver+0x5af/0x7f0 [drbd]
> [<0>] drbd_thread_setup+0x5c/0x160 [drbd]
> [<0>] kthread+0x10a/0x120
> [<0>] ret_from_fork+0x1f/0x40
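>
> To make the cycle concrete, here is a purely illustrative userspace
> model: a pipe stands in for the TCP data socket, one pthread for the
> SyncSource sender thread and one for the SyncTarget receiver thread.
> All names are invented; none of this is DRBD code, and the program
> blocks by design.
>
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> static int datafd[2];               /* models the TCP data socket */
> static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
> static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
> static int write_done;              /* the P_DATA write the receiver waits for */
>
> /* Models the SyncSource sender thread: it keeps sending resync replies
>  * on the data socket until the send buffer is full, then blocks, like
>  * drbd_send_block() ending up in sk_stream_wait_memory(). */
> static void *sync_source_sender(void *arg)
> {
>         char reply[4096] = { 0 };
>
>         for (;;)
>                 if (write(datafd[1], reply, sizeof(reply)) < 0)
>                         break;
>         return NULL;
> }
>
> /* Models the SyncTarget receiver thread in receive_Barrier(): it waits
>  * for the pending write to finish before reading anything more from the
>  * data socket. The write waits for the resync extent, which is only
>  * released when a resync reply or cancel is processed, so write_done is
>  * never set and the cycle closes. */
> static void *sync_target_receiver(void *arg)
> {
>         char buf[4096];
>
>         pthread_mutex_lock(&lk);
>         while (!write_done)
>                 pthread_cond_wait(&cv, &lk);
>         pthread_mutex_unlock(&lk);
>
>         while (read(datafd[0], buf, sizeof(buf)) > 0)
>                 ;                   /* never reached */
>         return NULL;
> }
>
> int main(void)
> {
>         pthread_t s, r;
>
>         pipe(datafd);
>         pthread_create(&s, NULL, sync_source_sender, NULL);
>         pthread_create(&r, NULL, sync_target_receiver, NULL);
>
>         sleep(2);
>         printf("stuck: sender blocked in write(), receiver waiting for a write that depends on the sender\n");
>         return 0;
> }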
>
> Do you have any good solutions?
>
> Best regards,
> zhengbing
>
>
> From: Zhengbing <zhengbing.huang at easystack.cn>
> Date: 2024-10-16 20:03:27
> To: drbd-dev at lists.linbit.com
> Subject: drbd: resync blocks
>
> Hi Joel,
>
> I have a problem where the resync process blocks.
>
> First, I have a 2-node cluster: node-1 is running DRBD 9.1, node-2 is running DRBD 9.2, and the connection uses protocol C.
> The problem scenario is as follows:
> 1. node-2 always has application IO
> 2. node-1 network failure with node-2
> 3. the network is restored. node-1 is SyncTarget and node-2 is SyncSource
> 4. then the resync process blocks
>
> You solved the same problem in commit "7a9ae1a208", but I still see it.
>
> So, how do I solve this problem?
>
> Best regards,
> zhengbing
>
>
>