Re: drbd: resync blocks
Zhengbing
zhengbing.huang at easystack.cn
Thu Oct 17 05:50:33 CEST 2024
Hi Joel,
In testing, we found that the problem matches the one that commit "7a9ae1a208" solves.
The comment that commit "7a9ae1a208" adds in the w_e_end_rsdata_req() function reads:
/* DRBD versions without DRBD_FF_RESYNC_DAGTAG lock
* 128MiB "resync extents" in the activity log whenever
* they make resync requests. Some of these versions
* also lock activity lock extents when receiving
* P_DATA. In particular, DRBD 9.0 and 9.1. This can
* cause a deadlock if we send resync replies in these
* extents as follows:
* * Node is SyncTarget towards us
* * Node locks a resync extent and sends P_RS_DATA_REQUEST
* * Node receives P_DATA write in this extent; write
* waits for resync extent to be unlocked
* * Node receives P_BARRIER (protocol A); receiver
* thread blocks waiting for write to complete
* * We reply to P_RS_DATA_REQUEST, but it is never
* processed because receiver thread is blocked
*
* Break the deadlock by canceling instead. This is
* sent on the control socket so it will be processed. */
Then we found two suspicious pieces of code (patch content below):
1. In the w_e_end_rsdata_req() function, no lock is held across al_resync_extent_active() and drbd_rs_reply(), so a P_DATA write may slip into the gap between the check and the reply.
@@ -180,9 +184,11 @@ struct lc_element *_al_get_nonblock(struct drbd_device *device, unsigned int enr
{
struct lc_element *al_ext;
+ mutex_lock(&device->resync_lock);
spin_lock_irq(&device->al_lock);
al_ext = is_local ? lc_try_get_local(device->act_log, enr) : lc_try_get(device->act_log, enr);
spin_unlock_irq(&device->al_lock);
+ mutex_unlock(&device->resync_lock);
return al_ext;
}
@@ -192,9 +198,11 @@ struct lc_element *_al_get(struct drbd_device *device, unsigned int enr, bool is
{
struct lc_element *al_ext;
+ mutex_lock(&device->resync_lock);
spin_lock_irq(&device->al_lock);
al_ext = is_local ? lc_get_local(device->act_log, enr) : lc_get(device->act_log, enr);
spin_unlock_irq(&device->al_lock);
+ mutex_unlock(&device->resync_lock);
return al_ext;
}
diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index e9d2c3914..95cf2bb48 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -1588,6 +1588,7 @@ struct drbd_device {
int next_barrier_nr;
struct drbd_md_io md_io;
+ struct mutex resync_lock;
spinlock_t al_lock;
wait_queue_head_t al_wait;
struct lru_cache *act_log; /* activity log */
diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index 02f4ae5d1..f625e2e83 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -4092,6 +4092,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
#ifdef CONFIG_DRBD_TIMING_STATS
spin_lock_init(&device->timing_lock);
#endif
+ mutex_init(&device->resync_lock);
spin_lock_init(&device->al_lock);
spin_lock_init(&device->pending_completion_lock);
diff --git a/drbd/drbd_req.c b/drbd/drbd_req.c
index c4aa23a31..f05e21dd3 100644
--- a/drbd/drbd_req.c
+++ b/drbd/drbd_req.c
@@ -2383,6 +2383,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
bool made_progress = false;
int err;
+ mutex_lock(&device->resync_lock);
spin_lock_irq(&device->al_lock);
/* Don't even try, if someone has it locked right now. */
@@ -2418,6 +2419,7 @@ static bool prepare_al_transaction_nonblock(struct drbd_device *device,
}
out:
spin_unlock_irq(&device->al_lock);
+ mutex_unlock(&device->resync_lock);
return made_progress;
}
diff --git a/drbd/drbd_sender.c b/drbd/drbd_sender.c
index 738be16d5..ddea6230a 100644
--- a/drbd/drbd_sender.c
+++ b/drbd/drbd_sender.c
@@ -2106,6 +2106,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
if (peer_device->repl_state[NOW] == L_AHEAD) {
err = drbd_send_ack(peer_device, P_RS_CANCEL, peer_req);
} else if (likely((peer_req->flags & EE_WAS_ERROR) == 0)) {
+ mutex_lock(&peer_device->device->resync_lock);
if (unlikely(peer_device->disk_state[NOW] < D_INCONSISTENT)) {
if (connection->agreed_features & DRBD_FF_RESYNC_DAGTAG) {
drbd_err_ratelimit(peer_device,
@@ -2154,6 +2155,7 @@ int w_e_end_rsdata_req(struct drbd_work *w, int cancel)
if (expect_ack)
peer_req = NULL;
}
+ mutex_unlock(&peer_device->device->resync_lock);
} else {
drbd_err_ratelimit(peer_device, "Sending NegRSDReply. sector %llus.\n",
(unsigned long long)peer_req->i.sector);
2. The al_resync_extent_active() check does not cover extents that are still changing.
--- a/drbd/drbd_actlog.c
+++ b/drbd/drbd_actlog.c
@@ -163,12 +163,16 @@ bool drbd_al_active(struct drbd_device *device, sector_t sector, unsigned int si
spin_lock_irq(&device->al_lock);
for (enr = first; enr <= last; enr++) {
- struct lc_element *al_ext;
- al_ext = lc_find(device->act_log, enr);
- if (al_ext && al_ext->refcnt > 0) {
+ if (lc_is_used(device->act_log, enr)) {
active = true;
break;
}
}
spin_unlock_irq(&device->al_lock);
Even after we fixed both pieces of code, the problem still recurred.
The reason now is that the SyncSource's sender thread is itself blocked (inside drbd_send_block(), see the trace below), so the P_RS_CANCEL ack is never sent either.
SyncSource sender thread:
[<0>] wait_woken+0x2c/0x60
[<0>] sk_stream_wait_memory+0x2bb/0x340
[<0>] do_tcp_sendpages+0x258/0x340
[<0>] tcp_sendpage_locked+0x44/0x60
[<0>] tcp_sendpage+0x37/0x50
[<0>] inet_sendpage+0x52/0x90
[<0>] dtt_send_page+0x93/0x140 [drbd_transport_tcp]
[<0>] flush_send_buffer+0xd0/0x150 [drbd]
[<0>] __send_command+0xf8/0x160 [drbd]
[<0>] drbd_send_block+0xaa/0x230 [drbd]
[<0>] drbd_rs_reply+0x26e/0x300 [drbd]
[<0>] w_e_end_rsdata_req+0xd6/0x4b0 [drbd]
[<0>] drbd_sender+0x13a/0x3d0 [drbd]
[<0>] drbd_thread_setup+0x69/0x190 [drbd]
[<0>] kthread+0x10a/0x120
[<0>] ret_from_fork+0x1f/0x40
SyncTarget receiver thread:
[<0>] conn_wait_active_ee_empty_or_disconnect+0x7d/0xb0 [drbd]
[<0>] receive_Barrier+0x16b/0x1f0 [drbd]
[<0>] drbd_receiver+0x5af/0x7f0 [drbd]
[<0>] drbd_thread_setup+0x5c/0x160 [drbd]
[<0>] kthread+0x10a/0x120
[<0>] ret_from_fork+0x1f/0x40
Do you have any good solutions?
Best regards,
zhengbing
From: Zhengbing <zhengbing.huang at easystack.cn>
Date: 2024-10-16 20:03:27
To: drbd-dev at lists.linbit.com
Subject: drbd: resync blocks
Hi Joel,
I have a problem with resync blocks.
First, I have a two-node cluster: node-1 runs DRBD 9.1, node-2 runs DRBD 9.2, and the connection uses protocol C.
The problem scenario is as follows:
1. node-2 always has application IO
2. node-1 loses network connectivity to node-2
3. the network is restored; node-1 becomes SyncTarget and node-2 becomes SyncSource
4. the resync process then blocks
You solved what looks like the same problem in commit "7a9ae1a208", but I still hit it.
So, how can I solve this problem?
Best regards,
zhengbing