Re:[PATCH] drbd: make drbd_adm_detach() interruptible
Zhengbing
zhengbing.huang at easystack.cn
Thu Jul 4 04:59:34 CEST 2024
Hi Philipp,
IIUC, this patch is going to fix the following problem:
(1) backing disk error and no completion returned from hardware.
(2) drbdsetup detach command will queue_after_state_change_work(resource, done, work); this work will be handled in drbd_worker, but it con't finish in drbd_md_sync()->wait_until_done_or_force_detached()
(3) drbdsetup detach process will continue to __state_change_unlock and hang as the related after_state_change work not complete.
So this patch allow user to use kill command to send signal to drbdsetup detach command, then it go into interrupt_detach(), and interrupt_detach() can make wait_until_done_or_force_detached() continue.
After that, w_after_state_change() can continue to complete(worker->done), which makes drbdsetup detach process continue.
If this is what you want, I think it fix a different problem case based on the [1/11] in our patchset, So we need [1/11] and this patch both, right?
best regards,
zhengbing
From: Philipp Reisner <philipp.reisner at linbit.com>Date: 2024-07-03 22:31:35
To: Dongsheng Yang <dongsheng.yang at linux.dev>
Cc: "zhengbing . huang" <zhengbing.huang at easystack.cn>,drbd-dev at lists.linbit.com,Philipp Reisner <philipp.reisner at linbit.com>
Subject: [PATCH] drbd: make drbd_adm_detach() interruptible>If a backing device suddenly ceases delivering I/O completions, and in
>reaction, the user issues a `drbdsetup detach`, the operation will
>hang when it tries to write internal meta-data.
>
>The user should have used `drbdsetup --force detach`, but it is too
>late. There was no way to interrupt the hanging drbdsetup detach.
>
>Improve the situation by making detach operations interruptible.
>---
> drbd/drbd_actlog.c | 5 ++++-
> drbd/drbd_int.h | 1 +
> drbd/drbd_state.c | 29 +++++++++++++++++++++++++++--
> 3 files changed, 32 insertions(+), 3 deletions(-)
>
>diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
>index bc09dee2f..d6ba168ac 100644
>--- a/drbd/drbd_actlog.c
>+++ b/drbd/drbd_actlog.c
>@@ -74,7 +74,10 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b
> dt = MAX_SCHEDULE_TIMEOUT;
>
> dt = wait_event_timeout(device->misc_wait,
>- *done || test_bit(FORCE_DETACH, &device->flags), dt);
>+ *done ||
>+ test_bit(FORCE_DETACH, &device->flags) ||
>+ test_bit(INTERRUPT_DETACH, &device->flags),
>+ dt);
> if (dt == 0) {
> drbd_err(device, "meta-data IO operation timed out\n");
> drbd_handle_io_error(device, DRBD_FORCE_DETACH);
>diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
>index 0ebd79091..8ea752edd 100644
>--- a/drbd/drbd_int.h
>+++ b/drbd/drbd_int.h
>@@ -521,6 +521,7 @@ enum device_flag {
> MD_NO_FUA, /* meta data device does not support barriers,
> so don't even try */
> FORCE_DETACH, /* Force-detach from local disk, aborting any pending local IO */
>+ INTERRUPT_DETACH, /* Interrupt an ongoing detach operation */
> NEW_CUR_UUID, /* Create new current UUID when thawing IO or issuing local IO */
> __NEW_CUR_UUID, /* Set NEW_CUR_UUID as soon as state change visible */
> WRITING_NEW_CUR_UUID, /* Set while the new current ID gets generated. */
>diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>index be1de8f06..643b2f385 100644
>--- a/drbd/drbd_state.c
>+++ b/drbd/drbd_state.c
>@@ -924,14 +924,39 @@ void state_change_lock(struct drbd_resource *resource, unsigned long *irq_flags,
> resource->state_change_flags = flags;
> }
>
>+/* Interrupt writing meta-data */
>+static void interrupt_detach(struct drbd_resource *resource, struct completion *done)
>+{
>+ struct drbd_device *device;
>+ int vnr;
>+
>+ idr_for_each_entry(&resource->devices, device, vnr) {
>+ if (device->disk_state[NOW] == D_DETACHING) {
>+ set_bit(INTERRUPT_DETACH, &device->flags);
>+ wake_up_all(&device->misc_wait);
>+ }
>+ }
>+
>+ wait_for_completion(done);
>+
>+ idr_for_each_entry(&resource->devices, device, vnr) {
>+ if (test_bit(INTERRUPT_DETACH, &device->flags))
>+ clear_bit(INTERRUPT_DETACH, &device->flags);
>+ }
>+}
>+
> static void __state_change_unlock(struct drbd_resource *resource, unsigned long *irq_flags, struct completion *done)
> {
> enum chg_state_flags flags = resource->state_change_flags;
>
> resource->state_change_flags = 0;
> write_unlock_irqrestore(&resource->state_rwlock, *irq_flags);
>- if (done && expect(resource, current != resource->worker.task))
>- wait_for_completion(done);
>+ if (done && expect(resource, current != resource->worker.task)) {
>+ int err = wait_for_completion_interruptible(done);
>+
>+ if (err == -ERESTARTSYS)
>+ interrupt_detach(resource, done);
>+ }
> if ((flags & CS_SERIALIZE) && !(flags & (CS_ALREADY_SERIALIZED | CS_PREPARE)))
> up(&resource->state_sem);
> }
>--
>2.45.2
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-dev/attachments/20240704/7141fabd/attachment.htm>
More information about the drbd-dev
mailing list