<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">Hi Philipp,<br> IIUC, this patch is going to fix the following problem:<br>(1) backing disk error and no completion returned from hardware.<br><br>(2) drbdsetup detach command will queue_after_state_change_work(resource, done, work); this work will be handled in drbd_worker, but it con't finish in drbd_md_sync()->wait_until_done_or_force_detached()<br><br>(3) drbdsetup detach process will continue to __state_change_unlock and hang as the related after_state_change work not complete.<br><br>So this patch allow user to use kill command to send signal to drbdsetup detach command, then it go into interrupt_detach(), and interrupt_detach() can make wait_until_done_or_force_detached() continue.<br><br>After that, w_after_state_change() can continue to complete(worker->done), which makes drbdsetup detach process continue.<br><br><div>If this is what you want, I think it fix a different problem case based on the [1/11] in our patchset, So we need [1/11] and this patch both, right?</div><div><br></div><div>best regards,</div><div> zhengbing</div><div><br></div><div style="position:relative;zoom:1"></div><span style="white-space: pre-wrap">From: Philipp Reisner <philipp.reisner@linbit.com></span><pre>Date: 2024-07-03 22:31:35
To: Dongsheng Yang <dongsheng.yang@linux.dev>
Cc: "zhengbing . huang" <zhengbing.huang@easystack.cn>,drbd-dev@lists.linbit.com,Philipp Reisner <philipp.reisner@linbit.com>
Subject: [PATCH] drbd: make drbd_adm_detach() interruptible>If a backing device suddenly ceases delivering I/O completions, and in
>reaction, the user issues a `drbdsetup detach`, the operation will
>hang when it tries to write internal meta-data.
>
>The user should have used `drbdsetup --force detach`, but it is too
>late. There was no way to interrupt the hanging drbdsetup detach.
>
>Improve the situation by making detach operations interruptible.
>---
> drbd/drbd_actlog.c | 5 ++++-
> drbd/drbd_int.h | 1 +
> drbd/drbd_state.c | 29 +++++++++++++++++++++++++++--
> 3 files changed, 32 insertions(+), 3 deletions(-)
>
>diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
>index bc09dee2f..d6ba168ac 100644
>--- a/drbd/drbd_actlog.c
>+++ b/drbd/drbd_actlog.c
>@@ -74,7 +74,10 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b
>                 dt = MAX_SCHEDULE_TIMEOUT;
>
>         dt = wait_event_timeout(device->misc_wait,
>-                        *done || test_bit(FORCE_DETACH, &device->flags), dt);
>+                        *done ||
>+                        test_bit(FORCE_DETACH, &device->flags) ||
>+                        test_bit(INTERRUPT_DETACH, &device->flags),
>+                        dt);
>         if (dt == 0) {
>                 drbd_err(device, "meta-data IO operation timed out\n");
>                 drbd_handle_io_error(device, DRBD_FORCE_DETACH);
>diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
>index 0ebd79091..8ea752edd 100644
>--- a/drbd/drbd_int.h
>+++ b/drbd/drbd_int.h
>@@ -521,6 +521,7 @@ enum device_flag {
>         MD_NO_FUA,                /* meta data device does not support barriers,
>                                  so don't even try */
>         FORCE_DETACH,                /* Force-detach from local disk, aborting any pending local IO */
>+        INTERRUPT_DETACH,        /* Interrupt an ongoing detach operation */
>         NEW_CUR_UUID,                /* Create new current UUID when thawing IO or issuing local IO */
>         __NEW_CUR_UUID,                /* Set NEW_CUR_UUID as soon as state change visible */
>         WRITING_NEW_CUR_UUID,        /* Set while the new current ID gets generated. */
>diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>index be1de8f06..643b2f385 100644
>--- a/drbd/drbd_state.c
>+++ b/drbd/drbd_state.c
>@@ -924,14 +924,39 @@ void state_change_lock(struct drbd_resource *resource, unsigned long *irq_flags,
>         resource->state_change_flags = flags;
> }
>
>+/* Interrupt writing meta-data */
>+static void interrupt_detach(struct drbd_resource *resource, struct completion *done)
>+{
>+        struct drbd_device *device;
>+        int vnr;
>+
>+        idr_for_each_entry(&resource->devices, device, vnr) {
>+                if (device->disk_state[NOW] == D_DETACHING) {
>+                        set_bit(INTERRUPT_DETACH, &device->flags);
>+                        wake_up_all(&device->misc_wait);
>+                }
>+        }
>+
>+        wait_for_completion(done);
>+
>+        idr_for_each_entry(&resource->devices, device, vnr) {
>+                if (test_bit(INTERRUPT_DETACH, &device->flags))
>+                        clear_bit(INTERRUPT_DETACH, &device->flags);
>+        }
>+}
>+
> static void __state_change_unlock(struct drbd_resource *resource, unsigned long *irq_flags, struct completion *done)
> {
>         enum chg_state_flags flags = resource->state_change_flags;
>
>         resource->state_change_flags = 0;
>         write_unlock_irqrestore(&resource->state_rwlock, *irq_flags);
>-        if (done && expect(resource, current != resource->worker.task))
>-                wait_for_completion(done);
>+        if (done && expect(resource, current != resource->worker.task)) {
>+                int err = wait_for_completion_interruptible(done);
>+
>+                if (err == -ERESTARTSYS)
>+                        interrupt_detach(resource, done);
>+        }
>         if ((flags & CS_SERIALIZE) && !(flags & (CS_ALREADY_SERIALIZED | CS_PREPARE)))
>                 up(&resource->state_sem);
> }
>--
>2.45.2
>
</pre></div><br>