<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">Hi Philipp,<br>&nbsp;&nbsp;&nbsp; IIUC, this patch is meant to fix the following problem:<br>(1) The backing disk fails and the hardware never returns a completion.<br><br>(2) The drbdsetup detach command calls queue_after_state_change_work(resource, done, work); this work is handled in drbd_worker, but it can't finish because it is stuck in drbd_md_sync()-&gt;wait_until_done_or_force_detached().<br><br>(3) The drbdsetup detach process continues to __state_change_unlock() and hangs because the related after_state_change work never completes.<br><br>So this patch allows the user to send a signal to the drbdsetup detach process with the kill command; the process then goes into interrupt_detach(), and interrupt_detach() lets wait_until_done_or_force_detached() continue.<br><br>After that, w_after_state_change() can go on to complete(worker-&gt;done), which lets the drbdsetup detach process continue.<br><br><div>If this is what you intend, I think it fixes a different problem case from the one addressed by [1/11] in our patchset, so we need both [1/11] and this patch, right?</div><div><br></div><div>best regards,</div><div>&nbsp;&nbsp;&nbsp; zhengbing</div><div><br></div><div  style="position:relative;zoom:1"></div><span style="white-space: pre-wrap">From: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;</span><pre>Date: 2024-07-03 22:31:35
To:  Dongsheng Yang &lt;dongsheng.yang@linux.dev&gt;
Cc:  "zhengbing . huang" &lt;zhengbing.huang@easystack.cn&gt;,drbd-dev@lists.linbit.com,Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Subject: [PATCH] drbd: make drbd_adm_detach() interruptible
&gt;If a backing device suddenly ceases delivering I/O completions, and in
&gt;reaction, the user issues a `drbdsetup detach`, the operation will
&gt;hang when it tries to write internal meta-data.
&gt;
&gt;The user should have used `drbdsetup --force detach`, but it is too
&gt;late. There was no way to interrupt the hanging drbdsetup detach.
&gt;
&gt;Improve the situation by making detach operations interruptible.
&gt;---
&gt; drbd/drbd_actlog.c |  5 ++++-
&gt; drbd/drbd_int.h    |  1 +
&gt; drbd/drbd_state.c  | 29 +++++++++++++++++++++++++++--
&gt; 3 files changed, 32 insertions(+), 3 deletions(-)
&gt;
&gt;diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
&gt;index bc09dee2f..d6ba168ac 100644
&gt;--- a/drbd/drbd_actlog.c
&gt;+++ b/drbd/drbd_actlog.c
&gt;@@ -74,7 +74,10 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b
&gt;                 dt = MAX_SCHEDULE_TIMEOUT;
&gt; 
&gt;         dt = wait_event_timeout(device-&gt;misc_wait,
&gt;-                        *done || test_bit(FORCE_DETACH, &amp;device-&gt;flags), dt);
&gt;+                        *done ||
&gt;+                        test_bit(FORCE_DETACH, &amp;device-&gt;flags) ||
&gt;+                        test_bit(INTERRUPT_DETACH, &amp;device-&gt;flags),
&gt;+                        dt);
&gt;         if (dt == 0) {
&gt;                 drbd_err(device, "meta-data IO operation timed out\n");
&gt;                 drbd_handle_io_error(device, DRBD_FORCE_DETACH);
&gt;diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
&gt;index 0ebd79091..8ea752edd 100644
&gt;--- a/drbd/drbd_int.h
&gt;+++ b/drbd/drbd_int.h
&gt;@@ -521,6 +521,7 @@ enum device_flag {
&gt;         MD_NO_FUA,                /* meta data device does not support barriers,
&gt;                                    so don't even try */
&gt;         FORCE_DETACH,                /* Force-detach from local disk, aborting any pending local IO */
&gt;+        INTERRUPT_DETACH,        /* Interrupt an ongoing detach operation */
&gt;         NEW_CUR_UUID,                /* Create new current UUID when thawing IO or issuing local IO */
&gt;         __NEW_CUR_UUID,                /* Set NEW_CUR_UUID as soon as state change visible */
&gt;         WRITING_NEW_CUR_UUID,        /* Set while the new current ID gets generated. */
&gt;diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
&gt;index be1de8f06..643b2f385 100644
&gt;--- a/drbd/drbd_state.c
&gt;+++ b/drbd/drbd_state.c
&gt;@@ -924,14 +924,39 @@ void state_change_lock(struct drbd_resource *resource, unsigned long *irq_flags,
&gt;         resource-&gt;state_change_flags = flags;
&gt; }
&gt; 
&gt;+/* Interrupt writing meta-data */
&gt;+static void interrupt_detach(struct drbd_resource *resource, struct completion *done)
&gt;+{
&gt;+        struct drbd_device *device;
&gt;+        int vnr;
&gt;+
&gt;+        idr_for_each_entry(&amp;resource-&gt;devices, device, vnr) {
&gt;+                if (device-&gt;disk_state[NOW] == D_DETACHING) {
&gt;+                        set_bit(INTERRUPT_DETACH, &amp;device-&gt;flags);
&gt;+                        wake_up_all(&amp;device-&gt;misc_wait);
&gt;+                }
&gt;+        }
&gt;+
&gt;+        wait_for_completion(done);
&gt;+
&gt;+        idr_for_each_entry(&amp;resource-&gt;devices, device, vnr) {
&gt;+                if (test_bit(INTERRUPT_DETACH, &amp;device-&gt;flags))
&gt;+                        clear_bit(INTERRUPT_DETACH, &amp;device-&gt;flags);
&gt;+        }
&gt;+}
&gt;+
&gt; static void __state_change_unlock(struct drbd_resource *resource, unsigned long *irq_flags, struct completion *done)
&gt; {
&gt;         enum chg_state_flags flags = resource-&gt;state_change_flags;
&gt; 
&gt;         resource-&gt;state_change_flags = 0;
&gt;         write_unlock_irqrestore(&amp;resource-&gt;state_rwlock, *irq_flags);
&gt;-        if (done &amp;&amp; expect(resource, current != resource-&gt;worker.task))
&gt;-                wait_for_completion(done);
&gt;+        if (done &amp;&amp; expect(resource, current != resource-&gt;worker.task)) {
&gt;+                int err = wait_for_completion_interruptible(done);
&gt;+
&gt;+                if (err == -ERESTARTSYS)
&gt;+                        interrupt_detach(resource, done);
&gt;+        }
&gt;         if ((flags &amp; CS_SERIALIZE) &amp;&amp; !(flags &amp; (CS_ALREADY_SERIALIZED | CS_PREPARE)))
&gt;                 up(&amp;resource-&gt;state_sem);
&gt; }
&gt;-- 
&gt;2.45.2
&gt;
</pre></div><br>