Re:[PATCH] drbd: make drbd_adm_detach() interruptible

Zhengbing zhengbing.huang at easystack.cn
Thu Jul 4 04:59:34 CEST 2024


Hi Philipp,
    IIUC, this patch is going to fix the following problem:
(1) backing disk error and no completion returned from hardware.

(2) drbdsetup detach command will queue_after_state_change_work(resource, done, work); this work will be handled in drbd_worker, but it con't finish in drbd_md_sync()->wait_until_done_or_force_detached()

(3) drbdsetup detach process will continue to __state_change_unlock and hang as the related after_state_change work not complete.

So this patch allow user to use kill command to send signal to drbdsetup detach command, then it go into interrupt_detach(), and interrupt_detach() can make wait_until_done_or_force_detached() continue.

After that, w_after_state_change() can continue to complete(worker->done), which makes drbdsetup detach process continue.

If this is what you want, I think it fix a different problem case based on the [1/11] in our patchset, So we need [1/11] and this patch both, right?


best regards,
    zhengbing



From: Philipp Reisner <philipp.reisner at linbit.com>Date: 2024-07-03 22:31:35
To:  Dongsheng Yang <dongsheng.yang at linux.dev>
Cc:  "zhengbing . huang" <zhengbing.huang at easystack.cn>,drbd-dev at lists.linbit.com,Philipp Reisner <philipp.reisner at linbit.com>
Subject: [PATCH] drbd: make drbd_adm_detach() interruptible>If a backing device suddenly ceases delivering I/O completions, and in
>reaction, the user issues a `drbdsetup detach`, the operation will
>hang when it tries to write internal meta-data.
>
>The user should have used `drbdsetup --force detach`, but it is too
>late. There was no way to interrupt the hanging drbdsetup detach.
>
>Improve the situation by making detach operations interruptible.
>---
> drbd/drbd_actlog.c |  5 ++++-
> drbd/drbd_int.h    |  1 +
> drbd/drbd_state.c  | 29 +++++++++++++++++++++++++++--
> 3 files changed, 32 insertions(+), 3 deletions(-)
>
>diff --git a/drbd/drbd_actlog.c b/drbd/drbd_actlog.c
>index bc09dee2f..d6ba168ac 100644
>--- a/drbd/drbd_actlog.c
>+++ b/drbd/drbd_actlog.c
>@@ -74,7 +74,10 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b
> 		dt = MAX_SCHEDULE_TIMEOUT;
> 
> 	dt = wait_event_timeout(device->misc_wait,
>-			*done || test_bit(FORCE_DETACH, &device->flags), dt);
>+			*done ||
>+			test_bit(FORCE_DETACH, &device->flags) ||
>+			test_bit(INTERRUPT_DETACH, &device->flags),
>+			dt);
> 	if (dt == 0) {
> 		drbd_err(device, "meta-data IO operation timed out\n");
> 		drbd_handle_io_error(device, DRBD_FORCE_DETACH);
>diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
>index 0ebd79091..8ea752edd 100644
>--- a/drbd/drbd_int.h
>+++ b/drbd/drbd_int.h
>@@ -521,6 +521,7 @@ enum device_flag {
> 	MD_NO_FUA,		/* meta data device does not support barriers,
> 				   so don't even try */
> 	FORCE_DETACH,		/* Force-detach from local disk, aborting any pending local IO */
>+	INTERRUPT_DETACH,	/* Interrupt an ongoing detach operation */
> 	NEW_CUR_UUID,		/* Create new current UUID when thawing IO or issuing local IO */
> 	__NEW_CUR_UUID,		/* Set NEW_CUR_UUID as soon as state change visible */
> 	WRITING_NEW_CUR_UUID,	/* Set while the new current ID gets generated. */
>diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
>index be1de8f06..643b2f385 100644
>--- a/drbd/drbd_state.c
>+++ b/drbd/drbd_state.c
>@@ -924,14 +924,39 @@ void state_change_lock(struct drbd_resource *resource, unsigned long *irq_flags,
> 	resource->state_change_flags = flags;
> }
> 
>+/* Interrupt writing meta-data */
>+static void interrupt_detach(struct drbd_resource *resource, struct completion *done)
>+{
>+	struct drbd_device *device;
>+	int vnr;
>+
>+	idr_for_each_entry(&resource->devices, device, vnr) {
>+		if (device->disk_state[NOW] == D_DETACHING) {
>+			set_bit(INTERRUPT_DETACH, &device->flags);
>+			wake_up_all(&device->misc_wait);
>+		}
>+	}
>+
>+	wait_for_completion(done);
>+
>+	idr_for_each_entry(&resource->devices, device, vnr) {
>+		if (test_bit(INTERRUPT_DETACH, &device->flags))
>+			clear_bit(INTERRUPT_DETACH, &device->flags);
>+	}
>+}
>+
> static void __state_change_unlock(struct drbd_resource *resource, unsigned long *irq_flags, struct completion *done)
> {
> 	enum chg_state_flags flags = resource->state_change_flags;
> 
> 	resource->state_change_flags = 0;
> 	write_unlock_irqrestore(&resource->state_rwlock, *irq_flags);
>-	if (done && expect(resource, current != resource->worker.task))
>-		wait_for_completion(done);
>+	if (done && expect(resource, current != resource->worker.task)) {
>+		int err = wait_for_completion_interruptible(done);
>+
>+		if (err == -ERESTARTSYS)
>+			interrupt_detach(resource, done);
>+	}
> 	if ((flags & CS_SERIALIZE) && !(flags & (CS_ALREADY_SERIALIZED | CS_PREPARE)))
> 		up(&resource->state_sem);
> }
>-- 
>2.45.2
>




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-dev/attachments/20240704/7141fabd/attachment.htm>


More information about the drbd-dev mailing list