[PATCH 08/11] drbd_transport_rdma: fix a race between dtr_connect and drbd_thread_stop

zhengbing.huang zhengbing.huang at easystack.cn
Mon Jun 24 07:46:16 CEST 2024


From: Dongsheng Yang <dongsheng.yang at easystack.cn>

If send_sig() in drbd_thread_stop() runs before dtr_connect() reaches
wait_for_completion_interruptible(), dtr_connect() cannot return on a network failure.

So replace wait_for_completion_interruptible() with
wait_for_completion_interruptible_timeout(), and let dtr_connect() check the abort status
itself before each wait.

This behavior is similar to that of the TCP transport.

Signed-off-by: Dongsheng Yang <dongsheng.yang at easystack.cn>
---
 drbd/drbd_transport_rdma.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
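
The wait loop introduced by this patch can be illustrated outside the kernel.
What follows is only a minimal userspace sketch, not the driver code: struct
conn_state, wait_connected_timeout() and connect_wait() are hypothetical names,
an atomic flag stands in for the completion, and a second flag stands in for
whatever drbd_should_abort_listening() checks. It shows why a stop request
that arrives before the wait starts is still noticed: the flag is re-checked
before every bounded wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: 'connected' plays the role of the completion,
 * 'stop_requested' the state an abort check would look at. */
struct conn_state {
	atomic_bool connected;
	atomic_bool stop_requested;
};

/* Bounded wait: poll 'connected' for roughly timeout_ms milliseconds.
 * Returns true if the flag was set, false if the wait timed out. */
bool wait_connected_timeout(struct conn_state *s, long timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000L };
	long waited;

	for (waited = 0; waited < timeout_ms; waited++) {
		if (atomic_load(&s->connected))
			return true;
		nanosleep(&tick, NULL);	/* sleep about 1 ms per iteration */
	}
	return atomic_load(&s->connected);
}

/* Re-check the stop flag before each bounded wait, so a stop request
 * issued before the wait began cannot be missed indefinitely. */
int connect_wait(struct conn_state *s, long slice_ms)
{
	for (;;) {
		if (atomic_load(&s->stop_requested))
			return -EAGAIN;
		if (wait_connected_timeout(s, slice_ms))
			return 0;
		/* timed out: loop and check the stop flag again */
	}
}
```

With stop_requested set before the call, connect_wait() returns -EAGAIN
without waiting; with connected set and no stop request, it returns 0.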

diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index 77ff0055e..c47b344f8 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -2996,12 +2996,21 @@ static int dtr_connect(struct drbd_transport *transport)
 {
 	struct dtr_transport *rdma_transport =
 		container_of(transport, struct dtr_transport, transport);
-	int i, err = -ENOMEM;
+	int i, err;
 
-	err = wait_for_completion_interruptible(&rdma_transport->connected);
-	if (err) {
+again:
+	if (drbd_should_abort_listening(transport)) {
+		err = -EAGAIN;
+		goto abort;
+	}
+
+	err = wait_for_completion_interruptible_timeout(&rdma_transport->connected, HZ);
+	if (err < 0) {
 		flush_signals(current);
 		goto abort;
+	} else if (err == 0) {
+		/* timed out */
+		goto again;
 	}
 
 	err = atomic_read(&rdma_transport->first_path_connect_err);
-- 
2.27.0


