[PATCH] rdma: Fix drbd_transport_rdma module reference count exception

zhengbing.huang zhengbing.huang at easystack.cn
Wed Feb 19 04:08:04 CET 2025


In testing, we found that the drbd_transport_rdma module reference count
was abnormally high:
drbd_transport_rdma 262144 28293

We do not have that many DRBD devices.

If an XXX_ADDR_ERROR or XXX_ROUTE_ERROR event occurs
while the DSB_CONNECTING flag bit is not set,
dtr_cma_event_handler() returns 0 directly.
The cm structure is then never destroyed,
and the drbd_transport_rdma module reference count stays elevated.

So, for XXX_ADDR_ERROR/XXX_ROUTE_ERROR events,
we do not need to check the DSB_CONNECTING flag,
and we need to kref_put() the cm structure.

Signed-off-by: zhengbing.huang <zhengbing.huang at easystack.cn>
---
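Note (not part of the commit message): the leak described above can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the driver code. Every sketch_* name is
hypothetical. It assumes that each live cm keeps the module reference
count raised until its last reference is dropped, which is inferred from
the description above rather than shown in the diff. The retry done by
dtr_cma_retry_connect() is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two error events discussed above. */
enum sketch_event { SKETCH_ADDR_ERROR, SKETCH_ROUTE_ERROR };

/* Hypothetical model of a cm; not the real DRBD structure. */
struct sketch_cm {
	int refs;        /* models the kref of the cm structure */
	bool connecting; /* models the DSB_CONNECTING bit */
	bool error;      /* models the DSB_ERROR bit */
};

/* Models the module reference count (assumption: one per live cm). */
static int sketch_module_refs;

static void sketch_cm_init(struct sketch_cm *cm, bool connecting)
{
	cm->refs = 1;
	cm->connecting = connecting;
	cm->error = false;
	sketch_module_refs++;
}

static void sketch_cm_put(struct sketch_cm *cm)
{
	assert(cm->refs > 0);
	if (--cm->refs == 0)
		sketch_module_refs--; /* last put releases the module ref */
}

/* Old behavior as described: the early return skips the put. */
static int sketch_handler_old(struct sketch_cm *cm, enum sketch_event ev)
{
	(void)ev;
	if (!cm->connecting)
		return 0; /* reference is never dropped on this path */
	cm->connecting = false;
	cm->error = true;
	sketch_cm_put(cm);
	return 0;
}

/* New behavior as described: no DSB_CONNECTING check for these events. */
static int sketch_handler_new(struct sketch_cm *cm, enum sketch_event ev)
{
	(void)ev;
	cm->error = true;
	sketch_cm_put(cm);
	return 0;
}

/* Deliver one error event to each of `events` fresh, non-connecting cms
 * and return the modeled module reference count afterwards. */
static int sketch_leaked_refs(int events, bool patched)
{
	sketch_module_refs = 0;
	for (int i = 0; i < events; i++) {
		struct sketch_cm cm;

		sketch_cm_init(&cm, false);
		if (patched)
			sketch_handler_new(&cm, SKETCH_ADDR_ERROR);
		else
			sketch_handler_old(&cm, SKETCH_ADDR_ERROR);
	}
	return sketch_module_refs;
}
```

In this model, sketch_leaked_refs(3, false) leaves three references
behind, while sketch_leaked_refs(3, true) returns to zero. That matches
the symptom reported above: the count grows with the number of failed
address or route resolutions, not with the number of devices.
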
 drbd/drbd_transport_rdma.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index ba4f1baa7..bb59e6501 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -1292,6 +1292,11 @@ static int dtr_cma_event_handler(struct rdma_cm_id *cm_id, struct rdma_cm_event
 		// pr_info("%s: RDMA_CM_EVENT_ADDR_ERROR\n", cm->name);
 	case RDMA_CM_EVENT_ROUTE_ERROR:
 		// pr_info("%s: RDMA_CM_EVENT_ROUTE_ERROR\n", cm->name);
+		set_bit(DSB_ERROR, &cm->state);
+
+		dtr_cma_retry_connect(cm->path, cm);
+		break;
+
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 		// pr_info("%s: RDMA_CM_EVENT_CONNECT_ERROR\n", cm->name);
 	case RDMA_CM_EVENT_UNREACHABLE:
-- 
2.43.0


