[PATCH 2/3] drbd: Fix kernel crash in drbd_find_path_by_addr()
zhengbing.huang
zhengbing.huang at easystack.cn
Wed Jul 9 04:55:51 CEST 2025
We hit the following crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
Workqueue: ib_cm cm_work_handler [ib_cm]
RIP: 0010:drbd_find_path_by_addr+0x6c/0xd0 [drbd]
Call Trace:
dtr_cma_event_handler+0x1c1/0x4ee [drbd_transport_rdma]
cma_cm_event_handler+0x25/0xd0 [rdma_cm]
cma_ib_req_handler+0x7cd/0x1250 [rdma_cm]
? addr4_resolve+0x67/0xd0 [ib_core]
cm_process_work+0x22/0xf0 [ib_cm]
cm_req_handler+0x7ed/0xf40 [ib_cm]
? __switch_to_asm+0x35/0x70
cm_work_handler+0x798/0xf30 [ib_cm]
? finish_task_switch+0x18e/0x2e0
process_one_work+0x1a7/0x360
? create_worker+0x1a0/0x1a0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x10a/0x120
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x40
The crashing code traverses the listener->waiters list:
struct drbd_path *drbd_find_path_by_addr(struct drbd_listener *listener, struct sockaddr_storage *addr)
{
struct drbd_path *path;
list_for_each_entry(path, &listener->waiters, listener_link) {
if (addr_equal(&path->peer_addr, addr))
return path;
}
return NULL;
}
The listener->waiters list contains a single path node:
crash> struct dtr_listener ff4ba75054797c00
struct dtr_listener {
listener = {
kref = {
refcount = {
refs = {
counter = 2
}
}
},
resource = 0xff4ba766cc325000,
transport_class = 0xffffffffc037f080 <rdma_transport_class>,
list = {
next = 0xff4ba766cc325500,
prev = 0xff4ba766cc325500
},
waiters = {
next = 0xff4ba74fd578e138,
prev = 0xff4ba74fd578e138
},
...
}
but this path has already been released:
crash> struct drbd_path 0xff4ba74fd578e000
struct drbd_path {
my_addr = {
ss_family = 1,
__data = "\000\000\000\000"
},
peer_addr = {
ss_family = 0,
__data = "\000\000\000\000\000\000\0"
},
kref = {
refcount = {
refs = {
counter = 0
}
}
},
net = 0x0,
my_addr_len = 0,
peer_addr_len = 0,
flags = 0,
// all zero
...
}
So this path has been freed but is still linked on the listener->waiters list,
which causes the crash when the list is traversed later.
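As a side note on the dump, the address passed to crash for struct drbd_path
can be derived from waiters.next by subtracting the offset of listener_link.
A minimal sketch of that arithmetic, assuming listener_link sits at offset
0x138 inside struct drbd_path (inferred only from the two addresses in the
dump, not read from the DRBD headers); path_from_link() is a hypothetical
helper name:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: offset of listener_link within struct drbd_path, inferred
 * from the dump (waiters.next ends in ...e138, the path in ...e000). */
#define LISTENER_LINK_OFFSET 0x138ULL

/* Same idea as the kernel's container_of(): step back from the address of
 * the embedded list_head to the start of the enclosing structure. */
static uint64_t path_from_link(uint64_t link_addr)
{
	assert(link_addr >= LISTENER_LINK_OFFSET); /* guard against underflow */
	return link_addr - LISTENER_LINK_OFFSET;
}
```

Calling path_from_link(0xff4ba74fd578e138ULL) yields 0xff4ba74fd578e000, the
address inspected above.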
The scenario that leads to this problem is probably as follows:
thread_1:
remove_path()
dtr_remove_path()
drbd_put_listener()
list_del(&path->listener_link)
thread_2:
...
dtr_activate_path()
drbd_get_listener()
list_add(&path->listener_link, &listener->waiters);
...
...
kfree(path)
thread_3:
connect request come in:
dtr_cma_event_handler()
dtr_cma_accept()
drbd_find_path_by_addr()
crash
To avoid this use-after-free, we hold an additional reference to drbd_path
whenever it is added to the listener->waiters list, and drop it when removed.
This ensures the path memory remains valid during list traversal.
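The idea can be illustrated with a small user-space sketch. Everything in it
is a hypothetical stand-in: struct path, waiters_add(), waiters_del() and the
plain integer counter are not the DRBD types or the kernel kref API, and the
waiters_lock locking is left out. It only shows that an object which holds a
reference for its list membership stays alive until it is unlinked:

```c
#include <assert.h>

/* Hypothetical stand-in for struct list_head. */
struct node { struct node *next, *prev; };

/* Hypothetical stand-in for struct drbd_path. */
struct path {
	int refs;          /* stands in for struct kref */
	int freed;         /* set where the real code would free the path */
	struct node link;  /* stands in for listener_link */
};

static void list_init(struct node *h) { h->next = h->prev = h; }

static void list_add_head(struct node *n, struct node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

static void path_get(struct path *p) { p->refs++; }

static void path_put(struct path *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		p->freed = 1; /* last reference gone: release point */
}

/* Being on the waiters list holds one reference of its own. */
static void waiters_add(struct path *p, struct node *waiters)
{
	list_add_head(&p->link, waiters);
	path_get(p);
}

/* Unlink first, then drop the reference held for list membership. */
static void waiters_del(struct path *p)
{
	list_del_node(&p->link);
	path_put(p);
}
```

With this pairing, an owner dropping its own reference while the path is
still linked no longer releases the path; the release happens only after
waiters_del() has unlinked it.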
Signed-off-by: zhengbing.huang <zhengbing.huang at easystack.cn>
---
drbd/drbd_transport.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drbd/drbd_transport.c b/drbd/drbd_transport.c
index 00e7f9269..aff96716f 100644
--- a/drbd/drbd_transport.c
+++ b/drbd/drbd_transport.c
@@ -224,6 +224,7 @@ int drbd_get_listener(struct drbd_path *path)
spin_lock_bh(&listener->waiters_lock);
list_add(&path->listener_link, &listener->waiters);
+ kref_get(&path->kref);
path->listener = listener;
spin_unlock_bh(&listener->waiters_lock);
/* After exposing the listener on a path, drbd_put_listenr() can destroy it. */
@@ -258,6 +259,7 @@ void drbd_put_listener(struct drbd_path *path)
spin_lock_bh(&listener->waiters_lock);
list_del(&path->listener_link);
+ kref_put(&path->kref, drbd_destroy_path);
spin_unlock_bh(&listener->waiters_lock);
kref_put(&listener->kref, drbd_listener_destroy);
}
--
2.43.0