[PATCH 2/3] rdma: ratelimit error log
zhengbing.huang
zhengbing.huang at easystack.cn
Tue Jul 8 08:50:39 CEST 2025
We hit a kernel crash with the following call trace:
? bit_clear+0x120/0x120
fbcon_putcs+0xe7/0x100
fbcon_redraw.isra.20+0xfd/0x1e0
fbcon_scroll+0x8c9/0xde0
con_scroll+0x20b/0x220
? bit_clear+0x120/0x120
lf+0xa0/0xb0
vt_console_print+0x310/0x400
console_unlock+0x35f/0x4a0
vprintk_emit+0x14d/0x250
printk+0x58/0x6f
dtr_tx_cq_event_handler+0x895/0x8a0 [drbd_transport_rdma]
? sched_clock+0x5/0x10
? do_IRQ+0x7f/0xd0
mlx5_eq_comp_int+0xb0/0x1d0 [mlx5_core]
notifier_call_chain+0x47/0x70
atomic_notifier_call_chain+0x16/0x20
irq_int_handler+0x11/0x20 [mlx5_core]
The code at that address is:
(gdb) l *dtr_tx_cq_event_handler+0x894
0x3404 is in dtr_tx_cq_event_handler (.../drbd_transport_rdma.c:1935).
1930		if (stream_nr != ST_FLOW_CTRL) {
1931			err = dtr_repost_tx_desc(cm, tx_desc);
1932			if (!err)
1933				tx_desc = NULL; /* it is in the air again! Fly! */
1934			else
1935				tr_warn(transport, "repost of tx_desc failed! %d\n", err);
1936		}
The problem is that too many log messages are printed from interrupt
context, which crashes the kernel. So ratelimit this error message.
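
For illustration only, here is a minimal userspace sketch of the
interval/burst idea behind ratelimiting. It is loosely modeled on how the
kernel's __ratelimit() behaves and is not the kernel implementation. The
names rl_state, rl_allow() and rl_simulate() are hypothetical, and time is
passed in as abstract ticks rather than jiffies:

```c
#include <stdio.h>

/* Hypothetical userspace model; NOT the kernel's struct ratelimit_state. */
struct rl_state {
	unsigned long interval;	/* window length, in abstract ticks */
	int burst;		/* messages allowed per window */
	int printed;		/* messages allowed in the current window */
	int missed;		/* messages suppressed so far (cumulative) */
	unsigned long begin;	/* tick at which the current window started */
};

/* Return 1 if a message may be printed at tick `now`, 0 if suppressed. */
int rl_allow(struct rl_state *rs, unsigned long now)
{
	/* Start a new window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}

	rs->missed++;
	return 0;
}

/* Simulate `events` failures arriving one per tick, starting at tick 0. */
int rl_simulate(struct rl_state *rs, unsigned long events)
{
	int allowed = 0;
	unsigned long t;

	for (t = 0; t < events; t++)
		allowed += rl_allow(rs, t);

	return allowed;
}
```

With interval = 100 and burst = 3, a storm of 250 back-to-back failures
produces only 9 messages in this model; the rest are counted as missed
instead of being printed.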
Signed-off-by: zhengbing.huang <zhengbing.huang at easystack.cn>
---
drbd/drbd_transport_rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drbd/drbd_transport_rdma.c b/drbd/drbd_transport_rdma.c
index 5270e503a..30edfaf96 100644
--- a/drbd/drbd_transport_rdma.c
+++ b/drbd/drbd_transport_rdma.c
@@ -1920,7 +1920,7 @@ static int dtr_handle_tx_cq_event(struct ib_cq *cq, struct dtr_cm *cm)
 			err = dtr_repost_tx_desc(cm, tx_desc);
 			if (!err)
 				tx_desc = NULL; /* it is in the air again! Fly! */
-			else
+			else if (__ratelimit(&rdma_transport->rate_limit))
 				tr_warn(transport, "repost of tx_desc failed! %d\n", err);
 		}
 	}
--
2.43.0