Hi,
we have the impression that this reveals a bug in DRBD.
It can be triggered if
  a resource has multiple volumes
AND
  ko-count >= 1
AND
  a write request hits the timeout (ko-count * timeout).
In that case a wrong state transition confuses DRBD's state
handling.
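To make the "ko-count * timeout" condition concrete, here is a rough user-space sketch of the check that request_timer_fn() performs. It is not the kernel code: SKETCH_HZ, sketch_time_after(), effective_timeout() and peer_request_timed_out() are hypothetical names made up for illustration, and it assumes that timeout is configured in tenths of a second and that a ko-count of 0 disables the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's HZ (timer ticks per second). */
#define SKETCH_HZ 250UL

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Effective timeout in ticks. Assumption: timeout_ds is in tenths of a
 * second. With ko_count == 0 the result is 0, which we treat as "disabled".
 */
static unsigned long effective_timeout(unsigned int timeout_ds,
				       unsigned int ko_count)
{
	return (unsigned long)timeout_ds * SKETCH_HZ / 10 * ko_count;
}

/* True if a request sent at pre_send is still pending after ent ticks. */
static bool peer_request_timed_out(unsigned long now, unsigned long pre_send,
				   unsigned long ent)
{
	return ent != 0 && sketch_time_after(now, pre_send + ent);
}

/* Small self-check with example values (timeout 60, ko-count 7). */
static void sketch_self_check(void)
{
	unsigned long ent = effective_timeout(60, 7);

	assert(!peer_request_timed_out(ent, 0, ent));    /* exactly at the limit */
	assert(peer_request_timed_out(ent + 1, 0, ent)); /* one tick past it */
}
```

With ko-count >= 1 the effective timeout is non-zero, so a slow write on any volume can reach the branch that requests C_TIMEOUT. That is the branch the fix below changes.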
The fix:
diff --git a/drbd/drbd_req.c b/drbd/drbd_req.c
index 7cd9e14..b7df80e 100644
--- a/drbd/drbd_req.c
+++ b/drbd/drbd_req.c
@@ -1733,7 +1733,7 @@ void request_timer_fn(unsigned long data)
time_after(now, req_peer->pre_send_jif + ent) &&
!time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + e
drbd_warn(device, "Remote failed to finish a request within ko-count * timeout\n");
- _drbd_set_state(_NS(device, conn, C_TIMEOUT), CS_VERBOSE | CS_HARD, NULL);
+ _conn_request_state(connection, NS(conn, C_TIMEOUT), CS_VERBOSE | CS_HARD);
}
if (dt && oldest_submit_jif != now &&
time_after(now, oldest_submit_jif + dt) &&
or here:
http://git.drbd.org/gitweb.cgi?p=drbd-8.4.git;a=commit;h=79a03fc61fd04e91dd2a4562f28c57d256a075e4
best regards,
Phil