[Drbd-dev] wait_event called while nested inside another wait_event, using the same wait queue

Mon Jul 8 19:53:55 CEST 2019

change_cluster_wide_state() calls wait_event, nested inside the condition-test of another
wait_event call, using *the same wait queue* for both the outer and the inner wait calls.

I don't understand the intended kernel model precisely enough to know if this is a subtle bug,
or if it's really guaranteed OK to do this.  But it's pretty suspicious if it hasn't been
thought through carefully, so I thought I'd mention it.

The outer wait on resource->state_wait occurs when drbd_set_role() calls stable_state_change()
to wait for the condition of a successful change_role().  change_role() is called for the
condition-test and calls change_cluster_wide_state() which issues the inner (nested) wait on
resource->state_wait, waiting for cluster_wide_reply_ready().

Possibly it could work or fail depending on changes to the kernel wait implementation.
    For example, suppose the underlying wait implementation held a wait-queue lock during the
    condition check.  Calling wait_event nested inside the condition-test of another wait_event
    on the same wait_queue would lead to a recursive lock-acquisition attempt.  (On the other
    hand that may only prove that the implementation can't hold such a lock.)

The nested usage does not seem to have any bad effects (now that my wait implementation no
longer holds a wait-queue lock during the condition-test).

diff --git a/drbd/drbd_state.c b/drbd/drbd_state.c
index c3371e18..409b9afc 100644
--- a/drbd/drbd_state.c
+++ b/drbd/drbd_state.c
@@ -4195,6 +4195,18 @@ change_cluster_wide_state(bool (*change)(struct change_context *, enum change_ph
 				    &request, reach_immediately);
 	have_peers = rv == SS_CW_SUCCESS;
 	if (have_peers) {
+		// We are here as the change_state macro from inside the condition of
+		//	    wait_event_interruptible((resource)->state_wait,
+		//		    (rv = (change_state)) != SS_IN_TRANSIENT_STATE);
+		//
+		//XXX Is it really kosher to now use that same resource->state_wait
+		//    in another wait_event while we are nested within the first one?
+		//
+		// In drbd_set_role():
+		//	rv = stable_state_change(resource,
+		//		change_role(resource, role, flags, with_force, &err_str));
+		// where change_role() calls this function.
+		//
 		if (wait_event_timeout(resource->state_wait,
 				       cluster_wide_reply_ready(resource),
 				       twopc_timeout(resource)))