On Tue, Oct 14, 2014 at 10:37:13PM +0200, Lars Ellenberg wrote:
> _drbd_thread_stop takes a "wait" parameter.
> It's a helper function for drbd_thread_stop()
> and for drbd_thread_stop_nowait().
> It is possible that "somewhere" we call drbd_thread_stop()
> where we should call drbd_thread_stop_nowait().
> But I don't see where that would be.
>
> Same helper function is called for drbd_thread_restart_nowait().
> I'd say that's what happens here.
>
> So from the state change triggered by the request_timer_fn
> in softirq context ("current" is "swapper" as seen from the other mail),
> we trigger a drbd_thread_restart_nowait().
>
> Which calls kthread_create -> drbd_thread_setup
>
> And *there* it bombs out.
>
> Yes, I see it happening now.
>
> I just wonder why it does not happen *always* then, *every* time that
> request timer expires and causes the peer to be kicked out?
>
> maybe something changed around kthread_create as well.
>
> Or maybe we just have been lucky for the last few years,
> and the "wake_up_process(); wait_on_completion()" in kthread_run
> just so happened to never need to call schedule,
> because it always took the fast path.

Hm. No.

Normally, when we call this drbd_thread_restart_nowait,
the respective thread is still alive, and it just "cycles" that thread,
and does not need to create a new one.

In this case, apparently there was a race with the thread terminating
for other reasons and this call, and now we need to call into kthread_create.

Ok, we know where it is broken.
Still we'll have to think about how to fix it best.

	Lars