[DRBD-user] strange drbd bug

Lars Ellenberg lars.ellenberg at linbit.com
Wed Oct 15 14:16:29 CEST 2014

On Tue, Oct 14, 2014 at 10:37:13PM +0200, Lars Ellenberg wrote:
> _drbd_thread_stop takes a "wait" parameter.
> It's a helper function for drbd_thread_stop()
> and for drbd_thread_stop_nowait().
> It is possible that "somewhere" we call drbd_thread_stop()
> where we should call drbd_thread_stop_nowait().
> But I don't see where that would be.
> Same helper function is called for drbd_thread_restart_nowait().
> I'd say that's what happens here.
> So from the state change triggered by the request_timer_fn
> in softirq context ("current" is "swapper" as seen from the other mail),
> we trigger a drbd_thread_restart_nowait().
> Which calls kthread_create -> drbd_thread_setup
> And *there* it bombs out.
> Yes, I see it happening now.
> I just wonder why it does not happen *always* then, *every* time that
> request timer expires and causes the peer to be kicked out?
> maybe something changed around kthread_create as well.
> Or maybe we just have been lucky for the last few years,
> and the "wake_up_process(); wait_for_completion()" in kthread_run
> just so happened to never need to call schedule,
> because it always took the fast path.

Hm. No.
Normally, when we call drbd_thread_restart_nowait(),
the respective thread is still alive, so the call just
"cycles" that thread and does not need to create a new one.

In this case, apparently this call raced with the thread terminating
for other reasons, and now we need to call into

OK, we know where it is broken.
We still have to think about how best to fix it.
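
To illustrate the two cases described above, here is a minimal user-space
sketch. It is not DRBD code: the sketch_* names, the state enum and the
atomic_ctx flag are all hypothetical and only model the reasoning. If the
thread is still alive, a restart request only flips a state flag, which is
harmless in any context. If the thread is already gone, the request has to
create a new thread, and thread creation may sleep, which is not allowed
from softirq context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread states; not the names DRBD uses. */
enum sketch_state { SK_NONE, SK_RUNNING, SK_RESTARTING };

struct sketch_thread {
    enum sketch_state state;
    int creations;              /* how often a new thread was created */
};

/*
 * Stand-in for thread creation. Creating a thread may sleep, so this
 * model refuses to do it when the caller is in atomic (softirq) context.
 */
static bool sketch_create(struct sketch_thread *t, bool atomic_ctx)
{
    assert(t != NULL);
    if (atomic_ctx)
        return false;           /* the case that "bombs out" */
    t->state = SK_RUNNING;
    t->creations++;
    return true;
}

/*
 * Restart request. Returns true if the request could be handled
 * safely in the given context.
 */
static bool sketch_restart_nowait(struct sketch_thread *t, bool atomic_ctx)
{
    assert(t != NULL);
    switch (t->state) {
    case SK_RUNNING:
    case SK_RESTARTING:
        /* Thread is alive: just "cycle" it, no new thread needed. */
        t->state = SK_RESTARTING;
        return true;
    case SK_NONE:
        /* Thread already terminated: a new one must be created. */
        return sketch_create(t, atomic_ctx);
    }
    return false;
}
```

In this model, a restart from atomic context succeeds as long as the thread
is still running, and fails only when the thread terminated first, which
would explain why the problem does not show up every time the timer expires.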

