[Drbd-dev] Avoid nested sleeping on TCP connect

Lars Ellenberg lars.ellenberg at linbit.com
Mon Feb 20 15:07:15 CET 2017


On Mon, Feb 20, 2017 at 11:54:45AM +0100, Andreas Osterburg wrote:
> Recent Linux kernels (since 3.19) emit a warning when using nested sleeping
> statements within kernel code. CONFIG_DEBUG_ATOMIC_SLEEP must be enabled to
> see it.
> Module drbd_transport_tcp is affected and always triggers a warning
> on first connect:
> [ 6187.934573] WARNING: CPU: 33 PID: 17430 at ../kernel/sched/core.c:7963 __might_sleep+0x76/0x80()
> [ 6187.934580] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810c2dce>] prepare_to_wait_event+0x5e/0xf0

> [ 6187.934926]  [<ffffffff810a30b6>] __might_sleep+0x76/0x80
> [ 6187.934936]  [<ffffffff8160984c>] mutex_lock+0x1c/0x38
> [ 6187.934981]  [<ffffffffa05ba8f0>] dtt_wait_connect_cond+0x20/0xa0 [drbd_transport_tcp]
> [ 6187.935017]  [<ffffffffa05bb3ce>] dtt_wait_for_connect.constprop.10+0x29e/0x440 [drbd_transport_tcp]
> [ 6187.935033]  [<ffffffffa05bbde7>] dtt_connect+0x247/0x7b7 [drbd_transport_tcp]
> [ 6187.935072]  [<ffffffffa05300e1>] drbd_receiver+0x171/0x680 [drbd]

> I fixed this; the patch is attached to this mail. If it is OK, someone should apply it.

Looks almost correct (the loop around wait_woken() is missing).
I don't yet see a real problem with this particular code, though;
even just annotating that "this is ok" so the warning goes away
would be "legal" (sched_annotate_sleep() before mutex_lock()).
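
To illustrate the missing loop, here is a minimal userspace sketch of the
control flow only. It assumes hypothetical stand-ins (struct model,
model_cond(), model_wait_woken(), wait_for_path()) in place of
dtt_wait_connect_cond() and wait_woken(); none of these names exist in
DRBD or the kernel, and this is not the fix that was eventually applied.
It shows the condition being re-checked after every wakeup, with the
remaining timeout carried into the next wait.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins, NOT the kernel API: they only model
 * the control flow so the loop can be compiled and exercised. */
struct model {
	int checks_until_ready;	/* condition turns true on this check */
	long jiffies_per_wait;	/* time consumed by each early wakeup */
	int waits;		/* how often we went to sleep */
};

/* Stands in for dtt_wait_connect_cond(). In the real code it may take a
 * mutex; that is harmless when it is called in the running state. */
static bool model_cond(struct model *m)
{
	if (m->checks_until_ready > 0)
		m->checks_until_ready--;
	return m->checks_until_ready == 0;
}

/* Stands in for wait_woken(): "sleeps", then returns the time left. */
static long model_wait_woken(struct model *m, long timeo)
{
	m->waits++;
	return timeo > m->jiffies_per_wait ? timeo - m->jiffies_per_wait : 0;
}

/* Looping variant: re-check after every wakeup and keep the remaining
 * timeout, instead of giving up after a single wait. */
static long wait_for_path(struct model *m, long timeo, bool *found)
{
	*found = false;
	for (;;) {
		if (model_cond(m)) {
			*found = true;
			break;
		}
		if (timeo <= 0)
			break;
		timeo = model_wait_woken(m, timeo);
	}
	return timeo;
}
```

With checks_until_ready = 3, jiffies_per_wait = 10 and a timeout of 100,
wait_for_path() sleeps twice and returns 80 with *found set. A single
wait followed by "timeo = 0" would have given up after the first wakeup.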

We are discussing whether to replace the mutex_lock()
with a mutex_trylock(), or even with a spinlock.
Either way, a real fix should be in "soon".

Thanks,

    Lars

> --- drbd/drbd_transport_tcp.c	2016-12-06 16:20:39.000000000 +0100
> +++ drbd/drbd_transport_tcp.c	2017-02-20 11:23:46.794979063 +0100
> @@ -568,6 +568,7 @@
>  	struct drbd_path *drbd_path2;
>  	struct dtt_listener *listener = container_of(drbd_listener, struct dtt_listener, listener);
>  	struct dtt_path *path = NULL;
> +	DEFINE_WAIT_FUNC(wait_connect, woken_wake_function);
>  
>  	rcu_read_lock();
>  	nc = rcu_dereference(transport->net_conf);
> @@ -582,9 +583,15 @@
>  	timeo += (prandom_u32() & 1) ? timeo / 7 : -timeo / 7; /* 28.5% random jitter */
>  
>  retry:
> -	timeo = wait_event_interruptible_timeout(listener->wait,
> -			(path = dtt_wait_connect_cond(transport)),
> -			timeo);
> +	add_wait_queue(&listener->wait, &wait_connect);
> +	path = dtt_wait_connect_cond(transport);
> +	if(!path) {
> +		wait_woken(&wait_connect, TASK_INTERRUPTIBLE, timeo);
> +		path = dtt_wait_connect_cond(transport);
> +		if(!path) timeo = 0;
> +	}
> +	remove_wait_queue(&listener->wait, &wait_connect);
> +
>  	if (timeo <= 0)
>  		return -EAGAIN;


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
