[Drbd-dev] Avoid nested sleeping on TCP connect
Lars Ellenberg
lars.ellenberg at linbit.com
Mon Feb 20 15:07:15 CET 2017
On Mon, Feb 20, 2017 at 11:54:45AM +0100, Andreas Osterburg wrote:
> Recent Linux kernels (since 3.19) emit a warning when kernel code calls a
> blocking operation while it is already preparing to sleep (nested sleeping).
> CONFIG_DEBUG_ATOMIC_SLEEP must be enabled to see it.
> Module drbd_transport_tcp is affected and always triggers a warning
> on first connect:
> [ 6187.934573] WARNING: CPU: 33 PID: 17430 at ../kernel/sched/core.c:7963 __might_sleep+0x76/0x80()
> [ 6187.934580] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810c2dce>] prepare_to_wait_event+0x5e/0xf0
> [ 6187.934926] [<ffffffff810a30b6>] __might_sleep+0x76/0x80
> [ 6187.934936] [<ffffffff8160984c>] mutex_lock+0x1c/0x38
> [ 6187.934981] [<ffffffffa05ba8f0>] dtt_wait_connect_cond+0x20/0xa0 [drbd_transport_tcp]
> [ 6187.935017] [<ffffffffa05bb3ce>] dtt_wait_for_connect.constprop.10+0x29e/0x440 [drbd_transport_tcp]
> [ 6187.935033] [<ffffffffa05bbde7>] dtt_connect+0x247/0x7b7 [drbd_transport_tcp]
> [ 6187.935072] [<ffffffffa05300e1>] drbd_receiver+0x171/0x680 [drbd]
> I fixed this; the patch is attached to this mail. If it is OK, someone should apply it.
Looks almost correct, but the loop is missing: wait_woken() is called
only once, so a wakeup that does not yield a path is treated as a timeout.
I don't yet see a real problem with this particular code.
Even just annotating that "this is ok", so that the warning goes away,
would be "legal" (sched_annotate_sleep() before mutex_lock()).
We are discussing whether to replace the mutex_lock
by a mutex_trylock, or even by a spinlock.
Either way, a real fix should be in "soon".
Thanks,
Lars
> --- drbd/drbd_transport_tcp.c 2016-12-06 16:20:39.000000000 +0100
> +++ drbd/drbd_transport_tcp.c 2017-02-20 11:23:46.794979063 +0100
> @@ -568,6 +568,7 @@
>  	struct drbd_path *drbd_path2;
>  	struct dtt_listener *listener = container_of(drbd_listener, struct dtt_listener, listener);
>  	struct dtt_path *path = NULL;
> +	DEFINE_WAIT_FUNC(wait_connect, woken_wake_function);
>
>  	rcu_read_lock();
>  	nc = rcu_dereference(transport->net_conf);
> @@ -582,9 +583,15 @@
>  	timeo += (prandom_u32() & 1) ? timeo / 7 : -timeo / 7; /* 28.5% random jitter */
>
>  retry:
> -	timeo = wait_event_interruptible_timeout(listener->wait,
> -			(path = dtt_wait_connect_cond(transport)),
> -			timeo);
> +	add_wait_queue(&listener->wait, &wait_connect);
> +	path = dtt_wait_connect_cond(transport);
> +	if(!path) {
> +		wait_woken(&wait_connect, TASK_INTERRUPTIBLE, timeo);
> +		path = dtt_wait_connect_cond(transport);
> +		if(!path) timeo = 0;
> +	}
> +	remove_wait_queue(&listener->wait, &wait_connect);
> +
>  	if (timeo <= 0)
>  		return -EAGAIN;
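
To illustrate the "loop is missing" remark, here is a small userspace
model in C. It is only a sketch, not kernel code and not a proposed fix.
The names struct wakeup, wait_once and wait_loop are made up for this
example and do not exist in DRBD. Each array entry stands for one return
from wait_woken(); "timeo -= slept" stands in for
"timeo = wait_woken(&wait_connect, TASK_INTERRUPTIBLE, timeo)".
Signal handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one wakeup of the waiter: how many jiffies were
 * spent asleep, and whether a path is ready when the condition is
 * re-checked afterwards. */
struct wakeup {
	long slept;
	bool path_ready;
};

/* Shape of the posted patch: sleep at most once, then give up.
 * Returns the timeout the caller continues with; 0 means -EAGAIN. */
long wait_once(const struct wakeup *w, size_t n, long timeo)
{
	assert(timeo > 0);
	if (n == 0 || w[0].slept >= timeo)
		return 0;	/* slept through the whole timeout */
	if (!w[0].path_ready)
		return 0;	/* the patch sets timeo = 0 here */
	return timeo;		/* return value of wait_woken() is ignored */
}

/* Same wait with the loop added: re-check the condition after every
 * wakeup and go back to sleep with the remaining timeout. */
long wait_loop(const struct wakeup *w, size_t n, long timeo)
{
	assert(timeo > 0);
	for (size_t i = 0; i < n; i++) {
		if (w[i].slept >= timeo)
			return 0;	/* timeout expired during this sleep */
		timeo -= w[i].slept;	/* remaining time after the wakeup */
		if (w[i].path_ready)
			return timeo;	/* success, remaining time is > 0 */
	}
	return 0;			/* no more wakeups: sleep until timeout */
}
```

With a timeout of 100 and two wakeups, { 10, false } followed by
{ 20, true }, wait_once() returns 0 (the caller would return -EAGAIN
after only 10 jiffies), while wait_loop() returns 70 because it goes
back to sleep after the first wakeup.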
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT