[Drbd-dev] Avoid nested sleeping on TCP connect

Andreas Osterburg andreas.osterburg at digide.net
Mon Feb 20 17:58:16 CET 2017


Thanks for your investigations.
I didn't use a loop because the old behaviour was to leave the function, returning -EAGAIN,
on timeout or interrupt. There is just one difference: when an event from the socket occurs
and no TCP connection is established, the function returns before the timeout elapses. That
makes no real difference compared to an interrupt, so I didn't handle it specially.
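
To make the difference described above concrete, here is a minimal userspace
model in C (not kernel code). It compares a single-shot wait, as in the patch,
with a looped wait that carries the remaining timeout forward. Everything in it
is an assumption made for illustration: struct model, cond_ready(),
model_wait_woken(), wait_once() and wait_loop() are hypothetical stand-ins, and
model_wait_woken() only mimics the "returns the remaining timeout" contract of
the kernel's wait_woken(). Signal handling is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of this is DRBD or kernel code. */
struct model {
	int wakeups_until_ready; /* wake-ups needed before the condition holds; negative = never */
	long slice;              /* time units that pass before each wake-up */
};

/* Stand-in for "dtt_wait_connect_cond() returned a path". */
static int cond_ready(const struct model *m)
{
	return m->wakeups_until_ready == 0;
}

/*
 * Stand-in for wait_woken(): consume time until the next wake-up or until
 * the timeout runs out, and return the remaining timeout.
 */
static long model_wait_woken(struct model *m, long timeo)
{
	long used;

	assert(timeo >= 0);
	used = m->slice < timeo ? m->slice : timeo;
	/* A full slice elapsed, so a wake-up (socket event) arrived. */
	if (used == m->slice && m->wakeups_until_ready > 0)
		m->wakeups_until_ready--;
	return timeo - used;
}

/* Single-shot variant: any wake-up without a connection ends the wait. */
static int wait_once(struct model *m, long timeo)
{
	if (!cond_ready(m)) {
		model_wait_woken(m, timeo);
		if (!cond_ready(m))
			timeo = 0;
	}
	return timeo <= 0 ? -EAGAIN : 0;
}

/* Looped variant: re-check the condition and wait again with what is left. */
static int wait_loop(struct model *m, long timeo)
{
	while (!cond_ready(m) && timeo > 0)
		timeo = model_wait_woken(m, timeo);
	return cond_ready(m) ? 0 : -EAGAIN;
}
```

With a model that needs two wake-ups before the condition holds, wait_once()
gives up with -EAGAIN after the first wake-up, while wait_loop() waits again
with the remaining time and succeeds. In the kernel, a looped form would
presumably assign the return value of wait_woken() back to timeo and also stop
on signal_pending(current).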

Thanks,

Andreas Osterburg

On 20.02.2017 at 15:07, Lars Ellenberg wrote:
> On Mon, Feb 20, 2017 at 11:54:45AM +0100, Andreas Osterburg wrote:
>> Recent Linux kernels (since 3.19) emit a warning when kernel code uses nested
>> sleeping statements. CONFIG_DEBUG_ATOMIC_SLEEP must be enabled to
>> see it.
>> Module drbd_transport_tcp is affected and always triggers a warning
>> on first connect:
>> [ 6187.934573] WARNING: CPU: 33 PID: 17430 at ../kernel/sched/core.c:7963 __might_sleep+0x76/0x80()
>> [ 6187.934580] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810c2dce>] prepare_to_wait_event+0x5e/0xf0
>
>> [ 6187.934926]  [<ffffffff810a30b6>] __might_sleep+0x76/0x80
>> [ 6187.934936]  [<ffffffff8160984c>] mutex_lock+0x1c/0x38
>> [ 6187.934981]  [<ffffffffa05ba8f0>] dtt_wait_connect_cond+0x20/0xa0 [drbd_transport_tcp]
>> [ 6187.935017]  [<ffffffffa05bb3ce>] dtt_wait_for_connect.constprop.10+0x29e/0x440 [drbd_transport_tcp]
>> [ 6187.935033]  [<ffffffffa05bbde7>] dtt_connect+0x247/0x7b7 [drbd_transport_tcp]
>> [ 6187.935072]  [<ffffffffa05300e1>] drbd_receiver+0x171/0x680 [drbd]
>
>> I fixed this; the patch is attached to this mail. If it is OK, someone should apply it.
>
> Looks almost correct (loop is missing).
> I don't yet see the real problem with this particular code,
> even just annotating that "this is ok" so the warning goes away
> would be "legal". (sched_annotate_sleep() before mutex_lock()).
>
> We are discussing whether to replace the mutex_lock
> with a mutex_trylock, or even with a spinlock.
> Either way, a real fix should be in "soon".
>
> Thanks,
>
>     Lars
>
>> --- drbd/drbd_transport_tcp.c	2016-12-06 16:20:39.000000000 +0100
>> +++ drbd/drbd_transport_tcp.c	2017-02-20 11:23:46.794979063 +0100
>> @@ -568,6 +568,7 @@
>>  	struct drbd_path *drbd_path2;
>>  	struct dtt_listener *listener = container_of(drbd_listener, struct dtt_listener, listener);
>>  	struct dtt_path *path = NULL;
>> +	DEFINE_WAIT_FUNC(wait_connect, woken_wake_function);
>>
>>  	rcu_read_lock();
>>  	nc = rcu_dereference(transport->net_conf);
>> @@ -582,9 +583,15 @@
>>  	timeo += (prandom_u32() & 1) ? timeo / 7 : -timeo / 7; /* 28.5% random jitter */
>>
>>  retry:
>> -	timeo = wait_event_interruptible_timeout(listener->wait,
>> -			(path = dtt_wait_connect_cond(transport)),
>> -			timeo);
>> +	add_wait_queue(&listener->wait, &wait_connect);
>> +	path = dtt_wait_connect_cond(transport);
>> +	if(!path) {
>> +		wait_woken(&wait_connect, TASK_INTERRUPTIBLE, timeo);
>> +		path = dtt_wait_connect_cond(transport);
>> +		if(!path) timeo = 0;
>> +	}
>> +	remove_wait_queue(&listener->wait, &wait_connect);
>> +
>>  	if (timeo <= 0)
>>  		return -EAGAIN;
>
>


-- 
Andreas Osterburg
IT Software GmbH & Data Security Elbe KG       Tel.: +49 (391) 509609-55
Lorenzweg 42 - Haus 3, D-39124 Magdeburg       Fax : +49 (391) 509609-56
Geschäftsführer: Jens Henning             Amtsgericht Stendal, HRA 22588
Zertifiziert nach ISO 9001:2008

