[Drbd-dev] Avoid nested sleeping on TCP connect

Andreas Osterburg andreas.osterburg at digide.net
Mon Feb 20 11:54:45 CET 2017


Recent Linux-kernels (since 3.19) emit a warning when using nested sleeping
statements within kernel code. CONFIG_DEBUG_ATOMIC_SLEEP must be enabled to
see it.
Module drbd_transport_tcp is affected and always triggers a warning
on first connect:

[ 6187.804767] drbd: module cleanup done.
[ 6187.903039] drbd: initialized. Version: 9.0.1-1 (api:2/proto:86-111)
[ 6187.903043] drbd: GIT-hash: 86e443973082570aeb651848db89e0c7b995c306 build by abuild
[ 6187.903048] drbd: registered as block device major 147
[ 6187.913714] drbd test: Starting worker thread (from drbdsetup [17413])
[ 6187.929138] drbd test hidden0: Starting sender thread (from drbdsetup [17423])
[ 6187.933323] drbd test hidden0: conn( StandAlone -> Unconnected )
[ 6187.933488] drbd test hidden0: Starting receiver thread (from drbd_w_test [17414])
[ 6187.933739] drbd test hidden0: conn( Unconnected -> Connecting )
[ 6187.934552] ------------[ cut here ]------------
[ 6187.934573] WARNING: CPU: 33 PID: 17430 at ../kernel/sched/core.c:7963 __might_sleep+0x76/0x80()
[ 6187.934580] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810c2dce>] prepare_to_wait_event+0x5e/0xf0
[ 6187.934582] Modules linked in: drbd_transport_tcp(O) drbd(O) dm_mod libcrc32c nf_log_ipv6 xt_pkttype xt_physdev nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet 8021q garp mrp br_netfilter bridge stp llc bonding iscsi_ibft iscsi_boot_sysfs mptctl mptbase ip6t_REJECT nf_reject_ipv6 xt_tcpudp 
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4 iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables kvm_amd kvm ixgbe irqbypass 
crct10dif_pclmul crc32_pclmul crc32c_intel mgag200 vxlan drbg ip6_udp_tunnel ttm udp_tunnel igb ptp drm_kms_helper ansi_cprng mdio pps_core joydev cdc_acm amd64_edac_mod edac_mce_amd drm fam15h_power edac_core fb_sys_fops syscopyarea k10temp sysfillrect aesni_intel sp5100_tco aes_x86_64 dca lrw 
sysimgblt gf128mul glue_helper ablk_helper cryptd pcspkr i2c_piix4 i2c_algo_bit shpchp fjes button acpi_cpufreq processor ext4 crc16 jbd2 mbcache sd_mod hid_generic sr_mod usbhid cdrom ohci_pci ahci ehci_pci ohci_hcd libahci ehci_hcd serio_raw mpt3sas usbcore raid_class libata usb_common 
megaraid_sas scsi_transport_sas sg scsi_mod autofs4 [last unloaded: drbd]
[ 6187.934784] CPU: 33 PID: 17430 Comm: drbd_r_test Tainted: G           O     4.4.46-11-default #1
[ 6187.934821] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.5b       03/25/2016
[ 6187.934833]  0000000000000000 ffffffff81328657 ffff8847feb4bbc8 ffffffff81a60b92
[ 6187.934835]  ffffffff8107e951 ffffffff81a61cac ffff8847feb4bc18 0000000000000061
[ 6187.934845]  0000000000000000 ffff883fe1795868 ffffffff8107e9cc ffffffff81a51610
[ 6187.934855] Call Trace:
[ 6187.934877]  [<ffffffff81019ea9>] dump_trace+0x59/0x320
[ 6187.934885]  [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
[ 6187.934893]  [<ffffffff8101b011>] show_stack+0x21/0x40
[ 6187.934901]  [<ffffffff81328657>] dump_stack+0x5c/0x85
[ 6187.934910]  [<ffffffff8107e951>] warn_slowpath_common+0x81/0xb0
[ 6187.934918]  [<ffffffff8107e9cc>] warn_slowpath_fmt+0x4c/0x50
[ 6187.934926]  [<ffffffff810a30b6>] __might_sleep+0x76/0x80
[ 6187.934936]  [<ffffffff8160984c>] mutex_lock+0x1c/0x38
[ 6187.934981]  [<ffffffffa05ba8f0>] dtt_wait_connect_cond+0x20/0xa0 [drbd_transport_tcp]
[ 6187.935017]  [<ffffffffa05bb3ce>] dtt_wait_for_connect.constprop.10+0x29e/0x440 [drbd_transport_tcp]
[ 6187.935033]  [<ffffffffa05bbde7>] dtt_connect+0x247/0x7b7 [drbd_transport_tcp]
[ 6187.935072]  [<ffffffffa05300e1>] drbd_receiver+0x171/0x680 [drbd]
[ 6187.935102]  [<ffffffffa053a897>] drbd_thread_setup+0x67/0x120 [drbd]
[ 6187.935115]  [<ffffffff8109d438>] kthread+0xc8/0xe0
[ 6187.935123]  [<ffffffff8160be4f>] ret_from_fork+0x3f/0x70
[ 6187.938887] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
[ 6187.938904]
[ 6187.938907] Leftover inexact backtrace:
[ 6187.938907]
[ 6187.938915]  [<ffffffff8109d370>] ? kthread_park+0x50/0x50
[ 6187.938934] ---[ end trace 724c9d4565599062 ]---

I fixed this, the patch is attached on this mail. When it is ok, someone should apply it.

Thanks,
Andreas Osterburg

-------------- next part --------------
--- drbd/drbd_transport_tcp.c	2016-12-06 16:20:39.000000000 +0100
+++ drbd/drbd_transport_tcp.c	2017-02-20 11:23:46.794979063 +0100
@@ -568,6 +568,7 @@
 	struct drbd_path *drbd_path2;
 	struct dtt_listener *listener = container_of(drbd_listener, struct dtt_listener, listener);
 	struct dtt_path *path = NULL;
+	DEFINE_WAIT_FUNC(wait_connect, woken_wake_function);
 
 	rcu_read_lock();
 	nc = rcu_dereference(transport->net_conf);
@@ -582,9 +583,15 @@
 	timeo += (prandom_u32() & 1) ? timeo / 7 : -timeo / 7; /* 28.5% random jitter */
 
 retry:
-	timeo = wait_event_interruptible_timeout(listener->wait,
-			(path = dtt_wait_connect_cond(transport)),
-			timeo);
+	add_wait_queue(&listener->wait, &wait_connect);
+	path = dtt_wait_connect_cond(transport);
+	if(!path) {
+		wait_woken(&wait_connect, TASK_INTERRUPTIBLE, timeo);
+		path = dtt_wait_connect_cond(transport);
+		if(!path) timeo = 0;
+	}
+	remove_wait_queue(&listener->wait, &wait_connect);
+
 	if (timeo <= 0)
 		return -EAGAIN;
 


More information about the drbd-dev mailing list