Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all, after almost 1 month I noticed the following strange BUG: drbd TOP: Starting receiver thread (from swapper/0 [0]) BUG: scheduling while atomic: swapper/0/0/0x00000100 Modules linked in: iscsi_scst(O) ib_srpt(O) scst_vdisk(O) scst(O) crc32c_intel drbd(O) libcrc32c autofs4 cpufreq_ondemand bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa uinput iTCO_wdt iTCO_vendor_support raid0 raid10 microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core sg e1000e ptp pps_core ib_qib dca ib_mad ib_core acpi_cpufreq freq_table ext4 jbd2 mbcache raid1 sd_mod crc_t10dif crct10dif_common ahci libahci video 3w_9xxx dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 0000000000000000 ffff8802370039c0 ffffffff8153b1c2 0000000000000000 ffff880237012d00 ffff8802370039d0 ffffffff8107ac0b ffff880237003a70 ffffffff8153d7dd ffff880237003a00 ffff880230e38480 0000000000000024 Call Trace: <IRQ> [<ffffffff8153b1c2>] dump_stack+0x49/0x5f [<ffffffff8107ac0b>] __schedule_bug+0x4b/0x60 [<ffffffff8153d7dd>] __schedule+0x6ad/0x6f0 [<ffffffff8153d8e9>] schedule+0x29/0x70 [<ffffffff8153b375>] schedule_timeout+0x195/0x220 [<ffffffff8153f6aa>] ? common_interrupt+0x6a/0x6a [<ffffffff8153dcb6>] wait_for_completion+0xc6/0x100 [<ffffffff8107ff80>] ? try_to_wake_up+0x2c0/0x2c0 [<ffffffff8106ee1d>] kthread_create_on_node+0x9d/0x120 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff8153b144>] ? printk+0x4d/0x4f [<ffffffff8105b176>] ? __flush_signals+0x36/0x40 [<ffffffffa0533a7a>] drbd_thread_start+0x11a/0x290 [drbd] [<ffffffffa0533cd8>] _drbd_thread_stop+0xe8/0x100 [drbd] [<ffffffffa0544996>] __drbd_set_state+0xb86/0xcd0 [drbd] [<ffffffffa0525c89>] _drbd_set_state.clone.0+0x39/0x50 [drbd] [<ffffffffa0527f2c>] request_timer_fn+0x3fc/0x560 [drbd] [<ffffffff810668a0>] ? flush_delayed_work+0x40/0x40 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff810581d9>] call_timer_fn+0x49/0x120 [<ffffffff81058db5>] run_timer_softirq+0x225/0x290 [<ffffffff8126bd10>] ? timerqueue_add+0x60/0xb0 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff81050dbf>] __do_softirq+0xef/0x260 [<ffffffff8107263b>] ? hrtimer_interrupt+0x15b/0x250 [<ffffffff8154905c>] call_softirq+0x1c/0x30 [<ffffffff81005805>] do_softirq+0x55/0x90 [<ffffffff81050af5>] irq_exit+0xb5/0xc0 [<ffffffff8154987a>] smp_apic_timer_interrupt+0x4a/0x5a [<ffffffff8154858a>] apic_timer_interrupt+0x6a/0x70 <EOI> [<ffffffff81434a6f>] ? cpuidle_enter_state+0x4f/0xd0 [<ffffffff81434a6b>] ? cpuidle_enter_state+0x4b/0xd0 [<ffffffff81434e6f>] cpuidle_idle_call+0xcf/0x160 [<ffffffff8100c5ee>] arch_cpu_idle+0xe/0x30 [<ffffffff8109ed5a>] cpu_idle_loop+0x9a/0x230 [<ffffffff8109ef60>] cpu_startup_entry+0x70/0x80 [<ffffffff81533f07>] rest_init+0x77/0x80 [<ffffffff81b1017c>] start_kernel+0x3e3/0x3f0 [<ffffffff81b0fbc7>] ? repair_env_string+0x5b/0x5b [<ffffffff81b0f59d>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81b0f6d8>] x86_64_start_kernel+0x139/0x140 bad: scheduling from the idle thread! CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 ffffffff81a10480 ffff880237003970 ffffffff8153b1c2 0000000000000000 ffff880237012d00 ffff880237003990 ffffffff810829f9 ffff8802370039a0 ffff880237012d00 ffff8802370039c0 ffffffff8107add3 ffff8802370039c0 Call Trace: <IRQ> [<ffffffff8153b1c2>] dump_stack+0x49/0x5f [<ffffffff810829f9>] dequeue_task_idle+0x29/0x40 [<ffffffff8107add3>] dequeue_task+0x93/0xb0 [<ffffffff8107ae12>] deactivate_task+0x22/0x30 [<ffffffff8153d5bc>] __schedule+0x48c/0x6f0 [<ffffffff8153d8e9>] schedule+0x29/0x70 [<ffffffff8153b375>] schedule_timeout+0x195/0x220 [<ffffffff8153f6aa>] ? common_interrupt+0x6a/0x6a [<ffffffff8153dcb6>] wait_for_completion+0xc6/0x100 [<ffffffff8107ff80>] ? try_to_wake_up+0x2c0/0x2c0 [<ffffffff8106ee1d>] kthread_create_on_node+0x9d/0x120 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff8153b144>] ? printk+0x4d/0x4f [<ffffffff8105b176>] ? __flush_signals+0x36/0x40 [<ffffffffa0533a7a>] drbd_thread_start+0x11a/0x290 [drbd] [<ffffffffa0533cd8>] _drbd_thread_stop+0xe8/0x100 [drbd] [<ffffffffa0544996>] __drbd_set_state+0xb86/0xcd0 [drbd] [<ffffffffa0525c89>] _drbd_set_state.clone.0+0x39/0x50 [drbd] [<ffffffffa0527f2c>] request_timer_fn+0x3fc/0x560 [drbd] [<ffffffff810668a0>] ? flush_delayed_work+0x40/0x40 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff810581d9>] call_timer_fn+0x49/0x120 [<ffffffff81058db5>] run_timer_softirq+0x225/0x290 [<ffffffff8126bd10>] ? timerqueue_add+0x60/0xb0 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff81050dbf>] __do_softirq+0xef/0x260 [<ffffffff8107263b>] ? hrtimer_interrupt+0x15b/0x250 [<ffffffff8154905c>] call_softirq+0x1c/0x30 [<ffffffff81005805>] do_softirq+0x55/0x90 [<ffffffff81050af5>] irq_exit+0xb5/0xc0 [<ffffffff8154987a>] smp_apic_timer_interrupt+0x4a/0x5a [<ffffffff8154858a>] apic_timer_interrupt+0x6a/0x70 <EOI> [<ffffffff81434a6f>] ? cpuidle_enter_state+0x4f/0xd0 [<ffffffff81434a6b>] ? cpuidle_enter_state+0x4b/0xd0 [<ffffffff81434e6f>] cpuidle_idle_call+0xcf/0x160 [<ffffffff8100c5ee>] arch_cpu_idle+0xe/0x30 [<ffffffff8109ed5a>] cpu_idle_loop+0x9a/0x230 [<ffffffff8109ef60>] cpu_startup_entry+0x70/0x80 [<ffffffff81533f07>] rest_init+0x77/0x80 [<ffffffff81b1017c>] start_kernel+0x3e3/0x3f0 [<ffffffff81b0fbc7>] ? repair_env_string+0x5b/0x5b [<ffffffff81b0f59d>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81b0f6d8>] x86_64_start_kernel+0x139/0x140 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP Modules linked in: iscsi_scst(O) ib_srpt(O) scst_vdisk(O) scst(O) crc32c_intel drbd(O) libcrc32c autofs4 cpufreq_ondemand bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa uinput iTCO_wdt iTCO_vendor_support raid0 raid10 microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core sg e1000e ptp pps_core ib_qib dca ib_mad ib_core acpi_cpufreq freq_table ext4 jbd2 mbcache raid1 sd_mod crc_t10dif crct10dif_common ahci libahci video 3w_9xxx dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd] CPU: 7 PID: 52888 Comm: kthreadd Tainted: G W O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff8800a3893540 ti: ffff8800771ac000 task.ti: ffff8800771ac000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8800771add60 EFLAGS: 00010046 RAX: ffffffff81613440 RBX: ffff880237012d00 RCX: 000000000000b7a8 RDX: 0000000000000005 RSI: ffffffff81a10480 RDI: ffff880237012d00 RBP: ffff8800771add88 R08: 0000000000000000 R09: 00087a835b6f23c0 R10: ffffffff816133e5 R11: 0000000000000001 R12: ffffffff81a10480 R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000086 FS: 0000000000000000(0000) GS:ffff8802370e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000001407e0 Stack: ffffffff8107ae8a 00007f1bb02036a0 ffffffff81a10480 ffff880237012d00 ffffffff81a10c84 ffff8800771add98 ffffffff8107b992 ffff8800771addb8 ffffffff8107b9c1 ffff880237012d00 ffffffff81a10480 ffff8800771addd8 Call Trace: [<ffffffff8107ae8a>] ? enqueue_task+0x6a/0x80 [<ffffffff8107b992>] activate_task+0x22/0x30 [<ffffffff8107b9c1>] ttwu_activate+0x21/0x50 [<ffffffff8107c2bc>] ttwu_do_activate.clone.0+0x3c/0x60 [<ffffffff8107fec6>] try_to_wake_up+0x206/0x2c0 [<ffffffff8107ff92>] default_wake_function+0x12/0x20 [<ffffffff81077659>] __wake_up_common+0x59/0x90 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff81079666>] complete+0x46/0x60 [<ffffffff8106e97e>] kthread+0x9e/0xe0 [<ffffffff8106e8e0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8154792c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106e8e0>] ? kthread_freezable_should_stop+0x70/0x70 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8800771add60> CR2: 0000000000000000 ---[ end trace f6ab84a85de85d6c ]--- Kernel panic - not syncing: Fatal exception Discussed this topic with Bart from SCST, he suggested: "To me this looks like a DRBD bug. If a timeout of a DRBD request occurs the DRBD timeout handler (request_timer_fn()) is called, that function grabs a spinlock and next calls kthread_create_on_node() with that spinlock held. kthread_create_on_node() can sleep which means that it is not allowed to call that function with a spinlock held. I'm not sure why this particular behavior has not yet been noticed before. Can you post the first call stack from the e-mail at the start of this thread together with the kernel version number on a DRBD-related mailing list and ask the DRBD authors for their opinion ?" Thanks in advance -- #Matteo Tescione #RMnet srl