Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, I have drbd 8.x in production for many years, without problems. I have a single primary setup, SAS disks are in hardware RAID5, partitions are assigned to drbd. On top of drbd, there are virtual machines, which use drbd resources as raw disks. I migrate from an old pair of servers to a new pair of servers and to drbd 9.0.5. It went well until last midnight. Resource R5 was doing initial synchronization between the two new servers, both running drbd 9.0.5. Resource R4 probably received some read/write load on the primary node, as it ran nightly backup. The bug came on the server that was the target of the synchronisation, and drbd secondary. OS: Debian Jessie (stable) with Debian's stock kernel, and I compiled drbd 9.0.5 myself from the tarball. I'd like to know if this is a drbd bug, or a hardware issue? Best regards, Laszlo Fiat Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.104776] drbd r4/0 drbd4: ASSERTION test_bit(BME_NO_WRITES, &bm_ext->flags) FAILED in drbd_try_rs_begin_io Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.107260] ------------[ cut here ]------------ Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.109802] kernel BUG at /home/lanlaf/data/comp/drbd-9.0.5-1/drbd/lru_cache.c:571! Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.112623] invalid opcode: 0000 [#1] SMP Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.115433] Modules linked in: drbd_transport_tcp(O) drbd(O) libcrc32c crc32c_generic bridge stp llc nls_utf8 nls_cp437 x86_pkg_temp_thermal vfat fat coretemp kvm_intel kvm ttm drm_kms_helper crc32_pclmul drm hid_generic aesni_intel aes_x86_64 cdc_ether usbnet mii lrw usbhid i2c_algo_bit gf128mul hid efi_pstore glue_helper ablk_helper efivars evdev ipmi_devintf iTCO_wdt iTCO_vendor_support pcspkr acpi_power_meter shpchp lpc_ich mxm_wmi i2c_i801 i2c_core mfd_core cryptd processor tpm_tis ipmi_si thermal_sys ipmi_msghandler tpm acpi_pad button mei_me mei wmi autofs4 ext4 crc16 mbcache jbd2 sd_mod sg crc_t10dif ses crct10dif_generic enclosure crct10dif_pclmul crct10dif_common crc32c_intel ehci_pci tg3 ehci_hcd xhci_hcd ptp pps_core megaraid_sas libphy usbcore usb_common scsi_mod Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.144663] CPU: 5 PID: 21081 Comm: drbd_s_r4 Tainted: G O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.149998] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.155584] task: ffff88103c374ba0 ti: ffff88100da60000 task.ti: ffff88100da60000 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.161358] RIP: 0010:[<ffffffffa052a224>] [<ffffffffa052a224>] lc_put+0x94/0xa0 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.167336] RSP: 0018:ffff88100da63d50 EFLAGS: 00010046 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.173415] RAX: 0000000000000000 RBX: ffff88103fb79c00 RCX: 0000000000000000 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.179677] RDX: 0000000000000036 RSI: ffff881034600410 RDI: ffff881078679880 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.186011] RBP: ffff881033c40800 R08: 0000000000000000 R09: 0000000000148b54 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.192478] R10: 0000000000000080 R11: 0000000000008000 R12: ffff881034600410 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.199015] R13: 0000000000000709 R14: ffff88103fb79c00 R15: ffff88103445f8d8 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.212529] FS: 0000000000000000(0000) GS:ffff88107f140000(0000) knlGS:0000000000000000 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.242192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.249873] CR2: 00000000009f8018 CR3: 0000000001813000 CR4: 00000000003407e0 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.257766] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.265696] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.273669] Stack: Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.281628] ffffffffa05288ee 000000001e387285 0038728500000000 ffff881033c40800 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.289980] 00000000038491d6 000000001c248eb0 ffff88103445f450 ffff88103fb79c00 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.298442] ffff88103445f8d8 ffffffffa050852a ffff88103445f000 ffffffff0000009c Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.402492] Call Trace: Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.411026] [<ffffffffa05288ee>] ? drbd_try_rs_begin_io+0x9e/0x4f0 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.419798] [<ffffffffa050852a>] ? w_resync_timer+0x36a/0xed0 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.428472] [<ffffffffa050b6ca>] ? drbd_sender+0x16a/0x3d0 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.436952] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.445337] [<ffffffffa052a5ff>] ? drbd_thread_setup+0x5f/0x110 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.453625] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.461742] [<ffffffff810894bd>] ? kthread+0xbd/0xe0 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.469599] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.477281] [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.484697] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.491970] Code: 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 c3 0f 1f Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.506695] 44 00 00 0f 0b 0f 0b <0f> 0b 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 39 77 58 76 Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.520902] RIP [<ffffffffa052a224>] lc_put+0x94/0xa0 [drbd] Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.527910] RSP <ffff88100da63d50> Nov 24 00:13:02 hufot-s-apl0004 kernel: [643314.552188] ---[ end trace b15348e3853c93f5 ]--- A bit later: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.793884] INFO: rcu_sched detected stalls on CPUs/tasks: { 5} (detected by 14, t=5252 jiffies, g=5339142, c=5339141, q=820) Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.803094] sending NMI to all CPUs: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.807792] NMI backtrace for cpu 0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.812351] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.817202] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.822109] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.827186] RIP: 0010:[<ffffffff8101c468>] [<ffffffff8101c468>] native_read_tsc+0x8/0x10 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.832482] RSP: 0018:ffff88107f003ef8 EFLAGS: 00000006 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.837871] RAX: 000000002d253f14 RBX: ffffffff8181f7c0 RCX: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.843473] RDX: 000626c900000000 RSI: 0000000000000000 RDI: ffffffff8181f7c0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.849194] RBP: 000000000009cf09 R08: 0000000000000611 R09: 0000000000000100 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.855044] R10: 0000000000006f1e R11: 00000000f2660034 R12: 000000000579c512 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.860964] R13: 0000000000000044 R14: 0000000000000000 R15: ffff88107f00db00 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.866925] FS: 0000000000000000(0000) GS:ffff88107f000000(0000) knlGS:0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.873065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.879282] CR2: 00007fe669b64028 CR3: 0000000001813000 CR4: 00000000003407f0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.885663] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.892136] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.898607] Stack: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.905095] ffffffff8101c4a5 ffffffff810c96be 0000000000000000 000000000000005d Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.911966] 0000000000000044 ffffffff810d1f13 ffffffff818e2ee0 0000000109930a80 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.918941] ffffffff81803fd8 0020200000000009 ffffffff81803e28 000000000000005d Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.926020] Call Trace: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.933107] <IRQ> Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.933179] [<ffffffff8101c4a5>] ? read_tsc+0x5/0x20 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.947512] [<ffffffff810c96be>] ? ktime_get+0x3e/0xc0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.954874] [<ffffffff810d1f13>] ? __tick_nohz_idle_enter+0x23/0x4a0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.962424] [<ffffffff8106dcd8>] ? irq_exit+0x88/0xa0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.969975] [<ffffffff8151b3a2>] ? do_IRQ+0x52/0xe0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.977580] [<ffffffff815191ed>] ? common_interrupt+0x6d/0x6d Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.985315] <EOI> Nov 24 00:13:25 hufot-s-apl0004 kernel: [643335.985387] [<ffffffff813e33d2>] ? cpuidle_enter_state+0x52/0xc0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.000600] [<ffffffff813e33c8>] ? cpuidle_enter_state+0x48/0xc0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.008295] [<ffffffff810a9af8>] ? cpu_startup_entry+0x328/0x470 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.015866] [<ffffffff81903076>] ? start_kernel+0x497/0x4a2 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.023429] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.030788] [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.038127] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.045424] Code: 90 00 00 00 00 00 01 00 e9 4d ff ff ff e9 54 ff ff ff 90 0f 1f 44 00 00 f3 c3 66 0f 1f 84 00 00 00 00 00 0f 31 48 c1 e2 20 89 c0 <48> 09 c2 48 89 d0 c3 90 8b 05 2e 3c 8c 00 c3 66 0f 1f 84 00 00 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.061237] NMI backtrace for cpu 4 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.061243] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 253.445 msecs Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.076625] CPU: 4 PID: 18067 Comm: drbd_r_r3 Tainted: G D O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.084590] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.092801] task: ffff88103b8e1630 ti: ffff88100decc000 task.ti: ffff88100decc000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.101013] RIP: 0010:[<ffffffff8118fa42>] [<ffffffff8118fa42>] kmem_cache_alloc_node+0x32/0x4c0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.109409] RSP: 0018:ffff88100decfa40 EFLAGS: 00000246 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.117568] RAX: 0000000000000020 RBX: 00000000ffffffff RCX: 00000000ffffffff Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.125655] RDX: 00000000ffffffff RSI: 0000000000000020 RDI: ffff881078b772c0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.133750] RBP: 0000000000000020 R08: 00000000badf0c67 R09: 00000000000010f8 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.141900] R10: ffff88103e9aa0d0 R11: 0000000000001000 R12: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.149908] R13: 0000000000000000 R14: ffff881078b772c0 R15: 00000000ffffffff Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.157777] FS: 0000000000000000(0000) GS:ffff88107f100000(0000) knlGS:0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.165559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.173261] CR2: 00007f24eb56a000 CR3: 0000000001813000 CR4: 00000000003407e0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.180977] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.188588] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.196142] Stack: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.203575] ffffffff81472779 000054f800000000 00ff88103e9aa040 ffff88103e44ec00 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.211354] ffff88103e9aa040 ffff88103ea76c40 ffff880ff03e8822 ffff880ff03e880e Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.219194] ffffffff812bed8b ffff88103e9aa040 0000000000000020 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.227004] Call Trace: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.234594] [<ffffffff81472779>] ? tcp_rcv_established+0x3e9/0x6c0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.242302] [<ffffffff812bed8b>] ? memcpy_toiovec+0x4b/0x80 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.250006] [<ffffffff81412428>] ? __alloc_skb+0x48/0x2a0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.257765] [<ffffffff814785b5>] ? tcp_send_ack+0x35/0x120 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.265426] [<ffffffff81469e4d>] ? tcp_recvmsg+0x71d/0xc30 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.272962] [<ffffffff8140e5f6>] ? release_sock+0xe6/0x170 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.280454] [<ffffffff8149215a>] ? inet_recvmsg+0x6a/0x80 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.287866] [<ffffffff81409e4a>] ? sock_recvmsg+0x9a/0xd0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.295142] [<ffffffffa050ea8c>] ? __drbd_alloc_pages+0xdc/0x170 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.302501] [<ffffffff81409eb5>] ? kernel_recvmsg+0x35/0x40 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.309885] [<ffffffffa03a05a2>] ? dtt_recv_short+0x52/0x60 [drbd_transport_tcp] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.317378] [<ffffffffa03a07e5>] ? dtt_recv_pages+0xb5/0x140 [drbd_transport_tcp] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.324873] [<ffffffffa0516ffe>] ? read_in_block+0x17e/0x380 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.332336] [<ffffffffa0518691>] ? receive_RSDataReply+0x101/0x550 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.339914] [<ffffffffa051f75f>] ? drbd_receiver+0x39f/0x650 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.347509] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.355010] [<ffffffffa052a5ff>] ? drbd_thread_setup+0x5f/0x110 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.362488] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.370011] [<ffffffff810894bd>] ? kthread+0xbd/0xe0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.377531] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.385069] [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.392627] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.400206] Code: 41 56 49 89 fe 41 55 41 54 55 89 f5 53 89 d3 48 83 ec 48 8b 05 00 58 75 00 65 44 8b 2c 25 28 ef 00 00 44 89 6c 24 08 21 f0 a8 10 <89> 44 24 14 74 05 e8 c3 56 38 00 9c 58 0f 1f 44 00 00 48 89 44 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.416391] NMI backtrace for cpu 5 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.416397] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 608.598 msecs Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.431820] CPU: 5 PID: 21091 Comm: drbd_a_r4 Tainted: G D O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.439675] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.455294] task: ffff881034630350 ti: ffff88103d6b0000 task.ti: ffff88103d6b0000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.462996] RIP: 0010:[<ffffffff8151808c>] [<ffffffff8151808c>] _raw_spin_lock_irqsave+0x3c/0x50 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.470732] RSP: 0018:ffff88103d6b3d70 EFLAGS: 00000002 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.478367] RAX: 000000000000e572 RBX: 000000000003dc6d RCX: 000000000000e572 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.486038] RDX: 000000000000e571 RSI: 0000000000000246 RDI: ffff88103fb79f20 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.493724] RBP: 0000000000000000 R08: ffff88107808f800 R09: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.501400] R10: 0000000000000000 R11: 0000000000001ee3 R12: ffff88103fb79c00 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.509111] R13: 000000000003dc6d R14: 0000000000000001 R15: ffff88103fb79f20 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.516747] FS: 0000000000000000(0000) GS:ffff88107f140000(0000) knlGS:0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.524429] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.532021] CR2: 00000000009f8018 CR3: 0000000001813000 CR4: 00000000003407e0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.539632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.547249] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.554847] Stack: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.562301] ffffffffa052575a ffff88103fb79c08 ffff88103a6089f0 ffff88103fb79c00 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.569936] ffff88103d6b3d98 0000000000000003 ffff88103d6b3db8 00000000ffffffff Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.577575] ffffffffa0516ba4 ffff88103a608a10 ffff88103a608a10 0000000000000020 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.585205] Call Trace: Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.592672] [<ffffffffa052575a>] ? put_actlog+0x3a/0x120 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.600226] [<ffffffffa0516ba4>] ? got_peer_ack+0x194/0x370 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.607796] [<ffffffffa051fcc4>] ? drbd_ack_receiver+0x194/0x580 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.615409] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.623016] [<ffffffffa052a5ff>] ? drbd_thread_setup+0x5f/0x110 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.630714] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.638393] [<ffffffff810894bd>] ? kthread+0xbd/0xe0 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.646091] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.653875] [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.661637] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:13:25 hufot-s-apl0004 kernel: [643336.669407] Code: 0f 1f 44 00 00 ba 00 00 01 00 f0 0f c1 17 89 d1 c1 e9 10 66 39 d1 89 c8 75 04 48 89 f0 c3 0f b7 17 66 39 d1 74 f4 f3 90 0f b7 17 <66> 39 d0 75 f6 eb e8 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f [...] Again a bit later: Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013441] ------------[ cut here ]------------ Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013485] WARNING: CPU: 5 PID: 21091 at /build/linux-35gxh9/linux-3.16.36/kernel/watchdog.c:265 watchdog_overflow_callback+0x98/0xc0() Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013486] Watchdog detected hard LOCKUP on cpu 5 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013857] Modules linked in: drbd_transport_tcp(O) drbd(O) libcrc32c crc32c_generic bridge stp llc nls_utf8 nls_cp437 x86_pkg_temp_thermal vfat fat coretemp kvm_intel kvm ttm drm_kms_helper crc32_pclmul drm hid_generic aesni_intel aes_x86_64 cdc_ether usbnet mii lrw usbhid i2c_algo_bit gf128mul hid efi_pstore glue_helper ablk_helper efivars evdev ipmi_devintf iTCO_wdt iTCO_vendor_support pcspkr acpi_power_meter shpchp lpc_ich mxm_wmi i2c_i801 i2c_core mfd_core cryptd processor tpm_tis ipmi_si thermal_sys ipmi_msghandler tpm acpi_pad button mei_me mei wmi autofs4 ext4 crc16 mbcache jbd2 sd_mod sg crc_t10dif ses crct10dif_generic enclosure crct10dif_pclmul crct10dif_common crc32c_intel ehci_pci tg3 ehci_hcd xhci_hcd ptp pps_core megaraid_sas libphy usbcore usb_common scsi_mod Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013865] CPU: 5 PID: 21091 Comm: drbd_a_r4 Tainted: G D O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013870] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013873] 0000000000000000 ffffffff815123b5 ffff88107f145c98 0000000000000009 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013875] ffffffff81068817 ffff88107810fc00 ffff88107f145ce8 ffff88107f145d88 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013877] 0000000000000000 ffff88107f145ef8 ffffffff8106887c ffffffff8171d440 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013881] Call Trace: Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013913] <NMI> [<ffffffff815123b5>] ? dump_stack+0x5d/0x78 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013932] [<ffffffff81068817>] ? warn_slowpath_common+0x77/0x90 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013935] [<ffffffff8106887c>] ? warn_slowpath_fmt+0x4c/0x50 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013953] [<ffffffff8108d7a6>] ? atomic_notifier_call_chain+0x16/0x20 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013990] [<ffffffff8137df2c>] ? vt_console_print+0x23c/0x3f0 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.013996] [<ffffffff810feeb8>] ? watchdog_overflow_callback+0x98/0xc0 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014049] [<ffffffff81136557>] ? __perf_event_overflow+0x87/0x230 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014076] [<ffffffff810309da>] ? intel_pmu_handle_irq+0x1ea/0x400 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014107] [<ffffffff810285d6>] ? perf_event_nmi_handler+0x26/0x40 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014114] [<ffffffff8101797b>] ? nmi_handle+0x8b/0x130 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014116] [<ffffffff81017f2d>] ? default_do_nmi+0xbd/0x110 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014118] [<ffffffff81018074>] ? do_nmi+0xf4/0x170 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014122] [<ffffffff8151a98f>] ? end_repeat_nmi+0x1a/0x1e Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014128] [<ffffffff8151808c>] ? _raw_spin_lock_irqsave+0x3c/0x50 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014131] [<ffffffff8151808c>] ? _raw_spin_lock_irqsave+0x3c/0x50 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014134] [<ffffffff8151808c>] ? _raw_spin_lock_irqsave+0x3c/0x50 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014188] <<EOE>> [<ffffffffa052575a>] ? put_actlog+0x3a/0x120 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014229] [<ffffffffa0516ba4>] ? got_peer_ack+0x194/0x370 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014242] [<ffffffffa051fcc4>] ? drbd_ack_receiver+0x194/0x580 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014253] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014267] [<ffffffffa052a5ff>] ? drbd_thread_setup+0x5f/0x110 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014305] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014334] [<ffffffff810894bd>] ? kthread+0xbd/0xe0 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014337] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014341] [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014344] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 00:14:08 hufot-s-apl0004 kernel: [643380.014345] ---[ end trace b15348e3853c93f6 ]--- The NMIs repeated many times during the night. This task (drbd_a_r4) stayed with us until reboot, and kept 1 core at 100% CPU Load: Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.871069] INFO: rcu_bh detected stalls on CPUs/tasks: { 5} (detected by 15, t=4779280 jiffies, g=2146, c=2145, q=28) Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.879837] sending NMI to all CPUs: Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.888342] NMI backtrace for cpu 5 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.896579] CPU: 5 PID: 21091 Comm: drbd_a_r4 Tainted: G D W O 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.905006] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01GR174, BIOS -[TCE126O-2.20]- 09/07/2016 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.913440] task: ffff881034630350 ti: ffff88103d6b0000 task.ti: ffff88103d6b0000 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.921915] RIP: 0010:[<ffffffff81518089>] [<ffffffff81518089>] _raw_spin_lock_irqsave+0x39/0x50 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.930637] RSP: 0018:ffff88103d6b3d70 EFLAGS: 00000002 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.939339] RAX: 000000000000e572 RBX: 000000000003dc6d RCX: 000000000000e572 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.948135] RDX: 000000000000e571 RSI: 0000000000000246 RDI: ffff88103fb79f20 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.956991] RBP: 0000000000000000 R08: ffff88107808f800 R09: 0000000000000000 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.965731] R10: 0000000000000000 R11: 0000000000001ee3 R12: ffff88103fb79c00 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.974445] R13: 000000000003dc6d R14: 0000000000000001 R15: ffff88103fb79f20 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.983031] FS: 0000000000000000(0000) GS:ffff88107f140000(0000) knlGS:0000000000000000 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679825.991756] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.000538] CR2: 00000000009f8018 CR3: 0000000001813000 CR4: 00000000003407e0 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.009349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.017999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.026569] Stack: Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.035029] ffffffffa052575a ffff88103fb79c08 ffff88103a6089f0 ffff88103fb79c00 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.043686] ffff88103d6b3d98 0000000000000003 ffff88103d6b3db8 00000000ffffffff Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.052293] ffffffffa0516ba4 ffff88103a608a10 ffff88103a608a10 0000000000000020 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.060911] Call Trace: Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.069431] [<ffffffffa052575a>] ? put_actlog+0x3a/0x120 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.078108] [<ffffffffa0516ba4>] ? got_peer_ack+0x194/0x370 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.086639] [<ffffffffa051fcc4>] ? drbd_ack_receiver+0x194/0x580 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.095024] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.103277] [<ffffffffa052a5ff>] ? drbd_thread_setup+0x5f/0x110 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.111537] [<ffffffffa052a5a0>] ? w_complete+0x20/0x20 [drbd] Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.119760] [<ffffffff810894bd>] ? kthread+0xbd/0xe0 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.127920] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.136149] [<ffffffff815184d8>] ? ret_from_fork+0x58/0x90 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.144249] [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180 Nov 24 10:21:05 hufot-s-apl0004 kernel: [679826.152327] Code: c6 fa 66 0f 1f 44 00 00 ba 00 00 01 00 f0 0f c1 17 89 d1 c1 e9 10 66 39 d1 89 c8 75 04 48 89 f0 c3 0f b7 17 66 39 d1 74 f4 f3 90 <0f> b7 17 66 39 d0 75 f6 eb e8 66 66 66 66 2e 0f 1f 84 00 00 00 The secondary was available on the network, but it was sluggish to work with. When I tried to reboot next morning, the reboot stalled, it couldn't deconfigure the network, so I had to reboot using the power button...