[DRBD-user] Error after update to 9.0.8+linbit-1

Roland Kammerer roland.kammerer at linbit.com
Mon Aug 14 16:12:20 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Mon, Aug 14, 2017 at 02:41:56PM +0200, Frank Rust wrote:
> Hi all,
> after upgrading drbd-dkms to 9.0.8+linbit-1 (on kernel 4.4.67-1-pve) I get these errors:
> 
> Software is: root@virt5:~# dpkg -l | grep drbd
> ii  drbd-dkms                         9.0.8+linbit-1                     all          RAID 1 over TCP/IP for Linux module source
> ii  drbd-utils                        9.0.0+linbit-1                     amd64        RAID 1 over TCP/IP for Linux (user utilities)
> ii  drbdmanage-proxmox                1.0-1                              all          DRBD distributed resource management utility
> ii  python-drbdmanage                 0.99.8-1                           all          DRBD distributed resource management utility
> 
> 
> 
> Aug 14 14:21:49 virt2 kernel: [233761.855451] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [drbd_s_vm-109-d:65197]
> Aug 14 14:21:49 virt2 kernel: [233761.855544] Modules linked in: ipt_REJECT nf_reject_ipv4 drbd_transport_tcp(O) drbd(O) ip_set ip6table_filter ip6_tables xt_multiport binfmt_misc iptable_filter ip_tables x_tables softdog rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel sb_edac aes_x86_64 snd_pcm lrw snd_timer edac_core gf128mul snd glue_helper shpchp soundcore joydev pcspkr input_leds ablk_helper cryptd i2c_i801 lpc_ich ipmi_si 8250_fintek mei_me ipmi_msghandler mei mac_hid acpi_power_meter wmi vhost_net vhost macvtap macvlan coretemp autofs4 hid_generic ixgbe(O) usbkbd usbmouse dca be2net usbhid vxlan ahci ip6_udp_tunnel ptp hid libahci udp_tunnel pps_core megaraid_sas fjes
> Aug 14 14:21:49 virt2 kernel: [233761.855592] CPU: 1 PID: 65197 Comm: drbd_s_vm-109-d Tainted: P           O L  4.4.67-1-pve #1
> Aug 14 14:21:49 virt2 kernel: [233761.855593] Hardware name: FUJITSU PRIMERGY RX2530 M1/D3279-A1, BIOS V5.0.0.9 R1.28.0 for D3279-A1x                     12/09/2015
> Aug 14 14:21:49 virt2 kernel: [233761.855594] task: ffff883003707000 ti: ffff881dc5100000 task.ti: ffff881dc5100000
> Aug 14 14:21:49 virt2 kernel: [233761.855595] RIP: 0010:[<ffffffffc0936ae2>]  [<ffffffffc0936ae2>] wait_for_sender_todo+0x142/0x270 [drbd]
> Aug 14 14:21:49 virt2 kernel: [233761.855606] RSP: 0018:ffff881dc5103db0  EFLAGS: 00000282
> Aug 14 14:21:49 virt2 kernel: [233761.855607] RAX: 00000000002aef32 RBX: ffff88300dd94000 RCX: 8000000000000000
> Aug 14 14:21:49 virt2 kernel: [233761.855608] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff881c71ab1998
> Aug 14 14:21:49 virt2 kernel: [233761.855609] RBP: ffff881dc5103e10 R08: ffff88300dd944d8 R09: 0000000000000001
> Aug 14 14:21:49 virt2 kernel: [233761.855610] R10: 000000002b300018 R11: 00000000fffffffb R12: ffff88300dd940b8
> Aug 14 14:21:49 virt2 kernel: [233761.855610] R13: ffff88300dd944c0 R14: ffff883003707000 R15: 0000000000000001
> Aug 14 14:21:49 virt2 kernel: [233761.855611] FS:  0000000000000000(0000) GS:ffff88181f640000(0000) knlGS:0000000000000000
> Aug 14 14:21:49 virt2 kernel: [233761.855612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 14 14:21:49 virt2 kernel: [233761.855613] CR2: 0000000000000030 CR3: 0000000001e0b000 CR4: 00000000001426e0
> Aug 14 14:21:49 virt2 kernel: [233761.855614] Stack:
> Aug 14 14:21:49 virt2 kernel: [233761.855614]  ffff88300dd94520 0000000000000000 ffff883003707000 ffffffff810c46e0
> Aug 14 14:21:49 virt2 kernel: [233761.855616]  ffff881dc5103dd0 ffff881dc5103dd0 00000000e2e71a4a ffff88300dd94000
> Aug 14 14:21:49 virt2 kernel: [233761.855617]  ffff88300dd944d8 ffff88300dd94528 ffff88300dd94520 ffff88300dd944d8
> Aug 14 14:21:49 virt2 kernel: [233761.855618] Call Trace:
> Aug 14 14:21:49 virt2 kernel: [233761.855623]  [<ffffffff810c46e0>] ? wait_woken+0x90/0x90
> Aug 14 14:21:49 virt2 kernel: [233761.855627]  [<ffffffffc093c1fe>] drbd_sender+0x34e/0x400 [drbd]
> Aug 14 14:21:49 virt2 kernel: [233761.855633]  [<ffffffffc095ad60>] ? w_complete+0x20/0x20 [drbd]
> Aug 14 14:21:49 virt2 kernel: [233761.855637]  [<ffffffffc095adc0>] drbd_thread_setup+0x60/0x110 [drbd]
> Aug 14 14:21:49 virt2 kernel: [233761.855641]  [<ffffffffc095ad60>] ? w_complete+0x20/0x20 [drbd]
> Aug 14 14:21:49 virt2 kernel: [233761.855643]  [<ffffffff810a134a>] kthread+0xfa/0x110
> Aug 14 14:21:49 virt2 kernel: [233761.855644]  [<ffffffff810a1250>] ? kthread_park+0x60/0x60
> Aug 14 14:21:49 virt2 kernel: [233761.855646]  [<ffffffff81864a8f>] ret_from_fork+0x3f/0x70
> Aug 14 14:21:49 virt2 kernel: [233761.855647]  [<ffffffff810a1250>] ? kthread_park+0x60/0x60
> Aug 14 14:21:49 virt2 kernel: [233761.855648] Code: 8b 7b 10 8b 87 a0 03 00 00 84 d2 74 09 3b 83 38 09 00 00 0f 95 c2 48 81 c7 98 01 00 00 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <84> d2 0f 85 99 00 00 00 f0 0f ba b3 f8 00 00 00 0f 72 43 83 bb 
> Aug 14 14:21:51 virt2 kernel: [233763.964203] drbd_send_and_submit: 546 callbacks suppressed
> Aug 14 14:21:51 virt2 kernel: [233763.964206] drbd vm-109-disk-1/0 drbd176: IO ERROR: neither local nor remote data, sector 38107320+8
> Aug 14 14:21:51 virt2 kernel: [233763.964327] buffer_io_error: 546 callbacks suppressed
> 
> 
> top shows these processes each using all of one CPU core:
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 65197 root      20   0       0      0      0 R 100.0  0.0  92:01.19 drbd_s_vm-109-d
> 65229 root      20   0       0      0      0 R 100.0  0.0  91:54.35 kworker/u145:2

Any logs you can share from before that happened? +/- 50 lines around
the time this happened would help.
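
If it helps, something like this should pull out the relevant part (a
sketch, assuming the messages end up in /var/log/syslog on that node):

  # 50 lines of context before and after the first soft lockup message
  grep -B 50 -A 50 'soft lockup' /var/log/syslog

  # or narrow it down by time via the kernel messages in the journal
  journalctl -k --since "2017-08-14 14:20" --until "2017-08-14 14:25"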

> >drbdadm status
> root@virt5:~# drbdadm status
> .drbdctrl role:Secondary
>   volume:0 disk:UpToDate
>   volume:1 disk:UpToDate
>   fs1 role:Secondary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate
>   fs2 role:Secondary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate
>   virt1 role:Secondary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate
>   virt2 role:Primary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate
>   virt3 role:Secondary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate
>   virt4 role:Secondary
>     volume:0 peer-disk:UpToDate
>     volume:1 peer-disk:UpToDate

Maybe a bit unrelated, but why would you span the control volume across
that many nodes, especially virtual machines? That does not make sense
to me. I guess these should be satellite nodes.
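
To double-check where the control volume actually lives, something like
this should show it (take the drbdmanage part as a sketch, the exact
subcommands depend on the drbdmanage version you run):

  # role/connection view of the control volume itself
  drbdadm status .drbdctrl

  # drbdmanage's own view of nodes and resource assignments
  drbdmanage list-nodes
  drbdmanage list-assignments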

> Is there anything I can do about it? (a reboot is not an option on production machines…)

You saw the trace in the kernel log? That is what it is going to come down to.
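
If you want to capture more state without rebooting, something along
these lines might help (a sketch; the PID is the stuck sender thread
from your top output, and sysrq has to be enabled):

  # current kernel stack of the stuck sender thread
  cat /proc/65197/stack

  # dump blocked tasks / backtraces of all active CPUs to the kernel log
  echo w > /proc/sysrq-trigger
  echo l > /proc/sysrq-trigger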

Regards, rck


