[DRBD-user] Still kernel crashes (ZFS or DRBD ?)

Julien Escario julien.escario at altinea.fr
Wed Dec 12 20:19:56 CET 2018


Hello,
Yesterday and today, I experienced a strange crash when live-migrating a
VM inside a Proxmox cluster from a diskless node to another node (with
the disk attached).

I'm using ZFSThin as the storage backend.

Below you'll find the kernel error message I was able to capture
before everything went wrong.

I'm not that comfortable with reading such errors. The fault is reported
in avl_insert (zavl module), apparently reached through zvol_request with
zfs_range_lock on the stack, so it looks more like a ZFS bug.
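
To help read the trace below, here is a minimal Python sketch that sorts
call-trace frames into reliable ones and "?" guesses (the kernel prefixes
a frame with "?" when the address was found on the stack but could not be
confirmed as part of the actual call chain). The function name
parse_frames and the regular expression are illustrative assumptions, not
part of any existing tool:

```python
import re

# Matches frames such as " ? zfs_range_lock+0x4bf/0x5c0 [zfs]" that follow
# the "[timestamp]" prefix. The "[module]" suffix is absent for symbols
# that live in the core kernel.
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\?\s+)?(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(lines):
    """Return (symbol, module, reliable) tuples for call-trace lines.

    Lines that are not stack frames (registers, "Call Trace:", ...) are
    skipped. `reliable` is False for frames the kernel marked with "?".
    """
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append((
                match.group("symbol"),
                match.group("module") or "kernel",
                match.group("guess") is None,
            ))
    return frames
```

Keeping only the frames without "?" in the trace below should leave
zvol_request, generic_make_request, submit_bio, drbd_flush_after_epoch,
receive_Barrier, drbd_receiver, drbd_thread_setup, kthread and
ret_from_fork, i.e. a DRBD receiver thread flushing to the ZFS zvol.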

It happened just after I moved the disk from NFS storage to Linstor
storage (online). I don't know whether both storage nodes had the
complete dataset. Perhaps that is the problem: the VM becomes primary on
a node that is not completely synced.

Could that be an explanation?

Any ideas to guide me toward a resolution?

Feel free to ask for details if I'm not clear; I'm still trying to
complete my analysis.

Thanks a lot,
Julien

> Dec 12 19:22:18 vm13 kernel: [92347.195898] BUG: unable to handle kernel paging request at ffffffffc0559fce
> Dec 12 19:22:18 vm13 kernel: [92347.195950] IP: avl_insert+0x4b/0xd0 [zavl]
> Dec 12 19:22:18 vm13 kernel: [92347.195973] PGD 1d84e0e067 P4D 1d84e0e067 PUD 1d84e10067 PMD 3f63f00067 PTE 3f6da55061
> Dec 12 19:22:18 vm13 kernel: [92347.196011] Oops: 0003 [#1] SMP PTI
> Dec 12 19:22:18 vm13 kernel: [92347.196023] Modules linked in: veth tcp_diag inet_diag binfmt_misc drbd_transport_tcp(O) ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_tcpudp xt_addrtype xt_conntrack nf_conntrack xt_set xt_mark ip_set_hash_net ip_set xt_multiport iptable_filter 8021q garp mrp softdog nfnetlink_log nfnetlink nls_iso8859_1 vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 zfs(PO) crypto_simd glue_helper cryptd zunicode(PO) zavl(PO)
> Dec 12 19:22:18 vm13 kernel: [92347.196265]  intel_cstate icp(PO) snd_pcm snd_timer intel_rapl_perf snd ast soundcore ttm pcspkr drm_kms_helper joydev input_leds drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt lpc_ich mei_me mei wmi shpchp ioatdma ipmi_si acpi_power_meter acpi_pad mac_hid zcommon(PO) znvpair(PO) spl(O) drbd(O) libcrc32c ipmi_devintf sunrpc ipmi_msghandler ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbkbd usbmouse usbhid hid i2c_i801 igb(O) ahci libahci ixgbe dca ptp pps_core mdio
> Dec 12 19:22:18 vm13 kernel: [92347.196428] CPU: 11 PID: 10103 Comm: drbd_r_vm-145-d Tainted: P           O     4.15.18-9-pve #1
> Dec 12 19:22:18 vm13 kernel: [92347.196458] Hardware name: Supermicro Super Server/X10SRW-F, BIOS 3.1 06/06/2018
> Dec 12 19:22:18 vm13 kernel: [92347.196485] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
> Dec 12 19:22:18 vm13 kernel: [92347.196499] RSP: 0018:ffffaedaad6dbc40 EFLAGS: 00010282
> Dec 12 19:22:18 vm13 kernel: [92347.196519] RAX: 0000000000000000 RBX: ffff8ea096f74900 RCX: ffffffffc0559fcf
> Dec 12 19:22:18 vm13 kernel: [92347.196541] RDX: 0000000000000000 RSI: ffff8ea096f74908 RDI: ffff8e9e1a7ac560
> Dec 12 19:22:18 vm13 kernel: [92347.196562] RBP: ffffaedaad6dbc90 R08: ffffffffc0559fce R09: ffff8ea23e807180
> Dec 12 19:22:18 vm13 kernel: [92347.196597] R10: ffff8ea096f74900 R11: 0000000000000000 R12: ffff8e9e1a7ac530
> Dec 12 19:22:18 vm13 kernel: [92347.196628] R13: ffff8ea096f74200 R14: 0000000000000000 R15: 0000000000000000
> Dec 12 19:22:18 vm13 kernel: [92347.196654] FS:  0000000000000000(0000) GS:ffff8ea23f0c0000(0000) knlGS:0000000000000000
> Dec 12 19:22:18 vm13 kernel: [92347.196701] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Dec 12 19:22:18 vm13 kernel: [92347.196736] CR2: ffffffffc0559fce CR3: 0000001d84e0a001 CR4: 00000000003626e0
> Dec 12 19:22:18 vm13 kernel: [92347.196758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 12 19:22:18 vm13 kernel: [92347.196779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Dec 12 19:22:18 vm13 kernel: [92347.196812] Call Trace:
> Dec 12 19:22:18 vm13 kernel: [92347.196872]  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
> Dec 12 19:22:18 vm13 kernel: [92347.196893]  ? spl_kmem_alloc+0xae/0x1a0 [spl]
> Dec 12 19:22:18 vm13 kernel: [92347.196939]  zvol_request+0x16e/0x300 [zfs]
> Dec 12 19:22:18 vm13 kernel: [92347.197879]  generic_make_request+0x123/0x2f0
> Dec 12 19:22:18 vm13 kernel: [92347.198751]  submit_bio+0x73/0x140
> Dec 12 19:22:18 vm13 kernel: [92347.199616]  ? submit_bio+0x73/0x140
> Dec 12 19:22:18 vm13 kernel: [92347.200475]  ? drbd_flush_after_epoch+0x119/0x360 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.201652]  drbd_flush_after_epoch+0x1ae/0x360 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.202775]  ? w_flush+0x50/0x50 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.204120]  receive_Barrier+0x132/0x1c0 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.205319]  drbd_receiver+0x4ba/0x730 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.206949]  drbd_thread_setup+0x8f/0x1a0 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.208169]  kthread+0x105/0x140
> Dec 12 19:22:18 vm13 kernel: [92347.209303]  ? __drbd_next_peer_device_ref+0x170/0x170 [drbd]
> Dec 12 19:22:18 vm13 kernel: [92347.210495]  ? kthread_create_worker_on_cpu+0x70/0x70
> Dec 12 19:22:18 vm13 kernel: [92347.211797]  ? kthread_create_worker_on_cpu+0x70/0x70
> Dec 12 19:22:18 vm13 kernel: [92347.213006]  ret_from_fork+0x35/0x40
> Dec 12 19:22:18 vm13 kernel: [92347.213786] Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 f1 91 c0 89 d1 83 e1 03 83 
> Dec 12 19:22:18 vm13 kernel: [92347.215525] RIP: avl_insert+0x4b/0xd0 [zavl] RSP: ffffaedaad6dbc40
> Dec 12 19:22:18 vm13 kernel: [92347.216335] CR2: ffffffffc0559fce
> Dec 12 19:22:18 vm13 kernel: [92347.217251] ---[ end trace 4a48ac46305bbcb1 ]---


