[DRBD-user] Proxmox 5.1 and DRBD over ZFS stability issues

Massimo De Nadal maxx at digital-system.it
Fri May 11 10:51:33 CEST 2018


Hello,

I'm trying to use ZFS-backed DRBD storage on Proxmox 5.1.
Everything works and the integration is great, but I'm facing serious stability issues under heavy write load.

Basically, under heavy write load a node crashes, especially (but not only) during VM live migration.
It seems to happen only during dual-primary operation (which is necessary for VM live migration).
The problem happens on both DRBD 8.4.10 and 9.0.12.
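
For readers unfamiliar with the setup: dual-primary mode in DRBD is enabled with the allow-two-primaries option in the net section of a resource. The fragment below is only a minimal sketch, assuming a hypothetical resource named r0 and replication protocol C; it is not the actual configuration from the affected nodes, and the per-host and device sections are left out.

    # Hypothetical resource "r0"; illustration only, not the reporter's config.
    resource r0 {
        net {
            protocol C;                # synchronous replication
            allow-two-primaries yes;   # both nodes may be Primary at once (needed for live migration)
        }
        # "on <hostname> { ... }" sections with device, disk and address omitted
    }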

Anybody out there facing similar problems?
Thanks.


This is the crash message:

May  9 12:17:33 pve-LAB2 kernel: [  949.958947] Oops: 0003 [#1] SMP PTI
May  9 12:17:33 pve-LAB2 kernel: [  949.958960] Modules linked in: ip_set ip6table_filter ip6_tables iptable_filter softdog bonding nfnetlink_log nfnetlink ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mxm_wmi kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ast ttm aesni_intel aes_x86_64 crypto_simd drm_kms_helper glue_helper snd_pcm cryptd snd_timer drm snd intel_cstate soundcore fb_sys_fops syscopyarea mei_me sysfillrect pcspkr intel_rapl_perf input_leds joydev sysimgblt lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd sunrpc lru_cache libcrc32c ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO)
May  9 12:17:33 pve-LAB2 kernel: [  949.959231]  zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid igb ixgbe i2c_algo_bit mpt3sas dca ahci raid_class ptp i2c_i801 libahci mdio pps_core scsi_transport_sas
May  9 12:17:33 pve-LAB2 kernel: [  949.959305] CPU: 0 PID: 18178 Comm: drbd_r_Test1-2 Tainted: P           O     4.15.17-1-pve #1
May  9 12:17:33 pve-LAB2 kernel: [  949.959332] Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
May  9 12:17:33 pve-LAB2 kernel: [  949.959358] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
May  9 12:17:33 pve-LAB2 kernel: [  949.959374] RSP: 0018:ffff9e6a086a3ca0 EFLAGS: 00010282
May  9 12:17:33 pve-LAB2 kernel: [  949.959392] RAX: 0000000000000000 RBX: ffff8cf5ca2bb200 RCX: ffffffffc057afcf
May  9 12:17:33 pve-LAB2 kernel: [  949.959415] RDX: 0000000000000000 RSI: ffff8cf5ca2bb208 RDI: ffff8cf5ef7d7160
May  9 12:17:33 pve-LAB2 kernel: [  949.959437] RBP: ffff9e6a086a3cf0 R08: ffffffffc057afce R09: ffff8cf5fec07180
May  9 12:17:33 pve-LAB2 kernel: [  949.959461] R10: ffff8cf5ca2bb200 R11: 0000000000000000 R12: ffff8cf5ef7d7130
May  9 12:17:33 pve-LAB2 kernel: [  949.959483] R13: ffff8cf5792f2b00 R14: 0000000000000000 R15: 0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959507] FS:  0000000000000000(0000) GS:ffff8cf5ff200000(0000) knlGS:0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 12:17:33 pve-LAB2 kernel: [  949.959552] CR2: ffffffffc057afce CR3: 0000001b7ac0a001 CR4: 00000000003626f0
May  9 12:17:33 pve-LAB2 kernel: [  949.959575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May  9 12:17:33 pve-LAB2 kernel: [  949.959621] Call Trace:
May  9 12:17:33 pve-LAB2 kernel: [  949.959677]  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
May  9 12:17:33 pve-LAB2 kernel: [  949.959699]  ? spl_kmem_alloc+0xae/0x190 [spl]
May  9 12:17:33 pve-LAB2 kernel: [  949.959744]  zvol_request+0x16e/0x300 [zfs]
May  9 12:17:33 pve-LAB2 kernel: [  949.959764]  generic_make_request+0x123/0x2f0
May  9 12:17:33 pve-LAB2 kernel: [  949.959781]  submit_bio+0x73/0x150
May  9 12:17:33 pve-LAB2 kernel: [  949.959794]  ? submit_bio+0x73/0x150
May  9 12:17:33 pve-LAB2 kernel: [  949.959812]  ? receive_Barrier+0x147/0x3c0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.959832]  receive_Barrier+0x1d6/0x3c0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.960673]  ? drbd_bump_write_ordering+0x350/0x350 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.961526]  drbd_receiver+0x1ad/0x320 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.962386]  drbd_thread_setup+0x58/0x140 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.963127]  kthread+0x105/0x140
May  9 12:17:33 pve-LAB2 kernel: [  949.963859]  ? drbd_destroy_device+0x2b0/0x2b0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.964585]  ? kthread_create_worker_on_cpu+0x70/0x70
May  9 12:17:33 pve-LAB2 kernel: [  949.965314]  ? kthread_create_worker_on_cpu+0x70/0x70
May  9 12:17:33 pve-LAB2 kernel: [  949.966153]  ret_from_fork+0x35/0x40
May  9 12:17:33 pve-LAB2 kernel: [  949.966874] Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 01 46 c0 89 d1 83 e1 03 83
May  9 12:17:33 pve-LAB2 kernel: [  949.969131] CR2: ffffffffc057afce
May  9 12:17:33 pve-LAB2 kernel: [  949.969890] ---[ end trace 39ccea975700982e ]---


_______________________________________
Massimo De Nadal

Digital System srl
Via E.B. Mondin 7 - 32100 - Belluno (Italy)
tel. +39.0437.296539 - fax +39.0437.917154
sip:maxx at digital-system.it
email:maxx at digital-system.it
http://www.digital-system.it
_______________________________________
/"\
\ /    ASCII Ribbon Campaign
 X   against HTML email & vCards
/ \
