Hi,

We run two Proxmox 4 nodes with KVM in a dual-primary setup, using protocol C on DRBD9. The hardware is a Dell PowerEdge R730 with a tg3 NIC and an H730P RAID controller (megaraid_sas driver), with the latest firmware for iDRAC, BIOS and RAID. Storage is SSD.

When doing heavy I/O in a VM, we get a kernel panic in the drbd module on the node running the VM. We get the panic both with the latest Proxmox kernel (drbd9 360c65a035fc2dec2b93e839b5c7fae1201fa7d9) and with drbd9 git master (a48a43a73ebc01e398ca1b755a7006b96ccdfb28). We have a kdump crash dump if that is of any help.

Virtualization: KVM guest with virtio for network and disk. The guest VM uses the writethrough caching mode. The VM's backing storage is LVM on top of DRBD.

We tried both versions:

# cat /proc/drbd
version: 9.0.0 (api:2/proto:86-110)
GIT-hash: 360c65a035fc2dec2b93e839b5c7fae1201fa7d9 build by root@elsa, 2016-01-10 15:26:34
Transports (api:10): tcp (1.0.0)

# cat /proc/drbd
version: 9.0.0 (api:2/proto:86-110)
GIT-hash: a48a43a73ebc01e398ca1b755a7006b96ccdfb28 build by root@sd-84686, 2016-01-17 16:31:20
Transports (api:13): tcp (1.0.0)

Running this inside the VM:

dd if=/dev/zero of=dd1 bs=65536 count=1M

Node:

Linux version 4.2.6-1-pve (root@sd-84686) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Sun Jan 17 13:39:16 CET 2016
[ 861.968976] drbd r0/0 drbd0: LOGIC BUG for enr=64243
[ 862.065397] ------------[ cut here ]------------
[ 862.065442] kernel BUG at /usr/src/drbd-9.0/drbd/lru_cache.c:571!
[ 862.065484] invalid opcode: 0000 [#1] SMP
[ 862.065529] Modules linked in: drbd_transport_tcp(O) drbd(O) netconsole configfs ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_gre ip_tunnel vport_gre gre openvswitch libcrc32c nfnetlink_log nfnetlink ipmi_ssif ipmi_devintf intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel dcdbas kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr joydev input_leds sb_edac edac_core mei_me ioatdma mei shpchp lpc_ich dca wmi ipmi_si 8250_fintek ipmi_msghandler mac_hid acpi_power_meter vhost_net vhost macvtap macvlan autofs4 hid_generic usbkbd usbmouse usbhid hid ahci libahci tg3 ptp pps_core megaraid_sas [last unloaded: drbd]
[ 862.066319] CPU: 0 PID: 2343 Comm: drbd_a_r0 Tainted: G O 4.2.6-1-pve #1
[ 862.066386] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[ 862.066451] task: ffff881fee1d0000 ti: ffff881fecd78000 task.ti: ffff881fecd78000
[ 862.066517] RIP: 0010:[<ffffffffc0556e30>] [<ffffffffc0556e30>] lc_put+0x90/0xa0 [drbd]
[ 862.066594] RSP: 0000:ffff881fecd7bb08 EFLAGS: 00010046
[ 862.066633] RAX: 0000000000000000 RBX: 000000000000faf3 RCX: ffff881fe7b8cab0
[ 862.066677] RDX: ffff881fe65dc000 RSI: ffff881fe7b8cab0 RDI: ffff881fdc2eca80
[ 862.066721] RBP: ffff881fecd7bb08 R08: 0000000000000484 R09: 0000000000000000
[ 862.066765] R10: ffff883cf8428870 R11: 0000000000000000 R12: ffff881fec93ec00
[ 862.066808] R13: 0000000000000000 R14: 000000000000faf3 R15: 0000000000000001
[ 862.066852] FS: 0000000000000000(0000) GS:ffff881ffec00000(0000) knlGS:0000000000000000
[ 862.066919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 862.066959] CR2: 00007f1f0a44f028 CR3: 0000000001e0d000 CR4: 00000000001426f0
[ 862.067003] Stack:
[ 862.067034] ffff881fecd7bb58 ffffffffc0553b5a 0000000000000046 ffff881fec93eeb0
[ 862.067115] ffff881fecd7bb68 ffff883cf8428438 ffff881fec93ec00 ffff883cf8428448
[ 862.067196] 0000000000000800 0000000000004000 ffff881fecd7bb68 ffffffffc0554060
[ 862.067277] Call Trace:
[ 862.067316] [<ffffffffc0553b5a>] put_actlog+0x6a/0x120 [drbd]
[ 862.067360] [<ffffffffc0554060>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 862.067406] [<ffffffffc054e192>] drbd_req_destroy+0x442/0x880 [drbd]
[ 862.067451] [<ffffffff81734640>] ? tcp_recvmsg+0x390/0xb90
[ 862.067493] [<ffffffffc054ead8>] mod_rq_state+0x508/0x7c0 [drbd]
[ 862.067537] [<ffffffffc054f084>] __req_mod+0x214/0x8d0 [drbd]
[ 862.067582] [<ffffffffc0558c4b>] tl_release+0x1db/0x320 [drbd]
[ 862.067626] [<ffffffffc053c3c2>] got_BarrierAck+0x32/0xc0 [drbd]
[ 862.067670] [<ffffffffc054cdc0>] drbd_ack_receiver+0x160/0x5c0 [drbd]
[ 862.067716] [<ffffffffc05571d0>] ? w_complete+0x20/0x20 [drbd]
[ 862.067760] [<ffffffffc0557234>] drbd_thread_setup+0x64/0x120 [drbd]
[ 862.067804] [<ffffffffc05571d0>] ? w_complete+0x20/0x20 [drbd]
[ 862.067847] [<ffffffff8109acaa>] kthread+0xea/0x100
[ 862.067886] [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
[ 862.067930] [<ffffffff8180875f>] ret_from_fork+0x3f/0x70
[ 862.067970] [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
[ 862.068012] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 862.068414] RIP [<ffffffffc0556e30>] lc_put+0x90/0xa0 [drbd]
[ 862.068459] RSP <ffff881fecd7bb08>
[ 862.069000] ---[ end trace b005772103543ee2 ]---
[ 872.163694] ------------[ cut here ]------------

# drbdsetup show
resource r0 {
    _this_host {
        node-id 0;
        volume 0 {
            device minor 0;
            disk "/dev/sda4";
            meta-disk internal;
            disk {
                disk-flushes no;
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.0.0.197:7788;
            _remote_host ipv4 10.0.0.140:7788;
        }
        net {
            allow-two-primaries yes;
            cram-hmac-alg "sha1";
            shared-secret "xxxxxxxx";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            verify-alg "md5";
            _name "proxmox1";
        }
        volume 0 {
            disk {
                resync-rate 40960k; # bytes/second
            }
        }
    }
}

Shortly afterwards, the tg3 watchdog triggers. This is probably a consequence of the drbd kernel panic, but maybe not? See here: https://pastebin.synalabs.hosting/#cI5nWLuuD37_yN6ii8RLtg

Is this a known problem with this kind of stack (kvm -> virtio -> lvm -> drbd -> h730p + tg3)?

Best regards,
Francois
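For anyone reading the oops, here is a small Python sketch that sanity-checks a few numbers in the report. The helper names are hypothetical (nothing here is DRBD code), and the extent-to-offset mapping assumes DRBD's documented activity-log extent size of 4 MiB. It shows how much data the dd run writes, that enr=64243 is the same value as 0xfaf3 in RBX/R14, and roughly where on drbd0 that extent would start.

```python
# Hypothetical helpers for decoding numbers from the report; not DRBD code.
# Assumption: each DRBD activity-log (AL) extent covers 4 MiB (2**22 bytes).

AL_EXTENT_BYTES = 4 * 1024 * 1024  # assumed AL extent size: 4 MiB


def dd_total_bytes(bs: int, count: int) -> int:
    """Total bytes written by `dd bs=<bs> count=<count>`."""
    return bs * count


def al_extent_offset(enr: int) -> int:
    """Byte offset on the DRBD device where AL extent `enr` starts (4 MiB assumption)."""
    return enr * AL_EXTENT_BYTES


if __name__ == "__main__":
    # dd if=/dev/zero of=dd1 bs=65536 count=1M  (GNU dd: "M" = 1024 * 1024)
    total = dd_total_bytes(65536, 1024 * 1024)
    print(f"dd writes {total} bytes = {total / 2**30:.0f} GiB")

    # "LOGIC BUG for enr=64243": RBX and R14 both hold 000000000000faf3
    enr = 64243
    print(f"enr {enr} = {hex(enr)}")

    offset = al_extent_offset(enr)
    print(f"extent starts at byte {offset} (~{offset / 2**30:.1f} GiB into drbd0)")
```

Under these assumptions, the script reports a 64 GiB dd write, confirms 64243 == 0xfaf3, and places the extent at about 250.9 GiB into drbd0. The offset is only as reliable as the 4 MiB extent assumption.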