[DRBD-user] Kernel panic with DRBD 9.0 on Kernel 4.2.6 "LOGIC BUG for enr=x"

Francois Baligant fbaligant at synalabs.com
Sun Jan 17 17:59:20 CET 2016



Hi,

We run two Proxmox 4 nodes with KVM in a dual-primary setup, using
protocol C on DRBD 9.

Hardware is a PowerEdge R730 with a tg3 NIC and an H730P RAID card
(megaraid_sas driver), with the latest firmware for iDRAC, BIOS and RAID.
Storage is SSD.

When doing heavy I/O in a VM, we get a kernel panic in the drbd module on
the node running the VM.

We get the kernel panic both with the latest Proxmox kernel (drbd9
360c65a035fc2dec2b93e839b5c7fae1201fa7d9) and with drbd9 git master
(a48a43a73ebc01e398ca1b755a7006b96ccdfb28).

We have a kdump crash dump if that can be of any help.

Virtualization: KVM guest with virtio for net and disk, using the
writethrough caching strategy for the guest VM. Backing storage for the
VM is LVM on top of DRBD.

We tried both versions:

# cat /proc/drbd
version: 9.0.0 (api:2/proto:86-110)
GIT-hash: 360c65a035fc2dec2b93e839b5c7fae1201fa7d9 build by root@elsa, 2016-01-10 15:26:34
Transports (api:10): tcp (1.0.0)

# cat /proc/drbd
version: 9.0.0 (api:2/proto:86-110)
GIT-hash: a48a43a73ebc01e398ca1b755a7006b96ccdfb28 build by root@sd-84686, 2016-01-17 16:31:20
Transports (api:13): tcp (1.0.0)

Command run inside the VM: dd if=/dev/zero of=dd1 bs=65536 count=1M
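
To give an idea of how much data this test pushes through DRBD, here is a minimal sketch of the arithmetic. It assumes GNU dd semantics, where the "M" suffix means 1024*1024; `dd_size` is a hypothetical helper written for this illustration, not part of any tool.

```python
# Multipliers for dd size suffixes (assumption: GNU dd, where K/M/G are powers of 1024).
DD_SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def dd_size(value: str) -> int:
    """Convert a dd size argument such as '65536' or '1M' to an integer."""
    digits = value.rstrip("KMG")
    suffix = value[len(digits):]
    return int(digits) * DD_SUFFIXES[suffix]


block_size = dd_size("65536")   # bs=65536 -> 64 KiB per write
block_count = dd_size("1M")     # count=1M -> 1048576 blocks
total_bytes = block_size * block_count

print(f"{total_bytes} bytes = {total_bytes / 1024**3:.0f} GiB")
```

Under that assumption the command writes 64 GiB of zeros in 64 KiB blocks, so the test only completes if the guest filesystem has that much free space.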

Node:

Linux version 4.2.6-1-pve (root@sd-84686) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Sun Jan 17 13:39:16 CET 2016

[  861.968976] drbd r0/0 drbd0: LOGIC BUG for enr=64243
[  862.065397] ------------[ cut here ]------------
[  862.065442] kernel BUG at /usr/src/drbd-9.0/drbd/lru_cache.c:571!
[  862.065484] invalid opcode: 0000 [#1] SMP
[  862.065529] Modules linked in: drbd_transport_tcp(O) drbd(O) netconsole configfs ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_gre ip_tunnel vport_gre gre openvswitch libcrc32c nfnetlink_log nfnetlink ipmi_ssif ipmi_devintf intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel dcdbas kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_timer snd soundcore pcspkr joydev input_leds sb_edac edac_core mei_me ioatdma mei shpchp lpc_ich dca wmi ipmi_si 8250_fintek ipmi_msghandler mac_hid acpi_power_meter vhost_net vhost macvtap macvlan autofs4 hid_generic usbkbd usbmouse usbhid hid ahci libahci tg3 ptp pps_core megaraid_sas [last unloaded: drbd]
[  862.066319] CPU: 0 PID: 2343 Comm: drbd_a_r0 Tainted: G           O    4.2.6-1-pve #1
[  862.066386] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[  862.066451] task: ffff881fee1d0000 ti: ffff881fecd78000 task.ti: ffff881fecd78000
[  862.066517] RIP: 0010:[<ffffffffc0556e30>]  [<ffffffffc0556e30>] lc_put+0x90/0xa0 [drbd]
[  862.066594] RSP: 0000:ffff881fecd7bb08  EFLAGS: 00010046
[  862.066633] RAX: 0000000000000000 RBX: 000000000000faf3 RCX: ffff881fe7b8cab0
[  862.066677] RDX: ffff881fe65dc000 RSI: ffff881fe7b8cab0 RDI: ffff881fdc2eca80
[  862.066721] RBP: ffff881fecd7bb08 R08: 0000000000000484 R09: 0000000000000000
[  862.066765] R10: ffff883cf8428870 R11: 0000000000000000 R12: ffff881fec93ec00
[  862.066808] R13: 0000000000000000 R14: 000000000000faf3 R15: 0000000000000001
[  862.066852] FS:  0000000000000000(0000) GS:ffff881ffec00000(0000) knlGS:0000000000000000
[  862.066919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  862.066959] CR2: 00007f1f0a44f028 CR3: 0000000001e0d000 CR4: 00000000001426f0
[  862.067003] Stack:
[  862.067034]  ffff881fecd7bb58 ffffffffc0553b5a 0000000000000046 ffff881fec93eeb0
[  862.067115]  ffff881fecd7bb68 ffff883cf8428438 ffff881fec93ec00 ffff883cf8428448
[  862.067196]  0000000000000800 0000000000004000 ffff881fecd7bb68 ffffffffc0554060
[  862.067277] Call Trace:
[  862.067316]  [<ffffffffc0553b5a>] put_actlog+0x6a/0x120 [drbd]
[  862.067360]  [<ffffffffc0554060>] drbd_al_complete_io+0x30/0x40 [drbd]
[  862.067406]  [<ffffffffc054e192>] drbd_req_destroy+0x442/0x880 [drbd]
[  862.067451]  [<ffffffff81734640>] ? tcp_recvmsg+0x390/0xb90
[  862.067493]  [<ffffffffc054ead8>] mod_rq_state+0x508/0x7c0 [drbd]
[  862.067537]  [<ffffffffc054f084>] __req_mod+0x214/0x8d0 [drbd]
[  862.067582]  [<ffffffffc0558c4b>] tl_release+0x1db/0x320 [drbd]
[  862.067626]  [<ffffffffc053c3c2>] got_BarrierAck+0x32/0xc0 [drbd]
[  862.067670]  [<ffffffffc054cdc0>] drbd_ack_receiver+0x160/0x5c0 [drbd]
[  862.067716]  [<ffffffffc05571d0>] ? w_complete+0x20/0x20 [drbd]
[  862.067760]  [<ffffffffc0557234>] drbd_thread_setup+0x64/0x120 [drbd]
[  862.067804]  [<ffffffffc05571d0>] ? w_complete+0x20/0x20 [drbd]
[  862.067847]  [<ffffffff8109acaa>] kthread+0xea/0x100
[  862.067886]  [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
[  862.067930]  [<ffffffff8180875f>] ret_from_fork+0x3f/0x70
[  862.067970]  [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
[  862.068012] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[  862.068414] RIP  [<ffffffffc0556e30>] lc_put+0x90/0xa0 [drbd]
[  862.068459]  RSP <ffff881fecd7bb08>
[  862.069000] ---[ end trace b005772103543ee2 ]---
[  872.163694] ------------[ cut here ]------------
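
The call trace goes through drbd_al_complete_io and put_actlog into lc_put, so the "enr" in the LOGIC BUG message looks like an activity-log extent number. The sketch below pulls that number out of the log, checks it against the register dump, and converts it to a byte range on the device. It assumes an activity-log extent covers 4 MiB, which is my understanding of DRBD and is not stated anywhere in the log; `extent_number` and `register` are hypothetical helpers written for this illustration.

```python
import re

# Assumption: one DRBD activity-log extent covers 4 MiB of the backing device.
AL_EXTENT_BYTES = 4 * 1024**2

# Three lines copied from the trace above.
LOG = """\
[  861.968976] drbd r0/0 drbd0: LOGIC BUG for enr=64243
[  862.066633] RAX: 0000000000000000 RBX: 000000000000faf3 RCX: ffff881fe7b8cab0
[  862.066808] R13: 0000000000000000 R14: 000000000000faf3 R15: 0000000000000001
"""


def extent_number(log: str) -> int:
    """Return the extent number from the 'LOGIC BUG for enr=N' message."""
    match = re.search(r"LOGIC BUG for enr=(\d+)", log)
    if match is None:
        raise ValueError("no 'LOGIC BUG for enr=' line found")
    return int(match.group(1))


def register(log: str, name: str) -> int:
    """Return the value of a named register from the oops register dump."""
    match = re.search(rf"\b{name}: ([0-9a-f]{{16}})", log)
    if match is None:
        raise ValueError(f"register {name} not found")
    return int(match.group(1), 16)


enr = extent_number(LOG)
start = enr * AL_EXTENT_BYTES

print(f"enr={enr} (hex {enr:#x})")
print(f"RBX matches enr: {register(LOG, 'RBX') == enr}")
print(f"R14 matches enr: {register(LOG, 'R14') == enr}")
print(f"extent covers bytes {start}..{start + AL_EXTENT_BYTES - 1} "
      f"(starts at {start / 1024**3:.2f} GiB)")
```

Both RBX and R14 hold 0xfaf3, which is 64243 in decimal, so the extent number in the message is the value lc_put was working on when it hit the BUG. Whether the computed offset is meaningful depends on the 4 MiB assumption holding for this build.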

# drbdsetup show
resource r0 {
    _this_host {
        node-id      0;
        volume 0 {
            device     minor 0;
            disk       "/dev/sda4";
            meta-disk     internal;
            disk {
                disk-flushes       no;
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 10.0.0.197:7788;
            _remote_host ipv4 10.0.0.140:7788;
        }
        net {
            allow-two-primaries yes;
            cram-hmac-alg       "sha1";
            shared-secret       "xxxxxxxx";
            after-sb-0pri       discard-zero-changes;
            after-sb-1pri       discard-secondary;
            verify-alg          "md5";
            _name               "proxmox1";
        }
        volume 0 {
            disk {
                resync-rate        40960k; # bytes/second
            }
        }
    }
}
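
When comparing the settings of the two nodes, it can help to flatten this output into name/value pairs. The following is a minimal sketch using an excerpt of the output above; `parse_options` is a hypothetical helper written for this illustration, not a DRBD tool, and it deliberately ignores block nesting, so an option that appears in several blocks keeps only its last value.

```python
import re

# Excerpt of the `drbdsetup show` output above.
SHOW_OUTPUT = """\
resource r0 {
    _this_host {
        volume 0 {
            disk {
                disk-flushes       no;
            }
        }
    }
    connection {
        net {
            allow-two-primaries yes;
            after-sb-0pri       discard-zero-changes;
            after-sb-1pri       discard-secondary;
            verify-alg          "md5";
        }
        volume 0 {
            disk {
                resync-rate        40960k; # bytes/second
            }
        }
    }
}
"""


def parse_options(text: str) -> dict:
    """Collect every 'name value;' pair, ignoring block nesting."""
    # Lines that open a block ("net {") have no ';' and are skipped.
    pattern = re.compile(r'^\s*([\w-]+)\s+"?([^";\n{]+)"?;', re.MULTILINE)
    return dict(pattern.findall(text))


options = parse_options(SHOW_OUTPUT)
for name in ("allow-two-primaries", "disk-flushes", "after-sb-0pri",
             "after-sb-1pri", "resync-rate"):
    print(f"{name:20} {options[name]}")
```

Running the same function on the output from the second node and comparing the two dictionaries would show any setting that differs between the peers.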

Shortly afterwards, the tg3 watchdog triggers. This is probably a
consequence of the drbd kernel panic, but maybe not?

See here: https://pastebin.synalabs.hosting/#cI5nWLuuD37_yN6ii8RLtg

Is this a known problem for this kind of configuration?
(kvm->virtio->lvm->drbd->h730p+tg3)

Best regards,
Francois


