[DRBD-user] Kernel panic with DRBD 9.0 on Kernel 4.2.6 "LOGIC BUG for enr=x"

Claudio Nicora claudio.nicora at gmail.com
Wed Feb 24 12:38:34 CET 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I just tested DRBD 9.0.1 and it still crashes with the same kernel panic at the same line:

---------------------------
[ 1892.949041] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1892.954170] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1893.141512] ------------[ cut here ]------------
[ 1893.146192] kernel BUG at /home/dietmar/pve4-devel/pve-kernel/drbd-9.0.1-1/drbd/lru_cache.c:571!
[ 1893.155075] invalid opcode: 0000 [#1] SMP
[ 1893.159244] Modules linked in: ip_set ip6table_filter ip6_tables 
drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl 
nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad 
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport 
iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) 
spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich 
drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel 
snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw 
i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 
8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4 
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O) 
ptp pps_core hpsa
[ 1893.245546] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P IO    4.2.8-1-pve #1
[ 1893.253218] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1893.259682] task: ffff88020e29be80 ti: ffff88020e2b0000 task.ti: ffff88020e2b0000
[ 1893.267274] RIP: 0010:[<ffffffffc0ab0fe0>] [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]
[ 1893.275483] RSP: 0018:ffff880217503ac8  EFLAGS: 00010046
[ 1893.280853] RAX: 0000000000000000 RBX: 000000000001a474 RCX: ffff8800357d9900
[ 1893.288066] RDX: ffff8800dec48000 RSI: ffff8800357d9900 RDI: ffff88020b2a6b40
[ 1893.295306] RBP: ffff880217503ac8 R08: 0000000000000011 R09: 0000000000000000
[ 1893.302520] R10: ffff8801a5e3edc0 R11: 0000000000000166 R12: ffff88020c478c00
[ 1893.309733] R13: 0000000000000000 R14: 000000000001a474 R15: 0000000000000001
[ 1893.316981] FS:  0000000000000000(0000) GS:ffff880217500000(0000) knlGS:0000000000000000
[ 1893.325160] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1893.330996] CR2: 00007f47508cbf70 CR3: 0000000001e0d000 CR4: 00000000000026e0
[ 1893.338207] Stack:
[ 1893.340241]  ffff880217503b18 ffffffffc0aadd0a 0000000000000046 ffff88020c478eb0
[ 1893.347776]  ffff88020c478c08 ffff8801a5e3e978 ffff88020c478c00 ffff8801a5e3e988
[ 1893.355326]  0000000000000800 0000000000004000 ffff880217503b28 ffffffffc0aae210
[ 1893.362876] Call Trace:
[ 1893.365348]  <IRQ>
[ 1893.367302]  [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1893.373395]  [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1893.380000]  [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1893.386518]  [<ffffffffc0aa7996>] ? drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1893.394177]  [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1893.404919]  [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1893.415114]  [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1893.425501]  [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1893.436651]  [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1893.446272]  [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1893.456751]  [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1893.467069]  [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1893.477558]  [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1893.488152]  [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1893.498614]  [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1893.508796]  [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1893.518755]  [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1893.528139]  [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1893.537325]  [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1893.547299]  <EOI>
[ 1893.549247]  [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1893.564052]  [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1893.574285]  [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1893.583642]  [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1893.592753]  [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1893.602118]  [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1893.611711]  [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1893.620980] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1893.647996] RIP  [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]
[ 1893.657350]  RSP <ffff880217503ac8>
[ 1893.664377] ---[ end trace 00eeba9098fc3948 ]---
[ 1893.672498] Kernel panic - not syncing: Fatal exception in interrupt
[ 1894.745252] Shutting down cpus with NMI
[ 1894.752650] Kernel Offset: disabled
[ 1894.759570] drm_kms_helper: panic occurred, switching back to text console
[ 1894.769935] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 1894.780616] ------------[ cut here ]------------
[ 1894.788757] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x60/0x70()
[ 1894.801701] Modules linked in: ip_set ip6table_filter ip6_tables 
drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl 
nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad 
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport 
iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) 
spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich 
drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel 
snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw 
i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 
8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4 
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O) 
ptp pps_core hpsa
[ 1894.918913] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P      D IO    4.2.8-1-pve #1
[ 1894.930775] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1894.941441]  0000000000000000 cb864877fc32c408 ffff880217503530 ffffffff81803a9b
[ 1894.953278]  0000000000000000 0000000000000000 ffff880217503570 ffffffff8107bbfa
[ 1894.965055]  ffff880217503560 0000000000000000 ffff880217416a00 0000000000000004
[ 1894.976768] Call Trace:
[ 1894.983495]  <IRQ>  [<ffffffff81803a9b>] dump_stack+0x45/0x57
[ 1894.993593]  [<ffffffff8107bbfa>] warn_slowpath_common+0x8a/0xc0
[ 1895.003920]  [<ffffffff8107bd2a>] warn_slowpath_null+0x1a/0x20
[ 1895.014099]  [<ffffffff8104cc50>] native_smp_send_reschedule+0x60/0x70
[ 1895.024995]  [<ffffffff810b897b>] trigger_load_balance+0x13b/0x230
[ 1895.035518]  [<ffffffff810a7ab6>] scheduler_tick+0xa6/0xd0
[ 1895.045349]  [<ffffffff810f7ac0>] ? tick_sched_do_timer+0x30/0x30
[ 1895.055802]  [<ffffffff810e81b1>] update_process_times+0x51/0x60
[ 1895.066195]  [<ffffffff810f74b5>] tick_sched_handle.isra.15+0x25/0x60
[ 1895.077027]  [<ffffffff810f7b04>] tick_sched_timer+0x44/0x80
[ 1895.087031]  [<ffffffff810e8d83>] __hrtimer_run_queues+0xf3/0x220
[ 1895.097489]  [<ffffffff810e91e8>] hrtimer_interrupt+0xa8/0x1a0
[ 1895.107695]  [<ffffffff8104f57c>] local_apic_timer_interrupt+0x3c/0x70
[ 1895.118653]  [<ffffffff8180d7c1>] smp_apic_timer_interrupt+0x41/0x60
[ 1895.129425]  [<ffffffff8180b95b>] apic_timer_interrupt+0x6b/0x70
[ 1895.139812]  [<ffffffff818018a0>] ? panic+0x1d3/0x217
[ 1895.149264]  [<ffffffff8180189c>] ? panic+0x1cf/0x217
[ 1895.158676]  [<ffffffff810180a6>] oops_end+0xd6/0xe0
[ 1895.167997]  [<ffffffff810185cb>] die+0x4b/0x70
[ 1895.176869]  [<ffffffff810154bd>] do_trap+0x13d/0x150
[ 1895.186265]  [<ffffffff81015a99>] do_error_trap+0x89/0x110
[ 1895.196100]  [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.205911]  [<ffffffff8118a069>] ? __free_pages+0x19/0x30
[ 1895.215521]  [<ffffffff811dbf6a>] ? __free_slab+0xda/0x1e0
[ 1895.225002]  [<ffffffff81015dc0>] do_invalid_op+0x20/0x30
[ 1895.234412]  [<ffffffff8180c41e>] invalid_op+0x1e/0x30
[ 1895.243541]  [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.253039]  [<ffffffffc0ab0d50>] ? lc_find+0x10/0x20 [drbd]
[ 1895.262582]  [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1895.272310]  [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1895.282784]  [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1895.293104]  [<ffffffffc0aa7996>] ? drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1895.304576]  [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1895.314461]  [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1895.323681]  [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1895.333029]  [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1895.343097]  [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1895.351572]  [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1895.360826]  [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1895.369798]  [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1895.378798]  [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1895.387812]  [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1895.396678]  [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1895.405243]  [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1895.413575]  [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1895.421405]  [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1895.429037]  [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1895.437471]  <EOI>  [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1895.447063]  [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1895.456071]  [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1895.464262]  [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1895.472391]  [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1895.480850]  [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1895.489579]  [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1895.498098] ---[ end trace 00eeba9098fc3949 ]---
-------------------

I was watching "drbdadm status" every 2 seconds (exact command below the output).
This is its last output before the panic:
-------------------
r0 node-id:0 role:Primary suspended:no
     write-ordering:drain
   volume:0 minor:0 disk:UpToDate
       size:488336928 read:829508 written:5750835 al-writes:2689 bm-writes:0
       upper-pending:320 lower-pending:320 al-suspended:no blocked:no
   srvvmhost2 node-id:1 connection:Connected role:Primary
       congested:no
     volume:0 replication:Established peer-disk:UpToDate
         resync-suspended:no
         received:1034427 sent:4717688 out-of-sync:0 pending:0 unacked:0
-------------------
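
For completeness, the polling itself was nothing special; something equivalent to this, using the standard watch utility:
-------------------
# refresh the DRBD status view every 2 seconds
watch -n 2 drbdadm status r0
-------------------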

I suppose version 9.0.1 was not meant to fix this particular bug.
@Lars: can you confirm that?

@Dietmar: what's my best option now?
I'd like to stay on DRBD 9, but I urgently need to fix this kernel panic 
because the hosts are already in production.
Compiling 8.4 myself could be an option, but I suppose Proxmox will use 
9.x going forward and never go back to 8.4.
Am I right, or is there a special kernel version that ships with 8.4?
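
In case I end up building it myself, I assume it's the usual out-of-tree 
module build against the pve kernel headers. An untested sketch (the 
repository URL and package names are my best guess):
-------------------
# untested sketch: build the 8.4 module against the running pve kernel
apt-get install build-essential pve-headers-$(uname -r)
git clone git://git.linbit.com/drbd-8.4.git
cd drbd-8.4
make KDIR=/lib/modules/$(uname -r)/build   # point the build at the pve kernel tree
make install && depmod -a                  # install the module, refresh dependencies
-------------------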

@Lars: in case of a downgrade (if I decide to build 8.4 myself and enter 
versioning hell), is this the right path? (my rough guess at the commands 
follows the list)
1) move all of the VMs to node B
2) downgrade node A module 9.0-->8.4
3) ... resource metadata? ...
4) reboot A (now 8.4) and reconnect to node B (still at 9.0)
5) repeat 2) and 3) on node B
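
Concretely, this is what I imagine for steps 2)-4) on node A, assuming 
the v09 metadata cannot be reused by 8.4 and a full resync from node B 
is acceptable (a plain reboot could replace the module swap). Please 
correct me where I'm wrong:
-------------------
# node A only; all VMs already migrated to B, 8.4 module and tools installed
drbdadm down r0                 # 2) stop the resource before swapping modules
rmmod drbd_transport_tcp drbd   #    unload the 9.0 modules
modprobe drbd                   #    load the self-built 8.4 module
drbdadm create-md r0            # 3)? recreate the metadata in the old v08 format
drbdadm up r0                   # 4) bring it up and try to reconnect to node B
cat /proc/drbd                  #    8.4-style status check
-------------------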

Could you please help me with points 3) and 4)?

Thank you all for helping

Claudio

On 24/02/2016 10:05, Claudio wrote:
> That's great, will test it immediately and report back...
>
> Thanks
>
On 24/02/2016 10:01, Dietmar Maurer wrote:
>>> Upgrade to 9.0.1: @Lars, was this fixed in DRBD 9.0.1, so I could ask
>>> the Proxmox guys to build a kernel with this DRBD version (or try to
>>> build it myself)?
>> I just built a new Proxmox kernel with 9.0.1 - will upload today to pvetest ...
>>
>
