Just tested DRBD 9.0.1 and it still crashes with the same kernel panic
at the same line:
---------------------------
[ 1892.949041] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1892.954170] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1893.141512] ------------[ cut here ]------------
[ 1893.146192] kernel BUG at
/home/dietmar/pve4-devel/pve-kernel/drbd-9.0.1-1/drbd/lru_cache.c: 571!
[ 1893.155075] invalid opcode: 0000 [#1] SMP
[ 1893.159244] Modules linked in: ip_set ip6table_filter ip6_tables
drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl
nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport
iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables
nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO)
spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich
drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel
snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw
i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler
8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O)
ptppps_core hpsa
[ 1893.245546] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P IO 4.2.8-1-pve #1
[ 1893.253218] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1893.259682] task: ffff88020e29be80 ti: ffff88020e2b0000 task.ti:
ffff88020e2b0000
[ 1893.267274] RIP: 0010:[<ffffffffc0ab0fe0>] [<ffffffffc0ab0fe0>]
lc_put+0x90/0xa0 [drbd]
[ 1893.275483] RSP: 0018:ffff880217503ac8 EFLAGS: 00010046
[ 1893.280853] RAX: 0000000000000000 RBX: 000000000001a474 RCX:
ffff8800357d9900
[ 1893.288066] RDX: ffff8800dec48000 RSI: ffff8800357d9900 RDI:
ffff88020b2a6b40
[ 1893.295306] RBP: ffff880217503ac8 R08: 0000000000000011 R09:
0000000000000000
[ 1893.302520] R10: ffff8801a5e3edc0 R11: 0000000000000166 R12:
ffff88020c478c00
[ 1893.309733] R13: 0000000000000000 R14: 000000000001a474 R15:
0000000000000001
[ 1893.316981] FS: 0000000000000000(0000) GS:ffff880217500000(0000)
knlGS:0000000000000000
[ 1893.325160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1893.330996] CR2: 00007f47508cbf70 CR3: 0000000001e0d000 CR4:
00000000000026e0
[ 1893.338207] Stack:
[ 1893.340241] ffff880217503b18 ffffffffc0aadd0a 0000000000000046
ffff88020c478eb0
[ 1893.347776] ffff88020c478c08 ffff8801a5e3e978 ffff88020c478c00
ffff8801a5e3e988
[ 1893.355326] 0000000000000800 0000000000004000 ffff880217503b28
ffffffffc0aae210
[ 1893.362876] Call Trace:
[ 1893.365348] <IRQ>
[ 1893.367302] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1893.373395] [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1893.380000] [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1893.386518] [<ffffffffc0aa7996>] ?
drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1893.394177] [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1893.404919] [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1893.415114] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1893.425501] [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1893.436651] [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1893.446272] [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1893.456751] [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1893.467069] [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1893.477558] [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1893.488152] [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1893.498614] [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1893.508796] [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1893.518755] [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1893.528139] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1893.537325] [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1893.547299] <EOI>
[ 1893.549247] [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1893.564052] [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1893.574285] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1893.583642] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1893.592753] [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1893.602118] [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1893.611711] [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1893.620980] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64
01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b
0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1893.647996] RIP [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]
[ 1893.657350] RSP <ffff880217503ac8>
[ 1893.664377] ---[ end trace 00eeba9098fc3948 ]---
[ 1893.672498] Kernel panic - not syncing: Fatal exception in interrupt
[ 1894.745252] Shutting down cpus with NMI
[ 1894.752650] Kernel Offset: disabled
[ 1894.759570] drm_kms_helper: panic occurred, switching back to text
console
[ 1894.769935] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt
[ 1894.780616] ------------[ cut here ]------------
[ 1894.788757] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smp.c:124
native_smp_send_reschedule+0x60/0x70()
[ 1894.801701] Modules linked in: ip_set ip6table_filter ip6_tables
drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl
nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport
iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables
nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO)
spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich
drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel
snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw
i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler
8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O)
ptppps_core hpsa
[ 1894.918913] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P D IO
4.2.8-1-pve #1
[ 1894.930775] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1894.941441] 0000000000000000 cb864877fc32c408 ffff880217503530
ffffffff81803a9b
[ 1894.953278] 0000000000000000 0000000000000000 ffff880217503570
ffffffff8107bbfa
[ 1894.965055] ffff880217503560 0000000000000000 ffff880217416a00
0000000000000004
[ 1894.976768] Call Trace:
[ 1894.983495] <IRQ> [<ffffffff81803a9b>] dump_stack+0x45/0x57
[ 1894.993593] [<ffffffff8107bbfa>] warn_slowpath_common+0x8a/0xc0
[ 1895.003920] [<ffffffff8107bd2a>] warn_slowpath_null+0x1a/0x20
[ 1895.014099] [<ffffffff8104cc50>] native_smp_send_reschedule+0x60/0x70
[ 1895.024995] [<ffffffff810b897b>] trigger_load_balance+0x13b/0x230
[ 1895.035518] [<ffffffff810a7ab6>] scheduler_tick+0xa6/0xd0
[ 1895.045349] [<ffffffff810f7ac0>] ? tick_sched_do_timer+0x30/0x30
[ 1895.055802] [<ffffffff810e81b1>] update_process_times+0x51/0x60
[ 1895.066195] [<ffffffff810f74b5>] tick_sched_handle.isra.15+0x25/0x60
[ 1895.077027] [<ffffffff810f7b04>] tick_sched_timer+0x44/0x80
[ 1895.087031] [<ffffffff810e8d83>] __hrtimer_run_queues+0xf3/0x220
[ 1895.097489] [<ffffffff810e91e8>] hrtimer_interrupt+0xa8/0x1a0
[ 1895.107695] [<ffffffff8104f57c>] local_apic_timer_interrupt+0x3c/0x70
[ 1895.118653] [<ffffffff8180d7c1>] smp_apic_timer_interrupt+0x41/0x60
[ 1895.129425] [<ffffffff8180b95b>] apic_timer_interrupt+0x6b/0x70
[ 1895.139812] [<ffffffff818018a0>] ? panic+0x1d3/0x217
[ 1895.149264] [<ffffffff8180189c>] ? panic+0x1cf/0x217
[ 1895.158676] [<ffffffff810180a6>] oops_end+0xd6/0xe0
[ 1895.167997] [<ffffffff810185cb>] die+0x4b/0x70
[ 1895.176869] [<ffffffff810154bd>] do_trap+0x13d/0x150
[ 1895.186265] [<ffffffff81015a99>] do_error_trap+0x89/0x110
[ 1895.196100] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.205911] [<ffffffff8118a069>] ? __free_pages+0x19/0x30
[ 1895.215521] [<ffffffff811dbf6a>] ? __free_slab+0xda/0x1e0
[ 1895.225002] [<ffffffff81015dc0>] do_invalid_op+0x20/0x30
[ 1895.234412] [<ffffffff8180c41e>] invalid_op+0x1e/0x30
[ 1895.243541] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.253039] [<ffffffffc0ab0d50>] ? lc_find+0x10/0x20 [drbd]
[ 1895.262582] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1895.272310] [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1895.282784] [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1895.293104] [<ffffffffc0aa7996>] ?
drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1895.304576] [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1895.314461] [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1895.323681] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1895.333029] [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1895.343097] [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1895.351572] [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1895.360826] [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1895.369798] [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1895.378798] [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1895.387812] [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1895.396678] [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1895.405243] [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1895.413575] [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1895.421405] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1895.429037] [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1895.437471] <EOI> [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1895.447063] [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1895.456071] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1895.464262] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1895.472391] [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1895.480850] [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1895.489579] [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1895.498098] ---[ end trace 00eeba9098fc3949 ]---
-------------------
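To check mechanically that two panics hit the same spot, the BUG location and the faulting symbol can be pulled out of a saved console log. This is only a minimal Python sketch: the function name bug_signature is made up for this example, and it assumes the log may have been re-wrapped by a mail client, as it was above.

```python
import re


def bug_signature(log_text):
    """Return (source_file, line_number, rip_symbol) from kernel oops text.

    Any element is None if the corresponding line is not found.
    """
    # Mail clients wrap long lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())

    # "kernel BUG at <path>:<line>!" (tolerates a space after the colon).
    bug = re.search(r"kernel BUG at (\S+):\s*(\d+)!", flat)

    # "RIP [<address>] symbol+0xoff/0xlen", the summary line near the
    # end of the oops. The earlier "RIP: 0010:[<...>]" line is skipped
    # because of its colon.
    rip = re.search(
        r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+", flat
    )

    return (
        bug.group(1) if bug else None,
        int(bug.group(2)) if bug else None,
        rip.group(1) if rip else None,
    )
```

Run on the trace above, this should report lru_cache.c, line 571, and lc_put, which can then be compared with the signature from the earlier crash.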
I was running "drbdadm status" every 2 seconds.
This is its last output before the panic:
-------------------
r0 node-id:0 role:Primary suspended:no
write-ordering:drain
volume:0 minor:0 disk:UpToDate
size:488336928 read:829508 written:5750835 al-writes:2689 bm-writes:0
upper-pending:320 lower-pending:320 al-suspended:no blocked:no
srvvmhost2 node-id:1 connection:Connected role:Primary
congested:no
volume:0 replication:Established peer-disk:UpToDate
resync-suspended:no
received:1034427 sent:4717688 out-of-sync:0 pending:0 unacked:0
-------------------
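If the status output is captured to a file instead of only being watched, the pending counters can be logged over time to see whether they climb before a panic. A rough Python sketch, assuming the key:value layout shown above; parse_status_fields and pending_counters are hypothetical helpers, not part of drbd-utils.

```python
import re


def parse_status_fields(status_text):
    """Collect every key:value token from "drbdadm status" text output.

    Returns a list of (key, value) pairs in order of appearance. Keys such
    as "role" repeat for the local node and the peer, so a list is used
    instead of a dict.
    """
    return re.findall(r"([A-Za-z][\w-]*):(\S+)", status_text)


def pending_counters(status_text):
    """Return the pending/unacked counters as a dict of integers."""
    wanted = {"upper-pending", "lower-pending", "pending", "unacked"}
    return {
        key: int(value)
        for key, value in parse_status_fields(status_text)
        if key in wanted
    }
```

For the last output above, this would give upper-pending and lower-pending of 320, with pending and unacked at 0.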
So I suppose that version 9.0.1 does not address this bug.
@Lars: can you confirm that?
@Dietmar: what's my best option now?
I'd like to stay on DRBD9, but I urgently need to fix this kernel panic
because the hosts are already in production.
Compiling 8.4 myself could be an option, but I suppose Proxmox will use 9.x
in the future and never go back to 8.4.
Am I right, or is there a special kernel version with 8.4?
@Lars: in case of a downgrade (if I decide to build 8.4 myself and
enter versioning hell), is this the right path?
1) move all of the VMs to node B
2) downgrade node A module 9.0-->8.4
3) ... resource metadata? ...
4) reboot A (now 8.4) and reconnect to node B (still at 9.0)
5) repeat 2) and 3) on node B
Could you please help me with points 3) and 4)?
Thank you all for your help.
Claudio
On 24/02/2016 10:05, Claudio wrote:
> That's great, will test it immediately and report back...
>
> Thanks
>
> On 24/02/2016 10:01, Dietmar Maurer wrote:
>>> Upgrade to 9.0.1: @Lars, was this fixed in DRBD 9.0.1, so I could ask
>>> Proxmox guys to build a kernel with this DRBD version (or trying to
>>> build it by myself)?
>> I just built a new Proxmox kernel with 9.0.1 - will upload today to pvetest ...
>>
>