Just tested DRBD 9.0.1 and it still crashes with the same kernel panic at the same line:

---------------------------
[ 1892.949041] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1892.954170] drbd r0/0 drbd0: LOGIC BUG for enr=107636
[ 1893.141512] ------------[ cut here ]------------
[ 1893.146192] kernel BUG at /home/dietmar/pve4-devel/pve-kernel/drbd-9.0.1-1/drbd/lru_cache.c: 571!
[ 1893.155075] invalid opcode: 0000 [#1] SMP
[ 1893.159244] Modules linked in: ip_set ip6table_filter ip6_tables drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4 hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O) ptp pps_core hpsa
[ 1893.245546] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P IO 4.2.8-1-pve #1
[ 1893.253218] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1893.259682] task: ffff88020e29be80 ti: ffff88020e2b0000 task.ti: ffff88020e2b0000
[ 1893.267274] RIP: 0010:[<ffffffffc0ab0fe0>] [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]
[ 1893.275483] RSP: 0018:ffff880217503ac8 EFLAGS: 00010046
[ 1893.280853] RAX: 0000000000000000 RBX: 000000000001a474 RCX: ffff8800357d9900
[ 1893.288066] RDX: ffff8800dec48000 RSI: ffff8800357d9900 RDI: ffff88020b2a6b40
[ 1893.295306] RBP: ffff880217503ac8 R08: 0000000000000011 R09: 0000000000000000
[ 1893.302520] R10: ffff8801a5e3edc0 R11: 0000000000000166 R12: ffff88020c478c00
[ 1893.309733] R13: 0000000000000000 R14: 000000000001a474 R15: 0000000000000001
[ 1893.316981] FS: 0000000000000000(0000) GS:ffff880217500000(0000) knlGS:0000000000000000
[ 1893.325160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1893.330996] CR2: 00007f47508cbf70 CR3: 0000000001e0d000 CR4: 00000000000026e0
[ 1893.338207] Stack:
[ 1893.340241] ffff880217503b18 ffffffffc0aadd0a 0000000000000046 ffff88020c478eb0
[ 1893.347776] ffff88020c478c08 ffff8801a5e3e978 ffff88020c478c00 ffff8801a5e3e988
[ 1893.355326] 0000000000000800 0000000000004000 ffff880217503b28 ffffffffc0aae210
[ 1893.362876] Call Trace:
[ 1893.365348] <IRQ>
[ 1893.367302] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1893.373395] [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1893.380000] [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1893.386518] [<ffffffffc0aa7996>] ? drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1893.394177] [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1893.404919] [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1893.415114] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1893.425501] [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1893.436651] [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1893.446272] [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1893.456751] [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1893.467069] [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1893.477558] [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1893.488152] [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1893.498614] [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1893.508796] [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1893.518755] [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1893.528139] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1893.537325] [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1893.547299] <EOI>
[ 1893.549247] [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1893.564052] [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1893.574285] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1893.583642] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1893.592753] [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1893.602118] [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1893.611711] [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1893.620980] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1893.647996] RIP [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]
[ 1893.657350] RSP <ffff880217503ac8>
[ 1893.664377] ---[ end trace 00eeba9098fc3948 ]---
[ 1893.672498] Kernel panic - not syncing: Fatal exception in interrupt
[ 1894.745252] Shutting down cpus with NMI
[ 1894.752650] Kernel Offset: disabled
[ 1894.759570] drm_kms_helper: panic occurred, switching back to text console
[ 1894.769935] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 1894.780616] ------------[ cut here ]------------
[ 1894.788757] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x60/0x70()
[ 1894.801701] Modules linked in: ip_set ip6table_filter ip6_tables drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon ttm gpio_ich drm_kms_helper drm psmouse coretemp snd_pcm i2c_algo_bit kvm_intel snd_timer snd kvm soundcore input_leds hpilo shpchp serio_raw i7core_edac pcspkr acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 8250_fintek mac_hid edac_core vhost_net vhost macvtap macvlan autofs4 hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O) ptp pps_core hpsa
[ 1894.918913] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P D IO 4.2.8-1-pve #1
[ 1894.930775] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
[ 1894.941441] 0000000000000000 cb864877fc32c408 ffff880217503530 ffffffff81803a9b
[ 1894.953278] 0000000000000000 0000000000000000 ffff880217503570 ffffffff8107bbfa
[ 1894.965055] ffff880217503560 0000000000000000 ffff880217416a00 0000000000000004
[ 1894.976768] Call Trace:
[ 1894.983495] <IRQ> [<ffffffff81803a9b>] dump_stack+0x45/0x57
[ 1894.993593] [<ffffffff8107bbfa>] warn_slowpath_common+0x8a/0xc0
[ 1895.003920] [<ffffffff8107bd2a>] warn_slowpath_null+0x1a/0x20
[ 1895.014099] [<ffffffff8104cc50>] native_smp_send_reschedule+0x60/0x70
[ 1895.024995] [<ffffffff810b897b>] trigger_load_balance+0x13b/0x230
[ 1895.035518] [<ffffffff810a7ab6>] scheduler_tick+0xa6/0xd0
[ 1895.045349] [<ffffffff810f7ac0>] ? tick_sched_do_timer+0x30/0x30
[ 1895.055802] [<ffffffff810e81b1>] update_process_times+0x51/0x60
[ 1895.066195] [<ffffffff810f74b5>] tick_sched_handle.isra.15+0x25/0x60
[ 1895.077027] [<ffffffff810f7b04>] tick_sched_timer+0x44/0x80
[ 1895.087031] [<ffffffff810e8d83>] __hrtimer_run_queues+0xf3/0x220
[ 1895.097489] [<ffffffff810e91e8>] hrtimer_interrupt+0xa8/0x1a0
[ 1895.107695] [<ffffffff8104f57c>] local_apic_timer_interrupt+0x3c/0x70
[ 1895.118653] [<ffffffff8180d7c1>] smp_apic_timer_interrupt+0x41/0x60
[ 1895.129425] [<ffffffff8180b95b>] apic_timer_interrupt+0x6b/0x70
[ 1895.139812] [<ffffffff818018a0>] ? panic+0x1d3/0x217
[ 1895.149264] [<ffffffff8180189c>] ? panic+0x1cf/0x217
[ 1895.158676] [<ffffffff810180a6>] oops_end+0xd6/0xe0
[ 1895.167997] [<ffffffff810185cb>] die+0x4b/0x70
[ 1895.176869] [<ffffffff810154bd>] do_trap+0x13d/0x150
[ 1895.186265] [<ffffffff81015a99>] do_error_trap+0x89/0x110
[ 1895.196100] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.205911] [<ffffffff8118a069>] ? __free_pages+0x19/0x30
[ 1895.215521] [<ffffffff811dbf6a>] ? __free_slab+0xda/0x1e0
[ 1895.225002] [<ffffffff81015dc0>] do_invalid_op+0x20/0x30
[ 1895.234412] [<ffffffff8180c41e>] invalid_op+0x1e/0x30
[ 1895.243541] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0 [drbd]
[ 1895.253039] [<ffffffffc0ab0d50>] ? lc_find+0x10/0x20 [drbd]
[ 1895.262582] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120 [drbd]
[ 1895.272310] [<ffffffffc0aae210>] drbd_al_complete_io+0x30/0x40 [drbd]
[ 1895.282784] [<ffffffffc0aa8342>] drbd_req_destroy+0x442/0x880 [drbd]
[ 1895.293104] [<ffffffffc0aa7996>] ? drbd_req_put_completion_ref+0x116/0x350 [drbd]
[ 1895.304576] [<ffffffffc0aa8c88>] mod_rq_state+0x508/0x7c0 [drbd]
[ 1895.314461] [<ffffffff811852bf>] ? mempool_free+0x2f/0x90
[ 1895.323681] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0 [drbd]
[ 1895.333029] [<ffffffffc0a8ff81>] drbd_request_endio+0x81/0x230 [drbd]
[ 1895.343097] [<ffffffff813954c7>] bio_endio+0x57/0x90
[ 1895.351572] [<ffffffff8139c31f>] blk_update_request+0x8f/0x340
[ 1895.360826] [<ffffffff81583f23>] scsi_end_request+0x33/0x1c0
[ 1895.369798] [<ffffffff815864d4>] scsi_io_completion+0xc4/0x650
[ 1895.378798] [<ffffffff8157d50f>] scsi_finish_command+0xcf/0x120
[ 1895.387812] [<ffffffff81585d26>] scsi_softirq_done+0x126/0x150
[ 1895.396678] [<ffffffff813a2f47>] blk_done_softirq+0x87/0xb0
[ 1895.405243] [<ffffffff81080095>] __do_softirq+0x105/0x260
[ 1895.413575] [<ffffffff8108034e>] irq_exit+0x8e/0x90
[ 1895.421405] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0
[ 1895.429037] [<ffffffff8180b66b>] common_interrupt+0x6b/0x6b
[ 1895.437471] <EOI> [<ffffffff8168d011>] ? cpuidle_enter_state+0xf1/0x220
[ 1895.447063] [<ffffffff8168cff0>] ? cpuidle_enter_state+0xd0/0x220
[ 1895.456071] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20
[ 1895.464262] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70
[ 1895.472391] [<ffffffff8168d153>] ? cpuidle_select+0x13/0x20
[ 1895.480850] [<ffffffff810be45c>] cpu_startup_entry+0x29c/0x360
[ 1895.489579] [<ffffffff8104d983>] start_secondary+0x183/0x1c0
[ 1895.498098] ---[ end trace 00eeba9098fc3949 ]---
-------------------

I was watching "drbdadm status" every 2 s. This is its last output before the panic:

-------------------
r0 node-id:0 role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:488336928 read:829508 written:5750835 al-writes:2689 bm-writes:0 upper-pending:320 lower-pending:320 al-suspended:no blocked:no
  srvvmhost2 node-id:1 connection:Connected role:Primary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:1034427 sent:4717688 out-of-sync:0 pending:0 unacked:0
-------------------

I suppose version 9.0.1 does not address this bug. @Lars: can you confirm?

@Dietmar: what's my best option now? I'd like to stay on DRBD 9, but I urgently need to fix this kernel panic because the hosts are already in production. Compiling 8.4 myself could be an option, but I suppose Proxmox will use 9.x in the future and never go back to 8.4. Am I right, or is there a special kernel version with 8.4?

@Lars: in case of a downgrade (if I decide to build 8.4 myself and enter versioning hell), is this the right path?

1) move all of the VMs to node B
2) downgrade node A module 9.0-->8.4
3) ... resource metadata? ...
4) reboot A (now 8.4) and reconnect to node B (still at 9.0)
5) repeat 2) and 3) on node B

Could you please help me with points 3) and 4)?

Thank you all for helping
Claudio

On 24/02/2016 10:05, Claudio wrote:
> That's great, will test it immediately and report back...
>
> Thanks
>
> On 24/02/2016 10:01, Dietmar Maurer wrote:
>>> Upgrade to 9.0.1: @Lars, was this fixed in DRBD 9.0.1, so I could ask
>>> Proxmox guys to build a kernel with this DRBD version (or trying to
>>> build it by myself)?
>> I just built a new proxmox kernel with 9.0.1 - will upload today to pvetest ...
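
For anyone who wants to log the in-flight counters from the "drbdadm status" output quoted in the message above instead of watching them by eye, here is a minimal sketch in Python. It assumes only that the status text consists of whitespace-separated key:value tokens, as in the quoted output. The names TOKEN, parse_fields and pending_counters are hypothetical helpers made up for this illustration and are not part of DRBD or its tools.

```python
import re

# One "key:value" token, e.g. "upper-pending:320" or "disk:UpToDate".
# Bare words without a colon (resource or peer names) are skipped.
TOKEN = re.compile(r"([A-Za-z][\w-]*):(\S+)")


def parse_fields(status_text):
    """Return every key:value token as a (key, value) pair, in order."""
    return TOKEN.findall(status_text)


def pending_counters(status_text):
    """Collect numeric counters named '*pending' or 'unacked'.

    If a key appears more than once, the first occurrence wins.
    """
    counters = {}
    for key, value in parse_fields(status_text):
        is_counter = key.endswith("pending") or key == "unacked"
        if is_counter and key not in counters and value.isdigit():
            counters[key] = int(value)
    return counters
```

Feeding it the status text captured just before the panic yields the upper-pending, lower-pending, pending and unacked values as integers, which can then be written to a log file with a timestamp.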