<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body style="font-family: Courier New; font-size: 13px;"
bgcolor="#FFFFFF" text="#000000">
<div id="QCMcontainer" style="font-family:Courier
New;font-size:13px;">Just tested DRBD 9.0.1 and it still crashes
with the same kernel panic at the same line:<br>
<br>
---------------------------<br>
[ 1892.949041] drbd r0/0 drbd0: LOGIC BUG for enr=107636<br>
[ 1892.954170] drbd r0/0 drbd0: LOGIC BUG for enr=107636<br>
[ 1893.141512] ------------[ cut here ]------------<br>
[ 1893.146192] kernel BUG at
/home/dietmar/pve4-devel/pve-kernel/drbd-9.0.1-1/drbd/lru_cache.c:
571!<br>
[ 1893.155075] invalid opcode: 0000 [#1] SMP<br>
[ 1893.159244] Modules linked in: ip_set ip6table_filter
ip6_tables drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd
auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm
iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp
xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO)
zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif
amdkfd amd_iommu_v2 radeon ttm gpio_ich drm_kms_helper drm psmouse
coretemp snd_pcm i2c_algo_bit kvm_intel snd_timer snd kvm
soundcore input_leds hpilo shpchp serio_raw i7core_edac pcspkr
acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 8250_fintek
mac_hid edac_core vhost_net vhost macvtap macvlan autofs4
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O)
ptppps_core hpsa<br>
[ 1893.245546] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P
IO 4.2.8-1-pve #1<br>
[ 1893.253218] Hardware name: HP ProLiant ML350 G6, BIOS D22
08/16/2015<br>
[ 1893.259682] task: ffff88020e29be80 ti: ffff88020e2b0000
task.ti: ffff88020e2b0000<br>
[ 1893.267274] RIP: 0010:[<ffffffffc0ab0fe0>]
[<ffffffffc0ab0fe0>] lc_put+0x90/0xa0 [drbd]<br>
[ 1893.275483] RSP: 0018:ffff880217503ac8 EFLAGS: 00010046<br>
[ 1893.280853] RAX: 0000000000000000 RBX: 000000000001a474 RCX:
ffff8800357d9900<br>
[ 1893.288066] RDX: ffff8800dec48000 RSI: ffff8800357d9900 RDI:
ffff88020b2a6b40<br>
[ 1893.295306] RBP: ffff880217503ac8 R08: 0000000000000011 R09:
0000000000000000<br>
[ 1893.302520] R10: ffff8801a5e3edc0 R11: 0000000000000166 R12:
ffff88020c478c00<br>
[ 1893.309733] R13: 0000000000000000 R14: 000000000001a474 R15:
0000000000000001<br>
[ 1893.316981] FS: 0000000000000000(0000)
GS:ffff880217500000(0000) knlGS:0000000000000000<br>
[ 1893.325160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b<br>
[ 1893.330996] CR2: 00007f47508cbf70 CR3: 0000000001e0d000 CR4:
00000000000026e0<br>
[ 1893.338207] Stack:<br>
[ 1893.340241] ffff880217503b18 ffffffffc0aadd0a 0000000000000046
ffff88020c478eb0<br>
[ 1893.347776] ffff88020c478c08 ffff8801a5e3e978 ffff88020c478c00
ffff8801a5e3e988<br>
[ 1893.355326] 0000000000000800 0000000000004000 ffff880217503b28
ffffffffc0aae210<br>
[ 1893.362876] Call Trace:<br>
[ 1893.365348] <IRQ><br>
[ 1893.367302] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120
[drbd]<br>
[ 1893.373395] [<ffffffffc0aae210>]
drbd_al_complete_io+0x30/0x40 [drbd]<br>
[ 1893.380000] [<ffffffffc0aa8342>]
drbd_req_destroy+0x442/0x880 [drbd]<br>
[ 1893.386518] [<ffffffffc0aa7996>] ?
drbd_req_put_completion_ref+0x116/0x350 [drbd]<br>
[ 1893.394177] [<ffffffffc0aa8c88>]
mod_rq_state+0x508/0x7c0 [drbd]<br>
[ 1893.404919] [<ffffffff811852bf>] ?
mempool_free+0x2f/0x90<br>
[ 1893.415114] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0
[drbd]<br>
[ 1893.425501] [<ffffffffc0a8ff81>]
drbd_request_endio+0x81/0x230 [drbd]<br>
[ 1893.436651] [<ffffffff813954c7>] bio_endio+0x57/0x90<br>
[ 1893.446272] [<ffffffff8139c31f>]
blk_update_request+0x8f/0x340<br>
[ 1893.456751] [<ffffffff81583f23>]
scsi_end_request+0x33/0x1c0<br>
[ 1893.467069] [<ffffffff815864d4>]
scsi_io_completion+0xc4/0x650<br>
[ 1893.477558] [<ffffffff8157d50f>]
scsi_finish_command+0xcf/0x120<br>
[ 1893.488152] [<ffffffff81585d26>]
scsi_softirq_done+0x126/0x150<br>
[ 1893.498614] [<ffffffff813a2f47>]
blk_done_softirq+0x87/0xb0<br>
[ 1893.508796] [<ffffffff81080095>]
__do_softirq+0x105/0x260<br>
[ 1893.518755] [<ffffffff8108034e>] irq_exit+0x8e/0x90<br>
[ 1893.528139] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0<br>
[ 1893.537325] [<ffffffff8180b66b>]
common_interrupt+0x6b/0x6b<br>
[ 1893.547299] <EOI><br>
[ 1893.549247] [<ffffffff8168d011>] ?
cpuidle_enter_state+0xf1/0x220<br>
[ 1893.564052] [<ffffffff8168cff0>] ?
cpuidle_enter_state+0xd0/0x220<br>
[ 1893.574285] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20<br>
[ 1893.583642] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70<br>
[ 1893.592753] [<ffffffff8168d153>] ?
cpuidle_select+0x13/0x20<br>
[ 1893.602118] [<ffffffff810be45c>]
cpu_startup_entry+0x29c/0x360<br>
[ 1893.611711] [<ffffffff8104d983>]
start_secondary+0x183/0x1c0<br>
[ 1893.620980] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83
6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20
5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00
00 00 00 66 66 66 66 90<br>
[ 1893.647996] RIP [<ffffffffc0ab0fe0>] lc_put+0x90/0xa0
[drbd]<br>
[ 1893.657350] RSP <ffff880217503ac8><br>
[ 1893.664377] ---[ end trace 00eeba9098fc3948 ]---<br>
[ 1893.672498] Kernel panic - not syncing: Fatal exception in
interrupt<br>
[ 1894.745252] Shutting down cpus with NMI<br>
[ 1894.752650] Kernel Offset: disabled<br>
[ 1894.759570] drm_kms_helper: panic occurred, switching back to
text console<br>
[ 1894.769935] ---[ end Kernel panic - not syncing: Fatal
exception in interrupt<br>
[ 1894.780616] ------------[ cut here ]------------<br>
[ 1894.788757] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smp.c:124
native_smp_send_reschedule+0x60/0x70()<br>
[ 1894.801701] Modules linked in: ip_set ip6table_filter
ip6_tables drbd_transport_tcp(O) drbd(O) libcrc32c softdog nfsd
auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm
iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp
xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO)
zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif
amdkfd amd_iommu_v2 radeon ttm gpio_ich drm_kms_helper drm psmouse
coretemp snd_pcm i2c_algo_bit kvm_intel snd_timer snd kvm
soundcore input_leds hpilo shpchp serio_raw i7core_edac pcspkr
acpi_power_meter ipmi_si lpc_ich ipmi_msghandler 8250_fintek
mac_hid edac_core vhost_net vhost macvtap macvlan autofs4
hid_generic usbkbd usbmouse usbhid hid pata_acpi tg3 e1000e(O)
ptppps_core hpsa<br>
[ 1894.918913] CPU: 4 PID: 0 Comm: swapper/4 Tainted: P D
IO 4.2.8-1-pve #1<br>
[ 1894.930775] Hardware name: HP ProLiant ML350 G6, BIOS D22
08/16/2015<br>
[ 1894.941441] 0000000000000000 cb864877fc32c408 ffff880217503530
ffffffff81803a9b<br>
[ 1894.953278] 0000000000000000 0000000000000000 ffff880217503570
ffffffff8107bbfa<br>
[ 1894.965055] ffff880217503560 0000000000000000 ffff880217416a00
0000000000000004<br>
[ 1894.976768] Call Trace:<br>
[ 1894.983495] <IRQ> [<ffffffff81803a9b>]
dump_stack+0x45/0x57<br>
[ 1894.993593] [<ffffffff8107bbfa>]
warn_slowpath_common+0x8a/0xc0<br>
[ 1895.003920] [<ffffffff8107bd2a>]
warn_slowpath_null+0x1a/0x20<br>
[ 1895.014099] [<ffffffff8104cc50>]
native_smp_send_reschedule+0x60/0x70<br>
[ 1895.024995] [<ffffffff810b897b>]
trigger_load_balance+0x13b/0x230<br>
[ 1895.035518] [<ffffffff810a7ab6>]
scheduler_tick+0xa6/0xd0<br>
[ 1895.045349] [<ffffffff810f7ac0>] ?
tick_sched_do_timer+0x30/0x30<br>
[ 1895.055802] [<ffffffff810e81b1>]
update_process_times+0x51/0x60<br>
[ 1895.066195] [<ffffffff810f74b5>]
tick_sched_handle.isra.15+0x25/0x60<br>
[ 1895.077027] [<ffffffff810f7b04>]
tick_sched_timer+0x44/0x80<br>
[ 1895.087031] [<ffffffff810e8d83>]
__hrtimer_run_queues+0xf3/0x220<br>
[ 1895.097489] [<ffffffff810e91e8>]
hrtimer_interrupt+0xa8/0x1a0<br>
[ 1895.107695] [<ffffffff8104f57c>]
local_apic_timer_interrupt+0x3c/0x70<br>
[ 1895.118653] [<ffffffff8180d7c1>]
smp_apic_timer_interrupt+0x41/0x60<br>
[ 1895.129425] [<ffffffff8180b95b>]
apic_timer_interrupt+0x6b/0x70<br>
[ 1895.139812] [<ffffffff818018a0>] ? panic+0x1d3/0x217<br>
[ 1895.149264] [<ffffffff8180189c>] ? panic+0x1cf/0x217<br>
[ 1895.158676] [<ffffffff810180a6>] oops_end+0xd6/0xe0<br>
[ 1895.167997] [<ffffffff810185cb>] die+0x4b/0x70<br>
[ 1895.176869] [<ffffffff810154bd>] do_trap+0x13d/0x150<br>
[ 1895.186265] [<ffffffff81015a99>]
do_error_trap+0x89/0x110<br>
[ 1895.196100] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0
[drbd]<br>
[ 1895.205911] [<ffffffff8118a069>] ?
__free_pages+0x19/0x30<br>
[ 1895.215521] [<ffffffff811dbf6a>] ?
__free_slab+0xda/0x1e0<br>
[ 1895.225002] [<ffffffff81015dc0>] do_invalid_op+0x20/0x30<br>
[ 1895.234412] [<ffffffff8180c41e>] invalid_op+0x1e/0x30<br>
[ 1895.243541] [<ffffffffc0ab0fe0>] ? lc_put+0x90/0xa0
[drbd]<br>
[ 1895.253039] [<ffffffffc0ab0d50>] ? lc_find+0x10/0x20
[drbd]<br>
[ 1895.262582] [<ffffffffc0aadd0a>] put_actlog+0x6a/0x120
[drbd]<br>
[ 1895.272310] [<ffffffffc0aae210>]
drbd_al_complete_io+0x30/0x40 [drbd]<br>
[ 1895.282784] [<ffffffffc0aa8342>]
drbd_req_destroy+0x442/0x880 [drbd]<br>
[ 1895.293104] [<ffffffffc0aa7996>] ?
drbd_req_put_completion_ref+0x116/0x350 [drbd]<br>
[ 1895.304576] [<ffffffffc0aa8c88>]
mod_rq_state+0x508/0x7c0 [drbd]<br>
[ 1895.314461] [<ffffffff811852bf>] ?
mempool_free+0x2f/0x90<br>
[ 1895.323681] [<ffffffffc0aa90f7>] __req_mod+0xd7/0x8d0
[drbd]<br>
[ 1895.333029] [<ffffffffc0a8ff81>]
drbd_request_endio+0x81/0x230 [drbd]<br>
[ 1895.343097] [<ffffffff813954c7>] bio_endio+0x57/0x90<br>
[ 1895.351572] [<ffffffff8139c31f>]
blk_update_request+0x8f/0x340<br>
[ 1895.360826] [<ffffffff81583f23>]
scsi_end_request+0x33/0x1c0<br>
[ 1895.369798] [<ffffffff815864d4>]
scsi_io_completion+0xc4/0x650<br>
[ 1895.378798] [<ffffffff8157d50f>]
scsi_finish_command+0xcf/0x120<br>
[ 1895.387812] [<ffffffff81585d26>]
scsi_softirq_done+0x126/0x150<br>
[ 1895.396678] [<ffffffff813a2f47>]
blk_done_softirq+0x87/0xb0<br>
[ 1895.405243] [<ffffffff81080095>]
__do_softirq+0x105/0x260<br>
[ 1895.413575] [<ffffffff8108034e>] irq_exit+0x8e/0x90<br>
[ 1895.421405] [<ffffffff8180d6f8>] do_IRQ+0x58/0xe0<br>
[ 1895.429037] [<ffffffff8180b66b>]
common_interrupt+0x6b/0x6b<br>
[ 1895.437471] <EOI> [<ffffffff8168d011>] ?
cpuidle_enter_state+0xf1/0x220<br>
[ 1895.447063] [<ffffffff8168cff0>] ?
cpuidle_enter_state+0xd0/0x220<br>
[ 1895.456071] [<ffffffff8168d177>] cpuidle_enter+0x17/0x20<br>
[ 1895.464262] [<ffffffff810be18b>] call_cpuidle+0x3b/0x70<br>
[ 1895.472391] [<ffffffff8168d153>] ?
cpuidle_select+0x13/0x20<br>
[ 1895.480850] [<ffffffff810be45c>]
cpu_startup_entry+0x29c/0x360<br>
[ 1895.489579] [<ffffffff8104d983>]
start_secondary+0x183/0x1c0<br>
[ 1895.498098] ---[ end trace 00eeba9098fc3949 ]---<br>
-------------------<br>
<br>
I was running "drbdadm status" every 2 seconds.<br>
This is its last output before the panic:<br>
-------------------<br>
r0 node-id:0 role:Primary suspended:no<br>
write-ordering:drain<br>
volume:0 minor:0 disk:UpToDate<br>
size:488336928 read:829508 written:5750835 al-writes:2689
bm-writes:0<br>
upper-pending:320 lower-pending:320 al-suspended:no
blocked:no<br>
srvvmhost2 node-id:1 connection:Connected role:Primary<br>
congested:no<br>
volume:0 replication:Established peer-disk:UpToDate<br>
resync-suspended:no<br>
received:1034427 sent:4717688 out-of-sync:0 pending:0
unacked:0<br>
-------------------<br>
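<br>
In case someone wants to follow these counters over time, here is a minimal Python sketch (not what I actually ran, just an illustration) that pulls the key:value fields out of the status text. The names STATUS and parse_fields are made up for this example, the indentation of the sample is assumed, and the sample only contains the local node part, because keys such as node-id, role and volume repeat in the peer section and would overwrite each other in a flat dict.<br>
<pre>
```python
import re

# Local-node part of the "drbdadm status" output shown above.
STATUS = """\
r0 node-id:0 role:Primary suspended:no
    write-ordering:drain
  volume:0 minor:0 disk:UpToDate
      size:488336928 read:829508 written:5750835 al-writes:2689 bm-writes:0
      upper-pending:320 lower-pending:320 al-suspended:no blocked:no
"""


def parse_fields(text):
    """Collect key:value tokens into a dict; numeric values become int.

    A repeated key overwrites the earlier value, so feed it one section
    (local node or one peer) at a time.
    """
    fields = {}
    for key, value in re.findall(r"([\w-]+):(\S+)", text):
        fields[key] = int(value) if value.isdigit() else value
    return fields


fields = parse_fields(STATUS)
# prints: 320 320 2689
print(fields["upper-pending"], fields["lower-pending"], fields["al-writes"])
```
</pre>
With something like this, each sample could be written to a file on another machine with a timestamp, so the last values before a panic are not lost.<br>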
<br>
So I suppose that version 9.0.1 does not address this bug.<br>
@Lars: can you confirm that?<br>
<br>
@Dietmar: what's my best option now?<br>
I'd like to stay on DRBD9, but I urgently need to get rid of this kernel
panic because the hosts are already in production.<br>
Compiling 8.4 myself could be an option, but I suppose Proxmox will
use 9.x from now on and never go back to 8.4.<br>
Am I right, or is there a special kernel version with 8.4?<br>
<br>
@Lars: in case of a downgrade (if I decide to build 8.4 myself
and enter versioning hell), is this the right procedure?<br>
1) move all of the VMs to node B<br>
2) downgrade node A module 9.0-->8.4<br>
3) ... resource metadata? ...<br>
4) reboot A (now 8.4) and reconnect to node B (still at 9.0)<br>
5) repeat 2) and 3) on node B<br>
<br>
Could you please help me with steps 3) and 4)?<br>
<br>
Thank you all for your help<br>
<br>
Claudio<br>
<br>
<span style=" font-family:Courier New; font-size:14px;
font-weight:normal" class="headerSpan">
<div class="moz-cite-prefix">On 24/02/2016 10:05, Claudio
wrote:<br>
</div>
</span>
<blockquote style="font-size: medium;"
cite="mid:56CD7260.7010302@gmail.com" type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
<div id="QCMcontainer" style="font-family:Courier
New;font-size:13px;">That's great, will test it immediately
and report back...<br>
<br>
Thanks<br>
<br>
<span style=" font-family:Courier New; font-size:14px;
font-weight:normal" class="headerSpan">
<div class="moz-cite-prefix">On 24/02/2016 10:01, Dietmar
Maurer wrote:<br>
</div>
</span>
<blockquote style="font-size: medium;"
cite="mid:1271103604.324.3bf4df64-2713-439a-b1d3-e77c4e102abb.open-xchange@webmail.proxmox.com"
type="cite">
<blockquote type="cite">
<pre wrap="">Upgrade to 9.0.1: @Lars, was this fixed in DRBD 9.0.1, so that I could ask
the Proxmox guys to build a kernel with this DRBD version (or try to
build it myself)?
</pre>
</blockquote>
<pre wrap="">I just built a new Proxmox kernel with 9.0.1 - will upload it to pvetest today ...
</pre>
</blockquote>
<br>
</div>
</blockquote>
<br>
</div>
</body>
</html>