Kernel Panic with 9.2.8
Aleksandr Zimin
alexandr.zimin at flant.com
Mon Mar 25 22:33:29 CET 2024
Hello,
I hope this message finds you well.
We understand everyone is busy, and we would greatly appreciate any advice
or guidance you could provide. This issue is critically important to our
operations, and we're ready to provide any additional information or
conduct further tests to assist in resolving it.
For context, we've encountered this issue across several clusters,
including our production environments. We have specifically reproduced the
issue on physical servers in our testing environment to eliminate
additional layers of abstraction, such as virtualization.
Here is an example of the configuration for one of our DRBD resources:
```
resource "pvc-xxxxx"
{
    options
    {
        on-no-data-accessible suspend-io;
        on-no-quorum suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum majority;
    }
    net
    {
        cram-hmac-alg sha1;
        shared-secret "xxxxx";
        connect-int 10;
        ping-int 15;
        ping-timeout 20;
        rr-conflict retry-connect;
        timeout 90;
        verify-alg "crct10dif-pclmul";
    }
    on "node-1"
    {
        volume 0
        {
            disk /dev/vg/pvc-xxxxx_00000;
            disk
            {
                discard-zeroes-if-aligned no;
                rs-discard-granularity 4096;
            }
            meta-disk internal;
            device minor 1008;
        }
        node-id 1;
    }
    on "node-6"
    {
        volume 0
        {
            disk none;
            disk
            {
                discard-zeroes-if-aligned no;
                rs-discard-granularity 4096;
            }
            meta-disk internal;
            device minor 1008;
        }
        node-id 0;
    }
    on "node-9"
    {
        volume 0
        {
            disk /dev/drbd/this/is/not/used;
            disk
            {
                discard-zeroes-if-aligned no;
                rs-discard-granularity 4096;
            }
            meta-disk internal;
            device minor 1008;
        }
        node-id 2;
    }
    connection
    {
        host "node-1" address ipv4 x.x.x.x:7007;
        host "node-6" address ipv4 x.x.x.x:7007;
    }
    connection
    {
        host "node-1" address ipv4 x.x.x.x:7007;
        host "node-9" address ipv4 x.x.x.x:7007;
    }
}
```
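For anyone who wants to check many resource files like the one above mechanically, here is a minimal Python sketch. It is not part of any DRBD tooling: the function name `summarize_hosts` and the line-based parsing are assumptions made for illustration, and the sample text is a trimmed copy of the resource shown above. It only collects each host's `node-id` and whether its volume is declared with `disk none`.

```python
import re

# Line-oriented patterns; they assume one statement per line, as in the
# resource file shown above. This is not a full drbd.conf parser.
ON_RE = re.compile(r'^\s*on\s+"([^"]+)"')
NODE_ID_RE = re.compile(r'^\s*node-id\s+(\d+);')
DISK_NONE_RE = re.compile(r'^\s*disk\s+none;')


def summarize_hosts(config_text):
    """Map each host name to its node-id and a 'disk none' flag."""
    hosts = {}
    current = None
    for line in config_text.splitlines():
        m = ON_RE.match(line)
        if m:
            # A new 'on "<host>"' section starts here.
            current = m.group(1)
            hosts[current] = {"node_id": None, "diskless": False}
            continue
        if current is None:
            continue
        m = NODE_ID_RE.match(line)
        if m:
            hosts[current]["node_id"] = int(m.group(1))
        elif DISK_NONE_RE.match(line):
            hosts[current]["diskless"] = True
    return hosts


# Trimmed copy of the resource above, used only as sample input.
SAMPLE = '''
resource "pvc-xxxxx"
{
on "node-1"
{
volume 0
{
disk /dev/vg/pvc-xxxxx_00000;
}
node-id 1;
}
on "node-6"
{
volume 0
{
disk none;
}
node-id 0;
}
on "node-9"
{
volume 0
{
disk /dev/drbd/this/is/not/used;
}
node-id 2;
}
}
'''

hosts = summarize_hosts(SAMPLE)
node_ids = [h["node_id"] for h in hosts.values()]
# Each host in a resource is expected to carry a distinct node-id.
ids_unique = len(set(node_ids)) == len(node_ids)
```

Running this over every generated resource file would give a quick view of which hosts are declared with `disk none` and whether any `node-id` is reused.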
Thank you for considering our request.
-- Best Regards,
Aleksandr Zimin
On Wed, 20 Mar 2024 at 01:41, Aleksandr Zimin <alexandr.zimin at flant.com>
wrote:
> Hello,
>
> We are experiencing a critical issue with DRBD 9.2.8 running in a 5-node
> cluster environment. Occasionally, several servers in the cluster reboot
> unexpectedly. In one instance, all servers rebooted simultaneously.
> Most recently, we were able to capture full call traces for two such
> reboots. We also observed similar reboots with version 9.2.5.
> We can reproduce this behavior on our test setups by continuously
> creating and deleting a large number of resources, each composed of several
> replicas, under high disk and network load.
>
> First Incident Call Trace:
>
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.309042] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-2: Preparing remote state change 2720219251
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.329658] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-2: Committing remote state change 2720219251 (primary_nodes=0)
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.337403] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: pdsk( UpToDate -> Detaching ) [remote]
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.361074] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Preparing cluster-wide state change 2146026655 (1->-1 7680/1024)
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.368190] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change 2146026655: primary_nodes=0, weak_nodes=0
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.373618] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Committing cluster-wide state change 2146026655 (12ms)
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.381997] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: pdsk( Detaching -> Diskless ) [peer-state]
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.390526] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: disk( UpToDate -> Detaching ) [detach]
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.407933] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Would lose quorum, but using tiebreaker logic to keep
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.410349] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: disk( Detaching -> Diskless ) [go-diskless]
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.430930] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: drbd_bm_resize called with capacity == 0
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.495886] eth0: renamed from tmp1ef17
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.531000] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Mar 17 01:38:20 offine-stand-stor-0 [ 4333.531916] IPv6: ADDRCONF(NETDEV_CHANGE): lxce958c5d240e4: link becomes ready
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.876144] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.877717] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change failed: State change was refused by peer node (-10)
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.878487] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Failed: susp-io( no -> quorum ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.879278] drbd /unregistered/pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Failed: quorum( yes -> no ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.880013] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-3: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.882034] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.884237] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change failed: State change was refused by peer node (-10)
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.885172] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Failed: susp-io( no -> quorum ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.886105] drbd /unregistered/pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Failed: quorum( yes -> no ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4333.887037] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
> Mar 17 01:38:21 offine-stand-stor-0 [ 4334.013360] WARNING: chroot access!
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.082850] WARNING: chroot access!
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.693626] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown ) [del-peer]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.703867] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: Terminating sender thread
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.706924] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: Starting sender thread (from drbd_r_pvc-daaa [13435])
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.762465] BUG: kernel NULL pointer dereference, address: 000000000000078c
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.763253] #PF: supervisor write access in kernel mode
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.763993] #PF: error_code(0x0002) - not-present page
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.764865] PGD 0 P4D 0
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.765591] Oops: 0002 [#1] SMP PTI
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.766259] CPU: 5 PID: 13435 Comm: drbd_r_pvc-daaa Kdump: loaded Tainted: G OE 5.15.0-83-generic #astra1+ci14
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.767117] Hardware name: Supermicro SYS-5039MS-H8TRF/X11SSD-F, BIOS 2.3 12/20/2019
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.767795] RIP: 0010:_raw_spin_lock_irq+0x17/0x40
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.768501] Code: cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 06 5d c3 cc cc cc cc 89 c6 e8 d6 c5 42 ff 66 90 5d
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.769906] RSP: 0018:ffffa24d3e1e3c48 EFLAGS: 00010046
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.770660] RAX: 0000000000000000 RBX: ffff902fc347e780 RCX: 0000000000000000
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.771336] RDX: 0000000000000001 RSI: ffffa24d3e1e3ca0 RDI: 000000000000078c
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.771997] RBP: ffffa24d3e1e3c48 R08: ffff90306da773e0 R09: ffff90306da773e0
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.772757] R10: ffff90306da773e0 R11: ffff90306da773e0 R12: 0000000000000001
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.773396] R13: ffff902e8a073000 R14: 000000000000078c R15: ffff90306da77000
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.774032] FS: 0000000000000000(0000) GS:ffff9035d7b40000(0000) knlGS:0000000000000000
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.774712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.775423] CR2: 000000000000078c CR3: 000000058e410005 CR4: 00000000003706e0
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.776071] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.776690] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.777303] Call Trace:
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.777930] <TASK>
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.778568] ? show_regs.cold.16+0x1a/0x1f
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.779240] ? __die_body+0x1f/0x70
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.779861] ? __die+0x2a/0x35
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.780448] ? page_fault_oops+0x136/0x2b0
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.781054] ? do_user_addr_fault+0x33e/0x660
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.781667] ? finish_task_switch+0x81/0x2a0
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.782385] ? exc_page_fault+0x7e/0x170
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.783023] ? asm_exc_page_fault+0x27/0x30
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.783604] ? _raw_spin_lock_irq+0x17/0x40
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.784143] drbd_free_peer_req+0xa9/0x240 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.784688] drbd_finish_peer_reqs+0xc2/0x180 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.785211] drain_resync_activity+0x579/0xdc0 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.785720] ? wake_up_q+0x4e/0x90
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.786204] ? __mutex_unlock_slowpath.isra.24+0x9c/0x110
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.786691] ? mutex_unlock+0x26/0x30
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.787162] conn_disconnect+0x1b3/0xa40 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.787643] drbd_receiver+0x5ef/0x990 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.788103] ? drbd_unplug_all_devices+0x50/0x50 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.788593] drbd_thread_setup+0x85/0x1e0 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.789081] ? inc_open_count+0xb0/0xb0 [drbd]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.789532] kthread+0x12d/0x150
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.790280] ? set_kthread_struct+0x50/0x50
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.790903] ret_from_fork+0x1f/0x30
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791402] </TASK>
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791792] Modules linked in: netconsole(E) drbd_transport_tcp(OE) udp_diag(E) ip_set(E) xt_CT(E) cls_bpf(E) sch_ingress(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) sch_fq(E) bcache(E) crc64(E) dm_cache(E) dm_writecache(E) xfrm_user(E) xfrm_algo(E) veth(E) nvme_rdma(E) nvme_fabrics(E) nvmet_rdma(E) nvmet(E) nvme_core(E) rdma_cm(E) iw_cm(E) ib_cm(E) nf_tables(E) ib_core(E) nfnetlink(E) xt_socket(E) nf_socket_ipv4(E) nf_socket_ipv6(E) ip6table_raw(E) iptable_raw(E) ip6table_filter(E) ip6table_nat(E) ip6table_mangle(E) ip6_tables(E) xt_MASQUERADE(E) xt_mark(E) iptable_nat(E) nf_nat(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_comment(E) iptable_filter(E) iptable_mangle(E) bpfilter(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) tcp_diag(E) inet_diag(E) aufs(E) overlay(E) intel_rapl_msr(E) intel_rapl_common(E) intel_tcc_cooling(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E)
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791850] crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) ipmi_ssif(E) ast(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) cec(E) rc_core(E) mei_me(E) drm(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) joydev(E) sysimgblt(E) ee1004(E) mei(E) input_leds(E) intel_pch_thermal(E) ie31200_edac(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) mac_hid(E) handshake(OE) drbd(OE) lru_cache(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) parport_pc(E) ppdev(E) lp(E) parport(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) i2c_i801(E) i2c_smbus(E) igb(E) intel_ish_ipc(E) xhci_pci(E) i2c_algo_bit(E) xhci_pci_renesas(E) intel_ishtp(E) dca(E) video(E) parsec(OE) digsig_verif(OE) [last unloaded: netconsole]
> Mar 17 01:38:22 offine-stand-stor-0 [ 4334.798957] CR2: 000000000000078c
>
> Second Incident Call Trace:
>
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.678989] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-2: Preparing remote state change 3569889254
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.699329] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-2: Committing remote state change 3569889254 (primary_nodes=0)
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.702481] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: pdsk( UpToDate -> Detaching ) [remote]
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.719818] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Preparing cluster-wide state change 411107339 (0->-1 7680/1024)
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.721900] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: pdsk( Detaching -> Diskless ) [peer-state]
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.735629] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change 411107339: primary_nodes=0, weak_nodes=0
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.736683] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Committing cluster-wide state change 411107339 (20ms)
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.737923] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: disk( UpToDate -> Detaching ) [detach]
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.740155] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Would lose quorum, but using tiebreaker logic to keep
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.740886] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: disk( Detaching -> Diskless ) [go-diskless]
> Mar 17 01:59:55 offine-stand-stor-0 [ 460.758615] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: drbd_bm_resize called with capacity == 0
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.096657] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.098095] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change failed: State change was refused by peer node (-10)
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.098830] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Failed: susp-io( no -> quorum ) [del-minor]
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.099539] drbd /unregistered/pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Failed: quorum( yes -> no ) [del-minor]
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.100281] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.102205] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.104149] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change failed: State change was refused by peer node (-10)
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.105104] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Failed: susp-io( no -> quorum ) [del-minor]
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.106169] drbd /unregistered/pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Failed: quorum( yes -> no ) [del-minor]
> Mar 17 01:59:56 offine-stand-stor-0 [ 461.107126] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-1: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.212691] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: sock was shut down by peer
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.212741] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: meta connection shut down by peer.
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.213491] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.215789] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Preparing cluster-wide state change 605233927 (0->-1 0/0)
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.233366] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: Terminating sender thread
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.234120] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: Starting sender thread (from drbd_r_pvc-fc9d [12946])
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.235535] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change 605233927: primary_nodes=0, weak_nodes=0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.236317] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Committing cluster-wide state change 605233927 (24ms)
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.263975] BUG: kernel NULL pointer dereference, address: 000000000000038c
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.264705] #PF: supervisor write access in kernel mode
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.265422] #PF: error_code(0x0002) - not-present page
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.266124] PGD 0 P4D 0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.266813] Oops: 0002 [#1] SMP PTI
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.267525] CPU: 1 PID: 12946 Comm: drbd_r_pvc-fc9d Kdump: loaded Tainted: G OE 5.15.0-83-generic #astra1+ci14
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.268226] Hardware name: Supermicro SYS-5039MS-H8TRF/X11SSD-F, BIOS 2.3 12/20/2019
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.268918] RIP: 0010:_raw_spin_lock_irq+0x17/0x40
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.269616] Code: cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 06 5d c3 cc cc cc cc 89 c6 e8 d6 c5 42 ff 66 90 5d
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.271066] RSP: 0018:ffffb93ebe2b7c48 EFLAGS: 00010046
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.271840] RAX: 0000000000000000 RBX: ffff9ec8c6fc3e40 RCX: 0000000000000000
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.272555] RDX: 0000000000000001 RSI: ffffb93ebe2b7ca0 RDI: 000000000000038c
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.273248] RBP: ffffb93ebe2b7c48 R08: ffff9eca8f6fd3e0 R09: ffff9eca8f6fd3e0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.273944] R10: ffff9eca8f6fd3e0 R11: ffff9eca8f6fd3e0 R12: 0000000000000001
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.274600] R13: ffff9ec9e441b800 R14: 000000000000038c R15: ffff9eca8f6fd000
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.275252] FS: 0000000000000000(0000) GS:ffff9ed017a40000(0000) knlGS:0000000000000000
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.275900] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.276568] CR2: 000000000000038c CR3: 0000000312c10006 CR4: 00000000003706e0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.277207] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.277862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.278534] Call Trace:
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.279259] <TASK>
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.279938] ? show_regs.cold.16+0x1a/0x1f
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.280548] ? __die_body+0x1f/0x70
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.281175] ? __die+0x2a/0x35
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.281752] ? page_fault_oops+0x136/0x2b0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.282375] ? do_user_addr_fault+0x33e/0x660
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.282953] ? finish_task_switch+0x81/0x2a0
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.283551] ? exc_page_fault+0x7e/0x170
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.284180] ? asm_exc_page_fault+0x27/0x30
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.284738] ? _raw_spin_lock_irq+0x17/0x40
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.285355] drbd_free_peer_req+0xa9/0x240 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.285905] drbd_finish_peer_reqs+0xc2/0x180 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.286463] drain_resync_activity+0x579/0xdc0 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.287001] ? wake_up_q+0x4e/0x90
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.287483] ? __mutex_unlock_slowpath.isra.24+0x9c/0x110
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.288077] ? mutex_unlock+0x26/0x30
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.288539] conn_disconnect+0x1b3/0xa40 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.289045] drbd_receiver+0x5ef/0x990 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.289515] ? drbd_unplug_all_devices+0x50/0x50 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.290042] drbd_thread_setup+0x85/0x1e0 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.290486] ? inc_open_count+0xb0/0xb0 [drbd]
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.290925] kthread+0x12d/0x150
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.291348] ? set_kthread_struct+0x50/0x50
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.291761] ret_from_fork+0x1f/0x30
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.292192] </TASK>
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.292583] Modules linked in: drbd_transport_tcp(OE) udp_diag(E) ip_set(E) xt_CT(E) cls_bpf(E) sch_ingress(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) sch_fq(E) bcache(E) crc64(E) xfrm_user(E) dm_cache(E) xfrm_algo(E) dm_writecache(E) veth(E) nf_tables(E) nfnetlink(E) xt_socket(E) nf_socket_ipv4(E) nf_socket_ipv6(E) ip6table_raw(E) iptable_raw(E) nvme_rdma(E) nvme_fabrics(E) nvmet_rdma(E) nvmet(E) nvme_core(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) ip6table_filter(E) ip6table_nat(E) ip6table_mangle(E) ip6_tables(E) xt_MASQUERADE(E) xt_mark(E) iptable_nat(E) nf_nat(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_comment(E) iptable_filter(E) iptable_mangle(E) bpfilter(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) tcp_diag(E) inet_diag(E) aufs(E) overlay(E) intel_rapl_msr(E) intel_rapl_common(E) intel_tcc_cooling(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E)
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.292625] ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) ipmi_ssif(E) ast(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) cec(E) rc_core(E) drm(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) input_leds(E) joydev(E) sysimgblt(E) ee1004(E) mei_me(E) acpi_ipmi(E) intel_pch_thermal(E) mei(E) ipmi_si(E) ie31200_edac(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) mac_hid(E) netconsole(E) handshake(OE) drbd(OE) lru_cache(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) parport_pc(E) ppdev(E) lp(E) parport(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) i2c_i801(E) i2c_smbus(E) igb(E) intel_ish_ipc(E) xhci_pci(E) i2c_algo_bit(E) xhci_pci_renesas(E) intel_ishtp(E) dca(E) video(E) parsec(OE) digsig_verif(OE)
> Mar 17 01:59:57 offine-stand-stor-0 [ 462.299942] CR2: 000000000000038c
>
>
> DRBD version:
>
> cat /proc/drbd
> version: 9.2.8 (api:2/proto:86-122)
> GIT-hash:123456 build by @offine-stand-stor-0, 2024-03-14 14:27:44
> Transports (api:20): tcp (9.2.8)
>
> Please find the attached log file for more detailed information
> surrounding the kernel panic incident.
>
> Thank you in advance for your support.
>
> -- Best Regards,
> Aleksandr Zimin
>
>
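Both call traces quoted above fault in `_raw_spin_lock_irq` and share the same DRBD call chain (`drbd_free_peer_req`, `drbd_finish_peer_reqs`, `drain_resync_activity`, `conn_disconnect`, `drbd_receiver`), while the faulting addresses differ (`0x78c` and `0x38c`). To compare further incidents in the same way, here is a minimal Python sketch that pulls those fields out of a journal or dmesg dump. The function name `summarize_oops` and the regular expressions are assumptions for illustration; they are tailored to the log format shown above and are not a general oops parser.

```python
import re

# Patterns tailored to the oops format in the logs quoted above.
BUG_RE = re.compile(
    r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: \d+:(\S+)")
# Frames printed without the leading "? " marker, limited to the drbd module.
FRAME_RE = re.compile(
    r"\]\s+(?!\? )([A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+) \[drbd\]")


def summarize_oops(lines):
    """Return one dict per NULL-dereference oops found in the log lines."""
    reports = []
    current = None
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            # Start a new report at each "BUG:" header.
            current = {"address": int(m.group(1), 16),
                       "rip": None,
                       "drbd_frames": []}
            reports.append(current)
            continue
        if current is None:
            continue
        m = RIP_RE.search(line)
        if m and current["rip"] is None:
            current["rip"] = m.group(1)
            continue
        m = FRAME_RE.search(line)
        if m:
            current["drbd_frames"].append(m.group(1))
    return reports


# A few lines copied from the first incident, used only as sample input.
SAMPLE = [
    "[ 4334.762465] BUG: kernel NULL pointer dereference, address: 000000000000078c",
    "[ 4334.767795] RIP: 0010:_raw_spin_lock_irq+0x17/0x40",
    "[ 4334.783604] ? _raw_spin_lock_irq+0x17/0x40",
    "[ 4334.784143] drbd_free_peer_req+0xa9/0x240 [drbd]",
    "[ 4334.784688] drbd_finish_peer_reqs+0xc2/0x180 [drbd]",
    "[ 4334.788103] ? drbd_unplug_all_devices+0x50/0x50 [drbd]",
]

reports = summarize_oops(SAMPLE)
for r in reports:
    print(hex(r["address"]), r["rip"], r["drbd_frames"])
```

Feeding the full log of each node through `summarize_oops` makes it easy to see whether every crash goes through the same DRBD functions or whether other code paths are involved.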