<div dir="ltr"><div dir="ltr">Hello,<br><br>I hope this message finds you well. <br><br>We understand everyone is busy, and we would greatly appreciate any advice or guidance you could provide. This issue is critically important to our operations, and we're ready to provide any additional information or conduct further tests to assist in resolving it.<br><br>For context, we've encountered this issue across several clusters, including our production environments. We have specifically reproduced the issue on physical servers in our testing environment to eliminate additional layers of abstraction, such as virtualization.<br><br>Here is an example of the configuration for one of our DRBD resources:<br><br>```<br>resource "pvc-xxxxx"<br>{<br> <br> options<br> {<br> on-no-data-accessible suspend-io;<br> on-no-quorum suspend-io;<br> on-suspended-primary-outdated force-secondary;<br> quorum majority;<br> }<br> <br> net<br> {<br> cram-hmac-alg sha1;<br> shared-secret "xxxxx";<br> connect-int 10;<br> ping-int 15;<br> ping-timeout 20;<br> rr-conflict retry-connect;<br> timeout 90;<br> verify-alg "crct10dif-pclmul";<br> }<br> <br> on "node-1"<br> {<br> volume 0<br> {<br> disk /dev/vg/pvc-xxxxx_00000;<br> disk<br> {<br> discard-zeroes-if-aligned no;<br> rs-discard-granularity 4096;<br> }<br> meta-disk internal;<br> device minor 1008;<br> }<br> node-id 1;<br> }<br> <br> on "node-6"<br> {<br> volume 0<br> {<br> disk none;<br> disk<br> {<br> discard-zeroes-if-aligned no;<br> rs-discard-granularity 4096;<br> }<br> meta-disk internal;<br> device minor 1008;<br> }<br> node-id 0;<br> }<br> <br> on "node-9"<br> {<br> volume 0<br> {<br> disk /dev/drbd/this/is/not/used;<br> disk<br> {<br> discard-zeroes-if-aligned no;<br> rs-discard-granularity 4096;<br> }<br> meta-disk internal;<br> device minor 1008;<br> }<br> node-id 2;<br> }<br> <br> connection<br> {<br> host "node-1" address ipv4 x.x.x.x:7007;<br> host "node-6" address ipv4 x.x.x.x:7007;<br> }<br> <br> connection<br> {<br> host 
"node-1" address ipv4 x.x.x.x:7007;<br> host "node-9" address ipv4 x.x.x.x:7007;<br> }<br>}<br>```<br><br><br>Thank you for considering our request.<br clear="all"><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><pre lang="plaintext" style="color:rgb(0,0,0)"><pre lang="plaintext" style="text-wrap: wrap;"><span lang="plaintext">-- </span>
<span lang="plaintext">Best Regards,</span>
Aleksandr Zimin</pre>
</pre></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 20 Mar 2024 at 01:41, Aleksandr Zimin &lt;<a href="mailto:alexandr.zimin@flant.com">alexandr.zimin@flant.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p dir="auto" style="box-sizing:border-box;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;margin-top:0px">Hello,</p><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">We are experiencing a critical issue with DRBD 9.2.8 in a 5-node cluster. Occasionally, several servers in the cluster reboot unexpectedly; in one instance, all servers rebooted simultaneously. In the most recent incidents we were able to capture full call traces, which are included below. <span style="font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34)">We also observed similar reboots with version 9.2.5. 
We can reproduce this behavior on our test setups by continuously creating and deleting a large number of resources, each consisting of several replicas, under high disk and network load.</span></p><br><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">First Incident Call Trace:</p><div style="box-sizing:border-box;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;font-size:14px;overflow:auto"><pre style="box-sizing:border-box;font-size:11.9px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;border-radius:6px"><code style="box-sizing:border-box;font-size:11.9px;padding:0px;margin:0px;background:none;border-radius:6px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">Mar 17 01:38:20 offine-stand-stor-0 [ 4333.309042] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-2: Preparing remote state change 2720219251
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.329658] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-2: Committing remote state change 2720219251 (primary_nodes=0)
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.337403] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: pdsk( UpToDate -> Detaching ) [remote]
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.361074] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Preparing cluster-wide state change 2146026655 (1->-1 7680/1024)
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.368190] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change 2146026655: primary_nodes=0, weak_nodes=0
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.373618] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Committing cluster-wide state change 2146026655 (12ms)
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.381997] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: pdsk( Detaching -> Diskless ) [peer-state]
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.390526] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: disk( UpToDate -> Detaching ) [detach]
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.407933] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Would lose quorum, but using tiebreaker logic to keep
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.410349] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: disk( Detaching -> Diskless ) [go-diskless]
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.430930] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: drbd_bm_resize called with capacity == 0
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.495886] eth0: renamed from tmp1ef17
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.531000] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 17 01:38:20 offine-stand-stor-0 [ 4333.531916] IPv6: ADDRCONF(NETDEV_CHANGE): lxce958c5d240e4: link becomes ready
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.876144] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.877717] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change failed: State change was refused by peer node (-10)
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.878487] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Failed: susp-io( no -> quorum ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.879278] drbd /unregistered/pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Failed: quorum( yes -> no ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.880013] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-3: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.882034] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.884237] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: State change failed: State change was refused by peer node (-10)
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.885172] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479: Failed: susp-io( no -> quorum ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.886105] drbd /unregistered/pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070: Failed: quorum( yes -> no ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4333.887037] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479/0 drbd1070 offline-stand-stor-2: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
Mar 17 01:38:21 offine-stand-stor-0 [ 4334.013360] WARNING: chroot access!
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.082850] WARNING: chroot access!
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.693626] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown ) [del-peer]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.703867] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: Terminating sender thread
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.706924] drbd pvc-daaa4fed-540a-4bb8-ad50-d6ba07126479 offline-stand-stor-3: Starting sender thread (from drbd_r_pvc-daaa [13435])
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.762465] BUG: kernel NULL pointer dereference, address: 000000000000078c
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.763253] #PF: supervisor write access in kernel mode
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.763993] #PF: error_code(0x0002) - not-present page
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.764865] PGD 0 P4D 0
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.765591] Oops: 0002 [#1] SMP PTI
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.766259] CPU: 5 PID: 13435 Comm: drbd_r_pvc-daaa Kdump: loaded Tainted: G OE 5.15.0-83-generic #astra1+ci14
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.767117] Hardware name: Supermicro SYS-5039MS-H8TRF/X11SSD-F, BIOS 2.3 12/20/2019
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.767795] RIP: 0010:_raw_spin_lock_irq+0x17/0x40
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.768501] Code: cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 06 5d c3 cc cc cc cc 89 c6 e8 d6 c5 42 ff 66 90 5d
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.769906] RSP: 0018:ffffa24d3e1e3c48 EFLAGS: 00010046
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.770660] RAX: 0000000000000000 RBX: ffff902fc347e780 RCX: 0000000000000000
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.771336] RDX: 0000000000000001 RSI: ffffa24d3e1e3ca0 RDI: 000000000000078c
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.771997] RBP: ffffa24d3e1e3c48 R08: ffff90306da773e0 R09: ffff90306da773e0
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.772757] R10: ffff90306da773e0 R11: ffff90306da773e0 R12: 0000000000000001
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.773396] R13: ffff902e8a073000 R14: 000000000000078c R15: ffff90306da77000
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.774032] FS: 0000000000000000(0000) GS:ffff9035d7b40000(0000) knlGS:0000000000000000
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.774712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.775423] CR2: 000000000000078c CR3: 000000058e410005 CR4: 00000000003706e0
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.776071] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.776690] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.777303] Call Trace:
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.777930] <TASK>
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.778568] ? show_regs.cold.16+0x1a/0x1f
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.779240] ? __die_body+0x1f/0x70
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.779861] ? __die+0x2a/0x35
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.780448] ? page_fault_oops+0x136/0x2b0
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.781054] ? do_user_addr_fault+0x33e/0x660
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.781667] ? finish_task_switch+0x81/0x2a0
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.782385] ? exc_page_fault+0x7e/0x170
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.783023] ? asm_exc_page_fault+0x27/0x30
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.783604] ? _raw_spin_lock_irq+0x17/0x40
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.784143] drbd_free_peer_req+0xa9/0x240 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.784688] drbd_finish_peer_reqs+0xc2/0x180 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.785211] drain_resync_activity+0x579/0xdc0 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.785720] ? wake_up_q+0x4e/0x90
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.786204] ? __mutex_unlock_slowpath.isra.24+0x9c/0x110
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.786691] ? mutex_unlock+0x26/0x30
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.787162] conn_disconnect+0x1b3/0xa40 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.787643] drbd_receiver+0x5ef/0x990 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.788103] ? drbd_unplug_all_devices+0x50/0x50 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.788593] drbd_thread_setup+0x85/0x1e0 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.789081] ? inc_open_count+0xb0/0xb0 [drbd]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.789532] kthread+0x12d/0x150
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.790280] ? set_kthread_struct+0x50/0x50
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.790903] ret_from_fork+0x1f/0x30
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791402] </TASK>
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791792] Modules linked in: netconsole(E) drbd_transport_tcp(OE) udp_diag(E) ip_set(E) xt_CT(E) cls_bpf(E) sch_ingress(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) sch_fq(E) bcache(E) crc64(E) dm_cache(E) dm_writecache(E) xfrm_user(E) xfrm_algo(E) veth(E) nvme_rdma(E) nvme_fabrics(E) nvmet_rdma(E) nvmet(E) nvme_core(E) rdma_cm(E) iw_cm(E) ib_cm(E) nf_tables(E) ib_core(E) nfnetlink(E) xt_socket(E) nf_socket_ipv4(E) nf_socket_ipv6(E) ip6table_raw(E) iptable_raw(E) ip6table_filter(E) ip6table_nat(E) ip6table_mangle(E) ip6_tables(E) xt_MASQUERADE(E) xt_mark(E) iptable_nat(E) nf_nat(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_comment(E) iptable_filter(E) iptable_mangle(E) bpfilter(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) tcp_diag(E) inet_diag(E) aufs(E) overlay(E) intel_rapl_msr(E) intel_rapl_common(E) intel_tcc_cooling(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E)
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.791850] crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) ipmi_ssif(E) ast(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) cec(E) rc_core(E) mei_me(E) drm(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) joydev(E) sysimgblt(E) ee1004(E) mei(E) input_leds(E) intel_pch_thermal(E) ie31200_edac(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) mac_hid(E) handshake(OE) drbd(OE) lru_cache(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) parport_pc(E) ppdev(E) lp(E) parport(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) i2c_i801(E) i2c_smbus(E) igb(E) intel_ish_ipc(E) xhci_pci(E) i2c_algo_bit(E) xhci_pci_renesas(E) intel_ishtp(E) dca(E) video(E) parsec(OE) digsig_verif(OE) [last unloaded: netconsole]
Mar 17 01:38:22 offine-stand-stor-0 [ 4334.798957] CR2: 000000000000078c</code></pre></div><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">Second Incident Call Trace:</p><div style="box-sizing:border-box;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;font-size:14px;overflow:auto"><pre style="box-sizing:border-box;font-size:11.9px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;border-radius:6px"><code style="box-sizing:border-box;font-size:11.9px;padding:0px;margin:0px;background:none;border-radius:6px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">Mar 17 01:59:55 offine-stand-stor-0 [ 460.678989] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-2: Preparing remote state change 3569889254
Mar 17 01:59:55 offine-stand-stor-0 [ 460.699329] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-2: Committing remote state change 3569889254 (primary_nodes=0)
Mar 17 01:59:55 offine-stand-stor-0 [ 460.702481] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: pdsk( UpToDate -> Detaching ) [remote]
Mar 17 01:59:55 offine-stand-stor-0 [ 460.719818] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Preparing cluster-wide state change 411107339 (0->-1 7680/1024)
Mar 17 01:59:55 offine-stand-stor-0 [ 460.721900] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: pdsk( Detaching -> Diskless ) [peer-state]
Mar 17 01:59:55 offine-stand-stor-0 [ 460.735629] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change 411107339: primary_nodes=0, weak_nodes=0
Mar 17 01:59:55 offine-stand-stor-0 [ 460.736683] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Committing cluster-wide state change 411107339 (20ms)
Mar 17 01:59:55 offine-stand-stor-0 [ 460.737923] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: disk( UpToDate -> Detaching ) [detach]
Mar 17 01:59:55 offine-stand-stor-0 [ 460.740155] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Would lose quorum, but using tiebreaker logic to keep
Mar 17 01:59:55 offine-stand-stor-0 [ 460.740886] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: disk( Detaching -> Diskless ) [go-diskless]
Mar 17 01:59:55 offine-stand-stor-0 [ 460.758615] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: drbd_bm_resize called with capacity == 0
Mar 17 01:59:56 offine-stand-stor-0 [ 461.096657] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
Mar 17 01:59:56 offine-stand-stor-0 [ 461.098095] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change failed: State change was refused by peer node (-10)
Mar 17 01:59:56 offine-stand-stor-0 [ 461.098830] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Failed: susp-io( no -> quorum ) [del-minor]
Mar 17 01:59:56 offine-stand-stor-0 [ 461.099539] drbd /unregistered/pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Failed: quorum( yes -> no ) [del-minor]
Mar 17 01:59:56 offine-stand-stor-0 [ 461.100281] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-2: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
Mar 17 01:59:56 offine-stand-stor-0 [ 461.102205] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
Mar 17 01:59:56 offine-stand-stor-0 [ 461.104149] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change failed: State change was refused by peer node (-10)
Mar 17 01:59:56 offine-stand-stor-0 [ 461.105104] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Failed: susp-io( no -> quorum ) [del-minor]
Mar 17 01:59:56 offine-stand-stor-0 [ 461.106169] drbd /unregistered/pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054: Failed: quorum( yes -> no ) [del-minor]
Mar 17 01:59:56 offine-stand-stor-0 [ 461.107126] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a/0 drbd1054 offline-stand-stor-1: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off ) [del-minor]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.212691] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: sock was shut down by peer
Mar 17 01:59:57 offine-stand-stor-0 [ 462.212741] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: meta connection shut down by peer.
Mar 17 01:59:57 offine-stand-stor-0 [ 462.213491] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Mar 17 01:59:57 offine-stand-stor-0 [ 462.215789] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Preparing cluster-wide state change 605233927 (0->-1 0/0)
Mar 17 01:59:57 offine-stand-stor-0 [ 462.233366] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: Terminating sender thread
Mar 17 01:59:57 offine-stand-stor-0 [ 462.234120] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a offline-stand-stor-1: Starting sender thread (from drbd_r_pvc-fc9d [12946])
Mar 17 01:59:57 offine-stand-stor-0 [ 462.235535] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: State change 605233927: primary_nodes=0, weak_nodes=0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.236317] drbd pvc-fc9def6f-a8c8-442f-beb1-f5a61262fc7a: Committing cluster-wide state change 605233927 (24ms)
Mar 17 01:59:57 offine-stand-stor-0 [ 462.263975] BUG: kernel NULL pointer dereference, address: 000000000000038c
Mar 17 01:59:57 offine-stand-stor-0 [ 462.264705] #PF: supervisor write access in kernel mode
Mar 17 01:59:57 offine-stand-stor-0 [ 462.265422] #PF: error_code(0x0002) - not-present page
Mar 17 01:59:57 offine-stand-stor-0 [ 462.266124] PGD 0 P4D 0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.266813] Oops: 0002 [#1] SMP PTI
Mar 17 01:59:57 offine-stand-stor-0 [ 462.267525] CPU: 1 PID: 12946 Comm: drbd_r_pvc-fc9d Kdump: loaded Tainted: G OE 5.15.0-83-generic #astra1+ci14
Mar 17 01:59:57 offine-stand-stor-0 [ 462.268226] Hardware name: Supermicro SYS-5039MS-H8TRF/X11SSD-F, BIOS 2.3 12/20/2019
Mar 17 01:59:57 offine-stand-stor-0 [ 462.268918] RIP: 0010:_raw_spin_lock_irq+0x17/0x40
Mar 17 01:59:57 offine-stand-stor-0 [ 462.269616] Code: cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 06 5d c3 cc cc cc cc 89 c6 e8 d6 c5 42 ff 66 90 5d
Mar 17 01:59:57 offine-stand-stor-0 [ 462.271066] RSP: 0018:ffffb93ebe2b7c48 EFLAGS: 00010046
Mar 17 01:59:57 offine-stand-stor-0 [ 462.271840] RAX: 0000000000000000 RBX: ffff9ec8c6fc3e40 RCX: 0000000000000000
Mar 17 01:59:57 offine-stand-stor-0 [ 462.272555] RDX: 0000000000000001 RSI: ffffb93ebe2b7ca0 RDI: 000000000000038c
Mar 17 01:59:57 offine-stand-stor-0 [ 462.273248] RBP: ffffb93ebe2b7c48 R08: ffff9eca8f6fd3e0 R09: ffff9eca8f6fd3e0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.273944] R10: ffff9eca8f6fd3e0 R11: ffff9eca8f6fd3e0 R12: 0000000000000001
Mar 17 01:59:57 offine-stand-stor-0 [ 462.274600] R13: ffff9ec9e441b800 R14: 000000000000038c R15: ffff9eca8f6fd000
Mar 17 01:59:57 offine-stand-stor-0 [ 462.275252] FS: 0000000000000000(0000) GS:ffff9ed017a40000(0000) knlGS:0000000000000000
Mar 17 01:59:57 offine-stand-stor-0 [ 462.275900] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 01:59:57 offine-stand-stor-0 [ 462.276568] CR2: 000000000000038c CR3: 0000000312c10006 CR4: 00000000003706e0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.277207] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 17 01:59:57 offine-stand-stor-0 [ 462.277862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 17 01:59:57 offine-stand-stor-0 [ 462.278534] Call Trace:
Mar 17 01:59:57 offine-stand-stor-0 [ 462.279259] <TASK>
Mar 17 01:59:57 offine-stand-stor-0 [ 462.279938] ? show_regs.cold.16+0x1a/0x1f
Mar 17 01:59:57 offine-stand-stor-0 [ 462.280548] ? __die_body+0x1f/0x70
Mar 17 01:59:57 offine-stand-stor-0 [ 462.281175] ? __die+0x2a/0x35
Mar 17 01:59:57 offine-stand-stor-0 [ 462.281752] ? page_fault_oops+0x136/0x2b0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.282375] ? do_user_addr_fault+0x33e/0x660
Mar 17 01:59:57 offine-stand-stor-0 [ 462.282953] ? finish_task_switch+0x81/0x2a0
Mar 17 01:59:57 offine-stand-stor-0 [ 462.283551] ? exc_page_fault+0x7e/0x170
Mar 17 01:59:57 offine-stand-stor-0 [ 462.284180] ? asm_exc_page_fault+0x27/0x30
Mar 17 01:59:57 offine-stand-stor-0 [ 462.284738] ? _raw_spin_lock_irq+0x17/0x40
Mar 17 01:59:57 offine-stand-stor-0 [ 462.285355] drbd_free_peer_req+0xa9/0x240 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.285905] drbd_finish_peer_reqs+0xc2/0x180 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.286463] drain_resync_activity+0x579/0xdc0 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.287001] ? wake_up_q+0x4e/0x90
Mar 17 01:59:57 offine-stand-stor-0 [ 462.287483] ? __mutex_unlock_slowpath.isra.24+0x9c/0x110
Mar 17 01:59:57 offine-stand-stor-0 [ 462.288077] ? mutex_unlock+0x26/0x30
Mar 17 01:59:57 offine-stand-stor-0 [ 462.288539] conn_disconnect+0x1b3/0xa40 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.289045] drbd_receiver+0x5ef/0x990 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.289515] ? drbd_unplug_all_devices+0x50/0x50 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.290042] drbd_thread_setup+0x85/0x1e0 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.290486] ? inc_open_count+0xb0/0xb0 [drbd]
Mar 17 01:59:57 offine-stand-stor-0 [ 462.290925] kthread+0x12d/0x150
Mar 17 01:59:57 offine-stand-stor-0 [ 462.291348] ? set_kthread_struct+0x50/0x50
Mar 17 01:59:57 offine-stand-stor-0 [ 462.291761] ret_from_fork+0x1f/0x30
Mar 17 01:59:57 offine-stand-stor-0 [ 462.292192] </TASK>
Mar 17 01:59:57 offine-stand-stor-0 [ 462.292583] Modules linked in: drbd_transport_tcp(OE) udp_diag(E) ip_set(E) xt_CT(E) cls_bpf(E) sch_ingress(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) sch_fq(E) bcache(E) crc64(E) xfrm_user(E) dm_cache(E) xfrm_algo(E) dm_writecache(E) veth(E) nf_tables(E) nfnetlink(E) xt_socket(E) nf_socket_ipv4(E) nf_socket_ipv6(E) ip6table_raw(E) iptable_raw(E) nvme_rdma(E) nvme_fabrics(E) nvmet_rdma(E) nvmet(E) nvme_core(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) ip6table_filter(E) ip6table_nat(E) ip6table_mangle(E) ip6_tables(E) xt_MASQUERADE(E) xt_mark(E) iptable_nat(E) nf_nat(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_comment(E) iptable_filter(E) iptable_mangle(E) bpfilter(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) tcp_diag(E) inet_diag(E) aufs(E) overlay(E) intel_rapl_msr(E) intel_rapl_common(E) intel_tcc_cooling(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E)
Mar 17 01:59:57 offine-stand-stor-0 [ 462.292625] ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) ipmi_ssif(E) ast(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) cec(E) rc_core(E) drm(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) input_leds(E) joydev(E) sysimgblt(E) ee1004(E) mei_me(E) acpi_ipmi(E) intel_pch_thermal(E) mei(E) ipmi_si(E) ie31200_edac(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) mac_hid(E) netconsole(E) handshake(OE) drbd(OE) lru_cache(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) parport_pc(E) ppdev(E) lp(E) parport(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) i2c_i801(E) i2c_smbus(E) igb(E) intel_ish_ipc(E) xhci_pci(E) i2c_algo_bit(E) xhci_pci_renesas(E) intel_ishtp(E) dca(E) video(E) parsec(OE) digsig_verif(OE)
Mar 17 01:59:57 offine-stand-stor-0 [ 462.299942] CR2: 000000000000038c
<br></code></pre></div><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">DRBD version:</p><div style="box-sizing:border-box;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;;overflow:auto"><pre style="box-sizing:border-box;font-size:11.9px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;border-radius:6px"><code style="box-sizing:border-box;font-size:11.9px;padding:0px;margin:0px;background:none;border-radius:6px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">cat /proc/drbd
version: 9.2.8 (api:2/proto:86-122)
GIT-hash:123456 build by @offine-stand-stor-0, 2024-03-14 14:27:44
Transports (api:20): tcp (9.2.8)</code></pre></div><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">The attached log file contains more detail from around the time of the kernel panic.<br style="box-sizing:border-box"></p><p dir="auto" style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(31,35,40);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;"><span style="font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,&quot;Noto Sans&quot;,Helvetica,Arial,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;">Thank you in advance for your support.</span><br></p><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><pre lang="plaintext" style="color:rgb(0,0,0)"><pre lang="plaintext"><span lang="plaintext">-- </span>
<span lang="plaintext">Best Regards,</span>
Aleksandr Zimin</pre></pre></div></div></div></div>
</blockquote></div></div>