Hello there,

not sure if this was already reported, but here goes:

3 node setup following the manual, all settings on defaults with two exceptions:

[GLOBAL]
storage-plugin = drbdmanage.storage.lvm.Lvm

common {
    net {
        verify-alg crc32c;
    }
}

My workload (KVM with guest disk cache=none) every now and then triggers the "Digest mismatch, buffer modified by upper layers during write:" error.

As per the documentation, a digest mismatch should immediately cause a disconnect/reconnect/resync.

First observed problem: in my case the disconnect happens immediately only some of the time. At other times many of those messages are repeated, and a couple of minutes can pass before the disconnect finally happens.

Second observed problem: sometimes, when such a disconnect happens, the drbd thread on the primary node crashes. After that the kernel keeps working for a moment before all CPUs finally lock up and the node needs a power cycle.

Disabling verify-alg helps, but maybe simply because there are no more disconnects happening to trigger the crash?

software versions:
---
[ 0.000000] Linux version 4.2.8-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Fri Feb 26 16:37:36 CET 2016 ()
(...)
[ 116.274693] drbd: initialized. Version: 9.0.1-1 (api:2/proto:86-111)
[ 116.281869] drbd: GIT-hash: 3d38916489fac62b036d8e79d3fcd81d318ca4cb build by root@elsa, 2016-02-26 16:42:55
---

crash relevant dmesg output:
---
[114257.391726] drbd vm-10030-disk-1/0 drbd104: Digest mismatch, buffer modified by upper layers during write: 3534072s +163840
[114257.408687] drbd vm-10030-disk-1 hn51: Connection closed
[114257.414970] drbd vm-10030-disk-1 hn51: conn( Disconnecting -> StandAlone )
[114257.422821] drbd vm-10030-disk-1 hn51: Terminating receiver thread
[114257.450955] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[114257.459878] IP: [<ffffffffc04b0ffa>] _tl_restart+0xaa/0xf0 [drbd]
[114257.466834] PGD 0
[114257.469214] Oops: 0000 [#1] SMP
[114257.472968] Modules linked in: veth act_police cls_u32 sch_ingress sch_htb drbd_transport_tcp(O) drbd(O) ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment xt_tcpudp xt_mark xt_addrtype ip_set_hash_net softdog iptable_filter nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 8021q garp mrp openvswitch libcrc32c bonding xt_set ip_set xt_multiport xt_conntrack ip6table_filter ip6_tables xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul snd_pcm crc32_pclmul snd_timer aesni_intel snd aes_x86_64 soundcore lrw uas gf128mul glue_helper ablk_helper cdc_ether cryptd pcspkr sb_edac
[114257.559773] usbnet usb_storage mii edac_core lpc_ich 8250_fintek shpchp mac_hid ioatdma wmi ipmi_ssif vhost_net vhost macvtap macvlan ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 igb i2c_algo_bit dca ahci ptp libahci pps_core megaraid_sas
[114257.583826] CPU: 5 PID: 10103 Comm: drbd_r_vm-10030 Tainted: G O 4.2.8-1-pve #1
[114257.593406] Hardware name: IBM System x3650 M4 -[7915E3G]-/00W2665, BIOS -[VVE146AUS-2.00]- 09/17/2015
[114257.603957] task: ffff881003872940 ti: ffff88201b2ec000 task.ti: ffff88201b2ec000
[114257.612467] RIP: 0010:[<ffffffffc04b0ffa>] [<ffffffffc04b0ffa>] _tl_restart+0xaa/0xf0 [drbd]
[114257.622159] RSP: 0018:ffff88201b2efcd8 EFLAGS: 00010082
[114257.628219] RAX: ffff88103573d1e0 RBX: ffff881fc0d56db0 RCX: 0000000000000000
[114257.636342] RDX: ffff881fc0d56e08 RSI: 0000000000000092 RDI: 0000000000000092
[114257.644464] RBP: ffff88201b2efd38 R08: 0000000000000000 R09: 0000000180190014
[114257.652587] R10: ffff88103fb60720 R11: ffff8810010f7200 R12: ffff882035dbd930
[114257.660709] R13: ffff881035e22000 R14: 0000000000000000 R15: ffff881fc0d56db0
[114257.668832] FS: 0000000000000000(0000) GS:ffff88103fb40000(0000) knlGS:0000000000000000
[114257.678022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[114257.684567] CR2: 0000000000000068 CR3: 0000000001e0d000 CR4: 00000000000426e0
[114257.692689] Stack:
[114257.695055] ffff881035e22048 0000000b00000286 ffff88201b2efd18 ffff880f7a01df00
[114257.703525] 0000000000000000 0000000016746d9c ffff882035dbdbb0 ffff882035dbd918
[114257.711992] ffff881035e22000 000000000000000b ffff882034cc9800 ffff882034cc9800
[114257.720456] Call Trace:
[114257.723318] [<ffffffffc04b1082>] tl_restart+0x42/0x60 [drbd]
[114257.729860] [<ffffffffc04b10b3>] tl_clear+0x13/0x20 [drbd]
[114257.736215] [<ffffffffc04a3681>] conn_disconnect+0x281/0x830 [drbd]
[114257.743447] [<ffffffffc04d30b6>] ? change_cstate+0x86/0xc0 [drbd]
[114257.750483] [<ffffffffc0499590>] ? got_IsInSync+0x300/0x300 [drbd]
[114257.757617] [<ffffffffc04a4887>] drbd_receiver+0x177/0x5e0 [drbd]
[114257.764654] [<ffffffffc04af390>] ? w_complete+0x20/0x20 [drbd]
[114257.771397] [<ffffffffc04af3f4>] drbd_thread_setup+0x64/0x120 [drbd]
[114257.778723] [<ffffffffc04af390>] ? w_complete+0x20/0x20 [drbd]
[114257.785466] [<ffffffff8109b1fa>] kthread+0xea/0x100
[114257.791140] [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
[114257.798562] [<ffffffff8180af1f>] ret_from_fork+0x3f/0x70
[114257.804718] [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
[114257.812139] Code: 75 b8 4c 89 f7 e8 87 5f ff ff 48 8b 43 58 48 8d 53 58 49 89 df 48 83 e8 58 49 39 d4 74 29 48 89 c3 49 8b 45 48 4d 8b 37 48 85 c0 <41> 8b 76 68 74 a8 89 f2 30 d2 3b 10 75 a0 40 0f b6 f6 48 8d 04
[114257.834267] RIP [<ffffffffc04b0ffa>] _tl_restart+0xaa/0xf0 [drbd]
[114257.841312] RSP <ffff88201b2efcd8>
[114257.845331] CR2: 0000000000000068
[114257.849575] ---[ end trace 7c993d7d40ff47ee ]---
[114271.130266] ------------[ cut here ]------------
---

-- 
Jan Janicki
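
To put numbers on the first problem (the delay between the digest mismatch messages and the actual disconnect), something along these lines could be run over saved dmesg output. It is only a rough sketch: the regular expressions are modelled on the two message formats quoted above and may not match other DRBD versions, and the function name `mismatch_to_disconnect_delays` is made up for illustration.

```python
import re

# Patterns modelled on the dmesg lines quoted in this report (assumption:
# other DRBD versions may word or lay out these messages differently).
PREFIX = r"\[\s*(?P<ts>\d+\.\d+)\]\s+drbd\s+(?P<res>[^\s/]+)"
MISMATCH = re.compile(PREFIX + r"\S*\s+\S+\s+Digest mismatch")
CLOSED = re.compile(PREFIX + r"\s+\S+\s+Connection closed")


def mismatch_to_disconnect_delays(lines):
    """Return one (resource, delay_seconds, mismatch_count) per disconnect.

    The delay is measured from the first "Digest mismatch" message seen for
    a resource to the next "Connection closed" message for that resource.
    """
    pending = {}  # resource -> (timestamp of first mismatch, message count)
    results = []
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            first, count = pending.get(m["res"], (float(m["ts"]), 0))
            pending[m["res"]] = (first, count + 1)
            continue
        m = CLOSED.search(line)
        if m and m["res"] in pending:
            first, count = pending.pop(m["res"])
            results.append((m["res"], float(m["ts"]) - first, count))
    return results
```

Feeding it the lines of a saved log, for example `mismatch_to_disconnect_delays(open("dmesg.txt"))` with a hypothetical file name, gives one tuple per disconnect, so the immediate cases and the ones that take minutes can be told apart.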