Hello everyone, I need some help.

I have two servers in a DRBD mirror. The replication link is 4x1 Gbit in a balance-rr bond, giving about 3.8 Gbit/s as measured by iperf. There are no errors on the interfaces at all, and replication never fails under normal conditions; it has been up for almost a year. But when I run an online verify of a DRBD device, it runs fine for some time (a different amount each time) and then fails.

Kernel is 3.4.42; DRBD is version 8.4.3, built as an external module.

Any help would be appreciated.

dmesg:

[129075.674385] block drbd0: conn( Connected -> VerifyS )
[129075.674416] block drbd0: Starting Online Verify from sector 44054936
[132983.906214] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133195.313395] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133287.266490] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133297.266822] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[134100.973873] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[134771.926400] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[134863.449429] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[135089.907073] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[135099.907403] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[135109.907719] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 4
[136072.370061] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[137326.482215] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[137336.492527] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[137718.945362] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138309.405249] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138571.674094] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138595.804909] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138616.045610] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138626.045929] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[138636.046296] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 4
[138646.046603] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 3
[138656.046968] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 2
[138666.012277] d-con VM_STORAGE2_1: sock_sendmsg returned -104
[138666.012315] block drbd0: Online Verify reached sector 3356693320
[138666.012324] d-con VM_STORAGE2_1: meta connection shut down by peer.
[138666.012376] d-con VM_STORAGE2_1: peer( Primary -> Unknown ) conn( VerifyS -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[138666.012430] block drbd0: drbd_alloc_pages interrupted!
[138666.012462] d-con VM_STORAGE2_1: error receiving OVReply, e: -12 l: 20!
[138666.012610] d-con VM_STORAGE2_1: asender terminated
[138666.012640] d-con VM_STORAGE2_1: Terminating drbd_a_VM_STORA
[138666.016071] d-con VM_STORAGE2_1: Connection closed
[138666.016108] d-con VM_STORAGE2_1: conn( BrokenPipe -> Unconnected )
[138666.016137] d-con VM_STORAGE2_1: receiver terminated
[138666.016163] d-con VM_STORAGE2_1: Restarting receiver thread
[138666.016190] d-con VM_STORAGE2_1: receiver (re)started
[138666.016220] d-con VM_STORAGE2_1: conn( Unconnected -> WFConnection )
[138672.027505] d-con VM_STORAGE2_1: Handshake successful: Agreed network protocol version 101
[138672.027751] d-con VM_STORAGE2_1: Peer authenticated using 20 bytes HMAC
[138672.027806] d-con VM_STORAGE2_1: conn( WFConnection -> WFReportParams )
[138672.027836] d-con VM_STORAGE2_1: Starting asender thread (from drbd_r_VM_STORA [3899])
[138672.157504] block drbd0: drbd_sync_handshake:
[138672.157532] block drbd0: self 4069562B352F2C2A:0000000000000000:509EB0F8F224F810:509DB0F8F224F811 bits:0 flags:0
[138672.157583] block drbd0: peer C3668D93B3352143:4069562B352F2C2B:509EB0F8F224F811:509DB0F8F224F811 bits:3966 flags:0
[138672.157634] block drbd0: uuid_compare()=-1 by rule 50
[138672.157665] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[138672.218870] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1606(1), total 1606; compression: 100.0%
[138672.276014] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 1606(1), total 1606; compression: 100.0%
[138672.276073] block drbd0: conn( WFBitMapT -> WFSyncUUID )
[138672.295163] block drbd0: updated sync uuid 406A562B352F2C2A:0000000000000000:509EB0F8F224F810:509DB0F8F224F811
[138672.295429] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
[138672.296713] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
[138672.296776] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[138672.296829] block drbd0: Began resync as SyncTarget (will sync 15864 KB [3966 bits set]).
[138688.644326] block drbd0: Resync done (total 16 sec; paused 0 sec; 988 K/sec)
[138688.644361] block drbd0: 9 % had equal checksums, eliminated: 1580K; transferred 14284K total 15864K
[138688.644412] block drbd0: updated UUIDs C3668D93B3352142:0000000000000000:406A562B352F2C2A:4069562B352F2C2B
[138688.644465] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[138688.644658] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
[138688.645847] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

Global config:

global {
    usage-count no;
}

common {
    protocol B;

    handlers {
    }

    startup {
        wfc-timeout 10;
    }

    disk {
        c-plan-ahead 0;
        al-extents 6433;
        resync-rate 400M;
        disk-barrier no;
        disk-flushes no;
        disk-drain yes;
    }

    net {
        sndbuf-size 1024k;
        rcvbuf-size 1024k;
        max-buffers 8192;
        max-epoch-size 8192;
        unplug-watermark 8192;
        timeout 100;
        ping-int 15;
        ping-timeout 60;
        connect-int 15;
        timeout 50;
        verify-alg sha1;
        csums-alg sha1;
        data-integrity-alg crc32c;
        cram-hmac-alg sha1;
        shared-secret "xxxxxxxxxxxxxxxxxxxxxxxx";
        use-rle;
    }
}
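
For anyone wanting to compare several failed verify runs, the "sock_sendmsg time expired, ko = N" messages in the dmesg output above can be grouped into countdown runs, which shows how often the send timeout fires and how far the counter falls each time. The following is only a rough sketch and not part of the original report: the function names ko_runs and summarize are made up for illustration, and it assumes the kernel log has been saved with one message per line.

```python
import re

# Matches kernel log lines such as:
# [133297.266822] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
KO_LINE = re.compile(r"\[\s*(\d+\.\d+)\].*sock_sendmsg time expired, ko = (\d+)")


def ko_runs(lines):
    """Group 'time expired' messages into countdown runs.

    A run continues while ko is exactly one lower than in the previous
    matching message; any other value starts a new run. Lines that do
    not match are ignored. Returns a list of runs, each a list of
    (timestamp, ko) tuples.
    """
    runs = []
    for line in lines:
        m = KO_LINE.search(line)
        if not m:
            continue
        ts, ko = float(m.group(1)), int(m.group(2))
        if runs and runs[-1][-1][1] - 1 == ko:
            runs[-1].append((ts, ko))
        else:
            runs.append([(ts, ko)])
    return runs


def summarize(runs):
    """Return (start_time, lowest_ko, duration_seconds) for each run."""
    return [(r[0][0], r[-1][1], round(r[-1][0] - r[0][0], 3)) for r in runs]


if __name__ == "__main__":
    # A few lines copied from the log in this post.
    sample = [
        "[133195.313395] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6",
        "[133287.266490] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6",
        "[133297.266822] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5",
        "[134100.973873] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6",
    ]
    for start, lowest, duration in summarize(ko_runs(sample)):
        print(f"run at {start}: lowest ko = {lowest}, lasted {duration} s")
```

Feeding it the full log (for example, the lines of a saved dmesg file) gives one summary line per run, which makes it easier to line the timeouts up against other activity on the two nodes.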