[DRBD-user] Device verification always fails with timeout

Igor Novgorodov igor at novg.net
Mon Jun 30 08:35:38 CEST 2014

Hello everyone, I need some help.

I have two servers in a DRBD mirror. The replication link is a 4x1 Gbit
balance-rr bond, which delivers about 3.8 Gbit/s as measured by iperf.
There are no errors on the interfaces at all, and replication has never
failed under normal conditions; it has been up for almost a year.
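For comparison with the `resync-rate 400M` setting in the configuration further down, here is a small Python sketch that converts the measured bandwidth into the same units. It assumes that iperf's 3.8 Gbit/s is decimal (10^9 bit/s) and that DRBD's `M` suffix means MiB/s; neither is stated in the post.

```python
"""Rough comparison of the replication link with the configured resync-rate.

Assumptions (not stated in the post): iperf's 3.8 Gbit/s is decimal
(10**9 bit/s), and DRBD's "400M" resync-rate suffix means MiB/s.
"""

NOMINAL_GBIT = 4 * 1.0   # 4 x 1 Gbit links in the balance-rr bond
MEASURED_GBIT = 3.8      # iperf result quoted in the post
RESYNC_RATE_MIB = 400    # "resync-rate 400M;" from the disk section


def gbit_to_mib_per_s(gbit: float) -> float:
    """Convert decimal gigabits per second to mebibytes per second."""
    return gbit * 1e9 / 8 / 2**20


link_mib = gbit_to_mib_per_s(MEASURED_GBIT)
print(f"bond efficiency: {MEASURED_GBIT / NOMINAL_GBIT:.0%}")
print(f"measured link:   {link_mib:.0f} MiB/s")
print(f"resync-rate:     {RESYNC_RATE_MIB} MiB/s "
      f"({RESYNC_RATE_MIB / link_mib:.0%} of the measured link)")
```

Under those assumptions the configured rate is close to 90% of what iperf measured, so if online verify is rate-limited by resync-rate the same way resync is, it may leave little headroom on the bond for normal replication traffic.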

But when I start a verification of a DRBD device, it runs fine for a
while (a different length of time on each attempt) and then fails.
The kernel is 3.4.42; DRBD 8.4.3 is built as an external module.

Any help would be appreciated.

dmesg:
[129075.674385] block drbd0: conn( Connected -> VerifyS )
[129075.674416] block drbd0: Starting Online Verify from sector 44054936
[132983.906214] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133195.313395] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133287.266490] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[133297.266822] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[134100.973873] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[134771.926400] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[134863.449429] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[135089.907073] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[135099.907403] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[135109.907719] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 4
[136072.370061] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[137326.482215] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[137336.492527] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[137718.945362] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138309.405249] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138571.674094] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138595.804909] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138616.045610] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138626.045929] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[138636.046296] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 4
[138646.046603] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 3
[138656.046968] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 2
[138666.012277] d-con VM_STORAGE2_1: sock_sendmsg returned -104
[138666.012315] block drbd0: Online Verify reached sector 3356693320
[138666.012324] d-con VM_STORAGE2_1: meta connection shut down by peer.
[138666.012376] d-con VM_STORAGE2_1: peer( Primary -> Unknown ) conn( VerifyS -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[138666.012430] block drbd0: drbd_alloc_pages interrupted!
[138666.012462] d-con VM_STORAGE2_1: error receiving OVReply, e: -12 l: 20!
[138666.012610] d-con VM_STORAGE2_1: asender terminated
[138666.012640] d-con VM_STORAGE2_1: Terminating drbd_a_VM_STORA
[138666.016071] d-con VM_STORAGE2_1: Connection closed
[138666.016108] d-con VM_STORAGE2_1: conn( BrokenPipe -> Unconnected )
[138666.016137] d-con VM_STORAGE2_1: receiver terminated
[138666.016163] d-con VM_STORAGE2_1: Restarting receiver thread
[138666.016190] d-con VM_STORAGE2_1: receiver (re)started
[138666.016220] d-con VM_STORAGE2_1: conn( Unconnected -> WFConnection )
[138672.027505] d-con VM_STORAGE2_1: Handshake successful: Agreed network protocol version 101
[138672.027751] d-con VM_STORAGE2_1: Peer authenticated using 20 bytes HMAC
[138672.027806] d-con VM_STORAGE2_1: conn( WFConnection -> WFReportParams )
[138672.027836] d-con VM_STORAGE2_1: Starting asender thread (from drbd_r_VM_STORA [3899])
[138672.157504] block drbd0: drbd_sync_handshake:
[138672.157532] block drbd0: self 4069562B352F2C2A:0000000000000000:509EB0F8F224F810:509DB0F8F224F811 bits:0 flags:0
[138672.157583] block drbd0: peer C3668D93B3352143:4069562B352F2C2B:509EB0F8F224F811:509DB0F8F224F811 bits:3966 flags:0
[138672.157634] block drbd0: uuid_compare()=-1 by rule 50
[138672.157665] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[138672.218870] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1606(1), total 1606; compression: 100.0%
[138672.276014] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 1606(1), total 1606; compression: 100.0%
[138672.276073] block drbd0: conn( WFBitMapT -> WFSyncUUID )
[138672.295163] block drbd0: updated sync uuid 406A562B352F2C2A:0000000000000000:509EB0F8F224F810:509DB0F8F224F811
[138672.295429] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
[138672.296713] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
[138672.296776] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[138672.296829] block drbd0: Began resync as SyncTarget (will sync 15864 KB [3966 bits set]).
[138688.644326] block drbd0: Resync done (total 16 sec; paused 0 sec; 988 K/sec)
[138688.644361] block drbd0: 9 % had equal checksums, eliminated: 1580K; transferred 14284K total 15864K
[138688.644412] block drbd0: updated UUIDs C3668D93B3352142:0000000000000000:406A562B352F2C2A:4069562B352F2C2B
[138688.644465] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[138688.644658] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
[138688.645847] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
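The `ko` countdown in the log can be timed directly. The Python sketch below pulls the timestamp and `ko` value out of each `sock_sendmsg time expired` line and reports the gap between consecutive decrements. It assumes the dmesg format shown above with wrapped lines rejoined, and uses the last burst before the disconnect as sample input. The gaps come out at about 10 s, which is consistent with the first `timeout 100;` being in effect if DRBD's `timeout` unit is tenths of a second (the unit given in drbd.conf(5)); the sketch itself does not prove which setting the kernel used.

```python
import re

# The final burst of countdown lines from the dmesg output above.
SAMPLE = """\
[138595.804909] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138616.045610] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 6
[138626.045929] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 5
[138636.046296] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 4
[138646.046603] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 3
[138656.046968] d-con VM_STORAGE2_1: [drbd_w_VM_STORA/3871] sock_sendmsg time expired, ko = 2
[138666.012277] d-con VM_STORAGE2_1: sock_sendmsg returned -104
"""

# Timestamp in brackets, then the "time expired" message with its ko value.
KO_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*sock_sendmsg time expired, ko = (\d+)$")


def ko_events(lines):
    """Yield (timestamp_seconds, ko) for every 'time expired' line."""
    for line in lines:
        m = KO_RE.match(line.strip())
        if m:
            yield float(m.group(1)), int(m.group(2))


def countdown_steps(events):
    """Return (timestamp, ko, gap_seconds) for each consecutive decrement."""
    steps = []
    prev = None
    for t, ko in events:
        # Only count a step when ko dropped by exactly one since the last line.
        if prev is not None and ko == prev[1] - 1:
            steps.append((t, ko, t - prev[0]))
        prev = (t, ko)
    return steps


steps = countdown_steps(ko_events(SAMPLE.splitlines()))
for t, ko, gap in steps:
    print(f"[{t:.6f}] ko = {ko}, {gap:.1f} s after the previous step")
```

Fed the full log, the same helpers show that the earlier bursts (for example ko = 6, 5, 4 between 135089 and 135109) also step at roughly 10 s intervals before the count goes back to 6.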

Global config:
global {
     usage-count no;
}

common {
     protocol B;

     handlers {
     }

     startup {
         wfc-timeout 10;
     }

     disk {
         c-plan-ahead 0;
         al-extents 6433;
         resync-rate 400M;
         disk-barrier no;
         disk-flushes no;
         disk-drain yes;
     }

     net {
         sndbuf-size 1024k;
         rcvbuf-size 1024k;

         max-buffers 8192;
         max-epoch-size 8192;
         unplug-watermark 8192;

         timeout 100;
         ping-int 15;
         ping-timeout 60;
         connect-int 15;
         timeout 50;

         verify-alg sha1;
         csums-alg sha1;
         data-integrity-alg crc32c;
         cram-hmac-alg sha1;
         shared-secret "xxxxxxxxxxxxxxxxxxxxxxxx";
         use-rle;
     }
}
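The net section above sets `timeout` twice (100 and 50), and DRBD uses different units for its timing options. The Python sketch below lists each timing option in seconds and flags keys that appear more than once. The unit table is my reading of drbd.conf(5) for 8.4 (tenths of a second for `timeout` and `ping-timeout`, whole seconds for `ping-int` and `connect-int`) and should be checked against the man page for your version; the sketch does not tell you which of the two `timeout` values drbdadm actually applies.

```python
from collections import defaultdict

# The timing-related options from the net section above, in file order.
NET_SECTION = """\
timeout 100;
ping-int 15;
ping-timeout 60;
connect-int 15;
timeout 50;
"""

# Assumed units (from drbd.conf(5), 8.4): multiplier that turns the
# configured number into seconds.
SCALE_TO_SECONDS = {
    "timeout": 0.1,
    "ping-timeout": 0.1,
    "ping-int": 1.0,
    "connect-int": 1.0,
}


def parse_options(text):
    """Map each 'name value;' option to the list of values it was given."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip().rstrip(";")
        if not line:
            continue
        name, value = line.split(None, 1)
        seen[name].append(int(value))
    return seen


options = parse_options(NET_SECTION)
for name, values in options.items():
    scale = SCALE_TO_SECONDS[name]
    as_seconds = ", ".join(f"{v * scale:g} s" for v in values)
    note = "  <-- set more than once" if len(values) > 1 else ""
    print(f"{name:<13} {as_seconds}{note}")
```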



