[DRBD-user] DRBD resyncs completely after each partial sync

Dimitrij Hilt dimitrij.hilt at fhe3.com
Tue Jul 21 12:43:17 CEST 2015



Hi,

we have one cluster where DRBD starts a full resync right after a partial
resync has finished, following an outage (reboot) of the slave.

OS: Debian wheezy (with backports)
Kernel: 3.16.0-0.bpo.4-amd64
DRBD: 8.4.3 (api:1/proto:86-101)

Messages on master:

Last message after sync:
[1206040.458026] block drbd0: Resync done (total 500 sec; paused 0 sec;
87380 K/sec)

Next messages:
[1206041.174251] block drbd0: 6 % had equal checksums, eliminated:
2666868K; transferred 41023432K total 43690300K
[1206042.140002] block drbd0: updated UUIDs
42B85C0E500CF72D:0000000000000000:0873BB844C6201E3:0872BB844C6201E3
[1206043.080803] block drbd0: conn( SyncSource -> Connected ) pdsk(
Inconsistent -> UpToDate )
[1206043.262926] drbd data: sock was shut down by peer
[1206043.262976] drbd data: peer( Secondary -> Unknown ) conn( Connected
-> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[1206043.262980] drbd data: short read (expected size 16)
[1206043.263044] block drbd0: new current UUID
1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3
[1206043.263180] drbd data: sock_sendmsg returned -32
[1206043.263204] drbd data: sock_sendmsg returned -32
[1206044.534408] drbd data: asender terminated
[1206044.534412] drbd data: Terminating drbd_a_data
[1206044.535425] drbd data: Connection closed
[1206044.535615] drbd data: conn( BrokenPipe -> Unconnected )
[1206044.535646] drbd data: receiver terminated
[1206044.535648] drbd data: Restarting receiver thread
[1206044.535649] drbd data: receiver (re)started
[1206044.535666] drbd data: conn( Unconnected -> WFConnection )
[1206046.465751] drbd data: Handshake successful: Agreed network
protocol version 101
[1206046.465752] drbd data: Agreed to support TRIM on protocol level
[1206047.740547] drbd data: Peer authenticated using 20 bytes HMAC
[1206047.740577] drbd data: conn( WFConnection -> WFReportParams )
[1206047.740579] drbd data: Starting asender thread (from drbd_r_data
[5895])
[1206054.482162] block drbd0: drbd_sync_handshake:
[1206054.915614] block drbd0: self
1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3
bits:11267 flags:0
[1206055.939586] block drbd0: peer
0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B
bits:0 flags:0
[1206056.931137] block drbd0: uuid_compare()=2 by rule 80
[1206057.423291] block drbd0: Becoming sync source due to disk states.
[1206058.023637] block drbd0: Writing the whole bitmap, full sync
required after drbd_sync_handshake.
[1206059.401331] block drbd0: bitmap WRITE of 38438 pages took 116 jiffies
[1206060.034413] block drbd0: 4826 GB (1265106944 bits) marked
out-of-sync by on disk bit-map.
[1206060.835611] block drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Inconsistent )
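
To make the handshake lines above easier to read, here is a minimal sketch that compares the two logged UUID sets. It assumes the fields are ordered current:bitmap:history1:history2 and that the lowest bit of each UUID is a flag that is masked off before comparing; both are assumptions about how DRBD 8.4 handles generation UUIDs, and the function names are hypothetical, not part of any DRBD tool.

```python
# UUID sets copied from the drbd_sync_handshake lines in the master log.
# Assumed field order: current:bitmap:history1:history2.
MASTER = "1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3"
SLAVE = "0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B"


def parse(uuid_line):
    """Split a logged UUID set and mask the lowest bit (assumed to be a flag)."""
    return [int(field, 16) & ~1 for field in uuid_line.split(":")]


def peer_current_matches_bitmap(self_line, peer_line):
    """True if the peer's current UUID equals our bitmap UUID."""
    return parse(peer_line)[0] == parse(self_line)[1]


def peer_current_in_history(self_line, peer_line):
    """True if the peer's current UUID only appears in our history UUIDs."""
    return parse(peer_line)[0] in parse(self_line)[2:]


print(peer_current_matches_bitmap(MASTER, SLAVE))
print(peer_current_in_history(MASTER, SLAVE))
```

Under these assumptions the slave's current UUID matches only one of the master's history UUIDs and not its bitmap UUID, which fits the "full sync required" message even though just 11267 bits were marked in the master's bitmap.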


Messages on slave:

Last message after sync:
[ 1158.409405] drbd data: PingAck did not arrive in time.

Next messages on slave:
[ 1158.901182] drbd data: peer( Primary -> Unknown ) conn( SyncTarget ->
NetworkFailure ) pdsk( UpToDate -> DUnknown )
[ 1159.908724] drbd data: asender terminated
[ 1160.292206] drbd data: Terminating drbd_a_data
[ 1160.717772] drbd data: Connection closed
[ 1161.093012] drbd data: conn( NetworkFailure -> Unconnected )
[ 1161.642927] drbd data: receiver terminated
[ 1162.034807] drbd data: Restarting receiver thread
[ 1162.484870] drbd data: receiver (re)started
[ 1162.885054] drbd data: conn( Unconnected -> WFConnection )
[ 1163.920393] drbd data: Handshake successful: Agreed network protocol
version 101
[ 1164.628365] drbd data: Agreed to support TRIM on protocol level
[ 1165.195112] drbd data: Peer authenticated using 20 bytes HMAC
[ 1165.745138] drbd data: conn( WFConnection -> WFReportParams )
[ 1166.303334] drbd data: Starting asender thread (from drbd_r_data [9063])
[ 1166.969118] block drbd0: drbd_sync_handshake:
[ 1167.494609] block drbd0: self
0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B
bits:0 flags:0
[ 1168.601650] block drbd0: peer
1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3
bits:11267 flags:0
[ 1168.601796] block drbd0: uuid_compare()=-2 by rule 60
[ 1168.601798] block drbd0: Becoming sync target due to disk states.
[ 1168.601801] block drbd0: Writing the whole bitmap, full sync required
after drbd_sync_handshake.
[ 1169.084569] block drbd0: bitmap WRITE of 38608 pages took 109 jiffies
[ 1169.084571] block drbd0: 4826 GB (1265106944 bits) marked out-of-sync
by on disk bit-map.
[ 1169.084625] block drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[ 1180.450416] block drbd0: receive bitmap stats [Bytes(packets)]: plain
0(0), RLE 23(1), total 23; compression: 100.0%
[ 1181.493636] block drbd0: send bitmap stats [Bytes(packets)]: plain
0(0), RLE 23(1), total 23; compression: 100.0%
[ 1182.476110] block drbd0: conn( WFBitMapT -> WFSyncUUID )
[ 1186.602623] block drbd0: updated sync uuid
42B95C0E500CF72C:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B
[ 1187.560349] block drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
[ 1188.295849] block drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 0 (0x0)
[ 1189.178590] block drbd0: conn( WFSyncUUID -> SyncTarget )
[ 1189.703520] block drbd0: Began resync as SyncTarget (will sync
5060427776 KB [1265106944 bits set]).
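
As a rough sanity check on the sizes in the logs above, the following sketch converts bitmap bits to kilobytes. It assumes one bitmap bit covers 4 KiB, which is consistent with the logged 5060427776 KB for 1265106944 bits; the helper name is hypothetical.

```python
KIB_PER_BIT = 4  # assumption: one bitmap bit tracks a 4 KiB block


def bits_to_kib(bits):
    """Convert a count of out-of-sync bitmap bits to KiB."""
    return bits * KIB_PER_BIT


# "bits:11267" reported by the master during the handshake.
outstanding_kib = bits_to_kib(11267)
# Bits set once the whole bitmap was marked out-of-sync.
full_sync_kib = bits_to_kib(1265106944)

print(outstanding_kib)                   # 45068
print(full_sync_kib)                     # 5060427776
print(full_sync_kib // (1024 * 1024))    # 4826
```

So roughly 44 MB was actually out of sync at handshake time, while the full sync covers the entire 4826 GB device.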



And it is repeatable.

Any idea what's going wrong here?

Best,

Dimitrij



