Hi,

we have one cluster where DRBD starts a full sync right after a partial resync has finished following a slave outage (reboot).

OS: Debian wheezy (with backports)
Kernel: 3.16.0-0.bpo.4-amd64
DRBD: 8.4.3 (api:1/proto:86-101)

Messages on master:

Last message after sync:
[1206040.458026] block drbd0: Resync done (total 500 sec; paused 0 sec; 87380 K/sec)

Next messages:
[1206041.174251] block drbd0: 6 % had equal checksums, eliminated: 2666868K; transferred 41023432K total 43690300K
[1206042.140002] block drbd0: updated UUIDs 42B85C0E500CF72D:0000000000000000:0873BB844C6201E3:0872BB844C6201E3
[1206043.080803] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
[1206043.262926] drbd data: sock was shut down by peer
[1206043.262976] drbd data: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[1206043.262980] drbd data: short read (expected size 16)
[1206043.263044] block drbd0: new current UUID 1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3
[1206043.263180] drbd data: sock_sendmsg returned -32
[1206043.263204] drbd data: sock_sendmsg returned -32
[1206044.534408] drbd data: asender terminated
[1206044.534412] drbd data: Terminating drbd_a_data
[1206044.535425] drbd data: Connection closed
[1206044.535615] drbd data: conn( BrokenPipe -> Unconnected )
[1206044.535646] drbd data: receiver terminated
[1206044.535648] drbd data: Restarting receiver thread
[1206044.535649] drbd data: receiver (re)started
[1206044.535666] drbd data: conn( Unconnected -> WFConnection )
[1206046.465751] drbd data: Handshake successful: Agreed network protocol version 101
[1206046.465752] drbd data: Agreed to support TRIM on protocol level
[1206047.740547] drbd data: Peer authenticated using 20 bytes HMAC
[1206047.740577] drbd data: conn( WFConnection -> WFReportParams )
[1206047.740579] drbd data: Starting asender thread (from drbd_r_data [5895])
[1206054.482162] block drbd0: drbd_sync_handshake:
[1206054.915614] block drbd0: self 1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3 bits:11267 flags:0
[1206055.939586] block drbd0: peer 0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B bits:0 flags:0
[1206056.931137] block drbd0: uuid_compare()=2 by rule 80
[1206057.423291] block drbd0: Becoming sync source due to disk states.
[1206058.023637] block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[1206059.401331] block drbd0: bitmap WRITE of 38438 pages took 116 jiffies
[1206060.034413] block drbd0: 4826 GB (1265106944 bits) marked out-of-sync by on disk bit-map.
[1206060.835611] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Inconsistent )

Messages on slave:

Last message after sync:
[ 1158.409405] drbd data: PingAck did not arrive in time.

Next messages on slave:
[ 1158.901182] drbd data: peer( Primary -> Unknown ) conn( SyncTarget -> Network Failure ) pdsk( UpToDate -> DUnknown )
[ 1159.908724] drbd data: asender terminated
[ 1160.292206] drbd data: Terminating drbd_a_data
[ 1160.717772] drbd data: Connection closed
[ 1161.093012] drbd data: conn( NetworkFailure -> Unconnected )
[ 1161.642927] drbd data: receiver terminated
[ 1162.034807] drbd data: Restarting receiver thread
[ 1162.484870] drbd data: receiver (re)started
[ 1162.885054] drbd data: conn( Unconnected -> WFConnection )
[ 1163.920393] drbd data: Handshake successful: Agreed network protocol version 101
[ 1164.628365] drbd data: Agreed to support TRIM on protocol level
[ 1165.195112] drbd data: Peer authenticated using 20 bytes HMAC
[ 1165.745138] drbd data: conn( WFConnection -> WFReportParams )
[ 1166.303334] drbd data: Starting asender thread (from drbd_r_data [9063])
[ 1166.969118] block drbd0: drbd_sync_handshake:
[ 1167.494609] block drbd0: self 0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B bits:0 flags:0
[ 1168.601650] block drbd0: peer 1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3 bits:11267 flags:0
[ 1168.601796] block drbd0: uuid_compare()=-2 by rule 60
[ 1168.601798] block drbd0: Becoming sync target due to disk states.
[ 1168.601801] block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 1169.084569] block drbd0: bitmap WRITE of 38608 pages took 109 jiffies
[ 1169.084571] block drbd0: 4826 GB (1265106944 bits) marked out-of-sync by on disk bit-map.
[ 1169.084625] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[ 1180.450416] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 1181.493636] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 1182.476110] block drbd0: conn( WFBitMapT -> WFSyncUUID )
[ 1186.602623] block drbd0: updated sync uuid 42B95C0E500CF72C:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B
[ 1187.560349] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
[ 1188.295849] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
[ 1189.178590] block drbd0: conn( WFSyncUUID -> SyncTarget )
[ 1189.703520] block drbd0: Began resync as SyncTarget (will sync 5060427776 KB [1265106944 bits set]).

And it is repeatable.

Any idea what's going wrong here?

Best,
Dimitrij
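
For anyone comparing the two handshake lines by hand, here is a minimal Python sketch that pulls the four UUIDs out of the "self" and "peer" log lines and lists the pairs that are equal or differ only in the lowest bit. The function names (parse_handshake, near_matches) are made up for this illustration and are not part of DRBD or drbd-utils, and the reading of the four fields as current, bitmap and two history entries is an assumption about the log layout. The two sample lines are copied from the master log.

```python
import re

# Matches the "self"/"peer" lines that follow "drbd_sync_handshake:" in the kernel log.
HANDSHAKE_RE = re.compile(
    r"block drbd\d+: (self|peer) "
    r"([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}) "
    r"bits:(\d+) flags:(\d+)"
)


def parse_handshake(log_text):
    """Return {'self': {...}, 'peer': {...}} for the handshake lines found in log_text."""
    result = {}
    for m in HANDSHAKE_RE.finditer(log_text):
        result[m.group(1)] = {
            # Assumed field order: current, bitmap, then two history entries.
            "uuids": [int(g, 16) for g in m.groups()[1:5]],
            "bits": int(m.group(6)),
            "flags": int(m.group(7)),
        }
    return result


def near_matches(a, b):
    """Index pairs (i, j) where a[i] and b[j] are nonzero and differ at most in bit 0."""
    return [
        (i, j)
        for i, x in enumerate(a)
        for j, y in enumerate(b)
        if x and y and (x ^ y) <= 1
    ]


MASTER_LOG = """
[1206054.915614] block drbd0: self 1BD4DE479F2C0B97:42B85C0E500CF72D:0873BB844C6201E3:0872BB844C6201E3 bits:11267 flags:0
[1206055.939586] block drbd0: peer 0873BB844C6201E2:0000000000000000:059C0C1327B18B3A:059B0C1327B18B3B bits:0 flags:0
"""

if __name__ == "__main__":
    hs = parse_handshake(MASTER_LOG)
    for i, j in near_matches(hs["self"]["uuids"], hs["peer"]["uuids"]):
        print(f"self[{i}] = {hs['self']['uuids'][i]:016X}  ~  peer[{j}] = {hs['peer']['uuids'][j]:016X}")
```

Run against the master lines, this reports a single near match: the third UUID on the master side and the first UUID on the slave side, which differ only in their last bit. Whether and how that relates to the full sync decision is not something the sketch answers; it only makes the comparison easier to read.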