Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi, I get occasionally an error when starting DRBD with a not synced peer. Error is rare and results in an unusable drive, but surprisingly the synchronization progresses, and the DRBD restart solves the problem. Seems like some kind of race condition. As one can see from the logs below, there are 2 DRBD resources (data0, data1), where 2 nodes mirror each other (host1(primary:data0, secondary:data1), host2(primary:data1, secondary:data0)). The DRBD receivers keep restarting due to some error, but the synchronization progresses, and eventually will finish, drives will remain unusable though. During this repeated error/restart/connect client reports following errors: lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory lk_bdev_save(/var/lib/drbd/drbd-minor-1.lkbd) failed: No such file or directory * Details: - Linux Kernel: Linux version 3.14.35-28.38.amzn1.x86_64 (mockbuild at gobi-build-64012) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Wed Mar 11 22:50:37 UTC 2015 - DRBD: drbd: initialized. Version: 8.4.5 (api:1/proto:86-101) drbd: GIT-hash: f8d2f4f19bb065a1e505fc32bd98610f545e1fbf build by root at ip-10-65-4-190, 2015-04-07 23:54:22 * Logs: Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.488120] drbd data0: receiver terminated Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.489956] drbd data0: Restarting receiver thread Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.491898] drbd data0: receiver (re)started Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.493670] drbd data0: conn( Unconnected -> WFConnection ) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.296091] drbd data0: Handshake successful: Agreed network protocol version 101 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.299204] drbd data0: Agreed to support TRIM on protocol level Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.301686] drbd data0: conn( WFConnection -> WFReportParams ) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.304083] drbd data0: Starting asender thread (from drbd_r_data0 [30986]) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.332074] block drbd0: drbd_sync_handshake: Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.333881] block drbd0: self CB53CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000 bits:731360 flags:0 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.336056] block drbd0: peer 1B1BD292DDDF5BF8:CB53CA7FFD8AA1AB:CB52CA7FFD8AA1AB:CB51CA7FFD8AA1AB bits:731360 flags:0 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.342748] block drbd0: uuid_compare()=-1 by rule 50 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.344878] block drbd0: Becoming sync target due to disk states. Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.347655] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.354849] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26; compression: 100.0% Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.359205] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26; compression: 100.0% Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.363333] block drbd0: conn( WFBitMapT -> WFSyncUUID ) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.383740] block drbd0: updated sync uuid CB54CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.388389] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.393042] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.396775] block drbd0: conn( WFSyncUUID -> SyncTarget ) Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.399003] block drbd0: Began resync as SyncTarget (will sync 2925440 KB [731360 bits set]). Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.078861] drbd data1: peer( Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.083212] drbd data1: short read (expected size 4096) Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.085476] drbd data1: error receiving RSDataReply, e: -5 l: 65536! Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.088271] drbd data1: asender terminated Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.090076] drbd data1: Terminating drbd_a_data1 Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.120091] drbd data1: Connection closed Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.121777] drbd data1: conn( BrokenPipe -> Unconnected ) Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.124343] drbd data1: receiver terminated Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.126096] drbd data1: Restarting receiver thread Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.128217] drbd data1: receiver (re)started Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.130212] drbd data1: conn( Unconnected -> WFConnection ) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.932069] drbd data1: Handshake successful: Agreed network protocol version 101 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.936550] drbd data1: Agreed to support TRIM on protocol level Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.939115] drbd data1: conn( WFConnection -> WFReportParams ) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.941640] drbd data1: Starting asender thread (from drbd_r_data1 [30983]) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.976076] block drbd1: drbd_sync_handshake: Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.977945] block drbd1: self 6A10BDA711D50A14:0000000000000000:0000000000000000:0000000000000000 bits:784272 flags:0 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.980057] block drbd1: peer 00B04D9B652DC49A:6A10BDA711D50A14:6A0FBDA711D50A14:6A0EBDA711D50A14 bits:784272 flags:0 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.986833] block drbd1: uuid_compare()=-1 by rule 50 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.988954] block drbd1: Becoming sync target due to disk states. Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.991484] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.996753] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 62(1), total 62; compression: 100.0% Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.001216] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 62(1), total 62; compression: 100.0% Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.005574] block drbd1: conn( WFBitMapT -> WFSyncUUID ) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.025545] block drbd1: updated sync uuid 6A11BDA711D50A14:0000000000000000:0000000000000000:0000000000000000 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.030232] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.034928] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.038706] block drbd1: conn( WFSyncUUID -> SyncTarget ) Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.041035] block drbd1: Began resync as SyncTarget (will sync 3137088 KB [784272 bits set]). Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.454870] drbd data0: peer( Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.459167] drbd data0: short read (expected size 4096) Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.461306] drbd data0: error receiving RSDataReply, e: -5 l: 65536! Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.463868] drbd data0: asender terminated Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.465676] drbd data0: Terminating drbd_a_data0 Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.488085] drbd data0: Connection closed Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.489874] drbd data0: conn( BrokenPipe -> Unconnected ) Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.492185] drbd data0: receiver terminated Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.493930] drbd data0: Restarting receiver thread Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.495910] drbd data0: receiver (re)started Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.497779] drbd data0: conn( Unconnected -> WFConnection ) Apr 11 14:28:10 ip-10-10-10-133 kernel: [54454.300077] drbd data0: Handshake successful: Agreed network protocol version 101 [...] ... and it keeps going on, when at the end: pr 11 18:41:23 ip-10-10-10-133 kernel: [69646.953346] drbd data1: peer( Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.957662] drbd data1: short read (expected size 4096) Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.959790] drbd data1: error receiving RSDataReply, e: -5 l: 65536! Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.962508] drbd data1: asender terminated Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.964302] drbd data1: Terminating drbd_a_data1 Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.988099] drbd data1: Connection closed Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.989826] drbd data1: conn( BrokenPipe -> Unconnected ) Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.992286] drbd data1: receiver terminated Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.994114] drbd data1: Restarting receiver thread Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.996141] drbd data1: receiver (re)started Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.997934] drbd data1: conn( Unconnected -> WFConnection ) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.800084] drbd data1: Handshake successful: Agreed network protocol version 101 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.803443] drbd data1: Agreed to support TRIM on protocol level Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.807274] drbd data1: conn( WFConnection -> WFReportParams ) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.810949] drbd data1: Starting asender thread (from drbd_r_data1 [30983]) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.840073] block drbd1: drbd_sync_handshake: Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.842730] block drbd1: self 6B40BDA711D50A14:0000000000000000:0000000000000000:0000000000000000 bits:784144 flags:0 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.844056] block drbd1: peer 00B04D9B652DC49A:6B40BDA711D50A14:6B3FBDA711D50A14:6B3EBDA711D50A14 bits:784144 flags:0 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.855179] block drbd1: uuid_compare()=-1 by rule 50 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.858261] block drbd1: Becoming sync target due to disk states. Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.861863] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.873530] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 49(1), total 49; compression: 100.0% Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.877773] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 49(1), total 49; compression: 100.0% Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.881893] block drbd1: conn( WFBitMapT -> WFSyncUUID ) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.909812] block drbd1: updated sync uuid 6B41BDA711D50A14:0000000000000000:0000000000000000:0000000000000000 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.914703] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.919550] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.923385] block drbd1: conn( WFSyncUUID -> SyncTarget ) Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.925739] block drbd1: Began resync as SyncTarget (will sync 3136576 KB [784144 bits set]). Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.105348] drbd data0: peer( Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.109937] drbd data0: short read (expected size 4096) Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.112268] drbd data0: error receiving RSDataReply, e: -5 l: 65536! Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.115098] drbd data0: asender terminated Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.116878] drbd data0: Terminating drbd_a_data0 Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.198211] drbd data0: Connection closed Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.200013] drbd data0: conn( BrokenPipe -> Unconnected ) Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.202312] drbd data0: receiver terminated Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.204123] drbd data0: Restarting receiver thread Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.206142] drbd data0: receiver (re)started Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.207969] drbd data0: conn( Unconnected -> WFConnection ) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.010548] drbd data0: Handshake successful: Agreed network protocol version 101 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.013844] drbd data0: Agreed to support TRIM on protocol level Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.016327] drbd data0: conn( WFConnection -> WFReportParams ) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.018874] drbd data0: Starting asender thread (from drbd_r_data0 [30986]) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.048072] block drbd0: drbd_sync_handshake: Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.049985] block drbd0: self CC85CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000 bits:731360 flags:0 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.052055] block drbd0: peer 1B1BD292DDDF5BF8:CC85CA7FFD8AA1AB:CC84CA7FFD8AA1AB:CC83CA7FFD8AA1AB bits:731360 flags:0 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.058665] block drbd0: uuid_compare()=-1 by rule 50 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.060762] block drbd0: Becoming sync target due to disk states. Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.063247] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.085244] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26; compression: 100.0% Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.089599] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26; compression: 100.0% Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.093732] block drbd0: conn( WFBitMapT -> WFSyncUUID ) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.124153] block drbd0: updated sync uuid CC86CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.128792] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.133632] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.137459] block drbd0: conn( WFSyncUUID -> SyncTarget ) Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.139704] block drbd0: Began resync as SyncTarget (will sync 2925440 KB [731360 bits set]). Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.122721] block drbd1: Resync done (total 63 sec; paused 0 sec; 49784 K/sec) Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.125366] block drbd1: updated UUIDs 00B04D9B652DC49A:0000000000000000:6B41BDA711D50A14:6B40BDA711D50A14 Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.126695] block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.132999] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.137767] block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0) Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.285914] block drbd0: Resync done (total 61 sec; paused 0 sec; 47956 K/sec) Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.288864] block drbd0: updated UUIDs 1B1BD292DDDF5BF8:0000000000000000:CC86CA7FFD8AA1AA:CC85CA7FFD8AA1AB Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.289893] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.296068] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.300961] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0) [DRBD END] * At the end the DRBD startup logs (the longest, thus placed at the end) [DRBD BEGIN] Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.617812] block drbd0: disk( Diskless -> Attaching ) Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.624329] block drbd0: max BIO size = 4096 Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.626857] block drbd0: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.631080] block drbd0: drbd_bm_resize called with capacity == 55349528 Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.635155] block drbd0: resync bitmap: bits=6918691 words=108105 pages=212 Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.639141] block drbd0: size = 26 GB (27674764 KB) Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.642539] block drbd0: Writing the whole bitmap, size changed Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.657231] block drbd0: bitmap WRITE of 212 pages took 3 jiffies Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.660845] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.676939] block drbd0: recounting of set bits took additional 0 jiffies Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.680918] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.685407] block drbd0: Suspended AL updates Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.687990] block drbd0: disk( Attaching -> Inconsistent ) Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.689388] block drbd0: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.708720] block drbd0: disk( Inconsistent -> Failed ) Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.711544] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.715410] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:20 ip-10-10-10-133 kernel: [ 344.720225] block drbd0: disk( Failed -> Diskless ) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.791610] block drbd1: disk( Diskless -> Attaching ) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.797369] block drbd1: max BIO size = 4096 Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.800188] block drbd1: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.803812] block drbd1: drbd_bm_resize called with capacity == 55351576 Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.807475] block drbd1: resync bitmap: bits=6918947 words=108109 pages=212 Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.810562] block drbd1: size = 26 GB (27675788 KB) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.813046] block drbd1: Writing the whole bitmap, size changed Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.824917] block drbd1: bitmap WRITE of 212 pages took 3 jiffies Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.827584] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.839691] block drbd1: recounting of set bits took additional 0 jiffies Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.842504] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.845647] block drbd1: Suspended AL updates Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.847406] block drbd1: disk( Attaching -> Inconsistent ) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.849636] block drbd1: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.865072] block drbd1: disk( Inconsistent -> Failed ) Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.867306] block drbd1: bitmap WRITE of 0 pages took 0 jiffies Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.869791] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:26:21 ip-10-10-10-133 kernel: [ 344.873025] block drbd1: disk( Failed -> Diskless ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.246893] block drbd0: disk( Diskless -> Attaching ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.251592] block drbd0: max BIO size = 4096 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.253348] block drbd0: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.256202] block drbd0: drbd_bm_resize called with capacity == 55349528 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.258980] block drbd0: resync bitmap: bits=6918691 words=108105 pages=212 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.261865] block drbd0: size = 26 GB (27674764 KB) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.264306] block drbd0: Writing the whole bitmap, size changed Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.282278] block drbd0: bitmap WRITE of 212 pages took 4 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.284776] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.306547] block drbd0: recounting of set bits took additional 0 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.309311] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.312469] block drbd0: Suspended AL updates Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.314241] block drbd0: disk( Attaching -> Inconsistent ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.316457] block drbd0: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.331727] block drbd0: disk( Inconsistent -> Failed ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.334009] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.336464] block drbd0: 26 GB (6918691 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.339581] block drbd0: disk( Failed -> Diskless ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.396619] block drbd1: disk( Diskless -> Attaching ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.401370] block drbd1: max BIO size = 4096 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.403098] block drbd1: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.405970] block drbd1: drbd_bm_resize called with capacity == 55351576 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.408818] block drbd1: resync bitmap: bits=6918947 words=108109 pages=212 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.411628] block drbd1: size = 26 GB (27675788 KB) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.414042] block drbd1: Writing the whole bitmap, size changed Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.426173] block drbd1: bitmap WRITE of 212 pages took 2 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.428697] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.442044] block drbd1: recounting of set bits took additional 0 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.444816] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.447943] block drbd1: Suspended AL updates Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.449758] block drbd1: disk( Attaching -> Inconsistent ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.452174] block drbd1: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.466565] block drbd1: disk( Inconsistent -> Failed ) Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.468766] block drbd1: bitmap WRITE of 0 pages took 0 jiffies Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.471163] block drbd1: 26 GB (6918947 bits) marked out-of-sync by on disk bit-map. Apr 10 23:32:29 ip-10-10-10-133 kernel: [ 713.474277] block drbd1: disk( Failed -> Diskless ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.047588] drbd data0: Starting worker thread (from drbdsetup-84 [14660]) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.051108] block drbd0: disk( Diskless -> Attaching ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.053493] drbd data0: Method to ensure write ordering: flush Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.055839] block drbd0: max BIO size = 4096 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.057627] block drbd0: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.060547] block drbd0: drbd_bm_resize called with capacity == 6291456 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.063218] block drbd0: resync bitmap: bits=786432 words=12288 pages=24 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.066006] block drbd0: size = 3072 MB (3145728 KB) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.068489] block drbd0: Writing the whole bitmap, size changed Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.074493] block drbd0: bitmap WRITE of 24 pages took 1 jiffies Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.076930] block drbd0: 3072 MB (786432 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.094494] block drbd0: recounting of set bits took additional 0 jiffies Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.097247] block drbd0: 3072 MB (786368 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.100469] block drbd0: disk( Attaching -> Inconsistent ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.102726] block drbd0: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.125845] drbd data1: Starting worker thread (from drbdsetup-84 [14671]) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.129036] block drbd1: disk( Diskless -> Attaching ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.131303] drbd data1: Method to ensure write ordering: flush Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.133788] block drbd1: max BIO size = 4096 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.135571] block drbd1: Adjusting my ra_pages to backing device's (32 -> 64) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.138550] block drbd1: drbd_bm_resize called with capacity == 6291456 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.141288] block drbd1: resync bitmap: bits=786432 words=12288 pages=24 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.144073] block drbd1: size = 3072 MB (3145728 KB) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.146511] block drbd1: Writing the whole bitmap, size changed Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.150314] block drbd1: bitmap WRITE of 24 pages took 0 jiffies Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.152970] block drbd1: 3072 MB (786432 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.170153] block drbd1: recounting of set bits took additional 0 jiffies Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.172982] block drbd1: 3072 MB (786368 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.176192] block drbd1: disk( Attaching -> Inconsistent ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.178509] block drbd1: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.313145] drbd data0: conn( StandAlone -> Unconnected ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.315483] drbd data0: Starting receiver thread (from drbd_w_data0 [14663]) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316110] drbd data1: conn( StandAlone -> Unconnected ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316128] drbd data1: Starting receiver thread (from drbd_w_data1 [14672]) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316174] drbd data1: receiver (re)started Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316205] drbd data1: conn( Unconnected -> WFConnection ) Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.327555] drbd data0: receiver (re)started Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.329347] drbd data0: conn( Unconnected -> WFConnection ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.225305] drbd data1: Handshake successful: Agreed network protocol version 101 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.228468] drbd data1: Agreed to support TRIM on protocol level Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.231053] drbd data1: conn( WFConnection -> WFReportParams ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233510] drbd data1: Starting asender thread (from drbd_r_data1 [14690]) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233512] drbd data0: Handshake successful: Agreed network protocol version 101 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233513] drbd data0: Agreed to support TRIM on protocol level Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233554] drbd data0: conn( WFConnection -> WFReportParams ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233558] drbd data0: Starting asender thread (from drbd_r_data0 [14693]) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.280068] block drbd1: max BIO size = 65536 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.281945] block drbd1: drbd_sync_handshake: Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.283747] block drbd1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:786368 flags:0 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.284056] block drbd1: peer 00B04D9B652DC49A:0000000000000000:7AA448E7BFF704AD:7AA348E7BFF704AD bits:0 flags:0 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.292229] block drbd1: uuid_compare()=-2 by rule 20 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.294348] block drbd1: Becoming sync target due to disk states. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.296895] block drbd1: Writing the whole bitmap, full sync required after drbd_sync_handshake. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300072] block drbd0: max BIO size = 65536 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300077] block drbd0: drbd_sync_handshake: Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300078] block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:786368 flags:0 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300080] block drbd0: peer 1B1BD292DDDF5BF8:0000000000000000:F8206DD104DCF398:F81F6DD104DCF399 bits:0 flags:0 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300081] block drbd0: uuid_compare()=-2 by rule 20 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300081] block drbd0: Becoming sync target due to disk states. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300082] block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.305996] block drbd0: bitmap WRITE of 24 pages took 1 jiffies Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.305997] block drbd0: 3072 MB (786432 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.306376] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308341] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0% Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308359] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0% Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308362] block drbd0: conn( WFBitMapT -> WFSyncUUID ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.311121] block drbd0: updated sync uuid C727CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.311582] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313137] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313150] block drbd0: conn( WFSyncUUID -> SyncTarget ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313175] block drbd0: Began resync as SyncTarget (will sync 3145728 KB [786432 bits set]). Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.361230] block drbd1: bitmap WRITE of 24 pages took 1 jiffies Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.363758] block drbd1: 3072 MB (786432 bits) marked out-of-sync by on disk bit-map. Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.367399] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.374746] block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0% Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.379193] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 100.0% Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.383397] block drbd1: conn( WFBitMapT -> WFSyncUUID ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.405761] block drbd1: updated sync uuid 65F0BDA711D50A14:0000000000000000:0000000000000000:0000000000000000 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.410324] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.414769] block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.418539] block drbd1: conn( WFSyncUUID -> SyncTarget ) Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.420911] block drbd1: Began resync as SyncTarget (will sync 3145728 KB [786432 bits set]). Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.418763] drbd data0: peer( Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.423017] drbd data0: short read (expected size 4096) Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.425132] drbd data0: error receiving RSDataReply, e: -5 l: 65536! [...] -- Best Regards, Radoslaw Garbacz XtremeData Incorporation -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20150414/c516ea87/attachment.htm>