[DRBD-user] "error receiving RSDataReply" during sync

Radoslaw Garbacz radoslaw.garbacz at xtremedatainc.com
Tue Apr 14 21:17:01 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I get occasionally an error when starting DRBD with a not synced peer.
Error is rare and results in an unusable drive, but surprisingly the
synchronization progresses, and the DRBD restart solves the problem. Seems
like some kind of race condition.

As one can see from the logs below, there are 2 DRBD resources (data0,
data1), where 2 nodes mirror each other (host1(primary:data0,
secondary:data1), host2(primary:data1, secondary:data0)). The DRBD
receivers keep restarting due to some error, but the synchronization
progresses, and eventually will finish, drives will remain unusable though.

During this repeated error/restart/connect client reports following errors:

lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or
directory
lk_bdev_save(/var/lib/drbd/drbd-minor-1.lkbd) failed: No such file or
directory



* Details:
- Linux Kernel:
  Linux version 3.14.35-28.38.amzn1.x86_64 (mockbuild at gobi-build-64012)
(gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Wed Mar 11
22:50:37 UTC 2015

- DRBD:
  drbd: initialized. Version: 8.4.5 (api:1/proto:86-101)
  drbd: GIT-hash: f8d2f4f19bb065a1e505fc32bd98610f545e1fbf build by
root at ip-10-65-4-190, 2015-04-07 23:54:22


* Logs:

Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.488120] drbd data0: receiver
terminated
Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.489956] drbd data0:
Restarting receiver thread
Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.491898] drbd data0: receiver
(re)started
Apr 11 14:27:20 ip-10-10-10-133 kernel: [54404.493670] drbd data0: conn(
Unconnected -> WFConnection )
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.296091] drbd data0:
Handshake successful: Agreed network protocol version 101
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.299204] drbd data0: Agreed
to support TRIM on protocol level
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.301686] drbd data0: conn(
WFConnection -> WFReportParams )
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.304083] drbd data0: Starting
asender thread (from drbd_r_data0 [30986])
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.332074] block drbd0:
drbd_sync_handshake:
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.333881] block drbd0: self
CB53CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000
bits:731360 flags:0
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.336056] block drbd0: peer
1B1BD292DDDF5BF8:CB53CA7FFD8AA1AB:CB52CA7FFD8AA1AB:CB51CA7FFD8AA1AB
bits:731360 flags:0
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.342748] block drbd0:
uuid_compare()=-1 by rule 50
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.344878] block drbd0:
Becoming sync target due to disk states.
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.347655] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.354849] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26;
compression: 100.0%
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.359205] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26;
compression: 100.0%
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.363333] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.383740] block drbd0: updated
sync uuid
CB54CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.388389] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.393042] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.396775] block drbd0: conn(
WFSyncUUID -> SyncTarget )
Apr 11 14:27:21 ip-10-10-10-133 kernel: [54405.399003] block drbd0: Began
resync as SyncTarget (will sync 2925440 KB [731360 bits set]).

Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.078861] drbd data1: peer(
Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate ->
DUnknown )
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.083212] drbd data1: short
read (expected size 4096)
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.085476] drbd data1: error
receiving RSDataReply, e: -5 l: 65536!
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.088271] drbd data1: asender
terminated
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.090076] drbd data1:
Terminating drbd_a_data1
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.120091] drbd data1:
Connection closed
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.121777] drbd data1: conn(
BrokenPipe -> Unconnected )
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.124343] drbd data1: receiver
terminated
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.126096] drbd data1:
Restarting receiver thread
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.128217] drbd data1: receiver
(re)started
Apr 11 14:27:49 ip-10-10-10-133 kernel: [54433.130212] drbd data1: conn(
Unconnected -> WFConnection )
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.932069] drbd data1:
Handshake successful: Agreed network protocol version 101
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.936550] drbd data1: Agreed
to support TRIM on protocol level
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.939115] drbd data1: conn(
WFConnection -> WFReportParams )
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.941640] drbd data1: Starting
asender thread (from drbd_r_data1 [30983])
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.976076] block drbd1:
drbd_sync_handshake:
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.977945] block drbd1: self
6A10BDA711D50A14:0000000000000000:0000000000000000:0000000000000000
bits:784272 flags:0
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.980057] block drbd1: peer
00B04D9B652DC49A:6A10BDA711D50A14:6A0FBDA711D50A14:6A0EBDA711D50A14
bits:784272 flags:0
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.986833] block drbd1:
uuid_compare()=-1 by rule 50
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.988954] block drbd1:
Becoming sync target due to disk states.
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.991484] block drbd1: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54433.996753] block drbd1: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 62(1), total 62;
compression: 100.0%
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.001216] block drbd1: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 62(1), total 62;
compression: 100.0%
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.005574] block drbd1: conn(
WFBitMapT -> WFSyncUUID )
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.025545] block drbd1: updated
sync uuid
6A11BDA711D50A14:0000000000000000:0000000000000000:0000000000000000
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.030232] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.034928] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.038706] block drbd1: conn(
WFSyncUUID -> SyncTarget )
Apr 11 14:27:50 ip-10-10-10-133 kernel: [54434.041035] block drbd1: Began
resync as SyncTarget (will sync 3137088 KB [784272 bits set]).
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.454870] drbd data0: peer(
Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate ->
DUnknown )

Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.459167] drbd data0: short
read (expected size 4096)
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.461306] drbd data0: error
receiving RSDataReply, e: -5 l: 65536!
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.463868] drbd data0: asender
terminated
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.465676] drbd data0:
Terminating drbd_a_data0
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.488085] drbd data0:
Connection closed
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.489874] drbd data0: conn(
BrokenPipe -> Unconnected )
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.492185] drbd data0: receiver
terminated
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.493930] drbd data0:
Restarting receiver thread
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.495910] drbd data0: receiver
(re)started
Apr 11 14:28:09 ip-10-10-10-133 kernel: [54453.497779] drbd data0: conn(
Unconnected -> WFConnection )
Apr 11 14:28:10 ip-10-10-10-133 kernel: [54454.300077] drbd data0:
Handshake successful: Agreed network protocol version 101

[...]

... and it keeps going on, when at the end:


pr 11 18:41:23 ip-10-10-10-133 kernel: [69646.953346] drbd data1: peer(
Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate ->
DUnknown )
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.957662] drbd data1: short
read (expected size 4096)
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.959790] drbd data1: error
receiving RSDataReply, e: -5 l: 65536!
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.962508] drbd data1: asender
terminated
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.964302] drbd data1:
Terminating drbd_a_data1
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.988099] drbd data1:
Connection closed
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.989826] drbd data1: conn(
BrokenPipe -> Unconnected )
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.992286] drbd data1: receiver
terminated
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.994114] drbd data1:
Restarting receiver thread
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.996141] drbd data1: receiver
(re)started
Apr 11 18:41:23 ip-10-10-10-133 kernel: [69646.997934] drbd data1: conn(
Unconnected -> WFConnection )
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.800084] drbd data1:
Handshake successful: Agreed network protocol version 101
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.803443] drbd data1: Agreed
to support TRIM on protocol level
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.807274] drbd data1: conn(
WFConnection -> WFReportParams )
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.810949] drbd data1: Starting
asender thread (from drbd_r_data1 [30983])
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.840073] block drbd1:
drbd_sync_handshake:
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.842730] block drbd1: self
6B40BDA711D50A14:0000000000000000:0000000000000000:0000000000000000
bits:784144 flags:0
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.844056] block drbd1: peer
00B04D9B652DC49A:6B40BDA711D50A14:6B3FBDA711D50A14:6B3EBDA711D50A14
bits:784144 flags:0
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.855179] block drbd1:
uuid_compare()=-1 by rule 50
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.858261] block drbd1:
Becoming sync target due to disk states.
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.861863] block drbd1: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.873530] block drbd1: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 49(1), total 49;
compression: 100.0%
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.877773] block drbd1: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 49(1), total 49;
compression: 100.0%
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.881893] block drbd1: conn(
WFBitMapT -> WFSyncUUID )
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.909812] block drbd1: updated
sync uuid
6B41BDA711D50A14:0000000000000000:0000000000000000:0000000000000000
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.914703] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.919550] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.923385] block drbd1: conn(
WFSyncUUID -> SyncTarget )
Apr 11 18:41:24 ip-10-10-10-133 kernel: [69647.925739] block drbd1: Began
resync as SyncTarget (will sync 3136576 KB [784144 bits set]).

Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.105348] drbd data0: peer(
Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate ->
DUnknown )
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.109937] drbd data0: short
read (expected size 4096)
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.112268] drbd data0: error
receiving RSDataReply, e: -5 l: 65536!
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.115098] drbd data0: asender
terminated
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.116878] drbd data0:
Terminating drbd_a_data0
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.198211] drbd data0:
Connection closed
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.200013] drbd data0: conn(
BrokenPipe -> Unconnected )
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.202312] drbd data0: receiver
terminated
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.204123] drbd data0:
Restarting receiver thread
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.206142] drbd data0: receiver
(re)started
Apr 11 18:41:32 ip-10-10-10-133 kernel: [69656.207969] drbd data0: conn(
Unconnected -> WFConnection )
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.010548] drbd data0:
Handshake successful: Agreed network protocol version 101
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.013844] drbd data0: Agreed
to support TRIM on protocol level
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.016327] drbd data0: conn(
WFConnection -> WFReportParams )
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.018874] drbd data0: Starting
asender thread (from drbd_r_data0 [30986])
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.048072] block drbd0:
drbd_sync_handshake:
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.049985] block drbd0: self
CC85CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000
bits:731360 flags:0
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.052055] block drbd0: peer
1B1BD292DDDF5BF8:CC85CA7FFD8AA1AB:CC84CA7FFD8AA1AB:CC83CA7FFD8AA1AB
bits:731360 flags:0
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.058665] block drbd0:
uuid_compare()=-1 by rule 50
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.060762] block drbd0:
Becoming sync target due to disk states.
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.063247] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.085244] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26;
compression: 100.0%
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.089599] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 26(1), total 26;
compression: 100.0%
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.093732] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.124153] block drbd0: updated
sync uuid
CC86CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.128792] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.133632] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.137459] block drbd0: conn(
WFSyncUUID -> SyncTarget )
Apr 11 18:41:33 ip-10-10-10-133 kernel: [69657.139704] block drbd0: Began
resync as SyncTarget (will sync 2925440 KB [731360 bits set]).

Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.122721] block drbd1: Resync
done (total 63 sec; paused 0 sec; 49784 K/sec)
Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.125366] block drbd1: updated
UUIDs 00B04D9B652DC49A:0000000000000000:6B41BDA711D50A14:6B40BDA711D50A14
Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.126695] block drbd1: conn(
SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.132999] block drbd1: helper
command: /sbin/drbdadm after-resync-target minor-1
Apr 11 18:42:27 ip-10-10-10-133 kernel: [69711.137767] block drbd1: helper
command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)

Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.285914] block drbd0: Resync
done (total 61 sec; paused 0 sec; 47956 K/sec)
Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.288864] block drbd0: updated
UUIDs 1B1BD292DDDF5BF8:0000000000000000:CC86CA7FFD8AA1AA:CC85CA7FFD8AA1AB
Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.289893] block drbd0: conn(
SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.296068] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0
Apr 11 18:42:34 ip-10-10-10-133 kernel: [69718.300961] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

[DRBD END]


* At the end the DRBD startup logs (the longest, thus placed at the end)

[DRBD BEGIN]

Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.617812] block drbd0: disk(
Diskless -> Attaching )
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.624329] block drbd0: max BIO
size = 4096
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.626857] block drbd0:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.631080] block drbd0:
drbd_bm_resize called with capacity == 55349528
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.635155] block drbd0: resync
bitmap: bits=6918691 words=108105 pages=212
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.639141] block drbd0: size =
26 GB (27674764 KB)
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.642539] block drbd0: Writing
the whole bitmap, size changed
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.657231] block drbd0: bitmap
WRITE of 212 pages took 3 jiffies
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.660845] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.676939] block drbd0:
recounting of set bits took additional 0 jiffies
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.680918] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.685407] block drbd0:
Suspended AL updates
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.687990] block drbd0: disk(
Attaching -> Inconsistent )
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.689388] block drbd0:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.708720] block drbd0: disk(
Inconsistent -> Failed )
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.711544] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.715410] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:20 ip-10-10-10-133 kernel: [  344.720225] block drbd0: disk(
Failed -> Diskless )
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.791610] block drbd1: disk(
Diskless -> Attaching )
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.797369] block drbd1: max BIO
size = 4096
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.800188] block drbd1:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.803812] block drbd1:
drbd_bm_resize called with capacity == 55351576
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.807475] block drbd1: resync
bitmap: bits=6918947 words=108109 pages=212
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.810562] block drbd1: size =
26 GB (27675788 KB)
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.813046] block drbd1: Writing
the whole bitmap, size changed
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.824917] block drbd1: bitmap
WRITE of 212 pages took 3 jiffies
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.827584] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.839691] block drbd1:
recounting of set bits took additional 0 jiffies
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.842504] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.845647] block drbd1:
Suspended AL updates
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.847406] block drbd1: disk(
Attaching -> Inconsistent )
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.849636] block drbd1:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.865072] block drbd1: disk(
Inconsistent -> Failed )
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.867306] block drbd1: bitmap
WRITE of 0 pages took 0 jiffies
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.869791] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:26:21 ip-10-10-10-133 kernel: [  344.873025] block drbd1: disk(
Failed -> Diskless )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.246893] block drbd0: disk(
Diskless -> Attaching )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.251592] block drbd0: max BIO
size = 4096
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.253348] block drbd0:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.256202] block drbd0:
drbd_bm_resize called with capacity == 55349528
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.258980] block drbd0: resync
bitmap: bits=6918691 words=108105 pages=212
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.261865] block drbd0: size =
26 GB (27674764 KB)
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.264306] block drbd0: Writing
the whole bitmap, size changed
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.282278] block drbd0: bitmap
WRITE of 212 pages took 4 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.284776] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.306547] block drbd0:
recounting of set bits took additional 0 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.309311] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.312469] block drbd0:
Suspended AL updates
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.314241] block drbd0: disk(
Attaching -> Inconsistent )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.316457] block drbd0:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.331727] block drbd0: disk(
Inconsistent -> Failed )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.334009] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.336464] block drbd0: 26 GB
(6918691 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.339581] block drbd0: disk(
Failed -> Diskless )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.396619] block drbd1: disk(
Diskless -> Attaching )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.401370] block drbd1: max BIO
size = 4096
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.403098] block drbd1:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.405970] block drbd1:
drbd_bm_resize called with capacity == 55351576
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.408818] block drbd1: resync
bitmap: bits=6918947 words=108109 pages=212
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.411628] block drbd1: size =
26 GB (27675788 KB)
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.414042] block drbd1: Writing
the whole bitmap, size changed
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.426173] block drbd1: bitmap
WRITE of 212 pages took 2 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.428697] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.442044] block drbd1:
recounting of set bits took additional 0 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.444816] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.447943] block drbd1:
Suspended AL updates
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.449758] block drbd1: disk(
Attaching -> Inconsistent )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.452174] block drbd1:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.466565] block drbd1: disk(
Inconsistent -> Failed )
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.468766] block drbd1: bitmap
WRITE of 0 pages took 0 jiffies
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.471163] block drbd1: 26 GB
(6918947 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:32:29 ip-10-10-10-133 kernel: [  713.474277] block drbd1: disk(
Failed -> Diskless )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.047588] drbd data0: Starting
worker thread (from drbdsetup-84 [14660])
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.051108] block drbd0: disk(
Diskless -> Attaching )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.053493] drbd data0: Method
to ensure write ordering: flush
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.055839] block drbd0: max BIO
size = 4096
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.057627] block drbd0:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.060547] block drbd0:
drbd_bm_resize called with capacity == 6291456
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.063218] block drbd0: resync
bitmap: bits=786432 words=12288 pages=24
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.066006] block drbd0: size =
3072 MB (3145728 KB)
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.068489] block drbd0: Writing
the whole bitmap, size changed
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.074493] block drbd0: bitmap
WRITE of 24 pages took 1 jiffies
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.076930] block drbd0: 3072 MB
(786432 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.094494] block drbd0:
recounting of set bits took additional 0 jiffies
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.097247] block drbd0: 3072 MB
(786368 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.100469] block drbd0: disk(
Attaching -> Inconsistent )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.102726] block drbd0:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.125845] drbd data1: Starting
worker thread (from drbdsetup-84 [14671])
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.129036] block drbd1: disk(
Diskless -> Attaching )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.131303] drbd data1: Method
to ensure write ordering: flush
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.133788] block drbd1: max BIO
size = 4096
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.135571] block drbd1:
Adjusting my ra_pages to backing device's (32 -> 64)
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.138550] block drbd1:
drbd_bm_resize called with capacity == 6291456
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.141288] block drbd1: resync
bitmap: bits=786432 words=12288 pages=24
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.144073] block drbd1: size =
3072 MB (3145728 KB)
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.146511] block drbd1: Writing
the whole bitmap, size changed
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.150314] block drbd1: bitmap
WRITE of 24 pages took 0 jiffies
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.152970] block drbd1: 3072 MB
(786432 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.170153] block drbd1:
recounting of set bits took additional 0 jiffies
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.172982] block drbd1: 3072 MB
(786368 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.176192] block drbd1: disk(
Attaching -> Inconsistent )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.178509] block drbd1:
attached to UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.313145] drbd data0: conn(
StandAlone -> Unconnected )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.315483] drbd data0: Starting
receiver thread (from drbd_w_data0 [14663])
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316110] drbd data1: conn(
StandAlone -> Unconnected )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316128] drbd data1: Starting
receiver thread (from drbd_w_data1 [14672])
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316174] drbd data1: receiver
(re)started
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.316205] drbd data1: conn(
Unconnected -> WFConnection )
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.327555] drbd data0: receiver
(re)started
Apr 10 23:39:25 ip-10-10-10-133 kernel: [ 1129.329347] drbd data0: conn(
Unconnected -> WFConnection )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.225305] drbd data1:
Handshake successful: Agreed network protocol version 101
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.228468] drbd data1: Agreed
to support TRIM on protocol level
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.231053] drbd data1: conn(
WFConnection -> WFReportParams )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233510] drbd data1: Starting
asender thread (from drbd_r_data1 [14690])
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233512] drbd data0:
Handshake successful: Agreed network protocol version 101
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233513] drbd data0: Agreed
to support TRIM on protocol level
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233554] drbd data0: conn(
WFConnection -> WFReportParams )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.233558] drbd data0: Starting
asender thread (from drbd_r_data0 [14693])
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.280068] block drbd1: max BIO
size = 65536
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.281945] block drbd1:
drbd_sync_handshake:
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.283747] block drbd1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:786368 flags:0
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.284056] block drbd1: peer
00B04D9B652DC49A:0000000000000000:7AA448E7BFF704AD:7AA348E7BFF704AD bits:0
flags:0
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.292229] block drbd1:
uuid_compare()=-2 by rule 20
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.294348] block drbd1:
Becoming sync target due to disk states.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.296895] block drbd1: Writing
the whole bitmap, full sync required after drbd_sync_handshake.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300072] block drbd0: max BIO
size = 65536
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300077] block drbd0:
drbd_sync_handshake:
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300078] block drbd0: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:786368 flags:0
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300080] block drbd0: peer
1B1BD292DDDF5BF8:0000000000000000:F8206DD104DCF398:F81F6DD104DCF399 bits:0
flags:0
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300081] block drbd0:
uuid_compare()=-2 by rule 20
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300081] block drbd0:
Becoming sync target due to disk states.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.300082] block drbd0: Writing
the whole bitmap, full sync required after drbd_sync_handshake.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.305996] block drbd0: bitmap
WRITE of 24 pages took 1 jiffies
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.305997] block drbd0: 3072 MB
(786432 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.306376] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308341] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21;
compression: 100.0%
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308359] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21;
compression: 100.0%
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.308362] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.311121] block drbd0: updated
sync uuid
C727CA7FFD8AA1AA:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.311582] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313137] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313150] block drbd0: conn(
WFSyncUUID -> SyncTarget )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.313175] block drbd0: Began
resync as SyncTarget (will sync 3145728 KB [786432 bits set]).
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.361230] block drbd1: bitmap
WRITE of 24 pages took 1 jiffies
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.363758] block drbd1: 3072 MB
(786432 bits) marked out-of-sync by on disk bit-map.
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.367399] block drbd1: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.374746] block drbd1: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21;
compression: 100.0%
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.379193] block drbd1: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21;
compression: 100.0%
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.383397] block drbd1: conn(
WFBitMapT -> WFSyncUUID )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.405761] block drbd1: updated
sync uuid
65F0BDA711D50A14:0000000000000000:0000000000000000:0000000000000000
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.410324] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.414769] block drbd1: helper
command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.418539] block drbd1: conn(
WFSyncUUID -> SyncTarget )
Apr 10 23:39:26 ip-10-10-10-133 kernel: [ 1130.420911] block drbd1: Began
resync as SyncTarget (will sync 3145728 KB [786432 bits set]).
Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.418763] drbd data0: peer(
Secondary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate ->
DUnknown )
Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.423017] drbd data0: short
read (expected size 4096)
Apr 10 23:40:14 ip-10-10-10-133 kernel: [ 1178.425132] drbd data0: error
receiving RSDataReply, e: -5 l: 65536!

[...]


-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporation
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20150414/c516ea87/attachment.htm>


More information about the drbd-user mailing list