[DRBD-user] DRBD Initial Sync error receiving RSDataRequest, l: 24!

Ian MacDonald ian+drbd at dotto-one.com
Mon Jul 14 20:46:01 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I have two hosts configured as Primary-Secondary on Debian 7 (current
stable) and DRBD (debian packaged 8.3.13-2)

Both are VPS hosts, host1 (kb) is with Chicago VPS (KVM/SoluzVM on Intel
CacheCade SSD Cached Disk) and host2 (backup) is with AWS.

Upon initial sync, the primary host1 (kb) seems to generate the following
error (detailed syslogs below)

Jul 14 12:08:03 kb kernel: [  990.382914] block drbd0: Began resync as
SyncSource (will sync 15071232 KB [3767808 bits set]).
Jul 14 12:08:03 kb kernel: [  990.382928] block drbd0: updated sync UUID
177C0F56441E807B:0082000000000004:0081000000000004:0080000000000004
Jul 14 12:08:03 kb kernel: [  990.400257] block drbd0:
/build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953:
sector: 0s, size: 262144
Jul 14 12:08:03 kb kernel: [  990.401997] block drbd0: *error receiving
RSDataRequest, l: 24!*
Jul 14 12:08:03 kb kernel: [  990.402918] block drbd0: peer( Secondary ->
Unknown ) conn( SyncSource -> ProtocolError )

My research on this issue leads to a few similar instances, all dead-ends:
http://lists.linbit.com/pipermail/drbd-user/2014-February/020601.html
https://forums.suse.com/showthread.php?292-DRBD-will-not-start-the-sync-process
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648963

After setting up the configuration, with host1 (kb) as (UptoDate, Primary,
WFConnection) on the host2 (backup) I checked dstate=Inconsistent and
role=Secondary and then
#drbdadm disconnect resource
#drbdadm invalidate resource
#drbdadm -- --discard-my-data connect root

And receive the error continuously as described below in the logs on each.
 I tried protocol C, removed the handlers, verify-alg to no avail.

The AWS host2 (backup) has different public IP and local IP, so I manually
changed the IP on host2 (backup) to reflect the local IP once I have copied
over the resource configuration.  The hosts have ICMP and TCP 7789
connectivity between them.

any help appreciated,
cheers,
Ian


The resource configuration is as follows:

resource root {
       protocol A;
       handlers {
               pri-on-incon-degr
"/usr/lib/drbd/notify-pri-on-incon-degr.sh";
               pri-lost-after-sb
"/usr/lib/drbd/notify-pri-lost-after-sb.sh";
               local-io-error "/usr/lib/drbd/notify-io-error.sh";
       split-brain "/usr/lib/drbd/notify-split-brain.sh root";
       }
       startup {
               become-primary-on kb;
               # Wait 10 seconds on boot until the peer connects.
               wfc-timeout 10;
       }
       net {
               data-integrity-alg crc32c;
       after-sb-0pri discard-younger-primary;
            after-sb-1pri discard-secondary;
               after-sb-2pri disconnect;
       }
       syncer {
               rate 10M;
               verify-alg crc32c;
       }
       on kb {
               device /dev/drbd0;
               disk /dev/vda1;
               address 172.245.43.142:7789;
               meta-disk /dev/vda2[0];
       }
       on backup {
               device /dev/drbd0;
               disk /dev/xvdb1;
               address 54.88.198.212:7789;
               meta-disk /dev/xvdb2[0];
       }
}

On host1 (kb) I see this:
Jul 14 12:06:06 kb kernel: [  872.748456] block drbd0: Restarting
drbd0_receiver
Jul 14 12:06:06 kb kernel: [  872.748459] block drbd0: receiver (re)started
Jul 14 12:06:06 kb kernel: [  872.748464] block drbd0: conn( Unconnected ->
WFConnection )
Jul 14 12:06:06 kb kernel: [  873.496352] block drbd0: Handshake
successful: Agreed network protocol version 96
Jul 14 12:06:06 kb kernel: [  873.496364] block drbd0: conn( WFConnection
-> WFReportParams )
Jul 14 12:06:06 kb kernel: [  873.496385] block drbd0: Starting asender
thread (from drbd0_receiver [2652])
Jul 14 12:06:06 kb kernel: [  873.504935] block drbd0: data-integrity-alg:
crc32c
Jul 14 12:06:06 kb kernel: [  873.504958] block drbd0: drbd_sync_handshake:
Jul 14 12:06:06 kb kernel: [  873.504963] block drbd0: self
177C0F56441E807B:0071000000000004:0070000000000004:006F000000000004
bits:3767808 flags:0
Jul 14 12:06:06 kb kernel: [  873.504968] block drbd0: peer
0071000000000004:0000000000000000:0000000000000000:0000000000000000
bits:3767808 flags:0
Jul 14 12:06:06 kb kernel: [  873.504973] block drbd0: uuid_compare()=1 by
rule 70
Jul 14 12:06:06 kb kernel: [  873.504976] block drbd0: Becoming sync source
due to disk states.
Jul 14 12:06:06 kb kernel: [  873.504985] block drbd0: peer( Unknown ->
Secondary ) conn( WFReportParams -> WFBitMapS )
Jul 14 12:06:07 kb kernel: [  873.694631] block drbd0: helper command:
/sbin/drbdadm before-resync-source minor-0
Jul 14 12:06:07 kb kernel: [  873.697722] block drbd0: helper command:
/sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jul 14 12:06:07 kb kernel: [  873.697733] block drbd0: conn( WFBitMapS ->
SyncSource )
Jul 14 12:06:07 kb kernel: [  873.697742] block drbd0: Began resync as
SyncSource (will sync 15071232 KB [3767808 bits set]).
Jul 14 12:06:07 kb kernel: [  873.697758] block drbd0: updated sync UUID
177C0F56441E807B:0072000000000004:0071000000000004:0070000000000004
Jul 14 12:06:07 kb kernel: [  873.716189] block drbd0:
/build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953:
sector: 0s, size: 262144
Jul 14 12:06:07 kb kernel: [  873.717179] block drbd0: error receiving
RSDataRequest, l: 24!
Jul 14 12:06:07 kb kernel: [  873.717692] block drbd0: peer( Secondary ->
Unknown ) conn( SyncSource -> ProtocolError )
Jul 14 12:06:07 kb kernel: [  873.718441] block drbd0: bitmap WRITE of 115
pages took 0 jiffies
Jul 14 12:06:07 kb kernel: [  873.719408] block drbd0: asender terminated
Jul 14 12:06:07 kb kernel: [  873.719414] block drbd0: Terminating
drbd0_asender
Jul 14 12:06:07 kb kernel: [  873.719897] block drbd0: 14 GB (3767808 bits)
marked out-of-sync by on disk bit-map.
Jul 14 12:06:07 kb kernel: [  873.719907] block drbd0: Connection closed
Jul 14 12:06:07 kb kernel: [  873.719912] block drbd0: conn( ProtocolError
-> Unconnected )
Jul 14 12:06:07 kb kernel: [  873.719916] block drbd0: receiver terminated
Jul 14 12:06:07 kb kernel: [  873.719918] block drbd0: Restarting
drbd0_receiver
Jul 14 12:06:07 kb kernel: [  873.719921] block drbd0: receiver (re)started
Jul 14 12:06:07 kb kernel: [  873.719924] block drbd0: conn( Unconnected ->
WFConnection )
Jul 14 12:06:07 kb kernel: [  874.472337] block drbd0: Handshake
successful: Agreed network protocol version 96

On host2(backup) I see this:
Jul 14 12:06:05 backup kernel: [22632330.627435] block drbd0: Restarting
drbd0_receiver
Jul 14 12:06:05 backup kernel: [22632330.627440] block drbd0: receiver
(re)started
Jul 14 12:06:05 backup kernel: [22632330.627447] block drbd0: conn(
Unconnected -> WFConnection )
Jul 14 12:06:05 backup kernel: [22632331.366181] block drbd0: Handshake
successful: Agreed network protocol version 96
Jul 14 12:06:05 backup kernel: [22632331.366209] block drbd0: conn(
WFConnection -> WFReportParams )
Jul 14 12:06:05 backup kernel: [22632331.366236] block drbd0: Starting
asender thread (from drbd0_receiver [2425])
Jul 14 12:06:05 backup kernel: [22632331.366497] block drbd0:
data-integrity-alg: crc32c
Jul 14 12:06:05 backup kernel: [22632331.366571] block drbd0: max BIO size
= 4294966784
Jul 14 12:06:05 backup kernel: [22632331.366614] block drbd0:
drbd_sync_handshake:
Jul 14 12:06:05 backup kernel: [22632331.366622] block drbd0: self
0070000000000004:0000000000000000:0000000000000000:0000000000000000
bits:3767808 flags:0
Jul 14 12:06:05 backup kernel: [22632331.366632] block drbd0: peer
177C0F56441E807B:0070000000000004:006F000000000004:006E000000000004
bits:3767808 flags:2
Jul 14 12:06:05 backup kernel: [22632331.366642] block drbd0:
uuid_compare()=-1 by rule 50
Jul 14 12:06:05 backup kernel: [22632331.366647] block drbd0: Becoming sync
target due to disk states.
Jul 14 12:06:05 backup kernel: [22632331.366658] block drbd0: peer( Unknown
-> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate
)
Jul 14 12:06:06 backup kernel: [22632331.529400] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Jul 14 12:06:06 backup kernel: [22632331.574618] block drbd0: updated sync
uuid 0071000000000004:0000000000000000:0000000000000000:0000000000000000
Jul 14 12:06:06 backup kernel: [22632331.575600] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Jul 14 12:06:06 backup kernel: [22632331.580785] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jul 14 12:06:06 backup kernel: [22632331.580806] block drbd0: conn(
WFSyncUUID -> SyncTarget )
Jul 14 12:06:06 backup kernel: [22632331.580817] block drbd0: Began resync
as SyncTarget (will sync 15071232 KB [3767808 bits set]).
Jul 14 12:06:06 backup kernel: [22632331.593772] block drbd0: sock was shut
down by peer
Jul 14 12:06:06 backup kernel: [22632331.593785] block drbd0: peer( Primary
-> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jul 14 12:06:06 backup kernel: [22632331.593797] block drbd0: short read
expecting header on sock: r=0
Jul 14 12:06:06 backup kernel: [22632331.594563] block drbd0: asender
terminated
Jul 14 12:06:06 backup kernel: [22632331.594574] block drbd0: Terminating
drbd0_asender
Jul 14 12:06:06 backup kernel: [22632331.609118] block drbd0: bitmap WRITE
of 115 pages took 4 jiffies
Jul 14 12:06:06 backup kernel: [22632331.609133] block drbd0: 14 GB
(3767808 bits) marked out-of-sync by on disk bit-map.
Jul 14 12:06:06 backup kernel: [22632331.609148] block drbd0: Connection
closed
Jul 14 12:06:06 backup kernel: [22632331.609156] block drbd0: conn(
BrokenPipe -> Unconnected )
Jul 14 12:06:06 backup kernel: [22632331.609164] block drbd0: receiver
terminated
Jul 14 12:06:06 backup kernel: [22632331.609169] block drbd0: Restarting
drbd0_receiver
Jul 14 12:06:06 backup kernel: [22632331.609174] block drbd0: receiver
(re)started
Jul 14 12:06:06 backup kernel: [22632331.609181] block drbd0: conn(
Unconnected -> WFConnection )
Jul 14 12:06:06 backup kernel: [22632332.342057] block drbd0: Handshake
successful: Agreed network protocol version 96

Both have external metadata on the second primary partition, and the
partition layout is simple with identical block sizes.
KB:
   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048    30144511    15071232   83  Linux
/dev/vda2        30144512    30408703      132096   83  Linux

BACKUP:
    Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1            2048    30144511    15071232   83  Linux
/dev/xvdb2        30144512    30408703      132096   83  Linux
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20140714/00fd8565/attachment.htm>


More information about the drbd-user mailing list