Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have two hosts configured as Primary-Secondary on Debian 7 (current stable) and DRBD (debian packaged 8.3.13-2) Both are VPS hosts, host1 (kb) is with Chicago VPS (KVM/SoluzVM on Intel CacheCade SSD Cached Disk) and host2 (backup) is with AWS. Upon initial sync, the primary host1 (kb) seems to generate the following error (detailed syslogs below) Jul 14 12:08:03 kb kernel: [ 990.382914] block drbd0: Began resync as SyncSource (will sync 15071232 KB [3767808 bits set]). Jul 14 12:08:03 kb kernel: [ 990.382928] block drbd0: updated sync UUID 177C0F56441E807B:0082000000000004:0081000000000004:0080000000000004 Jul 14 12:08:03 kb kernel: [ 990.400257] block drbd0: /build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953: sector: 0s, size: 262144 Jul 14 12:08:03 kb kernel: [ 990.401997] block drbd0: *error receiving RSDataRequest, l: 24!* Jul 14 12:08:03 kb kernel: [ 990.402918] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) My research on this issue leads to a few similar instances, all dead-ends: http://lists.linbit.com/pipermail/drbd-user/2014-February/020601.html https://forums.suse.com/showthread.php?292-DRBD-will-not-start-the-sync-process https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648963 After setting up the configuration, with host1 (kb) as (UptoDate, Primary, WFConnection) on the host2 (backup) I checked dstate=Inconsistent and role=Secondary and then #drbdadm disconnect resource #drbdadm invalidate resource #drbdadm -- --discard-my-data connect root And receive the error continuously as described below in the logs on each. I tried protocol C, removed the handlers, verify-alg to no avail. The AWS host2 (backup) has different public IP and local IP, so I manually changed the IP on host2 (backup) to reflect the local IP once I have copied over the resource configuration. The hosts have ICMP and TCP 7789 connectivity between them. any help appreciated, cheers, Ian The resource configuration is as follows: resource root { protocol A; handlers { pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh"; pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh"; local-io-error "/usr/lib/drbd/notify-io-error.sh"; split-brain "/usr/lib/drbd/notify-split-brain.sh root"; } startup { become-primary-on kb; # Wait 10 seconds on boot until the peer connects. wfc-timeout 10; } net { data-integrity-alg crc32c; after-sb-0pri discard-younger-primary; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } syncer { rate 10M; verify-alg crc32c; } on kb { device /dev/drbd0; disk /dev/vda1; address 172.245.43.142:7789; meta-disk /dev/vda2[0]; } on backup { device /dev/drbd0; disk /dev/xvdb1; address 54.88.198.212:7789; meta-disk /dev/xvdb2[0]; } } On host1 (kb) I see this: Jul 14 12:06:06 kb kernel: [ 872.748456] block drbd0: Restarting drbd0_receiver Jul 14 12:06:06 kb kernel: [ 872.748459] block drbd0: receiver (re)started Jul 14 12:06:06 kb kernel: [ 872.748464] block drbd0: conn( Unconnected -> WFConnection ) Jul 14 12:06:06 kb kernel: [ 873.496352] block drbd0: Handshake successful: Agreed network protocol version 96 Jul 14 12:06:06 kb kernel: [ 873.496364] block drbd0: conn( WFConnection -> WFReportParams ) Jul 14 12:06:06 kb kernel: [ 873.496385] block drbd0: Starting asender thread (from drbd0_receiver [2652]) Jul 14 12:06:06 kb kernel: [ 873.504935] block drbd0: data-integrity-alg: crc32c Jul 14 12:06:06 kb kernel: [ 873.504958] block drbd0: drbd_sync_handshake: Jul 14 12:06:06 kb kernel: [ 873.504963] block drbd0: self 177C0F56441E807B:0071000000000004:0070000000000004:006F000000000004 bits:3767808 flags:0 Jul 14 12:06:06 kb kernel: [ 873.504968] block drbd0: peer 0071000000000004:0000000000000000:0000000000000000:0000000000000000 bits:3767808 flags:0 Jul 14 12:06:06 kb kernel: [ 873.504973] block drbd0: uuid_compare()=1 by rule 70 Jul 14 12:06:06 kb kernel: [ 873.504976] block drbd0: Becoming sync source due to disk states. Jul 14 12:06:06 kb kernel: [ 873.504985] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) Jul 14 12:06:07 kb kernel: [ 873.694631] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 Jul 14 12:06:07 kb kernel: [ 873.697722] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0) Jul 14 12:06:07 kb kernel: [ 873.697733] block drbd0: conn( WFBitMapS -> SyncSource ) Jul 14 12:06:07 kb kernel: [ 873.697742] block drbd0: Began resync as SyncSource (will sync 15071232 KB [3767808 bits set]). Jul 14 12:06:07 kb kernel: [ 873.697758] block drbd0: updated sync UUID 177C0F56441E807B:0072000000000004:0071000000000004:0070000000000004 Jul 14 12:06:07 kb kernel: [ 873.716189] block drbd0: /build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953: sector: 0s, size: 262144 Jul 14 12:06:07 kb kernel: [ 873.717179] block drbd0: error receiving RSDataRequest, l: 24! Jul 14 12:06:07 kb kernel: [ 873.717692] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) Jul 14 12:06:07 kb kernel: [ 873.718441] block drbd0: bitmap WRITE of 115 pages took 0 jiffies Jul 14 12:06:07 kb kernel: [ 873.719408] block drbd0: asender terminated Jul 14 12:06:07 kb kernel: [ 873.719414] block drbd0: Terminating drbd0_asender Jul 14 12:06:07 kb kernel: [ 873.719897] block drbd0: 14 GB (3767808 bits) marked out-of-sync by on disk bit-map. Jul 14 12:06:07 kb kernel: [ 873.719907] block drbd0: Connection closed Jul 14 12:06:07 kb kernel: [ 873.719912] block drbd0: conn( ProtocolError -> Unconnected ) Jul 14 12:06:07 kb kernel: [ 873.719916] block drbd0: receiver terminated Jul 14 12:06:07 kb kernel: [ 873.719918] block drbd0: Restarting drbd0_receiver Jul 14 12:06:07 kb kernel: [ 873.719921] block drbd0: receiver (re)started Jul 14 12:06:07 kb kernel: [ 873.719924] block drbd0: conn( Unconnected -> WFConnection ) Jul 14 12:06:07 kb kernel: [ 874.472337] block drbd0: Handshake successful: Agreed network protocol version 96 On host2(backup) I see this: Jul 14 12:06:05 backup kernel: [22632330.627435] block drbd0: Restarting drbd0_receiver Jul 14 12:06:05 backup kernel: [22632330.627440] block drbd0: receiver (re)started Jul 14 12:06:05 backup kernel: [22632330.627447] block drbd0: conn( Unconnected -> WFConnection ) Jul 14 12:06:05 backup kernel: [22632331.366181] block drbd0: Handshake successful: Agreed network protocol version 96 Jul 14 12:06:05 backup kernel: [22632331.366209] block drbd0: conn( WFConnection -> WFReportParams ) Jul 14 12:06:05 backup kernel: [22632331.366236] block drbd0: Starting asender thread (from drbd0_receiver [2425]) Jul 14 12:06:05 backup kernel: [22632331.366497] block drbd0: data-integrity-alg: crc32c Jul 14 12:06:05 backup kernel: [22632331.366571] block drbd0: max BIO size = 4294966784 Jul 14 12:06:05 backup kernel: [22632331.366614] block drbd0: drbd_sync_handshake: Jul 14 12:06:05 backup kernel: [22632331.366622] block drbd0: self 0070000000000004:0000000000000000:0000000000000000:0000000000000000 bits:3767808 flags:0 Jul 14 12:06:05 backup kernel: [22632331.366632] block drbd0: peer 177C0F56441E807B:0070000000000004:006F000000000004:006E000000000004 bits:3767808 flags:2 Jul 14 12:06:05 backup kernel: [22632331.366642] block drbd0: uuid_compare()=-1 by rule 50 Jul 14 12:06:05 backup kernel: [22632331.366647] block drbd0: Becoming sync target due to disk states. Jul 14 12:06:05 backup kernel: [22632331.366658] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Jul 14 12:06:06 backup kernel: [22632331.529400] block drbd0: conn( WFBitMapT -> WFSyncUUID ) Jul 14 12:06:06 backup kernel: [22632331.574618] block drbd0: updated sync uuid 0071000000000004:0000000000000000:0000000000000000:0000000000000000 Jul 14 12:06:06 backup kernel: [22632331.575600] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Jul 14 12:06:06 backup kernel: [22632331.580785] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Jul 14 12:06:06 backup kernel: [22632331.580806] block drbd0: conn( WFSyncUUID -> SyncTarget ) Jul 14 12:06:06 backup kernel: [22632331.580817] block drbd0: Began resync as SyncTarget (will sync 15071232 KB [3767808 bits set]). Jul 14 12:06:06 backup kernel: [22632331.593772] block drbd0: sock was shut down by peer Jul 14 12:06:06 backup kernel: [22632331.593785] block drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) Jul 14 12:06:06 backup kernel: [22632331.593797] block drbd0: short read expecting header on sock: r=0 Jul 14 12:06:06 backup kernel: [22632331.594563] block drbd0: asender terminated Jul 14 12:06:06 backup kernel: [22632331.594574] block drbd0: Terminating drbd0_asender Jul 14 12:06:06 backup kernel: [22632331.609118] block drbd0: bitmap WRITE of 115 pages took 4 jiffies Jul 14 12:06:06 backup kernel: [22632331.609133] block drbd0: 14 GB (3767808 bits) marked out-of-sync by on disk bit-map. Jul 14 12:06:06 backup kernel: [22632331.609148] block drbd0: Connection closed Jul 14 12:06:06 backup kernel: [22632331.609156] block drbd0: conn( BrokenPipe -> Unconnected ) Jul 14 12:06:06 backup kernel: [22632331.609164] block drbd0: receiver terminated Jul 14 12:06:06 backup kernel: [22632331.609169] block drbd0: Restarting drbd0_receiver Jul 14 12:06:06 backup kernel: [22632331.609174] block drbd0: receiver (re)started Jul 14 12:06:06 backup kernel: [22632331.609181] block drbd0: conn( Unconnected -> WFConnection ) Jul 14 12:06:06 backup kernel: [22632332.342057] block drbd0: Handshake successful: Agreed network protocol version 96 Both have external metadata on the second primary partition, and the partition layout is simple with identical block sizes. KB: Device Boot Start End Blocks Id System /dev/vda1 * 2048 30144511 15071232 83 Linux /dev/vda2 30144512 30408703 132096 83 Linux BACKUP: Device Boot Start End Blocks Id System /dev/xvdb1 2048 30144511 15071232 83 Linux /dev/xvdb2 30144512 30408703 132096 83 Linux -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20140714/00fd8565/attachment.htm>