<div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">I have two hosts configured as Primary-Secondary, both running Debian 7 (current stable) with DRBD (Debian-packaged 8.3.13-2).</span><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">Both are VPS hosts: host1 (kb) is with Chicago VPS (KVM/SolusVM on Intel CacheCade SSD-cached disk) and host2 (backup) is with AWS.</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">On initial sync, the primary, host1 (kb), logs the following error (detailed syslogs below):</div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px"><div>Jul 14 12:08:03 kb kernel: [ 990.382914] block drbd0: Began resync as SyncSource (will sync 15071232 KB [3767808 bits set]).</div><div>Jul 14 12:08:03 kb kernel: [ 990.382928] block drbd0: updated sync UUID 177C0F56441E807B:0082000000000004:0081000000000004:0080000000000004</div>
<div>Jul 14 12:08:03 kb kernel: [ 990.400257] block drbd0: /build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953: sector: 0s, size: 262144</div><div>Jul 14 12:08:03 kb kernel: [ 990.401997] block drbd0: <b>error receiving RSDataRequest, l: 24!</b></div>
<div>Jul 14 12:08:03 kb kernel: [ 990.402918] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) </div></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
My research on this issue turned up a few similar reports, all dead ends:</div><div style="font-family:arial,sans-serif;font-size:13px"><a href="http://lists.linbit.com/pipermail/drbd-user/2014-February/020601.html" target="_blank">http://lists.linbit.com/pipermail/drbd-user/2014-February/020601.html</a></div>
<div style="font-family:arial,sans-serif;font-size:13px"><a href="https://forums.suse.com/showthread.php?292-DRBD-will-not-start-the-sync-process" target="_blank">https://forums.suse.com/showthread.php?292-DRBD-will-not-start-the-sync-process</a><br>
</div><div style="font-family:arial,sans-serif;font-size:13px"><a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648963" target="_blank">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648963</a><br></div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">After setting up the configuration, with host1 (kb) in state (UpToDate, Primary, WFConnection), I confirmed on host2 (backup) that dstate=Inconsistent and role=Secondary, and then ran:</div>
<div style="font-family:arial,sans-serif;font-size:13px">#drbdadm disconnect resource</div><div style="font-family:arial,sans-serif;font-size:13px">#drbdadm invalidate resource</div><div style="font-family:arial,sans-serif;font-size:13px">
#drbdadm -- --discard-my-data connect root<br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">The error then repeats continuously, as shown in the logs from each host below. I tried switching to protocol C and removing the handlers and verify-alg, to no avail.</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">The AWS host2 (backup) has different public and local IPs, so after copying over the resource configuration I manually changed the address on host2 (backup) to its local IP. The hosts have ICMP and TCP 7789 connectivity between them.</div>
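<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">For reference, the TCP part of that check can be reproduced with a short Python sketch like the following. The helper name tcp_port_open and the 3-second default timeout are illustrative assumptions, not anything DRBD provides. A successful connect only shows that the port is reachable; it says nothing about whether replication itself works.</div>

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability of
    the port, not the DRBD protocol spoken on it.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

<div style="font-family:arial,sans-serif;font-size:13px">From kb this would be called as tcp_port_open("54.88.198.212", 7789), using the backup address from the resource configuration, and the other way round from backup.</div>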
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Any help appreciated,</div><div style="font-family:arial,sans-serif;font-size:13px">cheers,</div>
<div style="font-family:arial,sans-serif;font-size:13px">Ian</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
The resource configuration is as follows:<br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><div>resource root {</div><div> protocol A; </div>
<div> handlers {</div><div> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh";</div><div> pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh";</div>
<div> local-io-error "/usr/lib/drbd/notify-io-error.sh";</div><div><span style="white-space:pre-wrap">        </span> split-brain "/usr/lib/drbd/notify-split-brain.sh root";</div><div> }</div>
<div> startup {</div><div> become-primary-on kb;</div><div> # Wait 10 seconds on boot until the peer connects.</div><div> wfc-timeout 10;</div><div> }</div><div> net {</div>
<div> data-integrity-alg crc32c;</div><div><span style="white-space:pre-wrap">        </span> after-sb-0pri discard-younger-primary;</div><div> <span style="white-space:pre-wrap">        </span> after-sb-1pri discard-secondary;</div>
<div> after-sb-2pri disconnect;</div><div> }</div><div> syncer {</div><div> rate 10M;</div><div> verify-alg crc32c;</div><div> }</div><div> on kb {</div><div>
device /dev/drbd0;</div><div> disk /dev/vda1;</div><div> address 172.245.43.142:7789;</div><div> meta-disk /dev/vda2[0];</div>
<div> }</div><div> on backup {</div><div> device /dev/drbd0;</div><div> disk /dev/xvdb1;</div><div> address 54.88.198.212:7789;</div>
<div> meta-disk /dev/xvdb2[0];</div><div> }</div><div>}</div></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">On host1 (kb) I see this:</div>
<div style="font-family:arial,sans-serif;font-size:13px"><div>Jul 14 12:06:06 kb kernel: [ 872.748456] block drbd0: Restarting drbd0_receiver<br></div><div>Jul 14 12:06:06 kb kernel: [ 872.748459] block drbd0: receiver (re)started</div>
<div>Jul 14 12:06:06 kb kernel: [ 872.748464] block drbd0: conn( Unconnected -> WFConnection ) </div><div>Jul 14 12:06:06 kb kernel: [ 873.496352] block drbd0: Handshake successful: Agreed network protocol version 96</div>
<div>Jul 14 12:06:06 kb kernel: [ 873.496364] block drbd0: conn( WFConnection -> WFReportParams ) </div><div>Jul 14 12:06:06 kb kernel: [ 873.496385] block drbd0: Starting asender thread (from drbd0_receiver [2652])</div>
<div>Jul 14 12:06:06 kb kernel: [ 873.504935] block drbd0: data-integrity-alg: crc32c</div><div>Jul 14 12:06:06 kb kernel: [ 873.504958] block drbd0: drbd_sync_handshake:</div><div>Jul 14 12:06:06 kb kernel: [ 873.504963] block drbd0: self 177C0F56441E807B:0071000000000004:0070000000000004:006F000000000004 bits:3767808 flags:0</div>
<div>Jul 14 12:06:06 kb kernel: [ 873.504968] block drbd0: peer 0071000000000004:0000000000000000:0000000000000000:0000000000000000 bits:3767808 flags:0</div><div>Jul 14 12:06:06 kb kernel: [ 873.504973] block drbd0: uuid_compare()=1 by rule 70</div>
<div>Jul 14 12:06:06 kb kernel: [ 873.504976] block drbd0: Becoming sync source due to disk states.</div><div>Jul 14 12:06:06 kb kernel: [ 873.504985] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) </div>
<div>Jul 14 12:06:07 kb kernel: [ 873.694631] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0</div><div>Jul 14 12:06:07 kb kernel: [ 873.697722] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)</div>
<div>Jul 14 12:06:07 kb kernel: [ 873.697733] block drbd0: conn( WFBitMapS -> SyncSource ) </div><div>Jul 14 12:06:07 kb kernel: [ 873.697742] block drbd0: Began resync as SyncSource (will sync 15071232 KB [3767808 bits set]).</div>
<div>Jul 14 12:06:07 kb kernel: [ 873.697758] block drbd0: updated sync UUID 177C0F56441E807B:0072000000000004:0071000000000004:0070000000000004</div><div>Jul 14 12:06:07 kb kernel: [ 873.716189] block drbd0: /build/linux-baBndT/linux-3.2.60/drivers/block/drbd/drbd_receiver.c:1953: sector: 0s, size: 262144</div>
<div>Jul 14 12:06:07 kb kernel: [ 873.717179] block drbd0: error receiving RSDataRequest, l: 24!</div><div>Jul 14 12:06:07 kb kernel: [ 873.717692] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) </div>
<div>Jul 14 12:06:07 kb kernel: [ 873.718441] block drbd0: bitmap WRITE of 115 pages took 0 jiffies</div><div>Jul 14 12:06:07 kb kernel: [ 873.719408] block drbd0: asender terminated</div><div>Jul 14 12:06:07 kb kernel: [ 873.719414] block drbd0: Terminating drbd0_asender</div>
<div>Jul 14 12:06:07 kb kernel: [ 873.719897] block drbd0: 14 GB (3767808 bits) marked out-of-sync by on disk bit-map.</div><div>Jul 14 12:06:07 kb kernel: [ 873.719907] block drbd0: Connection closed</div><div>Jul 14 12:06:07 kb kernel: [ 873.719912] block drbd0: conn( ProtocolError -> Unconnected ) </div>
<div>Jul 14 12:06:07 kb kernel: [ 873.719916] block drbd0: receiver terminated</div><div>Jul 14 12:06:07 kb kernel: [ 873.719918] block drbd0: Restarting drbd0_receiver</div><div>Jul 14 12:06:07 kb kernel: [ 873.719921] block drbd0: receiver (re)started</div>
<div>Jul 14 12:06:07 kb kernel: [ 873.719924] block drbd0: conn( Unconnected -> WFConnection ) </div><div>Jul 14 12:06:07 kb kernel: [ 874.472337] block drbd0: Handshake successful: Agreed network protocol version 96</div>
</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">On host2 (backup) I see this:</div><div style="font-family:arial,sans-serif;font-size:13px">
<div>Jul 14 12:06:05 backup kernel: [22632330.627435] block drbd0: Restarting drbd0_receiver</div><div>Jul 14 12:06:05 backup kernel: [22632330.627440] block drbd0: receiver (re)started</div><div>Jul 14 12:06:05 backup kernel: [22632330.627447] block drbd0: conn( Unconnected -> WFConnection ) </div>
<div>Jul 14 12:06:05 backup kernel: [22632331.366181] block drbd0: Handshake successful: Agreed network protocol version 96</div><div>Jul 14 12:06:05 backup kernel: [22632331.366209] block drbd0: conn( WFConnection -> WFReportParams ) </div>
<div>Jul 14 12:06:05 backup kernel: [22632331.366236] block drbd0: Starting asender thread (from drbd0_receiver [2425])</div><div>Jul 14 12:06:05 backup kernel: [22632331.366497] block drbd0: data-integrity-alg: crc32c</div>
<div>Jul 14 12:06:05 backup kernel: [22632331.366571] block drbd0: max BIO size = 4294966784</div><div>Jul 14 12:06:05 backup kernel: [22632331.366614] block drbd0: drbd_sync_handshake:</div><div>Jul 14 12:06:05 backup kernel: [22632331.366622] block drbd0: self 0070000000000004:0000000000000000:0000000000000000:0000000000000000 bits:3767808 flags:0</div>
<div>Jul 14 12:06:05 backup kernel: [22632331.366632] block drbd0: peer 177C0F56441E807B:0070000000000004:006F000000000004:006E000000000004 bits:3767808 flags:2</div><div>Jul 14 12:06:05 backup kernel: [22632331.366642] block drbd0: uuid_compare()=-1 by rule 50</div>
<div>Jul 14 12:06:05 backup kernel: [22632331.366647] block drbd0: Becoming sync target due to disk states.</div><div>Jul 14 12:06:05 backup kernel: [22632331.366658] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) </div>
<div>Jul 14 12:06:06 backup kernel: [22632331.529400] block drbd0: conn( WFBitMapT -> WFSyncUUID ) </div><div>Jul 14 12:06:06 backup kernel: [22632331.574618] block drbd0: updated sync uuid 0071000000000004:0000000000000000:0000000000000000:0000000000000000</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.575600] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0</div><div>Jul 14 12:06:06 backup kernel: [22632331.580785] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.580806] block drbd0: conn( WFSyncUUID -> SyncTarget ) </div><div>Jul 14 12:06:06 backup kernel: [22632331.580817] block drbd0: Began resync as SyncTarget (will sync 15071232 KB [3767808 bits set]).</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.593772] block drbd0: sock was shut down by peer</div><div>Jul 14 12:06:06 backup kernel: [22632331.593785] block drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) </div>
<div>Jul 14 12:06:06 backup kernel: [22632331.593797] block drbd0: short read expecting header on sock: r=0</div><div>Jul 14 12:06:06 backup kernel: [22632331.594563] block drbd0: asender terminated</div><div>Jul 14 12:06:06 backup kernel: [22632331.594574] block drbd0: Terminating drbd0_asender</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.609118] block drbd0: bitmap WRITE of 115 pages took 4 jiffies</div><div>Jul 14 12:06:06 backup kernel: [22632331.609133] block drbd0: 14 GB (3767808 bits) marked out-of-sync by on disk bit-map.</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.609148] block drbd0: Connection closed</div><div>Jul 14 12:06:06 backup kernel: [22632331.609156] block drbd0: conn( BrokenPipe -> Unconnected ) </div><div>Jul 14 12:06:06 backup kernel: [22632331.609164] block drbd0: receiver terminated</div>
<div>Jul 14 12:06:06 backup kernel: [22632331.609169] block drbd0: Restarting drbd0_receiver</div><div>Jul 14 12:06:06 backup kernel: [22632331.609174] block drbd0: receiver (re)started</div><div>Jul 14 12:06:06 backup kernel: [22632331.609181] block drbd0: conn( Unconnected -> WFConnection ) </div>
<div>Jul 14 12:06:06 backup kernel: [22632332.342057] block drbd0: Handshake successful: Agreed network protocol version 96</div></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
Both have external metadata on the second primary partition, and the partition layout is simple with identical block sizes. </div><div style="font-family:arial,sans-serif;font-size:13px">KB:</div><div style="font-family:arial,sans-serif;font-size:13px">
<div> Device Boot Start End Blocks Id System</div><div>/dev/vda1 * 2048 30144511 15071232 83 Linux</div><div>/dev/vda2 30144512 30408703 132096 83 Linux</div></div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">BACKUP:</div><div style="font-family:arial,sans-serif;font-size:13px"><div> Device Boot Start End Blocks Id System</div>
<div>/dev/xvdb1 2048 30144511 15071232 83 Linux</div><div>/dev/xvdb2 30144512 30408703 132096 83 Linux</div></div></div>
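<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">As a sanity check that the partition tables agree with the figures in the logs, the arithmetic can be sketched in Python as follows. The 512-byte sector size, the 4 KiB covered by each bitmap bit, and the 4096-byte page size are assumptions on my part; they are chosen because they reproduce the logged numbers, not taken from the DRBD source.</div>

```python
SECTOR_BYTES = 512     # assumed fdisk sector size
BITMAP_BLOCK_KB = 4    # assumed: one out-of-sync bit per 4 KiB of data
PAGE_BYTES = 4096      # assumed kernel page size


def partition_kb(start_sector, end_sector):
    """Size in KB of a partition given its inclusive start and end sectors."""
    sectors = end_sector - start_sector + 1
    return sectors * SECTOR_BYTES // 1024


def bitmap_bits(size_kb):
    """Number of bitmap bits needed to cover size_kb of data."""
    return size_kb // BITMAP_BLOCK_KB


def bitmap_pages(bits):
    """Pages needed to hold that many bits, rounded up."""
    bits_per_page = 8 * PAGE_BYTES
    return -(-bits // bits_per_page)  # ceiling division


data_kb = partition_kb(2048, 30144511)       # vda1 and xvdb1
meta_kb = partition_kb(30144512, 30408703)   # vda2 and xvdb2
bits = bitmap_bits(data_kb)
pages = bitmap_pages(bits)
print(data_kb, meta_kb, bits, pages)  # 15071232 132096 3767808 115
```

<div style="font-family:arial,sans-serif;font-size:13px">Under those assumptions the results line up with "will sync 15071232 KB [3767808 bits set]" and "bitmap WRITE of 115 pages" on both hosts, so the two sides appear to agree on the device size.</div>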