<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 16/07/2013 14:55, Brian Candler

      wrote:<br>

    </div>

    <blockquote cite="mid:51E550C4.3030107@pobox.com" type="cite">


      <br>

      * Check /proc/drbd on target, require network is Connected and

      local disk is UpToDate. [No check on source?]<br>

      * on target: drbdsetup &lt;dev&gt; secondary (just to be sure?).

      No wait or status check?<br>

      * on both nodes: drbdsetup &lt;dev&gt; disconnect. No wait or

      status check?<br>

    </blockquote>

    Actually, it does wait for GetProcStatus().is_standalone (i.e.
    connection status StandAlone).<br>
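    <br>
    As a rough illustration of that kind of wait (this is not the actual
    Ganeti code; connection_state and wait_for_state are names made up
    for this sketch, and the timeout values are arbitrary), polling the
    cs: field of /proc/drbd could look something like this in Python:<br>
<pre>
```python
import re
import time

# Matches the start of a per-device line in /proc/drbd, for example
# " 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----"
STATUS_RE = re.compile(r"^\s*(\d+): cs:(\S+)")


def connection_state(proc_text, minor):
    """Return the cs: value for the given minor, or None if it is not listed."""
    for line in proc_text.splitlines():
        m = STATUS_RE.match(line)
        if m and int(m.group(1)) == minor:
            return m.group(2)
    return None


def wait_for_state(read_proc, minor, wanted, timeout=10.0, interval=0.1):
    """Poll read_proc() until the minor reaches one of the wanted states.

    read_proc is any callable returning the text of /proc/drbd; passing it
    in keeps the sketch usable without a real DRBD device.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = connection_state(read_proc(), minor)
        if state in wanted:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("minor %d stuck in %r" % (minor, state))
        time.sleep(interval)
```
</pre>
    On a real node read_proc would be something like
    lambda: open("/proc/drbd").read(), with wanted={"StandAlone"} after
    the disconnect step.<br>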

    <blockquote cite="mid:51E550C4.3030107@pobox.com" type="cite"> * on

      both nodes: drbdsetup &lt;dev&gt; connect. Poll /proc/drbd until

      connected or syncing<br>

    </blockquote>

    More precisely, the code is doing the following on both sides

    (roughly simultaneously) to reconnect in multi-master mode:<br>

    <br>

    drbdsetup &lt;dev&gt; syncer -r 61440 --create-device<br>

    drbdsetup &lt;dev&gt; net ipv4:x:x ipv4:y:y C -A

    discard-zero-changes -B consensus --create-device -m -a md5 -x

    XXXXXX<br>

    <br>

    You said:<br>

    "Apparently a node was promoted right in the middle of a resync
    handshake, and did not like that at all."<br>

    <br>

    Now, I'm not clear which bit is the "promotion": it looks like
    "drbdsetup &lt;dev&gt; net ... -m" both reconnects *and*
    promotes to master in one step.<br>

    <br>

    If there has been a write to the primary disk during the short
    period when the secondary is disconnected from the primary, and we
    then reconnect in dual-master mode, some resync is expected along
    with the promotion. This appears to work: if I configure the VM to
    write aggressively to disk and then migrate, I see it go through a
    resync phase:<br>

    <br>

    &nbsp;0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----<br>

    &nbsp;0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown&nbsp;&nbsp; r-----<br>

    &nbsp;0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C

    r-----<br>

    &nbsp;0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C

    r-----<br>

    &nbsp;0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----<br>

    &nbsp;0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown&nbsp;&nbsp; r-----<br>

    &nbsp;0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Consistent C

    r-----<br>

    &nbsp;0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----<br>

    <br>
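    A capture like this can also be checked mechanically. The following
    is only a sketch: the Status tuple and both function names are
    invented for illustration, and the sample text is simply the lines
    pasted above. It lists the samples in which the local role is
    already Primary while the local disk is not yet UpToDate, which is
    the window the "promoted in the middle of a resync" remark is
    about:<br>
<pre>
```python
import re
from collections import namedtuple

Status = namedtuple(
    "Status", "minor cs local_role peer_role local_disk peer_disk")

# Per-device line of /proc/drbd: minor, cs:, ro:local/peer, ds:local/peer
LINE_RE = re.compile(r"^\s*(\d+): cs:(\S+) ro:(\S+)/(\S+) ds:(\S+)/(\S+)")

# Status lines copied from this message, one sample per line.
SAMPLES = """\
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
 0: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/Consistent C r-----
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
"""


def parse(text):
    """Turn /proc/drbd status lines into Status tuples, skipping other lines."""
    statuses = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            statuses.append(Status(int(m.group(1)), *m.groups()[1:]))
    return statuses


def primary_without_uptodate_disk(statuses):
    """Samples where this node is Primary but its own disk is not UpToDate."""
    return [s for s in statuses
            if s.local_role == "Primary" and s.local_disk != "UpToDate"]


for s in primary_without_uptodate_disk(parse(SAMPLES)):
    print(s.cs, s.local_role, s.local_disk)
```
</pre>
    For the capture above this flags only the SyncTarget sample.<br>
    <br>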

    So the race seems to be elsewhere.<br>

    <br>

    To answer your other question: no, I've not tried building any other
    version of drbd; I'm just using the stock one in Debian Wheezy.<br>

    <br>

    Regards,<br>

    <br>

    Brian.<br>

    <br>

  </body>

</html>