Felix, thanks for your reply.<div><br></div><div>I ran an experiment yesterday. Here is what I found.</div><div><br></div><div>I created <span style="background-color:rgb(255, 255, 0)">ten </span>DRBD devices using the<span style="background-color:rgb(255, 255, 0)"> default </span>syncer rate (primary/primary configuration). The initial sync rate was about 1M/s per device. My client is a Windows 2008 server with IOMeter running on it. The default IO timeout is 60s.</div>


<div><br></div><div>I rebooted one node and the other took over. IO ran smoothly <span style="background-color:rgb(255, 255, 0)">until </span>re-synchronization of the rebooted node started. Application IO then dropped to nearly zero, with a re-sync rate of about 500K/s per device. </div>


<div><br></div><div>What confuses me is that 500K/s per device * 10 devices = 5M/s, which is only 5M/1000M = 0.5% of the overall bandwidth. The client eventually abandons the application IOs due to the timeout, which results in a LUN reset (all assigned devices went offline).</div>
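<div><br></div><div>To double-check that arithmetic, here is a minimal sketch in Python. It is only an illustration: the function names are made up, and the units are an assumption on my side. If the 500K/s figure is in kilobytes per second and the link is 1000 Mbit/s, the share is larger than 0.5%, but still small.</div>

```python
def same_unit_ratio(rate_k_per_device, devices, link_m):
    """Share of the link used if rate and link are in the same unit family
    (K/s and M/s), which is how the 0.5% figure was computed."""
    total_m = rate_k_per_device * devices / 1000  # K/s -> M/s
    return total_m / link_m


def byte_rate_ratio(rate_kbytes_per_device, devices, link_mbit):
    """Share of the link used, ASSUMING the sync rate is in kilobytes per
    second and the link speed is in megabits per second."""
    total_mbit = rate_kbytes_per_device * devices * 8 / 1000  # KB/s -> Mbit/s
    return total_mbit / link_mbit


# Figures from this experiment: 500K/s per device, 10 devices, 1000M link.
print(f"same units:        {same_unit_ratio(500, 10, 1000):.1%}")
print(f"bytes vs. megabit: {byte_rate_ratio(500, 10, 1000):.1%}")
```

<div>Under either reading the re-sync traffic stays far below the link capacity, so raw bandwidth alone does not seem to explain the stalled application IO.</div>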


<div><br></div><div>As you suggested in your previous email, I did not promote a device to primary until the sync had finished. </div><div><br></div><div>Again, here is my DRBD configuration: </div><div><br></div><div> </div><div><span style="border-collapse:collapse;font-family:arial, sans-serif;font-size:13px"><div>


resource drbd10 {</div><div>  on FA33 {</div><div>    device /dev/drbd10;</div><div>    disk /dev/disk/by-id/scsi-360030480003ae2e0159207cc2a2ac9d4;</div><div>    address <a href="http://192.168.251.1:7799/" style="color:rgb(28, 81, 168)" target="_blank">192.168.251.1:7799</a>;</div>


<div>    meta-disk internal;</div><div>  }</div><div>  on FA34 {</div><div>    device /dev/drbd10;</div><div>    disk /dev/disk/by-id/scsi-360030480003ae32015920a821ca7f075;</div><div>    address <a href="http://192.168.251.2:7799/" style="color:rgb(28, 81, 168)" target="_blank">192.168.251.2:7799</a>;</div>


<div>    meta-disk internal;</div><div>  }</div><div>  net {</div><div>    allow-two-primaries;</div><div>    after-sb-0pri discard-younger-primary;</div><div>    after-sb-1pri discard-secondary;</div><div>    after-sb-2pri violently-as0p;</div>


<div>    rr-conflict violently;</div><div>    max-buffers 8000;</div><div>    max-epoch-size 8000;</div><div>    unplug-watermark 16;</div><div>    sndbuf-size 0;</div><div>  }</div><div>  syncer {</div><div>    verify-alg crc32c;</div>


<div>    al-extents 3800;</div><div>  }</div><div>  handlers {</div><div>    before-resync-target &quot;/sbin/before_resync_target.sh&quot;;</div><div>

    after-resync-target &quot;/sbin/after_resync_target.sh&quot;;</div><div>  }</div><div>}</div></span></div><div><br></div><div>Has anyone encountered a similar problem before?</div><div><br></div><div>Commit yourself to constant self-improvement<br>


<br><br><div class="gmail_quote">On Wed, Jun 22, 2011 at 3:34 AM, Felix Frank <span dir="ltr">&lt;<a href="mailto:ff@mpexnet.de" target="_blank">ff@mpexnet.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>On 06/22/2011 04:28 AM, Digimer wrote:<br>

&gt; Are all ten DRBD resources on the same set of drives?<br>

<br>

</div>Good hint: if there *are* indeed 10 DRBDs, the syncer rate should of<br>

course be 30% * THROUGHPUT / NUM_DRBDs, because each resource will use<br>

the defined rate. I.e. in your case, some 30M.<br>

<br>

To the OP: Does the rebooted node become Primary before the sync is<br>

complete? If so, you may want to try leaving it Secondary until<br>

everything is back up in sync.<br>

Requests to an Inconsistent node can cause network overhead.<br>

<br>

Cheers,<br>

<font color="#888888">Felix<br>

</font></blockquote></div><br></div>