Felix, thanks for your reply.

I did an experiment yesterday. Here is what I've got.

I created ten DRBD devices using the default syncer rate (primary/primary configuration). The initial sync ran at about 1 MB/s per device. My client is a Windows 2008 server with IOMeter running on it. The default IO timeout is 60 s.
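
(As far as I can tell from the drbd.conf man page, omitting the rate from the syncer section leaves DRBD at its default of 250 KB/s per resource, i.e. effectively:

    syncer {
        rate 250k;    # DRBD's documented default when no explicit rate is set
    }

which is the same order of magnitude as the resync rate I describe below.)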

I rebooted one node, and the other took over. IO ran smoothly until resynchronization of the rebooted node started. Application IO then dropped to nearly zero, with a resync rate of 500 KB/s per device.

What confuses me is that 500 KB/s per device * 10 devices = 5 MB/s, which uses only 5M/1000M = 0.5% of the overall bandwidth. The application IOs are eventually abandoned by the client due to the timeout, which results in a LUN reset (all assigned devices go offline).
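
For what it's worth, my plan is to set an explicit per-resource rate rather than rely on the default. If I apply Felix's 30%-of-throughput rule to roughly 100 MB/s of usable GigE bandwidth, with ten resources syncing in parallel that works out to about 3M per resource; these figures are my own back-of-the-envelope assumption, not tested values:

    syncer {
        rate 3M;    # assumed: ~30 MB/s aggregate across ten resources on GigE
    }

If I understand the user guide correctly, the rate can also be changed temporarily at runtime, e.g.

    drbdsetup /dev/drbd10 syncer -r 3M

and reverted with "drbdadm adjust drbd10" once the resync completes.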

As you suggested in your previous email, I did not promote a device to Primary until the sync finished.

Again, here is my DRBD configuration:

resource drbd10 {
    on FA33 {
        device    /dev/drbd10;
        disk      /dev/disk/by-id/scsi-360030480003ae2e0159207cc2a2ac9d4;
        address   192.168.251.1:7799;
        meta-disk internal;
    }
    on FA34 {
        device    /dev/drbd10;
        disk      /dev/disk/by-id/scsi-360030480003ae32015920a821ca7f075;
        address   192.168.251.2:7799;
        meta-disk internal;
    }
    net {
        allow-two-primaries;
        after-sb-0pri    discard-younger-primary;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    violently-as0p;
        rr-conflict      violently;
        max-buffers      8000;
        max-epoch-size   8000;
        unplug-watermark 16;
        sndbuf-size      0;
    }
    syncer {
        verify-alg crc32c;
        al-extents 3800;
    }
    handlers {
        before-resync-target "/sbin/before_resync_target.sh";
        after-resync-target  "/sbin/after_resync_target.sh";
    }
}

Has anyone encountered a similar problem before?

Commit yourself to constant self-improvement

On Wed, Jun 22, 2011 at 3:34 AM, Felix Frank <ff@mpexnet.de> wrote:
> On 06/22/2011 04:28 AM, Digimer wrote:
> > Are all ten DRBD resources on the same set of drives?
>
> Good hint: if there *are* indeed 10 DRBDs, the syncer rate should of
> course be 30% * THROUGHPUT / NUM_DRBDs, because each resource will use
> the defined rate. I.e. in your case, some 30M.
>
> To the OP: Does the rebooted node become Primary before the sync is
> complete? If so, you may want to try leaving it Secondary until
> everything is back up in sync.
> Requests to an Inconsistent node can cause network overhead.
>
> Cheers,
> Felix