Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Felix, thanks for your reply.
I did an experiment yesterday. Here is what I've got.
I created ten DRBD devices using the default syncer rate (primary/primary
configuration). The initial sync rate was about 1 MB/s per device. My
client is a Windows 2008 server running IOMeter. The default IO timeout
is 60 s.
I rebooted one node, and the other took over. IO ran smoothly until
re-synchronization to the rebooted node started; application IO then dropped
to nearly zero, with a resync rate of 500 KB/s per device.
What confuses me is that 500 KB/s per device * 10 devices = 5 MB/s, which is
only a few percent of the link's overall bandwidth. Application IOs were
eventually abandoned by the client due to timeouts, which resulted in a LUN
reset (all assigned devices went offline).
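As a sanity check on the numbers, here is a small sketch of the arithmetic.
The 1 Gbit/s link capacity is my assumption about the replication NIC; the
rest are the figures observed above:

```python
# Back-of-envelope check: aggregate resync traffic vs. link capacity.
# link_mbit is an assumption (gigabit replication link); adjust to your NIC.
link_mbit = 1000                 # link capacity, Mbit/s
resync_kbyte_per_dev = 500       # observed resync rate, KByte/s per device
devices = 10

aggregate_mbyte = resync_kbyte_per_dev * devices / 1000   # MByte/s
aggregate_mbit = aggregate_mbyte * 8                      # Mbit/s
utilization = aggregate_mbit / link_mbit * 100            # percent

print(f"{aggregate_mbyte:.1f} MB/s = {aggregate_mbit:.0f} Mbit/s "
      f"= {utilization:.1f}% of the link")
```

So the resync itself is nowhere near saturating the link, which makes the
application-IO stall all the more puzzling.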
As you suggested in your previous email, I do not promote a device to Primary
until its sync finishes.
Again, here is my DRBD configuration:
resource drbd10 {
  on FA33 {
    device    /dev/drbd10;
    disk      /dev/disk/by-id/scsi-360030480003ae2e0159207cc2a2ac9d4;
    address   192.168.251.1:7799;
    meta-disk internal;
  }
  on FA34 {
    device    /dev/drbd10;
    disk      /dev/disk/by-id/scsi-360030480003ae32015920a821ca7f075;
    address   192.168.251.2:7799;
    meta-disk internal;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-younger-primary;
    after-sb-1pri discard-secondary;
    after-sb-2pri violently-as0p;
    rr-conflict violently;
    max-buffers 8000;
    max-epoch-size 8000;
    unplug-watermark 16;
    sndbuf-size 0;
  }
  syncer {
    verify-alg crc32c;
    al-extents 3800;
  }
  handlers {
    before-resync-target "/sbin/before_resync_target.sh";
    after-resync-target "/sbin/after_resync_target.sh";
  }
}
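One thing I am considering is capping the per-resource syncer rate explicitly,
since all ten resources resync over the same link. A sketch of the syncer
section only (the 3M figure is hypothetical, roughly 30% of gigabit throughput
divided by 10 resources; adjust to your hardware):

```
syncer {
  # Hypothetical cap: ~30% of link throughput divided by the number of
  # resources syncing in parallel (assumes a 1 Gbit/s link and 10 resources).
  rate 3M;
  verify-alg crc32c;
  al-extents 3800;
}
```

With no explicit rate, every resource uses the default independently, so ten
resources can together claim far more of the link than any single one.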
Has anyone encountered a similar problem before?
Commit yourself to constant self-improvement
On Wed, Jun 22, 2011 at 3:34 AM, Felix Frank <ff at mpexnet.de> wrote:
> On 06/22/2011 04:28 AM, Digimer wrote:
> > Are all ten DRBD resources on the same set of drives?
>
> Good hint: if there *are* indeed 10 DRBDs, the syncer rate should of
> course be 30% * THROUGHPUT / NUM_DRBDs, because each resource will use
> the defined rate. I.e. in your case, some 30M.
>
> To the OP: Does the rebooted node become Primary before the sync is
> complete? If so, you may want to try leaving it Secondary until
> everything is back up in sync.
> Requests to an Inconsistent node can cause network overhead.
>
> Cheers,
> Felix
>