Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Felix, thanks for your reply.
I did an experiment yesterday. Here is what I've got.
I created ten DRBD devices using the default syncer rate (primary/primary
configuration). The initial sync rate was about 1 MB/s per device. My
client is a Windows 2008 server running IOMeter. The default IO timeout
is 60 s.
I rebooted one node, and the other took over. IO ran smoothly until
re-synchronization to the rebooted node started; application IO then dropped
to nearly zero, with a resync rate of 500 KB/s per device.
What confuses me is that 500 KB/s per device * 10 devices = 5 MB/s, which is
only a few percent of the link's overall bandwidth. Application IOs were
eventually abandoned by the client due to timeouts, which resulted in a LUN
reset (all assigned devices went offline).
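As a sanity check on the numbers, here is a small sketch of the arithmetic.
The 1 Gbit/s link capacity is my assumption about the replication NIC; the
rest are the figures observed above:

```python
# Back-of-envelope check: aggregate resync traffic vs. link capacity.
# link_mbit is an assumption (gigabit replication link); adjust to your NIC.
link_mbit = 1000                 # link capacity, Mbit/s
resync_kbyte_per_dev = 500       # observed resync rate, KByte/s per device
devices = 10

aggregate_mbyte = resync_kbyte_per_dev * devices / 1000   # MByte/s
aggregate_mbit = aggregate_mbyte * 8                      # Mbit/s
utilization = aggregate_mbit / link_mbit * 100            # percent

print(f"{aggregate_mbyte:.1f} MB/s = {aggregate_mbit:.0f} Mbit/s "
      f"= {utilization:.1f}% of the link")
```

So the resync itself is nowhere near saturating the link, which makes the
application-IO stall all the more puzzling.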
As you suggested in your previous email, I do not promote a device to Primary
until its sync finishes.
Again, here is my DRBD configuration:
resource drbd10 {
  on FA33 {
    device    /dev/drbd10;
    disk      /dev/disk/by-id/scsi-360030480003ae2e0159207cc2a2ac9d4;
    address   192.168.251.1:7799;
    meta-disk internal;
  }
  on FA34 {
    device    /dev/drbd10;
    disk      /dev/disk/by-id/scsi-360030480003ae32015920a821ca7f075;
    address   192.168.251.2:7799;
    meta-disk internal;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-younger-primary;
    after-sb-1pri discard-secondary;
    after-sb-2pri violently-as0p;
    rr-conflict violently;
    max-buffers 8000;
    max-epoch-size 8000;
    unplug-watermark 16;
    sndbuf-size 0;
  }
  syncer {
    verify-alg crc32c;
    al-extents 3800;
  }
  handlers {
    before-resync-target "/sbin/before_resync_target.sh";
    after-resync-target "/sbin/after_resync_target.sh";
  }
}
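One thing I am considering is capping the per-resource syncer rate explicitly,
since all ten resources resync over the same link. A sketch of the syncer
section only (the 3M figure is hypothetical, roughly 30% of gigabit throughput
divided by 10 resources; adjust to your hardware):

```
syncer {
  # Hypothetical cap: ~30% of link throughput divided by the number of
  # resources syncing in parallel (assumes a 1 Gbit/s link and 10 resources).
  rate 3M;
  verify-alg crc32c;
  al-extents 3800;
}
```

With no explicit rate, every resource uses the default independently, so ten
resources can together claim far more of the link than any single one.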
Has anyone encountered a similar problem before?
Commit yourself to constant self-improvement
On Wed, Jun 22, 2011 at 3:34 AM, Felix Frank <ff at mpexnet.de> wrote:
> On 06/22/2011 04:28 AM, Digimer wrote:
> > Are all ten DRBD resources on the same set of drives?
>
> Good hint: if there *are* indeed 10 DRBDs, the syncer rate should of
> course be 30% * THROUGHPUT / NUM_DRBDs, because each resource will use
> the defined rate. I.e. in your case, some 30M.
>
> To the OP: Does the rebooted node become Primary before the sync is
> complete? If so, you may want to try leaving it Secondary until
> everything is back up in sync.
> Requests to an Inconsistent node can cause network overhead.
>
> Cheers,
> Felix
>