[DRBD-user] Large block IO bottleneck

Wed Jan 3 16:56:47 CET 2007

On Wednesday, 3 January 2007 16:20, Ross S. W. Walker wrote:
> > -----Original Message-----
> > From: drbd-user-bounces at lists.linbit.com
> > [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of
> > Philipp Reisner
> > Sent: Wednesday, January 03, 2007 5:58 AM
> > To: drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] Large block IO bottleneck
> >
> > On Tuesday, 2 January 2007 22:05, Ross S. W. Walker wrote:
> > > Hi there I am using DRBD 0.7.21 with iSCSI Enterprise Target 0.4.14 on
> > > CentOS 4.4.
> > >
> > > When I run iSCSI direct to the LVM lv on top of hardware RAID I can get
> > > 225 MB/s over two sessions in MPIO with 256K block size, but when I put
> > > DRBD in-between iSCSI and LVM the throughput tops out at 80 MB/s and I
> > > can't seem to go over that.
> > >
> > > DRBD seems to report its max number of sectors as 8 (4K), does that
> > > mean each io operation is limited to 4K? My hardware raid reports its
> > > max sectors as 128, could this explain the reduction to 1/3 throughput?
> >
> >
> > Hi,
> >
> > The cause of the limitation to 4k is the Linux-2.4 compatibility of
> > DRBD-0.7.
> >
> > Repeat your test with drbd-8.0(rc1).
> >
> > drbd-8.0 will do BIOs up to 32k, but much more important are other
> > changes (e.g. the non-blocking make_request() function) that make
> > drbd-8.0 scale much better with high-end hardware.
> >
> > PS: What kind of network link are you using ?
>
> We're using dual 1Gbps adapters, one for each path in the MPIO
> connection (actually 4 adapters in 2 separate bonded pairs using ALB,
> since we have multiple initiators, 4 to be exact).

So that is 2Gbps per logical link, right?

Is this for iSCSI only, or does DRBD also get 2Gbps?

BTW, the load balancing of most switches is not usable for DRBD,
since most of them use a constant mapping of target MAC to port.
If you use that for DRBD, all of the traffic ends up on a single
1Gbps port :(
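
To illustrate the constant MAC-to-port mapping, here is a minimal Python
sketch. The hash policy (last MAC octet modulo the number of ports) and
all MAC addresses are made up for the example; real switches use their
own, vendor-specific policies.

```python
def pick_port(dst_mac: str, n_ports: int) -> int:
    """Hypothetical switch policy: last octet of the target MAC modulo port count."""
    return int(dst_mac.split(":")[-1], 16) % n_ports

N_PORTS = 2  # two 1Gbps ports in the bonded pair

# DRBD replication: every frame goes to the one peer, so one target MAC.
peer_mac = "00:16:3e:00:00:2a"  # made-up address
drbd_ports = {pick_port(peer_mac, N_PORTS) for _ in range(1000)}

# iSCSI with four initiators: four different target MACs.
initiator_macs = [f"00:16:3e:00:00:{i:02x}" for i in range(1, 5)]  # made-up
iscsi_ports = {pick_port(mac, N_PORTS) for mac in initiator_macs}

print("ports used for DRBD traffic: ", sorted(drbd_ports))
print("ports used for iSCSI traffic:", sorted(iscsi_ports))
```

With a single target MAC the mapping never changes, so only one port
carries the replication traffic, while several initiators can spread
across both ports.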

> Is there an issue with using the max_sectors from the underlying
> hardware, that way DRBD would scale up or down depending on the backing
> device that is used?

The 32k is a rather arbitrary value we chose in DRBD. Such a limit
is needed by some of the algorithms in the write conflict detection
code, which is used when both nodes are primary.

If you really want to, you can change that define (HT_SHIFT), but the
positive effects are probably outweighed by the negative ones (more
collisions in the hash tables, etc.).

We did quite a bit of measuring, and our results were that the actual size
of the BIOs does not influence the performance.

> Of course DRBD may have to automatically re-configure the min-buffers
> that it needs depending on the size of the BIOs it accepts, so
> replication at that speed doesn't overflow.

>
> The secondary peer isn't in place yet here and when it does come online
> it will be geographically separated and therefore over a high latency
> low bandwidth connection. I am planning on replicating to this peer
> asynchronously using Prot A, is there a formula for calculating optimum
> snd_buffer based on dataset/bandwidth/latency?
>

Huh.

I think you are concerned about performance.

The issue is that when the snd_buffer is full on the primary node, DRBD
has to block the writing application until there is space for
the next write in the snd_buffer.

The outflow rate of the snd_buffer is the bandwidth of your
replication link.

E.g. with a snd_buffer of 1M and a bandwidth of 1MBit/sec on the
replication network:

Writing a 990KB file will be as fast as your local disk is.
Writing a second 990KB file will take approx. 10 seconds!!!

(1MByte / 100KByte/s =~ 10 seconds, taking 1MBit/sec as roughly 100KByte/s.)
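
The same arithmetic as a rough Python sketch. The function name and the
model are assumptions made for illustration: local disk time is ignored,
the second write starts immediately after the first, and the link is
rounded to 100 KByte/s as above.

```python
def blocked_seconds(write_bytes, buffer_free_bytes, link_bytes_per_sec):
    """Time the writer is blocked waiting for snd_buffer space.

    Whatever does not fit into the free part of the buffer has to wait
    until the replication link has drained that much data.
    """
    overflow = max(0, write_bytes - buffer_free_bytes)
    return overflow / link_bytes_per_sec

SND_BUFFER = 1_000_000   # 1 MByte snd_buffer
LINK_RATE = 100_000      # ~100 KByte/s, rounded from 1 MBit/s
FILE_SIZE = 990_000      # 990 KByte file

# First file: the buffer is empty, so nothing blocks.
first = blocked_seconds(FILE_SIZE, SND_BUFFER, LINK_RATE)

# Second file right afterwards: only 10 KByte of the buffer is free.
free_after_first = SND_BUFFER - FILE_SIZE
second = blocked_seconds(FILE_SIZE, free_after_first, LINK_RATE)

print(f"first write blocked for  {first:.1f} s")
print(f"second write blocked for {second:.1f} s")
```

In this model the second write is blocked for a little under 10 seconds,
which matches the estimate above.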

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :