[DRBD-user] tuning DRBD

Tue Jun 12 20:34:22 CEST 2007

On Tue, Jun 12, 2007 at 09:55:49AM -0700, Ben wrote:
> Thanks for the response, Lars. Followup questions below....
> 
> On Jun 12, 2007, at 9:19 AM, Lars Ellenberg wrote:
> 
> >>2. Can I eke out more throughput by increasing sndbuf-size? Does that
> >>increase my write latency when using Protocol C?
> >
> >most likely not.
> 
> Is that a no on the throughput, the increased latency, or both? :)

throughput:

you simply cannot increase throughput by adding buffers somewhere.
buffers are good for smoothing out bursts.
they cannot do away with any bottleneck
you may have for sustained access.
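
A toy model of the point above, as a rough sketch only: a bounded buffer in front of a link that drains a fixed amount per tick. The function name, the tick model and all numbers are hypothetical and have nothing to do with drbd internals. "refused" stands for work the writer would have had to wait for.

```python
def simulate(buffer_size, drain_per_tick, arrivals):
    """Return (delivered, refused) units for a bounded buffer.

    arrivals is a list with the number of units offered in each tick.
    """
    queued = delivered = refused = 0
    for incoming in arrivals:
        # Accept only what still fits into the buffer.
        accepted = min(incoming, buffer_size - queued)
        refused += incoming - accepted
        queued += accepted
        # The bottleneck drains at most drain_per_tick units per tick.
        sent = min(queued, drain_per_tick)
        queued -= sent
        delivered += sent
    return delivered, refused


burst = [50] + [0] * 9      # one burst, then idle
sustained = [20] * 1000     # constant load above the drain rate of 10

print(simulate(10, 10, burst), simulate(100, 10, burst))
print(simulate(10, 10, sustained), simulate(1000, 10, sustained))
```

In this model the larger buffer absorbs the whole burst, but under sustained load both buffer sizes deliver the same amount, because the drain rate is the only thing that limits it.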

latency:

you use protocol 'C'.
it is synchronous.

for the file system/application to notice a write as completed, you need
to wait for it to travel to the other node, get written on both disks,
and for the ack to travel back.

whether or not the tcp buffer fills up does not change things.
in this case the tcp buffer does only what it was supposed to do:
smooth out the tcp traffic. having a larger buffer (which, depending on
your actual network performance, may never fill up) cannot possibly
increase latency.
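
The latency argument can be written down as a back-of-the-envelope model. This is a sketch under assumptions: the function and parameter names are made up, and it assumes the local disk write runs concurrently with the replication round trip. The thing to notice is that no buffer size appears anywhere in it.

```python
def protocol_c_write_latency_ms(one_way_ms, local_disk_ms, remote_disk_ms):
    """Estimate when a synchronous (protocol C) write counts as completed.

    Assumption: the local write and the replication path run in parallel,
    so the slower of the two decides the completion time.
    """
    # Data travels to the peer, is written there, and the ack travels back.
    replicated_ms = one_way_ms + remote_disk_ms + one_way_ms
    return max(local_disk_ms, replicated_ms)


# Hypothetical numbers: 0.25 ms one-way network delay, 4 ms per disk write.
print(protocol_c_write_latency_ms(0.25, 4.0, 4.0))
```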

> >>3. What's a reasonable formula for determining max-buffers? Does
> >>increasing them imply I should increase something else too?
> >
> >that number is basically in "pages", iirc, so if you compare
> >(max-buffers * 4k * number of drbd's in use) to your physical RAM,
> >it should be some not too impressive number.
> 
> Got it. Is there a benefit to increasing it?

if you tune it too small, io will throttle on it.
to not throttle here, it should be larger than the maximum expected
amount of in-flight io (io requests submitted but not yet completed).
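
The arithmetic from this exchange, as a small sketch. The 4 KiB page size is the "4k" from the quoted text; the function names and the example values are hypothetical, so plug in your own configuration.

```python
PAGE_SIZE = 4096  # the "4k" from the post; assumed to be the page size


def max_buffers_bytes(max_buffers, num_devices, page_size=PAGE_SIZE):
    """Worst-case memory covered by max-buffers across all drbd devices."""
    return max_buffers * page_size * num_devices


def pages_for_in_flight(in_flight_bytes, page_size=PAGE_SIZE):
    """Pages needed to hold the expected in-flight io (rounded up).

    max-buffers should be larger than this to avoid throttling.
    """
    return -(-in_flight_bytes // page_size)  # ceiling division


# Hypothetical setup: max-buffers 2048, two drbd devices, 4 GiB of RAM.
used = max_buffers_bytes(2048, 2)
ram = 4 * 1024**3
print(used, used / ram)
print(pages_for_in_flight(8 * 1024**2))
```

With these example values the buffers add up to 16 MiB, well under one percent of RAM, which is the "not too impressive number" the comparison is looking for.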

> >>5. Does decreasing max-epoch-size reduce my write latency?
> >
> >most likely not.
> 
> What are the tradeoffs with turning this knob, then?

it is the upper limit for drbd-wire-protocol reorder domains.
if you want to know the details, read some of the papers at
 http://drbd.org/publications.html

it normally is not very interesting for the performance of drbd.

it may affect throughput negatively if made too small, because of some
side effects triggered implicitly when closing such a reorder domain
(in drbd speak also known as sending a "barrier" requesting a "barrier
ack" thus closing the "current epoch" ...)

it sometimes happens to give interesting counter-intuitive results
because of said side effects (basically unplugging the lower level
device). in case that would lead to better performance for your setup,
you are likely still turning the wrong knob and should much rather be
tuning the io-scheduler of the lower level device instead.

> >>Does anybody have any experience playing with the schedulers in the
> >>2.6 kernel?
> >
> >yes of course.
> >but the answers here depend very much on what you specifically ask  
> >for.
> 
> I'm looking to make me a database cluster, so I'm looking to minimize  
> latency for random reads and writes.

well.
 buy loads of RAM.
 try deadline with short expiry times for reads (the writes will be
 forced to expire by the db when necessary).
 buy loads of RAM.
 try anticipatory, but be aware that it might well be the worst choice.
 buy loads of RAM.
 I generally recommend strongly against anticipatory on server systems,
 but sometimes it may prove useful given certain access patterns
 (e.g. you have lots of uninteresting streaming writes and some very
 important reads). but then you probably should do those streaming
 writes somewhere else :)
 or use the cfq, and play with the io priorities instead.
 and buy loads of RAM...

 finally, given sufficiently intelligent hardware,
 the noop scheduler is the right one...
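
To see which scheduler a lower level device currently uses, a 2.6 kernel exposes it in /sys/block/<dev>/queue/scheduler, with the active one in square brackets. A minimal sketch for reading that file follows; the helper names are made up and the device name is just an example.

```python
from pathlib import Path


def parse_schedulers(line):
    """Split a sysfs scheduler line into (active, available)."""
    active = None
    available = []
    for name in line.split():
        # The kernel marks the active scheduler with square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_schedulers(device):
    """Read the scheduler line for a block device, e.g. "sda"."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_schedulers(path.read_text())


print(parse_schedulers("noop anticipatory deadline [cfq]\n"))
```

Switching is done by writing a scheduler name into the same file as root. With deadline active, the read expiry mentioned above is the read_expire file (in milliseconds) under /sys/block/<dev>/queue/iosched/.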

depending on the size of your data set,
choose a large number of al-extents.
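
A rough way to turn "size of the data set" into a number for al-extents, as a sketch. It assumes that one activity log extent covers 4 MiB of storage, which is what the drbd documentation states as far as I know, and the function name is hypothetical. drbd only accepts values within a certain range, so check the drbd.conf man page for your version before using the result.

```python
AL_EXTENT_BYTES = 4 * 1024 * 1024  # assumption: 4 MiB per activity log extent


def al_extents_for(hot_bytes):
    """Extents needed to cover the frequently written part of the data set."""
    return -(-hot_bytes // AL_EXTENT_BYTES)  # ceiling division


# Hypothetical example: about 2 GiB of the data set is written regularly.
print(al_extents_for(2 * 1024**3))
```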

did I mention that you should buy loads of RAM?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :