[DRBD-user] performance under difficult circumstances

Richard Hector richard at catalyst.net.nz
Fri Aug 21 05:53:28 CEST 2009

Hi all,

I realise I'm trying to make things work with a bunch of conditions that
make it difficult.

I'm aiming for a 3-way setup, with 2 local (proto C) and one remote
(proto A) box.

The primary box will be a new one we haven't got yet (SunFire x4275 -
12xSATA bays); the temporary one has reasonable specs too (x4100: SAS

The secondary box (if we choose to use it, rather than just doing 2-way
async) is also our backup PostgreSQL server.

The third box is an oldish Dell server, with minimal upgradeability, but
has a pair of 1TB disks.

All machines use software RAID1 - we choose that for portability, ie the
ability to chuck a disk in any old machine with the right interface, and
at least be able to read it.

The two local machines are connected by a direct Gbit link (same rack);
the other is over the internet (theoretical max 100Mbit, I think;
in practice probably much lower) with an RTT of around 18ms.

I'm managing to get quite reasonable times for copying to a just-local,
proto C drbd set, but adding the third is prohibitive. Unfortunately my
testing is also rather haphazard, as other unpredictable factors seem to
make more difference to the results than any tweaks I've done to drbd
settings - I'm not convinced they're even worth posting; they're so
varied. But copying a 300M file onto the drbd fs takes anything from
2min to 15min.
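
For a rough sense of scale, the sketch below turns those copy times into
throughput and compares them with the best case each link could manage.
It assumes "300M" means 300 MiB and uses the nominal link rates quoted
above (1 Gbit local, 100 Mbit remote); it ignores protocol overhead,
latency and disk speed, so treat the numbers as bounds, not predictions.

```python
# Back-of-the-envelope throughput figures for the 300M copy test.
# Assumptions: "300M" = 300 MiB; link rates are the nominal ones from
# the post, with no allowance for protocol overhead or latency.

MIB = 1024 * 1024
FILE_BYTES = 300 * MIB


def seconds_at_line_rate(num_bytes, link_bits_per_s):
    """Best-case time to push num_bytes through a link at full rate."""
    return num_bytes * 8 / link_bits_per_s


def observed_mib_per_s(num_bytes, seconds):
    """Average throughput implied by a measured copy time."""
    return num_bytes / MIB / seconds


gbit_floor = seconds_at_line_rate(FILE_BYTES, 1_000_000_000)
wan_floor = seconds_at_line_rate(FILE_BYTES, 100_000_000)
fastest = observed_mib_per_s(FILE_BYTES, 2 * 60)
slowest = observed_mib_per_s(FILE_BYTES, 15 * 60)

print(f"best case over 1 Gbit:   {gbit_floor:.1f} s")
print(f"best case over 100 Mbit: {wan_floor:.1f} s")
print(f"observed, 2 min copy:    {fastest:.2f} MiB/s")
print(f"observed, 15 min copy:   {slowest:.2f} MiB/s")
```

Under these assumptions the file could cross even the remote link in
about 25 seconds at full rate, so a 2 to 15 minute copy (roughly 2.5 down
to 0.33 MiB/s) is well below what raw bandwidth alone would explain.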

I've tried playing with:
sndbuf-size - best result seems to be about 512K?
max-buffers - haven't played much, but I think 256 was an improvement
  over the default of 32
max-epoch-size - increasing to 8192 seemed a decent improvement, but
  could do with more testing
al-extents - is this supposed to have an effect on ordinary performance,
  given it's in the syncer section? or is it only for syncing, like rate?

I know this says very little, and I haven't done enough. But it also
takes a long time to do the tests ...

I'm concluding, though, that the third remote device adds a huge
performance hit. We're prepared to tolerate quite a significant lag
between the local pair and the third device, but I can't see where to
allow this - sndbuf-size seems the most obvious, but the recommendation
that 1M is too big seems rather restricting. We're currently rsyncing
the data in question on an hourly basis, and want it a bit more up to
date than that, but drbd only seems to offer _much_ more up to date, at
the expense of local performance.
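
To put the sndbuf-size question in numbers, the sketch below computes the
bandwidth-delay product of the remote link and how long a full send
buffer would take to drain at link rate. It assumes the nominal 100 Mbit
rate and the 18ms RTT mentioned above; the real link is probably slower,
which would shrink the first figure and stretch the second.

```python
# Bandwidth-delay product and send-buffer drain time for the remote link.
# Assumptions: nominal 100 Mbit/s link and 18 ms RTT, as quoted in the
# post; "512K" and "1M" are taken as 512 KiB and 1 MiB.

LINK_BITS_PER_S = 100_000_000
RTT_S = 0.018


def bdp_bytes(link_bits_per_s, rtt_s):
    """Bytes that must be in flight to keep the link busy for one RTT."""
    return link_bits_per_s * rtt_s / 8


def drain_seconds(buffer_bytes, link_bits_per_s):
    """Time for a full buffer to empty onto the link at full rate."""
    return buffer_bytes * 8 / link_bits_per_s


bdp = bdp_bytes(LINK_BITS_PER_S, RTT_S)
print(f"bandwidth-delay product: {bdp / 1024:.0f} KiB")

for label, size in (("512K", 512 * 1024), ("1M", 1024 * 1024)):
    ms = drain_seconds(size, LINK_BITS_PER_S) * 1000
    print(f"sndbuf-size {label}: drains in about {ms:.0f} ms at link rate")
```

On these figures a buffer of a few hundred KiB is enough to keep the pipe
full, but even 1M holds well under a tenth of a second of writes at link
rate. A send buffer of that size can smooth out the round trip, yet it
cannot absorb a lag of minutes between the local pair and the remote box.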

I also wondered whether we'd be better off with the third device
detached, with a cronjob syncing it periodically as described here:

Essentially, we want the third device sufficiently asynchronous that it
doesn't impact local write times at all.

Can anyone offer any suggestions? I'm obviously happy to attempt to
answer questions, since I know my description is inadequate :-)

Have we just picked the wrong tool for the job?


