[DRBD-user] Initial sync
cliff at aaisp.net.uk
Tue Feb 19 19:17:44 CET 2008
[Apologies if this issue has already been discussed.]
I have concerns over the initial sync time on large
(multi-terabyte) devices, and also on the performance
hit on application disk access while sync is in progress.
Here are my setup details:
I have drbd on two servers each having a 9TB hardware
RAID6. I am using drbd 8.0.11 with CentOS5.
To provide flexibility, I used parted/gpt to partition the
raw 9TB array into 13 smaller areas of around 700GB each,
and I have six of these primary on one server and seven
primary on the other. The servers have a Gigabit
private (crossover) link for the drbd traffic.
The initial synchronization is, not surprisingly, taking
a long time. I have set the max sync rate to 100 MByte/sec,
but the speed I am seeing is actually about 1/3 of this
(in both directions, as seen looking at ether stats),
so the whole process will take around two days.
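
A rough check of the "two days" figure, as a minimal sketch. It
assumes decimal units (1 GB = 10^9 bytes), that the two directions
sync concurrently without slowing each other down, and that the
observed rate of about 1/3 of the limit is the aggregate rate per
direction. The variable names are illustrative only.

```python
GB = 1000**3
MB = 1000**2


def sync_time_hours(total_bytes, rate_bytes_per_s):
    """Hours needed to copy total_bytes at a sustained rate."""
    return total_bytes / rate_bytes_per_s / 3600


partition_bytes = 700 * GB     # "around 700GB each"
configured_rate = 100 * MB     # the configured max sync rate
observed_rate = 100 * MB / 3   # "about 1/3 of this"

# Six partitions sync one way and seven the other. Assuming both
# directions run at the same time, the seven-partition direction
# determines when the whole process finishes.
busiest_direction = 7 * partition_bytes

observed_hours = sync_time_hours(busiest_direction, observed_rate)
configured_hours = sync_time_hours(busiest_direction, configured_rate)

print(f"at observed rate:   {observed_hours:.1f} h "
      f"({observed_hours / 24:.1f} days)")
print(f"at configured rate: {configured_hours:.1f} h")
```

Under these assumptions the observed rate gives roughly 41 hours,
which is in line with the two-day estimate, and even the full
100 MByte/sec would still need more than 13 hours.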
I have not changed the default "net" parameters, nor
have I arranged for the syncs to happen sequentially
rather than in parallel. I have also made no attempt
to match the block/region/partition sizes to the
underlying RAID6, so I expect the speed I am seeing
can be improved somewhat. It still seems unduly long,
and, more to the point, unnecessary. I have seen the
drbd+ skip-initial-sync article, so I know I could
work around this using drbdmeta - but I'd much rather
see a cleaner mechanism available - either a built-in
way to avoid initial sync or preferably a way to mark
both disks as "zero" - i.e. blocks should read zero until
first written. [I suspect that's not as easy as it
sounds as it would increase the metadata size?]
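
To put a rough number on the metadata question, here is a small
sketch. It assumes a per-device bitmap with one bit per 4 KiB block,
and it assumes that a "never written, reads as zero" state would cost
one further bit per block at the same granularity. Both the layout
and the two-bit scheme are assumptions made for illustration, not a
description of how drbd stores or would store this.

```python
import math

BLOCK_BYTES = 4096  # assumed granularity: one bit per 4 KiB block


def bitmap_bytes(device_bytes, bits_per_block=1):
    """Bytes needed to hold bits_per_block bits for every block."""
    blocks = math.ceil(device_bytes / BLOCK_BYTES)
    return math.ceil(blocks * bits_per_block / 8)


partition_bytes = 700 * 1000**3  # one of the ~700GB partitions

one_bit = bitmap_bytes(partition_bytes)
two_bits = bitmap_bytes(partition_bytes, bits_per_block=2)

print(f"1 bit per block:  {one_bit / 1e6:.1f} MB")
print(f"2 bits per block: {two_bits / 1e6:.1f} MB")
```

On these assumptions the extra state would roughly double a bitmap
of about 21 MB per 700GB partition, which is small next to the
device itself.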
In addition to the sync speed, I am finding that disk
accesses are extremely slow while the sync is in progress.
For example, running mkfs.ext3 on a 1TB partition
is taking several hours to complete. I'm guessing here -
I know the sync process is meant to be running at lower
priority - but is it possibly just filling the drbd IO
queue, so that any application IO has to wait its turn
behind the queued sync requests? Would reducing max-buffers help?
Would it be better if application requests could be
given higher priority - boosted to the front of the queue?