Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Lars,

> first, _why_ did you have to run
> a recovery on your software raid, at all?

The re-sync was needed because I had saved a copy of the data "disk"
during the DRBD and kernel upgrades, so once everything looked good, I
needed to sync my data disks back up.

> second, _any_ io on the devices under md recovery
> will cause "degradation" of the sync.
> especially random io, or io causing it to seek often.

My problem with the assumption that DRBD merely having a device enabled
is what slows the resync is this: if *any* disk I/O degrades the sync,
that doesn't explain why the host holding the Primary roles completed
the exact same re-sync in a third of the time. The DRBD devices were in
sync while these mdadm re-syncs were running, so any traffic to the
DRBD devices should have had the same disruptive effect on *both*
hosts. Is there something different about a host whose DRBD devices are
in the Secondary role versus the Primary role? Protocol C, in case that
helps. (I've appended the md resync knobs I've been watching after my
signature, in case anyone wants to compare notes.)

>> and a pretty disturbing issue with resizing an LV/DRBD
>> device/Filesystem with flexible-meta-disk: internal,
> care to give details?

I posted to the drbd-users list on Tue Nov 11 08:14:37 CET 2008, but
got no response. I ended up having to salvage the data manually (almost
170GB worth) and recreate the DRBD devices. Quite a bummer. I was
hoping the internal metadata would make Xen live migration easier, but
if it causes the resize to fail, that's not so good. I've switched the
new DRBD device back to an external metadata device and will test with
that. (The resize sequence I was attempting is also sketched below.)

Thanks for the follow-up ... I will investigate a bit more. BTW, this
is not intended as a dig at DRBD ... I've been using DRBD since the old
0.7.x releases ... I'm just trying to report what I've observed in case
there is a corner case or bug that can be discovered and fixed.

For a visual of the basic layout of our setup, see
http://skylab.org/~mush/Xen_Server_Architecture.png ... that was done
in June 2006 to describe the configuration we'd already been running
for a while. Our current implementation is similar, just newer code,
newer servers, and much, much more storage (and more guests).

The reasoning behind putting the md raid1 below LVM is that if we lose
a physical disk, we only have to repair the one array rather than a
*bunch* of little mirrors or other devices. It has proven to be a
stable and flexible architecture. We run guests on both servers (load
balancing), which we could not do if DRBD sat lower in the stack (i.e.
under the LVM).

The next iteration is to swap the md raid1 for an md raid5 (yes, some
performance will suffer, but a Linux raid5 can be grown dynamically one
disk at a time, which I can't do in the current architecture ... see
the grow sketch below) and to use the new stacking features of DRBD
8.3 to keep a remote node in sync (also sketched below). Our current
remote sync is done with rsync, but our datasets are so large now that
it takes several hours just to build the file list (as well as >1GB of
RAM). Considering that the data changing each day is, at most, a few
percent of the total data space, something like DRBD makes a LOT more
sense!

Sorry for going a little off topic, but I thought someone else might
gain some value from seeing how our environment has been built and
running for ~3 years now.

Thanks,
Sam
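For anyone wanting to compare resync behavior on their own pair, these
are the readouts and knobs I've been using. The sysctls are the
standard md ones; the values shown are just examples, not a
recommendation:

  # watch resync progress and the current per-device speed
  watch -n 5 cat /proc/mdstat

  # md's floor and ceiling for resync throughput, in KB/sec per device.
  # Raising the floor makes md push the resync harder even when there
  # is competing I/O (at the expense of that competing I/O, of course).
  sysctl dev.raid.speed_limit_min           # default is 1000
  sysctl -w dev.raid.speed_limit_min=20000
  sysctl -w dev.raid.speed_limit_max=200000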
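For the record, the resize that blew up on me was the usual bottom-up
sequence, roughly as below. The resource and volume names here are
placeholders rather than my real config, and this is from memory:

  # grow the backing LV on *both* nodes first
  lvextend -L +50G /dev/vg0/xen_data

  # then grow the DRBD device itself (run on the Primary); with
  # flexible-meta-disk: internal this is the step that went wrong for
  # me, since the metadata lives at the end of the backing device
  drbdadm resize r0

  # finally grow the filesystem, online, on the Primary
  resize2fs /dev/drbd0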
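And since I mentioned growing a raid5 one disk at a time, this is the
procedure I'm planning on (device names are examples only):

  # add the new disk as a spare, then reshape the array onto it
  mdadm /dev/md0 --add /dev/sdd1
  mdadm --grow /dev/md0 --raid-devices=4

  # once the reshape finishes, let LVM see the new space
  pvresize /dev/md0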
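Finally, my (untested!) reading of the DRBD 8.3 documentation is that
the stacked resource for the remote node would look something like the
following. Hostnames, addresses, and resource names are all made up,
and I'd welcome corrections from anyone actually running 8.3 stacking:

  resource data-U {
    protocol A;                 # async is fine for the remote copy

    stacked-on-top-of data {
      device  /dev/drbd10;
      address 192.168.42.1:7789;
    }

    on remote-backup {
      device    /dev/drbd10;
      disk      /dev/vg0/xen_data;
      address   192.168.42.2:7789;
      meta-disk internal;
    }
  }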