Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Mon, Dec 22, 2008 at 03:12:45PM -0700, Sam Howard wrote:
> Hi.
>
> So, I have a 2-node setup ... it functions fine, but I recently (last night/
> this morning) upgraded to the current Xen source, Xen patches to linux 2.6.18,
> and DRBD 8.3.0.
>
> It *functions* fine, so far, however when I was re-syncing my data mirrors on
> both hosts, I noticed something really disturbing. The host that only had
> Secondary roles on it was re-syncing the disk at a substantially slower rate
> than it should have been. The host with Primary roles may have been
> performance impacted as well, but I didn't take metrics at the time.
>
> When I went to pass out at 5am, both nodes were giving me an ETA of 4 hours to
> sync. When I checked on them ~7h later, the host with Primary roles was
> complete and the host with Secondary roles was not! I grabbed some data
> points:
>
> md1 : active raid1 sdb3[2] sda3[0]
>       727688192 blocks [2/1] [U_]
>       [===================>.]  recovery = 98.9% (720040384/727688192) finish=5.1min speed=24888K/sec

ok. so to not confuse anything here:
you have drbd on top of md software raid.

first, _why_ did you have to run a recovery on your software raid, at all?

second, _any_ io on the devices under md recovery will cause "degradation"
of the sync, especially random io, or io causing it to seek often.
this has absolutely nothing to do with drbd as such.
that is intended, and md provides you with the necessary knobs to tune its
recovery behaviour (e.g. /sys/block/md0/md/sync_speed, _min, _max).

you should also use the deadline io scheduler, if you do not already.
(kernel boot parameter: elevator=deadline,
runtime: /sys/block/*/queue/scheduler)

> Just for fun, I shut down the DRBD devices and module (/etc/init.d/drbd stop)
> and checked again:
>
> md1 : active raid1 sdb3[2] sda3[0]
>       727688192 blocks [2/1] [U_]
>       [===================>.]  recovery = 98.2% (715188224/727688192) finish=4.8min speed=43250K/sec
>
> That's a pretty significant difference.

sure. with no other io on the devices, md recovery can use up all the
bandwidth and does not need to play nice.
just read up on md sync_speed_min.

> Aside from these performance issues,

absolutely nothing to do with drbd as such.

> and a pretty disturbing issue with resizing an LV/DRBD
> device/Filesystem with flexible-meta-disk: internal,

care to give details?

> everything has been running OK.
>
> I do not suspect the upgrade to the Xen, linux, or DRBD source ... I've seen
> these performance issues in the past, but never been annoyed enough to capture
> them in detail.
>
> Any ideas what I can do to pep up the performance?

read up about the concept of md recovery, and how it tries to be nice to
application io (in this case, DRBD replication io, but md does not care).
don't overdo it when un-nicing it (tuning up sync_speed_min),
and use the deadline io scheduler.

> My ultimate goal is to use the new stacking feature in DRBD 8.3.0 to put a 3rd
> node at a remote (WAN) location ... anything I should consider for that with
> my configuration as detailed above?

collect some usage statistics for your DRBD usage.
double check whether bandwidth and latency requirements can still be met
when adding the WAN replication link.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

__
please don't Cc me, but send to list -- I'm subscribed
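
A minimal sketch of the md recovery tuning and io scheduler settings
discussed above. The device names (md1, sda, sdb) and the speed limits are
only placeholders based on this thread's example output; pick values that
fit the actual hardware:

    # raise the floor md recovery is allowed to throttle down to while
    # other io (here: DRBD replication) is going on; values are in KB/s
    echo 30000  > /sys/block/md1/md/sync_speed_min
    echo 200000 > /sys/block/md1/md/sync_speed_max

    # check the recovery speed md is currently achieving
    cat /sys/block/md1/md/sync_speed

    # switch the component disks to the deadline io scheduler at runtime
    echo deadline > /sys/block/sda/queue/scheduler
    echo deadline > /sys/block/sdb/queue/scheduler

    # or make it the default via the kernel command line at boot:
    #   elevator=deadline

Setting sync_speed_min too high starves the application io (here, DRBD
replication), which is exactly the "don't overdo it" caveat above.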
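
As a rough sketch of the usage-statistics collection suggested for the
WAN planning, assuming iostat from the sysstat package is available; the
device names are again only examples:

    # DRBD's own per-resource counters (ns: KiB sent over the network,
    # dw: KiB written to the local disk) plus connection state
    cat /proc/drbd

    # sample throughput on the backing devices every 10 seconds;
    # the sustained and peak write rates give a lower bound for the
    # bandwidth the WAN replication link would have to carry
    iostat -dk sda sdb 10

Comparing those write rates, and the latency the workload can tolerate,
against what the WAN link can actually deliver shows whether the
replication requirements can still be met with a stacked third node.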