[DRBD-user] BUG: High I/O Wait and reduced throughput due to wrong I/O size on the secondary

Lars Ellenberg lars.ellenberg at linbit.com
Mon Mar 18 09:51:31 CET 2013


On Sat, Mar 16, 2013 at 07:44:45PM +0100, Felix Zachlod wrote:
> I have to answer myself again.
> I tried reverting to drbd 8.3.11 on my test setup.
> This again led to some interesting observations.
> When running 8.3.11 on both my virtual machines, the average request size
> drops, but it is nowhere near as low as 4k.
> This seems correct: according to
> http://www.drbd.org/users-guide/re-drbdmeta.html
> the maximum block I/O size of drbd is 128k in 8.3.11 and 1M in 8.4.3, which
> is exactly what I observe on my test setup!
> It is just NOT working like that on my production cluster.
> I am now fairly sure this is a bug, possibly related to some
> incompatibility with the RAID card driver?!

Did you tell us which driver that would be?
What does your device stack look like?

> But... why does drbd correctly issue larger BIOs on the primary and wrongly
> issue smaller BIOs on the secondary ... while the RAID cards running here
> are the same!
> I tried a different thing too: I updated my production secondary to 8.4.3 to
> see if it is the same with 8.4.3 over there. What I can say now is that
> neither 8.3.11 nor 8.4.3 issues BIOs larger than 4k on my production
> secondary, which leads to the conclusion that this problem still persists
> with the current version 8.4.3.

DRBD does no request merging.

If the IO comes from the page (or buffer) cache, it is simply 4k.
(That covers all normal file IO, unless direct IO is used.)
Those are the requests that DRBD sees in its make_request function.
That's just the way it is.
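
To see the difference yourself, a quick sketch (using scratch files rather
than a real DRBD device; on a DRBD-backed filesystem you would watch the
request sizes with iostat while these run):

```shell
# Illustration: buffered writes pass through the page cache and reach the
# block layer (and hence DRBD's make_request) as 4k BIOs; O_DIRECT submits
# the caller's IO size, up to the device's max BIO size.
buf=$(mktemp); direct=$(mktemp)

# Buffered: 4 MiB written via the page cache, flushed with fsync.
dd if=/dev/zero of="$buf" bs=1M count=4 conv=fsync 2>/dev/null
stat -c %s "$buf"    # 4194304 bytes written

# Direct: bypasses the page cache (not every filesystem supports O_DIRECT).
dd if=/dev/zero of="$direct" bs=1M count=4 oflag=direct 2>/dev/null \
    || echo "O_DIRECT not supported on this filesystem"

rm -f "$buf" "$direct"
```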

It is the lower level device's IO scheduler ("elevator")
that does the aggregation/merging of these requests
before submitting to the backend "real" device.
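
A sketch of how to inspect that lower-level queue via sysfs (on a real
setup, set dev to the backing disk under DRBD on the secondary, e.g. sdb;
here we just pick the first device sysfs exposes):

```shell
# The scheduler attribute lists the available elevators with the active one
# in brackets, e.g. "noop deadline [cfq]"; the *_sectors_kb attributes bound
# how large a merged request can get.
dev=$(ls /sys/block | head -n1)
for f in scheduler max_sectors_kb max_hw_sectors_kb; do
    attr="/sys/block/$dev/queue/$f"
    [ -r "$attr" ] && echo "$f: $(cat "$attr")"
done
```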

> What I tried additionally is adding another block layer between drbd and the
> raw disks on my test setup- I accomplished this by using lvm2. This lead to
> another interesting observation:
> When running drbd above lvm on my test setup I can see that the lvm device
> mapper is being hit with 4k i/o as I can observe on my production cluster.

Because, as I assume, DRBD is being hit with exactly those 4k IOs from the
page cache, and just passes them along.

> BUT lvm2 itself

No, I don't think the device mapper does any such thing.
Well, "request based" device mapper targets will do that.
Not sure if you actually use those, though. Do you?
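
One way to check which device-mapper targets are in use (a sketch; dmsetup
may not be installed, and listing tables can require root):

```shell
# "dmsetup table" prints one line per mapping: name, start, length, target
# type, args.  Request-based targets such as "multipath" can merge requests;
# plain LVM logical volumes use the bio-based "linear" target, which cannot.
dmsetup table 2>/dev/null \
    || echo "dmsetup unavailable (not installed or insufficient permissions)"
```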

> is aggregating the writes into larger BIOs toward the lower-level
> RAID disk; this leads to BIO sizes in between those of the two setups.

But obviously *something* is different now,
which allows the lower level IO scheduler to merge things.

> I thank everyone for his/her interest and hope this is of interest to the
> DRBD crew too.
> Feel free to add information about your setup or to get in touch for
> additional information.

: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
