Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 08/14/2012 08:26 PM, Dennis Jacobfeuerborn wrote:
> Hi,
> now that I got the re-sync issue sorted by moving to 8.3 instead of 8.4 I'm
> investigating why I'm seeing an i/o load of 20% even when I run a fio
> sequential write test throttled to 5M/s. Once I disconnect the secondary
> node the i/o wait time decreases significantly so it seems the connection
> to the other node is the problem.
> I did a bandwidth test with iperf that shows 940Mbit/s and ping shows a
> latency of consistently 0.3ms so I can't really see an issue here.
> Any ideas what could be going on here?

By I/O load, do you mean, for example, the "%util" number iostat reports for the DRBD device? I've observed similar numbers in my production environment, and I've decided it's a non-issue.

I don't know the details of why that number is high for the DRBD device, but I pretty much ignore it for anything but a single physical spindle, and even then I consider it only a hint. The number is calculated from a counter which is incremented for each millisecond that a block device queue has a non-zero number of requests in it. iostat (and sar, and similar tools) calculate %util by reading this counter, finding the change since the last reading, then dividing that change by the number of ms elapsed since the last reading. Thus if the queue always had something in it, %util is 100%.

The problem is that always having something in the queue doesn't mean the device is saturated. A RAID 0 device, for example, won't reach its full potential until at least as many requests as there are spindles are pending. You can make %util reach 100% on this RAID 0 device by issuing one request after another, but all but one spindle will be idle since there's never more than one thing to do, and the RAID device as a whole isn't saturated. The same is true (perhaps to a lesser extent) even of single physical drives with NCQ enabled, or of SSDs or RAID 5 devices given writes that don't cover whole physical blocks/stripes, etc.

I suspect a similar phenomenon is at work in DRBD. I'd guess (and this is just a guess; I've never examined DRBD internals beyond what's in the manual) that the unusually high %util is due to the activity log [1] or perhaps some other housekeeping function. With a slow trickle of writes, DRBD has time to create and clean a hot extent for each write it receives. So for each block actually written, maybe there are a handful of other writes to the activity log that are housekeeping overhead, which works out to something absurd like 500%, causing your high %util. Once you start giving DRBD more real write requests, however, all these writes can be batched into one activity log transaction, so now that same handful of housekeeping writes works out to a small overhead like 2%, and the unusually high %util vanishes.

[1] http://www.drbd.org/users-guide/s-activity-log.html
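
In case it helps to see the arithmetic, here's a rough Python sketch of how %util falls out of that counter. It reads field 13 of /proc/diskstats (milliseconds the device spent with a non-empty queue) and does the same delta-over-elapsed-time calculation iostat does; the device name "drbd0" and the 5-second interval are just placeholders for your setup:

#!/usr/bin/env python3
# Rough sketch of how iostat derives %util from the kernel's io_ticks
# counter (field 13 of /proc/diskstats: ms spent with requests queued).
import time

DEVICE = "drbd0"     # placeholder device name, adjust as needed
INTERVAL = 5.0       # seconds between samples, like `iostat -x 5`

def read_io_ticks(device):
    """Return the io_ticks counter (ms the device had requests queued)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])   # field 13: time spent doing I/O (ms)
    raise ValueError("device %s not found" % device)

prev = read_io_ticks(DEVICE)
while True:
    time.sleep(INTERVAL)
    cur = read_io_ticks(DEVICE)
    # change in the counter divided by elapsed wall-clock ms -> %util
    util = 100.0 * (cur - prev) / (INTERVAL * 1000.0)
    print("%s util: %.1f%%" % (DEVICE, min(util, 100.0)))
    prev = cur

Run it next to "iostat -x 5" on the same device and the two numbers should track each other; the point is that nothing in this calculation knows whether the queue depth was 1 or 32, only that it was non-zero.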
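
And to put numbers on the activity-log guess above: the write counts below are invented purely for illustration (not measured from DRBD internals), but they show how the same fixed housekeeping cost looks enormous for a trickle of writes and negligible once writes get batched into one AL transaction:

# Back-of-the-envelope illustration only; the per-transaction write count
# is an assumption for the example, not taken from DRBD internals.
AL_WRITES_PER_TRANSACTION = 5.0   # assumed housekeeping writes per AL update

# Slow trickle: every application write gets its own AL transaction.
app_writes = 1
print("trickle: %.0f%% housekeeping overhead"
      % (100.0 * AL_WRITES_PER_TRANSACTION / app_writes))   # 500%

# Heavier load: many application writes share one AL transaction.
app_writes = 256
print("batched: %.0f%% housekeeping overhead"
      % (100.0 * AL_WRITES_PER_TRANSACTION / app_writes))   # ~2%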