Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 08/14/2012 08:26 PM, Dennis Jacobfeuerborn wrote:
> Hi,
> now that I got the re-sync issue sorted by moving to 8.3 instead of 8.4 I'm
> investigating why I'm seeing an i/o load of 20% even when I run a fio
> sequential write test throttled to 5M/s. Once I disconnect the secondary
> node the i/o wait time decreases significantly so it seems the connection
> to the other node is the problem.
> I did a bandwidth test with iperf that shows 940Mbit/s and ping shows a
> latency of consistently 0.3ms so I can't really see an issue here.
> Any ideas what could be going on here?

By I/O load, do you mean, for example, the "%util" number iostat reports for the DRBD device? I've observed similar numbers in my production environment, and I've decided it's a non-issue.

I don't know the details of why that number is high for the DRBD device, but I pretty much ignore it for anything but a single physical spindle, and even then I consider it only a hint. The number is calculated from a counter which is incremented for each millisecond that a block device queue has a non-zero number of requests in it. iostat (and sar, and similar tools) calculate %util by reading this counter, finding the change since the last reading, then dividing that change by the number of ms elapsed since the last reading. Thus if the queue always had something in it, %util is 100%.

The problem is that always having something in the queue doesn't mean the device is saturated. A RAID 0 device, for example, won't reach its full potential until at least as many requests as there are spindles are pending. You can make %util reach 100% on this RAID 0 device by issuing one request after another, but all but one spindle will be idle since there's never more than one thing to do, and the RAID device as a whole isn't saturated. The same is true (perhaps to a lesser extent) even of single physical drives with NCQ enabled, or of SSDs or RAID 5 devices given writes that don't cover whole physical blocks/stripes, etc.

I suspect a similar phenomenon is at work in DRBD. I'd guess (and this is just a guess; I've never examined DRBD internals beyond what's in the manual) that the unusually high %util is due to the activity log [1] or perhaps some other housekeeping function. With a slow trickle of writes, DRBD has time to create and clean a hot extent for each write it receives. So for each block actually written, maybe there are a handful of other writes to the activity log that are housekeeping overhead, which works out to something absurd like 500%, causing your high %util. Once you start giving DRBD more real write requests, however, all these writes can be batched into one activity log transaction, so now that same handful of housekeeping writes works out to a small overhead like 2%, and the unusually high %util vanishes.

[1] http://www.drbd.org/users-guide/s-activity-log.html
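
In case it helps to see the arithmetic, here's a rough Python sketch of how %util falls out of that counter. It reads field 13 of /proc/diskstats (milliseconds the device spent with a non-empty queue) and does the same delta-over-elapsed-time calculation iostat does; the device name "drbd0" and the 5-second interval are just placeholders for your setup:

#!/usr/bin/env python3
# Rough sketch of how iostat derives %util from the kernel's io_ticks
# counter (field 13 of /proc/diskstats: ms spent with requests queued).
import time

DEVICE = "drbd0"     # placeholder device name, adjust as needed
INTERVAL = 5.0       # seconds between samples, like `iostat -x 5`

def read_io_ticks(device):
    """Return the io_ticks counter (ms the device had requests queued)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])   # field 13: time spent doing I/O (ms)
    raise ValueError("device %s not found" % device)

prev = read_io_ticks(DEVICE)
while True:
    time.sleep(INTERVAL)
    cur = read_io_ticks(DEVICE)
    # change in the counter divided by elapsed wall-clock ms -> %util
    util = 100.0 * (cur - prev) / (INTERVAL * 1000.0)
    print("%s util: %.1f%%" % (DEVICE, min(util, 100.0)))
    prev = cur

Run it next to "iostat -x 5" on the same device and the two numbers should track each other; the point is that nothing in this calculation knows whether the queue depth was 1 or 32, only that it was non-zero.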
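
And to put numbers on the activity-log guess above: the write counts below are invented purely for illustration (not measured from DRBD internals), but they show how the same fixed housekeeping cost looks enormous for a trickle of writes and negligible once writes get batched into one AL transaction:

# Back-of-the-envelope illustration only; the per-transaction write count
# is an assumption for the example, not taken from DRBD internals.
AL_WRITES_PER_TRANSACTION = 5.0   # assumed housekeeping writes per AL update

# Slow trickle: every application write gets its own AL transaction.
app_writes = 1
print("trickle: %.0f%% housekeeping overhead"
      % (100.0 * AL_WRITES_PER_TRANSACTION / app_writes))   # 500%

# Heavier load: many application writes share one AL transaction.
app_writes = 256
print("batched: %.0f%% housekeeping overhead"
      % (100.0 * AL_WRITES_PER_TRANSACTION / app_writes))   # ~2%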