[DRBD-user] read-balancing: could not measure any performance benefits + alternative proposal

Thu Jan 5 18:22:53 CET 2012

[This is kind of a re-post, as my previous post, a reply
  in another thread, might not have been noticed by most readers.]

I had an opportunity to try out some of the "read-balancing" settings
in a setup that I thought would benefit most from that parameter:
a dedicated, short 10 Gbit/s link between two computers, each of which
uses just one ordinary magnetic SATA disk for the resource.

However, I did not succeed in measuring any significant benefit
from using "read-balancing":

When there is just one reading process, it is no wonder that using
"least-pending", "1M-striping" or the like cannot really be faster
than reading from one disk, as the other disk will have to skip/seek over
the "unread" parts of the file anyway.

But I hoped for some increased performance when running two
independent processes reading two different files. No such luck -
while "iostat" clearly showed how the remote disk was also being
used for reading, the total rate of both processes reading did
not really exceed that of both processes reading from the local disk,
regardless of the balancing mode I chose.

Then I thought: Well, it's explainable if DRBD does not make
any use of knowing _where_ on disk data is asked to be read from:
Both processes issue read requests for different locations, and
if the distribution of those read requests is arbitrary, chances
are that each disk will need to do a lot of seeking.

I don't know whether that's technically really feasible, but DRBD
could do something like:
  "Remember the location of the last read. Now if the next read goes to
   a location next to the end of the last read, issue it to the same
   host (whether it was "near" or "remote"). If the next read goes
   to a far-away location, issue it to "the other" host."
There's certainly room for much more sophisticated methods to
benefit from "read-balancing", but this sounds like one very simple
approach that could at least yield some measurable benefit.
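
To make the idea concrete, here is a minimal sketch in Python. It is a toy
model only, not DRBD code: the class name, the "local"/"remote" labels, the
sector-based offsets and the near_threshold knob are all my own assumptions
for illustration.

```python
class LocalityBalancer:
    """Toy model of the proposed policy (hypothetical, not DRBD code)."""

    def __init__(self, near_threshold=0):
        # Largest gap (in sectors) between the end of the last read and the
        # start of the next one that still counts as "next to" it.
        # Assumed tuning knob, not an existing DRBD option.
        self.near_threshold = near_threshold
        self.last_end = None       # sector just past the end of the last read
        self.last_host = "local"   # host that served the last read

    def pick_host(self, start, length):
        """Return the host ("local" or "remote") that should serve this read."""
        if self.last_end is not None:
            if abs(start - self.last_end) > self.near_threshold:
                # Far-away read: hand it to the other host.
                self.last_host = "remote" if self.last_host == "local" else "local"
        # Remember where this read ends, for the next decision.
        self.last_end = start + length
        return self.last_host


# Two processes reading two different files sequentially, interleaved.
balancer = LocalityBalancer()
hosts = []
for i in range(4):
    hosts.append(balancer.pick_host(i * 8, 8))              # file A
    hosts.append(balancer.pick_host(1_000_000 + i * 8, 8))  # file B
print(hosts)  # file A's reads all go to "local", file B's all to "remote"
```

In this model, each disk ends up seeing one purely sequential stream. With
more than two interleaved streams, this simple toggle would no longer keep
each stream on one host, which is where more sophisticated methods would
have to come in.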

(For the situation where many processes do many small reads from
different locations, the above should not perform much differently
from the "least-pending" approach.)

What do you think?

Regards,

Lutz Vieweg