[DRBD-user] tuning?

Miles Fidelman mfidelman at meetinghouse.net
Mon Jun 7 04:17:15 CEST 2010


re. previous messages on this topic:

It's absolutely amazing what mounting volumes with "noatime" set will do
to reduce i/o wait times!  It took a while to figure this out, though.
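For anyone hitting the same thing: noatime is set per mount point, either
in /etc/fstab or live via a remount (the device name and filesystem type
below are just placeholders for whatever your DomU actually sees):

```shell
# /etc/fstab - example entry; /dev/xvda1 is a placeholder device
/dev/xvda1  /  ext3  defaults,noatime  0  1

# Or apply to an already-mounted filesystem without rebooting:
mount -o remount,noatime /
```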

>> I wrote:
>>> I've been doing some experimenting to see how far I can push some old
>>> hardware into a virtualized environment - partially to see how much use
>>> I can get out of the hardware, and partially to learn more about the
>>> behavior of, and interactions between, software RAID, LVM, DRBD, and 
>>> Xen.
>>> What I'm finding is that it's really easy to get into a state where one
>>> of my VMs is spending all of its time in i/o wait (95%+).  Other times,
>>> everything behaves fine.
>> Bart Coninckx replied:
>>> Test the low-level storage with bonnie++ by bringing DRBD down first
>>> and running it directly on the RAID6. If it hits below 110 MB/sec,
>>> that is your bottleneck. If it is above, you might want to replace
>>> the sync NICs with a bond. This will give you about 180 MB/sec in
>>> mode 0. Then test with bonnie++ on top of an active DRBD resource.
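For anyone trying this at home, the raw-array bonnie++ run Bart describes
would look something like this (the mount point is a placeholder, and -s
should be roughly twice RAM so the page cache doesn't flatter the numbers):

```shell
# Run bonnie++ against the bare RAID6, with DRBD stopped
# -d: directory on the array under test (placeholder path)
# -s: total file size for the throughput tests (~2x RAM)
# -n 0: skip the small-file creation tests to save time
bonnie++ -d /mnt/raid6-test -s 4g -n 0 -u nobody
```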
>> and Michael Iverson wrote:
>>> Your read performance is going to be limited by your RAID selection. 
>>> Be prepared to experiment and document the performance of various 
>>> different nodes.
>>> With a 1G interconnect, write performance will be dictated by 
>>> network speed. You'll want jumbo frames at a minimum, and might have 
>>> to mess with buffer sizes. Keep in mind that latency is just as 
>>> important as throughput.
>> <snip>
>>> However, I think you'll need to install a benchmark like iozone, and 
>>> spend a lot of time doing before/after comparisons.
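In case it's useful to others, jumbo frames and bigger socket buffers on
the cross-connect can be set roughly like this (the interface name and
buffer sizes are illustrative guesses, not tuned values, and MTU 9000 has
to be supported at both ends of the link):

```shell
# Jumbo frames on the replication NIC (eth1 is a placeholder)
ip link set dev eth1 mtu 9000

# Raise the kernel's socket buffer ceilings; DRBD's sndbuf-size
# can then be raised in drbd.conf to take advantage of them
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
```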
>> And to summarize the configuration again:
>>> - two machines, 4 disk drives each, two 1G ethernet ports (1 each to 
>>> the
>>> outside world, 1 each as a cross-connect)
>>> - each machine runs Xen 3 on top of Debian Lenny (the basic install)
>>> - very basic Dom0s - just running the hypervisor and i/o (including 
>>> disk
>>> management)
>>> ---- software RAID6 (md)
>>> ---- LVM
>>> ---- DRBD
>>> ---- heartbeat to provide some failure migration
>>> - each Xen VM uses 2 DRBD volumes - one for root, one for swap
>>> - one of the VMs has a third volume, used for backup copies of files
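For checking health at each layer of that stack, the usual read-only
commands are:

```shell
cat /proc/mdstat   # md RAID state and any rebuild/resync progress
lvs                # LVM logical volumes backing the DRBD devices
cat /proc/drbd     # DRBD connection state and sync status
xm list            # running Xen domains and their memory/CPUs (Xen 3)
```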
>> First off, thanks for the suggestions guys!
>> What I've tried so far, which leaves me just a bit confused:
>> TEST 1
>> - machine 1: running a mail server, in a DomU, on DRBD root and swap 
>> volumes, on LVs, on raid6 (md)
>> --- baseline operation, disk wait seems to vary from 0% to about 25% 
>> while running mail
>> --- note: when this was a non-virtualized machine running on a
>> RAID-1 volume, I never saw disk waits
>> - machine 2: just running a Dom0, DRBD is mirroring volumes from 
>> machine 1
>> --- Dom0's root and swap are directly on raid6 md volumes
>> --- installed bonnie++ into Dom0, ran it
>> --- different tests showed a range of speeds from around 50MB/sec to 
>> 80MB/sec (not blindingly fast)
>> TEST2
>> - same as above, but TURNED OFF DRBD on machine 2
>> -- some improvement, but not a lot - one test went from 80MB/sec to 
>> 90MB/sec
>> TEST3
>> - turned DRBD back on on machine 2
>> - added a domU to machine 2
>> - ran bonnie++ inside the domU
>> -- reported test speeds dropped to 23M/sec to 54M/sec, depending on 
>> the test
>> -- I saw up to 30MB/sec of traffic on the cross-connect ethernet 
>> (vnstat) - nothing approaching the 1G theoretical limit
>> TEST4
>> - started a 2nd domU on machine2
>> - re-ran the test (inside the other domU)
>> - reported speeds dropped marginally (20M - 50M)
>> TEST5
>> - moved to machine 1 (the one running the mail server), left one domU 
>> running on the other machine
>> - while mail server was running in domU; ran bonnie++ in dom0
>> -- reported speeds from 31M to 44M
>> -- interestingly, saw nothing above 1MB/sec on the cross-connect, 
>> even though dom0 has priority
>> TEST6
>> - again, on the mail server machine
>> - started a 2nd domU, ran bonnie++ in the 2nd domU
>> --- reported speeds of 23M up to 72M; up to 30M/sec on the cross-connect
>> --- what was noticeable was that the mail server's i/o wait time 
>> (top) moved up from 5-25% to more like 25-50%
>> TEST7
>> - as above, but ran bonnie++ in the same domU as the mail server
>> - reported speeds dropped to 34M-60M depending on the test
>> - most noticeable: started seeing i/o wait time pushing up to 90%, 
>> highest during the "writing intelligently" and "reading 
>> intelligently" tests
>> - when running basic mail and list service, the domU runs at about 
>> 25% i/o wait as reported by top
>> - when I start a tar job, i/o wait jumps up to the 70-90% range
>> - i/o wait seems to drop just slightly if the tar job is reading from 
>> one DRBD volume and writing to another (somewhat counterintuitive as 
>> it would seem that there's more complexity involved)
>> Overall, I'm really not sure what to make of this.  It seems like:
>> - there's a 40-50% drop in disk throughput when I add LVM, DRBD, and 
>> a domU on top of raid6
>> - the network is never particularly loaded
>> - lots of disk i/o pushes a lot of cpu cycles into i/o wait - BUT... 
>> it's not clear what's going on during those wait cycles
>> I'm starting to wonder if this is more a function of the hypervisor 
>> and/or memory/caching issues than the underlying disk stack.  Any 
>> reactions, thoughts, diagnostic suggestions?
>> Thanks again,
>> Miles Fidelman
>> -- 
>> In theory, there is no difference between theory and practice.
>> In practice, there is.   ... Yogi Berra
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
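For others chasing the same question (what is the CPU actually waiting
on during those i/o wait cycles?), the standard starting points are
per-device and system-wide i/o statistics (iostat is in Debian's
sysstat package):

```shell
# Extended per-device stats every 5 seconds: watch %util and await
iostat -x 5

# System-wide view: 'b' = processes blocked on i/o, 'wa' = i/o wait %
vmstat 5
```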

In theory, there is no difference between theory and practice.
In practice, there is.   ... Yogi Berra
