On Tue, 2008-01-08 at 11:29 +0100, Lars Ellenberg wrote:
> On Mon, Jan 07, 2008 at 08:13:29PM +0100, Bernd Petrovitsch wrote:
> > Hi all!
> >
> > I have 2 DRBD clusters - one cluster with Pentium 4 3.2 GHz CPUs and
> > "3ware 7000-series ATA-RAID" controllers with RAID1 over two 75GB SCSI
> > disks each, the other with Xeon 3.2 GHz CPUs and "Adaptec SmartRAID V"
> > controllers with 6 disks (RAID0 over 3 RAID0), both with a
> > 2.6.15-1-em64t-p4-smp kernel from Debian/Sarge backports (admittedly
> > from many months ago).
>
> any software raid or lvm striping involved?

None - it's all done by the hardware (on the two smaller ones we have
/dev/sda and on the larger one /dev/i2o/hda as the only block devices).

> > Typical workload is manipulating many many small files. However the
> > nightly backup job (especially on the 6-disk hosts) stresses the I/O
> > subsystem so much that it renders the rest of the host almost
> > completely unusable and programs run into timeouts on I/O.
> > That kernel uses the "anticipatory" I/O scheduler by default (which
> > seems to be the cause of the starvation).
> > I wonder
> > - if there is any risk involved in changing that (via the kernel
> >   command line and/or via /sys/block/<device>/queue/scheduler) to
> >   "cfq" - presumably the best one, and
>
> go ahead.
>
> I personally prefer "deadline" on servers.
> even though for a busy postgres, we had good results with cfq,
> which "felt" marginally better than deadline there.
> main tunables for deadline appear to be:
> /sys/block/hda/queue/iosched/front_merges:1
>   (can be switched off for raid controllers with good write cache)
> /sys/block/hda/queue/iosched/read_expire:500
>   (ms. tune to 300 maybe)
> /sys/block/hda/queue/iosched/write_expire:5000
>   (ms. tune to 1500 maybe)
> cfq parameters are much less intuitive to tune :)

OK.

> > - if that interferes with DRBD on top, and
>
> if you care about write latency,
> anticipatory is very bad for DRBD.
> even noop should be better.
>
> if you care most about read latency,
> and don't care for write latency at all,
> stay with anticipatory.

Given the current situation where "anticipatory" doesn't work that
well, we very probably (also) care about write latency.

> > - if that actually buys anything, especially avoids the starvation.
>
> depends. tuning the io scheduler can buy you a lot. ;-)

ACK, "tuning" is by far not an exact science ;-)

> depending on how your backup job works, I suggest tuning there, too.

Of course - fiddling with the nice level didn't really help.

> e.g. if there is any streaming pipe involved, like
>     input_streaming | output_streaming
> just insert some "buffer -u 100" (apt-get install buffer; man buffer)
> like so:
>     input_streaming | buffer -u 100 | output_streaming
>
> that should be enough to avoid it starving the rest of the system.

Thanks a lot (also for the hints above), I have to see if that is
possible.

	Bernd
-- 
Firmix Software GmbH                        http://www.firmix.at/
mobil: +43 664 4416156                      fax: +43 1 7890849-55
          Embedded Linux Development and Services
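
PS: for the archives, the runtime switch should look something like
this (untested sketch; sda stands in for the real device - for
/dev/i2o/hda the sysfs directory is presumably called i2o!hda):

    # list the available schedulers; the active one is shown in brackets
    cat /sys/block/sda/queue/scheduler
    # switch at runtime, no reboot needed
    echo cfq > /sys/block/sda/queue/scheduler

To make it the default for all devices at boot time, "elevator=cfq"
(or "elevator=deadline") on the kernel command line should do.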
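
The deadline knobs Lars mentions, with his suggested values (again sda
as a stand-in; the iosched/ directory only exists while deadline is the
active scheduler):

    echo 0    > /sys/block/sda/queue/iosched/front_merges
    echo 300  > /sys/block/sda/queue/iosched/read_expire
    echo 1500 > /sys/block/sda/queue/iosched/write_expire

front_merges is set to 0 here on the assumption that our RAID
controllers' write cache counts as "good" in his sense.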
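
And a buffered backup pipe along the lines of his example (the paths
are made up, of course):

    tar cf - /srv/data | buffer -u 100 | gzip > /backup/data.tar.gz

The "-u 100" makes buffer pause 100 microseconds after each write,
which should throttle the pipe enough that it no longer monopolizes
the disks.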