Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
From: Florian Haas <florian at hastexo.com>
Sent: Mon 05-03-2012 23:59

> On Mon, Mar 5, 2012 at 11:45 PM, Andreas Bauer <ab at voltage.de> wrote:
> > I can share an observation:
> >
> > (Disclaimer: my knowledge of the Linux I/O stack is very limited)
> >
> > Kernel 3.1.0, DRBD 8.3.11, DRBD -> LVM -> MD-RAID1 -> SATA disks
> > (disks use the CFQ scheduler)
> >
> > Issue the command: drbdadm verify all
> > (with the combined sync rate set to exceed disk performance)
> >
> > The system becomes totally unresponsive, up to the point that all
> > processes wait longer than 120s to complete any I/O. In fact, their
> > I/O does not get through until the I/O load from the DRBD verify
> > drops because some volumes have completed their run.
>
> Sorry, but this is a bit like:
>
> "Doctor, I poked a rusty knife into my eye..."
> "Yes?"
> "... and now I have a problem."
> "Well, you already said that."

Nice one. :-)

> If you're telling your system to use a sync/verify rate that you
> _know_ to be higher than what the disk can handle, then kicking off a
> verify (drbdadm verify) or full sync (drbdadm invalidate-remote) will
> badly beat up your I/O stack.
>
> The documentation tells you to use a sync rate that doesn't exceed
> about one third of your available bandwidth. You can also use
> variable-rate synchronization, which should take care of properly
> throttling the syncer rate for you. But by deliberately setting a sync
> rate that exceeds disk bandwidth, you're begging for trouble. Why
> would you want to do this?

Because I want to badly beat up my I/O stack? The point of this
exercise is to reproduce the kernel crash. So, to stay with the image,
the stack should be able to take a beating without dying in the
process.

My point was that with a DRBD verify I can produce a total I/O lockup,
which I was otherwise unable to produce with the CFQ scheduler.

I run my configuration with fixed sync rates at the moment, but that
can hit the same situation when disk performance is dramatically
reduced, for example while a RAID resync runs on the underlying disks.

> The CFQ I/O scheduler is a bad choice for servers too, but that's
> probably the lesser of your concerns right now.

I am aware of the general recommendation not to use CFQ on servers, but
I picked up the information that, for a software RAID 1 on plain
rotating SATA disks, CFQ yields slightly better performance than the
other schedulers. This would not apply to hardware RAID.

regards,
Andreas
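
For reference, the fixed-rate and variable-rate settings Florian refers
to live in the syncer section of the resource configuration in DRBD
8.3. The snippet below is only a sketch: the resource name and all rate
values are placeholders and must be adapted to the actual replication
link and disk bandwidth (the fixed rate should stay around one third of
the available bandwidth).

    resource r0 {                  # "r0" is a placeholder resource name
      syncer {
        rate 33M;                  # fixed rate: roughly 1/3 of available bandwidth

        # Alternatively, variable-rate synchronization (DRBD 8.3.9 and later):
        # c-plan-ahead 20;         # enable the dynamic sync-rate controller
        # c-max-rate   100M;       # upper bound for the sync rate
        # c-min-rate   4M;         # do not throttle below this rate
      }
    }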
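
On the scheduler question: the I/O scheduler can be inspected and
changed per block device at runtime via sysfs. A minimal example,
assuming the backing disk is /dev/sda (adjust the device name as
needed):

    # show the available schedulers; the active one is shown in brackets
    cat /sys/block/sda/queue/scheduler

    # switch this disk to the deadline scheduler (takes effect
    # immediately, but does not persist across reboots)
    echo deadline > /sys/block/sda/queue/scheduler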