[DRBD-user] Severe disk IO problems

Stephen Marsh stephen at serverforce.net
Fri Aug 30 19:31:58 CEST 2013

Hi all,

I've recently upgraded to DRBD 8.4.3 (protocol C) on CentOS 6.4 (kernel 
3.10.10) with Xen 4.3.0, on hardware RAID10 with a 20 Gbit/s InfiniBand 
replication link.

For a few days now, we've been experiencing a very strange issue: 
seemingly at random, the system becomes almost unresponsive, with 
iowait going to 100% on dom0 and on some (but not all) domUs. Even the 
domUs whose load remains stable become incredibly sluggish. The 
problem occurs even when the resources are in StandAlone mode.
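
For anyone wanting to put numbers on a stall like this, here is a 
minimal sketch (not something from the original setup) that samples 
/proc/stat twice and reports the share of time each CPU spent in 
iowait over the interval. The function names and the 5-second interval 
are my own choices for illustration; it assumes the standard Linux 
/proc/stat field order (user, nice, system, idle, iowait, irq, softirq, 
steal).

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (iowait_ticks, total_ticks)} from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Fields: user nice system idle iowait irq softirq steal.
        # Guest time is already counted in user/nice, so it is left out.
        ticks = [int(value) for value in fields[1:9]]
        if len(ticks) < 5:
            continue  # kernel too old to report iowait
        times[fields[0]] = (ticks[4], sum(ticks))
    return times


def iowait_percent(before_text, after_text):
    """Return {cpu_name: iowait share in percent} between two snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    result = {}
    for cpu, (io_after, total_after) in after.items():
        if cpu not in before:
            continue
        io_before, total_before = before[cpu]
        delta_total = total_after - total_before
        if delta_total <= 0:
            continue  # no ticks elapsed, nothing to report
        result[cpu] = 100.0 * (io_after - io_before) / delta_total
    return result


def read_proc_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    first = read_proc_stat()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    second = read_proc_stat()
    for cpu, pct in sorted(iowait_percent(first, second).items()):
        print(f"{cpu}: {pct:.1f}% iowait")
```

Running it in dom0 and inside an affected domU at the same time would 
show whether the iowait spikes line up across domains.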

Sometimes it self-corrects, but it's becoming more severe and is now 
less likely to go away without a reboot. Earlier today, the system 
running as primary was at a load of 0.02, while the secondary (which 
was doing nothing other than receiving updates from the primary, with 
no domUs running) climbed to a load of 13 and was pretty much dead.

I've tried a variety of tuning options, including enabling 
disable_sendpage, but nothing is making it any better. Nothing is 
printed to the logs.

My next thought is to try downgrading to DRBD 8.3, but considering that 
support for it ends in December, I'd much prefer to continue using 8.4.

I'm very much hoping that someone more experienced than myself will be 
able to offer some words of wisdom. :)

Stephen Marsh
