Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

I've been having significant performance issues with my DRBD storage, and I'm trying to work out what I need to do to fix it. I'm assuming the purchase of new hardware will be part of the solution, but I don't want to do that if it isn't going to help.

I'm running DRBD with the secondary disconnected from 7am to 7pm, so these performance figures are not affected by the network connection to the secondary, nor by the performance of the secondary server itself. The primary is 5 x 480G Intel SSDs in RAID5 using Linux md (no battery-backed storage).

I have been collecting data (every 10 secs) into an RRD from /sys/block/<device>/stat, which includes the reads/writes as well as the backlog and activetime. In particular, during times of high read/write, I see spikes in the corresponding backlog and activetime values.

1) Does a high value for backlog imply that a lower layer is slowing things down? i.e., if my drbd2 device shows a backlog value of 2100 (my peak in the past two hours), does that mean the underlying md device is too slow? At the same time drbd2 had a backlog value of 2100, the underlying drives sd[b-f] had high values of around 70 to 100.

Graphs can be seen at:
http://203.98.89.64/graphs/
You can see backlog/activetime for drbd2 and for each of the SSDs in the RAID5 array.

2) Does it follow that if I improve the performance of the underlying RAID, then DRBD will improve, reducing the backlog and making the users happy?

/proc/mdstat:
md1 : active raid5 sdf1[0] sdc1[4] sdb1[5] sdd1[3] sde1[1]
      1863535104 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 4/4 pages [16KB], 65536KB chunk

There is only a single partition on each drive, and I have left a small part of each drive empty (which is supposed to help performance because of the lack of TRIM). Each SSD has its scheduler set to deadline (which did seem to help a little).

DRBD config:

resource storage2 {
    protocol A;
    device /dev/drbd2 minor 2;
    meta-disk internal;
    on san1 {
        address 172.17.56.1:7802;
        disk /dev/md1;
    }
    on san2 {
        address 172.17.56.2:7802;
        disk /dev/md0;
    }
    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
        max-buffers 16000;
        max-epoch-size 16000;
        unplug-watermark 8192;
        sndbuf-size 512k;
        on-congestion pull-ahead;
    }
    startup {
        wfc-timeout 10;
        degr-wfc-timeout 20;
    }
    syncer {
        verify-alg crc32c;
        rate 20M;
        al-extents 3389;
    }
}

Any other comments on this config which might impact performance?

Basically, I just want to rule out DRBD as the performance-limiting factor and focus on the underlying hardware, but there is no point throwing hardware at it if the problem is actually somewhere else.

Finally, any suggestions on specific hardware which could be added to improve the performance (or software, or config changes)?

Thanks,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au
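
PS: For context on the backlog/activetime figures above: as far as I understand the kernel's block layer documentation, these correspond to the io_ticks and time_in_queue fields of /sys/block/<device>/stat, and both are cumulative millisecond counters. The snippet below is just a minimal illustrative sketch (not my actual collector, and the device list and 10-second interval are examples only) of taking two samples and printing the per-interval deltas that end up in the graphs, plus a rough utilisation percentage derived from io_ticks.

#!/usr/bin/env python
# Sketch: sample io_ticks ("activetime") and time_in_queue ("backlog")
# from /sys/block/<device>/stat and print per-interval deltas.
# Device names and the 10-second interval are examples only.
import time

DEVICES = ["drbd2", "md1", "sdb", "sdc", "sdd", "sde", "sdf"]
INTERVAL = 10  # seconds between the two samples

def read_stat(dev):
    """Return (io_ticks_ms, time_in_queue_ms) for a block device."""
    with open("/sys/block/%s/stat" % dev) as f:
        fields = f.read().split()
    # Per Documentation/block/stat.txt: fields 0-3 are reads, 4-7 writes,
    # 8 in_flight, 9 io_ticks (ms), 10 time_in_queue (ms).
    return int(fields[9]), int(fields[10])

before = {dev: read_stat(dev) for dev in DEVICES}
time.sleep(INTERVAL)
for dev in DEVICES:
    ticks0, queue0 = before[dev]
    ticks1, queue1 = read_stat(dev)
    # io_ticks delta over the wall-clock interval approximates %util,
    # the same figure iostat -x reports.
    util = 100.0 * (ticks1 - ticks0) / (INTERVAL * 1000)
    print("%-6s activetime=%6d ms  backlog=%6d ms  util=%5.1f%%"
          % (dev, ticks1 - ticks0, queue1 - queue0, util))

A large backlog delta on drbd2 with only modest deltas on the member disks is the pattern I'm asking about in question 1, so a sampler like this run against both layers at the same time is how I'm comparing them.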