Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello list,

We are currently investigating a performance issue with our DRBD cluster. We recently upgraded the hardware in the cluster (added new and additional RAID cards, SSD caching and a lot of spindles) and are now seeing the following problem: whenever the secondary node is connected, the primary suffers a dramatic performance drop, both in the time it spends in I/O wait and in throughput to the backing devices. Before the hardware upgrade we simply assumed the high I/O wait was caused by our backing devices hitting their limits, but according to my tests that cannot be the reason.

We currently have 4 RAID sets on each cluster node as backing devices, connected to two LSI 9280 RAID cards per node with BBWC, FastPath and the CacheCade option. CacheCade is turned on for these tests; the CacheCade disks are Intel 520 SSDs running in RAID 1, and each controller has an SSD cache like this.

Even with absolutely no load on the cluster we observe the following: when doing a single sequential write on the primary node to one of the RAID sets, the I/O wait of that node rises to about 40-50 % of CPU time, while the secondary sits idle (I/O wait of 0.2 % there). Note that these are 8-core machines, so 50 % I/O wait means 4 cores or 4 threads waiting on the block device all the time. Throughput drops from ~450 MByte/s to ~200 MByte/s compared to the situation where we take down the corresponding DRBD device on the secondary. If the DRBD device runs in StandAlone mode on the primary, the I/O wait is as low as ~10-15 %, which I assume is normal behaviour when a single sequential write hits a block device at maximum rate. (A sketch of the commands we use for this kind of test is further down in this mail.)

We first thought this might be an issue with our SCST configuration, but it also happens when we run the test locally on the cluster node.

The cluster nodes are connected back-to-back with a 10 GbE link. The measured RTT is about 100 us and the measured TCP bandwidth is 9.6 Gbit/s. As I said, the backing device on the secondary is just sitting there bored; the I/O wait on the secondary is about 0.2 %.

We already raised the al-extents parameter, as I read that frequent meta-data updates can cause performance problems in DRBD. This is the current config of one of our DRBD devices:

    resource XXX {
        device    /dev/drbd3;
        disk      /dev/disk/by-id/scsi-3600605b00494046018bf37719f7e1786-part1;
        meta-disk internal;
        net {
            max-buffers      16384;
            unplug-watermark 32;
            max-epoch-size   16384;
            sndbuf-size      1024k;
        }
        syncer {
            rate       200M;
            al-extents 3833;
        }
        disk {
            no-disk-barrier;
            no-disk-flushes;
        }
        on node-a {
            address 10.1.200.1:7792;
        }
        on node-b {
            address 10.1.200.2:7792;
        }
    }

Here is a screenshot showing a single sequential write to the device (the primary is on the left): http://imageshack.us/f/823/drbd.png/

By the way, can anyone explain why the tps is so much higher on the secondary? There is 16 GB of memory in each node, and the device we are talking about is a 12-spindle RAID 6 with SSD caching (read and write). We already tried disabling the SSD cache, but that makes things even worse.

Although the cluster is still responsive most of the time, this is becoming a real issue for us, because it seems to affect all devices in the cluster: the I/O service time of ALL devices rises to about 300 ms when, for example, a storage vMotion runs on one of them. So adding more disks to the cluster does not reasonably improve performance at the moment.
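To make the test easier to reproduce, here is a rough sketch of the kind of commands involved. The dd sizes, the iostat interval and the way the secondary is taken out of the picture are just examples, not the exact invocations from our runs:

    # single sequential write to the DRBD device on the primary
    dd if=/dev/zero of=/dev/drbd3 bs=1M count=20000 oflag=direct

    # in a second terminal on the primary: watch iowait and per-device utilisation
    iostat -x 1

    # comparison run: drop replication for this resource so the primary
    # is StandAlone, repeat the dd, then reconnect
    drbdadm disconnect XXX
    # ... repeat the dd ...
    drbdadm connect XXX

With the peer connected we see the 40-50 % I/O wait and ~200 MByte/s; in StandAlone mode the same write gives ~450 MByte/s at 10-15 % I/O wait.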
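For reference, the link numbers quoted above can be reproduced with ordinary tools; something along these lines (options are just examples):

    # on node-b
    iperf -s

    # on node-a
    ping -c 20 10.1.200.2        # RTT ~100 us
    iperf -c 10.1.200.2 -t 30    # TCP bandwidth ~9.6 Gbit/s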
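And for completeness, the al-extents change was applied roughly like this (the value is the one shown in the config above; how exactly we rolled it out may differ slightly):

    # in drbd.conf: syncer { al-extents 3833; }
    # then apply the changed settings without taking the resource down
    drbdadm adjust XXX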
By the way, this happens on ALL DRBD devices in this cluster, with different RAID sets, different disks and so on.

Thanks to everyone in advance,
regards,
Felix