[DRBD-user] High iowait on primary DRBD node with large sustained writes and replication enabled to secondary

Sebastian Riemer sebastian.riemer at profitbricks.com
Tue Jan 8 12:24:21 CET 2013


On 07.01.2013 22:51, Paul Freeman wrote:
> I have come across a few references of similar behaviour on the net but have not come across the solution(s) which appear relevant in my situation.
> 
> Any comments and suggestions would be welcome.

The first thing to do is to run "blktrace". I've already described how to
do this on this list.

The interesting bit is how big your IOs are, both for regular data and
for syncer IO. For this test, issue IOs that are as big as possible
(e.g. 1 MiB) using direct IO.
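
To generate such IOs from a script, something like the following could be
used. It is only a minimal sketch: the function name, the default of 16 IOs
and the fill byte are assumptions made for illustration. It opens the target
with O_DIRECT (Linux only) and writes from a page-aligned buffer, so the page
cache cannot merge or split the requests before they reach the block layer.
Beware that pointing it at a block device overwrites the data there.

```python
import mmap
import os


def write_large_ios(path, io_size=1024 * 1024, count=16, direct=True):
    """Write `count` IOs of `io_size` bytes each; return total bytes written.

    Hypothetical helper for illustration, not part of any DRBD tooling.
    """
    flags = os.O_WRONLY | os.O_CREAT
    if direct:
        flags |= os.O_DIRECT  # Linux only; bypasses the page cache

    # Anonymous mmap memory is page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, io_size)
    buf.write(b"\xa5" * io_size)

    fd = os.open(path, flags, 0o600)
    total = 0
    try:
        for _ in range(count):
            # One write() call per IO of io_size bytes.
            total += os.write(fd, buf)
        os.fsync(fd)
    finally:
        os.close(fd)
        buf.close()
    return total
```

Run it against a scratch file or test volume on top of the DRBD device while
blktrace is recording. The `direct=False` switch is only there for
filesystems that reject O_DIRECT (tmpfs, for example).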

If IO sizes are O.K. (>= 128 KiB), but latency between IOs is extremely
high, then you know that something slows down the IOs.
Otherwise, it is possible that you hit the IO limits bug in DRBD < 8.3.14:

http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=commit;h=3a2911f0c7ab80d061a5a6878c95f6d731c98554

Repeat the blktrace run with the secondary connected and again with it
disconnected.

If you see 4 KiB IOs while connected, then it is an IO limits bug, most
likely the one above.
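
Comparing the two traces by eye gets tedious, so here is a rough sketch of
how the IO sizes could be counted. It assumes the default text output format
of blkparse (device, CPU, sequence, timestamp, PID, action, RWBS, then
"sector + nsectors") and 512-byte sectors. The function name and the choice
to look only at "D" (issued to the driver) write events are assumptions for
illustration.

```python
from collections import Counter

SECTOR_BYTES = 512  # blktrace reports sizes in 512-byte sectors


def io_size_histogram(lines, action="D", write_only=True):
    """Count IOs per size (in KiB) from blkparse default-format text lines."""
    sizes = Counter()
    for line in lines:
        fields = line.split()
        # Expected: dev cpu seq time pid action rwbs sector + nsectors [proc]
        if len(fields) < 10 or fields[8] != "+":
            continue  # summary lines, plug/unplug events, etc.
        if fields[5] != action:
            continue
        if write_only and "W" not in fields[6]:
            continue
        try:
            nsectors = int(fields[9])
        except ValueError:
            continue
        sizes[nsectors * SECTOR_BYTES // 1024] += 1
    return sizes


# Made-up sample lines in the assumed format, not taken from a real trace.
SAMPLE = """\
  8,0    3        1     0.000000000   697  Q   W 223490 + 8 [dd]
  8,0    3        2     0.000010000   697  D   W 223490 + 8 [dd]
  8,0    3        3     0.000020000   697  D   W 223498 + 8 [dd]
  8,0    3        4     0.000030000   697  D   W 300000 + 2048 [dd]
  8,0    3        5     0.000040000   697  D   R 400000 + 8 [dd]
  8,0    3        6     0.000100000     0  C   W 223490 + 8 [0]
"""

if __name__ == "__main__":
    for size_kib, n in sorted(io_size_histogram(SAMPLE.splitlines()).items()):
        print(f"{size_kib:6d} KiB: {n}")
```

Feed it the saved blkparse text of the connected run and of the
disconnected run. A histogram dominated by the 4 KiB bucket in one run but
not the other points at the IO limits.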

The dumb thing is that DRBD has a dynamic IO limits detection which
resets the IO limits to 4 KiB if you lose the connection to the secondary.
This is why people need a caching RAID controller, which merges the IOs
and papers over such crappy behavior, and can't use an HBA with SW RAID.

Also with a really fast transport like QDR InfiniBand (40 Gbit/s) you'll
see that DRBD introduces lots of latency. Can't get more than approx.
250 MiB/s although the storage server can do > 600 MiB/s and the
transport can do 4 GiB/s at < 1 us latency.
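
To put these numbers in relation, here is a small back-of-the-envelope
calculation. It assumes a 4x QDR link with 8b/10b line coding, which is
where the roughly 4 GB/s payload rate comes from. The other figures are
taken from the paragraph above, and the variable names are made up for
illustration.

```python
MIB = 2**20

# 4x QDR InfiniBand: 40 Gbit/s signalling, 8b/10b coding (assumption).
signal_bits_per_s = 40 * 10**9
link_bytes_per_s = signal_bits_per_s * 8 // 10 // 8  # payload bytes per second

drbd_bytes_per_s = 250 * MIB     # observed through DRBD (approx.)
storage_bytes_per_s = 600 * MIB  # what the storage server can do (lower bound)

share_of_storage = drbd_bytes_per_s / storage_bytes_per_s
share_of_link = drbd_bytes_per_s / link_bytes_per_s

if __name__ == "__main__":
    print(f"link payload rate: {link_bytes_per_s / 10**9:.1f} GB/s")
    print(f"DRBD vs. storage:  {share_of_storage:.0%}")
    print(f"DRBD vs. link:     {share_of_link:.1%}")
```

So under these assumptions DRBD delivers well under half of what the
storage can do and only a small fraction of what the link can carry.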

This is why we enhanced MD RAID-1 for replication. MD is much more
stable, gives up to 512 KiB IOs, has a very intelligent write-intent
bitmap and delivers high-performance replication. As transport we use SRP,
which is RDMA only. With replication separated from the transport we can
also do better blktracing. And we have the same latency for both RAID-1
disks because both of them are on SRP remote storage.

Cheers,
Sebastian

-- 
Sebastian Riemer
Linux Kernel Developer - Storage

We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!

ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer at profitbricks.com


