[DRBD-user] Performance regression with DRBD 8.3.12 and newer

Matthias Hensler lists-drbd at wspse.de
Mon Jun 11 18:35:18 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi all,

it seems to me that there is some kind of write performance regression
that started with DRBD 8.3.12, and is still present in 8.3.13 and 8.4.1.

First some details about the testing hardware:
1. two identical servers with an Intel Server S2600CP mainboard
2. each server has 2 physical Xeon E5-2620 CPUs and 64 GB Ram
3. the network connection is a patch cable between the on-board
   Intel dual-port I350 Gigabit network cards (so there is no active
   hardware/switch between the two nodes)
4. the servers' operating system is on a separate SSD; for the DRBD
   performance tests I used a dedicated hard disk, and there is no other
   load (I/O, CPU or network) on the servers while testing
5. the hard disk is a 3 TB Seagate ST3000DM001-9YN166 (firmware CC4H)
   connected to a SATA III port on the mainboard.
6. OS is a fully patched CentOS 6.2 with DRBD 8.3 packages from elrepo.

I started with the current DRBD 8.3.13, trying to optimize the setup for
later production use. On the hard disk of each node I created a 3 TB
LVM partition (using gdisk with a GPT partition table), then created a
volume group and two separate logical volumes in it (one for use
without DRBD and one for use with DRBD, to measure the difference).

Both logical volumes have a size of 50 GB. One of them was set up as a
DRBD block device with more or less default values (protocol C,
al-extents 3389 and sndbuf-size 0 (auto)). Both the plain volume and the
DRBD block device were formatted with ext4 using default options and
then mounted.
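
For reference, the resource definition looked roughly like the sketch
below (the resource name, hostnames, device paths and IP addresses are
only placeholders here, not my real values):

  resource r0 {
    protocol C;
    syncer {
      al-extents 3389;
    }
    net {
      sndbuf-size 0;      # 0 = auto-sized send buffer
    }
    on node1 {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }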

I started running bonnie++ 1.96 with different parameters on both
filesystems. Most of my later tests were made with
 "bonnie++ -u root -d /mnt/drbd/ -r 8192 -b -n 0"
and I looked at the sequential block output results there.

Writing on the plain (non-DRBD) volume I see a block-write rate of around
160-180 MByte/s (which is to be expected for this disk).

I then checked the raw TCP network performance with iperf and saw a rate
of 992 MBit/s.
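
The iperf run was nothing special, roughly this (the IP is a
placeholder):

  iperf -s                    # on the secondary node
  iperf -c 10.0.0.2 -t 60     # on the primary node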

I then exported the filesystem on the non-DRBD volume via NFS and ran
bonnie++ on the NFS mount, reaching between 100 and 105 MByte/s. That is
without any further tuning and a reasonable value for a Gigabit network.
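
The NFS test was done without special options, along these lines (IPs
and mount points are placeholders):

  exportfs -o rw,no_root_squash 10.0.0.2:/mnt/plain   # on the exporting node
  mount -t nfs 10.0.0.1:/mnt/plain /mnt/nfs           # on the other node
  bonnie++ -u root -d /mnt/nfs/ -r 8192 -b -n 0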

Then I mounted the DRBD volume and ran bonnie++ on it. In several runs I
only got values between 46 and 52 MByte/s, which is only half of the
expected rate.

I spent a lot of time tuning the following parameters (the corresponding
commands and config snippets are sketched after the list):
1. setting MTU from 1500 to 9000 on both sides
2. switched from CFQ to Deadline scheduler
3. increased net.ipv4.tcp_*mem values to 131072 131072 10485760
4. tried several settings for the deadline scheduler (disabled
   front_merges, changed read_expire/write_expire, etc.)
5. set max-buffers, max-epoch-size and unplug-watermark to 8000
6. mounted the filesystem with barrier=0
7. tried bonnie++ with the secondary DRBD node disconnected
8. reformatted the filesystem with ext2 (this is the only case where I
   saw an actually different value: the write performance dropped to 38
   MByte/s)
9. added "no-tcp-cork" to DRBD's config
10. tried "use-bmbv" in DRBD
11. switched from protocol C to protocol A
12. finally removed the LVM layer by recreating the partition table on
    both sides using parted (still a GPT table) and using a physical
    partition directly for DRBD (again 50 GB)
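
For the record, the tuning above boils down to roughly the following
(interface and device names are placeholders, the DRBD options are in
8.3 syntax):

  # 1-4: jumbo frames, deadline scheduler, TCP memory
  ip link set eth1 mtu 9000
  echo deadline > /sys/block/sdb/queue/scheduler
  sysctl -w net.ipv4.tcp_mem="131072 131072 10485760"
  sysctl -w net.ipv4.tcp_rmem="131072 131072 10485760"
  sysctl -w net.ipv4.tcp_wmem="131072 131072 10485760"

  # 5 and 9: additions to the net section of the DRBD resource
  net {
    max-buffers      8000;
    max-epoch-size   8000;
    unplug-watermark 8000;
    no-tcp-cork;
  }

  # 6: mount without write barriers
  mount -o barrier=0 /dev/drbd0 /mnt/drbd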

Not one of these tweaks changed my write performance, which left me
puzzled. I also removed the filesystem from the equation, running "dd"
on the DRBD block device with several block sizes and different oflag
values (direct, fsync, etc.), but that also did not bring the write
performance above 50 MByte/s.
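
A typical dd run against the DRBD device looked like this (block size,
count and oflag varied between runs):

  dd if=/dev/zero of=/dev/drbd0 bs=1M count=4096 oflag=direct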

Finally I thought about trying DRBD 8.4.1, but decided to first try a
different 8.3 version that would not change the metadata format. I
therefore downloaded the oldest RHEL6 package (version 8.3.9) and redid
my tests with an ext4-formatted DRBD device.
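
The downgrade itself was just an install of the older elrepo RPMs,
roughly like this (take the exact package names/versions only as a
sketch of what I did):

  rpm -Uvh --oldpackage drbd83-utils-8.3.9-*.el6.elrepo.x86_64.rpm \
                        kmod-drbd83-8.3.9-*.el6.elrepo.x86_64.rpm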

The result was actually different: just by downgrading from 8.3.13 to
8.3.9 the write performance doubled and reached 100 MByte/s. That is the
value I expected in the first place and could never achieve with
8.3.13.

I then updated to 8.3.10 and then to 8.3.11. The results were still good,
reaching between 100 and 105 MByte/s.

Finally I installed DRBD 8.3.12, and the low write performance was back
(54 MByte/s). I cross-checked with DRBD 8.4.1, but the performance was
still low (50 MByte/s).

As a last check I mixed the versions:
1. 8.3.12 on the primary side and 8.3.11 on the secondary side gave me a
   throughput of 53 MByte/s (still low)
2. 8.3.11 on the primary and 8.3.12 on the secondary side resulted in a
   new value: 70 MByte/s (that is slightly higher than anything I could
   achieve with 8.3.12 on both sides, but still far lower than it should
   be).


In conclusion: all versions up to and including 8.3.11 give me the good
write performance I expect. Starting with 8.3.12 something broke,
leaving me with only half of the possible write throughput.

I should mention that the read performance was never affected and was
always high.

I checked the changelog for 8.3.12, but nothing obvious struck me. I
also diffed the source trees 8.3.11 -> 8.3.12 and did not find anything
obvious.

So I want to ask here: is anyone else seeing this problem, or does
anyone have a clue what is going on? As far as I can see I have already
turned every knob that should have an effect, but nothing gives me good
write performance with 8.3.12 and later.

Maybe I missed something very obvious, but then I do not understand why
just downgrading the DRBD version doubles the write performance.

So, please advise :) I am happy to provide any additional details that
might be needed.

Regards,
Matthias