Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
We are seeing a performance issue when writing to drbd devices on our
cluster. Write throughput averages about 12MB/s, although both the
disks and the network can deliver more than 100MB/s.
The setup in brief:
- two nodes, reasonably powerful hardware
- internal RAID controller with RAID-10 configured
- direct 1Gbit network interlink used only for syncing drbd devices
- Ubuntu 12.04 LTS server, Kernel 3.2.0-53, DRBD 8.3.11
- synchronous replication protocol used for drbd devices
- no fancy drbd tunables set (e.g. "rate")
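For reference, the resource configuration is essentially stock. A minimal
sketch of what it looks like (host names, device paths and the interlink
IP addresses are placeholders, not our exact values):

  resource r0 {
    protocol C;                      # synchronous replication
    # no syncer { rate ...; } or other tuning options set
    on node-a {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd0;   # LVM2 logical volume as backing device
      address   192.168.10.1:7788;   # direct 1Gbit interlink
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd0;
      address   192.168.10.2:7788;
      meta-disk internal;
    }
  }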
The storage stack on each node looks like this:
- Physical disks -> RAID-10 -> LVM2 logical volume -> drbd device
- We tested the network connection with a simple netcat transfer and
  achieved the expected ~110MB/s throughput of a proper 1Gbit interlink.
- We checked the write throughput of the backend storage on each node
  by dd'ing data directly to a logical volume and achieved write speeds
  of about 250MB/s on average (rough commands below).
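Roughly, those two tests looked like the following (paths, IPs, sizes
and dd flags are only illustrative, and the exact nc options depend on
the netcat variant):

  # network: push data over the interlink with netcat
  node-b$ nc -l -p 12345 > /dev/null
  node-a$ dd if=/dev/zero bs=1M count=10000 | nc 192.168.10.2 12345
  # -> ~110MB/s

  # backend storage: dd directly onto a test logical volume, bypassing drbd
  node-a$ dd if=/dev/zero of=/dev/vg0/lv_test bs=1M count=10000 oflag=direct
  # -> ~250MB/s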
Hence, drbd should perform quite well and should be able to use all the
bandwidth the 1Gbit interlink can deliver. Unfortunately, it does not -
we are stuck at about 12MB/s.
So something along the chain (node A physical disks -> RAID-10 -> LVM2
logical volume -> drbd device -> network interlink -> drbd device ->
LVM2 logical volume -> RAID-10 -> node B physical disks) is slow.
Then we did another test: we disconnected the drbd device from its peer
and dd'ed some data to it. That was unexpectedly slow as well. After the
dd finished (average throughput 12MB/s), we reconnected the drbd device
to its peer and it resynced at 100MB/s over the interlink, using all the
bandwidth there is.
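In rough commands (resource name, device path and dd parameters are
again placeholders):

  node-a# drbdadm disconnect r0    # peer deliberately unreachable, no replication traffic
  node-a# dd if=/dev/zero of=/dev/drbd0 bs=1M count=5000 oflag=direct
  # -> ~12MB/s, although nothing goes over the wire
  node-a# drbdadm connect r0       # subsequent resync runs at ~100MB/s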
TL;DR:
- writing directly to a local LVM2 logical volume: FAST (~250MB/s)
- writing to a local drbd device on top of such a logical volume:
  SLOW (~12MB/s), even while disconnected from its peer
Any ideas what could eat up our performance while writing to a local
drbd device?
Thanks
Matthias