Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
We are seeing a performance issue when writing to drbd devices on our
cluster. Write throughput averages about 12MB/s, although both the
disks and the network can deliver more than 100MB/s.
The setup in brief:
- two nodes, reasonably powerful hardware
- internal RAID controller with RAID-10 configured
- direct 1Gbit network interlink used only for syncing drbd devices
- Ubuntu 12.04 LTS server, Kernel 3.2.0-53, DRBD 8.3.11
- synchronous replication protocol used for drbd devices
- no fancy drbd tunables set (e.g. "rate")
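For reference, the resource configuration is essentially stock. A minimal
sketch of what it looks like (host names, device paths and the interlink
IP addresses are placeholders, not our exact values):

  resource r0 {
    protocol C;                      # synchronous replication
    # no syncer { rate ...; } or other tuning options set
    on node-a {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd0;   # LVM2 logical volume as backing device
      address   192.168.10.1:7788;   # direct 1Gbit interlink
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/vg0/lv_drbd0;
      address   192.168.10.2:7788;
      meta-disk internal;
    }
  }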
The storage stack on each node looks like this:
- Physical disks -> RAID-10 -> LVM2 logical volume -> drbd device
- We tested the network connection with a simple netcat transfer and
  achieved the expected ~110MB/s throughput of a proper 1Gbit interlink.
- We checked the write throughput of the backend storage on each node
  by dd'ing data directly to a logical volume and achieved write speeds
  of about 250MB/s on average (rough commands below).
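Roughly, those two tests looked like the following (paths, IPs, sizes
and dd flags are only illustrative, and the exact nc options depend on
the netcat variant):

  # network: push data over the interlink with netcat
  node-b$ nc -l -p 12345 > /dev/null
  node-a$ dd if=/dev/zero bs=1M count=10000 | nc 192.168.10.2 12345
  # -> ~110MB/s

  # backend storage: dd directly onto a test logical volume, bypassing drbd
  node-a$ dd if=/dev/zero of=/dev/vg0/lv_test bs=1M count=10000 oflag=direct
  # -> ~250MB/s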
Hence, drbd should perform quite well and should be able to use all the
bandwidth the 1Gbit interlink can deliver. Unfortunately, it does not -
we are stuck at about 12MB/s.
So something along the chain (node A physical disks -> RAID-10 -> LVM2
logical volume -> drbd device -> network interlink -> drbd device ->
LVM2 logical volume -> RAID-10 -> node B physical disks) is slow.
Then we did another test: we disconnected the drbd device from its peer
and dd'ed some data to it. That was unexpectedly slow as well. After the
dd finished (average throughput 12MB/s), we reconnected the drbd device
to its peer and it resynced at 100MB/s over the interlink, using all the
bandwidth there is.
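In rough commands (resource name, device path and dd parameters are
again placeholders):

  node-a# drbdadm disconnect r0    # peer deliberately unreachable, no replication traffic
  node-a# dd if=/dev/zero of=/dev/drbd0 bs=1M count=5000 oflag=direct
  # -> ~12MB/s, although nothing goes over the wire
  node-a# drbdadm connect r0       # subsequent resync runs at ~100MB/s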
TL;DR:
- writing directly to a local LVM2 logical volume: FAST (~250MB/s)
- writing to a local drbd device on top of such a logical volume:
  SLOW (~12MB/s), even while disconnected from its peer
Any ideas what could eat up our performance while writing to a local
drbd device?
Thanks
Matthias