Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
I'm trying to build an HA cluster here. Each node has 8 2.66GHz CPU cores,
24GB RAM and 8 1TB SATA drives behind an LSI (Fusion MPT) SAS 1068E
controller. The interconnect is one of four 1GE interfaces, connected
directly. The kernel is 2.6.22.18 and DRBD is 8.0.11; the storage device
in question is a 3TB MD RAID5 spread across all 8 drives. The native
results for this device, using ext3 and bonnie for benchmarking, are:
---
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
borg00a 50000M 120486 36 87998 17 535665 44 390.9 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128 74265 90 +++++ +++ 83659 100 71540 88 +++++ +++ 81619 99
---
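(For reproducibility: assuming bonnie++ 1.03 here, a run with matching
parameters would look roughly like the one below; the mount point is only
a placeholder and -f is a guess based on the empty per-character columns.)
---
# illustrative invocation only -- directory and flags are assumptions
bonnie++ -d /mnt/md5test -s 50000 -n 128 -f -u root
---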
The same test done on the resulting (UpToDate) drbd device:
---
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
borg00a 50000M 41801 13 39659 11 413367 37 397.7 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128 78847 95 +++++ +++ 86936 99 78722 95 +++++ +++ 63054 76
---
The acronym WTF surely did cross my lips at this result.
Only 33% of the original write speed?
Using only about 400Mbit/s of the link's capacity and most definitely not
being CPU bound?
And while a 410MB/s read speed is still just fine, how does DRBD manage
to lose about 20% speed in READs???
I probably should have seen this coming when the syncer on the initial
build only managed a bit shy of 50MB/s throughput, even though it was
permitted 160MB/s (I was pondering bonding two interfaces, but that seems
to be a wasted effort now).
The interlink is fine and completely capable of handling the full capacity
one would expect from it. The only tuning done was to set the MTU to 9000,
since the NPtcp (NetPIPE TCP) results showed a slight improvement with
this setting (from 800Mbit/s at a 128KB message size to 840Mbit/s).
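(The jumbo frame setting itself is nothing exotic; on both nodes it is just
along the lines of the following, where eth1 merely stands in for whichever
of the four interfaces carries the interlink.)
---
# interface name is only an example for the replication link
ip link set dev eth1 mtu 9000
---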
An FTP transfer clearly and happily gobbled up all the bandwidth:
---
3020690000 bytes sent in 24.95 secs (118231.8 kB/s)
---
I ran ethstats in parallel to all these tests and its numbers confirmed
that the link utilization matched the test results.
According to the NPtcp results, a throughput of about 400Mbit/s would
equate to a message size of about 16KB. Is DRBD really just sending such
tiny chunks and thus artificially limiting itself? (A net-section tweak to
test this is sketched after the config below.)
DRBD conf:
---
common {
    syncer { rate 160M; al-extents 1801; }
}

resource "data-a" {
    protocol C;
    startup {
        wfc-timeout      0;    ## Infinite!
        degr-wfc-timeout 120;  ## 2 minutes.
    }
    disk {
        on-io-error detach;
        use-bmbv;
    }
    net {
        # sndbuf-size    512k;
        # timeout        60;
        # connect-int    10;
        # ping-int       10;
        max-buffers      2048;
        # max-epoch-size 2048;
    }
    syncer {
    }
    on borg00a {
        device    /dev/drbd0;
        disk      /dev/md5;
        address   10.0.0.1:7789;
        meta-disk internal;
    }
    on borg00b {
        device    /dev/drbd0;
        disk      /dev/md5;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}
---
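(If the small-chunk theory holds, the obvious candidates to experiment with
are the options already sitting commented out in the net section, roughly as
below; the values are untested guesses of mine, not recommendations.)
---
net {
    sndbuf-size    512k;  # larger send buffer, untested guess
    max-buffers    8192;  # untested guess
    max-epoch-size 8192;  # untested guess
}
---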
The test machines (also with a 1GE interlink) on which I tried DRBD before
only had an ATA MD RAID1, which was the limiting factor I/O-wise, so I
never saw this coming.
All the above tests were repeated several times and only one sample each is
shown, as there was no significant variation. The machines were otherwise
totally idle.
What am I missing here? Anything else to tune or look for? I didn't play
with the kernel TCP buffers, since obviously neither NPtcp nor FTP was
slowed down by those defaults.
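(Should the kernel TCP buffers turn out to matter for DRBD's socket after
all, the usual knobs would be something like the following; the values are
merely a starting point I have not verified on these machines.)
---
# untested starting point for TCP buffer sizing
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"
---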
Regards,
Christian
--
Christian Balzer Network/Systems Engineer NOC
chibi at gol.com Global OnLine Japan/Fusion Network Services
http://www.gol.com/