[DRBD-user] Drbd and network speed

Lars Ellenberg lars.ellenberg at linbit.com
Wed Sep 16 15:04:08 CEST 2009


On Wed, Sep 16, 2009 at 07:36:08AM -0400, Diego Remolina wrote:
> You will be limited on writes to the speed of your drbd replication link  
> (If using protocol C, which you should if you care about your data).  
> Network teaming, bonding, etc will not work because you are pretty much  
> going from a single IP to a single IP, so there is no benefit in  
> aggregating NICs. If you really want to use the full potential of your  
> backend storage, you need to purchase 10GBit network cards for drbd.
> A very long time ago I ran some benchmarks:
> https://services.ibb.gatech.edu/wiki/index.php/Benchmarks:Storage#Benchmark_Results_3
> If you look at the first result in the table, even if the backend is  
> faster (I got over 200MB/s writes on a non-drbd partition), the drbd  
> partition maxes out for writes at around gigabit speed, 123,738 KB/s.
> I currently have a new set of servers with the ARECA SAS controllers and  
> 24 1TB drives. The backend can write up to ~500MB/s but when I use drbd,  
> the bottleneck is just 120MB/s.

Linux bonding of _two_ NICs in "balance-rr" mode, after some tuning
of the network stack sysctls, should give you about 1.6 to 1.8 x
the throughput of a single link.
For a single TCP connection (as DRBD's bulk data socket is),
bonding more than two NICs will degrade throughput again,
mostly due to packet reordering.
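
A minimal sketch of that two-NIC balance-rr setup (interface names,
addresses and sysctl values below are made-up examples, not taken from
any particular setup):

```shell
# load the bonding driver in round-robin mode (balance-rr)
modprobe bonding mode=balance-rr miimon=100

# enslave the two replication NICs to bond0
ip link set bond0 up
ifenslave bond0 eth1 eth2

# dedicated replication address for DRBD on the bonded link
ip addr add 10.0.0.1/24 dev bond0

# the "network stack sysctls" tuning mentioned above: larger TCP
# buffers help a single bulk connection fill the bonded link
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```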

> I guess the only other configuration that may help speed would be to  
> have a separate NIC per drbd device, if your backend is capable of  
> reading and writing from different locations on disk and feeding  
> several gigabit replication links. I think it should be possible with  
> SAS drives.
> e.g
> /dev/drbd0 uses eth1 for replication
> /dev/drbd1 uses eth2 for replication
> /dev/drbd2 uses eth3 for replication
> ... you get the idea...
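
In drbd.conf terms, that idea just means giving each resource its own
replication address, one per NIC (hostnames, disks and addresses below
are made-up examples; 192.168.1.x would route via eth1, 192.168.2.x
via eth2):

```
resource r0 {
  protocol C;
  on alpha { device /dev/drbd0; disk /dev/sda5; address 192.168.1.1:7788; meta-disk internal; }
  on beta  { device /dev/drbd0; disk /dev/sda5; address 192.168.1.2:7788; meta-disk internal; }
}
resource r1 {
  protocol C;
  on alpha { device /dev/drbd1; disk /dev/sda6; address 192.168.2.1:7789; meta-disk internal; }
  on beta  { device /dev/drbd1; disk /dev/sda6; address 192.168.2.2:7789; meta-disk internal; }
}
```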


or, as suggested, go for 10GBit.

or "supersockets" (Dolphin nics).

or InfiniBand, which can also be used back-to-back (if you don't
need an InfiniBand switch).
Two options there:
  IPoIB (use "connected" mode!),
  or "SDP" (you need drbd 8.3.3 and OFED >= 1.4.2).
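
Switching an IPoIB interface to connected mode is a one-liner via
sysfs (assuming the interface is called ib0; connected mode allows a
much larger MTU, which is what matters for bulk throughput):

```shell
# switch from the default "datagram" mode to "connected" mode
echo connected > /sys/class/net/ib0/mode

# raise the MTU to the connected-mode maximum
ip link set ib0 mtu 65520
```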

but, long story short: DRBD cannot write faster than your bottleneck.

This is how DATA modifications flow in a "water hose" picture of DRBD.
(view with fixed width font, please)

            \    WRITE    /
             \           /
         .---'           '---.
    ,---'     REPLICATION     '---.
   /              / \              \
   \    WRITE    /   \    WRITE    /
    \           /     \           /
     \         /       \         /
      |       |         '''| |'''
      |       |            |N|
      |       |            |E|
      |       |            |T|
      | LOCAL |            | |
      |  DISK |            |L|
      |       |            |I|
      +-------+            |N|
                           | |
                     .-----' '---------.
                     \      WRITE      /
                      \               /
                       \             /
                        |  REMOTE   |
                        |   DISK    |

REPLICATION basically doubles the data (though, of course, DRBD uses
zero copy for that, if technically possible).

Interpretation and implications for throughput should be obvious.
You want the width of all those things as broad as possible.

For the latency aspect, consider the height of the vertical bars.
You want all of them to be as short as possible.

Unfortunately, you sometimes cannot have it both short and wide ;)
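
The picture above boils down to a min() over the pipe widths: with
protocol C, sustained write throughput is bounded by the narrowest of
local disk, replication link, and remote disk. A toy illustration,
reusing the numbers from the quoted benchmark (~500 MB/s backend,
~120 MB/s gigabit link, and ~204 MB/s for two bonded NICs at 1.7x):

```python
def drbd_write_throughput(local_disk_mbs, link_mbs, remote_disk_mbs):
    """Sustained protocol-C write speed: every write must reach both
    the local disk and, over the link, the remote disk, so the
    narrowest segment wins."""
    return min(local_disk_mbs, link_mbs, remote_disk_mbs)

# single gigabit link: the link is the bottleneck
print(drbd_write_throughput(500, 120, 500))  # -> 120

# two bonded NICs in balance-rr at ~1.7x a single link
print(drbd_write_throughput(500, 204, 500))  # -> 204
```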

But you of course knew all of that already.

: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list   --   I'm subscribed
