[DRBD-user] bonding more than two network cards still a bad idea?

Wed Oct 6 19:13:35 CEST 2010

On Monday 04 October 2010 20:45:30 J. Ryan Earl wrote:
> On Mon, Oct 4, 2010 at 11:13 AM, Bart Coninckx <bart.coninckx at telenet.be> wrote:
> > JR,
> > 
> > thank you for this very elaborate and technically rich reply. I will
> > certainly look into your suggestions about using Broadcom cards. I have
> > one dual-port Broadcom card in this server, but I was using one of its
> > ports combined with one port on an Intel e1000 dual-port NIC in
> > balance-rr mode to provide a backup in the event a NIC goes down.
> > Dual-port NICs usually share one chip between the two ports, so in case
> > of a problem with the chip, the complete DRBD would be out.
> > Reality shows this might be a bad idea though: a bonnie++ test against
> > the backend storage (RAID5 on 15K rpm disks) gives me a write
> > performance of 255 MB/sec, while the same test on the DRBD device drops
> > to 77 MB/sec, even with the MTU set to 9000. It would be nice to get as
> > close as possible to the theoretical maximum, so a lot needs to be done
> > to get there.
> > Step 1 would be changing everything to the Broadcom NIC. Any other
> > suggestions?
> 
> 77 MB/sec is low for a single GigE link if your backing store can do
> 250 MB/sec.  I think you should test on your hardware with a single GigE--no
> bonding--and work on getting close to the 110-120 MB/sec range before
> pursuing bonding optimization.  Did you go through:
> http://www.drbd.org/users-guide-emb/p-performance.html ?
> 
> I use the following network sysctl tuning:
> 
> # Tune TCP and network parameters
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> vm.min_free_kbytes = 65536
> net.ipv4.tcp_max_syn_backlog = 8192
> net.core.netdev_max_backlog = 25000
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.route.flush = 1
> 
> This gives me up to 16 MB TCP windows and considerable backlog to tolerate
> latency at high throughput.  It's tuned for 40 Gbit IPoIB; you could
> reduce some of these numbers for slower connections...
> 
> Anyway, what NICs are you using?  Older interrupt-based NICs like the
> e1000/e1000e (older Intel) and tg3 (older Broadcom) will not perform as
> well as the newer RDMA-based hardware, but they should be well above the
> 77MB/sec range.  Does your RAID controller have a power-backed write
> cache?  Have you tried RAID10?
> 
> -JR

JR,

I finished some testing, and there seems to be quite a difference between
what a "dd" test reports and what a bonnie++ test reports.

dd shows me an average of about 180 MB/sec, which I think is decent on two 
bonded gigabit NICs.
bonnie++ shows me 100 MB/sec for block writes on ext3 on DRBD.
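
As a rough sanity check on the 180 MB/sec figure, the best-case TCP payload
rate of Gigabit Ethernet can be estimated from the framing overhead. This is
only a sketch under assumptions that are not from the thread: IPv4 and TCP
headers without options, standard Ethernet framing, and perfect scaling
across bonded links (which balance-rr rarely achieves). The function name is
made up for illustration.

```python
# Rough ceiling for TCP payload throughput over Gigabit Ethernet.
# Assumptions (not from the thread): IPv4 + TCP headers without options,
# standard Ethernet framing overhead, perfect scaling across bonded links.

LINE_RATE_BYTES = 1_000_000_000 / 8   # 1 Gbit/s expressed in bytes per second
FRAME_OVERHEAD = 8 + 14 + 4 + 12      # preamble + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20              # IPv4 + TCP, no options


def tcp_ceiling_mb_per_s(mtu: int, links: int = 1) -> float:
    """Return the best-case TCP payload rate in MB/s (1 MB = 10**6 bytes)."""
    payload = mtu - IP_TCP_HEADERS    # user data carried per frame
    on_wire = mtu + FRAME_OVERHEAD    # bytes the frame occupies on the wire
    return links * LINE_RATE_BYTES * payload / on_wire / 1e6


if __name__ == "__main__":
    for mtu in (1500, 9000):
        for links in (1, 2):
            rate = tcp_ceiling_mb_per_s(mtu, links)
            print(f"MTU {mtu}, {links} link(s): {rate:.1f} MB/s")
    ceiling = tcp_ceiling_mb_per_s(9000, links=2)
    print(f"180 MB/s is {180 / ceiling:.0%} of the two-link ceiling")
```

Under these assumptions a single link tops out just below 120 MB/sec with a
1500-byte MTU, which matches the 110-120 MB/sec range mentioned above, and
two links with jumbo frames give a ceiling a little under 250 MB/sec.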

So my conclusion should probably be that my hardware is performing acceptably
but that my test tool is wrong.
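
The thread does not say how dd was invoked, so the following is only one
possible explanation for the gap, not a diagnosis: a write test that does not
force the data to disk can end up timing the page cache instead of the
device. The sketch below times the same sequential write with and without
fsync. The helper name, the file location and the sizes are all hypothetical.

```python
import os
import tempfile
import time


def write_throughput(path: str, total_bytes: int, block_size: int, sync: bool) -> float:
    """Write total_bytes sequentially and return MB/s (1 MB = 10**6 bytes).

    With sync=True the timing includes os.fsync(), so the data must reach
    the device; with sync=False it may still be sitting in the page cache.
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            chunk = block[: total_bytes - written]  # last chunk may be short
            f.write(chunk)
            written += len(chunk)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical target: a temp file in the current directory. Run this from
    # a file system on the DRBD device to compare the two numbers there.
    fd, path = tempfile.mkstemp(dir=".")
    os.close(fd)
    try:
        for sync in (False, True):
            rate = write_throughput(path, 64 * 1024 * 1024, 1024 * 1024, sync)
            print(f"sync={sync}: {rate:.0f} MB/s")
    finally:
        os.remove(path)
```

If the two numbers differ a lot on the DRBD-backed file system, the buffered
figure is mostly measuring memory, and the synced one is the fairer
comparison with the bonnie++ block-write result.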

B.