On Mon, Oct 4, 2010 at 11:13 AM, Bart Coninckx <bart.coninckx@telenet.be> wrote:
> JR,
>
> thank you for this very elaborate and technically rich reply. I will
> certainly look into your suggestions about using Broadcom cards. I have one
> dual-port Broadcom card in this server, but I was using one of its ports
> combined with one port on an Intel e1000 dual-port NIC in balance-rr, to
> provide backup in the event a NIC goes down. Dual-port NICs usually share
> one chip between both ports, so a problem with that chip would take the
> complete DRBD link down. Reality shows this might be a bad idea though: a
> bonnie++ test against the backend storage (RAID5 on 15K rpm disks) gives me
> 255 MB/sec write performance, while the same test on the DRBD device drops
> to 77 MB/sec, even with the MTU set to 9000. It would be nice to get as
> close as possible to the theoretical maximum, so a lot needs to be done to
> get there.
> Step 1 would be changing everything to the Broadcom NIC. Any other
> suggestions?

77 MB/sec is low for a single GigE link if your backing store can do 250
MB/sec. I think you should test on your hardware with a single GigE link--no
bonding--and work on getting close to the 110-120 MB/sec range before
pursuing bonding optimization. Did you go through
http://www.drbd.org/users-guide-emb/p-performance.html ?
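
One way to narrow down where the throughput goes (just a sketch; the peer
address and device names below are placeholders for your setup) is to measure
the raw link and the DRBD write path separately, with bonding out of the
picture:

  # Raw TCP throughput over a single NIC, no bonding:
  #   on the peer:   iperf -s
  #   on this node:  iperf -c <peer-ip> -t 30

  # Raw write throughput through DRBD, bypassing the page cache
  # (destructive -- only run this against a scratch/test resource):
  dd if=/dev/zero of=/dev/drbd0 bs=1M count=4096 oflag=direct

If iperf reports something like 940 Mbit/sec on the bare link, the link itself
is healthy and the remaining gap between 77 MB/sec and the 110-120 MB/sec
ceiling is in the DRBD/TCP path rather than in the NICs themselves.
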
I use the following network sysctl tuning:

  # Tune TCP and network parameters
  net.ipv4.tcp_rmem = 4096 87380 16777216
  net.ipv4.tcp_wmem = 4096 65536 16777216
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216
  vm.min_free_kbytes = 65536
  net.ipv4.tcp_max_syn_backlog = 8192
  net.core.netdev_max_backlog = 25000
  net.ipv4.tcp_no_metrics_save = 1
  net.ipv4.route.flush = 1

This gives me TCP windows of up to 16 MB and a considerable backlog to
tolerate latency at high throughput. It is tuned for 40 Gbit IPoIB; you could
reduce some of these numbers for slower connections.
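
For context, the socket buffers only need to cover the bandwidth-delay product
of the replication link, so 16 MB is deliberately generous headroom (rough,
illustrative numbers, not measured values):

  # BDP = bandwidth x round-trip time
  #   GigE:         ~125 MB/sec * 1 ms RTT  ~= 125 KB
  #   40 Gbit IPoIB: ~5 GB/sec  * 1 ms RTT  ~= 5 MB

To try the settings, put them in /etc/sysctl.conf and load them with:

  sysctl -p /etc/sysctl.conf
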
Anyway, what NICs are you using? Older interrupt-based NICs like the
e1000/e1000e (older Intel) and tg3 (older Broadcom) will not perform as well
as newer RDMA-capable hardware, but they should still be well above 77 MB/sec.
Does your RAID controller have a power-backed write cache? Have you tried
RAID10?
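
If you stay on the e1000/tg3 hardware for now, it is also worth checking what
the driver is doing on the replication interface (a sketch; replace eth1 with
whatever NIC carries the DRBD traffic):

  ethtool -k eth1              # offload settings (TSO, GSO, checksumming)
  ethtool -c eth1              # interrupt coalescing parameters
  ethtool -S eth1              # driver statistics, look for errors/drops
  grep eth1 /proc/interrupts   # which CPU is taking the NIC interrupts

None of this replaces better hardware, but it can show whether the 77 MB/sec
is an interrupt/CPU bottleneck or something in the DRBD configuration itself.
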
-JR