On Thu, Dec 18, 2008 at 7:24 AM, Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
<div class="Ih2E3d">On Wed, Dec 17, 2008 at 04:17:00PM -0500, Parak wrote:<br>
> > Hi all,
> >
> > I'm currently playing with DRBD (8.2.7) on 20Gb/s Infiniband, and it
> > seems that the sync rate is the limiting speed factor. The local
> > storage on both nodes is identical (SAS array), and has been
> > benchmarked at about 650MB/s (or higher, depending on the benchmark)
> > when writing to the native disk, and at about 550MB/s when writing
> > to it through a disconnected DRBD device. The network link for DRBD
> > is Infiniband as well (IPoIB), which has been benchmarked with
> > netperf at ~800MB/s.
> >
> > The fastest speed that I'm able to get from the DRBD sync with this
> > configuration is ~340MB/s, which limits the speed from my initiator
> > to that as well. Interestingly, I also benchmarked DRBD sync speed
> > over 10GbE, which, despite my repeated attempts to tweak drbd.conf,
> > the MTU, and TCP kernel parameters, produced the same ~340MB/s as
> > the aforementioned IPoIB runs.
> >
> > Here's the drbd.conf:
> >
> > global {
> >     usage-count yes;
> > }
> >
> > common {
> >     syncer {
> >         rate 900M;
> check if
>   cpu-mask 3;
> or cpu-mask 7;
> or cpu-mask f;
> or something like that
> has any effect.

No effect for these.
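For reference, this is roughly how the common syncer section looked while testing those (one mask value at a time, of course):

    syncer {
        rate     900M;
        cpu-mask f;      # also tried 3 and 7; no measurable difference
    }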
> >     }
> > }
> >
> > resource drbd0 {
> >
> >     protocol C;
> >
> >     handlers {
> >     }
> >
> >     startup {
> >         degr-wfc-timeout 30;
> >     }
> >
> >     disk {
> >         on-io-error detach;
> >         fencing     dont-care;
> >         no-disk-flushes;
> >         no-md-flushes;
> >         no-disk-drain;
> >         no-disk-barrier;
> >     }
> >
> >     net {
> >         ko-count 2;
> >         after-sb-1pri discard-secondary;
> >         sndbuf-size 1M;
> you can try sndbuf-size 0; (auto-tuning)

Slightly slower by about 20-30MB/s.
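That test just swapped the value in the net section, roughly:

    net {
        sndbuf-size 0;   # 0 = auto-tuning; ~20-30MB/s slower than 1M here
    }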
> and check whether tweaking
> /proc/sys/net/ipv4/tcp_rmem
> /proc/sys/net/ipv4/tcp_wmem
> /proc/sys/net/core/optmem_max
> /proc/sys/net/core/rmem_max
> /proc/sys/net/core/wmem_max
> and the like has any effect.

These did have a positive effect when first applied, but they were already in place for these tests (per recommendations from the Infiniband vendor and the ixgb readme):

net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_rmem='10000000 10000000 10000000'
net.ipv4.tcp_wmem='10000000 10000000 10000000'
net.ipv4.tcp_mem='10000000 10000000 10000000'
net.core.rmem_max=524287
net.core.wmem_max=524287
net.core.rmem_default=524287
net.core.wmem_default=524287
net.core.optmem_max=524287
net.core.netdev_max_backlog=300000
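For anyone wanting to repeat these: the values can be set at runtime with sysctl -w, or made persistent in /etc/sysctl.conf and loaded with sysctl -p, e.g.:

    sysctl -w net.core.rmem_max=524287    # runtime change, lost on reboot
    sysctl -p                             # re-apply everything in /etc/sysctl.conf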
> check whether the drbd option
> no-tcp-cork;
> has any positive/negative effect.

This one has a negative effect: about 70MB/s slower.
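For anyone repeating these single-option tests: as far as I know, after editing drbd.conf a drbdadm adjust on the resource applies the new settings without a full restart, e.g.:

    vi /etc/drbd.conf        # add or remove no-tcp-cork; in the net section
    drbdadm adjust drbd0     # sync the running resource with the config file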
<div class="Ih2E3d">
> >     }
> >
> >     on srpt1 {
> >         device  /dev/drbd0;
> >         disk    /dev/sdb;
> >         address 10.0.0.2:7789;
> >         flexible-meta-disk internal;
> >     }
> >
> >     on srpt2 {
> >         device  /dev/drbd0;
> >         disk    /dev/sdb;
> >         address 10.0.0.3:7789;
> >         flexible-meta-disk internal;
> >     }
> > }
> >
> > Any advice/thoughts would be highly appreciated; thanks!
> cpu utilization during benchmarks?
> "wait state"?
> memory bandwidth?
> interrupt rate?

The CPU utilization during the sync looks like this for the top tasks (it fluctuates and is typically lower), and is similar on both nodes. I have not seen any iowait:

Cpu(s):  1.2%us, 43.9%sy,  0.0%ni, 13.6%id,  0.0%wa,  0.5%hi, 40.9%si,  0.0%st
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29513 root      16   0     0    0    0 R   69  0.0   7:31.92 drbd0_receiver
   32 root      10  -5     0    0    0 S   39  0.0  44:32.93 kblockd/0
29518 root      -3   0     0    0    0 S   18  0.0   1:55.06 drbd0_asender
21392 root      15   0     0    0    0 S    1  0.0   0:36.02 drbd0_worker

Memory bandwidth I've benchmarked with ramspeed at ~2500-2700MB/s on one node and ~2200MB/s on the other (that node has fewer memory modules and less total memory).

The interrupt rate is ~13500-14000/sec on the primary and ~11500/sec on the secondary during a sync.
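Those interrupt numbers are approximate; a quick way to watch the rate is the "in" column of vmstat:

    vmstat 1    # "in" = interrupts/sec, "cs" = context switches/sec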
> maybe bind or unbind NIC interrupts to cpus?
> /proc/interrupts
> /proc/irq/*/smp_affinity

They are on CPU0 currently, but would it help to move them if that CPU is not being overly taxed? (What I'd try is sketched in the P.S. below.)

Thanks,

-Gennadiy
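P.S. If I do experiment with moving them, my understanding is it comes down to something like the following (the IRQ number is just a placeholder; the real one, and the interrupt's name, come from /proc/interrupts and depend on the IB driver):

    cat /proc/interrupts                 # find the HCA's IRQ number (name varies by driver)
    echo 2 > /proc/irq/123/smp_affinity  # CPU bitmask: 2 = CPU1 only; 123 = the IRQ found above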