Hi Cédric,

I was able to replicate your results in my environment. The large-block 'dd' test saw the biggest improvement in transfer rate when I dropped the ib_sdp module's recv_poll to 100 usec. The small-block 'dd' test saw no significant change from modifying recv_poll, but did see a marginal improvement from increasing sndbuf-size (a sketch of how both changes can be applied follows the dd results below).

I would agree with you that SDP doesn't seem to like smaller blocks (it might just be my ignorance of how the SDP protocol works). I might go digging through the RFC if I can't sort it out, because our application does almost nothing but small writes. Although if that's the case, I wonder why Netpipe can get better performance over SDP than over IP (see the Netpipe test results at the bottom).

Here are my results after making the same changes you did:

=============================================================================
SDP - after sndbuf-size=10240k;
  Large dd test slightly better, but on average more or less the same
  Small dd test slightly better
=============================================================================
IP - after sndbuf-size=10240k;
  Large dd test slightly better, but on average more or less the same
  Small dd test slightly better
=============================================================================
SDP - after sndbuf-size=10240k and ib_sdp recv_poll 100;
  Large dd test significant improvement!
  Small dd test no change

# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 3.28283 s, 654 MB/s   <- excellent! faster than IP.

Here are the previous SDP results, if you recall:

# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s

And the previous IP results:

# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
=============================================================================
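In case anyone wants to reproduce the tweaks, here is a minimal sketch of how the two changes above can be applied. This assumes recv_poll is set at module load time and uses DRBD 8.3-style config syntax; "r0" is just a placeholder resource name, and the exact mechanism may differ between OFED builds:

# modprobe -r ib_sdp
# modprobe ib_sdp recv_poll=100

(do this with DRBD disconnected, since the module cannot be unloaded while in
use; persistently, an "options ib_sdp recv_poll=100" line under
/etc/modprobe.d/ achieves the same, and some builds also expose
/sys/module/ib_sdp/parameters/recv_poll for runtime changes)

and in the DRBD resource's net section:

net {
    sndbuf-size 10240k;
}

followed by "drbdadm adjust r0" on both nodes to apply the new buffer size.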
Here is why I expected the small-block dd SDP test to outperform IP:

=============================================================================
The throughput of IP over Infiniband was tested using Netpipe:

nodeA# NPtcp
nodeB# NPtcp -h 10.0.99.108

Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes   2912 times -->      0.28 Mbps in      27.64 usec
  1:       2 bytes   3617 times -->      0.57 Mbps in      26.77 usec
  2:       3 bytes   3735 times -->      0.83 Mbps in      27.65 usec
  3:       4 bytes   2411 times -->      1.10 Mbps in      27.71 usec
  4:       6 bytes   2706 times -->      1.70 Mbps in      26.85 usec
  5:       8 bytes   1862 times -->      2.27 Mbps in      26.92 usec
  .
  .
  .
117: 4194307 bytes      6 times -->   4446.10 Mbps in    7197.32 usec
118: 6291453 bytes      6 times -->   5068.46 Mbps in    9470.32 usec
119: 6291456 bytes      7 times -->   4873.45 Mbps in    9849.29 usec
120: 6291459 bytes      6 times -->   4454.66 Mbps in   10775.25 usec
121: 8388605 bytes      3 times -->   4651.95 Mbps in   13757.67 usec
122: 8388608 bytes      3 times -->   4816.20 Mbps in   13288.50 usec
123: 8388611 bytes      3 times -->   4977.90 Mbps in   12856.84 usec
=============================================================================
The throughput of SDP over Infiniband was tested using Netpipe:

nodeA# LD_PRELOAD=libsdp.so NPtcp
nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.99.108

Send and receive buffers are 126976 and 126976 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes  17604 times -->      1.54 Mbps in       4.95 usec
  1:       2 bytes  20215 times -->      3.11 Mbps in       4.91 usec
  2:       3 bytes  20380 times -->      4.67 Mbps in       4.90 usec
  3:       4 bytes  13608 times -->      6.15 Mbps in       4.96 usec
  4:       6 bytes  15116 times -->      9.25 Mbps in       4.95 usec
  5:       8 bytes  10100 times -->     12.38 Mbps in       4.93 usec
  .
  .
  .
117: 4194307 bytes     15 times -->   9846.87 Mbps in    3249.76 usec
118: 6291453 bytes     15 times -->   9745.60 Mbps in    4925.30 usec
119: 6291456 bytes     13 times -->   9719.69 Mbps in    4938.43 usec
120: 6291459 bytes     13 times -->   9721.99 Mbps in    4937.26 usec
121: 8388605 bytes      6 times -->   9714.01 Mbps in    6588.42 usec
122: 8388608 bytes      7 times -->   9713.48 Mbps in    6588.78 usec
123: 8388611 bytes      7 times -->   9731.95 Mbps in    6576.28 usec
=============================================================================

-aj

On Mon, Aug 22, 2011 at 09:28:27AM +0200, Cédric Dufour - Idiap Research Institute wrote:
> Hello,
>
> Have you seen my post on (quite) the same subject:
> http://lists.linbit.com/pipermail/drbd-user/2011-July/016598.html ?
>
> Based on your experiments and mine, it would seem that SDP does not like
> "transferring small bits of data" (not being a TCP/SDP guru, I don't
> know how to put it more appropriately). This would somehow correlate
> with my finding of needing to increase the 'sndbuf-size' as much as
> possible. And this also correlates with the fact that the initial sync or
> a "dd" test with a large block size actually uses SDP very efficiently, while
> operations involving smaller "data bits" don't.
>
> I'm curious whether playing with the 'sndbuf-size' and ib_sdp's
> 'recv_poll' parameters would affect your setup the same way it did mine.
>
> Cheers,
>
> Cédric
>
> On 19/08/11 21:45, Aj Mirani wrote:
> > I'm currently testing DRBD over Infiniband/SDP vs Infiniband/IP.
> >
> > My configuration is as follows:
> > DRBD 8.3.11 (Protocol C)
> > Linux kernel 2.6.39
> > OFED 1.5.4
> > Infiniband: Mellanox Technologies MT26428
> >
> > My baseline test was to attempt a resync of the secondary node using Infiniband over IP. I noted the sync rate. Once complete, I performed some other very rudimentary tests using 'dd' and 'mkfs' to get a sense of actual performance. Then I shut down DRBD on both primary and secondary, modified the config to use SDP, and started it back up to re-try all of the tests.
> >
> > original:
> >     address 10.0.99.108:7790 ;
> > to use SDP:
> >     address sdp 10.0.99.108:7790 ;
> >
> > No other config changes were made.
> >
> > After this, I issued "drbdadm invalidate-remote all" on the primary to force a re-sync. I noted my sync rate almost doubled, which was excellent.
> >
> > Once the sync was complete I re-attempted my other tests.
> > Amazingly, every test using Infiniband over SDP performed significantly worse than Infiniband over IP.
> >
> > Is there anything that can explain this?
> >
> >
> > Here are my actual tests/results for each config:
> > =============================================================================
> > Infiniband over IP
> > =============================================================================
> > # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> > 4+0 records in
> > 4+0 records out
> > 2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
> >
> > # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> > 100+0 records in
> > 100+0 records out
> > 409600 bytes (410 kB) copied, 0.0232504 s, 17.6 MB/s
> >
> > # time mkfs.ext4 /dev/drbd0
> > real    3m54.848s
> > user    0m4.272s
> > sys     0m37.758s
> >
> >
> > =============================================================================
> > Infiniband over SDP
> > =============================================================================
> > # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> > 4+0 records in
> > 4+0 records out
> > 2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s    <--- (2.4x slower)
> >
> > # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> > 100+0 records in
> > 100+0 records out
> > 409600 bytes (410 kB) copied, 19.6418 s, 20.9 kB/s      <--- (844x slower)
> >
> > # time mkfs.ext4 /dev/drbd0
> > real    10m12.337s                                      <--- (4.25x slower)
> > user    0m4.336s
> > sys     0m39.866s
> >
> >
> > =============================================================================
> >
> > At the same time, I've used the Netpipe benchmark to test Infiniband SDP performance, and it looks good.
> >
> > Netpipe benchmark using:
> > nodeA# LD_PRELOAD=libsdp.so NPtcp
> > nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.99.108
> >
> > It consistently outperforms Infiniband/IP, as I would expect. So this leaves me thinking there is either a problem with my DRBD config, or DRBD uses SDP differently for re-sync vs. keeping in sync, or my testing is flawed.
> >
> >
> > Here is what my config looks like:
> > # drbdsetup /dev/drbd0 show
> > disk {
> >     size                   0s _is_default; # bytes
> >     on-io-error            pass_on _is_default;
> >     fencing                dont-care _is_default;
> >     no-disk-flushes ;
> >     no-md-flushes   ;
> >     max-bio-bvecs          0 _is_default;
> > }
> > net {
> >     timeout                60 _is_default; # 1/10 seconds
> >     max-epoch-size         8192;
> >     max-buffers            8192;
> >     unplug-watermark       16384;
> >     connect-int            10 _is_default; # seconds
> >     ping-int               10 _is_default; # seconds
> >     sndbuf-size            0 _is_default; # bytes
> >     rcvbuf-size            0 _is_default; # bytes
> >     ko-count               4;
> >     after-sb-0pri          disconnect _is_default;
> >     after-sb-1pri          disconnect _is_default;
> >     after-sb-2pri          disconnect _is_default;
> >     rr-conflict            disconnect _is_default;
> >     ping-timeout           5 _is_default; # 1/10 seconds
> >     on-congestion          block _is_default;
> >     congestion-fill        0s _is_default; # byte
> >     congestion-extents     127 _is_default;
> > }
> > syncer {
> >     rate                   524288k; # bytes/second
> >     after                  -1 _is_default;
> >     al-extents             3833;
> >     cpu-mask               "15";
> >     on-no-data-accessible  io-error _is_default;
> >     c-plan-ahead           0 _is_default; # 1/10 seconds
> >     c-delay-target         10 _is_default; # 1/10 seconds
> >     c-fill-target          0s _is_default; # bytes
> >     c-max-rate             102400k _is_default; # bytes/second
> >     c-min-rate             4096k _is_default; # bytes/second
> > }
> > protocol C;
> > _this_host {
> >     device                 minor 0;
> >     disk                   "/dev/sdc1";
> >     meta-disk              internal;
> >     address                sdp 10.0.99.108:7790;
> > }
> > _remote_host {
> >     address                ipv4 10.0.99.107:7790;
> > }
> >
> >
> > Any insight would be greatly appreciated.
> >

--
Aj Mirani
Operations Manager, Tucows Inc.
416-535-0123 x1294