Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,

On 23/08/11 00:25, Aj Mirani wrote:
> Hi Cédric,
>
> I was able to replicate your results in my environment. The large block
> 'dd' test saw the biggest improvement in transfer rate when I dropped the
> ib_sdp module's recv_poll to 100 usec.
>
> The small block 'dd' test saw no significant change from modifying
> recv_poll, but did see a marginal improvement from increasing
> sndbuf-size.

Thanks for that feedback!

> I would agree with you that it seems SDP doesn't like smaller blocks (and
> it might just be my ignorance as to how the SDP protocol works). I might
> go digging through the RFC if I can't sort it out, because for our
> application we do almost nothing but small writes.
>
> Although if that's the case, I wonder why Netpipe can get better
> performance over SDP than IP. (See test results for Netpipe at the
> bottom.)

Since I'm completely out of my depth here, I can only express my intuition,
which is that all those results point to some flaw in the SDP implementation
of DRBD. I remember looking at the patch that brought SDP support to DRBD,
and it is very simple (just a matter of instantiating an SDP socket instead
of a TCP one, IIRC). Maybe, in the case of DRBD and its traffic patterns,
some further "intelligence" would be needed. Again, I might be totally wrong.
Let's hope someone knowledgeable stumbles on our messages and sheds some
light on the matter.
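For what it's worth, the two knobs we have been playing with boil down to
something like the sketch below. The values are just the ones from this
thread, and whether recv_poll can be changed on an already-loaded module
(rather than only at modprobe time) may depend on your OFED build, so treat
this as an illustration to verify, not a recipe.

On both nodes, the ib_sdp receive-polling interval (in usec):

  # modprobe ib_sdp recv_poll=100

or, if the parameter turns out to be writable at runtime on your build:

  # echo 100 > /sys/module/ib_sdp/parameters/recv_poll

And the DRBD send buffer, in the 'net' section of the resource in drbd.conf,
re-applied on both nodes afterwards (e.g. with "drbdadm adjust <resource>"):

  net {
    sndbuf-size 10240k;
  }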
Cheers,

Cédric

> Here are my results after making the same changes you did:
>
> =============================================================================
> SDP - after sndbuf-size=10240k;
> Large dd test slightly better but on avg more or less the same
> Small dd test slightly better
> =============================================================================
> IP - after sndbuf-size=10240k;
> Large dd test slightly better but on avg more or less the same
> Small dd test slightly better
> =============================================================================
> SDP - after sndbuf-size=10240k and ib_sdp recv_poll 100;
> Large dd test significant improvement!
> Small dd test no change
>
> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 2147483648 bytes (2.1 GB) copied, 3.28283 s, 654 MB/s  <- excellent! faster than IP.
>
> Here are the previous SDP results if you recall:
> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s
>
> And previous IP results:
> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
> =============================================================================
>
> Here is why I expected the small block dd SDP test to outperform IP:
> =============================================================================
> The throughput of IP over Infiniband was tested using Netpipe:
>
> nodeA# NPtcp
> nodeB# NPtcp -h 10.0.99.108
>
> Send and receive buffers are 16384 and 87380 bytes
> (A bug in Linux doubles the requested buffer sizes)
> Now starting the main loop
>   0:       1 bytes   2912 times -->    0.28 Mbps in    27.64 usec
>   1:       2 bytes   3617 times -->    0.57 Mbps in    26.77 usec
>   2:       3 bytes   3735 times -->    0.83 Mbps in    27.65 usec
>   3:       4 bytes   2411 times -->    1.10 Mbps in    27.71 usec
>   4:       6 bytes   2706 times -->    1.70 Mbps in    26.85 usec
>   5:       8 bytes   1862 times -->    2.27 Mbps in    26.92 usec
> .
> .
> .
> 117: 4194307 bytes      6 times --> 4446.10 Mbps in  7197.32 usec
> 118: 6291453 bytes      6 times --> 5068.46 Mbps in  9470.32 usec
> 119: 6291456 bytes      7 times --> 4873.45 Mbps in  9849.29 usec
> 120: 6291459 bytes      6 times --> 4454.66 Mbps in 10775.25 usec
> 121: 8388605 bytes      3 times --> 4651.95 Mbps in 13757.67 usec
> 122: 8388608 bytes      3 times --> 4816.20 Mbps in 13288.50 usec
> 123: 8388611 bytes      3 times --> 4977.90 Mbps in 12856.84 usec
>
> =============================================================================
> The throughput of SDP over Infiniband was tested using Netpipe:
>
> nodeA# LD_PRELOAD=libsdp.so NPtcp
> nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.99.108
>
> Send and receive buffers are 126976 and 126976 bytes
> (A bug in Linux doubles the requested buffer sizes)
> Now starting the main loop
>   0:       1 bytes  17604 times -->    1.54 Mbps in     4.95 usec
>   1:       2 bytes  20215 times -->    3.11 Mbps in     4.91 usec
>   2:       3 bytes  20380 times -->    4.67 Mbps in     4.90 usec
>   3:       4 bytes  13608 times -->    6.15 Mbps in     4.96 usec
>   4:       6 bytes  15116 times -->    9.25 Mbps in     4.95 usec
>   5:       8 bytes  10100 times -->   12.38 Mbps in     4.93 usec
> .
> .
> .
> 117: 4194307 bytes     15 times --> 9846.87 Mbps in  3249.76 usec
> 118: 6291453 bytes     15 times --> 9745.60 Mbps in  4925.30 usec
> 119: 6291456 bytes     13 times --> 9719.69 Mbps in  4938.43 usec
> 120: 6291459 bytes     13 times --> 9721.99 Mbps in  4937.26 usec
> 121: 8388605 bytes      6 times --> 9714.01 Mbps in  6588.42 usec
> 122: 8388608 bytes      7 times --> 9713.48 Mbps in  6588.78 usec
> 123: 8388611 bytes      7 times --> 9731.95 Mbps in  6576.28 usec
>
> =============================================================================
>
> -aj
>
> On Mon, Aug 22, 2011 at 09:28:27AM +0200, Cédric Dufour - Idiap Research Institute wrote:
>> Hello,
>>
>> Have you seen my post on (quite) the same subject:
>> http://lists.linbit.com/pipermail/drbd-user/2011-July/016598.html ?
>>
>> Based on your experiments and mine, it would seem that SDP does not like
>> "transferring small bits of data" (not being a TCP/SDP guru, I don't know
>> how to put it more appropriately). This would somehow correlate with my
>> finding of needing to increase the 'sndbuf-size' as much as possible. And
>> this also correlates with the fact that the initial sync or a "dd" test
>> with a large block size actually uses SDP very efficiently, while
>> operations involving smaller "data bits" don't.
>>
>> I'm curious whether playing with the 'sndbuf-size' and ib_sdp's
>> 'recv_poll' parameters would affect your setup the same way it did mine.
>>
>> Cheers,
>>
>> Cédric
>>
>> On 19/08/11 21:45, Aj Mirani wrote:
>>> I'm currently testing DRBD over Infiniband/SDP vs Infiniband/IP.
>>>
>>> My configuration is as follows:
>>> DRBD 8.3.11 (Protocol C)
>>> Linux kernel 2.6.39
>>> OFED 1.5.4
>>> Infiniband: Mellanox Technologies MT26428
>>>
>>> My baseline test was to attempt a resync of the secondary node using
>>> Infiniband over IP. I noted the sync rate. Once complete, I performed
>>> some other very rudimentary tests using 'dd' and 'mkfs' to get a sense
>>> of actual performance. Then I shut down DRBD on both primary and
>>> secondary, modified the config to use SDP, and started it back up to
>>> re-try all of the tests.
>>>
>>> original:
>>>     address 10.0.99.108:7790;
>>> to use SDP:
>>>     address sdp 10.0.99.108:7790;
>>>
>>> No other config changes were made.
>>>
>>> After this, I issued "drbdadm invalidate-remote all" on the primary to
>>> force a re-sync. I noted my sync rate almost doubled, which was
>>> excellent.
>>>
>>> Once the sync was complete I re-attempted my other tests. Amazingly,
>>> every test using Infiniband over SDP performed significantly worse than
>>> Infiniband over IP.
>>>
>>> Is there anything that can explain this?
>>>
>>> Here are my actual tests/results for each config:
>>> =============================================================================
>>> Infiniband over IP
>>> =============================================================================
>>> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
>>> 4+0 records in
>>> 4+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
>>>
>>> # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
>>> 100+0 records in
>>> 100+0 records out
>>> 409600 bytes (410 kB) copied, 0.0232504 s, 17.6 MB/s
>>>
>>> # time mkfs.ext4 /dev/drbd0
>>> real    3m54.848s
>>> user    0m4.272s
>>> sys     0m37.758s
>>>
>>> =============================================================================
>>> Infiniband over SDP
>>> =============================================================================
>>> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
>>> 4+0 records in
>>> 4+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s  <--- (2.4x slower)
>>>
>>> # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
>>> 100+0 records in
>>> 100+0 records out
>>> 409600 bytes (410 kB) copied, 19.6418 s, 20.9 kB/s  <--- (844x slower)
>>>
>>> # time mkfs.ext4 /dev/drbd0
>>> real    10m12.337s  <--- (4.25x slower)
>>> user    0m4.336s
>>> sys     0m39.866s
>>>
>>> =============================================================================
>>>
>>> At the same time I've used the Netpipe benchmark to test Infiniband SDP
>>> performance, and it looks good.
>>>
>>> Netpipe benchmark using:
>>> nodeA# LD_PRELOAD=libsdp.so NPtcp
>>> nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.99.108
>>>
>>> It consistently outperforms Infiniband/IP, as I would expect. So this
>>> leaves me thinking that either there is a problem with my DRBD config,
>>> or DRBD is using SDP differently for re-sync vs keeping in sync, or my
>>> testing is flawed.
>>>
>>> Here is what my config looks like:
>>> # drbdsetup /dev/drbd0 show
>>> disk {
>>>     size                    0s _is_default; # bytes
>>>     on-io-error             pass_on _is_default;
>>>     fencing                 dont-care _is_default;
>>>     no-disk-flushes ;
>>>     no-md-flushes   ;
>>>     max-bio-bvecs           0 _is_default;
>>> }
>>> net {
>>>     timeout                 60 _is_default; # 1/10 seconds
>>>     max-epoch-size          8192;
>>>     max-buffers             8192;
>>>     unplug-watermark        16384;
>>>     connect-int             10 _is_default; # seconds
>>>     ping-int                10 _is_default; # seconds
>>>     sndbuf-size             0 _is_default; # bytes
>>>     rcvbuf-size             0 _is_default; # bytes
>>>     ko-count                4;
>>>     after-sb-0pri           disconnect _is_default;
>>>     after-sb-1pri           disconnect _is_default;
>>>     after-sb-2pri           disconnect _is_default;
>>>     rr-conflict             disconnect _is_default;
>>>     ping-timeout            5 _is_default; # 1/10 seconds
>>>     on-congestion           block _is_default;
>>>     congestion-fill         0s _is_default; # byte
>>>     congestion-extents      127 _is_default;
>>> }
>>> syncer {
>>>     rate                    524288k; # bytes/second
>>>     after                   -1 _is_default;
>>>     al-extents              3833;
>>>     cpu-mask                "15";
>>>     on-no-data-accessible   io-error _is_default;
>>>     c-plan-ahead            0 _is_default; # 1/10 seconds
>>>     c-delay-target          10 _is_default; # 1/10 seconds
>>>     c-fill-target           0s _is_default; # bytes
>>>     c-max-rate              102400k _is_default; # bytes/second
>>>     c-min-rate              4096k _is_default; # bytes/second
>>> }
>>> protocol C;
>>> _this_host {
>>>     device                  minor 0;
>>>     disk                    "/dev/sdc1";
>>>     meta-disk               internal;
>>>     address                 sdp 10.0.99.108:7790;
>>> }
>>> _remote_host {
>>>     address                 ipv4 10.0.99.107:7790;
>>> }
>>>
>>> Any insight would be greatly appreciated.
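
PS: if you end up doing another round of tests, one extra data point that
might help interpret the small-block numbers is the same 4k 'dd' against the
backing device itself, to separate what the local disk costs from what the
replication link costs. Only a sketch of the idea: writing straight to
/dev/sdc1 clobbers the DRBD data on that node (and the device is normally
held exclusively while the resource is up), so it would have to be done with
the resource down and that node resynced afterwards, e.g. with "drbdadm
invalidate" on it:

  # dd if=/dev/zero of=/dev/sdc1 bs=4k count=100 oflag=direct

If the raw device lands in the same tens-of-MB/s range you saw over IP, then
the 20.9 kB/s figure over SDP really is almost entirely per-request link
overhead.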