[DRBD-user] DRBD over Infiniband (SDP) performance oddity

Aj Mirani aj at tucows.com
Tue Aug 23 00:25:00 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi Cédric,

I was able to replicate your results in my environment. The large-block 'dd' test saw the biggest improvement in transfer rate when I dropped the ib_sdp module's recv_poll to 100 usec.
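
In case anyone else wants to try the recv_poll change, this is roughly how it can be done. Treat it as an untested sketch: the parameter name comes from this thread, but whether it shows up writable under /sys depends on your ib_sdp build, so check before relying on the sysfs route.

    # untested sketch -- assumes the OFED ib_sdp module with a 'recv_poll' parameter
    modprobe -r ib_sdp                    # unload (take the DRBD connection down first)
    modprobe ib_sdp recv_poll=100         # reload with 100 usec receive polling
    # persist across reboots:
    echo "options ib_sdp recv_poll=100" >> /etc/modprobe.d/ib_sdp.conf
    # check the current value (writable via sysfs only if the module exports it that way):
    cat /sys/module/ib_sdp/parameters/recv_poll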

The small-block 'dd' test saw no significant change from modifying recv_poll, but did see a marginal improvement from increasing sndbuf-size.
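
For completeness, the sndbuf-size change is just the net-section tweak from your earlier post. A minimal sketch of the resource snippet (the resource name 'r0' is a placeholder):

    resource r0 {               # 'r0' is a placeholder resource name
      net {
        sndbuf-size 10240k;     # up from the 0 (auto-tune) default shown in the drbdsetup output below
      }
    }

Something like 'drbdadm adjust r0' on both nodes should pick it up on 8.3 (it reconnects with the new net options, as far as I know); a full stop/start of DRBD also works if in doubt.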

I would agree with you that SDP doesn't seem to like smaller blocks (though that may just be my ignorance of how the SDP protocol works). I might go digging through the SDP specification if I can't sort it out, because our application does almost nothing but small writes.

Although if that's the case, I wonder why Netpipe gets better performance over SDP than over IP (see the Netpipe results at the bottom).

Here are my results after making the same changes you did:

=============================================================================
SDP - after sndbuf-size=10240k:
Large dd test: slightly better, but on average more or less the same
Small dd test: slightly better
=============================================================================
IP - after sndbuf-size=10240k:
Large dd test: slightly better, but on average more or less the same
Small dd test: slightly better
=============================================================================
SDP - after sndbuf-size=10240k and ib_sdp recv_poll=100:
Large dd test: significant improvement!
Small dd test: no change

# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 3.28283 s, 654 MB/s  <- excellent! faster than IP.

Here are the previous SDP results if you recall:
# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s 

And previous IP results:
# dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
4+0 records in
4+0 records out
2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
=============================================================================




Here is why I expected the small-block 'dd' test over SDP to outperform IP:
=============================================================================
The throughput of IP over Infiniband was tested using Netpipe:

    nodeA# NPtcp
    nodeB# NPtcp -h 10.0.99.108

Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes   2912 times -->      0.28 Mbps in      27.64 usec
  1:       2 bytes   3617 times -->      0.57 Mbps in      26.77 usec
  2:       3 bytes   3735 times -->      0.83 Mbps in      27.65 usec
  3:       4 bytes   2411 times -->      1.10 Mbps in      27.71 usec
  4:       6 bytes   2706 times -->      1.70 Mbps in      26.85 usec
  5:       8 bytes   1862 times -->      2.27 Mbps in      26.92 usec
  .
  .
  .
117: 4194307 bytes      6 times -->   4446.10 Mbps in    7197.32 usec
118: 6291453 bytes      6 times -->   5068.46 Mbps in    9470.32 usec
119: 6291456 bytes      7 times -->   4873.45 Mbps in    9849.29 usec
120: 6291459 bytes      6 times -->   4454.66 Mbps in   10775.25 usec
121: 8388605 bytes      3 times -->   4651.95 Mbps in   13757.67 usec
122: 8388608 bytes      3 times -->   4816.20 Mbps in   13288.50 usec
123: 8388611 bytes      3 times -->   4977.90 Mbps in   12856.84 usec

=============================================================================
The throughput of SDP over Infiniband was tested using Netpipe:

    nodeA# LD_PRELOAD=libsdp.so NPtcp 
    nodeB# LD_PRELOAD=libsdp.so  NPtcp -h 10.0.99.108

Send and receive buffers are 126976 and 126976 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes  17604 times -->      1.54 Mbps in       4.95 usec
  1:       2 bytes  20215 times -->      3.11 Mbps in       4.91 usec
  2:       3 bytes  20380 times -->      4.67 Mbps in       4.90 usec
  3:       4 bytes  13608 times -->      6.15 Mbps in       4.96 usec
  4:       6 bytes  15116 times -->      9.25 Mbps in       4.95 usec
  5:       8 bytes  10100 times -->     12.38 Mbps in       4.93 usec
  .
  .
  .
117: 4194307 bytes     15 times -->   9846.87 Mbps in    3249.76 usec
118: 6291453 bytes     15 times -->   9745.60 Mbps in    4925.30 usec
119: 6291456 bytes     13 times -->   9719.69 Mbps in    4938.43 usec
120: 6291459 bytes     13 times -->   9721.99 Mbps in    4937.26 usec
121: 8388605 bytes      6 times -->   9714.01 Mbps in    6588.42 usec
122: 8388608 bytes      7 times -->   9713.48 Mbps in    6588.78 usec
123: 8388611 bytes      7 times -->   9731.95 Mbps in    6576.28 usec

=============================================================================





			-aj




On Mon, Aug 22, 2011 at 09:28:27AM +0200, Cédric Dufour - Idiap Research Institute wrote:
> Hello,
> 
> Have you seen my post on (quite) the same subject:
> http://lists.linbit.com/pipermail/drbd-user/2011-July/016598.html ?
> 
> Based on your experiments and mine, it would seem that SDP does not like
> "transferring small bits of data" (not being a TCP/SDP guru, I don't
> know how to put it more appropriately). This would somehow correlate
> with my finding of needing to increase the 'sndbuf-size' as much as
> possible. And this also correlates with the fact that initial sync or
> "dd" test with large block size actually use SDP very efficiently, while
> operations involving smaller "data bits" don't.
> 
> I'm curious whether playing with the 'sndbuf-size' and ib_sdp's
> 'recv_poll' parameters would affect your setup the same way it did mine.
> 
> Cheers,
> 
> Cédric
> 
> On 19/08/11 21:45, Aj Mirani wrote:
> > I'm currently testing DRBD over Infiniband/SDP vs Infiniband/IP.  
> >
> > My configuration is as follows:
> > DRBD 8.3.11 (Protocol C)
> > Linux kernel 2.6.39 
> > OFED 1.5.4
> > Infiniband: Mellanox Technologies MT26428
> >
> > My baseline test was to attempt a resync of the secondary node using Infiniband over IP.  I noted the sync rate. Once complete, I performed some other very rudimentary tests using 'dd' and 'mkfs' to get a sense of actual performance.  Then I shut down DRBD on both primary and secondary, modified the config to use SDP, and started it back up to re-try all of the tests.
> >
> > original:
> >     address   10.0.99.108:7790 ;
> > to use SDP:
> >     address   sdp 10.0.99.108:7790 ;
> >
> > No other config changes were made.
> >
> > After this, I issued "drbdadm invalidate-remote all" on the primary to force a re-sync.  I noted my sync rate almost doubled, which was excellent.
> >
> > Once the sync was complete I re-attempted my other tests.  Amazingly, every test using Infiniband over SDP performed significantly worse than Infiniband over IP.
> >
> > Is there anything that can explain this? 
> >
> >
> > Here are my actual tests/results for each config:
> > =============================================================================
> > Infiniband over IP
> > =============================================================================
> > # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> > 4+0 records in
> > 4+0 records out
> > 2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
> >
> > # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> > 100+0 records in
> > 100+0 records out
> > 409600 bytes (410 kB) copied, 0.0232504 s, 17.6 MB/s
> >
> > # time mkfs.ext4 /dev/drbd0
> > real    3m54.848s
> > user    0m4.272s
> > sys     0m37.758s
> >
> >
> > =============================================================================
> > Infiniband over SDP
> > =============================================================================
> > # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> > 4+0 records in
> > 4+0 records out
> > 2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s    <--- (2.4x slower)
> >
> > # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> > 100+0 records in
> > 100+0 records out
> > 409600 bytes (410 kB) copied, 19.6418 s, 20.9 kB/s      <--- (844x slower)
> >
> > # time mkfs.ext4 /dev/drbd0
> > real    10m12.337s                                      <--- (4.25x slower)
> > user    0m4.336s
> > sys     0m39.866s
> >
> >
> > =============================================================================
> >
> > At the same time I've used the netpipe benchmark to test Infiniband SDP performance, and it looks good.  
> >
> > netpipe benchmark using:
> >     nodeA# LD_PRELOAD=libsdp.so NPtcp 
> >     nodeB# LD_PRELOAD=libsdp.so  NPtcp -h 10.0.99.108
> >
> > It consistently outperforms Infiniband/IP, as I would expect.  So this leaves me thinking that either there is a problem with my DRBD config, or DRBD uses SDP differently for re-sync vs. keeping in sync, or my testing is flawed.
> >
> >
> > Here is what my config looks like:
> > # drbdsetup /dev/drbd0 show
> > disk {
> >         size                    0s _is_default; # bytes
> >         on-io-error             pass_on _is_default;
> >         fencing                 dont-care _is_default;
> >         no-disk-flushes ;
> >         no-md-flushes   ;
> >         max-bio-bvecs           0 _is_default;
> > }
> > net {
> >         timeout                 60 _is_default; # 1/10 seconds
> >         max-epoch-size          8192;
> >         max-buffers             8192;
> >         unplug-watermark        16384;
> >         connect-int             10 _is_default; # seconds
> >         ping-int                10 _is_default; # seconds
> >         sndbuf-size             0 _is_default; # bytes
> >         rcvbuf-size             0 _is_default; # bytes
> >         ko-count                4;
> >         after-sb-0pri           disconnect _is_default;
> >         after-sb-1pri           disconnect _is_default;
> >         after-sb-2pri           disconnect _is_default;
> >         rr-conflict             disconnect _is_default;
> >         ping-timeout            5 _is_default; # 1/10 seconds
> >         on-congestion           block _is_default;
> >         congestion-fill         0s _is_default; # byte
> >         congestion-extents      127 _is_default;
> > }
> > syncer {
> >         rate                    524288k; # bytes/second
> >         after                   -1 _is_default;
> >         al-extents              3833;
> >         cpu-mask                "15";
> >         on-no-data-accessible   io-error _is_default;
> >         c-plan-ahead            0 _is_default; # 1/10 seconds
> >         c-delay-target          10 _is_default; # 1/10 seconds
> >         c-fill-target           0s _is_default; # bytes
> >         c-max-rate              102400k _is_default; # bytes/second
> >         c-min-rate              4096k _is_default; # bytes/second
> > }
> > protocol C;
> > _this_host {
> >         device                  minor 0;
> >         disk                    "/dev/sdc1";
> >         meta-disk               internal;
> >         address                 sdp 10.0.99.108:7790;
> > }
> > _remote_host {
> >         address                 ipv4 10.0.99.107:7790;
> > }
> >
> > Any insight would be greatly appreciated.
> >
> >

-- 
Aj Mirani  
Operations Manager, Tucows Inc.
416-535-0123 x1294


