[DRBD-user] DRBD over Infiniband (SDP) performance oddity

Cédric Dufour - Idiap Research Institute cedric.dufour at idiap.ch
Mon Aug 22 09:28:27 CEST 2011


Hello,

Have you seen my post on (quite) the same subject:
http://lists.linbit.com/pipermail/drbd-user/2011-July/016598.html ?

Based on your experiments and mine, it would seem that SDP does not like
"transferring small bits of data" (not being a TCP/SDP guru, I don't
know how to put it more appropriately). This would correlate with my
finding that I needed to increase 'sndbuf-size' as much as possible. It
also correlates with the fact that an initial sync or a "dd" test with a
large block size actually uses SDP very efficiently, while operations
involving smaller "data bits" don't.
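
As a rough back-of-the-envelope check, here is a small Python sketch that
turns the dd/mkfs timings quoted below into slowdown factors and an average
time per dd request. The figures are copied from the quoted output; the
function and variable names are made up for illustration.

```python
# Timings copied from the dd/mkfs output quoted below.
# label: (seconds over IP, seconds over SDP, number of dd requests or None)
RUNS = {
    "dd bs=512M count=4": (5.1764, 12.507, 4),
    "dd bs=4k count=100": (0.0232504, 19.6418, 100),
    "mkfs.ext4": (3 * 60 + 54.848, 10 * 60 + 12.337, None),
}


def slowdown(ip_seconds, sdp_seconds):
    """How many times longer the SDP run took than the IP run."""
    return sdp_seconds / ip_seconds


def per_request_ms(seconds, requests):
    """Average wall-clock time per dd request, in milliseconds."""
    return seconds / requests * 1000.0


def report(runs=RUNS):
    """Build one summary line per test."""
    lines = []
    for label, (ip_s, sdp_s, requests) in runs.items():
        line = f"{label}: SDP took {slowdown(ip_s, sdp_s):.1f}x as long"
        if requests:
            line += (f" ({per_request_ms(ip_s, requests):.2f} ms vs "
                     f"{per_request_ms(sdp_s, requests):.2f} ms per request)")
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On those figures the 4k run works out to roughly 196 ms per write over SDP
against about 0.23 ms over IP, which looks more like a fixed wait per
request than a bandwidth limit.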

I'm curious whether playing with the 'sndbuf-size' and ib_sdp's
'recv_poll' parameters would affect your setup the same way it did mine.
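
For reference, a minimal sketch of where these two knobs live, assuming
DRBD 8.3 drbd.conf syntax and the ib_sdp module shipped with OFED. The
values are placeholders to experiment with, not recommendations, and the
modprobe file name is only an example:

```
# drbd.conf, inside the resource definition
net {
    # Placeholder starting point; raise it stepwise and re-test.
    # 0 (the default shown by drbdsetup) lets the kernel auto-tune.
    sndbuf-size 1M;
}

# /etc/modprobe.d/ib_sdp.conf (hypothetical file name)
# Placeholder value; see "modinfo ib_sdp" for the unit and default.
options ib_sdp recv_poll=0
```

The module option only takes effect once ib_sdp is reloaded, and the DRBD
change needs a "drbdadm adjust" (or a restart) on both nodes.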

Cheers,

Cédric

On 19/08/11 21:45, Aj Mirani wrote:
> I'm currently testing DRBD over Infiniband/SDP vs Infiniband/IP.  
>
> My configuration is as follows:
> DRBD 8.3.11 (Protocol C)
> Linux kernel 2.6.39 
> OFED 1.5.4
> Infiniband: Mellanox Technologies MT26428
>
> My baseline test was to resync the secondary node using Infiniband over IP and note the sync rate. Once that completed, I ran some very rudimentary tests using 'dd' and 'mkfs' to get a sense of actual performance.  Then I shut down DRBD on both primary and secondary, modified the config to use SDP, and started it back up to retry all of the tests.
>
> original:
>     address   10.0.99.108:7790 ;
> to use SDP:
>     address   sdp 10.0.99.108:7790 ;
>
> No other config changes were made.
>
> After this, I issued "drbdadm invalidate-remote all" on the primary to force a re-sync.  I noted my sync rate almost doubled, which was excellent.
>
> Once the sync was complete I re-attempted my other tests.  Amazingly, every test using Infiniband over SDP performed significantly worse than Infiniband over IP.
>
> Is there anything that can explain this? 
>
>
> Here are my actual tests/results for each config:
> =============================================================================
> Infiniband over IP
> =============================================================================
> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 2147483648 bytes (2.1 GB) copied, 5.1764 s, 415 MB/s
>
> # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> 100+0 records in
> 100+0 records out
> 409600 bytes (410 kB) copied, 0.0232504 s, 17.6 MB/s
>
> # time mkfs.ext4 /dev/drbd0
> real    3m54.848s
> user    0m4.272s
> sys     0m37.758s
>
>
> =============================================================================
> Infiniband over SDP
> =============================================================================
> # dd if=/dev/zero of=/dev/drbd0 bs=512M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 2147483648 bytes (2.1 GB) copied, 12.507 s, 172 MB/s    <--- (2.4x slower)
>
> # dd if=/dev/zero of=/dev/drbd0 bs=4k count=100 oflag=direct
> 100+0 records in
> 100+0 records out
> 409600 bytes (410 kB) copied, 19.6418 s, 20.9 kB/s      <--- (844x slower)
>
> # time mkfs.ext4 /dev/drbd0
> real    10m12.337s                                      <--- (2.6x slower)
> user    0m4.336s
> sys     0m39.866s
>
>
> =============================================================================
>
> At the same time I've used the netpipe benchmark to test Infiniband SDP performance, and it looks good.  
>
> netpipe benchmark using:
>     nodeA# LD_PRELOAD=libsdp.so NPtcp 
>     nodeB# LD_PRELOAD=libsdp.so  NPtcp -h 10.0.99.108
>
> It consistently outperforms Infiniband/IP, as I would expect.  So this leaves me thinking that either there is a problem with my DRBD config, or DRBD uses SDP differently for re-sync than for keeping in sync, or my testing is flawed.
>
>
> Here is what my config looks like:
> # drbdsetup /dev/drbd0 show
> disk {
>         size                    0s _is_default; # bytes
>         on-io-error             pass_on _is_default;
>         fencing                 dont-care _is_default;
>         no-disk-flushes ;
>         no-md-flushes   ;
>         max-bio-bvecs           0 _is_default;
> }
> net {
>         timeout                 60 _is_default; # 1/10 seconds
>         max-epoch-size          8192;
>         max-buffers             8192;
>         unplug-watermark        16384;
>         connect-int             10 _is_default; # seconds
>         ping-int                10 _is_default; # seconds
>         sndbuf-size             0 _is_default; # bytes
>         rcvbuf-size             0 _is_default; # bytes
>         ko-count                4;
>         after-sb-0pri           disconnect _is_default;
>         after-sb-1pri           disconnect _is_default;
>         after-sb-2pri           disconnect _is_default;
>         rr-conflict             disconnect _is_default;
>         ping-timeout            5 _is_default; # 1/10 seconds
>         on-congestion           block _is_default;
>         congestion-fill         0s _is_default; # byte
>         congestion-extents      127 _is_default;
> }
> syncer {
>         rate                    524288k; # bytes/second
>         after                   -1 _is_default;
>         al-extents              3833;
>         cpu-mask                "15";
>         on-no-data-accessible   io-error _is_default;
>         c-plan-ahead            0 _is_default; # 1/10 seconds
>         c-delay-target          10 _is_default; # 1/10 seconds
>         c-fill-target           0s _is_default; # bytes
>         c-max-rate              102400k _is_default; # bytes/second
>         c-min-rate              4096k _is_default; # bytes/second
> }
> protocol C;
> _this_host {
>         device                  minor 0;
>         disk                    "/dev/sdc1";
>         meta-disk               internal;
>         address                 sdp 10.0.99.108:7790;
> }
> _remote_host {
>         address                 ipv4 10.0.99.107:7790;
> }
>
>
> Any insight would be greatly appreciated.
>
>
