[DRBD-user] 300% latency difference between protocol A and B with NVMe

Wido den Hollander wido at denhollander.io
Tue Nov 24 13:25:23 CET 2020



On 24/11/2020 11:21, Joel Colledge wrote:
> Hi Wido,
> 
> These results are not too surprising. Consider the steps involved in a
> protocol C write. Note that tcp_lat is one way latency, so we get:
> 
> Send data to peer: 13.3 us (perhaps more, if qperf was testing with a
> size less than 4K)
> Write on peer: 1s / 32200 == 31.1 us
> Confirmation of write from peer: 13.3 us
> 
> Total: 13.3 us + 31.1 us + 13.3 us == 57.7 us
> 
> IOPS: 1s / 57.7 us == 17300
> 
> DRBD achieved 11000 IOPS, so 63% of the theoretical maximum. So not
> all that far off. I would test latency with qperf for 4K messages too,
> perhaps DRBD is even closer to the maximum.
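
If I follow the same reasoning for protocol A, the write completes once
it has reached the local disk and the local TCP send buffer, so it
should track the ~31 us local write, i.e. roughly 1s / 31.1 us == 32000
IOPS, which lines up with the 30,200 IOPS measured.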

tcp_lat:
     latency   =  18 us
     msg_size  =   4 KB
tcp_lat:
     latency   =  21.9 us
     msg_size  =   8 KB
tcp_lat:
     latency   =  27.3 us
     msg_size  =  16 KB
tcp_lat:
     latency   =  48.4 us
     msg_size  =  32 KB
tcp_lat:
     latency   =  77.4 us
     msg_size  =  64 KB
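
For reference, output like the above can be generated with qperf's -m
option to set the message size (with plain "qperf" running on the peer
first); something like:

$ qperf -m 4k <peer-ip> tcp_lat
$ qperf -m 8k <peer-ip> tcp_lat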

So the total would be roughly 67.1 us instead of 57.7 us.
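That puts the theoretical maximum at about 1s / 67.1 us == 14900 IOPS,
so the measured 11000 IOPS is roughly 74% of it.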

Interesting information! Would RDMA make a huge difference here?

> 
> To improve this you could try disabling LACP, using the disk directly
> instead of in RAID, pinning DRBD and fio threads to the same core,
> adjusting the interrupt affinities... Anything that simplifies the
> process might help a little, but I would be surprised if you get it
> much faster.

I disabled LACP in one of my tests, but that didn't change much. I'll
probably use the second NIC just for heartbeats for Pacemaker. A single
25Gbit link is sufficient for the replication.
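
For the pinning and interrupt-affinity suggestions, I imagine something
roughly along these lines (core, IRQ and thread IDs below are just
placeholders, and I haven't verified the DRBD thread names):

$ taskset -c 2 fio <same arguments as above>    # pin fio to core 2
$ ps -eLo lwp,comm | grep drbd                  # find the DRBD threads
$ taskset -cp 2 <tid>                           # pin a DRBD thread to core 2
$ echo 4 > /proc/irq/<irq>/smp_affinity         # steer the NIC IRQ to core 2 (mask 0x4)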

Without RAID we wouldn't have any redundancy inside the boxes.

I did notice that a single PM983 maxes out at 37k 4K write IOPS, so at a
queue depth of 1 the RAID-10 won't make it much faster. Or do you have
any other suggestions?

Wido

> 
> Best regards,
> Joel
> 
> On Tue, Nov 24, 2020 at 10:46 AM Wido den Hollander
> <wido at denhollander.io> wrote:
>>
>>
>>
>> On 23/11/2020 16:35, Wido den Hollander wrote:
>>> Hi,
>>>
>>> I have a fairly simple and straightforward setup where I'm testing and
>>> benchmarking DRBD9 under Ubuntu 20.04
>>>
>>> Using DKMS and the PPAs I compiled DRBD 9.0.25-1 for Ubuntu 20.04 and
>>> started testing.
>>>
>>> My setup (2x):
>>>
>>> - SuperMicro 1U machine
>>> - AMD Epyc 7302P 16-core
>>> - 128GB Memory
>>> - 10x Samsung PM983 in RAID-10
>>> - Mellanox ConnectX-5 25Gbit interconnect
>>>
>>> The 10 NVMe drives are in software RAID-10 with MDADM.
>>>
>>> My benchmark is focused on latency. Not on throughput. I tested this
>>> with fio:
>>>
>>> $ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
>>>     --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1
>>>
>>> I tested on the md0 device and on DRBD with protocol A and C. The
>>> results are as follows:
>>>
>>> - md0: 32,200 IOPS
>>> - Protocol A: 30,200 IOPS
>>> - Protocol C: 11,000 IOPS
>>>
>>> The network between the two nodes is a direct LACP 2x25Gbit connection
>>> with a 50cm DAC cable. About the lowest latency you can get on Ethernet
>>> at the moment.
>>>
>>> To me it seems obvious the TCP/IP stack or Ethernet is the problem here,
>>> but I can't pinpoint what is causing such a massive drop.
>>>
>>> The latency between the nodes is 0.150 ms for an 8192-byte ping, which
>>> seems very reasonable.
>>
>> I also tested with qperf to measure the tcp latency and bandwidth:
>>
>> tcp_lat:
>>       latency  =  13.3 us
>> tcp_bw:
>>       bw  =  3.08 GB/sec
>>
>> Looking at those values, the network seems to perform well, but is it
>> good enough to avoid such a big performance impact when writing?
>>
>> Wido
>>
>>>
>>> Is this to be expected or is there something wrong here?
>>>
>>> Wido

