[DRBD-user] 300% latency difference between protocol A and B with NVMe
Wido den Hollander
wido at denhollander.io
Tue Nov 24 10:17:29 CET 2020
On 23/11/2020 16:35, Wido den Hollander wrote:
> Hi,
>
> I have a fairly simple and straightforward setup where I'm testing and
> benchmarking DRBD9 under Ubuntu 20.04.
>
> Using DKMS and the PPAs I compiled DRBD 9.0.25-1 for Ubuntu 20.04 and
> started testing.
>
> My setup (2x):
>
> - SuperMicro 1U machine
> - AMD Epyc 7302P 16-core
> - 128GB Memory
> - 10x Samsung PM983 in RAID-10
> - Mellanox ConnectX-5 25Gbit interconnect
>
> The 10 NVMe drives are in software RAID-10 with MDADM.
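For reference, an array like that is typically created with something
along these lines; the NVMe device names below are placeholders, not
taken from the original setup:

$ mdadm --create /dev/md0 --level=10 --raid-devices=10 \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
      /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1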
>
> My benchmark is focused on latency. Not on throughput. I tested this
> with fio:
>
> $ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
> --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1
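As written, that job writes to a file in the current working directory;
to aim the same job directly at the block devices discussed below, a
--filename argument would be added (the device paths here are
assumptions, not from the original post):

$ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
  --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1 \
  --filename=/dev/md0   # or --filename=/dev/drbd0 for the DRBD device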
>
> I tested on the md0 device and on DRBD with protocols A and C. The
> results are as follows:
>
> - md0: 32,200 IOPS
> - Protocol A: 30,200 IOPS
> - Protocol C: 11,000 IOPS
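At iodepth=1 the average per-write completion latency is simply the
inverse of the IOPS figure, which makes the gap easier to see in
microseconds:

$ echo "32200 30200 11000" | \
  awk '{for (i = 1; i <= NF; i++) printf "%d IOPS -> %.1f us/write\n", $i, 1e6/$i}'
32200 IOPS -> 31.1 us/write
30200 IOPS -> 33.1 us/write
11000 IOPS -> 90.9 us/write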
>
> The network between the two nodes is a direct LACP 2x25Gbit connection
> with a 50cm DAC cable. About the lowest latency you can get on Ethernet
> at the moment.
>
> To me it seems obvious the TCP/IP stack or Ethernet is the problem here,
> but I can't pinpoint what is causing such a massive drop.
>
> The latency between the nodes is 0.150 ms for an 8192-byte ping, which
> seems very reasonable.
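An 8192-byte ping like that can be reproduced with e.g.
$ ping -c 100 -s 8192 <peer-address>
where the peer address is a placeholder.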
I also tested with qperf to measure the TCP latency and bandwidth:
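The exact invocation isn't shown here; a typical run uses a plain qperf
with no arguments on one node (the server side) and, on the other node,
something along the lines of (peer address is a placeholder):

$ qperf <peer-address> tcp_lat tcp_bw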
tcp_lat:
    latency = 13.3 us
tcp_bw:
    bw = 3.08 GB/sec
Looking at those values the network seems to perform well, but is it
good enough to avoid such a big performance impact when writing?
Wido
>
> Is this to be expected or is there something wrong here?
>
> Wido