[DRBD-user] 300% latency difference between protocol A and B with NVMe
Wido den Hollander
wido at denhollander.io
Mon Nov 23 16:35:33 CET 2020
Hi,
I have a fairly simple and straightforward setup where I'm testing and
benchmarking DRBD 9 under Ubuntu 20.04.
Using DKMS and the PPAs, I compiled DRBD 9.0.25-1 for Ubuntu 20.04 and
started testing.
My setup (2x):
- SuperMicro 1U machine
- AMD Epyc 7302P 16-core
- 128GB Memory
- 10x Samsung PM983 in RAID-10
- Mellanox ConnectX-5 25Gbit interconnect
The 10 NVMe drives are in software RAID-10 with MDADM.
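For reference, the array was created roughly like this (the exact device
names below are just an example, not necessarily my layout):

$ mdadm --create /dev/md0 --level=10 --raid-devices=10 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
    /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1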
My benchmark focuses on latency, not on throughput. I tested this
with fio:
$ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
--bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1
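I run the same command against each block device by pointing fio at it
with --filename; for the DRBD runs that looks roughly like this (the
device path is just an example):

$ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
  --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1 \
  --filename=/dev/drbd0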
I tested on the md0 device and on DRBD with protocol A and with
protocol C. The results are as follows:
- md0: 32,200 IOPS
- Protocol A: 30,200 IOPS
- Protocol C: 11,000 IOPS
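The only thing I change between the DRBD runs is the protocol in the
resource's net section; the configuration looks roughly like this
(resource name, host names and addresses are placeholders):

resource r0 {
    net {
        protocol A;   # or C for the synchronous run
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/md0;
        address   192.168.1.1:7789;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/md0;
        address   192.168.1.2:7789;
        meta-disk internal;
    }
}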
The network between the two nodes is a direct LACP 2x25Gbit connection
with a 50cm DAC cable, about the lowest latency you can get on Ethernet
at the moment.
To me it seems obvious that the TCP/IP stack or Ethernet is the problem
here, but I can't pinpoint what is causing such a massive drop.
The latency between the nodes is 0.150ms for an 8192-byte ping, which
seems very reasonable.
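This is how I measure it; a TCP round-trip check with qperf would look
similar (the peer IP is a placeholder and qperf has to be running on the
other node):

$ ping -s 8192 -c 100 192.168.1.2
$ qperf 192.168.1.2 tcp_lat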
Is this to be expected or is there something wrong here?
Wido