[DRBD-user] 300% latency difference between protocol A and B with NVMe

Joel Colledge joel.colledge at linbit.com
Tue Nov 24 11:21:29 CET 2020


Hi Wido,

These results are not too surprising. Consider the steps involved in a
protocol C write. Note that qperf's tcp_lat is a one-way latency, so we get:

Send data to peer: 13.3 us (perhaps more, if qperf was testing with a
size less than 4K)
Write on peer: 1s / 32200 == 31.1 us
Confirmation of write from peer: 13.3 us

Total: 13.3 us + 31.1 us + 13.3 us == 57.7 us

IOPS: 1s / 57.7 us == 17300
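
For reference, the same back-of-the-envelope calculation as a quick
one-liner (it only restates the qperf and md0 numbers from above,
nothing new):

$ awk 'BEGIN { net = 13.3; write = 31.1; total = net + write + net;
       printf "total %.1f us -> ceiling of about %.0f IOPS\n", total, 1e6 / total }'
total 57.7 us -> ceiling of about 17331 IOPS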

DRBD achieved 11000 IOPS, which is about 63% of that theoretical
maximum, so it is not all that far off. I would test the latency with
qperf for 4K messages too; perhaps DRBD is even closer to the maximum.
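
Untested, but something along these lines should do it (start 'qperf'
with no arguments on the peer so it listens, and substitute the peer's
address for <peer-ip>):

$ qperf <peer-ip> -m 4096 tcp_lat tcp_bw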

To improve this you could try disabling LACP, using a disk directly
instead of through the RAID, pinning the DRBD and fio threads to the
same core, adjusting the interrupt affinities, and so on. Anything that
simplifies the I/O path might help a little, but I would be surprised
if you get it much faster.
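
For the pinning, I mean something like the following. This is only a
hypothetical sketch; the core number, the IRQ number <N> and the thread
matching all need to be adapted to your system, and some kernel threads
may refuse to have their affinity changed:

$ taskset -c 2 fio --name=rw_io_1_4k ...                 # run fio on core 2
$ for pid in $(pgrep drbd); do taskset -pc 2 $pid; done  # try to pin the DRBD kernel threads to the same core
$ grep mlx5 /proc/interrupts                             # find the IRQs of the ConnectX-5 queues
$ echo 2 > /proc/irq/<N>/smp_affinity_list               # steer one of them to core 2 as well (as root)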

Best regards,
Joel

On Tue, Nov 24, 2020 at 10:46 AM Wido den Hollander
<wido at denhollander.io> wrote:
>
>
>
> On 23/11/2020 16:35, Wido den Hollander wrote:
> > Hi,
> >
> > I have a fairly simple and straightforward setup where I'm testing and
> > benchmarking DRBD 9 under Ubuntu 20.04.
> >
> > Using DKMS and the PPAs I compiled DRBD 9.0.25-1 for Ubuntu 20.04 and
> > started testing.
> >
> > My setup (2x):
> >
> > - SuperMicro 1U machine
> > - AMD Epyc 7302P 16-core
> > - 128GB Memory
> > - 10x Samsung PM983 in RAID-10
> > - Mellanox ConnectX-5 25Gbit interconnect
> >
> > The 10 NVMe drives are in software RAID-10 with MDADM.
> >
> > My benchmark is focused on latency. Not on throughput. I tested this
> > with fio:
> >
> > $ fio --name=rw_io_1_4k --ioengine=libaio --rw=randwrite \
> >    --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --direct=1
> >
> > I tested on the md0 device and on DRBD with protocols A and C. The results
> > are as follows:
> >
> > - md0: 32,200 IOPS
> > - Protocol A: 30,200 IOPS
> > - Protocol C: 11,000 IOPS
> >
> > The network between the two nodes is a direct LACP 2x25Gbit connection
> > with a 50cm DAC cable, which is about the lowest latency you can get on
> > Ethernet at the moment.
> >
> > To me it seems obvious the TCP/IP stack or Ethernet is the problem here,
> > but I can't pinpoint what is causing such a massive drop.
> >
> > The latency between the nodes is 0.150 ms for an 8192-byte ping, which
> > seems very reasonable.
>
> I also tested with qperf to measure the tcp latency and bandwidth:
>
> tcp_lat:
>      latency  =  13.3 us
> tcp_bw:
>      bw  =  3.08 GB/sec
>
> Looking at those values, the network seems to perform well, but is it good
> enough to avoid such a big performance impact when writing?
>
> Wido
>
> >
> > Is this to be expected or is there something wrong here?
> >
> > Wido

