[DRBD-user] Extremely high latency problem

Digimer lists at alteeve.ca
Thu Jun 5 07:35:46 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 04/06/14 11:31 AM, Bret Mette wrote:
> Hello,
>
> I started looking at DRBD as an HA iSCSI target. I am experiencing very
> poor performance and decided to run some tests. My current setup is as
> follows:
>
> Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
> CentOS 6.5 - 2.6.32-431.17.1.el6.x86_64
> drbd version: 8.3.16 (api:88/proto:86-97)
> md RAID10 using 7200rpm drives
>
> The 2 drbd nodes are synced using an Intel 82579LM Gigabit card
>
> I have created a logical volume using LVM and configured a couple of drbd
> resources on top of that. drbd0 holds my iSCSI configuration, which is
> shared between the 2 nodes, and drbd1 is a 1.75TB iSCSI target.
>
> I run heartbeat on the two nodes and expose a virtual IP to the iSCSI
> initiators.
>
> Originally I was running iSCSI with write-cache off (for data integrity
> reasons) but have recently switched to write-cache on during testing
> (with little to no gain).
>
> My major concern is the extremely high latency I measured when running
> dd against drbd0 mounted on the primary node.
>
> dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
> 512000 bytes (512 kB) copied, 32.3254 s, 15.8 kB/s
>
> I have pinged the second node as a very basic network latency test and
> get 0.209ms response time. I have also run the same test on both nodes
> with drbd disconnected (or on partitions not associated with drbd) and
> get typical results:
>
> node1
> dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
> 512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s
>
> node2
> dd if=/dev/zero of=~/testbin  bs=512 count=1000 oflag=direct
> 512000 bytes (512 kB) copied, 0.864994 s, 592 kB/s
> 512000 bytes (512 kB) copied, 0.328994 s, 1.6 MB/s
>
> node2's latency (without drbd connected) is inconsistent but always
> falls between those two ranges.
>
> These tests were run with no iSCSI targets exposed, no initiators
> connected, essentially on an idle system.
>
> My question is: why are my drbd-connected latency tests showing results
> 35 to 100 times slower than my results when drbd is not connected (or
> against partitions not backed by drbd)?
>
> This seems to be the source of my horrible performance on the iSCSI
> targets (300-900 kB/s dd writes on the initiators) and the very high
> iowait (35-75%) on mildly busy initiators.
>
>
> Any advice, pointers, etc. would be highly appreciated. I have already
> tried numerous performance-tuning settings suggested by the drbd manual,
> but I am open to any suggestion and will try anything again if it might
> solve my problem.
>
> Here are the important bits of my current drbd.conf
>
>          net {
>                  cram-hmac-alg sha1;
>                  shared-secret "password";
>                  after-sb-0pri disconnect;
>                  after-sb-1pri disconnect;
>                  after-sb-2pri disconnect;
>                  rr-conflict disconnect;
>                  max-buffers 8000;
>                  max-epoch-size 8000;
>                  sndbuf-size 0;
>          }
>
>          syncer {
>                  rate 100M;
>                  verify-alg sha1;
>                  al-extents 3389;
>          }
>
> I've played with the watermark setting and a few others, and latency only
> seems to get worse or stay where it is.
>
>
> Thank you,
> Bret

Have you tried testing the network in isolation? Is the DRBD resource 
syncing? A syncer rate of 100M (100 MB/s) on a 1 Gbps NIC means a 
background resync can consume just about all of your bandwidth. Can you 
also test the speed of the backing storage directly, not over iSCSI or the 
network?
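
For example, something along these lines would cover those checks. This is 
only a sketch; the IP and the LV path are placeholders you would swap for 
your own setup, and it assumes iperf is installed on both nodes:

# Is a resync or verify running right now?
cat /proc/drbd

# Raw throughput of the replication link, with DRBD out of the picture.
# Start 'iperf -s' on node2 first.
iperf -c <node2-replication-ip>

# Small-packet latency on the replication link.
ping -c 1000 -s 512 <node2-replication-ip>

# The backing storage underneath DRBD, read-only so it is safe to run while
# DRBD is up. Replace vg/lv with the LV that actually backs drbd1.
dd if=/dev/vg/lv of=/dev/null bs=512 count=1000 iflag=direct

# If a resync is eating the link, throttle it on the fly (8.3 syntax, if I
# remember it correctly), then set a sensible permanent value in the syncer
# section.
drbdsetup /dev/drbd1 syncer -r 30M

If the backing LV is fast and /proc/drbd shows no resync, I would look 
harder at the replication link itself; if you are running protocol C, every 
write has to wait for the peer's acknowledgement, so link latency hits 
small synchronous writes like your bs=512 oflag=direct test very hard.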

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


