[DRBD-user] Re: Poor network performance on 0.7.22

Wed Jun 18 23:10:30 CEST 2008

Oliver,

I am talking about the hardware cache of the hard disk; this is the ultimate
bottleneck. Your idea that the buffer size might be too large coincides with
my thinking.

Resyncing is an extraordinary event that happens under very few time
constraints, so I can't see the concern there.

Actually, I wonder what other services you have running over the DRBD
Ethernet link. In a typical setup DRBD runs over that link and nothing else;
everything else runs over the other interface.

Cheers

Gerry

-----Original Message-----
From: Oliver Hookins [mailto:oliver.hookins at anchor.com.au]
Sent: Wednesday, 18 June 2008 23:58
To: G.Jacobsen
Cc: drbd-user at linbit.com
Subject: Re: [DRBD-user] Re: Poor network performance on 0.7.22

Gerry,

On Wed Jun 18, 2008 at 23:00:39 +0300, G.Jacobsen wrote:
>Oliver,
>
>When doing a dd between two drives, performance is best when bs is slightly
>below the cache size of the receiving hard disk, in my very humble
>experience. I suppose the same holds for the DRBD sndbuf-size.

Which cache, though: the Linux dirty buffer cache, the drive write-back
buffer (which is turned off), or the hardware RAID card write cache (which
could be up to 256MB)? I've been continuing to read up on various issues, and
my current thinking is that perhaps one or more buffer settings are too
large, which I've read can contribute to poor performance. This is only the
current theory, though.
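One way to test the buffer-size theory directly is to time synchronous writes at several block sizes, much like the dd runs in this thread. The sketch below is a rough Python stand-in for dd with conv=fsync, not a DRBD tool; the helper name write_throughput, the 64 MiB total and the block sizes tried are all assumptions. Pointing it at a file on the DRBD device, once connected and once disconnected, would show whether smaller blocks actually help.

```python
import os
import tempfile
import time


def write_throughput(path, block_size, total_bytes):
    """Write total_bytes to path in block_size chunks, fsync, return MB/s.

    Hypothetical helper: a rough userspace stand-in for dd with conv=fsync.
    It measures whatever storage path `path` lives on, nothing more.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    # O_TRUNC so repeated runs start from an empty file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            # os.write may write fewer bytes than asked; count what it reports
            written += os.write(fd, block[:n])
        os.fsync(fd)  # include the flush to stable storage in the timing
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Assumed sizes; replace the temp dir with a path on the DRBD device.
    total = 64 * 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probe.bin")
        for bs in (64 * 1024, 1024 * 1024, 8 * 1024 * 1024):
            rate = write_throughput(path, bs, total)
            print(f"bs={bs // 1024} KiB: {rate:.1f} MB/s")
```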

>
>BTW, I wonder what kind of application you are running that the transfer
>rate is such an issue. It's somewhat hard to believe that most production
>systems would really saturate even a 100Mbit link constantly.

Consider this: when one of the DRBD resources is resyncing, it uses up all
available network capacity (not literally, but effectively, because of the
network issues DRBD is having). Writes to that resource on the primary, and
writes to other resources communicating over the same link, then suffer
immensely, not because the link is actually saturated but because the DRBD
network performance issue makes it use the link so inefficiently.

I've had to tune the syncer right down to only a few Mbps just so it doesn't
kill real-time use of the resource, but then resyncing takes an alarmingly
long time.
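To put that resync time into numbers: a full resync has to move the whole device at the throttled rate. A minimal Python sketch, assuming a 300 GB device (the disk size given later in the thread) and a 4 Mbit/s cap as a stand-in for "a few Mbps"; resync_seconds is a hypothetical helper that ignores protocol overhead and assumes the rate is actually sustained.

```python
def resync_seconds(size_bytes, rate_bits_per_s):
    """Time to copy size_bytes at a sustained rate, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bits_per_s


# Assumed figures: a full 300 GB device and a few candidate syncer caps.
size = 300 * 10**9
for mbit in (4, 40, 400):
    hours = resync_seconds(size, mbit * 10**6) / 3600
    print(f"{mbit:>4} Mbit/s -> {hours:7.1f} h")
```

At 4 Mbit/s this comes to roughly 167 hours, about a week. Note that the syncer rate setting in drbd.conf is given in bytes per second, not bits, so a value such as rate 4M means about 4 megabytes per second, roughly eight times a 4 Mbit/s cap.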

>
>Just my 0.23 Aussie cents on the matter.
>
>Cheers
>
>Gerry
>
>
>-----Original Message-----
>From: drbd-user-bounces at lists.linbit.com
>[mailto:drbd-user-bounces at lists.linbit.com]On Behalf Of Oliver Hookins
>Sent: Wednesday, 18 June 2008 08:39
>To: drbd-user at lists.linbit.com
>Subject: Re: [DRBD-user] Re: Poor network performance on 0.7.22
>
>
>Another snippet of information that might jog someone's memory... I took a
>tcpdump of DRBD traffic while doing a large file write, and although the MTU
>is set to 9000 over the direct 1Gbps connection, both systems have their TCP
>windows set to very small values, around 800.
>
>During a 10 second packet capture I'm also seeing 25 TCP out-of-order
>segments and 1427 TCP Window updates, which seems to be very high. I've
>already had a go at raising TCP buffers in /proc/sys/net/core and
>/proc/sys/net/ipv4 but without any noticeable change in connected speed...
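A TCP sender can have at most one window of unacknowledged data in flight per round trip, so a single connection's throughput is bounded by window / RTT. The Python sketch below works this through; the 100 microsecond RTT for a direct gigabit cable is an assumption, not a measurement from this thread, and both helper names are hypothetical. One caveat: tcpdump generally prints the raw 16-bit window field, so if window scaling was negotiated in the SYN packets, the real window is that value shifted left by the scale factor and may be far larger than 800.

```python
def window_limited_throughput(window_bytes, rtt_s):
    """Upper bound on one TCP connection's throughput (bytes/s): window / RTT."""
    return window_bytes / rtt_s


def bdp_bytes(link_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bits_per_s / 8 * rtt_s


# Assumed RTT of 100 microseconds on a direct gigabit cable.
rtt = 100e-6
print(f"{window_limited_throughput(800, rtt) / 1e6:.1f} MB/s with an 800-byte window")
print(f"{bdp_bytes(1e9, rtt):.0f} bytes needed in flight to fill 1 Gbit/s")
```

If the window really were about 800 bytes, the ceiling would be around 8 MB/s, in the same range as the 10MB/s observed, while filling the gigabit link at that RTT needs roughly 12.5 KB in flight.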
>
>On Wed Jun 18, 2008 at 12:39:14 +1000, Oliver Hookins wrote:
>>Anybody have any tips at all for this issue? I'm running out of ideas...
>>
>>On Thu Jun 12, 2008 at 16:04:10 +1000, Oliver Hookins wrote:
>>>Hi again,
>>>
>>>I've been doing a lot of testing and I'm fairly certain I've narrowed down
>>>my performance issues to the network connection. Previously I was getting
>>>fairly abysmal performance even in DRBD-disconnected mode, but I realise now
>>>this was mainly due to my test file size far exceeding the al-extents
>>>setting.
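The drbd.conf documentation states that each activity-log extent covers 4 MiB of backing storage, so al-extents bounds the "hot area" DRBD can write to without updating its on-disk activity log. A minimal Python sketch of that arithmetic; the value 257 is a hypothetical setting, and the 8 GiB span corresponds to the dd test described next (bs=1G, count=8).

```python
AL_EXTENT_BYTES = 4 * 1024 * 1024  # each activity-log extent covers 4 MiB


def hot_area_bytes(al_extents):
    """Size of the active set DRBD can track with this many extents."""
    return al_extents * AL_EXTENT_BYTES


def exceeds_hot_area(write_span_bytes, al_extents):
    """True if a write spanning this many bytes cannot fit in the active set."""
    return write_span_bytes > hot_area_bytes(al_extents)


# Hypothetical setting; 8 GiB matches a dd run with bs=1G count=8.
al_extents = 257
print(hot_area_bytes(al_extents) // 2**20, "MiB hot area")
print(exceeds_hot_area(8 * 2**30, al_extents))
```

With 257 extents the hot area is about 1 GiB, so an 8 GiB sequential write keeps moving the active set and forces activity-log updates along the way.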
>>>
>>>I am performing dd tests (bs=1G, count=8) with syncs on the connected DRBD
>>>resources and getting only about 10MB/s. The disks are 10krpm 300GB SCSI and
>>>can easily sustain 60-70MB/s when DRBD is disconnected or not used. There is
>>>a direct cable between the machines giving them full gigabit connectivity
>>>via their Intel 80003ES2LAN adaptors (running the e1000 driver version
>>>7.3.20-k2-NAPI that is standard with RHEL4 x86_64). I have tested this
>>>connection with Netpipe and get up to 940Mbps.
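Converting the figures above to a common unit shows how far below the link capacity DRBD is running. A small Python sketch; the 60 MB/s disk figure is the low end of the quoted 60-70MB/s range, and treating the achievable ceiling as the slower of link and disk is an assumption that ignores DRBD protocol overhead.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


link = mbit_to_mbyte(940)  # Netpipe result, in MB/s
observed = 10              # connected-DRBD dd result, MB/s
disk = 60                  # low end of the disconnected disk speed, MB/s
print(f"link ~{link:.1f} MB/s, DRBD uses {observed / link:.0%} of it")
print(f"expected ceiling ~min(link, disk) = {min(link, disk)} MB/s")
```

The link is good for about 117.5 MB/s, so the observed 10MB/s is roughly 9% of it and well below the disks' own speed.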
>>>
>>>However, DRBD still crawls along at 10MB/s. I have attempted to increase the
>>>/proc/sys/net/core/{r,w}mem_{default,max} settings, which were previously all
>>>at 132KB, to 1MB for the defaults and 2MB for the max, without any increase in
>>>performance. MTU on the link is set to 9000 bytes.
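The same core settings can be made persistent in /etc/sysctl.conf and applied with sysctl -p. The Python sketch below just renders those lines for the values described above, taking "1MB" and "2MB" as 1 MiB and 2 MiB (an assumption); sysctl_lines is a hypothetical helper. Note that TCP's own autotuning limits are set separately through net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, each a "min default max" triple, and the net.core defaults do not override them for TCP sockets.

```python
def sysctl_lines(default_bytes, max_bytes):
    """Render sysctl.conf lines for the core socket-buffer limits."""
    lines = []
    for direction in ("r", "w"):
        lines.append(f"net.core.{direction}mem_default = {default_bytes}")
        lines.append(f"net.core.{direction}mem_max = {max_bytes}")
    return lines


# 1 MiB defaults and 2 MiB maxima (assumed reading of "1MB" and "2MB").
for line in sysctl_lines(1 << 20, 2 << 20):
    print(line)
```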
>>>
>>>In drbd.conf I have sndbuf-size 2M; max-buffers 8192; max-epoch-size 8192.
>>>I've also played a little with the unplug watermark, setting it to very low
>>>and very high values without any apparent change.
>>>
>>>Taking a look at a tcpdump of the traffic, the only odd things I could see
>>>are a lot of TCP window size change notifications and some strange packet
>>>"clumping", but it's not really offering me any insights I can immediately
>>>see.
>>>
>>>Is there anything else I could tune to solve this problem?

--
Regards,
Oliver Hookins
