[DRBD-user] Re: Poor network performance on 0.7.22

Oliver Hookins oliver.hookins at anchor.com.au
Thu Jun 19 00:11:00 CEST 2008

On Thu Jun 19, 2008 at 00:10:30 +0300, G.Jacobsen wrote:
>I am talking about the hardware cache of the harddisk, this is the ultimate
>bottleneck. Your idea that the buffer size might be too large coincides with
>my thinking.

The disk cache (which is turned off anyway) will have no effect on large
sequential writes, which should be limited only by the disks' sustained
throughput. This is the testing methodology I am using, which is showing up
the problem, and which I've seen many other sysadmins use in tests posted
to this list.
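For anyone wanting to reproduce the numbers, the test boils down to something
like the following (./ddtest here stands in for a file on the DRBD-backed
filesystem; conv=fdatasync forces the data to disk before dd reports a rate,
so the figure reflects sustained write throughput rather than page cache):

```shell
# Sequential write test; substitute a path on the DRBD resource for ./ddtest.
# conv=fdatasync flushes before dd prints its throughput figure.
dd if=/dev/zero of=./ddtest bs=1M count=16 conv=fdatasync
rm -f ./ddtest
```

Scale bs/count up (I use bs=1G count=8) so the write far exceeds any cache in
the path.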

>Resyncing is an extraordinary event that happens under very little time
>pressure, so I can't see the concern there.

Resyncing at 10MB/s shouldn't kill performance of the resource when it is
communicating over a gigabit link and using 15krpm drives. This doesn't even
come close to the configured value of 50MB/s for the syncer.
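For context, the relevant part of my drbd.conf looks roughly like this (a
sketch of the 0.7-style syncer options; the al-extents figure shown is
illustrative, not my exact setting):

```
syncer {
  rate       50M;   # resync bandwidth ceiling -- actual resync runs nowhere near this
  al-extents 257;   # activity log coverage (illustrative value)
}
```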

>Actually I wonder what other services you have running over the DRBD
>ethernet link. In a typical setup you have drbd running over that and
>nothing else. Everything else runs over the other interface.

Just DRBD and Heartbeat (which would contribute perhaps a few bytes per
second of traffic).

>-----Original Message-----
>From: Oliver Hookins [mailto:oliver.hookins at anchor.com.au]
>Sent: Mittwoch, 18. Juni 2008 23:58
>To: G.Jacobsen
>Cc: drbd-user at linbit.com
>Subject: Re: [DRBD-user] Re: Poor network performance on 0.7.22
>On Wed Jun 18, 2008 at 23:00:39 +0300, G.Jacobsen wrote:
>>When doing a dd between two drives performance is best when bs is slightly
>>below the cache size of the receiving harddisk, according to my very humble
>>experiences. I suppose the same holds for drbd sndbuf-size.
>Which cache though, linux dirty buffer cache, drive write-back buffer (which
>is turned off), hardware RAID card write cache (which could be up to 256MB)?
>I've been continuing to read up on various issues and my current thinking
>is that perhaps one or more buffer settings are too large, which I've read
>can contribute to poor performance. This is only my current theory, though.
>>BTW, I wonder what kind of application you are running that the transfer
>>rate is such an issue. It's somewhat hard to believe that most production
>>systems would really saturate even a 100Mbit link constantly.
>Consider this: when one of the DRBD resources is resyncing it uses up all
>available network capacity (not really, but for DRBD it does because of the
>network issues it is having). Then writes to that resource on the primary
>and writes to other resources communicating over the same link end up
>suffering immensely, not because the actual link is saturated but because
>the network performance issue causes DRBD to use the link so inefficiently.
>I've had to tune the syncer right down to only a few Mbps just so it doesn't
>kill the real-time use of the resource, but then resyncing takes a very
>large and alarming amount of time.
>>Just my 0.23 Aussie cents on the matter.
>>-----Original Message-----
>>From: drbd-user-bounces at lists.linbit.com
>>[mailto:drbd-user-bounces at lists.linbit.com]On Behalf Of Oliver Hookins
>>Sent: Mittwoch, 18. Juni 2008 08:39
>>To: drbd-user at lists.linbit.com
>>Subject: Re: [DRBD-user] Re: Poor network performance on 0.7.22
>>Another snippet of information that might twig someone's memory... I took a
>>tcpdump of DRBD traffic when doing a large file write and although the MTU
>>is set to 9000 over the direct 1Gbps connection, both systems have their
>>windows set to very small values, such as around 800.
>>During a 10 second packet capture I'm also seeing 25 TCP out-of-order
>>segments and 1427 TCP Window updates, which seems to be very high. I've
>>already had a go at raising TCP buffers in /proc/sys/net/core and
>>/proc/sys/net/ipv4 but without any noticeable change in connected speed...
>>On Wed Jun 18, 2008 at 12:39:14 +1000, Oliver Hookins wrote:
>>>Anybody have any tips at all for this issue? I'm running out of ideas...
>>>On Thu Jun 12, 2008 at 16:04:10 +1000, Oliver Hookins wrote:
>>>>Hi again,
>>>>I've been doing a lot of testing and I'm fairly certain I've narrowed
>>>>my performance issues to the network connection. Previously I was getting
>>>>fairly abysmal performance in even DRBD-disconnected mode but I realise
>>>>this was mainly due to my test file size far exceeding the area covered
>>>>by the al-extents.
>>>>I am performing dd tests (bs=1G, count=8) with syncs on the connected
>>>>resources and getting only about 10MB/s. The disks are 10krpm 300GB SCSI
>>>>drives which can easily get sustained speeds of 60-70MB/s when DRBD is disconnected or
>>>>not used. There is a direct cable between the machines giving them full
>>>>gigabit connectivity via their Intel 80003ES2LAN adaptors (running the
>>>>driver version 7.3.20-k2-NAPI that is standard with RHEL4 x86_64). I have
>>>>tested this connection with Netpipe and get up to 940Mbps.
>>>>However DRBD still crawls along at 10MB/s. I have attempted to increase
>>>>/proc/sys/net/core/{r,w}mem_{default,max} settings which were previously
>>>>at 132KB, to 1MB for defaults and 2MB for max without any increase in
>>>>performance. MTU on the link is set to 9000 bytes.
>>>>In drbd.conf I have sndbuf-size 2M; max-buffers 8192; and a raised max-epoch-size.
>>>>I've also played a little with the unplug watermark, setting it to very low
>>>>and very high values without any apparent change.
>>>>Taking a look at a tcpdump of the traffic, the only weird things I could
>>>>see are a lot of TCP window size change notifications and some strange
>>>>packet "clumping", but it's not really offering me any insights I can act on.
>>>>Is there anything else I could tune to solve this problem?

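For the record, the TCP buffer changes I described above correspond to a
sysctl fragment like this (the 1MB default / 2MB max figures match what I
tried; treat them as starting points, not recommendations):

```
# /etc/sysctl.conf -- illustrative values, apply with sysctl -p
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
net.core.rmem_max     = 2097152
net.core.wmem_max     = 2097152
net.ipv4.tcp_rmem     = 4096 1048576 2097152
net.ipv4.tcp_wmem     = 4096 1048576 2097152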
Oliver Hookins
