Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 11/11/2013 15:10, Mark Coetser wrote:
> 2 x Debian wheezy nodes, directly connected with 2 x bonded gigabit NICs
> (bond-mode 0).
>
> drbd version 8.3.13-2
>
> kernel 3.2.0-4-amd64
>
> running noop scheduler on both nodes and the following sysctl/disk changes
>
> sysctl -w net.core.rmem_max=131071
> sysctl -w net.core.wmem_max=131071
> sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
> sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
> sysctl -w net.core.netdev_max_backlog=1000
> sysctl -w net.ipv4.tcp_congestion_control=reno
>
> for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
> ifconfig $i txqueuelen 1000
> ethtool -K $i gro off
> done
>
> sysctl -w net.ipv4.tcp_timestamps=1
> sysctl -w net.ipv4.tcp_sack=1
> sysctl -w net.ipv4.tcp_fin_timeout=60
>
> for i in sda sdb sdc sdd; do
> blockdev --setra 1024 /dev/$i
> echo noop > /sys/block/$i/queue/scheduler
> echo 16384 > /sys/block/$i/queue/max_sectors_kb
> echo 1024 > /sys/block/$i/queue/nr_requests
> done
>
>
> MTU 9000 is set on the underlying slave interfaces as well as on the bond
> itself; the switch is configured the same.
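
For reference, a minimal sketch of setting and verifying jumbo frames end to
end; eth0, eth1, bond0 and <peer-ip> are assumed names/placeholders, not taken
from the original post:

for i in eth0 eth1 bond0; do   # assumed slave and bond interface names
ifconfig $i mtu 9000
done
# confirm a full-size frame crosses the switch unfragmented
# (8972 bytes payload + 8 ICMP + 20 IP header = 9000)
ping -M do -s 8972 -c 3 <peer-ip>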
>
> drbd.conf sync settings
>
>
> net {
> allow-two-primaries;
> max-buffers 8192;
> max-epoch-size 8192;
> #unplug-watermark 128;
> #sndbuf-size 0;
> sndbuf-size 512k;
> }
> syncer {
> rate 1000M;
> #rate 24M;
> #group 1;
> al-extents 3833;
> #al-extents 257;
> #verify-alg sha1;
> }
>
>
> iperf between servers
>
> [ 5] 0.0-10.0 sec 388 MBytes 325 Mbits/sec
> [ 4] 0.0-10.2 sec 356 MBytes 293 Mbits/sec
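
For reference, a sketch of how figures like the above are typically produced;
running two parallel streams (-P 2) is an assumption based on the two result
lines, and <peer-ip> is a placeholder:

# on node A: start the iperf server
iperf -s
# on node B: two parallel TCP streams for 10 seconds
iperf -c <peer-ip> -P 2 -t 10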
>
>
> I am storing users' maildirs on the NFS4 share and I have about 5 mail
> servers mounting the NFS share. Under normal conditions things work fine
> and the servers' performance is normal, but at certain times there is a
> large number of either IMAP connections or mail being written to the NFS
> share. When this happens the NFS clients' CPU load starts climbing and
> access to the NFS share becomes almost unresponsive. As soon as I
> disconnect the secondary node ("drbdadm disconnect resource") the load on
> the NFS clients drops and things start working again. If I then reconnect
> the node at a later stage, the resource resyncs fairly quickly with no
> noticeable load on the servers or clients, and things work fine for a
> while until the next major read or write. It's difficult for me to tell
> exactly whether it is a read or a write at the time of the issue.
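
For reference, a minimal sketch of the disconnect/reconnect cycle described
above; "resource" stands for the actual DRBD resource name:

# temporarily stop replication to the secondary
drbdadm disconnect resource
# later, re-establish the connection; the resource resyncs on its own
drbdadm connect resource
# watch resync progress
cat /proc/drbd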
>
> Any help or pointers would be great.
>
>
>
>
Anyone have any input on this? I modified the syncer/net configs as shown
below, which seems to have improved the load issue, although I still had
about an hour of issues yesterday.

net {
allow-two-primaries;
max-buffers 8192;
max-epoch-size 8192;
unplug-watermark 128;
#sndbuf-size 0;
sndbuf-size 512k;
}
syncer {
# Adaptive syncer rate: let DRBD decide the best sync speed
# initial sync rate
rate 100M;
# size of the rate adaptation window
c-plan-ahead 20;
# min/max rate
# The network will allow only up to ~110MB/s, but verify and
# identical-block resyncs use very little network BW
c-max-rate 800M;
# quantity of sync data to maintain in the buffers
# (impacts the length of the wait queue)
c-fill-target 100k;
# Limit the bandwidth available for resync on the
# primary node when DRBD detects application I/O
c-min-rate 8M;
al-extents 1023;
#al-extents 3389;
}
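
A minimal sketch of rolling the changed net/syncer options onto a running
resource without a restart; "resource" again stands for the actual resource
name:

# sanity-check the edited configuration
drbdadm dump resource
# apply the new options to the running resource
drbdadm adjust resource
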
Thank you,
Mark Adrian Coetser