Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 19/11/2013 08:17, Mark Coetser wrote:
>
> On 11/11/2013 15:10, Mark Coetser wrote:
>> 2 x Debian wheezy nodes, directly connected with 2 x bonded gigabit NICs
>> (bond-mode 0).
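>>
>> For reference, a balance-rr bond on wheezy is typically defined along
>> these lines in /etc/network/interfaces (address and slave NIC names
>> below are placeholders, not the actual values):
>>
>> auto bond0
>> iface bond0 inet static
>>         # replication address/netmask -- substitute the real values
>>         address 192.168.100.1
>>         netmask 255.255.255.0
>>         # round-robin (bond-mode 0) across the two directly connected NICs
>>         bond-slaves eth1 eth2
>>         bond-mode balance-rr
>>         bond-miimon 100
>>         mtu 9000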
>>
>> drbd version 8.3.13-2
>>
>> kernel 3.2.0-4-amd64
>>
>> running the noop scheduler on both nodes, with the following sysctl/disk
>> changes:
>>
>> sysctl -w net.core.rmem_max=131071
>> sysctl -w net.core.wmem_max=131071
>> sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
>> sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
>> sysctl -w net.core.netdev_max_backlog=1000
>> sysctl -w net.ipv4.tcp_congestion_control=reno
>>
>> for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
>> ifconfig $i txqueuelen 1000
>> ethtool -K $i gro off
>> done
>>
>> sysctl -w net.ipv4.tcp_timestamps=1
>> sysctl -w net.ipv4.tcp_sack=1
>> sysctl -w net.ipv4.tcp_fin_timeout=60
>>
>> for i in sda sdb sdc sdd; do
>> blockdev --setra 1024 /dev/$i
>> echo noop > /sys/block/$i/queue/scheduler
>> echo 16384 > /sys/block/$i/queue/max_sectors_kb
>> echo 1024 > /sys/block/$i/queue/nr_requests
>> done
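>>
>> The applied values can be spot-checked afterwards, e.g.:
>>
>> # the active scheduler is shown in square brackets
>> grep . /sys/block/sd[a-d]/queue/scheduler
>> grep . /sys/block/sd[a-d]/queue/nr_requests
>> blockdev --getra /dev/sda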
>>
>>
>> MTU 9000 is set on the underlying bonded interfaces as well as the bond
>> itself, and the switch is configured the same.
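>>
>> Jumbo frames can be verified end to end with something like this (the
>> peer address is a placeholder):
>>
>> ip link show bond0 | grep mtu
>> # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids
>> # fragmentation, so the ping only succeeds if MTU 9000 works end to end
>> ping -M do -s 8972 -c 3 192.168.100.2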
>>
>> drbd.conf sync settings
>>
>>
>> net {
>>     allow-two-primaries;
>>     max-buffers      8192;
>>     max-epoch-size   8192;
>>     #unplug-watermark 128;
>>     #sndbuf-size 0;
>>     sndbuf-size      512k;
>> }
>> syncer {
>>     rate 1000M;
>>     #rate 24M;
>>     #group 1;
>>     al-extents 3833;
>>     #al-extents 257;
>>     #verify-alg sha1;
>> }
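>>
>> After changing these sections, the running resource can normally pick up
>> the new net/syncer options without downtime, along these lines (resource
>> name r0 is a placeholder):
>>
>> drbdadm adjust r0   # re-read drbd.conf and apply the changed options
>> cat /proc/drbd      # confirm connection state afterwards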
>>
>>
>> iperf between the servers:
>>
>> [ 5] 0.0-10.0 sec 388 MBytes 325 Mbits/sec
>> [ 4] 0.0-10.2 sec 356 MBytes 293 Mbits/sec
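>>
>> For comparison, multiple parallel streams across the balance-rr bond can
>> be tested with something like this (peer address is a placeholder):
>>
>> # on one node
>> iperf -s
>> # on the other node: 4 parallel TCP streams for 30 seconds
>> iperf -c 192.168.100.1 -P 4 -t 30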
>>
>>
>> I am storing users' maildirs on the NFSv4 share, and I have about 5 mail
>> servers mounting it. Under normal conditions things work fine and server
>> performance is normal, but at certain times there is a large number of
>> either IMAP connections or mail being written to the NFS share. When this
>> happens the NFS clients' CPU load starts climbing and access to the NFS
>> share becomes almost unresponsive. As soon as I disconnect the secondary
>> node with "drbdadm disconnect resource", the load on the NFS clients
>> drops and things start working again. If I then reconnect the node at a
>> later stage, the resource resyncs fairly quickly with no noticeable load
>> on the servers or clients, and things work fine for a while until the
>> next heavy read or write. It's difficult for me to tell whether it is a
>> read or a write at the time of the issue.
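>>
>> While the issue is occurring, output like the following should show
>> whether DRBD, the backing disks or NFS itself is the bottleneck (the
>> drbd0 device name is an assumption):
>>
>> cat /proc/drbd                          # connection state, pending/unacked counters
>> iostat -dxm drbd0 sda sdb sdc sdd 1     # per-device latency/utilisation, 1s interval
>> nfsstat -s                              # NFS server-side operation counters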
>>
>> Any help or pointers would be great.
>>
>>
>>
>>
>
> Does anyone have any input on this? I modified the syncer/net configs as
> below, which seems to have improved the load issue, though I still had
> about an hour of problems yesterday.
>
> net {
>     allow-two-primaries;
>     max-buffers      8192;
>     max-epoch-size   8192;
>     unplug-watermark 128;
>     #sndbuf-size 0;
>     sndbuf-size      512k;
> }
> syncer {
>     # Adaptive syncer rate: let DRBD decide the best sync speed
>     # initial sync rate
>     rate 100M;
>     # size of the rate adaptation window
>     c-plan-ahead 20;
>     # min/max rate
>     # The network will allow only up to ~110MB/s, but verify and
>     # identical-block resyncs use very little network BW
>     c-max-rate 800M;
>     # quantity of sync data to maintain in the buffers
>     # (impacts the length of the wait queue)
>     c-fill-target 100k;
>
>     # Limit the bandwidth available for resync on the primary node
>     # when DRBD detects application I/O
>     c-min-rate 8M;
>
>     al-extents 1023;
>     #al-extents 3389;
> }
Upgraded to 8.4 and the issue still persists. It looks like when a lot of
email is being delivered to the NFS share, the load on the NFS clients
starts to climb; if I disconnect the DRBD resource, the load drops.
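
For reference, 8.4 drops the syncer section, so the settings above map
onto the disk and net sections roughly like this (resource name r0 is a
placeholder; volume/device/address stanzas are omitted):

resource r0 {
    net {
        allow-two-primaries;
        max-buffers      8192;
        max-epoch-size   8192;
        unplug-watermark 128;
        sndbuf-size      512k;
    }
    disk {
        # former syncer options live in the disk section in 8.4
        resync-rate   100M;   # was "rate" in the 8.3 syncer section
        c-plan-ahead  20;
        c-max-rate    800M;
        c-fill-target 100k;
        c-min-rate    8M;
        al-extents    1023;
    }
    # volume/device/address definitions unchanged from the 8.3 resource
}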