Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 11/11/2013 15:10, Mark Coetser wrote:
> 2 x debian wheezy nodes, directly connected with 2 x bonded gigabit nics
> (bond-mode 0).
>
> drbd version 8.3.13-2
>
> kernel 3.2.0-4-amd64
>
> running noop scheduler on both nodes and the following sysctl/disk changes
>
> sysctl -w net.core.rmem_max=131071
> sysctl -w net.core.wmem_max=131071
> sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
> sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
> sysctl -w net.core.netdev_max_backlog=1000
> sysctl -w net.ipv4.tcp_congestion_control=reno
>
> for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
>     ifconfig $i txqueuelen 1000
>     ethtool -K $i gro off
> done
>
> sysctl -w net.ipv4.tcp_timestamps=1
> sysctl -w net.ipv4.tcp_sack=1
> sysctl -w net.ipv4.tcp_fin_timeout=60
>
> for i in sda sdb sdc sdd; do
>     blockdev --setra 1024 /dev/$i
>     echo noop > /sys/block/$i/queue/scheduler
>     echo 16384 > /sys/block/$i/queue/max_sectors_kb
>     echo 1024 > /sys/block/$i/queue/nr_requests
> done
>
> MTU 9000 on the underlying bonded interfaces as well as the bond; the
> switch is configured the same.
>
> drbd.conf sync settings
>
> net {
>     allow-two-primaries;
>     max-buffers 8192;
>     max-epoch-size 8192;
>     #unplug-watermark 128;
>     #sndbuf-size 0;
>     sndbuf-size 512k;
> }
> syncer {
>     rate 1000M;
>     #rate 24M;
>     #group 1;
>     al-extents 3833;
>     #al-extents 257;
>     #verify-alg sha1;
> }
>
> iperf between servers
>
> [ 5] 0.0-10.0 sec  388 MBytes  325 Mbits/sec
> [ 4] 0.0-10.2 sec  356 MBytes  293 Mbits/sec
>
> I am storing users' maildirs on the nfs4 share and I have about 5 mail
> servers mounting the nfs share. Under normal conditions things work fine
> and server performance is normal, but at certain times there is a large
> number of either imap connections or mail being written to the nfs share,
> and when this happens the nfs clients' cpu load starts climbing and
> access to the nfs share becomes almost unresponsive. As soon as I
> disconnect the secondary node ("drbdadm disconnect resource") the load
> on the nfs clients drops and things start working again. If I then
> reconnect the node at a later stage, the resource resyncs fairly quickly
> with no noticeable load on the servers or clients, and things work fine
> for a while until the next major read/write. It's difficult for me to
> tell exactly whether it's a read or a write at the time of the issue.
>
> Any help or pointers would be great.

Does anyone have any input on this? I modified the syncer/net configs as
below, which seems to have improved the load issue; I still had about an
hour of problems yesterday.

net {
    allow-two-primaries;
    max-buffers 8192;
    max-epoch-size 8192;
    unplug-watermark 128;
    #sndbuf-size 0;
    sndbuf-size 512k;
}
syncer {
    # Adaptive syncer rate: let DRBD decide the best sync speed
    # initial sync rate
    rate 100M;
    # size of the rate adaptation window
    c-plan-ahead 20;
    # min/max rate
    # The network will allow only up to ~110MB/s, but verify and
    # identical-block resyncs use very little network BW
    c-max-rate 800M;
    # quantity of sync data to maintain in the buffers (impacts the
    # length of the wait queue)
    c-fill-target 100k;
    # Limit the bandwidth available for resync on the primary node when
    # DRBD detects application I/O
    c-min-rate 8M;
    al-extents 1023;
    #al-extents 3389;
}

Thank you,

Mark Adrian Coetser
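For reference, the reworked net/syncer options above can be applied and
observed without a full DRBD restart. A minimal sketch, assuming the
resource is named "r0" (the actual resource name is not given in the
thread; substitute your own):

drbdadm adjust r0          # re-read drbd.conf and apply the changed net/syncer options
drbdadm dump r0            # show the configuration drbdadm actually parsed
watch -n1 cat /proc/drbd   # watch sync rate and the ns/nr/oos counters under load

Watching /proc/drbd while reproducing the NFS stall shows whether the
secondary is falling behind (out-of-sync blocks growing) while
application I/O is running, which is what the c-min-rate / c-max-rate
controller is meant to keep in check.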