2 x Debian wheezy nodes, directly connected with 2 x bonded gigabit NICs (bond-mode 0).

drbd version 8.3.13-2
kernel 3.2.0-4-amd64

Running the noop scheduler on both nodes, with the following sysctl/disk changes:

    sysctl -w net.core.rmem_max=131071
    sysctl -w net.core.wmem_max=131071
    sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
    sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
    sysctl -w net.core.netdev_max_backlog=1000
    sysctl -w net.ipv4.tcp_congestion_control=reno

    for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
        ifconfig $i txqueuelen 1000
        ethtool -K $i gro off
    done

    sysctl -w net.ipv4.tcp_timestamps=1
    sysctl -w net.ipv4.tcp_sack=1
    sysctl -w net.ipv4.tcp_fin_timeout=60

    for i in sda sdb sdc sdd; do
        blockdev --setra 1024 /dev/$i
        echo noop > /sys/block/$i/queue/scheduler
        echo 16384 > /sys/block/$i/queue/max_sectors_kb
        echo 1024 > /sys/block/$i/queue/nr_requests
    done

MTU 9000 across the underlying bonded interfaces as well as the bond, switch configured the same.

drbd.conf sync settings:

    net {
        allow-two-primaries;
        max-buffers 8192;
        max-epoch-size 8192;
        #unplug-watermark 128;
        #sndbuf-size 0;
        sndbuf-size 512k;
    }

    syncer {
        rate 1000M;
        #rate 24M;
        #group 1;
        al-extents 3833;
        #al-extents 257;
        #verify-alg sha1;
    }

iperf between servers:

    [  5]  0.0-10.0 sec   388 MBytes   325 Mbits/sec
    [  4]  0.0-10.2 sec   356 MBytes   293 Mbits/sec

I am storing users' maildirs on the nfs4 share, and I have about 5 mail servers mounting the nfs share. Under normal conditions things work fine and the servers' performance is normal. At certain times, however, there is a large number of either imap connections or mail being written to the nfs share. When this happens the nfs clients' cpu load starts climbing and access to the nfs share becomes almost unresponsive. As soon as I disconnect the secondary node ("drbdadm disconnect resource"), the load on the nfs clients drops and things start working again.
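
As a sanity check on the iperf figures quoted earlier, here is a minimal sketch that recomputes the bit rate from the transferred volume and the interval. The function name mbits_per_sec is hypothetical, and the sketch assumes iperf reports MBytes as 2^20 bytes and Mbits as 10^6 bits.

```shell
#!/bin/sh
# Hypothetical helper: convert an iperf "MBytes over N seconds" result
# into Mbits/sec, assuming MBytes = 2^20 bytes and Mbits = 10^6 bits.
mbits_per_sec() {
    # $1 = transferred MBytes, $2 = interval length in seconds
    awk -v mb="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", mb * 1048576 * 8 / secs / 1000000 }'
}

# The two streams from the iperf run quoted in this message
mbits_per_sec 388 10.0
mbits_per_sec 356 10.2
```

Under those assumptions the two calls reproduce the 325 and 293 Mbits/sec that iperf printed, so each stream is running at roughly a third of the nominal rate of a single gigabit link.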

If I then reconnect the node at a later stage, the resource resyncs fairly quickly with no noticeable load on the servers or clients, and things work fine for a while until the next major read or write load. It's difficult for me to tell exactly whether it's a read or a write at the time of the issue.

Any help or pointers would be great.

-- 
Thank you,
Mark Adrian Coetser
mark at tux-edo.co.za
http://www.tux-edo.co.za
http://www.tux-voip.co.za
cel: +27 76 527 8789

 * This is complicated. Has to do with interrupts. Thus, I am
 * scared witless. Therefore I refuse to write this function. :-P
 -- From the maclinux patch