Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 19/11/2013 08:17, Mark Coetser wrote:
>
> On 11/11/2013 15:10, Mark Coetser wrote:
>> 2 x debian wheezy nodes, directly connected with 2 x bonded gigabit nics
>> (bond-mode 0).
>>
>> drbd version 8.3.13-2
>>
>> kernel 3.2.0-4-amd64
>>
>> running noop scheduler on both nodes and the following sysctl/disk
>> changes
>>
>> sysctl -w net.core.rmem_max=131071
>> sysctl -w net.core.wmem_max=131071
>> sysctl -w net.ipv4.tcp_rmem="4096 87380 3080192"
>> sysctl -w net.ipv4.tcp_wmem="4096 16384 3080192"
>> sysctl -w net.core.netdev_max_backlog=1000
>> sysctl -w net.ipv4.tcp_congestion_control=reno
>>
>> for i in $(ifconfig | grep "^eth" | awk '{print $1}'); do
>>     ifconfig $i txqueuelen 1000
>>     ethtool -K $i gro off
>> done
>>
>> sysctl -w net.ipv4.tcp_timestamps=1
>> sysctl -w net.ipv4.tcp_sack=1
>> sysctl -w net.ipv4.tcp_fin_timeout=60
>>
>> for i in sda sdb sdc sdd; do
>>     blockdev --setra 1024 /dev/$i
>>     echo noop > /sys/block/$i/queue/scheduler
>>     echo 16384 > /sys/block/$i/queue/max_sectors_kb
>>     echo 1024 > /sys/block/$i/queue/nr_requests
>> done
>>
>> mtu 9000 across the underlying bonded interfaces as well as the bond; the
>> switch is configured the same.
>>
>> drbd.conf sync settings
>>
>> net {
>>     allow-two-primaries;
>>     max-buffers 8192;
>>     max-epoch-size 8192;
>>     #unplug-watermark 128;
>>     #sndbuf-size 0;
>>     sndbuf-size 512k;
>> }
>> syncer {
>>     rate 1000M;
>>     #rate 24M;
>>     #group 1;
>>     al-extents 3833;
>>     #al-extents 257;
>>     #verify-alg sha1;
>> }
>>
>> iperf between servers
>>
>> [ 5]  0.0-10.0 sec   388 MBytes   325 Mbits/sec
>> [ 4]  0.0-10.2 sec   356 MBytes   293 Mbits/sec
>>
>> I am storing users' maildirs on the nfs4 share and I have about 5 mail
>> servers mounting the nfs share. Under normal conditions things work fine
>> and server performance is normal, but at certain times there is a large
>> number of either imap connections or mail being written to the nfs share,
>> and when this happens the nfs clients' cpu load starts climbing and access
>> to the nfs share becomes almost unresponsive. As soon as I disconnect the
>> secondary node ("drbdadm disconnect resource") the load on the nfs clients
>> drops and things start working again. If I then reconnect the node at a
>> later stage, the resource resyncs fairly quickly with no noticeable load
>> on the servers or clients, and things work fine for a while until the next
>> major read/write. It's difficult for me to tell exactly whether it's a
>> read or a write at the time of the issue.
>>
>> Any help or pointers would be great.
>>
>
> anyone have any input on this? I modified the syncer/net configs as
> below, which seems to have improved the load issue; I had about an hour
> of issues yesterday.
>
> net {
>     allow-two-primaries;
>     max-buffers 8192;
>     max-epoch-size 8192;
>     unplug-watermark 128;
>     #sndbuf-size 0;
>     sndbuf-size 512k;
> }
> syncer {
>     # Adaptive syncer rate: let DRBD decide the best sync speed
>     # initial sync rate
>     rate 100M;
>     # size of the rate adaptation window
>     c-plan-ahead 20;
>     # min/max rate
>     # The network will allow only up to ~110MB/s, but verify and
>     # identical-block resyncs use very little network BW
>     c-max-rate 800M;
>     # quantity of sync data to maintain in the buffers
>     # (impacts the length of the wait queue)
>     c-fill-target 100k;
>
>     # Limit the bandwidth available for resync on the primary node
>     # when DRBD detects application I/O
>     c-min-rate 8M;
>
>     al-extents 1023;
>     #al-extents 3389;
> }

Upgraded to 8.4 and the issue still persists. It looks like when lots of
email is being delivered to the nfs share the load on the nfs clients
starts to climb; if I disconnect the drbd resource the load drops.
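For reference on the 8.4 syntax: DRBD 8.4 dropped the syncer section and
folded its options into the disk and net sections, so the settings quoted
above would be expressed roughly as in the sketch below. This is only a
sketch, assuming a resource named r0 and leaving out the existing
on/volume/device definitions; the exact placement of each option should be
checked against the drbd.conf(5) man page shipped with the installed version.

    resource r0 {
        net {
            allow-two-primaries;
            max-buffers      8192;
            max-epoch-size   8192;
            sndbuf-size      512k;
        }
        disk {
            resync-rate      100M;   # "rate" from the old syncer section
            c-plan-ahead     20;     # adaptive resync controller window
            c-max-rate       800M;   # upper bound for the dynamic resync rate
            c-fill-target    100k;   # amount of in-flight resync data to aim for
            c-min-rate       8M;     # throttle resync when application I/O is detected
            al-extents       1023;
        }
        # ... existing on/volume sections unchanged ...
    }

Running "drbdadm dump" afterwards should print the configuration as the 8.4
parser sees it, which is a quick way to confirm the converted syntax.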