Greetings,

We are experiencing issues with our IBM x3655/DRBD-NFS cluster setup
in our lab. (This IBM server uses an onboard Broadcom NetXtreme II
gigabit Ethernet chipset.)

Our setup has never worked completely correctly. It is extremely slow,
and we have noticed some other symptoms that lead me to believe we are
seeing the same TOE (TCP offload engine) issue that other people have
reported.

So far I have been unable to locate any jumper or switch on the IBM
server that would turn TOE off.

Symptoms:

1. Generally SLOW performance when all nodes are in the cluster, but if
   the backup node is down, performance is acceptable.

   I ran a set of copy and remove operations with both DRBD nodes
   (active/backup) enabled, and again with the backup node shut down.
   The directory structure I was copying is 122 MB in size:

                           COPY          DELETE
     Both nodes active     ~20 seconds   ~6 seconds
     One active node       ~4 seconds    <1 second

   (A sketch of the test commands follows the symptom list.)

   While the copy or delete is taking place, we see the I/O wait on one
   of the CPUs on the server spike to 100% if both DRBD nodes are
   active.

2. SSH issues
   Servers that mount directories from our DRBD/NFS server will, on
   occasion, seem to pause for 5 to 15 seconds, then continue to work.
   I noticed that this was mentioned in one of the TOE threads as well.
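
For reference, the timing test was essentially the following (the source
tree and mount point are placeholders, not our real paths):

    # On a client with the DRBD-backed NFS export mounted at /mnt/nfs:
    time cp -r /path/to/122MB-tree /mnt/nfs/test-copy
    time rm -rf /mnt/nfs/test-copy

    # Meanwhile, on the NFS server, watch per-CPU I/O wait
    # (mpstat is in the sysstat package; "top" then pressing 1 works too):
    mpstat -P ALL 1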
I have not found a way to disable TOE via hardware on these servers.
Disabling it via software (with the commands below) has not helped:

    ethtool -K eth0 rx off
    ethtool -K eth0 tx off
    ethtool -K eth0 sg off
    ethtool -K eth1 rx off
    ethtool -K eth1 tx off
    ethtool -K eth1 sg off
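
(For completeness, the current offload state can be inspected with
lower-case "ethtool -k", and TCP segmentation offload is a separate
knob that could also be tried; a sketch:)

    ethtool -k eth0           # show the offload settings the driver reports
    ethtool -K eth0 tso off   # disable TCP segmentation offload as well
    ethtool -K eth1 tso off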
Does anyone have any experience running DRBD on these servers?
Any other suggestions on what to try?

Configuration:
    Two IBM x3655 servers
    DRBD version 8.2.5
    Red Hat Enterprise Linux Server release 5.1 (Tikanga) (64-bit)
    pacemaker-0.6.5-2.2
    heartbeat-2.1.3-23.1

    DRBD currently uses the same NIC as the rest of the network traffic,
    but we are going to move it to the second NIC later.
    We have tried syncer rates of 10M, 40M and 400M (on an isolated
    network), but still see the same issues.
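
(For anyone reproducing the rate changes: a minimal sketch of how the
rate can be switched, using the resource name drbd0 from the config
below:)

    # Temporarily override the resync rate on the running resource:
    drbdsetup /dev/drbd0 syncer -r 400M
    # Or edit "rate" in /etc/drbd.conf on both nodes and re-apply:
    drbdadm adjust drbd0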

------------------------------------------------------------------
>>>cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by bachbuilder@, 2008-03-23 14:10:04
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:252 dw:252 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:20 misses:6 starving:0 dirty:0 changed:6
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

------------------------------------------------------------------
>>>cat /etc/drbd.conf
# drbd.conf
resource drbd0 {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer      "/usr/sbin/drbd-peer-outdater";
    }

    startup {
        degr-wfc-timeout 120;    # 2 minutes
    }

    disk {
        on-io-error detach;
    }

    syncer {
        rate 40M;
        al-extents 257;
    }

    on int-dbs-01 {
        device    /dev/drbd0;
        disk      /dev/sdd2;
        address   172.24.2.211:7799;
        meta-disk /dev/sdd1[0];
    }

    on int-dbs-02 {
        device    /dev/drbd0;
        disk      /dev/sdd2;
        address   172.24.2.212:7799;
        meta-disk /dev/sdd1[0];
    }
}
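
(When we move replication to the second NIC, the change would be
roughly the following; the 172.24.3.x addresses are hypothetical
examples for a dedicated link, not our real ones:)

    # In /etc/drbd.conf on both nodes, point "address" at the second
    # NIC, e.g. 172.24.3.211:7799 / 172.24.3.212:7799, then:
    drbdadm adjust drbd0
    # or, if adjust does not pick up the address change:
    drbdadm disconnect drbd0 && drbdadm connect drbd0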