[DRBD-user] TOE issue with IBM x3655 servers?

Michael Toler mikeatprodea at yahoo.com
Tue Aug 26 16:48:18 CEST 2008


We are experiencing issues with our IBM Server x3655/DRBD-NFS cluster
setup in our lab. (This IBM server uses an onboard Broadcom NetXtreme II
gigabit ethernet chipset.) 
Our setup has never worked completely correctly: performance is extremely
slow, and we have noticed other symptoms that lead me to believe we are
seeing the same TOE issue that other people have reported.
So far I have been unable to locate any jumper or switch on the IBM
server to turn TOE off.
1. Generally SLOW performance when all nodes are in the cluster, but if the
   backup node is down, performance is acceptable.
   I did a set of copy and remove operations with both DRBD nodes
   (active/backup) enabled, and again with the backup node shut down.
   The directory structure I was copying is 122 MB in size.

                          COPY            DELETE
    Both nodes active    ~20 seconds     ~6 seconds
    One active node       ~4 seconds     <1 second

   While the copy or delete is taking place, we see the I/O wait on one
   of the server's CPUs spike to 100% if both DRBD nodes are active.
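For scale, the timings above imply roughly a 5x throughput drop when the
backup node is connected (rough arithmetic on the 122 MB copy, ignoring
filesystem overhead):

```python
# Approximate throughput implied by the copy timings above (122 MB tree).
size_mb = 122

both_nodes_mb_s = size_mb / 20  # both DRBD nodes active: ~6 MB/s
one_node_mb_s = size_mb / 4     # backup node shut down:  ~30 MB/s

print(f"both nodes active: {both_nodes_mb_s:.1f} MB/s")
print(f"one node active:   {one_node_mb_s:.1f} MB/s")
print(f"slowdown factor:   {one_node_mb_s / both_nodes_mb_s:.1f}x")
```

Even ~6 MB/s is far below what a gigabit replication link should sustain,
which points at the NIC or the replication path rather than the disks.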
2. SSH issues
   Servers that mount directories from our DRBD/NFS server will, on
   occasion, seem to pause for 5 to 15 seconds, then continue to work.
   I noticed that this was mentioned in one of the TOE threads as well.
I have not found a way to disable TOE via hardware on these servers.
Disabling via software (with the commands below) has not helped.
     ethtool -K eth0 rx off
     ethtool -K eth0 tx off
     ethtool -K eth0 sg off
     ethtool -K eth1 rx off
     ethtool -K eth1 tx off
     ethtool -K eth1 sg off
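A sketch of a fuller software workaround, in case only rx/tx checksum and
scatter-gather are being disabled above. The tso/gso flags are standard
ethtool offload options, but whether the bnx2 driver on RHEL 5.1 honors
all of them is an assumption; some may report "Operation not supported":

```shell
#!/bin/sh
# Disable every offload feature ethtool exposes on both interfaces.
# Requires root; run-time only, so it must be reapplied after reboot
# (e.g. from /etc/rc.local or the ifup-local hook).
for dev in eth0 eth1; do
    ethtool -K "$dev" rx off    # RX checksum offload
    ethtool -K "$dev" tx off    # TX checksum offload
    ethtool -K "$dev" sg off    # scatter-gather
    ethtool -K "$dev" tso off   # TCP segmentation offload
    ethtool -K "$dev" gso off   # generic segmentation offload
    # Show what actually took effect:
    ethtool -k "$dev"
done
```

Note the case difference: `-K` sets offload features, lowercase `-k`
queries them.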
Does anyone have any experience running DRBD on these servers?
Any other suggestions on what to try?
  Two IBM Server x3655 servers
  DRBD version 8.2.5
  Red Hat Enterprise Linux Server release 5.1 (Tikanga) (64 bit)
  DRBD currently shares a NIC with general network traffic; we plan
  to move it to the second NIC later.
  We have tried syncer rates of 10M, 40M and 400M (on an isolated
  network), but still see the same issues.
>>>cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by bachbuilder@, 2008-03-23 14:10:04
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:252 dw:252 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:20 misses:6 starving:0 dirty:0 changed:6
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

>>>cat /etc/drbd.conf
# drbd.conf
resource drbd0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer      "/usr/sbin/drbd-peer-outdater";
  }
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  syncer {
    rate 40M;
    al-extents 257;
  }
  on int-dbs-01 {
    device     /dev/drbd0;
    disk       /dev/sdd2;
    meta-disk  /dev/sdd1[0];
  }
  on int-dbs-02 {
    device     /dev/drbd0;
    disk       /dev/sdd2;
    meta-disk  /dev/sdd1[0];
  }
}
