No subject


Sat Nov 21 08:32:40 CET 2009


The first thing I looked into was enabling jumbo frames on the replication link, but my NICs don't seem to support this: I cannot set the MTU to anything above 1500. When I did some legwork on optimization in the past, the deadline scheduler didn't seem to help at all. Does anyone have any idea why I get such an odd bottleneck here? It almost seems like the server is falling behind on DRBD replication, which appears to be loading up the system. After the rcp ends, it takes the system several minutes to calm down to a "normal" load again.

We decided on protocol A early on to minimize this effect. From what I can tell, the choice of protocol is the only thing that should affect replication speed. We are using a direct connection between the secondary onboard 10/100/1000 NICs, so there is nothing else on that interface other than secondary heartbeat communications between the machines. Any insight into this issue would be greatly appreciated!
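For anyone wanting to confirm the two settings mentioned above (link MTU and active I/O scheduler) on their own nodes, here is a minimal Python sketch that reads them from sysfs. The interface name `eth1` is purely hypothetical, since the post does not name the replication NIC; `sda` is taken from the `disk /dev/sda4` line in the posted config. The helper names are made up for illustration.

```python
import re
from pathlib import Path

def active_scheduler(text):
    """Return the bracketed (active) entry from a sysfs scheduler line."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()

def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it is absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None

IFACE = "eth1"  # assumption: name of the replication NIC (not given in the post)
DISK = "sda"    # backing disk from the posted config (/dev/sda4)

mtu = read_sysfs(f"/sys/class/net/{IFACE}/mtu")
sched = read_sysfs(f"/sys/block/{DISK}/queue/scheduler")

print("MTU:", mtu if mtu is not None else "unknown (interface not found)")
print("I/O scheduler:", active_scheduler(sched) if sched else "unknown (disk not found)")
```

The scheduler file lists every available scheduler and marks the active one with square brackets, which is why the sketch extracts the bracketed entry.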

 
global { usage-count yes; }
common {
 startup {
  # If a cluster starts up in degraded mode, it will echo a message to all
  # users. It'll wait 60 seconds then halt the system.
  wfc-timeout 120; # drbd init script waits up to 120 seconds for the peer (0 would wait indefinitely).
  degr-wfc-timeout 120; # 2 minutes.
 }
 syncer {
  rate 100M; # Resync rate in bytes per second (100M = 100 megabytes/s). 10M is good for a 100Mb network.
  verify-alg md5; # can also use sha1, crc32c, etc.
  csums-alg md5;  # can also use sha1, crc32c, etc.
  al-extents 3833; # Must be prime, number of active sets.
 }
 handlers {
  pri-on-incon-degr "/usr/local/bin/support_drbd_deg";
  split-brain "/usr/local/bin/support_drbd_sb";
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
 }
 disk {
  on-io-error detach; # What to do when the lower level device errors.
  no-disk-barrier;
  no-disk-flushes;
  no-disk-drain;
  no-md-flushes;
  fencing resource-only;
 }
 net {
  unplug-watermark 8192; 
  max-buffers 8192; 
  max-epoch-size 8192; 
  sndbuf-size 512k;
  rcvbuf-size 0;
  ko-count 4; # Peer is dead if this count is exceeded.
  after-sb-0pri           discard-zero-changes;
  after-sb-1pri           discard-secondary;
  after-sb-2pri           disconnect;
 }
}
resource drbd0 {
 protocol A;
 device /dev/drbd0;
 disk /dev/sda4;
 meta-disk internal;
 on supportHA1 {
  address 10.0.0.1:7789;
 }
 on supportHA2 {
  address 10.0.0.2:7789;
 }
}
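One detail in the configuration above is that `fence-peer` is listed twice in the `handlers` section. The following Python sketch flags keywords that repeat within a section. It is not a real drbd.conf parser: it assumes one directive per line, sections opened by a line ending in `{` and closed by a line containing only `}`, and the function name is made up for illustration.

```python
from collections import Counter

def duplicate_directives(conf_text):
    """Return {(section_path, keyword): count} for keywords repeated in one section."""
    stack = []        # names of the sections currently open
    seen = Counter()  # (section path, keyword) -> number of occurrences
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        else:
            keyword = line.split()[0].rstrip(";")
            seen[("/".join(stack), keyword)] += 1
    return {key: count for key, count in seen.items() if count > 1}

# Excerpt of the handlers section from the posted config.
sample = """
handlers {
  split-brain "/usr/local/bin/support_drbd_sb";
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
}
"""
print(duplicate_directives(sample))
```

Whether DRBD 8.3.6 rejects the second line, or which of the two handlers takes effect, is not something the post establishes; the sketch only points out the repetition.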

Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
kend at medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
"You canna change the laws of physics, Captain; I've got to have thirty minutes!"

.
 
This message has been scanned for viruses and dangerous content by MailScanner, SpamAssassin & ClamAV.

This message and any attachments may contain information that is protected by law as privileged and confidential, and is transmitted for the sole use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any use, dissemination, copying or retention of this e-mail or the information contained herein is strictly prohibited. If you received this e-mail in error, please immediately notify the sender by e-mail, and permanently delete this e-mail.


Hello all,

I have a fresh question that has been bugging me for the past week or so. I have been unable to determine why my cluster is acting this way, so I will throw the question out to the list for your opinions. I have a 2-node cluster running CentOS 5.3 x86_64 with DRBD 8.3.6, Heartbeat 3.0.1-1, and Pacemaker 1.0.6, all on a pair of IBM x3500 servers, each with 2 x quad-core 3.0GHz CPUs and 32GB RAM, using 8 x 450GB SAS drives in a RAID10 configuration on the onboard IBM AACRAID (ServeRAID 8k) controllers.

What we are seeing is an extremely high load during rcp data transfers from a remote machine to the cluster. I cannot reproduce this on a test cluster; it only seems to happen on our production machine, and only during the day with 200+ users connected to the machine. This machine has a lot of users and activity, but nothing too extreme - we rarely see our 1-minute load average go over 1 or 2.

The first time we had someone in-house transfer about 20GB of data to this cluster from a remote machine using rcp, the system load on the primary cluster node soared to over 33! I can now reproduce this on demand at any time. After a fresh install and reload of the OS and all our data on both machines, things looked better (still loaded up, but not as bad). Now, a week later, we are seeing the very same thing again. A transfer of 10-30GB of data from elsewhere on the LAN to this cluster causes it to spin out of control on disk I/O (CPU usage is almost non-existent when this happens). I don't see anything posted about possible fragmentation of the DRBD meta-data, so I am trying to forget that the cluster acted more civilized when it was freshly installed. I have only about 25% of the available space in use on my /dev/drbd0 partition (1.6TB total size).

I followed the DRBD User's Guide and did a lot of throughput and latency testing months ago, and arrived at what I thought would be optimal for our uses, but this is something new. My current drbd.conf file is included in this message. The system almost acts like it is getting bogged down while replicating data to the secondary, but the secondary is showing almost no load of any kind.

I have compiled some numbers to make the issue a bit more evident. Using sar, I pulled some numbers on disk activity. These are the numbers I see (sar -b) when both cluster nodes are up and running normally during an rcp data transfer:

                 tps      rtps      wtps   bread/s     bwrtn/s
13:08:07      163.75     46.22    117.53    454.18     1568.13
13:08:12     2627.89     23.90   2603.98    286.85    59270.52
13:08:17     3052.60      8.40   3044.20     76.80    68673.60
13:08:22     3050.89      2.39   3048.51     19.09    68895.43
13:08:27     3035.91     17.26   3018.65    242.86    67758.73
13:08:32     3029.11     15.45   3013.66    123.56    67873.27
13:08:37     2990.12      0.00   2990.12      0.00    68151.78
13:08:42     3038.69     33.33   3005.36    276.19    68150.79
13:08:47     3009.86     14.79   2995.07    250.89    67747.53
13:08:52     3046.34     17.03   3029.31    242.38    68451.88
13:08:57     2962.77      4.16   2958.61     33.27    68069.70
13:09:02     2996.63      3.56   2993.07     28.51    68000.00
13:09:07     2982.57      0.00   2982.57      0.00    67719.60
13:09:12     3008.13      1.19   3006.94      9.52    68449.21
13:09:17     2998.42      0.00   2998.42      0.00    68071.29
13:09:22     1886.36      4.74   1881.62     56.92    35857.71
13:09:27      432.27     17.73    414.54    277.29     4231.47
13:09:32      134.13     32.34    101.80    297.01     1191.22

I then shut down the secondary server to compare what happens when there is no DRBD replication going on, and the numbers shot up dramatically:

                 tps      rtps      wtps   bread/s     bwrtn/s
13:58:14      127.69      1.79    125.90     23.90     1730.68
13:58:19    20386.06     11.75  20374.30    210.36   467987.25
13:58:24    22312.57     98.80  22213.77   1216.77   510267.47
13:58:29      260.44     38.17    222.27    419.88     2874.75

From these numbers you can see how my disk activity shot up dramatically when there was no DRBD replication going on in the background.
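To put the two sar tables in more familiar units, the sketch below converts a few of the `bwrtn/s` samples into megabytes per second. It assumes the block size documented in the sar man page for 2.4 and newer kernels (one block = 512 bytes). The sample values are copied from the post; the function and variable names are made up for illustration.

```python
BLOCK_BYTES = 512  # assumption: sar -b reports 512-byte blocks on 2.4+ kernels

def blocks_to_mb_per_s(blocks_per_s):
    """Convert a sar bwrtn/s value to megabytes (10**6 bytes) per second."""
    return blocks_per_s * BLOCK_BYTES / 1_000_000

# Steady-state bwrtn/s samples with the peer connected (13:08:17 to 13:08:37).
replicated = [68673.60, 68895.43, 67758.73, 67873.27, 68151.78]
# Busy bwrtn/s samples with the secondary shut down (13:58:19 and 13:58:24).
standalone = [467987.25, 510267.47]

rep_avg = sum(replicated) / len(replicated)
sta_avg = sum(standalone) / len(standalone)

print(f"with peer connected: {blocks_to_mb_per_s(rep_avg):.1f} MB/s")
print(f"peer shut down:      {blocks_to_mb_per_s(sta_avg):.1f} MB/s")
print(f"ratio:               {sta_avg / rep_avg:.1f}x")
```

Under that assumption, the replicated case levels off at roughly 35 MB/s while the standalone case reaches roughly 250 MB/s. A gigabit link tops out at 125 MB/s in theory, so the replicated write rate sits well below what the wire alone could carry.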

