[DRBD-user] High Load issue repost - legible?

Ken Dechick kend at medent.com
Thu Feb 18 20:52:30 CET 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello all,

Not sure what happened to my previous post - it almost looks like I posted in
HTML somehow. I am reposting so this is legible, and double-checking that I am
in plain-text mode this time...

I have a question that has been bugging me for the past week or so. I have been
unable to determine why my cluster is acting this way, so I will throw it out
here to the list for your opinions.

I have a 2-node cluster running CentOS 5.3 x86_64 with DRBD 8.3.6, Heartbeat
3.0.1-1, and Pacemaker 1.0.6, on a pair of IBM x3500 servers. Each server has
2 x quad-core 3.0GHz CPUs, 32GB RAM, and 8 x 450GB SAS drives in a RAID10
configuration on the onboard IBM ServeRAID 8k (aacraid) controller.

What we are seeing is an extremely high load during an rcp data transfer from a
remote machine to the cluster. I cannot reproduce this on a test cluster; it
only seems to happen on our production machine, and only during the day with
150+ users connected. This machine has a lot of users and activity, but nothing
too extreme - we rarely see our 1-minute load average go over 1 or 2. When the
load shoots up it is all I/O wait, which gets as high as 30%, with almost no
other CPU activity at all (and we have 16 cores per server!).
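
For reference, this is roughly how I am watching it while a transfer runs -
nothing special, just the standard sysstat tools at 5-second intervals:

  # CPU breakdown; the %iowait column is what spikes
  sar -u 5

  # per-device utilization and queue sizes on the backing array
  iostat -x 5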

The first time we had someone in-house transfer about 20GB of data to this
cluster from a remote machine using rcp, the system load on the primary cluster
node soared to over 33! I can now reproduce this on demand at any time. After a
fresh install and reload of the OS and all our data on both machines, things
looked better (still loaded up, but not as badly).
Now, a week later, we are seeing the very same thing again. Transferring
10-30GB of data from elsewhere on the LAN to this cluster causes it to spin
out of control on disk I/O (CPU usage is almost non-existent when this happens).
I don't see anything posted about possible fragmentation of the DRBD meta-data,
so I am trying not to read too much into the fact that the cluster behaved
better when it was freshly installed. I have only about 25% of the available
space in use on my /dev/drbd0 partition (1.6TB total size).
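
To trigger it I just push a big file at the cluster with rcp. The hostname and
paths below are made up - /data standing in for wherever /dev/drbd0 is mounted -
but it boils down to:

  # any 10-30GB copy onto the drbd0-backed filesystem reproduces the load spike
  rcp someuser@remotebox:/tmp/bigdump.tar /data/incoming/

  # the filesystem itself is only ~25% used
  df -h /data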

I followed the DRBD User's Guide and did a lot of throughput and latency
testing months ago, arriving at what I thought would be optimal settings for
our use, but this is something new. My current drbd.conf is below. The system
almost acts like it is getting bogged down while replicating data to the
secondary, yet the secondary shows almost no load of any kind.
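
For what it is worth, the testing I did back then was essentially the dd-based
throughput and latency tests from the User's Guide, something along these lines
(the /data test path is again just a stand-in):

  # sequential write throughput to the drbd0-backed filesystem, bypassing the page cache
  dd if=/dev/zero of=/data/dd-test.bin bs=512M count=1 oflag=direct

  # rough latency check: many small synchronous writes
  dd if=/dev/zero of=/data/dd-test.bin bs=512 count=1000 oflag=direct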

I have compiled some numbers to make the issue a bit more evident. Using sar I
pulled statistics on disk activity.
These are the numbers I see (sar -b) during an rcp data transfer when both
cluster nodes are up and running normally:

                  tps      rtps      wtps   bread/s   bwrtn/s
13:08:07       163.75     46.22    117.53    454.18   1568.13
13:08:12      2627.89     23.90   2603.98    286.85  59270.52
13:08:17      3052.60      8.40   3044.20     76.80  68673.60
13:08:22      3050.89      2.39   3048.51     19.09  68895.43
13:08:27      3035.91     17.26   3018.65    242.86  67758.73
13:08:32      3029.11     15.45   3013.66    123.56  67873.27
13:08:37      2990.12      0.00   2990.12      0.00  68151.78
13:08:42      3038.69     33.33   3005.36    276.19  68150.79
13:08:47      3009.86     14.79   2995.07    250.89  67747.53
13:08:52      3046.34     17.03   3029.31    242.38  68451.88
13:08:57      2962.77      4.16   2958.61     33.27  68069.70
13:09:02      2996.63      3.56   2993.07     28.51  68000.00
13:09:07      2982.57      0.00   2982.57      0.00  67719.60
13:09:12      3008.13      1.19   3006.94      9.52  68449.21
13:09:17      2998.42      0.00   2998.42      0.00  68071.29
13:09:22      1886.36      4.74   1881.62     56.92  35857.71
13:09:27       432.27     17.73    414.54    277.29   4231.47
13:09:32       134.13     32.34    101.80    297.01   1191.22
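
(Both sar tables in this mail were collected at 5-second intervals on the
primary node while the copy was running, i.e. simply:

  sar -b 5

nothing else special.)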

I then shut down the secondary server to compare what happens when there is no
DRBD replication going on, and the numbers shot up dramatically:

                    tps      rtps      wtps   bread/s    bwrtn/s
13:58:14         127.69      1.79    125.90     23.90    1730.68
13:58:19       20386.06     11.75  20374.30    210.36  467987.25
13:58:24       22312.57     98.80  22213.77   1216.77  510267.47
13:58:29         260.44     38.17    222.27    419.88    2874.75


From these numbers you can see how my disk activity shot up dramatically when
there was no DRBD replication going on in the background.

The first thing I looked into was enabling jumbo frames on the replication
link, but my NICs don't seem to support this - I cannot set an MTU above 1500.
In the past, when I was doing some legwork on optimization, the deadline
scheduler didn't seem to help at all.
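
In case it matters, my jumbo frame attempt looked roughly like this - eth2 is
just a stand-in for whatever your replication interface happens to be called:

  # current link settings, offloads and MTU on the replication NIC
  ethtool eth2
  ethtool -k eth2
  ip link show eth2

  # this is where anything above 1500 gets refused on these BCM5721s
  ip link set eth2 mtu 9000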
Does anyone have any ideas why I get such an odd bottleneck here? It almost
seems like the server is falling behind on DRBD replication, which appears to
be what is loading up the system. After the rcp ends, it takes the system
several minutes to calm down to a "normal" load again.
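
One guess on my part: if that slow settling is just the box draining dirty
pages / pending writeback after the copy, it should be visible here - I have
not confirmed this yet:

  # watch the dirty and writeback counters drain after the rcp finishes
  watch -n 1 'grep -e "^Dirty" -e "^Writeback" /proc/meminfo'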
We decided on protocol A early in our testing because it provided better speed.
From what I can tell, the replication protocol is the only setting that should
affect replication speed.
We are using a direct connection between the secondary onboard 10/100/1000
NICs, so there is nothing on that interface other than DRBD replication and the
secondary heartbeat communication between the machines.

Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express
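
Two checks I can run during a transfer, in case the output helps anyone spot
something: raw throughput of the replication link (this assumes iperf is
installed; it is not by default), and whether DRBD itself is piling up
out-of-sync blocks while protocol A buffers writes:

  # on the secondary (10.0.0.2):
  iperf -s

  # on the primary, across the dedicated replication link:
  iperf -c 10.0.0.2

  # connection state, ns:/nr: counters and the oos: field during the rcp
  watch -n 1 cat /proc/drbd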

Sorry for the long post - I am trying to give as much detail as possible. Any
insight into this issue would be greatly appreciated!

 
common {
 startup {
  # How long the drbd init script waits for the peer before continuing.
  wfc-timeout 120;      # wait up to 2 minutes for a connection on normal startup.
  degr-wfc-timeout 120; # wait up to 2 minutes when starting a previously degraded cluster.
 }
 syncer {
  rate 100M;       # Resync rate (100M = 100MByte/s); 10M is reasonable for a 100Mbit network.
  verify-alg md5;  # can also use sha1, crc32c, etc.
  csums-alg md5;   # can also use sha1, crc32c, etc.
  al-extents 3833; # Number of activity-log extents; a prime number is recommended.
 }
 handlers {
  pri-on-incon-degr "/usr/local/bin/support_drbd_deg";
  split-brain "/usr/local/bin/support_drbd_sb";
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
 }
 disk {
  on-io-error detach; # Detach from the lower-level device when it reports I/O errors.
  # The no-* options below disable write barriers, flushes and drains on the
  # backing device and meta-data; normally only recommended with a battery-backed
  # write cache on the controller.
  no-disk-barrier;
  no-disk-flushes;
  no-disk-drain;
  no-md-flushes;
  fencing resource-only;
 }
 net {
  unplug-watermark 8192;
  max-buffers 8192;
  max-epoch-size 8192;
  sndbuf-size 512k;
  rcvbuf-size 0;
  ko-count 4; # Peer is dead if this count is exceeded.
  after-sb-0pri           discard-zero-changes;
  after-sb-1pri           discard-secondary;
  after-sb-2pri           disconnect;
 }
}
resource drbd0 {
 protocol A;
 device /dev/drbd0;
 disk /dev/sda4;
 meta-disk internal;
 on supportHA1 {
  address 10.0.0.1:7789;
 }
 on supportHA2 {
  address 10.0.0.2:7789;
 }
} 
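
In case the file above does not match what is actually live in the kernel, this
is how I have been double-checking the values in effect (drbd0 is my only
resource):

  # canonical view of the parsed configuration
  drbdadm dump drbd0

  # settings currently active in the kernel for the device
  drbdsetup /dev/drbd0 show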

Does anyone have any suggestions before I consider Linbit support?


Thanks!


 Kenneth M DeChick 
 Linux Systems Administrator 
 Community Computer Service, Inc. 
 (315)-255-1751 ext154 
 http://www.medent.com 
 kend at medent.com 
 Registered Linux User #497318 
 -- -- -- -- -- -- -- -- -- -- -- 
 "You canna change the laws of physics, Captain; I've got to have thirtyminutes! "

