[DRBD-user] High system load during rcp data transfers

Mon Feb 15 16:16:17 CET 2010

Hello all,

I have a question that has been bugging me for the past week or so. I have been unable to determine why my cluster is acting this way, so I will throw the question out to the list for your opinions. I have a two-node cluster running CentOS 5.3 x86_64 with DRBD 8.3.6, Heartbeat 3.0.1-1, and Pacemaker 1.0.6, all on a pair of IBM x3500 servers, each with 2 x quad-core 3.0GHz CPUs and 32GB RAM, using 8 x 450GB SAS drives in a RAID10 configuration on the onboard IBM AACRAID (ServeRAID 8k) controllers.

What we are seeing is an extremely high load during rcp data transfers from a remote machine to the cluster. I cannot reproduce this on a test cluster; it only seems to happen on our production machine, and only during the day with 200+ users connected. This machine has a lot of users and activity, but nothing too extreme - we rarely see our 1min load average go over 1 or 2.
The first time it happened, someone in-house transferred about 20GB of data to this cluster from a remote machine using rcp, and the system load on the primary cluster node soared to over 33! I can now reproduce this on demand at any time. After a fresh install and reload of the OS and all our data on both machines, things looked better (still loaded up, but not as bad). Now, a week later, we are seeing the very same thing again. A transfer of 10-30GB of data from elsewhere on the LAN to this cluster causes it to spin out of control on disk I/O (CPU usage is almost non-existent when this happens). I don't see anything posted about possible fragmentation of the DRBD meta-data, so I am trying to forget that the cluster acted more civilized when it was freshly installed. Only about 25% of the available space on my /dev/drbd0 partition (1.6TB total size) is in use.

I followed the DRBD User's Guide and did a lot of throughput and latency testing months ago, and arrived at what I thought would be optimal settings for our uses, but this is something new. Below is my current drbd.conf file. The system almost acts as if it is getting bogged down while replicating data to the secondary, but the secondary is showing almost no load of any kind.

I have compiled some numbers to make the issue a bit more evident. Using sar, I pulled some figures on disk activity.
These are the numbers I see with (sar -b) during an rcp data transfer when both cluster nodes are up and running normally:

Time              tps      rtps      wtps   bread/s   bwrtn/s
13:08:07       163.75     46.22    117.53    454.18   1568.13
13:08:12      2627.89     23.90   2603.98    286.85  59270.52
13:08:17      3052.60      8.40   3044.20     76.80  68673.60
13:08:22      3050.89      2.39   3048.51     19.09  68895.43
13:08:27      3035.91     17.26   3018.65    242.86  67758.73
13:08:32      3029.11     15.45   3013.66    123.56  67873.27
13:08:37      2990.12      0.00   2990.12      0.00  68151.78
13:08:42      3038.69     33.33   3005.36    276.19  68150.79
13:08:47      3009.86     14.79   2995.07    250.89  67747.53
13:08:52      3046.34     17.03   3029.31    242.38  68451.88
13:08:57      2962.77      4.16   2958.61     33.27  68069.70
13:09:02      2996.63      3.56   2993.07     28.51  68000.00
13:09:07      2982.57      0.00   2982.57      0.00  67719.60
13:09:12      3008.13      1.19   3006.94      9.52  68449.21
13:09:17      2998.42      0.00   2998.42      0.00  68071.29
13:09:22      1886.36      4.74   1881.62     56.92  35857.71
13:09:27       432.27     17.73    414.54    277.29   4231.47
13:09:32       134.13     32.34    101.80    297.01   1191.22

I then shut down the secondary server to compare what happens when there is no DRBD replication going on, and the numbers shot up dramatically:

Time              tps      rtps      wtps   bread/s    bwrtn/s
13:58:14       127.69      1.79    125.90     23.90    1730.68
13:58:19     20386.06     11.75  20374.30    210.36  467987.25
13:58:24     22312.57     98.80  22213.77   1216.77  510267.47
13:58:29       260.44     38.17    222.27    419.88    2874.75

From these numbers you can see how much higher my disk activity is when no DRBD replication is going on in the background.
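To put the two sar runs into more familiar units, here is a minimal sketch that converts the bwrtn/s figures to MB/s. It assumes that one sar "block" is a 512-byte sector (which the sar man page states for 2.4 and newer kernels) and uses the raw 1000 Mbit/s line rate, ignoring protocol overhead, as a rough ceiling for the replication link. The sample values are copied from the tables above; the function and variable names are only illustrative.

```python
# Convert sar -b block rates to MB/s.
# Assumption: one sar "block" is a 512-byte sector (2.4+ kernels).
SECTOR_BYTES = 512


def blocks_to_mb_per_s(blocks_per_s: float) -> float:
    """Return throughput in decimal megabytes per second."""
    return blocks_per_s * SECTOR_BYTES / 1_000_000


# Representative bwrtn/s samples copied from the two sar runs above.
with_replication = 68000.00      # 13:09:02, secondary connected
without_replication = 510267.47  # 13:58:24, secondary shut down

replicated_mb = blocks_to_mb_per_s(with_replication)
standalone_mb = blocks_to_mb_per_s(without_replication)

# Assumption: raw line rate of a 1000 Mbit/s link, no protocol overhead.
GIGABIT_MB_PER_S = 1000 / 8

print(f"with replication:    {replicated_mb:.1f} MB/s")
print(f"without replication: {standalone_mb:.1f} MB/s")
print(f"ratio:               {standalone_mb / replicated_mb:.1f}x")
print(f"share of raw gigabit rate while replicating: "
      f"{replicated_mb / GIGABIT_MB_PER_S:.0%}")
```

Under those assumptions, the connected case works out to roughly 35 MB/s of writes against roughly 261 MB/s with the secondary down, so the replicated write rate sits well below what a gigabit link could carry in principle.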

The first thing I looked into was enabling jumbo frames on the replication link, but my NICs don't seem to support them: I cannot set an MTU above 1500. When I did some legwork on optimization in the past, the deadline scheduler didn't seem to help at all. Does anyone have any idea why I get such an odd bottleneck here? It almost seems as if the server is falling behind on DRBD replication, and that this is what loads up the system. After the rcp ends, it takes the system several minutes to calm down to a "normal" load again.

We decided on protocol A early on to minimize this effect. From what I can tell, the choice of protocol is the only thing that should affect replication speed. We are using a direct connection between the secondary onboard 10/100/1000 NICs, so there is nothing else on that interface other than secondary heartbeat communications between the machines. Any insight into this issue would be greatly appreciated!

global { usage-count yes; }
common {
 startup {
  # If a cluster starts up in degraded mode, it will echo a message to all
  # users. It'll wait 60 seconds then halt the system.
  wfc-timeout 120; # drbd init script will wait infinitely on resources.
  degr-wfc-timeout 120; # 2 minutes.
 }
 syncer {
  rate 100M; # Sync rate, in megabytes. 10M is good for 100Mb network.
  verify-alg md5; # can also use md5, crc32c, etc.
  csums-alg md5;  # can also use md5, crc32c, etc.
  al-extents 3833; # Must be prime, number of active sets.
 }
 handlers {
  pri-on-incon-degr "/usr/local/bin/support_drbd_deg";
  split-brain "/usr/local/bin/support_drbd_sb";
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
 }
 disk {
  on-io-error detach; # What to do when the lower level device errors.
  no-disk-barrier;
  no-disk-flushes;
  no-disk-drain;
  no-md-flushes;
  fencing resource-only;
 }
 net {
  unplug-watermark 8192; 
  max-buffers 8192; 
  max-epoch-size 8192; 
  sndbuf-size 512k;
  rcvbuf-size 0;
  ko-count 4; # Peer is dead if this count is exceeded.
  after-sb-0pri           discard-zero-changes;
  after-sb-1pri           discard-secondary;
  after-sb-2pri           disconnect;
 }
}
resource drbd0 {
 protocol A;
 device /dev/drbd0;
 disk /dev/sda4;
 meta-disk internal;
 on supportHA1 {
  address 10.0.0.1:7789;
 }
 on supportHA2 {
  address 10.0.0.2:7789;
 }
}
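
For readers going through the config, here is a rough sketch of how much storage the al-extents setting above keeps "hot" in the DRBD activity log, compared with the transfer sizes mentioned earlier. It assumes that each activity-log extent covers 4 MiB of backing storage and that the GB figures in this post are decimal gigabytes; the helper names are hypothetical. Whether this has anything to do with the load seen here is not established; it is only arithmetic on the posted values.

```python
# Rough size of the DRBD activity-log "hot area" for the posted config.
# Assumption: each activity-log extent covers 4 MiB of backing storage.
AL_EXTENT_MIB = 4
AL_EXTENTS = 3833  # al-extents value from the syncer section above


def hot_area_gib(extents: int, extent_mib: int = AL_EXTENT_MIB) -> float:
    """Return the storage covered by the activity log, in GiB."""
    return extents * extent_mib / 1024


def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes to GiB (assumes the post's GB are decimal)."""
    return gigabytes * 1e9 / 2**30


covered = hot_area_gib(AL_EXTENTS)
print(f"activity log covers about {covered:.2f} GiB")

# Transfer sizes mentioned in the post.
for size_gb in (10, 20, 30):
    size_gib = gb_to_gib(size_gb)
    verdict = "fits within" if size_gib <= covered else "exceeds"
    print(f"{size_gb} GB transfer ({size_gib:.1f} GiB) {verdict} the hot area")
```

With these assumptions the activity log covers just under 15 GiB, so a 10GB transfer would fit inside it while the 20GB and 30GB transfers would not.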

Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
kend at medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
"You canna change the laws of physics, Captain; I've got to have thirty minutes!"
