Sat Nov 21 08:32:40 CET 2009


there was no DRBD replication going on in the background.

The first thing I looked into was enabling jumbo frames on the replication
link, but my NICs don't seem to support this - I cannot set an MTU above
1500. When I did some optimization legwork in the past, the deadline I/O
scheduler didn't seem to help me out at all.
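For reference, this is roughly what that boils down to on the command line
(eth1 and sda below are just placeholders for the actual replication
interface and backing disk):

  ip link show eth1                              # current MTU on the replication link
  ip link set dev eth1 mtu 9000                  # fails on this hardware for anything above 1500
  cat /sys/block/sda/queue/scheduler             # currently active I/O scheduler
  echo deadline > /sys/block/sda/queue/scheduler # switch to deadline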
Does anyone have any ideas why I hit such an odd bottleneck here? It almost
seems like the server is falling behind on DRBD replication, which then
drives the system load up. After the rcp ends, it takes the system several
minutes to settle back down to a "normal" load.
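If specific numbers would help, I can capture output from the following
while the rcp is running:

  cat /proc/drbd   # connection state, resync progress, send/receive counters
  iostat -x 1      # per-device latency and utilization, once per second
  vmstat 1         # run queue, I/O wait and memory pressure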
We decided on protocol A early in our testing because it gave better speed.
From what I can tell, the protocol choice is the only setting that should
affect replication speed.
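For completeness, the protocol is set per resource, so comparing against
fully synchronous replication would only mean changing one line (sketched
against the resource further down; protocol C acknowledges a write only
after it reaches the peer's disk, while A acknowledges once it is in the
local TCP send buffer):

 resource drbd0 {
  protocol C; # instead of A, for a comparison run
  ...
 }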
We are using a direct connection between the secondary onboard 10/100/1000
NICs, so there is nothing else on that interface other than the secondary
heartbeat traffic between the two machines.

Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express
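In case it matters, I can also post the negotiated speed/duplex and offload
settings for that link; roughly (eth1 again standing in for the replication
NIC):

  ethtool eth1    # negotiated speed and duplex
  ethtool -k eth1 # offload settings (TSO, checksum offload)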

Sorry for the long post - I'm trying to include as much detail as possible.
Any insight into this issue would be greatly appreciated!

 
common {
 startup {
  # How long the drbd init script waits for the peer before continuing.
  wfc-timeout 120; # wait up to 120 seconds for a connection at startup.
  degr-wfc-timeout 120; # 2 minutes if the cluster was already degraded before the reboot.
 }
 syncer {
  rate 100M; # Resync rate; 100M = 100 MByte/s (10M would suit a 100 Mbit network).
  verify-alg md5; # digest for online verify; sha1, crc32c, etc. also work.
  csums-alg md5;  # digest for checksum-based resync; sha1, crc32c, etc. also work.
  al-extents 3833; # Number of activity-log extents; a prime number is recommended.
 }
 handlers {
  pri-on-incon-degr "/usr/local/bin/support_drbd_deg";
  split-brain "/usr/local/bin/support_drbd_sb";
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  # Only one fence-peer handler should be configured; drbd-peer-outdater
  # (dopd) would be the heartbeat-based alternative to crm-fence-peer.sh:
  # fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
 }
 disk {
  on-io-error detach; # What to do when the lower level device errors.
  no-disk-barrier; # Disable barriers, flushes and drains on the backing device -
  no-disk-flushes; # only safe when the controller has a battery-backed write cache.
  no-disk-drain;
  no-md-flushes;   # Same for the metadata area.
  fencing resource-only;
 }
 net {
  unplug-watermark 8192; # Kick the backing device's queue once this many requests are pending.
  max-buffers 8192;      # Buffers DRBD may allocate for writing data to disk.
  max-epoch-size 8192;   # Max number of write requests between two write barriers.
  sndbuf-size 512k;      # TCP send buffer for the replication socket.
  rcvbuf-size 0;         # 0 lets the kernel auto-tune the receive buffer.
  ko-count 4; # Peer is dead if this count is exceeded.
  after-sb-0pri           discard-zero-changes; # Split-brain, no primary: overwrite the node without changes.
  after-sb-1pri           discard-secondary;    # One primary: discard the secondary's changes.
  after-sb-2pri           disconnect;           # Two primaries: do not auto-resolve, just disconnect.
 }
}
resource drbd0 {
 protocol A;
 device /dev/drbd0;
 disk /dev/sda4;
 meta-disk internal;
 on supportHA1 {
  address 10.0.0.1:7789;
 }
 on supportHA2 {
  address 10.0.0.2:7789;
 }
} 
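One more data point I can collect is raw TCP throughput over the
replication link, to rule the network itself in or out - for example with
iperf between the two nodes:

  iperf -s                 # on supportHA1 (10.0.0.1)
  iperf -c 10.0.0.1 -t 30  # on supportHA2, 30-second test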

Does anyone have any suggestions before I consider Linbit support?


Thanks!


 Kenneth M DeChick 
 Linux Systems Administrator 
 Community Computer Service, Inc. 
 (315)-255-1751 ext154 
 http://www.medent.com 
 kend at medent.com 
 Registered Linux User #497318 
 -- -- -- -- -- -- -- -- -- -- -- 
 "You canna change the laws of physics, Captain; I've got to have thirty minutes!"
