[DRBD-user] ReSync...

Thu Mar 3 18:58:30 CET 2005

We have a 1.8T Software Raided system that is DRBD'd across to an identical 
system... it takes 7 hrs or so to Sync.  Which doesnt seem out of the 
ordinary.  However - after all is Synced up and done  - if we reboot the 
boxes - the ReSync starts all over on the entire disk.  Is this normal 
behavior?  Am i doing something wrong?

Box 1 we do:

  modprobe drbd
  drbdadm up all

Then box 2 we do:

  modprobe drbd
  drbdadm up all

then back on box 1 we set it to Primary:

 drbdsetup /dev/drbd0 primary

----------------

[root at ydog1 root]# cat /proc/drbd
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by root at ydog1.hostname.net, 2005-02-27 00:38:31
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:122979178 nr:0 dw:326 dr:122987125 al:3 bm:128976 lo:1 pe:2111 
ua:2028 ap:0
        [=>..................] sync'ed:  6.2% (1823439/1943527)M
        finish: 7:08:24 speed: 72,572 (67,012) K/sec
 1: cs:Unconfigured

-----------------

skip {
  As you can see, you can also comment chunks of text
  with a 'skip[optional nonsense]{ skipped text }' section.
  This comes in handy, if you just want to comment out
  some 'resource <some name> {...}' section:
  just precede it with 'skip'.

  The basic format of option assignment is
  <option name><linear whitespace><value>;

  It should be obvious from the examples below,
  but if you really care to know the details:

  <option name> :=
        valid options in the respective scope
  <value>  := <num>|<string>|<choice>|...
              depending on the set of allowed values
              for the respective option.
  <num>    := [0-9]+, sometimes with an optional suffix of K,M,G
  <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
  <name>   := [/_.A-Za-z0-9-]+
}

#
# At most ONE global section is allowed.
# It must precede any resource section.
#
# global {
    # use this if you want to define more resources later
    # without reloading the module.
    # by default we load the module with exactly as many devices
    # as configured mentioned in this file.
    #
    # minor-count 5;

    # The user dialog counts and displays the seconds it waited so
    # far. You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
    # dialog-refresh 5; # 5 seconds

    # this is for people who set up a drbd device via the
    # loopback network interface or between two VMs on the same
    # box, for testing/simulating/presentation
    # otherwise it could trigger a run_tasq_queue deadlock.
    # I'm not sure whether this deadlock can happen with two
    # nodes, but it seems at least extremely unlikely; and since
    # the io_hints boost performance, keep them enabled.
    #
    # With linux 2.6 it no longer makes sense.
    # So this option should vanish.     --lge
    #
    # disable-io-hints;
# }

#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#

resource r0 {

  # transfer protocol to use.
  # C: write IO is reported as completed, if we know it has
  #    reached _both_ local and remote DISK.
  #    * for critical transactional data.
  # B: write IO is reported as completed, if it has reached
  #    local DISK and remote buffer cache.
  #    * for most cases.
  # A: write IO is reported as completed, if it has reached
  #    local DISK and local tcp send buffer. (see also sndbuf-size)
  #    * for high latency networks
  #
  #**********
  # uhm, benchmarks have shown that C is actually better than B.
  # this note shall disappear, when we are convinced that B is
  # the right choice "for most cases".
  # Until then, always use C unless you have a reason not to.
  #     --lge
  #**********
  #
  protocol C;

  # what should be done in case the cluster starts up in
  # degraded mode, but knows it has inconsistent data.
  incon-degr-cmd "halt -f";

  startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected.
    # In case you want to limit the wait time, do it here.
    #
    # wfc-timeout  0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    # if the lower level device reports io-error you have the choice of
    #  "pass_on"  ->  Report the io-error to the upper layers.
    #                 Primary   -> report it to the mounted file system.
    #                 Secondary -> ignore it.
    #  "panic"    ->  The node leaves the cluster by doing a kernel panic.
    #  "detach"   ->  The node drops its backing storage device, and
    #                 continues in disk less mode.
    #
    on-io-error   detach;
  }

  net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    # sndbuf-size 512k;

     timeout       60;    #  6 seconds  (unit = 0.1 seconds)
     connect-int   10;    # 10 seconds  (unit = 1 second)
     ping-int      10;    # 10 seconds  (unit = 1 second)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kb).
    # For hight performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
     max-buffers     2048;

    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
     max-epoch-size  2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
     ko-count 4;

    # if the connection to the peer is lost you have the choice of
    #  "reconnect"   -> Try to reconnect (AKA WFConnection state)
    #  "stand_alone" -> Do not reconnect (AKA StandAlone state)
    #  "freeze_io"   -> Try to reconnect but freeze all IO until
    #                   the connection is established again.
     on-disconnect reconnect;

  }

  syncer {
    # Limit the bandwith used by the resynchronisation process.
    # default unit is KB/sec; optional suffixes K,M,G are allowed
    #
    rate 500M;

    # All devices in one group are resynchronized parallel.
    # Resychronisation of groups is serialized in ascending order.
    # Put DRBD resources which are on different physical disks in one group.
    # Put DRBD resources on one physical disk in different groups.
    #
    group 1;

    # Configures the size of the active set. Each extent is 4M,
    # 257 Extents ~> 1GB active set size. In case your syncer
    # runs @ 10MB/sec, all resync after a primary's crash will last
    # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
    # BTW, the hash algorithm works best if the number of al-extents
    # is prime. (To test the worst case performace use a power of 2)
    al-extents 257;
  }

  on ydog1.hostname.net {
    device     /dev/drbd0;
    disk       /dev/md0;
    address   192.168.111.102:7788;
    meta-disk  internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
  }

  on ydog2.hostname.net {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.111.103:7788;
    meta-disk internal;
  }
}