We have a 1.8 TB software-RAID system that is DRBD'd across to an identical
system. The initial sync takes 7 hours or so, which doesn't seem out of the
ordinary. However, after everything is synced up, if we reboot the
boxes the resync starts all over again on the entire disk. Is this normal
behavior? Am I doing something wrong?
On box 1 we do:
modprobe drbd
drbdadm up all
Then on box 2 we do:
modprobe drbd
drbdadm up all
Then back on box 1 we set it to Primary:
drbdsetup /dev/drbd0 primary
----------------
[root@ydog1 root]# cat /proc/drbd
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by root@ydog1.hostname.net, 2005-02-27 00:38:31
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:122979178 nr:0 dw:326 dr:122987125 al:3 bm:128976 lo:1 pe:2111
ua:2028 ap:0
[=>..................] sync'ed: 6.2% (1823439/1943527)M
finish: 7:08:24 speed: 72,572 (67,012) K/sec
1: cs:Unconfigured
-----------------
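As a rough cross-check of the finish estimate in the output above, here is a minimal Python sketch that pulls the remaining size and current speed out of the two progress lines and recomputes the time left. It assumes the "M" figures are MiB and the "K/sec" figures are KiB/sec; the function names are made up for illustration and are not part of DRBD.

```python
import re

# The two progress lines copied from the /proc/drbd output above.
SAMPLE = """[=>..................] sync'ed: 6.2% (1823439/1943527)M
finish: 7:08:24 speed: 72,572 (67,012) K/sec"""


def parse_sync_status(text):
    """Return (remaining_mb, total_mb, speed_kb_per_s) from /proc/drbd text."""
    progress = re.search(r"\((\d+)/(\d+)\)M", text)
    speed = re.search(r"speed:\s*([\d,]+)", text)
    if not progress or not speed:
        raise ValueError("no sync progress found in input")
    remaining_mb = int(progress.group(1))
    total_mb = int(progress.group(2))
    speed_kb = int(speed.group(1).replace(",", ""))
    return remaining_mb, total_mb, speed_kb


def estimate_seconds(remaining_mb, speed_kb):
    """Seconds left, assuming 1 M = 1024 K and a constant sync speed."""
    return remaining_mb * 1024 / speed_kb


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    remaining, total, speed = parse_sync_status(SAMPLE)
    done_pct = 100 * (1 - remaining / total)
    print(f"synced {done_pct:.1f}%, about {format_hms(estimate_seconds(remaining, speed))} left")
```

With the numbers above, the recomputed estimate lands within about half a minute of the 7:08:24 that DRBD printed, so the 7-hour figure is consistent with a full sync of the whole device at the reported speed.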
skip {
As you can see, you can also comment chunks of text
with a 'skip[optional nonsense]{ skipped text }' section.
This comes in handy, if you just want to comment out
some 'resource <some name> {...}' section:
just precede it with 'skip'.
The basic format of option assignment is
<option name><linear whitespace><value>;
It should be obvious from the examples below,
but if you really care to know the details:
<option name> :=
valid options in the respective scope
<value> := <num>|<string>|<choice>|...
depending on the set of allowed values
for the respective option.
<num> := [0-9]+, sometimes with an optional suffix of K,M,G
<string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
<name> := [/_.A-Za-z0-9-]+
}
#
# At most ONE global section is allowed.
# It must precede any resource section.
#
# global {
# use this if you want to define more resources later
# without reloading the module.
# by default we load the module with exactly as many devices
# as are configured in this file.
#
# minor-count 5;
# The user dialog counts and displays the seconds it waited so
# far. You might want to disable this if you have the console
# of your server connected to a serial terminal server with
# limited logging capacity.
# The Dialog will print the count each 'dialog-refresh' seconds,
# set it to 0 to disable redrawing completely. [ default = 1 ]
#
# dialog-refresh 5; # 5 seconds
# this is for people who set up a drbd device via the
# loopback network interface or between two VMs on the same
# box, for testing/simulating/presentation
# otherwise it could trigger a run_task_queue deadlock.
# I'm not sure whether this deadlock can happen with two
# nodes, but it seems at least extremely unlikely; and since
# the io_hints boost performance, keep them enabled.
#
# With linux 2.6 it no longer makes sense.
# So this option should vanish. --lge
#
# disable-io-hints;
# }
#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#
resource r0 {
# transfer protocol to use.
# C: write IO is reported as completed, if we know it has
# reached _both_ local and remote DISK.
# * for critical transactional data.
# B: write IO is reported as completed, if it has reached
# local DISK and remote buffer cache.
# * for most cases.
# A: write IO is reported as completed, if it has reached
# local DISK and local tcp send buffer. (see also sndbuf-size)
# * for high latency networks
#
#**********
# uhm, benchmarks have shown that C is actually better than B.
# this note shall disappear, when we are convinced that B is
# the right choice "for most cases".
# Until then, always use C unless you have a reason not to.
# --lge
#**********
#
protocol C;
# what should be done in case the cluster starts up in
# degraded mode, but knows it has inconsistent data.
incon-degr-cmd "halt -f";
startup {
# Wait for connection timeout.
# The init script blocks the boot process until the resources
# are connected.
# In case you want to limit the wait time, do it here.
#
# wfc-timeout 0;
# Wait for connection timeout if this node was a degraded cluster.
# In case a degraded cluster (= cluster with only one node left)
# is rebooted, this timeout value is used.
#
degr-wfc-timeout 120; # 2 minutes.
}
disk {
# if the lower level device reports io-error you have the choice of
# "pass_on" -> Report the io-error to the upper layers.
# Primary -> report it to the mounted file system.
# Secondary -> ignore it.
# "panic" -> The node leaves the cluster by doing a kernel panic.
# "detach" -> The node drops its backing storage device, and
# continues in disk less mode.
#
on-io-error detach;
}
net {
# this is the size of the tcp socket send buffer
# increase it _carefully_ if you want to use protocol A over a
# high latency network with reasonable write throughput.
# defaults to 2*65535; you might try even 1M, but if your kernel or
# network driver chokes on that, you have been warned.
# sndbuf-size 512k;
timeout 60; # 6 seconds (unit = 0.1 seconds)
connect-int 10; # 10 seconds (unit = 1 second)
ping-int 10; # 10 seconds (unit = 1 second)
# Maximal number of requests (4K) to be allocated by DRBD.
# The minimum is hardcoded to 32 (=128 KB).
# For high performance installations it might help if you
# increase that number. These buffers are used to hold
# datablocks while they are written to disk.
#
max-buffers 2048;
# The highest number of data blocks between two write barriers.
# If you set this < 10 you might decrease your performance.
max-epoch-size 2048;
# if some block send times out this many times, the peer is
# considered dead, even if it still answers ping requests.
ko-count 4;
# if the connection to the peer is lost you have the choice of
# "reconnect" -> Try to reconnect (AKA WFConnection state)
# "stand_alone" -> Do not reconnect (AKA StandAlone state)
# "freeze_io" -> Try to reconnect but freeze all IO until
# the connection is established again.
on-disconnect reconnect;
}
syncer {
# Limit the bandwidth used by the resynchronisation process.
# default unit is KB/sec; optional suffixes K,M,G are allowed
#
rate 500M;
# All devices in one group are resynchronized in parallel.
# Resynchronisation of groups is serialized in ascending order.
# Put DRBD resources which are on different physical disks in one group.
# Put DRBD resources on one physical disk in different groups.
#
group 1;
# Configures the size of the active set. Each extent is 4M,
# 257 Extents ~> 1GB active set size. In case your syncer
# runs @ 10MB/sec, all resync after a primary's crash will last
# 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
# BTW, the hash algorithm works best if the number of al-extents
# is prime. (To test the worst case performance use a power of 2)
al-extents 257;
}
on ydog1.hostname.net {
device /dev/drbd0;
disk /dev/md0;
address 192.168.111.102:7788;
meta-disk internal;
# meta-disk is either 'internal' or '/dev/ice/name [idx]'
#
# You can use a single block device to store meta-data
# of multiple DRBD's.
# E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
# for two different resources. In this case the meta-disk
# would need to be at least 256 MB in size.
#
# 'internal' means, that the last 128 MB of the lower device
# are used to store the meta-data.
# You must not give an index with 'internal'.
}
on ydog2.hostname.net {
device /dev/drbd0;
disk /dev/md0;
address 192.168.111.103:7788;
meta-disk internal;
}
}
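
For reference, the arithmetic quoted in the config comments above can be checked with a minimal Python sketch. The unit sizes (4M per al-extent, 4K per buffer, 0.1 seconds per timeout tick) are taken from those comments; the constant and function names are hypothetical and only there for illustration.

```python
EXTENT_MB = 4         # al-extents comment: "Each extent is 4M"
BUFFER_KB = 4         # max-buffers comment: "requests (4K)"
TIMEOUT_TICKS_PER_S = 10  # net section: timeout unit = 0.1 seconds


def active_set_mb(al_extents):
    """Size of the active set in MB for a given al-extents value."""
    return al_extents * EXTENT_MB


def resync_seconds(al_extents, syncer_mb_per_s):
    """Time to resync the whole active set at a constant syncer rate."""
    return active_set_mb(al_extents) / syncer_mb_per_s


def buffer_memory_kb(max_buffers):
    """Memory covered by max-buffers requests, in KB."""
    return max_buffers * BUFFER_KB


def timeout_seconds(timeout):
    """Convert the net 'timeout' value (tenths of a second) to seconds."""
    return timeout / TIMEOUT_TICKS_PER_S


if __name__ == "__main__":
    print("active set for al-extents 257:", active_set_mb(257), "MB")
    print("resync at 10 MB/sec:", resync_seconds(257, 10), "seconds")
    print("max-buffers 2048:", buffer_memory_kb(2048), "KB")
    print("timeout 60:", timeout_seconds(60), "seconds")
```

This only reproduces the figures in the comments (257 extents is roughly a 1 GB active set, about 102 seconds at 10 MB/sec); it says nothing about why a full resync would start after a reboot.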