We have a 1.8T software-RAID system that is DRBD'd across to an identical system. It takes 7 hrs or so to sync, which doesn't seem out of the ordinary.

However, after everything is synced up and done, if we reboot the boxes, the resync starts all over again on the entire disk.

Is this normal behavior? Am I doing something wrong?

On box 1 we do:

modprobe drbd
drbdadm up all

Then on box 2 we do:

modprobe drbd
drbdadm up all

Then back on box 1 we set it to Primary:

drbdsetup /dev/drbd0 primary

----------------
[root@ydog1 root]# cat /proc/drbd
version: 0.7.10 (api:77/proto:74)
SVN Revision: 1743 build by root@ydog1.hostname.net, 2005-02-27 00:38:31
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:122979178 nr:0 dw:326 dr:122987125 al:3 bm:128976 lo:1 pe:2111 ua:2028 ap:0
        [=>..................] sync'ed:  6.2% (1823439/1943527)M
        finish: 7:08:24 speed: 72,572 (67,012) K/sec
 1: cs:Unconfigured
-----------------

skip {
  As you can see, you can also comment chunks of text
  with a 'skip[optional nonsense]{ skipped text }' section.
  This comes in handy, if you just want to comment out
  some 'resource <some name> {...}' section:
  just precede it with 'skip'.

  The basic format of option assignment is
  <option name><linear whitespace><value>;

  It should be obvious from the examples below,
  but if you really care to know the details:

  <option name> := valid options in the respective scope
  <value>  := <num>|<string>|<choice>|...
              depending on the set of allowed values
              for the respective option.
  <num>    := [0-9]+, sometimes with an optional suffix of K,M,G
  <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
  <name>   := [/_.A-Za-z0-9-]+
}

#
# At most ONE global section is allowed.
# It must precede any resource section.
#
# global {

    # use this if you want to define more resources later
    # without reloading the module.
    # by default we load the module with exactly as many devices
    # as configured in this file.
    #
    # minor-count 5;

    # The user dialog counts and displays the seconds it waited so
    # far.
    # You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
    # dialog-refresh 5; # 5 seconds

    # this is for people who set up a drbd device via the
    # loopback network interface or between two VMs on the same
    # box, for testing/simulating/presentation
    # otherwise it could trigger a run_tasq_queue deadlock.
    # I'm not sure whether this deadlock can happen with two
    # nodes, but it seems at least extremely unlikely; and since
    # the io_hints boost performance, keep them enabled.
    #
    # With linux 2.6 it no longer makes sense.
    # So this option should vanish. --lge
    #
    # disable-io-hints;
# }

#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#

resource r0 {

  # transfer protocol to use.
  # C: write IO is reported as completed, if we know it has
  #    reached _both_ local and remote DISK.
  #    * for critical transactional data.
  # B: write IO is reported as completed, if it has reached
  #    local DISK and remote buffer cache.
  #    * for most cases.
  # A: write IO is reported as completed, if it has reached
  #    local DISK and local tcp send buffer. (see also sndbuf-size)
  #    * for high latency networks
  #
  #**********
  # uhm, benchmarks have shown that C is actually better than B.
  # this note shall disappear, when we are convinced that B is
  # the right choice "for most cases".
  # Until then, always use C unless you have a reason not to.
  #     --lge
  #**********
  #
  protocol C;

  # what should be done in case the cluster starts up in
  # degraded mode, but knows it has inconsistent data.
  incon-degr-cmd "halt -f";

  startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected.
    # In case you want to limit the wait time, do it here.
    #
    # wfc-timeout 0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    # if the lower level device reports io-error you have the choice of
    #  "pass_on"  ->  Report the io-error to the upper layers.
    #                 Primary   -> report it to the mounted file system.
    #                 Secondary -> ignore it.
    #  "panic"    ->  The node leaves the cluster by doing a kernel panic.
    #  "detach"   ->  The node drops its backing storage device, and
    #                 continues in diskless mode.
    #
    on-io-error detach;
  }

  net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    #
    sndbuf-size 512k;

    timeout     60;    #  6 seconds  (unit = 0.1 seconds)
    connect-int 10;    # 10 seconds  (unit = 1 second)
    ping-int    10;    # 10 seconds  (unit = 1 second)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kb).
    # For high performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
    max-buffers 2048;

    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
    max-epoch-size 2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
    ko-count 4;

    # if the connection to the peer is lost you have the choice of
    #  "reconnect"   -> Try to reconnect (AKA WFConnection state)
    #  "stand_alone" -> Do not reconnect (AKA StandAlone state)
    #  "freeze_io"   -> Try to reconnect but freeze all IO until
    #                   the connection is established again.
    on-disconnect reconnect;
  }

  syncer {
    # Limit the bandwidth used by the resynchronisation process.
    # default unit is KB/sec; optional suffixes K,M,G are allowed
    #
    rate 500M;

    # All devices in one group are resynchronized in parallel.
    # Resynchronisation of groups is serialized in ascending order.
    # Put DRBD resources which are on different physical disks in one group.
    # Put DRBD resources on one physical disk in different groups.
    #
    group 1;

    # Configures the size of the active set. Each extent is 4M,
    # 257 Extents ~> 1GB active set size. In case your syncer
    # runs @ 10MB/sec, all resync after a primary's crash will last
    # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
    # BTW, the hash algorithm works best if the number of al-extents
    # is prime. (To test the worst case performance use a power of 2)
    al-extents 257;
  }

  on ydog1.hostname.net {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.111.102:7788;
    meta-disk internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
  }

  on ydog2.hostname.net {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.111.103:7788;
    meta-disk internal;
  }
}
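
For reference, the arithmetic behind the two resync figures above can be sketched in a few lines of Python. This is only a rough check, not anything DRBD itself provides: the function names are made up for illustration, and it assumes the "M" and "K" in /proc/drbd mean MiB and KiB and that the sync speed stays constant. It compares the al-extents case from the config comment (only the active set is resynced after a primary crash) with the full-device sync we are seeing.

```python
# Rough sketch only: these helper names are illustrative, not part of DRBD.
EXTENT_MB = 4  # per the config comment, each al-extent covers 4M


def active_set_mb(al_extents: int) -> int:
    """Size of the active set in MB for a given al-extents value."""
    return al_extents * EXTENT_MB


def resync_seconds(size_mb: float, rate_mb_per_sec: float) -> float:
    """Time needed to resync size_mb at a constant rate."""
    return size_mb / rate_mb_per_sec


def hms(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss (fractions truncated)."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Case 1: after a primary crash only the active set is resynced.
# al-extents 257 at the 10MB/sec used in the config comment.
crash = resync_seconds(active_set_mb(257), 10)
print("active set resync:", hms(crash))  # 0:01:42

# Case 2: the full sync from the /proc/drbd output above.
# Assumption: 1823439 is MiB still to go, 72572 is KiB/sec.
remaining_mb = 1823439
speed_kb_per_sec = 72572
full = resync_seconds(remaining_mb, speed_kb_per_sec / 1024)
print("full resync:", hms(full))
```

Under those assumptions the first case comes out at the 1 minute 42 seconds quoted in the config comment, and the second at a little over seven hours, which is in the same range as the "finish" estimate DRBD prints. So what we see after a reboot looks like a resync of the whole device rather than of the active set only.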