[DRBD-user] Strange behaviour (mine or drbd's, im not sure yet)

Alex Borges borges.sog at gmail.com
Thu Oct 4 18:39:15 CEST 2007


Okay...

DRBD vanilla (from the drbd site) on RHEL5. It all works smoothly, my
little cluster works very well and I thank you guys for that.

Thing is, we tried backing up the DRBD filesystem with Amanda, which uses
tar, plus the dump command to estimate the size.

Everything would work fine, except that the dump command gives
Amanda this error:

[root@fw amanda]# su - amandabackup
-sh-3.1$ /sbin/dump -S /hadisk
/dev/drbd0: Can't read next inode while scanning inode #6094848

"Okay then", we thought, "lets check our filesystem on the secondary
server while drbd is disconnected":

[root@fw2 ~]# fsck.ext3 /dev/drbd0
e2fsck 1.39 (29-May-2006)
The filesystem size (according to the superblock) is 12209392 blocks
The physical size of the device is 12176624 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? yes

Now, I wouldn't mind the message if the filesystem were corrupted, but it
isn't. There isn't a single message in the logs complaining about the
filesystem; the whole thing works very well and mounts without any
complaints.
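Doing the math on fsck's two numbers (assuming ext3's default 4 KiB block
size, which is my assumption, not something I've verified on this fs):

```shell
# fsck's two sizes, in filesystem blocks (assumed 4 KiB each):
SUPERBLOCK_BLOCKS=12209392   # size according to the superblock
DEVICE_BLOCKS=12176624       # physical size of /dev/drbd0
DIFF=$((SUPERBLOCK_BLOCKS - DEVICE_BLOCKS))
echo "gap: ${DIFF} blocks = $((DIFF * 4 / 1024)) MiB"
# gap: 32768 blocks = 128 MiB
```

128 MiB is suspiciously close to what the meta-disk comments in my config
say internal meta-data reserves at the end of the lower device.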

Here is my partition table:
[root@fw ~]# fdisk -l /dev/sda

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        3837    30716280   8e  Linux LVM
/dev/sda4            3838       19457   125467650    5  Extended
/dev/sda5            3838        9917    48837568+  83  Linux
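And comparing the superblock's block count against sda5 (fdisk's Blocks
column is in 1 KiB units; the 4 KiB filesystem block size is again my
assumption):

```shell
# Superblock says 12209392 fs blocks; fdisk says sda5 is 48837568 KiB.
SUPERBLOCK_BLOCKS=12209392
SDA5_KIB=48837568
echo "$((SUPERBLOCK_BLOCKS * 4)) KiB vs ${SDA5_KIB} KiB"
# 48837568 KiB vs 48837568 KiB
```

So the filesystem apparently spans all of /dev/sda5, leaving no room at
the end of the partition for anything else.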


Okay, so next I'm going to paste my drbd.conf, but first I'd like to point
out that there is no place in it to specify the size of the underlying
block device (there used to be one for protocol 7; methinks it's no longer
needed for proto 8). Perhaps this is a case of PEBKAC (me == P)... well,
no, not "perhaps": I'm certainly doing something wrong, I just don't know
where, or how to fix it.

So, okay then, here is the drbd0 section of my drbd.conf (along with all
those helpful comments):
resource drbd0 {

  # transfer protocol to use.
  # C: write IO is reported as completed, if we know it has
  #    reached _both_ local and remote DISK.
  #    * for critical transactional data.
  # B: write IO is reported as completed, if it has reached
  #    local DISK and remote buffer cache.
  #    * for most cases.
  # A: write IO is reported as completed, if it has reached
  #    local DISK and local tcp send buffer. (see also sndbuf-size)
  #    * for high latency networks
  #
  #**********
  # uhm, benchmarks have shown that C is actually better than B.
  # this note shall disappear, when we are convinced that B is
  # the right choice "for most cases".
  # Until then, always use C unless you have a reason not to.
  #     --lge
  #**********
  #
  protocol B;

  # what should be done in case the cluster starts up in
  # degraded mode, but knows it has inconsistent data.
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected. This is so when the cluster manager starts later,
    # it does not see a resource with internal split-brain.
    # In case you want to limit the wait time, do it here.
    # Default is 0, which means unlimited. Unit is seconds.
    #
    # wfc-timeout  0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 10;    # 10 seconds.

  }

  disk {
    # if the lower level device reports io-error you have the choice of
    #  "pass_on"  ->  Report the io-error to the upper layers.
    #                 Primary   -> report it to the mounted file system.
    #                 Secondary -> ignore it.
    #  "panic"    ->  The node leaves the cluster by doing a kernel panic.
    #  "detach"   ->  The node drops its backing storage device, and
    #                 continues in disk less mode.
    #
    on-io-error   detach;
  }

  net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    # sndbuf-size 512k;

    # timeout       60;    #  6 seconds  (unit = 0.1 seconds)
    # connect-int   10;    # 10 seconds  (unit = 1 second)
    # ping-int      10;    # 10 seconds  (unit = 1 second)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kb).
    # For high performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
    # max-buffers     2048;

    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
    # max-epoch-size  2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
    # ko-count 4;

    # if the connection to the peer is lost you have the choice of
    #  "reconnect"   -> Try to reconnect (AKA WFConnection state)
    #  "stand_alone" -> Do not reconnect (AKA StandAlone state)
    #  "freeze_io"   -> Try to reconnect but freeze all IO until
    #                   the connection is established again.
    on-disconnect reconnect;

  }

  syncer {
    # Limit the bandwidth used by the resynchronisation process.
    # default unit is KB/sec; optional suffixes K,M,G are allowed
    #
    rate 60M;

    # All devices in one group are resynchronized in parallel.
    # Resynchronisation of groups is serialized in ascending order.
    # Put DRBD resources which are on different physical disks in one group.
    # Put DRBD resources on one physical disk in different groups.
    #
    group 1;

    # Configures the size of the active set. Each extent is 4M,
    # 257 Extents ~> 1GB active set size. In case your syncer
    # runs @ 10MB/sec, all resync after a primary's crash will last
    # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
    # BTW, the hash algorithm works best if the number of al-extents
    # is prime. (To test the worst-case performance use a power of 2.)
    al-extents 257;
  }

  on fw2.fnsi.com.mx {
    device     /dev/drbd0;
    disk       /dev/sda2;
    address    10.254.254.254:7788;
    meta-disk  internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBDs.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
  }

  on fw.fnsi.com.mx {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   10.254.254.222:7788;
    meta-disk internal;
  }
}






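For what it's worth, if the internal meta-data (the last 128 MB of the
lower device, per the comments above) turns out to be the explanation, I
suppose the filesystem would have to be shrunk to the size /dev/drbd0
actually exposes. Untested sketch; the resize2fs target is just fsck's
reported device size, and the exact procedure is my guess:

```shell
# Untested sketch: shrink the ext3 fs to the device size fsck reported,
# so it no longer overlaps the space DRBD reserves for internal meta-data.
# The real (offline, destructive-if-wrong) commands would be roughly:
#   umount /hadisk
#   e2fsck -f /dev/drbd0             # resize2fs wants a clean filesystem
#   resize2fs /dev/drbd0 12176624    # target in (assumed 4 KiB) fs blocks
#   mount /dev/drbd0 /hadisk
TARGET_BLOCKS=12176624
TARGET_KIB=$((TARGET_BLOCKS * 4))
echo "shrink target: ${TARGET_BLOCKS} fs blocks = ${TARGET_KIB} KiB"
```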