Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello DRBD Users,

I need to update the OS of one of our two-node Linux Heartbeat (HB1) cluster nodes while the Heartbeat-managed resources stay available on the other node, albeit without HA during the update.

On the freed node I installed RHEL 5.1 with the distro's shipped kernel + kernel-headers.

# uname -srvi
Linux 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)

After fruitless attempts (at least the cause of my posting still persists) with various prebuilt RPM packages of the DRBD kmod and user tools for x86_64 from CentOS 5.1, I compiled from the sources of drbd-8.2.5.tar.gz as outlined in INSTALL's section on building DRBD against a distro kernel with precompiled headers. This produced the following drbd kernel module:

# modinfo /lib/modules/$(uname -r)/kernel/drivers/block/drbd.ko
filename:       /lib/modules/2.6.18-53.el5/kernel/drivers/block/drbd.ko
alias:          block-major-147-*
license:        GPL
description:    drbd - Distributed Replicated Block Device v8.2.5
author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
srcversion:     694285A12A998FCF4E55FB4
depends:
vermagic:       2.6.18-53.el5 SMP mod_unload gcc-4.1
parm:           minor_count:Maximum number of drbd devices (1-255) (int)
parm:           allow_oos:DONT USE! (bool)
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           trace_level:int
parm:           trace_type:int
parm:           trace_devs:int
parm:           usermode_helper:string

Accordingly, I installed the DRBD user tools.

I need to stack the DRBD Heartbeat resource on an MD RAID1 device. The resulting DRBD device must then serve as the LVM PV on which the "shared" VG is founded. This setup has been working perfectly with Fedora Core 3 and DRBD 0.7.15 in the old cluster.

I had prepared the internal disks on the freed node to use slice 6

# fdisk -l /dev/sd[ab]|grep sd[ab]6
/dev/sda6            5223        9235    32234391   fd  Linux raid autodetect
/dev/sdb6            5223        9235    32234391   fd  Linux raid autodetect

out of which I had created a RAID1 array that can be assembled as /dev/md4:

# mdadm -A /dev/md4 -a yes /dev/sd[ab]6
mdadm: /dev/md4 has been started with 2 drives.
# mdadm -QD /dev/md4|grep State\ *:
          State : clean
# mdadm -Dsv|grep -A1 md4
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=0f3aa0e2:f8983059:8dab5f27:9293bac7
   devices=/dev/sda6,/dev/sdb6

While I did set up the required dedicated DRBD network connection...

# ifconfig eth2|grep inet\ 
          inet addr:192.168.3.3  Bcast:192.168.3.255  Mask:255.255.255.0
# ping -c 1 192.168.3.2
PING 192.168.3.2 (192.168.3.2) 56(84) bytes of data.
64 bytes from 192.168.3.2: icmp_seq=1 ttl=64 time=1.23 ms

--- 192.168.3.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.236/1.236/1.236/0.000 ms

...for the duration of the OS upgrade I needed to configure peer ports that are not serviced by a remote drbd, because I must by all means avoid any accidental sync attempts while the application is productive on the bound Fedora node (FC3).

But to my understanding, much like I can start an MD array in degraded mode by specifying devices as "missing" at creation time, or by deliberately failing or removing members, DRBD should allow me to activate a device as primary and offer it for LVM use even if it has no network connectivity to its peer. Or is this the basic flaw in my assumption?
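Just to make the MD analogy explicit (the device names in this little sketch are placeholders, not my actual layout): I can either create a mirror with one half "missing" from the start, or deliberately degrade an existing one, and the array remains usable either way.

# placeholder devices only -- not my real arrays
# mdadm --create /dev/mdX --level=1 --raid-devices=2 /dev/sdY6 missing
# mdadm /dev/mdX --fail /dev/sdZ6 --remove /dev/sdZ6

What I am hoping for is the DRBD equivalent of that degraded-but-usable state, with the "missing half" being the temporarily unreachable peer.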
Let me demonstrate what I mean. This is my preliminary configuration:

# cat /etc/drbd.conf
# /etc/drbd.conf
global { usage-count no; }
common { syncer { rate 10M; } }
resource r0 {
  protocol C;
  meta-disk internal;
  on RHEL5 {
    device  /dev/drbd0;
    disk    /dev/md4;
    address 192.168.3.3:7790;
  }
  on FC3 {
    device  /dev/drbd5;
    disk    /dev/md5;
    address 192.168.3.2:7790;
  }
  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  disk    { on-io-error detach; }
  syncer  { rate 10M; al-extents 257; }
  startup { degr-wfc-timeout 120; }
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
  }
}

As noted, port 7790/tcp is not currently serviced, which is why, for now, I will bring the device up without connecting.

To rule out any stale meta data from my earlier attempts, let me overwrite those blocks first:

# drbdadm create-md r0
md_offset 33007923200
al_offset 33007890432
bm_offset 33006882816

Found some data
 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

v07 Magic number not found
v07 Magic number not found
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/md4 at byte offset 33007923200
Do you really want to overwrite the existing v08 meta-data?
[need to type 'yes' to confirm] yes

Writing meta data...
initialising activity log
NOT initialized bitmap
New drbd meta data block successfully created.

Now I start drbd and abort its wait for a connection, which it will never get, by entering "yes":

# service drbd status
drbd not loaded
# service drbd start
Starting DRBD resources:    [ d(r0) s(r0) n(r0) ].
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 120 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'r0'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [ -- ]:[ 10]:[ 11]:[ 12]:yes

Check the states:

# drbdadm state r0
Secondary/Unknown
# drbdadm cstate r0
WFConnection
# drbdadm disconnect r0
# drbdadm cstate r0
StandAlone
# drbdadm dstate r0
Inconsistent/DUnknown
# drbdadm -- --overwrite-data-of-peer primary r0
# drbdadm dstate r0
UpToDate/DUnknown
# drbdadm state r0
Primary/Unknown

So, if my assumptions are correct, the device r0 should now be in a state where it can be used as a PV, correct?

# drbdadm sh-dev r0
/dev/drbd0
# pvcreate -M2 $(drbdadm sh-dev r0)
  Physical volume "/dev/drbd0" successfully created

Now, what I absolutely cannot understand, and what is driving me insane, is why the heck pvscan now lists /dev/md4 and not /dev/drbd0 as the found PV, as it should (and as it has been doing on the bound node):

# pvscan
  PV /dev/md5   VG vgrh     lvm2 [9.54 GB / 3.91 GB free]
  PV /dev/md3   VG vgdata   lvm2 [27.94 GB / 9.44 GB free]
  PV /dev/md1   VG vg00     lvm2 [9.54 GB / 3.91 GB free]
  PV /dev/md4               lvm2 [30.74 GB]
  Total: 4 [77.76 GB] / in use: 3 [47.02 GB] / in no VG: 1 [30.74 GB]
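One more observation on the diagnosis: since r0 uses internal meta-data, /dev/drbd0 maps directly onto the start of /dev/md4, so I would expect the PV label that pvcreate wrote through /dev/drbd0 to be readable at the same offset on the backing device as well. Something along these lines should show whether both devices carry the same label (I have not run this yet; the LVM2 label normally lives in the second 512-byte sector, hence skip=1):

# dd if=/dev/drbd0 bs=512 skip=1 count=1 2>/dev/null | strings | grep LABELONE
# dd if=/dev/md4 bs=512 skip=1 count=1 2>/dev/null | strings | grep LABELONE

If both print a label, then the question is presumably which of the two duplicates LVM prefers, not whether /dev/drbd0 carries a label at all.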
I first thought I had found the reason in a misconfigured lvm.conf filter. But then the default "match all", which I had commented out, should have matched any drbd[0-9] anyway.

# grep -B3 '^ *filter' /etc/lvm/lvm.conf
    # By default we accept every block device:
    #filter = [ "a/.*/" ]
    filter = [ "a|^/dev/sd[ab][1-9]?$|", "a|^/dev/md[0-9]$|", "a|^/dev/drbd[0-9]$|", "r|.*|" ]

For some strange reason I cannot use the drbd device as a normal PV:

# pvdisplay -vm /dev/{md4,drbd0}
    Using physical volume(s) on command line
    Wiping cache of LVM-capable devices
  --- NEW Physical volume ---
  PV Name               /dev/md4
  VG Name
  PV Size               30.74 GB
  Allocatable           NO
  PE Size (KByte)       0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               nFUj7P-Hu03-V6mF-bNGd-SiCC-Fe37-FAZ7MJ

  No physical volume label read from /dev/drbd0
  Failed to read physical volume "/dev/drbd0"

So this, of course, is futile as well:

# vgcreate -M2 -s 8m vgBLA /dev/drbd0
  No physical volume label read from /dev/drbd0
  /dev/drbd0 not identified as an existing physical volume
  Unable to add physical volume '/dev/drbd0' to volume group 'vgBLA'.
# pvremove /dev/md4
  Can't open /dev/md4 exclusively - not removing.  Mounted filesystem?
# pvremove -ff /dev/md4
  Can't open /dev/md4 exclusively - not removing.  Mounted filesystem?

# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root at RHEL5, 2008-03-12 13:13:18
 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:16 dr:208 al:1 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:3 misses:1 starving:0 dirty:0 changed:1

I am lost here. I should have had this cluster node set up by yesterday and have already wasted far too many hours on this mess. Could anyone advise me on what I must have forgotten here? Or should I revert to a DRBD 7 release, which has proved to work so far?
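P.S. In case it clarifies what I meant by a possibly misconfigured filter: the variant I have been wondering about (completely untested on my side) would reject the DRBD backing device, so that the PV could only ever be picked up through /dev/drbd0. In /etc/lvm/lvm.conf that would look roughly like

    # untested sketch: accept drbd first, reject its backing device /dev/md4, keep the rest as before
    filter = [ "a|^/dev/drbd[0-9]$|", "r|^/dev/md4$|", "a|^/dev/sd[ab][1-9]?$|", "a|^/dev/md[0-9]$|", "r|.*|" ]

But before I fiddle with that, I would first like to understand whether my standalone-primary assumption above is sound at all.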