[DRBD-user] First DRBD attempt -- HELP pls

Tue Jan 24 04:05:39 CET 2012

On Mon, Jan 23, 2012 at 3:04 AM, Felix Frank <ff at mpexnet.de> wrote:

> Hi,
>
> On 01/23/2012 01:34 AM, Trey Dockendorf wrote:
> > Using drbd84 gave the same "Can not open backing device" with exit code 10.
> >
> > The strange part is these systems are identical in every way except
> > their volume groups are named after their hostname.  The drbd setup is
> > identical also as I'm using Puppet for that too.  Any advice on how to
> > troubleshoot or resolve this?
>
> So you got a hung sync on your first try? Ugh, dreadful. Kudos for
> staying aboard despite this.
>
> First thing is to check your kernel output (dmesg, kern.log or similar)
> for more details. Please share a meaningful excerpt with the list (i.e.,
> do another "drbdadm attach all", then paste the new log entries).
>
> Regards,
> Felix
>

So these are the two failures (one and the same, I presume).
========
# service drbd start
Starting DRBD resources: [ d(r0) 0: Failure: (104) Can not open backing device.

[r0] cmd /sbin/drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device  failed - continuing!

n(r0) ].

========
# drbdadm attach r0
0: Failure: (104) Can not open backing device.
Command 'drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device' terminated with exit code 10

Here is the DRBD status, along with the log entries that appear during this failure:

# cat /proc/drbd
version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

# dmesg
drbd module is older than RHEL 6.2 ... applying fixups
drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880431310880
block drbd0: Starting worker thread (from cqueue [13446])
block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
block drbd0: drbd_bm_resize called with capacity == 0
block drbd0: worker terminated
block drbd0: Terminating worker thread
block drbd0: Starting worker thread (from cqueue [13446])
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [13457])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 96
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [13458])
block drbd0: data-integrity-alg: <not-used>
block drbd0: max BIO size = 4096
block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16

I don't have a kern.log; everything goes to /var/log/messages, which has the same output as dmesg:
=========
Jan 23 20:53:33 cllakvm2 kernel: drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
Jan 23 20:53:33 cllakvm2 kernel: drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
Jan 23 20:53:33 cllakvm2 kernel: drbd: registered as block device major 147
Jan 23 20:53:33 cllakvm2 kernel: drbd: minor_table @ 0xffff880431310880
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: worker terminated
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Terminating worker thread
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting receiver thread (from drbd0_worker [13457])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: receiver (re)started
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [13458])
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: data-integrity-alg: <not-used>
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: max BIO size = 4096
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
Jan 23 20:54:03 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
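
Looking at the -16 in those open() lines: the kernel reports a negative errno there, so decoding it should say why the backing device could not be opened. A quick helper along these lines pulls the device and the error out of a log excerpt. The regex and the `decode_open_failures` name are my own invention, not part of drbd-utils, and it assumes Linux errno numbering.

```python
import errno
import os
import re

# Matches kernel lines such as:
#   block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
# (hypothetical pattern written for this log format, not a DRBD tool)
OPEN_FAIL = re.compile(r'open\("(?P<dev>[^"]+)"\) failed with -(?P<err>\d+)')


def decode_open_failures(log_text):
    """Return (device, errno name, message) for each failed open() line."""
    results = []
    for match in OPEN_FAIL.finditer(log_text):
        code = int(match.group("err"))
        # errno.errorcode maps numbers to symbolic names for the platform
        # Python runs on; on Linux, 16 is EBUSY.
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group("dev"), name, os.strerror(code)))
    return results


if __name__ == "__main__":
    sample = 'block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16'
    for dev, name, msg in decode_open_failures(sample):
        print(f"{dev}: {name} ({msg})")
```

On Linux, errno 16 is EBUSY ("Device or resource busy"), which would suggest something else still holds /dev/vg_cllakvm2/lv_vmstore open.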

Versions ...
==============

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ e2a8ef4656be026bbae540305fcb998a5991090f\ build\ by\ dag at Build64R6\,\ 2011-11-20\ 10:57:26
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030c
DRBDADM_VERSION_CODE=0x08030c
DRBDADM_VERSION=8.3.12

# uname -r
2.6.32-220.2.1.el6.x86_64

What's odd is that the sync hang only happened once I unmounted the LV from /vmstore. Reading the docs, I found that they mention the resource doesn't have to be empty, but they don't say whether it can be in use.
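
If that -16 really is EBUSY, my guess is that DRBD wants the backing device to itself when it attaches, so a mounted LV would be refused. A rough check, as a sketch only: `mount_points_for` is a hypothetical helper, and it only looks at /proc/mounts, so it won't catch other holders of the device (a running VM, for example).

```python
import os


def mount_points_for(device, mounts_text):
    """Return every mount point whose source is the same node as `device`.

    `mounts_text` is the content of /proc/mounts. LVM volumes are usually
    listed there as /dev/mapper/<vg>-<lv>, while drbd.conf names them
    /dev/<vg>/<lv>, so both sides go through os.path.realpath first.
    Only mounts are detected; other openers of the device are not.
    """
    target = os.path.realpath(device)
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, mount_point = fields[0], fields[1]
        if os.path.realpath(source) == target:
            points.append(mount_point)
    return points


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        busy = mount_points_for("/dev/vg_cllakvm2/lv_vmstore", f.read())
    if busy:
        print("still mounted at:", ", ".join(busy))
    else:
        print("not mounted")
```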

I'd like to give this another shot, but because of the long outage this caused, I've been told to leave this system alone and not work on syncing my two nodes to facilitate the migration of our VMs from their current temporary system.

If I'm allowed to give this another go: for the node with data to become "UpToDate", shouldn't it have to be promoted to primary first? The sync seemed to start once the resource was attached and no longer in use.
Also, does getting the resource to "UpToDate" actually take time that depends on the size of the resource or the space used, or is it just populating the meta-disk? The system I have no data on took no time at all to become available and reach "Inconsistent".
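
My unconfirmed assumption is that the initial full sync copies every block of the device rather than just the used space, in which case the time scales with the LV size and the resync rate. A back-of-the-envelope sketch under that assumption (the size and rate below are made-up numbers, not measurements from these nodes):

```python
def full_sync_seconds(device_bytes, rate_bytes_per_s):
    """Rough lower bound on a full initial sync.

    Assumes every block of the device is copied (not just used space)
    at a steady resync rate; real throughput will vary.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return device_bytes / rate_bytes_per_s


if __name__ == "__main__":
    # Hypothetical figures: a 500 GiB LV resyncing at 30 MiB/s.
    size = 500 * 1024**3
    rate = 30 * 1024**2
    hours = full_sync_seconds(size, rate) / 3600
    print(f"about {hours:.1f} hours")  # prints: about 4.7 hours
```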

Thanks!
- Trey