On Mon, Jan 23, 2012 at 3:04 AM, Felix Frank <ff at mpexnet.de> wrote:

> Hi,
>
> On 01/23/2012 01:34 AM, Trey Dockendorf wrote:
> > Using drbd84 gave the same "Can not open backing device" with exit code 10.
> >
> > The strange part is these systems are identical in every way except
> > their volume groups are named after their hostname. The drbd setup is
> > identical also as I'm using Puppet for that too. Any advice on how to
> > troubleshoot or resolve this ?
>
> so you got a hung sync on your first try? Ugh, dreadful. Kudos for
> staying aboard despite this.
>
> First thing is to check your kernel output (dmesg, kern.log or similar)
> for more details. Please share a meaningful excerpt with the list (i.e.,
> do another "drbdadm attach all", then paste the new log entries).
>
> Regards,
> Felix
>

So these are the two failures (one and the same, I presume).

========
# service drbd start
Starting DRBD resources: [ d(r0) 0: Failure: (104) Can not open backing device.
[r0] cmd /sbin/drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device failed - continuing!
n(r0) ].
========
# drbdadm attach r0
0: Failure: (104) Can not open backing device.
Command 'drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device' terminated with exit code 10

This is the DRBD information, followed by the logs that show up during this failure.

# cat /proc/drbd
version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

# dmesg
drbd module is older than RHEL 6.2 ... applying fixups
drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
drbd: registered as block device major 147
drbd: minor_table @ 0xffff880431310880
block drbd0: Starting worker thread (from cqueue [13446])
block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
block drbd0: drbd_bm_resize called with capacity == 0
block drbd0: worker terminated
block drbd0: Terminating worker thread
block drbd0: Starting worker thread (from cqueue [13446])
block drbd0: conn( StandAlone -> Unconnected )
block drbd0: Starting receiver thread (from drbd0_worker [13457])
block drbd0: receiver (re)started
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: Handshake successful: Agreed network protocol version 96
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [13458])
block drbd0: data-integrity-alg: <not-used>
block drbd0: max BIO size = 4096
block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16

I don't have a kernel.log; it all goes to /var/log/messages, but it has the same output as dmesg.

=========
Jan 23 20:53:33 cllakvm2 kernel: drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
Jan 23 20:53:33 cllakvm2 kernel: drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build64R6, 2011-11-20 10:57:03
Jan 23 20:53:33 cllakvm2 kernel: drbd: registered as block device major 147
Jan 23 20:53:33 cllakvm2 kernel: drbd: minor_table @ 0xffff880431310880
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: worker terminated
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Terminating worker thread
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting receiver thread (from drbd0_worker [13457])
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: receiver (re)started
Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [13458])
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: data-integrity-alg: <not-used>
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: max BIO size = 4096
Jan 23 20:53:54 cllakvm2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
Jan 23 20:54:03 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16

Versions ...
==============
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ e2a8ef4656be026bbae540305fcb998a5991090f\ build\ by\ dag at Build64R6\,\ 2011-11-20\ 10:57:26
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030c
DRBDADM_VERSION_CODE=0x08030c
DRBDADM_VERSION=8.3.12

# uname -r
2.6.32-220.2.1.el6.x86_64

What's odd is that the sync hang only happened once I unmounted the LV /vmstore. Reading the docs, I found they mention that the resource does not have to be empty, but they don't say whether it can be in use.

I'd like to give this another shot, but due to the long outage this caused I've been told to leave this system be and not work on syncing my two nodes to facilitate the migration of our VMs from their current temporary system.

If I'm allowed to give this another go: for the node with data to become "UpToDate", shouldn't it have to be promoted to primary first? The sync seemed to start once the resource was not in use and attached.

Additionally, does getting the resource "UpToDate" actually take time depending on the size of the resource / space used, or is it just populating the metadisk? The system I have no data on took no time at all to become available and reach "Inconsistent".

Thanks!
- Trey
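
For reference when reading the output above: the "-16" in the kernel log is a negated errno value, and the "cs:/ro:/ds:" fields on the /proc/drbd line carry the connection state, roles and disk states (local/peer). Below is a minimal sketch, assuming a Linux host with Python available, of decoding both. The helper names decode_kernel_errno and parse_drbd_state are made up for illustration and are not part of DRBD or its tools.

```python
import errno
import os
import re


def decode_kernel_errno(code):
    """Map a negative kernel return code (e.g. -16) to (symbolic name, message)."""
    number = abs(code)
    # errno.errorcode maps numeric errno values to names such as 'EBUSY'.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


def parse_drbd_state(line):
    """Extract connection state, roles and disk states from one /proc/drbd line.

    Hypothetical helper: it only understands the 'cs:', 'ro:' and 'ds:'
    fields as they appear in the 8.3-style output quoted in this thread.
    """
    fields = dict(re.findall(r"\b(cs|ro|ds):(\S+)", line))
    local_role, _, peer_role = fields.get("ro", "").partition("/")
    local_disk, _, peer_disk = fields.get("ds", "").partition("/")
    return {
        "connection": fields.get("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


if __name__ == "__main__":
    # Value taken from: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
    print(decode_kernel_errno(-16))

    # Status line copied from the /proc/drbd output in this message.
    status = " 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Inconsistent C r-----"
    print(parse_drbd_state(status))
```

On Linux, errno 16 is EBUSY, which would fit the backing LV still being mounted or otherwise held open when the attach was attempted. The parsed status line shows the local disk as Diskless and the peer as Inconsistent, i.e. the local backing device never got attached.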