<br><br><div class="gmail_quote">On Mon, Jan 23, 2012 at 3:04 AM, Felix Frank <span dir="ltr"><<a href="mailto:ff@mpexnet.de">ff@mpexnet.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<div class="im"><br>
On 01/23/2012 01:34 AM, Trey Dockendorf wrote:<br>
> Using drbd84 gave the same "Can not open backing device" with exit code 10.<br>
><br>
> The strange part is these systems are identical in every way except<br>
> their volume groups are named after their hostname. The drbd setup is<br>
> identical also as I'm using Puppet for that too. Any advice on how to<br>
> troubleshoot or resolve this ?<br>
<br>
</div>so you got a hung sync on your first try? Ugh, dreadful. Kudos for<br>
staying aboard despite this.<br>
<br>
First thing is to check your kernel output (dmesg, kern.log or similar)<br>
for more details. Please share a meaningful excerpt with the list (i.e.,<br>
do another "drbdadm attach all", then paste the new log entries).<br>
<br>
Regards,<br>
Felix<br>
</blockquote></div><br><div>So these are the two failures (one and the same, I presume).</div><div>========</div><div><div># service drbd start</div><div>Starting DRBD resources: [ d(r0) 0: Failure: (104) Can not open backing device.</div>
<div><br></div><div>[r0] cmd /sbin/drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device failed - continuing!</div><div> </div><div>n(r0) ].</div></div><div><br>
</div><div>========</div><div><div># drbdadm attach r0</div><div>0: Failure: (104) Can not open backing device.</div><div>Command 'drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device' terminated with exit code 10</div>
</div><div><br></div><div>This is the DRBD information as well as the logs that show up during this failure</div><div><br></div><div><div># cat /proc/drbd </div><div>version: 8.3.12 (api:88/proto:86-96)</div><div>GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03</div>
<div> 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0</div></div><div><br></div><div># dmesg</div><div>drbd module is older than RHEL 6.2 ... applying fixups</div>
<div>drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)</div><div>drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03</div><div>drbd: registered as block device major 147</div>
<div>drbd: minor_table @ 0xffff880431310880</div><div>block drbd0: Starting worker thread (from cqueue [13446])</div><div>block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16</div><div>block drbd0: drbd_bm_resize called with capacity == 0</div>
<div>block drbd0: worker terminated</div><div>block drbd0: Terminating worker thread</div><div>block drbd0: Starting worker thread (from cqueue [13446])</div><div>block drbd0: conn( StandAlone -> Unconnected ) </div><div>
block drbd0: Starting receiver thread (from drbd0_worker [13457])</div><div>block drbd0: receiver (re)started</div><div>block drbd0: conn( Unconnected -> WFConnection ) </div><div>block drbd0: Handshake successful: Agreed network protocol version 96</div>
<div>block drbd0: conn( WFConnection -> WFReportParams ) </div><div>block drbd0: Starting asender thread (from drbd0_receiver [13458])</div><div>block drbd0: data-integrity-alg: <not-used></div><div>block drbd0: max BIO size = 4096</div>
<div>block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent ) </div><div>block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16</div><div>
<br></div><div>I don't have a kern.log; everything goes to /var/log/messages, which shows the same output as dmesg</div><div>=========</div><div><div>Jan 23 20:53:33 cllakvm2 kernel: drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)</div>
<div>Jan 23 20:53:33 cllakvm2 kernel: drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03</div><div>Jan 23 20:53:33 cllakvm2 kernel: drbd: registered as block device major 147</div>
<div>Jan 23 20:53:33 cllakvm2 kernel: drbd: minor_table @ 0xffff880431310880</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16</div>
<div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: drbd_bm_resize called with capacity == 0</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: worker terminated</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Terminating worker thread</div>
<div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting worker thread (from cqueue [13446])</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( StandAlone -> Unconnected ) </div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: Starting receiver thread (from drbd0_worker [13457])</div>
<div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: receiver (re)started</div><div>Jan 23 20:53:33 cllakvm2 kernel: block drbd0: conn( Unconnected -> WFConnection ) </div><div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Handshake successful: Agreed network protocol version 96</div>
<div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: conn( WFConnection -> WFReportParams ) </div><div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [13458])</div><div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: data-integrity-alg: <not-used></div>
<div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: max BIO size = 4096</div><div>Jan 23 20:53:54 cllakvm2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent ) </div>
<div>Jan 23 20:54:03 cllakvm2 kernel: block drbd0: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16</div></div><div><br></div><div><br></div><div>Versions ... </div><div>==============</div><div><br></div><div>
<div># drbdadm --version</div><div>DRBDADM_BUILDTAG=GIT-hash:\ e2a8ef4656be026bbae540305fcb998a5991090f\ build\ by\ dag@Build64R6\,\ 2011-11-20\ 10:57:26</div><div>DRBDADM_API_VERSION=88</div><div>DRBD_KERNEL_VERSION_CODE=0x08030c</div>
<div>DRBDADM_VERSION_CODE=0x08030c</div><div>DRBDADM_VERSION=8.3.12</div></div><div><br></div><div><div># uname -r</div><div>2.6.32-220.2.1.el6.x86_64</div></div><div><br></div><div><br></div><div>What's odd is that the sync hang only happened once I unmounted the LV /vmstore. The docs mention that the resource does not have to be empty, but they don't say whether it can be in use.</div>
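<div><br></div><div>A side note on the "failed with -16" lines in the kernel log: this is only a minimal sketch, assuming the value is a negated Linux errno as kernel messages usually print them. The helper name describe_kernel_errno is made up for illustration and is not part of DRBD or any library. If the assumption holds, -16 decodes to EBUSY, which would fit the backing device being held open by something else (for example, a mounted filesystem).</div>

```python
import errno
import os


def describe_kernel_errno(code: int) -> str:
    """Decode a (possibly negated) errno value into its symbolic name and message.

    Assumption: the number printed by the kernel is a standard errno value
    with its sign flipped, e.g. -16 for errno 16.
    """
    number = abs(code)
    # errno.errorcode maps numeric values to names such as "EBUSY".
    name = errno.errorcode.get(number, "UNKNOWN")
    # os.strerror returns the platform's human-readable message for the value.
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # The value taken from: open("/dev/vg_cllakvm2/lv_vmstore") failed with -16
    print(describe_kernel_errno(-16))
```

<div>Run on the affected node, this prints the symbolic name followed by the system's own description, which is a quick way to confirm what the open() failure means before digging into what is holding the device.</div>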
<div><br></div><div>I'd like to give this another shot, but due to the long outage this caused, I've been told to leave this system be and not work on syncing my two nodes to facilitate the migration of our VMs from their current temporary system.</div>
<div><br></div><div>If I'm allowed to give this another go: for the node with data to become "UpToDate", shouldn't it have to be promoted to primary first? The sync seemed to start once the resource was not in use and attached. Additionally, does getting the resource "UpToDate" actually take time depending on the size of the resource / space used, or is it just populating the metadisk? The system I have no data on took no time at all to become available and reach "Inconsistent".</div>
<div><br></div><div>Thanks!</div><div>- Trey</div>