<br><br><div class="gmail_quote">On Sun, Jan 22, 2012 at 4:46 PM, Trey Dockendorf <span dir="ltr"><<a href="mailto:treydock@gmail.com">treydock@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm in the process of setting up DRBD from scratch for the first time. The first phase of my project is simply to sync one virtual server to another in primary/secondary mode, with no intention of setting up failover. <span style="font-size:13px;background-color:rgb(247,245,242)">The primary (cllakvm2) has 7 VMs in production on it. The secondary (cllakvm1) has an LV of exactly the same size with nothing currently stored on it.</span><div>
<br></div><div>I've created both resources, but what's strange is that once I created the resource on my "primary" and started the DRBD daemon, it began to sync before I had run "<span style="background-color:rgb(247,245,242)">drbdadm primary --force resource". Now every command I attempt to run just hangs. I've also tried "kill -9 &lt;pid&gt;" on the processes with no luck. I also can't remount the /vmstore partition (the LV where all the virtual disks live). </span><span style="background-color:rgb(247,245,242);font-size:13px">I've tried drbdadm down r0 and drbdadm disconnect --force r0, and nothing will stop the processes, which hang and never exit. The list of DRBD processes is near the bottom of this email.</span></div>
<div><span style="background-color:rgb(247,245,242);font-size:13px"><br></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">This is on CentOS 6 with DRBD 8.4.1. Here's my relevant configs</span></div>
<div><span style="background-color:rgb(247,245,242);font-size:13px"><br></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">global_common.conf</span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">------------</span></div>
<div><span style="background-color:rgb(247,245,242)"><div><div>global {</div><div> usage-count no;</div><div> # minor-count dialog-refresh disable-ip-verification</div><div>}</div><div><br></div><div>common {</div>
<div> handlers {</div><div> # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div><div>
# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div><div> # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";</div>
<div> # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";</div><div> # split-brain "/usr/lib/drbd/notify-split-brain.sh root";</div><div> # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";</div>
<div> # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";</div><div> # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;</div><div>
}</div><div><br></div><div> startup {</div><div> # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb</div><div> }</div><div><br></div><div> options {</div><div> # cpu-mask on-no-data-accessible</div>
<div> }</div><div><br></div><div> disk {</div><div> # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes</div><div> # disk-drain md-flushes resync-rate resync-after al-extents</div>
<div> # c-plan-ahead c-delay-target c-fill-target c-max-rate</div><div> # c-min-rate disk-timeout</div><div> }</div><div><br></div><div> net {</div><div> # protocol timeout max-epoch-size max-buffers unplug-watermark</div>
<div> # connect-int ping-int sndbuf-size rcvbuf-size ko-count</div><div> # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri</div><div> # after-sb-1pri after-sb-2pri always-asbp rr-conflict</div>
<div> # ping-timeout data-integrity-alg tcp-cork on-congestion</div><div> # congestion-fill congestion-extents csums-alg verify-alg</div><div> # use-rle</div><div><br></div><div>
protocol C;</div><div> }</div><div>}</div></div><div style="font-size:13px"><br></div></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">r0.res</span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">===============</span></div>
<div><span style="background-color:rgb(247,245,242)"><div style="font-size:13px">resource r0 {</div><div style="font-size:13px"> on <a href="http://cllakvm2.tamu.edu" target="_blank">cllakvm2.tamu.edu</a> {</div><div style="font-size:13px">
device /dev/drbd1;</div><div style="font-size:13px"> disk /dev/vg_cllakvm2/lv_vmstore;</div><div style="font-size:13px"> address <a href="http://128.194.115.76:7789" target="_blank">128.194.115.76:7789</a>;</div>
<div style="font-size:13px">
meta-disk internal;</div><div style="font-size:13px"> }</div><div style="font-size:13px"> on <a href="http://cllakvm1.tamu.edu" target="_blank">cllakvm1.tamu.edu</a> {</div><div style="font-size:13px"> device /dev/drbd1;</div>
<div style="font-size:13px"> disk /dev/vg_cllakvm1/lv_vmstore;</div><div style="font-size:13px"> address <a href="http://165.91.253.227:7789" target="_blank">165.91.253.227:7789</a>;</div><div style="font-size:13px">
meta-disk internal;</div>
<div style="font-size:13px"> }</div><div style="font-size:13px">}</div><div style="font-size:13px">=================</div><div style="font-size:13px"><br></div><div style="font-size:13px">Since both were built before the DRBD configuration I had to shrink each filesystem in the LV by about 70M. I then ran "drbdadm create-md r0". When I tried to start the drbd service on cllakvm2 I got the following..</div>
<div style="font-size:13px">==============</div><div style="font-size:13px"><div># service drbd start</div><div>Starting DRBD resources: [</div><div> create res: r0</div><div> prepare disk: r0</div><div> adjust disk: r0:failed(attach:10)</div>
<div> adjust net: r0</div><div>]</div><div>.</div></div><div style="font-size:13px">=============</div><div style="font-size:13px"><br></div><div style="font-size:13px">I then unmounted "/vmstore" (with all VMs stopped), re-ran create-md, and restarted drbd, which produced no errors. Right after that, no drbdadm commands would respond, and "/proc/drbd" showed the resource syncing even though it had not yet been promoted to primary.</div>
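<div>The states in /proc/drbd are easy to misread by hand: cs is the connection state, ro the roles (local/peer) and ds the disk states (local/peer). A minimal Python sketch that pulls those fields out of /proc/drbd text, assuming the 8.3/8.4 line format shown in this thread; parse_proc_drbd and SAMPLE are hypothetical names:</div>

```python
import re

# Matches per-resource status lines in /proc/drbd, such as
#  1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")


def parse_proc_drbd(text):
    """Map each minor number to its cs state and (local, peer) ro/ds pairs."""
    resources = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m is None:
            continue  # version, GIT-hash, counter and progress lines
        minor, cs, ro, ds = m.groups()
        resources[int(minor)] = {
            "cs": cs,
            "ro": tuple(ro.split("/")),
            "ds": tuple(ds.split("/")),
        }
    return resources


# Excerpt of the cllakvm2 output from this thread.
SAMPLE = """version: 8.4.1 (api:1/proto:86-100)
 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452
"""

if __name__ == "__main__":
    for minor, st in parse_proc_drbd(SAMPLE).items():
        print(minor, st["cs"], "roles:", st["ro"], "disks:", st["ds"])
```

<div>On the sample, the parsed fields show a SyncSource connection while both roles are still Secondary, which is the situation described above.</div>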
<div style="font-size:13px"><br></div><div style="font-size:13px">This is what the current status is on cllakvm1 (secondary)</div><div style="font-size:13px">==============</div><div><div># service drbd status</div><div>
drbd driver loaded OK; device status:</div>
<div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div><div>m:res cs ro ds p mounted fstype</div>
<div>1:r0 Connected Secondary/Secondary Inconsistent/Inconsistent C</div><div><br></div><div># cat /proc/drbd </div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
<div><br></div><div> 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452</div></div><div style="font-size:13px">
===========</div><div style="font-size:13px"><br></div><div style="font-size:13px">On the primary, cllakvm2, this happens:</div><div style="font-size:13px"><br></div><div style="font-size:13px">=============</div><div style="font-size:13px">
<br></div><div><div># service drbd status</div><div>drbd driver loaded OK; device status:</div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
</div><div><b>< HANGS HERE ></b></div><div><br></div><div><div># cat /proc/drbd </div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
<div><br></div><div> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452</div><div> [>....................] sync'ed: 0.1% (1106300/1106300)M</div>
<div> finish: 756809:02:32 speed: 0 (0) K/sec</div></div></span></div><div><br></div><div>====================</div><div><span style="background-color:rgb(247,245,242)"><div style="font-size:13px"><br></div><div style="font-size:13px">
=========</div><div style="font-size:13px"><div># ps aux | grep drbd</div><div>root 1099 0.0 0.0 4140 512 pts/10 D 16:21 0:00 drbdsetup sh-status 1</div><div>root 1560 0.0 0.0 103220 872 pts/10 S+ 16:40 0:00 grep drbd</div>
<div>root 4484 0.0 0.0 4140 508 pts/10 D 16:23 0:00 drbdsetup sh-status 1</div><div>root 6542 0.0 0.0 4140 512 pts/10 D 16:24 0:00 drbdsetup primary 1 --force</div><div>root 7959 0.0 0.0 4140 512 pts/10 D 16:25 0:00 drbdsetup down r0</div>
<div>root 10581 0.0 0.0 4140 512 pts/10 D 16:27 0:00 drbdsetup disconnect ipv4:128.194.115.76:7789 ipv4:165.91.253.227:7789</div>
<div>root 10783 0.0 0.0 4140 512 pts/10 D 16:27 0:00 drbdsetup disconnect ipv4:128.194.115.76:7789 ipv4:165.91.253.227:7789 --force</div>
<div>root 12652 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_w_r0]</div><div>root 12654 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_r_r0]</div><div>root 12659 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_a_r0]</div>
<div>root 26059 0.0 0.0 11284 664 pts/10 S 16:36 0:00 /bin/bash /etc/init.d/drbd status</div><div>root 26062 0.0 0.0 4140 508 pts/10 D 16:36 0:00 drbdsetup sh-status 1</div><div>root 27570 0.0 0.0 11284 664 pts/10 S 16:37 0:00 /bin/bash /etc/init.d/drbd status</div>
<div>root 27573 0.0 0.0 4140 512 pts/10 D 16:37 0:00 drbdsetup sh-status 1</div><div>root 32255 0.0 0.0 4140 552 pts/10 D 16:20 0:00 drbdsetup r0 down</div></div><div style="font-size:13px">
==============</div><div style="font-size:13px"><br></div><div>Any advice is very welcome. I'm having a mild panic attack: the VMs were paused just long enough to resize the filesystem for the internal metadata, but now I can't remount /vmstore and the VMs can't be started back up.</div>
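<div>The D in the STAT column of the ps output above means uninterruptible sleep: the process is blocked inside a kernel call (presumably in the DRBD module here) and typically does not act on signals, including the SIGKILL sent by kill -9, until that call returns. A minimal Python sketch that lists such processes by reading /proc/PID/stat; parse_stat and uninterruptible_processes are hypothetical helper names:</div>

```python
import os


def parse_stat(stat_text):
    """Split a /proc/PID/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the LAST ')'."""
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 2]
    return pid, comm, state


def uninterruptible_processes(proc="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished mid-scan, or unexpected format
        if state == "D":
            yield pid, comm


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```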
<div style="font-size:13px"><br></div><div style="font-size:13px">Thanks</div><span class="HOEnZb"><font color="#888888"><div style="font-size:13px">- Trey</div></font></span></span></div>
</blockquote></div><br><div>Sorry to reply to my own post. I got rid of the processes stuck in uninterruptible sleep by rebooting, but now the problematic system can't attach to the DRBD resource. I've switched from drbd84 to drbd83 from ELRepo and the problem is the same.</div>
<div>================</div><div><div># drbdadm attach r0</div><div>0: Failure: (104) Can not open backing device.</div><div>Command 'drbdsetup 0 disk /dev/vg_cllakvm2/lv_vmstore /dev/vg_cllakvm2/lv_vmstore internal --set-defaults --create-device' terminated with exit code 10</div>
</div><div><br></div><div>The status of drbd on cllakvm2 (primary w/ data).</div><div>================</div><div><div># cat /proc/drbd </div><div>version: 8.3.12 (api:88/proto:86-96)</div><div>GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03</div>
<div> 0: cs:Connected ro:Secondary/Secondary ds:Diskless/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0</div></div><div><br></div><div><br></div><div><br></div><div>Using drbd84 gave the same "Can not open backing device" with exit code 10.</div>
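<div>Error 104 with exit code 10 means drbdsetup could not open the backing LV. DRBD needs exclusive access to its backing device, so one possible cause worth ruling out (an assumption, not something established in this thread) is that something else still holds the LV, for example a filesystem mounted from it after the reboot. A minimal Python sketch that checks /proc/mounts for the disk path from r0.res; block_device_id and mount_points_for are hypothetical helper names:</div>

```python
import os
import stat


def block_device_id(path):
    """Device number (st_rdev) if path is a block device node, else None.
    os.stat follows symlinks such as /dev/VG/LV -> ../dm-N."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    return st.st_rdev if stat.S_ISBLK(st.st_mode) else None


def mount_points_for(device, mounts_text):
    """Mount points in /proc/mounts-style text whose source is device."""
    dev_id = block_device_id(device)
    dev_path = os.path.realpath(device)
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            source, mount_point = fields[0], fields[1]
            same_node = dev_id is not None and block_device_id(source) == dev_id
            if same_node or os.path.realpath(source) == dev_path:
                hits.append(mount_point)
    return hits


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    backing = "/dev/vg_cllakvm2/lv_vmstore"  # disk path from r0.res
    with open("/proc/mounts") as f:
        found = mount_points_for(backing, f.read())
    print(found or "backing device is not mounted")
```

<div>/proc/mounts usually lists LVs under their /dev/mapper/VG-LV name, which is why the sketch compares device numbers rather than raw path strings.</div>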
<div><br></div><div>The strange part is that these systems are identical in every way except that their volume groups are named after their hostnames. The DRBD setup is identical too, since I'm managing it with Puppet as well. Any advice on how to troubleshoot or resolve this?</div>
<div><br></div><div>Thanks</div><div>- Trey</div><div><br></div>