I'm in the process of setting up DRBD from scratch for the first time. The first phase of my project is simply to sync one virtual server to another in primary/secondary with no intention to facilitate failover. <span style="font-size:13px;background-color:rgb(247,245,242)">The primary (cllakvm2) has 7 VMs in production on it. The secondary (cllakvm1) has the exact same size LVM with nothing currently stored there.</span><div>
<br></div><div>I've created both resources, but strangely, once I created the resource on my "primary" and started the DRBD daemon, it began to sync before I ran "<span style="background-color:rgb(247,245,242)">drbdadm primary --force resource". Now every command I attempt to run just hangs. I've also tried "kill -9 <pid>" on the processes with no luck. I also can't remount the /vmstore partition (the LV where all virtual disks live). </span><span style="background-color:rgb(247,245,242);font-size:13px">I've tried drbdadm down r0 and drbdadm disconnect --force r0, but nothing will stop the processes, which hang and never exit. The list of drbd processes is towards the bottom of this email.</span></div>
<div><span style="background-color:rgb(247,245,242);font-size:13px"><br></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">This is on CentOS 6 with DRBD 8.4.1. Here are my relevant configs:</span></div>
<div><span style="background-color:rgb(247,245,242);font-size:13px"><br></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">global_common.conf</span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">------------</span></div>
<div><span style="background-color:rgb(247,245,242)"><div><div>global {</div><div> usage-count no;</div><div> # minor-count dialog-refresh disable-ip-verification</div><div>}</div><div><br></div><div>common {</div>
<div> handlers {</div><div> # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div><div>
# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";</div><div> # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";</div>
<div> # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";</div><div> # split-brain "/usr/lib/drbd/notify-split-brain.sh root";</div><div> # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";</div>
<div> # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";</div><div> # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;</div><div>
}</div><div><br></div><div> startup {</div><div> # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb</div><div> }</div><div><br></div><div> options {</div><div> # cpu-mask on-no-data-accessible</div>
<div> }</div><div><br></div><div> disk {</div><div> # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes</div><div> # disk-drain md-flushes resync-rate resync-after al-extents</div>
<div> # c-plan-ahead c-delay-target c-fill-target c-max-rate</div><div> # c-min-rate disk-timeout</div><div> }</div><div><br></div><div> net {</div><div> # protocol timeout max-epoch-size max-buffers unplug-watermark</div>
<div> # connect-int ping-int sndbuf-size rcvbuf-size ko-count</div><div> # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri</div><div> # after-sb-1pri after-sb-2pri always-asbp rr-conflict</div>
<div> # ping-timeout data-integrity-alg tcp-cork on-congestion</div><div> # congestion-fill congestion-extents csums-alg verify-alg</div><div> # use-rle</div><div><br></div><div>
protocol C;</div><div> }</div><div>}</div></div><div style="font-size:13px"><br></div></span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">r0.res</span></div><div><span style="background-color:rgb(247,245,242);font-size:13px">===============</span></div>
<div><span style="background-color:rgb(247,245,242)"><div style="font-size:13px">resource r0 {</div><div style="font-size:13px"> on <a href="http://cllakvm2.tamu.edu">cllakvm2.tamu.edu</a> {</div><div style="font-size:13px">
device /dev/drbd1;</div><div style="font-size:13px"> disk /dev/vg_cllakvm2/lv_vmstore;</div><div style="font-size:13px"> address <a href="http://128.194.115.76:7789">128.194.115.76:7789</a>;</div><div style="font-size:13px">
meta-disk internal;</div><div style="font-size:13px"> }</div><div style="font-size:13px"> on <a href="http://cllakvm1.tamu.edu">cllakvm1.tamu.edu</a> {</div><div style="font-size:13px"> device /dev/drbd1;</div>
<div style="font-size:13px"> disk /dev/vg_cllakvm1/lv_vmstore;</div><div style="font-size:13px"> address <a href="http://165.91.253.227:7789">165.91.253.227:7789</a>;</div><div style="font-size:13px"> meta-disk internal;</div>
<div style="font-size:13px"> }</div><div style="font-size:13px">}</div><div style="font-size:13px">=================</div><div style="font-size:13px"><br></div><div style="font-size:13px">Since both servers were built before DRBD was configured, I had to shrink the filesystem in each LV by about 70M. I then ran "drbdadm create-md r0". When I tried to start the drbd service on cllakvm2, I got the following:</div>
<div style="font-size:13px">==============</div><div style="font-size:13px"><div># service drbd start</div><div>Starting DRBD resources: [</div><div> create res: r0</div><div> prepare disk: r0</div><div> adjust disk: r0:failed(attach:10)</div>
<div> adjust net: r0</div><div>]</div><div>.</div></div><div style="font-size:13px">=============</div><div style="font-size:13px"><br></div><div style="font-size:13px">I then unmounted "/vmstore" (with all VMs stopped), re-ran create-md, and then restarted drbd, which produced no errors. Right after that, no drbdadm commands would respond, and I saw from "/proc/drbd" that the status showed syncing, without it yet being promoted to primary.</div>
<div style="font-size:13px"><br></div><div style="font-size:13px">This is the current status on cllakvm1 (secondary):</div><div style="font-size:13px">==============</div><div><div># service drbd status</div><div>drbd driver loaded OK; device status:</div>
<div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div><div>m:res cs ro ds p mounted fstype</div>
<div>1:r0 Connected Secondary/Secondary Inconsistent/Inconsistent C</div><div><br></div><div># cat /proc/drbd </div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
<div><br></div><div> 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452</div></div><div style="font-size:13px">
===========</div><div style="font-size:13px"><br></div><div style="font-size:13px">On the primary, cllakvm2, this happens</div><div style="font-size:13px"><br></div><div style="font-size:13px">=============</div><div style="font-size:13px">
<br></div><div><div># service drbd status</div><div>drbd driver loaded OK; device status:</div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
</div><div><b>< HANGS HERE ></b></div><div><br></div><div><div># cat /proc/drbd </div><div>version: 8.4.1 (api:1/proto:86-100)</div><div>GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50</div>
<div><br></div><div> 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----</div><div> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452</div><div> [>....................] sync'ed: 0.1% (1106300/1106300)M</div>
<div> finish: 756809:02:32 speed: 0 (0) K/sec</div></div></span></div><div><br></div><div>====================</div><div><span style="background-color:rgb(247,245,242)"><div style="font-size:13px"><br></div><div style="font-size:13px">
=========</div><div style="font-size:13px"><div># ps aux | grep drbd</div><div>root 1099 0.0 0.0 4140 512 pts/10 D 16:21 0:00 drbdsetup sh-status 1</div><div>root 1560 0.0 0.0 103220 872 pts/10 S+ 16:40 0:00 grep drbd</div>
<div>root 4484 0.0 0.0 4140 508 pts/10 D 16:23 0:00 drbdsetup sh-status 1</div><div>root 6542 0.0 0.0 4140 512 pts/10 D 16:24 0:00 drbdsetup primary 1 --force</div><div>root 7959 0.0 0.0 4140 512 pts/10 D 16:25 0:00 drbdsetup down r0</div>
<div>root 10581 0.0 0.0 4140 512 pts/10 D 16:27 0:00 drbdsetup disconnect ipv4:<a href="http://128.194.115.76:7789">128.194.115.76:7789</a> ipv4:<a href="http://165.91.253.227:7789">165.91.253.227:7789</a></div>
<div>root 10783 0.0 0.0 4140 512 pts/10 D 16:27 0:00 drbdsetup disconnect ipv4:<a href="http://128.194.115.76:7789">128.194.115.76:7789</a> ipv4:<a href="http://165.91.253.227:7789">165.91.253.227:7789</a> --force</div>
<div>root 12652 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_w_r0]</div><div>root 12654 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_r_r0]</div><div>root 12659 0.0 0.0 0 0 ? S 16:09 0:00 [drbd_a_r0]</div>
<div>root 26059 0.0 0.0 11284 664 pts/10 S 16:36 0:00 /bin/bash /etc/init.d/drbd status</div><div>root 26062 0.0 0.0 4140 508 pts/10 D 16:36 0:00 drbdsetup sh-status 1</div><div>root 27570 0.0 0.0 11284 664 pts/10 S 16:37 0:00 /bin/bash /etc/init.d/drbd status</div>
<div>root 27573 0.0 0.0 4140 512 pts/10 D 16:37 0:00 drbdsetup sh-status 1</div><div>root 32255 0.0 0.0 4140 552 pts/10 D 16:20 0:00 drbdsetup r0 down</div></div><div style="font-size:13px">
==============</div><div style="font-size:13px"><br></div><div>Any advice is very welcome. I'm having a mild panic attack because the VMs were paused long enough to resize the filesystem to allow for internal metadata, but now I can't remount /vmstore and the VMs can't be started back up.</div>
<div style="font-size:13px"><br></div><div style="font-size:13px">Thanks</div><div style="font-size:13px">- Trey</div></span></div>