Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I'm in the process of setting up DRBD from scratch for the first time. The
first phase of my project is simply to sync one virtual server to another
in primary/secondary, with no intention to facilitate failover. The primary
(cllakvm2) has 7 VMs in production on it. The secondary (cllakvm1) has an
LV of the exact same size with nothing currently stored on it.

I've created the resource on both nodes, but what's strange is that once I
created it on my "primary" and started the DRBD daemon, it began to sync
before I ever ran "drbdadm primary --force r0". Now every command I attempt
to run just hangs. I've also tried "kill -9 <pid>" on the processes with no
luck, and I can't remount the /vmstore partition (the LV where all the
virtual disks live). I've tried "drbdadm down r0" and "drbdadm disconnect
--force r0", and nothing will stop the processes, which just hang and never
exit. The process list of the drbd processes is toward the bottom of this
email.
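In case it helps, besides the plain "ps aux | grep drbd" at the bottom, this
is roughly how I've been checking which of the stuck commands are in
uninterruptible sleep (plain ps, nothing DRBD-specific):
==============
# list processes stuck in uninterruptible sleep (STAT starts with "D");
# these are blocked inside the kernel, which is why kill -9 has no effect
ps -eo pid,stat,wchan,cmd | awk '$2 ~ /^D/'
==============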
This is on CentOS 6 with DRBD 8.4.1. Here are my relevant configs:
global_common.conf
------------
global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    handlers {
        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    options {
        # cpu-mask on-no-data-accessible
    }
    disk {
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
    }
    net {
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
        protocol C;
    }
}
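One thing I notice is that I never set any resync rate; if that could matter
here, this is what I was planning to add to the common disk section once
things are working again (my guess at the 8.4 syntax, not something I've
tested):
==============
disk {
    # cap the background resync so it doesn't saturate the link between
    # the two subnets -- the 40M value is only a guess
    resync-rate 40M;
}
==============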
r0.res
===============
resource r0 {
    on cllakvm2.tamu.edu {
        device    /dev/drbd1;
        disk      /dev/vg_cllakvm2/lv_vmstore;
        address   128.194.115.76:7789;
        meta-disk internal;
    }
    on cllakvm1.tamu.edu {
        device    /dev/drbd1;
        disk      /dev/vg_cllakvm1/lv_vmstore;
        address   165.91.253.227:7789;
        meta-disk internal;
    }
}
=================
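For what it's worth, this is how I've been sanity-checking that both files
parse the way I think they do (just the standard dump):
==============
# show the configuration exactly as drbdadm parses it
drbdadm dump r0
==============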
Since both LVs were built before the DRBD configuration, I had to shrink the
filesystem on each LV by about 70M to make room for the internal metadata.
I then ran "drbdadm create-md r0". When I tried to start the drbd service on
cllakvm2 I got the following:
==============
# service drbd start
Starting DRBD resources: [
create res: r0
prepare disk: r0
adjust disk: r0:failed(attach:10)
adjust net: r0
]
.
=============
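In hindsight I assume the attach failed because the LV was still mounted;
something like this is the check I should have run first (paths are from my
setup):
==============
# make sure nothing still holds the backing LV open before attaching
mount | grep vmstore
fuser -vm /vmstore
lsof /dev/vg_cllakvm2/lv_vmstore
==============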
I then unmounted "/vmstore" (with all VMs stopped), re-ran create-md, and
then restarted drbd, which produced no errors. Right after that, no drbdadm
commands would respond, and I saw from "/proc/drbd" that the status showed
syncing, without it yet having been promoted to primary.
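In other words, the sequence on cllakvm2 was roughly:
==============
umount /vmstore
drbdadm create-md r0
service drbd start
==============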
This is the current status on cllakvm1 (the secondary):
==============
# service drbd status
drbd driver loaded OK; device status:
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50
m:res  cs         ro                   ds                         p  mounted  fstype
1:r0   Connected  Secondary/Secondary  Inconsistent/Inconsistent  C
# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452
===========
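My understanding of how the initial sync is supposed to be kicked off (and
what I expected to have to run by hand) was roughly this, on the primary
only; please correct me if that's wrong:
==============
# on cllakvm2 only: make this node primary and start the initial full sync
drbdadm primary --force r0
# then watch the sync progress
watch -n5 cat /proc/drbd
==============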
On the primary, cllakvm2, this happens:
=============
# service drbd status
drbd driver loaded OK; device status:
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50
*< HANGS HERE >*
# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by dag@Build64R6, 2011-12-21 06:08:50
 1: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1132853452
        [>....................] sync'ed:  0.1% (1106300/1106300)M
        finish: 756809:02:32 speed: 0 (0) K/sec
====================
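Since the speed shows 0 (0) K/sec even though both sides report Connected, I
also did some basic checks on the replication link on port 7789 between the
two boxes (they sit on different subnets), along these lines:
==============
# is there an established TCP connection for the replication link?
netstat -tn | grep 7789
# any firewall rules that could be blocking or stalling it?
iptables -L -n | grep 7789
==============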
=========
# ps aux | grep drbd
root      1099  0.0  0.0   4140  512 pts/10   D    16:21   0:00 drbdsetup sh-status 1
root      1560  0.0  0.0 103220  872 pts/10   S+   16:40   0:00 grep drbd
root      4484  0.0  0.0   4140  508 pts/10   D    16:23   0:00 drbdsetup sh-status 1
root      6542  0.0  0.0   4140  512 pts/10   D    16:24   0:00 drbdsetup primary 1 --force
root      7959  0.0  0.0   4140  512 pts/10   D    16:25   0:00 drbdsetup down r0
root     10581  0.0  0.0   4140  512 pts/10   D    16:27   0:00 drbdsetup disconnect ipv4:128.194.115.76:7789 ipv4:165.91.253.227:7789
root     10783  0.0  0.0   4140  512 pts/10   D    16:27   0:00 drbdsetup disconnect ipv4:128.194.115.76:7789 ipv4:165.91.253.227:7789 --force
root     12652  0.0  0.0      0    0 ?        S    16:09   0:00 [drbd_w_r0]
root     12654  0.0  0.0      0    0 ?        S    16:09   0:00 [drbd_r_r0]
root     12659  0.0  0.0      0    0 ?        S    16:09   0:00 [drbd_a_r0]
root     26059  0.0  0.0  11284  664 pts/10   S    16:36   0:00 /bin/bash /etc/init.d/drbd status
root     26062  0.0  0.0   4140  508 pts/10   D    16:36   0:00 drbdsetup sh-status 1
root     27570  0.0  0.0  11284  664 pts/10   S    16:37   0:00 /bin/bash /etc/init.d/drbd status
root     27573  0.0  0.0   4140  512 pts/10   D    16:37   0:00 drbdsetup sh-status 1
root     32255  0.0  0.0   4140  552 pts/10   D    16:20   0:00 drbdsetup r0 down
==============
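Most of those drbdsetup processes are in D state, which I assume is why
kill -9 does nothing. If it would help with diagnosis, I can dump the
blocked task stacks to the kernel log with something like:
==============
# ask the kernel to log stack traces of all tasks in uninterruptible sleep
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
==============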
Any advice is greatly welcome. I'm having a mild panic attack because the
VMs were only paused long enough to resize the filesystem to allow for the
internal meta-disk, but now I can't remount the LV and the VMs can't be
started back up.
Thanks
- Trey