Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Jul 03, 2014 at 04:05:36AM +0200, Giuseppe Ragusa wrote:
> Hi all,
> I deployed a 2-node (physical) RHCS Pacemaker cluster on CentOS 6.5 x86_64 (fully up-to-date) with:
>
> cman-3.0.12.1-59.el6_5.2.x86_64
> pacemaker-1.1.10-14.el6_5.3.x86_64
> pcs-0.9.90-2.el6.centos.3.noarch
> qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
> qemu-kvm-tools-0.12.1.2-2.415.el6_5.10.x86_64
> drbd-utils-8.9.0-1.el6.x86_64
> drbd-udev-8.9.0-1.el6.x86_64
> drbd-rgmanager-8.9.0-1.el6.x86_64
> drbd-bash-completion-8.9.0-1.el6.x86_64
> drbd-pacemaker-8.9.0-1.el6.x86_64
> drbd-8.9.0-1.el6.x86_64
> drbd-km-2.6.32_431.20.3.el6.x86_64-8.4.5-1.x86_64
> kernel-2.6.32-431.20.3.el6.x86_64
>
> The aim is to run KVM virtual machines backed by DRBD (8.4.5) in an
> active/passive mode (no dual primary and so no live migration).
>
> Just to err on the side of consistency against HA (and to pave the way
> for a possible dual-primary, live-migration-capable setup), I
> configured DRBD for resource-and-stonith with rhcs_fence (that's why I
> installed drbd-rgmanager) as fence-peer handler and stonith devices
> configured in Pacemaker (pcmk-redirect in cluster.conf).
>
> The setup "almost" works (all seems ok with: "pcs status", "crm_mon -Arf1",
> "corosync-cfgtool -s", "corosync-objctl | grep member"), but every time it
> needs a resource promotion (to Master, i.e. becoming primary) it either
> fails or fences the other node (the one supposed to become Slave, i.e.
> secondary) and only then succeeds.
>
> It happens, for example, both on initial resource definition (when
> attempting the first start) and on a node entering standby (when trying to
> automatically move the resources by stopping then starting them).
>
> I collected a full "pcs cluster report" and I can provide a CIB dump,
> but I will initially paste here an excerpt from my configuration, just
> in case it happens to be a simple configuration error that someone can
> spot on the fly ;> (hoping...)
>
> Keep in mind that the setup has separate redundant network connections
> for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s roundrobin
> back-to-back) and DRBD (10 Gib/s roundrobin back-to-back), and that
> FQDNs are correctly resolved through /etc/hosts.

Make sure your DRBD resources are "Connected UpToDate/UpToDate" before you
let the cluster take over control of who is master.
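For example (a minimal check, assuming the DRBD 8.4 command-line tools and the
dc_vm resource from the configuration quoted below), on both nodes before the
first promotion attempt:

  # connection state should report "Connected"
  drbdadm cstate dc_vm
  # disk states should report "UpToDate/UpToDate"
  drbdadm dstate dc_vm
  # or check everything at once; you want something like
  # "cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate"
  cat /proc/drbd

If that still shows WFConnection, Inconsistent, or a resync in progress,
sort that out first, and only then let Pacemaker decide who gets promoted.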
> DRBD:
>
> /etc/drbd.d/global_common.conf:
>
> ------------------------------------------------------------------------------------------------------
>
> global {
>   usage-count no;
> }
>
> common {
>   protocol C;
>   disk {
>     on-io-error detach;
>     fencing resource-and-stonith;
>     disk-barrier no;
>     disk-flushes no;
>     al-extents 3389;
>     c-plan-ahead 200;
>     c-fill-target 15M;
>     c-max-rate 100M;
>     c-min-rate 10M;
>   }
>   net {
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>     csums-alg sha1;
>     data-integrity-alg sha1;
>     max-buffers 8000;
>     max-epoch-size 8000;
>     unplug-watermark 16;
>     sndbuf-size 0;
>     verify-alg sha1;
>   }
>   startup {
>     wfc-timeout 300;
>     outdated-wfc-timeout 80;
>     degr-wfc-timeout 120;
>   }
>   handlers {
>     fence-peer "/usr/lib/drbd/rhcs_fence";
>   }
> }
>
> ------------------------------------------------------------------------------------------------------
>
> Sample DRBD resource (there are others, similar)
> /etc/drbd.d/dc_vm.res:
>
> ------------------------------------------------------------------------------------------------------
>
> resource dc_vm {
>   device /dev/drbd1;
>   disk /dev/VolGroup00/dc_vm;
>   meta-disk internal;
>   on cluster1.verolengo.privatelan {
>     address ipv4 172.16.200.1:7790;
>   }
>   on cluster2.verolengo.privatelan {
>     address ipv4 172.16.200.2:7790;
>   }
> }
>
> ------------------------------------------------------------------------------------------------------
>
> RHCS:
>
> /etc/cluster/cluster.conf
>
> ------------------------------------------------------------------------------------------------------
>
> <?xml version="1.0"?>
> <cluster name="vclu" config_version="14">
>   <cman two_node="1" expected_votes="1" keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
>   <totem consensus="60000" join="6000" token="100000" token_retransmits_before_loss_const="20" rrp_mode="passive" secauth="on"/>
>   <clusternodes>
>     <clusternode name="cluster1.verolengo.privatelan" votes="1" nodeid="1">
>       <altname name="clusterlan1.verolengo.privatelan" port="6405"/>
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="cluster1.verolengo.privatelan"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="cluster2.verolengo.privatelan" votes="1" nodeid="2">
>       <altname name="clusterlan2.verolengo.privatelan" port="6405"/>
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="cluster2.verolengo.privatelan"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk"/>
>   </fencedevices>
>   <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
>   <logging debug="on"/>
>   <rm disabled="1">
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>
> ------------------------------------------------------------------------------------------------------
>
> Pacemaker:
>
> PROPERTIES:
>
> pcs property set default-resource-stickiness=100
> pcs property set no-quorum-policy=ignore
>
> STONITH:
>
> pcs stonith create ilocluster1 fence_ilo2 action="off" delay="10" \
>   ipaddr="ilocluster1.verolengo.privatelan" login="cluster2" passwd="test" power_wait="4" \
>   pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan" op monitor interval=60s
> pcs stonith create ilocluster2 fence_ilo2 action="off" \
>   ipaddr="ilocluster2.verolengo.privatelan" login="cluster1" passwd="test" power_wait="4" \
>   pcmk_host_check="static-list" pcmk_host_list="cluster2.verolengo.privatelan" op monitor interval=60s
> pcs stonith create pdu1 fence_apc action="off" \
>   ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
>   pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
>   pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" op monitor interval=60s
>
> pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
> pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
> pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
> pcs stonith level add 2 cluster2.verolengo.privatelan pdu1
>
> pcs property set stonith-enabled=true
> pcs property set stonith-action=off
>
> SAMPLE RESOURCE:
>
> pcs cluster cib dc_cfg
> pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
>   drbd_resource=dc_vm op monitor interval="31s" role="Master" \
>   op monitor interval="29s" role="Slave" \
>   op start interval="0" timeout="120s" \
>   op stop interval="0" timeout="180s"
> pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
>   master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>   notify=true target-role=Started is-managed=true
> pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
>   config=/etc/libvirt/qemu/dc.xml migration_transport=tcp migration_network_suffix=-10g \
>   hypervisor=qemu:///system meta allow-migrate=false target-role=Started is-managed=true \
>   op start interval="0" timeout="120s" \
>   op stop interval="0" timeout="120s" \
>   op monitor interval="60s" timeout="120s"
> pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY with-rsc-role=Master
> pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
> pcs -f dc_cfg constraint location DCVM prefers cluster2.verolengo.privatelan=50
> pcs cluster cib-push firewall_cfg
>
> Since I know that pcs still has some rough edges, I installed crmsh too, but never actually used it.
>
> Many thanks in advance for your attention.
>
> Kind regards,
> Giuseppe Ragusa
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed