Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

I deployed a two-node (physical) RHCS Pacemaker cluster on CentOS 6.5 x86_64 (fully up to date) with:

cman-3.0.12.1-59.el6_5.2.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pcs-0.9.90-2.el6.centos.3.noarch
qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
qemu-kvm-tools-0.12.1.2-2.415.el6_5.10.x86_64
drbd-utils-8.9.0-1.el6.x86_64
drbd-udev-8.9.0-1.el6.x86_64
drbd-rgmanager-8.9.0-1.el6.x86_64
drbd-bash-completion-8.9.0-1.el6.x86_64
drbd-pacemaker-8.9.0-1.el6.x86_64
drbd-8.9.0-1.el6.x86_64
drbd-km-2.6.32_431.20.3.el6.x86_64-8.4.5-1.x86_64
kernel-2.6.32-431.20.3.el6.x86_64

The aim is to run KVM virtual machines backed by DRBD (8.4.5) in active/passive mode (no dual-primary, so no live migration). To err on the side of consistency over availability (and to pave the way for a possible dual-primary, live-migration-capable setup), I configured DRBD with "fencing resource-and-stonith" and rhcs_fence as the fence-peer handler (that's why I installed drbd-rgmanager), with the STONITH devices configured in Pacemaker (pcmk-redirect in cluster.conf).

The setup "almost" works: everything looks fine in "pcs status", "crm_mon -Arf1", "corosync-cfgtool -s" and "corosync-objctl | grep member". However, every time a resource needs to be promoted to Master (i.e. to become DRBD primary), the promotion either fails or fences the other node (the one supposed to become Slave, i.e. secondary) and only then succeeds. This happens, for example, both on initial resource definition (when the resource is first started) and when a node enters standby (when the resources are automatically moved by stopping and then starting them).

I collected a full "pcs cluster report" and I can provide a CIB dump, but I will start by pasting an excerpt of my configuration here, just in case it happens to be a simple configuration error that someone can spot on the fly ;> (hoping...)
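For reference, the state checks I run on both nodes around a promotion attempt look roughly like this (just a sketch: the drbdadm and grep lines are extra checks beyond the commands listed above, and the log path assumes the stock CentOS 6 rsyslog setup):

# cluster membership and resource status
pcs status
crm_mon -Arf1
corosync-cfgtool -s
corosync-objctl | grep member

# DRBD connection state, roles and disk state for the resource being promoted
cat /proc/drbd
drbdadm cstate dc_vm
drbdadm role dc_vm
drbdadm dstate dc_vm

# look for fence-peer handler and stonith activity around the failed promotion
# (assumes the default syslog destination /var/log/messages)
grep -E 'rhcs_fence|fence_pcmk|stonith' /var/log/messages | tail -n 50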
Keep in mind that the setup has separate, redundant network connections for the LAN (1 Gbit/s LACP to the switches), Corosync (1 Gbit/s round-robin, back-to-back) and DRBD (10 Gbit/s round-robin, back-to-back), and that FQDNs are correctly resolved through /etc/hosts.

DRBD:

/etc/drbd.d/global_common.conf:
------------------------------------------------------------------------------------------------------
global {
    usage-count no;
}
common {
    protocol C;
    disk {
        on-io-error detach;
        fencing resource-and-stonith;
        disk-barrier no;
        disk-flushes no;
        al-extents 3389;
        c-plan-ahead 200;
        c-fill-target 15M;
        c-max-rate 100M;
        c-min-rate 10M;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        csums-alg sha1;
        data-integrity-alg sha1;
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 16;
        sndbuf-size 0;
        verify-alg sha1;
    }
    startup {
        wfc-timeout 300;
        outdated-wfc-timeout 80;
        degr-wfc-timeout 120;
    }
    handlers {
        fence-peer "/usr/lib/drbd/rhcs_fence";
    }
}
------------------------------------------------------------------------------------------------------

Sample DRBD resource (there are others, similar), /etc/drbd.d/dc_vm.res:
------------------------------------------------------------------------------------------------------
resource dc_vm {
    device /dev/drbd1;
    disk /dev/VolGroup00/dc_vm;
    meta-disk internal;
    on cluster1.verolengo.privatelan {
        address ipv4 172.16.200.1:7790;
    }
    on cluster2.verolengo.privatelan {
        address ipv4 172.16.200.2:7790;
    }
}
------------------------------------------------------------------------------------------------------

RHCS:

/etc/cluster/cluster.conf:
------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster name="vclu" config_version="14">
  <cman two_node="1" expected_votes="1" keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
  <totem consensus="60000" join="6000" token="100000" token_retransmits_before_loss_const="20" rrp_mode="passive" secauth="on"/>
  <clusternodes>
    <clusternode name="cluster1.verolengo.privatelan" votes="1" nodeid="1">
      <altname name="clusterlan1.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster1.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cluster2.verolengo.privatelan" votes="1" nodeid="2">
      <altname name="clusterlan2.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster2.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
  <logging debug="on"/>
  <rm disabled="1">
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
------------------------------------------------------------------------------------------------------

Pacemaker:

PROPERTIES:

pcs property set default-resource-stickiness=100
pcs property set no-quorum-policy=ignore
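As a side note, before relying on the fence devices below I sanity-check the cman-to-Pacemaker redirection (the pcmk-redirect method above) with read-only commands along these lines (only a sketch; nothing here triggers an actual fence):

# validate cluster.conf against the schema
ccs_config_validate

# cman membership as seen by the local node
cman_tool status
cman_tool nodes

# stonith devices as Pacemaker sees them
pcs stonith show
stonith_admin -L

# check the live CIB for configuration errors
crm_verify -L -V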
STONITH:

pcs stonith create ilocluster1 fence_ilo2 action="off" delay="10" \
    ipaddr="ilocluster1.verolengo.privatelan" login="cluster2" passwd="test" power_wait="4" \
    pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan" \
    op monitor interval=60s

pcs stonith create ilocluster2 fence_ilo2 action="off" \
    ipaddr="ilocluster2.verolengo.privatelan" login="cluster1" passwd="test" power_wait="4" \
    pcmk_host_check="static-list" pcmk_host_list="cluster2.verolengo.privatelan" \
    op monitor interval=60s

pcs stonith create pdu1 fence_apc action="off" \
    ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
    pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
    pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" \
    op monitor interval=60s

pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
pcs stonith level add 2 cluster2.verolengo.privatelan pdu1

pcs property set stonith-enabled=true
pcs property set stonith-action=off

SAMPLE RESOURCE:

pcs cluster cib dc_cfg

pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
    drbd_resource=dc_vm op monitor interval="31s" role="Master" \
    op monitor interval="29s" role="Slave" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="180s"

pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true target-role=Started is-managed=true

pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/dc.xml migration_transport=tcp migration_network_suffix=-10g \
    hypervisor=qemu:///system meta allow-migrate=false target-role=Started is-managed=true \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="60s" timeout="120s"

pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY with-rsc-role=Master
pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
pcs -f dc_cfg constraint location DCVM prefers cluster2.verolengo.privatelan=50

pcs cluster cib-push dc_cfg

Since I know that pcs still has some rough edges, I also installed crmsh, but I have never actually used it.

Many thanks in advance for your attention.

Kind regards,
Giuseppe Ragusa