[DRBD-user] KVM on multiple DRBD resources + IPMI stonith - questions

Helmut Wollmersdorfer helmut.wollmersdorfer at fixpunkt.de
Fri Jul 1 15:43:36 CEST 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 24.06.2011 at 16:58, Whit Blauvelt wrote:

> I truly appreciate that, Pete. What I've been hoping to find is
> distribution-neutral models for handling KVM failover, where each
> KVM VM is directly on top of a dedicated DRBD resource - which is a
> setup with an ideal granularity, as compared to having multiple VMs
> sharing a single DRBD resource. Because the granularity is
> right-sized, it greatly simplifies the possible failure modes, and
> what should be needed for reliable failover. For example, if it were
> one shared DRBD resource for many VMs running across two hosts, it
> would need to run primary-primary, with CLVM and a clustering file
> system and the full pacemaker-corosync/heartbeat treatment. I get
> that. But with dedicated, per-VM DRBD resources, each can be run
> primary-secondary (if you don't mind a few seconds down during
> migration - which will be there in failover in any case), so there's
> no need for CLVM or one of the (less mature than ext4 or xfs)
> clustering file systems in the arrangement.

ACK. Such a configuration should be common and widespread: a two-node
cluster with several VMs running criss-cross, each on a dedicated
Primary/Secondary DRBD resource.

One can do it with KVM or Xen or something similar: Primary/Primary
with live migration, or Primary/Secondary without.

IMHO it is a very common configuration.
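
For illustration, a per-VM DRBD resource for such a setup might look
roughly like this (a minimal sketch in DRBD 8.3 syntax; the resource
name matches my config below, but the device, disk, and addresses are
placeholders, not taken from my actual setup):

resource drbd1_1 {
        protocol C;
        net {
                # Only needed for Primary/Primary with live migration;
                # leave it out for plain Primary/Secondary.
                allow-two-primaries;
        }
        on xen10 {
                device    /dev/drbd1;
                disk      /dev/vg0/www;
                address   192.168.0.10:7789;
                meta-disk internal;
        }
        on xen11 {
                device    /dev/drbd1;
                disk      /dev/vg0/www;
                address   192.168.0.11:7789;
                meta-disk internal;
        }
}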

> There also should be a whole lot less needed on the
> pacemaker-corosync/heartbeat side. What I've been hoping to find is
> documentation on just enough of pacemaker-corosync/heartbeat to
> handle this simplified architecture adequately. But most of the
> documentation isn't aimed towards an architecture like this at all,
> and just about nothing I've found addresses a KVM environment.

I join in the ranting: there are a lot of docs and HOWTOs out there,
but nothing of sufficient quality.

It still takes many hours (or days) of trial and error.

Here is my crm configuration for a similar setup, Xen on dedicated
DRBD resources (shortened to two VMs):

node $id="..." xen11
node $id="..." xen10

primitive xen_cmsdb ocf:heartbeat:Xen \
         params xmfile="/etc/xen/cmsdb.cfg" \
         op monitor interval="3s" timeout="30s" \
         op start interval="0" timeout="60s" \
         op stop interval="0" timeout="40s" \
         meta target-role="Started" allow-migrate="false"
primitive xen_www ocf:heartbeat:Xen \
         params xmfile="/etc/xen/www.cfg" \
         op monitor interval="3s" timeout="30s" \
         op start interval="0" timeout="60s" \
         op stop interval="0" timeout="40s" \
         meta target-role="started" allow-migrate="false"

primitive xen_drbd1_1 ocf:linbit:drbd \
         params drbd_resource="drbd1_1" \
         op monitor interval="15s" \
         op start interval="0" timeout="240s" \
         op stop interval="0" timeout="100s"
primitive xen_drbd1_2 ocf:linbit:drbd \
         params drbd_resource="drbd1_2" \
         op monitor interval="15s" \
         op start interval="0" timeout="240s" \
         op stop interval="0" timeout="100s"
primitive xen_drbd5_1 ocf:linbit:drbd \
         params drbd_resource="drbd5_1" \
         op monitor interval="15s" \
         op start interval="0" timeout="240s" \
         op stop interval="0" timeout="100s"
primitive xen_drbd5_2 ocf:linbit:drbd \
         params drbd_resource="drbd5_2" \
         op monitor interval="15s" \
         op start interval="0" timeout="240s" \
         op stop interval="0" timeout="100s"

group group_drbd1 xen_drbd1_1 xen_drbd1_2
group group_drbd5 xen_drbd5_1 xen_drbd5_2

ms DrbdClone1 group_drbd1 \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DrbdClone5 group_drbd5 \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

location cli-prefer-xen_cmsdb xen_cmsdb \
         rule $id="cli-prefer-rule-xen_cmsdb" inf: #uname eq xen10
location cli-prefer-xen_www xen_www \
         rule $id="cli-prefer-rule-xen_www" inf: #uname eq xen11

location prefer_xen10_cmsdb xen_cmsdb 50: xen10
location prefer_xen10_www xen_www 40: xen10

location prefer_xen11_cmsdb xen_cmsdb 40: xen11
location prefer_xen11_www xen_www 50: xen11

colocation xen_cmsdb_and_drbd inf: xen_cmsdb DrbdClone5:Master
colocation xen_www_and_drbd inf: xen_www DrbdClone1:Master

order xen_cmsdb_after_drbd inf: DrbdClone5:promote xen_cmsdb:start
order xen_www_after_drbd inf: DrbdClone1:promote xen_www:start

property $id="cib-bootstrap-options" \
         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
         cluster-infrastructure="Heartbeat" \
         stonith-enabled="false" \
         no-quorum-policy="ignore" \
         last-lrm-refresh="1309523267"

rsc_defaults $id="rsc-options" \
         resource-stickiness="100"
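
Since the subject mentions IPMI stonith: note that I still run with
stonith-enabled="false" above. If you wire up IPMI fencing, a sketch
along these lines should work (untested here; the external/ipmi
plugin ships with cluster-glue, and the addresses and credentials are
placeholders):

primitive stonith_xen10 stonith:external/ipmi \
         params hostname="xen10" ipaddr="192.168.0.110" userid="admin" passwd="secret" interface="lan" \
         op monitor interval="60s" timeout="30s"
primitive stonith_xen11 stonith:external/ipmi \
         params hostname="xen11" ipaddr="192.168.0.111" userid="admin" passwd="secret" interface="lan" \
         op monitor interval="60s" timeout="30s"

location loc_stonith_xen10 stonith_xen10 -inf: xen10
location loc_stonith_xen11 stonith_xen11 -inf: xen11

The -inf location constraints keep each fencing device off the node
it is supposed to shoot. With that in place you can set
stonith-enabled="true" in cib-bootstrap-options.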

HTH

Helmut Wollmersdorfer



