[DRBD-user] Trouble with active/active

Mon Oct 31 22:15:31 CET 2011

Hello Everyone,

I have the following built from source:

Corosync 1.4.2
Pacemaker 1.1.6
Cman 3.1.7

Corosync with service.d/pcmk works fine: the pcmk plugin loads, the crm starts, etc. I have an existing
CIB configuration, shown below, and the RAs load fine.

<corosync.conf>

totem {

	version: 2

	# How long before declaring a token lost (ms)
	token:          5000

	# How many token retransmits before forming a new configuration
	token_retransmits_before_loss_const: 20

	# How long to wait for join messages in the membership protocol (ms)
	join:           1000

	# How long to wait for consensus to be achieved before starting a
	# new round of membership configuration (ms)
	consensus:      7500

	# Turn off the virtual synchrony filter
	vsftype:        none

	# Number of messages that may be sent by one processor on receipt of the token
	max_messages:   20

	# Disable encryption
	secauth:        off

	# How many threads to use for encryption/decryption
	threads:        0

	# Limit generated nodeids to 31-bits (positive signed integers)
	clear_node_high_bit: yes

	# Optionally assign a fixed node id (integer)
	nodeid:         4

	interface {
		ringnumber: 0

		# The following three values need to be set based on your environment
		bindnetaddr: 192.168.2.0
		mcastaddr: 226.94.1.1
		mcastport: 5405
	}
}

amf {
 	mode: disabled
}

<cib conf>

node astdrbd1 \
	attributes standby="off"
node astdrbd2 \
	attributes standby="off"
primitive astIP ocf:heartbeat:IPaddr2 \
	op monitor interval="60" timeout="20" \
	params ip="192.168.2.6" cidr_netmask="24" \
	nic="eth2" broadcast="192.168.2.255" \
	lvs_support="true"
primitive astDRBD ocf:linbit:drbd \
	params drbd_resource="r0.res" \
	op monitor role=Master interval="20" timeout="20" \
	op monitor role=Slave interval="30" timeout="20"
ms msAstDRBD astDRBD \
	meta master-max="2" clone-max=2 interleave="true" \
	notify="true" globally-unique="false"
primitive astDLM ocf:pacemaker:controld \
	op monitor interval="120s"
primitive astO2CB ocf:pacemaker:o2cb op monitor interval="120s"
primitive astFilesystem ocf:heartbeat:Filesystem \
	params device="/dev/drbd0" directory="/service" fstype="ocfs2" \
	op monitor interval="120" \
	meta target-role="Started"
order astDrbdAfterIP \
	inf: astIP msAstDRBD
order dlmAfterDRBD \
	inf: msAstDRBD:promote astDLM:start
order o2cbAfterDLM \
	inf: astDLM:promote astO2CB:start
order astFilesystemAfterO2cb \
	inf: astO2CB:promote astFilesystem:start
colocation astDrbdOnIP \
	inf: msAstDRBD:Master astIP
colocation dlmOnDRBD \
	inf: astDLM msAstDRBD:Master
colocation o2cbOnDLM \
	inf: astO2CB astDLM:Master
colocation astFilesystemOnO2CB \
	inf: astFilesystem astO2CB:Master
location prefer-ast1 astIP inf: astdrbd1
location prefer-ast2 astIP inf: astdrbd2
property $id="cib-bootstrap-options" \
	no-quorum-policy="ignore" \
	stonith-enabled="false" \
	expected-quorum-votes="5" \
	dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
	cluster-recheck-interval="0" \
	cluster-infrastructure="openais"
rsc_defaults $id="rsc-options" \
	resource-stickiness="100"

Now I have added CMAN into the mix for active/active support, and I am not sure
how the whole stack is supposed to start up.

<cluster.conf>

<?xml version="1.0"?>
<cluster name="ASTCluster" config_version="3">
  <logging debug="off"/>
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="astdrbd1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="astdrbd1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="astdrbd2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="astdrbd2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>

I renamed /etc/corosync/service.d/pcmk to pcmk.bak.

Starting cman works fine.

When I try to start Pacemaker, I get the following:

/etc/init.d/pacemaker start

Oct 27 15:41:54 astdrbd1 pacemakerd: [18628]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/root
Oct 27 15:42:07 astdrbd1 pacemakerd: [18630]: info: Invoked: pacemakerd -$
Oct 27 15:42:07 astdrbd1 pacemakerd: [18630]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/root
Oct 27 16:17:01 astdrbd1 /USR/SBIN/CRON[30164]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct 27 17:01:16 astdrbd1 udevd-work[4484]: kernel-provided name 'ocfs2_control' and NAME= 'misc/ocfs2_control' disagree, please use SYMLINK+= or change the kernel to provide the proper name
Oct 27 17:01:16 astdrbd1 kernel: [26174.953112] ocfs2: Registered cluster interface user
Oct 27 17:01:16 astdrbd1 kernel: [26175.082045] OCFS2 Node Manager 1.5.0
Oct 27 17:01:16 astdrbd1 kernel: [26175.252185] OCFS2 1.5.0
Oct 27 17:01:17 astdrbd1 ocfs2_controld: [4497]: info: get_cluster_type: Assuming a 'heartbeat' based cluster
Oct 27 17:01:17 astdrbd1 ocfs2_controld: [4497]: CRIT: get_cluster_type: This installation of Pacemaker does not support the 'heartbeat' cluster infrastructure.  Terminating.

I never installed Heartbeat, so why does ocfs2_controld assume a 'heartbeat' based cluster?
I was also never quite sure why /var/lib/heartbeat existed in the first place.

pacemakerd -v

pacemakerd[2038]: 2011/10/31_16:43:47 info: config_find_next: Processing additional service options...
pacemakerd[2038]: 2011/10/31_16:43:47 info: get_config_opt: Found 'pacemaker' for option: name
pacemakerd[2038]: 2011/10/31_16:43:47 info: get_config_opt: Found '0' for option: ver
pacemakerd[2038]: 2011/10/31_16:43:47 info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
pacemakerd[2038]: 2011/10/31_16:43:47 info: read_config: Reading configure for stack: classic openais (with plugin)
pacemakerd[2038]: 2011/10/31_16:43:47 info: config_find_next: Processing additional service options...
pacemakerd[2038]: 2011/10/31_16:43:47 info: get_config_opt: Found 'pacemaker' for option: name
pacemakerd[2038]: 2011/10/31_16:43:47 info: get_config_opt: Found '0' for option: ver
pacemakerd[2038]: 2011/10/31_16:43:47 ERROR: read_config: We can only start Pacemaker from init if using version 1 of the Pacemaker plugin for Corosync.
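For reference, the "Found ... for option" lines above correspond to a Corosync service stanza along the lines of the one below. This is a sketch reconstructed from the log output (name 'pacemaker', ver '0'), not pasted from my actual file, so treat the exact contents as an assumption. The ERROR line, as I read it, refers to the ver value: it says init-script startup of pacemakerd needs version 1 of the plugin, whereas the log shows 0.

<service.d/pcmk (reconstructed from the log, assumed)>

service {
	# Load the Pacemaker Cluster Resource Manager
	name: pacemaker
	# Value reported by pacemakerd -v above; the error asks for version 1
	ver: 0
}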

Can someone please help me understand what is going on here? At this
point there seem to be:

Two cluster managers (Pacemaker, CMAN)
Two messaging layers (Corosync, OpenAIS), and, for some reason, some Heartbeat material
Possibly two sets of resource agents as well (ClusterLabs RAs, CMAN RAs), though I am not sure

Please Help,

Nick.