[DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint cleared?

Bob Schatz bschatz at yahoo.com
Wed Aug 17 00:35:52 CEST 2011



Jake,

Answers to questions:

(Q)  What versions?
(A)  Ubuntu 10.04.2 LTS,  Pacemaker 1.0.8,  DRBD 8.3.7 and LVM 2.02.54 using Corosync

(Q)  Why have LVM in configuration?
(A) I removed LVM from the configuration as well as the group.

(Q) What dependency do you want?
(A)  IP should start after the file system is up and the file system should only mount after DRBD is MASTER.   (I plan on adding MySQL on top of the IP address after this)

(Q) (curiosity) why 59/61 for intervals?
(A) I knew that I had to pick different intervals for the monitors and just picked them off the top of my head.   What are you using?

(Q) STONITH lists both hostnames, why?
(A) I removed STONITH for now to aid in my debugging.   I listed both names since the STONITH seemed to fail.   It appears that the STONITH would not start on the local node because the plugin removed the local host name from the list.   How did you specify your STONITH devices?   Can you give me the exact configuration?

(Q) You specified the grouping of resources incorrectly in the configuration file.
(A) Thanks.   I changed the configuration to no longer use groups and to just use "order" and "colocation".   However, I still see the IP address starting BEFORE the file system completes its START (mount).  I have tried playing with the order directive, but it does not seem to matter.

Here is my updated configuration:

        node cnode-1-3-5
        node cnode-1-3-6
        primitive glance-drbd-p ocf:linbit:drbd \
                params drbd_resource="glance-repos-drbd" \
                op start interval="0" timeout="240" \
                op stop interval="0" timeout="100" \
                op monitor interval="59s" role="Master" timeout="30s" \
                op monitor interval="61s" role="Slave" timeout="30s"
        primitive glance-fs-p ocf:heartbeat:Filesystem \
                params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
                op start interval="0" timeout="60" \
                op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
                op stop interval="0" timeout="120"
        primitive glance-ip-p ocf:heartbeat:IPaddr2 \
                params ip="10.4.0.25" nic="br100" \
                op monitor interval="5s"
        ms ms-glance-drbd glance-drbd-p \
                meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Started"
        location drbd-fence-by-handler-ms-glance-drbd ms-glance-drbd \
                rule $id="drbd-fence-by-handler-rule-ms-glance-drbd" $role="Master" -inf: #uname ne cnode-1-3-5
        colocation coloc-ip-and-fs-and-drbd inf: glance-ip-p glance-fs-p ms-glance-drbd:Master
        order order-glance-drbd-master-before-fs-before-ip inf: ms-glance-drbd:promote glance-fs-p:start glance-ip-p:start
        property $id="cib-bootstrap-options" \
                dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
                cluster-infrastructure="openais" \
                expected-quorum-votes="2" \
                stonith-enabled="false" \
                no-quorum-policy="ignore" \
                last-lrm-refresh="1313440611"
        rsc_defaults $id="rsc-options" \
                resource-stickiness="100"


Thanks again,

Bob



________________________________
From: Jake Smith <jsmith at argotec.com>
To: Bob Schatz <bschatz at yahoo.com>
Cc: drbd-user at lists.linbit.com
Sent: Tuesday, August 16, 2011 8:31 AM
Subject: Re: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint cleared?

Bob - np that's what we're all out here for.

A few baseline questions I missed earlier:

- What OS?
- What versions of DRBD/Pacemaker/LVM2?
- Not that it should matter but Heartbeat or Corosync?

I'm thinking that the LVM stuff is unnecessary, so I put some notes in your config below to remove it.  What's missing to get the startup order working is noted there too.  Let me know if you have more questions and how you make out.

Jake

----- Original Message ----- 

> From: "Bob Schatz" <bschatz at yahoo.com>
> To: "Jake Smith" <jsmith at argotec.com>
> Cc: drbd-user at lists.linbit.com
> Sent: Monday, August 15, 2011 6:25:22 PM
> Subject: Re: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker
> constraint cleared?

> Jake,

> Thanks for your help!

> Answers to questions:

> 1. (Q) Why do you have LVM defined in the configuration?
> (A) I wanted to make sure the LVM volumes were started before I start
> DRBD (I have DRBD configured on top of LVM). I assume that this
> should be okay.

My config is also DRBD on top of LVM however I don't "start" LVM... I believe it starts on boot.  I don't have anything in my Pacemaker config for LVM; I just start with the DRBD primitive and build out from there.
I believe the purpose of the LVM resource agent is for LVM on top of DRBD not the other way around...
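
In other words, for DRBD-on-LVM the Pacemaker config can begin at the DRBD layer (the LV just needs to be activated at boot, which the distro's init scripts normally do).  A rough sketch, reusing the resource names from your config:

```
# DRBD primitive only -- no LVM resource or clone needed underneath it
primitive glance-drbd-p ocf:linbit:drbd \
        params drbd_resource="glance-repos-drbd" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="61s" role="Slave" timeout="30s"
ms ms-glance-drbd glance-drbd-p \
        meta master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
```

Everything else (Filesystem, IP) then hangs off ms-glance-drbd via colocation/order as discussed below.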

> 2. (Q) Can you clarify what you mean by "DRBD is not started"?
> (A) If I do a "cat /proc/drbd", I see "unconfigured". The DRBD agent
> START routine is never called. I believe this problem will be fixed
> once I work through my other problems.

Got it

> 3. (Q) Colocation appears to be backwards per the documentation.
> (A) Thanks! I changed it per your suggestion. However, the Filesystem
> agent START routine is now called before the DRBD resources enters
> the MASTER state.

There is no order constraint between FS and DRBD - see below in config

> I made the changes you suggested. (I assumed I should not have to
> specify the stopping/demoting sequences but it was the only way I
> could get it to work.)

> After these changes, a timeline of the behavior I see is this
> sequence logged by agent entry point calls:

> 1. Call LVM start and before LVM start finishes
> 2. Call Filesystem start

At this time I expect the IP is up? According to your grouping that would trigger FS start.

> This fails since DRBD volume is readonly

> 3. LVM start completes
> 4. Filesystem stop (called because Filesystem start fails)
> 5. DRBD start called
> 6. DRBD promote called

> My expectation was that the Filesystem start routine would not be
> called until DRBD was MASTER.

> My configuration is:

> node cnode-1-3-5
> node cnode-1-3-6
> primitive glance-drbd-p ocf:linbit:drbd \
> params drbd_resource="glance-repos-drbd" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="59s" role="Master" timeout="30s" \
> op monitor interval="61s" role="Slave" timeout="30s"

(curiosity) why 59/61 for intervals?

> primitive glance-fs-p ocf:heartbeat:Filesystem \
> params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
> op start interval="0" timeout="60" \
> op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
> op stop interval="0" timeout="120"
> primitive glance-ip-p ocf:heartbeat:IPaddr2 \
> params ip="10.4.0.25" nic="br100" \
> op monitor interval="5s"
> primitive glance-lvm-p ocf:heartbeat:LVM \
> params volgrpname="glance-repos" exclusive="true" \
> op start interval="0" timeout="30" \
> op stop interval="0" timeout="30"

Remove the primitive for LVM.

> primitive node-stonith-5-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.99"
> userid="ADMIN" passwd="foo" interface="lan"

Did you mean to have both node names in the hostname param? Not relevant to the problem...
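
If each IPMI device fences exactly one node, I'd expect one hostname per device.  A sketch (IPs/credentials taken from your config; untested, adjust to your plugin's parameter list):

```
# Each device names only the node it fences, and the location rules
# (same as in your config) keep each device off the node it fences.
primitive node-stonith-5-p stonith:external/ipmi \
        params hostname="cnode-1-3-5" ipaddr="172.23.8.99" userid="ADMIN" passwd="foo" interface="lan" \
        op monitor interval="10m" timeout="1m"
primitive node-stonith-6-p stonith:external/ipmi \
        params hostname="cnode-1-3-6" ipaddr="172.23.8.100" userid="ADMIN" passwd="foo" interface="lan" \
        op monitor interval="10m" timeout="1m"
location loc-node-stonith-5 node-stonith-5-p \
        rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
location loc-node-stonith-6 node-stonith-6-p \
        rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
```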

> primitive node-stonith-6-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.100"
> userid="ADMIN" passwd="foo" interface="lan"
> group group-glance-fs glance-fs-p glance-ip-p \
> meta target-role="Started"

This group is saying that glance-fs has to start after glance-ip and on the same node as glance-ip... as soon as the IP is started, your FS will try to start regardless of the state of your DRBD resource.  You probably know already, but "group my_group_resx_resy resx resy" is basically saying resx must colocate with, and order its start after, resy.  What does the IP have to do with the FS?  Without knowing, I would expect the IP to start after the FS is ready, making this one backward...
And lastly you can't have it this way (FS starts after IP in a group) and set FS to start on DRBD master below.  If you make IP rely on FS then it's OK.  If you want it this way then you need to remove the group and do a colo and order like this:

order order-glance-drbd-master-before-ip-before-fs inf: ms-glance-drbd:promote glance-ip-p:start glance-fs-p:start
colocation coloc-fs-and-ip-and-drbd inf: glance-fs-p glance-ip-p ms-glance-drbd:Master

Did I confuse you yet? ;-)


> ms ms-glance-drbd glance-drbd-p \
> meta master-node-max="1" clone-max="2" clone-node-max="1"
> globally-unique="false" notify="true" target-role="Master"
> clone cloneLvm glance-lvm-p

Remove this clone

> location loc-node-stonith-5 node-stonith-5-p \
> rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
> location loc-node-stonith-6 node-stonith-6-p \
> rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
> colocation coloc-fs-group-and-drbd inf: group-glance-fs
> ms-glance-drbd:Master

Need an order constraint to match with this colo to make sure DRBD is promoted before FS starts, like:
order order-glance-drbd-master-before-fs inf: ms-glance-drbd:promote group-glance-fs:start

> order order-glance-lvm-before-drbd inf: cloneLvm:start
> ms-glance-drbd:start

Remove this order

> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="true" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1313440611"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"

*snip*