[DRBD-user] DRBD+Pacemaker: Won't promote with only one node

William Seligman seligman at nevis.columbia.edu
Thu Jan 5 18:36:17 CET 2012



On Thu, 5 Jan 2012 10:34:28 +0200, Dan Frincu <df.cluster at gmail.com> wrote:
> Hi,
> 
> On Wed, Jan 4, 2012 at 10:10 PM, William Seligman
> <seligman at nevis.columbia.edu> wrote:
>> I'll give the technical details in a moment, but I thought I'd start with a
>> description of the problem.
>>
>> I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I
>> upgraded to DRBD 8.4.x about six months ago (probably too soon); everything was
>> fine. Then last week we did some power-outage tests on our cluster.
>>
>> Each node in the cluster is attached to its own uninterruptible power supply;
>> the STONITH mechanism is to turn off the other node's UPS. In the event of an
>> extended power outage (this happens 2-3 times a year at my site), it's likely
>> that one node will STONITH the other when the other node's UPS runs out of power
>> and shuts it down. This means that when power comes back on, only one node will
>> come back up, since the STONITHed UPS won't turn on again without manual
>> intervention.
>>
>> The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD
>> resource to primary; it just sits there at secondary and won't start up any
>> DRBD-dependent resources. Only when the second node comes back up will Pacemaker
>> assign one of them the primary role. I've confirmed this by shutting down
>> corosync on both nodes, then bringing it up again on just one of them.
>>
> 
> Could you also post your Pacemaker configuration?

Sure. I didn't do this before, since the configuration is complex. I also don't
know which would be more comprehensible, so I've attached both cib.xml and the
result of "crm configure show". I should mention that I'm a lazy bum, so I use
crm-gui to configure corosync; that's why these files look more baroque than usual.

To keep everything in one place, I've also attached the DRBD configuration files
again.

> Also you might want to check
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#id890288
> for no-quorum-policy. In two-node clusters, losing one node means you
> don't have quorum, and unless you use something else as a quorum device,
> the policy is set to stop.

As you'll see, I already have this in my setup:

crm configure property no-quorum-policy=ignore
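
For anyone who wants to double-check the same property on their own cluster,
here is a minimal sketch. It assumes the output of "crm configure show" is
piped in on stdin. The function name quorum_policy is made up for
illustration, and since the quoting of the value may differ between crm shell
versions, the pattern accepts it with or without double quotes.

```shell
# quorum_policy: hypothetical helper, not part of Pacemaker.
# Reads "crm configure show" output on stdin and prints the value of
# no-quorum-policy, or nothing if the property is not set.
quorum_policy() {
    sed -n 's/.*no-quorum-policy="\{0,1\}\([a-z]*\)"\{0,1\}.*/\1/p' | head -n 1
}

# Usage on a live cluster (needs the crm shell):
#   crm configure show | quorum_policy
```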

Another thing I left out: The cluster configuration was working "once upon a
time." The only significant change I made in recent months was upgrading to DRBD
8.4.0 (then to 8.4.1), then changing my DRBD configuration to add fencing plus
split-brain recovery. I don't think any of the changes I've made to the corosync
configuration would have any bearing on DRBD resource promotion when only one
node is available... but I could be wrong.
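
Since DRBD fencing is among the changes listed above, one quick thing to look
for is a constraint left behind in the CIB by a fence-peer handler. This is
only a sketch under assumptions: it presumes the stock crm-fence-peer.sh
handler shipped with DRBD is in use (the attached configuration may differ),
and it relies on that script naming its location constraints with the prefix
drbd-fence-by-handler. The function name fence_constraints is made up, and the
input is again "crm configure show" output on stdin. As far as I know, such a
constraint forbids the Master role on every node except the one named in it.

```shell
# fence_constraints: hypothetical helper, not part of DRBD or Pacemaker.
# Reads "crm configure show" output on stdin and prints the first line of any
# location constraint whose id starts with the prefix used by
# crm-fence-peer.sh. Prints nothing if no such constraint is present.
fence_constraints() {
    grep -E '^location[[:space:]]+drbd-fence-by-handler' || true
}

# Usage on a live cluster (needs the crm shell):
#   crm configure show | fence_constraints
```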

>> I'm pretty sure that this is due to a mistake I've made in my DRBD
>> configuration when I fiddled with it during the 8.4.x upgrade. I've attached the
>> files. Can one of you kind folks spot the error?
>>
>> Technical details:
>>
>> Two-node configuration: hypatia and orestes
>> OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen
>> Packages:
>> drbd-8.4.1-1
>> corosync-1.2.7-1.1.el5
>> pacemaker-1.0.12-1.el5.centos
>> openais-1.1.3-1.6.el5
>>

Attached: cib.xml, crm-configure-show.txt, global_common.conf, nevis-admin.res

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 45185 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120105/8789c157/attachment.bin>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm-configure-show.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120105/8789c157/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: global_common.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120105/8789c157/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nevis-admin.res
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120105/8789c157/attachment-0001.txt>

