[DRBD-user] DRBD+Pacemaker: Won't promote with only one node

William Seligman seligman at nevis.columbia.edu
Thu Jan 5 18:06:25 CET 2012



> Message: 1
> Date: Wed, 4 Jan 2012 15:58:09 -0500
> From: "Dan Barker" <dbarker at visioncomm.net>
> Subject: Re: [DRBD-user] DRBD+Pacemaker: Won't promote with only one
> 	node
> To: <drbd-user at lists.linbit.com>
> Message-ID: <01a901cccb23$90815270$b183f750$@visioncomm.net>
> Content-Type: text/plain;	charset="UTF-8"
> 
> I'd say the error is in the STONITH method. You evidently are giving the UPS a "SHUTDOWN" command when you should be giving it a "SLEEP" or "SUSPEND" command (whatever your UPS vendor's term is for powering off the outlets only until mains power returns and the batteries have recharged above 5% or so). With the APC family and a Network Card, there are very fine controls over this sort of action. The published APIs are fairly primitive. I had to write SNMP routines to make my APC do what I wanted, when I wanted. The documentation was like pulling teeth to find. If you have APC equipment, I can share. If not, what do you have? What controls does it publish?

I'm using APC SMART-UPSes, and issuing the SHUTDOWN command as you suspected. I
wouldn't mind seeing the SNMP write-up you've got on the obscure APC API.

However, I believe what I've got is what I want to do. Suppose one node STONITHs
another for reasons that have nothing to do with a power outage. I don't want
the STONITHed UPS to come back on for any reason. I'm concerned about the
(admittedly unlikely) possibility that a node is STONITHed because it goes
wonky, and then there's a power outage. Upon power recovery I'd get the wonky
node trying to rejoin the cluster again.

So I think I've got STONITH set up satisfactorily. I just need help figuring out
why a single node's DRBD resource is not being promoted to primary after a restart.

On 1/4/12 3:10 PM, William Seligman wrote:
> I'll give the technical details in a moment, but I thought I'd start with a
> description of the problem.
> 
> I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I
> upgraded to DRBD 8.4.x about six months ago (probably too soon); everything was
> fine. Then last week we did some power-outage tests on our cluster.
> 
> Each node in the cluster is attached to its own uninterruptible power supply;
> the STONITH mechanism is to turn off the other node's UPS. In the event of an
> extended power outage (this happens 2-3 times a year at my site), it's likely
> that one node will STONITH the other when the other node's UPS runs out of power
> and shuts it down. This means that when power comes back on, only one node will
> come back up, since the STONITHed UPS won't turn on again without manual
> intervention.
> 
> The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD
> resource to primary; it just sits there at secondary and won't start up any
> DRBD-dependent resources. Only when the second node comes back up will Pacemaker
> assign one of them the primary role. I've confirmed this by shutting down
> corosync on both nodes, then bringing it up again on just one of them.
> 
> I'm pretty sure that this is due to a mistake I've made in my DRBD
> configuration when I fiddled with it during the 8.4.x upgrade. I've attached the
> files. Can one of you kind folks spot the error?
> 
> Technical details:
> 
> Two-node configuration: hypatia and orestes
> OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen
> Packages:
> drbd-8.4.1-1
> corosync-1.2.7-1.1.el5
> pacemaker-1.0.12-1.el5.centos
> openais-1.1.3-1.6.el5


-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/


