Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
> Message: 1
> Date: Wed, 4 Jan 2012 15:58:09 -0500
> From: "Dan Barker" <dbarker at visioncomm.net>
> Subject: Re: [DRBD-user] DRBD+Pacemaker: Won't promote with only one node
> To: <drbd-user at lists.linbit.com>
> Message-ID: <01a901cccb23$90815270$b183f750$@visioncomm.net>
> Content-Type: text/plain; charset="UTF-8"
>
> I'd say the error is in the STONITH method. You evidently are giving the UPS a
> "SHUTDOWN" command when you should be giving it a "SLEEP" or "SUSPEND" command
> (whatever your UPS vendor's idea is of powering off the outlets only until the
> mains come on and have charged the batteries to above 5% or whatever).
>
> With the APC family and a Network Card, there are very fine controls over this
> sort of action. The APIs published are fairly primitive. I had to write SNMP
> routines to make my APC do what I wanted, when I wanted. The doc was like
> pulling teeth to find. If you have APC equipment, I can share. If not, what do
> you have? What controls does it publish?

I'm using APC SMART-UPSes, and issuing the SHUTDOWN command as you suspected. I
wouldn't mind seeing the SNMP write-up you've got on the obscure APC API.

However, I believe what I've got is what I want to do. Suppose one node STONITHs
another for reasons that have nothing to do with a power outage. I don't want
the STONITHed UPS to come back on for any reason. I'm concerned about the
(admittedly unlikely) possibility that a node is STONITHed because it goes
wonky, and then there's a power outage. Upon power recovery I'd get the wonky
node trying to rejoin the cluster again.

So I think I've got STONITH set up satisfactorily. I just need help figuring out
why a single node's DRBD resource is not being promoted to primary after a
restart.

On 1/4/12 3:10 PM, William Seligman wrote:
> I'll give the technical details in a moment, but I thought I'd start with a
> description of the problem.
>
> I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I
> upgraded to DRBD 8.4.x about six months ago (probably too soon); everything was
> fine. Then last week we did some power-outage tests on our cluster.
>
> Each node in the cluster is attached to its own uninterruptible power supply;
> the STONITH mechanism is to turn off the other node's UPS. In the event of an
> extended power outage (this happens 2-3 times a year at my site), it's likely
> that one node will STONITH the other when the other node's UPS runs out of power
> and shuts it down. This means that when power comes back on, only one node will
> come back up, since the STONITHed UPS won't turn on again without manual
> intervention.
>
> The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD
> resource to primary; it just sits there at secondary and won't start up any
> DRBD-dependent resources. Only when the second node comes back up will Pacemaker
> assign one of them the primary role. I've confirmed this by shutting down
> corosync on both nodes, then bringing it up again on just one of them.
>
> I'm pretty sure that this is due to a mistake I've made in my DRBD
> configuration when I fiddled with it during the 8.4.x upgrade. I've attached the
> files. Can one of you kind folks spot the error?
>
> Technical details:
>
> Two-node configuration: hypatia and orestes
> OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen
> Packages:
> drbd-8.4.1-1
> corosync-1.2.7-1.1.el5
> pacemaker-1.0.12-1.el5.centos
> openais-1.1.3-1.6.el5

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/