I'll give the technical details in a moment, but I thought I'd start with a description of the problem.

I have a two-node active/passive cluster, with DRBD controlled by Pacemaker. I upgraded to DRBD 8.4.x about six months ago (probably too soon); everything was fine.

Then last week we did some power-outage tests on our cluster. Each node in the cluster is attached to its own uninterruptible power supply; the STONITH mechanism is to turn off the other node's UPS. In the event of an extended power outage (this happens 2-3 times a year at my site), it's likely that one node will STONITH the other when the other node's UPS runs out of power and shuts it down. This means that when power comes back on, only one node will come back up, since the STONITHed UPS won't turn on again without manual intervention.

The problem is that with only one node, Pacemaker+DRBD won't promote the DRBD resource to primary; it just sits there at secondary and won't start up any DRBD-dependent resources. Only when the second node comes back up will Pacemaker assign one of them the primary role.

I've confirmed this by shutting down corosync on both nodes, then bringing it up again on just one of them.

I'm pretty sure that this is due to a mistake I've made in my DRBD configuration when I fiddled with it during the 8.4.x upgrade. I've attached the files. Can one of you kind folks spot the error?

Technical details:

Two-node configuration: hypatia and orestes

OS: Scientific Linux 5.5, kernel 2.6.18-238.19.1.el5xen

Packages:
drbd-8.4.1-1
corosync-1.2.7-1.1.el5
pacemaker-1.0.12-1.el5.centos
openais-1.1.3-1.6.el5

Attached: global_common.conf, nevis-admin.res

--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: global_common.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120104/93ccab0c/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nevis-admin.res
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120104/93ccab0c/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4497 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120104/93ccab0c/attachment.bin>
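
For anyone wanting to reproduce the single-node test described in the message (stop corosync on both nodes, then start it on only one and look at the DRBD role), the steps might look roughly like the sketch below. It assumes corosync is managed through the SysV `service` wrapper and that Pacemaker is started by corosync as a plugin; neither is stated in the original post. The function names and the DRY_RUN and SETTLE_SECONDS variables are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch of the single-node startup test described in the message.
# Assumptions (not from the original post): corosync is controlled with
# `service corosync ...` and Pacemaker runs as a corosync plugin.

# With DRY_RUN set, print each command instead of executing it, so the
# sequence can be reviewed safely before touching a live cluster.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on BOTH nodes (hypatia and orestes) to take the cluster stack down.
stop_cluster_stack() {
    run service corosync stop
}

# Run on ONE node only, then inspect what Pacemaker and DRBD report.
start_single_node() {
    run service corosync start
    # Hypothetical settle time; adjust to however long the cluster needs.
    run sleep "${SETTLE_SECONDS:-60}"
    run crm_mon -1        # one-shot Pacemaker status
    run cat /proc/drbd    # DRBD connection state and roles
}
```

Sourcing the file only defines the functions; nothing runs until one of them is called. In the situation described above, the `/proc/drbd` output on the single node would be expected to show the resource still in the Secondary role.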