[DRBD-user] Promote fails in state = { cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown r--- }

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jan 20 09:57:55 CET 2012


On Fri, Jan 20, 2012 at 06:55:21AM +0000, Oren Nechushtan wrote:
> It seems to me that a configurable timeout, indicating how long to wait before allowing promotion, is required, possibly indefinite by default.
> I understand why you might want to wait for either the primary to come up again or for manual recovery.
> However, in an active/standby two-node setup with the system requirement to be up ALL the time, there is another approach:
> promote the old secondary after a timeout.
> If the old primary was down for a long time, we are up quickly and the old primary should sync. Fine.
> If the old primary was down only briefly, but beyond the timeout, the split-brain (SB) handlers should recover, possibly with manual recovery.
> That is acceptable, since we couldn't wait forever.
> 
> What say you?

If you don't care for fencing, don't configure it ;-)

The problem here is that there are many failure scenarios.
We cannot know whether the "old primary" is "down" (he is bad),
or only "unreachable" (we are bad).
What may seem right for one scenario may be very wrong for another.
If we cannot talk to the peer, we just don't know which scenario we have.

Note that we are already talking about multiple failure scenarios here;
single failure cases all work out fine.

How to "best" deal with multiple failure cases can likely not be solved
generically, as "best" depends very much on the specific deployment and
use case requirements, and on which multiple failure scenarios you can think of.
And because there is a nearly infinite number of multiple failure scenarios ;-)

You are free to implement whatever policy you want.
I would not implement that in the fence-peer handler, though,
but outside of the Pacemaker and DRBD logic.

If you think you want that, I suggest that you add this to your monitoring
(I mean strategic monitoring, outside of Pacemaker),
and trigger an automatic "--forced" promotion if whatever policies you
may come up with decide that this is a good idea, based on whatever
conditions and parameters, current and previous, your monitoring may know about.
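
A minimal sketch of what such an external policy might look like, in Python.
Everything here is an assumption for illustration only: the Observation
fields, the "witness" check (some third party that helps guess whether we
are the isolated node), the timeout rule, and the promote callback are all
hypothetical. The actual forced promotion is left to the caller, because
the exact command depends on the DRBD version in use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Observation:
    """One sample recorded by the external monitoring (hypothetical fields)."""
    timestamp: float             # seconds, from any monotonic source
    peer_reachable: bool         # could we talk to the DRBD peer?
    witness_reachable: bool      # could we reach an independent third party?
    local_disk_consistent: bool  # is the local disk state Consistent or better?


def should_force_promote(history: Sequence[Observation], timeout_s: float) -> bool:
    """Return True only if the current peer outage has lasted at least
    timeout_s, we do not look isolated ourselves, and local data is usable."""
    if not history:
        return False
    latest = history[-1]
    if latest.peer_reachable:
        return False  # nothing to do, the peer is back
    if not latest.witness_reachable:
        return False  # we may be the unreachable one ("we are bad")
    if not latest.local_disk_consistent:
        return False  # never promote on top of unusable data
    # Walk backwards to find where the current, uninterrupted outage began.
    outage_start = latest.timestamp
    for obs in reversed(history):
        if obs.peer_reachable:
            break
        outage_start = obs.timestamp
    return latest.timestamp - outage_start >= timeout_s


def run_policy(history: Sequence[Observation], timeout_s: float,
               promote: Callable[[], None]) -> bool:
    """Call the caller-supplied promote() if the policy says so."""
    if should_force_promote(history, timeout_s):
        promote()
        return True
    return False
```

Note that even this small policy covers only one guess about the failure
("peer gone, we are fine"); a witness that is reachable from both nodes
while the replication link is cut would still lead to a split brain.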

And there will always be another scenario you did not anticipate.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


