[DRBD-user] Dual Primary Mode: Shared Directory blocked after node crash until reboot

Wed May 13 08:40:11 CEST 2015

On 13/05/15 02:36 AM, DRBD User wrote:
> OK, I understand:
> 
> In a dual primary setup without a valid stonith configuration, I have to wait until the crashed node is put into a *known* state, e.g. by a reboot or manual intervention.
> 
> But what if the crashed node never comes back up?
> Will the stonith setup put the crashed node into a *known* state, so that the active node can continue to operate?
> Or do I have to intervene manually ?
> 
> So for my plan to have a highly available service (which saves its state to a shared directory), a primary/secondary setup may be the way to go - or is fencing/stonith always a must?

The cluster (and DRBD) will remain blocked indefinitely. Operation
resumes only when fencing succeeds or a human tells DRBD that the lost
node is offline (manual intervention).
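
To make that rule concrete, here is a toy model in Python. It is only a
sketch of the decision logic described above, not DRBD or Pacemaker
code; the names (PeerState, handle_peer_loss, may_resume_io) are made
up for illustration.

```python
from enum import Enum


class PeerState(Enum):
    UNKNOWN = "unknown"              # peer lost, nobody knows if it is still running
    FENCED = "fenced"                # a fence attempt reported success
    CONFIRMED_OFF = "confirmed_off"  # a human verified the peer is off and said so


def handle_peer_loss(fence_attempts, human_confirms_off=False):
    """Return the peer's state after it drops out of the cluster.

    fence_attempts: iterable of booleans, one per fence attempt
                    (True means the attempt succeeded).
    human_confirms_off: True only if an admin has verified the peer
                        is really off and confirmed it manually.
    """
    for succeeded in fence_attempts:
        if succeeded:
            return PeerState.FENCED
    if human_confirms_off:
        return PeerState.CONFIRMED_OFF
    # Every fence attempt failed and nobody intervened: still unknown.
    return PeerState.UNKNOWN


def may_resume_io(peer_state):
    """The surviving node may resume only once the peer's state is known."""
    return peer_state in (PeerState.FENCED, PeerState.CONFIRMED_OFF)


if __name__ == "__main__":
    # Fencing keeps failing and nobody steps in, so the node stays blocked.
    print(may_resume_io(handle_peer_loss([False, False, False])))
    # A later fence attempt succeeds, so the node can resume.
    print(may_resume_io(handle_peer_loss([False, True])))
```

Note that there is no timeout branch in this model: a peer in an unknown
state never becomes "probably dead" just because time has passed.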

As an aside: Red Hat used to support manual fencing, but dropped support
for it when RHEL 6 was released in 2010. On paper, manual fencing sounds
fine. In practice, it was often misused and led to split-brains.

Consider: a cluster is locked up, people are screaming and the poor
admin is trying to remember what to do after a year of not touching
things. S/he remembers about manually clearing the fence and then does
so without first verifying that the lost node was off/dead. In such a
case, you have a split-brain and all the potential damage that goes with
it.

Get hardware fencing. It is the only sane and sensible option.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?