[DRBD-user] Automatic split-brain recovery strategies configuration
Abraham olivares Varela
abraham_ov at yahoo.com.mx
Mon Aug 13 10:36:11 CEST 2007
Thanks for your answer. I have only one last question.
I have two nodes, one primary and one secondary, connected like this:
Node 1 connects to Node 2 over a switched network and also over a crossover cable.
Consider the following software configuration:
DRBD communicates over the switched network and Heartbeat (HA) communicates over both networks.
With the following steps I can reproduce a case where both nodes end up in a StandAlone state, which
requires manual intervention to reconnect the current DRBD Primary node.
1.) Node 1 is DRBD Primary and Node 2 is DRBD Secondary
2.) I pull the switched network cable from node 1
3.) Node 1 detects the failure and HA moves the resources to node 2
4.) Node 2 is now StandAlone->Primary/Unknown and node 1 is StandAlone->Secondary/Unknown
5.) I reconnect the switched network cable to node 1, and at that moment I get "split-brain".
My question is: is there any way to avoid the "split brain" situation when I reconnect the cable to node 1?
Lars Ellenberg <lars.ellenberg at linbit.com> wrote: On Fri, Aug 10, 2007 at 12:57:03PM +0000, paddy at panici.net wrote:
> On Fri, Aug 10, 2007 at 04:05:02AM -0500, Abraham olivares Varela wrote:
> > Hi everybody,
> > Does anybody know how I can configure the automatic split-brain
> > recovery strategies in order to avoid a "split brain" situation?
> > Any example or idea on how to do that would be welcome.
> > Please help me.
> On the one hand you talk about recovery and on the other hand you
> talk about avoiding it.
> I fear you may already be incurable ;-)
> Do you *ever* want to go split brain? Are there scenarios for you where
> that would be preferable and you want to think about what you are going
> to do afterwards, or would you prefer to avoid it ever happening?
The basic problem is that currently, with drbd 8 and two primaries (as
necessary for cluster file systems), drbd will _always_ run into a
resource-internal (drbd-specific) split brain as soon as you lose
the replication link, even if it is a very short network hiccup,
and even if there was no IO in flight.
That is because we have not yet implemented freezing IO on loss of
write quorum for drbd8.
So as of now, if you want to use a cluster file system with drbd8
and you expect network hiccups, you will run into "split-brain".
Then you either need to recover by hand every time,
or you configure a non-intrusive after-split-brain policy,
such as "discard-zero-changes", which in that case has nothing to "discard"
and so gives you automatic rejoin when there was no IO in flight, or no
changes on one side.
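The decision that "discard-zero-changes" makes can be modeled in a few lines. This is a hypothetical sketch, not DRBD code: the function name and its inputs (a count of blocks each node changed after the split) are assumptions for illustration, based on my reading of the drbd.conf documentation for the `after-sb-0pri discard-zero-changes;` option in the `net` section: if only one side wrote, resync from it; if neither wrote, either direction is a resync of zero blocks; if both wrote, stay disconnected.

```python
import random
from enum import Enum


class Outcome(Enum):
    SYNC_A_TO_B = "resync: node A is source, node B is target"
    SYNC_B_TO_A = "resync: node B is source, node A is target"
    DISCONNECT = "stay disconnected: manual recovery needed"


def discard_zero_changes(changes_a: int, changes_b: int, rng: random.Random) -> Outcome:
    """Hypothetical model of the discard-zero-changes decision.

    changes_a / changes_b: number of blocks each node wrote after the
    split (an assumed input; DRBD tracks this internally, not like this).
    """
    if changes_a == 0 and changes_b == 0:
        # Nobody wrote anything: either direction is a resync of 0 blocks,
        # so the direction is picked at random.
        return rng.choice([Outcome.SYNC_A_TO_B, Outcome.SYNC_B_TO_A])
    if changes_a == 0:
        # Only B wrote: B is the sync source, A loses nothing.
        return Outcome.SYNC_B_TO_A
    if changes_b == 0:
        # Only A wrote: A is the sync source, B loses nothing.
        return Outcome.SYNC_A_TO_B
    # Both sides diverged: this policy refuses to choose a victim.
    return Outcome.DISCONNECT


if __name__ == "__main__":
    print(discard_zero_changes(0, 42, random.Random(0)).value)
```

The last branch is what makes the policy non-intrusive: it never throws away writes, so the two-primaries case where both nodes wrote during the hiccup still ends in a manual recovery.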
Once you start using destructive settings, though,
"auto-recovery strategies" get very ugly very quickly.
And they are not a solution to the problem,
only a work-around for those who commonly run into it,
e.g. people who configure two primaries but usually access the device
exclusively from one node (Xen images). If you have a network hiccup
there, you will only have changes on one node, so you are fine with the
"discard-zero-changes" policy. The real solution is related to the implementation of a (dynamically
reconfigurable) write quorum: suspending IO as soon as we lose
quorum, then timeout, arbitrate, retransmit, resume ...
Sorry, no time frame on that beyond "as soon as possible";
we are very busy with a lot of things around here :)
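The write-quorum idea (suspend IO on quorum loss, arbitrate, resume) can be illustrated with a toy model. Everything here is hypothetical and not DRBD's design: the `QuorumNode` class, its methods, and the external arbiter that decides `won` are assumptions, and the timeout and retransmit steps are left out. It only shows why freezing IO prevents the two-sided divergence that forces manual recovery.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuorumNode:
    """Hypothetical model of one node's write path under a write-quorum rule."""
    name: str
    have_quorum: bool = True
    suspended: bool = False
    written: List[str] = field(default_factory=list)  # writes completed locally
    frozen: List[str] = field(default_factory=list)   # writes held while IO is suspended

    def submit(self, write: str) -> str:
        if self.suspended:
            # Without quorum the write is held, not completed,
            # so this node cannot drift away from its peer.
            self.frozen.append(write)
            return "frozen"
        self.written.append(write)
        return "written"

    def link_lost(self) -> None:
        # Replication link gone: lose quorum and suspend IO immediately.
        self.have_quorum = False
        self.suspended = True

    def arbitrate(self, won: bool) -> None:
        # An external arbiter (assumed here) picks exactly one survivor.
        if won:
            self.have_quorum = True
            self.suspended = False
            self.written.extend(self.frozen)  # resume: complete the held writes
            self.frozen.clear()
        # The loser stays suspended and accumulates no divergent changes.


if __name__ == "__main__":
    a, b = QuorumNode("node1"), QuorumNode("node2")
    a.link_lost(); b.link_lost()
    a.submit("w2"); b.submit("w3")
    a.arbitrate(won=True); b.arbitrate(won=False)
    print(a.written, b.written, b.frozen)
```

After arbitration only the winner has new data, which is exactly the "changes on one side only" case that a non-destructive policy can rejoin automatically.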
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :