[DRBD-user] Automatic split-brain recovery strategies configuration

Mon Aug 13 10:36:11 CEST 2007

Hi,

Thanks for your answer. I only have one last question.

I have two nodes, one primary and one secondary, set up like this:

Node 1 is connected to Node 2 both through a switched network and through a crossover cable.
The software configuration is as follows:

DRBD communicates over the switched network, and Heartbeat (HA) communicates over both networks.
With the following steps I can reproduce the case where both nodes end up in StandAlone state,
which requires user intervention to reconnect the current primary DRBD node.

1.) Node 1 is DRBD Primary and Node 2 is DRBD Secondary.
2.) I pull the switched network cable from node 1.
3.) Node 1 detects the failure and HA moves the resources to node 2.
4.) Node 2 is now StandAlone->Primary/Unknown and node 1 is StandAlone->Secondary/Unknown.
5.) I reconnect the switched network cable to node 1, and at that moment I get a "split-brain".

My question is: is there any possibility to avoid the "split-brain" situation when I reconnect the cable to node 1?

Thanks 
Best Regards

Abraham OLIVARES

Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

On Fri, Aug 10, 2007 at 12:57:03PM +0000, paddy at panici.net wrote:
> On Fri, Aug 10, 2007 at 04:05:02AM -0500, Abraham olivares Varela wrote:
> > Hi everybody,
> > 
> > 
> > Does anybody know how I can configure the automatic split-brain
> > recovery strategies, in order to avoid a "split-brain" situation?
> > 
> > Any example or idea of how to do that would help.
> > 
> > Please help me.
> >
> 
> On the one hand you talk about recovery and on the other hand you
> talk about avoiding it.  
> 
> I fear you may already be incurable ;-)
> 
> 
> Do you *ever* want to go split brain?  Are there scenarios for you where
> that would be preferable and you want to think about what you are going
> to do afterwards, or would you prefer to avoid it ever happening?

the basic problem is that currently, with drbd 8 and two primaries (as
necessary for cluster file systems), drbd will _always_ run into a
resource-internal (drbd-specific) split brain as soon as you lose
the replication link, even if it is only a very short network hiccup,
even if there has been no io in flight.

that is because we have not yet implemented any freeze-io on loss of
write-quorum for drbd8.

so as of now, if you want to use a cluster file system with drbd8,
and you expect network hiccups, you will run into "split-brain".

then you either always need to recover by hand,

or you can configure some non-intrusive after-split-brain handler,
like "discard-zero-changes", which would have nothing to "discard",
and which "features" auto-rejoin when there has been no in-flight io, or no
changes on one side.

once you start using destructive settings, though,
"auto-recovery strategies" get very ugly very quickly.
and they are not a solution to the problem,
only a work-around for those who commonly run into these problems,
e.g. people who configure two-primaries, but usually access the device
exclusively from one node (xen images). if you have a network hiccup
there, you will only have changes on one node, so you are fine with the
"discard-zero-changes" option.
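
For illustration, a minimal sketch of how the non-destructive setup described above might look in the "net" section of drbd.conf for DRBD 8. The resource name "r0" is hypothetical, and the choice of "disconnect" for the one-primary and two-primary cases is an assumption (it leaves those cases for manual recovery); only "discard-zero-changes" is named in the message above. Which after-sb-* policy is consulted depends on how many nodes are Primary when the split brain is detected.

```
# drbd.conf fragment (sketch only, not a complete resource definition)
resource r0 {                          # "r0" is a hypothetical resource name
  net {
    allow-two-primaries;               # only if dual-primary is really needed
    after-sb-0pri discard-zero-changes;  # auto-rejoin if one side has no changes
    after-sb-1pri disconnect;          # assumption: no automatic recovery
    after-sb-2pri disconnect;          # assumption: no automatic recovery
  }
  # "on <host>" sections and all other settings omitted
}
```

With these settings nothing is discarded unless one side has no changes at all; every other case stays disconnected and still has to be resolved by hand.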

the real solution is related to the implementation of (dynamically
reconfigurable) write-quorum: suspend IO as soon as we lose
quorum, then timeout, arbitrate, retransmit, resume ...
sorry, no time frame on that besides "as soon as possible";
we are very busy with a lot of things around here :)

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
