On Fri, Aug 10, 2007 at 12:57:03PM +0000, paddy at panici.net wrote:
> On Fri, Aug 10, 2007 at 04:05:02AM -0500, Abraham olivares Varela wrote:
> > Hi everybody,
> >
> >
> > Does anybody knows how can i configurate the Automatic split-brain
> > recovery strategies, in order to avoid a "split brain situation". ?
> >
> > any example or any idea to do it that.
> >
> > please help me
> >
>
> On the one hand you talk about recovery and on the other hand you
> talk about avoiding it.
>
> I fear you may already be incurable ;-)
>
>
> Do you *ever* want to go split brain. are there scenarios for you where
> that would be preferable and you want to think about what you are going
> to do afterwards, or would you prefer to avoid it ever happening ?

the basic problem is that currently, with drbd 8 and two-primaries, as
necessary for cluster file systems, drbd will _always_ run into a
resource-internal (drbd specific) split brain as soon as you lose the
replication link, even if it is a very short network hiccup, even if
there has been no io in flight.

that is because we have not yet implemented any freeze-io on loss of
write-quorum for drbd8.

so as of now, if you want to use a cluster file system with drbd8, and
you expect network hiccups, you will run into "split-brain".
then you either always need to recover this by hand, or you can
configure some non-intrusive after-split-brain handler, like
"discard-zero-changes", which would have nothing to "discard", and
"features" auto-rejoin when there had been no in-flight io, or no
changes on one side.

once you start using destructive settings, "auto-recovery strategies"
get very ugly very quickly, though. and they are not a solution to the
problem, but only a work-around for those that commonly run into these
problems, e.g. people who configure two-primaries, but usually only
access it exclusively from one node (xen images).
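
a minimal sketch of what such a non-intrusive setup might look like in
drbd.conf, assuming drbd 8 syntax. the resource name "r0" is a
hypothetical placeholder, and the other sections a real resource needs
are left out. which after-sb-* policy applies depends on how many nodes
are primary at the moment the split brain is detected, so check this
against the drbd.conf man page for your version before relying on it:

```
resource r0 {                  # "r0" is a hypothetical resource name
  net {
    allow-two-primaries;       # needed for cluster file systems

    # no node is primary when the split brain is detected:
    # auto-rejoin only if one side has no changes at all,
    # so nothing is really discarded
    after-sb-0pri discard-zero-changes;

    # one or both nodes are primary: no automatic recovery here,
    # just disconnect and leave the decision to the admin
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
  }
  # ... disk, syncer and "on <host>" sections omitted ...
}
```

this deliberately avoids the destructive policies mentioned above; if
both sides have changes, you still have to recover by hand.
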
if you have a network hiccup in such a setup, you will only have changes
on one node, so you are fine with the "discard-zero-changes" option.

the solution is related to the implementation of (dynamically
reconfigurable) write-quorum: suspending IO as soon as we lose quorum,
then timeout, arbitrate, retransmit, resume ...

sorry, no time frame on that, besides "as soon as possible",
we are very busy with a lot of things around here :)

--
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.