[DRBD-user] Question about Linux-HA, stonith and data loss

Thu Dec 15 11:22:32 CET 2005

/ 2005-12-14 19:51:04 +0100
\ Christof Amelunxen:
> Hi,
> 
> Lars Marowsky-Bree wrote:
> > On 2005-12-14T18:59:29, Christof Amelunxen <ca at ordix.de> wrote:
> >>
> >> 1. NodeA (P) --- NodeB (S)   # everything ok
> >> 2. NodeA (P) - - NodeB (S)   # DRBD detects connection loss, goes WFC
> >> 3. NodeA (P) - - NodeB (S)   # Linux-HA detects split brain, B kills A
> >> 4.   /       - - NodeB (P)   # NodeB takes over, goes primary
> >>
> >> There have been writes on NodeA between step 2 and 3. These are lost
> >> after Linux-HA has killed A and made B primary. I know the best
> >> solution is to avoid this situation at all costs, and we are using
> >> serial heartbeats, too, but what if it happens anyway?
> >
> > The writes are lost.
> 
> Does that mean that in this case automatic failover without any human
> intervention is not possible at all if data loss is unacceptable?

In short, yes.

In DRBD 8 we can avoid this.
NodeA will block writes right at step 2 and trigger a callback.

It will resume only if the callback (or later cluster manager or
operator intervention) confirms that either NodeB is in fact dead,
or has been told that its data is "stale" (it would then refuse to
become Primary later).
Or, of course, NodeA might be killed itself, as outlined above.
But since it blocked all writes, there are no uncommitted writes.
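
To make the sequence above concrete, here is a rough toy model in Python.
It is only a sketch of the idea, not DRBD code or its interface: the names
PeerState, PrimaryNode, fence_peer and resolve are made up for illustration.
The primary freezes writes as soon as the link drops, asks a callback about
the peer, and resumes only once the peer is confirmed dead or outdated.

```python
from enum import Enum, auto


class PeerState(Enum):
    CONNECTED = auto()   # replication link is up
    UNKNOWN = auto()     # link lost, peer may still be alive with good data
    DEAD = auto()        # fencing confirmed the peer is down
    OUTDATED = auto()    # peer was told its data is stale


class PrimaryNode:
    """Toy model of a primary that freezes writes while the peer is unknown."""

    def __init__(self, fence_peer):
        self.fence_peer = fence_peer   # hypothetical callback, returns a PeerState
        self.peer = PeerState.CONNECTED
        self.committed = []            # writes acknowledged to the application
        self.blocked = []              # writes held back while frozen

    def write(self, data):
        if self.peer is PeerState.UNKNOWN:
            self.blocked.append(data)  # held back, not acknowledged
            return False
        self.committed.append(data)
        return True

    def on_connection_loss(self):
        # Step 2: freeze first, then ask the callback about the peer.
        self.peer = PeerState.UNKNOWN
        self.resolve(self.fence_peer())

    def resolve(self, result):
        # May also be called later by a cluster manager or an operator.
        if result in (PeerState.DEAD, PeerState.OUTDATED):
            self.peer = result
            self.committed.extend(self.blocked)  # release the held writes
            self.blocked.clear()
```

If the callback cannot decide (it returns PeerState.UNKNOWN in this model),
the node simply stays frozen until someone calls resolve() with a definite
answer. Nothing is acknowledged in between, so killing NodeA at that point
loses no acknowledged write.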

BTW, don't ask when DRBD-8 is production ready.
This list will be the first one to know.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :