[DRBD-user] Question about Linux-HA, stonith and data loss

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Dec 15 11:22:32 CET 2005



/ 2005-12-14 19:51:04 +0100
\ Christof Amelunxen:
> Hi,
> 
> Lars Marowsky-Bree wrote:
> > On 2005-12-14T18:59:29, Christof Amelunxen <ca at ordix.de> wrote:
> >>
> >> 1. NodeA (P) --- NodeB (S)   # everything ok
> >> 2. NodeA (P) - - NodeB (S)   # DRBD detects connection loss, goes WFC
> >> 3. NodeA (P) - - NodeB (S)   # Linux-HA detects split brain, B kills A
> >> 4.   /       - - NodeB (P)   # NodeB takes over, goes primary
> >>
> >> There have been writes on NodeA between step 2 and 3. These are lost
> >> after Linux-HA has killed A and made B primary. I know the best
> >> solution is to avoid this situation by any chance and we are using
> >> serial heartbeats, too, but what if it happens anyway?
> >
> > The writes are lost.
> 
> Does that mean in this case automatic failover without any human
> intervention is not possible at all if data loss is unacceptable?

In short, yes.
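
To make the loss concrete, here is a toy Python model of the four
steps quoted above. It is only a sketch, not DRBD code: the Node class
and the write() helper are made up for illustration, and "killing"
NodeA is modelled simply by comparing what each node holds.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one cluster node; not a DRBD data structure."""
    name: str
    blocks: list = field(default_factory=list)


def write(primary, peer, data, connected):
    """Write on the primary; mirror to the peer only while the link is up."""
    primary.blocks.append(data)
    if connected:
        peer.blocks.append(data)


node_a = Node("NodeA")
node_b = Node("NodeB")

# Step 1: everything ok, writes reach both nodes.
write(node_a, node_b, "w1", connected=True)

# Step 2: replication link lost, NodeA keeps accepting writes on its own.
write(node_a, node_b, "w2", connected=False)
write(node_a, node_b, "w3", connected=False)

# Steps 3 and 4: NodeA is killed and NodeB goes primary with what it has.
# Everything NodeA acknowledged after the link loss is missing on NodeB.
lost_writes = [b for b in node_a.blocks if b not in node_b.blocks]
```

Whatever ends up in lost_writes was acknowledged to the application on
NodeA but never reached NodeB, which is exactly the window between
step 2 and step 3.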

In DRBD 8 we can avoid this.
NodeA will block writes right at step 2 and trigger a callback.

It will resume only if the callback (or later cluster manager or
operator intervention) confirms that either NodeB is in fact dead,
or has been told that its data is "stale" (it would then refuse to
become Primary later).
Or, of course, NodeA might be killed itself, as outlined above.
But since it blocked all writes, there are no uncommitted writes.
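
The policy described above can be sketched as a small state machine.
This is a hedged illustration only, not the DRBD 8 implementation:
PeerState, Primary, fence_peer and resolve are hypothetical names, and
the sketch holds blocked writes in a list where a real block device
would simply not complete them.

```python
from enum import Enum


class PeerState(Enum):
    CONNECTED = "connected"
    UNKNOWN = "unknown"      # link lost, fate of the peer not yet known
    DEAD = "dead"            # confirmed dead, for example by fencing
    OUTDATED = "outdated"    # peer was told that its data is stale


class Primary:
    """Toy model of a primary that freezes writes on connection loss."""

    def __init__(self, fence_peer):
        self.fence_peer = fence_peer  # hypothetical callback, returns a PeerState
        self.peer = PeerState.CONNECTED
        self.frozen = False
        self.local = []      # writes acknowledged on this node
        self.mirrored = []   # writes the peer has as well
        self.pending = []    # writes held back and not acknowledged

    def write(self, data):
        """Return True if the write was acknowledged, False if it is held."""
        if self.frozen:
            self.pending.append(data)
            return False
        self.local.append(data)
        if self.peer is PeerState.CONNECTED:
            self.mirrored.append(data)
        return True

    def connection_lost(self):
        """Step 2: freeze immediately, then ask the callback about the peer."""
        self.peer = PeerState.UNKNOWN
        self.frozen = True
        self.resolve(self.fence_peer())

    def resolve(self, result):
        """Resume only once the peer is confirmed dead or marked stale."""
        if result not in (PeerState.DEAD, PeerState.OUTDATED):
            return  # still unknown: stay frozen
        self.peer = result
        self.frozen = False
        held, self.pending = self.pending, []
        for data in held:
            self.write(data)


# The callback cannot decide yet, so the node stays frozen.
node_a = Primary(fence_peer=lambda: PeerState.UNKNOWN)
node_a.write("w1")
node_a.connection_lost()
node_a.write("w2")  # held back, not acknowledged
```

If NodeA is killed while frozen, local and mirrored are still
identical, so NodeB can take over without missing an acknowledged
write. If instead the cluster manager or an operator later calls
resolve() with DEAD or OUTDATED, the held writes are applied and I/O
continues on NodeA alone.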

BTW, don't ask when DRBD 8 will be production ready.
This list will be the first one to know.


-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :