[DRBD-user] What factors decide: "split-brain detected"?

Lars Ellenberg lars.ellenberg at linbit.com
Fri Mar 6 21:38:49 CET 2009


On Wed, Mar 04, 2009 at 09:23:50AM +0100, Rustedt, Florian wrote:
> Hello list,
> 
> What exact is the reason for drbd(8.3.0) to detect split-brain( on dual-primary)?
> 
> Parallel write-access?

no.
that would log "conflicting write detected" or some such.

> Too short delay between two write-accesses on both sides, although they are sequential?

no.

you are looking in the wrong direction.


split-brain is a situation when nodes can not communicate.
it can only be detected once they do communicate again.

simplifying some special cases,
whenever DRBD is Primary without being able to communicate with its
peer, it generates a "uuid" (large "random" number) to tag its
"data generation". it keeps some history of former such uuids.
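
as a rough illustration of that simplified description, here is a toy
model in python. this is not DRBD's actual code: the class, the method
name, the number of history slots, and the exact rotation rule (old
"current" becomes "bitmap", an older "bitmap" moves into history) are
assumptions made for this sketch.

```python
import secrets
from dataclasses import dataclass, field

HISTORY_SLOTS = 2  # assumption for this sketch, not taken from DRBD


@dataclass
class GenerationIds:
    """Toy model of the uuid set one node keeps for its data."""
    current: int
    bitmap: int = 0                      # 0 means "not set"
    history: list = field(default_factory=list)

    def new_generation_while_disconnected(self) -> None:
        """Primary without a reachable peer: tag a new data generation."""
        if self.bitmap:
            # simplification: an older bitmap uuid moves into history
            self.history = ([self.bitmap] + self.history)[:HISTORY_SLOTS]
        # the generation we just left is remembered as the bitmap uuid
        self.bitmap = self.current
        # large random number; "or 1" only avoids the "not set" value 0
        self.current = secrets.randbits(64) or 1
```

a node that calls `new_generation_while_disconnected()` once ends up with
a fresh "current" uuid and its previous "current" stored as "bitmap".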

during DRBD network handshake, the peers compare their set of uuids
(current, bitmap, history...).
if one is a strict ancestor of the other (the "current" uuid of one node
is the "bitmap" uuid of the other), that decides the sync direction,
as it is clear which one has the "better", more recent, data.

if both nodes share some (all) former uuids,
but both have a new, different, "current" uuid,
well, that is when "split-brain" is detected: now they can determine that
they used to have the same dataset, but then lost communication,
and both proceeded to modify the data, independently.
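
the comparison described above can be sketched in a few lines of python.
again this is only an illustration under stated assumptions: the `Uuids`
type, the `compare` function and its return values are hypothetical, and
the real handshake distinguishes many more cases than these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uuids:
    """Hypothetical snapshot of one node's uuid set at handshake time."""
    current: int
    bitmap: int = 0          # 0 means "not set"
    history: tuple = ()


def compare(a: Uuids, b: Uuids) -> str:
    """Simplified decision: who has the newer data, or is it split-brain?"""
    if a.current == b.current:
        return "in-sync"
    # strict ancestor: one node's current is the other's bitmap uuid
    if b.bitmap and a.current == b.bitmap:
        return "sync-from-b"         # b moved on, a is the ancestor
    if a.bitmap and b.current == a.bitmap:
        return "sync-from-a"         # a moved on, b is the ancestor
    # both have a new, different current, but share former uuids
    former_a = {u for u in (a.bitmap, *a.history) if u}
    former_b = {u for u in (b.bitmap, *b.history) if u}
    if former_a & former_b:
        return "split-brain"
    return "unrelated"


# both nodes started from generation 0x1, lost the link, and both wrote:
node_a = Uuids(current=0xA1, bitmap=0x1)
node_b = Uuids(current=0xB1, bitmap=0x1)
print(compare(node_a, node_b))
```

if only `node_b` had generated a new uuid while `node_a` stayed at
`current=0x1`, the same function would return "sync-from-b" instead.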

there is much more detail about that uuid scheme and algorithm
in some of the papers/publications at drbd.org.


your other posts indicate that you simply try to do xen migration
using DRBD as the xen image backing store.

and you seem to think that the migration causes the split-brain, or the
split-brain detection.  that is not so. you are looking at the wrong end
of the problem.


whenever you see "split-brain detected",
then you should go back,
and find when, where, and why the "split-brain" was _caused_.
because there and then is the problem you should solve.

when and why does DRBD lose the connection?
while being primary on both nodes?
or is it made primary without being connected?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
