[DRBD-user] Automatic split brain recovery policies - after-sb-0pri - discard-older-primary

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jan 9 13:28:01 CET 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Jan 09, 2009 at 12:25:24PM +0100, GAUTIER Hervé wrote:
> Lars Ellenberg wrote:
>> your example is _broken_.
>> well, if it is a real-world example, then your setup is broken.
>>
>> seriously.
>>
>> drbd is supposed to be "Connected" in normal operation.
>> let's stick with single Primary for now.
>>
>> t0:  node1 Primary, node2 Secondary, both happy and replicating.
>>
>> while connected, and node1 stays Primary, there is no way that node2 can
>> become Primary. First, the cluster manager should not attempt to promote
>> it, second, while drbd is Connected, there will be only one node in
>> Primary role.
>>
>> let's assume you now lose connectivity,
>> and your cluster manager decides at
>> t1:  that node2 should be promoted, as it assumes it is the
>> single remaining node of the cluster, while node1 keeps running as
>> Primary, assuming that node2 has crashed.
>> (you now run into a "split brain").
>>
>> t2:  once connectivity is restored,
>> discard-younger-primary would mean that node2 will be discarded,
>> and node1 (which has been Primary before, was Primary during that period,
>> and probably still is) will be used as sync source.
>>   
>
> I thought that discard-younger-primary (a policy for after-sb-0pri) was  
> only used when the resource was not Primary on any node?

well, if you have "after-sb-1pri consensus", the 0pri policy is
consulted even when one node is Primary...
but anyway.

a node that currently is Primary will refuse to become SyncTarget.
so whatever you configure there, if the algorithm resulting from your
configuration decides that the current Primary is the "wrong" one,
there is not much drbd can do about it.
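
for reference, these policies live in the net section of drbd.conf.
a rough sketch (the resource name and the concrete choices here are
just examples, see drbd.conf(5) for the full list of values):

  resource r0 {
    net {
      # neither node is Primary when the split brain is detected
      after-sb-0pri discard-younger-primary;
      # one node is Primary: discard the Secondary's data, but only if
      # the after-sb-0pri policy would have picked the same victim;
      # otherwise drop the connection and leave it to the admin
      after-sb-1pri consensus;
      # both nodes are Primary: give up, disconnect, resolve by hand
      after-sb-2pri disconnect;
    }
  }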

> Node1 will be the source even if a lot of Secondary->Primary and  
> Primary->Secondary cycles occur on both nodes during the split brain.
> Is that right?

it is supposed to be that way, yes.

> This question is very important for me.

then you'd better just _try_ it.
and report back whether it does what it is supposed to do or not.
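
if it helps: a split brain is easy to provoke by hand. roughly, and
assuming a resource named r0 with no dopd in place to outdate the
disconnected peer (a sketch, not a recipe):

  # on both nodes: cut the replication link
  drbdadm disconnect r0

  # on node2 (the former Secondary): promote, write, demote.
  # the data sets now diverge.
  drbdadm primary r0
  # ... generate some writes through node2 here ...
  drbdadm secondary r0

  # reconnect both sides; drbd detects the split brain and applies
  # your after-sb-* policies. watch the kernel log and /proc/drbd
  # while it does.
  drbdadm connect r0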

also note that we do not have any "timestamp" or some such.
we "simply" compare UUIDs.
the actual algorithm used is explained in some of the papers linked from
www.drbd.org/publications, if you want to know the gory details.

maybe you can describe what failure scenario you are trying to protect
against, how you would like DRBD to handle it, and why.
and, preferably, what information DRBD should base its decisions on,
and how to obtain it...

>> whatever else happens between the loss of connectivity, the promotion
>> of node2 at t1, and the first reconnect at t2 (when the data divergence
>> is detected) is supposed to be irrelevant for this auto recovery strategy.
>>
>> if you can argue for using discard-older-primary in your case,
>> just do it. but instead of automatically throwing away diverging
>> changes, I think you should make it less likely to run into that
>> situation in the first place -- unless your priority is being online,
>> and you don't care which data you are online with.
>>   
> And so lose any data changes on node2 in your example?

I'm not sure what part of the above that question refers to,
but, well, "discard" says it all, does it not?

>> using multiple redundant heartbeat communication links, configuring
>> dopd, and using the wait-for-connection-timeout and
>> degr-wait-for-connection-timeout parameters during drbd configuration
>> (usually during the boot process, by the init script) is supposed to help
>> with that, and should cover all multiple-error situations we have been
>> able to think of in an "optimal" way, where optimal is defined as least
>> likely to cause data divergence.
>>   
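
to make those timeouts concrete: they map onto the startup section of
drbd.conf as wfc-timeout and degr-wfc-timeout. a sketch (the values
are arbitrary examples):

  resource r0 {
    startup {
      # seconds the init script waits for the peer to connect
      # before it continues booting; 0 means wait forever
      wfc-timeout      120;
      # shorter timeout used when this node was already degraded
      # (knew its peer to be dead) before it went down
      degr-wfc-timeout  60;
    }
  }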
> My aim is, during a split brain, to allow promotion and demotion of a
> DRBD disk only on the node which was the last to have promoted this
> DRBD disk.

if all you have is loss of connectivity,
then the former primary is still primary.

> So, if the cluster manager has another path to communicate over (an IP  
> network, for example),

as long as the cluster manager still has communication,
you do not have "split brain" in a cluster-wide sense,
so the cluster manager should not attempt to change DRBD roles.
also, dopd would still have a chance to mark the other node
as Outdated, in case the replication link is down.
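
in configuration terms, dopd amounts to roughly this (a sketch only;
the handler keyword is outdate-peer up to drbd 8.2 and fence-peer from
8.3 on, and the paths may differ on your distribution):

  # drbd.conf
  resource r0 {
    disk {
      # allow the peer's disk to be marked Outdated
      fencing resource-only;
    }
    handlers {
      outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
  }

  # ha.cf (heartbeat): run dopd and let it use the cluster API
  respawn hacluster /usr/lib/heartbeat/dopd
  apiauth dopd gid=haclient uid=hacluster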

> it has to record this information, or it has to ask DRBD,
> if DRBD can answer (even in a split-brain situation, which I doubt).

I don't understand?

> With this information and the other path, it will not allow promoting
> and demoting the DRBD disk on any node other than the last one to have
> promoted this DRBD disk.

huh?

> Maybe dopd will give me all I need,

I guess so.

> but I haven't tested it yet.
> And then I think, in order to in line,

I don't understand?

> I have to use discard-older-primary for the after-sb-0pri policy

why?
as far as I understand it, you should use discard-younger-primary.

> and discard-secondary for the after-sb-1pri policy.

why?

sorry, you have successfully confused me.

please try to describe your failure scenarios.
distinguish between the various failures.
 * replication link down only?
 * all communication down?
 * where do WRITEs to drbd originate from,
   i.e. can clients still reach the DRBD?
   if so, why can clients still reach the DRBD
   when all cluster communication is down?
   how likely is it that the other node is still alive,
   when all cluster communication is down?
 * multiple failures in succession?
 * (quasi) simultaneous multiple failures?
 * stonith in use?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


