Lars Ellenberg wrote:
> your example is _broken_.
> well, if it is a real-world example, then your setup is broken.
>
> seriously.
>
> drbd is supposed to be "Connected" in normal operation.
> lets stick with single Primary for now.
>
> t0: node1 Primary, node2 Secondary, both happy and replicating.
>
> while connected, and node1 stays Primary, there is no way that node2 can
> become Primary. First, the cluster manager should not attempt to promote
> it, second, while drbd is Connected, there will be only one node in
> Primary role.
>
> lets assume you now lose connectivity,
> and your cluster manager decides at
> t1: that node2 should be promoted, as it assumes it would be the
> single remaining node of the cluster, while node1 keeps running as
> Primary, assuming that node2 has crashed.
> (you now run into a "split brain").
>
> t2: once connectivity is restored,
> discard-younger-primary would mean that node2 will be discarded,
> and node1 (which has been Primary before, and was during that period,
> and probably is still) will be used as sync source.

I thought that discard-younger-primary (a policy for after-sb-0pri) was
only used when the resource was not Primary on either node?
Node1 will be the sync source even if many Secondary->Primary and
Primary->Secondary cycles occur on both nodes during the split brain.
Is that right? This question is very important for me.

> whatever else happens between loss of connectivity and promotion of
> node2 at t1, and the first reconnect at t2 when the data divergence
> is detected is supposed to be irrelevant for this auto recovery strategy.
>
> if you can argue to use discard-older-primary for your case,
> just do it. but instead of automatically throwing away diverging
> changes, I think you should make it less likely to run into that
> situation in the first place -- unless your priority is being online,
> and you don't care which data you are online with.

And so any data changed on node2 in your example is lost?

> using multiple redundant heartbeat communication links, configuring
> dopd, and using the wait-for-connection-timeout and
> degr-wait-for-connection-timeout parameters during drbd configuration
> (usually during the boot process by the init script) is supposed to help
> with that, and should cover all multiple-error situations we have been
> able to think of in an "optimal" way, where optimal is defined as least
> likely to cause data divergence.

My aim is, during a split brain, to allow promotion and demotion of a
DRBD disk only on the node that was the last to have promoted that DRBD
disk. So, if the cluster manager has another communication path (an IP
network, for example), it has to record this information, or it has to
ask DRBD, if DRBD can still answer (even in a split-brain situation,
which I doubt). With this information and the other path, it will not
allow the DRBD disk to be promoted or demoted on any node other than the
last one to have promoted it.

Maybe dopd will give me all I need, but I haven't tested it yet. And I
think that, to stay in line with this aim, I have to use
discard-older-primary for the after-sb-0pri policy and discard-secondary
for the after-sb-1pri policy.

Thank you for your time.

--
Hervé GAUTIER
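
For illustration, a drbd.conf fragment along these lines might look
roughly like the following (DRBD 8 style syntax; the after-sb policies
are the ones mentioned above, but the timeout values, the
device/disk/address stanzas and the drbd-peer-outdater path are only
placeholders and may differ on your systems):

    resource r0 {
      startup {
        # wait-for-connection timeouts used by the init script at boot
        wfc-timeout      120;
        degr-wfc-timeout  60;
      }
      disk {
        # allow dopd to mark the peer's data as outdated
        fencing resource-only;
      }
      net {
        # automatic split-brain recovery policies
        after-sb-0pri discard-older-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
      }
      handlers {
        # dopd hook (path may differ between distributions)
        fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
      }
      on node1 {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }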