[DRBD-user] Automatic split brain recovery policies - after-sb-0pri - discard-older-primary

Lars Ellenberg lars.ellenberg at linbit.com
Thu Jan 8 10:41:37 CET 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, Jan 07, 2009 at 02:10:56PM +0100, GAUTIER Hervé wrote:
> Dominik Klein wrote:
>>> I don't understand why this policy is very rarely used.
>>>     
>>
>> Well, the "older" primary is presumably the one with more changes. So
>> why automatically throw that away, instead of the presumably smaller
>> set of changes on the "younger" primary?
>>
>> That's just my explanation for it though.
>>
>>   
>>> And in the case of the "discard-younger-primary" policy, node1 will
>>> be selected as the sync source. Am I right?
>>>     
>>
>> Yes.
>>
>> If I could have given an answer to the rest of this e-mail, I would
>> have, so please don't ask why I didn't :)
>>
>>
>>   
> Thank you for your insight, Dominik.
> Anybody else care to comment on my example?

your example is _broken_.
well, if it is a real-world example, then your setup is broken.

seriously.

drbd is supposed to be "Connected" in normal operation.
let's stick with single Primary for now.

t0:  node1 Primary, node2 Secondary, both happy and replicating.

while connected, and node1 stays Primary, there is no way for node2 to
become Primary: first, the cluster manager should not attempt to promote
it; second, while drbd is Connected, only one node can be in the Primary
role.
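
for illustration, this single-Primary constraint is simply the drbd 8.x
default; you get it by not enabling dual-Primary mode (resource name
below is made up, remaining sections omitted):

  resource r0 {
    net {
      # do NOT add 'allow-two-primaries;' here -- leaving it out
      # (the default) means drbd refuses to let a second node become
      # Primary while Connected, which is the invariant described above
    }
    # ... device/disk/address sections omitted ...
  }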

let's assume you now lose connectivity,
and your cluster manager decides at
t1:  that node2 should be promoted, because it assumes node2 is the
single remaining node of the cluster, while node1 keeps running as
Primary, assuming that node2 has crashed.
(you have now run into a "split brain").
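
as an aside, drbd can call a notification handler when it detects the
divergence on reconnect; a minimal sketch, assuming a drbd 8.x build
that ships the notify script (made-up resource name, the path may
differ on your installation):

  resource r0 {
    handlers {
      # called when drbd detects data divergence at reconnect time;
      # mails a report to root in this example
      split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    # ... device/disk/address sections omitted ...
  }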

t2:  once connectivity is restored,
discard-younger-primary means that node2's changes will be discarded,
and node1 (which was Primary before the split, stayed Primary during it,
and probably still is) will be used as the sync source.

whatever else happens between the loss of connectivity, the promotion of
node2 at t1, and the first reconnect at t2 (when the data divergence is
detected) is supposed to be irrelevant to this auto-recovery strategy.
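
for reference, these recovery policies live in the net section of
drbd.conf; a minimal sketch (drbd 8.x syntax, made-up resource name,
pick values matching your own policy):

  resource r0 {
    net {
      # selected by how many of the diverged nodes are in Primary
      # role at the moment they reconnect:
      after-sb-0pri discard-younger-primary;  # neither is Primary
      after-sb-1pri discard-secondary;        # exactly one is Primary
      after-sb-2pri disconnect;               # both Primary: no auto fix
    }
    # ... device/disk/address sections omitted ...
  }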

if you can make a case for using discard-older-primary in your setup,
just do it. but rather than automatically throwing away diverging
changes, I think you should make it less likely to run into that
situation in the first place -- unless your priority is being online,
and you don't care which data you are online with.
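
that would be the same knob with the other value; in the sketch above,
change the one line:

      after-sb-0pri discard-older-primary;    # keep the younger one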

using multiple redundant heartbeat communication links, configuring
dopd, and setting the wait-for-connection timeouts (wfc-timeout and
degr-wfc-timeout) when drbd is configured (usually during the boot
process by the init script) is supposed to help with that, and should
cover all multiple-error situations we have been able to think of in an
"optimal" way, where optimal is defined as least likely to cause data
divergence.
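
a sketch of the relevant drbd.conf pieces (drbd 8.x; example timeout
values, made-up resource name; note the outdater handler keyword is
outdate-peer up to drbd 8.2 and fence-peer from 8.3 on, and the script
path depends on your heartbeat packaging):

  resource r0 {
    startup {
      wfc-timeout       120;   # seconds to wait for the peer at boot
      degr-wfc-timeout   60;   # shorter wait if the cluster was already
                               # degraded before the reboot
    }
    disk {
      fencing resource-only;   # lets dopd mark the peer's data outdated
    }
    handlers {
      outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    # ... device/disk/address sections omitted ...
  }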

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


