[Drbd-dev] Re: drbd Frage zu secondary vs primary; drbddisk status problem

Philipp Reisner philipp.reisner@linbit.com
Fri, 20 Aug 2004 14:52:52 +0200


On Thursday 19 August 2004 14:14, Lars Ellenberg wrote:
[...]
> > Split-brain scenarios that end in Primary/Primary (both StandAlone)
> > are already covered in the new design (I am writing it up right now). What else?
>
> not all that unlikely:
>
> if the primary dies (or gets killed), but before dying somehow
> still managed to lose its drbd connection _and_ therefore
> incremented the "ConnectedCount"...
>
> the "slave" now goes Secondary->Primary, but, because it is < Connected,
> increments the ArbitraryCount...
>
> situation at the next connect:
>
>  Flags: consistent,             ,been primary last time
>
> former Primary   1:X:Y:a+1:b  :10 (now Secondary after reboot)
> current Primary  1:X:Y:a  :b+1:10
>
> doh. the current Primary is supposed to become SyncTarget... shitty.
> --> current Primary goes StandAlone.
>
> next connection attempt (initiated by the operator)
> ... -> "split brain detected"
> --> both go StandAlone
>
> we may need to introduce an additional counter, a "CRM count",
> and once the CRM has shot the other node, it must, to be safe,
> do a drbdsetup "--crm" (cf. --human) primary;
> that would at least resolve the scenario described above...
>

Hi,

Right, old topic: what should we do after a split-brain situation?
I have looked up my papers from 2001 to understand why it is done
the way it is today:

The situation:

 N1    N2
 P --- S   Everything ok.
 P - - S   Link breaks.
 P - - P   A (also split-brained) Cluster-mgr makes N2 primary too.
 X     X   Both nodes down.
 P --- S   The current behaviour. 

What should be done after split brain?

The current policy is that the node that was Primary before the
split-brain situation should be Primary afterwards.

This policy is hard-coded into DRBD. It is an arbitrary decision;
I thought it was a good idea.

The questions are:
Should this policy be configurable? (IMO: yes)
Which policies do we want to offer?

 * The node that was primary before split brain (current behaviour)
 * The node that became primary during split brain
 * The node that modified more of its data during the split-brain
   situation  [ Do not think about implementation yet, just about
                the policy ]
 * others ?...
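
To make the policy question more concrete, here is a minimal Python
sketch of what a configurable choice could look like. It is only an
illustration, not DRBD code: NodeHistory, POLICIES, pick_survivor and
the policy names are all hypothetical, and the third policy simply
assumes that the number of blocks modified during the split is known.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class NodeHistory:
    """What a policy would need to know about a node (hypothetical fields)."""
    name: str
    primary_before_split: bool   # was Primary when the link broke
    promoted_during_split: bool  # became Primary while disconnected
    blocks_modified: int         # blocks written during the split (assumed known)


def _only(nodes: List[NodeHistory]) -> Optional[NodeHistory]:
    """Return the single matching node, or None if none or both match."""
    return nodes[0] if len(nodes) == 1 else None


def old_primary(a: NodeHistory, b: NodeHistory) -> Optional[NodeHistory]:
    """Current behaviour: the node that was Primary before the split wins."""
    return _only([n for n in (a, b) if n.primary_before_split])


def new_primary(a: NodeHistory, b: NodeHistory) -> Optional[NodeHistory]:
    """The node that became Primary during the split wins."""
    return _only([n for n in (a, b) if n.promoted_during_split])


def most_changes(a: NodeHistory, b: NodeHistory) -> Optional[NodeHistory]:
    """The node that modified more of its data during the split wins."""
    if a.blocks_modified == b.blocks_modified:
        return None  # tie: the policy cannot decide
    return a if a.blocks_modified > b.blocks_modified else b


Policy = Callable[[NodeHistory, NodeHistory], Optional[NodeHistory]]

POLICIES: Dict[str, Policy] = {
    "old-primary": old_primary,
    "new-primary": new_primary,
    "most-changes": most_changes,
}


def pick_survivor(policy: str, a: NodeHistory, b: NodeHistory) -> Optional[NodeHistory]:
    """Return the node whose data should be kept, or None if undecided."""
    return POLICIES[policy](a, b)
```

A policy returning None would mean "cannot decide", in which case the
administrator would still have to step in.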

The second question to answer is:
What should we do if the network connection heals? I.e.

 N1    N2
 P --- S   Everything ok.
 P - - S   Link breaks.
 P - - P   A (also split-brained) Cluster-mgr makes N2 primary too.
 ? --- ?   What now ?

Current policy: The two nodes will refuse to connect. The administrator
                has to resolve this.
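
As a rough illustration of that current policy, the check at connect
time could be sketched in Python like this. The function names and the
plain-string roles and states are assumptions made for the example;
this is not how DRBD implements its handshake.

```python
def state_after_handshake(local_role: str, peer_role: str) -> str:
    """Connection state a node ends up in once it has seen the peer's role."""
    if local_role == "Primary" and peer_role == "Primary":
        # Both sides may hold diverging data: refuse to connect and
        # leave the decision to the administrator.
        return "StandAlone"
    return "Connected"


def reconnect(n1_role: str, n2_role: str) -> tuple:
    """Resulting connection states of (N1, N2) after the link heals."""
    return (
        state_after_handshake(n1_role, n2_role),
        state_after_handshake(n2_role, n1_role),
    )
```

With both nodes Primary, reconnect("Primary", "Primary") leaves both
sides StandAlone; any other policy would have to replace that branch.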

Are there any other policies that would make sense?

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :