[DRBD-user] howto recover after split brain [DRBD synchronization order specification.]

Lars Ellenberg Lars.Ellenberg at linbit.com
Fri Jun 17 21:32:15 CEST 2005


/ 2005-06-13 15:34:59 +0400
\ Igor Yu. Zhbanov:
> Hello!
> 
> In some situations, such as a split brain, it is hard to predict DRBD's
> synchronization order.
if you know what happened, you can predict it.
if you don't know, you can use the "read_gc.pl" script (in the testing
directory of the tarball) to read the generation counters on both nodes,
and then you know again...
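"You know again" works like this: the node whose generation counters compare higher becomes SyncSource. The sketch below is a hypothetical helper. It is not part of DRBD or read_gc.pl. It assumes you have already copied each node's counters out of the read_gc.pl output into a space-separated list, in priority order. That priority order is an assumption to check against your DRBD version. As far as I know, DRBD 0.7 compares the human, timeout, connected and arbitrary counters in that order, followed by flag-based tie-breakers, which are left out here.

```shell
#!/bin/sh
# Minimal sketch: compare two generation-counter lists lexicographically.
# ASSUMPTION: each argument is a space-separated list of counters already in
# priority order (e.g. human timeout connected arbitrary); the flag-based
# tie-breakers DRBD may apply afterwards are NOT modelled here.
# Prints "local" or "peer" for the side with the newer data, or "equal".
gc_compare() {
    local_gc=$1
    set -- $2                  # positional parameters now hold the peer counters
    for mine in $local_gc; do
        if [ $# -eq 0 ]; then
            echo "counter lists differ in length" >&2
            return 2
        fi
        theirs=$1
        shift
        # the first differing counter decides
        if [ "$mine" -gt "$theirs" ]; then echo local; return 0; fi
        if [ "$mine" -lt "$theirs" ]; then echo peer; return 0; fi
    done
    if [ $# -ne 0 ]; then
        echo "counter lists differ in length" >&2
        return 2
    fi
    echo equal
}
```

For example, `gc_compare "1 2 3 4" "1 2 5 0"` prints `peer`. That means the other node would be the SyncSource.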

> So is it possible to ask DRBD to test whether it will be SyncTarget or
> SyncSource?

not directly, only by reading the generation counters by hand or by script.
btw, a running Primary will refuse to become SyncTarget and will go
StandAlone instead.
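Combining the comparison result with both nodes' roles gives a rough prediction of what happens on reconnect. This is a hypothetical helper that encodes only the two rules stated in this thread: the newer counters win, and a running Primary refuses to become SyncTarget and goes StandAlone. Anything DRBD does beyond that, including the equal-counter case, is reported as "undetermined".

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD.
#   $1: this node's role   (primary | secondary)
#   $2: the peer's role    (primary | secondary)
#   $3: whose counters are newer (local | peer | equal)
# Prints this node's expected sync role, or which Primary refuses.
predict_state() {
    me=$1 peer=$2 newer=$3
    case "$newer" in
        local) target=peer;  target_role=$peer; result=SyncSource ;;
        peer)  target=local; target_role=$me;   result=SyncTarget ;;
        *)     echo undetermined; return 0 ;;
    esac
    # a running Primary refuses to become SyncTarget and goes StandAlone
    if [ "$target_role" = primary ]; then
        echo "refused ($target Primary goes StandAlone)"
    else
        echo "$result"
    fi
}
```

For example, `predict_state primary secondary peer` prints `refused (local Primary goes StandAlone)`.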

> Or is it possible to specify synchronization order
> without invalidating the whole device?

not directly, but by using the "--human" flag when appropriate,
or by carefully manipulating the generation counters by hand.

> It would help with recovering after a split brain, so that more recent data is
> not lost and a full synchronization of a 1.2 TB disk array can be avoided.

well, after a split brain situation, you have to recover by hand
anyway, and you will lose data anyway.

> If it is not already possible, I think these would be two very useful features
> for cluster management:
> 1) Probing a synchronization order without starting the synchronization.

no.

> 2) Explicitly choosing the synchronization order without full invalidation,
>    so that synchronization can be quick.

no.

But.

say you recognize a split brain situation. then:
 - make sure that both nodes are disconnected,
   both# drbdadm disconnect all
 - stop heartbeat and/or services and monitoring thingies
   both# <whatever> stop
 - make the one with the worse data secondary
   worse# drbdadm secondary all
   (still disconnected!)
 - make the one with the better data secondary
   better# drbdadm secondary all
   (still disconnected!)
 - make the one with the better data primary, using the --human flag
   better# drbdadm -- --human primary all
   (still disconnected!)
 - reconnect the devices
   both# drbdadm reconnect all
 - start heartbeat and/or services ...
   both# <whatever> start

that should work about as well as it gets.
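Here are the same steps as a per-node sketch. It is a hypothetical wrapper, not a LINBIT tool. It only calls the drbdadm commands listed above, and it prints them instead of running them unless DRY_RUN=0 is set. SERVICE_CTL is an assumed placeholder for "<whatever>" in the steps and is skipped when unset. The --human step rests on an assumption: as I understand DRBD 0.7, a promotion flagged as a human decision raises that node's counters, so it wins the comparison after reconnect. The script does not verify this.

```shell
#!/bin/sh
# Hypothetical wrapper around the split-brain recovery steps.
# Prints commands by default (DRY_RUN=1); set DRY_RUN=0 to execute them.
# SERVICE_CTL is a placeholder for "<whatever>" (heartbeat, services,
# monitoring), not a DRBD or heartbeat setting; it must accept stop/start.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@" || { echo "failed: $*" >&2; return 1; }
    fi
}

# Phase 1: run on BOTH nodes; $1 is "better" or "worse".
# Finish phase 1 on both nodes before starting phase 2 on either.
phase1() {
    case "$1" in
        better|worse) ;;
        *) echo "role must be better or worse" >&2; return 2 ;;
    esac
    run drbdadm disconnect all || return 1
    if [ -n "$SERVICE_CTL" ]; then run $SERVICE_CTL stop || return 1; fi
    run drbdadm secondary all || return 1
    if [ "$1" = better ]; then
        # still disconnected; --human marks this promotion as a human decision
        run drbdadm -- --human primary all || return 1
    fi
}

# Phase 2: run on BOTH nodes once phase 1 is done everywhere.
phase2() {
    run drbdadm reconnect all || return 1
    if [ -n "$SERVICE_CTL" ]; then run $SERVICE_CTL start || return 1; fi
}
```

Source the script on each node and run `phase1 worse` or `phase1 better`. Only after both nodes have finished phase 1, run `phase2` on both. The split into two phases exists because the reconnect must not happen before the better node has been promoted.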

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
