[Drbd-dev] DRBD8: Split-brain if primary and syncTarget
Montrose, Ernest
Ernest.Montrose at stratus.com
Mon Mar 12 15:36:31 CET 2007
Phil,
Our config is close to what you suggested. But we have after_sb_0pri set to
"discard_zero_changes". Hmmm...I have to test and think about this some more.
Thanks,
EM--
-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner at linbit.com]
Sent: Monday, March 12, 2007 10:28 AM
To: drbd-dev at linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: Split-brain if primary and syncTarget
On Thursday, 8 March 2007 23:21, Montrose, Ernest wrote:
> Hi all,
>
> We are seeing an issue with split brain if one node is syncing as
> syncTarget while being Primary.
> Setup: two nodes, A and B.
> * make B primary and the syncTarget
> * Start a sync.
> * ifdown eth1 to break communication
> * ifup eth1.
> * then on the node in standalone "drbdadm connect"
> We get a split-brain.
>
> I think the problem is that if we are primary and we lose contact from
> the other side we generate a new current UUID which causes a Split-Brain
> next time we connect.
> This only happens if we are the sync target and we are primary. Perhaps
> we should not generate a UUID if we were syncing when the disconnect
> happened. Below is a possible patch for this in after_state_ch():
Hi Ernest,
I think the current behaviour is correct.
* When a node is SyncTarget it actually exposes the data of the sync
source node to its applications. (And the applications can potentially
see the data when the SyncTarget node is primary.)
* When you disconnect such a node, it has to fall back to its local
data set. That means the applications suddenly see a different data set,
and of course the apps might continue to modify this data set...
* When you reconnect this, you have a split brain situation. But you
might let the automatic split-brain resolution handler solve the
situation. Use some after-sb-?pri settings, and an rr-conflict of
"violently", e.g.:
after-sb-0pri discard-least-changes
after-sb-1pri violently-as0p
after-sb-2pri violently-as0p
rr-conflict violently
Then the resync should continue, since "violently" allows DRBD
to change the data set that is seen on the Primary node once again.
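For reference, a minimal sketch of where the four settings above would
sit in drbd.conf. It assumes the DRBD 8 layout, in which the split-brain
policies live in the net section of a resource; the resource name r0 is
hypothetical, and the rest of the resource definition (hosts, disks,
addresses) is left out:

```
resource r0 {                              # r0 is a hypothetical resource name
  net {
    after-sb-0pri discard-least-changes;   # no primary: sync from the node that changed more blocks
    after-sb-1pri violently-as0p;          # one primary: follow the 0pri decision anyway
    after-sb-2pri violently-as0p;          # two primaries: follow the 0pri decision anyway
    rr-conflict   violently;               # allow sync even if it conflicts with the current roles
  }
}
```

The two "violently" variants are what let the resync proceed while the
node is Primary: they accept that the data set under a mounted file
system may change, so they only make sense if that is tolerable for
the applications on that node.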
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria http://www.linbit.com :