[DRBD-user] Split-brain

Jérôme Augé jerome.auge at gmail.com
Tue Jun 12 11:26:16 CEST 2007


2007/6/11, N.J. van der Horn (Nico) <nico at vanderhorn.nl>:
>
> Hallo DRBD-meisters und lovers !
>
> As far as i am aware of, i never had any real
> problems using DRBD, but that changed a couple of days ago.
> Both nodes suddenly have status "StandAlone" and
> "messages" shows that i am blessed with "Split-Brain".
>
> I suspect I forgot to change the state
> of node foc1 to Secondary before starting Heartbeat.
> There is no other clue coming up into my mind to
> explain what caused this situation.... grinzz
>
> On both nodes fsck is happy, even with "fsck -n"
> (readonly) on the physical device after stopping DRBD.
> I can mount (did that 1-at-a-time) both sides and
> my data looks about the same (no real comparison made).
>
> The cluster is a test-setup in my lab, the data
> has no real value, but i like to understand what's wrong.
>
> Thanks in advance for your valued answers.
>
> Nico van der Horn
>
>
> Questions:
> ----------
> 1. how can i determine the real cause of the split-brain ?


A split-brain is usually caused by a loss of communication between the
two nodes: the Secondary node is promoted to Primary while the other node
remains Primary. Each node thinks its peer is dead, so both end up
Primary.

Check your syslog to see what happened before the Split-Brain occurred
(eth0 link down, etc.)
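One way to do that is to print the log lines leading up to the first
split-brain message. This is only a sketch: the helper name
before_split_brain is made up, and both the default log path
(/var/log/messages) and the matched text ("split-brain", case-insensitive)
are assumptions, so adjust them to what your syslog really contains.
It also relies on the -m and -B options of GNU grep.

```shell
#!/bin/sh
# Hypothetical helper: show what was logged just before the first
# split-brain message.
#   $1  log file to search (assumed default: /var/log/messages)
#   $2  number of preceding lines to show (default: 20)
before_split_brain() {
    logfile="${1:-/var/log/messages}"
    context="${2:-20}"
    # -i    match case-insensitively
    # -m 1  stop after the first matching line
    # -B N  also print the N lines that precede the match
    grep -i -m 1 -B "$context" 'split-brain' "$logfile"
}
```

Running "before_split_brain /var/log/messages 50" on each node should show
whether a link went down or Heartbeat promoted a node shortly before DRBD
complained.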

> 2. how to correct the situation ?


You have to decide which node you want to "sacrifice", and tell it to
discard its data.

Run the following command on the node whose data you want to discard:

    root@bad-data# drbdadm -- --discard-my-data connect all

Then simply connect the other machine:

    root@good-data# drbdadm connect all
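
If you want to rehearse this before touching a real cluster, the two steps
can be wrapped in small functions, roughly as below. This is a sketch, not
a tested procedure: the function names and the DRBDADM override are made
up for illustration. It also adds one step the commands above do not have:
the node being sacrificed is demoted with "drbdadm secondary" first, on the
assumption that it may still be Primary.

```shell
#!/bin/sh
# Sketch of the two-step recovery. Run each function on the matching node.
# DRBDADM is a hypothetical override so the commands can be dry-run,
# e.g. DRBDADM=echo prints them instead of executing them.
DRBDADM="${DRBDADM:-drbdadm}"

# On the node whose changes will be thrown away.
#   $1  resource name (default: all)
discard_and_reconnect() {
    res="${1:-all}"
    # Added step (not among the commands above): demote the node first,
    # assuming it may still be Primary.
    $DRBDADM secondary "$res" &&
    # The first "--" passes --discard-my-data through to the connect step.
    $DRBDADM -- --discard-my-data connect "$res"
}

# On the node whose data is kept.
#   $1  resource name (default: all)
reconnect_survivor() {
    res="${1:-all}"
    $DRBDADM connect "$res"
}
```

With DRBDADM=echo set, calling discard_and_reconnect only prints the two
commands it would run, which is a cheap way to double-check that you are
on the right node before doing it for real.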

The nodes will start to resynchronize, transferring data from the good-data
node (which appears as SyncSource in /proc/drbd) to the bad-data node
(SyncTarget).
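
To check which role each node ended up with, you can pull the connection
state out of /proc/drbd. A minimal sketch, assuming every device line in
that file carries a "cs:<State>" token (the helper name drbd_cstates is
made up, and grep -o needs GNU or BSD grep):

```shell
#!/bin/sh
# Hypothetical helper: print the connection state of every DRBD device,
# one per line.
#   $1  status file to read (default: /proc/drbd)
drbd_cstates() {
    statusfile="${1:-/proc/drbd}"
    # -o prints only the matching "cs:<State>" tokens (assumed format);
    # cut then keeps the part after the colon.
    grep -o 'cs:[A-Za-z]*' "$statusfile" | cut -d: -f2
}
```

On the good-data node this should print SyncSource while the resync is
running, and SyncTarget on the bad-data node.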

Regards,
Jérôme Augé