[DRBD-user] about split-brain

Lars Ellenberg lars.ellenberg at linbit.com
Fri Apr 20 15:35:05 CEST 2018

On Thu, Apr 19, 2018 at 05:06:03PM +0300, Veli Cakmak wrote:
> hello guys, i have 2 drbd node (primary/secondary) I try to solve split
> brain without any lost data.  How can i find last updated node ? Is there
> any command or something like ?

Divergence of instances of supposedly replicated data is
typically the result of a cluster split brain,
thus "split-brain" in this context is often used as a synonym
for data divergence.

You *cannot* resolve data divergence without data loss.
There is no "merge". Not on the block device level.

If you wanted to "merge" both sets of changes,
you could keep them both,
bring them "online" in "single-user" mode
(whatever that means for the data set in question)
and "merge" changes (files, database records, ...)
on the application level.

That may be conceptually trivial (e.g. monitoring time series in rrd files),
yet still take a lot of effort.
Or it may take more time than you think,
cause a lot of frustration,
and finally result in the insight that merging the changes
is not possible in any sane way anyway.

On the block level,
you have to throw away the changes on the "victim" instance(s),
and overwrite them with the data of some other instance
which you declare to be the "winner".

*Automating* that is automating data loss
(throwing away the changes of the victim(s)).


Using DRBD after-split-brain auto recovery strategies
is configuring automatic data loss.

This is for people who care more about being online
with "some" data than about being online with the "right" data.

People who care about the "right" data
try to *avoid* data divergence in the first place.

That is usually achieved by a combination of out-of-band
communication between the DRBD instances (DRBD resource fencing policies
and the fence-peer handler) and fencing on the cluster manager level.

To reliably avoid "split brain" (data divergence), you need fencing
properly configured on both the Pacemaker level and the DRBD level.
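
A minimal sketch of the DRBD half of that, assuming DRBD 8.4 style
syntax, a hypothetical resource name "r0", and the handler scripts
shipped with drbd-utils in their usual location (check the paths on
your distribution; in DRBD 9 the fencing option lives in the net
section and the handler scripts are named differently):

```
resource r0 {
  disk {
    # on loss of the peer, a primary freezes I/O
    # and calls the fence-peer handler
    fencing resource-and-stonith;
  }
  handlers {
    # adds a constraint to the Pacemaker CIB that keeps
    # the (possibly outdated) peer from being promoted
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # removes that constraint again once the resync has finished
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

This only helps if Pacemaker itself has STONITH enabled
and a fencing device that has actually been tested.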

If you would rather "get away" without fencing, and instead
try to automate "recovery" from "split brain" situations,
you are automating data loss.

Don't blame DRBD, it is just that flexible.

The choice is yours.

> Running : Drbd(8.9.10-2), Pacemaker, Corosync, Postgresql
> my auto solve config:
> net      {
>       after-sb-0pri discard-zero-changes;
>       after-sb-1pri discard-secondary;

If the "good data" (by whatever metric)
happens to be secondary during that handshake,
and the "bad data" happens to be primary,
you have just automatically thrown away the good,
and kept the bad.
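
For comparison, a sketch of the opposite choice: no automatic victim
selection at all. It assumes DRBD 8.4 style syntax, a hypothetical
resource name "r0", and the notification script at its usual
drbd-utils path; "disconnect" is the default for all three policies,
it is only spelled out here to make the intent explicit.

```
resource r0 {
  net {
    # on detected divergence: do not resync, just stay disconnected
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
  }
  handlers {
    # mail root when divergence is detected, so a human can decide
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
}
```

With this, both nodes keep their data until someone
deliberately picks the victim and discards its changes.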

: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
please don't Cc me, but send to list -- I'm subscribed
