[DRBD-user] New storage: avoid split brains

Lars Ellenberg lars.ellenberg at linbit.com
Thu Oct 20 15:14:02 CEST 2016

On Thu, Oct 20, 2016 at 09:13:18AM +0200, Gandalf Corvotempesta wrote:
> On 18 Oct 2016 at 8:20 PM, "Lars Ellenberg" <lars.ellenberg at linbit.com>
> wrote:
> >
> > There is no "write quorum" yet, but I'm working on that.
> >
> Any ETA about this?
> > Data divergence is still very much possible.
> >
> > The DRBD 8.4 integration with pacemaker and fencing mechanisms
> > is proven to work, whereas the DRBD 9 integration with pacemaker
> > and fencing mechanisms is still pretty much non-existent.
> >
> So, currently, there is no safe way to get split-brain protection?
> Write quorum doesn't exist, and that is the only real way to be
> protected from split brain.

As convincing as that sounds, it is incorrect.

DRBD 8.4, two node replication, pacemaker controlled cluster.

Redundant cluster communication.
Fencing on the cluster level.
Fencing policies on DRBD level.
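
On the DRBD side, that setup is just a fencing policy plus the shipped
handler scripts. A typical DRBD 8.4 resource fragment (resource name
illustrative, script paths as usually shipped; adjust to your
installation) looks like:

```
resource r0 {
  disk {
    # Freeze IO and call the fence-peer handler on replication link
    # problems; also rely on node-level fencing (stonith).
    fencing resource-and-stonith;
  }
  handlers {
    # Injects a pacemaker location constraint so that only the node
    # with good data may be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint again once the resync has finished.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```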

Normal operation, all healthy.

DRBD sees replication link problems,
freezes IO,
triggers a user space helper.

User space helper can contact pacemaker,
confirm if the peer has gone away
(in which case it would also be shot by Pacemaker),
or is still reachable via redundant cluster communication.

Helper tells pacemaker to please not promote anyone
but the node with good data
(injects a pacemaker constraint).
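
That injected constraint is a plain pacemaker location rule; roughly
like this (ids, resource name, and node name are illustrative, the real
ones are generated by the handler):

```
<rsc_location id="drbd-fence-by-handler-r0" rsc="ms_drbd_r0">
  <rule id="drbd-fence-rule-r0" role="Master" score="-INFINITY">
    <!-- -INFINITY for Master on every node that is NOT the one with
         the good data: nothing else can be promoted. -->
    <expression id="drbd-fence-expr-r0" attribute="#uname"
                operation="ne" value="node-with-good-data"/>
  </rule>
</rsc_location>
```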

Returns success to kernel module.

DRBD resumes IO.

The peer node is not reachable,
and pacemaker level fencing fails?

Then the helper does not return success,
and DRBD keeps blocking IO.

Meanwhile, the peer node itself is healthy:
pacemaker on that node considers this node dead,
also triggers fencing,
shoots us (while we still have frozen IO),
and promotes DRBD there.
DRBD again triggers that user space helper on that node,
which again tells pacemaker not to promote anyone but the node with the
good data (the one that has just taken over).


Later, nodes re-join,
DRBD syncs up,
when fully synced up, removes constraint.


If you bring up DRBD with fencing policies enabled,
it does not come up as "UpToDate", but as "Consistent",
which cannot directly be promoted; it again needs to ask that
"fence-peer" helper, which can deny promotion,
and should deny promotion, unless it *knows* (by whatever decision
matrix you need to implement) that there cannot be any better data.

When configured correctly, Pacemaker will not even try to promote
a DRBD that is only Consistent.

DRBD handshake (and usually, but not necessarily, resulting resync)
brings a node out of Consistent.
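
Such a decision matrix boils down to mapping whatever you can find out
about the peer to the exit codes DRBD expects from a fence-peer
handler. A minimal sketch, not the shipped crm-fence-peer.sh: the
peer-state names and the decide function are hypothetical, and the exit
code meanings are taken from the drbd.conf fence-peer documentation
(4 = peer outdated, 5 = peer unreachable so keep IO frozen, 7 = peer
was stonith'ed).

```shell
#!/bin/sh
# Hypothetical fence-peer decision matrix, for illustration only.
# A real handler would "exit" with the code; here we echo it so the
# mapping is easy to see.
#
# Assumed exit code semantics (per drbd.conf, fence-peer handlers):
#   4 = peer could be marked Outdated -> OK to resume IO / promote
#   5 = peer unreachable, fencing failed -> DRBD keeps IO frozen
#   7 = peer was fenced (stonith'ed)     -> safe to go on

decide() {
    case "$1" in
        reachable-secondary)
            # Peer still answers via redundant cluster communication
            # and is Secondary: safe to mark it Outdated.
            echo 4 ;;
        confirmed-fenced)
            # Pacemaker has already shot the peer.
            echo 7 ;;
        *)
            # We cannot tell: deny, rather than risk divergence.
            echo 5 ;;
    esac
}

decide reachable-secondary   # -> 4
decide confirmed-fenced      # -> 7
decide unknown               # -> 5
```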


You end up with a system that will NOT experience data divergence,
unless you force it to.

But you may run into (multiple failure, mind you!) situations
where you are offline, rather than risk going online
with a possibly stale data set.

That's the thing: you cannot really have both
availability AND consistency in a distributed system.

But using your own handlers and heuristics,
DRBD allows you to find your own "perfect spot"
in the possible tradeoffs.

: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
please don't Cc me, but send to list -- I'm subscribed
