--- On Mon, 3/9/09, GAUTIER Hervé <herve.gautier at thalesgroup.com> wrote:

Thanks for the reply. :)

> Martin Fick wrote:
> >
> > If node B goes down while node A is still primary,
> > should it not be possible to keep track of the fact
> > that node B is now outdated on node A?  This way,
> > if node A goes down while node B is still down,
> > when node A comes back up it should know that it
> > can safely proceed to primary without waiting for
> > node B to return.
> >
>
> How do you know that, while node A was down, node B
> hasn't been up and down several times???

You don't! :(  But if it has, you already have a split brain situation, and you are not likely to be making things worse (depending on your split brain resolution scenario).  At least with my proposal, the cluster manager can potentially be configured to never bring node B up without some form of manual override, since bringing it up automatically is what would cause the split brain in the first place.

The idea is that by keeping track of the down status of nodes on their peers, you can automate one extra scenario in your cluster, reducing the number of manual intervention steps and therefore, hopefully, also the opportunities for split brain (and downtime).  This leaves only the following scenarios where a node that comes up must wait for its peer to return before continuing normal operation:

1) Nodes A & B go down at exactly the same time.

2) Node B goes down, node A goes down, node B returns (or vice versa).

The only time you would want B to become primary here is if node A is going to be permanently down and you are forced to discard its more recent updates.  Whereas currently, any time a node comes up without its peer, split brain is a risk if it does not establish a connection to its peer before going primary.

Since the objective of drbd is, I assume, HA (not data protection like RAID, since it does not verify reads), it seems strange to make your system have two dependencies on cold starts (when both nodes go down).
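To make the bookkeeping concrete, here is a minimal sketch in Python of the durable "peer outdated" flag being proposed.  The flag-file approach, path handling, and function names are all hypothetical, for illustration only; they are not part of DRBD or of any cluster manager:

```python
import os

def record_peer_outdated(flag_path):
    """Called while this node is primary and its peer drops out:
    durably record that the peer's data is now behind ours."""
    with open(flag_path, "w") as f:
        f.write("outdated\n")
        f.flush()
        os.fsync(f.fileno())  # make the record survive our own crash

def clear_peer_outdated(flag_path):
    """Called once the peer has reconnected and fully resynced."""
    try:
        os.remove(flag_path)
    except FileNotFoundError:
        pass

def may_promote_without_peer(flag_path):
    """Boot-time check: it is only safe to go primary without the
    peer if we know the peer was already outdated when we last ran."""
    return os.path.exists(flag_path)
```

Because the flag is written and fsync'd while the surviving node is still primary, it survives that node's own later crash and lets it promote alone on reboot.  If both nodes go down at the same time, neither has the flag and both wait, matching the scenarios listed above.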
In this sense, a drbd cluster as typically configured today is less reliable (on boots) than a single node without drbd, since you can never safely automate starting the cluster with a single node!  However, if you keep track of your peer's failure, this restriction is potentially removed.  If node B suffers an HD failure and you are replacing its drive, do you want your cluster to require manual boot intervention if node A happens to go down for a minute?  That seems unnecessary to me; node A should be smart enough to reboot and continue on its own.

> > If the cluster was degraded when node A went down,
> > it should be able to continue to operate degraded
> > safely when node A comes back up, right?  Is there
> > anything wrong with this logic?  Are there
> > currently any mechanisms to do this?  Would you
> > consider implementing this in drbd?
> >
>
> I think it is a cluster matter, not a DRBD one.

Well, it certainly can be handled at the cluster level (and I plan on doing so), but why would drbd not want to store this extra important piece of information if possible?  Even if drbd does not use this info itself, why not store the fact that you are positive your peer is outdated (or that you are in split brain)?

-Martin
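The two asymmetric reboot cases discussed in the thread can be walked through with a toy two-node model.  Everything here (the class, its attributes, and method names) is invented for the example and does not reflect DRBD's internals:

```python
class Node:
    """Toy model of one cluster node; 'peer_outdated' stands in for
    the durable flag proposed in the thread (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.primary = False
        self.peer_outdated = False  # assumed to survive reboots

    def peer_lost(self):
        # Only a running primary that outlives its peer may
        # safely conclude that the peer's data is now outdated.
        if self.up and self.primary:
            self.peer_outdated = True

    def boot_alone(self):
        """Come up without the peer: promote only if we durably
        recorded the peer as outdated; otherwise wait for it."""
        self.up = True
        self.primary = self.peer_outdated
        return self.primary
```

For example: B fails while A is primary, A records the flag, A later reboots alone and may resume primary; B, booting alone without the flag, must wait rather than risk split brain.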