[DRBD-user] Help with drbddisk modification to block takeover when the local resource is not in a safe state

Mon Sep 6 21:37:29 CEST 2010

On Mon, Sep 06, 2010 at 10:02:51PM +0800, jan gestre wrote:
> On Mon, Sep 6, 2010 at 8:48 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> > On Mon, Sep 06, 2010 at 08:34:40PM +0800, jan gestre wrote:
> >> Hi Everyone,
> >>
> >> I've found this drbddisk modification that will block takeover when
> >> the local resource is not in a safe state, however it only works if
> >> you only have one resource, but since I have two resources namely r0
> >> and r1, it would not work.
> >>
> >> case "$CMD" in
> >>    start)
> >>      # forbid to become primary if resource is not clean
> >>      DRBDSTATEOK=`cat /proc/drbd | grep ' cs:Connected ' | grep ' ds:UpToDate/' | wc -l`
> >>      if [ $DRBDSTATEOK -ne 1 ]; then
> >>        echo >&2 "drbd is not in Connected/UpToDate state. refusing to start resource"
> >>        exit 1
> >>      fi
> >>
> >> I would be truly grateful if anyone could care to show how to effect
> >> said modification.
> >>
> >> I'm trying to prevent a Split Brain scenario here, and I'm still
> >> testing my setup; I was in a predicament earlier wherein one of the
> >> resources, r1, was in a healthy state while r0 was in StandAlone
> >> Primary/Unknown state. I had to issue
> >> drbdadm -- --discard-my-data connect r0
> >> to resolve the split brain.
> >
> > No Sir.
> >
> > What if the Primary dies? Hard?
> > You now want your Secondary to take over, no?
> > Well, you cannot anymore. Because it is not Connected.
> > How could it, you just lost the peer ;-)
> >
> > Don't focus only on one specific scenario.
> > Because, if you just "fix" that specific scenario,
> > you break a truckload of others.
> >
> > Maybe it helps a bit to read
> > http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg04312.html
> >
> 
> Thanks Lars, but now I am confused, maybe you can enlighten me, you're
> saying that I would be better off without modifying, what then would
> you recommend to prevent Split Brain? Add a stonith device, e.g. IBM
> RSA? Add handlers like dopd?
> 
> BTW, I got the modification from this url -->
> http://lemonnier.se/erwan/blog/item/53/

Which is misleading.
And, it is not an attempt to avoid split brain,
but to avoid diverging data sets,
one of the ill consequences a split brain can lead to. 

What the presented patch does is disable takeover in case the Primary node dies.
So why then have heartbeat, in the first place?
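
To make that concrete, here is a minimal sketch that runs the blog's check
against two sample status lines instead of the real /proc/drbd. The helper
name blog_check and both sample lines are made up for illustration; the line
format follows what DRBD 8.3 prints, and may differ on other versions.

```shell
#!/bin/sh
# Hypothetical helper: the blog's check, but reading the status text from
# stdin instead of /proc/drbd, so it can be fed sample states.
blog_check() {
    # Count device lines that are both Connected and locally UpToDate.
    n=$(grep ' cs:Connected ' | grep -c ' ds:UpToDate/')
    [ "$n" -eq 1 ]
}

# Sample status lines (made-up values, DRBD 8.3 style).
healthy=' 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----'
peer_dead=' 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----'

for state in "$healthy" "$peer_dead"; do
    if printf '%s\n' "$state" | blog_check; then
        echo "takeover allowed"
    else
        echo "takeover refused"
    fi
done
```

The first sample passes. The second one, which is what the surviving
Secondary sees after the Primary died, is refused, although its local data
is UpToDate and it is the only node left that could take over.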

I'll partially quote that blog:

| Let's take an example: two nodes, N0 and N1. N0 is primary, N1 is secondary.
| Both have redundant heartbeat links and at least one dedicated drbd
| replication link. Let's consider the (highly) hypothetical case when the drbd
| link goes down, soon followed by a power outage for N0. What will happen in a
| standard heartbeat/drbd setup is that when the drbd link goes down, the drbd
| daemon will set the local ressources on both nodes in state 'cs:WFConnection'
| (Waiting For Connection) and mark the peer data as outdated.

So far that is correct, where "the drbd daemon" would be dopd. Or, in a
Pacemaker cluster, you could use the crm-fence-peer script to achieve
a similar effect.
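
For the Pacemaker variant, a drbd.conf fragment along these lines is what
hooks the script in. This is a sketch assuming DRBD 8.3 syntax; the resource
name r0 is taken from this thread, and the script paths are where packages
commonly install them, so check your own installation.

```
resource r0 {
  disk {
    # on loss of the replication link, call the fence-peer handler
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... rest of the resource definition unchanged
}
```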

| Then when N0
| disappears due to the power outage, heartbeat on N1 will takeover ressources
| and become the primary node.

Which is wrong.

First, drbd will refuse to be promoted if it is outdated.
So the outdating apparently did not work in the above setup.
Fix it.

| What we may want is to forbid a node to become primary in case its drbd
| resources are not in a connected and up-to-date state.

Which you already have: if it is Outdated, it cannot be promoted.
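
DRBD enforces this by itself; nothing needs to be scripted. Purely to
illustrate the rule, here is a small sketch that looks at a "local/peer" disk
state string, in the format that "drbdadm dstate <resource>" prints. The
function name refuses_promotion is hypothetical, and the sketch only models
the Outdated case discussed here, not every state DRBD knows about.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the LOCAL disk state (the part before
# the slash) is Outdated, which is the case where DRBD refuses promotion.
refuses_promotion() {
    case "${1%%/*}" in
        Outdated) return 0 ;;
        *)        return 1 ;;
    esac
}

# Two made-up states of a node that has lost its peer.
for ds in "UpToDate/DUnknown" "Outdated/DUnknown"; do
    if refuses_promotion "$ds"; then
        echo "$ds: promotion refused, local data is outdated"
    else
        echo "$ds: not blocked by outdating"
    fi
done
```

On a real node the argument could come from "$(drbdadm dstate r0)". Note that
an UpToDate node that merely lost its peer is not blocked, which is exactly
the case where you want takeover to work.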

Second, in a properly configured Pacemaker setup,
Pacemaker (resp. the drbd OCF resource agent) would already know,
and not even try to promote it on the outdated node.

Besides, it should be a very unlikely event that a just rebooted, isolated node
decides to take over resources.

Maybe you should increase your initdead time.

Or wait for connection before even starting heartbeat/pacemaker.
In haresources-mode heartbeat clusters using drbddisk, the drbd
wfc-timeout parameter is used for this, and the default for it is "unlimited",
so by default, the drbd init script would in most cases wait forever for drbd
to establish a connection to its peer, thereby blocking the bootup process on
purpose. Heartbeat would only start once DRBD was able to establish its
connection.
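
The init script does this waiting for you. Just to show the idea, here is a
rough sketch of "wait for the connection, then start the cluster manager".
The names get_cstate and wait_for_connection are hypothetical, and
get_cstate is stubbed so the sketch runs without DRBD; on a real node it
could wrap "drbdadm cstate". It also treats only the plain Connected state
as good, which is a simplification.

```shell
#!/bin/sh
# Hypothetical hook. On a real node this could be:
#   get_cstate() { drbdadm cstate "$1"; }
# Stubbed here so the sketch runs anywhere.
get_cstate() { echo "Connected"; }

# Poll once per second until the resource ($1) reports Connected,
# or give up after $2 attempts.
wait_for_connection() {
    res=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ "$(get_cstate "$res")" = "Connected" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

if wait_for_connection r0 5; then
    echo "r0 connected, safe to start the cluster manager"
else
    echo "r0 still not connected, leaving the cluster manager stopped"
fi
```

Unlike the drbddisk patch, this only delays startup after boot. It does not
get in the way of a takeover on a node that is already up and running.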

Additionally maybe add a third node, so you have real quorum?
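
The arithmetic behind "real quorum", as a tiny sketch (has_quorum is a
made-up name, and real cluster managers have more knobs than this): a
partition has quorum only if it sees strictly more than half of all nodes.

```shell
#!/bin/sh
# Hypothetical helper: $1 = nodes this partition can see (itself included),
# $2 = total number of nodes in the cluster.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# An isolated node in a two-node cluster, an isolated node in a three-node
# cluster, and the two remaining nodes of a three-node cluster.
for pair in "1 2" "1 3" "2 3"; do
    set -- $pair
    if has_quorum "$1" "$2"; then
        echo "$1 of $2 nodes: quorum"
    else
        echo "$1 of $2 nodes: no quorum"
    fi
done
```

With two nodes, a single survivor never has a majority, so quorum cannot
tell "peer is dead" from "I am isolated". With three nodes, the two that
still see each other can.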

But it depends on you, and what you want to achieve, of course.
There is no one single best way.

The pacemaker list post about whether or not a DRBD setup needs STONITH
(I put the link here again)
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg04312.html
explains why with DRBD (and basically any not shared, but replicated resource)
STONITH alone is NOT SUFFICIENT at all, and resource level fencing alone
is not sufficient if you lose all communication paths at the same time
(or in too quick succession for the chosen outdate mechanism to work).
So if you are paranoid enough, you need both, and real quorum,
and maybe on boot start nothing but sshd.

And even then, I'm sure someone can come up with a multiple failure scenario,
possibly involving operator failure, to still get diverging data sets ;-)

And, btw, this part does not make sense to me either:

| if you are using a stonith device, you may want to modify the stonith script
| to forbid stonithing the peer if the local resources are not in
| connected/up-to-date state. There might indeed be a chance that the peer node
| still is functional while the local node definitely is not.

You need STONITH to make sure that a node that you think is dead (i.e. you can
no longer communicate with it, but you still have the doubt that it may only be
the communication that is broken, not the node) really is dead.
Now you forbid the STONITH operation in case DRBD is not connected,
i.e. not communicating with its peer.
Wait.
Wasn't communication failure the only reason you wanted to use STONITH in the
first place?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed