[DRBD-user] Fencing DRBD on Poweroff of Primary

Lars Ellenberg lars.ellenberg at linbit.com
Mon Apr 10 14:42:38 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Mon, Apr 10, 2017 at 12:47:01PM +1000, Igor Cicimov wrote:
> On 10 Apr 2017 7:42 am, "Marco Certelli" <marco_certelli at yahoo.it> wrote:
> 
> Hello. Thanks for the answer.
> Maybe I was not clear: I do not want the automatic poweroff of the server.
> 
> 
> Why do you have a problem with this? The server is already powering off, right?
> 
> My problem is that if I manually poweroff the primary node (i.e. the server
> with DRBD primary mounted on), the secondary does not become primary
> (promote) anymore!


Still, Digimer is right: usually you want "fencing resource-and-stonith;".

Which, btw, does not itself cause stonith,
but tells DRBD to *expect* stonith (node-level fencing) to happen,
and to freeze IO if the replication link goes down unexpectedly.

On a "clean" poweroff (shutdown, reboot),
you also want to *first* bring down pacemaker,
which is then supposed to cleanly stop all resources,
in the "correct" order, and also cleanly stop DRBD.

*THEN* maybe stop the network,
and only then maybe stop whatever is still left.
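
For example, a minimal sketch of a clean manual poweroff of the current
Primary (assuming a pcs managed cluster; with crmsh, "crm cluster stop",
or plain "systemctl stop pacemaker corosync", should do the same):

    pcs cluster stop   # stops pacemaker (and corosync) on this node;
                       # pacemaker first demotes/stops DRBD and the other resources
    poweroff           # only then shut the node down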

I suspect your shutdown process stops several things in parallel,
and for some reason manages to kill the network before DRBD is demoted.
Or something like that.
Logs from kernel DRBD and pacemaker should tell.
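
Something along these lines should show what actually happened during the
shutdown (assuming systemd with a persistent journal; otherwise check
/var/log/messages or wherever your syslog ends up):

    journalctl -b -1 -u pacemaker        # pacemaker during the previous boot
    journalctl -b -1 -k | grep -i drbd   # kernel DRBD messages from the previous boot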

> From the docs:
> Thus, if the DRBD replication link becomes disconnected, the
> crm-fence-peer.sh script contacts the cluster manager, determines the
> Pacemaker Master/Slave resource associated with this DRBD resource, and
> ensures that the Master/Slave resource no longer gets promoted on any node
> other than the currently active one. Conversely, when the connection is
> re-established and DRBD completes its synchronization process, then that
> constraint is removed and the cluster manager is free to promote the
> resource on any node again.
> 
> It seems that the primary, just before powering off, fences the other node
> and prevents it from becoming primary.

That:
> > before-resync-target "/usr/lib/drbd/crm-fence-peer.sh";
does not make much sense to me.

It should be:
> >        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";

and this:
> >        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";

with a sufficiently recent DRBD (>= 8.4.7),
that should be "unfence-peer", no longer "after-resync-target".
For reasons... but the effect is basically the same,
and "after-resync-target ... unfence ..." remains valid,
if you prefer. For your issue, it should not make much difference.
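
Put together, the fencing related bits would look something like this for
DRBD 8.4 (resource name "r0" is only an example, the rest of the resource
config is left out):

    resource r0 {
      disk {
        fencing resource-and-stonith;
      }
      handlers {
        fence-peer   "/usr/lib/drbd/crm-fence-peer.sh";
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.sh";  # with >= 8.4.7
        # on older 8.4: after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # ... volumes, on <host> sections, etc. ...
    }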

Maybe you also have a leftover, stale fencing constraint around
from previous testing?

check with "crm configure show" or equivalent,
or "crm_mon -1rfnAL" or cibadmin or whatever you like.

> > It happens that if I poweroff the Active server (the one with DRBD
> > Primary mounted on), the backup cannot promote and mount the DRBD
> > anymore. This is not what I would like to happen and this problem does
> > not occur if I remove the above fencing configuration (fencing,
> > fence-peer and after-resync-target commands).
> >
> > My only objective is to prevent promoting of a disk that is under
> > resynch. Is there a solution?

Your objective should also include preventing the promotion of a disk that
may contain stale data. That is what DRBD fencing policies and handlers
are about.

A disk that is the target of a resync is "Inconsistent", and as such would
by default receive only a reduced "master score" anyway.
You can change the "master scores" for various situations
with the DRBD resource agent parameter "adjust_master_score":

| Space separated list of four master score adjustments for different scenarios:
|  - only access to 'consistent' data
|  - only remote access to 'uptodate' data
|  - currently Secondary, local access to 'uptodate' data, but remote is unknown
|  - local access to 'uptodate' data, and currently Primary or remote is known
| 
| Numeric values are expected to be non-decreasing.
| 
| The first value is 0 by default to prevent pacemaker from trying to promote
| while it is unclear whether the data is really the most recent copy.
| (DRBD knows it is "consistent", but is unsure about "uptodate"ness).
| Please configure proper fencing methods both in DRBD
| (fencing resource-and-stonith; appropriate (un)fence-peer handlers)
| AND in Pacemaker to make this work reliably.
| 
| Advanced use: Adjust the other values to better fit into complex
| dependency score calculations.
| 
| Intentionally diskless nodes ("Diskless Clients") with access to good data via
| some (or all) their peers will use the 3rd or 4th value (minus one) when they
| are (Secondary, not all peers up-to-date) or (ALL peers are up-to-date, or they
| are Primary themselves). This may need to change if this should become a
| frequent use case.

Defaults used to be "5 10 1000 10000", and are now "0 10 1000 10000".
If you want to allow promotion only with "good" local data,
or want to experiment with this setting, try "0 0 1 100" or something,
and see where it gets you.
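
In crmsh syntax, setting that on the resource agent would look something like
this (primitive and ms names are made up, adjust to your configuration):

    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource=r0 adjust_master_score="0 0 1 100" \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
    ms ms_drbd_r0 p_drbd_r0 \
        meta master-max=1 clone-max=2 notify=true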

Maybe learn crm_simulate, do some direct cibadmin/crm_attribute manipulation
between crm_simulate steps, and see whether you achieve whatever your goal is.
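
For example (standard pacemaker command line tools; paths are examples):

    crm_simulate -sL                       # allocation/promotion scores from the live CIB
    cibadmin --query > /tmp/cib-test.xml
    # edit the copy, or point the tools at it with CIB_file=/tmp/cib-test.xml, then:
    crm_simulate -s --xml-file /tmp/cib-test.xml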


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed


