[DRBD-user] Impossible to get primary node.

Lars Ellenberg lars.ellenberg at linbit.com
Fri Sep 27 11:07:25 CEST 2019


On Thu, Sep 26, 2019 at 05:20:34PM +0800, Rob Kramer wrote:
> Hi all,
> 
> I'm using a dual-node pacemaker cluster with drbd9 on centos 7.7. DRBD is
> set up for 'resource-only' fencing, and the setup does not use STONITH. The
> issue is that if both nodes are stopped in sequence, then there is no way to
> start the cluster with only the node that was powered down first, because
> DRBD considers the data outdated.
> 
> I understand that using outdated data should be prevented, but in my case
> outdated data is better than no system at all (in case the other node is
> completely dead). Any drbd command to force the outdated node to be primary
> fails:
> 
>   [*fims2] ~> drbdadm primary tapas --force
>   tapas: State change failed: (-7) Refusing to be Primary while peer is not
> outdated
>   Command 'drbdsetup primary tapas --force' terminated with exit code 11
> 
> I can't find any sequence of commands that can convince drbd (or pacemaker)
> that I *want* to use outdated data. If I remove the 'fencing resource-only'
> entry from the drbd config, then I can do a sequence of commands that makes
> the primary --force work (basically: set the cluster in maintenance, down and
> up drbd, primary --force). I've made sure that stray fencing constraints are
> removed from the cluster cib as well.
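That sequence might look like the following sketch (resource name "tapas" as
elsewhere in this thread; cluster commands assume pcs on CentOS 7 and are
illustrative only):

```shell
# Sketch only -- works after removing 'fencing resource-only;' from the config.
pcs property set maintenance-mode=true    # cluster stops managing resources
drbdadm down tapas
drbdadm up tapas
drbdadm primary tapas --force
pcs property set maintenance-mode=false
```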
> 
> Surely there has to be some way to force drbd to listen to me, and stop
> trying to protect my data at the cost of having no system that is runnable
> at all?
> 
> This is the first system that we've rolled out that used drbd9; it's
> possible that the --force would work OK in 8.x.
> 
> I've included the drbd config below.


The force is needed to even attempt to go primary while outdated locally.
But it does not "force" us to consider the peer to be outdated.

So as long as you have a fencing policy configured,
and DRBD cannot confirm (or be told, truthfully or otherwise) that the peer
won't go primary, you will run into this.

If you "don't care" (or no longer do, or not right now),
then set the fencing policy to dont-care.
You already found that workaround yourself.
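As a sketch (resource name from the quoted config below; adjust to your
section layout):

```
resource tapas {
    ...
    fencing dont-care;   # was: fencing resource-only;
    ...
}
```

After editing, `drbdadm adjust tapas` applies the change to the running
resource.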

Alternatively, 
you could *add* a suitable fencing constraint to your sole survivor
node, which should make the fencing succeed.
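For example with pcs (hypothetical names: ms_drbd_tapas for the master/slave
resource, fims2 for the surviving node; this mimics the kind of constraint
the fencing handler itself would create):

```shell
# Hypothetical names; confines the DRBD Master role to the surviving node:
pcs constraint location ms_drbd_tapas rule role=master score=-INFINITY "#uname" ne fims2
```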

You could tell the crm-fence-peer.9.sh fencing handler
that an unreachable peer is to be considered outdated,
using its --unreachable-peer-is-outdated switch.
(Manually. From a root shell.
That switch is not effective from within
the drbd configuration; for reasons.)
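A sketch of that manual invocation (DRBD handlers normally receive the
resource name via the DRBD_RESOURCE environment variable; resource name
taken from the example above):

```shell
# As root on the surviving node:
DRBD_RESOURCE=tapas /usr/lib/drbd/crm-fence-peer.9.sh --unreachable-peer-is-outdated
```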

>     fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";

Please use "unfence-peer", NOT after-resync-target.
That was from the times when there was no unfence-peer handler,
and we overloaded/abused the after-resync-target handler
for this purpose.
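So the handlers section would become:

```
handlers {
    fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
    unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
}
```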

>     fencing resource-only;
> 
>     after-sb-0pri       discard-least-changes;

You are automating data loss.
That is your choice, but please be aware of it,
and don't complain later.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
