[DRBD-user] Impossible to get primary node.
Rob Kramer
rob at solution-space.com
Wed Oct 9 09:19:44 CEST 2019
Thanks a lot for your suggestions (Robert and Lars), it took a while
before I was able to try them on virtual machines. I hope you don't mind
that I reply to both of you in one mail -- I messed up my mail delivery
options (now corrected).
I've added my latest drbd config below, for reference.
>> I can't find any sequence of commands that can convince drbd (or pacemaker) that I *want* to use outdated data.
> This should work:
> drbdadm del-peer tapas:fims1
> drbdadm primary --force tapas
This seems to work briefly (the resource gets to UpToDate), until the next
time the Pacemaker DRBD monitor runs, which 'demotes' the resource back to
its original state.
Failed Resource Actions:
* drbd_monitor_20000 on vmnbiaas2 'master' (8): call=84,
status=complete, exitreason='',
last-rc-change='Wed Oct 9 14:49:06 2019', queued=0ms, exec=0ms
The corosync logs are difficult to follow, so I'm not sure how to get
pacemaker to accept the trickery done behind its back.
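For the archives, here is the sequence I would expect to work (a sketch, untested in anger; it assumes the pcs CLI is available, and that the Pacemaker resource id is 'drbd', guessed from the drbd_monitor_20000 action above):

```shell
# Put Pacemaker in maintenance mode so it stops managing (and demoting)
# resources while DRBD is fixed up by hand.
pcs property set maintenance-mode=true

# Force the sole survivor to primary, as suggested.
drbdadm del-peer tapas:fims1
drbdadm primary --force tapas

# Clear the stale failed-monitor state, then hand control back to Pacemaker.
pcs resource cleanup drbd
pcs property set maintenance-mode=false
```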
Lars wrote:
Alternatively, you could *add* a suitable fencing constraint to your sole survivor node, which should make the fencing succeed.
You could tell the crm-fence-peer.9.sh fencing handler that an --unreachable-peer-is-outdated.
(Manually. From a root shell. That switch is not effective from within the drbd configuration; for reasons).
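For reference, a hand-placed version of such a constraint might look like this (a sketch; 'ms_drbd' is a placeholder for the actual master/slave resource id, and pcs rule syntax is assumed):

```shell
# Ban the Master role on every node except the surviving one (vmnbiaas2),
# mimicking the drbd-fence-by-handler constraint the fence script would place.
pcs constraint location ms_drbd rule role=master score=-INFINITY '#uname' ne vmnbiaas2

# Remember to remove it again once the peer is back and resynced, e.g.:
#   pcs constraint remove <constraint-id>
```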
I tried the --unreachable-peer-is-outdated switch, after finding what the
command should look like in /var/log/messages:
DRBD_BACKING_DEV_0=/dev/mapper/centos-drbd DRBD_CONF=/etc/drbd.conf
DRBD_LL_DISK=/dev/mapper/centos-drbd DRBD_MINOR=0 DRBD_MINOR_0=0
DRBD_MY_ADDRESS=172.17.5.62 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=1
DRBD_NODE_ID_0=vmnbiaas1 DRBD_NODE_ID_1=vmnbiaas2
DRBD_PEER_ADDRESS=172.17.5.61 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=0
DRBD_RESOURCE=tapas DRBD_VOLUME=0 UP_TO_DATE_NODES=0x00000002
/usr/lib/drbd/crm-fence-peer.9.sh --unreachable-peer-is-outdated
This failed as follows:
Oct 9 14:29:42 vmnbiaas2 crm-fence-peer.9.sh[6153]: WARNING Found <cib
crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="48"
num_updates="23" admin_epoch="0" cib-last-written="Wed Oct 9 14:07:22
2019" update-origin="vmnbiaas1" update-client="cibadmin"
update-user="root" have-quorum="0" dc-uuid="1"
Oct 9 14:29:42 vmnbiaas2 crm-fence-peer.9.sh[6153]: WARNING I don't
have quorum; did not place the constraint!
OK, since I'm experimenting anyway, I quick-hacked the script to use
fail_if_no_quorum=false, after which the error changes to:
Oct 9 14:38:13 vmnbiaas2 crm-fence-peer.9.sh[7579]: WARNING some peer
is UNCLEAN, my disk is not UpToDate, did not place the constraint!
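For what it's worth: on a two-node cluster the surviving node can never hold more than half the votes, so the handler's quorum check is bound to fail whenever the peer is down. A cleaner route than hacking the script might be (a sketch, assuming corosync votequorum and the pcs CLI):

```shell
# Tell Pacemaker to keep running resources without quorum
# (only sane on a two-node cluster with working fencing).
pcs property set no-quorum-policy=ignore

# Or enable corosync's two-node special case in /etc/corosync/corosync.conf:
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1
#   }
# ...and restart corosync on both nodes.
```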
Cheers!
Rob
resource tapas {
    protocol C;

    startup {
        wfc-timeout 0;            ## Infinite!
        outdated-wfc-timeout 120;
        degr-wfc-timeout 120;     ## 2 minutes.
    }

    disk {
        on-io-error detach;
    }

    handlers {
        split-brain "/opt/sol/tapas/bin/split-brain-helper.sh";
        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }

    net {
        fencing resource-only;
        # after-sb-0pri discard-least-changes;
    }

    device /dev/drbd0;
    disk /dev/mapper/centos-drbd;
    meta-disk internal;

    on vmnbiaas1 {
        address 172.17.5.61:7789;
    }
    on vmnbiaas2 {
        address 172.17.5.62:7789;
    }
}