[DRBD-user] DRBD fencing prevents resource promotion in active/passive cluster

Digimer lists at alteeve.ca
Tue Sep 20 13:58:40 CEST 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 20/09/16 07:07 AM, Auer, Jens wrote:
> Hi,
> 
> I am using a drbd device in an active/passive cluster setup with pacemaker. We have dedicated connections for corosync heartbeats, drbd and a 10GB data connection:
> - A bonded 10GB network interface for data traffic, accessed via a virtual IP managed by pacemaker in 192.168.120.0/24. The cluster nodes MDA1PFP-S01 and MDA1PFP-S02 are assigned 192.168.120.10 and 192.168.120.11.
> 
> - A dedicated back-to-back connection for corosync heartbeats in 192.168.121.0/24. MDA1PFP-PCS01 and MDA1PFP-PCS02 are assigned 192.168.121.10 and 192.168.121.11. When the cluster is created, we use these as the primary node names and use the 10GB interface as a second, backup ring for increased reliability: pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
> 
> - A dedicated back-to-back connection for DRBD in 192.168.123.0/24. Hosts MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.123.10 and 192.168.123.11.
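For clarity: each link has its own pair of hostnames pointing at the corresponding interface. A purely illustrative /etc/hosts mapping consistent with the description above (the real setup may use DNS instead):

192.168.120.10  MDA1PFP-S01
192.168.120.11  MDA1PFP-S02
192.168.121.10  MDA1PFP-PCS01
192.168.121.11  MDA1PFP-PCS02
192.168.123.10  MDA1PFP-DRBD01
192.168.123.11  MDA1PFP-DRBD02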
> 
> In my tests, I force a failover by:
> 1. Shutting down the cluster node that runs the master with pcs cluster stop
> 2. Disabling the network device for the virtual IP with ifdown and waiting until ping detects it
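Concretely, the two tests look roughly like this (a sketch only; the node and interface names are taken from the configuration shown further down):

# test 1: stop the cluster stack on the node currently running the master
pcs cluster stop MDA1PFP-PCS02
# test 2: take down the data interface carrying the virtual IP, then wait for ping to notice
ifdown bond0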
> 
> The initial state of the cluster is
> MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
> Cluster name: MDA1PFP
> Last updated: Fri Sep 16 14:41:18 2016        Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
> Stack: corosync
> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
> 2 nodes and 7 resources configured
> 
> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> 
> Full list of resources:
> 
>  Master/Slave Set: drbd1_sync [drbd1]
>      Masters: [ MDA1PFP-PCS02 ]
>      Slaves: [ MDA1PFP-PCS01 ]
>  mda-ip    (ocf::heartbeat:IPaddr2):    Started MDA1PFP-PCS02
>  Clone Set: ping-clone [ping]
>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>  ACTIVE    (ocf::heartbeat:Dummy):    Started MDA1PFP-PCS02
>  shared_fs    (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02
> 
> PCSD Status:
>   MDA1PFP-PCS01: Online
>   MDA1PFP-PCS02: Online
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> 
> MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
>  Master: drbd1_sync
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=shared_fs 
>    Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>                promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>                demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>                stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>                monitor interval=60s (drbd1-monitor-interval-60s)
>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>               stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>               monitor interval=1s (mda-ip-monitor-interval-1s)
>  Clone: ping-clone
>   Resource: ping (class=ocf provider=pacemaker type=ping)
>    Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 attempts=3 
>    Operations: start interval=0s timeout=60 (ping-start-interval-0s)
>                stop interval=0s timeout=20 (ping-stop-interval-0s)
>                monitor interval=1 (ping-monitor-interval-1)
>  Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
>   Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
>               stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
>               monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>               stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>               monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
> 
> MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
> Location Constraints:
>   Resource: mda-ip
>     Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
>     Constraint: location-mda-ip
>       Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
>         Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
>         Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
> Ordering Constraints:
>   start ping-clone then start mda-ip (kind:Optional) (id:order-ping-clone-mda-ip-Optional)
>   promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
> Colocation Constraints:
>   ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
>   drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
>   shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
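For reference, constraints of this shape are usually created with pcs commands along the following lines; this is a reconstruction from the listing above, not the poster's exact command history:

pcs constraint order start ping-clone then start mda-ip kind=Optional
pcs constraint order promote drbd1_sync then start shared_fs
pcs constraint colocation add ACTIVE with mda-ip INFINITY
pcs constraint colocation add master drbd1_sync with mda-ip INFINITY
pcs constraint colocation add shared_fs with master drbd1_sync INFINITY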
> 
> The status after starting is:
> Last updated: Fri Sep 16 14:39:57 2016          Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
> Stack: corosync
> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
> 2 nodes and 7 resources configured
> 
> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> 
>  Master/Slave Set: drbd1_sync [drbd1]
>      Masters: [ MDA1PFP-PCS02 ]
>      Slaves: [ MDA1PFP-PCS01 ]
> mda-ip  (ocf::heartbeat:IPaddr2):    Started MDA1PFP-PCS02
>  Clone Set: ping-clone [ping]
>      Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> ACTIVE  (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
> shared_fs    (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02
> 
> The drbd is up as primary on MDA1PFP-PCS02 and mounted. Everything is fine.
> 
> If I perform either of the two tests to force a failover, the resources are moved to the other node, but DRBD is not promoted to master on the new active node:
> Last updated: Fri Sep 16 14:43:33 2016          Last change: Fri Sep 16 14:43:31 2016 by root via cibadmin on MDA1PFP-PCS01
> Stack: corosync
> Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
> 2 nodes and 7 resources configured
> 
> Online: [ MDA1PFP-PCS01 ]
> OFFLINE: [ MDA1PFP-PCS02 ]
> 
>  Master/Slave Set: drbd1_sync [drbd1]
>      Slaves: [ MDA1PFP-PCS01 ]
> mda-ip  (ocf::heartbeat:IPaddr2):    Started MDA1PFP-PCS01
>  Clone Set: ping-clone [ping]
>      Started: [ MDA1PFP-PCS01 ]
> ACTIVE  (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01
> 
> I was able to trace this to the fencing settings in the DRBD configuration:
> MDA1PFP-S01 14:41:44 1806 0 ~ # cat /etc/drbd.d/shared_fs.res
> resource shared_fs {
> disk    /dev/mapper/rhel_mdaf--pf--pep--1-drbd;
>   disk {
>     fencing resource-only;
>   }
>   handlers {
>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>   }
>     device    /dev/drbd1;
>     meta-disk internal;
>     on MDA1PFP-S01 {
>         address 192.168.123.10:7789;
>     }
>     on MDA1PFP-S02 {
>         address 192.168.123.11:7789;
>     }
> }
> 
> I am using DRBD 8.4.7, drbd-utils 8.9.5, and pacemaker 1.1.13-10.el7 with corosync 2.3.4-7.el7 and pcs 0.9.143-15.el7 from the CentOS 7 repositories.
> 
> MDA1PFP-S01 15:00:20 1841 0 ~ # drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ 5d50d9fb2a967d21c0f5746370ccc066d3a67f7d\ build\ by\ mockbuild@\,\ 2016-01-12\ 12:46:45
> DRBDADM_API_VERSION=1
> DRBD_KERNEL_VERSION_CODE=0x080407
> DRBDADM_VERSION_CODE=0x080905
> DRBDADM_VERSION=8.9.5
> 
> If I disable the fencing scripts, everything works as expected. If they are enabled, no node is promoted to master after a failover. The effect seems to be sticky: once a failover has been simulated with the fencing scripts active, I cannot get the cluster to work any more, and even removing the setting from the DRBD configuration does not help.
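For reference, crm-fence-peer.sh works by adding a location constraint (id usually starting with drbd-fence-by-handler-) that forbids the Master role, with score -INFINITY, everywhere except on the node that still had up-to-date data when the peer was lost. That constraint lives in the CIB, not in the DRBD configuration, and is normally removed only by crm-unfence-peer.sh after a successful resync, which is why the effect outlives edits to /etc/drbd.d. A sketch of how to inspect it and, once both sides are UpToDate again, clear it by hand (the exact constraint id will differ):

# show all constraints, including any left behind by the fence handler
pcs constraint --full | grep drbd-fence-by-handler
# remove a stale handler constraint using the id printed above, for example:
pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync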
> 
> I discussed this on the Pacemaker mailing list, and from a Pacemaker point of view this should not happen. The nodes are still online, so no fencing should occur; is DRBD fencing the wrong node?
> 
> Best wishes,
>   Jens Auer

Don't disable fencing!

You need to configure and test stonith in pacemaker. Once that's
working, you set DRBD's fencing policy to 'resource-and-stonith;' and
configure the 'crm-{un,}fence-peer.sh' un/fence handlers.
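A minimal sketch of what that can look like; the fence agent, addresses
and credentials below are placeholders, and parameter names vary
between fence agents and versions:

# one stonith device per node, IPMI used here only as an example
pcs stonith create fence-s01 fence_ipmilan pcmk_host_list="MDA1PFP-PCS01" \
    ipaddr="192.0.2.1" login="admin" passwd="secret" lanplus=1 \
    op monitor interval=60s
pcs stonith create fence-s02 fence_ipmilan pcmk_host_list="MDA1PFP-PCS02" \
    ipaddr="192.0.2.2" login="admin" passwd="secret" lanplus=1 \
    op monitor interval=60s
pcs property set stonith-enabled=true

and in the DRBD resource, keep the handlers you already have; only the
fencing policy changes:

  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }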

With this, if a node fails (and no, redundant network links are not
enough; nodes can die in many other ways), DRBD will block when the
peer is lost, call the fence handler, and wait for pacemaker to report
back that the fence action was completed. This way, you will never get
a split-brain and you will get reliable recovery.
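
Once stonith is configured, test it end to end before relying on it,
for example (the node name is just an example; crm_mon -1 prints the
cluster status once):

# fence the current master and watch the survivor take over
pcs stonith fence MDA1PFP-PCS02
crm_mon -1   # drbd1_sync should be promoted and shared_fs mounted on the survivor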

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


