Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

I am using a DRBD device in an active/passive cluster setup with Pacemaker. We have dedicated links for the corosync heartbeats, DRBD, and a 10 GbE data connection:

- A bonded 10 GbE network card for data traffic, which is accessed via a virtual IP managed by Pacemaker in 192.168.120.0/24. The cluster nodes MDA1PFP-S01 and MDA1PFP-S02 are assigned 192.168.120.10 and 192.168.120.11.
- A dedicated back-to-back connection for the corosync heartbeats in 192.168.121.0/24. MDA1PFP-PCS01 and MDA1PFP-PCS02 are assigned 192.168.121.10 and 192.168.121.11. When the cluster is created, these are used as the primary node names, with the 10 GbE interface as a second, backup ring for increased reliability:

  pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02

- A dedicated back-to-back connection for DRBD in 192.168.123.0/24. Hosts MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.123.10 and 192.168.123.11.

In my tests, I force a failover by either

1. shutting down the cluster node that currently holds the master role with pcs cluster stop, or
2. disabling the network device of the virtual IP with ifdown and waiting until the ping resource detects it.

The initial state of the cluster is:

MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Fri Sep 16 14:41:18 2016
Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
 shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
 Master: drbd1_sync
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: drbd1 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=shared_fs
   Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
               promote interval=0s timeout=90 (drbd1-promote-interval-0s)
               demote interval=0s timeout=90 (drbd1-demote-interval-0s)
               stop interval=0s timeout=100 (drbd1-stop-interval-0s)
               monitor interval=60s (drbd1-monitor-interval-60s)
 Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
  Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
              stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
              monitor interval=1s (mda-ip-monitor-interval-1s)
 Clone: ping-clone
  Resource: ping (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 attempts=3
   Operations: start interval=0s timeout=60 (ping-start-interval-0s)
               stop interval=0s timeout=20 (ping-stop-interval-0s)
               monitor interval=1 (ping-monitor-interval-1)
 Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
              stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
              monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
 Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
  Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
              stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
              monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
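For reference, the resources above were created more or less like this (reconstructed from the pcs resource --full listing, so the exact commands I ran may have differed slightly):

  # DRBD resource plus its master/slave wrapper
  pcs resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op monitor interval=60s
  pcs resource master drbd1_sync drbd1 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  # virtual IP on the bonded 10 GbE interface and the filesystem on top of DRBD
  pcs resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=32 nic=bond0 op monitor interval=1s
  pcs resource create shared_fs ocf:heartbeat:Filesystem device=/dev/drbd1 directory=/shared_fs fstype=xfs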
The constraints are:

MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
Location Constraints:
  Resource: mda-ip
    Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
    Constraint: location-mda-ip
      Rule: score=-INFINITY boolean-op=or (id:location-mda-ip-rule)
        Expression: pingd lt 1 (id:location-mda-ip-rule-expr)
        Expression: not_defined pingd (id:location-mda-ip-rule-expr-1)
Ordering Constraints:
  start ping-clone then start mda-ip (kind:Optional) (id:order-ping-clone-mda-ip-Optional)
  promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
Colocation Constraints:
  ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
  drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
  shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)

The status after starting is:

Last updated: Fri Sep 16 14:39:57 2016
Last change: Fri Sep 16 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
 shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02

DRBD is up as primary on MDA1PFP-PCS02 and the filesystem is mounted; everything is fine. If I run either of the two failover tests, the resources are moved to the other node, but DRBD is not promoted to master on the new active node:

Last updated: Fri Sep 16 14:43:33 2016
Last change: Fri Sep 16 14:43:31 2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 ]
OFFLINE: [ MDA1PFP-PCS02 ]

 Master/Slave Set: drbd1_sync [drbd1]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS01
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 ]
 ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS01

I was able to trace this to the fencing settings in the DRBD configuration:

MDA1PFP-S01 14:41:44 1806 0 ~ # cat /etc/drbd.d/shared_fs.res
resource shared_fs {
    disk /dev/mapper/rhel_mdaf--pf--pep--1-drbd;
    disk {
        fencing resource-only;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    device /dev/drbd1;
    meta-disk internal;

    on MDA1PFP-S01 {
        address 192.168.123.10:7789;
    }
    on MDA1PFP-S02 {
        address 192.168.123.11:7789;
    }
}

I am using DRBD 8.4.7, drbd-utils 8.9.5, pacemaker 1.1.13-10.el7 with corosync 2.3.4-7.el7 and pcs 0.9.143-15.el7 from the CentOS 7 repositories.

MDA1PFP-S01 15:00:20 1841 0 ~ # drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 5d50d9fb2a967d21c0f5746370ccc066d3a67f7d\ build\ by\ mockbuild@\,\ 2016-01-12\ 12:46:45
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080407
DRBDADM_VERSION_CODE=0x080905
DRBDADM_VERSION=8.9.5

If I disable the fencing scripts, everything works as expected. With fencing enabled, no node is promoted to master after a failover. It also seems to be a sticky modification: once a failover has been simulated with the fencing scripts active, I cannot get the cluster to work any more; even removing the setting from the DRBD configuration does not help.
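If I understand crm-fence-peer.sh correctly, it blocks promotion by adding a temporary location constraint to the CIB, which would also explain why the problem survives removing the setting from the DRBD configuration. This is roughly how I would check for and clean up such a leftover constraint; the grep pattern and the id below are assumptions based on the drbd-fence-by-handler-* naming I expect the script to use, not output taken from my cluster:

  # look for a location constraint added by the DRBD fence handler
  pcs constraint --full | grep drbd-fence-by-handler
  # remove it manually if it was left behind (example id, assumed naming)
  pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync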
I discussed this on the Pacemaker mailing list, and from a Pacemaker point of view this should not happen: the nodes are still online, so no fencing should take place. Does DRBD fence the wrong node here?

Best wishes,
  Jens Auer

--
Dr. Jens Auer | CGI | Software Engineer
CGI Deutschland Ltd. & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer at cgi.com