[DRBD-user] crm-fence-peer.sh and multiple communication channels

Tue Jan 17 16:39:00 CET 2017

Hi

I have a basic two-node active/passive cluster with Pacemaker (1.1.14, pcs 0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.
This cluster runs HA services on top of DRBD (8.4.4).

Basically, the system works on both nodes and I can switch the resources from one node to the other.
However, in the relevant test cases, every automatic call of crm-fence-peer.sh ends with exit code 5, i.e. a timeout after 60 seconds.

After some debugging, I am fairly sure the problem is that the script compares information read from the CIB against $HOSTNAME.
In our case no matching entry is found, so the script retries over and over until the timeout expires.
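To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual crm-fence-peer.sh code: the function name `wait_for_own_node`, the node names and the polling interval are hypothetical; only the 60-second timeout and exit code 5 come from what we observe.

```python
import time

# Exit code we observe when crm-fence-peer.sh gives up (timeout case).
EXIT_TIMEOUT = 5


def wait_for_own_node(cib_node_names, hostname, timeout=60.0, interval=1.0):
    """Poll until `hostname` appears among the CIB node names, or give up.

    `cib_node_names` is a callable that re-reads the node names each time,
    standing in for re-querying the CIB. Returns 0 on a match and
    EXIT_TIMEOUT once `timeout` seconds have elapsed without one.
    This is a model of the observed behaviour, not the script's real logic.
    """
    deadline = time.monotonic() + timeout
    while True:
        if hostname in cib_node_names():
            return 0
        if time.monotonic() >= deadline:
            return EXIT_TIMEOUT
        time.sleep(interval)


# Hypothetical names: the CIB knows the bond1 names, $HOSTNAME is the bond0 name.
names_in_cib = lambda: {"node1-cl", "node2-cl"}
print(wait_for_own_node(names_in_cib, "node1", timeout=0.3, interval=0.1))
```

Because the exact string comparison can never succeed, the loop only ends when the deadline passes, which matches the 60-second timeout we see.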

The reason no matching entries are found is our use of multiple communication channels:
-- bond0: standard interface matching $HOSTNAME; used to offer the clustered HA services via cluster IP addresses and to access the individual nodes for maintenance
-- bond1: interface reserved for cluster communication (Pacemaker, Corosync); matches the node names found in the CIB
-- bond2: interface reserved for DRBD replication; matches the node names found in the DRBD configuration
In other words, each node actually has three different names, one per interface.
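The naming layout can be sketched as a small Python data structure. The class `NodeNames`, the helper `mismatches` and the node names (`node1`, `node1-cl`, `node1-drbd`) are hypothetical placeholders, not our real configuration. The sketch only illustrates that an exact comparison against $HOSTNAME can match the bond0 name and nothing else.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NodeNames:
    public: str   # bond0: matches $HOSTNAME, carries the cluster IPs
    cluster: str  # bond1: Pacemaker/Corosync, the name recorded in the CIB
    drbd: str     # bond2: replication, the name used in the DRBD configuration


# Hypothetical names for one node of the two-node cluster.
NODE1 = NodeNames(public="node1", cluster="node1-cl", drbd="node1-drbd")


def mismatches(node: NodeNames, hostname: str) -> list:
    """Return the channels whose name differs from `hostname`, in field order."""
    return [channel for channel, name in asdict(node).items() if name != hostname]


print(mismatches(NODE1, "node1"))  # ['cluster', 'drbd']
```

With $HOSTNAME set to the bond0 name, both the CIB name and the DRBD name fail the comparison.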

I believe a setup with dedicated communication channels is common.
So I assume there is a best practice for dealing with this situation, and I am just missing a configuration item that tells the system which names to use.

Please advise ...

Kind regards
Andi