[DRBD-user] Problem with drbd and replication network
Marco Marino
marino.mrc at gmail.com
Mon Apr 23 17:39:44 CEST 2018
Hello, I've configured a DRBD/Pacemaker cluster with 2 nodes and I'm doing
some failover tests. My cluster is quite simple: I have a DRBD resource and
its master/slave clone configured in Pacemaker:
[root@pcmk2 ~]# pcs resource show DrbdRes
 Resource: DrbdRes (class=ocf provider=linbit type=drbd)
  Attributes: drbd_resource=myres
  Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
              monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
              monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
              promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
              start interval=0s timeout=240 (DrbdRes-start-interval-0s)
              stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)
[root@pcmk2 ~]# pcs resource show DrbdResClone
 Master: DrbdResClone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: DrbdRes (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=myres
   Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
               monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
               monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
               promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
               start interval=0s timeout=240 (DrbdRes-start-interval-0s)
               stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)
[root@pcmk2 ~]#
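(For reference, resources like these can be created with pcs roughly as
follows. This is only a sketch reconstructed from the output above: the exact
syntax depends on the pcs version, and "drbd_cfg" is just a scratch CIB file
name, not something from my setup.)

  pcs cluster cib drbd_cfg
  pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd drbd_resource=myres \
      op monitor interval=29s role=Master \
      op monitor interval=31s role=Slave
  pcs -f drbd_cfg resource master DrbdResClone DrbdRes \
      master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  pcs cluster cib-push drbd_cfg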
Furthermore, in /etc/drbd.d/myres.res I have:
disk {
    fencing resource-only;
}
handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
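(For context, those disk and handlers sections sit inside the resource
definition. A minimal file along these lines should give the idea; the
devices, backing disks and replication-network addresses below are
placeholders, not my real values:)

  resource myres {
      protocol C;
      disk {
          fencing resource-only;
      }
      handlers {
          fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # Placeholder backing devices and replication-network addresses.
      on pcmk1 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.100.1:7788;
          meta-disk internal;
      }
      on pcmk2 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.100.2:7788;
          meta-disk internal;
      }
  }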
So, I'm testing various cases for stonith / failover and high availability
in general:
1) pcs cluster standby / unstandby, first on the secondary node and then on
the primary node
2) stonith_admin --reboot=pcmk[12]
3) Shutting down one VM at a time, causing a failover of all resources and a
resync after the node comes back up
4) nmcli connection down corosync-network
5) nmcli connection down replication-network
All tests passed except the last one. Please note that I have two separate
networks on each node: one for corosync and another for DRBD replication.
When I simulate a failure of the replication network, I see the following
resource states:
On the secondary:
0:myres/0 WFConnection Secondary/Unknown UpToDate/DUnknown
On the primary:
0:myres/0 StandAlone Primary/Unknown UpToDate/Outdated
Is this normal? It seems that I have to run some manual commands to recover
the cluster:
drbdadm connect --discard-my-data myres   <--- on the secondary
drbdadm connect myres                     <--- on the primary
Is there an automated way to do this when the replication network comes back
up?
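(Is the intended mechanism here DRBD's automatic split-brain recovery
policies in the net section? Something like the sketch below, which I haven't
tested, and I'm not sure it even applies to this fencing/Outdated situation
rather than to a real split brain:)

  net {
      # Automatic recovery policies DRBD applies when it detects split brain.
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
  }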
Thank you