<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Hello, I&#39;ve configured a 2-node drbd/pacemaker cluster and I&#39;m running some failover tests. The cluster is quite simple. I have a drbd resource and its master/slave clone configured in pacemaker:<br>[root@pcmk2 ~]# pcs resource show DrbdRes<br> Resource: DrbdRes (class=ocf provider=linbit type=drbd)<br>  Attributes: drbd_resource=myres<br>  Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)<br>              monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)<br>              monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)<br>              promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)<br>              start interval=0s timeout=240 (DrbdRes-start-interval-0s)<br>              stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)<br>[root@pcmk2 ~]# pcs resource show DrbdResClone<br> Master: DrbdResClone<br>  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1 <br>  Resource: DrbdRes (class=ocf provider=linbit type=drbd)<br>   Attributes: drbd_resource=myres<br>   Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)<br>               monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)<br>               monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)<br>               promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)<br>               start interval=0s timeout=240 (DrbdRes-start-interval-0s)<br>               stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)<br>[root@pcmk2 ~]#<br><br><br></div>Furthermore, in /etc/drbd.d/myres.res I have:<br>disk {<br>                fencing resource-only;<br>        }<br>        handlers {<br>                fence-peer &quot;/usr/lib/drbd/crm-fence-peer.sh&quot;;<br>                after-resync-target &quot;/usr/lib/drbd/crm-unfence-peer.sh&quot;;<br>        }<br><br></div>So, I&#39;m testing 
various cases for stonith / failover and high availability in general:<br></div>1) pcs cluster standby / unstandby, first on the secondary node, then on the primary node<br></div>2) stonith_admin --reboot=pcmk[12]<br></div>3) Shut down one VM at a time, causing a failover of all resources and a resync after the node comes back up<br></div>4) nmcli connection down corosync-network<br></div>5) nmcli connection down replication-network<br><br></div>All tests passed except the last one. Please note that I have 2 separate networks on each node: one for corosync and another for drbd replication. When I simulate a failure of the replication network, the resources look like this:<br><br></div>On the secondary:<br> 0:myres/0     WFConnection Secondary/Unknown UpToDate/DUnknown<br><br></div>On the primary:<br> 0:myres/0     StandAlone Primary/Unknown UpToDate/Outdated<br><br></div>Is this normal? It seems that I have to run some commands manually to recover the cluster:<br></div>drbdadm connect --discard-my-data myres &lt;--- on the secondary<br></div>drbdadm connect myres &lt;--- on the primary<br><br></div>Is there an automated way to do this when the replication network comes back up?<br><br></div>Thank you<br><div><div><div><br><div><div><br><div><div><div><br><div><div><div><div><div><div><br><br><div><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
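For reference, the two manual recovery commands above can be wrapped in a small helper. This is only a sketch of what I run by hand, not an automated recovery: the function names reconnect_cmd and reconnect and the RESOURCE and DRY_RUN variables are made up for illustration, and only the two drbdadm invocations come from the commands listed above. Note that --discard-my-data throws away local changes on the node where it is run, so the helper defaults to a dry run that just prints the command.

```shell
#!/bin/sh
# Sketch only: wraps the two manual reconnect commands from the post.
# RESOURCE, DRY_RUN, reconnect_cmd and reconnect are hypothetical names.

RESOURCE="${RESOURCE:-myres}"   # drbd resource name used in the post
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the command, 0 = execute it

# Print the drbdadm command for the local role ("secondary" or "primary").
reconnect_cmd() {
    case "$1" in
        secondary) echo "drbdadm connect --discard-my-data $RESOURCE" ;;
        primary)   echo "drbdadm connect $RESOURCE" ;;
        *)         echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print the command in dry-run mode, otherwise run it.
reconnect() {
    cmd=$(reconnect_cmd "$1") || return 1
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $cmd"
    else
        $cmd
    fi
}
```

After sourcing the file, "reconnect secondary" on the secondary and then "reconnect primary" on the primary print the commands; setting DRY_RUN=0 executes them.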