[DRBD-user] Problem with drbd and replication network
Marco Marino
marino.mrc at gmail.com
Mon Apr 23 17:39:44 CEST 2018
Hello, I've configured a DRBD/Pacemaker cluster with 2 nodes and I'm doing
some failover tests. My cluster is quite simple: I have a DRBD resource and
its master/slave clone configured in Pacemaker:
[root@pcmk2 ~]# pcs resource show DrbdRes
 Resource: DrbdRes (class=ocf provider=linbit type=drbd)
  Attributes: drbd_resource=myres
  Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
              monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
              monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
              promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
              start interval=0s timeout=240 (DrbdRes-start-interval-0s)
              stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)
[root@pcmk2 ~]# pcs resource show DrbdResClone
 Master: DrbdResClone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: DrbdRes (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=myres
   Operations: demote interval=0s timeout=90 (DrbdRes-demote-interval-0s)
               monitor interval=29s role=Master (DrbdRes-monitor-interval-29s)
               monitor interval=31s role=Slave (DrbdRes-monitor-interval-31s)
               promote interval=0s timeout=90 (DrbdRes-promote-interval-0s)
               start interval=0s timeout=240 (DrbdRes-start-interval-0s)
               stop interval=0s timeout=100 (DrbdRes-stop-interval-0s)
[root@pcmk2 ~]#
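(For reference, resources like these can be created with pcs roughly as
follows. This is only a sketch reconstructed from the output above: the exact
syntax depends on the pcs version, and "drbd_cfg" is just a scratch CIB file
name, not something from my setup.)

  pcs cluster cib drbd_cfg
  pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd drbd_resource=myres \
      op monitor interval=29s role=Master \
      op monitor interval=31s role=Slave
  pcs -f drbd_cfg resource master DrbdResClone DrbdRes \
      master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  pcs cluster cib-push drbd_cfg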
Furthermore, in /etc/drbd.d/myres.res I have:
disk {
    fencing resource-only;
}
handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
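(For context, those disk and handlers sections sit inside the resource
definition. A minimal file along these lines should give the idea; the
devices, backing disks and replication-network addresses below are
placeholders, not my real values:)

  resource myres {
      protocol C;
      disk {
          fencing resource-only;
      }
      handlers {
          fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # Placeholder backing devices and replication-network addresses.
      on pcmk1 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.100.1:7788;
          meta-disk internal;
      }
      on pcmk2 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.100.2:7788;
          meta-disk internal;
      }
  }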
So, I'm testing various cases for stonith / failover and high availability
in general:
1) pcs cluster standby / unstandby, first on the secondary node and then on
the primary node
2) stonith_admin --reboot=pcmk[12]
3) Shutting down one VM at a time, causing a failover of all resources and a
resync after the node comes back up
4) nmcli connection down corosync-network
5) nmcli connection down replication-network
All tests passed except the last one. Please note that I have two separate
networks on each node: one for corosync and another for DRBD replication.
When I simulate a failure of the replication network, I see the following
resource states:
On the secondary:
0:myres/0 WFConnection Secondary/Unknown UpToDate/DUnknown
On the primary:
0:myres/0 StandAlone Primary/Unknown UpToDate/Outdated
Is this normal? It seems that I have to run some manual commands to recover
the cluster:
drbdadm connect --discard-my-data myres   <--- on the secondary
drbdadm connect myres                     <--- on the primary
Is there an automated way to do this when the replication network comes back
up?
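(Is the intended mechanism here DRBD's automatic split-brain recovery
policies in the net section? Something like the sketch below, which I haven't
tested, and I'm not sure it even applies to this fencing/Outdated situation
rather than to a real split brain:)

  net {
      # Automatic recovery policies DRBD applies when it detects split brain.
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
  }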
Thank you