[DRBD-user] HA DRBD with Pacemaker

Thu Apr 26 15:49:40 CEST 2018

Hello guys,
For two weeks I have been struggling with a DRBD and Pacemaker configuration for an HA NFS server.
I tried all the examples Google could show me, without success.
I have also read many threads on this mailing list and still did not end up with a working configuration.
The thread "secundary not finish synchronizing" is interesting, especially this quote:

"To be able to avoid DRBD data divergence due to cluster split-brain,
you'd need both. Stonith alone is not good enough, DRBD fencing
policies alone are not good enough. You need both."
but I am still not able to make it work.

Now that I have expressed my feelings about the product/s :) let me summarize my experience:
2 identical VMs with an LVM volume and a SINGLE NIC
DRBD 9.0.9
# rpm -qa|grep drbd
drbd90-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64

Pacemaker 1.1.16
# rpm -qa|grep pacemaker
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64

Corosync 2.4.0
# rpm -qa|grep corosync
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_6

DRBD resource on both nodes:
# cat /etc/drbd.d/r0.res
resource r0 {
net {
#        fencing resource-only;
        fencing resource-and-stonith;
}

handlers {
        fence-peer      "/usr/lib/drbd/crm-fence-peer.9.sh";
        after-resync-target     "/usr/lib/drbd/crm-unfence-peer.9.sh";
}
protocol C;
on nfs1 {
    device    /dev/drbd0;
    disk      /dev/mapper/vg_cdf-lv_cdf;
    address   10.200.50.21:7788;
    meta-disk internal;
  }
  on nfs2 {
    device    /dev/drbd0;
    disk      /dev/mapper/vg_cdf-lv_cdf;
    address   10.200.50.22:7788;
    meta-disk internal;
  }
}

Everything works up to this point: I mounted the volume on both nodes and could see the data being replicated.
The problem appears with Pacemaker on top: I was not able to configure it to have a Master and a Slave resource, only a Master and a stopped one.
Here are the Pacemaker configs:

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.200.50.20 cidr_netmask=24 op monitor interval=30s

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create Data ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f drbd_cfg resource master DataClone Data master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg constraint colocation add DataClone with ClusterIP INFINITY
pcs -f drbd_cfg constraint order ClusterIP then DataClone
pcs cluster cib-push drbd_cfg

pcs cluster cib fs_cfg
pcs -f fs_cfg resource create DataFS Filesystem device="/dev/drbd0" directory="/var/vols/itom" fstype="xfs"
pcs -f fs_cfg constraint colocation add DataFS with DataClone INFINITY with-rsc-role=Master
pcs -f fs_cfg constraint order promote DataClone then start DataFS
pcs cluster cib-push fs_cfg

pcs cluster cib nfs_cfg
pcs -f nfs_cfg resource create nfsd nfsserver nfs_shared_infodir=/var/vols/nfsinfo
pcs -f nfs_cfg resource create nfscore exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/core fsid=1999
pcs -f nfs_cfg resource create nfsdca exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/dca fsid=1999
pcs -f nfs_cfg resource create nfsnode1 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node1 fsid=1999
pcs -f nfs_cfg resource create nfsnode2 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node2 fsid=1999
pcs -f nfs_cfg constraint order DataFS then nfsd
pcs -f nfs_cfg constraint order nfsd then nfscore
pcs -f nfs_cfg constraint order nfsd then nfsdca
pcs -f nfs_cfg constraint order nfsd then nfsnode1
pcs -f nfs_cfg constraint order nfsd then nfsnode2
pcs -f nfs_cfg constraint colocation add nfsd with DataFS INFINITY
pcs -f nfs_cfg constraint colocation add nfscore with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsdca with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode1 with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode2 with nfsd INFINITY
pcs cluster cib-push nfs_cfg

pcs stonith create nfs1_fen fence_ipmilan pcmk_host_list="nfs1" ipaddr=100.200.50.21 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs1_fen avoids nfs1
pcs stonith create nfs2_fen fence_ipmilan pcmk_host_list="nfs2" ipaddr=100.200.50.22 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs2_fen avoids nfs2

And here the status of the cluster:

# pcs status
Cluster name: nfs-cluster
Stack: corosync
Current DC: nfs2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu Apr 26 13:31:20 2018
Last change: Thu Apr 26 09:10:44 2018 by root via cibadmin on nfs1

2 nodes configured
11 resources configured

Online: [ nfs1 nfs2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started nfs1
 Master/Slave Set: DataClone [Data]
     Masters: [ nfs1 ]
     Stopped: [ nfs2 ]
 DataFS (ocf::heartbeat:Filesystem):    Started nfs1
 nfsd   (ocf::heartbeat:nfsserver):     Started nfs1
 nfscore        (ocf::heartbeat:exportfs):      Started nfs1
 nfsdca (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode1       (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode2       (ocf::heartbeat:exportfs):      Started nfs1
 nfs1_fen       (stonith:fence_ipmilan):        Stopped
 nfs2_fen       (stonith:fence_ipmilan):        Stopped

Failed Actions:
* nfs1_fen_start_0 on nfs2 'unknown error' (1): call=97, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:10:45 2018', queued=0ms, exec=20009ms
* nfs2_fen_start_0 on nfs1 'unknown error' (1): call=118, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:11:03 2018', queued=0ms, exec=20013ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

So, with the above config, DRBD is started on the promoted master node but stays in a connecting state, because DRBD on the slave is not running.
This is my first concern: how do I instruct Pacemaker to start DRBD on both hosts/VMs at cluster startup, as Master and Slave, so that the synchronization happens? (At the moment I have to start DRBD manually on the slave before the following resources get started, so there is no automation or resilience.)
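
A possible cause, offered as an assumption rather than a confirmed diagnosis: the constraint "colocation add DataClone with ClusterIP INFINITY" ties every instance of DataClone to the node that holds ClusterIP, which would match the "Stopped: [ nfs2 ]" line in the status output. A minimal sketch of the inverted dependency (the IP follows the DRBD master) is below. The `run` helper and the function name `fix_drbd_constraints` are hypothetical; the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set DRY_RUN=0 on a cluster node to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: make ClusterIP depend on the DRBD master,
# instead of making the whole DRBD clone depend on ClusterIP.
fix_drbd_constraints() {
    # Place the IP on whichever node holds the DRBD master role
    run pcs constraint colocation add ClusterIP with master DataClone INFINITY
    # Start the IP only after DRBD has been promoted
    run pcs constraint order promote DataClone then start ClusterIP
}

fix_drbd_constraints
```

The two original constraints between DataClone and ClusterIP would have to be removed first; their ids are listed by "pcs constraint --full" and can be passed to "pcs constraint remove".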

My second concern is about STONITH; is this ipmilan applicable for the current implementation? (2 VMs with a single NIC each)
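
Some context for this question, with the assumptions stated: fence_ipmilan talks to a baseboard management controller over IPMI, and a VM normally has none unless the hypervisor emulates one, which would be consistent with the "Timed Out" start failures above. The ipaddr values (100.200.50.x) are also on a different network than the DRBD addresses (10.200.50.x), which may or may not be intended. The sketch below only checks what is installed and whether the configured endpoint answers; the `run` helper and the name `check_fencing` are hypothetical, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set DRY_RUN=0 on a cluster node to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: inspect fencing outside of Pacemaker.
check_fencing() {
    # List the fence agents installed on this node
    run pcs stonith list
    # Query the power status of the endpoint configured for nfs1,
    # using the same address, credentials, lanplus and cipher as nfs1_fen
    run fence_ipmilan -a 100.200.50.21 -l user -p pass -P -C 1 -o status
}

check_fencing
```

If the status query times out as well, the agent probably does not fit this environment. Which agent does fit depends on the hypervisor, which is not stated here; fence_xvm (libvirt/KVM) and fence_vmware_soap (VMware) are examples of agents aimed at VMs.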
Third one: how do I test that failover really happens? I tried forcing the switch via a constraint like "pcs constraint location ClusterIP prefers nfs2=INFINITY", and by disconnecting the NIC.
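
A common way to exercise failover without leaving constraints behind is to put the active node in standby and bring it back, then fence the peer on purpose. The sketch below assumes the pcs 0.9 command set that ships with Pacemaker 1.1.16 on EL7; the `run` helper and the name `test_failover` are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set DRY_RUN=0 on a cluster node to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: controlled failover test for the two-node cluster.
test_failover() {
    # Ask the cluster to move every resource off nfs1
    run pcs cluster standby nfs1
    # Check where the resources ended up
    run pcs status
    # Let nfs1 rejoin as a candidate node
    run pcs cluster unstandby nfs1
    # Fence nfs2 on purpose; when executed for real this power-cycles the node
    run pcs stonith fence nfs2
}

test_failover
```

A location constraint such as "ClusterIP prefers nfs2=INFINITY" stays in the CIB until it is removed, so it keeps influencing placement after the test. Disconnecting the only NIC cuts DRBD replication and cluster communication at the same time, which is the split-brain case where working fencing is required.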
If somebody can share their experience and, why not, some sample configs, I would appreciate it. Any additional feedback on the current configuration is more than welcome.
Many thanks,
Mihai
PS. Although this is a really good book, I was not able to make it work :(
PS.PS. This is just a personal exercise to understand the power of these technologies.