Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Below are my configs. The issue I'm experiencing is that when stonith reboots the primary, the secondary doesn't get promoted. I thought the handlers in drbd.conf were supposed to "handle" that. Does anybody know what I'm missing? I've been looking at the logs, but nothing stands out to me. The logging is pretty verbose; perhaps I could make it a little less verbose, but I don't know where those options are.

I had this working with this identical configuration on the same nodes, but with simple hostnames. To simulate a real-world change, I switched to FQDN-style hostnames, both on the nodes and in the configuration files below. This is a pair of "appliances" that run our proprietary software, and FQDNs are part of our requirements, so I, or someone I train, will have to configure this for each pair we deliver. The old HA setup was much simpler, but we have to keep moving forward.

/etc/drbd.conf

global {
    usage-count no;
}

common {
    protocol C;

    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm_unfence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }

    startup {
    }

    disk {
        fencing resource-and-stonith;
        #on-io-error detach;
    }

    net {
        allow-two-primaries yes;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
}

resource mysql {
    protocol C;
    meta-disk internal;
    device /dev/drbd1;

    syncer {
        verify-alg sha1;
        rate 33M;
        csums-alg sha1;
    }

    on node1 {
        disk /dev/sda1;
        address 10.6.7.24:7789;
    }

    on awpnode2 {
        disk /dev/sda1;
        address 10.6.7.27:7789;
    }
}

/etc/corosync/corosync.conf

# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        member {
            memberaddr: 10.6.7.24
        }
        member {
            memberaddr: 10.6.7.27
        }
        ringnumber: 0
        bindnetaddr: 10.6.7.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

pcs config show

 Master/Slave Set: mysql_data_clone [mysql_data]
     Masters: [ node2 ]
     Slaves: [ node1i ]

pcs config show

Cluster Name: awpcluster
Corosync Nodes:
 node1 node2
Pacemaker Nodes:
 node1 node2

Resources:
 Master: mysql_data_clone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: mysql_data (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=mysql
   Operations: start interval=0s timeout=240 (mysql_data-start-interval-0s)
               promote interval=0s timeout=90 (mysql_data-promote-interval-0s)
               demote interval=0s timeout=90 (mysql_data-demote-interval-0s)
               stop interval=0s timeout=100 (mysql_data-stop-interval-0s)
               monitor interval=30s (mysql_data-monitor-interval-30s)

Stonith Devices:
 Resource: fence_node1_kvm (class=stonith type=fence_virsh)
  Attributes: pcmk_host_list=node1 ipaddr=10.6.7.10 action=reboot login=root passwd=password port=node1
  Operations: monitor interval=30s (fence_node1_kvm-monitor-interval-30s)
 Resource: fence_node2_kvm (class=stonith type=fence_virsh)
  Attributes: pcmk_host_list=node2 ipaddr=10.6.7.12 action=reboot login=root passwd=password port=awpnode2 delay=15
  Operations: monitor interval=30s (fence_awpnode2_kvm-monitor-interval-30s)
Fencing Levels:

Location Constraints:
  Resource: mysql_data_clone
    Constraint: drbd-fence-by-handler-mysql-mysql_data_clone
      Rule: score=-INFINITY role=Master (id:drbd-fence-by-handler-mysql-rule-mysql_data_clone)
        Expression: #uname ne cleardata-awpnode2.awarepoint.com (id:drbd-fence-by-handler-mysql-expr-mysql_data_clone)
Ordering Constraints:
Colocation Constraints:

Resources Defaults:
 resource-stickiness: 200
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.14-8.el6_8.1-70404b0
 have-watchdog: false
 last-lrm-refresh: 1472768147
 no-quorum-policy: ignore
 stonith-enabled: true
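For what it's worth, here is a minimal sketch of how the DRBD and Pacemaker state could be inspected on the surviving node after a fencing event, assuming the stock drbdadm, crm_mon and pcs tools (the resource name mysql comes from the drbd.conf above; the commands themselves are generic and not specific to this setup):

    # DRBD role and connection state on the survivor
    cat /proc/drbd
    drbdadm role mysql        # usually Secondary/Unknown while the peer is down

    # Did crm-fence-peer.sh leave a drbd-fence-by-handler-* location
    # constraint behind? A stale one will block promotion.
    pcs constraint --full

    # Overall cluster state, including failed resource actions
    crm_mon -1
    pcs status

A drbd-fence-by-handler-* rule like the one shown under Location Constraints above scores -INFINITY for the Master role on every node whose #uname does not match the listed name, so as long as it is present (i.e. until crm-unfence-peer.sh removes it after resync) the other node cannot be promoted.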
Thank you.

Neil Schneider
DevOps Engineer