Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,

I have been able to get the cluster started successfully, even after one or two reboots, though sometimes ungracefully (the cluster is running and I just sudo reboot). I am doing some testing of how it handles that situation. Sometimes the boot just hangs and I have to manually reset the VM (ESXi); I was wondering if it's actually doing something or just literally hanging. The console shows nothing.

Regardless, I am e-mailing because my cluster finally started and doesn't throw any errors that I can see, but now it just sits with everything Stopped and both nodes online.

Can you take a look at my configs? Are there any logs you might need? What could be the culprit? Much appreciated!
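For reference, here is roughly what I run on each node to collect state (a sketch; happy to send any other logs):

pcs status --full                        # full cluster/resource view
cat /proc/drbd                           # DRBD connection and disk state
journalctl -b -u corosync -u pacemaker   # messages from the current boot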
DRBD Config - NO CLUSTER
sed -i 's/\(^SELINUX=\).*/\1disabled/' /etc/selinux/config && reboot
sestatus
systemctl stop firewalld
systemctl disable firewalld
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-drbd84 drbd84-utils mariadb-server mariadb
systemctl disable mariadb.service
fdisk /dev/sdb    # interactive: create one partition, /dev/sdb1, spanning the disk
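(A non-interactive equivalent of the fdisk step, as a sketch; this assumes the whole disk is dedicated to the single DRBD backing partition /dev/sdb1:)

parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%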
cat << EOL > /etc/drbd.d/sql.res
resource sql {
    protocol C;
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/sdb1;
    net {
        allow-two-primaries;
    }
    syncer {
        verify-alg sha1;
    }
    on node1.freesoftwareservers.com {
        address 192.168.1.216:7788;
    }
    on node2.freesoftwareservers.com {
        address 192.168.1.219:7788;
    }
}
EOL
drbdadm create-md sql
modprobe drbd
drbdadm up sql
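Before forcing primary, I check that the two nodes actually see each other (a sketch; Inconsistent/Inconsistent is expected before the first sync):

drbdadm cstate sql    # expect: Connected
drbdadm dstate sql    # expect: Inconsistent/Inconsistent at this point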
# Node1
drbdadm primary --force sql
watch cat /proc/drbd    # wait until the initial sync reports UpToDate/UpToDate
mkfs.xfs /dev/drbd0
mount /dev/drbd0 /mnt
df -h | grep drbd
systemctl start mariadb
mysql_install_db --datadir=/mnt --user=mysql
umount /mnt
systemctl stop mariadb
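As a sanity check before handing the device over to the cluster (a sketch):

cat /proc/drbd      # expect ds:UpToDate/UpToDate once the initial sync is done
drbdadm role sql    # expect: Primary/Secondary on node1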
Cluster Configs - Post DRBD Configuration
yum install -y pcs policycoreutils-python psmisc
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service
# Node1
pcs cluster auth node1 node2 -u hacluster -p passwd
pcs cluster setup --force --name mysql_cluster node1 node2
pcs cluster start --all
# Wait 2 seconds
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs cluster start --all
# Wait 5 seconds
pcs status
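Before creating resources I also verify the membership corosync sees (a sketch):

pcs status corosync    # membership list
corosync-cfgtool -s    # ring status, should show "no faults"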
pcs resource create sql_drbd_res ocf:linbit:drbd \
drbd_resource=sql \
op monitor interval=30s
pcs resource master SQLClone sql_drbd_res \
master-max=1 master-node-max=1 \
clone-max=2 clone-node-max=1 \
notify=true
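(Given the "expected clone-max -le 2, but found unset" errors in the logs further down, this is how I can dump the master resource to confirm the meta attributes were actually applied; a sketch:)

pcs resource show SQLClone    # should list master-max/clone-max etc. under Meta Attrs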
pcs resource create sql_fs Filesystem \
device="/dev/drbd0" \
directory="/var/lib/mysql" \
fstype="xfs"
pcs resource create sql_service ocf:heartbeat:mysql \
binary="/usr/bin/mysqld_safe" \
config="/etc/my.cnf" \
datadir="/var/lib/mysql" \
pid="/var/lib/mysql/mysql.pid" \
socket="/var/lib/mysql/mysql.sock" \
additional_parameters="--bind-address=0.0.0.0" \
op start timeout=60s \
op stop timeout=60s \
op monitor interval=20s timeout=30s
pcs resource create sql_virtual_ip ocf:heartbeat:IPaddr2 \
ip=192.168.1.215 cidr_netmask=32 nic=eth0 \
op monitor interval=30s
pcs constraint colocation add sql_virtual_ip with sql_service INFINITY
pcs constraint colocation add sql_fs with SQLClone \
INFINITY with-rsc-role=Master
pcs constraint order promote SQLClone then start sql_fs
pcs constraint colocation add sql_service with sql_fs INFINITY
pcs constraint order sql_fs then sql_service
pcs resource group add SQL-Group sql_service sql_fs sql_virtual_ip
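(Since members of a group start in the listed order and stop in reverse, here is how I can dump what pacemaker actually recorded for the group; a sketch:)

pcs resource show SQL-Group    # lists the group members in start order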
pcs cluster start --all
pcs status
[root@node1 ~]# pcs status
Cluster name: mysql_cluster
Last updated: Sun Jun 12 03:04:44 2016          Last change: Sun Jun 12 02:50:16 2016 by root via cibadmin on node1
Stack: corosync
Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: SQLClone [sql_drbd_res]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: SQL-Group
     sql_service        (ocf::heartbeat:mysql):         Stopped
     sql_fs             (ocf::heartbeat:Filesystem):    Stopped
     sql_virtual_ip     (ocf::heartbeat:IPaddr2):       Stopped

PCSD Status:
  node1: Online
  node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
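In case it helps, here is what I can run next to see why the group stays Stopped (a sketch; debug-start runs the agent outside cluster control, so only where that's safe):

pcs constraint show --full               # all constraints with their ids
pcs resource debug-start sql_fs --full   # run the Filesystem agent by hand, verbose
crm_simulate -sL                         # show placement scores from the live CIB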
[root@node1 ~]# grep -e ERROR -e WARN /var/log/messages
Jun 12 02:07:24 node1 drbd(sql_drbd_res)[3022]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:07:24 node1 drbd(sql_drbd_res)[3051]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:07:51 node1 Filesystem(sql_fs)[4094]: ERROR: Couldn't unmount /var/lib/mysql; trying cleanup with TERM
Jun 12 02:12:03 node1 Filesystem(sql_fs)[2019]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:12:03 node1 drbd(sql_drbd_res)[2214]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:12:04 node1 Filesystem(sql_fs)[2248]: ERROR: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:12:04 node1 Filesystem(sql_fs)[2384]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:38:07 node1 drbd: WARN: stdin/stdout is not a TTY; using /dev/console.
[root@node1 ~]#
Note: I am fairly sure the cluster started at some point after 02:12; that was my debugging time. I just included the above to show the lack of any errors.
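I can also paste a fuller one-shot status with fail counts if that would help (a sketch):

crm_mon -1Arf    # one-shot: node attributes, inactive resources, fail counts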
[root@node1 ~]# tail /var/log/cluster/corosync.log
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 43]: Pending pseudo op SQL-Group_running_0 on N/A (priority: 0, waiting: 36 38 40)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 42]: Completed pseudo op SQL-Group_start_0 on N/A (priority: 0, waiting: none)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 37]: Pending rsc op sql_service_monitor_20000 on node1 (priority: 0, waiting: 36)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 36]: Pending rsc op sql_service_start_0 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 39]: Pending rsc op sql_fs_monitor_20000 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 38]: Pending rsc op sql_fs_start_0 on node1 (priority: 0, waiting: 36)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 41]: Pending rsc op sql_virtual_ip_monitor_30000 on node1 (priority: 0, waiting: 40)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 40]: Pending rsc op sql_virtual_ip_start_0 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
[root@node1 ~]#
Post Reboot:

#!/bin/bash
# /root/drbdstart.sh
# Run this on both nodes after the post-mortem of an unexpected failure
modprobe drbd
drbdadm up sql

#!/bin/bash
# /root/clusterstart.sh
# Run this on the primary node after the post-mortem of an unexpected failure
drbdadm primary --force sql
cat /proc/drbd
pcs cluster start --all
pcs status
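If it's relevant: I've been thinking about wiring drbdstart.sh into a small unit so it runs by itself after an ungraceful reboot (an untested sketch; the unit name is mine):

cat << EOL > /etc/systemd/system/drbdstart.service
[Unit]
Description=Bring up DRBD resource sql after boot
After=network-online.target

[Service]
Type=oneshot
ExecStart=/root/drbdstart.sh

[Install]
WantedBy=multi-user.target
EOL
systemctl daemon-reload
systemctl enable drbdstart.service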