Hello,

I have been able to get a successful cluster started, even after one or two reboots, though sometimes ungracefully (the cluster is up and I just sudo reboot), since I am doing some testing of how it handles that situation. Sometimes the boot just hangs and I have to manually reset the VM (ESXi); I was wondering if it is doing something or just literally hanging. The console shows nothing.

Regardless, I am e-mailing because my cluster finally started and doesn't throw any errors that I can see, but now it just stays with everything stopped and the two nodes online. Can you take a look at my configs? Any logs you might need? What could be the culprit? Much appreciated!

DRBD Config - NO CLUSTER

sed -i 's/\(^SELINUX=\).*/\1disabled/' /etc/selinux/config && reboot
sestatus
systemctl stop firewalld
systemctl disable firewalld

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-drbd84 drbd84-utils mariadb-server mariadb
systemctl disable mariadb.service
fdisk /dev/sdb

cat << EOL >/etc/drbd.d/sql.res
resource sql {
        protocol C;
        meta-disk internal;
        device /dev/drbd0;
        disk /dev/sdb1;
        net {
                allow-two-primaries;
        }
        syncer {
                verify-alg sha1;
        }
        on node1.freesoftwareservers.com {
                address 192.168.1.216:7788;
        }
        on node2.freesoftwareservers.com {
                address 192.168.1.219:7788;
        }
}
EOL

drbdadm create-md sql
modprobe drbd
drbdadm up sql

# Node1
drbdadm primary --force sql
watch cat /proc/drbd
mkfs.xfs /dev/drbd0
mount /dev/drbd0 /mnt
df -h | grep drbd
systemctl start mariadb
mysql_install_db --datadir=/mnt --user=mysql
umount /mnt
systemctl stop mariadb

Cluster Configs - Post DRBD Configuration

yum install -y pcs policycoreutils-python psmisc
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service

# Node1
pcs cluster auth node1 node2 -u hacluster -p passwd
pcs cluster setup --force --name mysql_cluster node1 node2
pcs cluster start --all
# Wait 2 seconds
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs cluster start --all
# Wait 5 seconds
pcs status

pcs resource create sql_drbd_res ocf:linbit:drbd \
  drbd_resource=sql \
  op monitor interval=30s

pcs resource master SQLClone sql_drbd_res \
  master-max=1 master-node-max=1 \
  clone-max=2 clone-node-max=1 \
  notify=true

pcs resource create sql_fs Filesystem \
  device="/dev/drbd0" \
  directory="/var/lib/mysql" \
  fstype="xfs"

pcs resource create sql_service ocf:heartbeat:mysql \
  binary="/usr/bin/mysqld_safe" \
  config="/etc/my.cnf" \
  datadir="/var/lib/mysql" \
  pid="/var/lib/mysql/mysql.pid" \
  socket="/var/lib/mysql/mysql.sock" \
  additional_parameters="--bind-address=0.0.0.0" \
  op start timeout=60s \
  op stop timeout=60s \
  op monitor interval=20s timeout=30s

pcs resource create sql_virtual_ip ocf:heartbeat:IPaddr2 \
  ip=192.168.1.215 cidr_netmask=32 nic=eth0 \
  op monitor interval=30s

pcs constraint colocation add sql_virtual_ip with sql_service INFINITY
pcs constraint colocation add sql_fs with SQLClone \
  INFINITY with-rsc-role=Master
pcs constraint order promote SQLClone then start sql_fs
pcs constraint colocation add sql_service with sql_fs INFINITY
pcs constraint order sql_fs then sql_service

pcs resource group add SQL-Group sql_service sql_fs sql_virtual_ip

pcs cluster start --all
pcs status
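(For reference, a quick sanity check, assuming the stock pcs CLI shipped with CentOS 7; these two read-only commands are not part of the steps above. They dump the constraints and the full resource configuration, so the SQL-Group member order can be compared against the order/colocation rules just created.)

# Show all constraints together with their IDs
pcs constraint show --full

# Show the full configuration of every resource, including the
# SQL-Group member order and the SQLClone meta attributes
pcs resource show --full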
[root@node1 ~]# pcs status
Cluster name: mysql_cluster
Last updated: Sun Jun 12 03:04:44 2016
Last change: Sun Jun 12 02:50:16 2016 by root via cibadmin on node1
Stack: corosync
Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: SQLClone [sql_drbd_res]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: SQL-Group
     sql_service        (ocf::heartbeat:mysql):         Stopped
     sql_fs             (ocf::heartbeat:Filesystem):    Stopped
     sql_virtual_ip     (ocf::heartbeat:IPaddr2):       Stopped

PCSD Status:
  node1: Online
  node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@node1 ~]# grep -e ERROR -e WARN /var/log/messages
Jun 12 02:07:24 node1 drbd(sql_drbd_res)[3022]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:07:24 node1 drbd(sql_drbd_res)[3051]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:07:51 node1 Filesystem(sql_fs)[4094]: ERROR: Couldn't unmount /var/lib/mysql; trying cleanup with TERM
Jun 12 02:12:03 node1 Filesystem(sql_fs)[2019]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:12:03 node1 drbd(sql_drbd_res)[2214]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
Jun 12 02:12:04 node1 Filesystem(sql_fs)[2248]: ERROR: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:12:04 node1 Filesystem(sql_fs)[2384]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
Jun 12 02:38:07 node1 drbd: WARN: stdin/stdout is not a TTY; using /dev/console.
[root@node1 ~]#

Note: The cluster started at some point after 02:12, I'm fairly sure; that was my debugging time. I included this only to show the lack of any errors.

[root@node1 ~]# tail /var/log/cluster/corosync.log
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 43]: Pending pseudo op SQL-Group_running_0 on N/A (priority: 0, waiting: 36 38 40)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 42]: Completed pseudo op SQL-Group_start_0 on N/A (priority: 0, waiting: none)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 37]: Pending rsc op sql_service_monitor_20000 on node1 (priority: 0, waiting: 36)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 36]: Pending rsc op sql_service_start_0 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 39]: Pending rsc op sql_fs_monitor_20000 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 38]: Pending rsc op sql_fs_start_0 on node1 (priority: 0, waiting: 36)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 41]: Pending rsc op sql_virtual_ip_monitor_30000 on node1 (priority: 0, waiting: 40)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: print_synapse: [Action 40]: Pending rsc op sql_virtual_ip_start_0 on node1 (priority: 0, waiting: 38)
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jun 12 03:12:31 [2021] node1.freesoftwareservers.com crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
[root@node1 ~]#
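(A possible next debugging step, assuming the standard Pacemaker/pcs tooling on CentOS 7; these commands do not appear in the output above. Clearing stale failure history and then running one group member's start action by hand usually makes the resource agent print its own error, which can show why the members stay Stopped.)

# Clear old failure history so Pacemaker re-evaluates the resources
pcs resource cleanup sql_fs
pcs resource cleanup sql_service

# Run the Filesystem agent's start action in the foreground with
# verbose output (bypasses the cluster; for debugging only)
pcs resource debug-start sql_fs --full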
Post Reboot:

#!/bin/bash
# /root/drbdstart.sh
# Run this on both nodes after post-mortem, if there was an unexpected failure
modprobe drbd
drbdadm up sql

#!/bin/bash
# /root/clusterstart.sh
# Run this on the primary node after post-mortem, if there was an unexpected failure
drbdadm primary --force sql
cat /proc/drbd
pcs cluster start --all
pcs status
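(On the post-reboot scripts: the pcs status output above shows corosync and pacemaker as active/disabled, so nothing cluster-related starts at boot and the manual scripts are needed. If autostart is actually wanted, which is a judgment call since many setups deliberately leave it off after an unclean reboot, the usual way would be the sketch below.)

# Enable corosync and pacemaker to start at boot on every node
pcs cluster enable --all

# Verify what is enabled
systemctl is-enabled corosync pacemaker pcsd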