Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,
I am in the process of testing a 3-node cluster (2 real nodes and 1 quorum node)
with Pacemaker 1.1.11 + Corosync 2.3.3 and DRBD 8.3.11 on Ubuntu 12.04. I have
backported most of these packages into this PPA:
https://launchpad.net/~xespackages/+archive/clustertesting
I have configured a single-primary DRBD resource and set it up to run on either
node (node0 or node1):
primitive p_drbd_drives ocf:linbit:drbd \
    params drbd_resource="r0" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" role="Master" timeout="90" \
    op monitor interval="20" role="Slave" timeout="60"
ms ms_drbd_drives p_drbd_drives \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
colocation c_drbd_fs_services inf: g_store ms_drbd_drives:Master
order o_drbd_fs_services inf: ms_drbd_drives:promote g_store:start
As you can see, it is colocated with a group of other resources (g_store), and the
order constraint ensures that the DRBD resource is promoted before the other
resources are started. Due to this bug, I am stuck at DRBD 8.3.11:
https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1185756
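For reference, g_store itself is nothing exotic: a Filesystem on the DRBD device plus
the services that use it. The member names below are made up for illustration and are
not my exact configuration:
# member names are illustrative only
primitive p_fs_drives ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/srv/drives" fstype="ext4" \
    op monitor interval="20" timeout="40"
group g_store p_fs_drives p_exported_service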
However, the crm-fence-peer.sh shipped with DRBD 8.3.11 doesn't support newer
versions of Pacemaker, which no longer use ha="active" in the <node_state> tag:
http://lists.linbit.com/pipermail/drbd-user/2012-October/019204.html
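To illustrate what I mean (attribute sets reproduced from memory, so they may not be
exact; I believe the change came in around Pacemaker 1.1.8):
older Pacemaker, which the 8.3.11 script expects to see:
    <node_state id="..." uname="node0" ha="active" in_ccm="true" crmd="online" join="member" expected="member"/>
Pacemaker 1.1.8 and later, no "ha" attribute any more:
    <node_state id="..." uname="node0" in_ccm="true" crmd="online" join="member" expected="member"/>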
Therefore, I updated the copy of /usr/lib/drbd/crm-fence-peer.sh on all nodes to
use the latest version in the DRBD 8.3 series (2013-09-09):
http://git.linbit.com/gitweb.cgi?p=drbd-8.3.git;a=history;f=scripts/crm-fence-peer.sh;h=6c8c6a4eda870b506b175d9833fea94761237d20;hb=HEAD
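For completeness, the handler is wired into r0 in the standard way; paraphrased from
memory and trimmed to the relevant bits:
resource r0 {
    disk {
        fencing resource-only;   # fencing policy; resource-and-stonith is the other common choice
    }
    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # devices, disks and addresses omitted
}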
During testing, I've tried shutting down the currently active node. When I do so,
the fence-peer handler inserts the constraint correctly, but it exits with
exit code 5:
INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ms_drbd_drives'
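Typed from memory (IDs shortened, and "node1" stands for whichever node placed the
constraint, i.e. the survivor), the constraint it places looks roughly like this:
<rsc_location rsc="ms_drbd_drives" id="drbd-fence-by-handler-ms_drbd_drives">
  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-ms_drbd_drives">
    <!-- "node1" = the node that placed the constraint -->
    <expression attribute="#uname" operation="ne" value="node1"/>
  </rule>
</rsc_location>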
crm-fence-peer.sh exit codes:
http://www.drbd.org/users-guide-8.3/s-fence-peer.html
I can see this constraint in the CIB; however, the remaining (still secondary)
node fails to promote. Moreover, when the original node is powered back on, it
attempts to remove the constraint by calling crm-unfence-peer.sh, which exits
with code 0 and removes the constraint. However, it doesn't seem to recognize
this and keeps calling crm-unfence-peer.sh repeatedly.
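For what it's worth, this is roughly how I've been watching it (commands paraphrased;
the constraint id may differ slightly on your system):
# is the fencing constraint (still) present in the CIB?
cibadmin -Q --xpath "//rsc_location[@id='drbd-fence-by-handler-ms_drbd_drives']"
# watch the repeated handler invocations
grep crm-unfence-peer /var/log/syslog | tail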
How can I resolve these problems with crm-fence-peer.sh? Does exit code 5 indicate
a state in which it is acceptable for DRBD to promote the resource on the remaining
node? It would seem so, given that the constraint would prevent the DRBD resource
from being promoted on the failed node until it has rejoined the cluster.
Thanks,
Andrew