Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, I am in the process of testing a 3 node (2 real nodes and 1 quorum node) cluster with Pacemaker 1.1.11 + Corosync 2.3.3 and DRBD 8.3.11 on Ubuntu 12.04. I have backported most of these packages in this PPA: https://launchpad.net/~xespackages/+archive/clustertesting I have configured a one-primary DRBD resource and configured it to run on either node (node0 or node1): primitive p_drbd_drives ocf:linbit:drbd \ params drbd_resource="r0" \ op start interval="0" timeout="240" \ op stop interval="0" timeout="100" \ op monitor interval="10" role="Master" timeout="90" \ op monitor interval="20" role="Slave" timeout="60" ms ms_drbd_drives p_drbd_drives \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master" colocation c_drbd_fs_services inf: g_store ms_drbd_drives:Master order o_drbd_fs_services inf: ms_drbd_drives:promote g_store:start As you can see, it is colocated with a group of other resources (g_store) and the above order constraint makes it promote the DRBD resource before starting the other resources. Due to this bug, I am stuck at DRBD 8.3.11: https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1185756 However, this version of DRBD's crm-fence-peer.sh doesn't support newer versions of pacemaker which no longer use ha="active" as part of the <node_state> tag: http://lists.linbit.com/pipermail/drbd-user/2012-October/019204.html Therefore, I updated the copy of /usr/lib/drbd/crm-fence-peer.sh on all nodes to use the latest version in the DRBD 8.3 series (2013-09-09): http://git.linbit.com/gitweb.cgi?p=drbd-8.3.git;a=history;f=scripts/crm-fence-peer.sh;h=6c8c6a4eda870b506b175d9833fea94761237d20;hb=HEAD During testing, I've tried shutting down the currently-active node. When doing so, the fence peer handler inserts the constraint correctly, but it exits with exit code 5: INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ms_drbd_drives' crm-fence-peer.sh exit codes: http://www.drbd.org/users-guide-8.3/s-fence-peer.html I can see this constraint in the CIB, however, the remaining (still secondary) node fails to promote. Moreover, when the original node is powered back on, it repeatedly attempts to remove the constraint by calling crm-unfence-peer.sh, which exits with exit code 0, removing the constraint. However it doesn't seem to recognize this and repeatedly keeps calling crm-unfence-peer.sh. How can I resolve these problems with crm-fence-peer.sh? Is exit code 5 an acceptable state to allow DRBD to promote the resource on the remaining node? It would seem so given that the constraint would prevent the DRBD resource from being promoted on the bad node until it has rejoined the cluster. Thanks, Andrew