Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Jun 11, 2014 at 03:50:37PM -0500, Andrew Martin wrote:
> Hello,
>
> I am in the process of testing a 3 node (2 real nodes and 1 quorum node) cluster
> with Pacemaker 1.1.11 + Corosync 2.3.3 and DRBD 8.3.11 on Ubuntu 12.04. I have
> backported most of these packages in this PPA:
> https://launchpad.net/~xespackages/+archive/clustertesting
>
> I have configured a one-primary DRBD resource and configured it to run on either
> node (node0 or node1):
> primitive p_drbd_drives ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="10" role="Master" timeout="90" \
>         op monitor interval="20" role="Slave" timeout="60"
> ms ms_drbd_drives p_drbd_drives \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> colocation c_drbd_fs_services inf: g_store ms_drbd_drives:Master
> order o_drbd_fs_services inf: ms_drbd_drives:promote g_store:start
>
> As you can see, it is colocated with a group of other resources (g_store) and the
> above order constraint makes it promote the DRBD resource before starting the
> other resources. Due to this bug, I am stuck at DRBD 8.3.11:
> https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1185756

No. You are stuck with 8.3.11 because you *chose* to be stuck there.
If you wanted to, you'd simply use an 8.4.5 module and corresponding
userland. Should be easy enough, seeing that you chose to "backport"
all the other packages.

> However, this version of DRBD's crm-fence-peer.sh doesn't support newer versions
> of pacemaker which no longer use ha="active" as part of the <node_state> tag:
> http://lists.linbit.com/pipermail/drbd-user/2012-October/019204.html
>
> Therefore, I updated the copy of /usr/lib/drbd/crm-fence-peer.sh on all nodes to
> use the latest version in the DRBD 8.3 series (2013-09-09):
> http://git.linbit.com/gitweb.cgi?p=drbd-8.3.git;a=history;f=scripts/crm-fence-peer.sh;h=6c8c6a4eda870b506b175d9833fea94761237d20;hb=HEAD
>
> During testing, I've tried shutting down the currently-active node. When doing
> so, the fence peer handler inserts the constraint correctly, but it exits with
> exit code 5:
> INFO peer is not reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-ms_drbd_drives'

"Shutting down", as in how?
Do you first cut the replication link, while still being primary?
Well, that *of course* will prevent the other node from being promoted.
That's exactly what this is supposed to do
if a Primary loses the replication link.

> crm-fence-peer.sh exit codes:
> http://www.drbd.org/users-guide-8.3/s-fence-peer.html
>
> I can see this constraint in the CIB, however, the remaining (still secondary)
> node fails to promote.

Yes. Because that constraint tells it to not become Master.

> Moreover, when the original node is powered back on, it
> repeatedly attempts to remove the constraint by calling crm-unfence-peer.sh,

Is that so. I don't see why it would do that.
The crm unfence should be called only by the after-resync-target handler,
so you would need to have a resync, be sync target,
and finish that resync successfully.

> which exits with exit code 0, removing the constraint. However it doesn't seem to
> recognize this and repeatedly keeps calling crm-unfence-peer.sh.

I don't think that is what happens. Please double check the logs.
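For context, a minimal sketch of how these handlers are normally wired up in
a DRBD 8.3 resource definition. This is not your actual config (you did not
post your drbd.conf); only the stock script paths and the r0 resource name
are taken from the thread:

    resource r0 {
        disk {
            # escalate a lost replication link to a pacemaker constraint
            fencing resource-only;
        }
        handlers {
            # places the drbd-fence-by-handler-* constraint
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # removes it again, but only after a successful resync as SyncTarget
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        ...
    }

And the constraint that crm-fence-peer.sh places (and crm-unfence-peer.sh
later removes) is a location rule of roughly this shape. The exact ids depend
on the script version, and "node0" is only assumed here to be the node that
was Primary when the constraint was placed:

    <rsc_location rsc="ms_drbd_drives" id="drbd-fence-by-handler-ms_drbd_drives">
      <rule role="Master" score="-INFINITY"
            id="drbd-fence-by-handler-rule-ms_drbd_drives">
        <expression attribute="#uname" operation="ne" value="node0"
                    id="drbd-fence-by-handler-expr-ms_drbd_drives"/>
      </rule>
    </rsc_location>

While that is in the CIB, pacemaker will refuse to promote ms_drbd_drives on
any node other than the one that placed it.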
> How can I resolve these problems with crm-fence-peer.sh? Is exit code 5 an
> acceptable state to allow DRBD to promote the resource on the remaining node?

No. It is an acceptable exit code for *this* node to continue operating
as Primary, and *prevent* the other node from being promoted,
because it has stale data.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed