[Drbd-dev] /usr/lib/drbd/crm-unfence-peer.sh: fencing rule leak?

Pallai Roland dap at magex.hu
Fri Dec 21 21:21:28 CET 2012


Hi,

I have 3 remote sites in a Pacemaker cluster with quorum; 2 of the
sites have DRBD nodes (active-passive). There is one common
communication path for Corosync and DRBD. One site is preferred as
primary. When the primary node gets an unclean shutdown, the other
site is promoted to primary as expected.
When the failed node reboots, it starts and *immediately* promotes
DRBD, as expected. The promotion may happen before DRBD even
connects, so it sometimes goes online with stale data - I'm using the
resource fencing scripts to prevent this.
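
For reference, the fencing wiring in drbd.conf is roughly the
following (a trimmed sketch; "r0" is just a placeholder resource
name):

    resource r0 {
            disk {
                    fencing resource-only;
            }
            handlers {
                    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
                    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
            }
            # disk/net/syncer details omitted
    }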

Unfortunately there's a problem triggered by short network blackouts:
the unfencing script sometimes leaks fencing rules, and those
leftover rules can prevent failover during a real outage later. I
have a "high" token timeout in corosync.conf (180s), so those short
blackouts are not detected by Pacemaker at all.
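
For completeness, the relevant totem part of corosync.conf is roughly
this (the value is in milliseconds):

    totem {
            # other totem options omitted
            token: 180000        # 180s - short blackouts stay below this
    }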

Based on the logs, I tried to reconstruct what happens:

* nodeP: the preferred node
* nodeS: the secondary node (currently the Pacemaker DC)
* nodeX: the quorum site (the clients are there)

1. network outage; nodeP and nodeX have quorum
2. soon detected by DRBD on both nodes (PingAck timeout)
=> Case I:
3. nodeP: the DRBD fence-peer script is called and *finishes* successfully
4. network restored
5. DRBD communication restored, sync finishes almost immediately
6. nodeS: crm-unfence-peer.sh is called; $have_constraint in
drbd_peer_fencing() is false, so it does nothing and exits [1]
7. nodeS: the CIB gets replicated, the fencing rule appears in the local CIB
=> Case II:
3. nodeP: the DRBD fence-peer script is called but *blocks* somewhere
4. network restored
5. DRBD communication restored, sync finishes almost immediately
6. nodeS: crm-unfence-peer.sh is called; $have_constraint in
drbd_peer_fencing() is false, so it does nothing and exits [1]
7. nodeP: crm-fence-peer.sh finishes
8. nodeS: the fencing rule appears in the local CIB (sketched below)
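
To be clear about what "the fencing rule" is: if I read the script
right, crm-fence-peer.sh adds a location constraint against the
Master role that looks roughly like this in the CIB (ids shortened,
"ms_drbd_r0" stands for my master/slave resource, nodeP is the node
that placed it):

    <rsc_location rsc="ms_drbd_r0" id="drbd-fence-by-handler-ms_drbd_r0">
      <rule role="Master" score="-INFINITY" id="drbd-fence-rule">
        <expression attribute="#uname" operation="ne" value="nodeP" id="drbd-fence-expr"/>
      </rule>
    </rsc_location>

A leaked one can be spotted later with something like
"cibadmin -Q | grep drbd-fence-by-handler", but of course nobody
notices it until the next real outage.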

[1] http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=blob;f=scripts/crm-fence-peer.sh;h=dc776a3d1b7f313bcd315bef3029e841de7646cb;hb=HEAD#l298

Case II can only happen when the outage is short (< $dc_timeout);
Case I can happen even when the outage is longer, as long as there
are only a few bytes to sync - otherwise the cluster has time to
"finalize" the local CIB.

Actually this is a race between the communication paths of DRBD and
Pacemaker, so it could be solved by separating the two paths.
Unfortunately that is not an option for me; this is a low-cost SOHO
project. Another solution could be to call the unfence script on the
primary node, but drbd.conf's handlers do not support this AFAIK.
So I made a patch for /usr/lib/drbd/crm-unfence-peer.sh that fixes
the problem (attached), but I'm not sure it is a proper solution -
please share your opinion.
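
The idea of the patch, very roughly (this is only an illustrative
sketch of the approach, not the attached diff itself; the real
constraint id, CIB access and timeout handling are more involved):

    # sketched addition for the "unfence" case of drbd_peer_fencing():
    # before concluding "no constraint, nothing to do", give the peer's
    # still-pending fencing rule some time to show up in the local CIB.
    wait_for_peer_fencing_rule()
    {
            local tries=0
            while [ $tries -lt 30 ]; do
                    # re-read the CIB and look for the fencing constraint
                    if cibadmin -Q 2>/dev/null | grep -q "drbd-fence-by-handler"
                    then
                            return 0   # it arrived, the normal unfence path removes it
                    fi
                    sleep 1
                    tries=$(( tries + 1 ))
            done
            return 1   # nothing appeared, give up as before
    }

The point is just that the unfence side should not decide "nothing to
unfence" before the replicated fencing rule had a chance to arrive.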

The fencing is only used to prevent split-brain after an unclean
reboot of the preferred node - if there is another solution for that,
I can drop resource fencing in this setup without drawbacks.

I'm using DRBD 8.3.7 on Debian Squeeze.
(Or is this rather a drbd-user@ question?)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crm-unfence-peer-wait_for_queued_request.diff
Type: application/octet-stream
Size: 2087 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-dev/attachments/20121221/6c628f49/attachment.obj>

