[DRBD-user] DRBD and CoroSync fencing and unfencing not working

Sat Nov 13 20:29:33 CET 2010

On Fri, Nov 12, 2010 at 01:25:22PM -0500, Georges-Etienne Legendre wrote:
> Hi,
> 
> I'm testing my DRBD + CoroSync cluster. I've come across a situation that I'm not sure is supported.
> 
> First, my setup:
> - I have a two-node setup, with dual rings for CoroSync.
> - Stonith is configured in CoroSync.
> - DRBD is using a cross-over link between the 2 nodes.
> - DRBD is configured to fence/unfence peer (resource-only) with the scripts (crm-fence/unfence-peer.sh).
> 
> Test case:
> - The cross-over link becomes unavailable (simulated with "ifdown ethX")
> - DRBD fences the peer
> - Then, 2nd failure: the secondary node (DRBD is secondary) crashes (e.g. hardware issue on this server, simulated by resetting the server with the ILO).
> 
> The problem:
> When the secondary node comes back, CoroSync doesn't see the node coming
> back. The node appears as "Offline" even though CoroSync is started
> and network interface is up. To recover from that situation, I had to
> remove CoroSync constraints, and then reboot the primary node.

That has nothing to do with DRBD or its fencing scripts.

> Is this supposed to work by automatically unfencing the peer?
> Is there something I'm doing wrong here?

Do you need to reset the fault status of your rings on the remaining
node using corosync-cfgtool?  Corosync apparently does not heal itself,
but needs administrative help every now and then.
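
As a rough sketch of that check, something like the following could be run
on the surviving node. It assumes a faulty ring shows up with the word
FAULTY in the ring status output; the CFGTOOL variable and the
reset_rings_if_faulty helper are made-up names for illustration, and only
the -s and -r options are corosync-cfgtool's own.

```shell
# Sketch only, under the assumptions stated above.
# CFGTOOL is a hypothetical override so the tool path can be swapped out.
CFGTOOL=${CFGTOOL:-corosync-cfgtool}

# Hypothetical helper: reset the rings only if one is reported faulty.
reset_rings_if_faulty() {
    # -s prints the status of the rings on this node.
    if $CFGTOOL -s | grep -q 'FAULTY'; then
        # -r resets the redundant ring state cluster-wide after a fault.
        $CFGTOOL -r
    else
        echo "no faulty ring reported"
    fi
}
```

Calling reset_rings_if_faulty as root once the failed link is back should
be enough to see whether the ring state was what kept the node "Offline".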

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed