On Fri, Nov 12, 2010 at 01:25:22PM -0500, Georges-Etienne Legendre wrote:
> Hi,
>
> I'm testing my DRBD + CoroSync cluster. I've come across a situation
> that I'm not sure is supported.
>
> First, my setup:
> - I have a 2-node setup, with dual ring for CoroSync.
> - Stonith is configured in CoroSync.
> - DRBD is using a cross-over link between the 2 nodes.
> - DRBD is configured to fence/unfence the peer (resource-only) with
>   the scripts (crm-fence/unfence-peer.sh).
>
> Test case:
> - The cross-over link becomes unavailable (simulated with "ifdown ethX").
> - DRBD fences the peer.
> - Then, a 2nd failure: the secondary node (DRBD is secondary) crashes
>   (e.g. a hardware issue on this server, simulated by resetting the
>   server with the ILO).
>
> The problem:
> When the secondary node comes back, CoroSync doesn't see the node
> coming back. The node appears as "Offline" even though CoroSync is
> started and the network interface is up. To recover from that
> situation, I had to remove the CoroSync constraints and then reboot
> the primary node.

That has nothing to do with DRBD or its fencing scripts.

> Is this supposed to work by automatically unfencing the peer?
> Is there something I'm doing wrong here?

Do you need to reset the fault status of your rings on the remaining
node using corosync-cfgtool?

Corosync apparently does not heal itself, but needs administrative help
every now and then.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

__
please don't Cc me, but send to list -- I'm subscribed
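[For reference, the ring-fault check and reset suggested above can be done
like this. This is a sketch assuming a corosync 1.x redundant-ring (rrp)
setup run as root on the surviving node; the node names and ring addresses
in the sample output are illustrative, not from the original report.]

```shell
# Show the status of both totem rings on this node.
# A broken ring is reported as "FAULTY" instead of "active with no faults".
corosync-cfgtool -s

# Once the underlying network problem is fixed (e.g. the cross-over
# link is back up), clear the recorded fault state so corosync starts
# using the ring again -- it does not always re-enable it by itself.
corosync-cfgtool -r

# Verify that both rings are active again.
corosync-cfgtool -s
```

Without the `-r` reset, a ring that was marked faulty can stay faulty, which
matches the symptom described: the rebooted peer stays "Offline" even though
corosync is running and the interface is up.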