[DRBD-user] Consistent device to primary fences remote node

Lars Ellenberg lars.ellenberg at linbit.com
Thu Nov 27 14:47:32 CET 2008


On Wed, Nov 26, 2008 at 06:31:53PM +0100, Federico Simoncelli wrote:
> On Wed, Nov 26, 2008 at 5:40 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
> >> I suppose that the problem is my outdate-peer which just fences the
> >> node without actually outdating it.
> >> Even if this is the reason I think that a "Consistent" device should
> >> not be allowed to become primary (fencing the remote node) without any
> >> confirmation.
> >
> > the "confirmation" is supposed to come from the outdate-peer handler.
> > a current Primary will refuse to be Outdated,
> > the outdate would fail,
> > denying the promotion of the merely Consistent one.
> 
> So basically the problem is that my outdate-peer doesn't try to
> outdate the remote peer but it just fences it.
> To fix this behaviour I could modify the outdate-peer handler to check
> the DRBD_RESOURCE dstate and return the exit code 6 if local resource
> is not "UpToDate".
> 
> What do you think? Comments are welcome.

if the power supply is your only means of communication,
you don't have many options.

and this outdate and resource-level fencing business gets cumbersome
very quickly. whenever you think you covered "the" interesting
scenario, "fixed" that special case, you likely broke a dozen other
possible failure scenarios (and yes, we are deep in multiple failure land
here already).

if you return 6,
drbd will (try to) outdate itself as a side effect.

if you return anything else (not in the range of 3 to 7),
say 10, you get an "outdate-peer helper broken, returned 10"
log message from drbd, but drbd will stay "Consistent"
and "DUnknown" for the peer, so as not to make any assumptions.
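
for illustration only, the handler change proposed above could be sketched
roughly like this. it is a sketch under assumptions, not a tested handler:
the function names are made up, existing_fence_logic() is a hypothetical
placeholder for whatever the current handler does, and the code assumes
that "drbdadm dstate <resource>" prints a single "Local/Peer" line.

```python
import os
import subprocess

# Per the thread: returning 6 makes drbd (try to) outdate itself.
EXIT_OUTDATE_SELF = 6


def local_dstate(dstate_output):
    """Return the local half of a 'Local/Peer' disk-state string."""
    return dstate_output.strip().split("/")[0]


def handler_exit_code(dstate_output, fence_peer):
    """Only fence the peer if the local disk is UpToDate.

    fence_peer is a callable wrapping the existing fencing logic; it
    must return the exit code that logic would have returned anyway.
    """
    if local_dstate(dstate_output) != "UpToDate":
        return EXIT_OUTDATE_SELF
    return fence_peer()


def existing_fence_logic():
    # Hypothetical placeholder: call the power-fencing tool here and
    # return its usual exit code. Deliberately not implemented.
    raise NotImplementedError("plug in the existing fencing logic")


def main():
    # DRBD_RESOURCE is the variable named in the thread; the dstate
    # output format is an assumption (one 'Local/Peer' line).
    resource = os.environ["DRBD_RESOURCE"]
    out = subprocess.run(
        ["drbdadm", "dstate", resource],
        capture_output=True, text=True, check=True,
    ).stdout
    return handler_exit_code(out, existing_fence_logic)
```

to run it as a handler, the script would end with sys.exit(main()).
keeping the decision in handler_exit_code() means it can be checked
without a cluster, e.g. handler_exit_code("Consistent/DUnknown", ...)
gives 6 without ever calling the fencing logic.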


why don't you just set a high initial wait for connection timeout?
   wfc-timeout 172800;
if within two days no-one came and told me that I'm outdated,
and I still cannot reach the other node, I have every right to assume
that I'm the only survivor and am allowed to become primary.
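
as a quick check of the arithmetic, the small sketch below converts
"two days" to seconds and renders the setting inside a startup section,
which is where wfc-timeout normally lives in drbd.conf. the helper name
startup_stanza is made up for this example.

```python
from datetime import timedelta


def startup_stanza(wait):
    """Render a drbd.conf startup section with wfc-timeout in seconds."""
    seconds = int(wait.total_seconds())
    return "startup {\n    wfc-timeout %d;\n}" % seconds


# Two days, as suggested in the message above.
print(startup_stanza(timedelta(days=2)))
```

the printed section carries the same 172800 value as the line quoted
above; it would go inside the resource (or common) section of drbd.conf.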

or am I off track?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


