[Linux-ha-dev] Re: [DRBD-user] drbd peer outdater: higher level implementation?

Mon Sep 15 08:28:18 CEST 2008

On Mon, Sep 15, 2008 at 12:56:22AM +0200, Lars Marowsky-Bree wrote:
> > > To be honest, simply using the same links would be simpler. 
> > 
> > then we are back to "true" split brain scenarios.
> > and discussing quorum in a two-node cluster.
> > 
> > sure that would be simpler.
> > but it would cause either no-availability
> > or data divergence every time that link breaks.
> 
> Right; note how my proposal works for "true" split-brain too, of course.

how so?

> > > The restart of the secondary is not just "spurious" though. It might
> > > actually help "fix" (or at least "reset") things. Restarts are amazingly
> > > simple and effective.
> > hmm.
> 
> You've got to admit that it's a valid point ;-)

that was more a disagreeing grumble.
it may also break things.

> > ok, you modify "your" ocf drbd RA as a proof of concept?
> 
> Yes, I can do that.

but.
before you do.

> > according to your proposal,
> > on the drbd part,
> > we'd only need to replace the outdate-peer-handler
> > from "drbd-peer-outdater" to "some other program calling crm fail
> > appropriately and block until confirmed".
> 
> Does drbd on the primary side indeed freeze IO until that script
> returns?

if you set "fencing resource-and-stonith",
yes it does.

"freeze" in the sense that it does not accept new IO.
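
for reference, a minimal sketch of the drbd.conf pieces involved.
the resource name r0 is a placeholder, and the handler path is an
assumption about where heartbeat installs drbd-peer-outdater;
adjust both to your setup.

```
resource r0 {
  disk {
    # on connection loss, stop accepting new IO and call the
    # outdate-peer handler; resume only once it has returned
    fencing resource-and-stonith;
  }
  handlers {
    # assumed install path; this is the entry that would be swapped
    # for "some other program calling crm fail"
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
}
```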

> And I think the need for the secondary to not allow itself to be
> promoted as I described might need to be implemented in drbd. Hrm. I
> think I could work-around this by setting the "outdated" flag if
> stopped while disconnected ...
>
> > that's just an entry in the config file
> > (and someone needs to write that script).
> 
> That script should be easy too; not pretty, but easy ...
> 
> > later we may make it easier for the script by
> > extending the logic in the drbd module,
> > to make it easier for asynchronous confirmation.
> 
> I'd probably make the script block and then have the notification signal
> it to continue.
> 
> Ok. I'll try to get to this this week, but I might not make it until
> Wednesday or so. (I'm doing a half-week and thus need to cram a bit.) If
> someone else wants to give it a shot before that, be my guest ;-)

great. but wait.

if we set aside confused admins for the moment,
and assume CRM is the only entity promoting/demoting drbd.

would it not be enough for a Primary on connection loss to
set some constraint pinning the master role on that node/node group?

the DRBD after-resync handler can then remove that constraint again.
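
a rough sketch of what such a pair of handlers might hand to the CRM,
to make the idea concrete. everything here is an assumption for
illustration: the resource name "ms-drbd0", the node name "node-a",
the constraint id scheme, and the exact rule attributes (role, score,
#uname), which depend on the CRM version in use. it only builds the
cibadmin command lines; it does not run them.

```python
import xml.etree.ElementTree as ET


def constraint_id(master_rsc):
    # Hypothetical naming scheme, so the resync handler can find the
    # constraint that the connection-loss handler placed.
    return "drbd-pin-master-" + master_rsc


def pin_master_xml(master_rsc, node):
    # Location constraint: the Master role is forbidden (-INFINITY)
    # on every node whose #uname is not `node`.
    cid = constraint_id(master_rsc)
    loc = ET.Element("rsc_location", {"id": cid, "rsc": master_rsc})
    rule = ET.SubElement(loc, "rule", {
        "id": cid + "-rule", "role": "Master", "score": "-INFINITY"})
    ET.SubElement(rule, "expression", {
        "id": cid + "-expr", "attribute": "#uname",
        "operation": "ne", "value": node})
    return ET.tostring(loc, encoding="unicode")


def pin_command(master_rsc, node):
    # What the Primary would run on connection loss.
    return ["cibadmin", "-C", "-o", "constraints",
            "-X", pin_master_xml(master_rsc, node)]


def unpin_command(master_rsc):
    # What the after-resync handler would run to lift the pin again.
    stub = ET.Element("rsc_location", {
        "id": constraint_id(master_rsc), "rsc": master_rsc})
    return ["cibadmin", "-D", "-o", "constraints",
            "-X", ET.tostring(stub, encoding="unicode")]


if __name__ == "__main__":
    print(" ".join(pin_command("ms-drbd0", "node-a")))
    print(" ".join(unpin_command("ms-drbd0")))
```

a real handler would also have to check that the CIB update actually
went through before letting drbd resume IO.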

-- 
: Lars Ellenberg                
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com