On Mon, Sep 15, 2008 at 12:56:22AM +0200, Lars Marowsky-Bree wrote:
> > > To be honest, simply using the same links would be simpler.
> >
> > then we are back to "true" split brain scenarios.
> > and discussing quorum in a two-node cluster.
> >
> > sure that would be simpler.
> > but it would cause either no-availability
> > or data divergence every time that link breaks.
>
> Right; note how my proposal works for "true" split-brain too, of course.

how so?

> > > The restart of the secondary is not just "spurious" though. It might
> > > actually help "fix" (or at least "reset") things. Restarts are amazingly
> > > simple and effective.
> >
> > hmm.
>
> You've got to admit that it's a valid point ;-)

that was more a disagreeing grumble.
it may also break things.

> > ok, you modify "your" ocf drbd RA as a proof of concept?
>
> Yes, I can do that.

but. before you do.

> > according to your proposal,
> > on the drbd part,
> > we'd only need to replace the outdate-peer-handler
> > from "drbd-peer-outdater" to "some other program calling crm fail
> > appropriately and block until confirmed".
>
> Does drbd on the primary side indeed freeze IO until that script
> returns?

if you set "fencing resource-and-stonith", yes it does.
"freeze" in the sense that it does not accept new IO.

> And I think the need for the secondary to not allow itself to be
> promoted as I described might need to be implemented in drbd. Hrm. I
> think I could work-around this by setting the "outdated" flag if
> stopped while disconnected ...
>
> > that's just an entry in the config file
> > (and someone needs to write that script).
>
> That script should be easy too; not pretty, but easy ...
>
> > later we may make it easier for the script by
> > extending the logic in the drbd module,
> > to make it easier for asynchronous confirmation.
>
> I'd probably make the script block and then have the notification signal
> it to continue.
>
> Ok. I'll try to get to this this week, but I might not make it until
> Wednesday or so. (I'm doing a half-week and thus need to cram a bit.) If
> someone else wants to give it a shot before that, be my guest ;-)

great.

but wait.
if we set aside confused admins for the moment,
and assume CRM is the only entity promoting/demoting drbd.

would it not be enough for a Primary on connection loss to
set some constraint pinning the master role on that node/node group?

the DRBD after-resync handler can then remove that constraint again.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting                  http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
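
The closing idea (pin the master role when the Primary loses its connection, unpin it from the after-resync handler) can be sketched as a small in-memory model. This is only an illustration of the event order, under the stated assumption that CRM is the only entity promoting or demoting drbd. `ConstraintStore`, `on_connection_loss` and `on_after_resync` are hypothetical names made up for this sketch; they are not part of DRBD, the CRM, or any existing tool.

```python
# Hypothetical model only: no real CRM or DRBD interface is used here.
# It illustrates the proposed order of events, nothing more.


class ConstraintStore:
    """In-memory stand-in for the cluster's constraints (assumed)."""

    def __init__(self):
        # resource name -> the only node allowed to hold the master role
        self._pins = {}

    def pin_master(self, resource, node):
        """Restrict the master role of `resource` to `node`."""
        self._pins[resource] = node

    def unpin_master(self, resource):
        """Remove the restriction again; a no-op if none is set."""
        self._pins.pop(resource, None)

    def may_promote(self, resource, node):
        """True if no pin exists, or the pin names this node."""
        pinned = self._pins.get(resource)
        return pinned is None or pinned == node


def on_connection_loss(store, resource, primary_node):
    """What the Primary would do when the replication link drops."""
    store.pin_master(resource, primary_node)


def on_after_resync(store, resource):
    """What the after-resync handler would do once the peer is in sync."""
    store.unpin_master(resource)
```

With a store `s`, calling `on_connection_loss(s, "r0", "node-a")` makes `s.may_promote("r0", "node-b")` return False until `on_after_resync(s, "r0")` has run. The resource and node names are placeholders.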