[Linux-ha-dev] Re: [DRBD-user] drbd peer outdater: higher level implementation?

Lars Ellenberg lars.ellenberg at linbit.com
Sat Sep 13 03:48:45 CEST 2008



On Sat, Sep 13, 2008 at 03:01:11AM +0200, Lars Marowsky-Bree wrote:
> On 2008-09-12T23:55:53, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> 
> > > When drbd loses the peer internally, but w/o us providing the
> > > notification, it's either the replication link crashed, or fencing
> > > failing or loss of quorum; anyway, you'd "outdate" yourself (and freeze
> > > io) until this notification was provided (which of course needs to be
> > > persistent across reboots).
> > > 
> > > Wouldn't that work?
> > 
> > that would prevent normal failover, no?
> 
> No. Normal fail-over will only occur after 'we' have demoted/stopped the
> peer. The cluster manager is quite good at enforcing dependencies ;-)

unfortunately it does not know about the "up2date"-ness dependency.
that is the point.

> > what we need is,
> >  * on the "Secondary", "slave",
> >    or whatever you want to call it,
> >  * the signal of the peer, that says:
> >    hey, I'm still alive, I'm still Primary,
> >    and continue to modify the data set,
> >    so you better keep out of the way.
> > then we mark us as outdated.
> 
> Pacemaker/CRM doesn't send signals when nothing changed, so this would
> be a weird thing for it to deliver. However, it _will_ tell you when
> something changed, ie the logic simply needs to be turned around.

and I don't think that is possible.

> > I don't think that this can be mapped into
> > multiple negation plus timeout logic effectively.
> 
> I don't think this needs a timeout.
> 
> > do you suggest that,
> >  * on the Secondary
> >  * we get no signal that the peer is not dead in no time,
> >    and therefore don't mark ourself as not uptodate?
> > uh?
> 
> On the secondary, until you get a signal that the peer is dead
> (stopped/demoted), consider yourself "not eligible" to be promoted (ie,
> outdated).

once outdated, it is outdated.
there is only one way for outdated, stale, data
to become uptodate again: resync.
sorry.

that does not work.

> More generally: on a primary, if the connection to the peer goes away,
> set said flag & freeze IO until this signal/notification is delivered.
> 
> I believe that covers all of the cases.

it does not.
it does cover "Secondary is dead"
it does cover "Primary is dead"

it happens to cover those,
because in both cases no outdating takes place.

so basically it does nothing,
and in situations where there is nothing to do,
that happens to work.

it does not cover "replication link is down".

> > situation 1:
> > 
> > 	primary crash.
> > 	secondary has to take over,
> > 	so it better not mark itself outdated.
> 
> No problem; we'll deliver a "peer is stopped" notification to the
> secondary so it won't be outdated by the time we ask it to promote.

as I said: I'm NOT interested in the situation where I do NOT need to outdate.

I want to know when I have to.

heartbeat apparently cannot tell me, or can it?

> > situation 2:
> > 
> > 	replication link breaks
> > 	primary wants to continue serving data.
> > 	so secondary must mark itself outdated.
> > 	otherwise on a later primary crash heartbeat would try to make
> > 	it primary and succeed in going online with stale data.
> 
> Right. The logic above would protect the data, but if just the
> replication link freezes, this would freeze both nodes. Not good,
> obviously. Indeed that requires some additional logic.
> 
> One possible way is to not freeze IO on the primary; the secondary would
> still outdate itself implicitly,

_when_ does the secondary outdate itself,
based on _what_?
if you implicitly outdate on connection loss,
you prevent normal failover.

you need to outdate on connection loss,
while the primary continues to write.
that cannot happen implicitly.

> and then fail its monitor, and be
> stopped (and moved elsewhere, if we could ;-). That seems correct, and
> not worse than anything dopd does today; freeze-io probably is an
> additional "panic guard".

you can already configure freezing and non-freezing in drbd 
by saying "fencing resource-only" or "fencing resource-and-stonith".

> BTW, when it fails the "monitor", we'll stop it. That could for example
> un-freeze the primary. An alternative is to use crm_resource -F as a
> call-out when drbd notices the master is gone, which would provide
> Pacemaker with an async failure notification and prevent the timeouts
> ...

you try to solve the node failure.
but that is already solved.
we don't need any outdate for a node failure.

solve the replication link failure followed by a later primary crash.
solve the problem of not going online with stale data.

> > 	that is why dopd was invented in the first place.
> 
> Yes, I know.
> 
> > variation:
> > 	as it may be a cluster partition.
> > 	with stonith, (at least) one of the nodes gets shot.
> > 	primary must freeze until peer is confirmed outdated (or shot)
> > 	and must unfreeze again as soon as peer is confirmed outdated (or shot)
> 
> We can't confirm it's outdated, but we can tell you when the peer is
> shot/stopped.
> 
> > where and when do what notifications come in,
> 
> That's explained here:
> http://wiki.linux-ha.org/v2/Concepts/Clones#head-f9fa0f9ab22e08d82c8f00e15d9724eba47f7576

see, that is handwaving.

I describe simple situations;
you could comment inline when and which notifications would take place.
unfortunately, apparently it is not that simple,
as there are no notifications taking place in the interesting situation.

you point to some web page (which I already know)
that outlines a neat mechanism.
which does not apply.

none of those notifications would happen for
"replication link down, but Primary still up and eager to continue to write".
or even "... and still writing along"

again:
situation 1 "normal failover":

    all healthy.
    primary crashes
    heartbeat promotes secondary to primary
    and goes online with good data.

no outdate takes place.

compare with
situation 2: "outdate needed or data jumps back in time"
 
    replication link breaks
    primary keeps writing
        (which means the secondary now has stale data)
    primary crashes
    heartbeat promotes secondary to primary
    and goes online with stale data.
 
at which point would the secondary get a notification?
which one?
how could that trigger the outdate mechanism,
and prevent the promotion?
logic in RA script?
trigger on what arguments/parameters/environment variables?

so you think it is sufficient that a secondary without
communication link to the peer refuses to become primary
until heartbeat notifies it that the primary is down?
that is a no-op, as heartbeat will always do that,
and it cannot prevent situation 2.
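
to make the difference between the two situations concrete, here is a
toy model in python. it is a sketch only: not drbd, heartbeat or
pacemaker code, and every name in it (Node, run, the three policy
strings) is made up for illustration. the integer "generation" is a
stand-in for the data generation, and the "explicit" policy assumes a
dopd-style request from the still-running primary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    alive: bool = True
    generation: int = 0    # stand-in for the data generation; bumps per write
    outdated: bool = False


def run(link_breaks, policy):
    """Return (promoted, went_online_with_stale_data) for the secondary.

    policy is one of (hypothetical names):
      "none"     - secondary only waits for the "peer is down" notification
      "implicit" - secondary outdates itself on any connection loss
      "explicit" - secondary outdates only when the still-running primary
                   asks for it over the cluster channel (dopd-style)
    """
    primary, secondary = Node(), Node()

    # all healthy: one replicated write
    primary.generation += 1
    secondary.generation = primary.generation

    if link_breaks:
        # situation 2: replication link down, primary alive and still writing
        if policy in ("implicit", "explicit"):
            secondary.outdated = True
        primary.generation += 1    # not replicated: secondary is now stale

    # primary crashes; seen from the secondary this is also a connection loss
    primary.alive = False
    if policy == "implicit":
        secondary.outdated = True

    # cluster manager delivers "peer is down" and asks for promotion
    promoted = not secondary.outdated
    stale = promoted and secondary.generation < primary.generation
    return promoted, stale


for policy in ("none", "implicit", "explicit"):
    print(policy, run(False, policy), run(True, policy))
```

in this model "none" fails over fine in situation 1 but goes online
with stale data in situation 2, "implicit" protects the data but also
blocks the normal failover of situation 1, and only "explicit" gets
both right. the catch is that "explicit" needs a signal that arrives
while the primary is still alive.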
 
> > and how is drbd (the RA) to react on those?
> 
> See above.

sorry, I don't see.

> How to actually provide the signals to drbd (the module ;-) is of
> course open to discussion,

not at all, that part is solved.

> and I look to you as to understand what works best.

Iff I'd get a signal in the RA with the appropriate meaning
at the appropriate time, I'd just say "drbdadm outdate resource".
that is what dopd does now.
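
for illustration, the easy half could look like the sketch below. it
is python rather than the shell an RA would normally use, and the
function name, its "runner" parameter and the resource name "r0" are
assumptions made up for this example. only the "drbdadm outdate
<resource>" command line comes from the text above. deciding _when_
to call it is the part that is still missing.

```python
import subprocess


def outdate_on_signal(resource, runner=subprocess.run):
    """Mark the local DRBD data of `resource` as outdated.

    Hypothetical hook: it assumes something has already decided that the
    peer is still Primary and still writing. Producing that decision is
    the open question in this thread; this only shows the easy part.
    """
    # runner is injectable so the hook can be exercised without drbdadm
    result = runner(["drbdadm", "outdate", resource])
    return result.returncode == 0
```

a call such as outdate_on_signal("r0") would then return True if
drbdadm exited with status 0.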

> > I recently discussed with our Andreas Kurz, that
> > what _could_ possibly work is a "monitor" action,
> > (and optionally some daemon)
> > that periodically gets the "data generation uuids" from drbd
> > and feed that into the cib (reuse attrd?)
> 
> I think that is way too complicated and not needed; I think the
> notifications are sufficient, as they provide the peer up/down
> promote/demote events. But I may be wrong.
> 
> > so I'll try again after I got some sleep.
> 
> Good point ;-) I will do the same. And, as I mentioned, bring a
> whiteboard to Prague.
> 
> If I can explain this so that it works, can I have my floating peers
> supported in exchange? ;-)

perhaps.
but you have much work to do ;)

-- 
: Lars Ellenberg                
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


