[Linux-ha-dev] Re: [DRBD-user] drbd peer outdater: higher level implementation?

Lars Ellenberg lars.ellenberg at linbit.com
Fri Sep 12 23:55:53 CEST 2008


was: In-Reply-To: <20080912202727.GC14037 at marowsky-bree.de>

On Fri, Sep 12, 2008 at 10:27:27PM +0200, Lars Marowsky-Bree wrote:
> > > listen to the notifications we provide, and infer the peer state by that
> > > means ... ;-)
> > yeah.  I asked you before,
> > how exactly that would look like,
> > and so far I saw only handwaving.
> 
> Hm, I don't think there was hand-waving. Sorry. What was unclear?
> 
> You get notifications when the peer starts or goes down (or is fenced,
> which looks the same). This is not yet relayed to drbd internally (just
> the RA gets the notification so far), but we could, for example, call
> "standalone" explicitly to disconnect; we can discuss this mechanism.
> 
> When drbd loses the peer internally, but w/o us providing the
> notification, it's either the replication link crashed, or fencing
> failing or loss of quorum; anyway, you'd "outdate" yourself (and freeze
> io) until this notification was provided (which of course needs to be
> persistent across reboots).
> 
> Wouldn't that work?

that would prevent normal failover, no?

what we need is,
 * on the "Secondary", "slave",
   or whatever you want to call it,
 * the signal of the peer, that says:
   hey, I'm still alive, I'm still Primary,
   and continue to modify the data set,
   so you better keep out of the way.
then we mark us as outdated.

I don't think that this can be mapped into
multiple negation plus timeout logic effectively.
do you suggest that,
 * on the Secondary
 * we get no signal that the peer is not dead in no time,
   and therefore don't mark ourself as not uptodate?
uh?

sorry, it is late.
can you explain slowly?

situation 1:

	primary crash.
	secondary has to take over,
	so it better not mark itself outdated.

situation 2:

	replication link breaks
	primary wants to continue serving data.
	so secondary must mark itself outdated.
	otherwise on a later primary crash heartbeat would try to make
	it primary and succeed in going online with stale data.

	that DID HAPPEN.
	that is why dopd was invented in the first place.

variation:
	as it may be a cluster partition.
	with stonith, (at least) one of the nodes gets shot.
	primary must freeze until peer is confirmed outdated (or shot)
	and must unfreeze again as soon as peer is confirmed outdated (or shot)
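
A minimal sketch of the two rules these situations imply, in Python.
The function names and parameters are hypothetical, made up for
illustration; nothing here is drbd or heartbeat API.

```python
def secondary_should_outdate(primary_seen_alive: bool) -> bool:
    """Rule on the Secondary (hypothetical helper).

    Outdate only on a positive signal that the peer is still alive and
    still Primary. Losing the peer alone proves nothing: it may have
    crashed (situation 1), and then we must stay promotable.
    """
    return primary_seen_alive


def primary_may_unfreeze(peer_outdated: bool, peer_shot: bool) -> bool:
    """Rule on the Primary (hypothetical helper).

    After losing the replication link, io stays frozen until the peer
    is confirmed outdated or confirmed shot (the variation above).
    """
    return peer_outdated or peer_shot


# situation 1: primary crash, no signal arrives, secondary stays usable
print(secondary_should_outdate(primary_seen_alive=False))
# situation 2: link breaks, primary still alive, secondary outdates
print(secondary_should_outdate(primary_seen_alive=True))
# variation: primary stays frozen until one of the confirmations arrives
print(primary_may_unfreeze(peer_outdated=False, peer_shot=False))
```

The open question is unchanged by the sketch: something has to
deliver primary_seen_alive to the Secondary, and the two
confirmations to the Primary.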

where and when do which notifications come in,
and how is drbd (the RA) to react to those?


I recently discussed with our Andreas Kurz, that
what _could_ possibly work is a "monitor" action,
(and optionally some daemon)
that periodically gets the "data generation uuids" from drbd
and feeds them into the cib (reuse attrd?)

then when we lose the replication link,
primary freezes, the user land callback
on the primary queries the cib.
if the other node is dead
  (it should better be shot; we might need a pseudo resource for exactly
   that purpose; or would pacemaker shoot a node that does not hold any
   resources [that could be started elsewhere]?)
we'd notice, and unfreeze.

if the other node is still alive, it will propagate its uuids.
so will we.

on the secondary, the next monitor action will see the other still alive
with newer UUIDs, so it would outdate itself. that is just one flag in
the UUIDs anyway, so it would get propagated by the cib to the
primary, which eventually will see the secondary's UUIDs saying it is
outdated.
now we can unfreeze on the Primary.

on the secondary,
if it was a Primary crash, there will be no newer Primary UUID
propagated from the cib, so there will be no self-outdate.
when heartbeat decides to make it primary, we are online again.
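
The exchange described above can be played through in a small Python
simulation. This is only a sketch under loud assumptions: the Cib
class, monitor() and primary_callback() are hypothetical stand-ins,
a single integer replaces the real data generation UUIDs, and the
Primary is assumed to start a new generation when it loses the link.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str            # "Primary" or "Secondary"
    generation: int      # assumption: one int stands in for the UUID set
    outdated: bool = False
    frozen: bool = False
    alive: bool = True


class Cib:
    """Hypothetical stand-in for the cib: last published state per node."""

    def __init__(self):
        self.published = {}   # node name -> (generation, outdated flag)
        self.dead = set()     # nodes confirmed dead (shot)

    def publish(self, node):
        self.published[node.name] = (node.generation, node.outdated)


def monitor(node, peer_name, cib):
    """One periodic monitor pass: maybe self-outdate, then publish."""
    if not node.alive:
        return
    peer = cib.published.get(peer_name)
    if node.role == "Secondary" and peer is not None:
        if peer[0] > node.generation:   # peer is alive with newer data
            node.outdated = True
    cib.publish(node)


def primary_callback(node, peer_name, cib):
    """Callback on a frozen Primary: unfreeze once the peer is
    confirmed dead or has published its outdated flag."""
    peer = cib.published.get(peer_name)
    if peer_name in cib.dead or (peer is not None and peer[1]):
        node.frozen = False
    return not node.frozen


def setup():
    a, b, cib = Node("a", "Primary", 1), Node("b", "Secondary", 1), Cib()
    monitor(a, "b", cib)
    monitor(b, "a", cib)      # both in sync and published
    return a, b, cib


def link_loss():
    a, b, cib = setup()
    a.frozen = True
    a.generation += 1         # assumption: new generation on link loss
    before = primary_callback(a, "b", cib)   # nothing confirmed yet
    monitor(a, "b", cib)      # primary publishes its newer generation
    monitor(b, "a", cib)      # secondary sees it and outdates itself
    after = primary_callback(a, "b", cib)
    return before, after, b.outdated


def primary_crash():
    a, b, cib = setup()
    a.alive = False
    monitor(a, "b", cib)      # dead node publishes nothing
    monitor(b, "a", cib)      # no newer generation in the cib
    return b.outdated         # stays False, so b can be promoted
```

In this model no notification is involved at all: everything
hangs on the monitor interval and on what the cib holds, which also
means the Primary stays frozen for up to one monitor round trip.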

but I don't see where any notification would come in.

reading that again, I was not really able to follow myself,
so I'll try again after I got some sleep.
unless, of course, it is all clear to you.
in which case, please,
would you rephrase my wording so I can understand it?  ;)

cheers,


-- 
: Lars Ellenberg                
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed