[Linux-ha-dev] Re: [DRBD-user] drbd peer outdater: higher level implementation?

Lars Ellenberg lars.ellenberg at linbit.com
Sat Sep 13 14:52:53 CEST 2008


On Sat, Sep 13, 2008 at 04:00:34AM +0200, Lars Marowsky-Bree wrote:
> I'm sorry, I wasn't aware that that was what you were looking for, and
> the web page describes all scenarios when pacemaker delivers a
> notification to an RA (basically, whenever the peer changes state).

and none of them is useful for _outdating_.
some of them may be useful for "unfreezing".

> > Iff I'd get a signal in the RA with the appropriate meaning
> > at the appropriate time, I'd just say "drbdadm outdate resource".
> > that is what dopd does now.
> 
> I think that "outdate" mechanism as it stands today might need some
> minor changes, yes. Just as the logic in the RA surely needs to, and
> possibly we even need to improve m/s if we find a lack there.

so.
what you are suggesting is

when drbd loses replication link
  primary
     freezes and calls out to userland,
       telling heartbeat that the peer has "failed",
       in which case heartbeat would stop drbd on the secondary.
     either receives "secondary was stopped",
        maybe stores to meta data "_I_ am ahead of peer",
            (useful for cluster wide crash/reboot later)
        and unfreezes
     or is being stopped itself
       (which would result in the node being self fenced, as the fs on
        top of drbd cannot be unmounted as drbd is frozen,...)
     or is even being shot as result of a cluster partition.

     so either primary continues to write,
     or it will soon look like a crashed primary.

  secondary
    sets a flag "primary may be ahead of me",
    then waits for
    either being stopped, in which case
      it would save to meta data "primary _IS_ ahead of me"
    or being told that the Primary was stopped
      in which case it would clear that flag again,
        maybe store to meta data "_I_ am ahead of peer"
      and then most likely soon after be promoted.

while drbd has the "peer may be ahead of me" flag set, i.e. basically
while drbd is not connected and no "certain" flag is set yet, it
will refuse to be promoted.

Did I get that right?

[note that drbd has both "certain" flags already implemented,
 namely "I am outdated" = peer IS ahead of me, and
     "peer is outdated" = _I_ am ahead of peer ]
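
To make the flag logic above concrete, here is a minimal Python sketch of it as
described in this mail. All names (Flag, Node, lose_link, may_promote, ...) are
made up for illustration; none of this is DRBD or heartbeat code. It also
assumes, as a simplification, that a connected node carries no flag and may
always be promoted.

```python
from enum import Enum


class Flag(Enum):
    # Uncertain state: set when the replication link is lost.
    PEER_MAY_BE_AHEAD = "peer may be ahead of me"
    # The two "certain" flags mentioned in the note above.
    I_AM_OUTDATED = "peer IS ahead of me"
    PEER_IS_OUTDATED = "_I_ am ahead of peer"


class Node:
    """Hypothetical model of one drbd node, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.connected = True
        self.flag = None  # assumption: no flag is needed while connected

    def lose_link(self):
        # Replication link lost: we no longer know who is ahead.
        self.connected = False
        self.flag = Flag.PEER_MAY_BE_AHEAD

    def stopped(self):
        # Being stopped while uncertain turns "may be" into "IS".
        if self.flag is Flag.PEER_MAY_BE_AHEAD:
            self.flag = Flag.I_AM_OUTDATED

    def peer_stopped(self):
        # Being told the peer was stopped means we are the one ahead.
        if self.flag is Flag.PEER_MAY_BE_AHEAD:
            self.flag = Flag.PEER_IS_OUTDATED

    def may_promote(self):
        # Not connected and no "I am ahead" certainty: refuse promotion.
        if self.connected:
            return True
        return self.flag is Flag.PEER_IS_OUTDATED


secondary = Node("secondary")
secondary.lose_link()
refused_while_uncertain = not secondary.may_promote()
secondary.peer_stopped()
allowed_after_peer_stopped = secondary.may_promote()
```

In this sketch a disconnected node is only promotable once it has been told
that its peer was stopped; every other disconnected state refuses promotion.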
 
some questions:
  wouldn't that "peer has failed" first trigger a monitor?
  wouldn't that mean that on monitor, a not connected secondary would
  have to report "failed", as otherwise it would not get stopped?
  wouldn't that prevent normal failover?

  if not,
  wouldn't heartbeat try to restart the "failed" secondary?
  what would happen?
  what does a secondary do when started, and it finds the
    "primary IS ahead of me" flag in meta data?
    refuse to start even as slave?
      (would prevent it from ever being resync'ed!)
    start as slave, but refuse to be promoted?

[note that typical DRBD cluster deployment
 is still 2node, in case that matters]

problem: secondary crash.
   secondary reboots,
   heartbeat rejoins the cluster.
   
   replication link is still broken.

   secondary does not have "primary IS ahead of me" flag in meta data
   because, due to the crash, there was no way to store it.

   would heartbeat try to start drbd (slave) here?
   what would trigger the "IS ahead of me" flag getting stored on disk?

   if for some reason policy engine now figures the master should rather
   run on the just rejoined node, how can that migration be prevented?
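
   The gap in this scenario can be sketched in a few lines of Python. This is
   only a toy model under stated assumptions: the "may be ahead" flag lives in
   memory, the "IS ahead" flag is written to meta data only on a clean stop,
   and a plain dict stands in for the on-disk meta data. The names Secondary,
   clean_stop and crash_and_reboot are hypothetical, not DRBD interfaces.

```python
class Secondary:
    """Toy model of a secondary; the dict stands in for on-disk meta data."""

    def __init__(self, meta_data):
        self.meta_data = meta_data            # survives a crash
        self.primary_may_be_ahead = False     # in memory only, lost on crash

    def lose_link(self):
        self.primary_may_be_ahead = True

    def clean_stop(self):
        # Only an orderly stop gets the chance to persist the certain flag.
        if self.primary_may_be_ahead:
            self.meta_data["primary_is_ahead"] = True


def crash_and_reboot(node):
    # A crash discards in-memory state; only the meta data is carried over.
    return Secondary(node.meta_data)


# Clean stop: the flag reaches the meta data.
disk_a = {}
a = Secondary(disk_a)
a.lose_link()
a.clean_stop()

# Crash: nothing was written, and the in-memory hint is gone after reboot.
disk_b = {}
b = Secondary(disk_b)
b.lose_link()
b = crash_and_reboot(b)
```

   After the crash path, the rebooted node has neither the in-memory hint nor
   anything in meta data, so by itself it cannot tell that the primary may
   have written in the meantime.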


and so on and on.
there are many scenarios.
I'm still not convinced that this method
covers as many of them as dopd does, or covers them as well.
but, at least, it is getting closer...

-- 
: Lars Ellenberg                
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


