[DRBD-user] receiver & asender dying after a stonith recovery
Dave Dykstra
dwdha at drdykstra.us
Wed Mar 23 17:02:29 CET 2005
Thanks for the explanation. I think it's something that I shouldn't
worry about happening on the production cluster because the scenario of
both network connections going down on the live server, necessitating
a stonith hit on it, seems pretty unlikely. It seems much more likely
that the live server will fail catastrophically all at once and then we
wouldn't have this problem. I just need to come up with some other test
scenario to force a stonith hit, anything that will halt the live server
without removing the power.
- Dave
On Wed, Mar 23, 2005 at 11:37:04AM +0100, Philipp Reisner wrote:
> On Tuesday, 22 March 2005 22:55, Dave Dykstra wrote:
> > I've been working on getting heartbeat's stonith to function properly on
> > my cluster that's using drbd. I've got it to the point where I can unplug
> > the two network connections on the live server (one is a direct connect
> > between the two servers, which drbd uses, and the other is the main company
> > network) and stonith will temporarily remove power from the live server.
> > I always plug in the networks again as soon as the power comes back up.
> > The problem I'm having is that almost every time that server comes
> > back up, drbd on the new live server does not re-establish communication
> > and the receiver and asender are not running. If I then manually run
> > 'drbdadm adjust all' on the new live server everything comes back up.
> > Below is /var/adm/messages from one of the cases. Time 15:19:53 is when I
> > ran 'drbdadm adjust'. Can anybody explain what's going on? Am I supposed
> > to have heartbeat do something more so that 'drbdadm adjust'
> > gets run?
> >
>
> I can.
>
> I think that you have found the weak point in the design of the generation
> counters that I became aware of in January.
>
> Actually you have a double fault:
>
> 1st: complete network failure
> 2nd: power failure on the former primary.
>
> You might have a look at
> http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
> and at other, more recent papers to see what happens.
>
> I am in the process of coming up with a new scheme for identifying data
> generations for drbd-0.8. For drbd-0.7 things will stay as they are.
>
> Item 16 of http://svn.drbd.org/drbd/trunk/ROADMAP is still wrong
> and unfinished, but it outlines the ideas for how to get this right in the
> future.
>
> -Philipp