Thanks for the explanation. I think this is something I shouldn't worry about on the production cluster, because the scenario of both network connections going down on the live server, necessitating a stonith hit on it, seems pretty unlikely. It seems much more likely that the live server will fail catastrophically all at once, in which case we wouldn't have this problem. I just need to come up with some other test scenario to force a stonith hit: anything that will halt the live server without removing the power.

- Dave

On Wed, Mar 23, 2005 at 11:37:04AM +0100, Philipp Reisner wrote:
> On Tuesday, 22 March 2005 22:55, Dave Dykstra wrote:
> > I've been working on getting heartbeat's stonith to function properly on
> > my cluster that's using drbd. I've got it to the point where I can unplug
> > the two network connections on the live server (one is a direct connect
> > between the two servers, which drbd uses, and the other is the main company
> > network) and stonith will temporarily remove power from the live server.
> > I always plug in the networks again as soon as the power comes back up.
> > The problem I'm having is that almost every time that server comes
> > back up, drbd on the new live server does not re-establish communication
> > and the receiver and asender are not running. If I then manually run
> > 'drbdadm adjust all' on the new live server, everything comes back up.
> > Below is /var/adm/messages from one of the cases. Time 15:19:53 is when I
> > ran 'drbdadm adjust'. Can anybody explain what's going on? Am I supposed
> > to have heartbeat do something more so that 'drbdadm adjust'
> > will run?
> >
>
> I can.
>
> I think that you have found the weak point in the design of the generation
> counters, which I became aware of in January.
>
> Actually, you have a double fault:
>
> 1st: complete network failure
> 2nd: power failure on the former primary.
>
> You might have a look at
> http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
> and other more recent papers to see what happens.
>
> I am in the process of coming up with a new scheme for identifying
> data generations for drbd-0.8. For drbd-0.7, things will stay as they are.
>
> Item 16 of http://svn.drbd.org/drbd/trunk/ROADMAP is still wrong
> and unfinished, but it outlines the ideas for how to get this right in
> the future.
>
> -Philipp