On Tuesday, 22 March 2005 22:55, Dave Dykstra wrote:
> I've been working on getting heartbeat's stonith to function properly on
> my cluster that's using drbd. I've got it to the point where I can unplug
> the two network connections on the live server (one is a direct connect
> between the two servers, which drbd uses, and the other is the main company
> network) and stonith will temporarily remove power from the live server.
> I always plug in the networks again as soon as the power comes back up.
> The problem I'm having is that almost every time when that server comes
> back up, drbd on the new live server does not re-establish communication
> and the receiver and asender are not running. If I then manually run
> 'drbdadm adjust all' on the new live server everything comes back up.
> Below is /var/adm/messages from one of the cases. Time 15:19:53 is when I
> ran 'drbdadm adjust'. Can anybody explain what's going on? Am I supposed
> to be having heartbeat doing something more so that 'drbdadm adjust'
> will run?
>

I can. I think you have found the weak point in the design of the
generation counters, which I became aware of in January.

Actually you have a double fault:
 1st Complete network failure
 2nd Power failure on the former primary.

You might have a look at
http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
and other more recent papers, to see what happens.

I am in the process of coming up with a new scheme for identifying
data generations for drbd-0.8. For drbd-0.7 things will stay as they are.

Item 16 of http://svn.drbd.org/drbd/trunk/ROADMAP is still wrong and
unfinished, but it outlines the ideas for how to get this right in the future.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :