[DRBD-user] receiver & asender dying after a stonith recovery
philipp.reisner at linbit.com
Wed Mar 23 11:37:04 CET 2005
On Tuesday, 22 March 2005 22:55, Dave Dykstra wrote:
> I've been working on getting heartbeat's stonith to function properly on
> my cluster that's using drbd. I've got it to the point where I can unplug
> the two network connections on the live server (one is a direct connect
> between the two servers, which drbd uses, and the other is the main company
> network) and stonith will temporarily remove power from the live server.
> I always plug in the networks again as soon as the power comes back up.
> The problem I'm having is that almost every time when that server comes
> back up, drbd on the new live server does not re-establish communication
> and the receiver and asender are not running. If I then manually run
> 'drbdadm adjust all' on the new live server everything comes back up.
> Below is /var/adm/messages from one of the cases. Time 15:19:53 is when I
> ran 'drbdadm adjust'. Can anybody explain what's going on? Am I supposed
> to be having heartbeat doing something more so that 'drbdadm adjust'
> will run?
I think you have found the weak point in the design of the generation
counters, which I became aware of in January.
Actually, you have a double fault:
1. Complete network failure
2. Power failure on the former primary
You might have a look at
and other more recent papers, to see what happens.
I am in the process of coming up with a new scheme for identifying data
generations for drbd-0.8. For drbd-0.7 things will stay as they are.
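Until then, the manual workaround from the original report ('drbdadm adjust all' on the new live server) could be automated along the following lines. This is only a minimal sketch, not part of drbd or heartbeat. It assumes that the affected device shows up with connection state cs:StandAlone in /proc/drbd and that device lines look like " 0: cs:StandAlone ..."; neither has been confirmed in this thread. The function names standalone_devices and adjust_if_needed are hypothetical.

```python
import re
import subprocess

# Assumed layout of a device line in /proc/drbd, for example:
#   " 0: cs:StandAlone st:Primary/Unknown ld:Consistent"
# Adjust the pattern if your drbd version prints something different.
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def standalone_devices(proc_drbd_text):
    """Return the minor numbers of devices whose connection state is StandAlone."""
    minors = []
    for line in proc_drbd_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match and match.group(2) == "StandAlone":
            minors.append(int(match.group(1)))
    return minors


def adjust_if_needed(path="/proc/drbd"):
    """Run 'drbdadm adjust all' if any device is StandAlone (assumed symptom)."""
    with open(path) as handle:
        text = handle.read()
    if standalone_devices(text):
        # Same command that was run by hand in the original report.
        subprocess.call(["drbdadm", "adjust", "all"])
        return True
    return False
```

Something like adjust_if_needed() could be called from a cron job or a heartbeat resource script on the surviving node. Check first that 'drbdadm adjust all' is safe to run unattended in your setup.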
Item 16 of http://svn.drbd.org/drbd/trunk/ROADMAP is still wrong
and unfinished, but it outlines the ideas for how to get this right in the
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :