Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.

/ 2006-03-28 20:14:26 -0500
\ Maurice Volaski:
> The primary is running drbd 0.7.17 under kernel 2.6.13 and it crashed.

can you be more precise than "crashed"?

> Having drbd crash is not that unusual,

um. what?
can you give some details here?

> but what is unusual was the heartbeat/drbd failover behavior.
>
> The secondary was still running drbd 0.7.15. And when it took over, every resource on it but one became primary. One was still left in
> secondary state! So naturally heartbeat couldn't mount it. I restarted the primary and allowed drbd to sync and watched it weirdly
> sending all data from the secondary computer (acting as primary) to the true primary (the one that crashed) except for this resource that
> was left hanging, which synced in the reverse direction! When it was done, I ran fsck to be sure (underlying filesystem is ext3) and it
> checked fine.
>
> How could that have happened?

hard to tell without logs.
probably your heartbeat deadtime is much shorter than the drbd
timeout... so when heartbeat asked the resource to become primary,
it still thought that its peer, although silent for a few seconds now,
is already and still primary, and thus refused to become primary itself.

when the old primary came back, both compared their "generation counts",
the crashed now back node recognized that the other node had not
modified the resource in between, thus the direction of sync.

> What would have happened had the true primary not been available?

pure speculation without logs.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82  :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.