On 2008-09-12T23:55:53, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

Trying to explain again.

> situation 1:
>
> primary crash.

-> secondary receives "peer is stopped (fenced)" notification, clears
   outdated flag

> secondary has to take over,

-> secondary promotes fine

> so it better not mark itself outdated.
>
> situation 2:
>
> replication link breaks

-> Pacemaker doesn't do anything, because it doesn't know ;-)

(Actually, to drbd, it doesn't know if the link broke or the secondary
is indeed down)

-> Primary marks itself as "outdated" for now, freezes IO

(As you don't like me to say that it is outdated, because this seems to
invoke the current meaning instead of the new behaviour, maybe I should
call it "marks itself as 'in flux'"? I'm open to using terminology which
is more clear.)

> primary wants to continue serving data.

-> primary calls out to mark the peer as failed

-> peer (secondary) is stopped by pacemaker, or fenced (if the machine
   hung, crashed, whatever)

> so secondary must mark itself outdated.

-> Secondary is "outdated" by virtue of not having received one of the
   signals that cleared the flag

-> Primary receives "peer is stopped" notification, clears flag, and
   continues saving data

> otherwise on a later primary crash heartbeat would try to make
> it primary and succeed in going online with stale data.
>
> that DID HAPPEN.
> that is why dopd was invented in the first place.

Right, and I don't think it can happen with this scheme.

>
> variation:
> as it may be a cluster partition.
> with stonith, (at least) one of the nodes gets shot.

That is actually identical to either one of the above scenarios, I
think, depending on which side wins. Only the surviving side will
receive all the right steps to continue serving data.

> primary must freeze until peer is confirmed outdated (or shot)

It'd still call out to try and fail the peer; but as that is impossible
(peer is unreachable), it'll instead receive the fencing notification.

> and must unfreeze again as soon as peer is confirmed outdated (or shot)

Or the primary is shot; could go either way, but that would look like
scenario 1.


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
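
For readers following the thread, the flag handling described in the reply above can be sketched as a toy state machine. This is only an illustration under stated assumptions, not DRBD or Pacemaker code: the class `Node`, its methods, and the `in_flux` attribute are hypothetical names, and the sketch assumes that both sides set the flag when the replication link is lost and that the flag survives a stop and restart. Resync and any later notifications to a restarted node are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of one replica; all names here are hypothetical."""
    name: str
    role: str = "secondary"   # "primary" or "secondary"
    in_flux: bool = False     # the "outdated" / "in flux" flag from the thread
    io_frozen: bool = False
    running: bool = True

    def on_link_lost(self) -> None:
        # The node cannot tell a broken link from a dead peer,
        # so it flags itself; a primary also freezes IO.
        if not self.running:
            return
        self.in_flux = True
        if self.role == "primary":
            self.io_frozen = True

    def on_peer_stopped(self) -> None:
        # "peer is stopped (fenced)" notification: clear the flag, unfreeze.
        # A stopped node never sees this, so its flag stays set.
        if not self.running:
            return
        self.in_flux = False
        self.io_frozen = False

    def stop(self) -> None:
        # Stopped by the cluster manager or fenced; the flag is kept
        # (assumption: it is stored persistently).
        self.running = False
        self.role = "secondary"

    def start(self) -> None:
        self.running = True

    def promote(self) -> bool:
        # Promotion is refused while the flag is set.
        if not self.running or self.in_flux:
            return False
        self.role = "primary"
        return True


if __name__ == "__main__":
    # Situation 2: link breaks, primary wins, secondary is stopped.
    primary, secondary = Node("a", role="primary"), Node("b")
    primary.on_link_lost()
    secondary.on_link_lost()
    secondary.stop()
    primary.on_peer_stopped()
    secondary.start()
    print(primary.io_frozen, secondary.promote())
```

In this model only the side that receives the "peer is stopped" notification clears its flag, which is the property the reply relies on: the losing side stays flagged and cannot be promoted with stale data.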