> Lars said:
> <snip>
> just make sure that your heartbeat won't decide to make a node primary
> that happens to have long-since outdated data
>   cluster fine
>   secondary crash [first spike of a brown out]
>   time passes
>   primary crash   [well, now it's a real black out]
>   ... [power back]
>   previously secondary comes up, heartbeat decides to make it primary
>      *** you are primary with outdated data ***
>   previously primary needs a lot longer (recounts its scsi devices,
>   thinks it needs to fsck its root, whatever)...
>   same effect as split brain: diverging data sets.
> </snip>
> In the above case, I assume the new primary would update the new
> secondary and you would not have diverging data sets, just old data from
> before the first brownout, no?
> From all of this the question arises: is there a good general
> configuration (read "silver bullet") that covers most typical failure
> scenarios and doesn't block when one node is down?

No. It always depends. Different deployments have different
requirements. Some might rather be non-operational than working with
even slightly outdated data, some might prefer to just have _any_ data
online, just so long as they _are_ online...

> By the way, does drbd include a timestamp in its metadata, kind of like
> a watchdog timer? It seems like a timestamp could be combined with the
> primary/secondary metadata field to gracefully handle most failures, but
> I'm probably being naive.

No, it does not, for obvious reasons.
But DRBD 8 tags its data generations with UUIDs, and keeps a short
history of those UUIDs, which helps a lot in detecting various kinds of
bad things...
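
To make the generation-UUID idea concrete against the brown-out scenario
quoted at the top, here is a minimal sketch in Python. It is not DRBD's
actual code or on-disk format: Node, HISTORY_LEN, new_generation(),
compare() and brownout() are names made up for illustration, and the
comparison rules are a simplification. The working assumption is that a
node starts a new data generation whenever it acts as primary without
its peer.

```python
import uuid
from dataclasses import dataclass, field

HISTORY_LEN = 2  # assumption: how many old generation UUIDs a node remembers


@dataclass
class Node:
    """Hypothetical per-node metadata: current generation plus short history."""
    name: str
    current: str
    history: list = field(default_factory=list)

    def new_generation(self):
        # Assumed rule: acting as primary without the peer starts a new
        # generation; the old UUID moves into the (bounded) history.
        self.history = ([self.current] + self.history)[:HISTORY_LEN]
        self.current = uuid.uuid4().hex


def compare(a, b):
    """Decide what two reconnecting nodes can conclude from their UUIDs."""
    if a.current == b.current:
        return "in sync"
    if b.current in a.history:
        return f"{b.name} outdated, sync from {a.name}"
    if a.current in b.history:
        return f"{a.name} outdated, sync from {b.name}"
    if set(a.history) & set(b.history):
        # Common ancestor, but each side moved on independently.
        return "diverged"
    return "unrelated"


def brownout(promote_stale_secondary):
    """Replay the quoted scenario; alpha is primary, beta is secondary."""
    g0 = uuid.uuid4().hex
    alpha = Node("alpha", g0)
    beta = Node("beta", g0)
    # beta crashes; alpha keeps writing alone, so it starts a new generation.
    alpha.new_generation()
    # alpha crashes too; power returns and beta boots first.
    if promote_stale_secondary:
        # The cluster manager makes beta primary on its old data.
        beta.new_generation()
    # alpha finally comes back and the nodes reconnect.
    return compare(alpha, beta)


if __name__ == "__main__":
    print("beta promoted:", brownout(True))
    print("beta waits:   ", brownout(False))
```

In this sketch, promoting the stale secondary ends in "diverged": both
nodes started their own generation from the same ancestor, so neither can
simply overwrite the other without losing writes. If the stale node waits
for its peer instead, it is recognised as outdated and can be resynced
from alpha.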

