On 01/06/12 10:16, Philip Gaw wrote:
> Hi All,
>
> I posted about this edge case in #drbd last night and it was recommended
> to bring it to attention of the devs.
>
> I have a two node DRBD setup which is in primary/secondary.
>
> The following test case presents a problem -
>
> 1. Secondary goes into diskless state due to a broken array (as
>    expected) while primary is being written to
> 2. Primary then dies (power failure)
> 3. Secondary gets rebooted, or dies and comes back online etc.
>
> The secondary will become primary - and as secondary was 'diskless'
> before, contains out of date stale data.
>
> When the 'real' primary comes back online we then have our split brain.
>
>
> What I think needs to happen is a way to mark 'diskless' state as
> outdated so that pacemaker will not attempt to bring this node into
> primary.

You can't solve this problem without compromising what DRBD already
does, i.e. comprehensively deal with the problems associated with a
_single_ network or disk failure.

You can already work around the above potential failure case by e.g.
putting the metadata on another physical device, but you introduce
performance concerns, and the additional failure case that the metadata
device fails!

There is maybe a case for a second metadata device that _just_ records
these sorts of conditions, but... phew, quite a lot of new code and test
cases I'd imagine.

I think this potential problem needs _documenting_ but I'm not sure it
flags anything to be improved in DRBD.

-- 
Matthew
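
For illustration only, the external-metadata workaround mentioned above
might look roughly like the resource section below. This is a sketch, not
a tested configuration: the resource name, host names, addresses and
device paths are all invented for the example, and the only point is the
meta-disk line, which names a device (with an index) on a different
physical disk from the backing storage instead of "internal". As noted in
the reply, this trades one failure case for another, since the metadata
device can itself fail, and it may cost some performance.

```
# Hypothetical example; all names, addresses and devices are placeholders.
resource r0 {
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sda7;        # backing data device (the array)
    address   10.0.0.1:7789;
    meta-disk /dev/sdb1[0];     # metadata on a separate physical device
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7789;
    meta-disk /dev/sdb1[0];
  }
}
```

With a layout like this, losing the data array should not by itself take
the metadata with it, so the node would still have somewhere to record
that its data is out of date.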