[DRBD-user] Split Brain due to 'diskless' state with pacemaker/heartbeat

Fri Jun 1 12:30:19 CEST 2012

On 01/06/12 10:16, Philip Gaw wrote:
> Hi All,
>
> I posted about this edge case in #drbd last night and it was recommended
> that I bring it to the attention of the devs.
>
> I have a two-node DRBD setup running in primary/secondary mode.
>
> The following test case presents a problem:
>
> 1. Secondary goes into diskless state due to a broken array (as
> expected) while primary is being written to
> 2. Primary then dies (power failure)
> 3. Secondary gets rebooted, or dies and comes back online, etc.
>
> The secondary will then become primary, and because it was 'diskless'
> before, it contains out-of-date (stale) data.
>
> When the 'real' primary comes back online, we have a split brain.
>
>
> What I think is needed is a way to mark a node in 'diskless' state as
> outdated, so that Pacemaker will not attempt to promote this node to
> primary.

You can't solve this problem without compromising what DRBD already
does, i.e. comprehensively dealing with the problems associated with a
_single_ network or disk failure.

You can already work around the above failure case, e.g. by putting
the metadata on a separate physical device, but that introduces
performance concerns, as well as an additional failure case: the
metadata device itself failing!
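To make the trade-off concrete, here is a toy replay of Philip's test case. It is a minimal sketch under loudly stated assumptions, not DRBD's implementation: `Node`, `data_gen`, `outdated`, `record_outdated`, `may_promote` and `split_brain` are all hypothetical names, and DRBD's real on-disk state (generation identifiers, disk states) is considerably richer. The only point it models is that an "I am outdated" flag can be persisted only if the device holding the metadata still works, and with internal metadata that device is the very disk that just broke.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified stand-in for one DRBD node; not DRBD's real state.
    external_meta: bool       # metadata on a separate physical device?
    data_gen: int = 0         # counts writes this node's backing disk has seen
    disk_ok: bool = True      # backing (data) disk usable
    meta_ok: bool = True      # external metadata device usable
    outdated: bool = False    # persisted "my data is stale" flag

    def record_outdated(self) -> None:
        # The flag is persisted only if the device holding the metadata works:
        # with internal metadata that is the (possibly broken) data disk itself.
        meta_device_ok = self.meta_ok if self.external_meta else self.disk_ok
        if meta_device_ok:
            self.outdated = True

    def may_promote(self) -> bool:
        # Hypothetical guard: never promote a node whose metadata says it is stale.
        return self.disk_ok and not self.outdated


def split_brain(external_meta: bool, meta_write_fails: bool = False) -> bool:
    """Replay the test case; True means stale data got promoted."""
    primary = Node(external_meta)
    secondary = Node(external_meta)

    # 1. Secondary's array breaks, it goes diskless, primary keeps writing.
    secondary.disk_ok = False
    secondary.meta_ok = not meta_write_fails
    secondary.record_outdated()
    primary.data_gen += 1

    # 2. Primary loses power. 3. Secondary reboots; assume its devices are
    #    readable again, so only the persisted flag can stop a promotion.
    secondary.disk_ok = True
    secondary.meta_ok = True
    promoted = secondary.may_promote()

    # Split brain: the promoted node missed writes the dead primary still holds.
    return promoted and secondary.data_gen < primary.data_gen


print(split_brain(external_meta=False))                        # True: internal metadata
print(split_brain(external_meta=True))                         # False: separate metadata device
print(split_brain(external_meta=True, meta_write_fails=True))  # True: metadata device fails too
```

The third case is the extra failure mode mentioned above: moving the metadata only helps as long as the metadata device survives the same incident.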

There may be a case for a second metadata device that _just_ records
these sorts of conditions, but... phew, that would mean quite a lot of
new code and test cases, I'd imagine.

I think this potential problem needs _documenting_, but I'm not sure it
points to anything that needs improving in DRBD.

-- 
Matthew