[DRBD-user] [BUG] drbd 0.7.4 reconnect problem after network failure

Tue Sep 21 19:10:09 CEST 2004

/ 2004-09-21 17:37:03 +0100
\ Steve Purkis:
> On Sep 21, 2004, at 15:16, Lars Ellenberg wrote:
> 
> >/ 2004-09-21 14:19:14 +0100
> >\ Steve Purkis:
> >>On Sep 21, 2004, at 12:31, Lars Ellenberg wrote:
> >>
> >>>/ 2004-09-21 08:48:51 +0100
> >>>\ Steve Purkis:
> >>>>Hi all,
> >>>>
> >>>>It seems DRBD 0.7.4 cannot recover from a network failure.
> >>>
> >>>nonsense.
> >>>see below.
> >>
> >>[snip explanation]
> >>
> >>Ta for the explanation.  To summarize, the problem is that when the
> >>primary's NIC is disconnected, the secondary takes over and DRBD ends
> >>up in a split-brain state.  Even though the only modifications done on
> >>the original primary are to de-activate the device, it has still
> >>changed independently of the current primary.  DRBD (correctly) notices
> >>the discrepancy, and bails out to avoid nasty conflicts.
> >>
> >>After thinking through things, I still think that from a layman's point
> >>of view this is a functional bug -- ideally drbd should recognize the
> >>fact that it's in this state, and work around it ;-).  But now I
> >>appreciate the difficulties.  Stonith is an option, yes, but one I'd
> >>prefer not to use if I can avoid it.
> >
> >if it were all automatic, it would just work, because of fencing.
> >if an operator did these things, he is expected to know what he
> >is doing, and when promoting the disconnected secondary to primary he
> >should use the --human flag.
> >
> >a-ha!
> > :)
> 
> ho ho! :)
> Thanks for pointing that flag out; I missed it on my first pass thru 
> the docs...  I'll try it out tomorrow when I get in.
> 
> (Still, I'm actually downgrading the disconnected primary - I wonder if 
> it will have an effect?  we'll see..)

 N1  link    N2
 Pri  ok     Sec
 Pri  broke  Sec
 Sec  broke  Sec
     now, on N2 do: "drbdadm -- --human primary all"
 Sec  broke  Pri

 Sec  ok     Pri
   and they will resync N2 -> N1 happily.
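
To make the table above concrete, here is a toy model in Python. It is
only a sketch: the Node class, the human_cnt field and the sync_source()
rule are hypothetical names for illustration, not DRBD's actual metadata
or decision logic. It assumes that a promotion with --human bumps a
counter on that node, and that on reconnect the node with the higher
counter becomes the sync source; equal counters stand for the case where
neither side can be chosen automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    role: str = "Secondary"
    human_cnt: int = 0  # hypothetical counter, not DRBD's real on-disk format

    def promote(self, human: bool = False) -> None:
        # Assumption: only an explicit operator override bumps the counter.
        if human:
            self.human_cnt += 1
        self.role = "Primary"

    def demote(self) -> None:
        self.role = "Secondary"


def sync_source(a: Node, b: Node) -> Optional[Node]:
    """Return the node to sync from, or None if there is no clear winner."""
    if a.human_cnt == b.human_cnt:
        return None
    return a if a.human_cnt > b.human_cnt else b


def replay(human: bool) -> Optional[str]:
    """Replay the rows of the table; return the sync source's name, if any."""
    n1 = Node("N1", role="Primary")  # Pri  ok     Sec
    n2 = Node("N2")
    # link breaks here                 Pri  broke  Sec
    n1.demote()                      # Sec  broke  Sec
    n2.promote(human=human)          # Sec  broke  Pri
    # link comes back                  Sec  ok     Pri
    winner = sync_source(n1, n2)
    return winner.name if winner else None


if __name__ == "__main__":
    print(replay(human=True))
    print(replay(human=False))
```

In this model, replay(human=True) picks N2 as the sync source, matching
the last row of the table, while replay(human=False) returns None, which
stands in for the case where the nodes cannot agree on a direction.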

> 
> 
> >>>	we are going to provide a config mechanism at some point, where
> >>>	one can configure that the node with less modification will
> >>>	be chosen, or the current primary will be chosen, or that
> >>>	... there are many possible ways.
> >>
> >>Hmm... I'm quite interested in these options...  it's true that a node
> >>with less modifications will typically need to be the one that gets
> >>sync'd.  Might be an idea to let them be rules (ie: if current primary
> >>AND has more modifications ...).
> >>
> >>Thinking out loud...
> >
> >remember: if you think here about how to cope with multiple failures:
> >	happy thinking. will give your brain a tough twist...
> >
> >if you are just thinking about how to make drbd
> >do what you (as an operator) mean:
> >rather do what drbd expects the operator to do.
> 
> Yeah, fuzzy line...  But I begin to see why this problem should be 
> solved outside of drbd.
> 
> And you're right that I hadn't thought how to cope with multiple 
> failovers (perhaps sync from the drbd that was the previous master if 

I talked about multiple failures, not yet multiple failovers.
and basically we only claim to be able to cope with ONE failure,
though we are able to cope with certain double failures, too.

> it had more / the same number of changes).  But if both nodes in a 
> cluster failed, I think I'd want an admin checking things out 
> manually...

> >you can always resort to manual override of the generation counters...
> >but that is intentionally undocumented :)
> 
> I wonder why? ;)

because people need to understand it first, before they can do any
good with it. there are very few special situations where this is useful
at all. it is very easy to screw up. and then they would come back
to this list complaining that drbd had eaten their data. no way :)
if someone is able to figure out how the GCs work in DRBD, I suppose he
will be able to play with them to his needs, if at all necessary.

> I'm assuming a generation counter is something attached to change sets..

well, we don't have change sets. we are not a version control system.
we don't have data journalling. we don't do delayed mirroring or provide
a means to go back in time.
though we of course can implement all that stuff,
given enough time, support, and (human) resource ...

> Out of curiosity, do they get 'reset' (or similar) when you stop & 
> start drbd?

no.  somewhere in the older publications (see drbd.org) there is an
explanation of what the GCs are, and how they are supposed to work.
I think that might be in the NLUUG 2001 paper.
we have since decoupled sync direction and Primary status,
but the concept of the generation counters basically holds.

	Lars Ellenberg