[DRBD-user] [BUG] drbd 0.7.4 reconnect problem after network failure

Tue Sep 21 18:37:03 CEST 2004

On Sep 21, 2004, at 15:16, Lars Ellenberg wrote:

> / 2004-09-21 14:19:14 +0100
> \ Steve Purkis:
>> On Sep 21, 2004, at 12:31, Lars Ellenberg wrote:
>>
>>> / 2004-09-21 08:48:51 +0100
>>> \ Steve Purkis:
>>>> Hi all,
>>>>
>>>> It seems DRBD 0.7.4 cannot recover from a network failure.
>>>
>>> nonsense.
>>> see below.
>>
>> [snip explanation]
>>
>> Ta for the explanation.  To summarize, the problem is that when the
>> primary's NIC is disconnected, the secondary takes over and DRBD ends
>> up in a split-brain state.  Even though the only modifications done on
>> the original primary are to de-activate the device, it has still
>> changed independently of the current primary.  DRBD (correctly) notices
>> the discrepancy, and bails out to avoid nasty conflicts.
>>
>> After thinking things through, I still think that from a layman's point
>> of view this is a functional bug -- ideally drbd should recognize the
>> fact that it's in this state, and work around it ;-).  But now I
>> appreciate the difficulties.  Stonith is an option, yes, but one I'd
>> prefer not to use if I can avoid it.
>
> if it had all been automatic, it would just work, because of fencing.
> if some operator did these things, then he is expected to know what he
> is doing, and when promoting the disconnected secondary to primary he
> should use the --human flag.
>
> a-ha!
>  :)

ho ho! :)
Thanks for pointing that flag out; I missed it on my first pass thru 
the docs...  I'll try it out tomorrow when I get in.

(Still, I'm actually demoting the disconnected primary to secondary; I
wonder whether that will make a difference?  we'll see..)

>>> 	we are going to provide a config mechanism at some point, where
>>> 	one can configure that the node with fewer modifications will
>>> 	be chosen, or the current primary will be chosen, or that
>>> 	... there are many possible ways.
>>
>> Hmm... I'm quite interested in these options...  it's true that a node
>> with fewer modifications will typically need to be the one that gets
>> sync'd.  Might be an idea to let them be rules (i.e.: if current primary
>> AND has more modifications ...).
>>
>> Thinking out loud...
>
> remember: if you think here about how to cope with multiple failures:
> 	happy thinking. will give your brain a tough twist...
>
> if you are just thinking about how to make drbd
> do what you (as an operator) mean:
> rather do what drbd expects the operator to do.

Yeah, fuzzy line...  But I begin to see why this problem should be 
solved outside of drbd.

And you're right that I hadn't thought how to cope with multiple 
failovers (perhaps sync from the drbd that was the previous master if 
it had more / the same number of changes).  But if both nodes in a 
cluster failed, I think I'd want an admin checking things out 
manually...
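For what it's worth, here's a rough sketch of the rule I have in mind,
in Python.  It is purely hypothetical: NodeState, pick_sync_source and
the per-node change counts are made up for illustration, and this is not
how drbd itself decides which side to sync from.  Returning None means
"don't guess, let an admin look at it".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    # Hypothetical per-node summary at reconnect time; not a drbd structure.
    name: str
    was_primary: bool     # was this node the master before the failure?
    changes: int          # modifications made while disconnected (made up)
    failed: bool = False  # did the node itself go down?


def pick_sync_source(a: NodeState, b: NodeState) -> Optional[str]:
    """Return the name of the node whose data should win, or None."""
    # Both nodes failed: leave it to a human.
    if a.failed and b.failed:
        return None
    # Only one side modified data while disconnected: no real conflict.
    if a.changes > 0 and b.changes == 0:
        return a.name
    if b.changes > 0 and a.changes == 0:
        return b.name
    # Otherwise prefer the previous master if it has at least as many changes.
    for prev, other in ((a, b), (b, a)):
        if prev.was_primary and not other.was_primary and prev.changes >= other.changes:
            return prev.name
    # Anything else (e.g. the other side changed more): ask an admin.
    return None


print(pick_sync_source(NodeState("alpha", True, 5), NodeState("beta", False, 3)))
# prints "alpha"
```

The point of returning None rather than picking a side is exactly the
split-brain case: when the rule can't tell, a human should decide.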

>   yes, I know, we should better consolidate the documentation and
>   hints, and more prominently give advice for the weird multiple
>   failure corner cases.  but somewhere there is all this
>   expertise: if all else fails, contact linbit.

Well, I suppose I'm kind of doing that via this list ;-)

But in all seriousness, if we go with drbd, my team will be expected to 
sort out any failures quickly to minimize downtime (that's why I'm 
looking into this ;-), so it's best that we know the tools well.

>> What about a preventative option:
>> 	a. become secondary & discard modifications on connection loss
>> 	(ok, so that's crap for primaries - forget it)
>
> hehehehe... :)
>
>> On that train of thought, a command to discard all changes since last
>> connecting to the peer could be handy?  Something like 'invalidate',
>> but one that doesn't force a complete resync.
>
> you can always resort to manual override of the generation counters...
> but that is intentionally undocumented :)

I wonder why? ;)
I'm assuming a generation counter is something attached to change sets..

Out of curiosity, do they get 'reset' (or similar) when you stop & 
start drbd?

>> Sounds like it's more of a high-level problem when I think about it...
>> Maybe better solved at the FS or failover layer.  I can see why human
>> interaction here is a good thing.
>>
>>
>>> the interesting point is what we _do_ now.
>>> and I think we are not too bad currently.
>>
>> I agree ;-)
>> I'm just trying to give feedback as I learn to help improve drbd.
>
> thank you for that. really.
> and please continue to do so.

No worries, I will do.  That's how OSS improves, after all...

> -- 
> please use the "List-Reply" function of your email client.

Unfortunately Apple haven't introduced that into Mail.app yet :-/
I keep thinking Mutt, but then I get lazy...

cheers,
-Steve