On Sep 21, 2004, at 15:16, Lars Ellenberg wrote:

> / 2004-09-21 14:19:14 +0100
> \ Steve Purkis:
>> On Sep 21, 2004, at 12:31, Lars Ellenberg wrote:
>>
>>> / 2004-09-21 08:48:51 +0100
>>> \ Steve Purkis:
>>>> Hi all,
>>>>
>>>> It seems DRBD 0.7.4 cannot recover from a network failure.
>>>
>>> nonsense.
>>> see below.
>>
>> [snip explanation]
>>
>> Ta for the explanation. To summarize, the problem is that when the
>> primary's NIC is disconnected, the secondary takes over and DRBD ends
>> up in a split-brain state. Even though the only modifications done on
>> the original primary are to de-activate the device, it has still
>> changed independently of the current primary. DRBD (correctly) notices
>> the discrepancy, and bails out to avoid nasty conflicts.
>>
>> After thinking through things, I still think that from a layman's point
>> of view this is a functional bug -- ideally drbd should recognize the
>> fact that it's in this state, and work around it ;-). But now I
>> appreciate the difficulties. Stonith is an option, yes, but one I'd
>> prefer not to use if I can avoid it.
>
> when it was all automatic, then it would just work, because of fencing.
> if some operator did these things, then he is expected to know what he
> is doing, and on promoting the not connected secondary to primary he
> should use the --human flag.
>
> a-ha!
> :)

ho ho! :)

Thanks for pointing that flag out; I missed it on my first pass thru the
docs... I'll try it out tomorrow when I get in. (Still, I'm actually
downgrading the disconnected primary - I wonder if it will have an
effect? we'll see..)

>>> we are going to provide a config mechanism somewhen, where
>>> one can configure that the node with less modification will
>>> be chosen, or the current primary will be chosen, or that
>>> ... there are many possible ways.
>>
>> Hmm... I'm quite interested in these options... it's true that a node
>> with less modifications will typically need to be the one that gets
>> sync'd. Might be an idea to let them be rules (ie: if current primary
>> AND has more modifications ...).
>>
>> Thinking out loud...
>
> remember: if you think here about how to cope with multiple failures:
> happy thinking. will give your brain a tough twist...
>
> if you just think how to make drbd to
> do what you (as an operator) mean,
> rather do what drbd expects the operator to do.

Yeah, fuzzy line... But I begin to see why this problem should be solved
outside of drbd. And you're right that I hadn't thought how to cope with
multiple failovers (perhaps sync from the drbd that was the previous
master if it had more / the same number of changes). But if both nodes
in a cluster failed, I think I'd want an admin checking things out
manually...

> yes, I know, we should better consolidate the documentation and
> hints, and more prominently give advice for the weird multiple
> failure corner cases. but somewhere there is all this
> expertise: if all else fails, contact linbit.

Well, I suppose I'm kind of doing that via this list ;-)

But in all seriousness, if we go with drbd, my team will be expected to
sort out any failures quickly to minimize downtime (that's why I'm
looking into this ;-), so it's best that we know the tools well.

>> What about a preventative option:
>> a. become secondary & discard modifications on connection loss
>> (ok, so that's crap for primaries - forget it)
>
> hehehehe... :)
>
>> On that train of thought, a command to discard all changes since last
>> connected to peer could be handy? Something like the 'invalidate', but
>> one that doesn't force a complete resync.
>
> you can always resort to manual override of the generation counters...
> but that is intentionally undocumented :)

I wonder why? ;)

I'm assuming a generation counter is something attached to change sets..
Out of curiosity, do they get 'reset' (or similar) when you stop & start
drbd?

>> Sounds like it's more of a high-level problem when I think about it...
>> Maybe better solved at the FS or failover layer. I can see why human
>> interaction here is a good thing.
>>
>>
>>> the interessting point is what do we _do_ now.
>>> and I think we are not too bad currently.
>>
>> I agree ;-)
>> I'm just trying to give feedback as I learn to help improve drbd.
>
> thank you for that. really.
> and please continue to do so.

No worries - I will do. That's how OSS improves, afterall...

> --
> please use the "List-Reply" function of your email client.

Unfortunately Apple haven't introduced that into Mail.app yet :-/
I keep thinking Mutt, but then I get lazy...

cheers,
-Steve
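
PS: to make the "rules" idea from the thread a bit more concrete, here is a rough Python sketch of what such a policy might look like if it lived outside drbd, e.g. in a failover script. It is not how drbd works or is configured. The names (NodeState, changes_since_split, pick_sync_source) are made up for illustration, and it assumes something else has already worked out each node's role and how much it changed while disconnected. Anything the single rule does not clearly cover is handed back to an operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node summary gathered after the link comes back."""
    name: str
    is_primary: bool
    changes_since_split: int  # assumed count of modifications made while disconnected


def pick_sync_source(a: NodeState, b: NodeState) -> Optional[NodeState]:
    """Return the node whose data should win, or None to defer to an operator.

    Single illustrative rule: the current primary wins only if its peer is
    not primary and has made no more changes than it has. Everything else
    (both primary, neither primary, secondary changed more) is ambiguous.
    """
    for node, peer in ((a, b), (b, a)):
        if (
            node.is_primary
            and not peer.is_primary
            and node.changes_since_split >= peer.changes_since_split
        ):
            return node
    return None  # ambiguous: leave it to a human


if __name__ == "__main__":
    old_primary = NodeState("alpha", is_primary=False, changes_since_split=2)
    new_primary = NodeState("beta", is_primary=True, changes_since_split=10)
    winner = pick_sync_source(old_primary, new_primary)
    print(winner.name if winner else "manual intervention needed")
```

Returning None rather than guessing matches the point made above: with multiple failures, an admin should be checking things out manually.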