> What is the best way to manage a split-brain on a Master<>Master setup on
> DRBD 8.0 ?
Personally, I'd say "Manually" or "With extreme prejudice". Anything
else is likely to cause difficulty somewhere - and that's a generic
thing about split brain, not drbd specific.
> DRBD will see by itself if a node is up, from what I understood,
DRBD will see if the node is up, and available via a network
connection. There's no magic "It's up, even though I can't talk to it
over the defined network" option. Support for a secondary network
link would be nice, but it's probably not worth the extra effort.
> will it be automatically synced 1-1 both ways ?
If they are both still in "primary" mode, then this isn't going to
work. There's no way to do a 2 way resync without understanding of
the higher level data - and even then you're likely to get conflicts.
Take the simple case of a single EXT3 partition that got mounted on
both as a result of a split brain - you could use something like rsync
to resync the filesystems, but that wouldn't necessarily be the right
thing to do anyway:
1) Edit file A on node 1 (A->A1), edit file B on node 2 (B->B2)
2) Now wait a while, and edit B on node 1 (B->B1), and A on node 2 (A->A2).
3) remerge, and rsync.
4) You are left with A2 & B1, which means that you've not got the
correct data from either mirror, and probably nothing that makes any
sense (think about them as config files, you've got half the config
from each node).
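
To make the example above concrete, here is a rough Python sketch of a
"newest copy wins" merge, which is roughly what a two-way rsync with
--update would give you. The timestamps and the newest_wins() helper are
made up for illustration; only the file names and versions come from the
steps above.

```python
# Each node's view of the filesystem: filename -> (mtime, content).
# The timestamps are invented values that follow steps 1 and 2 above.
node1 = {"A": (1, "A1"), "B": (2, "B1")}
node2 = {"A": (2, "A2"), "B": (1, "B2")}


def newest_wins(left, right):
    """Merge two file trees, keeping per file whichever copy was modified last."""
    merged = {}
    for name in sorted(set(left) | set(right)):
        candidates = [tree[name] for tree in (left, right) if name in tree]
        # Pick the copy with the highest modification time.
        merged[name] = max(candidates, key=lambda entry: entry[0])
    return merged


merged = newest_wins(node1, node2)
contents = {name: content for name, (_, content) in merged.items()}
print(contents)  # {'A': 'A2', 'B': 'B1'}
```

The merged tree matches neither node1 (A1, B1) nor node2 (A2, B2), which
is exactly the mixed state described in step 4.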
> Some People claim that when node-2 came back online it needs to resync all
> the data from node-01. Is this true, or is it smart enough to only sync
> the new files ?
Files are at a different level to the drbd device. Without teaching
the sync utility about all the available file systems (ext2, ext3,
reiserfs, jfs, xfs, gfs, ocfs, the list goes on) this couldn't
happen. And even then (see example above) it is probably not what you
want.
You could possibly get away with re-syncing any blocks that have
changed on either node to the secondary you pick (e.g. add together
the change list for both nodes, and push that) but I'm not sure that
I'd want to do it that way, even if in theory it would work... I'm too
scared that my data would be corrupted (although if that is what this
does, I'm happy to trust these guys - it's my code I don't trust).
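
As a sketch of the "add together the change list for both nodes" idea,
something like the following Python would do it. This is only an
illustration of the theory, not a description of what DRBD actually does
internally; the function names and the list-of-blocks "devices" are
hypothetical.

```python
def blocks_to_resync(dirty_on_survivor, dirty_on_victim):
    """Union of the block numbers changed on either node since the split."""
    return sorted(set(dirty_on_survivor) | set(dirty_on_victim))


def resync(survivor, victim, dirty_on_survivor, dirty_on_victim):
    """Overwrite every changed block on the victim with the survivor's copy."""
    for block in blocks_to_resync(dirty_on_survivor, dirty_on_victim):
        victim[block] = survivor[block]
    return victim


# Both nodes start from the same four blocks, then diverge during the split.
survivor = ["a", "b-from-survivor", "c", "d"]   # block 1 written on survivor
victim = ["a", "b", "c", "d-from-victim"]       # block 3 written on victim

resync(survivor, victim, dirty_on_survivor={1}, dirty_on_victim={3})
print(victim == survivor)  # True
```

Note that block 3 has to be pushed even though the survivor never touched
it: the victim's own write to that block must be thrown away, otherwise
the two copies would still differ after the resync.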
> I think DRBD 8.0 has almost everything in this case you need, the only
> thing is a split-brain that you have to manage well.
With split brain and drbd, one of the two nodes is about to be told
that it's wrong. That its data is wrong, and that all that it thinks
it knows about the drbd device is wrong, and this tends to get ugly.
The only safe way (without lots of hooks into lots of applications) to
do this is to kill anything that's talking to the device, refresh the
device from the copy you want to use (or just make it available during
the resync), and then let things access it again. You can't pause and
re-allow, since that way the app could easily have cached data - it
has to be a kill. Personally my belief is that the best option is to
reboot the secondary node - that way you guarantee that everything is
reset to a known good state - but I certainly accept that this is a
little heavy for some uses. I think it's configurable within the drbd
config file - mine is set to disconnect instead, and wait for me to
deal with it - as I said, I think manual intervention is the way to go
at this point.
To put it another way, the only way to really deal with split brain is
to not let it develop in the first place - and this is something
that's been causing grief for clusters for a long time. In a 2 node
environment you basically have three options (that I can think of) for
how to avoid it:
1) Have a 3rd "thing" that you just use as a votekeeper.
Advantage: This means that you've got 3 votes available, and therefore
you can never have a 50/50 split.
Disadvantage: You have extra complexity, and dependence on a 3rd device.
2) Weight one of the nodes as "more important".
Advantage: Very simple to do, very easy to configure, "just works".
Disadvantage: The other node cannot operate if the more important one is
not available, without manual intervention
3) STONITH (Shoot The Other Node In The Head)
Advantage: It means that one node will be down if you ever end up in a
split brain situation.
Disadvantage: It kills one of the machines (fsck, etc) - and normally needs
human intervention to bring it back. You can also, if you are
unlucky, end up with both nodes dead (have had this happen with
Sun Cluster 2.1). Which is great for data consistency, but is
a bit silly.
With any even number of "votes", you've got the possibility of a 50/50
split. With 4 or more you've got other options that would work
reasonably well - two nodes is often treated as a "special case".
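
The vote counting behind this can be sketched in a few lines of Python.
This is a generic strict-majority check, not any particular cluster
manager's implementation, and the function names are made up for the
example.

```python
def has_quorum(votes_held, total_votes):
    """A partition may carry on only if it holds a strict majority of all votes."""
    return 2 * votes_held > total_votes


def can_deadlock(total_votes):
    """True if some two-way split leaves neither side with a majority."""
    for side_a in range(total_votes + 1):
        side_b = total_votes - side_a
        if not has_quorum(side_a, total_votes) and not has_quorum(side_b, total_votes):
            return True
    return False


for votes in (2, 3, 4):
    print(votes, can_deadlock(votes))
# 2 True
# 3 False
# 4 True
```

Option 2 above fits the same model: give the "more important" node 2
votes and the other node 1, so has_quorum(2, 3) is True and
has_quorum(1, 3) is False, which is why the lesser node cannot carry on
alone without manual intervention.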
Graham