[DRBD-user] Heartbeat & DRBD.SplitBrain.Auto recovering

Stefan Seifert nine at detonation.org
Fri Oct 31 10:39:16 CET 2008

On Friday, 31. October 2008, Francisco José Méndez Cirera wrote:

> DRBD never recovers "primary/secondary" state after Split Brain, even if
> drbd.conf is configured to discard changes in younger primary (that's what
> I need):

You don't recover from split brain; you make sure it never occurs!

> For testing purposes, I unplug the ethernet cable, and because of
> that, they have no communication (so Split Brain is expected to occur when
> I plug it in again).

Unplugging any cable should not lead to a split brain in a well-configured
cluster. Here are the things you absolutely _must_ do in order to have a
useful system:

* have redundant communication paths, including a serial cable
* use dopd to outdate your secondary in case the drbd replication connection
   goes down

A serial cable is the simplest and best way to add redundancy to your
heartbeat communication path, because it will still work even if you
completely shut down the network on a node, for example by adding the wrong
firewall rule. Of course, it's a good thing to have more than one network
connection, too. I usually have a direct cable between my nodes for drbd
replication and a second cable for outgoing communication, and I use both
of them for heartbeat, too.
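
As a rough sketch, such a setup could look like this in ha.cf. The serial
device, the baud rate and the interface names (eth0 for the outgoing link,
eth1 for the direct replication cable) are only assumptions; adjust them to
your hardware:

```
# /etc/ha.d/ha.cf (fragment, example values only)

# heartbeat over a null-modem cable between the two nodes
baud    19200
serial  /dev/ttyS0

# heartbeat over both network links as well
bcast   eth0 eth1
```

With three independent paths, heartbeat only considers the peer dead when
all of them fail at the same time.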

dopd makes sure that you don't get a drbd split brain if your replication
link goes down: it uses the remaining heartbeat communication channels to
outdate the secondary, so it is neutralized until communication is restored.
You have to use the newest drbd and heartbeat versions to get it working.
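
A minimal configuration sketch for dopd, along the lines of what the DRBD
documentation describes. The resource name "r0" is a placeholder, the
/usr/lib/heartbeat path may differ on your distribution (for example
/usr/lib64/heartbeat), and the handler name depends on the drbd version, so
please check this against the documentation of the version you run:

```
# /etc/ha.d/ha.cf (fragment): let heartbeat start dopd and allow it to
# talk to the heartbeat API
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf (fragment): "r0" is a placeholder resource name
resource r0 {
  disk {
    # ask the handler below to outdate the peer when the link is lost
    fencing resource-only;
  }
  handlers {
    # this handler is named "fence-peer" in newer drbd releases
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}
```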

It is also a good idea to have a couple of ping nodes, so heartbeat can
detect a loss of communication with the outside world even if the cluster
communication itself works normally.
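
For a cluster without the CRM, ping nodes could be set up roughly like this.
The addresses are placeholders (pick hosts that are always up, such as your
gateway or a switch), and the ipfail path may differ on your distribution:

```
# /etc/ha.d/ha.cf (fragment, placeholder addresses)
ping 192.168.1.1
ping 192.168.1.254

# ipfail moves resources to the node that can still reach the ping nodes
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
```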

Hope this helps.

