[DRBD-user] Heartbeat & DRBD.SplitBrain.Auto recovering

Lars Ellenberg lars.ellenberg at linbit.com
Fri Oct 31 11:33:00 CET 2008


On Fri, Oct 31, 2008 at 10:18:04AM +0100, Francisco José Méndez Cirera wrote:
> Hi there, I'm using Heartbeat 2.1.4 and DRBD 8.2.7. I have a problem with
> split-brain recovery.
> 
> I have a two-node cluster in active/passive mode. During normal operation, the
> master node runs DRBD in the primary role, while the slave node runs in the
> secondary role. For testing purposes, I unplug the ethernet cable, and because
> of that, they have no communication (so split brain is expected to occur when
> it is plugged in again).

no No NO.

this _IS_ split brain.

at least as far as the drbd resource is concerned, possibly even cluster
wide; you did not tell us how many other heartbeat communication links
you have.

if this was your only cluster communication link, that setup is
seriously broken.

split brain is the situation where both nodes act as primary,
because either thinks the other is dead.

it can only be _detected_ once communication is re-established.

>  In
> this situation, the master node shows DRBD as "primary/unknown" and the slave
> node shows "primary/unknown". This situation is considered ok, and there is no
> problem because this is the expected behaviour.

this situation is absolutely NOT OK.
prevent this from happening.

> The problem arises when the ethernet cable is plugged in again and
> communication is back up. The master node always shows "primary/unknown" and
> the slave node remains in "primary/unknown" for about 90 seconds. After that,
> the slave shows "secondary/unknown".
> 
> I would like to know:
>  
> Why is Heartbeat taking so long to "demote" the slave node from primary to
> secondary? I had to increase "cluster-timeout-action" to 120 seconds....
>
> DRBD never recovers "primary/secondary" state after Split Brain, even if
> drbd.conf is configured to discard changes in younger primary (that's what I
> need):
>  
> after-sb-0p discard-younger-primary;
> after-sb-2p violently-as0p;
> 
> What can I do??

look into the logs.

configure multiple heartbeat communication channels.
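
for illustration, a minimal ha.cf sketch with redundant heartbeat paths.
the interface names and the serial device are assumptions, not taken
from the original poster's setup:

```
# /etc/ha.d/ha.cf (fragment, hypothetical values)
# heartbeat over two independent ethernet links
bcast eth0 eth1
# plus a null-modem serial link as a further path
serial /dev/ttyS0
baud 19200
```

with more than one path, pulling a single cable no longer makes either
node conclude that its peer is dead.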

configure dopd.
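
a sketch of what a dopd setup typically looks like with heartbeat 2.x
and drbd 8.2. the resource name r0 is hypothetical, and the paths assume
a 32-bit install (on 64-bit systems they are usually under
/usr/lib64/heartbeat); check both against your own installation:

```
# /etc/ha.d/ha.cf (fragment)
# start the drbd outdate-peer daemon and let it talk to heartbeat
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf (fragment, resource name r0 is an assumption)
resource r0 {
  disk {
    # on loss of the replication link, try to outdate the peer's data
    fencing resource-only;
  }
  handlers {
    # handler name as used in drbd 8.2; later releases renamed it
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}
```

the idea: when the replication link fails, the primary asks dopd, over
the remaining heartbeat links, to mark the peer's data as outdated. an
outdated secondary refuses to be promoted, so you do not end up with two
primaries.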


out of curiosity: what application(s) do you intend to run on that?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


