[DRBD-user] Question for Split Brain

Tue Apr 15 17:57:57 CEST 2008

Hi Gordan and Everyone,

Thank you for your tips.

Now that you mention it, this is the error I get when the process fails:

Apr 14 03:44:16 tweety1 kernel: drbd1: receiver terminated
Apr 14 03:44:16 tweety1 kernel: drbd1: receiver (re)started
Apr 14 03:44:16 tweety1 kernel: drbd1: conn( Unconnected -> WFConnection )
Apr 14 03:44:16 tweety1 kernel: drbd1: Handshake successful: Agreed network protocol version 88
Apr 14 03:44:16 tweety1 kernel: drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC
Apr 14 03:44:16 tweety1 kernel: drbd1: conn( WFConnection -> WFReportParams )
Apr 14 03:44:16 tweety1 kernel: drbd1: Starting asender thread (from drbd1_receiver [2631])
Apr 14 03:44:16 tweety1 kernel: drbd1: data-integrity-alg: <not-used>
Apr 14 03:44:16 tweety1 kernel: drbd1: Split-Brain detected, 2 primaries, automatically solved. Sync from peer node
Apr 14 03:44:16 tweety1 kernel: drbd1: helper command: /sbin/drbdadm pri-lost
Apr 14 03:44:16 tweety1 kernel: drbd1: I shall become SyncTarget, but I am primary!

Any ideas on how to get past this?

Extract from my config:

handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/drbd/outdate-peer.sh on tweety1 192.168.1.251 10.254.254.253 on tweety2 192.168.1.252 10.254.254.254";
    outdate-peer "/sbin/obliterate";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
}
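
For context, the handlers above only react to events. Which node is picked as the sync target after a split brain is decided by the after-sb-* policies in the net section, which are not part of this extract. A minimal sketch of a conservative dual-primary setup, assuming DRBD 8.x syntax; the resource name "r1" is hypothetical and the policy choices are an illustration, not my actual settings:

```
resource r1 {                           # "r1" is a hypothetical resource name
  net {
    allow-two-primaries;                # needed for Primary/Primary operation
    after-sb-0pri discard-zero-changes; # no primaries: keep the side that has changes, if only one does
    after-sb-1pri discard-secondary;    # one primary: the secondary becomes the sync target
    after-sb-2pri disconnect;           # two primaries: do not auto-resolve, stay disconnected
  }
}
```

With after-sb-2pri set to disconnect, a split brain with both nodes primary is left for manual resolution, so neither side's data is discarded automatically.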

Thank you.

-----Original Message-----
From: drbd at bobich.net [mailto:drbd at bobich.net] 
Sent: Monday, April 14, 2008 3:23 PM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Question for Split Brain

On Mon, 14 Apr 2008, Theophanis Kontogiannis wrote:

> If I have two nodes and all the resources are run Primary/Primary, when
> Split Brain is detected, and based on the algorithms, one of them will
> become the sync target.
>
> Let us assume that for some reason (it happens to me once per 2-4 days),
> the worker fails, so the systems operate in split brain condition for a
> long time.

Sounds like you need to fix your networking problem. Reliable
communication between the nodes is a pretty fundamental requirement.

> In that case some files have been written on A side and some on B side.
>
> Also let us assume that node A is the SyncSource and node B is the
> SyncTarget.
>
> This means that all files changed on node A during the SB, will be updated
> on node B but the files changed in node B will not be updated on node A?

Yes. Node A and node B will both end up with the volume image from node A.
All changes on node B will be lost. If this is a problem (and I can't
imagine it not being a problem), you should implement fencing that will
forcefully shut down one of the nodes and fail over the resources (only
the IP addresses if you are running two primaries) to the remaining
primary.
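
As a rough illustration of what fencing looks like on the DRBD side, here is a minimal sketch, assuming DRBD 8.0/8.2 syntax (where the handler is called outdate-peer). The resource name "r1" and the script path are hypothetical placeholders, and the script itself would have to power off the peer reliably through whatever mechanism your hardware provides:

```
resource r1 {                      # "r1" is a hypothetical resource name
  disk {
    fencing resource-and-stonith;  # suspend I/O and call the handler below when the peer is lost
  }
  handlers {
    # Hypothetical path; replace with a script that really shuts down the peer
    outdate-peer "/usr/local/sbin/fence-peer.sh";
  }
}
```

The point of resource-and-stonith is that the surviving node does not continue writing until the handler has reported the peer as fenced, so the two sides cannot diverge.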

Gordan
