[Drbd-dev] DRBD8: pri-lost-after-sb handler not kicking in on Split-brain or so it seems

Montrose, Ernest Ernest.Montrose at stratus.com
Wed Oct 25 16:20:49 CEST 2006


Phil,
Thanks for the clarification on the algorithm. Based on what you explained, I may no longer have an immediate need for the handler to be called every time. But I think that calling it every time gives increased flexibility. Besides, if it is called every time we would still keep the current behavior while adding that extra bit of flexibility. I may just be paranoid here, thinking that I will need this somehow soon, though I cannot think of what it would really be good and critical for at this moment :)

Thanks again,
EM--

-----Original Message-----
From: Philipp Reisner [mailto:philipp.reisner at linbit.com] 
Sent: Wednesday, October 25, 2006 9:40 AM
To: drbd-dev at linbit.com
Cc: Montrose, Ernest
Subject: Re: [Drbd-dev] DRBD8: pri-lost-after-sb handler not kicking in on Split-brain or so it seems

On Wednesday, 25 October 2006 00:06, Montrose, Ernest wrote:
> Hi all,
> I set my configuration to allow two primaries and have those parameters
> set:
>
> allow-two-primaries;
> after-sb-0pri discard-least-changes;
> after-sb-2pri call-pri-lost-after-sb;
>
> I, of course, set a pri-lost-after-sb handler. I then induced a split
> brain (ifdown hbiface; ifup hbiface).
> I then issued a drbdadm connect all on my primary.
>
> What happens is that:
> *	One of the nodes is forced to be secondary. (The original state
> was Primary/Primary before the split brain.)
> *	My handler is never called.

The idea is: 

1) run the "discard-least-changes" algorithm
2) try to make the loser secondary (and start resync)
3) if that fails, call the user space helper

You might ask why it works like that ...

The root problem is that after a split brain, with both nodes in
primary state, we have to discard the data of one of the two nodes.

In case there is a file system on top of DRBD, I do not see any other
option than to reboot the machine. There is no way a block device
can say to a file system: Sorry, I have to change my content, please
do not mind ;)

But if a device can be switched into secondary state, we know that
there is no file system on top of DRBD and therefore the reboot
can be avoided!
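
The three steps and the reasoning above can be sketched roughly as follows.
This is only an illustration in Python, not the code in drbd_receiver.c:
the Node class, the changed_blocks counter, the device_open flag and all
function names are hypothetical, and ties are broken arbitrarily here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    changed_blocks: int   # hypothetical: blocks written while disconnected
    device_open: bool     # hypothetical: True if e.g. a file system holds the device
    role: str = "Primary"


def pick_loser(a: Node, b: Node) -> Node:
    """Step 1: discard-least-changes, the node with fewer changes loses."""
    return a if a.changed_blocks <= b.changed_blocks else b


def try_demote(node: Node) -> bool:
    """Step 2: demotion only works if nothing holds the device open."""
    if node.device_open:
        return False
    node.role = "Secondary"
    return True


def recover_two_primaries(a: Node, b: Node,
                          handler: Callable[[Node], None]) -> str:
    """Run steps 1 to 3 and report which path was taken."""
    loser = pick_loser(a, b)
    if try_demote(loser):
        return "resync"        # loser is now secondary, no helper needed
    handler(loser)             # Step 3: only reached when demotion fails
    return "handler-called"
```

In the report quoted in this mail the demotion in step 2 succeeded, which
corresponds to the "resync" path, so the helper was never invoked.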


Actually we should avoid split brain situations in the first place,
see the "outdate-peer" handler!

>
> A quick look at drbd_receiver.c:drbd_asb_recover_2p() reveals that my
> handler would not be called because the node was successfully
> set to secondary.  Is this by design?  I would like for the
> handler to be called every time I attempt to recover from SB. Or at
> least,
> if in fact the SB is "automatically solved", then my state of
> Primary/Primary should return.  Any thoughts?

Ernest, 
I see that you want to have a handler that is called every time a
split-brain happens. I might consider this, but first I want to
understand what you want to do with that handler, and what it could
be good for...

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

