[DRBD-user] Help with drbddisk modification to block takeover when the local resource is not in a safe state

Tue Sep 7 04:21:57 CEST 2010

On Tue, Sep 7, 2010 at 3:37 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Mon, Sep 06, 2010 at 10:02:51PM +0800, jan gestre wrote:
>> On Mon, Sep 6, 2010 at 8:48 PM, Lars Ellenberg
>> <lars.ellenberg at linbit.com> wrote:
>> > On Mon, Sep 06, 2010 at 08:34:40PM +0800, jan gestre wrote:
>> >> Hi Everyone,
>> >>
>> >> I've found this drbddisk modification that blocks takeover when
>> >> the local resource is not in a safe state. However, it only works
>> >> with a single resource; since I have two resources, r0 and r1, it
>> >> does not work for me.
>> >>
>> >> case "$CMD" in
>> >>    start)
>> >>      # forbid to become primary if resource is not clean
>> >>      DRBDSTATEOK=`cat /proc/drbd | grep ' cs:Connected ' | grep ' ds:UpToDate/' | wc -l`
>> >>      if [ $DRBDSTATEOK -ne 1 ]; then
>> >>        echo >&2 "drbd is not in Connected/UpToDate state. refusing to start resource"
>> >>        exit 1
>> >>      fi
>> >>
>> >> I would be truly grateful if anyone could show me how to adapt
>> >> this modification for more than one resource.
>> >>
>> >> I'm trying to prevent a split-brain scenario here, and I'm still
>> >> testing my setup. I was in a predicament earlier in which one
>> >> resource, r1, was in a healthy state while r0 was in a StandAlone
>> >> Primary/Unknown state; I had to issue drbdadm -- --discard-my-data r0
>> >> to resolve the split brain.
>> >
>> > No Sir.
>> >
>> > What if the Primary dies? Hard?
>> > You now want your Secondary to take over, no?
>> > Well, you cannot anymore. Because it is not Connected.
>> > How could it, you just lost the peer ;-)
>> >
>> > Don't focus only on one specific scenario.
>> > Because, if you just "fix" that specific scenario,
>> > you break a truckload of others.
>> >
>> > Maybe it helps a bit to read
>> > http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg04312.html
>> >
>>
>> Thanks Lars, but now I am confused; maybe you can enlighten me. You're
>> saying that I would be better off without the modification. What would
>> you recommend, then, to prevent split brain? Adding a STONITH device,
>> e.g. IBM RSA? Adding handlers like dopd?
>>
>> BTW, I got the modification from this url -->
>> http://lemonnier.se/erwan/blog/item/53/
>
> Which is misleading.
> And, it is not an attempt to avoid split brain,
> but to avoid diverging data sets,
> one of the ill consequences a split brain can lead to.
>
> What the presented patch does is disable takeover in case the Primary node dies.
> So why then have heartbeat, in the first place?
>
>
>
> I'll partially quote that blog:
>
> | Let's take an example: two nodes, N0 and N1. N0 is primary, N1 is secondary.
> | Both have redundant heartbeat links and at least one dedicated drbd
> | replication link. Let's consider the (highly) hypothetical case when the drbd
> | link goes down, soon followed by a power outage for N0. What will happen in a
> | standard heartbeat/drbd setup is that when the drbd link goes down, the drbd
> | daemon will set the local resources on both nodes in state 'cs:WFConnection'
> | (Waiting For Connection) and mark the peer data as outdated.
>
> So far that is correct. Where "the drbd daemon" would be dopd.  Or, in a
> pacemaker cluster, you could also use the crm-fence-peer script to achieve
> a similar effect.
>
> | Then when N0
> | disappears due to the power outage, heartbeat on N1 will take over resources
> | and become the primary node.
>
> Which is wrong.
>
> First, drbd will refuse to be promoted if it is outdated.
> So this outdating seems to have not worked in the above setup.
> Fix it.
>
> | What we may want is to forbid a node to become primary in case its drbd
> | resources are not in a connected and up-to-date state.
>
> Which you already have: if it is Outdated, it cannot be promoted.
>
>
> Second, in a properly configured Pacemaker setup,
> Pacemaker (resp. the drbd OCF resource agent) would already know,
> and not even try to promote it on the outdated node.
>
>
>
> Besides, it should be a very unlikely event that a just rebooted, isolated node
> decides to take over resources.
>
> Maybe you should increase your initdead time.
>
> Or wait for connection before even starting heartbeat/pacemaker.
> In haresources-mode heartbeat clusters using drbddisk, the drbd
> wfc-timeout parameter is used for this. Its default is "unlimited",
> so by default the drbd init script would in most cases wait forever for drbd
> to establish a connection to its peer, thereby blocking the bootup process on
> purpose. Heartbeat would only start once DRBD was able to establish its
> connection.
>
>
>
> Additionally maybe add a third node, so you have real quorum?
>
> But it depends on you, and what you want to achieve, of course.
> There is no one single best way.
>
> The pacemaker list post about whether or not a DRBD setup needs STONITH
> (I put the link here again)
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg04312.html
> explains why with DRBD (and basically any not shared, but replicated resource)
> STONITH alone is NOT SUFFICIENT at all, and resource level fencing alone
> is not sufficient if you lose all communication paths at the same time
> (or in too quick succession for the chosen outdate mechanism to work).
> So if you are paranoid enough, you need both, and real quorum,
> and maybe on boot start nothing but sshd.
>
> And even then, I'm sure someone can come up with a multiple failure scenario,
> possibly involving operator failure, to still get diverging data sets ;-)
>
>
>
> And, btw, this part does not make sense to me either:
>
> | if you are using a stonith device, you may want to modify the stonith script
> | to forbid stonithing the peer if the local resources are not in
> | connected/up-to-date state. There might indeed be a chance that the peer node
> | still is functional while the local node definitely is not.
>
> You need STONITH to make sure that a node that you think is dead (i.e. you can
> no longer communicate with -- but you still have the doubt that it may only be
> the communication that is broken, not the node) really is dead.
> Now you forbid the STONITH operation in case DRBD is not connected,
> i.e. not communicating with its peer.
> Wait.
> Wasn't communication failure the only reason you wanted to use STONITH in the
> first place?
>
> --

Many thanks, Lars, for your very informative response. And yes, that's
the only reason I wanted to use STONITH. I'm just using an R1-style
configuration, so I'm not sure whether the above still applies.
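
For reference, the resource-level fencing via dopd mentioned above can be
sketched roughly as follows for an R1-style (haresources) cluster. This is
only a minimal sketch under assumptions, not a tested configuration: it
assumes DRBD 8.3 (where the handler is named fence-peer; older releases call
it outdate-peer), uses r0 as the resource name, and assumes the heartbeat
helpers live under /usr/lib/heartbeat (on some systems the path is
/usr/lib64/heartbeat instead).

```
# /etc/ha.d/ha.cf (heartbeat): start dopd and allow it to use the API
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf: enable resource-level fencing for r0
resource r0 {
  disk {
    # on loss of the replication link, call the fence-peer handler
    fencing resource-only;
  }
  handlers {
    # asks dopd to mark the peer's data as Outdated
    # (assumed path and timeout; adjust to the local installation)
    fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  # ... existing device, disk, address and meta-disk settings stay as they are
}
```

The second resource, r1, would need the same disk and handlers sections. With
the peer marked Outdated this way, DRBD itself refuses to promote the outdated
side, so no extra check in drbddisk is needed for that case.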