Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 2 November 2010 22:57, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> On Tue, Nov 02, 2010 at 10:07:17PM +0100, Pavlos Parissis wrote:
>> On 2 November 2010 16:15, Dan Frincu <dfrincu at streamwide.ro> wrote:
>> > Hi,
>> >
>> > Pavlos Parissis wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am trying to figure out how I can resolve the following scenario.
>> >>
>> >> Facts
>> >> 3 nodes
>> >> 2 DRBD ms resources
>> >> 2 group resources
>> >> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node-02
>> >> drbd1/group1 can only run on node-01 and node-03
>> >> drbd2/group2 can only run on node-02 and node-03
>> >> DRBD fencing_policy is resource-only [1]
>> >> 2 heartbeat links, one of them also used for DRBD communication
>> >>
>> >> Scenario
>> >> 1) node-01 loses both heartbeat links
>> >> 2) DRBD monitor detects the absence of the DRBD communication first
>> >> and does resource fencing by adding a location constraint which
>> >> prevents drbd1 from running on node-03
>> >> 3) pacemaker fencing kicks in and kills node-01
>> >>
>> >> Due to the location constraint created at step 2, drbd1/group1 can't
>> >> run anywhere in the cluster.
>> >>
>> > I don't understand exactly what you mean by this. Resource-only fencing
>> > would create a -inf score on node1 when the node loses the drbd
>> > communication channel (the only one drbd uses),
>>
>> Because node-01 is the primary at the moment of the failure,
>> resource-fencing will create a -inf score for node-03.
>>
>> > however you could still have
>> > heartbeat communication available via the secondary link, then you shouldn't
>>
>> As I wrote, none of the heartbeat links is available.
>> After I sent the mail, I realized that node-03 will not see the
>> location constraint created by node-01 because there is no heartbeat
>> communication!
>> Thus I think my scenario has a flaw, since none of the heartbeat links
>> is available on node-01.
>> Resource-fencing from DRBD will be triggered but without any effect,
>> and node-03 or node-02 will fence node-01, and node-03 will become
>> the primary for drbd1.
>>
>> > fence the entire node, the resource-only fencing does that for you, the only
>> > thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
>> > handlers {
>> >     fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
>> >     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>> > }
>> >
>> > Is this what you meant?
>>
>> No.
>> Dan, thanks for your mail.
>>
>> Since there is a flaw in that scenario, let's define a similar one.
>>
>> Status
>> node-01 primary for drbd1, and group1 runs on it
>> node-02 primary for drbd2, and group2 runs on it
>> node-03 secondary for drbd1 and drbd2
>>
>> 2 heartbeat links, one of them also used for DRBD communication
>>
>> Here is the scenario
>> 1) on node-01 the heartbeat link which also carries the DRBD
>> communication is lost
>> 2) node-01 does resource-fencing and places a -inf score for drbd1 on node-03
>> 3) on node-01 the second heartbeat link is lost
>> 4) node-01 will be fenced by one of the other cluster members
>> 5) drbd1 can't run on node-03 due to the location constraint created at step 2
>>
>> The problem here is that the location constraint will still be active
>> even after node-01 is fenced.
>
> Which is good, and intended behaviour, as it protects you from
> going online with stale data (changes between 1) and 4) would be lost).
>
>> Any ideas?
>
> The drbd setting "resource-and-stonith" simply tells DRBD
> that you have stonith configured in your cluster.
> It does not by itself trigger any stonith action.
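If I follow you, that would mean changing my drbd.conf from "fencing
resource-only;" to something along these lines (the resource name is
just an example from my test setup, and I still have to verify it):

  resource r1 {
    disk {
      # tell DRBD that the cluster has stonith configured;
      # today I run with "fencing resource-only;"
      fencing resource-and-stonith;
    }
    handlers {
      # the Pacemaker fence handlers Dan mentioned
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }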
> > So if you have stonith enabled, and you want to protect against being > shot while modifying data, you should say "resource-and-stonith". I do have stonith enabled in my Cluster, but I don't quite understand what you have wrote. The resource-and-stonith setting will add the location constraint as the fencing resource-only and it will also prevent a node with a role of primary to be fenced, am I right? So, what happens when Cluster sends a fence event? Initially, I thought this setting will trigger a fence event and I didn't use it because I wanted to avoid a node which have the role of secondary for drbd1 and the role primary for drbd2 to be fenced because the replication link for drbd1 was lost. I think I need to experiment with this setting in order to understand it > > What exactly do you want to solve? > > Either you want to avoid going online with stale data, > so you place that contraint, or use dopd, or some similar mechanism. > > Or you don't care, so you don't use those fencing scripts. > > Or you usually are in a situation where you not want to use stale data, > but suddenly your primary data copy is catastrophically lost, and the > (slightly?) stale other copy is the best you have. > > Then you remove the constraint or force drbd primary, or both. > This should not be outomated, as it involves knowledge the cluster > cannot have, thus cannot base decisions on. > > So again, > > What is it you are trying to solve? Manually intervention for doing what you wrote on the last paragraph. Looking at the setting for split-brain I thought that it would be useful to have something similar for these scenarios. I have been reading several post related to this topic and the more posts I read the more I realize that any automatic resolution will basically abolish the work that have been done on DRBD to avoid data corruption Lars, thanks for your mail, Pavlos