Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 2 November 2010 22:57, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> On Tue, Nov 02, 2010 at 10:07:17PM +0100, Pavlos Parissis wrote:
>> On 2 November 2010 16:15, Dan Frincu <dfrincu at streamwide.ro> wrote:
>> > Hi,
>> >
>> > Pavlos Parissis wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am trying to figure out how I can resolve the following scenario.
>> >>
>> >> Facts
>> >> 3 nodes
>> >> 2 DRBD ms resources
>> >> 2 group resources
>> >> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node-02
>> >> drbd1/group1 can only run on node-01 and node-03
>> >> drbd2/group2 can only run on node-02 and node-03
>> >> DRBD fencing_policy is resource-only [1]
>> >> 2 heartbeat links, one of them also used for DRBD communication
>> >>
>> >> Scenario
>> >> 1) node-01 loses both heartbeat links
>> >> 2) DRBD monitor detects the absence of the DRBD communication first
>> >> and does resource fencing by adding a location constraint which
>> >> prevents drbd1 from running on node-03
>> >> 3) pacemaker fencing kicks in and kills node-01
>> >>
>> >> Due to the location constraint created at step 2, drbd1/group1 can't
>> >> run anywhere in the cluster.
>> >>
>> > I don't understand exactly what you mean by this. Resource-only fencing
>> > would create a -inf score on node1 when the node loses the drbd
>> > communication channel (the only one drbd uses),
>>
>> Because node-01 is the primary at the moment of the failure,
>> resource-fencing will create a -inf score for node-03.
>>
>> > however you could still have
>> > heartbeat communication available via the secondary link, then you shouldn't
>>
>> As I wrote, none of the heartbeat links is available.
>> After I sent the mail, I realized that node-03 will not see the
>> location constraint created by node-01 because there is no heartbeat
>> communication!
>> Thus I think my scenario has a flaw, since none of the heartbeat links
>> is available on node-01.
>> Resource-fencing from DRBD will be triggered but without any effect,
>> and node-03 or node-02 will fence node-01, and node-03 will become
>> the primary for drbd1.
>>
>> > fence the entire node, the resource-only fencing does that for you, the only
>> > thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
>> > handlers {
>> >     fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
>> >     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>> > }
>> >
>> > Is this what you meant?
>>
>> No.
>> Dan, thanks for your mail.
>>
>> Since there is a flaw in that scenario, let's define a similar one.
>>
>> Status
>> node-01 primary for drbd1, and group1 runs on it
>> node-02 primary for drbd2, and group2 runs on it
>> node-03 secondary for drbd1 and drbd2
>>
>> 2 heartbeat links, one of them also used for DRBD communication
>>
>> Here is the scenario
>> 1) on node-01 the heartbeat link which also carries the DRBD
>> communication is lost
>> 2) node-01 does resource-fencing and places a -inf score for drbd1 on node-03
>> 3) on node-01 the second heartbeat link is lost
>> 4) node-01 will be fenced by one of the other cluster members
>> 5) drbd1 can't run on node-03 due to the location constraint created at step 2
>>
>> The problem here is that the location constraint will still be active
>> even after node-01 is fenced.
>
> Which is good, and intended behaviour, as it protects you from
> going online with stale data (changes between 1) and 4) would be lost).
>
>> Any ideas?
>
> The drbd setting "resource-and-stonith" simply tells DRBD
> that you have stonith configured in your cluster.
> It does not by itself trigger any stonith action.
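If I follow you, that would mean changing my drbd.conf from "fencing
resource-only;" to something along these lines (the resource name is
just an example from my test setup, and I still have to verify it):

  resource r1 {
    disk {
      # tell DRBD that the cluster has stonith configured;
      # today I run with "fencing resource-only;"
      fencing resource-and-stonith;
    }
    handlers {
      # the Pacemaker fence handlers Dan mentioned
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }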
> > So if you have stonith enabled, and you want to protect against being > shot while modifying data, you should say "resource-and-stonith". I do have stonith enabled in my Cluster, but I don't quite understand what you have wrote. The resource-and-stonith setting will add the location constraint as the fencing resource-only and it will also prevent a node with a role of primary to be fenced, am I right? So, what happens when Cluster sends a fence event? Initially, I thought this setting will trigger a fence event and I didn't use it because I wanted to avoid a node which have the role of secondary for drbd1 and the role primary for drbd2 to be fenced because the replication link for drbd1 was lost. I think I need to experiment with this setting in order to understand it > > What exactly do you want to solve? > > Either you want to avoid going online with stale data, > so you place that contraint, or use dopd, or some similar mechanism. > > Or you don't care, so you don't use those fencing scripts. > > Or you usually are in a situation where you not want to use stale data, > but suddenly your primary data copy is catastrophically lost, and the > (slightly?) stale other copy is the best you have. > > Then you remove the constraint or force drbd primary, or both. > This should not be outomated, as it involves knowledge the cluster > cannot have, thus cannot base decisions on. > > So again, > > What is it you are trying to solve? Manually intervention for doing what you wrote on the last paragraph. Looking at the setting for split-brain I thought that it would be useful to have something similar for these scenarios. I have been reading several post related to this topic and the more posts I read the more I realize that any automatic resolution will basically abolish the work that have been done on DRBD to avoid data corruption Lars, thanks for your mail, Pavlos