[DRBD-user] GFS2 freezes

Lars Ellenberg lars.ellenberg at linbit.com
Wed Oct 31 00:02:01 CET 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tue, Oct 30, 2012 at 10:03:09PM +0400, Zohair Raza wrote:
> On Tue, Oct 30, 2012 at 7:58 PM, Digimer <lists at alteeve.ca> wrote:
> > On 10/30/2012 05:43 AM, Zohair Raza wrote:
> > > I have rebuild the setup, and enabled fencing
> > >
> > > Manual fencing (fence_ack_manual) works okay when I fence a dead
> > > node from the command line, but it does not happen automatically

It is called "Manual fencing" for a reason ...

> > Manual fencing is not in any way supported. You must be able to call
> > 'fence_node <peer>' and have the remote node reset. If this doesn't
> > happen, your fencing is not sufficient.

> fence_node <peer> doesn't work for me
> 
> fence_node node2 says
> 
> fence node2 failed

Which is why you need a *real* fencing device
for automatic fencing.
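
For example, with IPMI/iLO/DRAC BMCs in the nodes, something along these
lines in cluster.conf (addresses and credentials here are made up, adjust
to your hardware; for virtual machines there are agents like fence_virsh
or fence_vmware instead):

    <clusternode name="node1" votes="1" nodeid="1">
            <fence>
                    <method name="ipmi">
                            <device name="ipmi_n1"/>
                    </method>
            </fence>
    </clusternode>
    ...
    <fencedevices>
            <fencedevice name="ipmi_n1" agent="fence_ipmilan"
                    ipaddr="192.168.24.128" login="admin" passwd="secret"/>
            <fencedevice name="ipmi_n2" agent="fence_ipmilan"
                    ipaddr="192.168.24.129" login="admin" passwd="secret"/>
    </fencedevices>

With that in place, "fence_node node2" should actually power-cycle the
peer instead of erroring out.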

> > > Logs:
> > > Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
> > > Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
> > > Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying
> > > to acquire journal lock...
> > > Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent
> > > fence_manual result: error from agent
> > > Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
> > > Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
> > >
> > > Cluster.conf:
> > >
> > > <?xml version="1.0"?>
> > > <cluster name="cluster1" config_version="3">
> > > <cman two_node="1" expected_votes="1"/>
> > > <clusternodes>
> > > <clusternode name="node1" votes="1" nodeid="1">
> > >         <fence>
> > >                 <method name="single">
> > >                         <device name="manual" ipaddr="192.168.23.128"/>
> > >                 </method>
> > >         </fence>
> > > </clusternode>
> > > <clusternode name="node2" votes="1" nodeid="2">
> > >         <fence>
> > >                 <method name="single">
> > >                         <device name="manual" ipaddr="192.168.23.129"/>
> > >                 </method>
> > >         </fence>
> > > </clusternode>
> > > </clusternodes>
> > > <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> > > <fencedevices>
> > >         <fencedevice name="manual" agent="fence_manual"/>
> > > </fencedevices>
> > > </cluster>
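
Whichever agent you end up using: after editing cluster.conf, bump
config_version, then validate and propagate the new config, e.g. with
the RHEL6/cluster3 tools (adjust if your distribution differs):

    ccs_config_validate
    cman_tool version -r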
> > >
> > > drbd.conf:
> > >
> > > resource res0 {
> > >   protocol C;
> > >   startup {
> > >     wfc-timeout 20;
> > >     degr-wfc-timeout 10;
> > >     # enable this only after the resource has been tested successfully:
> > >     become-primary-on both;
> > >   }
> > >   net {
> > >     # the peer-authentication part can be omitted when using a
> > >     # dedicated link for DRBD only:
> > >     # cram-hmac-alg sha1;
> > >     # shared-secret anysecrethere123;
> > >     allow-two-primaries;
> > >   }
> > >   on node1 {
> > >     device /dev/drbd0;
> > >     disk /dev/sdb1;
> > >     address 192.168.23.128:7789;
> > >     meta-disk internal;
> > >   }
> > >   on node2 {
> > >     device /dev/drbd0;
> > >     disk /dev/sdb1;
> > >     address 192.168.23.129:7789;
> > >     meta-disk internal;
> > >   }
> > > }
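
Note also that this drbd.conf contains no "fencing" policy at all. For
dual-primary with GFS2 on top, you normally want DRBD itself to suspend
I/O on connection loss and call out to the cluster's fencing, along
these lines (untested sketch; the handler script name depends on what
your DRBD build ships, e.g. obliterate-peer.sh or rhcs_fence for
cman-based clusters):

    resource res0 {
      ...
      disk {
        fencing resource-and-stonith;
      }
      handlers {
        fence-peer "/usr/lib/drbd/obliterate-peer.sh";
      }
    }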
> > >
> > > Regards,
> > > Zohair Raza
> > >
> > >
> > > On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
> > > <engineerzuhairraza at gmail.com> wrote:
> > >
> > >     Hi,
> > >
> > >     thanks for explanation
> > >
> > >     On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists at alteeve.ca> wrote:
> > >
> > >         When a node stops responding, it cannot be assumed to be dead.
> > >         It has to be put into a known state, and that is what fencing
> > >         does. Disabling fencing is like driving without a seatbelt.
> > >
> > >         Ya, it'll save you a bit of time at first, but the first time
> > >         you get a split-brain, you are going right through the
> > >         windshield. Will you think it was worth it then?
> > >
> > >         A split-brain is when neither node fails, but they can't
> > >         communicate anymore. If each assumes the other is gone and
> > >         begins using the shared storage without coordinating with its
> > >         peer, your data will be corrupted very, very quickly.
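
On the DRBD side you can at least set recovery policies for, and get
notified of, a split-brain. A sketch in 8.3-style syntax (the notify
script ships with DRBD; none of this replaces fencing, it only limits
the damage afterwards):

    net {
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      # with two primaries, refusing to auto-resolve is the only sane choice:
      after-sb-2pri disconnect;
    }
    handlers {
      split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }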
> > >
> > >
> > >     In my scenario, I have two Samba servers in two different locations,
> > >     so the chances of this happening are high.
> > >
> > >
> > >         Heck, even if all you had was a floating IP; disabling fencing
> > >         means that both nodes would try to use that IP. Ask what your
> > >         clients/switches/routers think of that.
> > >
> > >     I don't need a floating IP, as I am not looking for high availability
> > >     but for two Samba servers synced with each other, so roaming employees
> > >     can have faster access to their files in both locations. I skipped
> > >     fencing as per Mautris's suggestion, but I still can't figure out why
> > >     the fencing daemon was not able to fence the other node.
> > >
> > >
> > >         Please read this for more details:
> > >
> > > https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


