[DRBD-user] GFS2 freezes

Digimer lists at alteeve.ca
Tue Oct 30 16:58:32 CET 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Manual fencing is not in any way supported. You must be able to call
'fence_node <peer>' and have the remote node reset. If this doesn't
happen, your fencing is not sufficient.
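
As a concrete check (a minimal sketch; "node2" is simply the peer's name from
your cluster.conf), run the fence call by hand from the surviving node and
confirm the peer actually power-cycles with no human involvement:

  # From node1; this must reboot or power off node2 on its own.
  fence_node node2

  # Watch fenced's verdict while you do it.
  tail -f /var/log/messages | grep -i fence

If that command errors out, fenced will keep retrying the fence and DLM/GFS2
will stay blocked until a fence succeeds, which is exactly the loop your logs
show.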

On 10/30/2012 05:43 AM, Zohair Raza wrote:
> I have rebuilt the setup and enabled fencing.
> 
> Manual fencing (fence_ack_manual) works okay when I fence a dead
> node from the command line, but it does not happen automatically.
> 
> Logs:
> Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
> Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
> Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
> Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
> 
> Cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster name="cluster1" config_version="3">
> <cman two_node="1" expected_votes="1"/>
> <clusternodes>
> <clusternode name="node1" votes="1" nodeid="1">
>         <fence>
>                 <method name="single">
>                         <device name="manual" ipaddr="192.168.23.128"/>
>                 </method>
>         </fence>
> </clusternode>
> <clusternode name="node2" votes="1" nodeid="2">
>         <fence>
>                 <method name="single">
>                         <device name="manual" ipaddr="192.168.23.129"/>
>                 </method>
>         </fence>
> </clusternode>
> </clusternodes>
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> <fencedevices>
>         <fencedevice name="manual" agent="fence_manual"/>
> </fencedevices>
> </cluster>
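
For comparison, a hypothetical sketch of what this could look like with a
real power-fencing agent. fence_ipmilan is only an example (for virtual
machines an agent such as fence_virsh or fence_vmware_soap plays the same
role), and the IPMI addresses, login and password below are placeholders,
not values from your setup:

<clusternode name="node1" votes="1" nodeid="1">
        <fence>
                <method name="ipmi">
                        <!-- references the fencedevice defined below -->
                        <device name="ipmi_node1"/>
                </method>
        </fence>
</clusternode>
<!-- node2 gets the same block, pointing at ipmi_node2 -->
<fencedevices>
        <fencedevice name="ipmi_node1" agent="fence_ipmilan"
                     ipaddr="10.20.0.1" login="admin" passwd="secret" lanplus="1"/>
        <fencedevice name="ipmi_node2" agent="fence_ipmilan"
                     ipaddr="10.20.0.2" login="admin" passwd="secret" lanplus="1"/>
</fencedevices>

The point is that the agent must be able to forcibly reset the peer on its
own; fence_manual cannot, which is why fenced keeps logging "error from
agent" and GFS2 stays frozen.
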
> 
> drbd.conf:
> 
> resource res0 {
>   protocol C;
>   startup {
>     wfc-timeout 20;
>     degr-wfc-timeout 10;
>     # promote both nodes to primary on startup (dual-primary):
>     become-primary-on both;
>   }
>   net {
>     # peer authentication can be omitted when using a dedicated,
>     # DRBD-only link:
>     # cram-hmac-alg sha1;
>     # shared-secret anysecrethere123;
>     allow-two-primaries;
>   }
>   on node1 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.128:7789;
>     meta-disk internal;
>   }
>   on node2 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.129:7789;
>     meta-disk internal;
>   }
> }
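
On the DRBD side, the drbd.conf as posted has no fencing policy or fence-peer
handler, so DRBD never asks the cluster to fence anything when its peer
disappears. A minimal sketch of the missing pieces, assuming a cman-aware
helper such as the obliterate-peer.sh script mentioned elsewhere in this
thread or the rhcs_fence helper shipped with recent DRBD packages (the path
below is an assumption; point it at wherever your helper is installed). Note
that this only helps once 'fence_node <peer>' itself works:

resource res0 {
  disk {
    # Suspend I/O and call the fence-peer handler when the peer is lost,
    # rather than carrying on with a possibly split-brained resource.
    fencing resource-and-stonith;
  }
  handlers {
    # Path is an assumption; adjust to your installed helper.
    fence-peer "/usr/lib/drbd/rhcs_fence";
  }
  # ... keep the existing startup, net and on sections as they are ...
}
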
> 
> Regards,
> Zohair Raza
> 
> 
> On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
> <engineerzuhairraza at gmail.com> wrote:
> 
>     Hi, 
> 
>     thanks for explanation
> 
>     On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists at alteeve.ca> wrote:
> 
>         When a node stops responding, it cannot be assumed to be dead. It
>         has to be put into a known state, and that is what fencing does.
>         Disabling fencing is like driving without a seatbelt.
> 
>         Ya, it'll save you a bit of time at first, but the first time
>         you get a
>         split-brain, you are going right through the windshield. Will
>         you think
>         it was worth it then?
> 
>         A split-brain is when neither node fails, but they can't communicate
>         anymore. If each assumes the other is gone and begins using the
>         shared storage without coordinating with its peer, your data will
>         be corrupted very, very quickly.
> 
> 
>     In my scenario, I have two Samba servers in two different locations,
>     so the chances of a network split like that are quite real.
> 
> 
>         Heck, even if all you had was a floating IP, disabling fencing means
>         that both nodes would try to use that IP. Ask what your
>         clients/switches/routers would make of that.
> 
>     I don't need a floating IP, as I am not looking for high availability
>     but for two Samba servers synced with each other, so roaming employees
>     can have faster access to their files at both locations. I skipped
>     fencing as per Mautris's suggestion, but I still can't figure out why
>     the fencing daemon was not able to fence the other node.
>      
> 
>         Please read this for more details;
> 
>         https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
> 
> 
>     Will have a look
>      
> 
>         digimer
> 
>         On 10/29/2012 03:03 AM, Zohair Raza wrote:
>         > Hi,
>         >
>         > I have set up a Primary/Primary cluster with GFS2.
>         >
>         > Everything works fine if I shut down a node cleanly, but when I
>         > pull the power on a node, GFS2 freezes and I cannot access the
>         > device.
>         >
>         > Tried to use http://people.redhat.com/lhh/obliterate
>         >
>         > this is what I see in logs
>         >
>         > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>         > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>         > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>         > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>         > Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration.
>         > Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1
>         > Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>         > Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>         > Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
>         > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>         > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>         >
>         > Regards,
>         > Zohair Raza
>         >
>         >
>         >
> 
> 
>         --
>         Digimer
>         Papers and Projects: https://alteeve.ca/w/
>         What if the cure for cancer is trapped in the mind of a person
>         without
>         access to education?
> 
> 
> 
> 
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


