[DRBD-user] GFS2 freezes

Digimer lists at alteeve.ca
Mon Oct 29 18:26:23 CET 2012


When a node stops responding, it cannot be assumed to be dead. It has
to be put into a known state, and that is what fencing does. Disabling
fencing is like driving without a seatbelt.

Ya, it'll save you a bit of time at first, but the first time you get a
split-brain, you are going right through the windshield. Will you think
it was worth it then?

A split-brain is when neither node fails, but they can't communicate
anymore. If each assumes the other is gone and begins using the shared
storage without coordinating with its peer, your data will be corrupted
very, very quickly.
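
To make that concrete, here is a toy sketch in Python. It is not how DRBD,
cman or fenced actually work; the Node class and the
writers_after_link_loss() function are made-up names for illustration, and
"the first node wins the fence race" is an assumption to keep it short. It
only shows the difference in how many nodes keep writing once the link
between them is lost.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered_on: bool = True


def writers_after_link_loss(nodes, fencing_enabled):
    """Return the names of nodes still writing to shared storage
    after the nodes lose contact with each other (toy model)."""
    if fencing_enabled:
        # Assumed rule: the first node wins the fence race and powers
        # off its peer, putting it into a known state before continuing.
        for loser in nodes[1:]:
            loser.powered_on = False
    # Without fencing nobody is powered off; each node simply assumes
    # the other is gone and keeps writing.
    return [n.name for n in nodes if n.powered_on]


no_fence = writers_after_link_loss(
    [Node("node1"), Node("node2")], fencing_enabled=False)
with_fence = writers_after_link_loss(
    [Node("node1"), Node("node2")], fencing_enabled=True)

print(no_fence)    # two uncoordinated writers: split-brain
print(with_fence)  # one writer left: safe to carry on
```

With fencing off, both nodes are still in the list, which is exactly the
case where the shared storage gets trashed.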

Heck, even if all you had was a floating IP, disabling fencing means
that both nodes would try to use that IP. Ask what your
clients/switches/routers think of that.

Please read this for more details:

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing

digimer

On 10/29/2012 03:03 AM, Zohair Raza wrote:
> Hi, 
> 
> I have setup a Primary/Primary cluster with GFS2.
> 
> All works good if I shut down any node regularly, but when I unplug
> power of any node, GFS freezes and I can not access the device. 
> 
> Tried to use http://people.redhat.com/lhh/obliterate 
> 
> this is what I see in logs 
> 
> Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
> Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown )
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0
> -> 1 )
> Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
> Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure ->
> Unconnected )
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected ->
> WFConnection )
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm
> fence-peer res0
> Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm
> fence-peer res0 exit code 1 (0x100)
> Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken,
> returned 1
> Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed,
> forming new configuration.
> Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1
> Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or
> left the membership and a new membership was formed.
> Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender
> r(0) ip(192.168.23.128) ; members(old:2 left:1)
> Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service
> synchronization, ready to provide service.
> Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
> Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1:
> Trying to acquire journal lock...
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent
> fence_ack_manual result: error from agent
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent
> fence_ack_manual result: error from agent
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent
> fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
> 
> Regards,
> Zohair Raza


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


