Hello,

When one node unexpectedly shuts down, DLM locks up until quorum is regained AND the faulty node is fenced; only then can the surviving node take over the cluster resources. I assume that you have set the "two_node" flag in cluster.conf.

> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

I think that adding the following option to the dlm section in cluster.conf might solve this problem (but I have not tested this):

enable_fencing="0"

This will disable fencing. Or you can set up fencing properly.

Best regards,

Maurits van de Lande

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Zohair Raza
Sent: Monday, 29 October 2012 11:03
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] GFS2 freezes

Hi,

I have set up a Primary/Primary cluster with GFS2. Everything works fine if I shut down a node cleanly, but when I unplug the power of a node, GFS2 freezes and I cannot access the device. I tried to use http://people.redhat.com/lhh/obliterate

This is what I see in the logs:

Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 08:05:53 node1 corosync[1346]: [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
Oct 29 08:05:53 node1 corosync[1346]: [MAIN  ] Completed service synchronization, ready to provide service.
Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

Regards,

Zohair Raza
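[Editor's note] For reference, the two remedies suggested in the reply above both live in cluster.conf. The sketch below shows where each one goes; it is a minimal illustration only, and the node names, fence device names, IPMI addresses, and credentials are placeholders, not values from this thread. Option 1 is the untested enable_fencing="0" suggestion; option 2 replaces fence_ack_manual (which is failing in the logs) with an automatic fence agent such as fence_ipmilan.

```xml
<?xml version="1.0"?>
<cluster name="cluster-setup" config_version="2">
  <!-- two_node="1" lets a two-node cluster keep quorum with one member;
       it requires expected_votes="1" -->
  <cman two_node="1" expected_votes="1"/>

  <!-- Option 1 (untested, per the reply above): tell dlm_controld not to
       wait for a successful fence before releasing the dead node's locks -->
  <dlm enable_fencing="0"/>

  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <!-- Option 2 (generally preferred with GFS2): a real fence agent so that
       fenced can power-cycle the dead peer automatically. Addresses and
       credentials here are hypothetical examples. -->
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi-node1"
                 ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2"
                 ipaddr="10.0.0.2" login="admin" passwd="secret"/>
  </fencedevices>
</cluster>
```

Note that with a shared-writer filesystem like GFS2, disabling fencing means a node that only appears dead could still be writing to the disk, so option 2 is the safer fix whenever your hardware offers a fencing mechanism.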