[DRBD-user] GFS2 freezes

Maurits van de Lande M.vandeLande at VDL-Fittings.com
Mon Oct 29 14:43:55 CET 2012



Hello,

When one node shuts down unexpectedly, DLM blocks all lock requests until quorum is regained AND the failed node has been fenced. Only then can the surviving node take over the cluster resources.

I assume that you have set the "two_node" flag in cluster.conf.
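For reference, a minimal two-node cluster.conf could look roughly like the sketch below. The cluster name is taken from the GFS2 fsid in your log (cluster-setup) and the node names also come from the log; config_version and the overall layout are assumptions, and fencing is left out here on purpose. cman expects expected_votes="1" together with two_node="1", which is what lets a single surviving node stay quorate.

```xml
<?xml version="1.0"?>
<!-- Sketch only: cluster name and node names taken from the log,
     config_version is an assumed starting value -->
<cluster name="cluster-setup" config_version="1">
  <!-- two_node="1" plus expected_votes="1": one node alone keeps quorum -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>
```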

> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

I think that adding the following option to the dlm section in cluster.conf might solve this problem (but I have not tested this):
enable_fencing="0"
This will disable fencing.
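A sketch of where that option would sit, assuming the usual cluster.conf layout with the dlm element as a direct child of cluster; like the suggestion itself, this is untested. Keep in mind that with a Primary/Primary DRBD device under GFS2, running without fencing means nothing stops both nodes from writing independently after a network split, so this trades the freeze for a risk of split-brain.

```xml
<cluster name="cluster-setup" config_version="2">
  <!-- ... cman and clusternodes sections unchanged ... -->
  <!-- Untested: tell dlm_controld not to wait for fencing
       before recovering locks of a failed node -->
  <dlm enable_fencing="0"/>
</cluster>
```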

Alternatively, you can set up working fencing.
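As a rough sketch of "real" fencing, assuming both nodes have IPMI management controllers, the fence_manual/fence_ack_manual setup from the log could be replaced with fence_ipmilan devices. The device names, IP addresses (from the 192.0.2.0/24 documentation range) and credentials below are hypothetical placeholders.

```xml
<clusternodes>
  <clusternode name="node1" nodeid="1">
    <fence>
      <method name="ipmi">
        <!-- hypothetical device name, defined under fencedevices -->
        <device name="ipmi-node1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2" nodeid="2">
    <fence>
      <method name="ipmi">
        <device name="ipmi-node2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- placeholder addresses and credentials: replace with your BMC settings -->
  <fencedevice name="ipmi-node1" agent="fence_ipmilan" ipaddr="192.0.2.11"
               login="admin" passwd="secret" lanplus="1"/>
  <fencedevice name="ipmi-node2" agent="fence_ipmilan" ipaddr="192.0.2.12"
               login="admin" passwd="secret" lanplus="1"/>
</fencedevices>
```

Once the configuration is active, running fence_node node2 on node1 should power-cycle node2. Since the DRBD fence-peer helper in the log ends up calling fence_node, it would presumably start succeeding as well once cman fencing works.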

Best regards,

Maurits van de Lande



From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Zohair Raza
Sent: Monday, 29 October 2012 11:03
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] GFS2 freezes

Hi,

I have set up a Primary/Primary cluster with GFS2.

Everything works fine if I shut down either node cleanly, but when I unplug the power of either node, GFS2 freezes and I cannot access the device.

I tried to use http://people.redhat.com/lhh/obliterate

This is what I see in the logs:

Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration.
Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1
Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

Regards,
Zohair Raza


