Hello,

When one node unexpectedly shuts down, DLM locks up until quorum is regained AND the faulty node is fenced; only then can the surviving node take over the cluster resources. I assume that you have set the "two_node" flag in cluster.conf.

> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

I think that adding the following option to the dlm section in cluster.conf might solve this problem (but I have not tested this):

enable_fencing="0"

This will disable fencing. Or you can set up fencing properly.

Best regards,

Maurits van de Lande

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Zohair Raza
Sent: Monday, 29 October 2012 11:03
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] GFS2 freezes

Hi,

I have set up a Primary/Primary cluster with GFS2. Everything works fine if I shut down a node cleanly, but when I unplug the power of a node, GFS2 freezes and I cannot access the device. I tried to use http://people.redhat.com/lhh/obliterate

This is what I see in the logs:

Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 08:05:53 node1 corosync[1346]: [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
Oct 29 08:05:53 node1 corosync[1346]: [MAIN  ] Completed service synchronization, ready to provide service.
Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed

Regards,

Zohair Raza
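[Editor's note] For reference, the two remedies suggested in the reply above both live in cluster.conf. The sketch below shows where each one goes; it is a minimal illustration only, and the node names, fence device names, IPMI addresses, and credentials are placeholders, not values from this thread. Option 1 is the untested enable_fencing="0" suggestion; option 2 replaces fence_ack_manual (which is failing in the logs) with an automatic fence agent such as fence_ipmilan.

```xml
<?xml version="1.0"?>
<cluster name="cluster-setup" config_version="2">
  <!-- two_node="1" lets a two-node cluster keep quorum with one member;
       it requires expected_votes="1" -->
  <cman two_node="1" expected_votes="1"/>

  <!-- Option 1 (untested, per the reply above): tell dlm_controld not to
       wait for a successful fence before releasing the dead node's locks -->
  <dlm enable_fencing="0"/>

  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <!-- Option 2 (generally preferred with GFS2): a real fence agent so that
       fenced can power-cycle the dead peer automatically. Addresses and
       credentials here are hypothetical examples. -->
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi-node1"
                 ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2"
                 ipaddr="10.0.0.2" login="admin" passwd="secret"/>
  </fencedevices>
</cluster>
```

Note that with a shared-writer filesystem like GFS2, disabling fencing means a node that only appears dead could still be writing to the disk, so option 2 is the safer fix whenever your hardware offers a fencing mechanism.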