Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 10/29/12 9:43 AM, Maurits van de Lande wrote:
> Hello,
>
> When one node unexpectedly shuts down, DLM locks down until quorum is
> regained AND the faulty node is fenced; only then can the surviving node
> take over the cluster resources.
>
> I assume that you have set the "two_node" flag in cluster.conf.
>
>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>
> I think that adding the following option to the dlm section in
> cluster.conf might solve this problem (but I have not tested this).
> This will disable fencing:
>
>     enable_fencing="0"
>
> Or you can set up fencing properly.
>
> Best regards,
>
> Maurits van de Lande
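For reference, the two cluster.conf settings mentioned above would sit
roughly as follows. This is an untested sketch: the cluster and node names
are taken from the logs further down, config_version is a placeholder, and
each clusternode would normally also carry a fence method.

    <?xml version="1.0"?>
    <cluster name="cluster-setup" config_version="2">
      <!-- two_node="1" lets a two-node cluster keep quorum when one node is lost -->
      <cman two_node="1" expected_votes="1"/>
      <!-- the untested suggestion above: stop dlm_controld from blocking on fencing -->
      <dlm enable_fencing="0"/>
      <clusternodes>
        <clusternode name="node1" nodeid="1"/>
        <clusternode name="node2" nodeid="2"/>
      </clusternodes>
    </cluster>

Bear in mind that disabling DLM fencing trades safety for availability; on
a dual-primary DRBD setup a working fence agent is the safer fix.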
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Zohair Raza
> Sent: Monday, 29 October 2012 11:03
> To: drbd-user at lists.linbit.com
> Subject: [DRBD-user] GFS2 freezes
>
> Hi,
>
> I have set up a Primary/Primary cluster with GFS2.
>
> Everything works well if I shut down either node cleanly, but when I
> unplug the power of either node, GFS2 freezes and I cannot access the
> device.
>
> I tried to use http://people.redhat.com/lhh/obliterate
>
> This is what I see in the logs:
>
> Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
> Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown )
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
> Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
> Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
> Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
> Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
> Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
> Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
> Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
> Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
> Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
> Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Oct 29 08:05:53 node1 corosync[1346]: [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
> Oct 29 08:05:53 node1 corosync[1346]: [MAIN  ] Completed service synchronization, ready to provide service.
> Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
> Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
> Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>
> Regards,
> Zohair Raza

I had a similar problem. The issue turned out to have nothing directly to
do with DRBD or DLM, but that I had put clvmd under pacemaker/corosync
control. I had to start cman and clvmd with the usual Linux init scripts,
and mount the individual GFS2 filesystems with pacemaker.

I hope this helps.

--
William Seligman            | http://www.nevis.columbia.edu/~seligman/
Nevis Labs, Columbia Univ   | Phone: (914) 591-2823
PO Box 137
Irvington NY 10533 USA
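The "fence-peer helper broken, returned 1" lines in the logs come from
DRBD's fence-peer hook, which on a dual-primary resource is typically wired
up along the lines below. A sketch only: the install path for the
obliterate script linked in Zohair's mail is an assumption, and the helper
will keep returning 1 for as long as the underlying fence agent (here
fence_ack_manual) keeps failing.

    resource res0 {
      disk {
        # suspend I/O and call the fence-peer handler when the peer disappears
        fencing resource-and-stonith;
      }
      handlers {
        # assumed install path for the script from http://people.redhat.com/lhh/obliterate
        fence-peer "/usr/local/sbin/obliterate-peer.sh";
      }
      # net/on sections unchanged
    }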
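On RHEL 6-era tooling, the split William describes might look something
like the following. Again a sketch: the volume group, mount point, and
resource names are placeholders.

    # run cluster membership and clustered LVM from the distribution init
    # scripts, outside pacemaker's control
    chkconfig cman on  && service cman start
    chkconfig clvmd on && service clvmd start

    # let pacemaker manage only the GFS2 mount, cloned so it is mounted on
    # both primaries
    crm configure primitive p_fs_gfs2 ocf:heartbeat:Filesystem \
        params device="/dev/vg_cluster/lv_gfs2" directory="/mnt/gfs2" fstype="gfs2" \
        op monitor interval="20s"
    crm configure clone cl_fs_gfs2 p_fs_gfs2 meta interleave="true"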