<div dir="ltr">fence_node &lt;peer&gt; doesn&#39;t work for me<div><br></div><div>fence_node node2 says </div><div><br></div><div>fence node2 failed<br><div><br></div><div><br clear="all"><div dir="ltr">Regards,<br>Zohair Raza<div>


<br></div></div><br><div class="gmail_quote">On Tue, Oct 30, 2012 at 7:58 PM, Digimer <span dir="ltr">&lt;<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Manual fencing is not in any way supported. You must be able to call<br>

&#39;fence_node &lt;peer&gt;&#39; and have the remote node reset. If this doesn&#39;t<br>

happen, your fencing is not sufficient.<br>
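<br>
For illustration only, here is a minimal sketch of what "sufficient" fencing might look like in the cluster.conf quoted below. It assumes both nodes have IPMI-capable management boards (BMCs) that the fence_ipmilan agent can reach. The device names (ipmi_node1, ipmi_node2), the BMC addresses and the credentials are hypothetical placeholders, not values from this thread. Remember to increase config_version whenever cluster.conf changes.<br>
<pre>
```xml
&lt;!-- Hypothetical fencing section: names, addresses and credentials are placeholders --&gt;
&lt;clusternodes&gt;
  &lt;clusternode name="node1" votes="1" nodeid="1"&gt;
    &lt;fence&gt;
      &lt;method name="ipmi"&gt;
        &lt;device name="ipmi_node1" action="reboot"/&gt;
      &lt;/method&gt;
    &lt;/fence&gt;
  &lt;/clusternode&gt;
  &lt;clusternode name="node2" votes="1" nodeid="2"&gt;
    &lt;fence&gt;
      &lt;method name="ipmi"&gt;
        &lt;device name="ipmi_node2" action="reboot"/&gt;
      &lt;/method&gt;
    &lt;/fence&gt;
  &lt;/clusternode&gt;
&lt;/clusternodes&gt;
&lt;fencedevices&gt;
  &lt;!-- Each device points at the BMC of the node it fences, not at the node's own IP --&gt;
  &lt;fencedevice name="ipmi_node1" agent="fence_ipmilan" ipaddr="192.0.2.1" login="admin" passwd="secret"/&gt;
  &lt;fencedevice name="ipmi_node2" agent="fence_ipmilan" ipaddr="192.0.2.2" login="admin" passwd="secret"/&gt;
&lt;/fencedevices&gt;
```
</pre>
With a real agent in place, running fence_node node2 on node1 should power-cycle node2. If it still reports "fence node2 failed", fenced keeps retrying and GFS2 stays blocked on the journal lock, as in the logs below. Note that an on-board IPMI controller loses power along with its host, so pulling the power cord (the test from the original report) would still defeat IPMI-only fencing; a switched PDU (for example with fence_apc) is commonly added as a second fence method.<br>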

<div><div class="h5"><br>

On 10/30/2012 05:43 AM, Zohair Raza wrote:<br>

&gt; I have rebuilt the setup and enabled fencing.<br>

&gt;<br>

&gt; Manual fencing (fence_ack_manual) works when I fence a dead node<br>

&gt; from the command line, but it does not happen automatically.<br>

&gt;<br>

&gt; Logs:<br>

&gt; Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2<br>

&gt; Oct 30 12:05:52 node1 fenced[1414]: fencing node node2<br>

&gt; Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...<br>

&gt; Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent<br>

&gt; Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed<br>

&gt; Oct 30 12:05:55 node1 fenced[1414]: fencing node node2<br>

&gt;<br>

&gt; Cluster.conf:<br>

&gt;<br>

&gt; &lt;?xml version=&quot;1.0&quot;?&gt;<br>

&gt; &lt;cluster name=&quot;cluster1&quot; config_version=&quot;3&quot;&gt;<br>

&gt; &lt;cman two_node=&quot;1&quot; expected_votes=&quot;1&quot;/&gt;<br>

&gt; &lt;clusternodes&gt;<br>

&gt; &lt;clusternode name=&quot;node1&quot; votes=&quot;1&quot; nodeid=&quot;1&quot;&gt;<br>

&gt;         &lt;fence&gt;<br>

&gt;                 &lt;method name=&quot;single&quot;&gt;<br>

&gt;                         &lt;device name=&quot;manual&quot; ipaddr=&quot;192.168.23.128&quot;/&gt;<br>

&gt;                 &lt;/method&gt;<br>

&gt;         &lt;/fence&gt;<br>

&gt; &lt;/clusternode&gt;<br>

&gt; &lt;clusternode name=&quot;node2&quot; votes=&quot;1&quot; nodeid=&quot;2&quot;&gt;<br>

&gt;         &lt;fence&gt;<br>

&gt;                 &lt;method name=&quot;single&quot;&gt;<br>

&gt;                         &lt;device name=&quot;manual&quot; ipaddr=&quot;192.168.23.129&quot;/&gt;<br>

&gt;                 &lt;/method&gt;<br>

&gt;         &lt;/fence&gt;<br>

&gt; &lt;/clusternode&gt;<br>

&gt; &lt;/clusternodes&gt;<br>

&gt; &lt;fence_daemon clean_start=&quot;0&quot; post_fail_delay=&quot;0&quot; post_join_delay=&quot;3&quot;/&gt;<br>

&gt; &lt;fencedevices&gt;<br>

&gt;         &lt;fencedevice name=&quot;manual&quot; agent=&quot;fence_manual&quot;/&gt;<br>

&gt; &lt;/fencedevices&gt;<br>

&gt; &lt;/cluster&gt;<br>

&gt;<br>

&gt; drbd.conf:<br>

&gt;<br>

&gt; resource res0 {<br>

&gt;   protocol C;<br>

&gt;   startup {<br>

&gt;     wfc-timeout 20;<br>

&gt;     degr-wfc-timeout 10;<br>

&gt;     # we will keep this commented until tested successfully:<br>

&gt;      become-primary-on both;<br>

&gt;   }<br>

&gt;   net {<br>

&gt;     # the encryption part can be omitted when using a dedicated link for DRBD only:<br>

&gt;     # cram-hmac-alg sha1;<br>

&gt;     # shared-secret anysecrethere123;<br>

&gt;     allow-two-primaries;<br>

&gt;   }<br>

&gt;   on node1 {<br>

&gt;     device /dev/drbd0;<br>

&gt;     disk /dev/sdb1;<br>

</div></div>&gt;     address 192.168.23.128:7789;<br>

<div class="im">&gt;     meta-disk internal;<br>

&gt;   }<br>

&gt;   on node2 {<br>

&gt;     device /dev/drbd0;<br>

&gt;     disk /dev/sdb1;<br>

</div>&gt;     address 192.168.23.129:7789;<br>

<div class="im">&gt;     meta-disk internal;<br>

&gt;   }<br>

&gt; }<br>

&gt;<br>

&gt; Regards,<br>

&gt; Zohair Raza<br>

&gt;<br>

&gt;<br>

&gt; On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza<br>

</div><div class="im">&gt; &lt;<a href="mailto:engineerzuhairraza@gmail.com">engineerzuhairraza@gmail.com</a>&gt; wrote:<br>

&gt;<br>

&gt;     Hi,<br>

&gt;<br>

&gt;     Thanks for the explanation.<br>

&gt;<br>

&gt;     On Mon, Oct 29, 2012 at 9:26 PM, Digimer &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt; wrote:<br>

</div><div><div class="h5">

&gt;<br>

&gt;         When a node stops responding, it cannot be assumed to be dead. It has<br>

&gt;         to be put into a known state, and that is what fencing does. Disabling<br>

&gt;         fencing is like driving without a seatbelt.<br>

&gt;<br>

&gt;         Yeah, it&#39;ll save you a bit of time at first, but the first time you get a<br>

&gt;         split-brain, you are going right through the windshield. Will you think<br>

&gt;         it was worth it then?<br>

&gt;<br>

&gt;         A split-brain is when neither node fails, but they can&#39;t communicate<br>

&gt;         anymore. If each assumes the other is gone and begins using the shared<br>

&gt;         storage without coordinating with its peer, your data will be corrupted<br>

&gt;         very, very quickly.<br>

&gt;<br>

&gt;<br>

&gt;     In my scenario, I have two Samba servers in two different locations,<br>

&gt;     so the chances of this happening are high.<br>

&gt;<br>

&gt;<br>

&gt;         Heck, even if all you had was a floating IP, disabling fencing means<br>

&gt;         that both nodes would try to use that IP. Ask what your<br>

&gt;         clients/switches/routers think of that.<br>

&gt;<br>

&gt;     I don&#39;t need a floating IP, as I am not looking for high availability<br>

&gt;     but for two Samba servers kept in sync with each other, so that roaming<br>

&gt;     employees get faster access to their files at both locations. I skipped<br>

&gt;     fencing as per Mautris&#39;s suggestion, but I still can&#39;t figure out why the<br>

&gt;     fencing daemon was not able to fence the other node.<br>

&gt;<br>

&gt;<br>

&gt;         Please read this for more details:<br>

&gt;<br>

&gt;         <a href="https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing" target="_blank">https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing</a><br>

&gt;<br>

&gt;<br>

&gt;     Will have a look<br>

&gt;<br>

&gt;<br>

&gt;         digimer<br>

&gt;<br>

&gt;         On 10/29/2012 03:03 AM, Zohair Raza wrote:<br>

&gt;         &gt; Hi,<br>

&gt;         &gt;<br>

&gt;         &gt; I have set up a Primary/Primary cluster with GFS2.<br>

&gt;         &gt;<br>

&gt;         &gt; Everything works if I shut down either node cleanly, but when I unplug<br>

&gt;         &gt; the power of either node, GFS freezes and I cannot access the device.<br>

&gt;         &gt;<br>

&gt;         &gt; Tried to use <a href="http://people.redhat.com/lhh/obliterate" target="_blank">http://people.redhat.com/lhh/obliterate</a><br>

&gt;         &gt;<br>

&gt;         &gt; This is what I see in the logs:<br>

&gt;         &gt;<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -&gt; Unknown ) conn( Connected -&gt; NetworkFailure ) pdsk( UpToDate -&gt; DUnknown ) susp( 0 -&gt; 1 )<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -&gt; Unconnected )<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -&gt; WFConnection )<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0<br>

&gt;         &gt; Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)<br>

&gt;         &gt; Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1<br>

&gt;         &gt; Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration.<br>

&gt;         &gt; Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1<br>

&gt;         &gt; Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>

&gt;         &gt; Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)<br>

&gt;         &gt; Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.<br>

&gt;         &gt; Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2<br>

&gt;         &gt; Oct 29 08:05:53 node1 fenced[1401]: fencing node node2<br>

&gt;         &gt; Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...<br>

&gt;         &gt; Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent<br>

&gt;         &gt; Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed<br>

&gt;         &gt; Oct 29 08:05:56 node1 fenced[1401]: fencing node node2<br>

&gt;         &gt; Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent<br>

&gt;         &gt; Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed<br>

&gt;         &gt; Oct 29 08:05:59 node1 fenced[1401]: fencing node node2<br>

&gt;         &gt; Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent<br>

&gt;         &gt; Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed<br>

&gt;         &gt;<br>

&gt;         &gt; Regards,<br>

&gt;         &gt; Zohair Raza<br>

&gt;         &gt;<br>

&gt;         &gt;<br>

&gt;         &gt;<br>

&gt;         &gt; _______________________________________________<br>

&gt;         &gt; drbd-user mailing list<br>

</div></div>&gt;         &gt; <a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a> &lt;mailto:<a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a>&gt;<br>

<div class="HOEnZb"><div class="h5">&gt;         &gt; <a href="http://lists.linbit.com/mailman/listinfo/drbd-user" target="_blank">http://lists.linbit.com/mailman/listinfo/drbd-user</a><br>

&gt;         &gt;<br>

&gt;<br>

&gt;<br>

&gt;         --<br>

&gt;         Digimer<br>

&gt;         Papers and Projects: <a href="https://alteeve.ca/w/" target="_blank">https://alteeve.ca/w/</a><br>

&gt;         What if the cure for cancer is trapped in the mind of a person without<br>

&gt;         access to education?<br>

&gt;<br>


<br>


</div></div></blockquote></div><br></div></div></div>