<div dir="ltr">fence_node &lt;peer&gt; doesn&#39;t work for me<div><br></div><div>fence_node node2 says </div><div><br></div><div>fence node2 failed<br><div><br></div><div><br clear="all"><div dir="ltr">Regards,<br>Zohair Raza<div>

<br></div></div><br><div class="gmail_quote">On Tue, Oct 30, 2012 at 7:58 PM, Digimer <span dir="ltr">&lt;<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Manual fencing is not in any way supported. You must be able to call
'fence_node <peer>' and have the remote node reset. If this doesn't
happen, your fencing is not sufficient.
On 10/30/2012 05:43 AM, Zohair Raza wrote:
> I have rebuilt the setup and enabled fencing.
>
> Manual fencing (fence_ack_manual) works okay when I fence a dead
> node from the command line, but it does not happen automatically.
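With a manual fence device, fenced cannot reset the peer itself; it
retries until an operator confirms the node is really down. A sketch of
that confirmation step (the exact invocation differs between cluster
releases, so treat the flags as an assumption and check
man fence_ack_manual):

    # Run this ONLY after physically verifying node2 is powered off;
    # acknowledging a node that is still running invites corruption.
    fence_ack_manual node2    # older releases: fence_ack_manual -n node2
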
>
> Logs:
> Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
> Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
> Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
> Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
>
> Cluster.conf:
>
> <?xml version="1.0"?>
> <cluster name="cluster1" config_version="3">
> <cman two_node="1" expected_votes="1"/>
> <clusternodes>
> <clusternode name="node1" votes="1" nodeid="1">
>         <fence>
>                 <method name="single">
>                         <device name="manual" ipaddr="192.168.23.128"/>
>                 </method>
>         </fence>
> </clusternode>
> <clusternode name="node2" votes="1" nodeid="2">
>         <fence>
>                 <method name="single">
>                         <device name="manual" ipaddr="192.168.23.129"/>
>                 </method>
>         </fence>
> </clusternode>
> </clusternodes>
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> <fencedevices>
>         <fencedevice name="manual" agent="fence_manual"/>
> </fencedevices>
> </cluster>
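The config above uses fence_manual, which the logs show returning
"error from agent". For comparison, a sketch of the same stanzas with a
real agent (fence_ipmilan here; the BMC address, login and password are
placeholders, not values from the original post):

    <clusternode name="node2" votes="1" nodeid="2">
            <fence>
                    <method name="ipmi">
                            <!-- "name" must match a fencedevice below -->
                            <device name="ipmi_node2" action="reboot"/>
                    </method>
            </fence>
    </clusternode>
    ...
    <fencedevices>
            <!-- placeholder BMC address and credentials -->
            <fencedevice name="ipmi_node2" agent="fence_ipmilan"
                         ipaddr="192.168.23.252" login="admin"
                         passwd="secret" lanplus="1"/>
    </fencedevices>
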
>
> drbd.conf:
>
> resource res0 {
>   protocol C;
>   startup {
>     wfc-timeout 20;
>     degr-wfc-timeout 10;
>     # we will keep this commented until tested successfully:
>     become-primary-on both;
>   }
>   net {
>     # the encryption part can be omitted when using a dedicated
>     # link for DRBD only:
>     # cram-hmac-alg sha1;
>     # shared-secret anysecrethere123;
>     allow-two-primaries;
>   }
>   on node1 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.128:7789;
>     meta-disk internal;
>   }
>   on node2 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.129:7789;
>     meta-disk internal;
>   }
> }
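The Oct 29 log further down shows DRBD invoking its fence-peer helper
("helper command: /sbin/drbdadm fence-peer res0") and getting exit code
1 back, because the cluster-level fence it delegates to keeps failing.
The stanza that wires that helper up is not shown in the resource
above; it typically looks like the sketch below (the handler path is an
assumption; point it at wherever the obliterate-peer script from the
URL cited later in this thread is actually installed):

    disk {
        # block I/O until the peer has been confirmed fenced
        fencing resource-and-stonith;
    }
    handlers {
        # placeholder path to the installed fence-peer helper
        fence-peer "/usr/local/sbin/obliterate-peer.sh";
    }
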
>
> Regards,
> Zohair Raza
>
>
> On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
> <engineerzuhairraza@gmail.com> wrote:
>
>     Hi,
>
>     thanks for the explanation
>
>     On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists@alteeve.ca> wrote:
>
>         When a node stops responding, it cannot be assumed to be
>         dead. It has to be put into a known state, and that is what
>         fencing does. Disabling fencing is like driving without a
>         seatbelt.
>
>         Ya, it'll save you a bit of time at first, but the first time
>         you get a split-brain, you are going right through the
>         windshield. Will you think it was worth it then?
>
>         A split-brain is when neither node fails, but they can't
>         communicate anymore. If each assumes the other is gone and
>         begins using the shared storage without coordinating with its
>         peer, your data will be corrupted very, very quickly.
>
>
>     In my scenario, I have two Samba servers in two different
>     locations, so the chances of this happening are high.
>
>
>         Heck, even if all you had was a floating IP, disabling
>         fencing means that both nodes would try to use that IP. Ask
>         what your clients/switches/routers would think of that.
>
>     I don't need a floating IP, as I am not looking for high
>     availability but for two Samba servers synced with each other, so
>     roaming employees can have faster access to their files at both
>     locations. I skipped fencing as per Mautris's suggestion, but I
>     still can't figure out why the fencing daemon was not able to
>     fence the other node.
>
>
>         Please read this for more details:
>
>         https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
>
>
>     Will have a look.
>
>
>         digimer
>
>         On 10/29/2012 03:03 AM, Zohair Raza wrote:
>         > Hi,
>         >
>         > I have set up a Primary/Primary cluster with GFS2.
>         >
>         > All works well if I shut down a node cleanly, but when I
>         > unplug the power of either node, GFS2 freezes and I cannot
>         > access the device.
>         >
>         > Tried to use http://people.redhat.com/lhh/obliterate
>         >
>         > This is what I see in the logs:
>         >
>         > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>         > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>         > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>         > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>         > Oct 29 08:05:48 node1 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration.
>         > Oct 29 08:05:53 node1 corosync[1346]:   [QUORUM] Members[1]: 1
>         > Oct 29 08:05:53 node1 corosync[1346]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>         > Oct 29 08:05:53 node1 corosync[1346]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>         > Oct 29 08:05:53 node1 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
>         > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>         > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>         >
>         > Regards,
>         > Zohair Raza
>         >
>         >
>
>
>         --
>         Digimer
>         Papers and Projects: https://alteeve.ca/w/
>         What if the cure for cancer is trapped in the mind of a
>         person without access to education?
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?