<div dir="ltr">I have rebuild the setup, and enabled fencing <div><br></div><div>Manual fencing (fence_ack_manual) works okay when I fence one a dead node from command line but it is not doing automatically </div><div><br>
Logs:

Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
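For reference, the manual path that does work is acknowledging the fence by hand on the surviving node; roughly this (exact syntax differs between cluster versions, so treat it as a sketch):

    # Run only after confirming node2 is really down/powered off,
    # otherwise the shared storage can be corrupted.
    fence_ack_manual node2      # some releases want: fence_ack_manual -n node2
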
Cluster.conf:

<?xml version="1.0"?>
<cluster name="cluster1" config_version="3">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.23.128"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.23.129"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
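To exercise this configuration by hand, I validate the XML and then ask the cluster to fence node2 with whatever agent is configured (ccs_config_validate is my assumption for this tooling version; older releases may not ship it):

    ccs_config_validate    # sanity-check cluster.conf before pushing it
    fence_node node2       # triggers the same agent fenced would use
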
drbd.conf:

resource res0 {
  protocol C;
  startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    # we will keep this commented until tested successfully:
    become-primary-on both;
  }
  net {
    # the encryption part can be omitted when using a dedicated link for DRBD only:
    # cram-hmac-alg sha1;
    # shared-secret anysecrethere123;
    allow-two-primaries;
  }
  on node1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.23.128:7789;
    meta-disk internal;
  }
  on node2 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.23.129:7789;
    meta-disk internal;
  }
}
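I also notice this resource carries no fencing policy or fence-peer handler; from the DRBD docs, a dual-primary setup would normally have something along these lines (the handler path is a guess based on the obliterate script mentioned below, adjust to wherever it is actually installed):

    disk {
      fencing resource-and-stonith;    # freeze I/O until the peer is confirmed fenced
    }
    handlers {
      # invoked as "/sbin/drbdadm fence-peer res0" when replication breaks
      fence-peer "/usr/lib/drbd/obliterate-peer.sh";
    }
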
Regards,
Zohair Raza

On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza <engineerzuhairraza@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi, <div><br></div><div>thanks for explanation</div><div><div class="im"><div dir="ltr"><div><br></div>
<div>
On Mon, Oct 29, 2012 at 9:26 PM, Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span> wrote:</div>
</div></div><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">When a node stops responding, it can not be assumes to be dead. It has<br>
to be put into a known state, and that is what fencing does. Disabling<br>
fencing is like driving without a seatbelt.<br>
<br>
Ya, it'll save you a bit of time at first, but the first time you get a<br>
split-brain, you are going right through the windshield. Will you think<br>
it was worth it then?<br>
<br>
A split-brain is when neither node fails, but they can't communicate<br>
anymore. If each assumes the other is gone and begins using the shared<br>
storage without coordinating with it's peer, your data will be corrupted<br>
very, very quickly.<br></blockquote><div><br></div></div><div>In my scenario, I have two Samba servers in two different locations so chances of this are obvious </div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>> Heck, even if all you had was a floating IP; disabling fencing means
>> that both nodes would try to use that IP. Ask what your
>> clients/switches/routers think of that.
>
> I don't need a floating IP, as I am not looking for high availability but for two Samba servers synced with each other, so roaming employees can have faster access to their files at both locations. I skipped fencing as per Mautris's suggestion, but I still can't figure out why the fencing daemon was not able to fence the other node.
<div class="im">
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Please read this for more details;<br>
<br>
<a href="https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing" target="_blank">https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing</a><br>
<br></blockquote><div><br></div></div><div>Will have a look</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>> digimer
>>
>> On 10/29/2012 03:03 AM, Zohair Raza wrote:
>>> Hi,
>>>
>>> I have set up a Primary/Primary cluster with GFS2.
>>>
>>> All works well if I shut down a node cleanly, but when I unplug the
>>> power of a node, GFS freezes and I cannot access the device.
>>>
>>> I tried to use http://people.redhat.com/lhh/obliterate
>>>
>>> This is what I see in the logs:
>>>
>>> Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>>> Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>>> Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>>> Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>>> Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>>> Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>>> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>>> Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>>> Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
>>> Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
>>> Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 29 08:05:53 node1 corosync[1346]: [CPG ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>>> Oct 29 08:05:53 node1 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
>>> Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>>> Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>>> Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>>> Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>>> Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>>>
>>> Regards,
>>> Zohair Raza
>>>
>>> _______________________________________________
>>> drbd-user mailing list
>>> drbd-user@lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?