<div dir="ltr">I have rebuild the setup, and enabled fencing <div><br></div><div>Manual fencing (fence_ack_manual) works okay when I fence one a dead node from command line but it is not doing automatically </div><div><br>
Logs:

Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
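For reference, the manual path that does work is acknowledging the fence by hand on the surviving node; roughly this (exact syntax differs between cluster versions, so treat it as a sketch):

    # Run only after confirming node2 is really down/powered off,
    # otherwise the shared storage can be corrupted.
    fence_ack_manual node2      # some releases want: fence_ack_manual -n node2
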
Cluster.conf:

<?xml version="1.0"?>
<cluster name="cluster1" config_version="3">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.23.128"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="192.168.23.129"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
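To exercise this configuration by hand, I validate the XML and then ask the cluster to fence node2 with whatever agent is configured (ccs_config_validate is my assumption for this tooling version; older releases may not ship it):

    ccs_config_validate    # sanity-check cluster.conf before pushing it
    fence_node node2       # triggers the same agent fenced would use
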
drbd.conf:

resource res0 {
  protocol C;
  startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    # we will keep this commented until tested successfully:
    become-primary-on both;
  }
  net {
    # the encryption part can be omitted when using a dedicated link for DRBD only:
    # cram-hmac-alg sha1;
    # shared-secret anysecrethere123;
    allow-two-primaries;
  }
  on node1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.23.128:7789;
    meta-disk internal;
  }
  on node2 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.23.129:7789;
    meta-disk internal;
  }
}
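I also notice this resource carries no fencing policy or fence-peer handler; from the DRBD docs, a dual-primary setup would normally have something along these lines (the handler path is a guess based on the obliterate script mentioned below, adjust to wherever it is actually installed):

    disk {
      fencing resource-and-stonith;    # freeze I/O until the peer is confirmed fenced
    }
    handlers {
      # invoked as "/sbin/drbdadm fence-peer res0" when replication breaks
      fence-peer "/usr/lib/drbd/obliterate-peer.sh";
    }
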
Regards,
Zohair Raza

On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza <engineerzuhairraza@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi, <div><br></div><div>thanks for explanation</div><div><div class="im"><div dir="ltr"><div><br></div>
<div>
On Mon, Oct 29, 2012 at 9:26 PM, Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span> wrote:</div>
</div></div><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">When a node stops responding, it can not be assumes to be dead. It has<br>
to be put into a known state, and that is what fencing does. Disabling<br>
fencing is like driving without a seatbelt.<br>
<br>
Ya, it'll save you a bit of time at first, but the first time you get a<br>
split-brain, you are going right through the windshield. Will you think<br>
it was worth it then?<br>
<br>
A split-brain is when neither node fails, but they can't communicate<br>
anymore. If each assumes the other is gone and begins using the shared<br>
storage without coordinating with it's peer, your data will be corrupted<br>
very, very quickly.<br></blockquote><div><br></div></div><div>In my scenario, I have two Samba servers in two different locations so chances of this are obvious </div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>> Heck, even if all you had was a floating IP; disabling fencing means
>> that both nodes would try to use that IP. Ask what your
>> clients/switches/routers think of that.
>
> I don't need a floating IP, as I am not looking for high availability but for two Samba servers synced with each other, so roaming employees can have faster access to their files at both locations. I skipped fencing as per Mautris's suggestion, but I still can't figure out why the fencing daemon was not able to fence the other node.
<div class="im">
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Please read this for more details;<br>
<br>
<a href="https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing" target="_blank">https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing</a><br>
<br></blockquote><div><br></div></div><div>Will have a look</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>> digimer
>>
>> On 10/29/2012 03:03 AM, Zohair Raza wrote:
>>> Hi,
>>>
>>> I have set up a Primary/Primary cluster with GFS2.
>>>
>>> All works well if I shut down a node cleanly, but when I unplug the
>>> power of a node, GFS freezes and I cannot access the device.
>>>
>>> I tried to use http://people.redhat.com/lhh/obliterate
>>>
>>> This is what I see in the logs:
>>>
>>> Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>>> Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>>> Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>>> Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>>> Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>>> Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>>> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>>> Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>>> Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>>> Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>>> Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
>>> Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
>>> Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 29 08:05:53 node1 corosync[1346]: [CPG ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>>> Oct 29 08:05:53 node1 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
>>> Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>>> Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>>> Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>>> Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>>> Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>>> Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>>>
>>> Regards,
>>> Zohair Raza
>>>
>>> _______________________________________________
>>> drbd-user mailing list
>>> drbd-user@lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?