<div dir="ltr">fence_node <peer> doesn't work for me<div><br></div><div>fence_node node2 says </div><div><br></div><div>fence node2 failed<br><div><br></div><div><br clear="all"><div dir="ltr">Regards,<br>Zohair Raza<div>

On Tue, Oct 30, 2012 at 7:58 PM, Digimer <lists@alteeve.ca> wrote:

Manual fencing is not in any way supported. You must be able to call
'fence_node <peer>' and have the remote node reset. If this doesn't
happen, your fencing is not sufficient.
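
(As a quick sanity check, it can help to run the underlying fence agent
by hand before involving fenced at all. A minimal sketch, assuming the
nodes are libvirt/KVM guests and fence_virsh is installed -- the
hypervisor address and credentials below are placeholders, and on
physical hardware you would use an agent such as fence_ipmilan instead:

    # Ask the agent for the peer's power status (placeholders:
    # hypervisor IP, login, password; -n is the VM's domain name).
    fence_virsh -a 192.168.23.1 -l root -p secret -n node2 -o status

    # Only once that works, confirm the cluster itself can kill the peer:
    fence_node node2

If the agent call fails on its own, the problem is the agent or device
configuration, not the cluster.)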
<div><div class="h5"><br>
On 10/30/2012 05:43 AM, Zohair Raza wrote:
> I have rebuilt the setup and enabled fencing.
>
> Manual fencing (fence_ack_manual) works okay when I fence a dead
> node from the command line, but it does not happen automatically.
>
> Logs:
> Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
> Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
> Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
> Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
>
> Cluster.conf:
>
> <?xml version="1.0"?>
> <cluster name="cluster1" config_version="3">
>   <cman two_node="1" expected_votes="1"/>
>   <clusternodes>
>     <clusternode name="node1" votes="1" nodeid="1">
>       <fence>
>         <method name="single">
>           <device name="manual" ipaddr="192.168.23.128"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2" votes="1" nodeid="2">
>       <fence>
>         <method name="single">
>           <device name="manual" ipaddr="192.168.23.129"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <fencedevices>
>     <fencedevice name="manual" agent="fence_manual"/>
>   </fencedevices>
> </cluster>
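
(fence_manual is unsupported, and the "result: error from agent" lines
in the log above show fenced cannot actually reset anything with it. A
working cluster.conf needs a real power-fencing agent in its place. The
following is only a sketch, assuming libvirt/KVM guests and fence_virsh
-- the hypervisor address and credentials are placeholders; substitute
the agent that matches your hardware, e.g. fence_ipmilan for servers
with IPMI:

    <clusternode name="node2" votes="1" nodeid="2">
        <fence>
            <method name="power">
                <!-- port is the VM's domain name on the hypervisor -->
                <device name="virsh" port="node2" action="reboot"/>
            </method>
        </fence>
    </clusternode>
    ...
    <fencedevices>
        <fencedevice name="virsh" agent="fence_virsh"
                     ipaddr="192.168.23.1" login="root" passwd="secret"/>
    </fencedevices>
)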
>
> drbd.conf:
>
> resource res0 {
>   protocol C;
>
>   startup {
>     wfc-timeout      20;
>     degr-wfc-timeout 10;
>     # we will keep this commented until tested successfully:
>     become-primary-on both;
>   }
>
>   net {
>     # the encryption part can be omitted when using a dedicated
>     # link for DRBD only:
>     # cram-hmac-alg sha1;
>     # shared-secret anysecrethere123;
>     allow-two-primaries;
>   }
>
>   on node1 {
>     device    /dev/drbd0;
>     disk      /dev/sdb1;
>     address   192.168.23.128:7789;
>     meta-disk internal;
>   }
>
>   on node2 {
>     device    /dev/drbd0;
>     disk      /dev/sdb1;
>     address   192.168.23.129:7789;
>     meta-disk internal;
>   }
> }
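
(One piece this drbd.conf is missing: DRBD itself has to be told to
call a fence handler and to block I/O until fencing succeeds, otherwise
GFS2 hangs much as described in the quoted message below. A rough
sketch, assuming DRBD 8.4 and a cman-aware handler such as the
obliterate-peer script referenced later in this thread -- the path is
an assumption, and rhcs_fence shipped with newer drbd-utils plays the
same role. These sections go inside the resource:

    disk {
        # suspend I/O until the peer is confirmed fenced
        fencing resource-and-stonith;
    }
    handlers {
        # the handler must exit 7 on success so DRBD resumes I/O
        fence-peer "/usr/local/sbin/obliterate-peer.sh";
    }
)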
>
> Regards,
> Zohair Raza
>
>
> On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
> <engineerzuhairraza@gmail.com> wrote:
>
> Hi,
>
> Thanks for the explanation.
>
> On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists@alteeve.ca> wrote:
>
> When a node stops responding, it cannot be assumed to be dead. It
> has to be put into a known state, and that is what fencing does.
> Disabling fencing is like driving without a seatbelt.
>
> Ya, it'll save you a bit of time at first, but the first time you
> get a split-brain, you are going right through the windshield. Will
> you think it was worth it then?
>
> A split-brain is when neither node fails, but they can't
> communicate anymore. If each assumes the other is gone and begins
> using the shared storage without coordinating with its peer, your
> data will be corrupted very, very quickly.
>
>
> In my scenario, I have two Samba servers in two different
> locations, so the chances of this are obvious.
>
>
> Heck, even if all you had was a floating IP, disabling fencing
> means that both nodes would try to use that IP. Ask what your
> clients/switches/routers think of that.
>
> I don't need a floating IP, as I am not looking for high
> availability but for two Samba servers synced with each other, so
> roaming employees can have faster access to their files in both
> locations. I skipped fencing as per Mautris's suggestion, but I
> still can't figure out why the fencing daemon was not able to
> fence the other node.
>
>
> Please read this for more details:
>
> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
>
>
> Will have a look.
>
>
> digimer
>
> On 10/29/2012 03:03 AM, Zohair Raza wrote:
> > Hi,
> >
> > I have set up a Primary/Primary cluster with GFS2.
> >
> > All works well if I shut down a node cleanly, but when I unplug
> > the power of a node, GFS freezes and I cannot access the device.
> >
> > Tried to use http://people.redhat.com/lhh/obliterate
> >
> > This is what I see in the logs:
> >
> > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
> > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
> > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
> > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
> > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
> > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
> > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
> > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
> > Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
> > Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
> > Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Oct 29 08:05:53 node1 corosync[1346]: [CPG   ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
> > Oct 29 08:05:53 node1 corosync[1346]: [MAIN  ] Completed service synchronization, ready to provide service.
> > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
> > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
> > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
> > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
> > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
> >
> > Regards,
> > Zohair Raza
> >
> >
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person
> without access to education?
>
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?