Manual fencing is not in any way supported. You must be able to call
'fence_node <peer>' and have the remote node reset. If this doesn't
happen, your fencing is not sufficient.

On 10/30/2012 05:43 AM, Zohair Raza wrote:
> I have rebuilt the setup and enabled fencing.
>
> Manual fencing (fence_ack_manual) works okay when I fence a dead node
> from the command line, but it does not happen automatically.
>
> Logs:
>
> Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
> Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
> Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
> Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
> Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
>
> Cluster.conf:
>
> <?xml version="1.0"?>
> <cluster name="cluster1" config_version="3">
>   <cman two_node="1" expected_votes="1"/>
>   <clusternodes>
>     <clusternode name="node1" votes="1" nodeid="1">
>       <fence>
>         <method name="single">
>           <device name="manual" ipaddr="192.168.23.128"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2" votes="1" nodeid="2">
>       <fence>
>         <method name="single">
>           <device name="manual" ipaddr="192.168.23.129"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <fencedevices>
>     <fencedevice name="manual" agent="fence_manual"/>
>   </fencedevices>
> </cluster>
>
> drbd.conf:
>
> resource res0 {
>   protocol C;
>   startup {
>     wfc-timeout 20;
>     degr-wfc-timeout 10;
>     # we will keep this commented until tested successfully:
>     become-primary-on both;
>   }
>   net {
>     # the encryption part can be omitted when using a dedicated link
>     # for DRBD only:
>     # cram-hmac-alg sha1;
>     # shared-secret anysecrethere123;
>     allow-two-primaries;
>   }
>   on node1 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.128:7789;
>     meta-disk internal;
>   }
>   on node2 {
>     device /dev/drbd0;
>     disk /dev/sdb1;
>     address 192.168.23.129:7789;
>     meta-disk internal;
>   }
> }
>
> Regards,
> Zohair Raza
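That cluster.conf is exactly the unsupported pattern described above:
fence_manual has no way to reset anything on its own. For comparison,
a rough sketch of the same file using a real power-fencing agent looks
like this. fence_ipmilan is just one common choice, and the BMC
addresses and credentials below are invented placeholders, not values
from this thread:

    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="ipmi">
          <device name="ipmi_node1"/>
        </method>
      </fence>
    </clusternode>
    <!-- node2 is identical, referencing ipmi_node2 -->

    <fencedevices>
      <fencedevice name="ipmi_node1" agent="fence_ipmilan" lanplus="1"
                   ipaddr="10.20.0.1" login="admin" passwd="secret"/>
      <fencedevice name="ipmi_node2" agent="fence_ipmilan" lanplus="1"
                   ipaddr="10.20.0.2" login="admin" passwd="secret"/>
    </fencedevices>

After editing, bump config_version and push the change out with
'cman_tool version -r' so both nodes load it.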
> On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
> <engineerzuhairraza at gmail.com> wrote:
>
>     Hi,
>
>     thanks for the explanation
>
>     On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists at alteeve.ca> wrote:
>
>         When a node stops responding, it cannot be assumed to be dead.
>         It has to be put into a known state, and that is what fencing
>         does. Disabling fencing is like driving without a seatbelt.
>
>         Ya, it'll save you a bit of time at first, but the first time
>         you get a split-brain, you are going right through the
>         windshield. Will you think it was worth it then?
>
>         A split-brain is when neither node fails, but they can't
>         communicate anymore. If each assumes the other is gone and
>         begins using the shared storage without coordinating with its
>         peer, your data will be corrupted very, very quickly.
>
>     In my scenario, I have two Samba servers in two different
>     locations, so the chances of this are obvious.
>
>         Heck, even if all you had was a floating IP, disabling fencing
>         means that both nodes would try to use that IP. Ask what your
>         clients/switches/routers think of that.
>
>     I don't need a floating IP, as I am not looking for high
>     availability but two Samba servers synced with each other, so that
>     roaming employees can have faster access to their files in both
>     locations. I skipped fencing as per Mautris's suggestion, but I
>     still can't figure out why the fencing daemon was not able to
>     fence the other node.
>
>         Please read this for more details:
>
>         https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
>
>     Will have a look
>
>         digimer
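The requirement at the top of this message can be checked directly. A
minimal test, assuming the node names from the cluster.conf above:

    # Run on node1 while node2 is healthy: node2 must power off or
    # reset with no human intervention.
    fence_node node2

    # fenced logs the agent's verdict to syslog:
    grep fenced /var/log/messages | tail

fence_manual cannot pass this unattended: the agent just waits for an
operator to run fence_ack_manual, which matches the 'result: error
from agent' retry loop in both sets of logs in this thread.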
>         On 10/29/2012 03:03 AM, Zohair Raza wrote:
>         > Hi,
>         >
>         > I have set up a Primary/Primary cluster with GFS2.
>         >
>         > All works well if I shut down either node cleanly, but when I
>         > unplug the power of a node, GFS2 freezes and I cannot access
>         > the device.
>         >
>         > Tried to use http://people.redhat.com/lhh/obliterate
>         >
>         > This is what I see in the logs:
>         >
>         > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>         > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown )
>         > conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>         > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>         > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>         > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>         > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>         > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>         > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>         > Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
>         > Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
>         > Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>         > Oct 29 08:05:53 node1 corosync[1346]: [CPG ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>         > Oct 29 08:05:53 node1 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
>         > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>         > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>         > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>         > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>         >
>         > Regards,
>         > Zohair Raza

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
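A closing note on the Oct 29 logs: DRBD ran 'drbdadm fence-peer res0'
and the helper "returned 1", which DRBD reports as broken, so the
running config evidently had a fence-peer handler even though the
drbd.conf quoted above shows none. For a dual-primary resource under
GFS2, the hookup generally looks like the sketch below. The handler
path is an assumption; obliterate-peer.sh (from the
people.redhat.com/lhh link in the thread) is one such helper, and it
simply calls fence_node against the peer:

    resource res0 {
        disk {
            # Suspend I/O on peer loss until the peer is known fenced:
            fencing resource-and-stonith;
        }
        handlers {
            # DRBD calls this when it loses the peer; exit code 7 means
            # "peer has been fenced" and allows I/O to resume.
            fence-peer "/usr/local/sbin/obliterate-peer.sh";
        }
    }

Which circles back to the main point: this helper can only succeed
once 'fence_node <peer>' itself works.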