Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have rebuilt the setup and enabled fencing. Manual fencing
(fence_ack_manual) works fine when I fence a dead node from the command
line, but it is not triggered automatically.

Logs:

Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to acquire journal lock...
Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual result: error from agent
Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
Oct 30 12:05:55 node1 fenced[1414]: fencing node node2

cluster.conf:

<?xml version="1.0"?>
<cluster name="cluster1" config_version="3">
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
        <clusternode name="node1" votes="1" nodeid="1">
            <fence>
                <method name="single">
                    <device name="manual" ipaddr="192.168.23.128"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2" votes="1" nodeid="2">
            <fence>
                <method name="single">
                    <device name="manual" ipaddr="192.168.23.129"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
    </fencedevices>
</cluster>

drbd.conf:

resource res0 {
    protocol C;
    startup {
        wfc-timeout 20;
        degr-wfc-timeout 10;
        # we will keep this commented until tested successfully:
        become-primary-on both;
    }
    net {
        # the encryption part can be omitted when using a dedicated link for DRBD only:
        # cram-hmac-alg sha1;
        # shared-secret anysecrethere123;
        allow-two-primaries;
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.23.128:7789;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.23.129:7789;
        meta-disk internal;
    }
}
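One thing I notice while posting this: the drbd.conf above has no fencing
or handlers section, even though the Oct 29 logs in the quoted thread below
show DRBD invoking /sbin/drbdadm fence-peer. For reference, here is a
minimal sketch of what I understand the DRBD side should look like with the
obliterate-peer helper from the link in the quoted thread; the install path
and my reading of the exit-code convention are assumptions on my part, not
tested configuration:

resource res0 {
    disk {
        # suspend I/O and call the fence-peer handler when the peer
        # becomes unreachable, instead of writing on blindly
        fencing resource-and-stonith;
    }
    handlers {
        # assumed install path for the obliterate-peer script
        # (http://people.redhat.com/lhh/obliterate); it calls fence_node
        # against the peer and, as I understand it, exits 7 on success
        # so DRBD can resume I/O
        fence-peer "/usr/local/sbin/obliterate-peer.sh";
    }
}

If I read the DRBD manual right, resource-and-stonith keeps I/O frozen
until the handler confirms the peer really was fenced, which should close
the window where both Primaries write independently.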
Regards,
Zohair Raza

On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
<engineerzuhairraza at gmail.com> wrote:
> Hi,
>
> thanks for the explanation
>
> On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists at alteeve.ca> wrote:
>
>> When a node stops responding, it cannot be assumed to be dead. It has
>> to be put into a known state, and that is what fencing does. Disabling
>> fencing is like driving without a seatbelt.
>>
>> Ya, it'll save you a bit of time at first, but the first time you get a
>> split-brain, you are going right through the windshield. Will you think
>> it was worth it then?
>>
>> A split-brain is when neither node fails, but they can't communicate
>> anymore. If each assumes the other is gone and begins using the shared
>> storage without coordinating with its peer, your data will be corrupted
>> very, very quickly.
>
> In my scenario, I have two Samba servers in two different locations, so
> the chances of this are obvious.
>
>> Heck, even if all you had was a floating IP, disabling fencing means
>> that both nodes would try to use that IP. Ask what your
>> clients/switches/routers think of that.
>
> I don't need a floating IP, as I am not looking for high availability
> but two Samba servers synced with each other so roaming employees can
> have faster access to their files in both locations. I skipped fencing
> as per Mautris's suggestion, but I still can't figure out why the
> fencing daemon was not able to fence the other node.
>
>> Please read this for more details:
>>
>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
>
> Will have a look
>
>> digimer
>>
>> On 10/29/2012 03:03 AM, Zohair Raza wrote:
>> > Hi,
>> >
>> > I have set up a Primary/Primary cluster with GFS2.
>> >
>> > All works well if I shut down a node normally, but when I unplug the
>> > power of either node, GFS freezes and I cannot access the device.
>> >
>> > Tried to use http://people.redhat.com/lhh/obliterate
>> >
>> > This is what I see in the logs:
>> >
>> > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in time.
>> > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown )
>> > conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure -> Unconnected )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected -> WFConnection )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0
>> > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm fence-peer res0 exit code 1 (0x100)
>> > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken, returned 1
>> > Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed, forming new configuration.
>> > Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
>> > Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> > Oct 29 08:05:53 node1 corosync[1346]: [CPG ] chosen downlist: sender r(0) ip(192.168.23.128) ; members(old:2 left:1)
>> > Oct 29 08:05:53 node1 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
>> > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>> > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1: Trying to acquire journal lock...
>> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>> > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>> > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent fence_ack_manual result: error from agent
>> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>> >
>> > Regards,
>> > Zohair Raza
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
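P.S. For the archives: when fenced is stuck retrying like above, this is
what I run by hand, only after physically confirming that the peer is
dead. This is the cluster-3 / RHEL 6 invocation as it works for me; if I
remember right, older releases wanted the node name passed with -n:

    # on the surviving node, once node2 is verified powered off:
    fence_ack_manual node2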