Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have rebuilt the setup and enabled fencing.
Manual fencing (fence_ack_manual) works okay when I fence a dead node
from the command line, but it does not happen automatically.
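For reference, the manual acknowledgement that does work looks like this
(the exact flags vary between cluster releases, so treat the syntax as an
assumption for this version):

    # run only after confirming node2 is really powered off
    fence_ack_manual node2
    # some older releases want: fence_ack_manual -n node2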
Logs:
Oct 30 12:05:52 node1 kernel: dlm: closing connection to node 2
Oct 30 12:05:52 node1 fenced[1414]: fencing node node2
Oct 30 12:05:52 node1 kernel: GFS2: fsid=cluster1:gfs.0: jid=1: Trying to
acquire journal lock...
Oct 30 12:05:52 node1 fenced[1414]: fence node2 dev 0.0 agent fence_manual
result: error from agent
Oct 30 12:05:52 node1 fenced[1414]: fence node2 failed
Oct 30 12:05:55 node1 fenced[1414]: fencing node node2
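While it loops like this, the fence domain can be inspected from the
shell; a quick sketch, assuming the cluster 3.x tools that match the
fenced version in these logs:

    fence_tool ls     # fence domain state, including any pending victim
    cman_tool nodes   # membership as cman sees it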
cluster.conf:
<?xml version="1.0"?>
<cluster name="cluster1" config_version="3">
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
        <clusternode name="node1" votes="1" nodeid="1">
            <fence>
                <method name="single">
                    <device name="manual" ipaddr="192.168.23.128"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2" votes="1" nodeid="2">
            <fence>
                <method name="single">
                    <device name="manual" ipaddr="192.168.23.129"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
    </fencedevices>
</cluster>
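As I understand it, fence_manual always returns "error from agent" until
a human acknowledges the fence, so it can never complete unattended. For
automatic recovery a real power fencing agent would be needed; a minimal
sketch assuming IPMI-capable nodes (the agent choice, BMC address, and
credentials are illustrative placeholders, not from my setup; node2 would
get an analogous entry):

    <clusternode name="node1" votes="1" nodeid="1">
        <fence>
            <method name="ipmi">
                <device name="ipmi1"/>
            </method>
        </fence>
    </clusternode>
    ...
    <fencedevices>
        <fencedevice name="ipmi1" agent="fence_ipmilan"
            ipaddr="192.168.23.200" login="admin" passwd="secret" lanplus="1"/>
    </fencedevices>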
drbd.conf:
resource res0 {
    protocol C;
    startup {
        wfc-timeout 20;
        degr-wfc-timeout 10;
        # we will keep this commented until tested successfully:
        become-primary-on both;
    }
    net {
        # the encryption part can be omitted when using a dedicated link for DRBD only:
        # cram-hmac-alg sha1;
        # shared-secret anysecrethere123;
        allow-two-primaries;
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.23.128:7789;
        meta-disk internal;
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.23.129:7789;
        meta-disk internal;
    }
}
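From what I have read, DRBD also has to be told to call out to the
cluster's fencing when the peer vanishes, otherwise dual-primary I/O just
freezes. A sketch of the additions I believe are usual (the handler path
is an assumption; the obliterate script mentioned in the quoted thread
below, or the rhcs_fence script shipped with DRBD 8.4, are the candidates
I have seen named):

    resource res0 {
        ...
        disk {
            fencing resource-and-stonith;
        }
        handlers {
            # called when the peer is lost; must confirm the peer was fenced
            fence-peer "/usr/lib/drbd/rhcs_fence";
        }
    }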
Regards,
Zohair Raza
On Tue, Oct 30, 2012 at 12:39 PM, Zohair Raza
<engineerzuhairraza at gmail.com> wrote:
> Hi,
>
> thanks for explanation
>
> On Mon, Oct 29, 2012 at 9:26 PM, Digimer <lists at alteeve.ca> wrote:
>
>> When a node stops responding, it cannot be assumed to be dead. It has
>> to be put into a known state, and that is what fencing does. Disabling
>> fencing is like driving without a seatbelt.
>>
>> Ya, it'll save you a bit of time at first, but the first time you get a
>> split-brain, you are going right through the windshield. Will you think
>> it was worth it then?
>>
>> A split-brain is when neither node fails, but they can't communicate
>> anymore. If each assumes the other is gone and begins using the shared
>> storage without coordinating with its peer, your data will be corrupted
>> very, very quickly.
>>
>
> In my scenario, I have two Samba servers in two different locations, so
> the chances of this happening are high.
>
>>
>> Heck, even if all you had was a floating IP; disabling fencing means
>> that both nodes would try to use that IP. Ask what your
>> clients/switches/routers would think of that.
>>
> I don't need a floating IP, as I am not looking for high availability but
> two Samba servers synced with each other, so roaming employees can have
> faster access to their files at both locations. I skipped fencing as per
> Mautris's suggestion, but I still can't figure out why the fencing daemon
> was not able to fence the other node.
>
>
>> Please read this for more details;
>>
>>
>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
>>
>>
> Will have a look
>
>
>> digimer
>>
>> On 10/29/2012 03:03 AM, Zohair Raza wrote:
>> > Hi,
>> >
>> > I have setup a Primary/Primary cluster with GFS2.
>> >
>> > All works well if I shut down a node normally, but when I unplug
>> > the power of a node, GFS freezes and I cannot access the device.
>> >
>> > Tried to use http://people.redhat.com/lhh/obliterate
>> >
>> > this is what I see in logs
>> >
>> > Oct 29 08:05:41 node1 kernel: d-con res0: PingAck did not arrive in
>> time.
>> > Oct 29 08:05:41 node1 kernel: d-con res0: peer( Primary -> Unknown )
>> > conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0
>> > -> 1 )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: asender terminated
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Terminating asender thread
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Connection closed
>> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( NetworkFailure ->
>> > Unconnected )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver terminated
>> > Oct 29 08:05:41 node1 kernel: d-con res0: Restarting receiver thread
>> > Oct 29 08:05:41 node1 kernel: d-con res0: receiver (re)started
>> > Oct 29 08:05:41 node1 kernel: d-con res0: conn( Unconnected ->
>> > WFConnection )
>> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm
>> > fence-peer res0
>> > Oct 29 08:05:41 node1 fence_node[1912]: fence node2 failed
>> > Oct 29 08:05:41 node1 kernel: d-con res0: helper command: /sbin/drbdadm
>> > fence-peer res0 exit code 1 (0x100)
>> > Oct 29 08:05:41 node1 kernel: d-con res0: fence-peer helper broken,
>> > returned 1
>> > Oct 29 08:05:48 node1 corosync[1346]: [TOTEM ] A processor failed,
>> > forming new configuration.
>> > Oct 29 08:05:53 node1 corosync[1346]: [QUORUM] Members[1]: 1
>> > Oct 29 08:05:53 node1 corosync[1346]: [TOTEM ] A processor joined or
>> > left the membership and a new membership was formed.
>> > Oct 29 08:05:53 node1 corosync[1346]: [CPG ] chosen downlist: sender
>> > r(0) ip(192.168.23.128) ; members(old:2 left:1)
>> > Oct 29 08:05:53 node1 corosync[1346]: [MAIN ] Completed service
>> > synchronization, ready to provide service.
>> > Oct 29 08:05:53 node1 kernel: dlm: closing connection to node 2
>> > Oct 29 08:05:53 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:53 node1 kernel: GFS2: fsid=cluster-setup:res0.0: jid=1:
>> > Trying to acquire journal lock...
>> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 dev 0.0 agent
>> > fence_ack_manual result: error from agent
>> > Oct 29 08:05:53 node1 fenced[1401]: fence node2 failed
>> > Oct 29 08:05:56 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 dev 0.0 agent
>> > fence_ack_manual result: error from agent
>> > Oct 29 08:05:56 node1 fenced[1401]: fence node2 failed
>> > Oct 29 08:05:59 node1 fenced[1401]: fencing node node2
>> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 dev 0.0 agent
>> > fence_ack_manual result: error from agent
>> > Oct 29 08:05:59 node1 fenced[1401]: fence node2 failed
>> >
>> > Regards,
>> > Zohair Raza
>> >
>> >
>> >
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>
>