[DRBD-user] fence-peer

Fri Sep 30 11:47:05 CEST 2011

Just realized that the email went out in private. Resending to the list, sorry.

On Thu, 29 Sep 2011 09:17:48 -0700, Digimer <linux at alteeve.com> wrote:
> On 09/29/2011 02:55 AM, Kaloyan Kovachev wrote:
>> Hi list,
>>  I am about to upgrade DRBD on a RHCM cluster where GFS2 is used (dual
>> primary mode). Previously I was using the outdate-peer script, modified
>> to call fence_node in case the peer cannot be reached over SSH. In the
>> new version I can see the outdate-peer handler is replaced by fence-peer
>> and the script executed is crm-fence-peer, but the problem is that the
>> cluster is not using Pacemaker. So here are the questions:
>> 
>>  1. When using resource-and-stonith, should the script always exit with
>> 7, or is it OK to keep using the modified outdate-peer and return 7 only
>> if the peer was fenced, which happens only if it can't be contacted via
>> SSH, i.e. at the end of the script the RV is still 5 and the cluster is
>> quorate, the node status is Offline or fence_node was executed?

actually the answer is in the drbd.conf manual:
"resource-and-stonith
If a node becomes a disconnected primary, it freezes all its IO operations
and calls its fence-peer handler. The fence-peer handler is supposed to
reach the peer over alternative communication paths and call 'drbdadm
outdate res' there. In case it cannot reach the peer it should stonith the
peer. IO is resumed as soon as the situation is resolved. In case your
handler fails, you can resume IO with the resume-io command."

So if nothing has changed (except the handler name), it should be OK (and
expected) to return other exit codes if 'drbdadm outdate res' succeeds.
Maybe I should have asked 'are there any changes except the name?'
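
For reference, a minimal sketch of how the fencing policy and the handler
are wired together in drbd.conf, assuming 8.3-style syntax. The resource
name r0 and the script path are placeholders for illustration, not taken
from a real setup:

```
resource r0 {                  # r0 is a placeholder resource name
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    # placeholder path; point this at your own handler script
    fence-peer "/usr/local/sbin/my-fence-peer.sh";
  }
}
```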

>> 
>>  2. If any code in addition to 7 is allowed, which codes will lead to
>> unfreezing the IO and which will keep blocking it? In case of
>> Inquorate cluster status or fence failure it is preferable to keep it
>> blocked. Will returning 6 in this case lead to calling some of the
>> pri-lost handlers, i.e. commit suicide?
> 
> I don't know about the exit codes, but I use Lon's obliterate-peer.sh
> script in both DRBD 8.3.9 on EL5 (RHCS stable2) and DRBD 8.3.11 on EL6
> (RHCS stable 3) to protect my dual-primary setups. It works great.
> 

Thank you for the links. Yes, I looked at the obliterate-peer.sh script
previously (when building the cluster), but it works for two nodes only and
has no option to outdate a single resource. If there is more than one
resource and just one has failed, there is no need to fence the peer and
drop all resources. That is why I opted to use the outdate-peer.sh script
and execute fence_node at the end (just like obliterate-peer.sh does) only
if the outdate was not successful.

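# RV is 5 when the peer could not be reached to outdate the resource
if [ "$RV" -eq 5 ]; then
	# fall back to fencing the peer; report 7 only if fencing succeeded
	if fence_node "$DRBD_PEER"; then
		RV=7
	fi
fi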

Now I would like to improve it a bit, so my second question was actually
about the proper exit code when fencing failed (or the cluster is not
quorate), which currently is 5. Is that considered "In case your handler
fails", or should it be 1 or 6?
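
To make the question concrete, here is a rough sketch of the decision at the
end of the handler. FENCE_CMD, FENCE_FAILED_RV and final_rv are hypothetical
names made up for this illustration. The value returned when fencing fails
is kept in a variable on purpose, because which code is correct there is
exactly the open question:

```shell
#!/bin/bash
# Sketch only: pick the fence-peer handler's exit code.
# FENCE_CMD and FENCE_FAILED_RV are hypothetical names for illustration.

FENCE_CMD=${FENCE_CMD:-fence_node}       # command used to fence the peer
FENCE_FAILED_RV=${FENCE_FAILED_RV:-5}    # open question: 5, 1 or 6?

# $1 = result of the outdate attempt, $2 = peer node name
final_rv() {
    local rv=$1 peer=$2
    if [ "$rv" -eq 5 ]; then
        # the outdate attempt could not reach the peer: try to fence it
        if "$FENCE_CMD" "$peer"; then
            rv=7                      # peer was fenced
        else
            rv=$FENCE_FAILED_RV       # fencing failed (or not quorate)
        fi
    fi
    echo "$rv"
}
```

In the real script the last line would then be something like
exit "$(final_rv "$RV" "$DRBD_PEER")", so any result other than 5 from the
outdate attempt is passed through unchanged.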

> Here is how I use it. The doc is for an older version but it works the
> same. If you have the latest version of drbd, just replace
> 'outdate-peer' with 'fence-peer'.
> 
> *note* - This link is part of an *incomplete* tutorial. The DRBD section
> is finished though.
> 
> https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Configuring_DRBD_Global_and_Common_Options
> 
> I keep a copy of the obliterate-peer.sh script here;
> 
> https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh
> 
> I install it with;
> 
> wget -c https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
> chmod a+x /sbin/obliterate-peer.sh
> ls -lah /sbin/obliterate-peer.sh
> 
> If you want to find the source, do a search for "obliterate-peer.sh Lon
> Hohberger".
> 
> Sorry that this doesn't answer your question directly, but hopefully it
> will help. :)