[DRBD-user] Pacemaker resource start failure in combination of drbd-utils-8.9.3 and drbd-8.4.6

Andreas Mock andreas.mock at drumedar.de
Tue Jun 30 15:32:17 CEST 2015



Hi Lars,

Looking at the patch, I'm pretty sure that there is another
bug lurking.

I diffed the drbd OCF resource agent between versions
8.9.2 and 8.9.3:

-------------------8<-------------------------
--- ./ocf/resource.d/linbit/drbd        2015-06-30 09:40:15.787666004 +0200
+++ ./ocf/resource.d/linbit/drbd.8.9.3  2015-06-30 15:13:58.086771560 +0200
@@ -308,7 +308,7 @@
        DRBD_DSTATE_LOCAL=()
        DRBD_DSTATE_REMOTE=()

-       eval "$($DRBDSETUP "$DRBD_RESOURCE" sh-status)"
+       eval "$($DRBDADM sh-status "$DRBD_RESOURCE")"

        # if there was no output at all, or a weird output
        # make sure the status arrays won't be empty.
@@ -916,11 +916,11 @@
        check_binary $DRBDSETUP
        # XXX I really take cibadmin, sed, grep, etc. for granted.

-       local VERSION DRBDADM_VERSION_CODE=0
+       local VERSION DRBD_KERNEL_VERSION_CODE=0
        if VERSION="$($DRBDADM --version 2>/dev/null)"; then
                eval $VERSION
        fi
-       if (( $DRBDADM_VERSION_CODE >= 0x080400 )); then
+       if (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then
                DRBD_HAS_MULTI_VOLUME=true
        fi
        check_crm_feature_set
@@ -1037,6 +1037,7 @@
        for i in $OCF_RESKEY_adjust_master_score; do
                [[ $i = *[!0-9]* ]]   && fallback=true && ocf_log err "BAD adjust_master_score value $i ; falling back to default"
                [[ $j && $i -lt $j ]] && fallback=true && ocf_log err "BAD adjust_master_score value $j > $i ; falling back to default"
+               j=$i
                n=$(( n+1 ))
        done
        [[ $n != 4 ]] && fallback=true && ocf_log err "Not enough adjust_master_score values ($n != 4); falling back to default"

-------------------8<-------------------------

I'm pretty sure that the interesting part is the first change:
the switch from 'drbdsetup $RES sh-status' to 'drbdadm sh-status $RES'.
When I run 'drbdadm sh-status $RES' I get a warning on STDERR:
"This command will ignore resource names!"
Looking at the output, I suspect that the shell variables API
differs between the two. I have two multi-volume resources
in my cluster. In the "good" case I get one assignment per variable;
in the "bad" case I get several lines. Depending on the order in
which the lines are evaluated, bad status is reported back to the
OCF agent, which then starts to do the wrong things based on that
bad information.
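
A minimal sketch of the effect I suspect, using made-up variable names
and values (demo_res, demo_cstate and demo_role are hypothetical; this is
not the real sh-status output): when the evaluated text covers more than
one resource, later assignments silently overwrite earlier ones, so the
caller only sees the state of whichever resource was printed last.

```shell
#!/bin/sh
# Hypothetical status text for one resource (the "good" case).
good_output='demo_res=r0
demo_cstate=Connected
demo_role=Primary'

# Hypothetical status text that ignores the resource name and
# therefore covers two resources (the "bad" case).
bad_output='demo_res=r0
demo_cstate=Connected
demo_role=Primary
demo_res=r1
demo_cstate=StandAlone
demo_role=Secondary'

# Evaluate the text in a subshell and print what the caller would see.
status_after_eval() {
    (
        eval "$1"
        printf '%s %s %s\n' "$demo_res" "$demo_cstate" "$demo_role"
    )
}

good_state=$(status_after_eval "$good_output")
bad_state=$(status_after_eval "$bad_output")

echo "single resource: $good_state"
echo "all resources:   $bad_state"
```

In the second call, the caller asked about r0 but ends up with the
values of r1.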

Hopefully this helps.

Best regards
Andreas Mock




> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Tuesday, 30 June 2015 14:09
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Pacemaker resource start failure in combination
> of drbd-utils-8.9.3 and drbd-8.4.6
> 
> On Tue, Jun 30, 2015 at 11:14:01AM +0200, Lars Ellenberg wrote:
> > On Tue, Jun 30, 2015 at 05:57:17PM +0900, Hiroshi Fujishima wrote:
> > > >>>>> In <20150630082043.GH7381 at soda.linbit>
> > > >>>>>	Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
> > > > On Tue, Jun 30, 2015 at 10:16:42AM +0200, Lars Ellenberg wrote:
> > > > > On Tue, Jun 30, 2015 at 09:46:49AM +0900, Hiroshi Fujishima wrote:
> > > > > > Hello
> > > > > >
> > > > > > In combination of drbd-utils-8.9.3 and drbd-8.4.6, the following
> > > > > > command failed to start drbd resource.
> > > > > >
> > > > > > # systemctl stop pacemaker
> > > > > > # rmmod drbd
> > > > > > # systemctl start pacemaker
> > > > > >
> > > > > > # crm_mon
> > > > > > Failed actions:
> > > > > > >     res_drbd_r0_start_0 on sac-tkh-sv002 'unknown error' (1): call=7, status=complete, last-rc-change='Tue Jun 30 09:33:29 2015', queued=0ms, exec=67ms
> > > > > >
> > > > > > > Jun 30 09:33:29 sac-tkh-sv002 drbd(res_drbd_r0)[26761]: ERROR: r0: Called drbdadm -c /etc/drbd.conf syncer r0
> 
> Try this:
> 
> From 07289b456d662379d36d964742ea71331933d1cb Mon Sep 17 00:00:00 2001
> From: Lars Ellenberg <lars.ellenberg at linbit.com>
> Date: Tue, 30 Jun 2015 11:51:41 +0200
> Subject: [PATCH] drbd.ocf: fix drbd module version detection of unloaded
>  module
> 
> ---
>  scripts/drbd.ocf | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/scripts/drbd.ocf b/scripts/drbd.ocf
> index 0733a8d..e339841 100644
> --- a/scripts/drbd.ocf
> +++ b/scripts/drbd.ocf
> @@ -926,6 +926,18 @@ drbd_validate_all () {
>  	if VERSION="$($DRBDADM --version 2>/dev/null)"; then
>  		eval $VERSION
>  	fi
> +	if (( $DRBD_KERNEL_VERSION_CODE == 0x0 )) ; then
> +		# Maybe the DRBD module was not loaded (yet).
> +		# I don't want to load the module here,
> +		# maybe this is just a probe or stop.
> +		# It will be loaded on "start", though.
> +		# Instead, look at modinfo output.
> +		# Newer drbdadm does this implicitly, but may reexec older
> +		# drbdadm versions for compatibility reasons.
> +		DRBD_KERNEL_VERSION_CODE=$(printf "0x%02x%02x%02x" $(
> +			modinfo -F version drbd |
> +			sed -ne 's/^\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\).*$/\1 \2 \3/p'))
> +	fi
>  	if (( $DRBD_KERNEL_VERSION_CODE >= 0x080400 )); then
>  		DRBD_HAS_MULTI_VOLUME=true
>  	fi
> --
> 1.9.1
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
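
For reference, the version fallback in the quoted patch can be tried
outside the agent with something like the sketch below. The helper name
version_to_code and the variable multi_volume are hypothetical. The
version string is passed in directly instead of being read from
'modinfo -F version drbd', and the sed pattern uses [0-9][0-9]* in place
of the GNU-only \+ so that it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical helper: turn "MAJOR.MINOR.PATCH[suffix]" into 0xMMmmpp.
version_to_code() {
    # Extract the three leading numeric components of the version string.
    set -- $(printf '%s\n' "$1" |
        sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\).*$/\1 \2 \3/p')
    # No match: fall back to 0 0 0, i.e. "version unknown".
    [ "$#" -eq 3 ] || set -- 0 0 0
    printf '0x%02x%02x%02x\n' "$1" "$2" "$3"
}

code=$(version_to_code "8.4.6")
echo "$code"    # prints 0x080406

# Same comparison as in the agent: 8.4.0 or newer means multi-volume support.
multi_volume=false
if [ "$(($code))" -ge "$((0x080400))" ]; then
    multi_volume=true
fi
echo "multi volume: $multi_volume"
```

An unparsable version string yields 0x000000 here, which keeps the
multi-volume check false instead of failing.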


