[DRBD-user] DRBD with heartbeat - heartbeat is not able to start DRBD

balkod at centrum.sk balkod at centrum.sk
Sun Jan 21 07:10:59 CET 2007


Hello.
I would like to build a two-node cluster using DRBD and Heartbeat. I have two nodes (linux1 and linux2) running SUSE Linux (uname -a returns "linux1 2.6.5-7.244-default #1 Mon Dec 12 18:32:25 UTC 2005 i686 i686 i386 GNU/Linux"). I installed DRBD and Heartbeat from RPM packages. I can start DRBD manually with the commands /etc/init.d/drbd start; drbdadm primary shareddisk, and I can quite easily reach a consistent Primary/Secondary state.
 
Up to this point everything seems fine, but I cannot get DRBD to work when Heartbeat is managing it. I configured ha.cf, haresources and authkeys according to guides available on the Internet.
 
My ha.cf file contains:
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
baud    19200
serial  /dev/ttyS0
bcast   eth0
auto_failback off
node    linux1
node    linux2
ping 10.66.1.254
respawn hacluster /usr/lib/heartbeat/ipfail
 
My haresources file contains:
linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
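
Judging from the "Running ..." lines in my ha-log, Heartbeat seems to turn each haresources entry into a call to a script in /etc/ha.d/resource.d, with the "::"-separated parts passed as arguments and start or stop appended. A bare IP address appears to be shorthand for the IPaddr script. The sketch below only prints the command lines I would expect. expand_resource is a name I made up for illustration; it is not part of Heartbeat, and the IP check is deliberately loose.

```shell
#!/bin/sh
# Illustration only: expand_resource is a made-up helper, not a Heartbeat tool.
# It prints the command line that one haresources entry appears to expand to.
expand_resource() {
    entry=$1
    action=$2
    dir=/etc/ha.d/resource.d
    case $entry in
        [0-9]*.[0-9]*.[0-9]*.[0-9]*)
            # Loose check for a bare IP address: handled by the IPaddr script.
            echo "$dir/IPaddr $entry $action"
            ;;
        *)
            # script::arg1::arg2 becomes "script arg1 arg2".
            echo "$dir/$(printf '%s\n' "$entry" | sed 's/::/ /g') $action"
            ;;
    esac
}

expand_resource "10.66.1.153" start
expand_resource "drbddisk::shareddisk" start
expand_resource "Filesystem::/dev/drbd0::/shared::ext3" start
```

If that reading is right, the second call corresponds to the drbddisk invocation that fails in the log on linux1.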
 
authkeys file contains:
auth 1
1 crc
(On node linux2 this file has 2 instead of 1.)
 
My drbd.conf contains:
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
PROC_DRBD="/proc/drbd"
MODPROBE="modprobe"
RMMOD="rmmod"
UDEV_TIMEOUT=10
ADD_MOD_PARAM=""
if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi
test -f $DRBDADM || exit 5
function assure_module_is_loaded
{
    [ -e "$PROC_DRBD" ] && return
    $MODPROBE -s drbd `$DRBDADM sh-mod-parms` $ADD_MOD_PARAM || {
        echo "Can not load the drbd module."$'\n'; exit 20
    }
    # tell klogd to reload module symbol information ...
    [ -e /var/run/klogd.pid ] && [ -x /sbin/klogd ] && /sbin/klogd -i
    # make sure udev has time to create the device files
    RESOURCE=`$DRBDADM sh-resources` || exit 20
    RESOURCE=${RESOURCE%%\ *}
    DEVICE=`$DRBDADM sh-dev $RESOURCE` || exit 20
    while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
        sleep 1
        UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
    done
}
function adjust_with_progress
{
    IFS_O=$IFS
    NEWLINE='
'
    IFS=$NEWLINE
    local D=0
    local S=0
    local N=0
    COMMANDS=`$DRBDADM -d adjust all` || exit 20
    echo -n "[ "
    for CMD in $COMMANDS; do
        if echo $CMD | grep -q disk; then echo -n "d$D "; D=$(( D+1 ));
        elif echo $CMD | grep -q syncer; then echo -n "s$S "; S=$(( S+1 ));
        elif echo $CMD | grep -q net; then echo -n "n$N "; N=$(( N+1 ));
        else echo -n ".. ";
        fi
        IFS=$IFS_O
        $CMD || {
            echo -e "\ncmd $CMD failed!"; exit 20
        }
        IFS=$NEWLINE
    done
    echo -n "]"
    IFS=$IFS_O
}
case "$1" in
    start)
        echo -n "Starting DRBD resources:    "
        assure_module_is_loaded
        adjust_with_progress
        [ -d /var/lock/subsys ] && touch /var/lock/subsys/drbd  # for RedHat
        echo "."
        $DRBDADM wait_con_int # User interruptible version of wait_connect all
        ;;
    stop)
        echo -n "Stopping all DRBD resources"
        if [ -e $PROC_DRBD ] ; then
                $DRBDADM down all
                $RMMOD drbd
        fi
        [ -f /var/lock/subsys/drbd ] && rm /var/lock/subsys/drbd
        echo "."
        ;;
    status)
        # NEEDS to be heartbeat friendly...
        # so: put some "OK" in the output.
        if [ -e $PROC_DRBD ]; then
            echo "drbd driver loaded OK; device status:"
            cat $PROC_DRBD
            exit 0
        else
            echo >&2 "drbd not loaded"
            exit 3
        fi
        ;;
    reload)
        echo -n "Reloading DRBD configuration"
        $DRBDADM adjust all
        echo "."
        ;;
    restart|force-reload)
        echo -n "Restarting all DRBD resources"
        $DRBDADM down all
        $RMMOD drbd
        assure_module_is_loaded
        $DRBDADM up all
        echo "."
        ;;
    *)
        echo "Usage: /etc/init.d/drbd {start|stop|status|reload|restart|force-reload}"
        exit 1
        ;;
esac
exit 0

 
drbd module is available as follows:
linux2:~ # modprobe -l drbd
/lib/modules/2.6.5-7.244-default/extra/drbd.ko
 
I tried to start Heartbeat manually on both nodes (/etc/init.d/heartbeat start), first on linux1 and then on linux2.
The ha-log on linux1 contains:
heartbeat: 2007/01/06_13:48:15 info: **************************
heartbeat: 2007/01/06_13:48:15 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/06_13:48:15 info: heartbeat: version 1.2.3
heartbeat: 2007/01/06_13:48:15 info: Heartbeat generation: 26
heartbeat: 2007/01/06_13:48:16 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/06_13:48:16 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/06_13:48:16 info: ping heartbeat started.
heartbeat: 2007/01/06_13:48:16 info: pid 10218 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10219 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10220 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10221 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10222 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10216 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: Local status now set to: 'up'
heartbeat: 2007/01/06_13:48:16 info: Link linux1:eth0 up.
heartbeat: 2007/01/06_13:48:16 info: pid 10224 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: pid 10223 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/06_13:48:17 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/06_13:49:35 info: Link linux2:/dev/ttyS0 up.
heartbeat: 2007/01/06_13:49:35 info: Status update for node linux2: status up
heartbeat: 2007/01/06_13:49:35 info: Local status now set to: 'active'
heartbeat: 2007/01/06_13:49:35 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/06_13:49:36 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90  gid 90 (pid 10228)
heartbeat: 2007/01/06_13:49:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:36 info: Status update for node linux2: status active
heartbeat: 2007/01/06_13:49:37 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/06_13:49:48 info: Local Resource acquisition completed.
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
heartbeat: 2007/01/06_13:49:48 received ip-request-resp 10.66.1.153 OK yes
heartbeat: 2007/01/06_13:49:48 info: Acquiring resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 start
heartbeat: 2007/01/06_13:49:48 info: /sbin/ifconfig eth1:0 10.66.1.153 netmask 255.255.255.0    broadcast 10.66.1.255
heartbeat: 2007/01/06_13:49:48 info: Sending Gratuitous Arp for 10.66.1.153 on eth1:0 [eth1]
heartbeat: 2007/01/06_13:49:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.66.1.153 eth1 10.66.1.153 auto 10.66.1.153 ffffffffffff
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk start
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:48 CRIT: Giving up resources due to failure of drbddisk::shareddisk
heartbeat: 2007/01/06_13:49:48 info: Releasing resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/01/06_13:49:48 WARNING: Filesystem /shared not mounted?
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:49 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:49 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:49 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:50 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:50 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:50 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:51 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:51 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:51 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:52 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:52 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:52 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:53 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:53 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:53 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:54 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:54 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:54 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:55 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:55 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:55 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:56 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:56 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:56 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:57 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:57 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:57 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:58 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:58 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 ERROR: Resource script for drbddisk::shareddisk probably not LSB-compliant.
heartbeat: 2007/01/06_13:49:58 WARN: it (drbddisk::shareddisk) MUST succeed on a stop when already stopped
heartbeat: 2007/01/06_13:49:58 WARN: Machine reboot narrowly avoided!
heartbeat: 2007/01/06_13:49:59 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 stop
heartbeat: 2007/01/06_13:49:59 info: /sbin/route -n del -host 10.66.1.153
heartbeat: 2007/01/06_13:49:59 info: /sbin/ifconfig eth1:0 down
heartbeat: 2007/01/06_13:49:59 info: IP Address 10.66.1.153 released
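
The log only records "Return code 1" from drbddisk, not whatever the script printed. To capture the real message I could run the same command by hand through a small wrapper. This is just a sketch: run_and_report is a made-up name, and true and false stand in for the real resource script so the example runs anywhere.

```shell
#!/bin/sh
# Illustration only: run_and_report is a made-up helper.
# It runs the given command, prints its return code, and passes that code on.
run_and_report() {
    rc=0
    "$@" || rc=$?
    echo "return code $rc from $1"
    return "$rc"
}

# Stand-ins for a succeeding and a failing resource script.
run_and_report true
run_and_report false || echo "the command failed"
```

On the node itself the call would be run_and_report /etc/ha.d/resource.d/drbddisk shareddisk start, which is the command Heartbeat logged just before the error.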
 
The ha-log on linux2 contains:
heartbeat: 2007/01/05_13:47:06 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/05_13:47:06 info: heartbeat: version 1.2.3
heartbeat: 2007/01/05_13:47:06 info: Heartbeat generation: 7
heartbeat: 2007/01/05_13:47:07 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/05_13:47:07 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/05_13:47:07 info: ping heartbeat started.
heartbeat: 2007/01/05_13:47:07 info: pid 8028 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8029 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8030 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8031 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8032 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8026 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Local status now set to: 'up'
heartbeat: 2007/01/05_13:47:07 info: Link linux2:eth0 up.
heartbeat: 2007/01/05_13:47:07 info: pid 8034 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Link linux1:/dev/ttyS0 up.
heartbeat: 2007/01/05_13:47:07 info: Status update for node linux1: status active
heartbeat: 2007/01/05_13:47:07 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/05_13:47:08 info: pid 8033 locked in memory.
heartbeat: 2007/01/05_13:47:08 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/05_13:47:08 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/05_13:47:08 info: Local status now set to: 'active'
heartbeat: 2007/01/05_13:47:08 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/05_13:47:08 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90  gid 90 (pid 8039)
heartbeat: 2007/01/05_13:47:19 info: local resource transition completed.
heartbeat: 2007/01/05_13:47:19 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/05_13:47:19 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys linux2] to acquire.
heartbeat: 2007/01/05_13:47:19 info: remote resource transition completed.
 
The time on linux2 is 2 minutes later than the time on linux1.
 
The drbddisk file contains:
 
DEFAULTFILE="/etc/default/drbd"
#DEFAULTFILE="/etc/init.d/drbd"
DRBDADM="/sbin/drbdadm"
if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi
if [ "$#" -eq 2 ]; then
  RES="$1"
  CMD="$2"
else
  RES="all"
  CMD="$1"
fi
case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
                $DRBDADM primary $RES && break
                let "--try" || exit 20
                sleep 1
        done
        ;;
    stop)
        # exec, so the exit code of drbdadm propagates
        exec $DRBDADM secondary $RES
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM state $RES 2>&1 )
        STATE=${ST%/*}
        if [ "$STATE" = "Primary" ]; then
            echo "running"
        elif [ "$STATE" = "Secondary" ]; then
            echo "stopped"
        else
            echo "$ST"
        fi
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac
exit 0
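
To check my reading of the status branch above, here is the same parsing applied to sample strings instead of live drbdadm output. map_state is a made-up name, and the sample strings are my assumption of what "drbdadm state" prints (local role, a slash, then the peer role).

```shell
#!/bin/sh
# Illustration only: map_state is a made-up helper that mirrors the
# status branch of drbddisk, fed with sample strings.
map_state() {
    st=$1
    # Strip the shortest trailing "/..." part, leaving the local role.
    state=${st%/*}
    case $state in
        Primary)   echo "running" ;;
        Secondary) echo "stopped" ;;
        *)         echo "$st" ;;   # anything else is passed through unchanged
    esac
}

map_state "Primary/Secondary"
map_state "Secondary/Primary"
map_state "Secondary/Unknown"
```

So as far as I can tell, status only reports "running" when the local side is Primary, regardless of the peer.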
 
Could you please help me get DRBD to start automatically together with Heartbeat?
 
Thank you.
 
Dalo

