Hello.

I would like to build a two-node cluster using DRBD and Heartbeat. I have two nodes (linux1 and linux2) running SUSE Linux (uname -a returns "linux1 2.6.5-7.244-default #1 Mon Dec 12 18:32:25 UTC 2005 i686 i686 i386 GNU/Linux"). I installed DRBD and Heartbeat from RPM packages.

I can start DRBD manually with the commands

/etc/init.d/drbd start
drbdadm primary shareddisk

and I can quite easily reach a consistent Primary/Secondary state. Up to this point everything seems fine, but I cannot get the cluster to work with DRBD running under Heartbeat. I configured ha.cf, haresources and authkeys according to guides available on the Internet.

My ha.cf file contains:

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
baud 19200
serial /dev/ttyS0
bcast eth0
auto_failback off
node linux1
node linux2
ping 10.66.1.254
respawn hacluster /usr/lib/heartbeat/ipfail

My haresources file contains:

linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0/::/shared::ext3

My authkeys file contains:

auth 1
1 crc

(node linux2 has 2 there instead of 1)

My drbd.conf contains:

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
PROC_DRBD="/proc/drbd"
MODPROBE="modprobe"
RMMOD="rmmod"
UDEV_TIMEOUT=10
ADD_MOD_PARAM=""

if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi

test -f $DRBDADM || exit 5

function assure_module_is_loaded
{
    [ -e "$PROC_DRBD" ] && return

    $MODPROBE -s drbd `$DRBDADM sh-mod-parms` $ADD_MOD_PARAM || {
        echo "Can not load the drbd module."$'\n'; exit 20
    }
    # tell klogd to reload module symbol information ...
    [ -e /var/run/klogd.pid ] && [ -x /sbin/klogd ] && /sbin/klogd -i

    # make sure udev has time to create the device files
    RESOURCE=`$DRBDADM sh-resources` || exit 20
    RESOURCE=${RESOURCE%%\ *}
    DEVICE=`$DRBDADM sh-dev $RESOURCE` || exit 20
    while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
        sleep 1
        UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
    done
}

function adjust_with_progress
{
    IFS_O=$IFS
    NEWLINE='
'
    IFS=$NEWLINE
    local D=0
    local S=0
    local N=0

    COMMANDS=`$DRBDADM -d adjust all` || exit 20
    echo -n "[ "

    for CMD in $COMMANDS; do
        if echo $CMD | grep -q disk; then echo -n "d$D "; D=$(( D+1 ));
        elif echo $CMD | grep -q syncer; then echo -n "s$S "; S=$(( S+1 ));
        elif echo $CMD | grep -q net; then echo -n "n$N "; N=$(( N+1 ));
        else echo echo -n ".. ";
        fi
        IFS=$IFS_O
        $CMD || {
            echo -e "\ncmd $CMD failed!"; exit 20
        }
        IFS=$NEWLINE
    done
    echo -n "]"

    IFS=$IFS_O
}

case "$1" in
    start)
        echo -n "Starting DRBD resources: "
        assure_module_is_loaded
        adjust_with_progress
        [ -d /var/lock/subsys ] && touch /var/lock/subsys/drbd # for RedHat
        echo "."
        $DRBDADM wait_con_int # User interruptible version of wait_connect all
        ;;
    stop)
        echo -n "Stopping all DRBD resources"
        if [ -e $PROC_DRBD ] ; then
            $DRBDADM down all
            $RMMOD drbd
        fi
        [ -f /var/lock/subsys/drbd ] && rm /var/lock/subsys/drbd
        echo "."
        ;;
    status)
        # NEEDS to be heartbeat friendly...
        # so: put some "OK" in the output.
        if [ -e $PROC_DRBD ]; then
            echo "drbd driver loaded OK; device status:"
            cat $PROC_DRBD
            exit 0
        else
            echo >&2 "drbd not loaded"
            exit 3
        fi
        ;;
    reload)
        echo -n "Reloading DRBD configuration"
        $DRBDADM adjust all
        echo "."
        ;;
    restart|force-reload)
        echo -n "Restarting all DRBD resources"
        $DRBDADM down all
        $RMMOD drbd
        assure_module_is_loaded
        $DRBDADM up all
        echo "."
        ;;
    *)
        echo "Usage: /etc/init.d/drbd {start|stop|status|reload|restart|force-reload}"
        exit 1
        ;;
esac

exit 0

The drbd module is available as follows:

linux2:~ # modprobe -l drbd
/lib/modules/2.6.5-7.244-default/extra/drbd.ko

I tried to start Heartbeat manually on both nodes (/etc/init.d/heartbeat start), first on linux1 and then on linux2.

ha-log at linux1 contains:

heartbeat: 2007/01/06_13:48:15 info: **************************
heartbeat: 2007/01/06_13:48:15 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/06_13:48:15 info: heartbeat: version 1.2.3
heartbeat: 2007/01/06_13:48:15 info: Heartbeat generation: 26
heartbeat: 2007/01/06_13:48:16 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/06_13:48:16 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/06_13:48:16 info: ping heartbeat started.
heartbeat: 2007/01/06_13:48:16 info: pid 10218 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10219 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10220 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10221 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10222 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10216 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: Local status now set to: 'up'
heartbeat: 2007/01/06_13:48:16 info: Link linux1:eth0 up.
heartbeat: 2007/01/06_13:48:16 info: pid 10224 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: pid 10223 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/06_13:48:17 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/06_13:49:35 info: Link linux2:/dev/ttyS0 up.
heartbeat: 2007/01/06_13:49:35 info: Status update for node linux2: status up
heartbeat: 2007/01/06_13:49:35 info: Local status now set to: 'active'
heartbeat: 2007/01/06_13:49:35 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/06_13:49:36 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 10228)
heartbeat: 2007/01/06_13:49:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:36 info: Status update for node linux2: status active
heartbeat: 2007/01/06_13:49:37 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/06_13:49:48 info: Local Resource acquisition completed.
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
heartbeat: 2007/01/06_13:49:48 received ip-request-resp 10.66.1.153 OK yes
heartbeat: 2007/01/06_13:49:48 info: Acquiring resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 start
heartbeat: 2007/01/06_13:49:48 info: /sbin/ifconfig eth1:0 10.66.1.153 netmask 255.255.255.0 broadcast 10.66.1.255
heartbeat: 2007/01/06_13:49:48 info: Sending Gratuitous Arp for 10.66.1.153 on eth1:0 [eth1]
heartbeat: 2007/01/06_13:49:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.66.1.153 eth1 10.66.1.153 auto 10.66.1.153 ffffffffffff
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk start
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:48 CRIT: Giving up resources due to failure of drbddisk::shareddisk
heartbeat: 2007/01/06_13:49:48 info: Releasing resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/01/06_13:49:48 WARNING: Filesystem /shared not mounted?
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:49 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:49 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:49 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:50 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:50 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:50 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:51 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:51 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:51 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:52 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:52 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:52 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:53 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:53 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:53 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:54 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:54 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:54 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:55 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:55 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:55 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:56 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:56 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:56 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:57 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:57 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:57 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:58 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:58 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 ERROR: Resource script for drbddisk::shareddisk probably not LSB-compliant.
heartbeat: 2007/01/06_13:49:58 WARN: it (drbddisk::shareddisk) MUST succeed on a stop when already stopped
heartbeat: 2007/01/06_13:49:58 WARN: Machine reboot narrowly avoided!
heartbeat: 2007/01/06_13:49:59 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 stop
heartbeat: 2007/01/06_13:49:59 info: /sbin/route -n del -host 10.66.1.153
heartbeat: 2007/01/06_13:49:59 info: /sbin/ifconfig eth1:0 down
heartbeat: 2007/01/06_13:49:59 info: IP Address 10.66.1.153 released

ha-log at linux2 contains:

heartbeat: 2007/01/05_13:47:06 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/05_13:47:06 info: heartbeat: version 1.2.3
heartbeat: 2007/01/05_13:47:06 info: Heartbeat generation: 7
heartbeat: 2007/01/05_13:47:07 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/05_13:47:07 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/05_13:47:07 info: ping heartbeat started.
heartbeat: 2007/01/05_13:47:07 info: pid 8028 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8029 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8030 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8031 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8032 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8026 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Local status now set to: 'up'
heartbeat: 2007/01/05_13:47:07 info: Link linux2:eth0 up.
heartbeat: 2007/01/05_13:47:07 info: pid 8034 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Link linux1:/dev/ttyS0 up.
heartbeat: 2007/01/05_13:47:07 info: Status update for node linux1: status active
heartbeat: 2007/01/05_13:47:07 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/05_13:47:08 info: pid 8033 locked in memory.
heartbeat: 2007/01/05_13:47:08 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/05_13:47:08 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/05_13:47:08 info: Local status now set to: 'active'
heartbeat: 2007/01/05_13:47:08 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/05_13:47:08 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 8039)
heartbeat: 2007/01/05_13:47:19 info: local resource transition completed.
heartbeat: 2007/01/05_13:47:19 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/05_13:47:19 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys linux2] to acquire.
heartbeat: 2007/01/05_13:47:19 info: remote resource transition completed.

The time on linux2 is 2 minutes later than the time on linux1.

The drbddisk file contains:

DEFAULTFILE="/etc/default/drbd"
#DEFAULTFILE="/etc/init.d/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
  RES="$1"
  CMD="$2"
else
  RES="all"
  CMD="$1"
fi

case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 20
            sleep 1
        done
        ;;
    stop)
        # exec, so the exit code of drbdadm propagates
        exec $DRBDADM secondary $RES
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM state $RES 2>&1 )
        STATE=${ST%/*}
        if [ "$STATE" = "Primary" ]; then
            echo "running"
        elif [ "$STATE" = "Secondary" ]; then
            echo "stopped"
        else
            echo "$ST"
        fi
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac

exit 0

Could you please help me get DRBD running automatically together with Heartbeat?

Thank you.

Dalo
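
For reference, the ha-log above shows Heartbeat turning each haresources entry into a call to a script under /etc/ha.d/resource.d (IPaddr for the bare address, then drbddisk, then Filesystem). The following is only a rough sketch of that mapping, useful for re-running the failing call by hand and checking its exit code. The function name expand_haresources is hypothetical and not part of Heartbeat, and the splitting rules are inferred from the log lines rather than taken from Heartbeat's own code.

```shell
#!/bin/sh
# Hypothetical helper (not part of Heartbeat): print the resource-script
# calls that correspond to one haresources line, in the order listed.
# Note: the log above shows Heartbeat running "stop" in reverse order.
expand_haresources() {
    action=$1   # "start" or "stop"
    line=$2     # one haresources line, e.g. "linux1 10.66.1.153 drbddisk::shareddisk"
    rcdir=/etc/ha.d/resource.d

    # Split the line on whitespace and drop the first field (the node name).
    set -- $line
    shift

    for res in "$@"; do
        case $res in
            *::*)
                # "script::arg1::arg2" becomes "script arg1 arg2"
                echo "$rcdir/$(echo "$res" | sed 's/::/ /g') $action"
                ;;
            *)
                # A bare IP address is passed to the IPaddr script,
                # as seen in the log.
                echo "$rcdir/IPaddr $res $action"
                ;;
        esac
    done
}

expand_haresources start "linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3"
```

Running one of the printed commands manually as root, followed by `echo $?`, should show the same return code that Heartbeat logs for that resource.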