Hello.
I would like to build a two-node cluster using DRBD and Heartbeat. I have two nodes (linux1 and linux2) running SUSE Linux (uname -a returns "linux1 2.6.5-7.244-default #1 Mon Dec 12 18:32:25 UTC 2005 i686 i686 i386 GNU/Linux"). I installed DRBD and Heartbeat from RPM. I can start DRBD manually with the commands /etc/init.d/drbd start; drbdadm primary shareddisk, and I can quite easily reach a consistent primary/secondary state.
Up to this point everything seems fine, but I have a problem getting the cluster to work with DRBD under Heartbeat. I configured ha.cf, haresources and authkeys according to guides available on the Internet.
My ha.cf file contains:
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
baud 19200
serial /dev/ttyS0
bcast eth0
auto_failback off
node linux1
node linux2
ping 10.66.1.254
respawn hacluster /usr/lib/heartbeat/ipfail
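The timing directives in this file depend on each other: warntime is meant to fire before deadtime, and the heartbeat documentation recommends an initdead of at least twice deadtime. A minimal sketch of a sanity check follows, assuming the values are plain integer seconds as they are here. The function names hacf_value and check_hacf_timing are made up for illustration and are not part of heartbeat.

```shell
# Hypothetical helpers, not part of heartbeat.
# Assumes timing values are plain integers (seconds), as in the file above.

# Print the last value given for a directive in a ha.cf-style file.
hacf_value() {
    # $1 = file, $2 = directive name
    awk -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}

# Check keepalive < warntime < deadtime and initdead >= 2 * deadtime.
check_hacf_timing() {
    file=$1
    keepalive=$(hacf_value "$file" keepalive)
    warntime=$(hacf_value "$file" warntime)
    deadtime=$(hacf_value "$file" deadtime)
    initdead=$(hacf_value "$file" initdead)

    for v in "$keepalive" "$warntime" "$deadtime" "$initdead"; do
        case $v in
            ''|*[!0-9]*) echo "missing or non-integer timing value"; return 2 ;;
        esac
    done

    if [ "$keepalive" -ge "$warntime" ]; then
        echo "keepalive ($keepalive) should be smaller than warntime ($warntime)"
        return 1
    fi
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime ($warntime) should be smaller than deadtime ($deadtime)"
        return 1
    fi
    if [ "$initdead" -lt $(( 2 * deadtime )) ]; then
        echo "initdead ($initdead) should be at least twice deadtime ($deadtime)"
        return 1
    fi
    echo "timing OK"
}
```

Calling check_hacf_timing /etc/ha.d/ha.cf with the values above (2, 10, 30, 120) passes all three comparisons, so the timing settings are unlikely to be the source of the problem.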
My haresources file contains:
linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0/::/shared::ext3
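For readers unfamiliar with the format: the first field names the node that normally owns the resources, a bare IP address is shorthand for the IPaddr script, and each name::arg::arg entry becomes a call to the script of that name with the arguments separated. The sketch below prints the start commands implied by one line. The function name expand_haresources_line is hypothetical, the script directory is taken from the ha-log quoted later in this message, and the IP detection is deliberately simplistic.

```shell
# Illustrative only, not a heartbeat tool: print the start commands
# implied by a single haresources line.
expand_haresources_line() {
    dir=/etc/ha.d/resource.d   # directory seen in the ha-log
    set -- $1                  # split the line on whitespace
    shift                      # drop the preferred node name
    for res in "$@"; do
        case $res in
            [0-9]*.[0-9]*.[0-9]*.[0-9]*)
                # a bare IP address is shorthand for the IPaddr script
                echo "$dir/IPaddr $res start" ;;
            *)
                # script::arg1::arg2 -> script arg1 arg2
                echo "$dir/$(echo "$res" | sed 's/::/ /g') start" ;;
        esac
    done
}
```

Resources are started left to right, so with this line the DRBD resource has to become primary before the filesystem on /dev/drbd0 can be mounted.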
authkeys file contains:
auth 1
1 crc
(On node linux2 this file has 2 instead of 1.)
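As far as the heartbeat documentation describes it, the number on the auth line selects one of the numbered key lines below it, and both nodes need to end up using the same key. A minimal sketch of a consistency check for one file follows; check_authkeys is a hypothetical name, not a heartbeat command.

```shell
# Hypothetical helper: verify that the key index named on the "auth"
# line is actually defined by a numbered key line in the same file.
check_authkeys() {
    file=$1
    idx=$(awk '$1 == "auth" { print $2; exit }' "$file")
    if [ -z "$idx" ]; then
        echo "no auth line"
        return 2
    fi
    # A key line looks like "<index> <method> [<key>]".
    if awk -v idx="$idx" '$1 == idx && NF >= 2 { found = 1 } END { exit !found }' "$file"; then
        echo "auth $idx is defined"
    else
        echo "auth $idx has no matching key line"
        return 1
    fi
}
```

Running check_authkeys /etc/ha.d/authkeys on each node, and comparing the two files directly, would show whether the difference between the nodes is only in numbering or in the key itself.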
My drbd.conf contains:
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
PROC_DRBD="/proc/drbd"
MODPROBE="modprobe"
RMMOD="rmmod"
UDEV_TIMEOUT=10
ADD_MOD_PARAM=""
if [ -f $DEFAULTFILE ]; then
    . $DEFAULTFILE
fi
test -f $DRBDADM || exit 5
function assure_module_is_loaded
{
    [ -e "$PROC_DRBD" ] && return
    $MODPROBE -s drbd `$DRBDADM sh-mod-parms` $ADD_MOD_PARAM || {
        echo "Can not load the drbd module."$'\n'; exit 20
    }
    # tell klogd to reload module symbol information ...
    [ -e /var/run/klogd.pid ] && [ -x /sbin/klogd ] && /sbin/klogd -i
    # make sure udev has time to create the device files
    RESOURCE=`$DRBDADM sh-resources` || exit 20
    RESOURCE=${RESOURCE%%\ *}
    DEVICE=`$DRBDADM sh-dev $RESOURCE` || exit 20
    while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
        sleep 1
        UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
    done
}
function adjust_with_progress
{
    IFS_O=$IFS
    NEWLINE='
'
    IFS=$NEWLINE
    local D=0
    local S=0
    local N=0
    COMMANDS=`$DRBDADM -d adjust all` || exit 20
    echo -n "[ "
    for CMD in $COMMANDS; do
        if echo $CMD | grep -q disk; then echo -n "d$D "; D=$(( D+1 ));
        elif echo $CMD | grep -q syncer; then echo -n "s$S "; S=$(( S+1 ));
        elif echo $CMD | grep -q net; then echo -n "n$N "; N=$(( N+1 ));
        else echo -n ".. ";
        fi
        IFS=$IFS_O
        $CMD || {
            echo -e "\ncmd $CMD failed!"; exit 20
        }
        IFS=$NEWLINE
    done
    echo -n "]"
    IFS=$IFS_O
}
case "$1" in
    start)
        echo -n "Starting DRBD resources: "
        assure_module_is_loaded
        adjust_with_progress
        [ -d /var/lock/subsys ] && touch /var/lock/subsys/drbd # for RedHat
        echo "."
        $DRBDADM wait_con_int # User interruptible version of wait_connect all
        ;;
    stop)
        echo -n "Stopping all DRBD resources"
        if [ -e $PROC_DRBD ] ; then
            $DRBDADM down all
            $RMMOD drbd
        fi
        [ -f /var/lock/subsys/drbd ] && rm /var/lock/subsys/drbd
        echo "."
        ;;
    status)
        # NEEDS to be heartbeat friendly...
        # so: put some "OK" in the output.
        if [ -e $PROC_DRBD ]; then
            echo "drbd driver loaded OK; device status:"
            cat $PROC_DRBD
            exit 0
        else
            echo >&2 "drbd not loaded"
            exit 3
        fi
        ;;
    reload)
        echo -n "Reloading DRBD configuration"
        $DRBDADM adjust all
        echo "."
        ;;
    restart|force-reload)
        echo -n "Restarting all DRBD resources"
        $DRBDADM down all
        $RMMOD drbd
        assure_module_is_loaded
        $DRBDADM up all
        echo "."
        ;;
    *)
        echo "Usage: /etc/init.d/drbd {start|stop|status|reload|restart|force-reload}"
        exit 1
        ;;
esac
exit 0
drbd module is available as follows:
linux2:~ # modprobe -l drbd
/lib/modules/2.6.5-7.244-default/extra/drbd.ko
I tried to start Heartbeat on both nodes manually (/etc/init.d/heartbeat start), first on linux1 and then on linux2.
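The drbddisk script quoted further down only calls drbdadm primary and drbdadm secondary; it does not load the module or configure the resource. One thing that may be worth ruling out, then, is that DRBD was not yet up when heartbeat was started. The sketch below wraps the same test the init script's status branch uses. The function name drbd_loaded and its optional path argument are illustrative assumptions; the argument exists only so the check can be exercised on a machine without DRBD.

```shell
# Sketch: report whether the DRBD driver is loaded, using the same
# existence test as the "status" branch of /etc/init.d/drbd.
# The optional argument defaults to /proc/drbd.
drbd_loaded() {
    proc_file=${1:-/proc/drbd}
    if [ -e "$proc_file" ]; then
        echo "drbd driver loaded"
    else
        echo "drbd not loaded"
        return 3
    fi
}
```

A guarded start could then look like: drbd_loaded && /etc/init.d/heartbeat start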
HeartBeat ha-log at linux1 contains:
heartbeat: 2007/01/06_13:48:15 info: **************************
heartbeat: 2007/01/06_13:48:15 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/06_13:48:15 info: heartbeat: version 1.2.3
heartbeat: 2007/01/06_13:48:15 info: Heartbeat generation: 26
heartbeat: 2007/01/06_13:48:16 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/06_13:48:16 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/06_13:48:16 info: ping heartbeat started.
heartbeat: 2007/01/06_13:48:16 info: pid 10218 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10219 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10220 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10221 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10222 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: pid 10216 locked in memory.
heartbeat: 2007/01/06_13:48:16 info: Local status now set to: 'up'
heartbeat: 2007/01/06_13:48:16 info: Link linux1:eth0 up.
heartbeat: 2007/01/06_13:48:16 info: pid 10224 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: pid 10223 locked in memory.
heartbeat: 2007/01/06_13:48:17 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/06_13:48:17 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/06_13:49:35 info: Link linux2:/dev/ttyS0 up.
heartbeat: 2007/01/06_13:49:35 info: Status update for node linux2: status up
heartbeat: 2007/01/06_13:49:35 info: Local status now set to: 'active'
heartbeat: 2007/01/06_13:49:35 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/06_13:49:36 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 10228)
heartbeat: 2007/01/06_13:49:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:36 info: Status update for node linux2: status active
heartbeat: 2007/01/06_13:49:37 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: remote resource transition completed.
heartbeat: 2007/01/06_13:49:48 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/06_13:49:48 info: Local Resource acquisition completed.
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
heartbeat: 2007/01/06_13:49:48 received ip-request-resp 10.66.1.153 OK yes
heartbeat: 2007/01/06_13:49:48 info: Acquiring resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 start
heartbeat: 2007/01/06_13:49:48 info: /sbin/ifconfig eth1:0 10.66.1.153 netmask 255.255.255.0 broadcast 10.66.1.255
heartbeat: 2007/01/06_13:49:48 info: Sending Gratuitous Arp for 10.66.1.153 on eth1:0 [eth1]
heartbeat: 2007/01/06_13:49:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.66.1.153 eth1 10.66.1.153 auto 10.66.1.153 ffffffffffff
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk start
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:48 CRIT: Giving up resources due to failure of drbddisk::shareddisk
heartbeat: 2007/01/06_13:49:48 info: Releasing resource group: linux1 10.66.1.153 drbddisk::shareddisk Filesystem::/dev/drbd0::/shared::ext3
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/01/06_13:49:48 WARNING: Filesystem /shared not mounted?
heartbeat: 2007/01/06_13:49:48 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:48 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:49 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:49 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:49 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:50 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:50 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:50 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:51 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:51 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:51 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:52 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:52 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:52 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:53 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:53 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:53 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:54 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:54 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:54 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:55 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:55 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:55 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:56 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:56 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:56 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:57 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:57 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:57 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 info: Retrying failed stop operation [drbddisk::shareddisk]
heartbeat: 2007/01/06_13:49:58 info: Running /etc/ha.d/resource.d/drbddisk shareddisk stop
heartbeat: 2007/01/06_13:49:58 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
heartbeat: 2007/01/06_13:49:58 ERROR: Resource script for drbddisk::shareddisk probably not LSB-compliant.
heartbeat: 2007/01/06_13:49:58 WARN: it (drbddisk::shareddisk) MUST succeed on a stop when already stopped
heartbeat: 2007/01/06_13:49:58 WARN: Machine reboot narrowly avoided!
heartbeat: 2007/01/06_13:49:59 info: Running /etc/ha.d/resource.d/IPaddr 10.66.1.153 stop
heartbeat: 2007/01/06_13:49:59 info: /sbin/route -n del -host 10.66.1.153
heartbeat: 2007/01/06_13:49:59 info: /sbin/ifconfig eth1:0 down
heartbeat: 2007/01/06_13:49:59 info: IP Address 10.66.1.153 released
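The failing step in this log is the call to /etc/ha.d/resource.d/drbddisk, which returns code 1 for both start and stop. The same call can be reproduced outside heartbeat to see the exit status and any message the script prints. A minimal sketch follows; run_resource is a hypothetical wrapper name, not a heartbeat tool.

```shell
# Sketch: run a resource script by hand and report its return code in
# the same wording heartbeat uses in ha-log ("Return code N from ...").
run_resource() {
    script=$1
    shift
    rc=0
    "$script" "$@" || rc=$?
    echo "Return code $rc from $script"
    return $rc
}
```

For example: run_resource /etc/ha.d/resource.d/drbddisk shareddisk stop. Anything drbdadm writes to the terminal before the return code line is likely to be more informative than the log entry alone.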
ha-log at linux2 contains:
heartbeat: 2007/01/05_13:47:06 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2007/01/05_13:47:06 info: heartbeat: version 1.2.3
heartbeat: 2007/01/05_13:47:06 info: Heartbeat generation: 7
heartbeat: 2007/01/05_13:47:07 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2007/01/05_13:47:07 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2007/01/05_13:47:07 info: ping heartbeat started.
heartbeat: 2007/01/05_13:47:07 info: pid 8028 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8029 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8030 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8031 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8032 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: pid 8026 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Local status now set to: 'up'
heartbeat: 2007/01/05_13:47:07 info: Link linux2:eth0 up.
heartbeat: 2007/01/05_13:47:07 info: pid 8034 locked in memory.
heartbeat: 2007/01/05_13:47:07 info: Link linux1:/dev/ttyS0 up.
heartbeat: 2007/01/05_13:47:07 info: Status update for node linux1: status active
heartbeat: 2007/01/05_13:47:07 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2007/01/05_13:47:08 info: pid 8033 locked in memory.
heartbeat: 2007/01/05_13:47:08 info: Link 10.66.1.254:10.66.1.254 up.
heartbeat: 2007/01/05_13:47:08 info: Status update for node 10.66.1.254: status ping
heartbeat: 2007/01/05_13:47:08 info: Local status now set to: 'active'
heartbeat: 2007/01/05_13:47:08 info: Starting child client "/usr/lib/heartbeat/ipfail" (90,90)
heartbeat: 2007/01/05_13:47:08 info: Starting "/usr/lib/heartbeat/ipfail" as uid 90 gid 90 (pid 8039)
heartbeat: 2007/01/05_13:47:19 info: local resource transition completed.
heartbeat: 2007/01/05_13:47:19 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2007/01/05_13:47:19 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys linux2] to acquire.
heartbeat: 2007/01/05_13:47:19 info: remote resource transition completed.
The time on linux2 is 2 minutes later than the time on linux1.
The drbddisk file contains:
DEFAULTFILE="/etc/default/drbd"
#DEFAULTFILE="/etc/init.d/drbd"
DRBDADM="/sbin/drbdadm"
if [ -f $DEFAULTFILE ]; then
    . $DEFAULTFILE
fi
if [ "$#" -eq 2 ]; then
    RES="$1"
    CMD="$2"
else
    RES="all"
    CMD="$1"
fi
case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 20
            sleep 1
        done
        ;;
    stop)
        # exec, so the exit code of drbdadm propagates
        exec $DRBDADM secondary $RES
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM state $RES 2>&1 )
        STATE=${ST%/*}
        if [ "$STATE" = "Primary" ]; then
            echo "running"
        elif [ "$STATE" = "Secondary" ]; then
            echo "stopped"
        else
            echo "$ST"
        fi
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac
exit 0
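The status branch of this script relies on a single parameter expansion to turn the drbdadm state output into the words heartbeat expects. The sketch below isolates that mapping so it can be tried without DRBD. The function name drbd_state_to_status is hypothetical, and the local/peer input format (for example "Primary/Secondary") is inferred from the ${ST%/*} expansion in the script rather than taken from the DRBD documentation.

```shell
# Sketch of the "status" mapping in drbddisk. The argument stands in
# for the output of `drbdadm state <resource>`.
drbd_state_to_status() {
    st=$1
    state=${st%/*}      # remove the shortest trailing "/..." part
    case $state in
        Primary)   echo "running" ;;
        Secondary) echo "stopped" ;;
        *)         echo "$st" ;;   # anything else is passed through
    esac
}
```

Any output without a recognised role before the slash, such as an error message from drbdadm, is echoed back unchanged, which matches the else branch of the script.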
Could you please help me get DRBD and Heartbeat running together automatically?
Thank you.
Dalo