I am using DRBD in combination with Heartbeat, and I've noticed that on occasion not all of the DRBD devices are properly configured. I've tracked the issue down to DRBD and udev. When the DRBD init script waits for udev to register its devices, it only waits for the very first device to be registered. This is fine if you have one device, but if you have more devices the script can continue execution before the remaining devices exist.

As a result, Heartbeat's init script is executed, and when it attempts to mount a DRBD-backed partition it fails. After Heartbeat's init script has finished executing, the cluster is in the state Primary/Unknown (active) and Unknown/Secondary (standby), and DRBD's connection state is WFConnection.

I've attached a patch that addresses this issue by ensuring every device is configured before continuing execution.

Index: scripts/drbd
===================================================================
--- scripts/drbd	(revision 2144)
+++ scripts/drbd	(working copy)
@@ -21,7 +21,7 @@
 PROC_DRBD="/proc/drbd"
 MODPROBE="modprobe"
 RMMOD="rmmod"
-UDEV_TIMEOUT=10
+UDEV_TIMEOUT_ORIG=10
 ADD_MOD_PARAM=""
 
 if [ -f $DEFAULTFILE ]; then
@@ -45,9 +45,14 @@
     RESOURCE=${RESOURCE%%\ *}
     DEVICE=`$DRBDADM sh-dev $RESOURCE` || exit 20
 
-    while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
-        sleep 1
-        UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
+    for resource in `$DRBDADM sh-resources`; do
+        for dev in `$DRBDADM sh-dev $resource`; do
+            UDEV_TIMEOUT=$UDEV_TIMEOUT_ORIG
+            while [ ! -e $dev ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
+                sleep 1
+                UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
+            done
+        done
     done
 }
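
For illustration, the idea behind the patch (wait for every device node, not only the first, and give each one a fresh timeout) can be sketched as a standalone shell function. This is only a sketch under assumptions: the function name wait_for_devices and its arguments are hypothetical and are not part of the DRBD scripts, and the device paths in the usage comment are examples. Unlike the patch, it also returns a non-zero status when a device never appears, so a caller can decide whether to go on.

```shell
#!/bin/sh
# Hypothetical helper, not part of scripts/drbd.
# Usage: wait_for_devices TIMEOUT_SECONDS PATH...
# Waits for each PATH to exist, allowing TIMEOUT_SECONDS per path.
# Returns 0 if every path exists at the end, 1 otherwise.
wait_for_devices() {
    timeout_orig=$1
    shift
    missing=0
    for dev in "$@"; do
        remaining=$timeout_orig        # reset the timeout for each device
        while [ ! -e "$dev" ] && [ "$remaining" -gt 0 ]; do
            sleep 1
            remaining=$(( remaining - 1 ))
        done
        [ -e "$dev" ] || missing=1     # remember any device that never appeared
    done
    return $missing
}

# Example call (paths are assumptions, adjust to your resources):
#   wait_for_devices 10 /dev/drbd0 /dev/drbd1 || echo "some DRBD devices are missing" >&2
```

Resetting the timeout per device mirrors the UDEV_TIMEOUT=$UDEV_TIMEOUT_ORIG line in the patch. The trade-off is that the worst-case wait grows with the number of devices.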