I am using DRBD in combination with Heartbeat, and I've noticed that on
occasion not all of the DRBD devices are properly configured. I've
tracked the issue down to how the DRBD script interacts with udev. When
the script waits for udev to register its devices, it only waits for the
very first device to be registered. This is fine if you have one device,
but if you have more devices the script can continue execution before
the remaining devices exist. As a result, Heartbeat's init script is
executed, and when it attempts to mount a DRBD-backed partition it fails.
After Heartbeat's init script has finished executing, the cluster is in
the state Primary/Unknown (active) and Unknown/Secondary (standby), and
DRBD's connection state is WFConnection. I've attached a patch that
addresses this issue by waiting for the device of every resource to
exist, with the timeout reset for each device, before continuing
execution.
Index: scripts/drbd
===================================================================
--- scripts/drbd (revision 2144)
+++ scripts/drbd (working copy)
@@ -21,7 +21,7 @@
PROC_DRBD="/proc/drbd"
MODPROBE="modprobe"
RMMOD="rmmod"
-UDEV_TIMEOUT=10
+UDEV_TIMEOUT_ORIG=10
ADD_MOD_PARAM=""
if [ -f $DEFAULTFILE ]; then
@@ -45,9 +45,14 @@
RESOURCE=${RESOURCE%%\ *}
DEVICE=`$DRBDADM sh-dev $RESOURCE` || exit 20
- while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
- sleep 1
- UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
+ for resource in `$DRBDADM sh-resources`; do
+ for dev in `$DRBDADM sh-dev $resource`; do
+ UDEV_TIMEOUT=$UDEV_TIMEOUT_ORIG
+ while [ ! -e $dev ] && [ $UDEV_TIMEOUT -gt 0 ] ; do
+ sleep 1
+ UDEV_TIMEOUT=$(( $UDEV_TIMEOUT-1 ))
+ done
+ done
done
}
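
For anyone who wants to try the wait logic outside the init script, here
is a minimal standalone sketch of the same idea: wait for every path in a
list, resetting the timeout for each one. The function name
wait_for_devices and its return code are my own additions for
illustration and are not part of the patch, which waits without
reporting whether a device actually appeared.

```shell
#!/bin/sh
# wait_for_devices TIMEOUT PATH...
# (hypothetical helper, not part of scripts/drbd)
# Waits up to TIMEOUT seconds for EACH path to exist; the countdown is
# reset for every path, as in the patch. Returns 0 if all paths exist
# when their wait ends, 1 if at least one is still missing.
wait_for_devices() {
    timeout_orig=$1
    shift
    missing=0
    for dev in "$@"; do
        remaining=$timeout_orig
        while [ ! -e "$dev" ] && [ "$remaining" -gt 0 ]; do
            sleep 1
            remaining=$(( remaining - 1 ))
        done
        # Record devices that never appeared instead of silently continuing.
        [ -e "$dev" ] || missing=1
    done
    return $missing
}

# Possible use, mirroring the loop in the patch (left commented out
# because it needs a configured drbdadm):
#   for resource in `drbdadm sh-resources`; do
#       wait_for_devices 10 `drbdadm sh-dev $resource` || echo "timeout: $resource"
#   done
```

Note that the worst-case delay grows with the number of devices, since
each one gets the full timeout.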