[DRBD-user] Avoid split brain in a dual primary configuration with intelligent switches

Jürgen Scholz juergen at kernkraft400.com
Tue Feb 17 16:21:27 CET 2009



Hello!

This is my second try. Some days ago this mail did not make it to the  
list. The original poster mentioned that the information helped him a  
lot, so I'm trying again to share with you all.


> If it cannot ping switches it means that all its ethernet cards or  
> all switches are broken so it shutdown itself.
> If it can ping switches it means that other primary is broken so it  
> tries to stonith it.
You probably will need linux-ha/heartbeat for this.

I run a similar setup and, for various reasons, cannot change the
hardware or the way it works. So I wrote a linux-ha resource agent some
time ago. You will have to check whether you can adapt it to your
situation. It basically behaves like this:

1. We want to go primary and cannot reach our peer.
2. Can we ping a gateway/router/switch?
   If YES: return true and let heartbeat continue to initialize the
   services.
   If NO: return false and let heartbeat stop its efforts to initialize
   the services.
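
The steps above can be sketched as a small shell function. This is only
an illustration and not the script further down: the names
may_go_primary, probe and PROBE_CMD are made up for this sketch, and the
default ping flags assume Linux iputils ping (where -w is the overall
deadline in seconds). Like the script, it requires every listed address
to answer.

```shell
#!/bin/bash
# Hypothetical sketch of the decision steps; all names are illustrative.

# Command used to probe one address. It can be overridden, for example
# with "true" or "false" for a dry run. Default assumes iputils ping.
PROBE_CMD="${PROBE_CMD:-ping -n -c 3 -w 4}"

# probe ADDRESS: succeed if ADDRESS answers the probe command.
probe() {
  $PROBE_CMD "$1" > /dev/null 2>&1
}

# may_go_primary ADDRESS [ADDRESS ...]
# Called when the peer is unreachable and we want to go primary.
# Returns 0 (continue starting services) only if every address answers,
# and 1 (stop trying) otherwise or when no address was given.
may_go_primary() {
  [ $# -ge 1 ] || return 1
  local addr
  for addr in "$@"; do
    probe "$addr" || return 1
  done
  return 0
}

# Example call (placeholder addresses):
#   may_go_primary 192.0.2.1 192.0.2.254 && echo "network looks fine"
```

The exit status is the whole interface here: 0 means the network around
us looks healthy, so the unreachable peer is probably the broken side;
1 means we are probably the isolated node and should stay passive.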

You will have to familiarize yourself with heartbeat, but it can  
supply the functionality that you're asking for.


hth,
juergen

The Script:
==============

#!/bin/bash
#
# ipcheck
#
#
# ATTENTION: IF YOU USE HEARTBEAT IN A NETWORK CONFIGURATION IN WHICH
#            IT SEEMS AS IF YOU NEED THIS TYPE OF SUPPORT SCRIPT, IT IS
#            HIGHLY ADVISED THAT YOU FIX YOUR NETWORK AND DO NOT USE
#            THIS.
#
# This is a resource script for heartbeat. It is used to check if routers
# or gateways are up/reachable via ICMP ping when the heartbeats are sent
# over an IP network.
# This should prevent the slave(s) from becoming active when the routers
# or gateways are unreachable.
#
#
#
# VERSION HISTORY
#
#    2007-03-16 Initial version
#
#
# Written by Juergen Scholz <juergen at kernkraft400.com>
# licensed under the GPL v3 or newer
#
# For information about the requirements of this script see:
# http://linux-ha.org/HeartbeatResourceAgent


##################
### CONFIGURATION#
##################

HARESOURCES=/etc/ha.d/haresources
PING=/bin/ping
DEBUG=off

##################
### FUNCTIONS    #
##################

function get_primary_host {
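  # First field of the first non-empty line that contains no '#'
  # (works with space- or tab-separated haresources entries)
  awk '!/#/ && NF { print $1; exit }' "$HARESOURCES"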
}

function get_hostname {
  # sed probably isn't necessary
  hostname | sed -r 's/[[:space:]]//'
}

function test_address {
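  # Let's see if the network is reachable
  # ping parameters:
  #   -o  exit after one ping response
  #       \- Unfortunately this does not work with GNU ping, so it is
  #          not used here
  #   -n  numeric output
  #   -i  wait (seconds between pings)
  #   -c  count
  #   -t  timeout (this is the BSD meaning; Linux iputils ping treats
  #       -t as the TTL and uses -w for the deadline)
  $PING -n -i 1 -c 3 -t 4 "$1" > /dev/null 2>&1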
  PING_RET_VAL=$?

  if [ $PING_RET_VAL != "0" ]; then
    return 1
  fi
  return 0
}

function debug_echo {
  # Print one argument at a time
  while (($#))
  do
    echo -n "$1 "
    shift
  done
  # Add the trailing new-line
  echo
}

function check_ping {
  # Check if $PING is set correctly
  if [ ! -x "$PING" ]; then
    echo
    echo "The \$PING variable is set to $PING, but the file does not exist or is not executable."
    echo
    exit 1
  fi
}

function check_arguments {
  # Check amount of arguments
  # We need at least one destination and a command
  if [ $1 -lt 2 ]; then
    help
    exit 1
  fi
}

function check_master_slave {
  # Always exit this with 0 (success) if called on the primary machine
  # since we want the master to be serving when the network to the backup
  # isn't functioning
  if [ "`get_hostname`" = "`get_primary_host`" ]; then
    if [ $DEBUG = "on" ]; then
      echo This is the PRIMARY host. Exiting with 0.
    fi
    exit 0
  else
    if [ $DEBUG = "on" ]; then
      echo This is a SECONDARY host. Checking network.
    fi
  fi
}

function help {
  echo 'Usage: ipcheck destination [ destination ... ] (start|stop|status)'
  echo
  echo '   destination: hostname or ip address'
  echo
  echo '   This will always return 0 for stop and 1 for status.'
}

##################
### MAIN ROUTINE #
##################

# Tell the user we're in debug mode
if [ $DEBUG = "on" ]; then
  echo $0 in debug mode...
fi

# Little sanity checks
check_ping
check_arguments $#

# Get the command out of the last supplied argument
CMD="${!#}"
if [ $DEBUG = "on" ]; then
  echo We have been called with command: $CMD
fi

# Look for the 'start', 'stop' or 'status' argument
case "$CMD" in
# START
start)
  if [ $DEBUG = "on" ]; then
    echo Our hostname: `get_hostname`
    echo The primary hostname: `get_primary_host`
  fi

  check_master_slave

  # Test all the supplied addresses
  let LAST_DEST_ARG=$#-1
  for i in `seq 1 $LAST_DEST_ARG`; do
    if [ $DEBUG = "on" ]; then
      echo -n Testing address $i : $1
    fi

    test_address $1
    TEST=$?
    if [ $TEST -eq 0 ]; then
      if [ $DEBUG = "on" ]; then
        echo ' ...successful.'
      fi
    else
      if [ $DEBUG = "on" ]; then
        echo ' ...failed.'
        echo Exiting with 1.
      fi
      exit 1
    fi

    # make the next argument available as $1
    shift
  done

  # Since the for loop was executed we assume that everything is ok.
  if [ $DEBUG = "on" ]; then
    echo Everything seems to be reachable.
    echo Exiting with 0.
  fi
  exit 0
# end of start)
;;

# STOP
stop)
  if [ $DEBUG = "on" ]; then
    echo Exiting with 0.
  fi
  exit 0
;;
# end of stop

# STATUS
status)
  if [ $DEBUG = "on" ]; then
    echo Exiting with 1.
  fi
  exit 1
;;
# end of status

# EVERYTHING ELSE
*)
    help
    exit 1
;;

esac
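
To show how a resource script like this is typically hooked into
heartbeat, here is a hypothetical /etc/ha.d/haresources entry. Nothing
in it comes from the original setup: node1, the two addresses, and the
resources after ipcheck are placeholders, and it assumes the script was
saved as ipcheck in /etc/ha.d/resource.d/. Heartbeat splits the
"::"-separated values into arguments and appends start or stop, which
matches the usage line printed by the script.

```
# /etc/ha.d/haresources (hypothetical example)
# preferred-node  resources, started left to right
node1 ipcheck::192.0.2.1::192.0.2.254 drbddisk::r0 Filesystem::/dev/drbd0::/srv::ext3
```

Putting ipcheck first means the network check runs before any other
resource is started, so a failed check on the secondary keeps the
remaining resources from being initialized there.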



