[DRBD-user] DRBD: failover when sync connection dies?

Thu Dec 13 18:24:53 CET 2007

On 2007.12.13, at 17:42, Florian Haas wrote:

> On Thursday 13 December 2007 13:45:47 Martin Gombac wrote:
>> Hi,
>>
>> I have a simple fail-over heartbeat/drbd cluster set up. In each
>> server there are two network cards: one for internet access (WAN) and
>> one for internal communication and synchronization (LAN). If one node
>> loses its internet connection (WAN), the other takes over all resources
>> as expected, but this does not happen with the local connection (LAN),
>> which is just a crossover UTP cable.
>> If the local connection (LAN) fails, the cluster does not migrate all
>> resources to one of the nodes and just continues to work without
>> synchronized drbd resources (amongst other things).
>>
>> My question is this:
>> How can I make one node take over all resources if the local crossover
>> connection fails?
>
> Give us your ha.cf, please.
>
> Florian
>
> --  
> : Florian G. Haas
> : LINBIT Information Technologies GmbH
> : Vivenotgasse 48, A-1120 Vienna, Austria

crm no

#
#	There are lots of options in this file.  All you have to have is a set
#	of nodes listed {"node ..."}, one of {serial, bcast, mcast, or ucast},
#	and a value for "auto_failback".
#
#	ATTENTION: As the configuration file is read line by line,
#		   THE ORDER OF DIRECTIVES MATTERS!
#
#	In particular, make sure that the udpport, serial baud rate
#	etc. are set before the heartbeat media are defined!
#	debug and log file directives go into effect when they
#	are encountered.
#
#	All will be fine if you keep them ordered as in this example.
#
#
#       Note on logging:
#       If any of debugfile, logfile and logfacility are defined then they
#       will be used. If debugfile and/or logfile are not defined and
#       logfacility is defined then the respective logging and debug
#       messages will be logged to syslog. If logfacility is not defined
#       then debugfile and logfile will be used to log messages. If
#       logfacility is not defined and debugfile and/or logfile are not
#       defined then defaults will be used for debugfile and logfile as
#       required and messages will be sent there.
#
#	File to write debug messages to
#debugfile /var/log/ha-debug
#
#
# 	File to write other messages to
#
#logfile	/var/log/ha-log
#
#
#	Facility to use for syslog()/logger
#
#logfacility	local0
#
#
#	A note on specifying "how long" times below...
#
#	The default time unit is seconds
#		10 means ten seconds
#
#	You can also specify them in milliseconds
#		1500ms means 1.5 seconds
#
#
#	keepalive: how long between heartbeats?
#
keepalive 2
#
#	deadtime: how long-to-declare-host-dead?
#
#		If you set this too low you will get the problematic
#		split-brain (or cluster partition) problem.
#		See the FAQ for how to use warntime to tune deadtime.
#
deadtime 30
#
#	warntime: how long before issuing "late heartbeat" warning?
#	See the FAQ for how to use warntime to tune deadtime.
#
warntime 10
#
#
#	Very first dead time (initdead)
#
#	On some machines/OSes, etc. the network takes a while to come up
#	and start working right after you've been rebooted.  As a result
#	we have a separate dead time for when things first come up.
#	It should be at least twice the normal dead time.
#
#initdead 120
#
#
#	What UDP port to use for bcast/ucast communication?
#
udpport	713
#
#	Baud rate for serial ports...
#
#baud	19200
#	
#	serial	serialportname ...
#serial	/dev/ttyS0	# Linux
#serial	/dev/cuaa0	# FreeBSD
#serial /dev/cuad0      # FreeBSD 6.x
#serial	/dev/cua/a	# Solaris
#
#
#	What interfaces to broadcast heartbeats over?
#
#bcast	eth0		# Linux
#bcast	eth1 eth2	# Linux
bcast eth0 eth1
#bcast	le0		# Solaris
#bcast	le1 le2		# Solaris
#
#	Set up a multicast heartbeat medium
#	mcast [dev] [mcast group] [port] [ttl] [loop]
#
#	[dev]		device to send/rcv heartbeats on
#	[mcast group]	multicast group to join (class D multicast address
#			224.0.0.0 - 239.255.255.255)
#	[port]		udp port to sendto/rcvfrom (set this value to the
#			same value as "udpport" above)
#	[ttl]		the ttl value for outbound heartbeats.  this affects
#			how far the multicast packet will propagate.  (0-255)
#			Must be greater than zero.
#	[loop]		toggles loopback for outbound multicast heartbeats.
#			if enabled, an outbound packet will be looped back and
#			received by the interface it was sent on. (0 or 1)
#			Set this value to zero.
#		
#
#mcast eth0 225.0.0.1 694 1 0
#
#	Set up a unicast / udp heartbeat medium
#	ucast [dev] [peer-ip-addr]
#
#	[dev]		device to send/rcv heartbeats on
#	[peer-ip-addr]	IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
#	About boolean values...
#
#	Any of the following case-insensitive values will work for true:
#		true, on, yes, y, 1
#	Any of the following case-insensitive values will work for false:
#		false, off, no, n, 0
#
#
#
#	auto_failback:  determines whether a resource will
#	automatically fail back to its "primary" node, or remain
#	on whatever node is serving it until that node fails, or
#	an administrator intervenes.
#
#	The possible values for auto_failback are:
#		on	- enable automatic failbacks
#		off	- disable automatic failbacks
#		legacy	- enable automatic failbacks in systems
#			where all nodes do not yet support
#			the auto_failback option.
#
#	auto_failback "on" and "off" are backwards compatible with the old
#		"nice_failback on" setting.
#
#	See the FAQ for information on how to convert
#		from "legacy" to "on" without a flash cut.
#		(i.e., using a "rolling upgrade" process)
#
#	The default value for auto_failback is "legacy", which
#	will issue a warning at startup.  So, make sure you put
#	an auto_failback directive in your ha.cf file.
#	(note: auto_failback can be any boolean or "legacy")
#
auto_failback on
#
#
#       Basic STONITH support
#       Using this directive assumes that there is one stonith
#       device in the cluster.  Parameters to this device are
#       read from a configuration file. The format of this line is:
#
#         stonith <stonith_type> <configfile>
#
#       NOTE: it is up to you to maintain this file on each node in the
#       cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
#       STONITH support
#       You can configure multiple stonith devices using this directive.
#       The format of the line is:
#         stonith_host <hostfrom> <stonith_type> <params...>
#         <hostfrom> is the machine the stonith device is attached
#              to or * to mean it is accessible from any host.
#         <stonith_type> is the type of stonith device (a list of
#              supported drivers is in /usr/lib/stonith.)
#         <params...> are driver specific parameters.  To see the
#              format for a particular device, run:
#           stonith -l -t <stonith_type>
#
#
#	Note that if you put your stonith device access information in
#	here, and you make this file publicly readable, you're asking
#	for a denial of service attack ;-)
#
#	To get a list of supported stonith devices, run
#		stonith -L
#	For detailed information on which stonith devices are supported
#	and their detailed configuration options, run this command:
#		stonith -h
#
#stonith_host *     baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3  rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
#	Watchdog is the watchdog timer.  If our own heart doesn't beat for
#	a minute, then our machine will reboot.
#	NOTE: If you are using the software watchdog, you very likely
#	wish to load the module with the parameter "nowayout=0" or
#	compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
#	an orderly shutdown of heartbeat will trigger a reboot, which is
#	very likely NOT what you want.
#
#watchdog /dev/watchdog
#
#	Tell what machines are in the cluster
#	node	nodename ...	-- must match uname -n
node	veliki
node	mali
#
#	Less common options...
#
#	Treats 10.10.10.254 as a pseudo-cluster-member
#	Used together with ipfail below...
#	note: don't use a cluster node as ping node	
#
#ping 10.10.10.254
#ping 88.200.116.1
#
#	Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member
#       called group1. If either 10.10.10.254 or 10.10.10.253 is up
#       then group1 is up
#	Used together with ipfail below...
#
#ping_group group1 10.10.10.254 10.10.10.253
ping_group internet 88.200.116.1 193.2.85.3
#
#	HBA ping directive for Fiber Channel
#	Treats fc-card-name as pseudo-cluster-member
#	used with ipfail below ...
#
#	You can obtain HBAAPI from http://hbaapi.sourceforge.net.  You need
#	to get the library specific to your HBA directly from the vendor.
#	To install HBAAPI stuff, all you need to do is to compile the common
#	part you obtained from the sourceforge. This will produce libHBAAPI.so
#	which you need to copy to /usr/lib. You also need to copy hbaapi.h to
#	/usr/include.
#	
#	The fc-card-name is the name obtained from the hbaapitest program
#	that is part of the hbaapi package. Running hbaapitest will produce
#	a verbose output. One of the first lines is similar to:
#		Apapter number 0 is named: qlogic-qla2200-0
#	Here fc-card-name is qlogic-qla2200-0. 	
#
#hbaping fc-card-name
#
#
#	Processes started and stopped with heartbeat.  Restarted unless
#		they exit with rc=100
#
#respawn userid /path/name/to/run
respawn cluster /usr/lib/heartbeat/ipfail
#
#	Access control for client api
#       	default is no access
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster

###########################
#
#	Unusual options.
#
###########################
#
#	hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#	deadping - dead time for ping nodes
#deadping 30
#
#	hbgenmethod - Heartbeat generation number creation method
#		Normally these are stored on disk and incremented as needed.
#hbgenmethod time
#
#	realtime - enable/disable realtime execution (high priority, etc.)
#		defaults to on
#realtime off
#
#	debug - set debug level
#		defaults to zero
#debug 1
#
#	API Authentication - replaces the fifo-permissions-based system of
#	the past
#
#
#	You can put a uid list and/or a gid list.
#	If you put both, then a process is authorized if it qualifies under
#	either
#	the uid list, or under the gid list.
#
#	The groupname "default" has special meaning.  If it is specified, then
#	this will be used for authorizing groupless clients, and any client
#	groups
#	not otherwise specified.
#	
#	There is a subtle exception to this.  "default" will never be used
#	in the
#	following cases (actual default auth directives noted in brackets)
#		  ipfail 	(uid=HA_CCMUSER)
#		  ccm 	 	(uid=HA_CCMUSER)
#		  ping		(gid=HA_APIGROUP)
#		  cl_status	(gid=HA_APIGROUP)
#
#	This is done to avoid creating a gaping security hole and matches
#	the most
#	likely desired configuration.
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient

# 	message format on the wire, it can be classic or netstring,
#	default: classic
#msgfmt  classic/netstring

#	Do we use logging daemon?
#	If logging daemon is used, logfile/debugfile/logfacility in this file
#	are not meaningful any longer. You should check the config file for
#	logging daemon (the default is /etc/logd.cf)
#	more information can be found in
#	http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective
#	Setting use_logd to "yes" is recommended
#	
use_logd yes
#
#	the interval we reconnect to logging daemon if the previous
#	connection failed
#	default: 60 seconds
conn_logd_time 60
#
#
#	Configure compression module
#	It could be zlib or bz2, depending on whether you have the corresponding
#	library	in the system.
compression	bz2
#
#	Configure compression threshold
#	This value determines the threshold to compress a message,
#	e.g. if the threshold is 1, then any message with size greater than
#	1 KB will be compressed, the default is 2 (KB)
compression_threshold 2
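
The time-unit and boolean conventions described in the ha.cf comments above can be sketched in a few lines of Python. This is only an illustration, not heartbeat's own parser: the names parse_ha_time, parse_ha_bool and initdead_ok are hypothetical, and the pattern assumes plain integer values with an optional "ms" suffix, which is all the comments show.

```python
import re

# Hypothetical helpers for illustration; heartbeat's real parser may
# accept forms that are not covered here.
TRUE_WORDS = {"true", "on", "yes", "y", "1"}
FALSE_WORDS = {"false", "off", "no", "n", "0"}


def parse_ha_time(value: str) -> float:
    """Return seconds for values such as '10' or '1500ms'."""
    match = re.fullmatch(r"(\d+)(ms)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number = int(match.group(1))
    # A bare number is in seconds; an 'ms' suffix means milliseconds.
    return number / 1000.0 if match.group(2) else float(number)


def parse_ha_bool(value: str) -> bool:
    """Map the case-insensitive boolean words from the comments to bool."""
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognised boolean value: {value!r}")


def initdead_ok(deadtime: str, initdead: str) -> bool:
    """Check the 'at least twice the normal dead time' advice for initdead."""
    return parse_ha_time(initdead) >= 2 * parse_ha_time(deadtime)


if __name__ == "__main__":
    print(parse_ha_time("1500ms"))
    print(initdead_ok("30", "120"))
```

With the values in this file, "deadtime 30" and the commented-out "initdead 120" satisfy the initdead advice.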