[DRBD-user] drbd with heartbeat won't fail over

Thu Jun 14 17:00:24 CEST 2007

On Thu, Jun 14, 2007 at 10:37:38AM -0400, Dan Gahlinger wrote:
> I posted this in linux-ha but got no response, and didn't even see my post get
> to the list.
> so here it is here. seems more like a drbd issue anyhow.
> 
> I have two systems, with heartbeat and DRBD installed.
> Initially I tested with just DRBD, and was able to fail back and forth very
> well and easily.
> 
> However, when using heartbeat, it won't fail over, no matter what I do. status
> doesn't change.
> 
> I have it setup so that DRBD goes over a cross-over cable between the two
> systems on a private IP.
> and heartbeat is run over the public (internet facing) interfaces.
> 
> My heartbeat config looks like this:
> 
> vi /etc/ha.d/ha.cf -
> logfacility local0
> 
> logfile /var/log/ha-log
> 
> debugfile /var/log/ha-debug
> 
> udpport 694
> 
> keepalive 1
> 
> deadtime 60
> 
> bcast eth0
> 
> node LAB-TEST-01
       ^^^^^^^^^^^^ [1]
> 
> node LAB-TEST-02
> 
> auto_failback on

I don't like automatic failback.

it may even be dangerous
(in case you have some resource agent that misbehaves on stop ...
if you don't know what I mean, consider yourself happy
to have missed out on one of the most fun parts of setting up
a heartbeat cluster)

in a "homogeneous" 2-node-failover-cluster
(i.e. both nodes are more or less identical)
automatic failback does not make much sense.

and to have a non-homogeneous cluster is
not a good idea either (most of the time).

even then, the operator will get paged for the first failover,
and, if deemed useful, will initiate the switch-back by hand.
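
As a sketch of what that means for the ha.cf quoted above, only the last line would change. This is a fragment under the assumption that the rest of the file stays as posted, not a complete or verified configuration:

```
# /etc/ha.d/ha.cf (fragment)
# after a failover, resources stay on the surviving node
# until an operator moves them back by hand
auto_failback off
```

For the manual switch-back, heartbeat ships an hb_standby helper script; where it is installed depends on the distribution.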

> and /etc/ha.d/haresources (note IP address is the virtual public IP):

( this is all one long single line, right?
  if not, you _have_ to end each continued line with a backslash! )
> lab-test-01 192.168.10.218 drbddisk Filesystem::/dev/drbd0::/mysql::ext3 Filesystem::/dev/drbd1::/data::ext3
  ^^^^^^^^^^^ [1]            ^^^^^^^^[2]

[1] should be the same cAsE in ha.cf and haresources (preferably both lowercase).
    it must be the actual node name, as reported by "uname -n"
[2] please use one drbddisk statement per drbd resource explicitly.
    drbddisk::r0 drbddisk::r1
    (or whatever your resource names are in drbd.conf)
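
Both points can be checked mechanically. The following is a minimal sketch, not something shipped with heartbeat: check_ha_names is a hypothetical helper name, and r0/r1 are assumed resource names. It compares the first field of each haresources entry against the node lines in ha.cf (exact case) and flags a drbddisk that names no resource. It does not run "uname -n"; that comparison still has to be done by hand on each node.

```shell
# check_ha_names HA_CF HARESOURCES
# Hypothetical helper, not shipped with heartbeat.
# Prints one line per problem; prints nothing if both checks pass.
#
# A haresources entry that should pass, assuming the DRBD resources are
# named r0 and r1 in drbd.conf and ha.cf says "node lab-test-01":
#   lab-test-01 192.168.10.218 drbddisk::r0 drbddisk::r1 \
#       Filesystem::/dev/drbd0::/mysql::ext3 Filesystem::/dev/drbd1::/data::ext3
check_ha_names() {
    awk '
        # First file (ha.cf): remember every name listed on a "node" line.
        NR == FNR {
            if ($1 == "node")
                for (i = 2; i <= NF; i++) nodes[$i] = 1
            next
        }
        # Second file (haresources): skip blank lines and comments.
        /^[ \t]*(#|$)/ { next }
        {
            # [1] the first field of an entry must match a node name exactly.
            if (!cont && !($1 in nodes))
                print "node name not in ha.cf (check cAsE): " $1
            # [2] drbddisk without ::resource does not name a DRBD resource.
            for (i = 1; i <= NF; i++)
                if ($i == "drbddisk")
                    print "bare drbddisk, expected drbddisk::<resource>"
            # A trailing backslash means the entry continues on the next line.
            cont = ($0 ~ /\\$/)
        }
    ' "$1" "$2"
}
```

Typical use would be: check_ha_names /etc/ha.d/ha.cf /etc/ha.d/haresources
Run against the files as posted, it should report both the lowercase node name and the bare drbddisk.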

> configs on both systems are the same, hosts files identical with all
> the entries.  I've tried with auto_failback on and off seems to make
> no difference.
> 
> I test by pulling the public cable on lab-test-01, or using ifconfig eth0 down
> 
> Also, when I bring the server back up drbd can't see the other system
> (either one), it becomes
> secondary/unknown and primary/unknown.
> 
> It seems for some cases I need to use the drbdadm primary all on the
> primary at boot up to fix that.
> One other note about the heartbeat issue above. I found if I enter the
> commands manually it seems to work.
> which makes it really weird.
> 
> Can anyone tell me what's going wrong?

what do the heartbeat log file(s) (ha-debug) say?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :