[DRBD-user] drbd with heartbeat won't fail over

Dan Gahlinger dgahling at gmail.com
Thu Jun 14 18:04:14 CEST 2007



you mean like this?

lab-test-01 192.168.10.218 drbddisk::r0 Filesystem::/dev/drbd0::/mysql::ext3 drbddisk::r1 Filesystem::/dev/drbd1::/data::ext3
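
For anyone wanting to check a haresources line against the two points Lars raises in the quoted reply below (node name in the same case as in ha.cf, and an explicit resource name on every drbddisk entry), here is a rough sketch. The function name check_haresources and the warning texts are made up for illustration; they are not part of heartbeat or DRBD, and the script only looks at the text of the line, not at a running cluster.

```python
# Hypothetical helper: checks one logical haresources line against
# the node names listed in ha.cf. Not part of heartbeat or DRBD.

OLD_LINE = ("lab-test-01 192.168.10.218 drbddisk "
            "Filesystem::/dev/drbd0::/mysql::ext3 "
            "Filesystem::/dev/drbd1::/data::ext3")
NEW_LINE = ("lab-test-01 192.168.10.218 drbddisk::r0 "
            "Filesystem::/dev/drbd0::/mysql::ext3 "
            "drbddisk::r1 Filesystem::/dev/drbd1::/data::ext3")


def check_haresources(line, ha_cf_nodes):
    """Return a list of warning strings (empty if nothing was found)."""
    fields = line.split()
    if not fields:
        return ["empty haresources line"]

    warnings = []
    node, resources = fields[0], fields[1:]

    # Point [1]: the node name should match ha.cf exactly, case included.
    if node not in ha_cf_nodes:
        if node.lower() in [n.lower() for n in ha_cf_nodes]:
            warnings.append(f"node name {node!r} differs only in case from ha.cf")
        else:
            warnings.append(f"node name {node!r} is not listed in ha.cf")

    # Point [2]: every drbddisk entry should name its resource (drbddisk::r0).
    for res in resources:
        name, _, args = res.partition("::")
        if name == "drbddisk" and not args:
            warnings.append("drbddisk entry without an explicit resource name")

    return warnings


if __name__ == "__main__":
    for warning in check_haresources(OLD_LINE, ["LAB-TEST-01", "LAB-TEST-02"]):
        print("old line:", warning)
    for warning in check_haresources(NEW_LINE, ["lab-test-01", "lab-test-02"]):
        print("new line:", warning)
```

OLD_LINE is the line from the original post and NEW_LINE is the revised one above; the second node list assumes the ha.cf node entries have been changed to lower case as well.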

I'll make this change, run it again, and post the debug output. The weird
thing is that the log says it releases the IP resource, but it never actually
does. It reports "Success" each time but doesn't actually do anything.
Here's a portion of the ha-log:

ResourceManager[32348]: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data ext3 stop
Filesystem[32683]:      2007/06/13_12:45:08 INFO: Running stop for /dev/drbd1 on /data
Filesystem[32678]:      2007/06/13_12:45:08 INFO:  Success
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mysql ext3 stop
Filesystem[32731]:      2007/06/13_12:45:08 INFO: Running stop for /dev/drbd0 on /mysql
Filesystem[32726]:      2007/06/13_12:45:08 INFO:  Success
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.100.218 stop
IPaddr[371]:    2007/06/13_12:45:08 INFO: /sbin/ifconfig eth0:0 192.168.100.218 down
IPaddr[360]:    2007/06/13_12:45:08 INFO:  Success
mach_down[32328]:       2007/06/13_12:45:08 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[32328]:       2007/06/13_12:45:08 info: mach_down takeover complete for node lab-test-nag01.
heartbeat[32257]: 2007/06/13_12:45:08 info: mach_down takeover complete.
heartbeat[32257]: 2007/06/13_12:45:13 info: Local Resource acquisition completed. (none)
heartbeat[32257]: 2007/06/13_12:45:13 info: local resource transition completed.
hb_standby[421]:        2007/06/13_12:45:38 Going standby [foreign].
heartbeat[32257]: 2007/06/13_12:45:38 info: lab-test-nag02 wants to go standby [foreign]
heartbeat[32257]: 2007/06/13_12:45:49 WARN: No reply to standby request. Standby request cancelled
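
Since the full ha-debug is too big to read by eye, a small script along these lines could pull out just the resource agent calls and the order they ran in. This is only a sketch: the function name resource_actions is made up, the regular expression is based solely on the "info: Running /etc/ha.d/resource.d/..." lines in the excerpt above, and other heartbeat versions may log differently. It collapses whitespace first, so it copes with log lines that were wrapped by a mail client.

```python
import re

# Two entries copied from the ha-log excerpt above, wrapped as mailed.
SAMPLE = """\
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running
/etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager[32348]: 2007/06/13_12:45:08 info: Running
/etc/ha.d/resource.d/IPaddr 192.168.100.218 stop
IPaddr[371]:    2007/06/13_12:45:08 INFO: /sbin/ifconfig eth0:0
192.168.100.218 down
IPaddr[360]:    2007/06/13_12:45:08 INFO:  Success
"""

# Agent name, then its arguments, then the operation (start or stop).
ACTION_RE = re.compile(
    r"Running /etc/ha\.d/resource\.d/(\S+) ((?:\S+ )*?)(start|stop)\b"
)


def resource_actions(log_text):
    """Return (agent, args, operation) tuples in the order they were logged."""
    # Collapse all whitespace so wrapped log lines become one long line.
    flat = " ".join(log_text.split())
    return [
        (agent, args.split(), op)
        for agent, args, op in ACTION_RE.findall(flat)
    ]


if __name__ == "__main__":
    for agent, args, op in resource_actions(SAMPLE):
        print(op, agent, " ".join(args))
```

Note that this only shows what heartbeat asked the agents to do and in which order. It cannot tell whether the address really went away; that still has to be checked on the node itself.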

BTW I use auto-failback for a specific reason: you always know which one is
the primary. That is, if your servers are in a remote location, managed by a
different group, and you want to do maintenance, you can be reasonably sure
it's OK to remove the secondary from service.
But it's just a thought, not totally critical.

The ha-debug is far too large to post here. I could send it as an attachment,
off-list. Would you recommend that?

Dan.

On 6/14/07, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>
> On Thu, Jun 14, 2007 at 10:37:38AM -0400, Dan Gahlinger wrote:
> > I posted this in linux-ha but got no response, and didn't even see my
> > post get to the list.
> > so here it is here. seems more like a drbd issue anyhow.
> >
> > I have two systems, with heartbeat and DRBD installed.
> > Initially I tested with just DRBD, and was able to fail back and forth
> > very well and easily.
> >
> > However, when using heartbeat, it won't fail over, no matter what I do.
> > status doesn't change.
> >
> > I have it setup so that DRBD goes over a cross-over cable between the
> > two systems on a private IP.
> > and heartbeat is run over the public (internet facing) interfaces.
> >
> > My heartbeat config looks like this:
> >
> > vi /etc/ha.d/ha.cf -
> > logfacility local0
> >
> > logfile /var/log/ha-log
> >
> > debugfile /var/log/ha-debug
> >
> > udpport 694
> >
> > keepalive 1
> >
> > deadtime 60
> >
> > bcast eth0
> >
> > node LAB-TEST-01
>        ^^^^^^^^^^^^ [1]
> >
> > node LAB-TEST-02
> >
> > auto_failback on
>
> I don't like automatic failback.
>
> it may even be dangerous
> (in case you have some misbehaving resource agent on stop ...
> if you don't know what I mean, consider yourself happy
> to have missed out on one of the most fun parts setting up
> a heartbeat cluster)
>
> in a "homogeneous" 2-node-failover-cluster
> (i.e. both nodes are more or less identical)
> it does not make much sense.
>
> and to have a non-homogeneous cluster is
> not a good idea either (most of the time).
>
> even then, operator will get paged for the first failover,
> and if deemed useful, will initiate the switch-back by hand.
>
> > and /etc/ha.d/haresources (note IP address is the virtual public IP):
>
> ( this is all one long single line, right?
>   if not, you _have_ to use backslash! )
> > lab-test-01 192.168.10.218 drbddisk Filesystem::/dev/drbd0::/mysql::ext3 Filesystem::/dev/drbd1::/data::ext3
>   ^^^^^^^^^^^ [1]            ^^^^^^^^[2]
>
> [1] should be the same cAsE (preferably both small).
>     it must be the actual node name, as reported by "uname -n"
> [2] please use one drbddisk statement per drbd resource explicitly.
>     drbddisk::r0 drbddisk::r1
>     (or whatever your resource names are in drbd.conf)
>
> > configs on both systems are the same, hosts files identical with all
> > the entries.  I've tried with auto_failback on and off seems to make
> > no difference.
> >
> > I test by pulling the public cable on lab-test-01, or using ifconfig
> > eth0 down
> >
> > Also, when I bring the server back up drbd can't see the other system
> > (either one), it becomes
> > secondary/unknown and primary/unknown.
> >
> > It seems for some cases I need to use the drbdadm primary all on the
> > primary at boot up to fix that.
> > One other note about the heartbeat issue above. I found if I enter the
> > commands manually it seems to work.
> > which makes it really weird.
> >
> > Can anyone tell me what's going wrong?
>
> the heartbeat log file(s) (ha-debug)?
>
>
> --
> : Lars Ellenberg                            Tel +43-1-8178292-0  :
> : LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
> : Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
> __
> please use the "List-Reply" function of your email client.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>