you mean like this?

lab-test-01 192.168.10.218 drbddisk::r0 Filesystem::/dev/drbd0::/mysql::ext3 drbddisk::r1 Filesystem::/dev/drbd1::/data::ext3

I'll do this and run it again, and post the debug.

the weird thing is the debug says it releases the IP resource, but it never actually does.
it says "success" "success" but doesn't actually do anything.

here's a portion of the ha-log:

ResourceManager: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /data ext3 stop
Filesystem: 2007/06/13_12:45:08 INFO: Running stop for /dev/drbd1 on /data
Filesystem: 2007/06/13_12:45:08 INFO: Success
ResourceManager: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mysql ext3 stop
Filesystem: 2007/06/13_12:45:08 INFO: Running stop for /dev/drbd0 on /mysql
Filesystem: 2007/06/13_12:45:08 INFO: Success
ResourceManager: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/drbddisk r1 stop
ResourceManager: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
ResourceManager: 2007/06/13_12:45:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.100.218 stop
IPaddr: 2007/06/13_12:45:08 INFO: /sbin/ifconfig eth0:0 192.168.100.218 down
IPaddr: 2007/06/13_12:45:08 INFO: Success
mach_down: 2007/06/13_12:45:08 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down: 2007/06/13_12:45:08 info: mach_down takeover complete for node lab-test-nag01.
heartbeat: 2007/06/13_12:45:08 info: mach_down takeover complete.
heartbeat: 2007/06/13_12:45:13 info: Local Resource acquisition completed. (none)
heartbeat: 2007/06/13_12:45:13 info: local resource transition completed.
hb_standby: 2007/06/13_12:45:38 Going standby [foreign].
heartbeat: 2007/06/13_12:45:38 info: lab-test-nag02 wants to go standby [foreign]
heartbeat: 2007/06/13_12:45:49 WARN: No reply to standby request. Standby request cancelled

BTW I use auto-failback for a specific reason - you always know which one is the primary.
That is, if your servers are in a remote location, managed by a different group, and you want to do maintenance, you can be reasonably sure it's ok to remove the secondary from service.
But it's just a thought, not totally critical.

the ha-debug is way too huge to post. I could send it attached, off-list. recommend?

Dan.

On 6/14/07, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>
> On Thu, Jun 14, 2007 at 10:37:38AM -0400, Dan Gahlinger wrote:
> > I posted this in linux-ha but got no response, and didn't even see my post get
> > to the list.
> > so here it is here. seems more like a drbd issue anyhow.
> >
> > I have two systems, with heartbeat and DRBD installed.
> > Initially I tested with just DRBD, and was able to fail back and forth very
> > well and easily.
> >
> > However, when using heartbeat, it won't fail over, no matter what I do. status
> > doesn't change.
> >
> > I have it setup so that DRBD goes over a cross-over cable between the two
> > systems on a private IP.
> > and heartbeat is run over the public (internet facing) interfaces.
> >
> > My heartbeat config looks like this:
> >
> > vi /etc/ha.d/ha.cf -
> > logfacility local0
> >
> > logfile /var/log/ha-log
> >
> > debugfile /var/log/ha-debug
> >
> > udpport 694
> >
> > keepalive 1
> >
> > deadtime 60
> >
> > bcast eth0
> >
> > node LAB-TEST-01
>        ^^^^^^^^^^^^
> >
> > node LAB-TEST-02
> >
> > auto_failback on
>
> I don't like automatic failback.
>
> it may even be dangerous
> (in case you have some misbehaving resource agent on stop ...
> if you don't know what I mean, consider yourself happy
> to have missed out on one of the most fun parts setting up
> a heartbeat cluster)
>
> in a "homogeneous" 2-node-failover-cluster
> (i.e. both nodes are more or less identical)
> it does not make much sense.
>
> and to have a non-homogeneous cluster is
> not a good idea either (most of the time).
>
> even then, operator will get paged for the first failover,
> and if deemed useful, will initiate the switch-back by hand.
>
> > and /etc/ha.d/haresources (note IP address is the virtual public IP):
>
> ( this is all one long single line, right?
> if not, you _have_ to use backslash! )
>
> > lab-test-01 192.168.10.218 drbddisk Filesystem::/dev/drbd0::/mysql::ext3 Filesystem::/dev/drbd1::/data::ext3
>   ^^^^^^^^^^^                ^^^^^^^^
>
> should be the same cAsE (preferably both small).
> it must be the actual node name, as reported by "uname -n"
> please use one drbddisk statement per drbd resource explicitly.
> drbddisk::r0 drbddisk::r1
> (or whatever your resource names are in drbd.conf)
>
> > configs on both systems are the same, hosts files identical with all
> > the entries. I've tried with auto_failback on and off seems to make
> > no difference.
> >
> > I test by pulling the public cable on lab-test-01, or using ifconfig eth0 down
> >
> > Also, when I bring the server back up drbd can't see the other system
> > (either one), it becomes
> > secondary/unknown and primary/unknown.
> >
> > It seems for some cases I need to use the drbdadm primary all on the
> > primary at boot up to fix that.
> > One other note about the heartbeat issue above. I found if I enter the
> > commands manually it seems to work.
> > which makes it really weird.
> >
> > Can anyone tell me what's going wrong?
>
> the heartbeat log file(s) (ha-debug)?
>
>
> --
> : Lars Ellenberg Tel +43-1-8178292-0 :
> : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
> : Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
> __
> please use the "List-Reply" function of your email client.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
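
The haresources advice quoted above (node names in the same case as in ha.cf, one explicit drbddisk::<resource> entry per DRBD resource, a backslash when the entry is wrapped over several lines) could be turned into a small sanity check. What follows is only a rough sketch in Python and is not part of heartbeat or DRBD; the function names join_continuations and check_haresources, and the idea of passing in the node names from ha.cf as a list, are assumptions made for illustration.

```python
def join_continuations(text):
    """Join lines ending in a backslash into single logical lines.

    Blank lines and lines starting with '#' are skipped.
    """
    logical, buf = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Continuation: keep collecting into the same logical line.
            buf += line[:-1] + " "
            continue
        buf += line
        entry = buf.strip()
        if entry and not entry.startswith("#"):
            logical.append(entry)
        buf = ""
    if buf.strip():
        # File ended on a continuation line; keep what was collected.
        logical.append(buf.strip())
    return logical


def check_haresources(text, node_names):
    """Return a list of warning strings for haresources-style content.

    node_names is assumed to hold the names from the 'node' lines in
    ha.cf, which in turn should match 'uname -n' on each machine.
    """
    warnings = []
    lowered = [name.lower() for name in node_names]
    for entry in join_continuations(text):
        fields = entry.split()
        node, resources = fields[0], fields[1:]
        if node not in node_names:
            if node.lower() in lowered:
                warnings.append(
                    "node name '%s' matches a known node only when case is ignored" % node
                )
            else:
                warnings.append("node name '%s' is not a known node" % node)
        for res in resources:
            if res == "drbddisk":
                warnings.append(
                    "drbddisk has no explicit resource; use e.g. drbddisk::r0"
                )
    return warnings


if __name__ == "__main__":
    # The entry as first posted, wrapped with a backslash for readability.
    posted = (
        "lab-test-01 192.168.10.218 drbddisk Filesystem::/dev/drbd0::/mysql::ext3 \\\n"
        "    Filesystem::/dev/drbd1::/data::ext3\n"
    )
    for warning in check_haresources(posted, ["LAB-TEST-01", "LAB-TEST-02"]):
        print(warning)
```

On a live node, os.uname().nodename returns the same string as "uname -n", so it could be used to build the node_names list instead of typing the names by hand. The check only looks at the points raised in this thread; it does not validate IP addresses or resource agent arguments.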