Hello,

I have a problem with my MySQL+DRBD+Heartbeat setup that I don't really know how to resolve. There are two nodes. They are connected both by a dedicated replication link (crossover cable) and through the same outward-facing network. Heartbeat moves a virtual IP to whichever node is active. The nodes run a Primary/Secondary setup, and I run heartbeat over both networks.

My /etc/ha.d/ha.cf (identical on both nodes) looks like this:

    logfacility local0
    keepalive 1
    deadtime 5
    warntime 3
    initdead 45
    ucast eth0 192.168.23.241
    ucast eth0 192.168.23.242
    bcast eth1
    auto_failback off
    node ubuntubox1
    node ubuntubox2

When node1 is running as Primary and I kill the machine, node2 detects this and takes over as it should. But when I unplug the network cable (not the crossover one), the link failure is detected. Because heartbeat runs over both interfaces, though, the other node can still be heard over the crossover link, so nothing is done. In this case I would want a switchover, so that the Secondary node becomes Primary.

I tried removing the heartbeat over the replication link (the eth1 line in the config file) so that the disconnect would be detected. This gave me other problems. Both boxes seem to think they are the surviving one. The DRBD replication link is still up, so Box1 (the one that was Secondary) tries to take over the resources. They are still locked by Box2 (Primary), so the takeover fails. Logs below:

Box1 (Secondary):

    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: node ubuntubox2: is dead
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: No STONITH device configured.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: Shared disks are not protected.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Resources being acquired from ubuntubox2.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Link ubuntubox2:eth0 dead.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [22939]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Mar 21 12:21:46 ubuntubox1 harc[22939]: info: Running /etc/ha.d//rc.d/status status
    Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: mach_down takeover complete for node ubuntubox2.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: mach_down takeover complete.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23008]: INFO: Resource is stopped
    Mar 21 12:21:46 ubuntubox1 heartbeat: [22940]: info: Local Resource acquisition completed.
    Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
    Mar 21 12:21:46 ubuntubox1 heartbeat: [23094]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
    Mar 21 12:21:46 ubuntubox1 harc[23094]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
    Mar 21 12:21:46 ubuntubox1 ip-request-resp[23094]: received ip-request-resp IPaddr2::192.168.23.240/24/eth0/192.168.23.255 OK yes
    Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Acquiring resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23145]: INFO: Resource is stopped
    Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 start
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip -f inet addr add 192.168.23.240/24 brd 192.168.23.255 dev eth0
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip link set eth0 up
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.23.240 eth0 192.168.23.240 auto not_used not_used
    Mar 21 12:21:46 ubuntubox1 IPaddr2[23228]: INFO: Success
    Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk start
    Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
    Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: CRIT: Giving up resources due to failure of drbddisk
    Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Releasing resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
    Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/mysql stop
    Mar 21 12:21:58 ubuntubox1 mysql[23446]: Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service mysql stop
        Since the script you are attempting to invoke has been converted to an Upstart job, you may also use the stop(8) utility, e.g. stop mysql
    Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
    Mar 21 12:21:59 ubuntubox1 Filesystem[23490]: INFO: Running stop for /dev/drbd0 on /mnt/drbd
    Mar 21 12:21:59 ubuntubox1 Filesystem[23482]: INFO: Success
    Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk stop
    Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 stop
    Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: IP status = ok, IP_CIP=
    Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: ip -f inet addr delete 192.168.23.240/24 dev eth0
    Mar 21 12:21:59 ubuntubox1 IPaddr2[23585]: INFO: Success
    Mar 21 12:22:29 ubuntubox1 hb_standby[23790]: Going standby [foreign].
    Mar 21 12:22:29 ubuntubox1 heartbeat: [8324]: ERROR: msg2string: Message with zero fields

Box2 (Primary):

    Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: WARN: node ubuntubox1: is dead
    Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Dead node ubuntubox1 gave up resources.
    Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Link ubuntubox1:eth0 dead.

If I plug the network cable back in, things seem to come back up and Box1 becomes Primary, which is fine. But is there any way to resolve this, so that if a box is removed from the network (while the replication link is still up) the cluster switches over to the other box?

Also, if MySQL itself (the software, not the machine) were to crash, this would not be detected either. Is there a way to get Heartbeat to check whether MySQL is running as well, and switch over in case of a software crash?

Best Regards
Gustav Reiz

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110321/3d9fdf8a/attachment.htm>
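The ha.cf file shown in the post does not list the cluster resources. Heartbeat v1 reads those from /etc/ha.d/haresources. The "Acquiring resource group" line in the Box1 ResourceManager log prints that group in haresources syntax, so the entry was probably close to the fragment below. It is reconstructed from the log, not copied from the poster's files, so treat it as an assumption. The first field (ubuntubox1) names the preferred owner of the group. The remaining resources start left to right in order: IP address, DRBD, filesystem, mysql. They stop in reverse order, which matches the "Releasing resource group" sequence in the log.

    # /etc/ha.d/haresources (assumed; reconstructed from the ResourceManager log line)
    ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql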