[DRBD-user] DRBD+Heartbeat

Gustav Reiz reg at texoapplication.se
Mon Mar 21 13:35:30 CET 2011


Hello,
 
I have a problem with my MySQL+DRBD+Heartbeat setup that I don't know how to resolve. There are two nodes, connected both by a dedicated replication link (crossover cable, eth1) and by the same outward-facing network (eth0). Heartbeat moves a virtual IP to the active node. DRBD runs in a Primary/Secondary setup, and heartbeats are sent over both networks.
My /etc/ha.d/ha.cf (identical on both nodes) looks like this:
logfacility local0
keepalive 1
deadtime 5
warntime 3
initdead 45
ucast eth0 192.168.23.241
ucast eth0 192.168.23.242
bcast eth1
auto_failback off
node ubuntubox1
node ubuntubox2
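
The matching /etc/ha.d/haresources is not reproduced here. Judging by the "Acquiring resource group" line in the Box1 log further down, it presumably consists of a single line along these lines (a reconstruction from the log, not a copy of the actual file):

```text
# /etc/ha.d/haresources (reconstructed from the log output; an assumption, not the real file)
# preferred node, virtual IP, DRBD promotion, filesystem mount, MySQL init script
ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
```

Heartbeat starts the resources on this line from left to right and stops them in reverse order, which matches the start and stop sequence seen in the log.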
 
When node1 is Primary and I kill the machine, node2 detects this and takes over as it should. When I instead unplug the network cable (not the crossover one), the link failure is detected, but because heartbeats also travel over the crossover link, the peer is still heard and nothing happens. In this case I would want to fail over so that the Secondary becomes Primary. I tried removing the heartbeat over the replication link (eth1 in the config file) so that the disconnect would be noticed. This caused other problems: both boxes seem to think they are the surviving one. Because the DRBD replication link is still up, Box1 (the Secondary) tries to take over the resources, but they are still held by Box2 (the Primary), so the takeover fails. Logs below:
 
Box1(Secondary):
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: node ubuntubox2: is dead
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: No STONITH device configured.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: Shared disks are not protected.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Resources being acquired from ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Link ubuntubox2:eth0 dead.
Mar 21 12:21:46 ubuntubox1 heartbeat: [22939]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[22939]: info: Running /etc/ha.d//rc.d/status status
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: mach_down takeover complete for node ubuntubox2.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: mach_down takeover complete.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
Mar 21 12:21:46 ubuntubox1 IPaddr2[23008]: INFO:  Resource is stopped
Mar 21 12:21:46 ubuntubox1 heartbeat: [22940]: info: Local Resource acquisition completed.
Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1
Mar 21 12:21:46 ubuntubox1 heartbeat: [23094]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Mar 21 12:21:46 ubuntubox1 harc[23094]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Mar 21 12:21:46 ubuntubox1 ip-request-resp[23094]: received ip-request-resp IPaddr2::192.168.23.240/24/eth0/192.168.23.255 OK yes
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Acquiring resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:46 ubuntubox1 IPaddr2[23145]: INFO:  Resource is stopped
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 start
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip -f inet addr add 192.168.23.240/24 brd 192.168.23.255 dev eth0
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip link set eth0 up
Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.23.240 eth0 192.168.23.240 auto not_used not_used
Mar 21 12:21:46 ubuntubox1 IPaddr2[23228]: INFO:  Success
Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk  start
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: CRIT: Giving up resources due to failure of drbddisk
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Releasing resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/mysql  stop
Mar 21 12:21:58 ubuntubox1 mysql[23446]: Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service mysql stop Since the script you are attempting to invoke has been converted to an Upstart job, you may also use the stop(8) utility, e.g. stop mysql
Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop
Mar 21 12:21:59 ubuntubox1 Filesystem[23490]: INFO: Running stop for /dev/drbd0 on /mnt/drbd
Mar 21 12:21:59 ubuntubox1 Filesystem[23482]: INFO:  Success
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk  stop
Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 stop
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: IP status = ok, IP_CIP=
Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: ip -f inet addr delete 192.168.23.240/24 dev eth0
Mar 21 12:21:59 ubuntubox1 IPaddr2[23585]: INFO:  Success
Mar 21 12:22:29 ubuntubox1 hb_standby[23790]: Going standby [foreign].
Mar 21 12:22:29 ubuntubox1 heartbeat: [8324]: ERROR: msg2string: Message with zero fields
 
Box2(Primary):
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: WARN: node ubuntubox1: is dead
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Dead node ubuntubox1 gave up resources.
Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Link ubuntubox1:eth0 dead.
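
For reference, the ha.cf variant used for the test logged above, with the heartbeat over the replication link removed, would look roughly like this. It is reconstructed from the description and the original file shown earlier, not copied from the nodes:

```text
# /etc/ha.d/ha.cf, test variant (reconstruction; the only intended change
# from the original is that the "bcast eth1" line is dropped)
logfacility local0
keepalive 1
deadtime 5
warntime 3
initdead 45
# heartbeat only over the outward-facing network
ucast eth0 192.168.23.241
ucast eth0 192.168.23.242
auto_failback off
node ubuntubox1
node ubuntubox2
```

With this variant, unplugging eth0 leaves the two nodes with no heartbeat path at all while DRBD is still connected over eth1, which is the situation the logs show.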
 
If I plug the network cable back in, the cluster seems to recover and makes Box1 Primary, which is fine. But is there a way to make a box that loses its network connection (while the replication link is still up) hand over to the other box?
 
Also, if MySQL itself (the software, not the machine) were to crash, this would not be detected either. Is there a way to get Heartbeat to check whether MySQL is running and fail over if it has crashed?
 
Best Regards
Gustav Reiz