<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.E-postmall17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=SV link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>Hello,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><span lang=EN-US>I have a problem with my MySQL + DRBD + Heartbeat setup that I don't know how to resolve. There are two nodes, connected to each other by a dedicated replication link (a crossover cable) and also both on the same external network. Heartbeat moves a virtual IP to whichever node is active. The nodes run in a Primary/Secondary setup, and heartbeats are sent over both networks.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>My /etc/ha.d/ha.cf (identical on both nodes) looks like this:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>logfacility local0<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>keepalive 1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>deadtime 5<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>warntime 3<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>initdead 45<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>ucast eth0 192.168.23.241<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>ucast eth0 192.168.23.242<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>bcast eth1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>auto_failback off<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>node ubuntubox1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>node ubuntubox2<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>When node1 is running as Primary and I kill the machine, node2 detects this and takes over, as it should. But when I unplug the network cable (not the crossover one), the link failure is detected. Since heartbeats go over both interfaces, the node can still be heard, and nothing is done. In this case I would want to fail over so that the Secondary node becomes Primary. I tried removing the heartbeat over the replication link (eth1 in the config file) so that the disconnect would be detected. This caused other problems. Both boxes seem to think they are the surviving one, but the DRBD replication link is still up. As a result, Box1 (the one that was Secondary) tries to take over the resources, which are still held by Box2 (the Primary), and fails. Logs below:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Box1 (Secondary):<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: node ubuntubox2: is dead<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: No STONITH device configured.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: WARN: Shared disks are not protected.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Resources being acquired from ubuntubox2.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: Link ubuntubox2:eth0 dead.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [22939]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 harc[22939]: info: Running /etc/ha.d//rc.d/status status<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 mach_down[22977]: info: mach_down takeover complete for node ubuntubox2.<o:p></o:p></span></p><p class=MsoNormal><span 
lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: info: mach_down takeover complete.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 IPaddr2[23008]: INFO: Resource is stopped<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [22940]: info: Local Resource acquisition completed.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [8322]: debug: StartNextRemoteRscReq(): child count 1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 heartbeat: [23094]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 harc[23094]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 ip-request-resp[23094]: received ip-request-resp IPaddr2::192.168.23.240/24/eth0/192.168.23.255 OK yes<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Acquiring resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 IPaddr2[23145]: INFO: Resource is stopped<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 start<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: ip -f inet addr add 192.168.23.240/24 brd 192.168.23.255 dev eth0<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 
IPaddr2[23254]: INFO: ip link set eth0 up<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 IPaddr2[23254]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.23.240 eth0 192.168.23.240 auto not_used not_used<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 IPaddr2[23228]: INFO: Success<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:46 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk start<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: CRIT: Giving up resources due to failure of drbddisk<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Releasing resource group: ubuntubox1 IPaddr2::192.168.23.240/24/eth0/192.168.23.255 drbddisk Filesystem::/dev/drbd0::/mnt/drbd::ext3 mysql<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/mysql stop<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 mysql[23446]: Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service mysql stop Since the script you are attempting to invoke has been converted to an Upstart job, you may also use the stop(8) utility, e.g. 
stop mysql<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:58 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/drbd ext3 stop<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 Filesystem[23490]: INFO: Running stop for /dev/drbd0 on /mnt/drbd<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 Filesystem[23482]: INFO: Success<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/drbddisk stop<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 ResourceManager[23117]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.23.240/24/eth0/192.168.23.255 stop<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: IP status = ok, IP_CIP=<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 IPaddr2[23611]: INFO: ip -f inet addr delete 192.168.23.240/24 dev eth0<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:59 ubuntubox1 IPaddr2[23585]: INFO: Success<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:22:29 ubuntubox1 hb_standby[23790]: Going standby [foreign].<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:22:29 ubuntubox1 heartbeat: [8324]: ERROR: msg2string: Message with zero fields<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Box2 (Primary):<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: WARN: node ubuntubox1: is dead<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:45 ubuntubox2 heartbeat: [4553]: info: Dead node ubuntubox1 gave up resources.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Mar 21 12:21:45 ubuntubox2 heartbeat: 
[4553]: info: Link ubuntubox1:eth0 dead.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>If I plug the network cable back in, the cluster seems to recover and makes Box1 Primary, which is fine. But is there a way to handle this so that, when a box loses its connection to the network (while the replication link is still up), the resources fail over to the other box?<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Also, if MySQL itself crashes (the software, not the machine), this is not detected either. Is there a way to get Heartbeat to also check whether MySQL is running and fail over if it crashes?<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Best regards<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Gustav Reiz<o:p></o:p></span></p></div></body></html>