<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18241"></HEAD>
<BODY>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>Hi
There,</SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN
class=859365320-25062009></SPAN></FONT> </DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>I am currently
running an HA environment that consists of the following:</SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN
class=859365320-25062009></SPAN></FONT> </DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>-2x Red Hat
Enterprise Linux 5.1 ES servers</SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>-Both running
drbd-8.2.5-3<BR>-Both running heartbeat-2.1.3 </SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN
class=859365320-25062009></SPAN></FONT> </DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>-DRBD's replication
link is over its own private network eth1 10.1.1.X connected using a 1 Gbps
switch.</SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>-Heartbeat running
over the LAN on eth0 192.168.0.XXX (unicast)</SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial><SPAN class=859365320-25062009>- There are two
separate HA clusters sharing the same replication switch. As you can see in
the logs, they are set up to unicast on different ports, so I would assume
this should be fine. (Maybe they should be on separate switches or even on
separate VLANs?) </SPAN></FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>These are both
production servers that serve MySQL, ColdFusion and httpd.  I am running
into a strange problem where at around 2am most mornings the primary server
becomes somewhat unresponsive. By "somewhat" I mean the
following:</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>- Can still ping the
primary node</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>- Cluster IP address
is still up</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>- A cat of
/proc/drbd shows the primary and secondary still in their respective roles
(not failed over).</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>The problem we are
facing is that the primary can no longer be accessed remotely via ssh (or even
VNC).  At the physical server the console is completely unresponsive,
including both keyboard and mouse, which forces a hard power-off of the
server.  When the server is shut down, the secondary takes over as primary
correctly, and once the original primary is brought back online it rejoins the
cluster and assumes its respective role
correctly. </FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>At the end of this
email I will post my config files in case anyone can shed any light on the
situation. </FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>The log files
(/var/log/messages, ha-log, ha-debug) all show no indication of what may be
happening. </FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial><STRONG><U>My
DRBD.conf file -</U></STRONG></FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>global { usage-count
yes; }<BR>common { syncer { rate 500M; } }<BR>resource r0
{<BR> protocol
C;<BR>
handlers
{<BR>
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt
-f";<BR>
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt
-f";<BR>
local-io-error "echo o > /proc/sysrq-trigger ; halt
-f";<BR>
#outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t
5";<BR>
outdate-peer
"/usr/lib/heartbeat/drbd-peer-outdater";<BR>
pri-lost "echo pri-lost. Check the Log Files. | mail -s 'DRBD Alert'
root";<BR>
#split-brain "echo split-brain. drbdadm -- --discard-my-data connect
$DRBD_RESOURCE ? | mail -s 'DRBD Alert' 
root";<BR>
}</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial> startup
{<BR>
wfc-timeout
30;<BR>
}<BR> disk
{<BR>
fencing
resource-only;<BR>
}<BR> net
{<BR>
cram-hmac-alg
sha1;<BR>
shared-secret
"FooFunFactory";<BR>
}<BR> on (FQDN_OF_PRI_NODE) {<BR>
device /dev/drbd1;<BR>
disk /dev/sda3;<BR>
address 10.1.1.2:7789;<BR>
meta-disk internal;<BR> }</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial> on
(FQDN_OF_SEC_NODE) {<BR>
device /dev/drbd1;<BR>
disk /dev/sda3;<BR>
address 10.1.1.5:7789;<BR>
meta-disk internal;<BR> }</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial>}<BR></FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial><STRONG><U>My HA.CF
config -</U></STRONG></FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>debugfile
/var/log/ha-debug<BR>logfile /var/log/ha-log</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial>logfacility     local0<BR>keepalive 1<BR>deadtime
10</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>warntime
5</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>initdead
120</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>udpport
6694</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>bcast
eth0 #
Linux</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>ucast eth0
(IP_Address_Of_Secondary_Node)</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>auto_failback
on<BR>node (FQDN_OF_PRI_NODE)
(FQDN_OF_SEC_NODE)</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial>ping (ROUTER_IP)<BR>respawn hacluster
/usr/lib/heartbeat/ipfail<BR>use_logd yes<BR></FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial><STRONG><U>My
HARESOURCES config -</U></STRONG> </FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial>(FQDN_OF_PRI) (CLUSTER_IP) drbddisk::r0
Filesystem::/dev/drbd1::/data::ext3 mysqld</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>If anyone has
experienced this sort of behaviour before, please let me know. I cannot
replicate the issue in my test environment.</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>Any help would be
much appreciated.</FONT></SPAN></DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=859365320-25062009><FONT size=2 face=Arial>Kind Regards,</FONT></SPAN></DIV>
<DIV><BR></DIV>
<DIV align=left><FONT size=2
face=Verdana>___________________________________________________</FONT></DIV>
<DIV align=left><FONT size=2 face=Tahoma></FONT> </DIV>
<DIV align=left><FONT size=2 face=Tahoma></FONT> </DIV>
<DIV align=left><FONT size=2 face=Tahoma>Adam Taylor |
Engineer </FONT><FONT size=2 face=Tahoma><FONT color=#ff8000>|
<STRONG>WML Software</STRONG><BR></FONT>Unit 3c | 14-22 Triton
Drive | Albany | Auckland</FONT></DIV>
<DIV align=left><BR><FONT size=2><FONT
face=Tahoma>P. +64 9 477
4555 | F. +64 9 478
6926</FONT></FONT></DIV>
<DIV align=left><FONT size=2><FONT face=Tahoma>DDI. +64
9 477 6375 | MOB. +64 21 621
519 </FONT></FONT></DIV>
<DIV align=left><FONT size=2><FONT
face=Tahoma>E. </FONT></FONT><A
href="mailto:adam.taylor@wml.co.nz"><FONT size=2
face=Tahoma>adam.taylor@wml.co.nz</FONT></A><BR><FONT size=2><FONT
face=Tahoma>W<STRONG>.</STRONG>
</FONT></FONT><A href="http://www.wml.co.nz/"><FONT size=2
face=Tahoma>www.wml.co.nz</FONT></A><FONT size=2 face=Tahoma> |
</FONT><A href="http://www.compose.co.nz/"><FONT size=2
face=Tahoma>www.compose.co.nz</FONT></A><BR><FONT size=2
face=Tahoma>
<BR></FONT><IMG alt="WML Software" src="cid:859365320@25062009-0568" width=376
height=105><BR></DIV>
<DIV> </DIV></BODY></HTML>