Hi!
I have two problems with two HA servers. The first problem is that the master
node hangs at random (this problem won't be discussed here).
The second problem is that when the master node hangs, the slave node can't
acquire the resources and returns RC=20.
The master node hangs in a "special" way: it stays "alive" with all
services "running", but they are not working. In other words, the master
node keeps all the resources, and these resources are not working
because of a filesystem error or something mysterious...
In spite of this, the slave node detects the master node as dead and
tries to acquire the resources... but it seems the master node doesn't want
to release them, so the slave can't acquire them...
After debugging with level 10, here are the debug and ha-log files (summarized):
DEBUG
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
polled_input_check(): result = 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
polled_input_dispatch() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
}/*polled_input_dispatch*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending
local status curnode = 80716b0 status: active
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
add_control_msg_fields: input packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
add_control_msg_fields: packet returned
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: got msg in
process_outbound_packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: process_clustermsg:
node [nodo-mirror]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message allocated:
0x8256714: refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714:
refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: Sending pkt to
/dev/ttyS0 [131 bytes]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714 freed.
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: >>> t=status
st=active src=nodo-mirror seq=36f1f hg=b1 ts=4714ff40 ld=0.00 0.00 0.00
2/71 14121 ttl=3 auth=1 4754bfb8 <<<
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
}/*hb_send_local_status*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: serial write
returned 131
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
polled_input_prepare(): timeout=-1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
polled_input_check(): result = 0
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug:
polled_input_prepare(): timeout=-1
Oct 16 20:13:22 nodo-mirror heartbeat: debug:
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start
done. RC=20
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting
/etc/ha.d/resource.d/firewall stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug:
/etc/ha.d/resource.d/firewall stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting
/etc/ha.d/resource.d/cups stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/cups
stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting
/etc/ha.d/resource.d/mysql-ndb stop
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
polled_input_check(): result = 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
polled_input_dispatch() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
}/*polled_input_dispatch*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending
local status curnode = 80716b0 status: active
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
add_control_msg_fields: input packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
add_control_msg_fields: packet returned
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: got msg in
process_outbound_packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: process_clustermsg:
node [nodo-mirror]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message allocated:
0x8257d04: refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04:
refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: Sending pkt to
/dev/ttyS0 [131 bytes]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04 freed.
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: >>> t=status
st=active src=nodo-mirror seq=36f22 hg=b1 ts=4714ff46 ld=0.00 0.00 0.00
2/71 14439 ttl=3 auth=1 cd66902c <<<
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
}/*hb_send_local_status*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: serial write
returned 131
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
polled_input_prepare(): timeout=-1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
polled_input_check(): result = 0
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug:
polled_input_prepare(): timeout=-1
Oct 16 20:13:27 nodo-mirror kernel: eth0: no IPv6 routers present
Oct 16 20:13:28 nodo-mirror heartbeat: debug:
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota
start done. RC=20
Oct 16 20:13:28 nodo-mirror heartbeat: debug: Starting
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
Oct 16 20:13:28 nodo-mirror heartbeat: debug:
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
done. RC=0
HA-LOG
heartbeat: 2007/10/16_20:13:14 info: MSG[4] : [hg=b1]
heartbeat: 2007/10/16_20:13:14 info: MSG[5] : [ts=4714ff3a]
heartbeat: 2007/10/16_20:13:14 info: MSG[6] : [ld=0.00 0.00 0.00 2/66 13936]
heartbeat: 2007/10/16_20:13:14 info: MSG[7] : [ttl=3]
heartbeat: 2007/10/16_20:13:14 info: MSG[8] : [auth=1 d32c4413]
heartbeat: 2007/10/16_20:13:16 WARN: node nodo: is dead
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 3 fields
heartbeat: 2007/10/16_20:13:16 info: MSG[0] : [t=stonith]
heartbeat: 2007/10/16_20:13:16 info: MSG[1] : [node=nodo]
heartbeat: 2007/10/16_20:13:16 info: MSG[2] : [result=n_stnth]
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 10 fields
heartbeat: 2007/10/16_20:13:16 info: AnnounceTakeover(local 1, foreign
1, reason 'T_RESOURCES(us)' (1))
heartbeat: 2007/10/16_20:13:16 info: Acquiring resource group: nodo
CAOS_network::1.1.1.1/20/eth0/1.1.1.255
heartbeat: 2007/10/16_20:13:16 info: Running
/etc/ha.d/resource.d/CAOS_network 1.1.1.1/20/eth0/1.1.1.255 start
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip -f inet addr add
1.1.1.1/20 brd 1.1.1.255 dev eth0
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip link set eth0 up
heartbeat: 2007/10/16_20:13:16 /usr/lib/heartbeat/send_arp -i 200 -r 5
-p /var/lib/heartbeat/rsctmp/send_arp/send_arp-1.1.1.1 eth0 1.1.1.1 auto
1.1.1.1 ffffffffffff
heartbeat: 2007/10/16_20:13:17 info: Taking over resource group
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:17 info: Acquiring resource group: nodo
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql
mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:17 info: Running
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:22 CRIT: Giving up resources due to failure
of CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:22 info: Releasing resource group: nodo
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql
mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:22 info: Running
/etc/ha.d/resource.d/firewall stop
heartbeat: 2007/10/16_20:13:22 info: Running /etc/ha.d/resource.d/cups stop
heartbeat: 2007/10/16_20:13:22 info: Running
/etc/ha.d/resource.d/mysql-ndb stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 20 from
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure
of CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo
CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Running
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
heartbeat: 2007/10/16_20:13:28 WARNING: Filesystem /home not mounted?
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group ntp
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo ntp
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/ntp start
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group usermin
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo
usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running
/etc/ha.d/resource.d/webmin start
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from
/etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure
of webmin
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo
usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running
/etc/ha.d/resource.d/webmin stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from
/etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 info: MSG: Dumping message with 2 fields
heartbeat: 2007/10/16_20:13:48 info: MSG[8] : [auth=1 85e94a07]
heartbeat: 2007/10/16_20:13:49 info: Retrying failed stop operation
[usermin]
heartbeat: 2007/10/16_20:13:49 info: Running
/etc/ha.d/resource.d/usermin stop
heartbeat: 2007/10/16_20:13:49 ERROR: Return code 127 from
/etc/ha.d/resource.d/usermin
heartbeat: 2007/10/16_20:13:49 CRIT: Resource STOP failure. Reboot required!
heartbeat: 2007/10/16_20:13:49 CRIT: Killing heartbeat ungracefully!
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=1: key = 0x80d613c,
auth=0xa7fd0d5c, authname=crc
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=2: key = 0x80d65cc,
auth=0xa7e2d32c, authname=sha1
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=3: key = 0x80d6a5c,
auth=0xa7e28b6c, authname=md5
heartbeat: 2007/10/16_20:15:06 info: **************************
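To make the failures in the ha-log above easier to spot, here is a small Python sketch that pulls out the "Return code N from <script>" entries. It is only an illustration: the function name `failed_scripts` and the sample text are my own and are not part of heartbeat. The regex assumes the message format shown in the log above and tolerates the line wrapping added by the mail client. The only interpretation it adds is the usual shell convention that exit status 127 means "command not found"; what RC=20 means depends on the CAOS_drbd script itself.

```python
import re
from collections import Counter

# Matches heartbeat's "ERROR: Return code N from /path/to/script" entries.
# \s+ tolerates the line break inserted between "from" and the script path.
RC_PATTERN = re.compile(r"ERROR: Return code (\d+) from\s+(\S+)")


def failed_scripts(log_text):
    """Count (script, return code) pairs reported as errors in an ha-log excerpt."""
    return Counter((script, int(rc)) for rc, script in RC_PATTERN.findall(log_text))


# Two entries copied from the ha-log excerpt above.
sample = """\
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from
/etc/ha.d/resource.d/webmin
"""

for (script, rc), count in sorted(failed_scripts(sample).items()):
    # 127 is the conventional shell exit status for "command not found".
    note = " (shell convention: command not found)" if rc == 127 else ""
    print(f"{script}: RC={rc} x{count}{note}")
```

Run against the full ha-log, this should list CAOS_drbd (RC=20) together with webmin and usermin (RC=127), which are the scripts that make heartbeat give up the resources.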
Well... sorry for this very long mail, but I really need help!
Thanks!