Hi! I have two problems with two HA servers.

The first problem is that the master node hangs at random times (this problem won't be discussed here).

The second problem is that when the master node hangs, the slave node can't acquire the resources and gets RC=20. The master node hangs in a "special" way: it stays "alive" with all services "running", but they are not actually working. In other words, the master node keeps all the resources, and these resources do not work because of a filesystem error or something mysterious. In spite of this, the slave node detects the master node as dead and tries to acquire the resources, but it seems the master node does not release them, so the slave can't acquire them.

After debugging with debug level 10, here are the debug and ha-log files (summarized):

DEBUG

Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: polled_input_check(): result = 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: polled_input_dispatch() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: }/*polled_input_dispatch*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending local status curnode = 80716b0 status: active
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: add_control_msg_fields: input packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: add_control_msg_fields: packet returned
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: got msg in process_outbound_packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: process_clustermsg: node [nodo-mirror]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message allocated: 0x8256714: refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714: refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: Sending pkt to /dev/ttyS0 [131 bytes]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714 freed.
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: >>> t=status st=active src=nodo-mirror seq=36f1f hg=b1 ts=4714ff40 ld=0.00 0.00 0.00 2/71 14121 ttl=3 auth=1 4754bfb8 <<<
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: }/*hb_send_local_status*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: serial write returned 131
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: polled_input_prepare(): timeout=-1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: polled_input_check(): result = 0
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: polled_input_prepare(): timeout=-1
Oct 16 20:13:22 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start done. RC=20
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting /etc/ha.d/resource.d/firewall stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/firewall stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting /etc/ha.d/resource.d/cups stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/cups stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting /etc/ha.d/resource.d/mysql-ndb stop
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: polled_input_check(): result = 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: polled_input_dispatch() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: }/*polled_input_dispatch*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending local status curnode = 80716b0 status: active
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: add_control_msg_fields: input packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: add_control_msg_fields: packet returned
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: got msg in process_outbound_packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: process_clustermsg: node [nodo-mirror]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message allocated: 0x8257d04: refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04: refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: Sending pkt to /dev/ttyS0 [131 bytes]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04 freed.
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: >>> t=status st=active src=nodo-mirror seq=36f22 hg=b1 ts=4714ff46 ld=0.00 0.00 0.00 2/71 14439 ttl=3 auth=1 cd66902c <<<
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: }/*hb_send_local_status*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: serial write returned 131
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: polled_input_prepare(): timeout=-1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: polled_input_check(): result = 0
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: polled_input_prepare(): timeout=-1
Oct 16 20:13:27 nodo-mirror kernel: eth0: no IPv6 routers present
Oct 16 20:13:28 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota start done. RC=20
Oct 16 20:13:28 nodo-mirror heartbeat: debug: Starting /etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
Oct 16 20:13:28 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop done. RC=0

HA-LOG

heartbeat: 2007/10/16_20:13:14 info: MSG[4] : [hg=b1]
heartbeat: 2007/10/16_20:13:14 info: MSG[5] : [ts=4714ff3a]
heartbeat: 2007/10/16_20:13:14 info: MSG[6] : [ld=0.00 0.00 0.00 2/66 13936]
heartbeat: 2007/10/16_20:13:14 info: MSG[7] : [ttl=3]
heartbeat: 2007/10/16_20:13:14 info: MSG[8] : [auth=1 d32c4413]
heartbeat: 2007/10/16_20:13:16 WARN: node nodo: is dead
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 3 fields
heartbeat: 2007/10/16_20:13:16 info: MSG[0] : [t=stonith]
heartbeat: 2007/10/16_20:13:16 info: MSG[1] : [node=nodo]
heartbeat: 2007/10/16_20:13:16 info: MSG[2] : [result=n_stnth]
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 10 fields
heartbeat: 2007/10/16_20:13:16 info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat: 2007/10/16_20:13:16 info: Acquiring resource group: nodo CAOS_network::1.1.1.1/20/eth0/1.1.1.255
heartbeat: 2007/10/16_20:13:16 info: Running /etc/ha.d/resource.d/CAOS_network 1.1.1.1/20/eth0/1.1.1.255 start
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip -f inet addr add 1.1.1.1/20 brd 1.1.1.255 dev eth0
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip link set eth0 up
heartbeat: 2007/10/16_20:13:16 /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-1.1.1.1 eth0 1.1.1.1 auto 1.1.1.1 ffffffffffff
heartbeat: 2007/10/16_20:13:17 info: Taking over resource group CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:17 info: Acquiring resource group: nodo CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:17 info: Running /etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from /etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:22 CRIT: Giving up resources due to failure of CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:22 info: Releasing resource group: nodo CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:22 info: Running /etc/ha.d/resource.d/firewall stop
heartbeat: 2007/10/16_20:13:22 info: Running /etc/ha.d/resource.d/cups stop
heartbeat: 2007/10/16_20:13:22 info: Running /etc/ha.d/resource.d/mysql-ndb stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 20 from /etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure of CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
heartbeat: 2007/10/16_20:13:28 WARNING: Filesystem /home not mounted?
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group ntp
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo ntp
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/ntp start
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group usermin
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/webmin start
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from /etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure of webmin
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/webmin stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from /etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 info: MSG: Dumping message with 2 fields
heartbeat: 2007/10/16_20:13:48 info: MSG[8] : [auth=1 85e94a07]
heartbeat: 2007/10/16_20:13:49 info: Retrying failed stop operation [usermin]
heartbeat: 2007/10/16_20:13:49 info: Running /etc/ha.d/resource.d/usermin stop
heartbeat: 2007/10/16_20:13:49 ERROR: Return code 127 from /etc/ha.d/resource.d/usermin
heartbeat: 2007/10/16_20:13:49 CRIT: Resource STOP failure. Reboot required!
heartbeat: 2007/10/16_20:13:49 CRIT: Killing heartbeat ungracefully!
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=1: key = 0x80d613c, auth=0xa7fd0d5c, authname=crc
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=2: key = 0x80d65cc, auth=0xa7e2d32c, authname=sha1
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=3: key = 0x80d6a5c, auth=0xa7e28b6c, authname=md5
heartbeat: 2007/10/16_20:15:06 info: **************************

Well... sorry about this very long mail, but I really need help! Thanks!
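
For anyone reading a long ha-log like this one, the following is a minimal sketch (not part of the original post, and not a Heartbeat tool) that lists only the resource scripts that returned a non-zero code. The function name `failed_scripts` and the `SAMPLE` text are illustrative assumptions; the regular expression is modelled purely on the "ERROR: Return code N from ..." entries quoted in this mail. As a side note, 127 is the exit status a shell conventionally reports for "command not found", so the webmin and usermin failures may have a different cause than the RC=20 from CAOS_drbd.

```python
import re

# Modelled on ha-log entries quoted in this mail, for example:
#   heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from /etc/ha.d/resource.d/CAOS_drbd
RC_ERROR = re.compile(r"heartbeat: (\S+) ERROR: Return code (\d+) from (\S+)")


def failed_scripts(log_text):
    """Return (timestamp, return_code, script_path) for each failed resource script."""
    return [(ts, int(rc), path) for ts, rc, path in RC_ERROR.findall(log_text)]


# Three entries copied from the ha-log above (illustrative input only).
SAMPLE = """\
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from /etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/webmin start
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from /etc/ha.d/resource.d/webmin
"""

if __name__ == "__main__":
    for ts, rc, path in failed_scripts(SAMPLE):
        print(ts, rc, path)
```

To use it on a real log, read the file contents into a string and pass that to `failed_scripts` instead of `SAMPLE`.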