[DRBD-user] Slave node can't acquire resources...

Daniel Ruiz Molina daniel.ruiz at caos.uab.es
Thu Oct 18 13:09:57 CEST 2007


Hi!

I have two problems with a pair of HA servers. The first is that the
master node hangs at random (I won't discuss that problem here). The
second is that when the master node hangs, the slave node can't acquire
the resources and the resource script returns RC=20.
The master node hangs in a "special" way: it stays "alive" with all
services "running", but they are not actually working. In other words,
the master node keeps holding all the resources, and those resources
don't work because of a filesystem error or something mysterious...

In spite of this, the slave node detects the master node as dead and
tries to acquire the resources... but it seems the master node doesn't
release them, so the slave can't acquire them.
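
One way to check whether the hung master is still holding the DRBD devices is to look at /proc/drbd on the slave. The following is only a minimal sketch: it assumes the per-device line format with "cs:" (connection state) and "st:" or "ro:" (local/peer role) fields, the sample text is made up for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical sample; on a real node the text would be read from /proc/drbd.
SAMPLE = """\
 0: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
"""

# Device lines look like " 0: cs:<state> st:<local>/<peer> ...".
# Some releases label the role field "ro:" instead of "st:", so accept both.
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+(?:st|ro):([^/\s]+)/(\S+)")


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "local": ..., "peer": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, local, peer = match.groups()
            devices[int(minor)] = {"cs": cs, "local": local, "peer": peer}
    return devices


def peer_still_primary(devices):
    """List the minors whose peer still reports the Primary role."""
    return [minor for minor, dev in sorted(devices.items()) if dev["peer"] == "Primary"]


if __name__ == "__main__":
    status = parse_drbd_status(SAMPLE)
    for minor, dev in sorted(status.items()):
        print(minor, dev["cs"], dev["local"], dev["peer"])
    print("peer still Primary on minors:", peer_still_primary(status))
```

If the peer still shows as Primary on a device, the slave cannot normally be promoted on it, which might explain a failing start; whether that is what RC=20 means depends on the custom CAOS_drbd script.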

After debugging at level 10, here are the debug and ha-log files (summarized...)

DEBUG
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
polled_input_check(): result = 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
polled_input_dispatch() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
}/*polled_input_dispatch*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending 
local status curnode = 80716b0 status: active
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
add_control_msg_fields: input packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
add_control_msg_fields: packet returned
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: got msg in 
process_outbound_packet
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: process_clustermsg: 
node [nodo-mirror]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message allocated: 
0x8256714: refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714: 
refcnt 1
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: Sending pkt to 
/dev/ttyS0 [131 bytes]
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: Message 0x8256714 freed.
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: >>> t=status 
st=active src=nodo-mirror seq=36f1f hg=b1 ts=4714ff40 ld=0.00 0.00 0.00 
2/71 14121 ttl=3 auth=1 4754bfb8 <<<
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
}/*hb_send_local_status*/;
Oct 16 20:13:20 nodo-mirror heartbeat[4542]: debug: serial write 
returned 131
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
polled_input_prepare(): timeout=-1
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
polled_input_check(): result = 0
Oct 16 20:13:20 nodo-mirror heartbeat[4392]: debug: 
polled_input_prepare(): timeout=-1
Oct 16 20:13:22 nodo-mirror heartbeat: debug: 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start 
done. RC=20
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting 
/etc/ha.d/resource.d/firewall  stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug: 
/etc/ha.d/resource.d/firewall  stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting 
/etc/ha.d/resource.d/cups  stop
Oct 16 20:13:22 nodo-mirror heartbeat: debug: /etc/ha.d/resource.d/cups  
stop done. RC=0
Oct 16 20:13:22 nodo-mirror heartbeat: debug: Starting 
/etc/ha.d/resource.d/mysql-ndb  stop
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
polled_input_check(): result = 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
polled_input_dispatch() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
}/*polled_input_dispatch*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: hb_send_local_status() {
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: PID 4392: Sending 
local status curnode = 80716b0 status: active
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
add_control_msg_fields: input packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
add_control_msg_fields: packet returned
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: got msg in 
process_outbound_packet
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: process_clustermsg: 
node [nodo-mirror]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message allocated: 
0x8257d04: refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04: 
refcnt 1
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: Sending pkt to 
/dev/ttyS0 [131 bytes]
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: Message 0x8257d04 freed.
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: >>> t=status 
st=active src=nodo-mirror seq=36f22 hg=b1 ts=4714ff46 ld=0.00 0.00 0.00 
2/71 14439 ttl=3 auth=1 cd66902c <<<
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
}/*hb_send_local_status*/;
Oct 16 20:13:26 nodo-mirror heartbeat[4542]: debug: serial write 
returned 131
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
polled_input_prepare(): timeout=-1
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
polled_input_check(): result = 0
Oct 16 20:13:26 nodo-mirror heartbeat[4392]: debug: 
polled_input_prepare(): timeout=-1
Oct 16 20:13:27 nodo-mirror kernel: eth0: no IPv6 routers present
Oct 16 20:13:28 nodo-mirror heartbeat: debug: 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota 
start done. RC=20
Oct 16 20:13:28 nodo-mirror heartbeat: debug: Starting 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
Oct 16 20:13:28 nodo-mirror heartbeat: debug: 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop 
done. RC=0


HA-LOG
heartbeat: 2007/10/16_20:13:14 info: MSG[4] : [hg=b1]
heartbeat: 2007/10/16_20:13:14 info: MSG[5] : [ts=4714ff3a]
heartbeat: 2007/10/16_20:13:14 info: MSG[6] : [ld=0.00 0.00 0.00 2/66 13936]
heartbeat: 2007/10/16_20:13:14 info: MSG[7] : [ttl=3]
heartbeat: 2007/10/16_20:13:14 info: MSG[8] : [auth=1 d32c4413]
heartbeat: 2007/10/16_20:13:16 WARN: node nodo: is dead
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 3 fields
heartbeat: 2007/10/16_20:13:16 info: MSG[0] : [t=stonith]
heartbeat: 2007/10/16_20:13:16 info: MSG[1] : [node=nodo]
heartbeat: 2007/10/16_20:13:16 info: MSG[2] : [result=n_stnth]
heartbeat: 2007/10/16_20:13:16 info: MSG: Dumping message with 10 fields
heartbeat: 2007/10/16_20:13:16 info: AnnounceTakeover(local 1, foreign 
1, reason 'T_RESOURCES(us)' (1))
heartbeat: 2007/10/16_20:13:16 info: Acquiring resource group: nodo 
CAOS_network::1.1.1.1/20/eth0/1.1.1.255
heartbeat: 2007/10/16_20:13:16 info: Running 
/etc/ha.d/resource.d/CAOS_network 1.1.1.1/20/eth0/1.1.1.255 start
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip -f inet addr add 
1.1.1.1/20 brd 1.1.1.255 dev eth0
heartbeat: 2007/10/16_20:13:16 info: /sbin/ip link set eth0 up
heartbeat: 2007/10/16_20:13:16 /usr/lib/heartbeat/send_arp -i 200 -r 5 
-p /var/lib/heartbeat/rsctmp/send_arp/send_arp-1.1.1.1 eth0 1.1.1.1 auto 
1.1.1.1 ffffffffffff
heartbeat: 2007/10/16_20:13:17 info: Taking over resource group 
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:17 info: Acquiring resource group: nodo 
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql 
mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:17 info: Running 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd0 /opt ext3 r0 rw,usrquota start
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from 
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:22 CRIT: Giving up resources due to failure 
of CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota
heartbeat: 2007/10/16_20:13:22 info: Releasing resource group: nodo 
CAOS_drbd::/dev/drbd0::/opt::ext3::r0::rw,usrquota postfix apache2 mysql 
mysql-ndb-mgm mysql-ndb cups firewall
heartbeat: 2007/10/16_20:13:22 info: Running 
/etc/ha.d/resource.d/firewall  stop
heartbeat: 2007/10/16_20:13:22 info: Running /etc/ha.d/resource.d/cups  stop
heartbeat: 2007/10/16_20:13:22 info: Running 
/etc/ha.d/resource.d/mysql-ndb  stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 20 from 
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure 
of CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo 
CAOS_drbd::/dev/drbd1::/home::ext3::r1::rw,usrquota
heartbeat: 2007/10/16_20:13:28 info: Running 
/etc/ha.d/resource.d/CAOS_drbd /dev/drbd1 /home ext3 r1 rw,usrquota stop
heartbeat: 2007/10/16_20:13:28 WARNING: Filesystem /home not mounted?
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group ntp
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo ntp
heartbeat: 2007/10/16_20:13:28 info: Running /etc/ha.d/resource.d/ntp  start
heartbeat: 2007/10/16_20:13:28 info: Taking over resource group usermin
heartbeat: 2007/10/16_20:13:28 info: Acquiring resource group: nodo 
usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running 
/etc/ha.d/resource.d/webmin  start
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from 
/etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 CRIT: Giving up resources due to failure 
of webmin
heartbeat: 2007/10/16_20:13:28 info: Releasing resource group: nodo 
usermin webmin
heartbeat: 2007/10/16_20:13:28 info: Running 
/etc/ha.d/resource.d/webmin  stop
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from 
/etc/ha.d/resource.d/webmin
heartbeat: 2007/10/16_20:13:28 info: MSG: Dumping message with 2 fields
heartbeat: 2007/10/16_20:13:48 info: MSG[8] : [auth=1 85e94a07]
heartbeat: 2007/10/16_20:13:49 info: Retrying failed stop operation 
[usermin]
heartbeat: 2007/10/16_20:13:49 info: Running 
/etc/ha.d/resource.d/usermin  stop
heartbeat: 2007/10/16_20:13:49 ERROR: Return code 127 from 
/etc/ha.d/resource.d/usermin
heartbeat: 2007/10/16_20:13:49 CRIT: Resource STOP failure. Reboot required!
heartbeat: 2007/10/16_20:13:49 CRIT: Killing heartbeat ungracefully!
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=1: key = 0x80d613c, 
auth=0xa7fd0d5c, authname=crc
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=2: key = 0x80d65cc, 
auth=0xa7e2d32c, authname=sha1
heartbeat: 2007/10/16_20:15:06 info: AUTH: i=3: key = 0x80d6a5c, 
auth=0xa7e28b6c, authname=md5
heartbeat: 2007/10/16_20:15:06 info: **************************
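
To make the failures easier to spot in a long ha-log, the "ERROR: Return code N from SCRIPT" entries can be pulled out and counted. This is only a sketch: the sample lines are copied from the log above, the function name is hypothetical, and the pattern tolerates the line wrapping introduced by the mail client.

```python
import re
from collections import Counter

# Excerpt copied from the ha-log above (wrapped as in the mail).
SAMPLE = """\
heartbeat: 2007/10/16_20:13:22 ERROR: Return code 20 from
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 20 from
/etc/ha.d/resource.d/CAOS_drbd
heartbeat: 2007/10/16_20:13:28 ERROR: Return code 127 from
/etc/ha.d/resource.d/webmin
"""

# \s+ also matches a newline, so a script path wrapped onto the next line is found.
ERROR_RE = re.compile(r"ERROR: Return code (\d+) from\s+(\S+)")


def summarize_failures(log_text):
    """Count (script, return code) pairs reported as errors by heartbeat."""
    counts = Counter()
    for rc, script in ERROR_RE.findall(log_text):
        counts[(script, int(rc))] += 1
    return counts


if __name__ == "__main__":
    for (script, rc), n in sorted(summarize_failures(SAMPLE).items()):
        print(f"{script}: RC={rc} ({n} time(s))")
```

Exit status 127 is what a shell conventionally returns when a command is not found, so the webmin and usermin failures may simply mean those resource scripts are missing on this node. RC=20 comes from the custom CAOS_drbd script, so its meaning has to be looked up in that script.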


Well... phew... sorry for this very long mail, but I NEED HELP!

Thanks!!



