Hi,

the second node of a cluster takes over all vservers after running:

  /usr/lib/heartbeat/hb_takeover all

All vservers (except one that is not in use) start correctly, and user PCs can work. But if neither the first node nor its heartbeat is shut down, the first node takes over all vservers again! Why?

  ResourceManager[25181]: 2011/02/27_13:00:42 info: Running /etc/ha.d/resource.d/vserver lennyfax start
  heartbeat[24363]: 2011/02/27_13:00:48 info: all HA resource acquisition completed (standby).
  heartbeat[3125]: 2011/02/27_13:00:48 info: Standby resource acquisition done [all].
  heartbeat[3125]: 2011/02/27_13:00:48 info: remote resource transition completed.
  hb_standby[28604]: 2011/02/27_13:01:03 Going standby [foreign].
  heartbeat[3125]: 2011/02/27_13:01:03 info: prax2 wants to go standby [foreign]
  heartbeat[3125]: 2011/02/27_13:01:14 WARN: No reply to standby request. Standby request cancelled.
  heartbeat[3125]: 2011/02/27_13:01:19 WARN: node prax1: is dead
  heartbeat[3125]: 2011/02/27_13:01:19 info: Dead node prax1 gave up resources.
  heartbeat[3125]: 2011/02/27_13:01:19 info: Link prax1:eth0 dead.
  heartbeat[3125]: 2011/02/27_13:01:19 info: Link prax1:eth1 dead.

I rebooted the first node after

  mv /etc/init.d/heartbeat /etc/init.d/heartbeat_

so eth0 and eth1 were dead for a moment.

"Standby request cancelled": WHO sent that request? Not heartbeat on the first node; it is still turned off (heartbeat_). I have never written a script that might be called from crontab or /etc/init.d/xxxxx. I searched crontab and /etc/init.d with grep for anything that calls ".. standby", but found nothing.
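For reference, the search described above can be sketched as a small shell function. This is only a rough sketch: the function name find_standby_callers and the example paths in the comment are assumptions, not something taken from the heartbeat packages.

```shell
# Hypothetical helper: list files under the given paths that mention "standby".
find_standby_callers() {
    # -r  recurse into directories
    # -I  skip binary files
    # -l  print only the names of matching files
    grep -rIl -e 'standby' "$@" 2>/dev/null
}

# Example call (paths are assumptions, adjust for the local system):
#   find_standby_callers /etc/crontab /etc/cron.d /etc/init.d
```

The pattern also matches hb_standby, the program name that appears in the log above, so a script calling it directly should show up as well.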
This is the state now; heartbeat on the first node is turned off after the reboot (heartbeat_):

  prax2:/etc/init.d# cat /proc/drbd | egrep -v '(ns:|resync:|act_log:)'
  version: 8.3.7 (api:88/proto:86-91)
  srcversion: EE47D8BF18AC166BE219757
   0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
   1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
   2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
   3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
   4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
   5: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

After starting heartbeat on the first node again (mv heartbeat_ heartbeat; ./heartbeat start), the second node does NOT go standby by itself. It seems that "go standby" on the second node is only called once, after the takeover command.

BTW: the vserver on drbd0 is a 64-bit lenny machine with no function. The second node is 32-bit lenny, the first node 64-bit lenny. Unfortunately I mixed 32-bit and 64-bit on both nodes. The other vservers are 32-bit installations.

  # drbdadm
  Version: 8.3.7 (api:88)

on both machines.

thx
Ekkard