[DRBD-user] second node always "wants to go standby" after all works fine

Sun Feb 27 13:27:10 CET 2011

Hi, 

the second node of a cluster takes over all vservers after running:

/usr/lib/heartbeat/hb_takeover all

All vservers (except one that is not in use) start correctly, and the user 
PCs can work. But unless the first node, or heartbeat on it, is shut down, 
the first node takes all vservers back again! Why? 

ResourceManager[25181]: 2011/02/27_13:00:42 info: Running /etc/ha.d/resource.d/vserver lennyfax start
heartbeat[24363]: 2011/02/27_13:00:48 info: all HA resource acquisition completed (standby).
heartbeat[3125]: 2011/02/27_13:00:48 info: Standby resource acquisition done [all].
heartbeat[3125]: 2011/02/27_13:00:48 info: remote resource transition completed.
hb_standby[28604]:      2011/02/27_13:01:03 Going standby [foreign].
heartbeat[3125]: 2011/02/27_13:01:03 info: prax2 wants to go standby [foreign]
heartbeat[3125]: 2011/02/27_13:01:14 WARN: No reply to standby request.  Standby request cancelled.
heartbeat[3125]: 2011/02/27_13:01:19 WARN: node prax1: is dead
heartbeat[3125]: 2011/02/27_13:01:19 info: Dead node prax1 gave up resources.
heartbeat[3125]: 2011/02/27_13:01:19 info: Link prax1:eth0 dead.
heartbeat[3125]: 2011/02/27_13:01:19 info: Link prax1:eth1 dead.
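
To see only the standby and takeover traffic in a log like the one above, the relevant lines can be filtered out. This is a minimal sketch: the function name standby_events is hypothetical, and the log path in the usage comment is an assumption (it depends on the logfile setting in /etc/ha.d/ha.cf).

```shell
# Hypothetical helper: read a heartbeat log on stdin and print only
# the lines that mention standby or takeover (case-insensitive).
standby_events() {
    grep -Ei 'standby|takeover'
}

# Usage (log path is an assumption, check "logfile" in /etc/ha.d/ha.cf):
#   standby_events < /var/log/ha-log
```

The process name and PID at the start of each matching line (for example hb_standby[28604]) show which program logged the request.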

I rebooted the first node after   mv /etc/init.d/heartbeat /etc/init.d/heartbeat_
so eth0 and eth1 were dead for a moment. "Standby request cancelled" - WHO 
sent that request? Not heartbeat on the first node; it is still turned off (heartbeat_).

I have never written a script that could be called from crontab or /etc/init.d/xxxxx. I
searched crontab and /etc/init.d with grep for anything that calls ".. standby",
but I found nothing.
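
For anyone who wants to repeat such a search, it could look roughly like the sketch below. The function name find_standby_callers is hypothetical, and the directory list in the usage comment is only an assumption about where such a call could hide.

```shell
# Hypothetical helper: list every file under the given paths that
# mentions hb_standby or hb_takeover. Unreadable files are skipped.
find_standby_callers() {
    grep -rlE 'hb_standby|hb_takeover' "$@" 2>/dev/null
}

# Usage (the path list is an assumption, adjust to the local system):
#   find_standby_callers /etc/init.d /etc/crontab /etc/cron.d \
#       /var/spool/cron/crontabs /etc/ha.d
```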

This is the state now; heartbeat on the first node is still turned off after the reboot (heartbeat_):

prax2:/etc/init.d# cat /proc/drbd | egrep -v '(ns:|resync:|act_log:)'
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757 
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
 4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
 5: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
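
The role column of the output above can be pulled out on its own to compare both nodes quickly. A minimal sketch, assuming the DRBD 8.3 line format shown above (the "ro:" field); the function name drbd_roles is hypothetical.

```shell
# Hypothetical helper: read /proc/drbd style text on stdin and print
# "<minor> <local role>/<peer role>" for every device line.
drbd_roles() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:[^ ]* ro:\([^ ]*\) .*/\1 \2/p'
}

# Usage:
#   drbd_roles < /proc/drbd
```

Run on the state above, this would list device 0 as Secondary/Secondary and devices 1 to 5 as Primary/Secondary.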

After starting heartbeat on the first node again (mv heartbeat_ heartbeat; ./heartbeat start) 
the second node does NOT want to go standby again. It seems that "go standby" 
on the second node is only called once, after the takeover command. 

BTW: the vserver on drbd0 is a 64-bit lenny machine that is not in use. The second node is 32-bit lenny, 
    the first node 64-bit lenny. Unfortunately I mixed 32-bit and 64-bit on both nodes. The other vservers 
    are 32-bit installations.

# drbdadm
Version: 8.3.7 (api:88)   on both machines

thx
Ekkard