[DRBD-user] drbd + heartbeat resource failover

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Oct 28 20:46:51 CEST 2004

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


/ 2004-10-27 20:23:15 -0400
\ Anand Subramanian:
> I am trying drbd (0.7.5) with Heartbeat (1.2.3) on RHEL3 update 2 
> (2.4.21-ELsmp Linux kernel - comes default with the distro). I have a 
> question regarding resource failover which hopefully makes sense to 
> someone out here.
> 
> When my primary node (node1) goes down, I can see that node2 (hitherto, 
> the secondary node) takes over. But this takeover is not fully automated. 
> W.r.t Heartbeat, the resource seems to be drbddisk (specified in the 
> /etc/ha.d/haresources file on both nodes in identical fashion). 
> 
> When the primary goes down, the secondary node (in case of drbd) needs to 
> do something like :
> 
> a) make this secondary node the primary node
> 
> b) mount  /dev/drbd0  /mnt_point (In my case, I can see that I can mount 
> the drbd meta device "rw" only when the node is made a primary, else it is 
> mounted "ro" which doesn't help my application).
> 
> The script that heartbeat runs to check status (on the secondary, when 
> the primary fails) seems to be /etc/ha.d/rc.d/status. I see this script 
> always return 0, which means that the above actions (a) and (b) are not 
> being taken.

this has nothing to do with drbd or any other specific resource.
heartbeat runs the mach_down script when it sees an "is dead" message
for the peer node.
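
for what it's worth, the takeover you describe in (a) and (b), done by
hand, would look roughly like this (a sketch only; the drbd resource
name "r0" is an assumption, the mount point is the one from your mail):

    # promote this node to primary for the drbd resource (assumed name: r0)
    drbdadm primary r0
    # then mount the device read-write
    mount /dev/drbd0 /mnt_point

listed in haresources, the drbddisk resource script does the first step
for you, and heartbeat's Filesystem resource script can do the second.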

> In short, whatever the ResourceManager (Heartbeat presumably) tries to do 
> by checking for resource status seems to be simply : 
> "/etc/ha.d/rc.d/status status" which returns 0.

and what is wrong with that?  it does much more than just return:
it decides whether or not to take over the foreign resources.

>  So even if the primary has 
> gone down, the secondary does not come up as the primary node in my 
> cluster and the meta device is not mounted properly. Has anyone had this 
> problem, or is it just that old friend  "configuration error" doing things 
> here? 
> 
>  Is this known behavior w.r.t. heartbeat and drbd integration?  Is this 
> status script the right one used by others who have done Heartbeat + drbd 
> integration, or do I need to write my own scripts?

since it "just works" for most of us,
why would _you_ need to hack your own scripts.
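
for reference, the usual setup needs nothing more than a haresources
line along these lines (identical on both nodes; node name, drbd resource
name, mount point, filesystem type and IP are placeholders, not taken
from your mail):

    # /etc/ha.d/haresources
    # node1 is the preferred primary; drbddisk promotes drbd resource "r0",
    # Filesystem mounts /dev/drbd0 read-write, then the service IP comes up
    node1 drbddisk::r0 Filesystem::/dev/drbd0::/mnt_point::ext3 IPaddr::192.168.1.100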

> Should status be checked via the drbddisk script provided in the
> /etc/ha.d directory?

it is already.
and you saw the patch posted in the other thread mentioned, which copes
with heartbeat vs. drbd timeouts, as well as "fixes" the exit codes.

btw, heartbeat [still] just ignores the exit codes of status operations.
it greps for "running" in the output of the status operation of the
resource scripts. if that string is there, the resource is considered
running; if not, it is considered stopped.
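
i.e. the status operation only needs to print the right string. a
minimal sketch of what a hand-rolled resource script can look like
(placeholder names only; this is not the actual drbddisk script):

    #!/bin/sh
    # /etc/ha.d/resource.d/myresource -- minimal heartbeat 1.x resource script sketch
    case "$1" in
      start)
        # bring the resource up here
        ;;
      stop)
        # take the resource down here
        ;;
      status)
        # heartbeat greps this output for the word "running";
        # the exit code of "status" is ignored.
        # "my_resource_is_up" is a placeholder for whatever test applies.
        if my_resource_is_up; then
          echo "running"
        else
          echo "stopped"
        fi
        ;;
    esac
    exit 0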

all in all, this is not really a drbd question.
you do not even describe the actual scenario where you think your
error is, you don't mention any error or log messages,
and you did not post your haresources or any other configuration.

and, because I can only read what you write, not what you thought while
you wrote it, I really do not see what you are asking specifically,
other than "it does not work. do I do something wrong?"

and if it is really that question, I can only answer "probably you do.".

but maybe I just don't get it, so please bear with me.
you may want to try again...


	Lars Ellenberg

-- 
please use the "List-Reply" function of your email client.


