[DRBD-user] DRBD + RHCS - Failover not working

James Perry jperry at mezeo.com
Wed Dec 9 23:56:46 CET 2009

Thanks for your reply.  I did try the built-in OCF script but ran into some problems early on.  Now that I understand some of this a little better, I'll go back and try it again.
I just wanted to make sure that it's not something I did wrong with DRBD.

Thanks for helping me understand.  I'll go find a RHCS forum somewhere to get RGMANAGER questions answered! :)

Have a great day!

On Dec 9, 2009, at 4:11 PM, Gianluca Cecchi wrote:

> On Wed, Dec 9, 2009 at 9:03 PM, James Perry <jperry at mezeo.com> wrote:
>> Hi Gianluca,
>> You have described exactly what I'm doing to test this.  I kill the postmaster process.  I've tried both a script resource and a postgres-8 resource, but neither seems to work.  What's odd is that I can relocate the service to either node without issue.  Recovery fails only when I "break" a node.
>> Could this be a CentOS version issue?  I'm using CentOS 5.3.
>> Let me try to clean up all my resources and try again.  I'll also be installing a CentOS 5.4 to see if that matters.
>> Would you by chance have an example cluster.conf file that has Postgres and DRBD?
>> Thanks!
> We are probably going off topic; sorry to the list...
> I have not managed PostgreSQL in an HA environment yet.
> But I think I was partially wrong in my previous post.
> The rule is: always return "0" if the status is non-fatal.
> So in both cases where you get 1 or 3, they are treated as fatal,
> and my answer was not correct.
> I think you should read
> http://sources.redhat.com/cluster/wiki/FAQ/RGManager
> and in particular the sections regarding:
> The rgmanager keeps stopping and restarting
> mysql/named/ypserv/httpd/other script. Why?
> and eventually
> Can I have rgmanager behave differently based on the return code of my
> init script?
> One aim of RHEL 5 was to have all the init scripts LSB compliant (in
> particular this means that a stop action against a service that is
> not running should always return 0)
> See https://bugzilla.redhat.com/show_bug.cgi?id=151104
> As they provide a custom OCF script for PostgreSQL, they probably
> didn't correct the standard init script
> Here @home, where I'm sitting now, I don't have CentOS/RHEL; but I
> have an F11 system and still I get on it:
> [root@tekkaman ~]# service postgresql stop
> Stopping postgresql service:                               [  OK  ]
> [root@tekkaman ~]# echo $?
> 0
> [root@tekkaman ~]# service postgresql stop
> Stopping postgresql service:                               [FAILED]
> [root@tekkaman ~]# echo $?
> 1
> When you manually relocate, you start in a situation where you have
> the service running, so that the stop action succeeds and the same is
> true for the start action on the other node.
> When you kill postmaster, the first status check fails, and
> rgmanager is designed to then:
> 1) stop the service (I think to clean things up in case of
> improper termination of the service, to release locks, and more
> importantly to protect from data corruption)
> 2) start the service (and then its dependencies, if that succeeds)
> on the other node
> As the stop fails too, rgmanager gives up (probably you get your
> service in a FAILED state, I presume).
> It thinks this way: if I'm not able to cleanly stop the service,
> probably it is better not to try to start it on the other node.....
> DATA protection is first priority...
> and I agree with this!
> So probably your choice is between:
> 1) manually modify postgresql standard init script
> 2) use the provided ocf resource postgres-8
> I warmly suggest 2).... did you try it?
> HIH,
> Gianluca
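
The behaviour Gianluca describes can be sketched in a few lines of shell. This is only a simplified model under stated assumptions, not rgmanager's actual code and not the real postgresql init script: the "service" is a hypothetical state file, and the names fake_start, strict_stop, lsb_stop and recover are invented for illustration. It shows why a stop action that returns non-zero for an already-dead service leaves the recovery stuck, while an LSB-style stop lets it proceed.

```shell
#!/bin/sh
# Hypothetical stand-in for a service: it is "running" while this file exists.
STATE_FILE="${STATE_FILE:-/tmp/fake_service.$$}"

fake_start()  { : > "$STATE_FILE"; }
fake_status() { [ -f "$STATE_FILE" ]; }

# Behaves like the init script in the transcript above:
# stopping a service that is not running is reported as a failure.
strict_stop() {
    fake_status || return 1
    rm -f "$STATE_FILE"
}

# LSB-style stop: stopping a service that is not running is a success.
lsb_stop() {
    fake_status || return 0
    rm -f "$STATE_FILE"
}

# Simplified model of the recovery described above (an assumption, not
# rgmanager source): after a failed status check, stop the service locally
# and only relocate it if that stop succeeded.
recover() {
    stop_cmd=$1
    if "$stop_cmd"; then
        echo "relocate"
    else
        echo "failed"
    fi
}

# Simulate killing postmaster: the service disappears without a clean stop.
fake_start
rm -f "$STATE_FILE"

echo "strict stop: $(recover strict_stop)"   # prints "strict stop: failed"
echo "LSB stop:    $(recover lsb_stop)"      # prints "LSB stop:    relocate"
```

In this model a manual relocation works with either stop function, because the service is still running when the stop is issued. That matches what I saw: relocation was fine and only the "break" test failed.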

James Perry
Principal Consultant
Mezeo Software
t: 713.244.0859
f: 713.244.0851
m: 713.444.0251
