On Wed, Dec 9, 2009 at 9:03 PM, James Perry <jperry at mezeo.com> wrote:
> Hi Gianluca,
>
> You have described exactly what I'm doing to test this. I kill the postmaster process. I've tried both a script resource as well as a postgres-8 resource but neither seem to work. What's odd is that I can relocate the service to either node without issue. It's only when I "Break" a node does the recovery fail.
>
> Could this be a CentOS version issue? I'm using CentOS 5.3.
>
> Let me try to clean up all my resources and try again. I'll also be installing a CentOS 5.4 to see if that matters.
>
> Would you by chance have an example cluster.conf file that has Postgres and DRBD?
>
> Thanks!
>

We are probably going off topic; sorry to the list...

I have not managed PostgreSQL in an HA environment yet, but I think I was partially wrong in my previous post.

Important: Always return "0" if the status is non-fatal.

So in both cases, where you get 1 or 3, the status is treated as fatal... and my answer was not correct.

I think you should read
http://sources.redhat.com/cluster/wiki/FAQ/RGManager
and in particular the sections:

  "The rgmanager keeps stopping and restarting mysql/named/ypserv/httpd/other script. Why?"

and possibly

  "Can I have rgmanager behave differently based on the return code of my init script?"

One aim of RHEL 5 was to have all the init scripts LSB compliant (in particular, this means that a stop action against a service that is not running should always return 0).
See https://bugzilla.redhat.com/show_bug.cgi?id=151104
As they provide a custom OCF script for PostgreSQL, they probably didn't correct the standard init script.

Here at home, where I'm sitting now, I don't have CentOS/RHEL, but I have an F11 system and I still get this on it:

[root@tekkaman ~]# service postgresql stop
Stopping postgresql service: [ OK ]
[root@tekkaman ~]# echo $?
0
[root@tekkaman ~]# service postgresql stop
Stopping postgresql service: [FAILED]
[root@tekkaman ~]# echo $?
1

When you manually relocate, you start from a situation where the service is running, so the stop action succeeds, and the same is true for the start action on the other node.

When you kill postmaster, the status action fails at the first check, and rgmanager is designed to then:
1) stop the service (I think to clean things up in case of improper termination of the service, to release locks, and more importantly to protect from data corruption)
2) start the service (and then its dependencies, if successful) on the other node

As the stop fails too, rgmanager gives up (you probably get your service in a FAILED state, I presume). It reasons this way: if I'm not able to cleanly stop the service, it is probably better not to try to start it on the other node... Data protection is the first priority, and I agree with this!

So your choice is probably between:
1) manually modifying the standard postgresql init script
2) using the provided OCF resource postgres-8

I warmly suggest 2)... did you try it?

HIH,
Gianluca
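
For option 1), the change to the init script could look roughly like the sketch below. It is only an illustration, not the actual Fedora/RHEL script: the function name lsb_wrap is made up, and it assumes that the "status" action of the init script reliably returns non-zero when PostgreSQL is not running. The idea is simply that a stop against an already stopped service returns 0, so rgmanager can proceed with recovery on the other node.

```sh
#!/bin/sh
# Hypothetical helper (not part of any distribution script).
# Usage: lsb_wrap <init-script> <action>
# Assumption: "<init-script> status" exits non-zero when the service is down.
lsb_wrap() {
    init="$1"
    action="$2"
    case "$action" in
        stop)
            # LSB behaviour: stopping a service that is not running is a success.
            if ! "$init" status >/dev/null 2>&1; then
                return 0
            fi
            # Service is running: do a real stop and pass its exit code through.
            "$init" stop
            ;;
        *)
            # start, status, restart, ...: pass through unchanged.
            "$init" "$action"
            ;;
    esac
}
```

Something like lsb_wrap /etc/init.d/postgresql stop could then be called from a small wrapper script used as the script resource, instead of pointing the resource at the stock init script directly. I have not tested this on a cluster.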