Thanks for your reply. I did try the built-in ocf script but ran into some problems early on. Now that I understand some of this a little better, let me go back and try it again. I just wanted to make sure that it's not something I did wrong with DRBD.

Thanks for helping me understand. I'll go find a RHCS forum somewhere to get RGMANAGER questions answered! :)

Have a great day!

On Dec 9, 2009, at 4:11 PM, Gianluca Cecchi wrote:

> On Wed, Dec 9, 2009 at 9:03 PM, James Perry <jperry at mezeo.com> wrote:
>> Hi Gianluca,
>>
>> You have described exactly what I'm doing to test this. I kill the postmaster process. I've tried both a script resource as well as a postgres-8 resource but neither seems to work. What's odd is that I can relocate the service to either node without issue. It's only when I "Break" a node that the recovery fails.
>>
>> Could this be a CentOS version issue? I'm using CentOS 5.3.
>>
>> Let me try to clean up all my resources and try again. I'll also be installing a CentOS 5.4 to see if that matters.
>>
>> Would you by chance have an example cluster.conf file that has Postgres and DRBD?
>>
>> Thanks!
>>
>
> probably we are going off topic; sorry to the list...
> I have not managed PostgreSQL in a HA environment yet.
> But I think I was partially wrong in my previous post.
>
> Important: Always return "0" if the status is non-fatal.
>
> So in both cases where you get 1 or 3, they are fatal.... and my
> answer was not correct
>
> I think you should read
> http://sources.redhat.com/cluster/wiki/FAQ/RGManager
>
> and in particular the sections regarding:
> The rgmanager keeps stopping and restarting
> mysql/named/ypserv/httpd/other script. Why?
> and eventually
> Can I have rgmanager behave differently based on the return code of my
> init script?
>
> One aim of RHEL 5 was to have all the init scripts LSB compliant (in
> particular this means that a stop action against a not running service
> should always return 0)
> See https://bugzilla.redhat.com/show_bug.cgi?id=151104
>
> As they provide a custom ocf script for PostgreSQL, probably they
> didn't correct the standard init script
> Here @home, where I'm sitting now, I don't have CentOS/RHEL; but I
> have an F11 system and still I get on it:
> [root@tekkaman ~]# service postgresql stop
> Stopping postgresql service: [ OK ]
> [root@tekkaman ~]# echo $?
> 0
> [root@tekkaman ~]# service postgresql stop
> Stopping postgresql service: [FAILED]
> [root@tekkaman ~]# echo $?
> 1
>
> When you manually relocate, you start in a situation where you have
> the service running, so that the stop action succeeds and the same is
> true for the start action on the other node.
> When you kill postmaster, at the first check the status action fails,
> and rgmanager is designed to
> 1) stop the service (I think to eventually clean things in case of
> improper termination of service, to release locks, and more
> importantly to protect from data corruption)
> 2) start the service (and then its dependencies, if successful) on
> the other node
>
> As the stop fails too, rgmanager gives up (probably you get your
> service in a FAILED state, I presume).
> It thinks this way: if I'm not able to cleanly stop the service,
> probably it is better not to try to start it on the other node.....
> DATA protection is first priority...
> and I agree with this!
>
> So probably your choice is between:
> 1) manually modify postgresql standard init script
> 2) use the provided ocf resource postgres-8
>
> I warmly suggest 2).... did you try it?
> HIH,
> Gianluca

James Perry
Principal Consultant
Mezeo Software
t: 713.244.0859
f: 713.244.0851
m: 713.444.0251
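
For option 1) in the quoted reply, here is a minimal sketch of what an LSB-friendly stop could look like, written as a wrapper around the init script rather than a patch to it. The function names lsb_safe_stop and lsb_wrapper_main and the INIT_SCRIPT default path are assumptions for illustration only; they are not part of rgmanager or of the PostgreSQL package. The sketch relies on the LSB status codes 1, 2 and 3 all meaning "not running", and it assumes the init script's status action follows that convention.

```shell
#!/bin/sh
# Sketch only: make "stop" succeed when the service is already down,
# which is what LSB expects and what the stock init script in the
# transcript above does not do.

# lsb_safe_stop INIT_SCRIPT
# (hypothetical helper name; INIT_SCRIPT is any command that accepts
# "status" and "stop" as its first argument)
lsb_safe_stop() {
    init_script=$1

    # Ask for the current state first; capture the code without
    # tripping "set -e" if the caller has it enabled.
    rc=0
    "$init_script" status >/dev/null 2>&1 || rc=$?

    case "$rc" in
        1|2|3)
            # LSB status: 1 = dead with stale pid file, 2 = dead with
            # stale lock file, 3 = not running. Nothing to stop, so
            # report success.
            return 0
            ;;
    esac

    # Running (0) or unknown state: do a real stop and pass the init
    # script's own exit code through unchanged.
    "$init_script" stop
}

# lsb_wrapper_main ACTION
# Dispatcher for use as a script resource. The default path below is
# an assumption; adjust it for your system.
lsb_wrapper_main() {
    init_script=${INIT_SCRIPT:-/etc/init.d/postgresql}
    case "$1" in
        stop) lsb_safe_stop "$init_script" ;;
        *)    "$init_script" "$@" ;;
    esac
}
```

To try it as a script resource, one could save this to a file, add a last line reading lsb_wrapper_main "$@", and point the resource at that file instead of the stock init script. A real failure of the stop action while the service is running is still passed through as non-zero, so rgmanager would still refuse to start the service on the other node in that case. Option 2) remains the route recommended in the quoted reply.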