Hi Gianluca,

You have described exactly what I'm doing to test this. I kill the postmaster process. I've tried both a script resource and a postgres-8 resource, but neither seems to work.

What's odd is that I can relocate the service to either node without issue. It's only when I "break" a node that the recovery fails.

Could this be a CentOS version issue? I'm using CentOS 5.3.

Let me try to clean up all my resources and try again. I'll also be installing CentOS 5.4 to see if that matters.

Would you by chance have an example cluster.conf file that has Postgres and DRBD?

Thanks!

On Dec 9, 2009, at 4:46 AM, Gianluca Cecchi wrote:

> On Wed, Dec 9, 2009 at 2:02 AM, James Perry <jperry at mezeo.com> wrote:
> [snip]
>> Here's the error I'm getting... It appears that DRBD is failing but I can't tell why.
>>
>> Dec 7 14:34:49 rhcsnode1 clurgmgrd: <notice> Service service:mezeo_ha_db started
>> Dec 7 14:36:36 rhcsnode1 clurgmgrd: : <err> script:pgsql_svc: status of /etc/rc.d/init.d/postgresql failed (returned 1)
>> Dec 7 14:36:36 rhcsnode1 clurgmgrd: <notice> status on script "pgsql_svc" returned 1 (generic error)
>> Dec 7 14:36:36 rhcsnode1 clurgmgrd: <notice> Stopping service service:mezeo_ha_db
>> Dec 7 14:36:37 rhcsnode1 clurgmgrd: : <err> script:pgsql_svc: stop of /etc/rc.d/init.d/postgresql failed (returned 1)
>> Dec 7 14:36:37 rhcsnode1 clurgmgrd: <notice> stop on script "pgsql_svc" returned 1 (generic error)
>
> From what you posted, one can only deduce that your
> /etc/rc.d/init.d/postgresql script is perhaps not conforming with what
> is expected.
> In fact clurgmgrd is not able to evaluate the result of postgresql status:
> script:pgsql_svc: status of /etc/rc.d/init.d/postgresql failed (returned 1)
>
> Does this depend on you killing postmaster process or other similar? I
> don't think so...
> On a test server with CentOS 5.4 and a clean postgresql-server
> installed, even if I do a kill -9 of the postmaster pid, so that I
> have the file /var/run/postmaster.5432.pid without the process itself,
> a
> service postgresql status gives
> [root@c54vm1 ~]# service postgresql status
> postmaster is stopped
> [root@c54vm1 ~]# echo $?
> 3
>
> (see also /etc/rc.d/init.d/functions)
>
> This should be returned to rhcs when a service is not running, AFAIK.
>
> So, coming back to your system, clurgmgrd decides to stop the service,
> because it is not able to evaluate it (again giving an error ...):
> script:pgsql_svc: stop of /etc/rc.d/init.d/postgresql failed (returned 1)
>
> Note also these:
> The following rules apply to parent/child relationships in a resource tree:
> • Parents are started before children.
> • Children must all stop cleanly before a parent may be stopped.
> • For a resource to be considered in good health, all its children
> must be in good health.
>
> HIH,
> Gianluca
>
> PS: you have the default resource provided by rhcs for postgresql in
> resource section, but you are using standard postgresql init script in
> service section as an external script... any reason?

James Perry
Principal Consultant
Mezeo Software
t: 713.244.0859
f: 713.244.0851
m: 713.444.0251
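
For anyone reproducing the check discussed above by hand: a minimal sketch of a helper that names the exit code of an init script's status action, assuming the script follows the LSB init-script conventions (0 = running, 3 = not running, and so on). The function name describe_status is hypothetical and is not part of rhcs or the postgresql package; the init script path is the one from the log lines in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of rhcs or postgresql): translate the exit
# code of an init script's "status" action into its LSB meaning.
describe_status() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "reserved or non-standard code" ;;
    esac
}

# Usage, run as root on the cluster node (path taken from the logs above):
#   /etc/rc.d/init.d/postgresql status; describe_status $?
#   /etc/rc.d/init.d/postgresql stop;   echo "stop returned $?"
```

Running both the status and the stop action manually after killing the postmaster should show whether the script itself returns 1, as in the logs, or 3, as on the CentOS 5.4 test server in the quoted reply.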