Hi All,

I have a working 2-node RHCS + DRBD (ext3 filesystem replication) instance up and running. I can relocate the service just fine, but when I fail the service (kill a process or power off the VM), the service fails to relocate to the other node.

Please help! Does ANYONE have a working example of a Postgres HA cluster?

CentOS53
drbd83

Here's the error I'm getting. It appears that DRBD is failing, but I can't tell why.

Dec 7 14:34:49 rhcsnode1 clurgmgrd[8024]: <notice> Service service:mezeo_ha_db started
Dec 7 14:36:36 rhcsnode1 clurgmgrd: [8024]: <err> script:pgsql_svc: status of /etc/rc.d/init.d/postgresql failed (returned 1)
Dec 7 14:36:36 rhcsnode1 clurgmgrd[8024]: <notice> status on script "pgsql_svc" returned 1 (generic error)
Dec 7 14:36:36 rhcsnode1 clurgmgrd[8024]: <notice> Stopping service service:mezeo_ha_db
Dec 7 14:36:37 rhcsnode1 clurgmgrd: [8024]: <err> script:pgsql_svc: stop of /etc/rc.d/init.d/postgresql failed (returned 1)
Dec 7 14:36:37 rhcsnode1 clurgmgrd[8024]: <notice> stop on script "pgsql_svc" returned 1 (generic error)
Dec 7 14:36:37 rhcsnode1 avahi-daemon[2800]: Withdrawing address record for 10.10.10.150 on eth0.
Dec 7 14:36:47 rhcsnode1 kernel: block drbd0: role( Primary -> Secondary )
Dec 7 14:36:47 rhcsnode1 clurgmgrd[8024]: <crit> #12: RG service:mezeo_ha_db failed to stop; intervention required
Dec 7 14:36:47 rhcsnode1 clurgmgrd[8024]: <notice> Service service:mezeo_ha_db is failed

Here's my config:

<?xml version="1.0"?>
<cluster alias="pgsql_cluster" config_version="70" name="pgsql_cluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="rhcsnode1.localdomain" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="rhcsnode2.localdomain" nodeid="2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="fo_domain" nofailback="0" ordered="0" restricted="0">
        <failoverdomainnode name="rhcsnode1.localdomain" priority="1"/>
        <failoverdomainnode name="rhcsnode2.localdomain" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.10.10.150" monitor_link="1"/>
      <postgres-8 config_file="/var/lib/pgsql/data/postgresql.conf" name="pgsql_db" postmaster_user="postgres" shutdown_wait="10"/>
      <fs device="/dev/drbd/by-res/drbd_disk" fstype="ext3" mountpoint="/var/lib/pgsql/data" name="fs_pgsql" options="noatime"/>
      <script file="/etc/rc.d/init.d/postgresql" name="pgsql_svc"/>
      <drbd name="res_drbd" resource="drbd_disk"/>
    </resources>
    <service autostart="1" domain="fo_domain" exclusive="0" name="mezeo_ha_db" recovery="relocate">
      <drbd ref="res_drbd">
        <fs ref="fs_pgsql"/>
        <ip ref="10.10.10.150"/>
        <script ref="pgsql_svc"/>
      </drbd>
    </service>
  </rm>
</cluster>

James Perry
Principal Consultant
Mezeo Software
t: 713.244.0859
f: 713.244.0851
m: 713.444.0251
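
For anyone reading the log above, it may help to filter the entries by severity to see which steps were actually reported as failing. The following is only a minimal sketch in Python, assuming syslog lines in the same format as the excerpt pasted in this message. The function name `severe_events` and the regular expression are illustrative assumptions, not part of RHCS or DRBD.

```python
import re

# Log excerpt copied from the message above.
LOG = """\
Dec 7 14:34:49 rhcsnode1 clurgmgrd[8024]: <notice> Service service:mezeo_ha_db started
Dec 7 14:36:36 rhcsnode1 clurgmgrd: [8024]: <err> script:pgsql_svc: status of /etc/rc.d/init.d/postgresql failed (returned 1)
Dec 7 14:36:36 rhcsnode1 clurgmgrd[8024]: <notice> status on script "pgsql_svc" returned 1 (generic error)
Dec 7 14:36:36 rhcsnode1 clurgmgrd[8024]: <notice> Stopping service service:mezeo_ha_db
Dec 7 14:36:37 rhcsnode1 clurgmgrd: [8024]: <err> script:pgsql_svc: stop of /etc/rc.d/init.d/postgresql failed (returned 1)
Dec 7 14:36:37 rhcsnode1 clurgmgrd[8024]: <notice> stop on script "pgsql_svc" returned 1 (generic error)
Dec 7 14:36:37 rhcsnode1 avahi-daemon[2800]: Withdrawing address record for 10.10.10.150 on eth0.
Dec 7 14:36:47 rhcsnode1 kernel: block drbd0: role( Primary -> Secondary )
Dec 7 14:36:47 rhcsnode1 clurgmgrd[8024]: <crit> #12: RG service:mezeo_ha_db failed to stop; intervention required
Dec 7 14:36:47 rhcsnode1 clurgmgrd[8024]: <notice> Service service:mezeo_ha_db is failed
"""

# Hypothetical pattern: timestamp, host, process prefix, "<level>", message.
# Lines without a "<level>" tag (kernel, avahi-daemon) do not match.
ENTRY = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*?)<(\w+)> (.*)$")


def severe_events(log, levels=("err", "crit")):
    """Return (timestamp, level, message) for entries at the given levels."""
    events = []
    for line in log.splitlines():
        match = ENTRY.match(line)
        if match and match.group(4) in levels:
            events.append((match.group(1), match.group(4), match.group(5)))
    return events


if __name__ == "__main__":
    for timestamp, level, message in severe_events(LOG):
        print(f"{timestamp} [{level}] {message}")
```

Run against this excerpt, the filter keeps the two `<err>` entries for the `pgsql_svc` script (the status check and the stop) and the `<crit>` entry saying the service failed to stop. The kernel line for drbd0 carries no severity tag, so it is not included.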