[DRBD-user] RHCS + DRBD - Service starting but filesystem not mounting

Thu Dec 3 21:27:59 CET 2009

Hello,

I'm quite new to RHCS and DRBD, so please bear with me.

My problem is that when the RHCS service (mezeo_ha_db) starts, the filesystem is not mounted.

I have a 2-node cluster running CentOS 5.3 with RHCS and DRBD (8.3) from the CentOS repo.

I think I have DRBD working.  I can issue drbdadm primary <resource>, mount the filesystem, unmount it, set the resource back to secondary, and repeat on the second node without issue.  Here's my drbd.conf:

global {
  usage-count yes;
}
common {
  protocol C;
}
resource drbd_disk {
  on rhcsnode1 {
    device    /dev/drbd0;
    disk      /dev/hdc1;
    address   10.10.10.100:7789;
    meta-disk internal;
  }
  on rhcsnode2 {
    device    /dev/drbd0;
    disk      /dev/hdc1;
    address   10.10.10.101:7789;
    meta-disk internal;
  }
}
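
For reference, the manual test described above looks roughly like the following. This is a sketch rather than a transcript: the resource and device names come from the drbd.conf above, the mount point is assumed to be the one used in the cluster config, and the `run` wrapper and `DRY_RUN` switch are purely illustrative so the commands can be printed without touching the device.

```shell
#!/bin/sh
# Manual DRBD role/mount cycle on one node (sketch).
RES=drbd_disk               # resource name from drbd.conf
DEV=/dev/drbd0              # device from drbd.conf
MNT=/var/lib/pgsql/data     # assumption: same mount point as in cluster.conf
DRY_RUN=${DRY_RUN:-1}       # illustrative switch: 1 only prints the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

manual_cycle() {
    run drbdadm primary "$RES"      # promote this node
    run mount "$DEV" "$MNT"         # mount the replicated filesystem
    run umount "$MNT"               # unmount again
    run drbdadm secondary "$RES"    # demote so the peer can take over
}

manual_cycle
```

Running it with DRY_RUN=0 as root on each node in turn would perform the cycle for real.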

I've added this resource to a service in RHCS, but when I start the service the filesystem is not mounted.  Also, if I relocate the service, it relocates but nothing changes at the filesystem level.  Here's my cluster config:

<?xml version="1.0"?>
<cluster alias="pgsql_cluster" config_version="36" name="pgsql_cluster">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="rhcsnode1.localdomain" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="rhcsnode2.localdomain" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="fo_domain" nofailback="0" ordered="0" restricted="0"/>
                </failoverdomains>
                <resources>
                        <ip address="10.10.10.150" monitor_link="1"/>
                        <postgres-8 config_file="/var/lib/pgsql/data/postgresql.conf" name="pgsql_db" postmaster_user="postgres" shutdown_wait="0"/>
                        <drbd name="res_drbd" resource="drbd_disk">
                                <fs device="/dev/drbd/by-res/drbd_disk" fstype="ext3" mountpoint="/var/lib/pgsql/data" name="fs_pgsql" options="noatime"/>
                        </drbd>
                </resources>
                <service autostart="1" exclusive="0" name="mezeo_ha_db" recovery="relocate">
                        <drbd ref="res_drbd"/>
                </service>
        </rm>
</cluster>

Here's the output from /var/log/messages:

Dec  3 14:20:27 rhcsnode1 luci[2515]: Unable to retrieve batch 979902234 status from rhcsnode1.localdomain:11111: module scheduled for execution
Dec  3 14:20:27 rhcsnode1 kernel: block drbd0: peer( Primary -> Secondary ) 
Dec  3 14:20:27 rhcsnode1 clurgmgrd[2572]: <notice> Starting stopped service service:mezeo_ha_db 
Dec  3 14:20:27 rhcsnode1 kernel: block drbd0: role( Secondary -> Primary ) 
Dec  3 14:20:28 rhcsnode1 clurgmgrd[2572]: <notice> Service service:mezeo_ha_db started 
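
So the log shows DRBD being promoted and the service reported as started, but nothing about a mount. A quick way to confirm the symptom after the service starts is a check along these lines. It is only a sketch: `is_mounted` is a helper name made up for illustration, and it assumes the kernel mount table is readable at /proc/mounts.

```shell
#!/bin/sh
# Check whether a mount point appears in the mount table (sketch).
MNT=/var/lib/pgsql/data     # mount point from cluster.conf

# is_mounted MOUNTPOINT [TABLE]
# Succeeds if MOUNTPOINT is the second field of any line in TABLE
# (TABLE defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted "$MNT"; then
    echo "$MNT is mounted"
else
    echo "$MNT is NOT mounted"
fi
```

On the node that owns the service, this can be read alongside `cat /proc/drbd` to compare the DRBD role with the mount state.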

Any help is greatly appreciated.
Thanks