[DRBD-user] Force Primary at Boot - Newbie

Wed Jul 26 23:10:21 CEST 2006

Hello All,

I have figured out about 90% of how to configure a two-node DRBD/Heartbeat
cluster. No matter what I do, I can't get the primary node to treat
/dev/drbd0 as primary when I reboot both nodes in the cluster; both nodes
come up as secondary. As a result, Heartbeat fails the resource when it
runs the "Filesystem" script. Can someone provide some insight into a way
to ensure that my primary node will always mark the drbd0 device on that
system as primary (provided there are no failures or errors)?

Thanks,

Darren

Here is my system info:

ha1 - primary node
ha2 - secondary node
OS - RHEL4
DRBD - 0.7.20
Heartbeat - 2.0.6

I am trying to mount /dev/drbd0 on /data1 using Heartbeat.
When I reboot the primary node, the filesystem is not mounted:

ha1# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10317828    765640   9028072   8% /
none                   1037372         0   1037372   0% /dev/shm

I have to manually force the primary:

ha1# drbdadm primary all
ha1# /etc/init.d/heartbeat stop
ha1# /etc/init.d/heartbeat start
ha1# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10317828    765640   9028072   8% /
none                   1037372         0   1037372   0% /dev/shm
/dev/drbd0            57914656    800952  54171804   2% /data1

I have moved the drbd rc script to run ahead of heartbeat by naming them
S40drbd and S99heartbeat. I have also triple-checked that all the drbd, ha,
and init scripts are identical on both systems.

I believe I interpreted the timeout parameters in /etc/drbd.conf
correctly. My understanding is that a positive number here will force
the node to become primary:

 startup {
                wfc-timeout  1;
#               degr-wfc-timeout 120;    # 2 minutes.
        }
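
For comparison, a hedged reading of these two options based on the DRBD 0.7
drbd.conf documentation: both appear to control how long the drbd init
script waits for the peer at boot, not which node becomes primary. An
annotated sketch of the same block (the comments are an interpretation, not
part of the original file):

```
startup {
    # Seconds the drbd init script waits at boot for the peer to connect.
    # 0 means wait indefinitely. This does not select a role.
    wfc-timeout  1;

    # Same wait, used when this node was part of a degraded cluster
    # before the reboot. Left commented out, as in the original.
    # degr-wfc-timeout 120;
}
```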

When I reboot the primary and secondary nodes (5 seconds apart), I see the
following in dmesg on the primary node, showing that both nodes are
Secondary:

ha1# dmesg | grep drbd
drbd: initialised. Version: 0.7.20 (api:79/proto:74)
drbd: SVN Revision: 2260 build by root at ha2.strongmail.net, 2006-07-21 16:12:22
drbd: registered as block device major 147
drbd0: resync bitmap: bits=14709516 words=459674
drbd0: size = 56 GB (58838062 KB)
drbd0: 0 KB marked out-of-sync by on disk bit-map.
drbd0: Found 6 transactions (64 active extents) in activity log.
drbd0: drbdsetup [3052]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [3065]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [3066]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [3066]: cstate WFConnection --> WFReportParams
drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(S): 1:00000002:00000001:00000013:00000001:00
drbd0: Peer(S): 1:00000002:00000001:00000013:00000001:00
drbd0: drbd0_receiver [3066]: cstate WFReportParams --> Connected
drbd0: Secondary/Unknown --> Secondary/Secondary
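
The same state can also be read from /proc/drbd rather than dmesg. The
sketch below assumes the DRBD 0.7 status line format, where the roles
appear as st:Local/Peer; drbd_local_role is a hypothetical helper name, not
something DRBD provides.

```shell
#!/bin/sh
# Print the local role of one DRBD minor as reported in /proc/drbd.
# Assumes a DRBD 0.7 style status line, for example:
#    0: cs:Connected st:Secondary/Secondary ld:Consistent
# drbd_local_role is a hypothetical helper, not part of DRBD.
drbd_local_role() {
    minor="$1"                       # device minor, e.g. 0 for /dev/drbd0
    status_file="${2:-/proc/drbd}"   # overridable so the parser can be tested
    # Select the line for this minor and keep the st: value before the slash.
    sed -n "s/^ *${minor}: .* st:\([A-Za-z]*\)\/.*/\1/p" "$status_file"
}
```

Called as "drbd_local_role 0" on ha1 right after the reboot described
above, this would be expected to print Secondary, matching the last dmesg
line.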

This is further confirmed by heartbeat:

ha1# tail -100 /var/log/ha-log
<snip>

ResourceManager[3510]:  2006/07/26_06:58:10 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data1 ext3 start
Filesystem[3997]:       2006/07/26_06:58:10 INFO: Running start for /dev/drbd0 on /data1
Filesystem[3997]:       2006/07/26_06:58:10 ERROR: Couldn't mount filesystem /dev/drbd0 on /data1
Filesystem[3933]:       2006/07/26_06:58:10 ERROR: Filesystem Generic error
ResourceManager[3510]:  2006/07/26_06:58:10 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
ResourceManager[3510]:  2006/07/26_06:58:10 CRIT: Giving up resources due to failure of Filesystem::/dev/drbd0::/data1::ext3

My /etc/fstab file has the correct entry on both ha1 and ha2 nodes:

/dev/drbd0              /data1                  ext3    noauto          0 0

Here is the /etc/ha.d/haresources file:

ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 Filesystem::/dev/drbd0::/data1::ext3
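
One arrangement often used with DRBD 0.7 and a haresources-style Heartbeat
setup is to list the drbddisk resource script ahead of Filesystem, so that
Heartbeat promotes the device to primary before trying to mount it. A
sketch, assuming the DRBD resource is named r0 (the resource name is not
shown in this post) and that the drbddisk script shipped with DRBD is
installed in /etc/ha.d/resource.d/:

```
ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 drbddisk::r0 Filesystem::/dev/drbd0::/data1::ext3
```

Heartbeat starts the resources on a haresources line from left to right and
stops them from right to left, so with this ordering the device would be
demoted only after the filesystem has been unmounted.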