[DRBD-user] Force Primary at Boot - Newbie

Wed Jul 26 23:23:57 CEST 2006

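You are missing a "drbddisk" entry in your haresources file.  drbddisk 
is the heartbeat resource script that makes the DRBD resource primary, 
so it has to run before Filesystem tries to mount the device.  You 
didn't post your drbd.conf, but if your resource name for /dev/drbd0 is 
"abc", then your haresources file should look like the following (all 
on one line).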

ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 drbddisk::abc Filesystem::/dev/drbd0::/data1::ext3
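
Heartbeat starts the resources on a haresources line from left to 
right, so drbddisk has to come before Filesystem.  As a rough sanity 
check, something like the shell sketch below could be used.  Note that 
check_haresources_line is a made-up helper for illustration, not part 
of heartbeat or DRBD, and it only inspects a single line passed in as 
an argument.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with heartbeat or DRBD): reports whether
# a haresources line lists a drbddisk resource before the Filesystem resource.
check_haresources_line() {
    seen_drbddisk=no
    # Split the line into whitespace-separated resource fields.
    for field in $1; do
        case "$field" in
            drbddisk::*)
                seen_drbddisk=yes
                ;;
            Filesystem::*)
                if [ "$seen_drbddisk" = yes ]; then
                    echo "ok"
                else
                    echo "missing-drbddisk"
                fi
                return 0
                ;;
        esac
    done
    # No Filesystem resource on this line at all.
    echo "no-filesystem"
}

# The corrected line, then the line from the original post.
check_haresources_line "ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 drbddisk::abc Filesystem::/dev/drbd0::/data1::ext3"
check_haresources_line "ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 Filesystem::/dev/drbd0::/data1::ext3"
```

The first call prints "ok" and the second prints "missing-drbddisk", 
which matches the failure in the ha-log: Filesystem ran while the 
device was still secondary.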

Michael Haverkamp

drbd-user-bounces at linbit.com wrote on 07/26/2006 04:10:21 PM:

> Hello All,
> 
> I have figured out 90% of how to configure a 2 node drbd/heartbeat
> cluster. No matter what I do, I can't seem to get the primary node to
> recognize /dev/drbd0 as primary when I reboot both nodes in the cluster.
> As a result, heartbeat fails the resource when it attempts to run the
> "Filesystem" script. They both show up as secondary. Can someone provide
> some insight into a way to ensure that my primary node in the cluster
> will always mark the drbd0 device on that system as primary (provided
> there are no failures or errors)?
> 
> Thanks,
> 
> Darren
> 
> Here is my system info:
> 
> ha1 - primary node
> ha2 - secondary node
> OS - RHEL4
> DRBD - 0.7.20
> Heartbeat - 2.0.6
> 
> I am trying to mount /dev/drbd0 on the /data1 directory using heartbeat.
> When I reboot the primary node, I get nothing:
> 
> ha1# df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sda1             10317828    765640   9028072   8% /
> none                   1037372         0   1037372   0% /dev/shm
> 
> I have to manually force the primary:
> 
> ha1# drbdadm primary all
> ha1# /etc/init.d/heartbeat stop
> ha1# /etc/init.d/heartbeat start
> ha1# df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sda1             10317828    765640   9028072   8% /
> none                   1037372         0   1037372   0% /dev/shm
> /dev/drbd0            57914656    800952  54171804   2% /data1
> 
> I have moved the drbd rc script to run ahead of heartbeat by naming it
> S40drbd and heartbeat S99heartbeat. I have also triple checked to make
> sure that all the drbd, ha, and init scripts are identical on both
> systems.
> 
> I believe I interpreted the timeout parameters correctly in the
> /etc/drbd.conf. My understanding is that a positive number here will
> force the node to become primary:
> 
>  startup {
>                 wfc-timeout  1;
> #               degr-wfc-timeout 120;    # 2 minutes.
>         }
> 
> When I reboot the primary and secondary nodes (5 seconds apart from each
> other), I receive the following info in dmesg on the primary node,
> stating that both nodes are in secondary:
> 
> ha1# dmesg | grep drbd
> drbd: initialised. Version: 0.7.20 (api:79/proto:74)
> drbd: SVN Revision: 2260 build by root at ha2.strongmail.net, 2006-07-21
> 16:12:22
> drbd: registered as block device major 147
> drbd0: resync bitmap: bits=14709516 words=459674
> drbd0: size = 56 GB (58838062 KB)
> drbd0: 0 KB marked out-of-sync by on disk bit-map.
> drbd0: Found 6 transactions (64 active extents) in activity log.
> drbd0: drbdsetup [3052]: cstate Unconfigured --> StandAlone
> drbd0: drbdsetup [3065]: cstate StandAlone --> Unconnected
> drbd0: drbd0_receiver [3066]: cstate Unconnected --> WFConnection
> drbd0: drbd0_receiver [3066]: cstate WFConnection --> WFReportParams
> drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: Connection established.
> drbd0: I am(S): 1:00000002:00000001:00000013:00000001:00
> drbd0: Peer(S): 1:00000002:00000001:00000013:00000001:00
> drbd0: drbd0_receiver [3066]: cstate WFReportParams --> Connected
> drbd0: Secondary/Unknown --> Secondary/Secondary
> 
> This is further confirmed by heartbeat:
> 
> ha1# tail -100 /var/log/ha-log
> <snip>
> 
> ResourceManager[3510]:  2006/07/26_06:58:10 info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data1 ext3 start
> Filesystem[3997]:       2006/07/26_06:58:10 INFO: Running start for
> /dev/drbd0 on /data1
> Filesystem[3997]:       2006/07/26_06:58:10 ERROR: Couldn't mount
> filesystem /dev/drbd0 on /data1
> Filesystem[3933]:       2006/07/26_06:58:10 ERROR: Filesystem Generic error
> ResourceManager[3510]:  2006/07/26_06:58:10 ERROR: Return code 1 from
> /etc/ha.d/resource.d/Filesystem
> ResourceManager[3510]:  2006/07/26_06:58:10 CRIT: Giving up resources
> due to failure of Filesystem::/dev/drbd0::/data1::ext3
> 
> My /etc/fstab file has the correct entry on both ha1 and ha2 nodes:
> 
> /dev/drbd0              /data1                  ext3    noauto 0 0
> 
> Here is the /etc/ha.d/haresources file:
> 
> ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 Filesystem::/dev/drbd0::/data1::ext3