Hello All,

I have figured out 90% of how to configure a 2 node drbd/heartbeat cluster. No matter what I do, I can't seem to get the primary node to recognize /dev/drbd0 as primary when I reboot both nodes in the cluster. Both nodes show up as secondary, and as a result heartbeat fails the resource when it attempts to run the "Filesystem" script.

Can someone provide some insight into a way to ensure that my primary node in the cluster will always mark the drbd0 device on that system as primary (provided there are no failures or errors)?

Thanks,
Darren

Here is my system info:

  ha1 - primary node
  ha2 - secondary node
  OS - RHEL4
  DRBD - 0.7.20
  Heartbeat - 2.0.6

I am trying to mount /dev/drbd0 on the /data1 directory using heartbeat. When I reboot the primary node, I get nothing:

  ha1# df -k
  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/sda1             10317828    765640   9028072   8% /
  none                   1037372         0   1037372   0% /dev/shm

I have to manually force the primary:

  ha1# drbdadm primary all
  ha1# /etc/init.d/heartbeat stop
  ha1# /etc/init.d/heartbeat start
  ha1# df -k
  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/sda1             10317828    765640   9028072   8% /
  none                   1037372         0   1037372   0% /dev/shm
  /dev/drbd0            57914656    800952  54171804   2% /data1

I have moved the drbd rc script to run ahead of heartbeat by naming it S40drbd and heartbeat S99heartbeat. I have also triple checked to make sure that all the drbd, ha, and init scripts are identical on both systems.

I believe I interpreted the timeout parameters correctly in /etc/drbd.conf. My understanding is that a positive number here will force the node to become primary:

  startup {
    wfc-timeout 1;
    # degr-wfc-timeout 120;    # 2 minutes.
  }

When I reboot the primary and secondary nodes (5 seconds apart from each other), I receive the following info in dmesg on the primary node, stating that both nodes are in secondary:

  ha1# dmesg | grep drbd
  drbd: initialised. Version: 0.7.20 (api:79/proto:74)
  drbd: SVN Revision: 2260 build by root at ha2.strongmail.net, 2006-07-21 16:12:22
  drbd: registered as block device major 147
  drbd0: resync bitmap: bits=14709516 words=459674
  drbd0: size = 56 GB (58838062 KB)
  drbd0: 0 KB marked out-of-sync by on disk bit-map.
  drbd0: Found 6 transactions (64 active extents) in activity log.
  drbd0: drbdsetup [3052]: cstate Unconfigured --> StandAlone
  drbd0: drbdsetup [3065]: cstate StandAlone --> Unconnected
  drbd0: drbd0_receiver [3066]: cstate Unconnected --> WFConnection
  drbd0: drbd0_receiver [3066]: cstate WFConnection --> WFReportParams
  drbd0: Handshake successful: DRBD Network Protocol version 74
  drbd0: Connection established.
  drbd0: I am(S): 1:00000002:00000001:00000013:00000001:00
  drbd0: Peer(S): 1:00000002:00000001:00000013:00000001:00
  drbd0: drbd0_receiver [3066]: cstate WFReportParams --> Connected
  drbd0: Secondary/Unknown --> Secondary/Secondary

This is further confirmed by heartbeat:

  ha1# tail -100 /var/log/ha-log
  <snip>
  ResourceManager[3510]: 2006/07/26_06:58:10 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data1 ext3 start
  Filesystem[3997]: 2006/07/26_06:58:10 INFO: Running start for /dev/drbd0 on /data1
  Filesystem[3997]: 2006/07/26_06:58:10 ERROR: Couldn't mount filesystem /dev/drbd0 on /data1
  Filesystem[3933]: 2006/07/26_06:58:10 ERROR: Filesystem Generic error
  ResourceManager[3510]: 2006/07/26_06:58:10 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
  ResourceManager[3510]: 2006/07/26_06:58:10 CRIT: Giving up resources due to failure of Filesystem::/dev/drbd0::/data1::ext3

My /etc/fstab file has the correct entry on both ha1 and ha2 nodes:

  /dev/drbd0 /data1 ext3 noauto 0 0

Here is the /etc/ha.d/haresources file:

  ha1.example.net 192.168.29.184/28/eth0/192.168.29.255 Filesystem::/dev/drbd0::/data1::ext3
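
For anyone trying to reproduce this, the check that fails can be sketched in a few lines of shell. This is only an illustration, not part of my setup: the helper names local_role and can_mount are made up, and the sketch assumes the role pair has the "local/peer" form seen in the dmesg output above (on a live node it could presumably be read with something like "drbdadm state all", which I am assuming prints the same form).

```shell
#!/bin/sh
# Hypothetical helper: take a DRBD role pair such as "Secondary/Secondary"
# (local/peer, as in the dmesg line above) and print the local half.
local_role() {
    printf '%s\n' "${1%%/*}"
}

# Hypothetical helper: succeed only when the local role is Primary,
# which is the state the Filesystem script needs to mount /dev/drbd0.
can_mount() {
    [ "$(local_role "$1")" = "Primary" ]
}

# Example using the state reported after rebooting both nodes.
state="Secondary/Secondary"
if can_mount "$state"; then
    echo "mount possible"
else
    echo "mount would fail: local role is $(local_role "$state")"
fi
```

With the state from my logs this takes the failure branch, which matches what heartbeat reports when it runs the Filesystem script.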