So, I'm not sure if this will be helpful, but it certainly can't hurt. Last month I set up my first HA-NFS cluster, as the backend for our frontend PHP servers. I had some previous experience with the corosync/pacemaker/heartbeat/drbd suite: a rabbitmq setup a year ago, and the PHP frontend servers themselves, before realizing that an NFS backend would be more appropriate. Suffice it to say that the suite is dense, particularly the corosync crm configuration. It was a challenging week of poking, prodding, and relentless google searching to get it all working. I referenced all the documents others have mentioned, and possibly more. The document that really got me into the 'functioning' realm was the linbit pdf 'Highly available NFS storage with DRBD and Pacemaker'. Even so, I made my own adjustments and modifications to the described setup to make it happen.

On each server I had a spare LVM partition to use for drbd. Initially, I thought that I *had to* use nested LVMs to make this work ( http://www.drbd.org/users-guide/s-nested-lvm.html ), and that accounted for a large part of the frustration. Eventually I learned (by trial and mostly error) that I didn't have to nest them; I could use the existing logical volumes as-is.

When I force a failover ('service corosync restart'), there's a 15 to 20 second window during which the frontend web falters, waiting for the surviving corosync to see the failure and remount everything on the other box. That's a tolerable pause for me. I could probably lower it by tweaking the various timeouts in the corosync config, but I haven't touched those yet (I've been afraid to mess with them and break it!).

Below are the relevant configs and info, or snippets thereof (eliding generally-default stuff with '[...]'), in hopes that they will be of value to someone. This is all on centos 5.8. Naturally, all noted configs are identical between the two servers.
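For what it's worth, the failure-detection part of that delay is governed by the totem timers in corosync.conf. A rough sketch of where one would presumably start tweaking (these are illustrative values near corosync's documented defaults, *not* settings from my running config):

```
totem {
    # How long (ms) to wait for the token before declaring a node lost.
    # Lowering this speeds up failure detection, but risks false
    # positives on a congested network.
    token: 1000
    # Number of retransmit attempts before the token is considered lost.
    token_retransmits_before_loss_const: 4
    # How long (ms) to wait for consensus before starting a new
    # membership round; must be larger than token.
    consensus: 1200
    [...]
}
```

Even so, I suspect most of the 15 to 20 seconds is the resource takeover itself (DRBD promotion, filesystem mount, NFS restart) rather than the detection.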
[root@nfs-a ~]# uname -a
Linux nfs-a 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
================================================================
[root@nfs-a ~]# rpm -qa|grep drbd;rpm -qa|grep corosync;rpm -qa|grep heartbeat;rpm -qa|grep pacemaker
drbd83-8.3.13-2.el5.centos
kmod-drbd83-8.3.13-1.el5.centos
corosynclib-1.2.7-1.1.el5
corosync-1.2.7-1.1.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-libs-1.0.12-1.el5.centos
pacemaker-1.0.12-1.el5.centos
[root@nfs-a ~]#
================================================================
[root@nfs-a ~]# cat /etc/hosts
127.0.0.1       nfs-a localhost.localdomain localhost
10.255.20.58    nfs-a
10.255.20.59    nfs-b
10.255.20.204   nfs-shared
================================================================
[root@nfs-a ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
/dev/drbd1             20G  537M   18G   3% /srv/nfs/html
================================================================
[root@nfs-b ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
================================================================
[root@nfs-a ~]# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild@builder10.centos.org, 2012-05-07 11:56:36
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:3755180 nr:3535432 dw:7290612 dr:156289 al:179 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/global_common.conf
[...]
common {
    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
    }
    syncer {
        rate 200M;
        al-extents 257;
    }
}
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/resources.res
resource nfs-volume {
    device /dev/drbd1;
    disk /dev/VolGroup00/LogVol02;   # the actual disk partition
    meta-disk internal;
    # use canonical hostname for 'on <something>'
    on nfs-a {
        address 10.255.20.58:7789;
    }
    # use canonical hostname for 'on <something>'
    on nfs-b {
        address 10.255.20.59:7789;
    }
}
================================================================
[root@nfs-a ~]# lvdisplay
[...]
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol02
  VG Name                VolGroup00
  LV UUID                c46CsV-bOmj-6idR-I48H-7O3t-48e3-wKrfVh
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                19.53 GB
  Current LE             625
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
================================================================
[root@nfs-a ~]# cat /etc/ha.d/ha.cf
[...]
# Communications
udpport 698     # Change last digit so that it doesn't
                # conflict with another instance of HA
bcast eth0
[...]
================================================================
In /etc/lvm/lvm.conf, *no* filtering, contrary to the guide
================================================================
[root@nfs-a ~]# cat /etc/corosync/corosync.conf
totem {
    [...]
    interface {
        ringnumber: 0
        # The following three values need to be set based on your environment
        bindnetaddr: 10.255.20.0    # your local subnet
        mcastaddr: 226.94.1.58      # set last octet to last octet of one of the
                                    # server's IPs. This prevents conflict with
                                    # other corosyncs that may be running on the same network
        mcastport: 5405
    }
    [...]
}
================================================================
Finally, the working crm configuration. Note that I used different mnemonics for primitives, etc., as the guide's use of all lowercase just made it all swim in front of my eyes.
:)

node nfs-a
node nfs-b
primitive DAEMON lsb:nfs \
        op monitor interval="30" \
        meta target-role="Started"
primitive DRBD ocf:linbit:drbd \
        params drbd_resource="nfs-volume" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive EXPORT ocf:heartbeat:exportfs \
        params fsid="1" directory="/srv/nfs/html" options="rw,mountpoint,no_root_squash" \
                clientspec="10.255.20.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
        op monitor interval="0" timeout="40" \
        op start interval="0" timeout="40" \
        meta target-role="Started"
primitive FILESYSTEM ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/srv/nfs/html" fstype="ext4" \
                options="nobarrier,noatime" \
        op monitor interval="10" timeout="40" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        meta target-role="Started"
primitive VIRTUALIP ocf:heartbeat:IPaddr2 \
        params ip="10.255.20.204" broadcast="10.255.20.255" nic="eth0:1" cidr_netmask="24" \
        op monitor interval="30" \
        meta target-role="Started"
ms msDRBD DRBD \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" \
                is-managed="true" target-role="Master"
clone clDAEMON DAEMON
clone clEXPORT EXPORT
colocation NFSALL-on-msDRBD inf: clDAEMON clEXPORT FILESYSTEM VIRTUALIP msDRBD:Master
order msDRBD-before-FILESYSTEM inf: msDRBD:promote FILESYSTEM:start
order FILESYSTEM-before-clDAEMON inf: FILESYSTEM clDAEMON
order clDAEMON-before-clEXPORT inf: clDAEMON clEXPORT
order clEXPORT-before-VIRTUALIP inf: clEXPORT VIRTUALIP
property $id="cib-bootstrap-options" \
        dc-version="1.0.12-unknown" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1338327933"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"
================================================================
[root@nfs-a etc]# crm status
============
Last updated: Wed Jun  6 17:01:29 2012
Stack: openais
Current DC: nfs-a - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ nfs-a nfs-b ]

 FILESYSTEM     (ocf::heartbeat:Filesystem):    Started nfs-a
 VIRTUALIP      (ocf::heartbeat:IPaddr2):       Started nfs-a
 Master/Slave Set: msDRBD
     Masters: [ nfs-a ]
     Slaves: [ nfs-b ]
 Clone Set: clDAEMON
     Started: [ nfs-a ]
     Stopped: [ DAEMON:1 ]
 Clone Set: clEXPORT
     Started: [ nfs-a ]
     Stopped: [ EXPORT:1 ]

-- 
Paul Theodoropoulos
www.anastrophe.com