Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
So, I'm not sure if this will be helpful, but it certainly can't hurt.
Last month I set up my first HA-NFS setup, as a backend for frontend PHP
servers. I had some previous experience with the
corosync/pacemaker/heartbeat/drbd suite - a rabbitmq setup a year ago -
and I had set up the PHP frontend servers using the suite before realizing
that an NFS backend would be more appropriate. Suffice it to say that the
suite is dense, particularly the corosync crm configuration. It was a
challenging week of poking, prodding, relentless Google searching, and
discovery to get it all working. I referenced all the documents others have
mentioned, possibly more! The document that really got me into the
'functioning' realm was the Linbit PDF 'Highly available NFS storage
with DRBD and Pacemaker'. Even so, I made my own adjustments and
modifications to the described setup to make it happen.
On each server I had a spare LVM partition to use for drbd. Initially, I
thought that I *had to* use nested LVMs to make this work (
http://www.drbd.org/users-guide/s-nested-lvm.html ). That accounted for
a large part of the frustration. Eventually I learned (by trial and
mostly error) that I didn't have to use nested LVMs; I could just use
the existing logical volumes as-is.
In forcing a failover ('service corosync restart'), there's a 15- to
20-second window during which the frontend web falters, waiting for the
other corosync to see the failure and remount on the other box. That's a
tolerable pause for me. I could probably lower it by tweaking the
various timeouts in the corosync config, but I haven't touched that yet
(I've been afraid to mess with it and break it!)
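For anyone braver than me, the knobs that govern that detection window
live in the totem section of corosync.conf. A sketch of the relevant
settings - the values below are illustrative defaults-ish numbers, not
taken from this running setup:

```
# /etc/corosync/corosync.conf (fragment) - illustrative values only.
# Lowering token shortens failure detection, but too low risks false
# failovers on a busy or lossy network.
totem {
        token: 3000           # ms without the token before a node is declared dead
        token_retransmits_before_loss_const: 10
        consensus: 3600       # ms to reach consensus; should exceed token
        join: 60              # ms to wait for join messages
}
```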
Below are each of the relevant configs and info, or snippets thereof
(eliding generally default stuff with '[...]'), in hopes that maybe they
will be of value to someone. This is all on CentOS 5.8. Naturally, all
noted configs are identical between the two servers.
[root@nfs-a ~]# uname -a
Linux nfs-a 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012
x86_64 x86_64 x86_64 GNU/Linux
================================================================
[root@nfs-a ~]# rpm -qa|grep drbd;rpm -qa|grep corosync;rpm -qa|grep heartbeat;rpm -qa|grep pacemaker
drbd83-8.3.13-2.el5.centos
kmod-drbd83-8.3.13-1.el5.centos
corosynclib-1.2.7-1.1.el5
corosync-1.2.7-1.1.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-libs-1.0.12-1.el5.centos
pacemaker-1.0.12-1.el5.centos
[root@nfs-a ~]#
================================================================
[root@nfs-a ~]# cat /etc/hosts
127.0.0.1 nfs-a localhost.localdomain localhost
10.255.20.58 nfs-a
10.255.20.59 nfs-b
10.255.20.204 nfs-shared
================================================================
[root@nfs-a ~]# df -h
Filesystem Size Used Avail Use% Mounted on
[...]
/dev/drbd1 20G 537M 18G 3% /srv/nfs/html
================================================================
[root@nfs-b ~]# df -h
Filesystem Size Used Avail Use% Mounted on
[...]
================================================================
[root@nfs-a ~]# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild@builder10.centos.org, 2012-05-07 11:56:36
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:3755180 nr:3535432 dw:7290612 dr:156289 al:179 bm:10 lo:0 pe:0
ua:0 ap:0 ep:1 wo:f oos:0
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/global_common.conf
[...]
common {
        startup {
                wfc-timeout 0; degr-wfc-timeout 120;
        }
        disk {
                on-io-error detach;
        }
        syncer {
                rate 200M; al-extents 257;
        }
}
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/resources.res
resource nfs-volume {
        device    /dev/drbd1;
        disk      /dev/VolGroup00/LogVol02;   # the actual disk partition
        meta-disk internal;
        # use the canonical hostname for 'on <hostname>'
        on nfs-a {
                address 10.255.20.58:7789;
        }
        on nfs-b {
                address 10.255.20.59:7789;
        }
}
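For completeness, the usual sequence for bringing up a fresh resource
like this one with the DRBD 8.3 tooling is roughly as follows. This is a
sketch of the standard procedure, not a capture from my setup - adjust
to taste, and note it destroys data on the chosen secondary:

```shell
# On BOTH nodes: write DRBD metadata and attach/connect the resource
drbdadm create-md nfs-volume
drbdadm up nfs-volume

# On ONE node only: force it primary to kick off the initial sync,
# then create the filesystem on the DRBD device (ext4, matching the
# fstype used in the crm FILESYSTEM primitive below)
drbdadm -- --overwrite-data-of-peer primary nfs-volume
mkfs -t ext4 /dev/drbd1
```

Progress of the initial sync shows up in /proc/drbd.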
================================================================
[root@nfs-a ~]# lvdisplay
[...]
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol02
VG Name VolGroup00
LV UUID c46CsV-bOmj-6idR-I48H-7O3t-48e3-wKrfVh
LV Write Access read/write
LV Status available
# open 2
LV Size 19.53 GB
Current LE 625
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
================================================================
[root@nfs-a ~]# cat /etc/ha.d/ha.cf
[...]
# Communications
udpport 698     # Change last digit so that it doesn't
                # conflict with another instance of HA
bcast eth0
[...]
================================================================
In /etc/lvm/lvm.conf there is *no* filtering, contrary to the guide.
================================================================
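(If you do go the nested-LVM route instead, the DRBD docs have you
restrict LVM scanning to the DRBD device so the backing partition isn't
scanned directly. From memory of those docs - verify against the guide
before using - the lvm.conf fragment looks something like:

```
# /etc/lvm/lvm.conf (fragment) - only needed for nested LVM on DRBD;
# accept /dev/drbd* devices, reject everything else
filter = [ "a|/dev/drbd.*|", "r|.*|" ]
write_cache_state = 0
```

Since I skipped nested LVM entirely, I left the stock filter alone.)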
[root@nfs-a ~]# cat /etc/corosync/corosync.conf
totem {
[...]
        interface {
                ringnumber: 0
                # The following three values need to be set based on your environment
                bindnetaddr: 10.255.20.0    # your local subnet
                mcastaddr: 226.94.1.58      # set the last octet to the last octet of
                                            # one of the servers' IPs; this prevents
                                            # conflict with other corosyncs that may be
                                            # running on the same network
                mcastport: 5405
[...]
================================================================
Finally, the working crm configuration. Note that I used different
mnemonics for the primitives, etc., as the guide's use of all lowercase
just made it all swim in front of my eyes. :)
node nfs-a
node nfs-b
primitive DAEMON lsb:nfs \
op monitor interval="30" \
meta target-role="Started"
primitive DRBD ocf:linbit:drbd \
params drbd_resource="nfs-volume" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="15" role="Master" \
op monitor interval="30" role="Slave"
primitive EXPORT ocf:heartbeat:exportfs \
params fsid="1" directory="/srv/nfs/html" options="rw,mountpoint,no_root_squash" \
clientspec="10.255.20.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
op monitor interval="0" timeout="40" \
op start interval="0" timeout="40" \
meta target-role="Started"
primitive FILESYSTEM ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/srv/nfs/html" fstype="ext4" \
options="nobarrier,noatime" \
op monitor interval="10" timeout="40" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
meta target-role="Started"
primitive VIRTUALIP ocf:heartbeat:IPaddr2 \
params ip="10.255.20.204" broadcast="10.255.20.255" nic="eth0:1" cidr_netmask="24" \
op monitor interval="30" \
meta target-role="Started"
ms msDRBD DRBD \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
notify="true" is-managed="true" target-role="Master"
clone clDAEMON DAEMON
clone clEXPORT EXPORT
colocation NFSALL-on-msDRBD inf: clDAEMON clEXPORT FILESYSTEM VIRTUALIP msDRBD:Master
order msDRBD-before-FILESYSTEM inf: msDRBD:promote FILESYSTEM:start
order FILESYSTEM-before-clDAEMON inf: FILESYSTEM clDAEMON
order clDAEMON-before-clEXPORT inf: clDAEMON clEXPORT
order clEXPORT-before-VIRTUALIP inf: clEXPORT VIRTUALIP
property $id="cib-bootstrap-options" \
dc-version="1.0.12-unknown" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1338327933"
rsc_defaults $id="rsc-options" \
resource-stickiness="200"
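A configuration like the above is entered and checked through the crm
shell that ships with pacemaker 1.0. A sketch of the workflow I'd
suggest (commands are standard crm shell / pacemaker tools, but the
exact session is illustrative, not a transcript from my boxes):

```shell
# Open the live configuration in $EDITOR, make changes, then commit
crm configure edit

# Sanity-check the pending configuration before committing
crm configure verify

# Independently verify the live CIB, with verbose output
crm_verify -L -V
```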
================================================================
[root@nfs-a etc]# crm status
============
Last updated: Wed Jun 6 17:01:29 2012
Stack: openais
Current DC: nfs-a - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ nfs-a nfs-b ]
FILESYSTEM (ocf::heartbeat:Filesystem): Started nfs-a
VIRTUALIP (ocf::heartbeat:IPaddr2): Started nfs-a
Master/Slave Set: msDRBD
Masters: [ nfs-a ]
Slaves: [ nfs-b ]
Clone Set: clDAEMON
Started: [ nfs-a ]
Stopped: [ DAEMON:1 ]
Clone Set: clEXPORT
Started: [ nfs-a ]
Stopped: [ EXPORT:1 ]
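On the PHP frontends, the export is mounted via the floating
nfs-shared name, so a failover is transparent apart from the pause
noted earlier. An illustrative fstab line - the mount point and mount
options here are assumptions for the example, not from my frontends:

```
# /etc/fstab on a frontend (illustrative); 'hard' + 'tcp' so clients
# block and retry through the failover rather than erroring out
nfs-shared:/srv/nfs/html  /var/www/html  nfs  rw,hard,intr,tcp  0 0
```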
--
Paul Theodoropoulos
www.anastrophe.com