Todd Denniston wrote:
>Dave Dykstra wrote:
>>On Tue, Jun 06, 2006 at 08:37:07PM -0700, mike wrote:
>>
>>>On 6/6/06, Montse Seisdedos <mseisdedos at mediafusion.es> wrote:
...
>>>>somebody can help me, is really necessary to mv /var/lib/nfs to the
>>>>share fs?
>>>
>>>I was getting weird errors too trying to symlink the shared NFS dir
>>>(I'm using NFS v4 on CentOS 4.3) so DRBD could propagate it.
>>>
>>>What I realized (and I could be wrong, but it appears to work) is that
>>>it doesn't *need* to be shared, as long as on failover you restart the
>>>NFS services (portmapper, nfsd, all the RPC crap) - it seems to pick
>>>up where it left off.
>>>
>>>The only thing I can think is possibly if it was in the middle of some
>>>process/transaction like writing a large file when it crashed, it
>>>might have been able to persist that, but for my setup, those were
>>>odds I could live with.
>>
>>As far as I have been able to figure out, /var/lib/nfs contains the state
>>of exported filesystems and of locks. If you're re-exporting everything
>>when you start up the HA service, re-using the exported state from the
>>previous active server doesn't really help much. In theory you should be
>>able to keep locks across fail-overs, but in practice I've not been able
>>to get that to work. I was experimenting with that in order to figure out
>>what exactly should be on the http://linux-ha.org/HaNFS web page but I got
>>stuck on that problem so I asked for help on the linux-ha mailing list and
>>stopped working on the web page. I never did figure out an answer and
>>then I got too busy with other things and never got back to it. So in
>>practice I don't think it's really necessary to have the /var/lib/nfs
>>symlinks even though that's been the advice for the linux-ha+drbd+NFS
>>configuration for a long time. I really should clean up that page.
>>
>>- Dave Dykstra
>
>One reason for keeping /var/lib/nfs as a sym-link, besides locking, is
>that without it clients who have the export mounted while the fallover
>happens will start getting stale NFS handle errors when the secondary
>takes over.

I ran for a while without sharing /var/lib/nfs and did not get stale NFS
handle errors. I've never seen stale NFS file handle errors other than
when not using drbd to share the filesystem on both sides, although I do
see some old email where someone was equating stale NFS file handles with
whether or not "exportfs -u" was run when shutting down the active side.

I wonder if it's a difference with Linux distributions and especially
whether or not the nfs kernel server is used, or if maybe it is a
difference in the order in which people stop and start services in
haresources. I only used Debian with the kernel nfs server, and made the
high-availability service IP address last in the list. Here's the
relevant piece of my haresources:

    drbddisk::home \
    Filesystem::/dev/drbd0::/mnt/home::ext3 \
    StartStop::nfs-common::/sbin/rpc.statd \
    StartStop::nfs-kernel-server::/usr/sbin/rpc.mountd \
    IPaddr2::172.18.30.21

(where StartStop is an haresources.d script I wrote that uses the last
parameter to implement the "status" function to see whether or not a
service is running).

>The other big reason to do it is that if a client not only
>has it mounted but is transferring a file while the fallover happens,
>with the symlink the client just picks up where it stopped getting ACKs
>back, without the link the client just hangs/crashes/errors.
>
>I suggest don't change that recommendation!!!.

I guess I can't say for sure that I tested failovers on active systems
while I wasn't sharing /var/lib/nfs, and unfortunately I don't have
access to the cluster anymore to try it since I recently changed jobs.
I'm hoping I may have more opportunity to set up some systems here and
then I'll be able to try it.

On Thu, Jun 08, 2006 at 08:17:09AM -0600, Alan Robertson wrote:
> Although Dave reports that his locks don't fail over correctly, we had
> no trouble with that when we followed the guidelines now on the web
> site. So, I second this opinion.

I can certainly believe that under different configurations lock
failovers could work, but I have not seen it. I would really like to be
able to verify what it takes to make it work and why I couldn't get it
to work. Is this something that you've tried recently? My email appeal
to the linux-ha mailing list on July 21 2005 entitled "Can't get NFS
locks to survive failovers" went unanswered.

Meanwhile, I won't take off the recommendation to share /var/lib/nfs but
I really ought to update the page to list what the outstanding questions
still are, especially with locking.

- Dave
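
Since the order in which services start and stop came up above, here is
a rough Python sketch of how the haresources fragment would break down
into resource-script invocations. It is only an illustration and rests
on assumptions that should be checked against your own heartbeat
version: that "::" separates a resource script from its arguments, that
the action (start/stop) is appended as the last argument, and that
resources are started left to right and stopped right to left. The names
parse_resources and commands are made up for this example and are not
part of heartbeat.

```python
# Hypothetical helper names; this is a sketch, not heartbeat's own code.
RESOURCES = [
    "drbddisk::home",
    "Filesystem::/dev/drbd0::/mnt/home::ext3",
    "StartStop::nfs-common::/sbin/rpc.statd",
    "StartStop::nfs-kernel-server::/usr/sbin/rpc.mountd",
    "IPaddr2::172.18.30.21",
]


def parse_resources(fields):
    """Split each haresources field into (script, [args]) on '::'."""
    parsed = []
    for field in fields:
        script, *args = field.split("::")
        parsed.append((script, args))
    return parsed


def commands(fields, action):
    """Return the assumed invocation order for 'start' or 'stop'.

    Assumption: start runs left to right, stop runs right to left,
    and the action is passed as the final argument to each script.
    """
    parsed = parse_resources(fields)
    if action == "stop":
        parsed = list(reversed(parsed))
    return [" ".join([script, *args, action]) for script, args in parsed]


for line in commands(RESOURCES, "start"):
    print(line)
for line in commands(RESOURCES, "stop"):
    print(line)
```

Under those assumptions the service IP address comes up only after DRBD,
the filesystem and the NFS daemons are running, and it is the first
thing taken down on the way out, so clients never reach the address
while NFS is not yet (or no longer) serving.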