We're running an HA-NFS setup where the clustered servers are also NFS
clients of themselves. Initially, I was having a problem where the
failover would hang up on the 'fuser' call in the Filesystem script. It
would typically work if I commented this call out, however, occasionally
this would result in a botched failover. This would happen if there were
any processes running on the mount point -- and since we're mounting the
user home directories, anyone logged into their account could cause
hb_standby to fail.

Ultimately, I believe I've fixed the issue by issuing (via a custom
resource.d script) a 'fuser -mk /nfs-mount-point' just before stopping
the nfs server. This works for us since we use the NFS mount points as
data directories - no critical server processes are run from them.

So far this has worked flawlessly for us...although I've only been using
DRBD for a couple weeks now, so I'd be interested in hearing if anyone
thinks this might be a bad idea.

- Patrick

________________________________
PATRICK K. JAROMIN
VP Technology
574.234-2211 . fax: 312.943.9675
JGSullivan Interactive, Inc.
http://JGSullivan.com

> -----Original Message-----
> From: drbd-user-bounces at linbit.com
> [mailto:drbd-user-bounces at linbit.com] On Behalf Of Dave Dykstra
> Sent: Friday, March 04, 2005 11:21 AM
> To: drbd-user at linbit.com
> Subject: Re: [DRBD-user] HA-NFS & drbd
>
> [I haven't seen Paul's message come through drbd-user yet but he did
> send it there in addition to direct to me.]
>
> On Thu, Mar 03, 2005 at 03:53:45PM -0500, Paul Nowoczynski wrote:
> > Hi,
> >
> > >
> > > I need some clarification on this. I have tried it both ways and
> > > have had a little better luck with having /var/lib/nfs be separate
> > > on the two servers than having it be a symlink to the shared
> > > filesystem.
> > > I have not seen EPERM errors after failover, but I do sometimes
> > > have NFS filesystems show up in 'df' on some clients as having
> > > some huge number of free blocks (which I assumed was a
> > > misinterpretation of -1) until I do "exportfs -r" on the new
> > > active server. Is that an EPERM error? It usually is not on all
> > > clients, just some of the ones in the netgroup that the
> > > filesystem is exported-to (and I've even seen it when explicitly
> > > listing all the hosts in /etc/exports). I'm quite sure I saw
> > > these problems with /var/lib/nfs as a symlink to the shared
> > > fileserver or not. If I make /var/lib/nfs a symlink to the shared
> > > filesystem, then it disappears on the standby server and 'df'
> > > hangs there when it gets up to the shared filesystem which I have
> > > mounted from the active server by NFS.
> >
> > After some trip-ups, I've got a similar config working very well.
> > I've been sharing varlibnfs the entire time and have not seen any
> > problems - even after 20 or 30 failovers (hard and soft). One thing
> > I have learned is that running an nfs server on your standby machine
> > is a bad idea. I chkconfig nfs off on both of the machines and let
> > heartbeat start nfsd after it has mounted drbd. If the machine is
> > on standby, nfsd is not started at all. One time, after a machine was
> > rebuilt, nfsd was on by default and failovers ceased to work at all.
>
> I'm not running nfsd on the standby server either. I was only talking
> about acting as an NFS client.
>
> > I have not been mounting the ha share on the failover cluster nodes,
> > but I don't think that would be a bad thing - unless the nfs client
> > translates the vip to the loopback. Are you explicitly mounting the
> > vip via the nfs client or the real ip? I'd recommend that you don't
> > allow any non-vital processes on your failover cluster.
>
> I'm mounting the filesystem via the virtual IP address so it can
> continue to work before & after a failover.
>
> > paul
>
> - Dave
>
> > >
> > > Is it in general a Very Bad Thing to NFS-mount the shared
> > > filesystem on the HA-NFS servers? I have never seen anybody
> > > explicitly state that, although I'm beginning to come to that
> > > conclusion. I've had problems with fuser hanging on failover, and
> > > even after avoiding that I still sometimes see hangs on shutdown
> > > that I'm quite sure are related to operations attempting to access
> > > the non-responding NFS mountpoint. In my case the shared filesystem
> > > holds almost everybody's home directories so it's rather a pain to
> > > not be able to access them on the standby shared file server.
> > > I need to allow people to log in to the active server so I'd have
> > > to have their home directories be set up there to be symlinks
> > > directly to the mounted filesystem (because that's how we do CVS
> > > accesses to avoid problems with CVS over NFS), but that means that
> > > when a failover happens every process that is directly accessing
> > > the filesystem will get killed which isn't very friendly.
> > >
> > > - Dave Dykstra
> > > _______________________________________________
> > > drbd-user mailing list
> > > drbd-user at lists.linbit.com
> > > http://lists.linbit.com/mailman/listinfo/drbd-user
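
For anyone reading this thread later: the workaround in the top reply (kill whatever is holding the NFS mount just before the NFS server is stopped) might look roughly like the script below. This is only a sketch under assumptions. The one thing taken from the post is the 'fuser -mk /nfs-mount-point' call; the function names, the MOUNT_POINT and RUN variables, and the start/stop/status dispatch are hypothetical and are not from the original message. How heartbeat expects 'status' to be answered, and where the script belongs in the resource list, should be checked against your own heartbeat setup.

```shell
#!/bin/sh
# Sketch of a resource.d-style helper (hypothetical, not the poster's script).
# MOUNT_POINT and RUN are assumed names introduced for illustration only.
MOUNT_POINT="${MOUNT_POINT:-/nfs-mount-point}"
RUN="${RUN:-}"    # set RUN=echo to print the command instead of running it

kill_mount_users() {
    # fuser -m selects every process using the filesystem mounted at $1;
    # -k kills them (SIGKILL by default). fuser exits non-zero when it
    # finds nothing, so that result is ignored here.
    $RUN fuser -mk "$1" || true
}

killnfsusers() {
    case "$1" in
        stop)
            # Clear the mount point so the later unmount is not blocked.
            kill_mount_users "$MOUNT_POINT"
            ;;
        start|status)
            # Nothing to do in this sketch (assumption).
            :
            ;;
        *)
            echo "usage: killnfsusers {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when called with an argument, as a resource script would be.
if [ $# -gt 0 ]; then
    killnfsusers "$1"
fi
```

Running it once with RUN=echo and the argument 'stop' shows the exact command that would be issued without killing anything. As the post says, this is only reasonable when nothing critical runs from the mount point, since every process touching it is killed.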