[DRBD-user] HA-NFS & drbd
Patrick Jaromin
Patrick at JGSullivan.com
Fri Mar 4 18:35:41 CET 2005
We're running an HA-NFS setup where the clustered servers are also NFS
clients of themselves.
Initially, I was having a problem where the failover would hang on the
'fuser' call in the Filesystem script. It would typically work if I
commented this call out; however, this occasionally resulted in a
botched failover. This would happen if there were any processes running on
the mount point, and since we're mounting the user home directories,
anyone logged into their account could cause hb_standby to fail.
Ultimately, I believe I've fixed the issue by issuing (via a custom
resource.d script) a 'fuser -mk /nfs-mount-point' just before stopping the
NFS server. This works for us since we use the NFS mount points as data
directories; no critical server processes are run from them.
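For illustration only, here is a minimal sketch of what such a resource.d-style wrapper might look like. It is hypothetical and not the script from this post. It assumes heartbeat calls it with 'start', 'stop' or 'status', and that the NFS server is controlled by /etc/init.d/nfs. That path is an assumption, chosen to fit the Red Hat-style 'chkconfig nfs' usage later in the thread. The helper name build_commands is also made up.

```python
#!/usr/bin/env python3
# Hypothetical heartbeat resource.d-style wrapper: a sketch, not the poster's script.
# Assumptions: the NFS server is controlled by /etc/init.d/nfs, and the
# filesystem to clear is mounted at /nfs-mount-point. Adjust both as needed.
import subprocess
import sys

NFS_INIT = "/etc/init.d/nfs"      # assumed init script path
MOUNT_POINT = "/nfs-mount-point"  # mount point named in the post


def build_commands(action, mount_point=MOUNT_POINT):
    """Return the argv lists to run, in order, for a heartbeat action."""
    if action == "start":
        return [[NFS_INIT, "start"]]
    if action == "stop":
        # fuser -m selects every process using the filesystem that contains
        # mount_point; -k sends them SIGKILL so they cannot keep it busy.
        return [["fuser", "-mk", mount_point], [NFS_INIT, "stop"]]
    if action == "status":
        return [[NFS_INIT, "status"]]
    raise ValueError("unknown action: %r" % action)


def main(argv):
    """Run the commands for argv[1]; return the NFS init script's exit code."""
    try:
        if len(argv) != 2:
            raise ValueError("expected exactly one action")
        commands = build_commands(argv[1])
    except ValueError as err:
        print("usage: %s start|stop|status (%s)" % (argv[0], err), file=sys.stderr)
        return 1
    rc = 0
    for cmd in commands:
        result = subprocess.run(cmd)
        # fuser exits non-zero when nothing was using the mount; that is not
        # a failure here, so only the init script's exit code is reported.
        if cmd[0] != "fuser":
            rc = result.returncode
    return rc


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

This is deliberately blunt. As Dave notes in the quoted message, every process working directly in the filesystem is killed on failover.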
So far this has worked flawlessly for us, although I've only been using
DRBD for a couple of weeks now, so I'd be interested in hearing if anyone
thinks this might be a bad idea.
- Patrick
________________________________
PATRICK K. JAROMIN
VP Technology
574.234-2211 . fax: 312.943.9675
JGSullivan Interactive, Inc.
http://JGSullivan.com
> -----Original Message-----
> From: drbd-user-bounces at linbit.com [mailto:drbd-user-bounces at linbit.com]
> On Behalf Of Dave Dykstra
> Sent: Friday, March 04, 2005 11:21 AM
> To: drbd-user at linbit.com
> Subject: Re: [DRBD-user] HA-NFS & drbd
>
> [I haven't seen Paul's message come through drbd-user yet, but he sent
> it there as well as directly to me.]
>
> On Thu, Mar 03, 2005 at 03:53:45PM -0500, Paul Nowoczynski wrote:
> > Hi,
> >
> > >
> > > I need some clarification on this. I have tried it both ways and have
> > > had a little better luck with having /var/lib/nfs be separate on the two
> > > servers than having it be a symlink to the shared filesystem. I have not
> > > seen EPERM errors after failover, but I do sometimes have NFS filesystems
> > > show up in 'df' on some clients as having some huge number of free blocks
> > > (which I assumed was a misinterpretation of -1) until I do "exportfs -r"
> > > on the new active server. Is that an EPERM error? It usually is not on
> > > all clients, just some of the ones in the netgroup that the filesystem is
> > > exported to (and I've even seen it when explicitly listing all the hosts
> > > in /etc/exports). I'm quite sure I saw these problems whether or not
> > > /var/lib/nfs was a symlink to the shared filesystem. If I make
> > > /var/lib/nfs a symlink to the shared filesystem, then it disappears on
> > > the standby server, and 'df' hangs there when it gets to the shared
> > > filesystem, which I have mounted from the active server over NFS.
> >
> > After some trip-ups, I've got a similar config working very well.
> > I've been sharing /var/lib/nfs the entire time and have not seen any
> > problems, even after 20 or 30 failovers (hard and soft). One thing
> > I have learned is that running an NFS server on your standby machine
> > is a bad idea. I 'chkconfig nfs off' on both of the machines and let
> > heartbeat start nfsd after it has mounted the DRBD device. If the machine
> > is on standby, nfsd is not started at all. One time, after a machine was
> > rebuilt, nfsd was on by default and failovers ceased to work at all.
>
> I'm not running nfsd on the standby server either. I was only talking
> about acting as an NFS client.
>
> > I have not been mounting the HA share on the failover cluster nodes,
> > but I don't think that would be a bad thing, unless the NFS client
> > translates the VIP to the loopback. Are you explicitly mounting the
> > VIP via the NFS client, or the real IP? I'd recommend that you don't
> > allow any non-vital processes on your failover cluster.
>
> I'm mounting the filesystem via the virtual IP address so it can continue
> to work before & after a failover.
>
> > paul
>
> - Dave
>
> > >
> > > Is it in general a Very Bad Thing to NFS-mount the shared filesystem on
> > > the HA-NFS servers? I have never seen anybody explicitly state that,
> > > although I'm beginning to come to that conclusion. I've had problems with
> > > fuser hanging on failover, and even after avoiding that I still sometimes
> > > see hangs on shutdown that I'm quite sure are related to operations
> > > attempting to access the non-responding NFS mount point. In my case the
> > > shared filesystem holds almost everybody's home directories, so it's
> > > rather a pain not to be able to access them on the standby shared file
> > > server. I need to allow people to log in to the active server, so I'd
> > > have to set up their home directories there as symlinks directly to
> > > the mounted filesystem (because that's how we do CVS accesses, to avoid
> > > problems with CVS over NFS), but that means that when a failover happens
> > > every process that is directly accessing the filesystem will get killed,
> > > which isn't very friendly.
> > >
> > > - Dave Dykstra
> > > _______________________________________________
> > > drbd-user mailing list
> > > drbd-user at lists.linbit.com
> > > http://lists.linbit.com/mailman/listinfo/drbd-user
> > >