[DRBD-user] HA-NFS & drbd

Patrick Jaromin Patrick at JGSullivan.com
Fri Mar 4 18:35:41 CET 2005


We're running an HA-NFS setup where the clustered servers are also NFS
clients of themselves. 

Initially, I had a problem where the failover would hang on the 'fuser'
call in the Filesystem script. It would typically work if I commented
this call out, but occasionally that resulted in a botched failover.
This happened whenever any process was using the mount point -- and
since we're mounting the user home directories, anyone logged into
their account could cause hb_standby to fail.

Ultimately, I believe I've fixed the issue by running (via a custom
resource.d script) 'fuser -mk /nfs-mount-point' just before stopping the
NFS server. This works for us because we use the NFS mount points as data
directories; no critical server processes are run from them.
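
A minimal sketch of what such a resource.d script could look like is
below. It is an illustration, not the script actually used here: the
script name (killnfsusers), the function names and the FUSER override
variable are hypothetical, and it assumes heartbeat passes the mount
point as the first argument and the action (start/stop/status) as the
last. A real script would also need to answer 'status' in whatever form
heartbeat expects.

```shell
#!/bin/sh
# Hypothetical heartbeat resource.d helper: kill every process that is
# using a mount point when the resource is stopped.
# Assumed invocation: killnfsusers /nfs-mount-point {start|stop|status}

# FUSER can be overridden for a dry run, e.g. FUSER="echo fuser".
FUSER="${FUSER:-fuser}"

kill_mount_users() {
    # -m selects all processes using the filesystem mounted at $1,
    # -k sends them SIGKILL. fuser exits non-zero when it finds no
    # such process, which is not an error for our purposes.
    $FUSER -mk "$1" || true
}

handle_action() {
    mount_point="$1"
    action="$2"
    case "$action" in
        start|status)
            # Nothing to do here in this sketch.
            return 0
            ;;
        stop)
            kill_mount_users "$mount_point"
            ;;
        *)
            echo "usage: killnfsusers <mount-point> {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# Only dispatch when called with arguments, so the file can also be
# sourced to reuse the functions.
if [ "$#" -gt 0 ]; then
    handle_action "$@"
fi
```

Assuming it is saved as /etc/ha.d/resource.d/killnfsusers, it would be
listed in haresources as killnfsusers::/nfs-mount-point and placed after
the NFS server resource, because heartbeat stops resources in the reverse
of the order in which they are listed.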

So far this has worked flawlessly for us, although I've only been using
DRBD for a couple of weeks now, so I'd be interested in hearing if anyone
thinks this might be a bad idea.

- Patrick

________________________________

PATRICK K. JAROMIN
VP Technology
574.234-2211 . fax: 312.943.9675

JGSullivan Interactive, Inc.
http://JGSullivan.com


> -----Original Message-----
> From: drbd-user-bounces at linbit.com [mailto:drbd-user-bounces at linbit.com]
> On Behalf Of Dave Dykstra
> Sent: Friday, March 04, 2005 11:21 AM
> To: drbd-user at linbit.com
> Subject: Re: [DRBD-user] HA-NFS & drbd
> 
> [I haven't seen Paul's message come through drbd-user yet but he did
> send it there in addition to direct to me.]
> 
> On Thu, Mar 03, 2005 at 03:53:45PM -0500, Paul Nowoczynski wrote:
> > Hi,
> >
> > >
> > > I need some clarification on this.  I have tried it both ways and
> > > have had a little better luck with having /var/lib/nfs be separate
> > > on the two servers than having it be a symlink to the shared
> > > filesystem.  I have not seen EPERM errors after failover, but I do
> > > sometimes have NFS filesystems show up in 'df' on some clients as
> > > having some huge number of free blocks (which I assumed was a
> > > misinterpretation of -1) until I do "exportfs -r" on the new active
> > > server.  Is that an EPERM error?  It usually is not on all clients,
> > > just some of the ones in the netgroup that the filesystem is
> > > exported to (and I've even seen it when explicitly listing all the
> > > hosts in /etc/exports).  I'm quite sure I saw these problems whether
> > > /var/lib/nfs was a symlink to the shared filesystem or not.  If I
> > > make /var/lib/nfs a symlink to the shared filesystem, then it
> > > disappears on the standby server and 'df' hangs there when it gets
> > > to the shared filesystem, which I have mounted from the active
> > > server by NFS.
> >
> > After some trip-ups, I've got a similar config working very well.
> > I've been sharing /var/lib/nfs the entire time and have not seen any
> > problems - even after 20 or 30 failovers (hard and soft).  One thing
> > I have learned is that running an nfs server on your standby machine
> > is a bad idea.  I chkconfig nfs off on both of the machines and let
> > heartbeat start nfsd after it has mounted drbd.  If the machine is
> > on standby, nfsd is not started at all.  One time, after a machine was
> > rebuilt, nfsd was on by default and failovers ceased to work at all.
> 
> I'm not running nfsd on the standby server either.  I was only talking
> about acting as an NFS client.
> 
> > I have not been mounting the ha share on the failover cluster nodes,
> > but I don't think that would be a bad thing - unless the nfs client
> > translates the vip to the loopback.  Are you explicitly mounting the
> > vip via the nfs client or the real ip?  I'd recommend that you don't
> > allow any non-vital processes on your failover cluster.
> 
> I'm mounting the filesystem via the virtual IP address so it can continue
> to work before & after a failover.
> 
> > paul
> 
> - Dave
> 
> > >
> > > Is it in general a Very Bad Thing to NFS-mount the shared filesystem
> > > on the HA-NFS servers?  I have never seen anybody explicitly state
> > > that, although I'm beginning to come to that conclusion.  I've had
> > > problems with fuser hanging on failover, and even after avoiding
> > > that I still sometimes see hangs on shutdown that I'm quite sure are
> > > related to operations attempting to access the non-responding NFS
> > > mountpoint.  In my case the shared filesystem holds almost
> > > everybody's home directories so it's rather a pain to not be able to
> > > access them on the standby shared file server.  I need to allow
> > > people to log in to the active server so I'd have to have their home
> > > directories be set up there to be symlinks directly to the mounted
> > > filesystem (because that's how we do CVS accesses to avoid problems
> > > with CVS over NFS), but that means that when a failover happens
> > > every process that is directly accessing the filesystem will get
> > > killed, which isn't very friendly.
> > >
> > > - Dave Dykstra
> > > _______________________________________________
> > > drbd-user mailing list
> > > drbd-user at lists.linbit.com
> > > http://lists.linbit.com/mailman/listinfo/drbd-user
> > >