[DRBD-user] Stale NFS file handle problem

Thu Nov 3 08:33:32 CET 2011

N

Wish I could help, but I just stopped trying to solve a very similar problem.  Part of my design issue was (I believe) that I was mounting the NFS shares on the NFS servers themselves. The idea was that both of my servers would always have access to /data/share, NFS-mounted from the floating IP, so each server could also be a client. That broke frustratingly (many zombie processes, much hair lost), and I eventually decided to go with a less exciting active/passive, shutdown-to-migrate-your-VM solution.

My NFS-fu was weak.  But this solution is simpler and less prone to zombie attacks.

Where are you mounting your exports from?  A third machine?  FWIW, I had THAT working just fine.  I did quite a bit of testing, reading and writing large files and reading and writing many little files, whilst doing the switchover, and it seemed to work well.  I probably followed the exact same HOWTOs you did - the Linode Library one was memorable.  As long as I wasn't making my machines be both NFS servers AND NFS clients, it was smooth.
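
For anyone wanting to repeat that kind of test, here is a minimal sketch in Python. It is not the script I used; the function names, file names and sizes are all made up for illustration. It writes one large file and many small ones, reads each back, and counts how many writes hit ESTALE ("stale NFS file handle"). Run it in a loop on the client against the NFS mount while you trigger the switchover. The example at the bottom only uses a temporary local directory so that it runs anywhere.

```python
import errno
import hashlib
import os
import tempfile


def write_and_verify(directory, name, size_bytes):
    """Write random data to directory/name, read it back, compare checksums.

    Returns "ok", "mismatch", or "stale" (the OS reported ESTALE).
    """
    path = os.path.join(directory, name)
    data = os.urandom(size_bytes)
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the server
        with open(path, "rb") as f:
            read_back = f.read()
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"
        raise
    same = hashlib.sha256(read_back).digest() == hashlib.sha256(data).digest()
    return "ok" if same else "mismatch"


def run_round(directory, large_size=4 * 1024 * 1024, small_count=50, small_size=1024):
    """One test round: a single large file, then many small files."""
    results = {"ok": 0, "mismatch": 0, "stale": 0}
    results[write_and_verify(directory, "large.bin", large_size)] += 1
    for i in range(small_count):
        results[write_and_verify(directory, f"small_{i}.bin", small_size)] += 1
    return results


if __name__ == "__main__":
    # Assumption: replace the temporary directory with the NFS mount point
    # on the client when testing a real failover.
    with tempfile.TemporaryDirectory() as scratch:
        print(run_round(scratch))
```

Note that each file is opened, written and closed in one go, so this mostly exercises new files. To reproduce errors on files that are open during the switch, you would need to hold a file open across the failover and keep writing to it.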

I got some value out of running all this testing inside virtual machines - made it easier to reboot/restore/start-from-scratch when it all went wrong.

N

On 2 Nov 2011, at 12:21, Mike wrote:

> Hello list,
> 
> 	I am working with an identical pair of boxes running Debian squeeze and have followed HOWTOs to configure them as master/slave, replicated with DRBD + heartbeat. For the most part, everything works: replication is happening, I can shut down the primary and the file system is still available, and I can go back and forth. However, after a switch, I get 'stale nfs file handle' errors for any files being written to at the time the switch happens. This appears to affect only those files; any new files touched/created have no such issue.
> 
> 	I have 'fsid=1' in both /etc/exports. I am mounting the nfs server with UDP. I have /var/lib/nfs on the shared file system. I have -t servername on my rpc options. I have rechecked all of these on both servers. The file system in question is an LVM volume (/dev/vg00/replicated) containing a btrfs file system. I have looked at every google result about this issue and nothing has seemed to help. I've changed the nfs-kernel-server script to kill with -9 and I removed the 'exportfs -au' lines as well. No dice.
> 
> 	Any help out there?
> 
> 	Thanks.
> 
> Mike-