[DRBD-user] Stale NFS file handle vs. NFS-Server-README.txt

Wed Jun 9 02:58:23 CEST 2004

On Tue, Jun 08, 2004 at 11:35:58PM +0200, Lars Ellenberg wrote:
> / 2004-06-08 22:58:21 +0200
> \ Bernd Schubert:
> 
> > I still don't understand what's causing this, but I'm pretty sure that the Debian
> > nfs-kernel-server script cannot stop the nfs server when nfs is started from
> > heartbeat. Just check it yourself: after running '/etc/init.d/heartbeat
> > stop', 'ps ax' should still show nfs daemons running on this system. Those nfsd
> > processes can only be killed with 'killall -9 nfsd'. So I think that after
> > stopping nfs, the nfs daemons survive and cause a stale file handle when
> > drbd is stopped; they probably also ignore the 'exportfs -au' command.

In my case it _only_ works _with_ 'exportfs -au', plus the delay I
already mentioned.
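The sequence that works here (unexport first, wait, then unmount) could be wrapped in a small stop helper. This is a minimal sketch for a POSIX sh; the function name `unexport_and_umount` is hypothetical, and the default delay of 5 seconds is an assumption, since the actual delay is not stated in this thread. Note that 'exportfs -au' unexports *all* directories, not just the one being unmounted.

```shell
# Hypothetical helper, not part of any existing init script.
# Usage: unexport_and_umount MOUNTPOINT [DELAY_SECONDS]
unexport_and_umount() {
    mnt=$1
    # Assumed default; the real delay needed is setup-specific.
    delay=${2:-5}
    # Unexport every directory (affects all exports, not only $mnt).
    exportfs -au || return 1
    # Wait before unmounting; in this thread umount only succeeds after a delay.
    sleep "$delay"
    # Still fails with "device is busy" if something holds the mount.
    umount "$mnt"
}
```

For example, `unexport_and_umount /drbd/0 5` would run before drbd is stopped.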

> > I'm also still wondering about the way the Debian script stops nfs. I
> > checked the scripts of several distributions: they either stop the nfsd's
> > immediately with signal 9, or first with signal 15 and, after a short break,
> > with signal 9. The Debian script, however, only sends the daemons signal 1.

Interesting. I'll look at the SuSE/Red Hat scripts then...

> strange thing is that there is a fuser -k -m $device in the drbd script,
> which *should* deliver a kill -9 ...

Yes, I saw that. That's why I tested it in that simple case. I think
the problem is that it doesn't find any PID to kill. The ha-debug
file contains:

  datadisk: ERROR: fuser -k -m /dev/nbd/0 [1]:
  datadisk: ERROR: NO OUTPUT
  datadisk: ERROR: umount -v /dev/nbd/0 [1]:
  datadisk: ERROR: umount: /drbd/0: device is busy

> maybe retrying some more times does help?

That message actually repeats quite often, since heartbeat retries
stopping drbd a few times before it reboots the machine.
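On the retry idea: a generic wrapper would let the stop script retry umount a few times on its own before heartbeat gives up. This is a minimal POSIX sh sketch; the name `retry` and its argument order (attempts, delay in seconds, then the command) are hypothetical. It can only help if whatever holds the filesystem eventually lets go; with a live kernel export it would just fail more slowly.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        # Give up once the attempt budget is used.
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, `retry 5 2 umount /drbd/0` tries the umount up to five times, two seconds apart.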

I haven't tried your patch yet, but I have little hope. Again, that
test case:

  root@yang:~> mount /dev/hda3 /mountpoint
  root@yang:~> exportfs -vi yang:/mountpoint

This is basically what I have if I remove that "exportfs -au" from my
init scripts: nfs-kernel-server is stopped and nfs-common is stopped,
yet umount still won't succeed:

  root@yang:~> umount /mountpoint
  umount: /mountpoint: device is busy
  umount: /mountpoint: device is busy

In this situation fuser doesn't find any PID:

  root@yang:~> fuser -m /mountpoint
  root@yang:~>

So it doesn't kill anything. If I cd into that dir, it behaves as
expected:

  root@yang:~> cd /mountpoint
  root@yang:/mountpoint> fuser -m /mountpoint
  /mountpoint:          8752c
  root@yang:/mountpoint> echo $$
  8752
  root@yang:/mountpoint> fuser -k -m /mountpoint
  /mountpoint:          8752c
  Connection to yang closed.

What PID is fuser _supposed_ to find in this situation? There is no
nfsd running. What exactly does "exportfs yang:/mountpoint" do? It
seems to update the xtab file and pass some info to the kernel (it shows
up in /proc/fs/nfs/exports). Is that what is blocking the umount?
If so, what can one do other than "exportfs -u"?
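To test the theory that the kernel export table is what keeps the filesystem busy, a stop script could check whether the mount point still appears there before calling umount. This is a minimal sketch, assuming each non-comment line of /proc/fs/nfs/exports begins with the exported path followed by whitespace; paths with unusual characters may be escaped in that file and are not handled here. The function name `is_exported` is hypothetical.

```shell
# Hypothetical helper: succeed if PATH is listed as exported.
# Usage: is_exported PATH [TABLE]   (TABLE defaults to /proc/fs/nfs/exports)
is_exported() {
    path=$1
    table=${2:-/proc/fs/nfs/exports}
    [ -r "$table" ] || return 1
    # Skip comment lines, then compare the first field exactly,
    # so /mount does not match /mountpoint.
    grep -v '^#' "$table" |
        awk -v p="$path" '$1 == p { found = 1 } END { exit !found }'
}
```

For example, `is_exported /mountpoint && echo "still exported"` right before the umount would show whether the export survived stopping nfs-kernel-server.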

I'll look at this a little more tomorrow. Thanks for your help so far!

Jens.