Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello Jens,

> I'm trying to set up a drbd+heartbeat NFS-server. Most things work fine,
> but if I write to the NFS storage during failover, I get a Stale NFS
> file handle error:
>
> root:~> cp /tmp/large_file /mnt
> cp: writing `/mnt/large_file': Stale NFS file handle
> cp: closing `/mnt/large_file': Stale NFS file handle
>
> I can get rid of this error, if I insert a small amount of time before
> taking over the IP:
>
> [/etc/ha.d/haresources]
> node1 datadisk::drbd0 nfs-kernel-server nfs-common \
>       wait_n_seconds::5 IPaddr::160.45.32.173
> [snip]
> /var/lib/nfs is on the shared device and as I said, everything works
> fine (no data corruption whatsoever), iff I insert the small delay.
> So this is not a big problem, but I would like to understand, why
> noone else seems to have this problem.
>

I had the same problem, but solved it another way. Actually, I think
this is not a drbd issue at all, but rather a heartbeat/nfs/debian
problem, so I reported it about 2 or 3 weeks ago to the heartbeat and
nfs MLs and to the debian nfs maintainer, unfortunately without any
answer.

I still don't understand what's causing this, but I'm pretty sure that
the debian nfs-kernel-server script cannot stop the nfs server when nfs
is started from heartbeat. Just check it yourself: after running
'/etc/init.d/heartbeat stop', a 'ps ax' should still show running nfs
daemons on this system. Those nfsd processes can only be killed with
'killall -9 nfsd'. So I think that after stopping nfs, the nfs daemons
survive and cause a stale file handle when drbd is stopped; probably
they also ignore the 'exportfs -au' command.

I'm also still wondering about the way the debian script stops nfs. I
checked the scripts of several distributions, and there the nfsd's are
either stopped immediately with signal 9, or first with signal 15 and
after a short break with signal 9. The debian script, however, only
stops the daemons with signal 1.
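For illustration, the "signal 15 first, short break, then signal 9" pattern used by the other distributions' scripts could be sketched roughly as below. This is only a minimal sketch, not code taken from any distribution's init script: the function name stop_with_escalation and its PID and grace-period arguments are hypothetical, and it works on a single PID rather than on all nfsd processes by name.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the debian script):
# stop one process by PID, politely first, forcibly if it does not exit.
stop_with_escalation() {
    pid=$1
    grace=${2:-5}    # seconds to wait before escalating (assumed default)

    # Ask nicely with signal 15; if this fails, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second for up to $grace seconds.
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # exited after SIGTERM
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: escalate to signal 9.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

Called as 'stop_with_escalation "$pid" 5', this waits up to five seconds after the first signal before sending signal 9, which corresponds to the "short break" mentioned above.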
Currently our server is down (trouble with the transtec 5008 ide/scsi
raid system), so I can't send you my modified nfs-kernel-server script,
but I think I only added a sleep and an nfsd stop with signal 9 after
the first stop signal, so something like this:

    ...
    start-stop-daemon --stop --oknodo --quiet \
        --name nfsd --user 0 --signal 2
    sleep 5
    start-stop-daemon --stop --oknodo --quiet \
        --name nfsd --user 0 --signal 9

However, now I don't understand why it works if you add your 5s
break...

Could you please confirm the nfs-stop problem?

Cheers,
Bernd