Hello Jens,
> I'm trying to set up a drbd+heartbeat NFS-server. Most things work fine,
> but if I write to the NFS storage during failover, I get a Stale NFS
> file handle error:
>
> root:~> cp /tmp/large_file /mnt
> cp: writing `/mnt/large_file': Stale NFS file handle
> cp: closing `/mnt/large_file': Stale NFS file handle
>
> I can get rid of this error, if I insert a small amount of time before
> taking over the IP:
>
> [/etc/ha.d/haresources]
> node1 datadisk::drbd0 nfs-kernel-server nfs-common \
> wait_n_seconds::5 IPaddr::160.45.32.173
>
[snip]
> /var/lib/nfs is on the shared device and as I said, everything works
> fine (no data corruption whatsoever), iff I insert the small delay.
> So this is not a big problem, but I would like to understand, why
> noone else seems to have this problem.
>
I had the same problem, but solved it another way. Actually, I think this is
not a drbd issue at all, but rather a heartbeat/nfs/debian problem, so I
reported it about 2 or 3 weeks ago to the heartbeat and nfs MLs and to the
debian nfs maintainer, unfortunately without any answer.
I still don't understand what's causing this, but I'm pretty sure that the debian
nfs-kernel-server script cannot stop the nfs server when nfs is started from
heartbeat. Just check it yourself: after running '/etc/init.d/heartbeat
stop', a 'ps ax' should still show running nfs daemons on this system. Those nfsd
processes can only be killed with 'killall -9 nfsd'. So I think that after
nfs is stopped, the nfs daemons survive and cause a stale filehandle when
drbd is stopped; probably they also ignore the 'exportfs -au' command.
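The check described above can be sketched as a small shell helper. It is only
an illustration: count_procs is a hypothetical name, and it assumes the usual
five-column 'ps ax' layout (PID TTY STAT TIME COMMAND), in which the kernel
nfsd threads show up as [nfsd].

```shell
# count_procs NAME: read 'ps ax' style output on stdin and print how many
# lines have NAME or [NAME] as their whole command column.
# Hypothetical helper; assumes the default PID TTY STAT TIME COMMAND layout.
count_procs() {
    name=$1
    count=0
    while read -r _pid _tty _stat _time cmd; do
        case $cmd in
            "$name"|"[$name]") count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Intended use after '/etc/init.d/heartbeat stop':
#   ps ax | count_procs nfsd
# Anything other than 0 means nfsd threads survived the stop.
```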
I'm also still wondering about the way the debian script stops nfs. I
checked the scripts of several distributions, and the nfsds are either stopped
immediately with signal 9, or first with signal 15 and, after a short break, with
signal 9. The debian script, however, only stops the daemons with signal 1.
Currently our server is down (trouble with the transtec 5008 ide/scsi raid
system), so I can't send you my modified nfs-kernel-server script, but I
think I only added a sleep and a second nfsd stop with signal 9 after the first
stop signal, so something like this:
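...
# first stop request, as in the original script
start-stop-daemon --stop --oknodo --quiet \
    --name nfsd --user 0 --signal 2
# give the daemons a moment, then force-kill whatever is left
sleep 5
start-stop-daemon --stop --oknodo --quiet \
    --name nfsd --user 0 --signal 9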
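The same "ask first, wait, then force" pattern can be written as a generic
helper. This is a rough sketch, not the debian script: stop_with_escalation
is a hypothetical name, it works on a single PID rather than matching by
name as start-stop-daemon does, and it polls once per second instead of
sleeping a fixed 5 seconds.

```shell
# stop_with_escalation PID [GRACE]: send signal 2 (INT), wait up to GRACE
# seconds (default 5) for the process to go away, then send signal 9 (KILL).
# Hypothetical helper for illustration only.
stop_with_escalation() {
    pid=$1
    grace=${2:-5}
    # First, the polite request; if the PID is already gone we are done.
    kill -s INT "$pid" 2>/dev/null || return 0
    i=0
    while [ "$i" -lt "$grace" ]; do
        # kill -0 only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Still there after the grace period: force it.
    kill -s KILL "$pid" 2>/dev/null
    return 0
}
```

For example, 'stop_with_escalation 1234 5' would mirror the signal 2,
sleep 5, signal 9 sequence for one process with PID 1234.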
However, now I don't understand why it works if you add your 5s break...
Could you please confirm the nfs-stop problem?
Cheers,
Bernd