Hi!
I'm trying to set up a drbd+heartbeat NFS-server. Most things work fine,
but if I write to the NFS storage during failover, I get a Stale NFS
file handle error:
root:~> cp /tmp/large_file /mnt
cp: writing `/mnt/large_file': Stale NFS file handle
cp: closing `/mnt/large_file': Stale NFS file handle
I can get rid of this error if I insert a short delay before
taking over the IP:
[/etc/ha.d/haresources]
node1 datadisk::drbd0 nfs-kernel-server nfs-common \
wait_n_seconds::5 IPaddr::160.45.32.173
(wait_n_seconds::5 just sleeps for 5 seconds.) Putting the IP address
first, as suggested in http://www.slackworks.com/~dkrovich/DRBD/heartbeat.html,
doesn't work at all.
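For readers who want to try the same delay: a resource script along the lines of wait_n_seconds could look roughly like the sketch below. This is not the script used above, only a minimal illustration. It assumes that heartbeat turns "wait_n_seconds::5" into a call of the form "wait_n_seconds 5 start" (or stop/status), and that a "stopped" status reply makes heartbeat run the start action on every takeover.

```shell
#!/bin/sh
# Hypothetical sketch of a heartbeat resource script, e.g. installed as
# /etc/ha.d/resource.d/wait_n_seconds (not the original script).
# Assumption: for "wait_n_seconds::5" in haresources, heartbeat calls
#   wait_n_seconds 5 start|stop|status

wait_n_seconds() {
    seconds=$1
    action=$2
    case "$action" in
        start)
            # Delay the resources listed after this one (here: IPaddr).
            sleep "$seconds"
            ;;
        stop)
            # Nothing to undo when the resource group is released.
            :
            ;;
        status)
            # Assumption: reporting "stopped" keeps heartbeat from
            # skipping the start action on takeover.
            echo "stopped"
            ;;
        *)
            echo "usage: wait_n_seconds <seconds> {start|stop|status}" >&2
            return 1
            ;;
    esac
    return 0
}

# Run only when called with arguments, as heartbeat would do.
if [ "$#" -gt 0 ]; then
    wait_n_seconds "$@"
fi
```

Since resources are started from left to right, the delay takes effect between the NFS services coming up and the IP address being taken over.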
drbd/documentation/NFS-Server-README.txt suggests removing any
"exportfs -au" from the nfs init scripts. But then heartbeat can no
longer unmount the drbd device on failover, which leads to a reboot of
the primary node followed by a re-sync:
node1 datadisk: ===> datadisk drbd0 stop <===
node1 datadisk: 'drbd0' /dev/nbd/0 is mounted on /drbd/0, trying to unmount
node1 datadisk: 'drbd0' trying to kill users of /dev/nbd/0
node1 datadisk: fuser -k -m /dev/nbd/0
node1 datadisk: umount -v /dev/nbd/0
node1 heartbeat: CRIT: Resource STOP failure. Reboot required!
node1 heartbeat: CRIT: Killing heartbeat ungracefully!
This behaviour can be reproduced as follows:
root:~> mount /dev/hda3 /mountpoint
root:~> exportfs -vi node1:/mountpoint
exporting node1.physik.fu-berlin.de:/mountpoint
exporting node1.physik.fu-berlin.de:/mountpoint to kernel
root:~> umount /mountpoint
umount: /mountpoint: device is busy
umount: /mountpoint: device is busy
root:~> fuser -k -m /mountpoint
root:~> [NO OUTPUT]
After issuing an "exportfs -au" the filesystem can be unmounted:
root:~> exportfs -au
root:~> umount /mountpoint
root:~> [WORKS]
Thus I cannot understand how the advice given in
NFS-Server-README.txt could have worked.
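The order that works in the reproduction above (unexport first, then unmount) can be written as a small helper. This is only a sketch: the function name unexport_and_umount and the EXPORTFS/UMOUNT override variables are made up for illustration, and "exportfs -au" drops all exports on the machine, not just the one on the given mount point.

```shell
#!/bin/sh
# Hypothetical helper: unexport everything, then unmount.
# EXPORTFS and UMOUNT can be overridden (e.g. with "echo exportfs")
# to dry-run the sequence without root privileges.
EXPORTFS=${EXPORTFS:-exportfs}
UMOUNT=${UMOUNT:-umount}

unexport_and_umount() {
    mountpoint=$1
    if [ -z "$mountpoint" ]; then
        echo "usage: unexport_and_umount <mountpoint>" >&2
        return 2
    fi
    # While the filesystem is exported, umount fails with
    # "device is busy", so drop the exports first.
    $EXPORTFS -au || return 1
    $UMOUNT "$mountpoint"
}
```

A dry run with EXPORTFS="echo exportfs" and UMOUNT="echo umount" set beforehand just prints the two commands in the order they would be executed.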
/var/lib/nfs is on the shared device and, as I said, everything works
fine (no data corruption whatsoever) iff I insert the small delay.
So this is not a big problem, but I would like to understand why
no one else seems to have this problem.
Using:
drbd-0.6.12
heartbeat-1.2.2
kernel 2.4.26
debian woody
Any help is greatly appreciated,
Jens.