Hi!

I'm trying to set up a drbd+heartbeat NFS server. Most things work fine, but if I write to the NFS storage during failover, I get a "Stale NFS file handle" error:

    root:~> cp /tmp/large_file /mnt
    cp: writing `/mnt/large_file': Stale NFS file handle
    cp: closing `/mnt/large_file': Stale NFS file handle

I can get rid of this error if I insert a short delay before taking over the IP:

    [/etc/ha.d/haresources]
    node1 datadisk::drbd0 nfs-kernel-server nfs-common \
          wait_n_seconds::5 IPaddr::160.45.32.173

(wait_n_seconds::5 just sleeps for 5 seconds.)

Putting the IP in front, as suggested in http://www.slackworks.com/~dkrovich/DRBD/heartbeat.html, doesn't work at all.

drbd/documentation/NFS-Server-README.txt suggests removing any "exportfs -au" from the NFS init scripts. But then heartbeat can no longer unmount the drbd device on failover, which leads to a reboot of the primary node followed by a re-sync:

    node1 datadisk: ===> datadisk drbd0 stop <===
    node1 datadisk: 'drbd0' /dev/nbd/0 is mounted on /drbd/0, trying to unmount
    node1 datadisk: 'drbd0' trying to kill users of /dev/nbd/0
    node1 datadisk: fuser -k -m /dev/nbd/0
    node1 datadisk: umount -v /dev/nbd/0
    node1 heartbeat: CRIT: Resource STOP failure. Reboot required!
    node1 heartbeat: CRIT: Killing heartbeat ungracefully!

This behaviour can be reproduced by:

    root:~> mount /dev/hda3 /mountpoint
    root:~> exportfs -vi node1:/mountpoint
    exporting node1.physik.fu-berlin.de:/mountpoint
    exporting node1.physik.fu-berlin.de:/mountpoint to kernel
    root:~> umount /mountpoint
    umount: /mountpoint: device is busy
    umount: /mountpoint: device is busy
    root:~> fuser -k -m /mountpoint
    root:~> [NO OUTPUT]

After issuing an "exportfs -au", the filesystem can be unmounted:

    root:~> exportfs -au
    root:~> umount /mountpoint
    root:~> [WORKS]

Thus I cannot understand how the advice given in NFS-Server-README.txt could have worked.
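
For reference, the wait_n_seconds resource from the haresources line is nothing more than a sleep wrapped in a resource script. The following is only a minimal sketch, not the exact script in use. It assumes heartbeat calls resource scripts as `<script> <argument> {start|stop|status}`, that the script would live somewhere like /etc/ha.d/resource.d/wait_n_seconds, and that always answering "stopped" to a status request is acceptable for a resource that has no state of its own:

```shell
#!/bin/sh
# Hypothetical /etc/ha.d/resource.d/wait_n_seconds (illustrative sketch).
# Assumed invocation by heartbeat: wait_n_seconds <seconds> {start|stop|status}

wait_n_seconds() {
    seconds="$1"
    action="$2"
    case "$action" in
        start)
            # Delay the resources listed after this one (here: IPaddr).
            sleep "$seconds"
            ;;
        stop)
            # Nothing to release on stop.
            ;;
        status)
            # No persistent state, so always report "stopped" (assumption).
            echo "stopped"
            ;;
        *)
            echo "usage: wait_n_seconds <seconds> {start|stop|status}" >&2
            return 1
            ;;
    esac
    return 0
}

# Run only when called with arguments, as heartbeat would do.
if [ "$#" -gt 0 ]; then
    wait_n_seconds "$@"
fi
```

With "wait_n_seconds::5" in haresources, this would be run as "wait_n_seconds 5 start" after nfs-common has started and before IPaddr brings up the address.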

/var/lib/nfs is on the shared device and, as I said, everything works fine (no data corruption whatsoever) iff I insert the small delay. So this is not a big problem, but I would like to understand why no one else seems to have it.

Using:

    drbd-0.6.12
    heartbeat-1.2.2
    kernel 2.4.26
    debian woody

Any help is greatly appreciated,

Jens.