Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2004-10-14 23:59:42 -0400 \ Omar Kilani:
> Hello, World,
>
> I'm in the process of moving my RHEL3-based NFS server onto DRBD.
> Setup and configuration is fine, replication works, etc.
>
> Unfortunately, if I stop the NFS service, remount the filesystem that
> NFS exports onto the DRBD device, and start up again, my NFS clients
> start receiving 'stale file handle' errors. AFAICS, this should work,
> since NFS has no inherent tie to the lower level device -- it sits on
> top of the file system, after all.
>
> Stopping NFS, unmounting from the DRBD block dev, remounting the
> lower-level device and starting NFS again gets things working on the
> client side. So I was wondering what the problem could be?

NFS "handles" are, in most implementations, basically just
<block device number>:<inode number>, which is supposed to be unique
and easily available.

But this means that if you change something on your NFS server, and
the data is now on some other block device (drbd), the block device
number changes, and thus all handles are invalid... if you change
back, the handles are suddenly valid again.

So there is no way (without patching the NFS server to support a
configurable mapping of exported handle numbers to block device
numbers) to migrate the data on the NFS server to some other block
device without rebooting / forcefully remounting the clients.
[see the stat illustration after the signature]

> I'm using an *external* meta data device.
>
> Oh, one last question. I've got:
>
> wfc-timeout 60;
> degr-wfc-timeout 120; # 2 minutes.

It does not make sense at all to have a degraded timeout that is less
than your non-degraded timeout.

> yet, when the drbd initscript runs, it just waits forever (although the
> message does say "the timeout for resource X is 60 seconds"). I'm
> obviously missing something... I haven't used drbd in production since
> 2001 (version 0.5.8 is still running well... :) so I'm probably not up
> with the latest configuration syntax. :)

You need to specify the timeouts for all resources. The message only
prints the values for the first resource in the configuration file,
so if one of your later resources happens to have the default of "0",
it would print something about 60 seconds, but it would still wait
forever. [see the config sketch after the signature]

This is maybe suboptimal, and I guess we could move those timeouts
into the global section. But it is more flexible the way it is, and
there may be configurations where different timeouts for different
resources do make sense.

	Lars Ellenberg

--
please use the "List-Reply" function of your email client.
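
A rough illustration of the handle point above, with made-up device
names, mount point and file (none of this is from Omar's actual
setup): the inode number of an exported file survives the move onto
drbd, but the device number does not.

    # hypothetical example: the same filesystem, mounted first from the
    # plain partition, then from the drbd device sitting on top of it
    mount /dev/sda5 /export
    stat -c 'dev=%d ino=%i %n' /export/somefile
    umount /export
    mount /dev/drbd0 /export
    stat -c 'dev=%d ino=%i %n' /export/somefile
    # st_ino is unchanged, st_dev is not -- so every <dev>:<ino> handle
    # the clients still cache refers to a device number that no longer
    # matches, hence "stale file handle".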
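
And a sketch of the per-resource timeouts in a drbd 0.7 style
drbd.conf, assuming two placeholder resources r0 and r1 (the names and
the second resource are invented for illustration): the wfc timeouts
live in each resource's startup section, so any resource that omits
them falls back to the default of 0 and waits forever.

    resource r0 {
        startup {
            wfc-timeout       60;   # seconds
            degr-wfc-timeout 120;   # 2 minutes
        }
        # disk, net, syncer and on <host> sections omitted
    }

    resource r1 {
        startup {
            wfc-timeout       60;   # must be repeated here; without it,
            degr-wfc-timeout 120;   # r1 uses the default of 0 = forever
        }
        # ...
    }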