Raoul Borenius wrote:
> On Mon, Dec 12, 2005 at 10:27:36AM -0500, Todd Denniston wrote:
>
>>>>If it looks like the drbd failover worked, that is, the /var/lib/nfs
>>>>symlink was pointed to drbd0, and drbd0 was there before nfs started.. it
>>>>may be something else that is causing stales during failover.
>>
>>Just for clarification:
>>You indicate you use a softlink to put the drbd0 filesystem in place of
>>/var/lib/nfs, but above you mount drbd0 at /var/lib/nfs ... which is it,
>>and is it the same on BOTH systems?
>
> Both nodes are always setup identical. I just tried two different setups:
>
> 1.) the 'simple' setup with only one drbd as described in
>     http://linux-ha.org/DRBD_2fNFS, i.e. with a symlink from /var/lib/nfs
>     to the drbd-volume. Everything works perfectly, no errors on failover.
>
> 2.) a setup without symlink but /var/lib/nfs as /dev/drbd0 and /srv/nfs/mail
>     as /dev/drbd1. I get stale nfs errors during failover.
>

I find setup 2 to be interesting in its failure, because I would think it
should work the same, but I have no idea why it fails.

>>I have several exports, from different drbd managed partitions, with a
>>softlink /var/lib/nfs -> /nfsstate/var/lib/nfs where nfsstate is my drbd0
>>which until recently did not have problems with stale nfs ... WAIT A
>>MINUTE, now I know why I was seeing stale nfs handles the last time I fell
>>over: after doing an `ls -l /var/lib/nfs`, I see that for some reason the
>>systems have blown away my softlinks and replaced with new boot disk
>>resident directories, must fix. Thanks for the wake up.
>
> Could you send me your /etc/drbd.conf and /etc/ha.d/haresources?
> I'm not sure if you have the setup that I would like to have.
>

I would, but because I am currently using 0.6.13 I doubt you would actually
be able to get anything out of it other than the fact I put the different
partitions in different sync-groups.

To me your haresources snippet looks reasonable for 0.7.x, though I question
two things:

1) Why do you run some kind of killnfsd before running `nfs-common start`
   and `nfs-kernel-server start`? If nfs was for some reason already running
   when heartbeat got kicked off, what you really need to do is edit your rc
   config (on Fedora I would use ntsysv) so that nfs services are not
   started on boot.

2) Why do you have a 3 second sleep between starting nfs and starting your
   Ethernet config?

I did just have a thought on your setup #2 (mounting /dev/drbd0 at
/var/lib/nfs): try putting a 3 second sleep between the Filesystem call and
the nfs startups. The thought is to make sure the /var/lib/nfs filesystem is
fully ready for use before nfs tries to use it, i.e., to see whether it is a
race condition.
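
If the fixed sleep does turn out to make a difference, a variation on the same idea would be to poll until /var/lib/nfs actually shows up as mounted rather than waiting a blind 3 seconds. What follows is only a rough sketch: the function names `is_mounted` and `wait_for_mount` and the `MOUNTS_FILE` override are made up for illustration and are not part of heartbeat or drbd, and it assumes a Linux system where /proc/mounts is available.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat or drbd): wait until a
# directory appears as a mount point instead of sleeping a fixed time.
# MOUNTS_FILE is an illustrative override; it defaults to /proc/mounts.

is_mounted() {
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${MOUNTS_FILE:-/proc/mounts}"
}

wait_for_mount() {
    dir=$1
    tries=${2:-3}          # number of one-second retries, default 3
    while [ "$tries" -gt 0 ]; do
        is_mounted "$dir" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    is_mounted "$dir"      # final check; its status is the return value
}
```

Something like `wait_for_mount /var/lib/nfs 3 || exit 1` would then run before the nfs init scripts are started. How you hook that in depends on your own resource scripts, and it only tells you that the mount is listed, so it is a diagnostic for the race theory rather than a guaranteed fix.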