Lars Marowsky-Bree wrote:
>
> On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston at ssa.crane.navy.mil> wrote:
>
> > I really hate when I am attempting to manually do a fallover[1] between
> > machines and it fails when the original primary can't seem to let go of a
> > drbd resource because of something in the kernel holding on (aka lsof &
> > fuser cant find anything).
> >
> > [1] issue `service heartbeat stop` on redhat/fedora.
>
> That really shouldn't happen.
>
> I trust that you are quite aware of how to use fuser/lsof and were
> looking for the right things in there... fuser w/ and w/o -m on the drbd
> block device _ought_, in theory, to list all the files. And if it is not
> mounted (check via /proc/mounts), NFS shouldn't be able to have any
> hidden references to it either.
>
> If all these predicates are right and it still can't set the device to
> secondary mode claiming something has the device opened, that would be a
> bug.
>

Fedora Core 1 (yum'ed up to date with fedora legacy)
kernel-source-2.4.22-1.2199.nptl with patches for aic7xxx R6.3.5
DRBD 0.6.13
heartbeat 1.0.4-2.rh.9
with 7 resources in my drbd.conf
NFS clients varying among RedHat 6 - FC 4, Slack 8 - 10, sun os - solaris 10.

Yes, ancient I know, the upgrade is planned for soon, if I can get the
confidence that drbd 0.7.x + linux 2.6.x + SCSI + single processor is stable
(see this mailing list for recent conversations of corruption).

When I do the 'service heartbeat stop', sometimes (not always) the
/etc/ha.d/resource.d/datadisk will give a message about
`fuser -k -m device` failing, then I usually try a few more times with
`fuser -v -k -m /dev/nb1` and it almost never shows me anything and if it
has failed once it never gets the device released, so I can't umount the
filesystem on the device.
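
The checks discussed above (is the device still listed in /proc/mounts, and
what does fuser report with and without -m) could be scripted roughly as
follows. This is a minimal sketch, not what the datadisk script actually
does: the function names and the optional mounts-file argument are
hypothetical, and /dev/nb1 is simply the device named in this post. It can
only report user-space holders, so it would not reveal a reference held
inside the kernel.

```shell
#!/bin/sh
# Sketch only. Function names and the optional second argument are
# illustrative assumptions, not part of drbd or heartbeat.

# Print the mount point(s) of device $1 as listed in a mounts table
# ($2, defaulting to /proc/mounts). Returns 0 if listed, 1 if not.
mount_points_of() {
    awk -v dev="$1" '$1 == dev { print $2; found = 1 }
                     END { exit !found }' "${2:-/proc/mounts}"
}

# Report what still references device $1 from user space.
check_device() {
    dev=$1
    if mp=$(mount_points_of "$dev"); then
        echo "$dev is still mounted on: $mp"
        # With -m: processes using any file on the mounted filesystem.
        fuser -v -m "$dev"
    else
        echo "$dev is not listed in /proc/mounts"
    fi
    # Without -m: processes that have the block device node itself open.
    fuser -v "$dev"
}
```

Run as root on the node that is giving up the resource, for example
`check_device /dev/nb1`, before retrying the stop.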
I usually resort to sync a few times, remounting the filesystem read only,
sync a few more times, try umounting and fusering in vain hope, issue
reboot/halt and when the box stops responding (remember the kernel can't let
go) push the power button.

IIRC it is always /dev/nb1 that causes the problem, this device happens to
be the nfs "home" directory for my users.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter