[DRBD-user] Re: Unable to make DRBD Resource Secondary

Todd Denniston Todd.Denniston at ssa.crane.navy.mil
Thu Nov 10 16:28:39 CET 2005


Lars Marowsky-Bree wrote:
> 
> On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston at ssa.crane.navy.mil> wrote:
> 
> > I really hate when I am attempting to manually do a failover[1] between
> > machines and it fails when the original primary can't seem to let go of a
> > drbd resource because of something in the kernel holding on (aka lsof &
> > fuser can't find anything).
> >
> > [1] issue `service heartbeat stop`  on redhat/fedora.
> 
> That really shouldn't happen.
> 
> I trust that you are quite aware of how to use fuser/lsof and were
> looking for the right things in there... fuser w/ and w/o -m on the drbd
> block device _ought_, in theory, to list all the files. And if it is not
> mounted (check via /proc/mounts), NFS shouldn't be able to have any
> hidden references to it either.
> 
> If all these predicates are right and it still can't set the device to
> secondary mode claiming something has the device opened, that would be a
> bug.
> 
Fedora Core 1 (yum'ed up to date with fedora legacy)
kernel-source-2.4.22-1.2199.nptl 
      with patches for aic7xxx R6.3.5
DRBD 0.6.13
heartbeat 1.0.4-2.rh.9
with 7 resources in my drbd.conf
NFS clients ranging from RedHat 6 to FC 4, Slackware 8 to 10, and SunOS to
Solaris 10.

Yes, ancient, I know. The upgrade is planned for soon, once I am confident
that drbd 0.7.x + linux 2.6.x + SCSI + single processor is stable
(see this mailing list for recent conversations about corruption).

When I do the 'service heartbeat stop', sometimes (not always)
/etc/ha.d/resource.d/datadisk gives a message about `fuser -k -m device`
failing. I then usually try a few more times with
`fuser -v -k -m /dev/nb1`, which almost never shows me anything. Once it
has failed, the device never gets released, so I can't umount the
filesystem on the device.  I usually resort to running sync a few times,
remounting the filesystem read-only, syncing a few more times, trying umount
and fuser again in vain hope, and then issuing reboot/halt. When the box
stops responding (remember, the kernel can't let go) I push the power button.
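For reference, the manual sequence above could be written down roughly as
follows. This is only a dry-run sketch: `run` and `release_device` are
made-up helper names, and the mount point `/home` in the usage line is an
assumption (it is whatever the filesystem on the device is mounted on).
Nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of the manual release sequence; prints commands unless RUN=1.
# "run" and "release_device" are hypothetical helpers, not part of heartbeat/drbd.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# $1 = block device (e.g. /dev/nb1), $2 = its mount point (assumed by the caller)
release_device() {
    dev=$1
    mnt=$2
    run fuser -v -k -m "$dev"        # kill user-space holders, verbosely
    run sync                         # flush dirty buffers
    run mount -o remount,ro "$mnt"   # stop further writes
    run sync
    run umount "$mnt"                # the step that fails when the kernel holds on
}
```

Calling `release_device /dev/nb1 /home` without RUN=1 just prints the five
commands in order, which is handy for checking the sequence before running
it on a live primary.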

IIRC it is always /dev/nb1 that causes the problem; this device happens to
hold the NFS "home" directory for my users.
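The checks suggested in the quoted text (is the device still listed in
/proc/mounts, and does fuser see any holder) can be sketched as below.
`is_mounted` and `check_device` are hypothetical names, and the optional
second argument (an alternate mounts file) exists only so the sketch can be
tried without a real drbd device.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the device appears in a mounts table.
# $1 = device (e.g. /dev/nb1), $2 = mounts file (defaults to /proc/mounts)
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # Field 1 of each line is the device; exit 0 only if one line matches.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Hypothetical helper: report mount state and any user-space holders.
check_device() {
    dev=$1
    mounts=${2:-/proc/mounts}
    if is_mounted "$dev" "$mounts"; then
        echo "$dev is still mounted"
        # fuser exits non-zero when it finds no process using the filesystem
        fuser -v -m "$dev" 2>&1 || echo "fuser reports no processes on $dev"
    else
        echo "$dev is not mounted"
    fi
}
```

If `check_device /dev/nb1` says the device is not mounted and fuser shows
nothing, yet the resource still refuses to go secondary, that matches the
case described above as a bug.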

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane) 
Harnessing the Power of Technology for the Warfighter
