Hello,

I am not running nfs, but I am seeing the same problem as described
below. In my case the problem is intermittent. Most failovers are
successful, but every so often the "device is held open" error occurs.
As mentioned below, heartbeat will reboot the active machine when this
problem occurs and proceed with the failover to the standby. Everything
works fine after the reboot; the machine comes back up in secondary
state as expected. However, I'd like to fix the problem and thus prevent
the reboot from occurring.

I am able to reproduce the problem in an environment where the machine
does not reboot (it stays in the "device is held open" state). At this
point I execute commands "lsof", "fuser -mv", "ps", and look at
/proc/mounts, but I cannot figure out who is holding open the drbd
device.

I am running LVM on top of DRBD. It looks to me like my application has
been shut down cleanly, files have been unmounted, and the LVM volume
group has been deactivated.

I looked at the DRBD source code in an attempt to understand how it
determines that the device is held open. It looks like it is based on an
internal count of open devices.

Any tips or suggestions would be greatly appreciated on how to further
debug the problem.

I am running DRBD version 8.0.11 and heartbeat version 2.1.3.

Thanks,
Chris

> Hi Guys,
>
> I'm attempting to set up high availability nfs with heartbeat and DRBD
> as the block device, my problem arises when attempting to fail over
> from master->slave.
>
> The process is as follows:
>
> Kill nfsd with signal 9 (to prevent state/lock file deletion).
> Unmount the drbd block device
> Change drbd state to secondary
>
> It is in this last step that I'm facing issues, and receiving this
> error:
>
> hanfs1:~# drbdadm secondary export
> /dev/drbd1: State change failed: (-12) Device is held open by someone
> Command 'drbdsetup /dev/drbd1 secondary' terminated with exit code 11
>
> There are no other services running on these machines, and looking at
> lsof output, the only things that are accessing anything drbd related
> are the worker, receiver and asender.
>
> hanfs1:~# lsof | grep drbd
> drbd1_wor 2797 root cwd DIR     254,0 4096 128 /
> drbd1_wor 2797 root rtd DIR     254,0 4096 128 /
> drbd1_wor 2797 root txt unknown                /proc/2797/exe
> drbd1_rec 2802 root cwd DIR     254,0 4096 128 /
> drbd1_rec 2802 root rtd DIR     254,0 4096 128 /
> drbd1_rec 2802 root txt unknown                /proc/2802/exe
> drbd1_ase 2808 root cwd DIR     254,0 4096 128 /
> drbd1_ase 2808 root rtd DIR     254,0 4096 128 /
> drbd1_ase 2808 root txt unknown                /proc/2808/exe
>
> Is there any way to work around this locking - forcefully or
> otherwise? Heartbeat is resorting to hard rebooting the machine when
> failing over due to this message. Alternatively, has anyone
> successfully resolved this issue when utilizing drbd in a ha-nfs
> setting?
>
> Many thanks
>
> --
> Tony Dodd
> Last.fm | http://www.last.fm
> Karen House 1-11 Baches Street
> London N1 6DL
>
> check out my music taste at:
> http://www.last.fm/user/hawkeviper
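
On the debugging question in the reply at the top of this message: lsof
and fuser only report user-space processes, so they would not show an
opener that lives inside the kernel, for example a device-mapper mapping
from the LVM layer that is still active on top of the DRBD device.
Whether that is what happens here is an assumption; nothing in the
thread confirms it. A minimal Python sketch of two checks follows. The
function names and the sysfs_root/proc_root parameters are made up for
illustration, and it assumes a kernel that exposes
/sys/block/<dev>/holders.

```python
import os


def kernel_holders(dev_name, sysfs_root="/sys"):
    """Return names of block devices stacked on top of dev_name (e.g. dm-0).

    Reads <sysfs_root>/block/<dev_name>/holders; returns [] if the
    directory is missing or unreadable.
    """
    holders_dir = os.path.join(sysfs_root, "block", dev_name, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except OSError:
        return []


def userspace_openers(dev_path, proc_root="/proc"):
    """Return sorted PIDs with an open file descriptor pointing at dev_path.

    Compares the /proc/<pid>/fd symlink targets against dev_path as a
    plain string; processes we may not inspect are skipped silently.
    """
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == dev_path:
                pids.append(int(entry))
                break
    return sorted(pids)


if __name__ == "__main__":
    # "drbd1" and "/dev/drbd1" are taken from the quoted message below;
    # adjust to the local device.
    print("kernel holders:", kernel_holders("drbd1"))
    print("userspace openers:", userspace_openers("/dev/drbd1"))
```

Run as root while the node is stuck in the "device is held open" state.
If the first list contains a dm-N entry, the device-mapper tools
("dmsetup ls", "dmsetup info") may help identify which mapping is still
active; if both lists are empty, the opener is something this sketch
does not look for.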