[DRBD-user] State change failed: Device is held open by someone

Lars Ellenberg lars.ellenberg at linbit.com
Mon Mar 16 11:28:55 CET 2009


On Sun, Mar 15, 2009 at 11:20:56PM +0100, Peter Funk wrote:
> Hello,
> 
> today I tried to find out, who is the "someone" in the following 
> drbd syslog messages:
> .... State change failed: Device is held open by someone
> ... drbd0:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
> ... drbd0:  wanted = { cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate r--- }
> 
> Unfortunately I had no success.  
> 
> Searching the archives I've noticed similar questions came up 
> here on the list earlier:
> http://lists.linbit.com/pipermail/drbd-user/2008-November/010706.html
> http://lists.linbit.com/pipermail/drbd-user/2008-August/009954.html
> http://lists.linbit.com/pipermail/drbd-user/2007-August/007338.html
> (I'm not using ocfs2: The answer Lars Ellenberg gave in the
> last citation didn't apply to my situation here).
> 
> I've looked into source and found in drbd_main.c, that the message 
> above is given, if( ns.role == Secondary && mdev->open_cnt )
> 
> Can you imagine a race scenario, where the reference counter ``open_cnt``
> might be incremented or decremented incorrectly?  Especially on a dual
> quad core (meaning it has 8 active CPU cores) SMP machine?

No.
Well. Unless ...

You can check with a tight loop.
Just after a reboot, while nothing has accessed the drbd device yet
and drbd is Primary, run
for i in `seq 8`; do
	( i=1000; while let --i; do : < /dev/drbd0 ; done ) &
done
wait

If that is all you did, and you still cannot make it Secondary afterwards,
well, then it is probably time to use an atomic_t for open_cnt.
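
A slightly more defensive sketch of the same test, in case it helps. It
wraps the loop in a function so it can be tried against any path first,
and it waits for all workers before returning. The function name
hammer_open_close and the resource name r0 in the usage comment are
made up for illustration; substitute your own.

```shell
# Sketch: open and immediately close DEV from several background workers.
# usage: hammer_open_close DEV [WORKERS] [ITERATIONS]
hammer_open_close() {
    dev=$1
    workers=${2:-8}
    iterations=${3:-1000}
    pids=""
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            i=0
            while [ "$i" -lt "$iterations" ]; do
                : < "$dev" || exit 1    # open read-only, close at once
                i=$((i + 1))
            done
        ) &
        pids="$pids $!"
        w=$((w + 1))
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1             # a failed open in any worker fails the run
    done
    return "$rc"
}

# Example (r0 is a hypothetical resource name):
#   hammer_open_close /dev/drbd0 8 1000 && drbdadm secondary r0
```

If the demotion fails after this and nothing else has touched the
device, that would point at the counter rather than at a real opener.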

.release operations are serialized in a generic layer.
Or at least they used to be, iirc; of course I may be wrong,
or the kernel may have changed without me noticing.

> Any hints, how to find out?

If the filesystem is still mounted ...
oh well. Unmount it. Not a lazy unmount (umount -l), but a real one.
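
A quick way to see whether the device still shows up as mounted is to
look it up in /proc/mounts. The following is only a sketch; the helper
name mounts_of is made up, and the optional second argument exists just
so the function can be tried against a copy of the mounts table.

```shell
# Sketch: print every mount point whose source device is exactly DEV.
# usage: mounts_of DEV [MOUNTS_FILE]
mounts_of() {
    # Field 1 of /proc/mounts is the device, field 2 the mount point.
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Example:
#   mounts_of /dev/drbd0
```

Two caveats: a filesystem that was unmounted lazily no longer appears
in /proc/mounts even though it may still hold the device, and a mount
made through a different device name (a symlink, say) will not match
the exact string comparison used here.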

of course you also tried lsof and fuser.
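
For completeness, a minimal sketch of that check. The wrapper name
userspace_openers is made up; lsof with a device path and fuser -vm are
the usual invocations. Both tools only report userspace processes, so
an empty result does not prove that nothing in the kernel holds the
device.

```shell
# Sketch: ask lsof and fuser which processes have DEV open.
# usage: userspace_openers DEV
userspace_openers() {
    dev=$1
    if command -v lsof >/dev/null 2>&1; then
        echo "== lsof $dev"
        lsof "$dev"
    fi
    if command -v fuser >/dev/null 2>&1; then
        echo "== fuser -vm $dev"
        # fuser -v prints its table on stderr; merge it into stdout.
        fuser -vm "$dev" 2>&1
    fi
    return 0
}

# Example (run as root to see other users' processes):
#   userspace_openers /dev/drbd0
```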

If that did not pick up anything, work through the following.

If nfs was involved, try
	killall -9 nfsd
	killall -9 lockd
	echo 0 > /proc/fs/nfsd/threads

If lvm/dmsetup/kpartx/multipath/udev is involved, try
	dmsetup ls --tree -o inverted
and check whether any device-mapper device depends on the drbd device.
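
As a cross-check next to dmsetup (this is a different technique, not
the command above), sysfs lists the devices stacked on top of a block
device under its holders directory. A minimal sketch, assuming the
usual /sys/block layout; the helper name holders_of is made up, and the
second argument only exists so the function can be pointed at a test
directory.

```shell
# Sketch: list block devices that sit on top of BLOCKDEV_NAME.
# usage: holders_of BLOCKDEV_NAME [SYSFS_BLOCK_DIR]
holders_of() {
    dir="${2:-/sys/block}/$1/holders"
    [ -d "$dir" ] || return 1    # no such device, or no sysfs entry
    ls "$dir"                    # one entry per holder, e.g. dm-3
}

# Example:
#   holders_of drbd0
```

An empty listing means no device-mapper target has claimed the device.
Loop devices are not expected to show up here, so check those
separately.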

If loop/cryptoloop/etc is involved, check whether one of those
is still attached to the drbd device.
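
One way to check for loop devices, sketched under the assumption that
"losetup -a" prints one line per loop device with the backing file in
parentheses at the end (the classic format; newer versions may differ
or truncate long names). The helper name loops_backed_by is made up.

```shell
# Sketch: print loop devices whose backing file is DEV.
# usage: loops_backed_by DEV
loops_backed_by() {
    # Assumed line format:
    #   /dev/loop0: [0803]:12345 (/path/to/backing/file)
    losetup -a | grep -F "($1)" | cut -d: -f1
}

# Example (usually needs root):
#   loops_backed_by /dev/drbd0
```

Any loop device reported here would have to be detached (losetup -d)
before the drbd device can become Secondary.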

If some virtualization technique is in use, shut down/destroy all
containers/VMs that may have been accessing that drbd device during
their lifetime.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed