[DRBD-user] State change failed: Device is held open by someone

Maros Timko timkom at gmail.com
Mon Mar 16 11:46:48 CET 2009



Very detailed answer, Lars; however, we are still missing some basic information from Peter.


Peter, do you run Xen on top of DRBD (or some other virtualisation
technology)? Do you use heartbeat to start/stop resources?
If you use your own HA resource agent (RA), it should not return control to
heartbeat before the VM has really shut down. Even then you will usually
still see such log messages, so the RA has to sleep for ~1 second first.
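
A minimal sketch of this "wait, then retry" idea for a home-grown RA, assuming bash. The helper name retry and the resource name r0 are hypothetical placeholders; drbdadm secondary is the usual way to demote a resource, and we assume it exits non-zero while the device is still held open.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to N times, sleeping 1 second
# between failed attempts, and report whether it eventually succeeded.
retry() {
    local attempts=$1
    shift
    local n
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0            # command succeeded, stop retrying
        fi
        if (( n < attempts )); then
            sleep 1             # give the last opener time to release the device
        fi
    done
    return 1                    # still failing after all attempts
}

# Usage inside an RA stop action, once the VM is known to be down
# ("r0" is a placeholder resource name):
#   retry 5 drbdadm secondary r0 || exit 1
```

Retrying a bounded number of times keeps the stop action from hanging forever if something really does keep the device open.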

Regards.
2009/3/16 Lars Ellenberg <lars.ellenberg at linbit.com>

> On Sun, Mar 15, 2009 at 11:20:56PM +0100, Peter Funk wrote:
> > Hello,
> >
> > today I tried to find out, who is the "someone" in the following
> > drbd syslog messages:
> > .... State change failed: Device is held open by someone
> > ... drbd0:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
> > ... drbd0:  wanted = { cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate r--- }
> >
> > Unfortunately I had no success.
> >
> > Searching the archives I've noticed similar questions came up
> > here on the list earlier:
> > http://lists.linbit.com/pipermail/drbd-user/2008-November/010706.html
> > http://lists.linbit.com/pipermail/drbd-user/2008-August/009954.html
> > http://lists.linbit.com/pipermail/drbd-user/2007-August/007338.html
> > (I'm not using OCFS2, so the answer Lars Ellenberg gave in the
> > last cited thread doesn't apply to my situation.)
> >
> > I've looked into the source and found in drbd_main.c that the message
> > above is emitted if( ns.role == Secondary && mdev->open_cnt )
> >
> > Can you imagine a race in which the reference counter ``open_cnt``
> > might be incremented or decremented incorrectly? Especially on a dual
> > quad-core SMP machine (meaning it has 8 active CPU cores)?
>
> No.
> Well. Unless ...
>
> You can check with a tight loop.
> Just after a reboot, when no one has accessed the DRBD device yet
> and DRBD is Primary, do
> for i in `seq 8`; do
>        ( i=1000; while let --i; do : < /dev/drbd0 ; done ) &
> done
>
> If you did only that, and you still cannot make it Secondary afterwards,
> well, then it is probably time to use an atomic_t for open_cnt.
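
The same check, wrapped in a bash function so it can be pointed at any device and so the shell waits for all background loops before you try to demote. The function name and its parameters are assumptions for illustration; as in the loop above, each `: < "$dev"` opens the device and closes it again immediately.

```shell
#!/bin/bash
# Hypothetical wrapper around the open/close stress loop:
#   $1 = device to open (e.g. /dev/drbd0), $2 = parallel workers, $3 = loop count
stress_open_close() {
    local dev=$1 workers=${2:-8} loops=${3:-1000}
    local w
    [ -r "$dev" ] || return 1          # refuse to run against an unreadable device
    for (( w = 0; w < workers; w++ )); do
        (
            i=$loops                    # subshell: each worker has its own counter
            while let --i; do
                : < "$dev"              # open() and release() the device once
            done
        ) &
    done
    wait                                # block until every worker has exited
}

# Usage on a freshly booted Primary ("r0" is a placeholder resource name):
#   stress_open_close /dev/drbd0 8 1000 && drbdadm secondary r0
```
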
>
> The .release operations are serialized in some generic layer,
> or at least they have been, iirc; though of course I may be wrong,
> or the kernel may have changed without me noticing.
>
> > Any hints on how to find out?
>
> If the filesystem is still mounted ... oh well, unmount it.
> Not lazily (umount -l), but really.
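
Before unmounting, a quick way to see whether the device is still mounted is to look at the first field of /proc/mounts. A sketch assuming bash on Linux; the function name is hypothetical, and the optional second argument only exists so the check can be tried against a copy of the file. The device is listed under the name it was mounted with, so compare against that name.

```shell
#!/bin/bash
# Hypothetical check: succeed if $1 appears as the source device of any
# mount listed in $2 (defaults to /proc/mounts).
is_mounted() {
    local dev=$1 mounts=${2:-/proc/mounts}
    local src rest
    while read -r src rest; do
        if [ "$src" = "$dev" ]; then
            echo "$dev is mounted on ${rest%% *}"   # second field: mount point
            return 0
        fi
    done < "$mounts"
    return 1
}

# Usage:
#   is_mounted /dev/drbd0 && umount /dev/drbd0
```
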
>
> Of course you also tried lsof and fuser.
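
On Linux, lsof and fuser find plain opens mainly by looking through each process's open file descriptors. A rough bash equivalent that walks /proc/<pid>/fd is sketched below (the function name is hypothetical). Like those tools, it only sees userspace openers; in-kernel users such as a mount, nfsd, device-mapper or loop devices hold the device without any process file descriptor pointing at it.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of processes that have $1 open,
# by resolving every /proc/<pid>/fd/* symlink. Run as root to see
# processes owned by other users.
find_openers() {
    local target fd
    target=$(readlink -f -- "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink -- "$fd" 2>/dev/null)" = "$target" ]; then
            fd=${fd#/proc/}             # strip "/proc/" ...
            echo "${fd%%/*}"            # ... and keep only the PID
        fi
    done | sort -un
}

# Usage:
#   find_openers /dev/drbd0
```
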
>
> If that did not pick up anything, go through the following:
>
> If NFS was involved, try
>        killall -9 nfsd
>        killall -9 lockd
>        echo 0 > /proc/fs/nfsd/threads
>
> If LVM/dmsetup/kpartx/multipath/udev is involved, try
>        dmsetup ls --tree -o inverted
> and check whether any device-mapper devices depend on DRBD.
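
Besides dmsetup, the kernel lists block devices stacked directly on top of another one (device-mapper targets such as LVM volumes, kpartx partitions or multipath maps, and md arrays) under /sys/block/<dev>/holders. A small bash sketch; the function name is hypothetical, and the optional sysfs root argument exists only so it can be tried against a fake directory tree.

```shell
#!/bin/bash
# Hypothetical helper: print the names of block devices holding $1
# (e.g. "drbd0" or "/dev/drbd0") according to sysfs. $2 overrides /sys.
list_holders() {
    local dev=${1#/dev/} sysroot=${2:-/sys}
    local h
    for h in "$sysroot/block/$dev/holders/"*; do
        [ -e "$h" ] || continue         # no holders: the glob stayed unexpanded
        echo "${h##*/}"                 # e.g. "dm-3"; its name is in /sys/block/dm-3/dm/name
    done
}

# Usage:
#   list_holders drbd0
```
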
>
> If loop/cryptoloop/etc. is involved, check whether one of those
> is still accessing the DRBD device.
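
For loop devices, kernels that provide it expose the backing file of each bound loop device as /sys/block/loopN/loop/backing_file (losetup -a shows the same information). A bash sketch that finds loop devices set up directly on the DRBD device; the function name is hypothetical, and $2 only exists for testing against a fake tree. A loop device on an image file inside a filesystem on DRBD is not matched here; it would instead keep that filesystem busy.

```shell
#!/bin/bash
# Hypothetical helper: print /dev/loopN for every loop device whose
# backing file is exactly $1 (e.g. /dev/drbd0). $2 overrides /sys.
find_loops_on() {
    local target=$1 sysroot=${2:-/sys}
    local f name
    for f in "$sysroot"/block/loop*/loop/backing_file; do
        [ -r "$f" ] || continue         # unbound loop devices have no backing_file
        if [ "$(cat -- "$f")" = "$target" ]; then
            name=${f%/loop/backing_file}
            echo "/dev/${name##*/}"
        fi
    done
}

# Usage:
#   find_loops_on /dev/drbd0
```
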
>
> If some virtualization technique is in use, shut down/destroy all
> containers/VMs that may have accessed that DRBD device during their
> lifetime.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed

