[DRBD-user] Problems in unmounting drbddisk

Florian Haas florian.haas at linbit.com
Tue Jul 1 09:52:54 CEST 2008



You are not having trouble "unmounting drbddisk"; you are having trouble
unmounting the filesystem that sits on top of DRBD. Something is using
that filesystem, and Heartbeat attempts to clean up by signalling the
processes using it with "fuser -mk <mountpoint>", first with SIGTERM and
then with SIGKILL, but there are no processes left to signal.
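For illustration, the escalation visible in the quoted log (three SIGTERM
rounds, then three SIGKILL rounds, about a second apart, then giving up)
can be sketched in Python as below. This is a hypothetical model, not the
Filesystem RA's actual code; `unmount` and `signal_users` are made-up
callables standing in for the umount and fuser steps.

```python
import signal
import time
from typing import Callable, List


def stop_filesystem(
    unmount: Callable[[], bool],         # hypothetical: True if umount succeeded
    signal_users: Callable[[int], int],  # hypothetical: returns how many processes were signalled
    term_rounds: int = 3,
    kill_rounds: int = 3,
    delay: float = 1.0,
    log: Callable[[str], None] = print,
) -> bool:
    """Sketch of an escalating unmount: SIGTERM rounds first, then SIGKILL rounds."""
    rounds: List[int] = [signal.SIGTERM] * term_rounds + [signal.SIGKILL] * kill_rounds
    for sig in rounds:
        if unmount():
            return True
        log(f"Couldn't unmount; trying cleanup with {signal.Signals(sig).name}")
        if signal_users(sig) == 0:
            # Nothing fuser can see holds the mount, so later retries will likely fail too.
            log("No processes were signalled")
        time.sleep(delay)  # give any signalled processes time to exit
    if unmount():  # one last attempt after the final cleanup round
        return True
    log("Couldn't unmount, giving up!")
    return False
```

When nothing fuser can see is holding the mount, `signal_users` returns 0
on every round, so the loop runs through all rounds and gives up, which is
the pattern in the log. Stronger signals cannot fix that; the missing step
is finding what actually holds the mount.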

So you need to find out what else may be using your filesystem. This
may, for example, be a loop device referencing a file on that filesystem
(fuser has no way of finding those, as they're not associated with a
userspace process), or certain socket types (fuser is known not to
detect those reliably).
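A minimal Python sketch for hunting down these hidden users is below.
Assumptions: the `losetup -a` line format shown in the comment (it varies
between util-linux versions, and older versions may truncate long
backing-file names), and reading "certain socket types" as bound UNIX
domain sockets, which are listed with their path in /proc/net/unix. The
function names are hypothetical.

```python
import os
import re
import subprocess
from typing import List, Optional

# Assumed classic `losetup -a` line format (differs between util-linux versions):
#   /dev/loop0: [fd00]:12345 (/replicated/disk.img)
_LOSETUP_LINE = re.compile(r"^(?P<dev>/dev/\S+?):\s.*?\((?P<path>.+)\)")


def _is_under(mountpoint: str, path: str) -> bool:
    """True if `path` is the mountpoint itself or lies below it."""
    if not os.path.isabs(path):
        return False  # e.g. abstract UNIX sockets ("@name")
    mp = os.path.normpath(mountpoint)
    return os.path.commonpath([mp, os.path.normpath(path)]) == mp


def loop_devices_on(mountpoint: str, losetup_output: Optional[str] = None) -> List[str]:
    """Loop devices whose backing file lives on `mountpoint` (invisible to fuser)."""
    if losetup_output is None:
        losetup_output = subprocess.run(
            ["losetup", "-a"], capture_output=True, text=True, check=True
        ).stdout
    hits = []
    for line in losetup_output.splitlines():
        m = _LOSETUP_LINE.match(line)
        if m and _is_under(mountpoint, m.group("path")):
            hits.append(m.group("dev"))
    return hits


def unix_sockets_on(mountpoint: str, proc_net_unix: Optional[str] = None) -> List[str]:
    """Bound UNIX domain sockets whose path lies on `mountpoint`."""
    if proc_net_unix is None:
        with open("/proc/net/unix") as f:
            proc_net_unix = f.read()
    hits = []
    # Columns: Num RefCount Protocol Flags Type St Inode [Path]; skip the header row.
    for line in proc_net_unix.splitlines()[1:]:
        fields = line.split(None, 7)
        if len(fields) == 8 and _is_under(mountpoint, fields[7]):
            hits.append(fields[7])
    return hits
```

Calling `loop_devices_on("/replicated")` and `unix_sockets_on("/replicated")`
on the node before stopping the resource group would list candidates. A
loop device found this way can be detached with `losetup -d /dev/loopN`;
a socket hit most likely points at a daemon that needs to be stopped
before the Filesystem resource.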

DRBD is not at fault here. Either your Filesystem resource is
misconfigured, some other process or application is interfering, or
(less likely) you are hitting a bug in the Filesystem resource agent
(RA). The folks over on the linux-ha mailing list may be able to help
you out.

Cheers,
Florian

Predatorz wrote:
> Hi,
> 
> When I rebooted my machine, Heartbeat had problems unmounting the DRBD
> device; parts of the logs are below.
> The machine is running CentOS 5.2 with DRBD 8.2.6 and Heartbeat
> 2.1.3-3.el5.centos.
> I am trying to set up HA for a firewall.
> The DRBD device is a 10GB LVM2 disk formatted with ext3.
> 
> Below is my haresources file for the resources to be started. Heartbeat
> is able to mount the DRBD device and start all the IPs properly; the only
> problem is when it tries to unmount the device when I issue a reboot.
> 
> eysihfw1 drbddisk::drbd0 Filesystem::/dev/drbd0::/replicated::ext3
> IPaddr::10.6.1.1/255.255.255.0 xx.xx.xx.xx/27/eth1:0 xx.xx.xx.xx/27/eth1:1
> xx.xx.xx.xx/27/eth1:2 xx.xx.xx.xx/27/eth1:3 xx.xx.xx.xx/27/eth1:4
> xx.xx.xx.xx/27/eth1:5 xx.xx.xx.xx/27/eth1:6 210.23.11.247/27/eth1:7
> xx.xx.xx.xx/27/eth1:8 xx.xx.xx.xx/27/eth1:9 xx.xx.xx.xx/27/eth1:10
> xx.xx.xx.xx/27/eth1:11 xx.xx.xx.xx/27/eth1:12 210.23.11.253/27/eth1:13
> 210.23.11.254/27/eth1:14 openvpn named
> 
> Logs from the ha-log file before Heartbeat gets killed ungracefully and
> the box reboots:
> 
> ResourceManager[8164]:  2008/07/01_12:22:37 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
> ResourceManager[8164]:  2008/07/01_12:22:38 info: Retrying failed stop operation [Filesystem::/dev/drbd0::/replicated::ext3]
> ResourceManager[8164]:  2008/07/01_12:22:38 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /replicated ext3 stop
> Filesystem[11342]:      2008/07/01_12:22:38 INFO: Running stop for /dev/drbd0 on /replicated
> Filesystem[11342]:      2008/07/01_12:22:38 INFO: Trying to unmount /replicated
> Filesystem[11342]:      2008/07/01_12:22:38 ERROR: Couldn't unmount /replicated; trying cleanup with SIGTERM
> Filesystem[11342]:      2008/07/01_12:22:38 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:39 ERROR: Couldn't unmount /replicated; trying cleanup with SIGTERM
> Filesystem[11342]:      2008/07/01_12:22:39 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:40 ERROR: Couldn't unmount /replicated; trying cleanup with SIGTERM
> Filesystem[11342]:      2008/07/01_12:22:40 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:41 ERROR: Couldn't unmount /replicated; trying cleanup with SIGKILL
> Filesystem[11342]:      2008/07/01_12:22:41 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:42 ERROR: Couldn't unmount /replicated; trying cleanup with SIGKILL
> Filesystem[11342]:      2008/07/01_12:22:43 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:44 ERROR: Couldn't unmount /replicated; trying cleanup with SIGKILL
> Filesystem[11342]:      2008/07/01_12:22:44 INFO: No processes on /replicated were signalled
> Filesystem[11342]:      2008/07/01_12:22:45 ERROR: Couldn't unmount /replicated, giving up!
> Filesystem[11331]:      2008/07/01_12:22:45 ERROR:  Generic error
> ResourceManager[8164]:  2008/07/01_12:22:45 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
> ResourceManager[8164]:  2008/07/01_12:22:46 info: Retrying failed stop operation [Filesystem::/dev/drbd0::/replicated::ext3]
> [...]

-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

Enterprise consultancy and support for DRBD is available from LINBIT. If
you are interested, please go to http://www.linbit.com/en/contact and
leave your contact details.

When replying, there is no need to CC my personal address. I monitor the
list on a daily basis. Thank you.


