Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2006-08-20 22:09:04 +0200
\ Miroslav Jany:
> Hi all!
>
> We have been experiencing serious problems with heartbeat recently, although
> the problem may not be related with heartbeat directly. When trying to stop
> heartbeat on the first node and waiting for the complementary node to take
> resources over, the directory of WEB resource cannot be unmounted
> and the following messages can be found in heartbeat's log:
>
> Filesystem[5094][5154]: 2006/08/20_21:32:55 INFO: No processes on
> /storage/web were signalled
> Filesystem[5094][5154]: 2006/08/20_21:32:55 INFO: No processes on
> /storage/web were signalled
> Filesystem[5094][5157]: 2006/08/20_21:32:56 ERROR: Couldn't unmount
> /storage/web, giving up!
> Filesystem[5094][5157]: 2006/08/20_21:32:56 ERROR: Couldn't unmount
> /storage/web, giving up!
> Filesystem[5029][5159]: 2006/08/20_21:32:56 ERROR: Filesystem Generic
> error
> Filesystem[5029][5159]: 2006/08/20_21:32:56 ERROR: Filesystem Generic
> error
>
> Then heartbeat reboots the machine after declaring itself dead:
>
> ResourceManager[3186][5259]: 2006/08/20_21:32:56 CRIT: Resource STOP
> failure. Reboot required!
> ResourceManager[3186][5259]: 2006/08/20_21:32:56 CRIT: Resource STOP
> failure. Reboot required!
> ResourceManager[3186][5260]: 2006/08/20_21:32:56 CRIT: Killing heartbeat
> ungracefully!
> ResourceManager[3186][5260]: 2006/08/20_21:32:56 CRIT: Killing heartbeat
> ungracefully!
>
> I've also tried to unmount /storage/web via "umount" command interactively,
> however it's still saying it's busy :-(
>
> The cluster of 2 nodes has been running fine for over 9 months without
> any serious problems. This problem started to happen recently (about 2
> weeks ago) with heartbeat version 2.0.1 (which has been in production use for
> the mentioned 9 months), and the upgrade to the latest heartbeat version,
> 2.0.7, didn't improve situation.
>
> Can anybody please tell me how to better figure out why it can't be unmounted?
> /storage/web is a mounted DRBD device, using ext3 filesystem on top.

as long as the filesystem can not be unmounted,
this does not yet have anything to do with drbd.
maybe you get more answers on linux-ha ...

do you happen to have bind mounts somewhere in that?
or loop mounts or similar?
while you try to umount it by hand,
what does "fuser -v -m /storage/web" say? lsof?
have you exported this with nfs or by other means?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
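
The mount-related questions above (bind mounts, loop mounts, something mounted below the directory) can be partly answered from /proc/mounts. What follows is only a rough sketch in Python, not part of heartbeat or drbd: the function names and the sample table are made up for illustration, and the device name /dev/drbd0 is an assumption, since the original post does not name the device. It does not replace fuser or lsof, and it does not look at NFS exports.

```python
# Sketch: look for mounts that may keep a mount point busy.
# Input is text in /proc/mounts format: device mount_point fstype options ...

def parse_mounts(text):
    """Return (device, mount_point, fstype) tuples from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def mount_suspects(text, target):
    """List mounts worth checking before trying to unmount `target`."""
    target = target.rstrip("/") or "/"
    prefix = target if target.endswith("/") else target + "/"
    entries = parse_mounts(text)
    # Devices currently mounted on the target itself.
    devices = {dev for dev, mp, _ in entries if mp == target}
    return {
        # Anything mounted below the target makes a plain umount fail as busy.
        "nested": [mp for _, mp, _ in entries
                   if mp != target and mp.startswith(prefix)],
        # The same device showing up at another mount point, e.g. a bind mount.
        "same_device": [mp for dev, mp, _ in entries
                        if dev in devices and mp != target],
        # Loop mounts; their backing file may live on the target filesystem.
        "loop": [mp for dev, mp, _ in entries if dev.startswith("/dev/loop")],
    }


# Hypothetical sample table, for illustration only.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/drbd0 /storage/web ext3 rw 0 0
/dev/drbd0 /var/www ext3 rw 0 0
/dev/loop0 /storage/web/iso iso9660 ro 0 0
"""

if __name__ == "__main__":
    for kind, points in mount_suspects(SAMPLE, "/storage/web").items():
        print(kind, points)
```

On a real node one would pass the contents of /proc/mounts instead of SAMPLE. An empty result only rules out these mount-level causes; processes holding files open still have to be found with fuser or lsof.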