[DRBD-user] Heartbeat Filesystem resource agent (on top of DRBD RA) unmounting failure

Rodrigo Pereira rbpereira at critical-links.com
Thu Dec 27 13:29:56 CET 2007

My cluster had a hiccup today. The primary node was manually soft-rebooted by
someone, and DRBD on the secondary node was stuck in a loop trying to start.

I browsed the Filesystem RA, and I see it uses fuser to try to remove
processes attached to the fs. Obviously this didn't work, as fuser does not
return 0. So I ask: what kind of known or hypothetical situation could
cause this problem? I believe I had a bash shell sitting on the fs, but that
shouldn't be a problem for fuser -k. I'm also sending this to the DRBD list,
in case it's something DRBD related.
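
One case that could be ruled out first (an assumption on my part, nothing in
the logs confirms it) is another filesystem mounted below /drbd0. A nested
mount keeps the parent busy for umount, but it has no process attached, so
fuser -k would have nothing to signal. A minimal sketch of such a check
follows; nested_mounts is a hypothetical helper, not part of the Filesystem
RA, and it only reads a /proc/mounts-style table:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Filesystem RA).
# Prints every mount point located below the mount point given as $1.
# $2 is an optional mount table in /proc/mounts format; it defaults to
# /proc/mounts and exists mainly so the function can be tried on a copy.
nested_mounts() {
    mp=${1%/}                     # drop one trailing slash, if present
    table=${2:-/proc/mounts}
    # Field 2 of each line is the mount point. Matching on "$mp/" as a
    # prefix avoids false hits such as /drbd01 when checking /drbd0.
    awk -v mp="$mp" 'index($2, mp "/") == 1 { print $2 }' "$table"
}
```

Running "nested_mounts /drbd0" prints nothing when no filesystem is mounted
below /drbd0. In that case the holder is something else, and this check says
nothing about what it is.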

Logs showed this when the cluster was shutting down on the primary node:

Filesystem[10970]:      2007/12/27_09:52:27 INFO: Running stop for /dev/drbd0 on /drbd0
Filesystem[10970]:      2007/12/27_09:52:27 INFO: Trying to unmount /drbd0
lrmd[4900]: 2007/12/27_09:52:27 info: RA output: (fs0:stop:stderr) umount: /drbd0: device is busy
umount: /drbd0: device is busy

[... trying to umount several times with fuser and SIGTERM/KILL signals ...]

Filesystem[10970]:      2007/12/27_09:52:32 ERROR: Couldn't unmount /drbd0; trying cleanup with SIGKILL
Filesystem[10970]:      2007/12/27_09:52:32 INFO: No processes on /drbd0 were signalled
Filesystem[10970]:      2007/12/27_09:52:33 ERROR: Couldn't unmount /drbd0, giving up!
lrmd[4900]: 2007/12/27_09:52:34 WARN: Exiting fs0:stop process 10970 returned rc 1.
crmd[4903]: 2007/12/27_09:52:34 ERROR: process_lrm_event: LRM operation fs0_stop_0 (call=83, rc=1) Error unknown error


