[DRBD-user] Identifying High iowait on DRBD Computers

Thu Feb 17 22:26:32 CET 2011

From: "Robinson, Eric" <eric.robinson at psmnv.com>
>> killproc nfsd -2
>> to
>> killproc nfsd -9
>> In order to have successful failovers with 
>> pacemaker/heartbeat of the nfs server.
> This is an older Heartbeat haresources-based cluster. Would your
> suggestion work even when kill -9 won't kill the nfs, nfsd, or
> nfsd4 processes? They simply would not die. I suspect this is
> because they are in use by another process.

If so, then kill *that* process.  lsof and/or fuser can help a lot with
determining what's using things and killing those things.
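
For anyone curious what lsof/fuser are roughly doing, here is a minimal
sketch in Python that walks /proc and reports which PIDs hold file
descriptors open under a given path.  The function name pids_using and the
proc_root parameter are made up for illustration.  It only looks at open
file descriptors, whereas the real tools also check things like working
directories and memory maps, and you need to be root to see other users'
processes.

```python
import os

def pids_using(path_prefix, proc_root="/proc"):
    """Return {pid: [open paths]} for fds that point at or below path_prefix.

    Hypothetical helper: a rough approximation of fuser, not a replacement.
    """
    prefix = path_prefix.rstrip("/")
    users = {}
    if not os.path.isdir(proc_root):
        return users
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if target == prefix or target.startswith(prefix + "/"):
                users.setdefault(int(entry), []).append(target)
    return users
```

Calling pids_using("/srv/nfs") (the path is just an example) gives you the
PIDs to look at before reaching for kill.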

> I [wish there] was a --die-die-die switch that
> just killed the dang things even when they are in use. 

Under Linux, a process in state D (uninterruptible sleep, typically blocked
inside a syscall waiting for I/O to complete) *cannot* be killed, even with
SIGKILL.  The signal stays pending until the process leaves state D.  The
kernel programmers found a number of circumstances where terminating a
process in state D could lead to Undefined Behavior, as I understand it.  The
example I remember was a process doing a
read() syscall on a dodgy disk.  Kernel code says, "When the read from the
disk completes, I'll use DMA to shove the result into page 0x131072."  Disk
can't deliver immediately because it's flaky.  User kill -9s the process. 
Kernel MM code releases page 0x131072, some other process grabs hold of that
page.  Disk eventually gets its act together, uses DMA to write data to that
page, other process gets its data corrupted.  How embarrassing.

And IIRC, there were a fair number of practical and philosophical problems
with rewriting everything so that that couldn't happen.  So that's why you
can't kill a process in state D even with SIGKILL.
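
To check whether the processes that won't die are in fact stuck in state D,
something like the following sketch can help.  It reads /proc/<pid>/stat for
every process and keeps those whose state field is "D".  The function names
parse_stat and uninterruptible_processes are mine, not from any existing
tool, and this assumes the usual Linux /proc layout.

```python
import os

def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up here on every run is waiting on something, usually
storage or a network filesystem, and no signal will shift it until that
wait ends.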

-- 
Matt G / Dances With Crows
The Crow202 Blog:  http://crow202.org/wordpress/
There is no Darkness in Eternity/But only Light too dim for us to see