[DRBD-user] DRBD stuck after a strong network failure

Lars Ellenberg Lars.Ellenberg at linbit.com
Mon May 8 17:36:44 CEST 2006



/ 2006-05-08 15:18:31 +0300
\ Cyril Bouthors:
> Another interesting thing on the primary:
> 
> # ps -axo pid,wchan=WIDE-WCHAN-COLUMN,cmd -o comm | grep drbd
>  2493 down              [drbd0_receiver]            drbd0_receiver
>  5514 ?                 drbdsetup /dev/drbd0 net 10 drbdsetup
>  5661 down_interruptibl drbdsetup /dev/drbd0 net 10 drbdsetup
>  6203 down_interruptibl drbdsetup /dev/drbd0 state drbdsetup
> 
> "drbdsetup /dev/drbd0 state" seems to be running for quite some time,
> it has been launched by heartbeat.

Only one drbdsetup per /dev/drbdX is allowed at any time.
That's why the other two are in the "down_interruptible" state: they are
waiting for the ioctl semaphore.
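
As a rough illustration of how to spot those waiters, here is a minimal
sketch that filters ps output for drbdsetup processes sleeping in
down_interruptible. The function name waiting_drbdsetup is made up for
this example, and it assumes the pid is in the first column and the
wchan in the second, as in the quoted ps invocation.

```shell
# Hypothetical helper (not part of DRBD): reads ps-style lines on stdin,
# expecting "PID WCHAN CMD..." columns, and prints the PIDs of drbdsetup
# processes whose wait channel starts with "down_interruptibl"
# (ps truncates the symbol name to the column width).
waiting_drbdsetup() {
    awk '$2 ~ /^down_interruptibl/ && /drbdsetup/ { print $1 }'
}
```

Possible usage: ps -axo pid,wchan=WIDE-WCHAN-COLUMN,cmd | waiting_drbdsetup
This only shows who is queued behind the semaphore; it says nothing
about why the current holder does not return.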

I'm sorry, this is not debuggable this way. I cannot even tell where
drbd hangs, if it hangs. Maybe it hangs in some lower-level kernel
function that is not supposed to hang. Maybe it is something in our
kernel 2.4 compatibility layer. There is not enough information for me
to give any specific diagnosis, and unless you are able to get some
"stack traces" (via kernel debugger/sysrq/whatever) to see where exactly
which kernel threads are waiting on each other (which I assume is the
case), I cannot help you here.
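
One possible way to collect such traces via sysrq is sketched below. It
assumes the kernel was built with magic SysRq support and exposes
/proc/sysrq-trigger, which may not be true on every 2.4 kernel; the
function name dump_task_states and its optional path argument are
invented for this example.

```shell
# Hypothetical helper: asks the kernel to dump the state and stack of
# every task by writing the SysRq command "t" to the trigger file.
# The path argument is optional and defaults to /proc/sysrq-trigger
# (assumption: that file exists on the running kernel).
dump_task_states() {
    trigger="${1:-/proc/sysrq-trigger}"
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (not root, or no SysRq support?)" >&2
        return 1
    fi
    echo t > "$trigger"
}
```

Run as root, the dump ends up in the kernel log, so something like
"dump_task_states && dmesg" should show it. The entries for
drbd0_receiver and the drbdsetup processes are the interesting ones.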

The best I can do is recommend using a 2.6 kernel
and seeing whether it gets better.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


