[DRBD-user] disconnecting hangs after ko-count failure

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jan 23 00:28:37 CET 2008


On Tue, Jan 22, 2008 at 11:52:39PM +0100, Walter Haidinger wrote:
> > > > please do
> > > > # ps -eo pid,state,wchan:30,cmd | grep -e D -e drbd
> > > 
> > >   171 S drbd_nl_disconnect             [cqueue/1]
> > >  7735 S -                              [drbd0_worker]
> > > 13018 D drbd_disconnect                [drbd0_receiver]
> > > 21135 S pipe_wait                      grep drbd
> > 
> > to even start guessing about anything,
> > what I'd need is /proc/drbd (not only the first line),
> > and the above output, of both nodes,
> > when they are in this "hanging" state.
> 
> I only have output of one node right now, the other one has been already rebooted. Hope that helps nevertheless.
> 
> > cat /proc/drbd
> version: 8.0.8 (api:86/proto:86)
> GIT-hash: bd3e2c922f95c4fa0dca57a4f8c24bf8b249cc02 build by root at vmhost.private, 2008-01-02 21:11:11
>  0: cs:Disconnecting st:Secondary/Unknown ds:UpToDate/Inconsistent C r---
>     ns:1360172 nr:0 dw:49554716 dr:1364921 al:25388 bm:25544 lo:0 pe:0 ua:0 ap:0
>     resync: used:0/31 hits:679830 misses:89 starving:0 dirty:0 changed:89
>     act_log: used:0/19 hits:12363291 misses:26105 starving:3 dirty:716 changed:25388

interestingly, all reference counts (lo, pe, ua, ap) are zero already,
so I have no idea what it would be waiting for.
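
to check the same thing on the other node, something like the following
sketch could pull just those counters out of /proc/drbd. it assumes the
reference counts in question are the lo, pe, ua and ap fields; the
function names drbd_refcounts and drbd_refs_idle are made up for this
example and are not part of DRBD.

```shell
# Hypothetical helpers, not shipped with DRBD.
# Print every lo/pe/ua/ap field found in a /proc/drbd style file, one per line.
drbd_refcounts() {
    file="${1:-/proc/drbd}"
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^(lo|pe|ua|ap):[0-9]+$/) print $i }' "$file"
}

# Succeed only if none of the printed counters is non-zero.
# Note: this also succeeds when the file contains no such counters at all.
drbd_refs_idle() {
    ! drbd_refcounts "$1" | grep -qv ':0$'
}
```

usage would be "drbd_refcounts" to list the counters, or
"drbd_refs_idle && echo idle" to test them in one go.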

> > also, maybe you can trigger a sysrq shoW-blocked-tasks (or showTasks, if
> > your kernel does not have the former), and see if you can figure out
> > where exactly the drbd_disconnect sleeps, so we know what exactly it is
> > waiting for.
> 
> Very good idea! However, I have not used sysrq yet, and
> Documentation/sysrq.txt doesn't know about showtasks, so how do I do
> this, i.e. what do I write to /proc/sysrq-trigger?

echo 1 > /proc/sys/kernel/sysrq
echo h > /proc/sysrq-trigger
dmesg | tail

gives you the "help line".
the capital letter in each entry is the trigger key for that action;
write it in lower case to /proc/sysrq-trigger.

so, echo w > /proc/sysrq-trigger, or echo t > /proc/sysrq-trigger.
unfortunately, t dumps all tasks, which is likely too much to fit in the
kernel printk buffer (unless you first kill everything else...), so the
buffer wraps around. maybe there is good information left in there, maybe
the interesting parts got overwritten by the wraparound.
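
put together, the steps could be wrapped roughly like this. it is only a
sketch: sysrq_dump is a made-up helper name, PROC_ROOT is an assumption
added so the function can be tried against a scratch directory instead
of the real /proc, and the default of 200 lines is arbitrary. it has to
run as root to write to the real files.

```shell
# Hypothetical helper: send one sysrq key and print the tail of the kernel log.
#   $1  sysrq key in lower case (w = blocked tasks, t = all tasks)
#   $2  number of log lines to print (default 200, an arbitrary choice)
sysrq_dump() {
    key="$1"
    lines="${2:-200}"
    root="${PROC_ROOT:-/proc}"    # PROC_ROOT exists only for dry runs
    echo 1 > "$root/sys/kernel/sysrq" || return 1      # enable sysrq
    echo "$key" > "$root/sysrq-trigger" || return 1    # fire the trigger
    dmesg | tail -n "$lines"
}
```

for example "sysrq_dump w > blocked-tasks.txt" right after the hang
shows up. prefer w over t where the kernel supports it, since the
shorter dump is less likely to wrap the buffer.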

-- 
: commercial DRBD/HA support and consulting: sales at linbit.com :
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


