[DRBD-user] 0.7.4 in state WFReportParams forever ?

Lars Ellenberg Lars.Ellenberg at linbit.com
Sun Oct 24 12:43:28 CEST 2004



/ 2004-10-22 12:55:46 +0100
\ Matthew Hodgson:
> > hm.
> > could you kick the kernel log daemon (klogd -i), and
> > then trigger a sysrq Task dump (echo t > /proc/sysrq-trigger) ?
> 
> Apologies for the width & length of information here - I haven't truncated
> it as I'm not sure what might be obliquely relevant and what isn't:
> 
> # klogd -i
> # echo t > /proc/sysrq-trigger
> # cat /var/log/kern.log
> Oct 22 12:49:46  kernel: drbd0_receive D 00000001  4416   328      1          4542   280 (L-TLB)
> Oct 22 12:49:46  kernel: Call Trace:    [<c0105d12>] [<c0105eac>] [<f896ad33>] [<f897a1f9>] [<f896fe7b>]
> Oct 22 12:49:46  kernel:   [<f8969bad>] [<f897a1f9>] [<f89663ab>] [<f8969de8>] [<f897a7f0>] [<f896ff5a>]
> Oct 22 12:49:46  kernel:   [<c010578e>] [<f896fee0>]
> Oct 22 12:49:46  kernel: drbd0_worker  S 00000002  4572  4542      1          6151   328 (L-TLB)
> Oct 22 12:49:46  kernel: Call Trace:    [<c0311ca6>] [<c0105de9>] [<c0105eb7>] [<f8965174>] [<f897a46d>]
> Oct 22 12:49:46  kernel:   [<f896ff5a>] [<c010578e>] [<f896fee0>]
> Oct 22 12:49:46  kernel: drbdsetup     D 4000A490     0  6151      1                4542 (NOTLB)

there we are.
drbdsetup and drbd0_receiver deadlocking each other.
wtf.

btw, my hope was that if you had klogd running, those funny numbers
would get decoded to kernel symbols...

obviously this did not work :(

> > or at least give the output of /proc/drbd, and
> > ps -eo pid,comm,stat,wchan ?
> 
> # cat /proc/drbd
> version: 0.7.5 (api:76/proto:74)
> SVN Revision: 1578 build by root at mxtelecom.com, 2004-10-10 18:54:22
>   0: cs:WFReportParams st:Primary/Unknown ld:Consistent
>      ns:8715928 nr:0 dw:36426099 dr:7245115 al:8923 bm:2370 lo:0 pe:0 ua:0 ap:0
>   1: cs:Unconfigured
> 
> # ps -eo pid,comm,stat,wchan
>    PID COMMAND          STAT WCHAN
>    328 drbd0_receiver   D    down
>   4542 drbd0_worker     S    down_interruptible
>   6151 drbdsetup        D    down

exactly.
one owns the semaphore and waits for the other to die,
while the other tries to get that same semaphore itself.
 :(

> many thanks for looking into this :)

can you (e.g. by looking into the heartbeat logfiles) figure out what this
drbdsetup is trying to do?


in short, I think we need to use down_interruptible in some of the places
where we currently use down.
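
A toy user-space sketch of the pattern described above, not DRBD code: one
thread holds a semaphore and waits for a second thread to exit, while the
second thread is blocked trying to take that same semaphore. All names here
(`simulate`, `waiter`, `stop`, `patience`) are made up for illustration.
Python's `threading.Semaphore` only stands in for the kernel semaphore, and
the bounded `join` timeout exists only so the demo terminates; in the kernel
the holder would wait forever. The interruptible variant approximates
down_interruptible by polling with a timeout and checking a stop flag, which
is an assumption about how to model it, not how the kernel implements it.

```python
import threading


def simulate(interruptible, patience=0.5):
    """Return True if the waiter thread exits, False if it stays stuck."""
    sem = threading.Semaphore(1)
    stop = threading.Event()  # stands in for the signal sent to the thread

    def waiter():
        if interruptible:
            # Roughly like down_interruptible(): keep trying to take the
            # semaphore, but give up as soon as we are asked to stop.
            while not sem.acquire(timeout=0.01):
                if stop.is_set():
                    return
            sem.release()
        else:
            # Roughly like down(): sleep until the semaphore is free and
            # ignore any request to stop in the meantime.
            sem.acquire()
            sem.release()

    sem.acquire()  # the holder takes the semaphore ...
    t = threading.Thread(target=waiter, daemon=True)
    t.start()
    stop.set()  # ... asks the waiter to die ...
    t.join(timeout=patience)  # ... and waits for it while still holding sem
    exited = not t.is_alive()
    sem.release()  # only so the demo can clean up after itself
    return exited


if __name__ == "__main__":
    print("plain down:         waiter exited =", simulate(False))
    print("down_interruptible: waiter exited =", simulate(True))
```

With the plain blocking acquire the waiter never notices the stop request
while the holder keeps the semaphore, so `simulate(False)` reports it as
stuck. With the interruptible variant the waiter sees the request and exits,
so the holder's wait can finish.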


	Lars Ellenberg

-- 
please use the "List-Reply" function of your email client.


