[DRBD-user] 0.7.4 in state WFReportParams forever ?

Matthew Hodgson matthew at mxtelecom.com
Sun Oct 24 17:01:36 CEST 2004


Lars Ellenberg wrote:
> / 2004-10-22 12:55:46 +0100
> \ Matthew Hodgson:
> 
>>>hm.
>>>could you kick the kernel log daemon (klogd -i), and
>>>then trigger a sysrq Task dump (echo t > /proc/sysrq-trigger) ?
>>
>># klogd -i
>># echo t > /proc/sysrq-trigger
>># cat /var/log/kern.log
>>Oct 22 12:49:46  kernel: drbd0_receive D 00000001  4416   328      1          4542   280 (L-TLB)
>>Oct 22 12:49:46  kernel: Call Trace:    [<c0105d12>] [<c0105eac>] [<f896ad33>] [<f897a1f9>] [<f896fe7b>]
>>Oct 22 12:49:46  kernel:   [<f8969bad>] [<f897a1f9>] [<f89663ab>] [<f8969de8>] [<f897a7f0>] [<f896ff5a>]
>>Oct 22 12:49:46  kernel:   [<c010578e>] [<f896fee0>]
>>Oct 22 12:49:46  kernel: drbd0_worker  S 00000002  4572  4542      1          6151   328 (L-TLB)
>>Oct 22 12:49:46  kernel: Call Trace:    [<c0311ca6>] [<c0105de9>] [<c0105eb7>] [<f8965174>] [<f897a46d>]
>>Oct 22 12:49:46  kernel:   [<f896ff5a>] [<c010578e>] [<f896fee0>]
>>Oct 22 12:49:46  kernel: drbdsetup     D 4000A490     0  6151      1                4542 (NOTLB)
> 
> there we are.
> drbdsetup and drbd0_receiver deadlocking each other.
> wtf.
> 
> btw, my hope was that if you had klogd running, those funny numbers
> would get decoded to kernel symbols...

oops - my bad; I hadn't played with the command-line options for
klogd before. Here we go, with the addresses resolved to symbols from
(hopefully the correct) System.map:

(apologies again for supersize lines...)

Oct 24 15:37:52 kernel: drbd0_receive D 00000001  4416   328      1          4542   280 (L-TLB)
Oct 24 15:37:52 kernel: Call Trace:    [__down+114/192] [__down_failed+8/12] [drbd:drbd_asender+1923/2032] [drbd:__insmod_drbd_S.rodata_L838+25195/31698] [drbd:_set_cstate+139/560]
Oct 24 15:37:52 kernel:   [drbd:drbd_send_handshake+173/656] [drbd:__insmod_drbd_S.rodata_L838+25195/31698] [drbd:drbd_connect+523/14688] [drbd:drbdd_init+88/2080] [drbd:__insmod_drbd_S.rodata_L838+26722/31698] [drbd:_set_cstate+362/560]
Oct 24 15:37:52 kernel:   [arch_kernel_thread+46/64] [drbd:_set_cstate+240/560]
Oct 24 15:37:52 kernel: drbd0_worker  S 00000002  4572  4542      1          6151   328 (L-TLB)
Oct 24 15:37:52 kernel: Call Trace:    [sense_data_texts+934/1024] [__down_interruptible+137/240] [__down_failed_interruptible+7/12] [drbd:drbd_worker+1220/1776] [drbd:__insmod_drbd_S.rodata_L838+25823/31698]
Oct 24 15:37:52 kernel:   [drbd:_set_cstate+362/560] [arch_kernel_thread+46/64] [drbd:_set_cstate+240/560]
Oct 24 15:37:52 kernel: drbdsetup     D 4000A490     0  6151      1         11057  4542 (NOTLB)
Oct 24 15:37:52 kernel: Call Trace:    [__down+114/192] [__down_failed+8/12] [drbd:restore_old_sigset+367/942] [drbd:drbd_send_sync_param+98/224] [drbd:drbd_set_state+1516/2304]
Oct 24 15:37:52 kernel:   [drbd:drbd_ioctl+1918/4048] [blkdev_ioctl+53/64] [sys_ioctl+245/707] [system_call+51/56]
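For anyone wading through a longer dump than this one, the blocked tasks can be picked out mechanically. The following is only a rough sketch, not part of DRBD or klogd: the function name blocked_tasks and the regular expression are made up for illustration, and it assumes that the fifth column of each task line is the PID (which agrees with the ps output quoted in this message: 328, 4542, 6151).

```python
import re

# Hypothetical helper: matches the per-task header lines of the sysrq dump
# above, e.g. "kernel: drbdsetup     D 4000A490     0  6151 ...".
# Assumption: columns are name, state, a hex word, a number, then the PID.
TASK_RE = re.compile(
    r"kernel:\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+"
    r"[0-9A-Fa-f]{8}\s+\d+\s+(?P<pid>\d+)\s"
)

def blocked_tasks(log_text):
    """Return (pid, name) for every task shown in state D (uninterruptible)."""
    found = []
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m and m.group("state") == "D":
            found.append((int(m.group("pid")), m.group("comm")))
    return found

SAMPLE = """\
Oct 24 15:37:52 kernel: drbd0_receive D 00000001  4416   328      1          4542   280 (L-TLB)
Oct 24 15:37:52 kernel: Call Trace:    [__down+114/192] [__down_failed+8/12]
Oct 24 15:37:52 kernel: drbd0_worker  S 00000002  4572  4542      1          6151   328 (L-TLB)
Oct 24 15:37:52 kernel: drbdsetup     D 4000A490     0  6151      1         11057  4542 (NOTLB)
"""

print(blocked_tasks(SAMPLE))  # [(328, 'drbd0_receive'), (6151, 'drbdsetup')]
```

On the excerpt above this reports the receiver and drbdsetup, the two tasks sitting in __down.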

>>>or at least give the output of /proc/drbd, and
>>>ps -eo pid,comm,stat,wchan ?
>>
>># cat /proc/drbd
>>version: 0.7.5 (api:76/proto:74)
>>SVN Revision: 1578 build by root at mxtelecom.com, 2004-10-10 18:54:22
>>  0: cs:WFReportParams st:Primary/Unknown ld:Consistent
>>     ns:8715928 nr:0 dw:36426099 dr:7245115 al:8923 bm:2370 lo:0 pe:0 ua:0 ap:0
>>  1: cs:Unconfigured
>>
>># ps -eo pid,comm,stat,wchan
>>   PID COMMAND          STAT WCHAN
>>   328 drbd0_receiver   D    down
>>  4542 drbd0_worker     S    down_interruptible
>>  6151 drbdsetup        D    down
> 
> 
> exactly.
> one owns the semaphore, and waits for the other to die
> while the other tries to get the first semaphore itself.
>  :(
>  
> can you (by looking into heartbeat logfiles e.g.) figure out what this
> drbdsetup tries to do?

The drbdsetup you see there (PID 6151) is one I ran myself,
a while after the module had hung:

# drbdsetup /dev/drbd0 syncer -r 512000

I forget precisely why I was running it - I guess I was
trying to nudge it into reestablishing a connection to the
slave.

Or do you want to know the syntax of the original setup?
DRBD was being run entirely on its own - no heartbeat
or drbdadm - just:

# modprobe drbd
# drbdsetup /dev/drbd0 disk /dev/sda3 internal -1
# drbdsetup /dev/drbd0 primary
# drbdsetup /dev/drbd0 net 10.0.0.2:7788 10.0.0.1:7788 C
# drbdsetup /dev/drbd0 syncer -r 512000
# mount /dev/drbd0 /mnt

to get the master up and running.

> in short, I think we need to down_interruptible sometimes where we
> currently use down.

If there's anything more I can do in trying to reproduce
or investigate where things have hung, just say.
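To check my own understanding of the down vs. down_interruptible point, here is a small user-space analogy. It is not DRBD code and says nothing about which DRBD thread plays which role: the names run, holder and waiter are invented, a threading.Semaphore stands in for the kernel semaphore, and an Event stands in for the signal that asks the waiter to exit.

```python
import threading

def run(interruptible):
    """Return True if the waiter exited while the holder was waiting for it."""
    sem = threading.Semaphore(1)
    stop = threading.Event()  # stand-in for "please die" sent to the waiter

    def waiter():
        if interruptible:
            # Analogue of down_interruptible(): keep trying, but give up
            # as soon as we have been asked to stop.
            while not sem.acquire(timeout=0.05):
                if stop.is_set():
                    return
            sem.release()
        else:
            # Analogue of down(): block until the semaphore is free,
            # no matter what anybody asks of us in the meantime.
            sem.acquire()
            sem.release()

    sem.acquire()                  # the holder owns the semaphore...
    t = threading.Thread(target=waiter, daemon=True)
    t.start()
    stop.set()                     # ...asks the waiter to exit...
    t.join(timeout=1.0)            # ...and waits for it to die
    finished = not t.is_alive()
    sem.release()                  # only here so the demo itself cleans up
    return finished

print(run(interruptible=False))    # False: both sides are stuck
print(run(interruptible=True))     # True: the waiter backs off and exits
```

In the real hang there is no timeout on the waiting side, so the first case never resolves on its own.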

best regards,

Matthew.

-- 
______________________________________________________________
Matthew Hodgson   matthew at mxtelecom.com   Tel: +44 845 6667778
                 Systems Analyst, MX Telecom Ltd.



