Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2004-10-22 12:55:46 +0100
\ Matthew Hodgson:
> > hm.
> > could you kick the kernel log daemon (klogd -i), and
> > then trigger a sysrq Task dump (echo t > /proc/sysrq-trigger) ?
>
> Apologies for the width & length of information here - i haven't truncated
> it as I'm not sure what might be obliquely relevant and what isn't:
>
> # klogd -i
> # echo t > /proc/sysrq-trigger
> # cat /var/log/kern.log
> Oct 22 12:49:46 kernel: drbd0_receive D 00000001 4416 328 1 4542 280 (L-TLB)
> Oct 22 12:49:46 kernel: Call Trace: [<c0105d12>] [<c0105eac>] [<f896ad33>] [<f897a1f9>] [<f896fe7b>]
> Oct 22 12:49:46 kernel: [<f8969bad>] [<f897a1f9>] [<f89663ab>] [<f8969de8>] [<f897a7f0>] [<f896ff5a>]
> Oct 22 12:49:46 kernel: [<c010578e>] [<f896fee0>]
> Oct 22 12:49:46 kernel: drbd0_worker S 00000002 4572 4542 1 6151 328 (L-TLB)
> Oct 22 12:49:46 kernel: Call Trace: [<c0311ca6>] [<c0105de9>] [<c0105eb7>] [<f8965174>] [<f897a46d>]
> Oct 22 12:49:46 kernel: [<f896ff5a>] [<c010578e>] [<f896fee0>]
> Oct 22 12:49:46 kernel: drbdsetup D 4000A490 0 6151 1 4542 (NOTLB)

there we are.
drbdsetup and drbd0_receiver deadlocking each other.
wtf.

btw, my hope was that if you had klogd running, those funny numbers
would get decoded to kernel symbols... obviously this did not work :(

> > or at least give the output of /proc/drbd, and
> > ps -eo pid,comm,stat,wchan ?
>
> # cat /proc/drbd
> version: 0.7.5 (api:76/proto:74)
> SVN Revision: 1578 build by root at mxtelecom.com, 2004-10-10 18:54:22
> 0: cs:WFReportParams st:Primary/Unknown ld:Consistent
> ns:8715928 nr:0 dw:36426099 dr:7245115 al:8923 bm:2370 lo:0 pe:0 ua:0 ap:0
> 1: cs:Unconfigured
>
> # ps -eo pid,comm,stat,wchan
> PID COMMAND STAT WCHAN
> 328 drbd0_receiver D down
> 4542 drbd0_worker S down_interruptible
> 6151 drbdsetup D down

exactly.
one owns the semaphore, and waits for the other to die,
while the other tries to get that same semaphore itself. :(

> many thanks for looking into this :)

can you (by looking into heartbeat logfiles e.g.)
figure out what this drbdsetup tries to do?

in short, I think we need to use down_interruptible in some places
where we currently use down.

	Lars Ellenberg

--
please use the "List-Reply" function of your email client.
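
For illustration only, here is a small userspace sketch of the pattern described above: one party holds a semaphore and waits for the other to exit, while the other waits on that same semaphore. It is not DRBD code. The function name `run`, the `stop` event (standing in for a signal), and the polling loop (standing in for down_interruptible) are all assumptions made for this example, and which role corresponds to drbdsetup or drbd0_receiver is not something the output above settles.

```python
import threading


def run(interruptible: bool, timeout: float = 0.5) -> str:
    """Return what happens to the waiter: 'deadlock' or 'interrupted'.

    Hypothetical userspace analogy, not kernel code.
    """
    sem = threading.Semaphore(1)
    stop = threading.Event()  # stands in for a signal sent to the waiter
    result = []

    def waiter():
        if interruptible:
            # Analogy to down_interruptible(): keep waiting, but give up
            # as soon as a stop request is seen.
            while not sem.acquire(timeout=0.01):
                if stop.is_set():
                    result.append("interrupted")
                    return
        else:
            # Analogy to down(): the stop request is never looked at.
            sem.acquire()
        result.append("acquired")
        sem.release()

    sem.acquire()  # the holder takes the semaphore first
    t = threading.Thread(target=waiter, daemon=True)
    t.start()
    stop.set()       # holder asks the waiter to die ...
    t.join(timeout)  # ... and waits for it while still holding the semaphore
    outcome = "deadlock" if t.is_alive() else result[0]
    sem.release()    # let a stuck waiter finish so the demo cleans up
    return outcome


if __name__ == "__main__":
    print("down-like wait:              ", run(interruptible=False))
    print("down_interruptible-like wait:", run(interruptible=True))
```

With the uninterruptible wait the holder's join only returns because of the timeout, which the sketch reports as a deadlock. With the interruptible wait the waiter notices the stop request and exits, so the holder can proceed.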