/ 2004-10-24 16:01:36 +0100
\ Matthew Hodgson:
> oops - my bad; I haven't played with the commandline options for
> klogd before - here we go with the symbols being dereferenced from
> (hopefully the correct) System.map:

hm... they look a bit strange.
in case you can reproduce the hang, please recompile drbd with
DBG_ALL_SYMBOLS defined in drbd_config.h (and don't forget to make
sure the recompiled module is installed and loaded!)

> (apologies again for supersize lines...)
>
> Oct 24 15:37:52 kernel: drbd0_receive D 00000001 4416 328 1 4542 280 (L-TLB)
> Oct 24 15:37:52 kernel: Call Trace: [__down+114/192] [__down_failed+8/12] [drbd:drbd_asender+1923/2032]

eh?

> [drbd:__insmod_drbd_S.rodata_L838+25195/31698] [drbd:_set_cstate+139/560] [drbd:drbd_send_handshake+173/656]

huh? confused call trace, I guess.

> [drbd:__insmod_drbd_S.rodata_L838+25195/31698] [drbd:drbd_connect+523/14688] [drbd:drbdd_init+88/2080] [drbd:__insmod_drbd_S.rodata_L838+26722/31698] [drbd:_set_cstate+362/560] [arch_kernel_thread+46/64] [drbd:_set_cstate+240/560]
> Oct 24 15:37:52 kernel: drbd0_worker S 00000002 4572 4542 1 6151 328 (L-TLB)
> Oct 24 15:37:52 kernel: Call Trace: [sense_data_texts+934/1024] [__down_interruptible+137/240] [__down_failed_interruptible+7/12] [drbd:drbd_worker+1220/1776] [drbd:__insmod_drbd_S.rodata_L838+25823/31698] [drbd:_set_cstate+362/560] [arch_kernel_thread+46/64] [drbd:_set_cstate+240/560]

ok, this thread is fine.

> Oct 24 15:37:52 kernel: drbdsetup D 4000A490 0 6151 1 11057 4542 (NOTLB)
> Oct 24 15:37:52 kernel: Call Trace: [__down+114/192] [__down_failed+8/12] [drbd:restore_old_sigset+367/942] [drbd:drbd_send_sync_param+98/224] [drbd:drbd_set_state+1516/2304] [drbd:drbd_ioctl+1918/4048] [blkdev_ioctl+53/64] [sys_ioctl+245/707] [system_call+51/56]

this should never end up there in the first place.
but yes, it is expected to hang when the drbd_receiver
hangs itself in the sending function.
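
To see at a glance which entries in traces like the ones above look unreliable, each bracketed entry can be split into module, symbol, offset and size. The sketch below is only an illustration under assumptions: the name `parse_trace`, the regular expression and the "suspect" rule are made up here and are not part of drbd or klogd. It assumes the `[module:symbol+offset/size]` layout seen in the quoted log, and it assumes that a `__insmod_*` name is a section marker added at module load time rather than a real function, which would explain why such entries carry very large offsets.

```python
import re

# One bracketed entry as it appears in the quoted log:
# [module:symbol+offset/size], with "module:" absent for core kernel symbols.
ENTRY = re.compile(
    r"\[(?:(?P<module>[\w.]+):)?(?P<symbol>[\w.]+)"
    r"\+(?P<offset>\d+)/(?P<size>\d+)\]"
)


def parse_trace(line):
    """Return one dict per call trace entry found in a log line."""
    entries = []
    for m in ENTRY.finditer(line):
        symbol = m.group("symbol")
        entries.append({
            "module": m.group("module"),   # None for core kernel symbols
            "symbol": symbol,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            # Assumption: __insmod_* names mark a section, not a function,
            # so the address was not resolved to anything meaningful.
            "suspect": symbol.startswith("__insmod_"),
        })
    return entries


if __name__ == "__main__":
    line = ("Call Trace: [__down+114/192] [__down_failed+8/12] "
            "[drbd:drbd_asender+1923/2032] "
            "[drbd:__insmod_drbd_S.rodata_L838+25195/31698]")
    for entry in parse_trace(line):
        flag = "  <-- unresolved?" if entry["suspect"] else ""
        print(f'{entry["module"] or "kernel"}:{entry["symbol"]}'
              f'+{entry["offset"]}/{entry["size"]}{flag}')
```

If the assumption about `__insmod_*` holds, fewer flagged entries after rebuilding with DBG_ALL_SYMBOLS would suggest the new module is really the one loaded.
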
> If there's anything more I can do in trying to reproduce
> or investigate where things have hung, just say.

well, it is a race, and to reproduce races is not easy...
and I'm still not seeing who takes part in this race, because
actually, the only competitor should be the drbd_receiver thread itself.

so, just try to reproduce it some more times :)
and enable those DBG_ALL_SYMBOLS, and then maybe try if you can
still reproduce with those two small fixes in current svn.

If you really can reproduce this with not too much effort, I'm sure I
can come up with some "print style" debug patch to find exactly
where it hangs. but you should be able to reproduce it first.

	Lars Ellenberg

-- 
please use the "List-Reply" function of your email client.