On Fri, 2004-07-16 at 14:16, Lars Ellenberg wrote:
> have you any indication *what* may hang?
> can you enable sysrq, and, when it hangs again,
> hit sysrq-t (which is showTask)... and see which processes are in "D" state,
> and where (this includes a stack backtrace of each process).
> (if you hook up a serial console, you could capture the output)

The machines are in a datacenter across the city, so it's not easy to do, but it is possible. I'm going to go down there now with a new crossover cable to see if that solves the problem. If not, I will resort to what you mentioned above.

The biggest problem is that it is so unpredictable. Usually it only happens once every 2-3 days, but sometimes it's 2-3 times a day.

> what happens when you unplug the cable?
> does the "hung" node recover then?

I haven't tried it, but the next time I get a hang I'll ask the data centre staff to do this instead of the hard reset. They are fairly cooperative, but I get the impression I'm starting to use up my free reboot support... :|

Thanks very much for your suggestions.

-- 
John Lange
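
For reference, the check for processes in "D" state suggested in the quoted text can be roughly approximated from userspace, provided the node still responds enough to run a script. The following is a minimal sketch, not part of the original thread: it assumes a Linux /proc filesystem, and the helper names parse_stat and blocked_processes are hypothetical. Unlike sysrq-t, it does not produce kernel stack backtraces; it only lists which processes are currently in uninterruptible sleep.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running this periodically and logging the output to a remote host might show which tasks pile up in "D" state before a hang, but if the machine locks up completely, the sysrq and serial console approach described above remains the more reliable option.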