/ 2004-07-16 14:52:58 -0500
\ John Lange:
> On Fri, 2004-07-16 at 14:16, Lars Ellenberg wrote:
> > have you any indication *what* may hang?
> > can you enable sysrq, and, when it hangs again,
> > hit sysrq-t (which is showTask)... and see which processes are in "D" state,
> > and where (this includes a stack backtrace of each process).
> > (if you hook up a serial console, you could capture the output)
>
> The machines are in a datacenter across the city so it's not easy to do
> but possible. I'm going to go down there now with a new crossover cable
> to see if that solves the problem.
>
> If not I will resort to what you mentioned above. The biggest problem is
> that it is so unpredictable. Usually it only happens once every 2-3 days
> but sometimes it's 2-3 times a day.
>
> > what happens when you unplug the cable?
> > does the "hung" node recover then?
>
> Haven't tried it but next time I get a hang I'll ask the data centre
> staff to do this instead of the hard-reset. They are fairly cooperative
> but I get the impression I'm starting to use up my free reboot
> support... :|
>
> Thanks very much for your suggestions.

instead of unplugging the cable, you could also do

    iptables -I INPUT -i eth1 -j DROP
    iptables -I OUTPUT -o eth1 -j DROP

(or even only drbd disconnect)
on the surviving box (when eth1 is your crossover cable)...
in case the other box hangs somewhere within DRBD,
it _may_ recover when DRBD recognizes the connection loss.

	lge
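A possible complement to the sysrq-t suggestion, assuming a shell on the affected box still responds: on kernels that provide /proc/sysrq-trigger, the same showTask dump can be requested as root with `echo 1 > /proc/sys/kernel/sysrq` followed by `echo t > /proc/sysrq-trigger`, and then read with `dmesg`. For a quicker look at only the processes stuck in "D" state, a minimal sketch like the one below may help. It assumes a POSIX shell and the procps `ps`; `show_dstate` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: list processes in uninterruptible sleep ("D" state).
# Assumes procps ps: -e selects all processes, -o picks the columns, and
# "wchan:32" widens the WCHAN column (kernel function the process waits in).
show_dstate() {
    # Keep the header line (NR == 1) plus every row whose STAT starts with D.
    ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
}

show_dstate
```

Processes that stay in "D" across several runs would be the natural ones to look up in the full sysrq-t backtraces.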