On Tue, Jun 30, 2009 at 10:11:40AM +0200, Kris Buytaert wrote:
> On Thu, 2009-06-25 at 11:42 +0200, Kris Buytaert wrote:
>
> > > Use a serial console, attach that to some "monitoring" host.
> > > (you can use USB-to-Serial, they are cheap and work), and log
> > > on that one. You'll get the last messages from there.
> > >
> >
> > I indeed had hoped to see some output on the serial console when the
> > reboots happened .. but the best I got so far was a partial timestamp
> > with no further explanation before the reboot output started again ..
> >
> > Any other ideas ?
> >
>
> Update :
>
> The problem is indeed ocfs2 fencing off the systems , the logging
> however does not show up in a serial console it DOES show up when using
> netconsole
>
>
> [base-root@CCMT-A ~]# nc -l -u -p 6666
> (8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device
> drbd0 after 478000 milliseconds
> (8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active
> regions.
> ocfs2 is very sorry to be fencing this system by restarting
> ,
>
> One'd think that it output over Serial console before it log over the
> network :) It doesn't .
>
>
> Next step is that I'll start fiddling some more with the timeout
> values :)

478000 milliseconds.
Almost 8 minutes!

No, I don't think fiddling with timeouts will help.
There is something else seriously wrong.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed