[DRBD-user] Unexplained reboots in DRBD82 + OCFS2 setup

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jun 30 10:52:02 CEST 2009



On Tue, Jun 30, 2009 at 10:11:40AM +0200, Kris Buytaert wrote:
> On Thu, 2009-06-25 at 11:42 +0200, Kris Buytaert wrote:
> 
> > > Use a serial console, attach that to some "monitoring" host.
> > > (you can use USB-to-Serial adapters, they are cheap and work), and log
> > > on that one. You'll get the last messages from there.
> > > 
> > I had indeed hoped to see some output on the serial console when the
> > reboots happened, but the best I have got so far is a partial timestamp
> > with no further explanation before the reboot output started again.
> > 
> > Any other ideas ? 
> > 
> 
> Update : 
> 
> The problem is indeed ocfs2 fencing off the systems. The logging,
> however, does not show up on the serial console; it DOES show up when
> using netconsole:
> 
> 
> [base-root@CCMT-A ~]# nc -l -u -p 6666
> (8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device
> drbd0 after 478000 milliseconds
> (8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active
> regions.
> ocfs2 is very sorry to be fencing this system by restarting
> ,
> 
> One would think it would write to the serial console before it logs
> over the network :) It doesn't.
>
> My next step is to start fiddling some more with the timeout
> values :)
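
The `nc -l -u -p 6666` listener quoted above only prints to the terminal. As a hedged sketch of the earlier advice to "log on that one" (the monitoring host), a small Python UDP listener could timestamp each netconsole datagram and append it to a file, so the last messages before a fence survive. It assumes netconsole on the rebooting node is already configured to send to this host on UDP port 6666 (the port from the `nc` command); the function names and log path are hypothetical.

```python
import socket
import time
from typing import Optional, Tuple

# Assumption: the rebooting node's netconsole targets this host on UDP 6666,
# the same port the quoted `nc -l -u -p 6666` command listens on.
LISTEN_ADDR: Tuple[str, int] = ("0.0.0.0", 6666)


def format_record(payload: bytes, sender: str, now: Optional[float] = None) -> str:
    """Prefix one netconsole datagram with a local timestamp and the sender address."""
    if now is None:
        now = time.time()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # Kernel messages are usually ASCII; replace undecodable bytes instead of crashing.
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{stamp} {sender} {text}\n"


def serve(log_path: str, addr: Tuple[str, int] = LISTEN_ADDR) -> None:
    """Receive netconsole datagrams forever, appending each one to log_path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(addr)
    # Line-buffered text file, plus an explicit flush, so nothing sits in a buffer.
    with open(log_path, "a", buffering=1) as log:
        while True:
            payload, (host, _port) = sock.recvfrom(4096)
            log.write(format_record(payload, host))
            log.flush()
```

Started with something like `serve("/var/log/netconsole.log")` on the monitoring host, it records the same o2hb messages as `nc`, but with the receiving host's timestamps, which helps correlate fences with other events.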


478000 milliseconds.
That is almost 8 minutes!
No, I don't think fiddling with timeouts will help.
There is something else seriously wrong.
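
To put the 478000 ms figure in context, here is a small arithmetic sketch. It assumes (this is not stated in the thread) that the o2hb write timeout is computed as a 2000 ms region timeout multiplied by (dead threshold - 1), mirroring the O2HB_MAX_WRITE_TIMEOUT_MS macro in the kernel's OCFS2 heartbeat code, and that 31 is the default threshold. The helper names are hypothetical.

```python
# Assumption (not from this thread): write timeout = REGION_TIMEOUT_MS * (dead_threshold - 1),
# mirroring the O2HB_MAX_WRITE_TIMEOUT_MS macro in the kernel's OCFS2 heartbeat code.
REGION_TIMEOUT_MS = 2000  # assumed heartbeat region timeout, in milliseconds


def write_timeout_ms(dead_threshold: int) -> int:
    """Write timeout (ms) implied by a given heartbeat dead threshold."""
    return REGION_TIMEOUT_MS * (dead_threshold - 1)


def implied_dead_threshold(timeout_ms: int) -> int:
    """Invert write_timeout_ms(); reject values the assumed formula cannot produce."""
    if timeout_ms % REGION_TIMEOUT_MS != 0:
        raise ValueError(f"{timeout_ms} ms is not a multiple of {REGION_TIMEOUT_MS} ms")
    return timeout_ms // REGION_TIMEOUT_MS + 1


def as_minutes_seconds(ms: int) -> tuple:
    """Split a millisecond count into whole (minutes, seconds)."""
    return divmod(ms // 1000, 60)


reported = 478000  # from the o2hb_write_timeout message quoted above
print(as_minutes_seconds(reported))      # (7, 58)
print(implied_dead_threshold(reported))  # 240
print(write_timeout_ms(31))              # 60000, for an assumed default threshold of 31
```

If the assumed formula holds, the message corresponds to a dead threshold of 240 rather than a default-sized window of about a minute. That is consistent with the view above that raising the timeout further is unlikely to help when writes to drbd0 stall for that long.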

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed



More information about the drbd-user mailing list