On Wed, Jan 11, 2012 at 5:56 PM, Adam Wilbraham <adam.wilbraham at technophobia.com> wrote:
> I've spent the past couple of days trying to get a pair of servers
> into a state of stability and thought I would try and get down my
> issues into a post & possible bug report whilst I have them in my
> head. I'm probably going to miss some bits of information out, but
> I'll try and get down what I remember. It's been a busy couple of days
> & the combinations I've tried might mean that none of this is useful
> for debugging because I haven't logged down as much as I should have
> done as I was going through the troubleshooting process as I've been
> blocking colleagues and therefore against the clock.
>
> I started out with a pair of HP DL360 G6's which were built
> approximately a year ago, running Debian Squeeze (before it became
> stable) with Xen 4.0 on top all from Apt and with DRBD 8.3.10 built
> from source (I believe this was stable at the time). The pair of
> servers have only been used as internal development hosts and were
> never patched up when Debian went stable, so the kernel version was a
> little out of date
> (xen-linux-system-2.6.32-5-xen-amd64_2.6.32-30_amd64.deb) as were
> other packages. Over the last year the pair have been stable almost
> all of the time, but we did have a couple of incidents where the pair
> would reboot in tandem but because they weren't business critical the
> resolution of this was never a priority.
>
> Anyway, we moved the servers to a new location over the weekend and
> almost from the point of power up this parallel reboot issue reared
> its head. If the servers were sat there idling they would be fine, but
> the minute I started to boot up domUs I began the risk of it
> happening. Normally the more domUs running, the more likely it was to
> kick a reboot. It seemed like it was most likely to happen when
> starting another domU rather than just doing its running of online
> VMs.
> Anyway, I eventually narrowed this down to a point where I
> realised that if I unplugged the network cable that was being used for
> DRBD replication then the server would spew output to the screen and
> reboot instantly, with the other one in the pair going about a second
> later.

You didn't have the chance to hook up a serial terminal and capture
the log messages that way, I suppose? Any idea whether you were
getting a kernel panic, or an oops?

And, just checking, you did follow
http://www.drbd.org/users-guide-8.3/s-xen-drbd-mod-params.html?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now