On 01/11/2012 11:56 AM, Adam Wilbraham wrote:
> Anyway, we moved the servers to a new location over the weekend and
> almost from the point of power up this parallel reboot issue reared
> its head. If the servers were sat there idling they would be fine, but
> the minute I started to boot up domUs I began the risk of it
> happening. Normally the more domUs running, the more likely it was to
> kick a reboot. It seemed like it was most likely to happen when
> starting another domU rather than just doing its running of online
> VMs. Anyway, I eventually narrowed this down to a point where I
> realised that if I unplugged the network cable that was being used for
> DRBD replication then the server would spew output to the screen and
> reboot instantly, with the other one in the pair going about a second
> later.

If you have fencing configured, as you should, then you could have been
seeing a dual-fence problem. Basically, both nodes send off their kill
commands before either of them dies. I believe this is a known issue with
some iLO-based fencing, but I can't quote a source.

Generally, the test is to put a 5-second delay into the fence script on
one node. If that node then reliably dies first and the other node lives,
you have found the issue.

You *are* using fencing, right? ;)

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron
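
For anyone who wants to see why the delay test works, here is a rough toy model of the race in Python. It is only a sketch, not a real fence agent: the function name, the node labels "a" and "b", and the 2-second fence latency are all made up for illustration. It assumes both nodes lose contact at the same moment, each waits its own delay before sending the kill command, and the kill takes a fixed time to actually power off the peer.

```python
def simulate_fence_race(delay_a, delay_b, fence_latency=2.0):
    """Return the set of nodes left alive after both lose contact at t=0.

    Hypothetical model: each node waits its own delay (seconds), then
    sends a kill command that powers off its peer fence_latency seconds
    later. A node that is already dead when its delay expires never
    sends its command.
    """
    send_time = {"a": delay_a, "b": delay_b}
    peer = {"a": "b", "b": "a"}
    death_time = {}

    # Walk the nodes in the order they would send their kill commands.
    for node in sorted(send_time, key=send_time.get):
        already_dead = node in death_time and death_time[node] <= send_time[node]
        if already_dead:
            continue  # fenced before it could fire back
        death_time[peer[node]] = send_time[node] + fence_latency

    return {node for node in send_time if node not in death_time}


if __name__ == "__main__":
    print("no delay:        ", simulate_fence_race(0, 0))
    print("5s delay on b:   ", simulate_fence_race(0, 5))
    print("1s delay on b:   ", simulate_fence_race(0, 1))
```

With no delay on either side, both kill commands are already in flight and nobody survives, which is the dual-fence case. With a 5-second delay on node "b", node "a" fires first and "b" is dead before its delay runs out, so "a" lives. The third call shows the catch in this model: the delay has to be longer than the time the fence action takes, otherwise both nodes still go down.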