[DRBD-user] Tales of woe - possible Debian Squeeze kernel bugs, possible DRBD bugs, possible Xen bugs...

Digimer linux at alteeve.com
Wed Jan 11 18:15:58 CET 2012


On 01/11/2012 11:56 AM, Adam Wilbraham wrote:
> Anyway, we moved the servers to a new location over the weekend and
> almost from the point of power up this parallel reboot issue reared
> its head. If the servers were sat there idling they would be fine, but
> the minute I started to boot up domUs I began the risk of it
> happening. Normally the more domUs running, the more likely it was to
> kick a reboot. It seemed like it was most likely to happen when
> starting another domU rather than just doing its running of online
> VMs. Anyway, I eventually narrowed this down to a point where I
> realised that if I unplugged the network cable that was being used for
> DRBD replication then the server would spew output to the screen and
> reboot instantly, with the other one in the pair going about a second
> later.

If you have fencing configured, as you should, then you could have been
seeing a dual-fence problem. Basically, both nodes send off their kill
commands before either of them dies, so both go down. I believe this is
a known issue with some iLO-based fencing, but I can't quote a source.
Generally, the test is to put a 5-second delay into the fence script on
one node. If the delayed node then reliably dies first and the other
node lives, you have found the issue.
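
To make the delay test concrete, here is a rough sketch in Python. It is
only an illustration, not taken from any real fence agent: the node name
"node2", the function names, and the idea of wrapping the existing fence
command are all assumptions. The fence command itself is passed in
unchanged, so the sketch says nothing about what your agent runs.

```python
import subprocess
import time

# Hypothetical values for illustration; pick the node you want to lose the race.
DELAYED_NODE = "node2"
FENCE_DELAY_SECONDS = 5


def fence_delay(local_node, delayed_node=DELAYED_NODE, delay=FENCE_DELAY_SECONDS):
    """Return how many seconds this node waits before fencing its peer."""
    return delay if local_node == delayed_node else 0


def fence_peer(local_node, fence_command, sleep=time.sleep, run=subprocess.call):
    """Wait for this node's delay, then run the existing fence command.

    fence_command is the argument list your current fence script already
    executes; sleep and run are injectable so the logic can be tested
    without actually killing anything.
    """
    sleep(fence_delay(local_node))
    return run(fence_command)
```

With this in place only the delayed node waits, so in a split the other
node gets its kill command out first. If the reboots stop being
simultaneous, the dual fence was the cause.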

You *are* using fencing, right? ;)

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron


