Hi,

After various diversions from the project, I'm back to gathering clues on what should be a simple setup. But either my clue basket is leaking, or some of the necessary clues take better luck than mine to find.

It should be a fairly simple setup to complete. It consists of two servers, dedicated only to hosting KVM VMs, which run raw on LVM volumes, which sit on top of individual DRBD resources. Half the VMs run on each of the servers. So far so good. All that needs to happen now is: if one host goes down,
- fence it from the other
- promote the DRBD partitions for the VMs it wasn't running to primary
- start the KVM VMs it wasn't already running
- and notify me.

The step for today is trying to get IPMI stonith working, using Pacemaker 1.0.9 and Heartbeat 3.0.2. I've tried configuring external/ipmi through DRBD-MC, but all I end up with is a gray box saying "Starting ... Not Running," even though the various connection settings I've put in are known good. The logs show it generating a bunch of XML, with no errors, but no success either. Evidently I'll need to configure this by hand. There are bits of documentation like this out there: http://www.karlkatzke.com/ipmi-stonith-howto-with-pacemaker/ but it's just a fragment. Let's assume I can work out the missing pieces of that part of the puzzle.

Then there's the promotion of DRBD resources, for which it looks like there are a few different, somewhat disputed, ways to handle things. Advice on the simplest reliable way would be welcome.

As for a resource agent to start a defined list of KVM VMs, I don't see one in Pacemaker or Heartbeat at all. Should I find one elsewhere, or do I need to learn how to write one? If so, is there a good template around?

The notification I'm not too worried about, since I can get most of what I want from Nagios running elsewhere. As an alternative, it wouldn't take much of a custom script to send the IPMI reset message, promote a list of DRBD partitions, and start a list of KVMs.
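For what it's worth, here's my current guess at the by-hand stonith configuration, extrapolated from that howto — a sketch only, with hostnames, IPMI addresses, and credentials as placeholders:

```
# crm configure sketch: one external/ipmi stonith resource per host,
# each pinned away from the node it is meant to shoot.
# node1/node2, the IPs, and the credentials are placeholders.
primitive stonith-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="192.168.1.101" \
               userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive stonith-node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="192.168.1.102" \
               userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
location l-stonith-node1 stonith-node1 -inf: node1
location l-stonith-node2 stonith-node2 -inf: node2
property stonith-enabled="true"
```

If someone can confirm whether that's the right shape for external/ipmi on this Pacemaker/Heartbeat combination, that would already help.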
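Similarly, for the DRBD promotion and VM start, here's the shape I think each per-VM stack would take — assuming (which I haven't confirmed) that an ocf:heartbeat:VirtualDomain agent is available in the resource-agents on my systems. All names and paths are made up:

```
# crm configure sketch for one VM: a master/slave DRBD resource,
# promoted on whichever node runs the VM. "vm1" is a placeholder.
primitive drbd-vm1 ocf:linbit:drbd \
        params drbd_resource="vm1" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
ms ms-drbd-vm1 drbd-vm1 \
        meta master-max="1" clone-max="2" notify="true"
primitive vm1 ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/vm1.xml" \
        op monitor interval="30s"
colocation vm1-on-drbd inf: vm1 ms-drbd-vm1:Master
order vm1-after-drbd inf: ms-drbd-vm1:promote vm1:start
```

If that's roughly right, repeated once per VM, then maybe I don't need a "start a list of VMs" agent at all — but corrections welcome.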
The trickier part of that approach would be joining it with Pacemaker/Heartbeat, which I presume should still be preferred for the initial failure detection over simply sending out pings from a script. There'd also have to be some conditionals for initial system startup, to make sure not to promote DRBD resources or start VMs that the other system had already taken over in failover. But doing it all "by hand" looks doable.

So, short of taking the long way around and spending weeks more studying the whole Pacemaker/OpenAIS universe, what's the simplest way to a reasonably dependable two-host KVM-VM-on-LVM-on-DRBD private cloud? I'm totally open to doing it "wrong." I'm running an NFS server consisting of two systems with DRBD and UCARP plus some simple scripts, and its failover tests have been flawless. But that approach obviously isn't adaptable to this KVM puzzle. Nor are Red Hat-specific methods; I've got this on Ubuntu.

And I'd really like the solution to avoid having more layers than it needs - some of the cluster resource stacks are top-heavy with their jack-of-all-trades approaches. I'm just trying to master this relatively simple system.

Best,
Whit
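P.S. For concreteness, the "by hand" alternative I have in mind is roughly the following. Resource and domain names are placeholders, I'm assuming each DRBD resource is named after its VM, and it deliberately omits the startup conditionals mentioned above:

```sh
#!/bin/sh
# Rough failover sketch: fence the peer via IPMI, promote our DRBD
# resources, then start the VMs the peer had been running.
# The IP, credentials, and resource names are placeholders.
PEER_IPMI=192.168.1.102
DRBD_RESOURCES="vm1 vm2 vm3"

# Fence the dead host before touching its data; bail out if that fails.
ipmitool -I lanplus -H "$PEER_IPMI" -U admin -P secret \
        chassis power off || exit 1

for r in $DRBD_RESOURCES; do
    drbdadm primary "$r"    # take over the DRBD resource
    virsh start "$r"        # start the matching KVM domain
done
```

Simple enough — which is exactly why I suspect I'm missing the reasons it's harder than it looks.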