[DRBD-user] KVM on multiple DRBD resources + IPMI stonith - questions

Noah Mehl noah at tritonlimited.com
Fri Jun 24 17:45:44 CEST 2011

On Jun 24, 2011, at 10:58 AM, Whit Blauvelt wrote:
> But of course I'm looking at live replacement of guests. It's easy enough to
> do that with a human sitting at virt-manager or drbd-mc. Anything a human can
> do that's a simple, consistent operation like that can be handled by a
> well-designed script.

I have to second this; it's part of the work on my plate right now.  The wonderful thing is that it's a better solution than anything from VMware, etc.
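
For example, a cold failover of a single guest doesn't need to be much more than this.  A rough sketch only: the guest and DRBD resource names ("vm1") and the peer hostname ("peer-node") are made up, not from anyone's actual setup:

    #!/bin/bash
    # Move guest "vm1" from the peer to this node (Primary/Secondary DRBD).
    set -e
    GUEST=vm1
    RES=vm1
    # Cleanly stop the guest on the peer and wait for it to go down.
    ssh peer-node "virsh shutdown $GUEST"
    while ssh peer-node "virsh domstate $GUEST" | grep -q running; do
        sleep 2
    done
    # Swap the DRBD roles, then start the guest locally.
    ssh peer-node "drbdadm secondary $RES"
    drbdadm primary $RES
    virsh start $GUEST

That's exactly the "few seconds down" trade-off Whit mentions: no dual-primary, no cluster filesystem, just a role swap.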

> I truly appreciate that, Pete. What I've been hoping to find is
> distribution-neutral models for handling KVM failover, where each KVM VM is
> directly on top of a dedicated DRBD resource - which is a setup with an
> ideal granularity, as compared to having multiple VMs sharing a single DRBD
> resource. Because the granularity is right-sized, it greatly simplifies the
> possible failure modes, and what should be needed for reliable failover. For
> example, if it were one shared DRBD resource for many VMs running across two
> hosts, it would need to run primary-primary, with CLVM and a clustering file
> system and the full pacemaker-corosync/heartbeat treatment. I get that. But
> with dedicated, per-VM DRBD resources, each can be run primary-secondary (if
> you don't mind a few seconds down during migration - which will be there in
> failover in any case), so there's no need for CLVM or one of the (less
> mature than ext4 or xfs) clustering file systems in the arrangement.

I've done a lot of thinking on this point, but I've come to a different conclusion.  I think a DRBD resource per guest HD is not a great way to go; it adds a lot of complexity and overhead you don't necessarily need.  I do agree with your Primary/Secondary conclusion, though.  My approach is to take an identical storage shelf on each node and divide it by the number of nodes.  So, for two nodes, partition the shelf in half.  That way each node runs its own half as Primary, with the other node's half as Secondary.  In the event of a failover, just promote the failed node's half to Primary on the surviving node.  This gets better in a three-node configuration, because you can use 66% of your CPU and RAM resources instead of the 50% you're limited to in a two-node scenario.

I also suggest XFS and qcow2 containers (or NFS, if you're using a separate storage/host deployment).  I would still put the XFS filesystem on LVM on DRBD, so you can take an LVM snapshot and copy off the qcow2 files for backups, etc.  The reason I say use qcow2 is that when you give each guest its own LV, you waste all the free space on your guests because it's tied up in the LVs.  I have not seen a significant performance difference between LVM and qcow2.  Perhaps you have?
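
The backup flow I mean looks roughly like this.  A sketch only: the VG/LV names, mount point, snapshot size, and backup target are all assumptions:

    #!/bin/bash
    # Snapshot the XFS image volume (on LVM on DRBD) and copy off the qcow2s.
    set -e
    VG=vg0
    LV=lv_images
    SNAP=images_snap
    # Take a point-in-time snapshot of the image volume.
    lvcreate --snapshot --size 10G --name $SNAP /dev/$VG/$LV
    # XFS needs -o nouuid to mount a snapshot alongside its origin.
    mkdir -p /mnt/$SNAP
    mount -o ro,nouuid /dev/$VG/$SNAP /mnt/$SNAP
    # Copy the qcow2 containers to the backup host.
    rsync -a /mnt/$SNAP/ backup-host:/backups/$(date +%F)/
    umount /mnt/$SNAP
    lvremove -f /dev/$VG/$SNAP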


> There also should be a whole lot less needed on the pacemaker-
> corosync/heartbeat side. What I've been hoping to find is documentation on
> just enough of pacemaker-corosync/heartbeat to handle this simplified
> architecture adequately. But most of the documentation isn't aimed towards
> an architecture like this at all, and just about nothing I've found
> addresses a KVM environment. 

I concur.
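
That said, the pacemaker piece for one guest on one dedicated DRBD resource doesn't have to be big.  Something like this is the shape of it (crm shell syntax; every name here is invented, so treat it as a sketch, not a tested config):

    crm configure <<'EOF'
    # Master/slave DRBD resource backing the guest's disk.
    primitive p_drbd_vm1 ocf:linbit:drbd \
        params drbd_resource="vm1" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
    ms ms_drbd_vm1 p_drbd_vm1 \
        meta master-max="1" clone-max="2" notify="true"
    # The KVM guest itself, managed through libvirt.
    primitive p_vm1 ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/vm1.xml" \
        op monitor interval="30s"
    # Run the guest only where DRBD is Primary, and only after promotion.
    colocation c_vm1_on_drbd inf: p_vm1 ms_drbd_vm1:Master
    order o_drbd_before_vm1 inf: ms_drbd_vm1:promote p_vm1:start
    commit
    EOF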

> 
> Because each layer is sliced into its own project, with its own
> documentation, configuration syntax, and mailing list, there's no perfect
> place for addressing these questions - no place I've found dedicated to the
> overview, to the questions of architecture and integration rather than to
> close focus on one of the components. While KVM/QEMU is at an admirable
> state of maturity - and rapidly improving - that doesn't extend to its
> documentation at all. The libvirt list has fascinating discussion at the
> edge of the technology, but even less about the practical issues of
> deployment than we have here. LINBIT has adopted pacemaker because failover
> concerns are a natural fit for DRBD. Failover concerns for KVM VMs hosted on
> dedicated DRBD resources, I'll argue, are also a natural fit for DRBD. I'm
> not sure, in this context, whether pacemaker etc. is the answer. If it is, it
> would be nice to see that documented. 

Maybe we need to write that documentation ourselves?  I know I'm betting my business's future on this set of technologies.  If no one has a better idea, shall we start a KVM Cluster mailing list?  I'd host it.  Just a thought.

Separately, I agree: I'm unable to find good documentation on the clustering stacks.  Even the redhat-cluster-suite documentation is not very complete.  Again, maybe we need to write it ourselves?

~Noah

