[DRBD-user] DRBD primary/primary for KVM - what is the best option?

Bart Coninckx bart.coninckx at telenet.be
Tue May 11 10:44:12 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Tuesday 11 May 2010 01:28:37 Michael Iverson wrote:
> Thanks all for your input. Based on what was posted here, plus some
> private discussions, and some reflection, I think I've come to my
> senses, and really only have two viable options:
> 
> 1. iSCSI with a conventional primary/secondary setup and failover.
> Take the disk performance hit in exchange for flexible live migration.
> This should be perfect for some of my not-very-io-intensive tasks. I
> might still play with multipath for network redundancy and link
> throughput, but it will be multipath to one host at a time.
> 
> 2. The multiple-DRBD-partition approach that Ben Timby mentioned. This
> should be better for IO-bound applications, while avoiding much of the
> split-brain headache caused by a single-volume primary/primary setup.
> The only downside is that live migration is only going to be between
> two machines.
> 
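For what it's worth, a minimal sketch of the DRBD side of the first
option: one big single-primary resource that the active node exports over
iSCSI. Hostnames, devices and IP addresses below are placeholders, not
taken from this thread:

  resource r_iscsi {
    protocol C;
    # whichever node Heartbeat promotes to Primary also runs the iSCSI
    # target and carries the service IP
    on node-a {
      device    /dev/drbd0;
      disk      /dev/vg0/iscsi_store;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/vg0/iscsi_store;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }
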
> For my initial cluster I'm building now, I'm going to go with the
> former method. However, the latter method will probably be ideal for
> phase two of my HA project, which will tackle a rather large database
> we have.
> 
> Mike
> 
> On Mon, May 10, 2010 at 4:23 PM, Michael Iverson <miverson at 4hatteras.com> wrote:
> > Ben,
> >
> > This is an interesting approach. I can see how recovering from split
> > brain would be relatively easy in this situation, as each volume
> > would, in effect, be operating in a primary/secondary role, as the
> > virtual image would only be running in a single location.
> >
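As an aside, manual split-brain recovery on such a per-VM resource usually
just means throwing away the changes on the node that was not running that
VM. Roughly, with DRBD 8.3 syntax and an invented resource name:

  # on the node whose changes should be discarded
  drbdadm secondary vm01
  drbdadm -- --discard-my-data connect vm01

  # on the node that kept running the VM
  drbdadm connect vm01
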
> > The only reason I've brought iSCSI into the mix is to have the option
> > of including a third node in the cluster, used to offload processing
> > if one of the primary nodes is down. This, in theory, would allow the
> > cluster to run at higher utilization.
> >
> > I guess there are three ways I could bring the third node into play:
> >
> > 1. Use direct disk access for normal production in a manner similar to
> > your setup. If I need to use the third node, I could manually start
> > iSCSI on the drbd images, and cold migrate some of the VMs to the third
> > node. This way, I'd get good performance in normal production, but
> > could still bring the third node into play when needed, at the cost of
> > losing live migration.
> >
> > 2. Use a similar approach to yours, only with the addition of iSCSI.
> > Each LUN could be associated with the same compute node where the VM
> > image normally runs, with failover to the opposite node. This would
> > allow the iSCSI traffic to remain local to the machine in most cases,
> > and only use the network in the event the image was migrated to
> > another node. This would keep live migration between any pair of nodes
> > a viable option. If I can figure out how to hot migrate individual
> > iSCSI targets, it would be even more flexible.
> >
> > I suppose I'll have to do some benchmarks to see what performance loss
> > a local and/or remote iSCSI service brings to the VM compared with
> > direct block device access.
> >
> > 3. The third option would be an all iSCSI setup similar to what I
> > mentioned in my original email, with the DRBD cluster acting as a
> > load-balanced, multipath SAN.
> >
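On the second option, with IET this could be as simple as one target per
DRBD resource, normally started on the node where that VM lives and moved
(together with the DRBD Primary role and a service IP) by the cluster
manager on failover. A made-up ietd.conf fragment, just to show the shape:

  # /etc/ietd.conf -- one iSCSI target per VM disk (IQNs and devices invented)
  Target iqn.2010-05.com.example:vm01
      Lun 0 Path=/dev/drbd1,Type=blockio
  Target iqn.2010-05.com.example:vm02
      Lun 0 Path=/dev/drbd2,Type=blockio
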
> > Thanks,
> >
> > Mike
> >
> > On Mon, May 10, 2010 at 2:34 PM, Ben Timby <btimby at gmail.com> wrote:
> >> Michael, sorry I did not speak about KVM; I have never used it. My
> >> experience is with Xen, so I can only assume you can do something
> >> similar with KVM. My point was that having a dedicated DRBD resource
> >> for each VM (as opposed to replicating the entire volume) gives you
> >> granular control over each virtual disk, allowing you to move a VM
> >> from one machine to the other without requiring a primary/primary
> >> setup and a cluster file system. This is of course at the expense of
> >> having many LVs and DRBD resources, but I have not run into any
> >> issues with my setup so far.
> >>
> >> On Mon, May 10, 2010 at 2:29 PM, Ben Timby <btimby at gmail.com> wrote:
> >>> Michael, I have a similar setup; however, I am going with a simpler
> >>> configuration.
> >>>
> >>> I have two Xen hypervisor machines, each with a 1TB volume. I use LVM
> >>> on top of the 1TB volume to carve out LVs for each VM hard drive.
> >>>
> >>> Then I have DRBD replicating each LV. So, currently I have 14 DRBD
> >>> devices, and I add a new DRBD resource whenever I create a new VM.
> >>>
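For anyone wanting to copy this layout, adding a VM boils down to
something like the following (VG, LV and resource names are only
examples), plus a matching resource stanza in drbd.conf on both nodes:

  # carve out a backing LV for the new VM's disk (run on both hypervisors)
  lvcreate -L 20G -n vm15-disk vg_xen

  # initialise and bring up the new DRBD resource (run on both hypervisors)
  drbdadm create-md vm15
  drbdadm up vm15

  # kick off the initial sync from one node only
  drbdadm -- --overwrite-data-of-peer primary vm15
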
> >>> This allows each VM to migrate from one hypervisor to the other
> >>> independently. All the DRBD resources are set up for dual primary;
> >>> this is needed to support Xen live migration.
> >>>
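The dual-primary part is just a couple of lines in each per-VM resource;
something along these lines (resource name invented), with the Xen
block-drbd helper taking care of actually promoting the resource when a
domain starts or migrates:

  resource vm01 {
    net {
      # both nodes may hold the device Primary at once, needed during
      # the live migration window
      allow-two-primaries;
    }
    # ... plus the usual on <host> { device/disk/address/meta-disk } sections
  }
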
> >>> I let Heartbeat manage the VMs and I use the drbd: storage type for
> >>> the Xen VMs, so Xen can handle the DRBD resources. This gives me
> >>> failover for all the VMs, as well as manual live migration. Currently
> >>> I run half the VMs on each hypervisor (7/7) to spread the load; of
> >>> course, Heartbeat will boot up the VMs on the remaining hypervisor if
> >>> one of the systems fails. When I perform maintenance, I can put a
> >>> Heartbeat node into standby and the VMs live migrate.
> >>>
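For reference, the drbd: storage type mentioned here is just the disk
specification in the domU config; a fragment might look like this (names
are examples):

  # /etc/xen/vm01.cfg -- the block-drbd script resolves "drbd:vm01" to the
  # right /dev/drbdX device and promotes the resource as needed
  name = "vm01"
  disk = [ 'drbd:vm01,xvda,w' ]

Live migration is then the usual "xm migrate --live vm01 other-node", or
simply putting a Heartbeat node into standby as described above.
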
> >>> This has been a very stable configuration for me.
> >>
> >
> > --
> > Dr. Michael Iverson
> > Director of Information Technology
> > Hatteras Printing
> 
What also might be a nice idea (though theoretical; I don't know if this is
feasible) is to have a primary/secondary setup for iSCSI and have Heartbeat
take care of individual targets on individual DRBD resources. One set could
run on a separate IP address on one node and the other set could run on the
other node on a different IP address. This would divide the load evenly.
However, while failing over, the Heartbeat resource agent should instruct IETD
(if this is the iSCSI software you are using) to first bind to the other IP
address and then dynamically load the new targets.
I'm not sure whether this is possible at all, or whether it would break
existing iSCSI sessions, but it would be a nice thing to have ...
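
To make that a bit more concrete, with Heartbeat in classic haresources
mode it could look something like the following. The node names and IPs
are invented, and iet_targets_a / iet_targets_b would have to be small
wrapper scripts that bind IETD to the right address and load only that
node's set of targets:

  # /etc/ha.d/haresources -- each group of targets prefers a different node
  node-a  IPaddr::192.168.1.10/24  drbddisk::vm01  drbddisk::vm02  iet_targets_a
  node-b  IPaddr::192.168.1.11/24  drbddisk::vm03  drbddisk::vm04  iet_targets_b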


Bart


