[DRBD-user] DRBD or not DRBD ?

Whit Blauvelt whit+drbd at transpect.com
Sun Apr 24 17:34:03 CEST 2011


On Sun, Apr 24, 2011 at 10:39:01AM -0400, Digimer wrote:

>   OCFS2 and GFS2 require cluster locking, which comes with a fair amount
> of overhead. Primary/Secondary DRBD with a "normal" filesystem like ext3
> will certainly be faster, but in Secondary, you can not access the
> Secondary resource at all.


>   Given the relative trivial expense of network cards, I always
> recommend three separate networks; Internet Facing, Storage and
> Back-Channel (cluster comms + live migrations when clustering VMs).


All useful stuff. Thanks. I hadn't considered three rather than two
networks. That's a good case for it.

Here's what I'm trying to scope out, and from your comments it looks to be
territory you're well familiar with. I've got two systems set up with KVM
VMs, where each VM is on its own LVM, currently each with primary-secondary
DRBD, where the primary roles are balanced across the two machines. As far
as I can tell, and from past comments here, it's necessary to go
primary-primary to enable KVM live migration, which is a very nice feature
to have. None of the VMs in this case have critical disk performance
requirements, so any slowdown primary-primary introduces, if it does in this
context, isn't a problem.
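For concreteness, dual-primary for one of these resources would look roughly
like this (a sketch only: the resource name, hostnames, addresses and the
split-brain policies are made up for illustration; DRBD 8.3-style syntax):

```
resource vm-foo {
  protocol C;
  net {
    allow-two-primaries;                 # permit dual-primary for live migration
    after-sb-0pri discard-zero-changes;  # example split-brain recovery policies
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on host-a {
    device    /dev/drbd0;
    disk      /dev/vg0/vm-foo;           # the VM's dedicated logical volume
    address   10.0.1.1:7788;
    meta-disk internal;
  }
  on host-b {
    device    /dev/drbd0;
    disk      /dev/vg0/vm-foo;
    address   10.0.1.2:7788;
    meta-disk internal;
  }
}
```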

Since each VM is in raw format, directly on top of DRBD, on top of its own
dedicated logical volume, there is no normal running condition where locking
should be an issue. That is, when both systems are running well, there's no
time when both copies of a VM will be live - aside from during migration,
which libvirt handles well.

It's the abnormal conditions that require planning. In basic primary-primary
it's possible to end up with the same VM on each host running based on the
same storage at the same time. When that happens, even cluster locking won't
necessarily prevent corruption, since the two instances can be doing
inconsistent stuff in different areas of the storage, in ways that locks at
the file system level can't prevent. 

There are two basic contexts where both copies of a VM could be actively
running at once like that. One is in a state of failover. In a way, failover
initiation should be simpler here than between non-VM systems. No
applications per se need to be started when one system goes down; it's just
that the VMs that were primary on it need to be started on the survivor. At
the same time, some variation of STONITH needs to be aimed at the down
system to be sure it doesn't recover and create dueling VMs. Any hints on
the most effective way of accomplishing that (probably via IPMI in my case)
will be welcome.
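The shape of what I'm picturing is something like the following (a sketch
only: the BMC address and credentials are placeholders, and in a real
cluster you'd presumably use an existing fence agent such as fence_ipmilan
under Pacemaker rather than a hand-rolled script):

```shell
#!/bin/sh
# Sketch of an IPMI STONITH action against the failed peer's BMC.
# BMC_ADDR, BMC_USER and BMC_PASS are placeholders for a real setup.
BMC_ADDR=${BMC_ADDR:-192.0.2.10}
BMC_USER=${BMC_USER:-admin}
BMC_PASS=${BMC_PASS:-secret}

# DRY_RUN=1 (the default here, so the sketch is safe to run as-is)
# prints the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fence_peer() {
    # Force the peer off, then confirm it reports powered off, before
    # promoting its DRBD resources and starting its VMs on the survivor.
    run ipmitool -I lanplus -H "$BMC_ADDR" -U "$BMC_USER" -P "$BMC_PASS" chassis power off
    run ipmitool -I lanplus -H "$BMC_ADDR" -U "$BMC_USER" -P "$BMC_PASS" chassis power status
}

fence_peer
```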

The other way to get things into a bad state, if it's a primary-primary
setup for each VM, is operator error. I can't see any obvious way to block
this, other than running primary-secondary instead and sacrificing the live
migration capacity. It doesn't look like libvirt, virsh or virt-manager have
any way to test whether a VM is already running on the other half of a
two-system mirror, such that they could decline to start it when that's the
case.

Maybe I'm missing something obvious? Is there, for instance, a way to run
primary-secondary just up to when a live migration's desired, and go
primary-primary in DRBD for just long enough to migrate? 
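What I have in mind would look something like this, run from the source host
(untested, and the resource name, peer hostname and libvirt URI are
placeholders; it assumes allow-two-primaries is already in the resource's
net section, and DRY_RUN=1 here just prints the steps):

```shell
#!/bin/sh
# Sketch of a "briefly dual-primary" live migration, run from host-a.
RES=vm-foo     # placeholder DRBD resource / VM name
PEER=host-b    # placeholder migration target

# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Promote the resource on the peer, so both sides are briefly primary.
run ssh "$PEER" drbdadm primary "$RES"
# 2. Live-migrate the guest to the peer.
run virsh migrate --live "$RES" "qemu+ssh://$PEER/system"
# 3. Once migration completes, demote the local side back to secondary.
run drbdadm secondary "$RES"
```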

