On Thu, Jun 23, 2011 at 01:54:45PM -0600, Pete Ashdown wrote:
> I don't think the cluster software is quite ready for primetime in Ubuntu.
> I have high hopes for the next LTS (12.04?). In any case, you'll need the
> ubuntu-ha ppa now.

I'm an old-school sysadmin. It's only in the last few years that I've
come to trust various distros' LAMP stacks; before that, whichever the
distro, I always built each component of anything mission critical from
source. Never trusted distro kernels either. I look at cluster stacks now
as being where the LAMP stack was 15 years ago, and where kernels were 5
years ago. But unlike with the LAMP stack 15 years back, the
documentation is worse, and configuration methods are a bizarre, diverse,
inconsistent mess.

> If you're not looking to live clustering replacement of downed KVM guests,
> you can get clvm running by just configuring corosync then disabling the
> corosync & clvm init.d scripts (update-rc.d disable) and creating one to
> run /usr/sbin/aisexec.

But of course I am looking for live replacement of guests. It's easy
enough to do that with a human involved, sitting at virt-manager or
drbd-mc. Anything a human can do that's a simple, consistent operation
like that can be handled by a well-designed script.

> I know this topic isn't directly drbd related, but I do know how hard it is
> to find good clustering help for Ubuntu. You can email me offlist if you
> need any extra help.

I truly appreciate that, Pete.

What I've been hoping to find is distribution-neutral models for handling
KVM failover, where each KVM VM sits directly on top of a dedicated DRBD
resource - a setup with ideal granularity, as compared to having
multiple VMs share a single DRBD resource. Because the granularity is
right-sized, it greatly simplifies the possible failure modes, and what
should be needed for reliable failover.

For example, if it were one shared DRBD resource for many VMs running
across two hosts, it would need to run primary-primary, with CLVM and a
clustering file system and the full pacemaker-corosync/heartbeat
treatment. I get that. But with dedicated, per-VM DRBD resources, each
can be run primary-secondary (if you don't mind a few seconds down during
migration - which will be there in failover in any case), so there's no
need for CLVM or one of the (less mature than ext4 or xfs) clustering
file systems in the arrangement. There also should be a whole lot less
needed on the pacemaker-corosync/heartbeat side.

What I've been hoping to find is documentation on just enough of
pacemaker-corosync/heartbeat to handle this simplified architecture
adequately. But most of the documentation isn't aimed towards an
architecture like this at all, and just about nothing I've found
addresses a KVM environment.

Because each layer is sliced into its own project, with its own
documentation, configuration syntax, and mailing list, there's no perfect
place for addressing these questions - no place I've found dedicated to
the overview, to the questions of architecture and integration rather
than to close focus on one of the components. While KVM/QEMU is at an
admirable state of maturity - and rapidly improving - that doesn't extend
to its documentation at all. The libvirt list has fascinating discussion
at the edge of the technology, but even less about the practical issues
of deployment than we have here.

LINBIT has adopted pacemaker because failover concerns are a natural fit
to DRBD. Failover concerns for KVM VMs hosted on dedicated DRBD
resources, I'll argue, are also a natural fit for DRBD. I'm not sure, in
this context, whether pacemaker etc. is the answer. If it is, it would be
nice to see that documented.

Best,
Whit