Hello all: This message isn't a HOWTO; it's more of a heads-up that will hopefully help others get a grip on the things you can do with this awesome technology. On that note, I first must congratulate and express my admiration for the folks developing DRBD. This is definitely the type of technology that enables a paradigm shift in how servers and storage are managed in many diverse scenarios. It's also the key to achieving "The Poor Man's High Availability Cluster".

I dabbled a bit with 8.0pre6 and was able to get an active/active setup that permitted live migration of Xen DomUs, but I had problems with Heartbeat and the Xen init scripts not being ready for this type of setup. If I rebooted a node, I was the victim of "split-brain hell", which required some sort of manual intervention. I'm sure that with more thorough tweaking and planning the problems could be ironed out, but I just didn't have the time. So I went back to the trusty old 0.7.22 and now have a setup that's truly HA and automated. It's a great feeling to walk up to one node and just press the power button, and in a matter of seconds everything has failed over smoothly to the other box. Better yet, when the shut-down node comes back up, the resources are returned automatically. No human intervention required whatsoever.

So here's a quick glance at what I've got running. If anyone needs more details, let me know.

* Two-node HA cluster composed of two dual AMD Opteron based 1U rack-mount boxes, with identical configurations on both.
* Gigabit to the network and another Gigabit NIC for the crossover link directly between the two machines.
* CentOS 4.4 x86_64 on both nodes. Minimal install: de-selected everything in the installer.
* Compiled Xen (lean Dom0 and DomU kernels).
* Compiled DRBD 0.7.22.
* Heartbeat via yum.
* Built-in LVM2 courtesy of CentOS.

1. Used fdisk to set up two partitions on a 146 GB drive, with the partition type set to Linux LVM (8e).
2. Set up drbd.conf to use one partition for r0 and the other for r1.
These are drbd0 and drbd1 respectively.
3. Set up /etc/lvm/lvm.conf to filter out the underlying partitions and add the drbd devices explicitly, e.g.:
   [ "r|/dev/sda1|", "a|/dev/drbd0|", "r|/dev/sda2|", "a|/dev/drbd1|" ]
4. Created an LVM PV on each drbd device.
5. Created an LVM VG on each PV.
6. Created two LVM LVs for each DomU: one for swap and the other for the VBD. I distributed these LVs between the two VGs, which sit atop the two separate drbd devices.
7. Created two subdirectories under /etc/xen/auto: one for the DomUs to be active on node1 and the other for the DomUs to be active on node2.
8. Distributed the config files for the DomUs between the two new subdirectories.
9. Copied /etc/sysconfig/xendomains twice (xd1 and xd2) in that same directory, since we need two configs for the two new subdirectories of /etc/xen/auto. Modified each new copy, setting its XENDOMAINS_AUTO variable to one of the two new subdirs respectively.
10. Likewise copied /etc/rc.d/init.d/xendomains twice, this time not into the same directory but into /etc/ha.d/resource.d, since only Heartbeat will be using these scripts. Modified each copy to point to the corresponding config file created in the previous step.
11. Configured /etc/ha.d/ha.cf to enable auto_failback and to specify the cluster node names, the ucast device, and the peer IP.
12. Modified the /etc/ha.d/resource.d/LVM script so it would work on CentOS. For some reason it was not able to detect the LVM version and balked at the command-line switches to the LVM commands.
13. Configured /etc/ha.d/haresources with one resource line for each group of DomUs and its corresponding drbddisk and LVM dependencies, e.g.:
   node1 drbddisk::r0 LVM::vgr0 xd1
   node2 drbddisk::r1 LVM::vgr1 xd2

So what does all this produce? Node 1 has N DomUs and so does Node 2. Each set of DomUs is on its own drbd device, and each node is primary for one of these devices.
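For step 2, a drbd.conf along these lines should do it (a sketch in DRBD 0.7 syntax; the hostnames, partitions, IPs, ports, and sync rate here are my examples, only the r0/r1 resource names and drbd0/drbd1 devices come from the setup above):

```
# /etc/drbd.conf (sketch) -- r1 is analogous, on the second
# partition and /dev/drbd1, listening on a different port
resource r0 {
  protocol C;
  syncer { rate 30M; }      # example rate for the Gigabit crossover
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```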
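For steps 11 and 13, the Heartbeat side might look like this (a sketch: the keepalive/deadtime values, interface name, and peer IP are my examples; only auto_failback, the node names, the ucast link, and the haresources lines come from the steps above):

```
# /etc/ha.d/ha.cf (sketch; the ucast peer IP differs on each node)
keepalive 2
deadtime 30
auto_failback on           # return resources when the failed node recovers
ucast eth1 10.0.0.2        # crossover link to the peer
node node1
node node2

# /etc/ha.d/haresources (identical on both nodes)
node1 drbddisk::r0 LVM::vgr0 xd1
node2 drbddisk::r1 LVM::vgr1 xd2
```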
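Steps 4-6 boil down to a handful of LVM commands. A sketch follows; the VG names vgr0/vgr1 come from the haresources lines above, but the LV names and sizes are invented for illustration. With DRY_RUN=1 (the default) it only prints the commands, since the real ones need root and a Primary drbd device:

```shell
set -e
DRY_RUN=${DRY_RUN:-1}   # set to 0 on a real node (requires root)
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Steps 4-5: a PV and a VG on each drbd device
run pvcreate /dev/drbd0
run vgcreate vgr0 /dev/drbd0
run pvcreate /dev/drbd1
run vgcreate vgr1 /dev/drbd1

# Step 6: two LVs per DomU -- swap plus the VBD (names/sizes are examples)
run lvcreate -L 512M -n domu1-swap vgr0
run lvcreate -L 8G   -n domu1-vbd  vgr0
run lvcreate -L 512M -n domu2-swap vgr1
run lvcreate -L 8G   -n domu2-vbd  vgr1
```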
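Steps 7-10 can be rehearsed like this. Everything is written under a scratch PREFIX so you can try it unprivileged; the subdirectory names node1/node2 are my assumption (the post only says "two subdirectories"), and the files written here are stubs, whereas the real thing starts from copies of /etc/sysconfig/xendomains and /etc/rc.d/init.d/xendomains and keeps their other contents:

```shell
set -e
PREFIX=${PREFIX:-./ha-rehearsal}   # set PREFIX= (empty) on a real node

# Steps 7-8: one /etc/xen/auto subdir per DomU group
mkdir -p "$PREFIX/etc/xen/auto/node1" "$PREFIX/etc/xen/auto/node2" \
         "$PREFIX/etc/sysconfig" "$PREFIX/etc/ha.d/resource.d"

for n in 1 2; do
  # Step 9: per-group xendomains config pointing at its own auto subdir
  printf 'XENDOMAINS_AUTO=/etc/xen/auto/node%s\n' "$n" \
    > "$PREFIX/etc/sysconfig/xd$n"
  # Step 10: per-group init-script copy in resource.d, sourcing that config
  printf '#!/bin/sh\n. /etc/sysconfig/xd%s\n' "$n" \
    > "$PREFIX/etc/ha.d/resource.d/xd$n"
  chmod +x "$PREFIX/etc/ha.d/resource.d/xd$n"
done
```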
When a node fails, Heartbeat sets the other node as primary for the affected drbd device, activates the LVM VG and LVs, and starts the affected set of DomUs via their custom xendomains script (xd1 or xd2). It works great. I've rebooted, pulled the plug, and hit the power button, and everything fails over OK. There's a slight delay of about 90 seconds, since this isn't live migration, but my environment can tolerate that. Try it out, it's all good!

--
José E. Colón Rodríguez
Academic Computing Coordinator
University of Puerto Rico at Cayey
V. 787-738-2161 x. 2415, 2532
E. jecolon at cayey.upr.edu

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.