Hi,

for your basic questions: Yes, your design idea is sound and it should work without any major problems; see the exceptions below.

Getting this into production in 2 days' time without any hands-on Pacemaker experience, though - that's one hell of a call. (I'm assuming this isn't something you've yet made a habit of.)

I suggest you focus on the DRBD side of things and see if you can establish a synced resource. From your description of the situation, a manual failover near the start of work hours will still be much preferable to a week of downtime, so it may suffice?

As for the aforementioned exceptions: During network failure, you will most definitely run into split-brain, i.e. the VMs in your datacenter remain operational and do whatever work they were doing when connectivity failed. The failover VMs will boot as though they had crashed at the moment of network failure. So once connectivity is restored, DRBD will tell you that in the datacenter, stuff has been written that your failover VMs never knew about.

Normally (if you had time and resources), you would implement STONITH (a.k.a. fencing) to protect yourself by killing the original VMs during failover.

As your time frame probably won't allow you to set this up properly ("testing? heh"), you may want to settle for the manual approach: Anticipate the split brain and be ready to discard whatever the original VMs saw fit to commit to their disks after getting disconnected.

HTH,
Felix

On 12/13/2011 10:50 PM, Trey Dockendorf wrote:
> I have somewhat of an emergency on my hands, and am hoping the community
> may have some insight. One of the primary fiber rings on my campus will
> be down for a week, unless the damaged fiber fails, it will be down
> then. During this time my primary datacenter could possibly
> have intermittent connectivity to other buildings / outside work.
> Unfortunately we do not have the resources for a remote data center
> (yet), but for now I'm setting up a remote KVM server in a portion of
> campus that will not be affected. Currently all VMs are QCOW2 images
> that live on a 1.2TB Logical volume. This seems like the perfect
> situation to use DRBD in active-passive. However I'm not currently
> trying to prepare for hardware failure but network failure. Is it
> possible to do this, to have the two LVMs synced with DRBD and then have
> a resource manager (Pacemaker) detect that the two can't communicate
> (network failure) and then activate the passive node? I want to avoid
> as much complexity as possible, so no live migration or dual primary if
> possible.
>
> This is really a temporary solution for 1 week for all
> my organization's web servers, but it has made our executives aware of
> our need for more funds and a remote datacenter. So hopefully this
> won't be the last time I can use DRBD, but I have to do this in the next 2 days.
>
> Any help is greatly appreciated.
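
For reference, the manual approach from the reply (discarding what the original datacenter node wrote after it lost connectivity) might be sketched roughly as below. This is only a dry-run planner that prints the drbdadm commands rather than running them. The resource name "r0", the function name, and the "victim"/"survivor" labels are assumptions made for illustration and do not come from the thread. The position of --discard-my-data differs between DRBD 8.3 and 8.4, so check the user's guide for the installed version before running anything.

```python
def split_brain_recovery_plan(resource, drbd_84_syntax=True):
    """Return the drbdadm commands for a manual split-brain recovery.

    "victim" is the node whose changes are thrown away (here: the original
    datacenter node); "survivor" is the node whose data is kept (the
    failover node). Both labels are hypothetical names for this sketch.
    """
    if drbd_84_syntax:
        # DRBD 8.4 style: option follows the subcommand.
        reconnect = ["drbdadm", "connect", "--discard-my-data", resource]
    else:
        # DRBD 8.3 style: option is passed through after "--".
        reconnect = ["drbdadm", "--", "--discard-my-data", "connect", resource]

    return {
        # Run on the victim. The VMs using the device must be shut down
        # first, otherwise the demotion to secondary will be refused.
        "victim": [
            ["drbdadm", "disconnect", resource],
            ["drbdadm", "secondary", resource],
            reconnect,
        ],
        # Run on the survivor, only needed if it sits in StandAlone state.
        "survivor": [
            ["drbdadm", "connect", resource],
        ],
    }


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    for role, commands in split_brain_recovery_plan("r0").items():
        for cmd in commands:
            print(f"[{role}] {' '.join(cmd)}")
```

Printing the plan first and running each command by hand seems the safer choice here, since discarding the wrong node's data cannot be undone.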