[DRBD-user] DRBD and KVM for a HA-Cluster ?

Digimer linux at alteeve.com
Fri Jan 7 17:50:25 CET 2011



On 01/07/2011 11:25 AM, Lentes, Bernd wrote:
>  
> Digimer wrote:
> 
>>>
>>> I thought dual primary means that I have services (like a
>>> VM) running on both nodes. Am I wrong?
>>>
>>>
>>> Bernd
>>
>> That is a use for dual-primary, but not technically the same thing.
>>
>> All that dual-primary gives you is block-level write ability 
>> from both nodes. What you run on top of it and whether or not 
>> that can be run from both nodes at the same time is a 
>> separate question. Ideally, it *should* be, but doesn't 
>> necessarily need to be.
>>
>> So in your case where each LV hosts a VM, locking is not 
>> needed. This is because the entirety of the partition has 
>> only one source. So the inodes contained in that partition 
>> will never be altered by two sources at the same time.
> 
> Why does the partition only have one source when both VMs are running? Why will the inodes not be altered at the same time when I have both VMs running?

Let's try some ASCII art. :)

 [VM1]   [VM2]
   |       |
 [LV1]   [LV2]
   |       |
 [____VG1____]
       |
 [_DRBD - PV_]
   |       |
[Node1] [Node2]

You can run VM1 and VM2 on either node, but not both nodes at once. VM1
and VM2 can both be running on Node1 or Node2, or one on each. While a
given VM is running, it and only it will be modifying the underlying LV.
The very fact that a VM cannot run in two places at once is why its
inodes will never be updated in two places at once.

>> The risk here, though, is with split-brains.
>>
>> Consider this;
>>
>> You have two partitions on your DRBD; one each for two VMs. In 
> 
> Why do I have two partitions? I thought, from the VM's point of view, I have one partition.
> I thought that DRBD makes one logical device out of two physically independent devices, like a RAID1. Right?
> Or do you just mean that every VM has one local partition, which is synchronized with the other, remote partition using DRBD?

*Under* the VMs, on the host nodes, you see two partitions. Inside the
VMs, each sees a single (virtual) hard drive. Let's further say that
we'll run two VMs. Finally, let's say that each node has a two-disk
RAID1 array.

  [ Node1 ]       [ Node 2 ]
    |   |           |   |
[sda1] [sdb1]   [sda1] [sdb1]
   |      |        |      |
 [___md1___]     [___md1___]
      |               |
      \---[DRBD r0]---/
              |
           [LVM PV]
              |
           [ vg0 ]
            |   |
         [lv1] [lv2]
           |     |
         [vm1] [vm2]

Is this the structure you have in mind?

Now edit this to use Felix's idea: instead of having two LVs on vg0,
create two RAID devices on each node (md0 and md1) and use them to
create two DRBD resources (r0 and r1). Make each of these an LVM PV
-> VG -> LV to host each VM (repeat this for every VM you want to
create). The benefit, as Felix explained, is that you won't run into
the problem I described, though your layout gets a little more
interesting.
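In drbd.conf terms, Felix's layout might look something like this.
This is only a sketch; the resource names, device paths, hostnames
and IP addresses are placeholders I made up, not Bernd's actual setup:

```
# One DRBD resource per VM, each backed by its own RAID1 array.
# Names, paths and addresses below are hypothetical.
resource r0 {
  device    /dev/drbd0;
  disk      /dev/md0;        # first RAID1 array on each node
  meta-disk internal;
  on node1 { address 192.168.1.1:7788; }
  on node2 { address 192.168.1.2:7788; }
}

resource r1 {
  device    /dev/drbd1;
  disk      /dev/md1;        # second RAID1 array on each node
  meta-disk internal;
  on node1 { address 192.168.1.1:7789; }
  on node2 { address 192.168.1.2:7789; }
}
```

Each resource then gets its own PV -> VG -> LV stack for its VM.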

> Currently, I'm totally confused. 
> Single primary means that a resource is only primary on one node. Write access to the primary node is synchronized via DRBD to the other node. Right?
> Dual primary means that a resource is primary on both nodes. That means that write access to node A is replicated to node B, and vice versa, at the same time. Right?
> Question: can I run two identical VMs, one on node A and one on node B, having DRBD configured as dual primary?
> Please don't forget that I want to install my KVMs into plain LVs, without any filesystem.
> 
> Sorry for bothering you, but i like to understand the techniques i use.
> 
> Bernd

In Primary/Secondary, the data is still replicated but the Secondary
node can not use the DRBD resource. Period. Full stop. Should the
Primary fail, you would then need to promote the Secondary to Primary
and begin recovering services.
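In shell terms, that manual failover might look roughly like this
(the resource, VG and VM names are assumptions, and I'm assuming the
guest is managed by libvirt/KVM):

```
# On the surviving (Secondary) node, after the old Primary has failed:
drbdadm primary r0     # promote this node's copy of the resource
vgchange -ay vg0       # activate the VG sitting on the DRBD device
virsh start vm1        # restart the guest on this node
```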

In Primary/Primary, both nodes can use the DRBD resource, at least as
far as DRBD is concerned. Without a cluster to provide cluster
locking, though, each partition on the DRBD resource can still only
be used on one node at a time. So if you have two LVs on your DRBD
resource, each of those LVs can be used on either node: /dev/vg0/lv1
could be used by Node1 while /dev/vg0/lv2 is used by Node2.

So yes, if you have two separate VMs, one on lv1 and the other on lv2,
you can run them on either node (VM1 can run on Node1 using
/dev/vg0/lv1 while VM2 runs on Node2 using /dev/vg0/lv2). This is the
benefit of Primary/Primary.

Again though, for the reason I explained, without a stonith/fence device
and/or a cluster, I strongly urge you to use Felix's method and create a
unique DRBD resource per VM. This way you can 'invalidate' each DRBD
resource after a split brain independently.
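With one resource per VM, that per-resource recovery is just
(resource name assumed; run this only on the node whose copy of the
data you have decided to throw away):

```
# On the node holding the stale copy of this one resource:
drbdadm secondary r1
drbdadm invalidate r1    # discard local data, full resync from peer
# r0 and any other resources are left untouched.
```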

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org


