[DRBD-user] DRBD and KVM for a HA-Cluster ?

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Fri Jan 7 22:00:44 CET 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Digimer wrote:
> 
> Let's try some ASCII art. :)
> 
>  [VM1]   [VM2]
>    |       |
>  [LV1]   [LV2]
>    |       |
>  [____VG1____]
>        |
>  [_DRBD - PV_]
>    |       |
> [Node1] [Node2]
> 
> You can run VM1 and VM2 on either node, but a given VM can 
> not run on both nodes at once. VM1 and VM2 can both be running 
> on Node1 or Node2, or one on each. While a given VM is 
> running, it and only it will be modifying the underlying LV.
> Because the VM cannot run in two places at once, the inodes 
> will never be updated in two places at once.
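> 
> For example, carving the LVs out of the DRBD-backed VG might look 
> roughly like this (untested sketch; names and sizes invented):
> 
>   # On the node that is currently Primary for the DRBD resource:
>   pvcreate /dev/drbd0            # the DRBD device becomes the LVM PV
>   vgcreate vg1 /dev/drbd0        # VG1 from the diagram above
>   lvcreate -L 20G -n lv1 vg1     # LV1, backing disk for VM1
>   lvcreate -L 20G -n lv2 vg1     # LV2, backing disk for VM2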
> 
> >> The risk here, though, is with split-brains.
> >>
> >> Consider this:
> >>
> >> You have two partitions on your DRBD; one each for two VMs. In
> > 
> > Why do I have two partitions? I thought, from the VM's point 
> > of view, I have one partition.
> > I thought that DRBD makes one logical device out of two 
> > physically independent devices, like a RAID1. Right?
> > Or do you just mean that every VM has one local partition, 
> > which is synchronized with the other, remote partition, using DRBD?
> 
> *Under* the VMs, on the host nodes, you see two partitions. 
> Inside the VMs, they see a (virtual) hard drive. Let's 
> further say that we'll run two VMs. Finally, let's say that 
> each node has a two-disk RAID1 array.
> 
>   [ Node1 ]       [ Node 2 ]
>     |   |           |   |
> [sda1] [sdb1]   [sda1] [sdb1]
>    |      |        |      |
>  [___md1___]     [___md1___]
>       |               |
>       \---[DRBD r0]---/
>               |
>            [LVM PV]
>               |
>            [ vg0 ]
>             |   |
>          [lv1] [lv2]
>            |     |
>          [vm1] [vm2]
> 
> Is this the structure you have in mind?
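> 
> As a rough sketch, the per-node pieces of that stack could be 
> built like this (hypothetical device names and addresses, not a 
> copy-paste recipe):
> 
>   # On each node: software RAID1 from the two local partitions
>   mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
> 
>   # /etc/drbd.d/r0.res, identical on both nodes:
>   resource r0 {
>     on node1 {
>       device    /dev/drbd0;
>       disk      /dev/md1;
>       address   192.168.1.1:7788;
>       meta-disk internal;
>     }
>     on node2 {
>       device    /dev/drbd0;
>       disk      /dev/md1;
>       address   192.168.1.2:7788;
>       meta-disk internal;
>     }
>   }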
> 
> Now edit this to use Felix's idea: instead of having two 
> LVs on vg0, create two RAID devices on each node (md0 
> and md1) and use them to create two DRBD resources (r0 and 
> r1). Make each of these an LVM PV
> -> VG -> LV to host each VM (repeat this for all VMs you want to
> create). The benefit now, as Felix explained, is that you 
> won't run into the problem I described, though your layout 
> gets a little more interesting.
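> 
> In config terms, Felix's layout would be something like this 
> (sketch only, names invented; note that each resource needs its 
> own TCP port):
> 
>   resource r0 {
>     on node1 { device /dev/drbd0; disk /dev/md0; address 192.168.1.1:7788; meta-disk internal; }
>     on node2 { device /dev/drbd0; disk /dev/md0; address 192.168.1.2:7788; meta-disk internal; }
>   }
>   resource r1 {
>     on node1 { device /dev/drbd1; disk /dev/md1; address 192.168.1.1:7789; meta-disk internal; }
>     on node2 { device /dev/drbd1; disk /dev/md1; address 192.168.1.2:7789; meta-disk internal; }
>   }
> 
>   # Then, per resource: pvcreate /dev/drbdX, vgcreate vgX /dev/drbdX,
>   # lvcreate -n lvX ... vgX, and install one VM onto that LV.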
> 
> > Currently, I'm totally confused.
> > Single primary means that a resource is only primary on one 
> > node. Write access to the primary node is synchronized via 
> > DRBD to the other node. Right?
> > Dual primary means that a resource is primary on both 
> > nodes. That means that write access to node A is replicated to 
> > node B, and vice versa, at the same time. Right?
> > Question: can I run two identical VMs, one on node A and 
> > one on node B, with DRBD configured as dual primary?
> > Please don't forget that I want to install my KVM guests into 
> > plain LVs, without any filesystem.
> > 
> > Sorry for bothering you, but I'd like to understand the 
> > techniques I use.
> > 
> > Bernd
> 
> In Primary/Secondary, the data is still replicated but the 
> Secondary node can not use the DRBD resource. Period. Full 
> stop. Should the Primary fail, you would then need to promote 
> the Secondary to Primary and begin recovering services.
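> 
> The manual failover would then be, for a resource named r0 
> (assuming the old Primary is confirmed down):
> 
>   # On the surviving node:
>   drbdadm primary r0    # promote the Secondary to Primary
>   # ...then start the VM / services that live on it.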
> 
> In Primary/Primary, both nodes can use the DRBD resource, at 
> least as far as DRBD is concerned. Without a cluster to 
> provide cluster locking, though, each partition on the DRBD 
> resource can only be used on one node at a time. So if you 
> have two LVs on your DRBD resource, each of those LVs can be 
> used on either node: /dev/vg0/lv1 could be used by Node1 
> while /dev/vg0/lv2 can be used by Node2.
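> 
> Enabling dual-primary is done per resource, roughly like this 
> (sketch only; check the DRBD docs for your version):
> 
>   resource r0 {
>     net {
>       allow-two-primaries;
>     }
>     # (plus the usual per-host device/disk/address sections)
>   }
> 
>   # Then promote on both nodes:
>   drbdadm primary r0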
> 
> So yes, if you have two separate VMs, one on lv1 and the other 
> on lv2, you can run them on either node (VM1 can run on Node1 
> using /dev/vg0/lv1 while VM2 can run on Node2 using 
> /dev/vg0/lv2). This is the benefit of Primary/Primary.
> 
> Again though, for the reason I explained, without a 
> stonith/fence device and/or a cluster, I strongly urge you to 
> use Felix's method and create a unique DRBD resource per VM. 
> This way you can 'invalidate' each DRBD resource 
> independently after a split-brain.
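> 
> For reference, per-resource split-brain recovery looks roughly 
> like this (a sketch from memory; verify against the DRBD docs 
> before running it):
> 
>   # On the node whose changes you are throwing away (here for r1):
>   drbdadm secondary r1
>   drbdadm -- --discard-my-data connect r1
> 
>   # On the surviving node, if it went StandAlone:
>   drbdadm connect r1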
> 

Hi,

Thanks for your patience.
I think we have a misunderstanding.
My scenario is like this:
I have three different VMs, each one offering a different service. I'd like to make each VM highly available. I have only two nodes, each one with a hardware RAID 5:

[ Node 1 ]     [ Node 2 ]
     |             |
[HW RAID5]     [HW RAID5]
     |             |
[Partition]    [Partition]
     |             |
[ PV1    ]     [ PV1    ]
     |             |
[ VG1    ]     [ VG1    ]
     |             |
[ LV1    ]     [ LV1    ]
     |             |
[ DRBD1  ]-----[ DRBD1  ]
     |             |
[ VM1A   ]     [ VM1B   ]   

VM1A and VM1B are identical VMs, offering the same service!


Second VM (second service):

[ Node 1 ]     [ Node 2 ]
     |             |
[HW RAID5]     [HW RAID5]
     |             |
[Partition]    [Partition]
     |             |
[ PV1    ]     [ PV1    ]
     |             |
[ VG1    ]     [ VG1    ]
     |             |
[ LV2    ]     [ LV2    ]
     |             |
[ DRBD2  ]-----[ DRBD2  ]
     |             |
[ VM2A   ]     [ VM2B   ]   

VM2A and VM2B are identical VMs, offering another service.
And so on for further VMs (further services)...
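
In config terms, what I have in mind per service would be roughly this (just a sketch, names invented):

  # /etc/drbd.d/drbd1.res -- DRBD on top of the local LV:
  resource drbd1 {
    on node1 { device /dev/drbd1; disk /dev/vg1/lv1; address 10.0.0.1:7789; meta-disk internal; }
    on node2 { device /dev/drbd1; disk /dev/vg1/lv1; address 10.0.0.2:7789; meta-disk internal; }
  }

  # The VM would then use the raw DRBD device as its disk, e.g.
  # virt-install --disk path=/dev/drbd1 ... (no filesystem on the host side).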

The VMs are installed on the plain DRBD device, without any filesystem. Each VM only has to be created once, because DRBD, being responsible for the replication, takes care that the VM "appears"
on the other node. Right? Is DRBD able to replicate that, even though there is no filesystem?
Does this scenario make sense?
In this scenario, is it possible to run VM1A and VM1B simultaneously, with DRBD configured in dual-primary mode? I don't think so.
I hope my scenario is clear.


Bernd



