[DRBD-user] DRBD for hundreds of virtual servers with cheap hardware

Adrien Laurent adrien at modulis.ca
Tue Oct 16 16:58:19 CEST 2007

Hi Graham,

Thanks for your time, my answers below:

On 10/16/07, Graham Wood <drbd at spam.dragonhold.org> wrote:
> ----- Message from adrien at modulis.ca ---------
> > -will NFS do the job? (knowing that there will not be simultaneous
> > access of the same data from different virtual servers). Note: the
> > virtual servers will be relatively small, 300 MB each; they will be
> > stored in a folder, not an image (kinda like chroot).
> What virtualization method are you using?  All the ones that really
> separate things out (other than Zones) that I've used require a block
> device, not a directory....

I'm using vserver, which is closer to a FreeBSD chroot jail than to Xen;
vserver just "chroots" all the processes of each virtual server. All the
virtual servers share the same kernel.

> NFS is able to do it, but your commodity hardware may not be able to
> handle the throughput you're looking for using NFS - a lot depends on
> the traffic patterns, load, etc.  Your network diagram also shows each
> storage server as only having a single network link to the backbone -
> which is probably not what you want.  I'd suggest a couple of changes:
> 1. Use 2 "partitions" within the RAID1, and have both servers active
> (nfs-virt-1 and nfs-virt-2 if you like) to maximize performance (only
> in failover conditions will you have everything running off a single
> server).

That's a good idea!

> 2. Use 4 network connections for the storage servers - a pair for the
> DRBD link and a pair for the front end connection.  It removes the 2
> SPoF (Single Point of Failure) that your diagram has there

I can do it too.

> 3. If you can afford it, I'd use 4 network connections for the
> virtualization servers too.  A pair to the backend storage and another
> pair to the front end for user usage.
> > -will Heartbeat "guarantee" that failover is made transparently without
> > human intervention ?
> I use the redhat cluster suite rather than heartbeat, and NFS running
> on that does this quite happily for me.  I'm using DRBD as the back
> end for a shared LVM arrangement - this provides my storage for a DB,
> user home directories, mail server, etc.  I'm using the RH cluster
> rather than heartbeat because it seems to have better options for
> managing the services (e.g. most of the time I have 4 running on
> serverA and 1 running on serverB - my VoIP stuff gets its own server
> normally)

I will have a look at the Red Hat cluster suite, but it seems more
complicated to set up than Heartbeat.

> > -My physical virtualization servers will be diskless to simplify
> > management (only swap will be on a local partition), is it a bad idea -
> > could it decrease performance ?
> How much of an effect this would have depends on the method of doing
> the virtualization as much as anything else.  If the OS is primarily
> "passive", and therefore not accessed much, then this should be fine -
> although if you're using local swap then it's almost as easy to have a
> really simple OS image on it - which could reduce your network
> traffic.  Most linux distributions allow for very easy/quick
> provisioning, so you could even not bother with RAID on the servers.
> I'm using FAI to do my debian installs, and I can reinstall a node in
> my cluster in approximately 4 minutes - not counting the DRBD resync.

The goal of the PXE boot is to save me from needing a KVM if I screw up a
boot process. But I can easily do an automated network install over PXE.

> > -Can DRBD save me the RAID 1 setup ? so that I can use RAID 0 and double
> > my capacity without affecting nfs service in case of hard disk failure ?
> Yes and no.  You have 2 copies of the data, so the system could cope
> with a failure - but you then have no extra redundancy at all.
> Considering the time it'll take to rebuild a 750GB DRBD device (and
> the associated performance reduction), I think that the $100 or so
> saving per disk just wouldn't be worth it.

Good point, I hadn't accounted for the DRBD rebuild time...
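
To put a rough number on that rebuild time, here is a minimal sketch that
divides the device size by a sustained resync rate. The rates used (10, 30
and 100 MB/s) are assumptions for illustration only, not measurements from
this thread; the real rate depends on the disks, the replication link and
the configured resync rate.

```python
def resync_hours(device_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to resync device_gb gigabytes at a steady rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = device_gb * 1000 / rate_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rates in MB/s (hypothetical, not measured).
    for rate in (10, 30, 100):
        print(f"{rate:>4} MB/s -> {resync_hours(750, rate):.1f} h")
```

At an assumed 30 MB/s, a full resync of a 750GB device takes roughly 7
hours, during which the pair runs without redundancy.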

> > -Has anyone run software RAID 5 and DRBD, or is the overhead too
> > important ?
> I've done it, but personally really don't like RAID5 anyway...
> Considering the relative costs of disks and servers I'd probably stick
> with mirroring.  50 virtual machines per physical only works out
> (using your 300MB figure) as 150GB - so each pair of 750GB disks can
> handle approximately 5 full virtualization servers... The performance
> of this ( 250 different machines fighting to access a single spindle)
> is something that you'll need to check, it could easily blow up if the
> load is high enough

The VoIP servers don't do too much I/O (they are more CPU-intensive).
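
The sizing estimate quoted above can be rechecked with a small sketch. The
function name and the decimal unit convention (1 GB = 1000 MB) are
assumptions for illustration. Taking the 300MB figure literally, 50 virtual
servers come to 15GB per physical host; the quoted 150GB, 5 hosts and 250
machines correspond to a footprint of about 3GB per virtual server. Either
way, the concern about many machines sharing one spindle still applies.

```python
def capacity(disk_gb: int, vm_mb: int, vms_per_host: int):
    """Return (MB used per host, hosts per disk pair, total VMs per disk pair)."""
    per_host_mb = vms_per_host * vm_mb
    hosts = (disk_gb * 1000) // per_host_mb  # whole hosts that fit on the mirror
    return per_host_mb, hosts, hosts * vms_per_host


if __name__ == "__main__":
    print(capacity(750, 300, 50))   # the 300MB figure taken literally
    print(capacity(750, 3000, 50))  # footprint matching the quoted 150GB
```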

> > -Another scenario would be to use the local disk (80 GB) of each
> > virtualization server (no more pxe or nfs) and have DRBD duplicate the
> > local disk to a same-size partition on a single RAID 5 NAS server. Do
> > you think this second scenario would be better in terms of uptime ?
> It is likely to be a much higher performance answer than using NFS as
> the back end, but you need to think about failover to decide whether
> it is better.  If you are using the single pair of "storage" servers,
> then if one of your virtualization servers dies you can evenly
> distribute the virtualized servers across the rest of the estate to
> wherever you want.  If you mirror the local disk to a single back end,
> how do you bring up that virtual machine somewhere else?
> A more "generic" solution would be to totally decentralize the
> solution - and automatically allocate virtual servers a primary and
> secondary location.  Autogenerate the DRBD configuration files from
> this pairing, and you can split the load more evenly.  That gives you
> a situation where losing a single virtualization server just slightly
> increases the load everywhere without much additional effort - and
> would remove the bottleneck of a small number of storage servers (e.g.
> all virtualization servers talking to each other, rather than all of
> them talking to 1 or 2 backend stores).

I thought about this first, but I think it would require much more
configuration and scripting.
The other problem with this setup is that most of the servers I can buy
have a SCSI interface only, so I can't easily increase the disk capacity.
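
As a rough idea of the scripting involved in the decentralized layout, the
sketch below assigns each virtual server a primary and a secondary host and
prints one DRBD resource stanza per pairing. The host names, IP addresses,
the `/dev/vg0/<name>` backing device and the port numbering are all
hypothetical; only the stanza keywords (`resource`, `protocol`, `on`,
`device`, `disk`, `address`, `meta-disk`) come from the DRBD 8
configuration format.

```python
def assign_pairs(vm_names, hosts):
    """Spread VMs so primaries and secondaries rotate over all hosts."""
    n = len(hosts)
    if n < 2:
        raise ValueError("need at least two hosts")
    pairs = {}
    for i, vm in enumerate(vm_names):
        p = i % n
        offset = 1 + (i // n) % (n - 1)  # never 0, so secondary != primary
        pairs[vm] = (hosts[p], hosts[(p + offset) % n])
    return pairs


def drbd_resource(vm, minor, primary, secondary, addresses):
    """Render one resource stanza; paths and ports are illustrative."""
    port = 7788 + minor
    lines = [f"resource {vm} {{", "  protocol C;"]
    for host in (primary, secondary):
        lines += [
            f"  on {host} {{",
            f"    device    /dev/drbd{minor};",
            f"    disk      /dev/vg0/{vm};",
            f"    address   {addresses[host]}:{port};",
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    addresses = {"virt1": "10.0.0.1", "virt2": "10.0.0.2", "virt3": "10.0.0.3"}
    vms = [f"vs{i:02d}" for i in range(6)]
    pairs = assign_pairs(vms, list(addresses))
    for minor, (vm, (pri, sec)) in enumerate(pairs.items()):
        print(drbd_resource(vm, minor, pri, sec, addresses))
```

Which node is primary is decided at runtime rather than in the stanza, so a
real setup would still need the cluster manager to promote the right side.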

> Using a very quick and simple model, I reckon that with 10 physical
> servers, each with 9 virtual servers, this would give you:
> 0 failures = 90 available, all servers running 9 virtual machines
> 1 failure  = 90 available, all servers running 10 virtual machines
> 2 failures = 89 available, 7 servers running 11, 1 server running 12
> Graham
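
A minimal sketch of such a model, under assumptions that are mine rather
than Graham's: one virtual server per ordered (primary, secondary) pair of
hosts, failures that are simultaneous, and no re-pairing after a failure.
It reproduces the 0-failure and 1-failure rows above. For a double failure
it loses the two virtual servers whose copies both lived on the failed
pair, so it reports 88 available with 11 per surviving host rather than the
quoted 89; the quoted figure presumably rests on a slightly different
placement.

```python
from collections import Counter


def place(n_hosts):
    """One VM per ordered (primary, secondary) pair of distinct hosts."""
    return [(p, s) for p in range(n_hosts) for s in range(n_hosts) if p != s]


def after_failures(placement, failed):
    """Return (VMs still available, sorted load of the surviving hosts)."""
    failed = set(failed)
    load = Counter()
    lost = 0
    for primary, secondary in placement:
        if primary not in failed:
            load[primary] += 1
        elif secondary not in failed:
            load[secondary] += 1  # fail over to the secondary copy
        else:
            lost += 1  # both copies were on failed hosts
    return len(placement) - lost, sorted(load.values())


if __name__ == "__main__":
    vms = place(10)  # 10 hosts, 9 VMs each
    for failed in ([], [0], [0, 1]):
        print(len(failed), "failures:", after_failures(vms, failed))
```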

Thanks a lot for your feedback,


Adrien Laurent
(514) 284-2020 x 202