Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

On Tue, 2 Jul 2013 17:08:30 +0900 Christian Balzer <chibi at gol.com>
wrote:

> not purely a DRBD issue, but it sort of touches most of the bases, so
> here goes.
>
> I'm looking at deploying a small (as things go these days) cluster (2
> machines) for about 30 KVM guests, with DRBD providing the storage.
>
> My main concern here is how long it would take to fail-over (restart)
> all the guests if a node goes down. From what I gathered none of the
> things listed below do anything in terms of parallelism when it comes
> to starting up resources, even if the HW (I/O system) could handle it.
<snip>
> Lastly I could go with Pacemaker, as I've done in the past for much
> simpler clusters, but I really wonder how long starting up those
> resources will take. If I forgo live-migration I guess I could just
> do one DRBD backing resource for all the LVMs. But still, firing up
> 30 guests in sequence will take considerable time, likely more than I
> would consider really a "HA failover" level of quality.

Why do the VMs have to start in sequence? Pacemaker happily starts
several services in parallel, provided they don't depend on each
other. You have to define those dependencies as orders or groups
yourself; otherwise Pacemaker assumes that services are to be started
in parallel. (At least that's what I see here when booting my 2+1 node
cluster from cold.)

I don't have 30 VMs, more like 15, but at least one DRBD volume for
each machine, and dependencies defined so that the LDAP server has to
be up before the others that need it are started.

Using individual DRBD resources for the machines may be a bit more
work to set up when doing it all at once (my setup has grown over
time), but it allows you to distribute the VMs across the two nodes,
so they don't all have to run on one node. And when you also define
scores for importance, you can over-commit the memory (and CPU) of the
two nodes, so that normally everything runs and only in case of a node
failure are some less important VMs stopped or not started.

On the other hand, if I had to start all over again (without a
deadline within the next two weeks), I would look at Ceph or Sheepdog
for storage, and either use Pacemaker to manage the VMs or take a look
at OpenStack's HA support.

Have fun,

Arnold
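P.S. A rough sketch of what "started in parallel unless ordered"
looks like in crm shell syntax (all guest names, config paths and
timeouts here are made up): two VirtualDomain resources with no
constraint between them may be started concurrently, up to the
batch-limit cluster property; only an explicit order constraint
serializes them.

  # Two guests with no relationship between them; pacemaker may
  # start both at the same time:
  primitive p_vm_ldap ocf:heartbeat:VirtualDomain \
          params config="/etc/libvirt/qemu/ldap.xml" \
          op start timeout="120s" op stop timeout="120s" \
          op monitor interval="30s"
  primitive p_vm_mail ocf:heartbeat:VirtualDomain \
          params config="/etc/libvirt/qemu/mail.xml" \
          op start timeout="120s" op stop timeout="120s" \
          op monitor interval="30s"
  # This constraint is the only thing that serializes them, e.g. so
  # the ldap guest is up before the mail guest that needs it:
  order o_ldap_before_mail inf: p_vm_ldap:start p_vm_mail:start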
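P.P.S. And a sketch (same caveats: hypothetical names, crm shell
syntax) of one guest with its own DRBD resource and a priority score.
The colocation/order pair ties the guest to whichever node holds its
DRBD resource as Master, and the priority meta attribute is what lets
the cluster drop the least important guests first when a single
surviving node can't hold everything.

  # Per-guest DRBD resource, promoted to Master on exactly one node:
  primitive p_drbd_web ocf:linbit:drbd \
          params drbd_resource="web" \
          op monitor interval="29s" role="Master" \
          op monitor interval="31s" role="Slave"
  ms ms_drbd_web p_drbd_web \
          meta master-max="1" master-node-max="1" \
          clone-max="2" clone-node-max="1" notify="true"
  # The guest itself, with a high priority so it survives over-commit:
  primitive p_vm_web ocf:heartbeat:VirtualDomain \
          params config="/etc/libvirt/qemu/web.xml" \
          meta priority="100" \
          op monitor interval="30s"
  # Run the guest where (and only after) its DRBD resource is Master:
  colocation c_web_on_drbd inf: p_vm_web ms_drbd_web:Master
  order o_drbd_before_web inf: ms_drbd_web:promote p_vm_web:start
  # A less important guest; with priority="10" it is stopped first if
  # resources run short after a node failure:
  primitive p_vm_scratch ocf:heartbeat:VirtualDomain \
          params config="/etc/libvirt/qemu/scratch.xml" \
          meta priority="10" \
          op monitor interval="30s"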