Hi, first of all many thanks to everybody (especially Linbit) for the excellent work done on DRBD. I'm currently involved in setting up a Pacemaker / DRBD cluster to serve a bunch of Xen VMs. My current configuration is:

- 2 identical Dell R410 servers with a 4-core Xeon and 8 GB RAM each
- 4x 1 TB SATA disks per server, attached to Dell's PERC 6/i battery-backed RAID controllers and configured as RAID10
- Dom0: Slackware 13.0 x86_64 with Xen 4.0 compiled from source (2.6.31.13 Xenified kernel)
- OpenAIS 1.1.2 + Pacemaker 1.0.8 compiled from source
- DRBD 8.3.7 compiled from source

The configuration files for DRBD and the DomUs are local to each host (replicated by hand); the only shared data on DRBD are the guests' block devices: one DRBD resource per guest, built on two identically sized LVM logical volumes, with Xen using /dev/drbdX as the guest's block device. The whole thing is working flawlessly and seems fast and stable too.

I've got a question regarding HVMs (Windows guests) and Primary/Primary DRBD for live migration. The DRBD docs state that the block-drbd helper script can't be used with HVMs (and in a few other cases). I'm relatively new to setting up a Pacemaker cluster, so the question is: how can I make the Pacemaker OCF RAs take care of
1) promoting DRBD on the target node
2) live migrating the HVM
3) demoting DRBD on the source node
(which is basically what the block-drbd helper is supposed to do)?

Unless I've missed something, setting master-max=1 on the ocf:linbit:drbd master/slave resource doesn't allow the DRBD resource to be Primary/Primary even during the (short) time of the Xen live migration, while setting master-max=2 causes DRBD to run constantly in Primary/Primary mode, posing some data corruption risk (I don't want to use a clustered filesystem, because I want to keep the DomUs' filesystems on a physical device for best performance). In short, I would like to leave the guests' DRBD resources Primary only on the active host (the one where the corresponding DomU has been started), set them to Primary/Primary only while a guest DomU is being migrated to the other host, and then quickly demote the resource on the source host once migration is complete, for safety reasons. This apparently can't be done without some form of "cooperation" between the ocf:linbit:drbd and ocf:heartbeat:Xen resource agents... which is exactly what I'm looking for but apparently cannot find in Pacemaker's docs. Please tell me if I'm missing something crucial about OCF RAs and their usage in this situation.

Now the (apparently) good news... The approach taken by block-drbd seemed more logical to me: a single agent managing and coordinating the whole transition (DomU migration + DRBD promotion/demotion). Digging around I found this patch:
http://post.gmane.org/post.php?group=gmane.comp.emulators.xen.devel&followup=80598
Many thanks and full credit to the author: James Harper.

So I investigated a little more and got to the point where a (patched as described) qemu-dm recognizes DRBD resources (specified as drbd:resource in the Xen DomU cfg file) and maps them to the correct /dev/drbd/by-res/xxx node for the HVM... sadly this solved only part of the problem. Starting from a Primary/Secondary state, launching (xm create) the HVM on the "Primary" DRBD host works perfectly. After that I can live migrate the HVM to the other host and get the expected sequence of promotions/demotions from the block-drbd script, leaving the system in the expected state.
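For illustration, a disk line using this syntax in an HVM DomU cfg file could look roughly like the fragment below; the resource name "winguest" and the hda device are made-up examples, and the patched qemu-dm would then translate "drbd:winguest" into /dev/drbd/by-res/winguest:

# hypothetical fragment of an HVM DomU cfg file
builder = 'hvm'
disk    = [ 'drbd:winguest,hda,w' ]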
Starting from a Secondary/Secondary DRBD state (DomU stopped on both hosts), when I "xm create" an HVM DomU, qemu-dm is fired up before the block-drbd helpers are launched, so the HVM correctly maps the device, but DRBD has not yet been promoted to Primary and the DomU is immediately turned off.

BTW: can someone explain this difference? Why is the block-drbd script called BEFORE starting a live migration (making the destination host Primary as well before attempting to migrate), yet not before (only after) qemu-dm attempts to create the DomU and map the vbd?

Looking at the state transitions in /proc/drbd during an "xm create" of my HVM DomU, I saw that the Secondary->Primary transition does happen... normally followed by the inverse transition just after qemu-dm "finds" the resource in an unusable state and aborts the creation of the DomU. It seems to be only a timing problem! qemu-dm was too fast (it even starts a bit earlier) and checks the vbd BEFORE the block-drbd script can promote the resource to Primary... the logical (but badly hackish) solution for me was to insert a delay in the qemu-dm process whenever the resource is specified as "drbd:".

So, the final state is:
- I can create HVM guests using the "drbd:res" syntax in the configuration files, letting block-drbd take care of the DRBD transitions
- I can migrate / live migrate HVM (Windows) guests with block-drbd doing its job
- my solution is a bad hack (at least for the creation of HVMs), based on a delay inserted in qemu-dm to wait for block-drbd to run

The complete patch to the Xen 4.0 source (AGAIN, THANKS TO: James Harper) is:

--- xenstore.c.orig    2010-04-29 23:23:45.720258686 +0200
+++ xenstore.c 2010-04-29 22:52:43.897264812 +0200
@@ -513,6 +513,15 @@
             params = newparams;
             format = &bdrv_raw;
         }
+        /* handle drbd mapping */
+        if (!strcmp(drv, "drbd")) {
+            char *newparams = malloc(17 + strlen(params) + 1);
+            sprintf(newparams, "/dev/drbd/by-res/%s", params);
+            free(params);
+            sleep(5);
+            params = newparams;
+            format = &bdrv_raw;
+        }
 #if 0
         /* Phantom VBDs are disabled because the use of paths

I've only added the sleep(5); statement to make qemu-dm relax a bit and wait for block-drbd to be called.

Please come up with your comments and ideas to stabilize and improve the patch, making it less hackish (at least in my little addition) and possibly suitable for production use (probably by finding a reliable way to "wait" for the DRBD state change in qemu-dm).

Sauro Saltini.
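P.S.: to make the "reliable way to wait" idea a bit more concrete, here is a rough, untested sketch of what could replace the sleep(5) in the hunk above. It only assumes that opening a DRBD device read-write fails while the resource is still Secondary; the helper name and the 10-second timeout are made up for illustration.

#include <fcntl.h>
#include <unistd.h>

/* Poll until the DRBD device can be opened read-write, i.e. until
 * block-drbd has promoted the resource to Primary, or give up after
 * timeout_secs seconds. */
static int wait_for_drbd_primary(const char *path, int timeout_secs)
{
    int i;

    for (i = 0; i < timeout_secs * 10; i++) {
        int fd = open(path, O_RDWR);
        if (fd >= 0) {
            close(fd);
            return 0;           /* device is writable -> resource is Primary */
        }
        usleep(100000);         /* still Secondary (or not ready): retry in 100 ms */
    }
    return -1;                  /* not Primary within the timeout */
}

In the patched hunk, the sleep(5); line would then become something like wait_for_drbd_primary(newparams, 10); (plus proper error handling if it returns -1).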