Hi All,

Apologies for the mass email, but it seemed most appropriate to post a followup to all the lists I originally sent the LCA 2012 HA miniconf CFP to. I would humbly suggest that any miniconf-related replies be sent either direct to myself, or to ha-wg at lists.linux-foundation.org. Comments on the HA BOF mentioned below should probably go to either pacemaker at oss.clusterlabs.org or linux-ha at lists.linux-ha.org.

==========

The High Availability and Distributed Storage miniconf[1] at LCA 2012 went very well. Probably 60+ in attendance (so about 1/8th of the conference attendees, given 7 other concurrent miniconfs), with maybe a few less later in the day. First half was more linux-ha type stuff, second half more database-y, with a bit of CTDB and Samba foo in the middle.

Sadly we didn't actually get much in the way of distributed storage talks -- oddly enough, there was a conspicuous absence of Gluster and Ceph talks in the main conf track as well. We hope to have better luck next year (I plan to propose this miniconf again).

The talks were almost all 25 minute slots, as follows:

Storage Replication in High-Performance High-Availability Environments
http://www.youtube.com/watch?v=l910kiEuHOM
by Florian Haas; discussion of using drbd with flashcache to provide failover while still keeping the cache hot.

Building a Non-Shared Storage HA Cluster with Pacemaker & PostgreSQL 9.1
http://www.youtube.com/watch?v=ON4QGfDkqwg
by Keisuke Mori; enhanced pgsql RA to work with PostgreSQL streaming replication.

Extend Pacemaker to Support Geographically Distributed Clustering
http://www.youtube.com/watch?v=S3DB_DSVI_A
by Tim Serong on behalf of Jiaju Zhang; an introduction to Booth (what it is, how to configure it).

HiPBX - HiAv VoIP with Open Source Software and 5000 Lines of Bash
http://www.youtube.com/watch?v=CpMifzcYSdU
by Rob Thomas; showing how he built an HA VoIP system with live demo (which almost worked) and a rickroll. Very entertaining.

Squashing SPOFs with Common Sense, Velcro, and a Hammer
http://www.youtube.com/watch?v=6mQ65Flmri8
also by Rob Thomas; somewhat more generic (label everything, do proper cable management etc.), but still also entertaining.

CTDB Overview
http://www.youtube.com/watch?v=L7-QSbEEjS0
by Ronnie Sahlberg; CTDB's approach to clustering - run everything everywhere instead of classic active/passive, and know what state is safe to drop/lose if a node dies.

High Availability Login Services with Samba4 Active Directory
http://www.youtube.com/watch?v=-EeqYbEwJU8
by Kai Blin; brief overview of using Samba4 for AD auth - Kai has a whole bunch of little embedded systems in his house running this, which is kind of cute.

HA Lessons Learned from Darth Vader
http://www.youtube.com/watch?v=tnBz8212X5M
by Ronnie Sahlberg; essentially saying the Empire got it wrong with the Death Star (big SPOF), but did better on Hoth with its redundant army of AT-ATs.

MySQL for the Developer in a Post-Oracle World
http://www.youtube.com/watch?v=oJ9HnFgC48s
by Adam Donnison; various forking etc. of MySQL, both project forking and different companies providing dev, consulting etc.

MySQL and Postgres Cloud Offerings
http://www.youtube.com/watch?v=UFTp0zA4Mx8
by Stewart Smith & Selena Deckelmann; basically there aren't many sensible DB cloud offerings and/or they don't work and/or they don't scale (I might be exaggerating, but probably not much).

Scaling Data: Postgres, The Stack and the Future of Replication
http://www.youtube.com/watch?v=Pdgzy7KoGWU
by Selena Deckelmann; some general postgres discussion, live demo of setting up binary replication, new stuff in 9.2.

Swift 101
http://www.youtube.com/watch?v=mX25RtDvf8E
by Monty Taylor; introduction to Swift in OpenStack - it's not a RAID, it's not distributed storage, it's not (etc.), it's an object store! Good for backups (large, write once, read never) and web content (small, write once, read many).

MySQL Web Infra Scaling and Keeping it Online, Cheaply
http://www.youtube.com/watch?v=A4K-ZDDBRHI
by Arjen Lentz; the approaches his company takes when "fixing" client systems so that they're resilient to failure (mysql tuning, split web/db servers, backups, monitoring, master/slave systems etc.)

We also had two lightning talks which apparently weren't recorded. One was Avi Miller from Oracle announcing that they're supporting DRBD 8.3 in UEK2 (which is currently in beta). The other was from Florian Haas ranting about crappy HA stack usability (e.g. inscrutable command line options and incomprehensible error messages). It was fun.

On Thursday, I co-presented the tutorial "High Availability Sprint: from the brink of disaster to the Zen of Pacemaker" with Florian Haas. We ran through basic concepts of drbd, corosync, pacemaker etc. then did a walkthrough of setting up drbd+corosync+pacemaker+mysql on two VMs (VM images were provided in advance, so participants could follow along). This was well received, with people coming out of it actually understanding what the hell we were talking about. Probably 30-40 attendees. The video is at http://www.youtube.com/watch?v=3GoT36cK6os

After that we had an HA birds of a feather session for a couple of hours, maybe 15-20 people. Partly this was answering questions and random discussion, but also us (myself, Florian, Andrew Beekhof) seeking feedback about pain points with the HA stack. Comments include:

- Documentation is still too hard to find.
- crm shell lacks some facilities for automation with e.g. Puppet. Someone wanted to be able to query the current value of a monitor op on a resource. Querying the whole primitive and grepping is too coarse.
- The whole stack is too complicated(?) and/or some concern about maintenance of documentation going forwards.
- Corosync 2.0 drops support for plugins, and requires libqb.
- Someone wants resource-agents manpage generation foo to go to a devel package, so people shipping their own RAs can utilize that.
- A "frequently encountered errors and solutions" help page somewhere would be of major benefit. We could probably crowdsource this to some extent. We're still evaluating where this could be hosted best, but currently the Clusterlabs wiki seems like the most suitable candidate.
- The need to deprecate resource agents came up again ("should I use ocf:heartbeat:drbd or ocf:linbit:drbd?"), highlighting the need for the overdue OCF spec update.
- Part of the reason for Red Hat's decision to use their own (new, in development) shell for Pacemaker in RHEL 7(?) is that they want that shell to do whole cluster setup, including corosync etc., which is a different scope than the crm shell.

Thanks for reading, hope it was interesting.

Regards,

Tim

[1] http://lca2012.linux.org.au/wiki/index.php/Miniconfs/HighAvailabilityAndDistributedStorage

--
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com