Claims in the Scheduler

One of the shortcomings of the current scheduler in OpenStack Nova is that there is a long interval between when the scheduler selects a suitable host for a new instance and when the resources on that host are actually claimed and made unavailable to other requests. Now that resources are tracked in the Placement service, we want to move the claim closer to the time of host selection, in order to reduce (or eliminate) the resulting race condition. I’m not going to explain the race condition here; if you’re reading this, I’m assuming it is well understood, so let me just summarize my concern: the current proposed design, as seen in the series starting with https://review.openstack.org/#/c/465175/, could be made much better with a few design changes.

At the recent Boston Summit, which I was unable to attend due to lack of funding by my employer, the design for this change was discussed, and the consensus was to have the scheduler return a list of hosts for each instance to the super conductor, which would then attempt to claim the resources on the first host returned. If that claim fails, the super conductor discards that host and tries to claim the resources on the second host. When it finally succeeds in a claim, it sends a message to that host to start building the instance, and that message includes the list of alternate hosts. If something happens that causes the build to fail, the compute node sends the request back to its local conductor, which unclaims the resources and then tries each of the alternates in order: first claiming the resources on that host and, if successful, sending the build request to it. Only if all of the alternates fail will the request fail.
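
To make that flow concrete, here is a rough sketch of the conductor-side loop described above. All of the names (placement.claim_resources, compute_rpc.build_instance, and so on) are hypothetical placeholders for illustration, not the actual Nova interfaces:

    def schedule_and_build(placement, compute_rpc, request_spec, hosts):
        """Try each host in order: claim its resources, then ask it to build.

        'placement' and 'compute_rpc' are hypothetical stand-ins for the
        Placement client and the compute RPC API.
        """
        for index, host in enumerate(hosts):
            # Attempt to claim the resources for this host in Placement.
            if not placement.claim_resources(request_spec, host):
                # The claim failed (e.g. we lost a race); try the next host.
                continue
            # The build request carries the remaining hosts as alternates so
            # that a failed build can be retried by the cell conductor.
            alternates = hosts[index + 1:]
            compute_rpc.build_instance(host, request_spec, alternates)
            return host
        raise Exception("No host could be claimed for this request")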

I believe that while this is an improvement, it could be better. I’d like to do two things differently:

  1. Have the scheduler claim the resources on the first selected host. If the claim fails, discard that host and try the next. When it succeeds, find other hosts in the list of weighed hosts that are in the same cell as the selected host to serve as alternates, and return that list (a rough sketch of this follows the list).
  2. Have the process asking the scheduler to select a host also provide the number of alternates, instead of having the scheduler use the current max_attempts config option value.
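
Here is a minimal sketch of what the first point might look like inside the scheduler. Again, the helper names (placement.claim_resources, host.cell) are invented for illustration and are not the real interfaces:

    def select_and_claim(placement, request_spec, weighed_hosts, num_alternates):
        """Claim the first host that accepts the allocation, then pick
        same-cell alternates from the already-weighed list."""
        for index, host in enumerate(weighed_hosts):
            if not placement.claim_resources(request_spec, host):
                # Lost a race for this host; just move on to the next one.
                continue
            # Only hosts in the same cell as the claimed host are usable as
            # alternates, since the cell conductor will handle any retries.
            alternates = [h for h in weighed_hosts[index + 1:]
                          if h.cell == host.cell][:num_alternates]
            return [host] + alternates
        raise Exception("No host could be claimed for this request")

Note that num_alternates is supplied by the caller here, which is exactly the point of the second item above.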

On the first point: the scheduler already has a representation of the resources that need to be claimed. If the super conductor does the claiming, it will have to re-generate that representation. Sure, that’s not all that demanding, but it makes for a cleaner design not to repeat that work. It also ensures that the super conductor gets a good host from the start. Let me give an example. If the scheduler returns a chosen host (without claiming) and two alternates (which is the standard behavior using the config option default), the conductor has no guarantee of getting a good host. In the event of a race, the first host may fail to allocate resources, and now there are only the two alternates to try. If the claim were done in the scheduler, though, when that first host failed it would have been discarded and the next host tried, until the allocation succeeded. Only then would the alternates be determined, and the super conductor could confidently pass that build request on to the chosen host. Simply put: by having the scheduler do the initial claim, the super conductor is guaranteed to get a good host.

Another problem, although a much less critical one, is that the scheduler still has the selected host run consume_from_request(). With the claim done in the conductor, there is no way to keep this correct if the initial host fails: we will have consumed resources on that host even though we aren’t building on it, and we will not have consumed them on the host we actually select.

On the second point: we have spent a lot of time over the past few years trying to clean up the interface between Nova and the scheduler, and have made a great deal of progress on that front. Now I know that the dream of an independent scheduler is still just that: a dream. But I also know that the scheduler code has been greatly improved by defining a cleaner interface between it and Nova. One of the items that has been discussed is that the config option max_attempts doesn’t belong in the scheduler; instead, it really belongs in the conductor, and now that the conductor will be getting a list of hosts from the scheduler, the scheduler is out of the picture when it comes to retrying a failed build. The current proposal not only leaves that config option in the scheduler, but makes the scheduler depend on it to function, which once again makes the scheduler Nova-centric (and Nova-exclusive). It would be a much cleaner design to simply have the conductor ask for the number of hosts it wants (chosen + alternates), and have the scheduler return that many. Yes, it requires a change to the RPC interface, but that is to be expected if you are changing a fundamental behavior of the scheduler. And if the scheduler is ever moved into a separate module, that number is simply another parameter. Really, avoiding an RPC change is not a good reason to follow a poor design.
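
In code terms, the change amounts to something like the following (hypothetical signatures to illustrate the idea; the real RPC method and argument names may differ):

    # Roughly how it works today: the scheduler decides how many hosts to
    # return based on its own max_attempts config option.
    def select_destinations(context, request_spec):
        ...

    # Proposed: the caller (the conductor) says how many hosts it wants
    # back, and the scheduler no longer needs to know anything about retries.
    def select_destinations(context, request_spec, num_destinations):
        ...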

Since some of the principal people involved in this discussion are not available now, and I’m going to be away at PyCon for the next few days, Dan Smith suggested that I post a summary of my concerns so that all can read it and have an idea what the issues are. Then next week sometime when we are all around and have the time to discuss this, we can hash it out on #openstack-nova, or maybe in a hangout. I also have pushed a series that has all of the steps needed to make this happen, since it’s one thing to talk about a design, and it’s another to see the actual code. The series starts here: https://review.openstack.org/#/c/464086/. For some of the later patches I haven’t finished updating the tests to match the change in method signatures and returned value structures, but you should be able to get a good idea of the code changes I’m proposing.

Re-imagining the Nova Scheduler

The Problem

OpenStack is a distributed, asynchronous system, and much of the difficulty in designing such a system is keeping the data about the state of the various components up-to-date and available across the entire system. There are several examples of this, but as I’m most familiar with the scheduler, let’s consider that and the data it needs in order to fulfill its role.

The Even Bigger Problem

There is no way that Nova could ever incrementally adopt a solution like this. It would require changing a huge part of the way things currently work all at once, which is why I’m not writing this as a spec, as it would generate a slew of -2s immediately. So please keep in mind that I am fully aware of this limitation; I only present it to help people think of alternative solutions, instead of always trying to incrementally refine an existing solution that will probably never get us where we need to be.

The Example: Nova Scheduler

The scheduler receives a request for resources, and then must select a provider of those resources (i.e., the host) that has the requested resources in sufficient amounts. It does this by querying the Nova compute_node table, and then updating an in-memory copy of that information with anything changed in the database. That means that there is a copy of the information in the compute node database held in memory by the scheduler, and that most of the queries it runs do not actually update anything, as the data doesn’t change that often. Then, once it has updated all of the hosts, it runs them through a series of filters to remove those that cannot fulfill the request. It then runs those that make it through the filters through a series of weighers to determine the best fit. This filtering and weighing process takes a small but finite amount of time, and while it is going on, other requests can be received and also processed in a similar manner. Once a host has been selected, a message is sent to the host (via the conductor, but let’s simplify things here) to claim the resources and build the requested virtual machine; this request can sometimes fail due to a classic race condition, where two requests for similar resources are received in a short period of time, and different threads handling the requests select the same host. To make things even more tenuous, in the case of cells each cell will have its own database, and keeping data synchronized across these cells can further complicate this process.
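
Schematically, the current selection logic looks something like this (a simplification with invented names, not the actual filter scheduler code):

    def select_host(host_states, request, filters, weighers):
        """Filter out hosts that cannot satisfy the request, then weigh the
        survivors and pick the best fit."""
        candidates = list(host_states)
        for filt in filters:
            candidates = [h for h in candidates
                          if filt.host_passes(h, request)]
        if not candidates:
            raise Exception("No valid host was found")

        def total_weight(host):
            # Higher combined weight wins; each weigher scores every host.
            return sum(w.weigh(host, request) for w in weighers)

        return max(candidates, key=total_weight)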

Another big problem with this is that it is Nova-centric. It assumes that a request has a flavor, which is composed of RAM, CPU and ephemeral disk requirements, along with possibly some other compute-related information. Work is being done now to create more generic Resource classes that the scheduler could use to allocate Cinder and Neutron resources, too. The bigger problem, though, is the sheer clumsiness of the design. Data is stored in one place, and each resource type will require a separate table to store its unique constraints. Then this data is perpetually passed around to the parts of the system that might need it. Updates to that data are likewise passed around, and a lot of code is in place to help ensure that these different copies of the data stay in sync. The design of the scheduler is inherently racy, because in the case of multiple schedulers (or multiple threads of the same service), none of the schedulers has any idea what any of the others are doing. It is common for similar requests to come in close to each other, and thus likely that the same host will be selected by different schedulers, since they are all using the same criteria to make that selection. In those cases, one request will build successfully, and the rest will fail and have to be retried.

Current Direction

For the past year a great deal of work has been done to clean up the interface between the scheduler and Nova, and there are also other thoughts on how we can improve the current design to make it work a little better. While these are steps in the right direction, it very much feels like we are ignoring the big problem: the overall design is wrong. We are trying to implement a technology solution that has already been implemented elsewhere, and not doing a very good job of it. Sure, it’s way, way better than things were a few years ago, but it isn’t good enough for what we need, and it seems clear that it will never be better than “good enough” under the current design.

Proposal

I propose replacing all the internal communication that handles the distribution and synchronization of data among the various parts of Nova with a system that is designed to do this natively. Apache Cassandra is a mature, proven database that is a great fit for this problem domain. It has a masterless design, with all nodes capable of full read and write access. It also provides extremely low overhead for writes, as well as low overhead for reads with correct data modeling. Its flexible data schemas will also enable the scheduler to support additional types of resources, not just compute as in the current design, without needing a different table for each type. And since Cassandra data is replicated across all clusters equally, different cells would be reading and writing the same data, even with physically separate installations. Data updates are obviously not instant across the globe, but they are limited only by the connection speed.

Wait – a NoSQL database?

Well, yeah, but the NoSQL part isn’t the reason for suggesting Cassandra. It is the extremely fast, efficient replication of data across all clusters that makes it a great fit for this problem. The schemaless design of the database does have an advantage when it comes to the implementation, but many other products offer similar capabilities. It is the efficient replication combined with very high write capabilities that makes it ideal.

Cassandra is used by some of the biggest sites in the world. It is the backbone of Apple’s App Store and iTunes; Netflix uses Cassandra for its streaming services database. And it is used by CERN and Walmart, two of the biggest OpenStack deployments.

Implementation

How would this work in practice? I have some ideas which I’ll outline here, but please keep in mind that this is not intended to be a full-on spec, nor is it the only possible design.

Resource Classes

Instead of limiting this to compute resources, we create the concept of a resource type, and have each resource class define its properties. These map to columns in the database, and given Cassandra’s schemaless design, this makes representing different resource types much easier. There would be some columns common to all resource types, and others that are specific to each type. The subclasses that define each resource type would enumerate their specific columns, as well as define the method for comparing a request for that resource against what is available.
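
As a sketch of what I have in mind (the class and column names here are invented for illustration):

    class Resource(object):
        # Columns shared by every resource type.
        common_columns = ("uuid", "resource_type", "updated_at")
        # Subclasses list the columns specific to their type.
        type_columns = ()

        def __init__(self, **values):
            for name, val in values.items():
                setattr(self, name, val)

        @classmethod
        def all_columns(cls):
            return cls.common_columns + cls.type_columns

        def satisfies(self, request):
            """Each subclass defines how a request is compared to it."""
            raise NotImplementedError


    class ComputeResource(Resource):
        resource_type = "compute"
        type_columns = ("vcpus", "memory_mb", "local_gb")

        def satisfies(self, request):
            return (self.vcpus >= request.vcpus
                    and self.memory_mb >= request.memory_mb
                    and self.local_gb >= request.disk_gb)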

Resource Providers

Resource providers are what the scheduler schedules. In our example here, the resource provider is a compute node.

Compute Nodes

Compute nodes would write their state to the database when the compute service starts up, and then update that record whenever anything significant changes. There should also be a periodic update to make sure things are in sync, but that shouldn’t be as critical as it is in the current system. The record the node writes will consist of the resources available on the node, along with the resource type of ‘compute’. When a request to build an instance is received, the compute node will find the matching claim record, and after creating the new instance it will delete that claim record and update its own record to reflect its new state. Similarly, when an instance is destroyed, a write will update the record to reflect the newly-available resources. There will be no need for a compute node to use a Resource Tracker, as querying Cassandra for claim info will be faster and more reliable than trying to keep yet another object in sync.
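
A compute node update could then be little more than a single write. A sketch using the python cassandra-driver, with an invented keyspace and table layout:

    from cassandra.cluster import Cluster

    session = Cluster(["cassandra-node1"]).connect("scheduling")

    def report_node_state(node):
        # One row per resource provider; the compute node simply overwrites
        # its own row whenever anything significant changes.
        session.execute(
            "INSERT INTO resources (uuid, resource_type, vcpus, memory_mb,"
            " local_gb, updated_at) VALUES (%s, 'compute', %s, %s, %s,"
            " toTimestamp(now()))",
            (node.uuid, node.vcpus, node.memory_mb, node.local_gb))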

Scheduler

Filters now work by comparing requested amounts of resources (for consumable resources) or host properties (for things like aggregates) with an in-memory copy of each compute node, and deciding if it meets the requirement. This is relatively slow and prone to race conditions, especially with multiple scheduler instances or threads. With this proposal, the scheduler will no longer maintain in-memory copies of HostState information. Instead, it will be able to query the data to find all hosts that match the requested resources, and then process those with additional filters if necessary. Each resource class will know its own database columns, and how to take a request object along with the enabled filters and turn it into the appropriate query. The query will return all matching resource providers, which can then be further processed by weighers in order to select the best fit. Note that in this design, bare metal hosts are considered a different resource type, so we will eliminate the need for the (host, node) tracking that is currently necessary to fit non-divisible resources into the compute resource model.
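
For example, a compute request might translate into a query along these lines (purely illustrative; a real data model would need its partitioning and indexes chosen so that these predicates are efficient, rather than leaning on ALLOW FILTERING):

    def find_candidates(session, request):
        # Each resource class knows its own columns, so building the
        # predicate from the request object is mechanical.
        rows = session.execute(
            "SELECT * FROM resources WHERE resource_type = 'compute'"
            " AND vcpus >= %s AND memory_mb >= %s AND local_gb >= %s"
            " ALLOW FILTERING",
            (request.vcpus, request.memory_mb, request.disk_gb))
        return list(rows)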

When a host is selected, the scheduler will write a claim record to the compute node table; this will have the same format as the compute node record, but with negative amounts to reflect reserved consumption. Therefore, at any time, the available resources on a host are the sum of the actual resources reported by the host plus any claims against that host. When writing the claim, a Lightweight Transaction can be used to ensure that another thread hasn’t already claimed resources on the compute node, and that the state of that node hasn’t changed in any other way. This will greatly reduce (and possibly eliminate) the frequency of retries due to threads racing with each other.
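
The claim itself can then be a conditional write, so that two schedulers racing for the same host cannot both succeed. A sketch, again with an invented table layout:

    def claim(session, node, request):
        # Write the claim as a row with negative amounts, as described above.
        # INSERT ... IF NOT EXISTS is a Cassandra Lightweight Transaction, so
        # only one writer can create the claim row for this request.
        result = session.execute(
            "INSERT INTO resources (uuid, resource_type, node_uuid, vcpus,"
            " memory_mb, local_gb) VALUES (%s, 'claim', %s, %s, %s, %s)"
            " IF NOT EXISTS",
            (request.uuid, node.uuid, -request.vcpus,
             -request.memory_mb, -request.disk_gb))
        # The driver reports whether the conditional write actually applied.
        return result.was_applied

If the conditional write is not applied, the scheduler simply moves on to the next candidate host, much like the retry loops sketched earlier in this post.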

The remaining internal communication will remain the same. API requests will be passed to the conductor, which will hand them off to the scheduler. After the scheduler selects a host, it will send a message to that effect back to the conductor, which will then notify the appropriate host.

Summary

There is a distributed, reliable way to share data among disconnected systems, but for historical reasons, we do not use it. Instead, we have attempted to create a different approach and then tweak it as much as possible. It is my belief that these incremental improvements will not be sufficient to make this design work well enough, and that making the hard decision now to change course and adopt a different design will make OpenStack better in the long run.

PyCon 2015

PyCon 2015 ended over a week ago, so you might be wondering why I’m writing this so late. Well, once again (see my PyCon 2014 post) I blame the location: the city of Montreal. We like it so much that Linda and I planned on staying a few extra days on holiday afterwards. After returning, though, I again paid the price by digging out from the accumulated backlog. It was well worth it, though!

[Photo: Old Montreal at night]

If you weren’t able to go to PyCon, or even if you were there and don’t possess the ability to be in multiple places at once, you missed a lot of excellent talks. But no need to worry: the A/V team did an amazing job this year, and not only recorded every session, but got them posted to YouTube in record time – many just a few hours after the talk was completed! Major kudos to them for an excellent job.

[Photos: The swag table (top) and pile of stuffed bags (bottom)]

PyCon is an amazing effort by many people, all of whom are volunteers. One of my favorite volunteer activities is the stuffing of the swag bags. Think about it: over 3,000 attendees each receive a bag filled with the promotional materials from the various sponsors. Those items – flyers, toys, pens, etc. – are shipped from the sponsors to PyCon, and somehow one of each must get put into each one of those bags. Over the years we’ve iterated on the approach, trying all sorts of concurrency models, and have finally found one that seems to work best: each box of swag has one person to dish it out, while everyone else picks up an empty bag and walks down the table (actually two very long tables), getting one of each item deposited into their bag, after which the filled bag is handed to another volunteer, who folds and stacks it. It’s exhausting and exhilarating at the same time. We managed to finish in just under 3 hours, so that’s over 1,000 bags completed per hour!

In between talks, I spent much of my time staffing the OpenStack booth, and talked with many people who had various degrees of familiarity with OpenStack. Some had heard the name, but not much else. Others knew it was “cloud something”, but weren’t sure what that something was. Others had installed and played around with it, and had very specific configuration questions. Many people, even those familiar with what OpenStack was, were surprised to learn that it is written entirely in Python, and that it is by far the largest Python project today. It was great to be able to talk to so many different people and share what the OpenStack community is all about.

Last year PyCon introduced a new conference feature: onsite child care for people who wanted to attend, but who didn’t have anyone to watch their kids during the conference. Now, since my kids are no longer “kids”, I would not have a personal need for this service, but I still thought that it was an incredible idea. Anything that encourages more people to be able to be a part of the conference is a good thing, and one that helps a particularly under-represented group is even better. So in that tradition, there was another enabling feature added this year: live captioning of every single talk! Each room had one of the big screens in the front dedicated to a live captioned stream, so that those attendees who cannot hear can still participate. I took a short, wobbly video when they announced the feature during the opening keynote so you can see how prominent the screens were. I have a bit of hearing loss, so I did need to refer to the screen several times to catch what I missed. Just another example of how welcoming the Python community is.

True to last year’s form, one of the keynotes was focused on the online community of the entire world, not just the limited world of Python development. Last year was a talk by John Perry Barlow, former Grateful Dead lyricist and co-founder of the Electronic Frontier Foundation, sharing his thoughts on government spying and security. This year’s talk was from Gabriella Coleman, a professor of anthropology at McGill University. Her talk was on her work studying Anonymous, the ever-morphing group of online activists, and how they have evolved and splintered in response to events in the world. It was a fascinating look into a little-understood movement, and I would urge you to watch her keynote if you are at all interested in either online security and activism, or just the group itself.

The highlight of the conference for me and many others, though, was the extremely thoughtful and passionate keynote by Jacob Kaplan-Moss that attempts to kill the notion of “rockstar” or “ninja” programmers (ugh!) once and for all. “Hi, I’m Jacob, and I’m a mediocre programmer”. You really do need to find 30 minutes of time to watch it all the way through.

This last point is a long-time peeve of mine: the notion that programming is engineering, and that there are objective measurements that can be applied to it. Perhaps that will be fodder for a future blog post…

One aspect of all PyCons that I’ve been to is the friendships that I have made and renewed over the years. It’s always great to catch up with people you only see once a year, and see how their lives are progressing. It was also fun to take advantage of the excellent restaurants that the host city has to offer, and we certainly did that! On Sunday night, just after the closing of PyCon, we went out to dinner at Barroco, a wonderful restaurant in Old Montreal, with my long-time friends Paul and Steve. Good food, wonderful wine, and excellent company made for a very memorable evening.

[Photo: (L to R) Paul McNett, Steve Holden, Linda and me.]

This was my 12th PyCon in a row, and I certainly don’t plan on breaking that streak next year, when PyCon US moves back to the US – to Portland, Oregon, to be specific. I hope to see many of you there!

PyCon 2014 Review

Better late than never, right?

PyCon 2014 happened two weeks ago, and I’m just getting around to writing about it now. Why the delay? Well, I took a few days of vacation to explore and enjoy Montréal with my woman after PyCon, and when I returned I found myself trapped under a mountain of work that had built up in my absence. I’ve finally dug myself out enough to take the time to write up my impressions.

This PyCon (my 11th) was different in many ways, not least because for the first time, the US PyCon was not in the US! It was held in Canada, in beautiful (and cold!) Montréal. It was the first time in years that I did not arrive early enough to help with the bag stuffing event, which has always been a highlight of my PyCon experiences. I arrived late Thursday afternoon, and just made it to the Opening Reception, where I not only touched base with my fellow Rackers, but also ran into dozens of Python people I have gotten to know over the years. PyCon is special in that regard: while I have technical contacts at many of the conferences I attend, I consider many of the people at PyCon to be my friends.

I was pleasantly surprised by the bold choice for the opening keynote: John Perry Barlow, one of the founders of the Electronic Frontier Foundation (EFF).


He pulled no punches, and let the audience know exactly where he stood on matters of security, openness, government spying, and free information in a society. I enjoyed his talk immensely, but I knew that some with more conservative views might not respond positively to the anti-corporation, anti-Big-Brother tone of his talk, and judging by the sight of several people leaving during the talk, I think I was right. Still, I appreciated that the primary conference for one of the most important Open languages took that risk.

I attended several sessions, and bounced between them and my duties at the Rackspace booth in the vendor area. We were once again a major sponsor of PyCon, and this gives me great pleasure, as I had first convinced Rackspace to become a sponsor shortly after I joined the company 6 years ago. We benefit so much from Python and the work of the PSF, it’s only right that we give something back.


I won’t go into detail about the individual sessions, but I would encourage you to check out the videos of all the talks that are available on the pyvideo site. The quality of the presentations keeps getting better and better every year, which is a great reflection on the talk selection committee, who had to select just 95 talks from a total of 650 proposals. That is a thankless task, so let me say “thank you” to the folks who reviewed all those proposals and made the tough calls.

I would also like to note that this year 1/3 of the speakers were women, and by some estimates, the percentage of female attendees was nearly as high! (They don’t record the sex of a person when they register; hence the need for estimates). This is a phenomenal result in the traditionally male-dominated tech industry, and it isn’t by accident. The PSF has actively encouraged women to attend, both by creating (and standing behind) a Code of Conduct, as well as offering financial assistance to those who might not otherwise be able to attend. And this year was another first: onsite child care during the main conference days. I think this is an amazing addition to PyCon, even though my kids are grown. It allows many people to come who would otherwise not be able, and also encouraged more families to travel together, making PyCon the hub of their vacation plans.

That’s exactly what I did, too. Linda flew into Montréal on Saturday evening, hung around PyCon for the closing session so that she could get a glimpse into the strange geek world I inhabit regularly, and meet some of my friends. We spent the next few days exploring this city that neither of us had visited before, enjoying ourselves immensely. Coming from South Texas, we could have done without the near-freezing temperatures and snow, though! Here was the view from our hotel window:

[Photo: the view from our hotel window]

It was a wonderful vacation, but much too short! We’re really looking forward to returning next year for PyCon 2015!

PyCon Canada Review

It’s been almost a week since PyCon Canada ended, and I had meant to write up a review, but I was busy, so better late than never.

Besides an opportunity to learn more about Python, this conference served as a “training ground” for next year’s PyCon Montréal, which is where the 2014 US PyCon will be held. Yeah, I know, it seems silly to still call it PyCon US instead of PyCon North America, but I suppose it has to do with domain names, trademarks, etc., so I’ll refrain from ranting about that. So to that end, I volunteered to be an MC for roughly half of the total conference. Normally for PyCon you have two roles: Session Chair, who introduces the speaker, times the talk, and manages the Q&A session; and Session Runner, who makes sure that the speaker gets from the green room to the correct session room, and who handles any problems such as missing video adapters, etc. But for PyCon CA, this was changed so that there was an MC and a Runner. The MC was someone with experience from PyCon US, and they served for half of the day in the room. There were fresh runners for each session, but rather than just escort the speaker, they did everything that session chairs and runners do, with the MC guiding them and making sure that they knew what was expected, and who could help out if there were any problems. So in many ways, though I spent half the conference running the talks, I felt like I hardly did anything. The runners did just about everything, and I only had to help out a few times. I have no worries that they’ll be ready for the big leagues next year!

As expected, there were many interesting sessions. I’m not going to give you a summary of everything I saw that I liked, but instead I’ll touch on a few highlights. Probably the one that made the biggest impression on me had nothing to do with Python per se, but instead was about Scratch, a programming language designed by MIT to get children thinking about programming without saddling them with the tedium of learning syntax. This was shown in the keynote on Sunday given by Karen Brennan, who is an Assistant Professor of Education at Harvard University. Besides making it easy for the kids to get started learning about programming concepts, it also makes it easy for them to share their projects, or to take someone else’s project and add to it. In other words, they are teaching kids about the benefits of open source and shared code! And while it’s designed for kids, I think it would also be a huge benefit for non-technical adults by helping them become familiar with programming without having to get totally geeked out.

Another talk I enjoyed was Git Happens, which was given by Jessica Kerr. This was one of the talks I was MC’ing, and I hadn’t otherwise planned on attending, as I’m comfortable enough with Git. But far from being bored, I found her way of explaining how Git works much, much clearer than anything I have ever managed when helping someone else get up to speed with Git.

Brandon Rhodes gave an excellent talk entitled Skyfield and 15 Years of Bad APIs, which covered his efforts to re-write his astronomical calculation library. It was fascinating to see how decisions which seemed reasonable at the time turned out to be less than optimal, and how he learned from them to make the new version much cleaner. As an aside, if you haven’t seen Brandon talk, you’re missing something special. He blends insight with an extremely dry sense of humor, making for a very enjoyable session.

I’ve already written about Dana Bauer‘s Red Balloon demo, so I won’t repeat it here. Suffice it to say that it captivated the attention of everyone in the room.

My talk was basically the same talk I had given the previous month at PyCon Australia, but in a much shorter time slot: 20 minutes instead of 45! I spent much of the time before the conference consolidating my slides, and eliminating anything I could in order to cut the length of the talk. I practiced giving the session over and over, speaking as fast as I could. When the time came, I apologized in advance for the speed at which I was about to give the talk, and then blazed through it. To my surprise, I managed to finish it in 18 minutes, and had time for a few questions! Afterwards I spoke with some who attended the talk to get an idea how it seemed to them, and they didn’t feel like I skipped over anything that made it hard to follow, which was my goal when I was paring it down.

After the sessions had ended, I went with a fairly large group of people to a local restaurant. I didn’t know very many people, and those with whom I sat were all from the team in Montréal who were going to be running PyCon 2014. They started out as strangers, but by the end of the evening (and more than a few pitchers of beer!), we became friends, and I look forward to seeing them again in Montréal next April!

PyCon Canada Begins!

Today was the opening of PyCon Canada 2013, with some volunteer prep work to get things ready in the afternoon, followed by a casual mixer in the evening. This is only the second PyCon here in Canada, and it’s already grown significantly from last year’s small venue. I didn’t go to that one, but I’ve been told it was small enough to be held in a Legion hall. This year it is being held in the Chestnut Conference Centre, which is part of the University of Toronto.

[Photo: the evening mixer]

The mixer was a lot of fun, with food and beverages graciously supplied by Upverter. I got to meet several people as well as connect with many others whom I had already met. Based on my small sampling, there was a good mix of seasoned Python developers and relative newcomers to the language. One of the best parts about conferences like this is learning what others are doing with the language, and there was a wide range just in the discussions that I had: some doing web development, while others focused on internal tools for their companies, while still another was working with medical research teams to analyze data.

The conference proper begins tomorrow morning, with a keynote by Jacob Kaplan-Moss, followed by a full day of sessions. My talk isn’t until Sunday afternoon, which means that I’ll probably spend a lot of the time between now and then revising my slides again and again.

A Look Back at PyCon Australia 2013

In an earlier post I reviewed the OpenStack miniconf that preceded the main PyCon, which was held in Hobart, Tasmania on July 6–7. I had meant to write this shortly after PyCon ended, but the whirlwind of travel back to the US and getting back into the daily grind pushed it off my plate.

The conference was recorded, and all the videos are available on the pyvideo.org website. I encourage you to watch as many of the sessions that interest you as you can – lots of good stuff in them!

The conference actually started for me earlier – the organizer, Chris Neugebauer, had asked for volunteers to help with the conference prep work: badges, swag, all that stuff. This was on the Wednesday before the conference, which happened to be the day I arrived, so it was as good an excuse as any to get out of my hotel and into Hobart. For those of you who have never gone to a PyCon, it is completely run by volunteers. No one gets paid; no one gets free admission; no one gets special perks. This was shocking to me when I moved from the Microsoft conference world a decade ago, where conferences were run as profit centers, and attendees paid well over $1,000 for tickets, but could then relax and treat their time there as a vacation (which many did, at their employers’ expense). But PyCons are the exact opposite, and as a result everyone has a stake in the conference experience. I’ve found that volunteering not only makes you feel like you’re contributing, but it also means that you meet a lot of interesting people who might otherwise remain anonymous faces in the crowd.

[Photo: PyCon AU volunteer night]

The picture above is a quick snapshot I took as we carried out the task of wrapping a thousand coffee cups (yes, that’s correct: 1000) with vinyl cutouts of the coffee sponsor’s logo. Why did we do this? Because a run of pre-printed cups had a minimum of 50,000, and, well, PyCon just ain’t that big! And to make things even more fun, the logos were slightly bigger than the cups, which made it difficult to wrap them cleanly. Tedious, huh? But rather than being a drag, it was a great time, with the dozen or so people there having great conversations as we wrapped the cups one by one. As the only American there, and this being my first time in Australia, observations on cultural differences were a big topic, and I certainly learned a lot. We were all shocked when we were done to learn that we had spent 4 hours working on this stuff; it certainly felt much faster than that.

After the work was done, we went out to eat, where (among other things) I learned that Tasmania has a high-end distillery business producing spirits that rival those from other parts of the world, especially whiskey. Two of the best are the Nant and Lark distilleries. I’m not a big Scotch drinker, but I appreciated the quality that these two distilleries are producing. Unfortunately, they are relatively small, and their products are nearly impossible to find in the US.

The conference proper started off Saturday morning, with Alex Gaynor giving the keynote. It set the tone for the conference, and we spent much of the next two days exploring what it meant to be both a software developer in general, and a Pythonista in particular. I then went to Nick Coghlan’s talk about the state of Python packaging; it’s no secret that packaging has always been a pain point for Python, but it is great to see that the different efforts are now unified under Nick’s leadership.

Next up was Jacob Kaplan-Moss’s talk on security in web apps in the Python world. I was aware of most of the attack vectors thanks to my security training at Rackspace, but it was somewhat shocking to see how easy it is to expose a vulnerability if you don’t know it’s there. If you write web apps that are exposed to the public, make sure you watch the video of his talk. Now.

In the afternoon I drifted between various talks which, to be frank, varied in their appeal. That’s not to say that the content wasn’t valuable; it just wasn’t of particular interest to my work. I did end up skipping a few talks to work on my slides for my session; I don’t think I’ve ever given a talk without editing it right up to the last minute, and this was no different.

The day ended with a full hour of Lightning Talks, which are always one of my favorite parts of PyCon, and the lightning talks at PyCon AU did not disappoint. If you aren’t familiar with the concept, they are a series of short (5 minutes – enforced!) talks on whatever the speaker wants to talk about. There was a good mix of dense information, rants, product descriptions, and general wackiness. If you’ve never been to a PyCon, don’t miss the lightning talks! I didn’t take notes, but fortunately Thomas Sutton did.

One thing that was different at this PyCon was the Conference Dinner after the sessions on Saturday. All the attendees were gathered into a room with many large round tables that sat about 10 people each. There was wine and beer served, along with iced tea and soda. The food was served buffet-style, but this was no ordinary conference fare. Fresh oysters! Smoked salmon! Freshly-carved roasts! All this with an assortment of salads, vegetables, breads, and desserts. I followed my usual practice and sat at a table where I knew no one, in order to make new friends and learn about what they are doing with Python, but we ended up having much less technical discussion for some reason (big surprise!).

The evening wasn’t all food and drink, though. There was an evening keynote by Mark Pesce of MooresCloud that talked about their work integrating Raspberry Pi units to control LED displays. An example of this was used in the lightning talks to provide a visual indication of the time remaining for each speaker; as time wound down, the LEDs changed from green to yellow to red to flashing red to off. During the keynote, though, he demonstrated how the Pis could change the light display in response to a stream of Twitter hashtags, creating a virtual tug-of-war. He had half the room send tweets with one hashtag, while the other half tweeted a different hashtag, and the color distribution of the display changed in response. Apologies to anyone who follows me on Twitter for the hashtag spam that night! So while this was a somewhat silly example of what it could do, it certainly demonstrated that you could program the device to respond to any sort of dynamic data.

Sunday morning’s sessions were… ok, you got me. I skipped the sessions at the prompting of Alan Perkins, the director of our Sydney office, who grew up in Hobart and knows the area well. He wanted to take me along with a couple of other Rackers up to the peak of Mount Wellington, which is probably the most ubiquitous visual in the Hobart area. Unfortunately, being the middle of winter there, the roads were closed due to iciness. So instead Alan took us on a tour along the coastline of the Derwent estuary through some of the smaller towns and some truly gorgeous scenery.

We made it back in time for the last morning session, but I needed to prepare for my talk in the afternoon. After lunch Richard Jones gave an excellent talk entitled “Don’t Do This”, which was an exploration of some of the oddities that were possible in the Python language. Fortunately I can say that none of them could be found in my code! Next up was Dylan Lacey from Sauce Labs, with a talk on using Splinter to streamline your testing by providing an API that allows you to automate browser actions. Splinter looks pretty interesting, and if I were doing web acceptance testing, I would definitely check it out.

Next up was my session on using Python to build your application infrastructure. I won’t self-review, but I do want to thank the people who attended the talk for being understanding when the shaky hotel network caused my live demo to die halfway through. I did get a few good reviews from those who attended the talk.

After the afternoon tea break, I saw Luke Miller’s talk on developing an indie game with Python that featured a gay theme. I had never really thought about it before, but there weren’t any games that featured leading characters who were gay, and which might appeal to gay audiences more than the current selection available. The talk wasn’t just about the game’s theme; it was a very in-depth look into what it takes to develop a game and make it interesting. I must say that I learned more in that talk than in any other I attended, as the material was totally new to me, and it was presented very clearly.

The final session was by Adam Forsyth on Getting the Most Out of StackOverflow. Adam is seriously active on that site, and knows quite a bit about how things work, and how overwhelming it can be at times for a newcomer. He had many tips, both for those who wanted to ask better questions that would have a good chance of receiving a great answer, and for those who wanted to find good questions to answer before others did.

Last but not least was the second day of lightning talks. Once again they were excellent, and again, Thomas Sutton took much better notes than I did.

That was the end of the conference, and Chris Neugebauer, as the conference chair, gave his closing remarks, thanking everyone from the sponsors to the attendees for making PyCon AU 2013 a truly remarkable conference. He received a standing ovation of thanks from the audience for all his hard work, and he deserved every clap.

Many people left after that, but for those of us who remained, things were not over. We had the upper level of Jack Greene’s reserved, and it was packed with Pythonistas enjoying some final food, drink, and conversation.

I’ve been to 10 PyCon US conferences, and while PyCon AU 2013 was much smaller in overall size, it matched its US counterpart in terms of spirit, content, and sense of community. Next year it moves to Brisbane, and I would certainly love to be able to return.

OpenStack Miniconf at Pycon AU 2013

Friday was the pre-conference day, with two miniconfs: one for Django, and the other for OpenStack. While I’d love to spend some time digging deeper into Django, I figured that given my background as an OpenStack developer, the OpenStack miniconf was for me.

There were probably 40 people or so in attendance, and it was a good mix of those who were completely new to OpenStack, those who had looked into it a bit and wanted to learn more, and those who were either core developers or (in my case) a former core dev. Tim Serong from OpenSUSE opened up the day with the talk “WTF is OpenStack?”, which was an excellent introduction for those who had heard a lot about this “cloud” stuff. The presentation included the classic spoof by The Onion about “that Cloud thing” (with apologies to Robert Collins of HP, who really does totally know what that is). He covered all the projects within OpenStack, and how they work together.

Robert Collins then followed with a talk on “Deploying OpenStack using OpenStack”, which tackled the issue that although OpenStack allows you to automate the provisioning of cloud resources, installing OpenStack itself is a terribly manual process. His solution is “TripleO”, which stands for “OpenStack On OpenStack”. It sounds similar to the iNova project from Rackspace, but with several differences. From the ReadMe:

TripleO is the use of a self hosted OpenStack infrastructure – that is OpenStack bare metal (nova and cinder) + Heat + diskimage-builder + in-image orchestration such as Chef or Puppet – to install, maintain and upgrade itself.

Christopher Yeoh of IBM was next, and gave an excellent overview of the changes coming in v3 of the Nova API. The current v2 API was smartly designed by making only a precious few critical pieces part of the core API, and making everything else an extension to this core. The problem is that some of the implementation details that seemed wise at the time are starting to show some cracks, both with consistency in naming and in the coupling to the core Nova code. The Nova v3 API addresses these problems by removing the requirement that the extension name match the class name, allowing for cleaner (and thus more consistent) extension naming. Extensions must now derive from a common base class; this was optional in v2. Overall, it was apparent that the people working on the v3 API had learned the lessons that the v2 experience offered, both good and bad, and that as a result the v3 API will be much more consistent and robust.

After lunch Tim Serong gave a talk on the state of OpenStack development on OpenSUSE. They are doing some interesting stuff with Crowbar for deployment, and have spent a lot of time on their internal CI processes. I didn’t take very extensive notes, as this is an area that I have little experience and/or interest in, but it was obvious that Tim is passionate about this, and that they are doing some excellent work at OpenSUSE.

Next up was Robert Collins again, this time talking about testing, covering both improvements in the tools themselves (e.g., the testtools module) and his work in creating a test runner runner. No, that wasn’t a stutter – this is a script to run a test runner, such as Jenkins in this case. With the growth of projects in OpenStack, it can take a very long time to run the tests, which is done when a change is first proposed for review, and then again when it has been approved, to ensure that no conflicts have crept in while the change was in the review process. In order to speed this up, his test runner runner breaks the tests into several parallel processes, and then aggregates the results. This means that the time required to run all the tests can be reduced greatly, depending on the number of parallel processes. It also provides for test randomization, which can help reveal hidden test inter-dependencies.

Next up was a talk on how to get involved in OpenStack development, which covered the basics of where you can find bugs to work on, or blueprints to contribute to, as well as the review process and Gerrit. Michael Still from Rackspace was supposed to give this talk, but the birth of his child was a bit more important, so Robert Collins stepped up and did the talk for him. This was more of a review session for me, but it had a lot of useful information for those who were new to OpenStack development.

That was the last talk; after that was the hackfest, which had the goal of getting participants to find a fairly simple bug, fix it, and submit it for review. In practice, though, most of the time was spent helping people get Devstack up and running on their machines. Those of us who were OpenStack ‘veterans’ helped where we could, and in the process I had some great discussions with people about OpenStack, so even though we never got to fix any bugs, I believe that the people there who were new to OpenStack got a lot out of that time.

Finally was the bar track! Many of the miniconf attendees, myself included, retired to one of the many bars here at Wrest Point, and enjoyed a cold beer after a long day of learning about OpenStack.

Heads up, Aussies!

In a couple of days I’ll be heading out to Australia for a couple of weeks. My proposal for a talk at PyCon Australia had been accepted, and coincidentally Rackspace just launched our cloud in our Sydney datacenter. So I’ll be flying to Sydney to meet all the Rackers in the Sydney office, and then flying the following week to Hobart, Tasmania for the conference.

Except for a trip to Italy on my honeymoon 27 years ago, I have never left the Americas. I have never been south of the Equator. This is going to be a brand-new experience for me, so I’m using it as my motivation to start writing again. I had deliberately avoided blogging for years because I tend to obsess on editing: just one more tweak to that phrasing, or maybe choose a synonym here, or… well, you get the idea. A great example of perfect being the enemy of the good. I’m resolving to change that by instituting a one edit rule: after the post is written, I’ll go over it once, and then publish it. I’m curious to see how difficult that will be to follow!