Pair Development

If you’ve worked on large open source projects, you know that one of the difficulties is dividing the workload. The goal, of course, is to spread it out so that every developer has a workload that will keep them busy, and everyone is working in sync towards a common goal. This isn’t easy in practice, as there is no top-down authority to hand out assignments and keep everyone on track, as there is in a corporate development environment. It requires a good deal of communication among the members of the team, as well as a good deal of trust.

This problem was brought to light recently in the Nova community. The issue was with the subteam working on the scheduler/placement engine, of which I’m a member. During the Newton development cycle, there was a significant bottleneck due to the fact that one person, Chris Dent, was responsible for a large chunk of work in designing and coding the Placement API and underlying engine, while the rest of us could only help by doing reviews after the code was written. And this isn’t a new thing: during Mitaka, it was Jay Pipes who was the bottleneck with the development of the Resource Providers concept, and in Liberty, it was Sylvain Bauza with the huge amount of work he did to integrate the Request Spec into Nova. Don’t get me wrong: I’m not criticizing any of these people, as they all did great work. Rather, I am expressing frustration that they bore the brunt of the load, when it didn’t have to be that way. I think that it is time to try a different approach in Ocata.

I propose that we use Pair Development. No, not Pair Programming – that’s an entirely different thing. Pair Development is when each “chunk” of work is not undertaken by a single developer, but rather by two. They discuss the path they want to take ahead of time, and instead of splitting the work, they both work on the same patches at the same time. Wait, you say – won’t this slow things down? I don’t believe that it will, for several reasons. First, when discussing a design, having multiple sets of eyes will reduce the number of dead ends, in the same way that bugs are reduced in pair programming by having both developers review the code as it is being written. Second, when a reviewer finds an issue with a patch, either developer can make the fix. This is an even greater benefit if the two developers are in different, but overlapping, time zones.

We also have as evidence the week before the most recent Feature Freeze: the placement stuff needed to get in before FF, and so a whole group of us pulled together to make that happen. Having a diverse set of eyes uncovered several edge cases and inconsistencies in the code, and those were resolved pretty quickly. We used IRC mostly, but had a Google Hangout at least once a day to discuss any outstanding, unresolved matters, so that we would all be on the same page. So yeah, the time pressure helped instill a bit of urgency in us all, but I think that it was having all of us own the code, not just Chris, that made things happen as well as they did. I know that I was familiar with the code, having reviewed much of it before, but now that I had to change it and test it myself, my understanding grew much deeper. It’s amazing how much more deeply you understand something when you touch it instead of just looking at it.

Another benefit of pair development is that it provides much more continuity when one of the developers takes some time off. Instead of the progress getting put on hold, the other member of the development pair can continue along. It will also help to have more than one person know the new code intimately, so that when a behavior surfaces that is not expected, we aren’t depending on a single person to figure out what’s going on.

So for Ocata, let’s figure out the tasks, and make sure that each has two people assigned to it. I will wager that come the end of the cycle, it will help us accomplish much more than we have in previous releases.

PyCon 2015

PyCon 2015 ended over a week ago, so you might be wondering why I’m writing this so late. Well, once again (see my PyCon 2014 post) I blame the location: the city of Montreal. We like it so much that Linda and I planned on staying a few extra days on holiday afterwards. After returning, though, I again paid the price by digging out from the accumulated backlog. It was well worth it, though!

Old Montreal at night

If you weren’t able to go to PyCon, or even if you were there and don’t possess the ability to be in multiple places at once, you missed a lot of excellent talks. But no need to worry: the A/V team did an amazing job this year, and not only recorded every session, but got them posted to YouTube in record time – many just a few hours after the talk was completed! Major kudos to them for an excellent job.

The swag table (top) and pile of stuffed bags (bottom)

PyCon is an amazing effort by many people, all of whom are volunteers. One of my favorite volunteer activities is the stuffing of the swag bags. Think about it: over 3,000 attendees each receive a bag filled with the promotional materials from the various sponsors. Those items – flyers, toys, pens, etc. – are shipped from the sponsors to PyCon, and somehow one of each must get put into each one of those bags. Over the years we’ve iterated on the approach, trying all sorts of concurrency models, and have finally found one that seems to work best: each box of swag has one person to dish it out, and then everyone else picks up an empty bag and walks down the table, and one item of each is deposited in their bag. Actually it took two very long tables, after which the filled bag is handed to another volunteer, who folds and stacks it. It’s both exhausting and exhilarating at the same time. We managed to finish in just under 3 hours, so that’s over 1,000 bags completed per hour!

In between talks, I spent much of my time staffing the OpenStack booth, and talked with many people who had various degrees of familiarity with OpenStack. Some had heard the name, but not much else. Others knew it was “cloud something”, but weren’t sure what that something was. Others had installed and played around with it, and had very specific configuration questions. Many people, even those familiar with what OpenStack was, were surprised to learn that it is written entirely in Python, and that it is by far the largest Python project today. It was great to be able to talk to so many different people and share what the OpenStack community is all about.

Last year PyCon introduced a new conference feature: onsite child care for people who wanted to attend, but who didn’t have anyone to watch their kids during the conference. Now, since my kids are no longer “kids”, I would not have a personal need for this service, but I still thought that it was an incredible idea. Anything that encourages more people to be able to be a part of the conference is a good thing, and one that helps a particularly under-represented group is even better. So in that tradition, there was another enabling feature added this year: live captioning of every single talk! Each room had one of the big screens in the front dedicated to a live captioned stream, so that those attendees who cannot hear can still participate. I took a short, wobbly video when they announced the feature during the opening keynote so you can see how prominent the screens were. I have a bit of hearing loss, so I did need to refer to the screen several times to catch what I missed. Just another example of how welcoming the Python community is.

True to last year’s form, one of the keynotes was focused on the online community of the entire world, not just the limited world of Python development. Last year was a talk by John Perry Barlow, former Grateful Dead lyricist and co-founder of the Electronic Frontier Foundation, sharing his thoughts on government spying and security. This year’s talk was from Gabriella Coleman, a professor of anthropology at McGill University. Her talk was on her work studying Anonymous, the ever-morphing group of online activists, and how they have evolved and splintered in response to events in the world. It was a fascinating look into a little-understood movement, and I would urge you to watch her keynote if you are at all interested in either online security and activism, or just the group itself.

The highlight of the conference for me and many others, though, was the extremely thoughtful and passionate keynote by Jacob Kaplan-Moss that attempts to kill the notion of “rockstar” or “ninja” programmers (ugh!) once and for all. “Hi, I’m Jacob, and I’m a mediocre programmer”. You really do need to find 30 minutes of time to watch it all the way through.

This last point is a long-time peeve of mine: the notion that programming is engineering, and that there are objective measurements that can be applied to it. Perhaps that will be fodder for a future blog post…

One aspect of all PyCons that I’ve been to is the friendships that I have made and renewed over the years. It’s always great to catch up with people you only see once a year, and see how their lives are progressing. It was also fun to take advantage of the excellent restaurants that the host city has to offer, and we certainly did that! On Sunday night, just after the closing of PyCon, we went out to dinner at Barroco, a wonderful restaurant in Old Montreal, with my long-time friends Paul and Steve. Good food, wonderful wine, and excellent company made for a very memorable evening.

(L to R) Paul McNett, Steve Holden, Linda and me.

This was my 12th PyCon in a row, and I certainly don’t plan on breaking that streak next year, when PyCon US moves back to the US – to Portland, Oregon, to be specific. I hope to see many of you there!

OpenStack Nova Mid-cycle Meetup, Day 3

The final day of the mid-cycle meetup started with discussions about a few assorted issues. The first was regarding recent versions of libvirt not working well in our CI infrastructure, and the efforts to package these for Fedora, Ubuntu, and CentOS. The next, and somewhat more interesting (since I know very little about our CI infrastructure), was the discussion about EC2 API support in OpenStack. I found myself experiencing déjà vu, as this was so similar to the discussions about EC2 support in the early days of OpenStack: a few vocal people claimed that it was critical, but nobody seemed to feel that it was important enough to put in the time to maintain it properly. The consensus was that we should deprecate the EC2 API in Kilo, and remove it as soon as the L release. While a few people thought that this was a bit drastic, the truth is that the EC2 stuff hasn’t worked well since Folsom – hell, it had barely worked since the Cactus release. One bright spot for EC2 fans is that there is a project on StackForge to implement the EC2 API in a separate code base; this can be developed independently of the Nova source tree, and if it succeeds, great, but if it withers on the vine, Nova will not be stuck with a bunch of useless EC2 cruft in its code.

Bugs! We’ve climbed back up over 1,000 active bugs, and that’s certainly a cause for concern. Many of these, however, are considered trivial: not because the bug isn’t important, but because the fix is only a couple of changed lines with little possibility of impacting other parts of the code. There had been a plan to label these bugs so that core reviewers could find them more easily and help reduce the overall load, but this seems to have lost momentum since the last release. So we asked a few people to volunteer to become “Trivial Patch Monkeys”, whose job it will be to regularly devote some of their time to going over the bug list to identify these trivial fixes. So far there are 6 monkeys… um, I mean, volunteers.

The last part of the morning was spent discussing the Feature Freeze Exception process for Kilo. The goal is to not only reduce the number of FFEs, but to get them to zero. Why is this so important? Well, adding new code so late in the cycle takes a lot of the time that Nova core reviewers have, so if we can keep that to a minimum (zero is a nice minimum!), it would free up the cores to review and merge as many bug fixes as possible before the release. It would also help people realize that FFEs are supposed to be very rare, and that it should truly require some unusual circumstance to be granted.

I couldn’t stay for the afternoon session, because I had to leave for the airport for my return flight home. I was very glad to have been able to participate in this event, as I learned an awful lot about some of the current intricacies of the project, which have grown considerably since the days when I was a core reviewer for Nova. It was also great to see some of the faces I first met at the Paris Summit again, and develop a deeper working relationship with them. So, until Vancouver!

OpenStack Nova Mid-cycle Meetup, Day 2

The second day of the mid-cycle meetup was very different from the first (for a summary of that, please see yesterday’s post). While there was a set agenda that the group as a whole went through on Day 1, today was more or less broken out into ad-hoc groups, each working on a particular issue; many of these were groups of 1. So this post will be a lot shorter than yesterday’s, since I don’t know just what went on in each of those groups. Many of the groups were focused on patches that were very close to being ready and that a lot of other work depended on, with the goal of giving them the final push they needed to get merged. I listened in on many of these discussions, mostly to learn more about that particular part of the codebase, since I didn’t have enough familiarity to help with the coding side of things. I also spent a lot of time reviewing the changes that were being pushed, which is also an excellent way to learn, as you not only can see the code, but you can read the insights of the other reviewers about the changes.

In the afternoon we had several of the nova-spec cores review my spec on changing how the scheduler gets instance information. I know that some people dread having their work examined and criticized, but I happen to love it. The discussions uncovered several things that needed to be accounted for that had never come up in all the prior back-and-forth on the spec, so I spent a lot of the rest of the afternoon incorporating their suggestions into a revised version, and pushed that up before the day was done. It also shows how these in-person meetings can get so much more accomplished than our typical remote tools such as email and IRC, and why the summits and mid-cycles are critical to attend.

OpenStack Nova Mid-cycle Meetup, Day 1

I’m here in Palo Alto, California, for the mid-cycle meetup of the OpenStack Nova team. For those of you unfamiliar with the concept, the OpenStack community worldwide gets together every 6 months at a Summit to collectively celebrate what we’ve accomplished, and to plan what we’ll be working on for the next 6 months. During the months that follow, though, it’s easy for things to slide off to the side, or for other things to creep up and get in the way of continued progress. So many of the programs that make up OpenStack plan on getting together about halfway through the process so that we all get an idea of the progress we’ve made, and can discuss and potentially solve any of the issues that would prevent us from completing the work we set out to do for this cycle.

For the Nova team, we set out several things as the priorities that we would be focusing on: the next generation of the Cells design (cells v2); the continued development of Nova Objects; cleaning up the interface between the Scheduler and Nova so that scheduler may eventually be split out; the v2.1 API (microversions); functional testing; nova-network migration; no downtime upgrades; as well as working on the number of bugs we have, and improving our testing infrastructure. The meeting today started with the people heading up each of those tasks giving an update on their progress.

First up was Cells v2. It’s moving along well, but not as fast as they would like. One of the big things was getting the CI testing working with cells, which currently cause most tests to fail. Progress has been made on disabling these tests for now, with the goal of fixing them so that our CI tests with cells on, which will be the standard once this work is complete. Cells are now a configurable option, and the tests now run with it off. By turning this back on, and adding the fixed tests in, we can eventually be confident that any new feature in Nova will work right away in a deployment using cells.

There has been good progress with the Objects work, but the biggest problem is that the first item to be objectified, Flavors, is a hairy mess, and required a bunch of changes to undo all the hacks that made flavors work in the past. Once completed it will bring a lot more sanity to flavors (which is a concept I believe should die in a fire, but I fought it years ago and lost, so we’re stuck with it now).

On the Scheduler front, we only had one outstanding spec (mine, of course!), and lots of code up for review. The series of patches to detach Service from Compute Node is the top priority, as so many of the later patches depend on these changes.

None of the principal movers on the v2.1 API was able to make the mid-cycle, but they did fill in some of their progress information on our shared etherpad. The testing integration is nearly done, but one possible problem is support for v2.1 in novaclient.

Functional testing is aiming to get a dozen or so test patterns defined that others can use as the basis for writing future functional tests. There probably won’t be much more than that in the Kilo timeframe, but the hope is that going forward these can help make functional testing more pervasive.

There is a bunch of work being done for the nova-network to neutron migration, but one thing that everyone working on this wanted to make clear is that while they will be creating some tools to help deployers who want to make the switch, there will not be a “click it and forget it” single-button migration in the near future. One other issue brought up is that while we are telling everyone who is deploying OpenStack to use Neutron and not nova-network, devstack still uses nova-network. This is poor dogfooding, so it was agreed that we will start to move devstack to use Neutron.

The zero-downtime migrations discussion was interesting: the idea is that instead of running the current SQLAlchemy migrations, which require taking the database offline, the new expand/contract approach will compare the defined structures in code with the current database, and if there is a discrepancy, create the new structures (expand), migrate the data over, and then later remove the old, unneeded structures (contract). The first code patches to accomplish this have been working, although a lot of work remains to update the tests accordingly.
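
To make the expand/contract idea a bit more concrete, here is a minimal sketch of the three phases. To be clear, this is not the code from the actual Nova patches: it uses Python’s built-in sqlite3 module instead of SQLAlchemy, and the DESIRED dictionary, the helper function names, and the instances table with its name and display_name columns are all made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for "the structures defined in code":
# table name -> {column name: SQL type}.
DESIRED = {"instances": {"id": "INTEGER", "hostname": "TEXT", "display_name": "TEXT"}}


def current_columns(conn, table):
    """Return {column_name: declared_type} for a table in the live database."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


def diff_schema(conn, desired):
    """Compare the desired schema with the database; return (to_add, to_drop)."""
    to_add, to_drop = [], []
    for table, cols in desired.items():
        live = current_columns(conn, table)
        to_add += [(table, c, t) for c, t in cols.items() if c not in live]
        to_drop += [(table, c) for c in live if c not in cols]
    return to_add, to_drop


def expand(conn, to_add):
    """Additive phase: only create new structures, so old code keeps working."""
    for table, col, sql_type in to_add:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {col} {sql_type}")


def contract(conn, desired, to_drop):
    """Destructive phase: rebuild each affected table without the old columns."""
    for table in {t for t, _ in to_drop}:
        cols = desired[table]
        col_defs = ", ".join(f"{c} {t}" for c, t in cols.items())
        col_names = ", ".join(cols)
        conn.execute(f"CREATE TABLE {table}_new ({col_defs})")
        conn.execute(f"INSERT INTO {table}_new SELECT {col_names} FROM {table}")
        conn.execute(f"DROP TABLE {table}")
        conn.execute(f"ALTER TABLE {table}_new RENAME TO {table}")


def run_demo():
    """Walk one table through expand, data migration, and contract."""
    conn = sqlite3.connect(":memory:")
    # The "old" schema still has a column called name.
    conn.execute("CREATE TABLE instances (id INTEGER, hostname TEXT, name TEXT)")
    conn.execute("INSERT INTO instances VALUES (1, 'node1', 'web')")

    to_add, to_drop = diff_schema(conn, DESIRED)
    expand(conn, to_add)
    # Data migration: copy values from the old column into the new one.
    conn.execute("UPDATE instances SET display_name = name")
    contract(conn, DESIRED, to_drop)
    return to_add, to_drop, conn.execute("SELECT * FROM instances").fetchall()
```

The point of splitting things this way is that the expand step is purely additive, so it can run while the old code is still using the database; only the contract step removes anything, and it can wait until nothing depends on the old structures.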

That was just the morning! The afternoon started with a whiteboard discussion I had asked for where we could identify just what we expect the interface between Nova and the (separated) Scheduler to look like. We did get into a little bit of implementation details at times, but overall we clarified the flow of messages between the two, and defined where the responsibility for ensuring that each build request succeeds should go. A lot of the discussion focused on how we can make the overall process bulletproof, which some saw as a tangent, but I think that this is what is needed: figure out what a solid, robust scheduling solution should look like, and though we aren’t going to get there in this cycle, or even the next, we can make sure that we’re moving towards that design.

The remainder of the day was largely focused on discussing process: how the Nova project is run. Was enough information communicated about what the priorities were? Were the various channels of communication being used well? How can we help the few Nova core reviewers handle the huge number of reviews more effectively? Everyone seemed to have their own preference (e.g., email vs. IRC), but no one had any concrete suggestions about what needs to change. It was pointed out that while the loads are high, they haven’t been getting worse, so there is some measure of stability.

I’m looking forward to Day 2, where we plan on breaking into smaller groups to focus on pushing through as many of the critical patches as we can while we’re all in the same room. We’ll see how that goes!

A New (But Familiar) Adventure

OK, I admit it: I haven’t been posting here as regularly as I should be. It’s not like my life is so boring, or that I haven’t had any interesting thoughts to share. Rather, it’s because I have been so busy, and my life has been changing so fast, that I haven’t had the time to sit down, catch my breath, and write things down. I hope to be able to change that moving forward.

Today is the beginning of one of those changes: I start my new job as an OpenStack developer for IBM. In a way I feel like I’m coming back to a familiar world, even though I’ve never worked for IBM before. I was involved in the creation and initial design of OpenStack, and have always felt that it was partially “my baby” (ok, a very small part, but mine nonetheless). I left active involvement with OpenStack a few years ago to start the Developer Experience effort at Rackspace, and have only been tangentially working on OpenStack during that time. With the move to IBM, my focus will once again be on OpenStack, and I couldn’t be happier.