OpenStack – Walking Contradiction

Day 52: Happy 10th Birthday, OpenStack!

I just saw this announcement from the OpenStack Foundation about OpenStack’s 10th birthday! Yes, 10 years ago this week was the first OpenStack Summit, in Austin, TX, with the public announcement the following week at O’Rielly OSCO N. Yet most people don’t know that I played a very critical role in the beginning!

OpenStack began as a joint venture between Rackspace (my employer at the time) and NASA. I was on the team at Rackspace that developed and supported its aging cloud compute services, and we were looking to develop something from scratch that could be much more scalable than our current system. Around that time Thierry Carrez saw an announcement from a group at NASA about their development of a compute virtualization system, and suggested to the powers that be at Rackspace that this might be a better way to go instead of developing the whole thing ourselves. From that followed a lot of discussion among the executives at Rackspace, as well as some conversations with NASA, and the conclusion was that we would team up. One of the first things to do was to get the developers for both groups together to discuss things from a more technical perspective. And this is where I believe that I made a critical decision that, had I chosen wrong, might have resulted in OpenStack never happening.

The NASA team was a consulting group, Anso Labs, and they were arriving in San Antonio, and we had plans to take them out to dinner, but no idea where. It was then that I suggested The Cove, a local place with lots of outdoor seating, a relaxed atmosphere, good beer, and delicious food. We had a great time that evening, and we all got to know each other. Had we been in a more conventional restaurant, people may have only gotten to know the people sitting next to them, but since The Cove is open seating, we moved around a lot, talking about both the technical stuff and personal stuff.

Over the next few days we began reviewing the code and exchanging idea on what needed to be developed next, and those discussions went very smoothly, getting a lot done in a short period of time. I still maintain to this day that without that first night having beer and food at The Cove, OpenStack might never have become the success that it did.

You’re welcome.

Day 12: Communities and Survivorship Bias

Communities, especially Open Source communities, tend to form some form of governance once they grow beyond a certain size. The actual size isn’t as important as the relationship among the members: when everyone knows everyone else, there’s really no need for governance. But when individuals come from different companies, or who otherwise may have different interests than the others, there needs to be some ground rules for making decisions on what does or does not get done. Without governance, projects will inevitably fork when these differences get large enough.

Typically governance is established by what most people involved like to think is a meritocracy: the hardest-working, most knowledgeable people are the ones who make the important decisions. At first glance this seems perfectly fair, and it usually is—initially, at least. Over time, though, this system is prone to the problems of Survivorship Bias. Let me illustrate how that happens.

Imagine a group of people who are on a long hike through the wilderness. There will be some people who have more skill reading a map, or operating a compass, or who know the terrain better. When the group starts out, it is only natural that these people lead the group, and they are given the title of Navigator. The group creates rules that while anyone can provide ideas as to what direction they should head in, only Navigators can make that choice. It works well for a while.

As time passes, though, and people in the group learn more about map reading and terrain features, their knowledge begins to approach the level of the existing Navigators. At that point it would seem fair to also designate these people as Navigators, since they now have enough knowledge to make directional decisions. But the rule is that the only way an existing group member can be designated a Navigator is if all of the existing Navigators agree. In other words, the process is largely subjective, as there is no objective test for competency. It also calls for a good deal of trust.

After a while, some people realize that while the Navigators have generally been doing their job well, they have made some errors that have taken the group off of the ideal path. Some group members point that out, and want to adjust course to get back to where they should have been. The Navigators, though, prefer to keep moving forward, even if it makes for a longer and more difficult hike in the long term; they prefer the feeling of moving ahead. Those who disagree go off on their own in frustration. Others within the group get the clear message that if they ever want to become a Navigator, they should curry favor with the existing Navigators. And when they do make it into that core group, they feel that they have worked hard to earn it, and anyone else who wants to reach that level has to play by the same rules that they did.

This is classic survivorship bias. The only people who can change the system are the ones who agreed with it in the first place, and thus don’t really see a problem with it. The voices of disagreement fade away until they can no longer be heard, so everyone thinks it’s all good. The system self-perpetuates.

I’ve seen this in action in several communities, but none so strikingly as in the OpenStack community, both on the Nova team as well as the Technical Committee that is supposed to provide technical leadership. I originally wrote a draft of this a year ago when I was working in that community, and became increasingly frustrated at how decisions were made. I happened to “run into” (electronically, of course) a few former Nova developers who had moved on, and when I expressed my frustration, they both said that similar feelings were why they looked to move to a different project. That’s when the role of survivorship bias became clear to me.

As I’m no longer in the OpenStack community, I don’t need to vent about particular issues or personalities. That’s history to me. I do hope that people realize that survivorship bias can shape how a community views itself, because if you are coming up short in some areas, you won’t know about it, because the people affected usually leave rather than deal with that BS. If you care about growing a healthy community, you need to make it easy and welcoming for people to share their ideas. And you should also take the time when someone who was active decides to leave to do a sort of exit interview. You might learn something important.

Writing Again

Today marks 2 months since I was laid off from my job at DataRobot. It was part of a 25% reduction that was made in anticipation of the business slump from the COVID-19 pandemic, and having just been there for 6 months, I was one of the ones let go.

I have spent the last two months like most of you largely confined to my home, with only an occasional trip to the grocery store. Since I was home with all this newfound free time on my hands, I decided to work on a lot of projects that I’ve had to put off. Now that those are largely complete, I need to find a new outlet to occupy my time. So today I’m going to start on a program of writing every day; you’re reading the first entry.

Not to brag, but I’ve always been pretty good writer. The biggest problem that I have is perfectionism: I want to edit/rewrite until it can’t be tweaked any more. As an example, I was stuck for 10 minutes just coming up with a title for this post! I mentioned this tendency in the very first post of this blog: I was about to travel to Australia, and wanted to have a way to record my impressions. I had hardly ever traveled outside the US, so this was a really big deal for me.

I kept it up for a short while, but soon fell back into my old ways: starting a post, and then abandoning it because it didn’t feel good enough. When I joined IBM in 2014 as an OpenStack developer, part of my role was to be outspoken, and writing blog posts was one way to do that. So for a while I was posting fairly regularly. This time, though, it was the blowback from those posts that caused me to lose interest in writing. You see, the OpenStack developer hierarchy is designed to discourage change and alternate approaches, both of which were the frequent topics of my posts. It discouraged me because I was publicly criticized by many of the “core” developers, who seemed to take my ideas as threats to the way they were doing things. It was even more discouraging that I received at least as much private thanks and praise from others, all of whom were not comfortable expressing support publicly, less they lose political capital with the core developers. I have a post I started and never finished on this toxic atmosphere; maybe one of these days I’ll finish it as part of this new effort.

So I’m now going to reiterate my initial pledge: I will limit myself to a one edit rule: after the post is written, I’ll go over it once for typos, etc., and then publish it. I’m also setting a minimum of 30 minutes per day to write, and to publish it no matter how good I feel it is. There will be no set topic; one day it may be my thoughts on photographic composition, the next may be a tirade about the latest Trump atrocity. But I do hope that what I write is as interesting for you to read as it is for me to go through the process of writing.

Moving On

It’s been a great run, but my days in the OpenStack world are coming to an end. As some of you know already, I have accepted an offer to work for DataRobot. I only know bits and pieces of what I will be working on there, but one thing’s for sure: it won’t be on OpenStack. And that’s OK with me, as I’ve been working on OpenStack in one form or another for 10 years now.

Wait a moment, you say – OpenStack is only 9 years old! Well, before the OpenStack project was started, I worked on Swift briefly when it was an internal, proprietary project at Rackspace. After that I switched to the Cloud Servers team, which was the team that started Nova with NASA. So yeah, it’s been a full decade. That’s a loooonnnnggg time to be on any development project!

So the feelings of burnout combined with the shift away from OpenStack within IBM made moving to DataRobot a very attractive option. And after having done several video interviews with the people there and getting their impression of life at DataRobot, I’m that much more excited to be joining that team. I’m sure that for the first few months it will be like drinking from the proverbial fire hose, and that’s perfectly fine by me. It’s been much too long since I’ve pushed the reset button for my career.

Over these past 10 years I have made many professional contacts, some of whom I consider true friends. I will miss the OpenStack community, and I hope to run into many of you at future tech events – PyCon, anyone?

Why OpenStack Failed, or How I Came to Love the Idea of a BDFL

OK, so the title of this is a bit clickbait-y, but let me explain. By some measures, OpenStack is a tremendous success, being used to power several public clouds and many well-known businesses. But it has failed to become a powerful player in the cloud space, and I believe the reason is not technical in nature, but a lack of leadership.

OpenStack began as a collaboration between Rackspace, a commercial, for-profit business, and a consulting group working for NASA. While there were several companies involved in the beginning, Rackspace dominated by sheer numbers. This dominance was a concern to many companies – why should they contribute their time and resources to a project that might only benefit Rackspace? This fear was not entirely unfounded, as the OpenStack API was initially created to match Rackspace’s legacy cloud API, and much of the early naming of things matched Rackspace’s terminology – I mean, who ever thought of referring to virtual machines as “servers”? But that matched the “Cloud Servers” branding that Rackspace used for its cloud offering, and that name, as well as the use of “flavor” for instance sizing, persist today. The early governance was democratic, but when one company has many more votes than the others…

The executives at Rackspace were aware of this concern, and quickly created the OpenStack Foundation, which would be an independent entity that would own the intellectual property, helping to guarantee that one commercial company would not control the destiny of OpenStack. More subtly, though, it also engendered a deep distrust of any sort of top-down control over the direction of the software development. Each project within OpenStack was free to pretty much do things however they wanted, as long as they remained within the bounds of the Four Opens: Open Source, Open Design, Open Development, and Open Community.

That sound pretty good, right? I mean, who needs someone imposing their opinions on you?

Well, it turns out that OpenStack needed that. For those who don’t know the term “BDFL“, it is an acronym for “Benevolent Dictator For Life”. It means that the software created under a BDFL is opinionated, but it is also consistently opinionated. A benevolent dictator listens to the various voices asking for features, or designing an API, and makes a decision based on the overall good of the project, and not on things like favoring corporate interests for big contributors, or strong personalities that otherwise dominate design discussions. Can you imagine what AWS would be like if each group within could just decide how they wanted to do things? The imposition of the design from above assures AWS that each of its projects can work easily with others.

The closest thing to that in OpenStack is the Technical Committee (TC), which “is an elected group that represents the contributors to the open source project, and has oversight on all technical matters”. Despite the typical meaning of “oversight”, the TC is essentially a suggestion body, and has no real enforcement power. They can spend months agonizing over the wording of mission statements and community goals, but shy away from anything that might appear to be a directive that others must do. I don’t think the word “must” is in their vocabulary.

They also bend over backwards to avoid potentially offending anyone. Here is one example from my interactions with them: one of the things the TC does is “tag” projects, so that newcomers to OpenStack can get a better idea how mature a particular project is, or how stable, etc. One of the proposed tags was to warn potential users that a project was primarily being developed by a single company; the concern is that all it would take is one manager at that company to decide to re-assign their employees, and the project would be dead. This is a very valid concern for open source projects, and it was proposed that a tag named “team:diverse-affiliation-danger” be created to flag such projects. What followed was much back-and-forth on the review of the proposal as well as in TC meetings about how the tag name was negative and would hurt people’s feelings, how it would be seen as an attack against a project, that it was more of a stick rather than a carrot, etc. All of this hand-wringing over an objective measurement of the content of a project’s current level of activity. (Epilogue: they ended up making it a positive-sounding tag: “team:single-vendor”, and no tears were shed)

Having ineffective leadership like the TC has ripple effects throughout all of OpenStack. Each project is an island, and calls its own shots. So when two projects need to interact, they both see it from the perspective of “how will this affect me?” instead of “how will this improve OpenStack?”. This results in protracted discussions about interfaces and who will do what thing in what order. And when I say “protracted”, I don’t just mean weeks or months; some, such as the Cyborg–Nova integration discussions, have dragged on for two years! I cannot imaging that happening in a world with an OpenStack BDFL. This inter-project friction slows down development of OpenStack as a whole, and in my opinion, contributes to developer dissatisfaction.

So what would OpenStack have been like if it had had a BDFL? Of course, that would depend entirely on the individual, but I can say this: it would have flamed out very quickly with a poor BDFL, or it would be a much better product with a much higher adoption with a good one. Back in 2013 I had predicted that OpenStack would eventually rival the commercial clouds in much the same manner that Linux now dominates the internet over proprietary operating systems. In the early days of the internet, the ability for people to download and play with free software such as the LAMP stack enabled people with big ideas but small budgets to turn those ideas into reality. OpenStack began in the early days of cloud computing, and it seemed logical that having a freely-available alternative to the commercial clouds might likewise result in new cloud-native creations becoming reality. It was a believable prediction, but I missed the effect that a lack of coordination from above would have on OpenStack achieving the potential to fill that role.

By the way, many people point to Linux and its BDFL, Linus Torvalds, as the argument against having a BDFL, as Linus has repeatedly behaved as an offensive ass towards others when he didn’t like their ideas. But ass or not, Linux succeeded because of having that single opinion consistently shaping its development. Most BDFLs, though, are not insufferable asses, and their projects are better off as a result.