Why OpenStack Failed, or How I Came to Love the Idea of a BDFL

OK, so the title of this is a bit clickbait-y, but let me explain. By some measures, OpenStack is a tremendous success, being used to power several public clouds and many well-known businesses. But it has failed to become a powerful player in the cloud space, and I believe the reason is not technical in nature, but a lack of leadership.

OpenStack began as a collaboration between Rackspace, a commercial, for-profit business, and a consulting group working for NASA. While there were several companies involved in the beginning, Rackspace dominated by sheer numbers. This dominance was a concern to many companies – why should they contribute their time and resources to a project that might only benefit Rackspace? This fear was not entirely unfounded, as the OpenStack API was initially created to match Rackspace’s legacy cloud API, and much of the early naming of things matched Rackspace’s terminology – I mean, who ever thought of referring to virtual machines as “servers”? But that matched the “Cloud Servers” branding that Rackspace used for its cloud offering, and that name, as well as the use of “flavor” for instance sizing, persist today. The early governance was democratic, but when one company has many more votes than the others…

The executives at Rackspace were aware of this concern, and quickly created the OpenStack Foundation, which would be an independent entity that would own the intellectual property, helping to guarantee that one commercial company would not control the destiny of OpenStack. More subtly, though, it also engendered a deep distrust of any sort of top-down control over the direction of the software development. Each project within OpenStack was free to pretty much do things however they wanted, as long as they remained within the bounds of the Four Opens: Open Source, Open Design, Open Development, and Open Community.

That sound pretty good, right? I mean, who needs someone imposing their opinions on you?

Well, it turns out that OpenStack needed that. For those who don’t know the term “BDFL“, it is an acronym for “Benevolent Dictator For Life”. It means that the software created under a BDFL is opinionated, but it is also consistently opinionated. A benevolent dictator listens to the various voices asking for features, or designing an API, and makes a decision based on the overall good of the project, and not on things like favoring corporate interests for big contributors, or strong personalities that otherwise dominate design discussions. Can you imagine what AWS would be like if each group within could just decide how they wanted to do things? The imposition of the design from above assures AWS that each of its projects can work easily with others.

The closest thing to that in OpenStack is the Technical Committee (TC), which “is an elected group that represents the contributors to the open source project, and has oversight on all technical matters”. Despite the typical meaning of “oversight”, the TC is essentially a suggestion body, and has no real enforcement power. They can spend months agonizing over the wording of mission statements and community goals, but shy away from anything that might appear to be a directive that others must do. I don’t think the word “must” is in their vocabulary.

They also bend over backwards to avoid potentially offending anyone. Here is one example from my interactions with them: one of the things the TC does is “tag” projects, so that newcomers to OpenStack can get a better idea how mature a particular project is, or how stable, etc. One of the proposed tags was to warn potential users that a project was primarily being developed by a single company; the concern is that all it would take is one manager at that company to decide to re-assign their employees, and the project would be dead. This is a very valid concern for open source projects, and it was proposed that a tag named “team:diverse-affiliation-danger” be created to flag such projects. What followed was much back-and-forth on the review of the proposal as well as in TC meetings about how the tag name was negative and would hurt people’s feelings, how it would be seen as an attack against a project, that it was more of a stick rather than a carrot, etc. All of this hand-wringing over an objective measurement of the content of a project’s current level of activity. (Epilogue: they ended up making it a positive-sounding tag: “team:single-vendor”, and no tears were shed)

Having ineffective leadership like the TC has ripple effects throughout all of OpenStack. Each project is an island, and calls its own shots. So when two projects need to interact, they both see it from the perspective of “how will this affect me?” instead of “how will this improve OpenStack?”. This results in protracted discussions about interfaces and who will do what thing in what order. And when I say “protracted”, I don’t just mean weeks or months; some, such as the CyborgNova integration discussions, have dragged on for two years! I cannot imaging that happening in a world with an OpenStack BDFL. This inter-project friction slows down development of OpenStack as a whole, and in my opinion, contributes to developer dissatisfaction.

So what would OpenStack have been like if it had had a BDFL? Of course, that would depend entirely on the individual, but I can say this: it would have flamed out very quickly with a poor BDFL, or it would be a much better product with a much higher adoption with a good one. Back in 2013 I had predicted that OpenStack would eventually rival the commercial clouds in much the same manner that Linux now dominates the internet over proprietary operating systems. In the early days of the internet, the ability for people to download and play with free software such as the LAMP stack enabled people with big ideas but small budgets to turn those ideas into reality. OpenStack began in the early days of cloud computing, and it seemed logical that having a freely-available alternative to the commercial clouds might likewise result in new cloud-native creations becoming reality. It was a believable prediction, but I missed the effect that a lack of coordination from above would have on OpenStack achieving the potential to fill that role.

By the way, many people point to Linux and its BDFL, Linus Torvalds, as the argument against having a BDFL, as Linus has repeatedly behaved as an offensive ass towards others when he didn’t like their ideas. But ass or not, Linux succeeded because of having that single opinion consistently shaping its development. Most BDFLs, though, are not insufferable asses, and their projects are better off as a result.

Fanatical Support

“Fanatical Support®” – that’s the slogan for my former employer, Rackspace. It meant that they would do whatever it took to make their customers successful. From their own website:

Fanatical Support® Happens Anytime, Anywhere, and Any Way Imaginable at Rackspace

It’s the no excuses, no exceptions, can-do way of thinking that Rackers (our employees) bring to work every day. Your complete satisfaction is our sole ambition. Anything less is unacceptable.

Sounds great, right? This sort of approach to customer service is something I have always believed in. And it was my philosophy when I ran my own companies, too. Conversely, nothing annoys me more than a company that won’t give good service to their customers. So when I joined Rackspace, I felt right at home.

Back in 2012 I was asked to create an SDK in Python for the Rackspace Cloud, which was based on OpenStack. This would allow our customers to more easily develop applications that used the cloud, as the SDK would handle the minutiae of dealing with the API, and allow developers to focus on the tasks they needed to carry out. This SDK, called pyrax, was very popular, and when I eventually left Rackspace in 2014, it was quite stable, with maybe a few outstanding small bugs.

Our team at Rackspace promoted pyrax, as well as our SDKs for other languages, as “officially supported” products. Prior to the development of official SDKs, some people within the company had developed some quick and dirty toolkits in their spare time that customers began using, only to find out some time later when they had an issue that the original developer had moved on, and no one knew how to correct problems. So we told developers to use these official SDKs, and they would always be supported.

However, a few years later there was a movement within the OpenStack community to build a brand-new SDK for Python, so being good community citizens, we planned on supporting that tool, and helping our customers transition from pyrax to the OpenStackSDK for Python. That was in January of 2014. Three and a half years later, this has still not been done. The OpenStackSDK has still not reached a 1.0 release, which in itself is not that big a deal to me. What is a big deal is that the promise for transitioning customers from pyrax to this new tool was never kept. A few years ago the maintainers began replying to issues and pull requests stating that pyrax was deprecated in favor of the OpenStackSDK, but no tools or documentation to help move to the new tool have been released.

What’s worse, is that Rackspace now actively refuses to make even the smallest of fixes to pyrax, even though they would require no significant developer time to verify. At this point, I take this personally. For years I went to conference after conference promoting this tool, and personally promising people that we would always support it. I fought internally at Rackspace to have upper management commit to supporting these tools with guaranteed headcount backing them before we would publish them as officially supported tools. And now I’m extremely sad to see Rackspace abandon these people who trusted my words.

So here’s what I will do: I have a fork of pyax on my GitHub account. While my current job doesn’t afford me the time to actively contribute much to pyrax, I will review and accept pull requests, and try to answer support questions.

Rackspace may have broken its promises and abandoned its customers, but I cannot do that. These may not be my customers, but they are my community.

Is Swift OpenStack?

There has been some discussion recently on the OpenStack Technical Committee about adding Golang as a “supported” language within OpenStack. This arose because the Swift project had recently run into some serious performance issues, which they solved by re-writing the bottleneck process in Golang with much success. I’m not writing here to debate the merits of making OpenStack more polyglot (it’s no secret that I oppose that), but instead, I want to address the issue of Swift not behaving like the rest of OpenStack.

Doug Hellman summarized this feeling well, originally writing it in a pastebin, but then copying it into a review comment on the TC proposal. Essentially, it says that while Swift makes some efforts to do things the “OpenStack Way”, it doesn’t hesitate to follow its own preferences when it chooses to.

I believe that there is good reason for this, and I think that people either don’t know or forget a lot of the history of OpenStack when they discuss Swift. Here’s some background to clarify:

Back in the late ’00s, Rackspace had a budding public cloud business (note: I worked for Rackspace from 2008-2014). It had bought Slicehost, a company with a closed-source VPS system that it used as the basis for its Cloud Servers product, and had developed a proprietary object storage system called NAST (Not Another S Three: S3, get it?). They began hitting limits with NAST fairly soon – it was simply too slow. So it was decided to write a new system with scalability in mind that would perform orders of magnitude better than NAST; this was named ‘Swift’ (for obvious reasons). Swift was developed in-house as a proprietary software project. The development team was a small, close-knit group of guys who had known each other for years. I joined the Swift development team briefly in 2009, but as I was the only team member working remotely, I was at a significant disadvantage, and found it really difficult to contribute much. When I learned that Rackspace was forming a distributed team to rewrite the Cloud Servers software, which was also beginning to hit scalability limits, I switched to that team. For a while we focused on keeping the Slicehost code running while starting to discuss the architecture of the new system. Meanwhile the Swift team continued to make strong progress, releasing Swift into production in the spring of 2010, several months before OpenStack was announced.

At roughly the same time, the other main part of OpenStack, Nova, was being started by some developers working for NASA. It worked, but it was, shall we say, a little rough in spots, and lacked some very important features. But since Nova had a lot of the things that Rackspace was looking for, we started talking with NASA about working together, which led to the creation of OpenStack. So while Rackspace was a major contributor to Nova development back then, from the beginning we had to work with people from a wide variety of companies, and it was this interaction that formed the basis of the open development process that is now the hallmark of OpenStack. Most of the projects in OpenStack today grew out of Nova (Glance, Neutron, Cinder), or are built on top of Nova (Trove, Heat, Watcher). So when we talk about the “OpenStack Way”, it really is more accurately thought of as the “Nova” way, since Nova was only half of OpenStack. These two original halves of OpenStack were built very differently, and that is reflected in their different cultures. So I don’t find it surprising that Swift behaves very differently. And while many more people work on it now than just the original team from Rackspace, many of that original team are still developing Swift today.

I do find it somewhat strange that Swift is being criticized for having “resisted following so many other existing community policies related to consistency”. They are and always have been distinct from Nova, and that goes for the community that sprang up around Nova. It feels really odd to ignore that history, and sweep Swift’s contributions away, or disparage their team’s intentions, because they work differently. So while I oppose the addition of languages other than Python for non-web and non-shell programming, I also feel that we should let Swift be Swift and let them continue to be a distinct part of OpenStack. Requiring Swift to behave like Nova and its offspring is as odd a thought as requiring Nova et. al. to run their projects like Swift.

A New (But Familiar) Adventure

OK, I admit it: I haven’t been posting here as regularly as I should be. It’s not like my life is so boring, or that I haven’t had any interesting thoughts to share. Rather, it’s because I have been so busy and my life changing so fast that I haven’t had the time to sit down, catch my breath, and write things down. I hope to be able to change that moving forward.

Today is the beginning of one of those changes: I start my new job as an OpenStack developer for IBM. In a way I feel like I’m coming back to a familiar world, even though I’ve never worked for IBM before. I was involved in the creation and initial design of OpenStack, and have always felt that it was partially “my baby” (ok, a very small part, but mine nonetheless). I left active involvement with OpenStack a few years ago to start the Developer Experience effort at Rackspace, and have only been tangentially working on OpenStack during that time. With the move to IBM, my focus will once again be on OpenStack, and I couldn’t be happier.

Review: GlueCon 2014

GlueCon 2014 was held on May 21-22 at the Omni Interlocken Hotel in Broomfield, Colorado, and I was fortunate enough to be selected as a speaker there for the second year in a row. For those of you who aren’t familiar with GlueCon, it’s not quite an unconference, but it is a loosely-structured gathering of people working in lots of different technologies that drive the web today, with the focus on the “glue” that holds them all together: developers.

The morning of the 21st was a series of keynotes, and devops seemed to be the topic on everyone’s mind. There was one keynote in general that I strongly disagreed with: John Sheehan of Runscope claimed that “API SDKs Will Ruin Your Life” (the title of the keynote). The assumption, of course, was that the only reason that anyone would use an SDK is because the native API is terrible, and it is the reliance on SDKs that is causing API devs to get sloppy. While that might be true in some sense, the responsibility for creating a good API rests on the API developers, and if they are not professional enough to take pride in the quality of their APIs, I doubt that any SDK would change that. Creating good, consistent APIs is not easy, and typically shouldn’t be left to the the people creating the underlying product that the API is exposing.

The afternoon was broken into several tracks, with the idea that the talks in each track were related, and would happen one right after the other, with no break in between. This was when my talk was scheduled, and it was part of the Cloud track. The talk was titled “How to Leave Rackspace”, and no, I wasn’t encouraging people to leave Rackspace! (I like having a job!). It was about the realities of portability among clouds, and emphasized the importance of data gravity binding you to a provider. I also covered the importance of API compatibility, and how working with providers sharing a common API, such as OpenStack-based providers, won’t do anything about data gravity, but will at least allow your applications to run on different providers.

The script I showed used pyrax to transfer a running compute instance in Rackspace to the HP cloud, so that you could move an entire server with just a few lines of code. The overall lesson, however, was to take the time to choose your provider carefully up front, instead of relying on some promise of portability later. The audience seemed pretty engaged during the talk, but there weren’t many questions and what seemed like lukewarm applause afterwards. I was a little disappointed, but noticed the same pattern for the other speakers who followed me, so I thought it might be due to the format. Later on I did run into several people who attended my session, and their feedback was very positive, so that made me feel better about the talk. This was the first time I’ve given this particular talk, and so I was very appreciative of the feedback.

After my talk was an excellent talk by Cornelia Davis of Pivotal that gave us a peek under the hood of Cloud Foundry. I was familiar with what Cloud Foundry could do, but I hadn’t had any exposure to the underlying architecture, so this was very interesting. Following that were talks on OpenStack Swift and network virtualization, which were both full of useful information, and then there was a talk on “The New Development Experience” that had some good stuff on Bluemix, IBM’s implementation of Cloud Foundry, but was too filled with buzzwords to be interesting. For a sample of what I mean, here’s a sentence from the talk abstract:

“The cloud combined with the need-for-time to value is driving a movement toward a scalable and flexible cloud computing development model that provides application developers with the freedom to innovate rapidly, integrate with existing infrastructure, use or create APIs, and create cloud native applications”.

I spent the rest of the day working at the booth, and noticed a distinct difference in attitude from last year’s crowd. For some reason, the crowd last year was not only very pro-AWS, but actively hostile to OpenStack. This year I didn’t get any of that negativity; instead, I spoke with developers and business people who knew about OpenStack and Rackspace, and simply wanted more information. My favorite, though, was Atsushi Yamanaka, from IDC Frontier in Japan. He flew to the Denver area the previous day from Tokyo to attend GlueCon, and was flying back to Tokyo the day after GlueCon ended. Talk about dedication! He wanted to get a better idea of Rackspace’s presence in the Asia/Pacific area, both current and future, and unfortunately I had to explain that those decisions are made at much higher pay grades than mine!

The second day of the conference similarly opened with a series of keynotes. I had to work the booth and also work on some code, so I didn’t catch most of them. I did, however, catch the keynote by Joshua McKenty of Piston Cloud Computing, which I found to be equal parts entertaining and informative. I’ve known Joshua for several years now, since the birth of OpenStack, and know as well as anyone how he can tend to overstate things just to get a rise out of the audience. He also knocked SDKs, but made it clear that it was only when they were used as crutches for bad APIs, and I had to comment about it on Twitter. The example he gave was that if instead of a website, you had created a duck, when you create the duck API, it would be dumb to expose the gizzard and the inner workings of its circulatory system, yet developers do the equivalent of that regularly. When people want to use your duck, they are interested in its ability to fly or swim or eat or quack, and don’t really care about the incredibly complex implementation details that make those abilities possible. Developers, on the other hand, are so proud of their incredibly complex implementations that they tend to feature them front and center in their API design. This makes for an unwieldy API, which then almost forces users of it to rely on an SDK to be able to do anything useful.

There was another keynote by Julia Ferraioli from Google on developer outreach, and since that’s my job, I was interested to hear what she had to say. Unfortunately, the talk was really nothing more than the general knowledge stuff that people working in this field for any length of time already know; I certainly didn’t pick up anything new. I hope, though, that maybe some of the non-developers in the crowd learned something.

I wish I could comment on more of the talks, but again I spent a good part of the day either working the booth or writing code for work. I did see a couple of talks that revolved around novel uses of Docker to either speed up or reduce the cost of CI/CD enviroments; it certainly seems as though Docker was the new hotness among developers at GlueCon.

I can’t say enough good things about the event itself: the location, the organization, the food/refreshments, the attendees, and the quality of the talks were all excellent. If you get a chance to attend GlueCon 2015 next year, go for it!