Damned If You Do

The recent spate of canceled conferences, sporting events, etc., due to concerns about spreading infection of the coronavirus COVID-19 has made me think about what will happen if these efforts are successful.

Back in the late 1990s, people realized that a lot of software written in the mid-20th century had a problem: due to the expense of storage, programmers shortened the way years were stored, so that something like 1978 would be stored as 78, with the century assumed. This was fine, but as that software aged, and the coming change of century approached, it was realized that many critical software problems would go from December 31, 1999, to January 1, 1900. This was the Year 2000 problem, commonly abbreviated as Y2K.

Having recognized the issue, most software companies invested heavily in updating their software to use full 4-digit representations for the year. It was tedious work; I personally had to write a series of tests for my projects that verified that things would continue to work in the year 2000. But because the warning was heeded, by the time that January 1, 2000 came most software had been updated. As a result, all of the doomsday scenarios (such as planes dropping from the sky) had been avoided. Yes, there were some billing glitches that were missed, but because of the intense efforts to address this problem, there were no serious problems.

What was the public’s reaction to this? Did they laud the developers for successfully averting a potential problem? Of course not. Instead, they reacted with disdain: “I thought this was going to be the end of the world! Nothing happened!”.

And that’s the point: because the warnings were heeded, and action was taken, nothing catastrophic happened. It didn’t mean that the problem wasn’t real; it just meant that the tech community understood the problem, and addressed it head-on.

So I’m wondering what will happen if the common-sense steps we are taking now to avoid spreading this virus ends up that not that many people get sick or die: will the Fox News people start complaining that it was all a politically-inspired hoax? That the liberal media tried to make Trump look bad by crying wolf? It almost makes me think that if there is a terrible body count, people will be ridiculed for taking ineffective steps, but if there isn’t such a terrible outcome, the steps that were taken will be ridiculed as overreaction, or, even worse, a political stunt.

Damned if you do; damned if you dont.

Why Go to a Tech Conference?

Good question! It does seem unnecessary, especially since most major conferences record every talk and make them freely available online. PyCon has been doing this for many years, and are so good at it that the talks are available online shortly after they are finished! So there’s no real penalty for waiting until you can watch it online.

I suppose that if you look at tech conferences as simply dry tutorials on some new tool or technique, the answer would be “no, you should save your money and watch the sessions at home”. But there are much bigger benefits to attending a conference than just the knowledge available at the talks. I like to think of it as pressing the restart button on my thinking as a developer. By taking advantage of these additional avenues of learning, I come away with a different perspective on things: new tools, new ways of using existing tools, different approaches to solving development issues, and so much more that is intangible. Limiting yourself to the tangible resources of a conference means that you’re missing out. So what are these intangible things?

One of the most important is meeting people. Not so much to build your social network, but more to expand your understanding of different approaches to development. The people there may be strangers, but you know that you have at least one thing in common with them, so it’s easy to start conversations. I’ve been to 14 PyCons, and at lunch I make it a point to sit at tables where I don’t know anyone, and ask the people there “So what do you use Python for?”. Invariably they use it in ways that I had never thought about, or to solve problems that I had never worked on. The conversation can then move on to “Where are you from?”; people usually love to brag about their home town, and you might learn a few interesting things about a place you’ve never been to. Many people also go out to dinner in groups, usually with people who know each other, but I always try to look for people who are alone, and invite them to join our group.

Another major benefit of attending in person is what is known as the “hallway track”. These are the unscheduled discussions that occur in the hallways between sessions; sometimes they are a continuation of discussions that were held in a previous talk, and other times they are simply a bunch of people exchanging ideas. Some of the best technical takeaways I’ve gotten from conferences have come from these hallway discussions. When you’ve been to as many PyCons as I have, there are many people I run into who I haven’t seen since the last PyCon, and we can catch up on what’s new in each other’s lives and careers. Like the lunchtime table discussions, these are opportunities to learn about techniques and approaches that are different than what you regularly do.

Closely related to the above is the “bar track”. Most conferences have a main hotel for attendees, and in the evening you can find lots of people hanging out in the bar. The discussions there tend to have a bit less technical content, for obvious reasons, but I’ve been part of some very technical discussions where the participants are all on their third beer or so. But even if you don’t drink alcohol, you can certainly enjoy hanging out with your fellow developers in the evening. Or, of course, you can use that time to recharge your mental batteries.

Yet another opportunity at a conference is to enhance your career. There is usually some form of formal recruiting; if you’re looking for a change of career, this can be a valuable place to start. I’ve heard some managers say that they won’t send their developers to conferences because they are afraid that someone will hire them; it makes you wonder why they think their developers are not happy with their current job! But even if you’re not looking to make a career move at the moment, establishing relationships with others in your field can come in handy in the future if your job suddenly disappears. You can also learn what companies are looking for skills that match yours; I was surprised to learn that companies as diverse as Disney, Capital One, Yelp, and Bloomberg are all looking for Python talent. As an example, back at PyCon 2016 I met with some people recruiting for DataRobot, and while I didn’t pursue things then, they made a good impression on me. When I was looking for a change last year and got a LinkedIn message from a recruiter at DataRobot, I remembered them well, and this time I followed up, with the result that I’m now happily employed by DataRobot!

Unfortunately, I’ve seen people who arrive to a conference with a group of co-workers, attend the sessions, eat with each other at lunch, and then go out to dinner together. By isolating themselves and confining their learning to the scheduled talks, they are missing out on the most valuable part of attending a conference: interacting with your community, and sharing knowledge with your peers. If this sounds like you, I would advise you to try out some of the things I’ve mentioned here. I’m sure you will find that your conference experience is greatly improved!

Moving On

It’s been a great run, but my days in the OpenStack world are coming to an end. As some of you know already, I have accepted an offer to work for DataRobot. I only know bits and pieces of what I will be working on there, but one thing’s for sure: it won’t be on OpenStack. And that’s OK with me, as I’ve been working on OpenStack in one form or another for 10 years now.

Wait a moment, you say – OpenStack is only 9 years old! Well, before the OpenStack project was started, I worked on Swift briefly when it was an internal, proprietary project at Rackspace. After that I switched to the Cloud Servers team, which was the team that started Nova with NASA. So yeah, it’s been a full decade. That’s a loooonnnnggg time to be on any development project!

So the feelings of burnout combined with the shift away from OpenStack within IBM made moving to DataRobot a very attractive option. And after having done several video interviews with the people there and getting their impression of life at DataRobot, I’m that much more excited to be joining that team. I’m sure that for the first few months it will be like drinking from the proverbial fire hose, and that’s perfectly fine by me. It’s been much too long since I’ve pushed the reset button for my career.

Over these past 10 years I have made many professional contacts, some of whom I consider true friends. I will miss the OpenStack community, and I hope to run into many of you at future tech events – PyCon, anyone?

Why OpenStack Failed, or How I Came to Love the Idea of a BDFL

OK, so the title of this is a bit clickbait-y, but let me explain. By some measures, OpenStack is a tremendous success, being used to power several public clouds and many well-known businesses. But it has failed to become a powerful player in the cloud space, and I believe the reason is not technical in nature, but a lack of leadership.

OpenStack began as a collaboration between Rackspace, a commercial, for-profit business, and a consulting group working for NASA. While there were several companies involved in the beginning, Rackspace dominated by sheer numbers. This dominance was a concern to many companies – why should they contribute their time and resources to a project that might only benefit Rackspace? This fear was not entirely unfounded, as the OpenStack API was initially created to match Rackspace’s legacy cloud API, and much of the early naming of things matched Rackspace’s terminology – I mean, who ever thought of referring to virtual machines as “servers”? But that matched the “Cloud Servers” branding that Rackspace used for its cloud offering, and that name, as well as the use of “flavor” for instance sizing, persist today. The early governance was democratic, but when one company has many more votes than the others…

The executives at Rackspace were aware of this concern, and quickly created the OpenStack Foundation, which would be an independent entity that would own the intellectual property, helping to guarantee that one commercial company would not control the destiny of OpenStack. More subtly, though, it also engendered a deep distrust of any sort of top-down control over the direction of the software development. Each project within OpenStack was free to pretty much do things however they wanted, as long as they remained within the bounds of the Four Opens: Open Source, Open Design, Open Development, and Open Community.

That sound pretty good, right? I mean, who needs someone imposing their opinions on you?

Well, it turns out that OpenStack needed that. For those who don’t know the term “BDFL“, it is an acronym for “Benevolent Dictator For Life”. It means that the software created under a BDFL is opinionated, but it is also consistently opinionated. A benevolent dictator listens to the various voices asking for features, or designing an API, and makes a decision based on the overall good of the project, and not on things like favoring corporate interests for big contributors, or strong personalities that otherwise dominate design discussions. Can you imagine what AWS would be like if each group within could just decide how they wanted to do things? The imposition of the design from above assures AWS that each of its projects can work easily with others.

The closest thing to that in OpenStack is the Technical Committee (TC), which “is an elected group that represents the contributors to the open source project, and has oversight on all technical matters”. Despite the typical meaning of “oversight”, the TC is essentially a suggestion body, and has no real enforcement power. They can spend months agonizing over the wording of mission statements and community goals, but shy away from anything that might appear to be a directive that others must do. I don’t think the word “must” is in their vocabulary.

They also bend over backwards to avoid potentially offending anyone. Here is one example from my interactions with them: one of the things the TC does is “tag” projects, so that newcomers to OpenStack can get a better idea how mature a particular project is, or how stable, etc. One of the proposed tags was to warn potential users that a project was primarily being developed by a single company; the concern is that all it would take is one manager at that company to decide to re-assign their employees, and the project would be dead. This is a very valid concern for open source projects, and it was proposed that a tag named “team:diverse-affiliation-danger” be created to flag such projects. What followed was much back-and-forth on the review of the proposal as well as in TC meetings about how the tag name was negative and would hurt people’s feelings, how it would be seen as an attack against a project, that it was more of a stick rather than a carrot, etc. All of this hand-wringing over an objective measurement of the content of a project’s current level of activity. (Epilogue: they ended up making it a positive-sounding tag: “team:single-vendor”, and no tears were shed)

Having ineffective leadership like the TC has ripple effects throughout all of OpenStack. Each project is an island, and calls its own shots. So when two projects need to interact, they both see it from the perspective of “how will this affect me?” instead of “how will this improve OpenStack?”. This results in protracted discussions about interfaces and who will do what thing in what order. And when I say “protracted”, I don’t just mean weeks or months; some, such as the CyborgNova integration discussions, have dragged on for two years! I cannot imaging that happening in a world with an OpenStack BDFL. This inter-project friction slows down development of OpenStack as a whole, and in my opinion, contributes to developer dissatisfaction.

So what would OpenStack have been like if it had had a BDFL? Of course, that would depend entirely on the individual, but I can say this: it would have flamed out very quickly with a poor BDFL, or it would be a much better product with a much higher adoption with a good one. Back in 2013 I had predicted that OpenStack would eventually rival the commercial clouds in much the same manner that Linux now dominates the internet over proprietary operating systems. In the early days of the internet, the ability for people to download and play with free software such as the LAMP stack enabled people with big ideas but small budgets to turn those ideas into reality. OpenStack began in the early days of cloud computing, and it seemed logical that having a freely-available alternative to the commercial clouds might likewise result in new cloud-native creations becoming reality. It was a believable prediction, but I missed the effect that a lack of coordination from above would have on OpenStack achieving the potential to fill that role.

By the way, many people point to Linux and its BDFL, Linus Torvalds, as the argument against having a BDFL, as Linus has repeatedly behaved as an offensive ass towards others when he didn’t like their ideas. But ass or not, Linux succeeded because of having that single opinion consistently shaping its development. Most BDFLs, though, are not insufferable asses, and their projects are better off as a result.

OpenStack PTG, Denver 2019

PTG Denver 2019 Logo

Immediately following the Open Infrastructure Summit in Denver was the 3-day Project Teams Gathering (PTG). This was the first time that these two events were scheduled back-to-back. It was in response to some members of the community complaining that traveling to 4 separate events a year (2 Summits, 2 PTGs) was both too expensive and too tiring. The idea was that now you would only have to travel twice a year.

Now that I’ve experienced these back-to-back events, I think that this was a giant step backwards. Let me explain why.

First, it was exhausting! Being in rooms with lots of people for days on end is very draining for those of us who are introverts. Sure, we can be outgoing and interact with people, but it takes a toll, and downtime is necessary to recharge the psychological batteries. At several points I found myself faced with attending a session or finding an empty room to work on stuff by myself, and the latter often won out.

Second, the main idea of the PTG was to take the midcycle get-togethers that many teams had been doing, and formalize a single place for them to meet. The feeling was that having these teams in the same place would spur cross-project discussions, and that definitely was the case. But now that teams will only be getting together every 6 months, we’re back to the situation we were in before the PTGs were created: many teams will need a mid-cycle meeting to ensure that everyone is on-track to complete the goals for that release cycle.

Third, being away from home for an entire week is too long. OK, maybe I’m just getting old, but I really do like being home. One of the nice things about traveling to conferences is tacking on a few extra days to explore the area. For example, after last year’s PTG in Denver, my wife flew out to join me, and we spent a long weekend in Rocky Mountain National Park and other nearby natural areas. But after a solid week of stuff, I couldn’t wait to go home.

Fourth, many people time their return travel so that they miss the last day (or part of it). My unscientific observation was that attendance on the last day of this PTG showed a more dramatic drop than in previous PTGs. I think that’s because it doesn’t seem as severe to miss one day out of 6 than to miss one day out of 3.

As is the tradition at PTGs, there was a feedback session at lunch on the second day, and a lot of the feedback was in line with my observations. Of course, there were a lot of people who liked the format, and for the exact opposite reasons! Goes to show you can’t please everyone.

As for the sessions, the API-SIG was scheduled in a room for Thursday morning. I hung out there, and a few people did come in, but I think we had covered all of the outstanding issues at the BoF session on Tuesday. So I got to spend a lot of the morning hacking on Neo4j, and was able to implement a lot of the functionality that is missing in Placement: nested providers, shared providers, and quotas. I put together a series of Jupyter Notebooks that demonstrated all of these things working with just a small amount of code so that I could share with other people involved in Placement.

And then there was lunch! After 3 days of either going hungry or grabbing something nearby, it was so much nicer to sit down with people while eating lunch. Unfortunately, the box lunches provided seemed to have been kept at near-freezing temperatures until just before the lunch break, and almost too cold to eat. Still, I much preferred them to not having any lunch session at all, if for nothing else than being able to share a meal with other OpenStackers.

In the afternoon we had the Nova – Placement cross-project session, to which the Placement PTL, Chris Dent, brought some bottles of bubbly to celebrate the deletion of the Placement code base from Nova. That commit ended up getting delayed for one more day, but still, it was a milestone to celebrate.

The rest of the session was personally painful to sit through, as the topics revolved around the things that we have been fighting to implement for over 2.5 years: nested providers, shared providers, tree affinity, and other complex relationships among resources. It was painful because I just wanted to shout out “WE’RE USING THE WRONG TOOL!”, as these things naturally flowed from a graph database. I was able to get all of these things working in my spare time over the previous few days. I like to think that I’m a pretty smart guy, but I’m not THAT smart. It’s just because the tool fits the problem domain.

Nested Provider demo
Jupyter notebook showing a section of the Nested Provider demo. It’s a little hard to see, but the two results show that there two possible solutions, both starting with the ComputeNode named ‘balanced_testnode’. Each solution shows that the requested resources both came from the same NUMA node. This is one of the things that comes naturally with a graph DB that is really, really hard in a SQL DB.

I spent that evening working to finish up my Neo4j examples, as I had asked several key placement contributors to take a few minutes to sit down with me so that I could show them what I had done. On Friday morning I showed my graph work to several people, and while each reaction was different, there was a definite flow from skepticism to curiosity and then (for some) to agreement. One of the people to whom I especially wanted to show this was Jay Pipes, whom I had mentioned in my earlier experiments with graph DBs. He had already seen the potential after those blog posts, but he was concerned with developers having to learn some new, cryptic language in order to implement this. However, after about 10 minutes of my demos, I showed him the query I was currently working on that wasn’t quite right. He looked it over, made a suggestion, and when I ran it, it worked correctly! So I think that if he could get a working knowledge after just 10 minutes of seeing the Cypher Query Language, it won’t be hard for other devs to pick it up.

Later in the day we had a good discussion with the Ironic team about a need that they had for stand-alone (i.e, not running under Nova). In such situations, they wanted to use the full resource amounts in placement, as opposed to the current approach used in Nova, which is to represent an Ironic node as an inventory of 1 thing. The issue with representing a baremetal server as, say, 500GB of disk and 16 CPUs is that it may occasionally be selected from a request for 250GB and 8CPU. Since each server cannot be shared, we needed to figure out a way to fully consume the resources on the machine when it was selected, even if the request was for a lower amount. Several ideas were floated and discussed, all with varying degrees of messiness. We finally settled on adding a new API endpoint that would accept a Resource Provider, and allocate all of its resources so that it would no longer be available to any other request.

Hallway Sign

On Saturday morning we started with the Cyborg-Nova cross-project session, at which we could finally see a demonstration of Cyborg in action! I had thought that the Summit sessions would have been much more useful if the demo had been shown then, so that we could have something concrete to discuss. I was glad to see that Cyborg is working and handling accelerators after a few years of planning and design, and I look forward to making further progress integrating it with Nova and Placement.

There were a few discussions in the afternoon that had to do with representing nested resources and their relationships. Once again, it was difficult to listen to these attempts to represent complex relationships in a SQL DB, when I had just demonstrated how simple it was in a graph DB. It was indeed telling that the subject was entitled “Implementing Nested Magic” – getting this working in SQL does seem to require supernatural powers!

I had to leave around 3pm to get to the airport, so I missed anything after that. But most people seemed to have left by then anyway. It had been a long week, and I was burnt out. I also missed being home with my wife, sleeping in my own bed, working at my own desk, and eating my own food. I sincerely hope that the Foundation reconsiders this back-to-back setup. I realize that they are trying to save money wherever possible, but this just wasn’t worth it.