Day 31: Using etcd As a Mediator

etcd is a database originally developed by CoreOS, and is most famously used as the database at the heart of Kubernetes. It is a distributed key-value store, which in itself is not all that remarkable. The thing about etcd that makes it so attractive is the ability to watch a key for changes.

Other key/value stores, such as Redis, have implemented a similar feature, and may work just as well for you. I’ve been using etcd for years, and it’s worked well for me, so I’ve never had a reason to try these other tools.

For most data stores, the only way to find out whether a particular value has changed is to poll: you issue a query for that value on a regular basis and compare it to the last value returned. This is terribly inefficient, especially for values that don’t change often. It’s also inexact with respect to time, because your system’s reaction to a changed value depends on the interval between polls. Longer intervals, while less chatty, mean that more time will elapse between when the value changes and when your application responds to that change.

Enter etcd. Instead of polling an etcd server for changes, you can watch for changes. This is essentially a pubsub system that requires almost no configuration to work. When a key is written to etcd, if there are any watchers for that key, a message is sent to them with the new value.

This is kind of dry in theory, so let’s look at a real-world application using this system: my photoviewer and photoserver applications. These applications allow me to display photographs on monitors that can be anywhere with an internet connection, and control each of those displays from a central server. They represent the ultimate convergence of my work as an artist and my love of programming.

Each display consists of a monitor (actually a TV, but all I want is an HDMI input) and a Raspberry Pi that runs the photoviewer application. Each display has a unique ID to identify it, and when a display starts up, it registers itself with the server. The server contains the settings for that display, such as the list of photos to display, and how often to change the displayed photo.

Photoviewer running in my kitchen

I have one such display in the kitchen of my home, and like to change the photos displayed on it from time to time. To do that, I go into my photoserver app and change the album for that display. Almost instantly the image on the display changes. How did that happen? The server is a virtual machine running in the Digital Ocean cloud, not local to the kitchen display.

The reason this works is that I’m also running an etcd server on another cloud instance. When I change any setting for a display, the photoserver app writes a new value for that display’s key. The key consists of the unique ID of the display plus the type of value being changed. For example, if I change the photos I want displayed for a display with the ID of 65febdde-3e8a-4c76-ab8f-d8a653e466c7, the server would write the list of image names as the value of the key /65febdde-3e8a-4c76-ab8f-d8a653e466c7:images.
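To make that concrete, here is a minimal sketch of what that server-side write might look like using the python-etcd3 client. The host name, the comma-joined value format, and the image names are placeholders of my own; the actual photoserver code is surely different.

```python
import etcd3

# Hypothetical etcd endpoint; the real server runs on a separate cloud instance.
client = etcd3.client(host="etcd.example.com", port=2379)

display_id = "65febdde-3e8a-4c76-ab8f-d8a653e466c7"
image_names = ["sunrise.jpg", "dunes.jpg", "eclipse.jpg"]  # hypothetical album contents

# Writing the key is all it takes: etcd notifies any watcher of this display's prefix.
client.put(f"/{display_id}:images", ",".join(image_names))
```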

The photoviewer application uses the etcd3 library to watch my etcd server for changes to any key beginning with /<unique ID>. The watch() method is called with a callback, and when a new value is written to a key beginning with that display’s prefix, the value is sent to the callback.

The callback method sees that the full key ends with :images, so it passes the value (the list of image names) to the photo display method, which then retrieves the image and displays it. This happens in real time, without any polling of the server needed.
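Here is a rough sketch of that watching side, again with python-etcd3. For simplicity I use the library's watch_prefix() iterator rather than a registered callback, and show_images() and set_interval() are hypothetical helpers standing in for photoviewer's real display logic.

```python
import etcd3

DISPLAY_ID = "65febdde-3e8a-4c76-ab8f-d8a653e466c7"  # this display's unique ID

client = etcd3.client(host="etcd.example.com", port=2379)  # assumed etcd endpoint

def handle_event(event):
    key = event.key.decode()      # e.g. "/<display-id>:images"
    value = event.value.decode()
    if key.endswith(":images"):
        show_images(value.split(","))   # hypothetical: update the displayed album
    elif key.endswith(":interval"):
        set_interval(int(value))        # hypothetical: change the rotation interval

# Block and react to every key written under this display's prefix -- no polling.
events_iterator, cancel = client.watch_prefix(f"/{DISPLAY_ID}")
for event in events_iterator:
    handle_event(event)
```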

The original version of these apps used the traditional polling method, which seemed wasteful, considering that it was typically weeks between any changes being made. Switching to an etcd watch makes much more sense from a design perspective, and it greatly simplified the code.

Look for cases in your applications where a response is needed to a change in data. Using etcd as a mediator might be a good approach.

Day 12: Communities and Survivorship Bias

Communities, especially Open Source communities, tend to form some form of governance once they grow beyond a certain size. The actual size isn’t as important as the relationship among the members: when everyone knows everyone else, there’s really no need for governance. But when individuals come from different companies, or otherwise have different interests from the others, there need to be some ground rules for making decisions about what does or does not get done. Without governance, projects will inevitably fork when these differences get large enough.

Typically governance is established by what most people involved like to think is a meritocracy: the hardest-working, most knowledgeable people are the ones who make the important decisions. At first glance this seems perfectly fair, and it usually is—initially, at least. Over time, though, this system is prone to the problems of Survivorship Bias. Let me illustrate how that happens.

Imagine a group of people who are on a long hike through the wilderness. There will be some people who have more skill reading a map, or operating a compass, or who know the terrain better. When the group starts out, it is only natural that these people lead the group, and they are given the title of Navigator. The group creates rules that while anyone can provide ideas as to what direction they should head in, only Navigators can make that choice. It works well for a while.

As time passes, though, and people in the group learn more about map reading and terrain features, their knowledge begins to approach the level of the existing Navigators. At that point it would seem fair to also designate these people as Navigators, since they now have enough knowledge to make directional decisions. But the rule is that the only way an existing group member can be designated a Navigator is if all of the existing Navigators agree. In other words, the process is largely subjective, as there is no objective test for competency. It also calls for a good deal of trust.

After a while, some people realize that while the Navigators have generally been doing their job well, they have made some errors that have taken the group off of the ideal path. Some group members point that out, and want to adjust course to get back to where they should have been. The Navigators, though, prefer to keep moving forward, even if it makes for a longer and more difficult hike in the long term; they prefer the feeling of moving ahead. Those who disagree go off on their own in frustration. Others within the group get the clear message that if they ever want to become a Navigator, they should curry favor with the existing Navigators. And when they do make it into that core group, they feel that they have worked hard to earn it, and anyone else who wants to reach that level has to play by the same rules that they did.

This is classic survivorship bias. The only people who can change the system are the ones who agreed with it in the first place, and thus don’t really see a problem with it. The voices of disagreement fade away until they can no longer be heard, so everyone thinks it’s all good. The system self-perpetuates.

I’ve seen this in action in several communities, but none so strikingly as in the OpenStack community, both on the Nova team as well as the Technical Committee that is supposed to provide technical leadership. I originally wrote a draft of this a year ago when I was working in that community, and became increasingly frustrated at how decisions were made. I happened to “run into” (electronically, of course) a few former Nova developers who had moved on, and when I expressed my frustration, they all said that similar feelings were why they had looked to move to a different project. That’s when the role of survivorship bias became clear to me.

As I’m no longer in the OpenStack community, I don’t need to vent about particular issues or personalities. That’s history to me. I do hope that people realize that survivorship bias can shape how a community views itself: if you are coming up short in some areas, you won’t hear about it, because the people affected usually leave rather than deal with that BS. If you care about growing a healthy community, you need to make it easy and welcoming for people to share their ideas. And when someone who was active decides to leave, you should take the time to do a sort of exit interview. You might learn something important.

Moving On

It’s been a great run, but my days in the OpenStack world are coming to an end. As some of you know already, I have accepted an offer to work for DataRobot. I only know bits and pieces of what I will be working on there, but one thing’s for sure: it won’t be on OpenStack. And that’s OK with me, as I’ve been working on OpenStack in one form or another for 10 years now.

Wait a moment, you say – OpenStack is only 9 years old! Well, before the OpenStack project was started, I worked on Swift briefly when it was an internal, proprietary project at Rackspace. After that I switched to the Cloud Servers team, which was the team that started Nova with NASA. So yeah, it’s been a full decade. That’s a loooonnnnggg time to be on any development project!

The feelings of burnout, combined with IBM’s shift away from OpenStack, made moving to DataRobot a very attractive option. And after doing several video interviews with the people there and getting their impressions of life at DataRobot, I’m that much more excited to be joining that team. I’m sure that for the first few months it will be like drinking from the proverbial fire hose, and that’s perfectly fine by me. It’s been much too long since I’ve pushed the reset button for my career.

Over these past 10 years I have made many professional contacts, some of whom I consider true friends. I will miss the OpenStack community, and I hope to run into many of you at future tech events – PyCon, anyone?

Why OpenStack Failed, or How I Came to Love the Idea of a BDFL

OK, so the title of this is a bit clickbait-y, but let me explain. By some measures, OpenStack is a tremendous success, being used to power several public clouds and many well-known businesses. But it has failed to become a powerful player in the cloud space, and I believe the reason is not technical in nature, but a lack of leadership.

OpenStack began as a collaboration between Rackspace, a commercial, for-profit business, and a consulting group working for NASA. While there were several companies involved in the beginning, Rackspace dominated by sheer numbers. This dominance was a concern to many companies – why should they contribute their time and resources to a project that might only benefit Rackspace? This fear was not entirely unfounded, as the OpenStack API was initially created to match Rackspace’s legacy cloud API, and much of the early naming of things matched Rackspace’s terminology – I mean, who ever thought of referring to virtual machines as “servers”? But that matched the “Cloud Servers” branding that Rackspace used for its cloud offering, and that name, as well as the use of “flavor” for instance sizing, persist today. The early governance was democratic, but when one company has many more votes than the others…

The executives at Rackspace were aware of this concern, and quickly created the OpenStack Foundation, which would be an independent entity that would own the intellectual property, helping to guarantee that one commercial company would not control the destiny of OpenStack. More subtly, though, it also engendered a deep distrust of any sort of top-down control over the direction of the software development. Each project within OpenStack was free to pretty much do things however they wanted, as long as they remained within the bounds of the Four Opens: Open Source, Open Design, Open Development, and Open Community.

That sounds pretty good, right? I mean, who needs someone imposing their opinions on you?

Well, it turns out that OpenStack needed that. For those who don’t know the term “BDFL”, it is an acronym for “Benevolent Dictator For Life”. It means that the software created under a BDFL is opinionated, but it is also consistently opinionated. A benevolent dictator listens to the various voices asking for features or designing an API, and makes a decision based on the overall good of the project, not on things like favoring the corporate interests of big contributors, or the strong personalities that otherwise dominate design discussions. Can you imagine what AWS would be like if each group within it could just decide how they wanted to do things? The imposition of design from above assures AWS that each of its services can work easily with the others.

The closest thing to that in OpenStack is the Technical Committee (TC), which “is an elected group that represents the contributors to the open source project, and has oversight on all technical matters”. Despite the typical meaning of “oversight”, the TC is essentially a suggestion body, and has no real enforcement power. They can spend months agonizing over the wording of mission statements and community goals, but shy away from anything that might appear to be a directive that others must do. I don’t think the word “must” is in their vocabulary.

They also bend over backwards to avoid potentially offending anyone. Here is one example from my interactions with them: one of the things the TC does is “tag” projects, so that newcomers to OpenStack can get a better idea of how mature a particular project is, how stable it is, etc. One of the proposed tags was to warn potential users that a project was primarily being developed by a single company; the concern is that all it would take is one manager at that company deciding to re-assign their employees, and the project would be dead. This is a very valid concern for open source projects, and it was proposed that a tag named “team:diverse-affiliation-danger” be created to flag such projects. What followed was much back-and-forth on the review of the proposal, as well as in TC meetings, about how the tag name was negative and would hurt people’s feelings, how it would be seen as an attack against a project, how it was more of a stick than a carrot, etc. All of this hand-wringing over an objective measurement of a project’s current contributor base. (Epilogue: they ended up making it a positive-sounding tag, “team:single-vendor”, and no tears were shed.)

Having ineffective leadership like the TC has ripple effects throughout all of OpenStack. Each project is an island, and calls its own shots. So when two projects need to interact, each sees it from the perspective of “how will this affect me?” instead of “how will this improve OpenStack?”. This results in protracted discussions about interfaces and who will do what thing in what order. And when I say “protracted”, I don’t just mean weeks or months; some, such as the Cyborg-Nova integration discussions, have dragged on for two years! I cannot imagine that happening in a world with an OpenStack BDFL. This inter-project friction slows down development of OpenStack as a whole, and in my opinion, contributes to developer dissatisfaction.

So what would OpenStack have been like if it had had a BDFL? Of course, that would depend entirely on the individual, but I can say this: with a poor BDFL it would have flamed out very quickly, and with a good one it would be a much better product with much higher adoption. Back in 2013 I predicted that OpenStack would eventually rival the commercial clouds in much the same manner that Linux now dominates the internet over proprietary operating systems. In the early days of the internet, the ability to download and play with free software such as the LAMP stack enabled people with big ideas but small budgets to turn those ideas into reality. OpenStack began in the early days of cloud computing, and it seemed logical that having a freely-available alternative to the commercial clouds might likewise result in new cloud-native creations becoming reality. It was a believable prediction, but I missed the effect that a lack of coordination from above would have on OpenStack’s ability to fill that role.

By the way, many people point to Linux and its BDFL, Linus Torvalds, as the argument against having a BDFL, as Linus has repeatedly behaved as an offensive ass towards others when he didn’t like their ideas. But ass or not, Linux succeeded because of having that single opinion consistently shaping its development. Most BDFLs, though, are not insufferable asses, and their projects are better off as a result.

OpenStack PTG, Denver 2019

PTG Denver 2019 Logo

Immediately following the Open Infrastructure Summit in Denver was the 3-day Project Teams Gathering (PTG). This was the first time that these two events were scheduled back-to-back. It was in response to some members of the community complaining that traveling to 4 separate events a year (2 Summits, 2 PTGs) was both too expensive and too tiring. The idea was that now you would only have to travel twice a year.

Now that I’ve experienced these back-to-back events, I think that this was a giant step backwards. Let me explain why.

First, it was exhausting! Being in rooms with lots of people for days on end is very draining for those of us who are introverts. Sure, we can be outgoing and interact with people, but it takes a toll, and downtime is necessary to recharge the psychological batteries. At several points I found myself faced with attending a session or finding an empty room to work on stuff by myself, and the latter often won out.

Second, the main idea of the PTG was to take the midcycle get-togethers that many teams had been doing, and formalize a single place for them to meet. The feeling was that having these teams in the same place would spur cross-project discussions, and that definitely was the case. But now that teams will only be getting together every 6 months, we’re back to the situation we were in before the PTGs were created: many teams will need a mid-cycle meeting to ensure that everyone is on-track to complete the goals for that release cycle.

Third, being away from home for an entire week is too long. OK, maybe I’m just getting old, but I really do like being home. One of the nice things about traveling to conferences is tacking on a few extra days to explore the area. For example, after last year’s PTG in Denver, my wife flew out to join me, and we spent a long weekend in Rocky Mountain National Park and other nearby natural areas. But after a solid week of stuff, I couldn’t wait to go home.

Fourth, many people time their return travel so that they miss the last day (or part of it). My unscientific observation was that attendance on the last day of this PTG dropped more dramatically than at previous PTGs. I think that’s because it doesn’t seem as severe to miss one day out of 6 as to miss one day out of 3.

As is the tradition at PTGs, there was a feedback session at lunch on the second day, and a lot of the feedback was in line with my observations. Of course, there were a lot of people who liked the format, and for the exact opposite reasons! Goes to show you can’t please everyone.

As for the sessions, the API-SIG was scheduled in a room for Thursday morning. I hung out there, and a few people did come in, but I think we had covered all of the outstanding issues at the BoF session on Tuesday. So I got to spend a lot of the morning hacking on Neo4j, and was able to implement a lot of the functionality that is missing in Placement: nested providers, shared providers, and quotas. I put together a series of Jupyter Notebooks that demonstrated all of these things working with just a small amount of code so that I could share with other people involved in Placement.

And then there was lunch! After 3 days of either going hungry or grabbing something nearby, it was so much nicer to sit down with people while eating lunch. Unfortunately, the box lunches provided seemed to have been kept at near-freezing temperatures until just before the lunch break, and were almost too cold to eat. Still, I much preferred them to not having any lunch session at all, if for nothing else than being able to share a meal with other OpenStackers.

In the afternoon we had the Nova – Placement cross-project session, to which the Placement PTL, Chris Dent, brought some bottles of bubbly to celebrate the deletion of the Placement code base from Nova. That commit ended up getting delayed for one more day, but still, it was a milestone to celebrate.

The rest of the session was personally painful to sit through, as the topics revolved around the things that we have been fighting to implement for over 2.5 years: nested providers, shared providers, tree affinity, and other complex relationships among resources. It was painful because I just wanted to shout out “WE’RE USING THE WRONG TOOL!”, as these things naturally flowed from a graph database. I was able to get all of these things working in my spare time over the previous few days. I like to think that I’m a pretty smart guy, but I’m not THAT smart. It’s just because the tool fits the problem domain.

Nested Provider demo
Jupyter notebook showing a section of the Nested Provider demo. It’s a little hard to see, but the results show that there are two possible solutions, both starting with the ComputeNode named ‘balanced_testnode’. Each solution shows that the requested resources both came from the same NUMA node. This is one of the things that comes naturally with a graph DB but is really, really hard in a SQL DB.
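For the curious, here is a rough sketch of how a NUMA-affinity query like the one in that demo might look with the neo4j Python driver. The node labels, relationship types, and property names are my own guesses at a plausible schema, not the actual notebook code.

```python
from neo4j import GraphDatabase

# Assumed local Neo4j instance and credentials.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Find compute nodes where a single NUMA node can satisfy both the VCPU and the
# memory request -- i.e., the allocated resources have NUMA affinity.
QUERY = """
MATCH (cn:ComputeNode)-[:CONTAINS]->(numa:NUMANode),
      (numa)-[:PROVIDES]->(cpu:Inventory {resource_class: 'VCPU'}),
      (numa)-[:PROVIDES]->(ram:Inventory {resource_class: 'MEMORY_MB'})
WHERE cpu.available >= $vcpus AND ram.available >= $memory_mb
RETURN cn.name AS compute_node, numa.name AS numa_node
"""

with driver.session() as session:
    for record in session.run(QUERY, vcpus=8, memory_mb=4096):
        print(record["compute_node"], record["numa_node"])
```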

I spent that evening working to finish up my Neo4j examples, as I had asked several key Placement contributors to take a few minutes to sit down with me so that I could show them what I had done. On Friday morning I showed my graph work to several people, and while each reaction was different, there was a definite flow from skepticism to curiosity and then (for some) to agreement. One of the people to whom I especially wanted to show this was Jay Pipes, whom I had mentioned in my earlier blog posts about experimenting with graph DBs. He had already seen the potential after those posts, but he was concerned with developers having to learn some new, cryptic language in order to implement this. However, after about 10 minutes of my demos, I showed him the query I was currently working on that wasn’t quite right. He looked it over, made a suggestion, and when I ran it, it worked correctly! So I think that if he could get a working knowledge after just 10 minutes of seeing the Cypher Query Language, it won’t be hard for other devs to pick it up.

Later in the day we had a good discussion with the Ironic team about a need they had for standalone use (i.e., not running under Nova). In such situations, they wanted to use the full resource amounts in placement, as opposed to the current approach used in Nova, which is to represent an Ironic node as an inventory of 1 thing. The issue with representing a baremetal server as, say, 500GB of disk and 16 CPUs is that it may occasionally be selected to satisfy a request for only 250GB and 8 CPUs. Since each server cannot be shared, we needed to figure out a way to fully consume the resources on the machine when it was selected, even if the request was for a lower amount. Several ideas were floated and discussed, all with varying degrees of messiness. We finally settled on adding a new API endpoint that would accept a Resource Provider and allocate all of its resources so that it would no longer be available to any other request.

Hallway Sign

On Saturday morning we started with the Cyborg-Nova cross-project session, at which we could finally see a demonstration of Cyborg in action! I had thought that the Summit sessions would have been much more useful if the demo had been shown then, so that we could have something concrete to discuss. I was glad to see that Cyborg is working and handling accelerators after a few years of planning and design, and I look forward to making further progress integrating it with Nova and Placement.

There were a few discussions in the afternoon that had to do with representing nested resources and their relationships. Once again, it was difficult to listen to these attempts to represent complex relationships in a SQL DB when I had just demonstrated how simple it was in a graph DB. It was indeed telling that the session was entitled “Implementing Nested Magic” – getting this working in SQL does seem to require supernatural powers!

I had to leave around 3pm to get to the airport, so I missed anything after that. But most people seemed to have left by then anyway. It had been a long week, and I was burnt out. I also missed being home with my wife, sleeping in my own bed, working at my own desk, and eating my own food. I sincerely hope that the Foundation reconsiders this back-to-back setup. I realize that they are trying to save money wherever possible, but this just wasn’t worth it.