Cloud – Walking Contradiction

Day 52: Happy 10th Birthday, OpenStack!

I just saw this announcement from the OpenStack Foundation about OpenStack’s 10th birthday! Yes, 10 years ago this week was the first OpenStack Summit, in Austin, TX, with the public announcement the following week at O’Rielly OSCO N. Yet most people don’t know that I played a very critical role in the beginning!

OpenStack began as a joint venture between Rackspace (my employer at the time) and NASA. I was on the team at Rackspace that developed and supported its aging cloud compute services, and we were looking to develop something from scratch that could be much more scalable than our current system. Around that time Thierry Carrez saw an announcement from a group at NASA about their development of a compute virtualization system, and suggested to the powers that be at Rackspace that this might be a better way to go instead of developing the whole thing ourselves. From that followed a lot of discussion among the executives at Rackspace, as well as some conversations with NASA, and the conclusion was that we would team up. One of the first things to do was to get the developers for both groups together to discuss things from a more technical perspective. And this is where I believe that I made a critical decision that, had I chosen wrong, might have resulted in OpenStack never happening.

The NASA team was a consulting group, Anso Labs, and they were arriving in San Antonio, and we had plans to take them out to dinner, but no idea where. It was then that I suggested The Cove, a local place with lots of outdoor seating, a relaxed atmosphere, good beer, and delicious food. We had a great time that evening, and we all got to know each other. Had we been in a more conventional restaurant, people may have only gotten to know the people sitting next to them, but since The Cove is open seating, we moved around a lot, talking about both the technical stuff and personal stuff.

Over the next few days we began reviewing the code and exchanging idea on what needed to be developed next, and those discussions went very smoothly, getting a lot done in a short period of time. I still maintain to this day that without that first night having beer and food at The Cove, OpenStack might never have become the success that it did.

You’re welcome.

Day 44: Employed Once More!

After 3 1/2 months of unemployment, during which I submitted countless job applications, became a regular on LinkedIn, learned the routines of the Texas Unemployment Benefits system, and sat through numerous interviews, I’m excited to report that I have a new job!

In a couple of weeks I will be starting at Nvidia as a Senior Python Developer, working on the tools for their GPU cloud. I’ve met the other people on my team during the video interview process, and they all seem like a bright bunch, so I can’t wait to start working with them!

It’s been difficult these last few months. It started with the pandemic and subsequent lockdown, which has affected everyone. Then came the layoff, with DataRobot letting 25% of its workforce go, including yours truly. It really wasn’t much consolation that I was only 1 of the 40 million or so in the US who lost their job in those few weeks – it still hurt.

Still, I have had it better than most. My wife still had her job, which was super-important financially. We also had some savings, so we weren’t living paycheck-to-paycheck like so many Americans have to. And it did give me some free time to work on my photoviewer software, and practice my newly-discovered sport of disc golf. It also gave me the chance to perfect my sourdough bread technique (yeah, I know – how cliché!). But there is only so much to do when largely confined to the house.

Which is why I started this daily writing exercise. Not just to fill the time, but to get down some of the thoughts that have been in my head for a while, and polish my rusty writing skills. And while it’s been difficult to always find something to write about, I have noticed that writing itself is feeling more fluid.

I will continue this daily project until I start the job on July 20. After that, I will continue to write, but just not on a daily basis. Going through this exercise has helped me enjoy writing more, and improved my ability to let a piece out into the wild without first obsessing with endless editing. That is probably the best thing I’ve gotten out of it.

Day 31: Using etcd As a Mediator

etcd is a database originally developed by CoreOS, and is most famously used as the database at the heart of Kubernetes. It is a distributed key-value store, which in itself is not all that remarkable. The thing about etcd that makes it so attractive is the ability to watch a key for changes.

Other key/value stores, such as Redis, have implemented a similar feature, and may work just as well for you. I’ve been using etcd for years, and it’s worked well for me, so I’ve never had a reason to try these other tools.

For most data stores, the only way to find if a particular value has changed is to poll. You issue a query for that value on a regular basis, and compare it to the last value returned to see if it has changed. This is terribly inefficient, especially with values that don’t change often. It’s also inexact with respect time, because your system’s reaction to a changed value depends on the interval between polls. Longer intervals, while less chatty, mean that more time will elapse between when the value changes, and when your application responds to that change.

Enter etcd. Instead of polling an etcd server for changes, you can watch for changes. This is essentially a pubsub system that requires almost no configuration to work. When a key is written to etcd, if there are any watchers for that key, a message is sent to them with the new value.

This is kind of dry in theory, so let’s look at a real-world application using this system: my photoviewer and photoserver applications. These applications allow me to display photographs on monitors that can be anywhere with an internet connection, and control each of those displays from a central server. They represent the ultimate convergence of my work as an artist and my love of programming.

Each display consists of a monitor (actually a TV, but all I want is an HDMI input) and a Raspberry Pi that runs the photoviewer application. Each display has a unique ID to identify it, and when a display starts up, it registers itself with the server. The server contains the settings for that display, such as the list of photos to display, and how often to change the displayed photo.

I have one such display in the kitchen of my home, and like to change the photos displayed on it from time to time. To do that, I go into my photoserver app and change the album for that display. Almost instantly the image on the display changes. How did that happen? The server is a virtual machine running in the Digital Ocean cloud, not local to the kitchen display.

The reason this works is that I’m also running an etcd server on another cloud instance. When I change any setting for a display, the photoserver app writes a new value for that display’s key. The key consists of the unique ID of the display plus the type of value being changed. For example, if I change the photos I want displayed for a display with the ID of 65febdde-3e8a-4c76-ab8f-d8a653e466c7, the server would write a value of a list of the names of those images to the key /551a441f-8aba-44b5-b70b-349af0be5b67:images.

That application uses the etcd3 library to watch my etcd server for changes to any key beginning with /<unique ID>. The watch() method is called with a callback method, and when a new key is written beginning with that display’s prefix, the value is sent to the callback.

The callback method sees that the full key ends with :images, so it passes the value (the list of image names) to the photo display method, which then retrieves the image and displays it. This happens in real time, without any polling of the server needed.

The original version of these apps used the traditional polling method, which seemed wasteful, considering that it was typically weeks between any changes being made. Switching to an etcd watch makes much more sense from a design perspective, and it greatly simplified the code.

Look for cases in your applications where a response is needed to a change in data. Using etcd as a mediator might be a good approach.

Day 12: Communities and Survivorship Bias

Communities, especially Open Source communities, tend to form some form of governance once they grow beyond a certain size. The actual size isn’t as important as the relationship among the members: when everyone knows everyone else, there’s really no need for governance. But when individuals come from different companies, or who otherwise may have different interests than the others, there needs to be some ground rules for making decisions on what does or does not get done. Without governance, projects will inevitably fork when these differences get large enough.

Typically governance is established by what most people involved like to think is a meritocracy: the hardest-working, most knowledgeable people are the ones who make the important decisions. At first glance this seems perfectly fair, and it usually is—initially, at least. Over time, though, this system is prone to the problems of Survivorship Bias. Let me illustrate how that happens.

Imagine a group of people who are on a long hike through the wilderness. There will be some people who have more skill reading a map, or operating a compass, or who know the terrain better. When the group starts out, it is only natural that these people lead the group, and they are given the title of Navigator. The group creates rules that while anyone can provide ideas as to what direction they should head in, only Navigators can make that choice. It works well for a while.

As time passes, though, and people in the group learn more about map reading and terrain features, their knowledge begins to approach the level of the existing Navigators. At that point it would seem fair to also designate these people as Navigators, since they now have enough knowledge to make directional decisions. But the rule is that the only way an existing group member can be designated a Navigator is if all of the existing Navigators agree. In other words, the process is largely subjective, as there is no objective test for competency. It also calls for a good deal of trust.

After a while, some people realize that while the Navigators have generally been doing their job well, they have made some errors that have taken the group off of the ideal path. Some group members point that out, and want to adjust course to get back to where they should have been. The Navigators, though, prefer to keep moving forward, even if it makes for a longer and more difficult hike in the long term; they prefer the feeling of moving ahead. Those who disagree go off on their own in frustration. Others within the group get the clear message that if they ever want to become a Navigator, they should curry favor with the existing Navigators. And when they do make it into that core group, they feel that they have worked hard to earn it, and anyone else who wants to reach that level has to play by the same rules that they did.

This is classic survivorship bias. The only people who can change the system are the ones who agreed with it in the first place, and thus don’t really see a problem with it. The voices of disagreement fade away until they can no longer be heard, so everyone thinks it’s all good. The system self-perpetuates.

I’ve seen this in action in several communities, but none so strikingly as in the OpenStack community, both on the Nova team as well as the Technical Committee that is supposed to provide technical leadership. I originally wrote a draft of this a year ago when I was working in that community, and became increasingly frustrated at how decisions were made. I happened to “run into” (electronically, of course) a few former Nova developers who had moved on, and when I expressed my frustration, they both said that similar feelings were why they looked to move to a different project. That’s when the role of survivorship bias became clear to me.

As I’m no longer in the OpenStack community, I don’t need to vent about particular issues or personalities. That’s history to me. I do hope that people realize that survivorship bias can shape how a community views itself, because if you are coming up short in some areas, you won’t know about it, because the people affected usually leave rather than deal with that BS. If you care about growing a healthy community, you need to make it easy and welcoming for people to share their ideas. And you should also take the time when someone who was active decides to leave to do a sort of exit interview. You might learn something important.

Moving On

It’s been a great run, but my days in the OpenStack world are coming to an end. As some of you know already, I have accepted an offer to work for DataRobot. I only know bits and pieces of what I will be working on there, but one thing’s for sure: it won’t be on OpenStack. And that’s OK with me, as I’ve been working on OpenStack in one form or another for 10 years now.

Wait a moment, you say – OpenStack is only 9 years old! Well, before the OpenStack project was started, I worked on Swift briefly when it was an internal, proprietary project at Rackspace. After that I switched to the Cloud Servers team, which was the team that started Nova with NASA. So yeah, it’s been a full decade. That’s a loooonnnnggg time to be on any development project!

So the feelings of burnout combined with the shift away from OpenStack within IBM made moving to DataRobot a very attractive option. And after having done several video interviews with the people there and getting their impression of life at DataRobot, I’m that much more excited to be joining that team. I’m sure that for the first few months it will be like drinking from the proverbial fire hose, and that’s perfectly fine by me. It’s been much too long since I’ve pushed the reset button for my career.

Over these past 10 years I have made many professional contacts, some of whom I consider true friends. I will miss the OpenStack community, and I hope to run into many of you at future tech events – PyCon, anyone?