openstack – Walking Contradiction

Open Infrastructure Summit, Denver 2019

The first ever Open Infrastructure Summit was held in the last week of April 2019 at the Colorado Convention Center in Denver, CO. It’s the first since the re-branding from OpenStack to Open Infrastructure began last year to be officially held with the new name. Otherwise, it felt just like the OpenStack summits of old.

The keynotes were better than in prior summits – I think the sponsors got the feedback that no one was interested in sitting through a recap of “how they did X with OpenStack”, and instead focused more on what they intended to do with it. There was a great demo by Chris Hoge and Julia Kreger that showed a kubernetes operator managing a bare metal infrastructure; it showed very clearly that the typical media message around “Kubernetes is replacing OpenStack” is silly. They exist in different problem spaces, and work well together. The only place Kubernetes is replacing OpenStack is in the hype cycle.

After the keynotes I went to the Nova Project Update session. It was very thorough, but felt more like someone reading release notes out loud. I had hoped for more of a discussion about the thinking that went into some of the things that were worked on or are being planned rather than just a straight recitation.

After that was lunch – sort of. For the first time since these summits began, lunch was not provided. Instead, you were supposed to go to one of the many restaurants in the area and buy your own lunch. However, since we had pretty poor weather—freezing temperatures, snow, and rain—walking around downtown Denver wasn’t what I felt like doing. Judging by how packed the restaurant in the hotel across the street was, a lot of other people felt the same way. I understand that times are not as heady as in previous years when OpenStack was the latest hotness, but this seemed like a poor place to cut back. I always enjoyed sharing a table with a bunch of other OpenStackers and learning about where they were from and what they were doing with OpenStack. Going out to lunch meant that people tended to stay with groups they already knew. The afternoon snacks were also gone, which is no big deal for me, but others mentioned to me that they missed having them. Finally, they didn’t have a signature piece of conference swag. I’m typing this wearing the OpenStack hoodie I got back in the Paris 2014 summit, and have my sweatshirt from Tokyo 2015 in my room. Well, OK, they did give out a pair of socks, but they weren’t tied to the event. It’s not a huge thing, but not having something this time really makes things feel… different. And not in a good way.

There weren’t any sessions in the afternoon that I really wanted to go to, so instead I worked on two OpenStack-related projects: etcd-compute and using Graph Databases, such as Neo4j, to hold information for the Placement service. I have previously written about my work with both of these. And since the author of etcd-compute, Chris Dent, was also here at the summit, it was a perfect time to work on it together, so I set up several VMs for us to “play with”.

Monday evening after the sessions was the “Marketplace Mixer”, which is a way to get the attendees to visit the vendor area. They provided food and beverages, and I had my badge scanned several times in exchange for some local craft beer. There wasn’t a lot offered by the vendors that would be useful to me, but I did run into a lot of people I knew. When you’re in your 10th year of working on OpenStack, you get to know quite a few people!

On Tuesday I started with a session on Nova-Cyborg integration. Or at least that was what it was advertised as. It turned out to be more of an “Introduction to Cyborg Concepts” talk, rather than focusing on where the two projects needed to integrate.

cyborg-nova — The crowd at the Cyborg-Nova integration session

Later on was the API-SIG BoF (Birds of a Feather) session that I headed up. There hadn’t been much traffic in the SIG ahead of the summit, so I was happily surprised when several people showed up. We ended up having a good discussion on a variety of API-related topics, and I got to meet several of the people who have joined in some of the more recent IRC discussions and Office Hours who previously I had only known by their IRC handles. It’s always nice to put a face to a name.

In the afternoon was a session to update everyone on the process of extracting Placement from Nova. In the past this has been a somewhat heated topic, but this time everyone seemed to understand where things were and were pretty cool with it. There weren’t any long discussions, so the session finished early. I guess that’s a very good sign that we handed that process well.

The final session of the afternoon was to discuss what the various SIGs (Special Interest Groups) and WGs (Working Groups) needed to be successful. Since the API-SIG has been around for many years, we didn’t really have any needs along these lines. Sure, it would be great to get more people involved, but it isn’t critical. Some of the newer groups explored ways of getting the word out about their existence, which is always a problem. There is so much going on in the OpenStack world that getting people to pay attention to yet another thing is always challenging.

That evening was the Open Infrastructure party, sponsored by Trilio, Mirantis, Red Hat, Open Telekom Cloud, & AVI Networks. It was held in The Church Nightclub, which is an old church that has been converted to a nightclub. There was an open bar and food available, and they had a band playing for entertainment. The location was fun, but being indoors with loud music meant that there was only so much conversation you could have. Still, it was fun!

church niteclub — A view from higher up that shows how an old church was converted into a niteclub. You can see the some of the band playing at the very bottom.

There weren’t any talks on Wednesday morning that I really wanted to attend, so I spent most of the morning in the designated hacking room working on the etcd-compute project for a while, and then on implementing many of the features that are currently lacking in Placement in my graph database code. I managed to implement passing a tree structure to represent nested resource providers so that it creates the corresponding nodes and relationships in the database. This implementation is becoming more and more complete, and I hope when I show it to others this week that they are able to get out of their MySQL comfort zone and see how much better this approach is for representing resources.

I went to lunch with some of the members of my team at IBM who were at the Summit, along with some people from Red Hat with whom we are working to ensure that their various offerings run as well on Power hardware as on x86. So while the pizza was tasty, it was definitely a working lunch. It was also great to meet some of the people I had only known online before.

The Red Hat – IBM lunch *after* the food had been eaten.

After lunch was a session focused on the gaps between Nova functionality and what has been implemented in OpenStack Client. Most of the missing functionality is concerned with supporting new microversions, and this support is several years behind. I’m not sure how effective the discussions were, since what is really needed is for people to take ownership of some of the needed tasks, and I didn’t hear a lot of that happening.

After that I went to the Cyborg Project Update. Once again, it probably would have been much more useful to anyone who hadn’t been following along with the project, so while I didn’t get much from it, there was a lot of information presented on the current state and future plans for Cyborg.

And that was it! The end of another Summit, even if it was the first. That evening I met my sister for dinner. She lives in the Denver area, and it was great to catch up with her and spend some time relaxing after 3 long days. But the relaxation will be short-lived, as the Train PTG starts first thing tomorrow morning!

Geri & Ed — Selfie with my big sister Geri

OpenStack Nova Mid-cycle Meetup, Day 1

I’m here in Palo Alto, California, for the mid-cycle meetup of the OpenStack Nova team. For those of you unfamiliar with the concept, the OpenStack community worldwide gets together every 6 months at a Summit to collectively celebrate what we’ve accomplished, and to plan what we’ll be working on for the next 6 months. During the months that follow, though, it’s easy for things to slide off to the side, or for other things to creep up and get in the way of continued progress. So many of the programs that make up OpenStack plan on getting together about halfway through the process so that we all get an idea of the progress we’ve made, and can discuss and potentially solve any of the issues that would prevent us from completing the work we set out to do for this cycle.

For the Nova team, we set out several things as the priorities that we would be focusing on: the next generation of the Cells design (cells v2); the continued development of Nova Objects; cleaning up the interface between the Scheduler and Nova so that scheduler may eventually be split out; the v2.1 API (microversions); functional testing; nova-network migration; no downtime upgrades; as well as working on the number of bugs we have, and improving our testing infrastructure. The meeting today started with the people heading up each of those tasks giving an update on their progress.

First up was Cells v2. It’s moving along well, but not as fast as they would like. One of the big things was getting the CI testing working with cells, which currently cause most tests to fail. Progress has been made on disabling these tests for now, with the goal of fixing them so that our CI tests with cells on, which will be the standard once this work is complete. Cells are now a configurable option, and the tests now run with it off. By turning this back on, and adding the fixed tests in, we can eventually be confident that any new feature in Nova will work right away in a deployment using cells.

There has been good progress with the Objects work, but the biggest problem is that the first item to be objectified, Flavors, is a hairy mess, and required a bunch of changes to undo all the hacks that made flavors work in the past. Once completed it will bring a lot more sanity to flavors (which is a concept I believe should die in a fire, but I fought it years ago and lost, so we’re stuck with it now).

On the Scheduler front, we only had one outstanding spec (mine, of course!), and lots of code up for review. The series of patches to detach Service from Compute Node is the top priority, as so much of the later patches depend on these changes.

None of the principal movers on the v2.1 API was able to make the mid-cycle, but they did fill in some of their progress information on our shared etherpad. The testing integration is nearly done, but one possible problem is support for v2.1 in novaclient.

Functional testing is aiming to get a dozen or so test patterns defined that others can use as the basis for writing future functional tests. There probably won’t be much more than that in the Kilo timeframe, but the hope is that going forward these can help make funcitonal testing more pervasive.

There is a bunch of work being done for the nova-network to neutron migration, but one thing that everyone working on this wanted to make clear is that while they will be creating some tools to help deployers who want to make the switch, there will not be a single “click it and forget it” single-button migration in the near future. One other issue brought up is that while we are telling everyone who is deploying OpenStack to use Neutron and not nova-network, devstack still uses nova-network. This is poor dogfooding, so it was agreed that we will start to move devstack to use Neutron.

The zero-downtime migrations was interesting: the idea is that instead of running the current SQLAlchemy migrations which require taking the database offline, The new expand/contract approach will compare the defined structures in code with the current database, and if there is a discrepency, create the new structures (expand), migrate the data over, and then later remove the old, unneeded structures (contract). The first code patches to accomplish this have been working, although a lot of work remains to update the tests accordingly.

That was just the morning! The afternoon started with a whiteboard discussion I had asked for where we could identify just what we expect the interface between Nova and the (separated) Scheduler to look like. We did get into a little bit of implementation details at times, but overall we clarified the flow of messages between the two, and defined where the responsibility for ensuring that each build request succeeds should go. A lot of the discussion focused on how we can make the overall process bulletproof, which some saw as a tangent, but I think that this is what is needed: figure out what a solid, robust scheduling solution should look like, and though we aren’t going to get there in this cycle, or even the next, we can make sure that we’re moving towards that design.

The remainder of the day was largely focused on discussing process: how the Nova project is run. Was enough information communicated about what the priorities were? Were the various channels of communication being used well? How can we help the few Nova core reviewers handle the huge number of reviews more effectively? Everyone seemed to have their own preference (e.g., email vs. IRC), but no one had any concrete suggestions about what needs to change. It was pointed out that while the loads are high, they haven’t been getting worse, so there is some measure of stability.

I’m looking forward to Day 2, where we plan on breaking into smaller groups to focus on pushing through as many of the critical patches we can while we’re all in the same room. We’ll see how that goes!

Simplifying OpenStack

Recently I’ve returned to working on the OpenStack code base after a couple of years of working on related projects, and the sheer scope of the changes to the code, both in size and structure, is a bit overwhelming. I’ve also had to catch up with the current issues being discussed among the community, as a project that is growing the way OpenStack is will always have pain points. One such discussion that caught my attention concerns the very definition of what OpenStack is. The discussion not only addresses some of the experiences I’ve had returning to the world of OpenStack development, but it also feels like a continuation of the discussions we had when we were first shaping the project 4 years ago.

Sean Dague wrote an interesting take on this, envisioning OpenStack as a set of layers. The lowest layers provided the basic compute infrastructure, and the higher layers either built on this, or added additional capabilities. While I can see the basis for this division, it struck me as somewhat arbitrary; I mean, are Heat and Trove really that similar in either purpose or architecture?

Yesterday I read Monty Taylor’s blog post on this topic, and I think he has much more accurately captured the relationship between the various parts of OpenStack. His post is much more far-reaching than simply describing OpenStack; it’s really more concerned with how to help OpenStack move forward, and uses these relationships to make his case.

I think that working with the notion that there is a fundamental base for OpenStack (what Monty and Sean both refer to as ‘Layer #1‘), and that everything else is an add-on to this base, will have several positive results. First, it simplifies the internals of these projects. As Monty points out, “nova shouldn’t need a config file option pointing to where glance is, it should just be able to ask the keystone service catalog”. In other words, these Layer #1 projects can assume that they are all present, and don’t have to add needless complexity to handle these relationships. Glance and Keystone have to be there, or Nova can’t work – period.

Second, it streamlines the testing process considerably. The only tests for changes to any of these Layer #1 projects that need to be run are those for their internal workings, and those for their interfaces. This smaller, more manageable gate should relieve (but not eliminate) some of the testing bottlenecks OpenStack experiences today.

Speaking of interfaces, this is the third, and in my opinion, the most significant benefit of this concept: it forces the interfaces between the various projects to be defined clearly, and makes testing any potential change to these interfaces much cleaner. Let’s take the example of Trove: it started as a separate database-as-a-service built on top of Nova, but now is being consumed internally to handle Nova’s database needs. With the current “everything is OpenStack” approach, the lines between Nova and Trove might start to blur, and before long Trove and Nova would become too tightly bound to separate. By keeping the interface between the two clean and separate, it allows for the possibility that another DBaaS project may come along that performs better for Nova’s needs, and can be adopted without having to rip out any hard assumptions about Trove. Similarly, other DBaaS could be developed that are optimized for other use cases, and these can test their interfaces to assure that they can also be implemented as a drop-in replacement for Trove.

Monty identified yet another benefit: there will no longer be a “race” for teams to get officially designated as part of OpenStack, as this designation would go away. Instead, there would be a known set of interfaces that a project would have to support, and if there is more than one group with an idea for implementing a particular need, they can each develop their code without having to first get “blessed” as somehow the “official” OpenStack solution for this need. Developing to a known interface will allow the community to test these different solutions much more easily, and allow a superior solution arise, instead of having to settle for the solution that got there first.

Finally, I see a side benefit to this kind of simplicity of structure. Someone coming to OpenStack for the first time, the sheer number of projects is daunting. Never mind having to learn all the names and what each project does; it’s the feeling that OpenStack is too big to wrap your brain around that can potentially discourage new developers. A clear separation between the heart of OpenStack and the various peripheral projects would eliminate that confusion.

I’m greatly encouraged by these discussions, as it helps clarify the long-term direction of OpenStack. And I’m extremely happy to be part of it all once more.