Queens PTG Recap

Last week was the second-ever OpenStack Project Teams Gathering, or PTG. It’s still an awkward name for a very productive conference.

PTG logo

This time the PTG was held in Denver, Colorado, at a hotel several miles outside of downtown Denver.

Downtown Denver, as seen from the PTG hotel. We were about 8 miles away.

It was clear that the organizers from the OpenStack Foundation took the comments from the attendees of the first PTG in Atlanta to heart, as it seemed that none of the annoyances from Atlanta were an issue: there was no loud air conditioning, and the rooms were much less echo-y. The food was also a lot better!

On Friday, the lunch offering featured a custom Mac & Cheese station, where you could select from shrimp, ham, or chicken, and then add your choice of cheeses.

As in Atlanta, Monday and Tuesday were set aside for cross-project sessions, with team sessions on Wednesday–Friday. Most of the first two days was taken up by the API-SIG discussions. There was a lot to talk about, and we managed to cover most of it. One main focus was how to expand our outreach to various groups, now that we have transitioned from a Working Group (WG) to a Special Interest Group (SIG). That may sound like a simple name change, but it represents the shift in direction from being only API developer-focused to reaching out to SDK developers and users.

For the API-SIG discussions, the arrangement of tables spread us too far apart, so we took matters into our own hands

We discussed several issues that had been identified ahead of time. The first was the format for single resources. The format for multiple resources has not been contentious; it looks like:

{"resource_name": [{resource}, {resource},... {resource}]}

In plain English, that is a dictionary with the resource type/name as the key and a list of the returned resources as the value. But for a single resource, there are several possibilities:

# Singular resource
{resource}

# One-element list
[{resource}]

# Dictionary keyed by resource name, single value
{"resource_name": {resource}}

# Dictionary keyed by resource name, list of one value
{"resource_name": [{resource}]}

None of these stood out as a clear winner, as we could come up with pros and cons for each. When that happens, we make consistency with the rest of OpenStack a priority, so elmiko agreed to survey the code base to get some numbers. If there is a clear preference within OpenStack, we can make that the recommended form.
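To make the four candidates a little more concrete, here is a rough Python sketch that builds each shape for one resource. The resource itself and its field names are made up for illustration; nothing here reflects what any particular OpenStack service actually returns.

```python
import json

# A hypothetical resource; the field names are invented for this example.
server = {"id": "abc123", "name": "web-1"}


def single_resource_candidates(resource_name, resource):
    """Return the four candidate response bodies for a single resource."""
    return {
        "singular": resource,
        "one_element_list": [resource],
        "keyed_single": {resource_name: resource},
        "keyed_list": {resource_name: [resource]},
    }


candidates = single_resource_candidates("server", server)
for label, body in candidates.items():
    # Print each shape as the JSON a client would see on the wire.
    print(f"{label}: {json.dumps(body)}")
```

Only the last shape matches the multiple-resource format exactly, which is one of the pros we weighed for it.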

Next was a very quick discussion of the microversion-parse library, and whether we should recommend it as an “official” tool for projects to use (we did). This would mean that the API-SIG would be undertaking ownership of the library, but as it’s very simple, this was not felt to be a significant burden.
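For anyone who hasn't looked at what such a tool has to do, here is a minimal hand-rolled sketch of pulling a microversion out of the OpenStack-API-Version header. To be clear, this is not the library's code or its API; the function name is my own, and the "last matching entry wins" behavior is simply a choice made for this sketch.

```python
def parse_microversion(headers, service_type):
    """Return (major, minor) requested for service_type, or None."""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("openstack-api-version")
    if raw is None:
        return None
    found = None
    # Assume several services may be listed, separated by commas.
    for entry in raw.split(","):
        parts = entry.split()
        if len(parts) == 2 and parts[0] == service_type:
            found = parts[1]  # sketch assumption: the last match wins
    if found is None:
        return None
    major, minor = found.split(".")
    return int(major), int(minor)


print(parse_microversion({"OpenStack-API-Version": "compute 2.53"}, "compute"))
```

A real implementation also has to deal with things like "latest" and older project-specific headers, which this sketch ignores.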

We moved on to the topic of API testing tools. This idea had come up in the past: create a tool that would check how well an API conformed to the guidelines. We agreed once again that that would be a huge effort with very little practical benefit, and that we would not entertain that idea again.

Next up were some people from the Ironic team who had questions about what we would recommend for an API call that was expected to take a long time to complete. Blocking while the call completes could take several minutes, so that was not a good option. The two main options were to use a GET with an “action” as the resource, or POST with the action in the body. Using GET for this doesn’t fit well with RESTful principles, so POST was really the only option, as it is semantically fluid. The response should be a 202 Accepted, and contain the URI that can be called with GET to determine the status of the request. The Ironic team agreed to write up a more detailed description of their use case, which the API-SIG could then use as the base for an example of a guided review discussion.
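The flow we recommended looks roughly like the sketch below, which fakes the service with an in-memory dictionary. The paths, function names, and state strings are all hypothetical, and putting the status URI in a Location header (as well as in the body) is my assumption, not something we settled on in the room.

```python
import uuid

# In-memory stand-in for the service's task store (hypothetical).
_tasks = {}


def post_action(node_id, action):
    """Handle POST with the action in the body: start work, return at once."""
    task_id = str(uuid.uuid4())
    _tasks[task_id] = {"node": node_id, "action": action, "state": "running"}
    status_uri = f"/v1/tasks/{task_id}"
    # 202 Accepted: the request was accepted but has not completed yet.
    return 202, {"Location": status_uri}, {"status": status_uri}


def get_status(status_uri):
    """Handle GET on the status URI handed back by post_action."""
    task = _tasks.get(status_uri.rsplit("/", 1)[-1])
    if task is None:
        return 404, {}
    return 200, {"state": task["state"]}


def finish(status_uri):
    """Simulate the long-running work completing in the background."""
    _tasks[status_uri.rsplit("/", 1)[-1]]["state"] = "complete"


code, headers, body = post_action("node-1", "clean")
print(code, get_status(headers["Location"]))
finish(headers["Location"])
print(get_status(headers["Location"]))
```

The point is that the client never blocks: it gets an answer immediately and polls the status URI at its own pace.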

Another topic that got a lot of discussion was Capabilities. This term is used in many contexts, so we were sure to distinguish among them.

  • What is this cloud capable of doing?
  • What actions are possible for this particular resource?
  • What actions are possible for this particular authenticated user?

We focused on the first type of capability, as it is important for cloud interoperability. There are ways to determine these things, but they might require a dozen API calls to get the information needed. There already is a proposal for creating a static file for clouds, so perhaps this can be expanded to cover all the capabilities that may be of interest to consumers of multiple clouds. This sort of root document would be very static and thus highly cacheable.

For the latter two types of capabilities, it was felt that there was no alternative to making the calls as needed. For example, a user might be able to create an instance of a certain size one minute, but a little later they would not because they’ve exceeded their quota. So for user interfaces such as Horizon, where, say, a button in the UI might be disabled if the user cannot perform that action, there does not seem to be a good way to simplify things.

We spent a good deal of time with a few SDK authors about some of the issues they are having, and how the API-SIG can help. As someone who works on the API creation side of things but who has also created an SDK, these discussions were of particular interest. Since this topic is fairly recent, most of the time was spent getting a feel for the issues that may be of interest. There was some talk of creating SDK guidelines, similar to the API guidelines, but that doesn’t seem like the best way to go. APIs have to be consumed by all sorts of different applications, so consistency is important. SDKs, on the other hand, are consumed by developers for that particular language. The best advice is to make your SDK as idiomatic as possible for the language so that the developers using your SDK will find it as usable as the rest of the language.

After the sessions on Tuesday, there was a pleasant happy hour, with the refreshments sponsored by IBM. It gave everyone a chance to talk to each other, and I had several interesting conversations with people working on different parts of OpenStack.

The Tuesday happy hour featured beer and wine, courtesy of IBM!

Starting Wednesday I was in the Nova room for most of the time. The day started off with the Pike retrospective, where we ideally take a look at how things went during the last cycle, and identify the things that we could do better. This should then be used to help make the next cycle go more smoothly. The Nova team can certainly be pretty dysfunctional at times, and in past retrospectives people have tried to address that. But rather than help people understand the effects of their actions better, such comments were typically met by sheer defensiveness, and as a result none of the negative behaviors changed. So this time no one brought up the problems with personal interactions, and we settled on a vague “do shit earlier” motto. What this means is that some people felt that the spec process dragged on for much too long, and that we would be better off if we kept that short and started coding sooner. No process for cutting short the time spent on specs was discussed, though, so it isn’t clear how this will be carried out. The main advantage of coding sooner is that many of these changes will break existing behaviors, and it is better to find that out early in the cycle rather than just before freeze. The downside is that we may start down a particular path early, and due to shortening the spec process, not realize that it isn’t the right (or best) path until we have already written a bunch of code. This will most likely result in a sunk cost fallacy argument in favor of patching the code and taking on more technical debt. Let’s hope that I’m wrong about this.

We moved on to Cells V2. One of the top priorities is listing instances in a multi-cell deployment. One proposed solution was to have Searchlight monitor instance notifications from the cells, and aggregate that information so that the API layer could have access to all cell instance info. That approach was discarded in favor of doing cross-cell DB queries. Another priority was the addition of alternate build candidates being sent to the cell, so that after a request to build an instance is scheduled to a cell, the local cell conductor can retry a failed build without having to go back through the entire scheduling process. I’ve already got some code for doing this, and will be working on it in the coming weeks.
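The idea behind alternates can be sketched in a few lines. This is only an illustration of the retry loop, not the code I'm working on; the function names and the shape of the host list are made up.

```python
def build_with_alternates(hosts, try_build):
    """Try the selected host first, then each alternate, inside the cell.

    hosts: the selected host followed by its alternates, in order.
    try_build: callable returning True if the build succeeded on a host.
    """
    for host in hosts:
        if try_build(host):
            return host
    # Every candidate failed; only now would rescheduling be needed.
    return None


# Pretend the first host fails and the first alternate succeeds.
chosen = build_with_alternates(["host-1", "host-2", "host-3"],
                               lambda host: host != "host-1")
print(chosen)  # host-2
```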

In the afternoon we discussed Placement. One of the problems we uncovered late in the Pike cycle was that the Placement model we created didn’t properly handle migrations, as migrations involve resources from two separate hosts being “in use” at the same time for a single instance. While we got some quick fixes in Pike, we want to implement a better solution early in Queens. The plan is to add a migration UUID, and make that the consumer of the resources on the target provider. This will greatly simplify the accounting necessary to handle resources during migrations.
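Here is a toy model of why a separate consumer helps, following the plan as I described it above. The provider names, UUIDs, and helper functions are all invented for this sketch, and it is far simpler than Placement's real data model.

```python
from collections import defaultdict

# allocations[provider][consumer] = {resource_class: amount}
allocations = defaultdict(dict)


def allocate(provider, consumer, resources):
    """Record that consumer holds resources on provider."""
    allocations[provider][consumer] = dict(resources)


def usage(provider):
    """Total resources in use on provider, across all consumers."""
    totals = defaultdict(int)
    for resources in allocations[provider].values():
        for resource_class, amount in resources.items():
            totals[resource_class] += amount
    return dict(totals)


def release(consumer):
    """Drop every allocation held by consumer, on any provider."""
    for consumers in allocations.values():
        consumers.pop(consumer, None)


instance_uuid = "instance-1111"    # placeholder, not a real UUID
migration_uuid = "migration-2222"  # placeholder, not a real UUID
flavor = {"VCPU": 2, "MEMORY_MB": 4096}

allocate("source-host", instance_uuid, flavor)
# During the move, the migration is the consumer on the target.
allocate("target-host", migration_uuid, flavor)
print(usage("source-host"), usage("target-host"))

# If the migration fails, cleaning up is a single release that cannot
# touch the instance's own allocation on the source.
release(migration_uuid)
print(usage("source-host"), usage("target-host"))
```

With one consumer per host, neither side of the move has to be teased apart from a doubled-up allocation.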

We moved on to discuss the status of Traits. Traits are the qualitative part of resources, and we have continued to make progress in being able to select resource providers who have particular traits. There is also work being done to have the virt drivers report traits on things such as CPUs.
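Conceptually, selecting providers by trait is just a subset test. The sketch below uses a plain dictionary in place of Placement, and the provider names and the particular traits assigned to them are illustrative only.

```python
# Traits each provider exposes (illustrative data, not from a real cloud).
providers = {
    "compute-1": {"HW_CPU_X86_AVX2", "HW_CPU_X86_SSE42"},
    "compute-2": {"HW_CPU_X86_SSE42"},
}


def providers_with_traits(providers, required):
    """Return names of providers that have every required trait."""
    required = set(required)
    return sorted(name for name, traits in providers.items()
                  if required <= traits)


print(providers_with_traits(providers, ["HW_CPU_X86_AVX2"]))
```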

We moved on to the biggest subject in Placement: nested resource providers. Implementing this will enable us to model resources such as PCI devices that have a number of Physical Functions (PFs), each of which can supply a number of Virtual Functions (VFs). That much is easy enough to understand, but when you start linking particular VCPUs to particular NUMA nodes, it gets messy very quickly. So while we outlined several of these complex relationships during the session, we all agreed that completing all that was not realistic for Queens. We do want to keep those complex cases in mind, though, so that anything we do in Queens won’t have to be un-done in Rocky.
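The simple PF/VF case can be pictured as a small tree, with each PF as a child provider holding its own VF inventory. The class below is a sketch of that shape only; the names and inventory numbers are invented, and it says nothing about how the real implementation will store or query the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A resource provider that may have child providers."""
    name: str
    inventory: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)
        return child

    def total(self, resource_class):
        """Sum resource_class over this provider and its whole subtree."""
        own = self.inventory.get(resource_class, 0)
        return own + sum(c.total(resource_class) for c in self.children)


host = Provider("compute-1", {"VCPU": 16, "MEMORY_MB": 65536})
pf0 = host.add_child(Provider("pf0", {"SRIOV_NET_VF": 8}))
pf1 = host.add_child(Provider("pf1", {"SRIOV_NET_VF": 4}))

print(host.total("SRIOV_NET_VF"))  # 12
```

The messy part is everything this leaves out: a request that needs a VF from a specific PF, or VCPUs tied to a specific NUMA node, can't be answered by a simple sum over the tree.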

We briefly touched on the question of when we would separate Placement out into its own service. This has been the plan from the beginning, and once again we decided to punt this to a future cycle. That’s too bad, as keeping it as part of Nova is beginning to blur the boundaries of things a bit. But it’s not super-critical, so…

We then moved on to discuss Ironic, and the discussion centered mainly on the changes in how an Ironic node is represented in Placement. To recap, we used to use a hack that pretended that an Ironic node, which must be consumed as a single unit, was a type of VM, so that the existing paradigm of selection based on CPU/RAM/disk would work. So in Ocata we started allowing operators to configure a node’s resource_class attribute; all nodes having the same physical hardware would be the same class, and there would always be an inventory of 1 for each node. Flavors were modified in Pike to accept an Ironic custom resource class or the old VM-ish method of selection, but in Queens, Ironic nodes will only be selected based on this class. This has been a request from operators of large Ironic deployments for some time, and we’re close to realizing this goal. But, of course, not everyone is happy about this. There are some operators who want to be able to select nodes based on “fuzzy” criteria, like they were able to in the “old days”. Their use cases were put forth, but they weren’t considered compelling enough. You can’t just consume 2 GPUs on a 4-GPU node: you must consume them all. There may be ways to accomplish what these operators want using traits, but in order to determine that, they will have to detail their use cases much more completely.
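A rough sketch of class-based selection follows. It assumes the node's resource_class is turned into a custom resource class by upper-casing it, replacing punctuation with underscores, and adding a CUSTOM_ prefix; the node names and class names are made up, and the helper functions are mine.

```python
import re


def custom_class(node_resource_class):
    """Normalize a node's resource_class into a CUSTOM_ class name."""
    normalized = re.sub(r"[^A-Z0-9_]", "_", node_resource_class.upper())
    return "CUSTOM_" + normalized


# Hypothetical nodes and the resource_class an operator gave each one.
nodes = {
    "node-a": "baremetal-gold",
    "node-b": "baremetal-gold",
    "node-c": "baremetal-silver",
}

# Each node is consumed whole, so its inventory is exactly 1 of its class.
inventories = {node: {custom_class(rc): 1} for node, rc in nodes.items()}


def candidates(requested_class, consumed):
    """Nodes of the requested class that nobody has consumed yet."""
    return sorted(node for node, inventory in inventories.items()
                  if inventory.get(requested_class, 0) == 1
                  and node not in consumed)


print(candidates("CUSTOM_BAREMETAL_GOLD", consumed={"node-a"}))
```

There is no partial match to reason about: a flavor asks for one unit of a class, and a node either is that class and free, or it isn't.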

Thursday began with a Nova-Cinder discussion, which I confess I did not pay a lot of attention to, except for the parts about evolving and maintaining the API between the two. The afternoon was focused on Nova-Neutron, with a lot of discussion about improving the interaction between the two services during instance migration. There was some discussion about bandwidth-based scheduling, but as this depends on Placement getting nested resource providers done, it was agreed that we would hold off on that for now.

We wrapped up Thursday with another deep-dive into Placement; this time focusing on Generic Device Management, which has as its goal to be able to model all devices, not just PCI devices, as being attached to instances. This would involve the virt driver being able to report all such devices to the placement service in such a way as to correctly model any sort of nested relationships, and determine the inventory for each such item. Things began to get pretty specific, from the “I need a GPU” to “I need a particular GPU on a particular host”, which, in my opinion, is a cloud anti-pattern. One thing that stuck out for me was the request to be able to ask for multiple things of the same class, but each having a different trait. While this is certainly possible, it wasn’t one of the use cases considered when creating the queries that make placement work, and will require some more thought. There was much more discussed, and I think I wasn’t the only one whose brain was hurting afterwards. If you’re interested, you can read the notes from the session.

Friday was reserved for all the things that didn’t fit into one of the big topics covered on Wednesday or Thursday. You can see the variety of things covered on this etherpad, starting around line 189. We actually managed to get through the majority of those, as most people were able to stay for the last day of PTG. I’m not going to summarize them here, as that would make this post interminably long, but it was satisfying to accomplish as much as we did.

After the conference, my wife joined me, and we spent the weekend out in the nearby Rockies. We visited Rocky Mountain National Park, and to describe the views as breathtaking would be an understatement.

View of the mountains in Rocky Mountain National Park.

I would certainly say that the week was a success! It took me a few days upon returning to decompress after a week of intense meetings, but I think we laid the groundwork for a productive Queens release!

Lasik post-op

Two days ago I underwent Lasik surgery to correct my nearsightedness. It went as well as could be expected, and I’m currently seeing 20/20 without glasses or contacts. There is a little bit of hazy ghosting around bright objects, but that’s supposed to go away as my corneas heal.

First, the place where I had it done, LasikPlus, is first-class in every respect. They understand customer experience, and do everything to make the experience, including coughing up around $4K, as pleasant as it can possibly be. And my doctor, Bruce January, M.D., had a positive energy that was so infectious that it inspired confidence. The staff also did a great job explaining just what will happen, including trying to explain what it will feel like. The thing is, that only goes so far. So here are my impressions.

The first step in the process is cutting a flap in your cornea’s surface to expose the main part of the lens. There are two separate machines involved, so they have you lie down on a small (comfortable!) table that pivots between two machines. The first machine is where they place a circular device on the eye to hold it still.

I’m getting ready for the first part of the procedure to start.

Before this is done, some numbing drops are placed in your eyes, so there is no pain. You are told that you will feel it pressing on your eyeball, and that while there won’t be any pain, it will feel a little weird. That’s an understatement! After the device is on your eye, they move you a few feet to the second machine, which does the actual cutting of the cornea using a femtosecond laser. This is where it gets trippy.

You’re staring up and see several very bright white lights, but of course, you can’t blink. When they have you lined up, the machine presses down hard on the aforementioned circular device, and sure, it feels strange having that much pressure on your eyeball. But what is even stranger is that you go blind in that eye! From bright white lights to black in a split second! Then you start seeing all sorts of colored patterns moving around. If you ever pressed against your closed eyes when you were a kid and saw the resulting visual effects, it’s sort of like that, but 100 times more intense. I saw spots of different colors that moved around randomly, leaving a trail of dots behind them. And while the visual show was interesting, the whole eyeball pressure thing was getting more and more uncomfortable. I’m not sure of the elapsed time; it was probably less than 30 seconds. But it felt a lot longer! When it was done, the pressure released, and the white lights reappeared. Now it was time to be wheeled back to the first machine, and repeat for the second eye. This time I had a much better idea as to what to expect, and while it was equally uncomfortable, it seemed to go more quickly.

This was the view on the monitor as I was about to have the flap cut into my cornea

Now it was time to get up and walk to the machine that actually reshaped the lenses. It was odd – I could sort of see where I was going, but felt quite a bit disoriented after the previous procedure. The operating room assistant guided me over, and I lay down for part two.

This time there was no discomfort; nothing pressing on the eye, just a small device to keep you from blinking. They tell you to just keep focused on the green dot in the middle, while the red lights around it dance and blink. After a few seconds of blinking it sounded like someone turned on a vacuum cleaner, and the red lights got more intense. The green dot turned into a green patch as the laser etched the lens. Then there’s the smell – you’ve smelled burning hair, right? Well, it’s pretty close to that. I don’t know why I was surprised, since I knew that the whole point of this procedure was to have a laser burn away parts of your lens to re-shape it, but smelling it brought home the reality of what was going on.

The whole process lasted only a few seconds. The smell went away, the vacuum sound stopped, and the green light returned to a dot. Then I saw what looked like a small brush going across my eye. This was the surgeon replacing the corneal flap over my eye. My wife was watching this on the monitors and said it looked like the doctor was smoothing wallpaper. Oh, I didn’t mention that the whole procedure area is viewable from the waiting area, including monitors that show what the doctor is seeing. Just one more thing that showed that they put a lot of thought into the whole experience. The photos here were all taken by my wife Linda while I was undergoing the procedure.

The view on the monitor during the second half of the procedure

Repeat with the other eye, and done! When I got up, I could see clearly enough, although everything had a fuzziness to it. Off to the side room where they give your eyes a quick once-over, repeat the instructions to you for applying the various drops, and we’re done!

Wearing the protective goggles over my very bloodshot eyes 10 minutes after the procedure

I put on the sunglasses they give you, and got in the car for the ride home. Even with sunglasses and my eyes closed, it was uncomfortably bright outside. I kept my eyes closed for the whole ride home, only opening when we arrived. By now the numbing drops were beginning to wear off, and my eyes were watering like crazy. They were also beginning to burn a bit, and soon reminded me of the time I was cutting jalapeño peppers and absent-mindedly rubbed my eyes! Every so often my eyes would get uncomfortable and I’d open them a bit, only to have tears come gushing down my cheeks! It was clear that my eyes were not very happy! I did end up going through a lot of tissues that day!

They advise you to keep your eyes closed for several hours, and recommend that you take a nap. They give you some Tylenol PM to help you sleep, but that didn’t do anything at all for me. I have some over-the-counter sleep aid pills that I use when flying overseas, so I took a couple of those, and slept for the next 5 hours. When I awoke, my eyes felt better, although still a bit scratchy. I kept the drops up, and tried to keep my eyes closed as much as possible.

The next morning I awoke to much clearer vision. Bright areas had a soft halo around them, but that’s to be expected as the cornea heals. I kept up with the drops, as my eyes would start to feel a bit scratchy if I went too long without them. I had my day-after exam, and all was fine.

Some bruising is visible the day after surgery

So 24 hours after having my eyes zapped by lasers I was able to return to working, which requires staring at a screen. The trick is to limit it to 20 minutes at a time, after which I put in more eyedrops and get up to walk around and let my eyes focus on other things for a few minutes. And yes, this post was written in small chunks to give my eyes some rest.

Some post-Lasik thoughts:

As is common with people my age, I need reading glasses. Before the surgery, when I was wearing my contacts and needed to see something up close, the readers were necessary. However, when I removed my contacts I could see perfectly well up close. This was handy when I would awake in the morning and want to set an alarm on my phone for, say, 5 minutes of snoozing. Since the surgery I can’t read my phone when I pick it up in the morning! Guess I’ll have to keep a pair of readers on my nightstand.

Somewhat related to this, when I rolled over to say good morning to my wife, I couldn’t see her very clearly, either. This was far more troubling to me. I guess I had taken it for granted that I would always be able to see her when we woke up. This will take some getting used to. I’m hoping that as my eyes heal, this will not be as severe. I’m not sure the convenience of not having to wear contacts is worth losing this.