May 2015 – Walking Contradiction

After several days of intense discussions at the Vancouver OpenStack Summit, it’s clear to me that we have a giant pile of technical debt in the scheduler, rooted in the way we think about resources in a cloud environment. This needs to change.

In the beginning there were numerous compute resources managed by Nova. Theoretically, they could be divided up in any way you wanted, but some combinations really didn’t make sense. For example, a single server with 4 CPUs, 32GB of RAM, and 1TB of disk could be sold as several virtual servers, but if the first one requested asked for 1 CPU, 32GB RAM and 10GB disk, the remaining 3 CPUs and nearly all of the disk would be useless, because there would be no RAM left to give to another VM. For that reason, the concept of flavors was born: particular combinations of RAM, CPU and disk that would be the only allowable way to size your VM, so that resources would be allocated in ways that minimize waste. It was also convenient for billing, as public cloud providers could charge a set amount per flavor rather than creating a confusing matrix of prices. In fact, the flavor concept was brought over from Rackspace’s initial public cloud, based on the Slicehost codebase, which used flavors this way. Things were simple, and flavors worked.
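To make the waste concrete, here is a minimal Python sketch of the example above. It assumes 1TB is counted as 1000GB and uses a hypothetical “quarter-host” flavor (1 CPU, 8GB RAM, 250GB disk); the `Resources` class and `pack` helper are illustrations only, not anything from Nova. After the lopsided request, no VM of that shape fits, while the same host divides evenly into four of them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Resources:
    """CPU, RAM and disk available on a host, or requested by a VM."""
    vcpus: int
    ram_gb: int
    disk_gb: int

    def fits_in(self, capacity: "Resources") -> bool:
        return (self.vcpus <= capacity.vcpus
                and self.ram_gb <= capacity.ram_gb
                and self.disk_gb <= capacity.disk_gb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.vcpus - other.vcpus,
                         self.ram_gb - other.ram_gb,
                         self.disk_gb - other.disk_gb)


def pack(capacity: Resources, flavor: Resources) -> Tuple[int, Resources]:
    """Place as many VMs of one flavor as fit; return the count and the leftover capacity."""
    if flavor.vcpus <= 0 and flavor.ram_gb <= 0 and flavor.disk_gb <= 0:
        raise ValueError("flavor must request at least one resource")
    count = 0
    while flavor.fits_in(capacity):
        capacity = capacity - flavor
        count += 1
    return count, capacity


host = Resources(vcpus=4, ram_gb=32, disk_gb=1000)     # 1TB counted as 1000GB (assumption)
lopsided = Resources(vcpus=1, ram_gb=32, disk_gb=10)   # the first request in the example
quarter = Resources(vcpus=1, ram_gb=8, disk_gb=250)    # hypothetical flavor: 1/4 of the host

leftover = host - lopsided
print(leftover)                 # 3 vCPUs and 990GB of disk remain, but no RAM
print(pack(leftover, quarter))  # nothing else fits on the leftover capacity
print(pack(host, quarter))      # four VMs fit, with nothing stranded
```

Restricting requests to a fixed menu of shapes is what keeps the leftover capacity usable; the arithmetic is the whole argument for flavors in their original form.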

Well, at least for a while. Then the notion of “cloud” continued to grow, and the resources to be allocated became more complex than the original notion of “partial slices of a whole thing”, with new things to specify, such as SSD disks, NUMA topologies and PCI devices. These really had nothing to do with the original concept of flavors, but since flavors were the closest thing to saying “I want a VM that looks like this”, these extra items were grafted onto them, and ‘flavor’ became a synonym for “all the stuff I want in my VM”. Instead of recognizing that these additions are fundamentally different, the data model was updated to add things called ‘extra_specs’. This is wrong on so many levels: they aren’t “extra”; they are as basic to the request as anything else. These extra specs were originally freeform key-value pairs, and you could stuff pretty much anything in there. We have now begun the process of cleaning this up, and it hasn’t been very pretty.

With the advent of Ironic, though, it’s clear that we need to take a step back and think this through. You can’t allocate parts of a resource in Ironic, because each resource is a single non-virtualized machine. We’ve already broken the original design of one host == one compute node by treating Ironic resources as individual compute nodes, each with a flavor that represents the resources of that machine. Calling the Ironic machine sizes “flavors” just adds to the error.

We need to rethink just what it means to say we have a resource. We have to stop trying to treat all resources as if they can be made to follow the original notion of a divisible pool of stuff, and start to recognize that only some resources follow that pattern, while others are discrete. Discrete resources cannot be divided, and for them, the “flavor” notion simply does not apply. We need to stop trying to cram everything into flavor, and instead treat the request as what we need to persist, with ‘flavor’ being just one possible component of the request. The spec to create a request object is a step in the right direction, but doesn’t do enough to shed this notion of requests only being for divisible compute resources.
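Here is a rough sketch of what treating the request as the persisted object might look like, with divisible pools and discrete units modeled separately and the flavor reduced to one optional component of the request. All class names, fields, and resource names (`vcpu`, `ram_mb`, trait strings such as `SSD` or `BAREMETAL`) are hypothetical illustrations, not the request-object spec or any OpenStack API. The claim is all-or-nothing, and discrete units are matched first-fit, which is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Flavor:
    """A named bundle of divisible amounts; just a convenient shorthand."""
    name: str
    amounts: Dict[str, int]


@dataclass
class Pool:
    """A divisible resource: any amount up to what is free can be claimed."""
    total: int
    used: int = 0

    def free(self) -> int:
        return self.total - self.used


@dataclass
class Unit:
    """A discrete resource (a bare-metal node, a PCI device): all or nothing."""
    name: str
    traits: FrozenSet[str]
    claimed: bool = False


@dataclass
class Request:
    """The object to persist; a flavor is only one optional component of it."""
    flavor: Optional[Flavor] = None
    extra_amounts: Dict[str, int] = field(default_factory=dict)
    units: List[FrozenSet[str]] = field(default_factory=list)  # required traits, one set per unit

    def amounts(self) -> Dict[str, int]:
        """Merge the flavor's amounts with any amounts requested directly."""
        total = dict(self.flavor.amounts) if self.flavor else {}
        for key, value in self.extra_amounts.items():
            total[key] = total.get(key, 0) + value
        return total


def claim(request: Request, pools: Dict[str, Pool], units: List[Unit]) -> Optional[List[str]]:
    """Claim everything the request needs, or nothing at all.

    Returns the names of the claimed units, or None if the request cannot be met.
    """
    amounts = request.amounts()
    # Divisible resources: every requested amount must fit in a known pool.
    for key, value in amounts.items():
        pool = pools.get(key)
        if pool is None or pool.free() < value:
            return None
    # Discrete resources: pick a whole, unclaimed unit for each requirement (first fit).
    chosen: List[Unit] = []
    for required in request.units:
        match = next(
            (u for u in units
             if not u.claimed and required <= u.traits and all(u is not c for c in chosen)),
            None,
        )
        if match is None:
            return None
        chosen.append(match)
    # Only now, with everything checked, record the claim.
    for key, value in amounts.items():
        pools[key].used += value
    for unit in chosen:
        unit.claimed = True
    return [unit.name for unit in chosen]
```

In this model a flavor-based VM request and a flavor-less bare-metal request go through the same `claim` path; only the first one touches the divisible pools, and nobody has to pretend a whole machine is a “flavor”.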

Making these changes now would make it a lot easier in the long run to turn the current Nova scheduler into a service that can allocate all sorts of resources, and not just divide up compute nodes. I would like to see the notions of resources, requests, and claims completely revamped during the Liberty cycle, with the changes completed in the M cycle. This would go a long way toward making the scheduler cleaner, by reducing the technical debt that our assumptions have built up over the last 5 years.

Yesterday was the 2015 Tour de Cure San Antonio, a cycling event to help raise money to find a cure for diabetes. This was the third time I’ve ridden it, and the first time I felt in good enough shape to attempt the century course (century = 100 miles). In order to fit in such a long ride, we arrived at the site at 6am! Note: I’m not one of those crazy people who think this is a good time to be doing anything other than drinking coffee.

We were scheduled to start at 6:30, so we all lined up at the starting line before then. But the event organizers thought that it would be a wonderful idea to talk to the riders about all the wonderful things we were helping to accomplish by raising the funds that we did, so they kept us waiting until just before 7:00, straddling our bikes. I was ready to go a half hour earlier, and instead of starting the ride out ready to conquer the world, I started the ride feeling kind of crabby. All the rides do this to some degree, but keeping us waiting for over 30 minutes was uncalled for.

The weather was the big question mark, with rain and thunderstorms moving across the region. And, of course, we didn’t escape them! The storms hit around mile 25 and continued for the next 10 miles or so, with lightning, rain, and big wind gusts (straight into our faces, of course!), but I kept going, knowing that there was a cutoff time for the century: if you didn’t reach the point where the 100- and 65-mile routes diverged by 11am, you wouldn’t be allowed to do the century, because you wouldn’t finish in time. Here’s a shot of the rest stop right after the rain stopped.

Rest Stop #3 — After riding through the storm – soaked!

You really can’t see how soaked everyone is, but trust me, my gloves and socks were pretty soggy! You can, however, see the patches of blue sky just beginning to break through. The rest of the ride was dry, which was a relief.

A few minutes after 10am I reached the rest stop located 3 miles before the point where the routes split, so I was happy that I had made the effort to ride through the bad weather. I’ve only done a full century once before, and it was really important to me not to have that be a one-time event. I headed out from that rest stop and continued down the road. In case you haven’t done a ride like this: they give you a map of the route ahead of time, but most of the roads are in remote areas you don’t know, so you navigate with the help of signs put up on the side of the road by the event organizers. Each route is marked with a different color, so the point where the routes diverge is easy to see. I rode ahead with some others who were also doing the century, but a few miles later we came upon a sign that only listed the 65-mile route; there was no mention of the 100! We stopped, thinking that we must have missed the sign; perhaps it had blown over in the storm and none of us had seen it. Just then a marshal drove up (the routes are patrolled by ride marshals, who make sure that riders are safe), so we stopped him to ask about the 100-mile route. He checked on the radio, and then told us to go to the next rest stop, where the routes would diverge. Well, I got to that stop and asked the people there, and they told us that they had pulled the direction signs for the century an hour earlier than planned! I was furious! All of the work I had put into training for this ride, and all of the discomfort of riding through the thunderstorm so I could make the cutoff, and they took that away from me and many other riders for no reason.

So I took out my phone, pulled up the century route PDF, and tried to plot a path to one of the rest stops on that route. I couldn’t backtrack to the turnoff intersection, because even if I had found it, I would have been much too late at this point. I knew I wouldn’t be able to do the full century, but at least I’d get as close as I could. Google Maps plotted a route, and I took off, ignoring the signs for the 65-mile route and creating my own.

The only problem was that Google Maps thinks that there are a bunch of roads in that area that simply don’t exist. I went up and down the roads it suggested, until I finally gave up and figured I had better head back to the finish of the ride. Here’s one example: note that the map in the lower left corner shows a road, but what is actually there is a driveway made of sand that dead-ends at someone’s house. The map shows it continuing all the way through. And yes, I plan on letting the fine folks at Google Maps know about this problem.

So I rode back to the highway, and continued west until I hit the return path for the century route. I followed that back to the finish, with a total of 81 miles on the day (here’s the RunKeeper record of my ride). And waiting for me at the end was my wonderful woman Linda, who has done so much to support me for this ride. It was great to see her smiling face!

So while I didn’t get to complete another century, I did have an unusually adventurous ride. I do hope that the organizers learn from this event, because I would really like to do it again next year, and it is for a very good cause. If you’re interested in donating, they are still accepting donations for this event for the next few weeks, so follow this link and give what you can.


Rethinking Resources

Tour de Cure 2015