October 2016 – Walking Contradiction

Recently we’ve been doing a lot of work to revamp how the Nova Scheduler service manages the resources that are being requested in the cloud. The original design was very compute-centric, as the only thing we originally designed for was finding host machines that had enough CPU, disk, and RAM for the requested virtual machine. That design has been far too limiting, so in the past year we began making things simpler and more generic with the concept of Resource Providers. A resource provider is any entity that had something that could be shared in a virtual environment. Besides physical compute hosts, this would also handle shared storage, network resources, block storage, and anything else that could be virtualized. Those things that are being provided would be referred to as Resource Classes, and the amounts of each of those would be represented as integer amounts, making comparison simple (previously there were many complicated conditional code structures that were necessary to compare different types of things under the old model). These amounts are referred to as Inventory, and the consumed amounts of inventory are referred to as Allocations. Determining the available amount that a provider has of a particular resource class is a simple matter of subtracting the allocations from the inventory. This assumes, of course, that all of the inventory for a particular resource class is identical and interchangeable. (hint: they might not be!)

So far, everything seems straightforward enough. This model is designed to only address the quantitative aspect of resources; qualitative aspects are represented by boolean traits that can be assigned to resource providers (and only to resource providers). The classic example was different compute hosts that disk space available, where some was SSD and others were slower spinning disks. The disk space was all storage, measured in GB and treated equivalently. It was only the providers that were different, as distinguished by their differing traits.

However, once we began to consider more complex resources, things didn’t fit as well. SR-IOV devices, for example, allow their virtual functions (VFs) to be shared by virtual machines running on the host with the SR-IOV device. It is these VFs that are the actual resources provisioned to the virtual machines. Each compute node can also have multiple devices available, and they can be (and usually are) attached to different networks. So if we assume two devices that each offer 8 VFs, our typical model would have an inventory of 16 VFs for that resource provider.

It’s clear, though, that those 16 VFs are not interchangeable. A VM needs a VF attached to a particular network, and so we need to tell those two groups of VFs apart. The current solution being put forward tries to solve this by introducing a hierarchy of resource providers in a parent-child relationship, referred to as nested resource providers. In this approach, the compute host is the parent resource provider, with two child resource providers (the two SR-IOV devices). Each of those would have an inventory of 8 VFs, and we would distinguish them by assigning different traits to the child resource providers. While this approach does work, in my opinion it’s an unnecessary complication that is more of a workaround for two incorrect assumptions: that all inventory for a particular resource class is identical, and that traits describe resource providers.

The reason for this disconnect was that the original design of the resource provider/class model was too simple. It was based on a relation between the compute node and the inventory it controlled being flat, so that we could assign traits *of the inventory* to its provider, and it all worked. Think about it: is SSD vs.spinning disk really a trait of the compute node? Or is it a trait of the storage system? The iMac I have for our family has both SSD and spinning disk storage. If it were a compute node, what would its trait be set to? Clearly, saying that the storage type is a trait of the compute node is not correct. It is this error that requires the sort of complex workarounds such as nested resource providers.

So what it the alternative? I see two; there may be more. The first would be to make a separate ResourceClass for each type of resource. This has the advantage of preserving the notion that all inventory for a given resource class is interchangeable. In the SR-IOV case, there would be two classes of VFs (one for each network connection type), and the request to build a VM would specify which network the VF requires. Unfortunately, there are some who resist the idea of multiple resource classes for similar things; I believe that it’s an unfortunate result of naming them ‘classes’, since most of us who are experienced in OOP see that as bad class design. If they had been named ‘ResourceTypes’ instead, I doubt there would be as much resistance. The second approach doesn’t add more resource classes; instead, it would assign traits to the ResourceClass to distinguish among their respective inventories. While this may more accurately model the real world, it would require some changes to the inner workings of the placement engine, which assumes that all the inventory for a particular ResourceClass is interchangeable; it would now have to be class+traits that would be unique. It would also require extra calls to the traits API to find the right ResourceClass. That just seems like a lot of complication just to avoid making separate ResourceClasses.

Let’s imagine another example: Bike Shed As A Service! Our cloud provides virtual bike sheds using a Bike Shed ResourceProvider that can provide bike sheds on demand. There are a total of 32 bike sheds: 8 blue, 8 green, and 16 red (because red is the best color, obviously!). What would be the most practical way of representing them in the ResourceProvider framework? Can we really say that all the bike sheds are identical? Of course not! There is no way that a blue shed is anything like a prized red shed! So when I request my bike shed, of course I will specify “red bike shed”, not just any old shed.

The correct way to represent such a situation is to have a Bike Shed ResourceProvider, and it has 3 ResourceClasses: RedBikeShed, BlueBikeShed, and GreenBikeShed, each of which has an inventory of 16, 8, and 8 sheds, respectively. Contrast this with the nested resource provider proposal, which would have: A BikeShed ResourceProvider, with three child ResourceProviders, with traits of ‘red’, ‘blue’, and ‘green’ respectively, and each of which has separate inventories as above. Besides the inefficiency of the SQL joins required to query such a design, it really doesn’t reflect reality. There isn’t any such intermediary ‘provider’; it’s just an artifact of the workaround for an incorrect model.

To get back to the real-world SR-IOV example, it’s clear that the inventory of VFs for each device are not interchangeable, so therefore they belong to separate resource classes. We can bike shed on how to best name them (see what I did there?), but the end result would be an inventory of 8 VFs on network 1, and 8 VFs on network 2.

I know that the Bike Shed example is a very simple one, but one designed to show the problems with the nested approach. Let’s make sure that we aren’t digging ourselves into a design hole that will make things hard to work with as the placement engine design grows to incorporate all sorts of resources. Perhaps there may be a case that can only be solved with the nested approach, but I haven’t seen it yet.

The Valero Ride to the River is a two-day cycling event to raise money for research for a cure for Multiple Sclerosis. This was the third time I’ve ridden in it, but what made this year different is that this is the first time that Mother Nature didn’t completely wash out one of the days. We had gorgeous weather, with temperatures cool in the morning, and only climbing to the low 80sF (around 25-27C) in the afternoon.

Starting off on Day 1 (that’s me in the bright green)

The ride starts in San Antonio, and wanders east and north until it reaches New Braunfels. This route is about 71 miles, but near the end there is a choice: turn left, and finish your ride. Or, you can turn right, and go up the Guadalupe River for 15 miles, turn around, and return back, making the total ride 100 miles. As I had done a full century on my last ride, I didn’t feel the need to push myself to prove anything. I had told everyone that I was only doing the 71. But as the ride progressed, I continued to feel fresh. This was most likely due to the very mild weather: temperatures never rose very high, and there were enough clouds so that you weren’t baking in the sun the entire time. By the time I reached the lunch rest stop (50 miles in), I started thinking seriously about going for the century, but I told my wife I’d wait until the last rest stop before the decision point.

When I reached that stop, at around mile 65, I knew that I wanted to do the full century. I remembered the only other time that I did this course, and what a struggle those last 30 miles were, so I braced myself for the ride. I was very surprised to find that, while definitely an effort, it was nowhere near as exhausting as it had been the previous time. Either they smoothed the hills out, or I was in much better shape! ? So while I didn’t set any speed records, I finished the century much easier than my previous two. Here’s the record of my ride, thanks to the Runkeeper app.

Enjoying a well-deserved beer after completing the century!

My well-earned Century Rider armband. — My Century Rider armband.

The next day offered a choice of two looping routes: 61 miles or 38 miles through the Texas Hill Country. I had done the 61 mile route a couple of years ago, and remembered how grueling the hills were on that ride, so I chose to only do the 38. For comparison, the route for Day 1 was through areas to the east of San Antonio, which is relatively flat. I had about 4,300′ of total climb (43 ft/mile). This route took us to the northwest of New Braunfels, which is much hillier by far. The total climb was about 2,700′, or over 71 ft/mile! And as you can see from the graph below, most of that climb was in the first half of the ride. There isn’t much else to say about the Day 2 ride. The weather was once again perfect, and while the ride was difficult at times, it felt good overall. Here’s the Runkeeper summary for Day 2.

Of course, I can’t take all the credit. The ride was extremely well-organized by the MS Society, with well-staffed rest stops every 12-15 miles. They also arranged for police support for traffic management, so that riders didn’t get stuck (or struck!) at busy intersections. My belated apologies to the drivers who were made to wait while 2,000 riders passed through!

I also don’t think I would have been able to accomplish this without the loving support of my wife Linda, who gives me the motivation to stay healthy so that I can live a long life with her! Three years ago I thought it was a pretty amazing accomplishment to complete a century at age 55, but now to have done two centuries this year at age 58 is really more than I ever expected to achieve, and I have Linda to thank for that.

Month: October 2016

Virtual Bike Sheds

Ride to the River 2016