The ‘cloud’ in cloud computing derives from the amorphous location of your resources. Sure, you have a virtual server, but all you really know is that it’s somewhere on your cloud provider’s hardware, and you can reach it at a certain IP address. You generally don’t care about its exact location.
There are times, though, when you might care. High Availability (HA) designs are a good example of this: you want to avoid putting all your eggs in one basket, so to speak. You don’t want your resources in the same failure domain, so that when there is an outage, it is likely to only affect one part of that provider’s cloud, while other parts remain functional. In these cases, you would want to request that your VMs are placed in different locations so that in the event of an outage in one part of the cloud, the rest should remain functional.
After many discussions about this recently in the #openstack-nova IRC channel, I’ve thought about how to approach this, and wanted to write down my ideas so that others can comment on them.
The placement service already has the concept of aggregates, which are used primarily for determining which resources can be shared with other resources. I think that we can use aggregates for affinity modeling, too. Note: Placement aggregates are not Nova aggregates!
A resource’s location can be described at many levels. In the example of a VM, it is located on a host machine. This machine is located in a rack of machines. These racks are located in rows. Rows are located in rooms. Rooms are located… well, you get the idea. These more or less correspond with sharing the same network segments, power sources, etc. The higher you go up this nesting, the more isolated things are from each other.
As an operator of a cloud, you define which of these groupings you want to give your users the ability to choose from. You could, of course, choose to expose every possible layer to your users, but that may not be practical (or necessary). So for this example, let’s assume that users can specify affinity or anti-affinity at the rack, row, and room levels. Here’s how I think this can be achieved in placement:
Define an aggregate for each rack, and then add all the host machines in that rack to that agg. Once that’s done, it’s a simple matter to determine if a potential candidate for a new VM is in the same rack or not by simply testing membership in that aggregate.
Next, define an aggregate for the row. Add all the hosts in that row to this row aggregate. Yeah, that sounds cumbersome, but really, we could create a simple tool that lets the operator select the racks for a row, and iteratively adds the hosts in those racks to that aggregate. Once again, when this is done, it is simple to determine if a candidate is in a given rack by simply checking if it is a member of that rack’s aggregate. The same pattern would hold for defining the room aggregate.
While this would indeed require some setup effort by operators, I don’t think it’s any worse than other proposals. These definitions would be relatively static, as the layout of a DC doesn’t change too often, so the pain would be mostly up front, with the occasional update necessary.
Now we start to handle requests like “give me an instance that has this much RAM, VCPU, and disk, but make sure it’s in a different row than instance X”. To do this, we just need to look up what row aggregate instance X is in, and filter out those hosts that are also in that same aggregate. The naming of these grouping levels would be left to the operator, so they could be in control of how granular they want this to be, and also make sure that their flavors match these names.
This idea doesn’t address the “soft affinity” issue, which basically says “give me an instance in Row R, but if there isn’t any candidate found there, give me one in the closest row to that”. The notion of “closest”, or any sort of “distance”, for that matter, isn’t really something that is clearly defined, as it could be physical location, or number of network switches, or pretty much anything else. But if it is determined to be a priority to support, perhaps we could add a column to contain some sort of distance/location identifier to the PlacementAggregate table. Or maybe we could think up some other way of defining relative location. In any case, I think it would be a poor choice to design an entire system around a relatively esoteric requirement. We need support for robust affinity/anti-affinity in Placement, and I think we can get this done without adding more tables, relationships, and complexity.