A Guide to Alternate Hosts in Nova

One of the changes coming in the Queens release of OpenStack is the addition of alternate hosts to the response from the Scheduler’s select_destinations() method. If the previous sentence was gibberish to you, you can probably skip the rest of this post.

In order to understand why this change was made, we need to understand the old way of doing things (before Cells v2). Cells were an optional configuration back then, and if you did use them, cells could communicate with each other. There were many problems with the cells design, so a few years ago, work was started on a cleaner approach, dubbed Cells v2. With Cells v2, an OpenStack deployment consists of a top-level API layer, and one or more cells below it. I’m not going to get into the details here, but if you want to know more about it, read this document about Cells v2 layout. The one thing that’s important to take away from this is that once a request is cast down to a cell, that cell cannot call back up to the API layer.

Why is that important? Well, let’s take the most common case for the scheduler in the past: retrying a failed VM build. The process then was that Nova API would receive a request to build a VM with particular amounts of RAM, disk, etc. The conductor service would call the scheduler’s select_destinations() method, which would filter the entire list of physical hosts to find only those with enough resources to satisfy the request, and then run the qualified hosts through a series of weighers in order to determine the “best” host to fulfill the request, and return that single host. The conductor would then cast a message to that host, telling it to build a VM matching the request, and that would be that. Except when it failed.

Why would it fail? Well, for one thing, the Nova API could receive several simultaneous requests for the same size VM, and when that happened, it was likely that the same host would be returned for different requests. That was because the “claim” for the host’s resources didn’t happen until the host started the build process. The first request would succeed, but the second might not, as the host may not have had enough room for both. When such a race for resources happened, the compute would call back to the conductor and ask it to retry the build for the request that it couldn’t accommodate. The conductor would call the scheduler’s select_destinations() again, but this time would tell it to exclude the failed host. Generally, the retry would succeed, but it could also run into a similar race condition, which would require another retry.

However, with cells no longer able to call up to the API layer, this retry pattern is not possible. Fortunately, in the Pike release we changed where the claim for resources happens so that the FilterScheduler now uses the Placement service to do the claiming. In the race condition described above, the first attempt to claim the resources in Placement would succeed, but the second request would fail. At that point, though, the scheduler has a list of qualified hosts, so it would just move down to the next host on the list and try claiming the resources on that host. Only when the claim is successful would the scheduler return that host. This eliminated the biggest cause for failed builds, so cells wouldn’t need to retry nearly as often as in the past.
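The "walk down the list until a claim sticks" logic looks roughly like this sketch (the function and the injected `claim_resources` callable are hypothetical stand-ins for the scheduler's interaction with Placement):

```python
# Illustrative sketch of claiming in the scheduler (Pike onward): try to
# claim resources on each weighed host in order, and return the first host
# whose claim succeeds. claim_resources stands in for the Placement call.

def schedule_with_claims(weighed_hosts, request, claim_resources):
    for host in weighed_hosts:
        # The claim can fail if a racing request already consumed the
        # resources; if so, just move on to the next-best host.
        if claim_resources(host, request):
            return host
    raise RuntimeError("No valid host found")
```

The key point is that the loser of the race never leaves the scheduler holding a host it can't actually use; it simply keeps walking the already-qualified list.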

Except that not every OpenStack deployment uses the Placement service and the FilterScheduler. Those deployments, then, would not benefit from moving the claims into the scheduler. And sometimes builds fail for reasons other than insufficient resources: the network could be flaky, or some other glitch could occur in the process. In all these cases, retrying a failed build was no longer possible. When a build failed, all that could be done was to put the requested instance into an ERROR state, after which someone had to notice this and manually re-submit the build request. Not exactly an operator’s dream!

This is the problem that alternate hosts addresses. The API for select_destinations() has been changed so that instead of returning a single destination host for an instance, it will return a list of potential destination hosts, consisting of the chosen host, along with zero or more alternates from the same cell as the chosen host. The number of alternates is controlled by a configuration option (CONF.scheduler.max_attempts), so operators can optimize that if necessary. So now the API-level conductor will get this list, pop the first host off, and then cast the build request, along with the remaining alternates, to the chosen host. If the build succeeds, great — we’re done. But now, if the build fails, the compute can notify the cell-level conductor that it needs to retry the build, and passes it the list of alternate hosts.

The cell-level conductor then removes any allocated resources against the failed host, since that VM didn’t get built. It then pops the first host off the list of alternates, and attempts to claim the resources needed for the VM on that host. Remember, some other request may have already consumed that host’s resources, so this has a non-zero chance of failing. If it does, the cell conductor tries the next host in the list until the resource claim succeeds. It then casts the build request to that host, and the cycle repeats until one of two things happens: the build succeeds, or the list of alternate hosts is exhausted. Generally failures should now be a rare occurrence, but if an operator finds that they happen too often, they can increase the number of alternate hosts returned, which should reduce that rate of failure even further.
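Put together, the cell-level retry cycle can be sketched like this (every name here is an illustrative stand-in; the real conductor code is considerably more involved):

```python
# Hedged sketch of the cell-conductor retry loop described above: free the
# allocation on the failed host, then claim on each alternate in turn and
# cast the build to the first one that succeeds. All callables are
# hypothetical stand-ins injected for illustration.

def retry_build(failed_host, alternates, request,
                remove_allocation, claim_resources, cast_build):
    # Free the resources that were claimed on the host where the build failed.
    remove_allocation(failed_host, request)
    for i, host in enumerate(alternates):
        # The claim can still lose a race to another request.
        if claim_resources(host, request):
            # Pass the remaining alternates along in case this build fails too.
            cast_build(host, request, alternates[i + 1:])
            return host
    raise RuntimeError("Build failed: alternate hosts exhausted")
```

Note that each retry carries a shorter alternates list, so the process is guaranteed to terminate even in the worst case.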

Handling White Supremacists

This is a thought experiment, not an actual proposal, so hear me out. Let’s take one state (Alabama and Mississippi come to mind as front-runners), and make it white-only. There will be a national relocation service, and within a period of say, two years, all non-white, non-Christian (can’t have Muslims or Jews here!) people will have to relocate to anywhere else in the country they want. Similarly, all white supremacists, including KKK, Nazis, “alt-right”, etc., will have to relocate to this newly-whitened state. For convenience, let’s assume that Alabama is selected, and renamed Alabaster.

Once that is done, it will be illegal to espouse those views outside of the state of Alabaster, but it will be completely legal within. We can follow the Chinese example and create a Great Firewall around Alabaster, so that all social media posts made in this state cannot be seen outside, and posts from non-whites cannot be seen within. Inside Alabaster they will be free to fly Confederate and Nazi flags and erect all the statues to Confederate figures they want. Anyone caught promoting white supremacist messages in the rest of the country will be forced to relocate to Alabaster. And, of course, Fox News will be viewable only by residents of Alabaster (and it will be the only news station).

We could also make it more comfortable for these racists by eliminating some of the other things that bug them. Obamacare will not apply to residents of Alabaster, so they can enjoy being excluded for pre-existing conditions and the like. There will also be no federal taxes: income tax, social security, medicare – none. Of course, no federal funds will be given to Alabaster, so they’ll have to maintain their own roads, feed their own poor, care for their own elderly, all out of their own pockets. In other words, Tea Party heaven!

Having set the stage for this thought experiment, I’m wondering what the population of Alabaster would eventually be. In other words, how many people in the current USA would volunteer to move to such a white-only state? And for those who did, I wonder how happy they’d be, now that all of the things that they are currently so angry about have been removed from their lives. I would like to hear your thoughts on this.

Sydney Summit Recap

Last week was the OpenStack Summit, which was held in Sydney, NSW, Australia. This was my first summit since the split with the PTG, and it felt very different from previous summits. In the past there was a split between the business community part of the summit and the Design Summit, which was where the dev teams met to plan the work for the upcoming cycle. With the shift to the PTG, there is no more developer-centric work at the summit, so I was free to attend sessions instead of being buried in the Nova room the whole time. That also meant that I was free to explore the hallway track more than in the past, and as a result I had many interesting conversations with fellow OpenStackers.

There was also only one keynote session, on Monday morning. I found this a welcome change, because despite containing some really great information, keynotes inevitably include vendor segments that bore you to tears. Some vendors get it right: they showed the cool scientific research that their OpenStack cloud was enabling, and knowing that I’m helping to make that happen is always a positive feeling. But other vendors just drone on about things like the number of cores they are running, and the tools that they use to get things running and keep them running. Now don’t get me wrong: that’s very useful information, but it’s not keynote material. I’d rather see it written up on their website as a reference document.

Keynote audience
A view of the audience for Monday’s keynote

On Monday after the keynote we had a lively session for the API-SIG, with a lot of SDK developers participating. One issue was that of keeping up with API changes and deprecating older API versions. In many cases, though, the reason people use an SDK is to be insulated from that sort of minutiae; they just want it to work. Sometimes that comes at a price of not having access to the latest features offered by the API. This is where the SDK developer has to determine what would work best for their target users.

Chris Dent
Chris Dent getting ready to start the API-SIG session
API-SIG session
Many of the attendees of the API-SIG session

Another discussion was how to best use microversions within an SDK. The consensus was to pin each request to the particular microversion that provides the desired functionality, rather than make all requests at the same version. There was a suggestion to have aliases for the latest microversion for each release; e.g., “OpenStack-API-Version: compute pike” would return the latest behaviors that were available for the Nova Pike release. This idea was rejected, as it dilutes the meaning and utility of what a microversion is.
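As a concrete illustration of per-request pinning, an SDK might build the microversion header separately for each call. This is a hypothetical helper, not any real SDK's API, and the microversion numbers below are just examples:

```python
# Sketch of per-request microversion pinning: each SDK call sends only the
# compute microversion that the feature it wraps actually requires, rather
# than one global "latest" version for every request. Illustrative only.

def compute_headers(microversion):
    """Build the header pinning a single request to one compute microversion."""
    return {"OpenStack-API-Version": "compute %s" % microversion}

# Suppose one wrapped feature appeared in compute 2.26 and another in 2.53;
# each call pins exactly the version it needs.
tags_headers = compute_headers("2.26")
migration_headers = compute_headers("2.53")
```

Pinning per request means an SDK upgrade that adopts a newer microversion for one feature cannot silently change the behavior of every other call.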

On the Tuesday I helped with the Nova onboarding session, along with Dan Smith and Melanie Witt. We covered things like the layout of code in the Nova repository, and also some of the “magic” that handles the RPC communication among services within Nova. While the people attending seemed to be interested in this, it was hard to gauge the effectiveness for them, as we got precious few questions, and those we did get really didn’t have much to do with what we covered.

That evening the folks from Aptira hired a fairly large party boat, and invited several people to attend. I was fortunate enough to be invited along with my wife, and we had a wonderful evening cruising around Sydney Harbour, with some delicious food and drink provided. I also got to meet and converse with several other IBMers.

Aptira Boat
The Clearview Glass Boat for the Aptira party getting ready to board passengers
Sydney Harbour Cruise
Linda and I enjoying ourselves aboard the Aptira Sydney Harbour Cruise.
Food
We enjoyed the food and drink!
IBMers
Talking with a group of IBMers. It looks like I’m lecturing them!

There were other sessions I attended, but mostly out of curiosity about the subject. The only other session with anything worth reporting was with the Ironic team and their concerns about the change to scheduling by resource classes and traits. For many in the room there was still a significant lack of understanding about how this will work, which I interpret to mean that we who are creating the Placement service are not communicating this well enough. I was glad that I was able to clarify several things for those who had concerns, and I think that everyone left with a better understanding of both how things are supposed to work, and what will be required to move their deployments forward.

One development I was especially interested in was the announcement of OpenLab, which will be especially useful for testing SDKs across multiple clouds. Many people attending the API-SIG session thought that they would want to take advantage of that for their SDK work.

My overall impression of the new Summit format is that, as a developer, it leaves a lot to be desired. Perhaps it was because the PTGs have become the place where all the real development planning happens, and so many of the people who I normally would have a chance to interact with simply didn’t come. The big benefit of in-person conferences is getting to know the new people who have joined the project, and re-establishing ties with those with whom you have worked for a while. If you are an OpenStack developer, the PTGs are essential; the Summits, not so much. It will be interesting to see how this new format evolves in the future.

If you’re interested in more in-depth coverage of what went on at the Summit, be sure to read the summary from Superuser.

The location was far away for me, but Sydney was wonderful! We took a few days afterwards to holiday down in Hobart, Tasmania, which made the long journey that much more worth the effort.

Darling Harbour
Panoramic view of Darling Harbour from my hotel. The Convention Centre is on the right.

Bag Claim Etiquette

This seems so simple that it should be obvious, but apparently it’s not, so I guess I have to spell it out. When waiting at a baggage claim carousel, you should stand back a few feet until you see your bag. Not only does it make it easier for everyone to get a view of the bags, but it leaves room for unloading the bags. Don’t crowd the carousel like these people:

One guy has the right idea, but he can’t see the bags that are coming because the others are blocking his view.

On a recent flight I saw my bag coming around, and had to squeeze my way through the people crowding the carousel. When it came, I grabbed it and lifted it off in order to put it on the ground. In the process it struck one of the people crowding the area, and he gave me a dirty look. I gave him one right back. It’s simply rude not to give a fellow traveler enough room to retrieve their luggage, and when you see someone grabbing their bag, it’s up to you to get out of their way.

Trump Is Not The Enemy

Are you numb yet? To all of the outrageous, dishonest, and self-serving things that Donald Trump does? Or does your blood still boil every time his name is mentioned? I find myself in the latter camp, but I’m here to remind you of something very important: Trump Is Not The Enemy.

He is an enemy, of course, but by focusing on him, we miss the bigger picture. Thought experiment:  imagine that tomorrow morning you wake up to see that Trump has tweeted his resignation. He’s gone. Not only that, but Bob Mueller also announces indictments of Trump Sr., Trump Jr., and Ivanka Trump. They’re all going to prison for a long, long time. Would that mean that things will be all right again?

Hardly. The Republican party controls all 3 branches of government, and they have been more than happy to ride along with the Trump populist train in order to achieve their goals. They have confirmed nominees to cabinet posts who are not only unqualified, but who have expressed views that are 180° opposed to the office they occupy. They have tried to take away health care from millions of people, and are currently planning on redistributing even more of the income of the masses to the extremely wealthy. They have wantonly ignored the truth, and instead parroted Fox News talking points. If Trump were to disappear, we’d have President Mike Pence, who would gladly continue, if not accelerate, the decline of our country.

It is not Trump, but the Republican party that is the true enemy. The only way we can progress as a country is to remove them from power.

How do we do that? My suggestion is to tie everything that Trump does to Republicans. If Trump hints about nuking North Korea or killing gay people, don’t place the blame solely on him. Place it on the enablers. Place it on the Republicans. Hold them accountable.

There are several Republicans in Congress who have expressed privately that they feel that Trump is unstable, but instead of acting on that, they just let it continue. Hold them accountable.

There is an overwhelming track record of violating the emoluments clause, and personally profiting from government use of his properties, but the Republicans turn a blind eye to that. Hold them accountable.

If in the weeks to come, evidence of Trump’s collusion with Russia surfaces, or any of a number of other potentially impeachable offenses are revealed, the power to impeach is 100% in the hands of the Republican majority. If they don’t act as swiftly and as thoroughly as they did investigating Hillary Clinton for Benghazi, they are excusing those acts, and thus complicit. Make that the headline, and hold them accountable.

We need to constantly tie our outrage to the Republicans, and not just to Trump, if we ever hope to move this country back to a positive direction. We need to stop focusing on winning the White House, and focus instead on winning state legislatures so that we can undo the gerrymandering that has allowed them to control the House with a minority of votes. We need to be sure to target every single Republican who is up for re-election in 2018, and tie them inextricably to Donald Trump. Trump is going down in flames, and we need to bring all of his enablers down with him.