Out of the Closet

From the time he was an adolescent, Johnny was always aware that he was somehow different than others. His parents, teachers, ministers, and neighbors all told him things that he didn’t feel were correct. He had thoughts and feelings that were clearly considered evil by the society around him, but try as he might, those feelings never went away. So in public he pretended to be the way they expected him to be. He got pretty good at pretending; so good that no one had a clue as to his true nature. He dreamt of a day when he could stop pretending, and be who he really was.

At first he thought he must be the only one who had to keep such a secret. Sure, there were a few people like him who were open about who they were, but they were reviled among his family and friends, and he sure didn’t want to become an outcast. So he kept pretending.

A few years later things slowly started to improve for Johnny. Many people in the media, and even some popular politicians, began to talk about these things. Not openly, of course – that would never have worked. But they clearly hinted at it, using code words and loose word associations that were understood by their listeners, but which could always be publicly denied as having any subtext. He began to notice that others were responding to these signals. Lots of other people. He began to understand that he was far from being alone.

He also started to think that if people like him were to unite and work together, they could change the underlying culture of society. So he started meeting with other like-minded people. He began to become politically active, and supported those candidates who were clearly sympathetic to his view of the world. As more and more of these candidates for change were elected, he began to feel more confident that things were finally changing!

And now, after years of supporting candidates who spoke about these matters by using carefully-chosen code words, a new, fresh candidate has emerged who speaks openly about the things Johnny always believed! Donald Trump didn’t bother with the polite code words; he said what he felt, and this was exactly what Johnny had been waiting for: someone who represented those feelings.

For Johnny is a racist. He never liked blacks or Jews, and always thought gays were perverts and should be locked away. He wanted to send all the Mexicans back, and keep Muslims in their countries, where we could bomb the shit out of them. He doesn’t see anything wrong with the Confederate flag, except that people are being too “politically correct” about it. Oh, and the misogyny! He had always felt that only men should be leaders, since women were inferior. He wished that someday women would just shut up about equality and go back to their “traditional” roles of cooking, cleaning, and raising babies, while always submitting to his sexual desires.

Johnny still can’t say those things out loud in public, because he knows that he would be ostracized socially, and would probably lose his job if his boss knew. So he still pretends, but come November, he will ecstatically cast his vote for Trump. And despite polls showing that Trump has nearly no chance of winning, Trump will end up getting millions of votes from people like Johnny who are skilled at acting one way in public, but who secretly long for the days of segregation and male dominance.

Don’t kid yourselves into thinking that people like Johnny are rare. All you have to do is spend any time on the internet, where they use anonymity to reveal themselves. They are much more common than you think, and if you get complacent reading polls that show Trump as wildly unpopular, you will be in for a shock when he continues to beat the pollsters. Because polls rely on people saying what they honestly think, and these racists may be ignorant, but they aren’t dumb. When asked publicly, they will happily claim to be shocked by what Trump says, while inwardly smiling and thinking “ah, one of us!”. Don’t fall into that trap. Treat him, and those who support him in the shadows, as the serious threat that they are.

The Second Century

No, I’m not talking about history – this is about my cycling ride on Saturday. I participated in the 2016 Tour de Cure San Antonio, and completed the 103-mile course. I’ve only ridden a century (a 100-mile ride) once before, and my attempts at doing another were thwarted twice: once, a year later, when the entire ride was washed out by heavy thunderstorms, and then again at last year’s Tour de Cure, when they closed the century course early due to thunderstorms.

Start of ride
Lining up for the start of the ride (at 7am)!

Well, this year’s ride had its share of thunderstorms, too, but fortunately they were at the end. The day started off overcast and threatening-looking, but nothing came of all those clouds. About 30 miles into the ride the sun burst through, and I was hoping that it would stick around for a while. However, we only got to enjoy the sunshine for an hour or so until the clouds returned. It kept looking darker and darker as the ride progressed, and then at the rest stop at mile 80 there were event officials warning that a little ways up the road it was already raining heavily. They had vehicles that would shuttle you and your bike to the finish line if you didn’t want to ride through the storm, but that wasn’t what I had set out to do. What’s a little water, anyway?

To be honest, I was feeling pretty drained after 80 miles. When you sweat while cycling, the breeze against you dries it quickly, so after a few hours it feels like a salty crust. My leg muscles also felt like they had begun to run out of energy. But I set out to continue the ride anyway, and sure enough, about a mile later the skies opened up. Within minutes I was soaked from my helmet to my shoes. Oddly enough, though, it was actually re-invigorating! And once you’re wet, more rain isn’t getting you any wetter, so I rode on. The loud cracks of thunder sounded great, like music for a film I was starring in. Yeah, it felt pretty dramatic!

So I made it to the finish. The first time I did a century I was struggling – hard. I wasn’t even running on fumes then; hell, I would have loved to have had some fumes at that point. I had to stop several times in that last 30-mile loop to regain enough strength to keep going. So completing that ride was a matter of sheer willpower. This year it was different: sure, I was tired during the ride, and a bit stiff afterwards, but when I got within a few miles of the finish, I found another gear and sprinted my way in.

Crossing the finish line
Crossing the finish line after 103 miles!

I think that there were several differences this year. I had trained much better this time, so my legs were better able to keep going for the distance. It was also much cooler, with temperatures in the 70s (instead of around 90F). And the rain, while making some aspects uncomfortable, certainly helped to refresh me. Finally, the course this year didn’t have many severe hills. It had lots of climbing, but nothing compared to the earlier course, which featured several killer hills.

posing with medal
Posing with my medal after finishing the ride, soaking wet!

There are three sets of people I want to thank: first, the American Diabetes Association, for organizing this event and making it run so smoothly – you’re really doing great work! Second, the members of the ProFox online community, for generously donating to support me. Together we raised $500! And finally, of course, my wonderful wife Linda, who encouraged me every step of the way, and even drove back home to get the water bottles that I had forgotten. Hey, it was 6 in the morning, and my brain hadn’t caffeinated enough yet!

Linda and Ed
Linda and me, just before the start of the ride

Mea Culpa and Clarification

With my recent posts I seem to have confused people, and instead of helping us all see a better solution, I’ve made things murkier. So mea culpa.

The confusion comes from mentioning two distinct and mostly unrelated problems in different posts: the issues with the current Nova Scheduler regarding resource modeling and scalability, and the problem with fragmented data in the Cells V2 design. Because I proposed Cassandra as a solution to the first, many assumed that I was promoting it as the cure-all for everything in Nova. That’s not the case, so let me start with the focus on the cells issue.

The design of Cells V2 has a globally-available database, and separate database instances in each cell. The rationale was that this limits the failure domain, so if a single cell’s DB (or any other local service) goes down, the rest of the cloud will still operate normally. While this is a big advantage for the message queue, it comes at a high cost for data, as it will now be difficult to get a view of, say, a user’s resources across cells. Users don’t see (and can’t specify) the cell for their instance, so it is important to keep that global view. The response to my criticism was split between “yeah, that’s a bad idea” and “look, we can add this additional dependency and layer of complexity to fix it!”. The ROME approach of replacing MySQL with Redis was interesting, but further discussion on the email list pointed to a much better choice (IMO): Vitess. Vitess would provide the failure isolation without having to fragment the data. So I would prefer to see everything moved to a single database, and if failure isolation and redundancy are important for the database, add a tool like Vitess to handle that. I don’t think that Cells V2 is a bad idea; quite the opposite is true. My only concern was the data design and the implications of that design for everything else in Nova.

Now to get back to the Scheduler, my proposal for Cassandra was based on two things: fast, reliable data availability without duplication and syncing, and the difficulty of modeling very different resource types in a single, inflexible relational design. Those were the biggest problems facing the Scheduler, and as the long-term plan is to separate the Scheduler into its own service so that it can support an even greater number of resource types, it seemed like settling on a static resource model now was going to lead to huge technical debt in the future. I had hoped to spur a discussion about that, and it certainly did. But let me make clear that I don’t think those arguments apply to Nova as a whole.

So again, mea culpa. Let’s keep the discussions going, because even though there has been some negative energy released in the process, the overall impact has been quite positive. I had never heard of Vitess before, and had no idea that it allowed YouTube to use MySQL to handle the data loads it does. It’s exciting to see all these incredibly smart people with different technical backgrounds work together to come up with better and better solutions.

Fragmented Data

(This is a follow-up to my earlier post on Distributed Data)

One of the more interesting design sessions today at the OpenStack Design Summit was focused on Nova Cells V2, which is the effort to rework the way cells work in Nova. Briefly, cells are a mechanism for allowing separate independent deployments to work as a single cloud, primarily as a way to provide horizontal scalability. They also have other uses for operators, but that’s the main reason for them. And as separate deployments, they have their own API service, conductor service, message queue, and database. There are several advantages that this kind of independence offers, with failure isolation being one of the biggest. By this I mean that if something goes wrong and a cell is unreachable, it doesn’t affect the performance of the remaining cells.

There are tradeoffs with any approach, and this one is no different. One glaring issue that came up at that session is that there is no simple way to get a global view of your cloud. The example that was discussed was the common case of listing all your instances, which would require querying each cell independently, aggregating the results, and then sorting the aggregated records. For small clouds the cost of this process is negligible, but as the size grows, so do the overhead and complexity. It is particularly problematic for something that requires multiple calls, like pagination. Let’s consider a site with thousands of instances spread across dozens of cells. Typically when querying a large list like that, the API will return the first few, and include a link for the next batch. With a fragmented database, this will require some form of centralized caching approach, or, if that’s not feasible or the cache is stale, re-running the same costly query, aggregation, and sorting process for each page of data requested. With that, any gain that might have been realized by separating the databases will be more than offset by the need for a way to efficiently recombine that data. This isn’t only a cost in memory and CPU for the API service to handle the aggregation and caching, which will only need to be borne by the larger cloud operators. It is an ongoing cost of complexity to the developers and maintainers of the Nova codebase to handle this, and every new part of Nova will be similarly difficult to fit.
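To make the pagination cost concrete, here is a minimal sketch in Python. It is not Nova code: the `CELLS` dictionary, `query_cell`, and `list_instances` are hypothetical stand-ins I made up for illustration, with in-memory lists playing the role of the per-cell databases. It assumes no cache, so every page request fans out to every cell again.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for per-cell databases. Each cell holds
# its own instance records, in no particular order.
CELLS = {
    "cell1": [{"id": "a3", "created": 3}, {"id": "a1", "created": 1}],
    "cell2": [{"id": "b2", "created": 2}, {"id": "b5", "created": 5}],
    "cell3": [{"id": "c4", "created": 4}],
}

# Counts simulated round trips to cell databases.
stats = {"cell_queries": 0}


def query_cell(cell_name):
    """Simulate one query against a single cell, sorted by creation time."""
    stats["cell_queries"] += 1
    return sorted(CELLS[cell_name], key=lambda rec: rec["created"])


def list_instances(offset, limit):
    """Return one page of the global instance list, ordered by creation time.

    With no cache, every page repeats the whole fan-out: query each cell,
    merge the per-cell results, then skip ahead to the requested page.
    """
    per_cell = [query_cell(name) for name in CELLS]
    merged = heapq.merge(*per_cell, key=lambda rec: rec["created"])
    return list(islice(merged, offset, offset + limit))


page1 = list_instances(0, 2)
page2 = list_instances(2, 2)
```

Fetching two pages here costs six cell queries (three cells, twice), and the work for later pages grows because the merge has to walk past everything that came before. A real deployment would have far more cells and records, which is where the caching layer, and its complexity, comes in.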

There are other places where this fragmented database design will cause complexity, such as having the Scheduler require a database connection to every cell, and then query every cell on each request, followed by aggregating the results… see the pattern? Splitting a database to improve performance, or sharding, only makes sense if you shard along a line that logically separates the data so that each shard can be queried efficiently. We’re not doing that in the design of cells.
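For contrast, here is a rough sketch of what sharding along a logical line could look like. This is purely illustrative and not a proposal for Nova's schema: the choice of project ID as the shard key, the shard count, and all of the function names are my own assumptions. The point is only that a query for one project's instances touches exactly one shard, no matter how many shards exist.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration

# Each shard maps project_id -> list of instance ids (in-memory stand-in).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(project_id):
    """Pick a shard deterministically from the shard key (the project id)."""
    return zlib.crc32(project_id.encode("utf-8")) % NUM_SHARDS


def add_instance(project_id, instance_id):
    """Store an instance in the single shard that owns its project."""
    shard = shards[shard_for(project_id)]
    shard.setdefault(project_id, []).append(instance_id)


def list_project_instances(project_id):
    """List a project's instances by reading one shard only."""
    return sorted(shards[shard_for(project_id)].get(project_id, []))


add_instance("proj-a", "i-2")
add_instance("proj-a", "i-1")
add_instance("proj-b", "i-3")
project_a_instances = list_project_instances("proj-a")
```

Because all of a project's records live together, there is no fan-out, no cross-shard merge, and no re-sorting per page. Cells, by comparison, divide the data by where an instance happens to run, which is not something a user's query knows or cares about.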

It’s not too late. There is a project that makes minimal changes to the oslo.db driver to allow replacing the SQLAlchemy and MySQL combination that underpins Nova with a distributed database (they used Redis, but it doesn’t depend on Redis). It should really be investigated further before we create a huge pile of technical and design debt by fragmenting the data in Nova.

OpenStack Ideas

I’ve written several blog posts about my ideas for improving OpenStack, with a particular emphasis on the Nova Scheduler. This week at the OpenStack Summit in Austin, there were two other proposals put forth. So at least I’m not the only one thinking about this stuff!

At the Tuesday keynote, Intel demonstrated a version of OpenStack that was completely re-written in Go. They demonstrated creating 10,000 containers and 5,000 VMs in under a minute. Pretty impressive, right? Well, yeah, except they gave no idea of what parts of Nova were supported, and what was left out. How were all those VMs scheduled? What sort of logging was done to help operators diagnose their sites? None of this was shown or even discussed. It didn’t seem to be a serious proposal for moving OpenStack forward; instead, it seemed that it was a demo with a lot of sizzle designed to simply wake up a dormant community, and make people think that Intel has the keys to our future. But for me, the question was always the same one I deal with when I’m thinking about these matters: how do you get from the current OpenStack to what they were showing? Something tells me that rather than being a path forward, this represents a brand-new project, with no way for existing deployments to migrate without starting all over. So yeah, kudos on the demo, but I didn’t see anything directly useful in it. Of course Go would be faster for concurrent tasks; that’s what the language was designed for!

The other project was presented by a team of researchers from Inria in France who are aiming to build a massively-distributed cloud with OpenStack. Instead of starting from scratch as Intel did, they created a driver for oslo.db that mimicked SQLAlchemy, and used Redis as the datastore. It’s ironic, since the first iteration of Nova used Redis, and it was felt back then that Redis wasn’t up to the task, so it was replaced by MySQL. (Side note: some of my first commits were for removing Redis from Nova!) And being researchers, they meticulously measured the performance, and when sites were distributed, over 80% of the queries performed better than with MySQL. This is an interesting project that I intend to follow in the future, as it actually has a chance of eventually becoming part of OpenStack, unlike the Intel project.

I still hold out hope that one day we can free ourselves of the constraints of having to fit all resources that OpenStack will ever have to deal with into a static SQL model, but until then, I’m happy with whatever incremental improvements we can make. It was obvious from this Summit that there are a lot of very smart people thinking about these issues, too, and that fills me with hope for the long-term health of OpenStack.