Choose Your Own FiveThirtyEight Adventure

In case you weren’t aware, the US election is less than a week away: five days. I had written a long list of issues on the ballot, but it kept getting longer and longer so I cut it. Suffice it to say, Americans are voting on a lot of issues this year. But a US presidential election is unlike many other countries’ elections in that we use the Electoral College.

For my non-American readers, the Electoral College, very briefly, was created by the country’s founding fathers (Washington, Jefferson, Adams, Franklin, et al.) to do two things. One, restrict selection of the American president to a class of individuals who theoretically had a broader/deeper understanding of the issues—but who also had vested interests in the outcome. The founders did not intend for the American people to elect the president. The second feature of the Electoral College was to prevent the largest states from dominating smaller states in elections. Why else would Delaware and Rhode Island surrender their sovereignty to join the new United States if Virginia, Pennsylvania, and New York make all the decisions? (The founders went a step further and added the infamous 3/5 clause, but that’s another post.)

So Americans don’t elect the president directly and larger states like California, New York, and Texas have slightly less impact per voter than smaller states like Wyoming, Vermont, and Delaware. Each state is allotted a number of Electoral College votes and the key is to reach 270. (Maybe another time I’ll get into the details of what happens in a 269–269 tie.) Many Americans are probably familiar with sites like 270 To Win, where you can determine the outcome of the election by saying who won each state. But, even though the US election is really 50 different state elections, common threads and themes run through all those states, and if one candidate or another wins one state, it makes winning or losing other states more or less likely. FiveThirtyEight released a piece that attempts to link those probabilities and help reveal how the decisions voters make in one state may reflect how voters in other states decide.

The interface is fairly straightforward (I’m looking at this on a desktop, though it does work on mobile), with a bunch of choices at the top and a choropleth map below. There we have a continuous diverging gradient, meaning the states aren’t grouped into like bins; instead we have incredibly subtle differences between similar states. (I should also point out that Maine and Nebraska are the two exceptions to my above description of the Electoral College. They divide their votes by congressional district: whoever wins a district gets that district’s Electoral College vote, and the overall state winner receives the remaining two votes.)
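For readers who think in code, the Maine and Nebraska rule described above can be sketched in a few lines. This is only an illustration of the rule as stated here, not official allocation logic; the function name and the candidate labels are hypothetical.

```python
from collections import Counter

def allocate_district_method(statewide_winner, district_winners):
    """Allocate electoral votes by the district method.

    One vote goes to the winner of each congressional district,
    and two further votes go to the statewide winner.
    """
    votes = Counter(district_winners)   # one vote per district won
    votes[statewide_winner] += 2        # two votes for the overall state winner
    return dict(votes)

# Hypothetical example: a two-district state where the districts split.
example = allocate_district_method("Candidate A", ["Candidate A", "Candidate B"])
print(example)
```

In a winner-take-all state, by contrast, all of the votes in this example would go to Candidate A.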

Below that we have a bar chart, showing each state, its more or less likely winner, and the 270 threshold. Below that, we have what I’ve read/heard described as a ball plot. It represents runs of the simulation. As of Thursday morning, the current FiveThirtyEight model says Trump has an 11-in-100 chance of winning and Biden, conversely, an 89-in-100 chance.

But what happens when we start determining the winners of states?

Well, for my non-American readers, this election will feature a large number of voters casting their ballots early. (I voted early by mail, and dropped my ballot off at the county election office.) That’s not normal. And I cannot emphasise this next point enough. We may not know who wins the election Tuesday night or by the time Americans wake up on Wednesday. (Assuming they’re not like me and up until Alaska and Hawaii close their polls. Pro-tip, there’s a potentially competitive Senate race in Alaska, though it’s definitely leaning Republican.)

But, some states vote early and/or by mail every year and have built the infrastructure to count those votes, or the vast majority of them, on or even before Election Day. Three battleground states are in that group: Arizona, Florida, and North Carolina. We could well know the result in those states by midnight on Election Day—though Florida is probably going to Florida.

So what happens with this FiveThirtyEight model if we determine the winners of those three states? All three voted for Trump in 2016, so let’s say he wins them again next week.

We see that the states we’ve decided are now outlined in black. The remainder of the states have seen their colours change as their odds reflect the set electoral choice of our three states. We also now have a reset button that appears only once we’ve modified the map. I’m also thinking that I like FiveyFox, the site’s new mascot? He provides a succinct, plain language summary of what the user is looking at. At the bottom we see what the model projects if Arizona, Florida, and North Carolina vote for Trump. And in that scenario, Trump wins in 58 out of 100 elections, Biden in only 41. Still, it’s a fairly competitive election.
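One way to picture what fixing a few states does to the odds is as a filter over simulated elections: keep only the runs that agree with your picks, then count who wins among those. The sketch below is an assumption about the general idea, not FiveThirtyEight’s actual method, and the four runs are made-up toy data rather than model output.

```python
def conditional_win_share(simulations, picks, candidate):
    """Share of simulated elections won by `candidate`, keeping only
    the runs that agree with every state the user has decided."""
    matching = [
        run for run in simulations
        if all(run["states"][state] == winner for state, winner in picks.items())
    ]
    if not matching:
        return None  # no simulated run fits the chosen scenario
    wins = sum(1 for run in matching if run["winner"] == candidate)
    return wins / len(matching)

# Toy, made-up runs purely for illustration (not FiveThirtyEight data).
toy_runs = [
    {"states": {"AZ": "Trump", "FL": "Trump", "NC": "Trump"}, "winner": "Trump"},
    {"states": {"AZ": "Trump", "FL": "Trump", "NC": "Trump"}, "winner": "Biden"},
    {"states": {"AZ": "Biden", "FL": "Biden", "NC": "Biden"}, "winner": "Biden"},
    {"states": {"AZ": "Biden", "FL": "Trump", "NC": "Biden"}, "winner": "Biden"},
]

picks = {"AZ": "Trump", "FL": "Trump", "NC": "Trump"}
share = conditional_win_share(toy_runs, picks, "Trump")
print(share)
```

With no picks at all, the same function simply returns the overall share of runs a candidate wins, which is the headline number the page shows before you touch the map.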

So what happens if by midnight we have results from those three states showing that Biden has managed to flip them? As of Thursday morning, he’s leading very narrowly in the opinion polls.

Well, the interface hasn’t really changed. Though I should add below this screenshot there is a button to copy the link to this outcome to your clipboard if, like me, you want to share it with the world or my readers.

As to the results, if Biden wins those three states, Trump has less than a 1-in-100 chance of winning and Biden a greater than 99-in-100.

This is a really strong piece from FiveThirtyEight and it does a great job of showing how states are subtly linked in terms of their likelihood to vote one way or the other.

Credit for the piece goes to Ryan Best, Jay Boice, Aaron Bycoffe and Nate Silver.

Where Are the Votes?

I’m not working for a good chunk of the next few days. But, I did want to share with my readers an analysis of Pennsylvania’s missing votes. Broadly, Trump needs to win the Commonwealth of Pennsylvania next week—yes, the US election is now one week away. Though, Pennsylvania allows mail-in ballots postmarked on Election Day to arrive within a few days and still be counted. So we may not have final tallies for the state until the weekend or Monday after Election Day.

Pennsylvania, of course, narrowly voted for Donald Trump over Hillary Clinton in 2016 with 44,000+ votes making the difference. In 2020, polling has consistently placed Joe Biden above Donald Trump by 5+ points. But, can Trump again pull off an upset victory?

I argue that yes, he can. And fairly easily too. (If you want to see why I think Pennsylvania is really Trumpsylvania, I recommend checking out my longer, more in-depth analysis.) So where would the votes come from? I mapped the 2016 difference between votes cast and registered voters, i.e. people who could have voted, but did not for whatever reason. I then coloured the map by the county’s winner in 2016. Red counties voted for Trump by more than 10 points and blue for Clinton by more than 10 points. The purple counties are those that were competitive, plus or minus 10 points for either candidate.
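The two steps behind the map, counting missing votes and colouring each county, can be written out as a short sketch. The function names and the sample percentages are hypothetical; only the 10-point threshold and the red/blue/purple scheme come from the description above.

```python
def missing_votes(registered_voters, votes_cast):
    """Registered voters who did not cast a ballot."""
    return registered_voters - votes_cast

def classify_county(trump_pct, clinton_pct, threshold=10.0):
    """Colour a county by its 2016 margin in percentage points.

    Red: Trump by more than `threshold` points.
    Blue: Clinton by more than `threshold` points.
    Purple: anything within plus or minus `threshold` points.
    """
    margin = trump_pct - clinton_pct
    if margin > threshold:
        return "red"
    if margin < -threshold:
        return "blue"
    return "purple"

# Hypothetical counties, not real 2016 results.
print(classify_county(60, 35))  # a comfortable Trump win
print(classify_county(30, 65))  # a comfortable Clinton win
print(classify_county(52, 45))  # a competitive county
```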

In the purple counties, both candidates will want to drive out as many voters as possible. But in the blue counties, Biden has reliably Democratic votes and in red Trump has reliably Republican votes. So why on Monday did Trump visit Allentown, Lititz, and Martinsburg? Because that’s where those votes are.

Allentown, in Lehigh County, is competitive. In fact, neighbouring Northampton Co. will be a key swing county next week and one I will be following closely as the returns come in. But Lititz, Lancaster Co., and Martinsburg, Blair Co., are in reliably red counties. (Though in my Trumpsylvania piece I argue Lancaster Co. is undergoing a transition to a competitive, albeit lean Republican county.)

In Lancaster Co., which went to Trump by nearly 20 percentage points in 2016, there were still just short of 100,000 voters who didn’t vote in 2016. Not all of those voters would have voted for Trump, but for sake of argument, just say 50% would have. That makes just short of 50,000 potential Trump votes—more than Trump’s entire state margin.
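For anyone who wants to check that back-of-the-envelope arithmetic, here it is as a sketch. The inputs are the rounded figures quoted above (so 100,000 slightly overstates the non-voters and 44,000 slightly understates the margin), and the 50% share is the for-the-sake-of-argument assumption, not a measured value.

```python
STATE_MARGIN_2016 = 44_000       # "44,000+" in the text, so a rounded-down figure
LANCASTER_NON_VOTERS = 100_000   # "just short of 100,000", so a rounded-up figure
ASSUMED_TRUMP_SHARE = 0.50       # the for-sake-of-argument share

potential_trump_votes = LANCASTER_NON_VOTERS * ASSUMED_TRUMP_SHARE
print(potential_trump_votes)
print(potential_trump_votes > STATE_MARGIN_2016)

def break_even_share(non_voters, margin):
    """Share of non-voters one candidate would need to match the margin."""
    return margin / non_voters

print(break_even_share(LANCASTER_NON_VOTERS, STATE_MARGIN_2016))
```

With these rounded inputs the break-even share comes out a little under the assumed 50%, which is why the claim is sensitive to how close to 100,000 and 44,000 the real figures are.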

Blair Co. is in the Pennsyltucky region of the state, relatively rural, but in Blair’s case, its county seat Altoona is the state’s 10th largest city. While the total number of votes—and the total number of non-voting voters—are smaller than in Lancaster Co., add up all the available votes and it’s a large number.

If you add up all those red counties’ missing votes, you get a total of just shy of 840,000 missing votes. Far more than enough to drastically swing the Commonwealth to Trump in 2020.

Of course, Biden’s counting on driving out turnout in Philadelphia and Pittsburgh and their suburbs, along with other cities in the state, like Allentown, Scranton, Harrisburg, and Erie. In those blue counties, there were 927,000 missing votes, so the potential for a Biden win is also there.

But, if Democratic voters stay home again as they did in 2016, Trump has plenty of potential votes to pick up across the state.

Credit for the piece is mine.

Covid Migration

Yep, Covid-19 remains a thing. About a month or so ago, an article in City Lab (now owned by Bloomberg) looked at the data to see if there was any truth in the notion that people are fleeing urban areas. Spoiler: they’re not, except in a few places. The entire article is well worth a read, as it looks at what is actually happening in migration and why some cities like New York and San Francisco are outliers.

But I want to look at some of the graphics going on inside the article, because those are what struck me more than the content itself. Let’s start with this map titled “Change in Moves”, which examines “the percentage drop in moves between March 11 and June 30 compared to last year”.

Conventionally, what would we expect from this kind of choropleth map? We have a sequential stepped gradient headed in one direction, from dark to light. Presumably we are looking at one metric, change in movement, in one direction, the drop or negative.

But look at that legend. Note the presence of the positive 4: there is an entire positive range within this stepped gradient. Conventionally we would expect to see some kind of red equals drop, blue equals gain split at the zero point. Others might create a grey bin to cover a negative one to positive one slight-to-no change set of states. Here, though, we don’t have that. Nor do we even get a natural split. Instead the dark bin goes to a slightly less dark bin at positive four, so everything from -16 up to just under positive four is in the darker bin.

Look at the language, too, because that’s where it becomes potentially more confusing. If the choropleth largely focuses on the “percentage drop” and has negative numbers, a negative of a negative would be…a positive. A “-25% drop” in Texas could easily be misread because of that double negative. Compare Texas to Nebraska, which had a 2% drop. Does that mean Nebraska actually declined by 2%, or does it mean it rose by 2%?

A clean up in the data definition to, say, “Percentage change in moves from…” could clear up a lot of this ambiguity. Changing the colour scheme from a single gradient to a divergent one, with a split around zero (perhaps with a bin for little-to-no change), would make it clearer which states were in the positive and which were in the negative.
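As a rough sketch of that alternative, here is what a diverging scheme with a little-to-no-change bin might look like as a classification rule. The function name, the labels, and the exact colours are illustrative assumptions; the plus-or-minus one band follows the grey bin suggested earlier, and the input is a signed percentage change (negative for fewer moves) so there is no double negative.

```python
BIN_COLOURS = {
    "decrease": "red",               # fewer moves than last year
    "little-to-no change": "grey",   # roughly flat
    "increase": "blue",              # more moves than last year
}

def diverging_bin(pct_change, neutral_band=1.0):
    """Label a signed percentage change for a diverging colour scale."""
    if pct_change < -neutral_band:
        return "decrease"
    if pct_change > neutral_band:
        return "increase"
    return "little-to-no change"

# Signed values: -25 means moves fell by 25%, 4 means they rose by 4%.
for value in (-25, 0.5, 4):
    label = diverging_bin(value)
    print(value, label, BIN_COLOURS[label])
```

A real map would likely step each side into several shades, but the split at zero is the part the original legend lacks.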

The article continues with another peculiar choice in its bar charts when it explores the data on specific cities.

Here we see the destinations of people moving out of San Francisco, using, as a note explains, requests for quotes as a proxy for the numbers of actual moves. What interests me here is the minimalist take on the bar charts. Note the absence of an axis, which leaves the bars almost groundless for comparison, except that the designer attached data labels to the ends of the bars.

Normally data labels are redundant. The point of a visualisation is to visualise the comparison of data sets. If hyper precise differences to the decimal point are required, tables often are a better choice. But here, there are no axis labels to inform the user as to what the length of a bar means.

It’s a peculiar design decision. If we think of labelling as data ink, is this a more efficient use with data labels than just axis labels? I would venture to say no. You would probably have five axis labels (0–4) and then a line to connect them. That’s probably less ink/pixels than the data labels here. I prefer axis lines to help guide the user from labels up (in this case) through the bars. Maybe the axis lines make for more data ink than the labels? It’s hard to say.

Regardless, this is a peculiar decision. Though, I should note it’s eminently more defensible than the choropleth map, which needs a rethink in both design and language.

Credit for the piece goes to Marie Patino.

Covid-19 Update: 19 October

I took a holiday yesterday. To be honest, I’ll be taking a lot of short holidays as the year winds down on account of not taking any the first three quarters of the year. So expect quite a few quiet Mondays and Fridays in the next few months.

But back to Covid-19. I won’t have a lot to say in this weekly update, because I didn’t write anything last night when I made these. Suffice it to say that things are bad and getting worse. Although, things could also be much worse. And by that I mean, while we are seeing dramatic rises in new cases, we are not yet seeing the rises in deaths that accompanied similar rises in March and April.

Although it should be said that while still low, deaths have been rising. The most easily seen instance of that is in Illinois, below. You can see deaths are slowly rising and the state is approaching 50 deaths per day. While that is still off from the peaks of 100+ earlier this year, that’s still too many people.

New case curves for PA, NJ, DE, VA, and IL.
Death curves for PA, NJ, DE, VA, and IL.

Credit for the pieces is mine.

Mask Up

Well, we made it to Friday. But, if you’ve been following me on the social, you’ll know that Covid is beginning to spread once again in Pennsylvania, New Jersey, Delaware, Virginia, and Illinois. I live in a tower block and I can say that many of my neighbours are no longer wearing masks indoors. Yet mask-wearing is the easiest defence we have against the spread of the coronavirus. So let’s take a look at the most effective types of masks, thankfully charted by xkcd.

Credit for the piece goes to Randall Munroe.

Cheesesteaks and Politics

For those unaware, Pennsylvania matters in the 2020 election. And it has mattered for years as a perennial swing state. There are of course the visits to steel mill cities like Pittsburgh, deindustrialised places like Johnstown, and unions love visits to places in Lackawanna and Luzerne. (You can read more about Pennsylvania as a swing state in my latest analysis here.)

But I want to focus on visits to Philadelphia. Because they inevitably involve the candidate consuming a cheesesteak. The Economist’s sister magazine, 1843, recently published an article on this very subject. And the whole thing is worth a read.

How have I managed to find this relevant to a blog about data visualisation? Well, they included a recipe to help people understand just what goes into the traditional Philadelphia dish.

Personally, I always have to confess, I’ve never been a huge fan. But, I’ll take provolone over whiz any day.

Credit for the piece goes to Jake Read.


After working pretty much non-stop all spring and summer, your humble author finally took a few days off. Throw in a bank holiday and you are looking at a five-day weekend. But, because this is 2020, travelling was out of the question and so instead I hunkered down to finish writing/designing an article I have been working on for the last several weeks/few months.

The main write-up (it is a lengthy-ish read so you may want to brew a cup of tea) is over at my data projects site. This is the first project I have really written about for that since spring/summer 2016. Some of my longer-term readers may recall that the penultimate piece I wrote there, about Pennsyltucky, was inspired by work I did here at Coffeespoons.

To an extent, so is this piece. I wrote about Trumpsylvania, the political realignment of the state of Pennsylvania. 2016 and the state’s vote for Donald Trump was less an aberration than many think. It was the near-end result of a decades-long transformation of the state’s political geography. And so I looked at the data underlying the shift and how and where it occurred.

And originally, I had a slightly different conclusion as to how this related to Pennsylvania in the upcoming 2020 election. But, the whole 2020 thing made me shift my thinking slightly. But you’ll have to read the whole thing to understand what I’m talking about. I will leave you with one of the graphics I made for the piece. It looks at who won each county in the state, but also whether or not the candidate was able to flip the county. In other words, was Clinton able to flip a Republican county? Was Trump able to flip a Democratic county?

Who won what? Who flipped what?

Let me know what you think.

And of course, many, many thanks to all the people who suffered my ideas, thoughts, and early drafts over the last several weeks. And even more thanks to those who edited it. Any and all mistakes or errors in the piece are all mine and not theirs.

Credit for the piece is mine.

Covid-19 Is Not the Flu Part Augh!

Yesterday, President Trump once again lied to the American public on his social media platforms. He falsely claimed that Covid-19 was nothing worse than the flu, which he falsely claimed sometimes kills more than 100,000 people. So once again we are going to look at the data comparing influenza to the novel coronavirus and the disease it causes, Covid-19.

I mean, I don’t know where else to begin. In no flu season over the last decade has the flu killed 100,000 people. In the 2017/18 season, the CDC estimates the flu killed 61,000 Americans. But they also give a range: they feel with 95% confidence that the flu killed between 46,000 and 95,000 Americans. And that is the closest it’s come.

In fact, as of yesterday, Covid-19 has killed 207,000 Americans. That averages out to about 30,000 Americans per month. In other words, Covid-19 has killed each month the same number of people the flu kills in an entire (average) flu season.
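The arithmetic behind that monthly figure is easy to reproduce. The sketch below uses the numbers quoted above; the seven-month span is an assumption about how long deaths had been accumulating (it is the span that makes 207,000 work out to roughly 30,000 a month), not a figure stated in the text.

```python
COVID_DEATHS = 207_000           # total quoted above
MONTHS_ELAPSED = 7               # assumption: roughly seven months of deaths
FLU_2017_18_ESTIMATE = 61_000    # CDC estimate quoted above
FLU_2017_18_UPPER = 95_000       # top of the quoted 95% range

monthly_average = COVID_DEATHS / MONTHS_ELAPSED
print(round(monthly_average))

# How many times larger is the Covid-19 total than the worst
# flu season of the decade, by estimate and by upper bound?
ratio_to_estimate = COVID_DEATHS / FLU_2017_18_ESTIMATE
ratio_to_upper = COVID_DEATHS / FLU_2017_18_UPPER
print(round(ratio_to_estimate, 2), round(ratio_to_upper, 2))
```

Even against the top of the CDC’s range for the worst season, the Covid-19 total is more than double.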

And the worst part is that we still haven’t exited the first wave of the coronavirus, because we never got it under control in the first place.

I just don’t know how many more times we have to say this, but because the president keeps lying about it, I feel like I need to say, once again…

Covid-19. Is. Not. The. Flu.

Credit for the piece is mine.

Super Spreading Garden Parties

If you were unaware, in the wee hours of Friday, President Trump announced that he had tested positive for the coronavirus that causes Covid-19. It should be stated that in just the three days since, there has been an enormous amount of confusion about the timeline as the White House is not commenting. From the prepared statement initially released it seems Trump first tested positive Wednesday. But that statement was then changed to fit the diagnosis in the wee hours of Friday morning. But just last night I saw reporting saying that test was actually a second, confirmatory test and the president first tested positive earlier Thursday.

The timeline is also important because it would allow us to more definitively determine when the president was infected. The reporting indicates that he caught the virus at a Rose Garden ceremony at the White House to introduce his Supreme Court nominee, Amy Coney Barrett. This BBC graphic does a great job showing who from that ceremony has tested positive with the virus.

The photo also does a great job showing how the seven people there were situated. Six of the seven did not wear masks; only North Carolina Senator Thom Tillis did. There is no social distancing whatsoever. And not shown in this photo are the indoor pre- and post-ceremony festivities where people are in close quarters, mingling, talking, hugging, shaking hands, all also without masks.

It should be noted others not in the photograph, e.g. campaign manager Bill Stepien, communications advisor Hope Hicks, and body man Nicholas Luna, have also now been confirmed positive.

The final point is that this goes to show how much the administration does not take the pandemic seriously. Right now the Covid data for some states indicates that the virus is beginning to spread once again. And so maybe this serves as a good reminder to the general public.

Just because you are socialising outdoors does not make you safe. Outdoors is better than indoors. No gatherings is better than small gatherings is better than large, well attended garden parties. Masks are better than no masks. Rapid result test screening is better than no test screening. Temperature checks are better than no temperature checks.

But the White House only did that last one, temperature checks, in order to protect the president before admitting people to the Rose Garden. Compare that to how they protect the president from other physical threats. He has Secret Service agents standing near him (or riding with him in hermetically sealed SUVs for joyrides whilst he is infected and contagious); he has checkpoints and armed fences further out to secure the perimeter. Scouts and snipers are on the White House roof for longer range threats. And there is a command centre coordinating this with I presume CCTV and aerial surveillance to monitor things even further out. In short, a multi-layered defence keeps the president safe.

If you just take temperatures; if you just hang out outside; if you just wear masks; if you just do one of those things without doing the others I mentioned above, you are putting yourself—and through both pre-diagnostic/pre-symptomatic and asymptomatic spreading, others—at risk.

But on Sunday night, a Trump campaign strategist went on television and said that now that President Trump has been infected and hospitalised, he is ready to lead the fight on coronavirus. Great. We need leadership.

But where was that leadership seven months ago when your advisors told you in January about the impact this pandemic would likely have on the United States? Where was the leadership in February saying the coverage was a hoax? Where was it in March when he said the virus would go away in April with the warmer weather? Where was it in April when it didn’t go away, when things continued to get worse? Where was it in May when thousands of Americans were dying? Where was it in June when states began to reopen even though the virus was still out-of-control and testing and contact tracing was less available than necessary to contain outbreaks? Where was it in July? And August? And September? Where was the leadership at a Rose Garden party celebrating the nomination of a Supreme Court justice, a party where at least seven people have been infected and one of them, the president of the United States, has been hospitalised with moderate to severe symptoms?

Credit for the piece goes to the BBC.