Happy Friday, all. Apologies for the lack of posting yesterday; I wasn’t feeling well, and sitting in front of my computer typing stuff up wasn’t happening. But now the weekend is nearly upon us, and to get in the mood I wanted to share this great dot plot from xkcd. It captures something I’ve definitely been thinking about.
For example, on 3 March 2020, I had a friend over to my flat for drinks and to watch the Super Tuesday Democratic primary results come in. Tomorrow, if all goes according to plan, will be the first time I’ve had company over in 15 months.
In essence, we have checkboxes of the normal things we did in the before times, and we’re just checking them off one by one until we can feel normal again.
Just please don’t contract a novel bat virus again.
In case you weren’t aware, the US election is less than a week away: five days. I had written a long list of issues on the ballot, but it kept getting longer and longer, so I cut it. Suffice it to say, Americans are voting on a lot of issues this year. But a US presidential election is unlike many other countries’ elections in that we use the Electoral College.
For my non-American readers, the Electoral College, very briefly, was created by the country’s founding fathers (Washington, Jefferson, Adams, Franklin, et al.) to do two things. One, restrict selection of the American president to a class of individuals who theoretically had a broader/deeper understanding of the issues—but who also had vested interests in the outcome. The founders did not intend for the American people to elect the president. The second feature of the Electoral College was to prevent the largest states from dominating smaller states in elections. Why else would Delaware and Rhode Island surrender their sovereignty to join the new United States if Virginia, Pennsylvania, and New York made all the decisions? (The founders went a step further and added the infamous 3/5 clause, but that’s another post.)
So Americans don’t elect the president directly, and larger states like California, New York, and Texas have slightly less impact per voter than smaller states like Wyoming, Vermont, and Delaware. Each state is allotted a number of Electoral College votes, and the key is to reach 270. (Maybe another time I’ll get into the details of what happens in a 269–269 tie.) Many Americans are probably familiar with sites like 270 To Win, where you can determine the outcome of the election by saying who won each state. But even though the US election is really 50 different state elections, common threads and themes run through all those states, and if one candidate or the other wins one state, it makes winning or losing other states more or less likely. FiveThirtyEight released a piece that attempts to link those probabilities and help reveal how the decisions voters in one state make may bear on how other states vote.
The interface is fairly straightforward—I’m looking at this on a desktop, though it does work on mobile—with a bunch of choices at the top and a choropleth map below. There we have a continuous diverging gradient, meaning the states aren’t grouped into like bins; instead we see incredibly subtle differences between similar states. (I should also point out that Maine and Nebraska are the two exceptions to my above description of the Electoral College. They divide their votes by congressional district: whoever wins a district gets that district’s Electoral College vote, and the statewide winner receives the remaining two votes.)
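The Maine/Nebraska split allocation is simple enough to sketch in code. A minimal toy function (the name and shape are my own, purely for illustration):

```python
def allocate_split_state_evs(district_winners, statewide_winner):
    """Allocate Electoral College votes the Maine/Nebraska way:
    one vote per congressional-district winner, plus two votes
    for the statewide winner."""
    tally = {}
    for winner in district_winners:          # one EV per district
        tally[winner] = tally.get(winner, 0) + 1
    # the statewide winner takes the state's two remaining EVs
    tally[statewide_winner] = tally.get(statewide_winner, 0) + 2
    return tally

# Nebraska-style example: two districts red, one blue,
# statewide winner red -> 4 EVs red, 1 EV blue
print(allocate_split_state_evs(["R", "R", "D"], "R"))  # {'R': 4, 'D': 1}
```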
Below that we have a bar chart showing each state, its more-or-less-likely winner, and the 270-vote threshold. Below that, we have what I’ve read/heard described as a ball plot. It represents runs of the simulation. As of Thursday morning, the current FiveThirtyEight model says Trump has an 11-in-100 chance of winning and Biden, conversely, an 89-in-100 chance.
But what happens when we start determining the winners of states?
Well, for my non-American readers, this election will feature a large number of voters casting their ballots early. (I voted early by mail and dropped my ballot off at the county election office.) That’s not normal. And I cannot emphasise this next point enough: we may not know who wins the election Tuesday night, or by the time Americans wake up on Wednesday. (Assuming they’re not like me, up until Alaska and Hawaii close their polls. Pro tip: there’s a potentially competitive Senate race in Alaska, though it’s definitely leaning Republican.)
But some states vote early and/or by mail every election and have built the infrastructure to count those votes, or the vast majority of them, on or even before Election Day. Three battleground states are in that group: Arizona, Florida, and North Carolina. We could well know the result in those states by midnight on Election Day—though Florida is probably going to Florida.
So what happens with this FiveThirtyEight model if we determine the winners of those three states? All three voted for Trump in 2016, so let’s say he wins them again next week.
We see that the states we’ve decided are now outlined in black. The remainder of the states have seen their colours change as their odds reflect the set electoral choice of our three states. We also now have a reset button that appears once we’ve modified the map. I also think I like Fivey Fox, the site’s new mascot. He provides a succinct, plain-language summary of what the user is looking at. At the bottom we see what the model projects if Arizona, Florida, and North Carolina vote for Trump. And in that scenario, Trump wins in 58 out of 100 elections, Biden in only 41. Still, it’s a fairly competitive election.
So what happens if, by midnight, we have results from those three states showing that Biden has managed to flip them? As of Thursday morning, he’s leading very narrowly in the opinion polls there.
Well, the interface hasn’t really changed. Though I should add that below this screenshot there is a button to copy a link to this outcome to your clipboard if, like me, you want to share it with the world, or in my case, my readers.
As to the results, if Biden wins those three states, Trump has less than a 1-in-100 chance of winning and Biden a greater than 99-in-100 chance.
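The kind of conditioning the model does can be illustrated with a toy Monte Carlo sketch: give every state a shared national swing plus its own noise, and state outcomes become correlated, so fixing one state’s result shifts the odds everywhere else. To be clear, the leans, error sizes, and function names below are my own invention for illustration, not FiveThirtyEight’s actual model:

```python
import random

# Toy model: each state's margin = state lean + shared national swing
# + state-specific noise. Positive margin = a Biden win in this sketch.
# All numbers are invented for illustration.
LEANS = {"AZ": 0.01, "FL": 0.00, "NC": -0.01, "PA": 0.02}

def simulate(n=50_000, seed=42):
    random.seed(seed)
    runs = []
    for _ in range(n):
        swing = random.gauss(0, 0.03)        # shared national error
        results = {state: lean + swing + random.gauss(0, 0.02) > 0
                   for state, lean in LEANS.items()}
        runs.append(results)
    return runs

runs = simulate()
# Unconditional chance of winning PA in this toy model
p_pa = sum(r["PA"] for r in runs) / len(runs)
# Chance of winning PA *given* a win in FL: higher, because a win in
# FL implies the shared national swing broke that candidate's way
fl_runs = [r for r in runs if r["FL"]]
p_pa_given_fl = sum(r["PA"] for r in fl_runs) / len(fl_runs)
print(p_pa, p_pa_given_fl)
```

Conditioning on more states (as the interactive does with Arizona, Florida, and North Carolina together) just filters the simulation runs further, which is why the headline odds move so dramatically.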
This is a really strong piece from FiveThirtyEight, and it does a great job of showing how states are subtly linked in terms of their likelihood to vote one way or the other.
Credit for the piece goes to Ryan Best, Jay Boice, Aaron Bycoffe and Nate Silver.
Well, we made it to Friday. But if you’ve been following me on the social, you’ll know that Covid is beginning to spread once again in Pennsylvania, New Jersey, Delaware, Virginia, and Illinois. I live in a tower block, and I can say that many of my neighbours are no longer wearing masks indoors. Yet mask-wearing is the easiest defence we have against the spread of the coronavirus. So let’s take a look at the most effective types of masks, thankfully charted by xkcd.
Earlier this week, some of the work my team does was published. We produced a one-page summary of a far larger and more comprehensive (relative to the scope of the summary) survey of consumers during the Covid recession. I will spare you the details of recreating existing templates from scratch and the design decisions that went into that bit—neither insignificant nor insubstantial—and rather focus on the one graphic we designed.
The broad thrust of the summary is that while overall we are beginning to see some job recovery, the recovery is uneven and, in fact, those below the age of 36 are getting hit pretty hard (my words, not the authors’). While in some industries the young are recovering in good numbers, in other industries, those with a larger share of the youth population, young people are still losing jobs. Then we broke those top-line numbers out by industry in the graphic below, captured by screenshot.
There are a couple of things from a design side to discuss. We had about two or three days from when we started the project to develop some ideas and then execute and produce the summary. And as I noted above, that also included quite a bit of time in emulating existing documents and building ourselves a new template should we need to do something similar in the future.
But for that graphic in particular, there’s one thing I wanted to highlight: the lack of values on the axes. The challenge here was that the data displayed is people not working. And when we compared this time period (Wave 3) to the earlier waves, we were looking for declines. So if we were going to say that the 36+ group is gaining construction jobs, that would be a -2% value, and the youth would show about a -13% increase. If you are doing a bit of a double-take at a negative increase, so did the team. Ultimately, we used the data to generate the chart but then opted for qualitative labelling on the axes. They simply indicate that in one direction youth are either gaining or losing jobs, and the same for the 36+ group. To reinforce this idea, we also added descriptors in the far corner of each quadrant saying whether the age groups were gaining or losing jobs.
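The sign flip is the crux: because the underlying data is the change in people *not* working, a negative value means a group is gaining jobs. A tiny sketch of how the quadrant descriptors translate the signs into plain language (the function and wording are mine, not the published summary’s):

```python
def quadrant_label(not_working_change_36plus, not_working_change_u36):
    """The data measures the change in people NOT working, so a
    negative value means that group is gaining jobs. Return the
    qualitative corner descriptor instead of the raw axis values."""
    older = ("36+ gaining jobs" if not_working_change_36plus < 0
             else "36+ losing jobs")
    younger = ("under-36 gaining jobs" if not_working_change_u36 < 0
               else "under-36 losing jobs")
    return older + ", " + younger

# Construction-style example from the post: -2% for the 36+ group
# and about -13% for the under-36s, i.e. both gaining jobs
print(quadrant_label(-0.02, -0.13))
```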
Despite the unusual design decisions I took in the graphic, I’m really proud of this piece, especially given its tight turnaround. It shows in almost real time how fractured the recovery—is this a recovery?—is at this point.
Credit for the piece goes to the team on this, Tom Akana, Kate Gamble, Natalie Spingler, and myself.
Baseball for the Red Sox starts on Friday. Am I glad baseball is back? Yes?
I love the sport and will be glad that it’s back on the air to give me something to watch. But the way it’s being done boggles the mind. Today I don’t want to get into the Covid, health, and labour-relations aspects of the game. Rather, as the title suggests, I want to look at a graphic examining just how bad the Red Sox could be this (shortened) year. Over at FiveThirtyEight, they created a model to evaluate teams’ starting rotations on an ongoing basis.
Form-wise, this isn’t too different from what we looked at yesterday. It’s a dot plot with the dots representing individual pitchers. The size of the dots represents their number of total starts. This is an important metric in their model, but as we all know, size is a difficult attribute for people to compare, and I’m not entirely convinced it’s working here. Some dots are clearly smaller than others, but for most it’s difficult to tell the difference clearly.
Colour is just tied to the colour of the teams. Necessary? Not at all. Because the teams are not compared on the same plot, they could all be the same colour. If, however, an eventual addition were made that plotted the day’s matchups on one line, then colour would be very much appropriate.
I like the subtle addition of “Better” at the top of the plots to help the user understand the constructed metric. Otherwise the numbers are just that, numbers that don’t mean anything.
Overall a solid piece. And it does a great job of showing just how awful the Red Sox starting rotation is going to be. Because I know who Nate Eovaldi is. And I’ve heard of Martin Perez. Ryan Weber I know only from his largely pitching in relief last year. And after that? Well, not on this graphic, but we have Eduardo Rodriguez, who had corona and, while he has recovered, nobody knows how that will impact athletes. There’s somebody named Hall whom I have never heard of. Then we have Brian Johnson, a root-for-the-guy story of beating the odds to reach the Major Leagues, but who has been inconsistent. Then…it is literally a list of relief pitchers.
We dumped the salary of Mookie Betts and David Price and all we got was basically a tee-shirt saying “We still need a pitcher or three”.
Okay, so we’re going to post some more of my work today, but it’s not about cases and deaths. Instead, I took some data produced by my colleagues and thought that it could do for a small transformation from a table into a chart. The original table can be found in their report on consumer payment options during the Covid-19 pandemic.
After putting the kettle on for some tea this morning, I started on their Table 1. Thirty minutes later, a cup of Irish Breakfast consumed, I had transformed it into this:
Obviously I changed the language/title a little bit. But the original was too long and didn’t fit. Also, this is my blog, so my rules. The visualisation improves upon the table in a number of ways, but tables do have their place. Tables are great for organising information. Find a column header and a row header and you can get any specific data point. But if you want to make a comparison between two data points, or several of them, a chart is the way to go. Now, you may lose some precision. For example, do I know to the decimal point, or even to the tenths, what one of those dots represents? Nope. But at a glance, can I see which dots are below the overall respondents? Yep. It’s abundantly clear that those earning less than $40,000 per year have a greater availability of debit cards than the other groups shown.
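The transformation itself is the interesting bit: instead of reading exact percentages cell by cell, compute each group’s gap from the overall-respondents figure so the comparison pops out. A quick sketch with made-up numbers (these are not the report’s figures):

```python
# Overall share of respondents with each payment option (illustrative)
overall = {"debit card": 85, "credit card": 75, "smartphone": 80}

# The same shares broken out by income group (illustrative)
groups = {
    "under $40k": {"debit card": 90, "credit card": 55, "smartphone": 70},
    "$40k-$100k": {"debit card": 86, "credit card": 80, "smartphone": 84},
}

# Gap from the overall figure, in percentage points: this is what a
# dot plot anchored on the overall-respondents line lets you see at a glance
gaps = {
    group: {item: share - overall[item] for item, share in items.items()}
    for group, items in groups.items()
}
print(gaps["under $40k"])
```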
And after all, I couldn’t have made this graphic without that table.
Full disclosure, as alluded to above, I work at the Federal Reserve Bank of Philadelphia. But I had nothing to do with the data, report, or presentation thereof.
A few weeks ago here in the United States, we had the mass shootings in El Paso, Texas and Dayton, Ohio. The Washington Post put together a piece looking at how mass shootings have changed since 1966. And unfortunately one of the key takeaways is that since 1999 they are far too common.
The biggest graphic from the article is its timeline.
It captures the total number of people killed per event. But it also breaks down the shootings by admittedly arbitrary time periods. Here it looks at three distinct ones. The first begins at the beginning of the dataset: 1966. The second begins with Columbine High School in 1999, when two high-school teenagers killed 13 people. Then the third begins with the killing of nine worshippers in an African Methodist Episcopal church in Charleston, South Carolina.
Within each time period, the peaks become more extreme, and they occur more frequently. The beige boxes do a good job of calling out just how frequently they occur. And then the annotations call out the unfortunate historic events where record numbers of people were killed.
The above is a screenshot of a digital presentation. However, I hope the print piece gave the timeline a full page and showed it in its entirety, in sequence. Here, the timeline is chopped up into two separate lines. I like how the thin grey rule separates the second segment from the third. But the reader loses the vertical comparison of the bars in the first segment to those in the second and third.
Later on, the article uses a dot plot to examine the ages of the mass shooters. There it could perhaps have used smaller dots that did not overlap as much. Or a histogram, an infrequently used type of chart, could have been useful.
Lastly it uses small multiples of line charts to show the change in frequency of particular types of locations.
Overall it’s a solid piece. But the timeline is its jewel. Unfortunately, I will end up talking about similar graphics about mass shootings far too soon in the future.
Credit for the piece goes to Bonnie Berkowitz, Adrian Blanco, Brittany Renee Mayes, Klara Auerbach, and Danielle Rindler.
Yesterday we looked at the New York Times’ coverage of some water-stress climate data and how some US cities fit within the context of the world’s largest cities. Well, today we look at how the Washington Post covered the same dataset. This time, however, they took a more domestically centred approach and focused on the US, but at the state level.
Both pieces start with a map to anchor the piece. However, whereas the Times began with a world map, the Post uses a map of the United States. And instead of highlighting particular cities, it labels states mentioned in the following article.
Interestingly, whereas the Times piece showed areas of No Data, including sections of the desert southwest, here the Post appears to be labelling those areas as “arid area”. We also see two different approaches to handling the data display and the bin ranges. Whereas the Times used a continuous gradient, the Post opts for a discrete gradient, with sharply defined edges from one bin to the next. Of course, a close examination of the Times map shows how they used a continuous gradient in the legend, but a discrete application. The discrete application makes it far easier to compare areas directly. Gradients, by definition, make it harder to distinguish between relatively close values.
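The difference between the two approaches can be sketched in code. A toy illustration (the bin edges, labels, and colour endpoints are invented, not the Post’s or the Times’ actual values):

```python
import bisect

# Post-style discrete bins: hard edges between classes
EDGES = [0.2, 0.4, 0.6, 0.8]                 # illustrative boundaries
BINS = ["low", "low-medium", "medium-high", "high", "extremely high"]

def discrete_bin(score):
    """Hard-edged bins: two nearby scores can land in visibly
    different classes, making direct comparison easy."""
    return BINS[bisect.bisect_right(EDGES, score)]

def continuous_ramp(score, lo=(255, 255, 178), hi=(189, 0, 38)):
    """Times-style continuous ramp: linear interpolation between two
    RGB endpoints. Subtle, but close scores become hard to tell apart."""
    t = max(0.0, min(1.0, score))
    return tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))

print(discrete_bin(0.41), discrete_bin(0.39))        # different bins
print(continuous_ramp(0.41), continuous_ramp(0.39))  # nearly identical
```

Two scores of 0.39 and 0.41 fall into different discrete bins but map to nearly indistinguishable colours on the continuous ramp, which is exactly the trade-off the two papers made differently.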
The next biggest distinguishing characteristic is that the Post’s approach is not interactive. Instead, we have only static graphics. But more importantly, the Post opts for a state-level approach. The second graphic looks at the water stress level, but then plots it against daily per capita water use.
My question is from the data side. Whence comes the water use data? It is not exactly specified. Nor does the graphic provide any axis limits for either the x- or the y-axis. What this graphic did make me curious about, however, was the cause of the high water consumption. How much consumption is due to water-intensive agricultural purposes? That might be a better use of the colour dimension of the graphic than tying it to the water-stress levels.
The third graphic looks at the international dimension of the dataset, which is where the Times started.
Here we have an interesting use of area to size population. In the second graphic, each state is sized by population. Here, we have countries sized by population as well. Except the note at the bottom of the graphic says that neither China nor India is sized to scale. And that makes sense, since both countries have over a billion people. But if the graphic is trying to use size in that one dimension, it should be consistent and make China and India enormous. If anything, it would show the scale of the problem of being high-stress countries with enormous populations.
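Sizing by area means the radius must scale with the square root of population, which is exactly why a true-to-scale China would dwarf everything else on the chart. A quick sketch (populations in millions, roughly rounded; the function and scale factor are my own):

```python
import math

# Rough populations in millions, purely for illustration
POP = {"Qatar": 2.8, "Israel": 9.0, "India": 1380, "China": 1440}

def radius(pop_millions, px_per_sqrt_million=2.0):
    """Radius in pixels such that circle AREA is proportional to
    population (area scales with radius squared, hence the sqrt)."""
    return math.sqrt(pop_millions) * px_per_sqrt_million

for country, pop in POP.items():
    print(country, round(radius(pop), 1))
# China's radius comes out roughly 23x Qatar's, i.e. over 500x the
# area, which is presumably why the designers broke the scale
```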
I also like how this graphic, while static in nature, breaks each country into a regional classification based upon the continent where it is located.
Overall this, like the Times piece, is a solid graphic with a few little flaws. But the fascinating bit is how the same dataset can create two stories with two different foci: one with an international flavour, like that of the Times, and one with a domestic flavour, like that of the Post.
Credit for the piece goes to Bonnie Berkowitz and Adrian Blanco.
Most of Earth’s surface is covered by water. But, as any of you who have swallowed seawater can attest, it is not exactly drinkable. Instead, mankind evolved to drink freshwater. And as some new data suggests, that might not be as plentiful in the future because some areas are already under extreme stress. Yesterday the New York Times published an article looking at the findings.
The piece leads with a large map showing the degree of water stress across the globe. It uses a fairly standard yellow-to-red spectrum, but note the division of the labels. The High range dwarfs that of the Low, but instead of continuing on, the Extremely High range then shrinks. Unfortunately, the article does not go into the methodology behind that decision, and it makes me wonder about the difference in bin sizes.
Of course, any big map makes one wonder about their own local condition. How stressed is Philadelphia, for example? Thankfully, the designers kept that in mind and created an interactive dot plot that marks where each large city falls according to the established bins.
At this scale, it is difficult to find a particular city. I would have liked a quick text search ability to find Philadelphia. Instead, I had to open the source code and search the text there for Philadelphia. But more curiously, I am not certain the graphic shows what the subheading says.
To understand what a third of major urban areas is, we would need to know the total number of said cities. If we knew that, a small number adjacent to the categorisation could be used to create a quick sum. Or a separate graphic showing the breakdown strictly by number of cities could also work. Because seeing where each city falls is both interesting and valuable, especially given how the shown cities are mentioned in the text—it just doesn’t fit the subheading.
But, for those of you from Chicago, I included my former home as a different screenshot. Though I didn’t need to search the source code, because I just happened across it scrolling through the article.
Credit for the piece goes to Somini Sengupta and Weiyi Cai.
This piece was published Monday, so it’s one round out of date, but it still holds true. It looks at the betting odds of each of the candidates looking to enter No. 10 Downing Street. And yeah, it’s going to be Boris.
The thing that strikes me as odd about this piece, however, is the size of the circles. Why are they larger for Boris Johnson and Rory Stewart? It cannot be proportional to their odds of victory, or else Boris’ head would be…even bigger. Is that even possible? Maybe it relates to their predicted placement of first and second, the two of which go to the broader Tory party for a vote. It’s really unclear and deserves some explanation.
The graphic also includes a standard line chart. It falls down because of spaghettification: all those also-rans have about the same odds, i.e. slim, to beat Boris.
Perhaps the most interesting thing to follow is who will be the other person on the ballot. But then, who remembers that Andrea Leadsom was the runner-up to Theresa May?
Credit for the piece goes to the Economist graphics department.