I will try to get to my weekly Covid-19 post tomorrow, but today I want to take a brief look at a graphic from the New York Times that sat above the fold outside my door yesterday morning. And those who have been following the blog know that I love print graphics above the fold.
Of the six-column layout, you can see that this graphic gets three, in other words half the page’s width, and the accompanying column of text for the article brings this to nearly two-thirds of the front page.
When we look more closely at the graphic, you can see it consists of two separate parts, a scatter plot and a line chart. And that’s where it begins to fall apart for me.
The scatter plot uses colour to indicate the vote share that went to Trump. My issue with this is that the colour isn’t necessary. If you look at the top for the x-axis labelling, you will see that the axis represents that same data. If, however, the designer chose to use colour to show the range of the state vote, well that’s what the axis labelling should be for…except there is none.
If the scatter plot used proper x-axis labels, you could easily read the range on either side of the political spectrum, and colour would no longer be necessary. I don’t entirely understand the lack of labelling here, because on the y-axis the scatter plot does use labelling.
On a side note, I would probably have added a US unvaccination rate for a benchmark, to see which states are above and below the US average.
Now if we look at the second part of the graphic, the line chart, we do see labelling for the axis here. But what I’m not fond of here is that the line for counties with large Trump shares significantly exceeds the maximum range of the chart. And then for the 0.5 deaths per 100,000 line, the dots mysteriously end short of the end of the chart. It’s not as if the line would have overlapped with the data series. And even if it did, that’s the point of an axis line, so the user can know when the data has exceeded an interval.
I really wanted to like this piece, because it is a graphic above the fold. But the more I looked at it in detail, the more issues I found with the graphic. A couple of tweaks, however, would quickly bring it up to speed.
A few weeks back, a good friend of mine sent me this graphic from Statista that detailed the global beer industry. It showed how many of the world’s biggest brands are, in fact, owned by just a few of the biggest companies. This isn’t exactly news to either my friend or me, because we both worked in market research in our past lives, but I wanted to talk about this particular chart.
At first glance we have a tree map, where the area of each “squarified” shape represents, usually, the share of the total. In this case, the share of global beer production in millions of hectolitres. Nothing too crazy there.
Next, colour often will represent another variable. For market share you might often see greens or blues to red that represent the recent historical growth or forecast future growth of that particular brand, company, or market. Here, however, is where the chart begins to break down. Colour does not appear to encode any meaningful data. It could have been used to encode data about region of origin for the parent company. Imagine blue represented European companies, red Asian, and yellow American. We would still have a similarly coloured map, sans purple and green.
But we also need to look at the data the chart communicates. We have the production in hectolitres, or the shape of the rectangle. But what about that little rectangle in the lower right corner? Is that supposed to be a different measurement or is it merely a label? Because if it’s a label, we need to compare it to the circles in the upper right. Those are labels, but they change in size whereas the rectangles change only in order to fit the number.
And what about those circles? They represent the share of total beer production. In other words the squares represent the number of hectolitres produced and the circles represent the share of hectolitres produced. Two sides of the same coin. Because we can plot this as a simple scatter plot and see that we’re really just looking at the same data.
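To make that redundancy concrete, here is a minimal Python sketch. The brewer names, volumes, and market total are placeholders I made up for illustration, not the figures from the Statista graphic. The point is only that a share is the volume divided by a constant, so the two measures have to line up perfectly.

```python
from math import sqrt

# Hypothetical production volumes in millions of hectolitres.
# Placeholders for illustration only, NOT the Statista figures.
volumes = {
    "Brewer A": 500.0,
    "Brewer B": 250.0,
    "Brewer C": 120.0,
    "Brewer D": 110.0,
    "Brewer E": 90.0,
    "Brewer F": 80.0,
}

# Assumed size of the whole market, also a placeholder.
TOTAL_MARKET = 2000.0


def shares_from_volumes(volumes, total):
    """Return each brewer's share of the total as a percentage."""
    return {name: 100.0 * volume / total for name, volume in volumes.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


shares = shares_from_volumes(volumes, TOTAL_MARKET)
r = pearson(list(volumes.values()), list(shares.values()))
print(round(r, 6))
```

Whatever numbers you feed in, the correlation comes out at 1 (to rounding), because share is just volume rescaled. That is why encoding both in the same graphic adds no new information.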
We can see that there’s a pretty apparent connection between the volume of beer produced and the share of volume produced, as one would (hopefully) expect. The chart doesn’t really tell us too much other than that there are really three tiers in the Big Six of Breweries. AB InBev is in its own top tier and Heineken is a second separate tier. Carlsberg and China Resources Snow Breweries are very competitive with each other, and just behind them are Molson Coors and Tsingtao. Those four could all be grouped into a third tier.
Another way to look at this would be to disaggregate the scatter plot into two separate bar charts.
You can see the pattern in terms of the shapes of the bars and the resulting three tiers is broadly the same. You can also see how we don’t need colour to differentiate between any of these breweries, nor does the original graphic. We could layer on additional data and information, but the original designers opted not to do that.
But I find that the big glaring miss is that the article makes the point that, despite the boom in craft beer in recent years, American craft beer is still a very small fraction of global beer production. The text cites a figure that isn’t included in the graphic, probably because they come from two different sources. But if we could do a bit more research we could probably fit American craft breweries into the data set and we’d get a resultant chart like this.
This more clearly makes the point that American craft beer is a fraction of global beer production. But it still isn’t a great chart, because it’s looking at global beer production. Instead, I would want to be able to see the share of craft brewery production in the United States.
How has that changed over the last decade? How dominant are these six big beer companies in the American market? Has that share been falling or rising? Has it been stable?
Well, I went to the original source and pulled down the data table for the Top 40 brewers. I took the Top 15 in beer production, all above 1% share in 2020, and then plotted that against the change in their beer production from 2019 to 2020. I added a benchmark of global beer production—down nearly 5% in the pandemic year—and then coloured the dots by the region of origin. (San Miguel might not seem to fit in Asia by name, but it’s from the Philippines.)
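For anyone who wants to try the same thing, the data prep can be sketched in a few lines of Python. This is not the spreadsheet work I actually did, just a rough outline. The `Brewer` fields, the function names, and every number below are placeholders invented for illustration, and the benchmark is only a stand-in for the "down nearly 5%" global figure.

```python
from typing import NamedTuple


class Brewer(NamedTuple):
    name: str
    region: str
    volume_2019: float  # millions of hectolitres
    volume_2020: float  # millions of hectolitres
    share_2020: float   # percent of global production


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def top_brewers(brewers, n=15, min_share=1.0):
    """Keep brewers above min_share, then take the n largest by 2020 volume."""
    eligible = [b for b in brewers if b.share_2020 > min_share]
    eligible.sort(key=lambda b: b.volume_2020, reverse=True)
    return eligible[:n]


# Placeholder rows, NOT the real Top 40 table.
brewers = [
    Brewer("Brewer A", "Europe", 100.0, 90.0, 25.0),
    Brewer("Brewer B", "Asia", 50.0, 52.0, 13.0),
    Brewer("Brewer C", "Americas", 2.0, 1.0, 0.5),
]

# Rough stand-in for the global change in production, in percent.
GLOBAL_BENCHMARK = -5.0

# One point per brewer: (name, x = share, y = change, colour = region).
points = [
    (b.name, b.share_2020, pct_change(b.volume_2019, b.volume_2020), b.region)
    for b in top_brewers(brewers)
]

# Brewers that did better than the global benchmark.
beat_benchmark = [name for name, _, change, _ in points if change > GLOBAL_BENCHMARK]
print(beat_benchmark)
```

Each tuple in `points` maps directly onto the chart: share on one axis, change on the other, region as the colour, and the benchmark as a single reference line.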
What mine does not do, because I couldn’t find a good (and convenient) source, is show which top brands belong to which parent companies. That’s probably buried in a report somewhere. But whilst market share data and analysis used to be my job, as I alluded to in the opening, it is no longer and I’ve got to get (virtually) to my day job.
I’ve heard a lot about vaccine hesitancy and resistance lately and I mentioned this on Monday. Subsequently, I thought I would make a graphic to help people understand where some of these excuses fit on the spectrum of rational to irrational, with claims of people being magnetised off the chart in the land of kooky.
But I also realised there’s a second spectrum, albeit far more limited in range, of selfishness vs altruism. And there is an interesting shift in how those who waited for the most vulnerable to receive their shots first were, initially, altruistic and rational. But now that those populations have received their vaccines, it’s shifted into an irrational selfish behaviour.
Anyway, I made a few sketches and as I was working on it, there was something in the aesthetic quality of the sketches that I couldn’t quite replicate digitally. And so I present the unpolished rough cut of my graphic.
For the fuller explanations, I refer you to my aforementioned post earlier this week. This was just an attempt to visualise the two spectrums.
Every four years (or so) I have to confess that I think fondly back upon my former job, because I worked with a few wonderful colleagues of mine on some data about the Olympics. And the highlight was that we had a model to try and predict the number of medals won by the host country as we were curious about the idea of a host nation bump. In other words, do host countries witness an increase in their medal count relative to their performance in other Olympiads?
We concluded that host nations do see a slight bump in their total medal count and we then forecast that we expected Team GB (the team for Great Britain and Northern Ireland) to win a total of 65 medals. We reached 64 by the final day, and it wasn’t until the women’s pentathlon, in maybe the last event, that Team GB won a silver medal, bringing its total to 65, exactly in line with our forecast.
Of course we also looked at the data for a number of other things, including if GDP per capita correlated to Olympic performance. We also looked at BMI and that did yield some interesting tidbits. But at the end of the day it was the medal forecast that thrilled me in the summer of 2012.
So yeah, today’s a shameless plug for some old work of mine. But I’m still proud of it two Olympiads later.
Those of my readers who know me well know that I’ve long been a fan of Star Trek. And so we’ve made it to the weekend. And over at Indexed earlier this month, well, we have a great science fiction comparison.
Here in the States we have a bank holiday Monday, so Star Trek is just a great way to start a holiday weekend.
As I prepared to reconnect and rejoin the world, I spent most of the weekend prior to full vaccination cleaning and clearing out my flat of things from the past 14 months. One thing I meant to do more with was printed pieces I saw in the New York Times. Interesting pages, front pages in particular, have been piling up and before recycling them all, I took some photos of the backlog. I’ll try to publish more of them in the coming weeks and months.
You may recall this time last month I wrote about a piece from the New York Times that examined the politicisation of vaccinations. I meant to get around to the print version, but didn’t, so let’s do it now.
I noted last time the use of ellipses for the title and the lack of value scales on the x-axis. Those did not change from the online version. But look at the y-axis.
For the online piece I noted how the labels were placed inside the chart. I wondered at the time, but didn’t write about, how perhaps that could have been a technical limitation for the web. But here we can see the labels still inside. It was a deliberate design decision.
Keeping with the labelling, I also pointed out Wyoming sitting outside the plot, and it does here too. Finally, I noted the lack of a label for zero on the first chart. Here the zero does appear, as I would have placed it. That does make me wonder if the lack of zero online was a technical/development issue.
Finally, something very subtle. At first, I didn’t catch this, and it wasn’t until I opened the image above in Photoshop that I did. In the web version I noted the use of tints, or lighter shades, for two different blues and two different reds. When I looked at the print, I saw only one red and one blue. But they were in fact different, and it wasn’t until I had zoomed in on the photo I took that I could see the difference.
The dots do have two different blues. But it’s very subtle. Same with the red.
So all in all the piece is very similar to what we looked at last month, but there were a few interesting differences. I wonder if the designers had an opportunity to test the blues/reds prior to printing. And I wonder if the zero label was an issue for developers.
Credit for the piece goes to Lauren Leatherby and Guilbert Gates.
If all goes according to plan, I should be receiving my second dose of Pfizer later this afternoon. Then it’s two more weeks until I’m fully vaccinated and ready to rejoin the world. But what kind of world will I be rejoining? The allergy-plagued one, looking at the calendar. And that’s why this post from Indexed by Jessica Hagy made me laugh.
Yesterday I wrote my usual weekly piece about the progress of the Covid-19 pandemic in the five states I cover. At the end I discussed the progress of vaccinations and how Pennsylvania, Virginia, and Illinois all sit around 25% fully vaccinated. Of course, I leave my write-up at that. But not everyone does.
This past weekend, the New York Times published an article looking at the correlation between Biden–Trump support and rates of vaccination. Perhaps I should not be surprised this kind of piece exists, let alone the premise.
From a design standpoint, the piece makes use of a number of different formats: bars, lines, choropleth maps, and scatter plots. I want to talk about the last of these in this piece. The article begins with two side-by-side scatter plots, this being the first.
The header ends in an ellipsis, but that makes sense because the next graphic, which I’ll get to shortly, continues the sentence. But let’s look at the rest of the plot.
Starting with the x-axis, we have a fairly simple plot here: votes for the candidates. But note that there is no scale. The header provides the necessary definition of being a share of the vote, but the lack of minimum and maximum makes an accurate assessment a bit tricky. We can’t even be certain that the scales are consistent. If you recall our choropleth maps from the other day, the scale of the orange was inconsistent with the scale of the blue-greys. Though, given this is produced by the Times, I would give them the benefit of the doubt.
Furthermore, we have five different colours. I presume that the darkest blues and reds represent the greatest share. But without a scale let alone a legend, it’s difficult to say for certain. The grey is presumably in the mixed/nearly even bin, again similar to what I described in the first post about choropleths from my recent string.
Finally, if we look at the y-axis, we see a few interesting decisions. The first? The placement of the axis labels. Typically we would see the labelling on the outside of the plot, but here, it’s all aligned on the inside of the plot. Intriguingly, the designers took care for the placement—or have their paragraph/character styles well set—as the text interrupts the axis and grid lines, i.e. the text does not interfere with the grey lines.
The second? Wyoming. I don’t always think that every single chart needs to have all the outliers within the bounds of the plot. I’ve definitely taken the same approach and so I won’t criticise it, but I wonder what the chart would have looked like if the maximum had been 35% and the grid lines were set at intervals of 5%. The tradeoff is likely increased difficulty in labelling the dots. And that too is a decision I’ve made.
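That what-if is easy enough to sketch. Below is a minimal Python example that extends the axis to the first grid line at or above the largest value, so the outlier sits inside the plot. The function name is my own and the 33% figure is a hypothetical stand-in for Wyoming, not a value read off the Times chart.

```python
import math


def axis_ticks(data_max, step):
    """Tick values from 0 up to the first multiple of step at or above data_max."""
    top = math.ceil(data_max / step) * step
    count = int(round(top / step))
    return [i * step for i in range(count + 1)]


# Hypothetical outlier value in percent, a placeholder rather than Wyoming's real figure.
HYPOTHETICAL_OUTLIER = 33

ticks = axis_ticks(HYPOTHETICAL_OUTLIER, 5)
print(ticks)
```

Under those assumptions the axis runs from 0 to 35 in steps of 5. Note that the list starts at zero, so the baseline gets a tick and a label like every other grid line.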
Third, the lack of a zero. I feel fairly comfortable assuming the bottom of the y-axis is zero. But I would have gone ahead and labelled it all the same, especially because of how the minimum value for the axis is handled in the next graphic.
Speaking of, moving on to the second graphic we can see the ellipsis completes the sentence.
We otherwise run into similar issues. Again, there is a lack of labelling on the x-axis. This makes it difficult to assess whether we are looking at the same scale. I am fairly certain we are, because when I overlap the graphics I can see that the two extremes, Wyoming and Vermont, look to exist on the same places on the axis.
We also still see the same issues for the y-axis. This time the axis represents vaccination rates. I wish this graphic made a little clearer the distinction between partial and full vaccination rates. Partial is good, but full vaccination is what really matters. And while this chart shows Pennsylvania, for example, at over 40% vaccinated, that’s misleading. Full vaccination is 15 points lower, at about 25%. And that’s the number that needs to be up in the 75% range for herd immunity.
But back to the labelling, here the minimum value, 20%, is labelled. I can’t really understand the rationale for labelling the one chart but not the other. It’s clearly not a spacing issue.
I have some concerns about the numbers chosen for the minimum and maximum values of the y-axis. However, towards the middle of the article, this basic construct is used to build a small multiples matrix looking at all 50 states and their rates of vaccination. More on that in a moment.
My last point about this graphic is on the super picky side. Look at the letter g in “of residents given”. It gets clipped. You can still largely read it as a g, but I noticed it. Not sure why it’s happening, though.
So that small multiples graphic I mentioned, well, see below.
Note how these use an expanded version of the larger chart. The y-minimum appears to be 0%, but again, it would be very helpful if that were labelled.
Also for the x-axis in all the charts, I’m not sure every one needs the Biden–Trump label. After all, not every chart has the 0–60% range labelled, but the beginning of each row makes that clear.
In the super picky, I wish that final row were aligned with the four above it. I find it super distracting, but that’s probably just me.
Overall, this is a strong piece that makes good use of a number of the standard data visualisation forms. But I wish the graphics were a bit tighter, which would make them just a little clearer.
Credit for the piece goes to Danielle Ivory, Lauren Leatherby and Robert Gebeloff.
Perseverance landed on Mars on 18 February, almost a month ago. The video and photography the rover has already sent back have been stunning. We all know she is the most capable rover yet landed on the Red Planet, but what we all want to know is how cute is Perseverance compared to her predecessors?
This week I’m on deadline for the magazine I produce. Technically, the files go out Monday, but I spend Monday double/triple-checking things and assembling all the packages I need, so everything really needs to be done the day before. For this quarter, that’s today. Regardless, that means little sleep and craziness.
Over at Indexed, Jessica Hagy nailed how I feel this time every quarter with a simple scatter plot entitled “‘You Look Tired’.”