The May Jobs Report

Last Friday, the government released the labour statistics for April, and they showed a weaker rebound in employment than many had forecast. When I opened the door on Saturday morning, I got to see the numbers above the fold on the front page of the New York Times.

Welcome to the weekend

What I enjoyed about this layout was that the graphic occupied half of the space above the fold. And because the designers laid the page out on a six-column grid, we can see just how they did it: the graphic itself is laid out in the page's column widths. That allows the leftmost column of the page to run an unrelated story whilst the jobs numbers occupy five of the page's six columns.
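The five-of-six split is simple grid arithmetic. As a minimal sketch, assuming a hypothetical page 60 units wide with 2-unit gutters (the Times's real measurements aren't given here), a graphic spanning five columns also absorbs the four gutters between them, so that span plus one gutter plus the remaining column fills the page exactly:

```python
def column_width(page_width: float, columns: int, gutter: float) -> float:
    """Width of one column when `columns` columns share the page, separated by gutters."""
    return (page_width - gutter * (columns - 1)) / columns


def span_width(page_width: float, columns: int, gutter: float, span: int) -> float:
    """Width of an element spanning `span` adjacent columns, including the gutters inside it."""
    return span * column_width(page_width, columns, gutter) + (span - 1) * gutter


# Hypothetical measurements, for illustration only.
PAGE, COLUMNS, GUTTER = 60.0, 6, 2.0

graphic = span_width(PAGE, COLUMNS, GUTTER, 5)  # the jobs graphic
story = span_width(PAGE, COLUMNS, GUTTER, 1)    # the unrelated story in the leftmost column
print(round(graphic, 2), round(story, 2))       # 49.67 8.33
print(round(graphic + GUTTER + story, 2))       # 60.0, the full page width
```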

If we look at the graphic in more detail, the designers made a few interesting decisions here.

Jobs in detail

First, last week I discussed a piece from the Times wherein they did not use axis labels to ground the dataset for the reader. Here the axis labels are back, and the reader can judge where the intervening data points fall between the labelled lines. For attention to detail, note that under Retail, Education and health, and Business and professional services, the “illion” in “-2 Million” was removed because the bars would otherwise have run into the label and made it hard to read.

My issue with the axis labels? I have mentioned in the past that I don’t think a designer always needs to put the maximum axis line in place, especially when the data point darts just above or below the line. We see this often here; for example, Construction and Manufacturing both handle their minimums this way. This works for me.

But for the charts above Construction, i.e. State and local government and Education and health, we enter the space where I think the graphic needs those axis lines. For Education and health, it’s pretty simple: the red losses column looks much closer to a -3 million value than a -2 million value. But how close? We cannot tell without an axis line.

And then under State and local government we have the trickier issue. But I think that’s also precisely why this chart could use some axis lines. First, almost all the columns fall below the -1 million line. This isn’t a case of just one or two columns; it’s all but two of them. Second, these columns all fall well below the -1 million axis line. These aren’t just a bit over; most are somewhere between half and two-thirds of the way to -2 million. But they are also not nearly as close to -2 million as the Education and health losses were to -3 million.

So why would I opt to have an axis line for State and local government? The designers chose this group for the legend “Gain in April”, and that legend could neatly tuck into the space between the columns and the axis line.
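Drawing the missing lines is mostly a matter of placing one at every million that brackets each chart’s data, then reading off how far a column sits between two of them. A minimal sketch; the values are made up for illustration (in millions of jobs) and are not the actual figures from the graphic:

```python
import math


def gridlines(lo: float, hi: float, step: float) -> list[float]:
    """Gridline values at multiples of `step` that bracket the data range [lo, hi]."""
    first = math.floor(lo / step) * step
    last = math.ceil(hi / step) * step
    count = round((last - first) / step)
    return [first + i * step for i in range(count + 1)]


def fraction_between(value: float, start: float, end: float) -> float:
    """How far `value` sits on the way from gridline `start` to gridline `end` (0 = start, 1 = end)."""
    return (value - start) / (end - start)


# Hypothetical chart: a loss of 1.6 million and a gain of 0.2 million.
print(gridlines(-1.6, 0.2, 1.0))                      # [-2.0, -1.0, 0.0, 1.0]
print(round(fraction_between(-1.6, -1.0, -2.0), 2))   # 0.6, i.e. 60% of the way to -2 million
```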

Overall it’s a solid piece, but it needs a few tweaks to improve its legibility and take it over the line.

Credit for the piece goes to Ella Koeze and Bill Marsh.

Off the Axis

Two Fridays ago, I opened the door and found my copy of the New York Times with a nice graphic above the fold. This followed the announcement from the White House of aggressive targets to reduce greenhouse gas emissions.

In general, I love seeing charts and graphics above the fold. As an added bonus, this set looked at climate data.

Need to see more downward trending lines.

But there are a few things worth pointing out.

First, from a data side, this chart is a little misleading. Without a doubt, carbon dioxide represents the greatest share of greenhouse gases: according to the US Environmental Protection Agency (EPA), it was 76% in 2010. Methane contributes the next largest share at 16%. But the labelling should be a little clearer here. Or perhaps the piece could lead with a small chart showing CO2’s share of greenhouse gases and, from there, look at the largest CO2 emitters per person.

Second, where are the axis labels?

I will probably have more on this at a later date, but neither the bar chart nor the line charts have axis labels. Now the designers did choose to label the beginning value for the lines and the bars, but this does not account for the minimums or maximums. (It also assumes that the bottom of the lines is zero.)

For example, we can see that China began 1990 with emissions of 3.4 billion metric tons. The annotation makes clear that China’s aggregate emissions surpassed those of the US in 2004. But where do they peak? What about developing countries?

If I pull out a ruler and draw some lines, I can roughly make some height comparisons. But an easier way would be simply to throw some dotted lines across the width of the page, or of each line chart.
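The ruler approach only works with a labelled reference point and an assumption about where the axis starts. A minimal sketch: the pixel heights below are hypothetical measurements, the 3.4 billion figure is China’s labelled 1990 value, and the `baseline` argument makes the zero-bottom assumption explicit so you can see how much the estimate depends on it:

```python
def value_from_height(height_px: float, ref_height_px: float, ref_value: float,
                      baseline: float = 0.0) -> float:
    """Estimate a value from a measured height on a linear axis.

    `ref_height_px` and `ref_value` come from one labelled point; `baseline`
    is the value assumed to sit at the bottom of the chart (height 0).
    """
    units_per_px = (ref_value - baseline) / ref_height_px
    return baseline + height_px * units_per_px


# Hypothetical measurements: China's 1990 label (3.4 billion metric tons) sits
# 34 px above the bottom, and a later point on the line sits 100 px above it.
print(round(value_from_height(100, 34, 3.4), 2))                # 10.0 if the bottom is zero
print(round(value_from_height(100, 34, 3.4, baseline=2.0), 2))  # 6.12 if the bottom is 2 billion
```

The same measurement yields two very different readings, which is why a couple of labelled gridlines would settle the question for the reader.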

This piece takes a big swing at presenting the challenge of reducing emissions, but it fails to provide the reader with the proper—and I think necessary—context.

Credit for the piece goes to Nadja Popovich and Bill Marsh.

Covid-19: A Global Update

I’ve been trying to limit the number of Covid-19 visualisations I’ve been covering. But on Sunday this image landed at my front door, above the fold on page 1 of the New York Times. And it dovetails nicely with our story about the pandemic’s impact on Pennsylvania, New Jersey, Delaware, Virginia, and Illinois.

Some not so great looking numbers across the globe.

For most of 2020, the United States was one of the worst hit countries as the pandemic raged out of control. Since January 2021, however, the United States has slowly been coming to grips with the virus and the pandemic. Its rate is now solidly middle of the pack—no longer is America first.

And if you compare the chart at the bottom to those that I’ve been producing, you can clearly see how our five states have really gotten this most recent wave under control to the point of declining rates of new cases.

However, you’ve probably heard the horror stories from India and Brazil where things are not so great. It’s countries like those that account for the continual increase in new cases at a global level.

Credit for the piece goes to Lazaro Gamio, Bill Marsh, and Alexandria Symonds.

A Visual History of the International Space Station

When I was in high school, the United States would regularly send space shuttles into orbit to help build this new thing: the International Space Station (ISS). In the aftermath of the Cold War, the nations of the world joined together to commit to building an orbital space station.

There was, of course, a time before the ISS, and I can recall many jokes being made about Mir, the Soviet and then Russian space station. And before Mir there were others, though none as long-lasting. But I digress; we’re here today because Canadian astronaut Chris Hadfield recently tweeted a graphic made by Peter Batenburg that visually captures the history of the International Space Station.

Space, the final frontier…

I think my favourite element is the graphical representation of the ISS’s expansion in terms of its volume. I’ve seen similar graphics showing the addition of modules and new components, but I can’t recall seeing one capture the amount of space where people can live and work.

But really, the whole piece is worth sitting down and enjoying. After all, the ISS is only about 22 years old. But there are questions about how much longer it will remain in orbit. I’m not aware of any concrete plans to fund it beyond 2030, nor of any plans for an eventual replacement.

We can only hope that the ISS and its successor remain an area that fosters international cooperation for the next thirty years.

Credit for the piece goes to Peter Batenburg.

The Super Short European Super League

Sunday night, news broke that a number of European football clubs were creating a rogue league, the European Super League. My British and European readers—and Americans who follow football—will know the names of Manchester United, Liverpool, AC Milan, Juventus, Real Madrid, and the others.

To put this in perspective for my American readers, imagine the Yankees, Dodgers, Red Sox, Astros, Padres, Mets, Cardinals, Phillies, Angels, and Nationals saying that they were leaving Major League Baseball to form their own new baseball league, and that they were doing so to “save the sport”, while guaranteeing that they all make the playoffs every year.

My frequent readers and those who know me will know I’m a fan of the Boston Red Sox. I should point out that John Henry owns both the Red Sox and Liverpool through his company, Fenway Sports Group.

Of course, the analogy doesn’t quite hold up, because there are some significant differences between American sports and European football. Relegation is a big one. Personally, I wish American sports had some way of using relegation to incentivise teams to not intentionally suck.

Here is the basic premise of relegation. Take English football. You have four levels of play, and in theory any team can exist in any level. Each year, the worst teams in each level move down one whilst the best teams move up. And at the top level, the top teams get to compete in lucrative European-wide matches. That is a bit simplistic, but imagine that at the end of last year, the Pirates, Rangers, Tigers, and Red Sox became AAA minor league teams and the four best AAA minor league teams became MLB teams. MLB teams would theoretically try to do everything they could to stay in MLB and not drop to AAA, because that would mean a loss of money. After all, the Yankees would no longer be heading to Fenway, nor the White Sox to Detroit. Would seeing the Detroit Tigers play the Woo Sox really be worth the ticket prices you pay at Comerica Park?
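Stripped of the European places and playoffs, the annual swap between two adjacent tiers is a simple procedure. A minimal sketch using the hypothetical MLB/AAA scenario above; the standings and the “AAA 1” to “AAA 5” names are placeholders, not real results:

```python
def promote_and_relegate(upper: list[str], lower: list[str], n: int) -> tuple[list[str], list[str]]:
    """Swap the bottom `n` teams of the upper tier with the top `n` of the lower tier.

    Both lists are final standings, best team first. The returned lists are
    next season's memberships; their order carries no meaning.
    """
    if not 0 < n <= min(len(upper), len(lower)):
        raise ValueError("n must be between 1 and the size of the smaller tier")
    relegated = upper[-n:]
    promoted = lower[:n]
    return upper[:-n] + promoted, relegated + lower[n:]


# Hypothetical final standings (best first).
mlb = ["Yankees", "White Sox", "Pirates", "Rangers", "Tigers", "Red Sox"]
aaa = ["AAA 1", "AAA 2", "AAA 3", "AAA 4", "AAA 5"]

mlb, aaa = promote_and_relegate(mlb, aaa, 4)
print(mlb)  # ['Yankees', 'White Sox', 'AAA 1', 'AAA 2', 'AAA 3', 'AAA 4']
print(aaa)  # ['Pirates', 'Rangers', 'Tigers', 'Red Sox', 'AAA 5']
```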

But that’s not how American sports work. And so a few American owners, namely those of Manchester United, Arsenal, and Liverpool, want to ensure a steady stream of money. By creating their own league where their teams cannot be relegated, they guarantee that revenue stream.

In other words, this is all about the owners of these Super League teams making even more money.

During the last year, teams have also been hurting without fans in attendance. And that brings me to why I can write this up: in an article about the new league, the BBC addressed the fact that most of these teams are heavily in debt.

This graphic, however, is a bit misleading. Look at Liverpool. There is no available data for how much financial debt the club holds, so why is it placed between Chelsea and Manchester City? It could well have more debt than Tottenham. Liverpool should really be left off this chart and mentioned in the note, because its placement suggests that it has little debt when that may well not be the case. When it comes to how Liverpool compares with the other 11 clubs, this graphic is really misleading.

From a design standpoint, I’m also not clear on why the x-axis line extends beyond the labels for £-200m and £600m.

I’m not going to touch all the data labels. That’s for another piece I’ve been working on off and on for a little while now.

At this point I should point out that I was going to post this article later, but in the last 18 hours or so the whole thing has fallen apart as the English teams, followed by the others, have been dropping out under immense pressure from the sport and their fans. To bring back my analogy above, imagine MLB retaliating and saying that if those teams created their own league, the players would not be allowed to play in any other matches and the teams would be locked out from all other competitive baseball games. It’s a mess.

Credit for the piece goes to the BBC graphics department.

Politicising Vaccinations

Yesterday I wrote my usual weekly piece about the progress of the Covid-19 pandemic in the five states I cover. At the end I discussed the progress of vaccinations and how Pennsylvania, Virginia, and Illinois all sit around 25% fully vaccinated. Of course, I leave my write-up at that. But not everyone does.

This past weekend, the New York Times published an article looking at the correlation between Biden–Trump support and rates of vaccination. Perhaps I should not be surprised that such a piece exists, or by its premise.

From a design standpoint, the piece makes use of a number of different formats: bars, lines, choropleth maps, and scatter plots. I want to talk about the last of those here. The article begins with two side-by-side scatter plots, this being the first.

Hesitancy rates compared to the election results

The header ends in an ellipsis, but that makes sense because the next graphic, which I’ll get to shortly, continues the sentence. But let’s look at the rest of the plot.

Starting with the x-axis, we have a fairly simple plot here: votes for the candidates. But note that there is no scale. The header provides the necessary definition of being a share of the vote, but the lack of minimum and maximum makes an accurate assessment a bit tricky. We can’t even be certain that the scales are consistent. If you recall our choropleth maps from the other day, the scale of the orange was inconsistent with the scale of the blue-greys. Though, given this is produced by the Times, I would give them the benefit of the doubt.

Furthermore, we have five different colours. I presume that the darkest blues and reds represent the greatest share. But without a scale, let alone a legend, it’s difficult to say for certain. The grey is presumably the mixed or nearly even bin, again similar to what I described in the first post about choropleths from my recent series.

Finally, if we look at the y-axis, we see a few interesting decisions. The first? The placement of the axis labels. Typically we would see the labelling on the outside of the plot, but here it’s all aligned on the inside. Intriguingly, the designers took care with the placement (or have their paragraph and character styles well set), as the text interrupts the axis and grid lines, i.e. the grey lines stop short of the text rather than running through it.

The second? Wyoming. I don’t think every single chart needs to have all its outliers within the bounds of the plot. I’ve taken the same approach myself, so I won’t criticise it, but I wonder what the chart would have looked like if the maximum had been 35% and the grid lines were set at intervals of 5%. The tradeoff is likely increased difficulty in labelling the dots, and that too is a decision I’ve made.

Third, the lack of a zero. I feel fairly comfortable assuming the bottom of the y-axis is zero. But I would have gone ahead and labelled it all the same, especially because of how the minimum value for the axis is handled in the next graphic.

Speaking of which, in the second graphic we can see that the ellipsis completes the sentence.

Vaccination rates compared to the election results

We otherwise run into similar issues. Again, there is a lack of labelling on the x-axis, which makes it difficult to assess whether we are looking at the same scale. I am fairly certain we are, because when I overlap the graphics I can see that the two extremes, Wyoming and Vermont, appear to sit in the same places on the axis.

We also still see the same issues for the y-axis. This time the axis represents vaccination rates. I wish this graphic made the distinction between partial and full vaccination rates a little clearer. Partial is good, but full vaccination is what really matters. And while this chart shows Pennsylvania, for example, at over 40% vaccinated, that’s misleading. Full vaccination is about 15 points lower, at about 25%. And that’s the number that needs to be up in the 75% range for herd immunity.

But back to the labelling, here the minimum value, 20%, is labelled. I can’t really understand the rationale for labelling the one chart but not the other. It’s clearly not a spacing issue.

I have some concerns about the numbers chosen for the minimum and maximum values of the y-axis. However, towards the middle of the article, this basic construct is used to build a small multiples matrix looking at all 50 states and their rates of vaccination. More on that in a moment.

My last point about this graphic is on the super picky side. Look at the letter g in “of residents given”. It gets clipped. You can still largely read it as a g, but I noticed it. Not sure why it’s happening, though.

So that small multiples graphic I mentioned, well, see below.

All 50 states compared

Note how these use an expanded version of the larger chart. The y-minimum appears to be 0%, but again, it would be very helpful if that were labelled.

Also for the x-axis in all the charts, I’m not sure every one needs the Biden–Trump label. After all, not every chart has the 0–60% range labelled, but the beginning of each row makes that clear.

On the super picky side, I wish that final row were aligned with the four above it. I find it really distracting, but that’s probably just me.

Overall, this is a strong piece that makes good use of a number of the standard data visualisation forms. But I wish the graphics were a bit tighter, to make them just a little clearer.

Credit for the piece goes to Danielle Ivory, Lauren Leatherby and Robert Gebeloff.

Choropleths and Colours Part 2

Last Thursday I wrote about the use of colour in a choropleth map from the Philadelphia Inquirer. Then on Sunday morning, I opened the door to collect the paper and saw a choropleth above the fold on the New York Times. I’ll admit my post was a bit lengthy (I’ve never been described as short of words), but the key point was that in the Inquirer piece the designer opted to use a blue-to-red palette for what appeared to be a data set whose numbers ran in one direction. The bins described the number of weeks a house remained on the market; in other words, the values could only go up from zero, as there are no negative weeks.

Compare that to this graphic from the Times.

More choropleth colours…

Here we are not looking at the Philadelphia housing market, but rather at the spread of the UK/Kent variant of SARS-CoV-2, the virus that causes COVID-19. (In the States we call it the UK variant, but in the UK they obviously don’t; they call it the Kent variant, after the county where it first emerged.)

Specifically, the map looks at the variant’s share (as a percentage) of the tests reported for each country; the variant is technically named B.1.1.7. The Inquirer map had six bins; this Times map has five. The Inquirer’s, as I noted above, went from less than one week to over five weeks. This map divides 100% into five 20-percent bins.
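Equal-width bins like these are easy to reproduce. A minimal sketch; treating each bin as including its lower bound (so exactly 20% lands in 20–40%) and folding 100% into the top bin are my assumptions, since the map’s legend doesn’t spell out its boundary rule:

```python
def share_bin(share: float, bins: int = 5) -> int:
    """Index of the equal-width bin (0 = lowest) for a share between 0 and 100 percent."""
    if not 0 <= share <= 100:
        raise ValueError("share must be between 0 and 100")
    width = 100 / bins
    # Lower bounds are inclusive; 100% would otherwise open a sixth bin, so clamp it.
    return min(int(share // width), bins - 1)


def bin_label(index: int, bins: int = 5) -> str:
    """Legend text for a bin, e.g. '20-40%'. Assumes `bins` divides 100 evenly."""
    width = 100 // bins
    return f"{index * width}-{(index + 1) * width}%"


for share in (0, 19.9, 20, 73, 100):
    print(share, bin_label(share_bin(share)))
```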

Unlike the Inquirer map, however, this one keeps to one “colour”. Last week I explained why a single colour in this sense can mean a ramp from yellow to red, like the one we see here.

This map makes better use of colour. It intuitively depicts an increasing share of the variant with a deepening red. The equivalent of last week’s map would have shown, say, 0–40% in different shades of blue, which doesn’t make any sense by default. You could create some kind of benchmark (though off the top of my head none come to mind) where you might want to split the legend into two directions, but in this default setting, one colour headed in one direction makes a lot of sense.
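A single-hue sequential ramp like this can be generated by interpolating between two end colours. A minimal sketch; the pure yellow and red endpoints are placeholders rather than the Times’s actual palette, and straight RGB interpolation is a simplification (published sequential palettes are usually tuned so each step looks evenly spaced to the eye):

```python
def hex_to_rgb(colour: str) -> tuple[int, int, int]:
    """'#rrggbb' -> (r, g, b) with each channel 0-255."""
    colour = colour.lstrip("#")
    return tuple(int(colour[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """(r, g, b) -> '#rrggbb'."""
    return "#" + "".join(f"{channel:02x}" for channel in rgb)


def sequential_ramp(start: str, end: str, steps: int) -> list[str]:
    """`steps` evenly spaced colours from `start` to `end`, interpolated linearly in RGB."""
    if steps < 2:
        raise ValueError("need at least two steps")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    ramp = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the lightest bin, 1.0 at the darkest
        ramp.append(rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b))))
    return ramp


# One colour per bin, lightest (lowest share) to darkest (highest share).
print(sequential_ramp("#ffff00", "#ff0000", 5))
```

Every bin moves in the same direction along one ramp, so a darker shade always means a larger share; a diverging palette would imply a meaningful midpoint that this data doesn’t have.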

Separately, the map makes a lot of sense here because it shows the geographic spread of the variant, rippling outward from the UK. The first significant impacts register in the countries across the Channel and the North Sea. But within four months, the variant can be found in significant percentages across the continent.

Credit for the piece goes to Josh Holder, Allison McCann, Benjamin Mueller, and Bill Marsh.

Discontinuous Lead Bars

Last week the Guardian published an article about drinking water pollution across the United States. Overall, it was a nicely done piece and the graphics within segmented the longer text into discrete sections. Each unit looks similar:

PFAs.

The left focuses on a definition and provides contextual information. It includes small illustrations of the mechanisms by which the pollutant enters the water system. To the right is a chart showing the levels of the contamination detected in the 120 tests the Guardian (and its partner Consumer Reports) conducted.

In almost all of the charts, we see the maximum depicted on the y-axis. And the bars are coloured if that observation station exceeds the health and safety limits. (The limit is represented by the dotted line.)

But towards the end of the piece we get to lead, a particularly problematic pollutant. There is no safe level of lead contamination. But how the piece handles the lead chart leaves a bit to be desired.

But how bad is it, really?

The first thing is colour, but that’s okay. Everything is red, but again, there is no safe level of lead, so everything is over the limit. But look at the y-axis. That little black line at the top indicates a discontinuity in the bars; in other words, the values for those three observations are literally off the chart.

But does that work?

First, this kind of thing happens all the time. If you ever have to work with data on China or India, you’ll often find that those two nations, due to their sheer population size, skew datasets that involve people. But in these kinds of situations, how do we handle data points that are off the charts?

There is value in including those points. They can show how extreme an outlier those observations truly are. In other words, they can help with data transparency, i.e. you’re not trying to hide data points that don’t fit the narrative you’re working with.

In this piece, it’s never explicitly stated what the largest value in the data set is, but I interpret it as being 5.8. So what happens if we make a quick chart showing a value of 6 (because it’s easier than 5.8)? I added a blue bar to distinguish it from the rest of the chart.

It’s pretty bad.

You can see that including the data point drastically changes how the chart looks. The number falls well outside the graphic, but it also shows just how dangerously high that one observation truly is.

But if you say, well yeah, but that falls outside the box allowed by the webpage, you’re correct. There are ways it could be handled to sit outside the “box”, but that would require some extra clever bits. And this isn’t a print layout where it’s much easier to play with placement. So what happens when we resize that graphic to fit within its container?

And resized

You can see that all the other bars become quite small. And this is probably why the designers chose to break the chart in the first place. But as we’ve established, in doing so they’ve minimised the danger of those few off-the-charts sites, and they’ve left off the context showing that for the vast majority of sites the situation is not nearly as dire (though, again, no lead is good lead).
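The squeeze in the resized version is plain linear scaling: each bar’s height is its value divided by the axis maximum, times the height available. A minimal sketch, assuming a hypothetical 300-pixel container and comparing a 0.5 bar (roughly where most sites sit) with the stand-in value of 6:

```python
def bar_heights(values: list[float], axis_max: float, container_px: float) -> list[float]:
    """Pixel height of each bar on a linear axis running from 0 up to `axis_max`."""
    return [container_px * v / axis_max for v in values]


CONTAINER_PX = 300  # hypothetical chart height

# Axis stretched to fit the worst site (6, standing in for 5.8).
print(bar_heights([0.5, 6], axis_max=6, container_px=CONTAINER_PX))  # [25.0, 300.0]

# Axis sized for the typical sites instead: the 6 would need 1,800 px.
print(bar_heights([0.5, 6], axis_max=1, container_px=CONTAINER_PX))  # [150.0, 1800.0]
```

At the first scale a typical site shrinks to a twelfth of the chart’s height; at the second, the worst site needs six times the space available. That is the trade-off the designers faced.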

What else could have been done? If maintaining the height of the less affected bars was paramount, the designers had a few other options they could have used. First, you could exclude those observations and perhaps put a line below the 118 text that says “for three sites, the data was off the charts and we’ve excluded them from the set below.”

I have used that approach in the past, but only with great reluctance. You are removing important outliers from the data set, and the set is not complete without them. After all, if you are looking to use this data set to inform a policy choice, such as which communities should receive emergency funding to reduce lead levels, I’d want to start with the city in blue. Sure, I would like everyone to get money, but we’d have to prioritise resources.

I think the best compromise here would actually have been a small tweak to the original. Above the three broken bars (or perhaps to the right, with some labelling), label the discontinuous data points with their values. That provides clearer context for the vast majority of the sites, which are below 0.5 ppb.

As easy as ABC

This preserves the ability to easily compare the lower level observations, but provides important context of where they sit within the overall data set by maintaining the upper limits of the worst offenders.
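The labelled-break compromise is straightforward to prepare in code: clip each value at the axis maximum for drawing, and keep the true value for any bar that was clipped so it can be printed above the break. A minimal sketch; apart from the 5.8 maximum inferred above, the readings and the 1.0 axis maximum are made up for illustration:

```python
def cap_for_broken_axis(values: list[float], axis_max: float) -> tuple[list[float], dict[int, str]]:
    """Clip values at `axis_max` and return labels for the bars that were cut off.

    Returns (drawn, labels): `drawn` holds the heights to plot, and `labels`
    maps the index of each clipped bar to its true value as text.
    """
    drawn = [min(v, axis_max) for v in values]
    labels = {i: f"{v:g}" for i, v in enumerate(values) if v > axis_max}
    return drawn, labels


readings = [0.2, 5.8, 0.4, 1.3, 0.1]  # hypothetical site readings
drawn, labels = cap_for_broken_axis(readings, axis_max=1.0)
print(drawn)   # [0.2, 1.0, 0.4, 1.0, 0.1]
print(labels)  # {1: '5.8', 3: '1.3'}
```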

Credit for the piece goes to the Guardian’s graphics department.

Impeachment 2: The Insurrection

Like many Americans, I closely followed yesterday’s historic vote by the House of Representatives to impeach President Trump for inciting an insurrection at the US Capitol, a failed coup attempt to overturn the 2020 election.

Words I still never thought I’d write describing an American election.

So at the end of the vote, I created this first graphic to capture the bipartisan nature of the impeachment. Ten Republicans broke ranks and voted with the Democrats. Keep in mind that in the first impeachment vote, in December 2019, zero Republicans did the same; Justin Amash had by then left the Republican Party and sat as an independent.

But I was also interested in how “courageous” these votes could be considered. Trump remains immensely popular with his base despite his attempt to overthrow the US government and keep himself in power. Did the Republicans who supported impeachment sit in districts won by Biden?

The answer? Not really. Two did: congressmen from New York and California. But a look at the other eight reveals they represent Trump-supporting districts.

To be fair, there are probably three tiers of seats in that group. Liz Cheney, the No. 3 Republican in the House, is in a tier of her own, holding a strongly Trump-supporting seat as Wyoming’s at-large representative. But four other Republicans have seats where Trump won by more than 10 points.

Three more Republicans are in seats I’d label competitive, but lean Republican.
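The tiers boil down to a simple rule on Trump’s 2020 margin in each district. A minimal sketch; the 10-point cut-off comes from the text above, while treating every smaller Trump win as “competitive, lean Republican” is my simplifying assumption, and the margins in the example are placeholders rather than real district results. Note that a margin rule alone would put Cheney’s seat in the top tier rather than setting it apart:

```python
def seat_tier(trump_margin: float) -> str:
    """Rough tier for a district from Trump's 2020 margin in points (negative = Biden won)."""
    if trump_margin < 0:
        return "Biden-won"
    if trump_margin > 10:
        return "Trump by more than 10"
    # Assumption: any Trump win of 10 points or less counts as competitive.
    return "competitive, lean Republican"


for margin in (-3.0, 4.0, 10.0, 25.0):  # placeholder margins, not real districts
    print(margin, seat_tier(margin))
```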

Clearly the argument can be made that for most of these Republicans, voting for impeachment was not a politically safe choice. House seats will be redrawn this year for the 2022 midterms, and I’ll be curious to see how these Republicans fare in those redistricting proceedings and then in the elections that follow.

Credit for the piece is mine.

Parties in Pennsylvania

This is from a social media post I made a few days ago, but I think it may be of some relevance or interest to my Coffeespoons followers. With more than 30 days to go until the general election, I was curious to see how the landscape had changed for the two parties since 2016.

Well, this project has driven me to a related, but slightly different project that has been consuming my non-work time. Hopefully I will have more on that in the coming days. Without further ado, the post:

Pennsylvania will likely be one of the more critical battleground swing states in this year’s election. In 2016, then-candidate Trump won the state by less than one percentage point. But four years is a long time, and I was curious to see how things have changed.

In the first chart, counties won by Trump are on the right and those won by Clinton are on the left. The further from the centre, the greater the candidate’s margin of victory over the other. The top half plots registered Republicans’ margin over Democrats as a percentage of all registered voters in the county (including independents and third party), and the bottom half does the same for Democrats. The closer to the centre, the more competitive; the further away, the less so.
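The vertical axis is a simple ratio. A minimal sketch of the calculation as described, with made-up county totals; because the denominator counts independents and third-party registrants, the margin comes out smaller than it would as a share of the two major parties alone:

```python
def registration_margin(republicans: int, democrats: int, total_registered: int) -> float:
    """Republican minus Democratic registrations, as a percentage of all registered voters.

    Positive values plot in the top (Republican) half, negative in the bottom half.
    `total_registered` includes independents and third-party registrants.
    """
    if total_registered <= 0 or republicans + democrats > total_registered:
        raise ValueError("total registrations must be positive and cover both parties")
    return 100 * (republicans - democrats) / total_registered


# Hypothetical counties, not real Pennsylvania data.
print(registration_margin(45_000, 40_000, 100_000))  # 5.0
print(registration_margin(30_000, 50_000, 100_000))  # -20.0
# The same first county as a share of the two major parties only:
print(round(100 * (45_000 - 40_000) / (45_000 + 40_000), 1))  # 5.9
```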

Trump’s key to victory was the white, working-class voter clustered in the west and the northeast of the state: old mining and steel towns. There, Democrats normally counted on the support of organised labour, whose members were often registered Democrats. That all but collapsed in 2016. The bottom right shows a number of nominally Democratic counties that Trump won, whereas Clinton picked up only one Republican county, Chester.

But what are PA’s battlegrounds?

In the second chart we ignore places like Philly and Fulton County and zoom in on more competitive counties within 20 point margins. Polls presently point to a Biden lead of about 5 points in PA. If every dot moved left by 5 points (it doesn’t really work like that), we only see Erie and Northampton with potential to flip.
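The back-of-the-envelope test here is a uniform swing: shift every county’s margin by the same amount and see which cross the centre line. A minimal sketch; as noted above, real elections don’t move uniformly, and the county names and margins are placeholders, not the actual 2016 results:

```python
def flips_under_uniform_swing(trump_margins: dict[str, float], swing: float) -> list[str]:
    """Trump-won counties that would flip if every margin moved `swing` points toward the Democrat.

    Margins are Trump minus Clinton, in points; negative means Clinton won.
    A county landing exactly on zero is treated as a tie, not a flip.
    """
    return sorted(name for name, margin in trump_margins.items()
                  if margin > 0 and margin - swing < 0)


margins = {"County A": 1.5, "County B": 4.0, "County C": 12.0, "County D": -8.0}
print(flips_under_uniform_swing(margins, 5.0))  # ['County A', 'County B']
```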

But Trump’s realignment of politics is accelerating a realignment of PA’s political geography (more on this another day).

In the fourth chart, neither Erie nor Northampton show any real movement via party registration back to Democrats. Erie may flip, but Northampton’s likely a stretch. Places like Cumberland and Lancaster counties are too solidly Republican to flip this year. Instead Trump is more likely to flip counties like Monroe and Lehigh red, even if he loses the state.

It’s not shown here, but the key to a Biden victory will be running up the margins in Philly and Pittsburgh and, to a lesser extent, in Philly’s four collar counties, including Chester, which appears to be shifting rapidly in Democrats’ favour.

Credit for the piece is mine.