datagraphic – Page 2 – Coffee Spoons

A Visual History of the International Space Station

When I was in high school, the United States would regularly send space shuttles into orbit to help build this new thing: the International Space Station (ISS). In the aftermath of the Cold War, the nations of the world joined together to commit to building an orbital space station.

There was, of course, a time before the ISS, and I can recall many jokes being made about Mir, the Soviet-then-Russian space station. And before Mir there were others, though none as long-lasting. But I digress: we’re here today because Canadian astronaut Chris Hadfield recently tweeted a graphic made by Peter Batenburg that visually captures the history of the International Space Station.

I think my favourite element is the graphical representation of the ISS’s expansion in terms of its volume. I’ve seen similar sorts of graphics showing the addition of modules and new components, but I can’t recall seeing one that captures the amount of space where people can live and work.

But really, the whole piece is worth sitting down with and enjoying. After all, the ISS is only about 22 years old. But there are questions about how much longer it will remain in orbit. I’m not aware of any concrete plans to fund it beyond 2030, or of any plans for an eventual replacement.

We can only hope that the ISS and its successor remain an area that fosters international cooperation for the next thirty years.

Credit for the piece goes to Peter Batenburg.

The Super Short European Super League

Sunday night, news broke that a number of European football clubs were creating a rogue league, the European Super League. My British and European readers—and Americans who follow football—will know the names of Manchester United, Liverpool, AC Milan, Juventus, Real Madrid, and the others.

To put this in perspective for my American readers, imagine the Yankees, Dodgers, Red Sox, Astros, Padres, Mets, Cardinals, Phillies, Angels, and Nationals saying that they were leaving Major League Baseball to go and form their own new baseball league. That they were doing so to “save the sport”. But in so doing, they also guarantee they all make the playoffs every year.

My frequent readers and those who know me will know I’m a fan of the Boston Red Sox. I should point out that John Henry owns both the Red Sox and Liverpool through his company, Fenway Sports Group.

Of course, the analogy doesn’t quite hold up, because there are some significant differences between American sports and European football. Relegation is a big one. Personally, I wish American sports had some way of using relegation to incentivise teams to not intentionally suck.

Here’s the basic premise of relegation. Take English football. You have four levels of play, and in theory any team can exist at any level. Each year, the worst teams move down one level whilst the best teams move up. And at the top level, the top teams get to compete in lucrative European-wide matches. That is a bit simplistic, but imagine that at the end of last year, the Pirates, Rangers, Tigers, and Red Sox became AAA minor league teams and the four best AAA minor league teams became MLB teams. MLB teams would theoretically do everything they could to stay in MLB and not drop to AAA, because that would mean a loss of money. After all, the Yankees would no longer be heading to Fenway, nor the White Sox to Detroit. Would seeing the Detroit Tigers play the Woo Sox really be worth the ticket prices you pay at Comerica Park?

But that’s not how American sports work. And so a few American owners, namely those of Manchester United, Arsenal, and Liverpool, want to ensure a steady stream of money. By creating their own league where their teams cannot be relegated, they guarantee that revenue stream.

In other words, this is all about the owners of these Super League teams making even more money.

After all, during the last year teams have been hurting without fans in attendance. And that brings me to why I can write this up here: the BBC, in an article about this new league, addressed the fact that most of these teams are heavily in debt.

This graphic, however, is a bit misleading. Look at Liverpool. There is no available data for how much financial debt the club holds. So why is it placed between Chelsea and Manchester City? It could well have more debt than Tottenham. Liverpool should really be left off this chart and included in the note, because its placement suggests that it has little debt, when that may well not be the case. This is a really misleading graphic when it comes to how Liverpool fits with the other 11 clubs.
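For anyone building a chart like this, one way to sidestep the Liverpool problem is to separate out clubs with missing values before ranking, then mention them in a note rather than slotting them somewhere into the order. Here’s a minimal Python sketch of that idea. The club names and figures are placeholders, not the BBC’s data, and `split_ranked_and_missing` is just a hypothetical name for illustration.

```python
from typing import Optional


def split_ranked_and_missing(debts: dict[str, Optional[float]]):
    """Rank clubs with known debt (largest first) and list clubs with no data separately."""
    known = {club: debt for club, debt in debts.items() if debt is not None}
    missing = sorted(club for club, debt in debts.items() if debt is None)
    ranked = sorted(known.items(), key=lambda item: item[1], reverse=True)
    return ranked, missing


# Placeholder clubs and figures in £m, for illustration only.
debts = {"Club A": 500.0, "Club B": None, "Club C": 120.0, "Club D": -50.0}
ranked, missing = split_ranked_and_missing(debts)

for club, debt in ranked:
    print(f"{club}: £{debt:.0f}m")  # only clubs with real data get a bar
if missing:
    print("Note: no debt data available for " + ", ".join(missing))
```

The club without data never gets a position in the ranking, so nobody can read its placement as a claim about how much it owes.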

From a design standpoint, I’m also not clear on why the x-axis line extends beyond the labels for £-200m and £600m.

I’m not going to touch all the data labels. That’s for another piece I’ve been working on off and on for a little while now.

At this point I should note that I was going to post this article later, but in the last 18 hours or so the whole thing has fallen apart as the English teams, followed by the others, have dropped out under immense pressure from the sport and their fans. To bring back my analogy above, imagine MLB retaliating and saying that if those teams created their own league, their players would not be allowed to play in any other matches and the teams would be locked out of all other competitive baseball. It’s a mess.

Credit for the piece goes to the BBC graphics department.

Politicising Vaccinations

Yesterday I wrote my usual weekly piece about the progress of the Covid-19 pandemic in the five states I cover. At the end I discussed the progress of vaccinations and how Pennsylvania, Virginia, and Illinois all sit around 25% fully vaccinated. Of course, I leave my write-up at that. But not everyone does.

This past weekend, the New York Times published an article looking at the correlation between Biden–Trump support and rates of vaccination. Perhaps I should not be surprised that such a piece exists, nor by its premise.

From a design standpoint, the piece makes use of a number of different formats: bars, lines, choropleth maps, and scatter plots. I want to talk about the last of these in this piece. The article begins with two side-by-side scatter plots, this being the first.

Hesitancy rates compared to the election results

The header ends in an ellipsis, but that makes sense because the next graphic, which I’ll get to shortly, continues the sentence. But let’s look at the rest of the plot.

Starting with the x-axis, we have a fairly simple plot here: votes for the candidates. But note that there is no scale. The header provides the necessary definition of being a share of the vote, but the lack of minimum and maximum makes an accurate assessment a bit tricky. We can’t even be certain that the scales are consistent. If you recall our choropleth maps from the other day, the scale of the orange was inconsistent with the scale of the blue-greys. Though, given this is produced by the Times, I would give them the benefit of the doubt.

Furthermore, we have five different colours. I presume that the darkest blues and reds represent the greatest share. But without a scale, let alone a legend, it’s difficult to say for certain. The grey presumably represents the mixed or nearly even bin, again similar to what I described in the first post about choropleths from my recent string of posts.

Finally, if we look at the y-axis, we see a few interesting decisions. The first? The placement of the axis labels. Typically we would see the labelling on the outside of the plot, but here, it’s all aligned on the inside of the plot. Intriguingly, the designers took care with the placement (or have their paragraph and character styles well set up), as the text interrupts the axis and grid lines, i.e. the text does not collide with the grey lines.

The second? Wyoming. I don’t always think that every single chart needs to have all the outliers within the bounds of the plot. I’ve definitely taken the same approach and so I won’t criticise it, but I wonder what the chart would have looked like if the maximum had been 35% and the grid lines were set at intervals of 5%. The tradeoff is likely increased difficulty in labelling the dots. And that too is a decision I’ve made.

Third, the lack of a zero. I feel fairly comfortable assuming the bottom of the y-axis is zero. But I would have gone ahead and labelled it all the same, especially because of how the minimum value for the axis is handled in the next graphic.

Speaking of which, moving on to the second graphic, we can see the ellipsis completes the sentence.

Vaccination rates compared to the election results

We otherwise run into similar issues. Again, there is no labelling on the x-axis, which makes it difficult to assess whether we are looking at the same scale. I am fairly certain we are, because when I overlay the graphics, the two extremes, Wyoming and Vermont, appear to sit at the same places on the axis.

We also still see the same issues for the y-axis. This time the axis represents vaccination rates. I wish this graphic made the distinction between partial and full vaccination rates a little clearer. Partial is good, but full vaccination is what really matters. And while this chart shows Pennsylvania, for example, at over 40% vaccinated, that’s misleading. Full vaccination is 15 points lower, at about 25%. And that’s the number that needs to be up in the 75% range for herd immunity.

But back to the labelling: here the minimum value, 20%, is labelled. I can’t really understand the rationale for labelling the one chart but not the other. It’s clearly not a spacing issue.

I have some concerns about the numbers chosen for the minimum and maximum values of the y-axis. However, towards the middle of the article, this basic construct is used to build a small multiples matrix looking at all 50 states and their rates of vaccination. More on that in a moment.

My last point about this graphic is on the super picky side. Look at the letter g in “of residents given”. It gets clipped. You can still largely read it as a g, but I noticed it. Not sure why it’s happening, though.

So that small multiples graphic I mentioned, well, see below.

Note how these use an expanded version of the larger chart. The y-minimum appears to be 0%, but again, it would be very helpful if that were labelled.

Also for the x-axis in all the charts, I’m not sure every one needs the Biden–Trump label. After all, not every chart has the 0–60% range labelled, but the beginning of each row makes that clear.

On the super picky side, I wish that final row were aligned with the four above it. I find it super distracting, but that’s probably just me.

Overall, this is a strong piece that makes good use of a number of the standard data visualisation forms. But I wish the graphics were a bit tighter to make them just a little clearer.

Credit for the piece goes to Danielle Ivory, Lauren Leatherby and Robert Gebeloff.

Choropleths and Colours Part 2

Last Thursday I wrote about the use of colour in a choropleth map from the Philadelphia Inquirer. Then on Sunday morning, I opened the door to collect the paper and saw a choropleth above the fold in the New York Times. I’ll admit my post was a bit lengthy (no one has ever described me as short of words), but the key point was that in the Inquirer piece the designer opted for a blue-to-red palette for what appeared to be a data set whose numbers ran in one direction. The bins described the number of weeks a house remained on the market; in other words, the values could only go up, as there are no negative weeks.

Compare that to this graphic from the Times.

Here we are not looking at the Philadelphia housing market, but rather the spread of the UK/Kent variant of SARS-CoV-2, the virus that causes COVID-19. (In the States we call it the UK variant, but obviously in the UK they don’t; they call it the Kent variant, after the English county where it first emerged.)

Specifically, the map looks at the share (percent) of the variant, technically named B.1.1.7, in the tests reported for each country. The Inquirer map had six bins; this Times map has five. The Inquirer, as I noted above, went from less than one week to over five weeks. This map divides 100% into five 20-percent bins.

Unlike the Inquirer map, however, this one keeps to one “colour”. Last week I explained why a single “colour” can still mean a ramp from yellow to red, like the one we see here.

This map makes better use of colour. It intuitively depicts increasing…virus share, if that’s a phrase, with a deepening red. The equivalent of last week’s map would have shown, say, 0–40% in different shades of blue. That doesn’t make any sense by default. You could create some kind of benchmark (though off the top of my head none come to mind) where you might want to split the legend into two directions, but in this default setting, one colour headed in one direction makes far more sense.
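To make the one-colour, one-direction idea concrete, here’s a small Python sketch that assigns a share to one of five 20-point bins and looks up a colour that only gets darker as the share rises. The hex values are an illustrative yellow-to-red ramp of my own choosing, not the Times’ actual palette, and putting a value that sits exactly on a boundary (say, 20%) into the higher bin is my assumption.

```python
import bisect

# Upper edges of the first four 20-point bins; the fifth bin covers 80-100%.
BIN_EDGES = [20, 40, 60, 80]
# Illustrative light-to-dark yellow-to-red ramp (hypothetical, not the Times' colours).
SEQUENTIAL_RAMP = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]


def bin_index(share: float) -> int:
    """Return 0-4 for a variant share given in percent (0-100)."""
    if not 0 <= share <= 100:
        raise ValueError("share must be between 0 and 100")
    # bisect_right puts a value sitting exactly on an edge into the higher bin.
    return bisect.bisect_right(BIN_EDGES, share)


def colour_for(share: float) -> str:
    """Darker colour for a larger share: one hue family, one direction."""
    return SEQUENTIAL_RAMP[bin_index(share)]


for share in (5, 20, 55, 97):
    print(share, colour_for(share))
```

A diverging palette would only earn its keep if there were a meaningful midpoint to split on. With shares running from 0% to 100%, the single ramp keeps the reading simple: darker means more.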

Separately, the map makes a lot of sense here because it shows the geographic spread of the variant, rippling outward from the UK. The first significant impacts register in the countries across the Channel and the North Sea. But within four months, the variant can be found in significant percentages across the continent.

Credit for the piece goes to Josh Holder, Allison McCann, Benjamin Mueller, and Bill Marsh.

Discontinuous Lead Bars

Last week the Guardian published an article about drinking water pollution across the United States. Overall, it was a nicely done piece and the graphics within segmented the longer text into discrete sections. Each unit looks similar:

The left side focuses on a definition and provides contextual information, including small illustrations of the mechanisms by which the pollutant enters the water system. To the right is a chart showing the levels of contamination detected in the 120 tests the Guardian (and its partner Consumer Reports) conducted.

In almost all of the charts, we see the maximum depicted on the y-axis. And the bars are coloured if that observation station exceeds the health and safety limits. (The limit is represented by the dotted line.)

But towards the end of the piece we get to lead, a particularly problematic pollutant. There is no safe level of lead contamination. But how the piece handles the lead chart leaves a bit to be desired.

The first thing is colour, but that’s okay. Everything is red, but again, there is no safe level of lead, so everything is over the limit. But look at the y-axis. That little black line at the top indicates a discontinuity in the bars; in other words, the values for those three observations are literally off the chart.

But does that work?

First, this kind of thing happens all the time. If you ever have to work with data on either China or India, you’ll often find those two nations, due to their sheer population size, skew datasets that involve people. But in these kinds of situations, how do we handle off-the-chart data points?

There is value in including those points. They can show how extreme an outlier those observations truly are. In other words, they help with data transparency, i.e. you’re not trying to hide data points that don’t fit the narrative you’re working with.

In this piece, it’s never explicitly stated what the largest value in the data set is, but I interpret it as being 5.8. So what happens if we make a quick chart showing a value of 6 (because it’s easier than 5.8)? I added a blue bar to distinguish it from the rest of the chart.

You can see that including the data point drastically changes how the chart looks. The number falls well outside the graphic, but it also shows just how dangerously high that one observation truly is.

But if you say, well yeah, but that falls outside the box allowed by the webpage, you’re correct. There are ways it could be handled to sit outside the “box”, but that would require some extra clever bits. And this isn’t a print layout where it’s much easier to play with placement. So what happens when we resize that graphic to fit within its container?

You can see that all the other bars become quite small. And this is probably why the designers chose to break the chart in the first place. But as we’ve established, in doing so they’ve downplayed the danger of those few off-the-chart sites and left out context showing that for the vast majority of sites, the situation is not nearly as dire (though, again, no lead is good lead).
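The shrinking follows directly from the arithmetic of a linear scale: every bar’s height is its value divided by the largest value, times the height of the container. Here’s a quick Python sketch, assuming a hypothetical 240-pixel container and a couple of made-up readings alongside the roughly 6 maximum.

```python
def bar_heights(values, container_px):
    """Linear scale: the largest value exactly fills the container."""
    top = max(values)
    return [value * container_px / top for value in values]


# Hypothetical readings: one site near the 6 maximum, the rest at or below 0.5.
readings = [6.0, 0.5, 0.25]
heights = bar_heights(readings, container_px=240)  # 240 px is an assumed container height
print(heights)
```

With a maximum of 6, a site at 0.5 gets just a twelfth of the available height, which is exactly the squashing described above.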

What else could have been done? If maintaining the height of the less affected bars was paramount, the designers had a few other options they could have used. First, you could exclude those observations and perhaps put a line below the 118 text that says “for three sites, the data was off the charts and we’ve excluded them from the set below.”

I have used that approach in the past, but with great reluctance. You are removing important outliers from the data set, and the set is not complete without them. After all, if you were using this data set to inform a policy choice, such as which communities should receive emergency funding to reduce lead levels, I’d want to start with the city in blue. Sure, I would like everyone to get money, but we’d have to prioritise resources.

I think the best compromise here would actually have been a small tweak to the original: above the three broken bars (or perhaps to their right, with some labelling), label the values of the discontinuous data points to provide clearer context for the vast majority of the sites, which are below 0.5 ppb.

This preserves the ability to easily compare the lower level observations, but provides important context of where they sit within the overall data set by maintaining the upper limits of the worst offenders.
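Here’s a rough sketch of that compromise in Python: cap each bar at the axis maximum, and for any bar that gets clipped, keep its true value as a label to print above it. The site names, the axis maximum of 1.0, and the `Bar` and `cap_and_label` names are all hypothetical; only the 5.8 reading and the ppb unit come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    site: str
    value: float            # the actual measured lead level (ppb)
    drawn: float            # the height actually drawn, capped at the axis maximum
    label: Optional[str]    # text printed above a clipped bar; None if not clipped


def cap_and_label(readings, axis_max):
    """Cap off-the-chart bars at axis_max but keep their true values as labels."""
    bars = []
    for site, value in readings:
        clipped = value > axis_max
        label = f"{value:g} ppb (off the chart)" if clipped else None
        bars.append(Bar(site, value, min(value, axis_max), label))
    return bars


# Hypothetical sites and an assumed axis maximum of 1.0 ppb.
bars = cap_and_label([("Site A", 0.3), ("Site B", 5.8), ("Site C", 0.1)], axis_max=1.0)
for bar in bars:
    print(bar.site, bar.drawn, bar.label or "")
```

The lower bars keep their readable heights, while the label preserves the one number the broken bar otherwise hides.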

Credit for the piece goes to the Guardian’s graphics department.

Impeachment 2: The Insurrection

Like many Americans I closely followed the outcome of yesterday’s historic vote by the House of Representatives to impeach President Trump for his incitement of an insurrection at the US Capitol in a failed coup attempt to overturn the 2020 election.

Words I still never thought I’d write describing an American election.

So at the end of the vote, I created this first graphic to capture the bipartisan nature of the impeachment. Ten Republicans broke ranks and voted with the Democrats. Keep in mind that in the first impeachment vote, in December 2019, zero Republicans did the same. Justin Amash, who voted to impeach, had by then left the Republican Party and sat as an independent.

But I was also interested in how “courageous” these votes could be considered. Trump remains immensely popular with his base despite his attempt to overthrow the US government and keep himself in power. Did the Republicans who supported impeachment sit in districts won by Biden?

The answer? Not really. Two did: congressmen from New York and California. But a look at the other eight reveals they represent Trump-supporting districts.

To be fair, there are probably three tiers of seats in that group. Liz Cheney, the No. 3 Republican in the House, is in a tier of her own as Wyoming’s at-large representative, a seat Trump won overwhelmingly. But four other Republicans have seats where Trump won by more than 10 points.

Three more Republicans are in seats I’d label competitive, but lean Republican.
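The tiering itself is simple enough to sketch in Python. This assumes each district is described by Trump’s 2020 margin in points (negative where Biden won) and uses the 10-point cut-off from above. The district names and margins are made up, and counting a margin of exactly 10 points as competitive is my own choice.

```python
from collections import Counter


def tier(trump_margin: float) -> str:
    """Classify a district by Trump's 2020 margin in points (negative = Biden won)."""
    if trump_margin < 0:
        return "Biden-won"
    if trump_margin > 10:
        return "Trump by more than 10"
    return "Competitive, lean Republican"  # 0 to 10 points, inclusive (an assumption)


# Hypothetical districts and margins, for illustration only.
margins = {"District 1": -1.5, "District 2": 4.0, "District 3": 18.0, "District 4": 9.0}
counts = Counter(tier(m) for m in margins.values())
print(counts)
```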

Clearly the argument can be made that for most of these Republicans, voting for impeachment was not a politically safe choice. House seats will be redistricted this year for the 2022 midterms, and I’ll be curious to see how these Republicans fare in those redistricting proceedings and then in the elections that follow.

Credit for the piece is mine.

Parties in Pennsylvania

This is from a social media post I made a few days ago, but I think it may be of some relevance or interest to my Coffeespoons followers. I was curious to see, with 30-plus days to go until the general election, how the landscape had changed for the two parties since 2016.

Well, this project has led me to a related but slightly different project that has been consuming my non-work time. Hopefully I will have more on that in the coming days. Without further ado, the post:

Pennsylvania will likely be one of the most critical battleground states in this year’s election. In 2016, then-candidate Trump won the state by less than one percentage point. But four years is a long time, and I was curious to see how things have changed.

In the first chart, we see counties won by Trump on the right and counties won by Clinton on the left. The further from the centre, the greater the candidate’s margin of victory over the other. The top half plots registered Republicans’ margin over Democrats as a percentage of all registered voters in the county (including independents and third parties), and the bottom half does the same for Democrats. The closer to the centre, the more competitive the county; the further away, the less so.
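For anyone wanting to build something similar, the two axes boil down to a pair of margins. Here’s a minimal Python sketch, assuming positive numbers mean Republican (right and top) and that the registration denominator includes independents and third parties, as described above. The county figures are hypothetical.

```python
def registration_margin(republicans: int, democrats: int, total_registered: int) -> float:
    """R minus D registration in points, as a share of ALL registered voters
    (independents and third parties count in the denominator)."""
    return (republicans - democrats) * 100 / total_registered


def vote_margin(trump: int, clinton: int, total_votes: int) -> float:
    """Trump minus Clinton in points; positive means Trump carried the county."""
    return (trump - clinton) * 100 / total_votes


def quadrant(vote_m: float, reg_m: float) -> tuple[str, str]:
    """Where a county lands: right/left by who won, top/bottom by registration.
    An exact zero is lumped in with left/bottom here, an arbitrary choice."""
    side = "right" if vote_m > 0 else "left"
    half = "top" if reg_m > 0 else "bottom"
    return side, half


# Hypothetical county: more registered Democrats, but Trump carried it.
reg = registration_margin(40_000, 50_000, 100_000)
vote = vote_margin(30_000, 25_000, 60_000)
print(round(reg, 1), round(vote, 1), quadrant(vote, reg))
```

A county like this one lands in the bottom right: nominally Democratic by registration, but carried by Trump.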

Trump’s key to victory was the white, working-class voter clustered in the west and the northeast of the state: old mining and steel towns. There, Democrats could normally count on organised labour voters registered as Democrats. That all but collapsed in 2016. The bottom right shows a number of nominally Democratic counties Trump won, whereas Clinton picked up only one Republican county, Chester.

But what are PA’s battlegrounds?

In the second chart we ignore places like Philly and Fulton County and zoom in on more competitive counties within 20-point margins. Polls presently point to a Biden lead of about 5 points in PA. If every dot moved left by 5 points (it doesn’t really work like that), only Erie and Northampton would have the potential to flip.
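The “move every dot left” thought experiment is what’s usually called a uniform swing. Here’s a quick Python sketch that finds which counties would cross zero. The county names and margins are made up (Republican margin in points, positive meaning Trump won), and treating an exact tie as not flipping is my assumption.

```python
def flips_under_uniform_swing(margins: dict[str, float], swing: float) -> list[str]:
    """Counties Trump carried (margin > 0) whose margin falls below zero
    if every county moves `swing` points toward Democrats."""
    return sorted(
        county for county, margin in margins.items()
        if margin > 0 and margin - swing < 0  # an exact tie is not counted as a flip
    )


# Hypothetical Republican margins in points (positive = Trump won in 2016).
margins = {"County A": 2.0, "County B": 4.5, "County C": 12.0, "County D": -3.0}
print(flips_under_uniform_swing(margins, swing=5))
```

As noted, real elections don’t move in lockstep like this, so it’s only a rough first look.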

But Trump’s realignment of national politics is also accelerating (more on this another day) a realignment of PA’s political geography.

In the fourth chart, neither Erie nor Northampton show any real movement via party registration back to Democrats. Erie may flip, but Northampton’s likely a stretch. Places like Cumberland and Lancaster counties are too solidly Republican to flip this year. Instead Trump is more likely to flip counties like Monroe and Lehigh red, even if he loses the state.

Because, though it’s not shown here, the key to a Biden victory will be running up the margins in Philly and Pittsburgh and, to a lesser extent, in Philly’s four collar counties, including Chester, which appears to be rapidly shifting in Democrats’ favour.

Credit for the piece is mine.

Shipping Out of Boston

Monday was the trade deadline for this year’s attempt at a baseball season. The Red Sox actively sold off parts of their roster. You may remember that just two years ago, the Red Sox won the World Series, the sport’s national championship. One would imagine that two years later, most of that championship-calibre roster would remain.

You would be wrong.

Well over half that roster is gone. And to prove it, I turned to a t-shirt I bought to celebrate the title. The t-shirt’s design featured the World Series roster on the reverse. (To be fair, there was a mistake, as Brandon Workman, who had been on the ALDS and ALCS rosters, was removed in favour of Drew Pomeranz. But Pomeranz is also gone, so, what do you know, the math still works.) I simply crossed out everyone who is no longer with the team.

Some people retired, like Steve Pearce, who, despite being World Series MVP, had his body simply give out and could no longer play the sport two years later. Others, like Blake Swihart, were really only on the roster so that they would not be lost to waiver claims. Still others, like Joe Kelly, understandably left in free agency for deals that were probably way overpriced. And others like Mitch Moreland were simply traded at the end of their contracts for potential prospects to build the next winning team.

And then there are the others.

Brock Holt: a fan-favourite super-utility player, a veritable Brockstar, whom the Red Sox never really considered retaining this past off-season. Jose Peraza is no Brock Holt.

And of course, last but certainly not least, we have the Mookie Betts situation. Because ownership has got to make its millions. Betts was a homegrown, fifth-round draft pick originally slotted in at second base. As he began to rise through the system, the thought was to trade him, because Dustin Pedroia blocked him at that position. Well, someone, somewhere (probably no longer in the organisation) had the idea to try him in the outfield. 2018 MVP much?

But he was traded to the Dodgers this off-season because ownership wouldn’t agree to an extension. A pricey one, to be fair, but one that an ownership group and a particular owner could surely afford, given that their holdings include (in whole or in part) the Red Sox, Fenway Park, NESN, Roush Fenway Racing (which controls two NASCAR cars) and, in the UK, Liverpool FC and Anfield, home of Liverpool FC. So, you know, they have some money. But they wouldn’t commit to paying a homegrown star his due to have him play his entire career in Boston.

So they flipped him to the Dodgers for a few prospects and one player, Alex Verdugo, who has a checkered past, including allegations that he was present near a sexual assault (though he is not alleged to have assaulted the victim, as he was reportedly in the other room) and, more directly, that he recorded on Snapchat the beating of the aforementioned victim by two other women who were in the room. None of this has been proven in court, however, because none of it was thoroughly investigated, allegedly because the Dodgers and their director of player development, who would later go on to manage the Phillies and now the Giants, did not really want it fully investigated. And by all accounts, the incident will never be fully investigated, so we’ll never really know what happened in that hotel room.

They traded Mookie Betts, generally perceived in the media as an all-around nice and humble guy, and also a champion bowler, for some savings, two minor-league prospects, and Alex Verdugo.

Credit for the original shirt goes to somebody on either the MLB or Red Sox design teams I would assume. The annotations are, of course, my own work.

Wednesday’s Covid-19 Data

Here we have the data from Wednesday for Covid-19.

Pennsylvania saw continued spread of the virus. Notably, Monroe County in eastern Pennsylvania passed 1000 cases. It was one of the state’s earliest hotspots. That appears to have been because it was advertised as a corona respite for people from New York, not too far to the east and by then in the grips of their own outbreak.

New Jersey grimly passed 5000 deaths Wednesday, and it is on track to pass 100,000 total cases, likely on Friday or Saturday. Almost 2/3 of these cases are located in North Jersey, with some South Jersey counties still reporting just a few hundred cases and a handful of deaths.

Delaware passed 3000 cases and Kent Co. passed 500. While those don’t read like large numbers, keep in mind the relatively small population of the state.

Virginia has restarted reporting deaths, this time at the county level and not the health district level. What we see is deaths being reported all over the eastern third of the state from DC through Richmond down to Virginia Beach. In the interior counties we are beginning to see the first deaths appear. And in western counties, we still see that the virus has yet to reach some locations, but counties are beginning to report their first cases.

Illinois continues to suffer greatly in the Chicago area, and at levels that dwarf the remainder of the state. However, the downstate counties are beginning to see spikes of their own. Macon and Jefferson Counties each saw increases of 30–40 cases in just 24 hours.


A longer-term look shows how the states diverge in their outbreaks. Pennsylvania looks like it might be forcing the curve downward, whereas New Jersey appears to be more on a plateau. Earlier I expressed concern about Virginia, which now appears not to have peaked and continues to see an increasing rate of spread. Then we have Illinois, which may have plateaued, but we need to see whether yesterday’s record number of new cases was a blip or an inflection point. And in Delaware, a missing day of records makes it trickier to see exactly what the trend is.
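On that last point, one common way to read a trend through noisy daily counts is a trailing average, and the key with a gap like Delaware’s is to treat the missing day as missing rather than as zero. Here’s a minimal Python sketch of that approach. I’m not claiming the charts above use it, the counts are invented, and skipping the gap (rather than, say, interpolating) is just one reasonable choice.

```python
from typing import Optional


def trailing_average(daily: list[Optional[float]], window: int = 7) -> list[Optional[float]]:
    """Mean of the last `window` calendar days, skipping missing (None) days.

    Returns None until a full window of days has elapsed, or if every
    day in the window is missing.
    """
    averages: list[Optional[float]] = []
    for i in range(len(daily)):
        if i + 1 < window:
            averages.append(None)
            continue
        present = [v for v in daily[i + 1 - window : i + 1] if v is not None]
        averages.append(sum(present) / len(present) if present else None)
    return averages


# Invented daily new-case counts with one missing day of records (None).
print(trailing_average([10, 12, None, 14, 16], window=3))
```

Had the missing day been recorded as a zero instead, it would drag the average down for a whole window, and any catch-up reporting the following day would then push it back up.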

Credit for the piece is mine.

Comparing Covid-19 to Influenza

I want to share a small graphic I made yesterday evening. And I am being charitable with the term graphic. Really it is nothing more than a collection of organised factettes. But I have seen the footage of those protesting the lockdowns in various states, including Pennsylvania.

To be clear, people can have different policy prescriptions to solve the pandemic. For example, the governor of Pennsylvania is considering lifting the lockdown piecemeal once the state overall has sufficient testing and tracing capabilities. Look at the state.

He rightly said that Cameron County, one of the little light purple shapes in the upper left, with its one case for the last 25 days is in a different situation than Philadelphia where cases continue to grow, albeit at a slowing rate. And in the future it is possible that Cameron County could open before Philadelphia. That is a different policy prescription than, say, opening the state all at once.

I don’t think most people enjoy lockdown. I haven’t left my building in 38 days and I cannot wait to leave and go do something. But I recognise that outside these walls a deadly pandemic is spreading, one for which we have no vaccine. But then I see people protesting (protesting in a manner that contradicts the guidelines put out by health officials) and claiming that we should open up because this is nothing worse than the flu.

Well, Covid-19 is not the flu. It is much worse.

This isn’t your grandmother’s flu. Or anyone else’s flu. Because this isn’t the flu.

Now, those numbers will change because the pandemic is ongoing. But, let’s spitball. Let’s assume those numbers hold. The idea of the shutdowns, lockdowns, and quarantines is to prevent the spread of the virus. For the sake of this thought experiment, let’s just assume, however, that it infects 56 million people, the upper end of the range for this most recent influenza season.

Influenza this year killed as many as 62,000 people after infecting 56 million. Hypothetically, with a mortality rate of 5%, Covid-19 would kill 2,800,000 people.

With a 4% rate that drops to 2,240,000.

With a 3% rate that drops to 1,680,000.

With a 2% rate that drops to 1,120,000.

With a 1% rate that drops to 560,000.

With a 0.5% rate that drops to 280,000.

And even at 0.5%, that is still far greater than the flu. That is why it is so important to keep the number of people infected as low as possible. (And I won’t even get into the problem of surges overwhelming hospitals, which acts as a force multiplier and is the proximate reason for the lockdowns.)
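For anyone who wants to check or rerun those numbers, here’s the arithmetic as a short Python sketch. It simply multiplies the 56 million hypothetical infections by each rate, and it also backs out the implied flu rate from the 62,000 figure. Like the thought experiment itself, it assumes the rates hold.

```python
INFECTIONS = 56_000_000  # the hypothetical: upper end of this season's flu infections
FLU_DEATHS = 62_000      # upper-end flu deaths for the same season


def deaths_at(rate_percent: float, infections: int = INFECTIONS) -> int:
    """Deaths if `infections` people are infected at the given fatality rate (in percent)."""
    return round(infections * rate_percent / 100)


for rate in (5, 4, 3, 2, 1, 0.5):
    print(f"{rate}%: {deaths_at(rate):,}")

implied_flu_rate = FLU_DEATHS / INFECTIONS * 100
print(f"Implied flu rate: {implied_flu_rate:.2f}%")
```

That implied flu rate works out to roughly 0.11%, so even the 0.5% scenario is about four and a half times the flu’s upper-end death toll.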

This is not the flu.

Credit for the piece is mine.