More on Those Million Covid-19 Deaths

Yesterday I focused on the big graphic from the New York Times that crossed the full spread of the front/back page. But that graphic was merely the lead for a larger piece. I linked to the online version of the article, but for this post I’m going to stick with the print edition. The article consists of a full-page open then an entire interior spread, all in limited colour. The remainder of the extensive coverage consists of photo essays and interviews that understandably attempt to humanise the data points. After all, each dot from yesterday represented one individual, solitary human being. That is an important element of a story like this and other national and international tragedies, but we also need to focus on the data and not let the emotion of the story overwhelm our rational and logical analysis.

Sometimes it’s hard to realise we’re in the third year of this pandemic.

From a data visualisation standpoint the first page begins simply enough with a long timeline of the Covid-19 pandemic charting the absolute number of deaths each day. As we looked at yesterday, the absolute deaths tell part of the story. But if we were to have looked at the absolute number of cases in conjunction with the deaths, we could also see how the virus has thus far evolved to be more transmissible but less lethal. Here the number of daily deaths from Omicron surpassed Delta, but fell short of the winter peak in early 2021. But the number of cases exploded with Omicron, making its mortality rate lower. In other words, far more people were getting sick, but far fewer were dying.

An interesting note is that if you take a look at the online version, there the designers chose a more stylised approach to presenting the data.

All the dots

Here they kept the dot approach and simply stacked and reordered the dots. However, I presume for aesthetic reasons, they kept the dots loosely stacked and dropped all the axis lines because it does make for a nice transition from the map to this chart. But they also dropped all headings and descriptors that tell the reader just what they are looking at. These decisions make the chart far less useful as a tool to tell the data-driven element of the story.

There are three annotations that label the number of deaths in New York, the Northeast, and the rest of the United States. But what does the chart say? When are the endpoints for those annotations? And then you can compare the scale of the y-axis of this chart to that of the printed version above. A more dramatic scale leads to a more dramatic narrative.

This sort of visual style of flash and fancy transitions over the clear communication of the data is why I find the print piece more compelling and more trustworthy. I find the online version still useful, but far more lacking in terms of information design.

The interior spread is where this article shines.

Just a fantastic spread.

From an editorial design standpoint, the symmetry works very well here. It’s a clear presentation and the white space around the graphic blocks lets that content shine as it should in this type of story. Collectively these pieces do a great job telling the story of the pandemic thus far across the nation. The graphics do not need a lot of colour and make do with sparse flash. Annotations call the reader’s attention to salient points and outliers.

Very nice work here.

From a content standpoint, I would be particularly curious if we have robust data for deaths by education level. Earlier this year I recall reading news about a study that said education best correlated to Covid cases, and I would be curious to see if that held true for deaths. Of course these charts do a great job of showing just how effective the vaccines were and remain. They are the best preventative measure we have available to us.

More really nice graphics

Here I disagree with the design decision of how to break down the states into regions. The Census Bureau breaks down the United States into four regions using the same names as in the graphic above. However, if you look closely at the inset map, you will see that Delaware, Maryland, and West Virginia in particular are included as part of the Northeast. (I cannot tell if the District of Columbia is included as part of the Northeast or South.)

Now compare that to the Census Bureau’s definition:

How the government defines US geography

If you ask me to include Delaware and Maryland as part of the Northeast, well, if you’re selling it, I’ll buy it. After all, just because the Census Bureau defines the United States this way does not mean the New York Times has to. Both are connected to the Northeast Corridor via Amtrak and I-95 and are plugged into the Megalopolis economy. Maybe the Potomac should be the demarcation between Northeast and South. But I struggle to understand West Virginia. Before you go and connect it to the Northeast, I would argue that West Virginia has far more in common with the Midwest geographically, economically, and culturally.

More critically, given this issue, it strikes me as a serious problem when the online version of the chart—with the aforementioned issues—does not even include the little inset to highlight this at best unusual regional definition.

Where would you place West Virginia?

And so while I have reservations about the data—how would the data have looked if the states were realigned?—the design of the line charts overall is good.

Again, I am talking about the print version, not that online graphic. I would argue that the above screenshot is barely even a chart and more “data art” or an illustration of data. Consider here, for example, that for the South we have that muted slate blue for the dots, but the spacing and density of the dots lead to areas of lighter slate and darker slate. But a lighter slate means more space between stacked dots and darker slate means a more compact design. A lighter colour therefore pushes the “edge” of the line further up the y-axis and artificially inflates its value, not that we can understand what that value is as the “chart” lacks any sort of y-axis.

Finally the print piece has a set of small multiples breaking down deaths by income in the three largest American cities: New York, Los Angeles, and Chicago. These are just great little charts showing the correlation between income and death from Covid, organised by Zip code.

But this also serves as a stark reminder of just how much better the print piece is over the online version. Because if we take a look at a screenshot from the online article, we have a graphic that addresses all the issues I pointed out earlier.

Why couldn’t the online article have kept to this style?

I am left to wonder why the reader of the online version does not have access to this clearer and more accurate representation of the data throughout the piece.

To me this article is a great example of when the print piece far exceeds the online version. Content-wise this is a great story that needed to be told this weekend, but design-wise we see a significant gap in quality from print to online. Suffice it to say that on Sunday I was very glad I received the print version.

Credit for the piece goes to Sarah Almukhtar, Amy Harmon, Danielle Ivory, Lauren Leatherby, Albert Sun, and Jeremy White.

Political Hatch Jobs

Earlier this week I read an article in the Philadelphia Inquirer about the political prospects of some of the candidates for the open US Senate seat for Pennsylvania, for which I and many others will be voting come November. But before I get to vote on a candidate, members of the political parties first get to choose whom they want on the ballot. (In Pennsylvania, independent voters like myself are ineligible to vote in party primaries.)

This year the Republican Party has several candidates running and one of them you may have heard of: Dr. Oz. Yeah, the one from television. And while he is indeed the front runner, he is not in front by much as the article explains. Indeed, the race largely had been a two-person contest between Oz and David McCormick until recently when Kathy Barnette pulled just about even with the two.

In fact, according to a recent poll the three candidates are all statistically tied in that they all fall within the margin of error for victory. And that brings us to the graphic from the article.

It would be funny to see a candidate finish with negative vote share.

Conceptually this is a pretty simple bar chart with the bar representing the share of the support of those polled. But I wanted to point out how the designer chose to represent the margin of error via hatched shading to both sides of the ends of the red bar.

In some cases the hatch job does not work for me, particularly with those smaller candidates where the bar goes negative. I would have grave reservations about the vote should any candidate win a negative share of the vote. 0% perhaps, but negative? No. I also don’t think the grey hatching works as well over the grey bar in particular and to a lesser degree the red.
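For what it’s worth, keeping a margin of error from dipping below zero takes very little logic. Below is a minimal Python sketch of the idea, not the Inquirer’s method: the candidate names, the shares, and the three-point margin are all hypothetical placeholders, and the overlap check is only a rough reading of “within the margin of error”.

```python
def error_interval(share, margin):
    """Return (low, high) for a polled share, clamped to the 0-100% range."""
    low = max(0.0, share - margin)     # a vote share cannot go negative
    high = min(100.0, share + margin)  # nor can it exceed 100%
    return low, high


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical poll results in percent; these are NOT the article's figures.
poll = {
    "Candidate A": 22.0,
    "Candidate B": 20.0,
    "Candidate C": 19.0,
    "Candidate D": 2.0,
}
MARGIN = 3.0  # hypothetical margin of error, in percentage points

intervals = {name: error_interval(share, MARGIN) for name, share in poll.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.1f}% to {high:.1f}%")
```

With these placeholder numbers the smallest candidate’s interval stops at 0% instead of running negative, and the intervals for the top three still overlap one another.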

I have often thought that these sorts of charts should use some kind of box plot approach. So this morning I took the chart above and reworked it.

Now with box plots.

Overall, however, I really like this designer’s approach. We should not fear subtlety and nuance, and margins of error are just that. After all, we need not go back too far in time to remember a certain candidate who thought she had a presidential election locked up when really her opponent was within the margin of error.

Credit for the piece goes to John Duchneskie.

All the Colours, All the Space

Everyone knows inflation is a thing. If not, when was the last time you went shopping? Last week the Boston Globe looked specifically at children’s shoes. I don’t have kids, but I can imagine how a rapidly growing miniature human requires numerous pairs of shoes and frequently. The article explores some of the factors going into the high price of shoes and uses, not very surprisingly, some line charts to show prices for components and the final product over time. But the piece also contains a few bar charts and that’s what I’d like to briefly discuss today, starting with the screenshot below.

What is going on here?

What we see here is a list of countries and the share of production for select inputs (leather, rubber, and textiles) in 2020. At the top we have a button that allows the user to toggle between the two and a little movement of the bars provides the transition. The length of the bar encodes the country in question’s market share for the selected material.

We also have all this colour, but what is it doing? What data point does the colour encode? Initially I thought perhaps geographic regions, but then you have the US and Mexico, or Italy and Russia, or Argentina and Brazil, all pairs of countries in the same geographic regions and yet all coloured differently. Colour encodes nothing and thus becomes a visual distraction that adds confusion.

Then we have the white spaces between the bars. The gap between bars is there because the country labels attach to the top of the bars. But, especially for the top of the chart, the labels are small and the gap is at just the right height such that the white spaces become white bars competing with the coloured bars for visual attention.

The spaces and the colours muddy the picture of what the data is trying to show. How do we know this? Because later in the article we get this chart.

Ahh, much better. Much clearer.

This works much better. The focus is on the bars, the labelling is clear, almost nothing else competes visually with the data. I have a few quibbles with this design as well, but it’s certainly an improvement over the earlier screenshot we discussed. (I should note that this graphic, as it does here, also comes after the earlier graphic.)

My biggest issue is that when I first look at the piece, I want to see it sorted, say greatest to least. In other words, Furniture and bedding sits at the top with its 15.8% increase, year-on-year, and then Alcoholic beverages last at 3.7%. The issue here, however, is that we are not necessarily looking at goods at the same hierarchical level.

The top of the list is pretty easy to consider: food, new vehicles, alcoholic beverages, shelter, furniture and bedding, and appliances. We can look at all those together. But then we have All apparel. And then immediately after that we have Men’s, Women’s, Boys’, Girls’, and Infants’ and toddlers’ apparel. In other words, we are now looking at a subset of All apparel. All apparel is at the same level as Food or Shelter, but Men’s apparel is not.

At that point we would need to differentiate between the two, whilst also grouping them together, because the values for those different sub-apparel groups make up the aggregate value for All apparel. And showing them all next to Food is not an apples-to-apples comparison.

If I were to sort these, I would sort from greatest to least by the parent group and then immediately beneath the parent I would display the children. To differentiate between parent-level and children-level, I would probably make the bars shorter in the vertical and then address the different levels typographically with the labels, maybe with smaller type or by putting the children in italic.
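That ordering can be sketched in a few lines of Python. This is only an illustration under assumptions: the row structure and function name are my own, and apart from the 15.8% and 3.7% figures mentioned above, the values are placeholders rather than the article’s data.

```python
# Each row has a label, an optional parent label, and a year-on-year change in percent.
rows = [
    {"label": "Alcoholic beverages", "parent": None, "change": 3.7},
    {"label": "All apparel", "parent": None, "change": 6.0},  # placeholder value
    {"label": "Men's apparel", "parent": "All apparel", "change": 5.0},  # placeholder
    {"label": "Infants' and toddlers' apparel", "parent": "All apparel", "change": 10.0},  # placeholder
    {"label": "Furniture and bedding", "parent": None, "change": 15.8},
]


def sort_with_children(rows):
    """Sort parents from greatest to least, each followed by its own sorted children."""
    def by_change(row):
        return row["change"]

    parents = sorted((r for r in rows if r["parent"] is None), key=by_change, reverse=True)
    ordered = []
    for parent in parents:
        ordered.append(parent)
        children = sorted(
            (r for r in rows if r["parent"] == parent["label"]),
            key=by_change,
            reverse=True,
        )
        ordered.extend(children)
    return ordered


ordered_labels = [row["label"] for row in sort_with_children(rows)]

for row in sort_with_children(rows):
    indent = "    " if row["parent"] else ""  # indent children beneath their parent
    print(f"{indent}{row['label']}: {row['change']}%")
```

The children never compete with the parents for position, so All apparel keeps its place among Food and Shelter whilst its subsets sit directly beneath it.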

Finally, again, whilst this is a massive improvement over the earlier graphic, I’d make one more addition, an addition that would also help the first graphic. As we are talking about inflation year-on-year, we can see how much greater costs are from Furniture and bedding to Alcoholic beverages and that very much is part of the story. But what is the inflation rate overall?

According to the Bureau of Labor Statistics, inflation over that period was 8.5%. In other words, a number of the categories above actually saw price increases less than the average inflation rate (that’s good) even though they were probably higher than increases had been prior to the pandemic (that’s bad). But, more importantly for this story, with the addition of a benchmark line running vertically at 8.5%, we could see how almost all apparel and footwear child-level line items were below the inflation rate. But the children- and infant-level items far exceeded that benchmark line, hence the point of the article. I made a quick edit to the screenshot to show how that could work in theory.
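The comparison against a benchmark is simple arithmetic. Here is a small Python sketch using only the three figures quoted above (8.5% overall, 15.8% for Furniture and bedding, 3.7% for Alcoholic beverages); the function and variable names are hypothetical.

```python
BENCHMARK = 8.5  # overall year-on-year inflation rate quoted above, in percent

# Only the two category values mentioned in the text are used here.
categories = {
    "Furniture and bedding": 15.8,
    "Alcoholic beverages": 3.7,
}


def gap_to_benchmark(values, benchmark):
    """Return each category's distance from the benchmark, in percentage points."""
    return {label: round(value - benchmark, 1) for label, value in values.items()}


gaps = gap_to_benchmark(categories, BENCHMARK)

for label, gap in gaps.items():
    side = "above" if gap > 0 else "below"
    print(f"{label}: {abs(gap)} points {side} the benchmark")
```

Anything with a positive gap would sit to the right of the vertical line, anything negative to the left.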

To the right, not so good.

Overall, an interesting article worth reading, but it contained one graphic in need of some additional work and then a second that, with a few improvements, would have been a better fit for the article’s story.

Credit for the piece goes to Daigo Fujiwara.

The Potential Impacts of Throwing Out Roe v Wade

Spoiler: they are significant.

Last night we had breaking news on two very big fronts. The first is that somebody inside the Supreme Court leaked an entire draft of the majority opinion, written by Justice Alito, to Politico. Leaks from inside the Supreme Court, whilst they do happen, are extremely rare. This alone is big news.

But let’s not bury the lede: the majority opinion is to throw out Roe v Wade in its entirety. For those not familiar, perhaps especially those of you who read me from abroad, Roe v Wade is the name of a court case that went before the United States Supreme Court in 1971 and was decided in 1973. It established the woman’s right to an abortion as constitutionally protected, allowing states to enact some regulations to balance out the state’s role in concern for women’s public health and the health of the fetus as it nears birth. Regardless of how you feel about the issue (and people have very strong feelings about it), that’s largely been the law of the United States for half a century.

Until now.

To be fair, the draft opinion is just that, a draft. And the supposed 5-3 vote—Chief Justice Roberts is reportedly undecided, but against the wholesale overthrow of Roe—could well change. But let’s be real, it won’t. And even if Roberts votes against the majority he would only make the outcome 5-4. In other words, it looks like at some point this summer, probably June or July, tens of millions of American women will lose access to reproductive healthcare.

And to the point of this post, what will that mean for women?

This article by Grid runs down some of the numbers, starting with who chooses to have abortions, and ultimately gets to this map, which I screenshotted.

Those are pretty long distances in the south…

The map shows how far women in a state would need to travel for an abortion with Roe active as law and without. I’ve used the toggle to show without. Women in the south in particular will need to travel quite far. The article further breaks out distances today with more granularity to paint the picture of “abortion deserts” where women have to travel sometimes well over 200 miles to have a safe, legal abortion.

I am certain that we will be returning to this topic frequently in coming months, unfortunately.

Credit for the piece goes to Alex Leeds Matthews.

Russo-Ukrainian War Refugees: 12 April

Another week, more combat and refugees in Ukraine. I’m going to try and hold the war update until tomorrow pending some news that hasn’t been confirmed yet: the fall of Mariupol. Instead, we’re going to again look briefly at the refugee situation in Ukraine—technically outside. I haven’t seen a recent number on the internally displaced, though we have begun to see some people return to Ukraine especially in the north and around Kyiv. It’s unclear to me if the data includes those people returning.

Regardless, we are at over 4.6 million Ukrainians who have fled Ukraine.

Slowing down of late.

The question now is whether, as Russia refocuses its effort on the Donbas (though fierce fighting has been waged in the area for eight years now), these numbers will begin to see a notable change.

Credit for the piece is mine.

Russo-Ukrainian War Refugees: 5 April

Just a quick update as I try to update my battle map. Today we’re taking another look at the refugee crisis Putin created in eastern and central Europe. Over four million Ukrainians have left Ukraine and millions more have been displaced internally within Ukraine.

Whilst we may hope they will eventually return home, the photos and videos we are seeing of Ukrainian areas that had been captured by Russian forces show that many Ukrainians no longer have homes or even villages to which they can return.

This problem will persist for years as Ukraine tries to rebuild. And that doesn’t include the fact that much of southern and parts of eastern Ukraine remain under Russian control. And some of those areas continue to see fierce fighting.

Credit for the piece is mine.

Where’s the Axis

We’re starting this week with an article from the Philadelphia Inquirer. It looks at the increasing number of guns confiscated by the Transportation Security Administration (TSA) at Philadelphia International Airport. Now while this is a problem we could discuss, one of the graphics therein has a problem that we’ll discuss here.

We have a pretty standard bar chart here, with the number of guns “detected” at all US airports from 2008 through 2021. The previous year is highlighted with a darker shade of blue. But what’s missing?

We have two light grey lines running across the graphic. But what do they represent? We do have the individual data points labelled above each bar, and that gives us a clue that the grey lines are axis lines, specifically representing 2,000 and 4,000 guns, because they run between the bars straddling those two lines.

But we also have the data labels themselves, and I wonder whether they are even necessary. If we look at the amount of space taken up by the labels, we can imagine that three labels, 2k, 4k, and 6k, would use significantly less visual real estate than the individual labels. The data contained in the labels could be relegated to a mouseover state, revealed only when the user interacts directly with the graphic. Here it serves as a “sparkle”, distracting from the visual relationships of the bars.

If the actual data values to the single digit are important, a table would be a better format for displaying the information. A chart should show the visual relationship. Now, perhaps the Inquirer decided to display data labels and no axis for all charts. I may disagree with that, but it’s a house data visualisation stylistic choice.

But then we have the above screenshot. In this bar chart, we have something similar. Bars represent the number of guns detected specifically at Philadelphia International Airport, although the time frame is narrower, being only 2017–2021. We do have grey lines in the background, but now on the left of the chart, we have numbers. Here we do have axis labels displaying 10, 20, and 30. Interestingly, the maximum value in the data set is 39 guns detected last year, but the chart does not include an axis line at 40 guns, which would make sense given the increments used.
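Extending the axis to cover the largest value is a one-line calculation. A minimal Python sketch, assuming whole-number data and a fixed step; the function name is my own and this is not how the Inquirer builds its charts.

```python
import math


def axis_ticks(max_value, step):
    """Return tick values from 0 up to the first multiple of step that covers max_value."""
    top = math.ceil(max_value / step) * step  # round the maximum up to the next step
    return list(range(0, top + step, step))


# The Philadelphia chart: a maximum of 39 guns with axis increments of 10.
ticks = axis_ticks(39, 10)
print(ticks)
```

For a maximum of 39 and a step of 10, the top tick lands on 40, which is the line the chart lacks.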

At the end of the day, this is just a frustrating series of graphics. Whilst I do not understand the use of the data labels, the inconsistency with the data labels within one article is maddening.

Credit for the piece goes to John Duchneskie.

Can You Hit the High Notes?

This is an older piece that I stumbled across doing some other work. I felt like it needed sharing. The interactive graphic shows the high and low note vocal ranges of major musical artists.

Good to see some of my favourite artists in the mix.

Interactive controls allow the user to sort the bars by the greatest vocal range, high notes, or low notes. Colour coding distinguishes male from female vocalists.

In particular I enjoy the bottom of the piece that uses the keyboard to show the range of notes. When the user mouses over a particular singer, the ends of the range display the particular song in which the singer hit the note.

Again, this is an older piece that I just discovered, but I did enjoy it. I would be curious to see how these things could change over time. As an artist ages, how does that change his or her vocal range? Are there differences between albums? This could be a fascinating starting point for further research.

Credit for the piece goes to ConcertHotels.com

Slaveholders in the Halls of Congress

Taking a break from going through the old articles and things I’ve saved, let’s turn to an article from the Washington Post published earlier this week. As the title indicates, the Post’s article explores slaveholders in Congress. Many of us know that the vast majority of antebellum presidents at one point or another owned slaves. (Washington and Jefferson being the two most commonly cited in recent years.) But what about the other branches of government?

The article is a fascinating read about the prevalence of slaveholders in the legislative branch. For our purposes it uses a series of bar charts and maps to illustrate its point. Now, the piece isn’t truly interactive as it’s more of the scrolling narrative, but at several points in American history the article pauses to show the number of slaveholders in office during a particular Congress. The screenshot below is from the 1807 Congress.

That year is an interesting choice, not mentioned explicitly in the article, because the United States Constitution prohibited Congress from passing limits on the slave trade prior to 1808. But in 1807 Congress passed a law that banned the slave trade from 1 January 1808, the first day legally permitted by the Constitution.

Almost half of Congress in the early years had, at one point or another, owned slaves.

Graphic-wise, we have a set of bar charts representing the percentage and then a choropleth map showing each state’s number of slaveholders in Congress. As we will see in a moment, the map here is a bit too small to work. Can you really see Delaware, Rhode Island, and (to a lesser extent) New Jersey? Additionally, because of the continuous gradient it can be difficult to distinguish just how many slaveholders were present in each state. I wonder if a series of bins would have been more effective.
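To show what I mean by bins, here is a minimal Python sketch that assigns a count to one of a handful of discrete classes instead of a continuous gradient. The bin edges and labels are hypothetical choices of mine, not anything taken from the Post’s data.

```python
from bisect import bisect_right

# Hypothetical lower edges for each bin after the first; the labels are one longer.
BIN_EDGES = [1, 3, 6, 11]
BIN_LABELS = ["0", "1-2", "3-5", "6-10", "11+"]


def bin_label(count):
    """Return the label of the bin that a state's slaveholder count falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, count)]


for count in (0, 2, 5, 10, 14):
    print(count, "->", bin_label(count))
```

With only five classes, each one can get a clearly distinct shade, so a reader can match a small state like Delaware to the legend without guessing at a point along a gradient.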

The decision to use actual numbers intrigues me as well. Ohio, for example, has few slaveholders in Congress based upon the map. But as a newly organised state, Ohio had only two senators and one congressman. That’s a small actual number, but 33% of its congressional delegation.
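The difference between counts and shares is easy to see in a quick Python sketch. Ohio’s one slaveholder out of three members is inferred from the 33% figure above; the second entry is a purely hypothetical state added for contrast, and the names are my own.

```python
def delegation_share(slaveholders, members):
    """Return slaveholders as a percentage of a state's congressional delegation."""
    if members <= 0:
        raise ValueError("a delegation needs at least one member")
    return 100 * slaveholders / members


# (slaveholders, total members) per state.
delegations = {
    "Ohio": (1, 3),            # inferred from the 33% figure in the text
    "Example state": (6, 24),  # hypothetical numbers for contrast
}

shares = {state: round(delegation_share(*counts)) for state, counts in delegations.items()}

for state, share in shares.items():
    count = delegations[state][0]
    print(f"{state}: {count} slaveholders, {share}% of delegation")
```

On a map of raw counts the hypothetical state would look far darker than Ohio, yet as a share of its delegation Ohio comes out higher.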

Overall though, the general pervasiveness of slaveholders warrants the use of a map to show that the geographic distribution was not limited to just the south.

Later on we have what I think is the best graphic of the article, a box map showing each state’s slaveholders over time.

How the trends changed over time over geography.

Within each state we can see the general trend, including the legacy of the Civil War and Reconstruction. The use of a light background allows white to represent pre-statehood periods for each state. And of course some states, notably Alaska and Hawaii, joined the United States well after this period.

But I also want to address one potential issue with the methodology of the article. One that it does briefly address, albeit tangentially. This data set looks at all people who at one point or another in their life held slaves. First, contextually, in the early years of the republic slavery was not uncommon throughout the world. Though by the aforementioned year of 1807 the institution appeared on its way out in the West. Sadly the cotton gin revolutionised the South’s cotton industry and reinvigorated the economic impetus for slavery. Thereafter slavery boomed. The banning of the slave trade shortly thereafter introduced scarcity into the slave market and then the South’s “peculiar institution” truly took root. That cotton boom may well explain how the initial decline in the prevalence of slaveholders in the first few Congresses reversed itself and then held steady through the early decades of the 19th century.

And that initial decline before a hardening of support for slavery is what I want to address. The data here looks only at people who at one point in their life held slaves. It’s not an accurate representation of current slaveholders in Congress at the time they served. It’s a subtle but important distinction. The most obvious result of this is how after the 1860s the graphics show members of Congress as slaveholders when this was not the case. They had in the past held slaves.

That is not to deny that some of those members gave up their slaves only reluctantly and, in all likelihood, would have preferred to keep them. And therefore those numbers are important to understand. But the approach obscures the count of people who eventually came to realise the error of their ways. The article addresses this briefly, recounting several anecdotes of people who later in life became abolitionists. I wonder though whether these people should count in this graphic as, so far as we can tell, their personal views changed so substantially as to be hardened against slavery.

I would be very curious to see these charts remade with a data set that accounts for contemporary ownership of slaves represented in Congress.

Regardless of the methodology issue, this is still a fascinating and important read.

Credit for the piece goes to Adrian Blanco, Leo Dominguez, and Julie Zauzmer Weil.

Philadelphia’s Wild Winters

Winter is coming? Winter is here. At least meteorologically speaking, because winter in that definition lasts from December through February. But winters in Philadelphia can be a bit scattershot in terms of their weather. Yesterday the temperature hit 19°C before a cold front passed through and knocked the overnight low down to 2°C. A warm autumn or spring day to just above freezing in the span of a few hours.

But when we look more broadly, we can see that winters range just that much as well. And look the Philadelphia Inquirer did. Their article this morning looked at historical temperatures and snowfall and whilst I won’t share all the graphics, it used a number of dot plots to highlight the temperature ranges both in winter and yearly.

Yep, I still prefer winter to summer.

The screenshot above focuses attention on the range in January and July and you can see how the range between the minimum and maximum is greater in the winter than in the summer. Philadelphia may have days with summer temperatures in the winter, but we don’t have winter temperatures in summer. And I say that’s unfair. But c’est la vie.

Design-wise there are a couple of things going on here that we should mention. The most obvious is the blue background. I don’t love it. Presently the blue dots that represent colder temperatures begin to recede into and blend into the background, especially around that 50°F mark. If the background were white or even a light grey, we would be able to clearly see the full range of the temperatures without the optical illusion of a separation that occurs in those January temperature observations.

Less visible here is the snowfall. If you look just above the red dots representing the range of July temperatures, you can see a little white dot near the top of the screenshot. The article has a snowfall effect with little white dots “falling” down the page. I understand how the snowfall fits with the story about winter in Philadelphia. Whilst the snowfall is light enough to not be too distracting, I personally feel it’s a bit too cute for a piece that is data-driven.

The snowfall is also an odd choice because, as the article points out, Philadelphia winters do feature snowfall, but on days when precipitation falls, snow accounts for less than 1/3 of those days, with rain and wintry mixes accounting for the vast majority.

Overall, I really like the piece as it dives into the meteorological data and tries to accurately paint a portrait of winters in Philadelphia.

And of course the article points out that the trend is pointing to even warmer winters due to climate change.

Credit for the piece goes to Aseem Shukla and Sam Morris.