Where’s the Axis

We’re starting this week with an article from the Philadelphia Inquirer. It looks at the increasing number of guns confiscated by the Transportation Security Administration (TSA) at Philadelphia International Airport. Now while the trend itself is worth discussing, one of the graphics therein has a problem of its own, and that is what we’ll discuss here.

We have a pretty standard bar chart here, with the number of guns “detected” at all US airports from 2008 through 2021. The previous year is highlighted with a darker shade of blue. But what’s missing?

We have two light grey lines running across the graphic. But what do they represent? We do have the individual data points labelled above each bar, and that gives us a clue that the grey lines are axis lines, specifically representing 2,000 and 4,000 guns, because they run between the bars whose values straddle those two amounts.

Then we have the data labels themselves. Are they even necessary? If we look at the amount of space taken up by the labels, we can imagine that three axis labels, 2k, 4k, and 6k, would use significantly less visual real estate than the individual labels. The data contained in the labels could be relegated to a mouseover state, revealed only when the user interacts directly with the graphic. As it stands, the labels serve as “sparkle”, distracting from the visual relationships of the bars.

If the actual data values, down to the single digit, are important, a table would be a better format for displaying the information. A chart should show the visual relationship. Now, perhaps the Inquirer decided to display data labels and no axis on all its charts. I may disagree with that, but it’s a house data visualisation style choice.
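To make that alternative concrete, here is a minimal matplotlib sketch, with made-up figures rather than the TSA’s actual counts, of a bar chart that trades per-bar data labels for three light axis lines:

```python
# A minimal sketch of the suggested alternative: three light gridlines
# at 2k, 4k, and 6k instead of a data label above every bar.
# The figures are illustrative, not the TSA's actual counts.
import matplotlib.pyplot as plt

years = list(range(2008, 2022))
guns = [900, 1000, 1100, 1300, 1500, 1800, 2200,
        2600, 3400, 4000, 4200, 4400, 3300, 5700]

fig, ax = plt.subplots()
ax.bar(years, guns, color="#9ecae1")
ax.bar(years[-1], guns[-1], color="#3182bd")  # highlight the latest year

ax.set_ylim(0, 6000)
ax.set_yticks([2000, 4000, 6000])
ax.set_yticklabels(["2k", "4k", "6k"])
ax.yaxis.grid(True, color="#dddddd")
ax.set_axisbelow(True)  # draw the gridlines behind the bars

plt.show()
```

The exact values could then live in the mouseover state for the readers who want them.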

But then we have the above screenshot. In this bar chart, we have something similar. Bars represent the number of guns detected specifically at Philadelphia International Airport, although the time frame is narrower, covering only 2017–2021. We again have grey lines in the background, but now, on the left of the chart, we have numbers: axis labels displaying 10, 20, and 30. Interestingly, the maximum value in the data set is 39 guns detected last year, but the chart does not include an axis line at 40 guns, which would make sense given the increments used.

At the end of the day, this is just a frustrating series of graphics. Whilst I do not understand the preference for data labels over axis lines, the inconsistency between the two approaches within a single article is maddening.

Credit for the piece goes to John Duchneskie.

Can You Hit the High Notes?

This is an older piece that I stumbled across doing some other work. I felt like it needed sharing. The interactive graphic shows the high and low note vocal ranges of major musical artists.

Good to see some of my favourite artists in the mix.

Interactive controls allow the user to sort the bars by the greatest vocal range, high notes, or low notes. Colour coding distinguishes male from female vocalists.

In particular I enjoy the bottom of the piece that uses the keyboard to show the range of notes. When the user mouses over a particular singer, the ends of the range display the particular song in which the singer hit the note.

Again, this is an older piece that I just discovered, but I did enjoy it. I would be curious to see how these things change over time. As an artist ages, how does that change his or her vocal range? Are there differences between albums? This could be a fascinating jumping-off point for further research.

Credit for the piece goes to ConcertHotels.com.

Dots Beat Bars

Today is just a quick little follow-up to my post from Monday. There I talked about how a Boston Globe piece using three-dimensional columns to show snowfall amounts in last weekend’s blizzard failed to clearly communicate the data. Then I shared a map from the National Weather Service (NWS) that showed the snowfall ranges over an entire area.

Well, scrolling through the weather feeds on the Twitter yesterday, I saw this graphic from the NWS that comes closer to the Globe’s original intent, but again offers a far clearer view of the data.

Much better

Whilst we lose the exactness of the individual reports, that is to say the reports are grouped into bins and assigned a colour, we have a much more granular view than we did with the first NWS graphic I shared.

The only comment I have on this graphic is that I would probably drop the terrain element of the map. The dots work well when placed atop the white map, but the lighter blues and yellows fade out of view when placed atop the green.

But overall, this is a much clearer view of the storm’s snowfall.

Credit for the piece goes to the National Weather Service graphics department.

How Accurate Is Punxsutawney Phil?

For those unfamiliar with Groundhog Day—the event, not the film, because as it happens your author has never seen the film—since 1887 in the town of Punxsutawney, Pennsylvania (60 miles east-northeast of Pittsburgh) a groundhog named Phil has risen from his slumber, climbed out of his burrow, and gone to see if he could see his shadow. Phil prognosticates upon the continuance of winter—whether we receive six more weeks of winter or an early spring—based upon the appearance of his shadow.

But as any fan of meteorology will tell you, a groundhog’s shadow does not exactly compete with the latest computer modelling running on servers and supercomputers. And so we are left with the all-important question: how accurate is Phil?

Thankfully the National Oceanic and Atmospheric Administration (NOAA) published an article several years ago that they continue to update. And their latest update includes 2021 data.

Not exactly an accurate depiction of Phil.

I am loath to be super critical of this piece, because, again, relying upon a groundhog for long-term weather forecasting is…for the birds (the best I could do). But critiquing information design is largely what this blog is for.

Conceptually, dividing the piece between a long-term view, i.e. since 1887, and a shorter-term view, i.e. since 2012, makes sense. The long-term view focuses more on how Phil split out his forecasts—clearly Phil likes winter. I dislike the use of the dark blue here for the years for which we have no forecast data. I would have opted for a neutral colour, say grey, or something visibly less impactful than the two light colours (blue and yellow) that represent winter and spring.

Whilst I don’t love the icons used in the pie chart, they do make sense because the designers repeat them within the table. If they’re selling the icon use, I’ll buy it. That said, I wonder if using those icons more purposefully could have been more impactful? What would have happened if they had used a timeline and each year was represented by an icon of a snowflake or a sun? What about if we simply had icons grouped in blocks of ten or twenty?

The table I actually enjoy. I would tweak some of the design elements, for example the green check marks almost fade into the light blue sky; a darker green would have worked well there. But conceptually this makes a lot of sense: take each prognostication, compare it with the temperature deviation for February and March (as a proxy for “winter” or “spring”), and then assess whether Phil was correct.
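If we were to sketch that assessment ourselves, say in pandas, it might look like the following. The toy data, the column names, and the below-normal-means-winter rule are all my assumptions, not NOAA’s actual table or methodology.

```python
# A rough sketch of the assessment: compare each prognostication with
# the February-March temperature deviation and mark it correct when
# the two agree. Data and rule are assumptions, not NOAA's.
import pandas as pd

df = pd.DataFrame({
    "year":      [2018, 2019, 2020, 2021],
    "forecast":  ["winter", "spring", "spring", "winter"],
    "deviation": [1.6, 0.8, 3.2, -0.3],  # Feb-Mar deviation from normal, °F
})

# Below-normal temperatures stand in for "six more weeks of winter".
df["outcome"] = df["deviation"].apply(lambda d: "winter" if d < 0 else "spring")
df["correct"] = df["forecast"] == df["outcome"]

print(df)
print(f"Accuracy: {df['correct'].mean():.0%}")
```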

I would like to know more about what a slightly above or below measurement means compared to above or below. And I would like to know more about the impact of climate change upon these measurements. For example, was Phil’s accuracy higher in the first half of the 20th century? The end of the 19th?

Finally, the overall article makes a point about how difficult it would be for a single groundhog in western Pennsylvania to determine weather for the entire United States, let alone its various regions. But what about Pennsylvania? Northern Appalachia? I would be curious about a more regionally specific analysis of Phil’s prognostication prowess.

Credit for the piece goes to the NOAA graphics department.

America’s Crime Problem

During the pandemic, media reports of the rise of crime have inundated American households. Violent crimes, we are told, are at record highs. One wonders if society is on the verge of collapse.

But last night a few friends asked me to take a look at the data during the pandemic (2020–2021) and see what is actually going on out on the streets in a few big cities. Naturally I agreed and that’s why we have this post today. The first thing to understand, however, is that we do not have a federal-level database where we can cross-compare crimes in cities using standardised definitions. The FBI used to produce such a thing, but in 2020 retired it in favour of a new system that, for reasons, local and state agencies have yet to fully embrace. Consequently, just when we need some real data, we have a notable lack of it.

At the very least we have national-level reporting on violent crimes and homicides, the latter of which is a subset of violent crimes. Though these reports, too, depend upon local and state agencies self-reporting to the FBI. I also wanted to look at not just whether crime is up of late, but whether crime is up over the last several years. I chose to go back 30 years, or a generation.

We can see one important trend here: at a national level, violent crimes are largely stable at a rate of 400 per 100,000 people. Homicides, however, have climbed by nearly a third. Violent crimes are not rising, but murders are.

My initial charge was to look at cities and violent crime. However, knowing that nationally violent crimes are largely stable, the issue of concern would be how the rise in murders is playing out on American city streets. With the caveat that we do not have a single database to review, I pulled data directly from the five cities of interest: Philadelphia, Chicago, New York, Washington, and Detroit.

I also considered that large cities will have more murders simply by dint of their larger populations. And so when I collected the data, I also tried to find the Census Bureau’s population estimates of the cities during the same time frame. Unfortunately the 2021 estimates are not yet available so I had to use the 2020 population estimates for my 2021 calculations.

First we can see that not all cities report data for the same time period, and for Detroit in particular that makes comparisons tricky. In fact only New York had data back to the beginning of the century. Regardless of the data set’s less-than-full robustness, we can see that in all five cities homicides rose in 2020 and 2021.

Second, however, if we squint through that lack of full data, we see a trend at the city level that aligns with the national level. Homicides, tragically, are indeed up. However, in New York and Washington homicides remain below their levels from around 2000, and at that time homicides already appeared to be on a downward trajectory. I would bet that homicides were even higher during the 1990s and that the 2000s represented a long-run decline. In other words, whilst homicides are up, they are still below their peaks. A worrying trend, but far from the sky falling.

That cannot quite be said for other cities. Let’s start with Detroit. Sadly we have too few years of data to draw any conclusion other than that homicides rose compared to the years preceding the pandemic.

That leaves us with Philadelphia and Chicago. Philadelphia has less data available, which makes it harder to determine what is happening. But we can say that homicides have not been higher at any point since 2007. If you look closely, though, you can see that there does appear to be a downward trend at the beginning of the line. We do not have as much data as we do for New York and Washington, but I would bet that homicides are up in Philadelphia yet still far short of what they were in the 1990s.

Chicago is the oddball. Yes, it saw a peak in homicides during the pandemic. But in 2016 the city didn’t miss the pandemic peak by much. In other words, homicides were staggeringly high in Chicago before the pandemic. If anything, we see a failure to combat high crime rates. But even before that spike in 2016, we see more of a valley floor in homicides. True, at the beginning of the century homicides appear to have trended down. But unlike the other cities here, homicides bottomed out at around 450 per year. I’m not so certain Chicago had a persistent, long-run decline to begin with.

And as I said above, we would expect larger populations to have more murders simply because they contain more potential criminals and victims. When we equalise for population we see the same trends, as we would expect, because the city populations have been relatively stable over the last 20 years. Instead what we see is that, relative to one another, murders are more common in some cities and less so in others.

New York is a great example, with nearly 500 murders last year, a number on par with Philadelphia. But New York has over 8 million inhabitants; Philadelphia has just 1.6 million. Consequently New York’s homicide rate is a surprisingly low 5.9 per 100,000 people. Philadelphia’s, on the other hand? 35.6.

Philadelphia is near the top of that list, with Washington and Chicago having similar, albeit lower, rates of 31.7 and 30.1, respectively. But sadly Detroit surpasses them all and is in a league of its own: 47.5 in 2021.
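For what it’s worth, the arithmetic behind those rates is simple: scale the homicide count to a population of 100,000. A quick sketch using the approximate figures cited above:

```python
# The per-capita arithmetic behind the rates above. The inputs are the
# approximate figures cited in the post, not exact official counts.
def homicide_rate(homicides: float, population: float) -> float:
    """Homicides per 100,000 residents."""
    return homicides / population * 100_000

# New York: nearly 500 murders among over 8 million inhabitants
print(f"{homicide_rate(500, 8_500_000):.1f}")  # ~5.9 per 100,000
# Philadelphia: roughly 570 murders among ~1.6 million inhabitants
print(f"{homicide_rate(570, 1_600_000):.1f}")  # ~35.6 per 100,000
```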

Credit for the pieces is mine.

Obfuscating Bars

On Friday, I mentioned in brief that the East Coast was preparing for a storm. One of the cities the storm impacted was Boston and naturally the Boston Globe covered the story. One aspect the paper covered? The snowfall amounts. They did so like this:

All the lack of information

This graphic fails to communicate the breadth and literal depth of the snow. We have two big reasons for that and they are both tied to perspective.

First we have a simple one: bars hiding other bars. I live in Greater Centre City, Philadelphia. That means lots of tall buildings. But if I look out my window, the tall buildings nearer me block my view of the buildings behind them. That same effect holds true in this graphic. The tall red columns in southeastern Massachusetts block those of eastern and northeastern parts of the state and parts of New Hampshire as well. Even if we can still see the tops of the columns, we cannot see their bases, and thus any real meaningful comparison is lost.

Second: distance. Pretty simple here as well: later today, go outside. Look at things on your horizon. Note that those things, while perhaps tall, such as a tree or a skyscraper, look relatively small compared to the things immediately around you. The same applies here. Bars of the same data value, when at opposite ends of the map, will appear sized differently. Below I took the above screenshot and highlighted two observations that differed by only 0.5 inches of snow. But the box I had to draw around one—a rough proxy for the column’s rendered height—is 44% larger than the other.

These bars should be about the same.
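That gap is perspective projection at work: apparent size scales inversely with distance from the camera. A small sketch of the idea, with the distances chosen arbitrarily to reproduce a 44% difference:

```python
# Perspective projection shrinks apparent size in proportion to distance
# from the camera, so two columns of equal height render differently.
# The distances are arbitrary, picked to reproduce a ~44% gap.
def apparent_height(true_height: float, distance: float, focal: float = 1.0) -> float:
    """Projected height of a column under a simple pinhole camera model."""
    return focal * true_height / distance

near = apparent_height(30.0, distance=10.0)  # column close to the viewer
far = apparent_height(30.0, distance=14.4)   # identical column, farther away
print(f"{near / far - 1:.0%} larger")        # 44% larger
```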

This map probably looks cool to some people with its three-dimensional perspective and bright colours on a dark grey map. But it fails where it matters most: clearly presenting the regional differences in snowfall accumulation.

Compare the above to this graphic from the Boston office of the National Weather Service (NWS).

No, it does not have the same cool factor. And some of the labelling design could use a bit of work. But the use of a flat, two-dimensional map allows us to more clearly compare the ranges of snowfall and get a truer sense of the geographic patterns in this weekend’s storm. And in doing so, we can see some of the subtleties, for example the red pockets of greater snowfall amounts amid the wider orange band.

Credit for the Globe piece goes to John Hancock.

Credit for the NWS piece goes to the graphics department of NWS Boston.

How the Globe’s Writers Voted

Yesterday we looked at a piece by the Boston Globe that mapped out all of David Ortiz’s home runs. We did that because he has just been voted into baseball’s Hall of Fame. But to be voted in means there must be votes and a few weeks after the deadline, the Globe posted an article about how that publication’s eligible voters, well, voted.

The graphic here was a simple table. But as I’ll always say, tables aren’t an inherently bad or easy-way-out form of data visualisation. They are great at organising information in such a way that you can quickly find or reference specific data points. For example, let’s say you wanted to find out whether or not a specific writer voted for a specific ballplayer.

Just don’t ask me for whom I would have voted…

Simple red check marks represent those players for whom the Globe’s eligible staff voted. I really like some of the columns on the left that provide context on the vote. For the unfamiliar, players can only remain on the list for up to ten years. And so for the first four, this was their last year of eligibility. None made the cut. Then there’s a column for the total number of votes cast by the Globe’s staff. Following that is more context: the share of votes received in 2021. Here the magic number is 75% to be elected. Conversely, if you do not reach 5% you drop off the following year. Almost all of those on their first-year ballot failed to reach that threshold.

The only potential drawback to this table is that by the time you reach the end of it, there are few check marks to create implicit rules or lines that guide you from writer to player. David Ortiz’s placement helps because his six votes—remarkably, not all Globe writers voted for him—ground you for the only person below him (alphabetically) to receive a vote. And we need that because otherwise quickly linking Alex Rodriguez to Alex Speier would be difficult.

Finally below the table we have jump links to each writer’s writings about their selections. And if you’ll allow a brief screenshot of that…

Still don’t ask me

We have a nicely designed section here. Designers delineated each author’s section with red arrows that evoke the red stitching on a baseball. It’s a nice design touch. Then each author receives a headline and a small call-out box inside which are the players—and their headshots—for whom the author voted. An initial dropped capital (drop cap), here a big red M, grabs the reader’s attention and draws them into the author’s own words.

Overall this was a solidly designed piece. I really enjoyed it. And for those who don’t follow the sport, the table is also an indicator of how divisive the voting can be. Even the Globe’s writers couldn’t unanimously agree on voting for David Ortiz.

Credit for the piece goes to Daigo Fujiwara and Ryan Huddle.

Slaveholders in the Halls of Congress

Taking a break from going through the old articles and things I’ve saved, let’s turn to an article from the Washington Post published earlier this week. As the title indicates, the Post’s article explores slaveholders in Congress. Many of us know that the vast majority of antebellum presidents at one point or another owned slaves. (Washington and Jefferson being the two most commonly cited in recent years.) But what about the other branches of government?

The article is a fascinating read about the prevalence of slaveholders in the legislative branch. For our purposes it uses a series of bar charts and maps to illustrate its point. Now, the piece isn’t truly interactive as it’s more of a scrolling narrative, but at several points in American history the article pauses to show the number of slaveholders in office during a particular Congress. The screenshot below is from the 1807 Congress.

That year is an interesting choice, not mentioned explicitly in the article, because the United States Constitution prohibited Congress from passing limits on the slave trade prior to 1808. But in 1807 Congress passed a law that banned the slave trade from 1 January 1808, the first day legally permitted by the Constitution.

Almost half of Congress in the early years had, at one point or another, owned slaves.

Graphic-wise, we have a set of bar charts representing the percentage and then a choropleth map showing each state’s number of slaveholders in Congress. As we will see in a moment, the map here is a bit too small to work. Can you really see Delaware, Rhode Island, and (to a lesser extent) New Jersey? Additionally, because of the continuous gradient it can be difficult to distinguish just how many slaveholders were present in each state. I wonder if a series of bins would have been more effective.

The decision to use actual numbers intrigues me as well. Ohio, for example, shows few slaveholders in Congress based upon the map. But as a newly organised state, Ohio had only two senators and one congressman. That’s a small actual number, but 33% of its congressional delegation.
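To make the count-versus-share point concrete, here is a tiny sketch. Ohio’s figures follow from the delegation maths above; the Virginia figures are hypothetical, purely for contrast:

```python
# Count versus share: Ohio's single slaveholding member reads as "few"
# on a map of counts, but it is a third of the three-member delegation.
delegations = {
    # state: (slaveholders in Congress, total delegation size)
    "Ohio": (1, 3),
    "Virginia": (20, 24),  # hypothetical figures, for contrast
}

for state, (holders, total) in delegations.items():
    print(f"{state}: {holders} slaveholders, {holders / total:.0%} of delegation")
```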

Overall though, the general pervasiveness of slaveholders warrants the use of a map to show that the geographic distribution was not limited to just the South.

Later on we have what I think is the best graphic of the article, a box map showing each state’s slaveholders over time.

How the trends changed over time and across geography.

Within each state we can see the general trend, including the legacy of the Civil War and Reconstruction. The use of a light background allows white to represent pre-statehood periods for each state. And of course some states, notably Alaska and Hawaii, joined the United States well after this period.

But I also want to address one potential issue with the methodology of the article, one that it does briefly address, albeit tangentially. This data set looks at all people who at one point or another in their lives held slaves. First, contextually, in the early years of the republic slavery was not uncommon throughout the world, though by the aforementioned year of 1807 the institution appeared on its way out in the West. Sadly the cotton gin revolutionised the South’s cotton industry and reinvigorated the economic impetus for slavery. Thereafter slavery boomed. The banning of the slave trade introduced scarcity into the slave market, and then the South’s “peculiar institution” truly took root. That cotton boom may well explain how the initial decline in the prevalence of slaveholders in the first few Congresses reversed itself and then held steady through the early decades of the 19th century.

And that initial decline before a hardening of support for slavery is what I want to address. The data here looks only at people who at one point in their lives held slaves. It is not an accurate representation of who held slaves whilst actually serving in Congress. It’s a subtle but important distinction. The most obvious result of this is that after the 1860s the graphics still show members of Congress as slaveholders when this was no longer the case; they had held slaves in the past.

That is not to deny that some of those members were reluctant and, in all likelihood, would have preferred to have kept their slaves. Those numbers are therefore important to understand. But the approach undermines the count of people who eventually came to realise the error of their ways. The article addresses this briefly, recounting several anecdotes of people who later in life became abolitionists. I wonder, though, whether these people should count in this graphic as—so far as we can tell—their personal views changed so substantially as to harden against slavery.

I would be very curious to see these charts remade with a data set that accounts for contemporary ownership of slaves represented in Congress.

Regardless of the methodology issue, this is still a fascinating and important read.

Credit for the piece goes to Adrian Blanco, Leo Dominguez, and Julie Zauzmer Weil.

Graduate Degrees

Many of us know the debt that comes along with undergraduate degrees. Some of you may still be paying yours down. But what about graduate degrees? A recent article from the Wall Street Journal examined the discrepancies between debt incurred in 2015–16 and the income earned two years later.

The designers used dot plots for their comparisons, which narratively reveal themselves through a scrolling story. The author focuses on the differences between the University of Southern California and California State University, Long Beach. This screenshot captures the differences between the two in both debt and income.

Pretty divergent outcomes…

Some simple colour choices guide the reader through the article and their consistent use makes it easy for the reader to visually compare the schools.

From a content standpoint, these two series, income and debt, can be combined to create an income-to-debt ratio. Simply put, does the degree pay for itself?
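A minimal sketch of that ratio, with illustrative figures rather than the Journal’s data: anything below 1 means the median income two years on has yet to catch up with the median debt.

```python
# Income-to-debt ratio: below 1, the median income two years after
# graduation has not yet caught up with the median debt incurred.
# The figures are illustrative, not the Journal's data.
def income_to_debt(median_income: float, median_debt: float) -> float:
    return median_income / median_debt

print(f"{income_to_debt(72_000, 145_000):.2f}")  # 0.50: income half the debt
print(f"{income_to_debt(65_000, 28_000):.2f}")   # 2.32: degree pays for itself
```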

What’s really nice from a personal standpoint is that the end of the article features an exploratory tool that allows the user to search the data set for schools of interest. More than that, the tool isn’t limited to graduate degrees; you can search for undergraduate degrees as well.

Below the dot plot you also have a table that provides the exact data points, instead of cluttering up the visual design with that level of information. And when you search for a specific school through the filtering mechanism, you can see that school highlighted in the dot plot and brought to the top of the table.

Fortunately my alma mater is included in the data set.

Welp.

Unfortunately you can see that the data suggests that graduates with design and applied arts degrees earn less (as a median) than they spend to obtain the degree. That’s not ideal.

Overall this was a really nice, solid piece. And it probably speaks to the discussions we need to have more broadly about post-secondary education in the United States. But that’s for another post.

Credit for the piece goes to James Benedict, Andrea Fuller, and Lindsay Huth.

Philadelphia’s Wild Winters

Winter is coming? Winter is here. At least meteorologically speaking, because winter by that definition lasts from December through February. But winters in Philadelphia can be a bit scattershot in terms of their weather. Yesterday the temperature hit 19°C before a cold front passed through and knocked the overnight low down to 2°C. From a warm autumn or spring day to just above freezing in the span of a few hours.

But when we look more broadly, we can see that winters range just that much as well. And look the Philadelphia Inquirer did. Their article this morning looked at historical temperatures and snowfall, and whilst I won’t share all the graphics, it used a number of dot plots to highlight the temperature ranges both in winter and over the whole year.

Yep, I still prefer winter to summer.

The screenshot above focuses attention on the range in January and July and you can see how the range between the minimum and maximum is greater in the winter than in the summer. Philadelphia may have days with summer temperatures in the winter, but we don’t have winter temperatures in summer. And I say that’s unfair. But c’est la vie.

Design-wise there are a couple of things going on here that we should mention. The most obvious is the blue background. I don’t love it. Presently the blue dots that represent colder temperatures recede and blend into the background, especially around that 50°F mark. If the background were white or even a light grey, we would be able to clearly see the full range of the temperatures without the optical illusion of a separation that occurs in those January temperature observations.

Less visible here is the snowfall. If you look just above the red dots representing the range of July temperatures, you can see a little white dot near the top of the screenshot. The article has a snowfall effect with little white dots “falling” down the page. I understand how the snowfall fits with the story about winter in Philadelphia. Whilst the snowfall is light enough to not be too distracting, I personally feel it’s a bit too cute for a piece that is data-driven.

The snowfall is also an odd choice because, as the article points out, Philadelphia winters do feature snow, but snow accounts for less than a third of the days with precipitation, with rain and wintry mixes accounting for the vast majority.

Overall, I really like the piece as it dives into the meteorological data and tries to accurately paint a portrait of winters in Philadelphia.

And of course the article points out that the trend is pointing to even warmer winters due to climate change.

Credit for the piece goes to Aseem Shukla and Sam Morris.