How Accurate Is Punxsutawney Phil?

For those unfamiliar with Groundhog Day (the event, not the film, because as it happens your author has never seen the film), since 1887 a groundhog named Phil in the town of Punxsutawney, Pennsylvania (60 miles east-northeast of Pittsburgh) has risen from his slumber, climbed out of his burrow, and gone to see whether he could see his shadow. Based upon the appearance of his shadow, Phil prognosticates upon the continuance of winter: whether we receive six more weeks of winter or an early spring.

But as any weather fan will tell you, a groundhog’s shadow does not exactly compete with the latest computer models running on servers and supercomputers. And so we are left with the all-important question: how accurate is Phil?

Thankfully the National Oceanic and Atmospheric Administration (NOAA) published an article several years ago that they continue to update. And their latest update includes 2021 data.

Not exactly an accurate depiction of Phil.

I am loath to be super critical of this piece because, again, relying upon a groundhog for long-term weather forecasting is…for the birds (the best I could do). But critiquing information design is largely what this blog is for.

Conceptually, dividing the piece between a long-term view (since 1887) and a shorter-term view (since 2012) makes sense. The long-term view focuses more on how Phil split his forecasts, and clearly Phil likes winter. I dislike the use of dark blue here for the years for which we have no forecast data. I would have opted for a neutral colour, say grey, or something visibly less prominent than the two light colours (blue and yellow) that represent winter and spring.

Whilst I don’t love the icons used in the pie chart, they do make sense because the designers repeat them within the table. If they’re selling the icon use, I’ll buy it. That said, I wonder whether using those icons more purposefully could have been more impactful. What would have happened if they had used a timeline in which each year was represented by an icon of a snowflake or a sun? What if we simply had icons grouped in blocks of ten or twenty?

The table I actually enjoy. I would tweak some of the design elements; for example, the green check marks almost fade into the light blue sky, and a darker green would have worked well there. But conceptually this makes a lot of sense: take each prognostication, compare it with the temperature departure from normal for February and March (as a proxy for “winter” or “spring”), and then assess whether Phil was correct.
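To make that logic concrete, here is a minimal Python sketch of this kind of scoring. It assumes a simplified rule of my own, not necessarily NOAA’s: average the February and March temperature departures from normal, call a below-normal average “winter” and anything else “spring”, and count Phil as correct when his call matches. The class, the function names, and the four records are hypothetical illustrations, not NOAA data.

```python
from dataclasses import dataclass


@dataclass
class Prognostication:
    saw_shadow: bool       # True means Phil called six more weeks of winter
    feb_anomaly_f: float   # February temperature departure from normal, in °F
    mar_anomaly_f: float   # March temperature departure from normal, in °F


def was_correct(p: Prognostication) -> bool:
    # Assumed rule: a below-normal Feb/Mar average counts as "winter";
    # a zero or above-normal average counts as "spring".
    mean_anomaly = (p.feb_anomaly_f + p.mar_anomaly_f) / 2
    observed_winter = mean_anomaly < 0
    return p.saw_shadow == observed_winter


def accuracy(records: list[Prognostication]) -> float:
    """Share of prognostications that matched the observed outcome."""
    if not records:
        raise ValueError("no records to score")
    return sum(was_correct(p) for p in records) / len(records)


# Hypothetical records, not NOAA data.
records = [
    Prognostication(True, -1.5, -0.5),   # winter call, cold months: correct
    Prognostication(True, 2.0, 1.0),     # winter call, warm months: wrong
    Prognostication(False, 0.5, 1.5),    # spring call, warm months: correct
    Prognostication(False, -2.0, 0.5),   # spring call, cold on average: wrong
]
print(accuracy(records))
```

A real analysis would also have to decide how to treat the table’s “slightly above” and “slightly below” categories, which this sketch collapses into a single sign test.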

I would like to know more about what a “slightly above” or “slightly below” measurement means compared with a plain “above” or “below”. And I would like to know more about the impact of climate change upon these measurements. For example, was Phil’s accuracy higher in the first half of the 20th century? At the end of the 19th?

Finally, the overall article makes a point about how difficult it would be for a single groundhog in western Pennsylvania to forecast the weather for the entire United States, let alone its various regions. But what about Pennsylvania? Northern Appalachia? I would be curious about a more regionally specific analysis of Phil’s prognostication prowess.

Credit for the piece goes to the NOAA graphics department.

America’s Crime Problem

During the pandemic, media reports of the rise of crime have inundated American households. Violent crimes, we are told, are at record highs. One wonders if society is on the verge of collapse.

But last night a few friends asked me to take a look at the data from the pandemic (2020–2021) and see what is actually going on out on the streets in a few big cities. Naturally I agreed, and that’s why we have this post today. The first thing to understand, however, is that we do not have a federal-level database where we can compare crimes across cities using standardised definitions. The FBI used to produce such a thing, but in 2020 it retired that system in favour of a new one that, for reasons, local and state agencies have yet to fully embrace. Consequently, just when we need some real data, we have a notable lack of it.

At the very least we have national-level reporting on violent crimes and homicides, homicides being a subset of violent crimes, though these reports also depend on local and state agencies self-reporting to the FBI. I also wanted to look at not just whether crime is up of late, but whether crime is up over the last several years. I chose to go back 30 years, or a generation.

We can see one important trend here: at a national level, violent crimes are largely stable at a rate of around 400 per 100,000 people. Homicides, however, have climbed by nearly a third. Violent crimes are not rising, but murders are.

My initial charge was to look at cities and violent crime. However, knowing that nationally violent crimes are largely stable, the issue of concern would be how the rise in murders is playing out on American city streets. With the caveat that we do not have a single database to review, I pulled data directly from the five cities of interest: Philadelphia, Chicago, New York, Washington, and Detroit.

I also considered that large cities will have more murders simply by dint of their larger populations. And so when I collected the data, I also tried to find the Census Bureau’s population estimates of the cities during the same time frame. Unfortunately the 2021 estimates are not yet available so I had to use the 2020 population estimates for my 2021 calculations.

First, we can see that not all cities report data for the same time period, and for Detroit in particular that makes comparisons tricky. In fact, only New York had data going back to the beginning of the century. Despite these gaps in the data set, we can see that homicides rose in all five cities in 2020 and 2021.

Second, if we squint through that lack of full data, we see a trend at the city level that aligns with the national level. Homicides, tragically, are indeed up. However, in New York and Washington homicides are still below their levels from around 2000, and at that time homicides already appear to be on a downward trajectory. I would bet that homicides were even higher during the 1990s and that the 2000s represented a long-run decline. In other words, whilst homicides are up, they are still below their peaks. A worrying trend, but far from the sky falling.

That cannot quite be said for other cities. Let’s start with Detroit. Sadly we have too few years of data to draw any conclusion other than that homicides rose compared to the years preceding the pandemic.

That leaves us with Philadelphia and Chicago. Philadelphia has less data available, so it’s harder to determine what is happening. But we can say that homicides are now higher than at any point since 2007. If you look closely, though, you can see what appears to be a downward trend at the beginning of the line. We do not have as much data as we do for New York and Washington, but I would bet homicides are up in Philadelphia, yet still far short of what they were in the 1990s.

Chicago is the oddball. Yes, it saw a peak in homicides during the pandemic. But in 2016 the city didn’t miss the pandemic peak by much. In other words, homicides were staggeringly high in Chicago before the pandemic. If anything, we see a failure to combat high crime rates. But even before that spike in 2016, we see more of a valley floor in homicides. True, at the beginning of the century homicides appear to have trended down. But unlike in the other cities here, homicides bottomed out at around 450 a year. I’m not so certain Chicago had a persistent, long-run decline to start with.

As I said above, we would expect larger populations to have more murders simply because they contain more potential criminals and victims. When we adjust for population, we see the same trends as before, which is what we would expect since the cities’ populations have been relatively stable over the last 20 years. What changes is the comparison between cities: relative to their populations, murders are more common in some cities and less so in others.

New York is a great example, with nearly 500 murders last year, a number on par with Philadelphia. But New York has over 8 million inhabitants; Philadelphia has just 1.6 million. Consequently New York’s homicide rate is a surprisingly low 5.9 per 100,000 people. Philadelphia’s, on the other hand? 35.6.

Philadelphia is near the top of the five cities, with Washington and Chicago having similar, albeit lower, rates at 31.7 and 30.1, respectively. But sadly Detroit surpasses them all and is in a league of its own: 47.5 in 2021.
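The rates above follow the usual per-capita arithmetic: homicides divided by population, scaled to 100,000 residents. The Python sketch below shows that calculation, plus the fallback described earlier of using the most recent available population estimate (2020) when a year’s estimate (2021) has not yet been published. The function names and both cities’ figures are hypothetical round numbers, not the actual city data.

```python
def population_for(year: int, estimates: dict[int, int]) -> int:
    """Return the population estimate for `year`, falling back to the
    most recent earlier estimate if that year's is not yet published."""
    available = [y for y in estimates if y <= year]
    if not available:
        raise KeyError(f"no population estimate on or before {year}")
    return estimates[max(available)]


def rate_per_100k(homicides: int, population: int) -> float:
    """Homicides per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return homicides * 100_000 / population


# Hypothetical cities: identical homicide counts, very different populations.
# Neither has a 2021 estimate, so both fall back to 2020.
big_city_pop = population_for(2021, {2019: 7_900_000, 2020: 8_000_000})
small_city_pop = population_for(2021, {2020: 1_600_000})

print(rate_per_100k(500, big_city_pop))
print(rate_per_100k(500, small_city_pop))
```

The same 500 homicides produce a rate five times higher in the smaller city, which is the pattern behind the New York and Philadelphia comparison.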

Credit for the pieces is mine.

Obfuscating Bars

On Friday, I mentioned in brief that the East Coast was preparing for a storm. One of the cities the storm impacted was Boston and naturally the Boston Globe covered the story. One aspect the paper covered? The snowfall amounts. They did so like this:

All the lack of information

This graphic fails to communicate the breadth and literal depth of the snow. We have two big reasons for that and they are both tied to perspective.

First, a simple one: bars hiding other bars. I live in Greater Centre City, Philadelphia. That means lots of tall buildings. But if I look out my window, the tall buildings nearer me block my view of the buildings behind them. The same holds true in this graphic. The tall red columns in southeastern Massachusetts block those in the eastern and northeastern parts of the state, and parts of New Hampshire as well. Even if we can still see the tops of the columns, we cannot see their bases, and thus any meaningful comparison is lost.

Second: distance. This one is pretty simple as well. Later today, go outside and look at things on your horizon. Note that those things, though perhaps tall, such as a tree or a skyscraper, look relatively small compared to the things immediately around you. The same applies here: bars representing similar values will appear to be different sizes when they sit at opposite ends of the map. Below I took the above screenshot and highlighted two observations that differed by only 0.5 inches of snow. But the box I had to draw, a rough proxy for the columns’ actual heights, is 44% larger.

These bars should be about the same.
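Why distance distorts the columns can be sketched with the simplest model of perspective, a pinhole camera, in which an object’s drawn height is proportional to its real height divided by its distance from the viewer. This is a simplification (the Globe’s actual renderer, camera angle, and map tilt are unknown to me), and the snow totals and depths below are hypothetical, chosen only to show the effect.

```python
def projected_height(height: float, depth: float, focal_length: float = 1.0) -> float:
    """Drawn height of an upright column under a pinhole-camera projection.

    height: real height of the column (e.g. inches of snow)
    depth: distance from the viewer along the line of sight
    """
    if depth <= 0:
        raise ValueError("depth must be positive (column must be in front of the viewer)")
    return focal_length * height / depth


# Hypothetical: a 10.0-inch column near the viewer, a 10.5-inch column farther away.
near = projected_height(10.0, depth=1.0)
far = projected_height(10.5, depth=1.5)

data_ratio = 10.5 / 10.0   # the far column represents 5% more snow...
drawn_ratio = near / far   # ...yet the near column is drawn about 43% taller
print(f"data: {data_ratio:.2f}x, drawn: {drawn_ratio:.2f}x")
```

Under this model, no amount of careful bar scaling fixes the problem, because the distortion depends on where each bar sits, not on its value.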

This map probably looks cool to some people with its three-dimensional perspective and bright colours on a dark grey map. But it fails where it matters most: clearly presenting the regional differences in snowfall accumulation.

Compare the above to this graphic from the Boston office of the National Weather Service (NWS).

No, it does not have the same cool factor. And some of the labelling design could use a bit of work. But the use of a flat, two-dimensional map allows us to more clearly compare the ranges of snowfall and get a truer sense of the geographic patterns in this weekend’s storm. And in doing so, we can see some of the subtleties, for example the red pockets of greater snowfall amounts amid the wider orange band.

Credit for the Globe piece goes to John Hancock.

Credit for the NWS piece goes to the graphics department of NWS Boston.

I Call Them Life Tiles

Happy Friday, everyone. Here in the United States’ Northeast Corridor we’re looking forward to a potentially powerful nor’easter that could be the first real snowstorm to hit Philadelphia all winter. (Dumb La Niña.)

But I’ve also recently started working in a new sketchbook. (It happens often.) That’s why I thought this graphic from Indexed would resonate with me. I am often sketching out notes, concepts, still lifes, and whatever else, and I now have a neat little collection of used sketchbooks.

But my sketchbooks are always worth my time and that’s why I always save them.

Credit for the piece goes to Jessica Hagy.

How the Globe’s Writers Voted

Yesterday we looked at a piece by the Boston Globe that mapped out all of David Ortiz’s home runs. We did that because he has just been voted into baseball’s Hall of Fame. But to be voted in means there must be votes and a few weeks after the deadline, the Globe posted an article about how that publication’s eligible voters, well, voted.

The graphic here was a simple table. But as I’ll always say, tables aren’t an inherently bad or easy-way-out form of data visualisation. They are great at organising information in such a way that you can quickly find or reference specific data points. For example, let’s say you wanted to find out whether or not a specific writer voted for a specific ballplayer.

Just don’t ask me for whom I would have voted…

Simple red check marks represent the players for whom the Globe’s eligible staff voted. I really like some of the columns on the left that provide context on the vote. For the unfamiliar, players can remain on the ballot for only up to ten years, and so for the first four, this was their last year of eligibility. None made the cut. Then there’s a column for the total number of votes cast by the Globe’s staff. Following that is more context: the share of votes received in 2021. Here the magic number is 75% to be elected. Conversely, if you do not reach 5%, you drop off the ballot the following year. Almost all of those on their first-year ballot failed to reach that threshold.
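As a rough sketch of the ballot rules described above, here is how each player’s fate could be classified in Python. It assumes only the three thresholds named in the paragraph (75% to be elected, under 5% to drop off, ten years maximum) and ignores any other Hall of Fame rules; the constant and function names, and the players, are hypothetical.

```python
ELECTION_THRESHOLD = 0.75   # share of ballots needed for election
RETENTION_THRESHOLD = 0.05  # below this share, a player drops off the ballot
MAX_YEARS_ON_BALLOT = 10    # players can stay on the ballot for up to ten years


def ballot_outcome(share: float, years_on_ballot: int) -> str:
    """Classify a player's result for one year's vote (simplified rules)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if share >= ELECTION_THRESHOLD:
        return "elected"
    if share < RETENTION_THRESHOLD:
        return "dropped"
    if years_on_ballot >= MAX_YEARS_ON_BALLOT:
        return "aged off"
    return "stays on ballot"


# Hypothetical players: (name, vote share, years on the ballot).
for name, share, years in [("A", 0.78, 1), ("B", 0.02, 1), ("C", 0.60, 10), ("D", 0.40, 5)]:
    print(name, ballot_outcome(share, years))
```

Checking election first means a player in their tenth year who clears 75% is still elected, which matches the idea that the final year is a last chance rather than an automatic exit.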

The only potential drawback to this table is that by the time you reach the bottom, there are too few check marks to create the implicit rules or lines that guide you from writer to player. David Ortiz’s placement helps: his six votes (remarkably, not all Globe writers voted for him) ground you for the only player below him (alphabetically) to receive a vote. And we need that, because otherwise quickly linking Alex Rodriguez to Alex Speier would be difficult.

Finally below the table we have jump links to each writer’s writings about their selections. And if you’ll allow a brief screenshot of that…

Still don’t ask me

We have a nicely designed section here. Designers delineated each author’s section with red arrows that evoke the red stitching on a baseball. It’s a nice design touch. Then each author receives a headline and a small callout box containing the players (and their headshots) for whom the author voted. A drop capital (drop cap), here a big red M, grabs the reader’s attention and draws them into the author’s own words.

Overall this was a solidly designed piece. I really enjoyed it. And for those who don’t follow the sport, the table is also an indicator of how divisive the voting can be. Even the Globe’s writers couldn’t unanimously agree on voting for David Ortiz.

Credit for the piece goes to Daigo Fujiwara and Ryan Huddle.

558 Dingers

Yesterday baseball writers elected David Ortiz of the Boston Red Sox, better known as Big Papi, to the Baseball Hall of Fame. I was trying to work on a thing for yesterday, but ran out of time. While I will attempt to return to that later, for now I want to share a simple interactive graphic from the Boston Globe. As the post title suggests, it’s about the 558 career home runs Ortiz hit across his time with the Twins and the Red Sox. He hit 541 of those during the regular season, tacking on 17 more in the postseason, including his famous 2013 ALCS grand slam against the Detroit Tigers. (The one where the cop’s arms are in the air alongside Torii Hunter’s legs.)

That’s a lot of runs

Now you can see that Ortiz was a left-handed pull hitter with that home run concentration to right field, especially those wrapped around Fenway’s (in)famous Pesky Pole.

But with the number of dots you see inside the grounds at Fenway, you can also see the one downside of a chart like this. The graphic maps home runs from all Major League ballparks onto Fenway, so some that cleared the fences elsewhere land inside Fenway’s. Not to mention the role the Green Monster plays: many line drives that leave the yard when hit to right field simply bounce off the Monster when hit to left, for doubles or the dreaded long single. That is partly why Ortiz posted ridiculous season numbers for extra-base hits: all those Green Monster doubles. (Conversely, how many popups a mile in the sky came down into the Green Monster seats?)

The Globe chose 12 home runs to represent Ortiz’s entire career, and you move through them by scrolling through the interactive piece. I’m fortunate enough to remember watching several of them on television.

Big Papi was a force to be reckoned with and watching him hit was entertainment. I’m very excited to see him enter the Hall of Fame.

This summer? It’s his effing Hall.

Credit for the piece goes to John Hancock.

Finding Home with a Homemade Map

We’re going to start this week with some good news, and for that we turn to China. Thirty years ago, child traffickers kidnapped four-year-old Li Jingwei from his family and sold him to another family over 1,000 miles away. A BBC article from earlier this month covered Li Jingwei’s reunion with his family. How did it happen? Because of a map he drew from memory and shared on the internet.

Here’s a screenshot of that map.

It’s missing a Starbucks though

We all create mental maps of our surroundings, and not surprisingly they grow larger as we get older. But this man’s ability to recall details of his family’s hometown allowed internet sleuths, and eventually the police, to identify the village. DNA tests then connected Li to a woman whose son had been abducted.

When we draw out these maps ourselves, they become a link to the cartographic world. And this man was able to use his own mental map to find his home. Well, like I said, we’re going to start the week off with some good news.

Credit for the piece goes to Li Jingwei.

Tea. Earl Grey. Hot.

Any science fiction fan (and likely many who are not) can identify the character who utters those words in that order: Jean-Luc Picard, Star Trek’s captain of the USS Enterprise (NCC-1701-D). Ask your Amazon Alexa for it. Or your Google Home.

Thanks to the work of xkcd, we now know that Jean-Luc—may I call him Jean-Luc?—had a number of other options in the replicator from which to choose before he settled on “hot”.

Although Garak would still like to meet that Earl Grey and tell him a thing or two about tea leaves.

Credit for the piece goes to Randall Munroe.

Even Older Family Trees

Yesterday we looked at a graphic about an old family tree, revealed by ancient DNA. But at the end of the day it is a family tree of descent from a single human male. Mankind itself fits within a much larger family tree: the tree of life.

The tree of life continues to evolve as we discover new species and then reconfigure what we have to fit what we now know. When I was a wee lad in school, we learned about the three kingdoms of life: plants, animals, and fungi. Bacteria were a separate branch.

A few weeks ago, however, I was reading an article about how a recent DNA analysis identified a new “supergroup” within our larger group of complex cellular life, the eukaryotes (plants, animals, and fungi fall within this group). Luckily for our purposes the article contains a small graphic at which we can take a look.

Humans are way, way, way down on the tree.

The diagram uses a fairly simple design. The first panel splits life’s largest groupings into their branches, whilst the second panel breaks up the eukaryotes. Colour links the eukaryotes together and shows how they fit into the broader tree on the left, which uses dark grey and light blue for bacteria and archaea, respectively.

A nice additional touch was the designer’s decision to include a small icon representing each supergroup within the eukaryotes because, as the text points out, we don’t have commonly known names for these supergroups. Did I know that we belong to the opisthokonts? Absolutely not. Although dog people may be upset that the cat got the call to represent animals.

Design aside, you can see in the second panel how people are more closely related to amoebae than we are to plants. But this new supergroup, hemimastigotes, branches off from the rest of us eukaryotes at a very early point. And the DNA proves it.

Overall this was a really nice graphic to see in a fascinating article. Science is cool.

Credit for the piece goes to Lucy Reading-Ikkanda.

Old Family Trees

Another quick little post about something from a while back: around Christmas, news broke about the oldest family tree yet discovered. Researchers used DNA recovered from a 5,700-year-old tomb in the UK to piece together the relationships between the people interred within it.

Graphics-wise, we’re not talking about anything crazy or inventive here; it’s a family tree, after all. But the designers did a nice job using colour to indicate the different lines of descent, which were spatially organised within the tomb by the woman to whom the children were born. To be fair, it was all based upon the descendants of one man, albeit one man who had several wives.

What’s fascinating about this, however, is simply the age. We can go back nearly 6,000 years and simply from DNA create a family tree five generations deep.

The only thing I wish is that we had an accompanying map of the tomb, because that’s the other key part of the story. But at the end of the day I’ll always take a nice family tree.

Credit for the piece goes to Newcastle University’s design team.