scatter plot – Coffee Spoons

Climate Conscientious and Cheaper Cars

Sometimes in the course of my work I stumble across graphics and work that I previously missed. In this case I was looking for a post about one of my favourite infographics, but it turned out I had never posted about it, so I will have to rectify that someday. However, in my searching I came upon an article from the New York Times last year about research from MIT. That research compared the carbon dioxide emissions per mile (bad for the environment and climate) with the average monthly cost of a wide range of 2021 vehicles. The important distinction here is that average monthly cost is not the sticker price of a vehicle, but rather the sticker price plus lifetime operating costs. (For their analysis, the authors assumed a 15-year lifespan and 13,000 miles driven per year.)
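The cost measure is easy to misread, so here is a minimal sketch of the arithmetic. It assumes the sticker price and lifetime operating costs are simply added and spread evenly over every month of the 15-year life. The per-mile operating cost and both example vehicles are hypothetical. The MIT authors' actual model may include things this ignores, such as financing or discounting of future costs.

```python
def average_monthly_cost(sticker_price, operating_cost_per_mile,
                         years=15, miles_per_year=13_000):
    """Sticker price plus lifetime operating costs, spread over every month owned.

    The 15-year / 13,000-mile defaults match the assumptions reported for the
    study; operating_cost_per_mile is a hypothetical single figure standing in
    for fuel or electricity, maintenance, and so on.
    """
    lifetime_miles = years * miles_per_year
    lifetime_operating_cost = operating_cost_per_mile * lifetime_miles
    return (sticker_price + lifetime_operating_cost) / (years * 12)

# Two hypothetical vehicles, for illustration only
car_a = average_monthly_cost(30_000, operating_cost_per_mile=0.10)  # about 275 per month
car_b = average_monthly_cost(36_000, operating_cost_per_mile=0.06)  # about 265 per month
```

With these made-up figures, the second car has the higher sticker price but works out cheaper per month because it costs less to run. That is exactly the kind of comparison the sticker price alone hides.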

Why is this so important? It’s pretty simple, really. In the United States, transportation is the largest source of carbon emissions, and the majority of that comes from passenger vehicles. If we as a society want to get serious about reducing our carbon footprint, the biggest changes we need to make are driving less, moving more people onto mass transit, or replacing gas-powered vehicles with electric ones.

The New York Times turned the research into a really nice static datagraphic. Because it is static, there is no real interactivity if you want to compare your vehicle to others. However, the designers did choose some popular models and identified some of the key outliers.

The nice annotations here double as a legend.

The designers group the cars, represented by dots, into colour fields. These do a good job of showing the overlap between the different types of vehicles. Not all hybrid and plug-in vehicles are cheaper, or even lower in CO2 emissions, than some gas-powered vehicles, typically smaller compacts and hatchbacks. Each colour field is linked to a textual annotation that also functions as a legend.

That alone is very helpful in understanding the differences, subtle and not-so-subtle, between the types of vehicles. Later in the article the designers also used a scatter plot of a narrower dataset to compare a select group of vehicles.

Here we can see that one cannot simply assume that all electric vehicles are cheaper long-term than their gas-powered compatriots. The Nissan Altima, for example, emits more CO2 but compares favourably with the Tesla Model 3 in both long-term cost and upfront sticker price.

Despite finding this article a year and a half late, we can tie it to current events: President Biden’s climate bill creates tax credits for electric vehicles. While the bill is perhaps not as significant as many would like, it is remarkable for still devoting a lot of money to reducing our emissions. For electric vehicles, those tax credits are a key component, because they would help offset the higher upfront sticker prices. Whilst electric vehicles may generally be cheaper in the long run, buyers still need to put up more money than for conventionally powered alternatives, either as lump sums or as down payments. And with interest rates rising, whatever you need to cover via an auto loan will become more expensive.

Overall this is a really nice piece. Should I ever need to buy another vehicle, I would love to see this as a resource available to the general public. Unfortunately it only compares 2021 vehicles. And it does make me wonder how my 2005 vehicle compares. Probably not too terribly favourably.

Credit for the piece goes to Veronica Penney.

Serfs Up, Bro

Now get him into the fields.

Well, that was a week. But at least we made it to Friday, and for my American readers and me, this weekend and its Monday bank holiday, Memorial Day, mark the unofficial beginning of summer. So thanks to Indexed, it’s time to head down to the beach and hang ten (serfs).

Credit for the piece goes to Jessica Hagy.

More on Those Million Covid-19 Deaths

Yesterday I focused on the big graphic from the New York Times that wrapped across the front and back pages. But that graphic was merely the lead for a larger piece. I linked to the online version of the article, but for this post I’m going to stick with the print edition. The article consists of a full-page opener followed by an entire interior spread, all in limited colour. The remainder of the extensive coverage consists of photo essays and interviews that understandably attempt to humanise the data points; after all, each dot from yesterday represented one individual, solitary human being. That is an important element of a story like this and of other national and international tragedies, but we also need to focus on the data and not let the emotion of the story overwhelm our rational and logical analysis.

Sometimes it’s hard to realise we’re in the third year of this pandemic.

From a data visualisation standpoint, the first page begins simply enough with a long timeline of the Covid-19 pandemic charting the absolute number of deaths each day. As we saw yesterday, absolute deaths tell only part of the story. If we looked at absolute case counts alongside the deaths, we could also see how the virus has thus far evolved to be more transmissible but less lethal. Here the number of daily deaths from Omicron surpassed that from Delta, but fell short of the winter peak in early 2021. Meanwhile the number of cases exploded with Omicron, making its mortality rate (deaths per case) lower. In other words, far more people were getting sick, but a far smaller share of them were dying.
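To make the cases-versus-deaths point concrete, here is a small sketch with entirely made-up numbers (not actual Covid-19 data). A later wave can produce more deaths in absolute terms and still have a much lower fatality rate, simply because the number of cases grows far faster. It deliberately ignores real-world complications such as undercounted cases and the lag between infection and death.

```python
def fatality_rate(deaths, cases):
    """Deaths per recorded case over the same period."""
    return deaths / cases

# Hypothetical waves for illustration only, NOT real Covid-19 figures
earlier = {"cases": 100_000, "deaths": 2_000}
later = {"cases": 800_000, "deaths": 2_500}

earlier_rate = fatality_rate(earlier["deaths"], earlier["cases"])  # 2% of cases
later_rate = fatality_rate(later["deaths"], later["cases"])        # roughly 0.3% of cases

# More deaths in total, yet a far smaller share of cases ends in death
print(f"earlier wave: {earlier_rate:.2%}, later wave: {later_rate:.2%}")
```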

Interestingly, if you look at the online version, you will see that the designers chose a more stylised approach to presenting the data.

Here they kept the dot approach and simply stacked and reordered the dots. However, presumably for aesthetic reasons, they kept the dots loosely stacked and dropped all the axis lines, which does make for a nice transition from the map to this chart. But they also dropped all the headings and descriptors that tell the reader just what they are looking at. These decisions make the chart far less useful as a tool for telling the data-driven element of the story.

There are three annotations that label the number of deaths in New York, the Northeast, and the rest of the United States. But what does the chart actually say? Where are the endpoints for those annotations? You can also compare the scale of this chart’s y-axis to that of the printed version above. A more dramatic scale leads to a more dramatic narrative.

This visual style, which favours flash and fancy transitions over clear communication of the data, is why I find the print piece more compelling and more trustworthy. I find the online version still useful, but far more lacking in terms of information design.

The interior spread is where this article shines.

From an editorial design standpoint, the symmetry works very well here. It’s a clear presentation and the white space around the graphic blocks lets that content shine as it should in this type of story. Collectively these pieces do a great job telling the story of the pandemic thus far across the nation. The graphics do not need a lot of colour and make do with sparse flash. Annotations call the reader’s attention to salient points and outliers.

From a content standpoint, I would be particularly curious whether we have robust data for deaths by education level. Earlier this year I recall reading about a study that found education level correlated best with Covid cases, and I would be curious to see whether that held true for deaths. Of course, these charts do a great job of showing just how effective the vaccines were and remain. They are the best preventative measure we have available to us.

Here I disagree with how the designers divided the states into regions. The Census Bureau divides the United States into four regions using the same names as in the graphic above. However, if you look closely at the inset map, you will see that Delaware, Maryland, and West Virginia in particular are included in the Northeast. (I cannot tell whether the District of Columbia is included in the Northeast or the South.)

Now compare that to the Census Bureau’s definition:

If you ask me to include Delaware and Maryland as part of the Northeast, well, if you’re selling it, I’ll buy it. After all, just because the Census Bureau defines the United States this way does not mean the New York Times has to. Both are connected to the Northeast Corridor via Amtrak and I-95 and are plugged into the Megalopolis economy. Maybe the Potomac should be the demarcation between Northeast and South. But I struggle to understand the inclusion of West Virginia. Before you go and connect it to the Northeast, I would argue that West Virginia has far more in common with the Midwest geographically, economically, and culturally.

More critically, it strikes me as a serious problem that the online version of the chart, with the issues I described earlier, does not even include the little inset map to highlight this, at best, unusual regional definition.

And so while I have reservations about the data—how would the data have looked if the states were realigned?—the design of the line charts overall is good.

Again, I am talking about the print version, not that online graphic. I would argue that the above screenshot is barely a chart at all, and more “data art” or an illustration of data. Consider, for example, that the dots for the South are a muted slate blue, but the spacing and density of the dots create areas of lighter and darker slate. A lighter slate means more space between stacked dots, and a darker slate means a more compact stack. A lighter colour therefore pushes the “edge” of the line further up the y-axis and artificially inflates its value, not that we can tell what that value is, as the “chart” lacks any sort of y-axis.
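The dot-spacing problem is really just arithmetic. Here is a minimal sketch, assuming dots of a fixed diameter stacked in a single column with a uniform gap between them (both in hypothetical pixel units). The same number of deaths produces a noticeably taller column when the gap grows, so looser spacing reads as a bigger value.

```python
def stack_height(n_dots, dot_diameter, gap):
    """Height of one column of n dots with a fixed gap between neighbouring dots."""
    if n_dots <= 0:
        return 0
    return n_dots * dot_diameter + (n_dots - 1) * gap

# Same count (100 dots) of hypothetical 2 px dots, tight vs loose spacing
tight = stack_height(100, dot_diameter=2, gap=0)  # 200 px
loose = stack_height(100, dot_diameter=2, gap=1)  # 299 px, roughly 50% taller
```

Without a y-axis, the reader has no way to tell that the taller column represents the same number of dots.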

Finally, the print piece has a set of small multiples breaking down deaths by income in the three largest American cities: New York, Los Angeles, and Chicago. These are just great little charts showing the correlation between income and death from Covid, organised by ZIP code.

But this also serves as a stark reminder of just how much better the print piece is than the online version. If we look at a screenshot from the online article, we find a graphic that addresses all the issues I pointed out earlier.

Why couldn’t the online article have kept to this style?

I am left to wonder why readers of the online version do not have access to this clearer and more accurate representation of the data throughout the piece.

To me this article is a great example of a print piece that far exceeds its online version. Content-wise this is a great story that needed to be told this weekend, but design-wise we see a significant gap in quality from print to online. Suffice it to say that on Sunday I was very glad I received the print version.

Credit for the piece goes to Sarah Almukhtar, Amy Harmon, Danielle Ivory, Lauren Leatherby, Albert Sun, and Jeremy White.

I Call Them Life Tiles

Happy Friday, everyone. Here in the United States’ Northeast Corridor we’re looking forward to a potentially powerful nor’easter that could be the first real snowstorm to hit Philadelphia all winter. (Dumb La Niña.)

I’ve also recently started working in a new sketchbook. (It happens often.) That’s why I thought this graphic from Indexed would work for me. I am often sketching out notes, concepts, still lifes, and whatever else, and I now have a neat little collection of used sketchbooks.

But my sketchbooks are always worth my time and that’s why I always save them.

Credit for the piece goes to Jessica Hagy.

Covid Vaccination and Political Polarisation

I will try to get to my weekly Covid-19 post tomorrow, but today I want to take a brief look at a graphic from the New York Times that sat above the fold outside my door yesterday morning. And those who have been following the blog know that I love print graphics above the fold.

Of the front page’s six columns, this graphic gets three, in other words half the page width, and the accompanying column of article text brings the total to nearly two-thirds of the front page.

When we look more closely at the graphic, you can see it consists of two separate parts, a scatter plot and a line chart. And that’s where it begins to fall apart for me.

Pennsylvania is thankfully on the more vaccinated side of things

The scatter plot uses colour to indicate the share of the vote that went to Trump. My issue is that the colour isn’t necessary. If you look at the x-axis labelling at the top, you will see that the axis represents that same data. If, however, the designer chose colour to show the range of the state vote, well, that’s what axis labels are for… except there are none.

If the scatter plot used proper x-axis labels, you could easily read the range on either side of the political spectrum, and colour would no longer be necessary. I don’t entirely understand the lack of labelling here, because on the y-axis the scatter plot does use labelling.

On a side note, I would probably have added the national unvaccinated rate as a benchmark, to show which states are above and below the US average.

Now if we look at the second part of the graphic, the line chart, we do see axis labelling. But what I’m not fond of is that the line for counties with large Trump vote shares significantly exceeds the maximum range of the chart. And the dotted gridline for 0.5 deaths per 100,000 mysteriously stops short of the end of the chart. It’s not as if the gridline would have overlapped with the data series. And even if it did, that’s the point of a gridline: it lets the reader know when the data has exceeded an interval.

I really wanted to like this piece, because it is a graphic above the fold. But the more I looked at it in detail, the more issues I found with the graphic. A couple of tweaks, however, would quickly bring it up to speed.

Credit for the piece goes to Ashley Wu.

Big Beer

A few weeks back, a good friend of mine sent me this graphic from Statista that detailed the global beer industry. It showed how many of the world’s biggest brands are, in fact, owned by just a few of the biggest companies. This isn’t exactly news to either my friend or me, because we both worked in market research in our past lives, but I wanted to talk about this particular chart.

At first glance we have a tree map, where the area of each “squarified” shape usually represents its share of the total. In this case, the area represents global beer production in millions of hectolitres. Nothing too crazy there.

Next, colour often represents another variable. For market share, you might see a scale from green or blue to red representing the recent historical growth or forecast future growth of a particular brand, company, or market. Here, however, is where the chart begins to break down. Colour does not appear to encode any meaningful data. It could have been used to encode the region of origin for each parent company. Imagine blue represented European companies, red Asian, and yellow American. We would still have a similarly coloured map, sans purple and green.

But we also need to look at the data the chart communicates. We have the production in hectolitres, encoded as the size of each rectangle. But what about that little rectangle in the lower right corner? Is that supposed to be a different measurement, or is it merely a label? If it’s a label, we need to compare it to the circles in the upper right. Those are also labels, but they change in size, whereas the rectangles change only to fit the number.

And what about those circles? They represent the share of total beer production. In other words, the rectangles represent the number of hectolitres produced and the circles represent the share of hectolitres produced: two sides of the same coin. We can plot the two as a simple scatter plot and see that we’re really just looking at the same data.
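The scatter plot is so dull because share is just volume divided by the same global total for every brewer. Every point therefore sits on one straight line through the origin. Here is a minimal sketch with hypothetical volumes (not the Statista figures) that checks exactly that.

```python
import math

def shares_from_volumes(volumes, global_total):
    """Convert each brewer's volume into its share of the global total."""
    return {name: vol / global_total for name, vol in volumes.items()}

def is_proportional(xs, ys, rel_tol=1e-9):
    """True if every y/x ratio is the same, i.e. the points lie on one line through the origin."""
    ratios = [y / x for x, y in zip(xs, ys)]
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)

# Hypothetical production in millions of hectolitres, for illustration only
volumes = {"Brewer A": 500.0, "Brewer B": 250.0, "Brewer C": 100.0}
shares = shares_from_volumes(volumes, global_total=2000.0)

names = list(volumes)
print(is_proportional([volumes[n] for n in names], [shares[n] for n in names]))  # True
```

Any choice of total still passes the check. Only the slope of the line changes, which is why the circles add no information that the rectangles don’t already carry.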

Not the most interesting scatter plot I’ve ever seen…

We can see that there’s a pretty apparent connection between the volume of beer produced and the share of volume produced, as one would (hopefully) expect. The chart doesn’t really tell us much other than that there are three tiers within the Big Six breweries. AB InBev is in its own top tier, and Heineken forms a separate second tier. Carlsberg and China Resources Snow Breweries are very close to each other, with Molson Coors and Tsingtao just behind them, but those four could all be grouped into a third tier.

Another way to look at this would be to disaggregate the scatter plot into two separate bar charts.

You can see that the pattern in the shapes of the bars, and the resulting three tiers, are broadly the same. You can also see that we don’t need colour to differentiate between any of these breweries, nor does the original graphic need it. We could layer on additional data and information, but the original designers opted not to do that.

But I find the big glaring miss is that the article makes the point that, despite the boom in craft beer in recent years, American craft beer is still a very small fraction of global beer production. The text cites a figure that isn’t included in the graphic, probably because the two come from different sources. With a bit more research, we could probably fit American craft breweries into the dataset and get a chart like this.

This more clearly makes the point that American craft beer is a fraction of global beer production. But it still isn’t a great chart, because it’s looking at global beer production. Instead, I would want to be able to see the share of craft brewery production in the United States.

How has that changed over the last decade? How dominant are these six big beer companies in the American market? Has that share been falling or rising? Has it been stable?

Well, I went to the original source and pulled down the data table for the Top 40 brewers. I took the Top 15 in beer production, all above 1% share in 2020, and then plotted that against the change in their beer production from 2019 to 2020. I added a benchmark of global beer production—down nearly 5% in the pandemic year—and then coloured the dots by the region of origin. (San Miguel might not seem to fit in Asia by name, but it’s from the Philippines.)
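For anyone wanting to reproduce this kind of chart, here is a minimal sketch of the data preparation. It assumes a table with each brewer’s region and its 2019 and 2020 volumes. The brewer names and figures below are hypothetical placeholders, not the source data, and the plotting step itself is left out.

```python
from dataclasses import dataclass

@dataclass
class Brewer:
    name: str
    region: str          # used to colour the dots
    volume_2019: float   # millions of hectolitres
    volume_2020: float

def pct_change(before, after):
    return (after - before) / before * 100

def chart_rows(brewers, global_2019, global_2020, min_share=1.0, top_n=15):
    """Return the global benchmark change and one row per brewer above min_share % in 2020."""
    benchmark = pct_change(global_2019, global_2020)
    rows = []
    for b in brewers:
        share_2020 = b.volume_2020 / global_2020 * 100
        if share_2020 > min_share:
            change = pct_change(b.volume_2019, b.volume_2020)
            rows.append({"name": b.name, "region": b.region, "share": share_2020,
                         "change": change, "beat_benchmark": change > benchmark})
    # Largest producers first, capped at top_n
    rows.sort(key=lambda r: r["share"], reverse=True)
    return benchmark, rows[:top_n]

# Hypothetical placeholders, NOT the real brewer data
brewers = [
    Brewer("Brewer X", "Europe", 100.0, 98.0),
    Brewer("Brewer Y", "Asia", 50.0, 45.0),
    Brewer("Brewer Z", "Americas", 10.0, 11.0),  # under 1% share, so dropped
]
benchmark, rows = chart_rows(brewers, global_2019=2000.0, global_2020=1900.0)
```

Each row then becomes one dot: share on one axis, change on the other, coloured by region, with the benchmark drawn as a reference line.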

What mine does not show, because I couldn’t find a good (and convenient) source, is which top brands belong to which parent companies. That’s probably buried in a report somewhere. But whilst market share data and analysis used to be my job, as I alluded to in the opening, it is no longer, and I’ve got to get (virtually) to my day job.

Credit to the original goes to Felix Richter.

Credit for my take goes to me.

Get Your Shots

I’ve heard a lot about vaccine hesitancy and resistance lately, as I mentioned on Monday. So I thought I would try to make a graphic to help people understand where some of these excuses fall on the spectrum from rational to irrational, with claims of people being magnetised off the chart in the land of kooky.

But I also realised there’s a second spectrum, albeit far more limited in range, of selfishness versus altruism. And there is an interesting shift: those who waited for the most vulnerable to receive their shots first were initially acting altruistically and rationally. But now that those populations have received their vaccines, continuing to wait has shifted into irrational, selfish behaviour.

Anyway, I made a few sketches and as I was working on it, there was something in the aesthetic quality of the sketches that I couldn’t quite replicate digitally. And so I present the unpolished rough cut of my graphic.

For the fuller explanations, I refer you to my aforementioned post earlier this week. This was just an attempt to visualise the two spectrums.

Credit for the piece is mine.

Olympic Recap/Retro

Every four years (or so), I have to confess, I think fondly back on my former job, where I worked with a few wonderful colleagues on some data about the Olympics. The highlight was a model we built to predict the number of medals won by the host country, as we were curious about the idea of a host-nation bump. In other words, do host countries see an increase in their medal count relative to their performance at other Olympiads?

We concluded that host nations do see a slight bump in their total medal count, and we forecast that Team GB (the team for Great Britain and Northern Ireland) would win a total of 65 medals. Team GB had reached 64 by the final day, and it wasn’t until the women’s modern pentathlon, perhaps the last event, that it won a silver medal, bringing its total to 65, exactly in line with our forecast.
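This is not the model we built, whose details I won’t try to reconstruct here. Still, the basic idea of a host bump can be sketched as a naive ratio: a nation’s medal total in its host year divided by its average at the other Games. The Games labels and medal counts below are hypothetical.

```python
from statistics import mean

def host_bump(medals_by_games, host_games):
    """Host-year medal total divided by the mean total at all the other Games.

    A value above 1 suggests a bump. This is a naive illustration, not a forecasting model.
    """
    others = [medals for games, medals in medals_by_games.items() if games != host_games]
    return medals_by_games[host_games] / mean(others)

# Hypothetical medal totals for one nation, for illustration only
medals = {"Games 1": 30, "Games 2": 50, "Games 3": 60}
print(host_bump(medals, host_games="Games 3"))  # 1.5
```

A real forecast would also need to account for trends over time, team size, and the other factors we looked at. That is why this is only a way of stating the question, not answering it.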

Probably the most Olympics I’ve ever watched.

Of course, we also looked at the data for a number of other things, including whether GDP per capita correlated with Olympic performance. We also looked at BMI, and that did yield some interesting tidbits. But at the end of the day it was the medal forecast that thrilled me in the summer of 2012.

So yeah, today’s a shameless plug for some old work of mine. But I’m still proud of it two Olympiads later.

If you’d like to see some of the pieces, I have them in my portfolio.

Credit for the piece is mine.

Boldly Going…

Readers who know me well know that I’ve long been a fan of Star Trek. We’ve made it to the weekend, and over at Indexed earlier this month there was a great science fiction comparison.

Here in the States we have a bank holiday Monday, so Star Trek is just a great way to start a holiday weekend.

The needs of the many outweigh the needs of the few. Or the one.

Credit for the piece goes to Jessica Hagy.

I’ve Got the Subtlest of Blues

As I prepared to reconnect and rejoin the world, I spent most of the weekend prior to full vaccination cleaning and clearing out my flat of things from the past 14 months. One thing I meant to do more with was printed pieces I saw in the New York Times. Interesting pages, front pages in particular, have been piling up and before recycling them all, I took some photos of the backlog. I’ll try to publish more of them in the coming weeks and months.

You may recall this time last month I wrote about a piece from the New York Times that examined the politicisation of vaccinations. I meant to get around to the print version, but didn’t, so let’s do it now.

I noted last time the use of ellipses for the title and the lack of value scales on the x-axis. Those did not change from the online version. But look at the y-axis.

For the online piece I noted how the labels were placed inside the chart. I wondered at the time, but didn’t write about it, whether that could have been a technical limitation of the web. But here in print the labels are still inside, so it was a deliberate design decision.

Sticking with the labelling, I also pointed out that Wyoming sat outside the plot, and it does here too. Finally, I had noted the lack of a label for zero on the first chart. Here the zero does appear, right where I would have placed it. That does make me wonder whether the missing zero online was a technical or development issue.

Finally, something very subtle. At first I didn’t catch it, and it wasn’t until I opened the image above in Photoshop that I did. In the web version I noted the use of tints, or lighter shades, for two different blues and two different reds. When I looked at the print, I saw only one red and one blue. But they were in fact different, and only once I had zoomed in on the photo I took could I see the difference.

The dots do have two different blues. But it’s very subtle. Same with the red.

So all in all the piece is very similar to what we looked at last month, but there were a few interesting differences. I wonder if the designers had an opportunity to test the blues/reds prior to printing. And I wonder if the zero label was an issue for developers.

Credit for the piece goes to Lauren Leatherby and Guilbert Gates.