Covid Vaccination and Political Polarisation

I will try to get to my weekly Covid-19 post tomorrow, but today I want to take a brief look at a graphic from the New York Times that sat above the fold outside my door yesterday morning. And those who have been following the blog know that I love print graphics above the fold.

On my proverbial stoop this morning.

Of the six-column layout, you can see that this graphic gets three, in other words half-a-page width, and the accompanying column of text for the article brings this to nearly 2/3 of the front page.

When we look more closely at the graphic, you can see it consists of two separate parts, a scatter plot and a line chart. And that’s where it begins to fall apart for me.

Pennsylvania is thankfully on the more vaccinated side of things

The scatter plot uses colour to indicate the vote share that went to Trump. My issue with this is that the colour isn’t necessary. If you look at the top for the x-axis labelling, you will see that the axis represents that same data. If, however, the designer chose to use colour to show the range of the state vote, well that’s what the axis labelling should be for…except there is none.

If the scatter plot used proper x-axis labels, you could easily read the range on either side of the political spectrum, and colour would no longer be necessary. I don’t entirely understand the lack of labelling here, because on the y-axis the scatter plot does use labelling.

On a side note, I would probably have added a US unvaccination rate for a benchmark, to see which states are above and below the US average.

Now if we look at the second part of the graphic, the line chart, we do see labelling for the axis here. But what I’m not fond of here is that the line for counties with large Trump shares significantly exceeds the maximum range of the chart. And then for the 0.5 deaths per 100,000 line, the dots mysteriously end short of the end of the chart. It’s not as if the line would have overlapped with the data series. And even if it did, that’s the point of an axis line, so the user can know when the data has exceeded an interval.

I really wanted to like this piece, because it is a graphic above the fold. But the more I looked at it in detail, the more issues I found with the graphic. A couple of tweaks, however, would quickly bring it up to speed.

Credit for the piece goes to Ashley Wu.

Misleading Graphics Aren’t Limited to US Elections

Last week I wrote about how CBS News’ coverage of the California recall election featured a misleading graphic. In particular, the graphic created the appearance that the results were closer than they really were.

This week we had another election and, sadly, I find that I have to write the same sort of piece again. Except this time we are headed north of the border to Canada.

I was watching CBC coverage last night and I noticed early on that the vote share bar chart looked off given the data points. Next time it popped up I took a screenshot.

Look at the bars

First we need to note these are three-dimensional and the camera angle kept swinging around—not ideal for a fair comparison. This was the most straight-on angle I captured.

Second, at first glance, we have the Conservative share at a little more than 3/4 the Liberal vote share. That looks to be about right. Then you have the New Democratic Party (NDP) at roughly half the vote of the Conservatives. And the bar looks about half the height of the blue Conservative bar. Checks out. Then you have the People’s Party of Canada at roughly 1/4 the amount of NDP votes. But now look at the bar’s height. The purple bar is nearly the same height as the orange bar.

Clearly that is wrong and misleading.

The problem, I think, is that the designers artificially inflated the height of the bars to include the labels and data points for the bars. The designers should have dropped the labelling below the bars and let the bars only represent the data.
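To make the suspected distortion concrete, here is a minimal sketch in Python. The vote shares are hypothetical values picked only to match the rough ratios described above (they are not the real results), and `bar_heights`, `max_height`, and `label_padding` are illustrative names, not anything the CBC designers are known to have used.

```python
def bar_heights(values, max_height=200.0, label_padding=0.0):
    """Return bar heights in pixels, scaled from a zero baseline.

    label_padding is a fixed number of pixels added to every bar,
    which is the suspected distortion: room for a label baked into the bar.
    """
    largest = max(values.values())
    return {
        name: label_padding + max_height * value / largest
        for name, value in values.items()
    }


# Hypothetical vote shares, chosen only to match the rough ratios in the text.
shares = {"Liberal": 32.0, "Conservative": 24.0, "NDP": 12.0, "PPC": 3.0}

honest = bar_heights(shares)                        # bars encode only the data
padded = bar_heights(shares, label_padding=100.0)   # bars inflated for labels

honest_ratio = honest["PPC"] / honest["NDP"]  # the true 1/4 relationship
padded_ratio = padded["PPC"] / padded["NDP"]  # pushed towards 1 by the padding

print(f"PPC/NDP height ratio without padding: {honest_ratio:.2f}")
print(f"PPC/NDP height ratio with padding:    {padded_ratio:.2f}")
```

The same fixed padding barely changes the look of the tallest bars but dominates the shortest one, which would explain why only the purple bar looks badly off.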

I created the following graphic to show how the chart should have looked.

And my take…

Here you can more clearly see how much greater the NDP victory was over the People’s Party. The labelling falls below the charts and doesn’t distort the height comparison between the bars. In some respects, it wasn’t even close. But the original graphic made it look otherwise.

I just wish I knew what the designers were thinking. Why did they inflate the bars? Like with the CBS News graphic, I hope it wasn’t intentional. Rather, I hope it was some kind of mistake or even ignorance.

Credit for the original piece goes to the CBC graphics department.

Credit for the updated version is mine.

Correcting CBS News Charts

One of the long-running critiques of Fox News Channel’s on-air graphics is that they often distort the truth. They choose questionable, if not flat-out misleading, baselines and scales, and adjust other elements to create differences where they don’t exist or smooth out problematic issues.

But yesterday a friend sent me a graphic that shows Fox News isn’t alone. This graphic came from CBS News and looked at the California recall election vote totals.

If you just look at the numbers, 66% and 34%, well we can see that 34 is almost half of 66. So why does the top bar look more like 2/3 of the length of the bottom? I don’t actually know the intent of the designer who created the graphic, but I hope it’s more ignorance or sloppiness than malice. I wonder if the designer simply said, 66%, well that means the top bar should be, like, two-thirds the length of the bottom.

The effect, however, makes the election seem far closer than it really was. For every yes vote, there were almost two no votes. And the above graphic does not capture that fact. And so my friend asked if I could make a graphic with the correct scale. And so I did.

One really doesn’t need a chart to compare the two numbers. And I touch on that with the last point, using two factettes to simply state the results. But let’s assume we need to make it sexy, sizzle, or flashy. Because I think every designer has heard that request.

A simple scale of 0 to 66 could work and we can see how that would differ from the original graphic. Or, if we use a scale of 0 to 100, we can see how the two bars relate to each other and to the scale of the total vote. That approach would also have allowed for a stacked bar chart as I made in the third option. The advantage there is that you can easily see the victor by who crosses the 50% line at the centre of the graphic.

Basically doing anything but what we saw in the original.
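The arithmetic behind the scale options above can be sketched in a few lines of Python. The 66% and 34% come from the graphic; the 300-pixel chart width and the `bar_length` helper are assumptions made purely for illustration.

```python
NO_PCT = 66.0   # share voting no on the recall
YES_PCT = 34.0  # share voting yes on the recall


def bar_length(value, scale_max, full_width=300.0):
    """Length in pixels of a bar for `value` on an axis running 0..scale_max."""
    return full_width * value / scale_max


# Option 1: axis from 0 to 66, so the No bar fills the whole width.
yes_on_66 = bar_length(YES_PCT, 66.0)
no_on_66 = bar_length(NO_PCT, 66.0)

# Option 2: axis from 0 to 100, so both bars relate to the total vote.
yes_on_100 = bar_length(YES_PCT, 100.0)
no_on_100 = bar_length(NO_PCT, 100.0)

# On either scale the Yes bar is about half the No bar, not two-thirds.
ratio = YES_PCT / NO_PCT

# Option 3: stacked bar on 0 to 100. The winner is whoever crosses the
# 50% mark, which sits at half the chart width.
halfway = bar_length(50.0, 100.0)
no_crosses_halfway = no_on_100 > halfway

print(f"Yes bar is {ratio:.1%} of the No bar")
```

Whatever the axis maximum, the ratio between the two bars stays the same, which is the property the original graphic broke.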

Credit for the original goes to the CBS News graphics department.

Credit for the correction is mine.

Big Beer

A few weeks back, a good friend of mine sent me this graphic from Statista that detailed the global beer industry. It showed how many of the world’s biggest brands are, in fact, owned by just a few of the biggest companies. This isn’t exactly news to either my friend or me, because we both worked in market research in our past lives, but I wanted to talk about this particular chart.

Not included, your home brew

At first glance we have a tree map, where the area of each “squarified” shape represents, usually, the share of the total. In this case, the share of global beer production in millions of hectolitres. Nothing too crazy there.

Next, colour often will represent another variable. For market share you might often see greens or blues to red that represent the recent historical growth or forecast future growth of that particular brand, company, or market. Here, however, is where the chart begins to break down. Colour does not appear to encode any meaningful data. It could have been used to encode data about region of origin for the parent company. Imagine blue represented European companies, red Asian, and yellow American. We would still have a similarly coloured map, sans purple and green.

But we also need to look at the data the chart communicates. We have the production in hectolitres, or the shape of the rectangle. But what about that little rectangle in the lower right corner? Is that supposed to be a different measurement or is it merely a label? Because if it’s a label, we need to compare it to the circles in the upper right. Those are labels, but they change in size whereas the rectangles change only in order to fit the number.

And what about those circles? They represent the share of total beer production. In other words the squares represent the number of hectolitres produced and the circles represent the share of hectolitres produced. Two sides of the same coin. Because we can plot this as a simple scatter plot and see that we’re really just looking at the same data.

Not the most interesting scatter plot I’ve ever seen…

We can see that there’s a pretty apparent connection between the volume of beer produced and the share of volume produced, as one would (hopefully) expect. The chart doesn’t really tell us too much other than that there are really three tiers in the Big Six of Breweries. AB InBev is in its own top tier and Heineken is a second separate tier. But Carlsberg and China Resources Snow Breweries are very competitive and then just behind them are Molson Coors and Tsingtao. But those could all be grouped into a third tier.
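Why the scatter plot falls on a straight line can be shown with a minimal Python sketch. The volumes and the global total below are hypothetical placeholders, not Statista's figures, and `pearson` is a small helper written for this illustration. Because each share is just the volume divided by one fixed total, the correlation comes out at 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical production volumes in millions of hectolitres (placeholders).
volumes = [450.0, 220.0, 110.0, 105.0, 80.0, 75.0]
GLOBAL_TOTAL = 1800.0  # hypothetical global production, same units

# Share of global production is the volume rescaled by a constant.
shares = [100.0 * v / GLOBAL_TOTAL for v in volumes]

r = pearson(volumes, shares)
print(f"correlation between volume and share: {r:.3f}")
```

Any chart that encodes both numbers is spending two visual channels on one variable.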

Another way to look at this would be to disaggregate the scatter plot into two separate bar charts.

And now to the bars…

You can see the pattern in terms of the shapes of the bars and the resulting three tiers is broadly the same. You can also see how we don’t need colour to differentiate between any of these breweries, nor does the original graphic. We could layer on additional data and information, but the original designers opted not to do that.

But I find that the big glaring miss is that the article makes the point that, despite the boom in craft beer in recent years, American craft beer is still a very small fraction of global beer production. The text cites a figure that isn’t included in the graphic, probably because they come from two different sources. But if we could do a bit more research we could probably fit American craft breweries into the data set and we’d get a resultant chart like this.

A better bar…

This more clearly makes the point that American craft beer is a fraction of global beer production. But it still isn’t a great chart, because it’s looking at global beer production. Instead, I would want to be able to see the share of craft brewery production in the United States.

How has that changed over the last decade? How dominant are these six big beer companies in the American market? Has that share been falling or rising? Has it been stable?

Well, I went to the original source and pulled down the data table for the Top 40 brewers. I took the Top 15 in beer production, all above 1% share in 2020, and then plotted that against the change in their beer production from 2019 to 2020. I added a benchmark of global beer production—down nearly 5% in the pandemic year—and then coloured the dots by the region of origin. (San Miguel might not seem to fit in Asia by name, but it’s from the Philippines.)
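The selection and the year-over-year change described above could be computed roughly as follows. This is a sketch in Python with made-up rows: the brewer names and figures are hypothetical, and only the 1% share threshold and the Top 15 cut-off come from the text.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Hypothetical rows: (brewer, 2019 production, 2020 production, 2020 share in %).
rows = [
    ("Brewer A", 500.0, 470.0, 26.0),
    ("Brewer B", 240.0, 220.0, 12.0),
    ("Brewer C", 20.0, 21.0, 1.2),
    ("Brewer D", 10.0, 9.0, 0.5),   # below 1% share, so it drops out
]

# Keep brewers above a 1% share and attach their change from 2019 to 2020.
selected = [
    (name, share, percent_change(p2019, p2020))
    for name, p2019, p2020, share in rows
    if share > 1.0
]

# Largest share first, then keep at most the Top 15.
selected.sort(key=lambda row: row[1], reverse=True)
top = selected[:15]

for name, share, change in top:
    print(f"{name}: {share:.1f}% share, {change:+.1f}% vs 2019")
```

The global benchmark would be one more call to `percent_change` on the total production for each year.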

Now I can use a good bar.

What mine does not do, because I couldn’t find a good (and convenient) source, is show which top brands belong to which parent companies. That’s probably buried in a report somewhere. But whilst market share data and analysis used to be my job, as I alluded to in the opening, it is no longer and I’ve got to get (virtually) to my day job.

Credit to the original goes to Felix Richter.

Credit for my take goes to me.

Rarely Shady in Philadelphia

After a rainy weekend in Philadelphia thanks to Hurricane Henri, we are bracing for another heat wave during the middle of this week. Of course when you swelter in the summer, you seek out shade. But as a recent article in the Philadelphia Inquirer pointed out, not all neighbourhoods have the same levels of tree cover, or canopy.

From a graphics standpoint, the article includes a really nice scatter plot that explores the relationship between coverage and median household income. It shows that lack of shade correlates more strongly with income than with race. But I want to focus on a screenshot of another set of graphics earlier on in the article.

On the other hand, pollen.

I enjoyed this graphic in particular. It starts with a “simple” map of tree coverage in Philadelphia and then overlays city zip codes atop that. Two zip codes in particular receive highlights with bolder and larger type.

Those two zip codes, presumably the minimum and maximum or otherwise broadly representative, then receive call outs directly below. Each includes an enlarged map and then the data points for tree cover, median income, and then Black/Latino percentage of the population.

I don’t think the median income needs to be in bar chart form here, especially given the bars do not line up so that you can easily compare the zip codes. The numbers would work well enough as factettes or perhaps a small dot plot with the zip codes highlighted could work instead.

Additionally, the data labels would be particularly redundant if a small scale were used instead. That would work especially well if the median income were moved to the lowest place in the table and the share charts were consolidated in one graphic. Conceptually, though, I enjoy the deep dive into those two zip codes.

Then I wanted to highlight some great design work on the maps. Note how in particular for Chestnut Hill, 19118, the outline of the zip code is largely in a thicker, black stroke than the rest of the map. At the upper right, however, you have two important roads that define the area and the black stroke breaks at those points so the roads can be clearly and well labelled. The other map does the same thing for two roads, but their breaks are shorter as the roads run perpendicular to the border.

Overall this was just a great piece to read and I thoroughly enjoyed the graphics.

Credit for the piece goes to John Duchneskie.

Olympic Recap/Retro

Every four years (or so) I have to confess that I think fondly back upon my former job, because I worked with a few wonderful colleagues of mine on some data about the Olympics. And the highlight was that we had a model to try and predict the number of medals won by the host country as we were curious about the idea of a host nation bump. In other words, do host countries witness an increase in their medal count relative to their performance in other Olympiads?

We concluded that host nations do see a slight bump in their total medal count and we then forecast that we expected Team GB (the team for Great Britain and Northern Ireland) to win a total of 65 medals. We reached 64 by the final day and it wasn’t until the women’s pentathlon when, in maybe the last event, Team GB won a silver medal bringing its total to 65, exactly in line with our forecast.

Probably the most Olympics I’ve ever watched.

Of course we also looked at the data for a number of other things, including if GDP per capita correlated to Olympic performance. We also looked at BMI and that did yield some interesting tidbits. But at the end of the day it was the medal forecast that thrilled me in the summer of 2012.

So yeah, today’s a shameless plug for some old work of mine. But I’m still proud of it two Olympiads later.

If you’d like to see some of the pieces, I have them in my portfolio.

Credit for the piece is mine.

Sunday Covid-19 Data

Another day, more cases of coronavirus and Covid-19. So let’s take a look at Sunday’s data as there were some interesting things going on.

First, let’s dispense with Virginia. The state is enhancing its reporting structure, and so they admit the data is likely an underestimate of the present situation in Virginia. So here’s Virginia, nothing really changed.

The situation in Virginia

Moving on, we have Pennsylvania. Here we are beginning to truly see the disparity between the cities in the southeast and southwest, namely Philadelphia and Pittsburgh, and the T, the shape sometimes used to describe Pennsyltucky. (Though it also includes cities like Harrisburg, the state capital.) The point is that the T of Pennsylvania has yet to suffer greatly from the outbreak. Of course, it’s also the part of the state least equipped to deal with a pandemic.

The situation in Pennsylvania

New Jersey is just bad. One can make the argument that South Jersey is hanging on. (Though I will touch on that later with an idea for today’s afterwork work.) Bergen County in the northeast is likely to surpass 10,000 cases on its own today. And that will put it above most states.

The situation in New Jersey

Delaware is tough because it sits as a small state next to several much larger ones. But, the numbers seem to indicate the outbreak is still worsening. Though in terms of geographic spread, there’s little to say other than that New Castle County, home to Wilmington, in the north is the heart of the state’s outbreak.

The situation in Delaware

Illinois is a fascinating state, because of how dissimilar it is compared to Pennsylvania, a state which has a similar number of people.

The situation in Illinois

The map shows that geographic spread still has a little way to go before reaching every county in the state. But the outbreak has been there longer than in Pennsylvania. And most of the darker purples are concentrated in the northeast, in Chicago and its collar counties. Compare that to Pennsylvania above where you will see dark purple scattered across the cities of its eastern third, e.g. Allentown and Scranton, and in the western parts near Pittsburgh. This too could be worth exploring in depth in the future.

Lastly I want to get to the cases curves charts. Here we look at the daily new cases in each state.
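For readers who want to rebuild this kind of chart, here is a minimal Python sketch of how daily new cases can be derived. It assumes the states publish running (cumulative) totals, which is an assumption about the data rather than something stated here, and the numbers are hypothetical.

```python
def daily_new_cases(cumulative):
    """Turn a list of cumulative totals into day-over-day new cases."""
    return [
        today - yesterday
        for yesterday, today in zip(cumulative, cumulative[1:])
    ]


# Hypothetical cumulative case counts for five consecutive days.
cumulative = [100, 130, 170, 220, 260]

new_cases = daily_new_cases(cumulative)
print(new_cases)
```

A flattening curve shows up as these daily values levelling off, even while the cumulative total keeps climbing.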

The curves, flattening or otherwise, of the five states.

And unfortunately Sunday’s numbers will impact the Virginia curve, but it overall looks as if the state is worsening. I would argue that Illinois, which appears to be bending towards a steadying condition, is likely in a weird weekly pattern where it appears to stabilise on weekends and then resumes reported infections come Monday. Pennsylvania might well be flattening its curve. I would want to see a few more days’ worth of data before stating that more definitively. Let’s give it to Wednesday or Thursday.
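One common way to see past a weekly reporting pattern like the one suspected in Illinois is a seven-day trailing average. That is not what the charts here use; it is a swapped-in technique shown as a hedged Python sketch, with hypothetical daily counts that dip on two days of each week.

```python
def trailing_average(values, window=7):
    """Average of each run of `window` consecutive values."""
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# Hypothetical daily new cases: two identical weeks with a weekend dip.
daily = [100, 110, 120, 130, 140, 60, 50] * 2

smoothed = trailing_average(daily)
print([round(value, 1) for value in smoothed])
```

Because every seven-day window contains exactly one weekend, the dip cancels out and the smoothed series only moves when the underlying trend does.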

And then in New Jersey we have a fascinating trend. The curve of increasing number of cases has clearly broken. But it also is not shrinking. Instead, it seems to be more of a plateau. And in that case, the outbreak in New Jersey is not getting worse, but it’s also not getting any better. At least not numerically. However, the goal of flattening the curve is to create a slower, more steady increase in case numbers to help hospitals cope with surge volumes. So good news?

Credit for the pieces is mine.

Wednesday’s Corona Update

As I said yesterday, since people are finding these updates helpful on social media, I am going to repost the previous evening’s graphics I make on the Coronavirus Covid-19 outbreak here on Coffeespoons as well. So while today is Thursday, these are the numbers states provided yesterday, so it’s more of a Wednesday update.

But here I can start with the flatter curves graphic. The New Jersey numbers in particular look good—I mean they’re still bad. Of course we are just a few big breaches of quarantine and lapses in social distancing from reversing that progress.

Maybe some curve flattening?

State-wise, Pennsylvania continues to worsen. However, a close look at the slope of the line in the previous chart indicates that the steepness of the growth may be lessening. Deaths passed 300 and cases are now firmly entrenched on both sides of the state with the rural, less densely populated areas in the Ridge and Valley portion of the state seemingly hit not as hard.

The situation in Pennsylvania

Despite the potential flattening, New Jersey is just in a rough spot. The final bastions of low case numbers in South Jersey are slowly filling up as Cape May County passed the 100-case threshold.

The situation in New Jersey

Delaware continues to accelerate and is now past 1000 cases.

The situation in Delaware

Virginia continues to see cases spreading in the eastern, more populous portions of the state. And at 75 deaths, it’s nearing the 100-death threshold.

The situation in Virginia

Illinois is seeing deaths occur away from Chicago, in the St. Louis suburban counties and in and around Springfield and Champaign and Bloomington areas.

The situation in Illinois

Credit for the piece goes to me.

Tuesday’s Data on Covid-19

Here are the Tuesday figures for Pennsylvania, New Jersey, Delaware, Virginia, and Illinois. At the end is an updated version of the flattening curves chart as well. Given the value of these graphics, which people have been texting, emailing, and DMing me about on social media, I might consider making these a regular staple here on my blog as well. I would probably slowly write about other graphics covering the outbreak as well.

Any feedback is welcome on how to make the graphics more useful to you, the public.

Pennsylvania has finally reached the point where the virus has infected at least one person in every county. Now, if we shift our attention a wee bit to the deaths, we can see those are still largely confined to the eastern third of the state.

The condition in Pennsylvania

New Jersey continues to suffer greatly. But a sharp increase in new cases could be a blip, or it could mean the curve isn’t flattening. We need more data to see a longer trend. Regardless, over 3000 more people were reported infected and over 200 more died.

The condition in New Jersey

Delaware worsened significantly. As a small state, it has a lower captive population. But it is rapidly approaching 1000 cases. In fact, I would not be surprised if that is the headline from Wednesday.

The condition in Delaware

Virginia also saw a significant uptick in cases. And most counties and independent cities in eastern Virginia now report cases. But the rural, mountainous counties in the west and southwest are not uniformly infected. At least not yet.

The condition in Virginia

Illinois saw some geographic spread, but again, compared to a state like Pennsylvania, the worst in Illinois is disproportionately concentrated in the Chicago metropolitan area.

The condition in Illinois

Lastly, the curves are not flattening in any of the states except maybe New Jersey. But as I noted above, the higher daily cases there might be a blip.

The state of curves

Credit for the pieces goes to me.

Where’s My Corona? Another Round, Please

This past weekend I continued looking at the spread of COVID-19 across the United States. But in addition to my usual maps of Pennsylvania, New Jersey, Delaware, Virginia, and Illinois, I also looked at the number of cases across the United States adjusted for population. I then looked at the five aforementioned states in terms of new cases to see if the curve is flattening. Finally, I looked at the number of hospital beds per 1000 people vs the number of cases per 1000 people.

The latter in particular I wanted to be an examination of hospitalisation rates vs ICU beds, which are a small fraction of total hospital beds. But as I could not find that data, I made do with overall cases and overall beds.

So first let’s look at the cases across the U.S. What you can see is that whilst New York and New Jersey do have some of the worst of the impact, Washington is still not great and Louisiana and Michigan are also suffering.

The situation across the United States

And then when we look at the states by their cases per 1000 people and their hospital beds per 1000 people, we see that the states often claimed to be overwhelmed, New York, New Jersey, and Washington, are all well over, or near, the blue line, which indicates an equal number of beds and cases per 1000 people. It is important to remember, however, that not all beds are the type needed for COVID-19 victims, who often require the more fully kitted out ICU beds. Additionally, not all cases are severe enough to warrant hospitalisation.
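The comparison against the blue line amounts to a simple per-capita calculation, sketched here in Python. The state names, populations, case counts, and bed counts are all hypothetical, and `per_thousand` and `over_the_line` are illustrative helpers rather than the code behind the chart.

```python
def per_thousand(count, population):
    """Rate per 1000 people."""
    return 1000.0 * count / population


def over_the_line(cases, beds, population):
    """True when cases per 1000 exceed hospital beds per 1000."""
    return per_thousand(cases, population) > per_thousand(beds, population)


# Hypothetical states: population, confirmed cases, total hospital beds.
states = {
    "State A": {"population": 8_000_000, "cases": 40_000, "beds": 20_000},
    "State B": {"population": 5_000_000, "cases": 5_000, "beds": 15_000},
}

flags = {
    name: over_the_line(data["cases"], data["beds"], data["population"])
    for name, data in states.items()
}

for name, flagged in flags.items():
    data = states[name]
    cases_rate = per_thousand(data["cases"], data["population"])
    beds_rate = per_thousand(data["beds"], data["population"])
    print(f"{name}: {cases_rate:.1f} cases vs {beds_rate:.1f} beds per 1000, over the line: {flagged}")
```

The same caveats apply to the sketch as to the chart: total beds overstate ICU capacity and total cases overstate hospitalisations.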

Cases per 1k people vs hospital beds per 1k people

Then from the broader national view, we can look at the states of interest. Here, those of you who have been following my social media posts can see fewer dark purples in these maps. That’s because I have adopted a new palette that has sacrificed granularity at the lower end of the scale and added it at the top, a particular need in New Jersey and the Philadelphia and Chicago metro areas. And finally we look at the daily new cases to see if that curve is flattening.

Pennsylvania now has almost every county infected. But unlike Illinois, which has a similar infection rate but more unaffected counties, Pennsylvania has fewer cases in its big city, Philadelphia, and has more cases in the smaller cities and towns.

The situation in Pennsylvania

New Jersey is just a disaster. Deaths are now reported in every county, so I can probably remove those orange outlines. The only potential good news is that new cases for the second day in a row were fewer than the day before. It could be a blip. But it could also be a signal that the peak of infection has arrived or is nearing. That said, hospitalisations and deaths are lagging indicators and could take two weeks to follow the positive test results. So in the best case scenario that this is a peak, New Jersey is far from out of the woods.

The situation in New Jersey

Delaware is the smallest state I look at—and one of the smallest in the union overall—but its cases are worryingly increasing rapidly, although like every state I examine in detail it had fewer new cases Sunday than Saturday.

The situation in Delaware

Virginia is in a better spot overall than the other four states. You can see that in the national map above. And most of Virginia’s cases are concentrated in the DC and Richmond areas as well as the cities along the peninsulas jutting into the Chesapeake.

The situation in Virginia

Illinois is, as noted above, similar to Pennsylvania in terms of infections. In terms of deaths, however, it is doubling Pennsylvania’s numbers. And most of its cases are located in and around Chicago. Big chunks of downstate Illinois are unaffected or lightly affected compared to the Commonwealth.

The situation in Illinois

Finally, as I noted in New Jersey, could these lower numbers Sunday than Saturday be meaningful? Possibly. But in all five states? Highly unlikely. Regardless, we can look at the number of daily new cases and see if that curve of infection is flattening. We should wait several days before beginning to make that assessment. But one can hope.

The case for flattening curves

All of this is to say that things are bad and likely will continue to get worse. But I will keep looking at the data daily and presenting it to the public to keep them informed.

Credit for this piece is mine.