Biden’s Biggest Pyramids

Yesterday we looked at an article from the Inquirer about the 2020 election and how Biden won because of increased margins in the suburbs. Specifically, we looked at an interactive scatter plot.

Today I want to talk a bit about another interactive graphic from the same article. This one is a map, but instead of the usual choropleth—a form the article uses in a few other graphics—here we’re looking at three-dimensional pyramids.

All the pyramids, built by aliens?

Yesterday we talked about the explorative vs. narrative concept. Here we can see something a bit more narrative in the annotations included in the graphic. These, however, are only a partial win. They call out the greatest shifts, which are indeed mentioned in the text. But in another paragraph the author writes about Bensalem and its rightward swing, and there’s no callout of Bensalem on the map.

But the biggest things here, pun intended, are those pyramids. The first thing this map fails to communicate, unlike the choropleth maps used elsewhere in the article, is scale. We know the colour means a municipality’s net shift was either Democratic or Republican. But what about the magnitude? A big pyramid likely means a big shift, but is that big shift hundreds of votes? Thousands of votes? How many thousands? There’s no way to tell.

Secondly, when we are looking at rural parts of Bucks, Chester, and Montgomery Counties, the pyramids are fine. They remain small and contained within their municipality boundaries. Intuitively this makes sense. Broadly speaking, population decreases the further you move from the urban core. (Unless there’s a secondary city, e.g. Minneapolis has St. Paul.) But nearer the city, we have more population, and we have geographically smaller municipalities. Compare Colwyn, Delaware County to Springfield, Bucks County. Tiny vs. huge.

In choropleth maps we face this problem all the time. Look at a classic election map at the county level from 2016.

Way back when…

You can see that there is a lot more red on that map. But Hillary Clinton won the popular vote by nearly 3,000,000 votes. (No, I won’t rehash the Electoral College here and now.) Far more people are crowded into the small blue counties than are spread across those big, expansive red counties.

And that pattern holds true in the Philadelphia region. But instead of using the colour fill of an area as above, this map from the Inquirer uses pyramids. We face the same problem: lots of pyramids in a small space. And the problem with the pyramids is that they overlap each other.

At a glance, you cannot see a pyramid hidden behind another. In the choropleth, we at least see a tiny field of colour, and that colour is not hidden behind another.

Additionally, given the way this is constructed, what happens if a municipality had only a small net shift? The pyramid’s height will be minimal. But to determine the direction of the shift we need to see the colour, and if the area under the line creating the pyramid is small, we may be unable to see the colour. Again, compare that to a choropleth, where there would at least be a difference between, say, a light blue and a light red. (Though you could also collect all the small shifts, whichever direction they went, into a single neutral bin.)
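The neutral-bin idea is simple enough to sketch. The Python below is a minimal, hypothetical example, not the Inquirer’s method: it assumes each municipality’s net shift is expressed as the change in the Democratic margin in percentage points (positive for a Democratic shift, negative for a Republican one), and the 1-point neutral and 5-point “large” thresholds are illustrative values chosen for this sketch.

```python
def shift_bin(shift_pts: float, neutral: float = 1.0, large: float = 5.0) -> str:
    """Assign a net shift to a diverging bin.

    shift_pts: change in the Democratic margin, in percentage points
    (positive = Democratic shift, negative = Republican shift).
    The neutral and large thresholds are hypothetical, for illustration only.
    """
    if abs(shift_pts) < neutral:
        # Small shifts in either direction share a single grey bin.
        return "little or no change"
    direction = "Democratic" if shift_pts > 0 else "Republican"
    strength = "large" if abs(shift_pts) >= large else "modest"
    return f"{strength} {direction} shift"


if __name__ == "__main__":
    # Made-up shifts, not real municipal results.
    for shift in (-7.2, -2.0, 0.4, 3.1, 8.5):
        print(f"{shift:+.1f} pts -> {shift_bin(shift)}")
```

Each bin maps to one fill colour, so even a tiny shift still shows its direction as a light red or light blue (or lands in the grey bin) instead of vanishing under a flat pyramid.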

I really think that a more straightforward choropleth would show the net shifts here more clearly. And even then, we would still need a legend.

The article overall, though, is quite strong and a great read on the electoral dynamics of the Philadelphia region a month ago.

Credit for the piece goes to John Duchneskie.

Covid Migration

Yep, Covid-19 remains a thing. About a month or so ago, an article in CityLab (now owned by Bloomberg) looked at the data to see if there was any truth to the notion that people are fleeing urban areas. Spoiler: they’re not, except in a few places. The entire article is well worth a read, as it looks at what is actually happening in migration and why some cities like New York and San Francisco are outliers.

But I want to look at some of the graphics going on inside the article, because those are what struck me more than the content itself. Let’s start with this map titled “Change in Moves”, which examines “the percentage drop in moves between March 11 and June 30 compared to last year”.

Conventionally, what would we expect from this kind of choropleth map? We have a sequential stepped gradient headed in one direction, from dark to light. Presumably we are looking at one metric, change in moves, in one direction: the drop, or negative.

But look at that legend. Note the presence of the positive 4: there is an entire positive range within this stepped gradient. Conventionally we would expect to see some kind of diverging scheme split at zero, with red for a drop and blue for a gain. Others might add a grey bin covering the states with slight-to-no change, say from negative one to positive one. Here, though, we have neither. Nor do we even get a natural split at zero. Instead, the dark bin gives way to a slightly less dark bin at positive four, so everything from -16 up to four falls in the darker bin.

Look at the language, too, because that’s where it becomes potentially more confusing. If the choropleth largely focuses on the “percentage drop” and has negative numbers, a negative of a negative would be…a positive. Texas’s -25% drop could easily be misread because of that double negative. Compare Texas to Nebraska, which had a 2% drop. Does that mean Nebraska’s moves actually declined by 2%, or does it mean they rose by 2%?

A cleanup of the data definition to, say, “Percentage change in moves from…” could clear up a lot of this ambiguity. Changing the colour scheme from a single gradient to a divergent one, with a split around zero (perhaps with a bin for little-to-no change), would make it clearer which states were in the positive and which were in the negative.
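The wording fix can be sketched, too. This minimal Python example, using made-up move counts rather than CityLab’s data, computes a signed percentage change and then states the direction in words, so a fall reads as “down 25%” rather than a “-25% drop”. The ±1% tolerance for “little or no change” mirrors the grey bin suggested above and is an assumption, not part of the original map.

```python
def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (after - before) / before * 100


def describe(change: float, tolerance: float = 1.0) -> str:
    """State the direction once, in words, instead of stacking a minus
    sign on top of the word 'drop'. The tolerance is a hypothetical
    neutral band for little or no change."""
    if abs(change) < tolerance:
        return "little or no change"
    direction = "up" if change > 0 else "down"
    return f"{direction} {abs(change):.0f}%"


# Hypothetical move counts: (last year, this year).
for before, after in [(1000, 750), (500, 510), (800, 804)]:
    print(describe(pct_change(before, after)))
```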

The article continues with another peculiar choice in its bar charts when it explores the data on specific cities.

Here we see the destinations of people moving out of San Francisco, using, as a note explains, requests for quotes as a proxy for the numbers of actual moves. What interests me here is the minimalist take on the bar charts. Note the absence of an axis, which leaves the bars almost groundless for comparison, except that the designer attached data labels to the ends of the bars.

Normally data labels are redundant. The point of a visualisation is to visualise the comparison of data sets. If hyper-precise differences to the decimal point are required, tables are often a better choice. But here there are no axis labels to tell the user what the length of a bar means.

It’s a peculiar design decision. If we think of labelling as data ink, are data labels a more efficient use of ink than axis labels? I would venture to say no. You would probably have five axis labels (0–4) and then a line to connect them. That’s probably less ink (or fewer pixels) than the data labels here. I prefer axis lines to help guide the user from the labels up (in this case) through the bars. Maybe the axis lines make for more data ink than the labels? It’s hard to say.

Regardless, this is a peculiar decision. That said, I should note it’s eminently more defensible than the choropleth map, which needs a rethink in both design and language.

Credit for the piece goes to Marie Patino.

Trumpsylvania

After working pretty much non-stop all spring and summer, your humble author finally took a few days off. Throw in a bank holiday and you are looking at a five-day weekend. But because this is 2020, travelling was out of the question, and so instead I hunkered down to finish writing/designing an article I have been working on for the last several weeks/few months.

The main write-up (it is a lengthy-ish read, so you may want to brew a cup of tea) is over at my data projects site. This is the first project I have really written about there since spring/summer 2016. Some of my longtime readers may recall that the penultimate piece there, about Pennsyltucky, was inspired by work I did here at Coffeespoons.

To an extent, so is this piece. I wrote about Trumpsylvania, the political realignment of the state of Pennsylvania. 2016 and the state’s vote for Donald Trump was less an aberration than many think. It was the near-end result of a decades-long transformation of the state’s political geography. And so I looked at the data underlying the shift and how and where it occurred.

And originally, I had a slightly different conclusion as to how this related to Pennsylvania in the upcoming 2020 election. But the whole 2020 thing made me shift my thinking slightly. You’ll have to read the whole thing to understand what I’m talking about. I will leave you with one of the graphics I made for the piece. It looks at who won each county in the state, but also whether or not the candidate was able to flip the county. In other words, was Clinton able to flip a Republican county? Was Trump able to flip a Democratic county?

Who won what? Who flipped what?

Let me know what you think.

And of course, many, many thanks to all the people who suffered through my ideas, thoughts, and early drafts over the last several weeks. And even more thanks to those who edited it. Any and all errors in the piece are mine and not theirs.

Credit for the piece is mine.

Parties in Pennsylvania

This is from a social media post I made a few days ago, but I think it may be of some relevance/interest to my Coffeespoons followers. I was curious: with the general election 30-odd days away, how has the landscape changed for the two parties since 2016?

Well, this project has driven me to a related, but slightly different project that has been consuming my non-work time. Hopefully I will have more on that in the coming days. Without further ado, the post:

Pennsylvania will likely be one of the more critical battleground swing states in this year’s election. In 2016, then-candidate Trump won the state by less than one percentage point. But four years is a long time and I was curious to see how things have changed.

In the first chart, counties won by Trump are on the right and counties won by Clinton are on the left. The further from the centre, the greater the candidate’s margin of victory over the other. The top half plots registered Republicans’ margin over Democrats as a percentage of all registered voters in the county (including independents and third parties), and the bottom half does the same for Democrats. The closer to the centre, the more competitive the county; the further away, the less so.
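For anyone wanting to reproduce the registration axis, the calculation described above is straightforward. A minimal Python sketch, using hypothetical registration counts rather than actual county figures, with a two-party version alongside for comparison:

```python
def registration_margin(rep: int, dem: int, other: int) -> float:
    """Republican registration margin over Democrats, as a percentage of
    ALL registered voters (independents and third parties included).
    Positive favours Republicans, negative favours Democrats."""
    total = rep + dem + other
    return (rep - dem) / total * 100


def two_party_margin(rep: int, dem: int) -> float:
    """Same margin, but measured against Republicans plus Democrats only."""
    return (rep - dem) / (rep + dem) * 100


# Hypothetical county: 40,000 Republicans, 30,000 Democrats, 10,000 others.
print(f"{registration_margin(40_000, 30_000, 10_000):+.1f}")
print(f"{two_party_margin(40_000, 30_000):+.1f}")
```

Using all registered voters as the denominator, as the chart does, means a county with many independents shows a smaller margin than it would on a two-party basis.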

Trump’s key to victory was the white, working-class voter clustered in the west and the northeast of the state: old mining and steel towns. There, Democrats normally counted on the support of organised labour, whose members were often registered Democrats. That support all but collapsed in 2016. The bottom right shows a number of nominally Democratic counties Trump won, whereas Clinton picked up only one Republican county, Chester.

But what are PA’s battlegrounds?

In the second chart we ignore places like Philly and Fulton County and zoom in on the more competitive counties, those within 20-point margins. Polls presently point to a Biden lead of about 5 points in PA. If every dot moved left by 5 points (it doesn’t really work like that), we only see Erie and Northampton with potential to flip.
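Moving every dot left by 5 points is what is usually called a uniform swing. The Python sketch below shows the mechanics with made-up county names and margins (positive values are Trump margins in percentage points). It is not the data behind the chart, and, as noted, real swings are never this even.

```python
from typing import Dict, List

# Hypothetical 2016 margins in percentage points; positive = Trump, negative = Clinton.
HYPOTHETICAL_MARGINS: Dict[str, float] = {
    "County A": 1.6,
    "County B": 3.8,
    "County C": 9.0,
    "County D": -4.0,
}


def apply_uniform_swing(margins: Dict[str, float], swing: float) -> Dict[str, float]:
    """Move every county's margin toward the Democrat by `swing` points."""
    return {county: margin - swing for county, margin in margins.items()}


def flips(margins: Dict[str, float], swing: float) -> List[str]:
    """Counties won by Trump that the Democrat would carry after the swing."""
    shifted = apply_uniform_swing(margins, swing)
    return sorted(c for c, m in margins.items() if m > 0 and shifted[c] < 0)


print(flips(HYPOTHETICAL_MARGINS, 5.0))
```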

But Trump’s realignment of politics is also accelerating a realignment of PA’s political geography (more on this another day).

In the fourth chart, neither Erie nor Northampton show any real movement via party registration back to Democrats. Erie may flip, but Northampton’s likely a stretch. Places like Cumberland and Lancaster counties are too solidly Republican to flip this year. Instead Trump is more likely to flip counties like Monroe and Lehigh red, even if he loses the state.

Because, though not shown here, the key to a Biden victory will be running up the margins in Philly and Pittsburgh and, to a lesser extent, Philly’s four collar counties, including Chester, which appears to be rapidly shifting in Democrats’ favour.

Credit for the piece is mine.

The Size of the California Wildfires Compared to Philly

The West Coast is on a different scale than the East Coast. After all, California alone is almost the size of New England and parts of the Mid-Atlantic combined. So when we take that enormous size into consideration, how big are these fires on an East Coast scale? It can be difficult to imagine.

Thankfully the Philadelphia Inquirer addressed the issue.

It’s a simple concept, but I love these kinds of graphics. The East Coast is dense, and its cities and towns are clustered closer together since they were founded before personal automobiles existed. And so the August Complex fire in California would cover a significant portion of the Philadelphia metropolitan area, almost wiping it all off the map.

Credit for the piece goes to John Duchneskie.

It’ll Get Cooler Eventually

President Trump, on climate change.

I mean, technically he’s correct. Eventually the universe will likely end with heat death as all the energy dissipates and stars die out and space becomes a truly empty, cold void. So it’ll get cooler, eventually.

But what about right now? In one to three generations’ time? 30–90 years? Not looking so great.

So what sparked this ludicrous comment? The West Coast’s wildfire season, usually largely confined to California, has this year also burned forests in both Washington and Oregon, states whose usually wetter climates inhibit these kinds of rapidly spreading fires.

A few days ago the Washington Post published a piece looking at the fires out west. It started with a map showing ultimate fire perimeters and currently active fires.

In a normal year, those fires in Oregon and Washington wouldn’t be there. Welcome to the new normal.

Frequent readers will know I’m not a fan of the dark background for graphics, but I’m betting it was chosen because as you scroll through the article, it makes the photojournalism really pop off the page. Contrast the bright yellows, oranges, and reds with a dark black background and c’est magnifique, at least from a design standpoint. And given this piece is really about the photography depicting the horrors on the West Coast, it’s an understandable design decision.

Credit for the piece goes to Laris Karklis.

Double Your Hurricanes, Double Your Fun

In a first, the Gulf of Mexico basin has two active hurricanes simultaneously. Unfortunately, both are likely to strike somewhere along the Louisiana coastline within approximately 36 hours of each other. Fortunately, neither is as strong as Katrina, which made a mess of things 15 years ago now.

Over the last few weeks I have been trying to start the week with my Covid datagraphics, but I figured we could skip those today and instead run with this piece from the Washington Post. It tracks the forecast path and forecast impact of tropical storm force winds for both storms.

The forecast path above is straightforward. The dotted line represents the forecast path. The coloured area represents the probability of that area receiving tropical storm force winds. Unsurprisingly, the areas around the present locations of both storms have the highest probabilities.

Now compare that to the standard National Weather Service graphic, below. The NWS produces one per storm, and I cannot find one showing the combined threat. So I chose Laura, the one likely to strike midweek, and not Marco, the one likely to strike later today.

The first and most notable difference here is the use of colour. The ocean here is represented in blue compared to the colourless water of the Post version. The colour draws attention to the bodies of water, when the attention should be more focused on the forecast path of the storm. But, since there needs to be a clear delineation between land and water, the Post uses a light grey to ground the user in the map (pun intended).

The biggest difference is what the coloured forecast areas mean. In the Post’s versions, it is the probability of tropical storm force winds. But in the National Weather Service version, the white area is actually the “cone”, the envelope or range of potential forecast paths. The Post shows one forecast path, but the NWS shows the full range, and so for Laura that means really anywhere from central Louisiana to eastern Texas. A storm that impacts eastern Texas, for example, could have tropical storm force winds far from the centre and into the Galveston area.

Of course every year the discussion is about how people misinterpret the NWS version as the cone of impact, when that is so clearly not the case. But then we see the Post version and it might reinforce that misconception. Though, it’s also not the Post’s responsibility to make the NWS graphic clearer. The Post clearly prioritised displaying a single forecast track instead of a range along with the areas of probabilities for tropical storm force winds.

I would personally prefer a hybrid sort of approach.

But I also wanted to touch briefly on a separate graphic in the Post version, the forecast arrival times.

This projects when tropical storm force winds will begin to impact particular areas. Notably, the areas of probability of tropical storm force winds do not change. Instead, the dotted-line projections for the paths of the storms are replaced by lines roughly perpendicular to those paths. These lines show when the tropical storm force winds are forecast to begin. It’s also another update on the design of the National Weather Service offering below.

Again, we see only one storm per graphic here, and this is only for Laura, not Marco. But this is also probably the most analogous to what we see in the Post version. Here, the black outline represents the light pink area on the Post map, the area with at least a 5% chance of receiving tropical storm force winds. The NWS version, however, does not provide any further forecast probabilities.

The Post’s version is also an improved design. The blue in the NWS version, while not as dark as the heavy black lines, still draws unnecessary attention to itself. Would even a very pale blue be an improvement? Almost certainly.

In one sense, I prefer the Post’s version. It’s more direct, and the information is more clearly presented. But I find it severely lacking in one key detail: the forecast cone. Even yesterday, the forecast cone had Laura moving along a range of tracks both north and south of Cuba from its position west of Puerto Rico. 24 hours later, we now know it’s on the southern track, and that has a massive impact on future forecast tracks.

Being east or west of landfall can mean dramatically different impacts in terms of winds, storm surge, and rainfall. And the Post’s version, while clear about one forecast track, obscures the very real possibility that the range of impacts can shift dramatically in the course of just one day.

I think the Post does a better job of the tropical storm force wind forecast probabilities. In an ideal world, they would take that approach to the forecast paths. Maybe not showing the full spaghetti-like approach of all the storm models, but a percentage likelihood of the storm taking one particular track over another.

Credit for the Post pieces goes to the Washington Post graphics department.

Credit for the National Weather Service graphics goes to the National Weather Service.

A Map of Unequal Comparisons

I’ve largely been busy creating and posting content on the Covid pandemic and its impact on the Pennsylvania, New Jersey, and Delaware tristate area along with, by request, Virginia and Illinois, my former home. It leaves me very little time for blogging, and I really do not want this site to become a blog of my personal work. That’s why I have a portfolio and my data projects site, after all.

But in posting my Covid datagraphics, I’ve come across variations of this map with all sorts of meme-y, witty captions saying why Canada is doing so much better than the US, why Americans shouldn’t be allowed to travel to Canada, and now why the Blue Jays shouldn’t be allowed to host Major League Baseball games.

Wait just a minute, there…

Well, that map isn’t necessarily wrong, but it’s incredibly misleading.

First, the map comes from the fantastic Johns Hopkins work on Covid-19. (Full disclosure, that’s the data source I use at work to create my work work datagraphics: https://philadelphiafed.org/covid-19/covid-19-research/covid-19-cases-and-deaths#.) And their site has a larger and more comprehensive dashboard (still hate that term but it does have sticking power) of which the map is the focal point.

The numbers as of this posting.

You can see the map there in the centre, some tables to the left, some tables to the right, and even a micro table beneath, encroaching on the map’s position. I could get into the overall design (maybe I will one of these days), but again, let’s look at that map.

The crux of the argument is that there are a lot of red dots in the United States and very few in Canada. But look at the table in the dashboard on the left. At the very bottom you see three small tabs, Admin 0, Admin 1, and Admin 2. Admin 0 contains all entities at the sovereign state level, e.g. US, Canada, Sweden, Brazil, &c. Admin 1 is the provincial/state level, e.g. Pennsylvania, Illinois, Ontario, Quebec, &c. Admin 2 is the sub-provincial/sub-state level, e.g. Philadelphia County, Cook County, Chester County, Lake County, &c.

Notice anything about my examples? Not all countries have provinces/states, but Canada certainly does. And then at Admin 2, the examples and indeed the data only have US counties and US data. Everything in Canada has been aggregated up to Admin 1. And that is the problem.

The second part to point out is the dot-ness of the map. And to be fair, this is part of a broader problem I have been seeing in data visualisation the last few months. Dots, circles, or markers imply specificity in location. The centre of that object, after all, has to fall on a specific geographic place, a latitude and longitude coordinate. It utterly fails to capture the dimensions and physical size of the geographic unit, which can be critical.

Because not all geographic units are the same size. We all know Rhode Island as one of the smallest US states. Let’s compare that to Nunavut or Yukon in Canada, massive territories that spread across the Canadian Arctic. Rhode Island, according to Google, covers 1,212 square miles. Nunavut? 808,200.
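The size gap is easy to quantify from the two figures quoted above. Because a dot has the same footprint wherever it is drawn, this ratio is also roughly how much larger a share of Rhode Island than of Nunavut a single dot covers (for a dot small enough to sit entirely inside both). A minimal Python sketch:

```python
# The two areas quoted in the text; the ratio holds in any unit,
# as long as both figures use the same one.
RHODE_ISLAND_AREA = 1_212
NUNAVUT_AREA = 808_200

ratio = NUNAVUT_AREA / RHODE_ISLAND_AREA
print(f"Nunavut is roughly {ratio:,.0f} times the size of Rhode Island")
```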

So now show both states/territories on a map with one dot each, and Rhode Island’s will practically cover the state. It will also be surrounded by, and in close proximity to, the states of Massachusetts and Connecticut. Nunavut, on the other hand, will be a small dot in a massive empty space on a map. But those dots are equal.

Now, combine that with the fact that the Hopkins map is showing data on the US county level. Every single county in the United States gets a red dot. By default, that means the US is covered with red dots. But there is no county-level equivalent data for Canada. Or for Mexico (also seen in the above graphic). And so given we’re only using dots to relate the data, we see wide swaths of empty space, untouched by red dots. And that’s just not true.

Yes, large parts of the Canadian Arctic are devoid of people, but not southern Ontario and Quebec, not the southwestern coast of British Columbia, not the Maritimes.

The Hopkins map should be showing geographic units at the same admin level. By that I mean that on Admin 0, the map should reflect geographic units at the sovereign state level, allowing us to compare the US to Canada directly. And (for this argument I’m assuming we’re keeping the dots despite their flaws) we should see only Admin 0 level data.

Admin 1 would show only provincial/state level data. Some countries would begin to disappear, because Hopkins does not have the data at that level. But in North America, we could still compare Pennsylvania and Illinois to Ontario and Quebec.

Then at Admin 2, we would see only the numerous dots of the United States counties. It’s neither an accurate nor a helpful comparison to contrast Chester County or Will County with the entire province of Ontario, and so the map should not allow it. Instead, as the above graphic shows, the current map creates misconceptions of the true state of the pandemic in the US and Canada.
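The like-for-like rule can be sketched in a few lines of Python. The records below are hypothetical case counts in a simplified (country, admin 1, admin 2) shape, not the Johns Hopkins data format. Each map level draws one dot per unit at that level, and a country with no data at the requested level is dropped rather than mixed in at a coarser level.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical records: (country, admin 1, admin 2 or None, cases).
Record = Tuple[str, str, Optional[str], int]
RECORDS: List[Record] = [
    ("US", "Pennsylvania", "Chester", 100),
    ("US", "Pennsylvania", "Philadelphia", 900),
    ("US", "Illinois", "Cook", 1200),
    ("US", "Illinois", "Will", 300),
    ("Canada", "Ontario", None, 800),  # no county-level data
]


def aggregate(records: List[Record], level: int) -> Dict[tuple, int]:
    """Sum cases to one admin level: 0 = country, 1 = state/province, 2 = county."""
    totals: Dict[tuple, int] = defaultdict(int)
    for country, admin1, admin2, cases in records:
        key = (country, admin1, admin2)[: level + 1]
        if None in key:
            # No data at this level: leave the unit off the map rather than
            # drawing a province-sized unit next to county-sized ones.
            continue
        totals[key] += cases
    return dict(totals)


for level in (0, 1, 2):
    print(level, aggregate(RECORDS, level))
```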

Credit for the Hopkins dashboard goes to, well, Hopkins.

Wednesday’s Covid-19 Data

Here we have the data from Wednesday for Covid-19.

The situation in Pennsylvania

Pennsylvania saw continued spread of the virus. Notably, Monroe County in eastern Pennsylvania passed 1,000 cases. It was one of the state’s earliest hotspots, apparently because it had been advertised as a corona respite for people from New York, not too far to the east and by then in the grips of its own outbreak.

The situation in New Jersey

New Jersey grimly passed 5,000 deaths Wednesday, and it is on track to pass 100,000 total cases, likely Friday or Saturday. Almost 2/3 of these cases are located in North Jersey, with some South Jersey counties still reporting just a few hundred cases and a handful of deaths.

The situation in Delaware

Delaware passed 3,000 cases and Kent Co. passed 500. While those don’t read like large numbers, keep in mind the relatively small population of the state.

The situation in Virginia

Virginia has restarted reporting deaths, this time at the county level and not the health district level. What we see is deaths being reported all over the eastern third of the state from DC through Richmond down to Virginia Beach. In the interior counties we are beginning to see the first deaths appear. And in western counties, we still see that the virus has yet to reach some locations, but counties are beginning to report their first cases.

The situation in Illinois

Illinois continues to suffer greatly in the Chicago area, and at levels that dwarf the remainder of the state. However, the downstate counties are beginning to see spikes of their own. Macon and Jefferson Counties each saw increases of 30–40 cases in just 24 hours.


How about those curves?

A longer-term look at the states shows how their outbreaks diverge. Pennsylvania looks like it might be forcing the curve downward, whereas New Jersey appears to have plateaued. Earlier I expressed concern about Virginia, which now appears not to have peaked and continues to see an increasing rate of spread. Then we have Illinois, which may have plateaued, but we need to see if yesterday’s record number of new cases was a blip or an inflection point. And in Delaware, a missing day of records makes it trickier to see exactly what the trend is.
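One way to read a trend through a gap like Delaware’s missing day is to work from cumulative totals rather than daily counts: the average daily increase over the last seven days is just today’s running total minus the one from a week earlier, divided by seven. A minimal Python sketch with made-up totals, assuming the running total catches up once reporting resumes:

```python
from typing import List, Optional


def average_daily_new(cumulative: List[Optional[int]], window: int = 7) -> List[Optional[float]]:
    """Trailing average of new cases per day, computed from running totals.

    cumulative: one running case total per calendar day, None where that
    day's report is missing. A gap inside the window does not matter; only
    the two endpoints must be present.
    """
    out: List[Optional[float]] = []
    for i, today in enumerate(cumulative):
        j = i - window
        if j < 0 or today is None or cumulative[j] is None:
            out.append(None)  # not enough history, or an endpoint is missing
        else:
            out.append((today - cumulative[j]) / window)
    return out


# Made-up running totals with one missing day at index 3.
totals = [0, 10, 20, None, 40, 50, 60, 70, 80]
print(average_daily_new(totals))
```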

Credit for the piece is mine.

Comparing Covid-19 to Influenza

I want to share a small graphic I made yesterday evening. And I am being charitable with the term graphic. Really it is nothing more than a collection of organised factettes. But I have seen the footage of those protesting the lockdowns in various states, including Pennsylvania.

To be clear, people can have different policy prescriptions to solve the pandemic. For example, the governor of Pennsylvania is considering lifting the lockdown piecemeal once the state overall has sufficient testing and tracing capabilities. Look at the state.

The situation in Pennsylvania

He rightly said that Cameron County, one of the little light purple shapes in the upper left, with its one case for the last 25 days is in a different situation than Philadelphia where cases continue to grow, albeit at a slowing rate. And in the future it is possible that Cameron County could open before Philadelphia. That is a different policy prescription than, say, opening the state all at once.

I don’t think most people enjoy lockdown. I haven’t left my building in 38 days and I cannot wait to leave and go do something. But I recognise that outside these walls a deadly pandemic is spreading, one for which we have no vaccine. But then I see people protesting, in a manner that contradicts the guidelines put out by health officials, and claiming that we should open up because this is no worse than the flu.

Well, Covid-19 is not the flu. It is much worse.

This isn’t your grandmother’s flu. Or anyone else’s flu. Because this isn’t the flu.

Now, those numbers will change because the pandemic is ongoing. But, let’s spitball. Let’s assume those numbers hold. The idea of the shutdowns, lockdowns, and quarantines is to prevent the spread of the virus. For the sake of this thought experiment, let’s just assume, however, that it infects 56 million people, the upper end of the range for this most recent influenza season.

Influenza this year killed as many as 62,000 people after infecting 56 million. Hypothetically, with a mortality rate of 5%, Covid-19 would kill 2,800,000 people.

With a 4% rate that drops to 2,240,000.

With a 3% rate that drops to 1,680,000.

With a 2% rate that drops to 1,120,000.

With a 1% rate that drops to 560,000.

With a 0.5% rate that drops to 280,000.

And even at 0.5% that is still far greater than the flu. And so that is why it is so important to keep the number of people infected as low as possible. (And I won’t even get into the surge problem of overwhelmed hospitals, which acts as a force multiplier and is the proximate reason for the lockdowns.)
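The arithmetic above is easy to check. A minimal Python sketch, reusing the 56 million infections and 62,000 flu deaths cited above as fixed inputs for the thought experiment:

```python
INFECTIONS = 56_000_000  # upper end of this season's flu infections, as cited above
FLU_DEATHS = 62_000      # upper end of this season's flu deaths, as cited above


def hypothetical_deaths(rate_pct: float, infections: int = INFECTIONS) -> int:
    """Deaths if `infections` people were infected at a `rate_pct` fatality rate."""
    return round(infections * rate_pct / 100)


# The flu's implied rate in the same thought experiment, about 0.11%.
flu_rate_pct = FLU_DEATHS / INFECTIONS * 100
print(f"Flu: {flu_rate_pct:.2f}%")

for rate in (5, 4, 3, 2, 1, 0.5):
    deaths = hypothetical_deaths(rate)
    print(f"{rate}%: {deaths:,} deaths, {deaths / FLU_DEATHS:.1f} times the flu")
```

Even the lowest rate in the list is several times the flu’s implied rate, which is the point of the exercise.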

This is not the flu.

Credit for the piece is mine.