Choropleths and Colours

In many cities throughout the United States, real estate is a hot commodity. It’s not difficult to understand why: as we have covered before, Americans are saving a bit more. Coupled with stay-at-home orders during a pandemic, spending that cash on a home down payment makes a lot of sense for a lot of people. But with little new construction, it’s a seller’s market.

The Philadelphia Inquirer covers that angle for the Philadelphia region, and the article includes a map of how long it takes to sell a house. It’s that interactive map I want to look at briefly this morning.

Red vs. blue

Primarily I want to discuss the colours, as you can gather from this post’s title. We have six bins here, each covering a one-week interval of selling time. So far so good. Now to the colours: red for homes that sell in one week or less and blue for homes that take five weeks or more.

Blue to red is a pretty standard choice. You will often see it in maps that run from positive growth to negative growth or something similar. I’ve used it myself on Coffeespoons a number of times, like in this map of population growth at the county level here in Pennsylvania.

In those scenarios, however, note how you have positive values and negative values. The change in colour (hue) encodes the change in sign, i.e. positive vs. negative. We then encode the values within that positive or negative range with lighter and darker blues and reds. Most often, the darker the blue or red, the further the value sits toward that end of the spectrum. For example, in Pennsylvania, dark blue meant population growth greater than 8% and dark red meant population declines in excess of 8%.

As an aside you’ll note that there are no dark blue counties in that map and that’s by design. By keeping the legend symmetrical in terms of its minimum and maximum values, we can show how no counties experienced rapid population growth whilst several declined rapidly. If dark blue had meant greater than 4% growth, that angle of the story would have been absent from the map.

Back to our choropleth discussion, however. How does that fit with this map of selling times for homes in the Philadelphia region?

Note first that five weeks is a positive value. But so is one week or less. The use of the red-blue split here is not immediately intuitive. If this map were about the change or growth in how long homes take to sell, you could certainly see positive and negative rates, and those would make sense in red and blue.

The second thing to understand about a traditional red-blue choropleth is that at some point you have to switch from red to blue: a midpoint, if you will. If you are talking positive/negative, like in my Pennsylvania map, zero makes a whole lot of sense. Anything above zero is blue; anything below zero is red.

Sometimes you will see a third colour, maybe a grey or a purple, between that red and blue. That encodes a fuzzier split between positive and negative. Say you want to give a margin of 1%, i.e. a bin for any geographic area with growth between -1% and +1%. That bin intrinsically holds both positive and negative values at the same time, so a neutral colour like grey, or a blend of the two colours (a purple in the case of red and blue), makes a whole lot of sense.
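To make the symmetric legend and the neutral band concrete, here is a minimal Python sketch of how values might be binned. The thresholds (a ±1% neutral band, 4-point steps, open-ended bins beyond ±8%) and the colour names are assumptions chosen to echo the Pennsylvania example, not the actual settings behind any map discussed here. The `midpoint` parameter is a hypothetical addition for diverging around something other than zero.

```python
# Hypothetical colour names for a seven-bin diverging legend (index -3..+3).
LABELS = {
    -3: "dark red", -2: "red", -1: "light red",
    0: "grey",
    1: "light blue", 2: "blue", 3: "dark blue",
}


def diverging_bin(value: float, midpoint: float = 0.0, neutral: float = 1.0,
                  step: float = 4.0, steps: int = 3) -> int:
    """Place a value in a symmetric diverging legend.

    0 is the neutral band (within `neutral` of the midpoint); +k / -k are the
    k-th bins above / below it. Bin edges sit at multiples of `step` from the
    midpoint, and the outermost bins (+steps / -steps) are open-ended, so both
    sides of the legend share the same limits.
    """
    offset = value - midpoint
    if abs(offset) <= neutral:
        return 0
    index = min(steps, int(abs(offset) // step) + 1)
    return index if offset > 0 else -index


for growth in (0.4, -2.5, 5.0, -9.2):
    print(f"{growth:+.1f}% -> {LABELS[diverging_bin(growth)]}")
```

Because both sides use the same `step` and `steps`, a map where no area reaches the top blue bin shows that absence honestly instead of stretching the scale to fit. Shifting `midpoint` to a benchmark (say, an average selling time) is the same idea with a non-zero centre.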

Here we have nothing like that. Instead we jump from a light yellow for two-to-three weeks to a light blue for three-to-four weeks.

What about that yellow? In a spectrum from dark blue to light blue, the lighter shades still read as blue. But in a red spectrum, a light red becomes pinkish or salmonish depending on the exact type of red you use. (Conversation for another day.) Personal preferences will often push clients to ask a designer to “use less pink” in their maps. I can’t tell you the number of times I’ve heard that.

If that comes up, designers will often keep the blue side of the legend running from dark to light: no complaints there, or at least I’ve never heard any. But for the red side, they’ll switch to varying the hue, or type of colour, instead of running from dark to light red.

Not all colours are as dark as others. Blue and red can be pretty dark. Yellow, however, is a fairly light colour. If you converted the colours to greyscale, blue and red would become very dark greys, but yellow would remain far lighter than the other two.
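One way to check this is to compute each colour's relative luminance, the quantity a greyscale conversion approximates. The sketch below uses the sRGB transfer function with the Rec. 709 / WCAG 2.x luminance weights, applied to pure, fully saturated red, blue and yellow. Those hex values are illustrative assumptions, not the Inquirer's palette.

```python
def srgb_to_linear(channel: int) -> float:
    """Undo sRGB gamma for one 0-255 channel, returning linear light in 0..1."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance (0 = black, 1 = white) of an sRGB hex colour."""
    hex_colour = hex_colour.lstrip("#")
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    # Green dominates perceived lightness; blue contributes very little.
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


for name, hex_colour in [("red", "#FF0000"), ("blue", "#0000FF"), ("yellow", "#FFFF00")]:
    print(f"{name:>6}: {relative_luminance(hex_colour):.3f}")
# red: 0.213, blue: 0.072, yellow: 0.928
```

Yellow comes out more than four times as light as pure red, which is why it can stand in for the light end of a red ramp.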

The designer can use the light yellow as the light red. But to link the yellow to red, they need to move through the hues, or colours, between the two. There’s a whole conversation here about colour theory and pigment and light absorption vs. pixels and light emission, but let’s go back to the colours you learned in primary school (pigment and light absorption). Take your colour wheel: what sits between red and yellow? Orange.

And so if a client objects to a light pink, you’ll see a pseudo dark-to-light red spectrum that uses a dark red, a medium orange, and a light yellow. Just like we see here in this Inquirer map.

Back to the switch between two-to-three weeks and three-to-four weeks, though. What’s the deal? This is my sticking point with the graphic. I am looking for an explanation for the sudden break in colour here, but I don’t see an obvious one.

Why would you use a colour scheme where blue and red diverge around a non-zero value? Let’s say the average home in the region sells in three weeks. Any zip codes in red are selling faster than average (hot markets) and those taking longer than average are in blue (cold markets). That benchmark need not be the current average, however. What if it were last year’s average? Or the national average? These all serve as benchmarks for the presented data and provide valuable context for understanding the market.

Unfortunately it’s not clear what, if any, benchmarks the divergence point in this map reflects. And if there is no reason to change colours mid-legend, with only six bins, a designer could find a single colour, a blue or purple for example, and then provide five additional lighter/darker shades of that to indicate increasing/decreasing levels of speed at which homes sell.
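A single-hue ramp like that is easy to generate. Below is a minimal Python sketch that mixes an arbitrary dark purple (`#3F007D`, an assumption, not a colour from the Inquirer map) toward white in six even steps. The `lightest` parameter is a hypothetical knob that stops short of pure white so the lightest bin stays visible on a white page.

```python
def single_hue_ramp(dark_hex: str, n: int = 6, lightest: float = 0.85) -> list[str]:
    """Return n hex shades running from `dark_hex` toward white.

    `lightest` is how far (0 to 1) the final shade is mixed toward white;
    stopping short of 1.0 keeps the lightest bin distinguishable from the page.
    Requires n >= 2.
    """
    dark_hex = dark_hex.lstrip("#")
    rgb = [int(dark_hex[i:i + 2], 16) for i in (0, 2, 4)]
    shades = []
    for k in range(n):
        t = lightest * k / (n - 1)  # 0.0 for the darkest bin, `lightest` for the last
        mixed = [round(c + (255 - c) * t) for c in rgb]
        shades.append("#{:02X}{:02X}{:02X}".format(*mixed))
    return shades


for shade in single_hue_ramp("#3F007D"):
    print(shade)
```

Mixing in RGB is the simplest approach; interpolating in a perceptual space such as CIELAB usually gives more even-looking steps, at the cost of a colour-conversion step.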

Overall, I left this piece a wee bit confused. The general trend of regional differences in how quickly homes are selling? I get that. But because there’s a non-logical break between red and blue here—or at least one I fail to see in the graphic—this map would work almost as well if each bin were a separate colour entirely, using ROYGBIV as a base for example.

Credit for the piece goes to John Duchneskie.

But What About New Zealand?

It’s time for another just-for-fun Friday post. I once worked with a guy who could draw a map of the United States or the world on a whiteboard incredibly accurately. He then left it in the break room for the office to try and label correctly.

This is kind of that, but in reverse, from xkcd. Good luck.

Which states are missing?

Credit for the piece goes to Randall Munroe.

Warmer, Wetter Winters in the UK

I remember hearing and reading stories as a child about the Thames in London freezing over and hosting winter festivals. Of course most of that happened during what we call the Little Ice Age, a period of below-average temperatures from the 15th through the early 19th century.

But those days are over.

The UK’s Meteorological Office, or the Met Office for short, released some analysis of the impacts of climate change on winter temperatures in the United Kingdom. And if, like me, you’re more partial to winter than summer, the news is…not great.

Winter warming

Broadly speaking, winters will become warmer and wetter, i.e. less snowy and more rainy. Meanwhile summers will become hotter and drier. Farewell, frost festivals.

But let’s talk about the graphic. Broadly, it works. We see two maps with a unidirectional stepped gradient of six bins. And most importantly, those bins are consistent between the maps, allowing the user to compare regions at the same temperatures: like for like.

But there are a couple of things I would probably do a bit differently. Let’s start with colour. And for once we’re not dealing with the colour of the BBC weather map. Instead, we have shades of blue for the data, but all sitting atop an even lighter blue that represents the waters around the UK and Ireland. I don’t think that blue is really necessary. A white background would allow for the warmest shade of blue, +4ºC, to be even lighter. That would allow greater contrast throughout the spectrum.

Secondly, note the use of thick black lines to delineate the sub-national regions of the UK, whilst the border of the Republic of Ireland is drawn in a light grey. What if that were reversed? If the political border between the UK and Ireland were black and the sub-national region borders were light grey, or even white, we would see greater contrast with less visual disruption. Lighter lines would allow the eye to better focus on the colours of the map.

Then we reach an interesting discussion about how to display the data. If the purpose of the map is to show “coldness”, this map does it just fine. For my American audience unfamiliar with Celsius, 4ºC is about 39ºF, and many of you would definitely say that’s cold. (I wouldn’t, because like many of my readers, I spent eight winters in Chicago.)

The article touches upon the loss of snowy winters. And by and large, snow requires temperatures below the freezing point, 0ºC. So what if the map used a bidirectional, divergent stepped gradient? Say temperatures above freezing were represented in shades of a different colour, like red, whilst those below freezing remained blue: what would happen? You could easily see which regions of the UK would no longer see their lowest temperatures fall below freezing.

Another way of looking at the data is through the lens of absolute values vs. change. This graphic compares the lowest annual temperatures. But what if we instead had only one map, colouring the UK by the change in temperature? Then you could see which regions are being the most (or least) impacted.

If the data were tied to specific, discrete geographic units, you could take it a step further, compare temperature change to baseline temperature, and create a simple scatterplot of the regions. That plot would show which cold areas are getting warmer and which are remaining stable.

That said, this is still a really nice piece. Just a couple of little tweaks could really improve it.

Credit for the piece goes to the UK Met Office.

Biden’s Biggest Pyramids

Yesterday we looked at an article from the Inquirer about the 2020 election and how Biden won because of increased margins in the suburbs. Specifically we looked at an interactive scatter plot.

Today I want to talk a bit about another interactive graphic from the same article. This one is a map, but instead of the usual choropleth—a form the article uses in a few other graphics—here we’re looking at three-dimensional pyramids.

All the pyramids, built by aliens?

Yesterday we talked about the explorative vs. narrative concept. Here we can see something a bit more narrative in the annotations included in the graphic. These, however, are only a partial win. They call out the greatest shifts, which are indeed mentioned in the text. But in another paragraph the author writes about Bensalem and its rightward swing, and there’s no callout of Bensalem on the map.

But the biggest things here, pun intended, are those pyramids. The first problem is that, unlike the choropleth maps used elsewhere in the article, this map fails to communicate scale. We know the colour means a municipality’s net shift was either Democratic or Republican. But what about the magnitude? A big pyramid likely means a big shift, but is that big shift hundreds of votes? Thousands of votes? How many thousands? There’s no way to tell.

Secondly, when we are looking at rural parts of Bucks, Chester, and Montgomery Counties, the pyramids are fine. They remain small and contained within their municipality boundaries. Intuitively this makes sense. Broadly speaking, population decreases the further you move from the urban core. (Unless there’s a secondary city, e.g. Minneapolis has St. Paul.) But nearer the city, we have more population, and we have geographically smaller municipalities. Compare Colwyn, Delaware County to Springfield, Bucks County. Tiny vs. huge.

In choropleth maps we face this problem all the time. Look at a classic election map at the county level from 2016.

Way back when…

You can see that there is a lot more red on that map. But Hillary Clinton won the popular vote by nearly 3,000,000 votes. (No, I won’t rehash the Electoral College here and now.) Far more people are crowded into the smaller counties than live in those big, expansive red counties.

And that pattern holds true in the Philadelphia region. But instead of using the colour fill of an area, as above, this map from the Inquirer uses pyramids. It faces the same problem, though: lots of pyramids in a small space. And the problem with pyramids is that they overlap each other.

At a glance, you cannot see one pyramid behind another. In a choropleth, we at least see a tiny field of colour, and that colour is not hidden behind another.

Additionally, given the way this is constructed, what happens if a municipality had only a small net shift? The pyramid’s height will be minimal. But to determine the direction of the shift we need to see the colour, and if the area under the line forming the pyramid is small, we may be unable to see it. Again, compare that to a choropleth, where there would at least be a difference between, say, a light blue and a light red. (Though you could also bin the small differences into a single neutral bin collecting all small shifts in either direction.)

I really think that a more straightforward choropleth would show the net shifts here more clearly. And even then, we would still need a legend.

The article overall, though, is quite strong and a great read on the electoral dynamics of the Philadelphia region a month ago.

Credit for the piece goes to John Duchneskie.

Covid Migration

Yep, Covid-19 remains a thing. About a month ago, an article in CityLab (now owned by Bloomberg) looked at the data to see if there was any truth to the notion that people are fleeing urban areas. Spoiler: they’re not, except in a few places. The entire article is well worth a read, as it looks at what is actually happening in migration and why some cities, like New York and San Francisco, are outliers.

But I want to look at some of the graphics going on inside the article, because those are what struck me more than the content itself. Let’s start with this map titled “Change in Moves”, which examines “the percentage drop in moves between March 11 and June 30 compared to last year”.

Conventionally, what would we expect from this kind of choropleth map? We have a sequential stepped gradient headed in one direction, from dark to light. Presumably we are looking at one metric, change in moves, in one direction: a drop.

But look at that legend. Note the presence of the positive 4: there is an entire positive range within this stepped gradient. Conventionally we would expect to see some kind of split at zero, red for a drop and blue for a gain. Others might add a grey bin covering states with slight-to-no change, from -1 to +1. Here, though, we don’t have that. Nor do we even get a natural split. Instead a darker bin gives way to a slightly lighter bin at positive four, so everything from -16 up to +4 falls in the darker one.

Look at the language, too, because that’s where it becomes potentially more confusing. If the choropleth largely focuses on the “percentage drop” and shows negative numbers, a negative of a negative would be…a positive. A “-25% drop” in Texas is easily misread because of that double negative. Compare Texas to Nebraska, which had a 2% drop. Does that mean moves in Nebraska actually declined by 2%, or that they rose by 2%?

A cleanup of the data definition to, say, “Percentage change in moves from…” could clear up a lot of this ambiguity. Changing the colour scheme from a single gradient to a divergent one, split around zero (perhaps with a bin for little-to-no change), would make it clearer which states were in the positive and which in the negative.
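The fix is mostly about committing to one signed definition. Here is a minimal Python sketch of a signed percentage change and a phrasing helper that avoids the double negative. The move counts and state names are hypothetical illustrations, not CityLab's data.

```python
def percent_change(previous: float, current: float) -> float:
    """Signed percentage change from `previous` to `current`.

    Negative means fewer moves than the baseline; positive means more.
    """
    if previous == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (current - previous) / previous * 100


def describe(place: str, previous: float, current: float) -> str:
    """Phrase the change with a direction word instead of a double negative."""
    change = percent_change(previous, current)
    if change == 0:
        return f"{place}: moves unchanged on last year"
    direction = "down" if change < 0 else "up"
    return f"{place}: moves {direction} {abs(change):.0f}% on last year"


# Hypothetical move counts for the same March-June window in two years.
print(describe("State A", previous=1_000, current=750))    # State A: moves down 25% on last year
print(describe("State B", previous=1_000, current=1_040))  # State B: moves up 4% on last year
```

With the sign carried by the number (or by "down"/"up", never both), a divergent legend split at zero reads unambiguously.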

The article continues with another peculiar choice in its bar charts when it explores the data on specific cities.

Here we see the destinations of people moving out of San Francisco, using, as a note explains, requests for quotes as a proxy for the numbers of actual moves. What interests me here is the minimalist take on the bar charts. Note the absence of an axis, which leaves the bars almost groundless for comparison, except that the designer attached data labels to the ends of the bars.

Normally data labels are redundant. The point of a visualisation is to visualise the comparison of data sets. If hyper-precise differences to the decimal point are required, a table is often the better choice. But here, there are no axis labels to tell the user what the length of a bar means.

It’s a peculiar design decision. If we think of labelling as data ink, are data labels a more efficient use of ink than axis labels alone? I would venture to say no. You would probably have five axis labels (0–4) and a line to connect them. That’s probably less ink, or fewer pixels, than the data labels here. I prefer axis lines to help guide the user from the labels up (in this case) through the bars. Maybe the axis lines add more ink than the labels save? It’s hard to say.

Regardless, this is a peculiar decision. Though, I should note it’s eminently more defensible than the choropleth map, which needs a rethink in both design and language.

Credit for the piece goes to Marie Patino.

Trumpsylvania

After working pretty much non-stop all spring and summer, your humble author finally took a few days off. Throw in a bank holiday and you are looking at a five-day weekend. But because this is 2020, travelling was out of the question, so instead I hunkered down to finish writing and designing an article I have been working on for the last several weeks, or few months.

The main write-up (it is a lengthy-ish read, so you may want to brew a cup of tea) is over at my data projects site. This is the first project I have really written up there since spring/summer 2016. Some of my longtime readers may recall that the penultimate piece I wrote there, about Pennsyltucky, was inspired by work I did here at Coffeespoons.

To an extent, so is this piece. I wrote about Trumpsylvania, the political realignment of the state of Pennsylvania. 2016 and the state’s vote for Donald Trump was less an aberration than many think. It was the near-end result of a decades-long transformation of the state’s political geography. And so I looked at the data underlying the shift and how and where it occurred.

Originally, I had a slightly different conclusion as to how this related to Pennsylvania in the upcoming 2020 election. But the whole 2020 thing made me shift my thinking slightly, and you’ll have to read the whole thing to understand what I’m talking about. I will leave you with one of the graphics I made for the piece. It looks at who won each county in the state, but also whether or not the candidate was able to flip the county. In other words, was Clinton able to flip a Republican county? Was Trump able to flip a Democratic county?

Who won what? Who flipped what?

Let me know what you think.

And of course, many, many thanks to all the people who suffered my ideas, thoughts, and early drafts over the last several weeks. And even more thanks to those who edited it. Any and all mistakes or errors in the piece are all mine and not theirs.

Credit for the piece is mine.

Parties in Pennsylvania

This is from a social media post I made a few days ago, but I think it may be of some relevance or interest to my Coffeespoons followers. With more than 30 days to go until the general election, I was curious to see how the landscape had changed for the two parties since 2016.

Well, this project has driven me to a related, but slightly different project that has been consuming my non-work time. Hopefully I will have more on that in the coming days. Without further ado, the post:

Pennsylvania will likely be one of the more critical battleground swing states in this year’s election. In 2016, then-candidate Trump won the state by less than one percentage point. But four years is a long time and I was curious to see how things have changed.

In the first chart, counties won by Trump are on the right and counties won by Clinton on the left. The further from the centre, the greater the winner’s margin of victory. The top half plots registered Republicans’ margin over Democrats as a percentage of all registered voters in the county (including independents and third parties), and the bottom half does the same for Democrats. The closer to the centre, the more competitive the county; the further away, the less so.
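The two axes boil down to two margins. Here is a minimal Python sketch for one county, assuming the vote margin is taken over all votes cast (the post does not state its denominator) and the registration margin over all registered voters, as described above. The counts are hypothetical, not real county figures.

```python
def margin_points(a: int, b: int, total: int) -> float:
    """Margin of a over b, in percentage points of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (a - b) / total * 100


# Hypothetical county: registration counts and 2016 presidential votes.
registered = {"republican": 41_000, "democrat": 38_000, "other": 21_000}
votes = {"trump": 30_500, "clinton": 24_000, "other": 3_500}

# x-axis: vote margin (positive = Trump), over all votes cast (an assumption).
x = margin_points(votes["trump"], votes["clinton"], sum(votes.values()))
# y-axis: registration margin (positive = Republican), over ALL registered
# voters, so independents and third-party registrants shrink the margin.
y = margin_points(registered["republican"], registered["democrat"],
                  sum(registered.values()))
print(f"vote margin {x:+.1f} pts, registration margin {y:+.1f} pts")
```

This county would sit in the top right: won by Trump, with a Republican registration edge. A Trump-won county with a negative `y` would land in the bottom right, the nominally Democratic counties discussed next.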

Trump’s key to victory was the white, working-class voter clustered in the west and northeast of the state, in old mining and steel towns. There, Democrats had long counted on organised labour support in the form of registered Democrats. That all but collapsed in 2016. The bottom right shows a number of nominally Democratic counties Trump won, whereas Clinton picked up only one Republican county, Chester.

But what are PA’s battlegrounds?

In the second chart we ignore places like Philly and Fulton County and zoom in on more competitive counties, those with margins within 20 points. Polls presently point to a Biden lead of about 5 points in PA. If every dot moved left by 5 points (it doesn’t really work like that), only Erie and Northampton show potential to flip.
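That thought experiment is a uniform swing: subtract the same number of points from every county's margin and see which cross zero. A minimal Python sketch, with hypothetical county names and margins (positive = Trump), follows. As the post says, real shifts don't work like this; it is only the arithmetic behind "moving every dot left".

```python
def flips_under_uniform_swing(margins: dict[str, float], swing: float) -> list[str]:
    """Return Republican-won counties (margin > 0, in points) whose margin
    would fall below zero if every county shifted `swing` points toward the
    Democrat. A uniform swing is a deliberate simplification."""
    return sorted(county for county, margin in margins.items() if 0 < margin < swing)


# Hypothetical Trump-minus-Clinton margins in points (not real county results).
margins = {"County A": 2.0, "County B": 4.5, "County C": 12.0, "County D": -3.0}
print(flips_under_uniform_swing(margins, swing=5.0))  # ['County A', 'County B']
```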

But Trump is accelerating (more on this another day) a realignment of PA’s political geography.

In the fourth chart, neither Erie nor Northampton show any real movement via party registration back to Democrats. Erie may flip, but Northampton’s likely a stretch. Places like Cumberland and Lancaster counties are too solidly Republican to flip this year. Instead Trump is more likely to flip counties like Monroe and Lehigh red, even if he loses the state.

That’s because the key to a Biden victory, not shown in these charts, will be running up the margins in Philly and Pittsburgh and, to a lesser extent, in Philly’s four collar counties, including Chester, which appears to be rapidly shifting in Democrats’ favour.

Credit for the piece is mine.

The Size of the California Wildfires Compared to Philly

The West Coast is a different scale than the East Coast. After all, California alone is almost the size of New England and parts of the Mid-Atlantic combined. So when we take that enormous size into consideration, how big are these fires on an East Coast scale? It can be difficult to imagine.

Thankfully the Philadelphia Inquirer addressed the issue.

It’s a simple concept, but I love these kinds of graphics. The East Coast is dense, and its cities and towns are clustered closer together because they were founded before personal automobiles existed. And so the August Complex fire in California would cover a significant portion of the Philadelphia metropolitan area, almost wiping it off the map.

Credit for the piece goes to John Duchneskie.

It’ll Get Cooler Eventually

President Trump, on climate change.

I mean, technically he’s correct. Eventually the universe will likely end with heat death as all the energy dissipates and stars die out and space becomes a truly empty, cold void. So it’ll get cooler, eventually.

But what about right now? In one to three generations’ time? 30–90 years? Not looking so great.

So what sparked this ludicrous comment? This year’s West Coast wildfire season. Usually concentrated in California, this year’s fires have also burned forests in Washington and Oregon, states whose usually wetter climates inhibit these kinds of rapidly spreading fires.

A few days ago the Washington Post published a piece looking at the fires out west. It started with a map showing ultimate fire perimeters and currently active fires.

In a normal year, those fires in Oregon and Washington wouldn’t be there. Welcome to the new normal.

Frequent readers will know I’m not a fan of the dark background for graphics, but I’m betting it was chosen because, as you scroll through the article, it makes the photojournalism really pop off the page. Contrast the bright yellows, oranges, and reds with a dark black background and c’est magnifique, at least from a design standpoint. And given this piece is really about the photography depicting the horrors on the West Coast, it’s an understandable design decision.

Credit for the piece goes to Laris Karklis.

Double Your Hurricanes, Double Your Fun

In a first, the Gulf of Mexico basin has two active hurricanes simultaneously. Unfortunately, they are both likely to strike somewhere along the Louisiana coastline within approximately 36 hours of each other. Fortunately, neither is as strong as Katrina, which caused a mess of things fifteen years ago now.

Over the last few weeks I have been trying to start the week with my Covid datagraphics, but I figured we could skip those today and instead run with this piece from the Washington Post. It tracks the forecast path and forecast impact of tropical storm force winds for both storms.

The forecast path above is straightforward. The dotted line represents the forecast path. The coloured area represents the probability of that area receiving tropical storm force winds. Unsurprisingly, the present locations of both storms have the highest probabilities.

Now compare that to the standard National Weather Service graphic, below. The NWS produces one per storm, and I cannot find one for the combined threat. So I chose Laura, the one likely to strike mid-week, and not the one likely to strike later today.

The first and most notable difference here is the use of colour. The ocean here is represented in blue compared to the colourless water of the Post version. The colour draws attention to the bodies of water, when the attention should be more focused on the forecast path of the storm. But, since there needs to be a clear delineation between land and water, the Post uses a light grey to ground the user in the map (pun intended).

The biggest difference is what the coloured forecast areas mean. In the Post’s versions, they show the probability of tropical storm force winds. But in the National Weather Service version, the white area is actually the “cone”, the envelope or range of potential forecast paths. The Post shows one forecast path, but the NWS shows the full range, and so for Laura that means really anywhere from central Louisiana to eastern Texas. A storm that impacts eastern Texas, for example, could have tropical storm force winds far from the centre and into the Galveston area.

Of course every year the discussion is about how people misinterpret the NWS version as the cone of impact, when that is so clearly not the case. But then we see the Post version and it might reinforce that misconception. Though, it’s also not the Post’s responsibility to make the NWS graphic clearer. The Post clearly prioritised displaying a single forecast track instead of a range along with the areas of probabilities for tropical storm force winds.

I would personally prefer a hybrid sort of approach.

But I also wanted to touch briefly on a separate graphic in the Post version, the forecast arrival times.

This projects when tropical storm force winds will begin to impact particular areas. Notably, the areas of probability of tropical storm force winds do not change. Instead the dotted-line projections for the paths of the storms are replaced by lines roughly perpendicular to those paths. These lines show when the tropical storm force winds are forecast to begin. It’s also an updated take on the National Weather Service offering below.

Again, we only see one storm per graphic here, and this one is only for Laura, not Marco. But this is also probably the most analogous to what we see in the Post version. Here, the black outline represents the light pink area on the Post map, the area with at least a 5% chance of receiving tropical storm force winds. The NWS version, however, does not provide any further forecast probabilities.

The Post’s version is also an improved design: the NWS blue, while not as dark as the heavy black lines, still draws unnecessary attention to itself. Would even a very pale blue be an improvement? Almost certainly.

In one sense, I prefer the Post’s version. It’s more direct, and the information is more clearly presented. But I find it severely lacking in one key detail: the forecast cone. Even yesterday, the forecast cone had Laura moving in a range both north and south of the island of Cuba from its position west of Puerto Rico. 24 hours later, we now know it’s on the southern track, and that has a massive impact on future forecast tracks.

Being east or west of landfall can mean dramatically different impacts in terms of winds, storm surge, and rainfall. And the Post’s version, while clear about one forecast track, obscures the very real possibility that the range of impacts can shift dramatically in the course of just one day.

I think the Post does a better job of the tropical storm force wind forecast probabilities. In an ideal world, they would take that approach to the forecast paths. Maybe not showing the full spaghetti-like approach of all the storm models, but a percentage likelihood of the storm taking one particular track over another.

Credit for the Post pieces goes to the Washington Post graphics department.

Credit for the National Weather Service graphics goes to the National Weather Service.