Expansion Teams in Baseball

I was not planning on posting this today, because I was—am?—still working on it. But there was some baseball news last night that prompted me to export what I had to try and get this live.

For a little while now I’ve been wondering why a number of baseball stars, albeit in their later years, are still looking for employment. Some are pretty obvious in that they are facing legal troubles. Some may have high demands that ball clubs are not willing to meet. Some may have reasonable demands but the clubs are just being incredibly cheap. Or it may be none of those. Or some combination of those. But when you see some of the players some teams put on the field each night, you can’t tell me some of these free agents wouldn’t be better options.

Separately, I also tend to think baseball needs to expand and add some new clubs. But they won’t until the Oakland Athletics and Tampa Bay Rays resolve their stadium issues.

But what if…

Well a normal expansion would include two teams to keep an even balance. The new teams would likely use some kind of draft to select players from the rosters of other teams, with a certain number of players almost certainly protected. But what if we just used those unsigned ball players?

Anibal Sanchez is the guy messing this up. He’s been a free agent for some time now but is reportedly going to sign by the end of this week, perhaps today. So with him and everyone else, could we field two expansion teams?

Kinda, yeah.

First up, the Charlotte Piedmonters.

The Charlotte Piedmonters could also be looking for a new name.

Not a great team, nor would we expect it to be, as all the really good free agents have already been signed. But these former stars, award winners, and fan favourites may have just enough left in the tank to make for some competitive games if all goes well. My readers who happen to be fellow baseball fans will probably recognise most of these names, though I’ll admit a number of the relief pitchers are new to me. I can figure out basically everything but a centre fielder. But you could probably get somebody from an independent league or international league or just convert somebody.

I used projected Wins Above Replacement (WAR) to determine how good the players would be. For non-baseball fans, WAR is a value you can use to determine how good a player is relative to an average replacement player. Somebody with the value 0 to 1 is a scrub or bench player. Take any average ballplayer and sub them in and you wouldn’t know the difference. 2s and 3s are solid role playing guys, but not likely stars. Stars get into the picture around 4 and your best players are probably 5 to 6 or higher.
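For anyone who prefers to see those tiers written down, here is a rough Python sketch. The function name and the exact cut-offs are my own assumptions; in particular, I did not describe the range between 1 and 2 above, so the label for it is a guess.

```python
def war_tier(war: float) -> str:
    """Map a projected WAR value to the rough tiers described in the post."""
    if war < 0:
        return "below replacement"
    if war <= 1:
        return "scrub or bench player"
    if war < 2:
        # Assumption: the post does not name this range.
        return "slightly better than bench"
    if war < 4:
        return "solid role player"
    if war < 5:
        return "star"
    return "among the best"


print(war_tier(1.4))  # Rick Porcello's projected WAR, as quoted in the post
```

The cut-offs are deliberately crude; WAR is a continuous measure and the tiers are only a reading aid.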

In Charlotte, nobody has a WAR higher than Rick Porcello’s 1.4. In other words, he’s a better than average pitcher, but not by much. Tyler Flowers: a better than average catcher, but not by much. Homer Bailey: barely better than average starting pitcher. Everyone else, generally you could sub them out and not know the difference. But, crucially for our purposes, they are not below average players. Some of those are still on the market, but I didn’t assign them to Charlotte.

Now if Charlotte gets a team, so does Portland, Oregon: the Portland Lumberjacks.

Again, I’m open to name suggestions.

Here you can see Anibal Sanchez as the third man in the rotation. You can also see that the rotation here is the weakest part. For Charlotte you could get away with a bullpen game every five days. But two bullpen days? Well, take a look at the Boston Red Sox in 2020 and that pitching dumpster fire and you’ll see what having only two or three starters can do. (Though the relief starters they did use were all worse than the people on these lists, which just makes my point that there are talented if not star-level players available.)

Neither of these teams would be good. You can imagine a team like Charlotte getting beat almost every night in the AL East—except by Baltimore. The NL East might be a bit easier. And Portland in the NL West would be similarly a punching bag—except by Colorado probably. But dump either into the AL or NL Central and who knows.

Two teams is clearly a stretch. So what if we just made one? What if we brought back the Montreal Expos? Sure, it messes up the schedule, but we get to pick the best players from Charlotte and Portland.

No new name needed.

The result is a team that is significantly improved. That doesn’t mean very good. These Expos wouldn’t make the playoffs. But the rotation is full of guys who could be, at best, solid middle- to, more likely, back-end starters. The lineup, well, the lineup would still be mostly replacement level players, a.k.a. scrubs, with two exceptions. But with past track records, it’s not impossible to imagine a few of these players having a better than projected year.

On paper, they still wouldn’t be as good as the worst team in baseball (by WAR), the Pirates. But Pittsburgh also doesn’t have a centre fielder, so…

Anyway, I was going to try and do some more analysis beyond using WAR, but I wanted to get this out before Sanchez signed this week.

I also got to add Oliver Perez, who despite having a good year was released by Cleveland today. Boston needs a solid lefty reliever for the middle innings, and I hope they pick up Perez and option Josh Taylor down to Worcester.

Credit for the piece is mine.

Arrowheads

I don’t know if this is a trend, but I’ve now seen a few graphics appearing using arrows to show the direction or trend of the data. This graphic in an article by Bloomberg prompted me to talk about this piece.

I should add, after rereading my draft, that I’m not clear who made this graphic. I assume that it was the Bloomberg graphics team, because it appears in Bloomberg and all the data is presented to recreate the chart. But, it could also be a chart made by someone at Goldman Sachs that credits Bloomberg as a source and then someone at Bloomberg got hold of a copy. And a graphic made for a news/media outlet will typically be of a different quality or level of polish than one made perhaps by and for analysts. (Not that I think there should be said differences, as it does a disservice to internal users, but I digress from a digression.)

All the things going on in this chart.

The arrow here appears above the peak quarter, i.e. the second of 2021, for both the Goldman Sachs Economics forecast and the consensus forecast. But what does it really add? First, it adds “ink”, in this case pixels. Here, every pixel consumes our attention and there is a finite number of available pixels within the space of this graphic.

When I work with authors or subject matter experts, I often find myself asking them “what’s the most important thing to communicate?” or something along those lines. If the person answers with a long laundry list, I remind them that if everything is important, nothing is important. If everything is set in bold, all caps text, what will look most important is the rare bit of text set in regular, lower-case letters.

In the above graphic, there are so many things screaming for my attention, it’s difficult to say which is the most important. First, I’m fairly certain that “US QoQ annualised GDP growth” could move to the graphic subhead or data definition. Allow the graphic’s data container to contain, well, data. Second, the data series labels can be moved outside the data container. The labels here have an inherent problem in that the Goldman Sachs Economics numbers are in blue, and that blue text has less visual weight than the black text of the Consensus label. Consequently, the Goldman Sachs Economics label recedes into the background and becomes lost, which is not what you want from your legend.

Third, I don’t believe the data labels here add anything to the chart. They function as sparkly distractions from the visual trend, which should be the most important aspect of a visual chart.

Finally, we get to the arrow, the impetus for this post. First, I should note that it is not clear what growth it shows. The fact the line is black makes me think it reflects the Consensus forecast whereas a blue line would represent the Goldman Sachs forecast. But it could also be the average of the two or even a more general “here’s the general shape”. The problem is that the shape matters. If you look at the slope of the actual forecasts, you see a sharp increase to the peak followed by a slower, more gradual taper. The arrow in the original graphic shows a decelerating curve that is shallower in the lead up to the peak and that is not what is forecast to happen.

Now we get to the issue I mentioned at the top, the extraneous labelling and data ink wasted. If we look at the chart as is, but remove the arrow, we see this.

Immediately to the right of the peak, we have some blue data labels and then just a bit to the right of that, but sitting vertically above the label, we have the bold blue text labelling the data series. But further to the upper right we have a dark and bold block of text that draws the eye away from the peak and into the corner. It draws the eye away from the very element of the shape the peak needs to be a peak, the trough in the wave. Consequently, it makes sense with the eye being drawn up and to the right that the designers threw an arrow in above the peak to show how, no, actually your eye needs to go down and to the right.

But what happens if we then strip out the data series labelling? Do we still need the arrow? Let’s take a look.

I would argue that no, we do not. And so let’s strip the arrow out of the picture and take a look.

Here the shape of the curve is clear, a sharp rise and then a gradual taper to the right. No arrow needed to show the contour. In other words, the additional labelling wastes our attention, which then forces us to add an arrow to see what we needed to see in the first place, and the arrow in turn wastes yet more of our attention.

There are a number of other things I take issue with in this chart: the black outlines of the blue rectangles, the tick marks on the x-axis, the solid border of the container, the lack of axis lines. But the arrow points to this graphic’s central problem, a poorly thought out labelling structure.

So because the chart provides all the data, I took a quick stab at how I would chart it using my own styles. I gave myself a 3:2 ratio, less space than the original graphic had. This is where I landed. I would prefer the legend below the chart labelling, but it felt cramped in the space. And with so few data points along the x-axis, the chart doesn’t need a ton of horizontal space and so I repurposed some of it to create a vertical legend space.

I mixed typefaces only because my default does not have a proper small capitals and I wanted to use small capitals to reduce and balance out the weight of the exhibit label in the graphic title.

I could still tweak the spacing between the bars and perhaps the treatment of the years below the quarters could use some additional work, but the main point here is that the shape of the curve is clear. I need no arrow to tell the user that there is a peak and that after the peak the line goes down. The white space around the bars and the line does that for me.

Credit for the piece goes to either the Bloomberg graphics department or the Goldman Sachs graphics department. Not sure.

The Super Short European Super League

Sunday night, news broke that a number of European football clubs were creating a rogue league, the European Super League. My British and European readers—and Americans who follow football—will know the names of Manchester United, Liverpool, AC Milan, Juventus, Real Madrid, and the others.

To put this in perspective for my American readers, imagine the Yankees, Dodgers, Red Sox, Astros, Padres, Mets, Cardinals, Phillies, Angels, and Nationals saying that they were leaving Major League Baseball to go and form their own new baseball league. That they were doing so to “save the sport”. But in so doing, they also guarantee they all make the playoffs every year.

My frequent readers and those who know me will know I’m a fan of the Boston Red Sox. I should point out that John Henry owns both the Red Sox and Liverpool through his company Fenway Sports Group.

Of course, the analogy doesn’t quite hold up, because there are some significant differences between American sports and European football. Relegation is a big one. Personally, I wish American sports had some way of using relegation to incentivise teams to not intentionally suck.

The basic premise of relegation. Take English football. You have four levels of play and in theory any team can exist in any level. Each year, the worst teams move from their current level down one whilst the best teams move up. And for the top level, the top teams get to compete in lucrative European-wide matches. That is a bit simplistic, but imagine that at the end of last year, the Pirates, Rangers, Tigers, and Red Sox became AAA minor league teams and the four best AAA minor league teams became MLB teams. MLB teams would theoretically try to do everything they could to stay in the MLB and not drop to AAA, because that would mean a loss of money. After all, the Yankees would no longer be heading to Fenway nor the White Sox to Detroit. Would seeing the Detroit Tigers play the Woo Sox really be worth the ticket prices you pay at Comerica Park?

But that’s not how American sports work. And so a few American owners, namely those of Manchester United, Arsenal, and Liverpool, want to ensure a steady stream of money. By creating their own league where their teams cannot be relegated, they guarantee that revenue stream.

In other words, this is all about the owners of these Super League teams making even more money.

Because, during the last year, teams have been hurting without fans in attendance. And that gets us to why I can write this up. Because the BBC in an article about this new league addressed the fact that most of these teams are heavily in debt.

This graphic, however, is a bit misleading. Look at Liverpool. There is no available data for how much financial debt the club holds. So why is it placed between Chelsea and Manchester City? It could well have more debt than Tottenham. Liverpool should really be left off this chart and included in the note, because its placement suggests that it has little debt, when that may well not be the case. This is a really misleading graphic when it comes to how Liverpool fits with the other 11 clubs.

From a design standpoint, I’m also not clear on why the x-axis line extends beyond the labels for £-200m and £600m.

I’m not going to touch all the data labels. That’s for another piece I’ve been working on off and on for a little while now.

At this point I should point out that I was going to post this article later, but in the last 18 hours or so the whole thing has fallen apart as the English teams, followed by the others, have been dropping out under immense pressure from the sport and their fans. To bring back my analogy above, imagine MLB retaliating and saying that if those teams created their own league, the players would not be allowed to play in any other matches and the teams would be locked out from all other competitive baseball games. It’s a mess.

Credit for the piece goes to the BBC graphics department.

Politicising Vaccinations

Yesterday I wrote my usual weekly piece about the progress of the Covid-19 pandemic in the five states I cover. At the end I discussed the progress of vaccinations and how Pennsylvania, Virginia, and Illinois all sit around 25% fully vaccinated. Of course, I leave my write-up at that. But not everyone does.

This past weekend, the New York Times published an article looking at the correlation between Biden–Trump support and rates of vaccination. Perhaps I should not be surprised this kind of piece exists, let alone the premise.

From a design standpoint, the piece makes use of a number of different formats: bars, lines, choropleth maps, and scatter plots. I want to talk about the last of those in this piece. The article begins with two side by side scatter plots, this being the first.

Hesitancy rates compared to the election results

The header ends in an ellipsis, but that makes sense because the next graphic, which I’ll get to shortly, continues the sentence. But let’s look at the rest of the plot.

Starting with the x-axis, we have a fairly simple plot here: votes for the candidates. But note that there is no scale. The header provides the necessary definition of being a share of the vote, but the lack of minimum and maximum makes an accurate assessment a bit tricky. We can’t even be certain that the scales are consistent. If you recall our choropleth maps from the other day, the scale of the orange was inconsistent with the scale of the blue-greys. Though, given this is produced by the Times, I would give them the benefit of the doubt.

Furthermore, we have five different colours. I presume that the darkest blues and reds represent the greatest share. But without a scale let alone a legend, it’s difficult to say for certain. The grey is presumably in the mixed/nearly even bin, again similar to what I described in the first post about choropleths from my recent string.

Finally, if we look at the y-axis, we see a few interesting decisions. The first? The placement of the axis labels. Typically we would see the labelling on the outside of the plot, but here, it’s all aligned on the inside of the plot. Intriguingly, the designers took care for the placement—or have their paragraph/character styles well set—as the text interrupts the axis and grid lines, i.e. the text does not interfere with the grey lines.

The second? Wyoming. I don’t always think that every single chart needs to have all the outliers within the bounds of the plot. I’ve definitely taken the same approach and so I won’t criticise it, but I wonder what the chart would have looked like if the maximum had been 35% and the grid lines were set at intervals of 5%. The tradeoff is likely increased difficulty in labelling the dots. And that too is a decision I’ve made.

Third, the lack of a zero. I feel fairly comfortable assuming the bottom of the y-axis is zero. But I would have gone ahead and labelled it all the same, especially because of how the minimum value for the axis is handled in the next graphic.

Speaking of, moving on to the second graphic we can see the ellipsis completes the sentence.

Vaccination rates compared to the election results

We otherwise run into similar issues. Again, there is a lack of labelling on the x-axis. This makes it difficult to assess whether we are looking at the same scale. I am fairly certain we are, because when I overlap the graphics I can see that the two extremes, Wyoming and Vermont, look to exist on the same places on the axis.

We also still see the same issues for the y-axis. This time the axis represents vaccination rates. I wish this graphic made a little clearer the distinction between partial and full vaccination rates. Partial is good, but full vaccination is what really matters. And while this chart shows Pennsylvania, for example, at over 40% vaccinated, that’s misleading. Full vaccination is 15 points lower, at about 25%. And that’s the number that needs to be up in the 75% range for herd immunity.

But back to the labelling, here the minimum value, 20%, is labelled. I can’t really understand the rationale for labelling the one chart but not the other. It’s clearly not a spacing issue.

I have some concerns about the numbers chosen for the minimum and maximum values of the y-axis. However, towards the middle of the article, this basic construct is used to build a small multiples matrix looking at all 50 states and their rates of vaccination. More on that in a moment.

My last point about this graphic is on the super picky side. Look at the letter g in “of residents given”. It gets clipped. You can still largely read it as a g, but I noticed it. Not sure why it’s happening, though.

So that small multiples graphic I mentioned, well, see below.

All 50 states compared

Note how these use an expanded version of the larger chart. The y-minimum appears to be 0%, but again, it would be very helpful if that were labelled.

Also for the x-axis in all the charts, I’m not sure every one needs the Biden–Trump label. After all, not every chart has the 0–60% range labelled, but the beginning of each row makes that clear.

In the super picky, I wish that final row were aligned with the four above it. I find it super distracting, but that’s probably just me.

Overall, this is a strong piece that makes good use of a number of the standard data visualisation forms. But I wish the labelling were a bit tighter to make the graphics just a little clearer.

Credit for the piece goes to Danielle Ivory, Lauren Leatherby and Robert Gebeloff.

Covid Update: 18 April

Last week I wrote about how we may have been beginning to see divergent patterns in new cases, i.e. how New Jersey in particular had seen its new case numbers falling whilst other states continued with increasing case counts.

One week later, that may still broadly hold true.

Emphasis on may.

New case curves for PA, NJ, DE, VA, & IL.

If we look at the new charts, we can see that broadly, New Jersey did continue its downward trend as Pennsylvania and Delaware experienced significant rises in new cases. Virginia remained fairly stable, but with a slight trend towards increasing numbers of new cases.

But New Jersey and now Illinois present some interesting trends to watch this coming week. Illinois reminds me of New Jersey in that despite rising numbers most of last week, the last few days (and of course the weekend) saw numbers lower than preceding days. You can see from the slightest of dips at the tail of the line the trend has flipped direction. Will the direction hold, however, once we start receiving weekday reporting figures starting Tuesday?

Back to New Jersey, though. The downward trend continued most of the week. But, the last several days could portend a reversal of sorts. For most of the last week, the state saw daily new case numbers increasing day after day. But the trend line, as it should, remained heading downwards. Until just a few days ago. If you look at the tail of the line there, you will see a slight uptick. This too will be something to watch in the coming week.
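If it seems odd that daily numbers can rise day after day while the trend line keeps falling, a small Python sketch may help. The numbers below are hypothetical, not New Jersey's actual figures, and the helper name is my own. The point is only that a trailing seven-day average falls whenever the day entering the window is lower than the day dropping out of it.

```python
def seven_day_average(daily):
    """Trailing seven-day mean, starting once a full week is available."""
    return [sum(daily[i - 6 : i + 1]) / 7 for i in range(6, len(daily))]


# Hypothetical daily new cases: a higher earlier week, then a lower week
# in which each day is nonetheless higher than the day before.
daily = [4000, 4100, 4200, 4300, 4400, 4500, 4600,
         3000, 3100, 3200, 3300, 3400, 3500, 3600]

averages = seven_day_average(daily)
last_week = daily[7:]

daily_rising = all(b > a for a, b in zip(last_week, last_week[1:]))
average_falling = all(b < a for a, b in zip(averages, averages[1:]))
print(daily_rising, average_falling)
```

Once the higher days have all dropped out of the window, continued daily increases will start to pull the average up, which is the slight uptick at the tail of the line.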

Deaths also need careful attention this week.

Death curves in PA, NJ, DE, VA, & IL.

Last week I asked the question, will deaths follow rising cases? After a week of data, the answer is unmistakably yes. However, unlike new cases, the increases are largely of a marginal number. Look closely at the ends of the lines for Pennsylvania, New Jersey, Delaware, and Illinois and you will see last week’s shallow rise continued.

Virginia bucked the trend with decreasing numbers of deaths. And of course marginal increases could easily give way to marginal decreases. Now I try not to mention too many daily numbers in these posts because I take the weekly view, but I will be closely following Pennsylvania this week. For the last several weeks, the Commonwealth regularly reported deaths on Sunday and Monday in the single digits. Yesterday Harrisburg reported 40. Is this a one-day surge of reports? Is the state resuming reporting more deaths at the weekend? Or does it portend something worse, a more significant rise in the number of deaths?

Vaccinations continue apace. Although, I would expect to see some slowdown as the Johnson & Johnson vaccine pause ripples out across the vaccination programme.

Fully vaccinated curves for PA, NJ, DE, VA, & IL.

For now though we continue to see increasing numbers. Indeed, the three states I track have now all reached or should reach today 25% of their population as fully vaccinated.

One, that is good news.

But, two, this is just the beginning.

Last week in some tense questioning about when we can expect resumption of “normal”, Dr. Fauci provided a figure of 10,000 new cases per day across the US. (Currently we are about at 60,000 or so.) Vaccines will impede the transmission as they become ever more widely administered and fully implemented—remember that a first dose of a two-dose regimen does not mean you should be heading out and socialising.

At present, we have Pennsylvania averaging 5,000 new cases per day. In other words, Pennsylvania alone represents half of Dr. Fauci’s target. We are clearly far from that reopening level.

What I will be curious about in the coming weeks though is that interplay between new cases and vaccinations. If Illinois does begin to see a downward trend in new cases this week, how much of it is due to the state being 25% fully vaccinated?

That’s a complex question to answer, but at some point, increasing vaccinations will force new cases to reach an inflection point. First they will begin to bend downward, increasing more slowly instead of exponentially. Then with even more vaccinations a second point will be reached at which this new surge begins to finally turn and new cases drop.
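Those two points can be picked out of a series with nothing more than day-over-day differences. This is a minimal sketch on made-up numbers; the variable names are mine and the series is not any state's actual data.

```python
# Hypothetical new case counts for nine consecutive periods.
cases = [100, 150, 225, 330, 420, 480, 510, 500, 460]

# Change from each period to the next.
changes = [b - a for a, b in zip(cases, cases[1:])]

# First point: the period from which growth starts to slow, i.e. the
# increase is still positive but smaller than the previous increase.
slowing_from = next(i for i in range(1, len(changes))
                    if changes[i] < changes[i - 1])

# Second point: the period after which cases actually drop (the peak).
peak = next(i for i, change in enumerate(changes) if change < 0)

print(slowing_from, peak)
```

With real data you would run this on the seven-day average rather than raw daily counts, since reporting noise would otherwise trigger both conditions far too early.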

The question is when.

Credit for the piece is mine.

Choropleths…Again

Admittedly, I was trying to find a data set for a piece, but couldn’t find one. So instead for today’s post I’ll turn to something that’s been sitting in my bookmarks for a little while now. It’s a choropleth map from the US Census Bureau looking at population change between the censuses.

Unequal growth

The reason I have it bookmarked is for the apportionment map, but I will save apportionment for another post because, well, it’s complicated. But map colours are a thing we’ve been discussing of late and we can extend that conversation here.

What I find interesting about this map is how they used a very dark blue-grey colour for their positive growth and an orange that is a fair bit brighter for negative growth, or population loss. And because of that difference in brightness, the orange really jumps out at you.

To be fair, that’s ideal if you’re trying to talk about where state populations are shrinking, because it focuses attention on declines. But, if you’re trying to present a more neutral position, like this seems to be, that colour choice might not be ideal.

Another issue is that if you look at the legend it simply says loss for that orange. But, look above and you’ll see four bins clearly delimited by ranges of percents for the positive growth. If we are trying to present a more neutral story, the use of the orange places it visually somewhere near the top of that blue-grey spectrum.

If you look at the percentages, however, Michigan’s population decline was 0.6% and Puerto Rico’s 2.2%. If this map used a legend that treated positive and negative growth equally, you would place that one state and one should-be state in a presumably light orange. The scale of their negative growth is equal to something like Ohio, which is in the lightest blue-grey available.

Consequently, this map is a little bit misleading when it comes to negative growth.
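To make the idea of an even-handed legend concrete, here is a small Python sketch. The bin edges are hypothetical, as I am not reproducing the Census Bureau's actual breaks, and the function name is my own. The only point is that gains and losses share the same edges, so a small loss lands in the lightest orange just as a small gain lands in the lightest blue-grey.

```python
from bisect import bisect_right

# Hypothetical bin edges in percent, shared by gains and losses.
EDGES = [5.0, 10.0, 15.0]


def symmetric_bin(change_pct: float) -> tuple[str, int]:
    """Return (hue, shade), where shade 0 is the lightest of four."""
    hue = "orange" if change_pct < 0 else "blue-grey"
    shade = bisect_right(EDGES, abs(change_pct))
    return hue, shade


print(symmetric_bin(-0.6))  # Michigan
print(symmetric_bin(-2.2))  # Puerto Rico
```

Under any reasonable set of shared edges, both Michigan and Puerto Rico would sit in the lightest loss bin.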

Credit for the piece goes to the Census Bureau graphics team.

Choropleths and Colours Part 2

Last Thursday I wrote about the use of colour in a choropleth map from the Philadelphia Inquirer. Then on Sunday morning, I opened the door to collect the paper and saw a choropleth above the fold for the New York Times. I’ll admit my post was a bit lengthy—I’ve never been one described as short of words—but the key point was how in the Inquirer piece the designer opted to use a blue-to-red palette for what appeared to be a data set whose numbers ran in one direction. The bins described the number of weeks a house remained on the market, in other words, it could only go up as there are no negative weeks.

Compare that to this graphic from the Times.

More choropleth colours…

Here we are not looking at the Philadelphia housing market, but rather the spread of the UK/Kent variant of SARS-CoV-2, the virus that causes COVID-19. (In the states we call it the UK variant, but obviously in the UK they don’t call it the UK variant, they call it the Kent variant from the county in the UK where it first emerged.)

Specifically, the map looks at the share (percent) of the variant, technically named B.1.1.7, in the tests reported for each country. The Inquirer map had six bins, this Times map has five. The Inquirer, as I noted above, went from less than one week to over five weeks. This map divides 100% into five 20-percent bins.
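That binning is simple enough to sketch in a few lines of Python. The function name is mine, and placing a boundary value such as exactly 20% in the higher bin is my assumption, not something the Times documents.

```python
def share_bin(share_pct: float, width: float = 20.0, bins: int = 5) -> int:
    """Return the bin index (0 = lightest) for a share from 0 to 100 percent."""
    if not 0 <= share_pct <= 100:
        raise ValueError("share must be between 0 and 100 percent")
    # 100% would otherwise fall into a sixth bin, so cap at the last one.
    return min(int(share_pct // width), bins - 1)


print([share_bin(s) for s in (5, 35, 50, 79, 100)])
```

Because the index only ever increases with the share, it maps naturally onto a single colour ramp running in one direction.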

Unlike the Inquirer map, however, this one keeps to one “colour”. Last week I explained why you’ll see one colour mean yellow to red like we see here.

This map makes better use of colour. It intuitively depicts increasing…virus share, if that’s a phrase, by a deepening red. The equivalent from last week’s map would have, say, 0–40% in different shades of blue. That doesn’t make any sense by default. You could create some kind of benchmark—though off the top of my head none come to mind—where you might want to split the legend into two directions, but in this default setting, one colour headed in one direction makes significant sense.

Separately, the map makes a lot of sense here, because it shows a geographic spread of the variant, rippling outward from the UK. The first significant impacts register in the countries across the Channel and the North Sea. But within four months, the variant can be found in significant percentages across the continent.

Credit for the piece goes to Josh Holder, Allison McCann, Benjamin Mueller, and Bill Marsh.

Covid Update: 11 April

This time last week I wrote about how we should not be surprised at rising levels of coronavirus in the states of Pennsylvania, New Jersey, Delaware, Virginia, and Illinois. After all, our elected officials reopened economies despite data saying they should do otherwise. On top of that, people have been engaging in reckless behaviour and seemingly abandoning the very behaviours that had been leading to declining rates. With those two failures, our last hope is that vaccines will come quickly and be widely taken by the public.

A week on.

Well, we are beginning to see some divergent patterns, especially with new cases.

New case curves for PA, NJ, DE, VA, & IL.

Last week there was some evidence that New Jersey might be bucking the trend and headed downwards after weeks of rising new cases. And now that appears to be a more sustained trend as the line for the Garden State’s seven-day average clearly began heading in the right direction this past week.

That’s the good news. The bad news is that we continue to see rising numbers of new cases in Pennsylvania, Delaware, and Illinois. Although if we want to try and find the positives in the bad, we can see that Delaware’s upward trend remains fairly shallow. Illinois, while steeper, is rising from a lower base as the Land of Lincoln managed to reach low, summer levels of new case spread earlier this year. And in Pennsylvania, there is a bend in the curve, an inflection point, that could indicate growth in the number of new cases is slowing. We still need to see it turn negative, but slowing growth is better than increasing growth.

Virginia splits the difference between those sets. It remains at an elevated level of new case transmission, but the upward tick we saw was not, unlike in the other states, followed by a general surge in new cases. The little rise we did see seems, in fact, to have perhaps shifted back downward.

One of the big questions in this current wave of new cases is will deaths rise? We are seeing increasing numbers of new cases and hospitalisations, but will deaths follow? The hope is that we have vaccinated enough of the most vulnerable populations to prevent them from suffering the most serious of results.

Death curves for PA, NJ, DE, VA, & IL.

So far so good. While death rates remain slightly elevated over summer levels, we do not yet see any signs of rising numbers of deaths. The only possible exception is Virginia, where deaths bottomed out after the state added delayed death certificates from the holidays, but have risen in recent days.

Finally we have vaccinations. Here is the best news at which we can look. We can now say that at least 20% of the populations of Pennsylvania, Virginia, and Illinois are fully vaccinated. To be clear, that is still a long way from herd immunity levels, but that’s 20 percentage points more than we had four months ago.

Total full vaccination curves for PA, VA, & IL.

One big outstanding question is how much, if at all, can vaccinated people spread coronavirus? This is why all of us, even those who have been vaccinated, need to continue to wear masks and socially distance. But at some point (I don’t know when) these increasing levels of full vaccination should begin to flatten the new case curves. Could that be what’s flattening the curves in New Jersey, Virginia, and Pennsylvania? It’s too early to say, but one can hope.

Credit for the piece is mine.

But What About Pluto?

Damn you Neil deGrasse Tyson (but not really though)!

Because, you know, he advocated for de-planet-fying Pluto back in the oughts.

Which I mention because of this post from xkcd, which corrects common images of the planets in the solar system by accounting for their population.

Still, though, no Pluto?

Credit for the piece goes to Randall Munroe.

Choropleths and Colours

In many cities throughout the United States, real estate represents a hot commodity. It’s not difficult to understand why: as I have covered before, Americans are saving a bit more. Coupled with stay-at-home orders in a pandemic, spending that cash on a home down payment makes a lot of sense for a lot of people. But with little new construction, it’s a seller’s market.

The Philadelphia Inquirer covers that angle for the Philadelphia region, and its article includes a map looking at the time it takes to sell a house. And it’s that interactive map I want to look at briefly this morning.

Red vs. blue

Primarily I want to discuss the colours, as you can gather from this post’s title. We have six bins here, each indicating an amount of time in one-week intervals. So far so good. Now to the colours, we have red for homes that sell in one week or less and blue for homes that sell in five weeks or more.

Blue to red is a pretty standard choice. You will often see it in maps where you have positive growth to negative growth or something similar. I’ve used it myself on Coffeespoons a number of times, like in this map of population growth at the county level here in Pennsylvania.

In those scenarios, however, note how you have positive values and negative values. The change in colour (hue) encodes the change in numerical value, i.e. positive vs. negative. We then encode the values within that positive or negative range with lighter/darker blues and reds. Most often the darker the blue or red, the greater the value toward the end of the spectrum. For example, in Pennsylvania, the dark blue meant population growth greater than 8% and red meant population declines in excess of 8%.

As an aside you’ll note that there are no dark blue counties in that map and that’s by design. By keeping the legend symmetrical in terms of its minimum and maximum values, we can show how no counties experienced rapid population growth whilst several declined rapidly. If dark blue had meant greater than 4% growth, that angle of the story would have been absent from the map.
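To make the symmetrical legend idea concrete, here is a rough sketch. The county names and growth rates are invented for illustration (they are not the data behind my Pennsylvania map), and `symmetric_edges` and `bin_index` are names I made up. The point is only that mirroring the bin edges around zero leaves the darkest blue bin visibly empty when nothing grows quickly.

```python
from bisect import bisect_right


def symmetric_edges(limit, steps):
    """Bin edges mirrored around zero, e.g. limit=8, steps=2 gives -8, -4, 0, 4, 8."""
    width = limit / steps
    return [width * i for i in range(-steps, steps + 1)]


def bin_index(value, edges):
    """0 is the darkest red (below the lowest edge); len(edges) is the darkest blue."""
    return bisect_right(edges, value)


# Hypothetical growth rates in percent, invented for illustration only.
growth = {"County A": -9.5, "County B": -3.0, "County C": 1.2, "County D": 3.8}

edges = symmetric_edges(limit=8, steps=2)
bins = {name: bin_index(rate, edges) for name, rate in growth.items()}
darkest_red, darkest_blue = 0, len(edges)
```

With these made-up numbers County A lands in the darkest red bin while no county reaches the darkest blue, and that empty bin in the legend is itself part of the story.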

Back to our choropleth discussion, however. How does that fit with this map of selling times for homes in the Philadelphia region?

Note first that five weeks is a positive value. But so is one week or less. The use of the red-blue split here is not immediately intuitive. If this map were about the change or growth in how long homes take to sell, certainly you could see positive and negative rates and those would make sense in red and blue.

The second part to understand about a traditional red-blue choropleth is that at some point you have to switch from red to blue, a mid-point if you will. If you are talking positive/negative like in my Pennsylvania map, zero makes a whole lot of sense. Anything above zero, blue; anything below zero, red.

Sometimes, you will see a third colour, maybe a grey or a purple, between that red and blue. That encodes a fuzzier split between positive and negative. Say you want to give a margin of 1%, i.e. any geographic area that has growth between +1% and -1%. That intrinsically means the bin is both positive and negative at the same time, so a neutral colour like grey or a blend of the two colours, a purple in the case of red and blue, makes a whole lot of sense.
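A minimal sketch of that fuzzier split, assuming a 1% margin and a function name (`classify`) of my own invention:

```python
def classify(rate, margin=1.0):
    """Return 'grey' inside the +/- margin band, otherwise 'blue' or 'red'."""
    if abs(rate) <= margin:
        return "grey"  # effectively flat: neither clearly growing nor shrinking
    return "blue" if rate > 0 else "red"


# Hypothetical growth rates in percent, for illustration only.
examples = [0.4, -0.9, 2.5, -3.0]
colours = [classify(rate) for rate in examples]
```

Both +0.4% and -0.9% end up grey here, which is the whole idea: the neutral bin straddles zero.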

Here we have nothing like that. Instead we jump from a light yellow two-to-three weeks to a light blue three-to-four weeks.

What about that yellow? In a spectrum of dark blue to light blue, you will see lighter blues than darker blues. But in a red spectrum, that light red becomes pinkish or salmonish depending on the exact type of red you use. (Conversation for another day.) Personal preferences will often push clients to ask a designer to “use less pink” in their maps. I can’t tell you the number of times I’ve heard that.

If that comes up, designers will often keep their blue side of the legend from the dark to light—no complaints there, or at least I’ve never heard any. But for the red side, they’ll switch to using hue or type of colour instead of dark to light red.

Not all colours are as dark as others. Blue and red can be pretty dark. Yellow, however, is a fairly light colour. Imagine you converted the colours to greyscale: you’d have very dark greys for blue and red, but yellow would be consistently far lighter than the other two.
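You can check the greyscale claim with a little arithmetic. This sketch assumes fully saturated screen colours (pure red, pure blue, pure yellow), which are not the exact colours in the Inquirer map, and uses the common Rec. 601 luma weights as one reasonable way of converting to grey.

```python
def grey_level(rgb):
    """Approximate greyscale value (0 = black, 255 = white) using Rec. 601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Pure primaries, assumed for illustration; real map palettes will differ.
colours = {
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

greys = {name: grey_level(rgb) for name, rgb in colours.items()}
# greys -> {'red': 76, 'blue': 29, 'yellow': 226}
```

Blue and red both sit near the dark end of the scale while yellow sits near white, which is why yellow can stand in for a "light" colour.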

The designer can use the light yellow as the light red. But to link the yellow to red, they need to move through the hues or colours between the two. There’s a whole conversation here about colour theory and pigment and light absorption vs. pixels and light emission, but let’s go back to the colours you learned in primary school (pigment and light absorption). Take your colour wheel and what sits between red and yellow? Orange.

And so if a client objects to a light pink, you’ll see a pseudo dark-to-light red spectrum that uses a dark red, a medium orange, and a light yellow. Just like we see here in this Inquirer map.
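Here is a rough sketch of how such a ramp could be generated. To be clear, this is my own illustration and not how the Inquirer built its palette: the hue range (0 to 60 degrees) and the lightness values are assumptions I picked so that the ramp walks from a dark red through orange to a light yellow.

```python
import colorsys


def red_to_yellow_ramp(steps):
    """Hex colours from dark red to light yellow, shifting hue and lightness together."""
    ramp = []
    for i in range(steps):
        t = i / (steps - 1)
        hue = (60 * t) / 360         # 0 degrees is red, 60 degrees is yellow
        lightness = 0.35 + 0.45 * t  # assumed values: dark at one end, light at the other
        r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
        ramp.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return ramp


ramp = red_to_yellow_ramp(3)  # dark red, medium orange, light yellow
```

Because hue and lightness change together, each step still reads as lighter than the last without ever passing through pink.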

Back to the two-to-three week and three-to-four week switch, though. What’s the deal? This is my sticking point with the graphic. I am looking for the explanation of why the sudden break in colour here, but I don’t see any obvious one.

Why would you use this colour scheme where blue and red diverge around a non-zero value? Let’s say the average home in the region sells in three weeks. Any of the zip codes in red are selling faster than average, hot markets, and those taking longer than average are in blue, cold markets. Maybe it’s the current average, however. What if it were the average last year? Or the national average? These all serve as benchmarks for the presented data and provide valuable context to understand the market.

Unfortunately it’s not clear what, if any, benchmarks the divergence point in this map reflects. And if there is no reason to change colours mid-legend, with only six bins, a designer could find a single colour, a blue or purple for example, and then provide five additional lighter/darker shades of that to indicate increasing/decreasing levels of speed at which homes sell.
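As a sketch of that single-colour alternative, the snippet below blends one dark blue into one light blue across six bins. The two RGB values are ones I picked for illustration, the bin labels are my reading of the map's one-week intervals, and a straight blend in RGB is the simplest approach rather than a perceptually tuned one.

```python
def single_hue_ramp(dark, light, bins):
    """Linearly blend two RGB colours into `bins` evenly spaced shades."""
    shades = []
    for i in range(bins):
        t = i / (bins - 1)
        shades.append(tuple(round(d + (l - d) * t) for d, l in zip(dark, light)))
    return shades


# Assumed endpoint colours, chosen for illustration only.
dark_blue = (8, 48, 107)
light_blue = (222, 235, 247)

# My reading of the legend: six one-week bins, fastest sales first.
labels = ["1 week or less", "1-2 weeks", "2-3 weeks",
          "3-4 weeks", "4-5 weeks", "5 weeks or more"]

legend = dict(zip(labels, single_hue_ramp(dark_blue, light_blue, len(labels))))
```

Here the fastest-selling zip codes get the darkest shade and the slowest get the lightest, with no mid-legend change of hue left to explain.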

Overall, I left this piece a wee bit confused. The general trend of regional differences in how quickly homes are selling? I get that. But because there’s a non-logical break between red and blue here—or at least one I fail to see in the graphic—this map would work almost as well if each bin were a separate colour entirely, using ROYGBIV as a base for example.

Credit for the piece goes to John Duchneskie.