Building Back Better Boston Transit

The alliteration failed at that last word, but it gets the point across. No matter how you may want to define infrastructure, the term always includes transit. An opinion piece in the Boston Globe proposed how the city and region of Boston could improve the city’s mass transit options.

And they made a map.

One key flaw remains…

The map is an interesting one. It uses thick purple lines to indicate the commuter rail branches, not the metro/subway lines. The outside of those lines then encodes the suggested improvements. An orange outline indicates where tracks should be electrified, since Boston still uses diesel engines for some of its commuter rail transit. The problem is that the dark purple dominates the graphic. If, however, the purple were entirely replaced by an orange line, it would be clearer that the Providence line needs electrification. (It’s actually already electrified, as that’s the same line Amtrak uses, but Boston’s transit service still uses diesel engines on the line.)

Similarly, the key to indicate upgraded tracks and signals is a blue line of similar “colour” to the purple. That makes it hard to distinguish between the two, especially when next to the green inline option, representing increased speeds.

The key flaw? A long-time wish for Boston transit lovers (or haters). Note how the system is divided in two: the two main hubs, South Station and North Station, do not connect. Connecting the two will require billions of dollars. But the benefits can be tremendous.

Philadelphia, for example, for decades had two rail hubs: Broad Street Station across from City Hall and Reading Terminal several blocks east along Market Street. Reading Terminal was the terminus for the Reading Railroad and Broad Street Station for the Pennsy, or Pennsylvania Railroad. In 1930, Broad Street Station was replaced by an underground station, today’s Suburban Station. But it was not until 1984 that rail tunnels finally opened, linking the western/southern Pennsylvania Railroad lines to the northern lines of the Reading. Because of that connection, today you can take a train from a southwest suburb to the far northern suburbs without changing trains.

Credit for the piece goes to TransitMatters.

What Is Infrastructure?

This morning I read a piece in Politico Playbook that broke down President Biden’s $2.25 trillion proposal for infrastructure spending, something the United States is generally regarded as sorely needing. $2.25 trillion is a lot of money and it’s a fair question to ask whether all that money is really money for infrastructure.

Because, it turns out, it’s not.

Please, sir, may I have more train money?

That isn’t to say money spent on job retraining or home care services wouldn’t be money well spent. Rather, it’s just not infrastructure.

But politics and the English language is a topic for another day. Oh wait, somebody already did write about that.

Credit for the piece is mine.

Discontinuous Lead Bars

Last week the Guardian published an article about drinking water pollution across the United States. Overall, it was a nicely done piece and the graphics within segmented the longer text into discrete sections. Each unit looks similar:


The left focuses on a definition and provides contextual information. It includes small illustrations of the mechanisms by which the pollutant enters the water system. To the right is a chart showing the levels of the contamination detected in the 120 tests the Guardian (and its partner Consumer Reports) conducted.

In almost all of the charts, we see the maximum depicted on the y-axis. And the bars are coloured if that observation station exceeds the health and safety limits. (The limit is represented by the dotted line.)

But towards the end of the piece we get to lead, a particularly problematic pollutant. There is no safe level of lead contamination. But how the piece handles the lead chart leaves a bit to be desired.

But how bad is it, really?

The first thing is colour, but that’s okay. Everything is red, but again, there is no safe level of lead so everything is over the limit. But look at the y-axis. That little black line at the top indicates a discontinuity in the bars. In other words, the values for those three observations are literally off the chart.

But does that work?

First, this kind of thing happens all the time. If you ever have to work with data on either China or India, you’ll often find those two nations, due to their sheer demographic size, skew datasets that involve people. But in these kinds of situations, how do we handle off-the-chart data points?

There is a value to including those points. It can show how extreme of an outlier those observations truly are. In other words, it can help with data transparency, i.e. you’re not trying to hide data points that don’t fit the narrative with which you’re working.

In this piece, it’s never explicitly stated what the largest value in the data set is, but I interpret it as being 5.8. So what happens if we make a quick chart showing a value of 6 (because it’s easier than 5.8)? I added a blue bar to distinguish it from the rest of the chart.

It’s pretty bad.

You can see that including the data point drastically changes how the chart looks. The number falls well outside the graphic, but it also shows just how dangerously high that one observation truly is.

But if you say, well yeah, but that falls outside the box allowed by the webpage, you’re correct. There are ways it could be handled to sit outside the “box”, but that would require some extra clever bits. And this isn’t a print layout where it’s much easier to play with placement. So what happens when we resize that graphic to fit within its container?

And resized

You can see that all the other bars become quite small. And this is probably why the designers chose to break the chart in the first place. But as we’ve established, in doing so they’ve minimised the danger of those few off-the-charts sites and left off the context that shows how, for the vast majority of sites, the situation is not nearly as dire (though, again, no lead is good lead).
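To put rough numbers on that shrinkage, here is a minimal Python sketch of linear bar scaling. The 200-pixel container and the three smaller readings are made up for illustration. Only the 6 mirrors the stand-in value I charted above, and `bar_heights` is a hypothetical helper, not anything the Guardian used.

```python
def bar_heights(values, container_px):
    """Scale bars linearly so the largest value fills the container height."""
    top = max(values)
    return [container_px * v / top for v in values]


# Hypothetical readings in ppb; the 6.0 is the stand-in outlier from the post.
readings = [0.1, 0.25, 0.5, 6.0]

with_outlier = bar_heights(readings, 200)          # outlier sets the scale
without_outlier = bar_heights(readings[:-1], 200)  # outlier left out

print([round(h, 1) for h in with_outlier])
print([round(h, 1) for h in without_outlier])
```

With the outlier setting the scale, a 0.5 reading gets only one twelfth of the container height, whereas without it that same bar fills the container.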

What else could have been done? If maintaining the height of the less affected bars was paramount, the designers had a few other options they could have used. First, you could exclude those observations and perhaps put a line below the 118 text that says “for three sites, the data was off the charts and we’ve excluded them from the set below.”

I have used that approach in the past, but I use it with great reluctance. You are removing important outliers from the data set and the set is not complete without them. After all, if you are looking to use this data set to inform a policy choice, such as which communities should receive emergency funding to reduce lead levels, I’d want to start with the city in blue. Sure, I would like everyone to get money, but we’d have to prioritise resources.

I think the best compromise here would have actually been a small tweak to the original. Above the three bars that are broken (or perhaps to the right with some labelling), label the discontinuous data points to provide clearer context to the vast majority of the sites, which are below 0.5 ppb.

As easy as ABC

This preserves the ability to easily compare the lower level observations, but provides important context of where they sit within the overall data set by maintaining the upper limits of the worst offenders.
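As a rough sketch of that compromise, the logic is: draw every bar up to the axis limit, and attach a value label to any bar that gets cut off. The function name, the 0.5 ppb axis cap, and the sample readings below are my assumptions for illustration, not the Guardian’s data or code.

```python
def broken_bars(values, axis_max):
    """Return (drawn_height, label) pairs for a chart with a capped y-axis.

    Bars within the axis keep their value and get no label. Bars above
    the cap are drawn at the cap and labelled with their true value.
    """
    result = []
    for v in values:
        if v > axis_max:
            result.append((axis_max, f"{v:g} ppb"))
        else:
            result.append((v, None))
    return result


# Hypothetical readings; 6.0 stands in for the worst site, as in the post.
bars = broken_bars([0.1, 0.3, 6.0], axis_max=0.5)
for height, label in bars:
    print(height, label or "")
```

The lower bars stay comparable at full height, while the label carries the information that the broken bar alone throws away.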

Credit for the piece goes to the Guardian’s graphics department.

Building New Railways in America

I wasn’t expecting this piece to fall into the queue for today, but you all know me as a sucker for trains. So today we have this nice set of small multiples from the Guardian. It looks at…I guess we could call it train deserts. They’re like food deserts, except we’re talking about trains.

Some of the US train deserts

What strikes me is that in a perfect world at least three of these could be on one direct line. You can almost draw a straight line from Columbus, Ohio to Nashville, Tennessee and hit Louisville, Kentucky. Obviously things like property get in the way, but it is something to note.

Credit for the piece goes to Jan Diehm.

Get to Dam Work

Sorry, not sorry.

But also, sorry. This piece was supposed to go up Wednesday after President Trump’s speech where he announced he’d like to spend $1 trillion on infrastructure. But it didn’t post, so you will get two today.

This article from the New York Times dates from about a week or so ago at the height of the flooding out in California. During that deluge, the Oroville Dam emergency spillway partially failed. And a week prior to that, the Twentyone Mile Dam in Nevada burst.

Dams require investment and maintenance, along with roads, railways, airports, and, well, practically all infrastructure. The article leads in with a map locating all those dam locations across the United States and colour-codes them by age.

Where are the dam locations?

The article outlines the potential costs and risks associated with all this dam stuff and is worth a quick read. It also includes some nice secondary graphics about the dam hazard potential in Nevada.

Sorry, not sorry.

Credit for the piece goes to Troy Griggs, Gregor Aisch, and Sarah Almukhtar.