Discontinuous Lead Bars

Last week the Guardian published an article about drinking water pollution across the United States. Overall, it was a nicely done piece and the graphics within segmented the longer text into discrete sections. Each unit looks similar:


The left focuses on a definition and provides contextual information. It includes small illustrations of the mechanisms by which the pollutant enters the water system. To the right is a chart showing the levels of the contamination detected in the 120 tests the Guardian (and its partner Consumer Reports) conducted.

In almost all of the charts, we see the maximum depicted on the y-axis. And the bars are coloured if that observation station exceeds the health and safety limits. (The limit is represented by the dotted line.)

But towards the end of the piece we get to lead, a particularly problematic pollutant. There is no safe level of lead contamination. But how the piece handles the lead chart leaves a bit to be desired.

But how bad is it, really?

The first thing is colour, but that’s okay. Everything is red, but again, there is no safe level of lead so everything is over the limit. But look at the y-axis. That little black line at the top indicates a discontinuity in the bars; in other words, the values for those three observations are literally off the chart.

But does that work?

First, this kind of thing happens all the time. If you ever have to work with data on either China or India, you’ll often find those two nations, due to their sheer demographic size, skew datasets that involve people. But in these kinds of situations, how do we handle off-the-charts data points?

There is value in including those points. It can show how extreme an outlier those observations truly are. In other words, it can help with data transparency, i.e. you’re not trying to hide data points that don’t fit the narrative with which you’re working.

In this piece, it’s never explicitly stated what the largest value in the data set is, but I interpret it as being 5.8. So what happens if we make a quick chart showing a value of 6 (because it’s easier than 5.8)? I added a blue bar to distinguish it from the rest of the chart.

It’s pretty bad.

You can see that including the data point drastically changes how the chart looks. The number falls well outside the graphic, but it also shows just how dangerously high that one observation truly is.

But if you say, well yeah, but that falls outside the box allowed by the webpage, you’re correct. There are ways it could be handled to sit outside the “box”, but that would require some extra clever bits. And this isn’t a print layout where it’s much easier to play with placement. So what happens when we resize that graphic to fit within its container?

And resized

You can see that all the other bars become quite small. And this is probably why the designers chose to break the chart in the first place. But as we’ve established, in doing so they’ve minimised the danger of those few off-the-charts sites as well as left off context that shows how for the vast majority of sites, the situation is not nearly as dire, though, again, no lead is good lead.
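To put rough numbers on that shrinkage, here is a minimal sketch of the scaling arithmetic. The 300-pixel chart height, the sample readings, and the 0.5 ppb axis cap are all my own assumptions for illustration; only the rounded outlier of 6 comes from the quick chart above.

```python
def bar_heights_px(values, axis_max, container_px):
    """Scale values to pixel heights on a linear axis running from 0 to axis_max."""
    return [round(container_px * v / axis_max, 1) for v in values]


# Hypothetical low-level readings in ppb (not the Guardian's data).
small_readings = [0.1, 0.3, 0.5]
container_px = 300  # assumed height of the chart area

# Axis capped at an assumed 0.5 ppb: the small bars use the full height.
capped = bar_heights_px(small_readings, axis_max=0.5, container_px=container_px)

# Axis stretched to 6 ppb so the outlier fits inside the same container.
stretched = bar_heights_px(small_readings, axis_max=6.0, container_px=container_px)

print(capped)     # [60.0, 180.0, 300.0]
print(stretched)  # [5.0, 15.0, 25.0]
```

Under those assumptions every small bar shrinks to one twelfth of its height, which is the trade-off the designers were presumably trying to avoid.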

What else could have been done? If maintaining the height of the less affected bars was paramount, the designers had a few other options they could have used. First, you could exclude those observations and perhaps put a line below the 118 text that says “for three sites, the data was off the charts and we’ve excluded them from the set below.”

I have used that approach in the past, but I use it with great reluctance. You are removing important outliers from the data set and the set is not complete without them. After all, if you are looking to use this data set to inform a policy choice such as, which communities should receive emergency funding to reduce lead levels, I’d want to start with the city in blue. Sure, I would like everyone to get money, but we’d have to prioritise resources.

I think the best compromise here would have actually been a small tweak to the original. Above the three bars that are broken (or perhaps to the right with some labelling), label the discontinuous data points to provide clearer context to the vast majority of the sites, which are below 0.5 ppb.

As easy as ABC

This preserves the ability to easily compare the lower level observations, but provides important context of where they sit within the overall data set by maintaining the upper limits of the worst offenders.
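For anyone building a chart like this, the tweak could be sketched roughly as below. This is not how the Guardian built theirs; the function name, the 0.5 ppb cap, and the sample values are hypothetical, with 5.8 being only my reading of the largest value.

```python
def clip_with_labels(values, axis_cap, unit="ppb"):
    """Pair each value with the height to draw and an optional label.

    Bars over the cap are drawn at the cap (the broken bar) and keep
    their true value as a text label; other bars get no label.
    """
    result = []
    for v in values:
        if v > axis_cap:
            result.append((axis_cap, f"{v:g} {unit}"))
        else:
            result.append((v, None))
    return result


# Hypothetical readings; the axis cap of 0.5 ppb is an assumption.
bars = clip_with_labels([0.2, 0.4, 5.8], axis_cap=0.5)
print(bars)  # [(0.2, None), (0.4, None), (0.5, '5.8 ppb')]
```

The drawn heights stay comparable for the low-level sites, while the label carries the number that the broken bar can no longer show.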

Credit for the piece goes to the Guardian’s graphics department.

Auto Emissions Stuck in High Gear

The last two days we looked at densification in cities and how the physical size of cities grew in response to the development of transport technologies, most notably the automobile. Today we look at a New York Times article showing the growth of automobile emissions and the problem they pose for combating the greenhouse gas side of climate change.

The article is well worth a read. It shows just how problematic the auto-centric American culture is to the goal of combating climate change. The key paragraph for me occurs towards the end of the article:

Meaningfully lowering emissions from driving requires both technological and behavioral change, said Deb Niemeier, a professor of civil and environmental engineering at the University of Maryland. Fundamentally, you need to make vehicles pollute less, make people drive less, or both, she said.

Of course, the answer is probably both.

The star of the piece is the map showing the carbon dioxide emissions on the roads from passenger and freight traffic. Spoiler: not good.

From this I blame the Schuylkill, Rte 202, the Blue Route, I-95, and just all the highways

Each metropolitan statistical area (MSA) is outlined in black and is selectable. The designers chose well by setting the state borders in a light grey to differentiate them from the MSA outlines where an MSA crosses state lines, as the Philadelphia one does, encompassing parts of Pennsylvania, New Jersey, Delaware, and Maryland. A slight opacity appears when the user mouses over the MSA. Additionally a little box remains up once the MSA is selected to show the region’s key datapoints: the aggregate increase and the per capita increase. Again, for Philly, not good. But it could be worse. Phoenix, which has surpassed Philadelphia proper in population, has seen its total emissions grow 291% and its per capita emissions grow 86%. My only gripe is that I wish I could see the entire US map in one view.
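Those two Phoenix numbers also hint at how much of the increase is simply more people. A back-of-the-envelope sketch, assuming both percentages cover the same period and that per capita emissions are just total emissions divided by population (the function name is mine, not the article's):

```python
def implied_population_growth(total_growth_pct, per_capita_growth_pct):
    """Population growth implied by total and per capita growth, in percent.

    Relies on: total ratio = population ratio * per capita ratio.
    """
    total_ratio = 1 + total_growth_pct / 100
    per_capita_ratio = 1 + per_capita_growth_pct / 100
    return (total_ratio / per_capita_ratio - 1) * 100


# Phoenix figures quoted from the map: +291% total, +86% per capita.
phoenix = implied_population_growth(291, 86)
print(round(phoenix))  # 110
```

If those assumptions hold, the region's population would have slightly more than doubled over the period, so both more people and more driving per person are in play.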

The piece also includes some nice charts showing how automobile emissions compare to other sources. Yet another spoiler: not good.

I’ve got it: wind-powered cars with solar panels on the bonnet.

Since 1990, automobile emissions have surpassed both industry emissions and more recently electrical generation emissions (think shuttered coal plants). Here what I would have really enjoyed is for the breakdown of auto emissions to be charted the same way as the share of total emissions. That is, the line chart does a great job showing how auto emissions have surpassed all other sources. But the stacked chart does not do as great a job. The user can sort of see how emissions from passenger vehicles have plateaued but have yet to decline, whereas those from lorries have increased in recent years. (I would suspect due to increased deliveries of online-ordered goods, but that is pure speculation.) But a line chart would show that a little bit more clearly.

Finally, we have a larger line chart that plots each city’s emissions. As with the map, the key thing here is the aggregate vs. per capita numbers. When one continues to scroll through, the lines all change.

Lots of people means lots of emissions.

There’s driving in the Philadelphia area, but it’s not as bad as it could be.

Very quickly one can see how large cities like New York have large aggregate emissions because millions of people live there. But then at a per capita level, the less dense, more sprawl-y cities tend to shoot up the list as they are generally more car dependent.
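The flip in rankings is easy to reproduce with a toy example. The three cities and all of their numbers below are made up purely for illustration; none of them come from the Times data.

```python
# Entirely hypothetical cities: (annual emissions in tonnes, population).
cities = {
    "Dense City": (50_000_000, 20_000_000),
    "Sprawl City": (12_000_000, 2_000_000),
    "Mid City": (9_000_000, 3_000_000),
}


def rank_cities(data, per_capita=False):
    """Return city names sorted from highest to lowest emissions."""
    def key(name):
        emissions, population = data[name]
        return emissions / population if per_capita else emissions
    return sorted(data, key=key, reverse=True)


print(rank_cities(cities))                   # ['Dense City', 'Sprawl City', 'Mid City']
print(rank_cities(cities, per_capita=True))  # ['Sprawl City', 'Mid City', 'Dense City']
```

The big, dense city tops the aggregate list on sheer headcount but drops to the bottom once each resident's share is what gets compared.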

Credit for the piece goes to Nadja Popovich and Denise Lu.