What Is Infrastructure?

This morning I read a piece in Politico Playbook that broke down President Biden’s $2.25 trillion proposal for infrastructure spending, something the United States is generally regarded as sorely needing. $2.25 trillion is a lot of money, and it’s a fair question to ask whether all of it is really money for infrastructure.

Because, it turns out, it’s not.

Please, sir, may I have more train money?

That isn’t to say money spent on job retraining or home care services wouldn’t be money well spent. Rather, it’s just not infrastructure.

But politics and the English language is a topic for another day. Oh wait, somebody already did write about that.

Credit for the piece is mine.

Discontinuous Lead Bars

Last week the Guardian published an article about drinking water pollution across the United States. Overall, it was a nicely done piece and the graphics within segmented the longer text into discrete sections. Each unit looks similar:

PFAs.

The left focuses on a definition and provides contextual information. It includes small illustrations of the mechanisms by which the pollutant enters the water system. To the right is a chart showing the levels of the contamination detected in the 120 tests the Guardian (and its partner Consumer Reports) conducted.

In almost all of the charts, we see the maximum depicted on the y-axis. And the bars are coloured if that observation station exceeds the health and safety limits. (The limit is represented by the dotted line.)

But towards the end of the piece we get to lead, a particularly problematic pollutant. There is no safe level of lead contamination. But how the piece handles the lead chart leaves a bit to be desired.

But how bad is it, really?

The first thing is colour, but that’s okay. Everything is red, but again, there is no safe level of lead so everything is over the limit. But look at the y-axis. That little black line at the top indicates a discontinuity in the lines, in other words the values for those three observations are literally off the chart.

But does that work?

First, this kind of thing happens all the time. If you ever have to work with data on either China or India, you’ll often find those two nations, due to their sheer demographic size, skew datasets that involve people. But in these kinds of situations, how do we handle off-the-charts data points?

There is a value to including those points. It can show how extreme of an outlier those observations truly are. In other words, it can help with data transparency, i.e. you’re not trying to hide data points that don’t fit the narrative with which you’re working.

In this piece, it’s never explicitly stated what the largest value in the data set is, but I interpret it as being 5.8. So what happens if we make a quick chart showing a value of 6 (because it’s easier than 5.8)? I added a blue bar to distinguish it from the rest of the chart.

It’s pretty bad.

You can see that including the data point drastically changes how the chart looks. The number falls well outside the graphic, but it also shows just how dangerously high that one observation truly is.

But if you say, well yeah, but that falls outside the box allowed by the webpage, you’re correct. There are ways it could be handled to sit outside the “box”, but that would require some extra clever bits. And this isn’t a print layout where it’s much easier to play with placement. So what happens when we resize that graphic to fit within its container?

And resized

You can see that all the other bars become quite small. And this is probably why the designers chose to break the chart in the first place. But as we’ve established, in doing so they’ve minimised the danger of those few off-the-charts sites as well as left off context that shows how, for the vast majority of sites, the situation is not nearly as dire, though, again, no lead is good lead.
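To put rough numbers on that squeeze, here is a minimal sketch in Python. The 200-pixel container height and the three lower readings are my own assumptions for illustration, not values from the Guardian's chart; only the stand-in outlier of 6 and the 0.5 ppb ceiling come from the discussion here.

```python
def bar_heights(values, container_px, axis_max=None):
    """Scale each value to a pixel height that fits inside the container.

    If axis_max is omitted, the largest value sets the top of the axis.
    """
    top = axis_max if axis_max is not None else max(values)
    return [round(container_px * v / top, 1) for v in values]


# Hypothetical readings in ppb; 6.0 stands in for the off-the-charts site.
typical = [0.1, 0.25, 0.5]
with_outlier = typical + [6.0]

CONTAINER_PX = 200  # assumed chart height, purely illustrative

print(bar_heights(typical, CONTAINER_PX))
print(bar_heights(with_outlier, CONTAINER_PX))
```

Under these assumptions, once the outlier sets the top of the axis, a 0.5 ppb bar shrinks from the full container height to about a twelfth of it, which is the trade-off the designers were likely trying to avoid.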

What else could have been done? If maintaining the height of the less affected bars was paramount, the designers had a few other options they could have used. First, you could exclude those observations and perhaps put a line below the 118 text that says “for three sites, the data was off the charts and we’ve excluded them from the set below.”

I have used that approach in the past, but I use it with great reluctance. You are removing important outliers from the data set and the set is not complete without them. After all, if you are looking to use this data set to inform a policy choice such as, which communities should receive emergency funding to reduce lead levels, I’d want to start with the city in blue. Sure, I would like everyone to get money, but we’d have to prioritise resources.

I think the best compromise here would have actually been a small tweak to the original. Above the three bars that are broken (or perhaps to the right with some labelling), label the discontinuous data points to provide clearer context to the vast majority of the sites, which are below 0.5 ppb.

As easy as ABC

This preserves the ability to easily compare the lower level observations, but provides important context of where they sit within the overall data set by maintaining the upper limits of the worst offenders.

Credit for the piece goes to the Guardian’s graphics department.

Covid Update: 4 April

Last week I wrote about how the inevitable rise in new Covid-19 cases was occurring in Pennsylvania, New Jersey, Delaware, Virginia, and Illinois. Now, one, in the last week, we saw no evidence of states preparing to reinforce their public health and safety restrictions. And two, whilst we have no data on people not following guidelines, anecdotally a large group of people threw a party in my building’s common amenities space so it does seem like people are feeling less inclined to wear masks, socially distance, and isolate to their own households.

Those two conditions, of course, do not help reduce the case count. Instead they add to it. So it should come as no surprise that Covid-19 continues to rapidly spread in our five states, though some are doing worse than others.

New case curves for PA, NJ, DE, VA, & IL.

New Jersey and Pennsylvania arguably performed the worst. If we look at the peak to trough decline from early winter’s surge to late winter’s nadir, we can see that New Jersey has reached 40% of that peak. Pennsylvania enjoyed a better decline and so has a larger gap, but is still nearing 20% of its previous peak.

Illinois is also remarkable, again not in a good way, as its peak to trough fall was even greater than Pennsylvania’s, however it’s also now clearly rising. The Land of Lincoln, however, did manage to reach late summer levels of new cases: good. But those are now rising: bad. Delaware too is seeing a rise, albeit at a slower rate than its two tristate neighbours.

Only Virginia’s rise remains slight, barely discernible in the chart.

Deaths, while not exactly bad news, aren’t exactly good news either. Last week I mentioned how they had stalled out and stopped declining. That is better than rising death rates, but the level of deaths per day is still higher than we saw last summer. In other words, things could be significantly better even in pandemic terms.

Death curves for PA, NJ, DE, VA, & IL.

Last week? Deaths continued to stubbornly persist at those elevated levels. We remain vigilant, looking for any indication that deaths will follow the rates of new cases and hospitalisations and begin to climb.

The hope, of course, is that we have vaccinated enough of the most at risk populations to prevent a surge in deaths. But, we just don’t know yet. The only good news is that vaccinations continue to progress.

Vaccination curves for PA, VA, & IL.

Illinois has surpassed 18% of its population being fully vaccinated. Virginia is not far behind at 17.75%. Pennsylvania, because of the bifurcated nature of its data reporting, remains unclear. It sits at 17.8% fully vaccinated, but Philadelphia has not posted updated data since late Thursday. It’s likely that the Commonwealth has joined Illinois in surpassing 18%, but it’s not fully certain.

Also this past week, the CDC updated its guidance for the fully vaccinated, saying that it was safe for them to travel. I take some issue with this, primarily on the messaging front.

First, we need to be clear about what fully vaccinated means. It means two weeks after your final dose. For Johnson & Johnson’s vaccine, that means two weeks after your shot as you only receive one. For both Pfizer and Moderna, you are only fully vaccinated two weeks after your second shot—not before. And keep in mind with Pfizer you need to wait three weeks between first and second dose. With Moderna it’s four weeks. In other words, with J&J you need to wait two weeks after your first (and only) shot before you can begin to follow the loosened guidelines. If you receive Pfizer’s, you need to wait five weeks from your first shot, assuming you do receive your second three weeks later, and with Moderna it’s six weeks, again assuming the recommended four week gap.
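The waiting periods above are simple date arithmetic, so a small sketch in Python can make them concrete. The dose gaps and the two-week wait are the ones described above; the first-dose date and the function name are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# Weeks between first and final dose, as described above (0 for a single-dose vaccine).
DOSE_GAP_WEEKS = {"J&J": 0, "Pfizer": 3, "Moderna": 4}
WAIT_AFTER_FINAL_DOSE_WEEKS = 2


def fully_vaccinated_on(vaccine, first_dose):
    """Return the date on which someone counts as fully vaccinated."""
    weeks = DOSE_GAP_WEEKS[vaccine] + WAIT_AFTER_FINAL_DOSE_WEEKS
    return first_dose + timedelta(weeks=weeks)


first_shot = date(2021, 4, 5)  # hypothetical first-dose date
for name in DOSE_GAP_WEEKS:
    print(name, fully_vaccinated_on(name, first_shot))
```

This assumes the second dose arrives exactly on the recommended schedule; a delayed second shot pushes the date back by the same amount.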

The problem is that only about 20% of the US population is fully vaccinated. And with the virus spreading at high rates and at high levels, it poses a significant risk as the newer, more lethal, and more infectious variants could take root in the United States and overwhelm the healthcare systems of the 50 states. We do not yet know if fully vaccinated people can spread the virus if they do become infected.

I think the advice should have remained to refrain from all but essential travel until we reached a high percentage of fully vaccinated folks. I ballparked earlier this week something like 2/3 the estimated amount of full vaccinations required for herd immunity (est. at 75%). In other words, keeping restrictions on travel until at least 50% of the US becomes fully vaccinated.

We remain several weeks away from that milestone, unfortunately. I understand the desire/urge people have to get out and do things and enjoy spring after a year of isolation. Sadly, if winter was the darkest/hardest part of the pandemic, I think that makes spring and early summer the most challenging. Because we see progress, we see the light at the end of the tunnel, and it coincides with warmer weather and we want nothing more than to get out and do things and see people. But that is the last thing we need to be doing at this point.

I’ve often described the vaccination as the marshmallow test. In a study, scientists presented kids with a marshmallow. They could eat the marshmallow immediately, but if they waited 15 minutes, unsupervised, they could then have an additional marshmallow. We are all just grabbing that first marshmallow whilst the promise of a more normal summer is ours if we can wait just 15 minutes.

Credit for the piece is mine.

Too Much Horsing Around

Last week the Philadelphia Inquirer published an investigation of the staggering number of horse deaths in Pennsylvania’s race track facilities. I found the article fascinating, but admittedly at a point or two I felt a wee bit squeamish when the author described how horses essentially die. Then about halfway through the article I ran into the first of two graphics looking at the data.

Seeing red…

The first is pretty simple, a timeline of deaths over the course of one year, 2019. Overall it works, you can clearly see clusters of racing deaths, but that those clusters spread across the year. When I sat with the graphic for a moment, however, a few things began to stick out at me. The first was a distracting vibration in the background. Not the alternating beige and blue of the months, but if you look closely you’ll see tightly spaced lines within the colour fields: presumably the days of the month for aligning the deaths.

On a large enough graphic it makes all the sense to tick off sub-monthly increments, but in this space I would have probably opted to show only the months. Maybe weeks could have worked, as that approach may have reinforced the statistic about a horse dying every six days on average.

The second point is the black stroke or outline of each dot. Here the designer faces a challenging constraint. Essentially, the smaller the dot (or the symbol) the brighter the colour. In a rich, blood red colour you have a dark heavier colour. Compare that to say a stop sign that is bright red. It has a lighter feel. The blood red colour, in a given space, has let’s say an amount of black ink or pixels—I’m simplifying here—mixed in with the red. But in a large area, there’s enough red ink or pixels to still be clearly blood red. The stop sign red has no other colours but red. And in large areas, it can be an eye-stabbing amount of red—precisely why it’s likely so useful for, you know, stop signs.

But at the small scale of these very small dots, you still proportionally have the same amount of red and black ink, but with fewer and fewer amounts, the eye can begin to experience difficulty in truly reading the colour for what it is. For example, in an area of say 49 pixels (7×7), while the ratio of red to black may be consistent, you still only have a total of 49 pixels with which to convey “red” to the reader. Consequently, in smaller spaces, you may find that designers sometimes opt for brighter colours, a la stop sign red, than they would in larger fields of colour.

Here we have a nice use of brighter red, green, and yellow. (I will quickly add that the choice of red and green can be problematic for colour blindness, but I don’t want to revisit that here.) But to provide better separation between those small, circle sized fields of colour a border probably helps. A thin black line, or stroke, makes sense. But the black is darker than the colours themselves, thus it can draw more attention than the colour fill. And that begins to happen here. I wonder if a thin white stroke may have been less distracting and placed more emphasis on the fill colours.

As I said, overall a really nice, if sobering, graphic in an important but disturbing article. I think a few small tweaks could really bring the graphic over the finish line. Pun fully intended. Sorry, not sorry.

Credit for the pieces goes to John Duchneskie.

Covid Update: 29 March

Two weeks ago I wrote about how new cases in the states of Pennsylvania, New Jersey, Delaware, Virginia, and Illinois were stalling out, i.e. no longer declining. Additionally, with the exception of Illinois, they were stalling at rates far higher than what we saw last summer. I wrote:

This means that the environment is ripe for a new surge of cases if people stop following social distancing and begin resuming indoor activities with other people. Sadly, both those things appear to be occurring throughout the US.

Two weeks later, one thing inevitably occurred.

New cases are now rising in all five states. I wrote about the flat tails of the curves for the seven-day averages. A quick look at the chart shows those have swung upwards, in some cases sharply.

New case curves in PA, NJ, DE, VA, & IL.

Two weeks ago I referenced Europe as a cautionary tale. Governments there eased up on their restrictions, cases surged, and then as hospitalisations rose, governments had to reimpose restrictions and effect new lockdowns. Europe has typically been 3–4 weeks ahead of us throughout the pandemic. So that we are now at a point where we are seeing rising cases, absolutely none of this should be surprising.

The evidence has been in our faces for weeks, plus we have the European example to look at. Reopening makes no sense until we can get case numbers lower, especially with new more virulent and lethal strains of coronavirus now circulating.

Deaths too have been trending the wrong way over the last few weeks.

Death curves for PA, NJ, DE, VA, & IL.

We have seen the curves largely bottom out. And if you look closely, these bottoms are higher than the rates we saw last summer, in some cases more than 3× as much. This flattening occurred just a few weeks after cases began to flatten. The question becomes, will they rise in a few weeks’ time? Or have we vaccinated enough of our most vulnerable populations?

That’s the real wildcard.

Right now, we have only fully vaccinated about 15% of the populations of Pennsylvania, Virginia, and Illinois.

Vaccination curves for PA, VA, & IL.

Is that enough to prevent hospitalisations and deaths in what looks like will be a fourth wave?

Credit for the piece is mine.

Kiss Me, I’m Irish

Or just shake my hand, because today marks the second St. Patrick’s Day spent in isolation. I am lucky, of course, because two years ago I spent the holiday in Dublin. One of those bucket list kind of things. There I ran into a(n American) friend who was coincidentally in town. Then the next day I took the train to Cork to visit another friend. If you don’t count weddings, I think that was the last big trip I took.

Two years on, I am here in my flat alone on a holiday meant to be spent with family and friends. But in the last year, I made significant progress on my Irish genealogy. For part of that progress I took two additional DNA tests. So this St. Patrick’s Day seems like a good time to reflect on those tests.

For those that don’t know, I do a lot of genealogy work as a hobby. Primarily I focus on paper records, but DNA is an important piece of the puzzle. In a sense, it is the only record that cannot lie. It will reveal your biological connections to family that may have been otherwise lost. And it cannot be faked.

But that’s only true for your genetic matches. Those are the real power of taking a DNA test. I would bet, however, that most people initially take the tests for the ethnicity estimates. On a day like today, how Irish are you? How Irish am I?

That’s a lot of green.

Not surprisingly, I’m pretty Irish.

Of course, if you look at me, those Irish values do not quite equal each other. So what’s the deal? After all, the underlying DNA does not change from spit tube to cheek swab.

The first thing to know is that in one sense, ethnicity is, like so many things, a social construct. Super broadly, every individual is unique—except twins. Of course humans have spread across the globe and in that spread, certain regions have evolved incredibly slight differences between the populations. In addition to those genetic differences, the populations created civilisations and cultures. An ethnicity, in a sense, is a group of people who share that culture, civilisation, and genetic similarities vis-a-vis genetic differences across the world.

Importantly, within those groups, we still have differences. The Irish, for example, are known for freckles and red hair. But not all Irish have those traits. Instead, again super broadly, we say that for a group of people, a certain percentage will share a certain set of features. Consequently, within an ethnic group, you will still have variations and outliers. In some cases because generations ago a traveller from a different group entered the gene pool for some reason or another. And while the offspring might identify entirely with their new civilisation and culture, their genes don’t lie and a DNA test would reveal their traits from their ancestor’s foreign gene pool.

The second point to make is that Ireland is a fairly modern creation. Ireland did not exist as a sovereign state until 1922. Before then, the idea of Ireland existed. The country, however, did not. A better example would be German or Italian. Neither Germany nor Italy existed until the 1870s and 1860s, respectively. If you have “German” ancestors who arrived in Philadelphia in 1848, you don’t have German ancestors. You have ancestors from one of the various principalities or bishoprics comprising the German Confederation. Italy had the Venetian Republic, the Kingdom of the Two Sicilies, and many others. Being Irish, German, or Italian is thus a modern construct.

The third point is that identifying anyone as any of these ethnic groups requires a baseline for a comparison. To do that, you need a reference population in the area you are going to define as Ireland, Germany, or Italy. But humans have migrated throughout history. Ireland was conquered by the English. Germans…well, let’s just say Germans have a history with conquering parts of Europe. And so you can see exchanges of genetic information among populations pretty easily. And over time, those genetic populations evolve.

Take those three points and add them together in an admixture test and your results are really only good back to about 500 years. And even then, you may find yourself belonging to something incredibly vague and all-encompassing because, especially as with France and Germany, there’s been too much mixture to get so granular as to fit ourselves within the borders of modern political states.

In the above results, you can see my “Irishness” varies from 63% to 75%. Though, as far as I know 21/32 (66%) of my 3xgreat-grandparents arrived from Ireland. That’s why I say I’m 2/3 Irish. But, genetically, I may be more or less because those 21 might have English or Scottish ancestors. Ancestry says I may be 18% Scottish, but whilst I have ancestors who lived in Scotland, I’m not aware of any ancestors born and raised for multiple generations in Scotland.

And then that’s just how Ancestry defines it. Compare that to my results from My Heritage. Because of the aforementioned difficulty in separating out certain population groups, they lump the Irish, Scottish, and Welsh together. Add my Ancestry Irish and Scottish together and I have 81%, not far from My Heritage’s 85% estimate. Then look at my results from Family Tree. They estimate me as 75% Irish, but add in the 10% Scandinavia and I’m up to 85%.

That brings me to my last point about DNA tests. It’s probably fair to say that I’m something like 80–85% genetically from the British Isles/North Sea region. What about the other 15–20%?

You will often hear you receive half your DNA from each of your parents. And they get half from each of theirs and so on and so forth. I’ve had conversations with folks who take that to mean they get 25% from each grandparent and 12.5% from each great-grandparent et cetera. But that’s not quite true.

You do receive 50% of your DNA from your father and the other 50% from your mother. But that 50%, well that’s a sort of random sample from the share your parents received from their parents.

My maternal grandfather was 100% Carpatho-Rusyn. For generations, his ancestors lived, reproduced, and died in the Carpathian Mountains. If we received exactly half from each previous generation, I should expect 25% of my DNA from my grandfather. But Ancestry, which has the best representation of this small ethnic group, says it’s 17% (though they give it as a range of being between 2 and 27%). In other words, I’m missing eight percentage points.

And so if you take a DNA test and you know you have great-great-grandparents of Irish descent, you may only see a small fraction in your results. If your connection to Ireland (or anywhere else) is even further back, the result becomes smaller still. In fact, beyond 5–7 generations back, you may not even inherit any genetic material from a specific ancestor in your family tree.
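For a sense of scale, here is a minimal sketch in Python of the naive "exactly half each generation" expectation. It only computes the average share per ancestor; as described above, the actual inherited share is a random sample and can land well off that average (the 17% against an expected 25% is one such case). The function names are my own, purely for illustration.

```python
def expected_share(generations_back):
    """Average fraction of DNA inherited from one ancestor n generations back."""
    return 0.5 ** generations_back


def ancestors_in_generation(generations_back):
    """Number of ancestor slots in the tree n generations back."""
    return 2 ** generations_back


labels = {
    1: "parent",
    2: "grandparent",
    3: "great-grandparent",
    4: "great-great-grandparent",
    5: "3x-great-grandparent",
    7: "5x-great-grandparent",
}

for n, label in labels.items():
    print(f"{label}: {ancestors_in_generation(n)} ancestors, "
          f"{expected_share(n):.2%} expected from each")
```

By seven generations back the average share from any one ancestor is already under 1%, so it is not hard to see how random sampling could leave nothing at all from a particular person.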

But ultimately, for today, as I wrote in one of my very first posts here on Coffeespoons, back in 2010, on St. Patrick’s Day, we’re all at least a little bit Irish.

Hopefully next year we’ll be able to celebrate in person.

Credit for the piece is mine.

And Up We Go Again

Yesterday I wrote about Covid-19 here in five states of the US. I mentioned how I am concerned about the levelling out of new cases in certain states, notably Pennsylvania and New Jersey. In Italy, the government issued a new round of lockdowns in an attempt to contain a new wave before it swamps their healthcare system.

At the end of that BBC article, they used a small multiples graphic showing the seven-day average in several European countries. Today is the 16th, and so the data is now a few days old, but the concept remains important.

New cases curves for several European countries.

From a design standpoint, we are seeing a few things here. First, each country’s line chart exists with its own scale. Unfortunately this makes comparing country-to-country nigh impossible. We know from the title that in the present these are the countries with the highest new case rates in Europe. But, how do these rates today compare to earlier peaks? Without axis lines or a baseline, it’s difficult to say.

Of course, the point could well be just to show how in places like Italy, France, Poland, &c. we are seeing an emergent surge of new cases since the holiday peak.

If that is the goal, I think this chart works well. However, if the goal is to provide more context of the state of the pandemic in these select countries, we need some additional context and information.

Credit for the piece goes to the BBC graphics department.

Covid Update: 14 March

Last week I wrote about how our progress in dealing with Covid-19 was stagnating. To put it simply, this past week did not get any better on that front.

New case curves for PA, NJ, DE, VA, & IL.

In Pennsylvania, Delaware, and Illinois we see that the flattened tail I described last week, well, remained a flattened tail. In Delaware, we see more movement, but the average of the average, if you will, is flat over the last two weeks. And in New Jersey, where I mentioned some signs of rising numbers, we see a clearly rising number of new cases over the last week. Only in Virginia are numbers heading down, and those are shallowing out.

The problem here is that in Pennsylvania and Delaware, the new case rate, whilst flat, is well above the summer rate of low transmission. This means that the environment is ripe for a new surge of cases if people stop following social distancing and begin resuming indoor activities with other people. Sadly, both those things appear to be occurring throughout the US.

In Europe we see a cautionary tale. They too saw their holiday peaks decline and the national governments began easing restrictions on their populations. Within the last several days, however, new cases have begun to surge. Italy has gone so far as to announce a new lockdown. Other governments are considering the same.

If the United States cannot resume pushing its numbers of new cases down, it could well follow Europe into a new wave of outbreaks that would threaten lockdowns and push back our eventual return to normalcy.

None of this would be an issue if vaccinations were nearing herd immunity levels. However, in the states we cover, nowhere is above 12% fully vaccinated.

Vaccination curves for PA, VA, & IL.

Pennsylvania now lags behind the other two states. But at least the Commonwealth is over 10% fully vaccinated.

And of course, the problem under this dire scenario is that deaths could rise once again, though at this point the most vulnerable are in the middle of being vaccinated. Indeed, if we look at the last week, we see the good news for the week, that deaths are headed down in all five states.

Death curves for PA, NJ, DE, VA, & IL.

Previously, Virginia had been working through a backlog of death records, but those appear now cleared. We are not quite back to summer-level lows, but we are steadily approaching them.

The big question this week will be what happens to those new case numbers. Today’s data, Monday, will likely show lower numbers because of lower testing on the weekend. But starting Tuesday, what do we see over the course of the next five days?

Credit for the piece is mine.

The Mars Rovers

Perseverance landed on Mars on 18 February, almost a month ago. The video and photography the rover has already sent back has been stunning. We all know she is the most capable rover yet landed on the Red Planet, but what we all want to know is how cute is Perseverance compared to her predecessors?

Thankfully for that we have xkcd.

Still a big fan of Spirit and Opportunity. Designed to last 3 months, Opportunity trekked on for over 14 years.

Credit for the piece goes to Randall Munroe.