No Matter What You Say, I’m Still Me

As many long-time readers know, I was long ago bitten by the genealogy bug and that included me taking several DNA tests. The real value remains in the genetic matches, less so the ethnicity estimates. But the estimates are fun, I’ll give you that. Every so often the companies update their analysis of the DNA and you will see your ethnicity results change. I wrote about this last year. Well yesterday I received an e-mail that this year’s updates were released.

So you get another graphic.

The clearest change is that the Scottish bit has disappeared. How do you go from nearly 20% Scottish to 0%? Because population groups in the British Isles have mixed for centuries. When the Scottish colonised northern Ireland, they brought Scottish DNA with them. And as I am fairly certain that I have Irish ancestors from present-day Northern Ireland, it would make sense that my DNA could read as Scottish. But clearly with the latest analysis, Ancestry is able to better point to that bit as Irish instead of Scottish. And this shouldn’t surprise you or me, because those purple bars represent their confidence bands. I might have been 20% Scottish, but I also could reasonably have been 0% Scottish.

Contrast that to the Carpatho-Rusyn, identified here as Eastern European and Russian. That hovers around 20%, which makes sense because my maternal grandfather was 100% Carpatho-Rusyn—his mother was born in the old country, present-day Slovakia. We inherit 50% of our DNA from each of our parents, but because they also inherit 50%, we don’t necessarily inherit exactly 25% from our grandparents and 12.5% from our great-grandparents, &c.

But also note how the confidence band for my Carpatho-Rusyn side has narrowed considerably over the last three years. As Ancestry.com has collected more samples, they’re better able to identify that type of DNA as Carpatho-Rusyn.

Finally we have the trace results. Often these are misreads. A tiny bit of DNA may look like something else. Often these come and go each year with each update. But the Sweden and Denmark bit persisted this year with the exact same values. If I compare my matches, my paternal side almost always has some Swedish and Danish ethnicity, not so for my maternal side. And importantly, those matches have more of it than I do. Remember, because of how inheritance works, my matches further up on my tree should have more of that DNA, and that holds true.

That leads me to believe this likely isn’t a misread, but rather is an indication that I probably have an ancestor who was from what today we call Sweden or Denmark. Could be. Maybe. But at 2%, assuming the DNA all came from one person, it’s probably a 4th to a 6th great-grandparent depending on how much I and my direct ancestors inherited.
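As a rough check on that guess, here is a minimal sketch of the average share of DNA expected from a single ancestor at each generation. It assumes the share simply halves every generation. Real inheritance is random, so actual shares scatter around these averages, and the function name is my own invention, not anything Ancestry provides.

```python
def nth_great_grandparent_share(n: int) -> float:
    """Average fraction of DNA inherited from one nth great-grandparent.

    Assumes the share halves with every generation: a parent is one
    generation back, a grandparent two, and an nth great-grandparent n + 2.
    """
    generations_back = n + 2
    return 0.5 ** generations_back


# Print the average share for the 1st through 6th great-grandparents.
for n in range(1, 7):
    share = nth_great_grandparent_share(n)
    print(f"great-grandparent number {n}: {share * 100:.4f}%")
```

On averages alone, 2% sits between a 3rd great-grandparent (about 3.1%) and a 4th (about 1.6%). Reaching back to a 5th or 6th would require my direct ancestors and me to have inherited more than the average share along the way.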

Clearly there’s more work to do.

It’s a Little Steamy Out There

And by out there I mean 1150 light years away. One of the five amazing images out of the first day’s announcement by the James Webb Space Telescope (JWST) team was not a sexy photo of a nebula or a look back 13.5 billion years in time. Instead it was a plot of the amount of infrared light that was blocked as exoplanet WASP-96b, a hot Jupiter, transited in front of its sun. A hot Jupiter is a gas giant roughly the size of Jupiter that orbits its sun so closely (often closer than Mercury does our Sun) that its year takes mere days. WASP-96b is about half the mass of Jupiter and a year takes a little over three Earth days. Hot indeed.

The JWST means not just to take those images we saw, but to also capture data about the light passing through planetary atmospheres, just like WASP-96b. And showing the world Tuesday just how that works was a brilliant idea. What they shared was this graphic.

Everyone likes water.

The original post explains the science behind it, but in short we see telltale signs of water vapor in the atmosphere. Remember that the planet is far too hot for liquid water to exist. But because the peaks and troughs were not as pronounced as expected, scientists can conclude that there are clouds and haze in the atmosphere. It did not detect any significant signs of oxygen, carbon dioxide, or methane, all of which would be noticeable if present, as we expect in exoplanets yet to be studied.

Later that day, the BBC published an article summarising the releases, but it included a different version of the above graphic, though the other four images were unchanged. The BBC presented us with this.

Also steamy.

The most notable difference is the background. The original set the graphic on a semi-transparent chart background atop a giant illustration of a planet. Here that is replaced by a simple white background. Off the bat this chart is easier to read.

But then here we also lose some data clarity. Note on the original how we have axis markers for the wavelengths of light and the parts per million of light blocked. All are absent here. Instead the BBC opted to only put “Shorter” and “Longer” on the wavelength axis. I would submit that there was no real need to remove those labels, but that they could have been added to with these new ones. The new labels certainly explain the numbers to an audience that may not be as scientifically literate as perhaps the JWST’s audience was or was thought to be. There is certainly a value to simplifying and distilling things to a level at which your audience can understand. But there’s also a value in presenting more complex data, issues, and ideas in an attempt to educate and elevate your audience. In other words, instead of always trying to play to the lowest common denominator, it sometimes is worth it to lose a few in the audience if you ultimately increase the level of said denominator overall.

The other notable difference is that the data is presented without what I assume to be plots of the range of observations with their respective medians. You can see this in the original by how every wavelength has a line and a dot sitting in the middle of that line. In other words, over the 6+ hours the planet was observed, at each wavelength a certain amount of light was blocked. The middle value over that whole time period is the dot. Then a line of best fit “connects” the dots to show the composition of the light streaming through that steamy atmosphere.

Again, I can understand the desire to remove the ranges and keep the median, but I also think that there is little harm in showing both. Though, the first graphic could likely have used an explanation of what was shown, as I’m only assuming what we have and I could be way off. You can show more things and raise the level of the denominator, but you can only do so if you explain what your audience is looking at.

Overall both graphics are nice and capture not just the particular makeup of this one exoplanet’s atmosphere, but more broadly the potential power of the JWST and its impact on astronomy.

Credit for the original goes to the NASA, ESA, CSA, and STScI graphics teams.

Credit for the BBC version goes to the BBC graphics department.

Political Hatch Jobs

Earlier this week I read an article in the Philadelphia Inquirer about the political prospects of some of the candidates for the open US Senate seat for Pennsylvania, for which I and many others will be voting come November. But before I get to vote on a candidate, members of the political parties first get to choose whom they want on the ballot. (In Pennsylvania, independent voters like myself are ineligible to vote in party primaries.)

This year the Republican Party has several candidates running and one of them you may have heard of: Dr. Oz. Yeah, the one from television. And while he is indeed the front runner, he is not in front by much as the article explains. Indeed, the race largely had been a two-person contest between Oz and David McCormick until recently when Kathy Barnette pulled just about even with the two.

In fact, according to a recent poll the three candidates are all statistically tied in that they all fall within the margin of error for victory. And that brings us to the graphic from the article.

It would be funny to see a candidate finish with negative vote share.

Conceptually this is a pretty simple bar chart with the bar representing the share of the support of those polled. But I wanted to point out how the designer chose to represent the margin of error via hatched shading to both sides of the ends of the red bar.

In some cases the hatch job does not work for me, particularly with those smaller candidates where the bar goes negative. I would have grave reservations about the vote should any candidate win a negative share of the vote. 0% perhaps, but negative? No. I also don’t think the grey hatching works as well over the grey bar in particular and to a lesser degree the red.

I have often thought that these sorts of charts should use some kind of box plot approach. So this morning I took the chart above and reworked it.

Now with box plots.

Overall, however, I really like this designer’s approach. We should not fear subtlety and nuance, and margins of error are just that. After all, we need not go back too far in time to remember a certain candidate who thought she had a presidential election locked up when really her opponent was within the margin of error.
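To make the complaint about negative shares concrete, here is a minimal sketch of how a chart could clamp each error band to the possible range of 0% to 100%. The numbers are placeholders I made up for illustration, not the poll’s results. The helper names are hypothetical, and treating overlapping bands as a statistical tie is a simplification.

```python
from typing import Tuple


def error_band(share: float, margin: float) -> Tuple[float, float]:
    """Return the (low, high) band for a polled share, clamped to 0-100."""
    low = max(0.0, share - margin)
    high = min(100.0, share + margin)
    return low, high


def bands_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two bands share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Placeholder figures for illustration only, not the poll's numbers.
MARGIN = 4.5
leader = error_band(23.0, MARGIN)
rival = error_band(21.0, MARGIN)
minor = error_band(2.0, MARGIN)

print("leader:", leader)
print("rival:", rival)
print("minor candidate:", minor)
print("leader and rival overlap:", bands_overlap(leader, rival))
```

With the clamp in place, the minor candidate’s band stops at zero instead of dipping below it.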

Credit for the piece goes to John Duchneskie.

Updated DNA Ethnicity Estimates

Earlier this year I posted a short piece that compared my DNA ethnicity estimates provided by a few different companies to each other. Ethnicity estimates are great cocktail party conversations, but not terribly useful to people doing serious genealogy research. They are highly dependent upon the available data from reference populations.

To put it another way, if nobody in a certain ethnic group has tested with a company, there’s no real way for that company to place your results within that group. In the United States, Native Americans are known for their reluctance to participate and, last I heard, they are under-represented in ethnicity estimates. Fortunately for me, Western European population groups are fairly well tested.

But these reference populations are constantly being updated and new analysis being performed to try and sort people into ever more distinct genetic communities. (Although generally speaking the utility of these tests only goes back a handful of generations.)

Last night, when working on a different post, I received an email saying Ancestry.com had updated their analysis of my DNA. So naturally I wanted to compare this most recent update to last September’s.

Still mostly Irish

Sometimes when you look at data and create data visualisation pieces, the story is that there is very little change. And that’s my story. The actual number for my Irish estimate remained the same: 63%. I saw a slight change to my Scottish and Slavic numbers, but nothing drastic. My trace results changed, switching from 2% from the Balkans to 2% from Sweden and Denmark. But you need to take trace results with a pretty big grain of salt, unless they are of a different continent. Broadly speaking, we can be fairly certain about results at a continental level, but differences between, say, French and Germans are much harder to distinguish.

The Scottish part still fascinates me, because as far back as I’ve gone, I have not found an identifiable Scottish ancestor. A great-great-grandfather lived for several years in Edinburgh, but he was the son of two Ireland-born Irish parents. I also know that this Scottish part of me must come from my paternal lines as my mother has almost no Scottish DNA and she would need to have some if I were to have inherited it from her.

Now for about half of my paternal Irish ancestors, I know at least the counties from which they came. My initial thought, and still best guess, is that the Scottish is actually Scotch–Irish from what is today Northern Ireland. But I am unaware of any ancestor, except perhaps one, who came from or has origins in Northern Ireland.

The other thing that fascinated me is that despite the additional data and analysis, the ranges, which are another way of looking at the degree of uncertainty, increased in most of the ethnicities. You can see the light purple rectangles are actually almost all larger this year compared to last. I can only wonder if this time next year I’ll see any narrowing of those ranges.

Credit for the piece is mine.

Covid-19 Is Not the Flu Part Augh!

Yesterday, President Trump once again lied to the American public on his social media platforms. He falsely claimed that Covid-19 was nothing worse than the flu, which he falsely claimed sometimes kills more than 100,000 people. So once again we are going to look at the data comparing influenza to the novel coronavirus and the disease it causes, Covid-19.

I mean, I don’t know where else to begin. Over the last decade, the flu has not killed 100,000 people in any flu season. In the 2017/18 season, the CDC estimates the flu killed 61,000 Americans. But they also give a range where they feel with 95% confidence that the flu killed between 46,000 and 95,000 Americans. And that is the closest it’s come.

In fact, as of yesterday, Covid-19 has killed 207,000 Americans. That averages out to about 30,000 Americans per month. In other words, Covid-19 has killed each month the same number of people the flu kills in an entire (average) flu season.
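For anyone who wants to check the arithmetic, here is a minimal sketch. The death totals come from this post. The seven-month span is my assumption for how long the deaths had been accumulating, so adjust it if you count differently.

```python
COVID_DEATHS = 207_000        # total cited in this post
MONTHS_ELAPSED = 7            # assumption: roughly seven months of deaths
FLU_2017_18_ESTIMATE = 61_000  # CDC estimate cited in this post
FLU_2017_18_UPPER = 95_000     # top of the CDC's 95% range cited in this post


def monthly_average(total_deaths: int, months: float) -> float:
    """Average deaths per month over the given span."""
    return total_deaths / months


def months_to_match(target_deaths: int, per_month: float) -> float:
    """Months needed at the average pace to reach a target death toll."""
    return target_deaths / per_month


covid_per_month = monthly_average(COVID_DEATHS, MONTHS_ELAPSED)
print(f"Covid-19 deaths per month: {covid_per_month:,.0f}")
print(f"Months to match the 2017/18 flu estimate: "
      f"{months_to_match(FLU_2017_18_ESTIMATE, covid_per_month):.1f}")
print(f"Months to match the top of the 2017/18 range: "
      f"{months_to_match(FLU_2017_18_UPPER, covid_per_month):.1f}")
```

Under that assumption, Covid-19 at its average pace needs only a little over two months to match the estimate for the worst flu season of the decade.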

And the worst part is that we still haven’t exited the first wave of the coronavirus, because we never got it under control in the first place.

I just don’t know how many more times we have to say this, but because the president keeps lying about it, I feel like I need to say, once again…

Covid-19. Is. Not. The. Flu.

Credit for the piece is mine.

The Climate Impact of Your Food

Climate change is a thing. And facing it will require a lot of our societies. But the longer we choose not to act, the more the impact will be felt by later generations. Consequently, across the world, young students have been walking out of class to shine light on an issue on which they, as children, have little direct impact. Yet. But what about us? The ones who can vote and make lifestyle decisions?

The BBC had a piece where they solicited questions from their readership and then answered them. One question was what individuals can do to reduce their impact. And while clearly individuals need to do more than one thing, one facet can be examining one’s diet. The article included this graphic on the climate impact of various food types, vis-a-vis greenhouse gas emissions.

Is this saying I should drink more beer?

Essentially we are looking at a simplified box plot of greenhouse gas emissions per serving of food (and drink) type. The box plot looks at a range of values for a specific item. It usually shows the extremes at both ends; the range of a significant number of the data points, e.g. 80% of the set, or by decile, or by quartile; and then lastly the average, be it mean or median. Here we have only low impact, high impact, and average impact. Presumably the minimum, maximum, and then either mean or median.
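To show the difference between the simplified and the fuller box plot, here is a minimal sketch using Python’s standard `statistics` module. The emissions values are invented placeholders, not the BBC’s data, and I am assuming the “average” is the median, which the graphic does not confirm.

```python
import statistics


def simplified_box(values):
    """Low, average, and high only, as in the simplified version."""
    return {
        "low": min(values),
        "average": statistics.median(values),  # assumption: median, not mean
        "high": max(values),
    }


def fuller_box(values):
    """The same summary plus the quartiles a standard box plot would show."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "low": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "high": max(values),
    }


# Invented per-serving emissions for one food type, for illustration only.
servings = [0.3, 0.5, 0.6, 0.9, 3.4]

print(simplified_box(servings))
print(fuller_box(servings))
```

One high outlier, like the 3.4 here, stretches the high end a long way while barely moving the median. That is the pattern the chocolate example shows.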

And it works really well. Chocolate is a great example: on average, chocolate isn’t terrible. But certain chocolates can have far worse ramifications than low-impact beef, or average-impact lamb and prawns. And beef is well known to be one of the most impactful types of food.

From a design standpoint, I don’t know if the colours necessarily help. The average beef impact, for example, is worse than the high-impact maximum of every other food listed. But the association of green=good and red=bad here has little value because by that logic, the gold average marker for beef should be red, as it sits above the high-impact value of everything else. A less editorial choice could be made of, say, a light grey or blue, and then have the bright colour, maybe still orange, indicate where the average sits on that spectrum.

I do like the annotations on the chart. They highlight particular stories, like the aforementioned chocolate one, that the casual, i.e. skimming, reader may miss.

I could probably do without the little food illustrations. But the designer did a good job of making them all recognisable in such a small space—far from an easy task. And being so small, they don’t really distract or take away from the whole graphic.

Overall, this is a strong graphic.

Credit for the piece goes to the BBC graphics department.

How Much Warmer Was 2015

When I was over in London and Dublin, most days were cool and grey. And a little bit rainy. Not very warm. (Though warmer than Chicago.) But, that is weather, which is highly variable on a daily basis. Climate is longer-term trends and averages. Years, again, can be highly variable (here’s looking at you kid/El Niño). But, even in that variability, 2015 was the warmest year on record. So the New York Times put together a nice interactive piece allowing the user to explore data for available cities in terms of temperature and precipitation.

You can see the big chart is temperature with monthly, cumulative totals of precipitation. (I use Celsius, but you can easily toggle to Fahrenheit.) Above the chart is the total departure of the yearly average. Anyway, I took screenshots of Philadelphia and Chicago. Go to the New York Times to check out your local cities.

Philadelphia, PA

Chicago, IL

Credit for the piece goes to K.K. Rebecca Lai and Gregor Aisch.