Just a Little Axis if You Please

In my last post, I commented upon a graphic from the Philadelphia Inquirer where a min/max axis line would have been helpful. This post is a quick follow-up of sorts, because a week ago I flagged something similar to perhaps mention on Coffee Spoons. So here I shall mention away.

We have another graphic from the Inquirer in an article about the Philadelphia region’s oppressive humidity this summer. The chart presents its information straightforwardly—bars representing the percentage of hours wherein the dew point sat above 70°F. Muggy. Muggy as hell. Because I guarantee you the heat in Hell is a humid one. None of that dry desert heat.

Overall, the graphic works well. It is interactive, so you can mouse over a bar and read the precise data point. I love that far more than the increasingly prevalent let’s-label-every-data-point-on-the-chart-and-distract-the-eyeball-from-the-actual-pattern-of-what-is-going-on approach.

This summer has been the third muggiest in Philadelphia in the last three-quarters of a century. The designer highlighted 2025 at the end of the series—not necessary, but I can live with it. What then stands out are the two muggiest years, two very tall bars. But note that there is no axis line above them. No upper bound. Nothing to tell the reader what percentage they approach.

I do not always use a maximum or minimum axis line, but when I omit one, the outlier usually has to be extreme, and in that case I add extra lines around that point to give the reader the vital context of scale. Otherwise, the outlier should sit just a wee bit above or below the line. I thought I would quickly find a relevant example in my own work, but it took nearly 20 minutes of reviewing old projects to find one.

In Figure 6, you can see the pink line barely and briefly rise above the 80% maximum. The reader can tell the value just pokes above 80% but otherwise stays below it for the entire span of time. And that works great.

Again, this is a small critique of the mugginess chart, but I feel an axis line significantly helps the reader see just how muggy those summers were. Spoiler: nearly 54% of the time was “oppressive”.

To play devil’s advocate, perhaps if the article were not about how this is the third muggiest summer, the designer could have skipped adding an axis line at 60% or so. But because the author placed such emphasis on the third-most bit, the graphic really would benefit from the context of how the 45% thus far for 2025 compares to the top two summers.
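As a rough illustration of the fix, here is a minimal Python sketch that picks the topmost axis line by rounding the tallest bar up to the next gridline. The function name and the 10-point gridline interval are my own assumptions, not the Inquirer's setup, and only the 45% and roughly 54% figures come from the chart; the other years are left out.

```python
import math


def next_axis_line(values, step):
    """Return the first gridline at or above the largest value.

    `step` is the gridline interval, e.g. 10 percentage points.
    """
    peak = max(values)
    return math.ceil(peak / step) * step


# Only the 2025 value (45%) and the roughly 54% peak come from the post.
print(next_axis_line([45, 54], 10))  # 60
```

Rounding up rather than to the nearest line keeps the tallest bars beneath a labelled value, which is exactly the context the mugginess chart lacks.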

Credit for the piece goes to Stephen Stirling.

Bring on the Beantown Boys

For my longtime readers, you know that despite living in both Chicago and now Philadelphia, I am, and have been since way back in 1999, a Boston Red Sox fan. And this week, the Carmine Hose make their biennial visit down I-95 to South Philadelphia.

And I will be there in person to watch.

This is the second series after the All-Star break and as much as I wish it were otherwise, the Red Sox are just not as good as the Phillies. The team my hometown supports is just better than the one for whom I root. The Sox are 54-47 with a .535 winning percentage and the Phillies are 56-43 with a .566 winning percentage. The Phillies have the better rotation, by far. And the Red Sox’ two best pitchers just threw out in Chicago whereas the Phillies’ best toe the rubber over the next three nights.
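For the non-baseball readers, winning percentage is simply wins divided by games played, conventionally shown to three decimals without the leading zero. A small Python sketch, with helper names of my own invention, reproduces the two figures above:

```python
def winning_percentage(wins: int, losses: int) -> float:
    """Share of decisions won, rounded to three decimals as in the standings."""
    return round(wins / (wins + losses), 3)


def baseball_format(pct: float) -> str:
    """Render 0.535 as '.535', the way the standings drop the leading zero."""
    return f"{pct:.3f}".lstrip("0")


for team, wins, losses in [("Red Sox", 54, 47), ("Phillies", 56, 43)]:
    print(team, baseball_format(winning_percentage(wins, losses)))
# Red Sox .535
# Phillies .566
```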

But…the Boston bats are a bit better and the Bank is a bandbox. Consequently, I will not predict a Phillies sweep, but I do expect a tough series for the Sox.

How does this connect to information design and data visualisation? Last week as the “second half” began, my local rag, the Philadelphia Inquirer, published an article examining the Phillies’ season to date and their road up ahead. It included a couple of graphics I wanted to share, because I found them a nice addition to the type of article usually devoid of such visual pieces.

The first piece looked at the Phillies’ performance relative to recent teams.

You can see the 2025 club is outperforming the 2022 and 2023 editions of the team. I have a few critiques, but overall I enjoyed the graphic. I think the heavier stroke and the colour change for 2025 work…but are both necessary? Or at least to the extent the designer chose? And which line is which year?

The chart is too visually busy, with too many bits and bobs clamouring for attention. The heaviness of the blue stroke works because the chart needs the loudness. But move the year labels to a consistent location—which, once established, helps the user find similar information—and remove the data label annotations—the precise number of games over .500 should be clear through the axis labelling. If I make a couple of quick edits to the image in Photoshop, you end up with something like this.

Again, an overall good graphic, but just a few tweaks to quiet the piece allow the user to more clearly identify the visual pattern: the Phillies are good, and better than two of their three most recent iterations.

The second piece was even better. It looked at the Phillies’ forthcoming opponents, which at the time of publication first included the Los Angeles Angels before the Sox. (For what it may be worth, the Angels won two of three.)

A different graphic, the same critique: overall good, but visually cluttered. Here I revisit the chart and move some elements around, clearing the clutter to emphasise its visual pattern.

I left the annotated point about the Phillies’ winning percentage, because I do think annotations work. But when a chart is full of annotations, the annotations become the story, not the graphic. And if that is the story, then a table or factettes become a better visual solution to the problem.

I will add that I do not love how far the opponents’ line falls below the chart’s minimum axis line. I probably would have extended the chart to something like .750 and .250, but it is far from the worst sin I see these days. (I keep thinking of writing something about the decline in the quality of data visualisation and information design in recent years, but that feels more akin to a polemical essay than a short blog post.)

Big takeaway: I like seeing my baseball articles with nice data visualisation. It harks back to a couple of years ago when outlets routinely published such pieces. Baseball especially benefits from data visualisation because the game generates massive amounts of data, both within each game and across the collective 162-game season.

Good on the Inquirer for this article. I do not usually read the Sports section, because I am not a Philadelphia sports fan, but maybe I will read a bit more of the Phillies coverage if they include visual content like this.

Credit for the original pieces goes to Chris A. Williams. The edits are mine.

2025 Red Sox Draft Breakdown

Monday and Tuesday, Major League Baseball conducted its amateur player draft, wherein teams select American university and high school players. They have two weeks to sign them and assign them. (Though many will not actually play this year.)

Two years ago the Red Sox installed Craig Breslow as their new chief baseball officer. He has cut front office personnel and reorganised the department, prompting further departures. Crucially for this context, several of the scouts who identified key Red Sox players like Roman Anthony were either let go or left. The team then shifted its focus to analysts and models.

My questions have thus been focused on how this might change the Red Sox’ approach to the draft. A running joke in Sox circles has been how every year the Red Sox draft a high school shortstop from California. But this year, the Red Sox’ first pick was Kyson Witherspoon, a starting pitcher from Oklahoma.

The graphic above shows how the media who cover this niche area of baseball ranked Witherspoon: a consensus top-10 pick. And yet the Sox selected Witherspoon at no. 15 overall. This has been another trend for the Sox over the last several years: other teams select lower-ranked players and leave higher-ranked players available to the Sox and other teams picking mid-round. Similarly, fourth-round pick Anthony Eyanson, ranked roughly 40–65, remained on the board and so the Sox took him at no. 87.

As someone who follows the Sox system, I can say they need quality pitching prospects, as they have very few with proven track records in the minors. Witherspoon and Eyanson provide them that, at least in quality; the track records have yet to develop. Marcus Phillips, seemingly, presents more of a lottery ticket. His rankings spread so widely, from 13 to 98, that clearly there is no consensus on the type of talent the Sox took in him.

Godbout is a middle infielder with a good hit tool, but light on power. Clearly the Sox believe they can work with him to develop the power over the next few years. But all in all, three pitchers in the first four rounds.

Now, the additional context for the non-baseball fans amongst you who are still reading is this. Baseball’s draft does not work in the same way as those of, say, the NFL or the NBA. One, the draft is much deeper at 20 rounds. (In my lifetime it used to be as deep as 50.) Two, teams (usually) do not draft for need. I.e., unlike the NFL, where a team like the Patriots that needs a wide receiver might draft one with its first pick, a team like the Red Sox that needs, say, a catcher will not draft a catcher just to fill that hole. A key reason why: it takes years for an MLB draftee to reach the majors, if he does so at all, whereas an NFL draftee likely plays for the Patriots the following season. In short, there is often a lag between the draft and the debut, unless you are the Los Angeles Angels. Thus you address your current positional needs via free agency or trades, not the draft. (Again, unless you are the Angels.) For the purposes of the draft, you therefore draft the “best player available” (BPA).

Some systems, however, are just better at doing different things. Some teams do a better job of developing pitchers, others of developing hitters. Some of developing certain traits of pitching or hitting. Some teams are just bad at it overall. The Sox have, of late, been very good at developing position players/hitters. They have been pretty not-so-great at developing pitching. Hence, when Breslow said he could improve their pitching pipeline, the Sox jumped at the chance to hire him. (It also helped that everyone else they interviewed said no, and a number of candidates declined to even be interviewed.)

In part, the failure to develop pitching could be a failure to identify the correct player traits or characteristics. It could be the wrong methods and strategies, or improper techniques and technologies. But, if we look at the recent history of Red Sox drafts, it could also be, in part, a consistent lack of drafting pitching. After all, the 26-man MLB roster comprises 13 pitchers and 13 position players. (Technically 13 is the limit on pitchers, but teams generally seem to max out their pitcher allotment.)

You can see in my graphic above that since the late 2000s the Red Sox, with few exceptions, never drafted more than 50% pitchers. This period coincides with the ascendance of the vaunted Sox position player development factory and the decline of the homegrown starter. (Again, the obligatory reminder that correlation is not causation.)

Nevertheless, in the last few years, we have seen the drafting of pitchers spike. In the first two years of the new Breslow regime, pitchers represent more than 70% of the amateur draft picks. (There is also the international signing period, wherein players from around the world can be signed within limits. This is how the Sox acquired very talented players like Rafael Devers and Xander Bogaerts. I omitted this talent acquisition channel from the graphics.)

Consequently, when a team states its strategy is to draft the BPA, but over 70% of all players selected are pitchers, I wonder how one defines “best”. Are the Red Sox weighing pitching more heavily than hitting? Is this an attempt to address a long-standing asymmetry in talent? In the models teams like the Red Sox use, are pitchers worth, say, 1.5× as much as hitters? I doubt we will ever know the answer, though the team maintains it drafts the best player available.
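To make the weighting question concrete, here is a purely hypothetical Python sketch. The draft board, the player names, the values, and the 1.5× multiplier are all invented for illustration and say nothing about the Red Sox’ actual models. It shows how a team can honestly rank the “best player available” and still end up selecting mostly pitchers if its model scales pitcher value up.

```python
# Hypothetical draft board: (name, position, model value). Every name and
# number here is invented for illustration; it is not any team's real data.
BOARD = [
    ("Hitter A", "H", 60),
    ("Hitter B", "H", 55),
    ("Pitcher A", "P", 52),
    ("Hitter C", "H", 50),
    ("Hitter D", "H", 48),
    ("Hitter E", "H", 46),
    ("Pitcher B", "P", 42),
    ("Pitcher C", "P", 41),
    ("Pitcher D", "P", 38),
    ("Pitcher E", "P", 35),
]


def best_available(board, picks, pitcher_weight=1.0):
    """Take the top `picks` players after scaling pitchers' value by `pitcher_weight`."""
    def adjusted(player):
        _, position, value = player
        return value * pitcher_weight if position == "P" else value
    return sorted(board, key=adjusted, reverse=True)[:picks]


def pitcher_share(selected):
    """Fraction of the selected players who are pitchers."""
    return sum(1 for _, position, _ in selected if position == "P") / len(selected)


for weight in (1.0, 1.5):
    print(weight, pitcher_share(best_available(BOARD, picks=5, pitcher_weight=weight)))
# 1.0 0.2
# 1.5 0.8
```

With no weighting, one of the top five picks is a pitcher; with pitchers scaled by 1.5, four of five are. The ranking procedure never changes, only the definition of “best”.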

Ultimately, it may matter very little for the Red Sox in the near-term. The sport’s best prospect, Roman Anthony, is just starting to man the outfield for the Sox. A consensus top-10 prospect, Marcelo Mayer, has also just debuted. A top-25 prospect, Kristian Campbell, debuted on Opening Day. Two second-year players round out the outfield in Ceddanne Rafaela and Wilyer Abreu. A rookie catcher is behind the plate. The Sox may not need serious high-end positional player talent in the next 3–5 years. (Though it certainly helps when trying to trade for other pieces.)

But a two-year lull in drafting high-end positional player talent, on top of the trade of the previous two years’ first-round picks, catcher Kyle Teel and outfielder Braden Montgomery, for ace Garrett Crochet, means the Sox may well have a several-year gap in positional players matriculating to the majors. That might matter.

Baseball, unlike the NFL and the NBA, is a marathon, however. So perhaps this is all a tempest in a teapot. Let us check back in five years’ time and we can see whether this new draft strategy, if it is indeed a strategy, has cost the Red Sox anything.

Credit for the pieces is mine.

It’s Raining Drones

Last Friday the BBC published an article about the US’ resumption of military assistance to Ukraine in its defence against Russia’s invasion. In that article, the author referenced the increased intensity of Russian drone and missile strikes on Ukraine over that week.

To show the intensity, the BBC included this graphic, which incorporates a heat map into a traditional calendar design. A thin white line separates each day and a thicker stroke separates the months.

The legend incorporates its own visualisation component, wherein it shows the differing sizes of the bin buckets. After all, there is a significant difference between a bucket spanning 25 strikes, say between 25 and 50, and one spanning 250 strikes, say between 250 and 500.
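One way to build a legend like this, assuming each bucket’s segment is drawn in proportion to the range of values it covers, is sketched below in Python. Only the 25–50 and 250–500 buckets come from the text; the other bin edges and the 300-pixel legend width are hypothetical.

```python
# Hypothetical bucket edges (strikes per day); only the 25-50 and 250-500
# buckets come from the post, the rest are assumed for illustration.
EDGES = [0, 25, 50, 100, 250, 500]


def legend_segments(edges, total_width=300.0):
    """Return (low, high, width) for each bucket, with width proportional to its range."""
    span = edges[-1] - edges[0]
    return [
        (low, high, total_width * (high - low) / span)
        for low, high in zip(edges, edges[1:])
    ]


for low, high, width in legend_segments(EDGES):
    print(f"{low}-{high}: {width:.0f}px")
# 0-25: 15px
# 25-50: 15px
# 50-100: 30px
# 100-250: 90px
# 250-500: 150px
```

The 250–500 bucket ends up ten times as wide as the 25–50 bucket, which is exactly the difference in scale a uniform legend would hide.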

I really liked this graphic. It very clearly shows the increasing intensity, and annotations point out that the worst days for Ukraine were indeed in that last week. And in a nice attention to detail, note how the arrows have a thin white stroke outlining them, creating visual separation between the arrows and the calendar heatmap below.

Credit for the piece goes to the BBC graphics department.

A Warming Climate Floods All Rivers

Last weekend, the United States’ 4th of July holiday weekend, the remnants of a tropical system inundated a central Texas river valley with months’ worth of rain in just a few short hours. The result? The tragic loss of over 100 lives (and authorities are still searching for missing people).

Debate rages about why the casualties ran so high—the gutting of the National Weather Service by the administration shines brightly—but the natural causes of the disaster are easier to identify. And the BBC did a great job covering those in a lengthy article with a number of helpful graphics.

I will start with this precipitation map, created with National Oceanic and Atmospheric Administration (NOAA) data.

A map of precipitation over central Texas.

I remain less than fully enthusiastic about continuous gradients for map colouration schemes; however, the extreme volume of rainfall during the weather event makes the location of the flooding obvious to all. Nonetheless, the designers annotated the map, pointing out the river, the camp at the centre of the tragedy, and the county wherein most of the deaths occurred.

In short, more than 12 inches of rain fell in less than 24 hours. The article also uses a time-lapse video to show the river’s flash flooding, when it rose a number of feet in less than half an hour.

The article uses the captivating footage of the flash flooding as the lead graphic component. And I get it. The footage is shocking. And you want to get those sweet, sweet engagement clicks and views. But from the standpoint of the overall narrative structure of the piece, I wonder if starting with the result works best.

Rather, the extreme rainfall and the geography of the river valley were the most fundamental contributors, and showcasing that information and data, such as in the above map, would make a better starting point. The endpoint, or culmination of those contributing factors, is the flash flooding and the annotated photo of floodwater heights inside the camp’s cabins.

Overall I enjoyed the piece tremendously and walked away better informed. I had visited an area 80 miles east of the floods several years ago for a wedding. Coincidentally, on the 4th I remarked to a different friend from the area, now living in Philadelphia, about the flatness and barrenness of the landscape between Austin and San Antonio. I had no idea that just to the west, rivers cutting through elevated terrain would together cause over a hundred deaths a few hours later.

Credit for the piece goes to the BBC, but the article listed a healthy number of contributors whom I shall paste here: Writing by Gary O’Donoghue in Kerr County, Texas, Matt Taylor of BBC Weather and Malu Cursino. Edited by Tom Geoghegan. Images: Reuters/Evan Garcia, Brandon Bell, Dustin Safranek/EPA/Shutterstock, Camp Mystic, Jim Vondruska, Ronaldo Schemidt/AFP and Getty.

Living Longer by the Generations

Last weekend was Easter—for both the Catholics and the Orthodox—and I visited the Appalachian ancestral home of the Carpatho-Rusyn side of my family. Before leaving town I drove up to the old cemetery on a hill overlooking the old church and the Juniata River to pay my respects to those who came before me and without whom I would not be here.

At the end of the four-hour drive back to Philadelphia, stuck in traffic on the Schuylkill Expressway because of course, I realised I had never really looked holistically at the causes of death of my direct ancestors. Earlier this week I spent some time putting that together and then, of course, I realised I wanted to see if I could find any patterns in the data. So of course I made a chart.

If we go back a couple of generations, you can see my ancestors lived to a median age in their mid-60s. But by the time of my grandparents that had increased to almost 80. Of course, the sample size is far smaller for grandparents than great-great-great-&c.-grandparents. Nonetheless, the general trend of the median line is upward.

A few exceptions pull those lines in both directions, however. Catherine Sexton died at the age of 35 from heart disease and James Scollon in the same generation died at 36 from typhoid fever. Additionally, that generation includes a few ancestors who remained in present-day Slovakia in what was one of the most impoverished areas of Europe. Not surprisingly they died in their 40s and 50s. If I exclude those people, the average shoots back up to about 70.

I also decided to colour the minimums and maximums by gender, because as you can see there is a broad pattern of longer-lived women and men who died young. I want to dig more into that aspect of the demographics at a later date to see if the trend holds. I suspect it will, because that is the historical trend, but you never know.

Credit for the piece is mine.

Happy Liberation Day

Yesterday I created a map detailing the new tariff rates released by President Trump on Wednesday. I was inspired by the curious inclusion of several small territories with almost no trade with the United States, a few of which are uninhabited. What follows is the graphic and the accompanying text as I wrote it.

I say that only because some people have not entirely caught the…let’s say tone with which I wrote.


All hail the new tariffs. Very obviously, foreign governments will be paying us lots of cash money. Places like Lesotho, with its so-called high rates of poverty, AIDS, and under-development, are clearly just fronts for the rich. Because their tariffs on us are turning them into the richest, most luxurious places on Earth.

Now I don’t know for sure, but some people say the shithole places like Nambia are really cash cows. Nerds tell me places like Nambia don’t exist, but their just idiots looking in the wrong wardrobe. Genius-level intellects like me can easily find Nambia on a map.

There are some very bad ombres out there, and I’m looking at you, Señor Diego Garcia. Some say you’re a thug with bad tattoos whom we should disappear to a secret black site. But the nerds keep telling me you’re not a person, just an island. That you’re not an illegal alien, but a British island where no civilians live, just US soldiers on a secret military base. But we need that money to pay for all the tax cuts for the rich. So we’ll just make our troops there pay Señor Garcia’s tariffs until he stops being lazy and pays us.

Then I’m looking at places like Christmas Island. That Santa Claus is really a bad guy. I know some of you like him—I like him too; he was good to me when I was a child. But all he does is export toys and joys. And that needs to be taxed. So I need Christmas Island to give us all their very real Christmas money.

Finally, I’m looking at Heard Island and McDonald Islands who’re trying to hide near the Antarctic Circle with all the other bad guys and their fortresses of solitude and vaults of swimmable coins. Sure, those nerds keep telling me these islands are uninhabited. But Amber Heard and Ronald McDonald are real people, in league with the Hamburgler, stealing all our rightful American money. The nerds say the islands are only inhabited by penguins. So if you want to say that Amber and Ronald are really just penguins, then we’re going to get all our sweet tariff money from the so-called penguins. Some of whom are emperors. Can you believe that? Emperor penguins? Emperors are rich. So we need to liberate those penguin dollars from the penguin monarchy.

Credit for the piece is mine.

The Red Sox May Finally Have a Second Baseman

Last week was baseball’s opening day. And so on the socials I released my predictions for the season and then a look at the revolving door that has been the Red Sox and second base since 2017.

Back in 2017 we were in the 11th year of Dustin Pedroia being the Sox’ star second baseman. That season, Manny Machado slid spikes up into second and ruined Pedroia’s knee. Pedroia had surgery, missed Opening Day 2018, and then struggled to return. He played 105 games in 2017, then only three in 2018 and six in 2019. And thus began the instability. Here’s a list of the Opening Day second basemen since 2017.

  • 2018 Eduardo Nuñez
  • 2019 Eduardo Nuñez
  • 2020 José Peraza
  • 2021 Kiké Hernández
  • 2022 Trevor Story
  • 2023 Christian Arroyo
  • 2024 Enmanuel Valdez
  • 2025 Kristian Campbell

And, again, by comparison…

  • 2007 Dustin Pedroia
  • 2008 Dustin Pedroia
  • 2009 Dustin Pedroia
  • 2010 Dustin Pedroia
  • 2011 Dustin Pedroia
  • 2012 Dustin Pedroia
  • 2013 Dustin Pedroia
  • 2014 Dustin Pedroia
  • 2015 Dustin Pedroia
  • 2016 Dustin Pedroia
  • 2017 Dustin Pedroia

But it is not only a lack of stability; it is a lack of production. Wins Above Replacement (WAR) is a statistic that attempts to capture a player’s value relative to a replacement-level player, i.e., a readily available substitute. A below-replacement-level player is under 0 WAR. A substitute is 0–2, a regular everyday player is 2–5, an All Star is 5–8, and an elite MVP-level performance is 8+ WAR. And, spoiler, the Sox have not had a 5+ WAR second baseman since Pedroia’s final full season in 2016.
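The tiers above amount to a simple lookup. In this Python sketch the tier floors come from the ranges in the post; because those ranges share endpoints, I assume a value on a boundary (say, exactly 5.0) belongs to the higher tier. The function and labels are mine, not an official WAR classification.

```python
# Tier floors from the ranges above. The ranges share endpoints (0-2, 2-5, ...),
# so this sketch assumes each floor belongs to the higher tier.
TIERS = [
    (8.0, "MVP"),
    (5.0, "All Star"),
    (2.0, "Regular"),
    (0.0, "Substitute"),
]


def war_tier(war: float) -> str:
    """Map a season's WAR to the rough tier described in the post."""
    for floor, label in TIERS:
        if war >= floor:
            return label
    return "Below replacement"


print(war_tier(4.9))  # Regular: Kiké Hernández's 2021 season fell just short of All Star
```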

Suffice it to say, the Sox have long had a need for a long-term second baseman. The graphics I created were meant to be two Instagram images in the same post, and so the axis labels and lines stretch across the artboards.

The graphic shows pretty clearly the turmoil at the keystone. The two outliers are Kiké Hernández in 2021 and Trevor Story in 2022. The latter is easily explained. Story was signed to be the backup plan in case shortstop Xander Bogaerts left after 2022. (Back in 2013 I made a graphic after a similar revolving door of shortstops in the eight years after the Red Sox traded Nomar Garciaparra. Then the question was, would a young rookie named Xander Bogaerts be the replacement for the beloved Nomah. Xander played 10 years for the Sox.)

Kiké, however, is a bit trickier to explain. WAR weights value by position. A second baseman is worth more than a leftfielder, but shortstops and centrefielders are worth more than second basemen. And Kiké played a lot more shortstop and centre than he did second base, which likely explains his 4.9 WAR that season.

And so now in 2025 we had yet another guy starting at second. His name? Kristian Campbell. I saw him a few times last year as he rocketed from A to AAA, the lowest and highest levels of minor league player development below the majors. I thought he looked good and so did the professionals, because he’s a consensus top-10 prospect in the sport.

Going into Monday’s matchup between Boston and Baltimore, Campbell is hitting 6 for 14 with one homer and two doubles, an on-base percentage of .500 and an OPS (on-base plus slugging, which weights extra base hits more heavily than singles) of 1.286. Spoiler: that’s very good.
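Campbell’s numbers make a tidy worked example of OPS. Slugging percentage is total bases divided by at-bats, so a double counts twice and a homer four times. In this Python sketch the on-base percentage of .500 is taken straight from the line above (his walks are not listed), and I assume the remaining three hits were singles, since no triples are mentioned.

```python
def slugging(singles: int, doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """Total bases per at-bat: extra-base hits count for two, three, or four bases."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


hits, doubles, triples, homers, at_bats = 6, 2, 0, 1, 14  # triples assumed to be 0
singles = hits - doubles - triples - homers                # 3
obp = 0.500                                                # taken from the post, not computed
slg = slugging(singles, doubles, triples, homers, at_bats)  # 11 / 14
ops = obp + slg
print(f"SLG {slg:.3f}  OPS {ops:.3f}")  # SLG 0.786  OPS 1.286
```

Eleven total bases in 14 at-bats gives a .786 slugging percentage, and .500 + .786 lands on the 1.286 OPS quoted above.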

Boston beat writers are reporting the Sox and Campbell’s agent are in talks for a long-term extension.

It looks like the Sox may have found their new long-term second baseman.

Credit for the piece is mine.

My Irish Heritage

This week began with Saint Patrick’s Day, a day that here in the States celebrates Ireland and Irish heritage. And I have an abundance of that. As we saw in a post earlier this year about some new genetic ancestry results, Ireland accounts for approximately 2/3 of my ancestry. But as many of my readers know, actual records-based genealogy is one of my big hobbies and so for this Saint Patrick’s Day, I decided to create a few graphics to capture all my current research on my family’s Irish heritage.

In the current political climate wherein we hyperfixate on immigration, I started with my ancestors’ immigration to North America.

My graphic features a timeline marking when certain ancestors arrived, with the massive caveat that I do not know when all my Irish ancestors arrived. I separate the ancestors into paternal and maternal lines. My maternal lines are only half Irish, and unfortunately most of them offer little in terms of early records or origins, so the bulk of the graphic lands on my paternal lines.

I did sort out that two–four lines began in Canada and marked them with orange dots. (The one couple married in Ireland shortly before setting sail for Canada. The other two lines married in Canada.) I also added a grey bar representing the length of the Great Famine. I suspect a number of my ancestors arrived during the famine, because they begin to appear in the records around 1850, but sadly none of those records states specifically when they arrived; they simply appear in the United States.

I also used filled vs. open dots to indicate whether or not I had primary source documents for arrivals. I.e., a passenger manifest, naturalisation papers, &c. that specifically details immigration information weighs more heavily as evidence than, say, a census record wherein a respondent can say he or she arrived in such a year. (Spoiler, census records are not infallible.)

The overall takeaway: most of my Irish immigrants for whom I have information arrived in the middle of the 19th century, within a decade of the Great Famine.

The second graphic features even more difficult data to find. Whence did my ancestors come?

For those unfamiliar with Irish genealogy, finding the town or parish from which your ancestors hailed can be nigh impossible. To start, you need some kind of American-based record that gives you a clue as to where in Ireland to look—a county or city. From my experience, most records simply state places of birth as “Ireland”—not very helpful.

Then if you can get back to Ireland, the typical resource you might use in the United States, United Kingdom, and other countries is the census. And Ireland did record a census every ten years, beginning in 1821. Unfortunately, the 1861 and 1871 censuses were destroyed shortly after the data was recorded. Then during World War I, the 1881 and 1891 censuses were pulped due to a paper shortage. Then in 1921, there was no census because of the whole Irish War of Independence thing. Finally in 1922, during the Battle of Dublin in the whole Irish Civil War thing, the Public Records Office at the Four Courts, which held government records dating back hundreds of years as well as guns and ammunition, was blown up. And with the ammunition went the census records for 1821, 1831, 1841, and 1851. In short, genealogists only have access to census records for 1901 and 1911. (The 1926 census, organised post-Civil War, does not become public until 2027.)

Then you have the whole unavailability of Catholic Church records, which is another long discussion about the conflict between Protestants and Catholics in Ireland. (Just a minor thing in Irish history.)

There are some civil public records available and they begin in the mid-19th century, which in many cases is just a bit too late for genealogical purposes.

Suffice it to say, Irish genealogy can be tricky and in 15 years of researching it myself, I have only been able to find the origins of 10 Irish immigrant ancestors. For context, to the best of my knowledge I have 18 Irish immigrant ancestors. Thus that map is very empty.

The second map, of the United States and United Kingdom, is more complete because the records are more complete. It maps the residences of my Irish and Irish-American ancestors. Initially I attempted to link all the towns and cities with arrows to show the migration patterns, but alas it quickly became a mess at such a small scale. That remains a project for another day.

My Irish heritage is a thing of which I am proud, and I am glad to say my genealogy hobby has allowed me to explore it much more deeply and richly than a green-dyed pint would allow.

Credit for the pieces is mine.

A Refreshed Look at My Ethnic Heritage

Late last week I received an update on my ethnic breakdown from My Heritage, a competitor of Ancestry.com and other genealogy/family history/genetic ancestry companies. For many years, the genealogical community had been waiting for this long-promised update. And it has finally arrived.

For my money, My Heritage’s older analysis, v0.95, did not align with my historical record research—something I have done for almost 15 years now. That DNA analysis painted me with an 85% heritage of Irish, Scottish, and Welsh. Because I have spent a decade and a half researching my ancestors, I know all of my second-great-grandparents, 16 total. 85% means 13–14 of them would be Irish, Scottish, or Welsh. However, four of them are Carpatho-Rusyns from present day eastern Slovakia. And nowhere in my research have I found any connection to the Baltic states or Finland.
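The arithmetic behind the 13–14 figure is worth spelling out. Each person has 16 second-great-grandparents, so on average each contributes 1/16, or 6.25%, of one’s ancestry. Real DNA inheritance is random and rarely splits that evenly, so treat this Python sketch (function names mine) as a rough expected value rather than a prediction:

```python
GENERATIONS_BACK = 4               # parents=1, grandparents=2, ... second-great-grandparents=4
ANCESTORS = 2 ** GENERATIONS_BACK  # 16 second-great-grandparents


def implied_ancestors(share: float) -> float:
    """How many of the 16 an ancestry share corresponds to, at an average 1/16 each."""
    return share * ANCESTORS


def ceiling_share(from_elsewhere: int) -> float:
    """Largest share still possible once some of the 16 are documented as from elsewhere."""
    return (ANCESTORS - from_elsewhere) / ANCESTORS


print(implied_ancestors(0.85))  # 13.6, i.e. 13 to 14 ancestors
print(ceiling_share(4))         # 0.75, with four Carpatho-Rusyn ancestors
```

With four of the 16 documented as Carpatho-Rusyn, the ceiling for a combined Irish, Scottish, and Welsh estimate is 75%, well under the old 85%.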

Compare that to the update.

Here we have a drastically reduced Irish component that, importantly, has been split from Scottish and Welsh, which now exists as its own genetic group. The East European group appears too low, but perhaps My Heritage identified some of my Slavic ancestry as Balkan—there is a sizeable Carpatho-Rusyn community in Vojvodina, an autonomous province in Serbia. Maybe Germanic too? That would start to push it near to 20%.

I do have English ancestry—my Anglophilia must come from somewhere—though it is relatively small and I can trace it to the Medieval period. That includes more of the Norman elite than the Anglo-Saxon plebs and so seeing Breton register could be indicative of that Norman/Anglo-Saxon population mixture.

But how do My Heritage’s results compare to those provided by Ancestry and FamilyTreeDNA, two competitors whose services I have also used? And how do they compare to my actual historical document research?

My Heritage’s newest analysis certainly hits a lot closer to the mark and is nearer to Ancestry, which aligns best with my research. I do have two open questions about my second-great-grandparents. One surrounds Nathaniel Miller, one of whose grandparents (Eliza Garrotson) may not be English but rather Dutch, from the Dutch colonisation of the Hudson River Valley in New York south of Albany.

The other question revolves around William Doyle. His mother is identified in the records variously as English and Irish. A family story on that side of the family also suggests one ancestor of English descent. And finally, a recently discovered marriage record for his parents details how his mother (Martha Atkins) was baptised and converted to Catholicism as an adult prior to her marriage. Not all Irish are Catholic, but the vast majority are and that would also suggest Martha was not Irish.

Taking those two questions into account, I have a small range of expected values for my English ancestry and a slightly larger one for my Irish and you can see those in the graphic.

When you compare that to the My Heritage results alongside the Ancestry and FamilyTreeDNA results you can see Ancestry aligns best with my research whereas FamilyTreeDNA aligns the least. My Heritage now falls squarely between the two. And so I consider their update a success. I think the company still has some work to do, but progress is progress.

Credit for the pieces is mine.