My Irish Heritage

This week began with Saint Patrick’s Day, a day that here in the States celebrates Ireland and Irish heritage. And I have an abundance of that. As we saw in a post earlier this year about some new genetic ancestry results, Ireland accounts for approximately 2/3 of my ancestry. But as many of my readers know, actual records-based genealogy is one of my big hobbies, and so for this Saint Patrick’s Day I decided to create a few graphics capturing all my current research on my family’s Irish heritage.

In the current political climate wherein we hyperfixate on immigration, I started with my ancestors’ immigration to North America.

My graphic features a timeline marking when certain ancestors arrived, with the massive caveat that I do not know when all my Irish ancestors arrived. I separate the ancestors into paternal and maternal lines. My maternal lines are only half Irish, and unfortunately most of them offer little in terms of early records or origins, so the bulk of the graphic lands on my paternal lines.

I did sort out that two–four lines began in Canada and marked them with orange dots. (The one couple married in Ireland shortly before setting sail for Canada. The other two lines married in Canada.) I also added a grey bar representing the length of the Great Famine. I suspect a number of my ancestors arrived during the famine, based on the fact that they begin to appear in the records around 1850, but sadly none of those records states specifically when they arrived; instead, they simply appear in the United States.

I also used filled vs. open dots to indicate whether or not I had primary source documents for arrivals. For example, a passenger manifest, naturalisation papers, &c. that specifically detail immigration information weigh more heavily as evidence than, say, a census record wherein a respondent can state he or she arrived in such a year. (Spoiler: census records are not infallible.)

The overall takeaway: most of my Irish immigrants for whom I have information arrived in the middle of the 19th century, within a decade of the Great Famine.

The second graphic features even more difficult data to find. Whence did my ancestors come?

For those unfamiliar with Irish genealogy, finding the town or parish from which your ancestors hailed can be nigh impossible. To start, you need some kind of American-based record that gives you a clue as to where in Ireland to look, such as a county or city. In my experience, most records simply state places of birth as “Ireland”, which is not very helpful.

Then, if you can get back to Ireland, the typical resource you might use in the United States, the United Kingdom, and other countries is the census. And Ireland did take a census every ten years, beginning in 1821. Unfortunately, the 1861 and 1871 returns were destroyed shortly after the data was recorded. Then during World War I, the 1881 and 1891 censuses were pulped due to a paper shortage. Then in 1921, there was no census because of the whole Irish War of Independence thing. Finally in 1922, during the Battle of Dublin in the whole Irish Civil War thing, the Public Records Office at the Four Courts, which held government records dating back hundreds of years as well as guns and ammunition, was blown up. And with the ammunition went the census records for 1821, 1831, 1841, and 1851. In short, genealogists only have access to complete census records for 1901 and 1911. (The 1926 census, organised post-Civil War, does not become public until 2026.)

Then you have the whole unavailability of Catholic Church records, which is another long discussion about the conflict between Protestants and Catholics in Ireland. (Just a minor thing in Irish history.)

There are some civil public records available and they begin in the mid-19th century, which in many cases is just a bit too late for genealogical purposes.

Suffice it to say, Irish genealogy can be tricky and in 15 years of researching it myself, I have only been able to find the origins of 10 Irish immigrant ancestors. For context, to the best of my knowledge I have 18 Irish immigrant ancestors. Thus that map is very empty.

The second map, of the United States and United Kingdom, is more complete because the records are more complete. It maps the residences of my Irish and Irish-American ancestors. Initially I attempted to link all the towns and cities with arrows to show the migration patterns, but alas, it quickly became a mess at such a small scale. That remains a project for another day.

My Irish heritage is a thing of which I am proud, and I am glad to say my genealogy hobby has allowed me to explore it much more deeply and richly than a green-dyed pint would allow.

Credit for the pieces is mine.

113 Years Later and We’re Still Talking About Watertight Compartments

Earlier this week, a Portuguese-flagged container ship collided with an American-flagged tanker just off the Humber estuary in Yorkshire, England. The American-flagged ship, the Stena Immaculate, carries aviation fuel for the US Air Force. The Solong, the Portuguese-flagged container ship, carries alcohol, which is far better than the toxic chemicals initially feared.

We still know very little about the circumstances of the collision other than that the Solong, travelling at 16 knots, slammed into the port side of the Stena Immaculate, which was anchored offshore.

I decided to write a little post because I enjoyed this graphic from the BBC, which details why the Stena Immaculate has not yet sunk (and at the time of my writing is not believed to be in danger of sinking) despite the large hole amidships.

The graphic uses a simple line illustration of a bulk carrier in both three-quarter and frontal views. The first shows how vessels like the Stena Immaculate separate their cargo into distinct holds, often watertight, so that, should a collision occur, the damage will neither flood the entire ship nor affect the load of the cargo. On the latter point, sloshing liquids, as one example, can alter the centre of gravity and negatively impact ship stability.

The second line drawing illustrates the value of a double-hulled vessel wherein the outer hull shields the inner hull from puncture and prevents massive flooding of interior spaces.

Of course, on 11 March, we are a little over a month away from the anniversary of the sinking of RMS Titanic. (In)famously, in that case the critical issue was the same idea of watertight compartments. She had enough of them, but crucially they did not rise to the top of the ship, as they would necessarily have impacted the luxury of first and second class accommodations. Titanic also did not have a full double hull. Her bottom was doubled, but this did not run up the ship’s sides to the level where the iceberg struck.

Overall, I really like this graphic. It needs no elaborate and detailed illustration. Nor does it need sophisticated animations. All it uses is simple line illustrations.

Credit for the piece goes to the BBC graphics department.

A Refreshed Look at My Ethnic Heritage

Late last week I received an update on my ethnic breakdown from My Heritage, a competitor of Ancestry.com and other genealogy/family history/genetic ancestry companies. For many years, the genealogical community had been waiting for this long-promised update. And it has finally arrived.

For my money, My Heritage’s older analysis, v0.95, did not align with my historical record research—something I have done for almost 15 years now. That DNA analysis painted me with an 85% heritage of Irish, Scottish, and Welsh. Because I have spent a decade and a half researching my ancestors, I know all of my second-great-grandparents, 16 total. 85% means 13–14 of them would be Irish, Scottish, or Welsh. However, four of them are Carpatho-Rusyns from present day eastern Slovakia. And nowhere in my research have I found any connection to the Baltic states or Finland.
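For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes each of the 16 second-great-grandparents contributes an equal share, which is only true on average, and the function names are my own rather than anything from My Heritage.

```python
SECOND_GREAT_GRANDPARENTS = 16  # 2**4 ancestors, four generations back


def implied_ancestors(share: float, total: int = SECOND_GREAT_GRANDPARENTS) -> float:
    """Ancestors implied by an ancestry share, if each contributes equally."""
    return share * total


def max_possible_share(known_elsewhere: int, total: int = SECOND_GREAT_GRANDPARENTS) -> float:
    """Highest share still possible once some ancestors are known to come from elsewhere."""
    return (total - known_elsewhere) / total


implied = implied_ancestors(0.85)  # v0.95 Irish, Scottish, and Welsh estimate
ceiling = max_possible_share(4)    # four Carpatho-Rusyn second-great-grandparents
print(f"85% of 16 is about {implied:.1f} ancestors")
print(f"Ceiling with four Carpatho-Rusyns: {ceiling:.0%}")
```

Even before counting any English ancestry, the old 85% figure overshoots the 75% ceiling the paper trail allows.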

Compare that to the update.

Here we have a drastically reduced Irish component that, importantly, has been split from Scottish and Welsh, which now exists as its own genetic group. The East European group appears too low, but perhaps My Heritage identified some of my Slavic ancestry as Balkan; there is a sizeable Carpatho-Rusyn community in Vojvodina, an autonomous province in Serbia. Maybe Germanic too? That would start to push it nearer to 20%.

I do have English ancestry (my Anglophilia must come from somewhere), though it is relatively small and I can trace it to the medieval period. That includes more of the Norman elite than the Anglo-Saxon plebs, and so seeing Breton register could be indicative of that Norman/Anglo-Saxon population mixture.

But how do My Heritage’s results compare to those provided by Ancestry and FamilyTreeDNA, two competitors whose services I have also used? And how do they compare to my actual historical document research?

My Heritage’s newest analysis certainly hits a lot better and is nearer to Ancestry, which aligns best with my research. I do have two open questions about my second-great-grandparents. One surrounds Nathaniel Miller, one of whose grandparents (Eliza Garrotson) may have been not English but rather Dutch, descended from the Dutch colonisation of the Hudson River Valley in New York south of Albany.

The other question revolves around William Doyle. His mother is identified in the records variously as English and Irish. A family story on that side of the family also suggests one ancestor of English descent. And finally, a recently discovered marriage record for his parents details how his mother (Martha Atkins) was baptised and converted to Catholicism as an adult prior to her marriage. Not all Irish are Catholic, but the vast majority are, and that would also suggest Martha was not Irish.

Taking those two questions into account, I have a small range of expected values for my English ancestry and a slightly larger one for my Irish, and you can see both in the graphic.
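The expected share of any one ancestor halves with each generation, which is why those two questions move the needle only a little. A quick sketch, assuming the textbook average of one half per generation (actual DNA inheritance scatters around it because of recombination), with a helper name of my own:

```python
def expected_share(generations_back: int) -> float:
    """Average autosomal contribution of one ancestor.

    Parents are 1 generation back (1/2); second-great-grandparents are 4 back (1/16).
    Real inheritance scatters around this average because of recombination.
    """
    return 0.5 ** generations_back


# Nathaniel Miller and William Doyle are second-great-grandparents (4 generations back).
eliza_garrotson = expected_share(4 + 2)  # Nathaniel's grandparent
martha_atkins = expected_share(4 + 1)    # William's mother

print(f"Eliza Garrotson: {eliza_garrotson:.4%}")  # 1.5625%
print(f"Martha Atkins:   {martha_atkins:.4%}")    # 3.1250%
```

In expectation, then, Eliza’s line could shift roughly 1.6 percentage points from English to Dutch, and Martha’s roughly 3.1 points between English and Irish.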

When you compare that to the My Heritage results alongside the Ancestry and FamilyTreeDNA results, you can see Ancestry aligns best with my research whereas FamilyTreeDNA aligns least. My Heritage now falls squarely between the two. And so I consider their update a success. I think the company still has some work to do, but progress is progress.

Credit for the pieces is mine.

Imports, Tariffs, and Taxes, Oh My!

Apologies, all, for the lengthy delay in posting. I decided to take some time away from work-related things for a few months around the holidays and try to enjoy, well, the holidays. Moving forward, I intend to at least start posting about once per week. After all, the state of information design these days provides me a lot of potential critiques.

Let us start with the news du jour: the application of tariffs on China and the delayed imposition on both Canada and Mexico. Firstly, let us be very clear what a tariff is. A tariff is a tax paid by importers or consumers on goods sourced from outside the country. In this case, we are talking about Canadian, Mexican, and Chinese imports and the United States slapping tariffs on goods from those countries. Foreign governments do not pay it: neither Canada, nor Mexico, nor China will pay money to the United States.
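To make the mechanics concrete, here is a small sketch with made-up numbers. The tariff is assessed on the value of the imported goods and owed by the importer, and how much of it then lands on your receipt depends on how much the importer passes on. The `pass_through` parameter is my own illustrative assumption, not an official figure.

```python
def tariff_impact(import_value: float, tariff_rate: float, pass_through: float = 1.0) -> dict[str, float]:
    """Split a tariff between the importer and the shopper.

    The tariff is owed by the importer on the value of the goods; the
    exporting country's government pays nothing. `pass_through` (0 to 1) is an
    illustrative assumption for how much of the tax is passed on in prices.
    """
    if not 0.0 <= pass_through <= 1.0:
        raise ValueError("pass_through must be between 0 and 1")
    tariff = import_value * tariff_rate
    passed_on = tariff * pass_through
    return {
        "tariff_paid_by_importer": tariff,
        "added_to_consumer_prices": passed_on,
        "absorbed_by_importer": tariff - passed_on,
    }


# Hypothetical $10,000 shipment at a hypothetical 25% rate, half passed on to shoppers.
print(tariff_impact(10_000, 0.25, pass_through=0.5))
```

Whatever the split, every line of that result is paid by someone in the United States.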

You will.

You should expect your shopping costs to increase, whether that is on the price of gasoline (imported from Canada), fast fashion apparel (from China), or avocados (from Mexico). On the more durable goods side, homes are built with Canadian lumber and your automobiles with parts sourced from across North America—the reason why we negotiated NAFTA back in the 1990s.

Now that we have established what tariffs are, why is the Trump administration imposing them? Ostensibly because of border security and fentanyl. What those two issues have to do with trade policy and economics…I have no idea. But a few news outlets created graphics showing US imports from our top-five trading partners.

First I saw this graphic from the New York Times. It is a variation of a streamgraph and it needs some work.

A streamgraph type chart from the New York Times

To start, at any point along the timeline, can you roughly get a sense of what the value for any country is? No. Because there is no y-axis to provide a sense of scale. Perhaps these are the top import sources and these are their share of the total imports? Read the fine print and…no. These are the countries with a minimum of 2% share in 2024, which is approximately 75% of US imports.

This graphic fails at clearly communicating the share of imports. Given only the three direct labels for Canada, Mexico, and China in 2024, you need to somehow extrapolate from their heights what the values are at any other point in time or for any other country.

Nevertheless, the chart does a few things nicely. First, it highlights the three countries of importance to the story, using colours instead of greys. That focuses your attention on the story whilst leaving other countries of importance available for your review. Secondly, unlike a straight stacked area chart, this form ranks the countries by share.

Overall, for me the chart fails on a number of fronts. You could argue it looks pretty, though.

The aforementioned stacked area chart, also not a favourite of mine for this sort of comparison, forces the designer to choose a starting country and then stack the other countries atop it.
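Why does the starting country matter? Each layer in a stacked area chart sits on the running total of the layers beneath it, so only the bottom layer can be read directly against the y-axis. A toy sketch with made-up shares (not the real import data) and a helper name of my own:

```python
from itertools import accumulate


def stack_layers(values: dict[str, float], order: list[str]) -> dict[str, tuple[float, float]]:
    """Return each series' (bottom, top) edge when stacked bottom-up in `order`."""
    tops = list(accumulate(values[name] for name in order))
    bottoms = [0.0] + tops[:-1]
    return {name: (bottom, top) for name, bottom, top in zip(order, bottoms, tops)}


shares = {"Canada": 13.0, "Mexico": 15.0, "China": 14.0}  # made-up shares for one year

print(stack_layers(shares, ["Canada", "Mexico", "China"]))  # Canada starts at the axis
print(stack_layers(shares, ["China", "Mexico", "Canada"]))  # Canada now floats at 29-42
# A ranked ordering (smallest at the bottom) puts the largest share on top.
ranked = sorted(shares, key=shares.get)
print(stack_layers(shares, ranked))
```

Canada’s value never changes, yet its layer moves up and down the chart depending purely on the designer’s ordering.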

A stacked area chart from the BBC

What this chart does really well, especially compared to the previous New York Times example, is provide context for all countries across all time periods through the inclusion of the y-axis. Like the Times graphic, it focuses attention on Canada, Mexico, and China with colour and uses grey to de-emphasise the other countries. You can see here how the Times’ decision to exclude all countries below 2% can skew the visual impact of the chart, though here all countries below Japan (everything but the top five) are grouped as “Other”.

Personally, I find the specific data labels for Canada, Mexico, and China distracting and redundant. The y-axis provides the necessary framework to visually estimate the share. If the reader needs values to the precision of tenths, a table may be a better option.

I could not find one of the charts I thought I had bookmarked and so in an image search I found a chart from one of my former employers on the same topic (though it uses value instead of share) and it is worth a quick critique.

A stacked area chart from Euromonitor International

Towards the end of my time there, I was creating templates for more wide-screen content. My fear from an information design and data visualisation standpoint, however, was the increased stretch in simple, low data-intensity graphics. This chart incorporates just 42 data points and yet it stretches across 1200 pixels on my screen with a height of 500.

Compare that to the previous BBC graphic, which is also 1200 pixels, but has a greater height of 825 pixels. Those two dimensions give ratios of 2.4 for Euromonitor International and 1.455 for the BBC. Neither is the naturally aesthetically pleasing golden ratio of 1.618, but at least the BBC version is close to Tufte’s recommended 1.5–1.6. The idea behind this is that the greater the ratio, the softer the slope of the line. This can make it more difficult to compare lines. A steeper slope can emphasise changes over time, especially in a line chart. You can roughly compare this by looking at the last few years of the longer time span in the BBC graphic to the entirety of this graphic. You can more easily see the change in the y-axis because you have more pixels in which to show the change.
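A quick sketch of that arithmetic, using the pixel dimensions above. The helper names are mine, and the line that rises half the plot height is a hypothetical series chosen only to compare the two frames.

```python
import math


def aspect_ratio(width_px: float, height_px: float) -> float:
    """Width divided by height."""
    return width_px / height_px


def line_angle(width_px: float, height_px: float, rise_fraction: float) -> float:
    """On-screen angle, in degrees, of a line that climbs `rise_fraction` of the
    plot height while crossing the full plot width."""
    return math.degrees(math.atan2(rise_fraction * height_px, width_px))


frames = {"Euromonitor International": (1200, 500), "BBC": (1200, 825)}
for name, (w, h) in frames.items():
    print(f"{name}: ratio {aspect_ratio(w, h):.3f}, "
          f"a half-height rise draws at {line_angle(w, h, 0.5):.1f} degrees")
```

The same change draws at roughly 12 degrees in the wider frame and roughly 19 in the BBC’s. (William Cleveland’s “banking to 45 degrees” is a related heuristic for picking aspect ratios.)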

Finally we get to another New York Times graphic. This one, however, is a more traditional line chart.

A line chart from the New York Times

And for my money, this is the best. The data is presented most clearly and the chart is the most legible and digestible. The colours clearly focus your attention on Canada, Mexico, and China. The use of lines instead of stacked areas allows the top importer to “rise” to the top. You can track the rapid rise of Chinese imports from the late 1990s through to the first Trump administration and the imposition of tariffs in 2018 (note the significant drop in the line). In fact, you can see the impact in Mexico’s becoming the United States’ top trading partner in recent years.

Over the years, if I had a dollar for every time I was told someone wanted a graphic made “sexier” or with more “sizzle” or made “flashier”, I would have…a bigger bank account. The issue is that “cooler” graphics do not always lead to clearer graphics. Graphics that communicate the data better. And the guiding principle of information design and data visualisation should be to make your graphics clear rather than cool.

Credit for the New York Times streamgraph goes to Karl Russell.

Credit for the BBC graphic goes to the BBC graphics department.

Credit for the Euromonitor International graphic goes to Justinas Liuima.

Credit for the New York Times line chart goes to the New York Times.

Predicting…the Known Stats?

I have been trying to post more regularly here on Coffeespoons, but now that baseball’s postseason is in full swing—pun fully intended—my free time is spent watching balls and strikes at all hours of the day. (Though, with the Wild Card round over and the move from four to two games per day, my time will likely expand as the week winds down. Sort of. More on that in a moment.)

What I have noticed on a few broadcasts, however, is the broadcast team touting Google’s ability to forecast a player’s chances of getting on base. Most recently, on Sunday afternoon my mates and I were watching the Phillies–Mets contest and the broadcaster announced, or the graphic popped on screen claiming, that Google predicted Francisco Lindor had a 34% chance to get on base in that plate appearance.

That can be a useful nugget of knowledge. And wow, that is crazy that Google can predict Lindor’s chances of getting on base.

Except it is not.

Francisco Lindor’s on-base percentage (OBP) for the 2024 season was 0.344. In other words, in roughly 34.4% of his plate appearances (PAs), Lindor got a hit, drew a walk, or was hit by a pitch. Across an entire sample of 689 PAs, Lindor got on base 34% of the time. Maybe Google was taking other factors into account, but that was just the most recent example I can recall.
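For the curious, OBP is (hits + walks + hit-by-pitches) divided by (at-bats + walks + hit-by-pitches + sacrifice flies). Because that denominator leaves out things like sacrifice bunts, it is close to, but not exactly, a share of plate appearances. A minimal sketch: the batting line in the example call is hypothetical, and the “prediction” function is my tongue-in-cheek guess at what the broadcast graphic was doing, not Google’s actual model.

```python
def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return (h + bb + hbp) / denominator


def broadcast_style_prediction(obp: float) -> str:
    """Hypothetical: restate a season OBP as a 'chance to get on base'."""
    return f"{obp:.0%} chance to get on base"


# A made-up batting line, just to show the formula.
print(round(on_base_percentage(h=150, bb=60, hbp=5, ab=600, sf=5), 3))
print(broadcast_style_prediction(0.344))  # Lindor's 2024 OBP
print(broadcast_style_prediction(0.366))  # Schwarber's OBP
```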

I wish I could recall which batter first keyed me into this situation. I want to say it was a high-OBP guy, and for whatever reason I pulled out my mobile and opened the batter’s page on Baseball Reference, only to find the prediction matched his OBP exactly.

Then it happened again. And again. And again.

Baseball is the greatest sport. One reason I love it is because you can use data and information to describe it. Plan for it. Play it. And sometimes predict it. Sometimes that works. Sometimes, when it doesn’t, it breaks your heart.

Baseball has reams of data and, yes, that data can feed into newer and cooler algorithms and models for predicting outcomes. (Outcomes that surely have nothing to do with the flood of sports gambling available on mobile phones.) But to me, it seems a bit disingenuous to call a statistic that has largely moved out of the realm of baseball nerds into the common understanding of the sport—thanks, Moneyball—a company’s new predictive statistic when that statistic has existed forever.

Separately, as I alluded to earlier, I shall not be posting for the next few weeks. I have a weekday wedding to attend later in the week, and then I am headed out of town for a few weeks and intend to be doing very little digital stuff. Plus, by the time I return, baseball’s postseason shall likely be over.

But in the meantime, I am going to be heading out this afternoon to meet some mates as they cheer on their local squad, the Philadelphia Phillies, as they play the Mets. (No, the Red Sox did not, yet again, make the postseason.)

As the first batter, Kyle Schwarber, steps to the plate, I predict he will have a 37% chance of getting on base. And look, his OBP is 0.366.

Racing to the Final Finish Line

Thoroughbred racing is big business. And Philadelphia’s Parx Casino owns a racetrack that, according to a recent article in the Philadelphia Inquirer, has seen a number of horse deaths. The article includes a single graphic worth noting, a bar chart showing the thoroughbred death rate. The graphic contrasts rising deaths at Parx with a national trend of declining deaths.

Traditionally, rate statistics are shown using dots or lines. The idea is that a bar represents counting stats, i.e. how many total horses died. I understand the coloured bars present a more visually compelling graphic on the page, and so I can buy that reason if you are selling it.
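The count-versus-rate distinction in a sketch: racing commonly reports fatalities per 1,000 starts, though whether the Inquirer’s chart uses exactly that denominator is my assumption. The two tracks and their numbers are invented.

```python
def deaths_per_1000_starts(deaths: int, starts: int) -> float:
    """Fatality rate normalised by racing volume."""
    return 1000 * deaths / starts


# Two invented tracks: the busier one has more deaths but the lower rate.
tracks = {"Busy track": (30, 15_000), "Quiet track": (12, 4_000)}
for name, (deaths, starts) in tracks.items():
    print(f"{name}: {deaths} deaths, {deaths_per_1000_starts(deaths, starts):.2f} per 1,000 starts")
```

Counts and rates can rank the same two tracks in opposite orders, which is why the distinction matters when choosing the mark.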

Labelling each data point with a grey text label above the bar, however, remains unnecessary. The labels create sparkling, distracting grey baubles above the important blue bars. If you need specificity to the hundredths, use a table. This graphic is also interactive, and the mouseover state is where a specific number can be provided, adding a layer of depth in a progressive disclosure of information.

Credit for the piece goes to Dylan Purcell.

Titan’s Final Words

Last week wrapped up the Coast Guard’s two-week inquiry into the sinking of the submersible Titan, which imploded on a dive to the wreck of Titanic. The BBC summarised the findings in an article at the weekend. It included a number of fascinating annotated photographs identifying parts of the wreckage. But it also included the following graphic, which captures the text messages sent by the Titan and the depths at which the messages were sent.

This is significantly better than a number of pieces I have seen lately, though, to be fair, most of those focus on the dive depths of various objects and creatures. Mostly that is because those graphics (this one included) do not scale the objects to the depths. I understand why: many would be too small to see. But I think that difference in scale really drives home just how deep Titanic rests on the seabed.

Because this graphic focuses not on the dive depths of objects but rather on which texts Titan sent at what depths, the scale issue is less relevant. The weird bit, though, is how Titanic sits below 3800 m. She rests at 3840 m, yet that little dip in the sea floor looks closer to 400 m.
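To see why scaling objects to the depth is so punishing, consider a linear depth axis. A sketch, assuming a hypothetical 800-pixel-tall chart running from the surface to the 3840 m figure above, with a helper name of my own:

```python
def depth_to_px(depth_m: float, max_depth_m: float = 3840.0, chart_height_px: float = 800.0) -> float:
    """Linear depth scale: the surface sits at 0 px, max_depth_m at the chart's bottom."""
    return depth_m / max_depth_m * chart_height_px


# A hypothetical 7 m object drawn to the same scale as the 3840 m water column.
object_px = depth_to_px(7)
print(f"{object_px:.2f} px tall")  # under two pixels
```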

Overall, though, a solid piece.

Credit for the piece goes to the BBC’s graphics department.

Tired of These Motherf*cking Sox on This Motherf*cking Plane

At least, that’s what I imagine South Siders saying in Chicago as they watch the White Sox team charter plane land at Midway. For those not following America’s Major League Baseball season, the Chicago White Sox are one of two clubs claiming Chicago as their home. (The other being the Cubs.) And the White Sox—not to be confused with your author’s favourite club, the Red Sox—are on track to be one of the worst clubs in the modern (post-1900) history of the sport. They have already tied the New York Mets’ record of 120 losses, and there are still six games left to play.

Earlier this month the Athletic detailed what has gone wrong for the Pale Hose. One of the things that stood out to me the most in the reporting was the complaints about the club’s charter aircraft, an Airbus A320, which, as the article points out, is a 1980s aircraft. The article in particular mentioned how other cheapskate teams, including the Boston Red Sox, opt for nicer aircraft with more first-class accommodations for players and staff. Then it cited a graphic shared on Twitter last year by Jay Cuda, and when I saw that, I knew I had to cover it.

One thing I find fascinating is how the White Sox use United Airlines for their charter. United operates the charter for the Cubs and other teams as well. That it does so for the two Chicago teams makes all the sense in the world, as the company is headquartered in the Loop in downtown Chicago. It is also one of the largest airlines and thus makes sense in that dimension too.

But as the frequent air travellers among you will know, Chicago has two airports: O’Hare and Midway. O’Hare is northwest of downtown, closer to the Cubs, and is the city’s primary airport. But the White Sox typically fly out of Midway, which is just a couple miles from (New) Comiskey. (I presume the team bus hops on the Dan Ryan/I-90 to the Stevenson/I-55, then exits on Cicero.)

That is weird because United does not serve Midway. And so United, which operates out of O’Hare, must fly aircraft to Midway to then transport the White Sox. I suppose the White Sox would not want to charter a Southwest aircraft, though…. In my own lifetime I think I have flown in and out of Midway only twice. And I lived in Chicago for eight years. (And the White Sox were terrible for probably six of them.)

Some non-White Sox things notable from the graphic. One, iAero no longer exists, so I would be curious whom the Texas Rangers and Oakland Athletics used this year. The Rangers probably used a reputable airline. The Athletics probably made their players and staff charter their own transport.

I also did not realise that even last year the McDonnell Douglas MD-80 still carried passengers in the United States. I assume that by 2024 the Detroit Tigers had fully transitioned to that Boeing 737. I find it fascinating that only the Tigers own their own aircraft. I would be curious to know why more teams do not, though of course it has to be money.

With whom else would the Blue Jays fly but Air Canada?

Finally, I am surprised that my Boston Red Sox use Delta, because that’s a normal, non-budget airline. And anyone who follows the Red Sox knows the Red Sox are no longer in the habit of spending money. I thought they would use jetBlue, which is the sponsor for Fenway South, formally jetBlue Park, in Fort Myers, Florida, where the Red Sox have their spring training and development league complex.

Anyways, happy Friday, all. At least you don’t play or work for the Chicago White Sox. (Though I suppose it is possible you do, because I do have a large number of readers from Chicago. But I doubt it.)

Credit for the piece goes to Jay Cuda.

I Need My Sharpie. Where’s My Sharpie?

Because who does not recall the great Sharpie-amended National Hurricane Center (NHC) forecast track?

Earlier this summer, in the middle of the hurricane season, the National Oceanic and Atmospheric Administration’s (NOAA’s) NHC released a new, experimental warning cone map. For those unfamiliar, these are the maps that have a white and white-shaded forecast for where the centre of the storm will track. Importantly, it is not a forecast of where the storm will impact. If you have ever been through a hurricane—would not recommend—you know you need not be near the centre to feel the storm’s impact.

I have been waiting for a significant storm to threaten the United States before taking a look at these. (It is also important to note, these new maps apply only to the United States.) But this is the current map for Hurricane Helene as of Wednesday morning.

For those of you who, like me, are familiar with these, you will see the red lines along the coast that indicate hurricane warnings. Blue lines indicate current tropical storm warnings. Not on this map are pink lines for hurricane watches and yellow lines for tropical storm watches. But all these lines only represent watches and warnings along the coast. Little dots indicate the storm’s forecast position at certain times and, through letter codes, its strength. The full white areas are the forecast track for the centre of the storm through the first three days. The shaded area is for days 4–5.

Contrast that with the new, experimental version.

The background of the map remains the same. In my perfect world, I would probably drop the grey and blue back a little bit, but that is not the end of the world. Instead, the biggest change is that the tropical storm and hurricane watches and warnings, which have always been declared for full counties inland, are now shown on the map.

You can see the red hurricane warnings are now forecast to move through the eastern Florida panhandle and southern Georgia with tropical storm watches forecast for the inland counties north and east of those. And then the three- and five-day forecasts have blended into a single white cone track. Subtly, the stroke or outline for that has changed from black to solid white. That helps reduce the distracting visuals on the map and emphasise the forecast track and watches and warnings.

Overall, I think this is a really strong, important, and potentially life-saving improvement to the graphics. Could things be improved more? Absolutely. But sometimes the only way to make improvements is through slow and steady incremental changes. This update does that in spades.

Credit for the piece goes to the NHC graphics team.

For Whom the Teamsters Poll Tolls

The Teamsters Union decided to officially endorse neither candidate in the 2024 US presidential election. Prior to their non-announcement announcement, however, the union surveyed its members and then released the polling data ahead of the announcement.

Of course, the Teamsters represent but a single union in a large and diverse country. More importantly, the survey results reported only the share of responses for each candidate (and “Other”), so we have no idea how many members responded, let alone how many opted for whom. But hey, it’s another talking point in the final six weeks of the campaign.
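Why do the missing counts matter? The uncertainty around a reported share depends on how many people responded. A sketch using the standard normal-approximation margin of error for a proportion, assuming a simple random sample (which a union member survey may well not be) and hypothetical response counts:

```python
import math


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    if not 0.0 <= share <= 1.0 or n <= 0:
        raise ValueError("share must be in [0, 1] and n positive")
    return z * math.sqrt(share * (1.0 - share) / n)


# The same reported 50% share, under three hypothetical response counts.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: ±{margin_of_error(0.5, n):.1%}")
```

The share alone cannot tell you which of those rows you are in, and that is exactly what the Teamsters release left out.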

Naturally, I decided to visualise the data.

The trend is pretty, pretty clear. The union’s rank-and-file clearly support Trump for president, with the exception of the teamsters in the District of Columbia. (Note, no survey was taken in Wyoming.) In fact, in only eight states plus DC did Harris’ support top 40%.

Credit for the piece is mine.