Late last week I received an update on my ethnic breakdown from My Heritage, a competitor of Ancestry.com and other genealogy/family history/genetic ancestry companies. For many years, the genealogical community had been waiting for this long-promised update. And it has finally arrived.
For my money, My Heritage’s older analysis, v0.95, did not align with my historical record research—something I have done for almost 15 years now. That DNA analysis painted me with an 85% heritage of Irish, Scottish, and Welsh. Because I have spent a decade and a half researching my ancestors, I know all of my second-great-grandparents, 16 total. 85% means 13–14 of them would be Irish, Scottish, or Welsh. However, four of them are Carpatho-Rusyns from present-day eastern Slovakia. And nowhere in my research have I found any connection to the Baltic states or Finland.
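For those who want to check the arithmetic, here is a minimal sketch. It assumes each of the 16 second-great-grandparents contributes an equal one-sixteenth share, which is only an approximation because actual DNA inheritance varies from ancestor to ancestor. The variable names are mine, purely for illustration.

```python
# Rough check of the 85% figure against 16 second-great-grandparents.
# Assumption: each ancestor contributes an equal 1/16 share (an approximation).
SECOND_GREAT_GRANDPARENTS = 16

reported_share = 0.85  # Irish, Scottish, and Welsh share in the older analysis
implied_ancestors = reported_share * SECOND_GREAT_GRANDPARENTS

known_rusyn_ancestors = 4  # documented Carpatho-Rusyn second-great-grandparents
max_possible_share = (
    SECOND_GREAT_GRANDPARENTS - known_rusyn_ancestors
) / SECOND_GREAT_GRANDPARENTS

print(f"Implied Irish/Scottish/Welsh ancestors: {implied_ancestors:.1f} of 16")
print(f"Ceiling once the four Rusyns are excluded: {max_possible_share:.0%}")
```

Under that equal-share assumption, the paper trail caps the Irish, Scottish, and Welsh share at 12 of 16 ancestors, which is well short of the 85% the older analysis reported.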
Compare that to the update.
Here we have a drastically reduced Irish component that, importantly, has been split from Scottish and Welsh, which now exists as its own genetic group. The East European group appears too low, but perhaps My Heritage identified some of my Slavic ancestry as Balkan—there is a sizeable Carpatho-Rusyn community in Vojvodina, an autonomous province in Serbia. Maybe Germanic too? That would start to push it near to 20%.
I do have English ancestry—my Anglophilia must come from somewhere—though it is relatively small and I can trace it to the Medieval period. That includes more of the Norman elite than the Anglo-Saxon plebs and so seeing Breton register could be indicative of that Norman/Anglo-Saxon population mixture.
But how do My Heritage’s results compare to those provided by Ancestry and FamilyTreeDNA, two competitors whose services I have also used? And how do they compare to my actual historical document research?
My Heritage’s newest analysis certainly hits a lot better and is nearer to Ancestry, which aligns best with my research. I do have two questions for my second-great-grandparents. One surrounds Nathaniel Miller, one of whose grandparents (Eliza Garrotson) may not be English but rather Dutch from the Dutch colonisation of the Hudson River Valley in New York south of Albany.
The other question revolves around William Doyle. His mother is identified in the records variously as English and Irish. A family story on that side of the family also suggests one ancestor of English descent. And finally, a recently discovered marriage record for his parents details how his mother (Martha Atkins) was baptised and converted to Catholicism as an adult prior to her marriage. Not all Irish are Catholic, but the vast majority are and that would also suggest Martha was not Irish.
Taking those two questions into account, I have a small range of expected values for my English ancestry and a slightly larger one for my Irish and you can see those in the graphic.
When you compare that to the My Heritage results alongside the Ancestry and FamilyTreeDNA results you can see Ancestry aligns best with my research whereas FamilyTreeDNA aligns the least. My Heritage now falls squarely between the two. And so I consider their update a success. I think the company still has some work to do, but progress is progress.
Apologies, all, for the lengthy delay in posting. I decided to take some time away from work-related things for a few months around the holidays and try to enjoy, well, the holidays. Moving forward, I intend to at least start posting about once per week. After all, the state of information design these days provides me a lot of potential critiques.
Let us start with the news du jour, the application of tariffs on China and the delayed imposition on both Canada and Mexico. Firstly, let us be very clear what a tariff is. A tariff is a tax paid by importers or consumers on goods sourced from outside the country. In this case, we are talking about Canadian, Mexican, and Chinese imports and the United States slapping tariffs on goods from those countries. Foreign governments do not pay money to the United States. Neither Canada, nor Mexico, nor China will pay money to the United States.
You will.
You should expect your shopping costs to increase, whether that is on the price of gasoline (imported from Canada), fast fashion apparel (from China), or avocados (from Mexico). On the more durable goods side, homes are built with Canadian lumber and your automobiles with parts sourced from across North America—the reason why we negotiated NAFTA back in the 1990s.
Now that we have established what tariffs are, why is the Trump administration imposing them? Ostensibly because border security and fentanyl. What those two issues have to do with trade policy and economics…I have no idea. But a few news outlets created graphics showing US imports from our top-five trading partners.
First I saw this graphic from the New York Times. It is a variation of a streamgraph and it needs some work.
A streamgraph type chart from the New York Times
To start, at any point along the timeline, can you roughly get a sense of what the value for any country is? No. Because there is no y-axis to provide a sense of scale. Perhaps these are the top import sources and these are their share of the total imports? Read the fine print and…no. These are the countries with a minimum of 2% share in 2024, which is approximately 75% of US imports.
This graphic fails at clearly communicating the share of imports. You need to somehow extrapolate from the y-height in 2024 given the three direct labels for Canada, Mexico, and China what the values are at any other point in time or for any other country.
Nevertheless, the chart does a few things nicely. It does highlight the three countries of importance to the story, using colours instead of greys. That focuses your attention on the story, whilst leaving other countries of importance still available for your review. Secondly, the nature of this chart ranks the greatest share as opposed to a straight stacked area chart.
Overall, for me the chart fails on a number of fronts. You could argue it looks pretty, though.
The aforementioned stacked area charts—also not a favourite of mine for this sort of comparison—force the designer to choose a starting country in this case and then stack other countries atop it.
A stacked area chart from the BBC
What this chart does really well, especially compared to the previous New York Times example, is provide context for all countries across all time periods by the inclusion of the y-axis. Like the Times graphic it focuses attention on Canada, Mexico, and China with colour and uses grey to de-emphasise the other countries. You can see here how the Times’ decision to exclude all countries below 2% can skew the visual impact of the chart, though here all countries below Japan (everything but the top-five) are grouped as other.
Personally, I find the inclusion of the specific data labels for Canada, Mexico, and China distracts from the visualisation and is redundant. The y-axis provides the necessary framework to visually estimate the share. If the reader needs a value to the precision level of tenths, a table may be a better option.
I could not find one of the charts I thought I had bookmarked and so in an image search I found a chart from one of my former employers on the same topic (though it uses value instead of share) and it is worth a quick critique.
A stacked area chart from Euromonitor International
Towards the end of my time there, I was creating templates for more wide-screen content. My fear from an information design and data visualisation standpoint, however, was the increased stretch in simple, low data-intensity graphics. This chart incorporates just 42 data points and yet it stretches across 1200 pixels on my screen with a height of 500.
Compare that to the previous BBC graphic, which is also 1200 pixels, but has a greater height of 825 pixels. Those two dimensions give ratios of 2.4 for Euromonitor International and 1.455 for the BBC. Neither is the naturally aesthetically pleasing golden ratio of 1.618, but at least the BBC version is close to Tufte’s recommended 1.5–1.6. The idea behind this is that the greater the ratio, the softer the slope of the line. This can make it more difficult to compare lines. A steeper slope can emphasise changes over time, especially in a line chart. You can roughly compare this by looking at the last few years of the longer time span in the BBC graphic to the entirety of this graphic. You can more easily see the change in the y-axis because you have more pixels in which to show the change.
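The ratios above are easy to reproduce. Here is a minimal sketch using the pixel dimensions I quoted. The function names are hypothetical, and the slope helper assumes a line that crosses a given fraction of the x-axis whilst rising the same fraction of the y-axis, which is my own simplification for illustration.

```python
# Width-to-height ratios for the two charts, using the dimensions quoted above.
GOLDEN_RATIO = (1 + 5 ** 0.5) / 2  # roughly 1.618

charts = {
    "Euromonitor International": (1200, 500),  # (width px, height px)
    "BBC": (1200, 825),
}


def aspect_ratio(width_px, height_px):
    """Return width divided by height."""
    return width_px / height_px


def on_screen_slope(width_px, height_px, x_fraction, y_fraction):
    """Pixel slope of a line covering x_fraction of the width and
    y_fraction of the height (a hypothetical, simplified measure)."""
    return (y_fraction * height_px) / (x_fraction * width_px)


ratios = {name: aspect_ratio(w, h) for name, (w, h) in charts.items()}
slopes = {name: on_screen_slope(w, h, 0.1, 0.1) for name, (w, h) in charts.items()}

for name in charts:
    print(f"{name}: ratio {ratios[name]:.3f}, slope {slopes[name]:.3f}")
print(f"Golden ratio: {GOLDEN_RATIO:.3f}")
```

For the same relative change, the on-screen slope works out to one over the ratio, so the wider Euromonitor International chart draws the same trend noticeably flatter than the BBC chart.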
Finally we get to another New York Times graphic. This one, however, is a more traditional line chart.
A line chart from the New York Times
And for my money, this is the best. The data is presented most clearly and the chart is the most legible and digestible. The colours clearly focus your attention on Canada, Mexico, and China. The use of lines instead of stacked area allows the top importer to “rise” to the top. You can track the rapid rise of Chinese imports from the late 1990s through to the first Trump administration and the imposition of tariffs in 2018—note the significant drop in the line. In fact you can see the impact in Mexico becoming the United States’ top trading partner in recent years.
Over the years, if I had a dollar for every time I was told someone wanted a graphic made “sexier” or with more “sizzle” or made “flashier”, I would have…a bigger bank account. The issue is that “cooler” graphics do not always lead to clearer graphics. Graphics that communicate the data better. And the guiding principle of information design and data visualisation should be to make your graphics clear rather than cool.
Credit for the New York Times streamgraph goes to Karl Russell.
Credit for the BBC graphic goes to the BBC graphics department.
Credit for the Euromonitor International graphic goes to Justinas Liuima.
Credit for the New York Times line chart goes to the New York Times.
Thoroughbred racing is big business. And Philadelphia’s Parx Casino owns a racing track that, in a recent article in the Philadelphia Inquirer, has seen a number of horse deaths. The article includes a single graphic worth noting, a bar chart showing the thoroughbred death rate. The graphic contrasts rising deaths at Parx with a national trend of declining deaths.
Traditionally rate statistics are shown using dots or lines. The idea is that a bar represents counting stats, i.e. how many total horses died. I understand the coloured bars present a more visually compelling graphic on the page, and so I can buy that reason if you are selling it.
Labelling each datapoint, however, with a grey text label above the bar remains unnecessary. They create sparkling, distracting grey baubles above the important blue bars. If you need the specificity to the hundredths degree, use a table. This graphic is also interactive. The mouseover state is where a specific number can be provided, adding an additional layer or level of depth in a progressive disclosure of information.
Last week wrapped up the Coast Guard’s two-week inquiry into the sinking of the submersible Titan, which imploded on a dive to the wreck of Titanic. The BBC summarised the findings in an article at the weekend. It included a number of fascinating annotated photographs identifying parts of the wreckage. But it also included the following graphic, which captures the text messages sent by the Titan and the depths at which the messages were sent.
This is significantly better than a number of pieces I have seen lately. To be fair, most of those focus on the dive depths of various objects and creatures. Mostly that is because the graphics—this one included—do not scale the objects to the depths. I understand the why; many would be too small to see. But I think that difference in scale really hits home just how deep Titanic rests on the seabed.
Because this graphic does not focus on the dive depths of objects, but rather the texts Titan sent at what depth, the scale issue is less relevant. Though, the weird bit is how Titanic sits below 3800 m. She rests at 3840 and that little dip on the sea floor looks closer to 400 m.
Overall, though, a solid piece.
Credit for the piece goes to the BBC’s graphics department.
Because who does not recall the great Sharpie forecast track by the National Hurricane Center (NHC)?
Earlier this summer, in the middle of the hurricane season, the National Oceanic and Atmospheric Administration’s (NOAA’s) NHC released a new, experimental warning cone map. For those unfamiliar, these are the maps that have a white and white-shaded forecast for where the centre of the storm will track. Importantly, it is not a forecast of where the storm will impact. If you have ever been through a hurricane—would not recommend—you know you need not be near the centre to feel the storm’s impact.
I have been waiting for a significant storm to threaten the United States before taking a look at these. (It is also important to note, these new maps apply only to the United States.) But this is the current map for Hurricane Helene as of Wednesday morning.
For those of you who, like me, are familiar with these, you will see the red lines along the coast that indicate hurricane warnings. Blue lines indicate current tropical storm warnings. Not on this map are pink lines for hurricane watches and yellow lines for tropical storm watches. But all these lines only represent watches and warnings along the coast. Little dots indicate the storm’s forecast position at certain times and through letter indicators its strength. The full white areas are the forecast track for the centre of the storm through the first three days. The shaded area is for days 4–5.
Contrast that with the new, experimental version.
The background of the map remains the same. In my perfect world, I would probably drop the grey and blue back a little bit, but that is not the end of the world. Instead, the biggest change is that the tropical storm and hurricane watches and warnings, which have always been declared for full counties inland, are now shown on the map.
You can see the red hurricane warnings are now forecast to move through the eastern Florida panhandle and southern Georgia with tropical storm watches forecast for the inland counties north and east of those. And then the three- and five-day forecasts have blended into a single white cone track. Subtly, the stroke or outline for that has changed from black to solid white. That helps reduce the distracting visuals on the map and emphasise the forecast track and watches and warnings.
Overall, I think this is a really strong, important, and potentially life-saving improvement to the graphics. Could things be improved more? Absolutely. But sometimes the only way to make improvements is through slow and steady incremental changes. This update does that in spades.
Credit for the piece goes to the NHC graphics team.
The Teamsters Union decided to officially endorse neither candidate in the 2024 US presidential election. Prior to their non-announcement announcement, however, the union surveyed its members and then released the polling data ahead of the announcement.
Of course, the teamsters represent but a single union in a large and diverse country. More importantly, the survey results reported only the share of responses for either candidate—and “Other”—so we have no idea how many of what number opted for whom. But hey, it’s another talking point in the final six weeks of the campaign.
Naturally, I decided to visualise the data.
The trend is pretty, pretty clear. The union’s rank-and-file clearly support Trump for president, with the exception of the teamsters in the District of Columbia. (Note, no survey was taken in Wyoming.) In fact, in only eight states plus DC did Harris’ support top 40%.
This past weekend saw some flooding along the East Coast due to the Moon pulling on Earth’s water. In Boston that meant downtown flooding, including Long Wharf. The Boston Globe’s article about the flooding dealt more with impact, causes, and long-term forecasts—none of which really warranted data visualisation or information graphics. Nonetheless, the article included a long time series examining the change in Boston’s sea level relative to the mean.
For me, the graphic works really well. The data strips out the seasonal fluctuations and presents the reader with a clear view of rising sea levels in Boston. If the noisiness of the red line distracts the reader—one wonders if an annual average could have been used—the blue trend line makes it clear.
And that blue trend line has a nice graphic trick to help itself. Note the designer added a thin white stroke on the outside of the line, providing visual separation from the red line below.
My only real critique with the graphic is the baseline and the axis lines. The chart uses solid black lines for the axes, with grey lines running horizontally depicting the deviation from the mean sea level. But the black lines draw the attention of the eye and thus diminish the importance of the 0 inch line, which actually serves as the baseline of the chart.
If I quickly edit the screenshot in Photoshop, you can see how shifting the emphasis subtly changes the chart’s message.
Today I have a little post about something I noticed over the weekend: labelling line charts.
It begins with a BBC article I read about the ongoing return to office mandates some companies have been rolling out over the last few years. When I look for work these days, one important factor is the office work situation and so seeing an article about the tension in that issue, I had to read it.
The article includes this graphic of Office for National Statistics (ONS) data and BBC analysis.
Overall, the chart does a few things I like, most notably including the demarcation for the methodology change. The red–green here also works. Additionally the thesis expressed by the title, “Hybrid has overtaken WFH”, clearly evidences itself by the green line crossing the blue. (I would quibble and perhaps change the hybrid line to red as it is visually more impactful.)
I also like on the y-axis how we do not have a line connecting all the intervals. Such lines are often unnecessary and can add visual clutter; see yesterday’s post for something similar. I quibble here with dropping the % symbol for the zero-line. Since the rest of the graphic uses it, I would have put the baseline as 0%. And that baseline is indeed represented by a darker, black line instead of the grey used for the other intervals.
Then we get to the labels on the right of the graphic. Firstly, I do not subscribe to the view that charts and graphs need to label individual datapoints. If the designer created the chart correctly, the graph should be legible. Furthermore, charts show relationships; if one needs a specific value, I would opt for a table or a factette instead. These are not the most egregious labels, mind you, but here they label the datapoint and not the line. Instead, for the line the reader needs to go back to the chart’s data definition and retrieve the information associated with the colour.
Now compare that to a chart representing Major League Baseball’s playoff odds from Fangraphs.
Here too we have mostly good things going on, but I want to highlight the labelling at the right. This chart also includes the precise value, which is fine, but here we also have the actual label for the lines. The user does not need to leave the experience of the chart to find the relevant information, although a secondary/redundant display or legend can be found at the bottom of the chart.
If you can take the time to label the end value, you may as well label the series.
Credit for the BBC graphic goes to the BBC’s graphics department.
Credit for the Fangraphs piece goes to Fangraphs’ design team.
As a wee lad I grew up south of Downingtown, Pennsylvania, an old mill town situated along the banks of the East Branch of the Brandywine Creek. Drop a little stick in the Brandywine and it would float downstream until it joins the Christina River in Wilmington, Delaware and thereafter shortly into the Delaware River.
Delaware has tax-free shopping and movie theatres I frequented in my youth. First laptop purchase for university? Delaware. Furniture for moving out to Chicago? Delaware. In other words, when I posted my most recent map of where I have been, the three counties of Delaware were some of the earliest counties filled in.
Delaware—for better or worse—is seared into my mind. If you look at the state border, you will see the northern border is circular. Look at all other state borders and that circle is kind of weird. Most other borders are straight(ish) lines, mountain ridges, rivers, or bays. The reason is that the border between Pennsylvania and Delaware was, essentially, drawn by taking out a protractor and drawing a circle twelve miles distant from New Castle, Delaware, the original capital.
Anyway, I have not thought about that in quite some time. But thankfully, xkcd did.
As many of you know, I love geography and so I am aware of many of these places. Lake Manicouagan is one of those places that has an island in it, which has a lake on that island, in which there is another lake. There might even be another island/lake combination, but I could be mistaken.
I grew up less than 15 miles away from the Limerick Nuclear Generating Station, located on the banks of the Schuylkill River northwest of the city of Philadelphia. Our house sat on the north-facing slope of the Great Valley and the cooling towers of Limerick were a ridge line and river valley away from view. But on a clear day, you can see the puffy, billowy clouds of steam rising over the distant horizon—Limerick is splitting the atom.
We all know—or should by now—that burning coal, oil, and gas is not terribly great for the planet. They emit carbon dioxide and other gases that warm the Earth. But the white columns rising over the Schuylkill are water. Fissile uranium is more energy-dense than coal, oil, or gas. And not just by a wee bit. But by orders of magnitude. Splitting the atom provides mankind with an enormous amount of energy.
And we need energy. This summer was hot. And I don’t like it hot. Consequently, my air con ran almost nonstop. And I am not the only one. But whence comes all the electricity to power those units? Yes, we can get electricity from the sun, the wind, and the water. But what about when the clouds block the sun? Or the hot, sticky summer air refuses to stir? Or the parched earth has sucked the water from the reservoir?
The uranium atom can still be split, and at a reliable rate. That makes it great to provide a high amount of electricity that can be augmented by the sun, the wind, and the water when conditions permit.
However, in recent years, the cost of oil and gas declined thanks to fracking, and the business cost to run coal plants lowered as environmental standards disappeared. The economics of running nuclear power plants made them less viable than carbon-spewing options. Electricity providers started shutting nuclear plants down.
Things have changed, though. As we run more air con, we need more electricity. As we run more electric buses and trains, we need more electricity. As we charge more electric cars, we need more electricity. As we run more servers for bitcoin mining or AI farms, we need more electricity.
We need more electricity. A lot more.
And so the economics of electricity is changing. The Wall Street Journal had a great article about the re-opening of a nuclear plant in Michigan. It included some really nice photographs of the control room and the turbine room. But the reason we are talking about it here today is because the article includes a few diagrams and illustrations. This one caught my attention.
First, I really enjoy how the United States is reduced to a grey outline. Perhaps a very faint grey could have been used to infill the states, but here I think white works best because of the use of the light and medium greys for active plants.
The active plants—not the focus of the article—are in those greys, whilst the decommissioned and -ing plants are in tints of red. What I struggled with a long time ago when I made an infographic about southeastern Pennsylvania’s electricity generation was how to show the different plants at a single facility.
Ultimately, I listed each plant by name then an icon representing the type of fuel, because not every plant uses all the same type of fuel. Eddystone Generating Station just south of Philadelphia used both natural gas/oil plants and two coal plants, though those were retired in the 2010s.
Here the designer, not needing to label each plant and aided by the fact each plant is nuclear, simply encloses the dots within a container. Palisades, the plant in question, receives a thicker, black stroke to call it out against the rest of the plants.
Credit for the piece goes to, I think, Adrienne Tong. She is credited for a different graphic in the article, but not the one I highlighted, so I’ll give her the credit unless and until someone else gets the credit.