Predicting…the Known Stats?

I have been trying to post more regularly here on Coffeespoons, but now that baseball’s postseason is in full swing—pun fully intended—my free time is spent watching balls and strikes at all hours of the day. (Though, with the Wild Card round over and the move from four to two games per day, my time will likely expand as the week winds down. Sort of. More on that in a moment.)

What I have noticed on a few broadcasts, however, is the broadcast team touting Google’s ability to forecast a player’s ability to get on base. Most recently, on Sunday afternoon my mates and I were watching the Phillies–Mets contest when either the broadcaster announced or a graphic popped up on screen claiming that Google predicts Francisco Lindor has a 34% chance to get on base in the plate appearance.

That can be a useful nugget of knowledge. And wow, that is crazy that Google can predict Lindor’s chances of getting on base.

Except it is not.

Francisco Lindor’s on-base percentage (OBP) for the 2024 season was 0.344. In other words, in 34.4% of plate appearances (PAs), Lindor either got a hit, took a walk, or was hit by a pitch. Over an entire sample of 689 PAs, Lindor got on base 34% of the time. Maybe Google was taking into account some other factors, but that was just the most recent one I can recall.

I wish I could recall which batter first keyed me into this situation. I want to say it was a high OBP guy, and for whatever reason I pulled my mobile out and opened the batter’s page on Baseball Reference only to find the prediction matched the OBP exactly.

Then it happened again. And again. And again.

Baseball is the greatest sport. One reason I love it is because you can use data and information to describe it. Plan for it. Play it. And sometimes predict it. Sometimes that works. Sometimes, when it doesn’t, it breaks your heart.

Baseball has reams of data and, yes, that data can feed into newer and cooler algorithms and models for predicting outcomes. (Outcomes that surely have nothing to do with the flood of sports gambling available on mobile phones.) But to me, it seems a bit disingenuous to call a statistic that has largely moved out of the realm of baseball nerds into the common understanding of the sport—thanks, Moneyball—a company’s new predictive statistic when that statistic has existed forever.

Separately, as I alluded to earlier, I shall not be posting for the next few weeks. I have a weekday wedding to attend later in the week and then I am headed out of town for a few weeks and intend to be doing very little digital stuff. Plus, by the time I return baseball’s postseason shall likely be over.

But in the meantime, I am going to be heading out this afternoon to meet some mates as they cheer on their local squad, the Philadelphia Phillies, as they play the Mets. (No, the Red Sox did not, yet again, make the postseason.)

As the first batter, Kyle Schwarber, steps to the plate, I predict he will have a 37% chance of getting on base. And look, his OBP is 0.366.
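For the curious, the "prediction" can be reproduced with a single rounding step. What follows is only a minimal sketch of my suspicion, not Google’s actual model: it assumes the on-screen number is nothing more than the season OBP rounded to a whole percent. The function name and the dictionary are my own illustrative inventions; the two OBP values are the ones quoted above.

```python
def obp_as_percent(obp: float) -> int:
    """Round a season on-base percentage (e.g. 0.344) to a whole-number percent.

    Assumption: the broadcast figure is just the rounded season OBP.
    """
    return round(obp * 100)


# Season OBP values as quoted in this post.
season_obp = {
    "Francisco Lindor": 0.344,
    "Kyle Schwarber": 0.366,
}

for name, obp in season_obp.items():
    print(f"{name}: {obp_as_percent(obp)}% chance to get on base")
```

If the broadcast numbers keep matching this output batter after batter, the simplest explanation is that the forecast is the season OBP in a new outfit.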

Racing to the Final Finish Line

Thoroughbred racing is big business. And Philadelphia’s Parx Casino owns a racing track that, according to a recent article in the Philadelphia Inquirer, has seen a number of horse deaths. The article includes a single graphic worth noting, a bar chart showing the thoroughbred death rate. The graphic contrasts rising deaths at Parx with a national trend of declining deaths.

Traditionally, rate statistics are shown using dots or lines. The idea is that a bar represents counting stats, i.e. how many total horses died. I understand the coloured bars present a more visually compelling graphic on the page, and so I can buy that reason if you are selling it.

Labelling each datapoint, however, with a grey text label above the bar remains unnecessary. The labels create sparkling, distracting grey baubles above the important blue bars. If you need the specificity to the hundredths degree, use a table. This graphic is also interactive. The mouseover state is where a specific number can be provided, adding an additional layer or level of depth in a progressive disclosure of information.

Credit for the piece goes to Dylan Purcell.

Titan’s Final Words

Last week wrapped up the Coast Guard’s two-week inquiry into the sinking of the submersible Titan, which imploded on a dive to the wreck of Titanic. The BBC summarised the findings in an article at the weekend. It included a number of fascinating annotated photographs identifying parts of the wreckage. But it also included the following graphic, which captures the text messages sent by the Titan and the depths at which the messages were sent.

This is significantly better than a number of pieces I have seen lately. To be fair, most of those focus on the dive depths of various objects and creatures. Mostly that is because the graphics—this one included—do not scale the objects to the depths. I understand the why; many would be too small to see. But I think that difference in scale really drives home just how deep Titanic rests on the seabed.

Because this graphic does not focus on the dive depths of objects, but rather the texts Titan sent at what depth, the scale issue is less relevant. Though, the weird bit is how Titanic sits below 3800 m. She rests at 3840 m and that little dip on the sea floor looks closer to 400 m.

Overall, though, a solid piece.

Credit for the piece goes to the BBC’s graphics department.

Tired of These Motherf*cking Sox on This Motherf*cking Plane

At least, that’s what I imagine South Siders saying in Chicago as they watch the White Sox team charter plane land at Midway. For those not following America’s Major League Baseball season, the Chicago White Sox are one of two clubs claiming Chicago as their home. (The other being the Cubs.) And the White Sox—not to be confused with your author’s favourite club, the Red Sox—are on track to be one of the worst clubs in the modern (post-1900) history of the sport. They have already tied the New York Mets’ record of 120 losses and there are still six games left to play.

Earlier this month The Athletic detailed what has gone wrong for the Pale Hose. One of the things that stood out to me the most in the reporting was the complaints about the club’s charter aircraft, an Airbus A320 that, as the article points out, is a 1980s aircraft. The article in particular mentioned how other cheapskate teams—including the Boston Red Sox—opt for nicer aircraft with more first-class accommodations for players and staff. Then they cited a graphic shared on Twitter last year by Jay Cuda and when I saw that, I knew I had to cover it.

One thing I find fascinating is how the White Sox use United Airlines for their charter. United Airlines operates the charter—as it does for the Cubs and other teams. That it does so for the two Chicago teams makes all the sense in the world as the company is headquartered in the Loop in downtown Chicago. It is also one of the largest airlines and thus makes sense in that dimension too.

But as those frequent air travellers among you will know, Chicago has two airports: O’Hare and Midway. O’Hare is northwest of downtown, closer to the Cubs, and is the city’s primary airport. But the White Sox typically fly out of Midway, which is just a couple miles from (New) Comiskey. (I presume the team bus hops on the Dan Ryan/I-90 to the Stevenson/I-55 then exits on Cicero.)

That is weird because United does not service Midway. And so United, which operates out of O’Hare, must fly aircraft to Midway to then transport the White Sox. I suppose the White Sox would not want to charter a Southwest aircraft, though…. In my own lifetime I think I have flown in and out of Midway only twice. And I lived in Chicago for eight years. (And the White Sox were terrible for probably six of them.)

Some non-White Sox things notable from the graphic. One, iAero no longer exists, so I would be curious whom the Texas Rangers and Oakland Athletics used this year. The Rangers probably used a reputable airline. The Athletics probably made their players and staff charter their own transport.

I also did not realise that even last year the McDonnell Douglas MD-80 still carried passengers in the United States. I assume that by 2024, the Detroit Tigers have fully transitioned to that Boeing 737. I find it fascinating that only the Tigers own their own aircraft. I would be curious to know why more teams do not, though of course it has to be money.

With whom else would the Blue Jays fly but Air Canada?

Finally, I am surprised that my Boston Red Sox use Delta, because that’s a normal, non-budget airline. And anyone who follows the Red Sox knows the Red Sox are no longer in the habit of spending money. I thought they would use jetBlue, which is the sponsor for Fenway South, formally jetBlue Park, in Fort Myers, Florida, where the Red Sox have their spring training and development league complex.

Anyways, happy Friday, all. At least you don’t play or work for the Chicago White Sox. (Though I suppose it is possible you do, because I do have a large number of readers from Chicago. But I doubt it.)

Credit for the piece goes to Jay Cuda.

I Need My Sharpie. Where’s My Sharpie?

Because who does not recall the great Sharpie forecast track by the National Hurricane Center (NHC)?

Earlier this summer, in the middle of the hurricane season, the National Oceanic and Atmospheric Administration’s (NOAA’s) NHC released a new, experimental warning cone map. For those unfamiliar, these are the maps that have a solid white and white-shaded forecast cone for where the centre of the storm will track. Importantly, it is not a forecast of where the storm will impact. If you have ever been through a hurricane—would not recommend—you know you need not be near the centre to feel the storm’s impact.

I have been waiting for a significant storm to threaten the United States before taking a look at these. (It is also important to note, these new maps apply only to the United States.) But this is the current map for Hurricane Helene as of Wednesday morning.

For those of you who, like me, are familiar with these, you will see the red lines along the coast that indicate hurricane warnings. Blue lines indicate current tropical storm warnings. Not on this map are pink lines for hurricane watches and yellow lines for tropical storm watches. But all these lines only represent watches and warnings along the coast. Little dots indicate the storm’s forecast position at certain times and, through letter indicators, its strength. The full white areas are the forecast track for the centre of the storm through the first three days. The shaded area is for days 4–5.

Contrast that with the new, experimental version.

The background of the map remains the same. In my perfect world, I would probably drop the grey and blue back a little bit, but that is not the end of the world. Instead, the biggest change is that the tropical storm and hurricane watches and warnings, which have always been declared for full counties inland, are now shown on the map.

You can see the red hurricane warnings are now forecast to move through the eastern Florida panhandle and southern Georgia with tropical storm watches forecast for the inland counties north and east of those. And then the three- and five-day forecasts have blended into a single white cone track. Subtly, the stroke or outline for that has changed from black to solid white. That helps reduce the distracting visuals on the map and emphasise the forecast track and watches and warnings.

Overall, I think this is a really strong, important, and potentially life-saving improvement to the graphics. Could things be improved more? Absolutely. But sometimes the only way to make improvements is through slow and steady incremental changes. This update does that in spades.

Credit for the piece goes to the NHC graphics team.

For Whom the Teamsters Poll Tolls

The Teamsters Union decided to officially endorse neither candidate in the 2024 US presidential election. Prior to their non-announcement announcement, however, the union surveyed its members and then released the polling data ahead of the announcement.

Of course, the Teamsters represent but a single union in a large and diverse country. More importantly, the survey results reported only the share of responses for either candidate—and “Other”—so we have no idea how many members responded or how many opted for whom. But hey, it’s another talking point in the final six weeks of the campaign.

Naturally, I decided to visualise the data.

The trend is pretty, pretty clear. The union’s rank-and-file clearly support Trump for president, with the exception of the Teamsters in the District of Columbia. (Note, no survey was taken in Wyoming.) In fact, in only eight states plus DC did Harris’ support top 40%.

Credit for the piece is mine.

Fear the Floodwaters

This past weekend saw some flooding along the East Coast due to the Moon pulling on Earth’s water. In Boston that meant downtown flooding, including Long Wharf. The Boston Globe’s article about the flooding dealt more with impacts, causes, and long-term forecasts—none of which really warranted data visualisation or information graphics. Nonetheless, the article included a long time series examining the change in Boston’s sea level relative to the mean.

For me, the graphic works really well. The data strips out the seasonal fluctuations and presents the reader with a clear view of rising sea levels in Boston. If the noisiness of the red line distracts the reader—one wonders if an annual average could have been used—the blue trend line makes it clear.

And that blue trend line has a nice graphic trick to help itself. Note the designer added a thin white stroke on the outside of the line, providing visual separation from the red line below.

My only real critique of the graphic is the baseline and the axis lines. The chart uses solid black lines for the axes, with grey lines running horizontally depicting the deviation from the mean sea level. But the black lines draw the attention of the eye and thus diminish the importance of the 0-inch line, which actually serves as the baseline of the chart.

If I quickly edit the screenshot in Photoshop, you can see how shifting the emphasis subtly changes the chart’s message.

Overall, however, the graphic works really well.

Credit for the piece goes to John Hancock.

Labelling Line Charts

Today I have a little post about something I noticed over the weekend: labelling line charts.

It begins with a BBC article I read about the ongoing return-to-office mandates some companies have been rolling out over the last few years. When I look for work these days, one important factor is the office work situation and so, seeing an article about the tension in that issue, I had to read it.

The article includes this graphic of Office for National Statistics (ONS) data and BBC analysis.

Overall, the chart does a few things I like, most notably including the demarcation for the methodology change. The red–green here also works. Additionally the thesis expressed by the title, “Hybrid has overtaken WFH”, clearly evidences itself by the green line crossing the blue. (I would quibble and perhaps change the hybrid line to red as it is visually more impactful.)

I also like how, on the y-axis, we do not have a line connecting all the intervals. Such lines are often unnecessary and can add visual clutter; see yesterday’s post for something similar. I quibble here with dropping the % symbol for the zero-line. Since the rest of the graphic uses it, I would have put the baseline as 0%. And that baseline is indeed represented by a darker, black line instead of the grey used for the other intervals.

Then we get to the labels on the right of the graphic. Firstly, I do not subscribe to the view that charts and graphs need to label individual datapoints. If the designer created the chart correctly, the graph should be legible. Furthermore, charts show relationships; if one needs a specific value, I would opt for a table or a factette instead. These are not the most egregious labels, mind you, but here they label the datapoint and not the line. Instead, for the line the reader needs to go back to the chart’s data definition and retrieve the information associated with the colour.

Now compare that to a chart representing Major League Baseball’s playoff odds from Fangraphs.

Here too we have mostly good things going on, but I want to highlight the labelling at the right. This chart also includes the precise value, which is fine, but here we also have the actual label for the lines. The user does not need to leave the experience of the chart to find the relevant information, although a secondary/redundant display or legend can be found at the bottom of the chart.

If you can take the time to label the end value, you may as well label the series.

Credit for the BBC graphic goes to the BBC’s graphics department.

Credit for the Fangraphs piece goes to Fangraphs’ design team.

Twelve-Mile Circle

As a wee lad I grew up south of Downingtown, Pennsylvania, an old mill town situated along the banks of the East Branch of the Brandywine Creek. Drop a little stick in the Brandywine and it would float downstream until it joined the Christina River in Wilmington, Delaware, and shortly thereafter the Delaware River.

Delaware has tax-free shopping and movie theatres I frequented in my youth. First laptop purchase for university? Delaware. Furniture for moving out to Chicago? Delaware. In other words, when I posted my most recent map of where I have been, the three counties of Delaware were some of the earliest counties filled in.

Delaware—for better or worse—is seared into my mind. If you look at the state border, you will see the northern border is circular. Look at all other state borders and that circle is kind of weird. Most other borders are straight(ish) lines, mountain ridges, rivers, or bays. The reason is that the border between Pennsylvania and Delaware was drawn by, essentially, taking out a compass and drawing a circle twelve miles distant from New Castle, Delaware, the original capital.

Anyway, I have not thought about that in quite some time. But thankfully, xkcd did.

As many of you know, I love geography and so I am aware of many of these places. Lake Manicouagan is one of those places that has an island in it, which has a lake on that island, in which there is another lake. There might even be another island/lake combination, but I could be mistaken.

Happy Friday, everyone.

Credit for the piece goes to Randall Munroe.

The Dawn of a New Nuclear Age?

I grew up less than 15 miles away from the Limerick Nuclear Generating Station, located on the banks of the Schuylkill River northwest of the city of Philadelphia. Our house sat on the north-facing slope of the Great Valley and the cooling towers of Limerick were a ridge line and river valley away from view. But on a clear day, you can see the puffy, billowy clouds of steam rising over the distant horizon—Limerick is splitting the atom.

We all know—or should by now—burning coal, oil, and gas is not terribly great for the planet. They emit carbon dioxide and other gases that warm the Earth. But the white columns rising over the Schuylkill are water. Fissile uranium is more energy dense than coal, oil, or gas. And not just by a wee bit. But by orders of magnitude. Splitting the atom provides mankind with an enormous amount of energy.

And we need energy. This summer was hot. And I don’t like it hot. Consequently, my air con ran almost nonstop. And I am not the only one. But whence comes all the electricity to power those units? Yes, we can get electricity from the sun, the wind, and the water. But what about when the clouds block the sun? Or the hot, sticky summer air refuses to stir? Or the parched earth has sucked the water from the reservoir?

The uranium atom can still be split, and at a reliable rate. That makes it great to provide a high amount of electricity that can be augmented by the sun, the wind, and the water when conditions permit.

However, in recent years, the cost of oil and gas declined thanks to fracking, and the business cost to run coal plants lowered as environmental standards disappeared. The economics of running nuclear power plants made them less viable than carbon-spewing options. Electricity providers started shutting nuclear plants down.

Things have changed, though. As we run more air con, we need more electricity. As we run more electric buses and trains, we need more electricity. As we charge more electric cars, we need more electricity. As we run more servers for bitcoin mining or AI farms, we need more electricity.

We need more electricity. A lot more.

And so the economics of electricity is changing. The Wall Street Journal had a great article about the re-opening of a nuclear plant in Michigan. It included some really nice photographs of the control room and the turbine room. But the reason we are talking about it here today is because the article includes a few diagrams and illustrations. This one caught my attention.

First, I really enjoy how the United States is reduced to a grey outline. Perhaps a very faint grey could have been used to infill the states, but here I think white works best because of the use of the light and medium greys for active plants.

The active plants—not the focus of the article—are in those greys, whilst the decommissioned and -ing plants are in tints of red. What I struggled with a long time ago when I made an infographic about southeastern Pennsylvania’s electricity generation was how to show the different plants at a single facility.

Ultimately, I listed each plant by name then an icon representing the type of fuel, because not every plant uses all the same type of fuel. Eddystone Generating Station just south of Philadelphia used both natural gas/oil plants and two coal plants, though those were retired in the 2010s.

Here the designer, not needing to label each plant and aided by the fact each plant is nuclear, simply encloses the dots within a container. Palisades, the plant in question, receives a thicker, black stroke to call it out against the rest of the plants.

Credit for the piece goes to, I think, Adrienne Tong. She is credited for a different graphic in the article, but not the one I highlighted, so I’ll give her the credit unless and until someone else gets the credit.