Predicting…the Known Stats?

I have been trying to post more regularly here on Coffeespoons, but now that baseball’s postseason is in full swing—pun fully intended—my free time is spent watching balls and strikes at all hours of the day. (Though, with the Wild Card round over and the move from four to two games per day, my time will likely expand as the week winds down. Sort of. More on that in a moment.)

What I have noticed on a few broadcasts, however, is the broadcast team touting Google’s ability to forecast a player’s chance of getting on base. Most recently, on Sunday afternoon my mates and I were watching the Phillies–Mets contest when the broadcaster announced, or a graphic popped on screen claiming, that Google predicts Francisco Lindor has a 34% chance to get on base in the plate appearance.

That can be a useful nugget of knowledge. And wow, that is crazy that Google can predict Lindor’s chances of getting on base.

Except it is not.

Francisco Lindor’s on base percentage (OBP) for the 2024 season was 0.344. In other words, in 34.4% of plate appearances (PAs), Lindor either gets a hit or takes a walk. With an entire sample of 689 PAs, Lindor got on base 34% of the time. Maybe Google was taking into account some other factors, but that was just the most recent one I can recall.
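To make the comparison concrete, here is a minimal sketch of the arithmetic. It assumes nothing about Google's actual model, which I have not seen. It simply rounds the season OBP quoted above to a whole percent, and follows this post's simplification of treating OBP as times on base per plate appearance. The function names are my own, for illustration only.

```python
# Figures quoted in the post; nothing here comes from Google's actual model.
LINDOR_OBP_2024 = 0.344
LINDOR_PA_2024 = 689


def as_whole_percent(rate: float) -> int:
    """Round a rate such as 0.344 to a whole-number percentage."""
    return round(rate * 100)


def implied_times_on_base(obp: float, plate_appearances: int) -> int:
    """Approximate times on base implied by an OBP over a number of PAs.

    Assumption: OBP is treated as times on base divided by PAs, as in the post.
    """
    return round(obp * plate_appearances)


print(f"Season OBP as a 'forecast': {as_whole_percent(LINDOR_OBP_2024)}%")
print(f"Implied times on base: {implied_times_on_base(LINDOR_OBP_2024, LINDOR_PA_2024)}")
```

If the on-screen number matches the rounded season OBP every time, the "prediction" is probably just the known stat with a new label.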

I wish I could recall which batter first keyed me into this situation. I want to say it was a high OBP guy, and for whatever reason I pulled my mobile out and opened the batter’s page on Baseball Reference only to find the prediction matched the OBP exactly.

Then it happened again. And again. And again.

Baseball is the greatest sport. One reason I love it is because you can use data and information to describe it. Plan for it. Play it. And sometimes predict it. Sometimes that works. Sometimes, when it doesn’t, it breaks your heart.

Baseball has reams of data and, yes, that data can feed into newer and cooler algorithms and models for predicting outcomes. (Outcomes that surely have nothing to do with the flood of sports gambling available on mobile phones.) But to me, it seems a bit disingenuous to call a statistic that has largely moved out of the realm of baseball nerds into the common understanding of the sport—thanks, Moneyball—a company’s new predictive statistic when that statistic has existed forever.

Separately, as I alluded to earlier, I shall not be posting the next few weeks. I have a weekday wedding to attend later in the week and then I am headed out of town for a few weeks and intend to be doing very little digital stuff. Plus, by the time I return baseball’s postseason shall likely be over.

But in the meantime, I am going to be heading out this afternoon to meet some mates as they cheer on their local squad, the Philadelphia Phillies, as they play the Mets. (No, the Red Sox did not, yet again, make the postseason.)

As the first batter, Kyle Schwarber, steps to the plate, I predict he will have a 37% chance of getting on base. And look, his OBP is 0.366.

Tired of These Motherf*cking Sox on This Motherf*cking Plane

At least, that’s what I imagine South Siders saying in Chicago as they watch the White Sox team charter plane land at Midway. For those not following America’s Major League Baseball season, the Chicago White Sox are one of two clubs claiming Chicago as their home. (The other being the Cubs.) And the White Sox—not to be confused with your author’s favourite club, the Red Sox—are on track to be one of the worst clubs in the modern (post-1900) history of the sport. They have already tied the New York Mets’ record of 120 losses and there are still six left to play.

Earlier this month the Athletic detailed what has gone wrong for the Pale Hose. One of the things that stood out to me the most in the reporting was the complaints about the club’s charter aircraft, an Airbus A320 that, as the article points out, is a 1980s aircraft. The article in particular mentioned how other cheapskate teams, including the Boston Red Sox, opt for nicer aircraft with more first-class accommodations for players and staff. Then they cited a graphic shared on Twitter last year by Jay Cuda and when I saw that, I knew I had to cover it.

One thing I find fascinating is how the White Sox use United Airlines for their charter. United Airlines operates the charter, as it does for the Cubs and other teams. That it does so for the two Chicago teams makes all the sense in the world as the company is headquartered in the Loop in downtown Chicago. It is also one of the largest airlines and thus makes sense in that dimension too.

But as those frequent air travellers among you will know, Chicago has two airports: O’Hare and Midway. O’Hare is northwest of downtown, closer to the Cubs, and is the city’s primary airport. But the White Sox typically fly out of Midway, which is just a couple miles from (New) Comiskey. (I presume the team bus hops on the Dan Ryan/I-90 to the Stevenson/I-55 then exits on Cicero.)

Weird because United does not service Midway. And so United, which operates out of O’Hare, must fly aircraft to Midway to then transport the White Sox. I suppose the White Sox would not want to charter a Southwest aircraft, though…. In my own lifetime I think I have flown in and out of Midway only twice. And I lived in Chicago for eight years. (And the White Sox were terrible for probably six of them.)

Some non-White Sox things notable from the graphic. One, iAero no longer exists, so I would be curious whom the Texas Rangers and Oakland Athletics used this year. The Rangers probably used a reputable airline. The Athletics probably made their players and staff charter their own transport.

I also did not realise that even last year the McDonnell Douglas MD-80 still carried passengers in the United States. I assume that by 2024, the Detroit Tigers have fully transitioned to that Boeing 737. I find it fascinating that only the Tigers own their own aircraft. I would be curious to know why more teams do not, though of course it has to be money.

With whom else would the Blue Jays fly but Air Canada?

Finally, I am surprised that my Boston Red Sox use Delta, because that’s a normal, non-budget airline. And anyone who follows the Red Sox knows the Red Sox are no longer in the habit of spending money. I thought they would use jetBlue, which is the sponsor for Fenway South, formally jetBlue Park, in Fort Myers, Florida, where the Red Sox have their spring training and development league complex.

Anyways, happy Friday, all. At least you don’t play or work for the Chicago White Sox. (Though I suppose it is possible you do, because I do have a large number of readers from Chicago. But I doubt it.)

Credit for the piece goes to Jay Cuda.

I Want a Pitcher Not a Back o’ Head Hitter

We’re about to go into the sportsball realm, readers. Baseball, specifically.

Tuesday night, Atlanta Braves batter Whit Merrifield was hit in the back of the head by a 95 mph fastball. Luckily, modern ballplayers wear helmets. But at that velocity, one does not have the most reaction time in the world, and a number of other batters have been hit in the face. And generally, that’s not good. Merrifield went off in post-game interviews about the lack of accountability on the pitchers’ side. From my perspective as an armchair ballplayer, back in my day, when I walked up hill through the snow both ways to get to my one-room schoolhouse, if you hit a batter, our pitcher was hitting one of yours.

I have noticed in ballgames, however, I see hit-by-pitch (HBP) more often—and I score most ballgames I attend, so I have records. But I also know a handful attended per year makes for a very small sample size. Nonetheless, I know I have talked to other baseball friends and brought up that I think pitchers throw with less command, i.e. throwing strikes, than they used to, because I see more HBP in the box scores. And when I go to minor league ballgames, which I do fairly often, HBP seems on the rise there, which means in future years those same pitchers will likely be in the majors.

So yesterday morning, I finally took a look at the data and, lo and behold, indeed, since my childhood, the number of HBPs has increased.

There is one noticeable sharp dip and that is the 2020 COVID-shortened season. Ignore that one. And then a smaller dip in the mid-90s represents the 114-game and 144-game seasons, compared to the standard 162 per year. Nonetheless, the increase is undeniable.
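For anyone who wants to see why a shortened season shows up as a dip in raw totals, here is a rough sketch of scaling a season's total to a 162-game pace. The HBP totals below are hypothetical numbers I made up for illustration, not real league data. Only the season lengths come from this post, and the function name is my own.

```python
FULL_SEASON_GAMES = 162  # standard schedule length mentioned in the post


def scale_to_full_season(hbp_total: float, games_played: int) -> float:
    """Scale a season's HBP total to a 162-game pace."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return hbp_total * FULL_SEASON_GAMES / games_played


# Hypothetical totals, for illustration only (not real league data).
hypothetical_seasons = {
    "114-game season": (1000, 114),
    "144-game season": (1200, 144),
    "162-game season": (1400, 162),
}

for label, (hbp, games) in hypothetical_seasons.items():
    paced = scale_to_full_season(hbp, games)
    print(f"{label}: {hbp} HBP raw, about {paced:.0f} at a 162-game pace")
```

With totals scaled this way, a short season no longer reads as a drop in how often batters get hit.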

There is a general dip in the curve, which occurs in the late 2000s and early 2010s, with its nadir in 2012. Without doing more research, that was probably the peak of pitchers who could command (throw strikes) and control (put their strikes where they want in the strike zone) their pitches at the sacrifice of velocity.

2014 saw the rise of the dominant Royals bullpen, which changed the course of modern baseball. Stack your bullpen with a number of power arms who throw 100 mph and just challenge batters to hit the speedball. Problem is, not everyone who can throw 100 knows where that speedball is going. And that leads to more batters being hit.

Merrifield is correct in his assessment that until pitchers and teams face consequences for hitting batters, we are not likely to see a decrease in HBPs. Or at least not until velocity is de-emphasised for some other reason. What if there were a rule that a pitcher who hits a batter from the shoulder up is immediately ejected? What if a long-term injury for a batter is tied to a long-term roster removal for the pitcher? If, say, the batter hit in the head is out for a month with a concussion, the same pitcher is on the restricted list for a month?

Have I worked through any of these ideas in depth? Nope. Just spitballing here on ye olde blog. But as my chart shows, it does not look like this potentially life-changing problem in the game is going away anytime soon.

Credit for the piece is mine.

The .500 Red Sox

I initially made this datagraphic over the weekend, after watching the last few weeks of Boston Red Sox baseball wherein they continued to win a game, lose a game, resulting in an even .500 record.

When I started, the graphic I sketched looked very different as I had included timelines and highlighted key moments where key players went down for the year or the year-to-date. But after I added some context of the sport’s leading clubs’ games above or below .500, I realised most of those clubs were all those that my good friends and family followed.

Consequently I ditched my initial concept and opted to instead show how middling my Red Sox have been compared to the rest of them. And whilst this graphic may have a few more spaghetti lines than I’d typically prefer, it does show that squiggle of consistency in the middle that is the Red Sox 2024 season to date.

Of course, when I posted it, the Red Sox had just lost to the Yankees and I said I expected them to win one and lose one the rest of the weekend to stay at .500. So what happened? The Red Sox won both and are now two games over .500.
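For anyone unfamiliar with the "games over .500" shorthand, a quick sketch: it is just wins minus losses. The starting record below is hypothetical, chosen only to illustrate an even record followed by two wins, and the function name is my own.

```python
def games_over_500(wins: int, losses: int) -> int:
    """Games above (positive) or below (negative) a .500 record."""
    return wins - losses


# Hypothetical even record, for illustration only.
wins, losses = 40, 40
print(games_over_500(wins, losses))

# Two straight wins, as over the weekend described above.
wins += 2
print(games_over_500(wins, losses))
```

Alternating wins and losses keeps that number bouncing between zero and one either side, which is the squiggle in the middle of the graphic.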

Baseball superstition thus requires I post more graphics about the .500 Red Sox to get them more games over .500.

Credit for the piece is mine.

Boston: Sportstown of the 21st Century

Tonight the Boston Celtics play in Game 1 of the NBA Finals against the Golden State Warriors, one of the most dominant NBA teams over the last several years. But since the start of the new century, and the new millennium more broadly, Boston’s four major sports teams have dominated the championship series of those sports. In fact tonight marks the 19th championship series a New England team has played since 2001. And in those 18 series thus far, Boston teams have a 12–6 record.

Let’s go Celtics.

Of the 12 titles won, the New England Patriots account for half with six Super Bowl victories out of nine appearances. The Boston Red Sox have won all four World Series they have played in since 2001. Rounding out the list, the Celtics and Bruins have each won a single championship with the Bruins appearing in three Stanley Cups and the Celtics in two NBA Finals. Tonight begins their third.
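As a sanity check on the tally, here is a small sketch that adds up the appearances and titles listed above. The numbers are the ones given in this post, counted before tonight's series. The dictionary layout is just my own way of arranging them.

```python
# (appearances, titles) since 2001, before tonight's series, as given in the post.
finals = {
    "Patriots": (9, 6),
    "Red Sox": (4, 4),
    "Bruins": (3, 1),
    "Celtics": (2, 1),
}

appearances = sum(played for played, _ in finals.values())
titles = sum(won for _, won in finals.values())
series_lost = appearances - titles

print(f"{appearances} series played, record {titles}-{series_lost}")
```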

Credit for the piece is mine.

How the Globe’s Writers Voted

Yesterday we looked at a piece by the Boston Globe that mapped out all of David Ortiz’s home runs. We did that because he has just been voted into baseball’s Hall of Fame. But to be voted in means there must be votes and a few weeks after the deadline, the Globe posted an article about how that publication’s eligible voters, well, voted.

The graphic here was a simple table. But as I’ll always say, tables aren’t an inherently bad or easy-way-out form of data visualisation. They are great at organising information in such a way that you can quickly find or reference specific data points. For example, let’s say you wanted to find out whether or not a specific writer voted for a specific ballplayer.

Just don’t ask me for whom I would have voted…

Simple red check marks represent those players for whom the Globe’s eligible staff voted. I really like some of the columns on the left that provide context on the vote. For the unfamiliar, players can only remain on the list for up to ten years. And so for the first four, this was their last year of eligibility. None made the cut. Then there’s a column for the total number of votes made by the Globe’s staff. Following that is more context, the share of votes received in 2021. Here the magic number is 75% to be elected. Conversely, if you do not make 5% you drop off the following year. Almost all of those on their first year ballot failed to reach that threshold.
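The ballot rules described above can be sketched as a small function. This is only my reading of the three rules mentioned here (75% to be elected, under 5% to drop off, ten years at most). The function name, the labels and the example vote shares are hypothetical.

```python
ELECTION_THRESHOLD = 0.75   # share of the vote needed for election
RETENTION_THRESHOLD = 0.05  # below this share a player drops off the ballot
MAX_YEARS_ON_BALLOT = 10    # last year of eligibility


def ballot_outcome(vote_share: float, year_on_ballot: int) -> str:
    """Classify a player's result from vote share and year on the ballot."""
    if vote_share >= ELECTION_THRESHOLD:
        return "elected"
    if vote_share < RETENTION_THRESHOLD:
        return "dropped (under 5%)"
    if year_on_ballot >= MAX_YEARS_ON_BALLOT:
        return "dropped (eligibility expired)"
    return "returns next year"


# Hypothetical vote shares, for illustration only.
print(ballot_outcome(0.78, 1))
print(ballot_outcome(0.66, 10))
print(ballot_outcome(0.03, 1))
print(ballot_outcome(0.40, 4))
```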

The only potential drawback to this table is that by the time you reach the end of the table, there are few check marks to create implicit rules or lines that guide you from writer to player. David Ortiz’s placement helps because his six check marks (remarkably, not all Globe writers voted for him) ground you for the only person below him (alphabetically) to receive a vote. And we need that because otherwise quickly linking Alex Rodriguez to Alex Speier would be difficult.

Finally below the table we have jump links to each writer’s writings about their selections. And if you’ll allow a brief screenshot of that…

Still don’t ask me

We have a nicely designed section here. Designers delineated each author’s section with red arrows that evoke the red stitching on a baseball. It’s a nice design touch. Then each author receives a headline and a small call out box inside which are the players, and their headshots, for whom the author voted. An initial dropped capital (drop cap), here a big red M, grabs the reader’s attention and draws them into the author’s own words.

Overall this was a solidly designed piece. I really enjoyed it. And for those who don’t follow the sport, the table is also an indicator of how divisive the voting can be. Even the Globe’s writers couldn’t unanimously agree on voting for David Ortiz.

Credit for the piece goes to Daigo Fujiwara and Ryan Huddle.

558 Dingers

Yesterday baseball writers elected David Ortiz of the Boston Red Sox, better known as Big Papi, to the Baseball Hall of Fame. I was trying to work on a thing for yesterday, but ran out of time. While I will attempt to return to that later, for now I want to share a simple interactive graphic from the Boston Globe. As the blog title suggests, it’s about the 558 career home runs Ortiz hit between his time with the Twins and the Red Sox. He hit 541 of those during the regular season, tacking on 17 more in the post season including his famous 2013 ALCS grand slam against the Detroit Tigers. (The one where the cop’s arms are in the air alongside Torii Hunter’s legs.)

That’s a lot of runs

Now you can see that Ortiz was a left-handed pull hitter with that home run concentration to right field, especially those wrapped around Fenway’s (in)famous Pesky Pole.

But with the number of dots you see inside the grounds at Fenway, you can also see the one downside of a chart like this. The graphic maps home runs hit at every Major League ballpark onto Fenway’s outline. Not to mention the role the Green Monster plays: a lot of line drives that leave the yard when hit to right field simply bounce off the Monster in left for doubles or the dreaded long single. But in part that’s why Ortiz also had ridiculous season numbers for extra base hits, because of all those Green Monster doubles. (Conversely, how many popups a mile in the sky came down into the Green Monster seats?)

You access this interactive piece by scrolling through the experience as the Globe chose 12 home runs to represent Ortiz’s entire career. I’m fortunate enough to remember watching several of them on the television.

Big Papi was a force to be reckoned with and watching him hit was entertainment. I’m very excited to see him enter the Hall of Fame.

This summer? It’s his effing Hall.

Credit for the piece goes to John Hancock.

Those Are Some Heavy Balls

Unfortunately, I don’t subscribe to Business Insider, but I saw this graphic on the Twitter and felt the need to share it. Primarily because baseball will almost certainly stop at midnight when the owners of the teams will impose a lockout (as opposed to players going on strike). And with that baseball will be on hold until the two parties resolve their current labour issues.

And at present that seems like it could take quite some time.

So on the eve of the lockout Bradford William Davis tweeted a link to an article he wrote, alas no subscription as aforementioned, but he did share one of the graphics therein.

Those are a lot of blue balls…

We have a basic dot plot charting the weight of the centre of baseballs, sorted by the month of game from which they were pulled.

The designer made a few interesting choices here. First, typographically, we have a few decisions around the type. I would have loved to have seen a bit of editing or design to eliminate the widow at the end of the graphic’s subtitle, that bit that just says “(blue)”. Do the descriptors in parentheses even need to be there when the designer included a legend immediately below? I find that one word incredibly distracting.

On the other hand, the designer chose to use a thin white outline around the text on the plot. Normally I’d really like this choice, because it can reduce some of the issues around legibility when lines intersect text, especially when they are the same colour. Here, however, the backgrounds are not white. I would have tried, for the top, using that light blue instead of white as the stroke for the outside of the letters. And on the bottom I would have tried the light pink. That would probably achieve the presumed desired effect of reducing the visual interference unintentionally created by the white. I also would have moved the top label up so it didn’t overlay the top dot.

As far as the dot plot itself goes, that works fine. I wonder if some transparency in the dots would have emphasised how many dots sit atop each other. Or maybe they could have clustered, but when overlapping moved horizontally off the vertical axis.

Overall this was a really nice graphic with which to end this half of the baseball off season. Hopefully the lockout doesn’t last too long.

Credit for the piece goes to Taylor Tyson.

Data Analysis and Baseball

First, a brief housekeeping thing for my regular readers. It is that time of year, as I alluded to last week, where I’ll be taking quite a bit of holiday. This week that includes yesterday and Friday, so no posts. After that, unless I have the entire week off—and I do on a few occasions—it’s looking like three days’ worth of posts, Monday through Wednesday. Then I’m enjoying a number of four day weekends.

But to start this week, we have Game 6 of the World Series tonight between the Atlanta Braves and the Houston Astros. That should be the Braves vs. the Red Sox, but whatever. If you want your bats to fall asleep, you deserve to lose. Anyways, rest in peace, RemDawg.

Yesterday the BBC posted an article about baseball, which is weird in the first place because baseball is far more an American sport, played in relatively few countries. Here’s looking at you, Japanese gold medal for the sport earlier this year. Nevertheless I fully enjoyed having a baseball article on the BBC homepage. But beyond that, it also combined baseball with history and with data and its visualisation.

You might say they hit the sweet spot of the bat.

There really isn’t much in the way of graphics, because we’re talking about work from the 1910s. So I recommend reading the piece, it’s fascinating. Overall it describes how Hugh Fullerton, a sportswriter, determined that the 1919 White Sox had thrown the World Series.

Fullerton, long story short, loved baseball and he loved data. He went to games well before the era of Statcast and recorded everything from pitches to hits and locations of batted balls. He used this to create mathematical models that helped him forecast winners and losers. And he was often right.

For the purposes of our blog post, he explained in 1910 how his system of notations worked and what it allowed him to see in terms of how games were won and lost. Below we have this screen capture of the only relevant graphic for our purposes.

Grooves on the diamond

In it we see the areas where the batter is likely safe or out depending upon where the ball is hit. Along the first and third base foul lines we see thin strips of what all baseball fans fear: doubles or triples down the line. If you look closely you can see the dark lines become small blobs near home plate. We’ve all seen those little tappers off the end of the bat that die, effectively a bunt.

Then in the outfield we have the two power alleys in right- and left-centre. When your favourite power hitter hits a blast deep to the outfield for a home run, it’s usually in one of those two areas.

We also have some light grey lines, which are more where batted balls are going to get through the infielders. We are talking ground balls up the middle and between the middle infielders and the corners. Of course this was baseball in the early 20th century. And while, yes, shifting was a thing, it was nowhere near as prevalent. Consequently defenders were usually lined up in regular positions. These correspond to those defensive alignments.

Finally the vast majority of the infield is coloured another dark grey, representing how infielders can usually soak up any groundball and make the play.

The whole article is well worth the read, but I loved this graphic from 1910 that explains (unshifted) baseball in the 21st century.

Credit for the piece goes to Hugh Fullerton.

Low Expectations

Today the 2021 Major League Baseball season begins its playoffs. Tomorrow we get the Los Angeles Dodgers and the St. Louis Cardinals. That the Dodgers, the team with the second-best record in all of baseball, need to play a one-game play-in is dumb, but that is a subject for perhaps another post. Tonight, however, is the American League (AL) Wildcard game and it features one of the best rivalries in baseball if not American sports: the Boston Red Sox vs. the New York Yankees.

Full disclosure, as many of you know, I’m a Sox fan and consider the Yankees the Evil Empire. But at the beginning of the year, the consensus around the sport was that the Yankees would win first place in their division and be followed by the Tampa Bay Rays or the Toronto Blue Jays. The Red Sox would place fourth and the lowly Baltimore Orioles fifth. The Red Sox, as the consensus went, were, after gutting their team of top-flight talent and a no-good, rotten, despicable 2020 showing, nowhere near ready to reach the playoffs. The Yankees were an unstoppable offensive juggernaut.

When the 2021 season ended Sunday night, as the dust around home plate settled, the Rays dominated the AL East to take first. But it was the Red Sox that finished second and the Yankees who took third. Whilst the two teams had the same record, in head-to-head match-ups the Red Sox won more games than the Yankees, 10–9. Not bad for a team that everyone thought couldn’t make the playoffs and would be in fourth place.

That got me thinking though, how wrong were our expectations? After doing some Googling to find individual reports and finding a Red Sox twitter account (@RedSoxStats) that captured as many preseason forecasts as he could, I was ready to make a chart. The caveat here is that we don’t have data for all beat writers, who cover the Red Sox exclusively or almost exclusively on a daily basis, or even national media writers, who cover the Red Sox along with the rest of the sport and its teams. For example, ESPN polled 37 of its writers, but all we know is that 0 of 37 expected the Red Sox to make the playoffs. I don’t have a single estimate for the number of wins, which obviously determines who gets into said playoffs, for those 37 forecasts. Others, like CBS Sports, broke down each of their five writers’ rankings for the division and all five had the Red Sox finishing fourth. But again, we don’t have numbers of wins. So in a sense, if we could get numbers from back in the winter and early spring, this chart would look even crazier with the Red Sox being even more outperform-ier than they do here.

Dirty water

We should also remember that during September, in the lead-up to the playoffs, the Red Sox were struggling with a Covid-19 outbreak that put nearly half their starting roster on the Injured List (IL). The Sox had the backups to the backups starting alongside the backups, some of whom then also went on the IL with Covid-19 leading to signings of players who, despite being integral to the September success, are not eligible to play in the playoffs due to when they signed. José Iglesias brought some 2013 magic to be sure. Earlier in the year, MLB would postpone games when significant numbers of players were unavailable, but the Red Sox, for whatever reason, had to play every game. And there were instances where players started the game, but in the middle of the game their tests came back positive and they had to be removed from the field in the middle of the game.

I’m not certain where I stand on how much managers influence the win-loss record in baseball. But if the Sox manager, Alex Cora, doesn’t at least get some nods for being manager of the year, I’ll be truly shocked.

The Red Sox are not a great team. This is not the 2018 behemoth, but rather an early rebuild for a hopefully competitive team in 2023. Their defence is not great. They lack depth in the rotation and the bullpen. I, for one, never doubted their offence—2020 surely had to have been a pandemic fluke. But I had serious questions about their starting rotation. Ultimately the rotation proved itself to be…adequate. And while they played through Covid-19 and kept their heads above water in September, the last few weeks were, at times, hard to watch. The Yankees swept them at Fenway, site of tonight’s game, just last weekend. Of late, the Yankees have been the better team. And all year long, the Red Sox played less competitively than I’d like against the other teams that made the playoffs.

I don’t expect them to win let alone make the World Series, but nobody expected them to be here anyway. Maybe they still have a few more surprises in them. After all, anything can happen in October baseball.

Credit for the piece is mine.