Those Are Some Heavy Balls

Unfortunately, I don’t subscribe to Business Insider, but I saw this graphic on Twitter and felt the need to share it, primarily because baseball will almost certainly stop at midnight, when the team owners impose a lockout (as opposed to the players going on strike). With that, baseball will be on hold until the two parties resolve their current labour issues.

And at present that seems like it could take quite some time.

So on the eve of the lockout, Bradford William Davis tweeted a link to an article he wrote. As mentioned, I don’t have a subscription, but he did share one of the graphics from it.

Those are a lot of blue balls…

We have a basic dot plot charting the weight of the centres of baseballs, sorted by the month of the game from which they were pulled.

The designer made a few interesting choices here. First, the typography. I would have loved to see a bit of editing or design to eliminate the widow at the end of the graphic’s subtitle, the bit that just says “(blue)”. Do the descriptors in parentheses even need to be there when the designer included a legend immediately below? I find that one word incredibly distracting.

On the other hand, the designer chose to put a thin white outline around the text on the plot. Normally I’d really like this choice, because it can reduce legibility issues when lines intersect text, especially when they are the same colour. Here, however, the backgrounds are not white. For the top, I would have tried using that light blue instead of white as the stroke around the letters, and on the bottom the light pink. That would probably achieve the presumably desired effect of reducing visual interference, without the interference the white unintentionally creates. I also would have moved the top label up so it didn’t overlay the top dot.

As far as the dot plot itself goes, it works fine. I wonder if some transparency in the dots would have emphasised how many of them sit atop each other. Or the dots could have been clustered so that, where they overlap, they move horizontally off the vertical axis, as in a beeswarm plot.
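A minimal sketch of the clustering idea, assuming Python and a made-up overlap tolerance: each dot gets a horizontal offset that alternates right and left of the centre line whenever it lands in the same weight bin as earlier dots. The names `swarm_offsets`, `bin_width` and `step` are hypothetical illustrations, not anything taken from the original graphic.

```python
from collections import defaultdict


def swarm_offsets(values, bin_width=0.01, step=1.0):
    """Return a horizontal offset for each value so overlapping dots spread out.

    Values falling in the same bin of width `bin_width` (an assumed tolerance
    for "overlapping") are pushed sideways in the order 0, +step, -step,
    +2*step, -2*step, ... around the centre line.
    """
    placed = defaultdict(int)  # number of dots already placed in each bin
    offsets = []
    for v in values:
        key = round(v / bin_width)
        k = placed[key]
        placed[key] += 1
        magnitude = (k + 1) // 2        # 0, 1, 1, 2, 2, ...
        sign = 1 if k % 2 else -1       # odd k goes right, even k goes left
        offsets.append(sign * magnitude * step if k else 0.0)
    return offsets
```

Adding these offsets to each month’s x-position and plotting with matplotlib’s `Axes.scatter` would give the clustered look; that same function’s `alpha` argument would cover the transparency idea.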

Overall this was a really nice graphic with which to end this half of the baseball offseason. Hopefully the lockout doesn’t last too long.

Credit for the piece goes to Taylor Tyson.

Data Analysis and Baseball

First, a brief housekeeping note for my regular readers. It is that time of year, as I alluded to last week, when I’ll be taking quite a bit of holiday. This week that includes yesterday and Friday, so no posts. After that, unless I have the entire week off (and I do on a few occasions), it’s looking like three days’ worth of posts, Monday through Wednesday. Then I’m enjoying a number of four-day weekends.

But to start this week, we have Game 6 of the World Series tonight between the Atlanta Braves and the Houston Astros. That should be the Braves vs. the Red Sox, but whatever. If you want your bats to fall asleep, you deserve to lose. Anyways, rest in peace, RemDawg.

Yesterday the BBC posted an article about baseball, which is weird in the first place, because baseball is far more of an American sport and is played in relatively few countries. Here’s looking at you, Japanese gold medal for the sport earlier this year. Nevertheless, I thoroughly enjoyed having a baseball article on the BBC homepage. But beyond that, it also combined baseball with history and with data and its visualisation.

You might say they hit the sweet spot of the bat.

There really isn’t much in the way of graphics, because we’re talking about work from the 1910s. So I recommend reading the piece; it’s fascinating. Overall it describes how Hugh Fullerton, a sportswriter, determined that the 1919 White Sox had thrown the World Series.

Fullerton, long story short, loved baseball and he loved data. He went to games well before the era of Statcast and recorded everything from pitches to hits and locations of batted balls. He used this to create mathematical models that helped him forecast winners and losers. And he was often right.

For the purposes of our blog post, he explained in 1910 how his system of notations worked and what it allowed him to see in terms of how games were won and lost. Below we have this screen capture of the only relevant graphic for our purposes.

Grooves on the diamond

In it we see the areas where the batter is likely safe or out depending upon where the ball is hit. Along the first and third base foul lines we see thin strips of what all baseball fans fear: doubles or triples down the line. If you look closely you can see the dark lines become small blobs near home plate. We’ve all seen those little tappers off the end of the bat that die, effectively a bunt.

Then in the outfield we have the two power alleys in right- and left-centre. When your favourite power hitter hits a blast deep to the outfield for a home run, it’s usually in one of those two areas.

We also have some light grey lines, which mark where batted balls are more likely to get through the infielders: ground balls up the middle and between the middle infielders and the corners. Of course, this was baseball in the early 20th century. And while, yes, shifting was a thing, it was nowhere near as prevalent. Consequently, defenders were usually lined up in their regular positions, and these lanes correspond to those defensive alignments.

Finally, the vast majority of the infield is coloured another dark grey, representing how infielders can usually soak up any ground ball and make the play.

The whole article is well worth the read, but I loved this graphic from 1910 that still explains (unshifted) baseball in the 21st century.

Credit for the piece goes to Hugh Fullerton.

Low Expectations

Today the 2021 Major League Baseball season begins its playoffs. Tomorrow we get the Los Angeles Dodgers and the St. Louis Cardinals. That the Dodgers, the team with the second-best record in all of baseball, need to play a one-game play-in is dumb, but that is a subject for perhaps another post. Tonight, however, is the American League (AL) Wildcard game, and it features one of the best rivalries in baseball if not American sports: the Boston Red Sox vs. the New York Yankees.

Full disclosure, as many of you know, I’m a Sox fan and consider the Yankees the Evil Empire. But at the beginning of the year, the consensus around the sport was that the Yankees would win their division, followed by the Tampa Bay Rays or the Toronto Blue Jays. The Red Sox would place fourth and the lowly Baltimore Orioles fifth. As the consensus went, the Red Sox, after gutting their team of top-flight talent and a no-good, rotten, despicable 2020 showing, were nowhere near ready to reach the playoffs. The Yankees, meanwhile, were an unstoppable offensive juggernaut.

When the 2021 season ended Sunday night, as the dust around home plate settled, the Rays had dominated the AL East to take first. But it was the Red Sox who finished second and the Yankees who took third. Whilst the two teams had the same record, the Red Sox won the head-to-head match-ups, 10–9. Not bad for a team that everyone thought couldn’t make the playoffs and would finish in fourth place.

That got me thinking, though: how wrong were our expectations? After some Googling to find individual reports, and after finding a Red Sox Twitter account (@RedSoxStats) that captured as many preseason forecasts as he could, I was ready to make a chart. The caveat here is that we don’t have data for all the beat writers, who cover the Red Sox exclusively or almost exclusively on a daily basis, or even for the national media writers, who cover the Red Sox along with the rest of the sport and its teams. For example, ESPN polled 37 of its writers, but all we know is that 0 of 37 expected the Red Sox to make the playoffs. I don’t have a single estimate of the number of wins, which obviously determines who gets into said playoffs, for those 37 forecasts. Others, like CBS Sports, broke down each of their five writers’ rankings for the division, and all five had the Red Sox finishing fourth. But again, we don’t have numbers of wins. So in a sense, if we could get the numbers from back in the winter and early spring, this chart would look even crazier, with the Red Sox looking even more outperform-ier than they do here.

Dirty water

We should also remember that during September, in the lead-up to the playoffs, the Red Sox were struggling with a Covid-19 outbreak that put nearly half their starting roster on the Injured List (IL). The Sox had the backups to the backups starting alongside the backups. Some of them then also went on the IL with Covid-19, which led to the signing of players who, despite being integral to the September success, are not eligible to play in the playoffs because of when they signed. José Iglesias brought some 2013 magic, to be sure. Earlier in the year, MLB would postpone games when significant numbers of players were unavailable, but the Red Sox, for whatever reason, had to play every game. There were even instances where players started a game, but their tests came back positive partway through and they had to be removed from the field mid-game.

I’m not certain where I stand on how much managers influence the win-loss record in baseball. But if the Sox manager, Alex Cora, doesn’t at least get some nods for being manager of the year, I’ll be truly shocked.

The Red Sox are not a great team. This is not the 2018 behemoth, but rather an early rebuild for a hopefully competitive team in 2023. Their defence is not great. They lack depth in the rotation and the bullpen. I, for one, never doubted their offence—2020 surely had to have been a pandemic fluke. But I had serious questions about their starting rotation. Ultimately the rotation proved itself to be…adequate. And while they played through Covid-19 and kept their heads above water in September, the last few weeks were, at times, hard to watch. The Yankees swept them at Fenway, site of tonight’s game, just last weekend. Of late, the Yankees have been the better team. And all year long, the Red Sox played less competitively than I’d like against the other teams that made the playoffs.

I don’t expect them to win tonight, let alone make the World Series, but nobody expected them to be here anyway. Maybe they still have a few more surprises in them. After all, anything can happen in October baseball.

Credit for the piece is mine.

Sankey Shows Starters Sticking with Sticky Stuff

I spent way more time trying to craft that title than I’d like to admit. Headline writing is not easy.

Quick little piece today about Sankey diagrams. I love them. You often see them described as flow diagrams (as in the article we’ll get to shortly), but they are really a subset of flow diagrams. What sets Sankeys apart is their use of proportional strokes, or widths, of the directional arrows to indicate each share of the movement.

The graphic in question comes from an article about Major League Baseball’s (MLB’s) problem with “sticky stuff”. For the unfamiliar, sticky stuff is a broad term for foreign substances pitchers put on their fingers to get a better grip on the baseball. A better grip makes it easier to create movement like sliding and sinking in a pitch, and therefore makes it harder for a hitter to hit it. Back when I was a wannabe pitcher, it was spitballs and scuff balls. Now professionals use things like Spider Tack, substances that let you put the ball in the palm of your hand, turn your hand over to face the ground, and not have the ball fall out.

So the graphic looks at starting pitchers and how the spin rate of their fastballs, the quantifiable measure impacted by sticky stuff, has changed since MLB began enforcing its ban on sticky stuff. (The ban had actually long been in place, see spitballs for example, but had rarely been enforced.)

Showing a small number of pitchers have managed to increase their fastball spin rates

This graphic explores how the spin rates of 223 pitchers changed in the first two months after the policy change was announced, and then over the nearly one month after that period.

Sankeys use proportional width not just to show movement from category to category but also, importantly, to show what share of each category moves to which other category. For example, we can see a little less than half of starting pitchers saw their spin rates stay the same after the policy change, and another almost equal group saw their spin rates decrease. That decrease is probably a sign they were using sticky stuff and stopped lest they get caught.

But we can then see that, of that group, maybe 1/6 saw their spin rates increase again over the last month. That could be a sign that they have found a way to evade the ban. Though it could also be they’ve found new ways of gripping or throwing the baseball. Spin rate alone does not prove sticky stuff usage.

Similarly, we can see that in the group that maintained their spin rate, a small group has found a way to increase it. Finally, a small fraction of the original 223 saw their spin rates increase and a fraction of that group has seen their spin rates increase even further.
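To make the proportional-width idea concrete, here is a minimal Python sketch of how one column-to-column stage of a Sankey could be tallied: each pitcher contributes one (before, after) pair, the band width is the count of that pair, and the share is that count divided by everyone who started in the same category. The function name and the category labels are hypothetical; this is not the data or code behind the graphic.

```python
from collections import Counter


def sankey_flows(transitions):
    """Tally one stage of a Sankey diagram.

    `transitions` is a list of (before, after) category pairs, one per pitcher.
    Returns {(before, after): (count, share_of_before)}; a band's drawn width
    would be proportional to `count`.
    """
    flow_counts = Counter(transitions)
    source_totals = Counter(before for before, _ in transitions)
    return {
        (before, after): (count, count / source_totals[before])
        for (before, after), count in flow_counts.items()
    }
```

With six pitchers in a “decrease” group, one of whom moves to “increase”, that flow comes out as a count of 1 and a share of 1/6, the kind of fraction read off the graphic above.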

This was just a really nice graphic to see in an article from the Athletic about sticky stuff and its potential return.

Credit for the piece goes to Max Bay.

Ranking the Red Sox Prospects

My regular readers will know that I am a fan of the Boston Red Sox, an American baseball team located in Boston, Massachusetts. I would consider myself a bit more involved than a casual fan in that I keep tabs on the team’s prospects.

For those unfamiliar with baseball, the sport works by keeping development pipelines of young talent fed through what we call a farm system. In essence a number of teams owned or contractually linked to the Major League team develop young players until they are ready to debut at the sport’s highest level.

Very few of the total number of players in the system will ever get called up to “the Show”. In fact, in the history of the sport only 20,000 men have reached that level. Most of the rest will peak somewhere in the Minor Leagues. Most who reach the Majors will at some point have been prospects. And so keeping tabs on your team’s prospects and farm system sets one apart, in my mind, from the casual fan who simply knows a few of the team’s star players and enjoys a hot dog and a pint of beer at the stadium a few times a summer.

Red Sox fans are fortunate to have a website dedicated to coverage of Boston’s farm system, SoxProspects.com. They rank the system’s Top 60 prospects using their own methodology and research and publish the list online for fans like myself to enjoy.

Last week they updated their rankings. Long story short, the pandemic has impacted baseball and the development of young players. Consequently, the rankings changed significantly. What I really wanted to see was a visualisation of all the changes. So I took it upon myself to do just that using their data.

Hopefully we get a good player or two out of this

Now, if you also happen to be a Red Sox fan, I highly recommend their site. It’s fantastic. Normally I would take the train up to Trenton and see the Portland affiliate when it played there, but the Trenton team no longer exists. I’m not sure when I’ll get to see a Red Sox minor league team again. But hopefully sometime soon, because there look to be some good players coming up.

So I’ll be looking forward to, hopefully, a good run of contending teams in the coming years.

Credit for the piece is mine.

Baseball’s Injury Problem

Last week, Ken Rosenthal of the Athletic wrote an article examining the recent spate of injuries in Major League Baseball. For those interested in the sport, the article is well worth the read. For the unfamiliar, baseball played only about a third as many games as usual last year due to Covid-19. This year, pitcher after pitcher seems to be falling prey to arm troubles. Position players are straining hamstrings, quads, and other muscles I’ve never heard of, let alone used, over the last year. And joking aside, therein is thought to be the problem.

The evidence, in part, shows that we are seeing an increase in the number of injuries. But 2020 may not be as much of a problem as youngsters throwing baseballs near 100 mph. But I digress. The article contained a table detailing the number of injuries to certain body parts in the first month (April) of the season in both 2021 and 2019, the last comparable season given Covid-19.

To be fair, the table was nice, but in my post-second-dose exhaustion last weekend, I sketched out some things and decided to turn them into a proper post.

Ouch.

Credit for the piece is mine.

Expansion Teams in Baseball

I was not planning on posting this today, because I was—am?—still working on it. But there was some baseball news last night that prompted me to export what I had to try and get this live.

For a little while now I’ve been wondering why a number of baseball stars, albeit in their later years, are still looking for employment. Some are pretty obvious in that they are facing legal troubles. Some may have high demands that ball clubs are not willing to meet. Some may have reasonable demands but the clubs are just being incredibly cheap. Or it may be none of those. Or some combination of those. But when you see some of the players some teams put on the field each night, you can’t tell me some of these free agents wouldn’t be better options.

Separately, I also tend to think baseball needs to expand and add some new clubs. But they won’t until the Oakland Athletics and Tampa Bay Rays resolve their stadium issues.

But what if…

Well, a normal expansion would add two teams to keep an even balance. The new teams would likely use some kind of draft to select players from the rosters of other teams, with a certain number of players almost certainly protected. But what if we just used those unsigned ball players?

Anibal Sanchez is the guy messing this up. He’s been a free agent for some time now but is reportedly going to sign by the end of this week, perhaps today. So with him and everyone else, could we field two expansion teams?

Kinda, yeah.

First up, the Charlotte Piedmonters.

The Charlotte Piedmonters could also be looking for a new name.

Not a great team, nor would we expect it to be, as all the really good free agents have already been signed. But these former stars, award winners, and fan favourites may have just enough left in the tank to make for some competitive games if all goes well. My readers who happen to be fellow baseball fans will probably recognise most of these names, though I’ll admit a number of the relief pitchers are new to me. I can fill basically every position but centre field. But you could probably get somebody from an independent league or an international league, or just convert somebody.

I used projected Wins Above Replacement (WAR) to determine how good the players would be. For non-baseball fans, WAR estimates how many wins a player adds relative to a replacement-level player, the kind of readily available minor leaguer or bench player a team could pick up at any time. Somebody with a value of 0 to 1 is a scrub or bench player: sub in any replacement-level ballplayer and you wouldn’t know the difference. 2s and 3s are solid role-playing guys, but not likely stars. Stars get into the picture around 4, and your best players are probably 5 to 6 or higher.
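As a rough illustration of that rule of thumb, here is a hypothetical Python helper that buckets a projected WAR value into the tiers described above. The exact boundaries are my assumption: the rule of thumb skips the 1 to 2 range, so this sketch lumps it in with bench players, and anything below 0 is treated as below replacement level.

```python
def war_tier(war):
    """Bucket a projected WAR value into the post's rough tiers.

    Boundaries are assumed from the rule of thumb in the text; values between
    1 and 2 (not covered there) are grouped with bench players.
    """
    if war < 0:
        return "below replacement"
    if war < 2:
        return "bench / replacement level"
    if war < 4:
        return "solid role player"
    if war < 5:
        return "star"
    return "among the best players"
```

Under these assumed cut-offs, a 1.4 projection, the highest on the Charlotte roster, lands in the bench tier.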

In Charlotte, nobody has a WAR higher than Rick Porcello’s 1.4. In other words, he’s better than a replacement-level pitcher, but not by much. Tyler Flowers: better than a replacement-level catcher, but not by much. Homer Bailey: barely better than a replacement-level starting pitcher. Everyone else, generally, you could sub out and not know the difference. But, crucially for our purposes, none of them are below replacement level. Some players who are below it are still on the market, but I didn’t assign them to Charlotte.

Now if Charlotte gets a team, so does Portland, Oregon: the Portland Lumberjacks.

Again, I’m open to name suggestions.

Here you can see Anibal Sanchez as the third man in the rotation. You can also see that the rotation here is the weakest part. For Charlotte you could get away with a bullpen game every five days. But two bullpen days? Well, take a look at the Boston Red Sox in 2020 and that pitching dumpster fire and you’ll see what having only two or three starters can do. (Though the relief starters they did use were all worse than the people on these lists, which just makes my point that there are talented if not star-level players available.)

Neither of these teams would be good. You can imagine a team like Charlotte getting beaten almost every night in the AL East, except perhaps by Baltimore. The NL East might be a bit easier. And Portland in the NL West would similarly be a punching bag for everyone except, probably, Colorado. But drop either into the AL or NL Central and who knows.

Two teams is clearly a stretch. So what if we just made one? What if we brought back the Montreal Expos? Sure, it messes up the schedule, but we get to pick the best players from Charlotte and Portland.

No new name needed.

The result is a team that is significantly improved. That doesn’t mean very good. These Expos wouldn’t make the playoffs. But the rotation is full of guys who could be, at best, solid middle- to, more likely, back-end starters. The lineup, well, the lineup would still be mostly replacement level players, a.k.a. scrubs, with two exceptions. But with past track records, it’s not impossible to imagine a few of these players having a better than projected year.

On paper, they still wouldn’t be as good as the worst team in baseball (by WAR), the Pirates. But Pittsburgh also doesn’t have a centre fielder, so…

Anyway, I was going to try and do some more analysis beyond using WAR, but I wanted to get this out before Sanchez signed this week.

I also got to add Oliver Perez, who, despite having a good year, was released by Cleveland today. Boston needs a solid lefty reliever for the middle innings, and I hope they pick up Perez and option Josh Taylor down to Worcester.

Credit for the piece is mine.

The Super Short European Super League

Sunday night, news broke that a number of European football clubs were creating a rogue league, the European Super League. My British and European readers—and Americans who follow football—will know the names of Manchester United, Liverpool, AC Milan, Juventus, Real Madrid, and the others.

To put this in perspective for my American readers, imagine the Yankees, Dodgers, Red Sox, Astros, Padres, Mets, Cardinals, Phillies, Angels, and Nationals saying that they were leaving Major League Baseball to form their own new baseball league. That they were doing so to “save the sport”. But in so doing, they would also guarantee that they all make the playoffs every year.

My frequent readers and those who know me will know I’m a fan of the Boston Red Sox. I should point out that John Henry owns both the Red Sox and Liverpool through his company, Fenway Sports Group.

Of course, the analogy doesn’t quite hold up, because there are some significant differences between American sports and European football. Relegation is a big one. Personally, I wish American sports had some way of using relegation to incentivise teams to not intentionally suck.

The basic premise of relegation is this. Take English football. You have four levels of play, and in theory any team can exist at any level. Each year, the worst teams at each level move down one whilst the best teams move up. And at the top level, the top teams get to compete in lucrative European-wide competitions. That is a bit simplistic, but imagine that at the end of last year, the Pirates, Rangers, Tigers, and Red Sox became AAA minor league teams and the four best AAA minor league teams became MLB teams. MLB teams would theoretically do everything they could to stay in MLB and not drop to AAA, because that would mean a loss of money. After all, the Yankees would no longer be heading to Fenway, nor the White Sox to Detroit. Would seeing the Detroit Tigers play the Woo Sox really be worth the ticket prices you pay at Comerica Park?
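The mechanism in that thought experiment boils down to a simple swap, sketched here in Python under the same simplification the paragraph makes: a fixed number of teams move each way, and both standings are already ordered best to worst. Real leagues differ in how many teams move, and some decide a promotion place through playoffs. The function name and the minor league team names used with it are hypothetical.

```python
def promote_and_relegate(top_level, lower_level, n):
    """Swap the bottom `n` teams of the top level with the top `n` of the level below.

    Both arguments are standings ordered best to worst. Returns the new
    memberships of each level; order within a level is not meaningful.
    """
    if n < 0 or n > min(len(top_level), len(lower_level)):
        raise ValueError("n must be between 0 and the size of the smaller level")
    relegated = top_level[len(top_level) - n:]  # avoids the top_level[-0:] pitfall
    promoted = lower_level[:n]
    new_top = top_level[:len(top_level) - n] + promoted
    new_lower = relegated + lower_level[n:]
    return new_top, new_lower
```

Calling it with n=4 on the final MLB standings and a AAA standings list would reproduce the post’s scenario, with the Pirates, Rangers, Tigers, and Red Sox swapping places with the four best AAA clubs.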

But that’s not how American sports work. And so a few American owners, namely those of Manchester United, Arsenal, and Liverpool, want to ensure a steady stream of money. By creating their own league where their teams cannot be relegated, they guarantee that revenue stream.

In other words, this is all about the owners of these Super League teams making even more money.

During the last year, teams have been hurting without fans in attendance. And that gets us to why I can write this up: the BBC, in an article about this new league, addressed the fact that most of these teams are heavily in debt.

This graphic, however, is misleading. Look at Liverpool. There is no available data on how much financial debt the club holds, so why is it placed between Chelsea and Manchester City? It could well have more debt than Tottenham. Liverpool should really be left off the chart and mentioned in the note instead, because its placement suggests, misleadingly, that it has little debt compared with the other 11 clubs, when that may well not be the case.

From a design standpoint, I’m also not clear on why the x-axis line extends beyond the labels for £-200m and £600m.

I’m not going to touch all the data labels. That’s for another piece I’ve been working on off and on for a little while now.

At this point I should note that I was going to post this article later, but in the last 18 hours or so the whole thing has fallen apart, as the English teams, followed by the others, have dropped out under immense pressure from the sport and their fans. To bring back my analogy above, imagine MLB retaliating by saying that if those teams created their own league, their players would not be allowed to play in any other games and the teams would be locked out of all other competitive baseball. It’s a mess.

Credit for the piece goes to the BBC graphics department.

The Armchair General…

Manager.

Of the New England Patriots.

As many of my long-term readers know, I am really only a one sport kind of guy. And that sport is baseball. American football, well, I’ve seen one match live and in person and it was…boring. But it’s a big deal in America. And this is the time of the year when teams begin signing free agents.

I happened to be reading the Boston Globe for news on the Red Sox, my team, when I saw a link to this interactive tool allowing users to build their own roster with free agent signings.

Go Pats

Conceptually, the piece is fairly simple. There is a filterable list of free agents, broken out by whether their forecast signing values fall into the high, middle, or low end of the range. There is also a draft pick.

I root for the Patriots. However, if you asked me to name a single player on last season’s roster, I could only name Cam Newton. Apparently he wasn’t great. I really and truly don’t follow the sport.

The piece displays the available free agents, along with those no longer available. (The piece does, though, offer you the option to go back to the beginning of free agent season and pretend reality didn’t happen.)

I have no idea who any of these people are.

I went through and began semi-randomly picking names. I’d heard of some of them; others were blind choices. Once you’ve made your selections within the budget, you can choose a draft pick. Your picks all appear in a list to the right, each with a small X button to remove it.

Nope, not a clue.

Once you’ve confirmed your choices, you’re taken to a screen that reviews your selections. You can either tweet them to the world (which I did not do) or start over. I could start over, but I doubt I’d do any better.

I hope I did at least okay.

Overall, the piece felt intuitive and I never had any issues selecting my free agents. Of course, it would help if I knew anything about the sport. But that’s a user problem.

Credit for the piece goes to Ben Volin.

Farewell, Cardboard Cutouts

In 2020, baseball did not permit fans to attend regular season matches. (They changed this for the playoffs.) Instead, many stadiums opted for cardboard cutouts: fans often paid a fee and submitted a picture that the team printed on a cutout. Like so many things we will say about 2020, it was surreal.

But in Philadelphia at least, cardboard cutouts are out, and human fans are in. The state government in Harrisburg and the city government will allow 20% capacity at outdoor stadiums and 15% for indoor stadiums.
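The arithmetic behind caps like these is simple, and the same percentage drives how many squares a unit chart should colour in. Here is a hedged Python sketch, assuming whole-number percentages (20 and 15 here) and rounding the attendance cap down so it is never exceeded. The article’s stadium capacities aren’t given in this post, so any figures you pass in are your own, and both function names are hypothetical.

```python
def permitted_attendance(capacity, percent):
    """Seats allowed under a cap of `percent` (a whole number, e.g. 20) of capacity.

    Integer arithmetic avoids floating-point surprises; the result rounds down
    on the assumption that a cap should never be exceeded.
    """
    return capacity * percent // 100


def filled_squares(percent, total_squares=100):
    """Number of squares to colour in a unit chart of `total_squares` squares.

    Rounds half up, so 15% of 50 squares (7.5) becomes 8.
    """
    return (total_squares * percent + 50) // 100
```

On a 10-by-10 grid, a 20% cap means colouring 20 squares and a 15% cap means 15, which is roughly what the Inquirer’s graphic does with its spectator squares.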

The Philadelphia Inquirer created a small graphic for its homepage to capture this news.

I cannot wait to safely attend a live match. C’mon, vaccines.

I intentionally included other site elements in the cropping to show how the graphic fits into the broader site. The extra white space around the image helps focus attention on the datagraphic over the numerous photographic elements for each article. Clicking on other tabs in the section brings up full-component-width graphics.

To the graphic itself.

Still can’t wait…

My guess would be that this was a quick-turnaround piece. There are a few things going on here. The first and most obvious: the squares as spectators. I confess this confused me at first. I was not entirely certain what the coloured squares meant (they mean in-person attendees). Was this supposed to be an entire stadium? Or was it a representative seating section?

The quick turnaround becomes important, because this is probably how I would have first conceptualised the graphic. But, with more time, I may have attempted to incorporate the shape of the playing field, be it a baseball diamond or basketball court, or hockey rink—I know all the sports terms!—and surrounded them with shapes representing a certain number of spectators. Squares might not work in that case because of the curves. Circles? Hexagons? Regardless of the shape, the filling of occupied seats would be the same as here, but it would perhaps be clearer to some readers, i.e. me.

Second, we get to the table below the graphics. Here we have a subtle design decision. Note that here the designer greyed out the normal capacity figures. The new figures at that 20% and 15% rates are what appear in black bold text. My usual instinct is to use typographic weight, regular vs. bold, in these situations. But the grey here works equally well.

Third, and this also involves the table, we have the first-game data. We talked about the comparison of the capacity and permitted attendance. But I wonder, did the date of the first game with fans need to be displayed in the same way as the permitted attendance? The news isn’t the dates of the first games, at least not as I read it, but the numbers of attendees. So maybe I would have reduced the type size for the date of the first game. Or, conversely, set the new attendance figures in a larger point size.

Overall, I enjoyed seeing this news presented visually, even if I was left confused.

Credit for the piece goes to John Duchneskie.