The Batting Order: Where Managers Cost Their Teams Wins

The managers of MLB seem to have it down to an art.

  • 1st: Speedy guy who can steal bases.
  • 2nd: Guy who can drop a bunt or draw a walk.
  • 3rd: Your best pure hitter.
  • 4th: Cleanup man; your best power hitter.
  • 5th: Complete the “heart” of the order; a good power hitter.
  • 6th-8th: Your remaining position players, in descending order of quality.
  • 9th (NL): Your pitcher.

Unfortunately, the conventional batting order is nothing but tradition, and no one ever seems to question this century-old idea. Have you considered that this might not be the lineup that wins the most baseball games, that is, the one that scores the most runs? Many sabermetric studies have tried to find the truly optimal batting order, based mostly on the two most important hitting stats: OBP (on-base percentage) and SLG (slugging percentage). Regression models show that both of these stats correlate much more strongly with runs scored than BA (batting average) does, even though BA is conventionally the main statistic used to judge a hitter. Once again, this is an archaic tradition that baseball has been slow to abandon. In non-statistical terms, a higher correlation means that a team with a higher OBP or SLG will most likely score more runs, and therefore win more games, over a season than a team that just has a higher BA.
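For readers who want to check the correlation claim themselves, here is a minimal sketch. It computes BA, OBP, SLG, and OPS from season counting stats using the standard formulas, along with a plain Pearson correlation coefficient. The class and function names are illustrative, not from any particular library. No real data is included: you would supply team-season stat lines and run totals yourself, then compare, for example, `pearson(obps, runs)` with `pearson(bas, runs)`.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BattingLine:
    """Season counting stats for one hitter or one team (illustrative)."""
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def ba(self) -> float:
        # Batting average: hits per at-bat.
        return self.h / self.ab

    @property
    def obp(self) -> float:
        # On-base percentage: times on base per (AB + BB + HBP + SF).
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        # Slugging percentage: total bases per at-bat.
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    @property
    def ops(self) -> float:
        # OPS is simply OBP plus SLG.
        return self.obp + self.slg


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient closer to 1 means the stat tracks runs scored more closely. Whether OBP and SLG beat BA in your sample is something the code measures. It does not assume the answer.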

The leveraged out

Back to the batting order: studies have uncovered many different insights that could help optimize a lineup. One approach, described by Sky Kalkman at Beyond the Box Score, looks at how important it is for each spot in the lineup to avoid making outs. Basically, over the course of a season, which spots in the lineup are most crucial to winning in terms of OBP? Which spot’s OBP has the biggest effect on a team’s 162-game success? Here’s a section from his study:

“Another way to look at things is to order the batting slots by the leveraged value of the out. In plain English (sort of), we want to know how costly making an out is by each lineup position, based on the base-out situations they most often find themselves in, and then weighted by how often each lineup spot comes to the plate. Here’s how the lineup spots rank in the importance of avoiding outs: #1, #4, #2, #5, #3, #6, #7, #8, #9”

According to Kalkman’s study, you should fill the lineup spots in the order listed, in descending order of OBP. The leadoff hitter’s OBP has the greatest effect on your team’s total runs, followed by the cleanup hitter’s, then the second hitter’s, and so on, so ordering your lineup this way would be a big step up from relying on traditional rules of thumb. To explain the concept further, and to show one way in which managers have failed to give their teams the best chance to win, Bill Petti of FanGraphs developed the following heat map graphic (darker green meaning higher OBP, and darker red meaning lower OBP):

[Heat map: OBP rank by batting-order slot, by Bill Petti, FanGraphs]

Clearly, for most of the past forty years, managers have been putting their best on-base hitter in the spot that ranks only fifth in OBP importance (the third spot). Conversely, in the spot where OBP matters most (leadoff), managers have tended, on average, to put their third- or fourth-best player at getting on base. As a baseball fan, I find facts like this truly disappointing. And although the leveraged out is one way of optimizing a lineup, there are other important studies to consider when crafting your most successful lineup, which MLB managers also seem to ignore.

Lesson: bat your highest OBP player first, while generally following Kalkman’s list the rest of the way, in order to avoid more costly outs over the course of the season.
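Taken literally, Kalkman’s ranking becomes a simple assignment rule: sort the nine hitters by OBP, then hand them out to the slots in order of out-avoidance importance. The sketch below is the strict version of that rule, while the lesson says to follow the list only "generally". The player names and the dictionary input are hypothetical.

```python
# Lineup slots ranked by the importance of avoiding outs, as quoted from
# Kalkman above: #1, #4, #2, #5, #3, #6, #7, #8, #9.
OUT_AVOIDANCE_PRIORITY = [1, 4, 2, 5, 3, 6, 7, 8, 9]


def order_by_obp(obp_by_player: dict[str, float]) -> list[str]:
    """Return a batting order (slot 1 first) that gives the highest OBP to the
    slot where outs cost the most, the next-highest OBP to the next slot, etc."""
    if len(obp_by_player) != len(OUT_AVOIDANCE_PRIORITY):
        raise ValueError("expected exactly nine hitters")
    # Hitters from best to worst at getting on base.
    ranked = sorted(obp_by_player, key=obp_by_player.get, reverse=True)
    lineup = [""] * 9
    for player, slot in zip(ranked, OUT_AVOIDANCE_PRIORITY):
        lineup[slot - 1] = player
    return lineup
```

With nine hitters ranked A (best OBP) through I (worst), this bats A first, C second, E third, B fourth, D fifth, and F through I in slots 6 to 9.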

Run-scoring potential

The most important thing to do on offense in baseball is to score runs. So why not continue to optimize our lineup to allow us the best chance of scoring the most runs? TangoTiger has developed a run expectancy matrix – predicting the number of runs scored in each of 24 possible situations based on the occupancy of the bases and the number of outs. The data was taken over a span of (coincidentally) 24 years.  What did it find?

[Table: TangoTiger’s run expectancy matrix, expected runs for each base-out state]

The matrix has one column for each number of outs and one row for each arrangement of baserunners. Each decimal number is the expected number of runs scored from that point to the end of the inning. For example, with the bases empty and nobody out, the run expectancy is .477. What does this matrix tell us? For one, it shows that sacrifice bunts, a staple of conventional baseball wisdom, are generally not smart: a man on first with no one out gives you a higher run expectancy than a man on second with one out, and so on. There are other interesting facts hidden in the matrix as well, but we want to see how it affects our optimal batting order.
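The bunt argument comes down to subtracting two cells of the matrix. This minimal sketch computes the change in run expectancy for any play as the value of the new state, minus the value of the old state, plus any runs that scored. It assumes the matrix is stored as a dictionary keyed by (base state, outs). The key encoding and function names are hypothetical. The only matrix value quoted in the text is .477 (bases empty, nobody out), so you would fill in the other cells from Tango’s table.

```python
# Run expectancy keyed by (base state, outs). Base states list the occupied
# bases, e.g. "___" = empty, "1__" = runner on first, "_2_" = runner on
# second, "123" = loaded. Values must be copied from the matrix.
RunExpectancy = dict[tuple[str, int], float]


def re_change(matrix: RunExpectancy,
              before: tuple[str, int],
              after: tuple[str, int],
              runs_scored: int = 0) -> float:
    """Runs gained (positive) or lost (negative) by a single play."""
    # A third out ends the inning, so nothing more is expected to score.
    value_after = 0.0 if after[1] >= 3 else matrix[after]
    return value_after - matrix[before] + runs_scored


def sacrifice_bunt_value(matrix: RunExpectancy) -> float:
    """Successful bunt: runner on first, 0 out -> runner on second, 1 out."""
    return re_change(matrix, ("1__", 0), ("_2_", 1))
```

A negative result from `sacrifice_bunt_value` means that even a successful bunt lowers the number of runs the team can expect to score that inning. As a sanity check, a bases-empty home run with nobody out leaves the state unchanged and adds exactly one run.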

According to Keith Law at ESPN, we can combine the number of plate appearances each lineup spot receives with how often each spot comes up in each base-out situation (each cell in the matrix) to determine which lineup spots carry the most run-scoring value over a season. In other words, we are looking for the spot in the lineup that can produce the most runs by performing well. Which lineup spot is that? The research doesn’t say third, but second: the spot that, as recently as 2012, went to each team’s sixth-best player at not making outs, according to the aforementioned heat map graphic. How can so many teams be willing to hurt their chances of winning by ignoring the facts?
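One way to express the weighting Law describes is to average an event’s run value over the base-out states a given slot actually sees, then multiply by that slot’s season plate appearances. This is a sketch under stated assumptions, not the exact calculation behind the research. The inputs are hypothetical: a slot’s share of plate appearances starting in each state would have to be tabulated from play-by-play data, and the per-state event values would come from the run expectancy matrix.

```python
def slot_run_value(plate_appearances: float,
                   state_freq: dict[tuple[str, int], float],
                   event_value: dict[tuple[str, int], float]) -> float:
    """Expected season run value of one kind of event for one lineup slot.

    plate_appearances: season PAs for the slot.
    state_freq: share of the slot's PAs that begin in each (bases, outs)
        state; the shares should sum to 1.
    event_value: run-expectancy change of the event (e.g. a home run or
        an out) when it happens in that state.
    """
    # Average value per plate appearance, weighted by how often the slot
    # finds itself in each situation.
    per_pa = sum(freq * event_value.get(state, 0.0)
                 for state, freq in state_freq.items())
    return plate_appearances * per_pa
```

To compare two slots, you would evaluate the same events with each slot’s own plate-appearance total and state mix. The slot with the larger gap between good and bad outcomes is where your best hitter does the most good.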

Lesson: bat your best overall hitter – perhaps the one with the highest OPS (OBP plus SLG) or SLG – second in order to drive in more runs over the course of a season.

The myth of the stolen base

One of the most well-known lineup rules is that the top spot in the order (and many times the second spot as well) belongs to speedsters who can steal bases and make things happen on the base paths.  However, managers never actually consider when stolen bases are most valuable.

Consider this simple thought experiment: why do players who can advance bases on their own bat directly in front of the team’s most skilled hitters? If your base stealer reaches base right before the hitters who produce the most extra-base hits, is he getting his full value? A home run from your best power hitter will drive in baserunners no matter which base they’re on, while a single from a less polished hitter won’t score a run unless there are runners in scoring position. It’s the weaker batters toward the end of the lineup who actually need help driving in runs.

Here’s just one example: Dodgers shortstop Dee Gordon currently leads MLB with 46 stolen bases, and manager Don Mattingly bats him leadoff as a result. But he’s probably not the right choice for that spot, since he has ranked anywhere from 3rd to 5th in OBP among the team’s starters this season (and we learned earlier that your best player at getting on base should lead off). Additionally, the players who usually follow him in the lineup, Yasiel Puig, Adrian Gonzalez, and Hanley Ramirez, sport the three best slugging percentages on the Dodgers! Clearly, that trio can produce extra-base hits and drive Gordon in themselves, without him risking an out to take second base, a tradeoff that the run expectancy matrix has already told us is fruitless. The Dodgers’ most common 7th and 8th hitters, Juan Uribe and A.J. Ellis, rank 7th and 20th on the team in SLG, respectively, and Ellis is followed in the lineup by the pitcher, likely the team’s worst hitter. Don Mattingly would probably cringe if he knew how many runs his shortstop could have scored on singles from Uribe and Ellis but never did, and how many times Gordon risked an out (or was actually thrown out) stealing second or third when a Puig or Gonzalez bomb would have driven him in anyway. Hopefully, data will soon surface that shows managers like Mattingly exactly how many runs they’ve given up this way.
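Whether a steal attempt is worth the risk can be read straight off the run expectancy matrix with the standard break-even calculation. Let p be the success rate. The attempt pays off on average only when p times the run expectancy after a success, plus (1 - p) times the run expectancy after a caught stealing, exceeds the run expectancy of staying put. In this sketch, the three inputs are assumed to be matrix cells. For Gordon on first with nobody out, they would be runner on first with 0 out (before), runner on second with 0 out (success), and bases empty with 1 out (caught). The function names are hypothetical.

```python
def steal_break_even(re_before: float, re_success: float, re_caught: float) -> float:
    """Success rate at which attempting the steal neither gains nor loses runs.

    Solves p * re_success + (1 - p) * re_caught = re_before for p.
    """
    return (re_before - re_caught) / (re_success - re_caught)


def steal_expected_gain(p: float, re_before: float, re_success: float,
                        re_caught: float) -> float:
    """Expected change in run expectancy from one attempt at success rate p."""
    return p * re_success + (1 - p) * re_caught - re_before
```

This ignores everything outside the base-out state, such as who is hitting next. That is the article’s point: with a Puig or Gonzalez at the plate, the value of standing on second rather than first is smaller than the matrix average suggests.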

Lesson: in the 5th-6th-7th area of the lineup, bat competent base stealers who can give your weaker hitters opportunities to drive in runs.

So, let’s revisit the batting order and see how managers could actually be pulling their weight in the W/L column.

  • 1st: Your player who’s best at not getting out – a high OBP means more reaching base and more run scoring. Speed is an advantage, but a slow player that can draw walks is better than a recklessly swinging burner.
  • 2nd: Your player with the best ability to drive in runs – weighted situations say that putting your best OPS/SLG batter in this slot will lead to the most total runs for a team over the course of the season.
  • 3rd: One of your middle-of-the-pack hitters – not only is this slot fifth out of nine in terms of most important OBP, but it’s second only to the leadoff spot in terms of plate appearances with the bases empty. The spot conventionally reserved for one of your best RBI guys will have few chances to actually get them!
  • 4th: After your best base-reacher and best overall slugger, your third best hitter should bat cleanup, where reaching base will be very valuable and a high slugging percentage will still provide ample opportunity to drive in baserunners.
  • 5th-9th: Starting with middle-of-the-pack guys that are comparable to your number three hitter, put the rest of your hitters in descending order of quality, as those lower in the lineup will receive fewer plate appearances over time and therefore hurt the team effort less. Put players with the ability to steal bases in the earlier slots here to maximize run-scoring potential.
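The revised order above can be sketched as a small lineup builder. This is one possible reading of the rules, under several assumptions. OPS stands in for "overall quality" (the text names OPS or SLG for the second spot). A hypothetical `can_steal` flag marks competent base stealers. Within slots 5 to 9, stealers are moved ahead of non-stealers while descending OPS order is kept inside each group.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    obp: float
    slg: float
    can_steal: bool = False  # hypothetical flag for a competent base stealer

    @property
    def ops(self) -> float:
        return self.obp + self.slg


def build_lineup(hitters: list[Hitter]) -> list[str]:
    """Return names in batting order (slot 1 first)."""
    if len(hitters) != 9:
        raise ValueError("expected exactly nine hitters")
    pool = list(hitters)
    # 1st: the player best at not making outs.
    leadoff = max(pool, key=lambda h: h.obp)
    pool.remove(leadoff)
    # Everyone else from best to worst overall hitter.
    pool.sort(key=lambda h: h.ops, reverse=True)
    # 2nd: best remaining bat; 4th: next best; 3rd: the next, a
    # middle-of-the-pack hitter; the other five fill slots 5-9.
    second, cleanup, third, *rest = pool
    # 5th-9th: base stealers first (False sorts before True); the sort is
    # stable, so OPS order is preserved within stealers and non-stealers.
    rest.sort(key=lambda h: not h.can_steal)
    order = [leadoff, second, third, cleanup, *rest]
    return [h.name for h in order]
```

Because the stealers are simply moved to the front of slots 5 to 9, a team with more than three of them would see some spill past the 5th-6th-7th area the earlier lesson mentions.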

There is much more work to be done, and much work already under way, in creating the perfect batting order. However, there is already plenty of concrete statistical evidence (leveraged outs, run expectancy, base-out situations, and run-scoring potential) showing that lineup makers are truly failing both their teams and their fans by not using better strategies to win baseball games. Now the question is when the league will open its eyes.

by Derek Reifer, Northwestern University


Breaking Down the James Rodriguez Transfer

It is all but certain now that James Rodriguez will move from Monaco to Los Blancos for 60 million pounds, becoming the 5th most expensive player of all time, trailing only his new teammates Cristiano Ronaldo and Gareth Bale and his rivals at Barcelona, Luis Suarez and Neymar. As with Bale and Neymar, much of his value lies in his youth and the idea that, at the ripe age of 23, he may not yet have reached his full potential. With the transfer effectively done, it is time to analyze what it means for Real Madrid and to decide whether James is money well spent or a big mistake.

James Rodriguez is now a top 5 transfer of all time because of his play in the World Cup. Comparing James’ World Cup performance with those of the Real Madrid players who play a similar attacking midfield position, it is clear that he was the best. On a per-minute-played basis, he led that group in all of the key Squawka attacking categories, and at WhoScored, he had the 2nd-best overall rating in the tournament, trailing only Lionel Messi, who won the Golden Ball as the tournament’s best player. (The Golden Boot, for scoring the most goals in the tournament, went to James himself, with six.)


While his play was especially inspired during these five games, the problem with paying 60 million pounds for a player after a great World Cup run is that he only played five games. Five games is a remarkably small sample size, and many players have put together phenomenal five-game runs without ever being bought for anywhere near 60 million pounds. Look at James’ World Cup stats a little more closely and you find something more troubling. In Squawka’s game ratings, James put up a phenomenal 106.27 against Japan, but in the two games against his toughest competition, Ivory Coast and Brazil, he posted ratings of just 13.11 and 17.42. While those aren’t terrible games by any stretch of the imagination, they are not what you would expect from a player at that price.
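The small-sample problem is easy to see with the three ratings quoted above. The text gives only three of James’ five game ratings, so this sketch cannot reproduce his full tournament average. What it shows is how much a single standout game moves the mean of a handful of games.

```python
from statistics import mean, stdev

# Squawka game ratings quoted above for three of James' five World Cup games.
ratings = {"Japan": 106.27, "Ivory Coast": 13.11, "Brazil": 17.42}

# Average across all three quoted games.
all_three = mean(ratings.values())
# Average once the single best game is removed.
without_japan = mean(v for k, v in ratings.items() if k != "Japan")
# Spread of the three ratings around their mean.
spread = stdev(ratings.values())

print(f"mean of the three games: {all_three:.2f}")
print(f"mean without Japan:      {without_japan:.2f}")
print(f"standard deviation:      {spread:.2f}")
```

Dropping one game cuts the average by about two-thirds. With a spread that large and only a few games, a short tournament says little about what a player will do over a full club season.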

A more predictive way to gauge how James Rodriguez will do at Real Madrid is to look at his stats from last season at Monaco. When you compare his club-season numbers in the same attacking categories he dominated at the World Cup with those of his new Real Madrid teammates, he does not stand out. His attacking profile looks strikingly similar to Angel Di Maria’s, and his overall score isn’t as good as Luka Modric’s. On a per-minute basis, though, the star of the group is Isco, who plays the same number ten position that James will likely play for Real Madrid and is even younger than James, with potentially more room to grow. On WhoScored, James’ season rating for Monaco was 7.41, a strong rating that was 3rd best on his club team. However, Angel Di Maria’s rating was higher, and Isco was just behind James at 7.39.
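Per-minute comparisons like these are usually expressed per 90 minutes, one full match, so that players with different amounts of playing time can be compared directly. This sketch shows the conversion and a simple ranking. The text does not give the players’ minute totals, so the input dictionary here is hypothetical.

```python
def per_90(total: float, minutes: float) -> float:
    """Convert a season total (chances created, key passes, ...) into a rate
    per 90 minutes, i.e. per full match."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


def compare_per_90(stats: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Rank players by per-90 rate; stats maps name -> (total, minutes)."""
    rates = {name: per_90(total, mins) for name, (total, mins) in stats.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A player with fewer minutes but the same total ranks higher here, which is how Isco can lead the group per minute without leading it in raw totals.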


The comparison between Isco, Di Maria, and James matters because Di Maria is believed to be as good as gone from Real Madrid, heading to PSG, and Isco could be the next man shown the door to make room for the expensive James. That would cost Real Madrid a key starter from last year’s Champions League-winning squad as well as their most promising prospect, who was already playing at a high level in his own right.

Compared with Madrid’s great move for Toni Kroos at less than 20 million pounds, the James Rodriguez transfer has a much better chance of being a big bust for Los Blancos. If he carries the form of his five great World Cup games over to Real Madrid, he will undoubtedly be worth the money. However, if the more likely scenario occurs and he plays closer to his Monaco form, Real Madrid might regret buying him, and losing Di Maria (and potentially Isco) in the process.

by Robert Garcia, Northwestern University

The Statistical Impact of Neymar and Thiago Silva’s Absence

The loss of Neymar

The entire nation of Brazil was punched in the stomach when the news came that Neymar, their young superstar, was out for the remainder of the World Cup, and the emotional video Neymar sent out to the country brought millions around the world to tears. For most nations in the World Cup, losing a player of Neymar’s caliber would probably end their Cup dreams, the same way Uruguay, #7 in the FIFA rankings, looked like a completely different team in their two games without their superstar, Luis Suarez. Even Argentina didn’t look the same after Angel Di Maria left the field and Enzo Perez, a midfielder for Benfica, stepped in for the rest of the match.

Fortunately for Brazil, the two most logical replacements, Willian and Bernard, would start for almost any other squad in the World Cup. Willian is a battle-tested attacking midfielder who plays for Chelsea and has a style similar to Neymar’s. Both Willian and Neymar are dribblers who can hold possession and use their speed to move the ball up the pitch. The stats from Squawka show that Willian is better than Neymar in several key areas, though Neymar is still the superior overall player, especially when it comes to scoring goals.


The Squawka comparison shows that Willian has the edge in creativity. During the club season, playing for Chelsea in the more difficult Premier League and in about the same number of minutes, he created far more chances than Neymar, 64 to 41. He also had about double Neymar’s key passes, and even had a better Squawka possession score. However, the one thing Brazil has especially lacked from its two strikers, Jo and Fred, this World Cup has been goal scoring, and Neymar does a much better job of scoring than Willian, with more than double Willian’s goals this season. Jo and Fred have scored just 1 goal combined, so most of the scoring responsibility has fallen on Neymar. Willian doesn’t have Neymar’s scoring ability, but Brazil’s World Cup hopes depend on his passing and creativity bringing more out of the other Brazilian attackers, such as Jo, Fred, and Hulk.

Thiago Silva’s absence, and why it is difficult to properly measure

Thiago Silva is suspended for the upcoming semifinal against Germany, a huge blow after a remarkably stupid yellow card from the seasoned veteran. Replacing Thiago Silva, and analyzing that replacement, is much more complicated, and there is no cut-and-dried substitute for him. Silva is one of the best defenders in the world, and any time a team is missing a player of Silva’s quality, it undoubtedly hurts. However, when you look only at the stats for Thiago Silva and his replacement, FC Bayern star Dante, there does not appear to be a large gap between the two players.


It is well known that David Luiz plays more like a defensive midfielder than a true center back, and his Squawka scores reflect those tendencies, with a very high attack score relative to his total minutes played. Dante also has a high attack score relative to his defense score, and that is where the biggest problem lies for Brazil. Thiago Silva does a phenomenal job covering defensively for Luiz, and his defensive prowess shows in his Squawka defense score, which is more than double Dante’s. His clearances also stand out: he made more than 60 more than Dante in a similar number of minutes. The great attack/defense chemistry between David Luiz and Thiago Silva (which PSG paid 50 million pounds to get) is key to the Brazilian back four, since both Marcelo and Dani Alves are known for their attacking skills. All of this shows how heavily Brazil relies on Silva’s defensive ability, because without him Dante will not be able to play the attacking style he excels at.

What this means for Brazil’s chances

All in all, losing two of your best players can only hurt a team’s chances of winning, and Brazil’s biggest challenge of the tournament comes in its next game, against the best team in the tournament, Germany. If the last two games were being played on a truly neutral field, Germany would be the favorite to take home the title. With Brazil’s home-field advantage, however, Brazil is still the favorite, with Nate Silver of FiveThirtyEight giving it a 46% chance of winning the title and a 67% chance of beating Germany. Silver attributes most of the lost value to the attack, but with the attack-minded Dante and the extra chances Willian creates, Brazil’s attack may stay closer to form than expected. The area to watch for Brazil is its defense without Thiago Silva’s support.
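The two FiveThirtyEight figures also imply how likely Brazil was to win the final if it got past Germany. In a knockout bracket, winning the title requires winning the semifinal first, so the title probability is the semifinal probability times the conditional probability of winning the final. This sketch uses only the two numbers quoted above. The variable names are illustrative.

```python
# FiveThirtyEight figures quoted above (Nate Silver's forecast).
p_beat_germany = 0.67
p_title = 0.46

# Winning the title requires beating Germany first, so
# P(title) = P(beat Germany) * P(win final | beat Germany).
p_final_given_semi = p_title / p_beat_germany
print(f"implied chance of winning the final: {p_final_given_semi:.0%}")  # prints 69%
```

In other words, the forecast treated the semifinal against Germany as the bigger obstacle, with roughly a 2-in-3 chance in each of the last two games.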

by Robert Garcia, Northwestern University