Breaking Down WAR: How Traditional Stats Explain Baseball's Most All-Encompassing Statistic
Wins Above Replacement is a bit of a black box. Relating it to more mainstream stats can help make sense of it.
I talk a lot about Wins Above Replacement in baseball. It’s a statistic that can be very difficult to comprehend, even for the most passionate baseball fans. To be honest, it took me a while to fully understand it when it first started to gain real popularity across the sport. WAR measures a player's value in all areas of the game by calculating how many more wins he is worth compared to a replacement-level player at his same position.1 A “replacement-level player” describes a Minor League replacement or a readily available free agent, and that baseline varies from year to year. The way WAR is calculated is a bit complex, but you can read the formulas on MLB.com. Even after studying the formulas, though, it is still not quite clear what actually generates a win above replacement. I wanted to try to explain WAR in the context of more traditional, mainstream stats that are easier to understand.
I selected some of the most common “traditional” stats for position-players, starting pitchers, and relief pitchers, and investigated which stats were most correlated with WAR and how important each is in explaining a player’s WAR.2
Correlations
I examined the full-season statistics of qualified players since integration (1947) to measure how correlated WAR is with each stat. Position-players, starting pitchers, and relief pitchers were looked at separately because each group is measured by different statistics.
For position-players, we see OPS (On-base plus Slugging Percentage) being the most correlated, with its components ranking second and third. The R-squared value in the last column represents the proportion of variation in WAR that can be explained by each individual stat, in a vacuum.
For starting pitchers, strikeouts have the strongest correlation, with the ability to prevent runners from reaching base (WHIP) ranking second. Relief pitcher stats show similar correlations, but with saves ranking fourth and wins being far less correlated.
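As a rough illustration of the single-stat comparison described above, here is a minimal Python sketch that computes a Pearson correlation and its square (the R-squared for a one-predictor fit). The function name `correlation_and_r2` and the numbers are made up for illustration; they are not the article's data or code.

```python
import numpy as np


def correlation_and_r2(stat, war):
    """Return (r, r_squared) for one stat against WAR."""
    stat = np.asarray(stat, dtype=float)
    war = np.asarray(war, dtype=float)
    r = np.corrcoef(stat, war)[0, 1]  # Pearson correlation coefficient
    return r, r ** 2                  # for a single predictor, R-squared = r^2


# Made-up values for illustration only (not real player seasons).
ops = [0.650, 0.700, 0.750, 0.800, 0.850, 0.900]
war = [0.5, 1.2, 2.0, 2.4, 3.9, 4.5]

r, r2 = correlation_and_r2(ops, war)
print(f"r = {r:.3f}, R-squared = {r2:.3f}")
```

Running the same function once per stat and sorting by R-squared would give a ranking like the one discussed in this section.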
Variable Importance
To determine the most important traditional stats and their relative contribution to Wins Above Replacement, I first identified the best subset of predictive stats in terms of “adjusted R-squared”.3 Following that, I ran a ridge regression model with the identified subset as the predictor variables and WAR as the outcome. I chose ridge regression to combat the multicollinearity between many of these stats (e.g., a batter with lots of home runs will naturally have more RBIs, and a pitcher with a high strikeout rate would be expected to have a low WHIP). The results from the regression model then allowed me to calculate the relative importance of each stat to Wins Above Replacement.4 The “importance” represents the relative contribution of each stat to the model’s R-squared; in other words, how much each stat contributes to the model’s total ability to explain WAR. For context, the adjusted R-squared value was 0.647 for position-players, 0.716 for starting pitchers, and 0.634 for relievers.5 Much of what the model leaves unexplained (i.e., the difference between the model’s R-squared and 1.00) can likely be attributed to the positional adjustment applied in the WAR calculation, which this modeling could not measure.
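The two modeling steps described above (best-subset selection by adjusted R-squared, then ridge regression) can be sketched as follows. The analysis itself was done in R; this is a Python stand-in on synthetic data, and the function names, the penalty values, and the three example columns are all assumptions chosen for illustration.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Adjusted R-squared of an ordinary least-squares fit with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def best_subset(X, y, names):
    """Try every non-empty subset of columns; keep the best adjusted R-squared."""
    best_score, best_cols = -np.inf, ()
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_score, best_cols = score, cols
    return [names[c] for c in best_cols], best_score


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients on standardized predictors and centered y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)


# Synthetic data for illustration only (not real player seasons).
rng = np.random.default_rng(0)
n = 200
obp = rng.normal(size=n)
slg = 0.6 * obp + 0.8 * rng.normal(size=n)  # deliberately correlated with obp
sb = rng.normal(size=n)                      # unrelated to the outcome
war = 2.0 * obp + 1.0 * slg + rng.normal(size=n)

X = np.column_stack([obp, slg, sb])
names = ["OBP", "SLG", "SB"]

chosen, score = best_subset(X, war, names)
cols = [names.index(c) for c in chosen]
ols_like = ridge_fit(X[:, cols], war, lam=0.0)  # lam = 0 reduces to least squares
shrunk = ridge_fit(X[:, cols], war, lam=50.0)   # a larger penalty shrinks the coefficients
print(chosen, round(score, 3))
print(ols_like, shrunk)
```

The penalty is what makes ridge useful when predictors overlap: it pulls the coefficients toward zero, which keeps correlated stats from trading off against each other as wildly as they can in an unpenalized fit.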
For position-players, on-base percentage was determined to have the greatest influence on a player’s WAR, accounting for 18.4% of the variation explained by the model. Strikeout percentage was deemed least important, with less than a 1% contribution.
Strikeouts were most important for both starters and relievers. For starting pitchers, strikeouts accounted for nearly a third of the variation in WAR explained by the model, more than twice as much as the next-best stat, slugging percentage. For relief pitchers, strikeouts were still the most important, but to a somewhat lesser extent at 25.9% relative importance. So, we can expect a high-strikeout pitcher to generate a lot of WAR.
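To make the idea of "relative importance" concrete, here is a minimal Python sketch of one common way to split a model's R-squared among correlated predictors: averaging each predictor's incremental R-squared over every possible order of entry (often called the LMG method). The article does not say which relaimpo method was used, so treating it as LMG is an assumption, as are the function names and the synthetic data.

```python
from itertools import permutations

import numpy as np


def r_squared(X, y):
    """R-squared of an OLS fit with intercept; 0.0 when X has no columns."""
    if X.shape[1] == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def lmg_shares(X, y):
    """Average each predictor's R-squared gain over all orders of entry."""
    p = X.shape[1]
    shares = np.zeros(p)
    orders = list(permutations(range(p)))
    for order in orders:
        used, prev = [], 0.0
        for j in order:
            used.append(j)
            cur = r_squared(X[:, used], y)
            shares[j] += cur - prev  # gain from adding predictor j at this step
            prev = cur
    return shares / len(orders)


# Synthetic data for illustration only (not real pitcher seasons).
rng = np.random.default_rng(1)
n = 300
k_rate = rng.normal(size=n)
whip = -0.5 * k_rate + 0.87 * rng.normal(size=n)  # correlated with strikeouts
wins = rng.normal(size=n)
war = 1.5 * k_rate - 1.0 * whip + 0.3 * wins + rng.normal(size=n)

X = np.column_stack([k_rate, whip, wins])
shares = lmg_shares(X, war)
relative = shares / shares.sum()  # rescaled so the shares total 100%

for name, share in zip(["K", "WHIP", "W"], relative):
    print(f"{name}: {share:.1%}")
```

The raw shares add up to the full model's R-squared, so rescaling them to sum to 100% gives percentages that can be read the same way as the importance figures quoted above. The number of orderings grows factorially, so this brute-force version is only practical for a handful of predictors.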
Conclusion
While WAR can be a bit “black-boxy”, measuring how well more traditional, mainstream stats are able to predict it can help us better understand the key factors that go into it. I hope this has led to a clearer picture of what goes into Wins Above Replacement, and the influence that more understandable numbers have in generating what has become baseball’s key, all-encompassing metric for evaluating a player’s production.
All WAR data based on FanGraphs WAR calculation.
Using the “regsubsets” function from the leaps package in R.
Calculated using the “calc.relaimpo” function from the relaimpo package in R.
This can be interpreted as the percent of the outcome that can be explained by the predictor variables in the model (e.g., 71.6% of starting pitcher WAR can be explained by those stats).
This is great work, and work that I wish more writers would undertake with respect to the increasing use of statistical black boxes that you reference. I have a rule that writers and other members of sports media should not be allowed to use a statistic unless they can explain, at least at a rudimentary level, not "how" it works ("then you divide by 11.1...") but "why" it works ("and this makes sense because..."). Oh, and just saying that "everybody agrees this works" isn't sufficient.
Our society is increasingly and alarmingly a "plug and chug" one where supposed black box "truths" are proliferating while the number of people who understand why their numerical outputs are actually meaningful or predictive keeps dwindling. This piece reflects an elegant effort to take the road not taken and answer the "why" question. I so appreciate that.
Beyond that, a few observations. One of the more annoying and self-defeating aspects of sabermetrics, and part of why it is held in such low regard by the baseball public, is its elitist rejection of traditional counting stats. Counting stats are not only a crucial link to baseball's past, but they also relate closely to what a fan experiences at a game and the outcomes they are trying to understand. They were created first precisely because they were so logical.
A fan knows that a runner crossing the plate is an important event to the outcome, as is a player's hit driving other players across the plate. Counting them up makes sense. Fans also understand that home runs are inaccurately counted twice - both for a run scored and a run driven in despite producing only a single run - and that efficiency on a per plate appearance basis matters too. Counting stats have created a huge amount of understanding and fan interest.
It's ironic to me that in the rare instance when we need to show our work at the board and demonstrate how some complex baseball calculation works, we revert to counting stats because they are so fundamental not only to history but also to how we experience the game. They have organic credibility. Nobody sits in a seat at a game and thinks about WAR - they think about a HR, a strikeout, or an RBI. WAR is something that is created after the game by those who not only did not experience the game but are not even required to.
It's that linkage between fan experience and understanding that the sabermetric community seems to not fully appreciate. Unfortunately, it's not just baseball. As I watch the tragic Boeing 737 MAX saga unfold, I'm struck by how Boeing repeatedly ignored and rejected so many aeronautical engineering "counting stats" and, instead, looked to a black box solution that so few at the company or at the regulatory bodies understood or could explain. It's why asking questions and figuring out our black boxes is so crucial instead of simply taking them at face value.
I come not to bury sabermetrics but to praise it and make it more responsive to fans. 45 years ago, Bill James did not reject counting stats - he embraced them. However, he used them in novel and entertaining ways to reveal new truths to a wholly resistant baseball fan base. James was nothing if not a master politician seeking a revolution. He knew that publishing a book alone would not result in that revolution, which could only come through the logical, compelling, and persuasive words and ideas he articulated within its pages.
To that end, James understood his audience and that using the credibility of counting stats would help win over the public in ways that overly complex and counterintuitive formulae would not. It's why when he introduced his revised - and more accurate - Runs Created Formula, he noted that he rarely used it because its increased accuracy wasn't worth the added complexity. He connected with fans and explained to them - in their terms - why a revolution was necessary to advance understanding.
Today, unlike James, I often feel as if the sabermetric community is dismissive and at war (no pun intended) with average baseball fans. No longer in the business of convincing anyone of anything, they occasionally answer the "how" question when pressed while completely ignoring the "why." Your efforts here succeed on many levels. Like James, you understand the need to win the public over - not bludgeon them - and use their language and understanding to convince them of the case.
Thanks again for a great piece.