In the third and final installment of my three-part series analyzing the MLB MVP races, I discuss which stats are most important in determining who wins the award and introduce my predictive model. Apologies in advance for this being a little statistically “in the weeds.” I find it interesting, but then again, I am kind of a nerd.
About the Approach
I looked at individual-season data for all players with a single-season Wins Above Replacement (WAR) of at least 4.9 over the past 30 years.[1] Why 4.9, you ask? It represents three standard deviations below the average WAR of MVP winners since WAR became a statistic on Baseball Reference in 2009. In the past 30 years, only Justin Morneau has won the MVP with a lower WAR (4.3 in 2006).
I separated position players from pitchers because, for the most part, they are measured by very different statistics. Then I ran a logistic regression model for each group, using a stepwise approach to select the most important variables. The result was a probability that each player would win the MVP based solely on their stats.
The models were highly accurate. The position player model had an accuracy of 95.2%, a sensitivity (true-positive rate, or the share of actual MVPs correctly identified as winning MVP) of 92.3%, and a specificity (true-negative rate, or the share of non-MVPs correctly identified as not winning the award) of 96.0%. For pitchers, I expanded the scope to include data since 1961 (the start of the Expansion Era in baseball), as only 7 full-time starting pitchers have won MVP in that timespan. This model came out with 100% accuracy, likely due to the highly disproportionate number of non-MVPs versus MVPs.[2]
To find the probability of each player winning MVP in their respective season, these percentages were normalized relative to the rest of their league in that year, so that all percentages within a league summed to 100%. For example, in a league with two players having stellar seasons, each one's probability would be lower than if there were just one superstar head-and-shoulders above the rest.
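For readers who want to see the mechanics, here is a minimal Python sketch of two steps described above: computing accuracy, sensitivity, and specificity from predicted versus actual winners, and rescaling raw model probabilities so each league-season sums to 100%. The function names, field names, players, and numbers are all made up for illustration. This is not the code or data behind the model.

```python
from collections import defaultdict


def classification_rates(actual, predicted):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels.

    1 means "won MVP" and 0 means "did not win MVP".
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # MVPs called MVPs
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-MVPs called non-MVPs
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # non-MVPs called MVPs
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # MVPs the model missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against a sample with no actual MVPs or no actual non-MVPs.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def normalize_by_league(rows):
    """Rescale raw probabilities so each (league, year) group sums to 1.

    Each row is a dict with the hypothetical keys
    'player', 'league', 'year', and 'raw_prob'.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[(row["league"], row["year"])] += row["raw_prob"]
    return [
        dict(row, mvp_prob=row["raw_prob"] / totals[(row["league"], row["year"])])
        for row in rows
    ]


# HYPOTHETICAL example: two stars share one league, one star stands alone.
example = [
    {"player": "Player A", "league": "AL", "year": 2024, "raw_prob": 0.9},
    {"player": "Player B", "league": "AL", "year": 2024, "raw_prob": 0.6},
    {"player": "Player C", "league": "NL", "year": 2024, "raw_prob": 0.9},
]
for row in normalize_by_league(example):
    print(row["player"], row["league"], round(row["mvp_prob"], 3))
```

Player A and Player C have the same raw probability here, but Player A ends up with a smaller share because Player B is in the same league. That is the two-superstar effect described above.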
Important Factors Determining an MVP
For position players, the model selected Plate Appearances, Runs, Hits, Singles, Doubles, Triples, Stolen Bases, Walks, OPS+, WAR, Wins Above Average (WAA), oWAR (Offensive WAR), dWAR (Defensive WAR), Batting Runs (Rbat), and Rbaser+Rdp (Baserunning Runs plus Double-Play Runs) as the predictors. You can click the links attached to some of the more advanced statistics to read more about them. Using the coefficients generated from the logistic regression, I found the relative importance of each of these factors.
This tells us how much each of the variables contributes to the overall model. OPS+, therefore, was identified as the most important stat in determining MVP probability. For pitchers, the important stats were Wins, ERA, Strikeouts, FIP (Fielding-Independent Pitching), WHIP (Walks plus Hits per Inning Pitched), Hits/9IP, and Walks/9IP. The variable importance shows WHIP is most important, with Hits/9 and Walks/9 also above 20% relative importance. Together, those three stats account for over 82% of the variability in the model.
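There are several ways to turn regression coefficients into a relative-importance percentage, and the sketch below shows just one common option, which may differ from what was done for this article. It assumes each coefficient is multiplied by its predictor's standard deviation (so stats on different scales are comparable), takes the absolute value, and rescales so the shares sum to 100%. Every number below is invented purely to show the arithmetic.

```python
def relative_importance(coefs, stds):
    """Share of total |coefficient * standard deviation| per predictor, in percent.

    coefs: dict of predictor name -> fitted logistic regression coefficient
    stds:  dict of predictor name -> standard deviation of that predictor
    """
    scaled = {name: abs(coefs[name] * stds[name]) for name in coefs}
    total = sum(scaled.values())
    return {name: 100.0 * value / total for name, value in scaled.items()}


# HYPOTHETICAL coefficients and standard deviations, NOT the model's values.
made_up_coefs = {"WHIP": -8.0, "H9": -1.5, "BB9": -2.0, "Wins": 0.2}
made_up_stds = {"WHIP": 0.10, "H9": 0.60, "BB9": 0.40, "Wins": 3.0}

shares = relative_importance(made_up_coefs, made_up_stds)
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}%")
```

The sign of a coefficient is dropped on purpose: a stat where lower is better (like WHIP) can still be the most influential predictor.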
Historically Most Likely MVPs
In applying the model to the past 30 years, I found the players most likely to win the MVP in each season. Below are the 25 most likely MVPs since 1994, along with whether or not they won MVP and where they finished in the sportswriters' final MVP voting.[3]
2024 MVP Probabilities
Based on data through June 8th, I ran the model on this year’s crop of players to find the leaders in this year’s MVP races.[4] Here are the current top 10 in each league.
As discussed in my previous articles on the AL and NL MVP races, we have an interesting situation where two players from the same team are among the most likely MVP candidates in their leagues. I debated in those articles whether that helps or hurts their chances. I now have an answer: it does neither! In fact, I found that the number of teammates a player has with a season-long WAR of 5+ has a correlation coefficient of merely -0.022, which means essentially no relationship whatsoever. So, we can put to rest the argument that Soto, Judge, Ohtani, or Betts are hurt or helped by having great players around them. I hope to update this weekly and keep the updated model in the new “Models and Predictions” section of my Substack. It will be interesting to see how things change as the season goes on.
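For anyone curious how a teammate check like the one above can be run, here is a small Python sketch of a Pearson correlation. It assumes the count of 5+ WAR teammates is compared against a 0/1 "won MVP" flag, which is my guess at a reasonable setup rather than a description of the exact calculation. The eight rows are made up and will not reproduce the -0.022 figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL player-seasons: number of 5+ WAR teammates, and MVP outcome.
teammates_5war = [0, 1, 0, 2, 1, 0, 1, 0]
won_mvp = [1, 0, 0, 0, 1, 0, 0, 1]

r = pearson_r(teammates_5war, won_mvp)
print(round(r, 3))
```

A value near +1 or -1 would mean star teammates strongly help or hurt; a value near 0 means the two move almost independently.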
[1] Statistics used for the model are from Baseball Reference.
[2] Data was semi-balanced for both pitchers and position players, but only so much can be done when you have just 7 positive outcomes over more than 50 years of data.
[3] Years Shohei Ohtani won as a two-way player were excluded from this table, as it is nearly impossible to model such a unicorn. Unfortunately, we may never see him pitch and hit again at the volume he did. While this may be unfortunate for baseball, my model wouldn’t mind that =). 2020 was also excluded due to the truncated 60-game season.
[4] Players must have a projected season-total WAR of 4.9 or greater and must meet the minimum qualifying standards to win the batting average or ERA titles.