Who Will Be Voted into the 2025 Baseball Hall of Fame Induction Class?
Predicting the 2025 class and looking ahead to future candidates.
Ichiro Suzuki, CC Sabathia, and Dustin Pedroia headline the group of 14 first-time candidates on the 2025 Baseball Hall of Fame ballot. They join 14 returning stars, including Billy Wagner, Andruw Jones, Carlos Beltran, and Alex Rodriguez, seeking induction into baseball’s most elite fraternity. Writers are casting their votes as we speak, and results will be announced at 6 p.m. EST on January 21st. In anticipation of the announcement, I tried to predict who on the ballot will, either this year or in the future, receive the phone call that cements their name in baseball history.
The Process
Using Hall of Fame induction as the outcome variable, I ran a stepwise logistic regression on all players who have accumulated either 4,500 plate appearances or 275 games pitched and who are not currently active, on the Hall of Fame ballot, or eligible for the ballot within the next 5 years.[1] I ran two separate models, one for batters and one for pitchers, using a 70/30 train-test split partitioned by the outcome variable to ensure proportional representation between the two data sets.[2] The resulting models performed very well, with a test-set accuracy of 95.5% for batters and 93.0% for pitchers. They were also great at identifying true positives, with sensitivities of 98.4% and 96.3%, respectively.[3]
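To make the split and the two metrics concrete, here is a minimal Python sketch. It is not the code behind the article (that work was done in R); the function names, the random seed, and the made-up labels are all assumptions for illustration. It splits each outcome class separately so the train and test sets keep the same share of Hall of Famers, then computes accuracy and sensitivity from actual and predicted labels.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # same fraction taken from every class
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

def accuracy(actual, predicted):
    """Share of all players whose predicted label matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def sensitivity(actual, predicted, positive=1):
    """Share of actual positives (Hall of Famers) that were predicted positive."""
    preds_for_pos = [p for a, p in zip(actual, predicted) if a == positive]
    return sum(p == positive for p in preds_for_pos) / len(preds_for_pos)

# Made-up labels (not the article's data): 10 Hall of Famers (1) among 100 players.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

With these toy labels, both sets end up with the same 10% share of Hall of Famers, which is the point of partitioning by the outcome when one class is rare.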
The Stats That Made Up the Model
As previously stated, I ran a stepwise logistic regression, which is a form of regression that uses an automated iterative process to select the optimal predictor variables based on a defined criterion.[4] To accomplish this, I used the stepAIC() function from the MASS package in R, which selected the variables that yielded the model with the lowest AIC value.[5] I started with a collection of various offensive, defensive, and pitching stats from FanGraphs, and the stepwise procedure resulted in a batter model of 14 stats and a pitcher model of 12 stats. Below are the stats, along with their relative importance in the model.
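The selection idea can be sketched in a few lines of Python. This is an illustration only, not a port of stepAIC(): it assumes backward elimination (the article does not say which search direction was used), and it replaces real model fitting with a toy scoring function. The variable names and the log-likelihood gains are invented.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

def backward_stepwise(candidates, score):
    """Drop one variable at a time for as long as doing so lowers the score."""
    selected = list(candidates)
    best = score(selected)
    while selected:
        # Score every model that omits exactly one of the current variables.
        trials = [(score([v for v in selected if v != drop]), drop)
                  for drop in selected]
        trial_score, drop = min(trials)
        if trial_score >= best:
            break  # no single removal improves the criterion
        selected.remove(drop)
        best = trial_score
    return selected, best

# Hypothetical log-likelihood gain per variable (stand-in for fitting a model).
GAIN = {"stat_a": 40.0, "stat_b": 6.0, "stat_c": 0.4, "stat_d": 0.1}

def toy_score(variables):
    log_lik = -100.0 + sum(GAIN[v] for v in variables)
    return aic(log_lik, len(variables) + 1)  # +1 for the intercept

selected, best_aic = backward_stepwise(GAIN, toy_score)
```

In this toy setup, a variable survives only if it improves the log-likelihood by more than 1, since each extra parameter costs 2 points of AIC. In practice the score would come from refitting the logistic regression at every step.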
If you are unsure what some of these stats are, or are interested in learning more about any of them, you can reference the FanGraphs glossary.
Results
Here is what the model predicts for the 2025 Hall of Fame class…
The model believes Alex Rodriguez, Billy Wagner, Ichiro, Carlos Beltran, Francisco Rodriguez, Chase Utley, Manny Ramirez, Andruw Jones, and Jimmy Rollins are all Hall of Famers. This doesn’t necessarily mean they are expected to be voted in this year, but at some point during their time on the ballot they are likely to get in. I’m not entirely surprised that A-Rod was predicted to be a lock, but the model obviously doesn’t account for the controversy surrounding his PED usage. Similarly, I would expect Beltran and Ramirez to have lower odds than the model states, given Beltran’s involvement in the Astros’ sign-stealing scandal and Manny’s PED usage (…and just “Manny being Manny”).
Looking Ahead
There is no shortage of stars recently retired or in the game today. Active players like Clayton Kershaw and Mike Trout are considered by many to be sure-fire first-ballot HOF’ers, while generational baseball icons such as Albert Pujols and Miguel Cabrera await inclusion on the ballot following the required 5-year post-retirement waiting period. Among the active or recently retired players who meet the plate appearance or games pitched thresholds I set, who can we expect to one day get the call to the Hall?
Now, it should be noted that for active players, the model only considers stats to date and does not project any future output. So for an active player, the prediction can essentially be viewed as “if this player retired today, how likely would it be that they make the Hall of Fame?” The list is largely unsurprising, but the model’s love of relief pitchers (Jansen, Kimbrel, Chapman) was intriguing, especially since pitcher type (starter or reliever) was modeled. As with A-Rod and Manny, we should also be cautious about Cano’s high probability given his history with PEDs. Let’s see if guys like Freeman, Arenado, Lindor, and Machado, all of whom currently have a greater than 20% probability of making the Hall, will do enough in the years they have left to cross that 50% threshold.
We will check back in on January 22nd, and in years down the road, to see how these predictions pan out. One thing is for certain, however: some generational stars are bound to get the call this year and in the foreseeable future.
1. The required waiting period after retirement.
2. This was particularly important due to the imbalance of HOF’ers and non-HOF’ers in the data.
3. The rate at which Hall of Famers were accurately identified as Hall of Famers by the model.
4. Anthony Miller, John Panneerselvam, Lu Liu, A review of regression and classification techniques for analysis of common and rare variants and gene-environmental factors, Neurocomputing, Volume 489, 2022, Pages 466-485, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2021.08.150 (https://www.sciencedirect.com/science/article/pii/S092523122101907X)
5. Akaike’s information criterion (AIC) compares the quality of statistical models to each other. It is not an absolute measure, and says nothing about the quality of a model in a vacuum. Therefore, it should only be used as a tool of comparison in which the lower value is preferred.
I think your model greatly overrates relief pitchers’ chances, as seen by Kimbrel over Verlander.