MLB Regression Candidates - For Better or Worse
Which players are showing signs that their hot starts won't continue, or slow starts will turn around.
In baseball, we love to draw bold conclusions from small samples. After months of deprivation, fans are eager to banter about potential breakouts or slumping stars. But the baseball season is long, and we’ve seen time and time again that early trends often fail to hold up over the course of a 162-game campaign.
Take Aaron Judge’s incredible MVP season last year, in which he posted a .322 batting average, a 1.159 OPS, 58 home runs, 144 RBIs, and 11.2 Wins Above Replacement. It didn’t start out all that hot. Through April 24 of that season, Judge was hitting just .191 with a .702 OPS, and his 4 home runs had him on pace for only 25.
Judge is no anomaly: plenty of hitters have defied their early-season numbers by season’s end.
So how should we view early-season numbers? While raw “actual” stats—like batting average and OPS (on-base plus slugging percentage) for hitters, or ERA and FIP (Fielding Independent Pitching) for pitchers—can offer some insight, they’re often unreliable indicators of how a player will perform over the course of an entire season, especially early on.
That’s where “expected statistics” become particularly useful, for hitters and pitchers alike. These metrics strip away the influence of defense and park effects to measure the skill demonstrated at the moment of batted-ball contact.[1] Each batted ball is assigned a hit probability from factors such as launch angle and exit velocity, based on the outcomes of historically similar batted balls.
By aggregating the expected outcomes of each batted ball along with actual strikeouts, walks, and hit-by-pitches, we get advanced metrics like Expected Batting Average (xBA), Expected Slugging (xSLG), and Expected Weighted On-Base Average (xwOBA)—all of which reflect a player's performance based on the quality and frequency of contact, rather than just the outcomes.
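As a rough illustration of that aggregation, here is a minimal Python sketch of an xBA-style calculation. The hit probabilities are invented for the example, and the function name `expected_batting_average` is hypothetical. It also makes a simplifying assumption that every batted ball counts as an at-bat (real at-bat accounting excludes sacrifices, for instance), so treat it as a sketch of the idea and not the official Statcast computation.

```python
def expected_batting_average(hit_probs, strikeouts):
    """Sum of per-batted-ball hit probabilities divided by at-bats.

    Simplifying assumption: each batted ball is one at-bat, and each
    strikeout is an at-bat with zero chance of a hit. Walks and
    hit-by-pitches are not at-bats, so they do not appear here.
    """
    at_bats = len(hit_probs) + strikeouts
    if at_bats == 0:
        raise ValueError("need at least one at-bat")
    return sum(hit_probs) / at_bats


# Hypothetical hit probabilities for six batted balls (made-up numbers).
hit_probs = [0.90, 0.05, 0.45, 0.10, 0.70, 0.30]
xba = expected_batting_average(hit_probs, strikeouts=2)
print(f"xBA over 8 at-bats: {xba:.3f}")
```

A hitter who squares up the ball but lines out still gets credit here, which is exactly why the expected number can diverge from the actual batting average over a small sample.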
For pitchers, similar metrics exist. xFIP calculates a pitcher's FIP but replaces actual home runs allowed with projected home runs based on that season's league-average HR/FB rate. xERA is a straight translation of a pitcher's xwOBA onto the ERA scale.[2]
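To make the FIP-versus-xFIP swap concrete, the sketch below uses the commonly published FIP formula, (13·HR + 3·(BB + HBP) − 2·K) / IP plus a league constant, and substitutes fly balls times the league HR/FB rate for actual home runs. The pitcher line, the constant of 3.10, and the 11% HR/FB rate are all assumed values chosen for illustration; the real constant and league rate change every season.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching from home runs, walks, HBP, and strikeouts."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, hbp, k, ip, constant, lg_hr_per_fb):
    """FIP with actual home runs replaced by fly balls * league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical early-season line: only 1 HR allowed on 30 fly balls.
# ip is in true decimal innings; constant and lg_hr_per_fb are assumptions.
pitcher_fip = fip(hr=1, bb=8, hbp=1, k=30, ip=30.0, constant=3.10)
pitcher_xfip = xfip(fly_balls=30, bb=8, hbp=1, k=30, ip=30.0,
                    constant=3.10, lg_hr_per_fb=0.11)
print(f"FIP {pitcher_fip:.2f} vs xFIP {pitcher_xfip:.2f}")
```

In this made-up case the pitcher has allowed fewer home runs than his fly-ball total would suggest, so xFIP comes out about a run higher than FIP, the same shape of gap that flags a hot start as fragile.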
Predictive Power of Expected Stats
The correlation between a pitcher’s ERA approximately one month into the season (specifically on April 24) and their end-of-season ERA is just 0.487, indicating that April 24 ERA explains only about 23.7% of the variance in end-of-season ERA.[3] Similarly, the correlation between their FIP on April 24 and at season’s end is also fairly low, at just 0.595. However, when we compare early-season xFIP to end-of-season FIP, the correlation increases to 0.625. While that increase may seem marginal, a deeper dive into the data reveals more meaningful insights.
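For readers who want to reproduce this kind of figure, here is a small sketch of the arithmetic: a plain Pearson correlation, plus the squaring step that turns the reported r of 0.487 into explained variance. The helper name `pearson_r` and the four-pitcher ERA sample are hypothetical and exist only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Squaring the correlation reported in the text gives explained variance.
r_reported = 0.487
r_squared = r_reported ** 2
print(f"r = {r_reported}, r^2 = {r_squared:.3f}")

# Made-up ERAs for four pitchers, just to exercise the function.
april_era = [2.10, 5.40, 3.30, 4.80]
final_era = [3.60, 4.10, 3.20, 4.50]
sample_r = pearson_r(april_era, final_era)
print(f"sample r = {sample_r:.3f}")
```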
The error statistics for the two predictors tell a fuller story. The average difference between April 24 FIP and end-of-season FIP (early value minus final value) is -0.170, indicating a bias toward a lower early-season FIP. In contrast, the average difference between April 24 xFIP and end-of-season FIP is 0.035, showing a slight bias toward a lower season-end value. The Mean Absolute Error (MAE) is 0.76 for FIP versus 0.57 for xFIP, while the Root Mean Square Error (RMSE) is 0.956 for FIP compared to 0.730 for xFIP.
While FanGraphs does not provide expected statistics for specific dates in past seasons, we observe similarly low correlations for early-season batting stats. The correlation between April 24 OPS and end-of-season OPS is just 0.493, and for wOBA, it’s even lower at 0.482.
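The pitching error figures quoted above (average difference, MAE, and RMSE) can be computed along the following lines. This is a sketch with four invented pitchers, not the data behind the article, and it assumes the same sign convention as the text: early-season value minus end-of-season value, so a negative mean error means the early number ran low.

```python
import math


def error_summary(early, final):
    """Mean error (bias), MAE, and RMSE of early-season values vs. final values."""
    errors = [e - f for e, f in zip(early, final)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mae": sum(abs(err) for err in errors) / n,
        "rmse": math.sqrt(sum(err * err for err in errors) / n),
    }


# Hypothetical April 24 FIP and end-of-season FIP for four pitchers.
early_fip = [2.50, 4.00, 3.00, 4.50]
final_fip = [3.50, 3.50, 3.50, 4.00]
summary = error_summary(early_fip, final_fip)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

RMSE penalizes large misses more heavily than MAE, so a predictor that improves on both, as xFIP does in the numbers above, is reducing the big whiffs as well as the typical error.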
Clearly, if we want to better understand how a player is performing—and, statistically speaking, how they’re likely to perform by season’s end—expected stats are a far better, though still imperfect, tool. They also allow us to identify potential regression candidates: players whose performance is likely to improve or decline as the season progresses.
Pitching Regression Candidates
Below is a table comparing xFIP and FIP, and xERA and ERA for qualified pitchers this season.
The Rays appear due for some positive regression, with starters Ryan Pepiot and Zack Littell ranking second and fifth, respectively, in the largest gap between current FIP and xFIP (for the better). This is a promising sign for a team that already sports the 9th-best ERA in the big leagues. The Rays are also just 2.5 games out of a playoff spot in the American League, which has been remarkably mediocre and tightly bunched: FanGraphs gives 13 of its 15 teams at least a 17% chance of making the postseason.[4] On the flip side, Michael Wacha’s hot start is unlikely to continue, as his xFIP is a league-leading 1.38 points higher than his FIP. Toronto’s Bowden Francis is due for some negative regression as well: his xERA (an ugly 6.48) is nearly 3.00 points higher than his ERA, the worst gap in the league.
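A ranking like the one described above boils down to sorting pitchers by the gap between the expected and actual stat. The sketch below does that for xFIP minus FIP. The pitcher names and numbers are placeholders, not real 2025 values, and the function name `rank_by_gap` is hypothetical.

```python
def rank_by_gap(pitchers):
    """Sort pitchers by xFIP minus FIP, most negative gap first.

    A negative gap (xFIP below FIP) hints at improvement ahead;
    a positive gap hints that a hot start may cool off.
    """
    rows = [
        {"name": p["name"], "gap": round(p["xfip"] - p["fip"], 2)}
        for p in pitchers
    ]
    return sorted(rows, key=lambda row: row["gap"])


# Placeholder data for illustration only.
pitchers = [
    {"name": "Pitcher A", "fip": 4.80, "xfip": 3.60},
    {"name": "Pitcher B", "fip": 2.50, "xfip": 3.70},
    {"name": "Pitcher C", "fip": 3.50, "xfip": 3.55},
]
ranked = rank_by_gap(pitchers)
for row in ranked:
    print(f'{row["name"]}: {row["gap"]:+.2f}')
```

The top of the sorted list holds the candidates for positive regression and the bottom holds the candidates for negative regression; the same pattern works for xERA versus ERA or for the hitting stats.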
Batting Regression Candidates
The White Sox may have a near-zero chance of playing in October, but at least they can hope for some improvement on offense. First baseman Andrew Vaughn shows the second-largest gap between xwOBA and wOBA, the third-best gap between xBA and batting average, and the third-best gap between xSLG and slugging percentage among all qualified batters in 2025. The ageless Royals legend Salvador Perez holds the most promising gaps in xwOBA versus wOBA, xBA versus AVG, and xSLG versus SLG. On the other hand, the hot start from Orioles outfielder Cedric Mullins might not last. Mullins has the largest unfavorable discrepancy in all of baseball between his xwOBA and wOBA, as well as between his xSLG and SLG.
Keep an eye on the players at both ends of these tables over the coming weeks and months. These charts can be accessed anytime in the MLB Models tab on the newsletter dashboard and will be updated periodically throughout the season.
[3] R-squared, calculated by squaring the correlation coefficient (r), is a measure of explained variance.
[4] Sorry, Angels (5%) and White Sox (<1%).