The Anatomy of a "Cinderella"
PLUS: Which teams are set up to upset in the 2025 NCAA tournament.
It is quite possibly what gives the NCAA tournament its charm. Watching far lower-seeded underdogs upset championship contenders in the Big Dance produces some of the most thrilling (and sometimes frustrating) moments in sports. We may rip up our brackets when a 16-seed defeats the nation’s top team (sorry, Virginia), or wallow in sorrow when our school sees its championship hopes squashed by a college we’ve never heard of, but upsets are what make the NCAA tournament so uniquely special. Often they seem to come out of nowhere. Who would have ever expected the 22-14 N.C. State Wolfpack, who frankly only made the tourney because of an incredible run to win the ACC tournament, to advance all the way to the Final Four?1 Digging into the numbers can sometimes help us spot potential “Cinderella” teams before they ruin all of our brackets.
What Makes A Winning Underdog?
First, a definition: I am defining a “big underdog” here as a team seeded at least 4 lines lower than its opponent (e.g., an 8-seed vs. a 4-seed or a 15-seed vs. a 2-seed). Starting from a set of over 50 stats from sources such as KenPom, Bart Torvik, and TeamRankings.com, I used the Boruta feature selection algorithm to determine which stats matter most for an underdog pulling off a “big upset”.2 The observations in the data consisted of every team to beat a team seeded 4+ spots higher in the NCAA tournament from 2008-2024.
Boruta is a wrapper algorithm built around a random forest. It first adds randomness to the data through shadow features, which are shuffled copies of all the variables. It then trains a random forest classifier on the extended data set and uses Mean Decrease in Accuracy to measure each feature’s importance. At each iteration, it checks whether each real feature is more important than the best shadow feature, and it removes features deemed clearly unimportant. The algorithm stops when every feature has been either confirmed or rejected, or when it reaches the defined maximum number of iterations. This makes it well suited to high-dimensional data sets (those with many variables relative to the number of observations) such as this one. The algorithm gave us the 9 statistics most important for an underdog to pull off a “big” 4+-seed upset. The importance scores are scaled to sum to 100% for ease of interpretation.
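The shadow-feature loop described above can be sketched in plain Python. This is a simplified sketch under stated assumptions, not the author’s code or any packaged Boruta implementation: to stay dependency-free, absolute Pearson correlation with the upset label stands in for the random forest’s Mean Decrease in Accuracy, features are not dropped between rounds, and a one-sided binomial test on each feature’s hit count (in the spirit of Boruta’s hit counting) decides confirmed, rejected, or tentative. The function names `abs_corr`, `binom_tail`, and `boruta_sketch` and the toy data are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def abs_corr(xs, ys):
    """Absolute Pearson correlation: a stand-in for random-forest importance."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return abs(cov / (sx * sy))


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(features, labels, n_iter=50, alpha=0.05, seed=0):
    """features: dict of name -> values; labels: 1 = upset, 0 = no upset (hypothetical)."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: a freshly shuffled copy of every real feature
        best_shadow = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, labels))
        # A "hit" means a real feature beat the best shadow feature this round
        for name, values in features.items():
            if abs_corr(values, labels) > best_shadow:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if binom_tail(h, n_iter) < alpha:  # far more hits than a coin flip
            decisions[name] = "confirmed"
        elif binom_tail(n_iter - h, n_iter) < alpha:  # far fewer hits than a coin flip
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Hypothetical toy data: one stat that tracks the outcome, one pure-noise stat
data_rng = random.Random(1)
labels = [i % 2 for i in range(60)]
data = {
    "signal": [2 * y + data_rng.random() for y in labels],
    "noise": [data_rng.random() for _ in labels],
}
result = boruta_sketch(data, labels)
print(result)
```

On this toy data the `signal` stat comes back confirmed, because its real correlation with the label beats every shuffled copy in every round.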
Resume, Bart Torvik’s ranking of a team’s tournament resume, was deemed most important, with a relative importance of 16.2%. A team’s performance against quality competition, which is essentially what Resume, Elite Strength of Schedule, NET Ranking, and Wins Above Bubble capture, is clearly crucial: together these stats account for over 60% of the total importance.
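Rescaling raw importance scores to percentages, and then adding up a group of them, is simple arithmetic. The minimal sketch below uses raw scores invented purely for illustration (they are not the article’s results); only the four quality-of-competition stat names come from the text, and `to_percent` and `group_share` are hypothetical helpers.

```python
def to_percent(raw):
    """Rescale raw importance scores so they sum to 100."""
    total = sum(raw.values())
    return {name: 100 * score / total for name, score in raw.items()}


def group_share(pct, names):
    """Combined percentage for a group of stats."""
    return sum(pct[name] for name in names)


# Hypothetical raw scores, invented for illustration (not the article's results)
raw = {
    "resume": 8.0,
    "elite_sos": 6.0,
    "net_rank": 5.0,
    "wab": 5.0,
    "other_stat_a": 4.0,
    "other_stat_b": 2.0,
}
pct = to_percent(raw)
quality = group_share(pct, ["resume", "elite_sos", "net_rank", "wab"])
print(f"{quality:.1f}%")  # 24 of 30 raw points, prints 80.0%
```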
The Most Typical Winning Underdogs & 2025 Tournament Upset Candidates
To identify the teams that most resembled a typical “big upset” winning underdog, I ran a K-Means clustering algorithm on the same data as above, using the factors deemed important as the variables. To see which teams we might expect to pull off an upset this year, I also included 2025 teams ranked outside the KenPom top 25. This leaves out the stronger teams, which are likely to receive a 1-6 seed and are therefore unlikely to get a “big upset” opportunity. The result was the Euclidean distance between each team and the “Average Winning Underdog” from 2008-2024.3 From there, we can see which teams most resemble an upset candidate.
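Measuring how close a team sits to the “Average Winning Underdog” can be sketched as follows. With a single cluster, K-Means converges to the plain mean of all points, so this sketch computes that centroid directly instead of running K-Means. It also assumes each stat is z-scored on the historical underdogs first (the article does not say whether the stats were rescaled), so that stats on large scales do not dominate the distance. The stat keys, team names, and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical subset of the selected stats
KEYS = ["resume", "elite_sos", "net_rank", "wab"]


def fit_scaler(rows):
    """Per-stat mean and standard deviation over the historical underdogs."""
    mu = {k: mean(r[k] for r in rows) for k in KEYS}
    sd = {k: pstdev([r[k] for r in rows]) or 1.0 for k in KEYS}  # guard constant stats
    return mu, sd


def scale(row, mu, sd):
    """Z-score one team's stats into a vector."""
    return [(row[k] - mu[k]) / sd[k] for k in KEYS]


def rank_by_distance(history, candidates):
    """Rank candidates by Euclidean distance to the average winning underdog."""
    mu, sd = fit_scaler(history)
    scaled_history = [scale(r, mu, sd) for r in history]
    # One-cluster K-Means centroid = the mean of every historical point
    centroid = [mean(col) for col in zip(*scaled_history)]
    scored = [(name, math.dist(scale(row, mu, sd), centroid)) for name, row in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical numbers, for illustration only
history = [
    {"resume": 40, "elite_sos": 60, "net_rank": 35, "wab": 1.0},
    {"resume": 50, "elite_sos": 70, "net_rank": 45, "wab": 0.0},
    {"resume": 60, "elite_sos": 80, "net_rank": 55, "wab": -1.0},
]
candidates = {
    "Team A": {"resume": 50, "elite_sos": 70, "net_rank": 45, "wab": 0.0},
    "Team B": {"resume": 90, "elite_sos": 20, "net_rank": 100, "wab": -5.0},
}
ranking = rank_by_distance(history, candidates)
for name, dist in ranking:
    print(f"{name}: {dist:.2f}")
```

Team A matches the historical average exactly, so it ranks first with a distance of zero; smaller distances mean a closer resemblance to a typical Cinderella.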
This year’s Memphis Tigers topped the charts. CBS’s Jerry Palm lists Memphis as a projected 6-seed as of Sunday morning, so they may not get a “big upset” opportunity in the first round, but they could get one further down the line, especially if they drop to a 7-seed. The second team on this list, last year’s Oregon Ducks, were an 11-seed and beat 6-seed South Carolina in the first round before falling to 3-seed Creighton. Seventh on the list is San Diego State, a program that advanced all the way to the 2023 National Championship game as a 5-seed, knocking off 1-seed Alabama along the way. Below is a breakdown of this year’s top upset contenders, along with the past teams they most resemble. I excluded teams highly unlikely to make the tournament (those off Joe Lunardi’s and Jerry Palm’s radars) from the table below.
The Multi-Win Underdog
Sometimes an upset is just a fluke. We can all probably think of a time when a team somehow had every break go its way to squeak by a higher seed, only to get throttled in the next round. To examine “big upset” teams that went on to make a run, I ran the exact same procedure, but this time limited the scope to teams that won multiple NCAA tournament games, with at least one of those wins being a 4+ seed upset. Below we see which teams most resemble the average multi-win “big upset” team.
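The “big upset” definition and the multi-win filter can be sketched in a few lines of Python. This is a minimal sketch assuming each game is stored as a hypothetical `(year, winner, winner_seed, loser_seed)` tuple; the helper names and team names are illustrative, not taken from the article’s dataset.

```python
def is_big_upset(winner_seed, loser_seed):
    """True when the winner was seeded at least 4 lines worse than the loser."""
    return winner_seed - loser_seed >= 4


def multi_win_big_upset_teams(games):
    """games: (year, winner, winner_seed, loser_seed) tuples, a hypothetical layout."""
    wins = {}
    big_upset_winners = set()
    for year, team, winner_seed, loser_seed in games:
        key = (year, team)
        wins[key] = wins.get(key, 0) + 1
        if is_big_upset(winner_seed, loser_seed):
            big_upset_winners.add(key)
    # Keep team-seasons with 2+ wins, at least one of them a big upset
    return sorted(k for k, n in wins.items() if n >= 2 and k in big_upset_winners)


# Hypothetical tournament results
games = [
    (2024, "Team X", 11, 6),   # big upset (5 seed lines)
    (2024, "Team X", 11, 14),  # second win, as the favorite
    (2024, "Team Y", 12, 5),   # big upset, but only one win
    (2024, "Team Z", 3, 14),   # two wins, both as the favorite
    (2024, "Team Z", 3, 6),
]
print(multi_win_big_upset_teams(games))  # [(2024, 'Team X')]
```

Only Team X qualifies: Team Y has a big upset but a single win, and Team Z has two wins but no upset.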
We see a lot of 2025 teams on this list. Unfortunately, several of them, including Nevada, Minnesota, and Utah at the top, are highly unlikely to make the tournament, barring a big run to win their conference tournaments. If one of them does sneak in, however, watch out! VCU ran all the way to the Final Four as an 11-seed in 2011, becoming the first team to do so after starting in the “First Four” play-in round. The 2013-14 Dayton team advanced to the Elite 8 as an 11-seed, while last year’s N.C. State Wolfpack, as we discussed earlier, went from off the bubble to the Final Four. Below is a table similar to the one above, this time for the top 2025 multi-win underdog candidates with a reasonable chance to make the tournament.
While we will all surely be shocked by some upset in a few weeks, I hope this helps you somewhat in filling out your brackets. And if this helps you at all in winning some cash in your bracket pool, don’t forget about me =).
1. Their ACC Tournament run sadly began by beating my Syracuse Orange =(. Unfortunately, that is FAR from the worst moment of Syracuse basketball over the past two seasons, but that’s a conversation for a different day…
2. The stats were pulled from the March Madness dataset on Kaggle.
3. The “Average Winning Underdog” is a hypothetical team with the average value of every statistic among the winning underdog teams in the dataset.