
Randomness, uncertainty, and the optimal college football championship tournament size

Abstract

Every year, there is a popular debate over how many teams should take part in the NCAA’s FBS-level college football championship tournament, and especially whether it should be expanded from 4 teams to 8 or even 12. The inherent tradeoff is that the larger the tournament, the higher the probability that the true best team is included (“validity”), but the lower the probability that the true best team will avoid being upset and win the tournament (“effectiveness”). Using simulation based on empirically-derived estimates of the ability to measure true team quality and the amount of randomness inherent in each game, we show that the effect of expanding the tournament to 8 teams could be very small, an effectiveness decrease of only 2-3% while increasing validity by 1-4%, while a 7-team tournament provides slightly better tradeoffs. A 12-team tournament would decrease effectiveness by 5-6%.

1 Introduction

In 2012, the National Collegiate Athletic Association (NCAA) approved using a four-team postseason playoff tournament to determine a national champion in college football at the FBS level (the highest level of intercollegiate competition), starting in 2014. The decision effectively doubled the number of teams involved in the postseason tournament, and there was immediate discussion, which continues today, about whether an 8-team tournament (or larger) would be even better. In this paper, we address the question of the optimal size of this postseason tournament.

Each year since at least 1936, a national champion has been chosen among FBS (formerly Division I-A) college football teams. Initially, the champion was chosen by polls of experts. These expert polls included the Associated Press (AP) poll of sportswriters (the first major national poll) from 1936 to 1997, and various polls of coaches from 1950 to 1997 (e.g., United Press International from 1950 to 1990, USA Today/CNN from 1991 to 1996, and USA Today/ESPN in 1997). The highest-ranked team in the polls was declared national champion; in the few years that the AP and coaches’ polls disagreed, the national championship was shared between the two polls’ winners (National Collegiate Athletic Association). In 1998, a new system, the Bowl Championship Series (BCS), was created. In the BCS, expert poll rankings and analytical rankings (called “computer rankings”) were combined to determine a ranking of the top teams. From 1998 to 2005, the BCS-designated national champion was unofficial (and for the 2003 season, the AP poll chose a different champion). From 2005 to 2013, the top two teams in the BCS played in a designated postseason game, with the winner being named the official national champion (National Collegiate Athletic Association). Beginning in 2014, a new system has been in place: A panel of experts selects four teams to play a three-game, two-round single-elimination tournament, the winner of which is named national champion.

The progression from polls (effectively a one-team tournament) to the BCS championship game (a two-team tournament) to a four-team tournament has been motivated partly by economics (the paid attendance and television revenue of playoff games are substantial), but even more so by a grassroots feeling that it is necessary to determine a champion “on the field” (i.e., by playing games) rather than by poll or computer, because neither voters nor algorithms are guaranteed to identify the absolute best team(s). Before the BCS system, the top teams in the polls rarely played each other at the end of the season, and it could be difficult to differentiate between the best teams. The novelty of the BCS system was that the two highest-ranked teams were guaranteed to play each other at the end of the season. The argument in favor of having even more teams in the championship tournament is that even the top two can be difficult to differentiate from a larger set of very good teams, so letting the teams play each other is the fairest way to sort out which one is really the best. (Even in the BCS system, there could be significant disagreement as to the selection of the two playoff teams, for example in 2004 when three major-conference teams (USC, Oklahoma, and Auburn) each won all of their regular-season games.)

On the other hand, the outcomes of sporting events contain enough randomness that the winner of a game is not necessarily the better team, and it is possible that a committee, although it might make mistakes, could have a higher probability of correctly identifying the best team than playoff games that are subject to football’s inherent randomness. Today’s tournament selection committee has two advantages over polls of the past: better information (many more games are televised to a national audience, video recording/playback capability allows them to see multiple games that are played at the same time, and more and deeper statistical information is available about teams and players) and better analytics (many of the top quantitative rating and evaluation systems were not developed in the polling era). As a result, the playoff selection committee is likely to have a smaller error in its evaluation of teams than polls had in the past. The progression of tournament size has actually been opposite what intuition might suggest is optimal: As human experts have been given the tools to make better judgments and decrease their likelihood of error, the tournament size has expanded, increasing the chance that a team correctly identified as the best by the human experts will fail to win the tournament.

In this paper, we investigate the optimal size of the college football national championship tournament by taking into account the relative magnitudes of the randomness inherent in college football and the errors in team evaluation by humans and algorithms.

2 Literature review

Optimal tournament design has been studied before, but none of the existing literature is sufficient to answer our research question. One main stream of optimal tournament size research (e.g., Dizdar 2013; Fullerton and McAfee 1999) has focused on the issue of effort, especially in research tournaments. Given assumptions on the technology and knowledge available to each firm that might enter a research competition, and on their probabilities of winning, these papers use game-theoretic models to estimate how much effort each competitor would spend, and use that analysis to determine the optimal number of participants, how to select participants, etc. Others (e.g., Chen, Ham, and Lim 2011; Hochtl et al. 2010; Sheremeta and Wu 2012) try to empirically test such predictions. These papers all start with the basic assumption that firms have more than one way to spend effort, so they might choose to put forth less effort in competitions where they are less likely to win (and thus a firm that is likely to succeed might also not need to put in maximum effort). In our work, we sidestep this issue, presuming that every team in a national championship tournament has just one football goal (to win the tournament) and will put forth maximum effort.

A second stream of research in designing optimal tournaments concerns not the composition, but rather the structure. Glenn (1960), Marchand (2002), Scarf and Bilbao (2006), and Seals (1963), among others, compare different tournament setups such as round-robin, pure knockout, and hybrids. In our work, we assume that the NCAA will retain its single-elimination (knockout) round-based format. Glickman (2008), Hwang (1982), and Schwenk (2000) look at adaptive approaches where tournaments may be re-seeded between rounds; in our work, we assume that the NCAA will not re-seed, so fans can make travel plans in advance (as is the case currently for the existing 68-team NCAA basketball tournament).

Other research, assuming a non-reseeded knockout tournament, looks at the optimality of the standard seeding of teams into tournament slots. Appleton (1995), Groh et al. (2012), Jennessy and Glickman (2016), Horen and Riezman (1985), Ryvkin (2005), and Vu (2010) investigate how to seed teams so that various objectives are optimized. For example, Horen and Riezman (1985) show that under some assumptions about team strength and head-to-head win probability, for a 3-round, 8-team tournament the standard seeding method does not maximize the probability that the best team will win. Jennessy and Glickman (2016) show the same empirically for 16-team tournaments, using a Bayesian approach that considers uncertainty in team strength. However, we assume that the NCAA will retain standard seeding, to retain fairness properties that Vu (2010) calls envy-freeness (that in every round, each team’s best-possible opponent must be weaker than the best-possible opponent of all lower-seeded teams) and delayed confrontation (that the top $2^k$ teams may not play each other until only $2^k$ or fewer teams remain in the tournament). We also assume that the NCAA will start every game with the standard 0-0 score, rather than giving one team some initial points based on an estimate of how much better they are than their opponent, as in the approach of Paine (2014).

The objective function when referring to an “optimal” tournament can be defined in different ways. Most research assumes a goal of maximizing the probability that the tournament winner will be the best team (Appleton 1995; David 1988; Glenn 1960; Glickman 2008; Jennessy and Glickman 2016; Hwang 1982; Marchand 2002; Schwenk 2000; Seals 1963; Vu 2010); others maximize the probability that the winner will be a certain team (Vu 2010), maximize the quality of the winner’s result in research tournaments (Dizdar 2013; Fullerton and McAfee 1999), minimize the fraction of unimportant games (Scarf and Bilbao 2006; Scarf and Shi 2008), maximize the average rank of the winner (Scarf and Bilbao 2006), maximize the average revenue of the tournament (Vu 2010), maximize the probability of the top two teams meeting in the final (Jennessy and Glickman 2016), maximize the consistency between expected number of wins and team strength (Jennessy and Glickman 2016), etc. Sokol (2010) also considered the number of significant upsets in a tournament as a driver of fan interest. In this paper, we consider two objectives. The primary objective is the probability of correctly identifying the best team, i.e., the probability that the best team wins the tournament; we refer to this as the effectiveness of the tournament. We also discuss secondarily the probability that the best team is selected to play in the tournament; we refer to this as the tournament’s validity.

Finally, and critically for our work, in the previous literature the information about each team, including its strength relative to competing teams, is almost always assumed to be known deterministically. Of all the work cited above, only Glickman (2008) and Jennessy and Glickman (2016) consider uncertainty in the strength of each team; their Bayesian models address how to seed (Jennessy and Glickman 2016) or re-seed (Glickman 2008) a tournament, not how many teams should be included.

So, none of the existing literature exactly addresses the question of how many teams should be in a tournament like the college football championship given both uncertainty in team strength estimation and randomness of game results.

The remainder of the paper is organized as follows: In Section 3, we describe our underlying models of the uncertainty and randomness in the system. In Section 4, we discuss how we populate our model with empirical data and simulate tournament results. Section 5 discusses parameterizing by the relative magnitudes of randomness and uncertainty, and we use the method of Curry and Sokol (2016) to estimate those current relative magnitudes and their effects on tournament outcomes. Finally, in Section 6 we show the simulation results, and in Section 7 we discuss the implications of our work for the optimal size of the national college football championship tournament and conclude with some final remarks.

3 Models

In this section, we describe the core model. Here and in the remainder of the paper, we refer to a random variable by an uppercase letter and a specific realization of it by the corresponding lowercase letter (e.g., a would be a single draw from the random variable A).

We let $g = (t_1, t_2)$ denote a college football game between teams $t_1$ and $t_2$, where the pair of teams is ordered lexicographically. Let $s_{t_1}^{\mathrm{True}}$ and $s_{t_2}^{\mathrm{True}}$ represent the true strengths of teams $t_1$ and $t_2$; we assume that those team strengths do not vary during a season. When teams $t_1$ and $t_2$ play each other in game $g$, the expected margin of victory (the line) $l_g^{\mathrm{True}}$ for team $t_1$ over team $t_2$ is the difference of the team strengths:

(1)
$$l_g^{\mathrm{True}} = s_{t_1}^{\mathrm{True}} - s_{t_2}^{\mathrm{True}}.$$

In other words, in the absence of randomness, team $t_1$ would beat team $t_2$ by $l_g^{\mathrm{True}}$ points. Note that if $t_2$ is a better (stronger) team than $t_1$ at the time of game $g$, then $l_g^{\mathrm{True}}$ will be negative. (Equation (1) assumes the teams play on a neutral field, i.e., neither team has the advantage of playing at its home stadium. If one team is playing at home, an additional term for home-field advantage would be added.)

Of course, true team strengths $s_t^{\mathrm{True}}$ are not exactly known, and different observers (e.g., experts, poll respondents, computer rating systems, etc.) will have different estimates of $s_t^{\mathrm{True}}$. Based on the results, game statistics, and observed play in previous games, each observer $i$ has an estimate $s_{t,g}^{i}$ of the strength of team $t$ at the time of game $g$. The difference between $s_{t,g}^{i}$ and $s_t^{\mathrm{True}}$ is the observation error of observer $i$ for team $t$, a random variable which we denote as $e_{t,g}^{i}$. Therefore,

(2)
$$s_{t,g}^{i} = s_t^{\mathrm{True}} + e_{t,g}^{i}.$$

We assume that for each observer $i$, the values of $e_{\cdot,\cdot}^{i}$ are independent and identically distributed across teams and games.

When teams $t_1$ and $t_2$ play each other in a game $g$ on a neutral field, as is the case in the postseason tournament, the margin of victory predicted by observer $i$ will be

(3)
$$l_g^{i} = s_{t_1,g}^{i} - s_{t_2,g}^{i} = (s_{t_1}^{\mathrm{True}} + e_{t_1,g}^{i}) - (s_{t_2}^{\mathrm{True}} + e_{t_2,g}^{i}) = l_g^{\mathrm{True}} + (e_{t_1,g}^{i} - e_{t_2,g}^{i}).$$

The outcome of a game is assumed to be based on a random process; even a perfect observer (for whom $s_{t,g}^{i} = s_t^{\mathrm{True}}$ for all $t$ and $g$) will be unable to exactly predict the margin of victory (i.e., the number of points by which team $t_1$ wins game $g$). We denote by $R$ the random variable for the error in the prediction, so that for game $g = (t_1, t_2)$ the actual margin of victory $m_g$ of team $t_1$ over team $t_2$ is different from $l_g^{\mathrm{True}}$ by $r_g$:

(4)
$$m_g = l_g^{\mathrm{True}} + r_g = (s_{t_1}^{\mathrm{True}} - s_{t_2}^{\mathrm{True}}) + r_g.$$

$R$ includes all of the random factors that might affect the outcome of a football game, such as day-to-day performance variation, weather, the direction a loose ball bounces on the ground, etc. We assume that the values of $r$ are independent and identically distributed across games, and that the distribution of $e_{t,g}^{i}$ is independent of the value $s_t^{\mathrm{True}}$.

In reality, there is no data to tell us true team strengths $s^{\mathrm{True}}$. Rearranging terms in Equation (3) and solving for $l_g^{\mathrm{True}}$, or solving Equation (2) for $s_t^{\mathrm{True}}$ and substituting into Equation (4), yields that for each observer $i$, the observed margin of victory is

(5)
$$m_g = l_g^{\mathrm{True}} + r_g = (s_{t_1,g}^{i} - s_{t_2,g}^{i}) + (e_{t_2,g}^{i} - e_{t_1,g}^{i} + r_g).$$

Thus, for game $g$, observer $i$’s prediction error $x_g^{i}$ is

(6)
$$x_g^{i} = m_g - (s_{t_1,g}^{i} - s_{t_2,g}^{i}) = e_{t_2,g}^{i} - e_{t_1,g}^{i} + r_g.$$

There are a variety of observers who publish their estimates of team strengths $s$. In this paper, we first demonstrate the model using the Sagarin ratings (Sagarin) as the observer, for two reasons: availability and quality. End-of-season ratings¹ for each college football team are available in the format we need (where the difference between two teams’ ratings is the estimate of the margin of victory in a game between those teams) for all years from 1998 to 2019 (we omit 2020 because of the different schedules and playing conditions in the COVID year), and the empirical standard error in the Sagarin ratings’ margin-of-victory predictions (i.e., the observations of $x^{\mathrm{Sag}}$) is less than a point higher than that of the Las Vegas betting line, a common standard for game prediction quality.

As we note in Section 4, both the Sagarin ratings’ predictions $L^{\mathrm{Sag}}$ and the Las Vegas line $L^{\mathrm{Vegas}}$ have normally-distributed errors $X^{\mathrm{Sag}}$ and $X^{\mathrm{Vegas}}$ with means that are not significantly different from zero. Others have also noted and/or used this normal distribution in football (e.g., Gill 2000; Berry 2003; Fanson 2020). Since $X$ is empirically shown to be normally distributed with mean zero, we make the mild assumption that its components $E$ (error in team-strength estimates) and $R$ (in-game randomness) are also both normally distributed and all independent of each other, with variances $\sigma_{E^{\cdot}}^2$ and $\sigma_R^2$, respectively, and mean zero for $R$. (Because ratings are relative to each other, the mean $\mu_E$ of $E$ is not important.) Therefore, for each $i$ (Sagarin and Vegas), for any game the three independent components of the observed prediction error $x^{i}$ are

(7)
$$e_{t_1,g}^{i} \sim N(\mu_{E^i}, \sigma_{E^i}^2), \quad e_{t_2,g}^{i} \sim N(\mu_{E^i}, \sigma_{E^i}^2), \quad \text{and} \quad r_g \sim N(0, \sigma_R^2),$$

so

(8)
$$X^{i} \sim N(0,\; 2\sigma_{E^i}^2 + \sigma_R^2).$$

The fraction of the variance in $X^{i}$ that is attributable to error in team strength estimates ($2\sigma_{E^i}^2$) and to in-game randomness ($\sigma_R^2$) is not known. In Section 5, we discuss how we parameterize our results on the fraction of variation attributable to in-game randomness, but first, in the next section, we discuss how we use our model to create the simulated tournaments that we use for our analysis.
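The variance decomposition in Equation (8) can be checked numerically. The following is a minimal Python sketch, not the authors' code: it draws the three error components of Equation (7), forms the prediction error of Equation (6), and compares the sample variance with $2\sigma_E^2 + \sigma_R^2$. The function name `simulate_prediction_errors` and the values `sigma_e = 6.0` and `sigma_r = 14.0` are illustrative assumptions.

```python
import random
import statistics


def simulate_prediction_errors(sigma_e, sigma_r, n_games=200_000, seed=0):
    """Draw prediction errors x = e_t2 - e_t1 + r, as in Equation (6).

    The mean of E is set to zero here; it cancels in the difference anyway.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_games):
        e_t1 = rng.gauss(0.0, sigma_e)  # observer's error for team t1
        e_t2 = rng.gauss(0.0, sigma_e)  # observer's error for team t2
        r_g = rng.gauss(0.0, sigma_r)   # in-game randomness
        errors.append(e_t2 - e_t1 + r_g)
    return errors


# Illustrative (assumed) parameter values, not estimates from the paper.
sigma_e, sigma_r = 6.0, 14.0
errors = simulate_prediction_errors(sigma_e, sigma_r)

empirical_mean = statistics.fmean(errors)
empirical_var = statistics.pvariance(errors)
theoretical_var = 2 * sigma_e ** 2 + sigma_r ** 2  # Equation (8)

print(f"mean of x:          {empirical_mean:.3f}")
print(f"sample variance:    {empirical_var:.1f}")
print(f"2*sigma_E^2+sigma_R^2: {theoretical_var:.1f}")
```

The sample variance should land close to the theoretical value, which illustrates why the observed prediction error alone cannot separate the two sources of variance: many pairs of `sigma_e` and `sigma_r` give the same total.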

4 Simulating tournaments

Our tournament simulation has four basic steps:

  1. Draw observed team strengths $s^{\mathrm{Obs}}$ for each team from an empirical distribution of historical ratings. The observed team strengths correspond to the opinions of the tournament selection committee, so the teams chosen for the tournament and their seeding in the tournament are based on the observed team strengths.

  2. Generate true team strengths $s^{\mathrm{True}}$ for each team based on the observed team strength and a randomly-generated observation error from the distribution of $E$. The $s^{\mathrm{True}}$ are the teams’ actual strengths, so game outcomes in the simulated tournaments are based on these.

  3. Seed the tournament based on observed team strengths $s^{\mathrm{Obs}}$.

  4. Simulate the winner and loser of each tournament game based on true team strengths $s^{\mathrm{True}}$ and in-game randomness $R$.

Types of simulated tournaments

There are four types of tournament setups that we simulate. In some, like the current football playoff system, the top (observed) teams are the tournament participants regardless of whether or not they are champions of their conferences. We refer to this type of tournament as a fully-open tournament. Another approach, like the current NCAA basketball tournament system, is to guarantee participation to conference champions regardless of their ranking. We refer to this type of tournament as a partially-open tournament. For football, most proposals have been to guarantee a spot in the tournament only to winners of the “Power Five” conferences: the Atlantic Coast Conference (ACC), Big 12 Conference (Big 12), Big Ten Conference (Big Ten), Pac-12 Conference (Pac-12), and Southeastern Conference (SEC).

Some proposed tournament setups have included guaranteeing that the highest-ranked non-Power-Five team would be included in the tournament. This guarantee could be included in both fully-open and partially-open tournaments, yielding the full set of four tournament types that we test (see Table 1). The non-Power-Five teams are from the American Athletic Conference (AAC), Conference USA (C-USA), Mid-American Conference (MAC), Mountain West Conference (MWC), and Sun Belt Conference (Sun Belt), which collectively are called the “Group of Five” conferences, plus any teams that are independent (not playing in a conference, but part of the FBS; in our simulations we do not include Notre Dame in this category because they are viewed like a Power-Five team, and in fact they play in a Power-Five conference for non-football sports).

Table 1

Tournament types tested

| Tournament type | Tournament size k | Description |
|---|---|---|
| Fully-open | 1 ≤ k ≤ 128 | k highest-ranked teams regardless of conference affiliation and conference champion status |
| Fully-open with non-Power-Five guarantee | k = 1 | Highest-ranked non-Power-Five team |
| | 2 ≤ k ≤ 128 | Highest-ranked non-Power-Five team, and k - 1 highest-ranked other teams |
| Partially-open | 1 ≤ k ≤ 5 | k highest-ranked Power-Five conference champions |
| | 6 ≤ k ≤ 128 | All five Power-Five conference champions, and k - 5 highest-ranked other teams |
| Partially-open with non-Power-Five guarantee | 1 ≤ k ≤ 5 | k highest-ranked teams from among the highest-ranked non-Power-Five team and the Power-Five conference champions |
| | k = 6 | All five Power-Five conference champions, and highest-ranked non-Power-Five team |
| | 7 ≤ k ≤ 128 | All five Power-Five conference champions, and highest-ranked non-Power-Five team, and k - 6 highest-ranked other teams |

Because we are going to compare different types of tournaments as well as tournament sizes, we split the process into two parts. In each run of the simulation, we first generate a set of teams with observed and real strengths, using Steps 1 and 2. Then, we simulate each type and size of tournament using Steps 3 and 4. We next describe in more detail each of the steps.

Drawing observed team strengths

We use Sagarin rating data for the past eleven years, 2009-2019, as the set of empirically observed team ratings (we use only ratings for teams in the NCAA’s FBS). There were 120 FBS teams in 2009-2011, 124 in 2012, 125 in 2013, 128 in 2014-2016, and 130 in 2017-2019. The ratings varied from a high of 105.35 (Clemson in 2016) to a low of 30.72 (Massachusetts in 2019). The overall distribution of ratings passes the Anderson-Darling and Kolmogorov-Smirnov tests for normality, but we observed that the tails are not quite a good fit in the normal probability plot. Because the behavior of the upper tail (i.e., the best teams) is a primary focus of this paper, we therefore chose not to model the observed ratings with a normal distribution; instead, we used the eleven years (1383 data points) of Sagarin data as an empirical distribution. Tables 10, 11, and 12 in Appendix 1 show the full set of Sagarin ratings from 2009 to 2019.

For partially-open tournaments, it is important to know which teams are the Power-Five conference champions and which is the top-rated non-Power-Five team. Therefore, we keep that data separate, and draw from those empirical distributions separately. Tables 13 and 14 show the Power Five conference champions and top-rated non-Power-Five teams from 2009-2019.

For each of the simulated data sets (one for each run of the simulation), we draw $s_t^{\mathrm{Obs}}$ for each of 128 observed team strengths at the time of tournament selection: For each Power-Five conference we draw one rating from the set of its champions’ data, we draw one rating from the set of top-ranked non-Power-Five team ratings, and we draw the remaining 122 from the full set of remaining Sagarin ratings (excluding the Power-Five conference champions and the top-ranked non-Power-Five teams). The observed rating distributions for each Power-Five conference’s champion are sufficiently different that we draw once from each conference’s empirical distribution, rather than five times from the combined data set.
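The drawing procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `draw_observed_strengths`, the dictionary layout, and every number in the three pools are hypothetical placeholders (they are not Sagarin ratings), and sampling with replacement is assumed because the text does not say otherwise.

```python
import random


def draw_observed_strengths(champion_pools, top_non_p5_pool, other_pool,
                            n_teams=128, seed=None):
    """Draw one simulated field of observed ratings.

    One rating per Power-Five conference from that conference's own pool of
    champion ratings, one from the pool of top non-Power-Five ratings, and
    the rest from the pool of all remaining ratings (with replacement,
    which is an assumption of this sketch).
    """
    rng = random.Random(seed)
    teams = []
    for conference, pool in champion_pools.items():
        teams.append({"role": "p5_champion", "conference": conference,
                      "rating": rng.choice(pool)})
    teams.append({"role": "top_non_p5", "conference": None,
                  "rating": rng.choice(top_non_p5_pool)})
    while len(teams) < n_teams:
        teams.append({"role": "other", "conference": None,
                      "rating": rng.choice(other_pool)})
    return teams


# Placeholder pools only; real use would load the 2009-2019 data instead.
champion_pools = {
    "ACC": [88.0, 92.5, 99.0],
    "Big 12": [86.5, 90.0, 94.0],
    "Big Ten": [87.0, 93.0, 98.5],
    "Pac-12": [84.0, 89.5, 95.0],
    "SEC": [91.0, 96.0, 101.0],
}
top_non_p5_pool = [80.0, 83.5, 87.0]
other_pool = [40.0 + 0.5 * j for j in range(100)]  # 40.0, 40.5, ..., 89.5

field = draw_observed_strengths(champion_pools, top_non_p5_pool,
                                other_pool, seed=42)
print(len(field), "teams drawn")
```

Keeping a separate pool per conference mirrors the statement that the champions' rating distributions differ enough between conferences to be sampled individually.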

Generating true team strengths

To simulate games in a tournament, we need each team’s true strength $s_t^{\mathrm{True}}$. The values of $s_t^{\mathrm{True}}$ are unknown (otherwise, there would be no need for a tournament to determine the best team); what is known is only $s_t^{\mathrm{Obs}}$, each team’s observed strength. Therefore, we need to generate for the simulation a set of true strengths $s_t^{\mathrm{True}}$ based on the observed strengths $s_t^{\mathrm{Obs}}$.

Given the set of observed team strengths, we use a conditional probability approach to randomly generate true team strengths. The normality of the overall distribution of Sagarin ratings allows us to model the distribution of team observed strength $S^{\mathrm{Obs}}$ as the sum of independent draws from two normal distributions: the true strength $S^{\mathrm{True}}$ (a normal distribution with mean $\mu_{\mathrm{Sag}}$ and variance $\sigma_{\mathrm{True}}^2$) and the estimation error $E^{\mathrm{Obs}}$ (a normal distribution with mean 0 and variance $\sigma_{E^{\mathrm{Obs}}}^2$). Because we are using the Sagarin data, $S^{\mathrm{Obs}} = S^{\mathrm{Sag}}$, $E^{\mathrm{Obs}} = E^{\mathrm{Sag}}$, and $\sigma_{E^{\mathrm{Obs}}}^2 = \sigma_{E^{\mathrm{Sag}}}^2$.

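In Appendix 3, we show that $S^{\mathrm{True}} \mid S^{\mathrm{Obs}}$ is normally distributed, according to

(9)
$$N\left(s_t^{\mathrm{Obs}} - (s_t^{\mathrm{Obs}} - \mu_{\mathrm{Sag}})\,\frac{\sigma_{E^{\mathrm{Sag}}}^2}{\sigma_{\mathrm{Sag}}^2},\;\; \frac{(\sigma_{\mathrm{Sag}}^2 - \sigma_{E^{\mathrm{Sag}}}^2)\,\sigma_{E^{\mathrm{Sag}}}^2}{\sigma_{\mathrm{Sag}}^2}\right).$$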

In Equation (9), $\mu_{\mathrm{Sag}}$ and $\sigma_{\mathrm{Sag}}^2$ are observed data, and $s_t^{\mathrm{Obs}}$ is drawn from Sagarin data as described above. In Section 5, we describe how we deal with the unknown $\sigma_{E^{\mathrm{Sag}}}^2$ by parameterizing, bounding, and using a natural-experiment approach for estimation.

For a single value of $\sigma_{E^{\mathrm{Sag}}}^2$, we could generate $s_t^{\mathrm{True}}$ from $s_t^{\mathrm{Obs}}$ by drawing from the distribution in Equation (9). However, to compare across multiple values of $\sigma_{E^{\mathrm{Sag}}}^2$, we instead generate a z-score $z_t$ for each team, and use the same z-score for each value of $\sigma_{E^{\mathrm{Sag}}}^2$.
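As a minimal sketch of this step, the Python function below converts an observed rating and a fixed z-score into a true strength using the mean and variance of Equation (9). The function name `true_strength`, the example ratings, and the value `MU_SAG = 70.0` are hypothetical (the mean of the Sagarin ratings is not stated in this section); the variance of 169 is the approximate value quoted in Section 5.

```python
import math
import random


def true_strength(s_obs, z, mu_obs, var_obs, sigma_e):
    """Map an observed rating and a z-score to a true strength, Equation (9)."""
    shrink = sigma_e ** 2 / var_obs                 # sigma_E^2 / sigma_Sag^2
    mean = s_obs - (s_obs - mu_obs) * shrink        # conditional mean
    variance = (var_obs - sigma_e ** 2) * shrink    # conditional variance
    # Guard against tiny negative values caused by floating-point rounding.
    return mean + z * math.sqrt(max(variance, 0.0))


MU_SAG = 70.0    # hypothetical mean rating, for illustration only
VAR_SAG = 169.0  # approximate variance of Sagarin ratings (Section 5)

rng = random.Random(1)
observed = [95.0, 88.5, 71.0]                        # illustrative ratings
z_scores = [rng.gauss(0.0, 1.0) for _ in observed]   # one z-score per team

# Re-using the same z-scores makes results comparable across sigma_E values.
for sigma_e in (0, 6, 13):
    strengths = [true_strength(s, z, MU_SAG, VAR_SAG, sigma_e)
                 for s, z in zip(observed, z_scores)]
    print(sigma_e, [round(x, 2) for x in strengths])
```

With `sigma_e = 0` the true strength equals the observed rating, and at the upper bound `sigma_e = 13` the observation carries no information, so every team collapses to the mean. Both limits follow directly from Equation (9).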

Seeding the tournament

Because there are approximately 128 teams playing FBS-level football each year, and 128 is a convenient power of 2 for a single-elimination tournament, we use 128-team tournaments in our simulations. In a full 128-team tournament, the teams are seeded into a 7-round single-elimination structure, in order of observed rating. In the first round, the $i$th-highest-rated team plays against the $(2^7 - i + 1)$th-highest-rated team, for every $i = 1, \ldots, 2^7$. In subsequent rounds, previous-round winners are matched so that, if higher-rated teams always win, the sum of the ranks of teams playing against each other in round $r$ would always equal $2^{7-r+1} + 1$; otherwise, a lower-rated team that beats a higher-rated team would take the higher-rated team’s place in the next round. Figure 1 shows the structure of a 3-round size-8 tournament as an example. This is a common structure for single-elimination tournaments (for example, it is used in the NCAA basketball championship tournament).

Fig. 1

Structure of a 3-round size-8 tournament.

In a size-128 tournament with fewer than 128 teams, a team automatically advances to the next round if it has no opponent in the current round. For example, in Figure 1, if there were only three teams, Teams 1, 2, and 3 would have no opponents in the first round, so they would automatically advance to the second round. In the second round, Team 1 would again have no opponent (since it would normally play the winner of the game between Teams 4 and 5), so it would automatically advance to the third round, where it would play against the winner of the second-round game between Teams 2 and 3. Automatic advancement in the absence of an opponent is called a bye.
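The standard seeding and the bye rule can be sketched in a few lines of Python. This is an illustrative construction rather than the authors' implementation; the function names `bracket_order` and `first_round` are hypothetical. The bracket is built by repeatedly pairing each seed with the seed that makes the pair sum to one more than the current bracket size, which reproduces the rank-sum property described above.

```python
def bracket_order(n_rounds):
    """Return seeds in bracket-slot order for a 2**n_rounds bracket.

    Adjacent slots (0,1), (2,3), ... meet in round 1; adjacent blocks of
    two slots meet in round 2, and so on.
    """
    order = [1]
    for r in range(1, n_rounds + 1):
        size = 2 ** r
        # Each existing seed is paired with the seed that completes size + 1.
        order = [s for seed in order for s in (seed, size + 1 - seed)]
    return order


def first_round(n_rounds, n_teams):
    """First-round pairings; None marks an empty slot (the opponent gets a bye)."""
    slots = [s if s <= n_teams else None for s in bracket_order(n_rounds)]
    return list(zip(slots[0::2], slots[1::2]))


print(bracket_order(3))    # slot order for the size-8 example
print(first_round(3, 3))   # three teams in a size-8 bracket: all get byes
```

In the three-team case, seed 1 sits alone in its half of the bracket and so receives byes in the first two rounds, while seeds 2 and 3 share the other half and meet in the second round, matching the example in the text.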

The teams that play in the tournament are selected as in Table 1, according to their observed team strengths. For example, in an 8-team partially-open tournament where Power-Five conference champions and the highest-rated Non-Power-Five team are all guaranteed places in the tournament, the eight teams in the tournament would be the five Power-Five conference champions, the highest-rated Non-Power-Five team, and the highest-rated two other teams. In partially-open tournaments, we assume that teams are seeded based on their ratings without regard to conference championship or Power-Five status, similar to the NCAA basketball tournament. For example, if a conference champion team is the 8th-highest-rated team out of those participating in the tournament, then that team will be seeded 8th despite being one of the first five teams that was automatically selected for the tournament.
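The selection rules of Table 1 can be expressed as one function with two flags. The Python sketch below is a hypothetical illustration: the function name `select_teams`, the `role` labels, and the twelve-team example field with its ratings are all made up for demonstration.

```python
def select_teams(teams, k, guarantee_p5_champions=False,
                 guarantee_top_non_p5=False):
    """Pick k tournament teams by observed rating under the Table 1 rules.

    Guaranteed teams are taken first (best-rated first if k is smaller than
    the number of guarantees); remaining places go to the highest-rated
    other teams. The result is sorted by rating, i.e. in seed order.
    """
    by_rating = sorted(teams, key=lambda t: t["rating"], reverse=True)
    guaranteed = [t for t in by_rating
                  if (guarantee_p5_champions and t["role"] == "p5_champion")
                  or (guarantee_top_non_p5 and t["role"] == "top_non_p5")]
    chosen = guaranteed[:k]
    chosen_names = {t["name"] for t in chosen}
    for t in by_rating:
        if len(chosen) >= k:
            break
        if t["name"] not in chosen_names:
            chosen.append(t)
            chosen_names.add(t["name"])
    return sorted(chosen, key=lambda t: t["rating"], reverse=True)


# Hypothetical field: names, ratings, and roles are illustrative only.
raw = [("A", 95, "other"), ("B", 93, "p5_champion"), ("C", 92, "other"),
       ("D", 91, "p5_champion"), ("E", 90, "other"), ("F", 89, "p5_champion"),
       ("G", 88, "other"), ("H", 86, "p5_champion"), ("I", 85, "other"),
       ("J", 82, "p5_champion"), ("K", 80, "top_non_p5"), ("L", 79, "other")]
teams = [{"name": n, "rating": r, "role": role} for n, r, role in raw]


def names(selection):
    return "".join(t["name"] for t in selection)


print(names(select_teams(teams, 4)))              # fully-open
print(names(select_teams(teams, 8, True, True)))  # partially-open + guarantee
```

In the 8-team partially-open example with the non-Power-Five guarantee, the field consists of the five champions, the top non-Power-Five team, and the two highest-rated remaining teams, and seeding ignores how each team qualified.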

Simulating the tournament

For each simulated tournament, we calculate the probability of each team winning based on the teams’ true (simulated) team strengths and the variance $\sigma_R^2$ of in-game randomness. The calculation is straightforward. Let $p_{rt}$ be the probability that Team $t$ wins round $r$ of the tournament (defining $p_{0t} = 1$ for every team in the tournament), and $q_{tu}$ be the probability that Team $t$ would beat Team $u$ if they play head-to-head, so

(10)
$$q_{tu} = \Pr\left(N(s_t^{\mathrm{True}} - s_u^{\mathrm{True}}, \sigma_R^2) > 0\right) = \Phi\left(\frac{s_t^{\mathrm{True}} - s_u^{\mathrm{True}}}{\sigma_R}\right).$$

Let $O_{rt}$ be the set of all possible opponents for team $t$ in round $r$. Then, for each round $r$ and each team $t$,

(11)
$$p_{rt} = p_{r-1,t} \sum_{u \in O_{rt}} p_{r-1,u}\, q_{tu}.$$

Let $t^*$ be the team with the highest true strength among all teams. As defined earlier, each simulated tournament is valid if $t^*$ is one of the teams selected to play in the tournament, and the probability that the tournament is effective is equal to $p_{7,t^*}$.
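The recursion in Equations (10) and (11) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `win_probabilities` is hypothetical, strengths are assumed to be listed in bracket-slot order with `None` for an empty slot, a team facing an empty opponent block is assumed to advance with certainty (the bye rule), and the handling of `sigma_r = 0` as a deterministic outcome is an assumption of this sketch.

```python
from statistics import NormalDist


def win_probabilities(strengths, sigma_r):
    """Probability that each bracket slot wins the whole tournament.

    strengths: true strengths in bracket-slot order (length a power of 2),
    with None for empty slots.
    """
    n = len(strengths)
    phi = NormalDist().cdf

    def q(t, u):
        """Head-to-head probability that t beats u, Equation (10)."""
        diff = strengths[t] - strengths[u]
        if sigma_r == 0:  # assumed convention: the stronger team always wins
            return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
        return phi(diff / sigma_r)

    p = [0.0 if s is None else 1.0 for s in strengths]  # p_0t = 1
    block = 1
    while block < n:
        new_p = [0.0] * n
        for t in range(n):
            if strengths[t] is None:
                continue
            # Possible opponents this round: the sibling block of slots.
            start = ((t // block) ^ 1) * block
            opponents = [u for u in range(start, start + block)
                         if strengths[u] is not None]
            if not opponents:            # bye: advance automatically
                new_p[t] = p[t]
            else:                        # Equation (11)
                new_p[t] = p[t] * sum(p[u] * q(t, u) for u in opponents)
        p = new_p
        block *= 2
    return p


# Illustrative strengths for a 4-slot bracket in slot order (seeds 1, 4, 2, 3).
probs = win_probabilities([100.0, 85.0, 95.0, 90.0], sigma_r=14.0)
print([round(x, 3) for x in probs])
```

The effectiveness of a simulated tournament would then be the entry of the returned list at the slot occupied by $t^*$, or zero if $t^*$ was not selected.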

5 Parameterizing on randomness

The simulation procedure described above depends on two different parameters: the variance $\sigma_R^2$ of the randomness in college football games, and the variance $\sigma_{E^{\mathrm{Obs}}}^2$ in the error in team strength estimations (the difference between the observed ratings and the true ones). Neither of those variances is known, but we can obtain results by parameterizing over $\sigma_R$ and $\sigma_{E^{\mathrm{Obs}}}$ (which is really $\sigma_{E^{\mathrm{Sag}}}$ because we use the Sagarin ratings as our observed team strengths). We test values of $\sigma_R \in \{0, 1, 2, \ldots, 16\}$ and $\sigma_{E^{\mathrm{Obs}}} \in \{0, 1, 2, \ldots, 13\}$. Appendix 2 shows the validity and effectiveness of each tournament from 1 to 128 teams, of each of the four types in Table 1, for each of the 17 × 14 = 238 pairs of parameter values.

Bounding and reducing the set of parameter values

The ability to parameterize can be valuable for extending this work to other tournaments; however, for the college football national championship tournament using Sagarin ratings as the observed team strengths, we can significantly reduce the relevant set of parameter values.

First, we deduce an upper bound on $\sigma_{E^{\mathrm{Obs}}}$ using the fact that each observed team strength is equal to the team’s real strength plus an error term. Because we assume the errors are iid, Equation (2) implies that $\sigma_{\mathrm{Obs}}^2 = \sigma_{\mathrm{True}}^2 + \sigma_{E^{\mathrm{Obs}}}^2$, so $\sigma_{\mathrm{Obs}}^2 \ge \sigma_{E^{\mathrm{Obs}}}^2$. Since the variance in Sagarin ratings is approximately 169, this gives the upper bound

(12)
$$\sigma_{E^{\mathrm{Sag}}} \le 13.$$

A second upper bound on $\sigma_{E^{\mathrm{Obs}}}$ can be derived from Equation (8), the distribution of $X^{\mathrm{Obs}}$, the error in the observed ratings’ predictions of each game’s margin of victory. Empirically, $X^{\mathrm{Sag}}$ is normally distributed with variance approximately 262 (ThePredictionTracker.com). Equation (8) implies that $\sigma_{X^{\mathrm{Sag}}}^2 = 2\sigma_{E^{\mathrm{Sag}}}^2 + \sigma_R^2$, which gives (for the Sagarin ratings) a tighter upper bound:

(13)
$$\sigma_{E^{\mathrm{Sag}}} \le \sqrt{\frac{262}{2}} \approx 11.4.$$

Equation (8) also provides a value for $\sigma_R$ given a value of $\sigma_{E^{\mathrm{Sag}}}$:

(14)
$$\sigma_R = \sqrt{262 - 2\sigma_{E^{\mathrm{Sag}}}^2}.$$

Finally, we can also derive an approximate lower bound on reasonable values of $\sigma_{E^{\mathrm{Obs}}}$. The Sagarin ratings use only game-score information as input². The margin of victory in each game played by team $i$ gives an observation of team $i$’s true strength relative to its opponent, but that observation is wrong by some amount equal to the effect of the in-game randomness, i.e., a normal random variable with mean zero and variance $\sigma_R^2$.

The distribution of the average difference between observed margin $m_g$ and true strength difference $l_g^{True}$ in $k$ games played by team $i$ will have variance $\sigma_R^2/k$, so the error in the observer's estimate of team $i$'s strength will have at least that much variance. (It might have more, if the observer imperfectly converts game-score information to team-strength estimates, but as a lower bound the error in the observer's team strength estimate will have variance at least $\sigma_R^2/k$.) Since $\sigma_{E_{Sag}}^2 \ge \sigma_R^2/k$ and $\sigma_R^2 + 2\sigma_{E_{Sag}}^2 = \sigma_{X_{Sag}}^2$, we can derive a bound of $\sigma_{E_{Sag}}^2 \ge \sigma_{X_{Sag}}^2/(k+2)$. By the time teams are chosen for the national championship tournament, most teams will have played 11 games (some may have played one or two fewer games, or one or two more games), which yields an approximate bound of

$$\sigma_{E_{Sag}} \ge \sqrt{\frac{262}{13}} \approx 4.5. \tag{15}$$
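
The algebra behind this lower bound is short. Written out with the same symbols and $k = 11$, a sketch of the substitution the text summarizes is:

$$
\sigma_{E_{Sag}}^2 \;\ge\; \frac{\sigma_R^2}{k} \;=\; \frac{\sigma_{X_{Sag}}^2 - 2\sigma_{E_{Sag}}^2}{k}
\;\Longrightarrow\;
(k+2)\,\sigma_{E_{Sag}}^2 \;\ge\; \sigma_{X_{Sag}}^2
\;\Longrightarrow\;
\sigma_{E_{Sag}} \;\ge\; \sqrt{\frac{262}{11+2}} \approx 4.5 .
$$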

Taken together, the bounds yield $4.5 \le \sigma_{E_{Sag}} \le 11.4$.
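
The three bounds can be checked numerically. The following is a minimal sketch using only the constants quoted above (169, 262, and 11 games); the function and constant names are illustrative, not part of the original analysis.

```python
import math

# Constants quoted in the text (approximate empirical values).
SAGARIN_RATING_VARIANCE = 169.0    # variance of the Sagarin ratings
PREDICTION_ERROR_VARIANCE = 262.0  # variance of X_Sag, the prediction error
GAMES_PLAYED = 11                  # typical number of games before selection


def sagarin_error_bounds(rating_var=SAGARIN_RATING_VARIANCE,
                         pred_var=PREDICTION_ERROR_VARIANCE,
                         k=GAMES_PLAYED):
    """Return (lower, upper) bounds on the Sagarin estimation error sigma_E_Sag."""
    upper_from_ratings = math.sqrt(rating_var)          # Equation (12)
    upper_from_predictions = math.sqrt(pred_var / 2.0)  # Equation (13)
    lower_from_games = math.sqrt(pred_var / (k + 2))    # Equation (15)
    return lower_from_games, min(upper_from_ratings, upper_from_predictions)


def in_game_sigma(sigma_e, pred_var=PREDICTION_ERROR_VARIANCE):
    """Equation (14): in-game randomness implied by a given sigma_E_Sag."""
    return math.sqrt(pred_var - 2.0 * sigma_e ** 2)


if __name__ == "__main__":
    lower, upper = sagarin_error_bounds()
    print(f"{lower:.1f} <= sigma_E_Sag <= {upper:.1f}")
```

The tighter of the two upper bounds is the one from the prediction error, which is why only Equation (13) appears in the combined interval.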

Estimating the actual value of $\sigma_{E_{Sag}}$

We know from Equation (8) that our model's variance in prediction error is equal to $2\sigma_{E_{Sag}}^2 + \sigma_R^2$; however, the amount of variance attributable to error in team strength estimates ($2\sigma_{E_{Sag}}^2$) and to in-game randomness ($\sigma_R^2$) is not known. We follow the natural-experiment methodology of Curry and Sokol (2016) to estimate the relative magnitudes of randomness and uncertainty that comprise the total variance in $X_i$ (the error in observer $i$'s predicted margin of victory). As in Curry and Sokol (2016), we exploit the rare cases in which two teams played a same-year rematch (i.e., they played each other twice in the same college football season); this allows us to estimate $\sigma_{E_{Sag}}$ and $\sigma_R^2$ without the need to estimate $s_t^{True}$, which would introduce another source of error. We found 63 such matchups from 1997 through 2019, and obtained data on the location, Las Vegas line (predicted margin of victory), and actual margin of victory from OddsShark.com. We used the Las Vegas line because Sagarin data was not fully available, but the two are similar: From 2009 to 2019 the estimation error of the Las Vegas line (ThePredictionTracker.com) was normally distributed with variance $\sigma_{X_{Vegas}}^2 = 243$ and mean not significantly different from zero. Because the Las Vegas and Sagarin estimates are both subject to the exact same in-game randomness, we could attribute the difference of $\sigma_{X_{Sag}}^2 - \sigma_{X_{Vegas}}^2 = 262 - 243 = 19$ in their error variances entirely to error in team strength estimation³.

The rematch data is shown in Appendix 4, in Table 15. We first adjust each line and each outcome by 3 points in favor of the road team to account for the value of playing at home, which models generally value at approximately three points (see, for example, Sagarin). We then use the models of Curry and Sokol (2016) to estimate the fraction of variance in the line estimation error that is due to randomness. Their models' maximum likelihood estimates are that approximately 167 or 194 of the variance in the line estimation error is due to $\sigma_R^2$, the in-game randomness⁴. As a result, the estimates of $\sigma_{E_{Sag}}$ are $\sqrt{(262-194)/2} \approx 6$ and $\sqrt{(262-167)/2} \approx 7$.
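
The last step is a direct rearrangement of Equation (8). As a small sketch (the function name `sagarin_error_sd` is illustrative), the two estimates follow from the two candidate values of the in-game randomness variance:

```python
import math

PREDICTION_ERROR_VARIANCE = 262.0  # variance of X_Sag quoted in the text


def sagarin_error_sd(randomness_variance, pred_var=PREDICTION_ERROR_VARIANCE):
    """Solve pred_var = 2 * sigma_E^2 + sigma_R^2 for sigma_E."""
    return math.sqrt((pred_var - randomness_variance) / 2.0)


if __name__ == "__main__":
    for var_r in (167.0, 194.0):
        print(f"sigma_R^2 = {var_r:.0f}: sigma_E_Sag ~ {sagarin_error_sd(var_r):.2f}")
```

A larger share of variance assigned to in-game randomness leaves less for estimation error, so the 194 estimate gives the smaller value of the two.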

Even having an estimate for the randomness component of total variance and an estimate for the Sagarin ratings' estimation error, we still do not know the variance contributed by the error in the tournament selection committee's evaluation of teams. Of course, zero is an (unattainable) lower bound, but we test other values of $\sigma_E$ as well. Specifically, we test values of $\sigma_E$ equal to 0 (a perfect committee), 7 (approximately equal to the higher estimate of the Sagarin ratings' error), and every integer in between.
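
To make the role of $\sigma_E$ and $\sigma_R^2$ concrete, the following is a minimal Monte Carlo sketch of a fully-open tournament. It is not the paper's simulation procedure. It rests on simplifying assumptions of our own: 128 teams, true strengths drawn from a normal distribution with standard deviation 11 (roughly $\sqrt{169 - 7^2}$), neutral-site games, byes for the top seeds, and the highest remaining seed paired with the lowest in every round. The function name `simulate_fully_open` and all of its parameters are hypothetical, and its output should not be expected to reproduce the reported tables.

```python
import math
import random


def simulate_fully_open(field_size, sigma_e, var_r, n_teams=128,
                        sigma_true=11.0, trials=2000, seed=0):
    """Estimate (validity, effectiveness) for a fully-open tournament.

    Assumptions (illustrative only): normal true strengths, committee sees
    true strength plus N(0, sigma_e^2) noise, game margin is the true
    strength gap plus N(0, var_r) noise.
    """
    rng = random.Random(seed)
    sigma_r = math.sqrt(var_r)
    included = won = 0
    for _ in range(trials):
        # True strengths and the committee's noisy view of them.
        true = [rng.gauss(0.0, sigma_true) for _ in range(n_teams)]
        observed = [s + rng.gauss(0.0, sigma_e) for s in true]
        best = max(range(n_teams), key=true.__getitem__)
        # The committee selects and seeds the top `field_size` teams.
        field = sorted(range(n_teams), key=observed.__getitem__,
                       reverse=True)[:field_size]
        seed_of = {team: rank for rank, team in enumerate(field)}
        included += best in field  # validity: true best team made the field
        alive = field
        while len(alive) > 1:
            bracket = 1 << (len(alive) - 1).bit_length()  # next power of two
            byes = bracket - len(alive)                   # top seeds sit out
            playing = alive[byes:]
            winners = []
            for i in range(len(playing) // 2):
                hi, lo = playing[i], playing[-1 - i]      # best vs worst seed
                margin = true[hi] - true[lo] + rng.gauss(0.0, sigma_r)
                winners.append(hi if margin > 0 else lo)
            alive = sorted(alive[:byes] + winners, key=seed_of.__getitem__)
        won += alive[0] == best  # effectiveness: true best team won
    return included / trials, won / trials


if __name__ == "__main__":
    for size in (4, 7, 8, 12):
        validity, effectiveness = simulate_fully_open(size, sigma_e=4.0, var_r=167.0)
        print(f"{size:2d} teams: validity {validity:.3f}, "
              f"effectiveness {effectiveness:.3f}")
```

Because the champion must be in the field, the estimated effectiveness can never exceed the estimated validity, which gives a quick sanity check on any run.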

6Results

Tables 2, 3, 4, and 5 show the results of our simulations. The results show the expected tradeoffs: The larger the tournament, the higher the validity, while effectiveness varies depending on how much of the observed prediction error is due to randomness and how much is due to incorrect team strength estimates.

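Table 2

Validity (%) of tournaments when $\sigma_E \in \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $\sigma_R^2 = 167$

Fully-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 100.0 | 89.0 | 79.6 | 68.4 | 60.2 | 50.9 | 43.8 | 35.2 |
| 2 | 100.0 | 99.1 | 95.6 | 88.3 | 81.5 | 72.0 | 63.1 | 53.3 |
| 3 | 100.0 | 99.9 | 98.6 | 94.6 | 89.2 | 81.1 | 72.5 | 62.3 |
| 4 | 100.0 | 100.0 | 99.6 | 97.2 | 93.8 | 87.5 | 79.2 | 69.8 |
| 5 | 100.0 | 100.0 | 99.8 | 98.5 | 96.1 | 91.3 | 84.1 | 75.6 |
| 6 | 100.0 | 100.0 | 99.9 | 99.1 | 96.7 | 92.4 | 86.2 | 78.8 |
| 7 | 100.0 | 100.0 | 100.0 | 99.4 | 98.1 | 94.5 | 89.3 | 82.8 |
| 8 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.3 | 91.8 | 85.8 |
| 9 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 97.0 | 93.0 | 88.0 |
| 10 | 100.0 | 100.0 | 100.0 | 99.9 | 99.5 | 97.8 | 94.1 | 89.9 |
| 11 | 100.0 | 100.0 | 100.0 | 100.0 | 99.7 | 98.2 | 95.3 | 91.3 |
| 12 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 96.4 | 92.6 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.9 | 93.6 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.2 | 94.3 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.9 | 95.2 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.5 | 96.1 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.7-100.0 | 98.6-100.0 | 96.4-100.0 |

Fully-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.2 | 0.4 | 0.5 | 0.6 | 1.1 |
| 2 | 100.0 | 89.0 | 79.6 | 68.6 | 60.6 | 51.4 | 44.4 | 36.3 |
| 3 | 100.0 | 99.1 | 95.6 | 88.5 | 81.9 | 72.5 | 63.7 | 54.4 |
| 4 | 100.0 | 99.9 | 98.7 | 94.7 | 89.5 | 81.7 | 73.2 | 63.4 |
| 5 | 100.0 | 100.0 | 99.6 | 97.2 | 94.0 | 87.9 | 79.7 | 70.9 |
| 6 | 100.0 | 100.0 | 99.8 | 98.5 | 96.2 | 91.5 | 84.4 | 76.4 |
| 7 | 100.0 | 100.0 | 99.9 | 99.2 | 97.0 | 93.0 | 86.9 | 80.0 |
| 8 | 100.0 | 100.0 | 100.0 | 99.4 | 98.3 | 94.9 | 89.8 | 83.4 |
| 9 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.6 | 92.3 | 86.5 |
| 10 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.2 | 93.4 | 88.8 |
| 11 | 100.0 | 100.0 | 100.0 | 99.9 | 99.5 | 98.0 | 94.7 | 90.9 |
| 12 | 100.0 | 100.0 | 100.0 | 100.0 | 99.7 | 98.5 | 95.7 | 91.7 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.8 | 93.1 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.1 | 94.2 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.3 | 94.6 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.4 | 95.9 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.6-100.0 | 98.5-100.0 | 96.3-100.0 |

Partially-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 84.2 | 83.6 | 81.5 | 75.8 | 70.8 | 63.3 | 55.9 | 47.5 |
| 3 | 84.2 | 84.0 | 82.1 | 78.3 | 74.0 | 67.4 | 60.8 | 53.2 |
| 4 | 84.2 | 84.0 | 82.1 | 78.7 | 74.9 | 69.1 | 63.4 | 56.3 |
| 5 | 84.2 | 84.0 | 82.2 | 79.2 | 75.4 | 69.9 | 64.8 | 57.8 |
| 6 | 100.0 | 99.4 | 98.1 | 95.4 | 91.8 | 86.2 | 80.2 | 72.4 |
| 7 | 100.0 | 100.0 | 99.8 | 98.4 | 95.5 | 91.2 | 85.3 | 77.7 |
| 8 | 100.0 | 100.0 | 99.9 | 98.8 | 96.7 | 93.5 | 88.6 | 82.0 |
| 9 | 100.0 | 100.0 | 100.0 | 99.7 | 98.1 | 95.3 | 91.5 | 86.0 |
| 10 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.6 | 93.1 | 88.0 |
| 11 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 97.5 | 94.4 | 90.3 |
| 12 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.2 | 95.6 | 91.9 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.8 | 93.4 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 97.3 | 94.3 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 98.1 | 95.4 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.4 | 98.3 | 95.8 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.6-100.0 | 98.6-100.0 | 96.4-100.0 |

Partially-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 84.2 | 83.6 | 81.5 | 76.0 | 71.0 | 63.2 | 55.8 | 47.3 |
| 3 | 84.2 | 84.0 | 82.1 | 78.5 | 74.3 | 67.5 | 60.9 | 53.3 |
| 4 | 84.2 | 84.0 | 82.1 | 78.8 | 75.2 | 69.3 | 63.4 | 56.5 |
| 5 | 84.2 | 84.0 | 82.2 | 79.4 | 75.8 | 70.3 | 65.2 | 58.5 |
| 6 | 84.2 | 84.0 | 82.2 | 79.4 | 75.8 | 70.4 | 65.4 | 58.9 |
| 7 | 100.0 | 99.4 | 98.2 | 95.7 | 92.3 | 86.8 | 80.9 | 73.5 |
| 8 | 100.0 | 100.0 | 99.8 | 98.4 | 95.7 | 91.5 | 85.7 | 78.5 |
| 9 | 100.0 | 100.0 | 99.9 | 98.8 | 96.9 | 93.8 | 89.0 | 82.9 |
| 10 | 100.0 | 100.0 | 100.0 | 99.7 | 98.2 | 95.6 | 92.0 | 86.7 |
| 11 | 100.0 | 100.0 | 100.0 | 99.8 | 99.2 | 97.0 | 93.7 | 89.1 |
| 12 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.7 | 94.9 | 91.0 |
| 13 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.3 | 96.0 | 92.4 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 97.1 | 93.8 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.6 | 94.9 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.4 | 98.3 | 95.8 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.5-100.0 | 98.4-100.0 | 96.1-100.0 |
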
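Table 3

Effectiveness (%) of tournaments when $\sigma_E \in \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $\sigma_R^2 = 167$

Fully-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 100.0 | 89.0 | 79.6 | 68.4 | 60.2 | 50.9 | 43.8 | 35.2 |
| 2 | 60.5 | 60.7 | 59.9 | 57.2 | 54.6 | 49.9 | 45.1 | 39.2 |
| 3 | 63.4 | 61.3 | 59.5 | 56.7 | 54.2 | 50.2 | 46.1 | 40.4 |
| 4 | 47.0 | 47.2 | 47.5 | 47.6 | 47.5 | 46.0 | 43.3 | 39.3 |
| 5 | 47.8 | 47.8 | 47.8 | 47.6 | 47.2 | 45.9 | 43.5 | 40.0 |
| 6 | 48.3 | 48.4 | 48.2 | 47.7 | 47.1 | 45.5 | 43.2 | 40.0 |
| 7 | 50.3 | 49.3 | 48.3 | 47.3 | 46.5 | 45.0 | 43.1 | 40.2 |
| 8 | 41.9 | 42.0 | 42.1 | 42.2 | 42.3 | 42.0 | 40.8 | 38.8 |
| 9 | 42.3 | 42.2 | 42.1 | 42.0 | 42.0 | 41.5 | 40.3 | 38.6 |
| 10 | 42.2 | 42.2 | 42.2 | 42.1 | 41.9 | 41.4 | 40.2 | 38.6 |
| 11 | 42.2 | 42.2 | 42.2 | 42.2 | 42.0 | 41.5 | 40.2 | 38.7 |
| 12 | 42.6 | 42.7 | 42.6 | 42.4 | 42.2 | 41.6 | 40.2 | 38.6 |
| 13 | 43.6 | 43.5 | 43.3 | 42.9 | 42.4 | 41.6 | 40.2 | 38.5 |
| 14 | 43.9 | 43.9 | 43.6 | 43.1 | 42.5 | 41.5 | 40.1 | 38.4 |
| 15 | 44.8 | 44.3 | 43.7 | 42.8 | 42.0 | 41.0 | 39.7 | 38.0 |
| 16 | 40.9 | 40.8 | 40.6 | 40.3 | 39.8 | 39.1 | 38.3 | 37.0 |
| 17-128 | 40.7-42.6 | 40.7-42.4 | 40.5-41.9 | 40.0-41.3 | 39.3-40.5 | 38.2-39.4 | 36.8-38.1 | 35.1-36.8 |

Fully-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.2 | 0.4 | 0.5 | 0.6 | 1.1 |
| 2 | 88.8 | 79.3 | 71.0 | 61.2 | 54.1 | 45.9 | 39.4 | 32.1 |
| 3 | 64.5 | 63.5 | 61.8 | 58.3 | 55.1 | 49.9 | 44.8 | 38.8 |
| 4 | 56.5 | 55.4 | 54.5 | 52.6 | 50.8 | 47.5 | 43.7 | 38.7 |
| 5 | 49.2 | 49.0 | 49.0 | 48.6 | 48.1 | 46.2 | 43.3 | 39.4 |
| 6 | 48.0 | 48.2 | 48.3 | 47.9 | 47.4 | 45.9 | 43.4 | 40.0 |
| 7 | 49.6 | 49.0 | 48.5 | 47.7 | 46.8 | 45.2 | 42.9 | 39.8 |
| 8 | 45.2 | 44.9 | 44.5 | 44.1 | 43.8 | 42.9 | 41.2 | 38.8 |
| 9 | 43.0 | 42.9 | 42.7 | 42.6 | 42.5 | 41.9 | 40.6 | 38.6 |
| 10 | 42.1 | 42.2 | 42.2 | 42.2 | 42.1 | 41.5 | 40.2 | 38.6 |
| 11 | 42.1 | 42.2 | 42.2 | 42.2 | 42.0 | 41.5 | 40.2 | 38.8 |
| 12 | 42.5 | 42.6 | 42.6 | 42.4 | 42.2 | 41.6 | 40.3 | 38.6 |
| 13 | 43.5 | 43.4 | 43.2 | 42.9 | 42.4 | 41.7 | 40.3 | 38.6 |
| 14 | 43.9 | 43.9 | 43.6 | 43.1 | 42.5 | 41.6 | 40.2 | 38.5 |
| 15 | 44.7 | 44.3 | 43.7 | 42.9 | 42.1 | 41.1 | 39.7 | 38.0 |
| 16 | 41.4 | 41.3 | 41.0 | 40.6 | 40.1 | 39.4 | 38.5 | 37.1 |
| 17-128 | 40.7-42.6 | 40.7-42.4 | 40.5-41.9 | 40.0-41.3 | 39.3-40.5 | 38.2-39.4 | 36.8-38.1 | 35.1-36.8 |

Partially-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 53.4 | 53.5 | 53.0 | 50.4 | 48.4 | 44.5 | 40.3 | 35.2 |
| 3 | 56.4 | 55.1 | 53.4 | 50.4 | 48.0 | 44.1 | 40.3 | 35.6 |
| 4 | 46.6 | 46.4 | 45.6 | 44.1 | 42.6 | 40.2 | 37.5 | 33.9 |
| 5 | 47.8 | 47.4 | 46.4 | 44.6 | 42.9 | 40.3 | 37.6 | 34.0 |
| 6 | 51.0 | 50.7 | 50.1 | 49.0 | 47.9 | 45.6 | 42.9 | 39.3 |
| 7 | 50.3 | 49.7 | 49.1 | 48.2 | 47.2 | 45.4 | 42.9 | 39.4 |
| 8 | 44.9 | 44.8 | 44.6 | 44.2 | 43.6 | 42.5 | 40.9 | 38.4 |
| 9 | 43.7 | 43.6 | 43.3 | 43.1 | 42.7 | 41.8 | 40.6 | 38.7 |
| 10 | 42.5 | 42.6 | 42.5 | 42.4 | 42.2 | 41.5 | 40.3 | 38.5 |
| 11 | 42.2 | 42.2 | 42.2 | 42.2 | 42.1 | 41.5 | 40.3 | 38.7 |
| 12 | 42.5 | 42.6 | 42.6 | 42.4 | 42.2 | 41.5 | 40.3 | 38.7 |
| 13 | 43.5 | 43.4 | 43.2 | 42.9 | 42.4 | 41.6 | 40.4 | 38.8 |
| 14 | 43.9 | 43.9 | 43.6 | 43.1 | 42.5 | 41.5 | 40.2 | 38.6 |
| 15 | 44.7 | 44.3 | 43.7 | 42.9 | 42.1 | 41.1 | 39.9 | 38.3 |
| 16 | 41.3 | 41.3 | 41.0 | 40.6 | 40.1 | 39.4 | 38.5 | 37.1 |
| 17-128 | 40.7-42.6 | 40.7-42.4 | 40.5-41.9 | 40.0-41.3 | 39.3-40.6 | 38.2-39.4 | 36.8-38.1 | 35.1-36.8 |

Partially-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 53.4 | 53.4 | 52.9 | 50.5 | 48.5 | 44.5 | 40.3 | 35.1 |
| 3 | 56.3 | 55.0 | 53.3 | 50.3 | 48.0 | 44.1 | 40.3 | 35.6 |
| 4 | 46.1 | 46.0 | 45.2 | 43.8 | 42.5 | 40.0 | 37.2 | 33.8 |
| 5 | 47.2 | 46.8 | 45.8 | 44.2 | 42.6 | 40.0 | 37.3 | 34.0 |
| 6 | 47.2 | 47.0 | 46.1 | 44.5 | 42.8 | 40.1 | 37.4 | 34.0 |
| 7 | 52.1 | 51.3 | 50.3 | 49.0 | 47.7 | 45.2 | 42.5 | 38.9 |
| 8 | 46.5 | 46.3 | 46.2 | 45.6 | 45.0 | 43.5 | 41.3 | 38.2 |
| 9 | 44.9 | 44.7 | 44.4 | 43.9 | 43.2 | 42.1 | 40.5 | 38.1 |
| 10 | 43.1 | 43.2 | 43.0 | 42.9 | 42.5 | 41.6 | 40.4 | 38.5 |
| 11 | 42.3 | 42.4 | 42.4 | 42.4 | 42.3 | 41.6 | 40.4 | 38.6 |
| 12 | 42.5 | 42.5 | 42.5 | 42.4 | 42.3 | 41.6 | 40.3 | 38.7 |
| 13 | 43.4 | 43.3 | 43.2 | 42.9 | 42.4 | 41.6 | 40.3 | 38.6 |
| 14 | 43.8 | 43.8 | 43.6 | 43.1 | 42.5 | 41.7 | 40.4 | 38.7 |
| 15 | 44.6 | 44.2 | 43.6 | 42.9 | 42.2 | 41.2 | 39.9 | 38.3 |
| 16 | 41.7 | 41.6 | 41.3 | 40.9 | 40.3 | 39.6 | 38.6 | 37.2 |
| 17-128 | 40.7-42.6 | 40.7-42.4 | 40.5-41.9 | 40.0-41.3 | 39.3-40.6 | 38.2-39.4 | 36.8-38.1 | 35.1-36.8 |
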
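Table 4

Validity (%) of tournaments when $\sigma_E \in \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $\sigma_R^2 = 194$

Fully-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 100.0 | 89.0 | 79.6 | 68.4 | 60.2 | 50.9 | 43.8 | 35.2 |
| 2 | 100.0 | 99.1 | 95.6 | 88.3 | 81.5 | 72.0 | 63.1 | 53.3 |
| 3 | 100.0 | 99.9 | 98.6 | 94.6 | 89.2 | 81.1 | 72.5 | 62.3 |
| 4 | 100.0 | 100.0 | 99.6 | 97.2 | 93.8 | 87.5 | 79.2 | 69.8 |
| 5 | 100.0 | 100.0 | 99.8 | 98.5 | 96.1 | 91.3 | 84.1 | 75.6 |
| 6 | 100.0 | 100.0 | 99.9 | 99.1 | 96.7 | 92.4 | 86.2 | 78.8 |
| 7 | 100.0 | 100.0 | 100.0 | 99.4 | 98.1 | 94.5 | 89.3 | 82.8 |
| 8 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.3 | 91.8 | 85.8 |
| 9 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 97.0 | 93.0 | 88.0 |
| 10 | 100.0 | 100.0 | 100.0 | 99.9 | 99.5 | 97.8 | 94.1 | 89.9 |
| 11 | 100.0 | 100.0 | 100.0 | 100.0 | 99.7 | 98.2 | 95.3 | 91.3 |
| 12 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 96.4 | 92.6 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.9 | 93.6 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.2 | 94.3 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.9 | 95.2 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.5 | 96.1 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.7-100.0 | 98.6-100.0 | 96.4-100.0 |

Fully-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.2 | 0.4 | 0.5 | 0.6 | 1.1 |
| 2 | 100.0 | 89.0 | 79.6 | 68.6 | 60.6 | 51.4 | 44.4 | 36.3 |
| 3 | 100.0 | 99.1 | 95.6 | 88.5 | 81.9 | 72.5 | 63.7 | 54.4 |
| 4 | 100.0 | 99.9 | 98.7 | 94.7 | 89.5 | 81.7 | 73.2 | 63.4 |
| 5 | 100.0 | 100.0 | 99.6 | 97.2 | 94.0 | 87.9 | 79.7 | 70.9 |
| 6 | 100.0 | 100.0 | 99.8 | 98.5 | 96.2 | 91.5 | 84.4 | 76.4 |
| 7 | 100.0 | 100.0 | 99.9 | 99.2 | 97.0 | 93.0 | 86.9 | 80.0 |
| 8 | 100.0 | 100.0 | 100.0 | 99.4 | 98.3 | 94.9 | 89.8 | 83.4 |
| 9 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.6 | 92.3 | 86.5 |
| 10 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.2 | 93.4 | 88.8 |
| 11 | 100.0 | 100.0 | 100.0 | 99.9 | 99.5 | 98.0 | 94.7 | 90.9 |
| 12 | 100.0 | 100.0 | 100.0 | 100.0 | 99.7 | 98.5 | 95.7 | 91.7 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 96.8 | 93.1 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.1 | 94.2 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.3 | 94.6 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.4 | 95.9 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.6-100.0 | 98.5-100.0 | 96.3-100.0 |

Partially-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 84.2 | 83.6 | 81.5 | 75.8 | 70.8 | 63.3 | 55.9 | 47.5 |
| 3 | 84.2 | 84.0 | 82.1 | 78.3 | 74.0 | 67.4 | 60.8 | 53.2 |
| 4 | 84.2 | 84.0 | 82.1 | 78.7 | 74.9 | 69.1 | 63.4 | 56.3 |
| 5 | 84.2 | 84.0 | 82.2 | 79.2 | 75.4 | 69.9 | 64.8 | 57.8 |
| 6 | 100.0 | 99.4 | 98.1 | 95.4 | 91.8 | 86.2 | 80.2 | 72.4 |
| 7 | 100.0 | 100.0 | 99.8 | 98.4 | 95.5 | 91.2 | 85.3 | 77.7 |
| 8 | 100.0 | 100.0 | 99.9 | 98.8 | 96.7 | 93.5 | 88.6 | 82.0 |
| 9 | 100.0 | 100.0 | 100.0 | 99.7 | 98.1 | 95.3 | 91.5 | 86.0 |
| 10 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.6 | 93.1 | 88.0 |
| 11 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 97.5 | 94.4 | 90.3 |
| 12 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.2 | 95.6 | 91.9 |
| 13 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 98.9 | 96.8 | 93.4 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 97.3 | 94.3 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.3 | 98.1 | 95.4 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.4 | 98.3 | 95.8 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.6-100.0 | 98.6-100.0 | 96.4-100.0 |

Partially-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 84.2 | 83.6 | 81.5 | 76.0 | 71.0 | 63.2 | 55.8 | 47.3 |
| 3 | 84.2 | 84.0 | 82.1 | 78.5 | 74.3 | 67.5 | 60.9 | 53.3 |
| 4 | 84.2 | 84.0 | 82.1 | 78.8 | 75.2 | 69.3 | 63.4 | 56.5 |
| 5 | 84.2 | 84.0 | 82.2 | 79.4 | 75.8 | 70.3 | 65.2 | 58.5 |
| 6 | 84.2 | 84.0 | 82.2 | 79.4 | 75.8 | 70.4 | 65.4 | 58.9 |
| 7 | 100.0 | 99.4 | 98.2 | 95.7 | 92.3 | 86.8 | 80.9 | 73.5 |
| 8 | 100.0 | 100.0 | 99.8 | 98.4 | 95.7 | 91.5 | 85.7 | 78.5 |
| 9 | 100.0 | 100.0 | 99.9 | 98.8 | 96.9 | 93.8 | 89.0 | 82.9 |
| 10 | 100.0 | 100.0 | 100.0 | 99.7 | 98.2 | 95.6 | 92.0 | 86.7 |
| 11 | 100.0 | 100.0 | 100.0 | 99.8 | 99.2 | 97.0 | 93.7 | 89.1 |
| 12 | 100.0 | 100.0 | 100.0 | 99.8 | 99.4 | 97.7 | 94.9 | 91.0 |
| 13 | 100.0 | 100.0 | 100.0 | 99.9 | 99.6 | 98.3 | 96.0 | 92.4 |
| 14 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.0 | 97.1 | 93.8 |
| 15 | 100.0 | 100.0 | 100.0 | 100.0 | 99.8 | 99.1 | 97.6 | 94.9 |
| 16 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.4 | 98.3 | 95.8 |
| 17-128 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 100.0-100.0 | 99.9-100.0 | 99.5-100.0 | 98.4-100.0 | 96.1-100.0 |
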
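Table 5

Effectiveness (%) of tournaments when $\sigma_E \in \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $\sigma_R^2 = 194$

Fully-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 100.0 | 89.0 | 79.6 | 68.4 | 60.2 | 50.9 | 43.8 | 35.2 |
| 2 | 59.8 | 59.9 | 59.1 | 56.3 | 53.7 | 49.0 | 44.3 | 38.5 |
| 3 | 62.6 | 60.5 | 58.6 | 55.8 | 53.2 | 49.2 | 45.2 | 39.5 |
| 4 | 45.5 | 45.6 | 46.0 | 46.1 | 46.0 | 44.5 | 41.9 | 38.1 |
| 5 | 46.3 | 46.3 | 46.3 | 46.1 | 45.7 | 44.4 | 42.1 | 38.7 |
| 6 | 46.8 | 46.9 | 46.7 | 46.2 | 45.6 | 44.0 | 41.8 | 38.7 |
| 7 | 48.8 | 47.7 | 46.8 | 45.7 | 44.9 | 43.4 | 41.6 | 38.7 |
| 8 | 39.8 | 39.9 | 40.0 | 40.2 | 40.4 | 40.0 | 38.9 | 37.1 |
| 9 | 40.1 | 40.1 | 40.1 | 40.1 | 40.1 | 39.6 | 38.5 | 36.9 |
| 10 | 40.1 | 40.1 | 40.1 | 40.1 | 40.0 | 39.6 | 38.4 | 36.9 |
| 11 | 40.1 | 40.2 | 40.2 | 40.2 | 40.1 | 39.6 | 38.5 | 37.0 |
| 12 | 40.6 | 40.6 | 40.6 | 40.5 | 40.2 | 39.7 | 38.5 | 36.9 |
| 13 | 41.6 | 41.5 | 41.3 | 41.0 | 40.5 | 39.7 | 38.4 | 36.8 |
| 14 | 41.9 | 41.9 | 41.7 | 41.1 | 40.5 | 39.6 | 38.3 | 36.6 |
| 15 | 42.9 | 42.3 | 41.7 | 40.8 | 40.0 | 39.1 | 37.8 | 36.2 |
| 16 | 38.4 | 38.4 | 38.2 | 37.9 | 37.5 | 37.0 | 36.2 | 35.0 |
| 17-128 | 38.2-40.4 | 38.2-40.2 | 38.0-39.7 | 37.6-39.2 | 36.9-38.4 | 35.9-37.3 | 34.5-36.1 | 32.9-34.8 |

Fully-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.2 | 0.4 | 0.5 | 0.6 | 1.1 |
| 2 | 87.3 | 77.9 | 69.7 | 60.2 | 53.3 | 45.2 | 38.8 | 31.6 |
| 3 | 64.1 | 63.0 | 61.1 | 57.5 | 54.2 | 49.0 | 44.0 | 38.0 |
| 4 | 54.8 | 53.8 | 52.9 | 51.1 | 49.4 | 46.1 | 42.4 | 37.5 |
| 5 | 47.8 | 47.6 | 47.6 | 47.2 | 46.6 | 44.8 | 41.9 | 38.2 |
| 6 | 46.6 | 46.8 | 46.8 | 46.5 | 46.0 | 44.5 | 42.0 | 38.7 |
| 7 | 48.2 | 47.5 | 46.9 | 46.1 | 45.2 | 43.6 | 41.4 | 38.3 |
| 8 | 43.1 | 42.7 | 42.4 | 42.1 | 41.9 | 41.0 | 39.5 | 37.1 |
| 9 | 40.9 | 40.8 | 40.7 | 40.6 | 40.6 | 40.1 | 38.8 | 36.9 |
| 10 | 40.0 | 40.1 | 40.2 | 40.2 | 40.1 | 39.6 | 38.4 | 36.9 |
| 11 | 40.0 | 40.1 | 40.2 | 40.2 | 40.1 | 39.6 | 38.4 | 37.0 |
| 12 | 40.5 | 40.5 | 40.6 | 40.5 | 40.3 | 39.8 | 38.5 | 36.9 |
| 13 | 41.5 | 41.4 | 41.3 | 40.9 | 40.5 | 39.8 | 38.5 | 36.8 |
| 14 | 41.9 | 41.9 | 41.6 | 41.1 | 40.6 | 39.7 | 38.4 | 36.8 |
| 15 | 42.8 | 42.3 | 41.7 | 40.9 | 40.1 | 39.1 | 37.8 | 36.2 |
| 16 | 38.9 | 38.9 | 38.6 | 38.3 | 37.8 | 37.3 | 36.4 | 35.1 |
| 17-128 | 38.2-40.4 | 38.2-40.2 | 38.0-39.7 | 37.6-39.2 | 36.9-38.4 | 35.9-37.3 | 34.5-36.1 | 32.9-34.8 |

Partially-open

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 52.7 | 52.7 | 52.2 | 49.6 | 47.6 | 43.8 | 39.7 | 34.6 |
| 3 | 55.6 | 54.2 | 52.5 | 49.5 | 47.1 | 43.3 | 39.5 | 34.8 |
| 4 | 45.1 | 44.9 | 44.1 | 42.7 | 41.3 | 38.9 | 36.3 | 32.8 |
| 5 | 46.2 | 45.9 | 44.9 | 43.2 | 41.6 | 39.0 | 36.5 | 32.9 |
| 6 | 49.5 | 49.2 | 48.6 | 47.6 | 46.4 | 44.2 | 41.6 | 38.0 |
| 7 | 48.9 | 48.2 | 47.6 | 46.6 | 45.6 | 43.8 | 41.4 | 38.0 |
| 8 | 42.8 | 42.7 | 42.5 | 42.2 | 41.6 | 40.7 | 39.2 | 36.8 |
| 9 | 41.6 | 41.5 | 41.3 | 41.1 | 40.7 | 40.0 | 38.8 | 37.0 |
| 10 | 40.4 | 40.5 | 40.5 | 40.5 | 40.3 | 39.7 | 38.5 | 36.8 |
| 11 | 40.1 | 40.2 | 40.2 | 40.2 | 40.1 | 39.6 | 38.5 | 37.0 |
| 12 | 40.5 | 40.6 | 40.6 | 40.5 | 40.3 | 39.6 | 38.5 | 37.0 |
| 13 | 41.5 | 41.4 | 41.3 | 41.0 | 40.5 | 39.8 | 38.6 | 37.0 |
| 14 | 41.9 | 41.9 | 41.6 | 41.2 | 40.6 | 39.7 | 38.4 | 36.8 |
| 15 | 42.8 | 42.3 | 41.7 | 40.9 | 40.1 | 39.2 | 38.1 | 36.4 |
| 16 | 38.9 | 38.8 | 38.6 | 38.3 | 37.8 | 37.2 | 36.4 | 35.1 |
| 17-128 | 38.2-40.4 | 38.2-40.2 | 38.0-39.7 | 37.6-39.2 | 36.9-38.4 | 35.9-37.4 | 34.5-36.1 | 32.9-34.8 |

Partially-open with non-Power 5 guarantee

| Size | $\sigma_E = 0$ | $\sigma_E = 1$ | $\sigma_E = 2$ | $\sigma_E = 3$ | $\sigma_E = 4$ | $\sigma_E = 5$ | $\sigma_E = 6$ | $\sigma_E = 7$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 84.2 | 76.9 | 70.5 | 61.2 | 54.6 | 46.4 | 39.7 | 31.8 |
| 2 | 52.6 | 52.7 | 52.1 | 49.7 | 47.7 | 43.7 | 39.6 | 34.4 |
| 3 | 55.5 | 54.1 | 52.4 | 49.5 | 47.2 | 43.3 | 39.5 | 34.8 |
| 4 | 44.6 | 44.4 | 43.7 | 42.3 | 41.1 | 38.7 | 36.1 | 32.7 |
| 5 | 45.6 | 45.3 | 44.4 | 42.8 | 41.3 | 38.7 | 36.2 | 32.9 |
| 6 | 45.7 | 45.6 | 44.7 | 43.1 | 41.6 | 38.9 | 36.2 | 32.9 |
| 7 | 50.6 | 49.8 | 48.9 | 47.5 | 46.2 | 43.7 | 41.1 | 37.6 |
| 8 | 44.4 | 44.3 | 44.2 | 43.7 | 43.0 | 41.7 | 39.6 | 36.6 |
| 9 | 42.8 | 42.6 | 42.4 | 41.9 | 41.3 | 40.3 | 38.7 | 36.5 |
| 10 | 41.0 | 41.1 | 41.0 | 40.9 | 40.5 | 39.8 | 38.6 | 36.8 |
| 11 | 40.2 | 40.3 | 40.4 | 40.5 | 40.4 | 39.8 | 38.6 | 36.9 |
| 12 | 40.4 | 40.5 | 40.5 | 40.5 | 40.4 | 39.7 | 38.6 | 37.0 |
| 13 | 41.4 | 41.4 | 41.2 | 40.9 | 40.5 | 39.7 | 38.5 | 36.9 |
| 14 | 41.8 | 41.8 | 41.6 | 41.2 | 40.6 | 39.8 | 38.6 | 36.9 |
| 15 | 42.7 | 42.2 | 41.7 | 40.9 | 40.2 | 39.2 | 38.1 | 36.5 |
| 16 | 39.2 | 39.2 | 39.0 | 38.6 | 38.1 | 37.5 | 36.6 | 35.2 |
| 17-128 | 38.2-40.4 | 38.2-40.2 | 38.0-39.7 | 37.6-39.2 | 36.9-38.4 | 35.9-37.4 | 34.5-36.1 | 32.9-34.8 |
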
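Table 6

Simulation results (%) for $\sigma_E \in \{3, 4, 5\}$ in 7- and 8-team tournaments when $\sigma_R^2 = 167$

Columns are grouped by the standard error of the committee's team strength estimate, $\sigma_E$.

| Tournament type | Validity ($\sigma_E = 3$) | Effectiveness ($\sigma_E = 3$) | Validity ($\sigma_E = 4$) | Effectiveness ($\sigma_E = 4$) | Validity ($\sigma_E = 5$) | Effectiveness ($\sigma_E = 5$) |
|---|---|---|---|---|---|---|
| 4-Team fully-open (current system) | 97.2 | 47.6 | 93.8 | 47.5 | 87.5 | 46.0 |
| 8-Team fully-open | 99.8 | 42.2 | 98.9 | 42.3 | 96.3 | 42.0 |
| 8-Team partially-open | 99.4 | 44.1 | 98.3 | 43.8 | 94.9 | 42.9 |
| 8-Team fully-open w/ non-Power 5 | 98.8 | 44.2 | 96.7 | 43.6 | 93.5 | 42.5 |
| 8-Team partially-open w/ non-Power 5 | 98.4 | 45.6 | 95.7 | 45.0 | 91.5 | 43.5 |
| 7-Team fully-open | 99.4 | 47.3 | 98.1 | 46.5 | 94.5 | 45.0 |
| 7-Team partially-open | 98.4 | 48.2 | 95.5 | 47.2 | 91.2 | 45.4 |
| 7-Team fully-open w/ non-Power 5 | 99.2 | 47.7 | 97.0 | 46.8 | 93.0 | 45.2 |
| 7-Team partially-open w/ non-Power 5 | 95.7 | 49.0 | 92.3 | 47.7 | 86.5 | 45.2 |
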
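Table 7

Simulation results (%) for $\sigma_E \in \{3, 4, 5\}$ in 7- and 8-team tournaments when $\sigma_R^2 = 194$

Columns are grouped by the standard error of the committee's team strength estimate, $\sigma_E$.

| Tournament type | Validity ($\sigma_E = 3$) | Effectiveness ($\sigma_E = 3$) | Validity ($\sigma_E = 4$) | Effectiveness ($\sigma_E = 4$) | Validity ($\sigma_E = 5$) | Effectiveness ($\sigma_E = 5$) |
|---|---|---|---|---|---|---|
| 4-Team fully-open (current system) | 97.2 | 46.1 | 93.8 | 46.0 | 87.5 | 44.5 |
| 8-Team fully-open | 99.8 | 40.2 | 98.9 | 40.4 | 96.3 | 40.0 |
| 8-Team partially-open | 98.8 | 42.2 | 96.7 | 41.6 | 93.5 | 40.7 |
| 8-Team fully-open w/ non-Power 5 | 99.4 | 42.1 | 98.3 | 41.9 | 94.9 | 41.0 |
| 8-Team partially-open w/ non-Power 5 | 98.5 | 43.7 | 95.7 | 43.0 | 91.5 | 41.7 |
| 7-Team fully-open | 99.4 | 45.7 | 98.1 | 44.9 | 94.5 | 43.4 |
| 7-Team partially-open | 98.4 | 46.6 | 95.5 | 45.6 | 91.2 | 43.8 |
| 7-Team fully-open w/ non-Power 5 | 99.2 | 46.1 | 97.0 | 45.2 | 93.0 | 43.6 |
| 7-Team partially-open w/ non-Power 5 | 95.7 | 47.5 | 92.3 | 46.2 | 86.8 | 43.7 |
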
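Table 8

Simulation results (%) for $\sigma_E \in \{3, 4, 5\}$ in 12-team tournaments when $\sigma_R^2 = 167$

Columns are grouped by the standard error of the committee's team strength estimate, $\sigma_E$.

| Tournament type | Validity ($\sigma_E = 3$) | Effectiveness ($\sigma_E = 3$) | Validity ($\sigma_E = 4$) | Effectiveness ($\sigma_E = 4$) | Validity ($\sigma_E = 5$) | Effectiveness ($\sigma_E = 5$) |
|---|---|---|---|---|---|---|
| 4-Team fully-open (current system) | 97.2 | 47.6 | 93.8 | 47.5 | 87.5 | 46.0 |
| 12-Team fully-open | 99.8 | 42.4 | 98.9 | 42.2 | 96.3 | 41.6 |
| 12-Team partially-open | 99.4 | 42.4 | 98.3 | 42.2 | 94.9 | 41.5 |
| 12-Team fully-open w/ non-Power 5 | 98.8 | 42.4 | 96.7 | 42.2 | 93.5 | 41.6 |
| 12-Team partially-open w/ non-Power 5 | 98.4 | 42.4 | 95.7 | 42.3 | 91.5 | 41.6 |
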
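Table 9

Simulation results (%) for $\sigma_E \in \{3, 4, 5\}$ in 12-team tournaments when $\sigma_R^2 = 194$

Columns are grouped by the standard error of the committee's team strength estimate, $\sigma_E$.

| Tournament type | Validity ($\sigma_E = 3$) | Effectiveness ($\sigma_E = 3$) | Validity ($\sigma_E = 4$) | Effectiveness ($\sigma_E = 4$) | Validity ($\sigma_E = 5$) | Effectiveness ($\sigma_E = 5$) |
|---|---|---|---|---|---|---|
| 4-Team fully-open (current system) | 97.2 | 46.1 | 93.8 | 46.0 | 87.5 | 44.5 |
| 12-Team fully-open | 100.0 | 40.5 | 99.8 | 40.2 | 99.0 | 39.7 |
| 12-Team partially-open | 99.9 | 40.5 | 99.6 | 40.3 | 98.2 | 39.6 |
| 12-Team fully-open w/ non-Power 5 | 100.0 | 40.5 | 99.7 | 40.3 | 98.5 | 39.8 |
| 12-Team partially-open w/ non-Power 5 | 99.8 | 40.5 | 99.4 | 40.4 | 97.7 | 39.7 |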

Even in the case where the standard error of the committee’s team strength estimates is as high as 7 points, the validity results for fully-open tournaments show that the true best team is about 35% likely to be ranked highest; of course, as the standard error decreases, the validity increases to the expected maximum of 100% when the committee makes no errors.

As a result, except where the only team to automatically qualify for the tournament is the top non-Power-Five team (which is unlikely to be the true best), a 1-team tournament clearly has the highest effectiveness as long as the committee's standard error of team strength estimate is 4 points or lower. When that standard error is 5, the 1-team tournament generally has the highest effectiveness by 1-2%, and for standard errors of 6 or 7, larger tournaments are better. For all tournament types and sizes, the effectiveness of the most-effective tournament decreases as the standard error of the committee's estimates increases.

Another consequence of the committee's estimation error being relatively low is that the simulation results show effectiveness tiers not by number of rounds of a tournament (e.g., tournaments of size 5-8 require three rounds to determine a champion; tournaments of size 9-16 require four rounds, etc.), but by the number of games that the top-ranked team needs to play. For example, the effectiveness of a 3-round tournament with 8 teams is more similar to the effectiveness of a 4-round tournament with 9 teams than to a 3-round tournament with 7 teams. The reason is that in a 7-team tournament, the top-ranked team (which is reasonably likely to be the true best team) gets a bye in the first round; without an 8th team, the top-ranked team automatically advances to the second round. So, the top-ranked team has only two chances for in-game randomness to cause it to be upset, whereas in an 8- or 9-team tournament, the top-ranked team has three such chances.

Our results show, therefore, tiers of similar effectiveness: Tournaments of size 2 or 3 have similar effectiveness, as do tournaments of size 4 through 7, tournaments of size 8 through 15, and tournaments of size 16 and greater. (Tournaments of size 32 to 63, size 64 to 127, and size 128 each require the top-ranked team to play an additional game; however, the probability of the top-ranked team defeating the 32nd, 64th, or 128th-ranked team is sufficiently high even with in-game randomness that there is not much impact on effectiveness.)
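
The tier boundaries follow from simple bracket arithmetic. As a sketch, assuming a standard single-elimination bracket in which byes always go to the top seeds (the function names are illustrative), the number of rounds and the number of games the top seed must win are:

```python
def rounds_needed(n_teams):
    """Rounds in a single-elimination bracket: ceil(log2(n_teams))."""
    return (n_teams - 1).bit_length()


def top_seed_games(n_teams):
    """Games the top seed must win when byes go to the top seeds: floor(log2(n_teams))."""
    return n_teams.bit_length() - 1


if __name__ == "__main__":
    for n in (2, 3, 4, 7, 8, 9, 15, 16):
        print(f"{n:2d} teams: {rounds_needed(n)} rounds, "
              f"top seed plays {top_seed_games(n)} games")
```

Under this assumption, 7- and 8-team tournaments share a round count but differ in the top seed's game count, while 8- and 9-team tournaments differ in rounds but not in the top seed's games, which is the grouping described above.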

All of these observations hold whether the in-game randomness has a variance of 167 or 194, just with slightly different magnitudes.

7Discussion

The current playoff system is a fully-open 4-team tournament. Most of the main criticisms and defenses of this tournament system can be phrased in the language of effectiveness and validity. Validity-based arguments hold that, without expansion and/or guarantees of inclusion, the best team might be left out of the tournament: the current tournament might leave out the true best team in some years, every Power-Five conference winner should have a chance to play for the championship, and top non-Power-Five teams deserve a chance. Effectiveness-based arguments hold that the best team might be less likely to win an expanded tournament, for example because the field might be diluted. Aside from validity and effectiveness, there are also fan-based and economic arguments: Fans want to see the championship decided by teams playing each other, and a larger tournament with more playoff games might also increase revenue and the number of fans who can attend a playoff game. In this section, we address the first two sets of arguments by observing our simulated tournaments' validity and effectiveness, and then discuss the implications for the fan-based and economic arguments.

In Section 6, we show simulation results for committees that range from perfect (σE = 0) to approximately equal to the Sagarin ratings (σE = 7), but we believe it is unlikely that either extreme is correct. In this section, we consider the middle range of committee quality, σE ∈ {3, 4, 5}.

Tables 6 and 7 show the validity and effectiveness of the current 4-team fully-open tournament, as well as the validity and effectiveness of all four types of 8-team tournaments we tested. Increasing the tournament from 4 to 8 teams while retaining its fully-open character decreases effectiveness by 4-6%, while increasing validity by 2-9%. However, the need for a tradeoff decreases when all conference champions and the top non-Power-Five team are guaranteed spots in an 8-team tournament; in that case, effectiveness decreases by just 2-3% while validity increases by 1-4%. In essence, the simulation results suggest that replacing the current system with an 8-team tournament with guarantees for conference champions and the top non-Power-Five team does not create significant changes in validity or effectiveness. The effectiveness is not significantly decreased by the increase in number of teams, and while the validity does not increase significantly, giving opportunities to all conference champions and the top non-Power-Five team does not hurt effectiveness.
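
The quoted ranges are differences, in percentage points, between rows of the summary tables for the 4-team and 8-team tournaments. The sketch below recomputes them from values copied from those tables as printed; the dictionary layout and the function name `change_ranges` are our own illustrative choices.

```python
BASELINE = "4-team fully-open"

# (validity, effectiveness) in percent for sigma_E = 3, 4, 5,
# keyed by the in-game randomness variance sigma_R^2.
RESULTS = {
    167: {
        "4-team fully-open": [(97.2, 47.6), (93.8, 47.5), (87.5, 46.0)],
        "8-team fully-open": [(99.8, 42.2), (98.9, 42.3), (96.3, 42.0)],
        "8-team partially-open w/ non-Power 5":
            [(98.4, 45.6), (95.7, 45.0), (91.5, 43.5)],
    },
    194: {
        "4-team fully-open": [(97.2, 46.1), (93.8, 46.0), (87.5, 44.5)],
        "8-team fully-open": [(99.8, 40.2), (98.9, 40.4), (96.3, 40.0)],
        "8-team partially-open w/ non-Power 5":
            [(98.5, 43.7), (95.7, 43.0), (91.5, 41.7)],
    },
}


def change_ranges(label):
    """Return ((min, max) validity change, (min, max) effectiveness change)
    relative to the 4-team fully-open baseline, in percentage points."""
    d_val, d_eff = [], []
    for table in RESULTS.values():
        for (v0, e0), (v1, e1) in zip(table[BASELINE], table[label]):
            d_val.append(round(v1 - v0, 1))
            d_eff.append(round(e1 - e0, 1))
    return (min(d_val), max(d_val)), (min(d_eff), max(d_eff))


if __name__ == "__main__":
    for label in ("8-team fully-open", "8-team partially-open w/ non-Power 5"):
        print(label, change_ranges(label))
```

For the fully-open 8-team tournament this gives validity gains of 2.6 to 8.8 points and effectiveness losses of 4.0 to 5.9 points; with both guarantees it gives gains of 1.2 to 4.0 points and losses of 2.0 to 3.0 points, in line with the rounded ranges quoted above.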

Tables 6 and 7 also show that 7-team tournaments have the potential for good effectiveness/validity tradeoffs. A 7-team fully-open tournament provides validity increases of 2-7% while decreasing effectiveness by just 0.3-1.1% compared with the current 4-team fully-open tournament, and if σE is 3, a 7-team tournament with guarantees for conference champions provides small increases for both validity and effectiveness.

On the other hand, newer proposals for a 12-team tournament (e.g., CollegeFootballPlayoff.com (2021)) would have a greater impact on validity and effectiveness. Tables 8 and 9 show the validity and effectiveness of the current 4-team fully-open tournament as well as the validity and effectiveness of all four types of 12-team tournaments we tested. The simulation results show that expanding the tournament to 12 teams would decrease effectiveness by 5-6% no matter what, and increase validity by 2-12%. Unlike expansion to 7 or 8 teams where there might be little difference, changing from a 4-team tournament to a 12-team tournament includes a definite effectiveness/validity tradeoff in addition to the economic and fan effects.

Overall, when considering only 4-team and 8-team tournaments, our simulations suggest that the annual debate over tournament size may be much ado about nothing. Replacing the current 4-team fully-open tournament with an 8-team tournament with guarantees for conference champions and the top non-Power-Five team is likely to lead only to small changes in both validity and effectiveness. As a result, decision-makers can give full consideration to fan and economic issues. It is also possible to obtain increased validity with only a very small effectiveness change by switching to a 7-team partially-open tournament, albeit with one fewer playoff game (and the resulting fan and economic effects) than an 8-team tournament would require. On the other hand, if 12-team tournaments are under consideration, there is a distinct validity/effectiveness tradeoff involved: the tournament would be 2-12% more likely to include the true best team, but that true best team would be 5-6% less likely to be correctly identified by winning the championship.

Acknowledgements

The authors would like to thank two anonymous referees and an editor for their suggestions for strengthening this paper.

Appendices

Appendix 1:

Sagarin rating data

Table 10

Sagarin ratings, 2009-2012

| 2009 team | Rating | 2010 team | Rating | 2011 team | Rating | 2012 team | Rating |
|---|---|---|---|---|---|---|---|
| Alabama | 100.25 | Auburn | 98.06 | Alabama | 104.17 | Alabama | 99.40 |
| Florida | 95.75 | Stanford | 98.05 | LSU | 100.30 | Oregon | 93.91 |
| Texas | 92.39 | Oregon | 96.98 | Oklahoma State | 97.01 | Texas A&M | 93.31 |
| TCU | 90.16 | TCU | 94.69 | Oklahoma | 92.48 | Georgia | 92.15 |
| Boise State | 89.35 | Alabama | 94.30 | Oregon | 91.82 | Notre Dame | 91.08 |
| Ohio State | 88.35 | Boise State | 93.03 | Arkansas | 91.06 | South Carolina | 90.10 |
| Virginia Tech | 87.69 | Ohio State | 92.75 | Stanford | 90.48 | Florida | 89.84 |
| Cincinnati | 86.84 | LSU | 91.16 | Wisconsin | 88.67 | Kansas State | 88.98 |
| Iowa | 85.82 | Arkansas | 88.77 | Boise State | 88.53 | Stanford | 87.85 |
| Penn State | 85.43 | Oklahoma | 88.72 | South Carolina | 88.23 | LSU | 87.63 |
| Oregon | 85.27 | Oklahoma State | 87.57 | Michigan | 86.54 | Florida State | 87.50 |
| Georgia Tech | 84.60 | Wisconsin | 86.99 | Southern California | 86.48 | Oklahoma | 85.70 |
| LSU | 84.22 | Virginia Tech | 86.10 | Baylor | 86.01 | Ohio State | 85.37 |
| Nebraska | 84.06 | Florida State | 85.19 | Texas A&M | 85.87 | Clemson | 85.21 |
| BYU | 83.60 | Mississippi State | 84.81 | Houston | 85.26 | Oregon State | 83.78 |
| Pittsburgh | 83.50 | Nevada | 84.44 | Missouri | 85.25 | Texas | 83.63 |
| Oklahoma | 83.15 | Missouri | 83.04 | Texas | 85.21 | Oklahoma State | 83.56 |
| Arkansas | 82.49 | NC State | 82.85 | Michigan State | 85.13 | Baylor | 82.64 |
| Mississippi | 82.38 | Notre Dame | 82.47 | Kansas State | 84.56 | Utah State | 82.41 |
| Southern California | 81.95 | Texas A&M | 82.37 | TCU | 84.13 | Michigan | 82.25 |
| Miami-Florida | 81.72 | Iowa | 82.35 | Georgia | 84.01 | Northwestern | 81.72 |
| Clemson | 81.54 | Southern California | 81.93 | West Virginia | 81.78 | Nebraska | 81.06 |
| Wisconsin | 81.21 | Arizona State | 81.54 | Florida State | 81.16 | Wisconsin | 81.02 |
| Utah | 80.59 | Florida | 81.34 | Southern Miss | 81.07 | Mississippi | 80.91 |
| Texas Tech | 80.25 | South Carolina | 81.11 | Nebraska | 81.06 | Vanderbilt | 80.77 |
| Georgia | 80.08 | Utah | 80.56 | Notre Dame | 79.91 | BYU | 80.26 |
| Auburn | 79.43 | Nebraska | 80.23 | Virginia Tech | 79.14 | Louisville | 79.98 |
| Connecticut | 79.42 | Washington | 80.09 | Penn State | 78.82 | Arizona State | 79.87 |
| Stanford | 78.89 | Oregon State | 79.99 | Florida | 78.78 | San Jose State | 79.87 |
| Florida State | 78.56 | Arizona | 79.29 | Cincinnati | 78.63 | Penn State | 78.83 |
| West Virginia | 78.55 | Michigan State | 79.28 | Mississippi State | 78.43 | UCLA | 78.51 |
| North Carolina | 78.47 | Pittsburgh | 78.76 | Clemson | 77.78 | TCU | 78.39 |
| Oregon State | 78.35 | California | 78.29 | Auburn | 77.40 | Southern California | 78.32 |
| Air Force | 78.21 | San Diego State | 78.29 | BYU | 77.18 | Michigan State | 78.15 |
| Tennessee | 77.96 | Tulsa | 78.03 | Tulsa | 77.02 | Cincinnati | 77.81 |
| Arizona | 77.64 | West Virginia | 78.03 | Rutgers | 76.98 | Texas Tech | 77.71 |
| Navy | 77.51 | Air Force | 77.94 | California | 76.20 | Syracuse | 77.61 |
| South Florida | 77.28 | Maryland | 77.85 | Utah | 75.92 | Northern Illinois | 77.33 |
| Oklahoma State | 77.28 | Illinois | 77.78 | Toledo | 75.21 | Missouri | 77.14 |
| Rutgers | 76.98 | Miami-Florida | 76.91 | Arizona State | 74.94 | Boise State | 76.99 |
| Central Michigan | 76.48 | North Carolina | 76.77 | Vanderbilt | 74.94 | Mississippi State | 76.80 |
| South Carolina | 76.43 | Central Florida | 75.85 | Iowa State | 74.64 | North Carolina | 76.70 |
| Boston College | 76.14 | Texas Tech | 74.50 | Washington | 74.52 | Arizona | 76.18 |
| Mississippi State | 76.04 | South Florida | 74.43 | Iowa | 74.51 | Central Florida (UCF) | 76.17 |
| California | 75.77 | BYU | 74.35 | Temple | 74.50 | Georgia Tech | 75.92 |
| Kentucky | 75.35 | Northern Illinois | 74.02 | Northern Illinois | 74.46 | Tulsa | 75.81 |
| Notre Dame | 75.31 | Hawaii | 73.74 | Louisiana Tech | 74.34 | West Virginia | 75.44 |
| UCLA | 75.03 | Syracuse | 73.68 | Ohio State | 74.19 | Miami-Florida | 75.27 |
| East Carolina | 74.52 | Boston College | 73.10 | Tennessee | 74.09 | Arkansas State | 75.25 |
| Washington | 74.34 | Penn State | 73.05 | SMU | 73.88 | Louisiana Tech | 75.15 |
| Houston | 73.54 | Louisville | 73.02 | Miami-Florida | 73.61 | Washington | 75.05 |
| Missouri | 73.42 | Clemson | 72.82 | Texas Tech | 73.17 | Rutgers | 74.62 |
| Michigan State | 73.28 | Georgia | 72.77 | North Carolina | 72.86 | Virginia Tech | 74.37 |
| Wake Forest | 72.55 | Navy | 72.71 | Illinois | 72.80 | Iowa State | 74.15 |
| Texas A&M | 71.72 | Connecticut | 72.68 | Georgia Tech | 72.31 | Tennessee | 73.50 |
| Fresno State | 71.39 | Kansas State | 72.15 | NC State | 71.70 | SMU | 72.94 |
| Northwestern | 70.95 | Michigan | 71.91 | Northwestern | 71.35 | Fresno State | 72.41 |
| Kansas | 70.88 | Baylor | 71.49 | South Florida | 71.02 | Pittsburgh | 72.40 |
| Middle Tennessee | 70.87 | UCLA | 71.41 | Nevada | 70.89 | Kent State | 72.25 |
| Minnesota | 70.61 | Tennessee | 70.71 | Louisville | 70.75 | Utah | 71.27 |
| SMU | 70.35 | Texas | 70.52 | Arizona | 70.52 | Louisiana-Lafayette | 71.07 |
| Iowa State | 70.30 | Iowa State | 69.25 | Purdue | 70.28 | NC State | 70.65 |
| Arizona State | 70.11 | Temple | 69.08 | UCLA | 70.18 | Arkansas | 70.64 |
| Kansas State | 69.79 | Cincinnati | 68.96 | Arkansas State | 69.89 | San Diego State | 69.94 |
| Troy | 69.60 | Colorado | 68.82 | Ohio | 69.81 | Ball State | 69.71 |
| Nevada | 69.43 | Southern Miss | 68.58 | Pittsburgh | 69.65 | Minnesota | 69.22 |
| Central Florida | 69.34 | Georgia Tech | 68.32 | San Diego State | 69.30 | Toledo | 69.04 |
| Temple | 69.24 | Northwestern | 68.08 | Virginia | 69.17 | Iowa | 68.79 |
| Virginia | 69.01 | Army | 67.64 | Air Force | 68.58 | Purdue | 68.63 |
| NC State | 68.85 | Kentucky | 67.62 | Wake Forest | 68.42 | Duke | 67.97 |
| Purdue | 68.60 | Miami-Ohio | 67.24 | Navy | 68.37 | Bowling Green | 67.90 |
| Southern Miss | 68.07 | Fresno State | 67.07 | Connecticut | 68.09 | Ohio | 67.59 |
| Duke | 67.87 | Washington State | 66.73 | Central Florida (UCF) | 67.61 | Indiana | 67.47 |
| Marshall | 67.60 | Troy | 66.63 | Western Michigan | 67.51 | Louisiana-Monroe | 66.89 |
| Baylor | 67.37 | Houston | 66.01 | Marshall | 67.49 | California | 66.64 |
| Michigan | 67.18 | SMU | 65.90 | Utah State | 67.43 | Nevada | 66.31 |
| Wyoming | 67.09 | Mississippi | 65.85 | Louisiana-Lafayette | 66.60 | Auburn | 65.86 |
| Syracuse | 67.05 | East Carolina | 65.82 | Washington State | 66.14 | Navy | 65.78 |
| Idaho | 66.19 | Virginia | 65.00 | Wyoming | 66.09 | Rice | 65.49 |
| Louisville | 66.13 | Fla. International | 64.99 | Kentucky | 66.03 | East Carolina | 65.24 |
| Ohio University | 66.13 | Idaho | 64.81 | Syracuse | 65.97 | Middle Tennessee | 65.03 |
| Louisiana Tech | 66.12 | Rutgers | 64.47 | Oregon State | 65.76 | South Florida | 64.88 |
| Colorado | 65.89 | Louisiana Tech | 64.40 | Minnesota | 64.89 | Virginia | 64.65 |
| Bowling Green | 65.35 | Toledo | 63.75 | Rice | 64.45 | Connecticut | 64.35 |
| UNLV | 64.91 | Minnesota | 63.74 | Boston College | 64.40 | Western Kentucky | 64.16 |
| Illinois | 64.37 | Duke | 63.61 | Kansas | 64.09 | Kentucky | 63.99 |
| Northern Illinois | 64.33 | Purdue | 63.19 | UTEP | 63.71 | Kansas | 63.57 |
| Tulsa | 63.96 | Indiana | 62.79 | East Carolina | 63.52 | Temple | 63.34 |
| Maryland | 63.83 | Western Michigan | 62.67 | Fla. International | 63.26 | Maryland | 63.28 |
| Indiana | 63.56 | Wake Forest | 61.63 | San Jose State | 62.99 | Troy | 63.01 |
| UAB | 63.49 | Ohio University | 61.10 | Hawaii | 62.79 | Washington State | 62.65 |
| Utah State | 63.33 | Wyoming | 60.44 | Miami-Ohio | 62.62 | Marshall | 61.25 |
| San Diego State | 62.52 | Marshall | 59.57 | Bowling Green | 62.57 | Houston | 61.23 |
| Vanderbilt | 62.47 | Utah State | 59.45 | Ball State | 62.43 | Wake Forest | 61.11 |
| Hawaii | 62.47 | UAB | 59.29 | Western Kentucky | 62.11 | Texas-San Antonio | 60.89 |
| Colorado State | 62.05 | Arkansas State | 59.27 | Mississippi | 61.77 | Central Michigan | 60.83 |
| Buffalo | 61.95 | Kansas | 59.04 | Maryland | 61.76 | Boston College | 60.42 |
| Louisiana-Monroe | 61.51 | Kent State | 59.01 | Fresno State | 61.68 | Western Michigan | 59.94 |
| UTEP | 59.94 | UTEP | 58.80 | Colorado | 61.65 | North Texas | 59.15 |
| Florida Atlantic | 59.26 | Vanderbilt | 58.20 | Army | 60.51 | Wyoming | 58.79 |
| Kent State | 58.07 | Rice | 58.17 | Duke | 60.07 | Memphis | 58.60 |
| Toledo | 57.77 | Colorado State | 57.66 | Kent State | 59.79 | Texas State | 58.36 |
| Louisiana-Lafayette | 57.67 | UNLV | 57.60 | Eastern Michigan | 59.57 | Miami-Ohio | 58.23 |
| Western Michigan | 57.31 | Central Michigan | 56.81 | North Texas | 59.18 | Illinois | 58.11 |
| Washington State | 57.16 | Tulane | 56.31 | Louisiana-Monroe | 58.88 | Fla. International | 58.09 |
| Arkansas State | 56.44 | Middle Tennessee | 55.27 | New Mexico State | 56.68 | Air Force | 57.57 |
| Memphis | 56.39 | Louisiana-Monroe | 55.12 | Buffalo | 56.15 | Buffalo | 57.54 |
| Army | 56.02 | North Texas | 53.78 | Central Michigan | 56.00 | Colorado State | 57.14 |
| Fla. International | 54.42 | Florida Atlantic | 53.27 | Indiana | 55.37 | Florida Atlantic | 56.65 |
| New Mexico | 54.19 | Ball State | 52.27 | Idaho | 54.74 | UTEP | 56.55 |
| Akron | 54.14 | San Jose State | 52.06 | Colorado State | 54.65 | UAB | 55.73 |
| San Jose State | 54.06 | Louisiana-Lafayette | 52.05 | Troy | 52.68 | New Mexico | 54.98 |
| Tulane | 53.55 | Bowling Green | 52.01 | UAB | 52.04 | Eastern Michigan | 54.27 |
| Rice | 53.26 | Western Kentucky | 51.31 | UNLV | 51.47 | Army | 53.81 |
| Ball State | 51.99 | New Mexico | 50.64 | Middle Tennessee | 48.29 | Tulane | 52.95 |
| New Mexico State | 51.95 | New Mexico State | 50.18 | Tulane | 47.69 | Colorado | 52.89 |
| Miami-Ohio | 51.52 | Memphis | 49.75 | New Mexico | 47.05 | UNLV | 52.84 |
| North Texas | 51.15 | Eastern Michigan | 47.94 | Memphis | 45.68 | Hawai’i | 51.25 |
| Eastern Michigan | 44.11 | Buffalo | 47.70 | Florida Atlantic | 43.84 | South Alabama | 50.19 |
| Western Kentucky | 43.67 | Akron | 42.68 | Akron | 42.49 | Idaho | 49.82 |
| | | | | | | Southern Miss | 49.70 |
| | | | | | | Akron | 49.40 |
| | | | | | | New Mexico State | 47.25 |
| | | | | | | Massachusetts | 46.83 |

Table 11

Sagarin ratings, 2013-2015

201320142015
Florida State101.90Ohio State100.81Alabama100.92
Oregon93.58TCU99.61Clemson94.82
Alabama93.37Alabama97.42Ohio State92.92
Auburn91.76Oregon95.71Oklahoma91.50
Stanford91.57Georgia95.17Stanford90.71
Michigan State90.76Michigan State93.49Mississippi90.03
Missouri90.33Baylor90.62TCU87.72
UCLA89.39Mississippi89.94Baylor87.38
Baylor89.28Mississippi State89.32Michigan87.07
South Carolina88.99Arkansas88.64Tennessee86.81
Washington88.39Auburn88.56Notre Dame86.65
Oklahoma88.08Georgia Tech87.25Florida State86.62
Oklahoma State88.04Clemson86.98LSU85.81
Clemson87.22LSU85.69Southern California84.59
LSU87.22Missouri85.19Arkansas84.43
Ohio State86.57Florida State84.48North Carolina84.26
Wisconsin85.79Kansas State84.44Michigan State84.01
Arizona State85.51Stanford84.34Mississippi State84.00
Louisville85.05UCLA84.29Wisconsin83.06
Southern California84.55Wisconsin83.89Oregon82.90
Texas A&M83.56Southern California83.86Houston82.73
Kansas State83.39Texas A&M83.11Iowa82.70
Arizona83.32Marshall82.42Georgia82.65
Central Florida (UCF)82.02Utah82.39Utah82.41
Georgia81.96Arizona State81.74Washington82.35
Mississippi80.49Florida81.62Oklahoma State81.67
Notre Dame79.80Tennessee81.50West Virginia81.02
Oregon State79.72Oklahoma81.48Florida80.74
Texas Tech79.64Nebraska81.02Auburn80.02
Mississippi State79.58Louisville80.56UCLA79.99
Iowa78.68Notre Dame79.78Navy79.61
Texas78.63West Virginia78.81California79.59
Utah78.49Louisiana Tech77.69Texas A&M79.34
BYU77.87South Carolina77.61Louisville79.01
Vanderbilt77.76Boise State77.33Western Kentucky78.24
Georgia Tech77.14Arizona77.17Toledo77.70
Nebraska76.82Minnesota76.35Boise State77.66
Bowling Green76.54Memphis76.15BYU77.31
Washington State76.21Virginia Tech75.24Pittsburgh76.94
Utah State76.15Miami-Florida74.94Arizona State76.80
Duke75.97Duke74.94Nebraska76.55
Virginia Tech75.91Boston College74.34Washington State76.36
Michigan75.75Washington74.07Northwestern76.35
North Carolina75.68Penn State73.73San Diego State76.15
Boise State74.53Iowa73.24Virginia Tech75.76
TCU74.11NC State73.16Texas Tech75.37
Miami-Florida73.86Kentucky73.14Memphis75.04
Houston73.84Maryland72.75Miami-Florida74.94
Navy73.53Virginia72.69NC State74.72
Penn State73.47Utah State72.65Penn State74.69
Florida73.43Texas72.42Temple74.34
Fresno State73.01Oklahoma State72.41Bowling Green73.96
Pittsburgh72.94BYU71.61Georgia Tech73.57
East Carolina72.54Cincinnati71.56Arizona73.37
Indiana72.06Central Florida (UCF)71.22Texas73.20
Minnesota72.06California71.16South Florida73.15
Northern Illinois71.86Michigan70.86Duke72.91
Tennessee71.86Pittsburgh70.84Kansas State72.39
Marshall71.46Rutgers70.72Appalachian State71.63
Syracuse70.91Georgia Southern69.55Georgia Southern70.96
North Texas70.81East Carolina69.46Missouri70.75
Boston College70.60Colorado State69.39Western Michigan70.63
Northwestern70.08Northwestern69.34Minnesota70.59
Cincinnati69.66Rice68.35South Carolina69.58
Toledo67.69Navy68.20Marshall69.54
Rice67.67Northern Illinois68.15Iowa State69.49
Ball State67.35Toledo68.05Louisiana Tech69.32
Colorado State67.14Houston67.88Indiana69.30
Iowa State67.06Air Force67.74Southern Miss68.96
Maryland66.79North Carolina67.57Illinois68.88
West Virginia66.75Louisiana-Lafayette66.74Virginia68.61
Colorado66.42Texas Tech66.49Air Force68.55
Illinois66.33Western Kentucky66.27Utah State68.46
Arkansas66.21Washington State65.86Cincinnati68.45
San Diego State65.89Oregon State65.71Arkansas State66.85
Texas-San Antonio65.58Illinois65.61Syracuse66.54
Louisiana-Lafayette65.44UAB65.40Kentucky66.44
San Jose State65.34Arkansas State65.13Vanderbilt66.23
Buffalo64.97Temple65.02East Carolina66.05
Western Kentucky64.78Western Michigan64.87Northern Illinois65.76
Wake Forest64.72Indiana64.24Maryland65.76
Florida Atlantic64.59Nevada64.18Boston College65.72
South Alabama64.20San Diego State64.10Central Michigan64.88
Tulane64.14Syracuse62.93Middle Tennessee64.72
Arkansas State62.79Colorado62.38Colorado64.10
Nevada62.59Purdue62.30Connecticut63.11
UNLV62.26Central Michigan62.09Wake Forest63.05
Rutgers61.85Appalachian State61.93Colorado State62.82
SMU61.57UTEP61.43San Jose State62.61
Middle Tennessee61.04Iowa State61.42Ohio62.49
Kentucky60.92Middle Tennessee60.89Tulsa61.81
Virginia60.57Fresno State59.02Rutgers61.69
Memphis60.46Texas State58.90Akron61.46
NC State59.79Kansas58.32Nevada61.39
Troy59.36Old Dominion58.10Purdue59.93
Temple58.73Ball State57.72Oregon State59.48
California58.54Bowling Green57.25New Mexico59.46
Kansas58.41Vanderbilt57.21Troy58.42
Connecticut58.28Fla. International56.99Georgia State56.92
Ohio58.18Hawai’i56.62Buffalo56.38
Kent State58.08Wake Forest56.46UNLV54.94
Hawai’i56.73Wyoming56.26Fresno State53.65
Louisiana-Monroe56.71Louisiana-Monroe56.06Florida Atlantic52.78
Tulsa55.86South Alabama56.00SMU52.75
Wyoming55.84New Mexico55.11Fla. International52.15
Akron55.42Akron54.88Ball State51.78
Central Michigan55.27Ohio54.88South Alabama51.61
Texas State53.72Texas-San Antonio54.32Louisiana-Lafayette51.47
South Florida53.66Florida Atlantic53.82Massachusetts51.29
New Mexico53.40Southern Miss53.61Rice50.98
Army51.95Buffalo53.57Army50.96
Purdue51.65Tulane53.29Idaho49.88
Air Force49.87North Texas52.96Tulane49.75
Louisiana Tech48.21South Florida52.74UTSA49.24
UAB47.93San Jose State52.01Wyoming49.19
Georgia State44.65Massachusetts51.72Kansas48.79
UTEP44.42Army51.56Texas State48.24
Idaho42.44Miami-Ohio50.33Kent State47.82
New Mexico State42.38Tulsa48.45Miami-Ohio47.67
Western Michigan42.38Kent State47.65UTEP47.61
Massachusetts41.45UNLV46.87Old Dominion47.51
Southern Miss40.57Idaho46.78Hawai’i47.42
Eastern Michigan38.81Troy46.19Central Florida (UCF)46.15
Fla. International35.80New Mexico State45.20New Mexico State45.97
Miami-Ohio35.24Connecticut44.71Louisiana-Monroe45.87
Georgia State41.07North Texas42.59
SMU39.16Eastern Michigan42.08
Eastern Michigan38.81Charlotte39.00
Table 12

Sagarin ratings, 2016-2019

2016    2017    2018    2019
Clemson105.35Alabama101.18Clemson103.16LSU104.88
Alabama105.33Ohio State97.15Alabama101.44Ohio State104.83
Michigan94.05Georgia96.70Ohio State92.33Clemson101.53
Washington93.28Penn State95.65Georgia91.50Alabama98.50
Ohio State93.27Clemson95.32Oklahoma90.99Georgia94.44
Oklahoma93.21Oklahoma94.31Michigan88.80Oregon93.65
LSU91.99Wisconsin94.30Notre Dame87.43Oklahoma93.37
Florida State91.56Auburn91.45Mississippi State87.32Penn State92.27
Wisconsin90.59Washington89.29Washington87.05Wisconsin92.22
Southern California89.92Notre Dame88.90Iowa86.60Florida90.67
Oklahoma State89.71Oklahoma State87.58Penn State86.27Notre Dame90.56
Miami-Florida88.28TCU87.39Texas A&M85.68Michigan89.58
Penn State87.77Central Florida (UCF)87.11Florida85.21Auburn88.57
Florida87.04Stanford85.79LSU85.16Iowa87.50
Virginia Tech85.25Southern California85.42Auburn84.97Texas86.53
Kansas State84.75Miami-Florida84.47West Virginia83.76Baylor86.22
Auburn83.40Mississippi State84.35Texas A&M83.45Washington86.13
Stanford83.24Iowa83.80Missouri83.43Minnesota84.30
Western Kentucky83.07LSU83.66Washington State83.22Texas A&M84.20
Tennessee82.68Northwestern83.54Fresno State82.36Utah83.82
Western Michigan82.22Virginia Tech83.50Central Florida (UCF)82.18Memphis83.37
Louisville81.96NC State82.92Utah82.05Oklahoma State81.39
Georgia Tech81.11Michigan State82.92Stanford81.37Central Florida (UCF)81.29
NC State80.99Louisville81.59Wisconsin81.12Navy81.27
Minnesota80.06Iowa State81.48Kentucky80.44Appalachian State81.06
Colorado80.03Michigan81.47Utah State79.47Southern California80.73
San Diego State79.98Texas81.43Syracuse79.45Kansas State80.60
West Virginia79.74Florida State80.07Boise State79.21Iowa State80.25
Tulsa79.72Wake Forest79.95NC State79.13Air Force80.04
Texas A&M79.53Kansas State79.48Oklahoma State79.01Cincinnati79.84
North Carolina79.45Memphis78.78Northwestern78.37Boise State78.55
South Florida79.42Boise State78.36Michigan State78.14Kentucky78.32
Washington State79.08South Carolina78.31Miami-Florida78.11Virginia78.25
Northwestern78.72Utah78.06Oregon78.00Tennessee77.74
BYU78.27Boston College77.88Appalachian State77.09Michigan State77.01
Utah77.91Purdue77.77Iowa State76.72Arizona State76.97
Georgia77.48Washington State77.57Boston College76.46TCU76.42
Pittsburgh77.44Duke77.54Cincinnati76.41Indiana75.94
Appalachian State76.74Oregon77.15TCU76.38North Carolina75.94
Iowa76.45South Florida76.68South Carolina76.31California75.82
Baylor76.34Georgia Tech76.27Minnesota76.14Florida Atlantic75.80
Temple75.46Texas A&M75.45Virginia75.98Louisiana74.78
Nebraska74.18Arizona State74.75Army West Point75.83Virginia Tech74.47
Arkansas74.06Pittsburgh74.67Arizona State75.63SMU74.25
Mississippi State74.05Florida Atlantic74.40Texas Tech75.62Washington State74.18
Wake Forest73.93Texas Tech74.09Purdue75.43Mississippi State73.26
TCU73.35West Virginia73.97Georgia Tech75.12Louisville73.22
Memphis73.16Indiana73.76Duke74.67Nebraska72.87
Notre Dame72.95Missouri73.74Pittsburgh74.36San Diego State72.55
Toledo72.51Arizona73.56Southern California74.11Texas Tech72.23
Air Force72.50Fresno State73.49Kansas State73.59West Virginia72.21
Arkansas State72.22California73.05Memphis73.20Missouri72.19
Houston71.85Navy72.79Nebraska73.01South Carolina72.17
Mississippi71.59UCLA72.57Ohio72.77Florida State71.96
California71.35Mississippi71.92Maryland72.67Oregon State71.44
Louisiana Tech70.78Appalachian State71.85Vanderbilt72.62Mississippi71.38
Texas70.63San Diego State71.78California72.37Tulane71.14
Navy70.31Army West Point71.58Wake Forest72.32Wake Forest70.05
Boston College70.28Houston71.11Temple72.12UCLA69.92
Troy69.76Minnesota70.68Baylor71.26Pittsburgh69.76
Boise State69.62Ohio70.09BYU71.20Colorado69.75
Kentucky69.62Kentucky70.04Virginia Tech71.19Wyoming69.55
Vanderbilt69.57Troy69.89Arizona70.74Syracuse69.48
Texas Tech69.14Florida69.82Indiana70.74Boston College69.34
Indiana68.84Toledo69.59Florida State70.47BYU69.13
UCLA68.27Colorado68.51Mississippi70.35Hawai’i68.78
Iowa State68.18Temple68.38Tennessee70.22Purdue68.40
Colorado State67.96North Carolina68.27UCLA69.76Miami-Florida68.35
Idaho67.89Syracuse68.08Colorado69.55Northwestern68.10
Old Dominion67.66Wyoming67.51UAB68.78Houston67.89
Oregon State67.59Arkansas67.29Houston68.36Stanford67.76
New Mexico67.43Nebraska67.21Toledo67.99Western Kentucky67.54
Oregon67.40Northern Illinois67.04Troy67.24Buffalo67.50
Wyoming66.97Maryland65.82Air Force67.21Temple66.39
Duke66.61Louisiana Tech65.80Marshall66.29Duke66.37
Missouri66.29Virginia65.71Buffalo66.20Tulsa66.02
South Carolina66.18Vanderbilt65.66Wyoming66.00Illinois65.97
Syracuse65.95Colorado State65.39Georgia Southern65.86Louisiana Tech65.81
Michigan State65.93Marshall64.97North Texas65.55Utah State65.71
Central Florida (UCF)64.70SMU63.73Northern Illinois65.48Ohio65.65
Army West Point64.32Western Michigan63.51Miami-Ohio65.30Arizona64.94
Maryland63.98Arkansas State63.40Tulane65.22Marshall64.53
Arizona State63.86Utah State63.22Arkansas State65.08Kansas64.04
SMU63.13Tulane63.07Nevada65.05Western Michigan64.03
Southern Miss63.09Tennessee63.01North Carolina64.75Arkansas State63.52
Northern Illinois62.34Air Force62.38Middle Tennessee64.61Kent State63.18
Ohio62.20Buffalo62.03Eastern Michigan64.53Georgia Southern63.09
UTSA61.18Eastern Michigan61.82Kansas64.31Miami-Ohio62.80
Arizona61.14Baylor61.66San Diego State63.61Fresno State62.01
Hawai’i60.62Rutgers61.66Florida Atlantic62.65Ball State61.41
Miami-Ohio60.51Central Michigan61.46Arkansas62.49Maryland61.39
Virginia59.99Middle Tennessee61.09Fla. International62.42South Florida61.20
Georgia Southern59.67North Texas61.04South Florida61.36Southern Miss61.17
Tulane59.62Southern Miss60.05Louisiana Tech61.09Troy60.63
Illinois59.21Tulsa59.43SMU60.72Liberty60.58
Utah State58.85Miami-Ohio59.12Western Michigan60.26Central Michigan60.42
Eastern Michigan58.17UTSA58.82Navy60.19Colorado State60.34
Louisiana-Lafayette58.11BYU58.77Southern Miss60.17UAB60.08
Cincinnati57.46New Mexico State58.03Illinois59.37NC State60.04
East Carolina57.46Nevada58.02Louisiana58.59Georgia Tech59.73
Central Michigan57.27Fla. International57.85Louisville58.48Army West Point59.72
Nevada56.40Akron57.69Tulsa57.91Vanderbilt59.12
Middle Tennessee56.16UNLV57.40Oregon State56.96Nevada58.93
South Alabama56.11Massachusetts57.30Hawai’i56.11ULM58.72
Kansas55.22Illinois56.04ULM55.97San Jose State58.48
Purdue54.97UAB55.55Rutgers55.46Eastern Michigan58.33
San Jose State53.71Cincinnati55.42Colorado State55.25Charlotte57.64
Ball State53.51East Carolina55.13UNLV54.49Arkansas State57.61
Bowling Green53.34Oregon State54.37New Mexico54.35Fla. International57.49
Akron52.53Western Kentucky54.19Western Kentucky53.35Georgia State57.42
Georgia State52.32Louisiana-Monroe54.04Akron53.21Northern Illinois56.63
Massachusetts51.48Idaho53.74Ball State53.03Toledo56.55
Fla. International51.38Connecticut53.61Charlotte52.59Middle Tennessee56.47
UNLV51.15Georgia State53.23Massachusetts52.50UNLV56.44
North Texas51.11New Mexico53.09Liberty52.48Coastal Carolina55.33
Louisiana-Monroe50.26Bowling Green52.98East Carolina52.44North Texas54.18
Kent State50.25Georgia Southern51.84Coastal Carolina52.40Rutgers53.59
Connecticut50.11Old Dominion51.78San Jose State51.42East Carolina52.80
Rutgers49.82Louisiana-Lafayette51.15Georgia State51.01Rice52.27
Charlotte49.45Hawai’i50.56Central Michigan50.65Texas State49.33
New Mexico State49.25South Alabama50.54Old Dominion50.52UTSA48.77
Marshall49.05Kansas49.75Bowling Green50.10New Mexico48.75
UTEP48.87Coastal Carolina48.72Kent State49.30South Alabama48.71
Rice47.95Rice43.82South Alabama47.17New Mexico State45.85
Florida Atlantic47.75San Jose State43.42New Mexico State46.53Bowling Green43.98
Fresno State45.77Texas State42.83UTSA45.69Old Dominion43.74
Buffalo44.84Kent State42.66Texas State45.53Connecticut42.99
Texas State38.00Ball State40.15Rice43.41UTEP37.98
Charlotte39.80Connecticut43.01Akron33.56
UTEP38.38UTEP41.02Massachusetts30.72
Table 13

Power-Five conference champions and top non-Power-Five teams, 2009-2015

| Conference | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ACC | Georgia Tech 84.60 | Virginia Tech 86.10 | Clemson 77.78 | Florida State 87.50 | Florida State 101.90 | Florida State 84.48 | Clemson 94.82 |
| Big 12 | Texas 92.39 | Oklahoma 88.72 | Oklahoma State 97.01 | Kansas State 88.98 | Baylor 89.28 | TCU 99.61 | Oklahoma 91.50 |
| Big Ten | Ohio State 88.35 | Wisconsin 86.99 | Wisconsin 88.67 | Wisconsin^a 81.02 | Michigan State 90.76 | Ohio State 100.81 | Michigan State 84.01 |
| Pac-12 | Oregon 85.27 | Oregon 96.98 | Oregon 91.82 | Stanford 87.85 | Stanford 91.57 | Oregon 95.71 | Stanford 90.71 |
| SEC | Alabama 100.25 | Auburn 98.06 | LSU 100.3 | Alabama 99.40 | Auburn 91.76 | Alabama 97.42 | Alabama 100.92 |
| Non-Power-Five | Boise State 89.35 | Boise State 93.03 | Boise State 88.53 | Utah State 82.41 | Louisville 85.05 | Marshall 82.42 | Houston 82.72 |

^a Ohio State (12-0, ranked #3 by the AP poll with Sagarin rating of 85.37) was ineligible for postseason play, so Wisconsin (81.02) was the official Big Ten champion after winning the conference championship game.

Table 14

Power-Five conference champions and top non-Power-Five teams, 2016-2019

| Conference | 2016 | 2017 | 2018 | 2019 |
| --- | --- | --- | --- | --- |
| ACC | Clemson 105.35 | Clemson 95.32 | Clemson 103.16 | Clemson 101.53 |
| Big 12 | Oklahoma 93.21 | Oklahoma 94.31 | Oklahoma 90.99 | Oklahoma 93.37 |
| Big Ten | Penn State 87.77 | Ohio State 97.15 | Ohio State 92.33 | Ohio State 104.83 |
| Pac-12 | Washington 93.28 | Southern California 85.42 | Washington 87.05 | Oregon 93.65 |
| SEC | Alabama 105.33 | Georgia 96.70 | Alabama 101.44 | LSU 104.88 |
| Non-Power-Five | Western Kentucky 83.07 | Central Florida 87.11 | Fresno State 82.36 | Memphis 83.37 |

Appendices

Appendix 2:

Parameterized validity and effectiveness graphs

This appendix contains the validity and effectiveness of tournaments parameterized over the values of σR and σEObs. Each figure contains a 17 × 14 grid of line graphs; the graph in the ith row from the top and the jth column from the left shows the results of tournaments with σR = i and σEObs = j. The line in each graph, from left to right, shows the validity or effectiveness (from 0% to 100%) as tournaments increase in size from 1 team to 128 teams.
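To make the two plotted quantities concrete, the following is a minimal sketch of how a single (σR, σEObs) cell could be estimated for one tournament size. It is not the authors' simulation code. It assumes normally distributed true strengths, additive normal observation error, normal in-game randomness, a fully-open field chosen purely by observed strength, a field size that is a power of two, and reseeding after every round. The function name `simulate_tournament` and all default parameter values are hypothetical.

```python
import random


def simulate_tournament(k, n_teams=128, sigma_true=12.0, sigma_e_obs=4.0,
                        sigma_r=13.0, trials=2000, seed=0):
    """Estimate (validity, effectiveness) for a k-team single-elimination
    tournament. All defaults are illustrative assumptions, not paper values."""
    if k < 1 or k > n_teams or k & (k - 1):
        raise ValueError("k must be a power of two no larger than n_teams")
    rng = random.Random(seed)
    included = 0  # trials in which the true best team makes the field
    won = 0       # trials in which the true best team wins the tournament
    for _ in range(trials):
        # True strengths, and observed strengths with estimation error.
        true = [rng.gauss(0.0, sigma_true) for _ in range(n_teams)]
        obs = [s + rng.gauss(0.0, sigma_e_obs) for s in true]
        best = max(range(n_teams), key=lambda t: true[t])
        # Fully-open selection: the k teams with the highest observed strength.
        alive = sorted(range(n_teams), key=lambda t: obs[t], reverse=True)[:k]
        if best in alive:
            included += 1
        # Each round pairs the highest remaining seed with the lowest.
        while len(alive) > 1:
            winners = []
            for i in range(len(alive) // 2):
                a, b = alive[i], alive[-1 - i]
                margin = true[a] - true[b] + rng.gauss(0.0, sigma_r)
                winners.append(a if margin > 0 else b)
            alive = sorted(winners, key=lambda t: obs[t], reverse=True)
        if alive[0] == best:
            won += 1
    return included / trials, won / trials


if __name__ == "__main__":
    for size in (1, 2, 4, 8, 16):
        validity, effectiveness = simulate_tournament(size)
        print(size, round(validity, 3), round(effectiveness, 3))
```

Under these assumptions, validity can only rise as the field grows, while effectiveness reflects both inclusion and the chance of surviving every round. Tournament sizes that are not powers of two (such as 7 or 12 teams) would need byes, which this sketch omits.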

Fig. 2

Validity of fully-open tournaments, parameterized over σR and σEObs.

Fig. 3

Validity of partially-open tournaments with Power-Five conference guarantees only, parameterized over σR and σEObs.

Fig. 4

Validity of partially-open tournaments with non-Power-Five conference guarantee only, parameterized over σR and σEObs.

Fig. 5

Validity of partially-open tournaments with both Power-Five and non-Power-Five conference guarantees, parameterized over σR and σEObs.

Fig. 6

Effectiveness of fully-open tournaments, parameterized over σR and σEObs.

Fig. 7

Effectiveness of partially-open tournaments with Power-Five conference guarantees only, parameterized over σR and σEObs.

Fig. 8

Effectiveness of partially-open tournaments with non-Power-Five conference guarantee only, parameterized over σR and σEObs.

Fig. 9

Effectiveness of partially-open tournaments with both Power-Five and non-Power-Five conference guarantees, parameterized over σR and σEObs.


Appendix 3:

Conditional distribution of real team strength given observed team strength

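Given an observed team strength $s_t^{Obs}$ drawn from $S_{Sag}$, we want to draw from the conditional distribution of team $t$'s real strength, i.e., $\Pr(S_{True}=s \mid S_{Sag}=s_t^{Obs})$. Because we assume that the estimation error $E_{Sag}$ is independent of the true team strength $S_{True}$,

\begin{align*}
\Pr(S_{True}=s \mid S_{Sag}=s_t^{Obs})
&= \frac{\Pr(S_{True}=s,\; S_{Sag}=s_t^{Obs})}{\Pr(S_{Sag}=s_t^{Obs})}
 = \frac{\Pr(S_{True}=s,\; E_{Sag}=s_t^{Obs}-s)}{\Pr(S_{Sag}=s_t^{Obs})} \tag{16} \\
&= \frac{\Pr(S_{True}=s)\,\Pr(E_{Sag}=s_t^{Obs}-s)}{\Pr(S_{Sag}=s_t^{Obs})} \tag{17} \\
&= \frac{\left(\dfrac{1}{\sigma_{True}\sqrt{2\pi}}\, e^{-\frac{1}{2}(s-\mu_{Sag})^2/\sigma_{True}^2}\right)
         \left(\dfrac{1}{\sigma_{E_{Sag}}\sqrt{2\pi}}\, e^{-\frac{1}{2}(s_t^{Obs}-s)^2/\sigma_{E_{Sag}}^2}\right)}
        {\dfrac{1}{\sigma_{Sag}\sqrt{2\pi}}\, e^{-\frac{1}{2}(s_t^{Obs}-\mu_{Sag})^2/\sigma_{Sag}^2}}. \tag{18}
\end{align*}

Since $\sigma_{True}^2 = \sigma_{Sag}^2 - \sigma_{E_{Sag}}^2$,

\begin{align*}
&\Pr(S_{True}=s \mid S_{Sag}=s_t^{Obs}) \tag{19} \\
&\quad= \frac{1}{\sqrt{2\pi}}\,
   \frac{\sigma_{Sag}}{\sigma_{E_{Sag}}\sqrt{\sigma_{Sag}^2-\sigma_{E_{Sag}}^2}}\,
   \exp\!\left(-\frac{1}{2}\,
   \frac{\left[s-\left(s_t^{Obs}-(s_t^{Obs}-\mu_{Sag})\dfrac{\sigma_{E_{Sag}}^2}{\sigma_{Sag}^2}\right)\right]^2}
        {(\sigma_{Sag}^2-\sigma_{E_{Sag}}^2)\,\dfrac{\sigma_{E_{Sag}}^2}{\sigma_{Sag}^2}}\right). \tag{20}
\end{align*}

So, $S_{True} \mid S_{Sag} = s_t^{Obs}$ is normally distributed, according to

\begin{equation*}
N\!\left(s_t^{Obs}-(s_t^{Obs}-\mu_{Sag})\frac{\sigma_{E_{Sag}}^2}{\sigma_{Sag}^2},\;
         (\sigma_{Sag}^2-\sigma_{E_{Sag}}^2)\frac{\sigma_{E_{Sag}}^2}{\sigma_{Sag}^2}\right). \tag{21}
\end{equation*}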
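As an illustration of how the normal distribution in (21) could be used in practice, the sketch below computes its mean and variance and draws one real strength from it. The function names and the numeric inputs (an observed rating of 100 with μSag = 70, σSag = 15, and σESag = 5) are hypothetical values chosen for the example, not estimates from the paper.

```python
import math
import random


def posterior_params(s_obs, mu_sag, sigma_sag, sigma_e_sag):
    """Mean and variance of the real strength given the observed strength,
    following the normal distribution in (21)."""
    shrink = sigma_e_sag ** 2 / sigma_sag ** 2
    mean = s_obs - (s_obs - mu_sag) * shrink
    variance = (sigma_sag ** 2 - sigma_e_sag ** 2) * shrink
    return mean, variance


def draw_true_strength(s_obs, mu_sag, sigma_sag, sigma_e_sag, rng=random):
    """Draw one real strength from the conditional distribution."""
    mean, variance = posterior_params(s_obs, mu_sag, sigma_sag, sigma_e_sag)
    return rng.gauss(mean, math.sqrt(variance))


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    mean, variance = posterior_params(100.0, 70.0, 15.0, 5.0)
    print(mean, variance)
    print(draw_true_strength(100.0, 70.0, 15.0, 5.0, random.Random(1)))
```

The mean pulls the observed rating toward μSag by the fraction σESag²/σSag², so a noisier rating system implies stronger shrinkage; when σESag = 0 the observed strength is returned unchanged with zero variance.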

Appendix 4:

Rematch data, 1997-2019

This appendix lists the 63 instances from 1997 to 2019 in which two teams played each other twice in a season. As in Curry and Sokol (2016), we use these data to estimate the variance of randomness σR² in a college football game.

Table 15

Rematch data, 1997-2019. Types of games include regular season (“Reg”), conference championship game (“Conf”), and postseason bowl game (“Bowl”).

| Season | Teams | Date | Type | Home team | Line | Result |
| --- | --- | --- | --- | --- | --- | --- |
| 1997 | LSU, Notre Dame | 11/15/1997 | Reg | LSU | Notre Dame by 11 | Notre Dame by 18 |
| | | 12/28/1997 | Bowl | neutral | LSU by 7 | LSU by 18 |
| 1999 | Alabama, Florida | 10/02/1999 | Reg | Florida | Florida by 16 | Alabama by 1 |
| | | 12/04/1999 | Conf | neutral | Florida by 7.5 | Alabama by 27 |
| 1999 | Nebraska, Texas | 10/23/1999 | Reg | Texas | Nebraska by 16.5 | Texas by 4 |
| | | 12/04/1999 | Conf | neutral | Nebraska by 9.5 | Nebraska by 16 |
| 1999 | Marshall, W Michigan | 11/13/1999 | Reg | W Michigan | Marshall by 12.5 | Marshall by 14 |
| | | 12/03/1999 | Conf | Marshall | Marshall by 20.5 | Marshall by 4 |
| 2000 | Marshall, W Michigan | 10/05/2000 | Reg | Marshall | Marshall by 7 | W Michigan by 20 |
| | | 12/02/2000 | Conf | Marshall | W Michigan by 6 | Marshall by 5 |
| 2000 | Auburn, Florida | 10/14/2000 | Reg | Florida | Florida by 9.5 | Florida by 31 |
| | | 12/02/2000 | Conf | neutral | Florida by 9.5 | Florida by 22 |
| 2000 | Kansas St, Oklahoma | 10/14/2000 | Reg | Kansas St | Kansas St by 9.5 | Oklahoma by 10 |
| | | 12/02/2000 | Conf | neutral | Oklahoma by 2 | Oklahoma by 3 |
| 2001 | LSU, Tennessee | 09/29/2001 | Reg | Tennessee | Tennessee by 8 | Tennessee by 8 |
| | | 12/08/2001 | Conf | neutral | LSU by 7 | LSU by 11 |
| 2001 | Colorado, Texas | 10/20/2001 | Reg | Texas | Texas by 12 | Texas by 34 |
| | | 12/01/2001 | Conf | neutral | Texas by 9 | Colorado by 2 |
| 2002 | Colorado, Oklahoma | 11/02/2002 | Reg | Oklahoma | Oklahoma by 13.5 | Oklahoma by 16 |
| | | 12/07/2002 | Conf | neutral | Colorado by 7.5 | Oklahoma by 22 |
| 2003 | Georgia, LSU | 09/20/2003 | Reg | LSU | LSU by 1.5 | LSU by 7 |
| | | 12/06/2003 | Conf | neutral | LSU by 3 | LSU by 21 |
| 2003 | Florida St, Miami (FL) | 10/11/2003 | Reg | Florida St | Florida St by 7 | Miami (FL) by 8 |
| | | 01/01/2004 | Bowl | neutral | Florida St by 1.5 | Miami (FL) by 2 |
| 2003 | Bowling Green, Miami (OH) | 11/04/2003 | Reg | Miami (OH) | Miami (OH) by 7 | Miami (OH) by 23 |
| | | 12/04/2003 | Conf | Bowling Green | Miami (OH) by 6.5 | Miami (OH) by 22 |
| 2004 | Auburn, Tennessee | 10/02/2004 | Reg | Tennessee | Tennessee by 1.5 | Auburn by 24 |
| | | 12/04/2004 | Conf | neutral | Auburn by 14.5 | Auburn by 10 |
| 2004 | Miami (OH), Toledo | 11/02/2004 | Reg | Miami (OH) | Miami (OH) by 6 | Miami (OH) by 7 |
| | | 12/02/2004 | Conf | neutral | Toledo by 1 | Toledo by 8 |
| 2005 | Akron, N Illinois | 09/24/2005 | Reg | Akron | N Illinois by 8 | Akron by 6 |
| | | 12/01/2005 | Conf | neutral | N Illinois by 13 | Akron by 1 |
| 2005 | Colorado, Texas | 10/15/2005 | Reg | Texas | Texas by 15.5 | Texas by 25 |
| | | 12/03/2005 | Conf | neutral | Texas by 25 | Texas by 67 |
| 2006 | Houston, Southern Miss | 10/14/2006 | Reg | Southern Miss | Southern Miss by 1.5 | Southern Miss by 4 |
| | | 12/01/2006 | Conf | Houston | Houston by 5 | Houston by 14 |
| 2007 | BYU, UCLA | 09/08/2007 | Reg | UCLA | UCLA by 8 | UCLA by 10 |
| | | 12/22/2007 | Bowl | neutral | BYU by 6.5 | BYU by 1 |
| 2007 | Central Michigan, Purdue | 09/15/2007 | Reg | Purdue | Central Michigan by 21.5 | Purdue by 23 |
| | | 12/26/2007 | Bowl | neutral | Central Michigan by 8 | Purdue by 3 |
| 2007 | Missouri, Oklahoma | 10/13/2007 | Reg | Oklahoma | Oklahoma by 13 | Oklahoma by 10 |
| | | 12/01/2007 | Conf | neutral | Oklahoma by 3 | Oklahoma by 21 |
| 2007 | Tulsa, UCF | 10/20/2007 | Reg | UCF | UCF by 3 | UCF by 21 |
| | | 12/01/2007 | Conf | UCF | UCF by 8 | UCF by 19 |
| 2007 | Boston College, Virginia Tech | 10/25/2007 | Reg | Virginia Tech | Virginia Tech by 3 | Boston College by 4 |
| | | 12/01/2007 | Conf | neutral | Virginia Tech by 4.5 | Virginia Tech by 14 |
| 2008 | Air Force, Houston | 09/13/2008 | Reg | Houston | Houston by 2.5 | Air Force by 3 |
| | | 12/31/2008 | Bowl | neutral | Houston by 5.5 | Houston by 6 |
| 2008 | Navy, Wake Forest | 09/27/2008 | Reg | Wake Forest | Wake Forest by 17 | Navy by 7 |
| | | 12/20/2008 | Bowl | neutral | Wake Forest by 3 | Wake Forest by 10 |
| 2008 | Boston College, Virginia Tech | 10/18/2008 | Reg | Boston College | Boston College by 3 | Boston College by 5 |
| | | 12/06/2008 | Conf | neutral | Boston College by 1 | Virginia Tech by 18 |
| 2009 | Clemson, Georgia Tech | 09/10/2009 | Reg | Georgia Tech | Georgia Tech by 5 | Georgia Tech by 3 |
| | | 12/05/2009 | Conf | neutral | Even | Georgia Tech by 5 |
| 2010 | Nebraska, Washington | 09/18/2010 | Reg | Washington | Nebraska by 3 | Nebraska by 35 |
| | | 12/30/2010 | Bowl | neutral | Nebraska by 13.5 | Washington by 12 |
| 2010 | Auburn, South Carolina | 09/25/2010 | Reg | Auburn | Auburn by 3 | Auburn by 8 |
| | | 12/04/2010 | Conf | neutral | Auburn by 3.5 | Auburn by 39 |
| 2011 | Clemson, Virginia Tech | 10/01/2011 | Reg | Virginia Tech | Virginia Tech by 7.5 | Clemson by 20 |
| | | 12/03/2011 | Conf | neutral | Virginia Tech by 7 | Clemson by 28 |
| 2011 | Michigan St, Wisconsin | 10/22/2011 | Reg | Michigan St | Wisconsin by 7.5 | Michigan St by 6 |
| | | 12/03/2011 | Conf | neutral | Wisconsin by 9.5 | Wisconsin by 3 |
| 2011 | Alabama, LSU | 11/05/2011 | Reg | Alabama | Alabama by 5.5 | LSU by 3 |
| | | 01/09/2012 | Bowl | neutral | Alabama by 2.5 | Alabama by 21 |
| 2012 | Iowa St, Tulsa | 09/01/2012 | Reg | Iowa St | Tulsa by 1.5 | Iowa St by 15 |
| | | 12/31/2012 | Bowl | neutral | Tulsa by 1.5 | Tulsa by 14 |
| 2012 | Nebraska, Wisconsin | 09/29/2012 | Reg | Nebraska | Nebraska by 12 | Nebraska by 3 |
| | | 12/01/2012 | Conf | neutral | Nebraska by 3 | Wisconsin by 39 |
| 2012 | Tulsa, UCF | 11/17/2012 | Reg | Tulsa | Tulsa by 1 | Tulsa by 2 |
| | | 12/01/2012 | Conf | Tulsa | Tulsa by 3 | Tulsa by 6 |
| 2012 | Stanford, UCLA | 11/24/2012 | Reg | UCLA | Stanford by 3 | Stanford by 18 |
| | | 11/30/2012 | Conf | Stanford | Stanford by 9.5 | Stanford by 3 |
| 2013 | Arizona St, Stanford | 09/21/2013 | Reg | Stanford | Stanford by 7 | Stanford by 14 |
| | | 12/07/2013 | Conf | Arizona St | Arizona St by 3 | Stanford by 24 |
| 2014 | Arizona, Oregon | 10/02/2014 | Reg | Oregon | Oregon by 21.5 | Arizona by 7 |
| | | 12/05/2014 | Conf | neutral | Oregon by 14.5 | Oregon by 38 |
| 2014 | Fresno St, Boise St | 10/17/2014 | Reg | Boise St | Boise St by 18 | Boise St by 10 |
| | | 12/06/2014 | Conf | Boise St | Boise St by 24 | Boise St by 14 |
| 2015 | USC, Stanford | 09/19/2015 | Reg | USC | USC by 10 | Stanford by 10 |
| | | 12/05/2015 | Conf | neutral | Stanford by 4.5 | Stanford by 19 |
| 2016 | Western Kentucky, Louisiana Tech | 10/06/2016 | Reg | Louisiana Tech | Western Kentucky by 3 | Louisiana Tech by 3 |
| | | 12/03/2016 | Conf | Western Kentucky | Western Kentucky by 12 | Western Kentucky by 14 |
| 2016 | Army, North Texas | 10/22/2016 | Reg | Army | Army by 17.5 | North Texas by 17 |
| | | 12/27/2016 | Bowl | neutral | Army by 10.5 | Army by 7 |
| 2016 | Wyoming, San Diego State | 11/19/2016 | Reg | Wyoming | San Diego State by 9.5 | Wyoming by 1 |
| | | 12/03/2016 | Conf | Wyoming | San Diego State by 7 | San Diego State by 3 |
| 2017 | USC, Stanford | 09/09/2017 | Reg | USC | USC by 4 | USC by 18 |
| | | 12/01/2017 | Conf | neutral | USC by 3.5 | USC by 3 |
| 2017 | UCF, Memphis | 09/30/2017 | Reg | UCF | UCF by 5.5 | UCF by 27 |
| | | 12/02/2017 | Conf | UCF | UCF by 7 | UCF by 7 |
| 2017 | Florida Atlantic, North Texas | 10/21/2017 | Reg | Florida Atlantic | Florida Atlantic by 3.5 | Florida Atlantic by 38 |
| | | 12/02/2017 | Conf | Florida Atlantic | Florida Atlantic by 11 | Florida Atlantic by 24 |
| 2017 | Akron, Toledo | 10/21/2017 | Reg | Toledo | Toledo by 15.5 | Toledo by 27 |
| | | 12/02/2017 | Conf | neutral | Toledo by 20.5 | Toledo by 17 |
| 2017 | Oklahoma, TCU | 11/11/2017 | Reg | Oklahoma | Oklahoma by 6 | Oklahoma by 18 |
| | | 12/02/2017 | Conf | neutral | Oklahoma by 7.5 | Oklahoma by 24 |
| 2017 | Georgia, Auburn | 11/11/2017 | Reg | Auburn | Georgia by 2.5 | Auburn by 23 |
| | | 12/02/2017 | Conf | neutral | Georgia by 2 | Georgia by 21 |
| 2017 | Boise St, Fresno St | 11/25/2017 | Reg | Fresno St | Boise St by 6.5 | Fresno St by 11 |
| | | 12/02/2017 | Conf | Boise St | Boise St by 9.5 | Boise St by 3 |
| 2018 | Washington, Utah | 09/15/2018 | Reg | Washington | Washington by 4 | Washington by 14 |
| | | 11/30/2018 | Conf | neutral | Washington by 4.5 | Washington by 7 |
| 2018 | Oklahoma, Texas | 10/06/2018 | Reg | neutral | Oklahoma by 7 | Texas by 3 |
| | | 12/01/2018 | Conf | neutral | Oklahoma by 9.5 | Oklahoma by 12 |
| 2018 | Liberty, New Mexico State | 10/06/2018 | Reg | New Mexico State | Liberty by 9 | New Mexico State by 7 |
| | | 11/24/2018 | Reg | Liberty | Liberty by 7 | Liberty by 7 |
| 2018 | UCF, Memphis | 10/13/2018 | Reg | Memphis | UCF by 5 | UCF by 1 |
| | | 12/01/2018 | Conf | UCF | UCF by 1 | UCF by 15 |
| 2018 | Appalachian St, Louisiana | 10/20/2018 | Reg | Appalachian St | Appalachian St by 25 | Appalachian St by 10 |
| | | 12/01/2018 | Conf | Appalachian St | Appalachian St by 17.5 | Appalachian St by 11 |
| 2018 | Boise St, Fresno St | 11/09/2018 | Reg | Boise St | Fresno St by 2.5 | Boise St by 7 |
| | | 12/01/2018 | Conf | Boise St | Boise St by 1.5 | Fresno St by 3 |
| 2018 | Middle Tennessee, UAB | 11/24/2018 | Reg | Middle Tennessee | UAB by 3 | Middle Tennessee by 24 |
| | | 12/01/2018 | Conf | Middle Tennessee | Middle Tennessee by 1.5 | UAB by 2 |
| 2019 | Liberty, New Mexico State | 10/05/2019 | Reg | New Mexico State | Liberty by 7.5 | Liberty by 7 |
| | | 11/30/2019 | Reg | Liberty | Liberty by 15 | Liberty by 21 |
| 2019 | Appalachian St, Louisiana | 10/09/2019 | Reg | Louisiana | Louisiana by 2.5 | Appalachian St by 10 |
| | | 12/07/2019 | Conf | Appalachian St | Appalachian St by 6 | Appalachian St by 7 |
| 2019 | Boise St, Hawaii | 10/12/2019 | Reg | Boise St | Boise St by 12.5 | Boise St by 22 |
| | | 12/07/2019 | Conf | Boise St | Boise St by 14 | Boise St by 21 |
| 2019 | Ohio St, Wisconsin | 10/26/2019 | Reg | Ohio St | Ohio St by 14.5 | Ohio St by 31 |
| | | 12/07/2019 | Conf | neutral | Ohio St by 16.5 | Ohio St by 13 |
| 2019 | Oklahoma, Baylor | 11/16/2019 | Reg | Baylor | Oklahoma by 10.5 | Oklahoma by 3 |
| | | 12/07/2019 | Conf | neutral | Oklahoma by 9 | Oklahoma by 7 |
| 2019 | Cincinnati, Memphis | 11/29/2019 | Reg | Memphis | Memphis by 14 | Memphis by 10 |
| | | 12/07/2019 | Conf | Memphis | Memphis by 9 | Memphis by 5 |

References

1. Appleton, D.R. (1995), May the best man win?, The Statistician 44: 529–538.

2. Berry, S.M. (2003), A Statistician Reads the Sports Pages: College Football Rankings: The BCS and the CLT, Chance 16: 46–49.

3. Chen, H., Ham, S.H., & Lim, N. (2011), Designing multiperson tournaments with asymmetric contestants: an experimental study, Management Science 57: 864–883.

4. CollegeFootballPlayoff.com. 2021, 12-Team Playoff Proposed by College Football Playoff Working Group. https://collegefootballplayoff.com/news/2021/6/10/12-team-playoff-proposal.aspx.

5. Curry, S., & Sokol, J. 2016, Quantifying March’s Madness. Working paper.

6. David, H.A. 1988, The method of paired comparisons, 2nd ed. Chapman & Hall.

7. Dizdar, D. 2013, On the optimality of small research tournaments. Working Paper, University of Bonn, Institute of Economic Theory, http://dx.doi.org/10.2139/ssrn.2357096.

8. Fanson, P. 2020, Vegas Always Knows? A Mathematical Deep Dive. https://www.theonlycolors.com/2020/9/29/21492301/vegas-always-knows-a-mathematical-deep-dive, downloaded June 1, 2022.

9. Fullerton, R.L., & McAfee, R.P. (1999), Auctioning entry into tournaments, Journal of Political Economy 107: 573–605.

10. Gill, P.S. (2000), Late-Game Reversals in Professional Basketball, Football, and Hockey, The American Statistician 54: 94–99.

11. Glenn, W.A. (1960), A comparison of the effectiveness of tournaments, Biometrika 47: 253–262.

12. Glickman, M.E. (2008), Bayesian locally optimal design of knockout tournaments, Journal of Statistical Planning and Inference 138: 2117–2127.

13. Groh, C., Moldovanu, B., Sela, A., & Sunde, U. (2012), Optimal seedings in elimination tournaments, Economic Theory 49: 59–80.

14. Hochtl, W., Kerschbamer, R., Stracke, R., & Sunde, U. 2010, Optimal design of multi-stage elimination tournaments with heterogeneous agents: theory and experimental evidence. Working Paper.

15. Horen, J., & Riezman, R. (1985), Comparing draws for single elimination tournaments, Operations Research 33: 249–262.

16. Hwang, F.K. (1982), New concepts in seeding knockout tournaments, American Mathematical Monthly 89: 235–239.

17. Hennessy, J., & Glickman, M. (2016), Bayesian optimal design of fixed knockout tournament brackets, Journal of Quantitative Analysis in Sports 12: 1–15.

18. Marchand, E. (2002), On the comparison between standard and random knockout tournaments, The Statistician 51: 169–178.

19. National Collegiate Athletic Association. Football Bowl Subdivision records. http://fs.ncaa.org/Docs/stats/football_records/2014/FBS.pdf, last downloaded January 15, 2014.

20. OddsShark.com. NCAAF football odds & handicapping database. http://www.oddsshark.com/ncaaf/database, data downloaded August 29, 2021.

21. Paine, N. 2014, How to fix the NFL playoffs. http://www.slate.com/articles/sports/sports_nut/2014/01/nfl_playoffs_2014_an_insane_idea_to_ensure_that_the_best_teams_have_the.html, downloaded January 8, 2014.

22. Ryvkin, D. 2005, The predictive power of noisy elimination tournaments. Technical report, Center for Economic Research and Graduate Education – Economic Institute, Prague.

23. Sagarin, J. Jeff Sagarin computer ratings archive. http://usatoday30.usatoday.com/sports/sagarin-archive.htm, each year’s data downloaded August 29, 2021.

24. Scarf, P., & Bilbao, M. 2006, The optimal design of sporting contests. Working Paper 320/06, Salford Business School, University of Salford, Manchester, UK.

25. Scarf, P.A., & Shi, X. (2008), The importance of a match in a tournament, Computers and Operations Research 35: 2406–2418.

26. Schwenk, A.J. (2000), What is the correct way to seed a knockout tournament?, The American Mathematical Monthly 107: 140–150.

27. Searls, D.T. (1963), On the probability of winning with different tournament procedures, Journal of the American Statistical Association 58: 1064–1081.

28. Sheremeta, R.M., & Wu, S.Y. 2012, Testing canonical tournament theory: on the impact of risk, social preferences, and utility structure. Working Paper, Purdue University.

29. Sokol, J. March 26, 2010, In a 96-team field, upsets could be a thing of the past. New York Times, http://thequad.blogs.nytimes.com/2010/03/26/in-a-96-team-field-upsets-could-be-a-thing-of-the-past/.

30. ThePredictionTracker.com. Computer rating system prediction results for college football (NCAA IA). http://www.ThePredictionTracker.com/ncaaresults.php, each year’s data downloaded August 29, 2021.

31. Vu, T.D. 2010, Knockout tournament design: a computational approach. PhD diss., Department of Computer Science, Stanford University.

Notes

1 We note that Sagarin does not publicly archive week-by-week ratings, so we use the only available data: the final ratings published after the postseason.

2 This is in contrast to not only human rankings like polls, but also algorithmic ratings that use secondary statistics such as yards gained, game progress, etc. A team might lose a game despite appearing to play better and having better secondary statistics; a relevant example might be when LSU beat Alabama in 2011 by a score of 9-6 despite having worse secondary statistics, in part due to Alabama missing four field goals. At the end of the season, LSU and Alabama were ranked as the top two teams in the BCS, and Alabama won the rematch, and the national championship, by a score of 21-0.

3 The Las Vegas line also incorporates factors such as how teams’ specific strengths and weaknesses match up with each other, so “error in team strength estimation” is really an oversimplification. We retain the usage of the term for simplicity, but for the Las Vegas line we mean it to include all factors related to estimating attributes of a team, and excluding in-game randomness.

4 Model 1 of Curry and Sokol (2016) gives a value of 194, and Models 2 and 3 give nearly identical results, each estimating the variance attributable to in-game randomness to be 167.