
J Stat Phys (2013) 151:458–474

DOI 10.1007/s10955-012-0648-x

Randomness in Competitions

E. Ben-Naim · N.W. Hengartner · S. Redner · F. Vazquez

Received: 20 September 2012 / Accepted: 14 November 2012 / Published online: 27 November 2012

© Springer Science+Business Media New York 2012

Abstract We study the effects of randomness on competitions based on an elementary random process in which there is a finite probability that a weaker team upsets a stronger team. We apply this model to sports leagues and sports tournaments, and compare the theoretical results with empirical data. Our model shows that single-elimination tournaments are efficient but unfair: the number of games is proportional to the number of teams N, but the probability that the weakest team wins decays only algebraically with N. In contrast, leagues, where every team plays every other team, are fair but inefficient: the top √N of teams remain in contention for the championship, while the probability that the weakest team becomes champion is exponentially small. We also propose a gradual elimination schedule that consists of a preliminary round and a championship round. Initially, teams play a small number of preliminary games, and subsequently, a few teams qualify for the championship round. This algorithm is fair and efficient: the best team wins with high probability and the number of games scales as N^{9/5}, whereas traditional leagues require N^3 games to fairly determine a champion.

Keywords Competitions · Social dynamics · Kinetic theory · Scaling laws · Algorithms

E. Ben-Naim (✉)
Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
e-mail: ebn@lanl.gov

N.W. Hengartner
Computing and Computer Science Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

S. Redner
Department of Physics, Boston University, Boston, MA 02215, USA

F. Vazquez
Max-Planck-Institut für Physik Komplexer Systeme, Nöthnitzer Str. 38, 01187 Dresden, Germany


1 Introduction

Competitions play an important role in society [1–4], economics [5], and politics. Furthermore, competitions underlie biological evolution and are replete in ecology, where species compete for food and resources [6]. Sports are an ideal laboratory for studying competitions [7–10]. In contrast with evolution, where records are incomplete, the results of sports events are accurate, complete, and widely available [11, 12].

Randomness is inherent to competitions. The outcome of a single match is subject to a multitude of factors including game location, weather, injuries, etc., in addition to the inherent difference in the strengths of the opponents. Just as the outcome of a single game is not predictable, the outcome of a long series of games is also not completely certain. In this paper, we review¹ a series of our studies that focus on the role of randomness in competitions [13–17]. Among the questions we ask are: What is the likelihood that the strongest team wins a championship? What is the likelihood that the weakest team wins? How efficient are the common competition formats, and how "accurate" is their outcome?

We introduce an elementary model where a weaker team wins against a stronger team with a fixed upset probability q, and use this elementary random process to analyze a series of competitions [13]. To help calibrate our model, we first determine the favorite and the underdog from the win–loss record over many years of sports competition from several major sports. We find that the distribution of win percentage approaches a universal scaling function when the number of games and the number of teams are both large. We then simulate a realistic number of games and a realistic number of teams, and demonstrate that our basic competition process successfully captures the empirical distribution of win percentage in professional baseball [14, 15]. Moreover, we study the empirical upset frequency and observe that this quantity differentiates professional sports leagues, and furthermore, illuminates the evolution of competitive balance.

Next, we apply the competition model to single-elimination tournaments where, in each match, the winner advances to the next round and the loser is eliminated [16]. We use the very same competition rules where the underdog wins with a fixed probability. Here, we introduce the notion of innate strength and assume that, entering the competition, the teams are ranked. We find that the typical rank of the winner decays algebraically with the size of the tournament. Moreover, the rank distribution for the winner has a power-law tail. Hence, larger tournaments do produce stronger winners, but nevertheless, even the weakest team may have a realistic chance of winning the entire tournament. Therefore, tournaments are efficient but unfair.

Further, we study the league format, where every team plays every other team [17]. We note that the number of wins for each team performs a biased random walk. Using heuristic scaling arguments, we establish that the top √N teams have a realistic chance of becoming champion, while it is highly unlikely that the weakest teams can win the championship. In addition, the total number of games required to guarantee that the best team wins is cubic in N. In this sense, leagues are fair but inefficient.

Finally, we propose a gradual elimination algorithm as an efficient way to determine the champion. This hybrid algorithm utilizes a preliminary round where the teams play a small number of games and a small fraction of the teams advance to the next round. The number of games in the preliminary round is large enough to ensure the stronger teams advance. In the championship round, each team plays every other team ample times to guarantee that the strongest team always wins. This algorithm yields a significant improvement in efficiency compared to a standard league schedule.

¹ Most of the results reported in this mini-review appear in [13–17]. For clarity, we present a number of additional plots, including Figs. 1, 3, 5, and 8. The scaling arguments in Sect. 4 are equivalent to those presented in [17].

The rest of this paper is organized as follows. In Sect. 2, the basic competition model is introduced and its predictions are compared with empirical standings data. The notion of innate team strength is incorporated in Sect. 3, where the random competition process is used to model single-elimination tournaments. Scaling laws for the league format are derived in Sect. 4. Scaling concepts are further used to analyze the gradual elimination algorithm proposed in Sect. 5. Finally, basic features of our results are summarized in Sect. 6.

2 The Competition Model

In our competition model, N teams participate in a series of games. Two teams compete head to head and, at the end of each match, one team is declared the winner and the other the loser. There are no ties.

To study the effect of randomness on competitions, we consider the scenario where there is a fixed upset probability q that a weaker team upsets a stronger team [2, 13]. This probability has the bounds 0 ≤ q ≤ 1/2. The lower bound corresponds to predictable games where the stronger team always wins, and the upper bound corresponds to random games. We consider the simplest case where the upset probability q does not change with time and is furthermore independent of the relative strengths of the competitors.

In each game, we determine the stronger and the weaker team from the current win–loss records. Let us consider a game between a team with k wins and a team with j wins. The competition outcome is stochastic: if k > j,

(k, j) → (k+1, j)  with probability p,
         (k, j+1)  with probability q,    (1)

where p + q = 1. If k = j, the winner is chosen randomly. Initially, all teams have zero wins and zero losses.
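As a concrete illustration, the stochastic rule (1) is straightforward to simulate. The following minimal Python sketch (the function names are ours, not from the paper) plays one season of the basic competition model with random pairings, as described in the text:

```python
import random

def play_game(wins, i, j, q):
    """One match of rule (1): the team with more wins is the favorite
    and wins with probability p = 1 - q; the underdog wins with
    probability q. Equal records are broken by a coin flip."""
    if wins[i] == wins[j]:
        winner = random.choice((i, j))
    elif random.random() < q:
        winner = i if wins[i] < wins[j] else j   # upset: underdog wins
    else:
        winner = i if wins[i] > wins[j] else j   # favorite wins
    wins[winner] += 1

def season(n_teams, games_per_team, q, seed=None):
    """Simulate a season: random pairings, one win awarded per game,
    so each team plays games_per_team games on average."""
    random.seed(seed)
    wins = [0] * n_teams
    for _ in range(n_teams * games_per_team // 2):
        i, j = random.sample(range(n_teams), 2)
        play_game(wins, i, j, q)
    return wins
```

For q close to 1/2 the final records cluster tightly, while for small q they spread out toward the limiting distribution derived below.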

We use a kinetic framework to analyze the outcome of this random process [18], taking advantage of the fact that the number of games is a measure of time. We randomly choose the two competing teams and update the time by t → t + Δt, with Δt = 1/(2N), after each competition. With this normalization, each team participates in one competition per unit time.

Let f_k(t) be the fraction of teams with k wins at time t. This probability distribution must be normalized, Σ_k f_k = 1. In the limit N → ∞, this distribution evolves according to

df_k/dt = p (f_{k−1} F_{k−1} − f_k F_k) + q (f_{k−1} G_{k−1} − f_k G_k) + (1/2)(f²_{k−1} − f²_k),    (2)

for k ≥ 0. Here we also introduced two cumulative distribution functions: F_k = Σ_{j=0}^{k−1} f_j is the fraction of teams with fewer than k wins, and G_k = Σ_{j=k+1}^{∞} f_j is the fraction of teams with more than k wins. Of course, F_k + G_{k−1} = 1. The first two terms on the right-hand side of (2) account for games in which the stronger team wins, and the next two terms correspond to matches where the weaker team wins. The last two terms account for games between teams of equal strength (the numerical prefactor is combinatorial). Accounting for the boundary condition f_{−1} ≡ 0 and summing the rate equations (2), we readily verify that the normalization Σ_k f_k = 1 is preserved. The initial conditions are f_k(0) = δ_{k,0}.


In contrast to f_k, the cumulative distribution functions obey closed evolution equations. In particular, the quantity F_k evolves according to [13]

dF_k/dt = q (F_{k−1} − F_k) + (1/2 − q)(F²_{k−1} − F²_k),    (3)

which may be obtained by summing (2). The boundary conditions are F_0 = 0 and F_∞ = 1, and the initial condition is F_k(0) = 1 for k > 0. We note that the average number of wins, ⟨k⟩ = t/2, where ⟨k⟩ = Σ_k k f_k, follows from the fact that each team participates in one competition per unit time and that one win is awarded in each game. As ⟨k⟩ = Σ_k k (F_{k+1} − F_k), we can verify that d⟨k⟩/dt = 1/2 by summing the rate equations (3).

We first discuss the asymptotic behavior when the number of games is very large. In the limit t → ∞, we use the continuum approach and replace the difference equations (3) with the partial differential equation [19, 20]

∂F/∂t + [q + (1 − 2q)F] ∂F/∂k = 0.    (4)

According to our model, the weakest team wins at least a fraction q of its games, on average, and similarly, the strongest team wins no more than a fraction p of its games. Hence, the number of wins is proportional to time, k ∼ t. We thus seek the scaling solution

F_k(t) ≃ Φ(k/t).    (5)

Here and throughout this paper, the quantity Φ(x) is the scaled cumulative distribution of win percentage, that is, the fraction of teams that win less than a fraction x of the games played. The boundary conditions are Φ(0) = 0 and Φ(∞) = 1.

We now substitute the scaling form (5) into (4), and find that the scaling function satisfies Φ′[(x − q) − (1 − 2q)Φ] = 0, where the prime denotes the derivative with respect to x. There are two solutions: Φ = constant and the linear function Φ = (x − q)/(1 − 2q). Therefore, the distribution of win percentages is piecewise linear,

Φ(x) = { 0,                 0 ≤ x ≤ q,
         (x − q)/(p − q),   q ≤ x ≤ p,    (6)
         1,                 p ≤ x.

As expected, there are no teams with win percentage less than the upset probability q, and there are no teams with win percentage greater than the complementary probability p. Furthermore, one can verify that ⟨x⟩ = 1/2. The linear behavior in (6) indicates that the actual distribution of win percentage becomes uniform, Φ′ = 1/(p − q) for q < x < p, when the number of games is very large.

As shown in Fig. 1, direct numerical integration of the rate equations (3) confirms the scaling behavior (5). Moreover, as the number of games increases, the function Φ(x) approaches the piecewise-linear function given by Eq. (6). However, there are diffusive boundary layers near x = q and x = p, whose width decreases as t^{−1/2} in the long-time limit [19].

Generally, the win percentage is a convenient measure of team strength. For example, Major League Baseball (MLB) in the United States, where teams play ≈160 games during the regular season, uses win percentage to rank teams. The fraction of games won is preferred over the number of wins because throughout the season there are small variations between the number of games played by the various teams in the league.


Fig. 1 The cumulative distribution Φ(x) versus win percentage x for q = 1/4 at times t = 100 and t = 500. Also shown for reference is the limiting behavior (6)

The piecewise-linear scaling function in (6) holds in the asymptotic limits N → ∞ and t → ∞. To apply the competition model (1), we must use a realistic number of games and a realistic number of teams. To test whether the competition model faithfully describes the win percentage of actual sports leagues, we compared the results of Monte Carlo simulations with historical data for a variety of sports leagues [14, 15]. In this paper, we give one representative example: Major League Baseball.

In our simulations, there are N teams, each participating in exactly t games throughout the season. In each match, two teams are selected at random, and the outcome of the competition follows the stochastic rule (1): with the upset probability q, the team with the lower win percentage is victorious, but otherwise, the team with the higher win percentage wins. At the start of the simulated season, all teams have an identical record. We treated the upset frequency as a free parameter and found that the value q_model = 0.41 best describes the historical data for MLB (N = 26 and t = 162). As shown in Fig. 2, the competition model faithfully captures the empirical distribution of win percentages at the end of the season. The latter distribution is calculated from all season-end standings over the past century (1901–2005).

In addition, we directly measured the actual upset frequency q_data from the outcome of all ≈163,000 games played over the past century. To calculate the upset frequency, we chronologically ordered all games and recreated the standings for any given day. Then we counted the number of games in which the winner was lower in the standings at the time of the game. Game location and the margin of victory were ignored. For MLB, we find the value q_data = 0.44, only slightly higher than the model estimate q_model = 0.41.
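This bookkeeping is easy to reproduce. The sketch below is a minimal Python illustration (names are ours); for simplicity it ranks the standings by raw win totals rather than win percentage, and games between teams with equal records are not counted either way:

```python
def upset_frequency(games):
    """Estimate the upset frequency from a chronological list of
    (winner, loser) games. Standings are recreated game by game;
    a game is an upset when the winner had fewer wins than the
    loser at game time. Equal records are skipped."""
    wins = {}
    upsets = decided = 0
    for winner, loser in games:
        w, l = wins.get(winner, 0), wins.get(loser, 0)
        if w != l:
            decided += 1
            if w < l:
                upsets += 1
        wins[winner] = w + 1       # one win awarded per game
        wins[loser] = l            # loser's record is unchanged
    return upsets / decided if decided else 0.0
```

Applied to a full chronological game log, this is essentially the procedure that yields the q_data values quoted above.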

The standard deviation in win percentage, σ, defined by σ² = ⟨x²⟩ − ⟨x⟩², is commonly used to quantify the parity of a sports league [21, 22]. For example, in baseball, where the win percentage typically varies between 0.400 and 0.600, the historical standard deviation is σ = 0.084. From the cumulative distribution (6), it straightforwardly follows that the standard deviation varies linearly with the upset probability,

σ = (1/2 − q)/√3.    (7)
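Equation (7) is simply the standard deviation of a density that is uniform on the interval [q, p]; the following sketch (with our own helper names) checks the closed form against a direct numerical evaluation:

```python
import math

def sigma_formula(q):
    """Standard deviation of win percentage in the long-time limit,
    Eq. (7): sigma = (1/2 - q) / sqrt(3)."""
    return (0.5 - q) / math.sqrt(3)

def sigma_numeric(q, n=20000):
    """Standard deviation of a density uniform on [q, 1 - q],
    evaluated on a midpoint grid of n cells."""
    p = 1.0 - q
    xs = [q + (p - q) * (k + 0.5) / n for k in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.sqrt(var)
```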

There is an obvious relationship between the predictability of individual games and the

competitive balance of a league: the more random the outcome of an individual game, the

higher the degree of parity between teams in the league.

Fig. 2 The cumulative distribution Φ(x) versus win percentage x for: (i) Monte Carlo simulations of the competition process (1) with q_model = 0.41, and (ii) season-end standings for Major League Baseball (MLB) over the past century (1901–2005)

The standard deviation is a convenient quantity because it requires only year-end standings, which consist of only N data points per season. The upset frequency, on the other hand, requires the outcome of each game, and therefore involves a much larger number of data points, Nt/2 per season. Yet, as a measure of competitive balance, the upset frequency has an advantage [14, 15]. As seen in Fig. 3, the quantity σ consists of two contributions: one due to the intrinsic nature of the game and one due to the finite length of the season. For example, the large standard deviation σ = 0.21 in the National Football League (NFL) is in large part due to the extremely short season, t = 16. Therefore, the upset frequency, which is decoupled from the length of the season, provides a more accurate measure of competitive balance [23–27].

The evolution of the upset frequency over time is truly fascinating (Fig. 4). Although q varies over a narrow range, this quantity can differentiate the four sports leagues. The historical data show that MLB has consistently had the least predictable games, while NBA and NFL games have been the most predictable. The trends in q for these sports leagues are even more interesting. Certain sports leagues (MLB and, to a larger extent, the NFL) managed to increase competitiveness by changing competition formats, increasing the number of teams, having unbalanced schedules where stronger teams play more challenging opponents, or using a draft where the weakest team gets first pick of the most promising upcoming talent.

In spite of the fact that the NHL and NBA implemented some of these same measures to increase competitiveness, there are no clear long-term trends in the evolution of the upset probability in these two leagues. Another plausible interpretation of Fig. 4 is that the sports leagues are striving to achieve an optimal upset frequency of q ≈ 0.4. One may even speculate that the various sports leagues compete against each other to attract public interest, and that making the games less predictable, and hence more interesting to follow, is a key objective in this evolutionary-like process [6, 29, 30]. In any event, the upset frequency is a natural and transparent measure of the evolution of competitive balance in sports leagues.

The random process (1) involves only a single parameter, q. The model does not take into account many aspects of real competitions, including the game score, the game location, the relative team strength, and the fact that in many sports leagues the schedule is unbalanced, as teams in the same geographical region may face each other more often. Nevertheless, with an appropriate implementation, the competition model specified in Eq. (1) captures basic characteristics of real sports leagues. In particular, the model can be used to estimate the distribution of team win percentages as well as the upset frequency.


Fig. 3 The standard deviation σ as a function of time t. Shown are results of numerical integration of the rate equations (2) with q = 1/4. Also shown for reference is the limiting value σ_∞ = 1/(4√3)

3 Single Elimination Tournaments

Thus far, our approach did not include the notion of innate team strength. Randomness alone controlled which team reaches the top of the standings and which team reaches the bottom. Indeed, the probability that a given team has the best record at the end of the season equals 1/N. Furthermore, we have used the cumulative win–loss record to define team strength. However, this definition cannot be used to describe tournaments, where the number of games is small.

We now focus on single-elimination tournaments, where the winner of a game advances to the next round of play while the loser is eliminated [16, 31]. A single-elimination tournament is the most efficient competition format: a tournament with N = 2^r teams requires only N − 1 games through r rounds of play to crown a champion. In the first round, there are N teams and the N/2 winners advance to the next round. Similarly, the second round produces N/4 winners. In general, the number of competitors is cut in half at each round:

N → N/2 → N/4 → ··· → 2 → 1.    (8)

In many tournaments, for example, the NCAA college basketball tournament in the United States or tennis championships, the competitors are ranked according to some predetermined measure of their strength. Thus, we introduce the notion of rank into our modeling framework. Let x_i be the rank of the i-th team, with

x_1 < x_2 < x_3 < ··· < x_N.    (9)

In our definition, a team with lower rank is stronger. Rank measures innate strength, and hence, it does not change with time. Since the ranking is strict, we use the uniform ranking scheme x_i = i/N without loss of generality.

Again, we assume that there is a fixed probability q that the underdog wins the game, so that the outcome of each match is stochastic. When a team with rank x_1 faces a team with rank x_2, we have

(x_1, x_2) → x_1  with probability p,
             x_2  with probability q,    (10)

when x_1 < x_2. The important difference with (1) is that the losing team is now eliminated.


Fig. 4 Evolution of the upset frequency q with time. Shown is data [28] for: (i) Major League Baseball (MLB), (ii) the National Hockey League (NHL), (iii) the National Basketball Association (NBA), and (iv) the National Football League (NFL). The quantity q is the cumulative upset frequency for all games played in the league up to the given year. In football, a tie counts as one half of a win

Let w_1(x) be the distribution of rank for all competitors. This quantity is normalized, ∫_0^∞ dx w_1(x) = 1. In a two-team tournament, the rank distribution of the winner, w_2(x), is given by

w_2(x) = 2p w_1(x)[1 − W_1(x)] + 2q w_1(x) W_1(x),    (11)

where W_1(x) = ∫_0^x dy w_1(y) is the cumulative distribution of rank. The structure of this equation resembles that of (2), with the first term corresponding to games where the favorite advances, and the second term to games where the underdog advances. Mathematically, there is a basic difference with Eq. (2) in that Eq. (11) does not contain loss terms. Again, ties are not allowed to occur. By integrating (11), we obtain the closed equation W_2(x) = 2p W_1(x) + (1 − 2p)[W_1(x)]².

W2N(x) =2pWN(x) +(1−2p)WN(x)2.(12)

Here, WN(x) =x

0dywN(y),andwN(x) is the rank distribution for the winner of an N-

team tournament. The boundary conditions are WN(0)=0andWN(∞)=1. The prefactor 2

arises because there are two ways to choose the winner. The quadratic nature of Eq. (12)re-

ﬂects that two teams compete in each match (competitions with three teams are described

by cubic equations [32–34]). Starting with W1(x) =xthat corresponds to uniform ranking,

w1(x) =1, we can follow how the distribution of rank evolves by iterating the recursion

equation (12). As shown in Fig. 5, the rank of the winner decreases as the size of the tour-

nament increases. Hence, larger tournaments produce stronger winners.
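The iteration itself is a one-liner on a grid. The sketch below (function and parameter names are our own) iterates the recursion (12) starting from the uniform ranking W_1(x) = x:

```python
def iterate_winner_cdf(q, rounds, grid=256):
    """Iterate W_{2N}(x) = 2p W_N(x) + (1 - 2p) W_N(x)^2 on a uniform
    grid of x-values in [0, 1], starting from W_1(x) = x (uniform
    ranking). After `rounds` iterations, W is the cumulative rank
    distribution of the winner of a 2^rounds-team tournament."""
    p = 1 - q
    W = [k / grid for k in range(grid + 1)]
    for _ in range(rounds):
        W = [2 * p * w + (1 - 2 * p) * w * w for w in W]
    return W
```

For q < 1/2, each iteration pushes weight toward small rank, reproducing the trend in Fig. 5 that larger tournaments produce stronger winners.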

By substituting W_1(x) = x into Eq. (12), we find W_2(x) ≃ (2p)x for small x and, in general, W_N(x) ≃ (2p)^r x. This behavior suggests the scaling form

W_N(x) ≃ Ψ(x/x*),    (13)

where the scaling factor x* is the typical rank of the winner. This quantity decays algebraically with the size of the tournament,

x* = N^{−β},  β = ln(2p)/ln 2.    (14)

When games are perfectly random (upset probability q = 1/2), the typical rank of the winner becomes independent of the number of teams, β(q = 1/2) = 0. When the games are highly predictable, the top teams tend to win the tournament, β(0) = 1. Again, the scaling behavior (14) shows that larger tournaments tend to produce stronger champions.

Fig. 5 The cumulative distribution of rank. The quantity W_N(x) is calculated by iterating Eq. (12) with q = 1/4

By substituting (13) into (12), we see that the scaling function Ψ(z) obeys the nonlocal and nonlinear equation

Ψ(2pz) = 2p Ψ(z) + (1 − 2p) Ψ²(z).    (15)

The boundary conditions are Ψ(0) = 0 and Ψ(∞) = 1. From Eq. (15), we deduce the asymptotic behaviors

Ψ(z) ≃ z  as z → 0,  Ψ(z) ≃ 1 − C z^γ  as z → ∞,    (16)

with the scaling exponent γ = ln(2q)/ln(2p); note that γ < 0 for q < 1/2, so the correction vanishes at large z. The large-z behavior is obtained by substituting Ψ(z) = 1 − U(z) into (15) and noting that, since U → 0 when z → ∞, the correction obeys the linear equation U(2pz) = 2q U(z).

The large-z behavior of the scaling function Ψ(z) gives the likelihood that a very weak team manages to win the entire tournament. The scaling behavior (13) is equivalent to w_N(x) ≃ (1/x*) ψ(x/x*) with ψ(z) = Ψ′(z). In the limit z → 0, the distribution approaches a constant, ψ(z) → 1. However, the tail of the rank distribution is algebraic,

ψ(z) ∼ z^{−α},  α = 1 − ln(2q)/ln(2p),    (17)

when z → ∞. The exponent α > 1 increases monotonically with p, and it diverges in the limit p → 1.²

Moreover, the probability that the weakest team wins the tournament, P_N = q^r, decays algebraically with the total number of teams, P_N = N^{ln q/ln 2}. In the following section, we discuss sports leagues and find that: (i) the rank distribution of the winner has an exponential tail, and (ii) the probability that the weakest team is crowned league champion is exponentially small.

The scaling behavior (13) indicates universal statistics when the size of the tournament is sufficiently large. Once rank is normalized by the typical rank, the resulting distribution does not depend on the tournament size. Further, the scaling law (14) and the power-law tail (17) reflect that tournaments can produce major upsets. With a relatively small number of upset wins, a "Cinderella" team can emerge, and for this reason, tournaments can be very exciting. Furthermore, tournaments are maximally efficient, as they require a minimal number of games to decide a champion.

² For deterministic competitions, q = 0, the scaling function is exponential, ψ(z) = e^{−z}.

Fig. 6 The cumulative distribution of rank for the NCAA college basketball tournament. Shown is the cumulative distribution W_16(x) versus the rank x for (i) NCAA tournament data (1979–2006), and (ii) iteration of Eq. (12)

Figure 6 shows that our theoretical model nicely describes empirical data [28] for the NCAA college basketball tournament in the United States [16]. In the current format, 64 teams participate in four sub-tournaments, each with N = 16 teams. The four winners of the sub-tournaments advance to the final four, which ultimately decides the champion. Prior to the tournament, a committee of experts ranks the teams from 1 to 16. We note that the game schedule is not random, and is designed such that the top teams advance if there are no upsets.

Consistent with our theoretical results, the NCAA tournament has been producing major upsets: the 11th-seeded team has advanced to the final four twice over the past 30 years. Moreover, only once did all four top-seeded teams advance simultaneously (2008). Our model estimates the probability of this event at 1/190, a figure that is of the same order of magnitude as the observed frequency 1/132.

We also mention that in producing the theoretical curve in Fig. 6, we used the upset frequency q_model = 0.18, whereas the actual game results yield q_data = 0.28. This larger discrepancy (compared with the MLB analysis above) is due to a number of factors, including the much smaller dataset (≈7000 games) and the non-random game schedule. Indeed, our Monte Carlo simulations which incorporate a realistic schedule give better estimates for the upset frequency [16].

4 Leagues

We now discuss the common competition format in which each team hosts every other team exactly once during the season. This format, first used in English soccer, has been adopted in many sports. In a league of size N, each team plays 2(N − 1) games and the total number of games equals N(N − 1). Given this large number of games, does the strongest team always win the championship?

To answer this question, we assume that each team has an innate strength and rank the teams according to strength. Without loss of generality, we use the uniform rank distribution w(x) = 1 and its cumulative counterpart W(x) = x, where 0 ≤ x ≤ 1. Moreover, we implicitly take the large-N limit. Consider a team with rank x. The probability v(x) that this team wins a game against a randomly-chosen opponent decreases linearly with rank,

v(x) = p − (2p − 1)x,    (18)

as follows from v(x) = p[1 − W(x)] + q W(x) [see also Eq. (11)]. Consistent with our competition rules (1) and (10), the probability v(x) satisfies q ≤ v ≤ p.

Since team strength does not change with time, the average number of wins V(x, t) for a team with rank x grows linearly with the number of games t,

V(x, t) = v(x) t.    (19)

Accordingly, the number of wins of a given team performs a biased random walk: after each game, the number of wins increases by one with probability v, and remains unchanged with the complementary probability 1 − v. Also, the uncertainty in the number of wins, ΔV, grows diffusively with t,

ΔV(x, t) ≃ √(Dt),    (20)

with diffusion coefficient D = v(1 − v) [18].

Let us assume that each team plays t games. If the number of games is sufficiently large, the best team has the most wins. However, at intermediate times, it is possible that a weaker team has the most wins. For a team with strength x* to still be in contention at time t, the difference between its expected number of wins and that of the top team should be comparable with the diffusive uncertainty,

V(0, t) − V(x*, t) ∼ ΔV(0, t).    (21)

We now substitute Eqs. (18)–(20) into this heuristic estimate and obtain the typical rank of the leader as a function of time,

x* ∼ 1/√t.    (22)

In obtaining this estimate, we tacitly ignored numerical prefactors, including, in particular, the dependence on q.

This crude estimate (22) shows that the best team does not always win the league championship. Since t ∼ N, we have

x* ∼ 1/√N.    (23)

Since rank is a normalized quantity, the top √N of the teams have a realistic chance of emerging with the best record at the end of the season. Thus randomness plays a crucial role in determining the champion: since the result of an individual game is subject to randomness, the outcome of a long series of games reflects this randomness.

We can also obtain the total number of games T needed for the best team to always emerge as the champion,

T ∼ N³.    (24)

This scaling behavior follows by replacing x* in (22) with 1/N, which corresponds to the best team. For the best team to win, each team must play every other team O(N) times!
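The heuristic above is easy to probe numerically. The sketch below (Python, with our own names; the schedule is idealized as full round-robin cycles and ties are broken at random) simulates a league in which a stronger team beats a weaker team with probability p = 1 − q, and returns the team with the best final record:

```python
import random

def league_champion(n, q, rounds, rng=random):
    """Round-robin league: in each of `rounds` cycles every pair
    (i, j) plays once; team i (ranked i, lower is stronger) beats
    team j > i with probability p = 1 - q. Returns the index of the
    team with the most wins, breaking ties at random."""
    p = 1 - q
    wins = [0] * n
    for _ in range(rounds):
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < p:
                    wins[i] += 1   # favorite wins
                else:
                    wins[j] += 1   # upset
    best = max(wins)
    return rng.choice([t for t, w in enumerate(wins) if w == best])
```

With q = 0 the strongest team always finishes on top; for q > 0, repeating the simulation shows that the number of cycles needed for team 0 to win reliably grows rapidly with N, consistent with T ∼ N³.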


Fig. 7 The total number of games T needed for the best team to emerge as champion in a league of size N. The simulation results represent an average over 10³ simulated sports leagues. Also shown for reference is the theoretical prediction

Alternatively, the number of games played by each team scales quadratically with the size of the league. Clearly, such a schedule is prohibitively long, and we conclude that the traditional schedule of playing each opponent with equal frequency is neither efficient nor does it guarantee that the best team becomes champion.

We confirmed the scaling law (24) numerically. In our Monte Carlo simulations, the teams are ranked from 1 to N at the start of the season. We implemented the traditional league format where every team plays every other team, and kept track of the leader, defined as the team with the best record. We then measured the last-passage time [35], that is, the time at which the best team takes the lead for good. We define the average of this fluctuating quantity as T [36, 37]. As shown in Fig. 7, the total number of games required is cubic.

Again, we expect that the probability distribution w(x, t) that a team with rank x has the best record after t games is characterized by the scale x* given in (22),

w(x, t) ≃ (1/x*) ϕ(x/x*).    (25)

Numerical results confirm this scaling behavior [17]. Since the number of wins performs a biased random walk, we expect that the distribution of the number of wins becomes normal in the long-time limit. Moreover, the scaling function in (25) has a Gaussian tail [17],

ϕ(z) ∼ exp(−const. × z²),    (26)

as z → ∞.

Using this scaling behavior, we can readily estimate the probability that the worst team becomes champion (in the standard league format). For the worst team, x ∼ 1, and the corresponding scaling variable in Eq. (25) is z ∼ √N. Hence, the Gaussian tail (26) shows that the probability P_N that the weakest team wins the league is exponentially small,

P_N ∼ exp(−const. × N).    (27)

In sharp contrast with tournaments, where this probability is algebraic, leagues do not produce upset champions. Leagues may not guarantee the absolute top team as champion, but nevertheless, they do produce worthy champions.

Fig. 8 Leagues versus tournaments. Shown is Pn, the probability that the nth-ranked team has the best record at the end of the season in the format of playing all opponents with equal frequency, and the probability that the nth-ranked team wins an N-team single-elimination tournament. The upset probability is q = 0.4 and N = 16.

To compare leagues and tournaments, we calculated the probability Pn that the nth-ranked team is champion for a realistic number of teams N = 16 and a realistic upset probability q = 0.4 (Fig. 8). For leagues, we calculated this probability from Monte Carlo simulations, and for tournaments, we used Eq. (12). Indeed, the top four teams fare better

in a league format while the rest of the teams are better off in a tournament. This behavior

is fully consistent with the above estimate that the top √Nteams have a realistic chance to

win the league.
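For the tournament side of this comparison, the bracket is equally simple to simulate. The sketch below is our own construction (it does not reproduce Eq. (12) analytically) for N = 16 and q = 0.4. Because the upset probability is rank-independent, the top seed wins each of its log₂N = 4 games with probability 1 − q, so P1 = 0.6⁴ ≈ 0.13, while the weakest team needs four straight upsets, P16 = 0.4⁴ ≈ 0.026.

```python
import random

def tournament_champion(N, q, rng):
    """One single-elimination tournament among teams 0..N-1 (0 = strongest);
    the weaker (higher-index) team wins any game with probability q."""
    field = list(range(N))
    rng.shuffle(field)  # random bracket seeding
    while len(field) > 1:
        field = [max(a, b) if rng.random() < q else min(a, b)
                 for a, b in zip(field[::2], field[1::2])]
    return field[0]

rng = random.Random(1)
trials = 20000
counts = [0] * 16
for _ in range(trials):
    counts[tournament_champion(16, 0.4, rng)] += 1
p_best, p_worst = counts[0] / trials, counts[15] / trials
# p_best should be near 0.6**4 ~ 0.130; p_worst near 0.4**4 ~ 0.026
```

The estimated Pn curve from such runs can be set against the league curve of Fig. 8.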

What is the probability Ptop that the top team ends the season with the best record in a re-

alistic sports league? To answer this question, we investigated the four major sports leagues

in the US: MLB, NHL, NFL, and NBA. We simulated a league with the actual number

of teams Nand the actual number of games t, using the empirical upset frequencies (see

Fig. 3). All of these sports leagues have a comparable number of teams, N ≈ 25. Surprisingly, we find almost identical probabilities for three of the sports leagues: (i) MLB, with the longest season and the most random games (t = 162, q = 0.44), has Ptop = 0.31; (ii) the NFL, with the shortest season but the most deterministic games (t = 16, q = 0.37), has Ptop = 0.30; and (iii) the NHL, with an intermediate-length season and intermediate randomness (t = 80, q = 0.41), has Ptop = 0.32. Standing out as an anomaly is the value Ptop = 0.45 for the NBA, which has a moderate-length season but less random games (t = 80 and q = 0.37).

This interesting result reinforces our previous comments about sports leagues competing

against each other for interest and our hypothesis that there are optimal randomness param-

eters. Having a powerhouse win every year does not serve the league well, but having the

strongest team ﬁnish with the best record once every three years may be optimal.

5 Gradual Elimination Algorithm

Our analysis demonstrates that single-elimination tournaments have optimal efﬁciency but

may produce weak champions, whereas leagues, which produce strong winners, are highly inefficient. Can we devise a competition “algorithm” that guarantees a strong champion

within a minimal number of games?

As an efﬁcient algorithm, we propose a hybrid schedule consisting of a preliminary round

and a championship round [17]. The preliminary round is designed to weed out a majority

of teams using a minimal number of games, while the championship round includes ample

games to guarantee the best team wins.

In the preliminary round, every team competes in t games. Whereas the league schedule has the structure of a complete graph, with every team playing every other team, the preliminary-round schedule has the structure of a regular random graph, with each team playing the same number of randomly-chosen opponents. Out of the N teams, the M teams with the largest number of wins in the preliminary round advance to the championship round. The number of games t is chosen such that the strongest team always qualifies. By the same heuristic argument (21) leading to (22), the top team ranks no lower than 1/√t after t games. We thus require

M/N ∼ 1/√t, (28)

and consequently, each team plays ∼ (N/M)² preliminary games. The championship round

uses a league format, with each of the M qualifying teams playing M games against every other team. Therefore, the total number of games, T, has two components,

T ∼ N³/M² + M³. (29)

In writing this estimate, we ignore numeric prefactors, as well as the dependence on the upset frequency q. The quantity T is minimal when the two terms in (29) are comparable [38]. Hence, the size of the championship round M1 and the total number of games T1 scale algebraically with N,

M1 ∼ N^(3/5), and T1 ∼ N^(9/5). (30)

Consequently, each team plays O(N^(4/5)) games in the preliminary round. Interestingly, the existence of a preliminary round significantly reduces the number of games from N³ to N^(9/5). Without sacrificing the quality of the champion, the hybrid schedule yields a huge improvement in efficiency!
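The optimum behind (30) can be checked directly by minimizing the two-term estimate (29) over integer M: stationarity gives M⁵ = (2/3)N³, hence M ∼ N^(3/5). A brute-force check:

```python
def optimal_round_size(N):
    """Integer M minimizing T(M) = N^3 / M^2 + M^3, the two terms of (29)."""
    return min(range(1, N + 1), key=lambda M: N**3 / M**2 + M**3)

m1 = optimal_round_size(1000)    # close to (2/3 * 1000**3) ** 0.2 ~ 58
m2 = optimal_round_size(10000)
ratio = m2 / m1                  # should approach 10 ** (3/5) ~ 3.98
```

Doubling back, the total T at the optimum then grows as N^(9/5), in line with (30).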

We can further improve the efﬁciency by using multiple elimination rounds. In this gen-

eralization, there are k−1 consecutive rounds of preliminary play culminating in the cham-

pionship round. The underlying graphical structure of the preliminary rounds is always a

regular random graph, while the championship round remains a complete graph. Each pre-

liminary round is designed to advance the top teams, and the number of games is sufﬁciently

large so that the top team advances with very high probability. When there are k rounds, we anticipate the scaling laws

Mk ∼ N^(νk), and Tk ∼ N^(μk), (31)

where Mk is the number of teams advancing out of the first round and Tk is the total number of games. Of course, when there are no preliminary rounds, ν0 = 1 and μ0 = 3. Following Eq. (31), the number of teams gradually declines in each round,

N → N^(νk) → N^(νk νk−1) → ··· → N^(νk νk−1 ··· ν1) → 1. (32)

According to the first term in (29), the number of games in the first round scales as N³/Mk² ∼ N^(3−2νk), and therefore, the total number of games obeys the recursion

Tk ∼ N^(3−2νk) + Tk−1 N^(νk). (33)

Indeed, if we replace M1 with N^(ν1) in Eq. (29), we recognize the recursion (33). The second term scales as N^(νk μk−1) and becomes comparable to the first when 3 − 2νk = νk μk−1.

Hence, the scaling exponents satisfy the recursion relations

νk = 3/(2 + μk−1), and μk = μk−1 νk. (34)


Table 1 The exponents νk and μk in Eq. (31) for k ≤ 4

k     0    1      2        3        4          ∞
νk    0    3/5    15/19    57/65    195/211    1
μk    3    9/5    27/19    81/65    243/211    1

Using ν0 = 1 and μ0 = 3, we recover ν1 = 3/5 and μ1 = 9/5, in agreement with (30). The general solution of (34) is [17]

νk = [1 − (2/3)^k] / [1 − (2/3)^(k+1)],  μk = 1 / [1 − (2/3)^(k+1)]. (35)

Hence, the efficiency is optimal, and the number of games becomes linear in the limit k → ∞. For a modest number of teams, a small number of preliminary rounds, say 1–3 rounds, may suffice. As shown in Table 1, with as few as four elimination rounds, the number of games becomes essentially linear, μ4 ≈ 1.15.
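The recursion (34) and the closed form (35) can be cross-checked with exact rational arithmetic; the first four iterates below reproduce the fractions in Table 1.

```python
from fractions import Fraction

def exponents(kmax):
    """Iterate nu_k = 3/(2 + mu_{k-1}), mu_k = mu_{k-1} * nu_k from mu_0 = 3."""
    mu, out = Fraction(3), []
    for _ in range(kmax):
        nu = Fraction(3) / (2 + mu)
        mu = mu * nu
        out.append((nu, mu))
    return out

recursed = exponents(4)
# closed form (35), evaluated for k = 1..4
r = Fraction(2, 3)
closed = [((1 - r**k) / (1 - r**(k + 1)), 1 / (1 - r**(k + 1)))
          for k in range(1, 5)]
```

The two lists agree term by term, e.g. (ν1, μ1) = (3/5, 9/5) and (ν4, μ4) = (195/211, 243/211).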

Interestingly, the result μ∞ = 1 indicates that championship rounds, or “playoffs”, have the optimal size M∗ given by

M∗ ∼ N^(1/3). (36)

Gradual elimination is often used in the arts and sciences to decide winners of design compe-

titions, grant awards, and prizes. Indeed, the selection process for prestigious prizes typically

begins with a quick glance at all nominees to eliminate obviously weak candidates, but con-

cludes with rigorous deliberations to select the winner. Multiple elimination rounds may be

used when the pool of candidates is very large.

To verify numerically the scaling laws (30), we simulated a single preliminary round followed by a championship round. We chose the size of the preliminary round strictly according to (30) and used a championship round where all M1 teams play against all M1 teams exactly M1 times. We confirmed that as the number of teams increases from N = 10^1 to 10^2, to 10^3, etc., the probability that the best team emerges as champion is not only high but also independent of N. We also confirmed that the concept of preliminary rounds is useful for small N. For N = 10 teams, the number of games can be reduced by a factor >10 by using a single preliminary round.
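As a sanity check of the full algorithm, here is a minimal end-to-end sketch of one preliminary round plus a championship round. The parameter choices (N = 64, q = 0.25) and the use of random per-round pairings in place of a true regular random graph are our simplifications.

```python
import random

def hybrid_champion(N, q, rng):
    """One preliminary round (random pairings) followed by a league-style
    championship round, per the gradual-elimination schedule."""
    M = round(N ** 0.6)            # size of championship round, cf. Eq. (30)
    t = (N // M) ** 2              # preliminary games per team, cf. Eq. (28)
    wins = [0] * N
    for _ in range(t):
        order = list(range(N))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            wins[max(a, b) if rng.random() < q else min(a, b)] += 1
    finalists = sorted(range(N), key=lambda k: -wins[k])[:M]
    score = {k: 0 for k in finalists}
    for _ in range(M):             # each pair of finalists meets M times
        for x in range(M):
            for y in range(x + 1, M):
                a, b = finalists[x], finalists[y]
                score[max(a, b) if rng.random() < q else min(a, b)] += 1
    return max(score, key=score.get)

rng = random.Random(3)
trials = 300
p_best = sum(hybrid_champion(64, 0.25, rng) == 0 for _ in range(trials)) / trials
```

In such runs the strongest team (index 0) becomes champion far more often than the 1/N that a random draw would give, at a cost of only O(N^(9/5)) games.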

6 Discussion

We introduced an elementary competition model in which a weaker team can upset a

stronger team with ﬁxed probability. The model includes a single control parameter, the

upset frequency, a quantity that can be measured directly from historical game results. This

idealized competition model can be conveniently applied to a variety of competition formats

including tournaments and leagues. The random competition process is amenable to theo-

retical analysis and is straightforward to implement in numerical simulations. Qualitatively,

this model explains how tournaments, which require a small number of games, can produce

major upsets, and how leagues, which require a large number of games, always produce quality champions. Additionally, the random competition process enables us to quantify these

intuitive features: the rank distribution of the champion is algebraic in the former schedule

but Gaussian in the latter.


Using our theoretical framework, we also suggested an efﬁcient algorithm where the

teams are gradually eliminated following a series of preliminary rounds. In each preliminary

round, the number of games is sufficient to guarantee that the best team qualifies for the next round. The final championship round is held in a league format in which every team

plays many games against every other team to guarantee that the strongest team emerges as

champion. Using gradual elimination, it is possible to choose the champion using a number

of games that is proportional to the total number of teams. Interestingly, the optimal size of

the championship round scales as the one third power of the total number of teams.

The upset frequency plays a major role in our model. Our empirical studies show that

the frequency of upsets, which shows interesting evolutionary trends, is effective in differ-

entiating sports leagues. Moreover, this quantity has the advantage that it is not coupled

to the length of the season, which varies widely from one sport to another. Nevertheless,

our approach makes a very signiﬁcant assumption: that the upset frequency is ﬁxed and

does not depend on the relative strength of the competitors. Certainly, our approach can

be generalized to account for strength-dependent upset frequencies [39]. We note that our

single-parameter model fares better when the games tend to be close to random, and that

model estimates for the upset frequency have larger discrepancies with the empirical data

when the games become more predictable. Clearly, a more sophisticated set of competition rules is required when the competitors are very close in strength, as is the case, for example, in chess [40].

Acknowledgements We thank Micha Ben-Naim for help with data collection. We acknowledge support

from DOE (DE-AC52-06NA25396) and NSF (DMR-0227670, DMR-0535503, and DMR-0906504).

References

1. Castellano, C., Fortunato, S., Loreto, V.: Rev. Mod. Phys. 81, 591 (2009)

2. Bonabeau, E., Theraulaz, G., Deneubourg, J.-L.: Physica A 217, 373 (1995)

3. Ben-Naim, E., Redner, S.: J. Stat. Mech. 2005, L11002 (2005)

4. Malarz, K., Stauffer, D., Kulakowski, K.: Eur. Phys. J. B 50, 195 (2006)

5. Axtell, R.L.: Science 293, 5536 (2001)

6. Gould, S.J.: Full House: The Spread of Excellence from Plato to Darwin. Harmony Books, New York

(1996)

7. Park, J., Newman, M.E.J.: J. Stat. Mech. 2005, P10014 (2005)

8. Petersen, A.M., Jung, W.S., Stanley, H.E.: Europhys. Lett. 83, 50010 (2008)

9. Heuer, A., Mueller, C., Rubner, O.: Europhys. Lett. 89, 38007 (2010)

10. Radicchi, F.: PLoS ONE 6, e17249 (2011)

11. Albert, J., Bennett, J., Cochran, J.J. (eds.): Anthology of Statistics in Sports. SIAM, Philadelphia (2005)

12. Gembris, D., Taylor, J.G., Suter, D.: Nature 417, 506 (2002)

13. Ben-Naim, E., Vazquez, F., Redner, S.: Eur. Phys. J. B 26, 531 (2006)

14. Ben-Naim, E., Vazquez, F., Redner, S.: J. Quant. Anal. Sports 2(4), 1 (2006)

15. Ben-Naim, E., Vazquez, F., Redner, S.: J. Korean Phys. Soc. 50, 124 (2007)

16. Ben-Naim, E., Redner, S., Vazquez, F.: Europhys. Lett. 77, 30005 (2007)

17. Ben-Naim, E., Hengartner, N.W.: Phys. Rev. E 76, 026106 (2007)

18. Krapivsky, P.L., Redner, S., Ben-Naim, E.: A Kinetic View of Statistical Physics. Cambridge University

Press, Cambridge (2010)

19. Whitham, G.B.: Linear and Nonlinear Waves. Wiley, New York (1974)

20. Burgers, J.M.: The Nonlinear Diffusion Equation. Reidel, Dordrecht (1974)

21. Fort, R., Quirk, J.: J. Econ. Lit. 33, 1265 (1995)

22. Fort, R., Maxcy, J.: J. Sports Econ. 4, 154 (2003)

23. Wesson, J.: The Science of Soccer. IOP, Bristol and Philadelphia (2002)

24. Lundh, T.: J. Quant. Anal. Sports 2(3), 1 (2006)

25. Stern, H.S.: Am. Stat. 45, 179 (1991)

26. Stern, H.S.: Chance 10, 19 (1997)


27. Stern, H.S., Mock, B.R.: Chance 11, 26 (1998)

28. Data source. http://www.shrpsports.com/

29. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge University Press,

Cambridge (1998)

30. Lieberman, E., Hauert, Ch., Nowak, M.A.: Nature 433, 312 (2005)

31. Fink, T.M.A., Coe, J.B., Ahnert, S.E.: Europhys. Lett. 83, 60010 (2008)

32. Ben-Naim, E., Kahng, B., Kim, J.S.: J. Stat. Mech. 2005, P07001 (2006)

33. Mungan, M., Rador, T.: J. Phys. A 41, 055002 (2008)

34. Rador, T., Derici, R.: Eur. Phys. J. B 83, 289 (2011)

35. Redner, S.: A Guide to First-Passage Processes. Cambridge University Press, Cambridge (2001)

36. Krapivsky, P.L., Redner, S.: Phys. Rev. Lett. 89, 258703 (2002)

37. Ben-Naim, E., Krapivsky, P.L.: Europhys. Lett. 65, 151 (2004)

38. de Gennes, P.G.: Scaling Concepts in Polymer Physics. Cornell University Press, Ithaca (1979)

39. Sire, C., Redner, S.: Eur. Phys. J. B 67, 473 (2009)

40. Glickman, M.E.: Am. Chess J. 3, 59 (1995)