Journal of the Korean Physical Society, Vol. 50, No. 1, January 2007, pp. 124∼126 Review Articles
What is the most Competitive Sport?
E. Ben-Naim∗
Theoretical Division and Center for Nonlinear Studies,
Los Alamos National Laboratory, Los Alamos, New Mexico 87545
F. Vazquez and S. Redner
Theoretical Division and Center for Nonlinear Studies,
Los Alamos National Laboratory, Los Alamos, New Mexico 87545 and
Department of Physics, Boston University, Boston, Massachusetts 02215
(Received 18 August 2006)
We present an extensive statistical analysis of the results of all sports competitions in five major
sports leagues in England and the United States. We characterize the parity among teams by the
variance in the winning fraction from season-end standings data and quantify the predictability of
games by the frequency of upsets from game results data. We introduce a mathematical model in
which the underdog team wins with a fixed upset probability. This model quantitatively relates the
parity among teams with the predictability of the games, and the model can be used to estimate
the upset frequency from standings data. We propose the likelihood of upsets as a measure of
competitiveness.
PACS numbers: 01.50.Rt, 02.50.-r, 05.40.-a, 89.75.Da
Keywords: Sports statistics, Stochastic processes, Competition
What is the most competitive team sport? We an-
swer this question via a statistical survey of an exten-
sive dataset of game results [1–3]. We relate parity with
predictability and propose the likelihood of upsets as a
measure of competitiveness.
We studied the results of all regular season competi-
tions in five major professional sports leagues in Eng-
land and the United States (Table 1): the premier soc-
cer league of the English Football Association (FA), Ma-
jor League Baseball (MLB), the National Hockey League
(NHL), the National Basketball Association (NBA), and
the National Football League (NFL). NFL data include
the short-lived AFL. We considered only complete
seasons; these data comprise more than 300,000
games over a century [4].
The winning fraction, the ratio of wins to total games,
quantifies the team’s strength [5]. Thus, the distribu-
tion of the winning fraction measures the parity between
teams in a league. We computed F(x), the fraction of
teams with a winning fraction of x or lower at the end
of the season, as well as
$\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$,
the standard deviation of the winning fraction. Here ⟨·⟩
denotes the average over all teams and all years using
season-end standings. For example, in baseball, where the
winning fraction x typically falls between 0.400 and 0.600,
the standard deviation is σ = 0.084. As Figs. 1 and 2(a)
show, the winning fraction distribution clearly
distinguishes the five leagues. It is narrowest for
baseball and widest for football.
∗E-mail: ebn@lanl.gov
Do these results imply that MLB games are the most
competitive and NFL games the least? Not necessarily!
The length of the season is a significant factor in the
variability in the winning fraction. In a scenario where
the outcome of a game is completely random, the to-
tal number of wins performs a simple random walk, and
the standard deviation σ is inversely proportional to the
square root of the number of games played. Generally,
the shorter the season, the larger σ. Thus, the small
number of games is partially responsible for the large
variability observed in the NFL.

Table 1. Summary of the sports statistics data. Listed are
the time periods, total number of games, average number of
games played by a team in a season (⟨games⟩), standard
deviation of the winning fraction distribution (σ), measured
frequency of upsets (q), and upset probability obtained using
the theoretical model (q_model). The fraction of ties in
soccer, hockey, and football is 0.246, 0.144, and 0.016,
respectively.

league  years        games    ⟨games⟩  σ      q      q_model
FA      1888–2005    43350    39.7     0.102  0.452  0.459
MLB     1901–2005    163720   155.5    0.084  0.441  0.413
NHL     1917–2004    39563    70.8     0.120  0.414  0.383
NBA     1946–2005    43254    79.1     0.150  0.365  0.316
NFL     1922–2004    11770    14.0     0.210  0.364  0.309

Fig. 1. Winning fraction distribution F(x) versus winning
fraction x for the NFL, NBA, NHL, and MLB (curves) and the
best-fit distributions from simulations of our model (circles).
For clarity, FA, which lies between MLB and NHL, is not
displayed.
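The season-length effect described above can be made concrete with a short calculation. The sketch below assumes that every game is an independent fair coin flip with no ties, in which case the number of wins is binomial and the standard deviation of the winning fraction is 1/(2√N) for N games. The prefactor 1/2 comes from that assumption and is not stated in the text; the league numbers are copied from Table 1.

```python
import math

# Average games per team per season and observed sigma, copied from Table 1.
TABLE_1 = {
    "FA": (39.7, 0.102),
    "MLB": (155.5, 0.084),
    "NHL": (70.8, 0.120),
    "NBA": (79.1, 0.150),
    "NFL": (14.0, 0.210),
}


def random_sigma(n_games):
    """Std. dev. of the winning fraction if every game is a fair coin flip.

    Assumption: no ties and independent games, so wins ~ Binomial(n_games, 1/2).
    """
    return 0.5 / math.sqrt(n_games)


for league, (n_games, sigma_observed) in TABLE_1.items():
    baseline = random_sigma(n_games)
    print(
        f"{league}: observed sigma {sigma_observed:.3f}, "
        f"coin-flip baseline {baseline:.3f}, "
        f"ratio {sigma_observed / baseline:.2f}"
    )
```

Under this assumption the 14-game NFL average already gives a baseline of about 0.134, so a sizable part of the NFL's σ = 0.210 would appear even if all games were pure chance. The ratio of observed to baseline σ is only a rough indicator; the model introduced next treats the season length more carefully.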
To account for the varying season length and to re-
veal the true nature of the sport, we set up mock sports
leagues where teams, paired at random, play a fixed num-
ber of games. In this simulation model, the team with
the better record is considered as the favorite, and the
team with the worse record is considered as the under-
dog. The outcome of a game depends on the relative
team strengths: with “upset probability” q < 1/2, the
underdog wins, but otherwise, the favorite wins. Our
analysis of the nonlinear master equations that describe
the evolution of the distribution of team win/loss records
shows that σ decreases both as the season length in-
creases and as games become more competitive, i.e., as
q increases [6,7]. In a hypothetical season with an infi-
nite number of games, the winning fraction distribution
is uniform in the range q < x < 1 − q, and as a result,

$$\sigma = \frac{1/2 - q}{\sqrt{3}}. \tag{1}$$
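As a brief check, Eq. (1) can be recovered from the stated uniform distribution alone. The steps below use only the range q < x < 1 − q given in the text and the definition of σ as the standard deviation of the winning fraction:

```latex
\begin{align*}
\langle x \rangle
  &= \frac{1}{1-2q}\int_{q}^{1-q} x \, dx = \frac{1}{2}, \\
\langle x^2 \rangle - \langle x \rangle^2
  &= \frac{1}{1-2q}\int_{q}^{1-q} \left(x - \tfrac{1}{2}\right)^2 dx
   = \frac{(1-2q)^2}{12}, \\
\sigma
  &= \frac{1-2q}{2\sqrt{3}} = \frac{1/2 - q}{\sqrt{3}}.
\end{align*}
```

For completely random games, q = 1/2, the distribution collapses to x = 1/2 and σ vanishes, as expected for an infinitely long season.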
We ran Monte Carlo simulations of these artificial
sports leagues, with the sport-specific number of games and
a range of q values. We then determined the value of q that
gives the best match between the distribution F(x) from
the simulations and the actual sports statistics (Fig. 1).
Generally, we found good agreement between the simulated
results and the data for reasonable q values.
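A minimal sketch of such a mock league is given below. It is not the authors' code, and several details are assumptions made for illustration: the number of teams is even, every team plays exactly once per round, and a game between teams with equal records is decided by a fair coin flip. The function names and the default number of seasons are hypothetical.

```python
import random
import statistics


def simulate_season(n_teams, n_games, q, rng):
    """Return each team's winning fraction after one mock season.

    Assumptions (not spelled out in the text): n_teams is even, every team
    plays once per round, and equal-record games are fair coin flips.
    """
    wins = [0] * n_teams
    teams = list(range(n_teams))
    for _ in range(n_games):
        rng.shuffle(teams)  # random pairing for this round
        for a, b in zip(teams[0::2], teams[1::2]):
            if wins[a] == wins[b]:
                winner = a if rng.random() < 0.5 else b
            else:
                # All teams have played the same number of games,
                # so comparing wins is the same as comparing records.
                favorite, underdog = (a, b) if wins[a] > wins[b] else (b, a)
                winner = underdog if rng.random() < q else favorite
            wins[winner] += 1
    return [w / n_games for w in wins]


def simulated_sigma(n_teams, n_games, q, n_seasons=200, seed=0):
    """Std. dev. of the winning fraction pooled over many mock seasons."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_seasons):
        fractions.extend(simulate_season(n_teams, n_games, q, rng))
    return statistics.pstdev(fractions)


# Example: a 14-game season with a hypothetical 30-team league.
for q in (0.30, 0.40, 0.50):
    print(f"q = {q:.2f}: sigma = {simulated_sigma(30, 14, q):.3f}")
```

Scanning q and comparing the simulated winning fractions with the league data gives a fitted value of q. This sketch compares only σ, whereas the fit described above matches the full distribution F(x).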
To characterize the predictability of games, we followed
the chronologically ordered results of all games
and reconstructed the league standings on any given day.
We then measured the upset frequency q by counting the
fraction of times that the team with the worse record
on the game date actually won (Table 1). Games be-
tween teams with no record (start of a season) or teams
with equal records were disregarded. The game location
was ignored and so was the margin of victory. In soc-
cer, hockey, and football, ties were counted as 1/2 of a
victory for both teams. We verified that handling ties
this way did not significantly affect the results: the
upset probability changes by at most 0.02 (and typically,
much less) if ties are ignored.

Fig. 2. (a) The cumulative standard deviation σ of the
winning fraction distribution (for all seasons up to a given
year) versus time. (b) The cumulative frequency of upsets q,
measured directly from game results, versus time. Both
panels show the NFL, NBA, NHL, MLB, and FA.
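The counting procedure described above can be sketched as follows for a single season. The input format, a chronologically ordered list of (team_a, team_b, winner) tuples with winner set to None for a tie, is hypothetical. Two further choices are assumptions of this sketch rather than statements from the text: a game is skipped if either team has no record yet, and a tie between unequal teams counts as half an upset.

```python
from collections import defaultdict


def upset_frequency(games):
    """Fraction of counted games won by the team with the worse record.

    `games` is a chronologically ordered list of (team_a, team_b, winner)
    tuples for one season; winner is team_a, team_b, or None for a tie.
    """
    wins = defaultdict(float)  # a tie adds 0.5 to each team
    played = defaultdict(int)
    upsets = 0.0
    counted = 0
    for team_a, team_b, winner in games:
        if played[team_a] > 0 and played[team_b] > 0:
            # Compare winning fractions by cross-multiplication (exact).
            score_a = wins[team_a] * played[team_b]
            score_b = wins[team_b] * played[team_a]
            if score_a != score_b:  # equal records are disregarded
                underdog = team_a if score_a < score_b else team_b
                counted += 1
                if winner is None:
                    upsets += 0.5  # assumption: a tie is half an upset
                elif winner == underdog:
                    upsets += 1.0
        # Update the standings only after the game has been classified.
        for team in (team_a, team_b):
            played[team] += 1
            if winner is None:
                wins[team] += 0.5
            elif winner == team:
                wins[team] += 1.0
    return upsets / counted if counted else float("nan")


season = [
    ("A", "B", "A"),  # no records yet: skipped
    ("A", "B", "B"),  # B is the underdog and wins: upset
    ("A", "B", "A"),  # equal records: skipped
    ("A", "B", "A"),  # favorite wins
]
print(upset_frequency(season))  # prints 0.5
```

To cover many seasons, the function would be called once per season so that records reset, and the upset and game counts would be pooled before dividing.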
We find that soccer and baseball are the most competitive
sports with q = 0.452 and q = 0.441, respectively,
while basketball and football, with nearly identical
q = 0.365 and q = 0.364, are the least. There is also
good agreement between the upset probability q_model,
obtained by fitting the winning fraction distribution from
numerical simulations of our model to data as in Fig. 1,
and the measured upset frequency (Table 1). Consistent
with our theory, the standard deviation σ mirrors the bias,
1/2 − q [Figs. 2(a) and 2(b)]. Tracking the evolution of
either q or σ leads to the same conclusion: NFL and MLB
games [8] are becoming more competitive, while over the
past 60 years, FA displays the opposite trend.
In summary, we propose a single quantity, q, the fre-
quency of upsets, as an index for quantifying the pre-
dictability and, hence, the competitiveness of sports
games. This measure [9] complements the existing quan-
titative measures of competitiveness [10–12]. In our
view, a league in which weak teams have a good chance
to defeat strong teams is competitive, but of course,
there are other sensible measures of competitiveness. We
demonstrated the utility of this measure via a comparative
analysis that shows that soccer and baseball are
the most competitive sports. Trends in this measure
may reflect the gradual evolution of teams in response
to competitive pressure [8], as well as changes in game
strategy or rules [13, 14].
Our model, in which the stronger team is favored to
win a game [6,7], enables us to take into account the
varying season length, and this model directly relates parity,
as measured by the standard deviation σ, with predictability,
as measured by the upset likelihood q. This connection has
practical utility as it allows one to conveniently estimate
the likelihood of upsets from the more easily accessible
standings data. In our theory, all teams are equal at the
start of the season, but by chance, some end up strong
and some weak. Our idealized model does not include the
notion of innate team strength; nevertheless, the spon-
taneous emergence of disparate-strength teams provides
the crucial mechanism needed for quantitative modeling
of the complex dynamics of sports competitions.
ACKNOWLEDGMENTS
We thank Micha Ben-Naim for assistance in data col-
lection and acknowledge support from the Department
of Energy (W-7405-ENG-36) and the National Science
Foundation (DMR0535503).
REFERENCES
[1] H. Stern, The American Statistician 45, 179 (1991).
[2] D. Gembris, J. G. Taylor and D. Suter, Nature 417, 506
(2002).
[3] Anthology of Statistics in Sports, edited by J. Albert,
J. Bennett and J. J. Cochran (SIAM, Philadelphia,
2005).
[4] http://www.shrpsports.com/,
http://www.the-english-football-archive.com/.
[5] R. Fort and J. Quirk, J. Econ. Liter. 33, 1265 (1995).
[6] E. Ben-Naim, F. Vazquez and S. Redner, Eur. Phys.
Jour. B. 49, 531 (2006).
[7] E. Ben-Naim, F. Vazquez and S. Redner, J. Quant. Anal.
Sports 2, Article 1 (2006).
[8] S. J. Gould, Full House: The Spread of Excellence from
Plato to Darwin (Harmony Books, New York, 1996).
[9] J. Wesson, The Science of Soccer (IOP, Bristol and
Philadelphia, 2002).
[10] R. Fort and J. Maxcy, Jr., Journal of Sports Economics
4, 154 (2003).
[11] T. Lundh, J. Quant. Anal. Sports 2, Article 1 (2006).
[12] H. S. Stern, Chance 10, 19 (1997).
[13] J. Hofbauer and K. Sigmund, Evolutionary Games
and Population Dynamics (Cambridge University Press,
Cambridge, 1998).
[14] E. Lieberman, C. Hauert and M. A. Nowak, Nature 433,
312 (2005).