Understanding Draws in Elo Rating Algorithm
This is the corrected version of the paper, “Understanding draws in Elo rating algorithm”,
Journal of Quantitative Analysis in Sports, 16 (3), 211-220, May 2020.
Leszek Szczecinski and Aymen Djebbi
November 22, 2021
Abstract
This work is concerned with the interpretation of the results produced by the well-known Elo algorithm applied in various sport ratings. The interpretation consists in defining the probabilities of the game outcomes conditioned on the ratings of the players and should be based on a probabilistic rating-outcome model. Such a model is known for binary games (win/loss), allowing us to interpret the rating results in terms of the win/loss probability. On the other hand, the model for ternary outcomes (win/loss/draw) has not yet been shown, even though the Elo algorithm has been used in ternary games from the very moment it was devised.

Using the draw model proposed by Davidson in 1970, we derive a new Elo-Davidson algorithm, and show that the Elo algorithm is its particular instance. The parameters of the Elo-Davidson algorithm are then related to the frequency of draws, which indicates that the Elo algorithm silently assumes games with 50% of draws. To remove this assumption, which is often unrealistic, the Elo-Davidson algorithm should be used, as it improves the fit to the data. The behaviour of the algorithms is illustrated using the results from the English Premier League.

L. Szczecinski and A. Djebbi are with INRS, Montreal, Canada. [e-mail: {leszek, aymen.djebbi}@emt.inrs.ca]. The work was supported by NSERC, Canada.
1 Introduction
Rating of players (or teams) is arguably one of the most important problems in sport analytics, and in this work we are concerned with rating in sports with one-on-one games yielding ternary results of win, loss, and draw. Such a situation appears in almost all team sports and in many individual sports, even if the draws are sometimes resolved using additional rules (such as overtime or shoot-outs), which we ignore in this work.
Rating in sports consists in assigning a numerical value to a player using the results of past games. Most sport ratings today use points attributed to the game's winner. This approach is challenged by the rating algorithm developed in the late fifties by Arpad Elo in the context of chess competition, Elo (2008), and adopted later by the Fédération Internationale des Échecs (FIDE).
Namely, the Elo algorithm changes the players' ratings using not only the game outcome but also the ratings of the players before the game. The Elo algorithm is arguably one of the most popular non-trivial rating algorithms and has been used to analyze different sports, although mostly informally, (Langville and Meyer, 2012, Chap. 5), Wikipedia contributors (2019); it is also used for rating in eSports, Herbrich and Graepel (2006). Tuned versions of the Elo algorithm were adopted by the Fédération Internationale de Football Association (FIFA) for rating women's (in 2013) and men's (in 2018) national football teams, FIFA (2018). The Elo algorithm thus deserves particular attention because, despite its simplicity, many of its aspects are not entirely understood, Aldous (2017).
In this work we are concerned with the interpretation of the results produced by the Elo algorithm. We adopt the probabilistic modeling point of view, where the game outcomes should be related to the ratings by conditional probabilities. The advantage of using such a rating–outcome model is twofold: i) the obvious one is that, once the rating is known, we can find the probability of the game outcome, and ii) to find the rating, we can use conventional estimation strategies, such as maximum likelihood (ML), or apply Bayesian procedures. Here we focus on the ML and note that if the corresponding optimization problem has a unique solution (e.g., as in convex optimization), we may replace the batch optimization with a stochastic gradient (SG) implementation, where the parameter (rating) updates are obtained after each new result appears; this is the principle of on-line rating, which is often of interest. We thus focus on the rating-outcome model and the optimization algorithm, where the algorithm (here, based on the ML) is derived once the model is defined.
The rating dynamics (referring to how the players' abilities evolve through time) is also part of the modelling. It was considered, e.g., in Glickman (1999), where, after simplifications, the Elo algorithm emerged. In this work, however, we choose to ignore the rating dynamics, which would yield a more complete but also more complex model, as shown in Glickman (1999). This is not because the dynamics is unimportant, but rather because i) we are interested in modeling the relationship between the ratings and the outcomes, ii) the Elo algorithm, which we want to interpret in ternary games, also ignores the rating dynamics, and iii) the derivations based on the ML are simpler.
Indeed, the Elo algorithm may be seen as the ML-based estimation of the rating parameters using the model of Bradley and Terry (1952) (i.e., the rating–outcome model) via the SG (i.e., the optimization algorithm); this was said, e.g., in (Király and Qian, 2017, Sec. 3.1.2) but still deserves to be stated explicitly because, while the rating-outcome model may be inferred from the description of the Elo algorithm, Elo (2008), its derivation does not refer explicitly to the optimization criterion, nor to the optimization algorithm.

This understanding of the Elo algorithm, which allows for the probabilistic interpretation of its results, applies, however, only to binary games (win/loss).
On the other hand, for games with ternary outcomes, the rating-outcome model underlying the Elo algorithm has not yet been shown explicitly, even though the algorithm has been applied in games with draws (starting with chess, for which it was devised, and, more recently, in football, FIFA (2018)). In fact, the Elo algorithm acknowledges the presence of draws by using the concept of a fractional score (of the game) without altering other elements of the algorithm. Since the fractional score is also traditionally used in many sports, this “compatibility” and the resulting simplicity are very appealing and were widely accepted. And this, despite the fact that, without a model for the draw, we cannot say what the probability of the draw (or win and loss, for that matter) is using the rating provided by the Elo algorithm, unless we rely on heuristics, as done, e.g., in Lasek, Szlávik, and Bhulai (2013).
It is thus natural to ask the following question: knowing the rating results of the Elo algorithm, how should the probabilities of the win/loss/draw be calculated? To answer it, we have to ask a corollary question: what model (if any) underlies the Elo algorithm in ternary-outcome (win/loss/draw) games?
To answer the above questions we will use the well-known mathematical formalism of rating in sports, which originated in psychometrics for evaluating preferences in the pairwise-comparison setup, Thurstone (1927), Bradley and Terry (1952), and which was deeply studied and extended in different directions, David (1963), Cattelan (2012), Caron and Doucet (2012). The mathematical modeling of draws has also been proposed in psychometrics; two main approaches may be traced back to i) Rao and Kupper (1967), which uses thresholding of the unobserved (latent) variables, and ii) Davidson (1970), which uses an axiomatic definition. These two approaches have also been applied in sport rating, the first, e.g., in Herbrich and Graepel (2006), Király and Qian (2017), Koning (2000), and the second, e.g., in Joe (1990), Glickman (2018).
Our main result is to demonstrate that the Elo algorithm for ternary-outcome games is a particular case of the rating algorithm obtained from the Davidson (1970) draw model via the ML estimation principle and the SG optimization. In other words, we demonstrate that even if the Elo algorithm does not explicitly model the draw as the third outcome of the game, it actually does so, but only implicitly. While simple, this result has not yet been shown and, again, the model underlying the Elo algorithm allows us to interpret the rating results. The algorithm we derive from Davidson (1970) is dubbed the Elo-Davidson algorithm, as it should be seen as a natural generalization of the original Elo algorithm.
We take our analysis one step further, focusing on a free parameter in the Davidson (1970) model which is related to the frequency of draws. We show the explicit relationship between the draw parameter and the draw frequency, which demonstrates that the Elo rating algorithm silently (because the value of the draw parameter is merely implicit in the algorithm) assumes the frequency of draws to be 50%. This is, of course, an unrealistic assumption in the majority of sports, and thus we propose to always use the Elo-Davidson algorithm, which should provide a better fit to the data. This is done in the spirit of the Elo on-line rating algorithm: in fact, compared to the Elo algorithm, only the non-linear function has to be changed to yield the Elo-Davidson algorithm. And this should not be considered a drastic change because the Elo algorithm itself evolved in that regard. Namely, the current version of the algorithm in the FIDE ranking uses the logistic non-linear function, while the initial proposition of the Elo algorithm was based on the Gaussian cumulative distribution function (CDF) (Langville and Meyer, 2012, Ch. 5).
The paper is organized as follows. We define the mathematical model of the problem in Sec. 2 and show how the principle of ML combined with the SG yields the Elo algorithm in binary-outcome games. We treat the issue of draws in Sec. 3, which is the main contribution of the paper: using the Davidson (1970) model we derive a new on-line rating algorithm and demonstrate that the Elo algorithm may be seen as a particular instance of this new Elo-Davidson algorithm. We also show how to find its parameters to take into account the win/loss/draw frequency. In Sec. 4 we illustrate the analysis with numerical results and the final conclusions are drawn in Sec. 5.
2 Rating: Problem definition
We consider the problem of $M$ players, indexed by $m = 1, \ldots, M$, challenging each other in face-to-face games. At a time $t$ we observe the result/outcome $y_t$ of the game between the players defined by the pair $\{i_t, j_t\}$. The index $i_t$ refers to the "home" player, while $j_t$ indicates the "away" player. This distinction is important to take into account the so-called home-field advantage (HFA).¹

We consider three possible game results: i) the home player wins, denoted as $\{i_t \succ j_t\}$, in which case $\{y_t = \mathsf{H}\}$; ii) the draw $\{y_t = \mathsf{D}\}$, denoted also as $\{i_t \doteq j_t\}$; and finally, iii) $\{y_t = \mathsf{A}\}$, which means that the "away" player wins, which we denote also as $\{i_t \prec j_t\}$.

For compactness of notation it is convenient to encode the categorical variable $y_t$ into numerical indicators defined over the set $\{0, 1\}$

$$h_t = \mathbb{I}[y_t = \mathsf{H}], \quad a_t = \mathbb{I}[y_t = \mathsf{A}], \quad d_t = \mathbb{I}[y_t = \mathsf{D}], \qquad (1)$$

with $\mathbb{I}[\cdot]$ being the indicator function: $\mathbb{I}[A] = 1$ if $A$ is true and $\mathbb{I}[A] = 0$ otherwise. The mutual exclusivity of the win/loss/draw events guarantees $h_t + a_t + d_t = 1$.

¹ The name is used in team sports but the same effect is observed also in individual sports. For example, in chess, the player who starts the game may be considered a home player. In tennis, the local player may have an advantage over the out-of-the-country player, etc.
Having observed the outcomes of the games, $y_l$, $l = 1, \ldots, t$, we want to rate the players, i.e., assign a rating level—a real number—$\theta_{t,m}$ to each of them. The rating level should represent the player's ability to win; for this reason it is also called strength, Glickman (1999), or skill, Herbrich and Graepel (2006), Caron and Doucet (2012). The ability should be understood in the probabilistic sense: no player has a guarantee to win, so the outcome $y_t$ is treated as a realization of a random variable $Y_t$. Thus, the levels $\theta_{t-1,m}$, $m = 1, \ldots, M$ should provide a reliable estimate of the distribution of $Y_t$ over the set $\{\mathsf{H}, \mathsf{A}, \mathsf{D}\}$. In other words, the formal rating becomes an expert system explaining the past results and predicting the future ones.
2.1 Win-loss model
It is instructive to consider first the case when the outcome of the game is binary, $y_t \in \{\mathsf{H}, \mathsf{A}\}$, i.e., for the moment, we ignore the possibility of draws, $\mathsf{D}$; we consider them separately in Sec. 3.

As for the HFA, we will use the method known from the literature which consists in an artificial increase of the rating of the home player, Rao and Kupper (1967), Davidson (1970), Koning (2000). Since this issue is relatively well known, we omit it from the initial derivations and introduce it in Sec. 3.2.
We want to establish the probabilistic model linking the result of the game and the rating levels of the involved players. By far the most popular approach is based on the so-called linear model (David, 1963, Ch. 1.3)

$$\Pr\{i \succ j \mid \theta_i, \theta_j\} = F_\mathsf{H}(\theta_i - \theta_j), \qquad (2)$$

where $F_\mathsf{H}(z)$ is a monotonically increasing function of $z = \theta_i - \theta_j$. Indeed, (2) corresponds to our intuition: a growing difference between the rating levels $\theta_i - \theta_j$ should translate into an increasing probability of player $i$ winning against player $j$.

By symmetry, $\Pr\{i \succ j\} = \Pr\{j \prec i\}$, and by the law of total probability, $\Pr\{i \succ j\} + \Pr\{i \prec j\} = 1$, we obtain

$$\Pr\{i \prec j \mid \theta_i, \theta_j\} = F_\mathsf{A}(\theta_i - \theta_j) = F(-z) = 1 - F(z), \qquad (3)$$

where we use $F_\mathsf{H}(z) = F(z)$ to indicate that the entire model is defined by the one function $F(z)$, which affects both $F_\mathsf{H}(z)$ and $F_\mathsf{A}(z)$ via (3).

A popular choice for $F(z)$ is the logistic CDF (Bradley and Terry, 1952)

$$F(z) = \frac{1}{1 + 10^{-z/\sigma}} = \frac{10^{0.5z/\sigma}}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma}}, \qquad (4)$$

where $\sigma > 0$ is a scale parameter.

We note that the rating is arbitrary regarding

- the origin—because any value $\theta_0$ can be added to all the levels $\theta_m$ without affecting the difference $\theta_i - \theta_j$ appearing as the argument of $F(\cdot)$ in (2),

- the scaling—because the levels $\theta_m$ obtained with the scale $\sigma$ can be transformed into levels $\theta'_m$ with a scale $\sigma'$ via multiplication: $\theta'_m = \theta_m \sigma'/\sigma$, and then the value of $F(\theta_i - \theta_j)$ used with $\sigma$ is the same as the value of $F(\theta'_i - \theta'_j)$ used with $\sigma'$;² and

- the base of the exponent in (4); for example, $10^{z/\sigma} = \mathrm{e}^{z/\sigma'}$ with $\sigma' = \sigma \log_{10}\mathrm{e}$; therefore, changing from base 10 to the base of the natural logarithm requires replacing $\sigma$ with $\sigma'$.

² The rating implemented by FIFA uses $\sigma = 600$, FIFA (2018), while FIDE uses $\sigma = 400$.
2.2 Maximum likelihood estimation
Using the results from Sec. 2.1, the random variables $Y_t$ and the rating levels are related through the conditional probabilities

$$\Pr\{Y_t = \mathsf{H} \mid \boldsymbol{x}_t, \boldsymbol{\theta}\} = F_\mathsf{H}(z_t) = F(z_t), \qquad (5)$$
$$\Pr\{Y_t = \mathsf{A} \mid \boldsymbol{x}_t, \boldsymbol{\theta}\} = F_\mathsf{A}(z_t) = F(-z_t), \qquad (6)$$
$$z_t = \boldsymbol{x}_t^\mathrm{T}\boldsymbol{\theta} = \theta_{i_t} - \theta_{j_t}, \qquad (7)$$

where $\boldsymbol{\theta} = [\theta_1, \ldots, \theta_M]^\mathrm{T}$ is the vector which gathers all the rating levels, $(\cdot)^\mathrm{T}$ denotes transpose, $z_t$ is thus the result of a linear combiner $\boldsymbol{x}_t$ applied to $\boldsymbol{\theta}$, and $\boldsymbol{x}_t$ is the game-scheduling vector, i.e.,

$$\boldsymbol{x}_t = [0, \ldots, 0, \underbrace{1}_{i_t\text{-th pos.}}, 0, \ldots, 0, \underbrace{-1}_{j_t\text{-th pos.}}, 0, \ldots, 0]^\mathrm{T}. \qquad (8)$$

We prefer the notation using the scheduling vector as it liberates us from the somewhat cumbersome repetition of the indices $i_t$ and $j_t$ as in (7).

Our goal now is to find the levels $\boldsymbol{\theta}$ at time $t$ using the game outcomes $\{y_l\}_{l=1}^{t}$ and the scheduling vectors $\{\boldsymbol{x}_l\}_{l=1}^{t}$. This is fundamentally a parameter estimation problem (model fitting) and we solve it using the ML principle. The ML estimate of $\boldsymbol{\theta}$ at time $t$ is obtained via the optimization

$$\hat{\boldsymbol{\theta}}_t = \operatorname*{argmax}_{\boldsymbol{\theta}} J_t(\boldsymbol{\theta}) \qquad (9)$$

with

$$J_t(\boldsymbol{\theta}) = \sum_{l=1}^{t} L_l(\boldsymbol{\theta}), \qquad (10)$$
$$L_l(\boldsymbol{\theta}) = \log \Pr\{Y_l = y_l \mid \boldsymbol{\theta}\} \qquad (11)$$
$$\quad\;\; = h_l \log F_\mathsf{H}(\boldsymbol{x}_l^\mathrm{T}\boldsymbol{\theta}) + a_l \log F_\mathsf{A}(\boldsymbol{x}_l^\mathrm{T}\boldsymbol{\theta}), \qquad (12)$$

where we applied the model (5)-(6) and assumed that $Y_l$, $l = 1, \ldots, t$ are independent when conditioned on $\boldsymbol{\theta}$.
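The following short Python sketch (our illustration, with hypothetical variable names) shows how the log-likelihood (10)-(12) can be evaluated for binary outcomes using the scheduling vectors (8).

```python
import numpy as np

def scheduling_vector(i, j, M):
    """Game-scheduling vector of eq. (8): +1 at the home index, -1 at the away index."""
    x = np.zeros(M)
    x[i], x[j] = 1.0, -1.0
    return x

def log_likelihood(theta, games, sigma=400.0):
    """Binary-outcome log-likelihood J_t(theta) of eqs. (10)-(12).

    `games` is a list of (i, j, h) tuples: home index, away index,
    and h = 1 if the home player won, h = 0 otherwise.
    """
    M = len(theta)
    J = 0.0
    for i, j, h in games:
        z = scheduling_vector(i, j, M) @ theta   # eq. (7)
        F = 1.0 / (1.0 + 10.0 ** (-z / sigma))   # eq. (4)
        J += h * np.log(F) + (1 - h) * np.log(1.0 - F)  # eq. (12)
    return J

# Toy example: three players, two games.
theta = np.array([0.0, 100.0, -50.0])
games = [(0, 1, 1), (2, 0, 0)]
print(log_likelihood(theta, games))
```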
2.3 Stochastic gradient and Elo algorithm
The maximization in (9) can be done via steepest ascent, which results in the operation

$$\hat{\boldsymbol{\theta}}_t \leftarrow \hat{\boldsymbol{\theta}}_t + \mu \nabla_{\boldsymbol{\theta}} J_t(\hat{\boldsymbol{\theta}}_t) \qquad (13)$$

iterated (hence the symbol "$\leftarrow$") till convergence for a given $t$; the gradient is calculated as

$$\nabla_{\boldsymbol{\theta}} J_t(\boldsymbol{\theta}) = \sum_{l=1}^{t} \nabla_{\boldsymbol{\theta}} L_l(\boldsymbol{\theta}), \qquad (14)$$

and the step $\mu$ should be adequately chosen to guarantee convergence. Moreover, since $J_t(\boldsymbol{\theta})$ is concave,³ the optimum is global.⁴

From (12) we obtain

$$\nabla_{\boldsymbol{\theta}} L_l(\boldsymbol{\theta}) = \boldsymbol{x}_l e_l(z_l), \qquad (15)$$

where

$$e_l(z_l) = \frac{1}{\sigma'}\left[h_l - F(z_l)\right] \qquad (16)$$

and $\sigma' = \sigma \log_{10}\mathrm{e}$.

The solution in (13), based on the model (5)-(6), requires $\boldsymbol{\theta}$ to remain constant throughout the time $l = 1, \ldots, t$. Since, in practice, the ability (rating) of the players may vary in time (the abilities evolve due to training, age, coaching strategies, fatigue, etc.), it is necessary to track $\boldsymbol{\theta}$.
The tracking may rely on a probabilistic model of the dynamics of the ratings, $\{\boldsymbol{\theta}_l\}_{l=1}^{t}$, leading to a state-space representation, where the estimation can be solved recursively via non-linear Kalman filtering, e.g., Glickman (1999). Modelling a temporal relationship between the games may also be useful to take into account not only the "natural" evolution of the players but also the burden of having to deal with games played close to each other in time.⁵

³ The problem at hand is essentially equivalent to logistic regression, which is known to be concave (Bishop, 2006, Ch. 4.3.3).

⁴ While the optimum is global, it is not unique due to the ambiguity of the origin $\theta_0$ we mentioned at the end of Sec. 2.1: for any $\boldsymbol{\theta}$ and $\theta_0$ we have $J_t(\boldsymbol{\theta}) = J_t(\boldsymbol{\theta} + \theta_0 \boldsymbol{1})$, where $\boldsymbol{1} = [1, \ldots, 1]^\mathrm{T}$.

⁵ For example, this is quite common in hockey, where the teams regularly have to play two games in two consecutive days.
An even simpler strategy ignores the dynamics of the ratings and uses the stochastic gradient (SG), which may be interpreted as an approximation of the steepest ascent with the following fundamental differences: i) at time $t$ only one iteration of the steepest ascent is executed, ii) the gradient is calculated solely for the current observation term $L_t(\hat{\boldsymbol{\theta}}_t)$, and iii) the available estimate $\hat{\boldsymbol{\theta}}_t$ is used as the starting point for the update

$$\hat{\boldsymbol{\theta}}_{t+1} = \hat{\boldsymbol{\theta}}_t + \mu \nabla_{\boldsymbol{\theta}} L_t(\hat{\boldsymbol{\theta}}_t) = \hat{\boldsymbol{\theta}}_t + \mu \boldsymbol{x}_t e_t(z_t) \qquad (17)$$
$$\quad\;\; = \hat{\boldsymbol{\theta}}_t + \mu \boldsymbol{x}_t \left[h_t - F(z_t)\right] \qquad (18)$$
$$\quad\;\; = \hat{\boldsymbol{\theta}}_t - \mu \boldsymbol{x}_t \left[a_t - F(-z_t)\right], \qquad (19)$$

where $\mu$ is the adaptation step; with abuse of notation, the fraction $\frac{1}{\sigma'}$ from (16) is absorbed by $\mu$ in (18)-(19).

In the rating context, $\boldsymbol{x}_t$ has only two non-zero terms, and therefore only the rating levels of the players $i_t$ and $j_t$ are modified. By inspection, the update (18)-(19) may be written as a single equation for any player $i \in \{i_t, j_t\}$

$$\hat{\theta}_{t+1,i} = \hat{\theta}_{t,i} + K\left[s_i - F(\Delta_i)\right], \qquad (20)$$

where $\Delta_i = \hat{\theta}_{t,i} - \hat{\theta}_{t,j}$, $j$ is the player opposing the player $i$, i.e., $j \in \{i_t, j_t\}$, $j \neq i$, and $s_i = \mathbb{I}[i \succ j]$ indicates whether the player $i$ won the game. We also replaced $\mu$ with $K$ so that (20) has the form of the Elo algorithm as usually presented in the literature, Elo (2008), (Langville and Meyer, 2012, Ch. 5). Thus, the Elo algorithm implements the SG to obtain the ML estimate of the levels $\boldsymbol{\theta}$ under the model (3). This observation can be found, e.g., in (Király and Qian, 2017, Sec. 3.1.2).
We note that, in the description of the Elo algorithm, Elo (2008), $s_i$ is defined as a numerical "score" attributed to the game outcome $\mathsf{H}$ or $\mathsf{A}$. In a sense, it is a legacy of rating methods which attribute a numerical value to the game result. On the other hand, in the modelling perspective we adopted, $s_i$ is an indicator; we do not need to attribute numerical values to the categorical variables $\mathsf{H}$ and $\mathsf{A}$.
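As an illustration (ours, not the paper's), here is a minimal Python sketch of the on-line Elo update (20); the variable names and the example values are hypothetical.

```python
def elo_update(theta_i, theta_j, s_i, K=20.0, sigma=400.0):
    """One Elo/SG step of eq. (20) for the pair (i, j).

    s_i is the score of player i: 1 for a win, 0 for a loss
    (and 1/2 for a draw in the traditional use of the algorithm).
    Returns the updated levels of both players.
    """
    delta = theta_i - theta_j
    expected = 1.0 / (1.0 + 10.0 ** (-delta / sigma))   # F(Delta_i), eq. (4)
    theta_i_new = theta_i + K * (s_i - expected)
    theta_j_new = theta_j + K * ((1.0 - s_i) - (1.0 - expected))
    return theta_i_new, theta_j_new

# Example: a 1500-rated player beats a 1600-rated player.
print(elo_update(1500.0, 1600.0, s_i=1.0))
```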
3 Draws
We now want to address the issue of draws in the game outcome. As we said, the presence of draws is acknowledged in the Elo algorithm: when a draw occurs, the fractional score $s_i = \frac{1}{2}$ is used in (20). This simplicity may be appealing but should be considered a heuristic, at least from the probabilistic perspective, because there is no formal model underlying the use of the fractional score. Moreover, using the fractional score in ternary games does not tell us how to predict the results of the games from the rating levels.
Thus, the preferred approach is to model the draws explicitly. We must, therefore, augment our model to include the conditional probability of a draw, which is treated as a third outcome of the game

$$\Pr\{i \doteq j \mid \theta_i, \theta_j\} = F_\mathsf{D}(\theta_i - \theta_j), \qquad (21)$$

where $F_\mathsf{D}(z)$ should be decreasing with the absolute value of its argument and be maximized for $z = 0$. The justification is that a large absolute difference in levels increases the probability of a win or a loss, while the proximity of the rating levels, $\theta_i \approx \theta_j$, should increase the probability of a draw.
3.1 Modeling Draws
The literature proposes essentially two approaches compatible with (2) to model the draws: Rao and Kupper (1967), and Davidson (1970).

The model proposed by Rao and Kupper (1967), and used later, e.g., in Fahrmeir and Tutz (1994), Király and Qian (2017), shifts the functions $F_\mathsf{H}(z)$ (to the right) and $F_\mathsf{A}(z)$ (to the left)

$$F_\mathsf{H}(z) = F(z - z_0), \qquad F_\mathsf{A}(z) = F(-z - z_0), \qquad (22)$$

and then, from the law of total probability, we obtain

$$F_\mathsf{D}(z) = 1 - F_\mathsf{A}(z) - F_\mathsf{H}(z) = F(z + z_0) - F(z - z_0). \qquad (23)$$

This modelling approach can also be easily applied if we change the function $F(z)$: for example, Herbrich and Graepel (2006) and Koning (2000) used the Gaussian CDF for $F(z)$. From the statistical perspective, the Rao and Kupper (1967) model may be seen as an example of the ordered logit or probit approach, Koning (2000), depending on whether we use the logistic or the Gaussian CDF as $F(z)$.
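A minimal Python sketch (ours) of the Rao–Kupper probabilities (22)-(23), assuming the logistic $F(z)$ of (4); the threshold value used in the example is arbitrary.

```python
def rao_kupper_probs(z, z0=80.0, sigma=400.0):
    """Win/draw/loss probabilities of the Rao-Kupper model, eqs. (22)-(23)."""
    F = lambda v: 1.0 / (1.0 + 10.0 ** (-v / sigma))
    p_home = F(z - z0)                 # eq. (22), shifted to the right
    p_away = F(-z - z0)                # eq. (22), shifted to the left
    p_draw = 1.0 - p_home - p_away     # eq. (23)
    return p_home, p_draw, p_away

print(rao_kupper_probs(z=100.0))   # the three probabilities sum to one
```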
The model proposed by Davidson (1970) can be defined as

$$F_\mathsf{H}(z) = F(z; \kappa) = \frac{10^{0.5z/\sigma}}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + \kappa}, \qquad (24)$$

$$F_\mathsf{A}(z) = F(-z; \kappa) = \frac{10^{-0.5z/\sigma}}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + \kappa}, \qquad (25)$$

$$F_\mathsf{D}(z) = \kappa\sqrt{F_\mathsf{H}(z) F_\mathsf{A}(z)} = \frac{\kappa}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + \kappa}, \qquad (26)$$

where $\kappa \geq 0$ is a draw parameter we have to define.
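In the same spirit, a short Python sketch (ours) of the Davidson probabilities (24)-(26); the value of κ used in the example is arbitrary.

```python
def davidson_probs(z, kappa=0.7, sigma=400.0):
    """Win/draw/loss probabilities of the Davidson model, eqs. (24)-(26)."""
    a = 10.0 ** (0.5 * z / sigma)
    b = 10.0 ** (-0.5 * z / sigma)
    denom = a + b + kappa
    return a / denom, kappa / denom, b / denom   # (F_H, F_D, F_A)

p_home, p_draw, p_away = davidson_probs(z=100.0)
print(p_home, p_draw, p_away)   # the three probabilities sum to one
```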
More advanced models for the draws were also proposed, e.g., by Glickman (2018), to reflect the fact that strong players tend to draw more often than weak ones when facing similarly skilled opponents.
A priori, both models, Rao and Kupper (1967) as well as Davidson (1970), might be useful and their fit to the data should determine which one we choose. Here, however, we are only interested in unveiling the model underlying the original Elo algorithm, which is stated by the following.

Proposition 1 The Elo algorithm (20) may be derived from the model (24)-(26) via the ML estimation and the SG optimization with $\kappa = 2$ and the scale parameter $\sigma/2$.

The demonstration goes through the derivation of the algorithm based on the model (24)-(26), which yields the Elo-Davidson algorithm.
We recalculate (11) considering the ternary outcomes

$$L_l(\boldsymbol{\theta}) = h_l \log F_\mathsf{H}(z_l) + a_l \log F_\mathsf{A}(z_l) + d_l \log F_\mathsf{D}(z_l) \qquad (27)$$
$$\quad\;\; = \tilde{h}_l \log F(z_l; \kappa) + \tilde{a}_l \log F(-z_l; \kappa) + d_l \log \kappa, \qquad (28)$$

where $\tilde{h}_l = h_l + d_l/2$ and $\tilde{a}_l = a_l + d_l/2 = 1 - \tilde{h}_l$.

The gradient is given by

$$\nabla_{\boldsymbol{\theta}} L_l(\boldsymbol{\theta}) = e_l(z_l)\boldsymbol{x}_l \qquad (29)$$

with

$$e_l(z_l) = \tilde{h}_l \psi(z_l; \kappa) + (\tilde{h}_l - 1)\psi(-z_l; \kappa), \qquad (30)$$

$$\psi(z; \kappa) = \frac{F'(z; \kappa)}{F(z; \kappa)} = \frac{1}{\sigma'}\,\frac{10^{-0.5z/\sigma} + \frac{1}{2}\kappa}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + \kappa} = \frac{1}{\sigma'}G(-z; \kappa), \qquad (31)$$

where $F'(z; \kappa)$ denotes the derivative of $F(z; \kappa)$ with respect to $z$ and, as before, $\sigma' = \sigma\log_{10}\mathrm{e}$; using (31) in (30) yields

$$e_l(z_l) = \frac{1}{\sigma'}\left[\tilde{h}_l - G(z_l; \kappa)\right], \qquad (32)$$

where

$$G(z; \kappa) = \frac{10^{0.5z/\sigma} + \frac{1}{2}\kappa}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + \kappa}. \qquad (33)$$

Using (32) in (17) yields a new, Elo-Davidson, rating algorithm

$$\hat{\theta}_{t+1,i} = \hat{\theta}_{t,i} + K\left[s_i - G(\Delta_i; \kappa)\right], \qquad (34)$$

where $\Delta_i$ and $K$ have the same meaning as in the binary games, and $s_i \in \{0, \frac{1}{2}, 1\}$ indicates the outcome of the game, including a fractional value for the draw.

This is almost the same expression as in the Elo algorithm (20) except for the function $G(\Delta_i; \kappa)$ which replaces $F(\Delta_i)$. Thus, to demonstrate the equivalence between the Elo-Davidson algorithm and the original Elo algorithm, it is enough to use $\kappa = 2$:

$$G(z; 2) = \frac{10^{0.5z/\sigma} + 1}{10^{0.5z/\sigma} + 10^{-0.5z/\sigma} + 2} = \frac{10^{0.25z/\sigma}}{10^{0.25z/\sigma} + 10^{-0.25z/\sigma}}, \qquad (35)$$

and thus $G(2z; 2) = F(z)$. This means that the Elo-Davidson algorithm with $\kappa = 2$ and the scale $\sigma/2$ produces the same results as the Elo algorithm with the scale $\sigma$. In other words, the Elo algorithm is implicitly based on the draw model proposed by Davidson (1970) if we set a particular value of the draw parameter, $\kappa = 2$ (and change the scale). This ends the proof.
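To make Proposition 1 tangible, here is a small Python sketch (ours, with hypothetical parameter values) of the Elo-Davidson update (34), together with a numerical check that, with κ = 2 and the scale halved, the expected score G coincides with the logistic F used by the Elo algorithm.

```python
import numpy as np

def G(z, kappa, sigma):
    """Expected score of the Elo-Davidson algorithm, eq. (33)."""
    a = 10.0 ** (0.5 * z / sigma)
    b = 10.0 ** (-0.5 * z / sigma)
    return (a + 0.5 * kappa) / (a + b + kappa)

def elo_davidson_update(theta_i, theta_j, s_i, K=20.0, kappa=0.7, sigma=400.0):
    """One step of the Elo-Davidson update, eq. (34); s_i is 1, 1/2, or 0."""
    delta = theta_i - theta_j
    g = G(delta, kappa, sigma)
    return theta_i + K * (s_i - g), theta_j + K * ((1.0 - s_i) - (1.0 - g))

# Check of Proposition 1: G with kappa = 2 and scale sigma/2 equals F with scale sigma.
sigma = 400.0
F = lambda z: 1.0 / (1.0 + 10.0 ** (-z / sigma))
z = np.linspace(-800, 800, 9)
assert np.allclose(G(z, kappa=2.0, sigma=sigma / 2), F(z))

print(elo_davidson_update(1500.0, 1600.0, s_i=0.5))   # a draw
```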
The lesson learned is that, despite the apparent simplicity of the Elo algorithm, we should resist the temptation to tweak its parameters. While using the fractional score value $s_i = \frac{1}{2}$ for the draw is now explained, we cannot guarantee that modifying $s_i$ in an arbitrary manner will correspond to a particular probabilistic model. Therefore, rather than tweaking the SG/Elo algorithm (20), the modification should start with the probabilistic model itself.

Can the Elo algorithm be derived from a different rating-outcome model? There is no definitive answer but some clarification is given by the following:

Corollary 1 ML estimation of the ratings via the SG based on the Rao and Kupper (1967) model defined in (22)-(23) does not yield the Elo algorithm, except in the trivial case of binary games.

The demonstration is easy and goes through the derivation of the algorithms following the steps we took when deriving the Elo-Davidson algorithm.
3.2 Home-field advantage and draw-parameter
Let us return to the issue of the HFA, where the home wins $\{y_t = \mathsf{H}\}$ are more frequent than the away wins $\{y_t = \mathsf{A}\}$. In the rating context, this effect is practically always modelled by artificially increasing the level of the home player, e.g., Rao and Kupper (1967), Davidson and Beaver (1977), which corresponds, de facto, to a left shift of all involved conditional probability functions

$$F^\mathrm{hfa}_\mathsf{H}(z) = F_\mathsf{H}(z + \eta\sigma), \qquad F^\mathrm{hfa}_\mathsf{A}(z) = F_\mathsf{A}(z + \eta\sigma), \qquad F^\mathrm{hfa}_\mathsf{D}(z) = F_\mathsf{D}(z + \eta\sigma), \qquad (36)$$

where we normalize the HFA parameter, $\eta \geq 0$, so its value is independent of the scale $\sigma$.⁶

To find the draw and HFA parameters it is possible to run the model fitting algorithm (such as Elo or Elo-Davidson) for different values of $\eta$ and $\kappa$. This is of course a viable approach, but it provides no insight into the relationship between these parameters and prior knowledge about the game. In particular, we might want to relate both parameters, $\eta$ and $\kappa$, to "meta-observations" such as the empirical probabilities obtained by averaging over a large time window

$$p_\mathsf{H} = \frac{1}{N}\sum_{l=1}^{N}\mathbb{I}[y_l = \mathsf{H}], \qquad p_\mathsf{A} = \frac{1}{N}\sum_{l=1}^{N}\mathbb{I}[y_l = \mathsf{A}], \qquad p_\mathsf{D} = \frac{1}{N}\sum_{l=1}^{N}\mathbb{I}[y_l = \mathsf{D}]; \qquad (37)$$

these empirical probabilities should be in agreement with the probabilities predicted by a model which fits the data well.

⁶ We note that (36) is slightly different from (Davidson and Beaver, 1977, Eq. 2.4); with our formulation, the relationship (26), $F_\mathsf{D}(z) = \kappa\sqrt{F_\mathsf{H}(z)F_\mathsf{A}(z)}$, holds for the HFA-dependent functions in (36).
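A trivial Python sketch (ours) of the meta-observations (37), computed from a list of game outcomes encoded as 'H', 'D', or 'A'; the sample list is made up.

```python
results = ['H', 'D', 'A', 'H', 'H', 'D', 'A', 'H']   # hypothetical outcomes y_1, ..., y_N

N = len(results)
p_H = sum(y == 'H' for y in results) / N   # empirical home-win frequency, eq. (37)
p_A = sum(y == 'A' for y in results) / N   # empirical away-win frequency
p_D = sum(y == 'D' for y in results) / N   # empirical draw frequency
print(p_H, p_D, p_A)
```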
Conceptually, the simplest way to address this problem is to assume that the levels $\theta_{t,m}$, $m = 1, \ldots, M$ do not vary with time $t$. Then, the long-term averages in (37) are approximately equal to the expected values

$$p_\mathsf{H} \approx \frac{1}{N}\sum_{t=1}^{N}\Pr\{y_t = \mathsf{H}\} = \frac{2}{M(M-1)}\sum_{\substack{m,n=1\\ m\neq n}}^{M}\Pr\{y = \mathsf{H} \mid \theta_m, \theta_n\}, \qquad (38)$$

where the sum over the time $t$ was replaced with the sum over all possible pairs $\theta_m, \theta_n$, $n \neq m$, which assumes that all pairs of players are uniformly scheduled to play. This condition materializes, e.g., in regular-season team games, where each team faces the same number of opponents.⁷

Then we can rewrite (38) as

$$p_\mathsf{H} \approx \frac{2}{M(M-1)}\sum_{\substack{m,n=1\\ m\neq n}}^{M} F^\mathrm{hfa}_\mathsf{H}(\delta_{m,n}), \qquad (39)$$

where $\delta_{m,n} = \theta_m - \theta_n$.

⁷ For example, in the English Premier League (EPL) all opponents are encountered exactly twice; in the National Hockey League (NHL), each team plays 82 games but local encounters are privileged.
Of course, to calculate (39) we would need to know $\delta_{m,n}$, which means that the model fitting would have to be done in the first place, defying the purpose of our analysis. We will thus rely on the linearization of $F^\mathrm{hfa}_\mathsf{H}(z)$ and assume that all (or at least most of) the $\delta_{m,n}$ fall in the linear region. This is actually the case, as the level differences are mostly small in amplitude, and thus the linearization around $z = 0$ is convenient:

$$F^\mathrm{hfa}_\mathsf{H}(z) \approx F^\mathrm{hfa}_\mathsf{H}(0) + c_\mathsf{H}z, \qquad F^\mathrm{hfa}_\mathsf{A}(z) \approx F^\mathrm{hfa}_\mathsf{A}(0) + c_\mathsf{A}z, \qquad F^\mathrm{hfa}_\mathsf{D}(z) \approx F^\mathrm{hfa}_\mathsf{D}(0), \qquad (40)$$

where $c_\mathsf{H} > 0$ and $c_\mathsf{A} < 0$.

Moreover, since $\sum_{m,n}\delta_{m,n} = 0$, we do not need to specify $c_\mathsf{H}$ as it cancels out in the summation over $\delta_{m,n}$ in (39). Using (40) in (39) and doing the same for $p_\mathsf{A}$ and $p_\mathsf{D}$, we obtain
$$p_\mathsf{H} \approx F^\mathrm{hfa}_\mathsf{H}(0) = \frac{10^{0.5\eta}}{10^{0.5\eta} + 10^{-0.5\eta} + \kappa}, \qquad (41)$$

$$p_\mathsf{A} \approx F^\mathrm{hfa}_\mathsf{A}(0) = \frac{10^{-0.5\eta}}{10^{0.5\eta} + 10^{-0.5\eta} + \kappa}, \qquad (42)$$

$$p_\mathsf{D} \approx F^\mathrm{hfa}_\mathsf{D}(0) = \frac{\kappa}{10^{0.5\eta} + 10^{-0.5\eta} + \kappa}, \qquad (43)$$

which are solved by

$$\eta = \log_{10}\frac{p_\mathsf{H}}{p_\mathsf{A}}, \qquad (44)$$

$$\kappa = \frac{p_\mathsf{D}}{\sqrt{p_\mathsf{H}\,p_\mathsf{A}}}. \qquad (45)$$
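The closed-form relationships (44)-(45) are easy to use in practice; a minimal Python sketch (ours) follows, fed with the illustrative EPL-like frequencies mentioned in footnote 8 (p_H = 0.5, p_A = 0.25, p_D = 0.25), which give η ≈ 0.3 and κ ≈ 0.71.

```python
import math

def hfa_and_draw_parameters(p_H, p_A, p_D):
    """HFA and draw parameters from empirical frequencies, eqs. (44)-(45)."""
    eta = math.log10(p_H / p_A)          # eq. (44)
    kappa = p_D / math.sqrt(p_H * p_A)   # eq. (45)
    return eta, kappa

eta, kappa = hfa_and_draw_parameters(p_H=0.50, p_A=0.25, p_D=0.25)
print(round(eta, 2), round(kappa, 2))    # approximately 0.30 and 0.71
```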
We may look further into the details of (45): denoting the difference between the frequencies of home and away wins as $\Delta = p_\mathsf{H} - p_\mathsf{A}$, from the law of total (empirical) probability we obtain $p_\mathsf{H} = \frac{1}{2}(1 - p_\mathsf{D} + \Delta)$ and $p_\mathsf{A} = \frac{1}{2}(1 - p_\mathsf{D} - \Delta)$, which, used in (45), yields

$$\kappa = \frac{2p_\mathsf{D}}{\sqrt{(1 - p_\mathsf{D})^2 - \Delta^2}} \approx \frac{2p_\mathsf{D}}{1 - p_\mathsf{D}}, \qquad (46)$$

where the approximation is valid when $\Delta$ can be neglected compared to $1 - p_\mathsf{D}$.⁸ In such a case, the inverse of the approximate relationship (46) tells us what is the implicit assumption about $p_\mathsf{D}$ for a given $\kappa$:

$$p_\mathsf{D} \approx \frac{\kappa}{2 + \kappa}. \qquad (47)$$

⁸ Which is quite reasonable in many cases. For example, in the English Premier League it is common to see $p_\mathsf{H} = 0.5$, $p_\mathsf{A} = 0.25$, and $p_\mathsf{D} = 0.25$, which yields $\kappa = 0.71$ using (45), while we obtain $\kappa \approx 0.66$ using the approximation in (46).
Thus, using $\kappa = 2$ (as done implicitly in the Elo algorithm used in the current ratings of FIDE and FIFA), the fit to the data is done assuming $p_\mathsf{D} \approx 0.5$. This is definitely not the case in football games, and materializes in chess only for games between high-rank players. We can thus expect that implementing the Elo-Davidson algorithm with suitable values of $\kappa$ will improve the quality of the fit to the data in many sports where $p_\mathsf{D} < 0.5$.
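A quick numerical check of the approximation (46) and of its inverse (47) may be sketched in Python as follows (our illustration, using the same footnote-8 frequencies); it also confirms that κ = 2 corresponds to p_D ≈ 0.5.

```python
import math

p_H, p_A, p_D = 0.50, 0.25, 0.25
delta = p_H - p_A

kappa_exact = 2 * p_D / math.sqrt((1 - p_D) ** 2 - delta ** 2)   # eq. (46), exact form
kappa_approx = 2 * p_D / (1 - p_D)                               # eq. (46), approximation
print(round(kappa_exact, 2), round(kappa_approx, 2))             # about 0.71 vs 0.67

# Inverse relationship (47): the draw frequency implicitly assumed for a given kappa.
for kappa in (0.67, 2.0):
    print(kappa, round(kappa / (2 + kappa), 2))   # kappa = 2 gives p_D = 0.5
```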
We will also show in Sec. 4 that (44) and (46) predict very accurately the parameters $\kappa$ and $\eta$ which allow us to maximize the fit.
4 Numerical Examples
The main objective now is not to show the advantage of the draw model of Davidson (1970) over other models from the literature, but rather to indicate the benefit arising from an explicit use of the draw parameter $\kappa$ in the Elo-Davidson algorithm when compared to the Elo algorithm. We also want to show that we can predict analytically, via (44) and (46), the values of the fit-maximizing parameters $\kappa$ and $\eta$.

We use the results of the English Premier League football games available at Football-data.co.uk (2019). In this context, there are $M = 20$ teams playing against each other in one home and one away game. We consider one season at a time; thus $t = 1, \ldots, N$ indexes the games in chronological order, and $N = M(M - 1) = 380$.

As in the FIFA rating algorithm, FIFA (2018), we set $\sigma = 600$; the levels are initialized at $\theta_{0,m} = 0$; as we said before, these values are arbitrary. In what follows we always use the normalization $K = \tilde{K}\sigma$, which removes the dependence on the scale: for a given $\tilde{K}$ the prediction results will be exactly the same even if we change the value of $\sigma$.
An example of the estimated ratings $\hat{\theta}_{t,m}$ for a group of teams is shown in Fig. 1 to illustrate the fact that quite a large portion of time at the beginning of the season is dedicated to the convergence of the algorithm; this is the well-known "learning" period characteristic of the SG, also observed in applications of the Elo algorithm. Of course, using a larger step $\tilde{K}$ we can accelerate the learning at the cost of increased variability of the rating. These well-known issues are related to the operation of the SG, but solving them is out of the scope of this work. We mention them mostly because, to evaluate the performance of the algorithms, we decided to use the second half of the season, where we assume the algorithms have converged and the rating levels follow the performance of the teams. This is somewhat arbitrary, of course, but our goal
here is to show the influence of the draw parameter and not to solve the entire problem of convergence/tracking in the SG/Elo algorithms.

Figure 1: Evolution of the rating levels $\hat{\theta}_{t,m}$ for selected English Premier League teams (Arsenal, Aston Villa, Chelsea, Leicester, Norwich, Sunderland, West Ham) in the season 2015-2016; $N = 380$, $\sigma = 600$, $\tilde{K} = 0.125$, $\eta = 0.3$, $\kappa = 0.7$. We assume that the first half of the season absorbs the learning phase, and the tracking of the teams' levels in the second half is free of the initialization effect.
The estimated probabilities of the game results $\{\mathsf{H}, \mathsf{A}, \mathsf{D}\}$, calculated before the game at time $t$ using the rating levels $\hat{\boldsymbol{\theta}}_{t-1}$ obtained at time $t-1$, are denoted as

$$\hat{p}_{t,\mathsf{H}} = F_\mathsf{H}(\boldsymbol{x}_t^\mathrm{T}\hat{\boldsymbol{\theta}}_{t-1}), \qquad \hat{p}_{t,\mathsf{A}} = F_\mathsf{A}(\boldsymbol{x}_t^\mathrm{T}\hat{\boldsymbol{\theta}}_{t-1}), \qquad \hat{p}_{t,\mathsf{D}} = F_\mathsf{D}(\boldsymbol{x}_t^\mathrm{T}\hat{\boldsymbol{\theta}}_{t-1}). \qquad (48)$$

We show the (negative) logarithmic score, Gelman, Hwang, and Vehtari (2014), averaged over the second half of the season

$$\mathrm{LS} = \frac{2}{N}\sum_{l=N/2+1}^{N}\mathrm{LS}_l, \qquad (49)$$

where

$$\mathrm{LS}_l = -\left(h_l\log\hat{p}_{l,\mathsf{H}} + a_l\log\hat{p}_{l,\mathsf{A}} + d_l\log\hat{p}_{l,\mathsf{D}}\right). \qquad (50)$$
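To show how (48)-(50) fit together with the on-line update, here is a compact Python sketch (ours; the data format, the parameter values, and the way the HFA parameter enters the update are our assumptions) that runs the Elo-Davidson algorithm over a season of results and accumulates the logarithmic score on the second half.

```python
import numpy as np

def davidson_probs(z, kappa, eta, sigma):
    """HFA-shifted Davidson probabilities (24)-(26), (36): returns (p_H, p_D, p_A)."""
    a = 10.0 ** (0.5 * (z / sigma + eta))
    b = 10.0 ** (-0.5 * (z / sigma + eta))
    return a / (a + b + kappa), kappa / (a + b + kappa), b / (a + b + kappa)

def season_log_score(games, M=20, K_tilde=0.125, kappa=0.7, eta=0.3, sigma=600.0):
    """games: chronological list of (home_idx, away_idx, result), result in {'H','D','A'}."""
    theta = np.zeros(M)                       # theta_{0,m} = 0
    K = K_tilde * sigma                       # normalized step, K = K~ * sigma
    N, ls_sum = len(games), 0.0
    for t, (i, j, y) in enumerate(games, start=1):
        p_H, p_D, p_A = davidson_probs(theta[i] - theta[j], kappa, eta, sigma)
        if t > N // 2:                        # logarithmic score (49)-(50), second half only
            ls_sum += -np.log({'H': p_H, 'D': p_D, 'A': p_A}[y])
        s = {'H': 1.0, 'D': 0.5, 'A': 0.0}[y]
        g = p_H + 0.5 * p_D                   # expected score, cf. eqs. (32)-(34)
        theta[i] += K * (s - g)               # Elo-Davidson update (34)
        theta[j] += K * ((1.0 - s) - (1.0 - g))
    return 2.0 * ls_sum / N

# Usage with a made-up list of games:
games = [(0, 1, 'H'), (2, 3, 'D'), (1, 2, 'A'), (3, 0, 'H')]
print(season_log_score(games, M=4))
```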
For comparison with the Elo algorithm, we still have to define the prediction of the draw in the latter. Since it is undefined, we will follow the heuristics of Lasek et al. (2013), which may be summarized as follows: the conventional Elo algorithm is used to find the ratings, but the prediction is based on the functions

$$\tilde{F}^\mathrm{hfa}_\mathsf{H}(z) \propto 10^{0.5(z/\sigma + \eta)}, \qquad \tilde{F}^\mathrm{hfa}_\mathsf{A}(z) \propto 10^{-0.5(z/\sigma + \eta)}, \qquad \tilde{F}^\mathrm{hfa}_\mathsf{D}(z) \propto 1, \qquad (51)$$

which are next normalized to satisfy the law of total probability. In fact, this approach can be seen as using two different models: one for adaptation, based on the Elo-Davidson algorithm with $\kappa = 2$ and the scale $\sigma/2$ (see Proposition 1), and another one for prediction, based on the Elo-Davidson algorithm with $\kappa = 1$ and the scale $\sigma$.
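A small Python sketch (ours) of the prediction heuristic (51): the three unnormalized scores are computed and then normalized so that they sum to one; note that the result coincides with the Davidson model with κ = 1.

```python
def elo_draw_heuristic(z, eta=0.3, sigma=600.0):
    """Prediction heuristic (51): unnormalized scores, then normalization."""
    u_H = 10.0 ** (0.5 * (z / sigma + eta))
    u_A = 10.0 ** (-0.5 * (z / sigma + eta))
    u_D = 1.0
    total = u_H + u_A + u_D
    return u_H / total, u_D / total, u_A / total   # (p_H, p_D, p_A)

print(elo_draw_heuristic(z=100.0))
```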
We show in Fig. 2 the logarithmic score LS for different values of the draw parameter $\kappa$ and of the normalized step $\tilde{K}$. We compare our predictions with those based on the probabilities inferred from the odds of the betting site Bet365, available, together with the game results, at Football-data.co.uk (2019).⁹ These are constant reference lines in Fig. 2, as they, of course, do not vary with the parameters we adjust.

⁹ The published decimal odds for the three events, $o_\mathsf{H}$, $o_\mathsf{A}$, and $o_\mathsf{D}$, are used to infer the probabilities, $\tilde{p}_\mathsf{H} \propto \frac{1}{o_\mathsf{H}}$, $\tilde{p}_\mathsf{A} \propto \frac{1}{o_\mathsf{A}}$, and $\tilde{p}_\mathsf{D} \propto \frac{1}{o_\mathsf{D}}$, Király and Qian (2017).
We use two seasons with quite different optimal values of the draw parameter $\kappa$, where we can also appreciate that the simulation-based optimization coincides with the values of $\kappa$ and $\eta$ derived analytically using the approximations in (46) and (44). We observe that introducing the draw parameter $\kappa$ improves the logarithmic score. On the other hand, using the Elo-Davidson algorithm with $\kappa = 2$ yields particularly poor results if we make the model explicit (and thus use $\kappa = 2$ for the prediction); a much better solution is to use the mismatched model for prediction shown in (51); the results obtained are, in general, very close to those obtained using the Elo-Davidson algorithm with $\kappa = 1$. Thus the heuristics of Lasek et al. (2013) might be a viable solution, but only for a relatively large frequency of draws, see Fig. 2b. On the other hand, when the draws are less frequent, as in the season 2013-2014, the advantage of the Elo-Davidson prediction is clear, see Fig. 2a.
5 Conclusions
In this paper we were mainly concerned with explaining the rationale and the mathematical foundation behind the Elo algorithm in ternary games. The whole discussion may be summarized as follows:

• We explained that, in binary-outcome games (win/loss), the Elo algorithm is an instance of the well-known stochastic gradient algorithm applied to solve the ML estimation of the rating levels. This observation already appeared in the literature, e.g., Király and Qian (2017), so it was made for completeness but also to lay the ground for further discussion.
Figure 2: Logarithmic score (49) for the second half of the seasons of the English Premier League with $\sigma = 600$, $\tilde{K} = 0.125$, different values of $\kappa$ indicated in the legend ($\kappa = 0.4$, $0.8$, $1$, $2$, as well as "Elo + (51)" and "Bet365"), and varying the home-advantage parameter $\eta$; a) season 2013-2014, where $p_\mathsf{H} = 0.49$, $p_\mathsf{A} = 0.33$, $p_\mathsf{D} = 0.17$, and thus, using (46), we obtain $\kappa \approx 0.4$ and, from (44), we obtain $\eta \approx 0.16$; b) season 2017-2018, where $p_\mathsf{H} = 0.46$, $p_\mathsf{A} = 0.27$, $p_\mathsf{D} = 0.26$, and thus $\kappa \approx 0.75$ and $\eta \approx 0.25$. The results "Elo + (51)" are obtained from the conventional Elo algorithm with (51) used for the prediction. The results "Bet365" are based on the probabilities inferred from the betting odds offered by the site Bet365.
• We have shown the model behind the Elo algorithm in the case of games with draws. Although the algorithm has been used for decades in this type of games, the model of the draws had not been shown, impeding, de facto, the formal probabilistic interpretation of the results. We thus filled this logical gap, showing at the same time a natural generalization of the Elo algorithm obtained from the well-known model proposed by Davidson (1970); the resulting algorithm, which we call Elo-Davidson, has the same structure as the original Elo algorithm, yet provides an additional parameter to adjust to the frequency of draws.

• We proposed an analytical approach to estimate the home-field advantage and the draw parameters from the known frequencies of win/draw/loss. By extension, we revealed that the implicit model behind the Elo algorithm assumes a frequency of draws close to 50%.

• To illustrate the main concepts, we have shown numerical examples based on the results of the football games in the English Premier League.

Finally, we conclude that, while in the past the Elo algorithm has satisfied to a large extent the demand for simple rating algorithms, it is still possible to provide better, more flexible, and yet equally simple solutions. In particular, the Elo-Davidson algorithm we derived is more flexible, allowing us to adjust to the frequency of draws. Yet, it preserves the simplicity of the Elo algorithm and should be considered its natural generalization: to switch from the Elo to the Elo-Davidson algorithm, only the non-linear function needs to be modified. In fact, such a modification already happened in the early history of the Elo algorithm's use for chess rating, when the Gaussian CDF was replaced by the logistic function, which was found to provide a better fit to the data.
Acknowledgments
Many thanks to J.-C. Gregoire (INRS, Canada) and E. V. Kuhn (Federal
University of Santa Catarina, Brazil) for critical reading.
References
Aldous, D. (2017): “Elo ratings and the sports model: A neglected topic in
applied probability?” Statist. Sci., 32, 616–629, URL https://doi.org/
10.1214/17-STS628.
Bishop, C. (2006): Pattern Recognition and Machine Learning, Springer.
Bradley, R. A. and M. E. Terry (1952): “Rank analysis of incomplete block designs: I. The method of paired comparisons,” Biometrika, 39, 324–345.
Caron, F. and A. Doucet (2012): “Efficient Bayesian inference for generalized
Bradley–Terry models,” Journal of Computational and Graphical Statistics,
21, 174–196, URL https://doi.org/10.1080/10618600.2012.638220.
Cattelan, M. (2012): “Models for paired comparison data: A review with
emphasis on dependent data,” Statist. Sci., 27, 412–433.
David, H. (1963): The Method of Paired Comparison, Charles Griffin Co. Ltd.
Davidson, R. R. (1970): “On extending the Bradley-Terry model to accom-
modate ties in paired comparison experiments,” Journal of the American
Statistical Association, 65, 317–328, URL http://www.jstor.org/stable/
2283595.
Davidson, R. R. and R. J. Beaver (1977): “On extending the Bradley-Terry
model to incorporate within-pair order effects,” Biometrics, 33, 693–702.
Elo, A. E. (2008): The Rating of Chess Players, Past and Present, Ishi Press
International.
Fahrmeir, L. and G. Tutz (1994): “Dynamic stochastic models for time-dependent ordered paired comparison systems,” Journal of the American Statistical Association, 89, 1438–1449.
FIFA (2018): “Fédération Internationale de Football Association: men’s ranking procedure,” URL https://www.fifa.com/fifa-world-ranking/procedure/.
Football-data.co.uk (2019): “Historical football results and betting odds
data,” URL https://www.football-data.co.uk/data.php.
Gelman, A., J. Hwang, and A. Vehtari (2014): “Understanding predictive information criteria for Bayesian models,” Statistics and Computing, 24, 997–1016, URL https://doi.org/10.1007/s11222-013-9416-2.
Glickman, M. (2018): “Paired comparison models with
tie probabilities and order effects as a function of
strength,” URL http://www.fields.utoronto.ca/talks/
Paired-Comparison-Models-Tie-Probabilities-and-Order-Effects-Function-Strength.
Glickman, M. E. (1999): “Parameter estimation in large dynamic paired
comparison experiments,” Journal of the Royal Statistical Society: Series
C (Applied Statistics), 48, 377–394, URL http://dx.doi.org/10.1111/
1467-9876.00159.
Herbrich, R. and T. Graepel (2006): “Trueskill(TM): A
Bayesian skill rating system,” Technical report, URL
https://www.microsoft.com/en-us/research/publication/
trueskilltm-a-bayesian-skill-rating-system-2/.
Joe, H. (1990): “Extended use of paired comparison models, with application
to chess rankings,” Journal of the Royal Statistical Society. Series C (Applied
Statistics), 39, 85–93, URL http://www.jstor.org/stable/2347814.
Király, F. J. and Z. Qian (2017): “Modelling Competitive Sports: Bradley-Terry-Elo Models for Supervised and On-Line Learning of Paired Competition Outcomes,” arXiv e-prints, arXiv:1701.08055.
Koning, R. H. (2000): “Balance in competition in Dutch soccer,” Jour-
nal of the Royal Statistical Society: Series D (The Statistician), 49, 419–
431, URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/
1467-9884.00244.
Langville, A. N. and C. D. Meyer (2012): Who’s #1, The Science of Rating
and Ranking, Princeton University Press.
Lasek, J., Z. Szlávik, and S. Bhulai (2013): “The predictive power of ranking systems in association football,” International Journal of Applied Pattern Recognition, 1, 27–46, URL https://www.inderscienceonline.com/doi/abs/10.1504/IJAPR.2013.052339.
Rao, P. V. and L. L. Kupper (1967): “Ties in paired-comparison experiments:
A generalization of the Bradley-Terry model,” Journal of the American
Statistical Association, 62, 194–204, URL https://amstat.tandfonline.
com/doi/abs/10.1080/01621459.1967.10482901.
Thurstone, L. L. (1927): “A law of comparative judgment,” Psychological Review, 34, 273–286.
Wikipedia contributors (2019): “Wikipedia: elo rating system,” URL https:
//en.wikipedia.org/wiki/Elo_rating_system.
Linear paired comparison models are studied when ties or draws are allowed, where the probability of a win plus half the probability of a draw is modelled as a symmetric cumulative function evaluated at the difference of strength parameters. These models are extended to make use of covariate information and used for ranking 64 top chess players since 1800 with information on career periods. The Davidson model which allows for draws does not fit chess data well because of the large variability in draw percentages from player to player. Some new methodology including an analysis of an appropriate goodness-of-fit test is given for this extended use of the linear model.