Bayesian modelling of football outcomes:
Using the Skellam’s distribution for the goal difference
Dimitris Karlis and Ioannis Ntzoufras
Department of Statistics, Athens University of Economics and Business,
Athens, GREECE; e-mails: {karlis, ntzoufras}@aueb.gr .
ABSTRACT
Modelling football match outcomes is becoming increasingly popular nowadays
for both team managers and betting fans. Most of the existing literature
deals with modelling the number of goals scored by each team. In the present
paper we work in a different direction. Instead of modelling the number of goals
directly, we focus on the difference of the number of goals, i.e. the margin of
victory. We recast interest in the so-called Skellam distribution. Modelling the
differences instead of the scores themselves has some major advantages. Firstly,
we eliminate the correlation induced by the fact that the two opposing teams compete
against each other and, secondly, we do not assume that the goals scored by each team
are marginally Poisson distributed. Application of the Bayesian methodology for
the Skellam’s distribution using covariates is discussed. Illustrations using real
data from the English Premiership for the season 2006-2007 are provided. The
advantages of the proposed approach are also discussed.
Key Words: Goal difference, Overdispersion, Poisson difference, Skellam’s Dis-
tribution, Soccer, Zero Inflated Models.
1 Introduction
In recent years, an increasing interest has been observed concerning models related to
football (soccer). The increasing popularity of football modelling and prediction is mainly
due to two distinct reasons. Firstly, the market related to football has grown considerably
in recent years; modern football teams are profitable companies, usually with large investments
and budgets, while the interest of sports fans in football is extremely large. The
second reason is betting. The amount spent on bets has increased dramatically in Europe.
As a result, the demand for models which provide good predictions for the outcome of a
football game arises. Since bets are becoming more complicated, more complicated and
more refined models are needed.
The statistical literature contains a series of models for this purpose. Some of them
model directly the probability of a game outcome (win/loss/draw) while others formulate and
predict the match score. On the other hand, the type of models applied varies according to the
geographical location of each football league or tournament indicating that the characteristics
of the game may be influenced by local features.
The Poisson distribution has been widely used as a simple modelling approach for de-
scribing the number of goals in football (see, for example, Lee, 1997). This assumption can
be questionable in certain leagues where overdispersion (sample variance exceeds the sample
mean) has been observed in the number of goals. In addition, empirical evidence has shown
a (relatively low) correlation between the goals in a football game. This correlation must be
incorporated in the model.
Maher (1982) discussed this issue, while Dixon and Coles (1997) extended the inde-
pendent Poisson model introducing indirectly a type of dependence. Moreover, Karlis and
Ntzoufras (2003) extended the bivariate Poisson model with diagonal inflation in order to
account for the increased (relative to the simple bivariate Poisson model) draws observed
in certain leagues. This inflation produces non-Poisson marginal distributions that can be
overdispersed.
The present paper moves in a different direction. Instead of modelling jointly the number
of goals scored by each team, we focus on the goal difference. In this way, we remove the
effect of the correlation between the scoring performance of the two competing teams, while
the proposed model does not assume Poisson marginals. The model can be used to predict
the outcome of the game as well as for betting purposes related to the Asian handicap.
However, the model cannot predict the final score.
We proceed using the Bayesian approach concerning the estimation for the model param-
eters along the lines of Karlis and Ntzoufras (2006). The Bayesian approach is suitable for
modelling sports outcomes in general, since it allows the user to incorporate any available
information about each game via the prior distribution. Information that can be incorpo-
rated in the model can be based on historical knowledge or data, weather conditions or the
fitness of a team. Finally, the Bayesian approach naturally allows for predictions via the
predictive distribution. This allows us to predict future games and to produce a posterior predictive
distribution for future scores and outcomes, or even to reproduce the whole tournament and
produce quantitative measures concerning the performance of each team.
The remainder of the paper proceeds as follows: Section 2 describes the proposed model
and provides some properties and interesting points that will assist us to understand the
behavior of the model and its interpretation. A zero-inflated model is also described in
order to capture the (possible) excess of draws in a league. In section 3, we describe the
Bayesian approach used to estimate the model. A real data application using the results
of the English Premier League for the season 2006-2007 is described in section 4. Finally,
concluding remarks and further work are discussed in section 5.
2 The Model
2.1 Derivation
Consider two discrete random variables $X$ and $Y$ and their difference $Z = X - Y$. The
probability function of the difference $Z$ is a discrete distribution defined on the set of integer
numbers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. Although publications concerning distributions
defined on $\mathbb{Z}$ are rare, the difference of two independent Poisson random variables has been
discussed by Irwin (1937) for the case of equal means and Skellam (1946) for the case of
different means.
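The Skellam’s distribution (or Poisson difference distribution) is defined as the distribution
of a random variable $Z$ with probability function
$$f_{PD}(z \mid \lambda_1, \lambda_2) = P(Z = z \mid \lambda_1, \lambda_2) = e^{-(\lambda_1+\lambda_2)} \left(\frac{\lambda_1}{\lambda_2}\right)^{z/2} I_{|z|}\!\left(2\sqrt{\lambda_1 \lambda_2}\right) \tag{2.1}$$
for all $z \in \mathbb{Z}$, $\lambda_1, \lambda_2 > 0$, where $I_r(x)$ is the modified Bessel function of order $r$ (see
Abramowitz and Stegun, 1974, pp. 375) given by
$$I_r(x) = \left(\frac{x}{2}\right)^{r} \sum_{k=0}^{\infty} \frac{\left(x^2/4\right)^{k}}{k!\,\Gamma(r+k+1)}.$$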
We will denote as $PD(\lambda_1, \lambda_2)$ the distribution with probability function given in (2.1).
Although the Skellam’s distribution was originally derived as the difference of two indepen-
dent Poisson random variables, it can be also derived as the difference of distributions which
have a specific trivariate latent variable structure.
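The probability function (2.1) can be evaluated with nothing more than the series definition of the Bessel function. The following is a minimal sketch, not the authors' implementation: the function names `log_bessel_i` and `skellam_pmf`, the truncation at 200 series terms and the example rates 1.5 and 1.1 are all assumptions made for illustration.

```python
import math

def log_bessel_i(order, x, terms=200):
    """Log of the modified Bessel function I_order(x), x > 0, from its power series."""
    half_log = math.log(x / 2.0)
    # log of the k-th series term: (x/2)^(order + 2k) / (k! * Gamma(order + k + 1))
    logs = [
        (order + 2 * k) * half_log - math.lgamma(k + 1) - math.lgamma(order + k + 1)
        for k in range(terms)
    ]
    peak = max(logs)  # log-sum-exp keeps the summation numerically stable
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def skellam_pmf(z, lam1, lam2):
    """Probability function (2.1) of the Poisson difference distribution."""
    log_p = (-(lam1 + lam2)
             + 0.5 * z * math.log(lam1 / lam2)
             + log_bessel_i(abs(z), 2.0 * math.sqrt(lam1 * lam2)))
    return math.exp(log_p)

# Hypothetical rates, chosen only for illustration (not estimates from the paper).
probabilities = {z: skellam_pmf(z, 1.5, 1.1) for z in range(-3, 4)}
home_win = sum(skellam_pmf(z, 1.5, 1.1) for z in range(1, 41))
```

Working on the log scale avoids overflow in the factorial and Gamma terms; for the small rates typical of football the series converges after a handful of terms, so the truncation point is generous.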
Lemma: For any pair of variables $(X, Y)$ that can be written as $X = W_1 + W_3$, $Y = W_2 + W_3$
with $W_1 \sim Poisson(\lambda_1)$, $W_2 \sim Poisson(\lambda_2)$ and $W_3$ following any distribution with parameter
vector $\theta_3$, the difference $Z = X - Y$ follows a $PD(\lambda_1, \lambda_2)$ distribution.
The proof of the above Lemma is straightforward. It is however interesting that the
joint distribution of X, Y is a bivariate distribution with correlation induced by the common
stochastic component in both variables W3. For example, if W3follows a Poisson distribution
then the joint distribution is the bivariate Poisson which has been used for modelling scores
(see Karlis and Ntzoufras, 2003). In addition, the marginal distributions for Xand Ywill be
Poisson distributed only in the case where W3is degenerate at zero or a Poisson distributed
random variable. Therefore, in the general formulation, the marginal distributions of Xand
Yare not any more Poisson but they are defined as the convolution of a Poisson random
variable with another discrete random variable of any distributional form. Thus the marginal
distribution can be overdispersed or even underdispersed relative to a Poisson distribution
and, hence, a large portion of the distributional assumptions concerning the number of goals
scored by each team is removed. This underlines the efficiency and the flexibility of our
proposed model.
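The argument can be written out in one line, since the common component cancels in the difference. The sketch below assumes, beyond what the Lemma states explicitly, that $W_1$ and $W_2$ are independent of each other and that $W_3$ is independent of both with finite variance:

$$Z = X - Y = (W_1 + W_3) - (W_2 + W_3) = W_1 - W_2 \sim PD(\lambda_1, \lambda_2), \qquad Cov(X, Y) = Var(W_3) \ge 0.$$

Under the same assumptions $E(X) = \lambda_1 + E(W_3)$ and $Var(X) = \lambda_1 + Var(W_3)$, so the marginal distribution of $X$ is overdispersed when $Var(W_3) > E(W_3)$ and underdispersed when $Var(W_3) < E(W_3)$, while the distribution of $Z$ is unaffected.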
Although the above lemma implies that the type of the goal difference distribution will
be the same regardless of the existence or the type of association between the two variables,
this does not imply that the parameter estimates and their interpretation will be the same.
Finally, the trivariate reduction scheme used to define the PD distribution provides a suitable
data augmentation scheme that can be used efficiently for constructing the estimation
algorithm (see section 3.3 for details).
The expected value of the $PD(\lambda_1, \lambda_2)$ distribution is given by $E(Z) = \lambda_1 - \lambda_2$ while the
variance is $Var(Z) = \lambda_1 + \lambda_2$. Note that, for the range of mean values observed in football
games, the distribution cannot be sufficiently approximated by the normal distribution and,
hence, inference based on simple normal regression can be misleading. Additional properties
of the distribution are described in Karlis and Ntzoufras (2006).
2.2 A model for the goal difference.
We can use the Poisson difference distribution to model goal difference by specifying as the
response variable the goal difference in game i. Hence we specify
$$Z_i = X_i - Y_i \sim PD(\lambda_{1i}, \lambda_{2i})$$
for $i = 1, 2, \ldots, n$, where $n$ is the number of games and $X_i$ and $Y_i$ are the numbers of goals scored
by the home and away team respectively in game $i$. Concerning the model parameters
$\lambda_{1i}$, $\lambda_{2i}$, we adopt the same structure as in simple or bivariate Poisson models used for the
number of goals scored by each team (see Lee, 1997, Karlis and Ntzoufras, 2003 respectively).
Therefore, we set
$$\log(\lambda_{1i}) = \mu + H + A_{HT_i} + D_{AT_i} \tag{2.2}$$
$$\log(\lambda_{2i}) = \mu + A_{AT_i} + D_{HT_i} \tag{2.3}$$
where $\mu$ is a constant parameter, $H$ is the home effect, $A_k$ and $D_k$ are the ‘net’ attacking
and defensive parameters of team $k$ after removing correlation, and $HT_i$ and $AT_i$ are the home
and away teams competing against each other in game $i$.
Note that for parameters $A_k$ and $D_k$ we propose to use the sum-to-zero constraints in
order to make the model identifiable. Therefore we need to impose the constraints
$$\sum_{k=1}^{K} A_k = 0 \quad \text{and} \quad \sum_{k=1}^{K} D_k = 0 \tag{2.4}$$
where $K$ is the number of different teams competing against each other in the available dataset.
Under the above parametrization all parameters have a straightforward interpretation since
$H$ is the expected goal difference in a game where two opponent teams have the same
attacking and defensive skills, $\mu$ is a constant parameter corresponding to the PD parameter
for the away team in the same case, while $A_k$ and $D_k$ can be interpreted as deviations of the
‘net’ attacking and defensive abilities from a team of moderate performance.
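To make the link between (2.2), (2.3) and the constraints (2.4) concrete, the sketch below computes the two PD parameters for a single game. It is an illustration only: the helper names `complete_sum_to_zero` and `game_rates`, the three-team league and all numerical values are hypothetical and are not estimates from this paper.

```python
import math

def complete_sum_to_zero(free_effects):
    """Prepend the effect of the omitted team so that all K effects sum to zero (2.4)."""
    return [-sum(free_effects)] + list(free_effects)

def game_rates(mu, home, attack, defense, home_team, away_team):
    """Return (lambda_1i, lambda_2i) following equations (2.2) and (2.3)."""
    lam1 = math.exp(mu + home + attack[home_team] + defense[away_team])
    lam2 = math.exp(mu + attack[away_team] + defense[home_team])
    return lam1, lam2

# Hypothetical league with K = 3 teams, indexed 0, 1, 2.
attack = complete_sum_to_zero([0.10, -0.30])   # effect of team 0 is about 0.20
defense = complete_sum_to_zero([0.05, 0.15])   # effect of team 0 is about -0.20
lam1, lam2 = game_rates(mu=0.0, home=0.3, attack=attack, defense=defense,
                        home_team=0, away_team=2)
expected_goal_difference = lam1 - lam2         # E(Z) = lambda_1 - lambda_2
```

Only $K-1$ attacking and $K-1$ defensive effects are free; the remaining one is implied by the constraint, which is why the helper rebuilds the full vector before the rates are computed.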
2.3 Zero-Inflated version of the model.
As we have already mentioned, the number of draws (and the corresponding probability)
is underestimated by Poisson based models used for the number of goals scored by each
team. For this reason, we can specify the zero inflated version of Skellam’s distribution to
model the possible excess of draws. Hence, we can define the zero inflated Poisson difference
(ZPD) distribution as the one with probability function
$$f_{ZPD}(0 \mid p, \lambda_1, \lambda_2) = p + (1-p)\, f_{PD}(0 \mid \lambda_1, \lambda_2) \tag{2.5}$$
$$f_{ZPD}(z \mid p, \lambda_1, \lambda_2) = (1-p)\, f_{PD}(z \mid \lambda_1, \lambda_2), \quad \text{for } z \in \mathbb{Z} \setminus \{0\}, \tag{2.6}$$
where $p \in (0, 1)$ and $f_{PD}(z \mid \lambda_1, \lambda_2)$ is given by (2.1).
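As a quick check on (2.5) and (2.6), one can write $Z = (1 - \delta) W$ with $\delta \sim Bernoulli(p)$ independent of $W \sim PD(\lambda_1, \lambda_2)$; the symbols $\delta$ and $W$ are introduced here only for this derivation. The probabilities then sum to one and the first two moments follow directly:

$$\sum_{z \in \mathbb{Z}} f_{ZPD}(z \mid p, \lambda_1, \lambda_2) = p + (1-p) \sum_{z \in \mathbb{Z}} f_{PD}(z \mid \lambda_1, \lambda_2) = 1,$$
$$E(Z) = (1-p)(\lambda_1 - \lambda_2), \qquad Var(Z) = (1-p)(\lambda_1 + \lambda_2) + p(1-p)(\lambda_1 - \lambda_2)^2.$$

The inflation therefore shrinks the expected goal difference towards zero, which is the intended effect when draws are more frequent than the PD model alone would suggest.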
3 Bayesian Inference
3.1 The prior distributions
To fully specify a Bayesian model, we need to specify the prior distribution. When no
information is available, we propose to use normal prior distributions for the parameters
of the PD model with mean equal to zero and large variance (e.g. $10^4$) to express prior
ignorance. For the mixing proportion $p$ used in the zero inflated version of the proposed
model, we propose a uniform distribution defined in the (0,1) interval. This set of priors
was used in the analysis of the Premier league which follows.
Nevertheless, the Bayesian approach offers the ability to incorporate external information
to our inference via our prior distribution. For example, when a last minute injury is reported
or the weather conditions support one of the two competing teams, then this information
can be easily used to specify our prior distributions. Also, prior elicitation techniques can be
employed in order to produce a prior distribution by extracting information from experts on the
topic such as sport analysts and bookmakers. In this case, more general prior distributions
can be used. For example we can use normal prior distributions with small variance centered
at a certain value for the parameters of the PD model and a Beta prior for the mixing
proportion for the zero inflated model.
Finally, the Bayesian approach can be used sequentially by using the posterior distribution from
the previous fixture as a prior distribution, thereby updating our model much faster.
3.2 The posterior distributions
In the Bayesian approach, the inference is based on the posterior distribution of the model
parameters $\theta$. In the PD model we consider the parameter vector
$$\theta = (\mu, H, A_2, \ldots, A_K, D_2, \ldots, D_K)$$
and we need to calculate the posterior distribution
$$f(\theta \mid \mathbf{z}) = \frac{f_{PD}(\mathbf{z} \mid \theta)\, f(\theta)}{\int f_{PD}(\mathbf{z} \mid \theta)\, f(\theta)\, d\theta}$$
where $\mathbf{z}$ is an $n \times 1$ vector with the observed goal differences, $f(\theta)$ is the joint prior distribution,
which is here defined as the product of independent normal distributions, and $f_{PD}(\mathbf{z} \mid \theta)$ is
the model likelihood
$$f_{PD}(\mathbf{z} \mid \theta) = \prod_{i=1}^{n} f_{PD}(z_i \mid \lambda_{1i}, \lambda_{2i})$$
with $f_{PD}(z \mid \lambda_1, \lambda_2)$ given by (2.1) and $\lambda_{1i}$, $\lambda_{2i}$ by (2.2) and (2.3) respectively. The attacking
and defensive abilities of the omitted team are simply calculated via the constraints (2.4) and
therefore $A_1$ and $D_1$ will be substituted in the likelihood by
$$A_1 = -\sum_{k=2}^{K} A_k \quad \text{and} \quad D_1 = -\sum_{k=2}^{K} D_k.$$
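The numerator of the posterior can be evaluated directly on the log scale, which is all that a generic MCMC sampler requires. The sketch below is one possible way to code it and is not the authors' implementation: `log_pd_pmf` evaluates (2.1) through the equivalent sum over latent Poisson counts rather than through the Bessel function, and the function names, the truncation at 150 terms and the data layout (`games` as triples of 0-based team indices and goal difference) are assumptions.

```python
import math

def log_pd_pmf(z, lam1, lam2, terms=150):
    """Log of (2.1), computed as a sum over the latent away count w2 (with w1 = w2 + z)."""
    start = max(0, -z)
    logs = []
    for w2 in range(start, start + terms):
        w1 = w2 + z
        logs.append(-lam1 + w1 * math.log(lam1) - math.lgamma(w1 + 1)
                    - lam2 + w2 * math.log(lam2) - math.lgamma(w2 + 1))
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def log_posterior(theta, games, n_teams, prior_var=1e4):
    """Unnormalised log posterior of the PD model.

    theta = [mu, H, A_2..A_K, D_2..D_K]; games = [(home, away, z), ...].
    """
    mu, home = theta[0], theta[1]
    free_a = theta[2:2 + n_teams - 1]
    free_d = theta[2 + n_teams - 1:]
    attack = [-sum(free_a)] + list(free_a)    # A_1 from the constraint (2.4)
    defense = [-sum(free_d)] + list(free_d)   # D_1 from the constraint (2.4)
    # Independent N(0, prior_var) priors; additive constants are dropped.
    value = -sum(t * t for t in theta) / (2.0 * prior_var)
    for h, a, z in games:
        lam1 = math.exp(mu + home + attack[h] + defense[a])
        lam2 = math.exp(mu + attack[a] + defense[h])
        value += log_pd_pmf(z, lam1, lam2)
    return value
```

For example, `log_posterior([0.0] * 6, [(0, 1, 1), (1, 2, 0)], n_teams=3)` evaluates the function for a hypothetical three-team league with all effects set to zero.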
Note that the approach is similar for the zero inflated version but, in this case, we have
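to additionally estimate the mixing proportion $p$. Hence the posterior is given by
$$f(\theta, p \mid \mathbf{z}) = \frac{\prod_{i=1}^{n} f_{ZPD}(z_i \mid \lambda_{1i}, \lambda_{2i}, p)\, f(\theta)\, f(p)}{\int\!\!\int \prod_{i=1}^{n} f_{ZPD}(z_i \mid \lambda_{1i}, \lambda_{2i}, p)\, f(\theta)\, f(p)\, d\theta\, dp}$$
where $f(p)$ is the prior of the mixing proportion $p$ and $f_{ZPD}(z \mid \lambda_1, \lambda_2, p)$ is given by (2.5) and
(2.6).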
Inference concerning the components of the parameter vector $\theta$ (and $p$) can be based
on the posterior summaries of the marginal posterior distributions (mean, median, standard
deviation and quantiles). The above posterior distribution is not analytically tractable.
For this reason, we use Markov chain Monte Carlo (MCMC) algorithms to generate values
from the posterior distribution and hence estimate the posterior distributions of interest and
their corresponding measures of fit. In the next section, we provide brief details on how to
implement MCMC in our proposed model.
3.3 The Markov chain Monte Carlo algorithm
Our approach is based on the sampling augmentation scheme proposed by Karlis and Ntzoufras
(2006). Hence, a key element for constructing an MCMC algorithm for the proposed
PD and ZPD models is to generate the augmented data $w_{1i}$ and $w_{2i}$ for the PD model and,
additionally, the latent binary indicators $\delta_i$ for the ZPD model. The first set will be used
to specify the observed data $z_i$ as a difference of two Poisson distributed variables, while the
latter will be used in the ZPD model to identify from which component we get the observed
difference $z_i$ (i.e. from the PD component or from the inflated one).
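Hence, in each iteration of the MCMC algorithm we
• Generate latent data $w_{1i}$ and $w_{2i}$ from
$$f(w_{1i}, w_{2i} \mid z_i = w_{1i} - w_{2i}, \lambda_{1i}, \lambda_{2i}) \propto \frac{\lambda_{1i}^{w_{1i}}}{w_{1i}!}\, \frac{\lambda_{2i}^{w_{2i}}}{w_{2i}!}\, I(z_i = w_{1i} - w_{2i})$$
where $I(E) = 1$ if $E$ is true and zero otherwise.
• Generate latent binary indicators $\delta_i$ from
$$f(\delta_i \mid z_i, \lambda_{1i}, \lambda_{2i}) \cong Bernoulli(\tilde{p}_i) \quad \text{with} \quad \tilde{p}_i = \frac{p}{p + (1-p)\, f_{PD}(z_i \mid \lambda_{1i}, \lambda_{2i})}.$$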
Concerning the simulation of the augmented data $(w_{1i}, w_{2i})$ used in the PD model, we propose
to use the following Metropolis-Hastings step:
• If $z_i < 0$ and $(w_{1i}, w_{2i})$ are the current values of the augmented data then
– Propose $w'_{1i} \sim Poisson(\lambda_{1i})$ and $w'_{2i} = w'_{1i} - z_i$.
– Accept the proposed move with probability $\alpha = \min\left\{1,\ \lambda_{2i}^{(w'_{1i} - w_{1i})}\, \frac{(w_{1i} - z_i)!}{(w'_{1i} - z_i)!}\right\}$.
• If $z_i \ge 0$ and $(w_{1i}, w_{2i})$ are the current values of the augmented data then
– Propose $w'_{2i} \sim Poisson(\lambda_{2i})$ and $w'_{1i} = w'_{2i} + z_i$.
– Accept the proposed move with probability $\alpha = \min\left\{1,\ \lambda_{1i}^{(w'_{2i} - w_{2i})}\, \frac{(w_{2i} + z_i)!}{(w'_{2i} + z_i)!}\right\}$.
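The Metropolis-Hastings step above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the names `sample_poisson` and `update_augmented` are hypothetical, the Poisson generator uses Knuth's multiplication method (adequate for small rates, chosen here only to avoid external libraries), and the acceptance ratio is evaluated on the log scale with `math.lgamma` in place of the factorials.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def update_augmented(w1, w2, z, lam1, lam2, rng):
    """One Metropolis-Hastings update of (w1, w2) subject to w1 - w2 == z."""
    if z < 0:
        new_w1 = sample_poisson(lam1, rng)
        new_w2 = new_w1 - z
        # log of lam2^(w1' - w1) * (w1 - z)! / (w1' - z)!
        log_alpha = ((new_w1 - w1) * math.log(lam2)
                     + math.lgamma(w1 - z + 1) - math.lgamma(new_w1 - z + 1))
    else:
        new_w2 = sample_poisson(lam2, rng)
        new_w1 = new_w2 + z
        # log of lam1^(w2' - w2) * (w2 + z)! / (w2' + z)!
        log_alpha = ((new_w2 - w2) * math.log(lam1)
                     + math.lgamma(w2 + z + 1) - math.lgamma(new_w2 + z + 1))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return new_w1, new_w2
    return w1, w2

# Usage with hypothetical rates: start from the smallest counts compatible with z.
rng = random.Random(2024)
z = -2
w1, w2 = max(z, 0), max(-z, 0)
for _ in range(100):
    w1, w2 = update_augmented(w1, w2, z, 1.4, 1.1, rng)
```

Proposing only one of the two counts and deriving the other from $z_i$ guarantees that every proposal respects the constraint $z_i = w_{1i} - w_{2i}$, so no proposal is wasted on an impossible pair.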
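Given the augmented data $(w_{1i}, w_{2i}, \delta_i)$, the parameters $\theta$ can be generated as in simple
Poisson log-linear models with data $y = (\mathbf{w}_1^{PD}, \mathbf{w}_2^{PD})$, where $\mathbf{w}_1^{PD}$, $\mathbf{w}_2^{PD}$ are vectors with elements
the $w_{1i}$ and $w_{2i}$ for which $\delta_i = 0$. The conditional posterior distributions will be given by
$$f(\theta \mid p, \boldsymbol{\delta}, \mathbf{w}_1, \mathbf{w}_2) \propto \prod_{i=1}^{n} \left[f_P(w_{1i} \mid \lambda_{1i})\, f_P(w_{2i} \mid \lambda_{2i})\right]^{1-\delta_i} f(\theta)$$
and
$$f(p \mid \theta, \boldsymbol{\delta}, \mathbf{w}_1, \mathbf{w}_2) \propto p^{\sum_{i=1}^{n} \delta_i}\, (1-p)^{n - \sum_{i=1}^{n} \delta_i}\, f(p).$$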
Note that the PD model is obtained by setting $\delta_i = 0$ for all observations (and $p = 0$
respectively). In the case that we use a Beta prior distribution with parameters $a$ and $b$
for $p$, the above conditional posterior will also be Beta with parameters $\sum_{i=1}^{n} \delta_i + a$ and
$n - \sum_{i=1}^{n} \delta_i + b$. When we wish to impose additional covariates on the mixing proportion
then the parameters can be generated as in the case of a simple logistic regression model
having as a response the latent binary indicators $\boldsymbol{\delta}$.
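The two ZPD-specific updates, the indicators $\delta_i$ and the mixing proportion $p$ under a Beta prior, can be sketched as follows. The function names are hypothetical and the value of $f_{PD}(0 \mid \lambda_{1i}, \lambda_{2i})$ is passed in as an argument so that the block stays self-contained. One detail is made explicit here as an assumption consistent with (2.6): a non-zero difference can only arise from the PD component, so $\delta_i = 0$ whenever $z_i \neq 0$.

```python
import random

def sample_indicator(z, p, pd_prob_zero, rng):
    """Draw delta_i; pd_prob_zero is f_PD(0 | lam1, lam2) for this game."""
    if z != 0:
        return 0  # a non-zero difference cannot come from the inflated component
    p_tilde = p / (p + (1.0 - p) * pd_prob_zero)
    return 1 if rng.random() < p_tilde else 0

def sample_mixing_proportion(deltas, a, b, rng):
    """Draw p from its Beta(sum(delta) + a, n - sum(delta) + b) conditional posterior."""
    inflated = sum(deltas)
    return rng.betavariate(inflated + a, len(deltas) - inflated + b)

# Usage with hypothetical values; a = b = 1 corresponds to the uniform prior on p.
rng = random.Random(7)
diffs = [0, 1, 0, -2, 0, 3]
deltas = [sample_indicator(z, 0.1, 0.3, rng) for z in diffs]
p_draw = sample_mixing_proportion(deltas, a=1.0, b=1.0, rng=rng)
```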
The above algorithm can be implemented in any programming language or in more statistically
friendly software (such as R and Matlab). Alternatively we can directly
use WinBUGS (Spiegelhalter et al., 2003), a statistical tool for the implementation of Bayesian
models using MCMC methodology. Results presented in this article have been reproduced
using both R and WinBUGS. The latter is available from the authors upon request.
3.4 Simulating future games and leagues from the predictive distribution
An important feature of Bayesian inference is the predictive distribution. Consider a future
game between the home team $h$ and away team $a$. Hence we wish to predict a future goal
difference $z^{pred}_{(h,a)}$. This can be done directly using the posterior predictive distribution
$$f\left(z^{pred}_{(h,a)} \mid \mathbf{z}\right) = \int f\left(z^{pred}_{(h,a)} \mid \theta\right) f(\theta \mid \mathbf{z})\, d\theta. \tag{3.7}$$
Note that in the ZPD model $\theta$ is replaced by $\theta' = (\theta, p)$. Moreover, $f\left(z^{pred}_{(h,a)} \mid \theta\right)$ depends only
on parameters $\mu$, $H$, $(A_h, D_h)$ and $(A_a, D_a)$ (and $p$ in the ZPD model) which are related to
the teams competing against each other in the game we wish to predict. When we wish to predict
goal differences $\mathbf{z}^{pred}$ for $n^{pred} > 1$ games in which the home team $HT^{pred}_k$ competes with the
away team $AT^{pred}_k$ in the $k$-th game (for $k = 1, \ldots, n^{pred}$), then the resulting posterior predictive
distribution is given by
$$f\left(\mathbf{z}^{pred} \mid \mathbf{HT}^{pred}, \mathbf{AT}^{pred}, \mathbf{z}\right) = \int f\left(\mathbf{z}^{pred} \mid \mathbf{HT}^{pred}, \mathbf{AT}^{pred}, \theta\right) f(\theta \mid \mathbf{z})\, d\theta$$
$$= \int \prod_{k=1}^{n^{pred}} f\left(z^{pred}_k \mid \mu, H, A_{HT^{pred}_k}, D_{HT^{pred}_k}, A_{AT^{pred}_k}, D_{AT^{pred}_k}\right) f(\mu, H, A_2, \ldots, A_K, D_2, \ldots, D_K \mid \mathbf{z})\, d\theta,$$
where $\mathbf{HT}^{pred}$ and $\mathbf{AT}^{pred}$ are the vectors of length $n^{pred}$ with the competing teams in the
future games we wish to predict.
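In practice the integral in (3.7) is not computed analytically. Given draws $\theta^{(1)}, \ldots, \theta^{(T)}$ from the posterior $f(\theta \mid \mathbf{z})$ (the number of retained iterations $T$ and the superscript notation are introduced here for illustration), the standard Monte Carlo approximation is

$$f\left(z^{pred}_{(h,a)} \mid \mathbf{z}\right) \approx \frac{1}{T} \sum_{t=1}^{T} f\left(z^{pred}_{(h,a)} \mid \theta^{(t)}\right),$$

so that, for instance, the predictive probability of a draw is estimated by averaging $f_{PD}(0 \mid \lambda_1^{(t)}, \lambda_2^{(t)})$ over the sampled parameter values.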
When using an MCMC algorithm, it is straightforward to generate values of $z^{pred}_{(h,a)}$
from the corresponding predictive distribution (3.7) by simply adding the following steps in
the MCMC sampler used in the PD model:
• Calculate $\lambda^{pred}_1 = \exp(\mu + H + A_h + D_a)$ and $\lambda^{pred}_2 = \exp(\mu + A_a + D_h)$, in agreement with (2.2) and (2.3).
• Generate $w^{pred}_1$ and $w^{pred}_2$ from Poisson distributions with parameters $\lambda^{pred}_1$ and $\lambda^{pred}_2$
respectively.
• Set $z^{pred}_{(h,a)} = w^{pred}_1 - w^{pred}_2$ as the generated value from the predictive distribution of
interest.
In the above procedure, parameters $\mu$, $H$, $(A_h, D_h)$ and $(A_a, D_a)$ will be equal to their
corresponding values generated in each iteration of the MCMC algorithm.
For the ZPD model we need to add the following steps in our sampling algorithm:
• Generate $\delta^{pred}$ from a Bernoulli distribution with probability $p$.
• If $\delta^{pred} = 1$ then set $z^{pred}_{(h,a)} = 0$; otherwise proceed as in the PD model.
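The predictive steps for both models can be sketched together, since the PD model is the special case $p = 0$. The code below is illustrative only: the layout of one posterior draw as a dictionary with keys `mu`, `H`, `A`, `D` and an optional `p`, the function names and the numerical values are all assumptions, and the Poisson generator is again Knuth's method.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_goal_difference(draw, home, away, rng):
    """One predictive goal difference given the parameter values of one MCMC iteration."""
    if rng.random() < draw.get("p", 0.0):   # ZPD only: inflated draw with probability p
        return 0
    lam1 = math.exp(draw["mu"] + draw["H"] + draw["A"][home] + draw["D"][away])
    lam2 = math.exp(draw["mu"] + draw["A"][away] + draw["D"][home])
    return sample_poisson(lam1, rng) - sample_poisson(lam2, rng)

# Hypothetical posterior draws for a two-team example (not output of the fitted model).
rng = random.Random(11)
posterior_draws = [{"mu": 0.0, "H": 0.3, "A": [0.1, -0.1], "D": [-0.1, 0.1]}] * 1000
simulated = [simulate_goal_difference(d, 0, 1, rng) for d in posterior_draws]
home_win_probability = sum(z > 0 for z in simulated) / len(simulated)
```

Averaging indicator functions of the simulated differences, as in the last line, gives the predictive probabilities of a home win, a draw or an away win.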
A usual practice when modelling sports outcomes is to reproduce the predictive distri-
bution of the ranking table in a league or a tournament. This procedure is very useful and
was initially introduced by Lee (1997) using plug-in maximum likelihood estimates from a
simple Poisson model. It can be used for two reasons. Firstly, to probabilistically quantify
the final outcome of a league or a tournament. This can be used to assess the goodness of
fit of the model and the overall performance of specific teams in the league. For example,
if the ranking resulting from the predictive distribution is in general agreement with the
observed data, this implies a good fit of our model. On the other hand, specific deviations
may indicate that specific teams performed better or worse than expected in specific games
and hence ended up in a different ranking. Secondly, it can be used to estimate the ranking
distribution if the competition had a different structure, for example when we use data
from knock-out tournaments to estimate the rankings that would arise if the teams were competing
in a full season under a round robin (league) system. Of course this procedure, with the current model,
assumes that the teams’ performance is constant across the whole competition.
In order to generate the posterior predictive distribution of each league, in each iteration
of the algorithm we need to generate $\mathbf{z}^{pred}$. In the case of a full season league, $\mathbf{z}^{pred}$ is a replicate of $\mathbf{z}$ with
$HT^{pred}_i = HT_i$ and $AT^{pred}_i = AT_i$ for all $i = 1, 2, \ldots, n$. Having $\mathbf{z}^{pred}$ generated within a
single iteration of the MCMC algorithm, we further
• Calculate the points $P^{pred}_k$ of each team for $k = 1, 2, \ldots, K$ (usually giving three points
for a win, one point for a draw and zero points for a loss to each team per game).
• Calculate the ranking $R^{pred}_k$ in descending order of points (giving rank one to the team with the highest
number of points) for each team.
After completing the procedure we end up with posterior samples for the points gained
by each team as well as for the rankings in the final league table. We can directly estimate
the number of points that each team was a-posteriori expected to earn (using simple means
of the sampled $P_k$ values), and the probability distribution of the ranking for each team
(from simple frequency tables of the $R_k$’s).
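The two bookkeeping steps applied to each simulated season can be sketched as below. The function names are hypothetical, and ties on points are broken by team index purely for simplicity, which is an assumption of this sketch; a real league table would use goal difference and further criteria.

```python
def league_points(fixtures, diffs, n_teams):
    """Points per team from goal differences: 3 for a win, 1 for a draw, 0 for a loss."""
    points = [0] * n_teams
    for (home, away), z in zip(fixtures, diffs):
        if z > 0:
            points[home] += 3
        elif z < 0:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points

def ranking(points):
    """Rank 1 goes to the team with most points; ties are broken by team index."""
    order = sorted(range(len(points)), key=lambda k: -points[k])
    ranks = [0] * len(points)
    for position, team in enumerate(order, start=1):
        ranks[team] = position
    return ranks

# Hypothetical three-team example: fixtures as (home, away) pairs, one simulated z each.
fixtures = [(0, 1), (1, 2), (2, 0)]
simulated_diffs = [2, 0, -1]
points = league_points(fixtures, simulated_diffs, n_teams=3)
ranks = ranking(points)
```

Repeating these two calls in every MCMC iteration and storing `points` and `ranks` yields the posterior samples of $P_k$ and $R_k$ from which the expected points and the ranking frequencies are computed.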
4 Application: The English Premier League 2006-2007
The data refer to the English Premiership for the 2006-2007 season. Data were downloaded from
the web page http://soccernet-akamai.espn.go.com.
For all the parameters of the PD component we have used normal prior distributions
with zero mean and low precision equal to $10^{-4}$ (i.e. large variance equal to $10^4$) to express
our prior ignorance. A uniform prior distribution was used for the mixing proportion $p$ of
the ZPD model. All results were produced using 10,000 iterations after discarding an additional
1,000 iterations as a burn-in period.
A plot of the 95% posterior intervals for all ‘net’ attacking and defensive parameters
for all teams is provided in Figures 1 and 2 respectively. According to these plots, Manchester
United, Liverpool, Arsenal, Blackburn and Chelsea had the highest ‘net’ attacking
parameters. Concerning the ‘net’ defensive parameters, Chelsea had a considerably higher
parameter than the rest of the teams, followed by Manchester United and Arsenal. If we look
at the actual goals scored and conceded by each team, we see that Chelsea and Arsenal
scored more goals than Liverpool. Nevertheless, Liverpool’s ‘attacking’ parameter is much
higher than the corresponding parameters of the other teams. This is due to the properties
of the Poisson difference distribution. Since the focus is on the differences, the ‘net’ attacking
and defensive parameters must be interpreted together. Hence, we can see that Chelsea
has a much better overall performance since its ‘net’ defensive parameter is much higher
than Liverpool’s. In Figure 3 we provide the distribution of goal differences in favour of Liverpool and
Chelsea for comparison. In this Figure we observe that Liverpool has won more
games by a margin of more than one goal (right part of the distribution), while
it suffers considerably in games with differences of one goal, in draws and in losses. These
differences resulted in better ‘net’ attacking parameters but much worse ‘net’ defensive
parameters for Liverpool than for Chelsea.
Figure 4 depicts the observed and predicted relative frequencies for the goal differences.
It is evident that there is a close agreement between the observed and the predicted
values. Concerning the number of draws, the PD model slightly overestimates them, which
contradicts the results observed when using simple or bivariate Poisson
models for the actual number of goals. However, note that the marginal distributions for our
model are not necessarily Poisson and this can explain why our model fits the draws better.
Nevertheless, the 95% posterior predictive interval contains the observed value, indicating
minor deviations from the data. Since the PD model slightly overestimates draws, the ZPD
model will not offer much in terms of goodness of fit. Indeed, fitting the ZPD model resulted
in similar predictive values with no obvious differences.
[Figure 1: 95% Posterior intervals for ‘net’ attacking coefficients ($A_i$); within brackets the observed scored goals. Horizontal axis from −1.5 to 0.5. Teams, in the order of the observed final table: Man Utd (83), Chelsea (64), Liverpool (57), Arsenal (63), Tottenham (57), Everton (52), Bolton (47), Reading (52), Portsmouth (45), Blackburn (52), Aston Villa (43), Middlesbrough (44), Newcastle (38), Man City (29), West Ham (35), Fulham (38), Wigan (37), Sheff Utd (32), Charlton (34), Watford (29).]
[Figure 2: 95% Posterior intervals for ‘net’ defensive coefficients [$(-1) \times D_i$]; within brackets the observed conceded goals. Horizontal axis from −0.5 to 2.0. Teams, in the order of the observed final table: Man Utd (27), Chelsea (24), Liverpool (27), Arsenal (35), Tottenham (54), Everton (36), Bolton (52), Reading (47), Portsmouth (42), Blackburn (54), Aston Villa (41), Middlesbrough (49), Newcastle (47), Man City (44), West Ham (59), Fulham (60), Wigan (59), Sheff Utd (55), Charlton (60), Watford (59).]
[Figure 3: Distribution of goal differences for Liverpool and Chelsea. Goal differences from −3 to 4 are shown for the two teams.]
[Figure 4: 95% Posterior intervals for predicted differences under the ZPD and PD models (in red=observed differences; ’-’ indicates the posterior median). Horizontal axis: goal difference; vertical axis: number of games. No major differences between the two models are observed.]
Furthermore, we have reproduced the predictive distribution of the final table using the
procedure described in section 3.4. Results are presented in Table 1 including details from the
observed final Table. Minor differences were observed between expected and observed points.
Predicted rankings were calculated via the expected number of points. For 60% and 80% of
the competing teams, the absolute difference between predicted and observed points was
found to be ≤2 and ≤3 points respectively. Only for 4 teams was the corresponding difference
between 3 and 7 points. Predicted and observed rankings are also close since, for 80%
of the teams, the final observed and predicted rankings were the same or differed by only one
position.
The largest differences are related to the performance of Bolton and Fulham. Bolton
managed to earn seven (7) points more than expected (56 instead of 49) with a
final ranking of 7 instead of the expected 12, while Fulham earned six (6) points more than
expected (39 instead of 33.1), finishing in 16th position instead of the predicted 19th and
thereby avoiding relegation to the next division.
        Posterior Predictive Table                      Observed Final Table
Pred.(Obs.)                 Post. Expectations    Obs.                    Obs. values
Rank       Team             Pts      G.Dif.       Rank  Team              Pts   G.Dif.
1 (1)      Man Utd          86.7      56.0        1     Man Utd           89     56
2 (2)      Chelsea          81.0      40.0        2     Chelsea           83     40
3 (3)      Arsenal          70.5      28.0        3     Liverpool         68     30
4 (3)      Liverpool        69.4      30.2        4     Arsenal           68     28
5 (6)      Everton          62.5      16.0        5     Tottenham         60      3
6 (8)      Reading          55.5       5.4        6     Everton           58     16
7 (5)      Tottenham        54.0       3.2        7     Bolton            56     -5
8 (9)      Portsmouth       53.3       2.8        8     Reading           55      5
9 (10)     Blackburn        51.8      -2.1        9     Portsmouth        54      3
10 (11)    Aston Villa      51.5       1.6        10    Blackburn         52     -2
11 (12)    Middlesbrough    49.0      -4.7        11    Aston Villa       50      2
12 (7)     Bolton           49.0      -5.7        12    Middlesbrough     46     -5
13 (13)    Newcastle        43.8      -8.8        13    Newcastle         43     -9
14 (14)    Man City         41.8     -14.8        14    Man City          42    -15
15 (15)    West Ham         38.6     -24.3        15    West Ham          41    -24
16 (17)    Wigan            38.1     -21.6        16    Fulham            39    -22
17 (18)    Sheff Utd        37.0     -23.0        17    Wigan             38    -22
18 (19)    Charlton         35.7     -25.9        18    Sheff Utd         38    -23
19 (16)    Fulham           33.1     -22.0        19    Charlton          34    -26
20 (20)    Watford          29.7     -30.3        20    Watford           28    -30
Table 1: Observed and predicted (under the model) points and goal differences for all the
teams; the actual ranking is given within brackets next to the predicted one.
In order to examine the behaviour of these two teams and compare them with the predicted
ones, we have calculated their outcome probabilities for each game and the points
expected to be earned in each game. We have flagged as outliers (surprising results according
to our model) all games with an absolute difference between the expected and observed number
of points greater than 1.95 (first criterion) or games with probability of the observed outcome
lower than 20% (second criterion). Six and four games were flagged as surprising results (or
outliers) for Bolton and Fulham respectively and are presented in Table 2.
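The two criteria are simple to apply once the outcome probabilities of a game are available from the predictive distribution. In the sketch below the function names are hypothetical, the probabilities are taken from the team's own point of view (win, draw, loss), and the expected points are computed as three times the win probability plus the draw probability, which is the usual definition and is assumed here to be the one used.

```python
def expected_points(p_win, p_draw):
    """Expected points of a team in one game: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3.0 * p_win + 1.0 * p_draw

def is_surprising(observed_points, p_win, p_draw, p_loss, point_gap=1.95, min_prob=0.20):
    """Flag a game by either criterion: large point gap or unlikely observed outcome."""
    outcome_probability = {3: p_win, 1: p_draw, 0: p_loss}[observed_points]
    gap = abs(observed_points - expected_points(p_win, p_draw))
    return gap > point_gap or outcome_probability < min_prob

# Probabilities as reported (rounded) in Table 2, seen from Bolton's side.
bolton_vs_arsenal = is_surprising(3, p_win=0.25, p_draw=0.29, p_loss=0.47)
chelsea_vs_bolton = is_surprising(1, p_win=0.04, p_draw=0.19, p_loss=0.77)
```

With these rounded inputs the home win over Arsenal is flagged by the first criterion and the away draw at Chelsea by the second; small discrepancies from the tabulated expected points can arise because the published probabilities are rounded to two decimals.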
Concerning Bolton, we notice that it earned 32 points instead of the expected 30.6 at
home (+1.4 points) and 24 instead of the expected 18.3 in away games
(+5.7 points). Hence, Bolton’s performance in away games was much better than expected.
Specifically, at home they beat Arsenal, where the expected number of points was equal
to one (and the probability of losing equal to 47%), but they lost to Wigan, a game in which
the expected number of points was equal to 2.1 and the probability of losing only 16%
(probability of winning this game 61%). All four away games with surprising scores are
in favour of Bolton. They surprisingly beat Aston Villa, Blackburn and Portsmouth (with
probabilities ranging from 20% to 24%) while they managed to draw with Chelsea (with
probability of a draw equal to 19%). The additional point difference in these four games is
equal to seven (7) points, which is the difference between the predicted and observed league
tables.
Looking at Fulham’s games, we observe four games with highly surprising results.
Fulham, unlike Bolton, performed much better than expected in home games.
It managed to earn 28 points instead of the expected 20.6 (+7.4) in home games while it
earned 11 points instead of the expected 12.6 (-1.6) in away games. From the presented
results, we see that Fulham managed to beat Arsenal, Everton and Liverpool at home.
All three of these teams are much better in terms of performance and budget and the
probability of Fulham winning these games was no higher than 16%. Finally, Fulham managed
to unexpectedly beat Newcastle in the corresponding away game (probability of winning equal
to 13%).
The same type of analysis can be performed for all teams. We have simply focused on
these two teams in order to understand why they performed differently from what
the model expected.
                                     Goal               Probabilities
Home          Away        Final      difference    Home             Away     Observed   Expected   Point
Team          Team        Score      (z_i)         Win     Draw     Win      Points     Points     Difference
Bolton        Arsenal     3-1         2            0.25    0.29     0.47     3          1.0         2.0
Bolton        Wigan       0-1        -1            0.61    0.23     0.16     0          2.1        -2.1
Aston Villa   Bolton      0-1        -1            0.48    0.32     0.20     3          0.9         2.1
Blackburn     Bolton      0-1        -1            0.56    0.21     0.24     3          0.9         2.1
Portsmouth    Bolton      0-1        -1            0.53    0.27     0.20     3          0.9         2.1
Chelsea       Bolton      2-2         0            0.77    0.19     0.04     1          0.3         0.7
Fulham        Arsenal     2-1         1            0.13    0.36     0.51     3          0.7         2.4
Fulham        Everton     1-0         1            0.16    0.39     0.46     3          0.9         2.1
Fulham        Liverpool   1-0         1            0.16    0.28     0.56     3          0.7         2.3
Newcastle     Fulham      1-2        -1            0.44    0.43     0.13     3          0.8         2.2
Table 2: Surprising Results for Bolton and Fulham; Games with expected absolute point
difference >1.95 or probability of observed outcome <0.20.
Finally, Table 3 presents some goodness of fit measures based on the predictive distribution.
Namely, for each of the quantities appearing in the table, denoted in a general form as
$Q$, we have calculated the deviation given by
$$\text{Deviation} = \sqrt{\frac{1}{|Q|} \sum_{i=1}^{|Q|} \left(E(Q_i^{pred} \mid y) - Q_i^{obs}\right)^2},$$
where |Q| is the length of vector Q. For the calculation of the deviations of the frequencies
and the relative frequencies we used |Q| = 13 to account for differences from -6 to 6, while
for the deviations of the expected points and the expected goal differences from the final
table we set |Q| = K = 20, i.e. the number of teams in the league. So $Q_i^{obs}$ is the
observed quantity and $E(Q_i^{Pred} \mid y)$ the predicted quantity; the quantities used are
reported in the first column of Table 3. The results show a satisfactory fit of the model to
the actual data. In addition, the zero-inflated model does not seem to improve the fit for
this data since we did not observe any excess of draws.

| Comparison | Deviation (PD) | Deviation (ZPD) |
|---|---|---|
| 1. Relative Frequencies (counts/games) | 1.06% | 1.32% |
| 2. Frequencies (counts) | 4.04 | 5.04 |
| 3. Relative Frequencies of win/draw/lose | 2.20% | 2.80% |
| 4. Frequencies of win/draw/lose | 8.30 | 10.65 |
| 5. Expected points | 3.02 | 3.07 |
| 6. Expected goal difference | 0.28 | 0.40 |

Table 3: Deviations between observed and predictive measures.
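The deviation measure defined above is a root mean squared difference between posterior predictive expectations and observed values. A minimal Python sketch follows; the function name `deviation` and the numbers in the usage example are purely illustrative and are not taken from the paper's data.

```python
import math


def deviation(predicted, observed):
    """Root mean squared deviation between two equal-length vectors.

    predicted[i] plays the role of E(Q_i^Pred | y) and observed[i]
    the role of Q_i^obs; len(predicted) corresponds to |Q|.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(predicted)
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / n)


# Illustrative (made-up) win/draw/lose relative frequencies.
pred = [0.46, 0.27, 0.27]
obs = [0.48, 0.26, 0.26]
print(f"deviation = {deviation(pred, obs):.4f}")
```

For the frequency comparisons the vectors would have length 13 (goal differences from -6 to 6), and for the league-table comparisons length 20 (one entry per team).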
5 Concluding Remarks
In the present paper, we have proposed an innovative approach for modelling football data.
The proposed model has some interesting advantages over the existing ones used for the
same purpose. It is based on the goal difference in each game and its main feature is
that it accounts for the correlation by eliminating any additive covariance. For this reason,
we avoid modelling the correlation or imposing assumptions about its structure, as is needed
in any bivariate distribution (for example, in the bivariate Poisson model). The model has
a straightforward Poisson latent variable interpretation, although we do not need to make
assumptions about the distributions of the actual goals scored by each team. Therefore,
the proposed model is quite generic and applicable to data from a wide range of football
leagues in which teams with different behaviour compete. Furthermore, its parameters have a
relatively easy interpretation, while parameter estimation is easier than the corresponding
one for the bivariate Poisson model. The Poisson difference model is appropriate for bets
like the Asian handicap, where only the goal difference between the two teams is considered.
On the other hand, by considering only the goal differences, it discards part of the available
data and information and, hence, it cannot be used for modelling the final score of a game.
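To make the latent Poisson interpretation concrete, the sketch below evaluates the Poisson difference (Skellam) probability function as a truncated convolution of two independent Poisson variables and derives win/draw/lose probabilities from it. This is only one construction of the distribution (the model itself does not require Poisson marginals), and it is not the authors' implementation. The names `pd_pmf` and `outcome_probs`, the truncation settings, and the rates 1.5 and 1.0 are hypothetical choices for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pd_pmf(z, lam1, lam2, terms=60):
    """P(Z = z) for Z = X - Y with independent X ~ Poisson(lam1),
    Y ~ Poisson(lam2), using a convolution truncated after `terms`
    summands (adequate for goal-sized rates)."""
    start = max(0, -z)  # smallest y that keeps x = y + z non-negative
    return sum(poisson_pmf(y + z, lam1) * poisson_pmf(y, lam2)
               for y in range(start, start + terms))


def outcome_probs(lam1, lam2, max_diff=30):
    """Home win, draw and away win probabilities from the goal difference."""
    home = sum(pd_pmf(z, lam1, lam2) for z in range(1, max_diff + 1))
    draw = pd_pmf(0, lam1, lam2)
    away = sum(pd_pmf(z, lam1, lam2) for z in range(-max_diff, 0))
    return home, draw, away


# Hypothetical latent rates for the home and away team.
home, draw, away = outcome_probs(1.5, 1.0)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")

# Probability that the home team wins by two or more goals, the kind of
# margin-based quantity relevant to handicap-style bets.
by_two = sum(pd_pmf(z, 1.5, 1.0) for z in range(2, 31))
print(f"P(Z >= 2) = {by_two:.3f}")
```

Because every quantity here depends on the goal difference alone, the sketch also shows why the exact final score cannot be recovered from this model.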
We have also considered a possible extension of the model by adding a zero-inflated
component. Although draws were under-estimated using other Poisson-related models,
for the 2006-7 English Premier League data the draws are slightly over-estimated using the
proposed PD model. Therefore, no zero inflation was needed.
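For readers who want to see what the zero-inflated component does to the probability of a draw, the following sketch assumes the standard zero-inflation construction: a mixing proportion p places extra mass at zero and the remaining 1 - p follows the Poisson difference distribution. The function names and the parameter values are hypothetical, and the truncated convolution used for the PD probabilities is an illustrative shortcut rather than the authors' code.

```python
import math


def pd_pmf(z, lam1, lam2, terms=60):
    """Poisson difference probability P(Z = z) via a truncated convolution
    of independent Poisson(lam1) and Poisson(lam2) variables."""
    start = max(0, -z)
    total = 0.0
    for y in range(start, start + terms):
        x = y + z
        total += (math.exp(-lam1) * lam1 ** x / math.factorial(x)
                  * math.exp(-lam2) * lam2 ** y / math.factorial(y))
    return total


def zpd_pmf(z, lam1, lam2, p):
    """Zero-inflated PD (assumed standard form):
    P(Z = 0) = p + (1 - p) * f_PD(0),  P(Z = z) = (1 - p) * f_PD(z) for z != 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("mixing proportion p must lie in [0, 1]")
    base = (1.0 - p) * pd_pmf(z, lam1, lam2)
    return p + base if z == 0 else base


# Hypothetical rates; compare the draw probability with and without inflation.
lam1, lam2 = 1.4, 1.1
print(f"P(draw), PD : {pd_pmf(0, lam1, lam2):.3f}")
print(f"P(draw), ZPD: {zpd_pmf(0, lam1, lam2, p=0.05):.3f}")
```

Since p can only add probability to a draw, a plain PD model that already over-estimates draws leaves no room for improvement from this component, which matches the conclusion above.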
We are currently working on further extensions of the proposed model. A first direction
for extending the proposed model is to consider other distributions defined on $\mathbb{Z}$
(see, for example, Ong et al., 2007). Secondly, variable selection techniques may be
implemented to identify variables with good predictive power and in this way construct a more
precise but also parsimonious model. In addition, Bayesian model averaging techniques can be
used to improve the predictive power. As for the zero-inflated version, when an excess of
draws is observed, we may also incorporate covariates in the mixing proportion p in order to
predict draws more precisely and to identify additional factors that increase the probability
of a draw in a football game.
References
Dixon, M.J. and Coles, S.G. (1997). Modelling association football scores and inefficiencies
in football betting market. Applied Statistics, 46, 265-280.
Irwin, W. (1937). The frequency distribution of the difference between two Poisson variates
following the same Poisson distribution. Journal of the Royal Statistical Society, Series
A, 100, 415-416.
Karlis, D. and Ntzoufras, I. (2006). Bayesian analysis of the differences of count data.
Statistics in Medicine, 25, 1885-1905.
Karlis, D. and Ntzoufras, I. (2005). Bivariate Poisson and Diagonal Inflated Bivariate
Poisson Regression Models in R. Journal of Statistical Software, Volume 10, Issue 10.
Karlis, D. and Ntzoufras, I. (2003). Analysis of Sports Data Using Bivariate Poisson Models.
Journal of the Royal Statistical Society, Series D (The Statistician), 52, 381-393.
Lee, A.J. (1997). Modeling scores in the Premier League: Is Manchester United really the
best? Chance, 10, 15-19.
Maher, M.J. (1982). Modelling association football scores. Statistica Neerlandica, 36,
109-118.
Ong, S.H., Shimizu, K. and Ng, C.M. (2007). A Class of Discrete Distributions Arising from
Difference of Two Random Variables. Computational Statistics and Data Analysis (to
appear).
Skellam, J.G. (1946). The frequency distribution of the difference between two Poisson
variates belonging to different populations. Journal of the Royal Statistical Society,
Series A, 109, 296.
Spiegelhalter, D., Thomas, A., Best, N. and Lunn, D. (2003). WinBUGS User Manual,
Version 1.4. MRC Biostatistics Unit, Institute of Public Health and Department
of Epidemiology & Public Health, Imperial College School of Medicine, UK, available
at http://www.mrc-bsu.cam.ac.uk/bugs.