MathSport International 2017 Conference Proceedings
Home Team Advantage in English Premier League
Patrice Marek* and František Vávra**
*European Centre of Excellence NTIS – New Technologies for the Information Society,
Faculty of Applied Sciences, University of West Bohemia, Czech Republic: patrke@kma.zcu.cz
** Department of Mathematics, Faculty of Applied Sciences,
University of West Bohemia, Czech Republic: vavra@kma.zcu.cz
Abstract
The home team advantage in association football is a well-known phenomenon. The aim
of this study is to offer a different view of the home team advantage. Usually, in association
football, each pair of teams – team A and team B – plays twice in a season, once with A as the
home team and once with A as the visiting team. This gives two results between teams A and B,
which are combined to evaluate whether the result that team A recorded against opponent B on
the home field was better than, equal to, or worse than its result on the away field. This leads to
a random variable with three possible outcomes, i.e. to a trinomial distribution. Combining and
comparing the home and away results of the same two teams is the key to eliminating problems
with the different strengths of teams in the league. Using a uniform distribution as a prior, we obtain
a Dirichlet distribution as a posterior. The posterior is then used to determine point and interval
estimates of the unknown parameters of the source trinomial distribution, i.e. the probabilities that
the result at home will be better, even, or worse. Moreover, it is possible to test the hypothesis that
the home team advantage of a selected team is statistically significant. This approach can be used
to construct a measure of the home team advantage for a single team. The described procedure is
demonstrated on English Premier League results from the 1992/1993 season to the 2015/2016
season.
1 Introduction
Home team advantage is a well-known phenomenon. It is used in models that estimate the probabilities of
a win, a draw and a loss in a match. The use of home team advantage in modelling and predicting sports
results can be traced back to Maher (1982), who used one parameter to adjust the strength of a team’s attack
and the weakness of a team’s defence for matches played on the away field. Home team advantage was later
used in many papers that studied different sports, e.g. in association football by Dixon & Coles (1997), in
water polo by Karlis & Ntzoufras (2003) and in ice hockey by Marek et al. (2014).
Home team advantage as a stand-alone phenomenon was studied in depth by Pollard & Pollard (2005).
Their paper offers a summary of previous research on this phenomenon and an analysis of more than 400,000
matches in many sports played between 1876 and 2003. They quantified home team advantage in
association football as "the number of points obtained by the home team expressed as a percentage of all
points obtained in all games played". The same definition of home team advantage was used by Allen &
Jones (2014) in an analysis of the English Premier League in the seasons 1992/1993–2011/2012. Their results
showed that 60.77% (±8.30) of all points were won in home games.
This paper offers a slightly different view of home team advantage: instead of points, home team
advantage is based on the numbers of goals scored and their differences. The benefit of using goals can be
demonstrated on the results of a team that played the same opponent on the home and away fields. Let us
assume that the result on the home field was a 3–0 win and the result on the away field was a 2–1 win.
Obviously, the better result was recorded on the home field; however, based on points obtained, it is not
possible to distinguish between these results, as the team is awarded 3 points in both cases. The method
described in the following section makes it possible to distinguish between these results, to measure the
home team advantage for individual teams, and to observe its changes over time.
2 Data and Methods
English Premier League results from the 1992/1993 season to the 2015/2016 season were obtained from England
Football Results and Betting Odds (2017). Data for the first English Premier League season (1992/1993)
were obtained from the official website Premier League Football News, Fixtures, Scores & Results (2017). This
website was also used for a basic check of all data, e.g. of the total number of goals scored by each team in
the whole season.
The Premier League consisted of 22 teams in the first three seasons and of 20 teams in the remaining seasons.
A balanced schedule was used in all seasons, i.e. each team played every other team exactly twice, once
as the home team and once as the visiting team. This means that each team has 19 opponents (21 in
the first three seasons) with two results in a season. These two results are combined and used to
measure home team advantage, which is evaluated according to Definition 1, 2 or 3. Each season
is analysed separately to eliminate changes in the set of teams that form the league and changes in rosters,
which are usually larger between seasons.
Definition 1. Active measure of home team advantage is a random variable $A$ that can take values $-1$, $0$,
and $1$. $A=-1$ for team $T_1$ if two matches between teams $T_1$ and $T_2$ in a season ended with a result where
team $T_1$ scored more goals on a field of team $T_2$ than on its own field. $A=0$ for team $T_1$ if this team scored
exactly the same number of goals on a home field and away field and $A=1$ for team $T_1$ if this team scored
more goals on its own field than on a field of team $T_2$. With results $h_{T_1}:a_{T_2}$ on a home field of team $T_1$ and
$h_{T_2}:a_{T_1}$ on a home field of team $T_2$ the value of random variable $A$ is determined as
$$A = \operatorname{sgn}(h_{T_1} - a_{T_1}). \tag{1}$$
Definition 2. Passive measure of home team advantage is a random variable $P$ that can take values $-1$, $0$,
and $1$. $P=-1$ for team $T_1$ if two matches between teams $T_1$ and $T_2$ in a season ended with a result where
team $T_1$ conceded more goals on a home field than on a field of team $T_2$. $P=0$ for team $T_1$ if this team
conceded exactly the same number of goals on a home field and away field and $P=1$ for team $T_1$ if this team
conceded more goals on a field of team $T_2$ than on its own field. With results $h_{T_1}:a_{T_2}$ on a home field of team
$T_1$ and $h_{T_2}:a_{T_1}$ on a home field of team $T_2$ the value of random variable $P$ is determined as
$$P = \operatorname{sgn}(h_{T_2} - a_{T_2}). \tag{2}$$
Definition 3. Combined measure of home team advantage is a random variable $C$ that can take values $-1$, $0$,
and $1$. $C=-1$ for team $T_1$ if two matches between teams $T_1$ and $T_2$ in a season ended with a better result –
measured by a goal difference in matches – for team $T_1$ on an away field. $C=0$ for team $T_1$ if goal difference
in both matches was exactly the same from $T_1$’s point of view and $C=1$ for team $T_1$ if this team recorded
better result – measured by a goal difference in matches – on its own field. With results $h_{T_1}:a_{T_2}$ on a home
field of team $T_1$ and $h_{T_2}:a_{T_1}$ on a home field of team $T_2$ the value of random variable $C$ is determined as
$$C = \operatorname{sgn}\big((h_{T_1} - a_{T_2}) - (a_{T_1} - h_{T_2})\big). \tag{3}$$
All three measures are defined so that the value $1$ means that a result was better on the home field, $0$ means
that there was no difference and $-1$ means that the better result was recorded on the away field. Obviously,
the active measure for team $T_1$ is the passive measure for team $T_2$. The combination of results between the
same two teams – as used in Definitions 1, 2 and 3 – largely eliminates the fact that teams in the league are of
different quality. All three random variables take the same values with the same interpretation; therefore, in
the following parts the combined measure $C$ is used, and it can be easily replaced by $A$ or $P$ to obtain
results for the other two measures.
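The three definitions can be sketched in a few lines of Python. This is only an illustration of Equations (1)–(3); the function and argument names (`home_advantage_measures`, `h_t1`, `a_t2`, `h_t2`, `a_t1`) are hypothetical and simply mirror the symbols used in the definitions.

```python
def sgn(x: int) -> int:
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def home_advantage_measures(h_t1: int, a_t2: int, h_t2: int, a_t1: int) -> dict:
    """Active (A), passive (P) and combined (C) measures for team T1.

    h_t1:a_t2 is the result on the home field of T1,
    h_t2:a_t1 is the result on the home field of T2.
    """
    return {
        "A": sgn(h_t1 - a_t1),                    # Equation (1): goals scored by T1
        "P": sgn(h_t2 - a_t2),                    # Equation (2): goals conceded by T1
        "C": sgn((h_t1 - a_t2) - (a_t1 - h_t2)),  # Equation (3): goal differences
    }


# Example from the Introduction: a 3-0 win at home and a 2-1 win away.
print(home_advantage_measures(h_t1=3, a_t2=0, h_t2=1, a_t1=2))
```

For the example from the Introduction all three measures equal 1, so the goal-based view separates two results that are identical in terms of points.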
The English Premier League used a balanced schedule in all seasons, with exactly two matches between each
pair of teams. Let $L$ denote the number of teams in a league (for our data $L=22$ or $L=20$); then each team
has $K = L-1$ opponents in a season. A random sample $C_1, C_2, \ldots, C_K$ is obtained as one season’s
results of a given team against its opponents. The $C_i$’s are considered to be identically distributed because
there are no big changes in a team during one season. Therefore, the probabilities $p_{-1}$, $p_0$ and $p_1$ of the
possible outcomes $-1$, $0$ and $1$ are considered constant in a season. This means that the home team
advantage of a team is stationary during a season. The second assumption is that the $C_i$’s are independent,
i.e. that matches with one opponent do not influence matches with other opponents.
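As a minimal sketch of how the sample for one team could be assembled, the following code pairs the home and away match against each opponent and evaluates the combined measure. The record layout `(home_team, away_team, home_goals, away_goals)`, the function names, and the three-team toy league with made-up scores are assumptions for illustration only; they are not the data used in this paper.

```python
from collections import Counter


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def combined_sample(matches, team):
    """Return {opponent: C_i} for `team` from one season of results.

    A balanced schedule (exactly one home and one away match against
    every opponent) is assumed.
    """
    home_diff = {}  # opponent -> goal difference of `team` on its own field
    away_diff = {}  # opponent -> goal difference of `team` on the opponent's field
    for home, away, home_goals, away_goals in matches:
        if home == team:
            home_diff[away] = home_goals - away_goals
        elif away == team:
            away_diff[home] = away_goals - home_goals
    return {opp: sgn(home_diff[opp] - away_diff[opp]) for opp in home_diff}


def count_outcomes(sample):
    """Return the counts (k_{-1}, k_0, k_1) of the outcomes -1, 0 and 1."""
    counts = Counter(sample.values())
    return counts[-1], counts[0], counts[1]


# Hypothetical toy league with L = 3 teams, so K = L - 1 = 2 opponents per team.
matches = [
    ("X", "Y", 2, 0), ("Y", "X", 1, 1),
    ("X", "Z", 0, 1), ("Z", "X", 0, 2),
    ("Y", "Z", 3, 3), ("Z", "Y", 1, 1),
]
sample_x = combined_sample(matches, "X")
print(sample_x, count_outcomes(sample_x))
```

In this toy league team X did better at home against Y and better away against Z, so its counts are one case of $-1$, no case of $0$ and one case of $1$.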
Remark 1. The assumption that $C_i$, $i=1,2,\ldots,K$, are i.i.d. may not be true in reality. However, it can be
expected that the violation of this assumption is not strong, and therefore the assumption is used in the same
sense in the majority of studies that deal with sports. Without this simplification it would be impossible to use
statistics for sports, as every single match could be played under slightly different conditions (for example, in
different weather conditions). Moreover, the methods described below are robust, and this simplification
should not result in any problems with the interpretation of the obtained findings.
Let $Z_r$, $r=-1,0,1$, be the random variable that describes the number of cases in a season where it is
possible to observe a home team advantage ($r=1$), an away team advantage ($r=-1$) and no advantage
($r=0$). Obviously, for $K$ opponents in a season, $Z_{-1}+Z_0 = K-Z_1$. The vector $(Z_{-1},Z_0,Z_1)$ follows
a trinomial distribution with parameters $K$ and $p_{-1},p_0,p_1$. The probability mass function under this
notation is given by
$$P(k_{-1},k_0,k_1) = \frac{K!}{k_{-1}!\,k_0!\,k_1!}\, p_{-1}^{k_{-1}}\, p_0^{k_0}\, p_1^{k_1}, \tag{4}$$
where $K$ is the total number of opponents in a season for one team, $p_1$, $p_{-1}$ and $p_0$ are the probabilities
of a home team advantage ($r=1$), an away team advantage ($r=-1$) and no advantage ($r=0$), and
$k_{-1},k_0,k_1$, with $k_{-1}+k_0+k_1=K$, are the observed numbers of the corresponding outcomes.
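Equation (4) can be evaluated directly. The sketch below uses only the standard library; the function name `trinomial_pmf` and the probabilities 0.2, 0.2 and 0.6 are hypothetical choices made for illustration.

```python
from math import factorial


def trinomial_pmf(k_minus, k_zero, k_plus, p_minus, p_zero, p_plus):
    """Probability of observing the counts (k_{-1}, k_0, k_1), Equation (4)."""
    K = k_minus + k_zero + k_plus
    coefficient = factorial(K) // (
        factorial(k_minus) * factorial(k_zero) * factorial(k_plus)
    )
    return coefficient * p_minus**k_minus * p_zero**k_zero * p_plus**k_plus


# Sanity check with K = 19 opponents and assumed (hypothetical) probabilities:
# the probabilities of all possible count vectors add up to 1
# (up to floating-point rounding).
K = 19
total = sum(
    trinomial_pmf(i, j, K - i - j, 0.2, 0.2, 0.6)
    for i in range(K + 1)
    for j in range(K + 1 - i)
)
print(total)
```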
Bayesian inference is used to estimate the unknown parameters and, consequently, to obtain confidence
intervals. The prior distribution of the parameters $p_{-1}$, $p_0$ and $p_1$ is set to be uniform, i.e. it does not
matter where a team plays a match. The probability in Equation 4 is used as the conditional probability of the
observation given the parameters, i.e. $P(k_{-1},k_0,k_1 \mid p_{-1},p_0,p_1)$. This leads to the posterior
probability density of the parameters $p_{-1},p_0,p_1$ given by
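$$P(p_{-1},p_0,p_1 \mid k_{-1},k_0,k_1) = \frac{\Gamma(K+3)}{\Gamma(k_{-1}+1)\,\Gamma(k_0+1)\,\Gamma(k_1+1)}\, p_{-1}^{k_{-1}}\, p_0^{k_0}\, p_1^{k_1}, \quad p_{-1},p_0,p_1 \ge 0, \quad \sum_{r=-1}^{1} p_r = 1, \tag{5}$$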
where $K$ is the total number of opponents in a season for one team and $k_{-1},k_0,k_1$, with
$k_{-1}+k_0+k_1=K$, are the observed numbers of the given outcomes. Equation 5 is the probability density
function of the Dirichlet distribution $\mathrm{Dir}(\alpha_1=k_{-1}+1, \alpha_2=k_0+1, \alpha_3=k_1+1)$. The Bayesian
estimator of the probabilities in Equation 4 is given (using the squared-error loss function) as the mean value
of this Dirichlet distribution, i.e.
$$\hat{p}_r = \frac{k_r+1}{K+3}, \quad r=-1,0,1. \tag{6}$$
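A short sketch of the estimator in Equation (6), applied to the counts of Newcastle in the 2015/2016 season taken from Table 2 (2 cases of $-1$, 4 cases of $0$ and 13 cases of $1$). The function name `posterior_mean` is hypothetical; exact fractions are used only to make the arithmetic visible.

```python
from fractions import Fraction


def posterior_mean(k_minus, k_zero, k_plus):
    """Bayesian estimates (p_{-1}, p_0, p_1) under a uniform prior, Equation (6)."""
    K = k_minus + k_zero + k_plus
    return tuple(Fraction(k + 1, K + 3) for k in (k_minus, k_zero, k_plus))


# Newcastle, 2015/2016 season (counts from Table 2).
estimates = posterior_mean(2, 4, 13)
print(estimates)                       # exact fractions (3/22, 5/22, 14/22 reduced)
print([float(p) for p in estimates])   # decimal values
```

Because one pseudo-observation is added to each outcome, the three estimates always sum to 1 and none of them is exactly 0, even for an outcome that was never observed.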
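If $(p_{-1},p_0,p_1)$ follows the Dirichlet distribution $\mathrm{Dir}(\alpha_1=k_{-1}+1, \alpha_2=k_0+1, \alpha_3=k_1+1)$,
$k_{-1}+k_0+k_1=K$, then the marginal distribution of $p_r$, $r=-1,0,1$, is $\mathrm{Beta}(\alpha=k_r+1, \beta=K-k_r+2)$
(see Pitman 1993, p. 473). This can be used to find individual $(1-\alpha_l-\alpha_u)$-confidence intervals
$(\hat{p}_{r,l}, \hat{p}_{r,u})$ for each $p_r$, which are given by
$$\hat{p}_{r,l} = \mathrm{Beta}^{-1}(\alpha_l,\, k_r+1,\, K-k_r+2) \tag{7}$$
and
$$\hat{p}_{r,u} = \mathrm{Beta}^{-1}(1-\alpha_u,\, k_r+1,\, K-k_r+2), \tag{8}$$
where $\mathrm{Beta}^{-1}(q, \alpha, \beta)$ denotes the quantile function of the Beta distribution.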
Remark 2. These individual confidence intervals can be used for simultaneous confidence interval of all three
parameters. Based on Bonferroni inequality, they form together a $(1-3(\alpha_l+\alpha_u))$-simultaneous confidence
interval.
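The interval limits can be computed without external libraries, as sketched below. Two assumptions are made here: the Beta parameters are positive integers (which holds for the counts used in this paper), so the Beta distribution function can be written as a binomial tail, $I_x(a,b) = P(\mathrm{Bin}(a+b-1, x) \ge a)$; and the upper limit is taken as the $1-\alpha_u$ quantile so that the interval has level $1-\alpha_l-\alpha_u$. The quantile is found by simple bisection, and all function names are hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """Distribution function of Beta(a, b) at x for positive integers a and b."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(q, a, b, tol=1e-10):
    """Quantile of Beta(a, b) obtained by bisection on the interval [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def beta_interval(k_r, K, alpha_l=0.025, alpha_u=0.025):
    """Interval for p_r based on the marginal Beta(k_r + 1, K - k_r + 2)."""
    a, b = k_r + 1, K - k_r + 2
    return beta_quantile_int(alpha_l, a, b), beta_quantile_int(1 - alpha_u, a, b)


# Newcastle 2015/2016: k_1 = 13 out of K = 19 (Table 2), 95% interval for p_1.
low, high = beta_interval(k_r=13, K=19)
print(round(low, 3), round(high, 3))

# Bonferroni level of the three 95% intervals taken together (Remark 2).
print(round(1 - 3 * (0.025 + 0.025), 2))
```

With three individual 95% intervals the Bonferroni bound only guarantees a simultaneous level of 85%, so narrower tail probabilities would be needed for a 95% simultaneous statement.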
For testing the hypothesis it is necessary to obtain $P(p_1>p_{-1})$ from Equation 5. Using the results of Omar &
Joarder (2012, p. 932) and the observed values of $k_1$ and $k_{-1}$, this probability is estimated as
$$P(p_1>p_{-1}) = 1 - I_{1/2}(k_1+1,\, k_{-1}+1), \tag{9}$$
where $I_{1/2}(k_1+1, k_{-1}+1)$ is the regularized incomplete beta function, i.e. the cumulative distribution
function of the Beta distribution evaluated at $1/2$.
Remark 3. $P(p_1>p_{-1})$ in this paper is an estimate based on the observed values of $k_1$ and $k_{-1}$. However,
for better readability, the word estimate is omitted in the following text.
$P(p_1>p_{-1})$ is the probability of occurrence of home team advantage, i.e. it can be used as a measure of
home team advantage (the higher the value of $P(p_1>p_{-1})$, the higher the home team advantage). The
hypothesis that the home team advantage is real can be accepted if $P(p_1>p_{-1}) \ge 1-\alpha$.
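Equation (9) and the acceptance rule can be sketched as follows. Since $k_1+1$ and $k_{-1}+1$ are positive integers, the regularized incomplete beta function at $1/2$ reduces to a binomial sum, $1 - I_{1/2}(a,b) = P(\mathrm{Bin}(a+b-1, 1/2) \le a-1)$; this identity is used here in place of a numerical beta routine. The function names are hypothetical, and the counts are rows of Table 2.

```python
from math import comb


def prob_home_advantage(k_minus, k_plus):
    """P(p_1 > p_{-1}) = 1 - I_{1/2}(k_1 + 1, k_{-1} + 1), Equation (9)."""
    n = k_minus + k_plus + 1
    return sum(comb(n, j) for j in range(k_plus + 1)) / 2**n


def home_advantage_accepted(k_minus, k_plus, alpha=0.05):
    """Accept the hypothesis if P(p_1 > p_{-1}) >= 1 - alpha."""
    return prob_home_advantage(k_minus, k_plus) >= 1 - alpha


# Counts (k_{-1}, k_1) from Table 2, 2015/2016 season.
for team, k_minus, k_plus in [("Arsenal", 5, 10), ("Newcastle", 2, 13), ("Swansea", 5, 12)]:
    p = prob_home_advantage(k_minus, k_plus)
    print(team, round(p, 3), home_advantage_accepted(k_minus, k_plus))
```

The rounded values agree with Table 2 (0.895, 0.998 and 0.952). Note that the number of cases with no advantage, $k_0$, does not enter the probability.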
3 Results
As mentioned before, we analysed the English Premier League from the 1992/1993 season to the 2015/2016
season. In total, 9,366 matches were played in these seasons, and, owing to promotion and relegation,
47 teams played at least one season in the English Premier League. Out of these teams, only
seven played in each season (Arsenal, Aston Villa, Chelsea, Everton, Liverpool, Manchester United, and
Tottenham). Recall that the English Premier League consisted of 22 teams in the first three seasons and
of 20 teams in the following seasons.
For each team in each season, the hypothesis that the home team advantage is real was tested (see Equation 9).
The hypothesis is accepted when $P(p_1>p_{-1}) \ge 0.95$. These tests were performed for the
combined measure of home team advantage described in Definition 3. The numbers of teams for which
the hypothesis about home team advantage was accepted are presented in Table 1. The highest number was
recorded in the 2009/2010 season (17 teams out of 20), and the lowest number was recorded in the 2015/2016
season (2 teams out of 20).
Season Teams Season Teams Season Teams
1992/93 11 2000/01 9 2008/09 5
1993/94 5 2001/02 8 2009/10 17
1994/95 12 2002/03 8 2010/11 10
1995/96 8 2003/04 7 2011/12 9
1996/97 4 2004/05 10 2012/13 4
1997/98 9 2005/06 10 2013/14 5
1998/99 6 2006/07 8 2014/15 5
1999/00 13 2007/08 12 2015/16 2
Table 1: Numbers of teams for which the hypothesis about home team advantage was accepted.
Table 2 contains the numbers of cases where the combined measure of home team advantage ($C_i$) took the
value $-1$, $0$, or $1$ in the 2015/2016 season. Each team played 19 opponents, and therefore 19 observations
are obtained for each team. The table also contains $P(p_1>p_{-1})$ (based on the $C_i$’s), and the two teams for
which it is possible to accept the hypothesis that home team advantage exists – Newcastle and Swansea – are
marked with an asterisk.
Now we present the evolution of $P(p_1>p_{-1})$, the estimate $\hat{p}_1$, and the 95% confidence interval
$(\hat{p}_{1,l}, \hat{p}_{1,u})$ over time. These results are presented for two selected teams (chosen among the
previously mentioned seven teams that played in each season of the English Premier League). The first
presented team – Liverpool – is the team with the highest home team advantage (measured simply as the
average of the probabilities $P(p_1>p_{-1})$ obtained in all seasons). Liverpool is also the team with the smallest
changes in $P(p_1>p_{-1})$. These changes were measured using two criteria: the first was the sample standard
deviation of $P(p_1>p_{-1})$, and the second was the sum of absolute differences in $P(p_1>p_{-1})$ between
consecutive seasons. In both criteria, Liverpool recorded the lowest value out of the seven mentioned teams.
The results for Liverpool are in Figure 1 and Figure 2; the first figure contains the evolution of $P(p_1>p_{-1})$
and the second figure contains the evolution of $\hat{p}_1$, $\hat{p}_{1,l}$, and $\hat{p}_{1,u}$. Seasons where it is possible to
accept the hypothesis that home team advantage exists, i.e. where $P(p_1>p_{-1}) \ge 0.95$, are denoted by full
bullets (•) in Figure 1.
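The average and the two variability criteria described above can be computed as in the following sketch. The function name `variability` is hypothetical, and the input is restricted to the four Liverpool values listed in Table 3 (seasons 2012/13–2015/16), so the output illustrates the calculation only and does not reproduce the 24-season figures reported in this paper.

```python
from statistics import mean, stdev


def variability(probabilities):
    """Average, sample standard deviation and sum of absolute season-to-season changes."""
    average = mean(probabilities)
    sample_sd = stdev(probabilities)
    total_abs_change = sum(
        abs(later - earlier)
        for earlier, later in zip(probabilities, probabilities[1:])
    )
    return average, sample_sd, total_abs_change


# Liverpool, seasons 2012/13 to 2015/16 only (values from Table 3).
liverpool = [0.773, 0.748, 0.834, 0.696]
average, sample_sd, total_abs_change = variability(liverpool)
print(round(average, 3), round(sample_sd, 3), round(total_abs_change, 3))
```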
The team with the largest changes in $P(p_1>p_{-1})$ was Arsenal (this holds for both criteria). Arsenal
also had the second lowest home team advantage (i.e. average value of $P(p_1>p_{-1})$). The lowest home team
advantage among the seven mentioned teams was recorded by Chelsea, with an average value of
$P(p_1>p_{-1})$ equal to 0.818. For comparison, the average value of this probability was 0.833 for Arsenal and
0.892 for Liverpool. The evolution of the parameters for Arsenal is presented in Figure 3 and Figure 4.
Team $C_i=-1$ $C_i=0$ $C_i=1$ Sum $P(p_1>p_{-1})$
Arsenal 5 4 10 19 0.895
Aston Villa 5 6 8 19 0.788
Bournemouth 8 5 6 19 0.304
Crystal Palace 9 4 6 19 0.227
Everton 8 3 8 19 0.500
Chelsea 6 5 8 19 0.696
Leicester 6 6 7 19 0.605
Liverpool 6 5 8 19 0.696
Man City 5 3 11 19 0.928
Man United 4 5 10 19 0.941
Newcastle* 2 4 13 19 0.998
Norwich 5 4 10 19 0.895
Southampton 7 0 12 19 0.868
Stoke 6 3 10 19 0.834
Sunderland 6 1 12 19 0.916
Swansea* 5 2 12 19 0.952
Tottenham 5 8 6 19 0.613
Watford 4 8 7 19 0.806
West Brom 9 3 7 19 0.315
West Ham 9 1 9 19 0.500
Table 2: Results for the 2015/2016 season
Figure 1: Evolution of $P(p_1>p_{-1})$ for Liverpool.
Figure 2: Evolution of Bayesian estimate and symmetric 95% confidence interval for $p_1$ for Liverpool.
Figure 3: Evolution of $P(p_1>p_{-1})$ for Arsenal.
Figure 4: Evolution of Bayesian estimate and symmetric 95% confidence interval for $p_1$ for Arsenal.
The evolution of $P(p_1>p_{-1})$ for all teams that played at least once between the 2012/2013 season and the
2015/2016 season is presented in Table 3. Bold font is used for those results where it is possible to accept
the hypothesis that home team advantage exists (i.e. values of at least 0.95). Norwich in the 2013/2014 season
is a good example that the home team advantage does not ensure good results. It only ensures that results on
the home field are better than on the away field, but both can be losses. Norwich in the 2013/2014 season
recorded $C_i=-1$ three times, $C_i=0$ once, and $C_i=1$ 15 times. For example, Norwich lost 0–1 to
Manchester United on the home field and 0–4 in Manchester. Obviously, 0–1 is a better result than 0–4, and
therefore $C_i=1$ in this case, as described in Definition 3. In this sense, home team advantage can in fact be
called away field disadvantage.
The last presented results are the extreme values obtained in all seasons. The five lowest values of
$P(p_1>p_{-1})$ are presented in Table 4 and the five highest values in Table 5. These tables also contain the
numbers of cases where the combined measure of home team advantage ($C_i$) took the value $-1$, $0$, or $1$
in the given season. It can be seen that $P(p_1>p_{-1})$ is in many cases close to 1 but is usually far from 0.
4 Discussion
The methods were demonstrated on English Premier League data from the 1992/1993 season to the 2015/2016
season. Each team was tested in each season to identify whether it is possible to accept the hypothesis about
the home team advantage. The results are diverse – from two teams with the home team advantage in the
2015/2016 season to 17 teams in the 2009/2010 season – and show no clear trend. Full results for the
2015/2016 season were presented along with $P(p_1>p_{-1})$ (i.e. the probability that the probability of home
team advantage is higher than the probability of away team advantage), which can be used as a measure of
the home team advantage: the higher the value, the higher the home team advantage. In the 2015/2016 season
only Swansea and Newcastle had this probability over 0.95, and the hypothesis about an existing home team
advantage can be accepted for them.
Team Season
12/13 13/14 14/15 15/16
Arsenal 0.895 0.962 0.975 0.895
Aston Villa 0.760 0.685 0.820 0.788
Bournemouth — — — 0.304
Burnley — — 0.685 —
Cardiff — 0.928 — —
Chelsea 0.834 0.849 0.849 0.696
Crystal Palace — 0.696 0.212 0.227
Everton 0.994 0.928 0.849 0.500
Fulham 0.760 0.962 — —
Hull — 0.928 0.928 —
Leicester — — 0.895 0.605
Liverpool 0.773 0.748 0.834 0.696
Man City 0.952 1.000 0.820 0.928
Man United 0.867 0.500 0.996 0.941
Newcastle 0.760 0.820 0.975 0.998
Norwich 0.994 0.998 — 0.895
QPR 0.500 — 0.996 —
Reading 0.788 — — —
Southampton 0.788 0.849 0.881 0.868
Stoke 0.773 0.975 0.916 0.834
Sunderland 0.773 0.500 0.402 0.916
Swansea 0.788 0.928 0.867 0.952
Tottenham 0.500 0.895 0.928 0.613
Watford — — — 0.806
West Brom 0.928 0.941 0.676 0.315
West Ham 0.994 0.788 0.952 0.500
Wigan 0.304 — — —
Table 3: Evolution of $P(p_1>p_{-1})$ for all teams in the seasons 2012/13–2015/16.
Team Season $P(p_1>p_{-1})$ $C_i=-1$ $C_i=0$ $C_i=1$
Hull 2008/09 0.038 11 4 4
Norwich 1993/94 0.072 11 5 5
Blackburn 2003/04 0.166 10 3 6
Wolves 2011/12 0.166 10 3 6
Crystal Palace 1997/98 0.180 11 1 7
Table 4: Five lowest obtained values of $P(p_1>p_{-1})$.
Team Season $P(p_1>p_{-1})$ $C_i=-1$ $C_i=0$ $C_i=1$
Blackburn 2009/10 0.99999 0 4 15
Leeds 1992/93 0.99998 1 2 18
West Ham 1997/98 0.99998 1 0 18
Arsenal 1997/98 0.99993 1 2 16
Bolton 2005/06 0.99993 1 2 16
Table 5: Five highest obtained values of $P(p_1>p_{-1})$ (more decimal places of estimates are shown only for
illustration, all results can be considered as equivalent).
Since the 1992/1993 season, only seven teams have played in all seasons of the English Premier League. Among
these teams, Liverpool had the highest home team advantage and Chelsea had the lowest. It should be
recalled that the home team advantage means that a result on the home field is better than on the away field,
and both results can be losses. Therefore, the home team advantage does not imply good results. In fact, home
team advantage can also be called away field disadvantage.
Across all teams and all seasons, the lowest value of $P(p_1>p_{-1})$ was obtained for Hull in the
2008/2009 season. This probability was 0.038, and it is based on the observation that, out of 19 opponents,
Hull recorded a better result on the away field against 11 of them. At the other extreme is Blackburn in the
2009/2010 season, with the highest recorded value of $P(p_1>p_{-1})$. Out of 19 opponents, Blackburn played
better on the home field in 15 cases, and in 4 cases there was no advantage on either side.
5 Conclusion
This paper offers an alternative approach to the identification of home team advantage in results. The new
method is based on goals scored rather than on points awarded. This makes it possible to distinguish matches
that look identical when points are used; for example, a 0–2 loss is not as bad as a 1–5 loss. Three measures of
home team advantage were defined: active, passive, and their combination. Then, the Bayesian estimator and
confidence intervals for the probabilities of the corresponding states – home team advantage, no advantage,
and away team advantage – were found. The last theoretical part contains a test of the home team advantage.
The new method was demonstrated on the English Premier League, and the results suggest that home team
advantage is real; however, it cannot be taken for granted.
Acknowledgement
This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports.
References
Allen, M. S. & Jones, M. V. (2014), ‘The home advantage over the first 20 seasons of the English Premier
League: Effects of shirt colour, team ability and time trends.’, International Journal Of Sport And Exercise
Psychology 12(1), 10–18.
Dixon, M. J. & Coles, S. G. (1997), ‘Modelling Association Football Scores and Inefficiencies in the Football
Betting Market’, Journal of the Royal Statistical Society. Series C (Applied Statistics) 46(2), 265–280.
England Football Results and Betting Odds (2017), ‘Premiership Results & Betting Odds’.
.
Karlis, D. & Ntzoufras, I. (2003), ‘Analysis of sports data by using bivariate Poisson models’, The Statistician
52(3), 381–393.
Maher, M. J. (1982), ‘Modelling association football scores’, Statistica Neerlandica 36(3), 109–118.
Marek, P., Šedivá, B. & Ťoupal, T. (2014), ‘Modeling and prediction of ice hockey match results’, Journal
of Quantitative Analysis in Sports 10(3), 357–365.
Omar, M. H. & Joarder, A. H. (2012), ‘Some Mathematical Characteristics of the Beta Density Function of
Two Variables’, Bulletin of the Malaysian Mathematical Sciences Society 35(4), 923–933.
Pitman, J. (1993), Probability, 1 edn, Springer.
Pollard, R. & Pollard, G. (2005), ‘Long-term trends in home advantage in professional team sports in North
America and England (1876–2003)’, Journal of Sports Sciences 23(4), 337–350.
Premier League Football News, Fixtures, Scores & Results (2017), ‘Premier League Football Scores, Results
& Season Archives’. .