Article

A birth process model for association football matches

Abstract

Data from over 4000 recent association football (soccer) matches from the main English competitions show clear evidence that the rate of scoring goals changes over the course of a match. This rate tends to increase over the game but is also influenced by the current score. We develop a model for a soccer match that incorporates parameters for both the attacking and the defensive strength of a team, home advantage, the current score and the time left to play. This model treats the number of goals scored by the two teams as interacting birth processes and shows a satisfactory fit to the data. We also investigate football cliches and find evidence that contradicts the cliche that a team is more vulnerable just after it has scored a goal. Our model has applications in the football spread betting market, where prices are updated during a match, and may be useful to both bookmakers and bettors.
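The abstract's idea of two interacting birth processes, with scoring rates that depend on the current score and on elapsed time, can be illustrated by a small simulation. The sketch below is not the authors' fitted model: the functional form of `rate`, the constants `TREND`, `LEAD_MULT` and `TRAIL_MULT`, and the base rates are all hypothetical values chosen for illustration. It uses thinning (proposing candidate goal times from a constant upper-bound rate and accepting each with probability proportional to the current intensity).

```python
import random

# Hypothetical illustration constants (NOT estimates from the paper).
TREND = 0.3        # scoring rate is 30% higher at minute 90 than at kick-off
LEAD_MULT = 0.9    # multiplier applied to a team that is currently leading
TRAIL_MULT = 1.1   # multiplier applied to a team that is currently trailing


def rate(base, t, own, opp):
    """Assumed goals-per-minute intensity given time t and the current score."""
    time_factor = 1.0 + TREND * t / 90.0
    if own > opp:
        score_factor = LEAD_MULT
    elif own < opp:
        score_factor = TRAIL_MULT
    else:
        score_factor = 1.0
    return base * time_factor * score_factor


def simulate_match(base_home=1.5 / 90, base_away=1.1 / 90, minutes=90.0, rng=None):
    """Simulate one match by thinning; returns (home_goals, away_goals, goal_list)."""
    rng = rng or random.Random()
    # Upper bound on the combined intensity over the whole match.
    bound = (base_home + base_away) * (1.0 + TREND) * max(1.0, LEAD_MULT, TRAIL_MULT)
    t, x, y = 0.0, 0, 0
    goals = []
    while True:
        t += rng.expovariate(bound)          # next candidate event time
        if t >= minutes:
            break
        lam_home = rate(base_home, t, x, y)  # rates depend on the current score
        lam_away = rate(base_away, t, y, x)
        u = rng.random() * bound
        if u < lam_home:
            x += 1
            goals.append((t, "home"))
        elif u < lam_home + lam_away:
            y += 1
            goals.append((t, "away"))
        # otherwise the candidate is rejected and no goal is recorded
    return x, y, goals
```

Calling `simulate_match(rng=random.Random(1))` returns one simulated scoreline with its goal times; averaging over many runs gives Monte Carlo estimates of outcome probabilities under these assumed rates.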

... Prior work by Dixon and Robinson [10] models how the rate of scoring goals changes over the course of a match. Their model incorporates parameters for both the attacking and the defensive strength of a team, home advantage, the current score and the time left to play. ...
... We compare how often teams took our optimised action in the real world (based on the two different approaches suggested) and, if not, evaluate how much our action suggestion would have boosted the team's in-game chances of moving to a more positive state and of winning the game (footnote 10). We first test the action payoff model discussed in Section 5.2, which uses the state transition probability, substitution and the time of the game to calculate the payoff of the given substitute. By so doing, our model (tested in the previous subsection) predicts the next state with an average accuracy of 95.5% (standard deviation of 4.5%), again tested using a 70%/30% train-test split and 10-fold cross-validation. ...
... This shows that the changes in tactics that are made in a game can have an impact on the overall outcome and help teams to move into more positive states, or stay in the current state if a team is winning a game. By using the stochastic game payoffs we can optimise the efficiency of these decisions by 3.4%, which could make a significant difference to a team across a season in a game such as football, where every marginal gain counts. (Footnote 10: We do not have data for the players that are included as substitutes, so we consider all squad players instead of just the 7 substitutes, which impacts our accuracy.) ...
Preprint
In this paper we present a novel approach to optimise tactical and strategic decision making in football (soccer). We model the game of football as a multi-stage game which is made up of a Bayesian game to model the pre-match decisions and a stochastic game to model the in-match state transitions and decisions. Using this formulation, we propose a method to predict the probability of game outcomes and the payoffs of team actions. Building upon this, we develop algorithms to optimise team formation and in-game tactics with different objectives. Empirical evaluation of our approach on real-world datasets from 760 matches shows that by using optimised tactics from our Bayesian and stochastic games, we can increase a team's chances of winning by up to 16.1% and 3.4% respectively.
... In spread betting markets, it is crucial to predict the subsequent score conditional on the current score at any time during the match between specified teams. To this end, Dixon and Robinson (1998) developed a birth process model. The processes of goal times of home and away teams are taken to be two nonhomogeneous Poisson processes, which means that more than one goal in any time interval is permitted. ...
... The processes of goal times of home and away teams are taken to be two nonhomogeneous Poisson processes, which means that more than one goal in any time interval is permitted. Later, Volf (2009) proposed a random point process model similar to Dixon and Robinson (1998) by considering the effect of covariates. However, according to the historical record, no more than one goal has been scored in any single minute, except in the time intervals (44,45] and (89,90], which include injury time. ...
... .} at the 90th minute, which is obtained by integrating over all possible times and all possible routes to arrive at a state (x, y). Therefore, the heavy computation makes direct calculation infeasible (Dixon and Robinson, 1998). Based on the discrete-time and finite-state Markov chain model, we derive a recursive algorithm that makes direct calculation feasible. ...
Article
A birth process model proposed by Dixon and Robinson (1998) has been widely used in the football spread betting market. However, multiple goals in a minute are permitted in the model, which does not conform to the historical record. Moreover, it is difficult to calculate the outcome probability of the process accurately. The paper presents a discrete-time and finite-state Markov chain model for real-time forecasting of football matches, and a recursive algorithm is derived to calculate the outcome probability accurately. An empirical study shows that the proposed model outperforms the models of Dixon and Robinson (1998) and Dixon and Coles (1997).
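The recursive calculation described in this abstract can be sketched as a forward pass over a minute-by-minute Markov chain in which at most one goal is scored per minute. This is a minimal illustration, not the cited paper's algorithm: the per-minute goal probabilities `p_home` and `p_away` are assumed constant here (the cited models let them vary with time and score), and the special treatment of the injury-time intervals is ignored.

```python
def score_distribution(p_home, p_away, minute=0, score=(0, 0), end=90):
    """Distribution of the final score, starting from `score` at `minute`.

    Each remaining minute has exactly one of three outcomes: a home goal
    (probability p_home), an away goal (p_away), or no goal.
    """
    if p_home < 0 or p_away < 0 or p_home + p_away > 1:
        raise ValueError("p_home and p_away must be non-negative and sum to at most 1")
    p_none = 1.0 - p_home - p_away
    dist = {score: 1.0}
    for _ in range(minute, end):
        nxt = {}
        for (x, y), p in dist.items():
            # Push the probability mass of state (x, y) one minute forward.
            for state, w in (((x, y), p_none), ((x + 1, y), p_home), ((x, y + 1), p_away)):
                nxt[state] = nxt.get(state, 0.0) + p * w
        dist = nxt
    return dist


def outcome_probabilities(dist):
    """Collapse a final-score distribution into (home win, draw, away win)."""
    home = sum(p for (x, y), p in dist.items() if x > y)
    draw = sum(p for (x, y), p in dist.items() if x == y)
    away = sum(p for (x, y), p in dist.items() if x < y)
    return home, draw, away
```

For an in-play forecast, pass the current minute and score, for example `outcome_probabilities(score_distribution(0.016, 0.012, minute=60, score=(1, 0)))`. Because the state space is finite and each step is a simple sum, the outcome probabilities are computed exactly rather than by integration or simulation.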
... However, among the huge amount of literature published on prediction of association football games, only a few papers concentrated on the in-play prediction. Dixon et al. [20] developed a pure birth process model, where the processes of goal times of home and away teams are taken to be two nonhomogeneous Poisson processes. In attempting to align themselves with practical circumstances, Zou et al. [21] proposed a discrete-time and finite-state Markov chain model that is grounded within the Poisson processes, where no more than one goal in a minute had happened, except during time intervals (44,45] and (89,90] in consideration of injury time, and a recursive algorithm was derived to accurately calculate the outcome probability. ...
... Before introducing how to update the teams' strength by the in-match information, we firstly summarize the model of Dixon and Robinson (1998). It is the basic model for the calibration of the teams' ability parameters, which is also named as the pure birth process model. ...
Article
Full-text available
Point process models have made a significant contribution to the prediction of association football outcomes. Conventionally, defence and attack capabilities have been assumed to be constant during a match and estimated against the average performance of all other teams in history. Drawing upon a Bayesian method, this paper proposes a dynamic strength model which relaxes the assumption of constant team strengths and allows in-match performance information to be used to calibrate them. An empirical study demonstrates that although the Bayesian model fails to achieve improvement in goal difference prediction, it registers clear gains in the prediction of the total number of goals and of the Win/Draw/Loss outcome. When the Bayesian model bets against the SBOBet bookmaker, one of the most popular gaming companies among Asian handicap fans, whose odds data were obtained from both the Win/Draw/Loss market and the over-under market, it may obtain positive returns; this clearly contrasts with the process model with constant strengths, which fails to win money from the bookmaker.
... Distribution of in-play goal times has been studied by Dixon and Robinson (1998) who applied a state-dependent Poisson model where the goal intensities of the teams depend on the current score and time. The model also accounts for other factors such as home effect and injury time. ...
... 1/ √ 90min. The fact that implied goal intensities are increasing during the game is consistent with findings of Dixon and Robinson (1998) who found gradual increase of scoring rates by analysing goal times of 4012 matches between 1993 and 1996. ...
Preprint
A risk-neutral valuation framework is developed for pricing and hedging in-play football bets based on modelling scores by independent Poisson processes with constant intensities. The Fundamental Theorems of Asset Pricing are applied to this set-up which enables us to derive novel arbitrage-free valuation formulæ for contracts currently traded in the market. We also describe how to calibrate the model to the market and how trades can be replicated and hedged.
... Similarly, Groll and Abedieh (2013) and Groll et al. (2015) show that, up to a certain amount, the scores' dependence on two competing teams may be explained by the inclusion of some specific teams' covariates in the linear predictors. However, Dixon and Robinson (1998) note that modelling the dependence along a single match is possible: in such a case, a temporal structure in the 90 minutes is required. ...
... Baio and Blangiardo (2010) and Dixon and Coles (1997) assume that these team-specific effects do not vary over time, and this represents a major limitation in their models. In fact, Dixon and Robinson (1998) show that the attack and defence effects are not static and may even vary during a single match; thus, a static assumption is often not reliable for making predictions and represents a crude approximation of reality. Rue and Salvesen (2000) propose a generalised linear Bayesian model in which the team-effects at match time τ are drawn from a Normal distribution centred at the team-effects at match time τ − 1, and with a variance term depending on the time difference. ...
Article
Full-text available
Modelling football outcomes has gained increasing attention, in large part due to the potential for making substantial profits. Despite the strong connection existing between football models and the bookmakers' betting odds, no authors have used the latter for improving the fit and the predictive accuracy of these models. We have developed a hierarchical Bayesian Poisson model in which the scoring rates of the teams are convex combinations of parameters estimated from historical data and the additional source of the betting odds. We apply our analysis to a nine-year dataset of the most popular European leagues in order to predict match outcomes for their tenth seasons. In this paper, we provide numerical and graphical checks for our model.
... "Moreover, once a goal is scored, another goal becomes more and more likely whether the goal was scored or conceded" (Nevo and Ritov, 2012). Also, Dixon and Robinson (1998) argued that the expectation of a goal depends on the current score, and that when an early away goal is scored, the expectation of further goals increases (beyond the original pre-match expectation), with a bias towards the home team having their goal expectation increased. A recent study conducted by Pratas et al. (2016), which used proportional-hazards regression models with time-dependent covariates, allowed for the identification of performance indicators (that is, goal difference, shots on goal, disciplinary sanctions and substitutions) that influence the time at which the first goal is scored in high-level football matches. ...
... The probability of a goal being scored is certainly dependent on the current score and there are variables which increase goal expectation, such as shots on goal (Pratas et al., 2016), an early red card (Bar-Eli et al., 2006) and an early away goal (Dixon and Robinson, 1998). These findings support the suggestion that in order to predict future performance and outcome based on past performances, it is not enough to analyse merely what happened in a match, but it is also important to know when events occurred. ...
Article
Full-text available
The aim of this paper is to review the available literature on goal scoring in elite male football leagues. A systematic search of two electronic databases (SPORTDiscus with Full Text and ISI Web Knowledge All Databases) was conducted and of the 610 studies initially identified, 19 were fully analysed. Studies that fitted all the inclusion criteria were organised according to the research approach adopted (static or dynamic). The majority of these studies were conducted in accordance with the static approach (n=15), where the data were collected without considering the dynamics of performance during matches and were analysed using standard statistical methods. They focused predominantly on a description of key performance indicators (technical and tactical). Meanwhile, in a few studies the dynamic approach (n=4) was adopted, where performance variables were recorded taking into account the chronological and sequential order in which they occurred. Different advanced analysis techniques for assessing performance evolution over time during the match were used in this second group of studies. The strengths and limitations of both approaches in terms of providing meaningful information for coaches are discussed in the present study.
... We therefore sought to analyse the influence of match venue on the number of tries scored by players in different positions. Earlier data show that the probability of scoring points is not constant over the course of a match and tends to increase towards the end of the match, both in rugby (Conquet, 1995) and in football (Dixon & Robinson, 1998). We sought to confirm this phenomenon and to quantify it as a function of match venue. ...
... A progressive increase in the number of tries is observed from the start to the end of the match, identical whatever the match venue. These data confirm the observations of Conquet (1995) and Dixon & Robinson (1998). This phenomenon remains difficult to explain. ...
... Similarly, Groll and Abedieh (2013) and Groll et al. (2015) show that, up to a certain amount, the scores' dependence on two competing teams may be explained by the inclusion of some specific team covariates in the linear predictors. However, Dixon and Robinson (1998) note that modelling the dependence along a single match is possible: in such a case, a temporal structure in the 90 minutes is required. ...
... Baio and Blangiardo (2010) and Dixon and Coles (1997) assume that these team-specific effects do not vary over the time, and this represents a major limitation in their models. In fact, Dixon and Robinson (1998) show that the attack and defence effects are not static and may even vary during a single match; thus, a static assumption is often not reliable for making predictions and represents a crude approximation of the reality. Rue and Salvesen (2000) propose a generalised linear Bayesian model in which the team-effects at match time τ are drawn from a Normal distribution centred at the team-effects at match time τ − 1, and with a variance term depending on the time difference. ...
Article
Full-text available
Modelling football outcomes has gained increasing attention, in large part due to the potential for making substantial profits. Despite the strong connection existing between football models and the bookmakers’ betting odds, no authors have used the latter for improving the fit and the predictive accuracy of these models. We have developed a hierarchical Bayesian Poisson model in which the scoring rates of the teams are convex combinations of parameters estimated from historical data and the additional source of the betting odds. We apply our analysis to a nine-year dataset of the most popular European leagues in order to predict match outcomes for their tenth seasons. In this article, we provide numerical and graphical checks for our model.
... The first aspect of forecasting is statistical and related to developing team ratings and forecasting models with the best possible ability to derive forecasts from obvious predictors such as prior match results. One of the most prominent approaches is to estimate offensive and defensive strength parameters of the teams and use these as inputs for probability models including Poisson models (Koopman and Lit 2015; Maher 1982), birth process models (Dixon and Robinson 1998) and Weibull count models (Boshnakov et al. 2017). Other researchers have used regression models based on one or various covariates such as Hvattum and Arntzen (2010) using ELO ratings in combination with an ordered logit regression model or Goddard and Asimakopoulos (2004) using various covariates in an ordered probit regression model. ...
... In fact, some researchers have put thoughts to the scoring processes during the course of the match in more detail. Dixon and Robinson (1998) use a birth process model allowing scoring intensities to change during the match and depend on the score to analyse the deviations from constant scoring rates. Similarly, Heuer and Rubner (2012) use a model-free statistical analysis to investigate in which match situations scoring intensities deviate from a constant rate. ...
Article
Full-text available
Data-related analysis in football increasingly benefits from Big Data approaches and machine learning methods. One relevant application of data analysis in football is forecasting, which relies on understanding and accurately modelling the process of a match. The present paper tackles two neglected facets of forecasting in football: Forecasts on the total number of goals and in-play forecasting (forecasts based on within-match information). Sentiment analysis techniques were used to extract the information reflected in almost two million tweets from more than 400 Premier League matches. By means of wordclouds and timely analysis of several tweet-based features, the Twitter communication over the full course of matches and shortly before and after goals was visualized and systematically analysed. Moreover, several forecasting models including a random forest model have been used to obtain in-play forecasts. Results suggest that in-play forecasting of goals is highly challenging, and in-play information does not improve forecasting accuracy. An additional analysis of goals from more than 30,000 matches from the main European football leagues supports the notion that the predictive value of in-play information is highly limited compared to pre-game information. This is a relevant result for coaches, match analysts and broadcasters who should not overestimate the value of in-play information. The present study also sheds light on how the perception and behaviour of Twitter users change over the course of a football match. A main result is that the sentiment of Twitter users decreases when the match progresses, which might be caused by an unjustified high expectation of football fans before the match.
... One significant improvement on Maher was the use of home team advantage, which is discussed in Clarke & Norman (1995), where the value of home advantage is calculated. Other improvements on these models are shown in Dixon & Robinson (1998) and Crowder, Dixon, Ledford, & Robinson (2002). ...
Article
Full-text available
In this paper, we critically evaluate the performance of nine machine learning classification techniques when applied to the match outcome prediction problem presented by American Football. Specifically, we implement and test nine techniques using real-world datasets of 1280 games over 5 seasons from the National Football League (NFL). We test the nine different classifier techniques using a total of 42 features for each team and we find that the best performing algorithms are able to improve on previously published work. The algorithms achieve accuracies ranging from 44.64% for a Gaussian Process classifier to 67.53% for a Naïve Bayes classifier. We also test each classifier on a year-by-year basis and compare our results to those of the bookmakers and other leading academic papers.
... Detailed evidence of its forecast precision in forecasting match results is presented. Finally, another interesting and original contribution in this category is given by Dixon and Robinson (1998) who treat the number of scored goals by the competing teams during a match as interacting birth processes. ...
... Distribution of in-play goal times has been studied by Dixon and Robinson (1998) who applied a state-dependent Poisson model where the goal intensities of the teams depend on the current score and time. The model also accounts for other factors such as home effect and injury time. ...
Article
Full-text available
A risk-neutral valuation framework is developed for pricing and hedging in-play football bets based on modelling scores by independent Poisson processes with constant intensities. The Fundamental Theorems of Asset Pricing are applied to this set-up which enables us to derive novel arbitrage-free valuation formulæ for contracts currently traded in the market. We also describe how to calibrate the model to the market and how trades can be replicated and hedged.
... Thus, it is not necessary to restrict the inputs to smaller sets or apply additional techniques to reduce the dimensions of the predictors' pool. For example, Dixon and Robinson (1998) Other studies consider the odds of home win/draw/away win for football game predictions (see among others, Dixon and Coles, 1997; Crowder et al., 2002; Dobson and Goddard, 2003; Constantinou et al., 2012; and Boshnakov et al., 2017). The number of corner kicks implies offensive pressure and is considered a good proxy for higher scoring probability. ...
Thesis
Full-text available
This thesis consists of four essays exploring quantitative methods for investment analysis. Chapter 1 is an introduction to the topic where the backgrounds, motivations and contributions of the thesis are discussed. This Chapter proposes an expert system paradigm which accommodates the methodology for all four empirical studies presented in Chapters 2 to 5. In Chapter 2 the profitability of technical analysis and Bayesian Statistics in trading the EUR/USD, GBP/USD, and USD/JPY exchange rates are examined. For this purpose, seven thousand eight hundred forty-six technical rules are generated, and their profitability is assessed through a novel data snooping procedure. Then, the most promising rules are combined with a Naïve Bayes (NB), a Relevance Vector Machine (RVM), a Dynamic Model Averaging (DMA), a Dynamic Model Selection (DMS) and a Bayesian regularised Neural Network (BNN) model. The findings show that technical analysis has value in Foreign eXchange (FX) trading, but the profit margins are small. On the other hand, Bayesian Statistics seems to increase the profitability of technical rules up to four times. Chapter 3 introduces the concept of Conditional Fuzzy (CF) inference. The proposed approach is able to deduce Fuzzy Rules (FRs) conditional to a set of restrictions. This conditional rule selection discards weak rules and the generated forecasts are based only on the most powerful ones. In order to achieve this, an RVM is used to extract the most relevant subset of predictors as the CF inputs. Through this process, it is capable of achieving higher forecasting performance and improving the interpretability of the underlying system. The CF concept is applied in a betting application on football games of three main European championships.
CF’s performance in terms of accuracy and profitability over the In-Sample (IS) and Out-Of-Sample (OOS) are benchmarked against the single RVM and an Adaptive Neuro-Fuzzy Inference System (ANFIS) fed with the same CF inputs and an Ordered Probit (OP) fed with the full set of predictors. The results demonstrate that the CF is providing higher statistical accuracy than its benchmarks, while it is offering substantial profits in the designed betting simulation. Chapter 4 proposes the Discrete False Discovery Rate (DFDR+/-) as an approach to compare a large number of hypotheses at the same time. The presented method limits the probability of having lucky findings and accounts for the dependence between candidate models. The performance of this approach is assessed by backtesting the predictive power of technical analysis in stock markets. A pool of twenty-one thousand technical rules is tested for a positive Sharpe ratio. The surviving technical rules are used to construct dynamic portfolios. Twelve categorical and country-specific Morgan Stanley Capital International (MSCI) indexes are examined over ten years (2006-2015). There are three main findings. First, the proposed method has high power in detecting the profitable trading strategies and the time-related anomalies across the chosen financial markets. Second, the emerging and frontier markets are more profitable than the developed markets despite having higher transaction costs. Finally, for a successful portfolio management, it is vital to rebalance the portfolios on a monthly basis or more frequently. Chapter 5 undertakes an extensive investigation of volatility models for six securities in FX, stock index and commodity markets, using daily one-step-ahead forecasts over five years. 
A discrete false discovery controlling procedure is employed to study one thousand five hundred and twelve volatility models from twenty classes of Generalized AutoRegressive Conditional Heteroskedasticity (GARCH), Exponential Weighted Moving Average (EWMA), Stochastic Volatility (SV), and Heterogeneous AutoRegressive (HAR) families. The results indicate significant differences in forecasting conditional variance. The most accurate models vary across the three market categories and depend on the study period and measurement scale. Time-varying means, Integrated GARCH (IGARCH) and SV, as well as fat-tailed innovation distributions are the dominant specifications for the outperforming models compared to three benchmarks of ARCH(1), GARCH(1,1), and the volatility pool’s 90th percentile. Finally, Chapter 6 puts together the main findings from the four essays and presents the concluding remarks.
... It can be understood as a Bernoulli experiment, where the occurrence of each goal has a fixed probability, and events do not influence each other. To a good approximation, the distribution of the number of goals in a match between two teams A and B is taken as the product of two independent Poisson distributions [8,9,10,11,12]. The probability p(k,l) for the match result is given by ...
Preprint
Full-text available
Resilience is the ability to positively respond to adversity. It has been studied in psychology for several decades, with focus on how individuals overcome traumata or cope with setbacks and obstacles in their professional career. Research on resilience in the sport context is rather new. Activities are based on insights that in highly competitive environments, tiny effects tip the scales. A key question of measuring resilience in sports is what parameters to measure. Here a novel concept is proposed to measure the resilience of soccer teams. The frequency of matches is determined, where a soccer team, which is initially trailing by 2 goals, finally succeeds to win the match or at least to reach a draw. The analysis is applied to the last 59 seasons of the German premier soccer league Bundesliga. The empirical data are compared with a theoretical model derived from Poisson distributions. It is shown how leading teams in the premier soccer league differ from the average with respect to resilience, which provides further insights into the hidden secrets of top soccer teams.
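The independent-Poisson approximation mentioned in the excerpt preceding this abstract can be written out as a short calculation. The sketch below is only an illustration of that standard approximation, not the cited paper's code: the function names and the example goal expectations `lam_a` and `lam_b` are hypothetical, and the sums are truncated at `max_goals`.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def result_probability(k, l, lam_a, lam_b):
    """Probability of the result k:l, treating the two teams' goals as independent."""
    return poisson_pmf(k, lam_a) * poisson_pmf(l, lam_b)


def win_draw_loss(lam_a, lam_b, max_goals=15):
    """Approximate (A wins, draw, B wins) by summing over scores up to max_goals each."""
    win = draw = loss = 0.0
    for k in range(max_goals + 1):
        for l in range(max_goals + 1):
            p = result_probability(k, l, lam_a, lam_b)
            if k > l:
                win += p
            elif k == l:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

For example, `win_draw_loss(1.6, 1.2)` gives the three outcome probabilities for assumed goal expectations of 1.6 and 1.2. Unlike the birth process model, this approximation ignores any dependence of scoring rates on the current score or on time.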
... Angelini & De Angelis, 2017; Baio & Blangiardo, 2010; M. Dixon & Robinson, 1998; Oberstone, 2009). Predictors which have been used in the previous literature, but were not included here include the number of: corners won (Andersson et al., 2009), shots on target ...
Article
Gamblers are frequently reminded to “gamble responsibly.” But these qualitative reminders come with no quantitative information for gamblers to judge relative product risk in skill-based gambling forms. By comparison, consumers purchasing alcohol are informed of product strength by alcohol by volume (ABV %) or similar labels. This paper uses mixed logistic regression machine learning to uncover the potential variation in soccer betting outcomes. This paper uses data from four bet types and eight seasons of English Premier League soccer, ending in 2018. Outcomes across each bet type were compared using three betting strategies: the most-skilled prediction, a random strategy, and the least-skilled prediction. There was a large spread in betting outcomes, with for example the per-bet average loss varying by a factor of 54 (from 1.1% to 58.9%). Gamblers’ losses were positively correlated with the observable betting odds across all bets, indicating that betting odds are one salient feature which could be used to inform gamblers about product risk. Such large differences in product risk are relevant to the promotion of responsible gambling.
... Therefore, the model was provided with in-game statistics summarising each team's performance over the previous five matches. Based on successful in-game predictors from the previous literature, we chose the cumulative number of points earned, with three for a win, one for a draw and zero for a loss (Goddard & Asimakopoulos, 2004; Goddard, 2005), and the number of goals scored and conceded (Dixon & Robinson, 1998; Oberstone, 2009; Baio & Blangiardo, 2010; Angelini & De Angelis, 2017). Predictors that have been used in the previous literature but were not included here include the number of corners won (Andersson et al., 2009), shots on target (Oberstone, 2009), recent injuries (Constantinou & Fenton, 2017) and disciplinary bookings (Titman et al., 2015). ...
Article
Full-text available
Gamblers are frequently reminded to ‘gamble responsibly’. But these qualitative reminders come with no quantitative information for gamblers to judge relative product risk in skill-based gambling forms. By comparison, consumers purchasing alcohol are informed of product strength by alcohol by volume percentage (ABV%) or similar labels. This paper uses mixed logistic regression machine learning to uncover the potential variation in soccer betting outcomes. This paper uses data from four bet types and eight seasons of English Premier League soccer, ending in 2018. Outcomes across each bet type were compared using three betting strategies: the most-skilled prediction, a random strategy and the least-skilled prediction. There was a large spread in betting outcomes, with, for example, the per-bet average loss varying by a factor of 54 (from 1.1% to 58.9%). Gamblers’ losses were positively correlated with the observable betting odds across all bets, indicating that betting odds are one salient feature that could be used to inform gamblers about product risk. Such large differences in product risk are relevant to the promotion of responsible gambling.
... As for the academic community, the appearance in recent years of a number of scientific studies evidences a gradual increase of scholarly interest in the debates surrounding World Cup issues. Most of these works have dealt with predicting results (e.g., Maher 1982;Dixon and Robinson 1998;Rue and Øyvind Salvesen 2000;Dyte and Clarke 2000) while others have focused specifically on the FIFA rankings (McHale and Davies 2007;Suzuki et al. 2010;Lasek et al. 2013), but mostly studying their predictive power rather than proposing modifications. More recently, Lasek et al. (2016) elaborate strategies to improve a team's position in the FIFA ranking, based on choosing opponents for friendly games so as to maximize the probability of advancing in the ranking. ...
Article
Full-text available
This paper analyzes the procedure used by FIFA up until 2018 to rank national football teams and define by random draw the groups for the initial phase of the World Cup finals. A predictive model is calibrated to form a reference ranking to evaluate the performance of a series of simple changes to that procedure. These proposed modifications are guided by a qualitative and statistical analysis of the FIFA ranking. We then analyze the use of this ranking to determine the groups for the World Cup finals. After enumerating a series of deficiencies in the group assignments for the 2014 World Cup, a mixed integer linear programming model is developed and used to balance the difficulty levels of the groups.
... Somewhat surprisingly given the popularity of in-play betting in football, models for forecasting the results of matches, once the match has begun are rare in the academic literature. Dixon & Robinson (1998) present a paper dealing with in-play forecasting in football and use a bivariate birth process to estimate the hazard (instantaneous scoring rate) of the two teams scoring throughout a match. Titman et al. (2015) present a similar model but allow for both interdependence between the goals scored by the two teams and the yellow and red cards received by the two teams. ...
Article
Match fixing is a growing threat to the integrity of sport, facilitated by new online in-play betting markets sufficiently liquid to allow substantial profits to be made from manipulating an event. Screens to detect a fix employ in-play forecasting models whose predictions are compared in real-time with observed betting odds on websites around the world. Suspicions arise where model odds and market odds diverge. We provide real examples of monitoring for football and tennis matches and describe how suspicious matches are investigated by analysts before a final assessment of how likely it was that a fix took place is made. Results from monitoring driven by this application of forensic statistics have been accepted as primary evidence at cases in the Court of Arbitration for Sport, leading more sports outside football and tennis to adopt this approach to detecting and preventing manipulation.
... For example, Dixon and Robinson (1998), Oberstone (2009) and Angelini and De Angelis (2017) use the number of goals scored in a match to improve forecasting accuracy of the final football outcome. Other studies consider the odds of home win/draw/away for football game predictions (see amongst others, Dixon and Coles, 1997; Crowder et al., 2002; Dobson and Goddard, 2003; Constantinou et al., 2012; and Boshnakov et al., 2017). ...
Article
This study introduces a Conditional Fuzzy inference (CF) approach in forecasting. The proposed approach is able to deduct Fuzzy Rules (FRs) conditional on a set of restrictions. This conditional rule selection discards weak rules and the generated forecasts are based only on the most powerful ones. Through this process, it is capable of achieving higher forecasting performance and improving the interpretability of the underlying system. The CF concept is applied in a series of forecasting exercises on stocks and football games datasets. Its performance is benchmarked against a Relevance Vector Machine (RVM), an Adaptive Neuro-Fuzzy Inference System (ANFIS), an Ordered Probit (OP), a Multilayer Perceptron Neural Network (MLP), a k-Nearest Neighbour (k-NN), a Decision Tree (DT) and a Support Vector Machine (SVM) model. The results demonstrate that the CF is providing higher statistical accuracy than its benchmarks.
... However, empirical studies showed that this is rather questionable. In particular, goals are more likely to be scored at the end of each half because of players' tiredness, see for example Dixon and Robinson (1998, Figure 1). ...
Article
Full-text available
A new alternative to the standard Poisson regression model for count data is suggested. This new family of models is based on discrete distributions derived from renewal processes, i.e., distributions of the number of events by some time t. Unlike the Poisson model, these models have, in general, time-dependent hazard functions. Any survival distribution can be used to describe the inter-arrival times between events, which gives a rich class of count processes with great flexibility for modelling both underdispersed and overdispersed data. The R package Countr provides a function, renewalCount(), for fitting renewal count regression models and methods for working with the fitted models. The interface is designed to mimic the glm() interface and standard methods for model exploration, diagnosis and prediction are implemented. Package Countr implements state-of-the-art recently developed methods for fast computation of the count probabilities. The package functionalities are illustrated using several datasets.
... The benchmark models used in this work were chosen because of their wide popularity among football fans in Brazil, despite the availability of several other models in the literature. Among them, we can cite those that model the match as a stochastic process evolving in time (Dixon and Robinson, 1998;Volf, 2009;Titman et al., 2015), those allowing for the team performance parameters to change along the season (Rue and Salvesen, 2000;Crowder et al., 2002;Owen, 2011;Koopman and Lit, 2015) and those modeling dependence between number of goals by means of bivariate count distributions (Dixon and Coles, 1997;Karlis and Ntzoufras, 2003;Scarf, 2007, 2011). Contrary to the multinomial models we proposed, some of these approaches are able to answer several questions, for instance, they can estimate teams' performance parameters allowing to rank the teams according to their offensive and defensive qualities, and can also predict the number of goals scored in a particular match. ...
Article
Full-text available
We propose two Bayesian multinomial-Dirichlet models to predict the final outcome of football (soccer) matches and compare them to three well-known models regarding their predictive power. All the models predicted the full-time results of 1710 matches of the first division of the Brazilian football championship and the comparison used three proper scoring rules, the proportion of errors and a calibration assessment. We also provide a goodness of fit measure. Our results show that multinomial-Dirichlet models are not only competitive with standard approaches, but they are also well calibrated and present reasonable goodness of fit.
... Soccer's rigid league structures and well-defined promotion, relegation, tournament qualification and winning targets also mean it is especially amenable to the kind of analysis considered here. Further, our model abstracts from key features of soccer matches such as the fact that goal-scoring patterns are not very well understood (Kuper and Szymanski, 2014) amid other stylised empirical facts (Dixon and Robinson, 1998). ...
... In-play models of football are surprisingly scarce in the academic literature. Dixon and Robinson (1998) presented a birth-process model for estimating scoring rates during a game, and Titman et al. (2015) proposed a multivariate counting process for modeling both goals and cards. Here, we adapt the process used by many in the bookmaking industry to generate in-play match predictions. ...
Article
By modeling minute‐by‐minute television audience figures from English Premier League soccer matches, with close to 50,000 minute‐observations, we show that demand is partly driven by suspense and surprise. We also identify an additional relevant factor of appeal to audiences, namely shock, which refers to the difference between pre‐match and current game outcome probabilities. Suspense, surprise, and shock remain significant in the presence of a traditional measure of outcome uncertainty. (JEL C23, D12, L82, L83, Z20)
... Reep et al. (1971) used a negative binomial distribution to model the aggregate goal counts, before Maher (1982) used independent Poisson distributions to capture the goals scored by competing teams on a game by game basis. Dixon and Coles (1997) also used the Poisson distribution to model scores, however they departed from the assumption of independence; the model is extended in Dixon and Robinson (1998). The model of Dixon and Coles (1997) is also built upon in Ntzoufras (2000, 2003) who inflate the probability of a draw. ...
Article
We consider the task of determining a soccer player's ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian inference approach that centres on variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010), which captures a team's scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given fixture or not in the 2014/2015 season.
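The over/under 2.5 goals prediction mentioned above can be illustrated with a minimal sketch. It assumes the simplest setting, two independent Poisson goal counts in the spirit of Maher (1982), not the hierarchical model of the abstract; the function name and the example scoring rates are hypothetical.

```python
import math


def prob_over_2_5(home_rate: float, away_rate: float) -> float:
    """P(total goals >= 3) when both scores are independent Poisson counts.

    home_rate and away_rate are expected goals per match (illustrative inputs).
    """
    # The sum of two independent Poisson variables is Poisson with the summed mean.
    mu = home_rate + away_rate
    # P(total <= 2) = exp(-mu) * (1 + mu + mu^2 / 2)
    p_at_most_two = math.exp(-mu) * (1.0 + mu + mu * mu / 2.0)
    return 1.0 - p_at_most_two


if __name__ == "__main__":
    print(round(prob_over_2_5(1.5, 1.0), 4))
```

Any dependence between the two scores, or a scoring rate that changes during the match, would break the closed form used here and call for one of the richer models cited in this list.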
... The effect of the home advantage has to be taken out, see also below. One might naturally expect that the team strength is estimated based on the previous results of team A, see, e.g., [5,6,7,8]. As intensively discussed in this work, the estimation of S a may be also based on different performance indicators rather than on previous goals. ...
Preprint
A myriad of different data are generated to characterize a soccer match. Here we discuss which performance indicators are particularly helpful to forecast the future results of a team via an estimation of the underlying team strengths with minimum statistical uncertainty. We introduce an appropriate statistical framework and exemplify it for different performance indicators for the German premier soccer league. Two aspects are involved: (i) It is quantified how well the estimation process would work if no statistical noise due to finite information is present. The related score directly expresses to which degree the chosen performance indicator reflects the underlying team strength. (ii) Additionally, the reduction of the forecasting quality due to statistical noise is determined. From both pieces of information a normalized value can be constructed which is a direct measure of the overall forecasting quality. It turns out that the so-called packing rate works best. New perspectives of performance indicators enter when trying to understand the outcome of single matches based on match-specific observations from the same match. Implications for the purpose of forecasting as well as consequences for the interpretation of team strengths are discussed.
... In particular, taking the goals from previous matches for forecasting purposes, one may expect that the random contributions are very high due to the small number of goals in a soccer match. Thus, older approaches, based on previous match results [10,11,12,13] have a limited predictability [14]. For other performance indicators, displaying a larger number of occurrences (chances for goals, number of passes, ...) it is likely that the random effects are less disturbing [14,15,16,17]. ...
Preprint
The forecasting of sports events is of broad interest from the applied but also from the theoretical perspective. In this work the question is addressed for the example of the German soccer Bundesliga how a theoretically optimum forecast of the goal difference of a match can be characterized. This involves a careful analysis of the random contributions in a match and its disentanglement from the informative contributions, resulting from the individual team strengths. An important aspect is the consideration of the time dependence of the team strength which turns out to mainly fluctuate around a team-specific value during the course of a season. Two types of time-dependent properties have to be distinguished, one being uncorrelated between different match-days, the other being correlated and thus accessible by an appropriate correlation analysis. For some performance indicators, which may be used to estimate the team strength, the quality of the respective forecast is compared to the theoretical optimum. Knowledge of the informative contribution allows one to conclude that the offensive team strength is more important than the defensive team strength for the final success.
... Therefore, association football is particularly interested in performance management and there is already an established academic literature albeit one that has tended to focus upon purely statistical aspects (see e.g. Maher, 1982;Dixon and Coles, 1997; Dixon and Robinson, 1998). As we demonstrate below football's transparent tournament structures and extreme competitiveness also make it especially amenable to a quantitative treatment. ...
Article
Motivated by excessive managerial pressure and sackings, together with associated questions over the inefficient use of scarce resources, we explore realistic performance expectations in association football. Our aim is to improve management quality by accounting for information asymmetry. Results highlight uncertainty caused both by football's low-scoring nature and the intensity of the competition. At a deeper level we show that fans and journalists are prone to underestimate uncertainties associated with individual matches. Further, we quantify reasonable expectations in the face of unevenly distributed resources. In line with the statactivist approach we call for more rounded assessments to be made once the underlying uncertainties are adequately accounted for. Managing fan expectations is probably impossible though the potential for constructive dialogue remains.
... On the other hand, a different dynamic can occur when two specific teams -characterised by specific attack and defence skills -play against each other: for example, a strong team playing against a weak one may score less goals than those expected since it mainly focuses on conserving energy for future more challenging matches. A discussion on the topic is also provided by Dixon and Robinson (1998) and Rue and Salvesen (2000). ...
Preprint
Full-text available
The approaches commonly used to model the number of goals in a football match are characterised by strong assumptions about the dependence between the number of goals scored by the two competing teams and about their marginal distribution. In this work, we argue that the assumptions traditionally made are not always based on solid arguments and sometimes they can be hardly justified. In light of this, we propose a modification of the Dixon and Coles (1997) model by relaxing the assumption of Poisson-distributed marginal variables and by introducing an innovative dependence structure. Specifically, we define the joint distribution of the number of goals scored during a match by means of thoroughly chosen marginal (Mar-) and conditional distributions (-Co). The resulting Mar-Co model is able to balance flexibility and conceptual simplicity. A real data application involving five European leagues suggests that the introduction of the novel dependence structure allows to capture and interpret fundamental league-specific dynamics. In terms of betting performance, the newly introduced Mar-Co model does not perform worse than the Dixon and Coles one in a traditional framework (i.e. 1-X-2 bet) and it outperforms the competing model when a more comprehensive dependence structure is needed (i.e. Under/Over 2.5 bet).
... Models based on the Poisson distribution are a common approach to model exact scores in soccer (Heuer, Müller, & Rubner, 2010;Karlis & Ntzoufras, 2003). Stochastic processes like birth processes have been proposed in American football (Baker & McHale, 2013) and soccer (Dixon & Robinson, 1998). In horse racing an adaptation of random forest classifiers has been used to forecast race outcomes (Lessmann et al., 2010). ...
Article
In the scientific community a large literature on sports forecasting exists, covering a wide range of different sports, methods and research questions. At the same time a lack of general literature such as reviews or meta-analyses on aspects of sports forecasting can be attested, partly attributable to characteristics of forecasting in sports that make it difficult to present through systematic approaches. The present study contributes to filling this gap by providing a narrative review about forecasting related to the outcomes of sports events. An overview about relevant topics in forecasting the outcomes of sports events is presented, a basic methodology is discussed and a categorization of methods is introduced. Having a specific focus on forecasting from ratings, we shed light on the difference between systematic and unsystematic effects influencing the outcomes of sports events. Finally an outlook on the expected impact of the increasing amount and complexity of available data on future sports forecasting research is presented. The present review can serve as a valuable starting point for researchers aiming at the investigation of sports-related forecasts, both helping to find appropriate methods and classify their work in the context of the state of research.
... Few models have been made that incorporate the dynamics of a match itself, taking into account the timing of the goals scored or other events that can happen during a match to influence its outcome. Dixon and Robinson (1998) applied techniques from survival analysis to analyse scoring rates of teams as a function of the number of minutes played of a match. They devised a birth-process model where scoring rates are allowed to vary based on both the amount of time played and the current score. ...
Article
The main goal of this article is to compare the performance of team ratings and individual player ratings when trying to forecast match outcomes in association football. The well-known Elo rating system is used to calculate team ratings, whereas a variant of plus-minus ratings is used to rate individual players. For prediction purposes, two covariates are introduced. The first represents the pre-match difference in Elo ratings of the two teams competing, while the second is the average difference in individual ratings for the players in the starting line-ups of the two teams. Two different statistical models are used to generate forecasts. The first type is an ordered logit regression (OLR) model that directly outputs probabilities for each of the three possible match outcomes, namely home win, draw and away win. The second type is based on competing risk modelling and involves the estimation of scoring rates for the two competing teams. These scoring rates are used to derive match outcome probabilities using discrete event simulation. Both types of models can be used to generate pre-game forecasts, whereas the competing risk models can also be used for in-game predictions. Computational experiments indicate that there is no statistical difference in the prediction quality for pre-game forecasts between the OLR models and the competing risk models. It is also found that team ratings and player ratings perform about equally well when predicting match outcomes. However, forecasts made when using both team ratings and player ratings as covariates are significantly better than those based on only one of the ratings.
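To make the idea of deriving outcome probabilities from scoring rates by discrete event simulation concrete, here is a rough sketch. It is not the model of any paper listed here: the rate function, the constants `DRIFT`, `TRAILING_BOOST` and `LEADING_DAMP`, and all function names are assumptions chosen for illustration. Goals are generated by thinning a homogeneous Poisson process, so the rates may depend on both elapsed time and the current score.

```python
import random

# Hypothetical constants: rates rise over the match and shift with the score.
DRIFT = 0.3            # relative increase in scoring rate by minute 90
TRAILING_BOOST = 1.1   # multiplier while a team is behind
LEADING_DAMP = 0.9     # multiplier while a team is ahead


def scoring_rate(base, minute, own, opp):
    """Illustrative goals-per-minute rate for one team."""
    rate = base * (1.0 + DRIFT * minute / 90.0)
    if own < opp:
        rate *= TRAILING_BOOST
    elif own > opp:
        rate *= LEADING_DAMP
    return rate


def simulate_match(home_base, away_base, rng, length=90.0):
    """Simulate one match by thinning; returns (home_goals, away_goals)."""
    # Upper bound on the combined scoring rate over the whole match.
    bound = (home_base + away_base) * (1.0 + DRIFT) * max(1.0, TRAILING_BOOST, LEADING_DAMP)
    if bound <= 0.0:
        return 0, 0
    t, home, away = 0.0, 0, 0
    while True:
        t += rng.expovariate(bound)  # next candidate event time
        if t >= length:
            return home, away
        u = rng.random() * bound
        h = scoring_rate(home_base, t, home, away)
        a = scoring_rate(away_base, t, away, home)
        if u < h:
            home += 1
        elif u < h + a:
            away += 1
        # otherwise the candidate is rejected and no goal is scored


def outcome_probabilities(home_base, away_base, n_sims=20000, seed=0):
    """Monte Carlo estimate of (home win, draw, away win) probabilities."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_sims):
        h, a = simulate_match(home_base, away_base, rng)
        if h > a:
            wins += 1
        elif h == a:
            draws += 1
        else:
            losses += 1
    return wins / n_sims, draws / n_sims, losses / n_sims


if __name__ == "__main__":
    print(outcome_probabilities(1.6 / 90, 1.0 / 90))
```

For an in-game forecast, the same loop could be started from the current minute and score instead of from kick-off.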
... As Tunaru and co-authors (2003) also show, early statistical and econometric research in professional sport focused mainly on forecasting the performance of clubs and their players (Bennett-Flueck, 1983; Berri, 1999; Carmichael et al., 2000) and the results of individual matches (Maher, 1982; Stern, 1991; Dixon-Robinson, 1998). Research on the value of the performance that professional footballers deliver to their teams became widespread in the second half of the 1990s, after the introduction of the Bosman ruling (Szymanski-Smith, 1997; Szymanski-Kuypers, 1999; Dawson et al., 2000). ...
... Early statistical and econometric research in professional sport focused mainly on forecasting the performance of clubs and their players [Bennett-Flueck, 1983; Berri, 1999; Carmichael et al., 2000] and the results of individual matches [Dixon-Robinson, 1998; Maher, 1982; Stern, 1991] [Tunaru et al., 2003]. Research on the value of the performance that professional footballers deliver to their teams spread after the introduction of the Bosman ruling, which enabled the free international movement of players [Dawson et al., 2000; Szymanski-Smith, 1997; Szymanski-Kuypers, 1999]. ...
Article
Full-text available
The purpose of this study is to examine the feasibility of using an alternative database for analysing the value of professional football players. Professional football has become an important business and professional players are one of the key value drivers, so it is vital to quantify their performance and market value as accurately as possible. This article introduces the academic utility of the FIFA Electronic Arts Video Game Database. The research question is whether this database is appropriate for using descriptive statistics, correlation analysis, and regression estimates. According to our results, the use of the database has many advantages over other freely available databases and provides a good opportunity both for scientific analysis and for stakeholders of professional football to perform statistical and econometric research. Key words: professional football, transfer market, players’ value evaluation, football econometrics.
... The distribution of in-play scores during the 90 minute interval of 4012 games between 1993 and 1996 has been studied by [23]. They found that goal scoring intensities depend on the game time with intensity increasing steadily as the game progresses. ...
Conference Paper
This thesis is about modelling the in-play football betting market. Our aim is to apply and extend financial mathematical concepts and models to value and risk-manage in-play football bets. We also apply machine learning methods to predict the outcome of the game using in-play indicators. In-play football betting provides a unique opportunity to observe the interplay between a clearly defined fundamental process, that is the game itself and a market on top of this process, the in-play betting market. This is in contrast with classical finance where the relationship between the fundamentals and the market is often indirect or unclear due to lack of direct connection, lack of information and infrequency or delay of information. What makes football betting unique is that the physical fundamentals are well observable because of the existence of rich high frequency data sets, the games have a limited time horizon of usually 90 minutes which avoids the buildup of long term expectations and finally the payoff of the traded products is directly linked to the fundamentals. In the first part of the thesis we show that a number of results in financial mathematics that have been developed for financial derivatives can be applied to value and risk manage in-play football bets. In the second part we develop models to predict the outcomes of football games using in-play data. First, we show that the concepts of risk-neutral measure, arbitrage freeness and completeness can also be applied to in-play football betting. This is achieved by assuming a model where the scores of the two teams follow standard Poisson processes with constant intensities. We note that this model is analogous to the Black-Scholes model in many ways. Second, we observe that an implied intensity smile does exist in football betting and we propose the so-called Local Intensity model. 
This is motivated by the local volatility model from finance which was the answer to the problem of the implied volatility smile. We show that the counterparts of the Dupire formulae [31] can also be derived in this setting. Third, we propose a Microscopic Model to describe not only the number of goals scored by the two teams, but also two additional variables: the position of the ball and the team holding the ball. We start from a general model where the model parameters are multi-variate functions of all the state variables. Then we characterise the general parameter surfaces using in-play game data and arrive to a simplified model of 13 scalar parameters only. We then show that a semi-analytic method can be used to solve the model. We use the model to predict scoring intensities for various time intervals in the future and find that the initial ball position and team holding the ball is relevant for time intervals of under 30 seconds. Fourth, we consider in-play indicators observed at the end of the first half to predict the number of goals scored during the second half, we refer to this model as the First Half Indicators Model. We use various feature selection methods to identify relevant indicators and use different machine learning models to predict goal intensities for the second half. In our setting a linear model with Elastic Net regularisation had the best performance. Fifth, we compare the predictive powers of the Microscopic Model and the First Half Indicators Model and we find that the Microscopic Model outperforms the First Half Indicators Model for delays of under 30 seconds because this is the time frame where the initial team having the ball and the initial position of the ball is relevant.
... This analysis was performed in two ways, one for all 12 papers and another one by the authors, grouping the references of articles A1, A2, A4, and A5, which share the same main author and tend to share references among themselves, as could be identified in Figure 4 by their node proximity. Table 2 shows that, despite the large number of references analyzed (209) only three bibliographies were shared by more than one author: Maher (1982), Dixon & Coles (1997) and Dixon & Robinson (1998), which places them as some of the main references for this type of study. ...
Article
Full-text available
Team formation is a key aspect of football, being able to bring sporting and financial results, but also susceptible to great risks. Considering that this problem involves subjective decisions, it also becomes subject to failures. The use of quantitative approach methodologies can overcome such aspects and offer better results for football clubs. In this context, a systematic review was carried out on Operations Research techniques for player selection and formation of football teams, demonstrating the main references and authors of the area. A search on the scientific bases Web of Knowledge, Scopus, and ScienceDirect was carried out during the month of January of 2018, analyzing a total of 1,637 articles. Of these, only 12 were selected for analysis. The research of Boon and Sierksma (2003) was identified as the main reference of the area, being referenced by four other authors. The low number of citations between the papers is notable, as is the lack of review articles like this one. All these aspects contribute to the relevance of this research, addressing a significant problem in one of the most popular sports in the world and unifying its main references.
Chapter
Sports betting has increased in popularity and complexity in the past 5 years. With the leverage of communication technology the betting sector has become similar to financial markets. Bets, and particularly sports-related bets, which can be bought and sold in real time, are similar to financial derivatives. Thus, a betting underwriter is similar to a trading house or investment banker. While the investment industry is regulated and supervised the betting industry is completely unregulated. Some major cases of financial crime are linked to the betting industry and were recently noted by investigators.
Article
This study models soccer as a Markov process. We discretize the pitch into nine zones, and define the states of the Markov process according to the zone of the pitch in which the ball is located, the team in possession and the score. Log-linear models are used to represent state transitions. Using the log-linear models, we estimate team strengths not only with respect to scoring or conceding, but also with respect to gaining or losing possession, while considering the discretized zones in which the ball is located. We use play-by-play data from Japan League Division 1 games in the 2015 season to illustrate our approach, and characterize the strengths of teams in this league. Sanfrecce Hiroshima is used as a particular example. We determine the goodness-of-fit of the log-linear models. Additionally, we introduce random effects into the log-linear models and discuss the complexity of the state transition process. We demonstrate that our Markov model, at the nine-zone level, provides estimates of teams’ strengths to a good approximation.
Chapter
The world we live in has changed immeasurably during the last quarter of a century. The role computers have played as a catalyst for this change in modern society cannot be overstated. Soon after the advent of the Internet came social media, which was followed by the dawn of the so-called era of big data, and all of this has happened in less than half a century since Bill Gates and his friends were toying with transistors and capacitors in creating what would eventually become Microsoft and Apple. The era of big data has meant that in recent times, the status of statisticians, analysts and “quants” has been elevated to such heights that interrogating data is now a core activity of mega-corporations like Google and Facebook. Financial markets have long used analysts to help gain an edge over competitors and better build portfolios which balance returns and risk, and so it is no surprise that another type of financial market, the global market on sports betting, is also employing statistics to gain an edge.
Conference Paper
Football is one of the world's most popular sports, with a huge amount of data that one can inspect and analyze to reach interesting conclusions. In this paper we analyze such football data, collected from world football matches. Our goal is to rank the teams not just on direct match results, but also on the relationships between teams. For this purpose, we apply the PageRank algorithm with a restart mechanism to a graph built from the games. Several statistics, such as matches won and goals scored, are combined in different metrics as weights on the links of the graph. Finally, our results indicate that the random walk approach, with the right metrics, can indeed produce relevant and more meaningful rankings that are comparable to the official ranking.
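The ranking idea in this abstract can be illustrated with a small power-iteration sketch of PageRank with restart. This is not the authors' implementation: the function name `pagerank_with_restart`, the convention that each link points from the losing team to the winning team, the restart probability of 0.15, and the uniform handling of teams with no outgoing links are all assumptions made for illustration.

```python
def pagerank_with_restart(weights, restart=0.15, iters=200):
    """Rank teams by a random walk with restart on a weighted match graph.

    weights[src][dst] is the (assumed) link weight from team src to team
    dst, e.g. from a losing team to the team that beat it.
    """
    # Collect every team that appears as a source or a destination.
    teams = sorted(set(weights) | {d for out in weights.values() for d in out})
    n = len(teams)
    rank = {t: 1.0 / n for t in teams}

    for _ in range(iters):
        # Restart step: every team receives an equal share of the restart mass.
        new_rank = {t: restart / n for t in teams}
        for src in teams:
            out = weights.get(src, {})
            total = sum(out.values())
            mass = (1.0 - restart) * rank[src]
            if total == 0:
                # Team with no outgoing links: spread its mass uniformly.
                for t in teams:
                    new_rank[t] += mass / n
            else:
                # Otherwise follow links in proportion to their weights.
                for dst, w in out.items():
                    new_rank[dst] += mass * w / total
        rank = new_rank
    return rank


# Hypothetical results: B lost to A; C lost to both A and B.
example = {"A": {}, "B": {"A": 1.0}, "C": {"A": 1.0, "B": 1.0}}
ranking = pagerank_with_restart(example)
```

In this toy graph the team that beat everyone ends up with the highest score. Different link weights (for example goal difference instead of a flat 1.0) would correspond to the different metrics mentioned in the abstract.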
Article
We develop a new dynamic multivariate model for the analysis and forecasting of football match results in national league competitions. The proposed dynamic model is based on the score of the predictive observation mass function for a high-dimensional panel of weekly match results. Our main interest is in forecasting whether the match result is a win, a loss or a draw for each team. The dynamic model for delivering such forecasts can be based on three different dependent variables: the pairwise count of the number of goals, the difference between the numbers of goals, or the category of the match result (win, loss, draw). The different dependent variables require different distributional assumptions. Furthermore, different dynamic model specifications can be considered for generating the forecasts. We investigate empirically which dependent variable and which dynamic model specification yield the best forecasting results. We validate the precision of the resulting forecasts and the success of the forecasts in a betting simulation in an extensive forecasting study for match results from six large European football competitions. Finally, we conclude that the dynamic model for pairwise counts delivers the most precise forecasts while the dynamic model for the difference between counts is most successful for betting, but that both outperform benchmark and other competing models.
Article
In this article, a discrete-time, finite-state Markov chain model is developed to fit NBA basketball data. It can be used to produce in-play predictions for basketball matches. An iterative algorithm is designed to calculate the probabilities of the final score difference. An empirical study shows that the proposed model performs well and, more importantly, that it can yield a positive return when used to bet on the market.
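As a rough illustration of how an iterative calculation of the final score difference can work, the sketch below propagates a probability distribution over the score difference one time step at a time. It is a deliberately simplified stand-in, not the model of the article: it assumes at most one single-point score per step, constant scoring probabilities `p_home` and `p_away`, and a hypothetical function name `final_diff_distribution`.

```python
def final_diff_distribution(current_diff, steps_left, p_home, p_away):
    """Distribution of the final score difference (home minus away).

    Assumes that in each remaining time step the home side scores one
    point with probability p_home, the away side with probability p_away,
    and nobody scores otherwise.
    """
    p_none = 1.0 - p_home - p_away
    if p_none < 0:
        raise ValueError("p_home + p_away must not exceed 1")

    # Start with all probability mass on the current score difference.
    dist = {current_diff: 1.0}
    for _ in range(steps_left):
        nxt = {}
        for diff, prob in dist.items():
            # One-step transitions of the Markov chain.
            for delta, p in ((1, p_home), (-1, p_away), (0, p_none)):
                nxt[diff + delta] = nxt.get(diff + delta, 0.0) + prob * p
        dist = nxt
    return dist


# Probability that the home side finishes ahead, given a 2-point lead
# with 10 steps left (all numbers are illustrative).
dist = final_diff_distribution(2, 10, p_home=0.3, p_away=0.25)
p_home_win = sum(p for d, p in dist.items() if d > 0)
```

A realistic basketball model would need multi-point scores and transition probabilities that depend on the teams and the game state; only the forward iteration over the state distribution is shown here.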
Article
Football, as one of the most popular sports, can provide exciting examples to motivate students learning statistics. In this paper, we analyzed the number of goals scored in the UEFA EURO 2020 final phase as well as the waiting times between goals, considering censored times. Such a dataset allows us to consider some aspects of count data taught at an introductory level (such as the Poisson distribution), as well as more advanced topics (such as survival analysis taking into account the presence of censored times). Employing data from the final phase of UEFA EURO 2020, depending on the course level, the student will acquire knowledge and understanding of a range of key topics and analytical techniques in statistics, develop knowledge of the theoretical assumption underlying them and learn the skills needed to model count data.
Article
This paper presents an in-play prediction model based on the gamma process for the scoring processes of the National Basketball Association matches. The model is team-specific, i.e., it takes account of the relative strengths of the two teams playing in a match. The dependence between the home and away scoring processes is characterized by a common latent variable. A Bayesian dynamic forecasting procedure for future games is developed, which utilizes the in-match information to update the scale parameter of the model as the match progresses. An evaluation against baseline models is provided in an empirical study. Our proposed model can predict the final score and total points, while the baseline models are unable to make such predictions. Furthermore, our model can produce positive returns on the point spread betting market and the over-under betting market.
Article
The sports domain presents a number of significant computational challenges for artificial intelligence (AI) and machine learning (ML). In this paper, we explore the techniques that have been applied to the challenges within team sports thus far. We focus on a number of different areas, namely match outcome prediction, tactical decision making, player investments, fantasy sports, and injury prediction. By assessing the work in these areas, we explore how AI is used to predict match outcomes and to help sports teams improve their strategic and tactical decision making. In particular, we describe the main directions in which research efforts have been focused to date. This highlights not only a number of strengths but also weaknesses of the models and techniques that have been employed. Finally, we discuss the research questions that exist in order to further the use of AI and ML in team sports.
Article
We consider the task of determining a football player’s ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian model which is fit using variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010, Journal of Applied Statistics, 37(2), 253–264) which captures a team’s scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given game in the 2014/2015 season. This validates our model as a way of providing insights into team formation and the individual success of sports teams.
Article
This study aims to find variables that affect the winning rate of the football team before a match. Qualitative variables such as venue, match importance, performance, and atmosphere of both teams are suggested to predict the outcome. Regression analysis is used to select proper variables. In this study, the performance of the football team is based on the opinions of experts, and the team atmosphere can be calculated with the results of the previous five games. ELO rating represents the state of the opponent. Also, the selected qualitative variables are expressed in fuzzy numbers using fuzzy partitions. A fuzzy regression model for the winning rate of the football team can be estimated by using the least squares method and the least absolute method. It is concluded that the stadium environment, ELO rating, team performance, and importance of the match have effects on the winning rate of Korean National Football (KNF) team from the data on 118 matches.
Article
This work studies outcome uncertainty and competitive balance from a broad perspective. It considers four sports with varying scoring rates, from soccer with typically three goals per match to netball with one hundred goals per match. Within a general modelling framework for a two-competitor contest, we argue that outcome uncertainty, the extent to which the outcome of a contest is unpredictable, depends on scoring rate, on strength variation and on score dependence. Score dependence is essentially the tendency for scores to alternate because possession alternates and possession is advantageous. We regard competitive balance as lack of variation in strength or skill, so that when strength variation is large competitive balance is low and vice versa. Thus, we argue that the outcome of a contest depends on skill, scoring rate, score dependence and chance. This description of outcome is useful because it informs policy-making in sport about the design of scoring systems and the control of competitive imbalance. Broadly, we find that: soccer is relatively competitively unbalanced but outcomes are uncertain because the scoring rate is low; the Australian football league is competitively balanced and so outcomes are uncertain in spite of the high scoring rate in this sport; international rugby matches are relatively neither competitive nor uncertain so that little is left to chance; and netball matches have uncertain outcomes because scores are positively dependent.
Article
Two highly relevant aspects of football, namely forecasting of results and performance analysis by means of performance indicators, are combined in the present study by analysing the value of in-play information in terms of event and positional data in forecasting the further course of football matches. Event and positional data from 50 matches, including more than 300 million datapoints were used to extract a total of 18 performance indicators. Moreover, goals from more than 30,000 additional matches have been analysed. Results suggest that surprisingly goals do not possess any relevant informative value on the further course of a match, if controlling for pre-game market expectation by means of betting odds. Performance indicators based on event and positional data have been shown to possess more informative value than goals, but still are not sufficient to reveal significant predictive value in-play. The present results are relevant to match analysts and bookmakers who should not overestimate the value of in-play information when explaining match performance or compiling in-play betting odds. Moreover, the framework presented in the present study has methodological implications for performance analysis in football, as it suggests that researchers should increasingly segment matches by scoreline and control carefully for general team strength.
Article
Live soccer betting markets differ from other binary options markets in that all fundamental information is observable, the options mature in less than two hours and the markets are highly liquid. This study presents a new method for the identification of hidden information in market prices. The method is based on two independent Poisson distributions and on a numerical algorithm for the aggregation of all market price information into one rational number. The method is applied to an empirical dataset of real-time market prices in 29,413 soccer games. The results indicate that the method selects the most profitable markets and allows for a significant improvement in average investment returns.
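The two-independent-Poisson building block mentioned in this abstract can be sketched as follows. The code only turns a pair of assumed goal expectations into home-win, draw, and away-win probabilities; it does not reproduce the study's aggregation algorithm. The function names, the truncation at `max_goals`, and the example rates are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=15):
    """Home-win, draw and away-win probabilities for independent Poisson scores.

    The double sum is truncated at max_goals per team, which loses only a
    negligible amount of probability for typical soccer scoring rates.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        p_h = poisson_pmf(h, lam_home)
        for a in range(max_goals + 1):
            # Independence: the joint probability is the product of the marginals.
            p = p_h * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative goal expectations for the remainder of a match.
p_home, p_draw, p_away = outcome_probabilities(1.5, 1.1)
```

For in-play use, the two rates would be the expected goals over the time remaining, and the current score would be added to `h` and `a` before comparing them.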
Article
In a previous paper, it was demonstrated that distinctly different prediction methods, when applied to 2435 American college and professional football games, resulted in essentially the same fraction of correct selections of the winning team and essentially the same average absolute error for predicting the margin of victory. These results are now extended to 1446 Australian rules football games. Two distinctly different prediction methods are applied. A least-squares method provides a set of ratings. The predicted margin of victory in the next contest is less than the rating difference, corrected for home-ground advantage, while a 0.75 power method shrinks the ratings compared with those found by the least-squares technique and then performs predictions based on the rating difference and home-ground advantage. Both methods operate upon past margins of victory corrected for home advantage to obtain the ratings. It is shown that both methods perform similarly, based on the fraction of correct selections of the winning team and the average absolute error for predicting the margin of victory. That is, differing predictors using the same information tend to converge to a limiting level of accuracy. The least-squares approach also provides estimates of the accuracy of each prediction. The home advantage is evaluated for all teams collectively and also for individual teams. The data permit comparisons with other sports in other countries. The home team appears to have an advantage (the visiting team has a disadvantage) due to three factors: travel fatigue suffered by the visiting team; crowd intimidation by the home team fans; and the visiting team's lack of familiarity with the playing conditions.
Article
Four teams in the four divisions of the English Football League have been playing their home matches on artificial pitch surfaces at certain times over the last 10 years or so. A Commission of Enquiry (Football League, 1989) recently recommended that the introduction of further artificial pitches be restricted. One of the factors leading to this recommendation was the possible advantage gained by the home team on such pitches. A statistical analysis of the end-of-season results for the four divisions over the last 10 years (carried out for the Football League) showed that there is indeed such an advantage and that it is of a sufficient scale to be a cause for concern.
Article
The rapid growth of sports betting in Europe beginning in the mid-1980s has continued into the 1990s. The gaming industry has been slow to take full advantage of this opportunity owing to a lack of implementation of management science and operations research techniques by managers. Additionally, recruitment of odds compilers with professional qualifications by the major bookmakers in the UK is practically unknown. Index betting on sports is an important new area of sports betting. Examples from American football, soccer and tennis are quoted and the main differences between a sports index and a stock-market index are discussed. All the companies competing for a share of the index betting market in the UK have experienced difficulties due to their lack of management science and operations research expertise. Some of the basic modelling techniques that the managers of these companies should have at their disposal are illustrated by using estimates from Wimbledon to calculate an index for the total games in a tennis match.
Article
Simulations of various kinds of sporting tournament have been carried out to assess their relative ability to produce as winner the best of the entrants. A strong contender when it is necessary to play relatively few games is the seeded draw and process. When the players are closely matched and there can be more games the round robin played twice is most effective.
Article
Least squares is used to fit a model to the individual match results in English football and to produce a home ground advantage effect for each team in addition to a team rating. We show that for a balanced competition this is equivalent to a simple calculator method using only data from the final ladder. The existence of a spurious home advantage is discussed. Home advantages for all teams in the English Football League from 1981-82 to 1990-91 are calculated, and some reasons for their differences investigated. A paired home advantage is defined and shown to be linearly related to the distance between club grounds.
Article
Extreme records in athletics are increasingly questioned as being due to the use of performance enhancing drugs. To assess such performances, statistical methods are developed that are based on extreme value techniques for estimating the ultimate performance possible by the current population of competing athletes. These methods are applied to the analysis of data from the women's 3000 m track event, where we find that a recently broken record shows signs of being inconsistent with previous performances.
Article
Statistical inference for parametric models of spatial birth-and-death processes is discussed in detail. In particular, a flexible and statistically tractable parametric class of such processes, defined on the real line, is presented and analysed by likelihood and partial likelihood methods. The suggested methods are illustrated by applying them to two sets of data given in the form of aerial photographs from the Kalahari Desert.
Article
A maximum likelihood method of fitting a model to a series of records is proposed, using ideas from the analysis of censored data to construct a likelihood function based on observed records. This method is tried out by fitting several models to series of athletics records for mile and marathon races. A form of residual analysis is proposed for testing the models. Forecasting consequences are also considered. In the case of mile records, a steady linear improvement since 1931 is found. The marathon data are harder to interpret, with a steady improvement until 1965 with only slight improvement in world records since then. In both cases, the normal distribution appears at least as good as extreme-value distributions for the distribution of annual best performances. Short-term forecasts appear satisfactory, but serious reservations are expressed about using regression-type methods to predict long-term performance limits.
Article
A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers' odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams' performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
Fig. 5. Expected gain which arises from selling 1 unit at each time point throughout the match (with approximate 90% confidence intervals)
… J. R. Statist. Soc. A, 156, 39–50.
Chedzoy, O. (1995) Influences on the distribution of goals in soccer. Private communication.
Hugman, B. J. (1991) The Official Football League Yearbook. Chichester: Facer.
Maher, M. J. (1982) Modelling association football scores. Statist. Neerland., 36, 109–118.
Moller, J. and Sorenson, M. (1994) Statistical analysis of a spatial birth-and-death process model with a view to modelling linear dune fields. Scand. J. Statist., 21, 1–19.
Ridder, G., Cramer, J. S. and Hopstaken, P. (1994) Estimating the effect of a red card in soccer. J. Am. Statist. Ass., 89, 1124–1127.
Williams, T. (1992) Football Club Directory. Chichester: Hamsworth Active.
Williams, T. (1993) Football Club Directory. Chichester: Hamsworth Active.
Williams, T. (1994) Football Club Directory. Chichester: Hamsworth Active.