Article

Prediction of major international soccer tournaments based on team-specific regularized Poisson regression: An application to the FIFA World Cup 2014

Abstract

In this article, an approach for the analysis and prediction of international soccer match results is proposed. It is based on a regularized Poisson regression model that includes various potentially influential covariates describing the national teams’ success in previous FIFA World Cups. Additionally, differences of team-specific effects are incorporated within the generalized linear model (GLM) framework. To achieve variable selection and shrinkage, we use tailored Lasso approaches. Based on preceding FIFA World Cups, two models for the prediction of the FIFA World Cup 2014 are fitted and investigated. Using the model estimates, the FIFA World Cup 2014 is simulated repeatedly and winning probabilities are obtained for all teams. Both models favor the actual FIFA World Champion Germany.
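
To make the idea of a Lasso-regularized Poisson regression concrete, the following is a minimal sketch, not the authors' tailored estimation procedure. It assumes a log link, an unpenalised intercept and plain proximal gradient descent; the function names, the step size and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks towards zero, exact zeros allowed."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_poisson(X, y, lam, step=0.01, n_iter=3000):
    """Proximal gradient descent for the objective
    mean Poisson negative log-likelihood + lam * ||beta||_1
    (log link, unpenalised intercept). Illustrative only."""
    n, p = X.shape
    beta0 = np.log(y.mean() + 1e-8)  # start at the intercept-only fit
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(beta0 + X @ beta)  # fitted Poisson intensities
        resid = mu - y                 # derivative of the NLL w.r.t. the linear predictor
        beta0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta0, beta

# Synthetic data (not the World Cup data): two informative and three noise covariates.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
true_beta = np.array([0.5, 0.0, 0.0, -0.3, 0.0])
y = rng.poisson(np.exp(0.2 + X @ true_beta))

b0_small, beta_small = lasso_poisson(X, y, lam=0.02)  # mild shrinkage
b0_large, beta_large = lasso_poisson(X, y, lam=10.0)  # heavy penalty
```

With a small penalty the informative coefficients stay close to their true values, while a sufficiently large penalty sets every coefficient exactly to zero; this is the variable-selection effect the abstract refers to.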

... In general, bivariate Poisson modelling approaches are well established and started without any form of dependency. For example, in the case of modelling football scores, independent Poisson distributions were used e.g. by Lee (1997), Karlis and Ntzoufras (2000), Dyte and Clarke (2000), Groll et al. (2015) or Ley et al. (2019). However, in recent years many different approaches have been proposed to include dependency. ...
... The LASSO technique has already been successfully applied in the context of football. For example, in Groll and Abedieh (2013) a penalised generalised linear mixed model has been used for modelling and prediction of European championship match data, and in Groll et al. (2015), a similar LASSO model has been applied on FIFA World Cup data. An L1-penalised approach for Bradley-Terry-type models has also been proposed on data for the German Bundesliga. ...
... The data set originates from Groll et al. (2015), was reused in several later works (among others by Schauberger and co-authors) and was finally used and expanded by van der Wurp et al. (2020), from where essentially the following summary was taken. ...
Article
In this work, we propose an extension of the versatile joint regression framework for bivariate count responses of the R package GJRM by Marra and Radice (R package version 0.2-3, 2020) by incorporating an (adaptive) LASSO-type penalty. The underlying estimation algorithm is based on a quadratic approximation of the penalty. The method enables variable selection and the corresponding estimates guarantee shrinkage and sparsity. Hence, this approach is particularly useful in high-dimensional count response settings. The proposal’s empirical performance is investigated in a simulation study and an application on FIFA World Cup football data.
... Regularized horseshoe priors (Piironen and Vehtari 2017) are assumed for the regression coefficients to control their posterior variance, avoid multicollinearity, and limit the occurrence of over-fitting issues that might lead to poor out-of-sample performances. Regularized estimates of the coefficients were considered in recent works (Groll et al. 2015; Schauberger et al. 2018) from a pure frequentist perspective. ...
... For this reason, regularization methods are used for variable selection, since they shrink the coefficient estimates of negligible covariates to zero and thereby reduce the parameters' variance. Among others, Groll et al. (2015) and Tutz and Schauberger (2015) considered the LASSO framework, whereas the problem has not yet been tackled from the Bayesian perspective. A plethora of shrinkage priors for the regression coefficients are available (Bhadra et al. 2019); here we adopt the regularized horseshoe prior by Piironen and Vehtari (2017): it makes it easy to incorporate prior information about sparseness and can be interpreted as the continuous version of the popular spike-and-slab priors. ...
... Throughout the paper, we will label this formulation of the linear predictors as M_0, where M will be replaced by the specified model (i.e., IP, BP, or Sk). Then, following Groll et al. (2015) and Groll et al. (2018), the differences between the covariates observed for the two teams in match g are used: we refer to this specification with M_1. With M_2, we indicate the most flexible model specification considered: we link each linear predictor to the covariates observed on the specific team. ...
Article
Passes are undoubtedly the most frequent events in football and other team sports. Passing networks and their structural features can be useful to evaluate the style of play in terms of passing behavior, analyzing and quantifying interactions among players. The present paper aims to show how information retrieved from passing networks can have a relevant impact on predicting the match outcome. In particular, we focus on modeling both the goals scored by two competing teams and the goal difference between them. For this purpose, we fit these outcomes using Bayesian hierarchical models, including both in-match and network-based covariates to cover many aspects of the offensive actions on the pitch. Furthermore, we review and compare different approaches to include covariates in modeling football outcomes. The presented methodology is applied to a real dataset containing information on 125 matches of the 2016–2017 UEFA Champions League, involving 32 of the best European teams. According to our results, shots on target, corners, and passing network indicators are the main determinants of the considered football outcomes.
... This thesis looks at the design, analysis and evaluation of the rating systems for association football. The problem of designing accurate team rating models has a long history and until very recently was mainly considered in applied statistics (see, e.g., Cattelan et al. 2013; Crowder et al. 2002; Dixon and Coles 1997; Elo 1978; Goddard 2005; Groll et al. 2015; Koning 2000; Maher 1982). It is worth noting that there are numerous applications of rating models in decision making in sports. ...
... One of the most prominent applications of team rating models is their use in match outcome prediction. Many approaches to this problem have been put forward in applied statistics, data mining, and machine learning (see, e.g., Cattelan et al. 2013; Constantinou 2018; Crowder et al. 2002; Dixon and Coles 1997; Goddard 2005; Groll et al. 2015; Hubáček et al. 2018; Joseph et al. 2006; Lasek 2016; Maher 1982; Peeters 2018; Shin and Gasparyan 2014; Sismanis 2010). Given this, the problem of match outcome prediction became a widely accepted method for evaluating and comparing different team rating models. ...
... Because the uncertainty factor is relatively large, the use of a regularisation term in the domain of match outcome prediction is advised. This usually helps to obtain more accurate predictions (Groll et al., 2015; Lasek and Gagolewski, 2015b). The parameters of the model are found by minimising the above function or, equivalently, maximising the penalised log-likelihood of results with respect to parameters r, h and c. ...
... For example, Dyte and Clarke (2000) applied this model to data from FIFA World Cups and let the Poisson intensities of both competing teams depend on their FIFA ranks. Groll and Abedieh (2013) and Groll, Schauberger, and Tutz (2015) considered a large set of potentially influential variables for EURO and World Cup data, respectively, and used L1-penalized approaches to detect a sparse set of relevant covariates. Based on these, predictions for the EURO 2012 and FIFA World Cup 2014 tournaments were provided. ...
... The first type of data we describe covers all matches of the four FIFA World Cups 2002-2014 together with several potential influence variables. Basically, we use a very similar set of covariates as introduced in Groll et al. (2015). For each participating team, the covariates are observed either for the year of the respective World Cup (e.g. ...
... As a first possible extension of the model (2), the linear predictor can be augmented by team-specific attack and defense effects for all competing teams. This extension was used in Groll et al. (2015) to predict the FIFA World Cup 2014. There, each couple of attack and defense parameters corresponding to a team has been treated as a group and, hence, the Group Lasso penalty proposed by Yuan and Lin (2006) has been applied on those parameter groups. ...
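
The grouping described in this excerpt can be written out as a penalised log-likelihood. The notation below is an assumption made for illustration: \ell is the Poisson log-likelihood, \theta collects all model parameters, n is the number of teams, \lambda \ge 0 is the tuning parameter, and the constant group-size weight of the Yuan and Lin (2006) penalty (all groups have size two) is absorbed into \lambda.

```latex
\ell_{\text{pen}}(\theta) \;=\; \ell(\theta) \;-\; \lambda \sum_{i=1}^{n} \sqrt{att_i^{2} + def_i^{2}}
```

Because the Euclidean norm of a group is not differentiable at zero, the attack and defense effects of a team are either both shrunk exactly to zero or both kept in the model, which is what treating them as one group is meant to achieve.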
Article
In this work, we propose a new hybrid modeling approach for the scores of international soccer matches which combines random forests with Poisson ranking methods. While the random forest is based on the competing teams’ covariate information, the latter method estimates ability parameters on historical match data that adequately reflect the current strength of the teams. We compare the new hybrid random forest model to its separate building blocks as well as to conventional Poisson regression models with regard to their predictive performance on all matches from the four FIFA World Cups 2002–2014. It turns out that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate the predictive power can be improved substantially. Finally, the hybrid random forest is used (in advance of the tournament) to predict the FIFA World Cup 2018. To complete our analysis on the previous World Cup data, the corresponding 64 matches serve as an independent validation data set and we are able to confirm the compelling predictive potential of the hybrid random forest which clearly outperforms all other methods including the betting odds.
... In this section, we provide a brief description of the underlying dataset covering all matches of the four preceding FIFA World Cups 2002-2014 together with several potential influence variables. In general, we use essentially the same set of covariates that is introduced in Groll et al. (2015). For each participating team, most of these covariates are observed shortly before the start of the respective World Cup (e.g., the FIFA ranking) or for the same year of the World Cup (e.g., the GDP per capita). ...
... Besides these sporting variables, certain economic factors as well as variables describing the structure of a team's squad are also collected. A detailed description of these variables can be found in Groll et al. (2015). ...
... Finally, we use a Group Lasso approach which is a different extension of the Lasso approach presented earlier. This approach corresponds to the approach proposed by Groll et al. (2015) where it was used to predict the FIFA World Cup 2014. Here, the linear predictor from (4.2) is extended by team-specific attack and defense effects for all competing teams and has the form $\log(\lambda_{ijk}) = \beta_0 + (x_{ik} - x_{jk})^\top \beta + z_{ik}^\top \gamma + z_{jk}^\top \delta + att_i - def_j$. ...
Article
Many approaches that analyse and predict results of international matches in football are based on statistical models incorporating several potentially influential covariates with respect to a national team's success, such as the bookmakers’ ratings or the FIFA ranking. Based on all matches from the four previous FIFA World Cups 2002–2014, we compare the most common regression models that are based on the teams’ covariate information with regard to their predictive performances with an alternative modelling class, the so-called random forests. Random forests can be seen as a mixture between machine learning and statistical modelling and are known for their high predictive power. Here, we consider two different types of random forests depending on the choice of response. One type of random forests predicts the precise numbers of goals, while the other type considers the three match outcomes—win, draw and loss—using special algorithms for ordinal responses. To account for the specific data structure of football matches, in particular at FIFA World Cups, the random forest methods are slightly altered compared to their standard versions and adapted to the specific needs of the application to FIFA World Cup data.
... In this section, we briefly describe the underlying data set covering all matches of the four preceding FIFA World Cups 2002-2014 together with several potential influence variables. Basically, we use the same set of covariates that is introduced in Groll et al. (2015). For each participating team, the covariates are observed either for the year of the respective World Cup (e.g., GDP per capita) or shortly before the start of the World Cup (e.g., FIFA ranking), and, therefore, vary from one World Cup to another. ...
... For the sake of illustration, Figure 2 shows bar plots for a random forest applied to the World Cup data introduced in Section 2. It can be seen that the most important predictors are Rank, Oddset, CL.Players and Confed.Oppo. This finding is in line with the findings in Groll et al. (2015) in the context of Lasso estimation. ...
... As a possible extension of the model (1), the linear predictor can be augmented by team-specific attack and defense effects for all competing teams. This extension was used in Groll et al. (2015) to predict the FIFA World Cup 2014. There, the two effects corresponding to the same team have been treated as a group of parameters and, hence, the Group Lasso penalty proposed by Yuan and Lin (2006) has been applied on those parameter groups. ...
Preprint
In this work, we compare three different modeling approaches for the scores of soccer matches with regard to their predictive performances based on all matches from the four previous FIFA World Cups 2002-2014: Poisson regression models, random forests and ranking methods. While the former two are based on the teams' covariate information, the latter method estimates adequate ability parameters that reflect the current strength of the teams best. Within this comparison the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate we can improve the predictive power substantially. Finally, this combination of methods is chosen as the final model and based on its estimates, the FIFA World Cup 2018 is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain ahead of the defending champion Germany. Additionally, we provide survival probabilities for all teams and at all tournament stages as well as the most probable tournament outcome.
... Therefore, the linear predictor for the number of goals of a specific team depends both on parameters of the team itself and its competitor. Groll, Schauberger, and Tutz (2015) have already pointed out that when fitting exactly the same model to FIFA World Cup data the estimates of the attack and defense abilities of two opposing teams are negatively correlated. Therefore, although (conditionally) independent Poisson distributions are used for the scores in one match, the linear predictors and, accordingly, the predicted outcomes are (negatively) correlated. ...
... These results are further confirmed in Groll et al. (2015). Following Groll and Abedieh (2013), an L 1 -regularized (conditionally) independent Poisson model is used on FIFA World Cup data. ...
... This can be done by using covariate information on the competing teams. Similar to Groll et al. (2015), let now the model be extended such that both linear predictors contain differences of several informative covariates of both competing teams, i.e. ...
Article
When analyzing and modeling the results of soccer matches, one important aspect is to account for the correct dependence of the scores of two competing teams. Several studies have found that, marginally, these scores are moderately negatively correlated. Even though many approaches that analyze the results of soccer matches are based on two (conditionally) independent pairwise Poisson distributions, a certain amount of (mostly negative) dependence between the scores of the competing teams can simply be induced by the inclusion of covariate information of both teams in a suitably structured linear predictor. One objective of this article is to analyze if this type of modeling is appropriate or if additional explicit modeling of the dependence structure for the joint score of a soccer match needs to be taken into account. Therefore, a specific bivariate Poisson model for the two numbers of goals scored by national teams competing in UEFA European football championship matches is fitted to all matches from the three previous European championships, including covariate information of both competing teams. A boosting approach is then used to select the relevant covariates. Based on the estimates, the tournament is simulated 1,000,000 times to obtain winning probabilities for all participating national teams.
... The sport's world governing body Fédération Internationale de Football Association (FIFA) has more members than the United Nations (Haan, Koning, and van Witteloostuijn 2007). Despite the tremendous popularity of the Women's World Cup and continued growth in the total number of female soccer players, sports analytics papers studying the structure of the World Cup tend to focus on the Men's World Cup (see, e.g., Jones 1990, Rathgeber and Rathgeber 2007, Scarf and Yusof 2011, Groll, Schauberger, and Tutz 2015, Guyon 2015, Stone and Rod 2016, Laliena and López 2019, Cea et al. 2020, Guyon 2020, Stronka 2020, Chater et al. 2021, Csató 2022a). Given the lack of research around the Women's World Cup, we analyze the structure of the Women's World Cup in this paper. ...
... Any home advantage for the host will be included in the host rating. However, teams from the same continent as the host (including the host itself) can also benefit from climatic conditions and cultural circumstances (Groll, Schauberger, and Tutz 2015). In addition, teams from the same continent and their fans benefit from shorter travel distances (Monks and Husch 2009). ...
Article
The FIFA Women's World Cup tournament consists of a group stage and a knockout stage. We identify several issues that create competitive imbalance in the group stage. We use match data from all Women's World Cup tournaments from 1991 through 2019 to empirically assess competitive imbalance across groups in each World Cup. Using least squares, we determine ratings for all teams. For each team, we average the ratings of the opponents in the group to calculate group opponents rating. We find that the range in group opponents rating varies between 2.5 and 4.5 goals indicating substantial competitive imbalance. We use logistic regression to quantify the impact of imbalance on the probability of success in the Women's World Cup. Specifically, our estimates show that one goal less in group opponents rating can increase the probability of reaching the quarterfinal by 33%. We discuss several policy recommendations to reduce competitive imbalance at the Women's World Cup.
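
One common way to obtain least-squares team ratings is to regress goal differences on team indicators; the sketch below illustrates that idea and is not necessarily the exact specification used in the paper. The match results, team names and the sum-to-zero normalisation are hypothetical.

```python
import numpy as np

# Hypothetical toy results: (team_a, team_b, goals_a, goals_b)
matches = [("A", "B", 3, 1), ("B", "C", 2, 0), ("A", "C", 4, 0), ("C", "A", 1, 2)]

teams = sorted({t for m in matches for t in m[:2]})
idx = {t: i for i, t in enumerate(teams)}

# Design matrix: +1 for the first team, -1 for the second; response: goal difference.
X = np.zeros((len(matches), len(teams)))
y = np.zeros(len(matches))
for row, (a, b, ga, gb) in enumerate(matches):
    X[row, idx[a]] = 1.0
    X[row, idx[b]] = -1.0
    y[row] = ga - gb

# Ratings are identified only up to a constant; centre them to sum to zero.
ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
ratings -= ratings.mean()

def group_opponents_rating(team, group):
    """Average rating of the other teams in the same group."""
    return float(np.mean([ratings[idx[t]] for t in group if t != team]))

opp_a = group_opponents_rating("A", teams)
```

A rating difference is then read as an expected goal difference, so a spread in group opponents rating of several goals between groups points to unequal group difficulty.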
... Although a lot of data is collected during matches (e.g., pass accuracy, team's mileage, etc.), only a very limited amount of data that can be used as possible covariates for forecast models is publicly available. Groll et al. (2015) performed a variable selection on various covariates and found that the three most significant retrospective covariates are the FIFA ranking followed by the number of Champions league and Euro league players of a team. However, at the time of this analysis the composition and the line ups of the teams have not been announced and hence the number of Champions/Euro league players as covariates are not available. ...
... In this article we follow the retrospective approach and we present a nested generalized Poisson regression model with zero-inflation for the prediction of the scores of single matches, where the model is solely based on the Elo ranking and matches of the participating teams since 2016 and additionally takes the location of matches into account. Since the FIFA World Cup 2022 is a complex tournament, involving important effects such as, e.g., group draws (e.g., see Deutsch (2011)) and dependences of the different matches, Monte-Carlo simulations are used to forecast the whole course of the tournament. For a more detailed summary on statistical modeling of major international football events, see, e.g., Groll et al. (2015) and references therein. As we will see later our proposed model shows a good fit, the obtained forecasts are conclusive and outperform classical Poisson models in many cases; furthermore, our model gives quantitative insights in each team's individual chances to proceed to certain stages of the tournament. ...
Preprint
This article is devoted to the forecast of the FIFA World Cup 2022 via nested zero-inflated generalized Poisson regression. Our regression model incorporates the Elo points of the participating teams, the location of the matches and team-specific skills in attack and defense as covariates. The proposed model allows predictions in terms of probabilities in order to quantify the chances for each team to reach a certain stage of the tournament. We use Monte Carlo simulations for estimating the outcome of each single match of the tournament, from which we are able to simulate the whole tournament itself. The model is fitted on all football games of the participating teams since 2016 weighted by date and importance. Validation with previous tournaments and comparison with other Poisson models are given.
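
As a rough illustration of the distribution named in this abstract, the sketch below evaluates a zero-inflated generalized Poisson probability mass function. It assumes Consul's parameterisation of the generalized Poisson distribution (rate theta > 0, dispersion 0 <= lam < 1) and a single zero-inflation weight pi0; the nesting and the covariate structure of the paper are not reproduced, and all parameter values are hypothetical.

```python
import math

def gen_poisson_pmf(k, theta, lam):
    """Consul's generalized Poisson pmf for theta > 0 and 0 <= lam < 1.
    For lam = 0 it reduces to the ordinary Poisson pmf with mean theta."""
    return theta * (theta + k * lam) ** (k - 1) * math.exp(-theta - k * lam) / math.factorial(k)

def zigp_pmf(k, theta, lam, pi0):
    """Zero-inflated version: an extra point mass pi0 is placed on zero goals."""
    base = (1.0 - pi0) * gen_poisson_pmf(k, theta, lam)
    return pi0 + base if k == 0 else base

# Hypothetical parameter values, chosen only for illustration.
theta, lam, pi0 = 1.2, 0.1, 0.15
probs = [zigp_pmf(k, theta, lam, pi0) for k in range(60)]
mean_goals = sum(k * p for k, p in enumerate(probs))
```

Under these assumptions the expected number of goals is (1 - pi0) * theta / (1 - lam), so the dispersion parameter inflates the mean relative to a plain Poisson model while the zero-inflation weight pulls it down.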
... The use of regularisation for match outcome prediction models is generally advised because the uncertainty factor is relatively large in this domain. It usually helps to provide more accurate forecasts (Groll et al., 2015; Lasek & Gagolewski, 2018). ...
... In the basic setup, Maher (1982) suggests modelling the goals scored under the independence assumption. This was one of the first approaches specifically crafted for football and it serves as a basis for more complex models (Crowder et al. 2002, Dixon and Coles 1997, Groll et al. 2015, Karlis and Ntzoufras 2003, Kharrat 2016, Koopman and Lit 2015, Rue and Salvesen 2000). The Maher model is introduced in greater detail below. ...
Article
We introduce several new sports team rating models based on the gradient descent algorithm. More precisely, the models can be formulated by maximising the likelihood of match results observed using a single step of this optimisation heuristic. The proposed framework is inspired by the prominent Elo rating system, and yields an iterative version of ordinal logistic regression, as well as different variants of Poisson regression-based models. This construction makes the update equations easy to interpret, and adjusts ratings once new match results are observed. Thus, it naturally handles temporal changes in team strength. Moreover, a study of association football data indicates that the new models yield more accurate forecasts and are less computationally demanding than corresponding methods that jointly optimise the likelihood for the whole set of matches. [https://authors.elsevier.com/a/1cK2dcUIKLguP]
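
The idea of updating ratings with a single gradient step on the match likelihood can be sketched as follows. This is an illustrative Poisson variant under assumed conventions, not the update equations of the paper: home goals are taken as Poisson with log-mean c + h + r_home - r_away, away goals with log-mean c + r_away - r_home, and the intercept c, home advantage h and learning rate lr are hypothetical values.

```python
import math

def update_ratings(r_home, r_away, goals_home, goals_away, c=0.1, h=0.3, lr=0.05):
    """One gradient-ascent step on the Poisson log-likelihood of a single match."""
    mu_home = math.exp(c + h + r_home - r_away)  # expected home goals
    mu_away = math.exp(c + r_away - r_home)      # expected away goals
    # Derivative of the match log-likelihood with respect to r_home;
    # the derivative with respect to r_away is its negative.
    grad = (goals_home - mu_home) - (goals_away - mu_away)
    return r_home + lr * grad, r_away - lr * grad

# Two equally rated teams; the home side wins 3-0, more clearly than expected.
new_home, new_away = update_ratings(0.0, 0.0, 3, 0)
```

As in the Elo system, a team gains rating when it outperforms the model's expectation and loses the same amount otherwise, so the total rating of the two teams is preserved by each update.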
... 2.1). was introduced and described in detail in Groll et al. (2015) and Schauberger and Groll (2018). It was then used in Groll et al. (2019) to make predictions for the World Cup 2018. ...
... Moreover, we believe that the method's predictive performance can be further improved by penalising covariate effects via LASSO-type penalties (Tibshirani 1996; Friedman et al. 2010) or via boosting (e.g., Bühlmann and Hothorn 2007; Hothorn et al. 2010), a technique that stems from machine learning. These methods already proved to be effective in the context of predicting football matches (e.g., Groll et al. 2015, 2018). ...
Article
We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by competitive settings, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which estimates simultaneously all the parameters of the joint models. We investigate the proposal’s empirical performance in two simulation studies, the first one designed for arbitrary count data, the other one reflecting competitive settings. Finally, the method is applied to football data, showing its benefits compared to the standard approach with regard to predictive performance.
... Model building was based on a data set constructed from the five past FIFA World Cups 2002-2018 with 64 matches each. The basic data set (without the World Cup 2018 data) was introduced and described in detail in Groll et al. (2015) and Schauberger and Groll (2018). It was then used in Groll et al. (2019) to make predictions for the World Cup 2018. ...
... Moreover, we believe that the method's predictive performance can be further improved by penalising covariate effects via LASSO-type penalties (Tibshirani, 1996; Friedman et al., 2010) or via boosting (e.g., Bühlmann and Hothorn, 2007; Hothorn et al., 2010), a technique that stems from machine learning. These methods already proved to be effective in the context of predicting football matches (e.g., Groll et al., 2015, 2018). ...
Preprint
We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by a football application, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which estimates simultaneously all the parameters of the joint models. We investigate the proposal's empirical performance in two simulation studies, the first one designed for arbitrary count data, the other one reflecting football-specific settings. Finally, the method is applied to FIFA World Cup data, showing its competitiveness to the standard approach with regard to predictive performance.
... see Deutsch (2011), and dependences of the different matches, Monte-Carlo simulations are used to forecast the whole course of the tournament. For a more detailed summary on statistical modeling of major international football events, see Groll et al. (2015) and references therein. ...
... These days a lot of data on possible covariates for forecast models is available. Groll et al. (2015) performed a variable selection on various covariates and found that the three most significant retrospective covariates are the FIFA ranking followed by the number of Champions league and Euro league players of a team. In this article the Elo ranking (see http://en.wikipedia.org/wiki/World ...
Article
This article is devoted to the forecast of the Africa Cup of Nations 2019 football tournament. It is based on a Poisson regression model that includes the Elo points of the participating teams as covariates and incorporates differences of team-specific skills. The proposed model allows predictions in terms of probabilities in order to quantify the chances for each team to reach a certain stage of the tournament. Monte Carlo simulations are used to estimate the outcome of each single match of the tournament and hence to simulate the whole tournament itself. The model is fitted on all football games on neutral ground of the participating teams since 2010. Published in African Journal of Applied Statistics Vol. 6 (1), 2019, pages 599 – 615. DOI: http://dx.doi.org/10.16929/ajas/2019.599.233
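
The Monte Carlo step for a single match can be sketched as below. The link between Elo points and scoring intensities is an assumption made for illustration (a log-linear dependence on the Elo difference with hypothetical coefficients base and slope), not the fitted model of the article; the two scores are drawn as independent Poisson counts.

```python
import numpy as np

def match_probabilities(elo_a, elo_b, base=0.2, slope=0.001, n_sims=100_000, seed=1):
    """Estimate win/draw/loss probabilities for team A by simulating scores."""
    rng = np.random.default_rng(seed)
    diff = elo_a - elo_b
    # Hypothetical intensities: the stronger team scores more, the weaker one less.
    goals_a = rng.poisson(np.exp(base + slope * diff), size=n_sims)
    goals_b = rng.poisson(np.exp(base - slope * diff), size=n_sims)
    return {
        "win_a": float(np.mean(goals_a > goals_b)),
        "draw": float(np.mean(goals_a == goals_b)),
        "win_b": float(np.mean(goals_a < goals_b)),
    }

# Hypothetical Elo points for two teams.
probs = match_probabilities(1900, 1700)
```

Repeating such draws for every fixture, and resolving knockout draws by some additional rule, yields one simulated tournament; the share of simulated tournaments in which a team reaches a given stage estimates its probability of doing so.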
... For example, Dyte and Clarke (2000) applied this model to FIFA World Cup data, with Poisson intensities that depend on the FIFA ranks of both competing teams. Groll and Abedieh (2013) and Groll et al. (2015) considered a large set of potential predictors for EURO and World Cup data, respectively, and used L1-penalized approaches to detect sparse sets of relevant covariates. Based on these, they calculated predictions for the EURO 2012 and FIFA World Cup 2014 tournaments. ...
... The first type of data we describe covers all matches of the two FIFA Women's World Cups 2011 and 2015 together with several potential influence variables. Basically, we use a similar (but smaller) set of covariates as introduced in Groll et al. (2015). For each participating team, the covariates are observed either for the year of the respective World Cup (e.g., GDP per capita) or shortly before the start of the World Cup (e.g., average age), and, therefore, vary from one World Cup to another. ...
Preprint
In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus, the second ranking method estimates adequate ability parameters that reflect the current strength of the teams best. The proposed combined approach is then applied to the data from the two previous FIFA Women's World Cups 2011 and 2015. Finally, based on the resulting estimates, the FIFA Women's World Cup 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors the defending champion USA before the host France.
... see (Deutsch, 2011), and dependences of the different matches, Monte-Carlo simulations are used to forecast the whole course of the tournament. For a more detailed summary on statistical modeling of major international football events, see (Groll et al., 2015) and references therein. Different similar models based on Poisson regression of increasing complexity (including discussion, goodness of fit and comparing them in terms of scoring functions) were analysed and used in (Gilch and Müller, 2018) for the prediction of the FIFA World Cup 2018. ...
... These days a lot of data on possible covariates for forecast models is available. (Groll et al., 2015) performed a variable selection on various covariates and found that the three most significant retrospective covariates are the FIFA ranking followed by the number of Champions league and Euro league players of a team. In this article the Elo ranking (see http://en.wikipedia.org/wiki/World_Football_Elo_Ratings) is preferably considered instead of the FIFA ranking (which is a simplified Elo ranking since July 2018), since the calculation of the FIFA ranking changed over time and the Elo ranking is more widely used in football forecast models. ...
Full-text available
Preprint
This article is devoted to the forecast of the Africa Cup of Nations 2019 football tournament. It is based on a Poisson regression model that includes the Elo points of the participating teams as covariates and incorporates differences of team-specific skills. The proposed model allows predictions in terms of probabilities in order to quantify the chances for each team to reach a certain stage of the tournament. Monte Carlo simulations are used to estimate the outcome of each single match of the tournament and hence to simulate the whole tournament itself. The model is fitted on all football games on neutral ground of the participating teams since 2010.
... In their case, they apply a typical Poisson model (see Generalized Linear Models: Overview; Generalized Linear Models: Introduction) to data from FIFA world cups and incorporate the FIFA ranks of both competing teams as covariates into the model. Similarly, Groll and Abedieh [15], Groll et al. [6], and Groll et al. [16] include many different variables in Poisson-type models for the FIFA World Cup or the EURO. Analogously, also in (ordinal) Bradley-Terry models, covariates can be incorporated, as, for example, demonstrated by Tutz and Schauberger [17] or Schauberger et al. [18]. ...
... When a large number of covariates is supposed to be incorporated into a model and/or if the predictive power of the single variables is not clear in advance, it can be sensible to estimate these models with regularized estimation approaches. For example, Groll and Abedieh [15], Groll et al. [16], or Schauberger et al. [18] use L1-penalized approaches (see Lasso, the), while Groll et al. [6] apply the so-called boosting techniques (see Boosting). Furthermore, in order to increase the interpretability and to reduce the complexity of the models, regularization approaches can be used to cluster teams with equal effects with respect to certain covariates. ...
Chapter
We present the major approaches for the modeling and prediction of soccer matches. Two principal approaches can be distinguished, namely prediction of the scores of both teams and prediction of the match outcomes represented by the categories win, draw, and loss. The most important elements of these strategies are presented together with several different extensions and further developments.
... see Deutsch [9], and dependences of the different matches, we use Monte-Carlo simulations to forecast the whole course of the tournament. For a more detailed summary on statistical modeling of major international football events we refer to Groll et al. [1] and references therein. ...
... These days a lot of data on possible covariates for the forecast models is available. Groll et al. [1] performed a variable selection on various covariates and found that the three most significant retrospective covariates are the FIFA ranking followed by the number of Champions league and Euro league players of a team. We prefer to consider the Elo ranking instead of the FIFA ranking, since the calculation of the FIFA ranking changed over time and the Elo ranking is more widely used in football forecast models. ...
Full-text available
Preprint
We propose an approach for the analysis and prediction of a football championship. It is based on Poisson regression models that include the Elo points of the teams as covariates and incorporates differences of team-specific effects. These models for the prediction of the FIFA World Cup 2018 are fitted on all football games on neutral ground of the participating teams since 2010. Based on the model estimates for single matches Monte-Carlo simulations are used to estimate probabilities for reaching the different stages in the FIFA World Cup 2018 for all teams. We propose two score functions for ordinal random variables that serve together with the rank probability score for the validation of our models with the results of the FIFA World Cups 2010 and 2014. All models favor Germany as the new FIFA World Champion. All possible courses of the tournament and their probabilities are visualized using a single Sankey diagram.
... [13] and [14] have further extended these Poisson models by incorporating a large set of potential influence variables as well as team-specific (either random or fixed) ability parameters. By using different regularization techniques, they discovered a sparse set of relevant covariates, which were then used to predict the European championship (EURO) 2012 and FIFA World Cup 2014 winners, respectively. ...
Full-text available
Article
Keywords: Prediction, Draw, Support Vector Machine (SVM), Back '4', Playing Position
Many researchers have tried to predict soccer outcomes using different methods, such as supervised and unsupervised learning approaches. Many factors and features have been used to carry out this prediction. However, none so far has predicted match results with emphasis on a critical playing position within the team. Most teams lose by a slim margin, as shown by the fact that 81.57% of matches have a goal difference of less than two goals. This research considered the back '4' of teams and was able to predict whether a match will be won, lost or end in a stalemate, with particular emphasis on deadlock. A Support Vector Machine was used to predict the result of matches using only the back '4' information of the team. The result was evaluated by predicting the whole of the 2018/2019 season of the English Premier League. The prediction results give a 76.8% accuracy and an error rate of 0.23.
... Betting odds are another established measure of heterogeneity in sport contests (e.g., Bartling et al. 2015; Deutscher et al., 2013; Sunde, 2009) and have proven to be an efficient forecasting instrument (see e.g. Forrest et al., 2005; Groll et al., 2015; Spann & Skiera, 2009). However, in our setting, betting odds have the disadvantage of being available only a few days prior to a match and thus only one game in advance. ...
Full-text available
Article
When heterogeneous players make strategic investment decisions in multi-stage contests, they might conserve resources in a current contest to spend more in a subsequent contest, if the degree of heterogeneity in the current (subsequent) contest is sufficiently large (small). We confirm these predictions using data from German professional soccer, in which players are subject to a one-match ban if they accumulate five yellow cards. Players with four yellow cards facing the risk of being suspended for the next match are (i) less likely to be fielded when the heterogeneity in the current match increases and (ii) more likely to receive a fifth yellow card in the current match when heterogeneity in the next match increases or heterogeneity in the next match but one (when they return from their ban) decreases.
... The newly generated variables can be added to the covariate data based on previous UEFA EUROs and a random forest, an xgboost model or actually any other statistical or machine learning model, such as e.g. a lasso-regularized regression model, can be fitted to these data. Lasso regression was used for example in Groll et al. (2015) to predict the FIFA World Cup 2014. More details on how classical regression approaches can be used for the modeling and prediction of football matches can also be found in Groll et al. (2020). ...
Preprint
Three state-of-the-art statistical ranking methods for forecasting football matches are combined with several other predictors in a hybrid machine learning model. Namely an ability estimate for every team based on historic matches; an ability estimate for every team based on bookmaker consensus; average plus-minus player ratings based on their individual performances in their home clubs and national teams; and further team covariates (e.g., market value, team structure) and country-specific socio-economic factors (population, GDP). The proposed combined approach is used for learning the number of goals scored in the matches from the four previous UEFA EUROs 2004-2016 and then applied to current information to forecast the upcoming UEFA EURO 2020. Based on the resulting estimates, the tournament is simulated repeatedly and winning probabilities are obtained for all teams. A random forest model favors the current World Champion France with a winning probability of 14.8% before England (13.5%) and Spain (12.3%). Additionally, we provide survival probabilities for all teams and at all tournament stages.
... The 1998 World Cup is covered in [6]. Suzuki et al. [25] use a Bayesian approach to predict the result of the 2006 World Cup, while Groll et al. [8] apply their model to the 2014 World Cup. Until recently, the articles focusing on international competitions looked at the predictive power of the model. ...
Article
In the last round of the FIFA World Cup group stage, games for which the outcome does not affect the selection of the qualified teams are played with little enthusiasm. Furthermore, a team that has already qualified may take into account other factors, such as the opponents it will face in the next stage of the competition so that, depending on the results in the other groups and the scheduling of the next stage, winning the game may not be in its best interest. Even more critically, there may be situations in which a simple draw will qualify both teams for the next stage of the competition. Any situation in which the two opposing teams do not play competitively is detrimental to the sport, and, above all, can lead to collusion and match-fixing opportunities. We here develop a relatively general method of evaluating competitiveness and apply it to the current format of the World Cup group stage. We then propose changes to the current format in order to increase the stakes in the last round of games of the group stage, making games more exciting to watch and, at the same time, reducing any collusion opportunities. We appeal to the same method to evaluate a “groups of 3” format which will be introduced in the 2026 World Cup edition as well as a format similar to the one of the current Euro UEFA Cup.
... Deutsch (2011) wants to judge the impact of the draw in the 2010 World Cup, as well as to look back and identify surprises, disappointments, and upsets. Groll et al. (2015) fit and examine two models to forecast the 2014 FIFA World Cup. O'Leary (2017) finds that the Yahoo crowd was statistically significantly better at predicting the outcomes of matches in the 2014 World Cup compared to the experts and was similar in performance to established betting odds. ...
Full-text available
Preprint
This paper investigates the 2018 FIFA World Cup qualification process via Monte-Carlo simulations. The qualifying probabilities are calculated for 102 nations, all teams except for African and European countries. A reasonable method is proposed to measure the degree of unfairness, which shows substantial differences between the FIFA confederations: for example, a South American team could have doubled its chances by playing in Asia. Using a fixed matchup in the inter-continental play-offs instead of the current random draw can reduce unfairness by about 10%. The move of Australia from the Oceanian to the Asian zone is found to increase its probability of participating in the 2018 FIFA World Cup by 75%. Our results provide important insights for the confederations on how to reallocate the qualifying berths.
... Moreover, a player's position in the field is commonly considered in basketball [13,14] and football studies [9]. Additionally, in international competitions among different national teams, the gross domestic product per capita, population size and other relevant factors for each country have been considered [15,16]. ...
Full-text available
Article
Match outcome prediction is a challenging problem that has led to the recent rise in machine learning being adopted and receiving significant interest from researchers in data science and sports. This study explores predictability in match outcomes using machine learning and candlestick charts, which have been used for stock market technical analysis. We compile candlestick charts based on betting market data and consider the character of the candlestick charts as features in our predictive model rather than the performance indicators used in the technical and tactical analysis in most studies. The predictions are investigated as two types of problems, namely, the classification of wins and losses and the regression of the winning/losing margin. Both are examined using various methods of machine learning, such as ensemble learning, support vector machines and neural networks. The effectiveness of our proposed approach is evaluated with a dataset of 13261 instances over 32 seasons in the National Football League. The results reveal that the random subspace method for regression achieves the best accuracy rate of 68.4%. The candlestick charts of betting market data can enable promising results of match outcome prediction based on pattern recognition by machine learning, without limitations regarding the specific knowledge required for various kinds of sports.
... Additionally, they obtained the posterior distributions with the Highest Density Intervals of the probability of being champion for each team. Groll et al. (2015) proposed an approach for the analysis and prediction of soccer match results. ...
Full-text available
Article
This study was aimed at determining the average performance of teams that participated at the 2014 World Cup in terms of goals scored. Bayesian approach was used to analyze this problem. There was an update of belief about the average goals scored in the tournament through the conjugate Gamma prior and the Poisson likelihood. Using the conjugate Gamma prior and Poisson likelihood, the average goals scored per team was 1.3354 and a posterior standard deviation of 0.1018. The 95% credible interval for the parameter (average goals scored in the tournament) was [1.143, 1.542]. The point estimate for either prior showed that it is within the limit.
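The conjugate Gamma prior and Poisson likelihood mentioned in this abstract lead to a closed-form posterior, which the following sketch illustrates. The prior parameters and the goal counts are invented for illustration and are not the data or prior used in the study, so the output does not reproduce the figures quoted above.

```python
import math


def gamma_poisson_update(shape, rate, counts):
    """Conjugate update: Gamma(shape, rate) prior with Poisson counts.

    The posterior is Gamma(shape + sum(counts), rate + len(counts)).
    """
    return shape + sum(counts), rate + len(counts)


def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


# Hypothetical goals scored by a team in five matches.
goals = [2, 1, 0, 3, 1]

# Hypothetical Gamma(1, 1) prior on the average goals per match.
post_shape, post_rate = gamma_poisson_update(1.0, 1.0, goals)
post_mean, post_sd = gamma_mean_sd(post_shape, post_rate)
print(f"posterior: Gamma({post_shape}, {post_rate}), "
      f"mean={post_mean:.4f}, sd={post_sd:.4f}")
```

A credible interval would additionally require Gamma quantiles, which are omitted here to keep the sketch free of third-party dependencies.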
... They used the generalized linear mixed model (GLMM) approach to incorporate team-specific random effects. Later Groll et al. (2015) predicted soccer match results based on a regularized Poisson regression model that includes various covariates describing the national teams' success in previous FIFA World Cups. Leitner et al. (2010) proposed techniques for forecasting the results of the European football championship 2008, for which the consensus model based on bookmakers' odds outperforms the methods based on both the Elo rating and the FIFA/Coca Cola World rating. ...
Full-text available
Article
This paper is about predicting the outcome of tennis matches of the Association of Tennis Professionals (ATP) and the Women’s Tennis Association (WTA) using both data and judgments. Many factors influence that outcome, and an important question is which of them have a significant influence. We have identified numerous factors and systematically prioritized them subjectively and objectively, so as to improve the accuracy of the prediction. We then used them to predict the win-lose outcome of the 2015 US OPEN tennis matches (63 men's and 31 women’s games) before they took place. Tennis match prediction in the sports literature has thus far reported an accuracy rate of 70%. The accuracy of our proposed model, which combines data and judgment, reaches 85.1%.
... In this section, we briefly describe the underlying data set covering all matches of the four preceding IHF World Men's Handball Championships 2011 - 2017 together with several potential influence variables. Basically, we use a similar set of covariates as Groll et al. (2015) do for their soccer FIFA World Cup analysis, with certain modifications that are necessary for handball. For each participating team, the covariates are observed either for the year of the respective World Cup (e.g., GDP per capita) or shortly before the start of the World Cup (e.g., a team's IHF ranking), and, therefore, vary from one World Cup to another. ...
Preprint
In this work, we compare several different modeling approaches for count data applied to the scores of handball matches with regard to their predictive performances based on all matches from the four previous IHF World Men's Handball Championships 2011 - 2017: (underdispersed) Poisson regression models, Gaussian response models and negative binomial models. All models are based on the teams' covariate information. Within this comparison, the Gaussian response model turns out to be the best-performing prediction method on the training data and is, therefore, chosen as the final model. Based on its estimates, the IHF World Men's Handball Championship 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors Denmark before France. Additionally, we provide survival probabilities for all teams and at all tournament stages as well as probabilities for all teams to qualify for the main round.
... Forrest and Simmons, 2008, Deutscher et al., 2018) and on modelling specific game events, especially goals (see e.g. Maher, 1982, Dixon and Coles, 1997, Groll et al., 2015). For the latter, a Poisson distribution is usually assumed for the number of goals scored during a match. ...
Full-text available
Article
Recent years have seen several match-fixing scandals in soccer. In order to avoid match-fixing, existing literature and fraud detection systems primarily focus on analysing betting odds provided by bookmakers. In our work, we suggest to not only analyse odds but also total volume placed on bets, thereby making use of more of the information available. As a case study for our method, we consider the second division in Italian soccer, Serie B, since for this league it has effectively been proven that some matches were fixed, such that to some extent we can ground truth our approach. For the betting volume data, we use a flexible generalized additive model for location, scale and shape (GAMLSS), with log-normal response, to account for the various complex patterns present in the data. For the betting odds, we use a GAMLSS with bivariate Poisson response to model the number of goals scored by both teams, and to subsequently derive the corresponding odds. We then conduct outlier detection in order to flag suspicious matches. Our results indicate that monitoring both betting volumes and betting odds can lead to more reliable detection of suspicious matches.
... On the other hand, Baio and Blangiardo (2010) assume conditional independence within hierarchical Bayesian models, on the grounds that the correlation of the goals is already taken into account by the hierarchical structure. Similarly, Groll and Abedieh (2013) and Groll et al. (2015) show that, up to a certain amount, the scores' dependence on two competing teams may be explained by the inclusion of some specific team covariates in the linear predictors. However, Dixon and Robinson (1998) note that modelling the dependence along a single match is possible: in such a case, a temporal structure in the 90 minutes is required. ...
Full-text available
Article
Modelling football outcomes has gained increasing attention, in large part due to the potential for making substantial profits. Despite the strong connection existing between football models and the bookmakers’ betting odds, no authors have used the latter for improving the fit and the predictive accuracy of these models. We have developed a hierarchical Bayesian Poisson model in which the scoring rates of the teams are convex combinations of parameters estimated from historical data and the additional source of the betting odds. We apply our analysis to a nine-year dataset of the most popular European leagues in order to predict match outcomes for their tenth seasons. In this article, we provide numerical and graphical checks for our model.
... Most previous studies have been applied to national championships while very few have considered the FIFA World Cup. The 1998 World Cup is covered in [2], Suzuki et al. [7] use a Bayesian approach to predict the result of the 2006 World Cup while Groll et al. [8] apply their model to the 2014 edition of the World Cup. Among articles devoted to the international competition, the focus was on the predictive power of their model rather than improving the fairness and competitiveness of the World Cup. ...
Full-text available
Preprint
In the soccer World Cup, at the group stage, before the knock-out stage, games whose outcome has no impact for qualification are played with little enthusiasm. If a team is already qualified when playing its last game, different factors come into play so that winning may not be in its best interest. More critically, it is not unusual that a simple draw secures the qualification for both teams. Any situation in which the two opposing teams do not play competitively is detrimental to the sport, and can lead to collusion and match-fixing opportunities. The paper first develops a method to analyze the format of the World Cup group stage and its competitiveness. We then propose changes to the current format in order to increase the stakes in the last round of games of the group stage, reducing collusion opportunities and making games more exciting to watch. We use the same method to evaluate the "groups of 3" format which may be introduced in the 2026 World Cup edition, as well as a "groups of 5" format.
... In this section we recall the model in which a Poisson distribution is assumed for the number of goals scored (the Poisson model for short). This model is quite popular in the context of modelling and forecasting of football match outcomes (see, e.g., Maher 1982; Dixon and Coles 1997; Crowder et al. 2002; Goddard 2005; Graham and Stott 2008; Groll et al. 2015 ...
Article
The efficacy of different league formats in ranking teams according to their true latent strength is analysed. To this end, a new approach for estimating attacking and defensive strengths based on the Poisson regression for modelling match outcomes is proposed. Various performance metrics are estimated reflecting the agreement between latent teams’ strength parameters and their final rank in the league table. The tournament designs studied here are used in the majority of European top-tier association football competitions. Based on numerical experiments, it turns out that a two-stage league format comprising of the three round-robin tournament together with an extra single round-robin is the most efficacious setting. In particular, it is the most accurate in selecting the best team as the winner of the league. Its efficacy can be enhanced by setting the number of points allocated for a win to two (instead of three that is currently in effect in association football).
... Methods of forecasting outcomes of soccer matches are usually based on the rate of correctly predicted results in terms of win (W), draw (D) and loss (L) or on scoring rules for probability forecasts of each of these three outcomes via e.g. the Brier or the logarithm score (Groll et al., 2015; Gneiting & Raftery, 2007; Foulley, 2015). ...
Preprint
This note proposes a penalty criterion for assessing correct score forecasting in a soccer match. The penalty is based on hierarchical priorities for such a forecast i.e., i) Win, Draw and Loss exact prediction and ii) normalized Euclidian distance between actual and forecast scores. The procedure is illustrated on typical scores, and different alternatives on the penalty components are discussed.
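The citing excerpt for this note mentions the Brier score and the logarithm score for probability forecasts of win, draw and loss. As a hedged illustration of those two standard scoring rules (not of the penalty criterion proposed in the note itself), they can be computed as follows; the forecast values are invented.

```python
import math

OUTCOMES = ("W", "D", "L")


def brier_score(probs, observed):
    """Multi-category Brier score: squared error against the one-hot outcome.

    Lower is better; 0 means the observed outcome had probability 1.
    """
    return sum(
        (p - (1.0 if outcome == observed else 0.0)) ** 2
        for outcome, p in zip(OUTCOMES, probs)
    )


def log_score(probs, observed):
    """Logarithmic score: negative log probability of the observed outcome."""
    return -math.log(probs[OUTCOMES.index(observed)])


# Hypothetical forecast: P(win)=0.5, P(draw)=0.3, P(loss)=0.2; a win occurred.
forecast = (0.5, 0.3, 0.2)
bs = brier_score(forecast, "W")
ls = log_score(forecast, "W")
print(f"Brier score: {bs:.2f}, log score: {ls:.4f}")
```

Both scores are written here so that smaller values indicate better forecasts; some papers use the opposite sign convention for the logarithm score.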
... We will show that these scores are correlated with various measures of team performance in the World Cup, even after controlling for other important characteristics like the team's official FIFA ranking. There is consensus in the literature that this ranking, available since 1993, is the main predictor for forecasting World Cup outcomes (Dyte & Clarke, 2000; Suzuki, Salasar, Leite, & Louzada-Neto, 2010), even when controlling for competition specific covariates as bookmakers' odds (Groll, Schauberger, & Tutz, 2015). The effect of emotions is small in comparison, but robust across multiple specifications. ...
Article
Emotion display serves as an incentive or deterrent for others in many social interactions. We study the portrayal of anger and happiness, two emotions associated with dominance, and its relationship to team performance in a high stake environment. We analyze 4318 pictures of players from 304 participating teams in twelve editions (1970–2014) of the FIFA Soccer World Cup, and use automated face-reading (FaceReader 6) to evaluate the display of anger and happiness. We observe that the display of both anger and happiness is positively correlated with team performance in the World Cup. Teams whose players display more anger, an emotion associated with competitiveness, concede fewer goals. Teams whose players display more happiness, an emotion associated with confidence, score more goals. We show that this result is driven by less than half the players in a team.
... On the other hand, Baio and Blangiardo (2010) assume the conditional independence within hierarchical Bayesian models, on the grounds that the correlation of the goals is already taken into account by the hierarchical structure. Similarly, Groll and Abedieh (2013) and Groll et al. (2015) show that, up to a certain amount, the scores' dependence on two competing teams may be explained by the inclusion of some specific teams' covariates in the linear predictors. However, Dixon and Robinson (1998) note that modelling the dependence along a single match is possible: in such a case, a temporal structure in the 90 minutes is required. ...
Full-text available
Preprint
Modelling football outcomes has gained increasing attention, in large part due to the potential for making substantial profits. Despite the strong connection existing between football models and the bookmakers' betting odds, no authors have used the latter for improving the fit and the predictive accuracy of these models. We have developed a hierarchical Bayesian Poisson model in which the scoring rates of the teams are convex combinations of parameters estimated from historical data and the additional source of the betting odds. We apply our analysis to a nine-year dataset of the most popular European leagues in order to predict match outcomes for their tenth seasons. In this paper, we provide numerical and graphical checks for our model.
... Knowing how to collect, access, retrieve and integrate information is critical to effective performance analysis and decision-making processes (Vincent et al. 2009). The development and research of data in sports has taken different perspectives, like data mining (Ofoghi et al. 2013;Li 2014;Leung and Joseph 2014;Haghighat et al. 2013), information systems (Shao et al. 2014;Qi and Wang 2014;Xie and Cai 2014;Luo and Deng 2014), event detection in videos (Li and Sezan 2002;Taki et al. 1996;Tong et al. 2005;Taki and Hasegawa 2000), behavioural models (Menéndez et al. 2013;Cheng et al. 2002;Hernandez Mendo and Anguera 2002), social network analysis (Lusher et al. 2010;Vaz de Melo et al. 2012;Pardalos and Zamaraev 2014;Passos et al. 2011) and also outcome prediction by different statistical applications (Baker and Scarf 2006;Groll et al. 2015;Stekler et al. 2010;Leitner et al. 2010). A characteristic feature of all of these approaches is the amount of data obtained by any method chosen or perspective taken. ...
Chapter
Data analysis in sports has adopted many different approaches given its usefulness in quantitative and objective management. Several advances have been made thanks to the research and technologies developed so far. Many complex methodologies of sport performance analysis exist, aiming to gather as much information as possible to achieve success. Therefore, a wide variety of options are available for sport managers, coaches or anyone interested, including advances in information systems, data mining, machine learning and motion analysis. However, the cost of these powerful methodologies motivates the search for cheaper techniques based on basic but proper notation methodology. The aim of this chapter is to provide an observational methodology for soccer match analysis. When paired with PageRank as the main indicator of performance, it allows for a deep analysis of the data and better decision-making and performance analysis in soccer. To show some insights about the proposed model, real data from past matches are presented and discussed. Results show graph visualizations that sum up the whole match in terms of the flows of a network modelled with passes and recoveries from the players as weights of its edges. One implication of our research is that it is a first approach to generalizing the PageRank algorithm to soccer team management, which could be extrapolated to other disciplines. It also points to the feasibility of quantitative analysis for sport managers with a reasonable cost-benefit ratio. This analysis opens paths to further analysis that could include spatiotemporal variables.
... Therefore, these approaches can only use covariates which are known before a match takes place. Examples of predictive approaches focusing on the prediction of the exact scores of a match can be found in Dixon and Coles [11], Karlis and Ntzoufras [19], Dyte and Clarke [12], and Groll et al. [17]. Another (but smaller) part of the literature focuses on the post-hoc analysis of football matches. ...
Article
In modern football, various variables, such as the distance a team runs or its percentage of ball possession, are collected throughout a match. However, there is a lack of methods to make use of these on-field variables simultaneously and to connect them with the final result of the match. This paper considers data from the German Bundesliga season 2015/2016. The objective is to identify the on-field variables that are connected to the sporting success or failure of the individual teams. An extended Bradley–Terry model for football matches is proposed that is able to take into account on-field covariates. Penalty terms are used to reduce the complexity of the model and to find clusters of teams with equal covariate effects. The model identifies the running distance to be the on-field covariate that is most strongly connected to the match outcome.
... Sports prediction is an obviously interesting topic and always attracts sport fans' interest. In soccer match prediction, for example, various models have been developed, either to predict the number of goals scored and conceded (Koopman and Lit, 2015;Groll et al., 2015;McHale and Szcepanski, 2014;Baker and McHale, 2013;McHale and Scarf, 2011) or to directly predict match results such as the number of wins, draws and losses (Koning et al., 2003;Dobson and Goddard, 2003;Boulier and Stekler, 2003). However, there is little research on predicting the effect of various types of tournament structures. ...
Article
This paper develops a system integrating statistical modeling and Monte Carlo simulation to evaluate various types of soccer tournament structures. The system, called “E-compare of Soccer Tournament Structures”, aims to assist decision makers in choosing a competitive soccer tournament design. The system reports various tournament metrics such as the expected number of goals scored and conceded, the expected number of wins, draws, and losses, and the expected final ranking at the end of the tournament. Based on a large number of simulations using teams that participated in the Malaysian soccer super league, our analysis showed that different designs had different impacts on the final ranking of the teams. A round robin is found to be the best structure for identifying the strongest team as league winner, compared to a knockout structure. © 2016, International Review of Management and Marketing. All rights reserved.
... Regression models are a common tool for sports analytics research, though their application is largely focused within one of two categories: analyzing player performance and predicting win probabilities. Such examinations have spanned several team sports including basketball (Deshpande and Jensen 2016;Baghal 2012;Fearnhead and Taylor 2011;Okamoto 2011;Teramoto and Cross 2010), ice hockey (Gramacy et al. 2013;Macdonald 2012), baseball (Hamrick and Rasp 2011;Neal et al. 2010), and soccer (Groll et al. 2015;Oberstone 2009). ...
Article
Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models able to quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We then use optical player tracking data for the 2014-2015 season to improve the assist model by including other contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence are also presented.
Article
The main goal of this article is to compare the performance of team ratings and individual player ratings when trying to forecast match outcomes in association football. The well-known Elo rating system is used to calculate team ratings, whereas a variant of plus-minus ratings is used to rate individual players. For prediction purposes, two covariates are introduced. The first represents the pre-match difference in Elo ratings of the two teams competing, while the second is the average difference in individual ratings for the players in the starting line-ups of the two teams. Two different statistical models are used to generate forecasts. The first type is an ordered logit regression (OLR) model that directly outputs probabilities for each of the three possible match outcomes, namely home win, draw and away win. The second type is based on competing risk modelling and involves the estimation of scoring rates for the two competing teams. These scoring rates are used to derive match outcome probabilities using discrete event simulation. Both types of models can be used to generate pre-game forecasts, whereas the competing risk models can also be used for in-game predictions. Computational experiments indicate that there is no statistical difference in the prediction quality for pre-game forecasts between the OLR models and the competing risk models. It is also found that team ratings and player ratings perform about equally well when predicting match outcomes. However, forecasts made when using both team ratings and player ratings as covariates are significantly better than those based on only one of the ratings.
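The first covariate in this abstract, the pre-match Elo difference, relies on the standard Elo expectation and update rules, which can be sketched as follows. The scale of 400 and the K-factor of 20 are conventional defaults assumed here for illustration. The abstract does not state the article's own parameter choices.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of team A against team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return updated ratings after one match.

    score_a is 1.0 for a win of A, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor, not a value from the article.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Pre-match covariate as described: difference in team ratings.
home, away = 1600.0, 1500.0
elo_difference = home - away
new_home, new_away = elo_update(home, away, score_a=1.0)
```

The update is zero-sum, so the rating points gained by one team are exactly those lost by the other.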
Article
In this work, we compare several different modeling approaches for count data applied to the scores of handball matches with regard to their predictive performances based on all matches from the four previous IHF World Men’s Handball Championships 2011 – 2017: (underdispersed) Poisson regression models, Gaussian response models and negative binomial models. All models are based on the teams’ covariate information. Within this comparison, the Gaussian response model turns out to be the best-performing prediction method on the training data and is, therefore, chosen as the final model. Based on its estimates, the IHF World Men’s Handball Championship 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors Denmark before France. Additionally, we provide survival probabilities for all teams and at all tournament stages as well as probabilities for all teams to qualify for the main round.
Article
Across countries and continents, football (soccer) has drawn increasingly more attention over the last decades and developed into a huge commercial complex. Consequently, the market of bookmakers providing the possibility to bet on the result of football matches grew rapidly, especially with the appearance of the internet. With a high number of games every week in multiple countries, football league matches hold enormous potential for generating profits over time with the use of advanced betting strategies. In this paper, we use machine learning for predicting the outcome of football league matches by exploiting data about match characteristics. Based on insights from the field of statistical arbitrage stock market trading, we show that one could generate meaningful profits over time by betting accordingly. A simulation study analyzing the matches of the five top European football leagues from season 2013/14 to 2017/18 presented economically and statistically significant returns achieved by exploiting large data sets with modern machine learning algorithms. In contrast to these modern algorithms, the break-even point could not be reached with an ordinary linear regression approach or simple betting strategies, e.g. always betting on the home team.
Article
In a sports league such as a soccer league, the teams' standings are based on a cumulative point system. Typically, the points awarded in each match to the winning, drawing and losing teams follow the 3-1-0 point system. In this paper, we explore the effect of changing the point system on the teams' standings by changing the values awarded for a win, a draw and a loss. Three point systems are explored in our soccer simulation model: 3-1-0, 2-1-0 and 4-1-0. Based on the teams participating in the Malaysian soccer Super League, our simulation results show that there are small changes in the teams' standings when the actual and simulated rank positions are compared. However, the 4-1-0 point system recorded the highest Pearson correlation value, 0.97, followed by the 2-1-0 point system (0.95) and the 3-1-0 point system (0.94).
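To make the comparison of point systems concrete, the sketch below computes a league table from a list of results under any win-draw-loss scheme. The team names and scores are invented for illustration, and the function is a hypothetical helper, not the simulation model used in the paper.

```python
def standings(results, points=(3, 1, 0)):
    """Rank teams by total points under a (win, draw, loss) point system.

    results: iterable of (home, away, home_goals, away_goals) tuples.
    Ties in points are broken alphabetically, a simplifying assumption.
    """
    win, draw, loss = points
    table = {}
    for home, away, home_goals, away_goals in results:
        table.setdefault(home, 0)
        table.setdefault(away, 0)
        if home_goals > away_goals:
            table[home] += win
            table[away] += loss
        elif home_goals < away_goals:
            table[home] += loss
            table[away] += win
        else:
            table[home] += draw
            table[away] += draw
    return sorted(table.items(), key=lambda item: (-item[1], item[0]))


# Invented results for three hypothetical teams.
results = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 0)]
for system in [(3, 1, 0), (2, 1, 0), (4, 1, 0)]:
    print(system, standings(results, system))
```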
Research
This interview article, given to Variances.eu, the webmagazine of ENSAE Alumni, in a conversation with Philippe Tassi, makes the case for applying mathematical statistics to sport in all its facets (elite, recreational, individual or team sport) and to all the upstream and downstream activities connected with it (infrastructure, media, preparation, monitoring and physiology, etc.). The abundance and quality of the data collected increasingly give rise to quantitative analyses for which mathematics, and statistics in particular, are tools of choice for describing and modelling the phenomena underlying competitions and athletes' performances. This direction is illustrated in football and its prestige competitions between nations (World Cup, EURO) or between clubs (Champions League) through the problems of ranking teams and predicting match results. The emphasis is on various ranking techniques, from the historical method known as ELO to more sophisticated current nonlinear models that integrate all the results of pairwise encounters as well as historical information, all within a dynamic perspective on the evolution of team performances.
Article
Identifying the decisive matches in international football tournaments is of great relevance for a variety of decision makers such as organizers, team coaches and/or media managers. This paper addresses this issue by analyzing the role of the statistical approach used to estimate the outcome of the game in the identification of decisive matches in international tournaments for national football teams. We extend the measure of decisiveness proposed by Geenens (2014) in order to allow us to predict or evaluate the decisive matches before, during and after a particular game in the tournament. Using information from the 2014 FIFA World Cup, our results suggest that Poisson and kernel regressions significantly outperform the forecasts of ordered probit models. Moreover, we find that although the identification of the most decisive matches is independent of the model considered, the identification of other key matches is model dependent. We also apply this methodology to identify the favorite teams and to predict the most decisive matches in the 2015 Copa America before the start of the competition. Furthermore, we compare our forecast approach with the original measure during the knockout stage.
Article
In this paper, we investigate the progress of score difference (between home and away teams) in professional basketball games employing functional data analysis (FDA). The observed score difference is viewed as the realization of the latent intensity process, which is assumed to be continuous. There are two major advantages of modeling the latent score difference intensity process using FDA: (1) it allows for arbitrary dependent structure among score change increments. This removes potential model mis-specifications and accommodates momentum which is often observed in sports games. (2) further statistical inferences using FDA estimates will not suffer from inconsistency due to the issue of having a continuous model yet discretely sampled data. Based on the FDA estimates, we define and numerically characterize momentum in basketball games and demonstrate its importance in predicting game outcomes.
Article
This paper presents models for the number of goals scored by opposing teams in international soccer matches. The bivariate discrete distributions employed are defined in terms of the marginal distributions and a dependence copula. This copula representation allows dependence in the bivariate distribution to be modelled in a flexible manner by specifying a suitable family of copula functions and fitting this to the bivariate data using maximum likelihood. Marginal means are modelled with match covariates. The nature of the dependence in the number of goals scored is complex, and we develop the idea that the strength of dependence depends on the competitive balance of a match. Our analysis suggests that for games between closely matched teams, the overall dependence is low, and that the dependence becomes increasingly negative as the competitiveness of a match decreases. In this way, we relate dependence to competitive balance and suggest a method to measure the latter quantity. The models developed here may also offer better forecasts than those offered by match outcome models with independent marginal distributions.
Article
Nowadays many approaches that analyze and predict the results of soccer matches are based on bookmakers' ratings. It is commonly accepted that the models used by the bookmakers contain a lot of expertise as the bookmakers' profits and losses depend on the performance of their models. One objective of this article is to analyze the explanatory power of bookmakers' odds together with many additional, potentially influential covariates with respect to a national team's success at European football championships. Therefore a pairwise Poisson model for the number of goals scored by national teams competing in European football championship matches is used. Moreover, the generalized linear mixed model (GLMM) approach, which is a widely used tool for modeling cluster data, allows team-specific random effects to be incorporated. Two different approaches to the fitting of GLMMs incorporating variable selection are used: subset selection and a LASSO-type technique, which includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. Based on the two preceding European football championships a sparse model is obtained that is used to predict all matches of the current tournament, resulting in a possible course of the European football championship (EURO) 2012.
Article
Existing methods for the prediction of the final scores in football games focus on modelling the numbers of goals scored by the two competitors, with parameter estimation of the assumed model usually based on the maximum likelihood approach. Although this approach allows for sufficiently accurate prediction of the final score, it does not account for large or surprising final scores that may distort parameter estimates. This is especially the case in competitions with an insufficient number of games relative to the number of participating teams (e.g. World Cup or Champions League). In this paper, we propose a weighted likelihood approach which allows the modeller to underweight a specific football score if it is felt that the result was not typical and falsifies (in any way) the parameter estimates. The imposed game weights can be defined subjectively or by assuming a model-based structure where the parameters can be estimated by iterative algorithms. The weight structure usually reflects deviations from the assumed model. Hence, scores that have low probability under the assumed model will be underweighted. This procedure may provide robust estimates even if surprising (under the assumed model) scores are observed. Champions League data are used to demonstrate the potential of the proposed approach.
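The effect of underweighting a surprising score can be illustrated with the simplest possible case, a single Poisson scoring rate. Maximizing the weighted log-likelihood, the sum over games of w_i (g_i log λ − λ), gives the weighted mean of the goals. The scores and weights below are invented, and the single-rate setting is a simplifying assumption, since the paper's model has team-specific parameters.

```python
def weighted_poisson_rate(goals, weights):
    """Weighted maximum likelihood estimate of a single Poisson rate.

    Maximizing sum_i w_i * (g_i * log(rate) - rate) over rate yields
    rate = sum_i(w_i * g_i) / sum_i(w_i).
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * g for w, g in zip(weights, goals)) / total_weight


# Invented data: the 7-goal game is treated as atypical and underweighted.
goals = [1, 2, 0, 7]
plain = weighted_poisson_rate(goals, [1.0, 1.0, 1.0, 1.0])
robust = weighted_poisson_rate(goals, [1.0, 1.0, 1.0, 0.2])
```

With equal weights the estimate is the ordinary sample mean. Shrinking the weight of the outlying game pulls the estimate back toward the typical scores.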
Article
Four years after the last European football championship (EURO) in Austria and Switzerland, the two finalists of the EURO 2008 - Spain and Germany - are again the clear favorites for the EURO 2012 in Poland and the Ukraine. Using a bookmaker consensus rating - obtained by aggregating winning odds from 23 online bookmakers - the forecast winning probability for Spain is 25.8% followed by Germany with 22.2%, while all other competitors have much lower winning probabilities (The Netherlands are in third place with a predicted 11.3%). Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, we can infer that the probability for a rematch between Spain and Germany in the final is 8.9% with the odds just slightly in favor of Spain for prevailing again in such a final (with a winning probability of 52.9%). Thus, one can conclude that - based on bookmakers' expectations - it seems most likely that history repeats itself and Spain defends its European championship title against Germany. However, this outcome is by no means certain and many other courses of the tournament are not unlikely as will be presented here. All forecasts are the result of an aggregation of quoted winning odds for each team in the EURO 2012: These are first adjusted for profit margins ("overrounds"), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an "inverse" procedure of tournament simulations, yielding estimates of all pairwise probabilities (for matches between each pair of teams) as well as probabilities to proceed to the various stages of the tournament. This technique correctly predicted the EURO 2008 final (Leitner, Zeileis, Hornik 2008), with better results than other rating/forecast methods (Leitner, Zeileis, Hornik 2010a), and correctly predicted Spain as the 2010 FIFA World Champion (Leitner, Zeileis, Hornik 2010b). 
Compared to the EURO 2008 forecasts, there are many parallels but two notable differences: First, the gap between Spain/Germany and all remaining teams is much larger. Second, the odds for the predicted final were slightly in favor of Germany in 2008 whereas this year the situation is reversed.
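The aggregation steps named in this abstract (adjust for the overround, average on the log-odds scale, transform back) can be sketched as follows. Several details are assumptions made for illustration: the margin is removed by simple proportional normalization of inverse decimal odds, the averaged probabilities are renormalized at the end, and the odds are invented. The authors' exact adjustment may differ.

```python
import math


def implied_probs(decimal_odds):
    """Turn one bookmaker's decimal odds into probabilities summing to 1.

    Inverse odds sum to more than 1 because of the bookmaker's margin.
    Dividing by that sum is one simple (assumed) way to remove it.
    """
    raw = [1.0 / odd for odd in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]


def consensus(odds_by_bookmaker):
    """Average margin-adjusted probabilities across bookmakers on the logit scale."""
    per_book = [implied_probs(odds) for odds in odds_by_bookmaker]
    n_books = len(per_book)
    averaged = []
    for team in range(len(per_book[0])):
        mean_logit = sum(math.log(b[team] / (1.0 - b[team])) for b in per_book) / n_books
        averaged.append(1.0 / (1.0 + math.exp(-mean_logit)))
    total = sum(averaged)  # renormalize, an assumption of this sketch
    return [p / total for p in averaged]


# Invented winning odds for three hypothetical teams from two bookmakers.
odds = [[1.9, 2.9, 5.5], [2.0, 2.8, 5.0]]
probs = consensus(odds)
```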
Article
Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated by use of real data sets.
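In generic notation, an L1-penalized log-likelihood of the kind described here takes the following form. The symbols are assumptions of this sketch: \ell denotes the (approximate) log-likelihood of the mixed model, \beta_1, ..., \beta_p the fixed effects subject to selection, and \lambda the tuning parameter. The paper's own notation and treatment of the random effects may differ.

```latex
\ell^{\mathrm{pen}}(\beta) \;=\; \ell(\beta) \;-\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
\qquad \lambda \ge 0 .
```

For \lambda = 0 the unpenalized fit is recovered. Larger values of \lambda shrink more coefficients exactly to zero, which is what produces variable selection.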
Book
This book presents a detailed economic analysis of professional football at club level, using a combination of economic reasoning and statistical and econometric analysis. Most of the original empirical research reported in the book is based on English club football. A wide range of international comparisons help emphasize both the broader relevance as well as the unique characteristics of the English experience. Specific topics include: the links between football clubs’ financial strength and competitive balance and uncertainty of outcome; the determinants of professional footballers’ compensation; measuring the football manager’s contribution to team performance, the determinants of managerial change, and its effects on team performance; patterns of spectator demand for attendance; predicting match results, betting on football, and the market in football clubs’ company shares. The book concludes with an extended discussion of the major economic policy issues currently facing football’s legislators and administrators worldwide.
Article
In soccer knockout ties played in a two-legged format, the team playing the return match at home is usually seen as advantaged. To check this common belief, we analyzed matches of the UEFA Champions League knockout phase since 1995. It is shown that the observed differences in winning frequencies between teams playing away first and those playing at home first can be completely explained by their performances in the group stage and, more importantly, by the teams' general strength.
Article
Attack and defense strengths of football teams vary over time due to changes in the teams of players or their managers. We develop a statistical model for the analysis and forecasting of football match results which are assumed to come from a bivariate Poisson distribution with intensity coefficients that change stochastically over time. This development presents a novelty in the statistical time series analysis of match results from football or other team sports. Our treatment is based on state space and importance sampling methods which are computationally efficient. The out-of-sample performance of our methodology is verified in a betting strategy that is applied to the match outcomes from the 2010/11 and 2011/12 seasons of the English Premier League. We show that our statistical modeling framework can produce a significant positive return over the bookmaker's odds.
Article
In this paper a method is suggested for predicting the distribution of scores in international soccer matches, treating each team’s goals scored as independent Poisson variables dependent on the Fédération Internationale de Football Association (FIFA) rating of each team, and the match venue. The results of a Poisson regression to estimate parameters for this model were used to simulate matches played during the 1998 World Cup tournament. For the model to be a more effective predictor, some manual adjustments must be made to the ratings data. The predictions of the model were placed on a web page to create interest in applications of mathematics, and proved popular with the general public.
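Under the independence assumption described here, the probabilities of a win, draw or loss follow directly from the two Poisson rates. The sketch below sums the joint probabilities over a truncated score grid instead of simulating matches. The rates are invented inputs, whereas in the paper they would come from a Poisson regression on FIFA ratings and venue.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probs(rate_a: float, rate_b: float, max_goals: int = 15):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson scores.

    Scores above max_goals are ignored, which is negligible for typical rates.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, rate_a) * poisson_pmf(goals_b, rate_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Invented expected goals for two hypothetical teams.
p_win, p_draw, p_loss = outcome_probs(1.5, 1.0)
```

The same joint distribution can also be sampled repeatedly to simulate a full tournament, which is the approach the abstract describes.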
Article
Models based on the bivariate Poisson distribution are used for modelling sports data. Independent Poisson distributions are usually adopted to model the number of goals of two competing teams. We replace the independence assumption by considering a bivariate Poisson model and its extensions. The models proposed allow for correlation between the two scores, which is a plausible assumption in sports with two opposing teams competing against each other. The effect of introducing even slight correlation is discussed. Using just a bivariate Poisson distribution can improve model fit and prediction of the number of draws in football games. The model is extended by considering an inflation factor for diagonal terms in the bivariate joint distribution. This inflation improves in precision the estimation of draws and, at the same time, allows for overdispersed, relative to the simple Poisson distribution, marginal distributions. The properties of the models proposed as well as interpretation and estimation procedures are provided. An illustration of the models is presented by using data sets from football and water-polo.
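A bivariate Poisson distribution can be built from three independent Poisson variables, with scores X = X1 + X3 and Y = X2 + X3, so that the shared component X3 induces positive correlation. The sketch below evaluates the joint probability by summing over the shared component. The parameter names lam1, lam2 and lam3 are assumptions of this sketch, and the diagonal-inflation extension mentioned in the abstract is not included.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) where X = X1 + X3, Y = X2 + X3 and Xi ~ Poisson(lam_i).

    The shared component X3 can take any value from 0 to min(x, y).
    """
    return sum(
        poisson_pmf(x - k, lam1) * poisson_pmf(y - k, lam2) * poisson_pmf(k, lam3)
        for k in range(min(x, y) + 1)
    )


# Invented parameters: compare the draw probability with and without the
# shared component while holding the marginal means (1.5 and 1.2) fixed.
draw_dependent = sum(bivariate_poisson_pmf(g, g, 1.3, 1.0, 0.2) for g in range(16))
draw_independent = sum(bivariate_poisson_pmf(g, g, 1.5, 1.2, 0.0) for g in range(16))
```

Setting lam3 to zero recovers the independent model. In this construction the covariance between the two scores equals lam3.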
Article
A common discussion subject for the male part of the population in particular is the prediction of the next week-end's soccer matches, especially for the local team. Knowledge of offensive and defensive skills is valuable in the decision process before making a bet at a bookmaker. We take an applied statistician's approach to the problem, suggesting a Bayesian dynamic generalized linear model to estimate the time-dependent skills of all teams in a league, and to predict the next week-end's soccer matches. The problem is more intricate than it may appear at first glance, as we need to estimate the skills of all teams simultaneously as they are dependent. It is now possible to deal with such inference problems by using the Markov chain Monte Carlo iterative simulation technique. We show various applications of the proposed model based on the English Premier League and division 1 in 1997–1998: prediction with application to betting, retrospective analysis of the final ranking, the detection of surprising matches and how each team's properties vary during the season.
Article
A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers' odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams' performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
Article
Previous authors have rejected the Poisson model for association football scores in favour of the Negative Binomial. This paper, however, investigates the Poisson model further. Parameters representing the teams' inherent attacking and defensive strengths are incorporated and the most appropriate model is found from a hierarchy of models. Observed and expected frequencies of scores are compared and goodness-of-fit tests show that although there are some small systematic differences, an independent Poisson model gives a reasonably accurate description of football scores. Improvements can be achieved by the use of a bivariate Poisson model with a correlation between scores of 0.2.
Book
The second edition of this popular book presents a detailed economic analysis of professional football at club level, with new material included to reflect the development of the economics of professional football over the past ten years. Using a combination of economic reasoning and statistical and econometric analysis, the authors build upon the successes and strengths of the first edition to guide readers through the economic complexities and peculiarities of English club football. It uses a wide range of international comparisons to help emphasize both the broader relevance as well as the unique characteristics of the English experience. Topics covered include some of the most hotly debated issues currently surrounding professional football, including player salaries, the effects of management on team performance, betting on football, racial discrimination and the performance of football referees. This edition also features new chapters on the economics of international football, including the World Cup.
Article
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Article
Professional advice is available in several forecasting contexts, such as share prices, sales and the weather. English newspaper tipsters offer professional advice on the outcomes of English and Scottish football (soccer) matches. Such advice could potentially inform selections of bettors in fixed odds or pools betting. This paper investigates the effectiveness of the guidance given by newspaper tipsters. Employing a sample of three tipsters and 1694 English league games, we find that tipster success rates are higher than would follow from random forecasting methods. We identify some differences between the processes by which actual results and tipster forecasts are determined. Likelihood-ratio tests imply that the tipsters fail adequately to utilise easily obtainable public information on teams’ strength. Further tests show that only one of three tipsters appears to make successful use of other unspecified information relevant to game outcomes. A consensus forecast across the three tipsters appears to outperform any single tipster.
Article
In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
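The estimator described here, which adds a small positive quantity k to the diagonal of X′X, can be written as (X′X + kI)⁻¹X′y. The following is a minimal sketch for the special case of two predictors without an intercept, solved in closed form; the function name and the toy data are assumptions made for illustration.

```python
def ridge_two_predictors(X, y, k):
    """Ridge estimate (X'X + k I)^{-1} X'y for two predictors, no intercept.

    X is a list of (x1, x2) rows, y a list of responses, k >= 0 the ridge
    constant. With k = 0 this reduces to ordinary least squares.
    """
    s11 = sum(a * a for a, _ in X) + k   # (X'X)[0][0] + k
    s22 = sum(b * b for _, b in X) + k   # (X'X)[1][1] + k
    s12 = sum(a * b for a, b in X)       # off-diagonal element of X'X
    t1 = sum(a * yi for (a, _), yi in zip(X, y))  # (X'y)[0]
    t2 = sum(b * yi for (_, b), yi in zip(X, y))  # (X'y)[1]
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Toy data with two strongly correlated (non-orthogonal) predictors.
X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
y = [a + b for a, b in X]

# A crude "ridge trace": coefficient estimates for increasing k.
for k in (0.0, 0.1, 1.0, 10.0):
    print(k, ridge_two_predictors(X, y, k))
```

Printing the coefficients over a grid of k values mimics the ridge trace mentioned in the abstract: the estimates shrink toward zero as k grows.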
Article
Different methods for assessing the abilities of participants in a sports tournament, and their corresponding winning probabilities for the tournament, are embedded in a common framework and their predictive performances compared. First, ratings of abilities (such as the Elo rating) are complemented with a simulation approach which yields winning probabilities for the full tournament. Second, tournament winning probabilities are extracted from bookmakers' odds using a consensus model, and the underlying abilities of the competitors are then derived by an "inverse" application of the tournament simulation. Both techniques are employed for forecasting the results of the European football championship 2008 (UEFA EURO 2008), for which the consensus model based on bookmakers' odds outperforms methods based on both the Elo rating and the FIFA/Coca Cola World rating. Moreover, the bookmaker consensus model correctly predicts that the final will be played by the teams from Germany and Spain (with a probability of about 20.5%), while showing that both finalists profit from being drawn in groups with relatively weak competitors.
Article
The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
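In its commonly used form, the large-sample criterion that results from this argument is BIC = −2 log L + p log n, where L is the maximized likelihood, p the number of parameters and n the sample size; the model with the smallest value is preferred. A minimal sketch follows; the helper names and the candidate values are hypothetical.

```python
from math import log


def bic(log_likelihood, n_params, n_obs):
    """Schwarz criterion in its usual form: -2 log L + p log n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the name of the candidate with the smallest BIC.

    candidates maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical fits: the larger model gains little likelihood for four extra parameters.
candidates = {"small": (-120.0, 2), "large": (-119.5, 6)}
print(select_by_bic(candidates, n_obs=100))
```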
Article
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods. Copyright 2006 Royal Statistical Society.
Article
The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences. Copyright 2008 Royal Statistical Society.
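The group lasso described in the two abstracts above penalizes the Euclidean norm of each predefined coefficient group, so that a whole group is either kept or set to zero. The sketch below assumes the common weighting by the square root of the group size and shows the blockwise shrinkage step that applies in the simplified case of an orthonormal design (it also serves as the building block of proximal gradient methods). The function names are hypothetical, and this is not the algorithm of either cited paper.

```python
from math import sqrt


def group_norm(beta_g):
    """Euclidean norm of one coefficient group."""
    return sqrt(sum(b * b for b in beta_g))


def group_lasso_penalty(groups, lam):
    """lam * sum over groups of sqrt(group size) * ||beta_g||_2."""
    return lam * sum(sqrt(len(g)) * group_norm(g) for g in groups)


def group_soft_threshold(z_g, threshold):
    """Shrink a whole group toward zero; drop it entirely if its norm is small.

    Returns max(0, 1 - threshold / ||z_g||) * z_g.
    """
    norm = group_norm(z_g)
    if norm <= threshold:
        return [0.0] * len(z_g)
    scale = 1.0 - threshold / norm
    return [scale * z for z in z_g]


# A strong group is shrunk but kept; a weak group is removed as a whole.
print(group_soft_threshold([3.0, 4.0], 2.5))
print(group_soft_threshold([0.3, 0.4], 1.0))
```

Because the threshold acts on the norm of the group rather than on single coefficients, selection happens at the factor level, which is the property both abstracts emphasize.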
Lloyd's. 2014. "FIFA World Cup: How Much Are Those Legs Worth?" Accessed February 16, 2015. http://www.lloyds.com/news-andinsight/news-and-features/market-news/industry-news-2014/fifa-world-cup-how-much-are-those-leg-worth.
Silver, N. 2014. "It's Brazil's World Cup to Lose." Accessed February 18, 2015. http://fivethirtyeight.com/features/its-brazils-worldcup-to-lose/.
Stoy, V., R. Frankenberger, D. Buhr, L. Haug, B. Springer, and J. Schmid. 2010. "Das Ganze ist mehr als die Summe seiner Lichtgestalten. Eine ganzheitliche Analyse der Erfolgschancen bei der Fußballweltmeisterschaft 2010." Working Paper 46, Eberhard Karls University, Tübingen, Germany.
Goldman-Sachs Global Investment Research. 2014. "The World Cup and Economics 2014." Accessed February 23, 2015. http://www.goldmansachs.com/our-thinking/outlook/world-cup-and-economics-2014-folder/world-cup-economics-report.pdf.
Leitner, C., A. Zeileis, and K. Hornik. 2010b. "Forecasting the Winner of the FIFA World Cup 2010." Report Series / Department of Statistics and Mathematics, 100. Institute for Statistics and Mathematics, WU Vienna.
Zeileis, A., C. Leitner, and K. Hornik. 2014. "Home Victory for Brazil in the 2014 FIFA World Cup." Working paper, Faculty of Economics and Statistics, University of Innsbruck.