
Abstract and Figures

The paper presents a plus-minus rating for use in association football (soccer). We first describe the standard plus-minus methodology as used in basketball and ice-hockey and then adapt it for use in soccer. The usual goal-differential plus-minus is considered before two variations are proposed. For the first variation, we present a methodology to calculate an expected goals plus-minus rating. The second variation makes use of in-play probabilities of match outcome to evaluate an expected points plus-minus rating. We use the ratings to examine who are the best players in European football, and demonstrate how the players' ratings evolve over time. Finally, we shed light on the debate regarding which is the strongest league. The model suggests the English Premier League is the strongest, with the German Bundesliga a close runner-up.
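As a rough illustration of the goal-differential idea only, the sketch below computes a raw, unadjusted plus-minus per 90 minutes: goals scored minus goals conceded while a player is on the pitch, scaled by playing time. It is not the paper's rating and does not account for teammates or opponents. The `Stint` record and the function name are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stint:
    """One spell on the pitch for a single player (hypothetical record)."""
    minutes: float       # time the player spent on the pitch
    goals_for: int       # goals the player's team scored during the stint
    goals_against: int   # goals the player's team conceded during the stint


def raw_plus_minus_per90(stints: Iterable[Stint]) -> float:
    """Raw goal-differential plus-minus, normalised to 90 minutes."""
    stints = list(stints)
    total_minutes = sum(s.minutes for s in stints)
    if total_minutes == 0:
        return 0.0  # no playing time, so no rating can be formed
    goal_diff = sum(s.goals_for - s.goals_against for s in stints)
    return 90.0 * goal_diff / total_minutes


if __name__ == "__main__":
    season = [Stint(90, 2, 0), Stint(90, 1, 1)]
    print(raw_plus_minus_per90(season))
```

A raw figure like this mostly reflects the strength of the player's team, which is why plus-minus ratings are usually adjusted for who else is on the pitch.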
... State-of-the-art models in soccer analytics focus on several aspects, such as evaluating actions, players, and strategies. The plus-minus method is an early approach to player evaluation proposed by Kharrat et al. [13]. It assigns a plus for each goal scored and a minus for each goal conceded while a player is on the pitch, normalised by the player's total time on the pitch. ...
Preprint
Soccer is a sparse-reward game: any smart or careless action in a critical situation can change the result of the match. Players, coaches, and scouts are therefore all interested in the best action to perform in critical situations, such as moments with a high probability of losing possession or scoring a goal. This work proposes a new state representation for the soccer game and a batch reinforcement learning method to train a policy network. The network takes the contextual information of the situation and proposes the optimal action to maximize the expected goals for the team. We performed extensive numerical experiments on the soccer logs made by InStat for 104 European soccer matches. The results show that in all 104 games, the optimized policy obtains higher rewards than its counterpart in the behavior policy. In addition, our framework learns policies that are close to the expected behavior in the real world. For instance, in the optimized policy, we observe that some actions, such as a foul or putting the ball out of play, can sometimes be more rewarding than a shot in specific situations.
... Nevertheless, there is now growing interest in evaluating the performance of handball teams, which involves establishing an impartial method of assessment. However, there are very few references in the literature compared to other sports [7], including basketball, ice hockey or football [8]. These assessments are often based on expert opinions, and they do not always agree on the importance of the chosen criteria. ...
Article
Handball experts agree that the most crucial position in a handball match is that of the goalkeeper. Their performance can be a good predictor of a team’s ranking in tournaments. Despite this, few studies have been conducted on the relevance of each action by elite goalkeepers to their performance in the match. This paper provides the features or criteria for objectively evaluating a handball goalkeeper based on their actions during a match. For this purpose, the feature-weighting problem is formulated as an optimization problem. The problem is solved using eight metaheuristic algorithms to adjust the weights of the features. Computer experiments using real data from the 2020 Women’s and Men’s European Handball Championships are carried out with these algorithms. The algorithms optimize the weights based on three metrics. The first metric is to identify the best goalkeeper; the second metric is to identify the top five goalkeepers, regardless of order; and the third metric is to identify and order the top five goalkeepers. A case study is carried out with real data from the 2021 Women’s and Men’s World Handball Championships, where the best goalkeeper found in both cases with the optimized weights coincides with the best goalkeeper chosen by the International Handball Federation (IHF). Finally, the paper shows the particularities and specific difficulties involved in evaluating handball goalkeepers.
... Current use cases for soccer data focus on evaluating player performance. Examples include the Plus-Minus rating [10], which represents the weighted sum of players' contributions to goals scored and conceded in a game, and VAEP [4], which evaluates player actions by calculating the probability that an action leads to a goal in the short term. ...
Chapter
Decision-making is one of the crucial factors in soccer (association football). The current focus is on analyzing data sets rather than posing “what if” questions about the game. We propose simulation-based methods that allow us to answer these questions. To avoid simulating complex human physics and ball interactions, we use data to build machine learning models that form the basis of an event-based soccer simulator. This simulator is compatible with the OpenAI GYM API. We introduce tools that allow us to explore and gather insights about soccer, like (1) calculating the risk/reward ratios for sequences of actions, (2) manually defining playing criteria, and (3) discovering strategies through Reinforcement Learning.
Article
During the last few years, sports analytics has been growing rapidly. The main use of this discipline is the prediction of soccer match results, although it can also be applied with interesting results in other areas, such as analysis based on player position information. In this paper, we propose an approach that aims to recognize the player position in a soccer match, predicting the specific zone in which the player is located at a specific moment. To the best of our knowledge, similar objectives have not been considered before. We apply supervised machine learning techniques to a dataset obtained through a video capturing and tracking system. The data analyzed refer to several professional soccer games captured at the Alfheim Stadium in Tromso, Norway. The approach can be used in real time to verify whether a player is playing according to the guidelines of the coach. In the experimental analysis, three different types of classification have been performed, i.e., three different divisions of the field, with the best results obtained by the Random Tree algorithm.
Article
To explore scientifically an effective path for strength training of basketball players and to improve its effect, this paper takes young basketball players as the research object and observes the changes and improvement in strength by building a strength training monitoring system for basketball players. On this basis, it proposes integrating blood flow restriction with basketball players’ special strength training. The method is compared with traditional resistance strength training over 8 weeks. The athletes’ strength test indicators show that the average bench press 3RM is 65.2 kg in experimental group 1, 65.7 kg in experimental group 2, and 72.2 kg in experimental group 3, against an average of 55.4 kg in the traditional control group. Compared with the traditional group, the average bench press performance of the three experimental groups was significantly improved, which also supports the feasibility of this method in strength training.
Article
Market inefficiencies, known as the Moneyball effect, have recently been documented in different sports, and their scope largely remains an empirical question. This article focuses on football, where fans and club managers seem to value forwards more than defenders. Football rules appear to imply equally important roles for goals scored and goals conceded in a team win. Economic theory in this case suggests that marginal returns on offensive and defensive efforts should be equal. This prediction can potentially be violated, resulting in labour market inefficiency. To test this hypothesis, we use two separate data sets at team-game and player-season levels (1224 and 772 observations, respectively) from two seasons (2017/18–2018/19) of the German Bundesliga. We compare the relative contribution of offensive and defensive actions to a team win with their relative contribution to players’ market value and show that the market undervalues defensive actions relative to offensive ones.
Article
The paper presents a model for estimating the transfer fees of professional footballers. We seek to improve on the literature in two dimensions. First, we utilise advanced player performance metrics to better capture the playing ability of footballers. Second, we adopt machine learning algorithms to improve out-of-sample prediction accuracy. The model proves to be a considerable improvement on linear regression, and the advanced performance metrics further improve the predictions. We use the model to identify value-for-money transfers, before assessing the past records of clubs in identifying value-for-money. We find that Liverpool and Atlético Madrid, for example, are successful at identifying value-for-money, whilst Manchester United and Barcelona are not.
Article
This paper examines whether workers are rewarded for inconsistent performance by salary premia. Some earlier research suggests that performance inconsistency leads to salary premia, while other research finds premia for consistent performance. Using detailed salary and performance data for top‐level footballers in Italy’s Serie A, we find that inconsistency is penalized for some important dimensions of basic performance measures associated with key skills of players, specifically clearances, aerial duels won, and shots on target.
Article
Identifying match events that are related to match outcome is an important task in football match analysis. Here we have used generalised mixed linear modelling to determine relationships of 16 football match events and 1 contextual variable (game location: home/away) with the match outcome. Statistics of 320 close matches (goal difference ≤ 2) of season 2012-2013 in the Spanish First Division Professional Football League were analysed. Relationships were evaluated with magnitude-based inferences and were expressed as extra matches won or lost per 10 close matches for an increase of two within-team or between-team standard deviations (SD) of the match event (representing effects of changes in team values from match to match and of differences between average team values, respectively). There was a moderate positive within-team effect from shots on target (3.4 extra wins per 10 matches; 99% confidence limits ±1.0), and a small positive within-team effect from total shots (1.7 extra wins; ±1.0). Effects of most other match events were related to ball possession, which had a small negative within-team effect (1.2 extra losses; ±1.0) but a small positive between-team effect (1.7 extra wins; ±1.4). Game location showed a small positive within-team effect (1.9 extra wins; ±0.9). In analyses of nine combinations of team and opposition end-of-season rank (classified as high, medium, low), almost all between-team effects were unclear, while within-team effects varied depending on the strength of team and opposition. Some of these findings will be useful to coaches and performance analysts when planning training sessions and match tactics.
Article
The paper presents a forecasting model for association football scores. The model uses a Weibull-inter-arrival times based count process and a copula to produce a bivariate distribution for the number of goals scored by the home and away teams in a match. We test it against a variety of alternatives, including the simpler Poisson distribution-based model and an independent version of our model. The out-of-sample performance of our methodology is illustrated first using calibration curves and then in a Kelly-type betting strategy that is applied to the pre-match win/draw/loss market and to the over-under 2.5 goals market. The new model provides an improved fit to data compared to previous models and results in positive returns to betting.
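For orientation, the simpler Poisson-based alternative mentioned above can be sketched as follows. This is a minimal illustration that assumes independent Poisson goal counts for the two teams; it is not the Weibull count process with a copula that the paper proposes. The function names, the scoring rates in the usage example, and the truncation at `max_goals` are all assumptions made here.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float,
                          max_goals: int = 10) -> tuple[float, float, float]:
    """Home-win, draw and away-win probabilities from independent Poissons.

    Scorelines above max_goals per team are ignored, so the three values
    sum to slightly less than one.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    # Hypothetical expected goals per match for the home and away teams.
    print(outcome_probabilities(1.5, 1.1))
```

Summing over scorelines with more than 2.5 total goals in the same double loop would give the over-under probability in the same way.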
Conference Paper
Quantitative evaluation of the ability of soccer players to contribute to team offensive performance is typically based on goals scored, assists made, and shots taken. In this paper, we describe a novel player ranking system based entirely on the value of passes completed. This value is derived based on the relationship of pass locations in a possession and shot opportunities generated. This relationship is learned by applying a supervised machine learning model to pass locations in event data from the 2012-2013 La Liga season. Interestingly, though this metric is based entirely on passes, the derived player rankings are largely consistent with general perceptions of offensive ability, e.g., Messi and Ronaldo are near the top. Additionally, when used to rank midfielders, it separates the more offensively-minded players from others.
Article
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
Traditionally, most football statistical and media coverage has focused almost exclusively on goals and (occasionally) shots. However, most of the duration of a football game is spent away from the boxes, passing the ball around. The way teams pass the ball around is the most characteristic measurement of what a team's "unique style" is. In the present work we analyse passing sequences at the player level, using the different passing frequencies as a "digital fingerprint" of a player's style. The resulting numbers provide an adequate feature set which can be used to construct a measure of similarity between players. Armed with such a similarity tool, one can try to answer the question: Who might possibly replace Xavi at FC Barcelona?
Article
Passing the ball is one of the key skills of a football player yet the metrics commonly used to evaluate passing ability are crude and largely limited to various forms of a pass completion rate. These metrics can be misleading for two general reasons: they do not account for the difficulty of the attempted pass nor the various levels of uncertainty involved in empirical observations based on different numbers of passes per player. We address both these deficiencies by building a statistical model in which the success of a pass depends on the skill of the executing player as well as other factors including the origin and destination of the pass, the skill of his teammates and the opponents, and proxies for the defensive pressure put on the executing player as well as random chance. We fit the model by using data from the 2006–2007 season of the English Premier League provided by Opta, estimate each player's passing skill and make predictions for the next season. The model predictions considerably outperform a naive method of simply using the previous season's completion rate as a predictor of the following season's completion rate. In particular, we show how a change in the difficulty of passes attempted in both seasons explains a significant proportion of the shift in the observed performance of some players—a fact that is ignored if the raw completion rate is used to evaluate player skill.