A football player rating system

Journal of Sports Analytics 6 (2020) 243–257
DOI 10.3233/JSA-200411
IOS Press
Stephan Wolf^{a,*}, Maximilian Schmitt^{a} and Björn Schuller^{a,b}
^{a} Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
^{b} GLAM Group on Language, Audio & Music, Imperial College London, London, UK
Abstract. Association football (soccer) is the most popular sport in the world, resulting in a large economic interest from
investors, team managers, and betting agencies. For this reason, a vast number of rating systems exists to assess the strength
of football teams or individual players. Nevertheless, most of the existing approaches incorporate deficiencies, e.g., that
they depend on subjective ratings from experts. The objective of this work was the development of a new rating system
for determining the playing strength of football players. The Elo algorithm, which has established itself as an objective and
adaptive rating system in numerous individual sports, has been expanded in accordance with the requirements of team sports.
Matches from 16 different European domestic leagues, the UEFA Champions and Europa Leagues have been recorded,
with more than 17 000 matches played in recent years, and 12 400 different players. The developed rating system produced
promising results, when evaluating the matches based on its predictions. A high relevance of the created system results from
the fact that only the associated match report is needed and thus—in relation to existing valuation models—significantly
more football players can be assessed.
Keywords: Football, soccer, rating system, Elo algorithm, player performance
1. Introduction
Association football (soccer) is the most popular sport in the world (Worldatlas 2018, TotalSportek 2019). According to the World Football Association FIFA (2018a), more than half of the global population watched coverage of the 2018 World Cup. This popularity generates correspondingly great interest: fans and journalists watch the matches of all professional teams and analyse and debate the performance of each player. Questions like “Is this or that player the better footballer?” or “Who is the best player ever?” enjoy great popularity among fans as well as in the media. To answer these questions, there are various assessment models, which can be roughly divided into two categories (Stefani & Pollard 2007). In subjective models, experts rate the performances of the teams or athletes; in football, journalists usually rate the performances of individual players. Peeters (2018) showed that ratings based on the valuations obtained from crowdsourcing platforms are superior to official or Elo team rating schemes. In contrast, an objective approach to assessing player performance is based on player statistics. In each game, a lot of event data is collected for each player. From the number of ball contacts, the distance travelled, the pass and tackle success rates as well as various other statistics, an algorithm determines how good or bad the performance of the player concerned has been. For example, Paixão et al. (2015) showed that UEFA Champions League semi-finalists using shorter passing sequences had a higher chance of winning the match. Very often, a machine learning algorithm, i.e., one that learns a black-box model from a given training set of recorded data, is employed. The model determines how good or bad the performance of the player concerned has been or makes predictions for the performance in future matches. Aslan & Inceoglu (2007) emphasise the importance of input parameter selection for training a robust model. The data of one season of the Italian Serie A are used for the evaluation. The model is trained with the data from the first half of the season, ignoring the first 6 game days because “random factors play significant role at the start of the season and can distort the training procedure”.
*Corresponding author: Stephan Wolf, Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Eichleitnerstraße 30, Augsburg, Germany. E-mail: stephan wolf94@gmail.com.
ISSN 2215-020X/20/$35.00 © 2020 IOS Press and the authors. All rights reserved
This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC 4.0).
Arndt & Brefeld (2016) predict future perfor-
mances of players from the German Bundesliga based
on more than 1 000 features extracted from event
feeds, including information about duels and passes.
However, the authors state that data for individual
players is often too sparse for learning individual
player models. Also Pappalardo et al. (2019) propose
a metric to rank performances of individual foot-
ball players based on events and a machine learning
approach.
Kempe et al. (2018) introduced the usage of
pass accuracy and effectiveness to rate the tactical
behaviour of individual players. The authors pur-
sue the use case of scouting, but they state as well
that their approach requires real-time player and ball
tracking systems. A hybrid approach based on event
data and tracking systems has been proposed by
Decroos et al. (2019). A language to represent individual
actions by football players is presented, where
each player action receives a label indicating how
much it affects (positively or negatively) the match
score. Here too, the authors stress the problem that
real-time tracking data is required, which is avail-
able only for teams playing in top leagues. Thus, an
application of such approaches for scouting is ques-
tionable.
Finally, Hossain et al. (2017) use wearables worn during football matches and deep learning techniques to analyse events such as dribbles and passes in order to evaluate the ability of a player. They consider their system mainly an additional rating source for coaches, besides the subjective visual analysis of the match. A use for scouting is also conceivable, but this would require large-scale deployment based on players’ consent and would raise privacy protection issues.
Using a combined approach that considers both
subjective journalist opinions and objective game
features, Egidi & Gabry (2018) created a hier-
archical model for predicting individual player
performances.
In both subjective and objective methods, players can accumulate ratings monotonically over a period of time. Such cumulative procedures have the disadvantage that the ratings cannot be adjusted to current results; only an average of past results is determined. Another criticism of these models is that no cross-comparison between players from different leagues is possible, because the different competitions do not have the same level of play and the teams partly use different philosophies of play.
Football is a complex sport: a player’s performance depends not only on physical, technical and tactical skills, but also on psychological and mental components (Beswick 2010). Assessing a player’s strength based on subjective impressions or cumulative statistics therefore turns out to be difficult, if not impossible. The goal with which each team and each player enters a football game, on the other hand, is remarkably simple and can be described in one word: winning. Taking into account the results of a player, i.e., how often he or she reaches the goal of winning, the level at which the player competes, and the strengths of his or her teammates and opponents, one obtains an objective rating that describes the strength of the player and is comparable with the ratings of other players.
Comparing the results a team produces with and
without a certain player, one can determine the impact
of the player on the performance of the team. Kharrat
et al. (2017) adapted the plus-minus model, which
usually measures a player’s impact on the game in
team sports like ice hockey or basketball, for use in
football. R. Schultze & Wellbrock (2018) followed
a similar approach in their studies and evaluated the
importance of football players for their team based on
a weighted plus-minus metric. The EA Sports Play-
ers Performance Index (Actim Index) is the official
player rating system of the English Premier League
and awards points for good stats as well as for positive
team results (McHale et al. 2014).
Elo-based ranking systems for football have, to the
best of the authors’ knowledge, been proposed only
for teams and not for individual players. For
example, Sullivan & Cronin (2016) introduce a system
to predict the outcomes of matches of the English
Premier League. Hubáček et al. (2019) evaluate four
different models for football match result prediction
on a large dataset of 52 leagues. They employ only
the final scores as input to their models and find that
the Elo model is slightly outperformed by a double
Poisson model and pi-ratings, while it performs better
than a graph-based model.
The contribution of this article is as follows: In contrast to previous research, we propose an Elo-based rating system for individual football players. The input of the system is solely the information found in the official match reports, i.e., the goals scored by each team and the lineups with substitutions, including the respective minutes of play. Thus, the system also takes into account intermediate results of the matches, but requires much less data than the event-based rating systems introduced in the past. After each match played, the rating of each player involved is adjusted. If the player has exceeded the expectations placed on him before the game, his rating is increased; if he could not meet the expectations, it is decreased. Figure 1 illustrates the basic structure of the algorithm. The developed system is able to: (a) rank the playing strengths of football players, (b) rank the playing strengths of football teams, (c) predict the outcomes of future football matches, and (d) identify new talents in football. The advantage of our approach compared to Elo ratings of whole teams is that our model also works across seasons, between which player transfers take place, without any loss of accuracy. As the required input data is available for all divisions of football leagues, it can also be deployed for scouting purposes.

Fig. 1. Basic idea of the algorithm: The illustration shows how the old rating $R_{A_i}$ of a player $A_i$ from team $A$ is updated after each game played. The expectation placed on the player before the game results from his rating and the ratings of his own and the opposing team. If the actual result is better than expected, this leads to a positive change. If the player does not meet expectations, the change is negative. After multiplying by a weight, the change and the old rating result in the new rating. If expectations are exceeded, the player’s rating improves. The more unexpected the result, the greater the difference between the old and the new rating.

The rest of the article is organised as follows.
Section 2 introduces the Elo algorithm and its appli-
cations in individual sports. Section 3 first explains
the Elo-based football player rating system and then
introduces the database used in the study. The results
are presented and evaluated in Section 4. Then, in
Section 5, the potentials and limitations of the sys-
tem are discussed. The final section concludes the
article.
2. The Elo-Algorithm
In various sports, rating systems that adapt the rating of the players after each match, based on their results and taking into account the relative skill level, have been established for some time. The American statistician Arpad Emrick Elo (1903-1992) developed the Elo algorithm as early as 1960; since 1970 it has served the world chess federation FIDE for the calculation of FIDE ratings, objective scores that indicate the playing strength of chess players. Each chess player has an Elo rating (= FIDE rating) $R$, whose value correlates positively with his playing strength. The best players in the world have a rating of over 2 800. Most amateur players are rated between 1 200 and 2 000 points.
If a player is not yet registered in the system, his Elo rating is initialised based on the level of the competition in which he completes his first rated match. After each game played, the rating of the player is updated according to his result, taking into account the relative playing strength of the opponent. The algorithm for calculating the new rating is described by Elo (1978) as follows:
$E_A$ indicates the expected chance of success of player $A$ with Elo rating $R_A$ in the duel against player $B$ with Elo rating $R_B$. This value varies between 0 and 1 and depends on the Elo ratings of the two actors involved.

$$E_A = \frac{1}{1 + 10^{\frac{R_B - R_A}{400}}}. \tag{1}$$

The actual result that player $A$ achieves is represented by $S_A$.

$$S_A = \begin{cases} 1, & A \text{ wins} \\ 0.5, & \text{draw} \\ 0, & A \text{ loses.} \end{cases} \tag{2}$$
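
As a minimal sketch of Formula 1, the expectation can be computed in a few lines of Python. The function name `expected_score` and the ratings in the usage lines are illustrative assumptions, not part of the original work.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected chance of success of A against B (Formula 1)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# Illustrative ratings (not taken from the paper).
e_equal = expected_score(1500, 1500)   # two equally rated players
e_strong = expected_score(1900, 1500)  # favourite, 400 points ahead
e_weak = expected_score(1500, 1900)    # underdog, 400 points behind
print(e_equal, round(e_strong, 3), round(e_weak, 3))
```

Equal ratings yield an expectation of 0.5, and the two expectations of a duel always add up to 1, which is what allows points to be redistributed between the opponents.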
The difference between the result $S_A$ and the expectation $E_A$ yields, after multiplication by a factor $k_A$, the value of the rating change $C_A$. $C_A$ is positive if the player has exceeded expectations, resulting in an increase of his Elo rating.

$$C_A = (S_A - E_A) \cdot k_A. \tag{3}$$
The weight $k_A$ is a positive number between 10 and 40 and depends on the playing strength, the age, and the number of games played by $A$ so far. For example, if $A$ has completed only a few rated games, his Elo rating is still inaccurate; for this reason, $k_A$ is set to a high value and his rating is thus aligned more closely with current results. On the other hand, the Elo value of a world-class player can be specified more precisely because he has already participated in many competitions and more games are added regularly. Therefore, $k_A$ is lower for a player with a high playing strength. FIDE (2017) has set the following rules for the k-value:

$$k_A = \begin{cases} 40, & A \text{ has completed fewer than 30 games or } A \text{ is younger than 18 years (as long as } R_A \text{ remains under 2 300)} \\ 20, & R_A \text{ was always under 2 400} \\ 10, & A \text{ has completed at least 30 games and } R_A \text{ is / was at least 2 400.} \end{cases} \tag{4}$$
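
The case distinction of Formula 4 might be sketched as follows. The function name `fide_k_factor` and its parameters are hypothetical. The sketch additionally assumes that the condition in parentheses (rating under 2 300) applies only to the age criterion, and that the cases are checked from top to bottom; the formula as printed does not state either point explicitly.

```python
def fide_k_factor(games_played: int, age: int, rating: float, peak_rating: float) -> int:
    """Sketch of the k-value rules of Formula 4.

    Assumption: the "under 2 300" condition applies to the age criterion only,
    and the cases are evaluated in the order given in the formula.
    """
    if games_played < 30:
        return 40  # rating still inaccurate: few rated games so far
    if age < 18 and rating < 2300:
        return 40  # young player below 2 300
    if peak_rating < 2400:
        return 20  # rating was always under 2 400
    return 10      # at least 30 games and rating is / was at least 2 400


# Illustrative players (made-up values).
k_newcomer = fide_k_factor(games_played=10, age=25, rating=1500, peak_rating=1500)
k_junior = fide_k_factor(games_played=100, age=16, rating=2000, peak_rating=2000)
k_club = fide_k_factor(games_played=100, age=30, rating=2200, peak_rating=2300)
k_elite = fide_k_factor(games_played=500, age=30, rating=2350, peak_rating=2450)
print(k_newcomer, k_junior, k_club, k_elite)
```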
The new Elo rating $R'_A$ that is assigned to player $A$ after the game is the sum of his old Elo rating and the change $C_A$.

$$R'_A = R_A + C_A. \tag{5}$$
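
Taken together, Formulas 1, 3 and 5 form a single update step. The following sketch, with the hypothetical function name `elo_update` and made-up ratings, illustrates this for one player.

```python
def elo_update(r_a: float, r_b: float, s_a: float, k_a: float) -> float:
    """Return the new rating of A after one game against B.

    s_a is the actual result of A (1, 0.5 or 0, Formula 2).
    """
    e_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))  # Formula 1
    c_a = (s_a - e_a) * k_a                            # Formula 3
    return r_a + c_a                                   # Formula 5


# Two equally rated players (illustrative values), k = 20.
after_win = elo_update(1500, 1500, 1.0, 20)
after_draw = elo_update(1500, 1500, 0.5, 20)
after_loss = elo_update(1500, 1500, 0.0, 20)
print(after_win, after_draw, after_loss)
```

With equal ratings the expectation is 0.5, so a win moves the rating up by half of $k_A$, a loss moves it down by the same amount, and a draw leaves it unchanged.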
The Elo system is used not only in chess but also in
other individual sports. These variants, in which only
details have been modified, are all based on the basic
principle that the points are redistributed depending
on the expectation and result values. Elo’s rating sys-
tem can be used in all competitions in which two
parties compete directly against each other. In team
sports, however, it makes no sense to transfer the pre-
sented algorithm directly to individual players. Since
the players usually have different statistics regard-
ing goal difference (goals scored by the own team
while the player was on the pitch minus goals scored
by the opponent while the player was on the pitch)
and minutes on the pitch, the algorithm must be fundamentally
revised. However, if the whole team is considered
as a single party, Elo’s approach can be applied to
classify teams according to their skill level.
FIFA uses the algorithm described above in a slightly modified form to compile the FIFA World Ranking, which assesses the national teams’ strength. For women, the rankings have been compiled in this way since their introduction in 2003 (FIFA 2020a). Lasek et al. (2013) found that an Elo-based system for team performance prediction works best and outperforms the one used by FIFA for Men’s National Teams at that time; the same claim has been made by Gásquez & Royuela (2016). The FIFA / Coca-Cola World Ranking for Men’s National Teams has been based on the Elo calculation method only since August 2018 (FIFA 2020b). According to FIFA (2018b), the main objective in the introduction of the new procedure was “to identify an algorithm that is not only intuitive, easy to understand and improves overall accuracy of the formula, but also addresses feedback received about the previous model and provides fair and equal opportunities for all teams”.
ClubElo (2014) evaluates the performance of teams at European club level using a similar Elo-based method. A total of almost 600 000 games from 33 different competitions and teams from 54 different countries have been recorded so far. With so much data on club teams, this site provides a good foundation for a rating system that uses a modified version of the Elo algorithm to determine the strength of individual footballers.
3. Football player rating system
3.1. Algorithm
Since in the following, each football player is considered individually and should be assigned a rating, the algorithm described above, which is suitable only for exactly two parties, must be modified and expanded accordingly. The following new variables are introduced:
  • $A_i$: A player of team $A$.
  • $R_{A_i}$: Rating of player $A_i$.
  • $R_A$ / $R_B$: Average rating of team $A$ / average rating of team $B$.
  • $E_{A_i}$: Individual expectation value of player $A_i$ for a given match; indicates the chances of success for the player and is calculated from $R_{A_i}$ and $R_B$.
  • $M_{A_i}$: Individual number of minutes played by $A_i$ in a given match; indicates how long the player was on the pitch.
  • $D_{A_i}$: Individual goal difference of $A_i$ for a given match; indicates the difference of the goals of the teams that were scored while the player was on the pitch.
The rating of a team $R_A$ indicates the strength of the lineup of club $A$. To compute it, the minutes at which the team’s substitutions took place must be known in addition to the starting 11. $R_A$ is calculated as the average of the ratings of the $n$ players, weighted by their time on the pitch.

$$R_A = \frac{\sum_{i=0}^{n-1} R_{A_i} \cdot M_{A_i}}{\sum_{i=0}^{n-1} M_{A_i}}. \tag{6}$$
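
Formula 6 is a minutes-weighted mean. A minimal sketch, with the hypothetical function name `team_rating` and made-up ratings and minutes, could look as follows.

```python
from typing import Sequence


def team_rating(ratings: Sequence[float], minutes: Sequence[float]) -> float:
    """Minutes-weighted average rating of a lineup (Formula 6)."""
    if len(ratings) != len(minutes):
        raise ValueError("ratings and minutes must have the same length")
    total_minutes = sum(minutes)
    if total_minutes <= 0:
        raise ValueError("total minutes must be positive")
    weighted_sum = sum(r * m for r, m in zip(ratings, minutes))
    return weighted_sum / total_minutes


# Illustrative two-player excerpt of a lineup: one player on the pitch for
# 90 minutes, one substitute for 45 minutes (values are made up).
r_excerpt = team_rating([4800, 4600], [90, 45])
print(round(r_excerpt, 1))
```

Because the weights are the minutes on the pitch, a late substitute influences the team rating far less than a player who completes the match.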
Depending on the rating of the player as well as the team rating of the opponent, the Elo formula calculates $E_{A_i}$, an individual expectation value for each player. $E_A$, the chance of success of the team, is determined from $R_A$ and $R_B$ as in Formula 1. The home team wins statistically more often than the away team (Pollard 1986, Trombley 2016) and therefore has a higher expectation value. Because of this home advantage ($H$), the home team’s rating and the ratings of all home team players are adjusted before the expected values are calculated; for a home team $A$, $R_A := R_A + H$.

$$E_{A_i} = \frac{1}{1 + 10^{\frac{R_B - R_{A_i}}{400}}}. \tag{7}$$
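
The following sketch shows one way to combine Formula 7 with the home advantage. The function name `player_expectation` and the `is_home` flag are assumptions made for illustration; the value of 75 points is the initial estimate of $H$ reported in Section 4.1.

```python
def player_expectation(r_player: float, r_opponent_team: float,
                       is_home: bool, home_advantage: float = 75.0) -> float:
    """Individual expectation value of a player (Formula 7).

    The home advantage H is added to the ratings of the home side before
    the expectation is computed: to the player if he plays at home,
    otherwise to the opposing (home) team.
    """
    if is_home:
        r_player += home_advantage
    else:
        r_opponent_team += home_advantage
    return 1.0 / (1.0 + 10.0 ** ((r_opponent_team - r_player) / 400.0))


# Illustrative values: a player rated 4 500 facing a team rated 4 500.
e_home = player_expectation(4500, 4500, is_home=True)
e_away = player_expectation(4500, 4500, is_home=False)
print(round(e_home, 3), round(e_away, 3))
```

For equal ratings the home player’s expectation lies above 0.5 and the away player’s expectation below it, and the two values add up to 1.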
Each player gets his own game score $S_{A_i}$ based on the goals scored while the player was on the pitch. If the team with the player on the pitch scores more goals than the opponent, the goal difference is positive ($D_{A_i} > 0$) and the game is counted as a win ($S_{A_i} = 1$). The same applies for balanced and negative goal differences (see Formula 2). The result of the team $S_A$ is determined in the same way.

$$S_{A_i} = \begin{cases} 1 & \text{for } D_{A_i} > 0 \\ 0.5 & \text{for } D_{A_i} = 0 \\ 0 & \text{for } D_{A_i} < 0. \end{cases} \tag{8}$$
Similar to Formula 3, an individual change $C_{A_i}$ is calculated for each player according to his result and expectation value. In the event of a win or a defeat, the goal difference $D_{A_i}$ is taken into account, thereby weighting the clarity of the game’s outcome. A clearer result leads to a larger point change; using the cube root of the goal difference ensures a balanced relation between the magnitude of the change and the clarity of the result. For example, in a win with 8 goals difference, the change is twice as large as in a win with 1 goal difference. If a match ends in a draw for a player, the minutes played are taken into account and the change is weighted less in the case of a short time on the pitch. The rationale is that a favoured team, which should emerge victorious at the end of a game, cannot be expected to score more goals than its opponent in every section of the game. $M_{\max}$ is the duration of the game and thus the maximum number of minutes a player can reach. The change is finally multiplied by $w$, the weight of the game. This value is usually 1, the neutral element of multiplication, and therefore has no effect; in important games $w$ is set higher, in unimportant matches correspondingly lower. For example, $w$ could be set higher in national team games of the World Cup than in qualifying matches, which in turn are more important than friendly matches. $C_A$, the change for the team, is calculated identically:

$$C_{A_i} = \begin{cases} w \cdot (S_{A_i} - E_{A_i}) \cdot \sqrt[3]{|D_{A_i}|}, & D_{A_i} \neq 0 \\ w \cdot (S_{A_i} - E_{A_i}) \cdot \frac{M_{A_i}}{M_{\max}}, & D_{A_i} = 0. \end{cases} \tag{9}$$
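
Formulas 8 and 9 can be sketched together in one function. The name `player_change`, the default of 90 minutes for $M_{\max}$, and the example values are assumptions made for illustration.

```python
def player_change(goal_diff: int, expectation: float, minutes: float,
                  max_minutes: float = 90.0, weight: float = 1.0) -> float:
    """Individual change C_Ai of a player for one match (Formulas 8 and 9)."""
    # Formula 8: game score from the individual goal difference.
    if goal_diff > 0:
        score = 1.0
    elif goal_diff == 0:
        score = 0.5
    else:
        score = 0.0
    # Formula 9: cube root of the margin for wins and defeats,
    # share of minutes played for draws.
    if goal_diff != 0:
        return weight * (score - expectation) * abs(goal_diff) ** (1.0 / 3.0)
    return weight * (score - expectation) * minutes / max_minutes


# Illustrative values with an expectation of 0.5.
c_narrow = player_change(goal_diff=1, expectation=0.5, minutes=90)
c_clear = player_change(goal_diff=8, expectation=0.5, minutes=90)
c_draw = player_change(goal_diff=0, expectation=0.6, minutes=45)
print(c_narrow, c_clear, c_draw)
```

The usage lines reproduce the example from the text: the change for a win by 8 goals is twice the change for a win by 1 goal, since the cube root of 8 is 2.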
The new rating $R'_{A_i}$ is calculated for each player from his previous rating, his minutes played, his personal result and the result of his team. The weighting of the individual change relative to the team change depends on the value $q_{A_i}$, which is between 0.5 and 1.0. As with Elo’s algorithm, the entire change is multiplied by an individual factor $k_{A_i}$. If a player has not completed the entire game, the team’s change is weighted correspondingly less, as the player had a smaller influence on the outcome of the team. An adaptation of the Elo rating taking into account the margin of the win has also been considered in the system by Sullivan & Cronin (2016).
In every game, the sum of the rating changes over all players is usually zero (different k-values can result in minimal deviations). The average player rating remains constant regardless of the number of games recorded.

$$R'_{A_i} = R_{A_i} + k_{A_i} \cdot \left( q_{A_i} \cdot C_{A_i} + (1 - q_{A_i}) \cdot C_A \cdot \frac{M_{A_i}}{M_{\max}} \right). \tag{10}$$
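
A minimal sketch of the update in Formula 10 is given below. It assumes that the factor $k_{A_i}$ multiplies the whole bracket, as stated in the text (“the entire change is multiplied by an individual factor”), and that $M_{\max}$ is 90 minutes. The function name `new_player_rating` and all example values are hypothetical.

```python
def new_player_rating(rating: float, k: float, q: float,
                      c_player: float, c_team: float,
                      minutes: float, max_minutes: float = 90.0) -> float:
    """New rating of a player after one match (Formula 10).

    q weights the individual change against the team change; the team
    change is scaled by the share of the match the player was on the pitch.
    """
    if not 0.5 <= q <= 1.0:
        raise ValueError("q is expected to lie between 0.5 and 1.0")
    team_part = (1.0 - q) * c_team * minutes / max_minutes
    return rating + k * (q * c_player + team_part)


# Illustrative values: a newly registered player (k = 40, q = 1) and an
# established player (k = 24, q = 0.5) with the same changes.
r_new = new_player_rating(4500, k=40, q=1.0, c_player=0.5, c_team=0.2, minutes=90)
r_established = new_player_rating(4500, k=24, q=0.5, c_player=0.5, c_team=0.5, minutes=90)
r_substitute = new_player_rating(4500, k=24, q=0.5, c_player=0.5, c_team=0.5, minutes=45)
print(r_new, r_established, r_substitute)
```

With $q_{A_i} = 1$ the team change has no effect at all, which matches the description that mainly the personal result drives the rating of a newly registered player.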
The player is responsible for the result of the team and the team is responsible for the result of the player; for this reason, both the personal and the team change are considered in the calculations. The following example should clarify this: Team $A$ wins 1:0, and player $A_x$ is substituted at the score of 0:0 by player $A_y$. So $A_x$ has a draw as individual result and player $A_y$ has a win. The fact that the team produced a better result with player $A_y$ does not necessarily have to be due to the performance of the players (it could, e.g., also have had tactical reasons); for this reason, the team score for both players is considered a win. Player $A_y$ has nevertheless achieved a better overall result than player $A_x$.
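
The individual goal difference $D_{A_i}$ and the minutes $M_{A_i}$ in this example can be derived from the match report alone. The sketch below uses a hypothetical data layout (a list of goal minutes with a flag for the scoring side) and made-up minutes: the goal is assumed to fall in minute 70 and the substitution in minute 60. How a goal scored in the exact minute of a substitution is attributed is not specified in the text; here it is credited to the player leaving the pitch.

```python
from typing import List, Tuple


def individual_result(goals: List[Tuple[int, bool]],
                      minute_on: int, minute_off: int) -> Tuple[int, int]:
    """Return (goal difference, minutes played) for one player.

    goals holds (minute, scored_by_own_team) pairs; only goals scored while
    the player was on the pitch are counted.
    """
    goal_diff = 0
    for minute, scored_by_own_team in goals:
        if minute_on < minute <= minute_off:
            goal_diff += 1 if scored_by_own_team else -1
    return goal_diff, minute_off - minute_on


# Team A wins 1:0 with a goal in minute 70 (assumed); A_x leaves the pitch
# in minute 60 (assumed) and is replaced by A_y.
match_goals = [(70, True)]
result_x = individual_result(match_goals, minute_on=0, minute_off=60)
result_y = individual_result(match_goals, minute_on=60, minute_off=90)
print(result_x, result_y)
```

Player $A_x$ ends with a goal difference of 0 (a draw) over 60 minutes, player $A_y$ with a goal difference of +1 (a win) over 30 minutes, while the team result is a win for both.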
For the algorithm to respond properly to a player’s individual situation, the individual $k$ and $q$ weights are important. If a player is newly registered in the system, his ability can initially only be estimated and may under certain circumstances differ significantly from that of his teammates. For this reason, a high value is chosen for both $k_{A_i}$ and $q_{A_i}$, which means that the rating is adjusted more closely to current results and mainly the personal result is responsible for the change.
With each completed game, the player’s rating becomes more accurate and more comparable with the ratings of his teammates; therefore, $k$ and $q$ are decremented after each game in which the player has played. If the player does not play for a long time due to injury or other reasons, the accuracy of his rating decreases, which is why $k_{A_i}$ and $q_{A_i}$ are increased in each match without $A_i$.
A special case arises if the player changes clubs: players of a team are usually on a similar rating level, but after a club change the rating of the player may differ significantly from that of his new teammates. In order for the rating to be adapted as quickly as possible to the new situation, the k-factor and the q-value of a player are set to the maximum each time he changes clubs. For example, if a player makes a career jump to a much better club, he has the opportunity to increase his rating faster and adapt it to the values of his new teammates, provided he produces the corresponding results. $k_{A_i}$ is at least 24 and at most 40; $q_{A_i}$ varies between 0.5 and 1 (for more details on the k- and q-values see Appendix A.1). Adaptation of the k-factor has also been employed by Sullivan & Cronin (2016), who found that a higher $k$ at the beginning of a season improves the quality of the match prediction.
The team rating is also updated after each match. The following applies: $R'_A = R_A + 20 \cdot C_A$.
3.2. Data
For the dataset of this work, football matches from the English Premier League, the Spanish La Liga, the French Ligue 1, the Italian Serie A, the German Bundesliga, the German 2. Bundesliga, the Turkish Süper Lig, the Russian Premier Liga, the Greek Super League, the Dutch Eredivisie, the Austrian Bundesliga, the Swiss Super League, the Portuguese Liga NOS, the Belgian Jupiler League, the Danish Superliga and the Ukrainian Premier-Liga as well as the UEFA (Union of European Football Associations) Champions League and the UEFA Europa League were recorded over a period of 4 years. The dataset contains a total of 17 086 matches from 18 different competitions, which were played between the first round of the season 14/15 and the last round of the season 17/18. Table 1 shows the number and the period of the recorded games for each league.
The data of the match reports originate from the website kicker.de. If the number of recorded games is less than the expected number, there may be several reasons: as a rule, a game could not be recorded because a complete match report with lineups, goals and substitutions was not available; in exceptional cases, the result was decided by the association or the game was cancelled.

Table 1
Recorded games per league in the database

| League | Area | Period of time | Recorded games |
|---|---|---|---|
| Premier League | England | 14/15 - 17/18 | 1520 |
| La Liga | Spain | 14/15 - 17/18 | 1520 |
| Ligue 1 | France | 14/15 - 17/18 | 1520 |
| Serie A | Italy | 14/15 - 17/18 | 1519 |
| Bundesliga | Germany | 14/15 - 17/18 | 1224 |
| 2. Bundesliga | Germany | 14/15 - 17/18 | 1222 |
| Süper Lig | Turkey | 14/15 - 17/18 | 1222 |
| Premier Liga | Russia | 14/15 - 17/18 | 960 |
| Super League | Greece | 15/15 - 17/18 | 956 |
| Eredivisie | Netherlands | 15/16 - 17/18 | 918 |
| Bundesliga | Austria | 14/15 - 17/18 | 719 |
| Super League | Switzerland | 14/15 - 17/18 | 719 |
| Liga NOS | Portugal | 16/17 - 17/18 | 612 |
| Jupiler League | Belgium | 16/17 - 17/18 | 479 |
| Superliga | Denmark | 15/16 - 17/18 | 388 |
| Premier-Liga A | Ukraine | 15/16 - 17/18 | 268 |
| Europa League | UEFA | 14/15 - 17/18 | 820 |
| Champions League | UEFA | 14/15 - 17/18 | 500 |
| Total | | | 17086 |

After the evaluation of all matches, 11 139 different
players and 438 different club teams were registered
in the database. 6 721 of these players were still active
in one of the recorded teams in the season 17/18; 6 538
completed at least 20 games.
3.3. Player initialisation
The players have to be initialised with different values. In a model without initialisation, the system would need a long time, i.e., a large number of games, to adapt the expectation values for everyone to realistic ranges. If all players start with the same rating, those who are in good teams and play lots of games have an advantage. However, it is not the quantity of the games that should be judged, but the quality of the results on the pitch. The following example shows that more games do not necessarily lead to a higher rating: If player A plays all games, he is always on the field when the team loses. If player B, who belongs to the same team, is absent during one of these defeats and plays all the other games, he has a better rating than player A.
As already mentioned, ClubElo calculates club team ratings using the Elo method. These values provide a good basis for classification. If no or only a few players of a team have a rating, new players are initialised based on the team’s current Elo rating. Due to player transfers and the different lineups of a team, a club’s Elo rating may differ from the actual strength of the players on the pitch. If more than half of the players of a lineup are already registered in the system, the initial rating of a new player is therefore the average rating of his teammates.
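
The initialisation rule described above might be sketched as follows. The function name `initial_rating` and its parameters are hypothetical, and the club rating passed in is assumed to be already converted to the value range of the player ratings.

```python
from typing import Sequence


def initial_rating(teammate_ratings: Sequence[float], lineup_size: int,
                   club_rating: float) -> float:
    """Initial rating of a player who is new to the system.

    teammate_ratings holds the ratings of the lineup players who are
    already registered; club_rating is the club's (rescaled) Elo rating.
    """
    if len(teammate_ratings) > lineup_size / 2:
        # More than half of the lineup is known: use the teammates' average.
        return sum(teammate_ratings) / len(teammate_ratings)
    # Otherwise fall back to the club's current Elo rating.
    return club_rating


# Illustrative lineups of 11 players (all values made up).
known_six = [4500, 4600, 4700, 4400, 4500, 4600]
init_from_teammates = initial_rating(known_six, lineup_size=11, club_rating=4300)
init_from_club = initial_rating([4500] * 5, lineup_size=11, club_rating=4300)
print(init_from_teammates, init_from_club)
```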
The website records only games at professional level and rates most teams between 1 000 and 2 100 points. Since the rating system for football players should also include players at amateur level, the ClubElo values have been adjusted accordingly. The teams were initialised so that professional players usually have ratings of more than 4 000 points. Values of 5 000 points or more are seldom reached even by world-class players. In principle, the value range is not limited.
3.4. Player impact
A player’s rating strongly depends on the performance of his team, as the player and his team usually get the same results. Therefore, players from teams performing below average will never achieve outstanding ratings merely by playing regularly. For this reason, another key figure was developed: the player impact. This value indicates the influence of the player on the results of his team. $I_{A_i}$, the impact of a player, is positive if his presence on the pitch had a positive influence on the results of the club. Players who are rated particularly well here were therefore very important to the success of their team.
The player impact calculations are based on the difference between the team rating change with the player and the team rating change without the player. In addition, the number of minutes that the player was on the field and the number of minutes the team played without the player are considered:

$$I_{A_i} = \left( \frac{C_{A \text{ with } A_i}}{M_{A \text{ with } A_i}} - \frac{C_{A \text{ without } A_i}}{M_{A \text{ without } A_i}} \right) \cdot 90. \tag{11}$$

Recent games are weighted more heavily than matches further in the past. The half-life of a game is 1 year. If the sum of minutes played with the player or the sum of minutes played without the player is less than 900 minutes (equivalent to 10 complete games), the results are given less weight because of their low significance (for more details on the player impact see A.2).
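
Formula 11 and the half-life weighting can be sketched as follows. The sketch assumes that the factor 90 applies to the whole difference, i.e., that the impact is a difference of rating changes per 90 minutes. The function names are hypothetical, the example values are made up, and the additional down-weighting below 900 minutes (Appendix A.2) is not reproduced here.

```python
def player_impact(change_with: float, minutes_with: float,
                  change_without: float, minutes_without: float) -> float:
    """Player impact per 90 minutes (Formula 11).

    change_with / change_without are the summed team rating changes with
    and without the player; the minutes are the corresponding totals.
    """
    if minutes_with <= 0 or minutes_without <= 0:
        raise ValueError("both minute totals must be positive")
    per_minute_with = change_with / minutes_with
    per_minute_without = change_without / minutes_without
    return (per_minute_with - per_minute_without) * 90.0


def recency_weight(age_in_years: float) -> float:
    """Weight of a game with a half-life of 1 year."""
    return 0.5 ** age_in_years


# Illustrative totals: the team gained 3 points in 1 800 minutes with the
# player and lost 1 point in 900 minutes without him.
impact = player_impact(3.0, 1800, -1.0, 900)
print(round(impact, 2), recency_weight(1.0), recency_weight(2.0))
```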
Table 2
Top Ratings. Top 25 football players from the considered competitions according to their personal rating as of May 26, 2018

| Rank | Name | Current Club | Rating | Impact | Games | Points per Game | Goal difference |
|---|---|---|---|---|---|---|---|
| 1. | GERARD PIQUÉ | FC Barcelona | 4974 | 4.23 | 147 | 2.43 | 271 |
| 2. | LIONEL MESSI | FC Barcelona | 4960 | 4.49 | 180 | 2.39 | 341 |
| 3. | DAVID ALABA | Bayern München | 4959 | 3.74 | 136 | 2.35 | 248 |
| 4. | KEYLOR NAVAS | Real Madrid | 4957 | 2.77 | 130 | 2.33 | 222 |
| 5. | DANIEL CARVAJAL | Real Madrid | 4951 | 1.3 | 132 | 2.28 | 192 |
| 6. | IVAN RAKITIC | FC Barcelona | 4946 | -1.02 | 176 | 2.25 | 272 |
| 7. | JORDI ALBA | FC Barcelona | 4941 | 1.18 | 151 | 2.37 | 269 |
| 8. | CRISTIANO RONALDO | Real Madrid | 4937 | 2.69 | 177 | 2.32 | 308 |
| 9. | LUIS SUAREZ | FC Barcelona | 4931 | 0.8 | 168 | 2.39 | 298 |
| 10. | LUKA MODRIC | Real Madrid | 4929 | 1.22 | 139 | 2.24 | 207 |
| 11. | SERGIO BUSQUETS | FC Barcelona | 4921 | 0.18 | 169 | 2.32 | 278 |
| 12. | JEROME BOATENG | Bayern München | 4920 | 3.62 | 111 | 2.43 | 186 |
| 13. | SERGIO RAMOS | Real Madrid | 4918 | 0.21 | 144 | 2.23 | 193 |
| 14. | MARCELO | Real Madrid | 4918 | -0.52 | 166 | 2.23 | 271 |
| 15. | RAPHAEL VARANE | Real Madrid | 4918 | 0.38 | 143 | 2.24 | 218 |
| 16. | NEYMAR | Paris St. Germain | 4917 | 3.46 | 154 | 2.43 | 318 |
| 17. | MARC ANDRE TER STEGEN | FC Barcelona | 4916 | -0.69 | 121 | 2.34 | 205 |
| 18. | ARJEN ROBBEN | Bayern München | 4912 | 0.31 | 110 | 2.26 | 176 |
| 19. | CASEMIRO | Real Madrid | 4909 | 0.1 | 118 | 2.21 | 158 |
| 20. | ALEIX VIDAL | FC Barcelona | 4909 | 1.82 | 77 | 2.16 | 92 |
| 21. | KYLE WALKER | Manchester City | 4904 | 2.99 | 128 | 2.05 | 148 |
| 22. | KEVIN DE BRUYNE | Manchester City | 4900 | 6.98 | 169 | 2.02 | 172 |
| 23. | NACHO | Real Madrid | 4899 | -1.41 | 108 | 2.21 | 165 |
| 24. | NICOLAS OTAMENDI | Manchester City | 4897 | 6.45 | 155 | 2.04 | 174 |
| 25. | DAVID SILVA | Manchester City | 4896 | 4.15 | 146 | 2.03 | 153 |

4. Analysis of the results
4.1. Presentation of the results
The initialisation of teams and players was done
as described in Section 3.3. The weight $w$ of each game
is 1 in all domestic competitions. At international
level, $w$ is increased due to the importance of the
competitions and amounts to 1.5 for games of the
UEFA Champions League and 1.25 for games of the
UEFA Europa League. The home advantage $H$ was
initially estimated at 75 points, which is the approximate
average home advantage according to clubelo.com, and
adjusted after each game according to the result. If
the home team has exceeded expectations, the home
advantage increases. Analogously, $H$ is decreased if
the home team could not meet expectations and
consequently lost points. The new home advantage $H'$ is
calculated as follows: $H' = H + C_A$.
After evaluating the data, 21 players have a rating
of more than 4 900 and 103 players have a rating of
more than 4 800. Only players who completed at least
40 games and were still active in Europe at the end of
the data collection are considered.
Table 2 presents the 25 players with the highest ratings.
The data reflect the status as of May 26, 2018. The
25 best players are active for 5 different clubs. With
Real Madrid (9 nominations) and FC Barcelona (8
nominations), 2 Spanish clubs dominate this top
selection. The superiority of the Spanish clubs, who also
lead the UEFA country coefficients ranking (UEFA
2018), can be explained by the fact that they have
performed very well in international comparison and
have won the UEFA Champions League in all 4 years
of the evaluation period. The list of the 25 best players
also includes players whom most experts are unlikely
to classify as world-class. The reason is that the listed
players play in very good or successful teams, and the
results they produce on the pitch are as good as or even
better than those of their teammates.
Gerard Piqué, who plays for FC Barcelona, has
the highest rating of all 3 733 players considered,
with a score of 4 974, and leads the ranking ahead
of his teammate Lionel Messi and David Alaba from
FC Bayern München. Piqué is also among the players
with the most points per game: like Neymar and Jerome
Boateng, he averages 2.43 points per match. This value
correlates only partially with the player’s rating for
two reasons: it does not record the quality of the
opponent, and it is only an average that does not
consider the date
Table 3
Top Impacts. Top 15 football players from the considered competitions according to their impact as of May 26, 2018. Only players with a rating of more than 4800 are considered for these statistics

| Rank | Name | Current Club | Impact | Rating | Games | Club Rating |
|---|---|---|---|---|---|---|
| 1. | MOHAMED SALAH | FC Liverpool | 8.43 | 4858 | 156 | 4779 |
| 2. | KEVIN DE BRUYNE | Manchester City | 6.98 | 4900 | 169 | 4840 |
| 3. | NICOLAS OTAMENDI | Manchester City | 6.45 | 4897 | 155 | 4840 |
| 4. | RAHEEM STERLING | Manchester City | 4.97 | 4879 | 165 | 4840 |
| 5. | SADIO MANE | FC Liverpool | 4.75 | 4830 | 138 | 4779 |
| 6. | LIONEL MESSI | FC Barcelona | 4.49 | 4960 | 180 | 4923 |
| 7. | GERARD PIQUÉ | FC Barcelona | 4.23 | 4974 | 147 | 4923 |
| 8. | EDERSON | Manchester City | 4.21 | 4854 | 84 | 4840 |
| 9. | DAVID SILVA | Manchester City | 4.15 | 4896 | 146 | 4840 |
| 10. | EDINSON CAVANI | Paris St. Germain | 4.13 | 4814 | 172 | 4772 |
| 11. | MARCO VERRATTI | Paris St. Germain | 4.06 | 4842 | 127 | 4772 |
| 12. | ERIC BAILLY | Manchester United | 4.04 | 4832 | 101 | 4751 |
| 13. | PAUL POGBA | Manchester United | 3.99 | 4831 | 156 | 4751 |
| 14. | ANTOINE GRIEZMANN | Atletico Madrid | 3.95 | 4849 | 191 | 4818 |
| 15. | DAVID ALABA | Bayern München | 3.74 | 4959 | 136 | 4876 |
Table 4
Correctly predicted games by seasons. ‘Team rating’ evaluates the strength of the team regardless of the lineup. ‘Player rating’ takes into account the strength of each player in the lineup. Both methods always bet on the team that has the higher rating. The listed values indicate how often the prediction (‘win Team A’ or ‘win Team B’) was identical to the actual outcome (‘win Team A’ or ‘draw’ or ‘win Team B’)

| | 14/15 | 15/16 | 16/17 | 17/18 | Total |
|---|---|---|---|---|---|
| Player rating (%) | 1860 (51.57) | 2010 (51.00) | 2595 (54.39) | 2513 (52.72) | 8978 (52.55) |
| Team rating (%) | 1864 (51.68) | 1993 (50.57) | 2575 (53.97) | 2502 (52.49) | 8934 (52.29) |
| Difference | -4 | 17 | 20 | 11 | 44 |
| Total | 3607 | 3941 | 4771 | 4767 | 17 086 |
of the game. The record for the highest rating ever is
held by Neymar, who reached 5022 points on March
16, 2016 as a player of FC Barcelona.
In Table 3, the players are sorted by their player
impact. Only the 103 players with a rating of
over 4 800 are considered. The ranking is led by
Mohamed Salah, who played for FC Basel, AS Roma,
and FC Liverpool during the recorded period. Salah
consequently had a big positive impact on the results
of his clubs. For comparison, the rating of the team
is given for each player. It can be seen that the
individual rating of each listed player is higher than the
corresponding club rating.
4.2. Evaluation of the results
For the evaluation, all games from the evaluated
period are analysed. The actual results of the games
are compared with the expectations predicted
by different models. It is measured how
often the team favoured by the model won the match.
Bigsby & Ohlmann (2017), who evaluated the results
of college wrestling matches, have already shown
that meaningful predictions
can be made with the Elo method in sports.
The first method (‘player rating’) is based on the
average player rating of each of the two teams and is
calculated according to Formula 6. The presented
system not only rates the players, but also assigns
ratings to teams using the same procedure. The team
rating is independent of the lineup of the respective
team and is used as a second method (‘team rating’)
for the evaluation.
The main goal of the evaluation is to clarify the
question: Does the player rating procedure outperform
the team rating method in predicting game
results? In other words, can the outcomes of football
matches be better predicted if one knows not only
the strength of the two teams, but also the ratings
of the participating players? For
best results, ordered probit or logit regression models
should be used, according to the studies of Hvattum
& Arntzen (2010) and Asimakopoulos & Goddard
(2004), who tested Elo-based prediction algorithms
in football. Since the optimisation of the results
plays only a minor role in this article, an approach that
is easy to automate and requires no additional input
data is used for the evaluation.
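The simple evaluation approach (always bet on the team with the higher rating and count the hits) can be sketched in Python as follows. This is only an illustration: the names `Match` and `hit_rate`, the sample data, and the tie-break for equal ratings are assumptions, and the home advantage is not modelled.

```python
from typing import Iterable, Tuple

# One match: (rating of team A, rating of team B, outcome), where outcome is
# "A", "B" or "draw". Ratings may be team ratings or lineup averages.
Match = Tuple[float, float, str]


def hit_rate(matches: Iterable[Match]) -> float:
    """Share of matches won by the team with the higher rating.

    Draws always count as misses because the predictor never bets on them.
    """
    total = 0
    hits = 0
    for rating_a, rating_b, outcome in matches:
        total += 1
        # Assumption: with equal ratings the sketch arbitrarily bets on team A.
        favourite = "A" if rating_a >= rating_b else "B"
        if outcome == favourite:
            hits += 1
    if total == 0:
        raise ValueError("no matches given")
    return hits / total


sample = [
    (4900.0, 4700.0, "A"),     # favourite wins: hit
    (4600.0, 4800.0, "draw"),  # draw: miss
    (4500.0, 4550.0, "B"),     # favourite wins: hit
    (4800.0, 4400.0, "B"),     # upset: miss
]
print(hit_rate(sample))  # 0.5
```

Because a draw can never be predicted by this scheme, the hit rate is bounded above by the share of games that do not end in a draw.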
Table 4 shows the values of the correctly pre-
dicted games for the two procedures presented and
Fig. 2. Difference of correctly predicted games between ‘Team rating’ and ‘Player rating’. ‘Team rating’ evaluates the strength of the team
regardless of the lineup. ‘Player rating’ takes into account the strength of each player in the lineup. Both methods always bet on the team
that has the higher rating. The listed values indicate how often the prediction (‘win Team A’ or ‘win Team B’) was identical to the actual
outcome (‘win Team A’ or ‘draw’ or ‘win Team B’).
Table 5
Correctly predicted games 17/18 for Premier League (PL), Ligue 1 (L1), Bundesliga (BL), Serie A (SA), La Liga (LL) and Champions League (CL). ‘Team rating’ evaluates the strength of the team regardless of the lineup. ‘Player rating’ takes into account the strength of each player in the lineup. Both methods always bet on the team that has the higher rating. The listed values indicate how often the prediction (‘win Team A’ or ‘win Team B’) was identical to the actual outcome (‘win Team A’ or ‘draw’ or ‘win Team B’)

| | PL | L1 | BL | SA | LL | CL | Total |
|---|---|---|---|---|---|---|---|
| Player rating (%) | 209 (55.00) | 208 (54.74) | 150 (49.02) | 224 (58.95) | 210 (55.26) | 71 (56.80) | 1072 (54.95) |
| Team rating (%) | 206 (54.21) | 206 (54.21) | 148 (48.37) | 221 (58.16) | 207 (54.47) | 69 (55.20) | 1057 (54.18) |
| Total | 380 | 380 | 306 | 380 | 380 | 125 | 1951 |
individual seasons. Since players are initialised based
on their team rating, the two variants initially produce
the same predictions. Over time, the player rating
deviates from the team rating and produces better
predictions. Figure 2 shows this history and repre-
sents the difference between the games predicted
correctly.
For the last recorded season, the games of the
English Premier League (PL), the French Ligue 1 (L1),
the German Bundesliga (BL), the Italian Serie A
(SA), the Spanish La Liga (LL), and the UEFA
Champions League (CL) are evaluated separately and
together. For the players in these competitions,
comparatively many games had previously been
recorded. Table 5 shows the results. It can be seen
that the player rating makes better predictions for all
competitions.
The Brier score was used as a further evaluation
method. For this purpose, the quadratic loss between
the expectation $E_{A_t}$ and the result $S_{A_t}$ of the home
team was calculated for each game $t$. The Brier score
is the average over all $N$ games and can be calculated
with the following formula:

$$BS = \frac{1}{N} \sum_{t=1}^{N} \left( E_{A_t} - S_{A_t} \right)^2 \qquad (12)$$
Again, all major league games from the 2017/18
season were evaluated using the team rating and the
player rating procedure, each of which calculated the
expectation value. In addition, a baseline evaluation was
carried out in which the expectation value of the home
team was always set to 0.5. The home team’s
result is 1 if they win, 0.5 if they draw and 0 if
they lose (see Formula 8). The results are shown in
Table 6. For all leagues, the player rating procedure
achieves the lowest score, so it has the smallest
deviation between expectation and result and
consequently beats the other systems. This result supports
the hypothesis that the player rating method can
produce more accurate predictions than the team rating
method after a sufficient training time.
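As a minimal sketch of Equation (12), the Brier score and the constant 0.5 baseline can be computed as follows. The function name `brier_score` and the four sample games are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def brier_score(expectations: Sequence[float], results: Sequence[float]) -> float:
    """Mean squared difference between expectation and result (Equation 12)."""
    if len(expectations) != len(results) or not expectations:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((e - s) ** 2 for e, s in zip(expectations, results)) / len(results)


# Home-team results: 1 = win, 0.5 = draw, 0 = loss.
results = [1.0, 0.5, 0.0, 1.0]
model = [0.7, 0.5, 0.4, 0.6]     # hypothetical expectation values of a rating model
baseline = [0.5] * len(results)  # baseline: both teams rated as equally strong

model_score = brier_score(model, results)
baseline_score = brier_score(baseline, results)
print(round(model_score, 4), round(baseline_score, 4))  # 0.1025 0.1875
```

A lower value means that the expectations were closer to the actual results; in this toy example the model beats the baseline.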
Table 6
Brier score for games 17/18 for Premier League (PL), Ligue 1 (L1), Bundesliga (BL), Serie A (SA), La Liga (LL) and Champions League (CL). ‘Team rating’ evaluates the strength of the team regardless of the lineup. ‘Player rating’ takes into account the strength of each player in the lineup. Baseline rates both teams as equally strong. The listed values indicate the average quadratic loss between the expectation and the result of the home team

| | PL | L1 | BL | SA | LL | CL | Total |
|---|---|---|---|---|---|---|---|
| Player rating | 0.1440 | 0.1454 | 0.1575 | 0.1448 | 0.1562 | 0.1457 | 0.1490 |
| Team rating | 0.1451 | 0.1504 | 0.1581 | 0.1454 | 0.1632 | 0.1586 | 0.1524 |
| Baseline | 0.1848 | 0.1868 | 0.1822 | 0.1953 | 0.1934 | 0.1980 | 0.1894 |
5. Potentials and limitations of the rating system
The system offers many possibilities for enhance-
ments and improvements. One possible extension
concerns the initialisation of young players. Since
players are always initialised at the level of
their teammates in their first game, this is particularly
problematic for young players. For example, a youth
player of Bayern München has a higher value than
a large part of the remaining Bundesliga after
initialisation. As young players usually improve their
skills over time, this should also be considered in the
rating system. This would be possible if one consid-
ers the age of the players during the initialisation and
additionally introduces a ‘youth bonus’. For example,
one could initialise an 18-year-old with only 75 % of
the actual value and give him bonus points for every
game he finishes at the age of 21. Thus, the develop-
ment curve of the player as well as the number of his
matches would be considered. So far, only matches
at club level have been recorded by the system. How-
ever, if the k- and q-values are adjusted, the results of
national team matches can also be taken into account.
The system has various applications and great
potential due to the very large number of active
football players. Most of them play at amateur
level. While, as mentioned above, a wealth of data is
collected for each game and player at professional
level, amateur football has no statistics to judge a
player’s performance. For almost all games, however,
a match report which contains result, date, goals,
lineups, and player changes is recorded and published. A
match report thus contains all the information that
the presented system needs to evaluate football
players. This makes it possible to evaluate an extremely
large number of games. Thus, not only professional
players but also amateur footballers can be recorded,
rated based on their skill level, and compared to each
other, regardless of league, region, gender, and age.
The successful application to various individual sports
shows that adaptive scoring systems in general, and
procedures based on the Elo algorithm in particular,
also produce reasonable results in mass sports.
Creating a good system for initialising amateurs is
not trivial. A possible formula for the initial rating of
amateur teams and potential problems are discussed
in the following. In Germany, a team in the 1st
division (league level 1) has an average rating of 4 375
points. For clubs in the 2nd division (league level 2),
the rating is around 250 points lower on average. As
a rough heuristic, we assume that the average rating
drops by 250 points per league level in Germany.
In Germany there are between 9 and 13 league levels
depending on the region. We assume an average of 11
levels. Since the strengths of the teams at the lowest
level will be approximately the same in all regions,
the strength differences between teams of
two adjacent leagues are greater in regions with fewer levels.
Therefore, the formula also takes into account the
number of league levels. For a German team $A$
playing at league level $L_A$ in a region with $L_{Max}$ league
levels, the following formula results for the initial
rating $R_A$:

$$R_A = 4375 - 250 \cdot (L_A - 1) \cdot \frac{10}{L_{Max} - 1} \qquad (13)$$
A team playing in the lowest league initially has
1 875 points (regardless of the number of levels in the
region). If a club plays in the 6th of 11 leagues, it will
start with 3 125 points. For other European countries,
a similar formula can be created based on the
considerations above, depending on the average rating
in the top division and the number of league levels.
Here we assume that teams from the lowest league
are on average about as strong as those from other
countries and should therefore also initially be rated
with 1 875 points. For a general formula for clubs
from other European countries, the average rating in
the 1st division ($\bar{R}_{L1}$) must also be considered. This
results in the following formula:

$$R_A = \bar{R}_{L1} - \left( \bar{R}_{L1} - 1875 \right) \cdot \frac{L_A - 1}{L_{Max} - 1} \qquad (14)$$
Table 7
Statistics of FC Barcelona in the Spanish La Liga (LL) and the Champions League (CL) for the entire season 17/18. The first row indicates the minutes (M), goal difference (D = goals scored - goals conceded) and the minutes per goal difference (M/D) for all matches of the season. The second row only considers minutes and goals if Aleix Vidal was on the pitch. The third row takes into account the total playing time and all goals of games in which Aleix Vidal played for at least 1 minute

| | M (LL) | D (LL) | M/D (LL) | M (CL) | D (CL) | M/D (CL) | M (Total) | D (Total) | M/D (Total) |
|---|---|---|---|---|---|---|---|---|---|
| Total | 3420 | +70 | 48.9 | 900 | +11 | 81.8 | 4320 | +81 | 53.3 |
| With Aleix Vidal on the pitch | 452 | +15 | 30.1 | 112 | +2 | 56.0 | 564 | +17 | 33.2 |
| Games with Aleix Vidal involved | 1350 | +33 | 40.9 | 360 | +6 | 60.0 | 1710 | +39 | 43.8 |
The average rating of a Spanish 1st division club is
4 460 points. A team that plays in the 6th of 9 leagues
in Spain, for example, initially receives a rating of
2 844 (rounded). The presented formula is only intended
to give a first intuition for a possible initialisation of
amateurs. For more specific initialisations, one would
have to examine the league systems of the respective
countries individually and in depth.
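The initialisation rule of Equation (14) is a linear interpolation between the average top-division rating and the assumed rating of the lowest league. The following Python sketch reproduces the examples from the text; the function name `initial_rating` and its parameter names are hypothetical.

```python
def initial_rating(level: int, max_level: int,
                   top_division_avg: float, lowest_avg: float = 1875.0) -> float:
    """Initial team rating by linear interpolation between the average
    top-division rating (level 1) and the lowest-league rating (Equation 14)."""
    if max_level < 2 or not 1 <= level <= max_level:
        raise ValueError("need 1 <= level <= max_level and max_level >= 2")
    drop = (top_division_avg - lowest_avg) * (level - 1) / (max_level - 1)
    return top_division_avg - drop


print(initial_rating(6, 11, 4375.0))        # Germany, 6th of 11 levels: 3125.0
print(initial_rating(11, 11, 4375.0))       # Germany, lowest level: 1875.0
print(round(initial_rating(6, 9, 4460.0)))  # Spain, 6th of 9 levels: 2844
```

With the German top-division average of 4 375 points, the function reduces to Equation (13), because 4 375 - 1 875 = 2 500 = 250 · 10.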
Another possible application is the scouting of
players. The system can filter out players who are
more successful than average compared to their
teammates and with their club. Such players, who ‘make
their own team better’, are likely to be interesting
to various other teams. The benefit of an individual
player rating system was impressively demonstrated
by the Major League Baseball team Oakland Athletics in
the so-called “Moneyball Years” (Lewis 2003).
As an example of an underrated player, Aleix Vidal
could be mentioned. This player occupies place 20
in the overall top ratings, although he was usually
only a substitute at FC Barcelona and had relatively
little playing time. The reason why this player is
ranked so well can be seen from Table 7: On aver-
age, FC Barcelona in the 17/18 season needed fewer
minutes to score 1 goal more than their opponent
when Aleix Vidal was on the pitch—thus the team
played more successfully with Vidal than without
him. Against this, one could argue that Vidal was
just lucky and only played against supposedly easier
opponents, against whom Barcelona could achieve
better goal differences. Thus, we only consider the
games in which the player was used: Again, the goal
difference in relation to the playing minutes with
Vidal on the pitch is better than the goal difference
without Vidal. After the season 17/18, Aleix Vidal’s
appearances and statistics were not as good as before.
Reasons for this are that footballers’ careers are not
always linear and that the random factor also plays a
role in games—therefore, the results previously pro-
cessed by the system cannot predict all future events
correctly.
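The comparison within the games in which Vidal was used can be retraced from the combined figures in Table 7. The following Python lines are only a worked calculation on those published numbers; the variable names are chosen for illustration.

```python
# Combined La Liga and Champions League figures from Table 7 (season 17/18),
# given as (minutes, goal difference).
with_vidal_on_pitch = (564, 17)
games_with_vidal = (1710, 39)

# Minutes and goal difference in those same games while Vidal was off the pitch.
minutes_off = games_with_vidal[0] - with_vidal_on_pitch[0]
goal_diff_off = games_with_vidal[1] - with_vidal_on_pitch[1]

minutes_per_goal_on = with_vidal_on_pitch[0] / with_vidal_on_pitch[1]
minutes_per_goal_off = minutes_off / goal_diff_off

print(minutes_off, goal_diff_off)      # 1146 22
print(round(minutes_per_goal_on, 1))   # 33.2
print(round(minutes_per_goal_off, 1))  # 52.1
```

In the games with Vidal involved, Barcelona thus needed about 33 minutes per goal of positive goal difference while he was on the pitch, compared to about 52 minutes while he was off it.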
Nevertheless, as Section 4.2 makes clear, the rating
system is quite suitable for game predictions. With
appropriate adjustment, it would also be possible to
derive odds for corresponding bets. It should be
noted that our system knew the actual lineups of the
teams and used them for the evaluation. In reality,
however, the lineups are only announced shortly
before kick-off. In order to calculate the player rating
of a team in advance, one would have to use the expected
lineups. It would still have to be checked whether
similarly good results can be achieved this way.
Similar to the way Elo’s algorithm was adopted by
numerous individual sports, this system could also
be adopted in a slightly modified form by other team
sports. In order to be able to evaluate handball, ice
hockey, or basketball players on the basis of their
results, in the present algorithm only the influences
of goal or point difference and played minutes would
have to be adapted to the respective conditions (dura-
tion of a game, possible results).
6. Conclusion
We developed a novel rating system for football
players. The basic hypothesis is that the success of
a footballer positively correlates with their playing
strength. The developed system is based on the
algorithm of Elo, which as an objective and adaptive
rating model offers various advantages over
subjective or statistics-based methods. The Elo algorithm,
designed for chess and adopted by other individual
sports, updates the ratings of the players involved
after each game by comparing the expectation value
resulting from the strengths of the two players with
the actual outcome of the game.
According to the requirements of the team sport
football, the algorithm had to be fundamentally
revised and extended. The modified algorithm for
football players takes into account the strengths of
the two opposing teams and the result of the game as
well as, for each individual player, the personal rating,
the individual result, and the minutes on the pitch. In
addition, a second key figure, the player impact, was
developed. This value indicates the dependence
of the team success on the presence of a player on the
pitch. Over the 4 years between the first matchday of
the 14/15 season and the last matchday of the 17/18
season, matches from 18 different European
competitions were recorded. A total of 17 086 games were
processed by the algorithm and 11 139 different players
were registered in the database.
After the application of the Elo model to all games,
FC Barcelona’s Gerard Piqué had the highest rating
with 4 974 points, ahead of Lionel Messi and David
Alaba. The Spanish top clubs Real Madrid and FC
Barcelona, who won the UEFA Champions League
in all 4 years of the rating period, account for 17 of
the top 25 players. Mohamed Salah, who most recently
played for Liverpool, had the highest impact of all top
players. Kevin De Bruyne and Nicolas Otamendi, both
Manchester City players, rank second and third,
respectively, in this ranking.
The average of the lined-up players’ ratings
indicates the strength of a team lineup for a match. In the
evaluation part, it was checked for all games whether
the team favoured by the system based on the
determined strength values actually won the game. It
could be shown that the result of a game can be
predicted more accurately if one knows not only the
participating teams but also the ratings of the lined-up
players.
The presented rating system still offers possibilities
for improvements and expansions. For example,
the implementation of youth balancing, which takes
into account the development curve of young players,
would be possible. In order to record a game, the
system only needs the corresponding match report,
which is normally also recorded and published at
amateur level. The system can thus determine rating
values for an extremely large number of footballers,
which gives it high relevance. After an appropriate
adaptation, the algorithm could also produce valuable
insights in determining odds, scouting players,
or evaluating players from other team sports.
Recently, promising extensions of the Elo rating
system have been suggested. For instance, Dorsey
(2019) extends the Elo algorithm for chess play-
ers with time as a covariate, called Elo regression,
improving the accuracy of the ranking. Kovalchik
(2020) compares 4 different variants of Elo mod-
els for tennis players, which are able to predict
the margin of victory as well. These and further
possible enhancements will be the subject of our future
research.
In conclusion, the presented rating system for
football players, an objective and adaptive procedure
that represents an innovation compared to existing
models, has great potential and high relevance due to
the benefits shown.
References
Arndt, C. and Brefeld, U., 2016, ‘Predicting the future performance
of soccer players’, Statistical Analysis and Data Mining: The
ASA Data Science Journal 9(5), 373-382.
Asimakopoulos, I. and Goddard, J., 2004, ‘Modelling football
match results and the efficiency of fixed-odds betting’, Jour-
nal of Forecasting 23(1), 51-66.
Aslan, B. G. and Inceoglu, M. M., 2007, A comparative study on
neural network based soccer result prediction, in ‘Proceed-
ings of the Seventh International Conference on Intelligent
Systems Design and Applications (ISDA)’, IEEE, pp. 545-
550.
Beswick, B., 2010, Focused for Soccer, Human Kinetics, Cham-
paign, United States. 2nd Edition.
Bigsby, K. G. and Ohlmann, J., 2017, ‘Ranking and prediction of
collegiate wrestling’, Journal of Sports Analytics 3(1), 1-19.
ClubElo 2014, ‘Football club elo ratings’. Last access: 2020-06-28.
URL: http://clubelo.com/System.
Decroos, T., Bransen, L., Van Haaren, J. and Davis, J., 2019,
Actions speak louder than goals: Valuing player actions in
soccer, in ‘Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining’,
Anchorage, AK, USA, pp. 1851-1861.
Dorsey, J., 2019, Elo regression extending the elo rating system,
Master’s thesis, University of Akron, Akron, OH, USA.
Egidi, L. and Gabry, J., 2018, ‘Bayesian hierarchical models
for predicting individual performance in soccer’, Journal of
Sports Analytics 3(14), 143-157.
Elo, A. E. 1978, The rating of chessplayers, past and present, Arco
Pub., New York, United States.
FIDE 2017, ‘Fide rating regulations’. Last access: 2020-06-28.
URL: https://ratings.fide.com/calculator rtd.phtml.
FIFA 2018a, ‘2018 fifa world cup russia - global broadcast and audience summary’. Last access: 2020-06-28. URL: https://www.fifa.com/worldcup/news/more-than-half-the-world-watched-record-breaking-2018-world-cup.
FIFA 2018b, ‘Revision of the fifa / coca-cola world ranking’. Last access: 2020-06-28. URL: https://img.fifa.com/image/upload/edbm045h0udbwkqew35a.pdf.
FIFA 2020a, ‘The fifa women’s world ranking’. Last access: 2020-06-28. URL: https://www.fifa.com/fifa-world-ranking/ranking-table/women/.
FIFA 2020b, ‘The fifa/coca-cola men’s ranking’. Last access: 2020-06-28. URL: https://www.fifa.com/fifa-world-ranking/.
Gásquez, R. and Royuela, V., 2016, ‘The determinants of international
football success: a panel data analysis of the elo rating’,
Social Science Quarterly 97(2), 125-141.
Hossain, H. S., Khan, M. A. A. H. and Roy, N. 2017, Soccer-
mate: A personal soccer attribute profiler using wearables,
in ‘Proceedings of the International Conference on Pervasive
Computing and Communications Workshops’, IEEE, Kona,
HI, USA, pp. 164-169.
Hubáček, O., Šourek, G. and Železný, F., 2019, Score-based soccer
match outcome modeling-an experimental review, Athens,
Greece, pp. 164-172.
Hvattum, L. M. and Arntzen, H., 2010, ‘Using elo ratings for
match result prediction in association football’, International
Journal of Forecasting 26(3), 460-470.
Kempe, M., Goes, F. R. and Lemmink, K. A., 2018, Smart data
scouting in professional soccer: Evaluating passing perfor-
mance based on position tracking data, in ‘Proceedings of the
14th International Conference on e-Science’, IEEE, Amster-
dam, The Netherlands, pp. 409-410.
Kharrat, T., Peña, J. and Mchale, I., 2017, ‘Plus-minus player ratings
for soccer’, European Journal of Operational Research
726-736.
Kovalchik, S., 2020, ‘Extension of the elo rating system to margin
of victory’, International Journal of Forecasting. 13 pages.
Lasek, J., Szlávik, Z. and Bhulai, S., 2013, ‘The predictive power of
ranking systems in association football’, International Journal
of Applied Pattern Recognition 1(1), 27-46.
Lewis, M., 2003, Moneyball: the art of winning an unfair game,
W.W. Norton, New York, United States.
McHale, I. G., A. Scarf, P. and Folker, D. E., 2014, ‘On the devel-
opment of a soccer player performance rating system for
the english premier league’, INFORMS Journal on Applied
Analytics 4(42), 329-340.
Paixão, P., Sampaio, J., Almeida, C. H. and Duarte, R., 2015,
‘How does match status affects the passing sequences of
top-level european soccer teams?’, International Journal of
Performance Analysis in Sport 15(1), 229-240.
Pappalardo, L., Cintia, P., Ferragina, P., Massucco, E., Pedreschi,
D. and Giannotti, F., 2019, ‘PlayeRank: data-driven perfor-
mance evaluation and player ranking in soccer via a machine
learning approach’, ACM Transactions on Intelligent Systems
and Technology (TIST) 10(5), 1-27.
Peeters, T., 2018, ‘Testing the wisdom of crowds in the field:
Transfermarkt valuations and international soccer results’,
International Journal of Forecasting 34(1), 17-29.
Pollard, R. 1986, ‘Home advantage in soccer: A retrospective anal-
ysis’, Journal of Sports Sciences 4(3), 237-248.
Schultze, R. S. and Wellbrock, C.-M., 2018, ‘A weighted
plus/minus metric for individual soccer player performance’,
Journal of Sports Analytics 4(2), 121-131.
Stefani, R. and Pollard, R., 2007, ‘Football rating systems for top-
level competition: A critical survey’, Journal of Quantitative
Analysis in Sports 3(3), 21 pages.
Sullivan, C. and Cronin, C., 2016, ‘Improving elo rankings for
sports experimenting on the english premier league’, Virginia
Tech CSx824/ECEx424 technical report, VA, USA.
TotalSportek 2019, ‘25 world’s most popular sports (ranked
by 13 factors)’. Last access: 2020-06-28. URL:
https://www.totalsportek.com/most-popular-sports/.
Trombley, M. J., 2016, ‘Does artificial grass affect the competitive
balance in major league soccer?’, Journal of Sports Analytics
2(2), 73-87.
UEFA 2018, ‘Uefa country coefficients’. Last access: 2020-06-28. URL: https://www.uefa.com/memberassociations/uefarankings/country/#/yr/2018.
Worldatlas 2018, ‘The most popular sports in the world’. Last access: 2020-06-28. URL: https://www.worldatlas.com/articles/what-are-the-most-popular-sports-in-the-world.html.
Appendix A. Calculation of some values in detail
Section 3 presented the formulas for the football
player rating system. For reasons of clarity, some
details were omitted there. Appendix A provides
the interested reader with more details about some of
these formulas.
A.1. k- and q-Value
$k_{A_i}$, the value indicating the strength of the rating
change of player $A_i$, is initially 32, maximum 40 and
minimum 24. After each game of the team of $A_i$,
the player’s k-value is recalculated according to the
following formula:

$$k'_{A_i} = \begin{cases} 40 & \text{if } A_i \text{ changed the club} \\ k_{A_i} + 0.5 & \text{if } A_i \text{ was not on the pitch during the game} \\ k_{A_i} - 0.25 & \text{if } A_i \text{ was on the pitch during the game} \end{cases}$$

$q_{A_i}$, the value that indicates the weight of the
individual change compared to the team change, is
initially 1, maximum 1 and minimum 0.5. After each
game of the team of $A_i$, the player’s q-value is
recalculated according to the following formula:

$$q'_{A_i} = \begin{cases} 1 & \text{if } A_i \text{ changed the club} \\ q_{A_i} + 0.025 & \text{if } A_i \text{ was not on the pitch during the game} \\ q_{A_i} - 0.025 & \text{if } A_i \text{ was on the pitch during the game} \end{cases}$$
The values for k and q were initially chosen intu-
itively and then optimised through a few tests. For
example, if a newly initialised player plays regularly
for 1 season, his k- and q-values are at the bottom
of the scale. If a player plays less often, his values
increase and the results of his few games are weighted
more heavily.
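The update rules for the k- and q-values can be sketched in Python as follows. The function names `update_k` and `update_q` are hypothetical, and clamping the values to the stated minimum and maximum after each step is an assumption about how the bounds are enforced.

```python
def update_k(k: float, changed_club: bool, on_pitch: bool) -> float:
    """Recalculate a player's k-value after a game of his team (A.1)."""
    if changed_club:
        return 40.0
    k = k - 0.25 if on_pitch else k + 0.5
    return min(40.0, max(24.0, k))  # assumption: clamp to the bounds [24, 40]


def update_q(q: float, changed_club: bool, on_pitch: bool) -> float:
    """Recalculate a player's q-value after a game of his team (A.1)."""
    if changed_club:
        return 1.0
    q = q - 0.025 if on_pitch else q + 0.025
    return min(1.0, max(0.5, q))  # assumption: clamp to the bounds [0.5, 1]


# A newly initialised player who is on the pitch in every game of a
# hypothetical 34-game season ends at the bottom of both scales.
k, q = 32.0, 1.0
for _ in range(34):
    k = update_k(k, changed_club=False, on_pitch=True)
    q = update_q(q, changed_club=False, on_pitch=True)
print(k, q)  # 24.0 0.5
```

With these step sizes, the k-value reaches its minimum after 32 games on the pitch and the q-value after 20 games.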
A.2 Player impact
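Section 3.4 presented the player impact, a key figure
that indicates the influence of a player on the
success of the team.
In that section, it was mentioned that the change
of the team rating with player $A_i$ ($C_{A\,\text{with}\,A_i}$) is
weighted less if the sum of the minutes with player
$A_i$ ($M_{A\,\text{with}\,A_i}$) is less than 900 minutes. In detail,
the formula in this case is as follows:

$$I_{A_i} = \left( C_{A\,\text{with}\,A_i} - \frac{C_{A\,\text{without}\,A_i}}{M_{A\,\text{without}\,A_i}} \cdot 90 \right) \cdot \left( 0.1 + \frac{M_{A\,\text{with}\,A_i}}{900} \right)$$

If the player was off the pitch for less than 900
minutes (i.e. $M_{A\,\text{without}\,A_i} < 900$), the formula
is similar:

$$I_{A_i} = \left( \frac{C_{A\,\text{with}\,A_i}}{M_{A\,\text{with}\,A_i}} \cdot 90 - C_{A\,\text{without}\,A_i} \right) \cdot \left( 0.1 + \frac{M_{A\,\text{without}\,A_i}}{900} \right)$$

If both $M_{A\,\text{with}\,A_i} < 900$ and $M_{A\,\text{without}\,A_i} < 900$
apply, the formula is modified as follows:

$$I_{A_i} = \left( C_{A\,\text{with}\,A_i} - C_{A\,\text{without}\,A_i} \right) \cdot \left( 0.1 + \frac{M_{A\,\text{with}\,A_i}}{900} \right) \cdot \left( 0.1 + \frac{M_{A\,\text{without}\,A_i}}{900} \right)$$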
The results have a half-life of 1 year. This means
that the minutes and the team rating change of a
365-day-old game are weighted half as much as those
of a current game. The weight $W_G$ by which the
results of a game $G$ played $x$ days ago are
multiplied is calculated as follows:

$$W_G = 0.5^{\frac{x}{365.25}}$$
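The time weighting can be sketched in Python as follows. The function names `game_weight` and `weighted_totals` and the tuple layout of the input are hypothetical; the sketch only shows how the weight could be applied to the minutes and the team rating changes before they enter the player impact.

```python
def game_weight(days_ago: float, half_life_days: float = 365.25) -> float:
    """Weight of a game played `days_ago` days ago (half-life of one year)."""
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return 0.5 ** (days_ago / half_life_days)


def weighted_totals(games):
    """Sum time-weighted minutes and team rating changes.

    `games` is an iterable of (days_ago, minutes, rating_change) tuples.
    """
    minutes = 0.0
    change = 0.0
    for days_ago, mins, delta in games:
        w = game_weight(days_ago)
        minutes += w * mins
        change += w * delta
    return minutes, change


print(game_weight(0))       # 1.0
print(game_weight(365.25))  # 0.5
print(weighted_totals([(0, 90, 4.0), (365.25, 90, 2.0)]))  # (135.0, 5.0)
```

A game from one year ago thus contributes only half of its minutes and half of its rating change to the sums.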
... Elo ratings are such ratings and were originally developed to evaluate performance in one-on-one sports. The concept was subsequently adapted to the game of football by Wolf et al. [22]. This adapted algorithm provides ratings for each individual football player and calculates the team rating via the average of players in a game weighted by the number of minutes played. ...
Preprint
Full-text available
Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players' historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. Using a historical data set of data-driven indicators describing player quality and the transfer value of a football player, the models are trained to forecast player quality and player value one year ahead. These two prediction problems demonstrate the efficacy of tree-based models, particularly random forest and XGBoost, in making accurate predictions. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model. Additionally, our research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. Our research provides models to help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value.
... The Elo rating system, employed by the World Chess Federation and other chess associations, serves as a method to gauge the relative skill levels of players (Berg, 2020). Additionally, the Elo rating system can be effectively applied in diverse sporting domains (Angelini et al., 2022;Kovalchik, 2020;Wolf et al., 2020). The formula for rating teams based on Elo rating is provided by Berg (2020 ...
... Generally, teams playing on other turfs lose more than the home team (Trombley, 2016), as their opponent is playing in their home and enjoys the familiarity with the stadium or turf and the support from the audience at the stadium. Thus, Wolf et al. (2020) proposed an adjusted Elo rating system for football, where home field advantage is adjusted to the home team's rating. If R is the rating of the home team, the adjusted rating of the home team with home-field advantage is given by the formula, ...
Article
With the advent of formats of limited overs in cricket like the One Day and Twenty-20 Internationals, test cricket, the longest-running version of the game, has witnessed a significant decline in popularity. The World Test Championship (WTC) was launched in 2019 by the International Cricket Council to reignite interest in test cricket. However, the initial point distribution system devised for the inaugural WTC had several shortcomings, which include allocating points based on the outcome of the entire series rather than the number of test matches played, the absence of rewards for teams winning matches away from home, failing to take into consideration the margin of victory (MOV) and not accounting for the relative strength of competing teams. Considering these shortcomings, this article proposes a new point system, thus providing an alternative to the existing one. The point system uses the Elo rating system, including factors such as home-field advantage, impact of the toss and MOV. Statistical analyses were conducted to validate the home-field advantage and toss impact. Subsequently, the proposed model developed was named the points won per match system. The team rankings in the WTC are established based on this proposed model and compared with the actual one.
... Rating consists of assigning a numerical value to a competitor based on empirical observations (of the game outcomes), while ranking is obtained by sorting the rating values [1]; so, a "stronger" team has a large rating and is ranked before the weaker team [2]. Thus, the rating, reflecting the skills/abilities/strength of each team as compared with the others (see the extensive surveys in [1-3]), can be used to obtain quick information on the state of a competition, to assess the strength of teams, to decide on seeding/pairing (defining game schedules with more "interesting" matches) or promotion/relegation (between high-and low-level leagues) [3], as well as to provide support for both bookmakers and gamblers [4,5]. ...
... Over the recent decades, different ranking/rating algorithms have been devised and used in tournaments/competitions (sports and games) to assign numbers to teams (or players in the Elo algorithm have also been derived and applied to various other sports, as discussed, among others, in [3,5,8,14,[23][24][25][26][27][28][29][30]. However, despite the widespread use of the Elo algorithm, the discussions presented so far in the literature have generally been based on empirical studies. ...
Preprint
Full-text available
The Elo algorithm, due to its simplicity, is widely used for rating in sports competitions as well as in other applications where the rating/ranking is a useful tool for predicting future results. However, despite its widespread use, a detailed understanding of the convergence properties of the Elo algorithm is still lacking. Aiming to fill this gap, this paper presents a comprehensive (stochastic) analysis of the Elo algorithm, considering round-robin (one-on-one) competitions. Specifically, analytical expressions are derived characterizing the behavior/evolution of the skills and of important performance metrics. Then, taking into account the relationship between the behavior of the algorithm and the step-size value, which is a hyperparameter that can be controlled, some design guidelines as well as discussions about the performance of the algorithm are provided. To illustrate the applicability of the theoretical findings, experimental results are shown, corroborating the very good match between analytical predictions and those obtained from the algorithm using real-world data (from the Italian SuperLega, Volleyball League).
... One of the skills that have been developed and raised is scoring, which is considered the most important skill, which is the outcome of effort and fatigue, and in its loss, all the player's maneuvers are lost and there is no use in them without scoring, and here they were focused on with the Funion exercises by setting several small goals to raise the level of scoring. Wolf, Schmitt, and Schuller (2020) [30] affirm that the success of a player is heavily influenced by the accuracy of their scoring and their physical strength. They argue that the player's skill and training directly impact their ability to score powerfully and precisely in a certain location. ...
... Menurut (Rifaldi et al., 2023) permainan yang menggunakan media atau alat salah satunya sepak bola, bola basket, bola voli dan futsal. Permainan yang menggunakan bola ini merupakan olahraga yang sangat populer diseluruh dunia (Wolf et al., 2020). Hal ini menandakankan bahwa olahraga ini sangat digandrumi oleh banyak orang. ...
Article
Full-text available
This study aims to examine the effect of target patterns on shooting accuracy in futsal games at the extracurricular SMA Negeri 1 Ciwaringin. This form of research uses quantitative with a form of experimental method and sampling is done using total sampling with a research sample of 20 people. The research research design used a one-group pretest-posttest design. This research was conducted to determine whether or not shooting accuracy in futsal players, the research instrument was a shooting kick test to the target. The research was conducted 12 times a meeting by conducting an initial test (pretest) then given treatment (treatment) 10 times a meeting using a target pattern and then a final test (posttest). The results of the normality test with the Mean of the posttest 11.3 is greater than the pretest 7.45 with a Mean difference of 3.85 so there is a change after being given tereatment. The results of hypothesis testing using the t test, it can be seen that the t value (13.491) > t table (1.699). From the results of the t test Sig value obtained is 0.000 smaller < 0.05, then Ho is rejected and Ha is accepted. So it can be concluded that there is a significant effect of the target game pattern training model on shooting accuracy in extracurricular participants of SMA Negeri 1 Ciwaringin.
... Expected goals are one of the most famous approaches and focus on the production process instead of accounting September 2023 SILVA, NAKAMURA, JAVANI, MARCELINO only for what happens on the score line [3]. Another example is the Elo system that considers expectations before the match -if a player exceeds expectations, his rating will increase [24]. These protocols use algorithms that combine different information and report classification of players' performance. ...
Article
Full-text available
Introduction. Soccer players attribute particular importance to media and how newspapers portray their individual performances. However, these analysis can be biased and unfair to players. Aim of Study. This study analyzed how sport newspapers rated soccer players’ performances, comparing them to performance variables from a data-driven platform (SofaScore). Material and Methods. Ratings from the last five games of the Portuguese first division (2021/22 season) were collected from ‘A Bola’, ‘O Jogo’ and ‘Record’ newspapers, in addition of SofaScore data: SofaScore rating, goals scored, saves (for goalkeepers), assists to goals, successful exits (for goalkeepers), accurate passes, key passes, success dribbles, ball lost, shots on target, tackles and duels won (both aerial and on the ground). Results. Correlations between newspapers and between newspapers and SofaScore Rating were moderate to strong (0.54-0.64, p < 0.001). Goalkeepers received higher ratings (‘A Bola’: 5.77 ± 0.99; ‘O Jogo’: 5.73 ± 0.83; ‘Record’: 2.85 ± 0.80). Goalkeepers receive higher newspaper’ ratings if they perform more Exits (‘A Bola’ and ‘Record’) and won Duels (‘O Jogo’). Outfield players receive higher newspaper’ ratings if they score (‘A Bola’, ‘O Jogo’ and ‘Record’) or assist (‘O Jogo’ and ‘Record’) goals. Conclusions. With this information, players can better understand newspapers ratings, while the media can evaluate the fairness of those evaluations, especially regarding players that are usually distant from goal situations.
Article
Full-text available
Biathlon is an Olympic sport combining cross-country skiing with rifle shooting, giving a penalty for each target miss. The biathletes ran different race formats, including the pursuit race. During this race, the biathletes chase the leader with a start time identical to the result of the sprint race previously achieved. So, pursuit involves different skills (such as tactics or management of emotional pressure) that are not present during races with an interval-start procedure like sprint. Nevertheless, final pursuit rankings are strongly correlated to sprint ones, which prevents a spectacular comeback after a disappointing sprint race. We present here an alternative pursuit ranking system that is nearly decorrelated to sprint rankings. This simple ranking system is based on comparisons with previous pursuit results. The current and the alternative rankings were then compared on different pursuit rankings, using a database of 148 results from men pursuit world cups. The alternative ranking was shown to strongly modify a single pursuit ranking but these modifications were smoothed on a whole world cup season. Advantages and limitations of the alternative ranking system are discussed, paving the way to a fairer modification of the current pursuit ranking to increase surprise and suspense in biathlon pursuit races.
Article
Full-text available
We investigate the state-of-the-art in score-based soccer match outcome modelling to identify the top-performing methods across diverse classes of existing approaches to the problem. Namely, we bring together various statistical methods based on Poisson and Weibull distributions and several general ranking algorithms (Elo, Steph ratings, Gaussian-OD ratings) as well as domain-specific rating systems (Berrar ratings, pi-ratings). We review, reimplement and experimentally compare these diverse competitors altogether on the largest database of soccer results available to identify true leaders. Our results reveal that the individual predictions, as well as the overall performances, are very similar across the top models tested, likely suggesting the limits of this generic approach to score-based match outcome modelling. No study of a similar scale has previously been done.
Article
Full-text available
The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank—i.e. searching players and player versatility—showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.
Conference Paper
Full-text available
Assessing the impact of the individual actions performed by soccer players during games is a crucial aspect of the player recruitment process. Unfortunately, most traditional metrics fall short in addressing this task as they either focus on rare actions like shots and goals alone or fail to account for the context in which the actions occurred. This paper introduces (1) a new language for describing individual player actions on the pitch and (2) a framework for valuing any type of player action based on its impact on the game outcome while accounting for the context in which the action happened. By aggregating soccer players' action values, their total offensive and defensive contributions to their team can be quantified. We show how our approach considers relevant contextual information that traditional player evaluation metrics ignore and present a number of use cases related to scouting and playing style characterization in the 2016/2017 and 2017/2018 seasons in Europe's top competitions.
Article
Full-text available
Although there is no consensus on how to measure and quantify individual performance in any sport, there has been less development in this area for soccer than for other major sports. And once this measurement is defined, does modeling for predictive purposes make sense. We use the player ratings provided by a popular Italian fantasy soccer game as proxies for the players’ performance; we discuss the merits and flaws of a variety of hierarchical Bayesian models for predicting these ratings, comparing the models on their predictive accuracy on hold-out data. Our central goals are to explore what can be accomplished with a simple freely available dataset comprising only a few variables from the 2015–2016 season in the top Italian league, Serie A, and to focus on a small number of interesting modeling and prediction questions that arise. Among these, we highlight the importance of modeling the missing observations and we propose two models designed for this task. We validate our models through graphical posterior predictive checks and we provide out-of-sample predictions for the second half of the season, using the first half as a training set. We use Stan to sample from the posterior distributions via Markov chain Monte Carlo.
Article
Full-text available
Despite soccer being the number one sport in the world in many respects, the “beautiful game” still lags behind other sports in terms of analytics. We propose a weighted plus/minus metric to be used as an instrument to evaluate player performance. An unweighted plus/minus metric subtracts goals conceded from goals scored for each player while on the field of play and are regularly used in hockey or basketball. Key improvements to this established, unweighted +/– metric include control for opponents’ strength, the importance of a particular goal, and considerations for the fact that scoring is less frequent in soccer. The results from three teams (Bayern Munich, VfL Wolfsburg, Werder Bremen) in the German Bundesliga from the 2012-13 season were used as a demonstration and comparison of the two metrics. In addition to the creation of a weighted +/– system, a spatial mapping system of team shots was developed to give a potential visual explanation of why certain players were a net positive/negative influence for their team.
Article
Full-text available
The paper presents a plus-minus rating for use in association football (soccer). We first describe the standard plus-minus methodology as used in basketball and ice-hockey and then adapt it for use in soccer. The usual goal-differential plus-minus is considered before two variations are proposed. For the first variation, we present a methodology to calculate an expected goals plus-minus rating. The second variation makes use of in-play probabilities of match outcome to evaluate an expected points plus-minus rating. We use the ratings to examine who are the best players in European football, and demonstrate how the players' ratings evolve over time. Finally, we shed light on the debate regarding which is the strongest league. The model suggests the English Premier League is the strongest, with the German Bundesliga a close runner-up.
Article
The Elo rating system is one of the most popular methods for estimating the ability of competitors over time in sport. The standard Elo system focuses on predicting wins and losses, but there is often also interest in the margin of victory (MOV) because it reflects the magnitude of a result. There have been few theoretical investigations and comparisons of Elo-based models. In the present study, we propose four model options for an MOV Elo system: linear, joint additive, multiplicative, and logistic. Notations and guidance for tuning each model are provided. The models were applied to men’s tennis for several MOV choices. The results showed that all MOV approaches using within-set statistics improved the predictive performance compared with the standard Elo system, but only the joint additive model yielded unbiased ratings with stable variance in the simulation study. This general framework for MOV Elo ratings provide sports modelers with a new set of tools for building systems to rate competitors and forecast outcomes in sport.
Article
This paper investigates the value of collective judgments which stem from settings that have not been designed explicitly to elicit the ‘Wisdom of Crowds’. In particular, I investigate information obtained from transfermarkt.de, an online platform where a crowd of registered users assess the value of professional soccer players. I show that forecasts of international soccer results based on the crowd's valuations are more accurate than those based on standard predictors, such as the FIFA ranking and the ELO rating. When this improvement in forecasting performance is applied to betting strategies, it leads to sizable monetary gains. I further exploit information on the preferences of individual crowd members in order to investigate whether wishful thinking hampers the accuracy of crowd valuations, but fail to find evidence that such is the case.