Review
Machine learning in men’s professional football: Current applications and future directions for improving attacking play
Mat Herold (1,2), Floris Goes (3), Stephan Nopp (2), Pascal Bauer (2), Chris Thompson (1) and Tim Meyer (1)
Abstract
It is common practice amongst coaches and analysts to search for key performance indicators related to attacking play in football. Match analysis in professional football has predominantly utilised notational analysis, a statistical summary of events based on video footage, to study the sport and prepare teams for competition. Recent advances in technology have facilitated the dynamic analysis of more complex process variables, giving practitioners the potential to quickly evaluate a match with consideration of contextual parameters. One field of research, known as machine learning, is a form of artificial intelligence that uses algorithms to detect meaningful patterns based on positional data. Machine learning is a relatively new concept in football, and little is known about its usefulness in identifying performance metrics that determine match outcome. Few studies and no reviews have focused on the use of machine learning to improve tactical knowledge and performance; most have instead focused on the models used or on machine learning as a prediction method. Accordingly, this article provides a critical appraisal of the application of machine learning in football related to attacking play, discussing current challenges and future directions that may provide deeper insight to practitioners.
Keywords
Football, machine learning, match analysis, soccer, tactical performance
Introduction
The search for key performance indicators related to goal-scoring in elite association football is of great interest to researchers, coaches, and analysts. Historically, researchers and practitioners have predominantly utilised observational analysis to optimise the training process and game preparation of their players and teams.[1] Studies have shown successful football teams create more goal-scoring opportunities than the opposition[2] by penetrating the defence[3] and achieving more entries into the penalty box (defined as ‘an event that took place either when the team in possession of the ball passed it into the opponent’s penalty area, regardless of whether the pass was received by a teammate, or when a player in possession of the ball went into that area of the pitch’).[4–6] These findings are valuable for identifying key game events; however, such time-consuming approaches give little or no consideration to the interaction process between teams, including rapidly changing contextual circumstances. Thus, the demand for more automated approaches to analyse tactical behaviour in men’s professional football is increasing.[7]
In the last decade, the ability to find key individual and team performance indicators has been greatly increased due to technological developments in (but not limited to) automatic tracking systems, video-based motion analysis, and Global Positioning System[8] units.[9] In addition to the widespread use of these systems in training sessions, FIFA has permitted the use of wireless sensors to track player positions and physiological parameters during competitions.[10] Accounting for complex interactions occurring within a match, network analysis[11] and spatio-temporal metrics[12,13] have been employed for dynamic analysis and data simulation in sports.[14–17] Such information can be helpful to guide analysts, coaches, and players in making crucial tactical decisions at the highest levels of elite football.[18]

Reviewers: Ray Stefani (California State University, US), Robert Rein (German Sports University, Germany), Ros Frederic (Université d’Orléans, France)

(1) Institute of Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany
(2) Deutscher Fußball-Bund, Frankfurt am Main, Germany
(3) Center for Human Movement Sciences, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

Corresponding author: Mat Herold, Institute of Sports and Preventive Medicine, Saarland University, Campus B8.2, 66123 Saarbrücken, Germany. Email: mat.herold@uni-saarland.de

International Journal of Sports Science & Coaching 0(0) 1–20. © The Author(s) 2019. Article reuse guidelines: sagepub.com/journals-permissions. DOI: 10.1177/1747954119879350. journals.sagepub.com/home/spo
Aside from current approaches, the application of machine learning in football is an emerging field of research used to reveal trends and distinguish between successful and less successful teams.[19,20] Machine learning is the practice of utilising statistical and computational methods for classification, pattern recognition (similar to data mining) and prediction,[21] and for drawing inferences from datasets consisting of input data without labelled responses.[22] Machine learning is typically divided into two areas: supervised and unsupervised learning. In supervised learning, one aims to optimise a model on a set of labelled training data to fit a given response. Case in point, the team tactic of penetrating passes can be learned by feeding the machine with examples of penetrating passes. Ultimately, supervised learning is aimed at satisfying the following equation

$$y = f(X)$$

where $y$ is a given response which is binary, multi-class, or continuous; $X$ is the data comprising features and $f$ is a function that machine learning attempts to optimise in order to approximate the equation. An example of supervised learning research in football is the work of Wei et al.,[23] who used a decision tree to map player movement to the response of different phases of the game (shots, corners, free kicks, etc.).
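As a minimal sketch of the supervised setting, the following Python example approximates y = f(X) with a k-nearest neighbour classifier, one of the algorithms listed in Table 1. The two features (forward distance gained and number of defenders bypassed), the toy values and the function name `knn_predict` are illustrative assumptions, not data or code from any of the reviewed studies.

```python
from collections import Counter
import math

# Hypothetical labelled training data: each pass is described by
# (forward distance gained in metres, defenders bypassed).
# Label 1 = penetrating pass, 0 = non-penetrating pass. Values are invented.
X_train = [(2.0, 0), (4.0, 0), (5.0, 1), (18.0, 3), (22.0, 4), (15.0, 2)]
y_train = [0, 0, 0, 1, 1, 1]


def knn_predict(x, X, y, k=3):
    """Approximate y = f(x) by majority vote among the k nearest labelled passes."""
    # Rank the training examples by Euclidean distance to the query pass.
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]


long_pass = knn_predict((20.0, 3), X_train, y_train)
short_pass = knn_predict((3.0, 0), X_train, y_train)
print(long_pass, short_pass)
```

Here the labelled examples play the role of the "examples of penetrating passes" fed to the machine; in practice the features would need to be engineered from event or tracking data and scaled before distances are compared.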
In unsupervised approaches, a model aims to uncover structures and patterns in unlabelled data. For complex problems with an unknown desired response, unsupervised machine learning approaches have been used to measure inter-player coordination, team–team interaction including the time preceding key game events such as shots on goal, and compactness.[23–27] Serving as an example of an unsupervised learning approach, Bialkowski et al.[24] used tracking data to define a specific ‘role’ within the formation of the team for individual players at various intervals of the game. More specifically, they utilised minimum entropy data partitioning, which does not rely on a fundamental pre-determined model.
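To illustrate what finding structure in unlabelled data can look like, the sketch below groups hypothetical average player positions with plain k-means. This is a swapped-in, simpler technique (k-means appears for several studies in Table 1) and is not the minimum entropy data partitioning of Bialkowski et al.; the coordinates, the starting centroids and the function name `kmeans` are assumptions made for illustration.

```python
import math

# Hypothetical average pitch positions (x, y in metres) of six outfield players.
positions = [(20.0, 10.0), (22.0, 34.0), (21.0, 58.0),
             (70.0, 12.0), (72.0, 33.0), (69.0, 57.0)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids


# No labels are provided; the two starting centroids are arbitrary guesses.
labels, centroids = kmeans(positions, [(0.0, 34.0), (105.0, 34.0)])
print(labels)
print(centroids)
```

With these toy positions the algorithm separates a deeper and a more advanced line of three players without ever being told which player belongs where, which is the sense in which the approach is unsupervised.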
As illustrated by those examples, machine learning algorithms hold the potential to provide coaches and analysts with additional information to evaluate the game. By identifying specific patterns in large datasets, machine learning models can perform tasks such as automatically identifying team formations,[28] or predicting how players move on offense and defence.[29] Further, due to the subjective perspectives of the coach or scout observing a game and rating the tactics, data obtained from observation often lack objectivity and reliability.[30] In this regard, machine learning may allow a more profound analysis of complex process variables, potentially offering more scientifically backed, evidence-based information to coaches and analysts.[11,31–33]
As new possibilities arise with the use of new data sources and new approaches to analyse these data, there is a demand to further understand the advantages and limitations of applying machine learning methods to football.[34–36] Tracking data contrasts with the traditional event data approach as the volume, variety and precision provide detailed datasets to sports visualization researchers.[37] Professional players nowadays are tracked during every training session and match by different systems, but the quality of data remains in question.[38] Further, it is plausible that large amounts of data without solid theory will not accurately inform decisions and thus, these methods must be meticulously validated. Accordingly, a multi-disciplinary approach, including the collaboration of big data technologies with football research, may facilitate a comprehensive theoretical model and understanding of tactical performance. The aim of this review is to provide a critical appraisal of the literature related to machine learning in football, and in turn, inspire the future application of these complex approaches in a more relevant manner. This review consists of two sections: (a) a review of existing research on machine learning in football related to attacking play, outlining findings and limitations and (b) a discussion emphasising the practical application of machine learning, identifying challenges and suggesting avenues of future research to improve upon the features and practices associated with football performance.
Methods
A descriptive review of the available literature on match analysis in elite professional male football was conducted. Data were collected from the following computerized databases for the period 1996–2018: PubMed, Web of Science, MEDLINE, SPORTDiscus, ResearchGate, Elsevier, and ProQuest. Multiple searches were conducted. The search terms included football, soccer, machine learning, match analysis, performance analysis, game analysis, notational analysis, dynamic analysis and performance indicators.
The inclusion criteria for these articles were (a) relevant data concerning technical and tactical evaluation, (b) participants included professional adult male footballers and (c) written in the English language. Studies were excluded if they (a) included children or adolescents (under 18 years), (b) included females or (c) focused on set plays, small-sided games or futsal.
The focus on professional male football was determined by several factors:
1. Match data are of greater relevance, and more machine learning studies have been conducted using match data than training data. This enables the comparison of different studies.
2. There are behavioural differences between match play and training.[40]
3. Studies using small-sided games have shown important physiological and tactical differences to 11 versus 11 football.[41,42]
4. There are differences in tactical behaviour between youth football and the elite.[43]
To evaluate the included studies, we looked for several characteristics: the general machine learning approach (supervised vs unsupervised), the specific machine learning approach (e.g. neural networks, k-nearest neighbour), the input data used (event vs tracking data) and whether the authors provided sufficient information to reproduce the analysis (see Table 1).
Machine learning in football
The purpose of this section is to review the existing research on machine learning in football related to attacking play, outlining findings and limitations. The section is split into two parts: studies using event data and studies using tracking data.
Machine learning models in football using event data
Event data has been the standard source to quantify and evaluate individual and team performance in recent decades.[20,44] Event data consists of outcome measures such as frequencies, proportions and other accumulated performance indicators of events happening throughout a match.[44] In general, studies using event data were interested in identifying tactical patterns in a game using unsupervised machine learning approaches or predicting individual or team success using supervised approaches.
To derive tactical patterns from match data, Hirano and Tsumoto[45] created a multi-scale matching and rough clustering method based on temporal event data consisting of 168 time-series sequences from 64 games of the 2002 FIFA World Cup. Pass patterns such as side-attacks and zig-zag pass transactions leading to a goal could be automatically clustered. In another study implementing a pattern recognition approach, Montoliu et al.[46] formulated a Bag-of-Words-based method to analyse the most common movements in a dataset of two regular Spanish La Liga matches played by four professional teams. Among other possible uses, common team activities such as ball possession, quick attacks and set-piece plays could be recognized to identify and evaluate one’s own team, and the opponent team’s strengths and weaknesses before, during and after a match. Although the results of those studies were promising, the quality of their models should be questioned due to the relatively low number of data points. In addition, the models were not validated on test sets of data that had not been used to build them.
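The validation concern raised above can be made concrete with a hold-out split, in which part of the data is set aside before model building and used only for evaluation. The following is a minimal sketch under stated assumptions: the single feature (number of passes in a sequence), the labels, the toy values and the helper names `holdout_split` and `nearest_label` are all hypothetical and do not reproduce the methods of the studies discussed.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and reserve a fraction as an untouched test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def nearest_label(x, train):
    """1-nearest-neighbour prediction on a single numeric feature."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


# Hypothetical sequences: (passes in the sequence, label). Values are invented.
# "quick" = quick attack, "poss" = possession play.
data = [(2, "quick"), (3, "quick"), (4, "quick"), (5, "quick"),
        (9, "poss"), (10, "poss"), (11, "poss"), (12, "poss")]

train, test = holdout_split(data)
# Accuracy is computed only on sequences the model never saw during fitting.
correct = sum(nearest_label(x, train) == y for x, y in test)
accuracy = correct / len(test)
print(len(train), len(test), accuracy)
```

Reporting performance on the held-out portion, rather than on the data used to build the model, gives a less optimistic and more trustworthy estimate, which matters most when the number of data points is small.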
A recent study involving a much larger dataset, including 6396 games and 10 million events from the 2013–2014, 2014–2015 and 2015–2016 seasons of the Premier League, Serie A, La Liga, Bundesliga, Dutch Eredivisie and Ligue 1, used machine learning to quantify the relation between performance (passes, crosses, shots, tackles, dribbles, clearances, goalkeeping actions, fouls, interceptions, aerial duels, goals scored and goals conceded) and success based on goal difference.[47] The logistic regression and classification model could predict simulated team rankings close to the actual rankings. The discriminatory features between top teams and bottom teams included producing more passes and shots than the opponent, and committing fewer fouls, tackles and goalkeeping actions. Although the events extracted from the match appear like traditional notational analysis, a team’s playing quality including pass precision and a team’s spatial and temporal dominance including average team position, speed and accelerations could be measured as well. However, there was some observed divergence between simulated and actual rankings due to the erratic nature of football, such as psychological factors and contextual factors either not captured by soccer logs or not measurable by existing technologies. It was concluded that combining their data logs with player tracking data and mathematical models could better describe the spatio-temporal trajectories of players during a match. Reproducing game patterns between two teams would more accurately characterise the relationship between technical performance and success.
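As a rough illustration of how a logistic regression can relate performance features to success, the sketch below fits a two-feature model by gradient descent. The features (pass and shot differences relative to the opponent), the invented values, the learning rate and the function name `fit_logistic` are assumptions for illustration only; the feature set and fitting procedure of the study discussed above are not reproduced here.

```python
import math

# Hypothetical per-match features: (pass difference / 100, shot difference / 10)
# relative to the opponent, with label 1 = win, 0 = loss. Values are invented.
X = [(1.2, 0.8), (0.9, 0.5), (0.4, 0.6), (-0.3, -0.2), (-1.0, -0.7), (-0.6, -0.9)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one match: predicted probability minus label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + error * xj for g, xj in zip(grad_w, xi)]
            grad_b += error
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


w, b = fit_logistic(X, y)
p_dominant = sigmoid(w[0] * 1.0 + w[1] * 0.5 + b)    # out-passing and out-shooting
p_dominated = sigmoid(w[0] * -1.0 + w[1] * -0.5 + b)  # out-passed and out-shot
print(round(p_dominant, 2), round(p_dominated, 2))
```

The sign and size of each fitted weight indicate how a feature shifts the predicted probability of success, which is how discriminatory features between stronger and weaker teams can be read off such a model.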
In addition to pass quality and variability, the location of passes is also a determinant of successful offense. Providing context related to playing in critical pitch zones, Brooks et al.[48] used a k-nearest neighbour approach to qualitatively assess the effect of passes travelling into and out of Zone 14, the zone located in the middle of the pitch immediately outside the opposing penalty area (see Figure 1). Based on all passes from the 2012–2013 Spanish La Liga season, possession in Zone 14 often correlated with shooting opportunities. This is in support of previous studies using notational analysis, where Zone 14 was correlated with assists by making forward passes into the penalty area.[49,50]
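The zone bookkeeping behind such an analysis can be sketched as follows. The example assumes a 105 m by 68 m pitch divided into 18 equal zones (six rows along the length, three columns across the width, numbered row by row from the attacking team's own goal), under which Zone 14 is the central zone of the fifth row. The grid, the pitch dimensions, the toy passes and the function name `pitch_zone` are assumptions; the sketch covers only the zone assignment, not the k-nearest neighbour analysis of Brooks et al.

```python
PITCH_LENGTH = 105.0  # metres, assumed
PITCH_WIDTH = 68.0    # metres, assumed


def pitch_zone(x, y):
    """Map a position to one of 18 zones (6 rows along the length, 3 columns across).

    x runs from the attacking team's own goal line (0) to the opponent's (105);
    y runs across the pitch (0 to 68). Zones are numbered 1 to 18 row by row.
    """
    row = min(int(x / (PITCH_LENGTH / 6)), 5)
    col = min(int(y / (PITCH_WIDTH / 3)), 2)
    return row * 3 + col + 1


# Hypothetical passes as (start_x, start_y, end_x, end_y). Values are invented.
passes = [(60.0, 30.0, 80.0, 34.0),
          (78.0, 35.0, 95.0, 40.0),
          (40.0, 10.0, 55.0, 15.0)]

into_zone_14 = [p for p in passes
                if pitch_zone(p[2], p[3]) == 14 and pitch_zone(p[0], p[1]) != 14]
out_of_zone_14 = [p for p in passes
                  if pitch_zone(p[0], p[1]) == 14 and pitch_zone(p[2], p[3]) != 14]
print(len(into_zone_14), len(out_of_zone_14))
```

Counts of passes into and out of the zone, computed per team or per player, are the kind of zone-based feature that could then be fed to a classifier or correlated with shooting opportunities.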
However, one limitation of using pitch zones is that achieving
Table 1. Data extraction table (machine learning in men’s professional football).

Study: Hirano and Tsumoto [45]
ML approach: Unsupervised (clustering)
Used ML algorithms: Multi-scale matching/induced dissimilarity matrix and rough clustering method
Reproducible details reported: Yes
Study synopsis: Presented a method for grouping pass patterns such as side-attacks after complex pass transactions, and zig-zag pass transactions in soccer game records.
Transfer to practice: Set stage for future work to help coaches identify and execute passing patterns to increase goal-scoring.
Data: 64 games of the 2002 FIFA World Cup
Tracking versus event data: Event

Study: Joseph et al. [81]
ML approach: Supervised (classification)
Used ML algorithms: Several techniques were used: MC4, a decision tree learner; Naive Bayesian learner; Data Driven Bayesian and a K-nearest neighbour learner
Reproducible details reported: Yes
Study synopsis: With a MC4 learner, identified attributes which have the largest effect on the outcome of the game. The Bayesian network looked for correlations between the values of the attributes including the result.
Transfer to practice: Can help analyse and identify important factors from past games.
Data: Matches played by Tottenham Hotspur (1995–1997)
Tracking versus event data: Event

Study: Hucaljuk and Rakipović [80]
ML approach: Supervised (prediction)
Used ML algorithms: Naïve Bayesian Network, KNN, ANN, Random Forests
Reproducible details reported: Yes
Study synopsis: Based on number of injuries, goals, team formation and other factors, trained a supervised ML model to predict scores. ML model had 60% accuracy in predicting the score.
Transfer to practice: Model can be improved and then can be used to predict score and prioritise tactics accordingly.
Data: The group stage of Champions League (96 matches)
Tracking versus event data: Event

Study: Lucey et al. [61]
ML approach: Supervised (classification)
Used ML algorithms: k-nearest neighbours
Reproducible details reported: Yes
Study synopsis: Proposed a method to characterise team behaviour for soccer by representing team behaviour via play segments, which are spatio-temporal descriptions of ball movement over fixed windows of time. Using these representations, it characterized team behaviour from entropy maps, which give a measure of predictability of team behaviours across the field.
Transfer to practice: Can be used to predict tactics using spatio-temporal data.
Data: 2010–2011 English Premier League soccer data
Tracking versus event data: Tracking

Study: Chassy [64]
ML approach: Unsupervised (clustering)
Used ML algorithms: PCA for clustering. Correlation to judge efficacy of passing and compare it with possession
Reproducible details reported: Yes
Study synopsis: Explored the idea that a football team can be formalised as a self-organising system. By applying the definition of self-organisation to football, the study concluded that team play constitutes the core of performance.
Transfer to practice: Coaches can use this information to improve passing behaviour.
Data: Data from the 2013 European Champions League
Tracking versus event data: Event
Study: Wei et al. [23]
ML approach: Supervised (classification) and Unsupervised (clustering)
Used ML algorithms: k-means, PCA, Decision Trees
Reproducible details reported: Yes
Study synopsis: Presented a method for large-scale analysis of team behaviour across large volumes of player and ball tracking data. Clustered plays of a team to describe their most likely motion patterns associated with an event (such as shots, corners, free-kicks). Proposed a two-layer hierarchical approach to automatically segment a match. Using a decision-tree formulation, can accurately retrieve events or detect highlights.
Transfer to practice: A useful method to improve the understanding about decision-making and identify patterns related to goal-scoring.
Data: Tracking data across nine complete matches from a top-tier European soccer league
Tracking versus event data: Tracking

Study: Bialkowski et al. [28]
ML approach: Unsupervised (clustering)
Used ML algorithms: Expectation Maximization (like k-means)
Reproducible details reported: Yes
Study synopsis: Could automatically detect and visualise formations from soccer player tracking data. Utilizing an entire season’s data from Prozone from a top-tier professional league, they used an automatic approach to see whether the formation they played had anything to do with explaining why teams are more successful at home rather than away. Could determine if formation had anything to do with playing home versus away, discovering teams at home play higher up the pitch.
Transfer to practice: Can help analysts discover and implement new tactical behaviours.
Data: Top-tier professional league consisting of 20 teams who played every team home and away; the data consisted of the location of every player at 10 frames-per-second as well as the ball events.
Tracking versus event data: Tracking
Study: Bialkowski et al. [24]
ML approach: Unsupervised (clustering)
Used ML algorithms: Expectation Maximization, k-means clustering, Minimum Entropy data partitioning
Reproducible details reported: Yes
Study synopsis: Presented a role-based representation to represent player tracking data, which was found by minimizing the entropy of a set of player role distributions. Showed how this could be efficiently solved using an EM approach which simultaneously assigns players to roles throughout a match and discovers the team’s overall role distributions.
Transfer to practice: Can help perform individual player and team analysis, including large-scale team analysis over a full season.
Data: One season of ball tracking data from a professional soccer league (approximately 400,000,000 data points)
Tracking versus event data: Tracking

Study: Lucey et al. [73]
ML approach: Supervised (prediction) and Unsupervised (clustering)
Used ML algorithms: Logistic Regression and Conditional Random Field
Reproducible details reported: Yes
Study synopsis: Found that not only is the game phase important (i.e. corner, free-kick, open-play, counter-attack etc.), the strategic features such as defender proximity, interaction of surrounding players, speed of play, coupled with the shot location play an impact on determining the likelihood of a team scoring a goal.
Transfer to practice: Can help coach know important factors leading to a goal.
Data: One season ball tracking data taken from a professional league (approximately 400,000,000 data points)
Tracking versus event data: Both

Study: Eggels and Pechenizkiy [54]
ML approach: Supervised (prediction)
Used ML algorithms: Logistic Regression, Decision Trees, Random Forests
Reproducible details reported: Yes
Study synopsis: Proposed a method to determine the expected winner of a match by estimating the probability of scoring for the individual goal-scoring opportunities. The outcome of a match is then obtained by integrating these probabilities.
Transfer to practice: Can pre-strategise according to predicted results obtained using the proposed model.
Data: Event data tracked by ORTEC. Data about the quality of players extracted from the web. Spatio-temporal data from seven different leagues over three seasons
Tracking versus event data: Both, but only classified with event data due to accuracy issues with ORTEC

Study: Fernando et al. [25]
ML approach: Unsupervised (clustering)
Used ML algorithms: k-means
Reproducible details reported: Yes
Study synopsis: Presented a method which can be used to compare the scoring methods of teams in soccer using fine-grained player tracking and ball-event data.
Transfer to practice: Help compare goal-scoring style of teams.
Data: One season of tracking data from Prozone consisting of 20 teams
Tracking versus event data: Tracking
Study: Horton et al. [57]
ML approach: Supervised (classification)
Used ML algorithms: Multinomial Logistic Regression
Reproducible details reported: Yes
Study synopsis: Presented a motion model that calculates regions such as ‘passable regions’ for every pass that can learn a classifier to rate the quality of passes made during a football match with an accuracy of up to 85.8%. Evaluated passes according to difficulty or decision quality.
Transfer to practice: Help use quality of pass to train players on improving their passing behaviour.
Data: Four home games of Arsenal Football Club
Tracking versus event data: Both

Study: Montoliu et al. [46]
ML approach: Supervised (classification)
Used ML algorithms: Bag of Words, k-Nearest Neighbour, the Support Vector Machine and the Random Forest
Reproducible details reported: Yes
Study synopsis: Proposed a new method based on using local motion features and a Bag-of-Words strategy to characterise short football video clips. Approach can correctly distinguish between actions related to Ball Possession, Quick Attack and Set-Piece plays. The proposed Bag-of-Words feature vector appropriately captures the behaviour of players. The proposed method is useful to analyse the most common movements of a team when playing a match. The Random Forest classifier obtains the best classification.
Transfer to practice: Can be used to predict opponent activity and prepare for it more quickly. Identify the opponent team’s strengths and weaknesses before the match, obtain the opponent’s main tactics during a match, observe his/her team’s performance during a match and evaluate the players as a whole or individually immediately after the match, among other possible applications.
Data: A private dataset of football videos of four games
Tracking versus event data: Event

Study: Ruiz et al. [53]
ML approach: Supervised (prediction)
Used ML algorithms: Multilayer Perceptron (algorithm for supervised learning of binary classifiers)
Reproducible details reported: Yes
Study synopsis: Model can predict shot expectancy accurately and can be used to strategise for an advantage.
Transfer to practice: Modelling the expected goal value of shots gives an additional level of detail to the analysis of offensive and defensive performance in football.
Data: 10,318 shots taken during the 2013–2014 English Premier League, extracted from Prozone Matchviewer event data
Tracking versus event data: Event

Study: Tax and Joustra [82]
ML approach: Supervised (prediction)
Used ML algorithms: PCA, Naïve Bayesian, Multilayer Perceptron
Reproducible details reported: Yes
Study synopsis: Model can predict result of a match using relevant input up to a high accuracy.
Transfer to practice: Can be used to determine areas where a team needs to improve on and/or plan strategy to combat the expected result.
Data: A public data based match prediction system for the Dutch Eredivisie
Tracking versus event data: Event
Study: Bialkowski et al. [22]
ML approach: Unsupervised (clustering)
Used ML algorithms: Expectation Maximization, k-means clustering, Minimum Entropy data partitioning
Reproducible details reported: Yes
Study synopsis: Could identify team structure or ‘formation’ which served as a strong descriptor for identifying a team’s style.
Transfer to practice: Can be useful for strategic planning, evaluation and tactical adjustments.
Data: 20 teams who played home and away. 38 matches for each team or 380 matches overall. Six of these matches were omitted due to missing data
Tracking versus event data: Tracking

Study: Brooks et al. [48]
ML approach: Supervised (classification and prediction)
Used ML algorithms: k-Nearest Neighbours
Reproducible details reported: Yes
Study synopsis: Using heatmaps and KNN, ranked offensive players and identified team patterns using passing data.
Transfer to practice: Can help improve passing behaviour.
Data: 2012–2013 La Liga season. Dataset provided by Opta Sports.
Tracking versus event data: Event

Study: Feuerhake [26]
ML approach: Supervised (classification) and Unsupervised (clustering)
Used ML algorithms: k-means, Apriori, Levenshtein distances, DBSCAN, Euclidean movement space
Reproducible details reported: Yes
Study synopsis: Used distances, clustering and classification to analyse sequence of movements (pattern recognition) in a soccer game.
Transfer to practice: Needs more research, but being able to predict the trajectories of players can potentially help coaches devise tactical plans and for player selection that favours different playing styles.
Data: Three datasets with different characteristics. The first two datasets contain movement information of players during football matches and the third contains car trajectories and thus is from a completely different context
Tracking versus event data: Tracking

Study: Knauf et al. [67]
ML approach: Unsupervised (clustering)
Used ML algorithms: Clustering tasks and k-medoids. Temporal kernels were Gaussian kernel combined with three different spatial kernels
Reproducible details reported: Yes
Study synopsis: Presented spatio-temporal convolution kernels for multi-object scenarios.
Transfer to practice: Clusters can help coaches identify how teams behave in attack. That is, teams preferring many short moves that involve multiple players versus teams utilising long moves with more linear actions and less ball contacts.
Data: 10 soccer games of the German Bundesliga from the 2011–2012 seasons
Tracking versus event data: Tracking

Study: Szczepański and McHale [60]
ML approach: Supervised (classification and prediction)
Used ML algorithms: Additive Mixed Modelling
Reproducible details reported: Yes
Study synopsis: Presented a method which can be used to evaluate passing skill of footballers controlling for the difficulty of their attempts. Combined proxies for various factors influencing the probability that a pass is successful in a statistical model and evaluate the inherent player skill in this context.
Transfer to practice: Coaches can use it to improve passing for players.
Data: 2006–2007 season of the English Premier League provided by Opta
Tracking versus event data: Event
Study: Chawla et al. [59]
ML approach: Supervised (classification)
Used ML algorithms: Multinomial Logistic Regression
Reproducible details reported: Yes
Study synopsis: Present a model that can teach a classifier to rate the quality of passes made during a football match with an accuracy of up to 85.8%. Furthermore, a rating of the quality of each of the passes made in the four matches was made by two human observers.
Transfer to practice: Help use quality of pass to train players on improving their passes.
Data: Four home matches played by Arsenal Football Club (2007–2008)
Tracking versus event data: Both

Study: Le et al. [29]
ML approach: Supervised (prediction)
Used ML algorithms: LSTM Neural Networks
Reproducible details reported: Yes
Study synopsis: Generated the defensive motion pattern of the ‘league average’ team, resulting in a similar expected goal value (69.1% for Swansea and 71.8% for the ‘league average’ ghosts).
Transfer to practice: Further research can help build automated strategies against specific opponents.
Data: 100 games of player tracking and event data from a professional soccer league
Tracking versus event data: Tracking

Study: Memmert et al. [27]
ML approach: Supervised (classification) and Unsupervised (clustering)
Used ML algorithms: Neural Networks
Reproducible details reported: Yes
Study synopsis: Demonstrated different kinds of computer science approaches to obtain and analyse new parameters such as inter-player coordination, inter-team coordination before critical events and team–team interaction and compactness coefficients.
Transfer to practice: Potentially help coaches modify their training methods (e.g. focusing on recent trends in game philosophy and tactics) according to their needs and to improve the tactical behaviour of their players. Can be an important step towards objectification of tactical performance components in team sports.
Data: A single set of position data from an 11 versus 11 match (Bayern Munich against FC Barcelona)
Tracking versus event data: Event

Study: Pappalardo and Cintia [47]
ML approach: Supervised (classification and prediction)
Used ML algorithms: Logistic Regression
Reproducible details reported: Yes
Study synopsis: A team’s position in a competition final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Victory and defeats can be explained by the team’s performance during a game, but it is difficult to detect draws.
Transfer to practice: Helps coaches prioritise objectives and predict success based on performance features correlated to success.
Data: More than 6000 games and 10 million events in six European leagues
Tracking versus event data: Event
Study: Power et al. [75]
ML approach: Supervised (classification) and Unsupervised (clustering)
Used ML algorithms: Logistic Regression and k-means clustering
Reproducible details reported: Yes
Study synopsis: Presented an objective method of estimating the risk and reward of all passes using a supervised learning approach.
Transfer to practice: Can be used to coach and choose players according to opposition in terms of pass risk and reward.
Data: English Premier League games between 2014–2015 and 2015–2016 seasons totalling 726 matches
Tracking versus event data: Both

Study: Rathke [52]
ML approach: Supervised (prediction)
Used ML algorithms: Logistic Regression
Reproducible details reported: Yes
Study synopsis: Demonstrated the value and reliability that xG has within professional football. The variables of distance and angle together were seen to have a major impact on calculating xG rather than distance as a variable alone.
Transfer to practice: No direct practical application; however, it could be incorporated into training and as a teaching tool on both offense (how attacking players should strike certain shots) and defence (optimise positioning to defend dangerous shots).
Data: Shots from the Premier League and Bundesliga games (380 and 306) from the 2012–2013 season
Tracking versus event data: Event

Study: Ruiz et al. [53]
ML approach: Supervised (prediction)
Used ML algorithms: Multilayer Perceptron
Reproducible details reported: Yes
Study synopsis: Model can predict shot expectancy accurately and can be used to strategise.
Transfer to practice: Modelling the expected goal value of shots gives an additional level of detail to the analysis of offensive and defensive performance in football.
Data: 10,318 shots taken during the 2013–2014 English Premier League, extracted from Prozone Matchviewer event data
Tracking versus event data: Event

Study: Barron et al. [51]
ML approach: Supervised (classification and prediction)
Used ML algorithms: Multilayer Artificial Neural Network
Reproducible details reported: Yes
Study synopsis: Using match and player data and artificial neural network, the model learns which player can be helpful or conducive to team by predicting his career trajectory using quantifiable features.
Transfer to practice: Possibly useful for player recruitment.
Data: Performance data from 966 outfield players’ 90-min performances in the English Football League
Tracking versus event data: Event

Study: Goes et al. [62]
ML approach: Supervised (prediction) and Unsupervised (clustering)
Used ML algorithms: Multiple Linear Regression, PCA
Reproducible details reported: Yes
Study synopsis: Presented approach is the first quantitative model to measure pass effectiveness based on tracking data that are not linked directly to goal-scoring opportunities.
Transfer to practice: Help coaches train their players to increase the efficacy of passes.
Data: 18 competitive professional soccer matches of 1 team against 13 different teams during the 2017–2018 Dutch premier league (Eredivisie)
Tracking versus event data: Tracking
possession in a specific zone does not guarantee or provide information about whether a team is penetrating the opponent’s defence. As an illustration, a team with possession in Zone 14 can still face an opposition with 11 players behind the ball, such as on a rebound after a set play or against a team that tactically emphasises a defensive structure. Moreover, long completed passes showed a negative relationship with shots taken, suggesting teams should consider how the ball arrives in Zone 14 [48].
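As a minimal sketch of the zone concept, the following Python function maps a pitch position to one of the 18 zones of the six-by-three grid described in Figure 1. The pitch dimensions (105 m by 68 m), the coordinate convention and the row-by-row numbering starting at the team's own goal are assumptions made for illustration and are not taken from the reviewed studies; the name `pitch_zone` is hypothetical.

```python
def pitch_zone(x, y, length=105.0, width=68.0):
    """Return the zone number (1-18) of a position on a six-by-three grid.

    Assumptions (illustrative only): x runs from 0 at the own goal line to
    `length` at the opponent's goal line, y runs from 0 to `width` across
    the pitch, and zones are numbered row by row, three zones per row.
    """
    if not (0 <= x <= length and 0 <= y <= width):
        raise ValueError("position lies outside the pitch")
    row = min(int(x / (length / 6)), 5)  # 0..5 along the length of the pitch
    col = min(int(y / (width / 3)), 2)   # 0..2 across the width of the pitch
    return 3 * row + col + 1


# A central position just outside the opponent's penalty area.
print(pitch_zone(80.0, 34.0))
```

Under these assumptions, counting possessions per zone is only a first step; as the paragraph above notes, the zone alone says nothing about how many defenders are behind the ball.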
In a study demonstrating the potential for machine learning to be used in the scouting and recruitment process in professional football, technical performance data were collected from 966 outfield players in the English Football League Championship during the 2008–2009 and 2009–2010 seasons [51]. Key performance indicators that influence players’ league status and accurately predict their future success in football were identified using quantifiable features, circumventing the subjectivity and bias of traditional notational observation. Players most likely to end up in the English Premier League averaged the fewest unsuccessful first-time passes, had a higher mean number of possessions and averaged more passes to teammates in the penalty area. The authors stated that future work assessing players should account for positional differences as well as pass accuracy over varying distances and directions in specific areas of the pitch.
In recent years, expected goal value (xG) models have been developed to evaluate the offensive performance of players and teams. An xG model assigns a value between 0 and 1 (with 1 being the maximum and representing a certain goal) to every attempt based on the quality of the shot (e.g. assist type, shot angle and distance from goal, whether it was a headed shot etc.), so that summing over attempts reflects both the quantity and quality of shots taken. While a variety of these models have proven valuable in predicting shooting outcomes and scouting for players with high conversion rates, they either did not acknowledge [52,53] or did not capture [54] opponent positioning and thus failed to provide context that coaches and analysts can apply to the match. In general, studies conducted to predict individual or team success used larger samples to build their models. However, they mostly relied on rather simple regression techniques for classification or prediction.
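To make the idea of a regression-based xG model concrete, the sketch below computes two of the shot features mentioned above (distance and angle to goal) and passes them through a logistic function. The coefficients are illustrative placeholders rather than fitted values from any of the cited models, and the coordinate convention and function names are assumptions.

```python
import math

GOAL_WIDTH = 7.32  # metres, standard goal width


def shot_features(x, y):
    """Distance (m) and goal-mouth angle (rad) for a shot taken at (x, y).

    Assumed convention: x is the distance from the goal line and y the
    lateral offset from the centre of the goal, both in metres.
    """
    distance = math.hypot(x, y)
    # Angle subtended by the two goalposts as seen from the shot location.
    angle = math.atan2(GOAL_WIDTH * x, x**2 + y**2 - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


def expected_goal(x, y, intercept=-1.0, w_distance=-0.1, w_angle=1.5):
    """Logistic-regression style xG; the weights are placeholders, not fitted."""
    distance, angle = shot_features(x, y)
    z = intercept + w_distance * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


# A central shot from close range versus a wide shot from long range.
print(expected_goal(6.0, 0.0), expected_goal(25.0, 15.0))
```

A model of this form scores every attempt independently of where the defenders are, which is exactly the missing context criticised above.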
Machine learning models using tracking data
In the second part of the review, we present studies using tracking data to build their machine learning models. One of the advantages of tracking data is that it allows analysts to make quantitative comparisons between two teams, different groups of players or among different individual players [55]. Therefore, newer studies using machine learning approaches tend to use the richer input data. In general, the use of machine learning approaches in this section can be split into four thematic areas. The first includes studies that try to find patterns of pass sequences or evaluate passes of individual players, while the second looks at the same characteristics on a team level. The third line of research investigated the use of time and space to create goals and goal-scoring opportunities. The last is concerned with defending or regaining the ball (in this case the concept of pressing).

Table 1. Continued

Study: Hobbs et al. [77]
ML approach: Unsupervised (clustering)
Used ML algorithms: Spatio-temporal Trajectory Clustering
Reproducible details reported: Yes
Study synopsis: Could objectively and automatically identify counter-attacks and counter-pressing without requiring unreliable human annotations.
Transfer to practice: Knowledge about the types of plays a team runs immediately after regaining the ball has use for tactical planning and game adjustment.
Data: Counter-attack data both for and against each team during the 2016–2017 English Premier League
Tracking versus event data: Tracking

Study: Spearman [76]
ML approach: Supervised (classification)
Used ML algorithms: Logistic regression and Bayesian approach
Reproducible details reported: Yes
Study synopsis: Presented a new approach to using tracking data to quantify off-ball scoring opportunity. This metric can be used as a leading indicator of future player scoring, and there are many possible applications for the opportunity model.
Transfer to practice: Can be used to improve players’ off-ball movement and improve passing decisions. Use as a scouting tool to identify players who are making good movements off the ball but are not being rewarded with passes.
Data: 58 matches played between teams from a 14-team professional soccer league during the 2017–2018 season
Tracking versus event data: Both

ML: machine learning; PCA: principal component analysis; KNN: k-nearest neighbors; ANN: artificial neural network; DBSCAN: density-based spatial clustering of applications with noise; LSTM: long short-term memory; EM: expectation maximization.

Pass pattern recognition and classification. Advanced analysis on offensive tactics has mostly included the study of movement patterns and passing behaviour. For instance, using simulation data, Gudmundsson and Wolle [56] created a 2D model based on Fréchet distance and the average Euclidean distances between trajectory paths to cluster sequences of passes between distinct players. After inputting 23 trajectories representing the movement of the ball and 22 players on the pitch, the model automatically output a description of every possible pass. They could then evaluate players’ ability to execute a pass, become available for a pass and receive a pass, as well as measure a player’s perception and decision-making ability based on all passes a player made, and every time the player did not make a pass. The authors concluded that the incorporation of a classification scheme (including examples of ‘good’ and ‘bad’ passes determined by expert football analysts) would be necessary before the tool could be used by coaches and analysts on a practical level.
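The trajectory comparison underlying such clustering can be illustrated with the discrete Fréchet distance, a common dynamic-programming approximation of the continuous measure. This is a sketch of the general technique and not of Gudmundsson and Wolle's implementation; the function name and the example trajectories are hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two trajectories.

    p and q are non-empty lists of (x, y) points sampled along each path.
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: best coupling up to p[i], q[j]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


# Two hypothetical runs along parallel lines, two metres apart.
run_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
run_b = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]
print(discrete_frechet(run_a, run_b))
```

Unlike the average Euclidean distance, this measure respects the order in which points are visited, so two paths covering the same ground in opposite directions are treated as dissimilar.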
Implementing such a classification scheme, Horton et al. [57] added to Gudmundsson and Wolle’s [56] motion models by incorporating a multinomial logistic regression model (supervised learning). The input data came from four home matches played by Arsenal Football Club and consisted of 2932 observations; the input features included location of players, player trajectories, strategic positioning of a team based on dominant regions [58] and physiological attributes, using a motion model to determine how quickly a player can reach a given point. As a benchmark against which to evaluate the machine learning classifier, the inter-rater agreement between the two observers was calculated using Cohen’s kappa coefficient. With a score of 0.393, the authors concluded the experts were in moderate agreement. Due to a limited number of observations, the labels were condensed from a 6-class problem to a 3-class problem, and the model could then predict the quality of a pass as Good, OK or Bad with an accuracy of 85.8%. Although accuracy was high, precision and recall were relatively low, indicating that false negatives and false positives were frequent. In other words, many passes classified as ‘Good’ were in fact ‘Bad’ or ‘OK’, and vice versa. The greatest limitations of this paper are the small amount of data and the use of only two observers, which is not enough to reach a consensus. Despite these limitations, the methodology set the stage for future work, including the design of improved predictor variables.
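The agreement and error measures discussed above can be sketched in a few lines of Python. The labels below are invented for illustration and do not come from Horton et al.; the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(truth, predicted, label):
    """Precision and recall of the classifier for a single class label."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented ratings of eight passes by two observers.
observer_1 = ["Good", "Good", "OK", "Bad", "OK", "Good", "Bad", "OK"]
observer_2 = ["Good", "OK", "OK", "Bad", "Good", "Good", "OK", "OK"]
print(cohens_kappa(observer_1, observer_2))
print(precision_recall(observer_1, observer_2, "Good"))
```

For orientation, the widely used Landis and Koch benchmarks label kappa values of 0.21 to 0.40 as fair and 0.41 to 0.60 as moderate, so a score of 0.393 sits at the boundary between the two bands.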
Further expanding on the earlier work exploring pass classification, a supervised machine learning model was developed by Chawla et al. [59] to automatically classify the quality of all passes on the field. Passes were labelled as ‘Good’, ‘OK’ or ‘Bad’, with an accuracy level of 90.2% between the classifier ratings and the ratings made by a human observer. Whilst the accuracy was high, the subjective rankings are limited in that they did not give quantitative measurements of pass effectiveness or information about ball velocity, ball trajectory and what direction players were facing during each pass (a common issue with tracking data). Despite these challenges, the technological progress enabling analysts to classify passes can be used as a scouting tool. By applying spatio-temporal metrics to player trajectories, Feuerhake [26] could recognise and predict individual and group movement patterns, including differences between playing positions. For example, wing players mainly use straight runs along the sideline, whereas central midfielders display more freedom and significantly more turns. Due to computational complexities, real-time information about why a player might have changed their behaviour could not be provided. Another study focused on creating an individual evaluation tool for player recruitment that evaluated the passing ability of individual players by controlling for the difficulty of their attempts based on the probability of completion [60]. Contextual factors such as the skill of the player and the conditions of the player passing the ball could not be considered, and only the distance between the passer and the receiver could be derived directly from the model. Thus, comparing players performing similar types of passes, in similar circumstances, was most useful. The authors suggested that future work should account for different playing positions and defensive pressure, and that passes should be evaluated by their value for the team rather than their difficulty.

Figure 1. Zone 14: By dividing the field into a six-by-three grid, there are 18 zones on the pitch. Zone 14 is the zone located in the middle of the pitch immediately outside the penalty area.

To sum up this section, authors used more complex machine learning approaches to cluster (unsupervised) or classify (supervised) passes. Compared to the event data studies, a higher degree of data variety could be observed as authors tried to enrich their models. In addition, the methodology used in those studies was solid, as they validated their models with a test set that was not part of the training data.
Team passing behaviour. In a first attempt to utilise artificial intelligence to provide process-oriented tactical insight in a football match, the validity of an unpredictable passing strategy was investigated by tracing play-segments, defined as spatio-temporal descriptions of ball movement (where the ball started and ended) over fixed windows of time [52]. Measuring ball trajectories in 380 games from the English Premier League showed that a high level of entropy, a measure of the unpredictability of team behaviour, was an attribute of the top 5 ranked teams. This was especially the case around the penalty area, where more defensive players are trying to protect their goal [61]. The authors then attempted to show how discriminative their entropy map approach was by identifying the home versus away team through classification of playing style using a k-nearest neighbour approach. Combining their entropy map approach with 23 match statistics currently used in analysis (e.g. passes, shots, tackles, fouls, aerials, possession and time in-play), they reached 47% classification accuracy. Coaches and analysts can apply this information to measure their team’s entropy levels and those of their opponents. However, what types of off-ball movements encouraged the greater variety of passing, and how specific passing sequences lead to goal-scoring, could not be identified.
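The notion of entropy used here can be illustrated with the Shannon entropy of a discretised distribution of ball movements. This is a sketch under the assumption that each play-segment starting in a given area is summarised by a destination label; how the cited study actually builds its entropy maps may differ, and the labels below are invented.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of `events`."""
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical destinations of play-segments starting in one pitch area.
predictable_team = ["left wing"] * 8
varied_team = ["left wing", "right wing", "centre", "back pass"] * 2

print(shannon_entropy(predictable_team))
print(shannon_entropy(varied_team))
```

A team that always moves the ball to the same destination scores zero, while a team spreading its play evenly over four destinations reaches the maximum of two bits, which is the sense in which higher entropy means less predictable behaviour.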
Earlier work using notational analysis provided evidence that direct counter-attacking tactics effective against imbalanced defences (defined ‘as only without second defender within 5 m estimated distance from first defender’) are not necessarily effective against balanced ones [4]. For that reason, the quality of a pass should be evaluated by the effect it has on the opposition. Accounting for the interactive dynamics of both teams to determine the effects of passing behaviour, Goes et al. [62] considered previous research on team centroid, spread and surface [63] in the evaluation of defensive organisation. Goes et al. [62] calculated the defensive disruptiveness (D-Def) score as an index that represents the change in defensive organisation resulting from a pass. Based on the line formations and starting formations (substitutions accounted for) of teams provided by coaches before the match, D-Def was calculated from the displacement of the average X and Y positions (or centroids) for the full team, as well as the defensive, midfield and attacking lines, between the moment a pass was given (t0) and 3 s later (t0 + 3). The authors discovered that greater amounts of individual movement (I-Mov) occurring after a pass result in greater disruption of the defence (a higher D-Def score). Moreover, they could distinguish top, average and low performance passes, and determine which players are more effective passers, using a one-way analysis of variance comparing the I-Mov, D-Def, pass length, pass angle and pass velocity in the top 10%, average 80% and bottom 10% of passes ranked on D-Def score. Consistent with the findings of Chassy [64], the speed and precision of passes are predictors of success, causing greater D-Def scores. Although passes in a slightly more forward direction produced the best D-Def scores, passing angle was not a determining factor for the effectiveness of passes. This was the first model that did not favour passes in the forward direction, demonstrating the value of backwards and sideways passes in the overall attacking process. Still, previous studies have shown that teams with increased space control in the attacking third have a greater chance of winning [65], with scoring probability increasing as the distance from the goal decreases and centrality increases [66]. Therefore, Goes et al. [62] suggested that to rate the actual effectiveness of a pass, future work should incorporate pitch values to measure space creation and investigate the relationship between D-Def and game outcome (goals). Furthermore, by only including successful passes, the authors were unable to measure decision-making or determine whether a certain player was a good passer overall.
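The building block of such a score, the displacement of a group centroid between the moment of the pass and 3 s later, can be sketched as follows. This shows only the centroid displacement for one group of players; how Goes et al. combine the displacements of the full team and the individual lines into the D-Def index is not specified here, and the positions below are invented.

```python
import math


def centroid(positions):
    """Average (x, y) position of a group of players."""
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def centroid_displacement(positions_t0, positions_t3):
    """Distance the group centroid moved between t0 and t0 + 3 s."""
    (x0, y0), (x1, y1) = centroid(positions_t0), centroid(positions_t3)
    return math.hypot(x1 - x0, y1 - y0)


# Hypothetical defensive line (metres) at the pass and 3 s later.
line_t0 = [(30.0, 10.0), (30.0, 25.0), (30.0, 40.0), (30.0, 55.0)]
line_t3 = [(33.0, 14.0), (33.0, 29.0), (33.0, 44.0), (33.0, 59.0)]
print(centroid_displacement(line_t0, line_t3))
```

Because the measure depends only on how the defenders move, a backwards or sideways pass that drags the line out of shape can score as highly as a forward pass.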
In a related work analysing both player trajectories and passing behaviour for two different teams in the Bundesliga 2011–2012 season, Knauf et al. [67] used spatio-temporal convolution kernels to extract strategies used during the build-up phase of attack and during scoring opportunities. The authors could identify the difference between teams utilising rehearsed methods consisting of shorter passes amongst several players and teams using more chaotic approaches characterised by long, straight passes.
The studies presented in this section show considerable potential for practical applications. A main weakness of most of them is that their data samples are barely large enough for the methods used.
Team behaviour related to time, space and goal-scoring. Studies using a multiple regression model [68] and comparative analysis [69] have reported that successful teams create a higher number of shots and shots on target. Other studies using notational analysis suggest that rather than the total number of shots, shot effectiveness best discriminated between successful and unsuccessful performance [44,70]. Machine learning methods based on tracking data have also been used to evaluate offensive tactics related to the management of space and time [71,72]. Applying a machine learning approach to gain insight into the process behind creating more effective scoring chances, Lucey et al. [73] used the spatio-temporal patterns of the 10-s window before a shooting attempt to determine that expected goals depend on several factors. These include the interaction of surrounding players, speed of play and, in support of the observational analysis by Schulze et al. [74], shot location and defender proximity to the shooter. Compared to statistics such as shots and shots on goal, which do not provide information on the quality of the shooting attempt, their xG model could provide a better approximation of whether a team was ‘dominant’, with a higher number of quality (high xG value) chances, or ‘lucky’, with significantly fewer quality chances while still winning the match. However, an isolated finding such as ‘a play which has the left-winger controlling down the left uncontested and then slotting the ball between the back four to a player in the six-yard box results in a chance of 70.59%’ (p.7) [73] does not explain the origins of that scenario, nor how the probabilities may change depending on specific players being utilised against various opponents.
In another study involving the concept of a 10-s window before a shot, Power et al. [75] used tracking data to evaluate passes based on risk (the likelihood of executing a pass in each situation) and reward (the likelihood of a pass creating a chance). They defined a Dangerous Pass (DP) as an attempted pass that has a >6% chance of leading to a shot in the next 10 s. It was discovered that the passes with the highest reward (DP) occurred around the edge of the penalty area, and these passes also have the highest risk. The ability of a team to play high reward passes that lead to scoring chances also depends on the function of players not in possession of the ball. In a similar manner to calculating expected goals based on instantaneous game state, Spearman [76] created opportunity maps based on spatio-temporal tracking data to measure ‘off-ball scoring opportunity’ (OBSO), the probability that a player currently not in possession of the ball will score. This study also measured individual finishing ability, team attacking trends and team decision-making within and around the penalty area by comparing OBSO to actual goals scored. The model suggested that certain players get into dangerous positions but are not always rewarded with a pass, and some players have distinct zones of the pitch where they are more dangerous. This information is valuable to coaches and analysts as a tool to scout for prospective players, prepare for specific opponents, evaluate offensive movement and decision-making, and answer questions about how some players’ off-ball movement gives them seemingly greater ‘instinctive’ qualities. This model, however, fails to assess how different types of defensive pressure and player speed influence the ability to successfully deliver a pass to a teammate. Furthermore, this method of measuring OBSO does not include information about how individual skill and awareness determine conversion rates, and why some players and teams have lower conversion rates than others.
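The Dangerous Pass definition of Power et al. amounts to thresholding a model's reward estimate, which the following sketch illustrates. The two probabilities are assumed to come from already fitted risk and reward models; the class, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass

DANGEROUS_PASS_THRESHOLD = 0.06  # more than a 6% chance of a shot within 10 s


@dataclass
class PassEstimate:
    completion_prob: float  # risk side: likelihood that the pass is executed
    shot_prob_10s: float    # reward side: likelihood of a shot in the next 10 s


def is_dangerous(estimate: PassEstimate) -> bool:
    """Flag a pass as a Dangerous Pass (DP) under the >6% definition."""
    return estimate.shot_prob_10s > DANGEROUS_PASS_THRESHOLD


# Invented model outputs for three attempted passes.
passes = [
    PassEstimate(completion_prob=0.92, shot_prob_10s=0.01),
    PassEstimate(completion_prob=0.55, shot_prob_10s=0.12),
    PassEstimate(completion_prob=0.40, shot_prob_10s=0.06),
]
dangerous = [p for p in passes if is_dangerous(p)]
print(len(dangerous))
```

Keeping the completion probability alongside the reward makes the trade-off reported above visible: in this invented example, the only pass flagged as dangerous is also one that is far from certain to be completed.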
Defence and pressing tactics related to attacking play. In a continuous, interactive sport like football, there are frequent turnovers of the ball from one team to the other. Attacking sequences occurring quickly after recovering the ball facilitate space exploitation in the defence. Due to the inherent imbalance of a team transitioning from offence to defence, counter-attacks generated a greater number of high-quality shots, and top teams were effective at both utilising and stopping counter-attacks [77]. Hobbs et al. [77] implemented machine learning to automatically identify the precise time-stamp of counter-attacks and derived a metric known as ‘offensive threat’, the likelihood of a shot being taken in the next 10 s. The authors were also able to identify the types of plays a team runs immediately after regaining the ball. The challenge remains to identify how parameters such as the direction and total number of passes, time taken and distance covered correspond to a counter-attack’s success. Nevertheless, machine learning-based studies could exceed the capabilities of notational analyses [78] and provide a method for evaluating team tactical behaviour as well as understanding that of their opponents.
Using an advanced machine learning methodology called ‘deep imitation learning’ (a subfield of machine learning in which a deep neural network learns to imitate behaviour from recorded demonstrations), researchers compared fine-grain movement patterns from a season’s worth of tracking data from the English Premier League [29]. Although the work has only been presented at a rather commercially oriented conference and some critical distance should be exercised, their ghosting method could access the location, velocity and acceleration of every player at the frame level to visualise the defensive movement patterns of a league average team. By fine-tuning the model, they could also identify how a specific team would have responded to an attacking situation. For example, if the opponent produces a shot on goal, perhaps the hypothetical ghosting player would have more proactively closed the player’s passing lane, which could have prevented the shot. The capacity to estimate how a team could have responded to an offensive situation enables coaches and analysts to measure the effectiveness of defensive positioning. Moreover, a broader understanding of defensive tactics and team trends facilitates the scouting process and the development of superior attacking strategies.
Current challenges and future directions of machine learning in football
Current challenges of machine learning in football
The latest advances in technology have the potential to add novelty and speed to exploring contextual variables related to football. The automatic, quantitative analysis that machine learning offers is beyond the scope of observational analysis, as supervised classification models are able to reproduce the same observational data and produce richer data [59]. We differentiated between two types of input data: event data and tracking data. The main findings from event data include the recognition of team patterns and characteristics, and the identification of key performance indicators as predictors of success. The main findings from tracking data were more process oriented, such as the determinants of effective passes and the scoring probabilities of players not in possession of the ball, or after quick regains. In general, studies used classification/clustering to model decisions experts normally make, and to predict goals, game outcome and league success (see Figure 2).
Despite the advantages of tracking data, scenarios involving quick, unpredictable movements, including frequent occlusions between players, challenge the accuracy of the information provided compared to what is occurring on the pitch [38]. Future projects should be aimed at diminishing these inherent errors [38], and it is recommended that practitioners use caution when comparing results between different tracking systems. Nonetheless, player tracking data are being used more and more frequently by teams to turn raw data into useful information.
Most studies using machine learning are performed by computer scientists as they use more complex approaches. However, the downside of using those complex models is that they result in a ‘black box’ where a result (model) is obtained (especially using neural networks), but determining what the important factors are is not always possible. For instance, a passing model may do a good job quantifying passes without telling whether pass length, velocity, position etc. makes it a good pass. Thus, it is hard to provide feedback to coaches and practitioners who favour straightforward analyses that provide a quick ‘snapshot’ of the team’s performance [7]. Performance analysis research including substantial and complex statistics and mathematical equations is not a priority for coaches, nor has it been successfully integrated into coaching [33]. As such, the majority of machine learning analysis has been done by computer science research groups with little involvement of sports scientists, match analysts or coaches [79]. Additionally, many studies have aimed at match prediction, which offers little insight beyond which team is more likely to outscore their opponent [25,80–82]. Due to the lack of interaction between practice and computer science, the outcomes seldom transfer into practice, leaving considerable room to improve on turning knowledge gained from the data into knowledge that can be applied to the actual game. Therefore, it is suggested that machine learning analysts/computer scientists, sports scientists and football coaches/analysts collaborate to obtain more accurate information with respect to individual and collective performance that may influence the outcome of football matches (Figure 3).

Figure 2. A frequency distribution of studies using supervised learning, unsupervised learning and both. Within supervised learning, prediction (continuous y variable) versus classification (categorical y variable) are shown.
[Bar chart; categories: Supervised (classification), Supervised (prediction), Unsupervised (clustering), Supervised and Unsupervised; vertical axis from 0 to 16.]

Another issue of applying machine learning to football is the lack of a learning curve. Unlike other domains that build on previous findings, there has been a greater emphasis on trying to develop new, fancy approaches. This means that research in this field is more concerned with creating new machine learning approaches than with incorporating existing approaches to build a better model for practice. One example is the study by Goes et al. [62] that evaluated passing performance. They mention in their discussion section that their model would benefit from including the actual playing formation rather than the starting formations of a team, which could have been implemented using the findings of Bialkowski et al. [22]. From a methods standpoint, early research studies rarely validated their results with a test sample, and several studies had relatively small samples (some with just one to two games). With some exceptions (like expected goals), more recent studies have improved upon this and provide descriptions of their methods that allow for rebuilding their model and redoing their analysis. In doing so, practitioners can be sure that those approaches can be applied to new data sets and hold some predictive value, rather than overfitting the training data to show promising results within a publication.
One of the challenges for machine learning is to provide information about football beyond the capabilities of the human observer. When spatio-temporal data based on methods from computational geometry were used to create an approximation algorithm to identify pitch dominant regions and rate the quality of passes in a football match, results indicated accuracy levels of 85.8% [57] and 90.2% [59], an agreement between the machine classifier and a football expert observer similar in magnitude to the level of agreement between two observers. Despite the agreement between machine learning and human observers, analysts are still using observational notation analysis to study aspects of the game that machine learning has not yet been able to capture. For example, in studying the subtle behaviours that separate the top players from the rest, observational notation analysis was used to study English Premier League players’ visual exploration (moving their bodies and heads to enable perception of the full, 360°, external environment) in the 10 s prior to receiving the ball [83]. Results showed that players exhibiting a higher frequency of visual explorations are more consistent in completing passes to their teammates, especially midfielders making forward passes. If machine learning can measure these visual explorations combined with positioning data, it would save analysts’ time, amplify the amount of data that can be collected and help teams find better playing solutions in specific match contexts.

Figure 3. A collaborative effort between sport scientists, computer scientists and football clubs will optimise the application of machine learning in a more relevant manner.
[Diagram panels. Researchers (sport and data scientists working with football (soccer) data): pass classification, pitch zones, Expected Goals metric, Off-Ball Scoring Opportunity. Companies that try to take revenue for data-science services to clubs: OptaPro, STATS, Prozone, Impect, SciSports, Second Spectrum. Clubs trying to make use of data and formulate needs to companies: cooperations with companies, performance analysts at almost every professional team, data scientists in clubs. Conferences, where researchers and practitioners discuss new possibilities and trends: MIT Sloan Sports Analytics Conference, OptaPro Analytics Forum, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.]

Whilst positional data have improved, the machine learning algorithms need further refining and the gap between the analysis of event data and football theory needs narrowing [84]. To improve transferability to practice, mathematical-based measures that are not the highest priority for coaches need to be simplified, and the research should incorporate important aspects of football match performance that are not yet fully understood. Some of these areas include team adaptability, communication, penetrating defensive lines, how possession is regained and the effects of playing at varying tempos during different phases of the match [33].
Future directions of machine learning in football
As technology enhances analysts’ ability to compare
and value performance, more data continue to drive a
revolution in football analytics. At present, machine
learning is a new concept in football, and more research
is necessary to realise its potential to inform coaches
and analysts on a practical level. Further studies
should aim to use larger samples and include both
training and testing data sets to allow for feedback
and model validation. Clear descriptions of each step
of the approach in the methods section (or code
shared via GitHub) would help improve subsequent models
and their applicability.
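The recommendation to hold out testing data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the shot data are synthetic, the relationship between shot distance and scoring is a toy one, and the one-parameter threshold "model" stands in for whatever algorithm a study would actually use. The point is only that the model is fitted on one subset and its accuracy is reported on a subset it has not seen.

```python
import random


def make_synthetic_shots(n=400, seed=1):
    """Generate synthetic (distance_m, goal) pairs. NOT real match data."""
    rng = random.Random(seed)
    shots = []
    for _ in range(n):
        distance = rng.uniform(5.0, 35.0)
        # Assumed toy relationship: closer shots are more likely to score.
        p_goal = max(0.02, 0.9 - 0.03 * distance)
        shots.append((distance, 1 if rng.random() < p_goal else 0))
    return shots


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(rows, threshold):
    """Share of rows where 'distance < threshold' matches the goal label."""
    correct = sum((d < threshold) == bool(goal) for d, goal in rows)
    return correct / len(rows)


def fit_threshold(train):
    """Choose the distance threshold with the highest training accuracy."""
    candidates = sorted({d for d, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


shots = make_synthetic_shots()
train, test = train_test_split(shots)

threshold = fit_threshold(train)          # fitted on training data only
train_acc = accuracy(train, threshold)    # optimistic, in-sample estimate
test_acc = accuracy(test, threshold)      # estimate on unseen shots

print(f"threshold: {threshold:.1f} m")
print(f"training accuracy: {train_acc:.2f}")
print(f"testing accuracy:  {test_acc:.2f}")
```

Fixing the random seeds keeps the split reproducible, which matters if the code is shared alongside a methods section. With larger samples, repeated splits (cross-validation) would give a more stable estimate than the single split shown here.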
If machine learning can decipher situations quickly
and reliably, it would demonstrate a practical impact
not yet apparent in the literature. There are also other
pertinent questions related to football that machine
learning can address in future research. Some of these
questions include understanding more about how off-
ball movement characteristics impact the decision-
making and passing ability of a team. Furthermore,
information about how different defensive schemes
influence ways of penetrating the defence is needed,
including the constellations of players and off-ball
movements that are most effective.
Conclusion
To date, most match analysis work has relied on
simple descriptions and associations
between variables. Moreover, advanced analyses have
been driven by computer scientists and research-based
approaches that lack practicality and adoptability by
coaches and teams. Integrating machine learning into
the work of coaches and analysts in
applied settings may account for a larger number of
interacting variables and provide teams with practical
information more quickly. However, relying on
increasingly complex data analysis techniques will
also present new challenges for future sports scientists.
It is not only a matter of improving the machine learn-
ing techniques, but the challenge of representing the
knowledge in a way that can be understood and utilised
in practice. This implies the use of multi-disciplinary
approaches including computer science research
groups and sports scientists competent in football to
interpret the relevant value of the information and pat-
terns produced by the machine.
Acknowledgements
The authors would like to thank Matthias Kempe for con-
structive criticism of the article.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of
interest with respect to the research, authorship, and/or pub-
lication of this article: Mat Herold is supported by a ‘Science
and Health in Football scholarship’ funded by the Deutscher
Fußball-Bund (DFB).
Funding
The author(s) received no financial support for the research,
authorship, and/or publication of this article.
ORCID iD
Mat Herold https://orcid.org/0000-0002-5885-7504
References
1. Hughes M and Franks I. Analysis of passing sequences,
shots and goals in soccer. J Sports Sci 2005; 23: 509–514.
2. Winkler W. Computer/video analysis in German soccer.
In: Hughes M (ed.) Notational analysis of sport. Cardiff:
UWIC, 1996, pp.19–31.
3. Tenga A, Mortensholm A and O’Donoghue P. Opposition
interaction in creating penetration during match play in
elite soccer: Evidence from UEFA champions league
matches. Int J Perform Anal Sport 2017; 17: 802–812.
4. Tenga A, Holme I, Ronglan LT, et al. Effect of playing
tactics on achieving score-box possessions in a random
series of team possessions from Norwegian professional
soccer matches. J Sports Sci 2010; 28: 245–255.
5. Lago-Ballesteros J, Lago-Peñas C and Rey E. The effect of
playing tactics and situational variables on achieving
score-box possessions in a professional soccer team.
J Sports Sci 2012; 30: 1455–1461.
6. Ruiz-Ruiz C, Fradua L, Fernandez-Garcia A, et al.
Analysis of entries into the penalty area as a performance
indicator in soccer. Eur J Sport Sci 2013; 13: 241–248.
7. Carling C, Le Gall F, McCall A, et al. Squad manage-
ment, injury, and match performance in a professional
soccer team over a championship-winning season. Eur J
Sport Sci 2015; 15: 573–582.
8. Rossi A, Pappalardo L, Cintia P, et al. Effective injury
forecasting in soccer with GPS training data and machine
learning. PloS one 2018; 13: e0201264.
9. Carling C, Reilly T and Williams AM. Performance
assessment for field sports. London: Routledge, 2008.
10. Brud L. The International Football Association Board.
Amendments to the laws of the game-2015/2016 and
information on the completed reform of The
International Football Association Board,
https://resources.fifa.com/mm/document/affederation/ifab/02/60/91/38/circular_log_amendments_2015_v1.0_en_neutral.pdf
(2015, accessed 19 February 2017).
11. Memmert D and Perl J. Game creativity analysis using
neural networks. J Sports Sci 2009; 27: 139–149.
12. Clemente FM, Couceiro MS, Martins F, et al. An online
tactical metrics applied to football game. Res J Appl Sci
Eng Technol 2013; 5: 1700–1719.
13. Johnson JG. Cognitive modeling of decision making in
sports. Psychol Sport Exerc 2006; 7: 631–652.
14. Kelso J. Phase transitions and critical behavior in human
bimanual coordination. Am J Physiol-Regul Integr Comp
Physiol 1984; 246: R1000–R1004.
15. Laube P, Imfeld S and Weibel R. Discovering relative
motion patterns in groups of moving point objects. Int
J Geog Inf Sci 2005; 19: 639–668.
16. Prieto J, Gómez M-Á and Sampaio J. From a static to a
dynamic perspective in handball match analysis: A sys-
tematic review. Open Sports Sci J 2015; 8(1): 25–34.
17. Bartlett R, Button C, Robins M, et al. Analysing team
coordination patterns from player movement trajectories
in soccer: Methodological considerations. Int J Perform
Anal Sport 2012; 12: 398–424.
18. Kempe M, Vogelbein M and Nopp S. The cream of the
crop: Analysing FIFA world cup 2014 and Germany’s
title run. J Hum Sport Exerc 2016; 11: 42–52.
19. Mackenzie R and Cushion C. Performance analysis in
football: A critical review and implications for future
research. J Sports Sci 2013; 31: 639–676.
20. Sarmento H, Marcelino R, Anguera MT, et al. Match
analysis in football: A systematic review. J Sports Sci
2014; 32: 1831–1843.
21. Wagenaar M, Okafor E, Frencken W, et al. Using
deep convolutional neural networks to predict goal-
scoring opportunities in soccer. In: 6th International
conference on pattern recognition applications and meth-
ods, Porto, Portugal, 24–26 February 2017, pp.448–455.
22. Bialkowski A, Lucey P, Carr P, et al. Discovering team
structures in soccer from spatiotemporal data. IEEE
Trans Knowl Data Eng 2016; 28: 2596–2605.
23. Wei X, Sha L, Lucey P, et al. Large-scale analysis of
formations in soccer. In: 2013 International conference
on digital image computing: Techniques and applications
(DICTA), Hobart, Australia, 26–28 November, 2013,
pp.1–8. New York: IEEE.
24. Bialkowski A, Lucey P, Carr P, et al. Large-scale analysis
of soccer matches using spatiotemporal tracking data. In:
2014 IEEE international conference on data mining
(ICDM), Shenzhen, China, 14–17 December 2014,
pp.725–730. New York: IEEE.
25. Fernando T, Wei X, Fookes C, et al. Discovering meth-
ods of scoring in soccer using tracking data. In: KDD
workshop on large-scale sports analytics, Sydney,
Australia, 10 August 2015.
26. Feuerhake U. Recognition of repetitive movement
patterns—The case of football analysis. ISPRS Int J
Geo-Inf 2016; 5: 208.
27. Memmert D, Lemmink KA and Sampaio J. Current
approaches to tactical performance analyses in soccer
using position data. Sports Med 2017; 47: 1–10.
28. Bialkowski A, Lucey P, Carr P, et al. Win at home and
draw away: Automatic formation analysis highlighting
the differences in home and away team behaviors. In:
Proceedings of 8th annual MIT sloan sports analytics con-
ference, Boston, MA, 2014, pp.1–7.
29. Le HM, Carr P, Yue Y, et al. Data-driven ghosting using
deep imitation learning. In: MIT sloan sports analytics
conference, Boston, MA, 3–4 March 2017.
30. James N, Mellalieu S and Hollely C. Analysis of strate-
gies in soccer as a function of European and domestic
competition. Int J Perform Anal Sport 2002; 2: 85–103.
31. Grehaigne J-F, Bouthier D and David B. Dynamic-
system analysis of opponent relationships in collective
actions in soccer. J Sports Sci 1997; 15: 137–149.
32. Couceiro MS, Clemente FM, Martins FM, et al.
Dynamical stability and predictability of football players:
The study of one match. Entropy 2014; 16: 645–674.
33. McLean S, Salmon PM, Gorman AD, et al. What’s in a
game? A systems approach to enhancing performance
analysis in football. PloS one 2017; 12: e0172565.
34. Bloomfield J, Jonsson G, Polman R, et al. Temporal pat-
tern analysis and its applicability in soccer. In: Anolli L,
Duncan S, Magnusson M, et al. (eds) The hidden structure
of social interaction: From genomics to cultural patterns.
Amsterdam: IOS Press, 2005, pp.237–251.
35. Vilar L, Araújo D, Davids K, et al. The role of ecological
dynamics in analysing performance in team sports. Sports
Med 2012; 42: 1–10.
36. Wallace JL and Norton KI. Evolution of World Cup
soccer final games 1966–2010: Game structure, speed
and play patterns. J Sci Med Sport 2014; 17: 223–228.
37. Perin C, Vuillemot R, Stolper C, et al. State of the art of
sports data visualization. In: Computer Graphics Forum
2018, pp.663–686. Wiley Online Library.
38. Linke D, Link D and Lames M. Validation of electronic
performance and tracking systems EPTS under field con-
ditions. PloS one 2018; 13: e0199519.
39. McHale IG and Relton SD. Identifying key players in
soccer teams using network analysis and pass difficulty.
Eur J Oper Res 2018; 268: 339–347.
40. Olthof SB, Frencken WG and Lemmink KA. When
something is at stake: Differences in soccer performance
in 11 vs. 11 during official matches and training games.
J Strength Cond Res 2019; 33: 167.
41. Owen A, Twist C and Ford P. Small-sided games: The
physiological and technical effect of altering pitch size
and player numbers. Insight 2004; 7: 50–53.
42. Buchheit M, Allen A, Poon TK, et al. Integrating differ-
ent tracking systems in football: Multiple camera semi-
automatic system, local position measurement and GPS
technologies. J Sports Sci 2014; 32: 1844–1857.
43. Almeida CH, Duarte R, Volossovitch A, et al. Scoring
mode and age-related effects on youth soccer teams’
defensive performance during small-sided games.
J Sports Sci 2016; 34: 1355–1362.
44. Lago-Peñas C and Dellal A. Ball possession strategies in
elite soccer according to the evolution of the match-score:
The influence of situational variables. J Hum Kinet 2010;
25: 93–100.
45. Hirano S and Tsumoto S. Grouping of soccer game rec-
ords by multiscale comparison technique and rough clus-
tering. In: Hybrid intelligent systems, international
conference, Rio de Janeiro, Brazil, 6–9 November 2005,
pp.399–404. New York: IEEE.
46. Montoliu R, Martín-Félez R, Torres-Sospedra J, et al.
Team activity recognition in Association Football using
a Bag-of-Words-based method. Hum Mov Sci 2015; 41:
165–178.
47. Pappalardo L and Cintia P. Quantifying the relation
between performance and success in soccer. Adv
Complex Syst 2017; 21: 1750014.
48. Brooks J, Kerr M and Guttag J. Using machine learning
to draw inferences from pass location data in soccer. Stat
Anal Data Min 2016; 5: 338–349.
49. Horn R, Williams M and Ensum J. Attacking in central
areas: A preliminary analysis of attacking play in the
2001/2002 FA Premiership season. Insight 2002; 3: 28–31.
50. Grant A and Williams M. Analysis of the final 20
matches played by Manchester United in the 1998–99
season. Insight 1999; 3: 42–45.
51. Barron D, Ball G, Robins M, et al. Artificial neural net-
works and player recruitment in professional soccer. PloS
one 2018; 13: e0205818.
52. Rathke A. An examination of expected goals and shot effi-
ciency in soccer. J Hum Sport Exerc 2017; 12: 514–529.
53. Ruiz H, Lisboa P, Neilson P, et al. Measuring scoring
efficiency through goal expectancy estimation. In:
ESANN 2015 proceedings of the European symposium on
artificial neural networks, computational intelligence and
machine learning, Bruges, Belgium, 22–24 April 2015,
pp.149–154. Belgium: Presses universitaires de Louvain.
54. Eggels RvEH and Pechenizkiy M. Explaining soccer
match outcomes with goal scoring opportunities predict-
ive analytics. In: 3rd Workshop on machine learning
and data mining for sports analytics, Riva del Garda,
Italy, 19 September 2016.
55. Yue Z, Broich H, Seifriz F, et al. Mathematical analysis
of a soccer game. Part I: individual and collective behav-
iors. Stud Appl Math 2008; 121: 223–243.
56. Gudmundsson J and Wolle T. Football analysis using
spatio-temporal tools. Comput Environ Urban Syst
2014; 47: 16–27.
57. Horton M, Gudmundsson J, Chawla S, et al. Automated
classification of passing in football. In: Pacific-Asia
conference on knowledge discovery and data mining,
Ho Chi Minh, Vietnam, 19–22 May 2015, pp.319–330.
New York: Springer International Publishing.
58. Taki T and Hasegawa J-i. Visualization of dominant
region in team games and its application to teamwork
analysis. In: Proceedings computer graphics international
2000, Geneva, Switzerland, 19–24 June 2000, pp.227–235.
New York: IEEE.
59. Chawla S, Estephan J, Gudmundsson J, et al.
Classification of passes in football matches using spatio-
temporal data. ACM Trans Spatial Algorithms Syst 2017;
3: 6.
60. Szczepański Ł and McHale I. Beyond completion rate:
Evaluating the passing ability of footballers. J R Stat Soc
2016; 178: 513–533.
61. Lucey P, Bialkowski A, Carr P, et al. Characterizing
multi-agent team behavior from partial team tracings:
Evidence from the English premier league. In: 26th
AAAI conference on artificial intelligence, Toronto,
Ontario, Canada, 22–26 July 2012.
62. Goes FR, Kempe M, Meerhoff LA, et al. Not every pass
can be an assist: A data-driven model to measure pass
effectiveness in professional soccer matches. Big Data
2018; 7(1): 57–70.
63. Frencken W, Lemmink K, Delleman N, et al. Oscillations
of centroid position and surface area of soccer teams in
small-sided games. Eur J Sport Sci 2011; 11: 215–223.
64. Chassy P. Team play in football: How science supports
FC Barcelona’s training strategy. Psychology 2013; 4: 7.
65. Rein R, Raabe D and Memmert D. “Which pass is
better?” Novel approaches to assess passing effectiveness
in elite soccer. Hum Movement Sci 2017; 55: 172–181.
66. Link D, Lang S and Seidenschwarz P. Real time quanti-
fication of dangerousity in football using spatiotemporal
tracking data. PloS one 2016; 11: e0168768.
67. Knauf K, Memmert D and Brefeld U. Spatio-temporal
convolution kernels. Mach Learn 2016; 102: 247–273.
68. Oberstone J. Differentiating the top English premier
league football clubs from the rest of the pack:
Identifying the keys to success. J Quant Anal Sports
2009; 5(3): 1–29.
69. Castellano J, Casamichana D and Lago C. The use of
match statistics that discriminate between successful and
unsuccessful soccer teams. J Hum Kinet 2012; 31:
137–147.
70. Szwarc A. Effectiveness of Brazilian and German teams
and the teams defeated by them during the 17th FIFA
World Cup. Kinesiol Int J Fundam Appl Kinesiol 2004; 36:
83–89.
71. Grunz A, Memmert D and Perl J. Tactical pattern rec-
ognition in soccer games by means of special self-organiz-
ing maps. Hum Movement Sci 2012; 31: 334–343.
72. Perl J, Grunz A and Memmert D. Tactics analysis in
soccer–an advanced approach. Int J Comput Sci Sport
2013; 12: 33–44.
73. Lucey P, Bialkowski A, Monfort M, et al. Quality vs
quantity: Improved shot prediction in soccer using stra-
tegic features from spatiotemporal data. In: 8th Annual
MIT sloan sports analytics conference, Boston, MA, 2014,
pp.1–9.
74. Schulze E, Mendes B, Maurício N, et al. Effects of positional
variables on shooting outcome in elite football. Sci
Med Football 2018; 2: 93–100.
75. Power P, Ruiz H, Wei X, et al. Not all passes are created
equal: Objectively measuring the risk and reward of
passes in soccer from tracking data. In: Proceedings of
the 23rd ACM SIGKDD international conference on know-
ledge discovery and data mining, Halifax, NS, Canada,
13–17 August 2017, pp.1605–1613. New York: ACM.
76. Spearman W. Beyond expected goals. In: MIT sloan
sports analytics conference, Boston, MA, 23–24
February 2018.
77. Hobbs J, Power P, Sha L, et al. Quantifying the value of
transitions in soccer via spatiotemporal trajectory cluster-
ing. In: MIT sloan sports analytics conference, Boston,
MA, 23–24 February 2018.
78. Barreira D, Garganta J, Guimarães P, et al. Ball recovery
patterns as a performance indicator in elite soccer.
J Sports Eng Technol 2014; 228: 61–72.
79. Rein R and Memmert D. Big data and tactical analysis in
elite soccer: Future challenges and opportunities for
sports science. Springerplus 2016; 5: 1410.
80. Hucaljuk J and Rakipović A. Predicting football scores
using machine learning techniques. In: Proceedings of the
34th International Convention MIPRO, Opatija, Croatia,
23–27 May 2011, pp.1623–1627. New York: IEEE.
81. Joseph A, Fenton NE and Neil M. Predicting football
results using Bayesian nets and other machine learning
techniques. Knowl-Based Syst 2006; 19: 544–553.
82. Tax N and Joustra Y. Predicting the Dutch football com-
petition using public data: A machine learning approach.
IEEE Trans Knowl Data Eng 2015; 10: 1–13.
83. Jordet G, Bloomfield J and Heijmerikx J. The hidden
foundation of field vision in English Premier League
(EPL) soccer players. In: Proceedings of the MIT sloan
sports analytics conference, Boston, MA, 1–2 March
2013.
84. Drust B and Green M. Science and football: Evaluating
the influence of science on performance. J Sports Sci
2013; 31: 1377–1382.
... Football is the sport with the largest viewership worldwide https://www.worldatlas.com/articles/what-are-the-mostpopular-sports-in-the-world.html, with an economic activity estimated at billions of dollars annually [1]. Aided by the availability of big data and machine learning sports analytics approaches are increasingly used to predict player, team and coach performance [2][3][4][5][6] Prediction of future performance is very valuable for selecting promising players and coaches, and the accuracy of such predictions can be easily measured with holdout data. ...
Article
Full-text available
We investigate the impact of risk-taking on football match outcomes within the context of substitutions. The study uses an extensive dataset of 75 thousand substitutions spanning eight seasons across six leagues, encompassing team, match and manager characteristics, and interim results. Given the intricate web of potential drivers and the absence of a well-defined theoretical framework, conventional econometric approaches fall short. Instead, Causal Forest, a black-box causal machine learning (ML) technique, was adopted to estimate the heterogeneous treatment effects, coupled with robustness checks. Its estimates are explained through Generalized Additive Models (GAM), visualizing how different factors affect the risk-taking probability and moderate the treatment effect of risk-taking on match outcomes. Fidelity of the explanation is improved by segmenting the dataset to capture strong interaction effects. The study reveals that risk-taking propensity peaks when a team trails by 2-3 goals and tapers when leading by the same margin. Considering impact of risk-taking on match outcomes, interestingly, younger managers exhibit superior performance compared to middle-aged ones, while older counterparts excel in later substitutions. The manager’s tenure with the team increases the impact, tapering in the long run. Earlier substitutions, stronger teams have greater impact when risk-taking. Teams leading by one goal stand to benefit most from risk-taking in substitutions. This research underscores the synergistic potential of black-box causal machine learning and interpretable models, given large representative datasets with all confounders. Insights into football risk-taking dynamics have implications for managerial strategies.
... Artificial intelligence has been used for game and player analysis for several years, especially in American football and baseball (Li, & Xu, 2021), and is gradually leading technological development in different disciplines. For example, research has shown that analyzing various types of data can help support coaches' tactical decision-making (Herold et al., 2019) as well as training and competition planning (Me & Unold, 2011). This type of analysis has great potential to prevent injuries (Claudino et al., 2019). ...
Chapter
Full-text available
This research explores the transformative impact of the integration of technology in basketball, both on and off the court, from training and performance enhancement for athletes, coaches and conditioners, to data analytics for statistical coaches, administrators and spectator engagement. It also highlights the importance of the use of technology in player development, equipment development, injury prevention, pre-game and in-game strategies, and team-fan interaction, as well as the fact that technology not only improves the overall quality of basketball but also opens new avenues for growth and innovation in sport.
... Além disso, a análise de dados esportivos concentra-se na descoberta do valor dos dados e fornece recursos e informações valiosos para empresas e empresários. Diversos esportes beneficiam-se dessas análises na literatura, principalmente utilizando inteligência artificial, especificamente algoritmos de Aprendizado de Máquina (AM), como basquetebol [Nguyen et al. 2022 No entanto, a demanda por abordagens mais automatizadas em diversas modalidades esportivas está aumentando [Herold et al. 2019], incluindo esportes em equipe. Um exemplo é a National Football League (NFL), o principal campeonato de futebol americano, que movimenta bilhões de dólares anualmente [Durand et al. 2021]. ...
Conference Paper
Com o avanço no desenvolvimento de tecnologias, a análise de informações sobre esportes tornou-se uma questão cada vez mais desafiadora. Além disso, as bases de dados disponíveis para estudos de previsão de resultados são limitadas. Considerando isso, neste artigo serão estudados conceitos relacionados ao futebol americano e modelagem de algoritmos de Aprendizado de Máquina (AM), aplicados para a predição de times ganhadores ou perdedores com base nos dados das últimas temporadas. Como resultados, é possível observar que técnicas de AM, quando combinadas com uma quantidade expressiva de dados e com os devidos tratamentos, podem fornecer bons resultados na previsão de resultados de partidas esportivas.
... This is due to the development of new and innovative technologies that can be used to e.g., measure and analyse athletic performance such as use of inertial wearables in a range of sports (5)(6)(7)(8)(9). Some of the most common technologies used in sport science include wearable devices, motion capture systems and data analytics, including use of machine learning (10). Those technologies are used for a variety of purposes in sport science, including improving athletic performance, reducing injury frequency, rehabilitating injuries and/or analysing tactical performance. ...
... Here we focus on the movement behavior of football players. Football games have recently attracted scientific attention (Brillinger, 2007;Herold et al., 2019;Lucey et al., 2013;Rein & Memmert, 2016;Sarmento et al., 2014). Movement analysis of football players has also been studied in GIScience (Andrienko et al., 2016, 2017, 2019. ...
Article
Full-text available
Movement analysis is distinguished by an emphasis on understanding via observation and association. However, an important component of movement from the human and computer modeling perspective is the processes that bring about movement behavior in the first place. This article contextualizes the graphical causal modeling framework (for association, intervention, and counterfactual causal analysis) in GIScience, and more specifically within movement analysis studies. This is done by modeling the movement behavior of football players, applied to spatiotemporal data generated by an agent‐based simulation. The movement dataset is thoroughly analyzed to infer the statistical associations among its variables, to estimate the effect of an intervention on some of those variables, and to answer a few counterfactual questions from the observations. We conclude that causal graphs (i.e., directed acyclic graphs), if implemented correctly, can assist analysts in infering causal relations from movement data. This research suggests the integration of causal graphs and agent‐based paradigms as one solution for computational movement analysis.
Article
The objective of this study is to build a mathematical model that predicts the success of a goal kick in soccer. The model is based on an ensemble of neural networks whose inputs are five features extracted directly from the goal kick and one more that depends on the opposing team. This new variable is calculated using a hierarchical cluster analysis and divides the possible opponents by their defensive strategy. It is shown that it has a relevant importance in the result of a goal kick, which validates the idea of using the presented methodology that takes into account the opponent’s tactics when analyzing specific plays of a soccer match.
Chapter
Identifying the best-suited player position is one of the most important tasks in the game of football. But there is no structured way to identify the right talent. This paper aims at designing a talent identification model with the help of machine learning algorithms that helps in identifying nine different player positions by using the physiological, technical, tactical, and psychological attributes of a player. Three multi-classification algorithms such as naïve Bayes, decision tree, and random forests are used for building the model, and Bayesian optimization process is applied for tuning the hyperparameters. Kappa values and AUC are used for evaluating the models. Random forest classifier provided the highest accuracy of 93.1%, and the variable importance plot showcased that technical attributes influenced the player position more than the other attributes.
Article
Full-text available
11 vs. 11 training games are used to mimic the official match, but differ in playing duration and a consequence of winning or losing. Anxiety levels, crowd pressure, and the intention to win are examples of constraints present in the match, but absent or less prevalent in training. The aim is, therefore, to compare soccer performance in official matches with 11 vs. 11 training games. Six elite youth soccer teams played 5 official matches and 15 training games. Soccer performance, defined as a combination of game characteristics (game duration, transitions, and ball possession duration) and physical (distance covered, high-intensity distance, and sprints), technical (passing), and team tactical performance (inter-team and intra-team distances) and corresponding interaction patterns, was determined with video footage and positional data (local position measurement system). Soccer performance in official matches differed from similar training games, in a way that players covered more distance, sprinted more often, but game pace was lower and players made more mistakes. In addition, team width was smaller and length-per-width ratio larger and teams were tighter coupled in official matches. 11 vs. 11 training games can be used to mimic the match, in particular the team tactical performance. Coaches could increase physical and technical representativeness of training games by raising the stakes and increasing the consequence of winning or losing.
Article
Full-text available
The aim was to objectively identify key performance indicators in professional soccer that influence outfield players’ league status using an artificial neural network. Mean technical performance data were collected from 966 outfield players’ (mean SD; age: 25 ± 4 yr, 1.81 ±) 90-minute performances in the English Football League. ProZone’s MatchViewer system and online databases were used to collect data on 347 indicators assessing the total number, accuracy and consistency of passes, tackles, possessions regained, clearances and shots. Players were assigned to one of three categories based on where they went on to complete most of their match time in the following season: group 0 (n = 209 players) went on to play in a lower soccer league, group 1 (n = 637 players) remained in the Football League Championship, and group 2 (n = 120 players) consisted of players who moved up to the English Premier League. The models created correctly predicted between 61.5% and 78.8% of the players’ league status. The model with the highest average test performance was for group 0 v 2 (U21 international caps, international caps, median tackles, percentage of first time passes unsuccessful upper quartile, maximum dribbles and possessions gained minimum) which correctly predicted 78.8% of the players’ league status with a test error of 8.3%. To date, there has not been a published example of an objective method of predicting career trajectory in soccer. This is a significant development as it highlights the potential for machine learning to be used in the scouting and recruitment process in a professional soccer environment.
Article
Full-text available
In professional soccer, nowadays almost every team employs tracking technology to monitor performance during trainings and matches. Over the recent years, there has been a rapid increase in both the quality and quantity of data collected in soccer resulting in large amounts of data collected by teams every single day. The sheer amount of available data provides opportunities as well as challenges to both science and practice. Traditional experimental and statistical methods used in sport science do not seem fully capable to exploit the possibilities of the large amounts of data in modern soccer. As a result, tracking data are mainly used to monitor player loading and physical performance. However, an interesting opportunity exists at the intersection of data science and sport science. By means of tracking data, we could gain valuable insights in the how and why of tactical performance during a soccer match. One of the most interesting and most frequently occurring elements of tactical performance is the pass. Every team has around 500 passing interactions during a single game. Yet, we mainly judge the quality and effectiveness of a pass by means of observational analysis, and whether the pass reaches a teammate. In this article, we present a new approach to quantify pass effectiveness by means of tracking data. We introduce two new measures that quantify the effectiveness of a pass by means of how well a pass disrupts the opposing defense. We demonstrate that our measures are sensitive and valid in the differentiation between effective and less effective passes, as well as between the effective and less effective players. Furthermore, we use this method to study the characteristics of the most effective passes in our data set. The presented approach is the first quantitative model to measure pass effectiveness based on tracking data that are not linked directly to goal-scoring opportunities. 
As a result, this is the first model that does not overvalue forward passes. Therefore, our model can be used to study the complex dynamics of build-up and space creation in soccer.
Conference Paper
Full-text available
Abstract: Many models have been constructed to quantify the quality of shots in soccer. In this paper, we evaluate the quality of off-ball positioning, preceding shots, that could lead to goals. For example, consider a tall unmarked center forward positioned at the far post during a corner kick. Sometimes the cross comes in and the center forward heads it in effortlessly, other times the cross flies over his head. Another example is of a winger, played onside, while making a run in past the defensive line. Sometimes the through-ball arrives; other times the winger must break off their run because a teammate has failed to deliver a timely pass. In both circumstances, the attacking player has created an opportunity even if they never received the ball. In this paper, we construct a probabilistic physics-based model that uses spatio-temporal player tracking data to quantify such off-ball scoring opportunities (OBSO). This model can be used to highlight which, if any, players are likely to score at any point during the match and where on the pitch their scoring is likely to come from. We show how this model can be used in three key ways: 1) to identify and analyze important opportunities during a match 2) to assist opposition analysis by highlighting the regions of the pitch where specific players or teams are more likely to create off-ball scoring opportunities 3) to automate talent identification by finding the players across an entire league that are most proficient at creating off-ball scoring opportunities.
Article
Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multi-dimensional approach to injury forecasting in professional soccer that is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
Article
The purpose of this study was to assess the measurement accuracy of the most commonly used tracking technologies in professional team sports (i.e., semi-automatic multiple-camera video technology (VID), radar-based local positioning system (LPS), and global positioning system (GPS)). The position, speed, acceleration and distance measures of each technology were compared against simultaneously recorded measures of a reference system (VICON motion capture system) and quantified by means of the root mean square error (RMSE). Fourteen male soccer players (age: 17.4±0.4 years, height: 178.6±4.2 cm, body mass: 70.2±6.2 kg) playing for the U19 Bundesliga team FC Augsburg participated in the study. The test battery comprised a sport-specific course, shuttle runs, and small-sided games on an outdoor soccer field. The validity of fundamental spatiotemporal tracking data differed significantly between all tested technologies. In particular, LPS showed higher validity for measuring an athlete’s position (23±7 cm) than both VID (56±16 cm) and GPS (96±49 cm). Considering errors of instantaneous speed measures, GPS (0.28±0.07 m⋅s⁻¹) and LPS (0.25±0.06 m⋅s⁻¹) achieved significantly lower error values than VID (0.41±0.08 m⋅s⁻¹). Equivalent accuracy differences were found for instant acceleration values (GPS: 0.67±0.21 m⋅s⁻², LPS: 0.68±0.14 m⋅s⁻², VID: 0.91±0.19 m⋅s⁻²). During small-sided games, lowest deviations from reference measures have been found in the total distance category, with errors ranging from 2.2% (GPS) to 2.7% (VID) and 4.0% (LPS). All technologies had in common that the magnitude of the error increased as the speed of the tracking object increased. Especially in performance indicators that might have a high impact on practical decisions, such as distance covered with high speed, we found >40% deviations from the reference system for each of the technologies. Overall, our results revealed significant between-system differences in the validity of tracking data, implying that any comparison of results using different tracking technologies should be done with caution.
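To illustrate the error metric named in this abstract, the following is a minimal sketch of how an RMSE between tracked and reference positions might be computed. The abstract does not specify the exact procedure, so treating the per-sample error as the Euclidean distance between time-aligned 2D positions is an assumption, as are the function name `position_rmse` and the sample coordinates.

```python
import math

def position_rmse(tracked, reference):
    """RMSE of 2D positions, in the same unit as the inputs.

    Assumption: both sequences are time-aligned lists of (x, y) tuples,
    and the per-sample error is the Euclidean distance between them.
    """
    if not tracked or len(tracked) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [
        (tx - rx) ** 2 + (ty - ry) ** 2
        for (tx, ty), (rx, ry) in zip(tracked, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical sample data in metres (not taken from the study).
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tracked = [(0.3, 0.4), (1.0, 0.5), (2.0, 0.0)]
rmse = position_rmse(tracked, reference)
print(f"position RMSE: {rmse:.3f} m")
```

Because the errors are squared before averaging, a few large deviations (for example at high running speeds) weigh more heavily on the RMSE than many small ones.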
Article
In this report, we organize and reflect on recent advances and challenges in the field of sports data visualization. The exponentially‐growing body of visualization research based on sports data is a prime indication of the importance and timeliness of this report. Sports data visualization research encompasses the breadth of visualization tasks and goals: exploring the design of new visualization techniques; adapting existing visualizations to a novel domain; and conducting design studies and evaluations in close collaboration with experts, including practitioners, enthusiasts, and journalists. Frequently this research has impact beyond sports in both academia and in industry because it is i) grounded in realistic, highly heterogeneous data, ii) applied to real‐world problems, and iii) designed in close collaboration with domain experts. In this report, we analyze current research contributions through the lens of three categories of sports data: box score data (data containing statistical summaries of a sport event such as a game), tracking data (data about in‐game actions and trajectories), and meta‐data (data about the sport and its participants but not necessarily a given game). We conclude this report with a high‐level discussion of sports visualization research informed by our analysis—identifying critical research gaps and valuable opportunities for the visualization community. More information is available at the STAR's website: https://sportsdataviz.github.io/.
Article
We use a unique dataset to identify the key members of a football team. The methodology uses a statistical model to determine the difficulty of a pass from one player to another, and combines this information with results from network analysis, to identify which players are pivotal to each team in the English Premier League during the 2012–13 season. We demonstrate the methodology by looking closely at one game, whilst also summarising player performance for each team over the entire season. The analysis is hoped to be of use to managers and coaches in identifying the best team lineup, and in the analysis of opposition teams to identify their key players.
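As a rough illustration of combining pass difficulty with network analysis, the sketch below scores each player by the summed difficulty of the completed passes they were involved in. This is not the authors' model: the difficulty definition (one minus an estimated completion probability), the scoring rule, the function name `weighted_involvement`, and the sample passes are all assumptions made for illustration.

```python
from collections import defaultdict

def weighted_involvement(passes):
    """Sum pass difficulty over every completed pass a player made or received.

    passes: iterable of (passer, receiver, completion_probability).
    Assumption: difficulty = 1 - completion_probability, so harder
    passes contribute more to both players' scores.
    """
    scores = defaultdict(float)
    for passer, receiver, p_complete in passes:
        difficulty = 1.0 - p_complete
        scores[passer] += difficulty
        scores[receiver] += difficulty
    return dict(scores)

# Hypothetical completed passes with estimated completion probabilities.
passes = [
    ("A", "B", 0.9),
    ("B", "C", 0.5),
    ("C", "A", 0.8),
    ("B", "A", 0.6),
]
scores = weighted_involvement(passes)
key_player = max(scores, key=scores.get)
print(key_player, scores)
```

A difficulty-weighted degree like this is only one of many possible network measures; betweenness or eigenvector-style centralities would rank players differently.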
Article
The aim of this study is to compare how penetrations were created between the Finalists and Non-finalists by assessing opposition interaction in elite soccer. The sample included data from 12 matches played from the round of 16 to the final of the UEFA Champions League season 2010/2011. Differences in creating dangerous penetrations were found only after controlling for the effects of opponent’s defensive balance. Three-way repeated measures ANOVA revealed that the interaction of team status and opponent’s defensive balance had a meaningful effect on the percentage of penetrative ball actions into dangerous spaces (F(2,20) = 2.9, p = 0.076, partial η² = 0.227). Finalists performed a higher percentage of dangerous penetrative ball actions per match than Non-finalists when playing against an imbalanced defence (89.2 ± 14.0 vs. 77.6 ± 13.6), while Non-finalists performed a higher percentage when playing against balanced (25.8 ± 10.7 vs. 16.1 ± 12.5) and beginning imbalanced (32.8 ± 10.9 vs. 29.1 ± 9.2) defences. Results suggest that effective exploitation of spaces within and behind the last line of opponent’s defence is an important determinant of successful offensive performance in soccer. The assessment of opposition interaction is of critical importance when analysing elite soccer performance.
Article
Objective: To analyse shooting situations in elite football (soccer) by incorporating opponent positioning for shots from the same location. Methods: All attempts by foot during four seasons (2011–2012 until 2014–2015) of an elite team's home matches were analysed. Three locations with the highest density of shots (n = 21, 24, and 20) were identified and selected. The positioning of defenders (closest, first and number of opponents in line to the goal) and the goalkeeper was determined and analysed for the outcome of each shot (being a goal, on target, or based on placement). Results: For shots from a tight angle, the distance of the closest defender correlated with improved shot placement (r = –0.39, P = 0.04). For shots from a close range, the sight of goal affected shots going on or off target (ES = 0.77, P = 0.03). Finally, for shots from a longer range, none of the positional variables correlated with shot outcome. Conclusions: Adding context to shooting performance, by taking opponent positions into account, increased the amount of relevant information for the analysis of shooting performance. Since different variables were found to relate to the outcome of shots from different positions, the significance of contextualising match events has been highlighted.