Abstract

In professional soccer, increasing amounts of data are collected that harness great potential when it comes to analysing tactical behaviour. Unlocking this potential is difficult as big data challenges the data management and analytics methods commonly employed in sports. By joining forces with computer science, solutions to these challenges could be achieved, helping sports science to find new insights, as is happening in other scientific domains. We aim to bring multiple domains together in the context of analysing tactical behaviour in soccer using position tracking data. A systematic literature search for studies employing position tracking data to study tactical behaviour in soccer was conducted in seven electronic databases, resulting in 2338 identified studies and finally the inclusion of 73 papers. Each domain clearly contributes to the analysis of tactical behaviour, albeit in - sometimes radically - different ways. Accordingly, we present a multidisciplinary framework where each domain's contributions to feature construction, modelling and interpretation can be situated. We discuss a set of key challenges concerning the data analytics process, specifically feature construction, spatial and temporal aggregation. Moreover, we discuss how these challenges could be resolved through multidisciplinary collaboration, which is pivotal in unlocking the potential of position tracking data in sports analytics.
European Journal of Sport Science
ISSN: 1746-1391 (Print) 1536-7290 (Online) Journal homepage: https://www.tandfonline.com/loi/tejs20
Unlocking the potential of big data to support
tactical performance analysis in professional
soccer: A systematic review
F.R. Goes, L.A. Meerhoff, M.J.O. Bueno, D.M. Rodrigues, F.A. Moura, M.S.
Brink, M.T. Elferink-Gemser, A.J. Knobbe, S.A. Cunha, R.S. Torres & K.A.P.M.
Lemmink
To cite this article: F.R. Goes, L.A. Meerhoff, M.J.O. Bueno, D.M. Rodrigues, F.A. Moura, M.S.
Brink, M.T. Elferink-Gemser, A.J. Knobbe, S.A. Cunha, R.S. Torres & K.A.P.M. Lemmink (2020):
Unlocking the potential of big data to support tactical performance analysis in professional soccer:
A systematic review, European Journal of Sport Science, DOI: 10.1080/17461391.2020.1747552
To link to this article: https://doi.org/10.1080/17461391.2020.1747552
© 2020 The Author(s). Published by Informa
UK Limited, trading as Taylor & Francis
Group
Published online: 16 Apr 2020.
REVIEW
F.R. GOES¹, L.A. MEERHOFF², M.J.O. BUENO⁵, D.M. RODRIGUES³, F.A. MOURA⁵, M.S. BRINK¹, M.T. ELFERINK-GEMSER¹, A.J. KNOBBE², S.A. CUNHA⁴, R.S. TORRES³, & K.A.P.M. LEMMINK¹
¹Center for Human Movement Sciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands; ²Leiden Institute of Advanced Computer Sciences (LIACS), Leiden University, Leiden, The Netherlands; ³Institute of Computing (IC), University of Campinas, Campinas, Brazil; ⁴Sport Sciences Department (DCE), University of Campinas, Campinas, Brazil & ⁵Sport Sciences Department, State University of Londrina, Londrina, Brazil
Keywords: Football, big data, tactical analysis, team sport, performance analysis
Highlights
• In recent years, there has been considerable growth in studies on tactical behaviour using position tracking data, especially in the domains of sports science and computer science. Yet both domains have contributed distinctly different studies, with the first being more focused on developing theories and practical implications, and the latter more on developing techniques.
• Considerable opportunities exist for collaboration between sports science and computer science in the study of tactics in soccer, especially when using position tracking data.
• Collaborations between the domains of sports science and computer science benefit from a stronger dialogue yielding a cyclical collaboration.
• We have proposed a framework that could serve as the foundation for the combination of sports science and computer science expertise in tactical analysis in soccer.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
Correspondence: F. Goes, Center for Human Movement Sciences, University of Groningen, UMCG, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands. E-mail: f.r.goes@umcg.nl
1. Introduction
Increasingly large amounts of data are collected in professional soccer for the purpose of match analysis. Player positions are tracked continuously during practice and competition using state-of-the-art tracking systems (Rein & Memmert, 2016). Due to recent technological innovations, there has been a particular
increase in systems and devices that collect and
provide position tracking data. These innovations
have been embraced and widely adopted by pro-
fessional sports organizations, and the use of data is
broadly considered as a potential game-changer in
professional sports (Rein & Memmert, 2016).
However, there is still a lot to be gained, as the avail-
ability of data has increased much more rapidly than
the scientific advancements required to valorise data
in the domain of soccer (Rein & Memmert, 2016).
One of the more interesting opportunities provided
by the availability of position tracking data in soccer is
the study and analysis of tactical behaviour. Tactical
behaviour is an important determinant of perform-
ance in team sports like soccer, and refers to how a
team manages its spatial positioning over time to
achieve a shared goal (i.e. scoring), while interacting
with the opponent under constraints of the con-
ditions of play (Gréhaigne, Godbout, & Bouthier,
1999; Rein & Memmert, 2016). In the past, the
analysis of tactical behaviour has mostly been based
on manually annotated data and observation by
experts (Rein & Memmert, 2016). As these assessments mainly describe what happens with the ball, they only provided insights into the ‘who’ and ‘what’, and (albeit with poor accuracy) the ‘where’ and ‘when’ of on-ball behaviour (Vilar, Araujo, Davids, & Travassos, 2012). However, as
tactical behaviour is the result of the interaction
between all players on and off the ball (Gréhaigne
et al., 1999; Rein & Memmert, 2016), truly analysing
the mechanisms behind it requires accurate data on
all 22 players and the ball. Therefore, position track-
ing data provides the opportunity to accurately study
the mechanisms behind tactical behaviour in soccer.
However, despite its potential in the analysis of tacti-
cal behaviour, so far it has mainly been used to deter-
mine player activity profiles to monitor player loading
and subsequently prescribe training loads (Sarmento
et al., 2014).
The large amounts of position tracking data chal-
lenge the data management and analytics methods
native to sports (Gandomi & Haider, 2015), and
unlocking its potential in the study of tactical behav-
iour requires solving these challenges first (Rein &
Memmert, 2016). Although data can be considered ‘big’ based on the three V’s (volume, variety, and velocity; Gandomi & Haider, 2015), there are no universal benchmarks for these dimensions. Whether a dataset is considered big or not heavily depends on the interplay between these dimensions, and is generally considered to be domain specific (Gandomi & Haider, 2015). One could consider data ‘big’ when it exceeds the ‘three-V tipping point’: the point where traditional data management and analysis methods become inadequate (Gandomi & Haider, 2015). The overall process of deriving information
from position tracking data can be divided into two
components: data management and data analytics
(Gandomi & Haider, 2015; Labrinidis & Jagadish,
2012). These components can each be divided further into various sub-processes, each associated with their own challenges (Gandomi & Haider, 2015; Labrinidis & Jagadish, 2012). Challenges to the data management component have been thoroughly addressed in previous reviews. Manafifard, Ebadi, and Moghaddam (2017), for example, provide a detailed review of the strengths and weaknesses of optical tracking systems, and of what could be done in (pre-)processing to improve data collection with these systems in the future. In other examples Stein et al.
(2017) and Rein and Memmert (2016) (both specific
to soccer) and Gandomi and Haider (2015) (in
general) have addressed the various data streams
that need to be brought together in the analysis, and
how this poses a challenge to data management
systems commonly employed in soccer (Gandomi
& Haider, 2015; Rein & Memmert, 2016; Stein
et al., 2017). Challenges to data analytics on the
other hand, and specifically the challenge of aggregat-
ing raw position data into interpretable spatiotem-
poral features that capture the complex dynamics of
tactical behaviour, have received considerably less
attention so far.
Contributions from the domain of sports science
and the domain of computer science are typically
characterized by distinctly different research para-
digms. Research from the domain of sports science
on tactical behaviour is generally characterized by
deductive reasoning in forming a hypothesis and
designing an (experimental) study. Teams are for
example considered as complex dynamical systems
and hypotheses regarding their behaviour are formu-
lated based on expectations rooted in such a theoreti-
cal perspective (Araújo et al., 2015; Balague,
Torrents, Hristovski, Davids, & Araújo, 2013;
Seifert, Araújo, Komar, & Davids, 2017). To study whether soccer teams behave like dynamical systems, and to study how manipulating constraints affects the system’s behaviour, data is typically collected for a specific research purpose, after the research question has been formulated. In most sports science contributions, this means data is collected in an experimental setting (most frequently a set of manipulated small-sided games) that is designed based on the research question and related hypotheses. The raw position tracking data is then
usually aggregated into features that operationalize
the hypotheses and represent group level behaviour,
such as team centroids or team surface areas
(Frencken, Lemmink, Delleman, & Visscher, 2011;
Memmert, Lemmink, & Sampaio, 2017). A feature
like the team centroid reduces the complex behaviour
of a group of players into interpretable behaviour by
aggregating their movements into a single feature,
in the case of the centroid, representing the average
positions at a point in time. These aggregated features
are then used to study the interaction between groups
over time. This can be insightful for the development
of specific theories. However, by reducing the team’s performance to these aggregated features, relevant aspects of the complexity of this behaviour may be overlooked. Aggregating the behaviour of 11 players into one feature, like the centroid, might, for example, fail to capture the different movements of sub-units (e.g. the defensive line) within the team and thereby fail to fully capture the complexity of tactical behaviour.
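As an illustration of the aggregation described above, the following minimal sketch computes a team centroid and a sub-unit (line) centroid for a single timeframe. The coordinates, the four-player toy team and the choice of which players form the defensive line are hypothetical and serve only to show the arithmetic; they are not taken from any of the reviewed studies.

```python
from statistics import mean


def centroid(positions):
    """Return the mean (x, y) of a list of (x, y) player positions."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    return (mean(xs), mean(ys))


# Hypothetical positions (metres) of four outfield players in one timeframe.
team = [(10.0, 20.0), (10.0, 40.0), (30.0, 10.0), (30.0, 50.0)]

# Assumption for illustration: the first two players form the defensive line.
defensive_line = team[:2]

team_centroid = centroid(team)            # one point summarising the whole team
line_centroid = centroid(defensive_line)  # one point summarising the sub-unit
```

In this toy example the team centroid and the line centroid differ in their x coordinate, which is the kind of sub-unit information a single team-level feature would hide.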
On the other hand, contributions from the domain of computer science (as well as the application of its techniques, also described as ‘data science’) utilize a distinctly different research paradigm. Computer science concerns the theoretical foundations of
(computationally retrieving) information, typically
yielding advanced analyses and high-level represen-
tations of large and complex data (Gudmundsson &
Horton, 2017). For example, Knowledge Discovery
(also referred to as Data Mining) is all about identi-
fying the robustness of patterns that are found
without formulating hypotheses about the existence
of these patterns. Although both sports- and compu-
ter science adopt a deductive approach, the type of
empirical evidence for these deductions is radically
different. In sports science, (experimental) research
designs typically aim to confirm or reject a hypothesis
that was formulated based on theory as discussed in
the previous paragraph. In computer science, new
modelling techniques are evaluated by testing the
robustness of the generated model. This quantification of robustness can then be used to verify whether a discovered pattern was significant: ‘How likely is it that this pattern was found by chance?’ In other words, whether the technique worked successfully is deduced from the empirical evidence that quantifies its robustness. Explorative techniques such as subgroup discovery (Grosskreutz & Rüping, 2009) have the benefit that patterns can be discovered based on how ‘interesting’ they are, for example based on how accurate the pattern is (ratio between true positives and false negatives) or how many instances it applies to. Typically, computer science
techniques have been developed in the context of
large datasets with many possible patterns to
explore, as it is not always clear which patterns can
be expected a-priori. From position tracking data,
many features can be derived resulting in a multitude
of features. Therefore, the data mining tools from
computer science are well-suited to deal with the
complexity of position tracking data.
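One common way to put a number on the question of whether a pattern could have been found by chance is a permutation test. The sketch below is an assumption about how such a check might look and is not the procedure of any reviewed study: the function names, the toy outcome data and the quality measure (difference in success rate inside versus outside a subgroup) are all hypothetical.

```python
import random


def success_rate_gap(outcomes, in_subgroup):
    """Success rate inside the subgroup minus the success rate outside it."""
    inside = [o for o, m in zip(outcomes, in_subgroup) if m]
    outside = [o for o, m in zip(outcomes, in_subgroup) if not m]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def permutation_p_value(outcomes, in_subgroup, n_permutations=1000, seed=0):
    """Estimate how often a gap at least as large arises with shuffled outcomes."""
    rng = random.Random(seed)
    observed = success_rate_gap(outcomes, in_subgroup)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between subgroup and outcome
        if success_rate_gap(shuffled, in_subgroup) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 100 passes, 20 of which fall in the discovered subgroup
# and all of which succeeded (1), while the remaining 80 failed (0).
outcomes = [1] * 20 + [0] * 80
in_subgroup = [True] * 20 + [False] * 80
p_value = permutation_p_value(outcomes, in_subgroup)
```

A small estimate suggests the pattern is unlikely to be an artefact of chance in this dataset; it says nothing about whether the pattern is tactically meaningful, which is where domain expertise comes in.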
One could argue that unlocking the full potential of big data for sports science and practice requires bringing the two domains, and thus two distinctly different paradigms, together, as their contributions can be regarded as complementary. Doing so, however, requires one to understand the challenges and opportunities of a multidisciplinary interplay between the domains of sports science and computer science (Rein & Memmert, 2016). Several
authors have addressed this question in previously
published narrative studies: Rein and Memmert
(2016) have discussed the potential of applying
big data in tactical analysis, but also discussed
how it challenges the methodological approaches
native to sports sciences. Memmert et al. (2017)
have applied techniques from both domains to a
position tracking dataset of one professional match
to illustrate the potential of using contributions
from both domains. Gudmundsson and Horton
(2017) have provided an overview of mostly
computer science techniques available in sports for
the study of spatiotemporal behaviour. Stein et al.
(2018) have described the entire process from
data acquisition, to storage, to ultimately analysis
and interpretation, in an attempt to provide an
overview of different segments of the process of uti-
lizing big data for performance analysis. Although
these studies all refer to challenges as well as the
potential of multidisciplinary collaboration, none
of these studies actually put the contributions
from both domains into one framework nor do
they discuss the operationalization of such a
collaboration.
The integration of fundamental computer science
work into applied settings (i.e. data science) has
been discussed in other applied domains, illustrating
the benefits of integrating these techniques in differ-
ent settings. Gandomi and Haider (2015) have dis-
cussed the challenges and opportunities of applying
big data in general, while more specific examples of
integrating computer science techniques in specific
settings outside of sports include forecasting and
pattern mining of financial time-series in economics
(Cao & Tay, 2003), development of individual
video recommendation systems in media and enter-
tainment (Davidson, 2010), and spatiotemporal
analysis of geographical data in geographic and
earth sciences (Peuquet & Duan, 1995). These
examples illustrate that application of techniques
from computer science can support analysis and
innovation in other areas. With the current review,
we aim to outline a framework that integrates contri-
butions from the domains of sports science and com-
puter science in the study and analysis of tactical
behaviour in soccer using position tracking data, and
discuss the additional insights that can be gained
from this integration. We specifically focus on the
identification of challenges and opportunities with
regard to the utilization of expertise from the
domains of sports science and computer science, as
both domains benefit from a conceptual model that
outlines where each domain complements the other
in analysing tactical behaviour in soccer using pos-
itional tracking data.
2. Methods
2.1. Literature search
A systematic review of the available literature was
conducted according to PRISMA (Preferred Report-
ing Items for Systematic reviews and Meta-analyses)
guidelines (Moher et al., 2015). A literature search
was conducted on 14 June 2019 to identify studies
that report the use of position tracking data to
analyse tactical behaviour in soccer (Figure 1).
Specifically, the following electronic databases were
searched: Science Direct, Dimensions, Computer
Science Bibliographies, PubMed, Scopus, ACM
Digital Library, IEEE Xplore.
Titles and/or abstracts of all records in an electronic database were searched for the combination of the following search terms: soccer OR football AND tactic* OR strateg* OR formation* OR inter-player OR interteam OR spatiotemporal NOT robo*.
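The search string above does not state operator precedence, so the following sketch makes one plausible reading explicit: a sport term AND at least one tactical term, with robotics records excluded. Both the grouping and the treatment of the truncated stems (tactic, strateg, formation, robo) as prefix matches are assumptions, and the function name is hypothetical; each database applies its own query syntax.

```python
import re

# Assumed grouping: (soccer OR football) AND (tactical terms) NOT robo...
SPORT = re.compile(r"\b(soccer|football)\b", re.IGNORECASE)
TOPIC = re.compile(
    r"\b(tactic\w*|strateg\w*|formation\w*|inter-?player|inter-?team|spatiotemporal)\b",
    re.IGNORECASE,
)
EXCLUDE = re.compile(r"\brobo\w*", re.IGNORECASE)


def matches_search(text):
    """Return True if a title or abstract satisfies the assumed search logic."""
    return (
        bool(SPORT.search(text))
        and bool(TOPIC.search(text))
        and not EXCLUDE.search(text)
    )


examples = [
    "Tactical formations in elite soccer",   # sport term and topic term
    "Strategy learning in robot soccer",     # excluded by the robotics stem
    "Injury rates in football",              # sport term only
]
results = [matches_search(title) for title in examples]
```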
Furthermore, additional studies to consider were
identified by manually searching the reference lists
of included papers.
2.2. Study selection
To be considered for this review, studies had to
concern tactical behaviour and meet the inclusion cri-
teria outlined in Table I. For the purpose of this
review, tactical behaviour was defined as how a
team or individual manages its spatial position over
time to achieve a shared goal (i.e. scoring), while
adapting to, and interacting with the opponent
under constraints of the conditions of play (Gré-
haigne et al., 1999). We operationalized this by
searching for studies that at least included data and
analysis on the interactions in space and time on
the inter-team as well as intra-team level.
The first author conducted the first selection based on titles and abstracts. Any study that clearly did not meet the inclusion criteria was excluded at this stage. When a confident decision based on the title and abstract could not be made, the study was included for full-text analysis. Next, the eligibility for inclusion was assessed based on analysis of full-text papers by the first author of this review. The final selection was then validated by at least one of the co-authors. Any ambiguities regarding the inclusion of papers were discussed by the authors of the review until consensus was reached.
2.3. Data extraction
All included studies were classified as sports science
(1) or computer science (2) based on the journal or
conference they were published in, as well as the
associated keywords. Next, information on data col-
lection was extracted. To review the contributions
of all studies to the components of feature construction and modelling & analysis (Figure 2), we
extracted data on the spatial aggregation features,
window selection, and techniques applied for analy-
sis. Furthermore, data was extracted on the link
with match performance, the problem definition or
aim of the study, and the inclusion of a theoretical
definition of tactical behaviour to review the inter-
pretability of all included studies. Finally, all findings
were categorized and put into a single framework
(Figure 2), that will serve as the context for the dis-
cussion of our findings, and as a proposed structure
for the utilization of expertise from the domains of
sports science and computer science in the study
and analysis of tactical behaviour. All data extraction was based on full-text assessment by the first author of this review. Data extraction tables (Supplementary Data) were developed based on consensus between all authors.
Table I. Inclusion and exclusion criteria for the systematic literature search.
Inclusion criteria:
• Published in the last 15 years
• Full-text publication in English
• Published as a peer-reviewed journal or conference paper
• Tactical analysis based on position tracking data (LPM, GPS or Optical Tracking)
• Data collected in matches or SSGs
• Data collected in soccer
Exclusion criteria:
• No full-text available (in English)
• Analysis based only on notational data
• Data collected in futsal
• Data available for only one team
• Data available for fewer than two players
Notes: LPM, Local Position Measurement system with Radio Frequency Identification (RFID) (Frencken et al., 2010); GPS, Global Positioning System; SSGs, Small-sided games.
3. Results
The initial database search returned 2290 records to be considered for inclusion. An additional 48 papers were identified based on manual inspection of the reference lists of already included papers (see ‘Identification’ in Figure 1). As a result, a total of 2338 records were screened based on title and abstract, of which 146 were considered for full-text assessment (see ‘Screening’ in Figure 1). After full-text assessment, 73 records were excluded because they did not meet our inclusion criteria (see ‘Eligibility’ in Figure 1). The remaining 73 records (Aguiar, Gonçalves, Botelho, Lemmink, & Sampaio, 2015; Andrienko et al., 2017; Aquino et al., 2016a, 2016b; Baptista et al., 2018; Batista et al., 2019; Barnabé, Volossovitch, Duarte, Ferreira, & Davids, 2016; Bartlett, Button, Robins, Dutt-Mazumder, & Kennedy, 2012; Bialkowski et al., 2014a, 2014b, 2014c, 2016; Castellano, Fernandez, Echeazarra, Barreira, & Garganta, 2017; Chawla, Estephan, Gudmundsson, & Horton, 2017; Clemente, Couceiro, Martins, Mendes, & Figueiredo, 2013a, 2013b, 2014; Couceiro, Clemente, Martins, & Machado, 2014; Coutinho et al., 2017, 2018; Duarte et al., 2012, 2013a, 2013b; Fernandez & Bornn, 2018; Figueira, Gonçalves, Masiulis, & Sampaio, 2018; Filetti, Ruscello, D’Ottavio, & Fanelli, 2017; Folgado, Gonçalves, Abade, & Sampaio, 2014a; Frencken et al., 2011; Frencken, De Poel, Visscher, & Lemmink, 2012; Frencken, van der Plaats, Visscher, & Lemmink, 2013; Frias & Duarte, 2014; Gonçalves, Figueira, Maçãs, & Sampaio, 2014; Gonçalves et al., 2017a, 2017b; Gonçalves, Marcelino, Torres-Ronda, Torrents, & Sampaio, 2016; Grunz, Memmert, & Perl, 2012; Gudmundsson & Wolle, 2010; Janetzko et al., 2014; Janetzko, Stein, Sacha, & Schreck, 2016; Knauf, Memmert, & Brefeld, 2016; Link, Lang, & Seidenschwarz, 2016; Machado et al., 2017; Memmert et al., 2017; Memmert, Raabe, Schwab, & Rein, 2019; Moura, Barreto Martins, Anido, De Barros, & Cunha, 2012; Moura et al., 2013, 2016; Olthof, Frencken, & Lemmink, 2015, 2018, 2019; Power, Ruiz, Wei, & Lucey, 2017; Ramos, Lopes, Marques, & Araújo, 2017; Rein, Raabe, & Memmert, 2017; Ric et al., 2017; Sampaio, Lago, Gonçalves, Macas, & Leite, 2014; Sampaio & Macas, 2012; Siegle & Lames, 2013; Silva et al., 2014a, 2014b, 2015, 2016a, 2016b; Spearman, Basye, Dick, Hotovy, & Pop, 2017; Stein et al., 2015, 2016; Travassos, Gonçalves, Marcelino, Monteiro, & Sampaio, 2014; Vilar, Araujo, Davids, & Bar-Yam, 2013, 2014a, 2014b; Wei, Sha, Lucey, Morgan, & Sridharan, 2013; Yue, Broich, Seifriz, & Mester, 2008a, 2008b; Zhang, Beernaerts, Zhang, & de Weghe, 2016) were included for analysis in the review. Of the included papers, 54 (74%) were qualified as sports science papers and 19 (26%) as computer science papers.
Figure 1. Flowchart of systematic literature search (in accordance with PRISMA guidelines) where the number of included studies during each of the stages of the search process is shown. The main reasons for exclusion based on full-text assessment, as well as the number of included studies are shown at the bottom.
Figure 2. Conceptual framework for the combination of sports science (translucent red bars) and computer science (translucent blue bars) expertise in the study of tactical behaviour in soccer, based on the results from the current systematic review. Bars with percentage represent the relative occurrence of a certain method or feature within a domain. Abbreviations: SSG, Small-Sided Games; LPM, Local Position Measurement.
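The screening counts and domain shares reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates the reported numbers; the variable names are hypothetical.

```python
# Counts as reported in the text for each stage of the search.
database_records = 2290
reference_list_records = 48
screened = database_records + reference_list_records   # records screened on title/abstract

full_text_assessed = 146
excluded_after_full_text = 73
included = full_text_assessed - excluded_after_full_text

# Domain classification of the included papers.
sports_science = 54
computer_science = 19

share_sports_science = round(100 * sports_science / included)      # percentage
share_computer_science = round(100 * computer_science / included)  # percentage
```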
Below, we will describe the results of our systema-
tic analysis of the literature. We examine various cat-
egories, including: Problem Definition, Data
Collection, Spatial Aggregation, Temporal Aggrega-
tion, and Modelling & Interpretation. We analyse
the included studies numerically, by describing how
often various categories occur. Moreover, we sum-
marize the different categories in a visual framework
where we combine the expertise from sports- and
computer-science domains (see Figure 2). This
figure will be used as a guide to explain the body of
literature that encompasses the study of tactical be-
haviour. Full details and data extracted from the
included studies can be found in the supplementary
data.
3.1. Problem definition
In most included sports science studies, research questions were driven by theoretical or practical domain expertise from, for example, physiology, behavioural science or psychology. Studies frequently aimed for practical implications, and study designs and data collection resulted from the research question. When looking at the problem definitions and aims of the included sports science papers, 63% studied the effect of an intervention on tactical behaviour, as is illustrated by the work of Olthof et al. (2018, 2019), who studied the effect of manipulating pitch sizes on tactical behaviour in different age groups, and the work of Gonçalves et al. (2016, 2017a, 2017b), who studied the effect of numerical imbalance between teams on tactical behaviour (Gonçalves et al., 2017a, 2017b; Ric et al., 2017). Twenty percent studied a variable/method to quantify tactical behaviour, as is illustrated by the work of Link et al. (2016), who conceptualized a new feature called ‘dangerousity’ to quantify offensive impact. Finally, 17% studied the relationship between variables (see ‘Problem Definition’ in Figure 2), as for example illustrated in the work of Rein et al. (2017), who studied the relation between pass effectiveness (quantified by the change in space control and the number of outplayed defenders) and success in 103 Bundesliga games.
In most included computer science studies on the
other hand, research questions were driven by theor-
etical and methodological domain expertise from for
example computer sciences, mathematics or data
science. These studies frequently aimed for new
methodological approaches and techniques rather
than practical implications. Furthermore, in many
cases the design could be considered data-driven:
rather than formulating hypotheses based on theory
and collecting data in an experimental set-up to test
these hypotheses, studies used large sets of available
data and generated hypotheses from the data. When
looking at the problem definitions of these studies,
5% studied the effect of an intervention or constraint,
as in the work by Bialkowski et al. (2014a, 2014b, 2014c), who studied the impact of home advantage on the dynamic formation of a team on the pitch. The majority (84%) of computer science contributions, however, studied a new technique or model (mostly classification or clustering problems), like the work by Fernandez and Bornn (2018), who proposed an improved model for measuring space control, the work by Andrienko et al. (2017), who proposed a new feature to quantify pressure on a player, or the work by Bialkowski et al. (2014a, 2014b, 2014c) and Grunz et al. (2012), who proposed new methods to identify patterns and formations in the data. Finally, 11% studied prediction or probability problems, as illustrated in the work by Spearman et al. (2017) and Chawla et al. (2017), who proposed models to predict whether a pass would arrive at a team-mate (see ‘Problem Definition’ in Figure 2).
3.2. Data collection
The type, quality, and quantity of data strongly influence the research questions that can be answered within the study of tactical behaviour, as well as the approach that can be used (see ‘Data Collection’ in Figure 2). Most studies (64%) used optical tracking data, as this is the system of choice in many professional competitions. As opposed to LPM and GPS systems, optical tracking systems typically allow tracking of the ball. However, they are also known to have a lower accuracy in comparison to wearable tracking devices, especially LPM (Frencken, Lemmink, & Delleman, 2010). Work by Mara, Morgan, Pumpa, and Thompson (2017) revealed that optical tracking systems suffer measurement errors in the range of −2.5 m to 2.5 m in measuring covered distance on 20 to 100 m (change of direction) runs. Although these errors could limit the use of optical tracking data for the analysis of physical performance, the subsequent errors of 0 to 0.5 m in measuring position still allow for accurate assessment of tactical behaviour, as the error margin is small enough for data to still represent actual positions. Only a minority (18%) of the studies used ball tracking, and a much larger part of the studies (42%) used the more time-consuming notational event data to study ball events. Sensor systems (36%) and experimental designs (48%) like small-sided games (SSGs) were exclusively used in sports science studies. As sensor systems do not allow ball tracking, event-based analyses are impossible without notational event data (Figure 2).
3.3. Spatial aggregation
Tracking the X and Y position of 22 players and the
ball 1100 times a second results in sizeable
amounts of data, even for one match, as well as a
high complexity as the 22 degrees of freedom of the
system allow for numerous potential interactions.
Therefore, most studies aggregate raw position data
by reducing the spatial positions of all players into
spatial features. More specifically, spatial aggregation
refers to the process of constructing features that
capture group-level behaviour per timeframe and
allow one to derive contextual meaning, as these features
reduce the system's complexity to an interpretable
level (see 'Spatial Aggregation' in Figure 2).
These features can be constructed at the macro
level (full team), as for example in work by Frencken
et al. (2012), who aggregated the positions of the
team into one team centroid, at the micro-level (subgroups
of at least two players), as in the work by
Memmert et al. (2017), who aggregated the positions
of a subgroup (e.g. defensive line) into a line centroid,
or even at the level of the individual, as in the work by
Olthof et al. (2015), who measured the average dis-
tance of all players to the team centroid (e.g. stretch
index). Furthermore, combinations of spatial aggre-
gates can be used to construct composite measures
of spatial (sub-)group interactions, as for example
presented in the work by Goes, Kempe, Meerhoff,
and Lemmink (2019), who constructed a measure of
pass effectiveness by using line centroids, team
spread and team surface areas. Most sports science
studies (84%) used some form of spatial aggregation,
most frequently (57%) centroid related features
(Frencken et al., 2011; Yue et al., 2008a,2008b), fol-
lowed by team surface areas and spread (Moura et al.,
2012) (46%), length and width (Folgado, Lemmink,
Frencken, & Sampaio, 2014b) (30%), and space
control (Rein et al., 2017) (7%). Distribution
amongst computer science studies is somewhat
similar, with 58% of the studies using spatial aggre-
gates, specifically centroid features (32%), length
and width (11%) and space control (11%).
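To make the most frequently reported spatial aggregates concrete, the following Python sketch computes a team centroid, the stretch index, team length and width, and the team surface area for a single timeframe. It is a minimal illustration and not the implementation used in any of the included studies. The function names, the coordinate convention (x along the pitch length, y along its width, both in metres), the choice of the convex hull as surface area, and the example frame are all assumptions made for this sketch.

```python
from math import hypot


def team_centroid(positions):
    """Mean (x, y) of the given players: a full team or a subgroup such as a line."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)


def stretch_index(positions):
    """Average distance of the players to their own centroid."""
    cx, cy = team_centroid(positions)
    return sum(hypot(x - cx, y - cy) for x, y in positions) / len(positions)


def length_and_width(positions):
    """Extent of the group along x (length) and along y (width)."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return (max(xs) - min(xs), max(ys) - min(ys))


def surface_area(positions):
    """Area of the convex hull of the players (monotone chain plus shoelace formula)."""
    pts = sorted(set(positions))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(points):
        hull = []
        for p in points:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # the last point is the first point of the other half

    hull = half_hull(pts) + half_hull(reversed(pts))
    doubled = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1])
    )
    return abs(doubled) / 2


# Hypothetical frame: four players on the corners of a 4 m x 2 m box, one in the middle.
frame = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0), (2.0, 1.0)]
print(team_centroid(frame), stretch_index(frame))
print(length_and_width(frame), surface_area(frame))
```

Applying the same functions to a subgroup (for example the positions of the defenders only) would give a line centroid in the sense described above. Repeating the computation for every timeframe yields one time series per feature.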
However, as data mining techniques can directly
be applied to the positional data without aggregating
it into features, a small minority of the sports science
studies (16%), and nearly half of the computer
science studies (42%) do not use spatial aggregation.
In these cases, patterns in the raw data can, for
example, be detected using unsupervised machine
learning techniques like clustering, as is illustrated
by the work of Grunz et al. (2012), Knauf et al.
(2016), and Machado et al. (2017), who all mine pat-
terns in the data by clustering the raw positions in
some way (Grunz et al., 2012; Knauf et al., 2016;
Machado et al., 2017). Furthermore, machine learn-
ing techniques also allow for the inclusion of many
features and studying their non-linear relationships,
as in the work by Power et al. (2017) and
Spearman et al. (2017), who model pass risk and
reward and the probability of a pass arriving, and
include a multitude of features (Power et al., 2017;
Spearman et al., 2017). In many of these computer
science contributions, the algorithm does feature
selection automatically. The main benefit of this is
that instead of creating features based on a-priori
assumed relationships between entities, (hidden)
relationships can be uncovered from the data. As fea-
tures are not created and selected based on expec-
tations of the user, but rather based on their
importance in the algorithm, they could prove to be
a better depiction of patterns in the data.
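As an illustration of mining patterns directly from raw positions, the sketch below clusters whole timeframes with a plain k-means procedure. This is a generic stand-in chosen for brevity and is not the method of Grunz et al. (2012), Knauf et al. (2016) or Machado et al. (2017). The flattened frame layout (x1, y1, x2, y2, ...), the use of the first k frames as initial cluster centres, and the toy data are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long coordinate vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(frames, k, iterations=20):
    """Lloyd's k-means on flattened position vectors.

    Assumption: the first k frames serve as initial centres, which keeps the
    sketch deterministic but is a poor choice for real data.
    """
    centres = [list(frame) for frame in frames[:k]]
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centre.
        labels = [
            min(range(k), key=lambda c: squared_distance(frame, centres[c]))
            for frame in frames
        ]
        # Update step: each centre moves to the mean of its member frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical frames for three players: compact and stretched shapes alternate.
frames = [
    (10, 10, 12, 10, 11, 12),
    (10, 10, 40, 10, 25, 40),
    (11, 10, 13, 11, 12, 12),
    (9, 10, 41, 12, 25, 38),
]
labels, centres = kmeans(frames, k=2)
print(labels)
```

No spatial feature is defined beforehand here: the grouping into compact and stretched configurations follows from the positions themselves. With real tracking data, player order within a frame would need to be made consistent first, which this sketch does not address.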
3.4. Temporal aggregation
To extract information, statistically compare, or
model time-series of either raw data or aggregated
spatial features, data needs to be aggregated within
the temporal domain as well (see 'Temporal Aggregation'
in Figure 2). Temporal aggregation refers to the
summarisation of data over a given time window, for
example by computing the mean value of a given feature.
We consider three different methods for temporal
aggregation: first of all, data can be aggregated (e.g.
averaged) over time windows with a fixed size, inde-
pendent of the context of the game (Sampaio &
Macas, 2012). In such methodologies, for example,
data is aggregated over the course of a half or full
match, or another time window with a fixed duration.
Secondly, data can also be aggregated over a window
with a fixed size that is linked to match events. An
example is looking at the 3 s following a pass (Goes
et al., 2019), or the 30 s before a goal (Frencken
et al., 2012). Finally, data can be aggregated over
windows with a flexible size. In these cases,
windows are always linked to events with variable
durations like a sequence of passes or running trajec-
tories (Rein et al., 2017; Spearman et al. 2017). The
majority of sports science studies (60%) utilized fixed
windows in which they often aggregate spatial data
over the course of a full SSG or match, while only a
minority aggregates over fixed (9%) or flexible
(24%) event-based windows. However, the majority
of computer science studies aggregated over fixed
(26%) or flexible (42%) event-based windows, and
only a minority (32%) aggregated over fixed
windows independent of context.
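The three windowing strategies distinguished above can be sketched in a few lines of Python. The sketch assumes that a spatial feature is available as a list with one value per frame, that events are given as frame indices, and that the mean is the chosen summary statistic. The function names, the sampling rate and the toy series are hypothetical.

```python
from statistics import mean


def fixed_windows(series, window):
    """Mean per consecutive, non-overlapping window of `window` frames,
    independent of what happens in the game."""
    return [mean(series[i:i + window]) for i in range(0, len(series), window)]


def after_event(series, event_frames, hz, seconds):
    """Mean over a fixed-duration window starting at each event frame,
    e.g. the 3 s following a pass. Windows are cut short at the series end."""
    n = int(seconds * hz)
    return [mean(series[f:f + n]) for f in event_frames if series[f:f + n]]


def flexible_windows(series, spans):
    """Mean over event-defined windows of variable length, given as
    (start, end) frame pairs with `end` exclusive, e.g. one pass sequence each."""
    return [mean(series[start:end]) for start, end in spans if end > start]


# Hypothetical feature (e.g. stretch index in metres) sampled at 2 Hz.
stretch = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 3.0, 3.0]
print(fixed_windows(stretch, window=4))
print(after_event(stretch, event_frames=[2], hz=2, seconds=2))
print(flexible_windows(stretch, spans=[(0, 4), (4, 8)]))
```

The same series therefore yields different summaries depending on how the windows are anchored, which is why the choice of temporal aggregation affects what can be concluded from a feature.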
3.5. Modelling & interpretation
Most included sports science studies utilized statisti-
cal models, and models rooted in the dynamical
systems theory like relative-phase (Palut & Zanone,
2005) and entropy (Pincus, 1991,1995) analyses
that allow for time-series analysis. These models are
generally based on linear relationships and allow
comparison of multiple conditions, the study of
relationships between variables, and testing specific
hypotheses. Furthermore, they are interpretable on
the level of individual features. Most computer
science studies on the other hand used methods
that are in comparison computationally complex
(i.e. require more computations and therefore more
processing power), like various machine learning
approaches. These approaches allow the study of
(non-) linear complex relationships amongst many
different features and the discovery of hidden pat-
terns in the data, but require specific (programming)
skills and often high-performance computing clus-
ters, and can be harder to interpret, especially
without the methodological domain expertise.
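As an example of the entropy analyses mentioned above, the following sketch follows the standard definition of approximate entropy (Pincus, 1991): the average log-likelihood that runs of m values that match within a tolerance r still match when extended by one value. It is a minimal, unoptimised illustration. The tolerance is treated here as an absolute value (in practice it is often set relative to the standard deviation of the series), and the default parameters and test series are assumptions.

```python
from math import log


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a one-dimensional series.

    Self-matches are counted, as in the original definition, so the
    logarithm is always defined.
    """
    def phi(dim):
        n = len(series) - dim + 1
        templates = [series[i:i + dim] for i in range(n)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r under the maximum norm.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += log(matches / n)
        return total / n

    return phi(m) - phi(m + 1)


# A strictly alternating series: the next value always follows from the last two.
regular = [0, 1] * 20
# A series in which two consecutive values do not determine the next one.
irregular = [0, 0, 0, 1, 0, 1, 1, 1] * 5

print(approximate_entropy(regular))
print(approximate_entropy(irregular))
```

The alternating series gives a value close to zero, whereas the second series gives a value close to ln 2, reflecting that the feature is harder to predict from its recent history. Applied to a spatial feature such as the stretch index, lower values would indicate more regular behaviour over time.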
To be able to interpret the practical impact of a
study on behaviour, it needs to be clear what (tactical)
behaviour was actually studied, and how changing
this behaviour impacts performance (see 'Modelling
& Interpretation' in Figure 2). Only 19% of the studies explicitly
defined tactical behaviour, and only one of these
(Janetzko et al., 2014) was classified as a computer
science study. Analysing the extracted definitions,
three common elements were identified: Tactical
performance/behaviour refers to (1) the dynamic
positioning and organisation in space and time, of a
team and its players on the pitch, in interaction with
and adapting to the movement of the ball, (2) movement
of the opponents, and conditions of play, and (3)
constitutes more than just the sum of individual
parts. As according to these criteria tactical behaviour
is emergent, it cannot be studied by breaking down
the behaviour of a team into 11 individual parts and
analysing them separately, as behaviour is the result
of interaction. Furthermore, only 30% used match
performance indicators (e.g. outcome, shots on
goal) in their study of tactical behaviour. Most
(86%) investigated the link between tactical features
and match performance using performance indi-
cators related to shots or goals. Interestingly, there
is little consensus on the relation of most tactical fea-
tures with performance (outcome). On the one hand,
studies that investigated the link between often-used
tactical features like the team-centroid did not find
a clear relationship with offensive events and per-
formance (Bartlett et al., 2012; Frencken et al.,
2012). On the other hand, authors who used more
complex tactical features like the team surface area
or spread (Moura et al., 2012,2016), or composite
features related to passing (Rein et al., 2017; Spear-
man et al. 2017) did report some relationship with
performance. These rather inconsistent reports on
the effect of tactical features on performance, as
well as the large variety of possible tactical features
to analyse, highlight how difficult it is to uncover
and interpret consistent and generalizable patterns
in tactics.
4. Discussion
With this review, we aimed to put the contributions of
sports and computer science to the analysis of tactical
behaviour in soccer using position tracking data into
perspective. Both domains contributed significantly
to the study of tactical behaviour, and provide a set
of unique approaches towards analytics. Our results
show that there are considerable differences in meth-
odology. We propose that both domains benefit from
a cyclical collaboration and from embedding each other's
domain expertise. Therefore, we provide a frame-
work for optimizing this collaboration by linking the
contributions from both domains to different parts
of the analytical process that entails the analysis of tac-
tical behaviour using position tracking data (Figure 2).
Our framework could support the field of sports ana-
lytics and specifically the analysis of tactical behaviour,
and result in a better translation to practice.
We have argued in our introduction that research
from sports science and research from computer
science is characterized by distinctly different, and
to some extent contrasting research paradigms. Our
results have revealed that this was also true for
research specifically concerning the study of tactical
behaviour using position tracking data. The sports
science studies we have included in this review were
predominantly characterized by deductive reasoning
in which hypotheses were formed based on theory,
and tested in mostly experimental settings. This is
clearly illustrated by many of the included sports
science works, like those by Aguiar et al. (2015), Bap-
tista et al. (2018), Coutinho et al. (2017,2018),
Duarte et al. (2012), Frencken et al. (2011,2013),
or Olthof et al. (2015,2018), who all presented a
theoretical framework to study and understand tacti-
cal behaviour that is rooted in the dynamical systems
theory (Aguiar et al., 2015; Baptista et al., 2018,
2019; Coutinho et al., 2017, 2018; Duarte et al.,
2012; Frencken et al., 2012, 2013; Olthof et al., 2015,
2018), and specifically designed experimental set-ups
with small-sided games to analyse behaviour against
the backdrop of this framework. The aims of these
sports science studies are generally focused on advan-
cing our understanding of tactical behaviour, and
applying the findings in practice to for example
improve training design or talent identification and
development. This is illustrated in studies like those
by Gonçalves et al. (2016,2017a,2017b), who
studied the impact of numerical imbalance and
spatial constraints on tactical behaviour in small-
sided games, to optimize training design (Gonçalves
et al., 2016, 2017a, 2017b). Another example is the work by Olthof
et al. (2015,2018,2019), who studied the impact
of field size on tactical behaviour in small-sided
games and compared that behaviour to behaviour
seen in a real match, to find out what design would
be the best format to improve match performance.
The included computer science studies, on the
other hand, provide a very different perspective.
The studies we included from this domain generally
do not present any theoretical context to explain tac-
tical behaviour, nor do they contain hypotheses about
what this behaviour would look like or how teams or
players would react to certain manipulations or
stimuli. We would like to argue that based on our
findings, this is not necessarily a shortcoming but
rather a matter of a different aim and perspective.
Rather than aiming for an increased understanding
and practical implications in sport, the computer
science studies we included were typically focussed
on advancing methodology and computational tech-
niques for data processing, modelling and extraction
of information by means of inductive designs that
centre on data mining, feature extraction and visual
analysis. This is illustrated, for example, by the work
of Bialkowski et al. (2014a, 2014b, 2014c, 2016)
and Wei et al. (2013), who presented new methods
to detect formations and identify positional roles
based on large observational datasets
collected in competition. Other examples are the work of Stein et al.
(2015, 2016) and Janetzko et al. (2014, 2016), who
presented data visualization and exploration techniques
that aim to optimize the workflow of video
analysts in professional soccer organizations,
and the work of Chawla et al. (2017), who presented
a model to accurately classify successful and
non-successful passes based on data. None of these
works extensively discuss practical applications,
explain the findings based on a theoretical under-
standing of tactical behaviour or advance our under-
standing of behaviour, have experimental designs or
result in direct practical implications on the level of
training and performance. However, this is by
design, as these contributions all aimed to propose
new techniques, features and data processing and
visualization routines instead.
The distinct difference in contributions from both
domains to the research on tactical behaviour is also
confirmed by other recent review studies on similar
topics. In systematic reviews characteristic for sports
science like those by Sarmento et al. (2014) and
Ometto et al. (2018), the focus is on how position
tracking data can be used to analyse performance
and monitor loading, or how to manipulate small-
sided games to change behaviour. On the other
hand, in typical computer science survey papers like
those by Perin et al. (2018), Gudmundsson and
Horton (2017) and Stein et al. (2017), the focus is
more technical, discussing topics from data manage-
ment to visualization and how to develop analytical
tools. Given the fundamental differences in expertise
and methodology, collaboration between both
domains can therefore be regarded a key challenge.
Most studies included in this review fit well into
one end of the sports science–computer science
spectrum, and collaborations between domains are
still relatively sparse. However, we have also included
multiple studies that gravitate towards the middle of
the spectrum and illustrate the added benefit of a
synergy between both domains. The studies by Link
et al. (2016), Rein et al. (2017), and Goes et al.
(2019), are examples of sports science work that uti-
lizes observational designs in which large datasets
were collected in competition and used for the devel-
opment and validation of new features that assess
some aspect of performance (Goes et al., 2019;
Link et al., 2016; Rein et al., 2017). Although in
these studies most involved scientists had a back-
ground in sports science, at least some of them also
had a background in computer science, helping them
apply computer science techniques for data processing,
visualization and analytics coming from
domains like mathematics, data mining and
machine learning, and information processing.
Despite their methodology, these studies were still
classified as sports science as their aim was not
necessarily the sole development of a new approach
or technique, but rather the validation of these
approaches by studying their relation to successful
performance and applying the approach for the
purpose of performance analysis. The work by Goes
et al. (2019) for example resulted in a new metric to
quantify the effectiveness of a pass that was con-
structed using clustering techniques and then
applied for player evaluation purposes, while the
work by Rein et al. (2017), was focussed on applying
multiple metrics that assess pass effectiveness by
studying their relation to offensive performance.
Just as we identified several sports science studies that
utilized techniques from other domains to advance
their research, we also identified multiple computer
science studies that did the same. The studies by
Power et al. (2017), Spearman et al. (2017),
Andrienko et al. (2017) and Fernandez and Bornn
(2018) can all be regarded as examples of studies
that predominantly involved expertise from computer
and data science, but that also involved domain
expertise from sports (science) (Andrienko et al.,
2017; Fernandez and Bornn, 2018; Power et al.,
2017; Spearman et al. 2017). These studies focussed
on feature development and modelling, as they con-
structed models for the assessment of pass risk and
reward, pressure, space control and pass probability.
Different to the sports science examples mentioned
before, the scope of these studies was methodologi-
cal, yet they typically validated their approach and
its assumed relation to performance based on
domain expertise, and provided several examples of
practical use cases based on data collected in compe-
tition. These examples from sports science and com-
puter science studies that utilize expertise from other
domains illustrate the additional benefits that can be
gained and can in some ways be regarded as tem-
plates for future collaborations.
The included studies that are illustrative of collaborations
between the domains of computer science
and sports science suggest that contributions from both
domains are compliant rather than concomitant.
We therefore propose that collaboration between
sports science and computer science in the process
of studying tactical behaviour using position tracking
data should be a cyclical rather than a parallel one.
Sports science tests theory and translates practical
problems into research questions. By applying tech-
niques from computer science to sports science
research designs one could come to different
answers to research questions. These answers might
differ in the sense that sports scientists could assess
different aspects of performance, but they could
also differ in the sense that these methods allow for
a more in-depth answer. Conversely,
research questions deduced from theory and observation
by sports science can be used by computer
science to define the scope of its search for, and
development of, appropriate technologies to derive
information from position tracking data. Computer
science provides the tools to gain in-depth knowledge
and enables sports science to test increasingly
complex hypotheses and ask new questions. As both
domains bring relevant expertise in relation to con-
ducting and interpreting tactical analyses, we
propose that impactful analytics relies on the combi-
nation of expertise from both domains.
The quality (i.e. accuracy, sampling frequency,
inclusion of ball data) and quantity of available data
have a big impact on most types of research and
cannot be ignored in any discussion of sports analytics.
Due to technological advancements, lower
costs, and growing interest (Rein & Memmert,
2016), we have seen an increase in the availability
and quality of data in soccer, similar to big data devel-
opments in other areas, providing numerous oppor-
tunities (Gandomi & Haider, 2015), like opponent-
analysis, scouting and performance optimization on
a team and individual level. However, based on our
results, these opportunities only seem to be seized
to a limited extent. Most sports science studies are
characterized by experimental set-ups in which
small samples of data are collected in a specific popu-
lation, to answer a predetermined research question
(Olthof et al., 2015; Travassos et al., 2014). Although
this kind of research has allowed us to draw general
inferences about what drives tactical behaviour of
groups, the small sample sizes and highly specific cir-
cumstances that are often different from a real match
also limit the use of findings from these studies in
real-life tactical analysis. As tactical behaviour is
highly dependent on the context (Gréhaigne et al.,
1999; Rein & Memmert, 2016), larger real-life datasets
collected in actual competitive matches in combination
with methodology that enables capturing complex pat-
terns might allow one to draw conclusions about per-
formance with a stronger ecological validity. Of
course, causation and correlation should not be con-
fused, but with large enough datasets, the discovered
patterns carry some weight and at the very least
provide a good basis for developing new theories that
can be further examined in more controlled settings.
On the other hand, handling and analysing much
larger datasets challenges back-end processes (i.e.
storing, pre-processing and querying) and analytics
(i.e. aggregation and feature construction) that are
not typically addressed by sports science research,
and can thus be regarded a key challenge. The
domain of computer science typically focuses on tech-
nological developments within these processes, and
collaboration could advance the ability of sports
science to work with increasingly large datasets.
As illustrated by the results in this review, the
majority of sports science studies utilizes low-level
(simple to compute and high reduction of complex-
ity) spatial features like the team centroid (Folgado
et al., 2014; Yue et al., 2008a,2008b), that aim to
capture group-level behaviour in one feature. The
computation of these features is relatively easy, and
their computational cost is low, yet as illustrated by
the results, they have limited value. Features like
the team centroid have often been developed to
study tactics in small-sided games, but seem incap-
able of fully capturing the complex dynamics of an
11-a-side match (Goes et al., 2019). Combining
computer science expertise on for example data
mining and machine learning, with sports science
theory provides many opportunities to innovate in
this aspect. A potential example could be applying
the work of Bialkowski et al. (2014a,2014b,2014c,
2016), that has resulted in methodology to automati-
cally and dynamically identify formations and pos-
itional roles. Applying this method in sports science
research like that of Memmert et al. (2017), Goes
et al. (2019) or Siegle and Lames (2013), who all
use line centroids in which the lines are based on
manual annotation of fixed positional roles, could
lead to different answers and new insights. The
other way around, applying the theoretical framework
of dynamical systems theory that is presented in for
example the sports science work by Frencken et al.
(2012,2013), to feature construction in computer
science work like that on quantifying pressure by
Andrienko et al. (2017), could lead to advanced
methods that use coupling between features and
movement synchrony of players to quantify pressure,
defensive strategies and off-ball performance of
offensive players. These are typical examples of cycli-
cal collaboration. The outcome of a collaboration like
this would for example allow one to innovate the way
we analyse the performance of a team during the
game, to support decision-making by the coach in
near-time, to analyse the opponent before the
match by studying patterns that characterise their
successful attacks, or to identify specific patterns to
emphasize and train in the own team.
Ultimately, spatial features no matter their com-
plexity hold little meaning when aggregated over a
full match, and temporal aggregation is essential to
place spatial behaviour in a temporal context (Gré-
haigne et al., 1999; Rein & Memmert, 2016). Most
included sports science studies aggregated over fixed
windows independent of game-context, like a match
or half (Duarte et al., 2013a,2013b; Gonçalves et al.,
2017a,2017b), which limits interpretability. We argue
that deriving meaning from spatial features requires
the use of event-based time-windows, which is more
common in computer science studies (Andrienko
et al., 2017; Chawla et al., 2017; Fernandez and
Bornn, 2018), as using event-based time-windows
allows one to draw conclusions about for example a
pass, dribble or set-piece. On such a small timescale,
it is much easier to find structural patterns than on
the level of the entire game. This in turn would allow
one to answer questions like what defines an effective
attack, or successful dribble. Although this might
seem like another opportunity for sports science to
implement existing computer science expertise, this
one is less straightforward than spatial aggregation,
and adequate temporal aggregation can be regarded
as a key challenge. As time-series analysis is typically
challenging for most machine learning techniques
(Fu, 2011), and sport and behavioural sciences actually
have a lot of expertise in time-series analysis, one could
argue that innovation here would definitely lie at the
interface between both domains.
Despite the often underlined potential (Memmert
et al., 2017; Rein & Memmert, 2016; Stein et al.,
2017) of position tracking data to study tactical be-
haviour, in sports, and specifically in soccer, the
application is still relatively limited (Rein &
Memmert, 2016; Folgado et al., 2014). Our results
demonstrated the contributions to this topic have
increased substantially over the recent years, and
already resulted in an in-depth understanding of
tactics in soccer. However, so far, these studies have
had little practical impact, and the potential of pos-
ition tracking data does not seem to be fully utilized
so far. We argue that changing this requires domain
expertise from sports science as well as computer
science embedded within a multidisciplinary
approach, which is a key challenge for sports analytics.
It also requires a clear link between methodology,
findings and real-life performance (i.e.
answering the question 'how does this help me/is
this related to winning the game?' asked by practitioners).
Understanding behaviour therefore
requires an approach that at least evaluates a certain
aspect within the context of others, and that
answers the key performance question 'how does
(changing) this behaviour impact our performance?'.
With this systematic review, we provided an evalu-
ation of contributions from sports science and compu-
ter science to the study of position tracking data for the
purpose of tactical analysis in soccer, and we have
shown how an interplay between both domains could
result in innovative contributions to the field of
sports analytics. One major limitation of the current
review is its narrow scope, as we largely ignored essen-
tial components of the data analytics process like data
acquisition, storage, management, visualization, as
well as ethics and privacy issues (Perin et al., 2018;
Stein et al., 2017). However, doing so allowed us to
discuss the opportunities for position tracking data to
impact tactical behaviour, whereas previous reports
have merely touched upon its potential. This has
resulted in the discussion of a set of challenges con-
cerning the data analytics process, specifically feature
construction, spatial and temporal aggregation that
could be resolved by multidisciplinary collaboration,
which is pivotal in unlocking the potential of position
tracking data in sports analytics.
5. Conclusion
With this review, we have shown the considerable
opportunities for collaboration between sports
science and computer science to study tactics in
soccer, particularly when using position tracking
data. Our systematic review highlights that sports-
and computer science research on tactical behaviour
contains distinctly different contributions. We pro-
posed a framework that could serve as the foundation
for the combination of sports science and computer
science expertise in tactical analysis. It has become
clear that the collaborations between both domains
benefit from a stronger dialogue yielding a cyclical
collaboration: sports science identifies problems and
tests theory hypotheses, computer science develops
robust techniques to solve such problems, and
sports science in turn adjusts theories and derives
practical implications from data by implementing
them.
Acknowledgements
This work was supported by grants of the Nether-
lands Organization for Scientific Research and
FAPESP (project title: The Secret of Playing Foot-
ball: Brazil vs. The Netherlands).
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplemental data
Supplemental data for this article can be accessed here
https://doi.org/10.1080/17461391.2020.1747552.
Funding
This work was supported by grants of the Netherlands Organiz-
ation for Scientific Research (629.004.012-SIA) and FAPESP
(2016/50250-1, 2017/20945-0 and 2018/19007-9).
ORCID
F.R. Goes http://orcid.org/0000-0002-5995-3792
L.A. Meerhoff http://orcid.org/0000-0003-4386-0919
F.A. Moura http://orcid.org/0000-0002-0108-7246
S.A. Cunha http://orcid.org/0000-0003-1927-0142
References
Aguiar, M., Gonçalves, B., Botelho, G., Lemmink, K., & Sampaio,
J. (2015). Footballers' movement behaviour during 2-, 3-, 4-
and 5-a-side small-sided games. Journal of Sports Sciences, 33,
1259–1266.
Andrienko, G., Andrienko, N., Budziak, G., Dykes, J., Fuchs, G.,
von Landesberger, T., & Weber, H. (2017). Visual analysis of
pressure in football. Data Mining and Knowledge Discovery, 31,
1793–1839.
Aquino, R. L., Gonçalves, L. G., Vieira, L. H., Oliveira, L. P.,
Alves, G. F., Santiago, P. R., & Puggina, E. F. (2016a).
Periodization training focused on technical-tactical ability in
young soccer players positively affects biochemical markers
and game performance. Journal of Strength and Conditioning
Research, 30, 2723–2732.
Aquino, R. L., Gonçalves, L. G., Vieira, L. H., Oliveira, L. P.,
Alves, G. F., Santiago, P. R., & Puggina, E. F. (2016b).
Biochemical, physical and tactical analysis of a simulated
game in young soccer players. Journal of Sports Medicine and
Physical Fitness, 56, 1554–1561.
Araújo, D., Passos, P., Esteves, P., Duarte, R., Lopes, J.,
Hristovski, R., & Davids, K. (2015). The micro-macro link in
understanding sport tactical behaviours: Integrating infor-
mation and action at different levels of system analysis in
sport. Movement & Sport Sciences - Science & Motricité, 53–63.
doi:10.1051/sm/2015028
Balague, N., Torrents, C., Hristovski, R., Davids, K., & Araújo, D.
(2013). Overview of complex systems in sport. Journal of Systems
Science and Complexity, 26, 4–13.
Baptista, J., Travassos, B., Gonçalves, B., Mourão, P., Viana, J. L.,
& Sampaio, J. (2018). Exploring the effects of playing for-
mations on tactical behaviour and external workload during
football small-sided games. Journal of Strength and
Conditioning Research. Epub ahead of print. doi:10.1519/JSC.0000000000002445
Barnabé, L., Volossovitch, A., Duarte, R., Ferreira, A. P., &
Davids, K. (2016). Age-related effects of practice experience
on collective behaviours of football players in small-sided
games. Human Movement Science, 48, 74–81.
Bartlett, R., Button, C., Robins, M., Dutt-Mazumder, A., &
Kennedy, G. (2012). Analysing team coordination patterns
from player movement trajectories in soccer: Methodological
considerations. International Journal of Performance Analysis in
Sport, 12, 398–424.
Batista, J., Goncalves, B., Sampaio, J., Castro, J., Abade, E., &
Travassos, B. (2019). The influence of coaches' instruction on
technical actions, tactical behaviour, and external workload in
football small-sided games. Montenegrin Journal of Sports
Science and Medicine, 8, 29–36.
Bialkowski, A., Lucey, P., Carr, P., Matthews, I., Sridharan, S., &
Fookes, C. (2016). Discovering team structures in soccer from
spatiotemporal data. IEEE Transactions on Knowledge and Data
Engineering, 28, 2596–2605.
Bialkowski, A., Lucey, P., Carr, P., Yue, Y., & Matthews, I.
(2014b). Win at home and draw away: Automatic formation
analysis highlighting the differences in home and away team
behaviors. MIT Sloan Sports Analytics Conference.
Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., &
Matthews, I. (2014a). Identifying team style in Soccer using for-
mations learned from spatiotemporal tracking data. 2014 IEEE
International Conference on Data Mining Workshop 914.
doi:10.1109/ICDMW.2014.167
Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., &
Matthews, I. (2014c). Large-scale analysis of soccer matches
using spatiotemporal tracking data. 2014 IEEE International
Conference on Data Mining 725730. IEEE. doi:10.1109/
ICDM.2014.133
Cao, L. J., & Tay, F. E. H. (2003). Support vector machine with
adaptive parameters in financial time series forecasting. IEEE
Transactions Neural Networks,14, 15061518.
Castellano, J., Fernandez, E., Echeazarra, I., Barreira, D., &
Garganta, J. (2017). Influence of pitch length on inter- and
Unlocking the potential of big data to support tactical performance analysis 13
intra-team behaviors in youth soccer. Anales de Psicología,33,
486496.
Chawla, S., Estephan, J., Gudmundsson, J., & Horton, M. (2017).
Classification of passes in football matches using spatiotemporal
data. ACM Transactions on Spatial Algorithms and Systems,3,1
30.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L., Mendes, R.,
& Figueiredo, A. J. (2013a). Measuring tactical behaviour using
technological metrics: Case study of a football game.
International Journal of Sports Science & Coaching,8, 723
739.
Clemente, M. F., Couceiro, S. M., Martins, F. M. L., Mendes, R.,
& Figueiredo, A. J. (2013b). Measuring collective behaviour in
football teams: Inspecting the impact of each half of the match
on ball possession. International Journal of Performance Analysis
in Sport,13, 678689.
Clemente, F., Santos-Couceiro, M., Lourenco-Martins, F., Sousa,
R., & Figueiredo, A. (2014). Intelligent systems for analyzing
soccer games: The weighted centroid. Ingeniería e
Investigación,34,7075.
Couceiro, M. S., Clemente, F. M., Martins, F. M. L., &
Machado, J. A. T. (2014). Dynamical stability and predictability
of football players: The study of one match. Entropy,16, 645
674.
Coutinho, D., Gonçalves, B., Travassos, B., Wong, D. P., Coutts,
A. J., & Sampaio, J. E. (2017). Mental fatigue and spatial refer-
ences impair soccer playersphysical and tactical performances.
Frontiers in Psychology,8, 1645.
Coutinho, D., Gonçalves, B., Wong, D. P., Travassos, B., Coutts,
A. J., & Sampaio, J. (2018). Exploring the effects of mental and
muscular fatigue in soccer playersperformance. Human
Movement Science,58, 287296.
Davidson, J. (2010). The YouTube video recommendation
system. Proceedings of the fourth ACM conference on
Recommender systems RecSys 10 293. ACM Press.
doi:10.1145/1864708.1864770
Duarte, R., Araújo, D., Correia, V., Davids, K., Marques, P., &
Richardson, M. J. (2013a). Competing together: Assessing the
dynamics of team-team and player-team synchrony in pro-
fessional association football. Human Movement Science,32,
555566.
Duarte, R., Araújo, D., Folgado, H., Esteves, P., Marques, P., &
Davids, K. (2013b). Capturing complex, non-linear team beha-
viours during competitive football performance. Journal of
Systems Science and Complexity,26,6272.
Duarte, R., Araújo, D., Freire, L., Folgado, H., Fernandes, O., &
Davids, K. (2012). Intra- and inter-group
coordination patterns reveal collective behaviors of football
players near the scoring zone. Human Movement Science,31,
16391651.
Fernandez, J., & Bornn, L. (2018). Wide open spaces : A statistical
technique for measuring space creation in professional soccer.
MIT Sloan Sports Analytics Conference,119.
Figueira, B., Gonçalves, B., Masiulis, N., & Sampaio, J. (2018).
Exploring how playing football with different age groups
affects tactical behaviour and physical performance. Biology of
Sport,35, 145153.
Filetti, C., Ruscello, B., DOttavio, S., & Fanelli, V. (2017). A
study of relationships among technical, tactical, physical par-
ameters and final outcomes in elite soccer matches as analyzed
by a semiautomatic video tracking system. Perceptual and Motor
Skills,124, 601620.
Folgado, H., Gonçalves, B., Abade, E., & Sampaio, J. (2014a).
Brief overview of research and applications using football
playerspositional data. Rev Kronos,13,18.
Folgado, H., Lemmink, K. A. P. M., Frencken, W., & Sampaio, J.
(2014b). Length, width and centroid distance as measures of
teams tactical performance in youth football. European Journal
of Sport Science,14, S487S492.
Frencken, W., De Poel, H., Visscher, C., & Lemmink, K. (2012).
Variability of inter-team distances associated with match events
in elite-standard soccer. Journal of Sports Sciences,30, 1207
1213.
Frencken, W. G. P., Lemmink, K. A. P. M., & Delleman, N. J.
(2010). Soccer-specific accuracy and validity of the local pos-
ition measurement (LPM) system. Journal of Science and
Medicine in Sport,13(6), 641645. doi:10.1016/j.jsams.2010.
04.003.
Frencken, W., Lemmink, K., Delleman, N., & Visscher, C.
(2011). Oscillations of centroid position and surface area of
soccer teams in small-sided games. European Journal of Sport
Science,11, 215223.
Frencken, W., van der Plaats, J., Visscher, C., & Lemmink, K.
(2013). Size matters: Pitch dimensions constrain interactive
team behaviour in soccer. Journal of Systems Science and
Complexity,26,8593.
Frias, T., & Duarte, R. (2014). Man-to-man or zone defense ?
Measuring team dispersion behaviors in small-sided soccer
games. Trends in Sports Science,3, 135144.
Fu, T. C. (2011). A review on time series data mining. Engineering
Applications of Artificial Intelligence,24, 164181.
Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data
concepts, methods, and analytics. International Journal of
Information Management,35, 137144.
Goes, F. R., Kempe, M., Meerhoff, L. A., & Lemmink, K. A. P. M.
(2019). Not every pass can be an assist: A data-driven model to
measure pass effectiveness in professional soccer matches. Big
Data,6(1), 5770. doi:10.1089/big.2018.0067.
Gonçalves, B., Coutinho, D., Santos, S., Lago-Penas, C., Jiménez,
S., & Sampaio, J. (2017a). Exploring team passing networks and
player movement dynamics in youth association football. PLoS
One,12(1), 113. doi:10.1371/journal. pone.0171156.
Gonçalves, B., Esteves, P., Folgado, H., Ric, A., Torrents, C., &
Sampaio, J. (2017b). Effects of pitch area-restrictions on tactical
behavior, physical, and physiological performances in soccer
large-sided games. Journal of Strength and Conditioning
Research,31, 23982408.
Gonçalves, B. V., Figueira, B. E., Maçãs, V., & Sampaio, J. (2014).
Effect of player position on movement behaviour, physical and
physiological performances during an 11-a-side football game.
Journal of Sports Sciences,32, 191199.
Gonçalves, B., Marcelino, R., Torres-Ronda, L., Torrents, C., &
Sampaio, J. (2016). Effects of emphasising opposition and
cooperation on collective movement behaviour during football
small-sided games. Journal of Sports Sciences,34, 13461354.
Gréhaigne, J.-F., Godbout, P., & Bouthier, D. (1999). The foun-
dations of tactics and strategy in team sports. Journal of
Teaching in Physical Education,18, 159174.
Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in
numerical domains. Data Mining and Knowledge Discovery,19,
210226.
Grunz, A., Memmert, D., & Perl, J. (2012). Tactical pattern recog-
nition in soccer games by means of special self-organizing maps.
Human Movement Science,31, 334343.
Gudmundsson, J., & Horton, M. (2017). Spatio-temporal analysis
of team sports. ACM Computing Surveys,50(2), 122.
Gudmundsson, J., & Wolle, T. (2010). Towards automated foot-
ball analysis. Proc. 10th Conf. Math. Comput. Sport.
Janetzko, H., Sacha, D., Stein, M., Schreck, T., Keim, D. A., &
Deussen, O. (2014). Feature-driven visual analytics of soccer
data. 2014 IEEE Conference on Visual Analytics Science and
Technology (VAST),1322.
Janetzko, H., Stein, M., Sacha, D., & Schreck, T. (2016).
Enhancing parallel coordinates: Statistical visualizations for
14 F. Goes et al.
analyzing soccer data. IS&T Electronic Imaging Conference on
Visualization and Data Analysis,14 FEB 2016 - 18 FEB 2016
(San Francisco, California), 18.
Knauf, K., Memmert, D., & Brefeld, U. (2016). Spatio-temporal
convolution kernels. Machine Learning,102, 247273.
Labrinidis, A., & Jagadish, H. V. (2012). P2032-Labrinidis.Pdf.
20322033. doi:10.14778/2367502.2367572
Link, D., Lang, S., & Seidenschwarz, P. (2016). Real time quanti-
fication of dangerousity in football using spatiotemporal track-
ing data. PLoS One,11, e0168768.
Machado, V., Leite, R., Moura, F., Cunha, S., Sadlo, F., &
Comba, J. L. D. (2017). Visual soccer match analysis using
spatiotemporal positions of players. Computers & Graphics,68,
8495.
Manafifard, M., Ebadi, H., & Moghaddam, H. A. (2017). A survey
on player tracking in soccer videos. Computer Vision and Image
Understanding,159,1946.
Mara, J., Morgan, S., Pumpa, K., & Thompson, K. (2017). The
accuracy and reliability of a new optical player tracking system
for measuring displacement of soccer players. International
Journal of Computer Science in Sport,16, 175184.
Memmert, D., Lemmink, K. A. P. M., & Sampaio, J. (2017).
Current approaches to tactical performance analyses in soccer
using position data. Sports Medicine,47,110.
Memmert, D., Raabe, D., Schwab, S., & Rein, R. (2019).
A tactical comparison of the 4-2-3-1 and 3-5-2 formation in
soccer: A theory-oriented, experimental approach based
on positional data in an 11 vs. 11 game set-up. PLoS One,14,
112.
Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A.,
Petticrew, M., Stewart, L. A. (2015). Preferred reporting
items for systematic review and meta-analysis protocols
(PRISMA-P) 2015 statement. Systematic Review,4(1), 1.
Moura, F. A., Barreto Martins, L. E., Anido, R. D. O., De Barros,
L., & Cunha, R. M. (2012). Quantitative analysis of Brazilian
football playersorganisation on the pitch. Sports Biomechanics,
11,8596.
Moura, F. A., Martins, L. E. B., Anido, R. O., Ruffino, P. R. C.,
Barros, R. M. L., & Cunha, S. A. (2013). A spectral analysis of
team dynamics and tactics in Brazilian football. Journal of Sports
Sciences,31, 15681577.
Moura, F. A., van Emmerik, R. E. A., Santana, J. E., Martins,
L. E. B., Barros, R. M. L. d., & Cunha, S. A. (2016).
Coordination analysis of playersdistribution in football using
cross-correlation and vector coding techniques. Journal of
Sports Sciences,34, 22242232.
Olthof, S. B. H., Frencken, W. G. P., & Lemmink, K. A. P. M.
(2015). The older, the wider: On-field tactical behavior of
elite-standard youth soccer players in small-sided games.
Human Movement Science,41,92102.
Olthof, S. B. H., Frencken, W. G. P., & Lemmink, K. A. P. M.
(2018). Match-derived relative pitch area changes the physical
and team tactical performance of elite soccer players in small-
sided soccer games. Journal of Sports Sciences,36, 15571563.
Olthof, S. B. H., Frencken, W. G. P., & Lemmink, K. A. P. M.
(2019). A match-derived relative pitch area facilitates the tacti-
cal representativeness of small-sided games for the official
soccer match. Journal of Strength and Conditioning Research,33,
523530.
Ometto, L., Vasconcellos, F. V., Cunha, F. A., Teoldo, I., Souza,
C. R. B., Dutra, M. B., Davids, K. (2018). How manipulat-
ing task constraints in small-sided and conditioned games
shapes emergence of individual and collective tactical beha-
viours in football: A systematic review. International Journal of
Sports Science & Coaching,13, 12001214.
Palut, Y., & Zanone, P. G. (2005). A dynamical analysis of tennis:
Concepts and data. Journal of Sports Sciences,23, 10211032.
Perin, C., Vuillemot, R., Stolper, C. D., Stasko, J. T., Wood, J., &
Carpendale, S. (2018). State of the art of sports data visualiza-
tion. Computer Graphics Forum,37, 663686.
Peuquet, D. J., & Duan, N. (1995). An event-based spatiotemporal
data model (ESTDM) for temporal analysis of geographical data.
International Journal of Geographical Information Systems,9,724.
Pincus, S. (1995). Approximate entropy (ApEn) as a complexity
measure. Chaos,5, 110117.
Pincus, S. M. (1991). Approximate entropy as a measure of system
complexity. Proceedings of the National Academy of Sciences,88,
22972301.
Power, P., Ruiz, H., Wei, X., & Lucey, P. (2017). Not all passes are
created equal. Proc. 23rd ACM SIGKDD Int. Conf. Knowl.
Discov. Data Min. KDD 17, 16051613.
Ramos, J., Lopes, R. J., Marques, P., & Araújo, D. (2017).
Hypernetworks reveal compound variables that capture coop-
erative and competitive interactions in a soccer match.
Frontiers in Psychology,8,112.
Rein, R., & Memmert, D. (2016). Big data and tactical analysis in
elite soccer: Future challenges and opportunities for sports
science. Springerplus,5, 1410.
Rein, R., Raabe, D., & Memmert, D. (2017). Which pass is better?
Novel approaches to assess passing effectiveness in elite soccer.
Human Movement Science,55, 172181.
Ric, A., Torrents, C., Gonçalves, B., Torres-Ronda, L., Sampaio,
J., & Hristovski, R. (2017). Dynamics of tactical behaviour in
association football when manipulating playersspace of inter-
action. PLoS One,12(1), e0180773.
Sampaio, J. E., Lago, C., Gonçalves, B. V., Macas, V. M., & Leite,
N. (2014). Effects of pacing, status and unbalance in time
motion variables, heart rate and tactical behaviour when
playing 5-a-side football small-sided games. Journal of Science
and Medicine in Sport,17, 229233.
Sampaio, J., & Macas, V. (2012). Measuring tactical behaviour in
football. International Journal of Sports Medicine,33, 395401.
Sarmento, H., Marcelino, R., Anguera, M. T., CampaniÇo, J.,
Matos, N., & LeitÃo, J. C. (2014). Match analysis in football:
A systematic review. Journal of Sports Sciences,32, 18311843.
Seifert, L., Araújo, D., Komar, J., & Davids, K. (2017).
Understanding constraints on sport performance from the com-
plexity sciences paradigm: An ecological dynamics framework.
Human Movement Science,56(April), 178180. doi:10.1016/j.
humov.2017.05.001.
Siegle, M., & Lames, M. (2013). Modeling soccer by means of rela-
tive phase. Journal of Systems Science and Complexity,26,1420.
Silva, P., Aguiar, P., Duarte, R., Davids, K., Araújo, D., &
Garganta, J. (2014a). Effects of pitch size and skill level on tac-
tical behaviours of association football players during small-
sided and conditioned games. International Journal of Sports
Science & Coaching,9, 9931006.
Silva, P., Chung, D., Carvalho, T., Cardoso, T., Davids, K.,
Araújo, D., & Garganta, J. (2016a). Practice effects on intra-
team synergies in football teams. Human Movement Science,
46,3951.
Silva, P., Duarte, R., Sampaio, J., Aguiar, P., Davids, K., Araújo, D.,
& Garganta, J. (2014b). Field dimension and skill level constrain
team tactical behaviours in small-sided and conditioned games in
football. Journal of Sports Sciences,32, 18881896.
Silva, P., Esteves, P., Correia, V., Davids, K., Araújo, D., &
Garganta, J. (2015). Effects of manipulations of player
numbers vs. field dimensions on inter-individual coordination
during small-sided games in youth football. International
Journal of Performance Analysis in Sport,15, 641659.
Silva, P., Vilar, L., Davids, K., Araújo, D., & Garganta, J. (2016b).
Sports teams as complex adaptive systems: Manipulating player
numbers shapes behaviours during football small-sided games.
Springerplus,5, 191.
Unlocking the potential of big data to support tactical performance analysis 15
Spearman, W., Basye, A., Dick, G., Hotovy, R., & Pop, P. (2017).
Physics-based modeling of pass probabilities in Soccer. 114.
Stein, M., Häußler, J., Jäckle, D., Janetzko, H., Schreck, T., &
Keim, D. (2015). Visual soccer analytics: Understanding the
characteristics of collective team movement based on feature-
driven analysis and abstraction. ISPRS International Journal of
Geo-Information,4, 21592184.
Stein, M., Janetzko, H., Breitkreutz, T., Seebacher, D., Schreck,
T., Grossniklaus, M., Keim, D. A. (2016). Directors cut:
Analysis and annotation of soccer matches. IEEE Computer
Graphics and Applications,36,5060.
Stein, M., Janetzko, H., Lamprecht, A., Breitkreutz, T.,
Zimmermann, P., Goldlucke, B., Keim, D. A. (2018).
Bring it to the pitch: Combining video and movement data to
enhance team sport analysis. IEEE Transactions on
Visualization and Computer Graphics,24,1322.
Stein, M., Janetzko, H., Seebacher, D., Jäger, A., Nagel, M.,
Hölsch, J., Grossniklaus, M. (2017). How to make sense of
team sport data: From acquisition to data modeling and
research aspects. Data,2(2), 225.
Travassos, B., Gonçalves, B., Marcelino, R., Monteiro, R., &
Sampaio, J. (2014). How perceiving additional targets modifies
teamstactical behavior during football small-sided games.
Human Movement Science,38, 241250.
Vilar, L., Araujo, D., Davids, K., & Bar-Yam, Y. (2013). Science
of winning soccer: Emergent pattern-forming dynamics in
association football. Journal of Systems Science and Complexity,
26,7384.
Vilar, L., Araujo, D., Davids, K., & Travassos, B. (2012).
Constraints on competitive performance of attacker-
defender dyads in team sports. Journal of Sports Sciences,30,
459469.
Vilar, L., Duarte, R., Silva, P., Chow, J. Y. I., & Davids, K.
(2014a). The influence of pitch dimensions on performance
during small-sided and conditioned soccer games. Journal of
Sports Sciences,32, 17511759.
Vilar, L., Esteves, P. T., Travassos, B., Passos, P., Lago-Peñas, C.,
& Davids, K. (2014b). Varying numbers of players in small-
sided soccer games modifies action opportunities during train-
ing. International Journal of Sports Science & Coaching,9,
10071018.
Wei, X., Sha, L., Lucey, P., Morgan, S., & Sridharan, S. (2013).
Large-scale analysis of formations in soccer. 2013 International
Conference on Digital Image Computing: Techniques and
Applications (DICTA),18.
Yue, Z., Broich, H., Seifriz, F., & Mester, J. (2008a). Mathematical
analysis of a soccer game. Part I: Individual and collective beha-
viors. Studies in Applied Mathematics,121, 223243.
Yue, Z., Broich, H., Seifriz, F., & Mester, J. (2008b).
Mathematical analysis of a soccer game. Part II: Energy, spec-
tral, and correlation analyses. Studies in Applied Mathematics,
121, 245261.
Zhang, P., Beernaerts, J., Zhang, L., & de Weghe, N. (2016).
Visual exploration of match performance based on football
movement data using the continuous triangular model.
Applied Geography,76,113.
16 F. Goes et al.
... 20 Traditionally, coaching has heavily relied on experience, intuition, and observation to enhance player performance and team dynamics. 21 Through data analysis 22 and simulation of training scenarios, 15 AI helps athletes train, improves talent scouting, optimizes training plans, and makes the fan experience better. 23,24 However, with the rise of AI, coaching is undergoing a transformation. ...
Article
Artificial intelligence (AI) is rapidly transforming sports coaching, offering new tools to enhance athlete performance and training methods. However, the balance between leveraging AI’s capabilities and maintaining the human touch in coaching remains a critical challenge. This study investigates how AI can be effectively integrated into sports coaching while maintaining the essential human elements of leadership, mentorship, and personalized support. The research aims to provide a framework for combining AI technology with traditional coaching strategies to optimize performance. Using Grounded Theory (GT) methodology, the study conducted expert interviews and performed a detailed literature review to understand the interaction between AI and sports coaching. The resulting “Synergy Theory” model explains how AI can enhance training while highlighting the importance of maintaining ethical standards and human-centered coaching practices. The research reveals that AI can considerably improve performance analysis, injury prevention, and training customization. However, over-reliance on AI risks undermining the human aspects of coaching. The findings underscore the need for technological literacy among coaches and the ethical integration of AI in sports. Challenges such as data quality, resistance to technology, and privacy concerns must also be addressed. The present article is one of the first studies to comprehensively explore the ethical, practical, and technical considerations of integrating AI into sports coaching. This study also offers practical recommendations for balancing AI technology with traditional coaching methods.
... In [14], in the review of the research conducted in related to the use of players' location data in support of tactical performance analysis, the importance of cumulative spatial functions is pointed out. Cumulative spatial functions, while modeling the players' behavior at the macro level in each time period, reduces the complexity of the analyses to an interpretable level. ...
Preprint
Full-text available
The existence of a significant amount of spatio-temporal data in a football match creates a good potential for Post-Match Review and analysis of team behavior. These analyses can be done by focusing on the whole team or individual players. The purpose of this paper is to analyze the efficiency and general behavior of the team in the form of a single entity, from a spatio-geometrical point of view. This process starts by defining a convex hull as the team shape in each time frame. In the next step, a set of spatial, geometric, zone-based, and event-based parameters are introduced and extracted to describe the shape of the team at each frame. These descriptors are the basis of the two-stage spatio-geometrical clustering of the team during the match. What is obtained from clustering is the identification of similar patterns for the shape of the team in situations of in-possession and out-of-possession of the ball. Examining these clusters in the Post-Match Review process determines the overall performance of the team in different situations, the extent of the team's dominance over different areas of the field, as well as the team's technical strategies. No need to transfer the team shape to the image space, no need for image processing techniques for analysis and thus reducing the computational load, introducing a new geometric descriptor and performing clustering in two stages for a better and more meaningful interpretation of the team shape from the points are the distinguishing points of this article.
... Player monitoring systems, wearable technology, and video analysis tools track every action made on the field. A team's performance, player accomplishments, and head-to-head matches are kept on file in statistical databases [6,7]. ...
Article
Full-text available
Sports data analysis and prediction are essential for gaining a competitive advantage in today’s sports. Artificial Neural Networks (ANNs) have shown promising outcomes in several disciplines, including sports analytics. Sports data is dynamic and complex, making it difficult for standard ANNs to identify minute patterns in it. We introduce a new Puzzle-Optimized Artificial Neural Network (PO-ANN) in this work, which is intended for sports data processing and prediction. The PO-ANN is optimized using a puzzle-inspired method to enhance the network’s ability to identify and comprehend complex patterns in the data. The technique constantly modifies the weights and network topology, enabling the model to better react to the shifting dynamics of sports competitions. The Indian Premier League provided the dataset, which consists of 950 matches and 20 variables (IPL). We implemented our proposed PO-ANN and forecast accuracy in sports data analysis and prediction using Python. We performed a comparison analysis between our suggested PO-ANN approach and other existing methods, using numerous metrics, including MSE, MAE, and MAPE. The suggested POANN technique produced better outcomes than the previous approaches.
... Studies have demonstrated the effectiveness of machine learning algorithms in forecasting player movements, identifying gameplay patterns, and optimizing team strategies. For instance, [6] found that machine learning models could predict player performance metrics with increasing accuracy as larger datasets become available, highlighting the models' capability to analyze vast amounts of historical game data and provide actionable insights for coaches and analysts. This aligns with the findings of [7], who emphasize the potential of machine learning techniques, such as clustering, to detect patterns in raw sports data, further supporting the utility of these algorithms in performance analysis [8]. ...
Conference Paper
This conceptual research paper examines the potential of Artificial Intelligence (AI) to revolutionize the analysis of game tactics and player performance in sports. With the rapid advancement of AI technologies, there is a growing interest in leveraging machine learning and data analytics to gain deeper insights into game dynamics and player efficiency. This study aims to conceptualize the development and implementation of AI-driven tools to enhance tactical analysis and performance assessment, offering a transformative approach to sports analytics. The primary goal of this research is to outline a comprehensive framework for utilizing AI to analyze game tactics and player performance systematically. By integrating AI technologies, the proposed framework seeks to provide coaches, analysts, and players with precise, data-driven insights that can inform strategic decisions, optimize training processes, and ultimately improve competitive performance. Methodologically, this paper synthesizes existing literature from the fields of AI, sports science, and data analytics to identify the key components and capabilities of AI-driven sports analytics systems. It proposes a conceptual model that incorporates machine learning algorithms, computer vision, and big data analytics to process and interpret vast amounts of game data. The model emphasizes automated video analysis, real-time performance tracking, and predictive analytics to evaluate both team strategies and individual player contributions. The paper on "AI-driven analysis of game tactics and player performance" relates to the field of education and learning by providing a framework for leveraging advanced AI technologies to enhance the educational processes in sports through improved tactical understanding, performance assessment, and data-driven decision-making. The conceptual analysis highlights several anticipated benefits of AI-driven sports analytics. 
These include enhanced accuracy and objectivity in performance evaluation, the ability to uncover hidden patterns and trends in gameplay, and the provision of actionable insights that can be used to develop more effective game plans and training regimens. Additionally, AI's capacity to process data at scale allows for comprehensive, longitudinal studies of player development and team performance over time. However, the paper also acknowledges potential limitations and challenges. These include the high cost and complexity of implementing AI systems, the need for extensive data to train accurate models, and potential resistance from traditionalists within the sports community. Ethical considerations, such as data privacy and the risk of over-reliance on technology, are also critical factors that must be addressed. In conclusion, this conceptual research underscores the transformative potential of AI-driven analysis in sports, while also recognizing the significant challenges that must be overcome for successful integration. The paper calls for empirical research to validate the proposed framework and to explore the practical applications of AI in various sports contexts. Future research should focus on developing robust, user-friendly AI tools, conducting pilot studies with sports teams, and establishing best practices for ethical AI use in sports. By addressing these areas, the sports analytics community can better leverage AI to enhance tactical analysis and player performance, driving innovation and excellence in sports.
... This is due to the fact that both the quantity and quality of data collected have increased rapidly in recent years [13,16]. This availability of data offers opportunities and challenges in creating interfaces between data science and sports science, which is generally seen as a potential game changer in football [16,17]. Nevertheless, the accessibility of football data, explicitly for high-quality event data, is considered a challenge for scientific studies, as those datasets are often inaccessible or only available for high fees from providers [28]. ...
Chapter
Full-text available
Football, being one of the most popular sports in the world,has attracted significant attention from researchers exploring the poten-tial of Artificial Intelligence (AI). In particular, Large Language Models(LLMs), exemplified by digital assistants such as ChatGPT, have proventheir capabilities and offer a potentially effective avenue for football re-search. However, accessibility of football data remains a challenge, as thedatasets collected by providers are often inaccessible. This case studypresents a proof-of-concept that addresses this challenge by introduc-ing an innovative web scraping approach to extract football event dataand making it accessible e.g. for scientific research with LLMs. To thisend, the extracted data is structured into coherent sentences for linguis-tic compatibility. The results show the successful integration of LLMswith football event data, enabling the extraction of information throughretrieval-augmented generation. This work makes a first contribution tothe field by bridging the gap between football and LLMs, demonstratingthe potential for further analysis.
Article
Full-text available
Artificial Intelligence (AI) is transforming the field of sports science by providing unprecedented insights and tools that enhance training, performance, and health management. This work examines how AI is advancing the role of sports scientists, particularly in team sports environments, by improving training load management, sports performance, and player well-being. It explores key dimensions such as load optimization, injury prevention and return-to-play, sports performance, talent identification and scouting, off-training behavior, sleep quality, and menstrual cycle management. Practical examples illustrate how AI applications have significantly advanced each area and how they support and enhance the effectiveness of sports scientists. This manuscript also underscores the importance of ensuring that AI technologies are context-specific and communicated transparently. Additionally, it calls for academic institutions to update their curriculums with AI-focused education, preparing future sports professionals to fully harness its potential. Finally, the manuscript addresses future challenges, such as the unpredictable nature of team sports, emphasizing the need for interdisciplinary collaboration, including clear communication and mutual understanding between sports scientists and AI experts, and the critical balance between AI-driven insights and human expertise.
Article
This study evaluated the performance outcomes of headers during the FIFA Women's World Cup France 2019™. Video analysis was used to code performance outcomes of headers (uncontested and contested) and their descriptors (e.g., playing position, match situation, field location, and the distance the ball travelled). Descriptive statistics, and odds ratios (ORs) (odds of a successful outcome) are reported for headers based on their descriptors. Less than half of all headers resulted in a successful outcome, with headers observed to result in a turnover of possession 53% of the time (uncontested: 51%, contested: 57%). Headed goal conversion rates ranged from 0–38% across countries/teams (mean: 13%), with variability in the frequency of headed shots (range n = 1–22). Headed shots were most efficient from free kicks with 24% of shots resulting in a goal. Odds of a successful heading outcome was lowest from long balls (>20 m) in all areas of the pitch. Uncontested headers had greater odds of a successful heading outcome than contested headers from corner kicks (OR: 2.33, p = 0.004) and free play (OR: 1.30, p = 0.001), but had lower odds of success from goal kicks (OR: 0.62, p = 0.017). Central defensive midfielders (OR: 1.45, p = 0.002) and centrebacks (OR: 1.25, p = <0.001) had significantly greater odds of successful heading outcomes, and strikers (uncontested) (OR: 0.82, p = 0.043) and wingers (contested) (OR: 0.72, p = 0.041) had the lowest. The findings of the current study suggest that heading commonly results in lost possession, particularly from long balls. These findings may help to guide future heading coaching frameworks.
Article
Full-text available
This paper considers the European transfer market for professional football players as a network to study the relation between a team’s position in this network and performance in its domestic league. Our analysis is centered on eight top European leagues. The market in each season is represented as a weighted directed network capturing the transfers of players to or from the teams in these leagues, and we also consider the cumulative network over the past 28 years. We find that the overall structure of this transfer market network has properties commonly observed in real-world networks, such as a skewed degree distribution, high clustering, and small-world characteristics. To assess football teams we first construct a measure of within-league performance that is comparable across leagues. Regression analysis is used to relate league performance with both the network position and level of engagement of the team in the transfer market, under two complementary setups. Network position variables include, e.g., betweenness centrality, closeness centrality and node clustering coefficient, whereas market engagement variables capture a team’s activity in the transfer market, e.g., total number of player transfers and total paid for players. For the season snapshots, the number of transfers corresponds to weighted in- and out-degree. Our analysis first corroborates several recent findings relating aspects of market engagement with teams’ league performance. A higher number of incoming transfers indicates worse performance, and better-resourced teams perform better. Then, and across specifications, we find that network position variables remain salient even when engagement variables are already considered. This substantiates the notion in the existing literature that a high degree corresponds to better team performance and suggests that network aspects of trading strategy may affect a team’s success in their respective domestic league (or vice versa). In this sense, the approach and findings presented in this paper may in the future guide teams’ player acquisition policies.
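The link between transfer counts and weighted in- and out-degree mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the edge format `(selling_club, buying_club, n_players)`, the function name, and the club names are all hypothetical, and the paper's actual data model and centrality computations are not reproduced.

```python
from collections import defaultdict


def weighted_degrees(transfers):
    """Return (in_degree, out_degree) dicts for a weighted directed transfer network.

    Each transfer is an edge (selling_club, buying_club, n_players); the edge
    weight is the number of players moved in that direction.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for seller, buyer, n_players in transfers:
        out_degree[seller] += n_players  # players leaving the selling club
        in_degree[buyer] += n_players    # players arriving at the buying club
    return dict(in_degree), dict(out_degree)


# Hypothetical season snapshot (NOT data from the paper)
transfers = [
    ("Club A", "Club B", 2),
    ("Club B", "Club C", 1),
    ("Club A", "Club C", 3),
]

in_degree, out_degree = weighted_degrees(transfers)
print(in_degree)
print(out_degree)
```

In this representation a club's weighted in-degree is its number of incoming transfers and its weighted out-degree is its number of outgoing transfers for the season.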
Article
Full-text available
The presented field experiment in an 11 vs. 11 soccer game set-up is the first to examine the impact of different formations (e.g. 4-2-3-1 vs. 3-5-2) on tactical key performance indicators (KPIs) using positional data in a controlled experiment. The data were gathered using player tracking systems (1 Hz) in a standardized 11 vs. 11 soccer game. The KPIs were measured using dynamical positioning variables like Effective Playing Space, Player Length per Width ratio, Team Separateness, Space Control Gain, and Pressure Passing Efficiency. Within the experimental positional data analysis paradigm, neither of the team formations showed differences in Effective Playing Space, Team Separateness, or Space Control Gain. However, as a theory-based approach predicted, a 3-5-2 formation for the Player Length per Width ratio and Pressure Passing Efficiency exceeded the 4-2-3-1 formation. Practice task designs which manipulate team formations therefore significantly influence the emergent behavioral dynamics and need to be considered when planning and monitoring performance. Accordingly, an experimental positional data analysis paradigm is a useful approach to enable the development and validation of theory-oriented models in the area of performance analysis in sports games.
Article
Full-text available
Small-sided games (SSGs) are a promising training format in soccer to replicate (situations of) the official match across all age groups. Typically, SSGs are played on a smaller relative pitch area (RPA; i.e., <150 m²) than the match (320 m² RPA), which results in different tactical demands. To create a more precise replication of tactical match demands in SSGs with fewer than 11 players per team, a match-derived RPA (320 m²) may be considered because this affords a similar playing area per player. In addition, subgroup analysis is necessary to deal with the different number of players in match and SSGs. Therefore, this study aims to investigate tactical demands of matches and various SSGs (with a different number of players and played on 320 m² RPA) in talented youth soccer players. Twelve elite soccer teams in 4 age categories (under-13, under-15, under-17, and under-19) played official matches and 4 vs. 4 + goalkeepers (GKs), 6 vs. 6 + GKs, and 8 vs. 8 + GKs. Positional data were collected to calculate tactical variables (interpersonal distances, length, width, and surface areas) for all players and for 2- and 4-player subgroups. Corresponding tactical variability (coefficients of variation expressed as percentages) was determined for all players. Results demonstrated that in each age category, with an increase in number of players, team distances increased and tactical variability decreased. Subgroup analyses revealed similar team distances in matches and SSGs, with the exception of larger interpersonal distances in 4 vs. 4 + GKs than in the match in under-13, under-15, and under-17. Match-derived RPA in SSGs facilitates the tactical representativeness for the match. Soccer coaches can use such SSGs for an optimal tactical match preparation.
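The tactical variables named above (length, width, surface area, and a coefficient of variation) could be computed from positional data along the following lines. This is only a sketch under assumptions that are not stated in the abstract: positions are `(x, y)` tuples in metres with `x` along the goal-to-goal axis, surface area is taken as the area of the convex hull of the outfield players, and the coefficient of variation uses the sample standard deviation. All function names and coordinates are hypothetical.

```python
from statistics import mean, stdev


def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Polygon area via the shoelace formula (0 for fewer than 3 vertices)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def team_shape(positions):
    """Length (x-range), width (y-range) and convex-hull surface area of one frame."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return {
        "length": max(xs) - min(xs),
        "width": max(ys) - min(ys),
        "surface_area": polygon_area(convex_hull(positions)),
    }


def coefficient_of_variation(values):
    """Coefficient of variation as a percentage (sample standard deviation / mean)."""
    return 100 * stdev(values) / mean(values)


# Hypothetical single frame with five players (NOT data from the study)
frame = [(0, 0), (10, 0), (10, 5), (0, 5), (5, 2)]
shape = team_shape(frame)
print(shape)

# Hypothetical team-length values over three frames
length_cv = coefficient_of_variation([10, 20, 30])
print(length_cv)
```

Applying `team_shape` frame by frame and then `coefficient_of_variation` to each resulting series gives one variability percentage per tactical variable; subgroup analysis would pass only the 2 or 4 players of interest.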
Article
Full-text available
This study aimed to explore the effects of previous instruction on technical, tactical and external workload performances in football small-sided games. Two 7-a-side balanced competitive teams received instructions regarding the rules of the small-sided games proposed. Additionally, one team received instructions from the coach regarding the collective tactical behaviour required for each exercise condition: (a) Without strategic instruction (WSI); (b) Defensive strategy (DS); (c) Offensive strategy (OS) to play against the team that only received the rules of the small-sided games. The comparisons among game scenarios were assessed via standardised mean differences. The comparison between WSI and DS revealed higher number of defensive actions, less space covered, and more distance covered in jogging for DS in comparison with WSI. The comparison between WSI and OS revealed more passes per ball possession, larger team length, larger space covered, lower distance covered walking, and more distance covered in jogging for OS in comparison with WSI. The results reinforce that coaches’ previous instruction constrains the technical, tactical, and physical demands of small-sided games in football. The use of previous instruction regarding strategical and tactical behaviour allows highlighting the players’ behaviour and ensures functional team performance.
Article
Full-text available
In professional soccer, nowadays almost every team employs tracking technology to monitor performance during training and matches. Over the recent years, there has been a rapid increase in both the quality and quantity of data collected in soccer, resulting in large amounts of data collected by teams every single day. The sheer amount of available data provides opportunities as well as challenges to both science and practice. Traditional experimental and statistical methods used in sport science do not seem fully capable of exploiting the possibilities of the large amounts of data in modern soccer. As a result, tracking data are mainly used to monitor player loading and physical performance. However, an interesting opportunity exists at the intersection of data science and sport science. By means of tracking data, we could gain valuable insights into the how and why of tactical performance during a soccer match. One of the most interesting and most frequently occurring elements of tactical performance is the pass. Every team has around 500 passing interactions during a single game. Yet, we mainly judge the quality and effectiveness of a pass by means of observational analysis, and whether the pass reaches a teammate. In this article, we present a new approach to quantify pass effectiveness by means of tracking data. We introduce two new measures that quantify the effectiveness of a pass by means of how well a pass disrupts the opposing defense. We demonstrate that our measures are sensitive and valid in the differentiation between effective and less effective passes, as well as between the effective and less effective players. Furthermore, we use this method to study the characteristics of the most effective passes in our data set. The presented approach is the first quantitative model to measure pass effectiveness based on tracking data that are not linked directly to goal-scoring opportunities. As a result, this is the first model that does not overvalue forward passes. Therefore, our model can be used to study the complex dynamics of build-up and space creation in soccer.
Article
Full-text available
In this report, we organize and reflect on recent advances and challenges in the field of sports data visualization. The exponentially‐growing body of visualization research based on sports data is a prime indication of the importance and timeliness of this report. Sports data visualization research encompasses the breadth of visualization tasks and goals: exploring the design of new visualization techniques; adapting existing visualizations to a novel domain; and conducting design studies and evaluations in close collaboration with experts, including practitioners, enthusiasts, and journalists. Frequently this research has impact beyond sports in both academia and in industry because it is i) grounded in realistic, highly heterogeneous data, ii) applied to real‐world problems, and iii) designed in close collaboration with domain experts. In this report, we analyze current research contributions through the lens of three categories of sports data: box score data (data containing statistical summaries of a sport event such as a game), tracking data (data about in‐game actions and trajectories), and meta‐data (data about the sport and its participants but not necessarily a given game). We conclude this report with a high‐level discussion of sports visualization research informed by our analysis—identifying critical research gaps and valuable opportunities for the visualization community. More information is available at the STAR's website: https://sportsdataviz.github.io/.
Conference Paper
Full-text available
Soccer analytics has long focused on the outcomes of discrete, on-ball events; however, in a game where each player has the ball 3% of the time, on average, much of the sport’s complexity resides in off-ball events. A recurrent subject in observation-based tactical analysis is the creation and closure of spaces, yet it remains highly unexplored from a quantitative perspective. We present a method for quantifying spatial value occupation and generation during open play. Our approach proposes a novel pitch control model that incorporates motion information, relative distance to the ball and player position in order to provide a smooth surface of potential ball control. We also provide a model for the relative value of any field location based on the position of the ball, using feed-forward neural networks. This quantification of space creation allows us to observe Sergio Busquets’ high relevance during positional attacks through his pivoting skills and the dragging power of Luis Suarez to generate spaces for his teammates, and it unravels the capacity of Lionel Messi to occupy spaces of value with smooth movements along the field, among many other characteristics. Evaluating space occupation and generation opens the door for new research on off-ball dynamics that can be applied in specific matches and situations, and directly integrated into coaches’ analysis. This information can be used not only to better evaluate players’ contributions to their teams but also to improve their positioning and movement through coaching, providing a key competitive advantage in a complex and dynamic sport.
Article
Full-text available
Background: Small-Sided and Conditioned Games (SSCGs) are characterised by modifications of field dimensions, number of players, and rules of the game, manipulations used to shape the key task constraints that performers need to satisfy in practice. Evidence has already demonstrated the importance of designing practice to enhance understanding of tactical behaviours in football, but there is a lack of information about how coaches can manipulate task constraints to support tactical learning. Objective: To investigate which task constraints have been most often manipulated in studies of SSCGs, and what impact each manipulation had on emerging tactical behaviours, technical–tactical actions, and positional relationships between players. Methods: PubMed, Web of Science, Scielo, and Academic Google databases were searched for relevant reports without time limits. The criteria adopted for inclusion were: a) studies performed with football players; b) studies that included SSCGs as an evaluation method; c) studies that investigated tactical behaviours in SSCGs; and d) articles in English and Portuguese. Results: The electronic database search included 24 articles in the review. Of these, five manipulated field dimensions, six manipulated number of players involved, five manipulated field dimensions and number of players, five used different scoring targets, two altered the number of players and scoring target, and one manipulated the number of players, field dimension, and scoring target. Conclusion: Among the task constraints analyzed in this systematic review, manipulation of number of players and playing field dimensions concomitantly occurred most frequently.
Article
Full-text available
The study aimed to compare footballers’ performances when playing with teammates and opponents from the same age group with performances when playing with teammates and opponents of different age groups. Three football matches were played: i) under-15 (U15) players played with each other; ii) under-17 (U17) players played with each other; and iii) players under the age of 15 and 17 played with each other in two equivalent mixed age teams. The players’ physical performance was measured using the distances covered at different speed categories and tactical behaviour was assessed using several positioning-derived variables. The results showed that, when playing in the mixed age condition, the U15 players increased the distance covered in sprinting intensity (18.1%; ±21.1%) and the U17 players increased the distance covered in jogging zones (6.8%; ±6.5%). The intra-team movement synchronization in longitudinal and lateral displacements was higher when U15 players confronted peers of the same age, in the first half (-13.4%; ±2.0%, -20.3%; ±5.7% respectively), and when U17 players confronted the mixed group, in both halves (-16.9%; ±2.5%, 9.8%; ±4.0% and 7.9%; ±5.7%, 10.6% ±4.4%, respectively). The differences between age groups and the mixed condition may be connected with the level of players’ tactical expertise and adaptive positioning according to the dynamic environmental information. In general, these results suggest that mixing the age groups may be useful to promote a wider range of training session stimuli in these young football players.
Article
This study examined the effects of induced mental and muscular fatigue on soccer players’ physical activity profile and collective behavior during small-sided games (SSG). Ten youth soccer players performed a 5vs5 SSG under three conditions: a) control, playing without any previous activity; b) muscular fatigue, playing after performing a repeated change-of-direction task; c) mental fatigue, playing after completing a 30 min Stroop color-word task. Players’ positional data was used to compute time-motion and tactical-related variables. The muscular fatigue condition resulted in lower distances covered in high speeds (∼27%, 0.3; ± 0.5) than the control condition. From the tactical perspective, the muscular fatigue condition resulted in lower distance between dyads and players spent ∼7% more time synchronized in longitudinal displacements than the control condition (0.3; ± 0.3). Additionally, players spent ∼14% more time synchronized with muscular fatigue than with mental fatigue (0.7; ± 0.3). The mental fatigue condition resulted in a very likely more predictable pattern in the distance between dyads than in muscular fatigue condition (0.4; ± 0.2). Also, the mental fatigue possibly decreased the teams’ stretch index when compared with control (0.2; ± 0.3) and likely increased compared with muscular fatigue (0.5; ± 0.5). The better levels of longitudinal synchronization after muscular fatigue might suggest the usage of tactical-related tasks after intense exercise bouts. The lower physical performance and time spent longitudinally synchronized after mental fatigue should alert to consider this variable before matches or training activities that aim to improve collective behavior.