Who is going to get hurt?
Predicting injuries in professional soccer
Alessio Rossi (1), Luca Pappalardo (2,3), Paolo Cintia (2,3), Javier Fernandez (4), F. Marcello Iaia (1), and Daniel Medina (4)
(1) Department of Biomedical Science for Health, University of Milan, Italy
alessio.rossi2@gmail.com
(2) Department of Computer Science, University of Pisa, Pisa, Italy
lpappalardo@di.unipi.it
(3) ISTI-CNR, Pisa, Italy
(4) Sports Science and Health Department, Football Club Barcelona, Spain
Abstract. Injury prevention has a fundamental role in professional soccer due to the high cost of recovery for players and the strong influence of injuries on a club’s performance. In this paper we provide a predictive model to prevent injuries of soccer players using a multidimensional approach based on GPS measurements and machine learning. In an evolutive scenario, where a soccer club starts collecting the data for the first time and updates the predictive model as the season goes by, our approach can detect around half of the injuries, allowing the soccer club to save 70% of a season’s economic costs related to injuries. The proposed approach can be a valuable support for coaches, helping the soccer club to reduce injury incidence, save money and increase team performance.
Keywords: sports analytics, data science, machine learning, sports science, predictive analytics.
1 Introduction
Injuries are an important issue in professional soccer, as they can negatively affect team performance and represent a remarkable expense for soccer clubs. The cost associated with the process of recovery and rehabilitation for a player is often considerable, especially in terms of medical care and missed earnings from merchandising [1]. It has been observed that injuries in Spain cause on average around 16% of season absence by players, corresponding to an estimated total cost of 188 million euros in just one season [2]. Hence, it is not surprising that injury prediction is attracting growing interest from soccer managers, who want to intervene with appropriate actions to reduce the likelihood of injuries to their players. Due to its importance for a club’s economy and success, considerable effort has been devoted in the sports science literature to investigating injury prediction in professional soccer [10–12]. A major limitation of existing studies is that they follow a monodimensional approach, i.e., they use just one variable at a time to estimate injury risk, thus not fully exploiting the complex patterns underlying measurable aspects of soccer performance. Moreover, in these works statistical modeling is used mainly to quantify the relation between the chosen variable and injury likelihood, while an evaluation of the predictive power of a player’s performance is still missing [8, 6].
In this paper, we propose a data-driven, multidimensional approach to injury prediction, framed as the problem of forecasting whether or not a player will get injured in the next training session or official game, given his recent training workload. Our approach is based on automatic data collection through standard Electronic Performance and Tracking Systems (EPTS) [4, 5, 6, 7], and it is intended as a tool to support the decision making of soccer managers and coaches. In the first stage of our study, we collect data about the training workload of players through GPS devices, covering half of a season of a professional soccer club. After a preprocessing task, we extract from the data a set of features used in sports science to describe aspects of training workload, and we enrich them with information about all the injuries which happened during the half season. We find that injuries can be successfully predicted with a small set of three variables: the presence of recent previous injuries, high metabolic load distance and sudden decelerations. We investigate a real-world scenario where the classifiers are updated as new training workload and injury data become available during the season. The machine learning approach can detect more than half of the injuries during the season, indicating that by using our predictor the soccer club could have saved 70% of injury-related costs.
2 Related Work
Several studies by Gabbett et al. [13–18, 21] show that muscular injuries are to some extent preventable. In rugby, they find that a player has a high injury risk when his workload is above a certain threshold. The same results are observed by Hulin et al. [22] and Ehrmann et al. [11] for cricket players and soccer players, respectively. In particular, all these studies assess the ratio between acute workload (i.e., the average workload in the last 7 days) and chronic workload (i.e., the average workload in the last 28 days), defining specific thresholds to detect players who could incur an injury in future training sessions.
The “monotony session load”, i.e., the ratio between the mean and the standard deviation of the session load, is widely used in the literature. In skating, Foster et al. [23] find that when the session load outweighs a skater’s ability to fully recover before the next session, the skater suffers from the so-called “overtraining syndrome”, a condition that can cause injury. In basketball, Anderson et al. [18] find a correlation between injury risk and monotony session load. In soccer, Brink et al. [24] observe that injured players record higher values of monotony in the week preceding the injury than non-injured players.
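As an illustration of the two workload indicators discussed above, the following minimal Python sketch computes an acute:chronic workload ratio from simple averages and the monotony of a series of session loads. The function names, the example loads and the use of the sample standard deviation are assumptions made here for illustration; they are not taken from the cited studies.

```python
from statistics import mean, stdev


def acute_chronic_ratio(daily_loads, acute_days=7, chronic_days=28):
    """Mean load of the last `acute_days` divided by the mean of the last `chronic_days`."""
    if len(daily_loads) < chronic_days:
        raise ValueError("need at least `chronic_days` daily observations")
    acute = mean(daily_loads[-acute_days:])
    chronic = mean(daily_loads[-chronic_days:])
    return acute / chronic


def monotony(session_loads):
    """Mean session load divided by its sample standard deviation (an assumption)."""
    return mean(session_loads) / stdev(session_loads)


# Hypothetical daily loads (arbitrary units): three steady weeks, then a harder week.
loads = [100.0] * 21 + [150.0] * 7
ratio = acute_chronic_ratio(loads)                    # 150 / 112.5
week_monotony = monotony([100.0, 120.0, 80.0, 100.0])
print(round(ratio, 3), round(week_monotony, 3))
```

A ratio above 1 means the most recent week was heavier than the four-week average; which threshold signals risk is specific to each cited study and is not encoded here.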
Some studies also show that technical-tactical performance during official matches can affect the players’ physical fitness. Talukder et al. [25] propose a classifier able to predict 19% of the injuries that occurred in the NBA using the players’ technical-tactical performance. They show that the most important features for injury prediction in basketball are the average speed, the number of past competitions played, the average distance covered, the number of minutes played to date and the average field goals attempted.
From the literature, it is clear that all injury prediction studies for soccer
suffer from a major limitation: they investigate the correlation between a single
aspect of training workload and injury likelihood but they do not construct any
predictor as a tool to make predictions and prevent injuries. Therefore, to the
best of our knowledge, there is no quantification of the potential of predictive
analytics in preventing injuries in professional soccer.
3 Dataset preparation
3.1 Data collection and feature extraction
During the 2013/2014 season we monitored the position of twenty-six professional football players competing in the Italian Serie B over 23 weeks of training sessions, from January 1st to May 31st, using a portable non-differential 10 Hz global positioning system (GPS) device integrated with a 100 Hz 3-D accelerometer, a 3-D gyroscope and a 3-D digital compass (STATSports Viper, Northern Ireland). Each player wore a tight vest in which the receiver was placed between the scapulae, and every player wore his own GPS device in each training session. We recorded a total of 954 individual training sessions during the 23 weeks and extracted from the data a set of training workload indicators through the software package Viper Version 2.1 (STATSports 2014). From every training session we extracted 12 features describing kinematic, metabolic and mechanical aspects of the individual’s training. For each player, we also collected information about age, weight, height and role on the field. Moreover, for each player’s training session we collected information about the play time in the official game before the training session and the number of official games played before the training session. Table 1 provides a description of the considered features.
The club’s medical staff recorded all the non-contact injuries that occurred during the 23 weeks. A non-contact injury is defined as any tissue damage sustained by a player that causes absence from subsequent football activities for at least the day after the day of onset. The dataset contains 21 non-contact injuries in total.
3.2 Feature engineering and dataset construction
We construct four training sets by transforming the 12 workload features described in Table 1 in the following way:
1. Workload Features set (WF) – we consider the training workloads in the 6 most recent training sessions by using an exponentially weighted moving average (EWMA). We also compute the EWMA of feature PI with a span equal to 6 ($PI^{(WF)}$) in order to take into account both the number of a player’s previous injuries and their temporal distance to the current training session. $PI^{(WF)} = 0$ indicates that the player never got injured in the past; $PI^{(WF)} > 0$ indicates that the player got injured at least once in the past; $PI^{(WF)} > 1$ indicates that the player got injured more than once in the past.
2. Acute:Chronic Workload Ratio features set (ACWR) – here we consider the de facto standard used in sports science to estimate injury likelihood [9] and compute the ratio between the EWMA of the 6 most recent training sessions and the EWMA of the previous 28 days.
3. Mean over Standard deviation Workload Ratio features set (MSWR) – we consider another way proposed in the literature to estimate injury likelihood [10] and compute the ratio between the mean and the standard deviation of the training workloads in the 6 most recent days. The higher the MSWR of a player, the lower the variability of his workloads during the training week.
4. Finally, we build a dataset based on the union of the three feature sets described above (WF, ACWR and MSWR) and the personal features in Table 1. This dataset consists of a vector of 42 features and the injury label indicating whether or not the player gets injured in the next match or training session.
Every training set consists of 954 examples (i.e., individual training sessions) corresponding to 80 collective training sessions.
Feature     Description
dTOT        Distance in meters covered during the training session
dHSR        Distance in meters covered above 5.5 m/s
dMET        Distance in meters covered at metabolic power
dHML        Distance in meters covered with a metabolic power above 25.5 W/kg
dHML/m      Average dHML per minute
dEXP        Distance in meters covered above 25.5 W/kg and below 19.8 km/h
Acc2        Number of accelerations above 2 m/s^2
Acc3        Number of accelerations above 3 m/s^2
Dec2        Number of decelerations above 2 m/s^2
Dec3        Number of decelerations above 3 m/s^2
DSL         Total of the weighted impacts of magnitude above 2 g. Impacts are collisions and step impacts during running
FI          Ratio between DSL and speed intensity
Age         Age of the player
BMI         Body Mass Index: ratio between weight (in kg) and the square of height (in meters)
Role        Role of the player
PI          Number of injuries of the player before each training session
Play time   Minutes of play in previous games
Games       Number of games played before each training session
Table 1. Description of the training workload features extracted from GPS data and the players’ personal features collected during the study.
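The WF, ACWR and MSWR transformations described in this section can be sketched with pandas as follows. This is an illustrative sketch rather than the exact procedure used in this study: the column names (player, dTOT), the toy values, the choice adjust=False, and the use of an EWMA with a span of 28 sessions as a stand-in for "the previous 28 days" are all assumptions.

```python
import pandas as pd


def workload_feature_sets(sessions: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Add WF, ACWR and MSWR columns for one workload feature, per player.

    `sessions` holds one row per individual training session, sorted by date.
    """
    out = sessions.copy()
    grouped = sessions.groupby("player")[feature]
    # WF: exponentially weighted moving average with span 6.
    out[f"{feature}_WF"] = grouped.transform(
        lambda s: s.ewm(span=6, adjust=False).mean()
    )
    # Chronic reference: EWMA with span 28 (sessions, not days: a simplification).
    chronic = grouped.transform(lambda s: s.ewm(span=28, adjust=False).mean())
    out[f"{feature}_ACWR"] = out[f"{feature}_WF"] / chronic
    # MSWR: mean over standard deviation of the 6 most recent sessions.
    out[f"{feature}_MSWR"] = grouped.transform(
        lambda s: s.rolling(6).mean() / s.rolling(6).std()
    )
    return out


# Hypothetical total distances (meters) for a single player.
demo = pd.DataFrame({
    "player": ["p1"] * 8,
    "dTOT": [5000.0, 5200.0, 4800.0, 6000.0, 5500.0, 5100.0, 4900.0, 6200.0],
})
features = workload_feature_sets(demo, "dTOT")
print(features.round(3))
```

The MSWR column is undefined (NaN) until six sessions are available, which is one reason early-season examples carry less information.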
V
4 Experiments
First, we perform a feature selection process based on a Decision Tree classifier in order to reduce the dimensionality of the feature space and consequently the risk of overfitting. We use recursive feature elimination with cross-validation (RFECV) to select the best set of features for predicting injuries in our dataset.
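A minimal sketch of this feature selection step with scikit-learn's RFECV is given below. The synthetic data, the 5-fold stratified cross-validation, the F1 scoring and the random seeds are assumptions made for illustration; the text does not state which cross-validation scheme or scoring function was used.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training set: 300 examples, 10 candidate features.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 10
X = rng.normal(size=(n_samples, n_features))
# Illustrative label that depends only on the first two columns.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

selector = RFECV(
    estimator=DecisionTreeClassifier(random_state=0),
    step=1,                          # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5),  # assumed scheme, not taken from the paper
    scoring="f1",                    # assumed scoring, not taken from the paper
)
selector.fit(X, y)

selected_columns = np.flatnonzero(selector.support_)
print(selector.n_features_, selected_columns)
```

The boolean mask in support_ marks the retained columns, and selector.transform(X) would return the reduced feature matrix.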
On the new training dataset derived from the feature selection, we train a Decision Tree classifier (DT) and a Random Forest Classifier (ETRFC) (see footnote 5). In particular, we investigate a scenario where the club starts to record data at the beginning of a season and trains the classifier as the season goes by. Hence, we proceed from the first training week ($w_1$) to the most recent one ($w_{i-1}$). At training week $w_i$ we train the classifiers on weeks $w_1, \dots, w_i$ and evaluate their ability to predict injuries on week $w_{i+1}$.
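The week-by-week protocol amounts to an expanding-window split, sketched below in plain Python. The helper name expanding_week_splits is hypothetical; in practice a classifier would be refit on each training window and scored on the following week.

```python
def expanding_week_splits(weeks):
    """Yield (training_weeks, test_week): train on w_1..w_i, test on w_{i+1}."""
    ordered = sorted(weeks)
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]


# Example with four weeks of data.
splits = list(expanding_week_splits(range(1, 5)))
for train_weeks, test_week in splits:
    print(train_weeks, "->", test_week)
```

Because each test week lies strictly after its training weeks, no information from the future leaks into the classifier.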
Considering injury prediction as a binary classification problem where the injury class (1) is the positive class, we measure the goodness of the classifiers week by week in terms of precision, recall, F1-score and AUC [27]. Precision for a class is the fraction of the examples assigned to that class by the classifier that actually belong to it. Recall is the fraction of the examples of a given class that the classifier correctly identifies, while the F1-score is the harmonic mean of precision and recall. AUC (Area Under the Curve) is the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming “positive” ranks higher than “negative”). An AUC close to 1 represents an accurate classification, while an AUC close to 0.5 represents a random classification.
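The four measures can be made concrete with a small, dependency-free sketch. The labels and scores below are invented for illustration, and the AUC is computed directly from its pairwise-ranking definition, with ties counted as one half (a common convention, assumed here).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = injury) and classifier scores.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.2, 0.8, 0.3, 0.6, 0.9, 0.05]
y_pred = [int(s >= 0.5) for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
print(precision, recall, f1, area)
```

In this toy case two of the three injuries are flagged and one of the three alarms is false, so precision and recall are both 2/3.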
We compare the goodness of DT and ETRFC with four baselines. Baseline $B_1$ randomly assigns a class to an example by respecting the distribution of classes. Baseline $B_2$ always assigns the majority class (i.e., class 0, non-injury), while baseline $B_3$ always assigns the minority class (i.e., class 1, injury). Baseline $B_4$ is a classifier which assigns class 1 (injury) if the exponentially weighted average of variable PI is greater than 0, and class 0 (no injury) otherwise. Finally, we estimate the economic cost of the injuries for the considered soccer club by using the methodology suggested by Fernandez et al. [3], i.e., we multiply the number of days of “work” absence by the minimal legal salary per day in the Italian Serie B.
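The four baselines are simple enough to sketch in a few lines of plain Python. The function names, the seed and the example inputs are hypothetical; scikit-learn's DummyClassifier offers comparable "stratified", "most_frequent" and "constant" strategies for the first three.

```python
import random


def b1_stratified(n_examples, injury_rate, seed=0):
    """B1: random labels drawn so that class 1 appears with probability `injury_rate`."""
    rng = random.Random(seed)
    return [int(rng.random() < injury_rate) for _ in range(n_examples)]


def b2_majority(n_examples):
    """B2: always predict the majority class (0, no injury)."""
    return [0] * n_examples


def b3_minority(n_examples):
    """B3: always predict the minority class (1, injury)."""
    return [1] * n_examples


def b4_previous_injury(pi_ewma_values):
    """B4: predict injury whenever the EWMA of PI is greater than 0."""
    return [int(value > 0) for value in pi_ewma_values]


# Hypothetical inputs: 100 examples with a 2% injury rate, and three EWMA values.
random_labels = b1_stratified(100, injury_rate=0.02)
rule_labels = b4_previous_injury([0.0, 0.3, 1.2])
print(sum(random_labels), rule_labels)
```

Baselines of this kind give a floor for each metric: B2 maximizes accuracy on an imbalanced dataset while detecting no injuries, and B3 reaches full recall at very low precision.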
4.1 Results
Just 3 features out of 42 are selected by the feature selection task: $PI^{(WF)}$, $d_{HML}^{(MSWR)}$ and $DEC_2^{(WF)}$. Feature $PI^{(WF)}$ reflects the temporal distance between a player’s current training session and his return to regular training after a past injury. Features $d_{HML}^{(MSWR)}$ and $DEC_2^{(WF)}$ are two training features indicating high metabolic load and sudden decelerations, respectively. We observe that 42% of the injuries detected by the classifier happened immediately after the return to regular training of players who got injured in the past, and are characterized by specific values of $d_{HML}^{(MSWR)}$ and $DEC_2^{(WF)}$, which indicate the metabolic workload variability and the average of sudden decelerations in the previous 6 days, respectively.
Footnote 5: We use the Python package scikit-learn to train and test all the classifiers.
Figure 1 shows the evolution of the F1-score of DT, ETRFC and the four baselines (i.e., $B_1, \dots, B_4$) as the season goes by. Due to the low number of injury examples, the classifiers have poor predictive performance at the beginning of the season and miss many injuries (black crosses in Figure 1). However, the predictive ability improves over time and the classifiers predict most of the injuries in the second half of the season (red crosses in Figure 1). The cumulative performance of the classifiers is highly affected by the initial period, where injury examples are scarce. This suggests that trying to prevent injuries from the very beginning may not be a good strategy, since classification performance can be initially poor due to data scarcity. An initial period of data collection, whose length depends on the needs and strategy of the club, is needed in order to collect an adequate amount of data; only then can reliable classifiers be trained on the collected data. In our dataset, we observe that the performance of the classifiers stabilizes after 16 weeks of data collection (Figure 1). In our case, a reasonable strategy could be to use the classifiers for injury prevention starting from the 16th week. This suggests that the considered club could effectively use the classifiers trained on data from a season to perform injury prediction from the first session of the second half of the current season.
We observe that DT is the best classifier in this scenario, detecting more than half of the injuries (11 injuries out of 21) and resulting in a cumulative F1-score = 0.45 (Figure 1; see footnote 6). Table 2 shows the classification reports of the two classifiers and the four baselines at the end of the season. We find that DT is significantly better than the baselines (Table 2). At the end of the season, DT detects 58% of the injuries (recall = 0.58) and it correctly predicts 38% of the cases classified as injuries (precision = 0.38). Although the machine learning approach significantly adds predictive power with respect to existing methods, there is still room for improvement. Soccer clubs are indeed interested in an algorithm with high precision to reduce “false alarms”, which could negatively affect a team’s performance due to the forced absence of crucial players.
We also train DT, ETRFC and the baselines using the entire feature set, i.e., without performing any feature selection process. These classifiers perform slightly worse than the classifiers built on the three selected features (precision, recall, F1-score and AUC are 0.36, 0.52, 0.43, and 0.74, respectively). To understand if the role of a player affects injury likelihood, we train distinct classifiers for every role (defender, midfielder, forward) and find that they perform much worse than the classifiers trained without distinguishing between the roles (precision, recall, F1-score and AUC are 0.01, 0.04, 0.03 and 0.51, respectively).
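For reference, the meta-parameters reported for DT in footnote 6 map onto scikit-learn's DecisionTreeClassifier as sketched below. The synthetic three-column data and the fixed random_state are assumptions for illustration only; they stand in for the three selected features and do not reproduce the study's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Meta-parameters as reported in footnote 6; everything else is left at its default.
dt = DecisionTreeClassifier(
    max_depth=3,
    min_samples_leaf=2,
    min_samples_split=11,
    random_state=0,  # added here for repeatability, not reported in the paper
)

# Synthetic stand-in for the three selected features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 1.0).astype(int)  # illustrative label only

dt.fit(X, y)
print(dt.get_depth(), dt.predict(X[:5]))
```

A depth limit of 3 keeps the tree small enough to be read as a handful of threshold rules, which suits a decision-support setting.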
Figure 2 shows the distribution of the number of days of work absence recorded during the season. The number of work days of absence due to injuries is 139, i.e., 6% of the working days. Generally, a player returns to regular physical activity within 5 days (15 out of 21 injuries), while only 6 times did a player need more than 5 days to recover. We estimate a (minimum) total cost related to injuries of 11,583 euros (139 × 83 euros = days of absence × minimal legal salary per day), corresponding to 3.81% of the salary cost of the soccer club (from January 1st to May 31st the club spent 303,750 euros on the players’ salaries). By using DT to predict injuries as the season goes by, the soccer club could have been able to prevent 11 injuries and save 8,300 euros, 70% of the economic costs related to injuries during the season (100 × 83 euros = days of absence × minimal legal salary per day).
Footnote 6: DT has the following meta-parameters: max depth = 3, minimum samples for a leaf = 2, minimum sample split = 11. For all the other meta-parameters we use the default values suggested by scikit-learn (see documentation: http://bit.ly/1T5sf92).
Fig. 1. Performance of classifiers in the evolutive scenario. We plot the cumulative F1-score of the classifiers and the baselines, week by week. For every week we highlight in red the number of injuries detected by DT up to that week.
model   class   prec   rec    F1     AUC
DT      0       0.98   0.99   0.99
        1       0.38   0.58   0.45   0.76
ETRFC   0       1.00   0.98   0.98
        1       0.35   0.57   0.43   0.71
B4      0       0.98   0.71   0.83
        1       0.04   0.20   0.12   0.56
B1      0       0.98   0.98   0.98
        1       0.06   0.05   0.05   0.51
B2      0       0.98   1.00   0.99
        1       0.00   0.00   0.00   0.51
B3      0       0.00   0.00   0.00
        1       0.02   1.00   0.04   0.51
Table 2. Performance of classifiers compared to baselines. We report the performance of classifiers DT and ETRFC in terms of precision, recall, F1 and AUC at the end of the season. We compare the classifiers with four baselines $B_1, \dots, B_4$.
Fig. 2. Distribution of the number of days of work absence after an injury.
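The cost estimate can be retraced with a few lines of arithmetic, using only the figures quoted in the text. Note that 139 × 83 gives 11,537 euros, which differs slightly from the reported 11,583 euros; the gap would be consistent with a daily salary that is rounded in the text, but that is an assumption. The variable names are chosen here for illustration.

```python
DAILY_SALARY_EUR = 83            # minimal legal salary per day, as quoted
DAYS_ABSENT_TOTAL = 139          # days of absence over the 23 weeks
DAYS_ABSENT_PREVENTABLE = 100    # days used in the quoted savings computation
REPORTED_TOTAL_COST_EUR = 11_583
SALARY_COST_EUR = 303_750        # players' salaries, January 1st to May 31st

saved_eur = DAYS_ABSENT_PREVENTABLE * DAILY_SALARY_EUR
share_of_cost = saved_eur / REPORTED_TOTAL_COST_EUR
share_of_days = DAYS_ABSENT_PREVENTABLE / DAYS_ABSENT_TOTAL
cost_vs_salaries = REPORTED_TOTAL_COST_EUR / SALARY_COST_EUR

print(saved_eur)
print(round(100 * share_of_cost, 1), round(100 * share_of_days, 1))
print(round(100 * cost_vs_salaries, 2))
```

Both shares come out near 72%, in line with the roughly 70% saving quoted above.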
5 Conclusion
This study presents a method to predict injuries of soccer players. Athletic trainers, coaches and physiotherapists can use our method to decide whether or not to rest a player in the next official match, thus possibly preventing his injury, improving team performance and reducing the club’s costs. The proposed study provides an example of how machine learning can be used to solve a difficult problem in sports analytics such as predicting injuries. An enlargement of the dataset to include different teams, which is planned by the authors of this paper, might allow us to build a more general and robust algorithm for injury forecasting. With more injury cases we could transform the problem from a binary classification (injury/no-injury) into a multi-class classification or a regression problem, where information about the type or the severity of the injuries can be exploited to produce more diverse predictions. Finally, due to its flexibility, our multidimensional approach can be easily extended to predict injuries in other professional sports, like rugby [13] and cycling [28].
Acknowledgements. This work has been partially funded by the EU project
SoBigData grant n. 654024.
References
1. Lehmann EE, Schulze GG. What Does it Take to be a Star? – The Role of Perfor-
mance and the Media for German Soccer Players. Applied Economics Quarterly
54:1, pp. 59-70, doi: 10.3790/aeq.54.1.59, 2008.
2. Fernández-Cuevas I, Gomez-Carmona P, Sillero-Quintana M, Noya-Salces J, Arnaiz-Lastras J, Pastor-Barrón A. Economic costs estimation of soccer injuries in first and second Spanish division professional teams. 15th Annual Congress of the European College of Sport Sciences ECSS, 23rd-26th June, 2010.
3. Fernández I, Gomez PM, Sillero M, Noya J, Arnaiz J, Pastor A. Economic costs estimation of soccer injuries in first and second Spanish division professional teams. 15th Annual Congress of the European College of Sport Sciences ECSS, Antalya (Turkey), 2010.
4. Gudmundsson J, Horton M. Spatio-Temporal Analysis of Team Sports - A Survey. CoRR: abs/1602.06994, 2016.
5. Cintia P., Rinzivillo S., Pappalardo L. A network-based approach to evaluate the
performance of football teams. In Proceedings of the Machine Learning and Data
Mining for Sports Analytics workshop (MLSA’15), ECML/PKDD 2015, Porto,
Portugal.
6. Pappalardo L., Cintia P. Quantifying the relation between performance and success
in soccer. eprint arXiv:1705.00885, 2017.
7. Cintia P., Pappalardo L., Pedreschi D., Giannotti F., Malvaldi M. The harsh rule
of the goals: Data-driven performance indicators for football teams. 2015 IEEE
International Conference on Data Science and Advanced Analytics (DSAA), pp.
1–10, doi:10.1109/DSAA.2015.7344823, 2015
8. Cintia P., Rinzivillo S., Pappalardo L. A network-based approach to evaluate the
performance of football teams, Proceedings of the Machine Learning and Data
Mining for Sports Analytics workshop (MLSA’15), ECML/PKDD 2015, 2015
9. Murray NB, Gabbett TJ, Townshend AD, Blanch P. Calculating acute:chronic workload ratios using exponentially weighted moving averages provides a more sensitive indicator of injury likelihood than rolling averages. Br J Sports Med. 2016; bjsports-2016-097152.
10. Brink MS, Visscher C, Arends S, Zwerver J, Post WJ, Lemmink KA. Monitoring stress and recovery: new insights for the prevention of injuries and illnesses in elite youth soccer players. Br J Sports Med. 2010;44:809-15.
11. Ehrmann FE, Duncan CS, Sindhusake D, Franzsen WN, Greene DA. GPS and
injury prevention in professional soccer. J Strength Cond Res. 2015;30:306-307.
12. Venturelli M, Schena F, Zanolla L, Bishop D. Injury risk factors in young soccer players detected by a multivariate survival model. Journal of Science and Medicine in Sport. 2011;14:293-298.
13. Gabbett TJ. Reductions in pre-season training loads reduce training injury rates in rugby league players. British Journal of Sports Medicine. 2004;38:743-749.
14. Gabbett TJ, Jenkins DG. Relationship between training load and injury in professional rugby league players. Journal of Science and Medicine in Sport. 2011;14:204-209.
15. Gabbett TJ. Influence of training and match intensity on injuries in rugby league.
Journal of Sports Sciences. 2004;22(5):409-417.
16. Gabbett TJ. The development and application of an injury prediction model for
noncontact, soft-tissue injuries in elite collision sport athletes. The Journal of
Strength & Conditioning Research. 2010;24(10):2593-2603.
17. Gabbett TJ, Domrow N. Relationships between training load, injury, and fitness
in sub-elite collision sport athletes. Journal of Sports Sciences. 2007;25(13):1507-
1519.
18. Anderson L, Triplett-McBride T, Foster C, Doberstein S, Brice G. Impact of training patterns on incidence of illness and injury during a women’s collegiate basketball season. The Journal of Strength & Conditioning Research. 2003;17:734-738.
19. Gabbett TJ, Ullah S. Relationship between running loads and soft-tissue injury in elite team sport athletes. J Strength Cond Res. 2012;26:953-960.
20. Rogalski B, Dawson B, Heasman J, Gabbett TJ. Training and game loads and injury risk in elite Australian footballers. J Sci Med Sport. 2013;16:499-503.
21. Gabbett TJ. The training-injury prevention paradox: should athletes be training
smarter and harder? Br J Sports Med. 2016; bjsports-2015-095788.
22. Hulin BT, Gabbett TJ, Blanch P, Chapman P, Bailey D, Orchard JV. Spikes
in acute workload are associated with increased injury risk in elite cricket fast
bowlers. Br J Sports Med. 2014;48:708-712.
23. Foster C. Monitoring training in athletes with reference to overtraining syndrome. Med Sci Sports Exerc. 1998;30:1164-1168.
24. Brink MS, Visscher C, Arends S, Zwerver J, Post WJ, Lemmink KA. Monitoring stress and recovery: new insights for the prevention of injuries and illnesses in elite youth soccer players. Br J Sports Med. 2010;44:809-15.
25. Talukder H, Vincent T, Foster G, Hu C, Huerta J, Kumar A, et al. Preventing
in-game injuries for NBA players. MIT Sloan Analytics Conference. Boston; 2016.
26. Kirkendall D.T., Dvorak J. Effective Injury Prevention in Soccer. The physician
and sportsmedicine, 38:1, doi: http://dx.doi.org/10.3810/psm.2010.04.1772, 2010.
27. Tan P.-N., Steinbach M., Kumar V. Introduction to Data Mining. Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, USA, 2005.
28. Cintia P., Pappalardo L., Pedreschi D. “Engine Matters”: A First Large Scale Data Driven Study on Cyclists’ Performance. 13th IEEE International Conference on Data Mining Workshops, ICDM Workshops, TX, USA, December 7-10, 2013, doi: 10.1109/ICDMW.2013.41.
... The increase in studies investigating training load as a risk factor for injury has been accompanied by an increase in studies exploring injury prediction (5,11,12,14,22,23). Injury prediction models have been evaluated and compared using metrics, such as sensitivity, specificity or area under the receiver operator characteristic curve (AUC) (5,11,12,14,22,23). ...
... The increase in studies investigating training load as a risk factor for injury has been accompanied by an increase in studies exploring injury prediction (5,11,12,14,22,23). Injury prediction models have been evaluated and compared using metrics, such as sensitivity, specificity or area under the receiver operator characteristic curve (AUC) (5,11,12,14,22,23). These scoring metrics are designed to evaluate binary predictions (i.e., injury or no-injury) and look at how often the model predictions match the actual outcomes (24). ...
... The AUC measures the ability of the model to discriminate between the two outcome classes (injury and no-injury). It has been used as a way to select the best performing injury prediction model in studies comparing multiple methods (11,12,14,23). Cross-validation (10-fold) was used to obtain estimates of AUC for each simulated study. ...
... The field of sports analytics has been a research area for more than 100 years [1,2]. It covers a wide range of topics, reaching from directly visible facts (e.g., counting ball contacts during a match) and strategical game analysis (e.g., in baseball [3,4] and football [5]) to injury prediction and prevention (e.g., in football [6] and soccer [7]). The literature shows that research topics often are focused on similar outcomes but improve in quality based on the available techniques at the time [2]. ...
... The questionnaire was designed with a seven-point Likert-type scale [206,207]. The answers were given in a range from (1): 'strongly disagree' to (7): 'strongly agree' . For the evaluation, the answers of all participants were combined per question. ...
Thesis
Full-text available
Sports analytics research has major impact on the development of innovative training methods and the broadcast of sports events. This dissertation provides algorithms for both kinematic analysis and performance interpretation based on unobtrusively obtained measurements from wearable sensors. Its main focus is set on the processing of 3D-orientation features and the exploration of their potential for sports analytics. The proposed algorithms are described and evaluated in five exemplary sports. In scuba diving, rowing and ski jumping, the 3D-orientation of the body/boat/skis is determined and further processed to analyze and visualize the motion behavior. In snowboarding and skateboarding, the board orientation is calculated and processed for motion visualization and machine learning. Board sport tricks are automatically detected and subsequently classified for trick category and type. The methods of this work were already partially applied for TV broadcast of international competitions (e.g., Olympics 2018). Additionally, they can support sports science research for establishing thorough investigations and innovative training methods.
... The availability of massive data portraying soccer performance has facilitated recent advances in soccer analytics. Rossi et al. [42] proposed an innovative machine learning approach to the forecasting of non-contact injuries for professional soccer players. In [3], we can find the definition of quantitative measures of pressing in defensive phases in soccer. ...
... This exploratory examines the factors influencing sports success and how to build simulation tools for boosting both individual and collective performance. Furthermore, this exploratory describes performances employing data, statistics, and models, allowing coaches, fans, and practitioners to understand (and boost) sports performance [42]. Explainable machine learning. ...
Article
Full-text available
This paper shows data science’s potential for disruptive innovation in science, industry, policy, and people’s lives. We present how data science impacts science and society at large in the coming years, including ethical problems in managing human behavior data and considering the quantitative expectations of data science economic impact. We introduce concepts such as open science and e-infrastructure as useful tools for supporting ethical data science and training new generations of data scientists. Finally, this work outlines SoBigData Research Infrastructure as an easy-to-access platform for executing complex data science processes. The services proposed by SoBigData are aimed at using data science to understand the complexity of our contemporary, globally interconnected society.
... This information can be subsequently used to evaluate and improve current practices and decision-making (Buchheit 2017; Robertson et al. 2017; Ward et al. 2019). For example, training data collected by performance staff through GPS have previously been illustrated to help form a range of measures that may identify injury risks (Rossi et al. 2017) and changes in physical qualities. Though the potential impact of collecting training data is becoming clearer, further research is required to understand specifically whether this feedback is utilised to support coach decision-making. ...
... This suggestion is further supported by all stakeholders deeming training data important to the planning process (Table 2). While research exists showing a dose-response relationship between training load and injury risk (Rossi et al. 2017), research examining training load and fitness measures reports little usefulness. For instance, unclear associations between high-intensity running distances and changes in intermittent running capacity were reported by professional soccer players across pre-season (Taylor et al. 2018; Rabbani et al. 2019). ...
Article
The aim of the study was to examine the perceptions of training data feedback from key stakeholders within the coaching process of professional soccer clubs. A survey assessed the importance of training data towards reflection and decision-making, potential barriers and player preferences. 176 participants comprising coaches, players and performance staff completed the survey. The training data coaches most commonly identified as wanting to see to support reflection was ‘high-intensity’ actions and variables recognised by the coach as ‘work rate/intensity’. All stakeholders reported training data as at least somewhat important in guiding their coaches’ practices, with lack of a common goal and high volumes of information being the main barriers to effective feedback of training data. Players deemed feedback as positive to changing their behaviour, with total distance, high-speed running and sprint distances as the information they would most like to see. It would be likely to be looked at via message or pinned up in the changing room. Training data is seen as an impactful and effective tool for use by all key stakeholders. Despite this, its use can be optimised by increasing opportunities for informal reflection, using less information, and improving communication of data.
Article
Nonnato, A, Hulton, AT, Brownlee, TE, and Beato, M. The effect of a single session of plyometric training per week on fitness parameters in professional female soccer players. A randomized controlled trial. J Strength Cond Res XX(X): 000-000, 2020. Although the interest in and popularity of female soccer have increased over the last few decades, research conducted with the elite population is still lacking, specifically on ecological training interventions during the competitive season. Therefore, the aim of this study was to compare the effectiveness of 12 weeks (undertaken once a week) of plyometric (PLY) training on physical performance in professional female soccer players during the season. Using a randomized controlled trial design, 16 players were included in the current study (mean ± SD; age 23 ± 4 years, weight 60.3 ± 4.9 kg, height 167 ± 3.7 cm) and randomized into PLY (n = 8) and Control (CON, n = 8) groups, respectively. Squat jump (SJ), counter movement jump (CMJ), long jump (LJ), single-leg triple jump distance test (triple jump test), changes of direction 505 test (505-COD), and sprint 10 and 30 m were performed before and after 12 weeks of PLY training. Significant within-group differences were found in triple jump test dominant (p = 0.031, effect size [ES] = moderate) and nondominant limb (p = 0.021, ES = moderate) and sprint 10 m (p = 0.05, ES = large), whereas the CON did not report any positive variation. However, neither group reported significant variation in SJ, CMJ, LJ, 505-COD, and sprint 30 m (underlining the difficulties in obtaining meaningful variation in season). These findings have strong practical applications because this study showed for the first time that a single session a week of plyometric training can significantly increase sport-specific fitness parameters in professional female soccer players during the season.
... Unfortunately, out-of-possession movements are not described in soccer-logs, making it difficult to assess important aspects such as pressing [1] or the ability to create spaces [5]. PlayeRank can be easily extended by making the individual performance extraction module able to extract features from other data sources such as video tracking data [15] and GPS data [42,43], which provide a detailed description of the spatio-temporal trajectories generated by players during a match. ...
Article
The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank—i.e. searching players and player versatility—showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.
... Memmert et al. (2016) and Gudmundsson and Horton (2017) provide a general overview of positional data applications in team sports. Other interesting applications include pass quality evaluation (Brooks et al. 2016) or injury prediction (Rossi et al. 2017). Taki and Hasegawa (2000) propose a movement model that is based on a player's current speed, her direction, and an acceleration profile along different directions. ...
Article
Coordinated movements of players are key to success in team sports. However, traditional models for player movements are based on unrealistic assumptions and their analysis is prone to errors. As a remedy, we propose to estimate individual movement models from positional data and show how to turn these estimates into accurate and realistic zones of control. Our approach accounts for characteristic traits of players, scales with large amounts of data, and can be efficiently computed in a distributed fashion. We report on empirical results.
... Nowadays, the data revolution has the potential to rapidly change this scenario, thanks to new sensing technologies that provide high-fidelity data streams extracted from every game, such as the spatio-temporal trajectories of players [10,23,24] and all the events that occur on the field [25,5,4]. Recently, several studies relied on these data to propose metrics which quantify specific aspects of soccer performance [6,14,26,16,18,19,3]. ...
Article
The availability of massive data about sports activities offers nowadays the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover we find that, while victory and defeats can be explained by the team's performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league only relying on technical data, i.e. excluding the goals scored, exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking (the PC ranking) which is close to the actual ranking, suggesting that a complex systems' view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.
Article
Objective: To determine if any differences exist between the rolling averages and exponentially weighted moving averages (EWMA) models of acute:chronic workload ratio (ACWR) calculation and subsequent injury risk. Methods: A cohort of 59 elite Australian football players from 1 club participated in this 2-year study. Global positioning system (GPS) technology was used to quantify external workloads of players, and non-contact 'time-loss' injuries were recorded. The ACWR were calculated for a range of variables using 2 models: (1) rolling averages, and (2) EWMA. Logistic regression models were used to assess both the likelihood of sustaining an injury and the difference in injury likelihood between models. Results: There were significant differences in the ACWR values between models for moderate (ACWR 1.0-1.49; p=0.021), high (ACWR 1.50-1.99; p=0.012) and very high (ACWR >2.0; p=0.001) ACWR ranges. Although both models demonstrated significant (p<0.05) associations between a very high ACWR (ie, >2.0) and an increase in injury risk for total distance (relative risk (RR)=6.52-21.28) and high-speed distance (RR=5.87-13.43), the EWMA model was more sensitive for detecting this increased risk. The variance (R²) in injury explained by each ACWR model was significantly (p<0.05) greater using the EWMA model. Conclusions: These findings demonstrate that large spikes in workload are associated with an increased injury risk using both models, although the EWMA model is more sensitive to detect increases in injury risk with higher ACWR.
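The two ACWR models compared in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the window lengths or the EWMA formula, so the 7-day acute and 28-day chronic windows, the decay factor 2 / (span + 1), the seeding of the EWMA with the first observed load, and all function names are assumptions made for illustration only. The daily loads are hypothetical.

```python
from typing import List, Optional


def rolling_mean(loads: List[float], window: int) -> List[Optional[float]]:
    """Mean of the most recent `window` daily loads; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(loads)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(loads[i + 1 - window : i + 1]) / window)
    return out


def ewma(loads: List[float], span: int) -> List[float]:
    """Exponentially weighted moving average with decay 2 / (span + 1).

    Seeding with the first observed load is an assumption of this sketch.
    """
    if not loads:
        return []
    lam = 2.0 / (span + 1)
    out: List[float] = []
    prev = loads[0]
    for load in loads:
        prev = lam * load + (1.0 - lam) * prev
        out.append(prev)
    return out


def _ratio(acute: Optional[float], chronic: Optional[float]) -> Optional[float]:
    """Acute divided by chronic, or None when the ratio is undefined."""
    if acute is None or chronic is None or chronic == 0:
        return None
    return acute / chronic


def acwr_rolling(loads: List[float], acute_days: int = 7,
                 chronic_days: int = 28) -> List[Optional[float]]:
    """ACWR from rolling averages (window lengths are assumed, not from the abstract)."""
    acute = rolling_mean(loads, acute_days)
    chronic = rolling_mean(loads, chronic_days)
    return [_ratio(a, c) for a, c in zip(acute, chronic)]


def acwr_ewma(loads: List[float], acute_days: int = 7,
              chronic_days: int = 28) -> List[Optional[float]]:
    """ACWR from exponentially weighted moving averages."""
    acute = ewma(loads, acute_days)
    chronic = ewma(loads, chronic_days)
    return [_ratio(a, c) for a, c in zip(acute, chronic)]


# Hypothetical daily loads: four steady weeks followed by one week at double load.
daily_loads = [100.0] * 28 + [200.0] * 7
print("rolling ACWR on last day:", acwr_rolling(daily_loads)[-1])
print("EWMA ACWR on last day:", acwr_ewma(daily_loads)[-1])
```

Both functions return one ratio per day, so the two models can be compared day by day on the same load series.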
Article
Background: There is dogma that higher training load causes higher injury rates. However, there is also evidence that training has a protective effect against injury. For example, team sport athletes who performed more than 18 weeks of training before sustaining their initial injuries were at reduced risk of sustaining a subsequent injury, while high chronic workloads have been shown to decrease the risk of injury. Second, across a wide range of sports, well-developed physical qualities are associated with a reduced risk of injury. Clearly, for athletes to develop the physical capacities required to provide a protective effect against injury, they must be prepared to train hard. Finally, there is also evidence that under-training may increase injury risk. Collectively, these results emphasise that reductions in workloads may not always be the best approach to protect against injury. Main thesis: This paper describes the 'Training-Injury Prevention Paradox' model, a phenomenon whereby athletes accustomed to high training loads have fewer injuries than athletes training at lower workloads. The model is based on evidence that non-contact injuries are not caused by training per se, but more likely by an inappropriate training programme. Excessive and rapid increases in training loads are likely responsible for a large proportion of non-contact, soft-tissue injuries. If training load is an important determinant of injury, it must be accurately measured up to twice daily and over periods of weeks and months (a season). This paper outlines ways of monitoring training load ('internal' and 'external' loads) and suggests capturing both recent ('acute') training loads and more medium-term ('chronic') training loads to best capture the player's training burden. I describe the critical variable (acute:chronic workload ratio) as a best practice predictor of training-related injuries. This provides the foundation for interventions to reduce players' risk, and thus, time-loss injuries. Summary: The appropriately graded prescription of high training loads should improve players' fitness, which in turn may protect against injury, ultimately leading to (1) greater physical outputs and resilience in competition, and (2) a greater proportion of the squad available for selection each week.
Conference Paper
Sports analytics in general, and football (soccer in USA) analytics in particular, have evolved in recent years in an amazing way, thanks to automated or semi-automated sensing technologies that provide high-fidelity data streams extracted from every game. In this paper we propose a data-driven approach and show that there is a large potential to boost the understanding of football team performance. From observational data of football games we extract a set of pass-based performance indicators and summarize them in the H indicator. We observe a strong correlation among the proposed indicator and the success of a team, and therefore perform a simulation on the four major European championships (78 teams, almost 1500 games). The outcome of each game in the championship was replaced by a synthetic outcome (win, loss or draw) based on the performance indicators computed for each team. We found that the final rankings in the simulated championships are very close to the actual rankings in the real championships, and show that teams with high ranking error show extreme values of a defense/attack efficiency measure, the Pezzali score. Our results are surprising given the simplicity of the proposed indicators, suggesting that a complex systems' view on football data has the potential of revealing hidden patterns and behavior of superior quality.
Conference Paper
The striking proliferation of sensing technologies that provide high-fidelity data streams extracted from every game induced an amazing evolution of football statistics. Nowadays professional statistical analysis firms like ProZone and Opta provide data to football clubs, coaches and leagues, who are starting to analyze these data to monitor their players and improve team strategies. Standard approaches in evaluating and predicting team performance are based on history-related factors such as past victories or defeats, record in qualification games and margin of victory in past games. In contrast with traditional models, in this paper we propose a model based on the observation of players' behavior on the pitch. We model the game of a team as a network and extract simple network measures, showing the value of our approach in predicting the outcomes of a long-running tournament such as the Italian major league.
Conference Paper
Introduction: Injuries are one of the most important problems in sport. In professional soccer, injuries involve, in addition to the difficult process of rehabilitation for the player, reduced athletic performance and a great economic cost to the team or club. This study quantifies the economic cost of injuries that occurred in a group of First and Second Division Spanish soccer teams during the 2008-2009 season. Based on these data, we make a global estimate of the cost of soccer injuries. Method: 16 teams from the Spanish First Division and 11 from the Second Division participated in this study. Each team's injuries were recorded during the 2008-2009 season by means of the REINLE questionnaire (1). With these data, we obtained the percentage of absence from work for each team by adding up the days that players were injured. Afterwards, we estimated the injury cost arising directly from work absenteeism for each club with two methods: 1. Taking the minimum legal salary for 1st and 2nd division clubs in Spain (2). 2. Taking the percentage of the clubs' real budgets used for salaries (3). Finally, we estimated the economic cost of injuries by multiplying the percentage of work absenteeism by the amount of the clubs' budgets used for salaries. Results: During the 2008-2009 season, the 16 First Division teams registered 24,360 days of absence from work, an average of 16.23% of the season absent per player. The 11 Second Division teams registered 15,946 days, averaging 15.44% of working days. Applying the two methods, we obtained: 1. Method 1 (minimum legal salaries): an average of €643,402 per 1st division team, €330,445 per 2nd division team, and a global estimate of €20,137,835 as the minimum cost of injuries. 2. Method 2 (percentage of salaries in real budgets): this method gave an estimate closer to reality, with an average of €7,569,786 per 1st division club, €1,666,469 per 2nd division club, and a global estimate of €188,058,072 for 1st and 2nd division soccer clubs in Spain during the 2008-2009 season. Discussion and conclusions: In terms of economic cost, Inklaar (4) estimated that the direct and indirect cost of soccer injuries in Holland was at least US$65 million. However, we did not estimate the indirect cost, and furthermore the temporal distance (from 1994 to 2010) prevents us from comparing our results directly with those of Inklaar. According to Gay de Liebana (3) and Barajas Alonso (5), the current financial situation of Spanish soccer is not affordable for the clubs. Taking into account that salaries represent the largest part of clubs' budgets (over 60%), injury prevention could be an interesting way to reduce financial losses. We encourage further research in this field to improve the methodology and obtain more realistic estimates of injury cost in soccer.
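The cost calculation described in this abstract (absence percentage multiplied by the salary budget) can be sketched in a few lines of Python. The abstract does not state how the absence percentage is normalised, so dividing injured days by squad size times season length is an assumption, as are the function names. The input figures below are hypothetical and are not taken from the study.

```python
def absence_fraction(injured_days: float, squad_size: int, season_days: int) -> float:
    """Share of available player-days lost to injury.

    Assumption: available player-days = squad_size * season_days.
    """
    if squad_size <= 0 or season_days <= 0:
        raise ValueError("squad_size and season_days must be positive")
    return injured_days / (squad_size * season_days)


def injury_cost(absence: float, salary_budget: float) -> float:
    """Direct injury cost: absence fraction times the budget spent on salaries."""
    return absence * salary_budget


# Hypothetical club: 25 players, 300-day season, 1,500 injured days, EUR 10M salary budget.
fraction = absence_fraction(injured_days=1500, squad_size=25, season_days=300)
cost = injury_cost(fraction, salary_budget=10_000_000)
print(f"absence: {fraction:.1%}, estimated direct cost: EUR {cost:,.0f}")
```

Only the direct cost of absenteeism is modelled here, matching the abstract's note that indirect costs were not estimated.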
Article
To determine if the comparison of acute and chronic workload is associated with increased injury risk in elite cricket fast bowlers. Data were collected from 28 fast bowlers who completed a total of 43 individual seasons over a 6-year period. Workloads were estimated by summarising the total number of balls bowled per week (external workload), and by multiplying the session rating of perceived exertion by the session duration (internal workload). One-week data (acute workload), together with 4-week rolling average data (chronic workload), were calculated for external and internal workloads. The size of the acute workload in relation to the chronic workload provided either a negative or positive training-stress balance. A negative training-stress balance was associated with an increased risk of injury in the week after exposure, for internal workload (relative risk (RR)=2.2 (CI 1.91 to 2.53), p=0.009), and external workload (RR=2.1 (CI 1.81 to 2.44), p=0.01). Fast bowlers with an internal workload training-stress balance of greater than 200% had a RR of injury of 4.5 (CI 3.43 to 5.90, p=0.009) compared with those with a training-stress balance between 50% and 99%. Fast bowlers with an external workload training-stress balance of more than 200% had a RR of injury of 3.3 (CI 1.50 to 7.25, p=0.033) in comparison to fast bowlers with an external workload training-stress balance between 50% and 99%. These findings demonstrate that large increases in acute workload are associated with increased injury risk in elite cricket fast bowlers.
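As a rough illustration of the training-stress balance used in this abstract, the sketch below expresses each week's acute load as a percentage of the 4-week rolling average. Reading the balance as acute / chronic × 100 is an assumption that matches the percentage ranges quoted (50% to 99%, above 200%). Including the current week in the chronic window is also an assumption, and the weekly loads are hypothetical.

```python
from typing import List, Optional


def training_stress_balance(weekly_loads: List[float],
                            chronic_weeks: int = 4) -> List[Optional[float]]:
    """Acute (1-week) load as a percentage of the chronic (rolling-average) load.

    Returns None for weeks without enough history or with a zero chronic load.
    Assumption: the chronic window includes the current week.
    """
    balance: List[Optional[float]] = []
    for i, acute in enumerate(weekly_loads):
        if i + 1 < chronic_weeks:
            balance.append(None)
            continue
        window = weekly_loads[i + 1 - chronic_weeks : i + 1]
        chronic = sum(window) / chronic_weeks
        balance.append(None if chronic == 0 else 100.0 * acute / chronic)
    return balance


# Hypothetical weekly loads (e.g. balls bowled, or session RPE x duration).
loads = [100.0, 100.0, 100.0, 300.0, 100.0]
for week, value in enumerate(training_stress_balance(loads)):
    label = "n/a" if value is None else f"{value:.0f}%"
    print(f"week {week}: {label}")
```

The same function applies to external and internal workloads, since only the weekly totals differ.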
Article
Objectives: To examine the relationship between combined training and game loads and injury risk in elite Australian footballers. Design: Prospective cohort study. Methods: Forty-six elite Australian footballers (mean±SD age of 22.2±2.9 y) from one club were involved in a one-season study. Training and game loads (session-RPE multiplied by duration in min) and injuries were recorded each time an athlete exerted an exercise load. Rolling weekly sums and week-to-week changes in load were then modelled against injury data using a logistic regression model. Odds ratios (OR) were reported against a reference group of the lowest training load range. Results: Larger 1 weekly (>1750 AU, OR=2.44-3.38), 2 weekly (>4000 AU, OR=4.74) and previous to current week changes in load (>1250 AU, OR=2.58) significantly related (p<0.05) to a larger injury risk throughout the in-season phase. Players with 2-3 and 4-6 years of experience had a significantly lower injury risk compared to 7+ years players (OR=0.22, OR=0.28) when the previous to current week change in load was more than 1000 AU. No significant relationships were found between all derived load values and injury risk during the pre-season phase. Conclusions: In-season, as the amount of 1-2 weekly load or previous to current week increment in load increases, so does the risk of injury in elite Australian footballers. To reduce the risk of injury, derived training and game load values of weekly loads and previous week-to-week load changes should be individually monitored in elite Australian footballers.
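The derived load values in this abstract (session-RPE multiplied by duration, weekly sums, and week-to-week changes) can be sketched as follows. The data layout, function names, and the choice to flag weeks against the in-season thresholds quoted in the abstract (more than 1750 AU weekly, 4000 AU 2-weekly, 1250 AU change) are illustrative assumptions. Those thresholds describe that study's cohort only, and the sessions below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (week index, session RPE, duration in minutes); layout is an assumption of this sketch.
Session = Tuple[int, float, float]


def weekly_loads(sessions: List[Session], n_weeks: int) -> List[float]:
    """Sum session load (RPE x minutes, in arbitrary units) per week."""
    totals = [0.0] * n_weeks
    for week, rpe, minutes in sessions:
        totals[week] += rpe * minutes
    return totals


def derived_loads(weekly: List[float]) -> List[Dict[str, Optional[float]]]:
    """One-week load, two-week sum, and change from the previous week."""
    rows: List[Dict[str, Optional[float]]] = []
    for i, load in enumerate(weekly):
        previous = weekly[i - 1] if i >= 1 else None
        rows.append({
            "one_week": load,
            "two_week": (load + previous) if previous is not None else None,
            "change": (load - previous) if previous is not None else None,
        })
    return rows


def flags(row: Dict[str, Optional[float]]) -> List[str]:
    """Names of derived loads above the in-season thresholds quoted in the abstract."""
    limits = {"one_week": 1750.0, "two_week": 4000.0, "change": 1250.0}
    return [name for name, limit in limits.items()
            if row[name] is not None and row[name] > limit]


# Hypothetical sessions over two weeks.
sessions = [(0, 5, 60), (0, 6, 90), (1, 8, 120), (1, 9, 120)]
for week, row in enumerate(derived_loads(weekly_loads(sessions, n_weeks=2))):
    print(week, row, flags(row))
```

Keeping the thresholds in one dictionary makes it straightforward to swap in values calibrated for a different squad.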
Article
Although the potential link between running loads and soft-tissue injury is appealing, the evidence supporting or refuting this relationship in high-performance team sport athletes is nonexistent, with all published studies using subjective measures (e.g., ratings of perceived exertion) to quantify training loads. The purpose of this study was to investigate the risk of low-intensity (e.g., walking, jogging, total distances) and high-intensity (e.g., high acceleration and velocity efforts, repeated high-intensity exercise bouts) movement activities on lower body soft-tissue injury in elite team sport athletes. Thirty-four elite rugby league players participated in this study. Global positioning system data and the incidence of lower body soft-tissue injuries were monitored in 117 skill training sessions during the preseason and in-season periods. The frailty model (an extension of the Cox proportional regression model for recurrent events) was applied to calculate the relative risk of injury after controlling for all other training data. The risk of injury was 2.7 (95% confidence interval 1.2-6.5) times higher when very high-velocity running (i.e., sprinting) exceeded 9 m per session. Greater distances covered in mild, moderate, and maximum accelerations and low- and very low-intensity movement velocities were associated with a reduced risk of injury. These results demonstrate that greater amounts of very high-velocity running (i.e., sprinting) are associated with an increased risk of lower body soft-tissue injury, whereas distances covered at low and moderate speeds offer a protective effect against soft-tissue injury. From an injury prevention perspective, these findings provide empirical support for restricting the amount of sprinting performed in preparation for elite team sport competition. However, coaches should also consider the consequences of reducing training loads on the development of physical qualities and playing performance.
Article
This study investigated the relationship between GPS variables measured in training and gameplay and injury occurrences in professional soccer. Nineteen professional soccer players competing in the Australian Hyundai A-League were monitored for one entire season using 5Hz Global Positioning System (GPS) units (SPI-Pro GPSports, Canberra, Australia) in training sessions and pre-season games. The measurements obtained were Total Distance, High Intensity Running Distance, Very High Intensity Running Distance, New Body Load and Metres per Minute. Non-contact soft tissue injuries were documented throughout the season. Players' seasons were averaged over one and four week blocks according to when injuries occurred. These blocks were compared to each other and to players' seasonal averages. Players performed significantly higher Metres per Minute in the weeks preceding an injury compared to their seasonal averages (+9.6 % and +7.4 % for one and four week blocks respectively) (p<0.01), indicating an increase in training and gameplay intensity leading up to injuries. Furthermore, injury blocks showed significantly lower average New Body Load compared to seasonal averages (-15.4 % and -9.0 % for one and four week blocks respectively) (p<0.01 and p=0.01). Periods of relative under-preparedness could potentially leave players unable to cope with intense bouts of high intensity efforts during competitive matches. Although limited by FIFA regulations, the results of this study isolated two variables predicting soft tissue injuries for coaches and sports scientists to consider when planning and monitoring training.