2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS)
978-1-7281-9874-3/20/$31.00 ©2020 IEEE July 28-30, 2020•Shenyang, China
Performance Analysis of Everton Football Club
Based on Tracking Data
Yikang Wang
School of Geodesy and Geomatics
Wuhan University
Wuhan, China
wangyikang@whu.edu.cn
Hao Wang
Electronic Information School
Wuhan University
Wuhan, China
2017301200046@whu.edu.cn
Mingyue Qiu
Economics and Management School
Wuhan University
Wuhan, China
201730201010219@whu.edu.cn
Abstract—Competitive team sports are one of the most
informative scenarios in the research of team cooperation analysis.
However, simple, robust, and accurate methods based on key events
are still lacking for evaluating the performance of a soccer team.
In this paper, we first build a ball-passing network to support
teamwork analysis of a soccer team. With its help, we then
propose a novel quality-assessment model based on highlight
moments to evaluate the team's performance. Further, we
develop a third model that identifies and quantifies the rhythm
conversion between offensive and defensive tactics. Using
spatiotemporal tracking data of key events in 38 Premier League
games, we carry out a comprehensive and systematic analysis of
the performance of both the staff and the players of Everton
Football Club. The key factors behind the match result are also
quantitatively explored and modeled.
Keywords—Sport analysis; Spatio-temporal; team performance;
Expectation-Maximization algorithm
I. INTRODUCTION
Competitive team sports have always attracted people's
attention. In team ball games, two sides compete for control of
the ball and, following the rules, shoot for goal. Meanwhile,
each side has to prevent the other from doing the same. Soccer,
as a competitive team sport, is much more than a simple set of
passing and shooting actions. Rather, from an interdisciplinary
perspective, it is about the complex interactions among players.
Thus, whether a team wins depends not only on the sum of
individual abilities but also on other factors that can be
explained with quantitative and formalized methods.
In the past, due to the limitations of equipment, games were
observed and memorized by expert observers, who could recall
only 42% of the major moments of a game [1]. With the development
of imaging technology, auxiliary models such as T-patterns [2]
came into use to roughly quantify the spatiotemporal data of key
events from game video recordings. In recent years, radio
frequency identification (RFID) technology has made it possible
to obtain accurate spatiotemporal data of the ball and players,
which allows coaches and data analysts to conduct in-depth data
analysis and mining. The analysis of soccer game data, especially
the spatiotemporal data of key events, is being used for decision
support in all aspects of professional sports, including match
strategy, player evaluation, and the prediction of match outcomes
and league tables. Using the key event tracking data of the
2017/18 Premier League season [3], we make a comprehensive
analysis of the dynamics of Everton Football Club.
II. RELATED WORK
Spatiotemporal data on sports events such as soccer are
generally analyzed and visualized along two dimensions: time
and space. In the spatial dimension, Cintia et al. divide the
field evenly in Cartesian coordinates [4], while Yue et al.
divide it in polar coordinates [5], to determine how the position
of an event influences the match as a whole. In the time
dimension, a team's strategic rhythm and cooperation network can
be inferred from its sequence of passes [6], and performance can
be evaluated by time-domain analysis based on highlight moments
[7]. By combining the spatiotemporal data of the players and the
ball, the development of important events during the game can be
predicted more accurately [8].
Graphs and networks represented by an adjacency matrix are
often used to analyze the connections between multiple
independent individuals. In the analysis of team sports, network
analysis and visualization can reveal information that is not
easy to obtain directly from the game video or datasheet, such
as the usage of areas of the pitch, the state of the team when it
passes the ball, and each player's performance in the game.
Network analysis can identify a team's weaknesses and potential
problems so that they can be improved, as well as the opponent's
weaknesses [9].
A team's performance can be evaluated quantitatively by
computing indicators from data such as passing events. As shown
in [4], a team's performance (including goals and attempts) is
strongly related to the total number of passes as well as to how
uniformly both the passers and the passing positions are
distributed. In practice, however, passes in football matches
serve multiple purposes, such as preparing an attack, retreating
into defense, or playing for time. Analyzing the purpose of each
pass helps improve the accuracy and robustness of team
performance indicators.
In our work, through the ball-passing network and cluster
analysis for the entire season, we find the small groups in the
team and then analyze their formation in detail by visualizing
the player positions in single games and the passing network of
the small groups. By identifying the purpose of each pass from
the time series of events, we establish a performance quality
index Q based on highlight moments, and then use a Gaussian
mixture model, solving the parameters of Q with the
Expectation-Maximization algorithm, to predict the match
outcome. By judging the offensive and defensive passing modes
and the success of duels, we establish a tactical rhythm
conversion model to calculate the rhythm ratio, which is highly
relevant to the match outcome.
III. PRELIMINARY RESEARCH
A. Ball-Passing Network
In the directed passing network of the players, each node
represents a player, labeled in the circle with the Player ID (a
unique identifier consisting of a letter and a number, where the
letter reflects the player's position: 'F' for forward, 'D' for
defense, 'M' for midfield, and 'G' for goalkeeper). The radius of
a node represents the degree of the player's involvement in
successful passes:
(1)
where A_ij represents the number of successful passes from
player i to player j. The edges between nodes indicate
successful passing events between players: the thicker the edge,
the more successful passes. Through the analysis of 38 games, we
obtain the passing network of all Everton players (Fig. 1).
Fig. 1. Ball-passing network of all players in Everton
Because of differences in playing time, ability, and position,
the total number of passes a player directs at each teammate
varies greatly. An individual player's passing behavior shows
clear regularity, while the passing habits of different players
differ greatly from each other. Take M1, M3, and D6 as examples.
All three pass frequently, but M1 passes mostly with M3 and F2
and interacts less with other players, whereas D6 has no fixed
passing partner and distributes passes evenly among multiple
players.
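The construction of the passing network can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the pass list is made-up toy data, the function names are hypothetical, and the node degree is assumed to be the number of passes made plus the number of passes received, which is one common choice for a weighted directed network.

```python
from collections import defaultdict

def build_passing_network(passes):
    """Count successful passes A[i][j] from passer i to receiver j."""
    A = defaultdict(lambda: defaultdict(int))
    for passer, receiver in passes:
        A[passer][receiver] += 1
    return A

def node_degree(A, player):
    """Assumed degree: passes made by the player plus passes received."""
    made = sum(A[player].values()) if player in A else 0
    received = sum(row.get(player, 0) for row in A.values())
    return made + received

# Toy data (not taken from the Everton data set).
toy_passes = [("M1", "M3"), ("M1", "M3"), ("M3", "M1"), ("M1", "F2"), ("D6", "M1")]
A = build_passing_network(toy_passes)
degrees = {p: node_degree(A, p) for p in ("M1", "M3", "F2", "D6")}
print(degrees)
```

In a drawing of the network, the node radius would then be scaled by `degrees[p]` and the edge width by `A[i][j]`.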
B. Cluster Analysis
Cluster analysis of the passing network [10] shows the
collective nature of the players' passing habits well. The
resolution determines the number of communities in the
clustering result: the lower the resolution, the more
communities. At a resolution of 0.8, the players are divided
into five communities with a modularity of 0.067. The resulting
communities, each drawn in its own color, are listed in Table I.
TABLE I. THE AVERAGE INPUT OF AVERAGE DEGREE FOR EACH CATEGORY
| Community No. | Number of Players | Players | Average Degree |
|---|---|---|---|
| 1 | 6 | G1 D1 D3 D9 D10 M7 | 25.36 |
| 2 | 6 | D2 D4 D6 M1 M2 M3 | 46.68 |
| 3 | 8 | D7 M4 M5 M8 M9 M10 M11 M13 | 8.97 |
| 4 | 5 | D5 M6 F1 F2 F3 | 41.75 |
| 5 | 5 | D8 M12 F4 F5 F6 | 17.18 |
| All | | | 12.3 |
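To make the modularity figure more concrete, the sketch below computes the modularity of a given partition of a weighted directed passing network, using the standard directed form (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), where m is the total number of passes. It is only an illustration on toy data: it does not perform the multiscale community detection of [10], it ignores the resolution parameter, and all names are hypothetical.

```python
def directed_modularity(A, community):
    """Modularity of a partition of a weighted directed network.

    A: dict mapping (passer, receiver) -> number of successful passes.
    community: dict mapping player -> community label.
    """
    m = sum(A.values())  # total pass weight in the network
    k_out = {p: 0 for p in community}
    k_in = {p: 0 for p in community}
    for (i, j), w in A.items():
        k_out[i] += w
        k_in[j] += w
    total = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:
                total += A.get((i, j), 0) - k_out[i] * k_in[j] / m
    return total / m

# Toy network: two pairs of players who only pass within their pair.
toy_A = {("D1", "D3"): 1, ("D3", "D1"): 1, ("F1", "F2"): 1, ("F2", "F1"): 1}
toy_partition = {"D1": "back", "D3": "back", "F1": "front", "F2": "front"}
modularity = directed_modularity(toy_A, toy_partition)
print(modularity)  # 0.5 for this toy partition
```

A value near zero, such as the 0.067 reported above, indicates that passes within communities are only slightly more frequent than expected from the players' overall passing volumes.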
Players in the same position, especially D and M, tend to
pass to one another because of the short distance between them
and the high pass success rate in the backcourt. The grass-green
group containing D1 is composed of D and G players, who are
located in the backfield; the blue group containing M1 and the
pink group containing M4 are both composed of D and M players,
who are located in the midfield and backfield. It is worth
noting that the forwards have not formed a community of their
own. This is due to the relatively high defensive pressure from
the opponent in the frontcourt and the low pass success rate
between forwards. As for why the orange and dark-green groups
are composed of all three roles D, M, and F, rather than F and M
alone forming a community, the following spatial analysis gives
the answer.
C. Spatial Analysis
Fig. 2. Heat map for passing (left) and shooting (right)
The passing network for the triadic configuration D5-M6-F2
(Fig. 3a) shows that the passing pattern differs from game to
game. Combining the score and the average player positions
(Fig. 3b), we find that in match 14, Everton is far stronger than
the opponent. The average position of the forward F2 is much
further forward than that of the midfielders, and F2 is in an
offensive state. Accordingly, F2 receives more passes from D5 and
M6. But because F2 is so far forward, opponents can easily take
the ball, so F2 also makes many back passes. In match 18, D5 and
M6 pass forward to F1 more often than to F2, since the position
of the other forward (F1) is more advantageous for attack than
that of F2. D5 passes to M6 because M6 is between D5 and the
forward, which makes M6 a suitable transit point for D5 to get
the ball to the forward. In match 25, M6 is no longer between D5
and the forward, which means the offensive advantage is lost.
Therefore, D5 passes to M6 less frequently than in match 18, and
F2's position is no longer advantageous for attack. The number
of passes among the three players is small.
Fig. 3. (a1-a3) Passing network for the triadic configuration D5-M6-F2 in
matches 14, 18, and 25. Each arrow points from the passer to the receiver, and
its thickness indicates the number of successful passes; (b1-b3) Average
passing positions of the starters in the three matches. Forwards, midfielders,
defenders, and the goalkeeper are drawn in blue, orange, green, and black,
respectively.
IV. OUR APPROACH
A. Performance Evaluation Model
1) Highlight Moment
To analyze the purpose of each pass, we examine the player's
decisions for a period of time after the passing event in the
time-series data, looking for events that are strongly related
to the match result and the match situation, including shots,
accelerations, and smart passes, which we call "highlight
moments." The "preparation period" is a period of time before a
highlight moment, and we consider passes that occur during this
period to be mainly aimed at creating the highlight.
2) Performance Quality Index
Passes made for the purpose of creating a highlight contribute
positively to a team's performance. The performance quality
index based on highlight moments is defined as:
(2)
where A_shot, A_acce, and A_smart are the numbers of shots,
accelerations, and smart passes in each match, respectively, and
C_side is an offset constant that reflects how playing as the
home side or the away side affects a team's performance. Based
on our analysis, the preparation periods for shots,
accelerations, and smart passes are set to 30, 20, and 10
seconds, respectively.
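As a rough sketch of how passes could be attributed to highlight moments with the preparation periods of 30, 20, and 10 seconds, consider the Python fragment below. The event format (a list of `(time_in_seconds, event_type)` tuples), the event type names, and the function name are assumptions made for illustration; the data set in [3] uses its own schema.

```python
# Preparation period in seconds for each kind of highlight moment.
PREPARATION_SECONDS = {"shot": 30, "acceleration": 20, "smart_pass": 10}

def passes_preparing_highlights(events):
    """Return the times of passes that fall inside the preparation
    period of at least one later highlight moment."""
    highlights = [(t, kind) for t, kind in events if kind in PREPARATION_SECONDS]
    prepared = []
    for t, kind in events:
        if kind != "pass":
            continue
        for h_time, h_kind in highlights:
            if 0 < h_time - t <= PREPARATION_SECONDS[h_kind]:
                prepared.append(t)
                break
    return prepared

# Toy event stream of one team (hypothetical, not real match data).
toy_events = [(5, "pass"), (40, "pass"), (55, "pass"), (60, "shot"),
              (100, "pass"), (125, "smart_pass")]
prepared = passes_preparing_highlights(toy_events)
print(prepared)  # [40, 55]
```

Here the passes at 40 s and 55 s fall within 30 s of the shot at 60 s, while the pass at 100 s is more than 10 s before the smart pass and is therefore not counted.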
3) Gaussian Mixture Model
A Gaussian mixture model (GMM) fits multiple Gaussian
components simultaneously and fuses them into one model with
certain weights. In theory, the performance quality Q should
follow a Gaussian distribution. We take the Q values of the two
sides in each match as two dimensions and perform a
two-dimensional GMM analysis based on the
Expectation-Maximization (EM) algorithm, seeking the parameters
that maximize the correlation between the difference of the two
sides' Q and the match result.
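Algorithm: EM applied in GMM
Input: A_shot, A_acce, and A_smart
Output: Performance Quality Index Q
For each candidate set {α, β, γ, C_side} do
    Compute the Performance Quality Index Q of Everton and of the opponent by equation (2) to build a two-dimensional vector (Q_Everton, Q_Opponent) for each match
    Compute the probability that each point x_j is generated by sub-model k, with mixing weight π_k, mean μ_k, and covariance Σ_k:
    $\gamma_{jk} = \dfrac{\pi_k \, \mathcal{N}(x_j \mid \mu_k, \Sigma_k)}{\sum_{l} \pi_l \, \mathcal{N}(x_j \mid \mu_l, \Sigma_l)}$
End for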
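The EM procedure for a Gaussian mixture can be illustrated with a deliberately simplified sketch. It is one-dimensional instead of operating on the two-dimensional (Q_Everton, Q_Opponent) vectors used in the paper, it uses two components, and it runs on made-up toy data; the function names and initial values are hypothetical. The E-step computes the probability that each point was generated by each component, and the M-step re-estimates the weights, means, and variances from those probabilities.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given guesses."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: update weights, means, and variances.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / n_j
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
    return means, variances, weights

# Toy data with two well-separated groups (not real Q values).
toy_xs = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
fitted_means, fitted_vars, fitted_weights = em_gmm_1d(
    toy_xs, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(fitted_means, fitted_weights)
```

A two-dimensional version would replace the variances by covariance matrices but follow the same two alternating steps.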
B. Tactical Rhythm Model
1) Identification of Offensive and Defensive
When a team controls the ball in a match, it can follow one of
two behavioral tactics: offensive or defensive. The duration and
the number of transitions between the team's offense and defense
can be used to evaluate the team's performance in the game [11].
Compared with other ball games, it is difficult for soccer
players to create shooting opportunities, so we cannot identify
the team's tactic simply by whether the ball is moving toward or
away from the opponent's goal. Instead, we use the positions (F,
M, D, or G) of the players passing and receiving the ball to
identify offensive and defensive passes (Fig. 4): passing
between the team's forwards and midfielders forms a deterrent
advantage for offense, while passing among midfielders,
defenders, and the goalkeeper is regarded as temporary passive
defense. A pass between two midfielders is offensive if the ball
is moving forward.
Fig. 4. Offensive and defensive passing
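The classification rule can be written down as a small function. This is a sketch under stated assumptions rather than the authors' code: the function name and arguments are hypothetical, a pass between two midfielders that does not move forward is assumed to be defensive, and role pairs that the text does not cover (for example forward to forward, or defender to forward) are returned as `None` instead of being guessed.

```python
def classify_pass(passer_role, receiver_role, moves_forward):
    """Label a pass 'offensive' or 'defensive' from the roles F, M, D, G.

    Returns None for role pairs that the rule in the text does not cover.
    """
    roles = {passer_role, receiver_role}
    if roles == {"M"}:
        # Between two midfielders the direction of the ball decides
        # (the non-forward case is assumed to be defensive).
        return "offensive" if moves_forward else "defensive"
    if roles == {"F", "M"}:
        return "offensive"
    if roles <= {"M", "D", "G"}:
        return "defensive"
    return None

print(classify_pass("M", "F", False))  # offensive
print(classify_pass("D", "G", True))   # defensive
print(classify_pass("M", "M", True))   # offensive
```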
2) Conversion Rate
Based on the above rules, we can determine the offensive or
defensive tactical mode shown by each pass and then obtain the
number of changes of mode for a team during a single game.
Suppose a team's pass sequence is $\{M_1, M_2, \ldots, M_i, \ldots, M_n\}$
($n$ is the total number of passes of the team in the game),
where
$$M_i = \begin{cases} 1, & \text{the pass is offensive} \\ 0, & \text{the pass is defensive} \end{cases} \tag{3}$$
Then we can obtain the transformed sequence $\{C_1, C_2, \ldots, C_i, \ldots, C_n\}$, where
$$C_i = \begin{cases} 1, & i > 1 \text{ and } M_i \neq M_{i-1} \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
The total number of conversions between the team's offense and
defense in a game is
$$N = \sum_{i=1}^{n} C_i \tag{5}$$
If the total time for which the team controls the ball in this
game is $T$, then the team's offensive and defensive conversion
rate is
$$v = \frac{N}{T} \tag{6}$$
After calculating the offensive and defensive conversion
rates of both teams in a game, the conversion ratio is
$$\eta = \frac{v \text{ of Everton}}{v \text{ of Opponent}} \tag{7}$$
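The conversion count, conversion rate, and conversion ratio described above can be sketched as follows. The pass sequences and possession times are toy values, the labels "O" (offensive) and "D" (defensive) and the function names are hypothetical, and a conversion is assumed to be counted whenever two consecutive passes of the same team differ in mode.

```python
def count_conversions(modes):
    """Number of times consecutive passes switch between offense and defense."""
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)

def conversion_rate(modes, possession_time):
    """Conversions per unit of ball-possession time."""
    return count_conversions(modes) / possession_time

def conversion_ratio(team_modes, team_time, opp_modes, opp_time):
    """Conversion rate of the team divided by that of the opponent."""
    return conversion_rate(team_modes, team_time) / conversion_rate(opp_modes, opp_time)

# Toy pass sequences (not real match data), with equal possession time.
everton = ["D", "D", "O", "O", "D", "O"]    # 3 conversions
opponent = ["D", "O", "O", "O", "D", "D"]   # 2 conversions
ratio = conversion_ratio(everton, 30.0, opponent, 30.0)
print(ratio)
```

A ratio above 1, as in this toy case, means the team switches between offense and defense more often per unit of possession time than its opponent.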
V. EXPERIMENTS AND RESULTS
A. Results of Performance Evaluation Model
Searching the parameters iteratively with a step of 0.25, we
finally obtain α=1, β=2, γ=2, and C_side=16. The GMM results are
shown in Fig. 5.
Fig. 5. Using two evaluation methods to calculate the scores of Everton and
the opponent in 38 matches. (a) is the rating that only considers passing. (b) is
the rating based on highlight moments.
In the 28 non-tie matches, with the rating method that uses
only the passing events (Fig. 5a), the winner's score is greater
than the loser's in 18 games (64% accuracy). With the
highlight-based rating (Fig. 5b), the winner's Q is greater than
the loser's Q in 23 games (82% accuracy). The Q of Everton
does not change much, whereas the opponent's Q changes
greatly. This is because the opponent is not a single team but 19
teams of different strengths, which in turn suggests that our Q
value evaluates a team's performance accurately.
B. Results of Tactical Rhythm Model
A conversion ratio η greater than 1 means that the team makes
more offensive and defensive conversions per unit time on
average than the opposing team, that is, it has a more flexible
tactical rhythm. Across the 38 games, we analyze the relation
between game results and the conversion ratio η (Table II).
TABLE II. RELATION BETWEEN GAME RESULTS WITH CONVERSION RATE
| Result | Matches | Average η | Matches with η>1 | Rate |
|---|---|---|---|---|
| win | 13 | 1.509 | 12 | 92% |
| tie or loss | 25 | 1.010 | 11 | 44% |
This means that one of the most important conditions for
winning is to be more flexible than the opposing team in
converting between offense and defense. But this is clearly not
a sufficient condition. In 23 games, Everton's offensive and
defensive conversion rate is higher than the opposing team's,
which is probably a result of Everton's tactical style.
A more accurate performance evaluation is obtained by
combining the conversion ratio η with the difference of the team
performance quality index ∆Q (Table III). Everton has η>1 and
∆Q≥0 in 11 of its 13 wins, but in only three of its 15 losses.
While maintaining its style, the team has a high chance of
winning only when it also keeps control of the ball and of the
match situation, that is, when its performance quality Q is
higher than the opponent's.
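TABLE III. RELATION BETWEEN GAME RESULTS WITH PERFORMANCE QUALITY Q WHEN CONVERSION RATIO >1
| Result | MatchID | η | Everton Q | Opponent Q | ∆Q |
|---|---|---|---|---|---|
| win | 1 | 1.270 | 4.59 | 2.64 | 1.95 |
| win | 6 | 1.341 | 4.80 | 4.12 | 0.68 |
| win | 14 | 1.904 | 4.47 | 3.95 | 0.52 |
| win | 15 | 1.500 | 3.79 | 2.57 | 1.22 |
| win | 17 | 1.098 | 3.72 | 4.95 | -1.23 |
| win | 18 | 1.235 | 4.14 | 3.56 | 0.58 |
| win | 25 | 1.769 | 3.26 | 3.09 | 0.17 |
| win | 27 | 1.664 | 3.84 | 3.67 | 0.17 |
| win | 30 | 2.319 | 4.35 | 2.26 | 2.09 |
| win | 31 | 1.715 | 4.55 | 3.24 | 1.31 |
| win | 35 | 1.391 | 4.61 | 3.49 | 1.12 |
| win | 36 | 1.570 | 3.81 | 3.81 | 0.00 |
| tie | 19 | 1.021 | 3.09 | 6.86 | -3.77 |
| tie | 24 | 1.276 | 4.46 | 2.43 | 2.03 |
| tie | 34 | 1.570 | 4.55 | 3.91 | 0.64 |
| loss | 4 | 1.901 | 4.94 | 5.43 | -0.49 |
| loss | 7 | 1.323 | 6.38 | 2.20 | 4.18 |
| loss | 10 | 1.567 | 5.06 | 2.87 | 2.19 |
| loss | 13 | 1.432 | 3.25 | 7.21 | -3.96 |
| loss | 21 | 1.427 | 4.46 | 3.18 | 1.28 |
| loss | 28 | 1.225 | 2.91 | 3.40 | -0.49 |
| loss | 29 | 1.314 | 3.06 | 3.99 | -0.93 |
| loss | 38 | 1.063 | 4.09 | 5.25 | -1.16 |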
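The counts quoted in the text can be checked directly against the values listed in Table III. The short script below is only a consistency check: the tuples copy the match ID, conversion ratio, and Q difference of each listed match, and the totals of 13 wins and 25 ties or losses are taken from Table II.

```python
# (result, match_id, eta, delta_q) for every match listed in Table III;
# all of these matches have a conversion ratio eta > 1.
rows = [
    ("win", 1, 1.270, 1.95), ("win", 6, 1.341, 0.68), ("win", 14, 1.904, 0.52),
    ("win", 15, 1.500, 1.22), ("win", 17, 1.098, -1.23), ("win", 18, 1.235, 0.58),
    ("win", 25, 1.769, 0.17), ("win", 27, 1.664, 0.17), ("win", 30, 2.319, 2.09),
    ("win", 31, 1.715, 1.31), ("win", 35, 1.391, 1.12), ("win", 36, 1.570, 0.00),
    ("tie", 19, 1.021, -3.77), ("tie", 24, 1.276, 2.03), ("tie", 34, 1.570, 0.64),
    ("loss", 4, 1.901, -0.49), ("loss", 7, 1.323, 4.18), ("loss", 10, 1.567, 2.19),
    ("loss", 13, 1.432, -3.96), ("loss", 21, 1.427, 1.28), ("loss", 28, 1.225, -0.49),
    ("loss", 29, 1.314, -0.93), ("loss", 38, 1.063, -1.16),
]

def count_nonnegative(result):
    """Return (matches with delta_q >= 0, matches listed) for one result."""
    listed = [r for r in rows if r[0] == result]
    return sum(1 for r in listed if r[3] >= 0), len(listed)

wins_ok, wins_listed = count_nonnegative("win")        # 11 of 12
losses_ok, losses_listed = count_nonnegative("loss")   # 3 of 8
pct_wins = round(100 * wins_listed / 13)                 # 92
pct_other = round(100 * (len(rows) - wins_listed) / 25)  # 44
print(wins_ok, wins_listed, losses_ok, losses_listed, pct_wins, pct_other)
```

The results agree with the 11 wins and three losses with η>1 and ∆Q≥0 mentioned in the text, and with the 92% and 44% rates in Table II.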
C. Sensitivity Analysis
Although the 38 games of Everton form a large data set, they
may not be enough to determine the fixed parameters when fitting
the model. In the two-team performance model, using the EM-based
Gaussian mixture model to fix the weight parameters α, β, γ, and
C_side in Q involves a heuristic algorithm whose objective
function is itself the result of an iterative algorithm. This
not only increases the time the heuristic algorithm needs to
find the maximum, but also makes it difficult to judge the
stability of the model after the weights are fixed.
To check that the model calculated by the heuristic algorithm
does not have a poor usable range, we fix the weight parameters
α, β, γ, and C_side obtained by the heuristic algorithm. We then
multiply each of the five inputs A_ij, A_shot, A_acce, A_smart,
and C_side by Gaussian noise with μ=1 and σ=0.05, 0.1, 0.15,
0.2, and 0.25 in turn, and test whether the objective function
remains close to ideal. We still want the Gaussian mixture model
to distinguish the Gaussian distributions of the three results
(win, loss, and tie) as well as possible. For this reason, we
take as the measure the sum of the squared Euclidean distances
between the means of the three Gaussian components fitted by the
mixture model. The results are shown in Fig. 6. When A_ij,
A_shot, A_acce, A_smart, and C_side are treated as input data
with white noise of standard deviation no more than 0.25, the
sum of the squared Euclidean distances changes by no more than
9%, that is, our model maintains good stability.
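The two ingredients of this test, multiplicative Gaussian noise on the inputs and the sum of squared distances between component means, can be sketched as below. The input values, the component means, and the function names are hypothetical toy choices; the sketch does not refit the mixture model, which in the actual test would be done for every noise level before the distances are compared.

```python
import random
from itertools import combinations

def perturb(values, sigma, rng):
    """Multiply each input by Gaussian noise with mean 1 and std sigma."""
    return [v * rng.gauss(1.0, sigma) for v in values]

def sum_squared_distances(means):
    """Sum of squared Euclidean distances between all pairs of component means."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(means, 2)
    )

rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_inputs = [10.0, 4.0, 3.0, 2.0, 16.0]  # made-up stand-ins for the five inputs
noisy = {s: perturb(toy_inputs, s, rng) for s in (0.05, 0.1, 0.15, 0.2, 0.25)}

# Hypothetical means of three fitted components (win, tie, loss).
toy_means = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
spread = sum_squared_distances(toy_means)
print(spread)  # 150.0
```

Comparing `spread` before and after perturbation, relative to the unperturbed value, gives the percentage change that the text reports as staying within 9%.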
In fact, although the model is trained on 38 games, these
games do not necessarily cover all typical cases of soccer
matches. Therefore, with a more complete data set, the heuristic
algorithm might find a more reasonable set of weight
parameters.
Fig. 6. Sensitivity analysis of the performance quality model
VI. CONCLUSION
In this paper, we establish a ball-passing network model, a
team performance quality model based on highlight moments, and
an offensive and defensive tactical rhythm conversion model,
which use spatiotemporal tracking data to evaluate team
performance at both the temporal and spatial levels. The results
show that our models fit well and are stable; nonetheless,
applying the models to larger-scale spatiotemporal data is a
necessary next step to verify their generality.
REFERENCES
[1] I.M. Franks, and G. Miller. "Eyewitness testimony in sport." Journal of
sport behavior 9.1 (1986): 38.
[2] O.F. Camerino, J. Chaverri, M.T. Anguera, and G.K. Jonsson. "Dynamics
of the game in soccer: Detection of T-patterns." European Journal of Sport
Science 12.3 (2012): 216-224.
[3] L. Pappalardo, et al. "A public data set of spatio-temporal match events in
soccer competitions." Scientific data 6.1 (2019): 1-15.
[4] P. Cintia, F. Giannotti, L. Pappalardo, D. Pedreschi, and M. Malvaldi.
"The harsh rule of the goals: Data-driven performance indicators for
football teams." 2015 IEEE International Conference on Data Science and
Advanced Analytics (DSAA). IEEE, 2015.
[5] Y. Yue, P. Lucey, P. Carr, A. Bialkowski, and I. Matthews. "Learning
fine-grained spatial models for dynamic sports play prediction." 2014
IEEE international conference on data mining. IEEE, 2014.
[6] Q. Wang, H. Zhu, W. Hu, Z. Shen, and Y. Yao. "Discerning tactical
patterns for professional soccer teams: an enhanced topic model with
applications." Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. 2015.
[7] X. Wei, L. Sha, P. Lucey, S. Morgan, and S. Sridharan. "Large-scale
analysis of formations in soccer." 2013 international conference on digital
image computing: techniques and applications (DICTA). IEEE, 2013.
[8] X. Wei, P. Lucey, S. Vidas, S. Morgan, and S. Sridharan. "Forecasting
events using an augmented hidden conditional random field." Asian
Conference on Computer Vision. Springer, Cham, 2014.
[9] J.L. Pena, and H. Touchette. "A network theory analysis of football
strategies." arXiv preprint arXiv:1206.6904 (2012).
[10] R. Lambiotte, J.C. Delvenne, and M. Barahona. "Laplacian dynamics and
multiscale modular structure in networks." arXiv preprint
arXiv:0812.1770 (2008).
[11] Q. Wang, H. Zhu, W. Hu, Z. Shen, and Y. Yao. "Discerning tactical
patterns for professional soccer teams: an enhanced topic model with
applications." Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. 2015.