Identifying Team Style in Soccer using Formations
Learned from Spatiotemporal Tracking Data
Alina Bialkowski1,2, Patrick Lucey1, Peter Carr1, Yisong Yue1,3, Sridha Sridharan2 and Iain Matthews1
1Disney Research, Pittsburgh, USA, 2Queensland University of Technology, Australia, 3California Institute of Technology, USA
Email: a.bialkowski@connect.qut.edu.au, {patrick.lucey, peter.carr, iainm}@disneyresearch.com
yyue@caltech.edu, s.sridharan@qut.edu.au
Abstract: To the trained eye, a team can often be identified
by its unique style of play, which shows in its movement,
passing and interactions. In this paper, we present a method
which can accurately determine the identity of a team from
spatiotemporal player tracking data. We do this by utilizing a
formation descriptor which is found by minimizing the entropy
of role-specific occupancy maps. We show that our approach is
significantly better at identifying different teams than
standard measures (e.g., shots, passes). We demonstrate the
utility of our approach using an entire season of Prozone player
tracking data from a top-tier professional soccer league.
I. INTRODUCTION
The question we ask in this paper is: given all the player
and ball tracking data of a team in a season, what team-
based features can adequately discriminate a team’s behavior?
In practice a human expert is able to do this, but it is very
labor intensive and is inherently subjective. Having a method
which can quantify these behaviors should be possible with the
prevalence of spatiotemporal tracking data of player and ball
movement being captured in most professional sports (e.g., [1],
[2]). However, this task is challenging due to the complexities
in dealing with adversarial multi-agent trajectory data. A major
issue centers on the alignment of individual player trajectories
within a team setting which is a source of noise. In this paper,
we align the data based on a role-based method which is
learnt directly from data [3] to provide a formation descriptor.
We show that using this approach, semantically meaningful
team-based strategic features can be obtained which are highly
predictive of their identity. We compare this descriptor to other
features including match statistics (e.g., shots, passes, fouls)
and ball movement, and show that the formation descriptor
is far superior in discriminating unique team characteris-
tics (Fig. 1).
A. Related Work
With the recent deployment of player tracking systems
in professional sports, there has been an influx of research
on how to use such data sources. Most of the
work has centered on individual player analysis. In basketball,
Goldsberry [4] used player tracking data to rank the best shooters
in the NBA according to their shot location. Maheswaran
et al. [5], [6] used the tracking data to analyze the best method
to obtain a rebound. Similarly, Wiens et al. [7] looked at how
teams should crash the backboard to get rebounds. Recently,
Lucey et al. [8] used tracking data to discover how teams
achieved open three-point shots. Bocskocsky et al. [9] re-investigated
the hot-hand theory. Miller et al. [10] analyzed
the shot selection process of players using non-negative matrix
factorization. Cervone et al. [11] used basketball tracking
data to predict points and decisions made during a play.
Carr et al. [12] used real-time player detection data to predict
the future location of play and point a robotic camera at
that location for automatic sport broadcasting purposes. In
tennis, Wei et al. [13], [14] used Hawk-Eye data to predict
the type and location of the next shot. Ganeshapillai and
Guttag [15] used SVMs to predict pitching in baseball while
Sinha et al. [16] used Twitter feeds to predict NFL outcomes.
[Figure 1: an example match descriptor made up of match statistics (shots, fouls, corner kicks, offsides, time of possession, yellow and red cards, saves), a ten-role formation, and a ball occupancy map, used to predict a team ID from A to T.]
Fig. 1. In this paper, based solely on match statistics, location of ball possession (i.e., ball occupancy), and a formation descriptor, we can predict the identity of soccer teams with high accuracy. We show the formation descriptor is the best discriminator of team style.
In terms of analyzing a team’s style of play, most work
has centered on soccer. Lucey et al. [17] used entropy maps
to characterize a team’s ball movement patterns using data
from Opta [18]. This was followed by [19], which showed
that a team’s home and away style varied, highlighting that
home teams had more possession in the forward third as
well as shots and goals. Bialkowski et al. [20] examined the
rigidity of a team’s formation across a season and showed
that home teams tended to play higher up the pitch both
in offense and defense. Outside of the sporting realm, there
has been plenty of work focusing on identifying style. In the
seminal work on separating style from content, Tenenbaum
and Freeman [21] used a bilinear model to decouple the
raw content for improved recognition on a host of different
tasks. More recently, Doersch et al. [22] used discriminative
clustering to discover the attributes that distinguished images
of one city from another. They followed this work by exploring
the visual style of objects (e.g., cars and houses) and how they
vary over time [23]. The contribution of this paper is using a
formation descriptor to identify the unique style of a team.
Fig. 2. (a) Given the trajectory of each player during an entire half, we see that players continually swap positions. (b) Shown are the covariances of player positions, which again highlight the overlap. (c) Using our iterative approach (which is very similar to k-means with the constraint that at every frame each detection requires a unique role), a role label is assigned to each player at the frame level, allowing us to see the underlying structure of the team.
Statistic      Frequency
Teams          20
Games          375
Data Points    3.89M
Ball Events    721K
TABLE I. INVENTORY OF DATASET USED FOR THIS WORK.
II. DATA: PLAYER TRACKING IN SOCCER
For this work, we utilized an entire season of player
tracking data from Prozone. The data consists of 20 teams
who played home and away, totaling 38 games for each team
or 380 games overall. Five of these games were omitted due to
erroneous data files. We refer to the 20 teams using arbitrary
labels {A, B, ..., T}. Each game consists of two halves,
with each half containing the (x, y) position of every player
at 10 frames-per-second. This results in over 1 million data
points per game, in addition to the 43 possible annotated ball
events (e.g., passes, shots, crosses, tackles etc.). Each of these
ball events contained the time-stamp as well as location and
players involved. An inventory of the data is given in Table I.
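As a rough sanity check on these figures, the arithmetic can be sketched as follows. The game counts come directly from the text; the per-game estimate assumes 22 tracked players and 45-minute halves without stoppage time, neither of which is stated above.

```python
# Season size: 20 teams, 38 games each, two teams per game.
teams = 20
games_per_team = 38
games_scheduled = teams * games_per_team // 2   # 380 games overall
games_used = games_scheduled - 5                # 375 after omitting erroneous files

# Tracking volume per game (assumptions, not stated in the text):
frames_per_second = 10
players_on_pitch = 22        # assumption: both teams at full strength
half_seconds = 45 * 60       # assumption: ignores stoppage time
positions_per_game = 2 * half_seconds * frames_per_second * players_on_pitch

print(games_scheduled, games_used, positions_per_game)
# prints: 380 375 1188000
```

Under these assumptions a game yields roughly 1.19 million (x, y) positions, in line with the "over 1 million" quoted above.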
III. DISCOVERING FORMATIONS FROM DATA
In sports, there exists a well established vocabulary for
describing the responsibility each player has within a team.
Even though it varies from sport to sport, within each sport
these descriptions generalize. The language used is in terms of
formations, which is effectively a strategic concept (i.e., dif-
ferent teams can use the same formation simultaneously).
As a result, we refer to a formation’s generic players using
a set of identity agnostic labels which we denote roles. A
formation is generally shift-invariant and allows for non-rigid
deformations. Therefore, we define each role by its position
relative to the other roles (i.e., in soccer a left-midfielder
plays in-front of the left-back and to the left of the center-
midfielder). Each role within a formation is unique (i.e., no
two players within the same formation can have the same role
at the same time), and players can swap roles throughout the
match. Additionally, multiple formations may exist which can
be interpreted as different sets of roles. A role represents any
arbitrary 2D probability density function. Therefore, we can
represent it non-parametrically by quantizing the field into a
discrete number of cells, or parametrically using a mixture
of 2D Gaussians. We can then represent the formation by
concatenating the features of each role into a single vector.
Pass Foul - Cross Catch
Direct FK Drop Save
Pass Foul - Cross Catch
Assist Indirect FK Assist Save
Corners Foul - Reception Punch
Penalty
Shot on Foul - Reception Punch
Target Throw-in Assist Save
Shot off Offside Reception Diving
Target Save
Goal Yellow Catch Diving
Card Save
Own Red Catch Drop of
Goal Card Drop Ball
Neutral Running Chance Substitution
Clear Save with Ball
Block Drop Pass Hold of
Kick Save Ball
Clearance Neutral Player Clearance
Uncontrolled Clearance Out
TABLE II. LIST OF MATCH STATISTICS USED TO DESCRIBE TEAM BEHAVIOR.
Role is a dynamic label, meaning that a player can fulfill many
roles during the game (e.g., a player may switch between left-winger
and center-midfielder). However, each role must be
assigned to exactly one player in every frame, so two players
cannot be in the same role at the same time.
As a formation basically assigns an area or space to
each player at every frame, this problem can be framed
as a minimum entropy data partitioning problem [24], [25].
Bialkowski et al. [3] show the full derivation, but in practice
it is similar to k-means clustering, except that instead
of assigning each data point to its closest cluster, we solve a
linear assignment problem between identities and roles using
the Hungarian algorithm [26] at each frame. The process is
shown in Fig. 2. Using this procedure, the resulting formation
of each team in every half we analyzed is shown in Fig. 3. In
the next section, we compare the formation descriptor to other
match factors.
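The k-means-like procedure with a per-frame assignment step can be sketched as below. This is a simplified illustration, not the derivation in [3]: it uses squared Euclidean distance to a role centroid as the assignment cost rather than the minimum-entropy formulation, and it initializes each role from a player identity. The function names `hungarian` and `assign_roles` are hypothetical. In practice, a library routine such as `scipy.optimize.linear_sum_assignment` solves the same assignment problem.

```python
def hungarian(cost):
    """Solve a square linear assignment problem; returns assignment[row] = col."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j] = row matched to column j (1-indexed, 0 = none)
    way = [0] * (n + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def assign_roles(frames, n_iters=10):
    """frames[f][i] = (x, y) of player i in frame f; returns labels[f][i] = role."""
    n = len(frames[0])
    labels = [list(range(n)) for _ in frames]   # assumption: start with role = identity
    for _ in range(n_iters):
        # Update step: each role centroid is the mean position labeled with it.
        sums = [[0.0, 0.0] for _ in range(n)]
        for frame, frame_labels in zip(frames, labels):
            for (x, y), role in zip(frame, frame_labels):
                sums[role][0] += x
                sums[role][1] += y
        cents = [(sx / len(frames), sy / len(frames)) for sx, sy in sums]
        # Assignment step: one unique role per player in every frame.
        new_labels = [
            hungarian([[(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in cents]
                       for x, y in frame])
            for frame in frames
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Two players who swap sides after three frames.
frames = [[(0.0, 0.0), (10.0, 0.0)]] * 3 + [[(10.0, 0.0), (0.0, 0.0)]] * 2
roles = assign_roles(frames)
```

In this toy example the two players exchange role labels after the swap, so each role stays tied to one side of the field even though the identities move.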
IV. PREDICTING TEAM IDENTITY
To determine if teams had a distinct playing style, we
conducted a series of team identity experiments. The challenge
was, given only player tracking data and ball events, can
we predict the identity of each team? To do this, we need
descriptors of team behaviors during a match. For this paper,
we generated three types of match descriptors: 1) match
statistics, 2) ball occupancy, and 3) team formation.
[Figure 3: one panel per team, labeled A to T.]
Fig. 3. Example of our formation descriptors for each team. The colors represent different roles. For visualization purposes we have just plotted the centroid for each role for each match.
A. Match Descriptors
Match Statistics: During a match, various statistics that
capture team and individual behavior are annotated. Table II
shows the list of statistics which we used in this paper. While
the number of these match statistics is quite large, the majority
of them are quite sparse with only a couple of these events
labeled per match. In reporting of a match, only a half-dozen of
the most important match statistics are normally documented
(i.e., goals, shots on target, shots off target, passes, corners,
yellow and red-cards).
Ball Occupancy: Associated with the match statistics/events
are the time and location of each occurrence.
To form a representation of this information, we adopted
the approach used in [17], [19], which involves estimating
the continuous ball trajectory at each time-stamp by linearly
interpolating between events, as well as which team had
possession (ignoring stoppages). We then broke the field into
a 10 × 8 spatial grid and calculated the ball occupancy of
each of these grid cells for each team (i.e., how often the team
was in possession of the ball in this location over the match).
All teams were normalized to attack from left to right. A
visualization of a resulting ball occupancy example is shown
in Fig. 4.
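A minimal sketch of this ball occupancy computation is given below. Several details are assumptions for illustration only: a 105 m by 68 m field with coordinates measured from one corner, possession credited to the team that made the earlier of two consecutive events, a 0.1 s sampling step, and each team's map normalized to sum to one. Stoppage handling and the left-to-right normalization are omitted, and `ball_occupancy` is a hypothetical name.

```python
def cell_index(x, y, field, grid):
    """Map a field position to (column, row) indices of the spatial grid."""
    cx = min(max(int(x / field[0] * grid[0]), 0), grid[0] - 1)
    cy = min(max(int(y / field[1] * grid[1]), 0), grid[1] - 1)
    return cx, cy


def ball_occupancy(events, field=(105.0, 68.0), grid=(10, 8), dt=0.1):
    """events: time-ordered (t, x, y, team) tuples.

    Returns {team: grid[0] x grid[1] nested list of occupancy fractions}.
    """
    counts = {}
    for (t0, x0, y0, team), (t1, x1, y1, _) in zip(events, events[1:]):
        steps = max(int(round((t1 - t0) / dt)), 1)
        team_map = counts.setdefault(
            team, [[0] * grid[1] for _ in range(grid[0])])
        for s in range(steps):
            a = s / steps
            # Linear interpolation of the ball position between two events.
            cx, cy = cell_index(x0 + a * (x1 - x0), y0 + a * (y1 - y0),
                                field, grid)
            team_map[cx][cy] += 1
    occupancy = {}
    for team, team_map in counts.items():
        total = sum(map(sum, team_map))
        occupancy[team] = [[c / total for c in col] for col in team_map]
    return occupancy


events = [
    (0.0, 5.0, 4.0, "A"),
    (1.0, 5.0, 4.0, "A"),
    (2.0, 100.0, 60.0, "B"),
    (3.0, 100.0, 60.0, "B"),
]
occ = ball_occupancy(events)
```

Flattening each team's 10 × 8 map gives an 80-dimensional ball occupancy descriptor for the match.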
Fig. 4. Example ball occupancy map over a match half for a team attacking left to right. This example shows dominance of ball possession on the left side of the field which may be indicative of the team’s playing style.
Formation Descriptor: For each match half, we found
the formation descriptor F by using the method described in
Section III. This gave an M × N matrix, where M refers to the
number of cells in the field and N is the number of roles (set
to 10, as we omitted the goal-keeper as well as games which
had a player sent off). A depiction of the formation descriptors
for each team for all matches is shown in Fig. 3. For clarity of
presentation, we have only plotted the centroid of each role for
each match, with each team attacking from left to right. Each
different color marker corresponds to a different role for that
team. It can be seen from the plot that teams are rather rigid
in the way they play across a season, which suggests that this
is a useful feature in discriminating between different teams.
Another interesting point is that, as teams vary little in terms of
playing style throughout the season, this could be used as a
powerful prior for preparing against an opposition in upcoming
matches.
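To make the M × N descriptor concrete, the sketch below builds one occupancy map per role from role-labeled positions and measures the total entropy of those maps. The grid size (10 × 8, borrowed from the ball occupancy grid), the field dimensions and the function names are assumptions for illustration; the text does not state the value of M used for formations.

```python
import math


def occupancy_descriptor(frames, labels, n_roles,
                         field=(105.0, 68.0), grid=(10, 8)):
    """Return F as M rows by N columns: F[m][r] = fraction of frames role r is in cell m."""
    n_cells = grid[0] * grid[1]
    counts = [[0] * n_roles for _ in range(n_cells)]
    for frame, frame_labels in zip(frames, labels):
        for (x, y), role in zip(frame, frame_labels):
            cx = min(max(int(x / field[0] * grid[0]), 0), grid[0] - 1)
            cy = min(max(int(y / field[1] * grid[1]), 0), grid[1] - 1)
            counts[cx * grid[1] + cy][role] += 1
    return [[c / len(frames) for c in row] for row in counts]


def total_entropy(F):
    """Sum of the Shannon entropies of the per-role occupancy maps (columns of F)."""
    return -sum(p * math.log(p) for row in F for p in row if p > 0.0)


# Two players who swap sides after three of five frames.
frames = [[(10.0, 10.0), (90.0, 50.0)]] * 3 + [[(90.0, 50.0), (10.0, 10.0)]] * 2
identity_labels = [[0, 1]] * 5                 # label = player identity
role_labels = [[0, 1]] * 3 + [[1, 0]] * 2      # label = role after the swap

F_identity = occupancy_descriptor(frames, identity_labels, n_roles=2)
F_role = occupancy_descriptor(frames, role_labels, n_roles=2)
descriptor = [value for row in F_role for value in row]   # flattened vector
```

In this toy case the role-based maps each occupy a single cell and have zero entropy, while the identity-based maps are spread over two cells, which illustrates why minimizing entropy favors role labels over fixed identities.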
[Figure 5: four 20 × 20 confusion matrices (20-NN classification after LDA), with the actual team (A to T) on the vertical axis and the predicted team (A to T) on the horizontal axis; cell values are percentages on a 0 to 100 scale. Panel titles give the correct classification rate (CCR): (a) 17.13%, (b) 19.51%, (c) 67.32%, (d) 70.38%.]
Fig. 5. Team identity results for the various descriptors presented as confusion matrices, showing the percentage of agreement, with the actual team on the vertical axis and the predicted team on the horizontal (normalized as percentages): (a) match statistics (17.13%), (b) ball occupancy (19.51%), (c) formation descriptor (67.32%) and (d) fused all descriptors (70.38%).
[Figure 6: block diagram. Test path: Get Match Descriptor → Scale Data → LDA → Predict Team Identity. Training path: Learn LDA Transform.]
Fig. 6. Given a match descriptor, we first scale the data and then multiply it by W^T, which is found using LDA, to yield a discriminative feature vector. The LDA matrix is learnt using the team identity labels and their match descriptors in the training set. Team identity is predicted using k-NN.
B. Experiments
The team identity experiments were performed using
a “leave-one-match-out” cross-validation strategy where one
match was left out to test against, and the remaining matches
were used as the training set. A block diagram in Fig. 6 describes
the process. Firstly, we generated the three descriptors
described above and scaled the features. To obtain a compact
but discriminative representation, we performed linear discriminant
analysis (LDA) by learning the transformation matrix
W from the training set, using the team identity as the
class labels (i.e., C = 20). We learnt a W for each descriptor
and then multiplied the features by W^T to yield a lower
dimensional discriminant feature vector of dimensionality
C - 1. To predict the identity label of the teams in the test
match, we used a k-nearest-neighbor classifier (k = 20) using
the Euclidean norm as the distance metric.
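The evaluation loop described above can be sketched as follows. This illustration covers only the scaling, k-NN and leave-one-match-out steps; the LDA projection is deliberately left out, so the features are used as if they were already in the discriminant space. Z-score scaling is an assumption (the text only says the features were scaled), and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def fit_scaler(X):
    """Per-feature mean and standard deviation of the training rows."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(zip((math.dist(a, x) for a in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_match_out(samples, k):
    """samples: list of (match_id, team, features); returns overall accuracy."""
    correct = 0
    for match_id in sorted({m for m, _, _ in samples}):
        train = [s for s in samples if s[0] != match_id]
        test = [s for s in samples if s[0] == match_id]
        # Scaling statistics come from the training matches only.
        means, stds = fit_scaler([f for _, _, f in train])
        train_X = [scale(f, means, stds) for _, _, f in train]
        train_y = [team for _, team, _ in train]
        for _, team, f in test:
            pred = knn_predict(train_X, train_y, scale(f, means, stds), k)
            correct += pred == team
    return correct / len(samples)


# Toy data: two teams with well separated 2D descriptors over six matches.
samples = []
for m in range(6):
    samples.append((m, "A", [0.1 * m, -0.1 * m]))
    samples.append((m, "B", [10.0 - 0.1 * m, 10.0 + 0.1 * m]))
accuracy = leave_one_match_out(samples, k=3)
```

The toy example uses k = 3 because it has only five training samples per team; the experiments in the text use k = 20 on the C - 1 dimensional LDA features.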
The results for the various descriptors are shown in Fig. 5.
In the first experiment, (Fig. 5(a)) we can see that using
only match statistics is a poor indication of team identity
with an overall accuracy of 17% (chance is 5%). This result
makes sense as the match statistics only contain coarse event
information without any spatial or temporal information about
the ball or the players. Using the ball occupancy only gave
marginally improved performance over the match statistics
with an accuracy of 19% (Fig. 5(b)). This is well below the
33% which was obtained in the previous works [17], [19]. A
possible explanation of the performance difference could be
due to the coarse estimation of the possession strings and the
ball occupancy maps from the event data.
The most impressive performance by far is the formation
descriptor, which obtains over 67% accuracy (Fig. 5(c)). This
shows that teams have a true underlying signal which can be
encapsulated in the way the team moves in formation over
time. We also fused these descriptors by
concatenating all the scaled features and performing LDA on
the combined features. This approach improved the overall
performance to over 70%, which shows there is complementary
information within the other descriptors. A bar graph comparing
the overall performance for each descriptor is given in
Fig. 7.
[Figure 7: bar graph of team ID prediction accuracy (%): match statistics 17.13%, ball occupancy 19.51%, formation 67.32%, combined 70.38%.]
Fig. 7. Comparison of team identity prediction accuracy for the three different match descriptors, as well as the combined performance.
V. ANALYZING TEAM BEHAVIORS
In this section we explore how we can learn and represent
the characteristic style of teams, and use this for analyzing
team behaviors in prediction and anomaly detection tasks.
A. Team Style
Team style is a very subjective and high-level attribute to
label, especially in continuous sports like soccer. This is in part
due to the dynamic and low-scoring nature of such sports, as
it is hard to segment the game into discrete parts and assign a
label when style encompasses all aspects of play. Due to the
global nature of style, one way to quantify a team’s style is
via a linear combination of prior behavior styles.
Given a training set of team behavior descriptors, we can
discover a discrete set of styles using k-means clustering.
For evaluation, we exclude the last two rounds of the season
for testing, and use the remaining games to train the style
models. We first project the match features into a lower
dimensional, discriminative space using LDA, as in the team
identity experiments (Fig. 6), and then cluster similar examples
A
B
T
C
D
E
F
G
H
I
J
K
L
N
M
P
O
Q
S
R
Actual Team
0
30
0
30
5
0
0
0
4
4
0
1
15
0
0
5
1
19
0
0
27
1
31
2
5
32
29
31
1
0
0
2
9
0
0
26
20
4
0
2
1
1
1
0
21
0
0
2
24
24
29
0
4
2
32
0
11
6
1
1
0
0
0
0
0
0
0
0
0
7
0
28
1
29
0
1
0
0
31
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
28
1
2
3
4
5
ARSEN
ASTON
CHELS
EVERT
FULAM
LIVER
MANCI
MANUD
NEWCA
NORWI
QUEEN
READI
SHAMP
STOKE
SUNDE
SWANS
TOTTE
WBROM
WEHAM
WIGAN 0
5
10
15
20
25
30
1 2 3 4 5
0
31
0
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
2
31
4
0
1
3
1
1
0
1
21
1
0
27
0
0
0
0
0
0
9
0
0
32
7
0
0
0
0
0
0
0
0
2
0
0
0
0
26
0
17
0
4
0
5
29
1
0
1
1
4
0
0
2
2
0
0
1
0
0
0
0
15
0
0
0
15
34
20
0
1
28
29
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
29
0
2
3
1
0
0
0
0
2
0
3
1
7
0
16
1
6
0
6
0
3
0
0
0
31
0
1
1
0
0
1
0
0
0
0
0
5
0
1
0
0
0
0
0
0
29
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
31
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
28
1
2
3
4
5
6
7
8
9
10
ARSEN
ASTON
CHELS
EVERT
FULAM
LIVER
MANCI
MANUD
NEWCA
NORWI
QUEEN
READI
SHAMP
STOKE
SUNDE
SWANS
TOTTE
WBROM
WEHAM
WIGAN 0
5
10
15
20
25
30
1 2 3 4 5 6 7 8 9 10
0
29
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
32
0
0
0
0
5
0
0
0
2
0
0
0
0
0
0
0
0
0
0
0
15
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
15
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
29
0
0
0
0
1
0
0
0
0
0
0
1
0
0
2
0
0
0
0
0
32
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
29
0
0
0
0
0
0
0
0
0
0
0
0
0
25
0
0
0
0
0
0
26
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
0
27
0
4
0
1
1
0
0
1
0
1
1
0
1
0
0
2
0
0
0
0
33
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
24
0
0
0
1
0
0
0
0
0
2
0
0
2
0
0
0
1
0
0
0
0
28
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
30
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
31
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
32
0
0
0
0
1
0
0
0
0
0
0
1
1
0
1
0
0
0
0
0
29
0
1
1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
29
0
0
0
0
0
0
0
0
0
0
0
0
0
28
0
0
0
0
0
0
30
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
15
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
12
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
ARSEN
ASTON
CHELS
EVERT
FULAM
LIVER
MANCI
MANUD
NEWCA
NORWI
QUEEN
READI
SHAMP
STOKE
SUNDE
SWANS
TOTTE
WBROM
WEHAM
WIGAN 0
5
10
15
20
25
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1920
Style Cluster Style Cluster Style Cluster
(a) (b) (c)
Fig. 8. Results for clustering descriptors of each match half when we set the number of style clusters to: (a) 5, (b) 10, and (c) 20. These can be used as a
style prior for predicting the results of future matches.
Fig. 10. Prediction of formation using k-NN regression. (a) all training examples, (b) retrieved examples according to style prior, (c) the predicted formation
(= mean(retrieved examples)), (d) the actual formation.
[Figure 9: grid of Actual Team / Team ID (rows, A to T) against the team's style ordered by date (columns 1 to 41); color indicates Style Cluster 1 to 5. Note: some values are missing, so the same column does not correspond to the same round.]
Fig. 9. The variation in style for each team across a season can easily
be seen when the number of style clusters is set to 5.
in this space. The style clustering results for k = 5, 10 and 20
are shown in Fig. 8.
Fig. 8 shows some overlap in styles between certain teams, and
some teams exhibit multiple styles. The variation in style for
each team using k = 5 styles is shown in Fig. 9. Team T stands
out, being in a style cluster of its own, which could be
explained by its formation being distinctly different from that
of all other teams, with 3 defenders at the back (see Fig. 3).
Most teams play a single style, while teams E and R vary their
playing styles more frequently than other teams.
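The per-team counts behind Fig. 8 can be tabulated with a few lines of code. The sketch below assumes each match half has already been assigned a style cluster label in 0..k-1 by some clustering step; the input format, the function names and the toy labels are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter, defaultdict


def style_count_matrix(assignments, k):
    """Count, for each team, how many match halves fall in each style cluster.

    assignments: iterable of (team_id, cluster_label) pairs, one per match half.
    Returns {team_id: [count for cluster 0, ..., count for cluster k-1]}.
    """
    counts = defaultdict(Counter)
    for team, label in assignments:
        counts[team][label] += 1
    return {team: [c[j] for j in range(k)] for team, c in counts.items()}


def styles_used(matrix):
    """Number of distinct style clusters each team appears in."""
    return {team: sum(1 for n in row if n > 0) for team, row in matrix.items()}


# Toy labels (not real data): team "T" always plays one style, "E" varies.
toy = [("T", 4), ("T", 4), ("E", 0), ("E", 1), ("E", 0), ("E", 2)]
matrix = style_count_matrix(toy, k=5)
print(matrix["E"])          # [2, 1, 1, 0, 0]
print(styles_used(matrix))  # {'T': 1, 'E': 3}
```

A team whose row has a single non-zero entry plays one style throughout, while a row spread over several clusters corresponds to a team that varies its style.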
To encapsulate the behavior styles that teams adopt, we
define the playing style of a team as the normalized weights
from the style clustering matrices (e.g., for the 5 style clusters
used in Fig. 8(a), the style vector for
Team A = $[0, \frac{27}{28}, \frac{1}{28}, 0, 0]$,
Team B = $[\frac{30}{32}, \frac{1}{32}, \frac{1}{32}, 0, 0]$, etc.).
Modeling teams as a combination of the styles they play makes
intuitive sense: a team may play a pressing game on some
occasions and play defensively on others, so its styles are
weighted according to these performances. Another team may be
very rigid and play the same style every game, so the weight
for that style may be very high. These style vectors can then
be used to assist prediction.
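As a minimal sketch of the normalization just described, assuming each team's row of the style clustering matrix holds the number of match halves assigned to each cluster, the style vector is that row divided by its sum. The function name `style_vector` and the use of exact fractions are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def style_vector(cluster_counts):
    """Normalize one team's row of cluster counts so the weights sum to 1."""
    total = sum(cluster_counts)
    if total == 0:
        raise ValueError("team has no clustered match halves")
    return [Fraction(count, total) for count in cluster_counts]


# Counts chosen to reproduce the Team A example from the text (k = 5).
team_a = style_vector([0, 27, 1, 0, 0])
print([str(w) for w in team_a])  # ['0', '27/28', '1/28', '0', '0']
```

Exact fractions keep the weights in the same form as the example in the text; floating-point division would work equally well for downstream distance computations.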
B. Prediction and Anomaly Detection
Previously, given the ball and player tracking data, we
predicted the team identity. In this section, we want to do
the reverse: given only the identities of the two teams
playing, can we predict how the game will be played by
estimating what the match features will be?
[Figure 11: plot titled "Difference Between Predicted and Actual Formation Played"; x-axis: Test match index (1 to 18), y-axis: Mean error per role (m); series: Home Team, Away Team.]
Fig. 11. Results of comparing the predicted formation to the actual formation
played for each match in the final two rounds of the season (18 matches).
[Figure 12, panels (a), (b), (c); panel title: *MANUD* (Home) vs WBROM (Away) (distBetween est & actual = 9.71).]
Fig. 12. Example of a poor formation estimate (test match 16), which appears
to be due to an anomaly in the team’s behavior. (a) retrieved examples, (b)
predicted formation, (c) actual formation.
To predict the most likely features, we use k-NN regression
with the learnt team style priors as the input, which allows
us to select which of the training matches to regress from for
our prediction. That is, for each match in the training set, we
compare the styles of its two teams to the team style priors of
the test match. We then extract the matches which are most
similar in terms of team styles, and calculate the mean of their
features to predict the outcome of the test match. We can then
compare this prediction with the actual result. The procedure,
demonstrated for formation prediction, is shown in Fig. 10.
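A minimal sketch of this retrieval-and-average step follows. It assumes each training match is stored as a (home style vector, away style vector, feature vector) triple, uses Euclidean distance between the concatenated style vectors as the similarity measure, and treats k as a free parameter. These details are not restated in this section, so all three are assumptions, as are the function and variable names.

```python
import math


def knn_predict(train, home_style, away_style, k=3):
    """Predict match features as the mean over the k most similar training matches.

    train: list of (home_style, away_style, features) tuples.
    Similarity is Euclidean distance between concatenated style vectors
    (an assumed metric).
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training matches")
    query = list(home_style) + list(away_style)

    def distance(match):
        key = list(match[0]) + list(match[1])
        return math.dist(query, key)

    nearest = sorted(train, key=distance)[:k]
    n_features = len(nearest[0][2])
    return [sum(m[2][i] for m in nearest) / k for i in range(n_features)]


# Toy data (not from the paper): two styles, one scalar feature per match.
train = [
    ([1.0, 0.0], [0.0, 1.0], [10.0]),
    ([1.0, 0.0], [0.0, 1.0], [14.0]),
    ([0.0, 1.0], [1.0, 0.0], [40.0]),
]
print(knn_predict(train, [0.9, 0.1], [0.1, 0.9], k=2))  # [12.0]
```

For formation prediction, the feature vector would hold the flattened role positions, so the returned mean corresponds to the mean of the retrieved formations.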
We performed prediction of team formation on the last two
rounds of the season (containing 18 matches) and evaluated
the results by comparing the predicted formation to the actual
formation played, as presented in Fig. 11. Most matches are
estimated to within 2 m average error per role, while Matches 1
and 16 are the most poorly estimated. This suggests that the
teams were not playing their normal formation style in these
matches (i.e., anomalous behavior). The predictions allow us to
visualize the most likely formation given prior examples, and
to see when anomalies occur, such as in Fig. 12.
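The error measure and the anomaly flag can be sketched as below. The sketch assumes a formation is a list of (x, y) role positions in metres, with roles in the same order in both formations. It reuses the 2 m figure from the text as a flagging threshold, which is an illustrative choice rather than a rule stated in the paper.

```python
import math


def mean_error_per_role(predicted, actual):
    """Mean Euclidean distance (m) between matching roles of two formations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("formations must be non-empty and have the same roles")
    return sum(math.dist(p, a) for p, a in zip(predicted, actual)) / len(predicted)


def is_anomalous(predicted, actual, threshold_m=2.0):
    """Flag a match whose mean per-role error exceeds the (assumed) threshold."""
    return mean_error_per_role(predicted, actual) > threshold_m


# Toy two-role formations (not real data).
predicted = [(0.0, 0.0), (10.0, 0.0)]
actual = [(3.0, 4.0), (10.0, 0.0)]
print(mean_error_per_role(predicted, actual))  # 2.5
print(is_anomalous(predicted, actual))         # True
```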
VI. SUMMARY AND FUTURE WORK
In this paper, we first presented a formation descriptor
which was found by minimizing the entropy of a set of player
roles. Using an entire season of player tracking data, we
generated the formation descriptor by projecting the set of
occupancy maps of each role into a low-dimensional
discriminative feature space using linear discriminant analysis
(LDA). We showed that this approach characterizes individual
team behavior significantly better (3 times better) than other
match descriptors which are normally used to describe team
behavior. We then conducted a series of analyses and predictions
which showed the utility of our approach. In future work, we
plan to use this descriptor for short-term prediction (e.g., who
the next pass will go to), as well as long-term prediction (e.g.,
the match result).
Acknowledgement: The QUT portion of this research was supported by the
Qld Govt’s Dept. of Employment, Economic Development & Innovation.
REFERENCES
[1] Prozone, www.prozonesports.com.
[2] STATS SportVU, www.sportvu.com.
[3] A. Bialkowski, P. Lucey, P. Carr, Y. Yue, S. Sridharan, and I. Matthews,
“Large-Scale Analysis of Soccer Matches using Spatiotemporal
Tracking Data,” in ICDM, 2014.
[4] K. Goldsberry, “CourtVision: New Visual and Spatial Analytics for the
NBA,” in MIT SSAC, 2012.
[5] R. Maheswaran, Y. Chang, A. Henehan, and S. Danesis, “Deconstructing
the Rebound with Optical Tracking Data,” in MIT SSAC, 2012.
[6] R. Maheswaran, Y. Chang, J. Su, S. Kwok, T. Levy, A. Wexler, and
N. Hollingsworth, “The Three Dimensions of Rebounding,” in MIT
SSAC, 2014.
[7] J. Wiens, G. Balakrishnan, J. Brooks, and J. Guttag, “To Crash or Not
to Crash: A quantitative look at the relationship between the offensive
rebounding and transition defense in the NBA,” in MIT SSAC, 2013.
[8] P. Lucey, A. Bialkowski, P. Carr, Y. Yue, and I. Matthews, “How to Get
an Open Shot: Analyzing Team Movement in Basketball using Tracking
Data,” in MIT SSAC, 2014.
[9] A. Bocskocsky, J. Ezekowitz, and C. Stein, “The Hot Hand: A New
Approach to an Old “Fallacy”,” in MIT SSAC, 2014.
[10] A. Miller, L. Bornn, R. Adams, and K. Goldsberry, “Factorized Point
Process Intensities: A Spatial Analysis of Professional Basketball,” in
ICML, 2014.
[11] D. Cervone, A. D’Amour, L. Bornn, and K. Goldsberry, “POINTWISE:
Predicting Points and Valuing Decisions in Real Time with NBA Optical
Tracking Data,” in MIT SSAC, 2014.
[12] P. Carr, M. Mistry, and I. Matthews, “Hybrid Robotic/Virtual Pan-Tilt-
Zoom Cameras for Autonomous Event Recording,” in ACM Multimedia,
2013.
[13] X. Wei, P. Lucey, S. Morgan, and S. Sridharan, “Sweet-Spot: Using
Spatiotemporal Data to Discover and Predict Shots in Tennis,” in MIT
SSAC, 2013.
[14] X. Wei, P. Lucey, S. Morgan, and S. Sridharan, “Predicting Shot
Locations in Tennis using Spatiotemporal Data,” in DICTA, 2013.
[15] G. Ganeshapillai and J. Guttag, “A Data-Driven Method for In-Game
Decision Making in MLB,” in MIT SSAC, 2014.
[16] S. Sinha, C. Dyer, K. Gimpel, and N. Smith, “Predicting the NFL Using
Twitter,” in ECML Workshop on Machine Learning and Data Mining
for Sports Analytics, 2013.
[17] P. Lucey, A. Bialkowski, P. Carr, E. Foote, and I. Matthews,
“Characterizing Multi-Agent Team Behavior from Partial Team Tracings:
Evidence from the English Premier League,” in AAAI, 2012.
[18] Opta Sports, www.optasports.com.
[19] P. Lucey, D. Oliver, P. Carr, J. Roth, and I. Matthews, “Assessing team
strategy using spatiotemporal data,” in ACM SIGKDD, 2013.
[20] A. Bialkowski, P. Lucey, P. Carr, Y. Yue, and I. Matthews, “Win at
home and draw away: Automatic formation analysis highlighting the
differences in home and away team behaviors,” in MIT SSAC, 2014.
[21] J. Tenenbaum and W. Freeman, “Separating Style and Content with
Bilinear Models,” Neural Computation, vol. 12, no. 6, pp. 1247–1283,
2000.
[22] C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. Efros, “What Makes
Paris Look Like Paris?” ACM Transactions on Graphics (SIGGRAPH),
vol. 31, no. 4, 2012.
[23] Y. Lee, A. Efros, and M. Hebert, “Style-Aware Mid-Level
Representation for Discovering Visual Connections in Space and Time,”
in ICCV, 2013.
[24] S. Roberts, R. Everson, and I. Rezek, “Minimum Entropy Data
Partitioning,” IET, pp. 844–849, 1999.
[25] Y. Lee and S. Choi, “Minimum Entropy, K-Means, Spectral Clustering,”
in International Joint Conference on Neural Networks, 2004.
[26] H. W. Kuhn, “The Hungarian method for the assignment problem,”
Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.