All content in this area was uploaded by Guiliang Liu on Jul 27, 2020

Deep soccer analytics: Learning an action-value function for

evaluating soccer players

Guiliang Liu · Yudong Luo · Oliver Schulte · Tarak Kharrat

Abstract Given the large pitch, numerous players, limited player turnovers, and sparse scor-

ing, soccer is arguably the most challenging to analyze of all the major team sports. In this

work, we develop a new approach to evaluating all types of soccer actions from play-by-play

event data. Our approach utilizes a Deep Reinforcement Learning (DRL) model to learn an

action-value Q-function. To our knowledge, this is the ﬁrst action-value function based on

DRL methods for a comprehensive set of soccer actions. Our neural architecture ﬁts con-

tinuous game context signals and sequential features within a play with two stacked LSTM

towers, one for the home team and one for the away team separately. To validate the model

performance, we illustrate both temporal and spatial projections of the learned Q-function,

and conduct a calibration experiment to study the data ﬁt under different game contexts. Our

novel soccer Goal Impact Metric (GIM) applies values from the learned Q-function, to mea-

sure a player’s overall performance by the aggregate impact values of his actions over all

the games in a season. To interpret the impact values, a mimic regression tree is built to ﬁnd

the game features that inﬂuence the values most. As an application of our GIM metric, we

conduct a case study to rank players in the English Football League (EFL) Championship.

Empirical evaluation indicates GIM is a temporally stable metric, and its correlations with

standard measures of soccer success are higher than those computed with other

state-of-the-art soccer metrics.

Keywords Deep Reinforcement Learning · Action-Value Q-function · Goal Impact Metric ·

Fine-Tuning · Player Ranking

1 Introduction: Valuing Actions and Players

A major task of sports statistics is player evaluation, which provides insight into the perfor-

mance of a player [Schumaker et al., 2010]. Performance evaluation is important for team

The final publication is available at link.springer.com: http://link.springer.com/article/10.1007/s10618-020-00705-9

Corresponding Author: Yudong Luo

School of Computing Science, Simon Fraser University, and Sportlogiq Predictive Analytics

Burnaby, British Columbia, Canada

E-mail: yudong luo@sfu.ca

2 Guiliang Liu et al.

management and fan engagement. For instance, fantasy leagues allow fans to draft or build

their favourite team, based on the skills and the performance of players.

With the arrival of high-frequency tracking systems and object detection algorithms,

ever more data on the movement of players in professional sports have become available.

There is an increasing opportunity for large-scale machine learning to model complex sports

dynamics and evaluate players’ performances. Many evaluation metrics have been proposed

in recent years. The most common approach has been to evaluate players by quantifying the

values of the actions they took [McHale et al., 2012; Decroos et al., 2019].

Traditional sports evaluation metrics face two major problems: 1) Many player evalua-

tion metrics (e.g., expected goals) focus only on the actions with immediate impact on goals,

such as shots, but omit other actions that have signiﬁcant long-term effects. This limitation

is more severe when scoring is sparser; for example, soccer games are very likely to end

with zero or one goal. 2) Traditional methods tend to assign ﬁxed values to actions, regard-

less of the playing circumstances. To tackle these issues, Routley and Schulte [2015] built a

Markov model to capture the game context for ice hockey and calculated a Q-value for each

action. The Q-values estimate, for each action, the probability that a team scores the next

goal after the action, given the current game context.

Soccer is arguably the most challenging to analyze of all the major team sports [Bornn

et al., 2018]. The game context of soccer is even more complicated than that of ice hockey,

given that soccer has more players (22 players), a larger pitch (350 feet long and 150 feet wide)

and a longer playing time (90 minutes), all of which lead to complex spatio-temporal distribution

patterns for each team. In this paper, we apply Deep Reinforcement Learning (DRL) to learn

an action-value Q-function from events in a soccer game. We introduce a stacked two-tower

LSTM to capture the playing dynamics for home and away teams separately. Unlike the

traditional control problem in reinforcement learning aiming to learn the optimal policy, we

solve the prediction problem in the passive learning (on policy) setting.

Based on the learned Q-function, we introduce two metrics to measure the performance

of players and theoretically justify their consistency. First, the Goal Impact Metric (GIM)

ranks a player by aggregating the impacts of all his actions, where the impact of an action

is the change of consecutive Q values due to this action. In an empirical comparison with four

other metrics, GIM shows the highest correlation with most standard success measurements.

Generalizing from an initial sample of season matches, GIM is the best predictor

of season total goals and assists. Second, an alternative to the action value approach is to

compare a player to a random or league-average player (e.g., Cervone et al. [2014]). This approach

compares the expected success (e.g. the number of team wins) between the situation where

the player is fielded and the situation where the player is replaced by a random or average player.

We adopt this idea to introduce a new approach for play-by-play data that deﬁnes a natural

Q-value-above-average-replacement metric for player performance measurement. Our main

theorem states that a player’s Q-value-above-average-replacement gives the same score as

their total action impact value. This means that the DRL framework uniﬁes the two funda-

mental approaches to player evaluation; the plausibility of the average replacement approach

supports our total action value metric (GIM).
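To make the impact idea concrete, the following minimal Python sketch treats the impact of an action as the change in the acting team's Q value relative to the previous event, and sums these changes per player. It is an illustration under stated assumptions, not the paper's implementation: the event layout `(player, team, q_values)`, the player labels, and the Q values are hypothetical, and skipping the first event (which has no predecessor) is a simplification chosen here.

```python
from collections import defaultdict


def aggregate_impacts(events):
    """Sum per-player impacts over a sequence of events.

    events: list of (player, team, q_values), where q_values maps a team
    name to the estimated Q value after the player's action (hypothetical layout).
    """
    totals = defaultdict(float)
    # Pair every event with its predecessor; the first event is skipped
    # because it has no previous Q value to compare against (assumption).
    for previous, current in zip(events, events[1:]):
        player, team, q_values = current
        previous_q_values = previous[2]
        # Impact: change of consecutive Q values for the acting player's team.
        totals[player] += q_values[team] - previous_q_values[team]
    return dict(totals)


# Made-up Q values for four consecutive events.
events = [
    ("p1", "Home", {"Home": 0.30, "Away": 0.25}),
    ("p2", "Home", {"Home": 0.40, "Away": 0.20}),
    ("p3", "Away", {"Home": 0.35, "Away": 0.30}),
    ("p2", "Home", {"Home": 0.50, "Away": 0.20}),
]
totals = aggregate_impacts(events)
```

In this toy sequence, player `p2` accumulates the impact of two actions, while `p1` receives no score because only the first event is attributed to that player.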

To compute the action values for all players, we build a large dataset consisting of over

4.5M action events by pooling data from several soccer leagues. This dataset allows the

model to learn general estimates for action values. However, as the game context within a

speciﬁc league may differ from that of the general soccer game, player assessment should be

adjusted for different leagues. To address the trade-off between generalizing across leagues

and specializing to a specific one, we propose a fine-tuning approach: starting from the

general model as an initialization, we then train the model on the specific data from a certain

Fig. 1: A tree diagram to position our work in the research landscape. An important factor is whether a

metric considers all actions or only a subset of them. Our approaches assign a value to all on-the-ball actions.

Methods in bold are evaluated in our experiments and the star marks the proposed metrics.

league. Given the English Football League (EFL) Championship data, we use ﬁne-tuning to

improve the model’s ﬁtting performance as well as the evaluation results for players in this

league.

Contribution. The main contributions of this paper can be summarized as follows.

1. The ﬁrst neural Markov game model for soccer play-by-play event data. We utilize deep

reinforcement learning to estimate a context-aware Q-function.

2. A novel two-tower neural network architecture to capture the spatio-temporal complex-

ity of the home and away teams separately in a soccer game.

3. A ﬁne-tuning approach that learns a general action value model from a very large dataset

that combines different leagues, while capturing statistical patterns for speciﬁc leagues.

While versions of ﬁne-tuning have been applied in computer vision image datasets, to

our knowledge, ﬁne-tuning is new in deep sports analytics.

4. Two new soccer performance metrics based on the Q-function: Goal Impact Metric and

Q-value-above-average-replacement (QAAR). To the best of our knowledge, QAAR is

the ﬁrst replacement-based metric for soccer play-by-play data. We prove that they are

numerically identical, unifying the two fundamental approaches to player evaluation in

an RL framework.

2 Related Work

2.1 Evaluating Soccer Players

The handbook by Albert et al. [2017] provides several up-to-date survey articles on player

evaluation.

+/- (Plus-Minus) is a commonly applied player evaluation metric using goals only. It

quantifies the influence of a player's presence on the goal scoring opportunity for his team.

The basic version awards a player +1 if a goal is scored by the player’s own team when the

player is on the pitch, and -1 if the other team scores. Some recent works modify the basic

plus-minus metric, by weighting the goals according to their importance, based on expected

win probability, game time and game frequency [Schultze and Wellbrock, 2018], or with

machine learning and survival models to estimate both expected goals and expected points

to assess a player’s overall defensive and offensive inﬂuence [Kharrat et al., 2019].
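The basic plus-minus rule described above can be sketched in a few lines of Python. The goal list, the on-pitch interval, and the function name `basic_plus_minus` are hypothetical; the sketch covers only the unweighted version and assumes a single continuous spell on the pitch.

```python
def basic_plus_minus(goals, player_team, on_pitch):
    """Return the basic plus-minus score of one player.

    goals: list of (minute, scoring_team) tuples.
    player_team: the team the player belongs to.
    on_pitch: (start_minute, end_minute) interval the player was on the pitch.
    """
    start, end = on_pitch
    score = 0
    for minute, scoring_team in goals:
        if start <= minute <= end:
            # +1 for a goal by the player's own team, -1 for a goal conceded.
            score += 1 if scoring_team == player_team else -1
    return score


# Made-up match: three home goals and one away goal.
goals = [(12, "Home"), (30, "Home"), (50, "Away"), (80, "Home")]
home_starter = basic_plus_minus(goals, "Home", (0, 60))
away_substitute = basic_plus_minus(goals, "Away", (45, 90))
```

The example shows why the metric is coarse: every player on the pitch receives the same credit for a goal, whatever he contributed to it.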

Expected Goals (XG) uses shot information to quantify the value of a shot by the prob-

ability of a goal given shot features (e.g. angle to goal). Players are ranked by their total ex-

pected goals [Ali, 2011]. Many recent works have applied a similar method to study passes

rather than shots, where the quality of a player’s passes is quantiﬁed by their inﬂuence on

expected scoring opportunities. Passing is one of the most frequent actions in soccer. For

each pass, Brooks et al. [2016] measured its value as the estimated probability of resulting

in a successful shot. Bransen and Van Haaren [2018] measured its value as the difference be-

tween the goal-scoring probability before and after the pass. A drawback of these ratings is

that they evaluate only one type of action without modeling a player’s overall performance.

Several recent works rate players by evaluating all their actions. The Expected Pos-

session Value (EPV) [Cervone et al., 2016] evaluated all the actions in basketball within a

possession by estimating the expected number of points from the possession. Following this

framework, Fernández et al. [2019] built a deep model from the full resolution spatiotemporal

data to compute the EPVs for all actions during a game. They study the action impacts

of individual soccer players under different game situations. Their approach requires track-

ing data, which assume the complete observability of all players. Many other play-by-play

datasets, including ours, provide only partial observability of game context: they record only

actions of the players who possess the ball at a given time. For on-ball action data, Decroos

et al. [2019] introduced the VAEP (Valuing Actions by Estimating Probabilities) framework

that evaluates all on-ball actions of soccer players based on their inﬂuence on the game

outcome. However, instead of explicitly representing the game environment, their model

considers a set of hand-crafted action features from the recent game history, and whether an

action will lead to a goal within a constant number of future steps.

Another approach to evaluating players is quantifying their value-above-replacement

(VAR). The most common VARs include Goals/Wins Above Replacement (GAR/WAR) which

measure the player's contribution to his or her team by estimating the difference in the team's

scoring/winning chances between the target player being on the field and a replacement-level

player taking his or her place. In this paper we take the replacement-level player to be a statistical

league-average or random player. In other works, replacement-level represents a player of common

skills available for minimum cost to a team.

2.2 Reinforcement Learning in Sport Analytics.

Reinforcement Learning (RL) models event data of the form

$s_0, a_0, r_1, s_1, a_1, \ldots, s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}$:

environment state $s_t$ occurs, an action $a_t$ is chosen, resulting in a reward $r_{t+1}$ and state $s_{t+1}$.

At the next time step, another action $a_{t+1}$ is chosen. The data are often

separated into local transitions of the form $T\{s, a, r', s', a'\}$. Reinforcement Learning has

been applied to evaluating the actions of players. Schulte et al. [2017a] applied an ice hockey

play-by-play dataset to build a Markov model, where actions record the player movements

and states capture the game context. They measured players' performance by their expected

Scoring Impact (SI). The expected scoring probabilities of player actions under different

game contexts are modeled by a Q-function using dynamic programming [Puterman and

Patrick, 2017] based on the Bellman equation:

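$$Q(s, a) = \mathbb{E}_{s', a'}\big[\, r' + Q(s', a') \mid s, a \,\big] \tag{1}$$

$$\phantom{Q(s, a)} = \sum_{r'} \Pr(r' \mid s, a)\, r' + \sum_{s', a'} \Pr(s', a' \mid s, a)\, Q(s', a') \tag{2}$$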

This recurrence allows us to estimate the Q value at the current context $(s, a)$, given estimates

of the next Q values and the transition probabilities Pr. Schulte et al. [2017a] discretized

location and time coordinates, and used maximum likelihood estimates for the resulting dis-

crete transition probabilities. The XThreat model is a discrete Markov model for soccer that

divides the pitch into 192 zones and uses the Bellman equation to assess the expected scoring

changes and resulting impact values [Van Roy et al., 2017]. The XThreat model considers

only two action types, passes and dribbles. Discretization leads to loss of information and

undesirable spatial-temporal discontinuities in the Q-function. The discontinuities prohibit

the model from generalizing to the unobserved part of the state space.
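The Bellman recurrence used by such discrete models can be illustrated with a small Python sketch. It repeatedly applies Equation (2) to a toy Markov chain whose nodes stand for discretized state-action pairs. The three node names, the transition probabilities, and the expected rewards are invented for illustration; any probability mass not listed in `transitions` is assumed to lead to a terminal outcome with value 0 (for example, a loss of possession).

```python
def evaluate_q(expected_reward, transitions, sweeps=200):
    """Iterate the Bellman recurrence on a discrete Markov chain.

    expected_reward: node -> expected immediate reward, i.e. the sum over r'
        of Pr(r' | s, a) * r' (here: probability of scoring right away).
    transitions: node -> {next_node: probability}; missing probability mass
        is treated as a transition to a terminal node of value 0 (assumption).
    """
    q = {node: 0.0 for node in expected_reward}
    for _ in range(sweeps):
        # One synchronous sweep: every node is updated from the previous estimates.
        q = {
            node: expected_reward[node]
            + sum(p * q[nxt] for nxt, p in transitions[node].items())
            for node in q
        }
    return q


# Toy chain: a shot scores with probability 0.2, otherwise the episode ends.
expected_reward = {"midfield_pass": 0.0, "box_pass": 0.0, "shot": 0.2}
transitions = {
    "midfield_pass": {"box_pass": 0.4, "midfield_pass": 0.4},
    "box_pass": {"shot": 0.5},
    "shot": {},
}
q = evaluate_q(expected_reward, transitions)
```

With a fixed number of sweeps the estimates converge geometrically here, because every node leaks probability mass to a terminal outcome. A real discretized model would estimate the transition table from event counts, which is where the loss of information discussed above enters.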

Instead of explicitly modeling transitions in a discrete MDP, our work employs a model-

free approach which learns Q values without explicitly estimating transition and reward

probabilities [Sutton and Barto, 2018]. Many previous works [Mnih et al.,

2015] applied model-free learning with deep neural networks to capture continuous action

and state features. These works mainly focused on control in continuous-flow games

(e.g., Atari games). However, the real agents—players—in professional sports games are

subject to evaluation, but not subject to control by an RL method.

Dick and Brefeld [2019] applied model-free RL to value match states in soccer according

to the chance that the team currently in possession will bring the ball close to the other

team's goal. They assume tracking data (specifying the location of the players and the ball at each time step),

rather than event data as our model does. Also, they did not apply the learned value function

to assess player performance. To evaluate players' performance, Liu and Schulte [2018]

applied a deep recurrent model to capture the features of game history in ice hockey. Their

model computes Q values to measure a player’s expected probability of scoring the next

goal with the Sarsa temporal difference learning method. Our work extends the approach of

Liu and Schulte [2018] from ice hockey to a more complex model designed for the more

complex sport of European soccer. We show that the resulting impact values can be interpreted

through mimic learning and provide a theoretical justification for the learned impact

values.

3 Dataset

Sports analytics uses several different formats of data: box score data, which provide total

action counts per player and match (e.g., number of goals scored), play-by-play data, which

are logs of discrete action events specifying various properties of the action (e.g. action

type, acting player, time and location), and tracking data, which record the location of each

player at dense time intervals (e.g. for every broadcast video frame, or more frequently

with stadium cameras). In this paper, we utilize the F24 play-by-play soccer game dataset

provided by Opta¹. The dataset records the play-by-play information of game events and

player actions for the entire 2017-2018 game season from multiple soccer leagues, including

English Premier League, Dutch Eredivisie, EFL Championship, Italian Serie A, German

Bundesliga, Spanish La Liga, French Ligue 1 and German Bundesliga Zwei. Table 3 shows

dataset statistics. The dataset records the actions of on-the-ball players and the spatial and

the temporal context features. The complete feature set is listed in Table 2. Table 1 lists a

series of events describing a goal sequence for the home and away teams. The dataset utilizes

adjusted spatial coordinates. Both the X-coordinates and Y-coordinates are adjusted to [0,

+100]. The adjusted soccer pitch is shown in Figure 2, where play ﬂows from left to right

for either team. To adjust coordinates, we reverse them when the team in possession attacks

towards the left, so in this case $X_{Adjusted} = -\mathrm{rescale}(X)$ and $Y_{Adjusted} = -\mathrm{rescale}(Y)$.

¹ https://www.optasports.com/

Fig. 2: Soccer pitch layout with adjusted coordinates. Coordinates are adjusted so that for the home/away

team performing an action, its offensive zone is on the right

The adjusted coordinates accelerate model convergence during training and improve the

model ﬁt for spatial features (Section 6.1).
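As a hedged illustration of the coordinate adjustment, the Python sketch below mirrors a location on the [0, 100] pitch whenever the acting team attacks towards the left. The exact form of the rescaling step is not spelled out in the text, so the sketch assumes that reversing and rescaling amounts to mapping a coordinate c to 100 - c; the function name `adjust_coordinates` and the flag `attacks_left` are likewise hypothetical.

```python
def adjust_coordinates(x, y, attacks_left):
    """Return (x, y) so that the acting team's offensive zone is on the right.

    x, y: ball location, both already scaled to [0, 100].
    attacks_left: True if the team in possession attacks towards the left.
    Assumption: reversing a coordinate means mirroring it within [0, 100].
    """
    if attacks_left:
        return 100.0 - x, 100.0 - y
    return x, y


# A location near the left goal line becomes a location near the right one.
flipped = adjust_coordinates(10.0, 30.0, attacks_left=True)
unchanged = adjust_coordinates(10.0, 30.0, attacks_left=False)
```

Applying the same transformation to both teams means a single spatial pattern (for example, "close to the opponent's goal") always maps to the same region of the input space.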

MP = Manpower, GD = Goal Difference, OC = Outcome, S = Succeed, F = Fail, H = Home, A = Away,
T = Team who performs action, GTR = Game Time Remain, ED = Event Duration

GTR     X    Y    MP    GD  Action         OC  Velocity        ED    Angle  T  Reward
35m44s  87   26   Even  1   simple pass    S   (2.2, 1.7)      11.0  0.19   H  [0,0,0]
35m42s  90   17   Even  1   standard shot  F   (1.5, -4.5)     2.0   0.11   H  [0,0,0]
35m42s  99   44   Even  1   save           S   (0, 0)          0.0   0.06   A  [0,0,0]
35m9s   100  1    Even  1   cross          S   (0.0, -1.3)     33.0  0.0    H  [0,0,0]
35m7s   85   56   Even  1   simple pass    S   (-7.3, 27.6)    2.0   0.39   H  [0,0,0]
35m5s   92   67   Even  1   simple pass    S   (3.6, 5.4)      2.0   0.28   H  [0,0,0]
35m4s   97   50   Even  1   corner shot    S   (5.1, -16.2)    1.0   1.74   H  [0,0,0]
35m4s   100  50   Even  1   goal           S   (0, 0)          0.0   0.0    H  [1,0,0]
...     ...  ...  ...   ... ...            ... ...             ...   ...    ...  ...
3m41s   62   96   Even  2   long ball      F   (4.5, 9.3)      9.0   0.08   A  [0,0,0]
3m39s   19   89   Even  2   clearance      S   (-21.5, -3.2)   2.0   0.07   H  [0,0,0]
3m35s   24   100  Even  2   throw in       S   (1.3, 2.7)      4.0   0.09   A  [0,0,0]
3m33s   27   96   Even  2   simple pass    S   (1.1, -2.2)     2.0   0.1    A  [0,0,0]
3m31s   12   95   Even  2   cross          S   (-7.5, -0.5)    2.0   0.07   A  [0,0,0]
3m28s   6    46   Even  2   simple pass    S   (-1.7, -16.3)   3.0   0.79   A  [0,0,0]
3m26s   14   48   Even  2   standard shot  S   (3.8, 1.3)      2.0   0.44   A  [0,0,0]
3m26s   0    50   Even  2   goal           S   (0, 0)          0.0   0.0    A  [0,1,0]

Table 1: A data sample featuring team scoring: a sequence of events where home team scores and then

away team scores. The rewards [1,0,0] and [0,1,0] indicate the scoring event of home team and away team

respectively (see Section 4.1). We skip some events in the middle due to space issues.

4 Modeling Play Dynamics

This section introduces our approach to deﬁning a Markov model for soccer games and a

Q-function to evaluate actions of players under different game contexts.

4.1 Markov Game Model for Sports Game

Similar to Liu and Schulte [2018], we apply the Markov Game Framework to model the

play dynamics for sports games. The basic building blocks of the model are:

Name                         Type        Range
Game Time Remaining          Continuous  [0, 100]
X Coordinate of ball         Continuous  [0, 100]
Y Coordinate of ball         Continuous  [0, 100]
Manpower Situation           Discrete    [-5, 5]
Goal Differential            Discrete    (-∞, +∞)
Action                       Discrete    one-hot representation
Action Outcome               Discrete    {success, failure}
Velocity of ball             Continuous  (-∞, +∞)
Event Duration               Continuous  [0, +∞)
Angle between ball and goal  Continuous  [−π, +π]
Home or Away Team            Discrete    {Home, Away}

Table 2: Complete feature list. For the feature manpower situation, negative values indicate short-handed, positive values indicate power play.

Dataset  F24
Events   4,679,354
Players  5,510
Games    2,976
Teams    164
Leagues  10
Season   2017-18
Place    Europe

Table 3: Dataset statistics. The basic unit of this dataset is an event, which describes the game context and the on-the-ball action of a player at a time step.

– There are two agents, Home and Away, representing their respective teams.

– The action $a_t$ denotes the movements of players who control the ball. Our model applies a discrete action vector using one-hot representation.

– An observation is a feature vector $x_t$ specifying a value of the features listed in Table 2 at a discrete time step $t$. We use the complete sequence $s_t \equiv (x_t, a_{t-1}, x_{t-1}, \ldots, x_0)$ to represent the state [Mnih et al., 2015].

– The reward $r_t$ is a vector of goal values $g_t$ that specifies which team (Home, Away) scores. We introduce an extra Neither indicator for the eventuality that neither team scores until the end of a game. For readability, we use Home, Away, Neither to denote the team in a 1-of-3 vector of goal values $r_t = [g_{t,Home}, g_{t,Away}, g_{t,Neither}]$ and $g_{t,Home} = 1$ indicates the home team scores at time $t$ (see Table 1).
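The building blocks above can be mirrored in a short Python sketch that encodes the reward vector and assembles the history-based state $s_t$ from observations and actions. The helper names `reward_vector` and `build_states`, and the string placeholders used for observations and actions, are hypothetical; real observations would be the feature vectors of Table 2.

```python
TEAMS = ("Home", "Away", "Neither")


def reward_vector(scoring_team=None):
    """Return [g_Home, g_Away, g_Neither]; all zeros when no goal event occurs."""
    return [1 if team == scoring_team else 0 for team in TEAMS]


def build_states(observations, actions):
    """Return s_t = (x_t, a_{t-1}, x_{t-1}, ..., x_0) for every time step t."""
    states = []
    history = ()
    for t, x in enumerate(observations):
        if t == 0:
            history = (x,)
        else:
            # Prepend the newest observation and the action that preceded it.
            history = (x, actions[t - 1]) + history
        states.append(history)
    return states


# Placeholder observations and actions for three time steps.
states = build_states(["x0", "x1", "x2"], ["a0", "a1", "a2"])
home_goal = reward_vector("Home")
no_goal = reward_vector()
```

The state grows with every event, which is why a recurrent network is a natural choice for summarizing it into a fixed-size representation.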

4.2 The Next-Goal Q-Function

Several value functions have been used to evaluate player actions. One option is to measure

actions by whether they increase the winning chances [Routley, 2015]. More recent works

focus on an action’s more immediate impact regarding scoring points or goals [Cervone

et al., 2016; Schulte et al., 2017b]. For soccer, we formalize this idea in terms of the

next-goal Q-function, which is defined as follows.

We divide a soccer game into goal-scoring episodes, so that each episode 1) starts at

the beginning of the game, or immediately after a goal, and 2) terminates with a goal or at

the end of the game. The next-goal Q-function represents the probability that the home (resp.

away) team scores the goal at the end of the current goal-scoring episode ($goal_{Home} = 1$

resp. $goal_{Away} = 1$), or neither team scores ($goal_{Neither} = 1$):

$$Q_{team}(s, a) = P(goal_{team} = 1 \mid s_t = s, a_t = a) \tag{3}$$

where $team$ is a placeholder for one of Home, Away, Neither. This Q-function represents

the probability that a team scores the next goal, given current play dynamics in a sports

game [Schulte et al., 2017a; Routley and Schulte, 2015]. For player evaluation, the next-goal

Q-function has several advantages over win probabilities.

– Compared to the final match outcome, the Q values model the probability of scoring the

next goal, which is a relatively short time away and thus easier to explain and understand.

– Increasing the probability that a player's team scores the next goal captures both offensive

and defensive value. For example, a defensive action like tackling decreases the

probability that the other team will score the next goal, thereby increasing the probability

that the player’s own team will score the next goal.

– The next-goal reward captures what a coach expects from a player. For example, instead

of thinking about how the game will end, a coach prefers his players to focus on de-

fending against their opponent’s strike and creating the next scoring opportunities at the

moment.
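The goal-scoring episodes of this section can be made concrete with a small Python sketch that labels every event with the outcome of its episode: the team that scores next, or Neither if the game ends first. The input encoding (one entry per event, holding the scoring team for goal events and `None` otherwise) and the function name `next_goal_labels` are assumptions made for this example. Averaging such labels over similar contexts gives an empirical counterpart of Equation (3); the model in this paper is instead trained with temporal difference learning.

```python
def next_goal_labels(goal_events):
    """Label each event with the outcome of its goal-scoring episode.

    goal_events: list with one entry per event, "Home" or "Away" if the
    event is a goal by that team, and None otherwise (hypothetical encoding).
    """
    labels = [None] * len(goal_events)
    # Events after the last goal run to the end of the game without a goal.
    outcome = "Neither"
    # Walk backwards so that each event sees the next goal that follows it.
    for i in range(len(goal_events) - 1, -1, -1):
        if goal_events[i] is not None:
            outcome = goal_events[i]  # a goal terminates its own episode
        labels[i] = outcome
    return labels


# Six events: a home goal, later an away goal, then one event before full time.
labels = next_goal_labels([None, None, "Home", None, "Away", None])
```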

5 Learning Q Values: Model Architecture and Training

This section introduces a neural network architecture and the weight training methods to

learn a Q-function ($Q_{team}(s, a)$).

5.1 Model Architecture: Function Approximation with Neural Network

We discuss the model architecture for learning the Q values. Given a discrete state space,

it is possible to use dynamic programming for computing Q-values [Schulte et al., 2017b;

Van Roy et al., 2017]. But our soccer model contains continuous observation features de-

rived from continuous time stamps and spatial locations. A common solution is to discretize

spatio-temporal indices [Gudmundsson and Horton, 2017]. However, the resulting discontinuities

undermine the precision of state values and impair predictive accuracy. In this paper,

we develop a neural network approach that can directly incorporate continuous observation

features.

To generate Q-values, our model applies the two-tower design [Song et al., 2017] to

ﬁt the data of home/away teams separately and a recurrent neural network to capture the

sequential features in play history. Figure 3 shows our model structure. The model ﬁts home

and away data separately, because from domain knowledge we expect the Q values to be

different depending on whether a team plays at home or away (for a discussion of the home

team advantage see [Swartz and Arce, 2014]). Each tower captures the play history with a

stacked LSTM, which is a multi-layer LSTM, where outputs of LSTM cells in lower layers

are used as the input for higher layers. Compared to the single layer LSTM, stacking adds

levels of abstraction for the input features of sequences. This increases the model’s ability to

generalize across complex game contexts. The complete play history of game contexts and

actions $(s_t, a_t)$ is summarized in the last hidden state of the top LSTM layer. Our model uses

a team identiﬁer unit to select the hidden state from the home or the away tower according to

who controls the ball in the current play. The selected hidden state values are sent to hidden

layers whose outputs are normalized by a softmax function and considered as our estimates

of $\hat{Q}_{Home}(s, a)$, $\hat{Q}_{Away}(s, a)$, and $\hat{Q}_{Neither}(s, a)$.
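The last stage of this architecture, selecting a tower by team identifier and normalizing three outputs with a softmax, can be sketched in plain Python. This is a deliberately reduced illustration: the LSTM towers are replaced by given hidden-state vectors, the two hidden layers are collapsed into a single linear layer, and all weights and dimensions are invented. The function names `softmax` and `q_head` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def q_head(home_hidden, away_hidden, home_has_ball, weights, bias):
    """Select a tower's hidden state and map it to three Q estimates.

    home_hidden, away_hidden: stand-ins for the last hidden state of each tower.
    home_has_ball: the team identifier that selects the tower.
    weights, bias: one linear layer with three output rows (a simplification
    of the hidden layers described in the text).
    """
    hidden = home_hidden if home_has_ball else away_hidden
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    return dict(zip(("Home", "Away", "Neither"), softmax(logits)))


# Toy two-dimensional hidden states and hand-picked weights.
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
q_home_ball = q_head([1.0, 0.0], [0.0, 1.0], True, weights, bias)
q_away_ball = q_head([1.0, 0.0], [0.0, 1.0], False, weights, bias)
```

Because of the softmax, the three estimates always sum to one, which matches their interpretation as the probabilities that the home team, the away team, or neither team scores next.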

5.2 Weight Training

We train the two-tower neural network with Sarsa, a Temporal Difference (TD) prediction method

[Sutton and Barto, 2018, Ch. 6.4], and apply a dynamic-possession LSTM to control

the trace length during training. Our goal is to learn a function that estimates $Q_{team}(s, a)$

for the play dynamics observed in our dataset, with which we evaluate the performance of

players. The training details are as follows.

Fig. 3: The architecture of our Two-Tower Dynamic Play LSTM (TTDP-LSTM). The figure shows how the

model processes two generic time instances: one, associated with the home team, is analyzed by the home tower,

and the other, from the away team, is analyzed by the away tower.

Home/Away Tower Weight Training. At training time step t, our model feeds the output

from the home/away tower to the hidden layers if the home/away team controls the ball at

time $t$. During one training step, the hidden layers estimate the Q values for two consecutive

state-action pairs within one transition $T\{s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}\}$. The estimated Q values

are applied to compute the TD loss:

$$L(\theta) = \sum_{team \in T} \mathbb{E}\Big[\big(r_{team,t+1} + \hat{Q}_{team}(s_{t+1}, a_{t+1}) - \hat{Q}_{team}(s_t, a_t)\big)^2\Big] \tag{4}$$

We use mini-batch gradient descent with backpropagation to ﬁnd weights of our neural

model that minimize this loss function (Figure 3). Because, for each transition, an error signal is

sent to only one of the home and away towers, the flow of gradients influences only

that tower, and thus the weights of the two towers are updated independently. This independence

separates home and away signals and helps the network to learn their impact.
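For a single transition, the TD loss of Equation (4) reduces to a sum of three squared errors, one per output. The Python sketch below computes it from given Q estimates rather than from a network. The function name `sarsa_td_loss` and the numbers are made up, and the handling of the final transition of an episode (using the reward alone as the target) is a standard convention assumed here, not a detail stated in the text.

```python
def sarsa_td_loss(q_now, q_next, reward, terminal=False):
    """Squared TD error for one transition, summed over Home, Away, Neither.

    q_now:  estimates for (s_t, a_t), one value per team.
    q_next: estimates for (s_{t+1}, a_{t+1}), one value per team.
    reward: goal vector r_{t+1}, e.g. [1, 0, 0] when the home team scores.
    terminal: if True, the episode ends and the next estimates are ignored
        (assumption made for this sketch).
    """
    loss = 0.0
    for q_t, q_t1, r in zip(q_now, q_next, reward):
        # Sarsa target without discounting, as in Equation (4).
        target = r if terminal else r + q_t1
        loss += (target - q_t) ** 2
    return loss


# An ordinary in-play transition with no goal.
in_play = sarsa_td_loss([0.5, 0.3, 0.2], [0.6, 0.25, 0.15], [0, 0, 0])
# The transition that ends an episode with a home goal.
home_goal = sarsa_td_loss([0.9, 0.05, 0.05], [0.0, 0.0, 0.0], [1, 0, 0], terminal=True)
```

In training, the expectation in Equation (4) would be approximated by averaging this quantity over the transitions of a mini-batch.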

Dynamic Possession-LSTM. Team sports like soccer have a turn-taking aspect where one

team is on the offensive and the other defends; one such turn is called a play. A play ends

when possession passes from the team at time $t$ to the opposing team at time $t+1$ [Liu

and Schulte, 2018]. In a sports game, events within a play are highly correlated, but when

a team loses control of the ball (meaning the play ends), the attacking team switches to

defense. The dependence between actions from successive plays is therefore much weaker.

The turn-taking aspect inspires a natural way of determining the trace length $tl_t$, which

controls how far back in time the LSTM propagates the error signal from the current time

at the input history. Instead of fixing the trace length, our model dynamically computes it

and sets $tl_t$ to the number of time steps from current time $t$ to the beginning of the current

play (with a maximum of 10 steps), so that the LSTM can restrict the history traces to the

continuous possession of one team. Using possession changes to define episodes for temporal

models has proven successful in many continuous-flow sports, especially

basketball [Cervone et al., 2016; Gudmundsson and Horton, 2017].
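The dynamic trace length can be computed with a single pass over the possession sequence, as in the following Python sketch. The input format (one team label per event) and the function name `trace_lengths` are hypothetical, and counting the current event as the first step of its play is an assumption about the indexing convention.

```python
def trace_lengths(possession, max_len=10):
    """Return tl_t for every event: steps since the current play began, capped.

    possession: list with the team in possession at each event, e.g. "H" or "A".
    max_len: maximum trace length (10 in the training settings of this paper).
    """
    lengths = []
    run = 0
    for i, team in enumerate(possession):
        if i > 0 and team == possession[i - 1]:
            run += 1  # same team keeps the ball: the play continues
        else:
            run = 1   # possession changed (or first event): a new play starts
        lengths.append(min(run, max_len))
    return lengths


# Three home events, two away events, then the home team regains the ball.
short_plays = trace_lengths(["H", "H", "H", "A", "A", "H"])
# A long spell of possession is truncated at the maximum trace length.
long_play = trace_lengths(["H"] * 12)
```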

Training Settings. For our TTDP-LSTM model in Figure 3, both home and away towers

apply a two-layer LSTM, whose outputs are sent to two hidden layers with three output

nodes. The number of nodes is 256 in both the LSTM hidden states and the hidden layers. The max

trace length of the LSTM is 10 [Hausknecht and Stone, 2015b]. During training, we minimize

the loss function $L(\theta)$ with the Adam optimizer, with an initial general learning rate of 10E-04

on the entire dataset (containing over 4.5M events) and a fine-tuning initial learning rate

of 10E-05 on the league-speciﬁc datasets.

Computational Complexity : Applying the neural network approximation function, the Sarsa

prediction algorithm learns the Q function by updating the weights of a neural network

through backpropagation. Our model applies a two-layer stacked LSTM with trace length

10 plus an embedding layer for each team and two hidden layers to generate the Q values.

The sizes of hidden layers (or state) for both dense layers and LSTM cells are set to 256.

Assuming we have $m$ training examples in a batch and the dimension of the input space is $n$,

the time complexity of training the neural network on one batch is therefore $O(mn)$.

While the cost of each training step is linear in the batch size, the number of gradient steps

required until convergence depends on the dataset and the hyperparameter settings and cannot

be bounded a priori.

6 Model Validation: Q Values

Our case studies illustrate the learned Q-function with temporal and spatial projections. To

validate the model performance, we show that the learned Q values are well-calibrated,

meaning that they offer a satisfactory ﬁt to empirical scoring frequencies observed under

different game contexts.

6.1 Illustration of Temporal and Spatial Projection.

Temporal Projection. We illustrate the estimated Q values for actions and states across game

times. Figure 4 shows a value ticker [Cervone et al., 2016] that represents the evolution of the

Q values during a randomly sampled game from our dataset. The figure plots values of the

three output nodes representing $\hat{Q}_{Home}(s, a)$, $\hat{Q}_{Away}(s, a)$, and $\hat{Q}_{Neither}(s, a)$, according

to which we highlight critical events to show the context-sensitivity of the Q-function. We

observe that: 1) High scoring probabilities for one team decrease those of its opponent. 2)

The probability that neither team scores rises signiﬁcantly at the end of the match.

Spatial Projection. To study the inﬂuence of players’ positions on scoring probability, we

generate Q values for the entire soccer pitch. Our neural model can generalize from ob-

served states and actions to those that have not occurred in the observed game season. Our

model’s generalization ability allows us to estimate a Q value for any action performed

at any position. Figure 5 shows the learned smooth Q-function surface $\hat{Q}_{Home}(s, a)$ over

possible game trajectories for several actions of the home team including shot, pass, cross,

and tackle. We select these actions because they occur frequently and have been studied in

previous work [Brooks et al., 2016; Van Haaren et al., 2016]. For the selected actions, we

observe that the Q value of offensive actions like shots, passes, and crosses increases with

proximity to the opponent’s goal. The value of defensive tackling increases with proximity

to the team’s own goal. Angles from the left side of the goal appear slightly more promising

than from the right. The plots for $\hat{Q}_{Home}(s, pass)$ and $\hat{Q}_{Home}(s, cross)$ show the same phenomena. An explanation for the first observation is that players have more chance to score

when they approach their opponent’s goal. For the second observation related to shot angle,

inspection of our dataset reveals several goals scored on the upper corner (e.g. successful

banana kick) but none on the lower corner. The left/right asymmetry also explains why the

defensive action tackle made near the bottom left corner is more valuable (the last plot):

tackles disturb opponents’ actions that might lead to successful shots on their upper corner.

Fig. 4: Temporal Projection of the learned Q-function. The game is between Fulham (Home) and Sheffield Wednesday (Away), played on Aug. 19th, 2017.

Fig. 5: Spatial Projections for estimated Q values: $\hat{Q}_{Home}(s, shot)$, $\hat{Q}_{Home}(s, pass)$, $\hat{Q}_{Home}(s, cross)$ and $\hat{Q}_{Home}(s, tackle)$ over the entire soccer pitch. We use the adjusted coordinates described in Section 3.

6.2 Calibration Quality for the learned Q-function

The calibration studies evaluate how well our learned Q-function ﬁts the observed next-goal

scoring frequencies under different discrete game contexts. Our approach to defining discrete game contexts is to divide the continuous state space into discrete bins. To calculate

the empirical scoring frequency associated with each bin, we assign an observed state to a

bin according to the values of three discrete context features in the last observation: Man-

power (Short Handed (SH), Even Strength (ES), Power Play (PP)), Goal Differential (≤ −3,

-2, -1, 0, 1, 2, ≥3) and Period (1 (ﬁrst half), 2 (second half)). The total number of bins is

3×7×2 = 42. This partition has two advantages. 1) The context features are well-studied

and important for soccer experts [Decroos et al., 2019], so the model predictions can be

checked against domain knowledge. 2) The partition covers a wide range of match contexts,

and each bin aggregates a large set of play histories. If our model exhibits a systematic bias,

the aggregation should amplify it and the bias should become detectable.

Given the set of bins where each bin $A$ contains a total of $|A|$ states, the empirical and estimated scoring probabilities for each bin are defined as follows:

– Empirical Scoring Probabilities: for each observed state $s$, we set $goal^{obs}_{team}(s) = 1$ if the observed episode containing state $s$ ends with a goal by team $team = Home, Away$ or neither ($team = Neither$). Then $Q^{obs}_{team}(A) = \frac{1}{|A|} \sum_{s \in A} goal^{obs}_{team}(s)$.

– Estimated Scoring Probabilities: we apply our TTDP-LSTM model to estimate a Q value for each observed sequence and average the resulting estimates to compute the estimated scoring probabilities: $\hat{Q}_{team}(A) = \frac{1}{|A|} \sum_{s \in A} \hat{Q}_{team}(s, a)$.
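The two per-bin averages defined above can be illustrated with a minimal Python sketch. The record layout (a bin key, an observed goal indicator, and a model estimate for a single team) and the function name `calibration_by_bin` are assumptions made for this illustration, not part of the authors' implementation; the absolute difference is reported per bin for one team only.

```python
from collections import defaultdict

def calibration_by_bin(records):
    """Average observed and estimated scoring probabilities per context bin.

    records: iterable of (bin_key, goal_obs, q_hat) for one team, where
      bin_key  -- hypothetical tuple (manpower, goal_differential, period)
      goal_obs -- 1 if the episode containing the state ends with a goal by the team, else 0
      q_hat    -- the model's estimated Q value for that state-action pair
    Returns {bin_key: (count, Q_obs, Q_hat, absolute_difference)}.
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum of goal_obs, sum of q_hat
    for bin_key, goal_obs, q_hat in records:
        acc = sums[bin_key]
        acc[0] += 1
        acc[1] += goal_obs
        acc[2] += q_hat
    result = {}
    for bin_key, (n, goal_sum, q_sum) in sums.items():
        q_obs, q_est = goal_sum / n, q_sum / n
        result[bin_key] = (n, q_obs, q_est, abs(q_obs - q_est))
    return result

# Toy data (invented for illustration only).
toy = [
    (("ES", 0, 1), 1, 0.5),
    (("ES", 0, 1), 0, 0.25),
    (("PP", 1, 2), 0, 0.25),
]
table = calibration_by_bin(toy)
```

Bins with few states, such as the second toy bin, produce noisy empirical averages, which is consistent with the larger errors reported for rare contexts.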

We evaluate the fit as the difference between the average empirical scoring probability $Q^{obs}_{team}(A)$ and the average estimated scoring probability $\hat{Q}_{team}(A)$. We show the results in Table 4 where the context features Manpower (Man.), Goal Differential (Goal.) and Period (P.) define a bin, and $|A|$ records the number of actions in each bin $A$ in our dataset.

The estimated Q-function matches several well-known phenomena: 1) The chance of either

team scoring another goal decreases in the second period. 2) A clear home team advan-

tage [Swartz and Arce, 2014]: Comparing two match contexts with the home and away

team roles exchanged, the relative advantage of the home team is greater than that of the

away team. 3) Manpower advantage by the home team means a lower scoring chance for the

away team.

Our conclusions are as follows. 1) The model ﬁt is satisfactory (i.e., the average MAE for

all bins is below 0.1), except for some relatively rare game contexts (for instance, the context where the home team is trailing with a manpower advantage in the first period, whose corresponding bin count is only 876 out of 3M match states). 2) Our model significantly

outperforms the Markov Model with a discrete state space. This shows the advantage of a

function approximation model that can utilize continuous space-time information without

losing information due to discretization.

7 Player Evaluation Metric Based on Q values

In this section, we show how a player evaluation metric can be derived from the Q-function.

Our paper’s main approach to measuring player performance is assigning impact values (the

difference between two consecutive Q values) to a player’s action. To understand when the

neural network will assign a high value to a player action, we ﬁt a regression tree with the

Man.  Goal.  P.  |A|     TT Home  TT Away  TT MAE  Markov MAE
ES    -1     1   73176   0.4374   0.4159   0.0052  0.1879
ES    -1     2   96408   0.3496   0.3025   0.0782  0.1783
ES    0      1   356597  0.4437   0.4272   0.026   0.1908
ES    0      2   160080  0.356    0.3077   0.0814  0.1792
ES    1      1   88726   0.4402   0.4128   0.0335  0.1899
ES    1      2   119901  0.3459   0.295    0.077   0.1787
PP    -1     1   876     0.4366   0.4045   0.1752  0.1937
PP    -1     2   3319    0.352    0.2911   0.0668  0.1685
PP    0      1   3183    0.4414   0.403    0.1308  0.187
PP    0      2   7183    0.3579   0.2855   0.0841  0.1804
PP    1      1   1316    0.4391   0.3949   0.115   0.1825
PP    1      2   7676    0.356    0.2862   0.1121  0.1792

Table 4: Calibration Results. TT Home and TT Away report the average scoring probability $\hat{Q}_{team}(A)$ estimated by our TTDP-LSTM model. Here we compare only Q values for pass and shot as they are frequent and well-studied actions. TT MAE is the Mean Absolute Error (MAE) between estimated scoring probabilities from our model and empirical scoring probabilities. For comparison, we also report a Markov MAE which applies the estimates from a discrete-state Markov model [Schulte et al., 2017b].

state-action features and the corresponding impact values. To provide a theoretical foun-

dation for our impact metric, this section introduces another Q-value-Above-Replacement

metric to evaluate a player’s action. By proving both metrics are equivalent, we show that

Q-values unify the two main approaches to player evaluation.

7.1 Goal Impact: Deriving Action Values from Q-values.

Our Q-function concept provides a novel AI-based deﬁnition for assigning a value to an

action. Similar to Schulte et al. [2017b]; Routley and Schulte [2015], we measure the qual-

ity of an action by how much it changes the expected total reward of a player’s team: the

difference in expected total reward before and after the player acts. The scoring chance at a

time measures the value of a state, and therefore depends on the previous efforts of the entire

team, whereas the change in value directly measures the impact of an action by a speciﬁc

player. For our speciﬁc choice of Next Goal as the reward function, we refer to goal impact.

The total impact of a player’s actions is his Goal Impact Metric (GIM) value.

The following equations show how the action impact can be computed for a transition $T\{s, a, r', s', a'\}$ given Q value estimates from our TTDP-LSTM model. The expected future total reward before $s', a'$ is given by $r' + E_{s',a'}[Q_{team}(s', a') \mid s, a]$ (here the expectation is taken over all possible successor states and actions). The expected future total reward after $s', a'$ is given by $r' + Q_{team}(s', a')$. Therefore:

$$impact_{team}(s, a, s', a') \equiv Q_{team}(s', a') - E_{s',a'}[Q_{team}(s', a') \mid s, a]$$

$$GIM_i(D) \equiv \sum_{s,a,s',a'} n[s, a, s', a', pl' = i; D] \cdot impact_{team_i}(s, a, s', a') \qquad (5)$$

where $D$ indicates our dataset, $team_i$ denotes the team of player $i$, and $n[s, a, s', a', pl' = i; D]$

is the number of occurrences in which player $i$ performs action $a'$ at $s'$ after $s, a$. The Bellman equation (1) implies that $E_{s',a'}[Q_{team}(s', a') \mid s, a] = Q_{team}(s, a) - E[r' \mid s, a]$. The expectation can therefore be computed from estimated Q values given an expected rewards model. In our data, scoring a goal is represented as a separate action goal, after which no transition occurs. This means that for every transition $T\{s, a, r', s', a'\}$, we have $a \neq goal$, $r' = 0$ and thus $E[r' \mid s, a] = 0$. So in this representation, the impact equation (5) reduces to the difference in Q values before and after the player acts.
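As a minimal sketch of the reduced form of equation (5), the following Python code sums consecutive Q-value differences per player. The event layout (dictionaries with `player`, `team`, and a `q` mapping holding the home and away outputs) is a hypothetical representation chosen for this illustration; the Q values are assumed to be already estimated, and the first event of each sequence is skipped because it has no predecessor.

```python
from collections import defaultdict

def goal_impact_metric(episodes):
    """Sum impact values per player.

    episodes: list of time-ordered event lists. Each event is a dict with
      "player" -- identifier of the acting player
      "team"   -- "home" or "away", the acting player's team
      "q"      -- {"home": float, "away": float}, estimated Q values after the action
    The impact of an action is the change in the acting team's Q value
    between the previous event and the current one.
    """
    gim = defaultdict(float)
    for events in episodes:
        for prev, cur in zip(events, events[1:]):
            team = cur["team"]
            gim[cur["player"]] += cur["q"][team] - prev["q"][team]
    return dict(gim)

# Toy sequence (values invented for illustration only).
episode = [
    {"player": "A", "team": "home", "q": {"home": 0.25, "away": 0.25}},
    {"player": "B", "team": "home", "q": {"home": 0.5, "away": 0.25}},
    {"player": "C", "team": "away", "q": {"home": 0.25, "away": 0.5}},
    {"player": "B", "team": "home", "q": {"home": 0.75, "away": 0.25}},
]
gim = goal_impact_metric([episode])
```

In this toy sequence player B accumulates 0.25 + 0.5 = 0.75 and player C accumulates 0.25, each measured on the Q output of the acting player's own team.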

7.2 Understanding Impact Values with Mimic Decision Tree

The impact values are computed with the Q-function, which applies a black-box neural

network to ﬁt the state-action features. To understand why some actions have large impacts

under certain game contexts, we apply Mimic Learning [Ba and Caruana, 2014] and train a

transparent regression tree (CART) to mimic the behavior of the deep model.

This interpretability study consists of two main steps. 1) We feed states and actions of

the players as input into a CART to ﬁt the resulting impact values via supervised learning.

At each splitting node, CART automatically selects the feature that contributes the largest

variance reduction to impact values on the child nodes. We split until one of the child nodes

contains fewer than 80/90 samples for shot/pass respectively. 2) After tree learning, we com-

pute the importance of a feature by summing the variance reductions at the splits applying

this feature [Liu et al., 2018].
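The split criterion described above can be sketched in a few lines of Python. This is an illustrative single-split search, not the authors' CART implementation: the function names, the `min_leaf` parameter (standing in for the 80/90 sample stopping rule), and the toy data are assumptions. A full tree would apply `best_split` recursively and sum the returned variance reductions per feature to obtain feature importance.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(rows, targets, min_leaf=1):
    """Find the binary split with the largest variance reduction.

    rows    -- list of feature tuples (state-action features)
    targets -- list of impact values, one per row
    Returns (feature_index, threshold, variance_reduction), or None if no
    split leaves at least min_leaf samples in both children.
    """
    n = len(rows)
    parent = variance(targets)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            children = (len(left) * variance(left) + len(right) * variance(right)) / n
            gain = parent - children
            if best is None or gain > best[2]:
                best = (f, threshold, gain)
    return best

# Toy data: features are (outcome, x_distance); impact depends only on outcome.
rows = [(1, 10.0), (1, 30.0), (0, 10.0), (0, 30.0)]
impacts = [0.5, 0.5, 0.0, 0.0]
split = best_split(rows, impacts)
```

In practice a library implementation would likely be used instead, for example scikit-learn's `DecisionTreeRegressor` with its `min_samples_leaf` parameter and `feature_importances_` attribute; whether the authors used it is not stated in the text.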

We rank the state and action features by their importance values. Tables 5 and 6 show

the top 10 important features for shot and pass. Figure 6 and Figure 7 illustrate the structure of the CART trees by plotting their top three layers. The trees for both shot and pass impacts place action outcome (a binary feature marking success or failure of an action) at the root,

which intuitively is one of the most important action features. We also ﬁnd that the shot

impact signiﬁcantly increases as a player approaches the goal, which is consistent with our

ﬁnding in the spatial projection for Q values. For passing, its impact increases with game ve-

locity. An explanation is that a quick pass prevents potential interruptions from opponents.

When the game is close to the end, we observe that although the average passing impact

decreases, the variance of impact among different passes signiﬁcantly increases. Our CART

in Figure 7 accurately locates the time when this phenomenon starts to occur (Time Remain

(t-1)<39.45). Another important observation is that in addition to features from current time

t, the historical features (e.g. X Coordinate (t-1)) are also considered as important for pre-

dicting the impact of the current action.

Feature                  Influence
X distance (t)           0.6632
outcome (t)              0.2275
Y distance (t)           0.0469
Game Time Remain (t)     0.0242
duration (t)             0.0062
X Coordinate (t-1)       0.0059
Game Time Remain (t-1)   0.0035
interrupted (t)          0.0035
X velocity (t)           0.0030
outcome (t-1)            0.0019

Table 5: Feature influence for the impact of shot.

Feature                  Influence
X Velocity (t)           0.1355
Distance to Goal (t)     0.1264
Game Time Remain (t-1)   0.1082
Game Time Remain (t)     0.0816
Outcome (t)              0.0773
Outcome (t-1)            0.0760
Distance to Goal (t-1)   0.0411
Angle (t)                0.0373
Angle (t-1)              0.0298
X Velocity (t-1)         0.0174

Table 6: Feature influence for the impact of pass.

Fig. 6: Regression tree for the impact of shot.

Fig. 7: Regression tree for the impact of pass.

7.3 Q Value Above Average Replacement

We compare the goal impact metric to deriving a player metric from a Q-function using

an above-average-replacement framework. The fact that the same player performance rank-

ing can be derived using two fundamentally different approaches supports the conceptual

foundations of our metric.

The QAAR metric compares the expected total future reward given that player $i$ acts next to the expected total future reward given that a random replacement player acts next:

$$QAAR_i(D) \equiv \sum_{s,a} n[s, a, pl' = i; D] \left( E_{s',a'}[Q_{team}(s', a') \mid s, a, pl' = i] - E_{s',a'}[Q_{team}(s', a') \mid s, a] \right) \qquad (6)$$

where $n[s, a, pl' = i; D]$ is the number of times that player $i$ performs an action after $s, a$. The QAAR metric can be computed for a dataset by using the maximum likelihood

estimates of transition probabilities. QAAR and GIM are natural deﬁnitions for the value-

above-replacement and action-value approaches, respectively. Our main result is that they

are equivalent:

Proposition 1 For each player irecorded in our play-by-play dataset D, his Q-value-

above-replacement is equal to his goal impact metric: QAARi(D) = GIM i(D).

The complete proof is in our Appendix. This equation indicates that by summing a player’s

impact over an entire game season (GIM), we measure how much his general playing skill

exceeds that of an average player (a replacement player with average Q-value) in the same

league. Thus the same method for ranking players can be derived from a Q-function using

two fundamentally different approaches. In the next section, we show some ranking exam-

ples by applying GIM to rate players.
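The equivalence stated in Proposition 1 can be checked numerically on toy counts. The sketch below is not the authors' proof; it assumes a single team, a hypothetical count table `counts[(sa, next_sa, player)]`, and maximum likelihood estimates of the conditional expectations, as described for QAAR above. The equality follows because summing a player's transition counts over successors gives $n[s, a, pl' = i; D]$, so both metrics weight the same two terms.

```python
from collections import defaultdict

def gim_and_qaar(counts, q, player):
    """Compute (GIM, QAAR) for one player from transition counts.

    counts[(sa, next_sa, pl)] -- number of times player pl performs next_sa after sa
    q[next_sa]                -- Q value of the successor pair for the player's team
    """
    n_all, q_all = defaultdict(int), defaultdict(float)  # all players
    n_i, q_i = defaultdict(int), defaultdict(float)      # only `player` acts next
    for (sa, nxt, pl), n in counts.items():
        n_all[sa] += n
        q_all[sa] += n * q[nxt]
        if pl == player:
            n_i[sa] += n
            q_i[sa] += n * q[nxt]
    # Maximum likelihood estimate of E[Q(s', a') | s, a].
    expect = {sa: q_all[sa] / n_all[sa] for sa in n_all}
    gim = sum(n * (q[nxt] - expect[sa])
              for (sa, nxt, pl), n in counts.items() if pl == player)
    qaar = sum(n_i[sa] * (q_i[sa] / n_i[sa] - expect[sa]) for sa in n_i)
    return gim, qaar

# Toy counts and Q values (invented for illustration only).
counts = {
    ("s1a1", "x", "i"): 2, ("s1a1", "y", "i"): 1,
    ("s1a1", "x", "j"): 1, ("s1a1", "y", "j"): 4,
    ("s2a2", "x", "i"): 3, ("s2a2", "y", "j"): 1,
}
q = {"x": 0.6, "y": 0.2}
gim_i, qaar_i = gim_and_qaar(counts, q, "i")
```

For player `i` both quantities come out at about 0.65, up to floating-point rounding.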

8 Player Ranking: Case Study

To illustrate GIM, we discuss the ranking results for several players. We rank the EFL Cham-

pionship players by their GIMs over the entire 2017-2018 game season. Our case study only

ranks players in one league because they face the same level of competition and therefore

their contributions are comparable. We chose the EFL Championship, which is just below

the Premier League in the league hierarchy, because it has a large number of players in our

data set and it has been much less studied than the Premier League.

Fine-Tuning. Different leagues have their own characteristics including competition level,

season length, and playoff agenda. Therefore we apply a ﬁne-tuning technique in order to

achieve a better adaptation to the EFL Championship games.

1. Train a general model to evaluate actions in European soccer using games from multiple

European Soccer leagues.

2. Fine-tune the initial weight values from the general model, with a smaller learning rate

and using only EFL Championship game data.

Fine-tuning reﬁnes the general model and improves its ability to capture the behaviour

of players. Compared to training the model from scratch, ﬁne-tuning signiﬁcantly reduces

training time and prevents over-ﬁtting. In the following assessment, we describe GIM values

computed with the ﬁne-tuned model and present both a general ranking for all actions and

action-speciﬁc rankings.

8.1 All-Actions Assessment

Table 7 lists the 10 players with highest GIM for all actions. Our ranking includes the players

with the most goals and assists. We investigate the positive correlation between our metric

and standard success measures further in the next section.

Matej Vydra tops our 2017-2018 season ranking. He topped the scoring chart of the EFL Championship and won the 2017-18 Golden Boot award (see footnote 2). In the next season (2018-2019), the Premier League team Burnley recognized the talent of Vydra and signed him on a three-year deal from Derby.

Another example is Tom Cairney, who has only 5 goals and 5 assists over the entire season but ranks 6th in the GIM assessment. Although he does not lead in any standard success statistic (Goals, Assists), his impact was an indispensable factor in his team's success in winning the 2017-18 EFL playoffs. For example, he scored the only goal of the final, in which Fulham beat Aston Villa 1-0 at Wembley Stadium and earned promotion to the Premier League. Tom Cairney was nominated for the EFL Championship Player of the Season award (see footnote 3).

name                          team                 GIM     Goals  Assists
Matej Vydra                   Derby                18.017  21     4
Leon Clarke                   Sheffield United     17.785  19     5
Lewis Grabban                 Sunderland           16.045  12     0
Bobby De Cordova-Reid         Bristol              15.976  19     7
Diogo José Teixeira da Silva  Wolverhampton        15.707  17     5
Tom Cairney                   Fulham               15.24   5      5
Ivan Cavaleiro                Wolverhampton        14.979  9      12
Stefan Johansen               Fulham               13.565  8      8
James Maddison                Norwich              13.23   14     8
Gary Hooper                   Sheffield Wednesday  11.953  10     3

Table 7: 2017-2018 season top-10 Player Impact Scores for players in EFL Championship game season.

2 https://www.skysports.com/football/news/11688/11361634/

3 https://www.bbc.com/sport/football/43641225

name                    GIM    Goal
Matej Vydra             4.747  21
Leon Clarke             4.024  19
Lewis Grabban           3.775  12
Kouassi Ryan Sessegnon  3.657  15
Harry Wilson            3.135  7
Famara Diedhiou         3.015  13
Sean Maguire            2.5    10
Joe Garner              2.44   10
Jarrod Bowen            2.408  14
Callum Paterson         2.29   10

Table 8: Top-10 players with largest shot impact in 2017-2018 EFL Championship game season.

name                          GIM    Assist
Leon Clarke                   8.05   5
Matej Vydra                   5.957  4
Bobby De Cordova-Reid         5.134  7
Chris Wood                    4.732  1
Gary Hooper                   4.694  3
Ivan Cavaleiro                4.533  12
Diogo José Teixeira da Silva  4.283  5
Gary Madine                   4.202  2
Tom Cairney                   4.123  5
Conor Hourihane               4.042  2

Table 9: Top-10 players with largest pass impact in 2017-2018 EFL Championship game season.

8.2 Action-Speciﬁc Assessment

An action-specific ranking evaluates only the impacts of the actions of interest. We compute two

GIM rankings of EFL Championship players by shots and passes respectively. These are

frequent actions in soccer with high impact. Table 8 and Table 9 list the top 10 players.

GIM computed from shots only can be seen as an alternative to the popular expected goals

(XG) metric. A shot with high impact will signiﬁcantly increase the probability of scoring

and thus top players in Table 8 also lead the goal scoring. For instance, Matej Vydra is the

player with the highest scoring impact and he also dominated goal scoring during the 2017-

18 game season. However, the relation between pass impact and the number of assists is

more complex. There is some association, because assists are often high-valued passes. On

the other hand, the number of assists is an incomplete measure of passing ability because

it neglects midfield and defensive zone passes. Our ranking, in contrast, provides a comprehensive evaluation of all the passes of a player. For example, Conor Hourihane plays as a midfielder and managed only 2 assists over the entire season, but he made many influential passes and is ranked as a top-10 passer by our metric.

9 Player Ranking: Empirical Evaluation

We describe our comparison methods and evaluation methodology. Similar to clustering and

recommendation problems, there is no ground truth for player ranking. To assess a player

evaluation metric, we follow previous work [Routley and Schulte, 2015; Liu and Schulte,

2018] and compute its correlation with statistics that directly measure success.

9.1 Comparison Player Evaluation Metrics

We compare GIM with baseline player evaluation metrics to show the advantage of 1) modeling game context, 2) incorporating continuous context signals and history, and 3) separately handling home and away state-action signals.

Our baseline player evaluation metrics are as follows. Goal-based Metrics. i) Plus-

Minus (PM) is a commonly studied metric that measures how much the presence of a player

inﬂuences the goals of his team [Macdonald, 2011]. ii) Expected Goal (XG) weights each

shot by its chance of leading to a goal. Players are ranked by the total expected goals of their shots.

Both PM and XG consider only very limited game context and action types. The next three

baselines assign an impact value to all actions and evaluate players according to their total

action impact.

All-Action Metrics. iii) Valuing Actions by Estimating Probabilities (VAEP) [Decroos

et al., 2019] applies the difference of action values to compute the impact of on-the-ball

actions. Instead of applying Temporal Difference learning to estimate Q values, VAEP uses a classifier (see footnote 4) to estimate the probability that an action leads to a goal within the next $k$ (window size) steps. iv) Scoring Impact (SI) is based on a Markov model with pre-discretized spatial

and temporal features (e.g. x,y coordinate and game time) [Schulte et al., 2017a]. Dynamic

programming is applied to estimate a Q-function and impact values for the discrete state-

action space. v) DP-LSTM is a neural network architecture that was previously applied to

estimate action values for ice hockey. It applies a recurrent model to capture game context

and TD learning to train the model [Liu and Schulte, 2018]. The difference with our TTDP-

LSTM is that it merges the home/away towers and ﬁts all the states and actions with a

single-layer network. We refer to the resulting impact score as M-GIM (for “merge”).

A league-speciﬁc study evaluates our Fine-Tuning GIM (FT-GIM) for players in the

EFL Championship. Training a separate model with only EFL Championship data from

scratch consumes more computational resources than ﬁne-tuning the general model. Our

experiment records 4,386,894 gradient steps to learn a reliable model from initial weights

while ﬁne-tuning requires only 818,120 gradient steps.

Significance Test. To assess whether GIM is significantly different from the other player evaluation metrics, we perform paired t-tests over all players. The respective p-values are 9.33E-2, 5.27E-281, 8.03E-218, 4.82E-14 and 1.02E-118 for PlusMinus, XG, SI, VAEP and M-GIM. The null hypothesis is therefore rejected at any conventional significance level for XG, SI, VAEP and M-GIM, and at the 0.1 level for PlusMinus. This shows that GIM values are different from the values of other metrics.
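For readers unfamiliar with the test, the paired t statistic can be computed as in the following Python sketch. The function name and the toy values are assumptions for illustration; the sketch returns only the statistic and the degrees of freedom, since converting them to a p-value requires the Student t distribution (available, for example, through `scipy.stats.ttest_rel`).

```python
import math

def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for the paired differences x[k] - y[k].

    x, y -- metric values for the same players in the same order,
            e.g. GIM and a baseline metric.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1

# Toy values for four players (invented for illustration only).
gim_values = [3.0, 1.0, 4.0, 2.0]
baseline_values = [1.0, 1.0, 1.0, 1.0]
t_stat, dof = paired_t_statistic(gim_values, baseline_values)
```

With several thousand players the degrees of freedom are large, so even modest systematic differences between two metrics yield very small p-values.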

9.2 Season Totals: Correlations with Standard Success Measures

We report the correlations between player ranking metrics and commonly used success mea-

sures over the entire 2017-18 game season and highlight the comprehensiveness of our GIM

metric. The examined success measures include Goals, Assists, Shots per Game (SpG), Pass

Success percentage (PS%) and Key Passes per game (KeyP). We also study two penalty mea-

sures: Yellow card received (Yel) and Red card received (Red). Table 10 shows the correla-

tions between the comparison methods and the success/penalty measures, for the players in

all 10 leagues. In addition to the general study, Table 11 shows the result of a league-speciﬁc

evaluation where we compare only the correlations for players in the EFL Championship.

Our GIM achieves very good correlations compared to the other methods. Among the

positive success measures, GIM has the highest correlation with 4 out of 5 success measures

(Goals, Assists, SpG, and KeyP) and a competitive result for the other (PS%). Together, the

Q-function based metrics GIM, M-GIM, and SI show the highest correlations with success

measures. XG is only the fourth best metric, because it considers only the expected value

of shots and does not correct for the team effort leading up to the shot. VAEP achieves only

limited correlation with the success measures. This is because their model assigns similar

expected values to all actions, which translates into all action impact values being close to

0. The traditional Plus-Minus metric correlates poorly with almost all success measures. We

conclude that RL techniques that provide ﬁne-grained expected action value estimates lead

to performance metrics that better match traditional success statistics.

4 The classifier is implemented with a neural network rather than CatBoost as in [Decroos et al., 2019] due to the size of the dataset. We discuss our VAEP implementation further in the limitations (section 10.3).

Comparing the different RL approaches, the neural network model allows GIM to handle

continuous inputs without pre-discretization. This prevents the loss of game context infor-

mation and explains why both GIM and M-GIM perform better than SI in most success

measures. The higher correlation of GIM compared to M-GIM also demonstrates the value

of separately modeling home/away data. For Yel and Red which reﬂect the number of re-

ceived penalties—negative contributions by a player—only our GIM-based metrics (GIM,

M-GIM) show a negative correlation with both of them. The model correctly recognizes

that a penalty will signiﬁcantly reduce the scoring probability, inﬂuencing the overall player

GIM. In contrast, other metrics focus on the actions that are likely to lead to goals, which

tends to reward aggressive players who incur more penalties.

Methods  Goals  Assists  SpG    PS%     KeyP   Yel     Red
PM       0.284  0.318    0.199  0.288   0.218  0.001   -0.069
VAEP     0.093  0.290    0.121  -0.111  0.116  0.024   0.133
XG       0.422  0.173    0.328  0.164   0.278  0.534   0.034
SI       0.585  0.153    0.438  -0.140  0.052  0.114   -0.089
M-GIM    0.648  0.367    0.573  0.153   0.417  -0.110  -0.145
GIM      0.844  0.498    0.596  0.16    0.562  -0.181  -0.137

Table 10: Correlation with standard success measures for all the players. We bold the highest correlations and

underline the lowest ones for penalties.

Methods  Goals  Assists  SpG    PS%     KeyP   Yel     Red
PM       0.262  0.223    0.122  0.155   0.112  0.033   -0.046
VAEP     0.08   0.26     0.116  -0.126  0.137  -0.015  0.215
XG       0.420  0.165    0.394  0.149   0.254  0.578   -0.021
SI       0.574  0.124    0.408  -0.144  0.054  0.084   -0.147
M-GIM    0.629  0.309    0.551  0.171   0.388  -0.039  -0.132
GIM      0.638  0.382    0.553  -0.053  0.468  -0.026  -0.105
FT-GIM   0.736  0.585    0.569  0.082   0.592  -0.110  -0.171

Table 11: Correlation with standard success measures for players in the EFL Championship. We bold the

highest correlations and underline the lowest ones for penalties.

The league-speciﬁc study demonstrates the beneﬁt of ﬁne-tuning the deep reinforcement

learning models. Compared to the correlations for players in all 10 leagues, the correlations for EFL Championship players generally decrease. Both traditional action-count metrics (PM, XG) and impact-based metrics (VAEP, SI, GIM, M-GIM) show the decrease, but it is more severe for our GIM metric, whose correlations drop by nearly 20% when the players in the EFL Championship are evaluated by the general model. Fine-tuning addresses this issue:

the FT-GIM metric achieves a larger negative correlation with penalty counts (Yel and Red).

9.3 Round-by-Round Correlations: Predicting Future Performance From Past Performance

This section assesses the player performance metrics through round-by-round correlations.

A sports season can be divided into rounds. In round n, a team or player has ﬁnished n

games in a season. For a given performance metric, we measure the correlation between

(i) its value computed over the ﬁrst nrounds, and (ii) the value of the two main success

measures, assists, and goals, computed over the entire season. This allows us to assess how

quickly different metrics acquire predictive power for the ﬁnal season total, so that future

performance can be predicted from past performance. A good performance metric should be

consistent with a player’s overall performance in the early season, which provides the player

and his team with evidence for trading or training.
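The round-by-round procedure can be sketched as follows. This illustration assumes Pearson correlation and a hypothetical layout in which each player has a list of per-round metric values and a season total for the success measure; the text does not specify the correlation coefficient used, so that choice is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def round_by_round(metric_per_round, season_total):
    """Correlate the metric accumulated over the first n rounds with season totals.

    metric_per_round[p] -- list of per-round metric values for player p
    season_total[p]     -- season-long success measure for player p (e.g. goals)
    Returns one correlation per round n = 1, 2, ...
    """
    players = sorted(metric_per_round)
    n_rounds = min(len(metric_per_round[p]) for p in players)
    totals = [season_total[p] for p in players]
    running = {p: 0.0 for p in players}
    correlations = []
    for r in range(n_rounds):
        for p in players:
            running[p] += metric_per_round[p][r]
        correlations.append(pearson([running[p] for p in players], totals))
    return correlations

# Toy data for three players over three rounds (invented for illustration only).
per_round = {"A": [0.1, 0.5, 0.4], "B": [0.3, 0.1, 0.1], "C": [0.2, 0.0, 0.0]}
goals = {"A": 10, "B": 5, "C": 1}
curve = round_by_round(per_round, goals)
```

In the toy data the correlation is negative after the first round and close to 1 after the third, mirroring how a metric gains predictive power as more games are observed.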

Figure 8 shows the round-by-round correlations for the players in all 10 leagues (see footnote 5). The predictive power of GIM grows more quickly than that of any baseline: its correlation with both assists (left) and goals (right) dominates the others before the midpoint of the season.

M-GIM achieves the second highest correlations, for assists even higher than GIM in the

ﬁrst 5 rounds. However, its predictive power substantially drops after the ﬁrst 10 rounds.

The remaining two metrics XG and SI show only weak correlations with assists and goals.

Fig. 8: Correlations between round-by-round metrics and season totals for all players.

The question for our next experiment is: does ﬁne-tuning help predict a player’s ﬁnal

total performance from the past performance? This experiment focuses on players in the

EFL Championship. Figure 9 shows round-by-round correlations of the performance met-

rics with EFL Championship players’ total assists and goals. We make the following obser-

vations. 1) Compared to the all-player setting of Figure 8, the metrics’ correlations decline

when restricted to EFL Championship players. This decline is more apparent for our GIM

metric. The reason is that the neural network trained on the general player population does

not ﬁt the behaviour of players in the EFL Championship as well. 2) Fine-tuning signiﬁ-

cantly improves the correlations of GIM, especially for its correlation with assists, where

the correlation of FT-GIM exceeds that of other metrics after the ﬁrst 10 rounds.

10 Discussion

In this section, we discuss topics related to the sparsity of goals, model convergence and

limitations of our method.

10.1 The Sparsity of Goals

A common method to evaluate soccer players’ contribution is computing their inﬂuence on

goal scoring. However, goals are rare in a soccer game. This issue is similar to the sparse

5 In Figures 8 and 9, we omit players from teams that play fewer than 40 games in the 2017-18 season.

Fig. 9: Correlations between round-by-round metrics and season totals for the players in the EFL Championship.

reward problem in Reinforcement Learning (RL). To address goal sparsity, many previ-

ous works on sport analytics suggested including other measures like assists, passes, and

penalties in player evaluation. This is similar to reward shaping in RL, which adds some

handcrafted indirect reward signals to accelerate training convergence [Ng et al., 1999]. Reward shaping includes more information but raises the difficult issue of how to weight the importance of the indirect rewards (e.g., passes) relative to the real target reward (scoring).

The Temporal Difference solution learns a Q-function that propagates the reward (scoring)

signals to previous events, and assigns a value to all actions on the same expected rewards

scale.

10.2 Model Convergence

We discuss the convergence of our TTDP-LSTM model. TTDP-LSTM is trained by the on-

policy Temporal Difference (TD) method Sarsa. Previous work has guaranteed the conver-

gence of on-policy TD with linear function approximators [Tsitsiklis and Van Roy, 1997].

However, in this paper, we apply a non-linear neural network function approximator. It is

well-known that on-policy TD with a non-linear function approximator often exhibits un-

stable convergence in the traditional RL setting, when the action-value Q function is deﬁned

as the expected cumulative rewards with unlimited look ahead:

$$Q(s_t, a_t) = E\left[\sum_{i=t}^{\infty} \gamma^{i-t} \cdot r(s_i, a_i)\right].$$

Here $\gamma \in (0,1)$ is the discount factor and $r$ is the reward function. To alleviate the instability of TD methods, in this work, we constrain the look-ahead to the next goal (rather than the end of game) and remove the discount factor, so $Q(s_t, a_t) = E[r(s_T, a_T)]$, which is the expected scoring probability of the next goal. This is valid because, as discussed in Section 7.1, the reward $r(s_t, a_t) = 0$ except at goal occurrences $T$.
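The effect of the undiscounted, next-goal formulation can be illustrated with a tabular Sarsa sketch. This is a deliberately simplified stand-in for the neural TTDP-LSTM model: the table-based Q function, the function name, the learning rate, and the toy episodes are all assumptions for illustration. Each episode ends with a reward of 1 if the team scores the next goal and 0 otherwise, and all intermediate rewards are zero.

```python
def sarsa_next_goal(episodes, lr=0.05, epochs=2000):
    """Undiscounted on-policy TD(0) (Sarsa) on a fixed set of observed episodes.

    episodes: list of (steps, reward), where steps is a list of (state, action)
    keys and reward is 1 if the episode ends with a goal by the team, else 0.
    The target for a step is the Q value of the observed next step, or the
    terminal reward at the last step of the episode.
    """
    q = {}
    for _ in range(epochs):
        for steps, reward in episodes:
            for k, key in enumerate(steps):
                last = k == len(steps) - 1
                target = reward if last else q.get(steps[k + 1], 0.0)
                old = q.get(key, 0.0)
                q[key] = old + lr * (target - old)
    return q

# Two toy episodes sharing the same first action (invented for illustration only).
episodes = [
    ([("mid", "pass"), ("box", "shot")], 1),   # ends with a goal
    ([("mid", "pass"), ("box", "cross")], 0),  # ends without a goal
]
q_table = sarsa_next_goal(episodes)
```

Because the shared first action is followed by a goal in one of the two episodes, its learned value settles near 0.5, which matches the interpretation of Q values as next-goal scoring probabilities on a bounded scale.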

10.3 Limitations

We show some limitations in this work and discuss some potential solutions.

Partial observability for the players on pitch. At each time step, our dataset records only

positions and actions of the player controlling the ball. The locations of the off-ball players

are not known. The information about other players, however, influences scoring probabilities, especially for a complex team sport like soccer. To alleviate this issue, our TTDP-

LSTM model applies a recurrent model to ﬁt the play history and includes the information of

previous on-the-ball players. It has been previously observed in reinforcement learning that

incorporating action history compensates for partial observability to some extent, because

the model can infer missing current information from past information [McCallum, 1996; Hausknecht and Stone, 2015a]. For example, current player locations can be predicted to

some extent from past player locations. Nonetheless, the model performance is limited due

to partial observability. A direction for future work is to build a multi-agent reinforcement

learning framework that combines fully observable tracking data with event categories. A

possible approach is to combine the deep RL tracking model of Dick and Brefeld [2019]

with our event-based deep RL model.
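As a rough illustration of how play history can be presented to a recurrent model, the sketch below pairs each event with a window of the most recent events in the same play. The helper name `build_traces` and the window length `max_len` are assumptions introduced here, not details taken from this section.

```python
def build_traces(events, max_len=3):
    """For each event, return the window of up to max_len most recent events.

    The window always ends at the current event, so a recurrent model that
    reads it sees the previous on-the-ball events before the current one.
    """
    traces = []
    for t in range(len(events)):
        start = max(0, t - max_len + 1)
        traces.append(events[start : t + 1])
    return traces

# Hypothetical play of four on-the-ball events.
play = ["pass", "dribble", "cross", "shot"]
traces = build_traces(play, max_len=2)
```

Early events in a play get shorter windows, which a recurrent model can handle through padding or variable-length sequences.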

The problem of big input data. Our dataset has over 4M events including spatial and tempo-

ral features of players. Fitting the entire data requires substantial computational resources.

The scalability challenges increase when we include the play history. It is therefore difficult to use standard machine learning packages (for models such as decision trees, random forests, or gradient boosting), which typically assume that the entire dataset fits into working memory as a single batch. In this work, we build a neural network with mini-batch gradients. In future work,

we will explore on-line learning methods and evaluate their performance on big sports data.

In addition to improving scalability, on-line methods are well-suited to sports data as teams

want to update player assessments after every round.
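The memory argument for mini-batch gradients can be sketched with a deliberately tiny model. The example below fits a single parameter `p` (a constant scoring probability) by squared-error gradient steps, touching only one slice of the data per update. The function name, the one-parameter model and the hyperparameters are hypothetical and much simpler than the network used in this work.

```python
def minibatch_sgd_constant(labels, batch_size=4, lr=0.1, epochs=50):
    """Fit a constant p to 0/1 labels with mini-batch gradient descent."""
    p = 0.0
    for _ in range(epochs):
        for start in range(0, len(labels), batch_size):
            # Only this slice of the data is needed for the update.
            batch = labels[start : start + batch_size]
            # Gradient of the mean squared error (p - y)^2 over the batch.
            grad = sum(2.0 * (p - y) for y in batch) / len(batch)
            p -= lr * grad
    return p

# Hypothetical labels: one goal in every four events.
labels = [1, 0, 0, 0, 1, 0, 0, 0]
p_hat = minibatch_sgd_constant(labels)
```

An on-line variant would apply the same update as new matches arrive instead of looping over a stored dataset.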

11 Conclusion

This paper investigated Deep Reinforcement Learning (DRL) to learn complex spatio-temporal

dynamics for professional soccer analytics. We designed a neural network architecture that,

to the best of our knowledge, is the most complex deployed in sports analytics to date: a stacked

two-tower LSTM architecture, with one tower each for home and away teams. The network

was trained with on-ball action logs from several European leagues, comprising a total of

over 4.5M action events. The trained neural network provides a rich source of knowledge

about how a team’s chance of scoring the next goal depends on the match context.

Based on the learned action values, we developed a new context-aware performance

metric GIM for soccer players, taking all their actions into account. In our experiments,

GIM computed over the entire season showed the highest correlation with most standard

success measures. Generalizing from a sample of season matches, GIM was the best pre-

dictor of season total goals and assists. To improve the evaluation results for players in a

speciﬁc league, we applied a ﬁne-tuning approach to achieve an effective balance between

generalizing across leagues and specializing to a speciﬁc league. Directions for future work

include incorporating tracking data and developing on-line deep RL methods.

Deep RL methods have enjoyed spectacular success in board games. Our results show

that the analysis of physical team sports is another highly promising application area.


Acknowledgements

This work was supported by a Strategic Project Grant from the Natural Sciences and Engineering Research Council of Canada, and a GPU donation from NVIDIA Corporation. We are

indebted for helpful discussion and comments to Norm Ferns and Bahar Pourbabee from

Sportlogiq.

References

Albert J, Glickman ME, Swartz TB, Koning RH (2017) Handbook of Statistical Methods

and Analyses in Sports. CRC Press

Ali A (2011) Measuring soccer skill performance: a review. Scandinavian journal of

medicine & science in sports 21(2):170–183

Ba J, Caruana R (2014) Do deep nets really need to be deep? In: Advances in neural infor-

mation processing systems, pp 2654–2662

Bornn L, Cervone D, Fernandez J (2018) Soccer analytics: Unravelling the complexity of

“the beautiful game”. Signiﬁcance 15(3):26–29

Bransen L, Van Haaren J (2018) Measuring football players’ on-the-ball contributions from

passes during games. In: MLSA-ECML Workshop, Springer, pp 3–15

Brooks J, Kerr M, Guttag J (2016) Developing a data-driven player ranking in soccer us-

ing predictive model weights. In: Proceedings of the 22nd ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, ACM, pp 49–55

Cervone D, D’Amour A, Bornn L, Goldsberry K (2014) Pointwise: Predicting points and

valuing decisions in real time with NBA optical tracking data. In: 8th Annual MIT Sloan

Sports Analytics Conference, February, vol 28

Cervone D, D’Amour A, Bornn L, Goldsberry K (2016) A multiresolution stochastic process

model for predicting basketball possession outcomes. Journal of the American Statistical

Association 111(514):585–599

Decroos T, Bransen L, Haaren JV, Davis J (2019) Actions speak louder than goals: Valuing

player actions in soccer. In: Proceedings of the 25th ACM SIGKDD International Con-

ference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA,

August 4-8, 2019., pp 1851–1861

Dick U, Brefeld U (2019) Learning to rate player positioning in soccer. Big data 7(1):71–82

Fernández J, Barcelona F, Bornn L, Cervone D (2019) Decomposing the immeasurable

sport: A deep learning expected possession value framework for soccer. In: MIT Sloan

Sports Analytics Conference

Gudmundsson J, Horton M (2017) Spatio-Temporal Analysis of Team Sports. ACM Comput

Surv 50(2):22:1–22:34, DOI 10.1145/3054132

Hausknecht M, Stone P (2015a) Deep recurrent Q-learning for partially observable MDPs.

CoRR abs/1507.06527

Hausknecht MJ, Stone P (2015b) Deep recurrent q-learning for partially observable mdps.

In: 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015, pp

29–37

Kharrat T, McHale IG, Peña JL (2019) Plus–minus player ratings for soccer. European Journal of Operational Research

Liu G, Schulte O (2018) Deep reinforcement learning in ice hockey for context-aware player evaluation. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, International Joint Conferences on Artificial Intelligence Organization, pp 3442–3448

Liu G, Zhu W, Schulte O (2018) Interpreting deep sports analytics: Valuing actions and

players in the NHL. In: International Workshop on Machine Learning and Data Mining

for Sports Analytics, Springer, pp 69–81

Macdonald B (2011) A regression-based adjusted plus-minus statistic for NHL players.

Journal of Quantitative Analysis in Sports 7(3):29

McCallum A (1996) Learning to use selective attention and short-term memory in sequential

tasks. In: From animals to animats 4: proceedings of the fourth international conference

on simulation of adaptive behavior, MIT Press, vol 4, p 315

McHale IG, Scarf PA, Folker DE (2012) On the development of a soccer player performance

rating system for the english premier league. Interfaces 42(4):339–351

Mnih V, Kavukcuoglu K, Silver D, et al. (2015) Human-level control through deep rein-

forcement learning. Nature 518(7540):529–533

Ng AY, Harada D, Russell S (1999) Policy invariance under reward transformations: Theory

and application to reward shaping. In: ICML, vol 99, pp 278–287

Puterman ML, Patrick J (2017) Dynamic programming. In: Encyclopedia of Machine Learn-

ing and Data Mining, pp 377–388

Routley K (2015) A markov game model for valuing player actions in ice hockey. Master’s

thesis, Simon Fraser University

Routley K, Schulte O (2015) A markov game model for valuing player actions in ice hockey.

In: Uncertainty in Artiﬁcial Intelligence (UAI), pp 782–791

Schulte O, Khademi M, Gholami S, Zhao Z, Javan M, Desaulniers P (2017a) A markov

game model for valuing actions, locations, and team performance in ice hockey. Data

Mining and Knowledge Discovery pp 1–23

Schulte O, Zhao Z, Javan M, Desaulniers P (2017b) Apples-to-apples: Clustering and rank-

ing nhl players using location information and scoring impact. In: Proceedings MIT Sloan

Sports Analytics Conference

Schultze SR, Wellbrock CM (2018) A weighted plus/minus metric for individual soccer

player performance. Journal of Sports Analytics 4(2):121–131

Schumaker RP, Solieman OK, Chen H (2010) Research in sports statistics. In: Sports Data

Mining, Integrated Series in Information Systems, vol 26, Springer US, pp 29–44

Song Y, Xu M, Zhang S, Huo L (2017) Generalization tower network: A novel deep neural

network architecture for multi-task learning. arXiv preprint arXiv:1710.10036

Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press

Swartz TB, Arce A (2014) New insights involving the home team advantage. International

Journal of Sports Science & Coaching 9(4):681–692

Tsitsiklis JN, Van Roy B (1997) Analysis of temporal-difference learning with function

approximation. In: Advances in neural information processing systems, pp 1075–1081

Van Haaren J, Van den Broeck G, Meert W, Davis J (2016) Lifted generative learning of

markov logic networks. Machine Learning 103(1):27–55

Van Roy M, Robberechts P, Decroos T, Davis J (2017) Valuing on-the-ball actions in soccer:

A critical comparison of xt and vaep. In: Workshop on Team Sports AAAI 2020

A Proof of Proposition 1

The data record transitions from a state-action-player triple to another, possibly resulting in a non-zero reward

(score or point in the context of sports). We denote the number of times such a transition occurs as


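$$n_D[s, a, pl, s', a', pl']$$

where the prime indicates the successor triple. We freely use this notation for marginal counts as well, for instance

$$n_D[s', a', pl'] = \sum_{s, a, pl} n_D[s, a, pl, s', a', pl'].$$

From the paper, we have the following equations for the Q-value-above-replacement and the GIM metrics:

$$QAAR_i(D) = \sum_{s,a} n_D[s, a, pl'=i] \Big( \mathbb{E}_{s',a'}\big[Q_{team}(s',a') \mid s, a, pl'=i\big] - \mathbb{E}_{s',a'}\big[Q_{team}(s',a') \mid s, a\big] \Big) \tag{7}$$

$$GIM_i(D) = \sum_{s,a,s',a'} n_D[s, a, s', a', pl'=i] \cdot \Big( Q_{team}(s',a') - \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a\big] \Big) \tag{8}$$

Now we have

\begin{align}
GIM_i(D) &\overset{(8)}{=} \sum_{s,a} \sum_{s',a'} n_D[s, a, s', a', pl'=i] \Big( Q_{team}(s',a') - \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a\big] \Big) \notag \\
&= \sum_{s,a} n_D[s, a, pl'=i] \sum_{s',a'} \frac{n_D[s, a, s', a', pl'=i]}{n_D[s, a, pl'=i]} \, Q_{team}(s',a') \notag \\
&\quad - \sum_{s,a} n_D[s, a, pl'=i] \, \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a\big] \tag{9} \\
&= \sum_{s,a} n_D[s, a, pl'=i] \, \mathbb{E}\big[Q_{team}(s',a') \mid s, a, pl'=i\big] \tag{10} \\
&\quad - \sum_{s,a} n_D[s, a, pl'=i] \, \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a\big] \notag \\
&= \sum_{s,a} n_D[s, a, pl'=i] \Big( \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a, pl'=i\big] - \mathbb{E}_{s'_E,a'_E}\big[Q_{team}(s'_E,a'_E) \mid s, a\big] \Big) \notag \\
&\overset{(7)}{=} QAAR_i(D) \tag{11}
\end{align}

Step (9) holds because the expectation $\mathbb{E}[Q_{team}(s'_E,a'_E) \mid s, a]$ depends only on $s, a$, not on $s', a'$, so summing the counts over $s', a'$ leaves the marginal count $n_D[s, a, pl'=i]$. Line (10) uses the empirical estimate of the expected Q-value $Q_{team}(s', a')$ given that player $i$ acts next, computed from the maximum likelihood estimates of the transition probabilities:

$$\hat{\sigma}(s', a' \mid s, a, pl'=i) = \frac{n_D[s, a, s', a', pl'=i]}{n_D[s, a, pl'=i]}$$

The final conclusion (11) applies Equation (7).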
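The identity can also be checked numerically on a small example. The sketch below computes both sides from transition counts, assuming that the replacement expectation conditioned on (s, a) is estimated empirically from the transitions of all players. The function name `gim_and_qaar`, the toy transitions and the Q-values are hypothetical.

```python
from collections import Counter, defaultdict

def gim_and_qaar(transitions, q_team, player):
    """Compute GIM and QAAR for one player from transition counts.

    transitions: list of (s, a, s_next, a_next, pl_next) tuples.
    q_team: dict mapping (s_next, a_next) to a Q-value.
    """
    n_all = Counter()  # counts n_D[s, a, s', a'] over all players
    n_pl = Counter()   # counts n_D[s, a, s', a', pl' = player]
    for s, a, s2, a2, pl2 in transitions:
        n_all[(s, a, s2, a2)] += 1
        if pl2 == player:
            n_pl[(s, a, s2, a2)] += 1

    def cond_expectation(counts):
        """Empirical E[Q_team(s', a') | s, a] and marginal counts n[s, a]."""
        num, den = defaultdict(float), defaultdict(float)
        for (s, a, s2, a2), n in counts.items():
            num[(s, a)] += n * q_team[(s2, a2)]
            den[(s, a)] += n
        return {k: num[k] / den[k] for k in den}, den

    e_all, _ = cond_expectation(n_all)
    e_pl, n_sa_pl = cond_expectation(n_pl)

    # GIM: sum of per-transition impacts, as in Equation (8).
    gim = sum(n * (q_team[(s2, a2)] - e_all[(s, a)])
              for (s, a, s2, a2), n in n_pl.items())
    # QAAR: count-weighted difference of expectations, as in Equation (7).
    qaar = sum(n_sa_pl[k] * (e_pl[k] - e_all[k]) for k in n_sa_pl)
    return gim, qaar

# Toy data (hypothetical): two players, A and B.
transitions = [
    ("s0", "pass", "s1", "shot", "A"),
    ("s0", "pass", "s1", "shot", "B"),
    ("s0", "pass", "s2", "pass", "A"),
    ("s0", "pass", "s2", "pass", "B"),
    ("s0", "pass", "s2", "pass", "B"),
    ("s3", "dribble", "s1", "shot", "A"),
]
q_team = {("s1", "shot"): 0.6, ("s2", "pass"): 0.2}
gim_a, qaar_a = gim_and_qaar(transitions, q_team, "A")
```

For player A in this toy data, the two quantities agree up to floating-point error (both are about 0.08), as the proposition predicts.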


Content available from Data Mining and Knowledge Discovery
