Exploiting sports-betting market using machine learning
Ondřej Hubáček, Gustav Šourek, Filip Železný
Czech Technical University in Prague
Abstract
We introduce a forecasting system designed to profit from the sports-betting market using machine
learning. We contribute three main novel ingredients. First, previous attempts to learn models
for match-outcome prediction maximized the model's predictive accuracy as the single criterion.
Unlike these approaches, we also reduce the model's correlation with the bookmaker's predictions
available through the published odds. We show that such an optimized model allows for better
profit generation, and the approach is thus a way to 'exploit' the bookmaker. The second novelty is
the application of convolutional neural networks to match outcome prediction. The convolution
layer makes it possible to leverage a vast number of player-related statistics at its input. Thirdly, we adopt
elements of modern portfolio theory to design a strategy for bet distribution according to the
odds and model predictions, trading off profit expectation and variance optimally. These three
ingredients combine into a betting method that systematically yields positive cumulative profits in experiments
with NBA data from seasons 2007–2014, as opposed to the alternative methods tested.
Keywords: Decision making, Evaluating forecasts, Neural networks, Sports forecasting,
Probability forecasting
1. Introduction
Sports betting means placing a wager on a subset of outcomes of random sports events, each
of which is associated with a corresponding profit predefined by a bookmaker. If the outcome is
guessed correctly, the bettor wins back the wager plus the profit, otherwise (s)he loses the wager
to the bookmaker. Historically, bookmakers operated betting shops, but with the expansion of the
Internet, most bookmakers these days operate online through betting sites. A variety of betting
Corresponding author.
Email address: hubacon2@fel.cvut.cz (Ondřej Hubáček)
Preprint submitted to International Journal of Forecasting January 8, 2019
opportunities are offered. In our work, we focus on moneyline bets. To win a moneyline bet,
the bettor needs to predict the winner of the game. For each of the two possible outcomes of a
two-team match, the bookmaker sets the corresponding odds; the wager multiplied by the odds
gives the potential payoff. So if the bookmaker sets the odds to 1.8 for the home team to win, a
bettor places a wager of 100 Eur on that outcome, and the home team actually wins, the bettor's
profit will be 1.8 × 100 − 100 = 80 Eur. Naturally, both bettors and bookmakers try to maximize
their profits.
If the odds were fair, their inverse value could be interpreted as the probability of the outcome
as estimated by the bookmaker. In practice, however, this is not the case. For instance, when the
bookmaker is indifferent as to which outcome is more probable, (s)he does not set the fair odds
of 2.0 : 2.0, but rather offers a lower payoff such as 1.95 : 1.95. The difference between the sum of
the probabilities implied by the inverted odds and 1 (the sum of the true probabilities) is called the
margin. In our example, the bookmaker's margin would be 1/1.95 + 1/1.95 − 1 ≈ 2.5%. Given the
margin, combined with the bookmaker's professional experience in forecasting game outcomes,
it is extremely difficult for a bettor to profit from moneyline betting systematically.
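The arithmetic of this example can be checked in a few lines of Python; `profit` and `margin` are ad-hoc helper names introduced here, not part of any betting library.

```python
# Moneyline arithmetic from the example above: net profit of a winning bet
# and the bookmaker's margin implied by a pair of odds.

def profit(odds, wager, won):
    """Net profit: odds * wager - wager if the outcome hits, else -wager."""
    return odds * wager - wager if won else -wager

def margin(odds_home, odds_away):
    """Sum of the inverted odds minus 1; zero would mean fair odds."""
    return 1.0 / odds_home + 1.0 / odds_away - 1.0

print(profit(1.8, 100, won=True))     # the 80 Eur profit from the text
print(round(margin(1.95, 1.95), 3))   # roughly the 2.5% margin from the text
```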
Here we design a profitable sports-betting system. Its three main properties which enable profit
generation, and at the same time the most important contributions of this paper w.r.t. the state
of the art, are as follows.
First, as many studies before (Section 2 provides a brief review), we use machine learning to
develop an outcome prediction model. However, in previous work the single emphasis has been on
the predictive accuracy of such a model. Here we argue that even an accurate model is unprofitable
as long as it is correlated with the bookmaker’s model: if our guesses coincide with the bookmaker’s,
we will be losing money due to the margin. Thus we elaborate various measures to decorrelate
the learned model from the bookmaker’s model (estimable roughly from the assigned odds), while
maintaining adequate prediction accuracy.
Secondly, we propose an innovative method to learn the outcome prediction model from features
describing past performance statistics of individual players in both teams of the match. The
approach uses a convolutional neural network, which has recently achieved significant successes in visual
and language data processing. Here, a convolutional network layer operates on the matrix of players
and their statistics, acting as an aggregator of player-level features towards team-level features
propagated further through the network towards the prediction on the output. The aggregation
pattern defined by the convolution layer may be complex and is itself learned from training data.
Thirdly, we adopt the concepts of modern portfolio theory in the design of the betting strategy.
Such a strategy accepts the (probabilistic) model predictions for a set of games and proposes a set
of bets on these games. Portfolio theory originated in the field of economics, and its application
to sports betting is novel. The proposed strategy distributes the bets under the optimal trade-off
between profit expectation and profit variance. This supersedes heuristic bet spreading strategies
published in previous work as well as a naive expectation-maximization strategy.
The rest of the paper is organized as follows. In the next section, we review the relevant prior
work. Section 3 defines a formal framework for the problem. In Section 4, we elaborate the various
kinds of predictive models employed. Section 5 formalizes several existing betting strategies and
develops the novel strategy of portfolio optimization. In Section 6, we investigate selected aspects
of the proposed models and strategies through simulated experiments. Section 7 then provides
a thorough experimental evaluation on the US National Basketball Association (NBA) matches throughout
seasons 2006 to 2014. In Section 8, we conclude the paper.
2. Related Work
Several studies investigated the strategies of bookmakers and bettors. Focusing on the US
National Football League (NFL), Levitt (2004) traced how odds are set and concluded that bookmakers
rely on their ability to outperform the average bettor in outcome forecasting rather than
on earning money by balancing wagers and profiting from the margin. This hypothesis
was subjected to further scrutiny by Paul & Weinbach (2007), who challenged the informativeness
of Levitt's dataset, as it consisted only of bets from entry-fee betting tournaments with a limited number
of participants. However, their conclusions essentially confirmed those of Levitt. Although the
hypothesis was not confirmed in basketball (Paul & Weinbach, 2008) using National Basketball
Association (NBA) data, the disagreement can be explained by the smaller NBA betting market. A
recent inquiry (Paul & Weinbach, 2010) into the behavior of bettors, using data from the NBA and NHL
season 2008/09, proposes that most bettors act more like fans than investors. Combined with the
conclusion of Levitt (2004), this motivates the question whether the bookmaker can be exploited
with an emotionless statistical model.
The idea that a statistical model might outperform experts was first tested by Forrest & Simmons
(2000). The experts were found unable to process publicly available information efficiently.
Signs of using information independent of publicly available data were rare. The study deemed it
unlikely that experts would outperform a regression model. Forrest, Goddard & Simmons (2005)
challenged the thesis that a statistical model has an edge over tipsters. They examined the performance
of a statistical model and bookmakers on 10,000 soccer matches and concluded that the
bookmakers were on par with a statistical model.
Song, Boulier & Stekler (2007) analyzed the prediction accuracy of experts, statistical models and
opening betting lines on two NFL seasons. There was little difference between the performance of statistical models
and experts, but both were outperformed by the betting line. Spann & Skiera (2009)
compared the prediction accuracy of prediction markets, betting odds and tipsters. Prediction markets
and betting odds proved to be comparable in terms of prediction accuracy. The forecasts from
prediction markets would have been able to generate profit against the betting odds if it were not for
the high fees. On the other hand, tipsters performed rather poorly in this comparison.
Stekler, Sendor & Verlander (2010) focused on several topics in horse racing and team sports.
Forecasts were divided into three groups by their origin: markets, models, and experts. Closing odds
proved to be better predictors of the game outcome than opening odds. The most important
conclusion was that there was no evidence that a statistical model or an expert could consistently
outperform the betting market.
Franck, Verbeek & Nüesch (2010), inspired by the results of prediction markets in different domains
such as politics, compared the performance of a betting exchange against bookmakers on 3 seasons
of 5 European soccer leagues. The prediction market was superior to the bookmaker in terms of
prediction accuracy. A simple strategy based on betting on the opportunities where the average
odds set by the bookmakers were higher than the odds in the prediction market was profitable in some
cases.
Angelini & De Angelis (2018) examined the efficiency of 41 bookmakers on 11 major European
leagues over a period of 11 years. Some of the markets turned out to be inefficient, since a trivial
strategy of betting on opportunities with odds in a certain range led to a positive profit. For the NBA
with Pinnacle odds, however, it was shown in Hubáček (2017) that this is not possible and the
bookmaker cannot be exploited that simply.
2.1. Predictive Models
The review by Haghighat, Rastegari & Nourafza (2013) of machine learning techniques used in
outcome prediction of sports events points out the prevailing poor results of predictions and the
small sizes of the datasets used. To improve prediction accuracy, the authors suggested including
player-level statistics and using more advanced machine learning techniques.
Loeffelholz, Bednar & Bauer (2009) achieved a remarkably high accuracy of over 74% using
neural network models; however, their dataset consisted of only 620 games. As features, the authors
used seasonal averages of 11 basic box score statistics for each team. They also tried using average
statistics of the past 5 games and averages from home and away games separately, but reported no
benefits.
Ivanković, Racković, Markoski, Radosav & Ivković (2010) used ANNs to predict the outcomes of
basketball games in the League of Serbia in seasons 2005/06–2009/10. An interesting aspect of the
work is that the effects of shots from different court areas were formalized as features. With this
approach, the authors achieved an accuracy of 81%. However, their very specific dataset makes
it impossible to compare the results with other research.
Miljković, Gajić, Kovačević & Konjović (2010) evaluated their model on the NBA season 2009/10.
Basic box score statistics were used as features, as well as win percentages in the league, conference
or division and in home/away games. A Naive Bayes classifier in 10-fold cross-validation achieved
a mean accuracy of 67%.
Puranmalka (2013) used play-by-play data to develop new features. The main reason why
features derived from such data are superior to box score statistics is that they include context.
Out of Naive Bayes, Logistic Regression, Bayes Net, SVM and k-NN, the SVM performed best,
achieving an accuracy of over 71% over the course of 10 NBA seasons from 2003/04 to 2012/13.
Zimmermann, Moorthy & Shi (2013) leveraged multi-layer perceptrons for sports outcome
predictions. They proposed the existence of a glass ceiling of about 75% accuracy, based on the results
achieved by statistical models in numerous different sports. This glass ceiling could be caused by
the use of similar features in many papers. They also argued that the choice of features is much more
important than the choice of a particular machine learning model.
Vračar, Štrumbelj & Kononenko (2016) made use of play-by-play data to simulate basketball
games as Markov processes. Analysis of the results showed that a basketball game is a homogeneous
process except at the very beginning and end of each quarter. Modeling these sequences of the game
had a large impact on forecast performance. The authors saw the application of their model not only
in outcome prediction before the game but also in in-play betting on less common bets (e.g., the number of
rebounds or fouls in a specific period of the game).
Maymin (2017) tested the profitability of deep learning models trained on different datasets during
the course of a single NBA season. In the paper, positive profits were only achievable with the
use of detailed features extracted by experts from video recordings, while models trained using
standard box-score statistics ended with a significant loss.
Constantinou, Fenton & Neil (2013) designed an ensemble of Bayesian networks to assess soccer
teams' strength. Besides objective information, they accounted for subjective types of information
such as team form, psychological impact, and fatigue. All three components showed a
positive contribution to the model's forecasting capabilities. Including the fatigue component provided
the highest performance boost. The results revealed conflicts between accuracy and profit measures.
The final model was able to outperform the bookmakers.
Sinha, Dyer, Gimpel & Smith (2013) made use of Twitter posts to predict the outcomes of NFL
games. Information from Twitter posts enhanced forecasting accuracy; moreover, a model based
solely on features extracted from tweets outperformed models based on traditional statistics.
3. Problem Definition
Each round of the league consists of $n$ matches. Each match has two possible outcomes: the home
team wins or the home team loses. The bookmaker assigns odds $o_i \in \mathbb{R}$, $o_i > 1$, to each of the $2n$
outcomes.
We assume that the bettor places an amount $b_i \in [0,1]$ on each of the $2n$ outcomes, wherein even
two mutually exclusive outcomes (same game) may each receive a positive bet. The normalization of
bets to the real unit interval is for simplicity, and as such the bets can be interpreted as proportions
of a fixed bettor's budget, which is to be exhausted in each round. A popular alternative approach,
often presented in related works (Boshnakov, Kharrat & McHale, 2017), is reinvestment of all the
previously accumulated profits (Kelly jr, 1956). However, we posit
$$\sum_{i=1}^{2n} b_i = 1 \qquad (1)$$
The bettor retrieves $o_i b_i$ for outcome $i$ if the latter came true in the match, and zero otherwise.
Let $p_i$ be the probability of the $i$-th outcome. The bettor's profit is thus
$$P_i = \begin{cases} o_i b_i - b_i & \text{with probability } p_i \\ -b_i & \text{with probability } 1 - p_i \end{cases} \qquad (2)$$
so the expected profit is
$$E[P_i] = p_i(o_i b_i - b_i) - (1 - p_i) b_i = (p_i o_i - 1) b_i \qquad (3)$$
and the cumulative profit
$$P = \sum_{i=1}^{2n} P_i \qquad (4)$$
from all bets in the round thus has the expectation
$$E[P] = E\left[\sum_{i=1}^{2n} P_i\right] = \sum_{i=1}^{2n} E[P_i] \qquad (5)$$
Our goal is to devise a betting strategy which prescribes the bets given the known odds and
$$\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_{2n} \qquad (6)$$
which are estimates of the unknown probabilities $p_1, p_2, \ldots, p_{2n}$ of the $2n$ outcomes. The problem
thus breaks down into two tasks: designing an estimator of the probabilities (6), and designing a
betting strategy that uses these estimates along with the bookmaker's odds.
The former task should result in a function
$$\hat{p} : D \to [0,1] \qquad (7)$$
which estimates the probability of the home team winning from some data $d \in D$ relevant to the
match and available prior to it; $D$ represents the domain of such background data. Assume (6) are
ordered such that for each $k \in \{1, 2, \ldots, n\}$, $\hat{p}_{2k-1}$ ($\hat{p}_{2k}$, respectively) estimates the probability
that the home team wins (loses, respectively) in the $k$-th match described by data $d_k$. Then (6)
are obtained from the function (7) as
$$\hat{p}_1, \hat{p}_2, \hat{p}_3, \hat{p}_4, \ldots = \hat{p}(d_1),\; 1 - \hat{p}(d_1),\; \hat{p}(d_2),\; 1 - \hat{p}(d_2), \ldots \qquad (8)$$
The $\hat{p}$ function is assumed to be learned from data sampled from $D$, but also conveying the known
match outcomes, i.e. from a finite sample from $D \times \{0, 1\}$. A natural requirement on $\hat{p}$ is that it
estimates the true probabilities accurately. However, the precise requirements on $\hat{p}$ as well as the
nature of $D$ will be explored in the subsequent section.
The second task is to design a betting strategy, i.e., a function
$$\vec{b} : \mathbb{R}^{2n} \times [0,1]^{2n} \to [0,1]^{2n} \qquad (9)$$
which for known odds $o_1, o_2, \ldots, o_{2n}$ and probability estimates (6) proposes the bets
$$\vec{b}(o_1, o_2, \ldots, o_{2n}, \hat{p}_1, \hat{p}_2, \ldots, \hat{p}_{2n}) = b_1, b_2, \ldots, b_{2n} \qquad (10)$$
subject to (1). A natural requirement on $\vec{b}$ is that the bets proposed should lead to a high expected
profit (5). We will elaborate some more refined requirements in Section 5.
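The quantities of this section can be made concrete with a small numerical sketch; the odds, estimates and bets below are illustrative values, not real data.

```python
import numpy as np

# One round with n = 2 matches, hence 2n = 4 outcomes (home/away per match).
odds = np.array([1.8, 2.1, 1.5, 2.6])       # o_1, ..., o_4
p_hat = np.array([0.60, 0.40, 0.70, 0.30])  # estimates; pairs sum to 1 as in (8)
bets = np.array([0.5, 0.0, 0.5, 0.0])       # b_i in [0, 1]

assert np.isclose(bets.sum(), 1.0)          # the budget constraint (1)

# E[P_i] = (p_i o_i - 1) b_i per (3); summing gives E[P] per (5),
# here evaluated with the estimates p_hat in place of the unknown p.
exp_profit = (p_hat * odds - 1.0) * bets
print(exp_profit, exp_profit.sum())
```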
4. Prediction Model
4.1. Data Features
The information we use for predicting the outcome of a match combines data relating to the
home team and those pertaining to the visiting team. For each of the two, we aggregate various
quantitative measures of the team’s performance in all of its preceding matches since the beginning
of the season in which prediction takes place.¹ The entire range of these measures is described in
the appendix. Current seasons are commonly deemed the most relevant time-windows for player
and team performance prediction. The seasonal aggregation is conducted as follows. All variables
depending on the match duration are divided by the duration in minutes, and for the seasonal
aggregate, we consider the average of these per-minute values. Such variables are marked as “per-
minute” in the appendix. For the remaining variables, the median value is considered instead.
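The seasonal aggregation just described can be sketched as follows; the game records and statistic names are hypothetical, with `points` standing in for a duration-dependent ("per-minute") variable and `off_rating` for one aggregated by median.

```python
import statistics

# Three hypothetical games of one team; "minutes" is the match duration.
games = [
    {"minutes": 240, "points": 96, "off_rating": 104.0},
    {"minutes": 240, "points": 108, "off_rating": 110.0},
    {"minutes": 265, "points": 118, "off_rating": 101.0},  # an overtime game
]

def aggregate(games, per_minute_keys=("points",), median_keys=("off_rating",)):
    out = {}
    for k in per_minute_keys:   # duration-dependent: average of per-minute values
        out[k + "_per_min"] = statistics.mean(g[k] / g["minutes"] for g in games)
    for k in median_keys:       # remaining variables: median of raw values
        out[k + "_median"] = statistics.median(g[k] for g in games)
    return out

print(aggregate(games))
```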
The inputs $d \in D$ to the predictive model $\hat{p}$ (7) are tuples of real-valued features constructed out
of the said season-aggregated data. Some of the variables in the latter pertain to individual players
and others relate to the whole team. Consequently, we distinguish two levels of feature-based
description. On the fine-grained player level, we collect all player-related variables as individual
features, whereas the team-level description involves only the team-level variables as features.
¹ A few initial games of the season are thus not included among training instances and serve only to collect the
statistics. We will quantify this arrangement for a particular testing data set in Section 7.1.
Meta-parameter    | Standard (team-level) | Convolutional (player-level)
Architecture      | D64-D32-D16-D1        | C1-D64-D16-D1
Activations       | tanh                  | tanh
Dropout           | 0.2                   | 0.2
L2 regularization | 0.0001                | 0.001
Table 1: The architecture and meta-parameters of the neural predictive models considered.
Besides the historical track data considered above, the bookmaker’s odds assigned to a match
represent another piece of information potentially relevant to the prediction of its outcome. While
the odds clearly present a very informative feature, their incorporation in a model naturally increases
the undesirable correlation with the bookmaker (Section 4.4). Whether to omit or include
the odds as a feature thus remains questionable, and so we further consider both options in the
experiments.
4.2. Logistic Regression Model
Given (7), we looked for a class of models with a continuous output in the [0,1] interval. Logistic
Regression is a simple such instance, which we adopt as the baseline prediction method. It can be
viewed as an artificial neural network with only one neuron, using the sigmoid as the activation
function.
4.3. Neural Model
As an alternative prediction method, we explored two variants of a neural network. The first
has a standard (deep) feed-forward architecture with 4 dense layers, while the second one uses a
convolutional layer (LeCun, Bengio et al., 1995) followed by 3 dense layers. Table 1 describes the
architectures and the relevant meta-parameters of the two neural models.
The standard feed-forward network is intended for the team-level feature data. The convolutional
network is specifically designed for the player-level data to deal with the large number
of features involved. The principle of its operation, inspired by well-known applications of convolutional
networks for visual data processing, is explained through Figure 1. Intuitively, the
convolutional layer may be perceived as a bridge from player-level variables to a team-level
representation. However, whereas team-level variables already present in the data are simple sums
or averages over all of the team's players, the convolution layer provides the flexibility to form a more
complex aggregation pattern, which itself is learned from the training data.
[Figure 1 schematic: input 2 × 10 × 80 (teams × players × features), convolution filter 1 × 10 × 1, layers C1-D64-D16-D1, with an optional odds input alongside the 2 × 80 (81) convolution output.]
Figure 1: The convolutional neural network for player-level data. The input to the network are two matrices (one
for the home team, one for the visitors), with players in rows and all player-level features in columns. The rows are
sorted by the time-in-play of the corresponding players, and only the top 10 players w.r.t. this factor are included.
The convolution layer is defined by a vector of 10 tunable real weights. The output of the layer is a vector where
each component is the dot product of the former vector with one of the matrix columns. The vector may be viewed
as a filter sliding horizontally on the first input matrix, and then on the second.
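The filter operation described in the caption amounts to one dot product per feature column, which can be sketched in numpy; the weights and inputs below are random stand-ins for learned parameters and real statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
teams = rng.normal(size=(2, 10, 80))  # (home/away, players by time-in-play, features)
w = rng.normal(size=10)               # the 10 tunable weights of the filter

# Slide the filter over the feature columns of each team's matrix:
# one team-level value per player-level feature.
team_level = np.einsum("p,tpf->tf", w, teams)
print(team_level.shape)               # two 80-dimensional team representations
```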
4.4. Model decorrelation
While we deal with the construction of a bettor's predictive model, the bookmaker obviously
also possesses a (tacit) model according to which the odds are set. If the probability of outcome
$i = 2k-1$ for some match $k$ is $\bar{p}_i$ according to the bookmaker's model, the odds $o_i$ are set to a value
no greater than $1/\bar{p}_i$. So if outcome $i+1$ is complementary to outcome $i$ (i.e., they correspond
respectively to the home and visiting team winning in the given match $k$), then
$$\frac{1}{o_i} + \frac{1}{o_{i+1}} = 1 + \varepsilon$$
where $\varepsilon \geq 0$. If $\varepsilon = 0$, the odds would be called fair. In real life, $\varepsilon > 0$ for all pairs of complementary
outcomes, and $\varepsilon$ is called the bookmaker's margin. It is one of the sources of the bookmaker's profit
(see Figures 3 and 4 for an analysis of odds and margin distributions in real data).
Consider a situation where the bettor's learned model coincides with the bookmaker's model.
Then any betting opportunity promising from the viewpoint of an estimated high outcome probability
$\hat{p}_i = \bar{p}_i$ is made unfavorable by the odds set lower than $1/\bar{p}_i$. Therefore, even a highly
accurate predictive model is useless as long as it coincides with the bookmaker's model. This
motivates the objective to learn a predictive model under two criteria of quality: high accuracy on
empirical data, and adequately low correlation with the bookmaker's model.
Recall the question from Section 4.1 of whether to include the known bookmaker's odds as a feature
when training a predictive model. Unless the bookmaker's odds are systematically biased, which
they are not (Hubáček, 2017), a model considering the odds as its only input feature would have no
choice but to coincide perfectly with the bookmaker, inevitably doomed to end up with negative
returns directly proportional to the margin. Although the odds are clearly highly accurate, given
the low-correlation desideratum we just discussed, it seems reasonable to consider not providing
the learning algorithm with this information. Since the bookmaker's odds represent a strongly
predictive feature, the learning algorithm would likely involve it in the constructed model, which
entails the undesired high correlation with the bookmaker's model. Consequently, for the models
incorporating the odds as a feature, or otherwise correlated models, we propose two additional
techniques to reduce the undesired correlation.
The first is a simple technique in which we assign weights to learning samples, acting as factors
in the computation of training errors. Alternatively, this can be viewed as making several copies
of each example in the training multi-set, so that the number of occurrences of each example in the
multi-set is proportional to its weight. We explored two particular weightings. In one, we set each
example's weight to the value of the corresponding odds. Thus training examples corresponding to
high-odds outcomes and high potential pay-offs contribute more to the training error. Hence
the model is forced to be especially accurate on such examples, albeit at the price of lesser accuracy
on the others. In the other variation, we set the weights to the odds only for instances where
the underdog (the team with odds higher than the opponent's) won, retaining unit weights for the other
examples. The intended outcome is that such a learned model will tend to spot mainly situations
where the bookmaker underestimates the underdog's true win-probability.
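The two weighting schemes can be sketched as follows; the odds, outcomes and underdog flags are illustrative values.

```python
import numpy as np

odds = np.array([1.6, 2.4, 3.1, 1.9])            # odds of each training outcome
won = np.array([True, False, True, True])        # did that outcome occur?
underdog = np.array([False, True, True, False])  # odds higher than opponent's?

w_all = odds.copy()                          # scheme 1: every example weighted by its odds
w_dog = np.where(underdog & won, odds, 1.0)  # scheme 2: odds only for winning underdogs
print(w_all, w_dog)
```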
The second is a more sophisticated decorrelation technique which, rather than weighting examples,
directly alters the loss function minimized through gradient descent while fitting the neural
model. The standard loss, which is just the mean of squared prediction errors, is extended towards
$$\frac{1}{N} \sum_{i=1}^{N} \left[ (\hat{p}_i - y_i)^2 - c \cdot (\hat{p}_i - 1/o_i)^2 \right]$$
where $\hat{p}_i$ is the model's output for the $i$-th example, $y_i \in \{0, 1\}$ is the actual outcome of the
match, and $o_i$ are the odds set by the bookmaker, so $1/o_i$ provides a simple estimate of $\bar{p}_i$. The
first term is conventional, forcing the model to agree with the ground truth, while the second term
enforces decorrelation w.r.t. the bookmaker. The constant $c$ determines the relative significance
of the decorrelation term.
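The extended loss can be sketched directly from the formula; the value of c and all inputs below are illustrative, not the constants used in the paper.

```python
import numpy as np

def decorrelated_loss(p_hat, y, odds, c=0.3):
    """Mean squared error against the outcomes, minus c times the squared
    agreement with the bookmaker's implied probabilities 1/o_i."""
    return np.mean((p_hat - y) ** 2 - c * (p_hat - 1.0 / odds) ** 2)

p_hat = np.array([0.65, 0.40, 0.55])  # model outputs
y = np.array([1.0, 0.0, 1.0])         # actual match outcomes
odds = np.array([1.7, 2.3, 1.9])      # bookmaker's odds
print(decorrelated_loss(p_hat, y, odds))
```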
5. Betting Strategy
Equipped with a prediction model, we have estimates $\hat{p}_i$ of the true outcome probabilities $p_i$,
allowing us to estimate the expected profit (3) as $\hat{E}(P_i) = (\hat{p}_i o_i - 1) b_i$, and the cumulative profit (5)
as $\hat{E}(P) = \sum_{i=1}^{2n} \hat{E}(P_i)$. A straightforward betting strategy would be to place such bets $b_1, b_2, \ldots, b_{2n}$
which maximize the latter. This strategy, which we will refer to as max-ep, will obviously stipulate
to put the entire budget (1) on the apparently best opportunity, i.e. $b_i = 1$ for $i = \arg\max_i \hat{E}(P_i)$.
However, in repeated betting over several rounds, the bettor will likely prefer to spread the bets
over multiple opportunities in one round to reduce the risk of losing the entire budget and being
unable to continue. In other words, besides maximizing the profit's expectation, we also want to
minimize its variance. In the literature reviewed in Section 2, we detected four strategies for bet
spreading. We formalize them briefly as functions producing numbers $B_i$, while the actual bets are
then prescribed by
$$b_i = \frac{B_i \, \mathbb{1}(\hat{p}_i - 1/o_i)}{\sum_{j=1}^{2n} B_j \, \mathbb{1}(\hat{p}_j - 1/o_j)}$$
where $\mathbb{1}(\cdot)$ stands for the indicator function evaluating to 1 for positive arguments, and to 0
otherwise. So positive bets are only placed on opportunities where $\hat{p}_i > 1/o_i$, which is equivalent
to $\hat{E}(P_i) > 0$, and the denominator normalizes the bets to comply with (1). The four strategies
spread the bets on such opportunities:
1. uniformly, i.e., $B_i = 1$
2. according to the estimated probability of winning, i.e., $B_i = \hat{p}_i$
3. according to the absolute difference between the win probability predicted by the model and
that of the bookmaker, i.e., $B_i = \hat{p}_i - 1/o_i$
4. as above, but using the relative difference, i.e., $B_i = (\hat{p}_i - 1/o_i)/\hat{p}_i$
We will refer to these strategies as unif, conf, abs-disc, rel-disc, respectively.
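The four heuristics and the normalization above can be sketched together; the probability estimates and odds are illustrative values.

```python
import numpy as np

def spread_bets(p_hat, odds, strategy="unif"):
    p_book = 1.0 / odds
    B = {
        "unif": np.ones_like(p_hat),
        "conf": p_hat,
        "abs-disc": p_hat - p_book,
        "rel-disc": (p_hat - p_book) / p_hat,
    }[strategy]
    B = B * (p_hat > p_book)  # indicator: bet only where p_hat_i > 1/o_i
    return B / B.sum()        # normalize so the bets sum to 1, per (1)

p_hat = np.array([0.62, 0.38, 0.55, 0.45])
odds = np.array([1.7, 2.3, 2.0, 1.9])
for s in ("unif", "conf", "abs-disc", "rel-disc"):
    print(s, np.round(spread_bets(p_hat, odds, s), 3))
```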
5.1. Portfolio-optimization strategy
The four strategies above are heuristics. Here we propose to adopt a more theoretically justified
strategy for bet distribution, using the concepts of portfolio theory proposed in the field of economics
by Markowitz (1952). As motivated earlier, a strategy should weigh in both the expectation and
the variance of the profit. The portfolio optimization strategy seeks the Pareto front of $\vec{b}$'s with
respect to $E[P]$ and $\operatorname{Var}[P]$, i.e., the set of all bet distributions not dominated in terms of both of
the factors.
The expectation is given by (5) and (3). For the variance of profit on a single outcome $i$, we
have
$$\operatorname{Var}[P_i] = E[P_i^2] - E[P_i]^2 = (1 - p_i) p_i b_i^2 o_i^2 \qquad (11)$$
and for the variance of the cumulative profit
$$\operatorname{Var}[P] = \operatorname{Var}\left[\sum_{i=1}^{2n} P_i\right] = \operatorname{Var}\left[\sum_{\substack{i=1,\ldots,2n \\ b_i > 0}} P_i\right]$$
where the second sum restricts the summands to those $i$ for which a positive bet $b_i$ was placed,
since by (2), $P_i = 0$ when $b_i = 0$. We now make the assumption that the bettor never places
positive bets on each of two complementary outcomes, i.e. on both the home team and the away
team in a match. Under this assumption, the summands above may be safely considered pair-wise
independent random variables, as no team plays in two different matches in the same round. In
other words, no team influences both of the outcomes $i, j$ if $i \neq j$, $b_i > 0$, $b_j > 0$. Thus we may write
$$\operatorname{Var}[P] = \sum_{i=1}^{2n} \operatorname{Var}[P_i] = \sum_{i=1}^{2n} (1 - p_i) p_i b_i^2 o_i^2$$
$E[P]$ and $\operatorname{Var}[P]$ are of course not known, and we compute the Pareto front using the model-based
estimates $\hat{E}[P]$ and $\widehat{\operatorname{Var}}[P]$ computed from the $\hat{p}_i$'s instead of the $p_i$'s. To pick a particular bet
distribution from the computed Pareto front, we rely on the Sharpe ratio introduced by Sharpe
(1994), according to which we pick the distribution maximizing
$$\frac{\hat{E}[P] - R}{\hat{\sigma}_P} \qquad (12)$$
where $\hat{\sigma}_P = \sqrt{\widehat{\operatorname{Var}}[P]}$ is $P$'s (estimated) standard deviation and $R$ is the profit from a risk-free
investment of the disposable wealth, such as through banking interests. We neglect (i.e., set $R = 0$)
this economically motivated quantity due to the short duration of the betting procedure. We use
the algorithm of sequential quadratic programming (Nocedal & Wright, 2006) to identify the unique
maximizer of the Sharpe ratio. The strategy just described will be referred to as opt.
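A minimal sketch of the opt strategy, assuming scipy's SLSQP solver as the sequential quadratic programming implementation (the paper does not name a specific solver), with R = 0 and the estimates of E[P] and Var[P] from the formulas above; the no-bets-on-complementary-outcomes assumption is not enforced here.

```python
import numpy as np
from scipy.optimize import minimize

def sharpe_opt(p_hat, odds):
    ep = p_hat * odds - 1.0                  # E[P_i] per unit bet, from (3)
    var = (1.0 - p_hat) * p_hat * odds ** 2  # Var[P_i] per squared unit bet, from (11)

    def neg_sharpe(b):
        e = np.dot(ep, b)                         # estimated E[P]
        s = np.sqrt(np.dot(var, b ** 2)) + 1e-12  # estimated sigma_P
        return -e / s

    n = len(p_hat)
    return minimize(neg_sharpe, np.full(n, 1.0 / n), method="SLSQP",
                    bounds=[(0.0, 1.0)] * n,
                    constraints=[{"type": "eq",
                                  "fun": lambda b: b.sum() - 1.0}]).x

bets = sharpe_opt(np.array([0.62, 0.55, 0.48]), np.array([1.8, 2.0, 2.1]))
print(np.round(bets, 3))
```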
5.2. Confidence Thresholding
We also explored a modification applicable to each of the betting strategies, in which only
high-confidence predictions are considered. More precisely, a probability estimate $\hat{p}_i$ is passed to
the betting strategy if and only if
$$|\hat{p}_i - 0.5| > \varphi.$$
The reasoning behind this thresholding is that we want to remove the games where the model is
indifferent about the favorite. Although such predictions are in principle valid inputs to the strategy,
our assumption is that probabilistic predictions around 0.5 are typically more imprecise than predictions of higher
confidence. This is especially true for the proposed models trained with gradient descent techniques
over a logistic sigmoid output, which is indeed most sensitive at that point.
6. Experiments on Simulated Data
Here we conduct two auxiliary experiments requiring simulated data, to examine the effects of
correlation between $\hat{p}_i$ and $\bar{p}_i$, as discussed in Section 4.4, and to illustrate the Pareto analysis of
bet distributions introduced in Section 5.1.
6.1. Decorrelation Effects
Our motivation in Section 4.4 to keep the model-based estimates $\hat{p}$ of $p$ as little correlated with
$\bar{p}$ (the bookmaker's estimates) as possible stems from the hypothesis that betting profits decay if
such correlation increases with other factors kept constant. To test this proposition, we simulated
the ground truth $p$ as well as both of the estimates with various levels of their correlation, and
measured the profits made by the opt and unif strategies for these different levels.
More precisely, we sampled triples (p, p̂, p̄) from a multivariate beta distribution. The distribution is parameterized by the marginal means and variances of the three variables and their pairwise correlations. The mean of each of the three variables was set to 1/2, reflecting the mean probability of the binary outcome. The variance of p̄ was determined as 0.054 from real bookmaker's data (Section 7.1), and p̂'s variance copies this value. The variance of p was set to 0.08, reflecting the glass-ceiling thesis.²

We let the correlations ρ(p̂, p), ρ(p̂, p̄), and ρ(p, p̄) range over the values {0.85, 0.90, 0.95}. The former two represent the independent variables of the analysis, acting as factors in Table 2, while the presented numbers average over the 3 values of ρ(p, p̄).
For each setting of ρ(p̂, p) and ρ(p̂, p̄), we drew samples p_i, p̂_i, p̄_i (i = 1, 2, . . . , n = 30) to simulate one round of betting. Then we set the odds o_i = 1/p̄_i (the bookmaker's margin being immaterial here) for 1 ≤ i ≤ n, and determined the bets b_1, b_2, . . . , b_n from o_1, o_2, . . . , o_n and p̂_1, p̂_2, . . . , p̂_n using the opt and unif strategies (Section 5.1). Finally, the match outcomes were established by a Bernoulli trial for each of p_1, p_2, . . . , p_n. With these inputs, we calculated the cumulative profit P = P_1 + P_2 + . . . + P_n of one round. This procedure was repeated 10 000 times (rounds), averaging the P_opt and P_unif.³
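One simulated round can be sketched as follows. Note two simplifications relative to the paper: we push correlated Gaussians through a logistic map instead of sampling a multivariate beta distribution (so the stated means and variances are only approximated), and we show only the unif bets, while p̂ would feed the opt strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_round(rho_hat_p=0.90, rho_hat_bar=0.85, rho_p_bar=0.90, n=30):
    """One simulated betting round, roughly following Section 6.1:
    draw correlated (p, p_hat, p_bar), set fair odds from p_bar,
    resolve outcomes by Bernoulli trials on p, and bet uniformly."""
    cov = np.array([[1.0, rho_hat_p, rho_p_bar],
                    [rho_hat_p, 1.0, rho_hat_bar],
                    [rho_p_bar, rho_hat_bar, 1.0]])
    z = rng.multivariate_normal(np.zeros(3), cov, size=n)
    # map to probabilities with mean 1/2; the scale controls the variance
    p, p_hat, p_bar = (1.0 / (1.0 + np.exp(-1.2 * z))).T
    odds = 1.0 / p_bar                # fair odds, no margin
    outcomes = rng.random(n) < p      # Bernoulli match results
    bets = np.full(n, 1.0 / n)        # unif strategy (p_hat would feed opt)
    return np.sum(bets * (odds * outcomes - 1.0))
```

Since the whole unit budget is spent each round, the profit of a round is bounded below by −1.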
Table 2 shows the average profits as well as the accuracy of the bettor's outcome predictions (predict a win if p̂ > 1/2), and the percentage breakdown of the 4 possible combinations of the bettor's and bookmaker's predictions. The accuracies, as well as the four latter proportions, are also averages over all bets in all simulated rounds.
Besides the unsurprising observation that the bettor's prediction accuracy grows with ρ(p̂, p), the results show that profits indeed decay systematically as the bettor's and bookmaker's predictions become more correlated (increasing ρ(p̂, p̄) decreases profit). An instructive observation is that the proportion of spotted opportunities is in all cases higher when the bookmaker's and bettor's predictions are less correlated. Moreover, we can see that the betting strategy is another independent factor strongly influencing the profit, with the opt strategy being clearly superior to the unif strategy. Our proposals for promoting decorrelation and portfolio-optimization betting are thus both supported by this synthetic-data experiment.
² Articulated by Zimmermann et al. (2013) and expressing that sports games are predictable with a maximum of 75% accuracy at best. When p is sampled with mean 1/2 and variance 0.08, then with 0.75 probability the event (p > 1/2) correctly predicts the outcome of a Bernoulli trial parameterized with p.
³ Note that this is not the same (for P_opt) as setting n = 30 · 10 000 without repeating the procedure, as the full unit budget is supposed to be spent in each round.
ρ(p̂,p)   ρ(p̂,p̄)   Popt    Punif   Accuracy   Consensus   Upset   Missed   Spotted
0.85      0.85      11.15    3.14   70.11      61.99       20.37   9.53     8.12
0.85      0.90       6.14    0.52   70.05      63.60       22.04   7.91     6.45
0.85      0.95      -1.73   -5.46   70.08      65.74       24.12   5.80     4.34
0.90      0.85      18.14    8.56   71.48      62.66       19.70   8.82     8.83
0.90      0.90      14.05    5.74   71.48      64.36       21.35   7.17     7.12
0.90      0.95       9.60    3.38   71.45      66.39       23.50   5.05     5.06
0.95      0.85      25.30   13.42   72.91      63.34       18.98   8.11     9.57
0.95      0.90      22.95   12.69   72.93      65.02       20.62   6.46     7.91
0.95      0.95      20.79   11.87   72.92      67.21       22.74   4.33     5.71

Table 2: Average profits (in %) P_opt of the opt and P_unif of the unif strategies as a function of the correlations of the (estimated) probabilities. Accuracy denotes the % of correct outcome predictions by the bettor (predict a win if p̂ > 1/2). The last four columns break down the proportions (in %) of the different combinations of predictions by p̂ (bettor) and p̄ (bookmaker): Consensus (both right), Upset (both wrong), Missed (bettor wrong, bookmaker right), Spotted (bettor right, bookmaker wrong).
6.2. Pareto Analysis
To provide a visual insight of the Pareto front and the placement of exemplary strategies with
respect to it, we generated 6 hypothetical betting opportunities with associated bettor’s estimate
ˆpiand bookmaker’s estimate ¯piof the outcome probability, for i= 1,2, . . . 6. The bet distributions
assigned to the 6 cases by 5 different strategies are shown in Table 3. Figure 2 shows the position
of these distributions within the expectation-deviation diagram (also called the risk-return space)
with respect to the Pareto front, along with other 1000 random bet distributions ~
b. For each
such random ~
b, we sampled each bi(1 i6) from the uniform distribution on [0,1] and then
normalized them so that b1+b2+. . . +b6= 1. As expected, the opt strategy maximizing the
Sharpe ratio (12) indeed lies on the Pareto front.
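The coordinates of each point in the risk-return space can be recomputed from the estimates; the sketch below uses the p̂_i and p̄_i values of Table 3, fair odds o_i = 1/p̄_i, and the independent-outcome profit moments (our simplifying assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

def risk_return(b, p_hat, odds):
    """Estimated expectation and standard deviation of the profit of a
    bet distribution b, assuming independent Bernoulli outcomes (the
    two axes of the risk-return diagram)."""
    e = np.sum(b * (odds * p_hat - 1.0))
    s = np.sqrt(np.sum(b**2 * odds**2 * p_hat * (1.0 - p_hat)))
    return e, s

# 1000 random bet distributions on the 6 hypothetical opportunities
p_hat = np.array([0.30, 0.59, 0.75, 0.60, 0.74, 0.64])
p_bar = np.array([0.26, 0.52, 0.70, 0.57, 0.71, 0.62])
odds = 1.0 / p_bar
points = []
for _ in range(1000):
    b = rng.random(6)
    b /= b.sum()                  # normalize onto the simplex
    points.append(risk_return(b, p_hat, odds))
```

Plotting `points` together with the five strategies of Table 3 reproduces the qualitative picture of the diagram.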
i   p̂_i    p̄_i    unif   abs-disc   rel-disc   conf   opt
1   0.30   0.26   0.17   0.20       0.35       0.08   0.09
2   0.59   0.52   0.17   0.27       0.24       0.16   0.23
3   0.75   0.70   0.17   0.21       0.15       0.21   0.30
4   0.60   0.57   0.17   0.13       0.12       0.17   0.12
5   0.74   0.71   0.17   0.11       0.08       0.20   0.17
6   0.64   0.62   0.17   0.07       0.06       0.18   0.08

Table 3: Bet distributions as dictated by 5 different strategies on 6 simulated betting opportunities.
7. Empirical Evaluation
7.1. Data
We retrieved the official box score data from the National Basketball Association (NBA) from
seasons 2000 to 2014. The gathered data provide game summaries; namely, player-level and team-
level statistics such as the number of shots or number of steals per game are recorded. The
detailed description of the available kinds of information can be found in the appendix. Games
with incomplete statistics were removed, and thus the number of games differs slightly between
seasons; on average, 985 games per year were included. 10 initial games of each team in each season
were not included as training instances as they only served for the initial calculation of seasonal
aggregates (c.f. Section 4.1). There are 30 teams in the NBA, so one league round consists of
n= 15 games.
For betting odds, we used the Pinnacle⁴ closing odds for seasons 2010–2014⁵. For earlier seasons, we had to collect odds data from multiple different bookmakers. Fig. 3 shows histograms of the odds distribution for the home and away teams and their winnings, respectively. The histograms reveal that in most matches the home team is the favorite in the bookmaker's eyes. This comes as no surprise due to the home-court advantage (home teams win about 60% of games). Both histograms exhibit long-tailed distributions, as expected given that odds correspond to inverse probabilities, which roughly follow the true proportions of the respective winnings.
⁴ https://www.pinnacle.com/
⁵ Kindly provided by prof. Štrumbelj, University of Ljubljana, Slovenia.
Figure 2: Comparison of five betting strategies and 1000 random bet distributions in the risk-return space with
respect to the Pareto front of optimal strategies.
Figure 4 shows the seasonal averages of the bookmaker's margin, displaying the artifact caused by the different sources of odds information before and after 2010. This artifact does not confound the experimental questions below, except for causing higher profits in late seasons due to the systematically smaller margins. To get better insight into the bookmaker's margins, we plotted their dependency on the odds for the 2010–2014 period. Figure 4 indicates a significantly larger margin in cases where there is a clear favorite with a high probability of winning (odds close to 1). This is due to an asymmetry in the bookmaker's odds: while there are several occasions with the favorite's odds around 1.1, implying a win probability of around 91%, odds around 11, representing the complementary probability of 9%, are extremely rare. This asymmetry increases as the favorite's odds approach 1.0.
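The implied probabilities and the margin can be recovered from a pair of decimal odds; the proportional normalization below is one common convention (the paper does not commit to a particular de-margining rule, so treat this as an illustrative assumption).

```python
def implied_probs_and_margin(odds_home, odds_away):
    """Convert a pair of decimal odds to the bookmaker's implied
    probabilities and margin. The raw inverse odds sum to more than 1;
    the excess is the margin, and normalizing removes it."""
    raw_home, raw_away = 1.0 / odds_home, 1.0 / odds_away
    margin = raw_home + raw_away - 1.0
    total = raw_home + raw_away
    return raw_home / total, raw_away / total, margin
```

For fair odds such as (1.5, 3.0) the margin is zero; typical published odds like (1.4, 3.2) yield a small positive margin.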
7.2. Experimental Protocol
The central experimental questions are: how accurate are the learned predictors of match outcomes, how profitable are the betting strategies using these predictions, and how is profitability related to the correlation between the bookmaker's and bettor's models.
Figure 3: Distribution of all games (blue) with the respective proportions of wins (green) w.r.t. the odds set by the bookmaker, from the home (left) and away (right) team perspectives. Clearly, the home team is generally favored by the bookmaker, with the true proportions roughly following the inverse of the odds.
Training and evaluation of the models and betting strategies followed the natural chronological order of the data w.r.t. individual seasons, i.e., only past seasons were ever used to train a model evaluated on the upcoming season. To ensure sufficient training data, the first season to be evaluated was 2006, with a training window made up of seasons 2000–2005, iteratively expanding all the way to evaluation on 2014, trained on the whole preceding range of 2000–2013.
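This protocol amounts to an expanding-window split over seasons, which can be sketched as:

```python
def expanding_splits(seasons, first_eval=2006):
    """Chronological evaluation protocol of Section 7.2: each evaluated
    season is predicted by a model trained on all preceding seasons."""
    return [([s for s in seasons if s < eval_s], eval_s)
            for eval_s in seasons if eval_s >= first_eval]
```

Applied to seasons 2000–2014, this yields nine train/evaluate pairs, from (2000–2005, 2006) up to (2000–2013, 2014).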
7.3. Results
The number of games with complete statistics available varies slightly, with each individual season providing around 1000–1050 matches. The total of 9093 games from the evaluated seasons 2006–2014 counts towards the accuracies (% of correctly predicted outcomes) of each model, displayed in Table 4. The accuracy of the bookmaker's model, which predicts the team with the smaller odds to win, levels over these seasons at 69 ± 2.5. Generally, in terms of accuracy, the bookmaker's model is slightly superior to the neural models, which in turn beat the logistic regression baseline (accuracy of 68.7 with odds, and 67.26 without). Overall, the accuracies of all the models are considerably similar, including their progress over the individual seasons.
As expected, we can observe from the results that models utilizing the highly informative odds feature achieve consistently higher accuracies. Similarly, the models that included the bookmakers' odds were anticipated to be more correlated with the bookmaker (Section 4.4). This is convincingly confirmed by measurements of the Pearson coefficient, which stays at 0.87 for the models trained without odds as a feature and at 0.95 for the models including them, applying equally to both the player-level and team-level models.

Figure 4: Evolution of the margin over the seasons (left), showing a drop for seasons 2010–2014 where Pinnacle was the only source, and its dependency on the bookmaker's odds for the favorite of each game (right), displaying interesting patterns with rapid growth towards the clear-favorite case (minimal odds).
Table 4 also provides important insights into profit generation. We display two selected betting strategies (opt, unif) against a range of considered variants of the predictive models. Similarly to the simulation experiment, the superiority of opt over unif is evident. Apart from accuracy, we argued for decorrelation as an important factor for profit, which we here enforce by means of the altered loss function while varying the trade-off C between accuracy and decorrelation (Section 4.4). We can clearly see that such a trade-off is effectively possible for a wide range of 0.4 ≤ C ≤ 0.8, resulting in positive returns for all the models utilizing the opt strategy.
Figure 5 displays how the trade-off constant C influences the distribution of the four betting outcome situations. As expected, increasing the decorrelation results in a desirable increase of spotted opportunities, i.e., cases where the model correctly predicted the underdog's victory. If this increase is too abrupt, however, it is outweighed by the parallel increase of missed opportunities, where the bet on the underdog was wrong. This was the case with the alternative decorrelation technique of sample weighting, where we were not successful in finding the optimal trade-off between the two antagonistic factors to generate positive profit.
                Team-level                                 Player-level
C      Popt          Punif         Accuracy       Popt          Punif         Accuracy
Without odds
0.0    -0.94 ±0.12   -4.31 ±0.17   67.47 ±0.05    0.38 ±0.10    -5.12 ±0.11   67.62 ±0.03
0.2    -0.58 ±0.14   -3.60 ±0.19   67.39 ±0.04    1.05 ±0.12    -3.31 ±0.13   67.47 ±0.03
0.4     0.46 ±0.15   -1.94 ±0.20   67.30 ±0.05    1.74 ±0.14    -1.73 ±0.18   67.15 ±0.10
0.6     0.86 ±0.08   -1.68 ±0.22   66.93 ±0.06    1.32 ±0.14    -0.61 ±0.28   66.19 ±0.09
0.8     1.37 ±0.08   -0.79 ±0.16   65.94 ±0.12    1.10 ±0.29    -0.39 ±0.22   64.93 ±0.35
1.0    -1.06 ±0.35   -1.32 ±0.31   61.38 ±0.19   -1.92 ±0.81    -2.59 ±0.57   61.30 ±0.48
With odds
0.0     0.89 ±0.10   -2.24 ±0.21   68.83 ±0.05   -0.12 ±0.24    -3.83 ±0.22   68.80 ±0.06
0.2     0.92 ±0.18   -2.10 ±0.24   68.71 ±0.04    0.72 ±0.13    -2.50 ±0.14   68.37 ±0.04
0.4     1.24 ±0.12   -1.24 ±0.22   68.42 ±0.05    1.49 ±0.10    -1.30 ±0.12   67.48 ±0.10
0.6     1.44 ±0.11   -0.64 ±0.21   67.88 ±0.06    1.02 ±0.20    -1.15 ±0.22   66.55 ±0.10
0.8     1.41 ±0.10   -0.56 ±0.20   66.64 ±0.12    1.00 ±0.35    -0.45 ±0.28   65.19 ±0.27
1.0    -0.37 ±0.16   -0.74 ±0.13   62.49 ±0.12   -1.22 ±0.51    -2.25 ±0.30   61.77 ±0.44

Table 4: Averages and standard errors of profits (from 10 runs over seasons 2006–2014) for the two strategies (opt, unif), with accuracies of the Player-level and Team-level outcome prediction models (Section 4), across different levels of decorrelation (Section 4.4).

Revisiting the question of whether to include the odds feature: in terms of profit generation, the results are inconclusive, with the team-level model performing slightly better with the feature and the player-level model without it.

In terms of profit measures, both proposed models beat the baseline logistic regression, which, while operating on the same feature set, behaves similarly to the team-level model yet yields inferior performance (only 0.61 in its best setting, with odds and opt).
Next, we investigate the effects of the confidence thresholding used to filter the predictions (Section 5.2) before providing them to the betting strategy. By varying the threshold φ, we can trade off the confidence of the model against the number of games providing information to the strategy. The results in Table 5 are conclusive in that a reasonably low amount of thresholding below φ = 0.2, in conjunction with the opt strategy, indeed improves profits. Such a low threshold has the effect of filtering out those predictions that are indifferent about the winner (estimated probabilities of 0.5 ± 0.2), which was the main motivation for this technique.
Figure 5: The impact of the loss-function-term model-decorrelation technique, as introduced in Section 4.4 and applied to the team-level model, on the distribution of betting opportunity outcomes: Consensus (both right), Upset (both wrong), Missed (bettor wrong, bookmaker right), Spotted (bettor right, bookmaker wrong).
7.3.1. Actual Betting Evaluation
We have demonstrated the effectiveness of the proposed models and techniques such as decorrelation and confidence thresholding. The ultimate question is how a bettor could leverage these observations to exploit the betting market or, more precisely, which setting would yield the largest and most stable gains. In an actual scenario, the bettor could continuously evaluate all the proposed models in different settings, and after each season (s)he could fix a selected setting for the upcoming season based on past performance. As a selection criterion, (s)he could once again utilize the Sharpe ratio, trading off average historical profits per round against their standard deviation.

Following the proposed scenario, we evaluated each betting strategy under the max-Sharpe selection criterion applied over all the possible settings combining the choice of features, decorrelation, and thresholding. Figure 6 shows the progress of their respective cumulative profits for all rounds between 2007–2014 (the first season, 2006, is used for the selection). Operating on the full range of possible settings, the resulting cumulative profits demonstrate the independent value added to profit performance separately by each of the betting strategies.
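The per-season selection rule can be sketched as follows; the setting names and the `history` structure are hypothetical, and the Sharpe ratio here is the plain mean-over-standard-deviation of past round profits (R = 0).

```python
import math

def select_setting(history):
    """Pick the model/strategy setting for the upcoming season by the
    Sharpe ratio of its past per-round profits, as in the adaptive
    scenario. `history` maps a setting name to its list of profits."""
    def sharpe(profits):
        m = sum(profits) / len(profits)
        var = sum((x - m) ** 2 for x in profits) / len(profits)
        return m / (math.sqrt(var) + 1e-12)
    return max(history, key=lambda k: sharpe(history[k]))
```

A setting with small but steady profits is preferred over one with a higher mean but volatile returns.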
                Team-level                                          Player-level
φ      Popt         Punif         Accuracy      Games    Popt          Punif         Accuracy      Games
Without odds
0.0    0.86 ±0.08   -1.69 ±0.22   66.93 ±0.05   9093     1.74 ±0.14    -1.73 ±0.18   67.15 ±0.10   9093
0.1    1.61 ±0.14   -1.37 ±0.21   70.31 ±0.07   7370     2.39 ±0.20    -1.42 ±0.16   72.01 ±0.15   6686
0.2    1.99 ±0.25   -1.25 ±0.21   74.08 ±0.13   5442     3.24 ±0.32    -1.18 ±0.29   77.22 ±0.19   4228
0.3    0.51 ±0.59   -2.56 ±0.70   79.64 ±0.20   2937    -4.81 ±0.82    -6.90 ±1.09   84.32 ±0.48   1841
With odds
0.0    1.44 ±0.11   -0.64 ±0.21   67.88 ±0.06   9093     1.49 ±0.10    -1.30 ±0.12   67.48 ±0.10   9093
0.1    2.18 ±0.14   -0.13 ±0.25   70.93 ±0.06   7538     2.43 ±0.16    -0.94 ±0.24   72.20 ±0.08   6749
0.2    1.80 ±0.24   -0.73 ±0.29   74.47 ±0.09   5749     3.39 ±0.46    -0.70 ±0.52   77.41 ±0.12   4336
0.3    0.75 ±0.33   -1.61 ±0.38   80.26 ±0.21   3315    -4.57 ±0.93    -7.36 ±0.85   84.35 ±0.29   1940

Table 5: Averages and standard errors of profits (from 10 runs over seasons 2006–2014) for the two strategies (opt, unif), with accuracies of the Player-level (C = 0.4) and Team-level (C = 0.6) prediction models (Section 4), across different levels of confidence thresholding (Section 5.2). Games denotes the number of games that passed the respective threshold.
The proposed opt strategy clearly outperforms all other strategies except the risk-ignoring max-ep. Although the expectation-maximizing strategy max-ep accumulates a larger ultimate profit than opt in this scenario, it suffers from high variance with abrupt drops, possibly bankrupting the bettor. On the contrary, the opt strategy maintains a steady growth of profit throughout the entire duration.

Overall, the opt strategy with the adaptive settings selection generated a positive profit of P = 1.63. Interestingly, in a scenario where we deprived the strategies of the decorrelation and thresholding settings, opt achieved only P = 0.44 and max-ep ended up in bankruptcy, further demonstrating the usefulness of the proposed techniques.
8. Conclusions
The main hypotheses of this study were 1) that correlation of outcome predictions with the
bookmaker’s predictions is detrimental for the bettor, and that suppressing such correlation will
result in models allowing for higher profits, 2) that convolutional neural networks are a suitable
model to leverage player-level data for match outcome predictions, and 3) that a successful betting
strategy should balance optimally between profit expectation and profit variance.
The first hypothesis was clearly confirmed in the simulated experiments and also supported by the extensive real-data experiments. In the former, for each level of constant accuracy (correlation of model and ground truth), increasing the correlation between the model and the bookmaker consistently decreased the profit in all settings. In the latter, models trained with the proposed decorrelation loss achieved higher profits, despite having lower accuracies than models with higher correlation, in all settings up to a reasonable level of the decorrelation-accuracy trade-off.

Figure 6: Actual cumulative profits of the 6 betting strategies (opt, unif, abs-disc, rel-disc, conf, max-ep) through seasons 2007–2014.
Regarding the second hypothesis, the convolutional network achieved generally higher accuracies and profits than the rest of the models in the settings excluding the bookmaker's odds from the features. This can be ascribed to its ability to digest the full matrix of players and their performance statistics through a flexible (learnable) pattern of aggregation, as opposed to merely replicating the bookmaker's estimate from the input.
As for the third hypothesis, the portfolio-optimization opt strategy, which we designed as
an essential contribution of this study, consistently dominated the standard unif strategy and,
reassuringly, it was the only strategy securing a steady growth in profits with minimal abrupt losses
in all actual betting simulations performed. Additionally, we proposed confidence-thresholding as
an enhancement to the strategy when used in conjunction with models utilizing logistic sigmoid
output. This technique effectively removes very uncertain predictions from the strategy, leading to
additional increases in profit.
To the best of our knowledge, no work of similar scale evaluating sports prediction models from the viewpoint of profitability has yet been published.
8.1. Future Work
There are avenues for future work stemming from each of the evaluated hypotheses. In the modelling part, we utilized neural networks operating on detailed feature sets of basketball games; however, this choice is completely independent of the remaining contributions, and we could equally employ different models on different sports and data representations. In particular, we intend to explore models operating on pure game-result data from various sports. In the betting-strategy part, we assumed a scenario where a bettor is given a fixed budget to spend in each round. We plan to extend this to the more complex case where the bettor continually reinvests his wealth. Finally, following the decorrelation objective, we will aim to integrate the modelling and portfolio-optimization parts in an end-to-end learning setting.
Acknowledgement
The authors are supported by Czech Science Foundation project 17-26999S Deep Relational Learning. FZ is
also supported by OP VVV MEYS funded project CZ.02.1.010.00.016 0190000765 Research Center for Informatics.
Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures". We thank the anonymous reviewers for their constructive comments.
References

Angelini, G., & De Angelis, L. (2018). Efficiency of online football betting markets. International Journal of Forecasting.
Boshnakov, G., Kharrat, T., & McHale, I. G. (2017). A bivariate Weibull count model for forecasting association football scores. International Journal of Forecasting, 33, 458–466.
Constantinou, A. C., Fenton, N. E., & Neil, M. (2013). Profiting from an inefficient association football gambling market: Prediction, risk and uncertainty using Bayesian networks. Knowledge-Based Systems, 50, 60–86.
Forrest, D., Goddard, J., & Simmons, R. (2005). Odds-setters as forecasters: The case of English football. International Journal of Forecasting, 21, 551–564.
Forrest, D., & Simmons, R. (2000). Forecasting sport: The behaviour and performance of football tipsters. International Journal of Forecasting, 16, 317–331.
Franck, E., Verbeek, E., & Nüesch, S. (2010). Prediction accuracy of different market structures: bookmakers versus a betting exchange. International Journal of Forecasting, 26, 448–459.
Haghighat, M., Rastegari, H., & Nourafza, N. (2013). A review of data mining techniques for result prediction in sports. Advances in Computer Science: An International Journal, 2, 7–12.
Hubáček, O. (2017). Exploiting betting market inefficiencies with machine learning. Master's thesis, Czech Technical University in Prague.
Ivanković, Z., Racković, M., Markoski, B., Radosav, D., & Ivković, M. (2010). Analysis of basketball games using neural networks. In Computational Intelligence and Informatics (CINTI), 2010 11th International Symposium on (pp. 251–256). IEEE.
Kelly Jr., J. (1956). A new interpretation of information rate. Bell System Technical Journal, 35, 917–926.
Kubatko, J., Oliver, D., Pelton, K., & Rosenbaum, D. T. (2007). A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports, 3.
LeCun, Y., Bengio, Y., et al. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361.
Levitt, S. D. (2004). Why are gambling markets organised so differently from financial markets? The Economic Journal, 114, 223–246.
Loeffelholz, B., Bednar, E., & Bauer, K. W. (2009). Predicting NBA games using neural networks. Journal of Quantitative Analysis in Sports, 5.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, 77–91.
Maymin, P. Z. (2017). Wage against the machine: A generalized deep-learning market test of dataset value. International Journal of Forecasting.
Miljković, D., Gajić, L., Kovačević, A., & Konjović, Z. (2010). The use of data mining for basketball matches outcomes prediction. In Intelligent Systems and Informatics (SISY), 2010 8th International Symposium on (pp. 309–312). IEEE.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. Springer.
Paul, R. J., & Weinbach, A. P. (2007). Does sportsbook.com set pointspreads to maximize profits? The Journal of Prediction Markets, 1, 209–218.
Paul, R. J., & Weinbach, A. P. (2008). Price setting in the NBA gambling market: Tests of the Levitt model of sportsbook behavior. International Journal of Sport Finance, 3, 137.
Paul, R. J., & Weinbach, A. P. (2010). The determinants of betting volume for sports in North America: Evidence of sports betting as consumption in the NBA and NHL. International Journal of Sport Finance, 5, 128.
Puranmalka, K. (2013). Modelling the NBA to make better predictions. Master's thesis, Massachusetts Institute of Technology.
Sharpe, W. F. (1994). The Sharpe ratio. The Journal of Portfolio Management, 21, 49–58.
Sinha, S., Dyer, C., Gimpel, K., & Smith, N. A. (2013). Predicting the NFL using Twitter. arXiv preprint arXiv:1310.6998.
Song, C., Boulier, B. L., & Stekler, H. O. (2007). The comparative accuracy of judgmental and model forecasts of American football games. International Journal of Forecasting, 23, 405–413.
Spann, M., & Skiera, B. (2009). Sports forecasting: A comparison of the forecast accuracy of prediction markets, betting odds and tipsters. Journal of Forecasting, 28, 55–72.
Stekler, H. O., Sendor, D., & Verlander, R. (2010). Issues in sports forecasting. International Journal of Forecasting, 26, 606–621.
Vračar, P., Štrumbelj, E., & Kononenko, I. (2016). Modeling basketball play-by-play data. Expert Systems with Applications, 44, 58–66.
Zimmermann, A., Moorthy, S., & Shi, Z. (2013). Predicting college basketball match outcomes using machine learning techniques: Some results and lessons learned. arXiv preprint arXiv:1310.3607.
Appendix
Below is the list of player and team performance data we used for constructing features. The grouping of variables and the acronyms shown match the source of the data, http://stats.nba.com.
Basic statistics
AST: Number of assists. An assist occurs when a player completes a pass to a teammate that
directly leads to a field goal. (per minute)
BLK: Number of blocks. A block occurs when an offensive player attempts a shot, and the
defense player tips the ball, blocking their chance to score. (per minute)
DREB: Number of rebounds a player or team has collected while they were on defense. (per
minute)
FG PCT: Percentage of field goals that a player makes. The formula to determine field goal
percentage is: Field Goals Made/Field Goals Attempted. (per minute)
FG3 PCT: Percentage of 3 point field goals that a player or team has made. (per minute)
FG3A: Number of 3 point field goals that a player or team has attempted. (per minute)
FG3M: Number of 3 point field goals that a player or team has made. (per minute)
FGA: Number of field goals that a player or team has attempted. This includes both 2 pointers
and 3 pointers. (per minute)
FGM: Number of field goals that a player or team has made. This includes both 2 pointers
and 3 pointers. (per minute)
FT PCT: Percentage of free throws that a player or team has made.
FTA : Number of free throws that a player or team has taken. (per minute)
FTM: Number of free throws that a player or team has successfully made. (per minute)
MIN: Number of minutes a player or team has played.
OREB: Number of rebounds a player or team has collected while they were on offense. (per
minute)
PF: Number of fouls that a player or team has committed. (per minute)
PLUS MINUS: Point differential of the score for a player while on the court. For a team, it is
how much they are winning or losing by. (per minute)
PTS: Number of points a player or team has scored. A point is scored when a player makes
a basket. (per minute)
REB: Number of rebounds: a rebound occurs when a player recovers the ball after a missed
shot. (per minute)
STL: Number of steals: a steal occurs when a defensive player takes the ball from a player on
offense, causing a turnover. (per minute)
TO: Number of turnovers: a turnover occurs when the team on offense loses the ball to the
defense. (per minute)
Advanced statistics
AST PCT: Assist Percentage - % of teammate’s field goals that the player assisted.
AST RATIO: Assist Ratio - number of assists a player or team averages per 100 of their own
possessions.
AST TOV: Number of assists a player has for every turnover that player commits.
DEF RATING: Number of points allowed per 100 possessions by a team. For a player, it is the
number of points per 100 possessions that the team allows while that individual player is on
the court.
DREB PCT: The percentage of defensive rebounds a player or team obtains while on the court.
EFG PCT: Effective Field Goal Percentage is a field goal percentage that is adjusted for made
3 pointers being 1.5 times more valuable than a 2 point shot.
NET RATING: Net Rating is the difference in a player or team’s Offensive and Defensive Rating.
The formula for this is: Offensive Rating-Defensive Rating.
OFF RATING: The number of points scored per 100 possessions by a team. For a player, it is
the number of points per 100 possessions that the team scores while that individual player
is on the court.
OREB PCT: The percentage of offensive rebounds a player or team obtains while on the court.
PACE: The number of possessions per 48 minutes for a player or team.
PIE: An estimate of a player’s or team’s contributions and impact on a game: the % of game
events that the player or team achieved.
REB PCT: Percentage of total rebounds a player obtains while on the court.
TM TOV PCT: Turnover Ratio: the number of turnovers a player or team averages per 100 of
their own possessions.
TS PCT: A shooting percentage that is adjusted to include the value of three pointers and free
throws. The formula is: Points / (2 × (Field Goals Attempted + 0.44 × Free Throws Attempted)).
USG PCT: Percentage of a team’s offensive possessions that a player uses while on the court.
Four factors, as described by Kubatko, Oliver, Pelton & Rosenbaum (2007)
EFG PCT: Effective Field Goal Percentage is a field goal percentage that is adjusted for made
3 pointers being 1.5 times more valuable than a 2 point shot.
FTA RATE: The number of free throws a team shoots in comparison to the number of shots
the team attempted. This is a team statistic, measured while the player is on the court. The
formula is Free Throws Attempted/Field Goals Attempted. This statistic shows who is good
at drawing fouls and getting to the line.
OPP EFG PCT: Opponent's Effective Field Goal Percentage is what the team's defense forces
their opponent to shoot. Effective Field Goal Percentage is a field goal percentage that is
adjusted for made 3 pointers being 1.5 times more valuable than a 2 point shot.
OPP FTA RATE: The number of free throws an opposing player or team shoots in comparison
to the number of shots that player or team shoots.
OPP OREB PCT: The opponent’s percentage of offensive rebounds a player or team obtains
while on the court.
OPP TOV PCT: Opponent’s Turnover Ratio is the number of turnovers an opposing team av-
erages per 100 of their own possessions.
OREB PCT: The percentage of offensive rebounds a player or team obtains while on the court.
TM TOV PCT: Turnover Ratio is the number of turnovers a player or team averages per 100 of
their own possessions.
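The four-factor definitions above follow standard box-score formulas (Kubatko, Oliver, Pelton & Rosenbaum, 2007). A minimal Python sketch with illustrative numbers; the possession estimate FGA + 0.44·FTA + TOV used in the turnover ratio is a common convention assumed here, not stated in the text:

```python
def efg_pct(fgm, fg3m, fga):
    """Effective field goal %: a made 3-pointer counts 1.5x a 2-point make."""
    return (fgm + 0.5 * fg3m) / fga

def fta_rate(fta, fga):
    """Free throw attempt rate: free throws attempted per field goal attempted."""
    return fta / fga

def tm_tov_pct(tov, fga, fta):
    """Turnovers per 100 possessions, using the common possession
    estimate FGA + 0.44*FTA + TOV (an assumption)."""
    return 100 * tov / (fga + 0.44 * fta + tov)

# Example: 40 FGM (12 of them threes) on 85 FGA, 20 FTA, 14 TOV
print(round(efg_pct(40, 12, 85), 3))     # → 0.541
print(round(fta_rate(20, 85), 3))        # → 0.235
print(round(tm_tov_pct(14, 85, 20), 1))  # → 13.0
```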
Player scoring statistics
PCT AST 2PM: % of 2 point field goals made that are assisted by a teammate.
PCT AST 3PM: % of 3 point field goals made that are assisted by a teammate.
PCT AST FGM: % of field goals made that are assisted by a teammate.
PCT FGA 2PT: % of field goals attempted by a player or team that are 2 pointers.
PCT FGA 3PT: % of field goals attempted by a player or team that are 3 pointers.
PCT PTS 2PT: % of points scored by a player or team that are 2 pointers.
PCT PTS 2PT MR: % of points scored by a player or team that are 2 point mid-range jump
shots. Mid-Range Jump Shots are generally jump shots that occur within the 3 point line,
but not near the rim.
PCT PTS 3PT: % of points scored by a player or team that are 3 pointers.
PCT PTS FB: % of points scored by a player or team that are scored while on a fast break.
PCT PTS FT: % of points scored by a player or team that are free throws.
PCT PTS OFF TOV: % of points scored by a player or team that are scored after forcing an
opponent’s turnover.
PCT PTS PAINT: % of points scored by a player or team that are scored in the paint.
PCT UAST 2PM: % of 2 point field goals that are not assisted by a teammate.
PCT UAST 3PM : % of 3 point field goals that are not assisted by a teammate.
PCT UAST FGM: % of field goals that are not assisted by a teammate.
Usage statistics
PCT AST: % of team’s assists a player contributed.
PCT BLK: % of team’s blocks a player contributed.
PCT BLKA: % of team’s blocked field goal attempts (own shots blocked by the opponent) a player
contributed.
PCT DREB: % of team’s defensive rebounds a player contributed.
PCT FG3A: % of team’s 3 point field goals attempted a player contributed.
PCT FG3M: % of team’s 3 point field goals made a player contributed.
PCT FGA: % of team’s field goals attempted a player contributed.
PCT FGM: % of team’s field goals made a player contributed.
PCT FTA: % of team’s free throws attempted a player contributed.
PCT FTM: % of team’s free throws made a player contributed.
PCT OREB: % of team’s offensive rebounds a player contributed.
PCT PF: % of team’s personal fouls a player contributed.
PCT PFD: % of team’s personal fouls drawn a player contributed.
PCT PTS: % of team’s points a player contributed.
PCT REB: % of team’s rebounds a player contributed.
PCT STL: % of team’s steals a player contributed.
PCT TOV: % of team’s turnovers a player contributed.
Miscellaneous other statistics
BLKA: Number of field goal attempts by a player or team that were blocked by the opposing
team. (per minute)
OPP PTS 2ND CHANCE: Number of points an opposing team scores on a possession after
rebounding the ball on offense. (per minute)
OPP PTS FB: Number of points scored by an opposing player or team while on a fast break.
(per minute)
OPP PTS OFF TOV: Number of points scored by an opposing player or team following a turnover.
(per minute)
OPP PTS PAINT: Number of points scored by an opposing player or team in the paint.
PFD: Number of fouls that a player or team has drawn on the other team. (per minute)
PTS FB: Number of points scored by a player or team while on a fast break. (per minute)
PTS 2ND CHANCE: Number of points scored by a team on a possession in which they rebound
the ball on offense. (per minute)
PTS OFF TOV: Number of points scored by a player or team following an opponent’s turnover.
(per minute)
PTS PAINT: Number of points scored by a player or team in the paint. (per minute)