
Probability and Statistical Models for Racing

Authors:
  • Fidelity Investments / Bentley University

Abstract

Racing data provides a rich source of analysis for quantitative researchers studying multi-entry competitions. This paper first uses statistical modeling to investigate the favorite-longshot betting bias in worldwide horse race data. The results show that the bias is not universal. An economic interpretation based on utility theory is also provided. Additionally, previous literature has proposed various probability distributions for modeling racing running times in order to estimate higher-order probabilities, such as the probabilities of finishing second and third. We extend the normal distribution assumption to include certain correlation and variance structures and apply the extended model to actual data. While horse race data is used in this paper, the methodologies can be applied to other types of racing data, such as car and dog racing.
An Article Submitted to Journal of Quantitative Analysis in Sports, Manuscript 1103
Victor S. Lo
John Bacon-Shone
Fidelity Investments, victor.lo@fmr.com
University of Hong Kong, johnbs@hku.hk
Copyright © 2008 The Berkeley Electronic Press. All rights reserved.
KEYWORDS: favorite-longshot bias, ordering probability, running time distribution, horse race
Victor S.Y. Lo: Fidelity Investments, victor.lo@fmr.com; the work was completed when the first author was at the University of Hong Kong.
John Bacon-Shone: Social Sciences Research Centre, The University of Hong Kong, johnbs@hku.hk.
Lo and Bacon-Shone: Probability and Statistical Models for Racing
Published by The Berkeley Electronic Press, 2008
1. Introduction
Racing data provides a rich source of analysis for quantitative researchers studying multi-entry competitions. In particular, horse racing has been well studied by researchers in multiple disciplines, including economists, psychologists, management scientists, statisticians, and probability theorists, as well as by professional gamblers; see Hausch et al. (1994a), which covers articles from all these areas. We focus on horse race data in this paper, but the proposed methodologies are transferable to other types of racing, such as car, dog, and boat racing.
We study two areas in this paper. First, a favorite-longshot bias is often found in gambling data. The general interpretation is that, since the reward from a longshot (if it wins) is higher than that from a favorite, gamblers tend to underbet favorites and overbet longshots. Ali (1977), Snyder (1978), Asch et al. (1982), Ziemba and Hausch (1987), and Lo (1994a) all found this bias in US data; the exception is Busche and Hall (1988), who used Hong Kong data. We apply a model proposed by Lo (1994a) and Bacon-Shone, Lo & Busche (1992a) to investigate the favorite-longshot betting bias using horse race data from across the world. The results show that the bias is not universal, possibly because of differences in pool size. We also provide an economic interpretation using utility theory. This bias has also been reported in other areas, e.g. Ziemba (2004). While we focus on win bets here, more complex bets have been studied elsewhere, e.g. Lo and Busche (1994).
Our second area is predicting higher-order probabilities, such as the probabilities of finishing second and third. The procedure for estimating ordering probabilities is typically: 1) obtain the winning probabilities (i.e. of finishing first); 2) estimate the mean running times from the winning probabilities; and 3) estimate the ordering probabilities from the mean running times. Various probability distributions have been proposed to model running time. The first model, proposed by Harville (1973), is a simple way of computing ordering probabilities from winning probabilities and can be derived by assuming that the running times are independent exponential or extreme-value. Henery (1981) and Stern (1990) proposed normal and gamma distributions, respectively, for running times. However, both the Henery and Stern models are complicated to apply in practice. Bacon-Shone, Lo & Busche (1992b) and Lo and Bacon-Shone (1994) showed that the Henery and Stern models fit better than the Harville model for particular racing data. Additionally, Lo and Bacon-Shone (2008) proposed a simple practical approximation for both the Henery and Stern models. We extend Henery's independent normal distribution assumption to include certain correlation and variance structures and apply the extended model to real data.
Submission to Journal of Quantitative Analysis in Sports
http://www.bepress.com/jqas
2. Study of Favorite-Longshot Bias
2.1 Model and Results
We examine whether gamblers tend to underbet favorites and overbet longshots in pursuit of the higher reward if a longshot wins. Researchers using US horse race data have consistently found this bias. However, Busche and Hall (1988) did not find such a bias using data from Hong Kong racetracks. We study whether the bias holds for multiple racetracks in different countries.
While previous researchers used a variety of methods to study the favorite-longshot bias, we apply a simple but more rigorous statistical model proposed by Lo (1994a) and Bacon-Shone, Lo & Busche (1992a). Define:

$P_i$ = bet fraction (share of the win pool) on horse $i$, i.e. the consensus win probability, $i = 1, \ldots, n$;
    $P_i = (1 - t)/(1 + O_i)$, where $O_i$ is the win odds on $i$ and the track take $t$ is the percentage of the total betting pool deducted to cover taxes, expenses, and profits;
$\pi_i$ = objective (true) win probability of $i$.

Then,

$$\pi_i = \frac{P_i^{\beta}}{\sum_{j=1}^{n} P_j^{\beta}}. \qquad (1)$$

The interpretation of the parameter β is straightforward:
β > 1: risk-preferring (favorites are underbet and longshots overbet);
β = 1: risk-neutral;
β < 1: risk-averse.
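The definitions and model (1) translate directly into a few lines of Python. This is a minimal sketch, assuming $O_i$ is quoted as net odds (profit per unit staked) and ignoring breakage; the function names `bet_fractions` and `true_win_probs` and the three-horse race are illustrative, not from the paper.

```python
def bet_fractions(odds, track_take):
    """Consensus win probabilities P_i = (1 - t) / (1 + O_i).

    odds: net win odds O_i for each horse; track_take: t as a fraction (e.g. 0.2).
    """
    return [(1.0 - track_take) / (1.0 + o) for o in odds]


def true_win_probs(fractions, beta):
    """Model (1): pi_i = P_i**beta / sum_j P_j**beta.

    Multiplying every P_i by the same constant leaves pi unchanged, so bet
    fractions that do not sum exactly to 1 need no renormalization.
    """
    powered = [p ** beta for p in fractions]
    total = sum(powered)
    return [x / total for x in powered]


# Hypothetical three-horse race with a 20% track take.
P = bet_fractions([0.6, 5.0 / 3.0, 3.0], track_take=0.2)   # approx. [0.5, 0.3, 0.2]
pi = true_win_probs(P, beta=1.2)
print([round(p, 3) for p in P], [round(x, 3) for x in pi])
```

With β = 1.2 the model probability of the favorite exceeds its bet fraction and that of the longshot falls below it, which is the favorite-longshot pattern described above.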
Table 1 shows the results when applying model (1) to multiple racetracks.
Table 1: International Comparison of Favorite-Longshot Bias

Racetrack              # races   Estimated β   p-value (H1: β ≠ 1)   Average pool size
US (Quandt's 83-84):
  Atlantic City            712          1.10                  0.08   unknown
  Meadowlands              705          1.12                  0.02   $52K
US (Ali's 70-74):
  Saratoga               9,072          1.16                    ~0   $25K
  Roosevelt              5,806          1.13                    ~0   $218K
  Yonkers                               1.13                    ~0   $228K
Japan (90)                              1.07                  0.01   $168K
Hong Kong (81-89):
  Happy Valley           2,212          1.04                  0.25   $1.1M
  Shatin                                0.94                  0.04   $1.1M
China (23-35):
  Shanghai                 730          1.03                  0.38   unknown
In Table 1, the first column lists racetracks in the US, Japan, Hong Kong, and Mainland China; the second column gives the number of races at each track; the third column gives the estimated parameter β; and the fourth column gives the p-value for $H_0: \beta = 1$ versus $H_1: \beta \neq 1$. The estimated β's are greater than 1 for all racetracks in the US and Japan, and significantly so at the 5% level except at Atlantic City (p = 0.08), indicating a favorite-longshot bias. No such bias appears for the Hong Kong and Shanghai racetracks; at Shatin the estimate is in fact significantly below 1. The last column gives the average size of the win pool at each racetrack, showing a huge difference between Hong Kong and the rest of the racetracks. One hypothesis is that the much larger pools in Hong Kong offer a higher expected gain, which has attracted more careful research into betting and hence more accurate bets. For example, Benter (1994) reports on scientific research conducted by a betting syndicate in Hong Kong.
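Because model (1) is a conditional logit in $\log P_i$, β can be estimated by maximizing the sum over races of $\log \pi_{\text{winner}}$. The sketch below does this by golden-section search and tests $H_0: \beta = 1$ with a likelihood-ratio statistic. Whether Table 1 used a likelihood-ratio or a Wald test is not stated in the paper, so the test choice is an assumption, and `simulate_races` generates synthetic data only; it is not the racetrack data in Table 1.

```python
import math
import random


def model_win_probs(fractions, beta):
    """Model (1): pi_i proportional to P_i**beta."""
    powered = [p ** beta for p in fractions]
    total = sum(powered)
    return [x / total for x in powered]


def log_likelihood(beta, races):
    """Sum over races of log pi_winner; races is a list of (bet_fractions, winner_index)."""
    ll = 0.0
    for fractions, winner in races:
        logs = [math.log(p) for p in fractions]
        top = max(beta * x for x in logs)  # log-sum-exp for numerical stability
        log_denom = top + math.log(sum(math.exp(beta * x - top) for x in logs))
        ll += beta * logs[winner] - log_denom
    return ll


def fit_beta(races, lo=0.05, hi=5.0, tol=1e-6):
    """Maximize the (concave) log likelihood in beta by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = log_likelihood(c, races), log_likelihood(d, races)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = log_likelihood(c, races)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = log_likelihood(d, races)
    return 0.5 * (a + b)


def lr_test_beta_equals_one(races):
    """Likelihood-ratio test of H0: beta = 1 with a chi-square(1) p-value."""
    beta_hat = fit_beta(races)
    stat = max(0.0, 2.0 * (log_likelihood(beta_hat, races) - log_likelihood(1.0, races)))
    return beta_hat, stat, math.erfc(math.sqrt(stat / 2.0))


def simulate_races(n_races, n_horses, beta, seed=0):
    """Synthetic data only: random bet fractions, winners drawn from model (1)."""
    rng = random.Random(seed)
    races = []
    for _ in range(n_races):
        raw = [rng.uniform(0.05, 1.0) for _ in range(n_horses)]
        total = sum(raw)
        fractions = [x / total for x in raw]
        winner = rng.choices(range(n_horses), weights=model_win_probs(fractions, beta))[0]
        races.append((fractions, winner))
    return races


beta_hat, stat, p_value = lr_test_beta_equals_one(simulate_races(3000, 8, beta=1.2))
print(f"beta_hat={beta_hat:.3f}, LR={stat:.1f}, p={p_value:.2g}")
```

Golden-section search is enough here because the log likelihood of a one-parameter conditional logit is concave in β.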
2.2 Utility Interpretation
Next, we employ economic utility theory to study the favorite-longshot bias based on model (1). Assuming that an expected-utility maximizer is indifferent between betting on any of the horses in a race (see Ali (1977)), with the utility of a losing bet normalized to zero, it can be shown that:

$$E(U_i) = \pi_i\, U(1 + O_i) = K \quad \text{for all } i,$$

and then, using model (1) and $P_i = (1 - t)/(1 + O_i)$,

$$U(1 + O_i) = \frac{K}{\pi_i} = \frac{K \sum_{j=1}^{n} P_j^{\beta}}{P_i^{\beta}} = \frac{K \sum_{j=1}^{n} P_j^{\beta}}{(1 - t)^{\beta}}\,(1 + O_i)^{\beta},$$

a power function, where $t$ is the track take. Then the Arrow-Pratt measure of absolute risk aversion is

$$-\frac{U''(x)}{U'(x)} = \frac{1 - \beta}{x}, \qquad (2)$$

which is negative, and increases with wealth, if $\beta > 1$.

A negative Arrow-Pratt measure means that bettors take more risk as their capital declines, i.e. they are “risk lovers,” if β > 1. See Ali (1977) and LeRoy & Werner (2006).
3. Predicting Ordering Probabilities with Running-time Distribution
3.1 Overview
While predicting the winner is important, it is also important to predict the second and third places. In horse racing, this is relevant to exacta and trifecta bets. To estimate ordering probabilities such as $\pi_{ij}$ (the probability of $i$ finishing first and $j$ finishing second) and $\pi_{ijk}$ (the probability of $i$ finishing first, $j$ second, and $k$ third), Harville (1973) proposed the following simple formulas:

$$\pi_{ij} = \frac{\pi_i \pi_j}{1 - \pi_i}, \qquad (3)$$

$$\pi_{ijk} = \frac{\pi_i \pi_j \pi_k}{(1 - \pi_i)(1 - \pi_i - \pi_j)}, \qquad (4)$$

where $\pi_i$ can be estimated by the bet fraction $P_i$.
(3) and (4) can be derived assuming independent exponential running times (or
equivalently in this context, extreme-value), a simple and perhaps unrealistic
assumption.
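Formulas (3) and (4) need only the win probabilities. The sketch below implements them and, as a sanity check on the exponential derivation just mentioned, compares (3) with a Monte Carlo simulation in which horse $i$'s running time is exponential with rate $\pi_i$; the rate choice, the example probabilities, and the helper names are assumptions made for illustration.

```python
import random
from itertools import permutations


def harville_exacta(pi, i, j):
    """(3): probability that horse i wins and horse j is second."""
    return pi[i] * pi[j] / (1.0 - pi[i])


def harville_trifecta(pi, i, j, k):
    """(4): probability of the finishing order i, j, k."""
    return pi[i] * pi[j] * pi[k] / ((1.0 - pi[i]) * (1.0 - pi[i] - pi[j]))


def simulated_exacta(pi, i, j, n_sims=100_000, seed=1):
    """Monte Carlo check: independent exponential running times with rate pi_r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        times = [rng.expovariate(rate) for rate in pi]
        first, second = sorted(range(len(pi)), key=times.__getitem__)[:2]
        hits += (first == i and second == j)
    return hits / n_sims


pi = [0.5, 0.3, 0.2]
total = sum(harville_exacta(pi, i, j) for i, j in permutations(range(len(pi)), 2))
print(total, harville_exacta(pi, 0, 1), simulated_exacta(pi, 0, 1))
```

Memorylessness is what makes (3) exact here: once the winner has finished, the remaining exponential times race afresh with the same rates.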
Other running-time distributions have been proposed for racing data to estimate ordering probabilities. However, the resulting formulas for ordering probabilities are usually not as simple as (3) and (4). Let $T_i$ be the running time of horse $i$; then the following procedure can be used to estimate ordering probabilities:
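Step 1: Estimate $\pi_i$. This can be estimated by the bet fraction $P_i$.

Step 2: Solve the following equation to estimate $\theta_i$:

$$\pi_i = P\big(T_i < \min_{r \neq i}\{T_r\}\big) = \int f(t \mid \theta_i) \prod_{r \neq i} \big[1 - F(t \mid \theta_r)\big]\, dt, \qquad (5)$$

where $\theta_i = E(T_i)$ or a location parameter, and $f(\cdot)$ and $F(\cdot)$ are the pdf and cdf, respectively.

Step 3: Estimate:

$$\pi_{ij} = P\big(T_i < T_j < \min_{r \neq i, j}\{T_r\}\big) = \int F(t \mid \theta_i)\, f(t \mid \theta_j) \prod_{r \neq i, j} \big[1 - F(t \mid \theta_r)\big]\, dt. \qquad (6)$$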
Similar integrals can be computed for higher order probabilities.
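For the normal case discussed next (Henery's $T_i \sim N(\theta_i, 1)$), Steps 1 to 3 can be sketched with plain numerical integration. The grid (composite Simpson's rule over ±8 standard deviations) and the damped fixed-point solver for Step 2 are illustrative choices, not the procedure of Henery (1981), and the win probabilities in the example are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _grid(theta, half_width=8.0, n_points=801):
    """Equally spaced points (odd count) covering every horse's running-time density."""
    lo, hi = min(theta) - half_width, max(theta) + half_width
    h = (hi - lo) / (n_points - 1)
    return [lo + k * h for k in range(n_points)], h


def _simpson(values, h):
    """Composite Simpson's rule for an odd number of equally spaced samples."""
    return h / 3.0 * (values[0] + values[-1]
                      + 4.0 * sum(values[1:-1:2]) + 2.0 * sum(values[2:-1:2]))


def win_probs(theta):
    """Step 2 integral: pi_i = int f(t|theta_i) prod_{r!=i} [1 - F(t|theta_r)] dt."""
    ts, h = _grid(theta)
    pdf = [[norm_pdf(t - th) for t in ts] for th in theta]
    surv = [[1.0 - norm_cdf(t - th) for t in ts] for th in theta]
    probs = []
    for i in range(len(theta)):
        vals = []
        for k in range(len(ts)):
            v = pdf[i][k]
            for r in range(len(theta)):
                if r != i:
                    v *= surv[r][k]
            vals.append(v)
        probs.append(_simpson(vals, h))
    return probs


def solve_theta(target, step=0.5, tol=1e-9, max_iter=500):
    """Step 2: find theta (centered at 0) whose model win probabilities match target."""
    theta = [0.0] * len(target)
    for _ in range(max_iter):
        diffs = [math.log(p) - math.log(q) for p, q in zip(win_probs(theta), target)]
        # A horse that wins too often under the model gets a slower mean time.
        new = [th + step * d for th, d in zip(theta, diffs)]
        mean = sum(new) / len(new)
        new = [th - mean for th in new]  # theta is identified only up to a shift
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new
        theta = new
    return theta


def exacta_probs(theta):
    """Step 3 integral for every ordered pair (i, j)."""
    ts, h = _grid(theta)
    pdf = [[norm_pdf(t - th) for t in ts] for th in theta]
    cdf = [[norm_cdf(t - th) for t in ts] for th in theta]
    n, out = len(theta), {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            vals = []
            for k in range(len(ts)):
                v = cdf[i][k] * pdf[j][k]
                for r in range(n):
                    if r not in (i, j):
                        v *= 1.0 - cdf[r][k]
                vals.append(v)
            out[(i, j)] = _simpson(vals, h)
    return out


target = [0.4, 0.3, 0.2, 0.1]   # Step 1: e.g. bet fractions
theta = solve_theta(target)     # Step 2
exacta = exacta_probs(theta)    # Step 3
print([round(t, 3) for t in theta], round(exacta[(0, 1)], 4))
```

Even for four horses this takes many integral evaluations, which illustrates why the exact Henery model is awkward to use race by race.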
Henery (1981) assumed that $T_i \sim N(\theta_i, 1)$ independently. This requires solving the system of integral equations in Steps 2 and 3 by numerical integration, which is not practical for real races. Similar practical difficulties apply to the gamma model proposed by Stern (1990), which involves an extra shape parameter. Lo and Bacon-Shone (2008) proposed a simple approximation to both the Henery and Stern models:
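$$\pi_{ij} = \pi_i \frac{\pi_j^{\lambda}}{\sum_{s \neq i} \pi_s^{\lambda}}, \qquad \pi_{ijk} = \pi_i \frac{\pi_j^{\lambda}}{\sum_{s \neq i} \pi_s^{\lambda}} \cdot \frac{\pi_k^{\tau}}{\sum_{t \neq i, j} \pi_t^{\tau}}, \qquad (7)$$

where the $\pi_s$'s can be estimated by the bet fractions $P_s$, and $\lambda$ and $\tau$ are parameter values given in Lo and Bacon-Shone (2008). Note that for exponential running times, $\lambda = \tau = 1$, and (7) reduces to (3) and (4).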
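Approximation (7) is as cheap to compute as the Harville formulas. In this sketch λ and τ are hypothetical placeholders, not the values reported in Lo and Bacon-Shone (2008); setting both to 1 recovers (3) and (4).

```python
from itertools import permutations


def discounted_exacta(pi, i, j, lam):
    """(7): pi_ij = pi_i * pi_j**lam / sum_{s != i} pi_s**lam."""
    denom = sum(p ** lam for s, p in enumerate(pi) if s != i)
    return pi[i] * pi[j] ** lam / denom


def discounted_trifecta(pi, i, j, k, lam, tau):
    """(7): pi_ijk = pi_ij * pi_k**tau / sum_{t not in {i, j}} pi_t**tau."""
    denom = sum(p ** tau for t, p in enumerate(pi) if t not in (i, j))
    return discounted_exacta(pi, i, j, lam) * pi[k] ** tau / denom


pi = [0.4, 0.3, 0.2, 0.1]
lam, tau = 0.8, 0.7  # hypothetical values, NOT those reported in Lo and Bacon-Shone (2008)
print(sum(discounted_trifecta(pi, *o, lam, tau) for o in permutations(range(4), 3)))
print(discounted_exacta(pi, 0, 3, 1.0), pi[0] * pi[3] / (1 - pi[0]))  # lam = 1: Harville (3)
```

With λ < 1 the longshots' chances of placing second are raised relative to Harville, which is the direction of correction the normal and gamma models suggest.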
Lo and Bacon-Shone (1994) found that the Harville model had a systematic bias in estimating ordering probabilities for Hong Kong data and that the Henery model was clearly superior in terms of model fit. Bacon-Shone, Lo, and Busche (1992b) reached a similar conclusion using Meadowlands data; however, Lo (1994b) found that the Stern model with shape parameter 4 fit better than both the Henery and Harville models on Japanese data. All these models and approximations assume independent running times. We now relax this assumption in a generalization of the Henery model.
3.2 Extension of the Henery Model
Recall that Henery (1981) assumed that $T_i \sim N(\theta_i, 1)$ independently. A natural extension is to assume a constant correlation, i.e. $\mathrm{Corr}(T_i, T_j) = \rho$ for all $i \neq j$ (and all races). However, it can easily be shown that this is equivalent to the Henery model with independent running times, so a more complex structure is proposed:

a) Non-constant correlation:

$$\rho_{ij} = \psi_i \psi_j, \quad i \neq j, \qquad (8)$$

where

$$\log\left(\frac{\psi_i}{1 - \psi_i}\right) = \delta - \gamma(\theta_i - \bar{\theta}), \qquad \bar{\theta} = \frac{1}{n}\sum_{i} \theta_i, \qquad (9)$$

i.e. correlations tend to be higher for stronger pairs (if $\gamma > 0$).

b) Non-constant variance:

$$\sigma_i = \exp[\kappa(\theta_i - \bar{\theta})], \qquad (10)$$

i.e. if $\kappa > 0$, weaker horses will have higher variance.

If $\gamma = \kappa = 0$, the model reduces to the Henery model.

To estimate the parameters $\delta$, $\gamma$, and $\kappa$ in (8) to (10) by maximum likelihood, we use the top 5 finishing positions to construct the likelihood function, because the correlation and non-constant variance structures are expected to matter more when estimating higher-order probabilities. Following Steps 1 to 3 in Section 3.1 for models (8) to (10), and using a first-order Taylor series approximation similar to Henery's (1981), it can be shown that, with Steps 1 and 2:
./11'and,1'
,
'
1
1
1
',
)1'('
'
''
,resp.cdf,andpdfnormalstandardareand
horses,n ofraceain
statisticordernormalstandardminimumoforigin aboutmomentsecond
horses,n ofraceainstatisticordernormalstandardminimumofvalueexpected
,'''
,horsforfractionbetwin
),(),/1(where
)11(
))(1)((
2
2
)2(
;1
;1
)2(
;1;11
11
0
1
00
nne
B
ne
nA
ABAM
ieP
Pznz
M
zznz
n
n
nn
i
ii
i
i
=+=
=
=
Φ
=
=
+=
=
Φ=Φ=
δ
δ
δ
δ
δδ
γ
κ
φ
µ
µ
µµ
φ
θ
Then, with Step 3, to predict horses $i_1, i_2, \ldots, i_5$ finishing in the first 5 positions:
positions. 5first in the finishing horses of nspermutatio allover summed isr denominato the and
n, of sample ain statisticorder normal standard minimum oforigin about moment secondith
n, of sample ain statisticorder normal standard expectedith
,'''
),15)...(1(,
)(
1
),/1(
)12(
)]}5/([{
)]}5/([{
}){min...(
)2(
;
;
)2(
;;
5
55
55
1
5
5,4,3,2,1
5
1
5
1
5
1
55
5
1
5
1
5
1
55
5,4,3,2,1
5,...,1
51
=
=
+=
+==Φ=
++Φ
++Φ
=
<
<
<
= = =
= = =
ni
ni
ninii
n
n
n
ttttt r r r
rtrrtr
r r r
rirrir
iiiiir
iir
ii
ABAM
nnnP
PC
PC
where
nMMC
nMMC
TTTP
µ
µ
µµ
φ
ν
θθν
θθν
π
Appendix A outlines the proof of (11) and (12). It can easily be shown that the log likelihood of data from multiple races is:

$$\log \mathrm{lik} = \sum_{l=1}^{\#\mathrm{races}} \log \pi_{l,[12345]},$$

where $\pi_{l,[12345]}$ is the probability of the top 5 horses actually finishing in the first 5 positions in race $l$, as a function of the parameters to be estimated.
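The likelihood only requires a function that returns the probability of an observed top-5 order. As a runnable stand-in for (12), the sketch below uses the Harville product (the independent exponential case), so it does not reproduce the extended models in Table 2; `ordering_prob` is a hypothetical hook where an implementation of (12) would be passed instead, and the example races are made up.

```python
import math
from itertools import permutations


def harville_top_k(pi, order):
    """Probability that the horses in `order` fill the first len(order) places,
    under independent exponential running times (the Harville product)."""
    prob, remaining = 1.0, 1.0
    for h in order:
        prob *= pi[h] / remaining
        remaining -= pi[h]
    return prob


def top5_log_likelihood(races, ordering_prob=harville_top_k):
    """log lik = sum over races l of log pi_{l,[12345]}.

    races: list of (win_probs, observed_first_five) pairs; ordering_prob is any
    function (pi, order) -> probability, e.g. an implementation of (12).
    """
    return sum(math.log(ordering_prob(pi, top5)) for pi, top5 in races)


pi = [0.3, 0.25, 0.2, 0.15, 0.1]
print(sum(harville_top_k(pi, order) for order in permutations(range(5))))
print(top5_log_likelihood([(pi, [0, 1, 2, 3, 4]), (pi, [1, 0, 3, 2, 4])]))
```

Maximizing this sum over the model parameters (δ, γ, κ in the extended models) is what produces estimates like those in Table 2.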
The above models were fitted to 400 eight-horse races in Hong Kong. The objective is to predict the probabilities of horses finishing in the first 5 positions.
Table 2: Comparison Between Henery and Extended Models

Model                                      Estimates              p-value of LR test vs. Henery
a1) Non-constant correlation (γ only)      γ = 0.58               0.06
a2) Non-constant correlation (γ and δ)     γ = 0.60, δ = 0.05     0.18
b) Non-constant variance                   κ = 0.08               0.06

(Note: the p-value indicates the significance of the difference between the extended model and the original Henery model by the likelihood ratio test.)
Table 2 indicates that the non-constant correlation structure with slope γ only and the non-constant variance structure show some promise (both significant at the 6% level).
Improving the ordering probability estimates is only meaningful if they can be used in practice. Hausch, Ziemba and Rubinstein (1981) assumed the Harville (1973) model and developed a stochastic nonlinear programming model based on the Kelly criterion (Breiman (1960), Algoet and Cover (1988), Haigh (2000)) to optimize bets. Using a similar optimization algorithm, Lo, Bacon-Shone and Busche (1995) demonstrated the superiority of the Henery and Stern models in terms of long-term returns at different racetracks. Hausch, Lo, and Ziemba (1994b), however, concluded that the Harville model was slightly better than the Henery model using a small data set for a particular type of bet. For future research, it will be interesting to see whether the above non-constant correlation or non-constant variance structure, while marginally better in terms of fit, also delivers better results in betting. Further, it would be useful to derive a simpler approximation, similar to (7), for (8) to (10) that can be applied in practice.
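The Kelly idea behind these betting models can be illustrated in its simplest single-bet form: with win probability $\pi_i$ and net odds $O_i$, the fraction of wealth that maximizes expected log wealth is $f^* = (\pi_i(1 + O_i) - 1)/O_i$ when this is positive. This sketch assumes the stake is small relative to the pool (so the bet does not move the final pari-mutuel odds) and is not the multi-bet stochastic programming model of Hausch, Ziemba and Rubinstein (1981); the example horse is hypothetical.

```python
import math


def kelly_fraction(p_win, net_odds):
    """Fraction of wealth maximizing E[log wealth] for a single win bet.

    Setting d/df [p*log(1 + f*O) + (1 - p)*log(1 - f)] = 0 gives
    f* = (p*(1 + O) - 1) / O; no bet (0) when the expected return is not positive.
    """
    return max(0.0, (p_win * (1.0 + net_odds) - 1.0) / net_odds)


def expected_log_growth(f, p_win, net_odds):
    """Expected log growth of wealth when staking fraction f on the bet."""
    return p_win * math.log(1.0 + f * net_odds) + (1.0 - p_win) * math.log(1.0 - f)


# Hypothetical horse: the model gives pi = 0.5 while the tote pays net odds 1.5.
f_star = kelly_fraction(0.5, 1.5)
print(round(f_star, 4), round(expected_log_growth(f_star, 0.5, 1.5), 5))
```

Better ordering probabilities matter here because the edge term $\pi_i(1 + O_i) - 1$ is only as good as the probability estimate plugged into it.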
4. Conclusion
Racing data is so rich that it provides many opportunities for study by academics and practitioners. While this paper focused on horse racing data, the techniques can be applied to other types of racing, such as car, boat, and dog racing.
In this paper, we studied two research areas in racing data. First, based on a rigorous yet simple statistical model, we found that the so-called favorite-longshot bias is not a universal phenomenon, although it appears to be consistent in the US. We suggested a hypothesis to explain the results, but racetrack data from more countries could be used for further research. Second, we attempted to improve existing ordering probability models using more complex correlation and variance structures. The results show some promise and deserve further investigation, especially in terms of generating returns in racetrack betting.
Appendix A: Approximation Formulas for the Non-Constant Correlation
and Non-Constant Variance Structures
This appendix outlines the proof of (11) and (12), which are first-order Taylor series approximations to the solution of (8) to (10). The approach is similar to that used by Henery (1981).
Given the structures in (8) to (10), it can be shown that the running times of horses in the same race can be expressed as (see Johnson and Kotz (1972, p.47)):
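$$T_i = \theta_i + \sigma_i\left(\psi_i U_0 + \sqrt{1 - \psi_i^2}\, U_i\right), \quad i = 1, 2, \ldots, n, \qquad (A.1)$$

where $U_0, U_1, \ldots, U_n \stackrel{iid}{\sim} N(0, 1)$. This means that $T_1, \ldots, T_n$ are correlated with each other only through $U_0$, which also implies:

$$T_i \mid U_0 = u \sim N\left(\theta_i + \sigma_i \psi_i u,\ \sigma_i^2 (1 - \psi_i^2)\right) \text{ independently}, \qquad (A.2)$$

where $\psi_i = \{1 + \exp[-\delta + \gamma(\theta_i - \bar{\theta})]\}^{-1}$ and $\sigma_i = \exp[\kappa(\theta_i - \bar{\theta})]$ according to (9) and (10). Then,

$$\pi_i = P\big(T_i < \min_{r \neq i}\{T_r\}\big) = \int \int f(y \mid u; \theta_i) \prod_{r \neq i} \big[1 - F(y \mid u; \theta_r)\big]\, dy\, \phi(u)\, du = g_i(\theta_1, \ldots, \theta_n), \qquad (A.3)$$

where $F(\cdot)$ and $f(\cdot)$ above are the cdf and pdf of $T_i$ given $U_0 = u$, and $\phi(\cdot)$ is the standard normal pdf.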
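Representation (A.1) also gives a direct way to simulate the extended model, which is useful for checking any approximation to (A.3). The Monte Carlo sketch below draws one shared shock $U_0$ per simulated race; the function names and the parameter values in the example are hypothetical, not the estimates in Table 2.

```python
import math
import random


def extended_params(theta, delta, gamma, kappa):
    """psi_i from (9) and sigma_i from (10)."""
    theta_bar = sum(theta) / len(theta)
    psi = [1.0 / (1.0 + math.exp(-delta + gamma * (th - theta_bar))) for th in theta]
    sigma = [math.exp(kappa * (th - theta_bar)) for th in theta]
    return psi, sigma


def simulate_orders(theta, delta, gamma, kappa, n_sims=20_000, seed=0):
    """Monte Carlo win and exacta probabilities with running times drawn from (A.1)."""
    rng = random.Random(seed)
    psi, sigma = extended_params(theta, delta, gamma, kappa)
    n = len(theta)
    wins, exacta = [0] * n, {}
    for _ in range(n_sims):
        u0 = rng.gauss(0.0, 1.0)  # race-level shock U_0 shared by all horses
        times = [theta[i] + sigma[i] * (psi[i] * u0
                                        + math.sqrt(1.0 - psi[i] ** 2) * rng.gauss(0.0, 1.0))
                 for i in range(n)]
        first, second = sorted(range(n), key=times.__getitem__)[:2]
        wins[first] += 1
        exacta[(first, second)] = exacta.get((first, second), 0) + 1
    return [w / n_sims for w in wins], {k: v / n_sims for k, v in exacta.items()}


win, exacta = simulate_orders([-0.5, 0.0, 0.2, 0.3], delta=0.3, gamma=0.5, kappa=0.1)
print([round(w, 3) for w in win])
```

Simulation error shrinks only like $1/\sqrt{n_{\text{sims}}}$, so this is a check on (11) and (12) rather than a replacement for them in likelihood fitting.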
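Applying the first-order Taylor series approximation to $g_i(\cdot)$ in (A.3) around $\theta_i$'s $= 0$:

$$g_i(\theta_1, \ldots, \theta_n) \approx g_i(0, \ldots, 0) + \sum_{j=1}^{n} \theta_j \left[\frac{\partial g_i(\theta_1, \ldots, \theta_n)}{\partial \theta_j}\right]_{(\theta_1, \ldots, \theta_n) = (0, \ldots, 0)}.$$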
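(11) can be obtained with the following additional approximations: $\partial \psi_r / \partial \theta_i \approx 0$ and $\partial \sigma_r / \partial \theta_i \approx 0$ for $r \neq i$, and the assumption that $\sum_{j=1}^{n} \theta_j = 0$.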
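We now find a general approximate formula for $\pi_{i_1, i_2, \ldots, i_m}$ ($m = 1, \ldots, n$):

$$\pi_{i_1, \ldots, i_m} = P\big(T_{i_1} < \cdots < T_{i_m} < \min_{r \notin \{i_1, \ldots, i_m\}}\{T_r\}\big) = \sum P\big(T_{i_1} < \cdots < T_{i_m} < A_{m+1} < \cdots < A_n\big) = \sum h\big(\theta_{i_1}, \ldots, \theta_{i_m}, \theta_{A_{m+1}}, \ldots, \theta_{A_n}\big), \qquad (A.4)$$

where $A_{m+1}, \ldots, A_n$ is a permutation of the running times of the remaining $n - m$ horses, and the summation is taken over all possible permutations.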
Each term in the above summation can be evaluated as follows:
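$$P\big(T_{i_1} < \cdots < T_{i_m} < A_{m+1} < \cdots < A_n\big) = \int \phi(u) \int \cdots \int_{t_1 < \cdots < t_m < a_{m+1} < \cdots < a_n} f_{i_1}(t_1 \mid u) \cdots f_{i_m}(t_m \mid u)\, f_{A_{m+1}}(a_{m+1} \mid u) \cdots f_{A_n}(a_n \mid u)\, da_n \cdots da_{m+1}\, dt_m \cdots dt_1\, du,$$

where $f_{i_1}(\cdot), \ldots, f_{i_m}(\cdot), f_{A_{m+1}}(\cdot), \ldots, f_{A_n}(\cdot)$ are the pdfs of $T_{i_1}, \ldots, T_{i_m}, A_{m+1}, \ldots, A_n$ (given $U_0 = u$), respectively.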
Using the first-order Taylor series approximation again for $h(\cdot)$ in (A.4) around $\theta_i$'s $= 0$, the numerator of (12) can be obtained for $m = 5$. The denominator of (12) ensures that the following is satisfied:

$$\sum_{i_1} \cdots \sum_{i_5} \pi_{i_1, \ldots, i_5} = 1.$$
It can be shown that, at least with the approximations in (11) and (12), (9) and (10) are equivalent when $\delta = 0$, and thus either $\gamma$ or $\kappa$ is needed but not both.
References
Algoet, P.H. and Cover T.H. (1988) “Asymptotic Optimality and Asymptotic
Equipartition Properties of Log-optimum Investment,” The Annals of
Probability, 16, No.2, p.876-898.
Ali, M.M. (1977) “Probability and Utility Estimates for Racetrack Bettors,” J. of
Political Economy, 84, p.803-815.
Asch, P., Malkiel, B., and Quandt, R. (1984) “Market Efficiency in Racetrack
Betting,” J. of Business, 57, p.165-174.
Bacon-Shone, J., Lo, V.S.Y., and Busche, K. (1992a) “Modelling the Winning
Probability,” Research Report 10, Dept. of Statistics, the University of Hong
Kong.
Bacon-Shone, J.H., Lo, V.S.Y., and Busche, K. (1992b) “Logistic Analyses of
Complicated Bets,” Research Report 11, Dept. of Statistics, the University of
Hong Kong.
Benter, W. (1994) “Computer Based Horse Race Handicapping and Wagering
Systems: A Report,” in Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. ed.
(1994) Efficiency of Racetrack Betting Markets, Academic Press, p.183-198.
Bolton, R.N. and Chapman, R.G. (1986) “Searching for Positive Returns at the
Track: A Multinomial Logit Model for Handicapping Horse Races,”
Management Science, 32, p.1040-1059. Reprinted in Hausch, D.B., Lo, V.S.Y., and
Ziemba, W.T. ed. (1994) Efficiency of Racetrack Betting Markets, Academic
Press, p.237-247.
Breiman, L. (1960) “Investment Policies for Expanding Businesses Optimal in a
Long-run Sense,” Naval Research Logistics Quarterly, 7, p.647-651.
Busche, K. and Hall, C.D. (1988) “An Exception to the Risk Preference
Anomaly,” J. of Business, 61, p.337-346.
Haigh, J. (2000) “The Kelly Criterion and Bet Comparisons in Spread Betting,”
The Statistician, 40, Part 4, p.531-539.
Harville, D.A. (1973) “Assigning Probabilities to the Outcomes of Multi-Entry
Competitions,” J. of American Statistical Association, 68, p.312-316.
Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. ed. (1994a) Efficiency of Racetrack
Betting Markets, Academic Press.
Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. (1994b) “Pricing Exotic Racetrack
Wagers,” in Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. ed. (1994)
Efficiency of Racetrack Betting Markets, Academic Press, p.469-483.
Hausch, D.B., Ziemba, W.T., and Rubinstein, M. (1981) “Efficiency of the Market
for Racetrack Betting,” Management Science, 27, p.1435-1452.
Henery, R.J. (1981) “Permutation Probabilities as Models for Horse Races,” J. of
Royal Statistical Society B, 43, p.86-91.
Henery, R.J. (1985) “On the Average Probability of Losing Bets on Horses with
Given Starting Price Odds,” J. of Royal Statistical Society A, 148, p.342-349.
Johnson, N.L. and Kotz, S. (1972) Distributions in Statistics: Continuous
Multivariate Distributions, John Wiley & Sons.
LeRoy S.F. and Werner J. (2006) Principles of Financial Economics, Cambridge.
Lo, V.S.Y. (1994a) “Application of Logit Models in Racetrack Data,” in Hausch,
D.B., Lo, V.S.Y., and Ziemba, W.T. ed. (1994) Efficiency of Racetrack
Betting Markets, Academic Press, p.307-314.
Lo, V.S.Y. (1994b) “Application of Running Time Distribution Models in Japan,”
in Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. ed. (1994) Efficiency of
Racetrack Betting Markets, Academic Press, p.237-247.
Lo, V.S.Y. and Bacon-Shone, J. (1994) “A Comparison between Two Models for
Predicting Ordering Probabilities in Multiple-Entry Competitions,” The
Statistician, 43, No.2, p.317-327.
Lo, V.S.Y. and Bacon-Shone, J. (2008) “Approximating the Ordering
Probabilities of Multi-Entry Competitions By a Simple Method,” To appear
in: Hausch, D.B. and Ziemba, W.T. ed. (2008) Handbook of Investments:
Efficiency of Sports and Lottery Markets, Elsevier.
Lo, V.S.Y., Bacon-Shone, J., and Busche, K. (1995) “The Application of Ranking
Probability Models to Racetrack Betting,” Management Science, 41, p.1048-
1059.
Lo, V.S.Y. and Busche, K. (1994) “How Accurately do Betters Bet in Doubles?,”
in Hausch, D.B., Lo, V.S.Y., and Ziemba, W.T. ed. (1994) Efficiency of
Racetrack Betting Markets, Academic Press, p.465-468.
Stern, H. (1990) “Models for Distributions on Permutations,” J. of American
Statistical Association, 85, p.558-564.
Thorp E.O. (1971) “Portfolio Choice and the Kelly Criterion,” Business and
Economics Statistics Section, Proceedings of the American Statistical
Association.
Ziemba, W.T. (2004) “Behavioral Finance, Racetrack Betting and Options and
Futures Trading,” Mathematical Finance Seminar, Stanford University.
Ziemba, W.T. and Hausch, D.B. (1987) Dr. Z’s Beat the Racetrack, Morrow.