Article

A Comparison Between Two Models for Predicting Ordering Probabilities in Multiple-Entry Competitions

Authors:
  • Fidelity Investments / Bentley University

Abstract

To predict ordering probabilities in a multiple-entry competition (e.g. a horse race), two models have been proposed. Harville proposed a simple and convenient model that can easily be used in practice. Henery proposed a more sophisticated model, but it has no closed-form solution. In this paper, we compare the two models empirically by applying a series of logit models to horse racing data. In horse racing, many previous studies have claimed that the win bet fraction is a reasonable estimate of the winning probability. To handle more complicated bet types that involve more than one finishing position, ordering probabilities (e.g. P(horse i wins and horse j finishes 2nd)) are required. The Harville and Henery models assume different running time distributions and therefore produce different sets of ordering probabilities. This paper shows that the Harville model does not always predict ordering probabilities as well as the Henery model. Our theoretical result shows that, if the running time of every horse is normally distributed, the probabilities produced by the Harville model are systematically biased for the strongest and weakest horses. We concentrate on horse racing, but the methodology can be applied to other multiple-entry competitions.
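To make the comparison concrete, the following Python sketch (NumPy assumed) starts from hypothetical normal mean running times with a common standard deviation, a Henery-type setup, and estimates win and exacta probabilities by simulation. It then plugs the simulated win probabilities into the Harville formula, P(i first, j second) = p_i p_j / (1 - p_i), so the two sets of ordering probabilities can be compared horse by horse. This is an illustration only, not the paper's logit-based estimation; the means, `sd`, and function names are assumptions.

```python
import numpy as np

def simulate_normal_orders(means, sd=1.0, n_sims=200_000, seed=0):
    """Simulate races where horse i's running time is N(means[i], sd**2);
    the smallest time wins. Returns (win, exacta) where exacta[i, j]
    estimates P(i first, j second)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    times = rng.normal(means, sd, size=(n_sims, means.size))
    order = np.argsort(times, axis=1)          # order[:, 0] is the winner
    k = means.size
    win = np.bincount(order[:, 0], minlength=k) / n_sims
    exacta = np.zeros((k, k))
    np.add.at(exacta, (order[:, 0], order[:, 1]), 1.0)
    return win, exacta / n_sims

def harville_exacta(p):
    """Harville: P(i first, j second) = p_i * p_j / (1 - p_i)."""
    p = np.asarray(p, dtype=float)
    ex = np.outer(p, p) / (1.0 - p)[:, None]   # row i divided by 1 - p_i
    np.fill_diagonal(ex, 0.0)                  # a horse cannot be 1st and 2nd
    return ex

# Hypothetical mean running times (lower = faster), common sd of 1.
win, ex_normal = simulate_normal_orders([0.0, 0.5, 1.0, 1.5, 2.0])
ex_harville = harville_exacta(win)
# P(finish second) for each horse under each model
print("normal  :", np.round(ex_normal.sum(axis=0), 3))
print("Harville:", np.round(ex_harville.sum(axis=0), 3))
```

Both matrices have row sums equal to the win probabilities, so any disagreement shows up in how each model spreads second place across the field.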

... He compared racing horses on the basis of values estimated from the win bet fractions. If P_i and P_j are the win probabilities of horses i and j, the formula (example from Lo & Bacon-Shone, 1994) for the probability that horse i wins the race and horse j finishes second is as follows: ...
... The assumption was later confirmed by Lo and Bacon-Shone (1994), who compared the Harville model with the more sophisticated model of Henery (1981) and showed that the Harville model had a systematic bias in estimating ordering probabilities. ...
Preprint
Horse racing has attracted many researchers who study market efficiency and apply complex mathematical formulas to predict race results. We were the first to compare selected machine learning methods for creating a profitable betting strategy for two common bets, Win and Quinella. Six classification algorithms were used under different betting scenarios, namely Classification and Regression Tree (CART), Generalized Linear Model (Glmnet), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Neural Network (NN) and Linear Discriminant Analysis (LDA). Additionally, variable importance was used to determine the leading horse racing factors. The data were collected from flat racetracks in Poland from 2011 to 2020 and describe 3,782 Arabian and Thoroughbred races in total. We managed to make a profit under specific circumstances and obtained a correct-bet ratio of 41% for the Win bet and over 36% for the Quinella bet using LDA and Neural Networks. The results demonstrate that it is possible to bet effectively using the chosen methods and indicate a possible market inefficiency.
... It is worth noting at this point that there are a number of different algorithms that could be used to estimate the probability of each letter appearing in each position. For example, one possibility is to treat the problem in the same way as one might try to compute the probabilities of the possible outcomes of a horse race given knowledge of the mean finishing times and variances for all horses (see, e.g., Henery, 1981a, 1981b; Lo & Bacon-Shone, 1994). The horses W, O, R, and D have mean finishing times of 1, 2, 3, and 4 minutes. ...
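The four-horse example in this snippet can be made concrete with a minimal Monte Carlo sketch in Python (NumPy assumed). The snippet gives only the mean finishing times, so the common standard deviation of 1 minute is an assumption, as is treating the running times as independent normal variables (a Henery-type setup). The code estimates the probability of each horse finishing in each position.

```python
import numpy as np

def position_probabilities(means, sd=1.0, n_sims=100_000, seed=0):
    """Estimate P(horse i finishes in position r) by simulating independent
    normal running times N(means[i], sd**2); lower time = better finish."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    times = rng.normal(means, sd, size=(n_sims, means.size))
    # ranks[s, i] = finishing position (0 = first) of horse i in simulation s
    ranks = np.argsort(np.argsort(times, axis=1), axis=1)
    k = means.size
    probs = np.zeros((k, k))
    for r in range(k):
        probs[:, r] = (ranks == r).mean(axis=0)
    return probs

horses = ["W", "O", "R", "D"]
# Mean finishing times in minutes; sd = 1 minute is an assumption.
P = position_probabilities([1.0, 2.0, 3.0, 4.0], sd=1.0)
for name, row in zip(horses, P):
    print(name, np.round(row, 3))
```

Each row and each column of `P` sums to one, since every horse takes exactly one position and every position is taken by exactly one horse.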
... Computationally, this is a very difficult problem. Although some approximations have been presented (see Lo & Bacon-Shone, 1994), we are not aware of any tractable solutions that would significantly speed up our simulations. To understand what follows, one of the most important things to appreciate is that word likelihoods are computed from the product of input letter probabilities. ...
Article
The goal of research on how letter identity and order are perceived during reading is often characterized as one of "cracking the orthographic code." Here, we suggest that there is no orthographic code to crack: Words are perceived and represented as sequences of letters, just as in a dictionary. Indeed, words are perceived and represented in exactly the same way as other visual objects. The phenomena that have been taken as evidence for specialized orthographic representations can be explained by assuming that perception involves recovering information that has passed through a noisy channel: the early stages of visual perception. The noisy channel introduces uncertainty into letter identity, letter order, and even whether letters are present or absent. We develop a computational model based on this simple principle and show that it can accurately simulate lexical decision data from the lexicon projects in English, French, and Dutch, along with masked priming data that have been taken as evidence for specialized orthographic representations.
... However, both the Henery and Stern models are complicated to apply in practice. Bacon-Shone, Lo & Busche (1992b) and Lo and Bacon-Shone (1994) showed that the Henery and Stern models fit better than the Harville model for particular racing data. Additionally, Lo and Bacon-Shone (2008) proposed a simple practical approximation for both the Henery and Stern models. ...
... Similar practical difficulties apply to the gamma model proposed by Stern (1990), which involves an extra shape parameter. Lo and Bacon-Shone (2008) proposed a simple approximation to both the Henery and Stern models. Lo and Bacon-Shone (1994) found that the Harville model had a systematic bias in estimating ordering probabilities for Hong Kong data and that the Henery model was clearly superior in terms of model fit. Bacon-Shone, Lo, and Busche (1992b) reached a similar conclusion using Meadowlands data; however, Lo (1994b) found that the Stern model with shape parameter 4 fit better than both the Henery and Harville models using Japanese data. ...
Article
Racing data provides a rich source of analysis for quantitative researchers to study multi-entry competitions. This paper first explores statistical modeling to investigate the favorite-longshot betting bias using worldwide horse race data. The results show that the bias is not universal. An economic interpretation using utility theory is also provided. Additionally, previous literature has proposed various probability distributions for modeling running times in order to estimate higher-order probabilities, such as the probabilities of finishing second and third. We extend the normal distribution assumption to include certain correlation and variance structures and apply the extended model to actual data. While horse race data are used in this paper, the methodologies can be applied to other types of racing data, such as cars and dogs.
... A line of research beginning with Harville [8] estimates the probabilities of the various possible orders of finish of a horse race, assuming knowledge of just the win probabilities of each individual horse. Henery [9], Stern [10], Bacon-Shone et al. [11], Lo and Bacon-Shone [12], and Lo and Bacon-Shone [13] extend this work, developing better and more tractable models. There has also been extensive research on the favorite-longshot bias, the phenomenon that the public typically underbets favored horses and overbets longshots, which skews the win probabilities implied by the odds. ...
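As a reminder of where "implied" win probabilities come from in a parimutuel win pool, here is a minimal Python sketch. It assumes a single track take `take` deducted from the pool and ignores breakage; under those assumptions, normalizing the inverse decimal odds recovers exactly the win bet fractions. The favorite-longshot bias concerns the gap between these fractions and the true win probabilities, which the sketch does not model. The pool amounts and the 18% take are hypothetical.

```python
def parimutuel_decimal_odds(bets, take):
    """Decimal odds (total return per unit staked) for each horse when the
    win pool, net of the track take, is shared among winning tickets."""
    total = sum(bets)
    return [(1.0 - take) * total / b for b in bets]

def implied_probabilities(decimal_odds):
    """Normalize inverse odds so they sum to one."""
    inv = [1.0 / o for o in decimal_odds]
    s = sum(inv)
    return [x / s for x in inv]

# Hypothetical win pool: amounts bet on three horses, 18% take.
odds = parimutuel_decimal_odds([500.0, 300.0, 200.0], take=0.18)
print([round(o, 3) for o in odds])
print(implied_probabilities(odds))  # equals the win bet fractions
```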
Article
Much work in the parimutuel betting literature has discussed estimating event outcome probabilities or developing optimal wagering strategies, particularly for horse race betting. Some betting pools, however, involve betting not just on a single event, but on a tuple of events. For example, pick six betting in horse racing, March Madness bracket challenges, and predicting a randomly drawn bitstring each involve making a series of individual forecasts. Although traditional optimal wagering strategies work well when the size of the tuple is very small (e.g., betting on the winner of a horse race), they are intractable for more general betting pools in higher dimensions (e.g., March Madness bracket challenges). Hence we pose the multi-brackets problem: supposing we wish to predict a tuple of events and that we know the true probabilities of each potential outcome of each event, what is the best way to tractably generate a set of n predicted tuples? The most general version of this problem is extremely difficult, so we begin with a simpler setting. In particular, we generate n independent predicted tuples according to a distribution having optimal entropy. This entropy-based approach is tractable, scalable, and performs well.
... To apply the model in practice (e.g., betting), we recommend collecting relevant data and choosing the most appropriate model, whether Henery or Stern (r), using a likelihood comparison and then applying Equations (8) and (9) using appropriate parameter values. Alternatively, we can estimate and directly through logistic modeling, for example, see Lo and Bacon-Shone (1994) and Lo (1994). The effect of this improved probability estimation on betting strategy (e.g., the Dr. Z system proposed by Hausch et al. 1981) may result in better returns, see Lo et al. (1995) and Hausch et al. (1994). ...
Chapter
To predict the ordering probabilities of multi-entry competitions (e.g., horse races), Harville (1973) proposed a simple way of computing the ordering probabilities from the win probabilities alone. This simple model is implied by assuming that the underlying model (e.g., running times in horse racing) is the independent exponential or extreme-value distribution. Henery (1981) and Stern (1990) proposed to use normal and gamma distributions, respectively, for the running time. However, both the Henery and Stern models are too complicated to use in practice. Bacon-Shone et al. (1992b) have shown that the Henery and Stern models fit better than the Harville model for particular horse racing datasets. In this chapter, we first give a theoretical result for the limiting case in which all the horses have the same ability. This theoretical result motivates an approximation of ordering probabilities for the Henery and Stern models. We then show empirically that this approximation works well in practice.
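A minimal Python sketch of one power-adjusted ("discounted") Harville form is shown below. The form P(i first, j second) ≈ p_i p_j^λ / Σ_{k≠i} p_k^λ and the exponent `lam` are assumptions for illustration; the chapter's own approximation and parameter values should be taken from the chapter itself. With `lam = 1` the formula reduces to Harville, and `lam < 1` pulls the conditional second-place probabilities toward equality.

```python
def discounted_harville_exacta(p, lam):
    """Assumed form: P(i first, j second) ~= p_i * p_j**lam / sum_{k != i} p_k**lam.
    lam = 1 reproduces the Harville formula p_i * p_j / (1 - p_i)."""
    k = len(p)
    ex = [[0.0] * k for _ in range(k)]
    for i in range(k):
        # normalizing constant over the horses that did not win
        denom = sum(p[m] ** lam for m in range(k) if m != i)
        for j in range(k):
            if j != i:
                ex[i][j] = p[i] * p[j] ** lam / denom
    return ex

p = [0.5, 0.3, 0.2]  # hypothetical win probabilities
for lam in (0.8, 1.0):
    print(lam, [round(x, 4) for x in discounted_harville_exacta(p, lam)[0]])
```

Whatever the value of `lam`, each row still sums to the winner's probability p_i, so the adjustment only redistributes second place.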
Article
Punters may bet on a selection in an event to finish first or to finish in one of a number of places, e.g. second, third or fourth. When the amounts staked with bookmakers at fixed odds on the win and the place are equal, the wager is called an each-way bet. Each-way bets are apparently popular with punters but are inconsistent with prominent models of wagering, which assume that gamblers are everywhere risk-seeking. In this note, we derive the conditions for win and place bets to be optimal in three models of risky choice. The mathematical conditions for the each-way wager to be optimal, as opposed to a win and place wager with different stakes, are complicated and appear likely to be met only rarely in practice. However, bettors clearly see the attraction of giving themselves two ways to win on the one horse by betting each way. We suggest that part of the attraction of each-way betting is that such bets are quick and easy to compute: a heuristic for an otherwise complex betting strategy.
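To see what an each-way bet pays, here is a minimal Python sketch of its expected profit. The place terms (place odds equal to a fixed fraction of the fractional win odds, such as one quarter) and the probabilities in the usage line are assumptions for illustration; the sketch does not reproduce the note's optimality conditions.

```python
def each_way_expected_profit(stake, win_odds, place_fraction, p_win, p_place):
    """Expected profit of an each-way bet: `stake` on the win at fractional
    odds `win_odds` plus `stake` on the place at win_odds * place_fraction.
    `p_place` is P(finishing in a paid place), which includes winning."""
    place_odds = win_odds * place_fraction
    if_win = stake * win_odds + stake * place_odds   # both halves pay out
    if_place_only = -stake + stake * place_odds      # win half is lost
    if_lose = -2.0 * stake                           # both halves lost
    return (p_win * if_win
            + (p_place - p_win) * if_place_only
            + (1.0 - p_place) * if_lose)

# Hypothetical: 1 unit each way at 8/1, quarter-odds place terms.
print(each_way_expected_profit(1.0, 8.0, 0.25, p_win=0.10, p_place=0.30))
```

With these inputs the three outcomes pay 10, 1 and -2 units, giving an expected profit of 0.1 * 10 + 0.2 * 1 + 0.7 * (-2) = -0.2 units.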
Article
We consider a specialized form of risk management for betting opportunities with low payout frequency, presented in particular for exotic horse race wagering. An optimization problem is developed which limits losing streaks with high probability to the given time horizon of a gambler, which is formulated as a globally solvable mixed integer nonlinear program. A case study is conducted using one season of historical horse racing data.
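A basic ingredient of this kind of risk control is the probability of a long losing streak. The Python sketch below computes it by dynamic programming over the length of the current losing run, assuming independent bets with a constant hit probability `p_win`; it is a building block only, not the paper's mixed integer nonlinear program, and the function name is hypothetical.

```python
def prob_losing_streak(n_bets, p_win, max_streak):
    """P(at least one run of `max_streak` or more consecutive losses in
    `n_bets` independent bets, each won with probability p_win)."""
    q = 1.0 - p_win
    # state[s] = P(no run of max_streak yet, current losing run == s)
    state = [0.0] * max_streak
    state[0] = 1.0
    for _ in range(n_bets):
        new = [0.0] * max_streak
        new[0] = p_win * sum(state)          # a win resets the run
        for s in range(max_streak - 1):
            new[s + 1] = q * state[s]        # a loss extends the run
        # mass at s == max_streak - 1 that loses leaves the "safe" states
        state = new
    return 1.0 - sum(state)

# Hypothetical exotic bet: 5% hit rate, 100 bets, streak of 50 losses.
print(prob_losing_streak(100, 0.05, 50))
```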
Article
Many games and sports, including races, involve outcomes in which competitors are rank ordered. In some sports, competitors may play in multiple events over long periods of time, and it is natural to assume that their abilities change over time. We propose a Bayesian state-space framework for rank ordered logit models to rate competitor abilities over time from the results of multi-competitor games. Our approach assumes competitors’ performances follow independent extreme value distributions, with each competitor’s ability evolving over time as a Gaussian random walk. The model accounts for the possibility of ties, an occurrence that is not atypical in races in which some of the competitors may not finish and therefore tie for last place. Inference can be performed through Markov chain Monte Carlo (MCMC) simulation from the posterior distribution. We also develop a filtering algorithm that is an approximation to the full Bayesian computations. The approximate Bayesian filter can be used for updating competitor abilities on an ongoing basis. We demonstrate our approach to measuring abilities of 268 women from the results of women’s Alpine downhill skiing competitions recorded over the period 2002–2013.
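Independent extreme-value (Gumbel) performances lead to the rank-ordered (Plackett-Luce) logit likelihood, in which each next finisher is chosen from those remaining with probability proportional to exp(ability); with p_i proportional to exp(ability_i) this is the same structure as the Harville model. The Python sketch below evaluates that log-likelihood for one observed ranking. It deliberately ignores ties and the time-varying abilities of the paper's state-space model, and the function name is hypothetical.

```python
import math

def rank_ordered_logit_loglik(abilities, finishing_order):
    """Log-likelihood of an observed finishing order (winner first) under a
    rank-ordered logit model with competitor abilities `abilities`."""
    remaining = list(finishing_order)
    loglik = 0.0
    # The last finisher is chosen with probability 1, so it is skipped.
    for idx in finishing_order[:-1]:
        log_denom = math.log(sum(math.exp(abilities[m]) for m in remaining))
        loglik += abilities[idx] - log_denom
        remaining.remove(idx)
    return loglik

# Hypothetical abilities for three skiers and one observed result.
print(rank_ordered_logit_loglik([0.4, -0.1, 0.2], [0, 2, 1]))
```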
Article
Hausch et al. (HZR) (Hausch, D. B., W. T. Ziemba, M. Rubinstein. 1981. Efficiency of the market for racetrack betting. Management Sci. 27 1435--1452.) developed a betting system that demonstrated positive profits at two racetracks. The system assumes running times are distributed exponentially, but other distributions for running times (Henery [Henery, R. J. 1981. Permutation probabilities as models for horse races. J. Roy. Statist. Soc. B 43(1) 86--91.] and Stern [Stern, H. 1990. Models for distributions on permutations. J. Amer. Statist. Assoc. 85(410) 558--564.]) have been shown to produce a better fit in Bacon-Shone et al. (1992a), Lo (Lo, V. S. Y. 1994. Application of running time distribution models in Japan. D. B. Hausch, V. S. Y. Lo, W. T. Ziemba, eds. Efficiency of Racetrack Betting Markets. Academic Press, 237--247.), and Lo and Bacon-Shone (Lo, V. S. Y., J. Bacon-Shone. 1994. A comparison between two models for predicting ordering probabilities in multi-entry competitions. The Statistician 43(2) 317--327.) using data from Hong Kong, the Meadowlands, and Japan. The better fit, however, comes at the cost of much greater complexity in computing ranking probabilities. In response, Lo and Bacon-Shone (Lo, V. S. Y., J. Bacon-Shone. 1992. An Approximation to Ordering Probabilities of Multi-entry Competitions. Research Report 16, Department of Statistics, University of Hong Kong.) proposed a simple model of computing ranking probabilities which closely approximates those based on the Henery and the Stern models and fits the data as well. This paper couples the Lo and Bacon-Shone model and the HZR system. For data sets from the United States and Hong Kong, we show improved profit over the HZR system at lower levels of risk using final betting data assuming zero computational costs. With data from Japan, our model shows little difference in profits from the HZR system.
Article
A number of models have been examined for modelling probability based on rankings. Most prominent among these are the gamma and normal probability models. The accuracy of these models in predicting the outcomes of horse races is investigated in this paper. The parameters of these models are estimated by the maximum likelihood method, using the information on win pool fractions. These models are used to estimate the probabilities that race entrants finish second or third in a race. These probabilities are then compared with the corresponding objective probabilities estimated from actual race outcomes. The data are obtained from over 15 000 races. It is found that all the models tend to overestimate the probability of a horse finishing second or third when the horse has a high probability of such a result, but underestimate the probability of a horse finishing second or third when this probability is low.
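The comparison of model probabilities with "objective" probabilities from actual outcomes is essentially a calibration table. A minimal Python sketch (NumPy assumed) follows: it bins predicted probabilities of, say, finishing second, and reports the mean prediction next to the observed frequency in each bin. The bin edges and the toy inputs are assumptions for illustration, not the paper's data or procedure.

```python
import numpy as np

def calibration_table(pred, outcome, edges):
    """Group predictions into bins defined by `edges` and compare the mean
    predicted probability with the observed frequency in each bin.
    Returns (lower, upper, count, mean_pred, observed_freq) per non-empty bin."""
    pred = np.asarray(pred, dtype=float)
    outcome = np.asarray(outcome, dtype=float)   # 1 if the event happened
    bins = np.digitize(pred, edges[1:-1])        # bin index 0 .. len(edges) - 2
    rows = []
    for b in range(len(edges) - 1):
        mask = bins == b
        if mask.any():
            rows.append((edges[b], edges[b + 1], int(mask.sum()),
                         pred[mask].mean(), outcome[mask].mean()))
    return rows

# Toy inputs: predicted P(finish second) and whether it happened.
for row in calibration_table([0.1, 0.1, 0.6, 0.6], [0, 1, 1, 1], [0.0, 0.5, 1.0]):
    print(row)
```

Overestimation at high probabilities, as the abstract reports, would show up as observed frequencies below the mean predictions in the upper bins.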
Article
Many racetrack bettors have systems. Since the track is a market similar in many ways to the stock market, one would expect the basic strategies to be either fundamental or technical in nature. Fundamental strategies utilize past data available from racing forms, special sources, etc. to "handicap" races. The investor then wagers on one or more horses whose probability of winning exceeds that determined by the odds by an amount sufficient to overcome the track take. Technical systems require less information and only utilize current betting data. They attempt to find inefficiencies in the "market" and bet on such "overlays" when they have positive expected value. Previous studies and our data confirm that for win bets these inefficiencies, which exist for underbet favorites and overbet longshots, are not sufficiently great to result in positive profits. This paper describes a technical system for place and show betting for which it appears to be possible to make substantial positive profits and thus to demonstrate market inefficiency in a weak form sense. Estimated theoretical probabilities of all possible finishes are compared with the actual amounts bet to determine profitable betting situations. Since the amount bet influences the odds, and theory suggests that a logarithmic utility function is appropriate for maximizing long-run growth, the resulting model is a nonlinear program. Side calculations generally reduce the number of possible bets in any one race to three or fewer, hence the actual optimization is quite simple. The system was tested on data from Santa Anita and Exhibition Park using exact and approximate solutions (that make the system operational at the track given the limited time available for placing bets) and found to produce substantial positive profits. A model is developed to demonstrate that the profits are not due to chance but rather to proper identification of market inefficiencies.
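The log-utility criterion can be illustrated in its simplest setting: one bet at fixed odds, where maximizing expected log wealth gives the Kelly fraction f* = (bp - (1 - p)) / b for net odds b and win probability p. The Python sketch below is that simplification only; in parimutuel place and show pools the bettor's own stake moves the payoff, which is why the HZR model becomes a nonlinear program. Function names are hypothetical.

```python
import math

def kelly_fraction(p, b):
    """Fraction of wealth maximizing E[log wealth] for one bet that pays
    b per unit staked (net) with win probability p; 0 if there is no edge."""
    return max(0.0, (b * p - (1.0 - p)) / b)

def expected_log_growth(f, p, b):
    """E[log(wealth_after / wealth_before)] when betting fraction f."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)

# Hypothetical overlay: 60% chance at even money (b = 1).
f = kelly_fraction(0.6, 1.0)
print(f, expected_log_growth(f, 0.6, 1.0))
```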
Article
Some properties of models for the outcomes of races are described, these properties being consequences of a stochastic ordering of the permutations which define the outcomes of a race. Order statistics models which lead to stochastic ordering are also discussed; particular cases of these are the first‐order model of Plackett (1975) and the normal model of Upton and Brook (1974). An approximation for the normal model is suggested.
Article
A generalization of the equivalence noted by Henery (1981) of a system of exponential order statistics to a model of Plackett (1975) is given.
Article
This paper presents the results of an analysis of horse race data collected from Aqueduct and Belmont Park in 1970. These data are used to demonstrate the reliability of subjective evaluations when incentive is offered to the subjects.
Article
A model is presented which accounts for some of the empirical patterns of betting losses on horses: the punter discounts a constant fraction 1 - f of his losing bets, so that he believes his chances of losing are fq, where q is the true chance of losing. When compared with data from past flat racing seasons, the model is able to describe two important features: the average return from bets at given Starting Price X and the average over-round in races with n runners.
Chapter
To predict ordering probabilities (such as the probability that horse i wins and horse j finishes second), the Harville (1973) model has been the most popular. The model assumes that running times are independent and exponentially distributed. However, a recent empirical study shows that the Henery (1981) model has a better fit. In this paper, we consider the Stern (1990) model in addition to the two models above. We fit the Stern model to a Japanese data set and conclude that the Stern model with a particular value of the shape parameter is superior to the others. Under the assumption of the Stern model, we show that the Harville model has a systematic bias in predicting the ordering probabilities.
Article
The problem discussed is one of assessing the probabilities of the various possible orders of finish of a horse race or, more generally, of assigning probabilities to the various possible outcomes of any multientry competition. An assumption is introduced that makes it possible to obtain the probability associated with any complete outcome in terms of only the ‘win’ probabilities. The results were applied to data from 335 thoroughbred horse races, where the win probabilities were taken to be those determined by the public through pari-mutuel betting.
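Harville's assumption gives each subsequent place to the remaining horses in proportion to their win probabilities, so the probability of a (partial or complete) order of finish is a product of renormalized win probabilities; for the first two places it reduces to p_i p_j / (1 - p_i). A minimal Python sketch, with a hypothetical function name and win probabilities:

```python
from itertools import permutations

def harville_order_probability(p, order):
    """Probability that horses finish in `order` (winner first) under
    Harville's model, given win probabilities p. Each place goes to a
    remaining horse with probability proportional to its win probability."""
    prob = 1.0
    remaining_mass = 1.0
    for h in order:
        prob *= p[h] / remaining_mass   # renormalize over horses not yet placed
        remaining_mass -= p[h]
    return prob

p = [0.5, 0.3, 0.2]  # hypothetical win probabilities
exacta_01 = harville_order_probability(p, [0, 1])  # 0.5 * 0.3 / (1 - 0.5) = 0.3
total = sum(harville_order_probability(p, perm) for perm in permutations(range(3)))
print(exacta_01, total)
```

Summing over all complete orders returns one (up to rounding), which is a quick sanity check on any implementation.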
Article
A probability distribution is defined over the r! permutations of r objects in such a way as to incorporate up to r! minus 1 parameters. Problems of estimation and testing are considered. The results are applied to data on voting at elections and beanstores.
Article
A parametric distribution on permutations of k objects is derived from gamma random variables. The probability of a permutation is set equal to the probability that k independent gamma random variables with common shape parameter and different scale parameters are ranked according to that permutation. This distribution is motivated by considering a competition in which k players, scoring points according to independent Poisson processes, are ranked according to the time until r points are scored. The distributions obtained in this way include the popular Luce-Plackett and Thurstone-Mosteller-Daniels ranking models. These gamma-based distributions can serve as alternatives to the null ranking model in which all permutations are equally likely. Here, the gamma models are used to estimate the probability distribution of the order of finish in a horse race when only the probability of finishing first is given for each horse. Gamma models with shape parameters larger than 1 are found to be superior to the most commonly applied model (shape parameter 1). Examples are limited to small values of k because of the complicated calculations required to compute the distribution. Approximations that are easier to calculate are required before more extensive applications can be undertaken.
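The gamma model can be explored by simulation when exact calculation is too costly. The Python sketch below (NumPy assumed) draws independent gamma running times with a common shape and horse-specific rates and tallies win and exacta frequencies. With shape 1 the times are exponential and the results match the Harville probabilities; for shape values above 1, the rates would first have to be calibrated so that the simulated win probabilities match the given ones, a step this sketch omits. The rates and function name are hypothetical.

```python
import numpy as np

def simulate_gamma_orders(rates, shape, n_sims=200_000, seed=0):
    """Running time of horse i ~ Gamma(shape, scale = 1 / rates[i]); the
    smallest time wins. Returns simulated win probabilities and an exacta
    matrix with exacta[i, j] ~= P(i first, j second)."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    times = rng.gamma(shape, 1.0 / rates, size=(n_sims, rates.size))
    order = np.argsort(times, axis=1)
    k = rates.size
    win = np.bincount(order[:, 0], minlength=k) / n_sims
    exacta = np.zeros((k, k))
    np.add.at(exacta, (order[:, 0], order[:, 1]), 1.0)
    return win, exacta / n_sims

# Shape 1 (exponential): win probabilities are proportional to the rates.
win1, ex1 = simulate_gamma_orders([0.5, 0.3, 0.2], shape=1.0)
# Shape 4 with the same (uncalibrated) rates, for comparison.
win4, ex4 = simulate_gamma_orders([0.5, 0.3, 0.2], shape=4.0)
print(np.round(win1, 3), np.round(win4, 3))
```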
Article
Apparent irregularities between the win and the place betting markets in Australian horseracing are examined. Win odds are used to predict win probabilities from which place probabilities are estimated and compared with the place odds on offer. It is concluded that anomalies do in fact exist and are capable, in theory at least, of profitable exploitation.
Article
It is required to test a composite null hypothesis. High power is desired against a composite alternative hypothesis that is not in the same parametric family as the null hypothesis. In an earlier paper a modification of the Neyman‐Pearson maximum‐likelihood ratio test was suggested for this problem. The present paper gives some general comments on the formulation of the problem, a general large‐sample form for the test, and, finally, a number of examples.
Article
Random variables $X_i \sim N(\theta_i, 1)$, $i = 1, 2, \ldots, k$, are observed. Suppose $X_S$ is the largest observation. If the inference $\theta_S > \max_{i \neq S} \theta_i$ is made whenever $X_S - \max_{i \neq S} X_i > c$, then the probability of a false inference is maximized when two $\theta_i$ are equal and the rest are $-\infty$. Equivalently, the inference can be made whenever a two-sample two-sided test for difference of means, based on the largest two observations, would reject the hypothesis of no difference. The result also holds in the case of unknown, estimable, common variance, and in fact for location families with monotone likelihood ratio.
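In the worst case described above (two equal means, the rest at minus infinity, unit variance), every claim that $\theta_S$ is largest is false, and the claim is made when $|X_1 - X_2| > c$. Since $X_1 - X_2 \sim N(0, 2)$, the worst-case false-inference probability is $2(1 - \Phi(c/\sqrt{2})) = \operatorname{erfc}(c/2)$, and a level $\alpha$ is achieved with $c = \sqrt{2}\, z_{1-\alpha/2}$. A minimal Python sketch of that arithmetic, using only the standard library; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def false_inference_prob_worst_case(c):
    """Worst-case P(false inference): two equal means, others at -infinity,
    unit variance, so the claim is made when |X_1 - X_2| > c."""
    return math.erfc(c / 2.0)

def threshold_for_level(alpha):
    """Threshold c whose worst-case false-inference probability is alpha."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha / 2.0)

c = threshold_for_level(0.05)
print(c, false_inference_prob_worst_case(c))
```

This matches the two-sample two-sided test view: $c$ is the usual critical value scaled by the standard deviation $\sqrt{2}$ of the difference.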
Article
Subjective and estimated objective winning probabilities are obtained from 20,247 harness horse races. It is shown that bettors subjectively exaggerate the winning probability of a horse with a low chance of winning and depress that of a horse with a high chance of winning. Various hypotheses characterizing the bettors' behavior to explain the observed subjective-objective probability relation are explored. Under some simplified assumptions, a utility of wealth function of a decision maker is derived, and a quantitative summary measure of his risk attitude is defined. Attitude toward risk of a representative bettor is examined. It is found that he is a risk lover and tends to take more risk as his capital dwindles.
Article
The theory of risk bearing implies risk aversion. In every published study of horse race betting known to the authors, however, investigators reject this implication in favor of "risk-loving" behavior. Using the techniques of these studies, the authors examine a new data set from Hong Kong and find a rather different result: Hong Kong bettors seem to be either risk neutral or risk averse. A striking difference between the Hong Kong data and the previously studied North American data is the much larger betting volume per race. Copyright 1988 by the University of Chicago.
Article
Horse racing data permit interesting tests of attitudes toward risk. The present paper studies a new sample of racetrack results from Atlantic City, New Jersey. The questions examined are: (1) Are the market odds the best data for predicting the order of finish? (2) Do horses go off at odds that reflect their true probability of winning? (3) Is there any evidence that late bettors have better information than early bettors? It is found that market odds predict the order of finish well, but that ‘favorites’ are good bets and ‘long shots’ are poor ones. The data suggest that there does exist an ‘informed’ class of bettors and that bettors are on the whole neither risk neutral nor risk averse.
Article
Sangamon State University. An earlier version of this paper was presented December 20, 1976 at the Third Conference on Gambling, Las Vegas, Nevada. I am slightly embarrassed at the extent to which I am indebted to others for help of various kinds. Ron Sutherland suggested the framework for this study and provided helpful comments on various drafts. Nancy Jacob and Mark Rubinstein also made several recommendations. H. Fabro, J. Jung, N. Ostroot and J. Rogers gave many tireless hours collecting data. Also, I received the cooperation of the Illinois Racing Board and the Daily Racing Form's statistical editor Don Anderson. But my list would be totally inadequate without expressing my gratitude to J. Miller's T.A. seminar for requiring me to make a “contract” to complete this study.
Asch, P. and Quandt, R. (1987) Efficiency and profitability in exotic bets. Economica, 54, 289-298.
Bacon-Shone, J., Lo, V. S. Y. and Busche, K. (1992a) Modelling the winning probability. Research Report 10. Department of Statistics, University of Hong Kong, Hong Kong.
Bacon-Shone, J., Lo, V. S. Y. and Busche, K. (1992b) Logistic analyses for complicated bets. Research Report 11. Department of Statistics, University of Hong Kong, Hong Kong.
Fabricand, B. P. (1979) The Science of Winning: a Random Walk on the Road to Riches. New York: Van Nostrand Reinhold.
Ziemba, W. T. and Hausch, D. B. (1985) Betting at the Racetrack. New York: Strauss.
Ziemba, W. T. and Hausch, D. B. (1987) Dr. Z's Beat the Racetrack. New York: Morrow.