Article

Prediction of the margin of victory only from team rankings for regular season games in NCAA men's basketball

Abstract

The main objective of this paper is to investigate the extent to which the margin of victory can be predicted solely from the rankings of the opposing teams in NCAA Division I men's basketball games. Several past studies have modeled this relationship for games played during the March Madness tournament, and this work aims to verify whether the models advocated in those papers still perform well for regular season games. Indeed, most previous articles have shown that a simple quadratic regression model provides fairly accurate predictions of the margin of victory when team rankings only range from 1 to 16. Does that still hold true when team rankings can go as high as 351? Do the model assumptions hold? Can we find semi- or non-parametric methods that yield even better results (i.e., predicted margins of victory that more closely resemble actual results)? The analyses presented in this paper suggest that the answer is "yes" on all three counts!
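As a rough illustration of the kind of quadratic regression the abstract refers to, the sketch below fits margin of victory as a quadratic function of the ranking gap by ordinary least squares. The abstract does not give the exact model specification, so the choice of predictor (the gap in rank between the two teams), the function names `fit_quadratic` and `predict_margin`, the coefficients, and the data are all assumptions made for illustration. The data are synthetic and are not results from the paper.

```python
import numpy as np


def fit_quadratic(rank_gap, margin):
    """Least-squares fit of margin = b0 + b1*gap + b2*gap**2 (assumed form)."""
    d = np.asarray(rank_gap, dtype=float)
    y = np.asarray(margin, dtype=float)
    # Design matrix with intercept, linear and quadratic terms.
    X = np.column_stack([np.ones_like(d), d, d ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_margin(coef, rank_gap):
    """Predicted margin for the better-ranked team, given the ranking gap."""
    d = np.asarray(rank_gap, dtype=float)
    return coef[0] + coef[1] * d + coef[2] * d ** 2


# Synthetic data only: gaps between 1 and 350, as with rankings from 1 to 351.
rng = np.random.default_rng(0)
rank_gap = rng.integers(1, 351, size=500)
assumed_coef = np.array([0.5, 0.06, -0.00005])  # hypothetical, not from the paper
margin = predict_margin(assumed_coef, rank_gap) + rng.normal(0.0, 10.0, size=500)

coef = fit_quadratic(rank_gap, margin)
print("estimated coefficients:", coef)
print("predicted margins for gaps 15, 100, 300:", predict_margin(coef, [15, 100, 300]))
```

With real game data, the residuals from such a fit are what one would inspect to check the model assumptions the abstract asks about.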

Article
The old adage says that it is better to be lucky than to be good, but when it comes to winning NCAA tournament pools, do you need to be both? This paper attempts to answer this question using data from the 2014 men's basketball tournament and more than 400 predictions of game outcomes submitted to a contest hosted by the website Kaggle. We begin by describing how we built a prediction model for men's basketball tournament outcomes under the binomial log-likelihood loss function. Next, under different sets of true underlying game probabilities, we simulate tournament outcomes and imputed pool standings, in an effort to determine how much of an entry's success can be attributed to luck. While one of our two submissions finished first in the Kaggle contest, we estimate that this winning entry had no more than about a 12% chance of doing so, even under the most optimistic of game probability scenarios.
Article
This paper uses the difference in seeding ranks to predict the outcome of March Madness games. It updates the Boulier-Stekler method by predicting the outcomes round by round. We also use the consensus rankings obtained from individuals, systems and polls. We conclude that the consensus rankings were slightly better predictors in the early rounds but had the same limitations as the seedings in the later rounds.
Article
We propose a new way of quantifying a team's strength of schedule for NCAA basketball. The strength of a schedule is defined as the number of games a team on the borderline of the annual national tournament would expect to win if they played that schedule. This gives a direct way of quantifying how well different teams have done relative to the schedules they have played. Our motivation for constructing this strength of schedule is to help inform the choice of teams given bids to the national tournament: teams who have won more games than their strength of schedule have the strongest evidence that they deserve such a bid. Estimating the strength of schedules is possible through fitting a simple statistical model to the results of all regular season matches. We are able to quantify the uncertainty in these estimates, which helps differentiate between teams with clear evidence for selection and those on the borderline. We apply our method to data from the 2007/08 and 2008/09 seasons. Our results suggest that St. Mary's warranted a bid to the 2009 tournament, at the expense of Wisconsin; and that
Book
Written by a prominent statistician and author, the first edition of this bestseller broke new ground in the then emerging subject of spatial statistics with its coverage of spatial point patterns. Retaining all the material from the second edition and adding substantial new material, Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition presents models and statistical methods for analyzing spatially referenced point process data. Reflected in the title, this third edition now covers spatio-temporal point patterns. It explores the methodological developments from the last decade along with diverse applications that use spatio-temporally indexed data. Practical examples illustrate how the methods are applied to analyze spatial data in the life sciences. This edition also incorporates the use of R through several packages dedicated to the analysis of spatial point process data. Sample R code and data sets are available on the author's website.
Book
This book describes an array of power tools for data analysis that are based on nonparametric regression and smoothing techniques. These methods relax the linear assumption of many standard models and allow analysts to uncover structure in the data that might otherwise have been missed. While McCullagh and Nelder's Generalized Linear Models shows how to extend the usual linear methodology to cover analysis of a range of data types, Generalized Additive Models enhances this methodology even further by incorporating the flexibility of nonparametric regression. Clear prose, exercises in each chapter, and case studies enhance this popular text.
Article
Sports events and tournament competitions provide excellent opportunities for model building and using basic statistical methodology in an interesting way. In this article, National Collegiate Athletic Association (NCAA) regional basketball tournament data are used to develop simple linear regression and logistic regression models using seed position for predicting the probability of each of the 16 seeds winning the regional tournament. The accuracy of these models is assessed by comparing the empirical probabilities not only to the predicted probabilities of winning the regional tournament but also the predicted probabilities of each seed winning each contest.
Article
Following the announcement by the NCAA of the seeding and placement of men's basketball teams in the regional tournaments, there is often much discussion among basketball aficionados of the fairness. A statistical analysis of simple regression models for the tournament games shows that indeed there is a strong association between the seed positions of the teams and the actual margin of victory; in fact, fairly reliable prediction models of actual margin of victory in tournament games can be achieved based primarily on the seed numbers alone.
Article
Several models for estimating the probability that a given team in an NCAA basketball tournament emerges as the regional champion were presented by Schwertman, McCready, and Howard. In this article we improve these probability models by taking advantage of external information concerning the relative strengths of the teams and the point spreads available at the start of the tournament for the first round games. The result is a collection of regional championship probabilities that are specific to a given region and tournament year. The approach is illustrated using data from the 1994 NCAA basketball tournament.
Article
Very little attention has been given to predicting outcomes of sporting events. While studies have examined the accuracy of alternative methods of predicting the outcomes of thoroughbred horse races, some obvious predictors of the outcomes of other sporting events have not been examined. In this paper, we evaluate whether rankings (seedings) are good predictors of the actual outcomes in two sports: (1) US collegiate basketball and (2) professional tennis. In this analysis we use statistical probit regressions with the difference in rankings as the predictor of the outcome of games and/or matches. We evaluate both the ex post and ex ante predictions using base rate forecasts and Brier scores. We conclude that the rankings, by themselves, are useful predictors and that the probits improve on this performance.
Article
The LRMC method for predicting NCAA Tournament results from regular-season game outcomes is a two-part process consisting of a logistic regression model to estimate head-to-head differences in team strength, followed by a Markov chain model to combine those differences into an overall ranking. We consider replacing each of the two parts of LRMC with alternative models, empirical Bayes and ordinary least squares, that attempt to accomplish the same goal. Computational results show that replacing the logistic regression with either of two empirical Bayes models yields a statistically significant improvement when the probabilities are jointly conditioned.
Article
This paper tests whether the differences in rankings between individual players are good predictors for Grand Slam tennis outcomes. We estimate separate probit models for men and women using Grand Slam tennis match data from 2005 to 2008. The explanatory variables are divided into three groups: a player's past performance, a player's physical characteristics, and match characteristics. We estimate three alternative probit models. In the first model, all of the explanatory variables are included, whereas in the other two specifications, either the player's physical characteristics or the player's past performances are not considered. The accuracies of the different models are evaluated both in-sample and out-of-sample by computing Brier scores and comparing the predicted probabilities with the actual outcomes from the Grand Slam tennis matches from 2005 to 2008 and from the 2009 Australian Open. In addition, using bootstrapping techniques, we also evaluate the out-of-sample Brier scores for the 2005-2008 data.
Article
Several authors have recently explored the estimation of binary choice models based on asymmetric error structures. One such family of skewed models is based on the exponential generalized beta type 2 (EGB2). One model in this family is the skewed logit. Recently, McDonald (1996, 2000) extended the work on the EGB2 family of skewed models to permit heterogeneity in the scale parameter. The aim of this paper is to extend the skewed logit model to allow for heterogeneity in the skewness parameter. By this we mean that, in the model developed here, the skewness parameter is permitted to vary from observation to observation by making it a function of exogenous variables. To demonstrate the usefulness of our model, we examine the issue of the predictive ability of sports seedings. We find that we are able to obtain better probability predictions using the skewed logit model with heterogeneous skewness than can be obtained with logit, probit, or skewed logit.
Article
This paper analyses, in a simple two-region model, the undertaking of noxious facilities when the central government has limited prerogatives. The central government decides whether to construct a noxious facility in one of the regions, and how to finance it. We study this problem under both full and asymmetric information on the damage caused by the noxious facility in the host region. We particularly emphasize the role of the central government prerogatives on the optimal allocations. We finally discuss our results with respect to the previous literature on NIMBY and argue that taking into account these limited prerogatives is indeed important.
Article
Systems for ranking college basketball or football teams take many forms, ranging from polls of selected coaches or members of the media to so-called computer-ranking systems. Some of these are used in ways that have considerable impact on the teams. The committee responsible for the selection and seeding of teams for the postseason National Collegiate Athletic Association (NCAA) Division I men's basketball tournament is influenced by various rankings, including ones based on the ratings percentage index (RPI). The Bowl Championship Series (BCS) rankings of NCAA Division I-A football teams determine which two teams compete in a postseason national championship game and determine eligibility for other prestigious postseason games. There are certain attributes that seem desirable in any ranking system to be used in selecting or seeding teams for postseason competition or that may have some other tangible or intangible effect on the teams. These attributes include accuracy, appropriateness, impartiality, unobtrusiveness, nondisruptiveness, verifiability, and comprehensibility. The polls, the RPI, and the BCS rankings are notably deficient in several of these attributes. A system having all of the attributes, except for unobtrusiveness, can be achieved by applying least squares to a statistical model in which the expected difference in score in each game is modeled as a difference in team effects plus or minus a home court/field advantage. The potential obtrusiveness of this approach can be circumvented by introducing modifications to reward winning per se and to eliminate any incentive for "running up the score" or for deliberately surrendering a lead so as to extend a game into overtime. The modified least squares system was applied to the 1999-2000 basketball and 1999-2001 football seasons. 
Its accuracy in predicting the outcomes of 73 postseason football games and 93 postseason basketball games was undiminished by the modifications and was comparable to that of the betting line.
Cleveland, W. S., Grosse, E. and Shyu, W. M. (1991). Local regression models. Chapter 8 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Hastie, T. J. (1991). Generalized additive models. Chapter 7 of Statistical Models in S, eds J. M. Chambers and T. J.
Hastie, T. J. (2016). gam: Generalized Additive Models. R package version 1.14. https://CRAN.R-project.org/package=gam.
  • Harville DA