Journal of Quantitative Analysis in Sports

Published by De Gruyter
Online ISSN: 1559-0410
Print ISSN: 2194-6388
This paper aims to analyse the efficiency of the teams that have participated in the last eight seasons in the Spanish football (soccer) league (LFP). Efficiency is based on technical aspects of the sport rather than economic variables, because such information is difficult to obtain from the clubs that make up the Spanish league. These measures are obtained for attack and defense, as both are essential parts of football as a sport. To measure efficiency we used a DEA model. Once the two facets mentioned above are defined, it is important to find out whether they are more directly related to the achievement of the points at stake in each match, as the final goal is to gain as many points as possible in order to win the competition or to avoid being relegated to the second division. Finally, we measure how the eight teams that have played in the first division for the last few seasons have performed. These teams are primarily the ones that have finished those seasons near the top of the table. To do this, a DEA-Window analysis is carried out with two windows, as this makes it possible to compare two consecutive seasons. Most teams change either players or their manager from one season to another, which means there is little sense football-wise in analysing windows of more than three seasons.
Home advantage was evaluated using data from three national netball competitions: the Commonwealth Bank Trophy, which ran from 1997 to 2007 in Australia; the National Bank Cup, which ran from 1998 to 2007 in New Zealand; and the Co-operative Netball Superleague, which ran from 2005/06 to 2008/09 in England. Mixed models were used to analyze the goal advantage in playing at home, and both resampling methods and mixed models were used to analyze the probability of winning at home. In Australia there was a significant overall goal advantage in playing at home of 1.9 goals, 95% CI (1.0, 2.7). This value is small compared to the average score for a home team of 53 goals per game. Only in 1997 and 2006 were significant goal advantages evident, both of value 3.3 goals, 95% CIs (0.5, 6.1) and (0.6, 6.0), respectively. In the Australian competition, across all years, there was a significantly higher probability of winning at home, p = 0.54, 95% CI (0.51, 0.56). For New Zealand and England there was neither a significant home goal advantage nor a home win advantage. These results are unexpected, as netball is derived from basketball, which has been shown to have a large home win advantage.
By definition, giving 100 percent effort all of the time is sustainable, but begs the question of how to define 100 percent effort. As a corollary, once a benchmark for defining 100 percent effort is chosen, it may be possible, even optimal, to give a greater amount of effort for a short period of time, while recognizing that this level of effort is not sustainable. This dynamic effort provision problem is analyzed in the context of effort and performance by National Basketball Association (NBA) players over the course of a season. Within this context, several benchmarks for sustainable effort are considered, but these are rejected by the data. Meanwhile, the data are consistent with the proposition that NBA players put forth optimal effort, even if such effort is not always sustainable.
Top Fifty Batsmen in Test Cricket: 1877-2006
The assessment of batsmen in cricket is largely based upon their average score: a Test average of 50 or over provides a rule-of-thumb for distinguishing great players from the merely good; Donald Bradman, with the highest Test average ever achieved (99.94), is generally regarded as the greatest of all batsmen even though many of his other achievements have been eclipsed. However, a ranking based on simple averages suffers from two defects. First, it does not take into account the consistency of scores across innings: a batsman might have a high career average but with low scores interspersed with high ones; another might have a lower average but with much less variation in his scores. Second, it pays no attention to the “value” of the player’s runs to the team: arguably, a century, when the total score is 600, has less value compared to a half-century in an innings total of, say, 200. The purpose of this paper is to suggest new ways of computing batting averages which, by addressing these deficiencies, complement the existing method and present a more complete picture of batsmen’s performance. Based on these "new" averages, the paper offers a revised ranking of the top fifty batsmen in the history of Test cricket.
This paper sets out to estimate univariate time series models on a selected set of offensive baseball measures from 1901 to 2005. The measures include home runs, bases on balls, runs batted in, doubles, and stolen bases. The paper next estimates the trends in these statistics simultaneously using a vector autoregressive time series model. Along the way, tests of assumptions underlying the time-series models are provided. Univariate time series results suggest that simple lag-1 models fit these offensive statistics quite well. The multivariate results show that a simple lag-1 vector autoregressive model also fits quite well. The results of the vector time series model indicate that most statistics are strongly predicted by their prior values. However, certain temporal dependencies among baseball measures are observed, suggesting the importance of examining covariation in baseball data over time.
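As a rough illustration of what a univariate lag-1 model means here, the sketch below fits x[t] = c + phi * x[t-1] by ordinary least squares. The function name `fit_ar1` and the toy series are assumptions made for illustration; they are not the paper's data or its estimation procedure.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    # Slope of the regression of each value on the value one step earlier.
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


# Hypothetical noise-free series generated by x[t] = 1 + 0.5 * x[t-1].
toy = [10.0]
for _ in range(20):
    toy.append(1.0 + 0.5 * toy[-1])

c_hat, phi_hat = fit_ar1(toy)
print(round(c_hat, 6), round(phi_hat, 6))
```

A vector autoregressive model extends the same idea by regressing each statistic on the lagged values of all the statistics at once.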
Major League Baseball's geographic line-up was essentially constant between 1903 and 1953, when the Boston Braves moved to Milwaukee and into a new publicly financed stadium built on their behalf. Other MLB teams moved prior to the 1954, 1955, 1958, 1961, 1968, 1970, 1972, and 2005 seasons. In addition to team relocation, MLB has added new teams several times since 1950. The first expansion occurred in 1961, in response to the threat of a proposed third league, with the addition of the Los Angeles Angels and a new incarnation of the Washington Senators. Subsequent MLB expansion occurred in 1962, 1969, 1977, 1993 and 1998. What had been a sixteen-team league in 1960 grew to thirty franchises by 2006, an increase of 87.5%. While there has been relatively little academic study of the effects of team relocation, "conventional wisdom" holds that MLB league expansion leads to distribution of the available baseball talent across more teams, and that the thinning of talent has a greater negative effect on quality of pitching than on hitting. Furthermore, it is widely believed that competitive balance is reduced for at least the first year following expansion. This paper analyzes the effects of MLB expansion and team relocation between 1950 and 2004 on trends in game attendance, within-season competitive balance, and the balance between offense and defense. This is done by testing a series of time-series "shock" models on a comprehensive statistical data set assembled from The Baseball Archive and a number of other sources. The models consider the effects of relocation and expansion shocks of differing relative magnitudes, while controlling for changes in MLB's population coverage, the effects of new stadia, and the consequences of labor strife.
Taking these controls into account, team relocation was found to have a depressing effect on the trend of increasing average MLB attendance per home date during 1950-2004, but not on competitive balance or on the balance between offense and defense. League expansion was found to have no net effect on trends in average MLB home date attendance or on the balance between offense and defense. However, expansion was found to depress growth in home date attendance for incumbent (i.e., non-expansion) teams, to reduce competitive balance, and to set back the trend of increasing average team fielding percentage. The reduction in competitive balance in expansion years was found to hold across all teams, as well as among incumbent teams only.
Before the space race and computer games, 1957 was a simpler time. Baseball card collectors, mostly young boys, had few distractions from their hobby. In 1957, there were 16 major league baseball teams. The Kansas City Athletics, who had moved from Philadelphia in 1955, were the westernmost team. For those living west of Saint Louis, baseball appetites were fed by the TV Game of the Week with Dizzy Dean, KMOX (the St. Louis radio station), the Sporting News, baseball cards, or some combination. The young baseball fans of the period were the “baby boomers”, who most likely were the major collectors in 1957. We examine several factors that influence the values of the Topps 1957 baseball card issue through the years and show how this baseball card set avoided the price collapse of the post-1980 baseball card issues.
Introduction: The Mitchell Report to the Commissioner of Baseball sought to characterize the extent to which the use of performance enhancing drugs (PEDs) proliferated through baseball during the last 15 years. While the Report was not primarily initiated to expose individual players, it nonetheless contained detailed accounts of alleged PED abuse by 89 current and former players including seasons in which the abuse occurred and type of abuse (steroids or human growth hormone (HGH)). Previous analyses have largely focused on the impact of PED abuse on individual players (Barry Bonds and Roger Clemens, for instance). The present study integrates data from the Mitchell Report to make inferences about the overall effects of PED abuse on offensive production. Methods: The Lahman database was queried for all offensive seasons from 1995 to 2007 (minimum 50 PA, no pitchers). Runs created per 27 outs (RC27) was used as an estimate of the offensive production of a player in a season. An adjusted RC27 (ADJRC27) was obtained by accounting for career progression effects to reduce the influence of the expected change in performance over time due to age. Information from the Mitchell Report identified each player season as a PED season or a non-PED season. General linear mixed effects models were constructed that modeled ADJRC27 as a function of PED use (Yes or No). Multiple models were considered to assess the PED effect under various assumptions. Results: The baseline model estimated a mean non-steroid ADJRC27 during the study period of 4.58. The effect of steroid use was an additional 0.58 ADJRC27, an increase in production of 12.6% (p=0.0108). Additional models considered the effect of being a player mentioned in the Mitchell Report, adjustments for baseline performance, and the influential effect of Barry Bonds' performance. The estimated steroid effect ranged from 3.9% to 18.0% among twelve different models. 
Similar analysis of HGH use showed no evidence of performance improvement. Conclusions: This analysis suggests a significant and substantial performance advantage for players who used steroids during the study period. It is estimated that offensive production increased approximately 12% in steroid users versus non-users. This analysis represents the first attempt to quantify the overall effect of PED abuse on offensive performance in baseball.
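The relative increase quoted for the baseline model can be checked directly from the two rounded figures given in the abstract. This is only a sketch of the arithmetic; the variable names are illustrative, and the rounded inputs give a figure that differs slightly in the last digit from the published 12.6%, presumably because the published value was computed from unrounded estimates.

```python
baseline_adjrc27 = 4.58  # mean non-steroid ADJRC27 reported in the abstract
steroid_effect = 0.58    # additional ADJRC27 attributed to steroid use

# Relative increase in offensive production implied by the two figures.
relative_increase = steroid_effect / baseline_adjrc27
print(f"{relative_increase:.1%}")
```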
Using data from the 4,858 baseball games that were played in the major leagues during the 2004 and 2005 seasons, four logit regression models that measure the likelihood of a team winning a game are estimated. Of particular interest is the effect of being the home team. As expected, the results indicate that a home-field advantage does exist in the major leagues, but only under certain circumstances. Specifically, the strength of the home-field advantage varies with the number of runs scored by the home team and with the run differential between the winning and losing team. The probability of a home team winning a game increases as it scores more runs, but it increases at a decreasing rate. Also, for a given number of runs scored, a home team is more likely to win a game than a visiting team. The home-field advantage is strongest in games where the run differential between the winning team and losing team is one run. It is weaker in games where the run differential is two runs and is non-existent in games where the run differential is three runs or more.
The book primarily provides a robust analysis of passing games, focusing on the player match-ups that dictated the outcome of all passing plays. Specifically, it breaks the field down into three depth zones (short, medium, and deep) and then analyzes the receivers’ and defensive backs’ comparative success in each of those zones. For receivers, stats include completion percentage in each zone, how open they were, how many passes they dropped, and the yards per attempt in each zone. For defensive backs, the author studied the same information from the opposite vantage point. The book also includes analysis of quarterbacks, the pass protection of the offensive line, and the pass rush of the defense.
This paper first presents a brief review of potential rating tools and methods for predicting success in the NCAA basketball tournament, including those methods (such as the Ratings Percentage Index, or RPI) that receive a great deal of weight in selecting and seeding teams for the tournament. The paper then proposes a simple and flexible rating method based on ordinal logistic regression and expectation (the OLRE method) that is designed to predict success for those teams selected to participate in the NCAA tournament. A simulation based on the parametric Bradley-Terry model for paired comparisons is used to demonstrate the ability of the computationally simple OLRE method to predict success in the tournament, using actual NCAA tournament data from 2006 and 2007. Given that the proposed method can incorporate several different predictors of success in the NCAA tournament when calculating a rating, and is shown to have better predictive power than a model-based approach, it should be considered as an alternative to other rating methods currently used to assign seeds and regions to the teams selected to play in the tournament. The predictive power of the model-based simulation approach is also discussed, given the success of this approach in 2007. The paper concludes with limitations and directions for future work in this area.
Football (soccer) is certainly the most popular sport in Belgium. In 2008-2009, the Belgian premier league (the so-called Jupiler Pro League) consisted of 18 teams, each playing every other team twice, home and away. Wins and draws earned 3 points and 1 point, respectively, and teams were ranked first by total points and then by total wins, regardless of the goal difference (or the number of goals scored), unlike in most European leagues. Obviously, the Belgian Champion should have been the team that ended the season on top of the ranking, but a very unlikely event happened: the first two teams, Sporting Anderlecht and Standard Liege, completed the season with exactly the same number of points (77) and the same number of victories (24). Although the goal difference was in favor of Anderlecht, an extraordinary playoff match was hastily organized to decide the championship, and Standard ended up winning. That dramatic denouement fuelled discussion among the large football-loving part of the Belgian population and raises the question: which team was really the best? The aim of this work is to objectively investigate this question through statistical modelling. Given the results of the whole season, a semiparametric model for the conditional probabilities of home win, tie and away win for each match, given the involved teams and other explanatory variables, is fit. The semiparametric nature of the model grants great flexibility and allows us to identify interesting and up to now ignored patterns in the above probabilities. Then, a large number of Monte-Carlo simulations are run as if the season were replayed a large number of times, which permits the estimation of absolute probabilities of many events of possible interest. In particular, clear evidence about which team really deserved the 2008-2009 Belgian football champion title appears.
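The "replay the season" idea can be sketched as follows, assuming the fitted model has already produced a home-win and a draw probability for every fixture. The function `simulate_title_odds`, the two-team fixture list, and its probabilities are hypothetical and are not taken from the paper. Ranking by points and then wins follows the rule described above; breaking any remaining tie at random is a simplification introduced here.

```python
import random


def simulate_title_odds(fixtures, n_seasons=10_000, seed=0):
    """Estimate each team's chance of finishing first by replaying the season.

    fixtures: list of (home, away, p_home_win, p_draw); the away-win
    probability is the remainder. Teams are ranked by points, then wins;
    any remaining tie is broken at random (a simplification).
    """
    rng = random.Random(seed)
    teams = sorted({team for home, away, _, _ in fixtures for team in (home, away)})
    titles = dict.fromkeys(teams, 0)
    for _ in range(n_seasons):
        points = dict.fromkeys(teams, 0)
        wins = dict.fromkeys(teams, 0)
        for home, away, p_home, p_draw in fixtures:
            u = rng.random()
            if u < p_home:              # home win: 3 points
                points[home] += 3
                wins[home] += 1
            elif u < p_home + p_draw:   # draw: 1 point each
                points[home] += 1
                points[away] += 1
            else:                       # away win: 3 points
                points[away] += 3
                wins[away] += 1
        champion = max(teams, key=lambda t: (points[t], wins[t], rng.random()))
        titles[champion] += 1
    return {team: titles[team] / n_seasons for team in teams}


# Hypothetical probabilities for a two-team double round-robin.
toy_fixtures = [("A", "B", 0.5, 0.3), ("B", "A", 0.4, 0.3)]
odds = simulate_title_odds(toy_fixtures, n_seasons=2000)
print(odds)
```

The same loop can count any other event of interest, such as a team finishing in the bottom places, by changing what is tallied at the end of each simulated season.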
Three of the most celebrated football leagues in the world include the English Premier League (EPL), Italy’s Serie A, and Spain’s La Liga. To date, little football research has been conducted that attempts to determine why these leagues are so successful. What is it that the EPL, La Liga, and Serie A do that fosters such a high caliber of play, and what pitch factors, if any, either (1) contrast or (2) connect these prestigious leagues? The paucity of rigorous inquiry has not deterred popular speculation—common folklore has not waited for hard data. Experts rush to characterize the perceived performance characteristics of these leagues with little hesitation. And these assumptions have, to some degree, taken on a life of their own: football’s answer to urban legend. This paper searches for key similarities and differences between these leagues that are bolstered by statistically significant findings as well as evidence to identify the key pitch factors that are associated with a team’s ultimate success within its respective league.
The search for great goalscorers in football has been traditionally tunnel-visioned. Prestigious individual football awards, such as the Golden Boot and the Ballon d'Or, are almost automatically given to the player with the highest goal count. Simple. There is usually nothing complicated about this selection process, except that this one-dimensional approach may be missing some less-than-obvious gems of goalscoring by not considering the overall contributions a player makes to his team’s offensive performance as indicated by more subtle measures. The contributions made during the 2009-2010 season by the 20 leading goal scorers from each of the six most prestigious football leagues in Europe (the English Premier League, Spanish La Liga, Italian Serie A, German Bundesliga, French Ligue 1, and the Dutch Eredivisie) are examined. Statistical methods are used to (1) adjust for the significant disparity in goal scoring prevalence between these leagues, to promote unbiased comparisons, and (2) account for the shorter, 34-game season used by two of the leagues as compared to the more common 38-game season.
Unlike many other sports where only the top ten or twenty participants have a realistic shot at victory, when 144 players tee it up at a PGA tournament every participant has a legitimate chance of winning. In golf, even the greatest players lose more often than they win, and long-shots and unknowns win some of the most prestigious events. With such parity, random chance plays a large part in determining the winner. This is evident from the world rankings of the four major champions in 2009: 69th, 71st, 33rd, and 110th. While statistical modeling is commonplace in many sports, particularly baseball, the golf world is largely untapped. Using historical data from one of golf’s major championships, the Masters, this paper establishes a technique for modeling hole-by-hole results. This research has two major benefits: the opportunity to calculate real-time winning percentages and the definition of the performance coefficient, which quantifies the level of performance within the player’s capability. For instance, the 2009 Masters winner, the 69th-ranked Angel Cabrera, had only a seven percent chance of defeating both Phil Mickelson and Tiger Woods over 72 holes. However, his performance coefficient of .01 signifies that he performed close to his optimal performance. While Tiger Woods and Phil Mickelson performed above average, with performance coefficients of .37 and .17, respectively, on this given week they were unable to better Cabrera.
The utilization of statistical methods in sports is growing rapidly. Sports teams use statistical analyses to evaluate players and game strategies, and sports associations develop ranking and ratings systems of players and teams. The evolution of the application of statistics to sports continues to be enhanced with extensive collaboration and interaction between sports analysts and professional statisticians. Unfortunately, opportunities for this collaboration are still relatively uncommon, as academic statisticians often work in isolation developing statistical methods for sports applications, while sports organizations often do not have access to well-trained statistical expertise and cutting edge statistical tools for the analysis of sports data.
The San Francisco Giants were crowned champions of Major League Baseball in 2010 after defeating the Texas Rangers in the World Series. The World Series matchup may have come as a surprise to many baseball fanatics; the Rangers ended the regular season with the worst record of any of the eight playoff teams, and the Giants ended with the fourth worst. Did these two teams simply catch fire at the right time? Or were they better than their regular season records showed? To answer these questions, the regular season statistics of individual players on each team were used to simulate the postseason. These simulations determined the probability with which each playoff team could have been expected to win the 2010 World Series.
The aims of this study were to describe match actions in ice sledge hockey on a team level and identify the differences between successful and less successful teams. Eight ice sledge hockey matches in the Winter Paralympics 2010 were recorded and analyzed using the Dartfish TeamPro 5.5 analysis program. The variables for the analysis were chosen based on the performance indicators commonly used in invasion games and nine variables with sufficient reliability were reported. The number of different match actions and the percentages of successful actions were compared between the winning and losing teams, teams in different categories (team’s position in the final ranking 1-4 or 5-8), and between different player roles (forwards and defensemen). Also a scoring analysis for 23 goals was executed. The average number of actions per team in a single match was 507 (±54). The most frequent actions were passes (36 percent of the analyzed actions), dribbles (18 percent), and received passes (16 percent). The success percentages for passes, received passes, dribbles and face-offs were 65±4, 82±4, 74±8 and 50±12. The scoring analysis showed that 96 percent of the goals were shot from a close distance. The most common attack types leading to a goal were possession in the attacking zone and attacks after conquered puck and the most common shot types dribbling+shot and receiving+shot. The average scoring efficacy was 6.2±4.7 percent. The match analysis revealed only slight differences between the winning and losing teams and teams in different categories. Thus, it seems evident that individual skills and mistakes most often determined the final outcomes of the matches.
A country's size in terms of population and economic status play a major role in the resources available to a national soccer team. Additionally, the national expectations for the team define what a successful result is. The purpose of this paper is to present a prediction for the 2010 Federation Internationale de Football Association (FIFA) World Cup based on socioeconomic variables. Building off of models presented by Johnson and Ali (2004) as well as Dyte and Clark (2000), new models are created and presented. First, a linear regression model accounts for the ratings and rankings of the nations participating. Then, the ratings are applied to a Poisson regression to predict the outcome for each of the 64 games in the World Cup. The results predict the tournament will be won by Brazil. The model was then subjected to back-testing using the 2006 World Cup tournament. The back-tested model ranked the eventual first, second and third place finishers higher than the FIFA official rankings. The paper then presents future directions for research and notes the rationale for omissions from the model.
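To make the last modelling step concrete, the sketch below turns two expected goal counts into win, draw and loss probabilities. It assumes the two scores are independent Poisson variables, which is an assumption made here for illustration; the rates 1.6 and 1.1 and the function names are hypothetical and are not outputs of the paper's regression.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals follow a Poisson(rate) law."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def match_outcome_probs(rate_a, rate_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson scores.

    Scorelines above max_goals per team are ignored; for football-sized
    rates their total probability is negligible.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, rate_a) * poisson_pmf(goals_b, rate_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Hypothetical expected goals for the two teams in one match.
p_win_a, p_draw, p_win_b = match_outcome_probs(1.6, 1.1)
print(round(p_win_a, 3), round(p_draw, 3), round(p_win_b, 3))
```

In a knockout game a drawn scoreline would still need a rule for extra time or penalties, which this sketch leaves out.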
The men’s NCAA basketball tournament is a popular sporting event often referred to as “March Madness.” Each year the NCAA committee not only selects but also seeds the tournament teams. Invariably there is much discussion about which teams were included and excluded as well as discussion about the seeding of the teams. In this paper, we propose an innovative heuristic measure of team success, and we investigate how well the NCAA committee seeding compares to the computer-based placements by Sagarin and the rating percentage index (RPI). For the 2011 tournament, the NCAA committee selection process performed better than those based solely on the computer methods in determining tournament success.
The centers of the five clusters of players found via the k-means algorithm according to each player's shot distribution. 
The logit curve fit to the proportion of three pointers in the data. 
We propose two new measures for evaluating the offensive ability of NBA players, using one-dimensional shooting data from three seasons beginning with the 2004-05 season. These measures improve upon currently employed shooting statistics by accounting for the varying shooting patterns of players over different distances from the basket. This variance also provides us with an intuitive metric for clustering players, wherein each player's performance is calculated and compared to his cluster center as a baseline. To further improve the accuracy of our measures, we develop our own variation of smoothing and shrinkage, reducing any small sample biases and abnormalities. The first measure, SCAB, or Scoring Ability Above Baseline, measures a player's ability to score as a function of time on court. The second metric, SHTAB, or Shooting Ability, calculates a player's propensity to score on a per-shot basis. Our results show that a combination of SCAB and SHTAB can be used to separate out players based on their offensive game. We observe that players who are highly ranked according to our measures are regularly considered top performers on offense by experts, with the notable exception of LeBron James; the same claim holds for the offensive dregs. We suggest possible explanations for our findings and explore possibilities of future work with regard to player defense.
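The clustering step can be illustrated with a plain k-means (Lloyd's algorithm) on shot distributions. Everything specific in this sketch is an assumption: each player is represented by a hypothetical tuple of shot shares (close, mid-range, three-point), two clusters are used instead of the five reported in the figure caption above, and the starting centers are fixed by hand so the result is reproducible.

```python
def nearest(point, centers):
    """Index of the center closest to point in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))


def kmeans(points, centers, n_iter=20):
    """Lloyd's k-means from given start centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: group every point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
    return centers, [nearest(p, centers) for p in points]


# Hypothetical shot shares (close, mid-range, three-point) for four players.
players = [(0.7, 0.2, 0.1), (0.6, 0.3, 0.1), (0.2, 0.3, 0.5), (0.1, 0.3, 0.6)]
centers, labels = kmeans(players, centers=[players[0], players[2]])
print(labels)
```

Each resulting center is the average shot distribution of its cluster, which is the kind of baseline a player's own performance could be compared against.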
This paper determines that rankings in both the Media and Coaches college football Top 25 Polls are significantly accurate at their tops, insignificantly accurate towards their bottoms, and significantly more accurate at their tops than their bottoms. The computer-based Sagarin Poll is significantly accurate at both its top and bottom, and significantly more accurate at its top than its bottom. Comparing the Media and Coaches Polls to the Sagarin Poll suggests that the Media and Coaches Polls have diminishing accuracy because of both imperfections in voter behavior and smaller actual differences in team quality at lower ranks.
The basic statistics of evaluation variables according to the dominant and non-dominant legs and soccer and control groups
Purpose: It is important for soccer players to accurately kick a ball in various directions with both legs. This study aimed to examine the accuracy of the kick in soccer players in terms of kick direction, kicking legs (dominant and non-dominant legs), and experience. Methods: Seventeen male soccer players (age: 20.1 ± 1.1 yr) and fourteen male university students without soccer experience (age: 19.8 ± 1.2 yr) participated in the experiment. They kicked a ball from the penalty spot to the targets set at four corners of the goal and the accuracy of each kick was measured. Results: The soccer group’s score was significantly higher than that of the control group regardless of the kick direction or kicking legs. The score of the upper position targets was significantly lower than that of the lower position targets in both groups. There was no significant difference between the scores of the lower position targets of dominant and non-dominant legs in the soccer group. In addition, the score of reverse cross directions (kicking towards the right side of the goal with the right-leg) was significantly higher than that of cross directions (kicking towards the left side of the goal with the right-leg) in soccer players. Conclusion: Soccer players differ non-significantly between the dominant and non-dominant legs in the accuracy of inside kick with low trajectory and have high accuracy when kicking in the reverse cross direction.
Linear models for paired comparisons, the Bradley-Terry model and the Thurstone-Mosteller model in particular, are widely used in sports for ranking and rating purposes. By their formulation, these models predict the probability that a player or team defeats another if the playing strengths of the players or teams are known. In this paper, we investigate the prediction accuracy of the two linear models by using them to describe three simple theoretical games which mimic actual sports and whose winning probability, given the playing strength of each player, can be expressed explicitly. A theoretical result is presented, which provides the basis of a linearization method that enables these games to be represented by linear models. The predicted winning probabilities from the linear models are then compared to the actual ones. Comparisons are also made in prediction accuracy between the Bradley-Terry model and the Thurstone-Mosteller model.
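For reference, the two models can be written in a common textbook parameterization as below. The function names, and the choice of a unit-scale normal for the Thurstone-Mosteller model, are assumptions of this sketch rather than the paper's notation.

```python
import math


def bradley_terry(strength_i, strength_j):
    """P(i beats j) for positive strengths: s_i / (s_i + s_j)."""
    return strength_i / (strength_i + strength_j)


def thurstone_mosteller(rating_i, rating_j):
    """P(i beats j) = Phi(r_i - r_j), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf((rating_i - rating_j) / math.sqrt(2.0)))


# Hypothetical strengths and ratings for two players.
print(bradley_terry(2.0, 1.0))
print(thurstone_mosteller(0.5, 0.0))
```

Writing the Bradley-Terry strengths as s = exp(r) gives a logistic function of the rating difference, which is why both are called linear models: the winning probability depends on the players only through r_i - r_j.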
Pitcher intent, as measured by the position of the catcher's glove before a pitch is thrown, is an element of baseball that is regularly observed by commentators (“he's missing his spots”) but remains an uncaptured aspect of statistical analysis of the game, offering many potential insights into pitcher performance that have yet to be exploited. In order to collect this data systematically for public consumption (a far from trivial task), I propose and design a number of mechanisms for manual collection of this data from video playback using an offline charting approach, the direct indication of catcher position on the video, or a combination of the two. Through a pilot study conducted via a web applet, I find that there are considerable advantages to the direct-on-video method of charting catcher spots, including a higher inter-rater reliability as a consequence of higher precision and fewer replays needed of each pitch for a measurement to be taken, suggesting that direct video analysis, rather than lower-tech zone assessment, will be the preferred method for collecting catcher spotting data as the method becomes more popular.
This work extends the work of Coleman (2005), which provides a football rankings system that minimizes game score violations. The modified model developed in this paper, called AccuV, incorporates several other aspects of most rankings systems, including strength of schedule, margin of victory, and home field advantage. These other components of a team's rank are included via an additional variable called the composite index. In addition to these new components, other bounding constraints have been added. These new constraints significantly reduce the solve-time of the model. The model of Coleman (2005) achieved a minimum of 55 game score violations for the complete 2002 season in about 36 seconds with 234,313 iterations as reported by LINGO. However, with the addition of the new constraints, the new model (without the additional ranking components) established the same minimum in only 6 seconds with a total of 55,119 iterations. This work is also easily extendable to allow the addition of other components if desired.
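The quantity being minimized can be sketched as a simple count, taking a game score violation to mean a game won by the team that the ranking places lower. That reading, the function `count_violations`, and the three-team example are assumptions for illustration. The brute-force search over permutations only works for tiny examples; the paper solves an optimization model in LINGO instead.

```python
from itertools import permutations


def count_violations(ranking, games):
    """Count games won by the team ranked below its opponent.

    ranking: sequence of team names, best first.
    games: list of (winner, loser) pairs.
    """
    position = {team: i for i, team in enumerate(ranking)}
    return sum(1 for winner, loser in games if position[winner] > position[loser])


# Hypothetical results forming a cycle: A beat B, B beat C, C beat A.
toy_games = [("A", "B"), ("B", "C"), ("C", "A")]
best = min(permutations(["A", "B", "C"]), key=lambda r: count_violations(r, toy_games))
print(best, count_violations(best, toy_games))
```

Because the three results form a cycle, no ranking can avoid every violation, so the minimum in this toy case is one.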
The aim of the present investigation was to develop a behavioural instrument for measuring achievement motivation in sport matches. Following a 5-stage behavioural measurement system, the instrument was established and applied to the Iran national soccer team across three matches. The results revealed good validity and good intra-rater and inter-rater reliability for measuring motivational behaviours in sport contexts. In addition, repeated-measures analysis of variance showed the applicability of the new instrument for studying the association of achievement behaviours with successful performance, through significant differences between achievement behaviours in different matches with varied outcomes (p<.05). It seems the developed instrument can be used by coaches to discriminate achievement behaviours of players during the match and to select their strategy and players’ substitutions according to their trends and behaviours for success.
Representation of command actions for use of the program
Inter-evaluator reliability indicators for each component of the time profile of judo matches
Intra-evaluator reliability indicators for each component of the time profile of judo matches
Judo is an intermittent sport. As such, it is important to characterize the actions involved in combat so that training may be structured in such a way as to simulate competitive demands. However, to do so, an objective notational analysis system is necessary. The aim of the present study was to design a computer program that would aid in the analysis of time structure of specific actions during judo combat and test its inter-evaluator and intra-evaluator reliability. Ten male judokas, divided by class and category, were evaluated during three combats each. The matches were filmed and the evaluators used the computer program Saats™ (Structural Analysis of Action and Time in Sports) to analyze the following actions: break, grip, technique, fall and groundwork. The sequences were characterized by the sum of actions between each break. A total of 276 action sequences were evaluated, with a mean of 11 action sequences per combat, with four on the ground. Two evaluations were carried out by three evaluators for each judo match (inter-evaluator agreement), with only one being an expert in the software used. There was a lack of similarities in the results of only two of the variables (p<0.05). Evaluations by the same evaluator (intra-evaluator agreement) demonstrated a high reliability on all six variables. It was concluded that the use of this computer software for notational analysis in judo greatly assists in the detailing of actions performed by the athletes. The use of this software by professionals unfamiliar with it likely requires a short learning period. Knowledge of judo actions will very likely allow practitioners of the sport to be trained more specifically.
The Opta Index is a prestigious performance measure used to assess English Premier League (EPL) football players. Although the Opta model is proprietary, the general structure uses a multiattribute collection of subjectively weighted pitch measures that either rewards or penalizes a player with a potential range of points based on the quality of his game performance. In addition, the specific set of measures used depends upon player position: forwards, midfielders, defenders, and goalkeepers each have their own unique set of measures even though there might be some overlap. Although the player's Opta Index is calculated for each game, it is the cumulative "grade card" (the final Opta Index calculated at the end of the thirty-eight game EPL season in May) that is of particular importance. The index, along with the large array of player pitch data, is commercially distributed to the EPL clubs and appears in a wide variety of television and print media outlets. This paper proposes an alternative to using the full set of Opta data by identifying those specific pitch actions that form a statistically significant retrodictive linear regression model for the 2007-2008 EPL season. Additionally, the importance of evaluating pitch actions historically assumed to be clearly pertinent measures, such as goals allowed per game for the goalkeeper, will not only be appraised from a statistical viewpoint, but also from a practical perspective.
The goal of this paper is to develop an adjusted plus-minus statistic for NHL players that is independent of both teammates and opponents. We use data from the shift reports on in a weighted least squares regression to estimate an NHL player's effect on his team's success in scoring and preventing goals at even strength. Both offensive and defensive components of adjusted plus-minus are given, estimates in terms of goals per 60 minutes and goals per season are given, and estimates for forwards, defensemen, and goalies are given.
Regression-based adjusted plus-minus statistics were developed in basketball and have recently come to hockey. The purpose of these statistics is to provide an estimate of each player's contribution to his team, independent of the strength of his teammates, the strength of his opponents, and other variables that are out of his control. One of the main downsides of the ordinary least squares regression models is that the estimates have large error bounds. Since certain pairs of teammates play together frequently, collinearity is present in the data and is one reason for the large errors. In hockey, the relative lack of scoring compared to basketball is another reason. To deal with these issues, we use ridge regression, a method that is commonly used in lieu of ordinary least squares regression when collinearity is present in the data. We also create models that use not only goals, but also shots, Fenwick rating (shots plus missed shots), and Corsi rating (shots, missed shots, and blocked shots). One benefit of using these statistics is that there are roughly ten times as many shots as goals, so there is much more data when using these statistics and the resulting estimates have smaller error bounds. The results of our ridge regression models are estimates of the offensive and defensive contributions of forwards and defensemen during even strength, power play, and short handed situations, in terms of goals per 60 minutes. The estimates are independent of strength of teammates, strength of opponents, and the zone in which a player's shift begins.
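The effect of collinearity, and the way ridge regression tames it, can be sketched with a toy example. The code below solves the ridge normal equations (X'X + λI)β = X'y using only the standard library so that it stays self-contained; in practice a numerical library would be used. The design matrix, response values, penalty λ, and function names are all hypothetical, and real plus-minus models code on-ice players and weight observations in ways not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y_k for row, y_k in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Two hypothetical teammates who are always on the ice together: their columns
# are identical, so X'X is singular and ordinary least squares has no unique
# solution. The ridge penalty makes the system solvable and splits the credit.
X = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]  # made-up per-shift outcomes
beta = ridge(X, y, lam=1.0)
print(beta)
```

With perfectly collinear columns the penalty assigns both players the same estimate, shrunk toward zero. Larger λ shrinks further, trading some bias for smaller error bounds.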
Home-Advantage-Corrected Ideal Standard Deviation of Winning Percentage
One measure of sports league competitive balance uses a ratio: the standard deviation of team winning percentages is divided by the so-called ideal standard deviation, which assumes a game between evenly-skilled teams is equally likely to be won by either team. In fact, a team is more likely to win when playing at home than when playing on the road. The extent of this advantage differs across sports leagues. Home advantage reduces the variability of season-long team records. Ignoring home advantage biases upward the traditionally measured ideal standard deviation and biases downward the ratio of standard deviations. The authors derive a balanced league standard deviation formula that accounts for home advantage, use it to recompute the ratio of standard deviations for major sports leagues, and consider how the adjustment affects comparisons of competitive balance across those leagues.
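The direction of the bias can be checked with a short calculation. The traditional ideal standard deviation for a season of G games is 0.5/sqrt(G). If instead every game between evenly skilled teams is won by the home side with probability h, each team plays half its games at home, and games are independent, the variance of a team's win count is G·h(1−h), giving a standard deviation of winning percentage of sqrt(h(1−h)/G). This is a sketch under those stated assumptions and may differ in detail from the formula the authors derive; the function names and the example values are hypothetical.

```python
import math


def ideal_sd(games):
    """Traditional ideal SD of winning percentage: every game is a coin flip."""
    return 0.5 / math.sqrt(games)


def ideal_sd_home(games, home_win_prob):
    """Ideal SD when the home side wins with probability home_win_prob.

    Assumes evenly skilled teams, half of each team's games at home,
    and independent game outcomes.
    """
    return math.sqrt(home_win_prob * (1.0 - home_win_prob) / games)


def balance_ratio(actual_sd, ideal):
    """Ratio of the observed SD of winning percentages to an ideal SD."""
    return actual_sd / ideal


# Hypothetical league: 82 games, home win probability 0.60, observed SD 0.15.
games, h, observed = 82, 0.60, 0.15
print(balance_ratio(observed, ideal_sd(games)))
print(balance_ratio(observed, ideal_sd_home(games, h)))
```

Since h(1−h) is at most 0.25, the corrected ideal standard deviation is never larger than the traditional one, so the corrected ratio is never smaller. This matches the direction of bias described in the abstract.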
The existence of home advantage in Australian Rules football (AFL) has been well documented in previous literature. This advantage typically refers to the net advantage of several factors which, generally speaking, have a positive effect on the home team and a negative effect on the away team. However, this practice excludes the in-course dynamics of home advantage throughout the match including the interrelationship between pre-game and in-game team characteristics. The aim of the present study is to calculate the intra-match home advantage for each quarter in AFL by incorporating the interaction between team quality and current score. Archival AFL data was obtained from seasons 2000 to 2009 which consisted of year, round, quarter, (nominal) home team, away team, home team score and away team score. Analysis of variance (ANOVA) on margin of victory was used to determine if there was a distinct difference between team quality (favourite/underdog) within current score (ahead/behind). Since the in-game team characteristics (current score) are likely to be caused by pre-game characteristics (team quality) the margin of victory is adjusted for team quality. The results provide marginal evidence that home underdogs in the third quarter irrespective of whether they were ahead or behind at half time receive a greater advantage than home favourites. Furthermore, home advantage is greatest in the final quarter when there is a high level of uncertainty about the outcome of the match.
Recently it was reported that in the NBA as a whole, two thirds of the home advantage which teams enjoy when playing at home is accumulated in the first quarter. Home advantage can also be determined for individual teams, and there is good reason for doing so. For example, the relation of home advantage to team statistics such as assists, rebounds, and turnovers can be studied team-specifically but not in the league as a whole. Before any such project is undertaken, however, a major technical problem must be addressed. Formally, team-specific home advantage is a difference score between positively correlated variables (games won at home minus games won away), and difference scores are notoriously unreliable. This unreliability, moreover, is not just an empirical generalization. There is a formal basis for it in the theory of mental tests. This study reports that over a four-year period in the NBA the estimated reliability of team-specific home advantage was 0.284, even though the estimated reliabilities of games won at home and games won away were 0.772 and 0.833 respectively. The implications for research on home advantage are discussed.
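The formal basis mentioned above can be illustrated with the classical test theory formula for the reliability of a difference score D = X − Y, which assumes uncorrelated measurement errors. The component reliabilities below are those quoted in the abstract; the correlation between home and away wins and the equal-variance default are hypothetical values chosen only for illustration, so the printed result is not the study's estimate.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = X - Y under classical test theory.

    r_xx, r_yy: reliabilities of X and Y.
    r_xy:       observed correlation between X and Y.
    sd_x, sd_y: observed standard deviations (equal by default, an assumption).
    """
    covariance = r_xy * sd_x * sd_y
    true_variance = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2.0 * covariance
    observed_variance = sd_x ** 2 + sd_y ** 2 - 2.0 * covariance
    return true_variance / observed_variance


# Reliabilities from the abstract; r_xy = 0.7 is a made-up correlation.
print(difference_score_reliability(0.772, 0.833, r_xy=0.7))
```

When X and Y are uncorrelated and have equal variances, the difference score is as reliable as the average of its components. As the correlation rises, the shared true-score variance cancels in the subtraction while the error variance does not, so reliability falls.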
In papers on basketball, it is standard practice to treat the home-court advantage in terms of percentages or point differences at the end of the game. This practice leaves out of account how the advantage develops during the game, when it accumulates most strongly, its course and the in-course dynamics. This study analyzes all games played in two seasons of the NBA by quarters and overtime periods. The main result is that home advantage in the NBA is strongly front-loaded. In both years studied the home team accumulated two thirds of the home advantage it had at the end of the game in the first quarter. It accumulated less of an advantage in the second and third quarters, and still less in the fourth quarter. Further, the home team does not on average lengthen its lead in quarters which it enters ahead, but gains strongly in any quarter which it enters behind. The paper concludes with a discussion of theoretical issues raised by these results and next steps in research.
To date, the factors which lead to the very large home court advantage characteristic of the NBA have not yet been well isolated. This study analyzes the relationship between that home court advantage and the comparatively fewer days of rest between games that the NBA schedule imposes on visiting teams. A statistical model has been developed and applied to the NBA data for the 2004-2005 and 2005-2006 seasons to estimate the importance of the effect of rest on the magnitude of the home court advantage. The results indicate that lack of rest for the road team, while not a dominant factor, is an important contributor to the home court advantage in the NBA.
This study examines home advantage in American college football games from a multilevel perspective. It quantifies the extent and significance of that home advantage and examines how it varies between BCS and non-BCS teams as well as analyzing the relationship between home advantage and team parity. Our results indicate that home advantage exists for most teams and conferences. It equates to a 6-point advantage for home teams and a 3-point disadvantage for away teams when controlling for team strength and other predictors. It concludes that after controlling for team ability, non-BCS teams possess a stronger home advantage than BCS teams. Such a result is likely related to the greater parity among BCS teams which leads to a “choking under pressure” effect for them in closely played games.
Received wisdom in baseball takes it as a given that it is an advantage to have the last turn at bat in a baseball game. This belief is supported, implicitly or explicitly, by an argument that the team on offense benefits by knowing with certainty the number of runs it must score in the final inning. Because the discrete nature of plays in baseball lends itself naturally to a model of a baseball contest as a zero-sum Markov game, this hypothesis can be tested formally. In a model where teams may employ the bunt, stolen base, and intentional walk, there is no significant quantitative advantage conferred by the order in which teams bat, and in some cases batting first may be of slight advantage. In practice, the answer to the question may be determined by actions more subtle than previously considered, such as the extent to which the defensive team can influence the distribution of run-scoring by pitch selection or fielder positioning.
The issue of competitive balance is not normally considered in the study of home advantage. This paper focuses on home advantage, assessing and comparing it between seasons. The strength of the teams is estimated and linked with competitive balance. The results support that, in both the eighties and early nineties, the home advantage is more visible. After that, it tends to decrease. Meanwhile, competitive balance increases until the nineties and after that, the trend becomes unclear. The changes in the structure of the championship and the reward point system have affected the Portuguese league.
This article presents a method to measure the impact of the home field advantage for intra-conference college football. The method models longitudinal data across several years while utilizing a unique home field parameter for each individual team. Additionally, two novel yet intuitive measures of home field advantage are proposed. As a case study of the method and the definitions of home field advantage, teams with the best and worst home field advantages within their respective conferences are determined.
A current approach to the empirical study of the relationship between affect and the performance of athletes before and during a competition is idiographic in nature. Affect-performance zones are estimated for each athlete based on a sufficient number of paired affect and performance observations. Though extremely important for practitioners, the idiographic approaches introduced in the literature until now do not readily support generalizations across different populations (e.g., for different genders, levels of experience, and levels of expertise). This article illustrates how hierarchical linear modeling (HLM) can be effectively used to retain this idiographic focus, while also adding a nomothetic perspective describing the variation of individual affect-performance relationships across athletes. The article illustrates the computational and graphical options that, when appropriately used, can expand our understanding of the affect-performance linkage for both individual cases and populations of interest.
This study sought to establish the most important factors affecting the service in high-level women's volleyball and the relative weighting of such factors on this technical part of the sport. A total of 1300 services from eight matches played in two Final Fours of the Indesit European Champions League were analysed. The services were delivered by 58 players of 25 nationalities. Observation sheets and two video cameras located at both ends of the court were used. Service speed was measured by radar. The twelve variables studied enabled the service to be divided into four components. The most influential component (19.02% of total variance) comprised variables related to technical service characteristics (type of service and service speed). The second most influential component (15.16% of variance) was related to the opponents' technique and tactics, and to their position on court at the time of the service. The service was also affected by the technical and tactical movements that the servicer needed to perform in the subsequent play (12.20%). The stage of the match and the score (10.67%) also presented players with different levels of risk and helped to determine the type of service chosen and the power with which it was executed.
The importance of sports statistics to professional sports teams is clear and there are exciting new opportunities for statisticians to learn from the new types of data that are collected. The Journal of Quantitative Analysis in Sports has shown steady growth since its inception in 2005. The new editorial system and new directions and initiatives of the journal are described.
This study aimed to evaluate individual and team judo strengths using an analytic hierarchy process (AHP). The source of the decision was based only upon the international regulation values assigned to each of the decisive techniques: 10 points for an ippon, 7 for a waza-ari and 5 for yuko(s). The data were obtained from Japan's interscholastic athletic competition for men's judo (high school teams) in 2005. Our AHP technique demonstrated an ability to predict judo strengths of individual players, and to detect potential players regardless of unsuccessful team results. There was a significant but small correlation between the actual team rankings and predicted team strengths. The predicted team strengths, however, may be more informative and appropriate to evaluate judo strengths than the tournament results, as they were based on the content of each individual match, rather than just win/lose results. Coaches or talent scouts should consider the estimated individual or team strengths as well as competition results to make more accurate decisions when selecting players or teams.
Batting Average (AVG) and On-Base Percentage (OBP) are two of the most commonly cited statistics in baseball. Existing research has demonstrated that for a team, OBP is more closely correlated to runs scored than is AVG, and secondly, for players, OBP is more closely correlated over time than is AVG. We offer an algebraic explanation for the latter phenomenon. Specifically, we will prove that batting average depends more heavily upon a particularly unpredictable variable, hits per balls in play (HPBP), than does OBP. This result will explain why for both batters and pitchers, on-base percentage is a better indicator of future performance than batting average.
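One way to see the flavour of such an argument is to hold everything fixed except hits per ball in play and compare how far AVG and OBP move. The sketch uses the standard definitions AVG = H/AB and OBP = (H + BB + HBP)/(AB + BB + HBP + SF). The decomposition of hits into home runs plus HPBP times balls in play, with balls in play taken as AB − SO − HR, is a simplifying assumption made here and is not necessarily the definition or the proof used by the authors. The season line is invented.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats


def on_base_percentage(hits, walks, hbp, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def rates_from_hpbp(hpbp, at_bats, strikeouts, home_runs, walks, hbp, sac_flies):
    """Return (AVG, OBP) implied by a given hits-per-ball-in-play rate.

    Assumes balls in play = AB - SO - HR and hits = HR + hpbp * balls in play.
    """
    balls_in_play = at_bats - strikeouts - home_runs
    hits = home_runs + hpbp * balls_in_play
    return (batting_average(hits, at_bats),
            on_base_percentage(hits, walks, hbp, at_bats, sac_flies))


# Hypothetical season line; only the HPBP rate changes between the two calls.
line = dict(at_bats=500, strikeouts=100, home_runs=20, walks=60, hbp=5, sac_flies=5)
avg_lo, obp_lo = rates_from_hpbp(0.30, **line)
avg_hi, obp_hi = rates_from_hpbp(0.33, **line)
print(avg_hi - avg_lo, obp_hi - obp_lo)
```

The same change in HPBP adds the same number of hits to both numerators, but OBP divides by a larger denominator (roughly plate appearances rather than at-bats), so under these assumptions its swing is smaller than that of AVG.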
The 2007 gambling scandal involving a National Basketball Association (NBA) referee, coupled with the NBA's follow-up investigation, put allegations of basketball referee bias in the spotlight. This paper analyzes specific allegations of bias by Miami Heat coach and general manager Pat Riley against NBA referees Steve Javie and Derrick Stafford. In an analysis of every referee who officiated a Miami Heat game during a nine-year period, neither Javie nor Stafford exhibited systematic bias that had an adverse effect on the Miami Heat. In fact, the Heat performed slightly better than predicted when Javie officiated their games. The results provide real-world empirical evidence consistent with “confirmation bias,” a theory grounded in the finding that individuals with a vested interest in certain self-justifying outcomes may reach generalized conclusions unsupported by actual evidence.
Competition points are awarded in sports tournaments to determine which participants qualify for the playoffs or to identify a champion. We use competition points to measure strength in a prediction model and choose points to maximize prediction accuracy. This allows us to determine the allocation of competition points that most appropriately rewards strong teams. Our analysis focuses on Super Rugby, as the characteristics of this competition closely match our modelling assumptions. We find that the current allocation of competition points does not ensure that the strongest teams qualify for the playoffs and suggest an alternative. Our findings have implications for other competitions.
In this paper, we consider the National Football League Pick Value Chart and propose an alternative. The current Pick Value Chart was created approximately 20 years ago and has been used since to determine the value of draft selections for trading of draft selections. For this paper, we analyze the first 255 draft selections for the years 1991 to 2001. As part of our analysis, we consider four non-position dependent metrics to measure and model player performance at each of the first 255 draft selections. We perform a nonparametric regression of each performance metric onto player's selections. A comparison is then made between each fitted line and the Pick Value Chart. Having considered these comparisons, we propose an alternative Pick Value Chart.
The aim of this paper is to develop a game-theoretic framework to study the impact of player personnel changes on offensive productivity in American football. We develop a new model, the improvement in passing model, which is used to determine the optimal proportion of run and pass plays that a team should call. The model determines the optimal run/pass balance in terms of parameters that reflect a recent change (generally, in our case, an increase) in efficacy of a team's passing offense. The model assumes a residual positive effect on the team's running game occurs as a result of the improved passing attack. Several conclusions are drawn, most surprisingly that improvements in a team's expected gains via the passing game imply that the team should, in fact, run more frequently to optimize their overall offensive productivity. We conclude with an example studying the 2009 acquisition of Jay Cutler by the Chicago Bears.
Variable Importance Plots 
Confusion Matrix for All Models
Model 2(b) Training Data MDS Visualization 
RF Hitter Vote Percentage and Induction Probability (Training)
Model 3 Training Data MDS Visualization 
We predict the induction of Major League Baseball hitters and pitchers into the National Baseball Hall of Fame by the Baseball Writers' Association of America. We employ a Random Forest algorithm for binary classification, improving upon past models with a simplistic input approach. Our results suggest that the random forest technique is a fruitful line of research with prediction in the sports world. We find an error rate as low as 0.91% in our most accurate forest, with no out-of-bag error higher than 2.6% in any tree ensemble. We extend the results to an examination of the possibility of discrimination with respect to BBWAA voting, finding little evidence for exclusions based on race.
Top-cited authors
Joel Oberstone
  • University of San Francisco
Gilbert W Fellingham
  • Brigham Young University - Provo Main Campus
Emerson Franchini
  • University of São Paulo
Giovani Marcon
  • Universidade Federal de São Paulo
Jeremy Arkes
  • Naval Postgraduate School