Abstract

Using a common framework, this paper presents a survey of the major world sports rating systems (WSRSs) in skiing (sponsored by the International Ski Federation (FIS)), men's tennis (Association of Tennis Professionals (ATP)), women's tennis (Women's Tennis Association (WTA)), soccer (Fédération Internationale de Football Association (FIFA)) and golf (Royal and Ancient Golf Club of St Andrews). These systems are not otherwise available in the literature. Each of the WSRSs has three phases: first, the observed results are weighted to provide points for each competition; second, these points are combined to provide a seasonal value; third, the seasonal values are combined to provide a rating. The final result or placement (and not the score or time) is the most important factor in determining points for a given competition. In skiing, men's tennis and women's tennis, the rating is calculated from results over one season, while three seasons are used in golf and six seasons are used in soccer. In cross-country skiing and men's tennis, the seasonal value is calculated from the sum of the best values from that season's competitions. In alpine skiing and women's tennis, the sum of all values from that season's competitions is used. In golf and soccer, an averaging process is used. Besides potentially encouraging more entries, a 'best' system and one using all values also generate simple integer ratings rather than the decimal ratings obtained with an averaging system. The simplest system is that of FIS in skiing, where one table of points is used for all alpine and cross-country disciplines. In contrast, considering that soccer (as a sport) prides itself on the simplicity of the game, it is surprising that FIFA's system is so complex. It is also surprising in soccer that a 'friendly' (often a pick-up exhibition used for player development) counts two-thirds as much as does a World Cup final played before a worldwide TV audience.
It is hoped that this survey will serve as a valuable resource for those studying sports rating systems.
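The three phases described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the points table, the function names, the `n_best` cutoff and the season weights are all hypothetical and are not taken from any federation's actual rules.

```python
from statistics import mean

# Hypothetical points table: finishing place -> points.
# Not the real table of FIS, ATP, WTA, FIFA or any other body.
POINTS_BY_PLACE = {1: 100, 2: 80, 3: 60, 4: 50, 5: 45}


def competition_points(place, weight=1):
    """Phase 1: convert a finishing place into weighted competition points."""
    return POINTS_BY_PLACE.get(place, 0) * weight


def seasonal_value(points, method="best", n_best=3):
    """Phase 2: combine one season's competition points into a seasonal value."""
    if not points:
        return 0
    if method == "best":      # sum of the n_best highest values
        return sum(sorted(points, reverse=True)[:n_best])
    if method == "all":       # sum of every value
        return sum(points)
    if method == "average":   # averaging process
        return mean(points)
    raise ValueError(f"unknown method: {method}")


def rating(seasonal_values, season_weights=None):
    """Phase 3: combine seasonal values into a single rating."""
    if season_weights is None:
        season_weights = [1] * len(seasonal_values)
    return sum(w * v for w, v in zip(season_weights, seasonal_values))


places = [1, 3, 2, 5]                        # one athlete's placements in a season
points = [competition_points(p) for p in places]
best = seasonal_value(points, "best", n_best=3)
total = seasonal_value(points, "all")
avg = seasonal_value(points, "average")
two_season_rating = rating([200, best], season_weights=[0.5, 1])
```

With integer points, the 'best' and 'all' methods stay integer-valued while the averaging method generally does not, which mirrors the abstract's remark about integer versus decimal ratings.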
... The XXIII Olympic Winter Games in PyeongChang, South Korea, February 9-25, 2018 feature 15 disciplines, i.e. alpine skiing (11), biathlon (11), bobsleigh (3), cross-country skiing (12), curling (3), figure skating (5), freestyle skiing (10), ice hockey (2), luge (4), Nordic combined (3), short track speed skating (8), skeleton (2), ski jumping (4), snowboarding (10), speed skating (14), where the numbers in brackets are the number of events in each discipline. ...
... Regarding bullet point 1, this article points out that mass ... A third work [5] surveys the major world sports rating systems in skiing, men's and women's tennis, soccer and golf. The author identifies three phases. ...
... This latter procedure gave only three references, presented in the next section. [3][4][5] Thereafter 54 studies for cross-country skiing, 13 studies for biathlon, nine studies for Nordic combined, and 15 studies for speed skating are reviewed. The studies are intended to be representative of the research domain, not exhaustive. ...
Article
Cross-country skiing, biathlon, Nordic combined, short track speed skating, and speed skating (12+11+3+8+14=48 events), i.e. five of the 15 disciplines in the 2018 Winter Olympics, require participants to reach the finish line in minimum time, while exerting mechanical propulsion power through flat terrain, uphill, and downhill. This article compares distances and times for these disciplines systematically with each other and with running, walking, and swimming in the Summer Olympics. Regarding physiological implications, the absence of distances below 6 km in biathlon, 5 km in Nordic combined, 1.2-1.5 km in cross-country skiing, and 0.5 km in speed skating means recruiting fewer competitors with sprint characteristics (type IIx fast isoforms muscles, etc.). The absence of distances above 10 km in speed skating and Nordic combined, and 20 km in biathlon, means recruiting fewer or other kinds of competitors with long distance characteristics. For example, high anaerobic threshold is important at greater distances, and high VO2max is important above intermediate distances. A new recruitment criterion for Olympic events is proposed, argued to recruit athletes fairly and be fair to spectators. The new criterion supplements current criteria such as popularity, relevance, and cooperation. The article recommends assessing 26 new events for future Winter Olympics within the five disciplines, equivalently for men and women. Formats are specified for the new events. Regarding equal distances for men and women, women use 8.7-13.6% more time than men in most events, except when upper-body power is important (above 13.6%) and in ultraendurance events (below 5.3%).
... A representative overview of such methods is beyond the scope of this work; for this reason, and for reasons we will describe below, we will focus mainly on rankings for football, particularly for the National Collegiate Athletic Association Football Bowl Subdivision (NCAAF FBS). Stefani (1997) provides a relatively extensive overview of systems across all sports. ...
Preprint
Ultimate frisbee is one of the fastest-growing sports in the world. In the United States, the governing body USA Ultimate uses a custom power rating system to determine bid allocations for various competitive tournaments. However, this rating system has significant flaws and leaves room for improvement. In this paper, we apply the least squares rating system and demonstrate its improvement over the current system both qualitatively and relative to a number of quantitative metrics.
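As a rough illustration of what a least squares rating system computes, the sketch below fits ratings so that rating differences match observed score margins as closely as possible (a Massey-style formulation). It is an assumption that margins are the fitted quantity; the preprint's exact model is not given in this abstract. The function names and the toy results are hypothetical, and the game graph is assumed to be connected.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def least_squares_ratings(teams, games):
    """Fit ratings minimising sum((r_winner - r_loser - margin)^2).

    games is a list of (winner, loser, margin) tuples.
    """
    n = len(teams)
    index = {team: k for k, team in enumerate(teams)}
    M = [[0.0] * n for _ in range(n)]   # normal-equation matrix
    p = [0.0] * n                       # net margin per team
    for winner, loser, margin in games:
        i, j = index[winner], index[loser]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += margin
        p[j] -= margin
    # The normal equations are singular (ratings are only defined up to a
    # constant), so replace the last equation with "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve_linear_system(M, p)))


# Toy data, invented for illustration only.
ratings = least_squares_ratings(
    ["A", "B", "C"],
    [("A", "B", 3), ("B", "C", 2), ("A", "C", 5)],
)
```

Because the toy margins are mutually consistent, the fitted rating differences reproduce them exactly; with real, noisy results the fit only approximates them.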
... In the multi-armed bandit problem (Katehakis and Veinott, 1987) the goal is to maximise total payoffs from slot machines with unknown rewards distributions. In sports (including e-sports), the issue of designing an efficacious tournament in selecting the best participants among competing individuals or teams occurs naturally (Langville and Meyer, 2012;Lasek et al., 2016;Stefani, 1997). ...
... The order of the ratings determines a ranking, or in many cases some function of the comparative ratings gives a probability of victory. Stefani gives a summary of the various rating systems that are used, and defines two main categories, accumulative and adjustive schemes [12,13]. The most well-known in tennis is the ad hoc ATP rating system, which underpins the world tennis ranking that determines seeding for most tournaments and the end-of-year world ranking. ...
Article
This paper describes a rating system that has been implemented for a suburban doubles tennis competition. Doubles results are first pre-processed to remove a partner effect and produce a measure of performance of each player against his direct opponent. Ratings are then produced which optimise the fit of an additive model to the season's results. The ratings produced generally correspond with subjective views of player ability, have a good correlation with within team ranking, and demonstrate reasonable consistency from year to year. The ratings are now used by the association to assist in promotion and relegation and the placement of teams in a section comparable with their overall ability. An exponential smoothing method which is easier to implement and has wider application is currently being tested.
... A comprehensive overview of official rating systems in sport was written by Stefani in 1997 [53] and updated in 2011 [54]. It was proposed that rating systems fall into one of three categories: subjective, accumulative, and adjustive. ...
Thesis
This thesis creates a new method for assessing player performance in Australian football, specifically for an application to the Australian Football League (AFL). Data have been captured by Champion Data as the official statistics provider to the AFL since 1999. More than 100 event types are recorded for each game, such as possessions, disposals, tackles and goals. Up to 50 variables are available to describe the context, quality and location of such events. A full investigation of this information was performed to extract relevant traits of the data, and to improve on existing methods of analysis and available knowledge within the industry. Two outcomes of this exploratory analysis were a visual representation of data through the use of heatmaps, and a new measure for kicking ability called ‘kick rating’. Heatmaps have been used by AFL clubs since 2008 as a way of communicating information to players and extracting information about team and player tactical traits. Kick ratings have been used by AFL clubs since 2011 as a development tool for their own players, and for scouting of opposition players. The data were also compiled into a player performance measure (‘Equity Ratings’), where the spatial locations of each event acted as a basis for evaluation. For each event, the rating system calculates the equity of the player’s team before and after the event to measure the objective contribution that the event made to a team’s position. Players who consistently improve the position of their team are rewarded with a large positive rating. Worsening the position of the team results in a negative rating. In this manner, the quality of a player’s involvements is considered more important than in previous rating systems, which placed more emphasis on the quantity of a player’s involvements in a game.
Equity Ratings proved to be the most comprehensive measure of player performance that has been applied to the AFL, showing decreased bias towards midfield players and a higher correlation to match results. A cross-season evaluation of player performance was then compiled to assess long-range performance. This combined measure has been adopted by the league as the ‘Official AFL Player Ratings’, hosted on the official website of the league (afl.com.au) and through the League’s official mobile and tablet apps.
... In the multi-armed bandit problem (Katehakis and Veinott, 1987) the goal is to maximise total payoffs from slot machines with unknown rewards distributions. In sports (including e-sports), the issue of designing an efficacious tournament in selecting the best participants among competing individuals or teams occurs naturally (Langville and Meyer, 2012; Lasek et al., 2016; Stefani, 1997). Thus, the efficacy of competition formats is among the most important criteria taken into account in tournament design. ...
Article
The efficacy of different league formats in ranking teams according to their true latent strength is analysed. To this end, a new approach for estimating attacking and defensive strengths based on Poisson regression for modelling match outcomes is proposed. Various performance metrics are estimated reflecting the agreement between teams’ latent strength parameters and their final rank in the league table. The tournament designs studied here are used in the majority of European top-tier association football competitions. Based on numerical experiments, it turns out that a two-stage league format comprising a triple round-robin tournament together with an extra single round-robin is the most efficacious setting. In particular, it is the most accurate in selecting the best team as the winner of the league. Its efficacy can be enhanced by setting the number of points allocated for a win to two (instead of the three currently in effect in association football).
Thesis
Every athlete evaluates his or her own performance after a competition. This evaluation has numerous important consequences, not only for the emotions felt immediately afterwards but also, for example, for medium- and long-term athletic motivation. Previous work shows that this evaluation depends not only on the objective final result but also, for example, on the ease with which other outcomes of the competition can be imagined, or on the athlete's prior expectations of the performance to be delivered. One possible influence that has so far received little attention in the literature is the psychological distance to the evaluated performance. According to the Construal Level Theory of Liberman and Trope (2010), high psychological distance goes along with a high degree of mental abstraction, and vice versa. The degree of mental abstraction can in turn lead to the same situation being evaluated very differently. This dissertation examines a possible role of temporal distance in the evaluation of athletic performance. Four studies were conducted for this purpose. Study 1 aimed to replicate, within a sporting context, three central investigations of the influence of temporal distance by Liberman and Trope (1998) and Liberman, Sagristano and Trope (2002); the sporting context applied both to the content of the study and to the participants (N = 102 sport students). However, testing the hypotheses derived from Construal Level Theory yielded none of the expected results in this context. In Study 2, a questionnaire was developed and tested on a sample of N = 116 U23 track and field athletes; it allows athletes' satisfaction with their performance after the end of a competition to be measured reliably and validly.
Satisfaction depends on the two variables placement and improvement, which are attributed to high- and low-level constructs. Among other things, placement was found to have the greater influence on the satisfaction rating. In the main investigation, Study 3, a total of N = 204 athletes from track and field, swimming and weightlifting rated their performance directly after the competition and again three months after the first survey. Construal Level Theory predicts a systematic change: at a greater temporal distance from the event, satisfaction with competition performance should be driven more strongly by high-level constructs than immediately after the competition. However, across all participants, neither placement nor improvement had a significant effect on the satisfaction rating, either after the end of the competition or three months later. Study 4 is a supplement comparing the two groups of track and field athletes, U23 (N = 52) vs. adults (N = 46). Descriptive statistics showed that the younger group approaches its competition with a higher motivational incentive, which may be one of the reasons why the results of Study 2 could not be confirmed at the first measurement point in Study 3. Overall, the results of the four empirical studies in this work contradict the assumption that the evaluation of athletic performance as a function of temporal distance from the event can be predicted in the way postulated by Construal Level Theory. In a sporting context, other individually significant factors may dominate in determining how athletes evaluate their performance, so that the postulated systematic influence of construal level is difficult to demonstrate.
Article
Case 1 best-worst scaling, also known as best-worst scaling or MaxDiff, is a popular method for examining the relative ratings and ranks of a series of items in various disciplines in academia and industry. The method involves a survey respondent indicating the “best” and “worst” from a sample of items across a series of trials. Many methods exist for calculating scores at the individual and aggregate levels. I introduce the bwsTools package, a free and open-source set of tools for the R statistical programming language, to aid researchers and practitioners in the construction and analysis of best-worst scaling designs. This package is designed to work seamlessly with tidy data, does not require design matrices, and employs various published individual- and aggregate-level scoring methods that have yet to be employed in free software.
Chapter
One of the great pleasures of sport is to attend an event accompanied by family and friends and to cheer enthusiastically for a favorite local team. A casual review of the standings of most sports reveals the fact that a home team wins more than it loses and scores more points than the visiting opposition. What are the causes of the relative success of home teams? Do home teams have more advantage in some sports than others? How does playoff and cup home advantage compare to the regular season home advantage? These are some of the questions that I hope to answer in this section. A review of the work of Pollard (1986, 2002, 2006) and Stefani and Clarke (1992) and others suggests three primary factors contributing to home advantage.
Chapter
A casual review of a sports almanac or today’s news reveals a montage of records being broken, technological improvements emerging, political events affecting sports and, unfortunately, the use of performance-enhancing drugs. We wonder where performances are headed. The purpose of this section is to examine certain sports events in a holistic way so as to arrange the montage of facts into a coherent picture. We shall examine how the laws of physics provide insight into those factors by which performances may be improved. We shall examine the underlying political events and technological breakthroughs that strongly influenced performances. Which events and performances shall we examine? If we formulate a taxonomy of sports events (Stefani, 1999), we find two important descriptors: the evaluation method by which a winner is determined and the manner of interaction between an athlete and the opponent. The winning athlete may be evaluated by subjective judging (as in boxing and diving), by arbitrary scoring (as in basketball and shooting), and by unambiguous measurement (as in swimming and running). Competitors interact in the only three ways that two separate objects can interact: by direct, indirect, and independent movement. Competitors interact directly in what may be termed a combat sport, wherein the goal is to control the opponent as in boxing and wrestling. Competitors interact indirectly, in what can be termed an object sport, in that the competitor tries to control some object as in basketball and soccer. Competitors may only have incidental contact in what may be termed an independent sport, such as swimming and rowing, wherein the competitor tries to control the competitor’s own self to succeed. The performance of an athlete in an independent sport is not directly influenced by the opponent as in a combat sport and an object sport. 
Further, those sports decided by unambiguous measurement provide a better comparison over time than those sports decided by subjective judging. This section focuses on those sports in which the competitor is evaluated by unambiguous measurement in independent events. We shall therefore examine running, jumping, swimming, rowing, and speed skating events. What performance data shall we examine? World records provide a limited amount of evolutionary information. World records are set under ideal conditions, at irregular intervals and are, by definition, always improving (see for example Chapter 2 of this volume). No information is provided when there may be a downturn in performance due to warfare, for example. We shall examine Olympic winning performances for the sports just mentioned, because Olympic competition occurs at regular intervals and winning performances represent the state-of-the-art as of that time. Those winning performances provide a picture into the past and perspective for predicting the future.
Article
This paper investigates fundamental investment strategies to detect and exploit the public's systematic errors in horse race wager markets. A handicapping model is developed and applied to win-betting in the pari-mutuel system. A multinomial logit model of the horse racing process is posited and estimated on a data base of 200 races. A recently developed procedure for exploiting the information content of rank ordered choice sets is employed to obtain more efficient parameter estimates. The variables in this discrete choice probability model include horse and jockey characteristics, plus several race-specific features. Hold-out sampling procedures are employed to evaluate wagering strategies. A wagering strategy that involves unobtrusive bets, with a side constraint eliminating long-shot betting, appears to offer the promise of positive expected returns, even in the presence of the typically large track take encountered at Thoroughbred racing events. [Reprinted in Efficiency of Racetrack Betting Markets ISBN 0-12-333030-0]
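To make the multinomial logit idea concrete, the following minimal sketch turns a set of per-horse utilities into win probabilities and then checks the expected return of a win bet. The utilities, the odds and both function names are hypothetical; in the paper the utilities would come from fitted horse, jockey and race variables, which are not reproduced here.

```python
from math import exp


def win_probabilities(utilities):
    """Multinomial logit: P(horse i wins) = exp(u_i) / sum_j exp(u_j)."""
    top = max(utilities)                       # subtract the max for numerical stability
    weights = [exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_return(win_prob, decimal_odds):
    """Expected profit per unit staked on a win bet paying decimal_odds."""
    return win_prob * decimal_odds - 1.0


# Hypothetical utilities for a four-horse race.
utilities = [1.2, 0.4, 0.0, -0.5]
probs = win_probabilities(utilities)

# Hypothetical payout for the first horse, already net of the track take.
edge = expected_return(probs[0], decimal_odds=2.5)
```

A bet is attractive under this sketch only when `edge` is positive, that is, when the model's win probability exceeds the probability implied by the payout.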
Article
Class handicapping methods enable different classes of athletes to compete on equal terms. Different sports use a variety of algorithms, which are usually based on historical data and subjective opinions. A recent proposal is to use an interactive shrinkage method for class handicapping, as this is generic across sports and uses data from the current competition only. This article presents a mathematical justification of the interactive shrinkage method for class handicapping, based on an objective Bayesian analysis of a suitable probability model. It also investigates how this approach performs in the context of paralympic sports, by analyzing actual competition data and comparing the results with those from existing schemes. Our findings suggest that this method is robust, convenient and fair. We provide a discussion in this article to explore possible extensions of this procedure.
Article
Results on mixed linear models were used to develop a procedure for predicting the outcomes of National Football League games. The predictions are based on the differences in score from past games. The underlying model for each difference in score takes into account the home-field advantage and the difference in the yearly characteristic performance levels of the two teams. Each team's yearly characteristic performance levels are assumed to follow a first-order autoregressive process. The predictions for 1,320 games played between 1971 and 1977 had an average absolute error of 10.68, compared with 10.49 for bookmaker predictions.
Article
An exponential smoothing technique operating on the margins of victory was used to predict the results of Australian Rules football matches for a Melbourne daily newspaper from 1981-86 and again for a competitor in 1991-92. An initial 'quick and dirty' program used only a factor for team ability and a common home ground advantage to predict winning margins. Probabilities of winning were accumulated to predict a final ladder, with a simulation to predict chances of teams finishing in any position. Changes to the competition forced a more complicated approach, and the current version uses several parameters which allow for ability, team/ground interaction, team interaction, and a tendency for team ability to regress towards the mean between seasons. A power method is used to place greater weight on the errors in closer matches, and errors across the win-lose boundary. While simple methods were used originally, the Hooke and Jeeves method was used in optimizing the parameters of the current model. Both the original model and the improved version performed at the level of expert tipsters.
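The initial 'quick and dirty' scheme described in this abstract can be approximated by a simple update rule, sketched below. It keeps only a team ability rating and a common home ground advantage; the smoothing constant, the home advantage value and the function name are assumptions, and the later refinements (ground interactions, regression towards the mean, the power method) are not modelled.

```python
def update_ratings(ratings, home, away, actual_margin,
                   alpha=0.1, home_advantage=10.0):
    """Exponentially smooth team ratings using the margin prediction error.

    alpha and home_advantage are illustrative values, not fitted parameters.
    Returns the margin that was predicted before the update.
    """
    predicted = ratings[home] - ratings[away] + home_advantage
    error = actual_margin - predicted
    ratings[home] += alpha * error   # home team did better than expected if error > 0
    ratings[away] -= alpha * error
    return predicted


ratings = {"X": 0.0, "Y": 0.0}
predicted = update_ratings(ratings, home="X", away="Y", actual_margin=30)
```

Here the home side was expected to win by 10 points and won by 30, so a tenth of the 20-point error is moved from the away team's rating to the home team's.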
Article
It has long been customary to measure the adequacy of an estimator by the smallness of its mean squared error. The least squares estimators were studied by Gauss and by other authors later in the nineteenth century. A proof that the best unbiased estimator of a linear function of the means of a set of observed random variables is the least squares estimator was given by Markov [12], a modified version of whose proof is given by David and Neyman [4]. A slightly more general theorem is given by Aitken [1]. Fisher [5] indicated that for large samples the maximum likelihood estimator approximately minimizes the mean squared error when compared with other reasonable estimators. This paper will be concerned with optimum properties or failure of optimum properties of the natural estimator in certain special problems with the risk usually measured by the mean squared error or, in the case of several parameters, by a quadratic function of the estimators. We shall first mention some recent papers on this subject and then give some results, mostly unpublished, in greater detail.
Article
This study examined the influence of opposition team formation on physical and skill-related performance in a professional soccer team. Performance in 45 French League 1 matches played over three competitive seasons (2007–2008, 2008–2009, and 2009–2010) was analysed using multi-camera computerized tracking. Players (n=21) in the reference team (using a 4-3-3/4-5-1 formation) were analysed in matches against three opposition team formations: 4-4-2 (11 games), 4-3-3/4-5-1 (16 games), and 4-2-3-1 (18 games). Performance was compared for defending and midfield units as a whole and individually across four positions: full backs, central defenders, central midfielders, and wide midfielders. Collectively, players covered a greater total distance (P
Article
A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers' odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams' performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
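A Poisson-based match model of the kind mentioned in these abstracts can be illustrated with a short sketch that converts expected goals into win, draw and loss probabilities. It assumes the two teams' goal counts are independent Poisson variables, which is a simplification of the published models; the log-linear form of `expected_goals`, its parameter names and the example values are all hypothetical.

```python
from math import exp, factorial


def expected_goals(attack, opponent_defence, home_advantage=0.0, base=0.0):
    """Log-linear scoring rate (hypothetical parameterisation)."""
    return exp(base + attack - opponent_defence + home_advantage)


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the scoring rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=15):
    """Return (P(home win), P(draw), P(away win)) for independent Poisson scores.

    Scorelines above max_goals are ignored; their probability is negligible
    for typical football scoring rates.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative strengths only; real values would be estimated from match data.
lam_home = expected_goals(attack=0.3, opponent_defence=0.1, home_advantage=0.25)
lam_away = expected_goals(attack=0.2, opponent_defence=0.15)
p_home, p_draw, p_away = outcome_probabilities(lam_home, lam_away)
```

Comparing such probabilities with those implied by bookmakers' odds is the basic step behind the betting strategy the last abstract alludes to.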