Thesis

Predictive models with application in baseball

Abstract

Machine learning on sports data is a novel area of research. Prediction tasks have attracted particular attention in sports, mainly because of market interests and their wide applicability as decision support. A large volume of data and historical records of this kind is now available. Baseball is known as one of the sports that generates the most statistical data per game, and sabermetrics has become established as a novel trend in its study. This work addresses two of the main predictive problems in baseball using machine learning methods. The first is predicting game outcomes; the second is predicting the performance of starting pitchers. Two models belonging to the supervised machine learning paradigm are proposed to solve them. The proposals use both traditional learning methods and a model based on time series classification. To evaluate them, experiments are carried out on real datasets of MLB games. The results demonstrate the viability of using machine learning methods to solve prediction problems in sports.
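To make the idea of time series classification concrete, here is a minimal sketch in Python. It is not the thesis's model, which the abstract does not specify: it assumes a 1-nearest-neighbor classifier with dynamic time warping (DTW) as the distance, a common baseline for this kind of task. The function names and the toy data (runs allowed in a starter's recent outings, labeled by the quality of the next start) are hypothetical and purely illustrative.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def predict_1nn(train, query):
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, inf
    for series, label in train:
        d = dtw_distance(series, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical toy data (assumption, not real MLB records): runs allowed in
# a starter's last five outings, labeled by the quality of the next start.
train = [
    ([1, 2, 1, 0, 2], "good"),
    ([0, 1, 1, 2, 1], "good"),
    ([4, 5, 3, 6, 4], "poor"),
    ([5, 3, 4, 4, 6], "poor"),
]

if __name__ == "__main__":
    print(predict_1nn(train, [1, 1, 2, 1, 1]))
```

DTW is used here because it tolerates series of different lengths and small shifts in time; a real system would likely use richer per-start features and a proper train/test split over seasons.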
... Sabermetrics is the objective, scientific analysis of baseball (Soto Valero & González Castellanos, 2015; Paz-Antunez & Martínez-Laffita, 2018; Soto Valero, 2016), based fundamentally on statistics. The concept was popularized in 1980 by George William James; the term derives from the acronym SABR, which refers to the Society for American Baseball Research. ...
Article
Data analysis built on recent advances in information and communication technologies (ICTs) is an indispensable tool for decision-making in many sectors of society, and sport is one of the leading industries in its use. Professional sports teams now maintain analysis departments devoted to studying opponents, self-analysis, scouting, and other functions that give decision-making scientific support. Baseball is a highly complex team sport. This complexity yields a large amount of varied game-related data, which makes baseball one of the most statistically complete sports. These characteristics have made baseball a leading case for applying data analysis and ICTs to gain competitive advantages. The teams in the National Baseball Series, the most demanding competition in Cuban baseball, lack the necessary support for analyzing their own information and that of their rivals. This undermines results and hinders the development of managers and players in tactical thinking. To meet these needs, this paper describes the implementation of the principles and methods of Sports Intelligence in Cuban baseball, supported by ICTs and sabermetrics. Its objective is to improve decision-making processes in Cuban baseball based on sports intelligence, and to that end it outlines the first indispensable theoretical steps.
Conference Paper
Major League Baseball, a professional baseball league in the US and Canada, is one of the most popular sports leagues in North America. Partly because of its popularity and the wide availability of game data, baseball has become the subject of significant statistical and mathematical analysis. Pitch analysis is especially useful for helping a team understand the pitch behavior it may face during a game, so that it can develop a batting strategy to counter the predicted pitches. We apply several common machine learning classification methods to PITCHf/x data to classify pitches by type. We then extend the classification task to prediction by using only features known before a pitch is thrown. By performing significant feature analysis and introducing a novel approach for feature selection, moderate improvement over earlier results is achieved.
Chapter
Baseball, which is one of the most popular sports in the world, has a uniquely discrete gameplay structure. This stop-and-go style of play creates a natural ability for fans and observers to record information about the game in progress, resulting in a wealth of data that is available for analysis. Major League Baseball (MLB), the professional baseball league in the US and Canada, uses a system known as PITCHf/x to record information about every individual pitch that is thrown in league play. We extend the classification to pitch prediction (fastball or nonfastball) by restricting our analysis to pre-pitch features. By performing significant feature analysis and introducing a novel approach for feature selection, moderate improvement over published results is achieved.
Book
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
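As a rough illustration of how a tree-structured rule is grown, the sketch below searches for the threshold on a single numeric feature that minimizes weighted Gini impurity, the basic step a classification tree repeats at each node. It is a simplified, single-feature example written for this page, not the monograph's full procedure (which also covers pruning, regression trees, and more). The function names and the toy data (a hypothetical run differential paired with win/loss labels) are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature minimizing weighted Gini.

    Returns (threshold, impurity); rows with value <= threshold go left.
    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data (assumption): a team's average run differential
# over recent games, and whether it won ("W") or lost ("L") the next one.
values = [-3, -1, 0, 2, 4, 5]
labels = ["L", "L", "L", "W", "W", "W"]

if __name__ == "__main__":
    print(best_split(values, labels))
```

A full tree applies this search over every feature, splits on the best one, and recurses on the two resulting subsets until a stopping rule is met.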
Book
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches. Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research. The book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html contains: PowerPoint slides for Chapters 1-12, a comprehensive teaching resource covering each chapter of the book; an online appendix on the Weka workbench, a comprehensive learning aid for the open-source software that goes with the book; and the table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.
The book provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects; presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods; includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks in an easy-to-use interactive interface; and includes open-access online courses that introduce practical applications of the material in the book.
Article
A home advantage in sport competitions has been well documented. The strength and consistency of the home advantage has made it a popular phenomenon in sport today. Very little systematic research has been carried out, however, and the home advantage remains one of the least understood phenomena in sport. It appears that much of the game location research has been arbitrary, and a clear sense of direction is lacking. The purpose of the present paper is to provide a conceptual framework to organize a comprehensive review of previous game location research and provide direction for future research. The review of literature indicated that the descriptive phase of inquiry has been completed, and it is time to address the underlying mechanisms responsible for the manifestation of the home advantage. Possible methodologies and areas of inquiry are highlighted and discussed.
Article
This paper presents work on using Machine Learning approaches for predicting performance patterns of medalists in Track Cycling Omnium championships. The omnium is a newly introduced track cycling competition to be included in the London 2012 Olympic Games. It involves six individual events and, therefore, requires strategic planning for riders and coaches to achieve the best overall standing in terms of the ranking, speed, and time in each individual component. We carried out unsupervised, supervised, and statistical analyses on the men’s and women’s historical competition data in the World Championships since 2008 to find winning patterns for each gender in terms of the ranking of riders in each individual event. Our results demonstrate that both sprint and endurance capacities are required for both men and women to win a medal in the omnium. Sprint ability is shown to have slightly more influence in deciding the medalists of the omnium competitions.