Preprint

Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports

Authors:

Abstract

In baseball, a scouting report profiles a player's characteristics and traits and is usually intended for use in player valuation. This work presents a first-of-its-kind dataset of almost 10,000 scouting reports for minor league, international, and draft prospects. Compiled from articles posted to MLB.com and FanGraphs.com, each report consists of a written description of the player, numerical grades for several skills, and unique IDs linking to the player's profiles on popular resources such as MLB.com, FanGraphs, and Baseball-Reference. Using this dataset, we employ several deep neural networks to predict, from a minor league player's scouting report, whether that player will reach MLB. We open-source the data for the community and present a web application demonstrating how the language differs between the reports of successful and unsuccessful prospects.
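The record structure and the language comparison described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `ScoutingReport` class, its field names, and the grade keys are hypothetical and do not reflect the released dataset's schema, and the smoothed log frequency ratio is a simple stand-in chosen here for illustration, not the method behind the paper's web application or its neural networks. It shows one report as text plus skill grades plus cross-site IDs plus an MLB outcome, and ranks words by how much more often they appear in reports of players who reached MLB.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ScoutingReport:
    """One scouting report. Field names are hypothetical, not the dataset's schema."""
    text: str                                              # written description of the player
    grades: dict[str, int] = field(default_factory=dict)   # e.g. {"raw_power": 60}; scale assumed
    ids: dict[str, str] = field(default_factory=dict)      # e.g. {"fangraphs": "...", "mlb": "..."}
    reached_mlb: bool = False                              # prediction target


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def log_ratio(reports: list[ScoutingReport]) -> dict[str, float]:
    """Smoothed log ratio of each word's relative frequency in reports of
    players who reached MLB vs. those who did not.
    Positive scores mean the word is more typical of eventual MLB players."""
    pos, neg = Counter(), Counter()
    for r in reports:
        (pos if r.reached_mlb else neg).update(tokenize(r.text))
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    scores = {}
    for w in vocab:
        # Add-one (Laplace) smoothing keeps words unseen in one group finite.
        p = (pos[w] + 1) / (n_pos + v)
        q = (neg[w] + 1) / (n_neg + v)
        scores[w] = math.log(p) - math.log(q)
    return scores


if __name__ == "__main__":
    # Toy, made-up reports for illustration only.
    reports = [
        ScoutingReport("plus raw power and advanced approach", {"raw_power": 60}, reached_mlb=True),
        ScoutingReport("raw tools but swing and miss concerns", {"raw_power": 50}, reached_mlb=False),
    ]
    for word, score in sorted(log_ratio(reports).items(), key=lambda kv: -kv[1])[:3]:
        print(f"{word}: {score:+.3f}")
```

A frequency ratio like this only surfaces surface-level word associations; it says nothing about the predictive power of the deep models, which would be trained on the same text and grades.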

