Sports Knowledge Management and Data Mining
Robert P. Schumaker¹, Osama K. Solieman² and Hsinchun Chen³
¹ Information Systems Dept, Iona College, New Rochelle, New York 10801, USA
rschumaker@iona.edu
² 6015 N. Mardelle Circle, Tucson, Arizona 85704, USA
osolieman@gmail.com
³ Artificial Intelligence Lab, Department of Management Information Systems
The University of Arizona, Tucson, Arizona 85721, USA
hchen@eller.arizona.edu
Word Count: 17,721
Introduction
Vast amounts of sports data are routinely collected about players, coaching decisions and
game events. Making sense of this data is important to those seeking an edge. By transforming
this data into actionable knowledge, scouts, managers and coaches can have a better idea of what
to expect from opponents and be able to use a player draft more effectively. With millions of
dollars riding on the many decisions made within a sports franchise (Lewis, 2003), the sports
environment is ideal for data mining and knowledge management approaches. While the
application of these approaches to the sports environment may be unique and the focus of this
chapter, the topics of data mining and knowledge management should certainly be well known to
the reader and form the basis of the approaches we discuss.
Background and Motivation
Before the advent of data mining and knowledge management techniques, sports
organizations relied almost exclusively on human expertise. It was believed that these domain
experts (coaches, managers and scouts) could effectively convert their collected data into usable
knowledge. As the different types of data collected grew in scope, these organizations sought to
find more practical methods to make sense of what they had. This led first to the employment of
in-house statisticians who created better measures of performance and better decision-making
criteria. One way that these measures were used was to augment the decision-making of domain
experts with additional knowledge and provide them with a competitive advantage. Armed with
this knowledge, it was a short step for sporting organizations and fans alike to begin harnessing
more practical methods of extracting knowledge using data mining techniques. These newer
techniques allowed organizations to begin to predict particular player matchups and/or forecast
how a player may perform under specific conditions. Sports organizations were sitting on a
wealth of data and needed ways to harness it.
The primary knowledge management and data mining techniques that can be used by sports
organizations include statistical analysis, pattern discovery and outcome prediction. A variety of
non-typical sports data, such as injury likelihood, can be monitored in a similar way. One such
example is a biomedical tool piloted by AC Milan, an Italian professional soccer club, which
uses software that monitors workouts to help predict player injuries (Flinders, 2002). Another
example is software used to monitor sports betting locales for unusual bets which may signal
corrupt officiating or players that are compromised (Audi & Thompson, 2007). Similarly, data
mining researchers have found that physical aptitude correlates with anticipated physical
performance (Fieltz & Scott, 2003). Every year the National Football League (NFL) conducts a
“Combine” where prospective college draft players are run through a series of physical drills in
front of team scouts and coaches. The Combine also includes a mental evaluation of players
called the Wonderlic Personnel Test, which assesses the intellectual capacity of prospects. The
NFL has developed expected Wonderlic scores based on the amount of intelligence required to play
a particular position; e.g., a quarterback who has to make a myriad of on-field decisions should
have a higher Wonderlic score (24) than a halfback (16) whose job is to run the ball
(Zimmerman, 1985).
Sports statistics, by themselves, can be misleading without an understanding of their
fundamental meaning. This comes from either imprecise measurement of an event or the sports
community’s misuse of and overreliance on particular statistics. As evidence, consider the fact that
certain players can build impressive individual statistics yet have little impact on the
performance of the team. The imprecision of sports statistics can be best illustrated by
baseball’s Runs Batted In (RBI) statistic, which has long been heralded as a cornerstone of
evaluating player contribution. Developed by British-born journalist Henry Chadwick during the
mid-1800s, the RBI was an attempt to quantify game events and attribute them to particular
players (Lewis, 2003). While Chadwick was more familiar with cricket than baseball and had an
incomplete understanding of the game, he managed to popularize his statistics which were never
seriously questioned until the latter half of the 20th century. The RBI’s imprecise measurement
can be summed up in the following thought experiment. Suppose two players had the same
batting average, meaning that they hit the ball with the same percentage of success. Further
suppose that both players are not power hitters but routinely hit for singles, advancing
themselves and their teammates one base at a time. The RBI is then dependent upon the actions
of those who batted before them. If team members were able to routinely get on base for
one of these players and not for the other, then the first of our hypothetical players would be
credited with RBIs when their teammates crossed home plate as a consequence of the player’s
hits. The second of our hypothetical players would not receive any RBIs, even though both
players performed the exact same actions. Basing a player’s value on RBI statistics alone would
be a misleading indicator of performance. Beyond its imprecision in measuring player
productivity, the RBI has been overvalued by the sports community as a measurement of
performance in contract negotiations and player comparisons. It was not until pioneering baseball
statistician Bill James began questioning the RBI that better measurements arose, such as
On-Base Percentage (OBP), which measures how often a player gets on base.
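The RBI thought experiment can be made concrete with a minimal Python sketch. The names used here (PlateAppearance, batting_average, runs_batted_in) and the ten-at-bat sequence are hypothetical and chosen only for illustration. The sketch also assumes, as a simplification, that a single scores a runner from third base and no one else. It is not a full scoring model.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    hit_single: bool       # True if the batter singled, False if the batter was out
    runner_on_third: bool  # True if a teammate was standing on third base


def batting_average(appearances):
    """Fraction of at-bats that ended in a hit."""
    hits = sum(1 for pa in appearances if pa.hit_single)
    return hits / len(appearances)


def runs_batted_in(appearances):
    """RBIs under the simplifying assumption that a single scores
    only a runner who was already on third base."""
    return sum(1 for pa in appearances if pa.hit_single and pa.runner_on_third)


# Both hypothetical players produce the same results at the plate:
# three singles in ten at-bats.
outcomes = [True, False, False, True, False, False, True, False, False, False]

# Player A always bats with a teammate on third; Player B never does.
player_a = [PlateAppearance(hit, runner_on_third=True) for hit in outcomes]
player_b = [PlateAppearance(hit, runner_on_third=False) for hit in outcomes]

print("Player A:", batting_average(player_a), "avg,", runs_batted_in(player_a), "RBI")
print("Player B:", batting_average(player_b), "avg,", runs_batted_in(player_b), "RBI")
```

Under these assumptions both players bat .300, yet Player A is credited with three RBIs and Player B with none. The difference comes entirely from what their teammates did beforehand.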
Another difficulty with the use of sports statistics is how to measure risk. In American
football, a defensive back can either stay in mid-field and attempt to intercept the ball or play
solid cover defense. In the first instance, the player is taking a risk which can quickly change the
momentum of the game whereas in the second instance, the player is playing it safe. However,
a player who takes risks successfully and makes interceptions gains greater perceived
value. Quantifying risk-taking behavior is a difficult problem.
Another example of statistical imprecision is the measurement of the number of defensive
rebounds off missed free throws in basketball. In order to get a defensive rebound, teammates
must block out opposing players and in doing so, they typically cannot get the rebound although
their actions arguably make them just as important in the accomplishment (Ballard, 2006).
However, given the way in which rebounds are measured, only the player who gets the ball is
credited with the rebound.
In this chapter, we propose a Sports Knowledge Management framework to categorize the
different methods sports organizations use to uncover new knowledge and better value player
contributions. From this, we will highlight measurement inadequacies and showcase techniques
to make better use of data collected in a wide domain of sport and sport-related specialties.
Properly leveraging Sports Knowledge Management techniques can result in better team
performance by matching players to certain situations, identifying individual player
contributions, evaluating the tendencies of the opposition and exploiting any weaknesses.
For these reasons, it should be no surprise that many sports organizations are
revolutionizing themselves. The traditional decision-making approach of using intuition or gut
instincts is falling out of favor. Instead, assessments are being made on the basis of strong
analysis and scientific exploration. With more and more sports organizations embracing the
digital era, it may soon become a battle of the better algorithm or measurement used, where
back-office analysts may become just as important as the players on the field.