Conference Paper

The Wisdom of Bookies? Sentiment Analysis Versus the NFL Point Spread

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

The American Football betting market provides a particularly attractive domain to study the nexus between public sentiment and the wisdom of crowds. In this paper, we present the first substantial study of the relationship between the NFL betting line and public opinion expressed in blogs and microblogs (Twitter). We perform a large-scale study of four distinct text streams: LiveJournal blogs, RSS blog feeds captured by Spinn3r, Twitter, and traditional news media. Our results show interesting disparities between the first and second halves of each season. We present evidence showing the usefulness of sentiment for NFL betting. We demonstrate that a strategy betting roughly 30 games per year identified the winner roughly 60% of the time from 2006 to 2009, well beyond what is needed to overcome the bookie's typical commission (53%).
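The 53% threshold in the abstract can be illustrated with a short calculation. The sketch below assumes the conventional point-spread pricing in which a bettor risks 110 units to win 100; the abstract does not state the odds, so `risk`, `win`, and both function names are illustrative assumptions. Under that assumption the break-even win rate is 110/210, about 52.4%, which is consistent with the roughly 53% quoted above.

```python
def break_even_win_rate(risk: float = 110.0, win: float = 100.0) -> float:
    """Win probability at which a bet risking `risk` to win `win` has zero expected profit.

    Solves p * win - (1 - p) * risk = 0 for p.
    The 110/100 default is an assumed, conventional pricing, not a figure from the paper.
    """
    return risk / (risk + win)


def expected_profit_per_bet(p_win: float, risk: float = 110.0, win: float = 100.0) -> float:
    """Expected profit of one bet, in the same units as `risk` and `win`."""
    return p_win * win - (1.0 - p_win) * risk


if __name__ == "__main__":
    # Break-even rate under the assumed pricing.
    print(f"break-even win rate: {break_even_win_rate():.3f}")
    # Expected profit per bet at the roughly 60% hit rate reported in the abstract.
    print(f"expected profit at 60%: {expected_profit_per_bet(0.60):.2f}")
```

With these assumed odds, any strategy that wins more often than the break-even rate has positive expected profit per bet, which is why a 60% hit rate would clear the commission comfortably.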


... As a result of this, industrial activities have flourished in recent years. Some of the notable application-oriented research works in the past decade include [2, 28–45]. A list of research works in this domain is presented in Chap. ...
... In [28], the Facebook likes were taken into consideration whereby the users' behaviour is predicted based on their online activities. The Bayesian statistics towards detection of latent attributes for topical models was used in [29]. The online users' behaviour was used in [30]. ...
... Chen et al. [28] trained a deep neural network model DeepSentiBank considering Caffe towards the classification of the visual sentiments. Xu et al. [29] utilized pre-trained deep neural network [26] on ILSVRC 2012 dataset and then transferred parameters learned towards sentiment prediction in visual mode. Wang et al. [30] fine-tuned CNN on Getty Images towards visual sentiment analysis, and a paragraph vector model was trained towards textual sentiment analysis. ...
Chapter
Full-text available
This chapter highlights several social network datasets that are utilized in performing the experiments. Here, the experimental datasets are taken from four social network sites, viz. Twitter, Instagram, Viber and Snapchat.
Chapter
Full-text available
The experimental setup consists of performing visual and text sentiment analysis through hierarchical based deep learning networks. A brief discussion on the deep learning networks is presented for the interested readers. The cross-media bag-of-words model (CBM) is used as the baseline method. The basic aspects of the gated feedforward recurrent neural networks (GFRNN) are illustrated. The mathematical abstraction of HGFRNN is vividly explained. The chapter concludes with hierarchical gated feedforward recurrent neural networks for multimodal sentiment analysis.
Chapter
Full-text available
The experimental results are highlighted in this chapter using Twitter, Instagram, Viber and Snapchat datasets. HGFRNN is evaluated through 2-class (+ve, −ve) as well as 3-class (+ve, −ve, unbiased) propositions.
Chapter
Full-text available
There has been a wide array of domains ranging from fast-moving consumer products to political events where sentiment analysis has numerous applications. Several large companies have their own in-built capabilities in this area. These innumerable applications and interests have been the driving source towards sentiment analysis research. Several social networks and microblogs have provided strong platforms for users’ information exchange and communication. The social networks and microblogs provide trillions of pieces of multimodal information.
Chapter
Full-text available
In this research, a novel hierarchical GFRNN-based model for analysing sentiments on multimodal content is presented. Giving due consideration for leveraging huge volume of blog contents available towards sentiment analysis, multimodal techniques are utilized here. The learning algorithm of GFRNN is based on different timescales which work as temporal convolution, and it is basically 1D convolution which is similar to 2D spatial convolution. © 2019, The Author(s), under exclusive to Springer Nature Singapore Pte Ltd.
Chapter
Full-text available
The information on text has been analysed rigorously in several areas pertaining to business decision-making (Miller et al. in Sentiment flow through hyperlink networks, pp 550–553, 2011, [1]). A tweet for images is shown in Fig. 5.1. The analysis of visual information, covering information retrieval from images, has made relatively little progress. Several studies have suggested that more than one-third of social blogs’ data are images. © 2019, The Author(s), under exclusive to Springer Nature Singapore Pte Ltd.
... By harnessing the wisdom of the crowds [2], a number of researchers have tried to prove that social media form a valuable source of information for predicting the outcome of sports games. Hong and Skiena [3] tried to predict the winner of American football (NFL) games using sentiment analysis, while Sinha et al. [4] tried to predict the winner with the spread and the over/under line using the Twitter volume and text unigrams. On the other hand, Uz-Zaman et al. [5] tried to predict the results of the World Cup Soccer 2010 using a context-free grammar for parsing microposts. ...
... To predict the point spread of NFL games, Hong and Skiena [3] applied sentiment analysis on a number of data sources, including microposts taken from Twitter. However, the method used simply relies on word lists of positive and negative words for predicting the sentiment within microposts. ...
... Number of correctly predicted outcomes for the last 50 games, for match days 29-34. ...
Conference Paper
Full-text available
In this paper, we investigate the feasibility of using collective knowledge for predicting the winner of a soccer game. Specifically, we developed different methods that extract and aggregate the information contained in over 50 million Twitter microposts to predict the outcome of soccer games, considering methods that use the Twitter volume, the sentiment towards teams and the score predictions made by Twitter users. Apart from collective knowledge-based prediction methods, we also implemented traditional statistical methods. Our results show that the combination of different types of methods using both statistical knowledge and large sources of collective knowledge can beat both expert and bookmaker predictions. Indeed, we were for instance able to realize a monetary profit of almost 30% when betting on soccer games of the second half of the English Premier League 2013-2014.
... The other caveat noted in some of these analyses is that their predictive value may vary over time; in particular, a couple of analyses [52,59] found that using social media had particular predictive value in the latter half of sports seasons. ...
... Across all of the themes, there is early encouraging evidence of convergence in studies looking at issues such as advertising [52,58]. There is however a need to establish the replicability of these findings, illustrated using the sentiment literature. ...
Article
Full-text available
Purpose of Review: Social media enables a range of possibilities in the way gamblers and gambling operators interact and communicate content about gambling. The purpose of this systematic review was to synthesise the extant literature to identify the ways in which social media has been investigated in the context of gambling. Recent Findings: A systematic review of the literature identified 41 papers that collected primary data pertinent to gambling and social media from multiple disciplines. These papers broadly fell into three themes: communication, community and calculation (of sentiment). Papers on communication focused on the content of gambling advertising on social media and the impact on people exposed to it. Studies of gambling communities examined the activity and structures of discussion groups on social media concerning recreational or problematic gambling. Papers on calculation collated social media data to assess sentiment and compared it against betting odds. Summary: There is an emerging multidisciplinary literature that has looked at the use of social media in relation to gambling. There is preliminary evidence that the content and the reach of gambling advertising on social media is a source of concern, particularly for younger people. The themes discussed on gambling support forums appear to be common across communities, focusing on negative emotions, recovery, addictive products and financial support. Using social media to assess sentiment appears to be particularly effective at identifying potential upsets in sporting matches. Future suggestions for research are explored.
... Twitter effectively takes part in any mega event happening around the world. Data obtained from Twitter has been effectively utilized in the prediction and explanation of various real-world phenomena, such as spreading of infectious diseases [14], elections [4], stock market prediction [9], opinion polls [5] and sport results [6,15,18,22]. On average, Twitter posts contain meaningful information that can be exploited with the help of statistical methods. Therefore, Twitter offers a way to exploit the wisdom of crowds [10] concept to make better predictions of the real-world events. ...
... Hong and Skiena [22] presented a study on the relationship between NFL betting line and public opinion expressed in blogs and Twitter microposts. They used sentiment analysis on various data sources, including Twitter microposts for predicting point spread of NFL games. ...
Article
Full-text available
Social media has become a platform of first choice where people can express their feelings freely. Sports and the matches being played are also discussed on social media such as Twitter. In this article, efforts are made to investigate the feasibility of using collective knowledge obtained from microposts posted on Twitter to predict the winner of a cricket match. For predictions, we use three different methods that depend on the total number of tweets before the game for each team, fans' sentiments toward each team and fans' score predictions on Twitter. By combining these three methods, we predict the winning team in a cricket game before the start of the game. Our results are promising enough to be used for forecasting the winning team. Furthermore, the effectiveness of supervised learning algorithms is evaluated for classifiers, where Support Vector Machine (SVM) has shown an advantage over other classifiers.
... The research has demonstrated that social networking sites can significantly impact the interaction of learners with courses (Georgios Paltoglou, 2012). With the growing popularity of social networking, sentiment analysis has been used with social networks and microblogging sites, especially Twitter or blogs (Hong and Skiena, 2010; Miller et al., 2011). However, the texts published in social networks are largely scattered and unstructured. ...
Article
Full-text available
In recent years, sentiment analysis (SA) has gained popularity among researchers in various domains, including the education domain. Particularly, sentiment analysis can be applied to review the course comments in massive open online courses (MOOCs), which could enable instructors to easily evaluate their courses. This article is a systematic literature review on the use of sentiment analysis for evaluating students’ feedback in MOOCs, exploring works published between January 1, 2015, and March 4, 2021. To the best of our knowledge, this systematic review is the first of its kind. We have applied a stepwise PRISMA framework to guide our search process, by searching for studies in six electronic research databases (ACM, IEEE, ScienceDirect, Springer, Scopus, and Web of Science). Our review identified 40 relevant articles out of 440 that were initially found at the first stage. From the reviewed literature, we found that the research has revolved around six areas: MOOC content evaluation, feedback contradiction detection, SA effectiveness, SA through social network posts, understanding course performance and dropouts, and MOOC design model evaluation. In the end, some recommendations are provided and areas for future research directions are identified.
... In [12], product rankings were based on periodical reviews. In [7], the study was focussed on National Football League betting and its relationship with blog opinions. In [15], the focus was on linking of public opinion polls with Twitter sentiment. ...
... Likewise, Burton (2019) sought to understand consumer sentiment toward ambush marketing from the 2018 FIFA World Cup Finals. The second technique is the lexicon-based approach, using a predefined dictionary of positive and negative words (Hong & Skiena, 2010). The measure of sentiment is derived by counting the number of positive words minus the number of negative words in a message. ...
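The lexicon-based measure described in the excerpt above (the number of positive words minus the number of negative words in a message) can be sketched as follows. The two word sets and the simple tokenizer are small illustrative assumptions, not the dictionary or preprocessing used by Hong and Skiena; a real lexicon would contain thousands of entries.

```python
import re

# Tiny illustrative lexicons (assumptions for this sketch only).
POSITIVE_WORDS = {"win", "great", "strong", "good", "healthy"}
NEGATIVE_WORDS = {"lose", "injured", "weak", "bad", "fumble"}


def lexicon_score(message: str) -> int:
    """Return (# positive words) - (# negative words) for one message."""
    # Lowercase and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", message.lower())
    positives = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negatives = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    return positives - negatives


if __name__ == "__main__":
    for text in (
        "Great defense and a strong win",
        "Weak offense and an injured quarterback",
        "Kickoff is at noon",
    ):
        print(lexicon_score(text), text)
```

A score above zero marks a message as net positive and a score below zero as net negative; messages with no lexicon words score zero. Exact word matching of this kind ignores negation and sarcasm, which is the limitation other excerpts on this page raise about word-list methods.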
Article
The use of big data in sport and sport management research is increasing in popularity. Prior research generally includes one of the many characteristics of big data, such as volume or velocity. The present study presents big data in a multidimensional lens by considering the use of sentiment analysis. Specifically focusing on the phenomenon of tanking, the purposeful underperformance in sport competitions, the present study considers the impact that consumers’ sentiment regarding tanking has on game attendance in the National Basketball Association. Collecting social media posts for each National Basketball Association team, the authors create an algorithm to measure the volume and sentiment of consumer discussions related to tanking. These measures are included in a predictive model for National Basketball Association home game attendance between the 2013–2014 and 2017–2018 seasons. Our results find that the volume of discussions for the home team and sentiment toward tanking by the away team impact game attendance.
... Twitter (e.g., Hong & Skiena, 2010; Miller, Sathi, Wiesenthal, Leskovec, & Potts, 2011; Tumasjan, Sprenger, Sandner, & Welpe, 2010). Advances have enabled data-mining techniques and artificial analysis to be applied to MOOCs (e.g., Crossley et al., 2015; Wen, Yang, & Rose, 2014) and, more recently, analysis has begun to be used to identify sentiment from within MOOCs (Moreno-Marcos et al., 2018; Pérez, Jurado, & Villen, 2019). ...
Article
Full-text available
Many course designers trying to evaluate the experience of participants in a MOOC will find it difficult to track and analyse the online actions and interactions of students because there may be thousands of learners enrolled in courses that sometimes last only a few weeks. This study explores the use of automated sentiment analysis in assessing student experience in a beginner computer programming MOOC. A dataset of more than 25,000 online posts made by participants during the course was analysed and compared to student feedback. The results were further analysed by grouping participants according to their prior knowledge of the subject: beginner, experienced, and unknown. In this study, the average sentiment expressed through online posts reflected the feedback statements. Beginners, the target group for the MOOC, were more positive about the course than experienced participants, largely due to the extra assistance they received. Many experienced participants had expected to learn about topics that were beyond the scope of the MOOC. The results suggest that MOOC designers should consider using sentiment analysis to evaluate student feedback and inform MOOC design.
... In [43], reviews were used to sort products and merchants. In [31], the authors studied the relationships between betting trends in the NFL and public opinions on blogs and Twitter™. In [55], the sentiment extracted from Twitter™ was linked to public opinion polls. ...
Article
Full-text available
Convolutional neural networks are known for their excellent performance in computer vision, achieving state-of-the-art results. Moreover, recent research has shown that these networks can also provide promising results for natural language processing. In this case, the basic idea is to concatenate the vector representations of words into a single block and use it as an image. However, despite the good results, the problem with using convolutional networks is the large number of design decisions that need to be made a priori. These models require the definition of many hyper-parameters, including the type of word embeddings, which provide the vectorized representation of the data, the activation function that introduces non-linearity into the model, the size of the filter that applies the convolution to the data, the number of feature maps, which are responsible for identifying the attributes, and the pooling method used for data reduction. In addition, one must also predefine the regularization constant and the dropout rate, which are responsible for avoiding network over-fitting. In existing research works, convolutional neural network architectures capable of outperforming traditional machine learning models are presented. Even though these can compete with more complex models, the problem of how different settings of the hyper-parameters may affect the performance of this type of network has not yet been explored. In this paper, we propose an efficient sentiment analysis classifier using convolutional neural networks by analyzing the impact of the hyper-parameters on the model performance. The main interest in analyzing sentiment comes from the advent of social media and the technological advances that flood the Internet with opinions. Nonetheless, mining the Internet for opinion and sentiment analysis is not an easy task and thus needs outstanding models with the best hyper-parameter settings to be able to get pertinent answers.
The results were obtained using a GPU and show that the different configurations exceed the reference models in most cases, with gains of up to 18%, and perform similarly to state-of-the-art models, with gains of up to 2% in some cases.
... A positive word was given the sentiment score of +1 and a negative word was given the sentiment score of -1. In [5] the relationships between the NFL betting line and public opinions in blogs and Twitter were studied. In [6] Twitter sentiment was linked with public opinion polls. ...
Conference Paper
The paper considers mining and analyzing data generated by the Twitter social network, covering content classification, language determination and sentiment analysis of tweets. The analyses are based on geospatial tweets collected over a span of four months within the Vračar region of Belgrade, Serbia. All of the collected data is first preprocessed, filtered and classified by given criteria using the “Twitter search engine” (TSE) application, which has been upgraded to detect tweet language and perform sentiment analysis of tweets written in English. This type of analysis can be used for determining the popularity of city locations of interest and public spaces in general.
... Beyond real-life applications, many research papers on opinion mining applications have also been published. For example, Hong and Skiena [7] studied the relationships between National Football League (NFL) games and public opinions in blogs and Twitter. Liu et al. [8] proposed a sentiment model for predicting sales performance. ...
Conference Paper
Full-text available
Knowing what others think has always been important information. With the growing availability of opinion resources such as social networks, new opportunities and new challenges arise for analysing and understanding the orientation of a population and its sentiments towards any product, service, organization or event. Opinion mining covers the set of tasks for automatically analysing documents that express people's opinions and sentiments and for aggregating their orientations. In this paper we describe the state of the art in opinion mining.
... While a great deal of recent research has focused on sentiment analysis of Twitter data and spam detection (Wang et al. [11]), less attention has been devoted to extending these classification tasks to public Facebook posts. Furthermore, while domains such as politics (Bakliwal et al. [12]; Yang et al. [13]) and sports (Hong and Skiena [14]) have received strong coverage, the genre of commercial fashion brands has not been mined as frequently for predictive and classification tasks. The absence of literature on cross-channel sentiment analysis with a special focus on the implications of prior distributions on the classification results has motivated us to undertake the following study. The niche segment of fast fashion brands was chosen because it remains largely unexplored. ...
... Since the concept of "opinion" is critical to many activities, businesses and other entities are interested in knowing what those opinions are. Researchers have applied sentiment analysis in many real-life domains, such as predicting sales performance (Liu et al., 2007), linking Twitter sentiments with public opinion polls (O'Connor et al., 2010), predicting box-office revenues (Doshi, 2010), predicting the stock market (Bollen et al., 2011), studying trading strategies (Zhang and Skiena, 2010), and studying the relationship between the NFL betting line and public opinions (Hong and Skiena, 2010). ...
Thesis
Vector representations for language have been shown to be useful in a number of Natural Language Processing (NLP) tasks. In this thesis, we aim to investigate the effectiveness of word vector representations for the research problem of Aspect-Based Sentiment Analysis (ABSA), which attempts to capture both semantic and sentiment information encoded in user generated content such as product reviews. In particular, we target three ABSA sub-tasks: aspect term extraction, aspect category detection, and aspect sentiment prediction. We investigate the effectiveness of vector representations over different text data, and evaluate the quality of domain-dependent vectors. We utilize vector representations to compute various vector-based features and conduct extensive experiments to demonstrate their effectiveness. Using simple vector-based features, we achieve F1 scores of 79.9% for aspect term extraction, 86.7% for category detection, and 72.3% for aspect sentiment prediction.
... Recently, Hong and Skiena [12] used sentiment analysis from news and social media to design a successful NFL betting strategy. However, their main evaluation was on in-sample data, rather than forecasting. ...
Article
Full-text available
We study the relationship between social media output and National Football League (NFL) games, using a dataset containing messages from Twitter and NFL game statistics. Specifically, we consider tweets pertaining to specific teams and games in the NFL season and use them alongside statistical game data to build predictive models for future game outcomes (which team will win?) and sports betting outcomes (which team will win with the point spread? will the total points be over/under the line?). We experiment with several feature sets and find that simple features using large volumes of tweets can match or exceed the performance of more traditional features that use game statistics.
Chapter
The Internet, and more specifically the creation of the WWW in the early 1990s, helped people build an interconnected global platform where information can be stored, shared, and consumed by anyone with an electronic device that can connect to the Web. This provides a way of putting together lots of information, ideas, and opinions. An interactive platform was born for posting content, messages, and opinions under one roof, and that platform is known as social media. Social media has acquired such massive popularity and importance that today almost no one can stay away from it. Social media is not only a medium for people to express their thoughts; it is also a very powerful tool that businesses can use to focus on new and existing customers and increase profit with the help of social media analytics. This paper starts with a discussion of social media, its significance, and its pitfalls. It then presents a brief introduction to sentiment analysis in social media and gives an experimental study of sentiment analysis in a social game review.
Article
Player evaluation is a key component of the question-answering (QA) system in sports. Since existing player evaluation methods heavily rely on game statistics, they cannot capture the qualitative impact of each player during a game, which can be exploited using news articles after the game. In this paper, we propose a deep learning-based player evaluation model by combining both quantitative game statistics and the qualitative analyses provided by news articles. Players are classified as positive or negative based on their performance during certain periods, and news articles in the same period are annotated using the player's class. Then, the relationship between news articles and the annotated polarity is investigated by a deep neural network, which can deal with the high dimensionality of the text data. Since there is no explicit polarity label for news articles, we use the change in game statistics in target periods to annotate related sentences. The proposed system is applied to a Korean professional baseball league (KBO) and it is shown to be capable of understanding the sentence polarity of news articles on player performances.
Article
Full-text available
What started as a social utility for sharing short bursts of ‘inconsequential information’ has become a powerful information network capable of both tracking and shaping current events. From orchestrating government insurgencies to tracking epidemics, the majority of information shared via Twitter contains semantic relevance to contemporary topic(s), according to recent statistics. And, in consequence, Twitter is considered by researchers as an ideal platform for sentiment analysis. Compared to other online arenas such as forum discussions, blogs, and Facebook postings, Twitter frequently yields a higher degree of sentiment analysis accuracy due to the shortness of each post (140 character limit per Tweet). Various natural language processing techniques have been used to successfully perform sentiment classification on a group of Tweets. However, these techniques analyze text using both English-specific grammar rules and lexicons. Since there are fewer resources or tools in other languages, researchers often attempt to first use machine translation to translate the text into English. Often, translation errors introduce noise that obfuscates the results. In this study, we are analyzing the accuracy of sentiment analysis using an ad hoc and a translated sentiment lexicon in terms of capability of predicting the results of a future occurrence. We collected some 22,000 tweets using Twitter Search and Streaming APIs regarding a highly popular TV Show called “O Ses Türkiye” to predict the winner (Turkish version of globally known voice contest “The Voice of America”). We first performed a frequency-based statistical classification using an English sentiment lexicon translated into Turkish as well as a small ad hoc Turkish sentiment lexicon generated specifically for this study. We also use a k-means clustering technique using the two sentiment lexicons to evaluate the accuracies. 
Our study concludes that although a translated sentiment lexicon (or training data, for that matter) can give a rough estimate of the result of a future event, a language-specific ad hoc lexicon yields better granularity with higher discriminative power between negative, positive and neutral tweets. We also show the effect of automatic spell check and stemming in tweets on the predictive and discriminative power of an auto-translated sentiment lexicon on a target language.
Conference Paper
In Chap. 9, we studied the extraction of structured data from Web pages. The Web also contains a huge amount of information in unstructured texts. Analyzing these texts is of great importance as well and perhaps even more important than extracting structured data because of the sheer volume of valuable information of almost any imaginable type contained in text. In this chapter, we only focus on mining opinions which indicate positive or negative sentiments. The task is technically challenging and practically very useful. For example, businesses always want to find public or consumer opinions on their products and services. Potential customers also want to know the opinions of existing users before they use a service or purchase a product.
Conference Paper
Sentiment analysis is the fundamental component in text-driven monitoring or forecasting systems, where the general sentiment towards real-world entities (e.g., people, products, organizations) is analyzed based on the sentiment signals embedded in a myriad of web text available today. Building such systems involves several practically important problems, from data cleansing (e.g., boilerplate removal, web-spam detection) and sentiment analysis at the individual mention level (e.g., phrase-, sentence-, document-level) to the aggregation of sentiment for entity-level (e.g., person, company) analysis. Most previous research in sentiment analysis, however, has focused only on individual mention-level analysis, and there has been relatively less work that copes with other practically important problems for enabling a large-scale sentiment monitoring system. In this paper, we propose Empath, a new framework for evaluating entity-level sentiment analysis. Empath leverages objective measurements of entities in various domains such as people, companies, countries, movies, and sports, to facilitate entity-level sentiment analysis and tracking. We demonstrate the utility of Empath for the evaluation of a large-scale sentiment system by applying it to various lexicons using Lydia, our own large-scale text-analytics tool, over a corpus consisting of more than a terabyte of newspaper data. We expect that Empath will encourage research that encompasses end-to-end pipelines to enable large-scale text-driven monitoring and forecasting systems.
Article
Home teams in sports have an advantage because of learning, travel and crowd factors. In the National Football League, home teams won 58% of games over the period 1981–1996. An examination of the betting market on NFL games showed that bettors generally have recognized the magnitude of the home field advantage. The point spread market was efficient in that the strategy of betting on home teams produced a win rate of .499. However, for a subset of games which had national focus (i.e., Monday night and playoff games), betting on the home team produced a .592 win rate, which was significantly different than .5 at the .0003 level, and betting on underdog Monday night and playoff home teams produced a .656 win rate. These results suggest that the home field advantages recognized in the sports psychology literature are increased under the public attention of national exposure to a larger extent than is recognized by bettors, and the point spread market is inefficient for national focus games.
Article
This paper tests the hypothesis that the football betting market is efficient. Our statistical tests are stronger than those in previous studies, and we examine both NFL and college data over a sample period of fifteen years. Our statistical tests detect two specific biases in the NFL market and an unspecified bias in the college market. We examine the year-to-year consistency and magnitudes of the biases and find that the NFL bias against home teams has been nearly eliminated, while the bias against underdogs has increased. Profitable exploitation of the biases depends upon transaction costs.
Conference Paper
There is a growing interest in mining opinions using sentiment analysis methods from sources such as news, blogs and product reviews. Most of these methods have been developed for English and are difficult to generalize to other languages. We explore an approach utilizing state-of-the-art machine translation technology and perform sentiment analysis on the English translation of a foreign language text. Our experiments indicate that (a) entity sentiment scores obtained by our method are statistically significantly correlated across nine languages of news sources and five languages of a parallel corpus; (b) the quality of our sentiment analysis method is largely translator independent; (c) after applying certain normalization techniques, our entity sentiment scores can be used to perform meaningful cross-cultural comparisons.
Conference Paper
Newspapers and blogs express opinion of news entities (people, places, things) while reporting on recent events. We present a system that assigns scores indicating positive or negative opinion to each distinct entity in the text corpus. Our system consists of a sentiment identification phase, which associates expressed opinions with each relevant entity, and a sentiment aggregation and scoring phase, which scores each entity relative to others in the same class. Finally, we evaluate the significance of our scoring techniques over a large corpus of news and blogs.
Article
Modifying and consolidating previous research methods to generate more reliable estimates, some fairly weak evidence is found of inefficiency in the NFL betting market resulting from a bias favouring home underdog (against away favourite) teams. In contrast to previous research, no evidence is found that 'momentum strategies' generate significant returns in this market.
Article
In an efficient NFL betting market, point spreads incorporate all relevant information contained in past game outcomes. Efficiency implies that trading rules based on past game outcomes should not be able to produce a consistent pattern of winners over losers. This study identifies 15 trading rules based on historical game outcomes and, using simulated gambling, tests them over the 1984–1986 NFL seasons. The study's main finding indicates that the NFL betting market is efficient, but does identify a small set of profitable trading rules over this time period.
Article
A tendency for individuals to overweigh recent information and underweigh prior data has been discovered by researchers in financial markets, economic forecasting, security analysis and other areas. A study of point spread patterns in the 2264 regular season National Football League (NFL) games over the 1981–1995 seasons was conducted to investigate the overreaction bias of bettors. Results indicated that bettors tend to overweigh outstanding positive performance when measured over the previous game, over the previous two to five games or over the previous season. In general, the more outstanding the performance, the greater the overreaction. However, bettors did not overreact to unusual negative performance over the same periods. This result is congruent with the tendency for heavy favourites to cover the point spread less than half the time over the 1969–1995 seasons. The overreaction bias in the NFL betting market provides another example of a violation of the weak form of the Efficient Markets Hypothesis.
Article
This article examines the efficiency of the National Football League betting market. The standard ordinary least squares regression methodology is replaced by a probit model. This circumvents potential econometric problems, and allows the authors to implement more sophisticated betting strategies where bets are placed only when there is a relatively high probability of success. In-sample tests indicate that probit-based betting strategies generate statistically significant profits. Whereas the profitability of a number of these betting strategies is confirmed by out-of-sample testing, there is some inconsistency among the remaining out-of-sample predictions. The authors' results also suggest that widely documented inefficiencies in this market tend to dissipate over time. Copyright 1997 by American Finance Association.
Fact sheets about sports wagering
American Gaming Association. 2009. Fact sheets about sports wagering.