Predicting Stock Market Indicators Through Twitter – “I Hope it is Not as Bad as I Fear”

Procedia - Social and Behavioral Sciences 26 ( 2011 ) 55 – 62
Available online at www.sciencedirect.com
1877-0428 © 2011 Published by Elsevier Ltd.
doi: 10.1016/j.sbspro.2011.10.562
COINs2010: Collaborative Innovation Networks Conference
Xue Zhang1,2*, Hauke Fuehres2, Peter A. Gloor2
1Department of Mathematic and Systems Science, National University of Defense Technology, Changsha, Hunan, P.R.China
2MIT Center for Collective Intelligence, Cambridge MA, USA
Abstract
This paper describes early work trying to predict stock market indicators such as Dow Jones, NASDAQ and S&P 500 by analyzing Twitter posts. We collected the Twitter feeds for six months and got a randomized subsample of about one hundredth of the full volume of all tweets. We measured collective hope and fear on each day and analyzed the correlation between these indices and the stock market indicators. We found that the emotional tweet percentage correlated significantly negatively with Dow Jones, NASDAQ and S&P 500, but displayed significant positive correlation to VIX. It therefore seems that just checking on Twitter for emotional outbursts of any kind gives a predictor of how the stock market will be doing the next day.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of COINs 2010 Organizing Committee
Keywords: Twitter, economic indicator prediction, Web buzz analysis, coolhunting
1. Introduction
Twitter is a very popular microblogging website, where users can update their status in tweets, follow the people they are interested in, retweet others’ posts and even communicate with them directly. Since its launch in 2006, its user base has been growing exponentially. As of June 2010, about 65 million tweets were posted each day, equaling about 750 tweets sent each second (http://en.wikipedia.org/wiki/Twitter).
Recently, Twitter’s popularity has drawn more and more attention from researchers in different disciplines. There
are several streams of research investigating the role of Twitter. One stream of research focuses on understanding its
usage and community structure. By examining the follower network, Java et al. (2007) found that there is a great
variety in users’ intentions. A single user may have multiple intentions and may even serve different roles in different communities. Huberman et al. (2009) analyzed the social interaction on Twitter, revealing that the driver of usage is a sparse hidden network among friends and followers, while most of the interaction links are meaningless.
* Corresponding author. Tel.: +1 617 253 7018
E-mail address: xuezhang@mit.edu
Another stream of research concentrates on the influence of Twitter users and on information propagation. Cha et al. (2010) compared three different measures of influence: indegree, retweets and user mentions. They found that popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Romero et al. (2010) also showed that the correlation between popularity and influence is weaker than might be expected, because most users are passive information consumers and do not forward the content to the network. By constructing a model capturing the speed, scale and range of information diffusion, Yang and Counts (2010) claimed that some properties of the tweets themselves predict greater information propagation.
Besides the general understanding of Twitter, other researchers are interested in its prediction power and
potential application to other areas. Asur and Huberman (2010) used Twitter to forecast box-office revenues of
movies. They showed that a simple model built from the rate at which tweets are created about particular topics
could outperform market-based predictors. In their study, Tumasjan et al. (2010) analyzed Twitter messages
mentioning parties and politicians prior to the German federal election 2009 and found that the mere number of
tweets reflects voter preferences and comes close to traditional election polls. Other researchers speculate that Twitter could also be used in areas such as tracking the spread of epidemic disease (Lampos & Cristianini 2010).
There is also prior work analyzing the correlation between web buzz and the stock market. Antweiler and Frank (2004) determined the correlation between activity on Internet message boards and stock volatility and trading volume. Other researchers employed blog posts to predict stock market behavior. Gilbert and Karahalios (2010) used over 20 million posts from the LiveJournal website to create an index of the US national mood, which they call the Anxiety Index. They found that when this index rose sharply, the S&P 500 ended the day marginally lower than expected. Besides the posts’ content itself, other properties of communication, such as the number of comments and the length and response time of comments, are also helpful. Choudhury et al. (2010) modeled such contextual properties as a regression problem in a Support Vector Machine framework and trained it with stock movement. Their results are promising, yielding about 87% accuracy in predicting the direction of movement.
In recent years, we have been working on predicting market indicators by analyzing Web buzz, predicting who will win an Oscar or how well movies do at the box office (Doshi et al. 2009). Among other things we have correlated posts about a stock on Yahoo! Finance and The Motley Fool with the actual stock price, predicting the closing price of the stock on the next day based on what people say today on Yahoo! Finance, on the Web and on blogs about a stock title (Gloor et al. 2009). In this paper, we describe early work trying to predict stock market indicators such as Dow Jones, NASDAQ and S&P 500 by analyzing Twitter posts.
2. Method
The rising popularity of Twitter gives us a novel way of capturing the collective mind up to the last minute. In our current project we analyze the positive and negative mood of the masses on Twitter, comparing it with stock market indices such as Dow Jones, S&P 500, and NASDAQ. We collected the Twitter feeds from one whitelisted IP for six months from March 30, 2009 to Sept 7, 2009, ranging from 8100 to 43040 tweets per day. According to Twitter, this corresponds to a randomized subsample of about one hundredth of the full volume of all tweets, as the total volume in 2009 was about 2.5 million tweets per day.
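As a rough check of the "one hundredth" figure, the daily counts quoted above can be divided by the stated total volume. The following is only a back-of-the-envelope sketch; the function name is illustrative and not part of the original study.

```python
# Total daily volume in 2009 as stated in the text (about 2.5 million tweets).
TOTAL_PER_DAY = 2_500_000


def sampling_fraction(collected: int, total: int = TOTAL_PER_DAY) -> float:
    """Share of the full daily volume represented by the collected tweets."""
    return collected / total


# Smallest and largest daily samples reported in the text.
for label, collected in [("smallest day", 8100), ("largest day", 43040)]:
    print(f"{label}: {sampling_fraction(collected):.2%}")
```

Under these figures the daily sample lies between roughly 0.3% and 1.7% of the full volume, which is in line with "about one hundredth".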
3. Results
3.1. Measuring Investor Fear by Tracking “Fear” Words
As is well known, emotional state can influence our decisions, and this no doubt includes stock market investment decisions (Gilbert and Karahalios 2010). When people are pessimistic or uncertain about the future, they will be more cautious about investing and trading. So capturing the collective mind, and especially people’s mood, becomes one possible way to predict stock market movement.
Twitter is a microblogging service in which users post very short messages: less than 140 characters, averaging 11 words per message (Connor 2010). This implies that most tweets have a simple meaning, and even just one or two key words may capture the main topic. Inspired by this property, we decided to use mood words, for example “fear”, “worry”, “hope” etc., as emotional tags of a tweet. Then we measured collective emotion each day by simply counting all tweets containing such words. Table 1 below summarizes our results. The emotional words are divided into two groups: positive ones (hope and happy) and negative ones (fear, worry, nervous, anxious, and upset). Because the sample size differs from day to day, the daily amount of each emotion is also highly variable. There were 4 to 49 “fear” tweets and 5 to 51 “worry” tweets per day; for “hope” the daily tweet numbers range from 54 to 467. More interestingly, we also find that the number of positive tweets is much higher than that of negative ones, more than double on average, which might suggest that people prefer optimistic to pessimistic words.
Table 1. Number of Twitter Posts from March 30, 2009 to Sept 7, 2009
              Average per day   Min per day   Max per day
Tweet #       29758             8100          43040
Hope #        307               54            467
Happy #       260               37            1806
Fear #        28                4             49
Worry #       27                5             51
Nervous #     13                0             36
Anxious #     4                 0             9
Upset #       14                2             25
Positive #    570               91            2204
Negative #    86                11            125
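The counting step described in Section 3.1 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes exact whole-word matching (so "hoping" or "worried" would not count) and counts a tweet once toward "positive" or "negative" even if it contains several mood words. The text does not specify either choice, and the function name is hypothetical.

```python
import re
from collections import Counter

# Mood words as grouped in the text.
POSITIVE = {"hope", "happy"}
NEGATIVE = {"fear", "worry", "nervous", "anxious", "upset"}
MOOD_WORDS = POSITIVE | NEGATIVE

# Assumption: a "word" is a run of letters or apostrophes after lowercasing.
WORD_RE = re.compile(r"[a-z']+")


def count_mood_tweets(tweets):
    """Count, for one day, how many tweets contain each mood word at least once."""
    counts = Counter()
    for text in tweets:
        words = set(WORD_RE.findall(text.lower()))
        for mood in words & MOOD_WORDS:
            counts[mood] += 1
        if words & POSITIVE:
            counts["positive"] += 1
        if words & NEGATIVE:
            counts["negative"] += 1
    return counts


sample_day = [
    "I hope it is not as bad as I fear",
    "So happy today",
    "Nothing to see here",
    "Hope hope HOPE",
]
print(count_mood_tweets(sample_day))
```

Running this once per day over the collected tweets would give daily counts of the kind summarized in Table 1.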
3.2. Selection of Baseline
Next, we investigated against which baseline the number of tweets about a certain topic such as “hope, fear, and
worry” should be measured. In our work we looked at three different baselines:
1. The number of tweets per day
2. The number of followers per day
3. The number of retweets per day
First we investigated the number of tweets about a certain topic in relation to the total number of tweets. The
daily total number of tweets has been growing incrementally over the last years (Figure 1).
Figure 1. Growth in tweets per day (http://mashable.com/2010/02/22/twitter-50-million-tweets/)
In our own data sample we were using the Twitter “public timeline” function, which is implemented so as to deliver a more or less constant stream of messages per day. This stream allowed us to measure the percentage of emotional tweets among all the tweets. Using “hope” as an example, we defined $\text{hope\%}_t$ as the ratio between the number of “hope” tweets on day $t$ and the number of tweets we collected that day, and compared it with the stock market indicators on day $t+1$. Table 2 displays the correlation analysis result.
Table 2. Correlation coefficients between emotional tweet percentages and stock market indicators (N=93), with the total number of tweets per day as baseline. Magnitudes are shown; as discussed in the text, the correlations with Dow, NASDAQ and S&P 500 are negative and those with VIX are positive.
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
              Dow       NASDAQ    S&P 500   VIX
Hope %        0.381**   0.407**   0.373**   [...]
Happy %       0.107     0.105     0.103     [...]
Fear %        0.208*    0.238*    0.200     [...]
Worry %       0.300**   0.305**   0.295**   [...]
Nervous %     0.023     0.054     0.021     [...]
Anxious %     0.261*    0.295**   0.262*    [...]
Upset %       0.185     0.188     0.184     [...]
Positive %    0.192     0.197     0.187     [...]
Negative %    0.294**   0.323**   0.288**   0.301**
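The analysis behind Table 2 can be sketched as a lagged Pearson correlation: the mood percentage on day $t$ is paired with the indicator on day $t+1$. The sketch below is an illustration under simplifying assumptions, not the authors' implementation: the three input lists are assumed to be aligned day by day with no gaps for non-trading days, and all function names are hypothetical. It also includes the standard t statistic for testing a correlation coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(mood_counts, tweet_totals, indicator):
    """Correlate mood%_t with the indicator on day t+1."""
    pct = [m / total for m, total in zip(mood_counts, tweet_totals)]
    # Drop the last mood value and the first indicator value to create the lag.
    return pearson(pct[:-1], indicator[1:])


def t_statistic(r, n):
    """t statistic for a correlation r over n pairs (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data, invented for illustration only.
r = lagged_correlation([10, 20, 30, 40], [100, 100, 100, 100], [9, 8, 6, 4])
print(round(r, 6))
print(round(t_statistic(0.381, 93), 2))
```

For a coefficient of magnitude 0.381 with N = 93, the t statistic is about 3.93, which is above the two-tailed 0.01 critical value of roughly 2.63 for 91 degrees of freedom.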
As an external benchmark of investor fear we used the Chicago Board Options Exchange Volatility Index (VIX), which correlated strongly and negatively with Dow, S&P 500, and NASDAQ. This is not surprising, as the spread of stock options on a given day is used to calculate VIX.
Initially we expected that the correlation between optimistic mood and stock market indicators would be positive, and that pessimistic mood would correlate negatively. Surprisingly, we found positive correlation with VIX, and negative correlation with Dow, NASDAQ and S&P 500, for all of them. This implies that people start using more emotional words such as hope, fear and worry in times of economic uncertainty, independent of whether they have a positive or negative context.
As our second candidate for a baseline we investigated the total number of followers per day. The follower is a key concept in Twitter; the number of followers is commonly seen as a measure of popularity. It is likely that the more followers a user has, the more people s/he can affect. In particular, the bigger the audience of one pessimist is, the more people may be infected and feel the same negative way. We analyzed the correlation between the percentage of potential emotional audience and stock market indicators. For instance, we added up the follower numbers of all “worry” tweets of day $t$ and divided the sum by the total number of followers on that day ($\text{worryfollower\%}_t$ in Table 3), then compared it with $\text{Dow}_{t+1}$, $\text{NASDAQ}_{t+1}$ and $\text{S\&P500}_{t+1}$. The correlation coefficients are 0.143, 0.149 and 0.146 respectively, which are lower than we expected. As can be seen in Table 3, this index is therefore not a good predictor of stock market indices.
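The follower-based baseline can be sketched as below. This is a minimal illustration that assumes "the total number of followers on that day" means the sum of follower counts over the authors of all tweets collected that day; the text does not spell this out. The function name and the `(text, follower_count)` input format are hypothetical.

```python
import re

# Assumption: a "word" is a run of letters or apostrophes after lowercasing.
WORD_RE = re.compile(r"[a-z']+")


def follower_share(tweets, mood_word):
    """Share of the day's total follower count reached by tweets containing mood_word.

    tweets: iterable of (text, follower_count) pairs for a single day.
    """
    tweets = list(tweets)
    total = sum(followers for _, followers in tweets)
    if total == 0:
        return 0.0
    mood = sum(
        followers
        for text, followers in tweets
        if mood_word in WORD_RE.findall(text.lower())
    )
    return mood / total


# Toy data, invented for illustration only.
day = [("I worry a lot", 300), ("Nice day", 100), ("Worry not", 100)]
print(follower_share(day, "worry"))
```

The resulting daily series would then be correlated with the next day's indicators in the same way as the tweet percentages.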
Table 3. Correlation Coefficients of percentage of potential emotional audience and stock market indicators (N=93)
                       Dow       NASDAQ    S&P 500   VIX
Hope-followers %       0.086     0.048     0.077     [...]
[...]                  0.19      0.181     0.188     [...]
Fear-followers %       0.005     0.051     0.012     [...]
Worry-followers %      0.143     0.149     0.146     [...]
Nervous-followers %    [...]     [...]     [...]     0.108
Anxious-followers %    0.156     0.177     0.177     [...]
Upset-followers %      0.106     0.116     0.103     [...]
Finally we looked at the number of retweets per day, based on the hypothesis that the more a topic is picked up and retweeted by others, the more relevant it is. In an accumulated way, the total number of retweets is a proxy for the activity of the Twitter users on a particular day.
Table 4. Number of retweets from March 30, 2009 to Sept 7, 2009
                   Average per day   Min per day   Max per day
Retweet #          1083              221           1884
[...]              [...]             [...]         [...]
Happy-retweet #    9                 0             40
Fear-retweet #     3                 0             9
Worry-retweet #    1                 0             51
Figure 2. Percentage of retweets per day
Table 4 above illustrates the number of retweets about a certain topic per day. The retweet numbers range from 221 to 1884, roughly 3% to 5% of the tweets. As Figure 2 shows, the retweet percentage displayed exponential growth too. We also found that there were about 40% fewer retweets at weekends (the nodes underneath the black line in Figure 2 are weekends). We speculate that on weekends, active tweeters have the time to send more original tweets, while during the week they pick up tweets from others they find worth retweeting. This means, however, that they “stake their reputation” on others’ tweets during the weekdays.
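The weekday versus weekend comparison can be sketched as follows. This is an illustrative sketch only: it takes daily retweet and tweet counts as given (how retweets were identified is not described in the text), and the function name and input format are hypothetical.

```python
from datetime import date


def retweet_share_by_daytype(daily):
    """Mean retweet share for weekdays and for weekends.

    daily: iterable of (date, retweet_count, tweet_count) tuples.
    """
    buckets = {"weekday": [], "weekend": []}
    for day, retweets, tweets in daily:
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        buckets[key].append(retweets / tweets)
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}


# Toy counts, invented for illustration only.
sample = [
    (date(2009, 3, 30), 40, 1000),  # Monday
    (date(2009, 3, 31), 60, 1000),  # Tuesday
    (date(2009, 4, 4), 30, 1000),   # Saturday
]
shares = retweet_share_by_daytype(sample)
print(shares)
print(1 - shares["weekend"] / shares["weekday"])
```

With these toy counts the weekend share is 40% below the weekday share, mirroring the size of the effect reported above.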
Next, we analyzed the correlation between the emotional retweet percentage and the stock market indicators. Again taking “hope” as an example, we defined $\text{hope retweet\%}_t$ as the ratio between the number of retweets containing “hope” on day $t$ and the number of retweets on that day, then compared it with the stock market indicators on day $t+1$. Table 5 below displays the correlation analysis result. The number of retweets is a better baseline than the number of followers, but simply taking the total number of tweets gives the best results. This is not surprising, however, because the number of retweets containing “hope” is much lower than the number of tweets containing “hope”. The smaller sample means that the fluctuation in the results is much higher, which leads to less significant correlations. We speculate that the correlations would have been higher if we had been able to collect a larger subsample of all the tweets.
Table 5. Correlation Coefficients of emotional retweets percentage and stock market indicators (N=93)
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
                   Dow       NASDAQ    S&P 500   VIX
Hope-retweet %     0.139     0.156     0.158     [...]
[...]              0.011     0.008     0.015     [...]
Fear-retweet %     0.258*    0.245*    0.253*    [...]
Worry-retweet %    0.037     0.036     0.047     0.083
3.3. Time Lag of Prediction
We also investigated how much the discussion on Twitter precedes fluctuations on the stock market. In other words, the question is how long it takes the stock market to react to the buzz on Twitter.
All analysis in Section 3.2 focused on the Twitter buzz one day before the trading day. This way, however, a considerable portion of the information is wasted. For example, when we analyze Monday’s Dow, only Sunday’s data are used. All the data from Fridays and Saturdays become useless, because weekends are not trading days. In order to find out the time lag of prediction and make full use of the data, we created a simple Twitter-volatility index that averages the buzz of $\text{day}_t$, $\text{day}_{t-1}$ and $\text{day}_{t-2}$ to predict the stock market indicators of $\text{day}_{t+1}$.
This index displays significant negative correlations to Dow, NASDAQ and S&P 500, and significant positive correlation to VIX (in this section we only use the emotional tweet percentages with the total number of tweets as baseline as predictor). Among all the emotional words, hope, fear and worry work best in this analysis. We also added them together to test whether their sum might improve results further, which it did not (see Table 6).
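The three-day index from Section 3.3 can be sketched as a trailing mean that is then paired with the indicator of the following day, whenever that day is a trading day. This is a hedged illustration, not the authors' code: it assumes the mood percentages cover consecutive calendar days, and the function names and the dictionary of closing values are hypothetical.

```python
from datetime import date, timedelta


def trailing_mean(series, window=3):
    """Mean of each value and the (window - 1) values before it; None until enough history."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i - window + 1:i + 1]) / window)
    return out


def pair_with_next_trading_day(dates, mood_pct, closes):
    """Pair the 3-day mean ending on day t with the indicator on day t+1.

    dates: consecutive calendar days; mood_pct: mood percentage per day;
    closes: dict mapping trading days to indicator values.
    """
    pairs = []
    for day, mean in zip(dates, trailing_mean(mood_pct, 3)):
        next_day = day + timedelta(days=1)
        if mean is not None and next_day in closes:
            pairs.append((mean, closes[next_day]))
    return pairs


# Toy values, invented for illustration only: Friday to Sunday, then a Monday close.
dates = [date(2009, 4, 3), date(2009, 4, 4), date(2009, 4, 5)]
mood_pct = [0.01, 0.02, 0.03]
closes = {date(2009, 4, 6): 100.0}
print(pair_with_next_trading_day(dates, mood_pct, closes))
```

In this toy case the Friday, Saturday and Sunday values all contribute to the predictor for Monday, which is the point of the averaging: weekend buzz is no longer discarded.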
Table 6. Correlation coefficients between average emotional tweet percentages and stock market indicators (N=93). Magnitudes are shown; as stated in the text, the correlations with Dow, NASDAQ and S&P 500 are negative and those with VIX are positive.
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
                            Dow       NASDAQ    S&P 500   VIX
Hope%                       0.381**   0.407**   0.373**   [...]
Hope%-2-mean                0.618**   0.631**   0.607**   [...]
Hope%-3-mean                0.737**   0.738**   0.724**   [...]
Fear%                       0.208*    0.238*    0.2       [...]
Fear%-2-mean                0.259*    0.285**   0.253*    [...]
Fear%-3-mean                0.346**   0.368**   0.342**   [...]
Worry%                      0.3**     0.305**   0.295**   [...]
Worry%-2-mean               0.421**   0.415**   0.414**   [...]
Worry%-3-mean               0.472**   0.460**   0.467**   [...]
Hope+Fear+Worry%            0.379**   0.405**   0.37**    [...]
Hope+Fear+Worry%-2-mean     0.612**   0.625**   0.6**     [...]
Hope+Fear+Worry%-3-mean     0.726**   0.728**   0.713**   [...]
The picture below visualizes the negative correlation between Dow (blue) and “hope+fear+worry%-3-mean” (green) in the period March 30, 2009 to Sept 7, 2009.
Figure 3. Correlation between “hope, fear and worry-3 mean” and Dow Jones Industrial Average
4. Discussion
To put it in simple words, when the emotions on Twitter fly high, that is, when people express a lot of hope, fear, and worry, the Dow goes down the next day. When people have less hope, fear, and worry, the Dow goes up. It therefore seems that just checking on Twitter for emotional outbursts of any kind gives a predictor of how the stock market will be doing the next day.
In this paper, we have presented very preliminary results; much more work is needed to verify them further.
References
Antweiler, W. & Frank, M. Z. (2004). Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards. Journal of Finance, Vol. 59, No. 3 (Jan., 2004), pp. 1259-1294.
Asur, S. & Huberman, B. A. (2010). Predicting the Future With Social Media. http://arxiv.org/abs/1003.5699.
Boyd, D., Golder, S. & Lotan, G. (2010). Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter. HICSS-43. IEEE: Kauai, HI, January 6.
Cha, M., Haddadi, H., Benevenuto, F. & Gummadi, K. P. (2010). Measuring User Influence in Twitter: The Million Follower Fallacy. 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
Choudhury, M. D., Sundaram, H., John, A. & Seligmann, D. D. (2010). Can Blog Communication Dynamics be Correlated with Stock Market Activity? Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, 2010.
Connor, B., Balasubramanyan, R., Routledge, B. R. & Smith, N. A. (2010). From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
Doshi, L., Krauss, J., Nann, S. & Gloor, P. (2009). Predicting Movie Prices Through Dynamic Social Network Analysis. Proceedings COINs 2009, Collaborative Innovations Networks Conference, Savannah GA, Oct 8-11, 2009.
Gilbert, E. & Karahalios, K. (2010). Widespread Worry and the Stock Market. 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
Gloor, P. & Zhao, Y. (2004). TeCFlow - A Temporal Communication Flow Visualizer for Social Networks Analysis. ACM CSCW Workshop on Social Networks. ACM CSCW Conference, Chicago, Nov. 6, 2004.
Gloor, P., Krauss, J., Nann, S., Fischbach, K. & Schoder, D. (2009). Web Science 2.0: Identifying Trends through Semantic Social Network Analysis. IEEE Conference on Social Computing (SocialCom-09), Aug 29-31, Vancouver, 2009.
Huberman, B. A., Romero, D. M. & Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday, 14(1), 2009.
Java, A., Song, X., Finin, T. & Tseng, B. (2007). Why We Twitter: Understanding Microblogging Usage and Communities. 9th WebKDD and 1st SNA-KDD workshop on web mining and social network analysis, 2007.
Lampos, V. & Cristianini, N. (2010). Tracking the flu pandemic by monitoring the Social Web. IAPR 2nd Workshop on Cognitive Information Processing (CIP 2010), 14-16 Jun 2010.
Romero, D. M., Galuba, W., Asur, S. & Huberman, B. A. (2010). Influence and Passivity in Social Media. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1653135.
Tumasjan, A., Sprenger, T. O., Sandner, P. G. & Welpe, I. M. (2010). Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
Weng, J., Lim, E., Jiang, J. & He, Q. (2010). TwitterRank: Finding Topic-sensitive Influential Twitterers. WSDM, 2010.
Yang, J. & Counts, S. (2010). Predicting the Speed, Scale, and Range of Information Diffusion in Twitter. 4th International AAAI Conference on Weblogs and Social Media (ICWSM), 2010.
... [15,21,60,95,96,106] have studied the predictability of the Foreign Stock Exchange (Forex) and cryptocurrencies market, where a currency is traded based on the ratio of two currency pairs, such as the EUR/USD, UDS/JPY, or BTC/USD. News [13,86,89,107]; social network data, such as Twitter [55,108,109], Stocktweet [88,108,110], or Sina Weibo [111]; as well as trends in search engines [95,112]; referring to Wikipedia page statistics [59]; and online reviews of customers about firm's products [113] are the types of media-based sources available to investors. Thomson Reuters and Bloomberg newsgroup are the mainstream news resources in this category; however, it is a basic challenge to utilize relevant texts to the target market because of redundancy and noise in the associated texts. ...
... [15,21,60,95,96,106] have studied the predictability of the Foreign Stock Exchange (Forex) and cryptocurrencies market, where a currency is traded based on the ratio of two currency pairs, such as the EUR/USD, UDS/JPY, or BTC/USD. News [13,86,89,107]; social network data, such as Twitter [55,108,109], Stocktweet [88,108,110], or Sina Weibo [111]; as well as trends in search engines [95,112]; referring to Wikipedia page statistics [59]; and online reviews of customers about firm's products [113] are the types of media-based sources available to investors. Thomson Reuters and Bloomberg newsgroup are the mainstream news resources in this category; however, it is a basic challenge to utilize relevant texts to the target market because of redundancy and noise in the associated texts. ...
... There have been two lines of work predicting stock markets using a sentiment analysis. In the first one, researchers extract the sentiment (positive, negative, or neutral) from documents as features [23,79], and in the other line of work, the public mood time series is calculated by aggregating a sentiment score for each time interval [25,71,108]. From the market perspective, there are two categories, coarse grained and fine grained, in the domain-specific usage of the financial sentiment analysis (FSA). ...
Article
Full-text available
Abstract: News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advanced methods in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for the investors’ behavior analysis. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and a market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to a heterogeneous information fusion for the investors’ behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).
... It has been shown that the volume of social media posts about a film are predictive of its ticket sales [8]. In traditional finance, there are many works that use social media data to predict the prices of stocks [9][10][11][12][13][14][15][16][17]. These works often utilize the sentiment of social media posts in addition to their volume. ...
Preprint
Full-text available
We study the problem of predicting the future performance of cryptocurrencies using social media data. We propose a new model to measure the engagement of users with topics discussed on social media based on interactions with social media posts. This model overcomes the limitations of previous volume and sentiment based approaches. We use this model to estimate engagement coefficients for 48 cryptocurrencies created between 2019 and 2021 using data from Twitter from the first month of the cryptocurrencies' existence. We find that the future returns of the cryptocurrencies are dependent on the engagement coefficients. Cryptocurrencies whose engagement coefficients are too low or too high have lower returns. Low engagement coefficients signal a lack of interest, while high engagement coefficients signal artificial activity which is likely from automated accounts known as bots. We measure the amount of bot posts for the cryptocurrencies and find that generally, cryptocurrencies with more bot posts have lower future returns. While future returns are dependent on both the bot activity and engagement coefficient, the dependence is strongest for the engagement coefficient, especially for short-term returns. We show that simple investment strategies which select cryptocurrencies with engagement coefficients exceeding a fixed threshold perform well for holding times of a few months.
... This method deals with mostly unstructured data. It has been observed that we can also predict the stock market behavior even using this unstructured data such as text [13]. ...
... This method deals with mostly unstructured data. It has been observed that we can also predict the stock market behavior even using this unstructured data such as text [13]. ...
Thesis
Full-text available
It has been a perpetual question for investors whether to buy or sell a particular stock because thedecision is based on speculation [1]. The ongoing news about a company can help potential investors tohave better indication about company performances to invest in the right stocks. To support the abovestatement, this study investigates how news sentiment polarity could affect the stock market share price.Social media and financial news can be considered as two potential sources to find out the sentimentabout a company. Under this experiment, the polarity of the sentiment was extracted from financial newsheadlines of historical news archive and online news website and was joined with day-wise stock prices toknow how they were related. The sentiment could be positive, negative or neutral, and the sentimentscore quantitatively depicts how positive or negative the text is. To extract the sentiment and sentimentscores out of the news headlines, we used VADER, TextBlob, and RoBERTa as pre-trained NLP algorithms.From the results, we discovered that there was a correlation existed between the news headlines andactual stock market share price movement and there were 70 percent of cases where news sentiment andstock market share price movements were in orchestration.
... This method deals with mostly unstructured data. It has been observed that we can also predict the stock market behavior even using this unstructured data such as text [13]. ...
... This method deals with mostly unstructured data. It has been observed that we can also predict the stock market behavior even using this unstructured data such as text [13]. ...
Article
It has been a perpetual question for investors whether to buy or sell a particular stock because the decision is based on speculation [1]. The ongoing news about a company can help potential investors to have better indication about company performances to invest in the right stocks. To support the above statement, this study investigates how news sentiment polarity could affect the stock market share price. Social media and financial news can be considered as two potential sources to find out the sentiment about a company. Under this experiment, the polarity of the sentiment was extracted from financial news headlines of historical news archive and online news website and was joined with day-wise stock prices to know how they were related. The sentiment could be positive, negative or neutral, and the sentiment score quantitatively depicts how positive or negative the text is. To extract the sentiment and sentiment scores out of the news headlines, we used VADER, TextBlob, and RoBERTa as pre-trained NLP algorithms. From the results, we discovered that there was a correlation existed between the news headlines and actual stock market share price movement and there were 70 percent of cases where news sentiment and stock market share price movements were in orchestration.
... In addition to the number of tweets, tweets can also be classified according to the mood they reflect or simply as positive and negative sentiment. Zhang et al. (2011) use tweets to predict stock market indicators. Their finding: Emotional tweets containing the words "hope", "fear" and "worry" reflect uncertainty. ...
Thesis
The idea of this thesis is to use new data sources to approximate investor beliefs. It investigates whether the approximation improves the measurement of return and volatility in existing model frameworks. The findings are that differences in implied volatility, Google search volume and Twitter volume can serve as proxy variables for investor beliefs. They have an impact on financial market indicators and on the prediction of future market movements.

Comparison of the trading behaviour of individual and institutional investors to predict market movements
The first approach is to create a new sentiment index that compares retail investor behaviour at the Stuttgart Stock Exchange (SSE) with that of professional investors at the Frankfurt Stock Exchange (FSE). The measure is a comparison between the implied volatility measures for the DAX at the FSE (VDAX and VDAX-NEW) and a newly created implied volatility index (VSSE) for the SSE. The sentiment index is significant in predicting the daily returns on a size-based long-short portfolio over a four-year period. The analysis shows a persistent inconsistency between the prices of structured products for retail investors on the SSE and the option prices of professional investors on the FSE. The results provide empirical evidence that there are significant, persistent behavioural differences between the two investor types, which are reflected in persistent mispricing.

Measurability of investor beliefs and their impact on financial markets
The second approach is to measure individual investor beliefs with Google search volume (GSV) and Twitter volume (TV) and to analyse their impact on financial markets. The basis is a daily panel of 29 Dow Jones Industrial Average (DJIA) stocks over a period of 3.5 years. The impact on trading activity, measured by turnover, is positive for GSV and TV on the same day and the next day, which indicates their predictive power. The impact on realized volatility (RV), indicating the share of noise traders on the market, is positive and significant only for TV, on both the same day and the next day. The impact of GSV is not significant. The results support the idea that GSV and TV capture the beliefs of individual investors, although they suggest that the impact of TV on financial markets is more important than that of GSV.

Predictive power of Google and Twitter
The third approach is to use GSV and TV as proxies for investor attention and investor sentiment and to assess their predictive power for the RV of the DJIA. The basis is a time-series set-up with a vector autoregression (VAR) model over a period of 2.5 years. The findings show that GSV and TV Granger-cause RV, controlling for macroeconomic and financial factors. Again, the effect of TV on RV is more important than the effect of GSV. In-sample, the linear prediction model with GSV and TV outperforms a standard AR(1) process. Out-of-sample, the AR(1) process outperforms the model with GSV and TV. When the sample is clustered into high- and low-volatility groups, the analysis shows that the effect of GSV and TV on RV changes. Especially in times of high and low RV, GSV and TV seem to contain new information, as they improve the model fit compared to a standard AR(1) process. However, the results are not persistent in- and out-of-sample. This underlines that the results for GSV and TV are not generally persistent but depend on the selected criteria.

Overall, the results of this thesis show that investor beliefs have an impact on financial markets. Measures such as a sentiment index based on implied volatility, GSV and TV are proxy variables for investor beliefs. Future research should further improve the understanding of investor beliefs in order to strengthen causality and economic significance in the long term.
... Recent research analyzes the effect of index-level sentiment on the U.S. market. Zhang, Fuehres, and Gloor (2011) and Bollen, Mao, and Zeng (2011) find that Twitter-related moods are key determinants of U.S. stock indexes. ...
Article
We examine the quantile connectedness of returns between the recently developed S&P 500 Twitter Sentiment Index and various asset classes. Rather than a mean-based connectedness measure, we apply quantile connectedness to explore connectedness of means and, especially, of the extreme left and right tails of distributions. Using mean-based connectedness measures, the level of return connectedness between the Twitter sentiment index and all financial markets is a modest 46%. However, when applying a novel quantile-based connectedness approach, we find that levels of tail connectedness are much stronger, up to 82%, at the extreme upper and lower tails. This suggests that the impact of sentiment on financial markets is much stronger during extreme positive/negative sentiment shocks. Moreover, return connectedness measures are less volatile during extreme events. Net connectedness analysis shows that the Twitter sentiment index acts as a net transmitter of return spillovers, highlighting the leading role of investor sentiment in predicting other financial markets.
Article
Breakthrough research may signal shifts in science, technology, and innovation systems. Early identification of breakthrough research is important not only for scientists, but also for policy makers and R&D experts in developing R&D strategies and allocating R&D resources. Researchers mostly use data from scientific papers to identify potential breakthrough research, but they rarely make use of Twitter data related to scientific research or of machine learning methods. Analysis of Twitter data helps us to understand the public's perception of potential breakthrough research and to identify such research. Machine learning methods can assist in predicting the trend of events by utilizing prior knowledge and experience. Therefore, this paper proposes a framework for identifying potential breakthrough research using machine learning methods with scientific papers and Twitter data. We select solar cells as a case study to verify the validity and flexibility of this framework. In this case, we use a machine learning method to discover potential breakthrough research from scientific papers, and we use Twitter data mining to analyze Twitter users' perception of and response to the discovered potential breakthrough research, which aims to achieve a more extensive and diverse assessment of it. This paper contributes to identifying potential breakthrough research, as well as to understanding the emergence and development of breakthrough research. It will be of interest to R&D experts in the field of solar cell technology.
Technical Report
Cryptocurrencies are one of the latest and most discussed topics in the financial world. Every time a new one entered the market, influencers like Elon Musk started talking about them. With our paper, we investigated the connection between the price development of Dogecoin and the sentiment of tweets from Elon Musk that contain Dogecoin-related content. Since the data basis of a single influencer was not enough to work reliably with, we also included Dogecoin tweets from every other user. The data, scraped from Twitter's API, was cleaned, and partially annotated. More than 45,000 tweets were then analyzed using different sentiment analysis models. Our results show a decent connection between Musk's tweets and Dogecoin price but not between the latter and the group of every other Twitter user.
Article
The ever-increasing amount of information flowing through Social Media forces the members of these networks to compete for attention and influence by relying on other people to spread their message. A large study of information propagation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and passivity of users based on their information forwarding activity. An evaluation performed with a 2.5 million user dataset shows that our influence measure is a good predictor of URL clicks, outperforming several other measures that do not explicitly take user passivity into account. We demonstrate that high popularity does not necessarily imply high influence and vice-versa.
Article
Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool, has seen a lot of growth since it launched in October 2006. In this paper, we present our observations of the microblogging phenomenon by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze user intentions at a community level and show how users with similar intentions connect with each other.
Article
The ever-increasing amount of information flowing through Social Media forces the members of these networks to compete for attention and influence by relying on other people to spread their message. A large study of information propagation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and passivity of users based on their information forwarding activity. An evaluation performed with a 2.5 million user dataset shows that our influence measure is a good predictor of URL clicks, outperforming several other measures that do not explicitly take user passivity into account. We also explicitly demonstrate that high popularity does not necessarily imply high influence and vice-versa.
Conference Paper
Tracking the spread of an epidemic disease like seasonal or pandemic influenza is an important task that can reduce its impact and help authorities plan their response. In particular, early detection and geolocation of an outbreak are important aspects of this monitoring activity. Various methods are routinely employed for this monitoring, such as counting the consultation rates of general practitioners. We report on a monitoring tool to measure the prevalence of disease in a population by analysing the contents of social networking tools, such as Twitter. Our method is based on the analysis of hundreds of thousands of tweets per day, searching for symptom-related statements, and turning statistical information into a flu-score. We have tested it in the United Kingdom for 24 weeks during the H1N1 flu pandemic. We compare our flu-score with data from the Health Protection Agency, obtaining on average a statistically significant linear correlation which is greater than 95%. This method uses completely independent data to that commonly used for these purposes, and can be used at close time intervals, hence providing inexpensive and timely information about the state of an epidemic.
Conference Paper
Twitter - a microblogging service that enables users to post messages ("tweets") of up to 140 characters - supports a variety of communicative practices; participants use Twitter to converse with individuals, groups, and the public at large, so when conversations emerge, they are often experienced by broader audiences than just the interlocutors. This paper examines the practice of retweeting as a way by which participants can be "in a conversation." While retweeting has become a convention inside Twitter, participants retweet using different styles and for diverse reasons. We highlight how authorship, attribution, and communicative fidelity are negotiated in diverse ways. Using a series of case studies and empirical data, this paper maps out retweeting as a conversational practice.
Conference Paper
This paper focuses on the problem of identifying influential users of micro-blogging services. Twitter, one of the most notable micro-blogging services, employs a social-networking model called "following", in which each user can choose who she wants to "follow" to receive tweets from, without requiring the latter to give permission first. In a dataset prepared for this study, it is observed that (1) 72.4% of the users in Twitter follow more than 80% of their followers, and (2) 80.5% of the users have 80% of the users they are following follow them back. Our study reveals that the presence of "reciprocity" can be explained by the phenomenon of homophily. Based on this finding, TwitterRank, an extension of the PageRank algorithm, is proposed to measure the influence of users in Twitter. TwitterRank measures influence by taking both the topical similarity between users and the link structure into account. Experimental results show that TwitterRank outperforms the measure Twitter currently uses and other related algorithms, including the original PageRank and Topic-sensitive PageRank.
Conference Paper
Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
Conference Paper
We introduce a novel set of social network analysis based algorithms for mining the Web, blogs, and online forums to identify trends and find the people launching these new trends. These algorithms have been implemented in Condor, a software system for predictive search and analysis of the Web and especially social networks. Algorithms include the temporal computation of network centrality measures, the visualization of social networks as Cybermaps, a semantic process of mining and analyzing large amounts of text based on social network analysis, and sentiment analysis and information filtering methods. The temporal calculation of the betweenness of concepts makes it possible to extract and predict long-term trends in the popularity of relevant concepts such as brands, movies, and politicians. We illustrate our approach by qualitatively comparing Web buzz and our Web betweenness for the 2008 US presidential elections, as well as correlating the Web buzz index with share prices.
Article
This paper introduces an approach for organizational redesign and optimization of communication flows based on temporal analysis of communication patterns in groups of people. Our Temporal Communication Flow Visualizer automatically generates interactive movies of communication flows among individuals by mining e-mail log files and other communication archives. Combining those movies with measures of social network analysis, such as the change over time in group betweenness centrality (GBC) and group density, leads to deep insights into organizational dynamics. In addition, we have defined a contribution index, which measures the activity of an individual as a sender and receiver of messages relative to a team. Based on these findings, we can make predictions on the productivity of teams and suggest interventions for improved performance.