
Social Media Big Data Analytics for Demand Forecasting: Development and Case Implementation of an Innovative Framework


Abstract

Social media big data offers insights that can be used to make predictions of products' future demand and add value to the supply chain performance. The paper presents a framework for improvement of demand forecasting in a supply chain using social media data from Twitter and Facebook. The proposed framework uses sentiment, trend, and word analysis results from social media big data in an extended Bass emotion model along with predictive modelling on historical sales data to predict product demand. The forecasting framework is validated through a case study in a retail supply chain. It is concluded that the proposed framework for forecasting has a positive effect on improving accuracy of demand forecasting in a supply chain.
DOI: 10.4018/JGIM.2020010106
Volume 28 • Issue 1 • January-March 2020
Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Rehan Iftikhar, Maynooth University, Maynooth, Ireland
Mohammad Saud Khan, Victoria University of Wellington, New Zealand
Keywords: Apparel Supply Chain, Bass Emotion Model, Big Data, Demand Forecasting, Emotion Enhanced Model,
Sentiment Analysis, Social Media, Supply Chain Management
Big data represents a tremendous opportunity for companies, as it can help them make better decisions
at the operational, tactical and strategic levels (Schroeck, Shockley, Smart, Romero-Morales, & Tufano,
2012), with direct impact on business profitability (Waller & Fawcett, 2013). The ability to draw
insights from different types of data creates huge value for a firm (Dijcks, 2013; Kiron & Shockley,
2015). Yet big data presents a far greater opportunity than is currently exploited: only 0.5% of big data
is being utilized and analysed (Guess, 2015). Despite
this huge potential, literature providing empirical evidence of the business value added by big
data analytics in a supply chain remains scarce and limited in quality (Wamba, 2017).
All supply chain operations and activities are set in motion by the final customers’ demand
(Syntetos et al., 2016). Demand forecasting is used as a basis for supply chain strategy (Marshall,
Dockendorff, & Ibáñez, 2013), and weak forecasting is one of the main reasons for supply chain
failures (Zadeh, Sepehri, & Farvaresh, 2014). Demand forecasting can be improved significantly by
using big data (Chao, 2015), especially big data from social media (Arias, Arratia, & Xuriguera,
2014). With the increase in social media activity, academic and
industrial research that taps into these social media data sources has emerged. However, the utilization of these
data sources remains at an early stage and outcomes are often mixed (Yu, Duan, & Cao, 2013).
Companies face a challenge in forecasting when it comes to analysing their historical data
together with big data from social media (Papanagnou & Matthews-Amune, 2017). Supply chain practitioners have
increasingly focused on leveraging unstructured big data such as
social media data, but there is very little supporting empirical evidence (Syntetos et al., 2016).
Integration of social media analytics and supply chain management is needed to comprehensively
establish ‘what can be actually done’ in the field of forecasting with the help of analytics. There
is a paucity of predictive frameworks for forecasting using social media big data. This paper aims
to bridge the gap between traditional forecasting techniques and big data analytics and
contributes a forecasting platform that uses social media big data as well as historical sales data.
This work presents a framework to utilize social media big data in the Bass-Emotion Model introduced
by Fan, Che, & Chen (2017). The proposed framework uses the results of sentiment analysis on
Facebook and Twitter for demand forecasting. This work provides empirical evidence on the usage
of social media big data for demand forecasting in supply chain management (Choi, 2018; Schaer,
Kourentzes, & Fildes, 2018). It is one of the first studies that incorporates word analysis, topic
modelling and sentiment analysis to provide social media data parameters to the Bass-Emotion model.
Diverse, massive and complex data on different domains of business and technology which cannot
be efficiently addressed by the traditional technologies, skills, and infrastructure is referred to as
big data. Most big data researchers and practitioners in general agree on three dimensions that
characterize big data: volume, velocity and variety (Zikopoulos & Eaton, 2011). Big data analytics in
supply chain management can be described as applying analytical techniques on big data to facilitate
optimization and decision making in a supply chain (Souza, 2014). The use of big data analytics can
help us understand ‘what has happened, what is happening at the moment, what will happen and why
things happen’ (Feki & Wamba, 2016, p. 1127). Three distinct analytics approaches for answering
these questions have been classified as descriptive, predictive, and prescriptive analytics (Hahn &
Packowski, 2015). The most valued use of big data analytics in a supply chain is the ability it gives
analysts to predict a reaction or an event by detecting changes in current or historical
data (Sanders, 2014). The use of current data is very effective in improving a supply chain,
and industry is now starting to adopt it. Amazon has patented ‘Anticipatory Shipping’, which
analyses previous orders and other factors such as customers’ shopping trends to
anticipate when and by whom a certain product will be bought, ships it in advance, and delivers
it immediately after the order has been placed (Kopalle, 2014). Another example is DHL, which is
implementing big data analytics to re-route its vehicles and re-define the delivery/picking sequence
to save significant time. DHL has also developed ‘MyWays’, a crowd-based platform
that assigns parcels to daily commuters, students and taxi drivers by their geo-location and usual
routes, which in turn improves the efficiency of last-mile delivery (Jeske, Grüner, & Weiß, 2013).
The most important factor hindering full utilization of big data is the lack of analytical
techniques and applications that can convert unstructured data from various sources
into business intelligence for the user (Sanders, 2014). This calls for more practical applications and
techniques that use big data analytics to improve decision making in supply
chain management. In response, this paper introduces a framework which utilizes social
media big data to update the demand forecast while also using information from the related product’s
sales. The proposed framework has direct implications for supply chain practitioners who are
keen to utilize customers’ opinions to improve their demand forecasting.
Social media is defined as “a conversational, distributed mode of content generation, dissemination,
and communication among communities” (Zeng et al., 2010, p. 13). Social media is an effective
sensor for receiving signals from potential customers. Social media data contains
emotions, opinions, and preferences, which makes it potentially useful as a market sensing platform.
However, because social media data is a qualitative, unstructured and subjective form of big data, it calls
for a different analytics approach from the traditional approach used for big data (Wong, Chan, & Lacka,
2017). Descriptive analytics, network analytics and content analytics have been identified as the three
major types of analytics which can be used to create value from social media data (Chae, 2015). As
this study is concerned with analysis of text on Twitter and Facebook, content analytics is
used. Three main dimensions of content analytics have been identified through which
social media data can create value for supply chain forecasting in the proposed framework:
sentiment analysis, word analysis and topic modelling.
Sentiment Analysis
Sentiment analysis is the analysis of people’s opinions, sentiments, evaluations, attitudes, judgments and emotions
towards tangible or intangible objects, issues or attributes, such as products, services, organizations, individuals,
events and topics (Liu, 2012). Twitter and Facebook are very tempting sources
for sentiment analysis due to the variety, velocity and volume (the 3Vs of big data) of the available content.
However, the informal style of posts and tweets, the limited length of tweets and the resulting use of special symbols
make it challenging to obtain high-quality results from analysis of these sources. Appraisal
theory (Scherer, 2005) describes a way to extract sentiment from text. Arnold and Plutchik (1964)
introduced the basic concept of the theory. The theory lays the basis for structured sentiment extraction
based on the appraisal expression, a basic grammatical unit by which an opinion is expressed.
Korenek and Šimko (2014) utilized appraisal theory to analyse microblogs using sentiment analysis
and to categorize sentiments as positive, negative and neutral. The proposed framework categorizes sentiments
using concepts from appraisal theory. Various organizations from
different sectors have used sentiment analysis for gathering information, predicting market response and
election results, product innovation, improving customer service, stock forecasting and supply
chain management, as shown in Table 1. Machine learning, lexicon-based, statistical and rule-based
approaches are the most widely used methods for sentiment analysis (Medhat et al., 2014), but n-gram
analysis and artificial neural networks have also been used (Ghiassi, Skinner, & Zimbra,
2013). Fan et al. (2017) used the Naïve Bayes (NB) algorithm for sentiment analysis of online reviews
for use in product forecasting. The NB algorithm is better suited to classification tasks in which the words in a
text are treated as independent of one another. Cui et al. (2017) used a Support Vector Machine (SVM) to classify text
from social media for event detection. In the proposed framework, both the NB and SVM algorithms are used; unlike
earlier work, they are applied to social media data from Twitter and Facebook and used in
conjunction with trend and word analysis results.
Topic Modelling
Social media sources provide a huge amount of information every day, and with proper tools
the trends in that information can be understood and turned into actionable insights. Topic
modelling is typically used to uncover industry data across a certain topic or domain (Kwak, Lee,
Park, & Moon, 2010), such as product demands, consumer insights, and service quality of an industry.
It can help business managers or decision makers to predict the future behaviours or trends of a
community based on a relevant set of data. Lansley and Longley (2016) demonstrate a way to use
Twitter information to analyse and present geographical trends using Latent Dirichlet Allocation
(LDA). Blei, Ng and Jordan (2003) describe LDA as an unsupervised model which is used to find
possible topics from collections of text.
Word Analysis
Word analysis of social media data encompasses term frequency analysis, word cloud formation and
clustering (Chae, 2015). Term frequency is used to identify key words and phrases in the dataset
using techniques such as n-grams. An n-gram is a sequence of ‘n’ adjacent words from the given
dataset, which captures the language structure from a statistical point of view. A word cloud is a visually
appealing way to get an overview of the text (Heimerl et al., 2014). Word analysis has been used
frequently in the literature for text summarization (Kuo, Hentrich, Good, & Wilkinson, 2007), opinion
mining (Wu et al., 2010), text visualization (Stasko, Görg, Liu, & Singhal, 2007), patent analysis
(Koch et al., 2011) and investigative analysis (Stasko et al., 2007). In the proposed framework, word
analysis is used to get an overview of the text retrieved for the selected keywords and to identify
related words to add to the search.
Table 1. Studies based on sentiment analysis
| Research Topic | Previous work with description |
| --- | --- |
| Stock Forecasting | Arias et al. (2013) and Bollen et al. (2011) have used social media analytics for stock forecasting using Twitter. |
| | Srivastava et al. (2016) and Zhang, Xu, & Xue (2017) used sentiment analysis and transaction data to predict market trends for stock market customers. |
| | Ren, Wu and Liu (2018) used SVM with sentiment analysis to predict market movements. |
| Brand Management | Ghiassi et al. (2013) have used sentiment analysis from Twitter data for brand management employing techniques such as n-gram analysis and artificial neural networks. |
| Election Results | Oliveira, Bermejo and dos Santos (2017) compared results from sentiment analysis on social media data to traditional opinion surveys and found it 1 to 8% more accurate for predicting election results. |
| | Giglietto (2012) used likes on Facebook pages to study the predictive power of Facebook to forecast Italian elections in 2011. |
| Product Innovation | KIA Motors and The Royal Bank of Canada have used sentiment analysis to innovate new products (Kite, 2011). |
| Supply Chain Management | Singh et al. (2017) presented a framework for improving supply chain management in the food industry using sentiment analysis. |
| | Swain and Cao (2017) explored the sharing of information by supply chain members on social media and, by using sentiment analysis, gauged its association with supply chain performance. |
| Box Office Forecasting | Asur and Huberman (2010) presented a study to use data from Twitter for box office forecasting using sentiment analysis. |
| Customer Service | Bank of America used sentiment analysis to recognize key issues facing their customers by collecting and analysing texts from different social media sources (Purcell, 2011). |
| | Malhotra et al. (2012) used sentiment analysis to implement improved marketing methods using Twitter. |
Social Media Analytics in Supply Chain
Getting accurate information from extremely noisy data such as social media data is a big challenge,
as is unifying all social media data and making sense of it; both hinder wide use of social
media analytics. Table 2 lists the major studies which have used social media big data in supply
chain management. In the last few years, there has been growing interest in extracting value from
social media data in supply chain management, as is evident from Table 2. However, there is still a lack of
accurate models for supply chain management which utilize social media data. One of the reasons is
that, with extremely noisy sources such as social media, getting the external causal factors right is a
big challenge. Making sense of all the causal data (particularly social media) poses a big question
for supply chain practitioners and software developers and requires further research (Syntetos et
al., 2016). The framework proposed in this paper tries to address this issue.
The authors have developed a framework for extracting maximum benefit from social media in terms
of product forecasting. Three main dimensions through which social media data can create value in
demand forecasting were identified from the literature and experimentation: sentiment
analysis, word analysis and topic modelling. The framework uses these dimensions to apply social
media analytics to demand forecasting. The framework consists of data collection and
preprocessing, sentiment extraction and building of the forecasting model, as shown in Figure 1.
Data is collected and preprocessed using the following methods in the given order.
Table 2. Use of social media analytics in supply chain
| Research Topic | Previous work with description | Used Feature |
| --- | --- | --- |
| Supply Chain Forecasting | Chong, Li, Ngai, Ch’ng, & Lee (2016) conducted a study using neural network and sentiment analysis to see effect of online user generated contents on product sales. | Three-layered neural network; Sentiment Analysis |
| | Choi (2016) analytically explored the impact of positive sentiment on social media on market demand of fashion | Word Analysis |
| | Beheshti-Kashi (2015) explored whether microblogging websites such as Twitter can be used for predicting fashion | Trend Analysis |
| | Boldt et al. (2016) tested utilization of Facebook data for predicting sales of Nike products and the effects of events on activity on Nike’s Facebook pages. | Event Study |
| Supply Chain | Chae (2015) developed a framework to study usefulness of Twitter information in supply chain management. | Descriptive Analytics; Content Analytics; Network Analytics |
| | Sianipar and Yudoko (2014) concluded in their work that social media integration with a supply chain can be helpful to improve collaboration among supply chains and to increase the agile response of a supply chain. | Content Analysis |
| | Singh et al. (2017) presented a framework for improving supply chain management in food industry using sentiment analysis. | Sentiment Analysis |
Keywords Identification
The first step is for the user to provide the initial keywords. These keywords are used to
harvest public data from Facebook and Twitter. N-gram analysis
is then applied.
Figure 1. Overview of the demand forecasting framework using social media big data
API Streaming
The next step is getting data from Twitter and Facebook. It starts with authenticating
against the Twitter and Facebook APIs and establishing a connection. After authentication, data can
be captured using platforms such as R and Python.
Data Cleaning
The extracted Twitter and Facebook data contains many details (tweets, posts, number of comments,
coordinates, embedded URLs, hashtags, retweet count, number of followers, username, location). This
data is transformed using data parsing, data cleansing and noise cancellation to retain only the data relevant
for analysis. SMDs (social media datasets) collected from Facebook and Twitter that
contain fewer than three words are discarded, as they do not represent the customer comments
in focus. SMDs from users with 2000 or more posts or tweets are also discarded. If a user is tweeting
or posting on the same subject with high frequency, those SMDs are also discarded to prevent bias, as
results which include them are skewed by the company’s marketing campaign. Beheshti-Kashi,
Karimi, Thoben, Lütjen, & Teucke (2015) had similar results in their study, in which they found that URLs
in such tweets and posts linked to eBay shops. In the final step of data cleansing, the collected data is
pre-processed: URL links,
symbols, punctuation and extra spaces are removed and letter case is normalized.
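The filtering and cleaning rules described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (the study used R). The record fields `user`, `text` and `user_post_count`, the function names and the regular expressions are assumptions introduced here. The rule on high-frequency posting about the same subject is left out because the text does not state a threshold.

```python
import re

# Assumed patterns: URLs, and anything that is not a lowercase letter, digit or space.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text):
    """Lower-case the text and strip URLs, symbols, punctuation and extra spaces."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_WORD_RE.sub(" ", text)  # note: this also drops non-ASCII letters
    return " ".join(text.split())


def filter_smds(smds, min_words=3, max_user_posts=2000):
    """Keep only SMDs that pass the length and user-activity rules.

    Each SMD is a dict with hypothetical keys "user", "text" and
    "user_post_count" (total posts or tweets made by that user).
    """
    kept = []
    for smd in smds:
        if smd["user_post_count"] >= max_user_posts:
            continue  # very prolific accounts are treated as likely marketing
        cleaned = clean_text(smd["text"])
        if len(cleaned.split()) < min_words:
            continue  # too short to represent a customer comment
        kept.append({**smd, "text": cleaned})
    return kept
```

Cleaning before counting words means that a post consisting only of a link and a hashtag is rejected by the three-word rule.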
Word Analysis
Word analysis of social media data encompasses term frequency analysis, word cloud formation and
clustering (Chae, 2015). Term frequency is used to identify key words and phrases in the dataset
using techniques such as n-grams. In the proposed framework, n-grams that occur with a frequency
above the chosen threshold are selected. This step identifies keywords for the products
using word analysis. The outcome is later compared with the quantitative result of the sentiment analysis,
obtained by rating the positive and negative words being used. A bounding-box approach that restricts the region
is used, which helps in extracting more useful data from the API (Singh et al., 2017). Specific
keywords and exact regions are used to ensure the accuracy of the data.
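The n-gram selection step can be illustrated with a short Python sketch. It assumes the text has already been cleaned and can be tokenized on whitespace. The function names and the `threshold` parameter are hypothetical, and the threshold value used in the study is not stated.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of adjacent n-word tuples in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(texts, n=2, threshold=1):
    """Count n-grams across all texts and keep those occurring more than `threshold` times."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.split(), n))
    return {gram: count for gram, count in counts.items() if count > threshold}
```

For example, `frequent_ngrams(["love my running shorts", "new running shorts today"], n=2, threshold=1)` keeps only the bigram `("running", "shorts")`, which could then be added to the search keywords.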
In the second major part of the framework, topic modelling is performed to group the
text extracted from Facebook and Twitter by product type, colour and brand.
Topic Modelling
LDA is used in the proposed framework to identify topics related to a product and then perform
sentiment analysis on the groups. It is described as an unsupervised model which is used to find possible
topics from text collections (Blei et al., 2003). LDA is applied using R and the library ‘topicmodels’.
Sentiment Analysis
Liu (2012) provides an English lexicon of about 6800 words, which has been amended and used for
sentiment analysis. The NB method (Yu et al., 2013) is used for polarity classification
with the aim of obtaining a sentiment index for each SMD. The three categories of sentiment are positive,
negative and neutral. The value of Wtk is calculated using the NB and SVM methods. R is the
software used in this study. NB is applied using the ‘e1071’ library in R and SVM using the ‘caret’ package
in R. The ‘caret’ package has built-in implementations of different machine learning algorithms, including
decision trees, K-Nearest Neighbours (KNN) and SVM. In this instance, the authors use only
SVM from the caret package.
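To make the polarity classification step concrete, the following Python sketch implements a small multinomial Naïve Bayes classifier with add-one smoothing from scratch. It is a stand-in for illustration only: the study applied NB through the ‘e1071’ library in R, and the class name, method names and any training sentences supplied by a caller are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on labelled SMDs (positive, negative, neutral), `predict` returns the category that would feed the per-SMD sentiment value Wtk.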
The sentiment index in time period t, $W_t$, is calculated by
$$W_t = \sum_{k=1}^{h} W_{tk}(c)$$
where the value of ‘c’ ranges from 1 to -1 depending on the category of $W_{tk}$,
i.e. the sentiment value of the SMD (positive,
negative, neutral), and h is the number of SMDs.
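A small Python sketch shows how a sentiment index for one time period could be computed from classified SMDs. The mapping of positive, neutral and negative to 1, 0 and -1 is an assumption consistent with the stated range of ‘c’. The optional averaging over the h SMDs is a hypothetical convenience and is not taken from the text.

```python
# Assumed numeric value c for each sentiment category.
CATEGORY_VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def sentiment_index(categories, normalise=False):
    """Aggregate per-SMD sentiment values for one time period.

    categories: list of category labels, one per SMD in the period.
    normalise:  if True, divide by the number of SMDs h (hypothetical option).
    """
    total = sum(CATEGORY_VALUE[c] for c in categories)
    if normalise and categories:
        return total / len(categories)
    return total
```

Averaging makes periods with different numbers of SMDs comparable, while the plain sum also reflects how much discussion there was.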
In this framework, the Bass Emotion Model (Fan et al., 2017) is extended to include sentiment analysis
results from SMDs collected in the first step. In the Bass model (Bass, 2004), potential buyers are
classified as innovators and imitators, and then the general form of the Bass model is as follows.
$$S(t) = m \, \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p} \, e^{-(p+q)t}}$$
where S(t) is the cumulative sales by the end of time period t. p refers to the coefficient of
innovation, q refers to the coefficient of imitation, and m refers to the total number of potential
adopters. m and p are calculated using historical sales data. q is related to the sentiment and can be
perceived as a function of the social media sentiment, $q = f(W_t)$. If positive
sentiment is obtained from the SMDs, social media users are talking positively about the product, which
gives a potential increase in q, and vice versa. The function is described as
$$q = \frac{q_0 \, q_m}{q_0 + (q_m - q_0) \, e^{-\gamma W_t}}$$
where q denotes the effect of word of mouth via social media, $q_0$ refers to the minimum of q and $q_m$
refers to the maximum of q. γ is a constant that represents the slope of the sales curve. γ is calculated
using historical product data.
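The way sentiment enters the forecast can be sketched in Python. The cumulative-sales function follows the standard closed form of the Bass model. The mapping from the sentiment index to q is written here as a logistic curve bounded by q0 and qm, which is an assumption about the functional form. The function names and any parameter values passed to them are hypothetical and are not the values estimated in the case study.

```python
import math


def imitation_coefficient(w_t, q0, qm, gamma):
    """Assumed logistic map from sentiment index w_t to the imitation coefficient q.

    Returns q0 when w_t is 0 and approaches qm as w_t grows.
    """
    return (q0 * qm) / (q0 + (qm - q0) * math.exp(-gamma * w_t))


def bass_cumulative_sales(t, m, p, q):
    """Cumulative sales S(t) of the Bass model for market potential m."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def forecast(periods, m, p, q0, qm, gamma, sentiment_by_period):
    """Cumulative sales per period, with q updated from that period's sentiment."""
    return [
        bass_cumulative_sales(t, m, p,
                              imitation_coefficient(sentiment_by_period[t - 1], q0, qm, gamma))
        for t in range(1, periods + 1)
    ]
```

With this form, more positive sentiment raises q and pulls adoption forward, while m and p stay fixed at the values fitted to historical sales.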
The study was conducted at an apparel retail company. The focal company’s business model is buying
and selling apparel products. Its suppliers are from different regions, including the Far East,
South Asia and Europe. Clothes are imported from these regions as well as bought from the local
market and then sold to more than 60 countries throughout the world. The complete supply chain is
very large, spanning four continents. The focal apparel retail company was chosen because of the importance
of customer-oriented content in the apparel industry and because of the company’s significant
presence on social media.
It is difficult to coordinate longer apparel supply chains, so very accurate demand forecasting becomes
really important (Syntetos et al., 2016). Traditional forecasting methods based on time
series data do not work particularly well in the apparel industry, as designs and items of one season are
typically replaced next season by new collections and trends, and companies therefore often face a
lack of historical sales data (Thomassey, 2010). Moreover, demand in the industry is significantly
influenced by additional factors such as the economic situation, events or changing weather conditions
(Thomassey, 2014). Many practitioners have been using univariate methods (Au et al., 2008) for supply
chain forecasting in the apparel industry, which utilize historical sales data and assume that the
underlying variation of data is constant. For instance, Wong and Guo (2010) utilized one-step-ahead
sales data to predict the sales of medium-priced fashion products in Mainland China. Au et al. (2008)
used previous time series data to predict the sales of T-shirts and jeans from several shops with the
use of neural networks. The sales of products in the apparel industry are volatile, often influenced by
changing trends, weather conditions and events. For forecasting purposes, it is therefore not valid
to assume that the trend of time series sales data is unchanged. To cope with this, researchers
integrate other influencing factors as inputs to forecasting models besides the historical time
series data, which is known as multivariate forecasting. Beheshti-Kashi (2015) has presented current
fashion forecasting approaches in industry and academia. The most successful techniques surveyed
were the extreme learning machine (Sun, Choi, Au, & Yu, 2008), evolutionary neural networks (ENN) (Au
et al., 2008; Wong & Guo, 2010), the Thomassey and Happiette fuzzy inference systems (Thomassey,
Happiette, & Castelain, 2005) and the hybrid intelligent sales forecasting model (Aburto & Weber, 2007).
Most of the forecasting models discussed above give reliable results for medium- and long-term
forecasting. However, in a very competitive market with a short selling span, accurate, customer-centric,
short-term forecasting is necessary. With the advent of information technology and affordable
information systems, most companies (big and small) have developed or implemented information
systems from which they get sales reports, graphs and even forecasts. With the advent of social
media data, this is no longer enough to be competitive. Data gathered by companies needs to be supplemented with the
information circulating on social media, which could deliver another type of insight for forecasting
and increase competitiveness, especially for a creative industry such as apparel,
where potential customers are involved in style design, colour preference, judging trends
and the scope for new products (Banica & Hagiu, 2016).
Short-term forecasting methods have not been explored as much (N. Liu, Ren, Choi, Hui, & Ng,
2013). Short-term forecasting is very important in the apparel industry because of ever-changing
trends and short selling times. For this purpose, Beheshti-Kashi (2015) suggested adding social media to
the discussion of fashion forecasting, and Syntetos et al. (2016) predicted that the future of supply chain
forecasting will include predictive analytics based on social media data. For an apparel supply chain,
multiple topics of interest may be discussed on social media. The authors try to
use these topics to make this data viable for supply chain forecasting
in the apparel industry using the proposed framework.
For the implementation of the framework, company sales data and social media data, i.e. Twitter and
Facebook data, were collected over a period of six weeks. Data collection
began in July 2016 and continued until August 15, 2016. Beheshti-Kashi (2015)
explored trends using Twitter and found it hard to present the findings in quantitative
form. To address this issue, the authors expanded the study by analysing specificities and increased
the amount of data collected by including both Facebook and Twitter, so that results could be presented
in quantitative form. The period of six weeks was chosen based on input from the user, which in
this case is the supply chain manager of the focal company. ‘Shorts’ were selected as the product to
be used for the study. Social media data from Twitter and Facebook was collected through their APIs,
and the related SMDs were analysed. Only those SMDs were selected which were
related to a brand, a product type or a fashion trend. Data was collected every 7 days, as Twitter
allowed collection of tweets that were up to 7-8 days old. SMDs were extracted for brands and products.
Hashtags and texts for the brands sold by the focal company were analysed. The total number of
tweets analysed was 1,208,650. For the product type category, shorts were chosen as they were the
best-selling item, since the data was collected in summer. SMDs were collected for different types
of shorts as shown in Table 3 and for different brands as shown in Table 4. When the brand data was
analysed, much of it turned out to be unrelated to the brands or products of the focal company.
One such example was #next being used for an election campaign in the United States. After extraction, the
text was used to form word clouds, which help in manual inspection of the gathered data,
as the viewer can get a general idea of the kind of words being used; this can later be used for
cross-checking the results obtained by sentiment analysis to make sure no anomaly has occurred
during the process. Word clouds were formed before and after processing and cleaning of the data to
manually inspect the dataset being used for sentiment extraction. Figure 2 displays a word cloud
for the keyword ‘nike’ before the data cleaning process. The noise in this dataset is evident, as there are words
from different languages and some completely unrelated words. Figure 3 displays the word cloud
after data cleaning, which removes all the unrelated SMDs.
For a period of 6 weeks, the SMDs were analysed and then compared with the sales for
that period as well as for the next 6 weeks. Table 5 shows the sentiment analysis score for different product
categories after application of SVM and calculation of the parameter q. Analysis of the sentiment scores
shows that the amount of sales was correlated with the sentiment around a particular brand or
colour. No correlation was found when sentiment analysis was done for the product type, which
could be attributed to noise in the data, as a single-word or single-product search was susceptible to
much more noise than a search using words for multiple characteristics. Multiple-characteristic searches
with positive sentiment led to an increase in sales and negative sentiment led to a decrease.
The tweets and Facebook comments for running shorts were analysed for sentiment
using the SVM and NB methods. A comparison of the results of these models is shown in Table 7.
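One simple way to quantify the relationship between sentiment and sales described above is a Pearson correlation coefficient between the two series. The text does not state which correlation measure was used, so the choice of Pearson's r and the function name below are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Here `xs` would hold the sentiment index per keyword group or period and `ys` the matching sales figures. A value near 1 indicates that sales rise with sentiment, and a value near 0 indicates no linear relationship.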
Figure 2. Word cloud for brand ‘Nike’
The results from sentiment analysis were then used in the Bass-Emotion model to predict sales.
The parameters m, p and γ for the Bass-Emotion model were calculated using historical sales data and q
was calculated using sentiment analysis of the SMDs. The calculated parameters are presented in Table 8.
All these parameters were calculated using R. Table 6 shows the forecasting accuracy of the proposed
emotion-enhanced model, which is a significant improvement on the forecasting accuracy of the original
Bass model. Figure 4 displays the values forecast by the proposed model compared with actual values.
Figure 3. Word cloud after data cleaning
This paper introduced a framework for using social media big data in the Bass Emotion model for
demand forecasting, based on the results of sentiment analysis of Facebook and Twitter data. Because
social media data is very noisy, it is difficult to make accurate predictions about products in general
from it; but if the products are broken down and a multiple-characteristics search is applied, the
collected information can be turned into a tool for demand forecasting and for market or trend
sensing. The major factor in extracting value from social media is to apply multiple data cleaning
techniques in conjunction with one another, so that the data subjected to later analysis gives reliable
results, as described in the framework presented in this paper. More than 1,200,000 tweets, posts and
comments from Facebook and Twitter were analysed in the case study. The study showed that social
media big data is extremely useful for the apparel industry and can be very effective if used to support
demand forecasting. With proper modelling and the right techniques, social media big data has the
potential to improve forecasting accuracy. The results of this study show a correlation between
customers' opinions on Facebook and Twitter and actual sales. The framework presented in this study
can be further verified and improved through additional case studies to make it a reliable mechanism
for using social media big data in demand forecasting.
As this is a relatively new research area, there is a considerable need to enhance our understanding
of social media data in supply chain contexts. One area that needs urgent work is developing detailed,
Table 3. Keywords used for SMDs extraction for ‘shorts’
Shorts#nike         Shorts#green            Shorts#swimming      zara#swimmingshorts
Shorts#adidas       Shorts#navy             Shorts#running       zara#runningshorts
Shorts#reebok       Shorts #jersey          nike#jerseyshorts    zarablack#jerseyshort
Shorts#next         Shorts #cargo           nike #cargoshorts    zarablack#cargoshorts
Shorts#blue         Shorts#jorts            nike #jorts          zarablack#jorts
Shorts#black        Shorts#fleece           nike #fleeceshorts   zarablack#fleeceshort
Shorts#grey         Shorts#gym              nike #gymshorts      zarablack#gymshorts
Shorts#swimming     nike#swimmingshort      Shorts#swimming      adidas#swimmingshor
Shorts#running      nike#runningshorts      Shorts#running       puma#runningshorts
nike#jerseyshorts   nikeblack#jerseyshor    adidas#jerseyshorts  nikeblack#jerseyshort
nike#cargoshorts    nextblack#cargoshor     adidas#cargoshorts   pumablack#cargoshts
nike #jorts         nike black#jorts        adidas #jorts        nike black#jorts
nike #fleeceshorts  nikeblack#fleeceshor    adidas#fleeceshorts  nikeblack#fleeceshort
adidasShorts#ru     nike#runningshorts      adidasShorts#runni   puma#runningshorts
next#jerseyshorts   nikeblack#jerseyshorts  adidas#jerseyshorts  pumablack#jerseyshorts
next #cargoshorts   nextblack#cargoshors    adidas#cargoshorts   pumblack#cargoshorts
next #jorts         nike black#jorts        adidas #jorts        puma black#jorts
next #fleeceshorts  nikeblack#fleeceshorts  adidas#fleeceshorts  pumablack#fleeceshorts
next #gymshorts     nikeblack#gymshorts     adidas #gymshorts    pumablack#gymshorts
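The multiple-characteristics keywords in Table 3 follow a regular pattern (brand, optional colour, then a hashtag with style and product), so a list of this kind can be generated rather than typed by hand. The sketch below is one possible way to do that in Python; the function name `build_keywords` and the short input lists are illustrative assumptions, not the exact lists used in the study.

```python
from itertools import product

# Illustrative inputs; the study's full lists are not restated in this section.
brands = ["nike", "adidas", "puma", "next", "zara"]
colours = ["black", "blue", "grey"]
styles = ["jersey", "cargo", "fleece", "gym", "running", "swimming"]


def build_keywords(brands, colours, styles, item="shorts"):
    """Return search strings combining brand, optional colour, and style."""
    keywords = []
    # Two characteristics: brand + style, e.g. "nike#jerseyshorts"
    for brand, style in product(brands, styles):
        keywords.append(f"{brand}#{style}{item}")
    # Three characteristics: brand + colour + style, e.g. "nikeblack#jerseyshorts"
    for brand, colour, style in product(brands, colours, styles):
        keywords.append(f"{brand}{colour}#{style}{item}")
    return keywords


keywords = build_keywords(brands, colours, styles)
print(len(keywords), keywords[:3])
```

Generating the combinations programmatically also avoids truncated or misspelled search strings, which would otherwise silently return no SMDs.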
Table 4. Number of Brands and Product Related SMDs for week 1
Brand # of SMDs Product Type # of SMDs
Zara 12,456 #jerseyshorts 651
Nike 29,435 #cargoshorts 543
Adidas 36,792 #jorts 189
NEXT 71,234 #gymshorts 984
BHS 61,281 #swimmingshorts 429
Puma 23,124 #runningshorts 183
Table 5. Product type with sentiment analysis score
Product Type    Sales   Number of SMDs   Sentiment Analysis Score
Nike Jersey     1120    651              0.23
Nike Cargo      2832    543              0.12
Nike Denim      563     189              0.70
Nike Fleece     212     84               0.34
Nike Gym        984     984              0.05
                1367    429              0.76
Adidas Jersey   983     156              0.64
Adidas Cargo    811     531              0.12
Adidas Denim    641     145              0.53
Adidas Fleece   1212    821              0.31
Adidas Gym      1944    547              0.43
Adidas          937     122              0.53
Table 6. Comparison of forecasted and actual values for Bass Model and proposed Emotion Enhanced Model
Forecasting week 1 2 3 4 5 6
Actual value 712.3409 817.6867 921.2260 843.5641 926.7657 923.9208
Forecasted value (Bass Model) 704.5435 810.4631 927.0904 841.5382 922.7238 918.6123
Forecasted value (Proposed Model) 708.6674 816.5294 923.1996 844.2350 926.8046 922.7927
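The accuracy gap between the two models in Table 6 can be quantified directly from the tabulated values. The Python sketch below does so; the error measure used in the study is not restated in this section, so mean absolute error (MAE) and mean absolute percentage error (MAPE) are assumed here as common choices.

```python
# Values copied from Table 6 (forecasting weeks 1 to 6).
actual = [712.3409, 817.6867, 921.2260, 843.5641, 926.7657, 923.9208]
bass = [704.5435, 810.4631, 927.0904, 841.5382, 922.7238, 918.6123]
proposed = [708.6674, 816.5294, 923.1996, 844.2350, 926.8046, 922.7927]


def mae(actual, forecast):
    """Mean absolute error, in the same units as the sales figures."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)


mae_bass, mae_proposed = mae(actual, bass), mae(actual, proposed)
mape_bass, mape_proposed = mape(actual, bass), mape(actual, proposed)

print(f"Bass model:     MAE={mae_bass:.2f}  MAPE={mape_bass:.2f}%")
print(f"Proposed model: MAE={mae_proposed:.2f}  MAPE={mape_proposed:.2f}%")
```

On these six weeks the MAE falls from roughly 5.4 for the Bass model to roughly 1.4 for the proposed model, and the proposed forecast is the closer of the two in every week.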
Table 7. Comparison of SVM and NB Methods
Product Brand   NB Accuracy   SVM Accuracy
Nike            67.21         69.24
Adidas          67.46         75.12
Puma            65.24         71.81
BHS             69.42         78.10
Next            63.41         63.51
Zara            75.87         75.11
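A short Python sketch can summarise Table 7 by averaging the accuracy of each classifier and listing the brands for which SVM scores higher than NB. Only the figures from the table are used; the variable names are illustrative.

```python
# Accuracy figures copied from Table 7: brand -> (NB, SVM).
accuracy = {
    "Nike": (67.21, 69.24),
    "Adidas": (67.46, 75.12),
    "Puma": (65.24, 71.81),
    "BHS": (69.42, 78.10),
    "Next": (63.41, 63.51),
    "Zara": (75.87, 75.11),
}

mean_nb = sum(nb for nb, _ in accuracy.values()) / len(accuracy)
mean_svm = sum(svm for _, svm in accuracy.values()) / len(accuracy)

# Brands where SVM is more accurate than NB, in table order.
svm_wins = [brand for brand, (nb, svm) in accuracy.items() if svm > nb]

print(f"mean NB accuracy:  {mean_nb:.2f}")
print(f"mean SVM accuracy: {mean_svm:.2f}")
print("SVM more accurate for:", ", ".join(svm_wins))
```

SVM is the more accurate method for five of the six brands, with Zara the only case where NB is marginally ahead.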
practical guidelines that can help companies design industry applications using Facebook,
Twitter and other social media platforms for diverse supply chain activities, including new product
development, stakeholder engagement, supply chain risk management, and market sensing. Further
research is needed on implementing this framework in other industries and on cloud-based systems.
Moreover, sentiment extraction could be improved by including other platforms such as YouTube,
Google Trends and Instagram. Sentiment analysis could also be applied to posted videos and pictures
instead of being limited to text. This could further improve the results, as it would take users of
other platforms into account and paint a more accurate picture of customers' sentiment.
Table 8. Parameters for the Bass model
Parameter Results
m 887.0306
p 0.023777
γ 0.170784
Figure 4. Results of Forecasting Model of Emotion Enhanced Model
Volume 28 • Issue 1 • January-March 2020
Aburto, L., & Weber, R. (2007). Improved supply chain management based on hybrid demand forecasts. Applied
Soft Computing. doi:10.1016/j.asoc.2005.06.001
Arias, M., Arratia, A., & Xuriguera, R. (2014). Forecasting with Twitter Data. ACM Transactions on Intelligent
Systems and Technology. doi:10.1145/2542182.2542190
Arnold, M. B., & Plutchik, R. (1964). The Emotions: Facts, Theories and a New Model. The American Journal
of Psychology. doi:10.2307/1421040
Asur, S., & Huberman, B. A. (2010). Predicting the Future with Social Media. Journal of Interactive Marketing.
Au, K. F., Choi, T. M., & Yu, Y. (2008). Fashion retail forecasting by evolutionary neural networks. International
Journal of Production Economics. doi:10.1016/j.ijpe.2007.06.013
Banica, L., & Hagiu, A. (2016). Using big data analytics to improve decision-making in apparel supply chains.
In Information Systems for the Fashion and Apparel Industry. doi:10.1016/B978-0-08-100571-2.00004-X
Bass, F. M. (2004). A New Product Growth for Model Consumer Durables. Management Science. doi:10.1287/
Beheshti-kashi, S. (2015). Twitter and Fashion Forecasting : An Exploration of Tweets regarding Trend
Identification for Fashion Forecasting. Academic Press.
Beheshti-Kashi, S., Karimi, H. R., Thoben, K.-D., Lütjen, M., & Teucke, M. (2015). A survey on retail sales
forecasting and prediction in fashion markets. Systems Science & Control Engineering: An Open Access Journal.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning
Research. 10.1162/jmlr.2003.3.4-5.993
Boldt, L. C., Vinayagamoorthy, V., Winder, F., Schnittger, M., Ekran, M., Mukkamala, R. R., & Vatrapu, R.
(2016). Forecasting Nike’s sales using Facebook data. In Proceedings - 2016 IEEE International Conference
on Big Data, Big Data 2016. IEEE. doi:10.1109/BigData.2016.7840881
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational
Science. doi:10.1016/j.jocs.2010.12.007
Chae, B. (2015). Insights from hashtag #supplychain and Twitter analytics: Considering Twitter and Twitter
data for supply chain practice and research. International Journal of Production Economics. doi:10.1016/j.
Chao, L. (2015). Big Data Brings Relief to Allergy Medicine Supply Chains - WSJ. Retrieved September 18,
2017, from
Choi, T.-M. (2016). Incorporating social media observations and bounded rationality into fashion quick response
supply chains in the big data era. 10.1016/j.tre.2016.11.006
Choi, T. M. (2018). Incorporating social media observations and bounded rationality into fashion quick response
supply chains in the big data era. Transportation Research Part E, Logistics and Transportation Review.
Chong, A. Y. L., Li, B., Ngai, E. W. T., Ch’ng, E., & Lee, F. (2016). Predicting online product sales via online
reviews, sentiments, and promotion strategies: A big data architecture and neural network approach. International
Journal of Operations & Production Management. doi:10.1108/JFM-03-2013-0017
Cui, W., Wang, P., Du, Y., Chen, X., Guo, D., Li, J., & Zhou, Y. (2017). An algorithm for event detection based
on social media data. Neurocomputing. doi:10.1016/j.neucom.2016.09.127
Dijcks, J.-P. (2013). Oracle : Big Data for the Enterprise. Academic Press.
Volume 28 • Issue 1 • January-March 2020
Fan, Z.-P., Che, Y.-J., & Chen, Z.-Y. (2017). Product sales forecasting using online reviews and historical sales
data: A method combining the Bass model and sentiment analysis. Journal of Business Research. doi:10.1016/j.
Feki, M., & Wamba, S. F. (2016). Big Data Analytics-enabled Supply Chain Transformation : A Literature
Review. 49th Hawaii International Conference on System Sciences, 1123–1132. doi:10.1109/
Fosso Wamba, S. (2017). Big data analytics and business process innovation. Business Process Management
Journal. doi:10.1108/BPMJ-02-2017-0046
Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using n-gram
analysis and dynamic artificial neural network. Expert Systems with Applications. doi:10.1016/j.eswa.2013.05.057
Guess, A. R. (2015). Only 0.5% of All Data is Currently Analyzed - DATAVERSITY. Retrieved September 4,
2017, from
Hahn, G. J., & Packowski, J. (2015). A perspective on applications of in-memory analytics in supply chain
management. Decision Support Systems, 76, 45–52. doi:10.1016/j.dss.2015.01.003
Heimerl, F., Lohmann, S., Lange, S., & Ertl, T. (2014). Word cloud explorer: Text analytics based on word
clouds. Proceedings of the Annual Hawaii International Conference on System Sciences, 1833–1842. doi:10.1109/
Jeske, M., Grüner, M., & Wei, B. F. (2013). Big data in logistics: A DHL perspective on how to move beyond
the hype. DHL Customer Solutions & Innovation.
Khalil Zadeh, N., Sepehri, M. M., & Farvaresh, H. (2014). Intelligent sales prediction for pharmaceutical distribution
companies: A data mining based approach. Mathematical Problems in Engineering. doi:10.1155/2014/420310
Kiron, D., & Shockley, R. (2015). Creating business value with analytics. MIT Sloan Management Review.
Koch, S., Bosch, H., Giereth, M., & Ertl, T. (2011). Iterative integration of visual insights during scalable patent
search and analysis. IEEE Transactions on Visualization and Computer Graphics. doi:10.1109/TVCG.2010.85
Kopalle, P. (2014). Why Amazon’s Anticipatory Shipping Is Pure Genius. Retrieved September 4, 2017,
Korenek, P., & Šimko, M. (2014). Sentiment analysis on microblog utilizing appraisal theory. World Wide Web
(Bussum). doi:10.1007/s11280-013-0247-z
Kuo, B. Y.-L., Hentrich, T., & Good, B. M., & Wilkinson, M. D. (2007). Tag clouds for summarizing
web search results. Proceedings of the 16th International Conference on World Wide Web - WWW ’07.
Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Network.
Lansley, G., & Longley, P. A. (2016). The geography of Twitter topics in London. Computers, Environment and
Urban Systems. doi:10.1016/j.compenvurbsys.2016.04.002
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers. doi:10.2200/
Liu, N., Ren, S., Choi, T. M., Hui, C. L., & Ng, S. F. (2013). Sales forecasting for fashion retailing service
industry: A review. Mathematical Problems in Engineering. doi:10.1155/2013/738675
Malhotra, A., Kubowicz, C., & See, A. (2012). How to Get Your Messages Retweeted. MIT Sloan Management
Marshall, P., Dockendorff, M., & Ibáñez, S. (2013). A forecasting system for movie attendance. Journal of
Business Research, 66(10), 1800–1806.
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain
Shams Engineering Journal. 10.1016/j.asej.2014.04.011
Volume 28 • Issue 1 • January-March 2020
Oliveira, D. J. S., Bermejo, P. H. de S., & dos Santos, P. A. (2017). Can social media reveal the preferences of
voters? A comparison between sentiment analysis and traditional opinion polls. Journal of Information Technology
& Politics. doi:10.1080/19331681.2016.1214094
Papanagnou, C. I., & Matthews-Amune, O. (2017). Coping with demand volatility in retail pharmacies with the
aid of big data exploration. Computers & Operations Research.
Ren, R., Wu, D. D., & Liu, T. (2018). Forecasting Stock Market Movement Direction Using Sentiment Analysis
and Support Vector Machine. IEEE Systems Journal.
Sanders, N. R. (2014). Big data driven supply chain management: A framework for implementing analytics and
turning information into intelligence. Pearson Education.
Schaer, O., Kourentzes, N., & Fildes, R. (2018). Demand forecasting with user-generated online information.
International Journal of Forecasting.
Scherer, K. R. (2005). Appraisal Theory. In Handbook of Cognition and Emotion.
Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., & Tufano, P. (2012). Analytics: The real-world use
of big data. IBM Global Business Services Saïd Business School at the University of Oxford.
Sianipar, C. P. M., & Yudoko, G. (2014). Social media: Toward an integrated human collaboration in supply-chain
management. WIT Transactions on Information and Communication Technologies. doi:10.2495/Intelsys130221
Singh, A., Shukla, N., & Mishra, N. (2017). Social media data analytics to improve supply chain management
in food industries. Transportation Research Part E: Logistics and Transportation Review.
Souza, G. C. (2014). Supply chain analytics. Business Horizons. doi:10.1016/j.bushor.2014.06.004
Stasko, J., Görg, C., Liu, Z., & Singhal, K. (2007). Jigsaw: Supporting investigative analysis through interactive
visualization. VAST IEEE Symposium on Visual Analytics Science and Technology 2007, Proceedings. https:// doi:10.1109/VAST.2007.4389006
Sun, Z.-L., Choi, T.-M., Au, K.-F., & Yu, Y. (2008). Sales forecasting using extreme learning machine with
applications in fashion retailing. Decision Support Systems. doi:10.1016/j.dss.2008.07.009
Swain, A. K., & Cao, R. Q. (2017). Using sentiment analysis to improve supply chain intelligence. Information
Systems Frontiers. doi:10.1007/s10796-017-9762-2
Syntetos, A. A., Babai, Z., Boylan, J. E., Kolassa, S., & Nikolopoulos, K. (2016). Supply chain forecasting: Theory,
practice, their gap and the future. European Journal of Operational Research. doi:10.1016/j.ejor.2015.11.010
Thomassey, S. (2010). Sales forecasts in clothing industry: The key success factor of the supply chain management.
International Journal of Production Economics. doi:10.1016/j.ijpe.2010.07.018
Thomassey, S. (2014). Sales Forecasting in Apparel and Fashion Industry. Intelligent Fashion Forecasting
Systems: Models and Applications. 10.1007/978-3-642-39869-8
Thomassey, S., Happiette, M., & Castelain, J. M. (2005). A global forecasting support system adapted to textile
distribution. International Journal of Production Economics. doi:10.1016/j.ijpe.2004.03.001
Waller, M. A., & Fawcett, S. E. (2013). Data Science, Predictive Analytics, and Big Data: A Revolution That Will
Transform Supply Chain Design and Management. Journal of Business Logistics, 34(2), 77–84. doi:10.1111/
Wang, G., Gunasekaran, A., Ngai, E. W. T., & Papadopoulos, T. (2016). Big data analytics in logistics and supply
chain management: Certain investigations for research and applications. International Journal of Production
Economics. doi:10.1016/j.ijpe.2016.03.014
Wong, T. C., Chan, H. K., & Lacka, E. (2017). An ANN-based approach of interpreting user-generated comments
from social media. Applied Soft Computing. doi:10.1016/j.asoc.2016.09.011
Volume 28 • Issue 1 • January-March 2020
Rehan Iftikhar is a Marie-Curie Research Fellow and a 2nd year PhD student at School of Business, Maynooth
University. He holds a Master’s degree in Engineering Management from University of Exeter. His current research
interests include digital retail, information systems and big data. His work has appeared in various journals and
conference proceedings including Journal of Global Information Management, British Food Journal, Academy of
Management Global Proceedings and International Conference on Information Systems Development. Rehan is
the corresponding author and can be contacted at:
Mohammad Saud Khan, PhD, is a Senior Lecturer in the area of Strategic Innovation and Entrepreneurship at
Victoria University of Wellington, New Zealand. Before taking up this role, he was positioned as a Postdoctoral
Researcher at the University of Southern Denmark. Having a background in Mechatronics (Robotics & Automation)
Engineering, he has worked as a field engineer in the oil and gas industry with Schlumberger Oilfield Services in
Bahrain, Saudi Arabia, and the United Kingdom. His current research interests include innovation management
(especially the implications of big data and 3D printing), technology, and social media entrepreneurship.
Wong, W. K., & Guo, Z. X. (2010). A hybrid intelligent model for medium-term sales forecasting in fashion
retail supply chains using extreme learning machine and harmony search algorithm. International Journal of
Production Economics. doi:10.1016/j.ijpe.2010.07.008
Wu, Y., Wei, F., Liu, S., Au, N., Cui, W., Zhou, H., & Qu, H. (2010). OpinionSeer: Interactive visualization of hotel
customer feedback. IEEE Transactions on Visualization and Computer Graphics. doi:10.1109/TVCG.2010.183
Yu, Y., Duan, W., & Cao, Q. (2013). The impact of social and conventional media on firm equity value: A
sentiment analysis approach. Decision Support Systems. doi:10.1016/j.dss.2012.12.028
Zeng, D., Chen, H. C. H., Lusch, R., & Li, S.-H. (2010). Social Media Analytics and Intelligence. IEEE
Intelligent Systems.
Zhang, G., Xu, L., & Xue, Y. (2017). Model and forecast stock market behavior integrating investor sentiment
analysis and transaction data. Cluster Computing. doi:10.1007/s10586-017-0803-x
Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming
data. McGraw-Hill Osborne Media.
... Like data science, big data and data analytics attracted the attention of many people, especially in the world of practitioners (Iftikhar & Khan, 2020;Bag et al., 2022;Jayawardena et al., 2022;Singh et al., 2022). In August 2014, two researchers performed a search on Google using phrases "big data," "analytics," and "data science." ...
... The use of social media or social networks by business organizations for the purpose of information dissemination has been widely studied for many years. Many studied the positive impact of social media (Vakeel & Panigrahi, 2016;Iftikhar & Khan, 2020;Sohaib, 2021;Li, et al., 2022), while some investigated its potential drawback (Le, 2019;Sun et al., 2022). ...
Full-text available
As the data-analysts job market grows, many colleges and universities have started offering a data analytics curriculum. However, there are potential gaps between the skills business organizations expect of data analysts and the skills universities and colleges teach their students. This study collected 2500+ data-analyst job ads posted on LinkedIn and analyzed them using distribution analysis, cross-tabulation analysis, and cluster analysis. Among many findings, this study identified five most essential nontechnical skills and five most essentials areas of technical skills. In addition, of 90+ computer programs business organizations expect data analysts to use, this study identified SQL, Microsoft Excel, Tableau, Python, and Microsoft Power BI to be the five most essential computer programs for potential data analysts to master.
... Closer to the topic of the current paper, the exploration of general beliefs, attitudes and emotions about global warming, known as sentiment analysis or opinion mining, has become one of the leading research branches benefiting from the development of big data analytics tools (Iftikhar & Khan, 2020;Kirelli & Arslankaya, 2020;Qiao & Williams, 2022). Since the detection of polarity is the primary aim of sentiment analysis, part of the literature focuses on the polarity of Twitter data pertaining to global warming, including Dahal et al. (2019) who investigated 390,016 tweets about global warming using sentiment analysis and found that the overall discussion of global warming had negative polarity. ...
Public opinion surveys over the past 30 years show that public opinion is split on the issue of global warming. One of the problems with “solicited” opinion polls is that the findings may be selectively interpreted in favour of the political goals of a particular interest group. To gain a better understanding of the general public’s unsolicited responses to climate change news, the current study examined Twitter messages containing the words “global warming” spanning 16 months. Using a framework combining a sentiment analysis technique, Hedonometer from the perspective of natural language processing and appraisal theory from a discourse analysis perspective, the study shows that the demonstrated happiness level in tweets containing the words “global warming” is consistently lower than the general level on Twitter due to increased use of negative words and decreased use of positive words. The appraisal analysis shows that “Appreciation” is used most frequently and “Affect” least.
... As a result, social data can contain structured, unstructured or semi-unstructured data having text messages, pictures, and videos, and revealing many information of users about locations, opinion, preferences, sentiments, characters, specific features that could be used for many kinds of research and developments. 42 ...
Full-text available
Lifestyles of individuals have changed drastically in the last two decades with the impact of social media platforms which transforms individuals from being users into an asset of social media. The assets now become very precious and seriously attract who can generate useful or harmful values. In this context, studies conducted in the last 5 years are analyzed based on the methodology covering implementation areas, data sources, data size, methods and tools. The studies were classified and summarized under nine main “research fields,” and a “purpose‐based” classification under three main purposes was investigated. The results have shown that even if data obtained from social media platforms are often preferred in the studies, issues such as compliance with legal regulations, data processing, confidentiality and privacy of data also bring difficulties; collection and processing of social big data are a serious obstacle to the realization of many studies; not enough data sources provided by public or private enterprises; most of the studies carried out on text data, and the rest focused on location and image data; mostly machine learning methods are preferred in applications. This study differs from previous literature reviews by revealing comprehensively how social big data can be transformed into practice with a holistic perspective.
... Since 1949, the country has gone from decline to prosperity and blazed a path of fast and steady development. The path is proved to be in accordance with China's economic and social characteristics during every development stage (Halawani et al., 2020;Iftikhar & Khan, 2020;Rialp-Criado et al., 2020;Vatanasakdakul et al., 2020;. China's Ministry of Education's annual statistical data bulletin shows that the total number of postgraduate education institutions reached 0.8 thousand in 2018 for the first time. ...
Full-text available
Using four types of publicly available datasets and ArcGIS software, the authors identify the spatial characteristics of postgraduate education in China at three scales: comprehensive economic zone, provincial, and city. They also employ geographically weighted regression and ordinary least squares to study the factors influencing the spatial pattern of postgraduate education in Gin at the city scale. The findings show that the number of postgraduate education institutions increases as the longitude of a city increases, but the number decreases from coast to inland. Second, postgraduate education institutions tend to group together in provincial capitals and megacities. Finally, GDP, per capita GDP, population size, local income, and total retail sales of consumer goods significantly impact postgraduate education development. The study contributes to the literature and provides insights for practitioners in promoting urban planning and infrastructure development.
Examining the particular value of each platform for big data would be difficult because of the variety of social media forms and sizes. Using social media to objectively and subjectively analyze large groups of individuals makes it the most effective tool for this task. There are numerous sources of big data within the organization. Social media can be identified by the interaction and communication it facilitates. Utilizing social media has become a daily occurrence in modern society. In addition, this frequent use generates data demonstrating the importance of researching the relationship between big data and social media. It is because so many internet users are also active on social media. We conducted a systematic literature review (SLR) to identify 42 articles published between 2018 and 2022 that examined the significance of big data in social media and upcoming issues in this field. We also discuss the potential benefits of utilizing big data in social media. Our analysis discovered open problems and future challenges, such as high‐quality data, information accessibility, speed, natural language processing (NLP), and enhancing prediction approaches. As proven by our investigations of evaluation metrics for big data in social media, the distribution reveals that 24% is related to data‐trace, 12% is related to execution time, 21% to accuracy, 6% to cost, 10% to recall, 11% to precision, 11% to F1‐score, and 5% run time complexity.
Full-text available
The prevalence of social media has inspired multiple researchers to investigate the value of information on various platforms. However, most studies focus on integrating individual views (the wisdom of the crowd), and few studies investigate just one person's effect. To close this gap, this article investigates the impact of Trump's tweets on stock markets. Based on intraday stock market data, this study uses an event study to test the immediate reaction of the stock markets in both the Chinese and U.S. markets. Next, with ordinary least squares (OLS) regression, this study testes the effect of tweets' content features on the returns and volatility of the Chinese and U.S. indices. The results show that Trump's tweets impacted the financial market, especially the returns of the U.S. stock market during the COVID-19 pandemic. With additional analyses based on industry indices and time frequencies, the researchers found that Trump's sentiment on Twitter affected the Chinese financial industry during the trade war and impacted the Chinese pharmaceutical industry during the pandemic.
Management decision-making is increasingly supported by new data types and advanced predictive analytics tools. Prior research suggests that the inclusion of new data types – such as social media data – in forecasting models can improve forecasting. We explore whether managers’ operational decisions differ depending on the data type used by a predictive analytics tool and the consistency of the trend with prior developments. Experimental results show that the extent to which managers use predictions from analytics tools is a joint function of the data type utilized and trend consistency. If a trend predicted by an analytics tool reveals a downward break from prior positive developments (i.e., an unexpected negative trend), managers utilize predictions less if they are mainly based on social media data rather than on traditional accounting data. If a trend predicted by an analytics tool continues a prior positive trend, we do not find such a difference. In supplemental analyses, we explore managers’ comfort level and related attitude concerning the data types and find that only in the trend-breaking condition mediation effects are observed. Together, our findings have important implications for the management accounting function that needs to embed knowledge about managers’ information utilization to facilitate decision-making.
Today, the advent of social media has provided a platform for expressing opinions regarding legislation and public schemes. One such burning legislation introduced in India is the Citizenship Amendment Act (CAA) and its impact on the National Citizenship Register (NRC) and, subsequently, on the National Population Register (NPR). This study examines and determines the opinions expressed on social media regarding the act through a Twitter analysis approach that extracts nearly 18,000 tweets during 10 days of introducing the scheme. The analysis revealed that the opinion was neutral but tended to a more negative reaction. Consequently, recommendations on improving public perception about the scheme by suitable for interpreting the Act to the public are provided in the paper.
Full-text available
Recently, there has been substantial research on augmenting aggregate forecasts with individual consumer data from internet platforms, such as search traffic or social network shares. Although the majority of studies report increased accuracy, many exhibit design weaknesses including lack of adequate benchmarks or rigorous evaluation. Furthermore, their usefulness over the product life-cycle has not been investigated, which may change, as initially, consumers may search for pre-purchase information, but later for after-sales support. In this study, we first review the relevant literature and then attempt to support the key findings using two forecasting case studies. Our findings are in stark contrast to the literature, and we find that established univariate forecasting benchmarks, such as exponential smoothing, consistently perform better than when online information is included. Our research underlines the need for thorough forecast evaluation and argues that online platform data may be of limited use for supporting operational decisions.
Full-text available
Analysis of comments and opinions expressed in social media can be used to gather additional intelligence via market research information to better predict consumer behavior. The area of “opinion mining”, particularly sentiment analysis, aims to find, extract, and systematically analyze people’s opinions, attitudes and emotions towards certain topics. Performance of a supply chain is closely associated with the level of trust, collaboration, and information sharing among its members. In this paper, using textual “sentiment analysis”, we explore the relationship between elements of social media content generated by supply chain members and performance of supply chain. In particular, we identify specific elements of member generated supply chain related content on social media such as: information sharing, collaboration, trust, and commitment to determine their association with supply chain performance. We find information sharing and collaboration to be positively associated with supply chain performance, and these findings are consistent with previous reports in supply chain literature. In addition, ours is one of the first attempts to use sentiment analysis to analyze social media content in a supply chain context. The findings indicate that supply chain members value the sharing of relevant information and collaborative contents on social media as such efforts improve individual and overall supply chain performance. The results of this study should prove useful to other studies that utilize social media in a supply chain context, and to improve supply chain management strategies.
Social network media analytics is showing promise for prediction of financial markets. However, the true value of such data is unclear due to a lack of consensus on which instruments can be predicted. In this paper, we investigate whether measurements of collective emotional states derived from large-scale network feeds are correlated with stock transaction data over time. The information space corresponding to stocks is divided into the network public opinion space (Opinion_Space) and the realistic transaction space (Behavior_Space). We then process the information and generate a multidimensional time series from each space. Furthermore, Granger causality analysis and information theory measures are used to find and demonstrate that social media sentiments contain statistically significant ex-ante information on future prices. Finally, we propose our separate-LSTM model, and the experimental results on six randomly selected stocks indicate that financial data predictions can be significantly improved through our model by the fusion of network public opinion emotions and realistic transaction data.
Conference Paper
This paper tests whether accurate sales forecasts for Nike are possible from Facebook data and how events related to Nike affect the activity on Nike's Facebook pages. The paper draws from the AIDA sales framework (Awareness, Interest, Desire, and Action) from the domain of marketing and employs the method of social set analysis from the domain of computational social science to model sales from Big Social Data. The dataset consists of (a) a selection of Nike's Facebook pages with the number of likes, comments, posts etc. that have been registered for each page per day and (b) business data in terms of quarterly global sales figures published in Nike's financial reports. An event study is also conducted using the Social Set Visualizer (SoSeVi). The findings suggest that Facebook data does have informational value. Some of the simple regression models have a high forecasting accuracy. The multiple regressions have a lower forecasting accuracy and cause analysis barriers due to data set characteristics such as perfect multicollinearity. The event study found abnormal activity around several Nike-specific events, but whether those activity spikes are purely event-related or coincidental can only be determined after detailed case-by-case text analysis. Our findings help assess the informational value of Big Social Data for a company's marketing strategy, sales operations and supply chain.
Investor sentiment plays an important role in the stock market. User-generated textual content on the Internet provides a valuable source that reflects investor psychology and helps predict stock prices as a complement to stock market data. This paper integrates sentiment analysis into a machine learning method based on support vector machine. Furthermore, we take the day-of-week effect into consideration and construct more reliable and realistic sentiment indexes. Empirical results illustrate that the accuracy of forecasting the movement direction of the SSE 50 Index can be as high as 89.93% with a rise of 18.6% after introducing sentiment variables. In addition, our model helps investors make wiser decisions. These findings also imply that sentiment probably contains valuable information about asset fundamental values and can be regarded as one of the leading indicators of the stock market.
Data management tools and analytics have provided managers with the opportunity to contemplate inventory performance as an ongoing activity by no longer examining only data agglomerated from ERP systems, but also, considering internet information derived from customers' online buying behaviour. The realisation of this complex relationship has increased interest in business intelligence through data and text mining of structured, semi-structured and unstructured data, commonly referred to as "big data" to uncover underlying patterns which might explain customer behaviour and improve the response to demand volatility. This paper explores how sales structured data can be used in conjunction with non-structured customer data to improve inventory management either in terms of forecasting or treating some inventory as "top-selling" based on specific customer tendency to acquire more information through the internet. A medical condition is considered - namely pain - by examining 129 weeks of sales data regarding analgesics and information seeking data by customers through Google, online newspapers and YouTube. In order to facilitate our study we consider a VARX model with non-structured data as exogenous to obtain the best estimation and we perform tests against several univariate models in terms of best fit performance and forecasting.
This paper proposes a big-data analytics-based approach that considers social media (Twitter) data for the identification of supply chain management issues in food industries. In particular, the proposed approach includes text analysis using a support vector machine (SVM) and hierarchical clustering with multiscale bootstrap resampling. The result of this approach included a cluster of words which could inform supply-chain (SC) decision makers about customer feedback and issues in the flow/quality of food products. A case study in the beef supply chain was analysed using the proposed approach, where three weeks of data from Twitter were used.
Online reviews provide consumers with rich information that may reduce their uncertainty regarding purchases. As such, these reviews have a significant influence on product sales. In this paper, a novel method that combines the Bass/Norton model and sentiment analysis while using historical sales data and online review data is developed for product sales forecasting. A sentiment analysis method, the Naive Bayes algorithm, is used to extract the sentiment index from the content of each online review and integrate it into the imitation coefficient of the Bass/Norton model to improve the forecasting accuracy. We collected real-world automotive industry data and related online reviews. The computational results indicate that the combination of the Bass/Norton model and sentiment analysis has higher forecasting accuracy than the standard Bass/Norton model and some other sales forecasting models.
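The idea of folding a sentiment index into the imitation coefficient can be illustrated with a basic discrete Bass diffusion model. This is a sketch only: the abstract does not state how the index enters the coefficient, so the multiplicative form `q * (1 + alpha * s)`, the parameter `alpha`, and the assumption that sentiment lies in [-1, 1] are all hypothetical, and the Norton extension for successive product generations is omitted.

```python
from typing import List, Sequence


def bass_forecast(m: float, p: float, q: float,
                  sentiment: Sequence[float], alpha: float = 0.0) -> List[float]:
    """Per-period adoptions from a discrete Bass model.

    m: market potential, p: innovation coefficient, q: imitation coefficient.
    sentiment[t] is assumed to lie in [-1, 1]; alpha scales its effect on q.
    The adjustment form is an illustrative assumption, not the paper's method.
    """
    cumulative = 0.0
    sales = []
    for s in sentiment:
        q_t = q * (1.0 + alpha * s)  # hypothetical sentiment-adjusted imitation
        adopters = (p + q_t * cumulative / m) * (m - cumulative)
        sales.append(adopters)
        cumulative += adopters
    return sales


# Neutral sentiment reproduces the standard Bass curve
baseline = bass_forecast(m=1000, p=0.03, q=0.4, sentiment=[0.0, 0.0, 0.0])

# Positive sentiment raises imitation-driven adoption in later periods
upbeat = bass_forecast(m=1000, p=0.03, q=0.4,
                       sentiment=[0.5, 0.5, 0.5], alpha=0.5)
```

With no prior adopters the first period depends only on `p`, so the two runs agree there and diverge from the second period onward.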
Online social network applications such as Twitter and Weibo have played an important role in people’s lives. Tweets contain a tremendous amount of information. However, mining the tweets to obtain valuable information is a difficult problem. In this paper, we design the whole process for extracting data from Weibo and develop an algorithm for foodborne disease event detection. The detected foodborne disease information is then utilized to assist restaurant recommendation. The experimental results show the effectiveness and efficiency of our method.