Application of Artificial Neural Network
in Social Media Data Analysis: A Case
of Lodging Business in Philadelphia
Thai Le, Phillip Pardo and William Claster
Abstract Artificial Neural Networks (ANNs) are an area of extensive research
and have been shown to have utility in a wide range of applications. In this
chapter, we demonstrate practical applications of ANNs in analyzing social media
data in order to gain insight into competitive analysis in the field of tourism. We
use an ANN architecture to create a Self-Organizing Map (SOM) that clusters the
textual conversational topics shared through thousands of management tweets of
more than ten upper class hotels in Philadelphia. By doing so, we are able not only
to picture the overall strategies practiced by those hotels, but also to show the
differences in how they approach online media, through lucid and informative
presentations. We also carry out predictive analysis to forecast the occupancy rate
of the luxury and upper upscale groups of hotels in Philadelphia by implementing
neural network based time series analysis with Twitter and Google Trends data as
overlay data. As a result, hotel managers can take into account which events in the
life of the city will have the deepest impact. In short, with the use of ANN and other
complementary tools, it becomes possible for hotel and tourism managers to monitor
the real-time flow of social media data in order to conduct competitive analysis over
very short timeframes.
Keywords Artificial neural networks (ANNs) · Hospitality · Social media analysis ·
Kohonen · Forecasting · Competitive analysis · Lodging · Hotel occupancy
T. Le (✉) · P. Pardo · W. Claster
Ritsumeikan Asia Pacific University, Jumonji Baru 1-1, Beppu, Oita, Japan
e-mail: le.thai.jp@ieee.org
P. Pardo
e-mail: pardorit@apu.ac.jp
W. Claster
e-mail: wclaster@apu.ac.jp
©Springer International Publishing Switzerland 2016
S. Shanmuganathan and S. Samarasinghe (eds.), Artificial Neural
Network Modelling, Studies in Computational Intelligence 628,
DOI 10.1007/978-3-319-28495-8_16
1 Introduction
Starwood Hotels and Resorts was one of the very first hotel companies to realize the
critical role of social media data, and to leverage the information to support customers
in their travel decisions [1]. Gradually, not only have more and more tourism businesses
become actively involved in online activities on social network channels such as
Twitter and Facebook, but many of them also consider social media data a valuable
and timely source of input for various decision making processes [2]. Yet despite the
prevalent adoption of social media in the tourism industry (e.g. [3–7]), there is still a
lack of comprehensive guidelines on how online social data can be interpreted to gain
competitive knowledge in the hospitality industry. In this chapter, we introduce
approaches based on an unsupervised ANN, namely the self-organizing map (SOM),
to analyze Twitter and Google Trends data of two different groups of hotels, namely
luxury and upper upscale, in Philadelphia during the period between 2011 and 2014.
First, we look at how the related data is collected and pre-processed. Then, the chapter
discusses the implementation of SOMs, which are trained by an ANN, in analyzing the
textual contents of the hotels’ management tweets. Finally, an application of ANN to
predicting the occupancy rate of the two groups of hotels with different overlay data
is examined.
2 Data Collection and Pre-Processing
Social media data is scattered throughout the Internet in many forms, but most of it
takes the form of micro-blogs, which are found on various online social networks such
as Facebook and Twitter. For this reason, data in this research was mainly collected
from two sources: tweet data from Twitter and search query data from Google Trends,
both of which have been employed effectively in related studies (e.g. [8–12]). In
addition, data on Philadelphia hotels’ average occupancy rate between January 2008
and May 2014 was provided by Smith Travel Research Inc. (STR). Regarding the
Twitter data, we collected thousands of tweets posted by the public Twitter accounts
of eight different hotels in Philadelphia, which we categorized into two groups: luxury
and upper upscale. The Google Trends data is the normalized worldwide query volume
of keywords, which in this case were the names of the examined hotels as listed in
Table 1.
After collecting all the data, we proceed to pre-processing, in which the data is
cleaned to ensure a sound subsequent analysis. In particular, all duplicated tweets,
English stop-words (the, a, an, etc.), numeric figures, references to Philadelphia itself
(PA, Philly, etc.), and hyperlinks are filtered out of the dataset. For the AKA
Rittenhouse hotel, only a corporate Twitter account was found, so only tweets
concerning the Philadelphia location were selected for this research.
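The cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chapter does not name its tooling, the stop-word and city-term lists below are small hypothetical subsets, and the function names `clean_tweet` and `preprocess` are ours.

```python
import re

# Illustrative subsets only; the chapter does not list its full vocabularies.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "for", "on", "is"}
CITY_TERMS = {"pa", "philly", "philadelphia"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TOKEN_PATTERN = re.compile(r"[#@]?\w+")  # keeps hashtags and mentions intact


def clean_tweet(text):
    """Return the tokens kept from one tweet after filtering."""
    text = URL_PATTERN.sub(" ", text.lower())        # drop hyperlinks
    tokens = TOKEN_PATTERN.findall(text)
    return [
        t for t in tokens
        if not t.isdigit()                           # drop numeric figures
        and t not in STOP_WORDS                      # drop English stop-words
        and t not in CITY_TERMS                      # drop Philadelphia references
    ]


def preprocess(tweets):
    """Drop exact duplicate tweets, then clean each remaining tweet."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        key = tweet.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(clean_tweet(tweet))
    return cleaned
```

Duplicates are detected here by exact match after lower-casing; near-duplicates such as retweets with added text would need a looser rule.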
3 Neural Network Facilitated Self-Organized Map
Since we postulate that the tweets’ contents represent the marketing strategies used
in approaching customers through online channels, understanding the relationships
among the keywords used in the tweets can help us to gain viable competitive
knowledge from the hotel managers’ perspective. However, since each keyword
belongs not to one but to several documents, or tweets, the collected tweets can be
considered a huge sparse matrix of several-thousand-dimensional vectors. Hence, an
algorithmic approach enabling unsupervised clustering is needed to transform this
matrix into meaningful visual expositions. In this section, we share an application of
an unsupervised ANN that is able to cluster multi-dimensional data into an
informative two-dimensional map. This map, the Self-Organizing Map (SOM)
proposed by Kohonen [13], is recognized as a very effective tool for data clustering
facilitated by an unsupervised ANN learning algorithm [14–17].
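To make the idea concrete, the following is a minimal NumPy sketch of a Kohonen-style SOM applied to a small binary term-by-tweet matrix. It is not the software used to produce the maps in this chapter: the grid size, the linear decay schedules, the Gaussian neighbourhood, and all function names are assumptions chosen for brevity.

```python
import numpy as np


def term_document_matrix(token_lists):
    """Binary matrix with one row per term and one column per tweet."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(token_lists)))
    for j, tokens in enumerate(token_lists):
        for t in tokens:
            matrix[index[t], j] = 1.0
    return vocab, matrix


def best_matching_unit(weights, x):
    """Grid position (row, col) of the neuron whose weights are closest to x."""
    dist2 = np.sum((weights - x) ** 2, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist2), dist2.shape))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, used for neighbourhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbours towards the input.
            weights += lr * influence[..., None] * (x - weights)
            step += 1
    return weights


# Made-up token lists standing in for cleaned tweets.
tweets = [["beer", "festival"], ["charity", "foodtruck"], ["beer", "charity"]]
vocab, matrix = term_document_matrix(tweets)
som = train_som(matrix, rows=3, cols=3, epochs=20)
for term, vector in zip(vocab, matrix):
    print(term, best_matching_unit(som, vector))
```

Terms that co-occur in the same tweets have similar row vectors and therefore tend to land on the same or neighbouring grid cells, which is what gives position on the map its meaning.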
To illustrate the use of the SOM in the analysis of social media data in the field of
tourism and hospitality, we present the results of such a neural network training
process on the management tweets of the Four Seasons Philadelphia and Sofitel
Philadelphia hotels, which belong to two different hotel ranks, luxury and upper
upscale respectively, starting with Fig. 1.

Table 1 Input query strings used to collect Google Trends data, and tweet collection results

| Hotel names | Input query string on Google Trends | Number of retrieved user’s tweets |
|---|---|---|
| Hyatt at the Bellevue | Hyatt at The Bellevue | 352/353 (99.7 %) |
| Windsor Suites | Windsor suites philadelphia | 3035/3037 (99.9 %) |
| Le Meridien Philadelphia | lemeridienphiladelphia | 694/694 (100 %) |
| Kimpton Hotel Palomar Philadelphia | hotelpalomarphiladelphia + palomarphiladelphia | 2006/2011 (99.7 %) |
| Sofitel Philadelphia | sofitelphiladelphia | 3151/6868 (46 %) |
| Four Seasons Hotel Philadelphia | four seasons philadelphia | 3149/9081 (35 %) |
| The Latham Hotel | thelatham hotel + the lathamphiladelphia | 40/40 (100 %) |
| AKA Rittenhouse Hotel | Aka_Rittenhouse philadelphia | 2924/2929 (99.8 %) |

As Fig. 1 shows, the generated SOM contains a total of 1029 nodes representing 1029
neurons, classifying over 9716 terms from 3149 management tweets posted by the
Four Seasons hotel into 20 clusters. Each cluster is drawn in a different color and has
an algorithmically generated central concept expressed as the cluster name. In the
same way that color on a world map only serves as a visual demarcation between
countries, the color in these visualizations does not express any qualitative or
quantitative attributes of the data. However, the position, as well as the size, of each
cluster does have meaning.
To illustrate, the most prevalent cluster, “check out”, located at the center of the map
and containing terms such as “culture”, “events”, and “local”, shows a marketing effort
that suggests various activities occurring locally in Philadelphia. Likewise, the
clusters “#fsfoodtruck” and “charity”, which adjoin each other, doubtless describe an
effort to market the corporation’s food-truck campaign around September 2014 to
raise donations for the Children’s Hospital in Philadelphia. Above that is the “beer”
cluster, which probably points to a beer festival that occurred in the city around that
time. A portion of the map is covered by the “new year” concept, which is located
right next to the “happy”, “#luxbride”, and “what” clusters. If we look at the details of
the messages that include these top keywords, we find terms such as “resolution”,
“wedding”, “weddingplanning”, etc., which in fact describe a promotion on social
networks of the hotel’s wedding package during the New Year period. Additionally,
located at the top left corner of the map are three aggregated clusters, namely “fri”,
“chat”, and “join”, which reveal the publicity surrounding regular speaking events
featuring some famous regional editors in fashion and lifestyle. Through the lens of
the Kohonen map, it is possible to gain a deeper insight into the hotel’s marketing
strategies and campaigns during this time, and also to picture the hotel’s different
emphases in approaching customers via online social media.
Fig. 1 Self-organizing map (SOM) for Four Seasons Philadelphia, 2011–14

In comparison with the previous SOM (Fig. 1), the one in Fig. 2 for Sofitel
Philadelphia’s management tweets shows several noticeable differences. Other than
the centered “holiday” cluster, the “#conciergechoice” motif seems to be the largest
among them. The fact that this concept is surrounded by the “new” and “events”
clusters implies that the hotel is keen to give its customers advice and suggestions
about new activities and events. If we look deeper into the “new” cluster, related
phrases can be found such as “new native American voices exhibit”, “new app for
iphone”, and “new and coming-soon restaurants”. In addition, the “baseball” cluster
located next to “events” implies that the hotel marketers are leveraging the popularity
of news about local baseball games to raise the hotel’s visibility on the social network.
Although the hotel’s tweets promote many events, most of them relate to “baseball”
and “battle”; hence the algorithm creates a separate grouping for this concept.
Noticeably, many concepts shown on the map are described by keywords in French
such as “merci” or “magnifique”. This suggests that the Sofitel Philadelphia
concentrates on advertising one of its unique features, which is the combination of
French style and American living.
Fig. 2 Self-organizing map (SOM) for Sofitel Philadelphia, 2011–14

4 Time Series Prediction with Neural Network
Because of its great impact on various aspects of tourism, hotel occupancy
forecasting has always been a main focus of hotel managers. Hotel occupancy is not
only a metric for the internal assessment of a single hotel, but also an indicator of the
changing patterns of customers within a specific geographic area such as a city or
country. In other words, hotel occupancy rates reflect consumers’ cognition and
behavior, which are also recognized to be linked with social media data. Because of
this, we propose forecasting the hotel occupancy rate of these two groups of hotels in
Philadelphia using the data retrieved from the Twitter social network. Our goal was
to determine whether there are strong quantitative relationships between management
tweets and hotel occupancy rate. Moreover, this forecasting model also makes use of
Google Trends data to partly reflect the contribution of consumers’ online behavior
to the hotel occupancy rate.
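The chapter does not spell out how the series are presented to the learner. One common framing, assumed here purely for illustration, turns the occupancy series into a supervised dataset of lagged values and appends the overlay values (tweet counts, search volume) for the period being predicted. The function name `make_lagged_dataset`, the number of lags, and the numbers in the usage example are all hypothetical.

```python
import numpy as np


def make_lagged_dataset(target, overlays=None, n_lags=3):
    """Build (X, y): each row of X holds the previous n_lags target values,
    followed by the overlay values for the period being predicted."""
    target = np.asarray(target, dtype=float)
    if overlays is not None:
        # Accept one overlay series (1-D) or several (one column each).
        overlays = np.asarray(overlays, dtype=float).reshape(len(target), -1)
    rows = []
    for t in range(n_lags, len(target)):
        features = list(target[t - n_lags:t])
        if overlays is not None:
            features.extend(overlays[t])
        rows.append(features)
    return np.array(rows), target[n_lags:]


# Made-up occupancy rates (%) and tweet counts per period.
occupancy = [70.1, 72.4, 68.9, 75.0, 77.3, 74.8]
tweet_counts = [40, 55, 38, 61, 70, 52]
X, y = make_lagged_dataset(occupancy, overlays=tweet_counts, n_lags=2)
print(X.shape, y.shape)
```

Dropping the `overlays` argument gives the "no overlay data" setting, so the same function covers every column of the comparison.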
Within the scope of this chapter, we implement time series forecasting with a
Multilayer Perceptron (MLP) ANN and different overlay data using the Weka data
mining software [18]. Part of the collected data (occupancy rate, management tweets,
and Google Trends) is filtered so that the respective time periods are aligned with
each other. The data is then divided into training and testing datasets with a ratio of
7:3. The trained model is evaluated on the testing dataset with several metrics, namely
Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE, reported for
the luxury group), Root Mean Squared Error (RMSE), and Mean Squared Error
(MSE), and the results are compared with the performance of Linear Regression.
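The split and the error measures can be written down directly. The sketch below uses the standard definitions of MAE, MSE, RMSE, and MAPE; that the 7:3 split is chronological (earliest 70 % for training) is our assumption, as the chapter does not say, and the function names are illustrative.

```python
import math


def chronological_split(series, train_ratio=0.7):
    """Keep the earliest part for training and the rest for testing."""
    cut = int(round(len(series) * train_ratio))
    return series[:cut], series[cut:]


def evaluate(actual, predicted):
    """Standard forecast error measures; MAPE is expressed in percent."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(mse),
        "MSE": mse,
    }


train, test = chronological_split(list(range(10)))
print(len(train), len(test))
print(evaluate([100.0, 200.0], [110.0, 180.0]))
```

Since RMSE is the square root of MSE, the two rows of each results table carry the same information; for instance, 8.0204 squared agrees with the reported MSE of 64.3276 up to rounding.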
Regarding the analysis of the luxury group of hotels (Table 2), Weka’s Multilayer
Perceptron (MLP) provided a better forecast of occupancy than did Linear
Regression, partly because of the flexibility MLP offers in tuning multiple
parameters, in particular the learning rate (-L) and momentum (-M). This
improvement showed up across all the evaluation measures (MAE, MAPE, RMSE,
and MSE). However, the introduction of social media data (tweets, Google Trends,
or both) did not improve the forecasts consistently: it increased every error measure
for Linear Regression and for the first MLP configuration, and only the second
configuration (-L 0.2 -M 0.001) showed modest gains when a single overlay was added.
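To show what the two tuned parameters do, the sketch below applies the textbook momentum form of gradient descent to a toy one-weight loss. It is not Weka's implementation; the loss, the starting point, and the function name are assumptions, and only the values 0.3 and 0.2 echo the first configuration in Table 2.

```python
def minimize_with_momentum(gradient, start, learning_rate=0.3, momentum=0.2, steps=50):
    """Gradient descent with a momentum term on a single weight."""
    weight = start
    velocity = 0.0
    for _ in range(steps):
        # New step = a fraction of the previous step plus the scaled
        # negative gradient at the current weight.
        velocity = momentum * velocity - learning_rate * gradient(weight)
        weight += velocity
    return weight


# Toy loss f(w) = w^2 with gradient 2w; its minimum is at w = 0.
final_weight = minimize_with_momentum(lambda w: 2.0 * w, start=1.0)
print(final_weight)
```

The learning rate scales each step, while momentum carries part of the previous step forward; different pairs of values can therefore suit different input data, which is consistent with the per-group settings reported in the tables.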
In the analysis of the upper upscale group of hotels we observed a different scenario
(Table 3). Even though the MLP model outperformed linear regression in the analysis
without overlay data, social media data proved to be significant. Specifically,
management tweets greatly improved the forecasting results of the linear regression
model (MAE fell from 5.662 to 2.5313), and the inclusion of both management tweets
and Google Trends data showed a slight enhancement in the case of the MLP model
(best MAE of 3.2727, against 3.5747 without overlay data).

Table 2 Prediction analysis results on the luxury group of hotels using MLP and linear regression

| Model | Metric | No overlay data | Tweets overlay | Trend overlay | Tweets + trend overlay |
|---|---|---|---|---|---|
| Linear regression | MAE | 6.1587 | 6.8693 | 7.1971 | 6.8693 |
| | MAPE | 8.0194 | 8.9465 | 9.3409 | 8.9465 |
| | RMSE | 8.0204 | 9.2922 | 9.8314 | 9.2922 |
| | MSE | 64.3276 | 86.3453 | 96.6559 | 86.3453 |
| Multilayer perceptron (-L 0.3 -M 0.2) | MAE | 3.8041 | 5.0625 | 4.939 | 3.899 |
| | MAPE | 5.06 | 6.6659 | 6.495 | 5.2032 |
| | RMSE | 4.3529 | 5.9638 | 6.1949 | 5.4229 |
| | MSE | 18.9478 | 35.5664 | 38.3763 | 29.4084 |
| Multilayer perceptron (-L 0.2 -M 0.001) | MAE | 4.0597 | 3.8467 | 3.5707 | 3.9372 |
| | MAPE | 5.3961 | 5.1286 | 4.7686 | 5.262 |
| | RMSE | 4.7331 | 4.4672 | 4.5605 | 4.8992 |
| | MSE | 22.4024 | 19.9558 | 20.7982 | 24.0018 |

Table 3 Prediction analysis results on the upper upscale group of hotels using MLP ANN and linear regression

| Model | Metric | No overlay data | Tweets overlay | Trend overlay | Tweets + trend overlay |
|---|---|---|---|---|---|
| Linear regression | MAE | 5.662 | 2.5313 | 5.662 | 5.662 |
| | RMSE | 6.4643 | 3.3175 | 6.4643 | 6.4643 |
| | MSE | 41.7875 | 11.0061 | 41.7875 | 41.7875 |
| Multilayer perceptron (-L 0.1 -M 0.4) | MAE | 3.5747 | 5.496 | 4.1679 | 3.6409 |
| | RMSE | 4.5417 | 6.1738 | 5.004 | 4.676 |
| | MSE | 20.627 | 38.1156 | 25.0399 | 21.8645 |
| Multilayer perceptron (-L 0.1 -M 0.5) | MAE | 4.1775 | 5.3354 | 4.9444 | 3.2727 |
| | RMSE | 4.8488 | 6.0356 | 5.7739 | 4.4111 |
| | MSE | 23.5109 | 36.4279 | 33.3382 | 19.4582 |

Overall, both linear regression and the MLP ANN show their own advantages with
different input data for the two groups of hotels in Philadelphia. However, the ANN
model proves more adaptable, since its parameters are easily altered to suit different
scenarios.
5 Conclusion
In this chapter, we have shown that the ANN facilitated SOM is powerful in analyzing
social media data to gain competitive knowledge in the field of tourism. Because of
the flexibility of the algorithm, the introduced methodology can be easily customized
and employed in different business scenarios. The capability of the MLP ANN is also
demonstrated through its application in time series forecasting, hotel occupancy
prediction in particular, with different overlay data. ANN is without doubt a very
dynamic and supportive tool in business analytics, and its potential will surely be
pushed beyond the boundaries recognizable in the present context.
References
1. L.H. Lanz, B.W. Fishhof, R. Lee, How are hotels embracing social media in 2010: examples
of how to begin engaging. HVS sales and marketing services
2. C.K. Anderson, The impact of social media on lodging performance. Cornell Hospitality
Rep. 12(15), 4–11 (2012)
3. W. Claster, Q.T. Le, P. Pardo, Implication of social media data in business analysis: a case in
lodging industry, in Proceedings of the APUGSM 2015 Conference (Japan, 2015), p. 61
4. I. Blal, M.C. Sturman, The differential effects of the quality and quantity of online reviews on
hotel room sales. Cornell Hospitality Quarterly (2014)
5. H. Choi, P. Liu, Reading tea leaves in the tourism industry: a case study in the Gulf oil spill
(March 24, 2011)
6. G. Seth, Analyzing the effects of social media on the hospitality industry. UNLV
Theses/Dissertations/Professional Papers/Capstones, paper 1346 (2012)
7. W. Claster, P. Pardo, M. Cooper, K. Tajeddini, Tourism, travel and tweets: algorithmic text
analysis methodologies in tourism. Middle East J. Manage. 1(1), 81–99 (November 2013)
8. W. He, S. Zha, L. Li, Social media competitive analysis and text mining: a case study in the
pizza industry. Int. J. Inf. Manage. 33(3), 464–472 (June 2013)
9. X. Yang, B. Pan, J.A. Evans, B. Lv, Forecasting Chinese tourist volume with search engine
data. Tourism Manage. 46, 386–397 (February 2015). ISSN 0261-5177
10. L. Dey et al., Acquiring competitive intelligence from social media, in Proceedings of the 2011
Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data (ACM,
2011)
11. G. Barbier, H. Liu, Data mining in social media, in Social Network Data Analytics (2011),
pp. 327–352
12. Z. Xiang, B. Pan, Travel queries on cities in the United States: implications for search engine
marketing for tourist destinations. Tourism Manage. 32(1), 88–97 (February 2011). ISSN
0261-5177
13. T. Kohonen, Self-organized formation of topologically correct feature maps. Biol. Cybern.
43(1), 59–69 (1982)
14. D. Isa, V.P. Kallimani, L.H. Lee, Using the self-organizing map for clustering of text
documents. Expert Syst. Appl. 36(5), 9584–9591 (July 2009). ISSN 0957-4174
15. W. Claster, D. Hung, S. Shanmuganathan, Unsupervised artificial neural nets for modeling
movie sentiment, in 2nd International Conference on Computational Intelligence,
Communication Systems and Networks (2010)
16. W.B. Claster, M. Cooper, Y. Isoda, P. Sallis, Thailand—tourism and conflict: modeling
tourism sentiment from twitter tweets using naïve bayes and unsupervised artificial neural nets.
CIMSim2010, Computational intelligence, modelling and simulation, 2010, pp. 89–94
17. Y.C. Liu, M. Liu, X.L. Wang, Application of self-organizing maps in text clustering: a review,
in Applications of Self-Organizing Maps, ed. by M. Johnsson (2012). ISBN
978-953-51-0862-7
18. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data
mining software: an update. SIGKDD Explor. 11(1) (2009)