Comprehensive Exploration of Game
Reviews Extraction and Opinion Mining
Using NLP Techniques
Stefan Ruseti, Maria-Dorinela Sirbu, Mihnea Andrei Calin, Mihai Dascalu,
Stefan Trausan-Matu and Gheorghe Militaru
Abstract Sentiment analysis and opinion summarization have become an important
research area with the increase of available data on the Web. Since the Internet started
containing more and more opinions and reviews for different products, individual
users and companies saw the benefits of a priori evaluations based on other users’
experiences; thus, automated analyses centered on customer impressions and expe-
riences emerged as crucial marketing instruments. Our aim is to create a scalable and
easily extensible pipeline for building a custom-tailored sentiment analysis model
for a specific domain. A corpus of around 200,000 game reviews was extracted,
and three state-of-the-art models (i.e., support vector machines, multinomial Naïve-
Bayes, and deep neural network) were employed in order to classify the reviews into
positive, neutral, and negative. Current results surpass previous experiments based on
word counts applied on a similar game reviews dataset, thus arguing for the adequacy
of the proposed workflow.
Keywords Sentiment analysis and opinion mining ·Game reviews ·Natural
language processing
S. Ruseti · M.-D. Sirbu · M. A. Calin · M. Dascalu (✉) · S. Trausan-Matu · G. Militaru
University Politehnica of Bucharest, Splaiul Independenței 313, 60042 Bucharest, Romania
e-mail: mihai.dascalu@cs.pub.ro
S. Ruseti
e-mail: stefan.ruseti@cs.pub.ro
M.-D. Sirbu
e-mail: maria.sirbu@cti.pub.ro
M. A. Calin
e-mail: mihnea.calin@gmail.com
S. Trausan-Matu
e-mail: stefan.trausan@cs.pub.ro
G. Militaru
e-mail: gheorghe.militaru@upb.ro
M. Dascalu ·S. Trausan-Matu
Academy of Romanian Scientists, Splaiul Independenței 54, 050094 Bucharest, Romania
© Springer Nature Singapore Pte Ltd. 2020
X.-S. Yang et al. (eds.), Fourth International Congress on Information
and Communication Technology, Advances in Intelligent Systems
and Computing 1041, https://doi.org/10.1007/978-981-15-0637-6_27
1 Introduction
Sharing opinions on the Internet has become a common practice, and this trend has a
growing influence on marketing and buying decisions. For example, online comments play a major
role in the popularity of a game, movie, or any kind of media product. Thus, the
popularity of a brand increases or decreases depending on the thoughts expressed
by different people. Gaming is one of the most thriving industries in which players are
highly influenced by user reviews. As game rankings are mostly based on aggre-
gated user or critic scores, it is essential to extract and consider the actual value of
individual reviews. Full-text reviews provide important benefits for users who can
make informed decisions, as well as for game companies which can use online mar-
keting mechanisms to take appropriate decisions (e.g., promotions for low evaluated
games).
A sentiment refers to the attitudes, emotions, and opinions conveyed with regard
to a given entity. Online reviews originate from different social groups, genders, and
professions, so analyzing them exposes multiple points of view and provides a
better overview of the perceived impressions. Sentiment
analysis (or opinion mining) focuses on the task of extracting human feelings and
opinions from texts written in natural language [1, 2]. Sentiment analysis models
are widely employed in different domains including marketing, business, education,
sociology, psychology, and economics [2]. These automated models frequently use
natural language processing (NLP), information retrieval, and data mining [3]
techniques. An important factor for the increased efficiency of machine learning and
NLP methods resides in the huge amount of information available online for training
purposes [4]. In addition, we can perceive sentiment analysis as a simplification
of in-depth discourse analysis and semantics, as our aim is to automatically extract
global features (e.g., positive or negative sentiments and their corresponding labels)
[5].
Sentiment analysis methods entail various processing steps, out of which two are
frequently encountered: text preprocessing, followed by the automated classification.
The first stage uses NLP techniques such as: tokenization, stop-words, numbers and
punctuation removal, and lemmatization. Second, a wide range of classifiers can
be employed consisting of two major categories—machine-learning techniques and
lexicon-based statistics—as well as a combination of them (i.e., hybrid methods).
Pang and Lee [6] envision an optimal solution for opinion-mining as a machine which
processes the text for a given item, creates a list of product features, and aggregates
all opinions for the given entity.
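The preprocessing stage described above can be illustrated with a minimal sketch. The stop-word list and the lemma dictionary below are tiny, hypothetical stand-ins chosen for this example only; a real pipeline would rely on a full stop-word list and a proper lemmatizer.

```python
import re

# Hypothetical, deliberately tiny resources used only for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "are", "was",
              "it", "this", "i", "for"}
LEMMAS = {"games": "game", "played": "play", "playing": "play",
          "loved": "love", "hours": "hour"}


def preprocess(text):
    """Tokenize, drop numbers/punctuation and stop-words, then lemmatize."""
    # Keeping alphabetic runs only removes numbers and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    # Words missing from the toy lemma dictionary are kept unchanged.
    return [LEMMAS.get(t, t) for t in content]


print(preprocess("I played this game for 40 hours, and loved it!"))
```

The resulting token list is what a classifier from the second stage would receive as input.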
The aim of this paper is to compare multiple state-of-the-art models capable of
classifying game reviews as positive, negative, or neutral. The next section describes
frequently employed methods for opinion-mining and sentiment analysis. The third
section introduces the used corpora (consisting of around 200,000 game reviews
gathered for over 3000 games), followed by a description of the proposed methods.
The fourth section presents the results and a comparison of our selected models,
followed by conclusions and the future work.
2 State of the Art
Two major NLP-based methods for sentiment analysis compete, for which rep-
resentative models are presented in this section: lexicons and machine-learning
techniques, each with its own limitations. Machine-learning techniques require a
large dataset of human-labeled examples for training, whereas lexicon-based
approaches avoid this requirement but do not capture context-sensitive semantics
because they rely on
isolated word occurrences [7]. Moreover, context is important for expressing word
meanings because some words can have multiple valences in different contexts. How-
ever, most approaches rely on the bag-of-words assumption in which word order is
disregarded. Thus, the discourse structure is completely disregarded, and, for exam-
ple, the following two phrases will have the same polarity, even though they express
opposite positions: “You are right, I do not like this ice cream” versus “You are not
right, I like this ice cream.”
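A short sketch makes the limitation concrete. Taken literally, the two sentences differ by the auxiliary "do"; once such stop-words are removed (the small list below is an assumption made for this example, and it deliberately keeps "not"), both collapse to exactly the same word counts.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; the negation "not" is kept on purpose.
STOP_WORDS = {"you", "are", "i", "do", "this"}


def bag_of_words(text):
    """Return word counts, ignoring word order and stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


first = bag_of_words("You are right, I do not like this ice cream")
second = bag_of_words("You are not right, I like this ice cream")

# A model that only sees these counts cannot tell the sentences apart,
# so it has to assign them the same polarity.
print(first == second)
```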
Sentiment lexicons are lists of words which express polarities for different dimen-
sions. These vectors contain information about semantic valences (e.g., intensifica-
tion or negation) [8], potential parts of speech [9], and can be divided into two main
categories: domain-independent word lists (i.e., general dictionaries that grant a
global overview) and domain-dependent vectors (i.e., accurate for certain categories
like books, tourism, shopping, gaming, or movie reviews, but potentially irrelevant
for other domains). Multiple dictionaries and tools have been developed, out of which
the most representative open-source ones are: Affective Norms for English Words
(ANEW) [10], SenticNet, The General Inquirer (GI) [11] including the Lasswell [12]
dictionary, Geneva Affect Label Coder (GALC) [13], and EmoLex [14]. In addition,
several approaches build on top of individual word lists by combining them into
meaningful components, for example SEANCE [15] or the method introduced by
Sirbu, Secui, Dascalu, Crossley, Ruseti, and Trausan-Matu [16].
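As an illustration of how a lexicon with valence information might be applied, the sketch below sums word polarities and lets negations and intensifiers modify the next polar word. The word lists and their numeric values are invented for this example and do not come from ANEW, EmoLex, or any other resource cited here.

```python
# Hypothetical toy lexicon; real resources are far larger and use their
# own scales and dimensions.
POLARITY = {"good": 1.0, "great": 2.0, "fun": 1.0, "bad": -1.0, "boring": -1.5}
INTENSIFIERS = {"very": 1.5, "really": 1.5}
NEGATIONS = {"not", "never"}


def lexicon_score(tokens):
    """Sum word polarities, applying the modifiers that precede each word."""
    score = 0.0
    weight = 1.0  # accumulated effect of pending negations / intensifiers
    for token in tokens:
        if token in NEGATIONS:
            weight *= -1.0
        elif token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in POLARITY:
            score += weight * POLARITY[token]
            weight = 1.0  # modifiers apply to the next polar word only
        # Other words carry no polarity and leave pending modifiers intact.
    return score


print(lexicon_score("the game is not very good".split()))
print(lexicon_score("really great story".split()))
```

A domain-dependent lexicon would differ only in the entries of the dictionaries, which is why such word lists transfer poorly across domains.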
Machine-learning approaches mostly rely on supervised-learning algorithms to
classify opinions into positive and negative classes, such as: Naïve-Bayes [17], max-
imum entropy [18], multinomial logistic regression [19], support vector machines
[20] or deep neural networks [20]. The most representative methods were selected for
our pipeline and are presented in detail in the following section. Some NLP libraries,
like Stanford CoreNLP [21], also include a sentiment analysis component. Socher,
Perelygin, Wu, Chuang, Manning, Ng and Potts [22] use a recursive neural network
applied on the constituency parsing tree of a sentence. Each node in the tree is labeled
with a polarity score with five possible values, ranging from very negative to very positive.
In addition, Kim, Ganesan, Sondhi, and Zhai [23] point out reasons why opinion-mining
has not yet been adopted by major industries when developing products. The
subject is still under active research and the available solutions are not yet accurate
enough. Improving accuracy requires a better understanding of
the text, and solutions should generalize so that they can be used in different
fields. The downside of using more features to improve accuracy is that
scalability decreases, which can be mitigated by introducing parallel processing and
streamlined workflows. Another potential problem is the lack of common and public
datasets: most research is performed on private data to which other researchers do not
have easy access. Moreover, besides standardization, quality control [24] needs
to be enforced; only in this manner are valuable opinions taken into consideration
in research and projects [25].
3 Method
3.1 Corpora
Our dataset consists of 201,552 game reviews crawled from Metacritic. Crawler4j
[26] was used to extract reviews from over 3000 games. Games and reviews were
indexed in Elasticsearch [27], whereas Kibana [28] was used to create interactive
visualizations and statistics over our dataset. Figure 1 displays the number of reviews
for each user rating.
In Fig. 2, we present a comparison between the users’ score and critics’ metascore
as we want to analyze the accuracy of the user ratings. All scores were rounded to
the closest integer in order to ease the follow-up classifications. We consider that
the metascores are the best available reference for quantifying the subjectivity and
objectivity within a user’s opinion. The data shows that critics usually give higher
scores compared to user reviews; however, there are no major differences in the
assignments of games in specific intervals when using their corresponding metascores
or the users’ scores.
The initial dataset was split into three categories depending on the rating given by
users, as follows: negative (reviews with a rating between 1 and 5), neutral (reviews
with a rating between 6 and 8), and positive (reviews with a rating of 9 or 10). In order
to have a balanced dataset, the same number of reviews was sampled from each
class. Table 1 contains the distribution of the three types of reviews in the training,
validation, and test partitions.

Fig. 1 Number of reviews per score

Fig. 2 Comparison between a. Metascores (critics) and b. Average user score of the considered games

Table 1 Number of reviews per class and partition for the sentiment prediction pipeline
Partition    Negative reviews   Neutral reviews   Positive reviews
Training     33,000             33,000            33,000
Validation   3000               3000              3000
Test         3000               3000              3000
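The labeling rule and the balanced sampling described in this subsection can be sketched as follows. The function names, the fixed random seed, and the synthetic reviews are assumptions made for illustration; the text only specifies ratings from 1 to 10, so any other value is rejected here.

```python
import random
from collections import defaultdict


def rating_to_class(rating):
    """Map a rounded user rating to one of the three sentiment classes."""
    if 1 <= rating <= 5:
        return "negative"
    if 6 <= rating <= 8:
        return "neutral"
    if 9 <= rating <= 10:
        return "positive"
    # Ratings outside 1-10 are not covered by the described split.
    raise ValueError(f"rating outside 1-10: {rating}")


def balanced_sample(reviews, per_class, seed=0):
    """Draw the same number of (text, rating) reviews from each class."""
    by_class = defaultdict(list)
    for text, rating in reviews:
        by_class[rating_to_class(rating)].append((text, rating))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for label in ("negative", "neutral", "positive"):
        sample.extend(rng.sample(by_class[label], per_class))
    return sample


# Synthetic stand-in data: five reviews for each rating from 1 to 10.
reviews = [(f"review {i}", 1 + i % 10) for i in range(50)]
subset = balanced_sample(reviews, per_class=5)
print(len(subset))
```

Sampling per class, rather than from the whole corpus, is what keeps the three classes equally represented regardless of how skewed the raw rating distribution is.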
3.2 Extensible Architecture
Figure 3 presents our extensible sentiment analysis pipeline. First, we extracted a
large number of game reviews from Metacritic using Crawler4j [26]. Second, all
game reviews were indexed into Elasticsearch [27]. Third, we applied a text preprocessing
pipeline which includes: stop-word removal (stop-words such as prepositions,
conjunctions, and pronouns are eliminated), lemmatization (bringing words to their
base form, e.g., verbs are transformed to the infinitive), and content-word extraction
(all models were trained only on content words because they express sentiments and
contain valuable information).

Fig. 3 Building a scalable sentiment analysis pipeline
In the last step, we evaluated three classifiers on the preprocessed texts: support
vector machine, multinomial Naïve-Bayes, and deep neural networks (DNN). The
three classifiers use the words as features and predict the score of the review. The
hyper-parameters of each model were tuned using the validation partition and the
reported results are obtained on the test partition by training on both training and
validation sets.
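The evaluation protocol (tune on validation, retrain on training plus validation, report on test) can be sketched with a small, self-contained multinomial Naïve-Bayes classifier. This is an illustrative stand-in written in plain Python, not the scikit-learn implementation used in the experiments; the toy documents, the candidate smoothing values, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Fit a multinomial Naive Bayes model with additive smoothing alpha."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab, alpha


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for one document."""
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # words unseen during training are skipped
                score += math.log((counts[t] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Toy partitions of already preprocessed reviews (hypothetical data).
train = ([["great", "fun"], ["boring", "bad"], ["fun", "story"], ["bad", "bugs"]],
         ["positive", "negative", "positive", "negative"])
val = ([["great", "story"], ["boring", "bugs"]], ["positive", "negative"])
test = ([["fun", "great"], ["bad", "boring"]], ["positive", "negative"])

# 1) Tune the hyper-parameter using the validation partition only.
best_alpha = max([0.1, 1.0, 10.0],
                 key=lambda a: accuracy(train_nb(*train, a), *val))
# 2) Retrain on training + validation; 3) report accuracy on the test set.
final = train_nb(train[0] + val[0], train[1] + val[1], best_alpha)
print(best_alpha, accuracy(final, *test))
```

Keeping the test partition out of both the tuning and the retraining steps is what makes the reported accuracy an estimate of performance on unseen reviews.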
In general, a neural network-based text classifier uses an encoder that computes
a representation for the text followed by some fully connected layers that produce
the probabilities for each output class. In this case, we used the deep averaging
network (DAN) and transformer versions of the Universal Sentence Encoder [29]
from TensorFlow Hub.¹ These models were pre-trained on several text prediction
tasks which require a general representation of the text and show good results when
transferred to other tasks.
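The encoder-plus-classifier structure can be illustrated with a toy example in the spirit of a deep averaging network: word vectors are averaged into one text representation, which is then mapped to class probabilities. The two-dimensional embeddings and the weights below are hand-picked assumptions, and a single linear layer with softmax stands in for the deeper stack of fully connected layers; none of this reproduces the Universal Sentence Encoder itself.

```python
import math

# Hypothetical embeddings and classifier weights, chosen by hand for
# illustration; a real encoder learns them from data.
EMBEDDINGS = {"great": [1.0, 0.2], "fun": [0.8, 0.1],
              "boring": [-0.9, 0.3], "okay": [0.0, 1.0]}
CLASSES = ["negative", "neutral", "positive"]
W = [[-2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]  # one row of weights per class
B = [0.0, 0.0, 0.0]


def encode(tokens):
    """Averaging encoder: mean of the known word embeddings."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def softmax(logits):
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens):
    """One fully connected layer followed by softmax over the classes."""
    h = encode(tokens)
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, B)]
    return dict(zip(CLASSES, softmax(logits)))


probs = classify(["great", "fun"])
print(max(probs, key=probs.get))
```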
The SVM and multinomial Naïve-Bayes were applied on the bag-of-words representation
of the text. We have also tested a Tf–Idf weighting of the words. The
scikit-learn Python machine-learning library² was used for preprocessing and model
training.
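To show what a Tf–Idf weighting does to bag-of-words counts, the sketch below uses the textbook formula tf × log(N / df) in plain Python. This is an assumption made for illustration: scikit-learn's TfidfVectorizer applies a smoothed idf and normalizes each document vector by default, so its values differ from the ones computed here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts, i.e. the bag-of-words vector
        weighted.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weighted


# Toy corpus of preprocessed reviews (hypothetical data).
docs = [["game", "great", "story"],
        ["game", "boring", "story"],
        ["game", "great", "great", "music"]]
weights = tf_idf(docs)
print(weights[2])
```

A word such as "game", which appears in every document, receives a weight of zero, whereas rarer words are emphasized relative to their raw counts.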
4 Results
The accuracies obtained by each evaluated model are presented in Table 2. The DNN
models performed better than the other classifiers, but did not achieve a significantly
higher accuracy. One possible explanation for this limitation is the fact that the pre-
trained encoders should capture the meaning of a text, which might not be very useful
for sentiment analysis applied on a specific dataset.
Table 2 Accuracy of different models on the test set
Model            Representation   Accuracy (%)
SVM              BoW              66
SVM              Tf–Idf           61
Multinomial NB   BoW              65
Multinomial NB   Tf–Idf           65
DNN              DAN              66
DNN              Transformer      67
¹ https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/2
² http://scikit-learn.org
The obtained results surpass previous lexicon-based analyses performed on a
similar game reviews dataset that considered emerging PCA components [16,30].
5 Conclusions and Future Developments
This paper analyses the accuracy of different classifiers for predicting the rating of
game reviews extracted from Metacritic. Our goal was to create a scalable and easily
extensible pipeline for building a custom-tailored sentiment analysis model for a
specific domain.
In the future, we want to improve the training dataset and create a complete
pipeline for the Romanian language that can be applied to different domains and
industries, including book and film reviews. However, the first step is to extract
a large collection of reviews written in Romanian. Moreover, we observed a lot of
writing mistakes in reviews, which can potentially influence the accuracy of the
classifiers. This is especially important in the case of bag-of-words inputs in which
any wrongly spelled word acts as a completely different feature. One way to avoid
this is to use an automated spell-checker in the preprocessing step. In addition to
the previous classifiers, we will also consider other methods (e.g.,
k-nearest neighbors, random forest) in order to create a strong baseline.
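One possible form of such a spell-checking step is sketched below using fuzzy matching from Python's standard difflib module. The vocabulary, the similarity cutoff, and the function name are assumptions made for illustration; this is not a component of the evaluated pipeline.

```python
import difflib

# Hypothetical in-domain vocabulary; in practice it could be built from the
# most frequent tokens in the training reviews.
VOCABULARY = ["graphics", "gameplay", "story", "boring", "awesome", "controls"]


def normalize(token, cutoff=0.75):
    """Map a token to its closest vocabulary entry, if one is similar enough."""
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    # Tokens without a sufficiently similar entry are left unchanged.
    return matches[0] if matches else token


print([normalize(t) for t in ["grafics", "gamplay", "awsome", "soundtrack"]])
```

After this normalization, misspelled variants map to the same bag-of-words feature as the correctly spelled word instead of creating separate features.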
Additional experiments should be conducted on this dataset using other DNN
architectures. The dataset is large enough to train an encoder on it, instead of using
a pre-trained one. In a sentiment analysis task, the adjectives and adverbs from the
text are more important than the nouns and verbs, which are usually central elements
for computing text similarity. By training directly on this dataset, the encoder should
be capable of learning what parts of speech to focus on.
Acknowledgements This work was supported by a grant of the Romanian Ministry of Research and
Innovation, CCCDI—UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689/“Lib2Life—
Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate”/“Revitalizing
Libraries and Cultural Heritage through Advanced Technologies”, within PNCDI III.
References
1. B. Liu, Sentiment Analysis and Opinion Mining (Morgan & Claypool Publishers, San Rafael,
CA, 2012)
2. C.J. Hutto, E. Gilbert, Vader: a parsimonious rule-based model for sentiment analysis of social
media text, in 8th International AAAI Conference on Weblogs and Social Media (AAAI Press,
Ann Arbor, MI, 2014), pp. 216–225
3. Z. Hailong, G. Wenyan, J. Bo, Machine learning and lexicon based methods for sentiment
classification: a survey, in 2014 11th Web Information System and Application Conference
(WISA) (IEEE, 2014), pp. 262–265
4. B. Pang, L. Lee, Opinion Mining and Sentiment Analysis (Foundations and Trends in
Information Retrieval) (Now Publishers Inc., 2008)
5. B. Liu, Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 5(1), 1–167
(2012)
6. B. Pang, L. Lee, Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2),
1–135 (2008)
7. O.K.M. Cheng, R.Y.K. Lau, Probabilistic language modelling for context-sensitive opinion
mining. Sci. J. Inf. Eng. 5(5), 150–154 (2015)
8. J.G. Shanahan, Y. Qu, J. Wiebe, Computing Attitude and Affect in Text: Theory and Applications,
vol. 20 (Springer, Berlin, 2006)
9. A. Hogenboom, F. Boon, F. Frasincar, A statistical approach to star rating classification of
sentiment, in Management Intelligent Systems (Springer, 2012), pp. 251–260
10. M.M. Bradley, P.J. Lang, Affective Norms for English Words (ANEW): Stimuli, Instruction
Manual and Affective Ratings (The Center for Research in Psychophysiology, University of
Florida, Gainesville, FL, 1999)
11. P. Stone, D.C. Dunphy, M.S. Smith, D.M. Ogilvie, Associates: The General Inquirer: A Com-
puter Approach to Content Analysis (The MIT Press, Cambridge, MA, 1966)
12. H.D. Lasswell, J.Z. Namenwirth, The Lasswell Value Dictionary (Yale University Press, New
Haven, 1969)
13. K.R. Scherer, What are emotions? And how can they be measured? Soc. Sci. Inf. 44(4), 695–729
(2005)
14. S.M. Mohammad, P.D. Turney, Crowdsourcing a word–emotion association lexicon. Comput.
Intell. 29(3), 436–465 (2013)
15. S. Crossley, K. Kyle, D.S. McNamara, Sentiment Analysis and Social Cognition Engine
(SEANCE): An Automatic Tool for Sentiment, Social Cognition, and Social Order Analysis.
Behavior Research Methods (in press)
16. M.-D. Sirbu, A. Secui, M. Dascalu, S.A. Crossley, S. Ruseti, S. Trausan-Matu, Extracting
gamers’ opinions from reviews, in 18th International Symposium on Symbolic and Numeric
Algorithms for Scientific Computing (SYNASC 2016) (IEEE, Timisoara, Romania, 2016),
pp. 227–232
17. A. Pak, P. Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in LREC
2010 (Valletta, Malta, 2010)
18. A. Go, R. Bhayani, L. Huang, Twitter Sentiment Classification Using Distant Supervision.
CS224N Project Report, vol. 1(2) (Stanford, 2009)
19. P. Melville, W. Gryc, R.D. Lawrence, Sentiment analysis of blogs by combining lexical knowl-
edge with text classification, in Proceedings of the 15th ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining (ACM, 2009), pp. 1275–1284
20. T. Mullen, N. Collier, Sentiment analysis using support vector machines with diverse infor-
mation sources, in Proceedings of the 2004 Conference on Empirical Methods in Natural
Language Processing (2004)
21. C.D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S.J. Bethard, D. McClosky, The Stanford
CoreNLP Natural Language Processing Toolkit, in Proceedings of 52nd Annual Meeting of
the Association for Computational Linguistics: System Demonstrations (ACL, Baltimore, MD,
2014), pp. 55–60
22. R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, C.P. Potts, Recursive deep
models for semantic compositionality over a sentiment treebank, in Conference on Empirical
Methods in Natural Language Processing (EMNLP 2013) (ACL, Seattle, WA, 2013)
23. H.D. Kim, K. Ganesan, P. Sondhi, C. Zhai, Comprehensive Review of Opinion Summarization
(2011)
24. B. Liu, L. Zhang, A survey of opinion mining and sentiment analysis, in Mining Text Data
(Springer, 2012), pp. 415–463
25. L. Zhuang, F. Jing, X.-Y. Zhu, Movie review mining and summarization, in Proceedings of
the 15th ACM International Conference on Information and Knowledge Management (ACM,
2006), pp. 43–50
26. Y. Ganjisaffar, Crawler4j–Open Source Web Crawler for Java, Google Scholar (2012)
27. C. Gormley, Z. Tong, Elasticsearch: The Definitive Guide: A Distributed Real-Time Search
and Analytics Engine (O’Reilly Media, Inc., California, 2015)
28. Y. Gupta, Kibana Essentials (Packt Publishing Ltd, 2015)
29. D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R.S. John, N. Constant, M. Guajardo-
Cespedes, S. Yuan, C. Tar, Universal Sentence Encoder. arXiv preprint (2018), arXiv:1803.11175
30. A. Secui, M.-D. Sirbu, M. Dascalu, S.A. Crossley, S. Ruseti, S. Trausan-Matu, Expressing sen-
timents in game reviews, in 17th International Conference on Artificial Intelligence: Methodol-
ogy, Systems, and Applications (AIMSA 2016) (Springer, Varna, Bulgaria, 2016), pp. 352–355