Communication and Technology Congress – CTC 2019 (April 2019 – Turkey, İstanbul)
DOI NO: 10.7456/ctc_2019_05
© İstanbul Aydın University © Edlearning.it. C417F0005
An Example of Pragmatic Analysis in Natural
Language Processing: Sentimental Analysis of
Movie Reviews
Sütçü C.S.1
Aytekin C.2
1 Cem Sefa SÜTÇÜ, Marmara University, (Turkey)
e-mail: csutcu@marmara.edu.tr
2 Çiğdem AYTEKİN, Marmara University, (Turkey)
e-mail: cigdem.aytekin@marmara.edu.tr
Abstract
Natural Language Processing (NLP) is one of the most exciting applications of artificial
intelligence, and it is foreseen that NLP will bring new and revolutionary changes to
human-computer interaction. NLP is studied at five levels: phonology, morphology, syntax,
semantics and pragmatics. In this paper, a sentimental analysis is conducted on movie reviews
left by users on beyazperde.com. Sentimental analysis allows conclusions about mood to be drawn
automatically from text data. Thus, the numerous reviews left on a movie can be sorted as
expressing a positive or negative mood without reading them one by one. The ability to sort
reviews automatically according to their mood is invaluable to those who are considering
watching the movie, as well as to actors/actresses, producers and advertisers. Such an
application gives media companies new and improvable opportunities to create effective
strategies: viewer expectations can be analyzed, and this information can be used for upcoming
program projects. The importance of communication and information technologies is
undeniable. It is foreseen that this paper will contribute to information processing in Turkish.
Keywords: Natural Language Processing, Artificial Intelligence, Sentimental Analysis, Movie
Reviews.
Introduction
Natural language processing is one of the main topics of artificial intelligence. Although
interest in it has grown around the world, especially in the last three years (2016-2019), its
history can be said to be as old as the emergence of computers. Over this history, natural
language processing has therefore gone by different names, such as Computational Linguistics
and Statistical Natural Language Processing. These different names stem from the
diversification, over time, of the work that can be done through natural language processing.
Today, this work is categorized in different ways as the field expands and natural language
processing develops further into sub-branches.
Language is the most basic means of communication among living things. In this context,
types of communication are classified within the framework of written language and oral
language. According to Gleason, written language, oral language and their associated
components (source and receiver) are synergistic systems consisting of individual language
domains that form a dynamic, integrated whole. These individual language domains can be studied
at five levels: phonology, morphology, syntax, semantics and pragmatics. Pragmatics refers to
the rules of language for conversation (commenting) and wider social situations. In spoken
language, it covers "understanding of the social aspects of language" on the listening side and
"social use of language" on the speaking side. In written language, on the reading side, it
includes "understanding the perspective, the needs of the user and so on" (Gleason, 2005). This
study focuses on the pragmatic analysis of written language.
On the other hand, thanks to the interactive nature of the internet, users can express their
opinions and make comments by taking part in the channels that interest them. More importantly,
since they take part in this environment willingly, it is possible to reach, with ease, clear
and transparent views that are difficult to obtain in other ways. Therefore, the emotions in
these written texts can be analyzed and the information obtained can be used as needed. However,
considering the volume of data flowing through these environments, it becomes difficult or even
impossible to perform emotion analysis manually. The analysis should therefore be done
automatically with artificial intelligence methods. Sentimental analysis emerged from this
requirement as one of the sub-fields of natural language processing related to written language.
Opinions are at the center of almost all human activities because they are key elements of
our behavior. Organizations and businesses always want to learn public opinion about their
products and services. Likewise, individual consumers want to know the opinions of current
users and of others before purchasing a product or service. When an organization or business
needs public or consumer views, it conducts surveys and focus groups. Obtaining these views is
big business for marketing, public relations and political campaign companies. This limitation
has been removed by social networks, forums and similar environments. Opinions expressed in
these environments are now part of the decision-making processes of individuals as well as of
organizations and businesses. For individuals, the limitation to the circle of family and
friends has disappeared. An organization or business may no longer need to conduct surveys and
focus group research, because data for gauging public opinion is abundantly available (Liu,
2012: 8). Sentimental analysis takes on the task of extracting information from such data,
enabling emotions to be deduced automatically from user reviews.
In this respect, in this study, a sentimental analysis of the user reviews shared about
movies on the beyazperde.com website was carried out on a sample data set, and the results were
evaluated with the sensitivity measure.
Literature Review
Natural Language Processing Applications and Turkish Language
Natural language processing is an interdisciplinary field based on processing natural
language with artificial intelligence technologies for the benefit of a given application area.
It combines expertise in linguistics, design, communication science, software engineering and
data science. Popular examples of natural language processing applications include Apple's Siri,
IBM's Watson, chatbots developed for different sectors, search engine technologies that can be
tailored to the needs of the user, robot journalism, and technologies that use deep learning
techniques to produce expressions in text, such as robot narration and verbal descriptions on
Instagram. Nowadays, natural language processing is expanding and developing with the rapid
growth of big data, which includes different forms of media such as text, sound and image.
Figure 1 shows possible working topics in the field of natural language processing.
Natural language processing can be put to use, for example, in detecting fake news. One of
the recent studies on this subject is by Traylor and Straub (Traylor & Straub, 2019: 445).
Considering that deceptive content affects processes such as opinion formation, decision making
and voting, and that this creates a worldwide problem of information accuracy and integrity, the
authors tried to estimate the likelihood that a news text is fake by using natural language
processing methods. According to them, most fake news is first planted on social media channels
such as Facebook and Twitter. From this point of view, they built a fake news identification
system using a Bayesian algorithm and evaluated its performance. The sensitivity of the
resulting process is 63% in assessing the likelihood that a news text is fake.
Figure 1: Natural Language Processing Topics (Adalı, 2013: 4)
Withanage and colleagues developed a system based on the assumption that voice-based
navigation systems play an important role in filling the gap between man and machine. They overcame
the user's difficulty in receiving and understanding voice commands and proposed a mobile navigation
application called “direct me”, which mentions the main elements such as street names, landmarks,
points of interest, intersections, and specifies the route on an interactive interface. Here, the approach
of creating the user's preferred route is provided by first converting the audio streams to text and then
using natural language processing to obtain navigation-related information. This system can be used
as an effective approach to translate natural language instructions into a machine-understandable
format (Withanage, Liyanage, Deeyakaduwe, Dias, & Thelijjagoda, 2018).
In another study, a natural language processing framework was proposed to create natural
language interaction in Chinese. Syntax analysis, the proposed assessment method to understand the
relationship of change between entity classes in the group, and Bayes-based verb classification are
some of the elements of this framework. In addition, a semantic framework of verb types has been
established in order to identify the necessary and unnecessary roles for each verb type. As a result, in
order to help a robot to understand the instruction in summary terms, a semantic role-playing
approach has been proposed with the human-computer interaction module. The method was
confirmed by the relevant experiments. The research forms the basis of human-robot natural language
interaction (Li, Xu, Qi, & Ding, 2018: 2171, 2176).
Robot journalism is another natural language processing application and is based on the
writing of news by computer software. This requires generalized rules of language characteristics, and
journalists should be able to produce the format of the news and words that can be the labels. Today,
robot journalism studies are carried out with different perspectives. For example, one study is
conducted to determine the attitudes of journalists to adopt robot journalism (Kim & Kim, 2018: 340),
while another study is structured on the construction of text segmentation and custom labeling for
natural language processing (Naeun, Kirak, & Yoon, 2017: 566).
On the other hand, Turkish has several features that present very interesting challenges for
natural language processing. Its agglutinative morphology, vowel harmony and free ordering of
sentence constituents are some of them (Oflazer, 2016: 1). Over the past two decades, a number
of resources have been developed that can be used in Turkish natural language processing:
Morphological Analysis, Morphological Unification, Statistical Dependency Analyzer,
Lexical-Functional Grammar-Based Analyzer, Tree-Structured Corpus, Turkish WordNet, and Turkish
Corpus are the most important ones (Oflazer, 2016: 9-10).
[Figure 1 diagram. Central node: Natural Language Processing. Surrounding topics: reading
printed text and correcting reading errors; find and replace; correction of spelling mistakes;
development of writing aids; foreign language reading aids; question and answer systems;
computer conversation; understanding text; access to information; extracting the information
contained in the text; understanding speech; voice interaction with computer; summary of a
text; interlingual translation; writing aids in a foreign language.]
One of the biggest projects in Turkish natural language processing is the open-source
Zemberek project. Text classification, spell checking, orthography, syllabication, finding
possible roots and suffixes, and word generation are some of the applications performed in the
project. The first version of Zemberek was released in 2006. Most recently, on 29.10.2018,
version 0.16.0, “text normalization and gRPC (Google Remote Procedure Call) server”, was
released. The text normalization feature in this release attempts to correct errors in sentences
used in social media, forums and messaging software. For example, the phrase “tmm, yarin havuza
giricem ve aksama kadar yaticam :) [ok, I'll get in the pool tomorrow and lie around till
evening :)]” can be converted to “tamam, yarın havuza gireceğim ve akşama kadar yatacağım :)
[okay, I will get in the pool tomorrow and lie around until evening :)]”. This process is
important for the success of text analysis (Zemberek, 2019).
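The idea of text normalization can be illustrated with a deliberately small sketch. It is not
Zemberek's algorithm or API: the lookup table INFORMAL_TO_STANDARD and the function normalize
are hypothetical names introduced here, and the table contains only the words from the example
sentence above.

```python
import re

# Hypothetical lookup table: informal spelling -> standard spelling.
# The entries come from the example sentence; a real normalizer such as
# Zemberek relies on far richer language resources than a fixed table.
INFORMAL_TO_STANDARD = {
    "tmm": "tamam",
    "yarin": "yarın",
    "giricem": "gireceğim",
    "aksama": "akşama",
    "yaticam": "yatacağım",
}

def normalize(text):
    """Replace each known informal word; leave everything else untouched."""
    def fix(match):
        word = match.group(0)
        return INFORMAL_TO_STANDARD.get(word.lower(), word)
    # \w+ matches runs of letters/digits, so punctuation and ":)" are preserved.
    return re.sub(r"\w+", fix, text)

print(normalize("tmm, yarin havuza giricem ve aksama kadar yaticam :)"))
# -> tamam, yarın havuza gireceğim ve akşama kadar yatacağım :)
```

A fixed table only covers misspellings seen in advance, which is why a general-purpose
normalizer is preferable for real review data.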
Gülşen Eryiğit presented the “Turkish Natural Language Processing Software Chain” platform
of İstanbul Technical University at the 14th Conference of the European Chapter of the
Association for Computational Linguistics. The ITU Turkish Natural Language Processing Web
Service (http://tools.nlp.itu.edu.tr/) provides users with automatic update and patch
management, ease of communication, easier collaboration, and more. It also provides researchers
and students with natural language processing tools at many levels, such as preprocessing,
morphology, syntax and entity recognition. Users can communicate with the platform through
three channels (Eryiğit, 2014: 1-4).
Sentimental Analysis
Pragmatic analysis makes it possible to analyze what a given text basically means; the aim
is to draw inferences from the text. Sentimental analysis is one of the fields of study of
pragmatic analysis and aims to reveal the emotions in the given text. It analyzes users' views,
feelings, assessments and attitudes towards entities such as products, services, organizations,
individuals, topics and events, and towards their characteristics. The term was first used in
Nasukawa and Yi's 2003 study “Sentiment Analysis: Capturing Favorability Using Natural Language
Processing” (Nasukawa & Yi, 2003) (Liu, 2012: 7). The analysis can be performed at three levels:
document level, sentence level and aspect level. Each level has its own procedures and
functions. At the first level, the entire document is analyzed according to its general mood. At
the second level, the analysis is extended to the details of the comments. For example, the
comment “the screen of this mobile phone is great” indicates a positive feeling about the mobile
phone's “screen”. At the third level, the problems become harder, but they are revealed more
clearly. Upon completion of all levels, user feedback is displayed (Solangi, et al., 2018: 3).
Sentimental analysis can be done in three ways: machine learning, the dictionary-based
technique, and a mixed technique that combines the machine learning and dictionary-based
approaches (Rosa, Schwartz, Ruggiero, & Rodriguez, 2019: 2125). Although the dictionary-based
technique used to predominate in sentimental analysis studies, studies on all three approaches
are now increasingly common. Sentimental analysis is widely used in the “voice of the customer”
research of organizations and businesses, in measuring the effectiveness of marketing activities
and in online reputation management. However, it is also used for other purposes. For example,
the journalist Terena Bell posted on her Twitter account on 27.06.2017 that Periscopic, a data
visualization company, had developed the “Trump Emoti-Coaster” application to measure the moods
in Donald Trump's videos (Bell, 2017). Figure 2 illustrates this application, which is based on
visual recognition of facial expressions and a natural language processing infrastructure. All
emotions are derived from standard video broadcasts using the Microsoft Emotion API. Here, the
video is taken as input, and the output describes the emotions detected, as percentages, over
time. The detected emotions are those expressed by universal facial expressions: anger,
contempt, disgust, fear, happiness, neutrality, sadness and surprise.
Figure 2: Trump Emoti-Coaster Application
(Periscopic, 2019)
Another area of application of sentimental analysis is monitoring and recommendation
systems (RS). This is because social networks, which are widely used today, contain a great
deal of useful data that reveals users' moods on different themes. In their study, Rosa et al.
designed a Knowledge-Based Recommendation System (KBRS), which includes a health monitoring
system to identify users with potential psychological disorders, especially depression and
stress. According to the monitoring results, and based on ontologies and sentimental analysis,
the KBRS sends happy, calm, comforting or motivational messages to users with psychological
disorders. In addition, the system includes a mechanism that sends warning messages to
authorized persons if a depressive condition is detected by the monitoring system. The proposed
method achieved 89% and 90% success in detecting depressed and stressed users, respectively
(Rosa, Schwartz, Ruggiero, & Rodriguez, 2019: 2124).
Left on Read (https://leftonread.me/) is an application that presents users' moods in
iMessage. It was developed by two students, Teddy Ni and Alex Danilowicz, to monitor their own
messaging habits (seven more students later joined the team). With this application, users can
see when they write messages, how they write and to whom they write. From this information they
can track their own emotions over time; in other words, they can follow their own speaking
habits. The developers of the application state that users can see their relationship with their
phones and thus feel better (Culver, 2019).
In their study, Ren et al. started from the assumption that investor sentiment plays an
important role in the stock market, because user-generated text data on the Internet provides a
valuable resource that reflects investor psychology and helps predict stock prices as a
complement to stock market data. The study integrates sentimental analysis into a machine
learning method based on a support vector machine. In addition, the “day of the week” effect was
taken into account, thus creating more reliable and realistic emotion indices. The findings also
indicate that emotions probably contain valuable information about the fundamental values of
the asset and can be regarded as one of the leading indicators in the stock market. The model
helps investors make smarter decisions (Ren, Wu, & Liu, 2019: 760).
wefeelfine.org is a web-based emotion search engine. It was developed by Kamvar and Harris
for dictionary purposes. We Feel Fine is based on a data collection engine. This engine
automatically scans many blogs every 10 minutes to collect human emotions. Blog data comes
from many online sources. We Feel Fine scans blog posts for occurrences of the phrases “I'm
feeling” or “I feel”. This approach was inspired by the techniques used in “Listening Post”, a
project developed by Mark Hansen and Ben Rubin. Most blogs are hosted by one of several large
blogging companies. The URL format of many blog posts can be used to extract the author's name
from the entry. Given the author's name, the blogging site can be traversed to reach that
author's profile page. From the profile page, information such as the blog author's age, gender,
country, state and city can be extracted. Given the country, state and city, the weather
conditions at that location at the time of writing can be retrieved. As much of this and similar
information as possible is extracted. This process is automatically repeated every ten minutes.
In general, around 15,000-20,000 emotions are identified and recorded per day. A panel in the
application lets the viewer control the sample population shown on the screen at any time and
can be used to split it into arbitrary sub-populations. The criteria, which can be used in any
combination, are: mood (happy, sad, depressed and so on), age, gender, weather, location and
date (We Feel Fine, 2019).
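The phrase-scanning step described above can be sketched in a few lines. This is only an
illustration under stated assumptions, not the We Feel Fine implementation: the function name
feeling_sentences, the naive sentence splitter and the sample post are all hypothetical.

```python
import re

# The two trigger phrases named in the text; matching is case-insensitive.
FEELING_PATTERN = re.compile(r"\b(?:I feel|I'm feeling)\b", re.IGNORECASE)

def feeling_sentences(post):
    """Return the sentences of a blog post that contain a feeling phrase."""
    # Assumption: a sentence ends with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    return [s for s in sentences if FEELING_PATTERN.search(s)]

post = "Long day at work. I feel tired but happy! Tomorrow I'm feeling hopeful."
print(feeling_sentences(post))
# -> ['I feel tired but happy!', "Tomorrow I'm feeling hopeful."]
```

Each extracted sentence could then be stored together with whatever author metadata (age,
gender, location, weather) is available for the post.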
Kamvar and Harris introduced the emotion search engine in their work titled “We Feel Fine
and Searching the Emotional Web”. The purpose of the search engine is to gather and reveal
emotions on a global scale to help people understand themselves and others better. The
traditional motivating applications for sentimental analysis are in consumer research and
decision support tools. However, data on moods can also support other research in the social
sciences, for example through the creation of scalable computing tools. Such tools have the
potential to make a significant impact by allowing social science researchers to carry out
cheap, large-scale studies for generating data-based hypotheses (Kamvar & Harris, 2011: 126).
Figure 3 shows the sentimental diagram of the emotion search engine.
Figure 3: Sentimental Diagram (Kamvar & Harris, 2011: 127)
In the literature, there are many studies, built on different languages and methods, on the
sentimental analysis of movie reviews, which is the subject of this study. One recent study is
the Indonesian-language study conducted by Permatasari et al. Here, both the dictionary-based
technique and machine learning methods are used for sentimental analysis. These techniques have
F-measure values of 88% and 94%, respectively (Permatasari, Fauzi, Adikara, & Lukmana Sari,
2018: 92). A machine learning study in Hindi (Nanda, Dua, & Nanda, 2018) and a study of a
lexicon update algorithm in Chinese (Song, Gu, Li, & Sun, 2017) are other recent language-based
studies.
In Turkish, in the study titled “Performance Comparison of Text Representation Methods in
the Classification of Turkish Texts”, movie reviews were used as one of the data sets, along
with “news” and “mood” (Amasyalı, Balcı, Varlı, & Mete, 2012). Vural et al. conducted a
sentimental analysis of movie reviews in Turkish with unsupervised learning (Vural, Cambazoğlu,
Şenkul, & Tokgöz, 2013). Kaynar et al. carried out a sentimental analysis of movie reviews in
Turkish using four different algorithms and observed that artificial neural networks and
support vector machines gave better results than the other methods (Kaynar, Yıldız, Görmez, &
Albayrak, 2016).
A Research on Sentimental Analysis of Movie Reviews
a) Purpose and Importance of Research
This research focuses on the sentimental analysis of movie reviews as an example of
pragmatic analysis in natural language processing. To this end, a sentimental analysis was
conducted on a sample of the user reviews shared about movies on beyazperde.com. Sentimental
analysis automatically infers emotions from text data. Thus, many reviews can be assigned to
either the positive or the negative mood by artificial intelligence methods without having to
read them one by one. The success of the assignment was determined by comparing the
automatically obtained results with the manually labeled results of the sample set and was
evaluated with the sensitivity measure.
The automatic structuring of feedback on movies through artificial intelligence methods is
invaluable, especially for those who intend to watch the movie, as well as for the movie's
actors, producers and advertisers. Such an application provides new and improved opportunities
for media businesses to develop effective strategies. In this respect, audience expectations can
be evaluated, and this information can be used as input for subsequent program projects. The
importance of communication and information technologies is indisputable today. This study is
expected to contribute to information processing in the Turkish language.
b) Sample of the Research
The sample of the research is a subset of the Turkish movie reviews collected randomly
from www.beyazperde.com. The data were obtained from
http://sentilab.sabanciuniv.edu/wp-content/uploads/2015/03/TurkishMovieReviews.txt, an address
that belongs to Sabancı University. The Sentiment Analysis Research Group carries out research
in the fields of text mining, information acquisition and sentimental analysis under the title
of the “Sentilab Project” at Sabancı University. The group draws synergies from expertise in
different fields such as machine learning, data mining and natural language processing
(Sentiment Analysis Research Group, 2019).
The data set contains user reviews of movies, of which 183 are negative, 822 are positive
and 145 are objective. Within the scope of the research, only reviews that indicate positive or
negative emotions were included. In this way, a database of 1005 reviews was created. This
database will be referred to as the “Movie Reviews Database (MRD)” in the following sections.
Below are two examples of reviews in the MRD that indicate a positive mood:
“I think it definitely deserved its score. I laughed so much, it was beautiful, Adam Sandler was
great, very funny again. I recommend you watch it :)) [bence aldığı puanı kesinlikle hak etmiş o
kadar güldüm ki çok güzeldi adam sandler harikaydı yine çok komik izlemenizi tavsiye ederim :))]”
“A quality spectacle that never bores, one of the best of its kind. [hiç sıkmayan kaliteli bir
seyirlik türünün en iyilerinden.]”
The following are two examples of reviews in the MRD that indicate a negative mood:
“I couldn't understand how J. Aniston starred in a movie with such ridiculous scenes. I gave it
five points, and that only for the sake of Aniston's beautiful face. [janiston nasıl böyle saçma
bir sahneli film de rol aldı anlayamadım. 5 puan verdim o da anistonun güzel yüzünün hatırına.]”
“I can say that it is an extremely boring movie... how they made a movie out of such an absurd
topic, my goodness. [son derece sıkıcı bir filim olduğunu söyleyebilirim... saçma bir konuyu
nasılda filim yapmışlar maşallah.]”
c) Research Methodology
An application developed by Aytekin was used to analyze the reviews in the MRD
automatically with artificial intelligence methods (Aytekin, 2013). This application was
developed with unsupervised learning techniques and classifies text data according to the rules
of the Naive Bayes Bit Weighting Algorithm, which provides effective results in most cases. The
assignment to emotion states was based on the probability values of 4744 adjective/adverb-based
words in the dictionary. Adjective/adverb-based words were chosen because they best convey the
desired mood.
However, since this application was developed on the basis of blog reviews, it was
inadequate for evaluating movie reviews. For this reason, the 183 negative emotion state reviews
and the 822 positive emotion state reviews were transferred to a separate database, and
word-level frequency analyses were performed. The assumption is that if, for example, a word in
the database of 183 negative emotion reviews ranks high in frequency and is
adjective/adverb-based, then this word should be included in the dictionary with the
corresponding probability value. The same applies to words in positive emotion reviews. In this
way, new words were added to the dictionary and a new database was obtained. This database will
be referred to as the “Turkish Movie Reviews Emotion Dictionary (TMRED)”. In other words, it was
envisaged that the words in this dictionary could now represent movie reviews.
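The word-level frequency analysis can be sketched as follows. This is a minimal illustration,
not the tool used in the study: the function candidate_words, the three sample reviews and the
starting dictionary are hypothetical, and the check that a candidate is adjective/adverb-based
is assumed to remain a manual step.

```python
from collections import Counter

def candidate_words(reviews, dictionary, top_n=5):
    """Return the most frequent words that are not yet in the dictionary."""
    counts = Counter()
    for review in reviews:
        # Assumption: reviews are already lowercased and free of punctuation.
        counts.update(w for w in review.split() if w not in dictionary)
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical negative reviews and a hypothetical starting dictionary.
negative_reviews = [
    "tam bir rezalet",
    "rezalet abartı",
    "abartı rezalet film",
]
dictionary = {"sıkıcı"}

print(candidate_words(negative_reviews, dictionary, top_n=2))
# -> ['rezalet', 'abartı']
```

Frequent function words would also surface in such a ranking, which is why the part-of-speech
check before adding a word to the dictionary matters.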
What is important here is the probability value of the newly added words, since a
probability can take many values between 1 and 100. For example, should the newly added words
“even [hatta]” and “times [kez]” receive the same positive probability values? And is it
necessary to change the probability values of the words already in the old dictionary so that
they represent movie reviews? In answer to the second question, let us say straight away that
the probability values of 5 words had to be changed. The answers to these questions are also
discussed under the heading “examples of erroneous results” of the research. Table 1 shows
examples of the words added to the TMRED.
Table 1: Examples of Words Added to the Turkish Movie Reviews Emotion Dictionary (TMRED)
Positive Emotional Words     Negative Emotional Words
super [süper]                disgrace [rezalet]
funny [eğlenceli]            vote [oy]
exactly [kesinlikle]         unconscious [bayık]
times [kez]                  affect [etkileme]
times [kere]                 exaggeration [abartı]
times [defa]
quality [kaliteli]
even [hatta]
even [hele]
advice [tavsiye]
legend [efsane]
d) Research Findings and Evaluation
First of all, it is important to note that users write their movie reviews in spoken
language, because these environments are what we call “informal” in communication. Therefore,
norms such as spelling rules and grammar are often ignored. However, this causes some problems
in the operation of the algorithm. For example, Table 2 shows the different spellings of the
word “beautiful [güzel]” in social media (Sütcü & Aytekin, 2018). If the user chooses an
incorrect spelling such as “quzel” instead of “güzel”, the algorithm will not find the word in
the Turkish Movie Reviews Emotion Dictionary and cannot include its probability value in the
calculation. This may cause the review to be assigned to the wrong mood. Different solutions to
this problem are presented in the literature. One of them is the use of wildcards; in this
method, frequently misspelled words are also included in the dictionary. The second and more
important one is spell checking. The Zemberek project, which has been developed for this
purpose, guides researchers working in the field of Turkish natural language processing. In this
research, the spell checking of reviews was done manually, because reviews presented to the
algorithm without spelling correction may be assigned to the wrong emotions. This approach was
taken in order not to compromise the sensitivity measure.
Table 2: Different Spellings of the word Güzel [beautiful] in Social Media
güzel güzelll guzel güsel güzeel guselll
gzl gsl gsel gzel qüzel qusel
gzel quzel güssel quselll qussel qüsel
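The first solution mentioned above, including frequent misspellings in the dictionary, can be
sketched as an alias table. This is an illustrative assumption about how such a table might
look, not the study's implementation; the names GUZEL_VARIANTS, ALIASES and canonical are
hypothetical, and the variants are the spellings listed in Table 2.

```python
# Spellings from Table 2; each one maps to the canonical dictionary entry.
GUZEL_VARIANTS = {
    "güzel", "güzelll", "guzel", "güsel", "güzeel", "guselll",
    "gzl", "gsl", "gsel", "gzel", "qüzel", "qusel",
    "quzel", "güssel", "quselll", "qussel", "qüsel",
}
ALIASES = {variant: "güzel" for variant in GUZEL_VARIANTS}

def canonical(word):
    """Map a known misspelling to its dictionary form; otherwise return it as is."""
    return ALIASES.get(word, word)

print(canonical("quzel"))  # -> güzel
print(canonical("film"))   # -> film
```

With such a mapping applied before the dictionary lookup, a review containing “quzel” would
contribute the probability value stored for “güzel”.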
The Naive Bayes Bit Weighting Algorithm can be summarized as follows:
1. As a first step, punctuation and numbers are removed from the reviews (because only text is
required), and all text is converted to lower case.
2. Each word in the review to be analyzed is compared with the words recorded in the Turkish
Movie Reviews Emotion Dictionary (TMRED), and bit weighting is performed: 1 for words that are
found and 0 for those that are not. The number of repetitions in the review is ignored
(algorithms that use repetition counts work on a different principle).
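The two steps above can be sketched as follows. This is a minimal illustration under stated
assumptions, not the application used in the study: the function names preprocess and
bit_weights and the three-word dictionary are hypothetical. The explicit mapping of "I" and "İ"
is added because Python's default lowercasing does not follow Turkish casing rules.

```python
import re

# Turkish-aware lowercasing: map "I" -> "ı" and "İ" -> "i" before str.lower().
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})

def preprocess(review):
    """Step 1: lowercase, then drop punctuation and digits, keeping words only."""
    text = review.translate(_TR_LOWER).lower()
    # [^\W\d_]+ matches runs of letters (word characters minus digits and "_").
    return re.findall(r"[^\W\d_]+", text)

def bit_weights(review, dictionary):
    """Step 2: 1 if a dictionary word occurs in the review, 0 otherwise."""
    present = set(preprocess(review))  # a set ignores repetitions
    return {word: int(word in present) for word in dictionary}

dictionary = ["süper", "kaliteli", "rezalet"]  # hypothetical mini dictionary
print(bit_weights("SÜPER, süper!!! 10 numara KALİTELİ film.", dictionary))
# -> {'süper': 1, 'kaliteli': 1, 'rezalet': 0}
```

Note that “süper” occurs twice in the sample review but still receives the weight 1, which is
the difference between bit weighting and count-based weighting.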
At this point, it is appropriate to mention the task of root finding. Because of suffixes
and the like, some words in a review may not match the entries in the TMRED. This disadvantage
could be eliminated with an application that finds the roots of words. However, the words in the
dictionary are adjective- and adverb-based. Adjectives behave as nouns only when used alone;
when they are used as adjectives, they cannot take suffixes (noun suffix, possessive suffix,
plural suffix). Adverbs are likewise words that take no suffixes, although they may take them
if used as nouns. Therefore, no application was developed for root finding.
The flow diagram of the algorithm is shown in Figure 4. All stages are shown in order, with a focus on how the solution is reached.
Figure 4: Algorithm Flow Diagram
The algorithm works on the following basis (the larger of the two calculated results gives the emotional state of the review):
Probability of the review being in positive emotion = ½ × (the product of the probabilities that each word/phrase of the review found in the TMRED dictionary is in positive emotion).
Probability of the review being in negative emotion = ½ × (the product of the probabilities that each word/phrase of the review found in the TMRED dictionary is in negative emotion).
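The decision rule above can be sketched in a few lines. Two points here are assumptions rather than statements from the paper: that the per-word values listed later for the misclassified reviews are negative-emotion probabilities, and that a word's positive-emotion probability is the complement of its negative one. The names `PRIOR` and `classify` are hypothetical.

```python
from math import prod

PRIOR = 0.5  # equal prior for the two emotional states (the ½ in the rule above)


def classify(negative_probs):
    """Classify a review from the negative-emotion probabilities of its matched words."""
    if not negative_probs:
        return "unassigned"  # no dictionary word was found in the review
    negative_score = PRIOR * prod(negative_probs)
    # Assumption: P(positive | word) = 1 - P(negative | word).
    positive_score = PRIOR * prod(1.0 - p for p in negative_probs)
    return "positive" if positive_score > negative_score else "negative"


# Values listed in the paper for the first misclassified review
# (çok, sempatik, gereken, sevgili, sık sık).
review_1 = [0.111991088, 0.462799907, 0.606238745, 0.718546845, 0.735649124]
print(classify(review_1))
```

Under these assumptions the sketch assigns this review to the negative mood, which agrees with the outcome the paper reports for it.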
In the research, the application's success in assigning the manually labeled reviews to the correct mood was evaluated with the sensitivity measure, one of the most commonly used measures of text classification effectiveness. Table 3 and Table 4 show the results of assigning the reviews to the relevant emotional states, together with the sensitivity measures.
Number of Reviews Assigned Correctly    Number of Reviews Incorrectly Assigned    Number of Unassigned Reviews
658                                     142                                       22
Sensitivity Measure = 80.04%
Table 3: Assignment Results of the 822 Positive Emotional State Reviews
Number of Reviews Assigned Correctly    Number of Reviews Incorrectly Assigned    Number of Unassigned Reviews
119                                     58                                        6
Sensitivity Measure = 65.02%
Table 4: Assignment Results of the 183 Negative Emotional State Reviews
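The sensitivity figures in the two tables can be reproduced from the counts with a short calculation. The sketch below assumes, as the paper states for these tables, that unassigned reviews count as incorrect; the function name `sensitivity` is hypothetical.

```python
def sensitivity(correct, incorrect, unassigned):
    """Share of reviews assigned to their manually labeled emotional state."""
    total = correct + incorrect + unassigned  # unassigned reviews count as errors
    return correct / total


positive = sensitivity(658, 142, 22)  # counts from Table 3
negative = sensitivity(119, 58, 6)    # counts from Table 4
print(f"{positive:.4%} {negative:.4%}")
```

The computed ratios are 658/822 and 119/183, which match the reported 80.04% and 65.02% when the percentages are truncated to two decimals rather than rounded.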
Table 3 and Table 4 also list unassigned reviews. For example, the negative emotional state review “I said no more, what a trick [yok artık dedim, ne dümenler dönüyor ya]” could not be assigned, because none of the words in this review are included in the Turkish Movie Reviews Emotion Dictionary, which consists of adjective/adverb-based words. Accordingly, the dictionary needs to be expanded/improved. Reviews that cannot be assigned in the above tables are also counted as incorrectly assigned.
[Figure 4 flow diagram, box text: Start; Read the words from the Turkish Movie Reviews Emotion Database Dictionary; Read the likelihood of words being in a negative state; Count how many words are in the comment; Delete punctuation and digits; Search the first word in the dictionary; Make a bit operation calculation if the word is found; If not, was the last word sought?; Skip to next word if not the last word; Print results and finish]
When Table 3 and Table 4 are compared, it is seen that assignment success is higher for positive mood reviews. One reason for this is that the number of positive review samples is much higher than the number of negative ones. In other words, words capable of representing positive reviews could be identified at a higher rate.
Another assessment can be based on the number of words in the reviews. For example, correctly assigned negative mood reviews average 41 words, while incorrectly assigned negative mood reviews average 36 words. Thus, short negative emotion state reviews may cause false assignments, and longer reviews can be said to be assigned to the negative emotional state more accurately.
b. Discussions on Examples of Incorrect Results
The following are examples of reviews that were assigned to the wrong mood with the TMRED, for different reasons:
1. “sometimes it was shown on TV often and I saw it as a very sympathetic movie. A film that
needs to be watched or not. Meanwhile, Vanessa Paradis is the wife of the beloved pirate Johnny
Depp and has two children. Wish them happiness [bi ara tvde sık sık verilen we çok sempatik bi film
olarak gördüm izlenmesi gereken ya da gerkmeyen film bu arada u an vanessa paradis çok sevgili
korsan johnny deppin eşi rolündedir we 2 adet çocukları vardır mutluluklar dileriz]”.
The above review is marked manually as a positive emotion. However, the application
assigned this review to a negative mood. In this review, the words in the dictionary and the probability
values are as follows:
very [çok] 0,111991088
sympathetic [sempatik] 0,462799907
need [gereken] 0,606238745
beloved [sevgili] 0,718546845
often [sık sık] 0,735649124
In this analysis, the words represent the review, but the probability values need to be readjusted. This is a very challenging task because of the huge number of combinations. In addition, the new values should not change or decrease the calculated sensitivity measure for negative emotions.
2. “As if Billy Bob Thornton, Nick Nolte, and Sean Penn names as the cast are not enough in a
film, they do a swell job. The movie squeezed me yes but only psychologically. In short, a successful
film noir, good work of Oliver Stone. [billy bob thornton nick nolte ve sean penn gibi isimlerin kadroda
bulunması yetmiyormuş gibi bide döktürdükleri film film beni sıktı evet ama sadece psikolojik
olarak kısacası başarılı bir kara film oliver stone un iyi işlerinden]”.
The above review is marked manually as a positive emotion. However, the application assigned this review to a negative mood. In this review, the words in the dictionary and their probability values are as follows:
good [iyi] 0,079452529
successful [başarılı] 0,220766411
psychological [psikolojik] 0,70862983
noir [kara] 0,798007993
but [ama] 0,839020825
only [sadece] 0,933505058
In this analysis, the words represent the review, but in part of the review the user says “the movie squeezed me yes but only psychologically...”, which indicates a negative mood. In other words, although the user expresses a positive emotion in the end, he also makes a negative statement. Therefore, the algorithm finds the words “but”, “only” and “psychological”, and their negative probability values may cause the result to be calculated as negative. To prevent this situation, transition-based sentence analysis is recommended.
3. “every movie is beautiful [her film güzeldir]”
The above review is marked manually as a positive emotion. However, the application could not assign this review to any state of emotion; in other words, it could not classify it, although it should have been assigned because of the word “beautiful”. At this point, it should be noted that the TMRED dictionary developed within the scope of the research consists of adjective/adverb-based words. Although the word “güzel” is included in this review, it carries the suffix “-dir”, which makes it a verb, or even a verb-noun. The word “güzel” [beautiful] is an adjective-based word only when it is used without a suffix. For these reasons, the review could not be assigned. The situation needs to be examined within the framework of root-finding studies in sentimental analysis.
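A very naive form of the root finding mentioned above can be sketched as a fallback lookup that strips a copula suffix before giving up. This is only an illustration of the idea, not a morphological analyzer and not part of the published application: the suffix list `COPULA_SUFFIXES` and the function `lookup_with_suffix_stripping` are hypothetical, and a real system would rely on a dedicated Turkish NLP tool.

```python
# Hypothetical, deliberately small list: "-dir" and its vowel/consonant harmony variants.
COPULA_SUFFIXES = ("dir", "dır", "dur", "dür", "tir", "tır", "tur", "tür")


def lookup_with_suffix_stripping(word, dictionary):
    """Return the matching dictionary entry, trying a copula-stripped form as fallback."""
    if word in dictionary:
        return word
    for suffix in COPULA_SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in dictionary:
            return stem
    return None  # the word stays unmatched


print(lookup_with_suffix_stripping("güzeldir", {"güzel"}))
```

With such a fallback, “güzeldir” would be matched to the entry “güzel”, so the review in this example would no longer remain unassigned.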
4. “Well, some movies start with such high score when newly added then fall to real score [ya
bazı filmler yeni eklendiğinde böyle yüksek puanla başlıyor sonradan gerçek puanına düşer]”
The above review is marked manually as a negative emotion. However, the application assigned this review to positive emotion. The words “high”, “new” and “real” in the review are in the positive group in the TMRED dictionary. In this review, only the word “düşer [falls]” indicates negativity, but it could not be evaluated by the algorithm because it is a verb. In this case, the dictionary structure needs to be rearranged to include verbs and stem analysis.
5. “It's a terrible remake movie. There are excellent players, but they were wasted, never showed
themselves because of the script. It certainly doesn't deserve 7 [berbat bir yeniden çevrim mükemmel
oyuncular var ama harcanmışlar hiç kendilerini senaryo yüzünden gösterememişler 7 yi kesinlikle hak
etmiyor]”.
The above review is marked manually as a negative emotion. However, the application
assigned this review to positive emotion. In this review, the words in the dictionary and the probability
values are as follows:
certainly [kesinlikle] 0,0001
excellent [mükemmel] 0,111991088
there are [var] 0,172251443
terrible [berbat] 0,989842728
but [ama] 0,839020825
never [hiç] 0,965074367
The first 3 of the above words are in the positive group and the last 3 are in the negative group of the TMRED dictionary. Nevertheless, the review was incorrectly assigned by a small margin. This can be remedied by readjusting the probability values. More importantly, the algorithm ignores numerical data (here, 7) in accordance with the rules of text mining, although such data can significantly influence whether a review is assigned to the positive or negative mood. This can be addressed with formulas that convert numbers into text, and the rules can be expanded in this direction.
Conclusion
This study is based on pragmatic analysis, which is one of the five dimensions of natural language processing. The pragmatic analysis approach is used here to reveal users' perspectives, assessments, attitudes, needs, and the message they want to give, and written language was chosen as the material. Because of the interactive nature of today's internet environment, users express their opinions by writing reviews in the media they are interested in. These written reviews provide significant data for the natural language processing field and have triggered analyses that extract information from these data. Sentimental analysis can be said to be the result of such a trend. Sentimental analysis is now expanding to all media formats, such as text, sound, and image. What matters is that the data contain a mood and that the results automatically detected from them benefit some field.
In this study, a sentimental analysis of user reviews about films was carried out. The sensitivity measure obtained was 80.04% for positive emotion reviews and 65.02% for negative emotion reviews. Assignment success was higher for positive emotion reviews, and the number of negative samples was much smaller than the number of positive ones. From this point of view, it can be said that better results can be obtained by expanding the dictionary in studies based on the mixed approach of dictionary-based technique and machine learning.
Information about positive or negative moods obtained from user reviews of a film (for example, that about 60% of users have negative emotions about a movie) can be used for different purposes. Although this information does not represent all audience views, it still gives an idea to individual users who may watch the film. In addition, it is a very important source of feedback for film actors, producers, advertisers, critics and so on. It provides them with new and expandable opportunities to develop more effective strategies and helps compensate for the limited feedback available in cinema, a traditional mass medium.
Finally, the incorrect results of the developed application have been discussed together with the different reasons causing the errors, and suggestions have been made. Although the initial objective has largely been achieved, it is useful to consider these recommendations in future studies. This application, which was developed based on unsupervised learning, is expected to contribute to the natural language processing literature in the Turkish language.
References
[1] Adalı, E. (2013). Doğal Dil İşleme. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği
Dergisi(7).
[2] Amasyalı, M., Balcı, S., Varlı, E. N., & Mete, E. (2012, 12). Türkçe Metinlerin
Sınıflandırılmasında Metin Temsil Yöntemlerinin Performans Karşılaştırılması. EMO BİLİMSEL
DERGİ, 2(4), p. 1-10.
[3] Aytekin, Ç. (2013, March). An Opinion Mining Task in Turkish Language: A Model for
Assigning Opinions in Turkish Blogs to the Polarities. Journalism and Mass Communication,
3(3), p. 179-198.
[4] Bell, T. (2017, 06 27). Terena Bell Twitter Account. Address:
https://twitter.com/terenabell/status/879717843834724353?lang=ca
[5] Culver, A. (2019, 04 09). The Dartmouth. Students develop “Left on Read” app to track texting
habits. Address: https://www.thedartmouth.com/article/2019/04/students-develop-left-on-read-
app-to-track-texting-habits
[6] Eryiğit, G. (2014). ITU Turkish NLP Web Service. 14th Conference of the European Chapter of
the Association for Computational Linguistics EACL 2014. Gothenburg.
[7] Gleason, J. B. (2005). The Development of Language. Boston: Pearson Education.
[8] Kamvar, S., & Harris, J. (2011). We Feel Fine and Searching The Emotional Web.
Proceedings of The Fourth ACM International Conference On Web Search and Data Mining,
(p. 117-126). Hong Kong.
[9] Kaynar, O., Yıldız, M., Görmez, Y., & Albayrak, A. (2016). Makine Öğrenmesi Yöntemleri ile
Duygu Analizi. International Artificial Intelligence and Data Processing Symposium (IDAP'16),
(p. 234-241). Malatya.
[10] Kim, D., & Kim, S. (2018). Newspaper journalists’ attitudes towards robot journalism.
Telematics and Informatics, 35, p. 340-357. doi:10.1016/j.tele.2017.12.009
[11] Li, W., Xu, K., Qi, J., & Ding, X. (2018). A Natural Language Processing Method of Chinese
Instruction for Multi-legged Manipulating Robot. IEEE International Conference on Robotics
and Biomimetics, (p. 2171-2176). Kuala Lumpur.
[12] Liu, B. (2012). Sentiment Analysis and Opinion Mining. San Rafael: Morgan & Claypool Publishers.
[13] Naeun, L., Kirak, K., & Yoon, T. (2017). Implementation of robot journalism by programming
custombot using tokenization and custom tagging. 19th International Conference on Advanced
Communication Technology (ICACT), (p. 566-570). Kwangwoon Do.
doi:10.23919/ICACT.2017.7890154
[14] Nanda, C., Dua, M., & Nanda, G. (2018). Sentiment Analysis of Movie Reviews in Hindi
Language Using Machine Learning. International Conference on Communication and Signal
Processing (ICCSP) , (p. 1069-1072). Chennai.
[15] Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural
language processing. Proceedings of the K-CAP-03, 2nd Int Conference on Knowledge
Capture.
[16] Oflazer, K. (2016). Türkçe ve Doğal Dil İşleme. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve
Mühendisliği Dergisi, 5(2), p. 1-12. Address: https://dergipark.org.tr/download/article-
file/207207
[17] Periscopic. (2019, 5 4). On the Trump Emoto-Coaster. Address:
https://emotions.periscopic.com/
[18] Permatasari, R. I., Fauzi, M. A., Adikara, P. P., & Lukmana Sari, E. D. (2018). Twitter
Sentiment Analysis of Movie Reviews using Ensemble Features Based Naïve Bayes.
International Conference on Sustainable Information Engineering and Technology (SIET),
(p. 92-95). Malang. doi:10.1109/SIET.2018.8693195
[19] Ren, R., Wu, D. D., & Liu, T. (2019, 03). Forecasting Stock Market Movement Direction Using
Sentiment Analysis and Support Vector Machine. IEEE Systems Journal, 13(1), p. 760-770.
[20] Rosa, R. L., Schwartz, G. M., Ruggiero, W. V., & Rodríguez, D. Z. (2019, 4).
A Knowledge-Based Recommendation System That Includes Sentiment Analysis and Deep Learning.
IEEE Transactions on Industrial Informatics, 15(4), p. 2124-2135.
[21] Sentiment Analysis Research Group. (2019). SentiLab. Address:
http://sentilab.sabanciuniv.edu/
[22] Solangi, Y. A., Solangi, Z. A., Aarain, S., Abro, A., Mallah, G. A., & Shah, A. (2018). Review on
Natural Language Processing (NLP) and Its Toolkits for Opinion Mining and Sentiment
Analysis. IEEE 5th International Conference on Engineering Technologies & Applied
Sciences. Bangkok.
[23] Song, Y., Gu, K., Li, H., & Sun, G. (2017). A Lexical Updating Algorithm for Sentiment
Analysis on Chinese Movie Reviews. Fifth International Conference on Advanced Cloud and
Big Data, (p. 188-193). Shanghai. doi:10.1109/CBD.2017.40
[24] Sütçü, C., & Aytekin, Ç. (2018). Veri Bilimi. İstanbul: Paloma.
[25] Traylor, T., & Straub, J. (2019). Classifying Fake News Articles Using Natural Language
Processing to Identify In-Article Attribution as a Supervised Learning Estimator. IEEE 13th
International Conference on Semantic Computing (ICSC) (p. 445-449). California: IEEE
Computer Society. doi:10.1109/ICSC.2019.00086
[26] Vural, A., Cambazoğlu, B., Şenkul, P., & Tokgöz, Z. (2013). A Framework for Sentiment
Analysis in Turkish: Application to Polarity Detection of Movie Reviews in Turkish. In E. Gelenbe
& R. Lent (Eds.), Computer and Information Sciences III (p. 437-445). London: Springer.
[27] We Feel Fine. (2019, 05 06). We Feel Fine. Methodology. Address:
http://wefeelfine.org/methodology.html
[28] Withanage, P., Liyanage, T., Deeyakaduwe, N., Dias, E., & Thelijjagoda, S. (2018). Road
Navigation System Using Automatic Speech Recognition (ASR) And Natural Language
Processing (NLP). In IEEE Region 10 Humanitarian Technology Conference (R10-HTC).
Malambe: IEEE. doi:10.1109/R10-HTC.2018.8629859
[29] Zemberek. (2019, 5 3). Zemberek NLP. Address: http://zembereknlp.blogspot.com/