This is the author’s version of a work that was published in the following source
Feine, J., Morana, S., and Gnewuch, U. 2019. “Measuring Service Encounter Satisfaction with Customer Service Chatbots using Sentiment Analysis,” in Proceedings of the 14th International Conference on Wirtschaftsinformatik (WI2019), Siegen, Germany, February 24-27.
Please note: Copyright is owned by the author and / or the publisher.
Commercial use is not allowed.
Institute of Information Systems and Marketing (IISM)
Fritz-Erler-Strasse 23
76133 Karlsruhe - Germany
http://iism.kit.edu
Karlsruhe Service Research Institute (KSRI)
Kaiserstraße 89
76133 Karlsruhe - Germany
http://ksri.kit.edu
© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/
14th International Conference on Wirtschaftsinformatik,
February 24-27, 2019, Siegen, Germany
Measuring Service Encounter Satisfaction with
Customer Service Chatbots using Sentiment Analysis
Jasper Feine1, Stefan Morana1, and Ulrich Gnewuch1
1 Karlsruhe Institute of Technology, Institute of Information Systems and Marketing (IISM),
Karlsruhe, Germany
{jasper.feine,stefan.morana,ulrich.gnewuch}@kit.edu
Abstract. Chatbots are software-based systems designed to interact with humans
using text-based natural language and have attracted considerable interest in
online service encounters. In this context, service providers face the challenge of
measuring chatbot service encounter satisfaction (CSES), as most approaches are
limited to post-interaction surveys that are rarely answered and often biased. As
a result, service providers cannot react quickly to service failures and dissatisfied
customers. To address this challenge, we investigate the application of automated
sentiment analysis methods as a proxy to measure CSES. Therefore, we first
compare different sentiment analysis methods. Second, we investigate the
relationship between objectively computed sentiment scores of dialogs and
subjectively measured CSES values. Third, we evaluate whether this relationship
also exists for utterance sequences throughout the dialog. The paper contributes
by proposing and applying an automatic and objective approach to use sentiment
scores as a proxy to measure CSES.
Keywords: online customer service, chatbot, sentiment analysis, service
encounter satisfaction, correlation analysis
1 Introduction
Digital communication technologies have become an integral part of how organizations interact with their customers [1]. Many companies offer online services via live chat interfaces, which enable customers to interact directly with customer service employees [2]. This type of text-based service encounter is a cost-effective service solution and often the preferred way of communication for young people [3]. One technology that is often deployed to assist service employees in online service encounters is the chatbot [1]. Chatbots are software-based systems designed to interact with humans via text-based natural language [4, 5] and can be found across industries (e.g., airlines, energy providers). Gartner predicts that by 2020, 25% of all customer service organizations will integrate this technology [6].
Despite their great potential, many customer service chatbots did not meet customer
expectations and led to service failures [7]. As a result, many service providers retired
their chatbots, as unsatisfactory online service encounters have negative effects on
word-of-mouth, loyalty, and intention to repurchase a product [8]. Ignoring customer frustration can strongly impede the performance of customer service encounters and carries the risk that the service chatbot is perceived as cold, socially inept, untrustworthy, and incompetent [9]. Therefore, service providers should identify service encounters that fell below customers’ expectations [10] and trigger service recovery procedures (e.g., offering compensation). Such procedures can help to recover from almost any service failure and increase trust, perception of fairness, and service
experience [10]. However, most approaches to identify dissatisfied customers in a text-
based online environment (e.g., chat, social media) are limited to post-interaction
surveys [11]. This is problematic as self-reported data can hardly be retrieved during
an interaction, is influenced by various biases [12], and only a few users are willing to
provide this kind of information [8, 11]. Therefore, we propose that an automated
method to measure chatbot service encounter satisfaction (CSES) during a customer-
chatbot interaction could help service providers to deal with these issues.
To develop such a method, we want to take advantage of the fact that written text is associated with a person’s thoughts, emotions, and motivations [13–15]. Humans write differently when they are happy or frustrated, and thus written text by itself conveys much information about a human [14]. Users who are less happy with a chatbot use fewer assent and positive words and more anger-related words, and thus express more negative sentiments [16]. The analysis of such opinionated text can provide valuable information about the user, as opinions are “key influencers of our behaviors” [17, p. 2] and “sentiment and tonal polarity are inherent properties of human-human communication and interaction” [18, p. 1367].
As a manual analysis of expressed polarity in written text does not scale well to
larger datasets [19], automated sentiment analysis methods have been developed. These
methods are capable of automatically extracting positive or negative polarity expressed
in written text [20]. Moreover, current sentiment analysis methods have been found to
be very accurate and thus, seem to be a valid approach [21, 22]. However, research has
rarely applied sentiment analysis in human-computer interaction (HCI) so far [20, 23].
Most HCI studies focus on auditory and visual signals of humans as these transmit the
majority of communication-related information [20]. Moreover, most sentiment
analysis studies focus on the method itself [23]. As a result, there is a lack of understanding of how to apply sentiment analysis in online chatbot service encounters to obtain valuable information about the user and her/his CSES. Therefore, we
investigate the application of sentiment analysis methods for chatbots in online service
encounters by drawing on research that text-based communication by itself is rich in
informative signals [24] and that written language is influenced by emotions, intentions,
and thoughts [13–15]. More specifically, we argue that sentiment analysis of dialog
data can be used as an easy-to-use and objective proxy to measure CSES. Therefore,
our research project addresses the following research question:
How can service encounter satisfaction with a chatbot be measured using sentiment analysis methods?
To address this research question, we first compare different sentiment analysis
methods on an empirical level by analyzing the calculated sentiment scores for two
datasets. Next, we test for a potential correlation between sentiment scores and CSES
values that were measured using a survey-based approach in an online experiment. In
doing so, we first investigate this potential relationship on a dialog level and second on
an utterance level (i.e., single messages). This paper contributes by proposing and
applying an automatic and objective approach to use sentiment scores as a proxy to
measure CSES. Our proposed approach enables researchers and practitioners, such as
online customer service providers, to objectively and automatically retrieve valuable
information after and during an online service encounter.
2 Related Work
2.1 Customer Service Chatbots
Recent advances in technology and great business potential have led to an increased
interest in the development of conversational agents [5, 25]. Conversational agents are
software-based systems designed to converse with a user via natural language [4, 5].
In this interaction, the user converses with the conversational agent in a natural dialog rather than using a predefined set of keywords or command phrases [4]. Conversational agents can offer both speech- and text-based interfaces and can also be visualized and animated (i.e., embodied conversational agents) [4]. Conversational agents that interact with the user
primarily via a text-based interface are often referred to as chatbots [5]. Chatbots can
be deployed on various communication channels, such as instant messaging platforms
(e.g., Line, Telegram, WeChat), websites, or on social media (e.g., Facebook, Twitter)
and are accessible from various devices (e.g., PCs, mobile phones) [4]. Since
Weizenbaum developed the first chatbot named ELIZA in 1966, much research has
been conducted and various chatbots have been deployed across industries [4, 5].
One of the reasons why both research and practice are increasingly using this
technology is the fact that chatbots interact in a human-like interaction style (i.e., use
natural language) and offer great business potential (i.e., 24/7 availability at low costs)
[4]. Therefore, chatbots are increasingly implemented in online service encounters as
many companies communicate with their customers via live-chats on their website or
on social media platforms [1, 2]. Chatbots could help to automate online customer
service, save costs, and enhance online experience [1, 26]. For example, instead of a
customer calling or chatting with a service employee, customers are now
communicating with a service chatbot [26]. In addition, chatbots can also take the role
of first-tier support agents and assist customer service employees. In this role, chatbots can start an online service encounter and then seamlessly hand over the conversation to a human agent when required. This can greatly reduce the number of routine requests usually handled by service employees.
2.2 Chatbot Service Encounter Satisfaction
Satisfaction is an often-applied construct in information systems (IS) research to evaluate the success and effectiveness of a system, and it is particularly critical for the success of service systems [27]. It reflects whether customers perceive a service as
pleasurable with regard to its consumption-related fulfilment [8]. High customer
satisfaction values are important to achieve long-term success, especially in highly
competitive markets, and therefore should have priority for any organization [8, 28].
Customer satisfaction is strongly impacted by the service encounter satisfaction,
which refers to the post-consumption evaluation of a service encounter [29, 30]. A
successful service encounter makes a company’s product incrementally more effective
and easier to use [28], influences the customer’s choice independent of whether a service is provided offline or online [31], and is linked to several desired outcomes such as
word-of-mouth, loyalty, and intention to repurchase a product [8, 32]. Thus, service
encounter satisfaction is a critical indicator for any organization [8, 28].
Service encounter satisfaction is influenced by several antecedents, such as the customization and flexibility of the service encounter, effective service recovery when
failures occur, and spontaneous delights (i.e., pleasing experiences customers do not
expect) [32]. In addition, various design elements of a chatbot influence CSES, such as verbal communication cues (i.e., being polite, responsive, and showing mutual understanding), level of expertise (i.e., a core attribute of a service employee), or visual cues (e.g., an avatar) [29]. In an online context, the measurement of CSES is
often limited to follow-up surveys [11, 29]. Thus, CSES cannot be retrieved in real-
time, is often biased, and the surveys are only answered by a few users [11, 12].
Moreover, customers have a general “reluctance to share their sentiments with firms” [8, p. 359] and thus, companies are often not able to react fast enough to dissatisfied
customers using service recovery procedures [10]. Failing to recover can result in lost
customers, negative word of mouth, decreased loyalty, and less profits [28, 32].
2.3 Sentiment Analysis Methods
A common method within the natural language understanding literature is the analysis
of opinions and sentiments expressed in written text. This becomes meaningful as
research has shown that written text is clearly impacted by the user’s emotions,
intentions, and thoughts [13–15]. Consequently, written text says something about us
and can be used as a proxy for information about the author. Therefore, various methods
have been developed to analyze the opinions and sentiments expressed in written text
[21]. These methods are named and defined in many different ways (e.g., sentiment
analysis, opinion mining, see [33]). As it is the most common name, we follow [17, 33]
and define sentiment analysis as the computational analysis of written language to
identify the user’s perceived positive or negative valence towards a certain entity (e.g.,
product, service, event). Sentiment analysis has recently witnessed great attention,
because of the large availability of opinion-rich resources on the Internet (e.g., online
reviews) and advances in artificial intelligence [17]. Consequently, many of the major
technology companies offer sentiment analysis solutions (e.g., IBM, Google) and also
various open source solutions are available (see [21]). This led to the development of
many available and precise methods (see [20, 21]).
Sentiment analysis methods can be generally distinguished into two broad but also
overlapping approaches, namely the application of semantic rules or statistical methods
[20]. Methods of the first category compare sentiment-related expressions with
sentiment lexicons that contain the semantic orientation of words [34]. One of the
greatest challenges of these methods is that the semantic orientation of individual words
does not necessarily correspond to the contextual polarity of the whole sentence [34].
Therefore, it is necessary to extract additional linguistic patterns of the text by
conducting morpho-syntactic text analyses (i.e., word form, lemma, part-of-speech tags) [20]. Overly specific extraction patterns, however, limit the application range to a specific domain. Methods of the second, more recently applied category use unsupervised or
supervised machine learning algorithms including support vector machines and Bayes
classifiers [20]. These methods enable the development of more generic models, but
require labeled data for training purposes. Consequently, the quality of such models is
heavily influenced by the reliability of sentiment annotations [20].
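To make the lexicon-based approach concrete, the following toy R example (our own illustration, not any published method) scores a text by summing the valence values of its words from a small hand-made lexicon; real methods such as AFINN and VADER add heuristics for negation, intensifiers, and punctuation on top of this principle.

# Toy lexicon-based scorer: look up each token's valence and sum the values.
toy_lexicon <- c(good = 2, great = 3, bad = -2, terrible = -3, help = 1)
toy_sentiment <- function(text) {
  tokens <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  sum(toy_lexicon[tokens], na.rm = TRUE)  # unknown words yield NA and are dropped
}
toy_sentiment("This chatbot was a great help")  # 3 + 1 = 4
toy_sentiment("Terrible service, really bad")   # -3 - 2 = -5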
Today’s applications of sentiment analysis are manifold. Sentiment analysis can be
used to predict the success of political campaigns [35], identify interaction problems
within a conversation corpus [11], or even to scan the dark web in an intelligence
context [36]. Nevertheless, only a few studies have analyzed sentiments in a chatbot context so far, as most studies focus on the method itself [20, 23]. One reason is the difficulty of classifying rather short, informal chat messages, which include a high degree of language creativity, spelling mistakes, and the expression of sentiments without real intentions [19]. Another reason is the differences and ambiguities in human mood coding, which make it difficult to create a gold standard [37] and thus to develop user-independent prediction models [38]. However, some related research has
already applied sentiment analysis to infer the customer satisfaction from product
reviews for shopping websites and mobile services [23, 39].
3 Research Method
To answer our research question, we first selected suitable dialog corpora and sentiment analysis methods for our analyses. Then, we defined a three-step research approach to analyze the corpora.
3.1 Dialog Corpora and Sentiment Methods
First, we collected one dialog corpus from an online experiment in a customer service
context [40]. The participants (n = 79, mean age = 28.835, SD = 6.388) were given a fictitious mobile phone bill, and the experimental task was to find a more suitable mobile phone plan by interacting with a customer service chatbot. The chatbot asked
several consumption-related questions and was capable of responding interactively to
given user queries. After the interaction, all participants were asked to complete a
questionnaire measuring CSES using an established measurement instrument on a 7-
point Likert scale [29]. The construct displayed a sufficient composite reliability (CR)
above 0.8 (CR = 0.814) and the average variance extracted was above 0.5. All
measurement items had factor loadings above 0.7 and the mean CSES value was 4.924
(SD = 1.179). The complete experiment, all dialogs, as well as the questionnaire were
in English. The complete dialog corpus consists of 79 user dialogs with a total of 1416
user utterances. We removed 353 utterances because they consisted only of mobile contract-related numbers. The final corpus included 79 dialogs and a total of 1063
utterances with an average of 13.456 utterances per dialog (SD = 8.312). We refer to
this dialog corpus as “ExpCorpus” in the remainder of this paper.
In addition to ExpCorpus, we used a second, publicly available dialog corpus
(without CSES values) in order to have a greater basis for the comparison of different
sentiment analysis methods. Therefore, we selected the ConvAI dialog corpus [41], in which 500 volunteers chatted with ten chatbots; the dialog set is freely available as a JSON file. The dataset includes 2778 dialogs, from which we excluded 441 human-human
dialogs, 102 empty dialogs, 54 bot only dialogs, and one numbers-only dialog. This
resulted in the extraction of 2180 human-chatbot dialogs, which were neither empty nor
contained only numbers. Finally, we extracted all 12482 human written utterances. We
refer to this dialog corpus as “ConvAI” in the remainder of this paper.
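For illustration, the numbers-only filter described above could be implemented as in the following R sketch (the exact filter we used is not reproduced here; the regular expression is our assumption):

# Drop utterances consisting only of digits, punctuation, and whitespace
# (e.g., mobile contract figures).
is_numbers_only <- function(u) grepl("^[[:space:][:digit:].,-]+$", u)
utterances <- c("My bill is too high", "1000", "42.50", "I don't know")
utterances[!is_numbers_only(utterances)]
# [1] "My bill is too high" "I don't know"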
To select appropriate sentiment analysis methods for our study, we reviewed two
benchmark analyses [21, 42]. We followed the benchmark analysis of [21], which
compared 24 open source methods, as well as the benchmark analysis of [42], which
also included sentiment analysis methods from major technology companies (e.g., IBM,
Microsoft). The benchmark analyses reveal that there is no superior sentiment analysis
method because all tools perform differently depending on the specific context they are
applied on or depending on the corresponding data source on which they were trained
[21]. Consequently, both benchmarks reveal several suitable methods depending on the
respective context and the training data [21]. The benchmark of [21] reveals that two
of the best sentiment analysis methods providing numerical polarity for negative,
neutral, and positive sentiments are VADER [43] and AFINN (i.e., an extension of
ANEW [44]) [21]. VADER and AFINN are rule-based sentiment analysis methods,
which use rules and heuristics to match the analyzed texts to sentiment lexicons. Both
lexicons were developed and trained on social media content and Twitter data [21, 43].
The benchmark analysis of [42] reveals that the sentiment analysis methods by IBM
Watson, Google Cloud, and Microsoft Azure perform best with varying types of
datasets [42]. These sentiment analysis methods leverage machine learning
classification algorithms in order to predict the sentiment score. Therefore, all three
providers trained their algorithms on an extensive body of sentiment annotated text
databases [42]. To cover both types of sentiment analysis techniques, namely semantic
rules and statistical methods [20], we selected the following methods for our study: two
open source methods using rule-based sentiment analysis methods (i.e., VADER,
AFINN) and three commercial methods using machine learning classification
algorithms (i.e., IBM Watson, Google Cloud, and Microsoft Azure). We calculated the
sentiment scores for each of the open-source methods using the iFeel 2.0 web service provided by [22] and for each of the commercial methods using their Node.js APIs.
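To illustrate how such scores can be computed, the following R sketch uses the syuzhet and vader packages as stand-ins (an assumption for illustration; our actual scores were obtained via iFeel 2.0 and the providers’ Node.js APIs):

# Utterance-level scores with two open-source, rule-based methods.
library(syuzhet)  # provides AFINN scoring via get_sentiment()
library(vader)    # rule-based VADER implementation

utterances <- c("Hi", "My bill is too high", "Thanks, that really helped!")

afinn_raw   <- get_sentiment(utterances, method = "afinn")
afinn_score <- afinn_raw / max(abs(afinn_raw), 1)   # rescale to [-1, 1]

vader_score <- sapply(utterances,
                      function(u) as.numeric(get_vader(u)[["compound"]]))

# Dialog-level score: score the concatenated dialog text (one plausible
# aggregation strategy; others, such as averaging utterance scores, exist).
dialog_text  <- paste(utterances, collapse = " ")
dialog_score <- as.numeric(get_vader(dialog_text)[["compound"]])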
3.2 Research Approach
In this section, we present our three-step research approach (see Table 1) to answer our
research question and to investigate the potential correlation between sentiments and
CSES. All analyses were conducted using R 3.5.0.
Table 1. Research approach

Step | Research method                                                | Dialog corpora
1    | Comparison of sentiment analysis methods                       | ConvAI (dialog & utterance level), ExpCorpus (dialog & utterance level)
2    | Correlation analysis between sentiment scores and CSES values  | ExpCorpus (dialog level)
3    | Exploratory analysis of sentiment scores and CSES values       | ExpCorpus (utterance level)
In the first step, we compared all selected sentiment methods because the accuracy of sentiment analysis methods is highly context- and data-dependent. Therefore, we investigated whether the sentiment scores from each tool are similar on a dialog and utterance level by calculating the sentiment scores for each dialog and each single utterance of both corpora with all five methods. Next, we tested for potential correlations among the five sentiment scores. We conducted this analysis on a dialog and utterance level because sentiment analysis methods seem to “perform better on carefully authored, lengthier content, but often struggle when faced with informal online communication” [19, p. 318]. Consequently, we assume that some sentiment methods may struggle to predict the sentiment score of rather short utterances and that the methods perform quite differently on the two levels.
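A minimal R sketch of this comparison with simulated stand-in scores (the actual per-dialog scores are not reproduced here) looks as follows:

# Pairwise Pearson correlations among five methods' dialog-level scores.
set.seed(1)
n <- 79  # number of dialogs in ExpCorpus
scores <- data.frame(
  AFINN     = runif(n, -1, 1),
  VADER     = runif(n, -1, 1),
  IBM       = runif(n, -1, 1),
  Microsoft = runif(n, -1, 1),
  Google    = runif(n, -1, 1)
)
round(cor(scores, method = "pearson"), 3)  # correlation matrix as in Table 2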
In the second step, we tested for a correlation between sentiment scores and CSES values. Therefore, we standardized the sentiment scores to the range from -1 (i.e., negative) to +1 (i.e., positive) and subsequently tested for a correlation between the sentiment scores (from all five methods) and the CSES values using the dialogs and satisfaction data of ExpCorpus. In doing so, we aimed to reveal whether sentiment scores are a valid proxy for CSES values.
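The following R sketch illustrates this step with simulated data (the actual sentiment scores and survey responses are not reproduced here):

# Standardize raw scores to [-1, 1] and correlate them with CSES values.
set.seed(7)
n    <- 79
raw  <- rnorm(n)              # simulated raw dialog-level sentiment scores
sent <- raw / max(abs(raw))   # one simple standardization to [-1, 1]
cses <- pmin(pmax(4.9 + 1.1 * sent + rnorm(n), 1), 7)  # 7-point scale
cor.test(sent, cses, method = "pearson")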
In the third step, we investigated the minimum number of utterances required to
show a correlation between sentiment scores and CSES values. For this analysis, we
used IBM’s sentiment method because it yielded the highest correlation in the previous
step. Therefore, we extracted utterance sequences of each dialog, calculated their
sentiment scores, and tested for a correlation between sentiment scores and CSES
values. Next, we investigated whether these findings also hold for utterance sequences
throughout the whole dialog. This analysis provides insights into whether sentiment scores can be used as a proxy for CSES during a customer service encounter.
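The following R sketch outlines this procedure; the data structures and the scoring function are assumptions for illustration:

# Correlate the sentiment of the utterance sequence US_i = {u_1, ..., u_i}
# with CSES. `dialogs` is assumed to be a list of character vectors (one per
# dialog), `cses` a numeric vector of CSES values, and `score_fn` any
# vectorized sentiment scorer returning values in [-1, 1].
sequence_correlation <- function(dialogs, cses, i, score_fn) {
  keep <- sapply(dialogs, length) >= i   # drop dialogs with fewer than i utterances
  seqs <- sapply(dialogs[keep], function(d) paste(d[1:i], collapse = " "))
  cor.test(score_fn(seqs), cses[keep], method = "pearson")
}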
4 Results
Step 1: Comparison of Sentiment Analysis Methods
In the first step, we compared the calculated sentiment scores of the selected sentiment analysis methods for both dialog corpora (ConvAI and ExpCorpus). Table 2 contains the correlation analysis between the sentiment scores of both corpora for each dialog and each single utterance as calculated by all five methods.
Table 2. Pearson correlation analyses among sentiment scores of different methods

ConvAI, dialog level (n = 2180 dialogs):
            AFINN     VADER     IBM       Microsoft
VADER       .605***
IBM         .385***   .317***
Microsoft   .368***   .356***   .533***
Google      .369***   .295***   .504***   .395***

ConvAI, utterance level (n = 12482 utterances):
            AFINN     VADER     IBM       Microsoft
VADER       .322***
IBM         .387***   .357***
Microsoft   .300***   .311***   .604***
Google      .414***   .388***   .600***   .497***

ExpCorpus, dialog level (n = 79 dialogs):
            AFINN     VADER     IBM       Microsoft
VADER       .719***
IBM         .508***   .512***
Microsoft   .473***   .467***   .600***
Google      .516***   .366***   .521***   .537***

ExpCorpus, utterance level (n = 1063 utterances):
            AFINN     VADER     IBM       Microsoft
VADER       .597***
IBM         .505***   .169***
Microsoft   .383***   .201***   .615***
Google      .653***   .439***   .625***   .487***

*** p < .001
The results reveal that the sentiment scores of dialog data from both corpora are at least moderately positively correlated with each other [45] (ConvAI: .295 ≤ r ≤ .605, n = 2180, p < .001; ExpCorpus: .366 ≤ r ≤ .719, n = 79, p < .001). The strongest correlation for the ConvAI dialogs was identified between VADER's and AFINN's sentiment scores (r = .605, n = 2180, p < .001) and the weakest between VADER's and Google's sentiment scores (r = .295, n = 2180, p < .001). The strongest correlation for ExpCorpus was again identified between VADER's and AFINN's sentiment scores (r = .719, n = 79, p < .001) and the weakest one between VADER's and Google's sentiment scores (r = .366, n = 79, p < .001). All sentiment scores on an utterance level were significantly positively correlated, but some correlations among some methods were weaker than they were on a dialog level (.169 ≤ r ≤ .653, p < .001). All in all, the findings reveal that sentiment methods using similar methodologies to identify the expressed polarity in a given text provide rather similar results. Thus, the methods using semantic rules, VADER and AFINN, are strongly correlated on a dialog level. Moreover, the methods using machine learning classification algorithms, i.e., IBM's, Microsoft's, and Google's methods, are at least moderately correlated on both a dialog and utterance level.
Step 2. Correlation Analysis Between Sentiment Scores and CSES Values
In the second step, we tested for a correlation between sentiment scores and CSES
values using the dialogs and CSES values of ExpCorpus. The results and the
corresponding scatterplots are displayed in Figure 1. The analysis reveals a significant
moderate to strong correlation between sentiment scores (from all five methods) and
CSES values (.405 ≤ r ≤ .513, n = 79, p < .001) [45]. Thus, we conclude that there is a
moderate positive correlation between sentiment scores and CSES values for four
sentiment analysis methods and a strong positive correlation for IBM’s sentiment
method (r = .513, n = 79, p < .001) [45]. Moreover, the scatterplots suggest that sentiment scores are primarily a better predictor for positive than for negative CSES values. In addition, the semantic rule-based methods tend to assign generally more positive sentiment scores to the dialogs.
Sentiment method   Pearson's correlation statistic
AFINN              r = .417, n = 79, p < .001
VADER              r = .464, n = 79, p < .001
IBM                r = .513, n = 79, p < .001
Microsoft          r = .461, n = 79, p < .001
Google             r = .405, n = 79, p < .001

Figure 1. Correlation analyses between sentiment scores (of dialogs) and CSES values for ExpCorpus (scatterplots not reproduced here)
Step 3: Exploratory Analysis of Sentiment Scores and CSES Values
In the third step, we investigated the minimum number of utterances required to show
a significant positive correlation between sentiment scores and CSES values. Therefore,
we combined the first ten utterances (ui, i = 1, …, 10) into ten different utterance sequences (USi, i = 1, …, 10), calculated their sentiment scores, and tested for a correlation with the CSES values. The results are summarized in Table 3.
Table 3. Correlation analyses between sentiment scores (of utterance sequences) and CSES values for ExpCorpus

Analyzed utterance sequence  | Included dialogs | Included utterances | Included words | Pearson's correlation statistic
US1 = {u1}                   | 79 | 79   | 739  | n = 79, r = .018, p = .872
US2 = {u1, u2}               | 79 | 158  | 1086 | n = 79, r = .133, p = .244
US3 = {u1, u2, u3}           | 79 | 237  | 1360 | n = 79, r = .234, p = .038
US4 = {u1, …, u4}            | 79 | 316  | 1574 | n = 79, r = .251, p = .026
US5 = {u1, …, u5}            | 79 | 395  | 1779 | n = 79, r = .372, p < .001
US6 = {u1, …, u6}            | 75 | 450  | 1989 | n = 75, r = .437, p < .001
US7 = {u1, …, u7}            | 74 | 518  | 2159 | n = 74, r = .480, p < .001
US8 = {u1, …, u8}            | 68 | 544  | 2350 | n = 68, r = .443, p < .001
US9 = {u1, …, u9}            | 58 | 522  | 2495 | n = 58, r = .506, p < .001
US10 = {u1, …, u10}          | 46 | 460  | 2633 | n = 46, r = .503, p < .001
All dialogs, all utterances  | 79 | 1060 | 3431 | n = 79, r = .513, p < .001
Please note that not all dialogs included up to ten user utterances. As a consequence, the number of analyzed dialogs
decreases with increasing sequence length. The last row analyzes all dialogs including all utterances of each dialog.
The analysis reveals that the sentiment scores of US1 and US2 have no significant
correlation with the CSES values. However, the correlation increases with an increasing
number of utterances combined in each sequence. Our results show a significant weak
positive correlation between sentiment scores and CSES values after the analysis of the
first three utterances (r = .234, n = 79, p = .038). Moreover, we revealed a significant
moderate positive correlation (r = .372, n = 79, p < .001) after the analysis of the first
five utterances [45]. To provide a greater understanding of these findings, Table 4 shows some exemplary utterance sequences, their sentiment scores, and the measured CSES values.
Table 4. Exemplary utterance sequences including the first three utterances

Utterance sequence | Sentiment score | CSES value
{“Hi”, “Nice to meet you. I’m interested in a cheaper phone plan. Can you help me?”, “I think it is SuperMobile”} | 0.769 | 6
{“Hey, I’m currently on the mobile phone plan Yellow Basic 1000 and I received an unexpectedly high mobile phone billl last month.”, “Are there any better mobile phone plans for me?”, “It’s SuperMobile Yellow Basic 1000”} | 0.488 | 6
{“My bill is too high”, “Help me to find a new mobile phone plan”, “I don’t know”} | -0.566 | 4.333
Having shown that at least the first three utterances of a dialog are required to find a
significant positive correlation between sentiment scores and CSES values, we further
investigated whether this correlation can also be found for all utterance sequences
throughout the whole dialogs. Therefore, we extracted all consecutive utterance
sequences consisting of three or five utterances within the first ten utterances of each
dialog. This extraction resulted in eight consecutive utterance sequences for dialogs that
were at least that long (e.g., Seq-1 = {u1, u2, u3}, Seq-2 = {u2, u3, u4}, Seq-9 = {u1,
u2, u3, u4, u5}). Then we calculated the sentiment scores and tested for a correlation
between sentiment scores and CSES values (see Table 5).
Table 5. Correlation analyses between sentiment scores (of consecutive utterance sequences) and CSES values for ExpCorpus

Sequence | Included utterances    | Pearson's correlation statistic
Seq-1    | {u1, u2, u3}           | n = 79, r = .234, p = .038
Seq-2    | {u2, u3, u4}           | n = 79, r = .243, p = .031
Seq-3    | {u3, u4, u5}           | n = 79, r = .289, p = .010
Seq-4    | {u4, u5, u6}           | n = 75, r = .350, p = .002
Seq-5    | {u5, u6, u7}           | n = 74, r = .410, p < .001
Seq-6    | {u6, u7, u8}           | n = 68, r = .267, p = .029
Seq-7    | {u7, u8, u9}           | n = 58, r = .501, p < .001
Seq-8    | {u8, u9, u10}          | n = 46, r = .501, p < .001
Seq-9    | {u1, u2, u3, u4, u5}   | n = 79, r = .372, p < .001
Seq-10   | {u2, u3, u4, u5, u6}   | n = 75, r = .407, p < .001
Seq-11   | {u3, u4, u5, u6, u7}   | n = 74, r = .436, p < .001
Seq-12   | {u4, u5, u6, u7, u8}   | n = 68, r = .377, p = .002
Seq-13   | {u5, u6, u7, u8, u9}   | n = 58, r = .503, p < .001
Seq-14   | {u6, u7, u8, u9, u10}  | n = 46, r = .325, p = .028
Please note that not all dialogs included up to ten user utterances. As a consequence, the number of analyzed dialogs
decreases with increasing utterance position.
The analysis shows that sentiment scores of all utterance sequences throughout the
whole dialog are positively correlated with the CSES values. All correlations are
significant at least at the p < .05 level. The correlation strength varies between weak and strong across the different sequences. However, both the minimum and the maximum correlation strength are higher for the sequences consisting of five consecutive utterances, which always show at least a moderate positive correlation with the CSES values.
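For illustration, the consecutive windows analyzed above can be extracted with a small helper such as the following R sketch (under the same assumptions as in Section 3.2):

# Concatenate the utterances u_start, ..., u_(start+width-1) of each dialog
# that is long enough; the result can then be scored and correlated as above.
window_text <- function(dialogs, start, width = 3) {
  keep <- sapply(dialogs, length) >= start + width - 1
  sapply(dialogs[keep],
         function(d) paste(d[start:(start + width - 1)], collapse = " "))
}
# e.g., Seq-2 = {u2, u3, u4}: window_text(dialogs, start = 2, width = 3)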
5 Discussion
In this paper, we investigate whether sentiment scores from textual input can be used
as a proxy to measure CSES in a customer-chatbot interaction. Therefore, we followed
a three-step research approach: first, we compared five sentiment analysis methods by
testing the relation of sentiment scores from two dialog corpora. Second, we tested for
a correlation between sentiment scores and CSES values. Third, we analyzed this
correlation in detail at the utterance level. The results of step one reveal a significant positive correlation among the sentiment scores from all selected sentiment analysis methods. The results of step two reveal that the sentiment scores of complete dialogs are significantly positively correlated with the subjectively measured CSES values. The results of step three reveal that this relation is not only valid for the analysis of an entire dialog, but also for sequences of at least three consecutive utterances throughout the entire dialog. Thus, we conclude that sentiment scores can be used as an automatic and objective proxy to measure CSES in an online service encounter. Our findings thereby add to existing research stating that sentiment analysis “corresponds surprisingly well with emotional self-report” [15, p. 87].
The results of our analysis have implications for the design of customer service
chatbots. As customers may express their frustrations in written language, future
chatbots could continuously perform sentiment analyses and use sentiment scores as a
proxy to identify dissatisfied customers (by analyzing at least three consecutive
utterances). In this way, service providers can intervene to reduce the risk of service
failures. For example, a customer service chatbot could recognize that the current
conversation with a customer is turning towards a negative sentiment score. In this case, several strategies could be triggered. The chatbot could seamlessly hand over the conversation to a trained human service agent, automatically trigger service recovery procedures, or express certain verbal utterances such as excuses [46, 47]. Research has shown that these immediate reactions can reduce the level of frustration [46] and can lead to an increased interaction length [47]. Furthermore, service providers can use this data in post-interaction analyses to retrieve valuable information about CSES. This information can be used not only for service recovery, but also for identifying general weaknesses in the service quality of the chatbot.
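As an illustration only (this decision rule is not part of our study), such a monitoring step could look as follows in R; the threshold is an arbitrary assumption that would need tuning per deployment:

# Escalate when the combined sentiment of the last three user utterances
# falls below a threshold; at least three utterances are required, in line
# with our finding above.
should_escalate <- function(utterances, score_fn, threshold = -0.3) {
  if (length(utterances) < 3) return(FALSE)
  window <- tail(utterances, 3)
  score_fn(paste(window, collapse = " ")) < threshold
}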
Although we aimed to ensure a high rigor in our research, some limitations should
be considered. First, many sentiment analysis methods exist and they all may evaluate
a given text differently depending on the context and type of a message [21]. This
becomes even more meaningful when applied to rather short and informal chat data.
Therefore, a different selection of sentiment methods may have led to different results.
Consequently, we tried to minimize this risk by starting with a selection of five
sentiment methods based on benchmarks and compared them with each other by
applying them to two dialog corpora. Even though all sentiment analysis methods had a moderate to strong correlation with CSES, some sentiment methods were rather weak predictors for users with low CSES values. Therefore, it is important that “researchers and companies perform experiments with different methods before applying a method” [21, p. 27]. Second, we analyzed a dialog corpus, which measured
the CSES using a post-interaction survey. However, data of a survey-based approach
might be influenced by various biases [12]. To reduce this risk, we reviewed all dialogs
and verified that participants followed the experimental task and did not answer with
straight line responses. Third, we only analyzed the relationship between sentiments
and CSES based on a dialog corpus from a hypothetical online service task (i.e., finding a new plan) in a specific context (i.e., mobile contracts) in one language (i.e., English).
Therefore, it is unclear whether our findings also hold for other customer service tasks
(e.g., book ticket) in other contexts (e.g., airlines) in other languages (e.g., German).
Fourth, we conducted correlation analyses to test for an association between sentiment scores and CSES values. Even though we found a strong positive correlation and propose sentiment scores as a proxy for CSES values, this analysis neither explains this relation nor indicates a cause-and-effect relationship [48]. Thus, the results need to be applied with care, as we cannot predict CSES based on sentiment scores or vice versa [48].
Considering these limitations, we identify several avenues for future research. First,
future work can replicate our analyses on additional dialog corpora from different
contexts, doing different tasks, and in different languages. This could further strengthen
the applicability of sentiment analysis as a proxy to measure CSES in several domains
and languages. Second, future studies could investigate adaptive reaction strategies based on real-time analyses of at least three consecutive user utterances. This could enable chatbots to recognize user frustration and support the development of chatbots that act more socially [46, 47]. Moreover, future research could investigate the application of simpler text analysis methods, such as word counts and dialog length, as well as more complex methods, such as topic modeling, as proxies to predict customer satisfaction. Integrating these techniques into a chatbot can lead to an even greater understanding of the user and enable more precise reactions by the chatbot.
6 Conclusion
In this paper, we investigate the application of sentiment analysis methods in an online
service encounter with a chatbot and show that sentiment scores can serve as a proxy
to measure CSES. This enables researchers and practitioners, such as online service
providers, to objectively and automatically retrieve user information during and after
an online service encounter. This information can be used not only to trigger service
recovery procedures, but also to identify weaknesses in the service quality and to
analyze the user in real time. Therefore, our results contribute towards the design of user-adaptive service chatbots.
References
1. Larivière, B., Bowen, D., Andreassen, T.W., Kunz, W., Sirianni, N.J., Voss, C.,
Wünderlich, N.V., Keyser, A. de: “Service Encounter 2.0”: An investigation into the roles
of technology, employees and customers. Journal of Business Research 79, 238–246
(2017)
2. McLean, G., Osei-Frimpong, K.: Examining satisfaction with the experience during a live
chat service encounter: Implications for website providers. Computers in Human Behavior 76, 494–508 (2017)
3. Kowatsch, T., Nißen, M., Rüegger, D., Stieger, M., Flückiger, C., Allemand, M.,
Wangenheim, F. von: The Impact of Interpersonal Closeness Cues in Text-based
Healthcare Chatbots on Attachment Bond and the Desire to Continue Interacting: An
Experimental Design. In: Twenty-Sixth European Conference on Information Systems
(ECIS). Portsmouth, UK (2018)
4. McTear, M., Callejas, Z., Griol, D.: The Conversational Interface. Talking to Smart
Devices. Springer International Publishing, Switzerland (2016)
5. Dale, R.: The return of the chatbots. Natural Language Engineering 22, 811–817 (2016)
6. Gartner: Gartner Says 25 Percent of Customer Service Operations Will Use Virtual
Customer Assistants by 2020, https://www.gartner.com/newsroom/id/3858564
7. Ben Mimoun, M.S., Poncin, I., Garnier, M.: Case study: Embodied virtual agents. An analysis on reasons for failure. Journal of Retailing and Consumer Services 19, 605–612 (2012)
8. Oliver, R.L.: Satisfaction. A behavioral perspective on the consumer. McGraw Hill, New
York (1997)
9. Brave, S., Nass, C.: Emotion in human-computer interaction. In: Jacko, J.A., Sears, A.
(eds.) The human-computer interaction handbook, pp. 81–96. L. Erlbaum Associates Inc,
NJ, USA (2002)
10. Holloway, B.B., Beatty, S.E.: Service Failure in Online Retailing: A Recovery
Opportunity. Journal of Service Research 6, 92–105 (2003)
11. Xiang, Y., Zhang, Y., Zhou, X., Wang, X., Qin, Y.: Problematic situation analysis and
automatic recognition for Chinese online conversational system. In: Joint Conference on
Chinese Language Processing. Wuhan, China (2014)
12. Podsakoff, P.M., MacKenzie, S.B., Lee, J.-Y., Podsakoff, N.P.: Common method biases in
behavioral research: a critical review of the literature and recommended remedies. The
Journal of Applied Psychology 88, 879–903 (2003)
13. Tausczik, Y.R., Pennebaker, J.W.: The Psychological Meaning of Words: LIWC and
Computerized Text Analysis Methods. Journal of Language and Social Psychology 29,
24–54 (2010)
14. Nerbonne, J.: The Secret Life of Pronouns. What Our Words Say About Us. Literary and
Linguistic Computing 29, 139–142 (2014)
15. Küster, D., Kappas, A.: Measuring Emotions Online: Expression and Physiology. In:
Holyst, J.A. (ed.) Cyberemotions: Collective Emotions in Cyberspace, pp. 71–93. Springer
International Publishing, Cham (2017)
16. Skowron, M., Rank, S., Theunis, M., Sienkiewicz, J.: The Good, the Bad and the Neutral:
Affective Profile in Dialog System-User Communication. In: D’Mello, S., Graesser, A.,
Schuller, B., Martin, J.-C. (eds.) Affective Computing and Intelligent Interaction, pp. 337–346. Springer, Berlin, Heidelberg (2011)
17. Liu, B.: Sentiment analysis and opinion mining. Synthesis lectures on human language
technologies 5, 1–167 (2012)
18. Banchs, R.E.: On the construction of more human-like chatbots: Affect and emotion
analysis of movie dialogue data. In: Asia-Pacific Signal and Information Processing
Association Annual Summit and Conference. Kuala Lumpur (2017)
19. Brooks, M., Kuksenok, K., Torkildson, M.K., Perry, D., Robinson, J.J., Scott, T.J.,
Anicello, O., Zukowski, A., Harris, P., Aragon, C.R.: Statistical Affect Detection in
Collaborative Chat. In: Conference on Computer Supported Cooperative Work, pp. 317–328. ACM, New York, NY, USA (2013)
20. Clavel, C., Callejas, Z.: Sentiment Analysis: From Opinion Mining to Human-Agent Interaction. IEEE Transactions on Affective Computing 7, 74–93 (2016)
21. Ribeiro, F.N., Araújo, M., Gonçalves, P., André Gonçalves, M., Benevenuto, F.:
SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods.
EPJ Data Science 5, 23 (2016)
22. Diniz, J.P., Bastos, L., Soares, E., Ferreira, M., Ribeiro, F., Benevenuto, F.: iFeel 2.0: A multilingual benchmarking system for sentence-level sentiment analysis. In: 10th International AAAI Conference on Weblogs and Social Media. Cologne, Germany (2016)
23. Kang, D., Park, Y.: Review-based measurement of customer satisfaction in mobile service:
Sentiment analysis and VIKOR approach. Expert Systems with Applications 41, 1041–1050 (2014)
24. Walther, J.B., Parks, M.R.: Cues filtered out, cues filtered in. In: Knapp, M.L., Daly, J.A.
(eds.) Handbook of Interpersonal Communication, pp. 529–563. SAGE, Thousand Oaks,
CA, USA (2002)
25. Maedche, A., Morana, S., Schacht, S., Werth, D., Krumeich, J.: Advanced User Assistance
Systems. Business & Information Systems Engineering 58, 367–370 (2016)
26. Gnewuch, U., Morana, S., Maedche, A.: Towards Designing Cooperative and Social
Conversational Agents for Customer Service. In: Proceedings of the 38th International
Conference on Information Systems (ICIS). AISel, Seoul (2017)
27. Au, N., Ngai, E.W.T., Cheng, T.E.: A critical review of end-user information system
satisfaction research and a new research framework. Omega-International Journal of
Management Science 30, 451–478 (2002)
28. Jones, T.O., Sasser, W.E.: Why satisfied customers defect. Harvard Business Review 73,
88–99 (1995)
29. Verhagen, T., van Nes, J., Feldberg, F., van Dolen, W.: Virtual Customer Service Agents.
Using Social Presence and Personalization to Shape Online Service Encounters. Journal of
Computer-Mediated Communication 19, 529–545 (2014)
30. Caruana, A.: Service loyalty. European Journal of Marketing 36, 811–828 (2002)
31. Shankar, V., Smith, A.K., Rangaswamy, A.: Customer satisfaction and loyalty in online
and offline environments. International Journal of Research in Marketing 20, 153–175
(2003)
32. Bitner, M.J., Brown, S.W., Meuter, M.L.: Technology Infusion in Service Encounters.
Journal of the Academy of Marketing Science 28, 138–149 (2000)
33. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Foundations and Trends in
Information Retrieval 2, 1–135 (2008)
34. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: An exploration of
features for phrase-level sentiment analysis. Computational Linguistics 35, 399–433
(2009)
35. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with
twitter: What 140 characters reveal about political sentiment. In: International AAAI
Conference on Weblogs and Social Media, 10, pp. 178–185. Menlo Park, CA, USA (2010)
36. Abbasi, A., Chen, H.: Affect Intensity Analysis of Dark Web Forums. In: Proceedings of
Intelligence and Security Informatics. Chengdu, China (2017)
37. Thelwall, M., Buckley, K., Paltoglou, G., Di Cai, Kappas, A.: Sentiment strength detection
in short informal text. Journal of the American Society for Information Science and
Technology 61, 2544–2558 (2010)
38. Higashinaka, R., Minami, Y., Dohsaka, K., Meguro, T.: Issues in Predicting User
Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and
Prediction Models. In: Lee, G.G., Mariani, J., Nakamura, S. (eds.) Spoken Dialogue
Systems for Ambient Environments. Second International Workshop, IWSDS 2010, Gotemba, Shizuoka, Japan, October 1-2, 2010. Proceedings, 6392, pp. 48–60. Springer,
New York (2010)
39. Wang, Y., Lu, X., Tan, Y.: Impact of product attributes on customer satisfaction: An
analysis of online reviews for washing machines. Electronic Commerce Research and
Applications 29, 1–11 (2018)
40. Gnewuch, U., Morana, S., Adam, M., Maedche, A.: Faster Is Not Always Better:
Understanding the Effect of Dynamic Response Delays in Human-Chatbot Interaction. In: Proceedings of the 26th European Conference on Information Systems (ECIS). Portsmouth, United Kingdom (2018)
41. Logacheva, V., Burtsev, M., Malykh, V., Poluliakh, V., Rudnicky, A., Serban, I., Lowe,
R., Prabhumoye, S., Black, A.W., Bengio, Y.: A Dataset of Topic-Oriented Human-to-
Chatbot Dialogues, http://convai.io/2017/data/dataset_description.pdf (Accessed:
30.08.2018)
42. Corredera Arbide, A., Romero, M., Moya Fernández, J.M.: Affective computing for smart
operations: a survey and comparative analysis of the available tools, libraries and web
services. International Journal of Innovative and Applied Research 5, 12–35 (2017)
43. Gilbert, C.H.E.: Vader: A parsimonious rule-based model for sentiment analysis of social
media text. In: Eighth International AAAI Conference on Weblogs and Social Media. Ann
Arbor, MI, USA (2014)
44. Nielsen, F.Å.: A new ANEW: Evaluation of a word list for sentiment analysis in
microblogs. arXiv preprint arXiv:1103.2903 (2011)
45. Cohen, J.: A power primer. Psychological Bulletin 112, 155–159 (1992)
46. Hone, K.: Empathic agents to reduce user frustration. The effects of varying agent
characteristics. Interacting with Computers 18, 227–245 (2006)
47. Klein, J., Moon, Y., Picard, R.W.: This computer responds to user frustration. Theory,
design, and results. Interacting with Computers 14, 119–140 (2002)
48. Taylor, R.: Interpretation of the Correlation Coefficient: A Basic Review. Journal of
Diagnostic Medical Sonography 6, 35–39 (1990)
... Building on these articles, the psychological and individual knowledge that can be gathered within the conversation can be conceptualized as verbal social cues that trigger a reaction in the forms of emotional, cognitive, or behavioral responses (Feine et al., 2019). The expression of social cues is characteristic of interpersonal communication (Feine et al., 2019). ...
... Building on these articles, the psychological and individual knowledge that can be gathered within the conversation can be conceptualized as verbal social cues that trigger a reaction in the forms of emotional, cognitive, or behavioral responses (Feine et al., 2019). The expression of social cues is characteristic of interpersonal communication (Feine et al., 2019). Social cues can be verbal or non-verbal, whereas verbal cues can be content-related or stylerelated. ...
Article
Customer service conversations are becoming increasingly digital and automated, leaving service encounters impersonal. The purpose of this paper is to identify how customer service agents and conversational artificial intelligence (AI) applications can provide a personal touch and improve the customer experience in customer service. The authors offer a conceptual framework delineating how text-based customer service communication should be designed to increase relational personalization. This paper presents a systematic literature review on conversation styles of conversational AI and integrates the extant research to inform the development of the proposed conceptual framework. Using social information processing theory as a theoretical lens, the authors extend the concept of relational personalization for text-based customer service communication. The conceptual framework identifies conversation styles, whose degree of expression needs to be personalized to provide a personal touch and improve the customer experience in service. The personalization of these conversation styles depends on available psychological and individual customer knowledge, contextual factors such as the interaction and service type, as well as the freedom of communication the conversational AI or customer service agent has. The article is the first to conduct a systematic literature review on conversation styles of conversational AI in customer service and to conceptualize critical elements of text-based customer service communication required to provide a personal touch with conversational AI. Furthermore, the authors provide managerial implications to advance customer service conversations with three types of conversational AI applications used in collaboration with customer service agents, namely conversational analytics, conversational coaching and chatbots.
... Kullanıcıların, çevrimiçi insan aracılara yönelik duygularının olumsuz olduğu ancak sohbet robotlarına yönelik duyguların, daha da olumsuz olduğu sonucuna varılmaktadır (Tran, Pallant, Johnson, 2021). Diğer bir çalışmada, farklı duygu analizi yöntemlerini karşılaştırarak, sohbet robotundan elde edilen diyalogların nesnel olarak hesaplanmış duygu puanları ile öznel olarak ölçülen CSES (sohbet robotu hizmet alma memnuniyeti) değerleri arasındaki ilişki araştırılmıştır (Feine, Morana, Gnewuch, 2019). ...
Article
Full-text available
Yapay zekâ alanındaki son gelişmelerle, sesli ve yazılı olarak cevap verebilme imkânı sağlayan sanal asistanlar ve sohbet robotları kullanıcılar ve müşteriler tarafından yaygın bir şekilde kullanılmaya başlanmıştır. Bu araştırmada, ‘sohbet robotu’ (chatbot) anahtar kelimesi ile eşleşen tweetler toplanarak, belirlenen alanlarda tematik dağılım ortaya konulması amaçlanmıştır. Sohbet robotlarının, dört önemli özelliği (sohbet/konuşma, erişilebilirlik, entegrasyon ve duygu) dikkate alınmıştır. Çalışmada İngilizce dilinde Twitter API ile toplamda 153093 olan gönderi üzerinden kelime ilişkilendirme analizi, kelime frekans analizi ve tematik analiz teknikleri kullanılarak tematik dağılım ortaya konulması amaçlanmıştır. ‘Sohbet Robotu’ ifadesi içeren gönderilerde istatistiksel olarak anlamlı ilişkilendirilmiş kelimeler %8,9’unda ‘müşteri’ ve %7,3’ünde ‘google’ olmuştur. Ayrıca, ‘iletişim’, ‘link’, ‘mühendis’, ‘hizmet’ ve ‘doğrudan mesaj’ kelimeleri de diğer ilişkilendirilmiş kelimelerden bazılarıdır. İstatistiksel olarak anlamlı ilişkilendirilmiş sohbet/konuşma alanında en çok yer alan kelime, % 15,3 ile ‘otomatikleştirme’ sözcüğü olmuştur. Erişilebilirlik alanında, %46,7’sinde ‘genel’, %32,9’unda ‘sanal’ ifadesi yer almaktadır. Entegrasyon alanında, ‘bileşen kullanımı’ (%22,4) ve duygu alanında ‘insan’ (%27,3) sözcükleri istatistiksel olarak ilişkilendirilmiştir. Sonuç olarak, temalar ve alt temalar dikkate alındığında, sohbet robotlarının sadece teknik özellikleri değil sosyal ve duygusal yönlerinin de öne çıktığı ortaya çıkmaktadır.
... Ranoliya et al. [15] proposed a more classical Extensible Markup Language-based approach for Universityrelated queries, achieving impressive results for an automatic questionanswering problem in educational support. Often, data received from customers is further analysed with sentiment analysis using either scoring and polarity [16,17] or classification [18,19]. In this study, the sequence of inputs and attention masking are considered, and so, although not explicitly scoring or classifying sentiment, valence data still exist within the dataset. ...
Article
Full-text available
Customer service is an important and expensive aspect of business, often being the largest department in most companies. With growing societal acceptance and increasing cost efficiency due to mass production, service robots are beginning to cross from the industrial domain to the social domain. Currently, customer service robots tend to be digital and emulate social interactions through on-screen text, but state-of-the-art research points towards physical robots soon providing customer service in person. This article explores the feasibility of Transfer Learning different customer service domains to improve chatbot models. In our proposed approach, transfer learning-based chatbot models are initially assigned to learn one domain from an initial random weight distribution. Each model is then tasked with learning another domain by transferring knowledge from the previous domains. To evaluate our approach, a range of 19 companies from domains such as e-Commerce, telecommunications, and technology are selected through social interaction with X (formerly Twitter) customer support accounts. The results show that the majority of models are improved when transferring knowledge from at least one other domain, particularly those more data-scarce than others. General language transfer learning is observed, as well as higher-level transfer of similar domain knowledge. For each of the 19 domains, the Wilcoxon signed-rank test suggests that 16 have statistically significant distributions between transfer and non-transfer learning. Finally, feasibility is explored for the deployment of chatbot models to physical robot platforms including “Pepper”, a semi-humanoid robot manufactured by SoftBank Robotics, and “Temi”, a personal assistant robot.
... Conversational agent systems can be applied to medical scenarios to alleviate alexithymia and its effects, and to strengthen the caregivers' emotional awareness of their own feelings [7]. ECA can be used in high-risk, high-reward mental health services. ...
Article
Full-text available
Taking a big step with the development of AI, the functions of conversation agents are becoming increasingly mature, and people's lives are becoming increasingly dependent on the conversation agent system. Its assistance to people is reflected in many fields. This has sparked people's exploration of some emerging fields, it is necessary to organize and summarize the new technologies, functions, data, and methods for training new data required for studying emerging fields. More and more deep learning technology have been applied to conversation agents that improve the quality of the service significantly. It is still in the initial stage, and needs improvement both at modeling methods and datasets gathering.
... For sufficiently large values of the transformed variable, the arcsinh transformation is similar to a log-transformation with the difference that it is defined at zero 28 A description of the service can be found at https://www.ibm.com/cloud/watson-naturallanguage-understanding. Watson Natural Language Understanding is publicly available and free of charge (for limited usage) and has been found to outperform other widely used natural language understanding tools (see e.g., Abdellatif et al. [2022]; Canonico and De Russis [2018]; Carvalho and Harris [2020]; Feine et al. [2019]). In addition, in contexts where detecting a negative text is of interest, Ermakova et al. [2021] recommend using Watson Natural Language Understanding because its ability to correctly detect negative sentiments surpasses other popular tools. ...
Article
Exploiting the award process, we implement a regression discontinuity design to estimate the effect of winning France's main literary prize, the Goncourt. Winning increases sales, especially for books that sold fewer copies before the announcement, as well as the number of reviews on Amazon and the probability that they are negative. The effect is partly driven by an increase in word of mouth. These findings are consistent with a model in which the prize provides information on the existence of a book and acts as a quality signal and a coordination device, but prompts consumers to read books that are far from their tastes.
Chapter
Emotion is a vital, adaptive, and profound notion to understand and analyze. It follows the state of mind and is in constant, uncertain motion. Affection, love, care, liking, joy, happiness, sadness, anxiety, anger, hatred, jealousy, fear, lust, and greed are some emotions; in general, they are categorized into positive and negative types based on the feelings experienced by body, mind, heart, brain, and spirit (soul). Not only humans but every living entity possesses emotions. Although emotion is a multifaceted term, this discussion highlights human brain-related activities and the emotions mapped to them. Several questions naturally arise: what is emotion, why is it significant to recognize it, and why has emotion recognition become essential in the current era? Which parameters play a role in analyzing it? The motivation focuses on the areas that drive emotion recognition given current technological trends. Before going into detail, it is worth acknowledging affective computing and its role in emotion recognition. There exist scientific knowledge-based methods, statistical methods, and hybrid methods on which researchers have been working for the last few decades. It is not merely important to recognize an emotion; the measured accuracy is also important when recognizing an emotion with computational techniques. This chapter summarizes the explorative possibilities of emotion recognition for personal and social well-being and offers insights into practice-oriented applications. Subfields of emotion recognition draw on text, dialogue, conversation, audio, video, gesture, and physiology to distinguish emotions, and datasets are a crucial requirement for models that recognize them. Finally, the chapter sheds light on emotional intelligence, an aptitude-based skill that supports machines, systems, and computers in sensing, articulating, and understanding emotions.
Article
The potential of Artificial Intelligence to assist human teamwork has yet to be fully realized, despite its success in other domains. To ensure AI's effectiveness and credibility as a team advisor, it must be able to effectively infer team dynamics and issue appropriate interventions. This study focuses on AI-mediated human teamwork in a simulated search and rescue (SAR) task, where a team of humans is monitored and guided by an artificial social intelligence (ASI). Six different ASIs are compared against a human baseline to investigate the characteristics and effectiveness of their interventions. When adjusted for initial player competence, the ASIs performed on par with the human advisor, although the human advisor was rated as more trustworthy and useful. Additionally, sentiment analysis of the interventions reveals that participants were more likely to accept interventions carrying negative emotions, which in turn resulted in improved team performance.
Conference Paper
Full-text available
Working alliance describes an important relationship quality between health professionals and patients and is robustly linked to treatment success. However, due to limited resources of health professionals, working alliance cannot always be promoted just-in-time in a ubiquitous fashion. To address this scalability problem, we investigate the direct effect of interpersonal closeness cues of text-based healthcare chatbots (THCBs) on attachment bond from the working alliance construct and the indirect effect on the desire to continue interacting with THCBs. The underlying research model and hypotheses are informed by counselling psychology and research on conversational agents. In order to investigate the hypothesized effects, we first develop a THCB codebook with 12 design dimensions on interpersonal closeness cues that are categorized into visual cues (i.e. avatar), verbal cues (i.e. greetings, address, jargon, T-V-distinction), quasi-nonverbal cues (i.e. emoticons) and relational cues (i.e. small talk, self-disclosure, empathy, humor, meta-relational talk and continuity). In a second step, four distinct THCB designs are developed along the continuum of interpersonal closeness (i.e. institutional-like, expert-like, peer-like and myself-like THCBs) and a corresponding study design for an interactive THCB-based online experiment is presented to test our hypotheses. We conclude this work-in-progress by outlining our future work.
Conference Paper
Full-text available
A key challenge in designing conversational user interfaces is to make the conversation between the user and the system feel natural and human-like. In order to increase perceived humanness, many systems with conversational user interfaces (e.g., chatbots) use response delays to simulate the time it would take humans to respond to a message. However, delayed responses may also negatively impact user satisfaction, particularly in situations where fast response times are expected, such as in customer service. This paper reports the findings of an online experiment in a customer service context that investigates how user perceptions differ when interacting with a chatbot that sends dynamically delayed responses compared to a chatbot that sends near-instant responses. The dynamic delay length was calculated based on the complexity of the response and the complexity of the previous message. Our results indicate that dynamic response delays not only increase users' perception of humanness and social presence, but also lead to greater satisfaction with the overall chatbot interaction. Building on social response theory, we provide evidence that a chatbot's response time represents a social cue that triggers social responses shaped by social expectations. Our findings support researchers and practitioners in understanding and designing more natural human-chatbot interactions.
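A minimal Python sketch of such a dynamic delay is shown below; it assumes message complexity can be approximated by word counts, and the per-word weights and the cap are illustrative assumptions rather than the parameters used in the study.

    import time

    def dynamic_delay(prev_message: str, response: str,
                      per_word_read: float = 0.2,   # assumed reading pace (s/word)
                      per_word_type: float = 0.3,   # assumed typing pace (s/word)
                      max_delay: float = 5.0) -> float:
        """Delay (seconds) scaled by the complexity of both messages."""
        read_time = per_word_read * len(prev_message.split())
        type_time = per_word_type * len(response.split())
        return min(read_time + type_time, max_delay)

    # Example: wait before sending, simulating a human-like response time
    delay = dynamic_delay("How do I reset my password?",
                          "You can reset it under Settings > Account > Security.")
    time.sleep(delay)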
Article
Full-text available
In this paper, we conduct a deep search of the tools available on the market at the current state of the art in sentiment analysis. Our aim is to optimize the human response in datacenter operations, using a combination of research tools that allow us to decrease human error in general operations when managing complex infrastructures. The use of sentiment analysis tools is a first step towards extending our capabilities for optimizing the human interface. Using different data collections from a variety of data sources, our research provides an interesting outcome. In our final testing, we found that the three main commercial platforms (IBM Watson, Google Cloud, and Microsoft Azure) achieve the same accuracy (89-90%) across the datasets tested, based on artificial neural network and deep learning techniques. Other stand-alone applications or APIs, such as VADER or MeaningCloud, achieve a similar accuracy level on some of the datasets using a different approach (semantic networks such as ConceptNet), but the model can easily be optimized above 90% accuracy simply by adjusting some parameters of the semantic model. This paper points to future directions for optimizing datacenter operations management and decreasing human error in complex environments.
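As an example of one of the stand-alone tools mentioned above, the following sketch scores short customer utterances with VADER (pip install vaderSentiment); the utterances are made up, and the +/-0.05 cut-offs are the commonly used defaults, not thresholds taken from this paper.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    for utterance in ["The service was great, thank you!", "This chatbot is useless."]:
        scores = analyzer.polarity_scores(utterance)  # neg/neu/pos/compound
        compound = scores["compound"]                 # normalized to [-1, 1]
        label = ("positive" if compound >= 0.05
                 else "negative" if compound <= -0.05 else "neutral")
        print(f"{utterance!r}: compound={compound:.3f} -> {label}")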
Conference Paper
Full-text available
The idea of interacting with computers through natural language dates back to the 1960s, but recent technological advances have led to a renewed interest in conversational agents such as chatbots or digital assistants. In the customer service context, conversational agents promise to create a fast, convenient, and cost-effective channel for communicating with customers. Although numerous agents have been implemented in the past, most of them could not meet expectations and disappeared. In this paper, we present our design science research project on how to design cooperative and social conversational agents to increase service quality in customer service. We discuss several issues that hinder the success of current conversational agents in customer service. Drawing on the cooperative principle of conversation and social response theory, we propose preliminary meta-requirements and design principles for cooperative and social conversational agents. Next, we will develop a prototype based on these design principles.
Article
Full-text available
This paper furthers our understanding of online customer support with regard to online live chat systems, which allow customers to seek service-related information from an organisation via synchronous online media with a human service representative who provides answers through the same media. Using a web-based survey of 302 respondents with real-life live chat service experiences with mobile phone network providers in the UK, and through the use of structural equation modelling, this research aims to understand the variables capable of influencing a customer's satisfaction with the experience during an online live chat service encounter. The results indicate the importance of service quality, information quality, and system quality variables in influencing satisfaction with the experience, while such influence depends on the purpose of use. Additionally, the results outline the role of emoticons, the presence of a service rep's picture, automated 'canned' responses, and the presence of response time estimations in moderating the influence of the service quality, information quality, and system quality variables on satisfaction with the experience.
Article
Online reviews are an important information source for companies analysing users' demands. We conducted a study of online reviews to measure how product attributes impact customer satisfaction. First, we used sentiment analysis to infer whether a customer is satisfied with a purchase according to their review. Second, a logistic regression model was developed to estimate the impact of various product properties on customer satisfaction scores. Our estimates indicate that customer satisfaction is influenced by drainage mode, loading type, frequency conversion, type, display, colour, and capacity. We further investigate the impact of price and find that customers who buy cheap products should be treated differently from purchasers of expensive items, because the relevance of design features to their satisfaction differs. Additionally, we observe that although customers are concerned about noise, perceived noise is not consistent with actual noise levels. We analysed specific reviews to obtain more detailed information on customer attitudes.
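A minimal sketch of this two-step idea is given below, with hypothetical column names and toy data standing in for the study's review corpus: a sentiment-derived satisfaction label is regressed on product attributes, and the fitted coefficients indicate each attribute's estimated impact.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Toy review data: attribute columns plus a sentiment-derived label
    df = pd.DataFrame({
        "loading_type": ["front", "top", "front", "top", "front", "top"],
        "capacity_kg":  [8, 7, 9, 7, 8, 6],
        "satisfied":    [1, 0, 1, 0, 1, 0],  # 1 = positive review sentiment
    })
    X = pd.get_dummies(df[["loading_type", "capacity_kg"]], columns=["loading_type"])
    model = LogisticRegression().fit(X, df["satisfied"])

    # Coefficients approximate each attribute's impact on satisfaction
    print(dict(zip(X.columns, model.coef_[0].round(2))))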
Book
This book provides a comprehensive introduction to the conversational interface, which is becoming the main mode of interaction with virtual personal assistants, smart devices, various types of wearables, and social robots. The book consists of four parts: Part I presents the background to conversational interfaces, examining past and present work on spoken language interaction with computers; Part II covers the various technologies that are required to build a conversational interface, along with practical chapters and exercises using open source tools; Part III looks at interactions with smart devices, wearables, and robots, and then discusses the role of emotion and personality in the conversational interface; Part IV examines methods for evaluating conversational interfaces and discusses future directions.
· Presents a comprehensive overview of the various technologies that underlie conversational user interfaces;
· Combines descriptions of conversational user interface technologies with a guide to various toolkits and software that enable readers to implement and test their own solutions;
· Provides a series of worked examples so readers can develop and implement different aspects of the technologies.
Article
The service encounter, one of the foundational concepts in service research, is fundamentally changing due to rapid evolutions in technology. In this paper, we offer an updated perspective on what we label the “Service Encounter 2.0”. To this end, we develop a conceptual framework that captures the essence of the Service Encounter 2.0 and provides a synthesis of the changing interdependent roles of technology, employees, and customers. We find that technology either augments or substitutes for service employees and can foster network connections. In turn, employees and customers are taking on the roles of enabler, innovator, coordinator, and differentiator. In addition, we identify critical areas for future research on this important topic.