Measuring Service Encounter Satisfaction with Customer Service Chatbots using Sentiment Analysis

This is the author’s version of a work that was published in the following source:
Feine, J., Morana, S., and Gnewuch, U. 2019. “Measuring Service Encounter Satisfaction with
Customer Service Chatbots using Sentiment Analysis,” in Proceedings of the 14th International
Conference on Wirtschaftsinformatik (WI2019), Siegen, Germany, February 24-27.
Please note: Copyright is owned by the author and / or the publisher.
Commercial use is not allowed.
Institute of Information Systems and Marketing (IISM)
Fritz-Erler-Strasse 23
76133 Karlsruhe - Germany
http://iism.kit.edu
Karlsruhe Service Research Institute (KSRI)
Kaiserstraße 89
76133 Karlsruhe Germany
http://ksri.kit.edu
© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/
14th International Conference on Wirtschaftsinformatik,
February 24-27, 2019, Siegen, Germany
Measuring Service Encounter Satisfaction with
Customer Service Chatbots using Sentiment Analysis
Jasper Feine1, Stefan Morana1, and Ulrich Gnewuch1
1 Karlsruhe Institute of Technology, Institute of Information Systems and Marketing (IISM),
Karlsruhe, Germany
{jasper.feine,stefan.morana,ulrich.gnewuch}@kit.edu
Abstract. Chatbots are software-based systems designed to interact with humans
using text-based natural language and have attracted considerable interest in
online service encounters. In this context, service providers face the challenge of
measuring chatbot service encounter satisfaction (CSES), as most approaches are
limited to post-interaction surveys that are rarely answered and often biased. As
a result, service providers cannot react quickly to service failures and dissatisfied
customers. To address this challenge, we investigate the application of automated
sentiment analysis methods as a proxy to measure CSES. Therefore, we first
compare different sentiment analysis methods. Second, we investigate the
relationship between objectively computed sentiment scores of dialogs and
subjectively measured CSES values. Third, we evaluate whether this relationship
also exists for utterance sequences throughout the dialog. The paper contributes
by proposing and applying an automatic and objective approach to use sentiment
scores as a proxy to measure CSES.
Keywords: online customer service, chatbot, sentiment analysis, service
encounter satisfaction, correlation analysis
1 Introduction
Digital communication technologies have become an integral part of how organizations
interact with their customers [1]. Many companies offer online services via live chat
interfaces, which enable customers to directly interact with customer service employees
[2]. This type of text-based service encounter is a cost-effective service solution and
often the preferred way of communication for young people [3]. One technology that
is often deployed to assist service employees in online service encounters is the chatbot
[1]. Chatbots are software-based systems designed to interact with humans via
text-based natural language [4, 5] and can be found across industries (e.g., airlines, energy
providers). Gartner predicts that by 2020, 25% of all customer service organizations
will integrate this technology [6].
Despite their great potential, many customer service chatbots did not meet customer
expectations and led to service failures [7]. As a result, many service providers retired
their chatbots, as unsatisfactory online service encounters have negative effects on
word-of-mouth, loyalty, and intention to repurchase a product [8]. Ignoring customer
frustrations can strongly impede the performance of customer service encounters and
carries the risk that the service chatbot is perceived as cold, socially inept,
untrustworthy, and incompetent [9]. Therefore, service providers should identify
service encounters that were below customers’ expectations [10] and trigger service
recovery procedures (e.g., offering compensation). Such procedures can help to recover
from almost any service failure and increase trust, perception of fairness, and service
experience [10]. However, most approaches to identify dissatisfied customers in a text-
based online environment (e.g., chat, social media) are limited to post-interaction
surveys [11]. This is problematic as self-reported data can hardly be retrieved during
an interaction, is influenced by various biases [12], and only a few users are willing to
provide this kind of information [8, 11]. Therefore, we propose that an automated
method to measure chatbot service encounter satisfaction (CSES) during a customer-
chatbot interaction could help service providers to deal with these issues.
To develop such a method, we want to take advantage of the fact that written text is
associated with a person’s thoughts, emotions and motivations [13–15]. Humans write
differently when they are happy or frustrated and thus, written text by itself conveys
much information about a human [14]. Users who are less happy with a chatbot use less
assent, fewer positive, and more anger-related words and thus, express more negative
sentiments [16]. The analysis of such opinionated text can provide valuable information
about the user as opinions are “key influencers of our behaviors” [17, p. 2] and
sentiment and tonal polarity are inherent properties of human-human communication
and interaction [18, p. 1367].
As a manual analysis of expressed polarity in written text does not scale well to
larger datasets [19], automated sentiment analysis methods have been developed. These
methods are capable of automatically extracting positive or negative polarity expressed
in written text [20]. Moreover, current sentiment analysis methods have been found to
be very accurate and thus, seem to be a valid approach [21, 22]. However, research has
rarely applied sentiment analysis in human-computer interaction (HCI) so far [20, 23].
Most HCI studies focus on auditory and visual signals of humans as these transmit the
majority of communication-related information [20]. Moreover, most sentiment
analysis studies focus on the method itself [23]. As a result, there is a lack of
understanding on how to apply sentiment analysis in online chatbot service encounters
to obtain valuable information about the user and her/his CSES. Therefore, we
investigate the application of sentiment analysis methods for chatbots in online service
encounters by drawing on research that text-based communication by itself is rich in
informative signals [24] and that written language is influenced by emotions, intentions,
and thoughts [13–15]. More specifically, we argue that sentiment analysis of dialog
data can be used as an easy-to-use and objective proxy to measure CSES. Therefore,
our research project addresses the following research question:
How to measure service encounter satisfaction with a chatbot using sentiment
analysis methods?
To address this research question, we first compare different sentiment analysis
methods on an empirical level by analyzing the calculated sentiment scores for two
datasets. Next, we test for a potential correlation between sentiment scores and CSES
values that were measured using a survey-based approach in an online experiment. In
doing so, we first investigate this potential relationship on a dialog level and second on
an utterance level (i.e., single messages). This paper contributes by proposing and
applying an automatic and objective approach to use sentiment scores as a proxy to
measure CSES. Our proposed approach enables researchers and practitioners, such as
online customer service providers, to objectively and automatically retrieve valuable
information after and during an online service encounter.
2 Related Work
2.1 Customer Service Chatbots
Recent advances in technology and great business potential have led to an increased
interest in the development of conversational agents [5, 25]. Conversational agents are
software-based systems designed to converse with a user via natural language [4, 5].
Thereby, the user interacts with the conversational agent in a natural dialog and does
not use a predefined set of keywords or command phrases [4]. They can offer both
speech- and text-based interfaces and can also be visualized and animated (i.e.,
embodied conversational agent) [4]. Conversational agents that interact with the user
primarily via a text-based interface are often referred to as chatbots [5]. Chatbots can
be deployed on various communication channels, such as instant messaging platforms
(e.g., Line, Telegram, WeChat), websites, or on social media (e.g., Facebook, Twitter)
and are accessible from various devices (e.g., PCs, mobile phones) [4]. Since
Weizenbaum developed the first chatbot named ELIZA in 1966, much research has
been conducted and various chatbots have been deployed across industries [4, 5].
One of the reasons why both research and practice are increasingly using this
technology is the fact that chatbots interact in a human-like interaction style (i.e., use
natural language) and offer great business potential (i.e., 24/7 availability at low costs)
[4]. Therefore, chatbots are increasingly implemented in online service encounters as
many companies communicate with their customers via live-chats on their website or
on social media platforms [1, 2]. Chatbots could help to automate online customer
service, save costs, and enhance online experience [1, 26]. For example, instead of a
customer calling or chatting with a service employee, customers are now
communicating with a service chatbot [26]. In addition, chatbots can also take the role
of first tier support agents and assist customer service employees. Therefore, chatbots
can first start an online service encounter and then seamlessly hand over the conversation
to a human agent when required. This can lead to a great reduction of routine requests
usually handled by service employees.
2.2 Chatbot Service Encounter Satisfaction
Satisfaction is an often applied construct in information systems (IS) research to
evaluate the success and effectiveness of a system and it is particularly critical for the
success of service systems [27]. It reflects whether customers perceive a service as
pleasurable with regard to its consumption-related fulfilment [8]. High customer
satisfaction values are important to achieve long-term success, especially in highly
competitive markets, and therefore should have priority for any organization [8, 28].
Customer satisfaction is strongly impacted by the service encounter satisfaction,
which refers to the post-consumption evaluation of a service encounter [29, 30]. A
successful service encounter makes a company’s product incrementally more effective
and easier to use [28], influences the customer’s choice regardless of whether a service
is provided offline or online [31], and is linked to several desired outcomes such as
word-of-mouth, loyalty, and intention to repurchase a product [8, 32]. Thus, service
encounter satisfaction is a critical indicator for any organization [8, 28].
Service encounter satisfaction is influenced by several antecedents such as the
customization and flexibility of the service encounter, effective service recovery when
failures occur, and spontaneous delights (i.e., pleasing experiences customers do not
expect) [32]. In addition, various design elements of a chatbot influence CSES, such
as verbal communication cues (i.e., being polite, responsive, and showing mutual
understanding), level of expertise (i.e., a core attribute of a service employee), or visual
cues (e.g., an avatar) [29]. In an online context, the measurement of CSES is
often limited to follow-up surveys [11, 29]. Thus, CSES cannot be retrieved in real-
time, is often biased, and the surveys are only answered by a few users [11, 12].
Moreover, customers have a general “reluctance to share their sentiments with firms”
[8, p. 359] and thus, companies are often not able to react fast enough to dissatisfied
customers using service recovery procedures [10]. Failing to recover can result in lost
customers, negative word of mouth, decreased loyalty, and lower profits [28, 32].
2.3 Sentiment Analysis Methods
A common method within the natural language understanding literature is the analysis
of opinions and sentiments expressed in written text. This becomes meaningful as
research has shown that written text is clearly impacted by the user’s emotions,
intentions, and thoughts [13–15]. Consequently, written text says something about us
and can be used as a proxy for information about the author. Therefore, various methods
have been developed to analyze the opinions and sentiments expressed in written text
[21]. These methods are named and defined in many different ways (e.g., sentiment
analysis, opinion mining, see [33]). As it is the most common name, we follow [17, 33]
and define sentiment analysis as the computational analysis of written language to
identify the user’s perceived positive or negative valence towards a certain entity (e.g.,
product, service, event). Sentiment analysis has recently attracted great attention,
because of the large availability of opinion-rich resources on the Internet (e.g., online
reviews) and advances in artificial intelligence [17]. Consequently, many of the major
technology companies offer sentiment analysis solutions (e.g., IBM, Google) and also
various open source solutions are available (see [21]). This led to the development of
many available and precise methods (see [20, 21]).
Sentiment analysis methods can be generally distinguished into two broad but also
overlapping approaches, namely the application of semantic rules or statistical methods
[20]. Methods of the first category compare sentiment-related expressions with
sentiment lexicons that contain the semantic orientation of words [34]. One of the
greatest challenges of these methods is that the semantic orientation of individual words
does not necessarily correspond to the contextual polarity of the whole sentence [34].
Therefore, it is necessary to extract additional linguistic patterns of the text by
conducting morpho-syntactic text analyses (i.e., wordform, lemma, part of speech tags)
[20]. Too specific extraction patterns, however, limit the application range to a specific
domain. Methods of the second, more recently applied category use unsupervised or
supervised machine learning algorithms including support vector machines and Bayes
classifiers [20]. These methods enable the development of more generic models, but
require labeled data for training purposes. Consequently, the quality of such models is
heavily influenced by the reliability of sentiment annotations [20].
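To make the first category more concrete, the following is a minimal Python sketch of a lexicon-based scorer with a single negation rule. The lexicon entries, their valence values, and the function name `lexicon_score` are invented for illustration only; this is not the lexicon or the rule set of any method discussed in this paper.

```python
# Toy sentiment lexicon: hypothetical words and valence values, for illustration only.
TOY_LEXICON = {"good": 2.0, "great": 3.0, "happy": 2.0, "bad": -2.0, "terrible": -3.0}
NEGATIONS = {"not", "no", "never"}


def lexicon_score(text):
    """Average valence of the lexicon words in `text`.

    A negation word flips the sign of the next lexicon word that follows it.
    Returns 0.0 when no lexicon word is found.
    """
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    values = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in TOY_LEXICON:
            valence = TOY_LEXICON[tok]
            values.append(-valence if negate else valence)
            negate = False
    return sum(values) / len(values) if values else 0.0


positive = lexicon_score("The plan is good.")
negated = lexicon_score("The plan is not good.")
```

The negation rule illustrates why word-level orientation alone is not enough: without it, both example sentences would receive the same score even though their contextual polarity is opposite.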
Today’s applications of sentiment analysis are manifold. Sentiment analysis can be
used to predict the success of political campaigns [35], identify interaction problems
within a conversation corpus [11], or even to scan the dark web in an intelligence
context [36]. Nevertheless, only a few studies have analyzed sentiments in a chatbot
context so far, as most studies focus on the method itself [20, 23]. One reason is the
difficulty of classifying rather short informal chat messages, which include a high degree
of language creativity, spelling mistakes, and the expression of sentiments without real
intentions [19]. Another reason is the differences and ambiguities in human mood
coding, which make it difficult to create a gold standard [37] and thus to
develop user-independent prediction models [38]. However, some related research has
already applied sentiment analysis to infer the customer satisfaction from product
reviews for shopping websites and mobile services [23, 39].
3 Research Method
To answer our research question, we first selected suitable dialog corpora and sentiment
analysis methods to run our analyses. Then, we defined a three-step research approach
to analyze the corpora in order to answer our research question.
3.1 Dialog Corpora and Sentiment Methods
First, we collected one dialog corpus from an online experiment in a customer service
context [40]. The participants (n = 79, mean age = 28.835, SD age = 6.388) were given
a fictive mobile phone bill and the experimental task was to find a more suitable mobile
phone plan through interacting with a customer service chatbot. The chatbot asked
several consumption-related questions and was capable of responding interactively to
given user queries. After the interaction, all participants were asked to complete a
questionnaire measuring CSES using an established measurement instrument on a 7-
point Likert scale [29]. The construct displayed a sufficient composite reliability (CR)
above 0.8 (CR = 0.814) and the average variance extracted was above 0.5. All
measurement items had factor loadings above 0.7 and the mean CSES value was 4.924
(SD = 1.179). The complete experiment, all dialogs, as well as the questionnaire were
in English. The complete dialog corpus consists of 79 user dialogs with a total of 1416
user utterances. We removed 353 utterances because they consisted of only mobile
contract related numbers. The final corpus included 79 dialogs and a total of 1063
utterances with an average of 13.456 utterances per dialog (SD = 8.312). We refer to
this dialog corpus as “ExpCorpus” in the remainder of this paper.
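The removal of numbers-only utterances can be sketched in a few lines of Python. The exact exclusion rule is not specified above, so the rule used here (an utterance contains digits but no letters) and the names `is_numbers_only` and `filter_utterances` are assumptions made for illustration.

```python
import re


def is_numbers_only(utterance):
    """Assumed rule: the utterance contains at least one digit and no letters."""
    has_digit = re.search(r"\d", utterance) is not None
    has_letter = re.search(r"[A-Za-z]", utterance) is not None
    return has_digit and not has_letter


def filter_utterances(utterances):
    """Keep only utterances that are not numbers-only."""
    return [u for u in utterances if not is_numbers_only(u)]


# Made-up sample utterances, not taken from ExpCorpus.
sample = ["Hi", "1000", "I use about 2 GB", "35.50"]
kept = filter_utterances(sample)  # ["Hi", "I use about 2 GB"]
```

Applied to the counts reported above, removing 353 of the 1416 utterances leaves the 1063 utterances of the final corpus.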
In addition to ExpCorpus, we used a second, publicly available dialog corpus
(without CSES values) in order to have a greater basis for the comparison of different
sentiment analysis methods. Therefore, we selected the ConvAI dialog corpus [41].
In total, 500 volunteers chatted with ten chatbots, and the dialog set is freely available as a JSON
file. The dataset includes 2778 dialogs from which we excluded 441 human-human
dialogs, 102 empty dialogs, 54 bot only dialogs, and one numbers-only dialog. This
resulted in the extraction of 2180 human-chatbot dialogs, which were neither empty nor
contained only numbers. Finally, we extracted all 12482 human written utterances. We
refer to this dialog corpus as “ConvAI” in the remainder of this paper.
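The four exclusion criteria can be expressed as a simple filter. The sketch below is in Python and uses a simplified, hypothetical record layout (`users`, `messages`, `sender`, `type`); it does not reproduce the actual schema of the ConvAI JSON file, and the "no letters" check is only an assumed stand-in for the numbers-only criterion.

```python
def human_texts(dialog):
    """Return the texts written by human participants of one dialog record."""
    human_ids = {u["id"] for u in dialog["users"] if u["type"] == "human"}
    return [m["text"] for m in dialog["messages"] if m["sender"] in human_ids]


def keep_dialog(dialog):
    """Apply the four exclusion criteria to one (hypothetical) dialog record."""
    if not dialog["messages"]:
        return False  # empty dialog
    user_types = sorted(u["type"] for u in dialog["users"])
    if user_types != ["bot", "human"]:
        return False  # not a human-chatbot pairing, e.g., human-human
    texts = human_texts(dialog)
    if not texts:
        return False  # only the bot wrote messages
    if not any(ch.isalpha() for text in texts for ch in text):
        return False  # human messages contain no letters (treated as numbers-only)
    return True


# Made-up sample records covering each criterion.
users_hb = [{"id": "A", "type": "human"}, {"id": "B", "type": "bot"}]
users_hh = [{"id": "A", "type": "human"}, {"id": "C", "type": "human"}]
dialogs = [
    {"id": 1, "users": users_hb, "messages": [{"sender": "A", "text": "Hello"},
                                              {"sender": "B", "text": "Hi!"}]},
    {"id": 2, "users": users_hh, "messages": [{"sender": "A", "text": "Hello"}]},
    {"id": 3, "users": users_hb, "messages": []},
    {"id": 4, "users": users_hb, "messages": [{"sender": "B", "text": "Hi!"}]},
    {"id": 5, "users": users_hb, "messages": [{"sender": "A", "text": "42"}]},
]
kept_ids = [d["id"] for d in dialogs if keep_dialog(d)]
```

In this sample only the first record survives. The reported counts are consistent with the same logic: 2778 - 441 - 102 - 54 - 1 = 2180 dialogs.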
To select appropriate sentiment analysis methods for our study, we reviewed two
benchmark analyses [21, 42]. We followed the benchmark analysis of [21], which
compared 24 open source methods, as well as the benchmark analysis of [42], which
also included sentiment analysis methods from major technology companies (e.g., IBM,
Microsoft). The benchmark analyses reveal that there is no superior sentiment analysis
method because all tools perform differently depending on the specific context they are
applied on or depending on the corresponding data source on which they were trained
[21]. Consequently, both benchmarks reveal several suitable methods depending on the
respective context and the training data [21]. The benchmark of [21] reveals that two
of the best sentiment analysis methods providing numerical polarity for negative,
neutral, and positive sentiments are VADER [43] and AFINN (i.e., an extension of
ANEW [44]) [21]. VADER and AFINN are rule-based sentiment analysis methods,
which use rules and heuristics to match the analyzed texts to sentiment lexicons. Both
lexicons were developed and trained on social media content and Twitter data [21, 43].
The benchmark analysis of [42] reveals that the sentiment analysis methods by IBM
Watson, Google Cloud, and Microsoft Azure perform best with varying types of
datasets [42]. These sentiment analysis methods leverage machine learning
classification algorithms in order to predict the sentiment score. Therefore, all three
providers trained their algorithms on an extensive body of sentiment annotated text
databases [42]. To cover both types of sentiment analysis techniques, namely semantic
rules and statistical methods [20], we selected the following methods for our study: two
open source methods using rule-based sentiment analysis methods (i.e., VADER,
AFINN) and three commercial methods using machine learning classification
algorithms (i.e., IBM Watson, Google Cloud, and Microsoft Azure). We calculated the
sentiment scores for each of the open source methods using the web service ifeel 2.0
provided by [22] and for each of the commercial methods using their Node.js APIs.
3.2 Research Approach
In this section, we present our three-step research approach (see Table 1) to answer our
research question and to investigate the potential correlation between sentiments and
CSES. All analyses were conducted using R 3.5.0.
Table 1. Research approach

| Step | Research method | Dialog corpora |
|---|---|---|
| 1 | Comparison of sentiment analysis methods | ConvAI (dialog & utterance level), ExpCorpus (dialog & utterance level) |
| 2 | Correlation analysis between sentiment scores and CSES values | ExpCorpus (dialog level) |
| 3 | Exploratory analysis of sentiment scores and CSES values | ExpCorpus (utterance level) |

In the first step, we compared all selected sentiment methods because the accuracy of
sentiment analysis methods is highly context- and data-dependent. Therefore, we
investigated whether sentiment scores from each tool are similar on a dialog and
utterance level by calculating the sentiment scores for each dialog and each single
utterance of both corpora with all five methods. Next, we tested for potential
correlations among the five sentiment scores. We conducted this analysis on a dialog and
utterance level as sentiment analysis methods seem to perform better on carefully
authored, lengthier content, but often struggle when faced with informal online
communication [19, p. 318]. Consequently, we assume that some sentiment methods
may struggle to predict the sentiment scores of rather short utterances and that the
methods perform quite differently on both levels.
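The pairwise comparison in this step can be sketched as follows. The paper's analyses were run in R; this Python version is only an illustration of the computation, and the method names and score values are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(scores):
    """Lower-triangular table of pairwise correlations, keyed by (row, column)."""
    names = list(scores)
    return {
        (row, col): pearson_r(scores[row], scores[col])
        for i, row in enumerate(names)
        for col in names[:i]
    }


# Hypothetical sentiment scores of four dialogs from three methods.
toy_scores = {
    "method_a": [0.1, 0.4, -0.2, 0.8],
    "method_b": [0.0, 0.5, -0.1, 0.6],
    "method_c": [0.3, 0.2, -0.4, 0.9],
}
table = correlation_table(toy_scores)
```

Each method contributes one score per dialog (or per utterance), so the same function covers both analysis levels; only the input lists change.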
In the second step, we tested for a correlation between sentiment scores and CSES
values. Therefore, we standardized the sentiment scores to a range between -1 (i.e., negative) and +1
(i.e., positive) and subsequently tested for a correlation between sentiment scores (from
all five methods) and CSES values using the dialogs and satisfaction data of
ExpCorpus. By doing this, we aimed to reveal whether sentiment scores are a valid
proxy for CSES values.
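The standardization procedure is not specified in detail here. One plausible reading is a linear rescaling of each method's native score range to [-1, +1], sketched below in Python; the function name `rescale` and the example ranges are assumptions, not the authors' documented procedure.

```python
def rescale(score, lo, hi):
    """Linearly map `score` from the range [lo, hi] to [-1, +1]."""
    return 2.0 * (score - lo) / (hi - lo) - 1.0


# Assumed example ranges: a score reported on a 0-to-1 scale
# and a score reported on a -5-to-+5 scale.
neutral_on_unit_scale = rescale(0.5, 0.0, 1.0)    # midpoint maps to 0.0
most_negative_valence = rescale(-5.0, -5.0, 5.0)  # lower bound maps to -1.0
```

Because Pearson's r is unchanged by positive linear transformations, such a rescaling would affect how the scores are compared and plotted across methods, but not the correlation of any single method's scores with the CSES values.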
In the third step, we investigated the minimum number of utterances required to
show a correlation between sentiment scores and CSES values. For this analysis, we
used IBM’s sentiment method because it yielded the highest correlation in the previous
step. Therefore, we extracted utterance sequences of each dialog, calculated their
sentiment scores, and tested for a correlation between sentiment scores and CSES
values. Next, we investigated whether these findings also hold for utterance sequences
throughout the whole dialog. This analysis provides insights whether sentiment scores
can be used as a proxy for CSES during a customer service encounter.
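The extraction of growing utterance sequences can be sketched as follows in Python. The helper names are hypothetical, and joining the utterances into one text before scoring is an assumption; the sequence could equally be scored in another way.

```python
def prefix_sequences(utterances, max_len=10):
    """Map k to the first k utterances, for every k the dialog is long enough for."""
    longest = min(len(utterances), max_len)
    return {k: utterances[:k] for k in range(1, longest + 1)}


def sequence_text(sequence):
    """Assumed scoring input: the utterances of a sequence joined into one text."""
    return " ".join(sequence)


# Made-up dialog with six user utterances.
dialog = ["u1", "u2", "u3", "u4", "u5", "u6"]
sequences = prefix_sequences(dialog)
first_three = sequence_text(sequences[3])  # "u1 u2 u3"
```

A dialog with fewer than k utterances simply yields no sequence of length k, so the number of dialogs available for the correlation shrinks as the sequences get longer.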
4 Results
Step 1: Comparison of Sentiment Analysis Methods
In the first step, we started our analysis by comparing the calculated sentiment scores
of selected sentiment analysis methods for both dialog corpora (ConvAI and
ExpCorpus). Table 2 contains the correlation analysis between sentiment scores of both
corpora for each dialog and single utterance calculated by all five methods.
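Table 2. Pearson correlation analyses among sentiment scores of different methods

| Corpus | Method | Dialog level: AFINN | Dialog level: VADER | Dialog level: IBM | Dialog level: Microsoft | Utterance level: AFINN | Utterance level: VADER | Utterance level: IBM | Utterance level: Microsoft |
|---|---|---|---|---|---|---|---|---|---|
| ConvAI | AFINN | - | | | | - | | | |
| | VADER | .605*** | - | | | .322*** | - | | |
| | IBM | .385*** | .317*** | - | | .387*** | .357*** | - | |
| | Microsoft | .368*** | .356*** | .533*** | - | .300*** | .311*** | .604*** | - |
| | Google | .369*** | .295*** | .504*** | .395*** | .414*** | .388*** | .600*** | .497*** |
| | n | 2180 dialogs | | | | 12482 utterances | | | |
| ExpCorpus | AFINN | - | | | | - | | | |
| | VADER | .719*** | - | | | .597*** | - | | |
| | IBM | .508*** | .512*** | - | | .505*** | .169*** | - | |
| | Microsoft | .473*** | .467*** | .600*** | - | .383*** | .201*** | .615*** | - |
| | Google | .516*** | .366*** | .521*** | .537*** | .653*** | .439*** | .625*** | .487*** |
| | n | 79 dialogs | | | | 1063 utterances | | | |

*** p < .001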
The results reveal that sentiment scores of dialog data from both corpora are at least
moderately positively correlated with each other [45] (ConvAI: .295 ≤ r ≤ .605, n =
2180, p < .001; ExpCorpus: .366 ≤ r ≤ .719, n = 79, p < .001). The strongest correlation
for ConvAI dialogs was identified between VADER’s and AFINN’s sentiment scores
(r = .605, n = 2180, p < .001) and the weakest between VADER’s and Google’s sentiment
scores (r = .295, n = 2180, p < .001). The strongest correlation for ExpCorpus was again
identified between VADER’s and AFINN’s sentiment scores (r = .719, n = 79, p <
.001) and the weakest one between VADER’s and Google’s sentiment scores (r = .366, n
= 79, p < .001). All sentiment scores on an utterance level were significantly
positively correlated, but some correlations were weaker among some methods than
they were on a dialog level (.169 ≤ r ≤ .653, p < .001). All in all, the findings reveal
that sentiment methods using similar methodologies to identify the expressed polarity
in a given text provide rather similar results. Thus, methods using semantic rules such
as VADER and AFINN are strongly correlated on a dialog level. Moreover, methods
using machine learning classification algorithms such as IBM’s, Microsoft’s, and Google’s
methods are at least moderately correlated on a dialog and utterance level.
Step 2: Correlation Analysis Between Sentiment Scores and CSES Values
In the second step, we tested for a correlation between sentiment scores and CSES
values using the dialogs and CSES values of ExpCorpus. The results and the
corresponding scatterplots are displayed in Figure 1. The analysis reveals a significant
moderate to strong correlation between sentiment scores (from all five methods) and
CSES values (.405 ≤ r ≤ .513, n = 79, p < .001) [45]. Thus, we conclude that there is a
moderate positive correlation between sentiment scores and CSES values for four
sentiment analysis methods and a strong positive correlation for IBM’s sentiment
method (r = .513, n = 79, p < .001) [45]. Moreover, the scatterplots suggest that sentiment
scores are a better predictor for positive than for negative CSES
values. In addition, semantic rule-based methods seem to score
the dialogs generally more positively.
| Sentiment method | Pearson’s correlation statistic |
|---|---|
| AFINN | r = .417, n = 79, p < .001 |
| VADER | r = .464, n = 79, p < .001 |
| IBM | r = .513, n = 79, p < .001 |
| Microsoft | r = .461, n = 79, p < .001 |
| Google | r = .405, n = 79, p < .001 |

Figure 1. Correlation analyses between sentiment scores (of dialogs) and CSES values for ExpCorpus
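As a plausibility check of the reported significance levels, the correlation coefficients can be converted into a test statistic. The test used is not named here; the worked example below assumes the standard t-test for Pearson's r with n - 2 degrees of freedom (the default of R's `cor.test`), applied to the weakest of the five correlations (r = .405, n = 79).

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.405 \cdot \sqrt{77}}{\sqrt{1 - 0.405^{2}}}
  \approx \frac{3.554}{0.914}
  \approx 3.89
```

With 77 degrees of freedom, the two-sided critical value at the .001 level is roughly 3.4, so t ≈ 3.89 is consistent with the reported p < .001. Under the same assumption, the strongest correlation (r = .513) yields t ≈ 5.24.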
Step 3: Exploratory Analysis of Sentiment Scores and CSES Values
In the third step, we investigated the minimum number of utterances required to show
a significant positive correlation between sentiment scores and CSES values. Therefore,
we combined the first ten utterances (u_i, i = 1, …, 10) into ten different utterance
sequences (US_i, i = 1, …, 10), calculated their sentiment scores, and tested for a
correlation with CSES values. The results are summarized in Table 3.
Table 3. Correlation analyses between sentiment scores (of utterance sequences) and CSES values for ExpCorpus

| Analyzed utterance sequence | Included dialogs | Included utterances | Included words | Pearson’s correlation statistic |
|---|---|---|---|---|
| US1 = {u1} | 79 | 79 | 739 | n = 79, r = .018, p = .872 |
| US2 = {u1, u2} | 79 | 158 | 1086 | n = 79, r = .133, p = .244 |
| US3 = {u1, u2, u3} | 79 | 237 | 1360 | n = 79, r = .234, p = .038 |
| US4 = {u1, …, u4} | 79 | 316 | 1574 | n = 79, r = .251, p = .026 |
| US5 = {u1, …, u5} | 79 | 395 | 1779 | n = 79, r = .372, p < .001 |
| US6 = {u1, …, u6} | 75 | 450 | 1989 | n = 75, r = .437, p < .001 |
| US7 = {u1, …, u7} | 74 | 518 | 2159 | n = 74, r = .480, p < .001 |
| US8 = {u1, …, u8} | 68 | 544 | 2350 | n = 68, r = .443, p < .001 |
| US9 = {u1, …, u9} | 58 | 522 | 2495 | n = 58, r = .506, p < .001 |
| US10 = {u1, …, u10} | 46 | 460 | 2633 | n = 46, r = .503, p < .001 |
| All dialogs with all utterances | 79 | 1060 | 3431 | n = 79, r = .513, p < .001 |

Please note that not all dialogs included up to ten user utterances. As a consequence, the number of analyzed dialogs
decreases with increasing sequence length. The last row analyzes all dialogs including all utterances of each dialog.
The analysis reveals that the sentiment scores of US1 and US2 have no significant
correlation with the CSES values. However, the correlation increases with an increasing
number of utterances combined in each sequence. Our results show a significant weak
positive correlation between sentiment scores and CSES values after the analysis of the
first three utterances (r = .234, n = 79, p = .038). Moreover, we found a significant
moderate positive correlation (r = .372, n = 79, p < .001) after the analysis of the first
five utterances [45]. To provide a greater understanding of these findings, Table 4
provides some exemplary utterance sequences, their sentiment scores, and the measured
CSES values.
Table 4. Exemplary utterance sequences including the first three utterances

| Utterance sequence | Sentiment score | CSES value |
|---|---|---|
| {“Hi”, “Nice to meet you. I’m interested in a cheaper phone plan. Can you help me?”, “I think it is SuperMobile”} | 0.769 | 6 |
| {“Hey, I’m currently on the mobile phone plan Yellow Basic 1000 and I received an unexpectedly high mobile phone billl last month.”, “Are there any better mobile phone plans for me?”, “It’s SuperMobile Yellow Basic 1000”} | 0.488 | 6 |
| {“My bill is too high”, “Help me to find a new mobile phone plan”, “I dont know”} | -0.566 | 4.333 |

Having shown that at least the first three utterances of a dialog are required to find a
significant positive correlation between sentiment scores and CSES values, we further
investigated whether this correlation can also be found for all utterance sequences
throughout the whole dialogs. Therefore, we extracted all consecutive utterance
sequences consisting of three or five utterances within the first ten utterances of each
dialog. This extraction resulted in eight three-utterance and six five-utterance sequences
(14 in total) for dialogs that were at least that long (e.g., Seq-1 = {u1, u2, u3},
Seq-2 = {u2, u3, u4}, Seq-9 = {u1, u2, u3, u4, u5}). Then we calculated the sentiment scores and tested for a correlation
between sentiment scores and CSES values (see Table 5).
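The extraction of consecutive utterance sequences corresponds to a sliding window over the first ten utterances. A minimal Python sketch, with the hypothetical helper name `consecutive_windows` and made-up utterance labels:

```python
def consecutive_windows(utterances, size, limit=10):
    """All runs of `size` consecutive utterances within the first `limit` utterances."""
    head = utterances[:limit]
    return [head[i:i + size] for i in range(len(head) - size + 1)]


# Made-up dialog with twelve user utterances; only the first ten are considered.
dialog = ["u%d" % i for i in range(1, 13)]
windows_of_three = consecutive_windows(dialog, 3)  # 8 windows: {u1,u2,u3} ... {u8,u9,u10}
windows_of_five = consecutive_windows(dialog, 5)   # 6 windows: {u1..u5} ... {u6..u10}
```

For a dialog with at least ten utterances this yields eight windows of three and six windows of five, which matches the 14 sequences analyzed. Shorter dialogs yield fewer windows, which is why the number of dialogs per sequence decreases at later positions.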
Table 5. Correlation analyses between sentiment scores (of consecutive utterance sequences) and CSES values for ExpCorpus

| Sequence | Included utterances | Pearson’s correlation statistic |
|---|---|---|
| Seq-1 | {u1, u2, u3} | n = 79, r = .234, p = .038 |
| Seq-2 | {u2, u3, u4} | n = 79, r = .243, p = .031 |
| Seq-3 | {u3, u4, u5} | n = 79, r = .289, p = .010 |
| Seq-4 | {u4, u5, u6} | n = 75, r = .350, p = .002 |
| Seq-5 | {u5, u6, u7} | n = 74, r = .410, p < .001 |
| Seq-6 | {u6, u7, u8} | n = 68, r = .267, p = .029 |
| Seq-7 | {u7, u8, u9} | n = 58, r = .501, p < .001 |
| Seq-8 | {u8, u9, u10} | n = 46, r = .501, p < .001 |
| Seq-9 | {u1, u2, u3, u4, u5} | n = 79, r = .372, p < .001 |
| Seq-10 | {u2, u3, u4, u5, u6} | n = 75, r = .407, p < .001 |
| Seq-11 | {u3, u4, u5, u6, u7} | n = 74, r = .436, p < .001 |
| Seq-12 | {u4, u5, u6, u7, u8} | n = 68, r = .377, p = .002 |
| Seq-13 | {u5, u6, u7, u8, u9} | n = 58, r = .503, p < .001 |
| Seq-14 | {u6, u7, u8, u9, u10} | n = 46, r = .325, p = .028 |

Please note that not all dialogs included up to ten user utterances. As a consequence, the number of analyzed dialogs
decreases with increasing utterance position.
The analysis shows that sentiment scores of all utterance sequences throughout the
whole dialog are positively correlated with the CSES values. All correlations are
significant at least at the p < .05 level. The correlation strength varies among the different
sequences between weak and strong. However, both the minimum and the maximum
correlation strength are higher for sequences consisting of five consecutive
utterances, which always had at least a moderate positive correlation with CSES values.
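To make the reported statistics easier to retrace, the following sketch computes Pearson's r, the corresponding t statistic, and a verbal strength label. It is an illustration only: the function names are introduced here, and the thresholds of .10, .30, and .50 for weak, moderate, and strong are an assumption based on Cohen's conventions [45].

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant samples")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def strength_label(r: float) -> str:
    """Verbal label; thresholds are assumed from Cohen's conventions."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "negligible"


if __name__ == "__main__":
    # Values for Seq-1, Seq-14, and Seq-13 as reported in Table 5.
    for name, r, n in [("Seq-1", 0.234, 79), ("Seq-14", 0.325, 46), ("Seq-13", 0.503, 58)]:
        print(name, strength_label(r), round(t_statistic(r, n), 3))
```

Under these assumed thresholds, the lowest five-utterance correlation (Seq-14, r = .325) is labeled moderate, whereas the lowest three-utterance correlation (Seq-1, r = .234) is labeled weak.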
5 Discussion
In this paper, we investigate whether sentiment scores from textual input can be used
as a proxy to measure CSES in a customer-chatbot interaction. Therefore, we followed
a three-step research approach: first, we compared five sentiment analysis methods by
testing the relation of sentiment scores from two dialog corpora. Second, we tested for
a correlation between sentiment scores and CSES values. Third, we analyzed this
correlation in detail at the utterance level. Results of step one reveal a significant
positive correlation among sentiment scores from all selected sentiment analysis
methods. Results of step two reveal that sentiment scores of complete dialogs are
significantly positively correlated with the subjectively measured CSES values. Results
of step three reveal that this relation is not only valid for the analysis of an entire dialog,
but also for any sequence of at least three consecutive utterances throughout the entire
dialog. Thus, we conclude that sentiment scores can be used as an automatic and
objective proxy to measure CSES in an online service encounter. Our
findings thus add to existing research stating that sentiment analysis
corresponds surprisingly well with emotional self-report [15, p. 87].
The results of our analysis have implications for the design of customer service
chatbots. As customers may express their frustrations in written language, future
chatbots could continuously perform sentiment analyses and use sentiment scores as a
proxy to identify dissatisfied customers (by analyzing at least three consecutive
utterances). In this way, service providers can intervene to reduce the risk of service
failures. For example, a customer service chatbot could recognize that the current
conversation with a customer is turning towards a negative sentiment score. In this case,
several strategies could be triggered. The chatbot could seamlessly hand over the
conversation to a trained human service agent, automatically trigger service recovery
procedures, or express certain verbal utterances such as excuses [46, 47]. Research has
shown that these immediate reactions can reduce the level of frustration [46] and can
lead to an increased interaction length [47]. Furthermore, service providers can use this
data in post-interaction analyses to retrieve valuable information about CSES. This
information can be used not only for service recovery, but also for identifying general
weaknesses in the service quality of the chatbot.
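A continuous check of this kind could be sketched as follows. This is a hedged illustration, not a design proposed in this paper: the class `SentimentMonitor`, the threshold of -0.3, and the toy lexicon scorer are all hypothetical, and in practice the scoring function would be one of the sentiment analysis methods compared in our analysis.

```python
from collections import deque
from typing import Callable, Deque

# Hypothetical toy lexicon, used only to keep the sketch self-contained.
TOY_LEXICON = {
    "nice": 0.6, "thanks": 0.6, "help": 0.2,
    "wrong": -0.5, "useless": -0.8, "frustrating": -0.8,
}


def toy_score(text: str) -> float:
    """Mean lexicon value of the known words in `text`; 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


class SentimentMonitor:
    """Flags a dialog once the last `window` utterances score below a threshold."""

    def __init__(self, score_fn: Callable[[str], float],
                 window: int = 3, threshold: float = -0.3) -> None:
        self._score_fn = score_fn
        self._window = window
        self._threshold = threshold  # assumed cut-off, not taken from the paper
        self._recent: Deque[str] = deque(maxlen=window)

    def observe(self, utterance: str) -> bool:
        """Add a user utterance; return True if an intervention should be triggered."""
        self._recent.append(utterance)
        if len(self._recent) < self._window:
            return False  # fewer than three utterances: no reliable signal yet
        return self._score_fn(" ".join(self._recent)) < self._threshold


if __name__ == "__main__":
    monitor = SentimentMonitor(toy_score)
    for text in ["This is useless", "The answer is wrong", "So frustrating!"]:
        if monitor.observe(text):
            print("Trigger handover or service recovery after:", text)
```

The window of three utterances mirrors the minimum sequence length for which a significant correlation was found; the threshold would need to be calibrated on real dialog data.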
Although we aimed to ensure a high level of rigor in our research, some limitations should
be considered. First, many sentiment analysis methods exist, and they may all evaluate
a given text differently depending on the context and type of a message [21]. This
becomes even more relevant when applied to rather short and informal chat data.
Therefore, a different selection of sentiment methods may have led to different results.
We tried to minimize this risk by starting with a selection of five
sentiment methods based on benchmarks and comparing them with each other by
applying them to two dialog corpora. Even though all sentiment analysis methods had
a moderate to strong correlation to CSES, some sentiment methods were rather weak
predictors for users having low CSES values. Therefore, it is important that
researchers and companies perform experiments with different methods before
applying a method [21, p. 27]. Second, we analyzed a dialog corpus in which
CSES was measured using a post-interaction survey. However, data from a survey-based approach
might be influenced by various biases [12]. To reduce this risk, we reviewed all dialogs
and verified that participants followed the experimental task and did not answer with
straight-line responses. Third, we only analyzed the relationship between sentiment
and CSES based on a dialog corpus from a hypothetical online service task (i.e., finding
a new plan) in a specific context (i.e., mobile contract) in one language (i.e., English).
Therefore, it is unclear whether our findings also hold for other customer service tasks
(e.g., book ticket) in other contexts (e.g., airlines) in other languages (e.g., German).
Fourth, we conducted correlation analyses between sentiment scores and CSES values
to reveal a correlation between these two variables. Even though we found a strong
positive correlation and propose sentiment scores as a proxy for CSES values, this
analysis does not provide the explanation for this relation and does not indicate a cause-
and-effect relationship [48]. Thus, results need to be applied with care as we cannot
predict CSES based on sentiment scores or vice versa [48].
Considering these limitations, we identify several avenues for future research. First,
future work can replicate our analyses on additional dialog corpora from different
contexts, involving different tasks, and in different languages. This could further strengthen
the applicability of sentiment analysis as a proxy to measure CSES in several domains
and languages. Second, future studies could investigate adaptive reaction strategies
based on real-time analyses of at least three consecutive user utterances. This could
enable chatbots to recognize user frustrations and support the development of chatbots
that act more socially [46, 47]. Moreover, future research could investigate the
application of more trivial text analysis methods, such as word count and length of
dialogs, as well as more complex methods, such as topic modelling, as proxies to predict
customer satisfaction. Integrating these techniques into a chatbot could lead to an even
greater understanding of the user and enable more precise reactions by the chatbot.
6 Conclusion
In this paper, we investigate the application of sentiment analysis methods in an online
service encounter with a chatbot and show that sentiment scores can serve as a proxy
to measure CSES. This enables researchers and practitioners, such as online service
providers, to objectively and automatically retrieve user information during and after
an online service encounter. This information can be used not only to trigger service
recovery procedures, but also to identify weaknesses in the service quality and to
analyze the user in real time. Therefore, our results contribute towards the design of
user-adaptive service chatbots.
References
1. Larivière, B., Bowen, D., Andreassen, T.W., Kunz, W., Sirianni, N.J., Voss, C.,
Wünderlich, N.V., Keyser, A. de: “Service Encounter 2.0”: An investigation into the roles
of technology, employees and customers. Journal of Business Research 79, 238–246
(2017)
2. McLean, G., Osei-Frimpong, K.: Examining satisfaction with the experience during a live
chat service encounter-implications for website providers. Computers in Human Behavior
76, 494–508 (2017)
3. Kowatsch, T., Nißen, M., Rüegger, D., Stieger, M., Flückiger, C., Allemand, M.,
Wangenheim, F. von: The Impact of Interpersonal Closeness Cues in Text-based
Healthcare Chatbots on Attachment Bond and the Desire to Continue Interacting: An
Experimental Design. In: Twenty-Sixth European Conference on Information Systems
(ECIS). Portsmouth, UK (2018)
4. McTear, M., Callejas, Z., Griol, D.: The Conversational Interface. Talking to Smart
Devices. Springer International Publishing, Switzerland (2016)
5. Dale, R.: The return of the chatbots. Natural Language Engineering 22, 811–817 (2016)
6. Gartner: Gartner Says 25 Percent of Customer Service Operations Will Use Virtual
Customer Assistants by 2020, https://www.gartner.com/newsroom/id/3858564
7. Ben Mimoun, M.S., Poncin, I., Garnier, M.: Case study – Embodied virtual agents. An
analysis on reasons for failure. Journal of Retailing and Consumer Services 19, 605–612
(2012)
8. Oliver, R.L.: Satisfaction. A behavioral perspective on the consumer. McGraw Hill, New
York (1997)
9. Brave, S., Nass, C.: Emotion in human-computer interaction. In: Jacko, J.A., Sears, A.
(eds.) The human-computer interaction handbook, pp. 81–96. L. Erlbaum Associates Inc,
NJ, USA (2002)
10. Holloway, B.B., Beatty, S.E.: Service Failure in Online Retailing: A Recovery
Opportunity. Journal of Service Research 6, 92–105 (2003)
11. Xiang, Y., Zhang, Y., Zhou, X., Wang, X., Qin, Y.: Problematic situation analysis and
automatic recognition for chinese online conversational system. In: Joint Conference on
Chinese Language Processing. Wuhan, China (2014)
12. Podsakoff, P.M., MacKenzie, S.B., Lee, J.-Y., Podsakoff, N.P.: Common method biases in
behavioral research: a critical review of the literature and recommended remedies. The
Journal of applied psychology 88, 879–903 (2003)
13. Tausczik, Y.R., Pennebaker, J.W.: The Psychological Meaning of Words: LIWC and
Computerized Text Analysis Methods. Journal of Language and Social Psychology 29,
24–54 (2010)
14. Nerbonne, J.: The Secret Life of Pronouns. What Our Words Say About Us. Literary and
Linguistic Computing 29, 139–142 (2014)
15. Küster, D., Kappas, A.: Measuring Emotions Online: Expression and Physiology. In:
Holyst, J.A. (ed.) Cyberemotions: Collective Emotions in Cyberspace, pp. 71–93. Springer
International Publishing, Cham (2017)
16. Skowron, M., Rank, S., Theunis, M., Sienkiewicz, J.: The Good, the Bad and the Neutral:
Affective Profile in Dialog System-User Communication. In: D’Mello, S., Graesser, A.,
Schuller, B., Martin, J.-C. (eds.) Affective Computing and Intelligent Interaction, pp. 337–346.
Springer, Berlin, Heidelberg (2011)
17. Liu, B.: Sentiment analysis and opinion mining. Synthesis lectures on human language
technologies 5, 1–167 (2012)
18. Banchs, R.E.: On the construction of more human-like chatbots: Affect and emotion
analysis of movie dialogue data. In: Asia-Pacific Signal and Information Processing
Association Annual Summit and Conference. Kuala Lumpur (2017)
19. Brooks, M., Kuksenok, K., Torkildson, M.K., Perry, D., Robinson, J.J., Scott, T.J.,
Anicello, O., Zukowski, A., Harris, P., Aragon, C.R.: Statistical Affect Detection in
Collaborative Chat. In: Conference on Computer Supported Cooperative Work, pp. 317–328.
ACM, New York, NY, USA (2013)
20. Clavel, C., Callejas, Z.: Sentiment Analysis: From Opinion Mining to Human-Agent
Interaction. IEEE Transactions on affective computing 7, 74–93 (2016)
21. Ribeiro, F.N., Araújo, M., Gonçalves, P., André Gonçalves, M., Benevenuto, F.:
SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods.
EPJ Data Science 5, 23 (2016)
22. Diniz, J.P., Bastos, L., Soares, E., Ferreira, M., Ribeiro, F., Benevenuto, F.: ifeel 2.0: A
multilingual benchmarking system for sentence-level sentiment analysis. In: 10th
international AAAI conference on weblogs and social media. Cologne, Germany (2016)
23. Kang, D., Park, Y.: Review-based measurement of customer satisfaction in mobile service:
Sentiment analysis and VIKOR approach. Expert Systems with Applications 41, 1041–1050
(2014)
24. Walther, J.B., Parks, M.R.: Cues filtered out, cues filtered in. In: Knapp, M.L., Daly, J.A.
(eds.) Handbook of Interpersonal Communication, pp. 529–563. SAGE, Thousand Oaks,
CA, USA (2002)
25. Maedche, A., Morana, S., Schacht, S., Werth, D., Krumeich, J.: Advanced User Assistance
Systems. Business & Information Systems Engineering 58, 367–370 (2016)
26. Gnewuch, U., Morana, S., Maedche, A.: Towards Designing Cooperative and Social
Conversational Agents for Customer Service. In: Proceedings of the 38th International
Conference on Information Systems (ICIS). AISel, Seoul (2017)
27. Au, N., Ngai, E.W.T., Cheng, T.E.: A critical review of end-user information system
satisfaction research and a new research framework. Omega-International Journal of
Management Science 30, 451–478 (2002)
28. Jones, T.O., Sasser, W.E.: Why satisfied customers defect. Harvard Business Review 73,
88-& (1995)
29. Verhagen, T., van Nes, J., Feldberg, F., van Dolen, W.: Virtual Customer Service Agents.
Using Social Presence and Personalization to Shape Online Service Encounters. Journal of
Computer-Mediated Communication 19, 529–545 (2014)
30. Caruana, A.: Service loyalty. European Journal of Marketing 36, 811–828 (2002)
31. Shankar, V., Smith, A.K., Rangaswamy, A.: Customer satisfaction and loyalty in online
and offline environments. International Journal of Research in Marketing 20, 153–175
(2003)
32. Bitner, M.J., Brown, S.W., Meuter, M.L.: Technology Infusion in Service Encounters.
Journal of the Academy of Marketing Science 28, 138–149 (2000)
33. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Foundations and Trends in
Information Retrieval 2, 1–135 (2008)
34. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: An exploration of
features for phrase-level sentiment analysis. Computational Linguistics 35, 399–433
(2009)
35. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with
twitter: What 140 characters reveal about political sentiment. In: International AAAI
Conference on Weblogs and Social Media, 10, pp. 178–185. Menlo Park, CA, USA (2010)
36. Abbasi, A., Chen, H.: Affect Intensity Analysis of Dark Web Forums. In: Proceedings of
Intelligence and Security Informatics. Chengdu, China (2017)
37. Thelwall, M., Buckley, K., Paltoglou, G., Di Cai, Kappas, A.: Sentiment strength detection
in short informal text. Journal of the American Society for Information Science and
Technology 61, 2544–2558 (2010)
38. Higashinaka, R., Minami, Y., Dohsaka, K., Meguro, T.: Issues in Predicting User
Satisfaction Transitions in Dialogues: Individual Differences, Evaluation Criteria, and
Prediction Models. In: Lee, G.G., Mariani, J., Nakamura, S. (eds.) Spoken Dialogue
Systems for Ambient Environments. Second International Workshop, IWSDS 2010,
Gotemba, Shizuoka, Japan, October 1-2, 2010. Proceedings, 6392, pp. 48–60. Springer,
New York (2010)
39. Wang, Y., Lu, X., Tan, Y.: Impact of product attributes on customer satisfaction: An
analysis of online reviews for washing machines. Electronic Commerce Research and
Applications 29, 1–11 (2018)
40. Gnewuch, U., Morana, S., Adam, M., Maedche, A.: Faster Is Not Always Better:
Understanding the Effect of Dynamic Response Delays in Human-Chatbot Interaction. In:
Proceedings of the 26th European Conference on Information Systems (ECIS),
Portsmouth, United Kingdom, June 23-28 (2018)
41. Logacheva, V., Burtsev, M., Malykh, V., Poluliakh, V., Rudnicky, A., Serban, I., Lowe,
R., Prabhumoye, S., Black, A.W., Bengio, Y.: A Dataset of Topic-Oriented Human-to-
Chatbot Dialogues, http://convai.io/2017/data/dataset_description.pdf (Accessed:
30.08.2018)
42. Corredera Arbide, A., Romero, M., Moya Fernández, J.M.: Affective computing for smart
operations: a survey and comparative analysis of the available tools, libraries and web
services. International Journal of Innovative and Applied Research 5, 12–35 (2017)
43. Gilbert, C.H.E.: Vader: A parsimonious rule-based model for sentiment analysis of social
media text. In: Eighth International AAAI Conference on Weblogs and Social Media. Ann
Arbor, MI, USA (2014)
44. Nielsen, F.Å.: A new ANEW: Evaluation of a word list for sentiment analysis in
microblogs. arXiv preprint arXiv:1103.2903 (2011)
45. Cohen, J.: A power primer. Psychological bulletin 112, 155–159 (1992)
46. Hone, K.: Empathic agents to reduce user frustration. The effects of varying agent
characteristics. Interacting with Computers 18, 227–245 (2006)
47. Klein, J., Moon, Y., Picard, R.W.: This computer responds to user frustration. Theory,
design, and results. Interacting with Computers 14, 119–140 (2002)
48. Taylor, R.: Interpretation of the Correlation Coefficient: A Basic Review. Journal of
Diagnostic Medical Sonography 6, 35–39 (1990)
... Advances in technology led to the popularity of using conversational agents, such as chatbot agents, in organizations' marketing communication efforts (Dale, 2016). Chatbots are software-based and can interact with users via a text-based interface (Feine et al., 2019). Using a natural dialogue, where they do not operate based on some predetermined keywords or command phrases, chatbots are deployed on organizational websites, instant messaging apps, or social media, and can be easily made accessible (Karri & Kumar, 2020;McTear et al., 2016). ...
... Using a natural dialogue, where they do not operate based on some predetermined keywords or command phrases, chatbots are deployed on organizational websites, instant messaging apps, or social media, and can be easily made accessible (Karri & Kumar, 2020;McTear et al., 2016). Chatbots mimic real-life interpersonal communication, allowing users to feel comfortable launching a conversation and distinguish similarities between chatbots and themselves (Feine et al., 2019;Go & Sundar, 2019;Prasetya et al., 2018). Dialogic chatbot communication offers customers real-time information, feedback and fulfillment of their needs, alongside product or service consumption (Reinartz et al., in press). ...
... Apart from responsiveness, analyzing the degree to which chatbot agents can interact with customers in a conversational text-based natural language is key to measuring service satisfaction (Feine et al., 2019). Prior literature on user-conversation chatbot interaction has focused on examining interactive dialogue systems, specifically, the personalities that a conversational bot agent assumes while interacting with customers (Fadhil & Schiavo, 2019). ...
Article
The present study is grounded in social exchange theory and resource exchange theory. By exploring customers' satisfaction with chatbot services and their social media engagement, it examined the effects of responsiveness and a conversational tone in dialogic chatbot communication on customers. To test the proposed mediation model, we surveyed a representative sample of customers (N = 965) living in the U.S. After examining the validity and reliability of our measurement model, we tested the hypothesized model using structural equation modeling (SEM) procedures. All proposed hypotheses were supported, indicating the significant direct effects of (1) responsiveness and a conversational tone on customers' satisfaction with chatbot services, (2) customers' chatbot use satisfaction on social media engagement, (3) customers’ social media engagement on price premium and purchase intention, and (4) purchase intention on price premium. In addition, we examined satisfaction, social media engagement, and purchase intention as significant mediators in the proposed model. Theoretical and practical implications of the study were then discussed.
... 236 Research in this field is often centered on customer satisfaction. For example Chung et al. 237 considered this issue in relation to customer service CAs for luxury brands, whereas Feine et al. 238 studied how the customer experience can be assessed using sentiment analysis, concluding that automated sentiment analysis can act as a proxy for direct customer feedback. An important finding that must be taken into account when developing customer service CAs is that customers generally prefer systems that provide a quick and efficient solution to their problem, 239 and that embodiment does not always improve users' perception of the interaction. ...
Preprint
In this chapter, we provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing. The many different approaches for representing dialogue in CAs are discussed in some detail, along with methods for evaluating such agents, emphasizing the important topics of accountability and interpretability. A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology. Please find the preprint on arXiv: https://arxiv.org/abs/2202.03164
... The same goes for chatbots. A review of literature also suggests that there is a breadth of studies on chatbots as information support tools (Ranoliya et al. 2017;Tariverdiyeva, 2019;Balaji, 2019;Feine et al. 2019;Følstad & Brandtzaeg, 2020). Among other factors and features, these studies almost highlight the importance of a chatbot KB to properly accommodate users. ...
Article
Full-text available
In the educational domain, artificial intelligence (AI) is one of the information and communication technologies gaining popularity for its advantages in teaching and learning, especially in information support services. The University of the Philippines Open University (UPOU), as a leader of open and distance e-learning in the country, explored this technology and came up with its own tool to streamline its information support services. The UPOU chatbot, personified as Iska and IskOU, provides immediate and appropriate human-like conversations when prompted by users. The tool is able to deliver these conservations through its intelligence database or knowledge base, which is a result of a university-wide effort to collate relevant information. This chatbot intelligence influences user satisfaction as it is the basis of the tool’s performance. Therefore, the study aimed to evaluate the UPOU chatbot’s performance as an information support tool by determining the level of satisfaction of UPOU chatbot users. Data was collected through a post-interaction survey with the users and was analyzed using descriptive statistics and thematic analysis. Results showed mixed experiences among UPOU chatbot users. It was mainly reported that the tool has issues in interpretations and addressing complex, multiple, and specific/unique queries. Nonetheless, users evaluated the UPOU chatbot as a satisfying and helpful tool. A number of areas and topics for future investigations were also listed.
... For example, Diederich et al. (2019b) found that chatbots with sentiment-adaptive responses to emulate empathy in a service encounter provide a higher level of perceived humanness, social presence and service encounter satisfaction compared to chatbots with static responses in an experiment. Feine et al. (2019b) suggest that chatbots may serve as first tier support agents and hand over the service to a human service employee when needed. Building on this premise, Poser et al. (2021) propose a hybrid service recovery design with real-time handovers from chatbots to human service employees if chatbots' capabilities are exceeded. ...
Article
Full-text available
Interactions with conversational agents (CAs) become increasingly common in our daily life. While research on human-CA interactions provides insights into the role of CAs, the active role of users has been mostly neglected. We addressed this void by applying a thematic analysis approach and analysed 1000 interactions between a chatbot and customers of an energy provider. Informed by the concepts of social presence and social cues and using the abductive logic, we identified six human-chatbot interaction types that differ according to salient characteristics, including direction, social presence, social cues of customers and the chatbot and customer effort. We found that bi-directionality, a medium degree of social presence and selected social cues used by the chatbot and customers are associated with desirable outcomes in which customers mostly obtain requested information. The findings help us understand the nature of human-CA interactions in a customer service context and inform the design and evaluation of CAs.
... Ranoliya et al. [13] proposed a more classical XML-based approach for University-related queries, achieving impressive results for an automatic question-answering problem in educational support. Often, data received from customers is further analysed with sentiment analysis via either scoring and polarity [14], [15] or classification [16], [17]. In this study, the sequence of inputs and attention masking are considered, and so, although not explicitly scoring or classifying sentiment, valence data still exists within the dataset. ...
Preprint
Full-text available
With growing societal acceptance and increasing cost efficiency due to mass production, service robots are beginning to cross from the industrial to the social domain. Currently, customer service robots tend to be digital and emulate social interactions through on-screen text, but state-of-the-art research points towards physical robots soon providing customer service in person. This article explores two possibilities. Firstly, whether transfer learning can aid in the improvement of customer service chatbots between business domains. Secondly, the implementation of a framework for physical robots for in-person interaction. Modelled on social interaction with customer support Twitter accounts, transformer-based chatbot models are initially tasked to learn one domain from an initial random weight distribution. Given shared vocabulary, each model is then tasked with learning another domain by transferring knowledge from the prior. Following studies on 19 different businesses, results show that the majority of models are improved when transferring weights from at least one other domain, in particular those that are more data-scarce than others. General language transfer learning occurs, as well as higher-level transfer of similar domain knowledge in several cases. The chatbots are finally implemented on Temi and Pepper robots, with feasibility issues encountered and solutions are proposed to overcome them.
... CAs implemented with social cues [26] transfer behavioral and social signals while interacting within a user interface [41]. The interaction begins to feel more human-like, similar to a natural communication between two people [15,42]. This paradigm is reported by the research of Nass et al. [41], who identified that human-like CA characteristics trigger social responses by users, better known as the Computers Are Social Actors (CASA) paradigm. ...
Chapter
Conversational agents (CAs) are rapidly changing the way humans and computers interact. Through developments in natural language processing, CAs are increasingly capable of conducting human-like conversations with users. Furthermore, human-like features (e.g., having a name or an avatar) lead to positive user reactions as if they were interacting with a real human conversational partner. CAs promise to replace or supplement traditional interactions between humans (e.g., counseling, interviews). One field of CA-human interaction that is not yet fully understood in developing human-like CAs is donating to a good cause. Notably, many charities rely on approaching people on the streets to raise funds. Against this background, the questions arise: How should a CA for raising funds for non-profit organizations be designed and how does human-like design of a CA influence the user’s donation behavior. To explore these two questions, we conducted a 2 × 2 experiment with 134 participants.
... Popular conversational interfaces are voice assistants that react to spoken user input and chatbots, which are discussed here. In chatbot solutions, conversation typically takes place through typed text input and a front-end that can be, for example, embedded in a website or messaging solution [17]. Conversational design as a special discipline of interactive design deals with all tasks of designing conversational interfaces (e.g., stakeholder and goal definition, conversational flow design [16], actual development and testing) with the goal to provide a good user experience [18]. ...
Conference Paper
Full-text available
Chatbots are text-based dialogue systems that automate communication processes. Instead of communicating with a person, the user communicates with a computer system. Due to the use of Artificial Intelligence (AI) methods, such systems have become increasingly powerful in recent years and allow for more realistic dialogue processes. In particular, methods from the field of machine learning have contributed to an improved understanding of natural language. Nevertheless, such systems are not yet able to acquire the knowledge required to answer user queries independently. Dialogue structures and elements need to be defined as the conversational design of the chatbot. Herein, an user intent describes an information need or a goal that the user aims to achieve by entering text. For a user-centered chatbot design, a relevant set of intents must be identified and structured. In addition, training questions are required in order train the AI models for matching user input with the defined set of user intents. This article describes the procedure for developing chatbots using the example of an application in recruiting. The focus is on the appropriate identification and analysis of user intents. In our case study, the procedure for user-centered intent identification is described as well as approaches for the analysis and consolidation of intents. Furthermore, it is shown how corresponding measures affect the quality of intention identification.
Conference Paper
Full-text available
Working alliance describes an important relationship quality between health professionals and patients and is robustly linked to treatment success. However, due to limited resources of health professionals, working alliance cannot always be promoted just-in-time in a ubiquitous fashion. To address this scalability problem, we investigate the direct effect of interpersonal closeness cues of text-based healthcare chatbots (THCBs) on attachment bond from the working alliance con-struct and the indirect effect on the desire to continue interacting with THCBs. The underlying research model and hypotheses are informed by counselling psychology and research on conver-sational agents. In order to investigate the hypothesized effects, we first develop a THCB codebook with 12 design dimensions on interpersonal closeness cues that are categorized into visual cues (i.e. avatar), verbal cues (i.e. greetings, address, jargon, T-V-distinction), quasi-nonverbal cues (i.e. emoticons) and relational cues (i.e. small talk, self-disclosure, empathy, humor, meta-relational talk and continuity). In a second step, four distinct THCB designs are developed along the continuum of interpersonal closeness (i.e. institutional-like, expert-like, peer-like and myself-like THCBs) and a corresponding study design for an interactive THCB-based online experiment is presented to test our hypotheses. We conclude this work-in-progress by outlining our future work.
Conference Paper
Full-text available
A key challenge in designing conversational user interfaces is to make the conversation between the user and the system feel natural and human-like. In order to increase perceived humanness, many systems with conversational user interfaces (e.g., chatbots) use response delays to simu-late the time it would take humans to respond to a message. However, delayed responses may also negatively impact user satisfaction, particularly in situations where fast response times are expected, such as in customer service. This paper reports the findings of an online experiment in a customer service context that investigates how user perceptions differ when interacting with a chatbot that sends dynamically delayed responses compared to a chatbot that sends near-instant responses. The dynamic delay length was calculated based on the complexity of the re-sponse and complexity of the previous message. Our results indicate that dynamic response de-lays not only increase users’ perception of humanness and social presence, but also lead to greater satisfaction with the overall chatbot interaction. Building on social response theory, we provide evidence that a chatbot’s response time represents a social cue that triggers social re-sponses shaped by social expectations. Our findings support researchers and practitioners in understanding and designing more natural human-chatbot interactions.
Article
Full-text available
In this paper, we conduct an in-depth survey of the sentiment analysis tools currently available on the market. Our aim is to optimize the human response in datacenter operations, using a combination of research tools that allow us to decrease human error in general operations when managing complex infrastructures. The use of sentiment analysis tools is the first step toward extending our capabilities for optimizing the human interface. Using different data collections from a variety of data sources, our research provides a very interesting outcome. In our final testing, we found that the three main commercial platforms (IBM Watson, Google Cloud and Microsoft Azure) achieve the same accuracy (89-90%) on the different datasets tested, based on artificial neural network and deep learning techniques. The other stand-alone applications or APIs, such as VADER or MeaningCloud, achieve a similar accuracy level on some of the datasets using a different approach, semantic networks such as ConceptNet, but the model can easily be optimized to above 90% accuracy just by adjusting some parameters of the semantic model. This paper points to future directions for optimizing datacenter operations management and decreasing human error in complex environments.
Conference Paper
Full-text available
The idea of interacting with computers through natural language dates back to the 1960s, but recent technological advances have led to a renewed interest in conversational agents such as chatbots or digital assistants. In the customer service context, conversational agents promise to create a fast, convenient, and cost-effective channel for communicating with customers. Although numerous agents have been implemented in the past, most of them could not meet the expectations and disappeared. In this paper, we present our design science research project on how to design cooperative and social conversational agents to increase service quality in customer service. We discuss several issues that hinder the success of current conversational agents in customer service. Drawing on the cooperative principle of conversation and social response theory, we propose preliminary meta-requirements and design principles for cooperative and social conversational agents. Next, we will develop a prototype based on these design principles.
Article
Full-text available
This paper furthers our understanding of online customer support with regard to online live chat systems. Online live chat systems allow customers to seek service-related information from an organisation via online-based synchronous media with a human service representative who provides answers through such media. Using a web-based survey of 302 respondents with real-life live chat service experiences with mobile phone network providers in the UK, and structural equation modelling, this research aims to understand the variables capable of influencing a customer’s satisfaction with their experience during an online live chat service encounter. The results indicate the importance of service quality, information quality and system quality variables in influencing satisfaction with the experience, while such influence is dependent on the purpose of use. Additionally, the results outline the role of emoticons, the presence of a service representative’s picture, automated ‘canned’ responses and the presence of response time estimations in moderating the influence of service quality, information quality and system quality variables on satisfaction with the experience.
Article
Online reviews are an important information source for companies analysing users’ demands. We conducted a study of online reviews to measure how product attributes impact customer satisfaction. First, we attempted to infer through sentiment analysis whether a customer is satisfied with a purchase according to their review. Second, a logistic regression model was developed to estimate the impact of various product properties on customer satisfaction scores. Our estimates indicated that customer satisfaction is influenced by drainage mode, loading type, frequency conversion, type, display, colour, and capacity. We further investigate the impact of price and find that customers who buy cheap products should be treated differently from purchasers of expensive items because the relevance of design features on their satisfaction is different. Additionally, we observed that although customers are concerned about noise, perceived noise is not consistent with actual noise levels. We analysed specific reviews and then obtained more detailed information on customer attitudes.
Book
This book provides a comprehensive introduction to the conversational interface, which is becoming the main mode of interaction with virtual personal assistants, smart devices, various types of wearables, and social robots. The book consists of four parts: Part I presents the background to conversational interfaces, examining past and present work on spoken language interaction with computers; Part II covers the various technologies that are required to build a conversational interface along with practical chapters and exercises using open source tools; Part III looks at interactions with smart devices, wearables, and robots, and then goes on to discuss the role of emotion and personality in the conversational interface; Part IV examines methods for evaluating conversational interfaces and discusses future directions. · Presents a comprehensive overview of the various technologies that underlie conversational user interfaces; · Combines descriptions of conversational user interface technologies with a guide to various toolkits and software that enable readers to implement and test their own solutions; · Provides a series of worked examples so readers can develop and implement different aspects of the technologies.
Article
The service encounter – one of the foundational concepts in service research – is fundamentally changing due to rapid evolutions in technology. In this paper, we offer an updated perspective on what we label the “Service Encounter 2.0”. To this end, we develop a conceptual framework that captures the essence of the Service Encounter 2.0 and provides a synthesis of the changing interdependent roles of technology, employees, and customers. We find that technology either augments or substitutes service employees, and can foster network connections. In turn, employees and customers are taking on the role of enabler, innovator, coordinator and differentiator. In addition, we identify critical areas for future research on this important topic.