A Chatbot Response Generation System
Jasper Feine
Karlsruhe Institute of Technology
Karlsruhe, Germany
jasper.feine@kit.edu
Stefan Morana
Saarland University
Saarbrücken, Germany
stefan.morana@uni-saarland.de
Alexander Maedche
Karlsruhe Institute of Technology
Karlsruhe, Germany
alexander.maedche@kit.edu
ABSTRACT
Developing successful chatbots is a non-trivial endeavor. In particular, the creation of high-quality natural language responses for chatbots remains a challenging and time-consuming task that often depends on high-quality training data and deep domain knowledge. As a consequence, it is essential to engage experts who have the required domain knowledge in the chatbot response development process. However, current tool support to engage domain experts in the response generation process is limited and often does not go beyond the exchange of decoupled prototypes and spreadsheets. In this paper, we present a system that enables chatbot developers to efficiently engage domain experts in the chatbot response generation process. More specifically, we introduce the underlying architecture of a system that connects to existing chatbots via an API, provides two improvement mechanisms that allow domain experts to improve chatbot responses during their chatbot interaction, and helps chatbot developers to review the collected response improvements with a sentiment-supported review dashboard. Overall, the design of the system and its improvement mechanisms are useful extensions for chatbot development systems to support chatbot developers and domain experts in collaboratively enhancing the natural language responses of a chatbot.
CCS CONCEPTS
• Human-centered computing → Natural language interfaces; User interface design.
KEYWORDS
chatbot response, improvement mechanism, system, domain expert, chatbot developer

ACM Reference Format:
Jasper Feine, Stefan Morana, and Alexander Maedche. 2020. A Chatbot Response Generation System. In Mensch und Computer 2020 (MuC'20), September 6–9, 2020, Magdeburg, Germany. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3404983.3405508
1 INTRODUCTION
The quality of conversational interaction design in general, and the quality of chatbot responses in particular, is critical for the user experience with chatbots (i.e., text-based conversational agents) [19, 25, 34]. However, many interactions with chatbots are rather short, constrained by a limited vocabulary, and contain incomplete or wrong information [1, 6, 12]. This not only leads to a low penetration rate of chatbots [25], but also limits their application beyond simple dyadic interactions [37, 50, 54].
The natural language capabilities of chatbots are mostly limited by the amount of effort developers invest in the development of a chatbot's dialog system [21, 38]. Independent of the type of dialog system (i.e., data-driven or rule-based), the creation and evaluation of high-quality chatbot responses is a very time-consuming endeavor. Whereas chatbot developers have the essential technical expertise to develop such dialog systems, they often lack the required domain knowledge. Domain experts, on the other hand, have the required domain knowledge but lack technological expertise [2, 52]. As a consequence, it is necessary to develop a system that empowers domain experts to engage in the response generation process and thereby supports chatbot developers in crafting chatbot responses that are relevant for the respective end-user groups [21, 29, 54].
What is currently missing is a system that allows chatbot developers to actually involve the respective domain experts in an effective and efficient chatbot response generation process. Current processes are often limited to the testing of decoupled prototypes and the exchange of spreadsheets. More specifically, current chatbot development systems lack two important interrelated functionalities: (1) they do not enable domain experts to easily improve and propose new chatbot responses while (2) chatbot developers keep control and curation over the response generation process.
To address this need, we introduce a system with three key functionalities: (1) The system is implemented as a web application and connects easily to any existing chatbot via an API. The system thereby serves as an additional layer between domain experts and the connected chatbots. In addition, (2) the system enables domain experts to interact with the connected chatbot via an auto-generated chat window, which allows them to directly improve chatbot responses during their interaction. Finally, (3) the system supports chatbot developers with a sentiment-supported review dashboard to review, accept, change, or reject the chatbot response improvements collected from the domain experts.
The developed system is effective because it enables chatbot developers to continuously improve the responses of existing chatbots together with the respective domain experts. It is efficient because it orchestrates the response generation process in one system without creating additional development effort. It thereby enables chatbot developers to create high-quality chatbot responses, interaction data, and contextual information that can be used to enhance the dialog system of a chatbot. Hence, the design of the system can be used to extend existing chatbot development systems in order to support chatbot developers and domain experts in collaboratively enhancing the responses of a chatbot.
In the remainder of this paper, we first review work on chatbot dialog systems, their natural language capabilities, and existing chatbot response improvement systems. Subsequently, we introduce the design of the proposed system and outline how we instantiated it. Next, we report the results of a pilot study in which we tested the proposed system in a field deployment. Finally, we discuss the benefits of the system and critically reflect on its application contexts, its limitations, and further research avenues.
2 RELATED WORK
2.1 Chatbot Dialog Systems
The quality of a conversational interaction between a human and a chatbot is currently mostly limited by the amount of effort developers invest in the chatbot's dialog system. The dialog system typically consists of three interacting components, namely natural language understanding (i.e., converting words to meaning), dialog management (i.e., deciding the next system action), and response generation (i.e., converting meaning to words) [40].
Dialog systems of chatbots can be broadly distinguished in terms of their dialog coherency and scalability [21, 27]. On the one hand, dialog systems that comprise handcrafted domain-specific dialog rules enable goal-oriented chatbots to converse coherently about a specific topic. The naturalness of the chatbot responses is, however, mostly determined by the amount of effort chatbot developers invest in the development of dialog rules and the authoring of rule-specific chatbot responses [21]. Thus, it is time-consuming to extend the natural language capabilities of such a chatbot, which limits the scalability of these approaches. On the other hand, data-driven dialog managers automatically generate chatbot responses based on large, existing dialog corpora. They probabilistically match user messages to examples in the training data and then select the best matching response from the training data set without using any handcrafted dialog rules. These approaches are often used for the development of non-goal-oriented chatbots whose primary purpose is chatting. However, data-driven approaches lack coherency and robustness because the naturalness of the responses strongly relies on the quality of the training data. The generation of high-quality training data is, however, a major challenge [21, 27].
Overall, rule-based dialog systems have dominated the chatbot landscape in the last decades [39], but data-driven dialog systems are becoming more popular [46]. To leverage the strengths of both approaches, hybrid dialog systems have been proposed [21]. For example, Hybrid Code Networks combine a data-driven neural network with the ability to include procedural rules [60]. However, existing real-world solutions that combine both approaches are still scarce [21].
2.2 Natural Language Limitations of Chatbots
A major goal in the development of chatbots is to create natural language capabilities that meet user expectations [13, 15]. However, many chatbots reply with the same messages, possess only a very limited vocabulary, and often provide wrong information [26, 34].
To demonstrate the chatbots' limited language capabilities, we compared their responses with the responses of humans. To do so, we analyzed an existing human-chatbot dialog corpus from the Conversational Intelligence Challenge 2 (ConvAI2) [9]. We selected this dialog corpus, consisting of 1,111 dialogs, because it contains human-chatbot dialogs of state-of-the-art chatbots that are supposed to hold up an intelligent conversation with a human over several interaction turns. For our analysis, we downloaded the dialog corpus of the wild evaluation round, in which human users evaluated the chatbot responses. We analyzed the lexical diversity of all chatbot and human messages using a Stanford CoreNLP server [36]. Using the server, we annotated all messages with their word lemmas and part-of-speech (POS) tags (94,933 in total) and counted the unique adjectives, adverbs, and verbs. We focused on adjectives, adverbs, and verbs because they are highly relevant for expressing emotions, an inherently human ability [3, 14]. As depicted in Figure 1, the ConvAI2 chatbots (human users) used in total 282 (494) unique adjectives, 97 (160) unique adverbs, and 264 (466) unique verbs in all ConvAI2 conversations of the wild evaluation round. The results indicate that the human users used 75% more unique adjectives, 65% more adverbs, and 76% more verbs across all conversations than the ConvAI2 chatbots. The corpus analysis thus reveals that human language use is very diverse in terms of lexical and emotional variety, and that even the well-designed chatbots from the ConvAI2 challenge cannot match it.
Figure 1: Analysis of human-chatbot dialogs of the ConvAI2 challenge. The graph illustrates the number of unique adjectives, adverbs, and verbs used by the chatbots and human users during the wild evaluation round of the ConvAI2 challenge.
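As an illustration, a count like this can be reproduced with a few lines of Python against a running Stanford CoreNLP server (a minimal sketch; the server URL and the example messages are placeholder assumptions):

```python
import json
from collections import defaultdict

import requests

# Assumption: a Stanford CoreNLP server is running locally, started e.g. via
# java -mx4g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
CORENLP_URL = "http://localhost:9000"
PROPS = {"annotators": "tokenize,ssplit,pos,lemma", "outputFormat": "json"}

def unique_lemmas_by_pos(messages):
    """Count unique adjective, adverb, and verb lemmas across all messages."""
    lemmas = defaultdict(set)
    for text in messages:
        resp = requests.post(
            CORENLP_URL,
            params={"properties": json.dumps(PROPS)},
            data=text.encode("utf-8"),
        )
        resp.raise_for_status()
        for sentence in resp.json()["sentences"]:
            for token in sentence["tokens"]:
                pos = token["pos"]
                # Penn Treebank tags: JJ* = adjectives, RB* = adverbs, VB* = verbs.
                if pos.startswith("JJ"):
                    lemmas["adjectives"].add(token["lemma"].lower())
                elif pos.startswith("RB"):
                    lemmas["adverbs"].add(token["lemma"].lower())
                elif pos.startswith("VB"):
                    lemmas["verbs"].add(token["lemma"].lower())
    return {k: len(v) for k, v in lemmas.items()}

# Example: run once over the chatbot messages and once over the human messages
# of a corpus, then compare the resulting counts.
print(unique_lemmas_by_pos(["I really love hiking.", "That sounds great!"]))
```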
2.3 Crowd-Powered Dialog Systems
To further improve chatbot responses, chatbot developers can develop chatbot prototypes, present them to domain experts, analyze their interaction data, and conduct user interviews [25]. Subsequently, they can modify the chatbot responses before they start the next improvement cycle. This process, however, is very work intensive.

Table 1: Crowd-powered dialog systems.
SUEDE [29]: SUEDE is a speech interface prototyping tool that allows designers to easily create prompt/response speech interfaces and further enables them to test and analyze these interfaces with many users using a Wizard-of-Oz mode.
Chorus [34]: Chorus is a crowd-powered conversational assistant. While the assistant appears to be a single individual, it is actually driven by a dynamic crowd of multiple workers using a specially designed response interface.
Edina [31]: Edina is a hybrid dialog manager that uses a technique called self-dialogs: crowd-workers write both the messages of the user and the responses of the chatbot in order to increase the naturalness of the dialog corpus.
RegionSpeak [62]: RegionSpeak is an advanced version of VizWiz [4] that collects labels from the crowd for several objects in a visual area and then enables blind users to explore the spatial layout of the objects.
Evorus [22]: Evorus engages crowd-workers to propose the best suitable chatbot responses and then automates itself over time using machine learning.
Mnemo [20]: Mnemo is a crowd-powered dialog plugin that can save and aggregate human-generated context notes from goal-oriented dialogs.
Fantom [27]: Fantom generates evolving dialog trees and automatically creates crowd tasks in order to collect responses for user requests that could not be handled so far.
A promising way to improve the efficiency of improving chatbot responses is to leverage a dialog system that utilizes crowd-workers. Crowd-workers are human workers who are usually recruited anonymously through an open call over the web and are typically non-experts [5]. Crowd-powered dialog systems could reduce the scalability limitation of manually developed dialog systems without sacrificing complete control over the response generation process [27]. Whereas early crowd-working attempts took several hours to complete a task, recent approaches have been shown to work in nearly real-time [4, 32]. As a consequence, crowd-workers have been used to collectively reply to a user. A brief, non-exhaustive review of promising crowd-powered dialog systems is provided in Table 1.
Besides their advantages, crowd-powered dialog systems also create serious challenges because crowd-workers have been shown to abuse these systems [23, 33, 47]. In the context of crowd-powered dialog systems, three malicious user groups have been identified [23]: inappropriate workers (i.e., provide faulty or irrelevant information), flirters (i.e., are interested in the user's true identity or develop unnecessary personal connections), and spammers (i.e., perform an abnormally large number of meaningless actions in a task) [23]. In particular, the case of Microsoft's Tay has shown dramatically what can go wrong with systems that automatically learn from user-generated content [47].
2.4 Chatbot Response Improvement Mechanisms
Overall, chatbot developers need to ensure that user-generated chatbot responses do not lead to offensive, inappropriate, or meaningless responses and that the contributors have sufficient domain knowledge to propose meaningful chatbot responses [23, 33, 47]. This applies to crowd-workers and end-users, but also to domain experts who may not take the improvement task seriously. To reduce these risks, several systems have implemented counter-mechanisms against malicious user improvements.
For example, Chorus [23] uses voting as a filtering mechanism, which worked fairly well in a field deployment. However, the mechanism only worked when at least one other crowd-worker also voted for a message [23]. In another study, the VizWiz system ensured that it always received at least two response proposals from two different crowd-workers. The results of a field deployment further revealed that it took on average three response proposals to always receive at least one correct response [4]. Another approach is to award points to responses when they contain correct information [59]. A counter-mechanism that reduces the risk of stealing sensitive user data is used by Edina [31]: in so-called self-dialogs, crowd-workers author both the content of the chatbot and the content of the user. Another promising approach is to divide the improvement tasks into micro tasks that prevent any crowd-worker from seeing too much information [33]. However, the interaction context is very important in order to correctly understand a natural language interaction [20], and seeing only fractions of a conversation might not be sufficient to correctly improve a chatbot response. Finally, Fantom anonymized the dialogs, but the anonymization still had to be done manually [27].
Summing up, chatbot developers need to carefully consider the improvement mechanisms that domain experts can use to improve chatbot responses as well as the mechanisms to reevaluate the improved responses before they are finally shown to the end-users.
3 DESIGNING A CHATBOT RESPONSE GENERATION SYSTEM
In this section, we propose the design of a chatbot response generation system that enables chatbot developers to engage domain experts in the chatbot response generation process. We first describe the high-level design of the system and subsequently the improvement mechanisms as well as the sentiment-supported review dashboard.
Figure 2: High-level design of the chatbot response generation system.
3.1 High-level Design
The proposed system is developed as a web application. The main dashboard and the main features of the web application are displayed in Figure 3. To start using the system, chatbot developers can connect any existing chatbot via its API. The only requirement for such a chatbot API is that it exchanges messages between the users and the chatbot. The system can therefore be used to improve chatbots with different types of dialog systems, because the dialog system of the chatbot to be improved still handles all the dialog management. This means that the system acts as an additional layer between the chatbot developers, the domain experts, and the chatbot to be improved, without requiring access to the chatbot's source code.
The high-level design of the system is illustrated in Figure 2. It shows that the system is not limited to a single chatbot but functions as a platform that can easily be connected to several chatbots via their APIs. In addition, an auto-generated chat window can be shared with domain experts. This ensures that the system scales well with the need to test many different chatbot versions and also reduces the effort of developing specific chatbots that can be connected to the system.
To develop a prototype of the proposed design, we decided to support chatbots that converse via Microsoft's Direct Line 3.0 API [43]. After chatbot developers connect a chatbot via its Direct Line API (see Figure 3, top-left), the system instantiates a knowledge base and generates a shareable chat window (see Figure 3, bottom-left). Chatbot developers can then share a link to the chat window with domain experts. Domain experts can interact with the chatbot, and the chat window offers two improvement mechanisms to directly improve the chatbot responses during an interaction (described in detail in the following section). Improved chatbot responses are stored in the system's knowledge base. Finally, chatbot developers can review and delete the collected chatbot responses using a sentiment-supported review dashboard, which is described in Section 3.3.
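As an illustration of this message-exchange requirement, the following sketch relays messages to a connected chatbot through the Direct Line 3.0 REST endpoints (a minimal sketch; the secret, user id, and crude polling loop are illustrative assumptions rather than the system's actual implementation):

```python
import time

import requests

DIRECT_LINE = "https://directline.botframework.com/v3/directline"
SECRET = "YOUR-DIRECT-LINE-SECRET"  # assumption: supplied by the chatbot developer

def start_conversation():
    """Open a Direct Line conversation and return its id and token."""
    resp = requests.post(
        f"{DIRECT_LINE}/conversations",
        headers={"Authorization": f"Bearer {SECRET}"},
    )
    resp.raise_for_status()
    body = resp.json()
    return body["conversationId"], body["token"]

def send_message(conv_id, token, text):
    """Relay a domain expert's message to the connected chatbot."""
    resp = requests.post(
        f"{DIRECT_LINE}/conversations/{conv_id}/activities",
        headers={"Authorization": f"Bearer {token}"},
        json={"type": "message", "from": {"id": "domain-expert"}, "text": text},
    )
    resp.raise_for_status()

def receive_messages(conv_id, token, watermark=None):
    """Poll the chatbot's replies; a real system could use the socket stream instead."""
    resp = requests.get(
        f"{DIRECT_LINE}/conversations/{conv_id}/activities",
        headers={"Authorization": f"Bearer {token}"},
        params={"watermark": watermark} if watermark else None,
    )
    resp.raise_for_status()
    body = resp.json()
    # The list also contains our own messages, so keep only the bot's replies.
    bot_texts = [a.get("text", "") for a in body["activities"]
                 if a["from"]["id"] != "domain-expert"]
    return bot_texts, body["watermark"]

conv_id, token = start_conversation()
send_message(conv_id, token, "When is the next exam?")
time.sleep(2)  # crude wait for the bot to answer in this sketch
replies, _ = receive_messages(conv_id, token)
print(replies)
```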
3.2 Improvement Mechanisms
Chatbot response improvement mechanisms can be classified along two counteracting continua, namely the effort required of a chatbot developer to review the collected chatbot response improvements and the mechanism's restrictiveness. At one end of the continuum, very unrestricted improvement mechanisms can lead to creative and natural improvements that increase the quality of the chatbot. However, they can also lead to potentially malicious improvements [10, 26, 33, 47]. At the other end of the continuum, very restricted improvement mechanisms that only allow users to change specific sections of chatbot responses reduce the reviewing effort of chatbot developers [56], but they limit the creativity and naturalness of the proposed chatbot responses. A combination of restricted and less restricted mechanisms could therefore be a promising approach to increase the language variation of chatbot responses while chatbot developers keep control over the response generation process. To instantiate such improvement mechanisms, we developed a chat window based on Microsoft's WebChat [42]. The developed chat window allows domain experts to directly improve chatbot responses with both a restricted and an unrestricted improvement mechanism during a human-chatbot interaction.
The first improvement mechanism limits the domain experts' degrees of freedom in changing the responses of a chatbot while still allowing them to increase the lexical and emotional variety of the chatbot. To design such a mechanism, we analyzed popular online translators (i.e., Google Translate, DeepL) that enable users to improve given translations. For example, Google Translate allows users to click on a specific part of a sentence and then shows a drop-down list with alternative translations. Users can select a more appropriate translation in order to directly manipulate the translation in the web interface.
Based on this idea, we developed a similar improvement mechanism for chatbots and implemented it in the chat window. To do so, the chat window sends all chatbot responses received via the API to an instance of a Stanford CoreNLP server [36] before displaying them. The CoreNLP server annotates all words with their part-of-speech (POS) tags. The chat window then highlights all adjectives, adverbs, and verbs because they are highly relevant for expressing emotions [3]. If domain experts want to improve a word in a chatbot response, they can simply click on it. The word is then sent, including its POS tag, to an instance of Princeton's WordNet dictionary [45], which returns appropriate synonyms. In case the word is a verb, the JavaScript package compromise [28] further transforms the verb into its appropriate form. The synonyms are then displayed in a drop-down list, and the domain experts can select a more appropriate synonym. The instantiation of the restricted mechanism is shown in Figure 4.
Figure 3: (Middle) Main dashboard of the web application, which displays all connected chatbots and the main functionalities; (top-left) interface to connect chatbots via their API key; (bottom-left) automatically generated chat window that can be shared with domain experts to improve the chatbot responses; (top-right) improvement dashboard that summarizes key measures of the chatbot response improvement process; (bottom-right) review dashboard to review and delete collected chatbot responses.

Figure 4: Chat window with restricted improvement mechanism.

Figure 5: Chat window with unrestricted improvement mechanism.

The second improvement mechanism enables domain experts to directly manipulate the chatbot interaction in the chat window in order to simplify the mapping between goals and actions [17] and to encourage a feeling of engagement and power of control [17, 55]. Therefore, we developed a direct manipulation mechanism that is characterized by a continuous representation of the objects of interest (i.e., chatbot responses), offers physical actions (i.e., domain experts can directly click on a chatbot response), and shows the impact of the users' actions immediately on the objects of interest (i.e., chatbot responses are immediately updated in the chat window) [55]. This improvement mechanism enables domain experts to freely improve and add chatbot responses in the chat window, as displayed in Figure 5. However, chatbot response improvements created with this mechanism increase the reviewing effort of chatbot developers because domain experts may propose malicious chatbot responses.
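To make the stored improvement concrete, the following sketch shows one possible shape of such a knowledge-base record (the schema, field names, and SQLite backend are illustrative assumptions; the paper does not specify the system's storage layer):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ResponseImprovement:
    chatbot_id: str         # which connected chatbot the improvement belongs to
    original_response: str  # the chatbot response shown in the chat window
    proposed_response: str  # the domain expert's improvement
    mechanism: str          # "restricted" (synonym) or "unrestricted" (free edit)
    dialog_context: str     # preceding user message, kept for the review dashboard

def save_improvement(db: sqlite3.Connection, imp: ResponseImprovement) -> None:
    """Persist an improvement so the review dashboard can show it with its context."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS improvements
           (chatbot_id TEXT, original TEXT, proposed TEXT, mechanism TEXT,
            context TEXT, status TEXT DEFAULT 'pending')"""
    )
    db.execute(
        "INSERT INTO improvements VALUES (?, ?, ?, ?, ?, 'pending')",
        (imp.chatbot_id, imp.original_response, imp.proposed_response,
         imp.mechanism, imp.dialog_context),
    )
    db.commit()

db = sqlite3.connect(":memory:")
save_improvement(db, ResponseImprovement(
    "student-service-bot",
    "I am sorry, but I did not understand your message",
    "Nein tut mir leid. I only speak English!",
    "unrestricted",
    "Do you also speak German?",
))
```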
3.3 Review Dashboard
To reduce the effort of reviewing the collected chatbot responses, the system supports chatbot developers by automatically deleting all responses that contain profanity, using the bad-words JavaScript package [41]. In addition, the system offers a review dashboard (see Figure 6) that enables chatbot developers to review, accept, reject, or modify the collected chatbot responses, which are then updated in the system's knowledge base. To further support chatbot developers, the review dashboard analyzes all chatbot responses using the VADER sentiment analysis package [18]. Improvements with a negative sentiment score are highlighted in the review dashboard, and developers can easily delete or modify them. Finally, chatbot developers can export the collected chatbot responses. The export can then be used to further enhance the dialog system of the chatbot.
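The following sketch illustrates this two-step screening, with the Python vaderSentiment package standing in for the VADER component and a toy word list standing in for the bad-words JavaScript package (the word list and the highlighting threshold are illustrative assumptions):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Tiny illustrative profanity list; the actual system uses the bad-words package.
PROFANITY = {"damn", "crap"}
analyzer = SentimentIntensityAnalyzer()

def screen_improvements(improvements):
    """Drop profane proposals and flag negative ones for the review dashboard."""
    kept = []
    for text in improvements:
        if any(w.strip(".,!?").lower() in PROFANITY for w in text.split()):
            continue  # deleted automatically, never reaches the dashboard
        # VADER compound score ranges from -1 (most negative) to +1 (most positive).
        score = analyzer.polarity_scores(text)["compound"]
        kept.append({"text": text, "flagged": score < -0.05})
    return kept

for row in screen_improvements([
    "Happy to help! What else can I do for you?",
    "That is a terrible, stupid question.",
]):
    print(row)
```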
4 PILOT STUDY
We tested the system's ability to improve chatbot responses in a pilot study. In particular, we investigated how the system is actually used in a real-world scenario. Therefore, we used the system to improve the responses of an existing chatbot, namely the student service chatbot of our institute. The student service chatbot was developed using the Microsoft Bot Framework and can respond to questions about employees, lectures, exams, and thesis projects. It is available on our website and is mostly used by our students.
We connected the chatbot to the system by adding its Direct Line 3.0 API key and then engaged end-users with the required domain knowledge in a response improvement process. To do so, we posted the link to the chat window in two Facebook groups in which end-users of the chatbot (i.e., students of our institute) coordinate and discuss course contents. We asked them to improve the responses of the student service chatbot because they know best how the chatbot should respond in specific situations. Participation was voluntary, and the participants did not receive any compensation.
After two days of data collection, students had interacted with the chatbot via the system in 110 sessions. The students sent a total of 230 messages to the chatbot. They improved 36 complete chatbot responses and added 27 new chatbot responses to the knowledge base. In addition, they used the restricted improvement mechanism to replace 8 synonyms in the original chatbot responses. Moreover, one student changed one synonym as well as the text of a response. Thus, a total of 72 chatbot responses were changed or added. The design of the chatbot response generation system hence enabled the students to easily improve the responses of the institute's chatbot.
Subsequently, we investigated the language variation of the newly collected chatbot responses and compared them to the chatbot's initial response set. To analyze the language variation, we used a Stanford CoreNLP server [36] and annotated all responses with their word lemmas and POS tags (see Figure 7). The results revealed that the collected chatbot responses contain eight more unique adjectives, nine more unique adverbs, and eleven more unique verbs than the original chatbot response set. This illustrates that the lexical diversity of the chatbot responses was increased through the response improvements proposed by the students.
Finally, we analyzed the chatbot responses that were improved most frequently. The most frequently improved chatbot response was the welcome message (i.e., "Hello, I am the chatbot of..."). The students proposed a total of 10 different versions of it. This variety of improvements leads to the challenge of selecting the best welcome message, which we discuss in more detail in the next section. The second most frequently improved chatbot response (7 times) was the chatbot's apology for not understanding a user's message (i.e., "I am sorry, but I did not understand your message"). For example, one student asked whether the chatbot also speaks German. The chatbot apologized for not understanding the user, and the student improved the response by replacing it with "Nein tut mir leid. I only speak English!". This information can now be used to update the dialog system of the chatbot in order to enable it to respond to questions about its language capabilities.
Figure 7: Lexical variety of the original chatbot responses and of the improved chatbot responses collected during the pilot study.
5 DISCUSSION
In this paper, we proposed the design of a system that engages domain experts in the chatbot response generation process. The system enables chatbot developers to connect an existing chatbot to the system and enables domain experts to directly improve chatbot responses during a conversation with this chatbot. In addition, chatbot developers keep control and curation over the improvement process because they can review the collected responses using a sentiment-supported review dashboard.
The proposed design can be useful when chatbot developers are looking to expand the natural language response capabilities of a chatbot by involving domain experts in their development process. As a consequence, the proposed system and its components can be useful extensions for real-world chatbot improvement systems (e.g., Rasa X [51]) or chatbot development kits (e.g., Microsoft's Power Virtual Agents [44]). Such an approach can enable chatbot developers and domain experts to collaboratively enhance the natural language responses of a chatbot.
A major advantage of the system is its compatibility with different types of existing chatbots, independent of their technology and goal. This is possible because the system only requires an API connection to the chatbot that reveals all messages exchanged between a user and the chatbot. All chatbot responses and the improvements are directly displayed in the chat window and stored in the system's knowledge base. This is a major advantage over other improvement approaches that require a chatbot to be adapted to a specific platform [11] or require additional effort in developing decoupled prototypes, e.g., in the form of mockups.

Figure 6: Sentiment-supported review dashboard, which shows all collected chatbot response improvements together with their interaction context. Color coding indicates the sentiment scores of the collected chatbot responses. Chatbot developers can delete or modify the collected chatbot responses before they export the data.
Therefore, chatbot improvement systems that are able to connect to existing chatbots via APIs can be a promising approach to increase the adoption of tool support because the threshold to use the system is very low. Chatbot developers neither need to change the source code of the chatbot nor develop additional mockups; they only need to connect an existing chatbot to the portal via its API. Other promising approaches have even generated a chatbot directly from APIs by leveraging the crowd [24]. Such easy-to-use approaches can support knowledge transfer from research to practice because research findings are not limited to a specific chatbot instantiation and can thus be directly applied in practice.
5.1 Limitations and Future Research Avenues
Our study comes with limitations that also suggest opportunities for future research. First, it must be noted that the system was only evaluated in a pilot study, which revealed that it was indeed seamlessly possible to connect an existing chatbot to the system and that expert users such as students would use the system to contribute new and enhanced chatbot responses. To gain further understanding of how domain experts in organizations actually perceive and use such a system, more evaluations are necessary because the current pilot study only focused on one type of user group. Therefore, we plan to conduct further laboratory and field evaluations with different user groups in order to show the benefits of such a system beyond the first indications reported here. This also includes testing how end-users of such a chatbot react to the improved chatbot responses in order to demonstrate that the approach results in the design of better chatbots.
Second, additional evaluations are necessary in order to investigate the scalability of such an improvement approach. The current design of the system leads to the creation of several response improvements for the same chatbot response because humans generally reply in a multitude of ways [27]. Chatbot developers need to converge the collected improvements to an optimal set of chatbot responses in order to implement them in the dialog system. However, it is quite difficult to distill the best suitable set of chatbot responses because no single best interaction style of a chatbot exists. The best chatbot response always depends on the user, task, and context of the interaction [15, 16]. For example, it has been shown that users with different levels of task experience also prefer different language styles of a chatbot [8]. To address this, chatbot developers would need to develop chatbots that adapt their language style to the preferences of an individual user, which has been shown to be a promising approach [57]. However, such approaches require more complex dialog systems that do not only manage the dialog flow but also select the most appropriate chatbot response based on individual user characteristics collected by the system. Such approaches could further help to develop chatbots that are able to adapt to individual users [14, 54, 58], act as emotion regulators [48, 53], effectively support collaborative collocated talk-in-action [49, 54], and support the future of work [35].
Third, future research should further investigate unrestricted and other restricted user-driven improvement mechanisms that improve chatbot responses while avoiding abuse. While synonym selection, profanity detection, and sentiment analysis help chatbot developers to identify harmful user improvements, more automated approaches [33] can make user-driven chatbot improvement approaches much more efficient. In this regard, existing approaches from FAQ systems could be extended to a chatbot context in which end-users are capable of rating the received chatbot responses.
Fourth, it must be noted that the system's improvement mechanisms may not be equally useful for all types of chatbots. For chatbots that mainly employ predefined dialog flows, corrected or reworked answer alternatives may be a substantial contribution. However, a chatbot response is often written to serve as a suitable response to a broad range of user requests. Hence, reworking the chatbot responses based on the experiences of a single user may be counterproductive in some contexts. Therefore, future research could extend the proposed system to improve the complete dialog flow of a chatbot with regard to the interaction context.
Fifth, the design of the system is currently restricted to improving the responses of chatbots (i.e., text-based conversational agents). However, voice-based conversational agents are becoming increasingly important [61]. They have lower barriers of entry and use [7] and enable interactions separate from the already busy chat modality [30]. Consequently, future research can extend the design of the proposed system to investigate response generation systems for voice-based conversational agents.
6 CONCLUSION
In this paper, we propose the design of a system that enables chatbot developers to efficiently engage domain experts in the chatbot response generation process. The system enables chatbot developers to connect existing chatbots via an API and enables domain experts to improve the chatbot responses during an interaction. We tested the system with students in a pilot study. Overall, the design of the system and its improvement mechanisms can be useful extensions for chatbot development systems in order to support chatbot developers and domain experts in collaboratively enhancing the natural language responses of a chatbot.
REFERENCES
[1] Martin Adam, Michael Wessel, and Alexander Benlian. 2020. AI-based chatbots in customer service and their effects on user compliance. Electronic Markets 9, 2 (2020), 204.
[2] Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120. https://doi.org/10.1609/aimag.v35i4.2513
[3] Farah Benamara, Carmine Cesarano, Antonio Picariello, and Venkatramana S. Subrahmanian. 2019. Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In Proceedings of ICWSM. Academic Press.
[4] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, and Samual White. 2012. VizWiz: nearly real-time answers to visual questions. In Proceedings of the 23rd annual ACM symposium on User interface software and technology. ACM. https://doi.org/10.1145/1866029.1866080
[5] Jeffrey P. Bigham, Richard E. Ladner, and Yevgen Borodin. 2011. The Design of Human-Powered Access Technology. In Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '11). Association for Computing Machinery, New York, NY, USA, 3–10. https://doi.org/10.1145/2049536.2049540
[6] Petter Bae Brandtzaeg and Asbjørn Følstad. 2018. Chatbots: Changing User Needs and Motivations. Interactions 25, 5 (2018), 38–43. https://doi.org/10.1145/3236669
[7] Robin N. Brewer, Leah Findlater, Joseph 'Josh' Kaye, Walter Lasecki, Cosmin Munteanu, and Astrid Weber. 2018. Accessible Voice Interfaces. In Conference on Computer Supported Cooperative Work and Social Computing. ACM, 441–446. https://doi.org/10.1145/3272973.3273006
[8] Veena Chattaraman, Wi-Suk Kwon, Juan E. Gilbert, and Kassandra Ross. 2019. Should AI-based, conversational digital assistants employ social- or task-oriented interaction style? A task-competency and reciprocity perspective for older adults. Computers in Human Behavior 90 (2019), 315–330. https://doi.org/10.1016/j.chb.2018.08.048
[9] ConvAI. 2018. The Conversational Intelligence Challenge 2. http://convai.io/
[10] Florian Daniel, Cinzia Cappiello, and Boualem Benatallah. 2019. Bots Acting Like Humans: Understanding and Preventing Harm. IEEE Internet Computing 23, 2 (2019), 40–49. https://doi.org/10.1109/MIC.2019.2893137
[11] Stephan Diederich, Alfred Brendel, and Lutz M. Kolbe. 2019. Towards a Taxonomy of Platforms for Conversational Agent Design. In 14. Internationale Tagung Wirtschaftsinformatik (WI2019).
[12] Stephan Diederich, Alfred Benedikt Brendel, and Lutz M. Kolbe. 2020. Designing Anthropomorphic Enterprise Conversational Agents. Business & Information Systems Engineering (2020). https://doi.org/10.1007/s12599-020-00639-y
[13] Jasper Feine, Ulrich Gnewuch, Stefan Morana, and Alexander Maedche. 2019. A Taxonomy of Social Cues for Conversational Agents. International Journal of Human-Computer Studies 132 (2019), 138–161. https://doi.org/10.1016/j.ijhcs.2019.07.009
[14] Jasper Feine, Stefan Morana, and Ulrich Gnewuch. 2019. Measuring Service Encounter Satisfaction with Customer Service Chatbots using Sentiment Analysis. In 14. Internationale Tagung Wirtschaftsinformatik (WI2019).
[15] Jasper Feine, Stefan Morana, and Alexander Maedche. 2019. Designing a Chatbot Social Cue Configuration System. In Proceedings of the 40th International Conference on Information Systems (ICIS). AISeL, Munich.
[16] Jasper Feine, Stefan Morana, and Alexander Maedche. 2019. Leveraging Machine-Executable Descriptive Knowledge in Design Science Research – The Case of Designing Socially-Adaptive Chatbots. In Extending the Boundaries of Design Science Theory and Practice, Bengisu Tulu, Soussan Djamasbi, and Gondy Leroy (Eds.). Springer International Publishing, Cham, 76–91.
[17] David Frohlich. 1993. The history and future of direct manipulation. Behaviour & Information Technology 12, 6 (1993), 315–329. https://doi.org/10.1080/01449299308924396
[18] C.J. Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media. Ann Arbor, MI, USA.
[19] Ulrich Gnewuch, Stefan Morana, and Alexander Maedche. 2017. Towards Designing Cooperative and Social Conversational Agents for Customer Service. In Proceedings of the 38th International Conference on Information Systems (ICIS). AISeL, Seoul.
[20] S. R. Gouravajhala, Y. Jiang, P. Kaur, and J. Chaar. 2018. Finding Mnemo: Hybrid Intelligence Memory in a Crowd-Powered Dialog System. In Collective Intelligence Conference (CI 2018). Zurich, Switzerland.
[21] J. Harms, P. Kucherbaev, A. Bozzon, and G. Houben. 2019. Approaches for Dialog Management in Conversational Agents. IEEE Internet Computing 23, 2 (2019), 13–22. https://doi.org/10.1109/MIC.2018.2881519
[22] Ting-Hao Huang, Joseph Chee Chang, and Jeffrey P. Bigham. 2018. Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM Press, 1–13. https://doi.org/10.1145/3173574.3173869
[23] Ting-Hao Kenneth Huang, Walter S. Lasecki, Amos Azaria, and Jeffrey P. Bigham. 2016. "Is There Anything Else I Can Help You With?" Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent. In Fourth AAAI Conference on Human Computation and Crowdsourcing.
[24] Ting-Hao Kenneth Huang, Walter S. Lasecki, and Jeffrey P. Bigham. 2015. Guardian: A crowd-powered spoken dialog system for web APIs. In Third AAAI Conference on Human Computation and Crowdsourcing.
[25] Mohit Jain, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel. 2018. Evaluating and Informing the Design of Chatbots. In DIS 2018, Ilpo Koskinen, Youn-kyung Lim, Teresa Cerratto-Pargman, Kenny Chow, and William Odom (Eds.). Association for Computing Machinery, New York, NY, 895–906. https://doi.org/10.1145/3196709.3196735
[26] Youxuan Jiang, Jonathan K. Kummerfeld, and Walter S. Lasecki. 2017. Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 103–109. https://doi.org/10.18653/v1/P17-2017
[27] Patrik Jonell, Mattias Bystedt, Fethiye Irmak Dogan, Per Fallgren, Jonas Ivarsson, Marketa Slukova, José Lopes, Ulme Wennberg, Johan Boye, and Gabriel Skantze. 2018. Fantom: A Crowdsourced Social Chatbot using an Evolving Dialog Graph. In 1st Proceedings of Alexa Prize.
[28] Spencer Kelly. 2019. compromise: modest natural-language processing in javascript. Retrieved September 17, 2019 from https://github.com/spencermountain/compromise
[29] Scott R. Klemmer, Anoop K. Sinha, Jack Chen, James A. Landay, Nadeem Aboobaker, and Annie Wang. 2000. Suede: a Wizard of Oz prototyping tool for speech user interfaces. In Proceedings of the 13th annual ACM symposium on User interface software and technology. ACM, 1–10.
[30] Rafal Kocielnik, Daniel Avrahami, Jennifer Marlow, Di Lu, and Gary Hsieh. 2018. Designing for workplace reflection: a chat and voice-based conversational agent. In Proceedings of the 2018 Designing Interactive Systems Conference (DIS '18). ACM, 881–894. https://doi.org/10.1145/3196709.3196784
[31] Ben Krause, Marco Damonte, Mihai Dobre, Daniel Duma, Joachim Fainberg, Federico Fancellu, Emmanuel Kahembwe, Jianpeng Cheng, and Bonnie Webber. 2017. Edina: Building an open domain socialbot with self-dialogues. In 1st Proceedings of Alexa Prize.
[32] Walter S. Lasecki, Kyle I. Murray, Samuel White, Robert C. Miller, and Jeffrey P. Bigham. 2011. Real-time crowd control of existing interfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, Jeff Pierce, Maneesh Agrawala, and Scott Klemmer (Eds.). ACM Press, New York, NY, 23. https://doi.org/10.1145/2047196.2047200
[33] Walter S. Lasecki, Jaime Teevan, and Ece Kamar. 2014. Information extraction and manipulation threats in crowd-powered systems. In Proceedings of the ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM. https://doi.org/10.1145/2531602.253173
[34] Walter S. Lasecki, Rachel Wesley, Jeffrey Nichols, Anand Kulkarni, James F. Allen, and Jeffrey P. Bigham. 2013. Chorus: a crowd-powered conversational assistant. In Proceedings of the 26th annual ACM symposium on User interface software and technology. ACM. https://doi.org/10.1145/2501988.2502057
[35] Alexander Maedche, Christine Legner, Alexander Benlian, Benedikt Berger, Henner Gimpel, Thomas Hess, Oliver Hinz, Stefan Morana, and Matthias Söllner. 2019. AI-Based Digital Assistants. Business & Information Systems Engineering 61, 4 (2019), 535–544.
[36] Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. ACL.
[37] Moira McGregor and John C. Tang. 2017. More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW '17). ACM, New York, NY, USA, 2208–2220. https://doi.org/10.1145/2998181.2998335
[38] M. McTear, Z. Callejas, and D. Griol. 2016. The Conversational Interface: Talking to Smart Devices (1st ed.). Springer International Publishing, Switzerland.
[39] Michael F. McTear. 2002. Spoken dialogue technology: enabling the conversational user interface. Comput. Surveys 34, 1 (2002), 90–169.
[40] Michael F. McTear. 2017. The Rise of the Conversational Interface: A New Kid on the Block?. In Future and Emerging Trends in Language Technology. Machine Learning and Big Data, José F. Quesada, Francisco-Jesús Martín Mateos, and Teresa López Soto (Eds.). Springer International Publishing, Cham, 38–49.
[41] Michael Price. 2019. bad-words: A javascript filter for badwords. Retrieved September 17, 2019 from https://github.com/web-mech/badwords
[42] Microsoft. 2019. Bot Framework Web Chat. https://github.com/microsoft/BotFramework-WebChat
[43] Microsoft. 2019. Microsoft Bot Framework Direct Line JS Client. https://github.com/microsoft/BotFramework-DirectLineJS
[44] Microsoft. 2019. Microsoft Power Virtual Agents. https://powervirtualagents.microsoft.com/en-us/
[45] George A. Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41. https://doi.org/10.1145/219717.219748
[46] Maali Mnasri. 2019. Recent advances in conversational NLP: Towards the standardization of chatbot building. arXiv preprint arXiv:1903.09025 (2019).
[47] Gina Neff and Peter Nagy. 2016. Automation, algorithms, and politics | Talking to bots: Symbiotic agency and the case of Tay. International Journal of Communication 10 (2016), 17.
[48] Zhenhui Peng, Taewook Kim, and Xiaojuan Ma. 2019. GremoBot. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing (CSCW '19), Eric Gilbert and Karrie Karahalios (Eds.). ACM Press, New York, New York, USA, 335–340. https://doi.org/10.1145/3311957.3359472
[49] Martin Porcheron, Joel E. Fischer, Moira McGregor, Barry Brown, Ewa Luger, Heloisa Candello, and Kenton O'Hara. 2017. Talking with Conversational Agents in Collaborative Action. In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, Charlotte P. Lee (Ed.). ACM, New York, NY, 431–436. https://doi.org/10.1145/3022198.3022666
[50] Martin Porcheron, Joel E. Fischer, and Sarah Sharples. 2017. "Do Animals Have Accents?". In CSCW '17, Charlotte P. Lee, Steve Poltrock, Louise Barkhuus, Marcos Borges, and Wendy Kellogg (Eds.). Association for Computing Machinery, New York, New York, 207–219. https://doi.org/10.1145/2998181.2998298
[51] Rasa. 2019. Improve your contextual assistant with Rasa X. https://rasa.com/docs/rasa-x/
[52] Tony Russell-Rose and Tyler Tate. 2013. Designing the search experience: The information architecture of discovery. Morgan Kaufmann/Elsevier, Amsterdam. https://ebookcentral.proquest.com/lib/subhh/detail.action?docID=1046391
[53] Isabella Seeber, Lena Waizenegger, Stefan Seidel, Stefan Morana, Izak Benbasat, and Paul Benjamin Lowry. 2019. Collaborating with Technology-Based Autonomous Agents: Issues and Research Opportunities. SSRN Electronic Journal (2019). https://doi.org/10.2139/ssrn.3504587
[54] Joseph Seering, Michal Luria, Geoff Kaufman, and Jessica Hammer. 2019. Beyond Dyadic Interactions. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM Press, New York, USA, 1–13. https://doi.org/10.1145/3290605.3300680
[55] Ben Shneiderman. 1997. Direct manipulation for comprehensible, predictable and controllable user interfaces. In Proceedings of the international conference on Intelligent user interfaces. ACM, 33–39. https://doi.org/10.1145/238218.238281
[56] Mark S. Silver. 2008. On the Design Features of Decision Support Systems: The Role of System Restrictiveness and Decisional Guidance. In Handbook on Decision Support Systems 2: Variations, Frada Burstein and Clyde W. Holsapple (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 261–291.
[57] Paul Thomas, Mary Czerwinski, Daniel McDuff, Nick Craswell, and Gloria Mark. 2018. Style and Alignment in Information-Seeking Conversation. In Proceedings of the 2018 Conference on Human Information Interaction & Retrieval (CHIIR '18), Chirag Shah, Nicholas J. Belkin, Katriina Byström, Jeff Huang, and Falk Scholer (Eds.). ACM Press, New York, New York, USA, 42–51. https://doi.org/10.1145/3176349.3176388
[58] Felipe Thomaz, Carolina Salge, Elena Karahanna, and John Hulland. 2020. Learning from the Dark Web: leveraging conversational agents in the era of hyper-privacy to enhance marketing. Journal of the Academy of Marketing Science 48, 1 (2020), 43–63. https://doi.org/10.1007/s11747-019-00704-3
[59] Luis von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Elizabeth Dykstra-Erickson (Ed.). ACM, New York, NY, 319–326. https://doi.org/10.1145/985692.985733
[60] Jason D. Williams, Kavosh Asadi, and Geoffrey Zweig. 2017. Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning. arXiv preprint (2017).
[61] Rainer Winkler, Sebastian Hobert, Antti Salovaara, Matthias Söllner, and Jan Marco Leimeister. 2020. Sara, the Lecturer: Improving Learning in Online Education with a Scaffolding-Based Conversational Agent. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM, 1–14. https://doi.org/10.1145/3313831.3376781
[62] Yu Zhong, Walter S. Lasecki, Erin Brady, and Jeffrey P. Bigham. 2015. RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Bo Begole (Ed.). ACM, New York, NY, 2353–2362. https://doi.org/10.1145/2702123.2702437
Session 6: Conversational UIs
341
... In view of the necessity of reviewing conversations in a dedicated platform, another work developed a web platform that connects to existing chatbots via API and provides an interface for domain experts to evaluate conversations and suggest improvements to the chatbots [21]. The most interesting feature of this work is the existence of deployment stages. ...
Article
Full-text available
Managing and evolving a chatbot’s content is a laborious process and there is still a lack of standardization. In this context of standardization, the absence of a management process can lead to bad user experiences with a chatbot. This work proposes the Chatbot Management Process, a methodology for content management on chatbot systems. The proposed methodology is based on the experiences acquired with the development of Evatalk, the chatbot for the Brazilian Virtual School of Government. The focus of this methodology is to evolve the chatbot content through the analysis of user interactions, allowing a cyclic and human-supervised process. We divided the proposed methodology into three distinct phases, namely, manage, build, and analyze. Moreover, the proposed methodology presents a clear definition of the roles of the chatbot team. We validate the proposed methodology along with the creation of the Evatalk chatbot, whose amount of interactions was of 22,771 for the 1,698,957 enrolled attendees in the Brazillian Virtual School of Government in 2020. The application of the methodology on Evatalk’s chatbot brought positive results: we reduced the chatbot’s human hand-off rate from 44.43% to 30.16%, the chatbot’s knowledge base examples increased by 160% whilst maintaining a high percentage of confidence in its responses and keeping the user satisfaction collected in conversations stable.
... In turn, social cues can increase user satisfaction (Gnewuch, Morana, Adam, & Maedche, 2018;Rietz, Benke, & Maedche, 2019) and help conversational agents to adapt to specific users, contexts, and tasks (Feine, Morana, & Maedche, 2019b). From an enterprise perspective, conversational agents might be a threat to rationalize human workers (Feine, Morana, & Maedche, 2020c, 2020d but can also provide valuable support in employees' work routines (Benke, Knierim, & Maedche, 2020;Feine, Adam, Benke, Maedche, & Benlian, 2020a). Ergo, conversational agent designers always have to consider both positive and negative design implications and must engage in ethical considerations and design trade-offs (André et al., 2019;Benke, 2020). ...
Chapter
Technological innovations raise axiological questions about what is right or wrong, good and bad, and so on (i.e., ethical considerations). These considerations have particular importance in design science research (DSR) projects, since the developed artifacts often actively intervene in human affairs and, thus, cannot be free from value. To account for this fact, Myers and Venable (2014) proposed six ethical principles for DSR in order to support researchers in conducting ethical DSR. However, ethical principles per se, including the ethical DSR principles that Myers and Venable propose, are abstract by nature so that they can apply to a broad range of contexts. As a consequence, they do not necessarily apply to specific research projects, which means researchers need to contextualize them for each specific DSR project. Because doing so is challenging, we explore how contemporary DSR publications have dealt with this contextualization task and how they implemented the six ethical principles for DSR. Our results reveal that DSR publications have not discussed ethical principles in sufficient depth. To further promote ethical considerations in DSR, we argue that both DSR researchers and reviewers should be supported in implementing ethical principles. Therefore, we outline two pathways toward ethical DSR. First, we propose that researchers articulate the next generation of ethical principles for DSR using prescriptive knowledge structures from DSR. Second, we propose extending established DSR conceptualizations with an ethical dimension and specifically introduce the concept of ethical DSR process models. With this work, we contribute to the IS literature by reviewing ethical principles and their implementation in DSR, identifying potential challenges that hinder efforts to implement ethics in DSR, and providing two pathways toward ethical DSR.
... The system that enables domain experts to improve the responses of a chatbot was developed as a Java web application (see the description of the system’s architecture in Feine et al. (2020c)). It can be connected to any existing chatbot via its message API. ...
Conference Paper
Domain experts, with their special knowledge and understanding of a specific field, are critical in the development of chatbots. However, engaging domain experts in chatbot development is time-consuming and cumbersome because developers lack adequate systems. We address this problem by proposing three design principles for interactive chatbot development systems grounded in the interactivity effects model. We instantiate the proposed design and evaluate the resulting artifact in an online experiment. The results of the online experiment (N=70) show that the proposed design significantly increases subjective and objective engagement and that perceived interactivity mediates these effects. Our study contributes prescriptive knowledge for designing interactive systems that increase engagement as well as a novel artifact in the form of an interactive chatbot development system.
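The citation context above notes that such a system can be realized as a Java web application that connects to an existing chatbot via its message API and collects response improvements from domain experts. The sketch below illustrates this relay idea at a conceptual level only; the class name, endpoint, JSON shape, and method names are hypothetical assumptions for illustration, not the actual implementation described in the cited work.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Minimal sketch of a relay that forwards user messages to an existing
 * chatbot's message API and records response improvements proposed by
 * domain experts. Endpoint and JSON shape are hypothetical.
 */
public class ChatbotRelay {

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI chatbotApi; // the bot's message endpoint (hypothetical)

    // Expert suggestions keyed by the original chatbot response.
    private final Map<String, List<String>> improvements = new ConcurrentHashMap<>();

    public ChatbotRelay(String chatbotApiUrl) {
        this.chatbotApi = URI.create(chatbotApiUrl);
    }

    /** Forward a user message to the chatbot API and return its raw reply. */
    public String relay(String userMessage) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(chatbotApi)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"text\":\"" + userMessage.replace("\"", "\\\"") + "\"}"))
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // in practice: parse JSON and extract the reply text
    }

    /** Record an improved wording that a domain expert proposed for a response. */
    public void suggestImprovement(String originalResponse, String improvedResponse) {
        improvements
                .computeIfAbsent(originalResponse, k -> new CopyOnWriteArrayList<>())
                .add(improvedResponse);
    }

    /** Expose collected suggestions, e.g., for a developer review dashboard. */
    public Map<String, List<String>> collectedImprovements() {
        return Map.copyOf(improvements);
    }
}
```

Under these assumptions, a caller would instantiate the relay with the bot’s message endpoint, pass user messages through relay(), and surface the collected suggestions to a review dashboard.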
Article
Communicating with customers through live chat interfaces has become an increasingly popular means to provide real-time customer service in many e-commerce settings. Today, human chat service agents are frequently replaced by conversational software agents or chatbots, which are systems designed to communicate with human users by means of natural language often based on artificial intelligence (AI). Though cost- and time-saving opportunities triggered a widespread implementation of AI-based chatbots, they still frequently fail to meet customer expectations, potentially resulting in users being less inclined to comply with requests made by the chatbot. Drawing on social response and commitment-consistency theory, we empirically examine through a randomized online experiment how verbal anthropomorphic design cues and the foot-in-the-door technique affect user request compliance. Our results demonstrate that both anthropomorphism as well as the need to stay consistent significantly increase the likelihood that users comply with a chatbot’s request for service feedback. Moreover, the results show that social presence mediates the effect of anthropomorphic design cues on user compliance.
Article
The increasing capabilities of conversational agents (CAs) offer manifold opportunities to assist users in a variety of tasks. In an organizational context, their potential to simulate human-like interaction via natural language currently attracts particular attention, both at the customer interface and for internal purposes, often in the form of chatbots. Emerging experimental studies on CAs investigate the impact of anthropomorphic design elements, so-called social cues, on user perception. However, while these studies provide valuable prescriptive knowledge on selected social cues, they neglect the potentially detrimental influence of the limited responsiveness of present-day conversational agents. In practice, many CAs fail to continuously provide meaningful responses in a conversation due to the open nature of natural language interaction, which negatively influences user perception and has often led to CAs being discontinued. Thus, designing a CA that provides a human-like interaction experience while minimizing the risks associated with limited conversational capabilities represents a substantial design problem. This study addresses the aforementioned problem by proposing and evaluating a design for a CA that offers a human-like interaction experience while mitigating negative effects due to limited responsiveness. Through the presentation of the artifact and the synthesis of prescriptive knowledge in the form of a nascent design theory for anthropomorphic enterprise CAs, this research adds to the growing knowledge base on designing human-like assistants and supports practitioners seeking to introduce them in their organizations.
Article
Purpose: This article reports the results from a panel discussion held at the 2019 European Conference on Information Systems (ECIS) on the use of technology-based autonomous agents in collaborative work. Approach: The panelists presented ideas related to the affective and cognitive implications of using autonomous technology-based agents in terms of (1) emotional connection with these agents, (2) decision making, and (3) knowledge and learning in settings with autonomous agents. These ideas provided the basis for a moderated panel discussion (moderated by Drs. Isabella Seeber and Lena Waizenegger), during which the initial position statements were elaborated on and additional issues were raised. Findings: Through the discussion, a set of additional issues was identified. These issues related to (1) the design of autonomous technology-based agents in terms of human-machine workplace configurations as well as transparency and explainability, and (2) the unintended consequences of using autonomous technology-based agents in terms of the de-evolution of social interaction, prioritization of machine teammates, psychological health, and biased algorithms. Originality/value: Key issues related to the affective and cognitive implications of using autonomous technology-based agents, design issues, and unintended consequences highlight key contemporary research challenges and compelling questions that can guide further research in this field.
Article
The Web is a constantly evolving, complex system, with important implications for both marketers and consumers. In this paper, we contend that over the next five to ten years society will see a shift in the nature of the Web, as consumers, firms, and regulators become increasingly concerned about privacy. In particular, we predict that, as a result of this privacy focus, various information sharing and protection practices currently found on the Dark Web will be increasingly adapted in the overall Web, and in the process, firms will lose much of their ability to fuel a modern marketing machinery that relies on abundant, rich, and timely consumer data. In this type of controlled information-sharing environment, we foresee the emergence of two distinct types of consumers: (1) those generally willing to share their information with marketers (Buffs), and (2) those who generally deny access to their personal information (Ghosts). We argue that one way marketers can navigate this new environment is by effectively designing and deploying conversational agents (CAs), often referred to as “chatbots.” In particular, we propose that CAs may be used to understand and engage both types of consumers, while providing personalization, and serving both as a form of differentiation and as an important strategic asset for the firm—one capable of eliciting self-disclosure of otherwise private consumer information.
Conference Paper
Maintaining a positive group emotion is important for team collaboration. It is, however, a challenging task for self-managing teams, especially when they conduct intra-group collaboration via text-based communication tools. Recent advances in AI technologies open up the opportunity to use chatbots for emotion regulation in group chat. However, little is known about how to design such a chatbot and how group members react to its presence. As an initial exploration, we design GremoBot based on text analysis technology and the emotion regulation literature. We then conduct a study with nine three-person teams performing different types of collective tasks. In general, participants find GremoBot useful for reinforcing positive feelings and steering them away from negative words. We further discuss the lessons learned and the considerations derived for designing a chatbot for group emotion management.
Conference Paper
Social cues (e.g., gender, age) are important design features of chatbots. However, choosing a social cue design is challenging. Although much research has empirically investigated social cues, chatbot engineers have difficulty accessing this knowledge. Descriptive knowledge is usually embedded in research articles and difficult to apply as prescriptive knowledge. To address this challenge, we propose a chatbot social cue configuration system that supports chatbot engineers in accessing descriptive knowledge in order to make justified social cue design decisions (i.e., decisions grounded in empirical research). We derive two design principles that describe how to extract and transform descriptive knowledge into a prescriptive and machine-executable representation. In addition, we evaluate the prototypical instantiations in an exploratory focus group and at two practitioner symposia. Our research addresses a contemporary problem and contributes a generalizable concept to support researchers as well as practitioners in leveraging existing descriptive knowledge in the design of artifacts.
Article
Conversational agents (CAs) are software-based systems designed to interact with humans using natural language and have attracted considerable research interest in recent years. Following the Computers Are Social Actors paradigm, many studies have shown that humans react socially to CAs when they display social cues such as small talk, gender, age, gestures, or facial expressions. However, research on social cues for CAs is scattered across different fields, often using their specific terminology, which makes it challenging to identify, classify, and accumulate existing knowledge. To address this problem, we conducted a systematic literature review to identify an initial set of social cues of CAs from existing research. Building on classifications from interpersonal communication theory, we developed a taxonomy that classifies the identified social cues into four major categories (i.e., verbal, visual, auditory, invisible) and ten subcategories. Subsequently, we evaluated the mapping between the identified social cues and the categories using a card sorting approach in order to verify that the taxonomy is natural, simple, and parsimonious. Finally, we demonstrate the usefulness of the taxonomy by classifying a broader and more generic set of social cues of CAs from existing research and practice. Our main contribution is a comprehensive taxonomy of social cues for CAs. For researchers, the taxonomy helps to systematically classify research about social cues into one of the taxonomy's categories and corresponding subcategories. Therefore, it builds a bridge between different research fields and provides a starting point for interdisciplinary research and knowledge accumulation. For practitioners, the taxonomy provides a systematic overview of relevant categories of social cues in order to identify, implement, and test their effects in the design of a CA.
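As a rough illustration of how the taxonomy’s top level could be represented as a data structure, the sketch below encodes the four major categories named in the abstract above. The assignments of small talk (verbal) and gestures and facial expressions (visual) follow the abstract’s own examples; the entries for the auditory and invisible categories, the class and method names, and the lookup logic are illustrative assumptions, and the taxonomy’s ten subcategories are omitted.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Top level of the taxonomy: the four major social cue categories. */
enum CueCategory { VERBAL, VISUAL, AUDITORY, INVISIBLE }

/**
 * Minimal sketch of a cue-to-category lookup. The category names come
 * from the abstract above; the example cue assignments are illustrative.
 */
public class SocialCueTaxonomy {

    private final Map<CueCategory, List<String>> cuesByCategory =
            new EnumMap<>(CueCategory.class);

    public SocialCueTaxonomy() {
        cuesByCategory.put(CueCategory.VERBAL, List.of("small talk"));
        cuesByCategory.put(CueCategory.VISUAL, List.of("gestures", "facial expressions"));
        cuesByCategory.put(CueCategory.AUDITORY, List.of("voice qualities"));  // illustrative
        cuesByCategory.put(CueCategory.INVISIBLE, List.of("response time"));   // illustrative
    }

    /** Return the major category that lists a given cue, or null if unknown. */
    public CueCategory classify(String cue) {
        return cuesByCategory.entrySet().stream()
                .filter(e -> e.getValue().contains(cue))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse(null);
    }
}
```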
Article
This article summarizes the panel discussion at the International Conference on Wirtschaftsinformatik in March 2019 in Siegen (WI 2019) and presents different perspectives on AI-based digital assistants. It sheds light on (1) application areas, opportunities, and threats as well as (2) the BISE community’s roles in the field of AI-based digital assistants. The different authors’ contributions emphasize that BISE, as a socio-technical discipline, must address the designs and the behaviors of AI-based digital assistants as well as their interconnections. They have identified multiple research opportunities to deliver descriptive and prescriptive knowledge, thereby actively shaping future interactions between users and AI-based digital assistants. We trust that these inputs will lead BISE researchers to take active roles and to contribute an IS perspective to the academic and the political discourse about AI-based digital assistants.
Chapter
In Design Science Research (DSR) it is important to build on descriptive (Ω) and prescriptive (Λ) state-of-the-art knowledge in order to provide a solid grounding. However, existing knowledge is typically made available via scientific publications. This leads to two challenges: first, scholars have to manually extract relevant knowledge pieces from scientific publications, whose textual form is unstructured from a data perspective. Second, different research results can interact and exclude each other, which makes aggregating, combining, and applying the extracted knowledge pieces quite complex. In this paper, we present how we addressed both issues in a DSR project that focuses on the design of socially-adaptive chatbots. To this end, we outline a two-step approach to transform phenomena and relationships described in the Ω-knowledge base into a machine-executable form using ontologies and a knowledge base. Following this new approach, we can design a system that is able to aggregate and combine existing Ω-knowledge in the field of chatbots. Hence, our work contributes to DSR methodology by suggesting a new approach for theory-guided DSR projects that facilitates the application and sharing of state-of-the-art Ω-knowledge.