Designing Interactive Chatbot Development
Systems
Completed Research Paper
Jasper Feine
Karlsruhe Institute of Technology (KIT)
Karlsruhe, Germany
jasper.feine@kit.edu
Stefan Morana
Saarland University
Saarbruecken, Germany
stefan.morana@uni-saarland.de
Alexander Maedche
Karlsruhe Institute of Technology (KIT)
Karlsruhe, Germany
alexander.maedche@kit.edu
Abstract
Domain experts with their special knowledge and understanding of a specific field are
critical in the development of chatbots. However, engaging domain experts in the chatbot
development is time-consuming and cumbersome because developers lack adequate
systems. We address this problem by proposing three design principles for interactive
chatbot development systems grounded in the interactivity effects model. We instantiate
the proposed design and evaluate the resulting artifact in an online experiment. The
results of the online experiment (N=70) show that the proposed design significantly
increases subjective and objective engagement and that perceived interactivity mediates
these effects. Our study contributes with prescriptive knowledge for designing interactive
systems that increase engagement and with a novel artifact in the form of an interactive
chatbot development system.
Keywords: chatbot, domain expert, development, engagement, design science research
Introduction
A specific class of AI-based systems that has become popular in recent years are chatbots. Chatbots are
software-based systems designed to interact with humans via text-based natural language (Feine et al.
2019a; McTear et al. 2016). One of the major drawbacks of contemporary chatbots is that interacting with
them is often not perceived as natural (Grudin and Jacques 2019). Most of the time, their responses
are constrained, not contingent, and contain meaningless and nonsensical information (Coniam 2014; Go
and Sundar 2019; Grudin and Jacques 2019). While humans acquire communicative skills over time,
chatbot developers need to teach chatbots how to respond in specific situations (Rieser and Lemon 2011).
Therefore, chatbots are usually developed in collaboration with and approved by domain experts that have
the required knowledge and understanding of the essential aspects of a specific field of inquiry (Rieser and
Lemon 2011; Russell-Rose and Tate 2013). However, developers often face difficulties involving domain experts with limited technological skills in the development process. One reason, among others, is that
developers lack systems that enable and motivate domain experts to engage in the chatbot development
process (Harms et al. 2019). Domain experts usually chat with a chatbot prototype and then share feedback
over a separate channel (e.g., spreadsheets, verbal communication). This limits the domain experts’ ability to directly influence the chatbot’s responses and negatively impacts their motivation to engage in the
chatbot’s development process (Amershi et al. 2014; Harms et al. 2019).
Novel systems that enable developers to engage domain experts to share their domain-specific knowledge
can increase the effectiveness and efficiency of chatbot development (Harms et al. 2019; Huang et al. 2018;
Rieser and Lemon 2011). A promising approach to increase the engagement of domain experts is to design
interactive development systems that enable the flow of information not only from the user to the system
but also vice versa (Amershi et al. 2014; Calma et al. 2016). The interactive features offered by these systems
have psychological correlates and empirical evidence has shown that they can shift the attention and
cognitive processing capacities towards the system (Sundar et al. 2015). Because research on designing
interactive chatbot development systems for domain experts is limited, we argue that there is a need to
propose design principles that provide a solid foundation for their design. Hence, we articulate the following
research question:
Which design principles should guide the design of interactive chatbot development systems to increase
the engagement of domain experts?
To address this research question, we follow a design science research (DSR) approach by addressing a real-world problem (i.e., increasing the engagement of domain experts in chatbot development) through the
iterative creation and evaluation of proposed software artifacts (i.e., interactive chatbot development
system) with respective stakeholders (i.e., chatbot developers and domain experts) (Gregor and Hevner
2013). In this paper, we report the first completed design cycle of the DSR project which starts with a semi-
structured interview study with the chatbot development team of a telecommunication provider.
Subsequently, we derive three theory-grounded design principles justified by the interactivity effects model
(Sundar 2012; Sundar et al. 2015) and evaluate their instantiations in an online experiment with students
(N=70). The results provide evidence that the proposed design increases engagement in terms of subjective
and objective outcome measures and that perceived interactivity mediates this effect. Therefore, the results
of the first design cycle contribute with prescriptive knowledge for designing interactive systems that
increase engagement and with a novel artifact in the form of an interactive chatbot development system.
Related Work and Theoretical Foundations
Chatbot Dialog Systems
Chatbots have attracted increasing interest in several domains such as customer service (Adam et al.
2020; Diederich et al. 2019; Gnewuch et al. 2017; Thomaz et al. 2020), digital tutoring (Wellnhammer et
al. 2020; Winkler and Söllner 2018), and enterprise applications (Diederich et al. 2020; Feine et al. 2020a).
They offer benefits such as shorter resolution times and ubiquitous availability (Waizenegger et al. 2020).
Despite their potential, the acceptance, use, and impact of chatbots are growing much more slowly than expected. Many human-chatbot interactions do not feel natural, and chatbots frequently fail to meet customer
expectations (Grudin and Jacques 2019). One reason is that chatbots’ responses are often constrained, not
contingent, and contain meaningless and nonsensical information (Grudin and Jacques 2019). The
chatbot’s core component handling the responses is the dialog system. A dialog system comprises three
modules, one each for input, output, and control (Rieser and Lemon 2011). These are a natural language
understanding module (NLU) (i.e., converts words to meaning), a dialog management module (i.e., decides
the next system action), as well as a response generation module (i.e., converts meaning to words) (McTear
et al. 2016). Dialog systems are often distinguished in terms of their dialog coherency and scalability (Harms
et al. 2019; Jonell et al. 2018). On the one hand, dialog systems that comprise handcrafted domain-specific
dialog rules enable goal-oriented chatbots to converse coherently about a specific topic (e.g., ELIZA)
(Weitekamp et al. 2020). The naturalness of the chatbot responses is, however, mostly determined by the
amount of effort chatbot developers invest in the creation of dialog rules and the authoring of rule-specific
chatbot responses (Harms et al. 2019). On the other hand, data-driven dialog managers automatically
generate chatbot responses based on large, existing dialog corpora (e.g., Xiaoice) (Shum et al. 2018). They
enable the probabilistic matching of a user’s message to examples in the training dataset. These approaches
are often used for the development of non-goal-oriented chatbots (Weitekamp et al. 2020). However, data-
driven approaches lack coherency and robustness because the naturalness of the responses strongly relies
on the quality of the training data (Harms et al. 2019; Jonell et al. 2018). In the past, rule-based dialog
systems dominated the chatbot landscape, but data-driven dialog systems are becoming increasingly
popular (Harms et al. 2019). Independent of the type of dialog system, the creation of high-quality chatbot
responses and training data represents a major challenge in the development of chatbots (Harms et al.
2019; Rieser and Lemon 2011).
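To make the rule-based class concrete, the following minimal sketch (with hypothetical rules) shows the three-module split on a toy scale: pattern matching stands in for natural language understanding, first-match selection for dialog management, and authored answers for response generation.

```python
import re

# Handcrafted, domain-specific dialog rules (hypothetical examples): each rule
# maps an intent pattern (NLU) to an authored response (response generation).
RULES = [
    (re.compile(r"\b(hello|hi|hey)\b", re.I), "Hello! How can I help you today?"),
    (re.compile(r"\bdsl\b.*\b(cost|price)", re.I),
     "Our DSL contract costs 30 EUR per month."),
]
FALLBACK = "Sorry, I did not understand that."  # the failure mode users perceive as unnatural

def respond(user_message: str) -> str:
    """Dialog management: select the first rule whose pattern matches."""
    for pattern, answer in RULES:
        if pattern.search(user_message):
            return answer
    return FALLBACK

print(respond("How much does DSL cost?"))
```

The naturalness of such a chatbot is bounded by the number and quality of its handcrafted rules, which is precisely the authoring effort described above.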
Engagement and Interactivity
Engagement is a key concept in the design of any online application (Lalmas et al. 2014). It can be defined
as “the emotional, cognitive, and behavioral experience of a user with a technological resource that exists,
at any point in time and over time” (Lalmas et al. 2014, p. 3). Engagement is positively related to user
participation and involvement (Lalmas et al. 2014) which are positively related to the success of a system
(Doherty and Doherty 2019; O’Brien et al. 2018). Therefore, it is a top priority for developers to increase
the users’ engagement with a system (Doherty and Doherty 2019). Because engagement is a complex
phenomenon, several drivers of engagement have been identified (Doherty and Doherty 2019). Research
found that one major driver of engagement is interactivity (Sundar 2012). Interactivity can be defined as “the extent to which users can participate in modifying the form and content of a mediated environment in real time” (Steuer 1992, p. 84) and thus focuses on the quality and intensity of an interaction (Dolata
and Schwabe 2016). The most salient change brought by interactive systems is that users are active and not
passive recipients anymore (Sundar et al. 2015). The interactive action possibilities (affordances) offered
by an interface have psychological correlates that impact engagement (Sundar et al. 2015). For example,
greater interactivity has been shown to increase involvement, focused attention, attitude towards the system, as well as conscious processing (Sundar 2012). Therefore, interactivity has “the real potential to involve even the uninvolved” (Sundar 2012, p. 10) and is becoming increasingly important in the development of AI-
based systems (Amershi et al. 2014). Because research on designing interactive chatbot development
systems for domain experts is limited, we address this gap in this research project.
Design Science Research Project
To investigate design principles that should guide the design of interactive chatbot development systems,
we followed the DSR approach proposed by Kuechler and Vaishnavi (2008) (see Figure 1). More specifically,
we conducted the DSR project in the context of customer service chatbots in order to instantiate the
proposed design in a real-world context. In this paper, we report the methods and results of the first design
cycle. The applied methods are elaborated in the following sections. Subsequently, we discuss the results of
the first design cycle.
Awareness of Problem: We started the design cycle by conceptualizing the underlying problem space
(Maedche et al. 2019). Therefore, we conducted a semi-structured interview study in order to understand
the challenges of engaging domain experts in the development of chatbots (Myers 2009). We interviewed
employees from a chatbot project team of a major European telecommunication provider. Subsequently,
we analyzed the transcripts in order to identify the goal and scope of this DSR project.
Suggestion: Next, we reviewed the theory that should guide the design of interactive chatbot development
systems to increase the engagement of domain experts. As a starting point for our design, we reviewed
mechanisms of interactive systems that have been shown to be successful antecedents of engagement. Based on this justificatory knowledge, we derived requirements (REQs) that define the overarching goals of the
proposed design (Gregor and Jones 2007). Based on the REQs, we derived design principles (DPs). A DP
can be defined as “a statement that prescribes what and how to build an artifact in order to achieve a
predefined design goal” (Chandra et al. 2015, p. 4040). Finally, we translated the DPs that are abstracted
from technical specifications into concrete design features (DFs) that can be implemented into a prototype
(Seidel et al. 2018). Thus, DFs can be defined as “specific ways to implement a design principle in an actual
artifact“ (Meth et al. 2015, p. 807).
Figure 1. Design Cycle: (1) Awareness of Problem: interview study with a chatbot project team of a major European telecommunication provider; (2) Suggestion: suggestion of three design principles and six design features grounded in the interactivity effects model; (3) Development: development of two systems, one implementing the proposed design and one baseline design; (4) Evaluation: online experiment to evaluate the effects of the proposed design on users and their engagement; (5) Conclusion: results support the proposed design, and activities for the next design iterations are identified.
Development: We instantiated the proposed design in the form of a web application that enables domain
experts to improve the responses of a chatbot. Based on this core functionality, we developed two versions
of the web application: one version that implements the proposed design and one design that does not
implement it (baseline design). After we instantiated both versions, the chatbot project team of the
telecommunication provider tested them over a week. Subsequently, we conducted two workshops as a first
formative evaluation of the initial designs (Venable et al. 2016). Based on the collected feedback, we
refined both versions.
Evaluation: We evaluated the interactive chatbot development system by following a human risk and
effectiveness evaluation strategy (Venable et al. 2016). We chose this evaluation strategy because the major
design risk of the proposed artifact is user-oriented. Therefore, we conducted an artificial, summative
evaluation in the first design cycle to judge the extent to which the outcomes of the proposed design match
our expectations (Venable et al. 2016). More specifically, we investigated the effects of the proposed design
on the users’ perceived interactivity and engagement in an online experiment with a student sample.
Designing the Interactive Chatbot Development System
Awareness of Problem
To start the DSR project, we interviewed employees from a chatbot project team of a major European
telecommunication provider. The provider uses a chatbot on their customer service website to offer 24/7
first-level support for their customers. In total, we interviewed seven employees (3 female, 4 male; mean age = 36.43 years, SD = 8.66; mean working experience = 15.57 years, SD = 7.69; mean experience in developing chatbots = 2.5 years, SD = 1.22), whom we selected based on their roles in the project team (i.e., three project
managers, three chatbot content managers, as well as one computer linguist). Before the interviews, we
developed a semi-structured interview guide consisting of 23 questions that belong to three overarching
question blocks: (1) general questions and demographics; (2) questions about the users, tasks, and
technologies involved in the chatbot development; (3) questions regarding the stakeholders’ motivation to
get involved in the development. The interviews lasted on average 26.86 minutes (SD = 4.29) and were
transcribed subsequently. All transcribed documents had a total length of 3,490 words. Subsequently, we
used MAXQDA to code the transcribed documents following a predefined coding scheme that we developed
for each question block in order to extract the following information: (1) a systematic description of the
internal processes to develop the chatbot, (2) an overview of all involved stakeholders, (3) a detailed
assessment of the currently used chatbot development systems, as well as (4) the major challenges to
motivate domain experts to participate in the development process of the company’s customer service
chatbot.
The analysis of the transcripts revealed that the current chatbot development process is mostly driven by
the content editors of the project team. The editors chat with the chatbot and manually update the chatbot responses in spreadsheets that include over 10,000 question-answer mappings. After refinement, these spreadsheets are aggregated by the project team and then uploaded to the chatbot’s
dialog system. To further improve and update the knowledge base, all interviewees expressed the need to
involve customer service agents as well as report writers from the customer service departments because
they have daily contact with customers and know best what customers want and how they frame their
questions. However, chatbot response proposals from customer service agents are rare. A reason is that the
developers lack systems to involve them in the development process. The current process of refining
spreadsheets is very complex and thus, not applicable for domain experts without any chatbot development
experience. The only way to approach the domain experts is to conduct individual feedback sessions or
focus group workshops, which are costly and lengthy. As a result, the involvement of domain experts in the
chatbot development process is always mediated by the chatbot project team. Thus, it takes a long time
until improvements from domain experts are finally implemented in the chatbot’s dialog system. This limits
the domain experts’ ability to affect the resulting chatbot, which decreases their motivation to engage in the process and may increase their skepticism concerning the chatbot. Therefore, we identified the problem that the current chatbot development process and the systems used therein neither empower nor motivate domain experts to actively engage in it. We argue that adequate systems that increase the domain experts’ engagement can mitigate the outlined problem. Thus, the purpose and scope of this DSR project
are to investigate the design of chatbot development systems that increase the engagement of domain
experts.
Suggestion: Requirements, Design Principles, and Design Features
As a starting point for our design, we reviewed mechanisms that have been shown to be successful antecedents of engagement. In this context, interactive systems that enable information flows not only from the user to the system but also vice versa have been shown to have a positive impact on engagement (Amershi et al. 2014;
Calma et al. 2016). To conceptualize interactivity, we followed the interactivity effects model (Sundar 2012;
Sundar et al. 2015), which decomposes interactivity along the fundamental core components of communication (i.e., source, modality, and message). Based on this, it defines three interactivity features that extend the range and functionality of these components: modality interactivity, message interactivity, and source interactivity, all of which relate to psychological correlates impacting users’ engagement.
Modality interactivity refers to the perceived functional bandwidth of a system (Sundar 2012). Message
interactivity is mainly driven by contingency (Sundar 2012) which means that users receive responses from
the system that are dependent upon their previous actions (Rafaeli 1988). Source interactivity is the degree
to which the interface provides users the ability to manipulate the information displayed by the system
(Sundar 2012). By building on this model as justificatory knowledge, we argue that systems with a high level
of modality, message, and source interactivity will increase the engagement of domain experts. Therefore,
we formulated three requirements (REQ): interactive chatbot development systems should have a high level
of modality interactivity (REQ1), message interactivity (REQ2), as well as source interactivity (REQ3).
To increase the modality interactivity (REQ1) and source interactivity (REQ3), we address the limitation
of current systems, which do not enable domain experts to directly improve a chatbot during an interaction.
Therefore, we reviewed research on direct manipulation interfaces (Frohlich 1993; Shneiderman 1997).
Direct manipulation refers to a style of interaction characterized by three properties (Shneiderman 1997):
(1) a continuous representation of the object of interest; (2) physical actions or labeled button presses
instead of complex syntax; (3) rapid incremental reversible operations whose impact is immediately visible.
Overall, interfaces that support direct manipulation have been shown to simplify the mapping between
goals and actions at the interface by reducing the semantic and articulatory distance (Frohlich 1993). This
increases the users’ power of control and can encourage a feeling of engagement because users do not
interact with the system through some hidden intermediary (Shneiderman 1997). Therefore, we articulated
the first DP: (DP1) an interactive chatbot development system should enable users to directly manipulate the objects of interest in order to increase engagement of domain experts.
To increase the message interactivity (REQ2), the most critical mechanism is perceived contingency
(Sundar 2012; Sundar et al. 2015). The concept of contingency describes the interaction with a system as a
series of message exchanges between the user and the system, which are dependent upon preceding
messages and sequentially related (Rafaeli 1988). According to Rafaeli (1988), interactions can be
distinguished into two-way, reactive, and responsive interactions. Two-way interaction exists when the
user’s and the system’s messages flow bilaterally without accounting for each other. A system is reactive
when its responses refer to the user’s messages that are immediately preceding them. If the system reactions
refer to all previous messages, it is a responsive system (Rafaeli 1988). Responsive systems have been shown to positively influence the user’s perception of the system (Sundar et al. 2003) as well as to positively influence
their engagement (Sundar et al. 2016). Therefore, we articulated the second DP: (DP2) an interactive
chatbot development system should contingently respond to any user input in order to increase engagement
of domain experts.
To increase the modality interactivity (REQ1) as well as message contingency (REQ2), we reviewed
research on affordances that offer action possibilities by transmitting information about the system’s
functionality and the designer’s intent (Norman 2013; Sundar et al. 2015). Affordances can affect users’
perceptions by their mere presence on an interface and guide assessments of the underlying content of the
system even without the user engaging in those features (Norman 2013). Affordances impact psychological
correlates that influence engagement (Lee and Sundar 2013). For example, by adaptively gathering
information about the user’s interaction in the form of metrics, designers can increase the perceived
contingency, which enhances the perception of message relevance, uniqueness, and reliability (Lee and
Sundar 2013; Sundar et al. 2015). Moreover, a counter indicating the behavior of other users can act as a
criterion to judge the popularity of the system (Sundar et al. 2015). This effect is triggered by the bandwagon
heuristic, which leads to the perception that a majority of others have also endorsed the system (Lee and
Sundar 2013). This is similar to the number of shares, likes, or retweets shown by social media sites. This
has been shown to enhance users’ participation by conveying a sense of community (Kim and Sundar 2011)
and to positively influence the perception of the system’s content (Lee and Sundar 2013). Hence, we
articulated the third DP: (DP3) an interactive chatbot development system should collect and visualize
interaction metrics in order to increase engagement of domain experts.
Table 1. Design Principles and Design Features
DP1: An interactive chatbot development system should enable users to directly manipulate the objects of interest in order to increase engagement of domain experts.
  DF1: Physical actions to directly change the objects of interest.
  DF2: Rapidly reversible operations whose impact is immediately visible.
DP2: An interactive chatbot development system should contingently respond to any user input in order to increase engagement of domain experts.
  DF3: Immediate system reactions to user input.
  DF4: Contingent system responses accounting for all prior user input.
DP3: An interactive chatbot development system should collect and visualize interaction metrics in order to increase engagement of domain experts.
  DF5: Auto-generated interaction metrics about the user’s interaction.
  DF6: Auto-generated interaction metrics about the interactions of other users.
Subsequently, the DPs were translated into DFs that can be implemented in a prototypical instantiation
(Seidel et al. 2018). To instantiate DP1, we articulated two DFs that build on direct manipulation research
(Frohlich 1993; Shneiderman 1997). (DF1) The system should support physical actions to directly change
the objects of interest. In our context, the objects of interest are the chatbot responses. Therefore, the system
should enable users to directly click on a chatbot response in order to improve it. In addition, (DF2)
changes proposed by users should be immediately visible on the objects of interest. This means that the
original chatbot response should be immediately updated based on the inserted user input. In addition,
users should always have the possibility to undo any improvements. To instantiate DP2, we articulated two
DFs based on research on message contingency (Rafaeli 1988; Sundar et al. 2003; Sundar et al. 2015). The
system (DF3) should show immediate reactions to any user input in order to signal the user that the system
has successfully received the input. In our context, the system should therefore directly react to any inserted
chatbot response improvement. In addition, the system should not only react to all directly preceding user
input, but (DF4) should also account for all prior user inputs when reacting to the user in order to increase
perceived contingency. To instantiate DP3, we derived two DFs for implementing interactivity metrics that
can trigger heuristics about the nature of the system (Kim and Sundar 2011; Lee and Sundar 2013; Norman
2013). The system (DF5) should display auto-generated interaction metrics by adaptively gathering
information about the user. Moreover, the system (DF6) should display auto-generated interaction metrics
about other users in order to indicate the popularity of the system’s functionality. Summing up, Table 1
provides an overview and lists all six DFs and the related DPs.
Development
The system that enables domain experts to improve the responses of a chatbot was developed as a Java web
application (see description of the system’s architecture in Feine et al. (2020c)). It can be connected with
any existing chatbot via its messaging API. To showcase the concept, we decided to include chatbots that converse via Microsoft's Direct Line 3.0 API. After a chatbot is connected to the system via its Direct Line API key, the system instantiates a knowledge base and generates a shareable web application. The web application was developed with JavaScript and based on the code of Microsoft's WebChat. The core
functionality of the web application is to enable domain experts to interact with the chatbot and to directly
improve the chatbot’s responses. The improvements are then stored in the system’s knowledge base. Thus,
the web application serves as an additional layer between the domain experts and the connected chatbot
and does not require access to the chatbot's source code. Based on this core functionality, we developed two
versions of the web application: one version that implements the proposed design including all six DFs and
one baseline design providing a simple tabular-based dialog editing functionality.
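To illustrate how such an intermediate layer can exchange messages with a connected chatbot without access to its source code, the following sketch uses the Direct Line 3.0 REST endpoints. The actual system is a Java web application, so this Python sketch is illustrative only; the secret and user id are hypothetical placeholders.

```python
import requests

DIRECT_LINE = "https://directline.botframework.com/v3/directline"
HEADERS = {"Authorization": "Bearer <direct-line-secret>"}  # hypothetical secret

# Start a conversation with the connected chatbot.
conv = requests.post(f"{DIRECT_LINE}/conversations", headers=HEADERS).json()
conv_id = conv["conversationId"]

# Relay a domain expert's message to the chatbot.
requests.post(f"{DIRECT_LINE}/conversations/{conv_id}/activities",
              headers=HEADERS,
              json={"type": "message", "from": {"id": "expert-1"}, "text": "Hello"})

# Fetch the chatbot's replies; the intermediate layer can rewrite them against
# its knowledge base before rendering them in the shared web application.
activities = requests.get(f"{DIRECT_LINE}/conversations/{conv_id}/activities",
                          headers=HEADERS).json()["activities"]
for activity in activities:
    if activity["type"] == "message" and activity["from"]["id"] != "expert-1":
        print(activity["text"])
```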
The baseline chatbot development system was inspired by the current systems used by the project
team of the telecommunication provider. They test a chatbot and then collect response improvements in a
separate spreadsheet. Similarly, the baseline design displays a chat window on the left side and a
spreadsheet on the right side of the website (see Figure 2). The chat window was developed using
Microsoft’s WebChat and the spreadsheet functionality was developed using the JavaScript package
Handsontable. The web application enables users to interact normally with the connected chatbot on the
left side and to improve the chatbot response on the right side of the web application. Each time a user
receives a chatbot response on the left, the spreadsheet automatically displays it on the right. Users can
then click on the column next to the chatbot response and insert an improvement. The improvements are
then uploaded to the system’s knowledge base. Overall, the system does not include any instantiations of
the proposed DFs because the domain experts cannot directly change the responses in the chat window
(DF1) and do not see any direct effects of their improvements in the chat window (DF2). In addition, the
web application is not directly reacting to the users’ improvements (DF3) and is not accounting for prior
improvements (DF4). For example, when a user improves a chatbot response, the chatbot is still answering
with the original response in the same context at a later stage of the interaction. Finally, the application
does not visualize any auto-generated interaction metrics about the user’s interaction (DF5) or about the
interactions of other users (DF6).
Figure 2. Baseline Chatbot Development System
The interactive chatbot development system implementing the proposed design (see Figure 3) uses
the same chat window for handling interactions with the connected chatbot, but further includes
instantiations of all six DFs. To instantiate DF1, all chatbot responses are highlighted with the information that the user can click on them to improve the response. When a user clicks on a chatbot response, a pop-up
window appears. Users can then insert an improved chatbot response in the text field which gets uploaded
to the system’s knowledge base. To instantiate DF2, the system immediately updates the original chatbot
response in the chat window with the inserted improvement. This happens in real-time while the user is
typing in the text field. In addition, users can reverse the improvements by pressing the reverse button. To
instantiate DF3, the system directly reacts to all inserted improvements by showing a pop-up notification
confirming that the improvement has been saved. In addition, the chat window highlights all improved
responses. To instantiate DF4, the chat window accounts for all prior user improvements before
displaying a chatbot response. Each time a user sends a message to the chatbot, the chat window evaluates
whether the received chatbot response from the API was already improved by the user. To do so, the website
calculates distances between the original chatbot response and all improved chatbot responses by using the
Jaro-Winkler similarity measure (Winkler 1990). To further account for the context of the interaction, the
system also calculates all distances between the current user input and all stored user inputs that were
inserted before an earlier chatbot response was improved. If both measures indicate a high similarity, the
chat window replaces the original chatbot response from the API with an improved response and highlights
it in the chat window. To instantiate DF5, the website visualizes three interaction metrics next to the chat
window: (1) the total number of improved chatbot responses, (2) the total number of messages sent to the
chatbot, and (3) the total number of received messages from the chatbot. To instantiate DF6, the website
displays the number of chatbot response improvements suggested by all users of the interactive chatbot
development system. To do this, the application queries the system's knowledge base and updates the
counter in real-time.
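The DF4 replacement logic can be sketched as follows. The similarity threshold and the structure of stored improvements are illustrative assumptions (the paper reports neither), and the jellyfish library stands in for the website's JavaScript implementation of the Jaro-Winkler measure.

```python
import jellyfish  # provides jaro_winkler_similarity in recent versions

THRESHOLD = 0.9  # hypothetical cut-off; the paper does not report one

# Stored improvements: the user message that preceded the original response,
# the original response, and the expert's improved response (illustrative).
improvements = [
    {"trigger": "what does the dsl contract cost",
     "original": "sorry, i did not understand that.",
     "improved": "The DSL contract costs 30 EUR per month."},
]

def maybe_replace(user_input: str, api_response: str) -> str:
    """Replace the API response if both the response and the conversational
    context are sufficiently similar to a stored improvement (DF4)."""
    for imp in improvements:
        response_sim = jellyfish.jaro_winkler_similarity(
            api_response.lower(), imp["original"])
        context_sim = jellyfish.jaro_winkler_similarity(
            user_input.lower(), imp["trigger"])
        if response_sim >= THRESHOLD and context_sim >= THRESHOLD:
            return imp["improved"]  # highlighted as improved in the chat window
    return api_response

print(maybe_replace("What does the DSL contract cost?",
                    "Sorry, I did not understand that."))
```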
Figure 3. Interactive Chatbot Development System
Evaluation
To evaluate our design, we compared the interactive chatbot development system with the baseline system
in an online experiment. Therefore, we formulated three hypotheses to investigate the effect of the proposed
design on engagement (see Figure 4): First, we propose that a system implementing our DPs will increase
the perceived interactivity. Because our DPs were derived based on the interactivity effects model (Sundar
2012; Sundar et al. 2015), we propose the following hypothesis: (H1) The proposed design leads to a higher
level of perceived interactivity than the baseline design. Second, we investigated whether the proposed
design also increases the engagement of the users. Because interactivity has been shown to increase
engagement in other contexts (Amershi et al. 2014), we propose that: (H2) The proposed design leads to a
higher level of engagement than the baseline design. Finally, we wanted to explain the effects of
implementing our DPs in a system. By building on the interactivity effects model, we proposed that the level
of perceived interactivity mediates the engagement of the users. Thus, we propose: (H3) Perceived
interactivity mediates the effect of the proposed design on engagement.
Figure 4. Research Model: The condition (baseline vs. proposed design, instantiated through DF1-DF6) affects perceived interactivity (H1) and engagement (H2) as direct effect hypotheses; perceived interactivity mediates the effect of the condition on engagement (H3, mediation effect hypothesis). Controls: age, gender, experience in using chatbots, experience in developing chatbots, affinity for technology, disposition to trust technology.
Experimental Method
Experimental Design: To test our hypotheses, we conducted an online experiment and used a between-
subjects design with two conditions: a control condition (baseline design) and a treatment condition
(proposed design). We used LimeSurvey as the experimental platform and followed the experimental procedure illustrated in Figure 5: random group assignment, demographics questionnaire, scenario description, onboarding video, the small talk and product tasks in counterbalanced order, and a post-experiment questionnaire.
At the beginning of the experiment, participants were randomly assigned to one of the two groups. After
reading the consent form and agreeing to it, the participants answered questions about their demographics.
Subsequently, the scenario was described. The scenario description stated that the telecommunication
provider has noticed that their customer service chatbot has limited small talk capabilities and cannot
respond to questions regarding their recently introduced products. Therefore, the company asks the
participants to use a chatbot development system in order to improve the chatbot’s small talk capabilities
as well as to improve its responses to questions regarding their new products. Next, participants saw a one-minute introduction video that explained in detail the functionalities of the chatbot development
system. The participants were able to watch the video several times. Next, participants had to conduct two
tasks by using the interactive or baseline system depending on their group assignment. Both tasks were in
a counterbalanced order. In the small talk task, participants were asked to improve the chatbot responses
to small talk questions by using the chatbot development system. In the product task, participants first saw
information about five recently introduced real-world products of the telecommunication provider. These
included information about a mobile phone contract, a DSL contract, a mobile data flat-rate contract, as
well as an antivirus program. After reading the information for at least three minutes, participants were
asked to use the chatbot development system to improve the chatbot responses regarding these products.
To make the scenario as realistic as possible, we used the real-world chatbot of the telecommunication
provider, which is deployed on their website. To do so, we received the chatbot’s knowledge base of the
telecommunication provider, instantiated the chatbot using the Microsoft Bot Framework, and connected
it to the chatbot development system. Thus, in both versions of the chatbot development system, participants received the same chatbot responses as real customers do on the website of the telecommunication provider. After participants decided to finish both improvement tasks, they were asked
to fill out a post-experiment questionnaire regarding their experience with the chatbot development system.
Table 2. Demographics and Controls
Control (Baseline Design): N = 35; Age: 23.91 (SD = 3.18); Gender: 12 female, 23 male; Experience in Using Chatbots*: 2.49 (SD = 1.27); Experience in Developing Chatbots*: 1.20 (SD = .63)
Interactive (Proposed Design): N = 35; Age: 23.31 (SD = 3.61); Gender: 15 female, 20 male; Experience in Using Chatbots*: 2.23 (SD = 1.11); Experience in Developing Chatbots*: 1.26 (SD = .61)
Note: SD = Standard deviation; * measured on a five-point Likert scale
Participants: Participants (see Table 2), mainly students, were recruited via the university’s research
panel. We considered students to be appropriate subjects for our experiment because they are experienced
with mobile phone contracts, are usually customers of telecommunication providers, and are among the
early adopters of chatbots (Gnewuch et al. 2018). In addition, we empowered the students to conduct the
experimental task by showing them the five telecommunication products during the complete improvement
task. Thus, they had the required domain knowledge to conduct the task. We assumed a large effect size and, using G*Power, calculated a required sample size of 70 participants (effect size = .80, α = .05, power (1 − β) = .95) (Faul et al. 2007). Among all participants, we raffled 600€ as compensation for participation. In total,
90 participants finished the experiment over a duration of two days. After the experiment, we excluded
participants who did not correctly answer three attention filter questions (5) or did not use the system to
improve the chatbot (15). Our final sample included 70 participants. The average experimental duration
was 33 minutes (SD = 15.45).
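The reported sample size can be reproduced with a standard a priori power analysis. The sketch below uses statsmodels rather than G*Power and assumes a one-sided test; this assumption is ours, chosen because it yields roughly 35 participants per group (70 in total).

```python
from statsmodels.stats.power import TTestIndPower

# Required group size for a large effect (d = .80) at alpha = .05 and
# power = .95; alternative="larger" assumes a one-sided (directional) test.
n_per_group = TTestIndPower().solve_power(effect_size=0.8, alpha=0.05,
                                          power=0.95, alternative="larger")
print(round(n_per_group))  # ~35 per group, i.e., ~70 participants in total
```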
Manipulation Checks: The experiment included three manipulation check questions to test whether the
manipulation of the three REQs through the instantiation of the proposed DPs was successful. On seven-
point Likert scales (i.e., 1: strongly disagree, 7: strongly agree), participants were asked whether “the chatbot development system offers extensive functionalities to improve the chatbot responses” in order to assess modality interactivity (MC1), whether “the reactions of the chatbot development system were contingent upon preceding inputs” in order to assess message interactivity (MC2), and whether “the chatbot development system lets the user serve as the source of communication” in order to assess source interactivity (MC3).
Measurement Instruments: Overall, several different approaches to measuring the multidimensional
nature of engagement with a system exist (Doherty and Doherty 2019; Lalmas et al. 2014; Sundar et al.
2015). These approaches are often distinguished into subjective measures (i.e., questionnaires, interviews, and other forms of self-report) and objective measures (i.e., logging behavior and interaction,
psychophysiological measures, as well as audio and visual analysis) (Doherty and Doherty 2019). Subjective
measures offer rich descriptions influenced by subjectivity, cognition, emotion, and memory. Objective
measures assess engagement without direct questioning and thus, have higher ease of application, limited
disruption of experience, and reduced user burden. However, they are also influenced by the user’s
subjectivity in terms of awareness and the researchers’ experimental and interpretive choices (Doherty and
Doherty 2019). As a consequence, researchers usually combine both measurement approaches to assess the
engagement with a system (Lalmas et al. 2014).
Table 3. Objective Measurement Instruments
Duration of Interaction: Interaction duration of both chatbot improvement tasks in seconds.
Number of Messages: Number of messages sent to the chatbot in both improvement tasks.
Number of Improvements: Number of improved chatbot responses in both improvement tasks.
To objectively assess engagement, we logged users’ behavior during the interaction with the chatbot
development system (see Table 3). Therefore, we measured the duration of the interaction, which indicates
the user’s overall engagement with the system. We further counted the number of messages sent to the
chatbot, which indicates the user’s depth of interaction. In addition, we counted the number of improved
chatbot responses, which indicates how much the participants got involved in using the main functionality
of the system.
Table 4. Subjective Measurement Instruments
Engagement Scale (O’Brien et al. 2018): 5-point Likert scale; 12 items (1 dropped); factor loadings [0.522, 0.778]; α = .883, CR = .885
Perceived Interactivity (Liu 2003): 7-point Likert scale; 15 items (5 dropped); factor loadings [0.514, 0.757]; α = .869, CR = .872
Affinity for Technology (Franke et al. 2019): 6-point Likert scale; 9 items (0 dropped); factor loadings [0.588, 0.815]; α = .908, CR = .910
Disposition to Trust Technology (Lankton et al. 2015): 7-point Likert scale; 3 items (0 dropped); factor loadings [0.762, 0.899]; α = .885, CR = .892
Note: α = Cronbach’s alpha, CR = Composite reliability
To subjectively assess engagement, we used the engagement scale by O’Brien et al. (2018) which measures
the depth of an actor’s investment when interacting with a digital system. To measure perceived
interactivity, we used the scale from Liu (2003) which was developed to measure the perceived interactivity
of websites. Moreover, we tested for other control variables that have been identified as relevant in extant
literature. Therefore, we assessed the individual’s tendency to actively engage in technology interactions
using the affinity for technology scale from Franke et al. (2019). In addition, we measured the individual’s
disposition to trust technology using the scale from Lankton et al. (2015). We measured all constructs using the original Likert scales with the anchors “strongly disagree” and “strongly agree”. After data collection, we
conducted a confirmatory factor analysis using the lavaan package (version 0.6-3) in R. Subsequently, we
dropped all items that did not load strongly on the construct they intended to measure (i.e., below 0.5) (Hair
et al. 2010). Next, construct reliability was assessed using Cronbach’s α and composite reliability (CR). As
shown in Table 4, Cronbach’s α and CR of all constructs exceeded the threshold value of .70 (Hair et al.
et al. 2010). Lastly, we tested for common method bias (Podsakoff et al. 2003) and conducted Harman’s single-factor test. A principal component analysis of all measured items revealed seven factors with eigenvalues greater than 1. Because the first factor accounts for only 24.59% of the variance, less than 50% of the total variance, the Harman single-factor test indicated that common method bias may not be a concern.
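The CFA itself was conducted with lavaan in R; the reliability and common-method checks can be sketched in Python as follows, assuming a hypothetical item-level DataFrame `df` whose columns are the questionnaire items.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame whose columns are the items of one scale."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def harman_first_factor_share(items: pd.DataFrame) -> float:
    """Share of total variance captured by the first principal component of all
    measured items (Harman's single-factor test; a concern if above 50%)."""
    z = (items - items.mean()) / items.std(ddof=1)
    eig = np.sort(np.linalg.eigvalsh(np.cov(z.T)))[::-1]
    return eig[0] / eig.sum()

# Usage on a hypothetical item-level dataset `df`:
# cronbach_alpha(df.filter(like="ENG_"))   # ~ .883 reported for engagement
# harman_first_factor_share(df)            # ~ .2459 reported (24.59%)
```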
Statistical Analyses: To test whether our manipulation was successful, we compared the three
manipulation checks (MC1, MC2, MC3) between both groups. To do so, we first tested for normality using
the Shapiro-Wilk test. The test revealed a non-normal distribution for all three manipulation checks.
Consequently, we used the Mann-Whitney-U test to investigate differences among the groups.
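A minimal sketch of this manipulation-check procedure with SciPy, using synthetic ratings in place of the collected data; the effect size r is derived from the normal approximation of U.

```python
import numpy as np
from scipy.stats import shapiro, mannwhitneyu

rng = np.random.default_rng(42)
# Synthetic 7-point manipulation-check ratings (N = 35 per group) standing in
# for the collected data.
mc_baseline = rng.integers(1, 8, size=35)
mc_interactive = rng.integers(3, 8, size=35)

print(shapiro(mc_baseline), shapiro(mc_interactive))  # normality checks

u, p = mannwhitneyu(mc_interactive, mc_baseline, alternative="two-sided")

# Effect size r from the normal approximation of U (interpreted per Cohen 1992).
n1, n2 = len(mc_interactive), len(mc_baseline)
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(f"U = {u:.1f}, p = {p:.4f}, r = {abs(z) / np.sqrt(n1 + n2):.3f}")
```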
Subsequently, we performed the following three steps to analyze the experimental results regarding our
three hypotheses. First, we analyzed the effect of the treatment (i.e., baseline design vs. proposed design)
on perceived interactivity (H1) using a linear regression. Therefore, we coded the treatment as a binary
dummy independent variable (i.e., 0: baseline, 1: proposed design), included perceived interactivity as the
dependent variable, and added the reported control variables. To test H2, we first investigated the effect of
the treatment on subjective engagement. Therefore, we ran a linear regression with the treatment as the
independent variable and the engagement scale as the dependent variable including the control variables.
Next, we investigated the influence of the design on the objective measures. Note that all three objective variables are count variables: discrete, limited to non-negative values, and positively skewed (Cameron and Trivedi 2013). Consequently, they cannot be analyzed using ordinary linear regressions but require a Poisson or negative binomial regression (Cameron and Trivedi 2013; Gardner et al. 1995). Because individual counts are typically more variable (i.e., overdispersed) than a standard Poisson model implies, we first investigated the dispersion of all variables (Gardner et al. 1995). The overdispersion test from Cameron and Trivedi (1990), using the R package AER (version 1.2.9), showed that all variables are significantly overdispersed (p < .04), meaning that the conditional variance exceeds the conditional mean (Cameron and Trivedi 2013). Thus, we analyzed all objective engagement
measures using negative binomial regressions (Cameron and Trivedi 2013). To test H3, we investigated
whether perceived interactivity mediates the effects of the treatment on the five dependent variables.
Therefore, we used the regression models from the previous analyses and tested for mediation using a
percentile bootstrap estimation approach (Hayes 2013).
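The paper's analyses were run in R (e.g., AER for the overdispersion test); the sketch below mirrors the two regression types in Python with statsmodels on synthetic data, with the control set reduced to two variables for brevity. All variable names and generated coefficients are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 70
df = pd.DataFrame({
    "treatment": np.repeat([0, 1], n // 2),   # 0: baseline, 1: proposed design
    "age": rng.normal(23.6, 3.4, n).round(),
    "male": rng.integers(0, 2, n),
})
df["perceived_interactivity"] = 4.4 + 1.0 * df["treatment"] + rng.normal(0, 1, n)
df["n_improvements"] = rng.poisson(np.exp(1.9 + 0.4 * df["treatment"]))

# H1: linear regression with the dummy-coded treatment (controls abbreviated).
m_pi = smf.ols("perceived_interactivity ~ treatment + age + male", data=df).fit()

# H2, objective measures: overdispersed counts -> negative binomial regression.
m_imp = smf.negativebinomial("n_improvements ~ treatment + age + male",
                             data=df).fit(disp=False)

print(m_pi.params["treatment"], m_imp.params["treatment"])
```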
Results
Manipulation Checks: First, we investigated the manipulation check items, namely whether our proposed design increases the modality interactivity (baseline: M = 3.17, SD = 1.65; interactive: M = 4.43, SD = 1.67), message interactivity (baseline: M = 3.29, SD = 1.56; interactive: M = 5.66, SD = 1.24), and source interactivity (baseline: M = 4.51, SD = 1.70; interactive: M = 6.57, SD = 0.61). As expected, Mann-Whitney U tests revealed significant differences between both groups for all three manipulation checks (i.e., MC1: U = 364.5, p = .002, r = .353; MC2: U = 150, p < .001, r = .658; MC3: U = 191, p < .001, r = .618) with moderate (MC1) and strong effect sizes (MC2, MC3) (Cohen 1992). In summary, we concluded that our treatment
manipulation to increase modality interactivity (MC1), message interactivity (MC2), and source
interactivity (MC3) through our DPs was successful.
Table 5. Descriptive Results
Control (Baseline Design): Perceived Interactivity: 4.43 (SD = .97); Engagement Scale: 3.11 (SD = .83); Duration of Interaction: 12.74 min (SD = 7.77); Number of Messages: 14.26 (SD = 7.89); Number of Improvements: 6.57 (SD = 3.06)
Interactive (Proposed Design): Perceived Interactivity: 5.50 (SD = .98); Engagement Scale: 3.52 (SD = .60); Duration of Interaction: 17.57 min (SD = 14.95); Number of Messages: 20.03 (SD = 11.48); Number of Improvements: 10.26 (SD = 5.05)
Note: Numbers represent means; SD = Standard deviation
H1: To test H1, we investigated the effect of the treatment (baseline design vs. proposed design) on
perceived interactivity (see descriptive results in Table 5). The regression result (Table 6) shows a significant
positive effect of the treatment on perceived interactivity (B = 1.035, SE = .245, p < .001). Consequently,
the test provides evidence that the proposed design increases perceived interactivity. Thus, we can confirm
H1.
H2: To test H2, we first investigated the effect of the treatment on subjective engagement. The regression result (see Table 6) reveals a significant positive effect of the treatment on the engagement scale (B = .364, SE = .176, p = .042). Next, we investigated the influence of the design on the objective measures. To
do so, we used the treatment as the independent variable and fitted for each dependent variable a negative
binomial regression model. The results (see Table 6) show that the treatment has a significantly positive
effect on all three objective engagement measures (i.e., duration of interaction: B = .305, SE = .150, p =
.041; number of messages: B = .355, SE = .117, p = .002; number of improvements: B = .419, SE = .109, p <
.001). Thus, the proposed design increases the duration of interaction, the number of messages, as well as the number of improvements. Taking all analyses into account, the results provide evidence that the
proposed design leads to significantly higher subjective and objective engagement than the baseline design
(see Table 6). Consequently, we can confirm H2.
Table 6. Regression Results (dependent variables: (1) Perceived Interactivity, (2) Engagement Scale, (3) Duration of Interaction, (4) Number of Messages, (5) Number of Improvements)
Constant: (1) 4.775 (SE = 1.27); (2) 4.318 (SE = .910); (3) 6.628 (SE = .773); (4) 2.55 (SE = .623); (5) 1.972 (SE = .569)
Condition: Proposed Design: (1) 1.035*** (SE = .245); (2) .364* (SE = .176); (3) .305* (SE = .150); (4) .355** (SE = .117); (5) .419*** (SE = .109)
Age: (1) -.026 (SE = .038); (2) -.0315 (SE = .027); (3) -.017 (SE = .022); (4) -.007 (SE = .019); (5) .002 (SE = .017)
Gender: Male: (1) -.353 (SE = .281); (2) -.400 (SE = .201); (3) -.161 (SE = .166); (4) .168 (SE = .134); (5) .081 (SE = .125)
Experience in Using Chatbots: (1) -.072 (SE = .118); (2) -.077 (SE = .085); (3) -.152* (SE = .070); (4) -.046 (SE = .058); (5) -.072 (SE = .054)
Experience in Developing Chatbots: (1) -.015 (SE = .211); (2) -.244 (SE = .151); (3) -.151 (SE = .139); (4) -.101 (SE = .104); (5) -.028 (SE = .095)
Affinity for Technology: (1) -.009 (SE = .161); (2) .046 (SE = .115); (3) .101 (SE = .095); (4) -.012 (SE = .076); (5) -.074 (SE = .069)
Disposition to Trust Technology: (1) .138 (SE = .105); (2) .017 (SE = .075); (3) .115 (SE = .067); (4) .065 (SE = .052); (5) .063 (SE = .046)
F / Likelihood-ratio χ2: (1) F(7, 62) = 3.63**; (2) F(7, 62) = 2.24*; (3) χ2(7) = 14.62*; (4) χ2(7) = 14.31*; (5) χ2(7) = 22.82**
r2 / Pseudo r2: (1) r2 = .291; (2) r2 = .202; (3) Pseudo r2 = .014; (4) Pseudo r2 = .029; (5) Pseudo r2 = .059
Note: SE = Standard error; *, **, *** indicate significance at p < .05, p < .01, and p < .001, respectively.
H3: To test H3, we used the regression models from the previous analyses and tested for mediation. We first controlled for the mediator (i.e., perceived interactivity) in the linear regression model (i.e., engagement scale). Results reveal that the treatment is no longer a significant predictor of the engagement scale (B = -.074, SE = .162, p = .649). Next, indirect effects were tested. The results indicate that the indirect effect is significant (B = .438, SE = .129, 95% CI = [.185, .691]). As a consequence, the results provide evidence that perceived interactivity fully mediates the effect of the treatment on the engagement scale (Hayes 2013). Subsequently, we investigated a potential mediation for the objective engagement measures. Therefore, we used the negative binomial regression models from above and controlled for the mediator (i.e., perceived interactivity). Results show that the treatment is no longer a significant predictor of the duration of interaction (B = .125, SE = .163, p = .442) and the number of messages exchanged with the chatbot (B = .193, SE = .125, p = .124), but is still a significant predictor of the number of improvements (B = .273, SE = .120, p = .023). Next, indirect effects were tested, and the results indicate that the indirect effect is significant for the duration of interaction (B = 1.199, SE = .086, 95% CI = [1.013, 1.419]), the number of messages exchanged with the chatbot (B = 1.19, SE = .073, 95% CI = [1.030, 1.374]), and the number of improvements (B = 1.158, SE = .068, 95% CI = [1.013, 1.324]). Consequently, the results provide evidence that perceived interactivity fully mediates the effects on the duration of interaction and the number of messages exchanged with the chatbot, and partially mediates the effect on the number of improvements. Taking all four mediation analyses together, we can confirm H3.
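As a minimal sketch of the percentile-bootstrap mediation test applied above (Hayes 2013), again in Python on synthetic data (the variable names and coefficients are illustrative): the indirect effect is the product of the a-path (treatment to perceived interactivity) and the b-path (perceived interactivity to engagement, holding the treatment constant), and it is judged significant when the 95% percentile confidence interval excludes zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 70
df = pd.DataFrame({"treatment": np.repeat([0, 1], n // 2)})
df["perceived_interactivity"] = 4.4 + 1.0 * df["treatment"] + rng.normal(0, 1, n)
df["engagement"] = 1.2 + 0.4 * df["perceived_interactivity"] + rng.normal(0, .6, n)

def indirect_effect(data: pd.DataFrame) -> float:
    # a-path: treatment -> mediator; b-path: mediator -> outcome (treatment held).
    a = smf.ols("perceived_interactivity ~ treatment",
                data=data).fit().params["treatment"]
    b = smf.ols("engagement ~ perceived_interactivity + treatment",
                data=data).fit().params["perceived_interactivity"]
    return a * b

# Percentile bootstrap: resample participants, re-estimate a*b each time.
boot = [indirect_effect(df.sample(len(df), replace=True, random_state=i))
        for i in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")  # significant if 0 excluded
```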
Discussion
In this paper, we propose and evaluate design principles that guide the design of interactive chatbot
development systems to increase the engagement of domain experts. Therefore, we conducted the first
design cycle of our DSR project and proposed three DPs grounded in the interactivity effects model (Sundar
2012; Sundar et al. 2015). Subsequently, we derived six DFs and instantiated the design in a chatbot
development system. We demonstrated that it is possible to seamlessly connect an existing chatbot via its API to the system and to implement the proposed design in a web application that can be shared with
respective domain experts. The evaluation of the web application revealed that the proposed DPs increase
the perceived interactivity of the artifact and positively influence the subjective and objective engagement
of the participants. In addition, we were able to explain the increase in their engagement by showing a
mediation effect of perceived interactivity. This demonstrates that interactivity is a powerful driver of users’ curiosity, leading them to interact with the system longer and more extensively (Lalmas et al. 2014). As a
consequence, our study shows that the existing descriptive knowledge in the form of the interactivity effects
model is a powerful foundation to inform the design of interactive artifacts that are supposed to increase
user engagement. Thus, the first design cycle of our DSR project makes two contributions: prescriptive
knowledge for designing interactive systems that increase engagement and a novel artifact in the form of
an interactive chatbot development system.
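To illustrate the connection described above, the following minimal sketch shows how a development system might relay a domain expert's message to an existing chatbot's HTTP API and return the response together with the metadata the improvement interface needs. All endpoint names and payload fields are hypothetical assumptions; the actual API of the artifact is not specified here.

# Hypothetical relay between the development system and an existing chatbot API.
import requests

CHATBOT_API_URL = "https://example.org/chatbot/api/messages"  # hypothetical endpoint

def relay_message(session_id: str, text: str) -> dict:
    # Forward the domain expert's message to the connected chatbot
    reply = requests.post(
        CHATBOT_API_URL,
        json={"session": session_id, "message": text},
        timeout=10,
    ).json()
    # Return the response together with the fields the web application needs
    # to render the conversation and offer the improvement controls
    return {
        "session": session_id,
        "user_message": text,
        "bot_response": reply.get("text", ""),
        "editable": True,  # enables direct manipulation of the response in the UI
    }

Keeping such a relay thin is what would allow the same web application to be shared with domain experts independently of the chatbot platform behind the API.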
Developers can now use this prescriptive knowledge as a solid foundation to improve existing chatbot prototyping tools (e.g., botmock.com, botsociety.io), chatbot improvement systems (e.g., Rasa X), or chatbot development kits (e.g., Microsoft's Power Virtual Agents). It can guide developers in increasing the interactivity of their systems in order to increase the engagement of their users. However, the current DPs are restricted to the specific case of designing a chatbot development system. It is therefore necessary to further generalize the findings of our study in order to account for the changing role of humans in the design of other AI-based systems (Baskerville et al. 2018; Vom Brocke et al. 2020). Accordingly, the proposed DPs are just a starting point that can guide the adoption of the interactivity effects model (Sundar 2012; Sundar et al. 2015) as justificatory knowledge in the design of interactive development systems for AI-based systems. For example, in the context of interactive machine learning, the interface design is fundamental to engaging the respective domain experts in the improvement of machine learning models (Amershi et al. 2014; Dudley and Kristensson 2018). While much research on interactive machine learning focuses on technical capabilities (Dudley and Kristensson 2018), existing prescriptive knowledge for interface design is often rather abstract, e.g., “exploit interactivity and promote rich interactions” and “engage the user” (Dudley and Kristensson 2018, p. 26). Researchers can therefore extend the proposed prescriptive knowledge by investigating it in the context of interactive development systems for other AI-based systems, such as interactive machine learning. This makes it possible not only to put the domain expert at the center of the development process but also to ensure their high engagement in the knowledge acquisition process. It can further enable the design of effective and efficient knowledge-sourcing platforms that engage not only internal domain experts (i.e., domain experts inside the legal borders of a company) but also external domain experts (i.e., domain experts outside the legal borders of a company) in the knowledge acquisition process (Mrass et al. 2017). However, it must be noted that the application of such a system can have severe implications for specific stakeholder groups and raises ethical considerations (Myers and Venable 2014). In this context, AI-based systems such as chatbots might be a threat to specific jobs such as customer service agents. However, state-of-the-art chatbots are not yet capable of solving complex customer requests. Thus, they can currently only support customer service agents by handling first-level support. This empowers customer service agents to focus on the more challenging tasks that require human reasoning and understanding.
In addition, our study comes with limitations that also suggest opportunities for future research. First, we proposed three DPs and evaluated their instantiation against a baseline design. We assumed that all three DPs influence users' engagement because the underlying REQ as well as the DPs were derived from established literature on engagement. However, we could not determine the individual direction and strength of the effect of each DP, nor could we investigate interaction effects. Consequently, we plan follow-up experiments that evaluate each DP separately, as well as potential combinations of the DPs, against the baseline design to extend the theoretical design contribution. Second, we found that perceived interactivity only partially mediates the number of improved chatbot responses. Consequently, the amount of knowledge input suggested by participants was also determined by other factors not controlled for in our experiment.
In this context, social and personal factors as well as an individual's domain knowledge have been shown to influence a person's engagement in creative tasks (Madjar et al. 2011). This underlines the importance of not only designing the right development system but also of involving the right people in the development process of chatbots. In a follow-up evaluation in the form of a field study, we will investigate how real-world domain experts actually perceive and use the proposed design in the form of a chatbot development system in their daily work routines. This first design cycle thus builds the foundation for further attempts to investigate the design of interactive chatbot development systems in the field. Third, our proposed DPs grounded in the interactivity effects model currently focus on chatbot development systems. However, design knowledge for interactive development systems that engage domain experts in the development process is relevant for many other classes of systems as well. Therefore, we aim to generalize this knowledge to a broader class of systems in subsequent design cycles. Fourth, we collected several response improvements in the online experiment because users can generally reply in a multitude of ways (Jonell et al. 2018). Chatbot developers would now need to converge the collected improvements into an optimal set of chatbot responses in order to implement them in the dialog system. However, it is difficult to distill the most suitable set of chatbot responses because no single best interaction style of a chatbot exists. The best chatbot response always depends on the user, task, and context of the current interaction. For example, it has been shown that users with different levels of task experience prefer different language styles of a chatbot (Chattaraman et al. 2019). To address this, further research can investigate how promising convergence mechanisms, for example from brainstorming research (Seeber et al. 2017), could be used to enable chatbot developers to handle a large number of collected chatbot responses and to distill the most suitable set of responses (one possible convergence step is sketched below).
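As a simple illustration of such a convergence mechanism, the sketch below greedily groups collected response improvements by string similarity and keeps one representative per group. The similarity threshold of 0.8 and the choice of the longest variant as representative are illustrative assumptions; the study does not prescribe a particular convergence algorithm.

# Illustrative convergence step: group near-duplicate response improvements
# and keep one representative per group. Threshold and selection rule are
# assumptions for demonstration purposes.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Similarity ratio in [0, 1]; 1.0 means identical strings
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def converge(responses: list[str], threshold: float = 0.8) -> list[str]:
    groups: list[list[str]] = []
    for response in responses:
        # Attach the response to the first sufficiently similar group
        for group in groups:
            if similarity(response, group[0]) >= threshold:
                group.append(response)
                break
        else:
            groups.append([response])
    # Keep the longest variant of each group as its representative
    return [max(group, key=len) for group in groups]

improved = [
    "You can reset your password in the account settings.",
    "You can reset your password in your account settings!",
    "Please contact our support team to reset the password.",
]
print(converge(improved))  # two representative responses remain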
In addition, future research can investigate other innovative approaches to improve the responses of a chatbot directly in the conversation. Fifth, we evaluated the effects of the proposed design on the engagement of the users in an online experiment. In the next design cycle, we further want to show that the application of the system also results in the development of better chatbots. Sixth, the design of the system is currently restricted to the improvement of chatbot responses. However, chatbot development support should not only facilitate the creation of meaningful responses but should also help developers to design all important design features in an effective manner (Feine et al. 2019b, 2019c). For example, the gender of a chatbot has been shown to be a powerful social cue that has a strong impact on user perception (Feine et al. 2020b). Therefore, future chatbot development systems should not only focus on the dialog but also on other important design features (Feine et al. 2019a). Seventh, we focused on text-based chatbots. However, voice-based conversational agents are becoming increasingly important (McTear et al. 2016; Wellnhammer et al. 2020). They have lower barriers to entry and use and enable interactions separate from the already busy chat modality. Consequently, we plan to extend the proposed system to investigate the design of interactive development systems for voice-based conversational agents. This will give us the opportunity to investigate the design in an additional context of AI-based systems.
Conclusion
In this paper, we addressed the challenge of engaging domain experts in chatbot development. To this end, we proposed three DPs for the design of interactive chatbot development systems. We instantiated the proposed design and showed that the DPs increased participants' engagement in the chatbot development process. In addition, we showed that perceived interactivity drives this engagement and that direct manipulation interfaces, contingent responses to user input, and auto-generated interaction metrics are powerful design features for increasing perceived interactivity and engagement. Consequently, we contribute prescriptive knowledge for designing interactive chatbot development systems that increase engagement and a novel artifact in the form of an interactive chatbot development system.
References
Adam, M., Wessel, M., and Benlian, A. 2020. “AI-based chatbots in customer service and their effects on
user compliance,” Electronic Markets (9:2), p. 204.
Amershi, S., Cakmak, M., Knox, W. B., and Kulesza, T. 2014. “Power to the people: The role of humans in
interactive machine learning,” AI MAGAZINE (35:4), pp. 105-120 (doi: 10.1609/aimag.v35i4.2513).
Baskerville, R., Baiyere, A., Gregor, S., Hevner, A., and Rossi, M. 2018. “Design science research
contributions: finding a balance between artifact and theory,” Journal of the Association for
Information Systems (19:5), pp. 358-376.
Calma, A., Leimeister, J. M., Lukowicz, P., Oeste-Reiß, S., Reitmaier, T., Schmidt, A., Sick, B., Stumme, G.,
and Zweig, K. A. 2016. “From active learning to dedicated collaborative interactive learning,” in 29th
International Conference on Architecture of Computing Systems, VDE, pp. 1-8.
Cameron, A. C., and Trivedi, P. K. 2013. Regression analysis of count data, Cambridge University Press.
Cameron, A. C., and Trivedi, P. K. 1990. “Regression-based tests for overdispersion in the Poisson model,”
Journal of Econometrics (46:3), pp. 347-364.
Chandra, L., Seidel, S., and Gregor, S. 2015. “Prescriptive Knowledge in IS Research: Conceptualizing Design
Principles in Terms of Materiality, Action, and Boundary Conditions,” in 48th Hawaii International
Conference on System Sciences, pp. 4039-4048.
Chattaraman, V., Kwon, W.-S., Gilbert, J. E., and Ross, K. 2019. “Should AI-Based, conversational digital
assistants employ social- or task-oriented interaction style? A task-competency and reciprocity
perspective for older adults,” Computers in Human Behavior (90), pp. 315-330 (doi:
10.1016/j.chb.2018.08.048).
Cohen, J. 1992. “A power primer,” Psychological bulletin (112:1), pp. 155-159.
Coniam, D. 2014. “The linguistic accuracy of chatbots: usability from an ESL perspective,” Text & Talk
(34:5), pp. 545-567.
Diederich, S., Brendel, A. B., and Kolbe, L. M. 2020. “Designing Anthropomorphic Enterprise
Conversational Agents,” Business & Information Systems Engineering (doi: 10.1007/s12599-020-
00639-y).
Diederich, S., Janßen-Müller, M., Brendel, A., and Morana, S. 2019. “Emulating Empathetic Behavior in
Online Service Encounters with Sentiment-Adaptive Responses: Insights from an Experiment with a
Conversational Agent,” in Proceedings of the 40th International Conference on Information Systems
(ICIS), Munich: AISel.
Doherty, K., and Doherty, G. 2019. “Engagement in HCI: Conception, Theory and Measurement,” ACM
Computing Surveys (51:5), pp. 1-39 (doi: 10.1145/3234149).
Dolata, M., and Schwabe, G. 2016. “More interactivity with IT support in advisory service encounters?” in
Mensch und Computer, Aachen, Germany.
Dudley, J. J., and Kristensson, P. O. 2018. “A Review of User Interface Design for Interactive Machine
Learning,” ACM Trans. Interact. Intell. Syst. (8:2), pp. 1-37.
Faul, F., Erdfelder, E., Lang, A.-G., and Buchner, A. 2007. “G*Power 3: A flexible statistical power analysis
program for the social, behavioral, and biomedical sciences,” Behavior Research Methods, pp. 175-191.
Feine, J., Adam, M., Benke, I., Maedche, A., and Benlian, A. 2020a. “Exploring Design Principles for
Enterprise Chatbots: An Analytic Hierarchy Process Study,” in 15th International Conference on Design
Science Research in Information Systems and Technology (DESRIST), Kristiansand, Norway: Springer.
Feine, J., Gnewuch, U., Morana, S., and Maedche, A. 2019a. “A Taxonomy of Social Cues for Conversational
Agents,” International Journal of Human-Computer Studies (132), pp. 138-161 (doi:
10.1016/j.ijhcs.2019.07.009).
Feine, J., Gnewuch, U., Morana, S., and Maedche, A. 2020b. “Gender Bias in Chatbot Design,” in Chatbot
Research and Design: Third International Workshop, CONVERSATIONS 2019, Amsterdam, The
Netherlands, November 19-20, 2019, Revised Selected Papers, Springer.
Feine, J., Morana, S., and Maedche, A. 2019b. “Designing a Chatbot Social Cue Configuration System,” in
Proceedings of the 40th International Conference on Information Systems (ICIS), Munich: AISel.
Feine, J., Morana, S., and Maedche, A. 2019c. “Leveraging Machine-Executable Descriptive Knowledge in
Design Science Research – The Case of Designing Socially-Adaptive Chatbots,” in Extending the
Boundaries of Design Science Theory and Practice, B. Tulu, S. Djamasbi and G. Leroy (eds.), Cham:
Springer International Publishing, pp. 76-91.
Feine, J., Morana, S., and Maedche, A. 2020c. “A Chatbot Response Generation System,” in Mensch und
Computer 2020 (MuC’20), New York, NY, USA: ACM.
Franke, T., Attig, C., and Wessel, D. 2019. “A Personal Resource for Technology Interaction: Development
and Validation of the Affinity for Technology Interaction (ATI) Scale,” International Journal of
Human-Computer Interaction (35:6), pp. 456-467.
Frohlich, D. 1993. “The history and future of direct manipulation,” Behaviour & Information Technology
(12:6), pp. 315-329 (doi: 10.1080/01449299308924396).
Gardner, W., Mulvey, E. P., and Shaw, E. C. 1995. “Regression analyses of counts and rates: Poisson,
overdispersed Poisson, and negative binomial models,” Psychological bulletin (118:3), p. 392.
Gnewuch, U., Morana, S., Adam, M., and Maedche, A. 2018. “Faster Is Not Always Better: Understanding
the Effect of Dynamic Response Delays in Human-Chatbot Interaction,” in Proceedings of the 26th
European Conference on Information Systems (ECIS), Portsmouth, United Kingdom, June 23-28.
Gnewuch, U., Morana, S., and Maedche, A. 2017. “Towards Designing Cooperative and Social
Conversational Agents for Customer Service,” in Proceedings of the 38th International Conference on
Information Systems (ICIS), Seoul: AISel.
Go, E., and Sundar, S. S. 2019. “Humanizing chatbots: The effects of visual, identity and conversational cues
on humanness perceptions,” Computers in Human Behavior (97), pp. 304-316.
Gregor, S., and Hevner, A. R. 2013. “Positioning and presenting design science research for maximum
impact,” MIS Quarterly (37:2), pp. 337-355.
Gregor, S., and Jones, D. 2007. “The Anatomy of a Design Theory,” Journal of the Association for
Information Systems (8:5), pp. 312-335 (doi: 10.17705/1jais.00129).
Grudin, J., and Jacques, R. 2019. “Chatbots, Humbots, and the Quest for Artificial General Intelligence,” in
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, New York, NY,
USA: ACM, pp. 209:1-209:11.
Hair, J. F., Black, W. C., Babin, B. J., and Anderson, R. E. 2010. Multivariate data analysis: A GLobal
Perspective, New Jersey: Pearson Prentice Hall.
Harms, J., Kucherbaev, P., Bozzon, A., and Houben, G. 2019. “Approaches for Dialog Management in
Conversational Agents,” IEEE Internet Computing (23:2), pp. 13-22 (doi: 10.1109/MIC.2018.2881519).
Hayes, A. F. 2013. Introduction to mediation, moderation, and conditional process analysis: A regression-
based approach, New York, NY: Guilford Press.
Huang, T.-H., Chang, J. C., and Bigham, J. P. 2018. “Evorus: A Crowd-powered Conversational Assistant
Built to Automate Itself Over Time,” in Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems - CHI '18, ACM Press, pp. 1-13.
Jonell, P., Bystedt, M., Dogan, F. I., Fallgren, P., Ivarsson, J., Slukova, M., Wennberg, U., Lopes, J., Boye, J.,
and Skantze, G. 2018. “Fantom: A Crowdsourced Social Chatbot using an Evolving Dialog Graph,” in 1st
Proceedings of Alexa Prize.
Kim, H.-S., and Sundar, S. S. 2011. “Using interface cues in online health community boards to change
impressions and encourage user contribution,” in Conference proceedings of the 29th Annual CHI
Conference on Human Factors in Computing Systems, New York, NY: ACM, p. 599.
Kuechler, B., and Vaishnavi, V. 2008. “On theory development in design science research: Anatomy of a
research project,” European Journal of Information Systems (17:5), pp. 489-504.
Lalmas, M., O'Brien, H., and Yom-Tov, E. 2014. “Measuring User Engagement,” Synthesis Lectures on
Information Concepts, Retrieval, and Services (6:4), pp. 1-132 (doi:
10.2200/S00605ED1V01Y201410ICR038).
Lankton, N. K., McKnight, D. H., and Tripp, J. 2015. “Technology, Humanness, and Trust: Rethinking Trust
in Technology,” Journal of the Association for Information Systems (16:10).
Lee, J. Y., and Sundar, S. S. 2013. “To tweet or to retweet? That is the question for health professionals on
twitter,” Health communication (28:5), pp. 509-524 (doi: 10.1080/10410236.2012.700391).
Liu, Y. 2003. “Developing a Scale to Measure the Interactivity of Websites,” Journal of Advertising
Research (43), pp. 207-216 (doi: 10.1017/S0021849903030204).
Madjar, N., Greenberg, E., and Chen, Z. 2011. “Factors for radical creativity, incremental creativity, and
routine, noncreative performance,” The Journal of applied psychology (96:4), pp. 730-743.
Maedche, A., Gregor, S., Morana, S., and Feine, J. 2019. “Conceptualization of the Problem Space in Design
Science Research,” in Extending the Boundaries of Design Science Theory and Practice, B. Tulu, S.
Djamasbi and G. Leroy (eds.), Cham: Springer International Publishing, pp. 18-31.
McTear, M., Callejas, Z., and Griol, D. 2016. The Conversational Interface: Talking to Smart Devices,
Switzerland: Springer International Publishing.
Meth, H., Mueller, B., and Maedche, A. 2015. “Designing a requirement mining system,” Journal of the
Association for Information Systems (16:9), p. 799.
Mrass, V., Peters, C., and Leimeister, J. M. 2017. “One for All? Managing External and Internal Crowds
through a Single Platform - A Case Study,” in Hawaii International Conference on System Sciences
(HICSS), Waikoloa, HI, USA.
Myers, M. D. 2009. Qualitative research in business & management, Thousand Oaks, CA: SAGE
Publications.
Myers, M. D., and Venable, J. R. 2014. “A set of ethical principles for design science research in information
systems,” Information & Management (51:6), pp. 801-809 (doi: 10.1016/j.im.2014.01.002).
Norman, D. 2013. The design of everyday things: Revised and expanded edition, Basic Books (AZ).
O’Brien, H. L., Cairns, P., and Hall, M. 2018. “A practical approach to measuring user engagement with the
refined user engagement scale (UES) and new UES short form,” International Journal of Human-
Computer Studies (112), pp. 28-39 (doi: 10.1016/j.ijhcs.2018.01.004).
Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., and Podsakoff, N. P. 2003. “Common method biases in
behavioral research: a critical review of the literature and recommended remedies,” The Journal of
applied psychology (88:5), pp. 879-903.
Rafaeli, S. 1988. “From new media to communication,” Sage annual review of communication research:
Advancing communication science (16), pp. 110-134.
Rieser, V., and Lemon, O. 2011. Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven
Methodology for Dialogue Management and Natural Language Generation, Springer.
Russell-Rose, T., and Tate, T. 2013. Designing the search experience: The information architecture of
discovery, Amsterdam: Morgan Kaufmann/Elsevier.
Seeber, I., Vreede, G.-J. de, Maier, R., and Weber, B. 2017. “Beyond Brainstorming: Exploring Convergence
in Teams,” Journal of Management Information Systems (34), pp. 939-969.
Seidel, S., Chandra Kruse, L., Székely, N., Gau, M., Stieger, D., Peffers, K., Tuunanen, T., Niehaves, B., and
Lyytinen, K. 2018. “Design principles for sensemaking support systems in environmental sustainability
transformations,” European Journal of Information Systems (27:2), pp. 221-247.
Shneiderman, B. 1997. “Direct manipulation for comprehensible, predictable and controllable user
interfaces,” in Proceedings of the international conference on Intelligent user interfaces, ACM, pp. 33-
39.
Shum, H.-y., He, X.-d., and Li, D. 2018. “From Eliza to XiaoIce: challenges and opportunities with social
chatbots,” Frontiers of Information Technology & Electronic Engineering (19:1), pp. 10-26.
Steuer, J. 1992. “Defining virtual reality: Dimensions determining telepresence,” Journal of Communication
(42:4), pp. 73-93 (doi: 10.1111/j.1460-2466.1992.tb00812.x).
Sundar, S. S. 2012. “Social psychology of interactivity in human-website interaction,” in Oxford handbook of
internet psychology, Oxford University Press.
Sundar, S. S., Bellur, S., Oh, J., Jia, H., and Kim, H.-S. 2016. “Theoretical Importance of Contingency in
Human-Computer Interaction: Effects of Message Interactivity on User Engagement,” Communication
Research (43:5), pp. 595-625.
Sundar, S. S., Jia, H., Waddell, T. F., and Huang, Y. 2015. “Toward a Theory of Interactive Media Effects
(TIME),” in The handbook of the psychology of communication technology, S. S. Sundar (ed.),
Chichester, West Sussex, UK, Malden, MA: Wiley Blackwell, pp. 47-86.
Sundar, S. S., Kalyanaraman, S., and Brown, J. 2003. “Explicating Web Site Interactivity: Impression
Formation Effects in Political Campaign Sites,” Communication Research (30:1), pp. 30-59 (doi:
10.1177/0093650202239025).
Thomaz, F., Salge, C., Karahanna, E., and Hulland, J. 2020. “Learning from the Dark Web: leveraging
conversational agents in the era of hyper-privacy to enhance marketing,” Journal of the Academy of
Marketing Science (48:1), pp. 43-63.
Venable, J., Pries-Heje, J., and Baskerville, R. 2016. “FEDS: a Framework for Evaluation in Design Science
Research,” European Journal of Information Systems (25:1), pp. 77-89.
Vom Brocke, J., Winter, R., Hevner, A., and Maedche, A. 2020. “Special Issue Editorial – Accumulation and
Evolution of Design Knowledge in Design Science Research: A Journey Through Time and Space,”
Journal of the Association for Information Systems (21:3), p. 9.
Waizenegger, L., Seeber, I., Dawson, G., and Desouza, K. 2020. Conversational Agents - Exploring
Generative Mechanisms and Second-hand Effects of Actualized Technology Affordances.
Weitekamp, D., Harpstead, E., and Koedinger, K. R. 2020. “An Interaction Design for Machine Teaching to
Develop AI Tutors,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing
Systems, ACM, pp. 1-11.
Wellnhammer, N., Dolata, M., Steigler, S., and Schwabe, G. 2020. Studying with the Help of Digital Tutors:
Design Aspects of Conversational Agents that Influence the Learning Process.
Winkler, R., and Söllner, M. 2018. “Unleashing the Potential of Chatbots in Education: A State-Of-The-Art
Analysis,” Academy of Management Proceedings (2018), p. 15903.
Winkler, W. E. 1990. “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model
of Record Linkage,” in Proceedings of the Section on Survey Research Methods, pp. 354-359.