A New Chatbot for Customer
Service on Social Media
Anbang Xu, Zhe Liu, Yufan Guo, Vibha Sinha, Rama Akkiraju
IBM Research - Almaden
San Jose, CA, USA
{anbangxu, liuzh, guoy, vibha.sinha, akkiraju}@us.ibm.com
ABSTRACT
Users are rapidly turning to social media to request and
receive customer service; however, a majority of these
requests are not addressed in a timely manner, or at all.
To overcome this problem, we create a new conversational
system that automatically generates responses to users'
requests on social media. Our system integrates
state-of-the-art deep learning techniques and is trained
on nearly 1M Twitter conversations between users and
agents from over 60 brands. The evaluation reveals that
over 40% of the requests are emotional, and the system is
about as good as human agents at showing empathy to help
users cope with emotional situations. Results also show that our
system outperforms an information retrieval system on
both human judgments and an automatic evaluation metric.
Author Keywords
Chatbot; social media; customer service; deep learning.
ACM Classification Keywords
H.5.3 Information Interfaces and Presentation: Group and
Organization Interfaces.
INTRODUCTION
Social media has changed the way users approach customer
service. Nearly half of U.S. Internet users are turning to
social media for help, as they can easily send off a Tweet or
Facebook status rather than call a 1-800 number or draft a
detailed email [10]. Twitter users send millions of requests
to major U.S. brands monthly. With the rapid increase in
the number of user requests, it has become increasingly
challenging to process and respond to incoming requests.
To address this challenge, many organizations form
dedicated customer service teams to respond to user
requests on social media. Such a team consists of dozens or
even hundreds of human agents trained to address users'
various needs [9]. However, manually addressing requests
is time-consuming and often fails to meet users' expectations.
Recent studies show that 72% of users who contact a brand
on Twitter expect a response within an hour [19]. Yet, our
analysis of 1M conversations shows the average response
time is 6.5 hours. This gap motivated us to explore the
feasibility of chatbots for customer service on social media.
There is a long history of chatbots powered by
various techniques such as information retrieval and
template rules [15]. Deep learning techniques have
recently been applied to natural language generation; however,
prior work focuses on general scenarios without specific
contexts [7]. Lessons can also be drawn from studies of
social Q&A [5, 6, 13], where users may ask informational
questions about products or services. Yet, it is not clear how
such question types apply to customer service.
In this work, we create a new conversational system for
customer service on social media. State-of-the-art deep
learning techniques such as long short-term memory
(LSTM) networks are first applied to generate responses for
customer-service requests on social media. The system
takes a request as the input, computes its vector
representations, feeds it to LSTM, and then outputs the
response. The system was trained on nearly 1M Twitter
conversations between users and agents from 60+ brands.
In the evaluation, we conduct a content analysis revealing
two major themes related to user requests on social media:
emotional and informational. More than 40% of the
requests are emotional without specific informational
intents. Our system performs nearly as well as human
agents in providing empathy to address users’ emotional
requests. In addition, we find that our system received
significantly higher ratings than an information retrieval (IR)
system on both human judgments and an automatic metric.
CUSTOMER SERVICE CHATBOT VIA DEEP LEARNING
The conversation between users and customer service
agents on social media can be viewed as mapping one
sequence of words representing the request to another
sequence of words representing the response (see Figure 1).
Deep learning techniques can be applied to learn the
mapping from sequences to sequences [17].
Sequence-to-Sequence Learning
The core of the system consists of two LSTM neural
networks: one as an encoder that maps a variable-length
input sequence to a fixed-length vector, and the other as a
decoder that maps the vector to a variable-length output
sequence (Figure 1). The advantage of LSTM is that it can
store sequential information over extended time intervals
and learn to block or pass on information depending on its
importance. Following [17], the encoder LSTM reads each
input sequence in reverse (Figure 1). This helps the learning
algorithm establish a connection between two sequences.
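For concreteness, the following is a minimal PyTorch sketch of this two-LSTM encoder-decoder, including the input reversal. This is not the authors' code: the class and variable names are illustrative, and the only values taken from the paper are the 5-layer, 640-unit configuration reported in the Implementation section.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, embed_dim=640, hidden_dim=640, num_layers=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encoder: maps a variable-length request to a fixed-size (h, c) state.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        # Decoder: unrolls the response from the encoder's final state.
        self.decoder = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, request_ids, response_ids):
        # Read the request in reverse, following Sutskever et al. [17].
        reversed_ids = torch.flip(request_ids, dims=[1])
        _, state = self.encoder(self.embed(reversed_ids))
        decoded, _ = self.decoder(self.embed(response_ids), state)
        return self.out(decoded)  # per-position vocabulary logits
```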
Word Embedding
Words in a user's request cannot be used directly as inputs
to LSTMs; each word must first be converted to a feature
vector. Traditional lexicon-based methods [12] can convert
words into feature vectors, but many words from social
media do not exist in current lexicons [4]. Other feature
representations such as n-grams treat words as discrete
elements, which results in high-dimensional vectors
and, accordingly, a large number of parameters to be
learned. This may cause data sparsity when the amount of
training data is small relative to the number of parameters.
Our system adopts a word embedding method, the word2vec
neural network language model [8], to learn distributed
representations of words from customer service
conversations in an unsupervised fashion. The idea of
word2vec is that each dimension of the embedding
represents a latent feature of the word, which can capture
useful syntactic and semantic properties. For example, in a
discrete space, words such as “sorry”, “apologize”, and
“glad” are equally distant from each other; but word2vec
can represent these words in a continuous space and the
distance between “sorry” and “apologize” is shorter than
the distance between “sorry” and “glad”.
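A minimal sketch of this step using gensim's word2vec implementation; the corpus file and preprocessing are placeholders, and the paper does not specify which word2vec toolkit was used. The vector size matches the 640-dimension embeddings reported below.

```python
from gensim.models import Word2Vec

# Train embeddings on the conversation corpus (file path is a placeholder).
sentences = [line.split() for line in open("conversations.txt")]
model = Word2Vec(sentences, vector_size=640, window=5, min_count=5, sg=1)

# Related words end up closer together in the continuous space:
print(model.wv.similarity("sorry", "apologize"))  # expected: relatively high
print(model.wv.similarity("sorry", "glad"))       # expected: lower
```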
Implementation
62 brands were selected according to three criteria. 1) The
brand has a Twitter account dedicated to customer service
(e.g., @ATTCares). 2) A large variety of brands is covered, to
enhance the generalizability of our findings across product
categories. 3) National brands are selected, so that a national
sample from crowdsourcing is suitable for evaluation tasks.
The conversation data was collected by the Twitter public
API. We used the Streaming API to capture tweets that
@mention any of the brands; we also continuously
collected the most recent tweets from each brand. We next
matched each reply with its request based on the
“in_reply_to_status_id” and “in_reply_to_user_id” fields,
and thus reconstructed the conversations. Since the
Streaming API only yields a sample of user tweets, we
also used the Search API to retrieve additional tweets that
appeared in the "in_reply_to_status_id" field but
were not captured by the Streaming API.
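A simplified sketch of this reply-to-request matching, assuming tweets are dicts parsed from the API's JSON (the field names are real Twitter API fields; the one-turn pairing logic is an illustration, not the authors' pipeline):

```python
# Pair each brand reply with the user request it answers, using Twitter's
# reply fields; tweets are assumed to be dicts parsed from the API's JSON.
def pair_conversations(user_tweets, brand_replies):
    requests_by_id = {t["id"]: t for t in user_tweets}
    conversations = []
    for reply in brand_replies:
        request = requests_by_id.get(reply.get("in_reply_to_status_id"))
        # Keep the pair only if the reply actually targets the same user.
        if request and reply.get("in_reply_to_user_id") == request["user"]["id"]:
            conversations.append((request["text"], reply["text"]))
    return conversations
```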
Over 2.6M user requests were collected, and only 40.4% of
them received replies; 87.6% of the conversations have only
one turn (one user request with one agent reply). The
collected conversations took place between Jun. 1 and Aug.
1, 2016. 30K of the 1M conversations were sampled,
stratified by brand, for evaluation, and the rest were
used to develop our system. Several steps were performed
to create the system:
Step 1: Clean the data. We removed non-English requests
and requests with images. All the @mentions were also
removed in the training and testing data.
Step 2: Tokenize the data. We built a vocabulary of the
most frequent 100K words in the conversations.
Step 3: Generate word-embedding features. We used the
collected corpus to train word2vec models. Each word in
the vocabulary was represented as a 640-dimension vector.
Step 4: Train LSTM networks. The input and output of
LSTMs are vector representations of word sequences, with
one word encoded or decoded at a time. In view of the
clear advantage of deep LSTMs over shallow LSTMs in
reported sequence-to-sequence tasks [17], we trained deep
LSTMs jointly with 5 layers x 640 memory cells using
stochastic gradient descent and gradient clipping.
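A sketch of the Step 4 training loop with plain SGD and gradient clipping in PyTorch. The learning rate, clipping threshold, padding id, and data loader are assumptions for illustration, not values reported in the paper.

```python
import torch
import torch.nn as nn

model = Seq2Seq(vocab_size=100_000)  # architecture from the earlier sketch
criterion = nn.CrossEntropyLoss(ignore_index=0)  # assume id 0 pads sequences
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for request_ids, response_in, response_target in train_batches:  # assumed loader
    logits = model(request_ids, response_in)
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     response_target.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients to stabilize deep LSTM training, as in Step 4.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
```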
EVALUATION
We conducted a content analysis to identify themes related
to user requests on social media, and examined how the
system performs in responding to requests with different
themes. The system was compared with actual human
agents as well as a standard information retrieval baseline
[15], where we retrieved the response whose associated
request is most similar to a new request. The similarity
measure was based on a TF-IDF weighted vector space
model implemented in Apache Lucene [20]. The quality of
the generated responses was measured by human judgments
and an automatic evaluation metric.
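The baseline itself was built on Apache Lucene [20]; the following scikit-learn sketch is equivalent in spirit (TF-IDF vectors scored by cosine similarity), with illustrative function and variable names:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_ir_baseline(train_requests, train_responses):
    vectorizer = TfidfVectorizer()
    request_matrix = vectorizer.fit_transform(train_requests)

    def respond(new_request):
        # Score the new request against every training request...
        sims = cosine_similarity(vectorizer.transform([new_request]),
                                 request_matrix)
        # ...and return the reply paired with the most similar one.
        return train_responses[sims.argmax()]

    return respond
```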
Content Analysis
Following qualitative analysis methods [16], two hundred
requests were sampled and coded using a bottom-up
approach. The requests were first segmented into the
smallest logical units. A first pass was then performed to
assign categories to the units and subsequent passes were
made to revise and aggregate the categories. We found that
there were two types of request:
Figure 1. Sequence-to-sequence learning with LSTM neural networks.
1) Emotional Request. In emotional requests, users intend to
express their emotions, attitudes, or opinions toward a brand
without explicitly seeking specific solutions (see examples
in Table 1). 2) Informational Request. These requests are sent
with the intent of getting information that may help users
solve their problems. This request type is similar to the
informational questions identified on social Q&A sites [5].
We recruited two annotators to code another sample of 200
requests using this taxonomy. First, the coders received
training in which they were introduced to the themes,
definitions, and examples. They then coded a smaller sample
of the data and resolved disagreements, after which they
independently coded the requests. Agreement between the
coders was high (kappa coefficient = 0.79, p < .001). After
disagreements were resolved, 40.5% of the requests
were emotional and 59.5% were informational.
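Agreement of this kind can be computed directly with scikit-learn's implementation of Cohen's kappa; the label arrays below are placeholders standing in for the two coders' 200 annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder labels standing in for the two coders' 200 annotations.
coder1 = ["emotional", "informational", "emotional", "informational"]
coder2 = ["emotional", "informational", "informational", "informational"]
print(cohen_kappa_score(coder1, coder2))
```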
Human Evaluation
Three evaluation measures were derived from prior work to
assess the response quality: 1) Appropriateness. An
appropriate response should be on the same topic as the
request, and should also “make sense” in response to it [15].
2) Empathy. The reply should give individualized attention
to a user and make him or her feel valued [14]. 3) Helpfulness. A
helpful reply should contain useful and concrete advice that
can address the user's request [6].
CrowdFlower was used to recruit participants. All 703
participants were native English speakers aged 18
or older. The geographic distribution of participants was
USA (66.0%), UK (22.8%), Canada (8.5%), and Australia
(2.7%). Participants had to pass at least one gold question
in order to participate in the survey; 14.1% of participants
failed this check and their responses were removed.
In a survey task, participants were first instructed to learn
the three rating criteria (appropriateness, empathy, and
helpfulness) with definitions and examples. They were then
shown a request and asked to rate the three responses, from
our deep learning system, the IR system, and the human agent,
respectively. The responses were arranged in random order
to control for order effects. 200 requests were sampled, and thus
600 responses were rated. Each response was rated by 5
participants according to the three criteria. Ratings were
made on a 7-point scale from strongly disagree (-3) to
strongly agree (+3) with whether the response met the given
criterion. The intra-class correlation (ICC(1, k)) of participants'
ratings ranged from 0.60 to 0.87, indicating moderately
high reliability [2]. The average of participants' ratings of
a response was used to measure its quality.
We performed three two-way ANOVA tests to examine the
influence of agent type (deep learning, IR, human agent)
and request type (emotional, informational) on the three
dimensions (Figure 2). On appropriateness ratings, the
main effects of agent type (F(2, 594) = 109.46, p < .001) and
request type (F(1, 594) = 44.86, p < .001) and the interaction
between them (F(2, 594) = 5.61, p < .01) were all significant at
the 0.01 significance level. The two-way ANOVA of
empathy scores revealed significant main effects of
agent type (F(2, 594) = 52.37, p < .001) and request type
(F(1, 594) = 44.21, p < .001), and a significant interaction
(F(2, 594) = 3.48, p < .05). Similarly, the results for
helpfulness showed significant main effects of agent type and
request type (F(2, 594) = 87.86, p < .001; F(1, 594) = 25.9, p <
.001), and a significant interaction (F(2, 594) = 7.17, p < .001).
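For reference, one of these two-way ANOVAs could be reproduced with statsmodels as follows; the tiny DataFrame is an illustrative stand-in for the 600 actual ratings, and the column names are assumptions.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per rated response; a tiny stand-in for the 600 actual ratings.
df = pd.DataFrame({
    "rating":  [2.1, -0.5, 1.8, 0.4, -1.2, 1.0, 2.0, 0.1],
    "agent":   ["deep", "ir", "human", "deep", "ir", "human", "deep", "ir"],
    "request": ["emotional", "emotional", "emotional", "informational",
                "informational", "informational", "emotional", "informational"],
})
model = ols("rating ~ C(agent) * C(request)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and interaction
```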
Emotional:
  User: Your customer service is horrible! You don't even know how to deliver packages!
  Agent: I'm sorry to hear that. What was the delivery date provided in your order?

  User: Comcast is the worst. Period.
  Agent: Hi, please let us know if you need assistance.

  User: Can't wait to travel with you next week for the 1st time of my life with you ;)
  Agent: We can't wait to see you! We're excited to have you onboard with us soon!

Informational:
  User: How do I disable the alarm system on my 2009 Toyota matrix?! Thanks in advance! stranded work in the am gotta get there.
  Agent: Hi, you may visit: (URL)

  User: Downloaded Nike training club app but won't let me in. Just shows the loading circle before the error message occurs
  Agent: We're here to help. Please dm us the device you're using and we'll get started.

  User: I'm booking a flight and I am not seeing the 25% back on my rewards. Do I have to book on PC?
  Agent: Hi, you can find out more about the pay's website here: (URL)

Table 1. Examples of user requests on social media and their
corresponding replies generated by our deep learning system.
Interestingly, there was no statistically significant
difference between deep learning and human agents on
empathy for emotional requests (t-test, p = 0.15; Figure 2b),
indicating that our system shows empathy toward users in
emotional situations about as well as actual agents do.
Table 1 shows that our system recognized different
emotional situations and offered empathy accordingly.
Deep learning outperformed IR on all three rating dimensions
(t-test, p < .01). The advantage of deep learning over
IR was more evident on emotional than on informational
questions (Figure 2). However, the performance of both the
deep learning and IR agents dropped significantly when
requests became informational (t-test, p < .001). Post hoc
comparisons indicated that human agents performed equally
well on different request types (e.g., t-test, p = 0.94; Figure 2c).
Another interesting observation was that, unlike IR, the deep
learning agent transferred certain writing styles from one
brand to another. For example, banking customer service
agents often adopted formal language such as "I apologize
for the poor user experience" in their responses, whereas the
responses generated by our system were more casual: "I'm
sorry you feel this way". It is possible that a majority
of brands use informal styles on social media; our system
learned these styles and applied them to other brands.
Automatic Evaluation
The field of natural language generation has benefited
greatly from the existence of an automatic evaluation
metric, BLEU [11], which grades an output response
according to n-gram matches to the reference (the response
from a human agent). We applied this metric to a large
testing data set including 30K user requests. Again, deep
learning performed significantly better than IR (t-test, p <
.001; see Figure 3). Moreover, we compared deep learning
and IR within each brand. In general, the BLEU scores of
deep learning were higher than the scores of IR across
brands at the 0.01 significance level.
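A sketch of this BLEU computation with NLTK, where the generated response is scored against the human agent's reply as the reference. The token lists are placeholders for the 30K test pairs, and the smoothing function is an assumption added for very short responses.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Placeholders: one human-agent reference per request, one generated reply.
references = [[["i'm", "sorry", "to", "hear", "that"]],
              [["we're", "here", "to", "help"]]]
hypotheses = [["so", "sorry", "to", "hear", "that"],
              ["we're", "happy", "to", "help"]]
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(score)
```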
DISCUSSION AND FUTURE WORK
Traditional customer service often emphasizes users'
informational needs [9]; however, we found that over 40%
of user requests on Twitter are emotional and are not
intended to seek specific information. This reveals a new
paradigm of customer service interactions. One explanation
is that, compared with calling a 1-800 number or writing
an email, social media significantly lowers the cost of
participation and allows more users to freely share their
experiences with brands. Also, sharing emotions with the
public is considered one of the main motivations for
using social media [1]. Future studies can examine how
emotional requests are associated with users' motivations in
the context of social media.
Our deep learning based system achieved performance
similar to that of human agents in handling emotional requests,
which represent a significant portion of user requests on social
media. This finding opens new possibilities for integrating
chatbots with human agents to support customer service on
social media. For example, an automated technique could be
designed to separate emotional from informational requests,
so that emotional requests are routed to deep learning
chatbots, greatly improving response speed.
Deep learning outperformed IR on all the measures, primarily
because deep learning, as a statistical approach, is much
better at handling unseen data and thus more flexible than
keyword-search approaches. For instance, given a reference
reply to the request "my flight is delayed" and one to "my
order is cancelled", a deep learning based system is able to
generalize the replies to both scenarios and provide
meaningful replies to unseen requests such as "my flight is
cancelled", for which appropriate replies can hardly be
retrieved from the limited requests/topics available in the
training data.
The performance of the deep learning and IR systems
decreased when requests switched from emotional to
informational, especially in the case of empathy ratings.
One explanation is that users' informational needs are more
diverse than their emotional situations; as a result, it is
more challenging to learn and apply the relevant knowledge to
informational requests. The drop in empathy ratings is
probably due to the lack of emotional words in
informational requests: machine learning techniques are not
able to recognize the subtle emotions in these requests and
respond empathetically. Future systems could consider
additional contextual information, such as users' social
media profiles, to better understand users' emotional status.
We observed that the deep learning based system was able to
learn writing styles from one brand and transfer them to
another. Future work can explore this functionality in a
more supervised fashion by filtering the training data for
certain styles and specifying the target style for output
sentences. This raises new opportunities for developing
impression management tools on social media. As the written
text of brands and individual users affects how they are
perceived on social media [18], such a tool could help them
create the images of themselves they wish to present.
Finally, chatbots on social media offer a new opportunity to
provide individualized attention to users at scale and to
encourage interactions between users and brands, which can
not only enhance brand performance but also help users
gain social, informational, and economic benefits [3]. Future
studies can be designed to understand how chatbots affect
the relationship between users and brands in the long term.
Figure 3. BLEU scores of deep learning and IR systems.
REFERENCES
1. Natalya N. Bazarova, Yoon Hyung Choi, Victoria
Schwanda Sosik, Dan Cosley, and Janis Whitlock.
Social Sharing of Emotions on Facebook: Channel
Differences, Satisfaction, and Replies. In Proc. of
CSCW, 2015, 154-164.
2. Joan-Isaac Biel, Oya Aran, and Daniel Gatica-Perez.
You Are Known by How You Vlog: Personality
Impressions and Nonverbal Behavior in YouTube. In
Proc. of ICWSM, 2011, 446-449.
3. Keith S Coulter, Johanna Gummerus, Veronica
Liljander, Emil Weman, and Minna Pihlström.
Customer Engagement in a Facebook Brand
Community. Management Research Review, 2012, 35,
9, 857-877.
4. Ethan Fast, Binbin Chen, and Michael S. Bernstein.
Empath: Understanding Topic Signals in Large-Scale
Text. In Proc. of CHI, 2016, 4647-4657.
5. F. Maxwell Harper, Daniel Moy, and Joseph A.
Konstan. Facts or Friends?: Distinguishing
Informational and Conversational Questions in Social
Q&A Sites. In Proc. of CHI, 2009, 759-768.
6. F. Maxwell Harper, Daphne Raban, Sheizaf Rafaeli,
and Joseph A. Konstan. Predictors of Answer Quality
in Online Q&A Sites. In Proc. of CHI, 2008, 865-874.
7. Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias
Kaufmann, Andrew Tomkins, Balint Miklos, Greg
Corrado, Laszlo Lukacs, Marina Ganea, Peter Young,
and Vivek Ramavajjala. Smart Reply: Automated
Response Suggestion for Email. In Proc. of KDD,
2016, 955-964.
8. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S
Corrado, and Jeff Dean. Distributed Representations of
Words and Phrases and Their Compositionality. In
Proc. of NIPS, 2013, 3111-3119.
9. Keith B Murray. A Test of Services Marketing Theory:
Consumer Information Acquisition Activities. The
Journal of Marketing, 1991, 10-25.
10. NIELSEN. State of the Media: Social Media Report.
2011.
11. Kishore Papineni, Salim Roukos, Todd Ward, and
Wei-Jing Zhu. BLEU: A Method for Automatic
Evaluation of Machine Translation. In Proc. of ACL,
2002, 311-318.
12. James W Pennebaker, Martha E Francis, and Roger J
Booth. Linguistic Inquiry and Word Count: LIWC 2001.
Mahway: Lawrence Erlbaum Associates, 2001, 71,
2001.
13. Tiziano Piccardi, Gregorio Convertino, Massimo
Zancanaro, Ji Wang, and Cedric Archambeau. Towards
Crowd-Based Customer Service: A Mixed-Initiative
Tool for Managing Q&A Sites. In Proc. of CHI, 2014,
2725-2734.
14. Leyland F Pitt, Richard T Watson, and C Bruce Kavan.
Service Quality: A Measure of Information Systems
Effectiveness. Management Information Systems
Quarterly, 1995, 173-187.
15. Alan Ritter, Colin Cherry, and William B Dolan. Data-
Driven Response Generation in Social Media. In Proc.
of EMNLP, 2011, 583-593.
16. Anselm L. Strauss. Qualitative Analysis for Social
Scientists. Cambridge University Press, 1987.
17. Ilya Sutskever, Oriol Vinyals, and Quoc V Le.
Sequence to Sequence Learning with Neural Networks.
In Proc. of NIPS, 2014, 3104-3112.
18. Anbang Xu, Haibin Liu, Liang Gou, Rama Akkiraju,
Jalal Mahmud, Vibha Sinha, Yuheng Hu, and Mu
Qiao. Predicting Perceived Brand Personality with
Social Media. In Proc. of ICWSM, 2016, 436-445.
19. http://blog.hubspot.com/marketing/twitter-response-
time-data.
20. http://lucene.apache.org.