A New Chatbot for Customer Service on Social Media

Anbang Xu, Zhe Liu, Yufan Guo, Vibha Sinha, Rama Akkiraju
IBM Research - Almaden
San Jose, CA, USA
{anbangxu, liuzh, guoy, vibha.sinha, akkiraju@us.ibm.com}
ABSTRACT
Users are rapidly turning to social media to request and
receive customer service; however, a majority of these
requests are not addressed in a timely manner, or are not
addressed at all. To overcome this problem, we create a new
conversational system to automatically generate responses
to users' requests on social media. Our system is integrated
with state-of-the-art deep learning techniques and is trained
on nearly 1M Twitter conversations between users and
agents from over 60 brands. The evaluation reveals that
over 40% of the requests are emotional, and the system is
about as good as human agents in showing empathy to help
users cope with emotional situations. Results also show that
our system outperforms an information retrieval system on
both human judgments and an automatic evaluation metric.
Author Keywords
Chatbot; social media; customer service; deep learning.
ACM Classification Keywords
H.5.3 Information Interfaces and Presentation: Group and
Organization Interfaces.
INTRODUCTION
Social media has changed the way users approach customer
service. Nearly half of U.S. Internet users are turning to
social media for help, as they can easily send off a Tweet or
Facebook status rather than call a 1-800 number or draft a
detailed email [10]. Twitter users send millions of requests
to major U.S. brands monthly. With the rapid increase in
the number of user requests, it has become increasingly
challenging to process and respond to incoming requests.
To address this challenge, many organizations form
dedicated customer service teams responding to user
requests on social media. Such a team consists of dozens or
even hundreds of human agents trained to address users'
various needs [9]. However, manually addressing requests
is time-consuming and often fails to meet users' expectations.
Recent studies show that 72% of users who contact a brand
on Twitter expect a response within an hour [19]. Yet, our
analysis of 1M conversations shows the average response
time is 6.5 hours. This gap motivated us to explore the
feasibility of chatbots for customer service on social media.
There has been a long history of chatbots powered by
various techniques such as information retrieval and
template rules [15]. Deep learning techniques have been
recently applied to natural language generation; however,
prior work focuses on general scenarios without specific
contexts [7]. Lessons could also be informed by studies of
social Q&A [5, 6, 13], where users may ask informational
questions about products or services. Yet, it is not clear how
such question types can be applied for customer service.
In this work, we create a new conversational system for
customer service on social media. State-of-the-art deep
learning techniques such as long short-term memory
(LSTM) networks are first applied to generate responses for
customer-service requests on social media. The system
takes a request as the input, computes its vector
representation, feeds it to the LSTM networks, and then
outputs the response. The system was trained on nearly 1M Twitter
conversations between users and agents from 60+ brands.
In the evaluation, we conduct a content analysis revealing
two major themes related to user requests on social media:
emotional and informational. More than 40% of the
requests are emotional without specific informational
intents. Our system performs nearly as well as human
agents in providing empathy to address users’ emotional
requests. In addition, we find that our system received
significantly higher ratings than an information retrieval (IR)
system in both human judgments and an automatic metric.
CUSTOMER SERVICE CHATBOT VIA DEEP LEARNING
The conversation between users and customer service
agents on social media can be viewed as mapping one
sequence of words representing the request to another
sequence of words representing the response (see Figure 1).
Deep learning techniques can be applied to learn the
mapping from sequences to sequences [17].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
CHI 2017, May 06-11, 2017, Denver, CO, USA
© 2017 ACM. ISBN 978-1-4503-4655-9/17/05…$15.00
DOI: http://dx.doi.org/10.1145/3025453.3025496
Sequence-to-Sequence Learning
The core of the system consists of two LSTM neural
networks: one as an encoder that maps a variable-length
input sequence to a fixed-length vector, and the other as a
decoder that maps the vector to a variable-length output
sequence (Figure 1). The advantage of LSTM is that it can
store sequential information over extended time intervals
and learn to block or pass on information depending on its
importance. Following [17], the encoder LSTM reads each
input sequence in reverse (Figure 1). This helps the learning
algorithm establish a connection between two sequences.
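A minimal sketch of the encoder side may make the gating and the input reversal concrete. It is not the system's implementation: it uses a single scalar memory cell instead of stacked layers of vector-valued cells, the inputs are plain numbers rather than word vectors, and the weights in `PARAMS` are arbitrary illustrative values rather than trained ones.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One LSTM step for a single scalar memory cell."""
    z = {}
    for gate in ("i", "f", "o", "g"):
        w_x, w_h, b = params[gate]
        z[gate] = w_x * x + w_h * h + b
    i = sigmoid(z["i"])    # input gate: how much new information to admit
    f = sigmoid(z["f"])    # forget gate: how much old memory to keep
    o = sigmoid(z["o"])    # output gate: how much memory to expose
    g = math.tanh(z["g"])  # candidate memory content
    c_new = f * c + i * g  # block or pass on information via the gates
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def encode(inputs, params):
    """Read the inputs in reverse order and return the final (h, c) state.

    The returned state has the same size no matter how long the input is,
    which is the "fixed-length vector" handed to the decoder.
    """
    h, c = 0.0, 0.0
    for x in reversed(inputs):
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hypothetical weights (w_x, w_h, bias) for each gate; a real model learns them.
PARAMS = {gate: (0.5, 0.5, 0.0) for gate in "ifog"}

state_short = encode([1.0, -1.0], PARAMS)
state_long = encode([0.1] * 50, PARAMS)
```

Both `state_short` and `state_long` are pairs of two numbers, even though the inputs differ in length. A decoder would start from such a state and emit one output word at a time.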
Word Embedding
Words in a user’s request cannot be directly used as inputs
for LSTMs; each word needs to be converted to a feature
vector. Traditional lexicon-based methods [12] can convert
words into feature vectors, but many words from social
media do not exist in current lexicons [4]. Other feature
representations such as n-grams treat words as discrete
elements, which results in a high-dimensional vector
and, accordingly, a large number of parameters to be
learned. This may cause data sparsity when the amount of
training data is small relative to the number of parameters.
Our system adopts a word embedding method, the word2vec
neural network language model [8], to learn distributed
representations of words from customer service
conversations in an unsupervised fashion. The idea of
word2vec is that each dimension of the embedding
represents a latent feature of the word, which can capture
useful syntactic and semantic properties. For example, in a
discrete space, words such as “sorry”, “apologize”, and
“glad” are equally distant from each other; but word2vec
can represent these words in a continuous space and the
distance between “sorry” and “apologize” is shorter than
the distance between “sorry” and “glad”.
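The contrast between discrete and continuous representations can be illustrated with a few lines of code. The three-dimensional embedding values below are made up for illustration; they are not taken from the trained word2vec model, which uses far more dimensions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Discrete (one-hot) representation: every word gets its own axis.
one_hot = {
    "sorry": [1, 0, 0],
    "apologize": [0, 1, 0],
    "glad": [0, 0, 1],
}

# Hypothetical dense embeddings, chosen by hand for this example.
embedding = {
    "sorry": [0.9, 0.1, -0.6],
    "apologize": [0.8, 0.2, -0.5],
    "glad": [-0.7, 0.3, 0.8],
}

for name, space in (("one-hot", one_hot), ("embedding", embedding)):
    near = euclidean(space["sorry"], space["apologize"])
    far = euclidean(space["sorry"], space["glad"])
    print(f"{name}: sorry-apologize={near:.3f}, sorry-glad={far:.3f}")
```

In the one-hot space both distances are identical, while in the dense space "sorry" sits much closer to "apologize" than to "glad".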
Implementation
62 brands were selected according to three criteria. 1) A
brand has a Twitter account dedicated to customer service
(e.g. ATTCares). 2) A large variety of brands is covered to
enhance the generalizability of our findings across product
categories. 3) National brands are selected so that a national
sample from crowdsourcing is suitable for evaluation tasks.
The conversation data was collected through the Twitter public
API. We used the Streaming API to capture tweets that
@mention any of the brands; we also continuously
collected the most recent tweets from each brand. We next
matched each reply with its request based on the
“in_reply_to_status_id” and “in_reply_to_user_id” fields,
and thus reconstructed the conversation. Since the
Streaming API only contains a sample of user tweets, we
also used the Search API to get additional tweets that
appeared in the “in_reply_to_status_id” field but
were not captured by the Streaming API.
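The matching step described above can be sketched as follows. The tweet records here are simplified flat dictionaries: `in_reply_to_status_id` and `in_reply_to_user_id` are the fields named in the text, while `id`, `user_id`, `text`, and the `brand_ids` argument are assumed names for this illustration and do not reproduce the full Twitter payload.

```python
def reconstruct_conversations(tweets, brand_ids):
    """Pair each brand reply with the user request it answers.

    Returns (pairs, missing), where `missing` holds the IDs of requests that
    were replied to but are absent from the collected tweets. Those IDs are
    the candidates for a follow-up lookup.
    """
    by_id = {t["id"]: t for t in tweets}
    pairs, missing = [], set()
    for reply in tweets:
        if reply["user_id"] not in brand_ids:
            continue  # only replies written by a brand account
        parent_id = reply.get("in_reply_to_status_id")
        if parent_id is None:
            continue  # not a reply
        request = by_id.get(parent_id)
        if request is None:
            missing.add(parent_id)
            continue
        # Both reply fields must agree with the request we found.
        if request["user_id"] == reply.get("in_reply_to_user_id"):
            pairs.append((request, reply))
    return pairs, missing


tweets = [
    {"id": 1, "user_id": 100, "text": "@brand my flight is delayed",
     "in_reply_to_status_id": None, "in_reply_to_user_id": None},
    {"id": 2, "user_id": 900, "text": "@user We're sorry for the delay.",
     "in_reply_to_status_id": 1, "in_reply_to_user_id": 100},
    {"id": 3, "user_id": 900, "text": "@other Please DM us.",
     "in_reply_to_status_id": 7, "in_reply_to_user_id": 101},
]
pairs, missing = reconstruct_conversations(tweets, brand_ids={900})
```

Here tweet 2 is paired with tweet 1, and request 7 is reported as missing because it was never captured.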
Over 2.6M user requests were collected and only 40.4% of
them received replies. 87.6% of the conversations only have
one turn (one user request with one agent reply). The
collected conversations took place between Jun. 1 and Aug.
1, 2016. 30K of the 1M conversations were selected from
the brands by stratified sampling for evaluation, and the rest
were used to develop our system. Several steps were performed
to create the system:
Step 1: Clean the data. We removed non-English requests
and requests with images. All the @mentions were also
removed in the training and testing data.
Step 2: Tokenize the data. We built a vocabulary of the
most frequent 100K words in the conversations.
Step 3: Generate word-embedding features. We used the
collected corpus to train word2vec models. Each word in
the vocabulary was represented as a 640-dimension vector.
Step 4: Train LSTM networks. The input and output of
LSTMs are vector representations of word sequences, with
one word encoded or decoded at a time. In view of the
clear advantage of deep LSTMs over shallow LSTMs in
reported sequence-to-sequence tasks [17], we trained deep
LSTMs jointly with 5 layers x 640 memory cells using
stochastic gradient descent and gradient clipping.
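The data preparation in Steps 1 and 2 and the gradient clipping mentioned in Step 4 can be sketched in plain Python. The regular expressions, the `<unk>` token for out-of-vocabulary words, and the clip-by-norm rule are assumptions made for this example; the text does not specify which tokenizer or clipping variant was used.

```python
import math
import re
from collections import Counter

MENTION = re.compile(r"@\w+")
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def clean(text):
    """Remove @mentions and collapse whitespace (part of Step 1)."""
    return " ".join(MENTION.sub(" ", text).split())


def tokenize(text):
    """Lowercase and split into words and punctuation marks (Step 2)."""
    return TOKEN.findall(text.lower())


def build_vocab(texts, max_size):
    """Keep the `max_size` most frequent tokens; index 0 is reserved."""
    counts = Counter(tok for text in texts for tok in tokenize(clean(text)))
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Map a request to token IDs, using <unk> for unknown words."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(clean(text))]


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


vocab = build_vocab(["help help me", "@ATTCares help me now"], max_size=2)
ids = encode("@ATTCares help now", vocab)
```

With `max_size=2` only "help" and "me" enter the vocabulary, so "now" is mapped to the `<unk>` index. The same mechanism applies with a vocabulary of 100K words.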
EVALUATION
We conducted a content analysis to identify themes related
to user requests on social media, and examined how the
system performs in responding to requests with different
themes. The system was compared with actual human
agents as well as a standard information retrieval baseline
[15], where we retrieved the response whose associated
request is most similar to a new request. The similarity
measure was based on a TF-IDF weighted vector space
model implemented in Apache Lucene [20]. The quality of
the generated responses was measured by human judgments
and an automatic evaluation metric.
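The retrieval baseline can be approximated by a small TF-IDF nearest-neighbor search. This is a simplified stand-in written for illustration, not Lucene's scoring function: the whitespace tokenizer, the smoothed IDF formula, and the `TfidfRetriever` class are all assumptions of this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TfidfRetriever:
    """Return the stored response whose request best matches a new request."""

    def __init__(self, pairs):
        self.pairs = pairs  # list of (request, response) strings
        docs = [tokenize(request) for request, _ in pairs]
        n = len(docs)
        df = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed inverse document frequency (an assumed variant).
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        tf = Counter(tokens)
        vec = {t: c * self.idf[t] for t, c in tf.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def respond(self, request):
        query = self._vectorize(tokenize(request))
        # Cosine similarity, since all vectors are length-normalized.
        scores = [
            sum(w * vec.get(t, 0.0) for t, w in query.items())
            for vec in self.vectors
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.pairs[best][1]


retriever = TfidfRetriever([
    ("my flight is delayed", "We're sorry for the delay."),
    ("my order is cancelled", "We're sorry about your order."),
])
reply = retriever.respond("flight delayed again")
```

The baseline can only return a response that already exists in the stored pairs, which is the limitation the generative system is meant to overcome.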
Content Analysis
Following qualitative analysis methods [16], two hundred
requests were sampled and coded using a bottom-up
approach. The requests were first segmented into the
smallest logical units. A first pass was then performed to
assign categories to the units and subsequent passes were
made to revise and aggregate the categories. We found that
there were two types of request:
1) Emotional Request. In emotional requests, users intend to
express their emotions, attitudes or opinions toward a brand
without explicitly seeking specific solutions (see examples
in Table 1).
2) Informational Request. Requests are sent
with the intent of getting information that may help users
solve their problems. This request type is similar to
the informational questions identified in social Q&A sites [5].
Figure 1. Sequence-to-sequence learning with LSTM neural networks.
We recruited two annotators to code another sample of 200
requests using the taxonomy. First, the coders received
training in which they were introduced to the themes,
definitions, and examples. They then coded requests on a
smaller sample of the data and resolved disagreements.
Then, they independently coded the requests. Agreement
between the coders was high (kappa coefficient = 0.79, p <
.001). After disagreements were resolved, 40.5% of the requests
were emotional and 59.5% of them were informational.
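For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance. The sketch below computes it for ten made-up labels ("E" for emotional, "I" for informational); the labels are illustrative and are not the study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both coders use a single identical label for every item.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


coder_1 = ["E", "E", "I", "I", "E", "I", "I", "I", "E", "I"]
coder_2 = ["E", "E", "I", "I", "I", "I", "I", "I", "E", "I"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy example the coders agree on 9 of 10 items, chance agreement is 0.54, and kappa is (0.90 - 0.54) / (1 - 0.54), or about 0.78.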
Human Evaluation
Three evaluation measures were derived from prior work to
assess the response quality: 1) Appropriateness. An
appropriate response should be on the same topic as the
request, and should also “make sense” in response to it [15].
2) Empathy. The reply should give individualized attention
to a user and make him or her feel valued [14]. 3) Helpfulness. A
helpful reply should contain useful and concrete advice that
can address the user request [6].
CrowdFlower was used to recruit participants. All 703
participants were native English speakers aged 18
or older. The geographic distribution of participants was
USA (66.0%), UK (22.8%), Canada (8.5%) and Australia
(2.7%). Participants had to answer at least one gold question
in order to participate in the survey. 14.1% of participants
failed the check and their responses were removed.
In a survey task, participants were first instructed to learn
the three rating criteria (appropriateness, empathy, and
helpfulness) with definitions and examples. Then, they were
shown a request and asked to rate three responses: one from
our deep learning system, one from IR, and one from a human
agent. The responses were arranged in random order
to control order effects. 200 requests were sampled and thus
600 responses were rated. Each response was rated by 5
participants according to the three criteria. The ratings were
made on a 7-point scale from strongly disagree (-3) to
strongly agree (+3) on whether the response met the given
criterion. Intra-class correlation (ICC(1, k)) of participants'
ratings ranged from 0.60 to 0.87, indicating moderately
high reliability [2]. The average of participants’ ratings of
a response was used to measure the quality of the response.
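The reliability measure and the per-response score can be computed as shown in the sketch below. It uses the standard one-way random-effects formula for ICC(1, k), namely (MSB - MSW) / MSB, and assumes every response has the same number of ratings; the numbers in `ratings` are invented for illustration.

```python
def icc_1k(ratings):
    """ICC(1, k) for a list of rows, one row of k ratings per response."""
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Mean square between responses and mean square within responses.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between


def mean_scores(ratings):
    """Average rating per response, used as that response's quality score."""
    return [sum(row) / len(row) for row in ratings]


# Hypothetical ratings on the -3..+3 scale: 3 responses, 5 raters each.
ratings = [
    [2, 3, 2, 3, 2],
    [-1, 0, -1, -2, -1],
    [1, 1, 0, 2, 1],
]
reliability = icc_1k(ratings)
scores = mean_scores(ratings)
```

An ICC close to 1 means that most of the variance in ratings comes from differences between responses rather than from disagreement among raters.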
We performed three two-way ANOVA tests to examine the
influence of agent type (deep learning, IR, human agent)
and request type (emotional, informational) on the three
dimensions (Figure 2). On appropriateness ratings, the
main effects of agent type (F(2, 594) = 109.46, p < .001) and
request type (F(1, 594) = 44.86, p < .001) and the interaction
between them (F(2, 594) = 5.61, p < .01) were significant at
the 0.01 significance level. The two-way ANOVA of
empathy scores revealed significant main effects of
agent type (F(2, 594) = 52.37, p < .001) and request type (F(1,
594) = 44.21, p < .001), and a significant interaction between them
(F(2, 594) = 3.48, p < .05). Similarly, the results for
helpfulness showed significant main effects of agent and
request types (F(2, 594) = 87.86, p < .001; F(1, 594) = 25.9, p <
.001), and a significant interaction (F(2, 594) = 7.17, p < .001).
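The degrees of freedom for this design follow from the counts given in the text, assuming a fully crossed 3 x 2 design with one averaged score for each of the N = 600 rated responses:

```latex
\begin{align*}
df_{\text{agent}} &= 3 - 1 = 2 \\
df_{\text{request}} &= 2 - 1 = 1 \\
df_{\text{agent} \times \text{request}} &= (3 - 1)(2 - 1) = 2 \\
df_{\text{error}} &= N - 3 \cdot 2 = 600 - 6 = 594
\end{align*}
```

Under that assumption, every F statistic for agent type or for the interaction has 2 numerator degrees of freedom, every F statistic for request type has 1, and all share 594 denominator degrees of freedom.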
Emotional
  User: Your customer service is horrible! You don't even know how to deliver packages!
  Agent: I'm sorry to hear that. What was the delivery date provided in your order?

  User: Comcast is the worst. Period.
  Agent: Hi, please let us know if you need assistance.

  User: Can't wait to travel with you next week for the 1st time of my life with you ;)
  Agent: We can't wait to see you! We're excited to have you onboard with us soon!

Informational
  User: How do I disable the alarm system on my 2009 Toyota matrix?! Thanks in advance! stranded work in the am gotta get there.
  Agent: Hi, you may visit: (URL)

  User: Downloaded Nike training club app but won't let me in. Just shows the loading circle before the error message occurs
  Agent: We're here to help. Please dm us the device you're using and we'll get started.

  User: I'm booking a flight and I am not seeing the 25% back on my rewards. Do I have to book on PC?
  Agent: Hi, you can find out more about the pay's website here: (URL)

Table 1. Examples of user requests on social media and their
corresponding replies generated by our deep learning system.
Interestingly, there was no statistically significant
difference between deep learning and human agents on
empathy for emotional requests (t-test, p = 0.15; Figure 2b),
indicating that our system's ability to show empathy toward
users in emotional situations is similar to that of human
agents. Table 1 shows that our system recognized different
emotional situations and offered empathy accordingly.
Deep learning outperformed IR in all three aspects of
ratings (t-test, p < .01). The advantage of deep learning over
IR was more evident on emotional than informational
questions (Figure 2). However, the performance of both
deep learning and IR agents dropped significantly when
requests became informational (t-test, p < .001). Post hoc
comparisons indicated that human agents performed equally
well on different requests (e.g. t-test, p = 0.94; Figure 2c).
Another interesting observation was that, unlike IR, the deep
learning agent transferred certain writing styles from one
brand to another. For example, banking customer service
agents often adopted formal language such as “I apologize
for the poor user experience” in their responses. However,
responses generated by our system were more casual, such as
“I’m sorry you feel this way”. It is possible that a majority
of brands used informal styles on social media, and our system
learned these styles and applied them to other brands.
Automatic Evaluation
The field of natural language generation has benefited
greatly from the existence of an automatic evaluation
metric, BLEU [11], which grades an output response
according to n-gram matches to the reference (the response
from a human agent). We applied this metric to a large
testing data set including 30K user requests. Again, deep
learning performed significantly better than IR (t-test, p <
.001; see Figure 3). Moreover, we compared deep learning
and IR within each brand. In general, the BLEU scores of
deep learning were higher than the scores of IR across
brands at the 0.01 significance level.
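To show what the metric rewards, the sketch below implements a simplified sentence-level BLEU with one reference: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. BLEU as defined in [11] is computed over a whole corpus, and this version applies no smoothing, so treat it as an illustration rather than the exact scoring used in the evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "we are sorry to hear that"
exact = bleu("we are sorry to hear that", reference)
partial = bleu("we are sorry to hear that today", reference)
unrelated = bleu("thanks for reaching out", reference)
```

An exact match scores 1, a response sharing no words with the reference scores 0, and a near match falls in between.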
DISCUSSION AND FUTURE WORK
Traditional customer service often emphasizes users’
informational needs [9]; however, we found that over 40%
of user requests on Twitter are emotional and they are not
intended to seek specific information. This reveals a new
paradigm of customer service interactions. One explanation
is that, compared with calling the 1-800 number or writing
an email, social media significantly lowers the cost of
participation and allows more users to freely share their
experiences with brands. Also, sharing emotions with the
public is considered one of the main motivations for
using social media [1]. Future studies can examine how
emotional requests are associated with users’ motivation in
the context of social media.
The deep learning based system achieved performance similar
to that of human agents in handling emotional requests, which
represent a significant portion of user requests on social
media. This finding opens new possibilities for integrating
chatbots with human agents to support customer service on
social media. For example, an automated technique can be
designed to separate emotional and informational requests,
and thus emotional requests can be routed to deep learning
chatbots. The response speed can be greatly improved.
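One way such a hybrid setup might be wired together is sketched below. Everything here is hypothetical: `route_request`, `toy_classify`, and `toy_chatbot` are placeholder names, and the question-mark rule merely stands in for a real request-type classifier, which this work does not provide.

```python
def route_request(request, classify, chatbot, human_queue):
    """Answer emotional requests automatically; queue the rest for agents."""
    if classify(request) == "emotional":
        return chatbot(request)
    human_queue.append(request)
    return None


# Placeholder components, for illustration only.
def toy_classify(request):
    return "informational" if "?" in request else "emotional"


def toy_chatbot(request):
    return "I'm sorry to hear that."


queue = []
auto_reply = route_request(
    "Comcast is the worst. Period.", toy_classify, toy_chatbot, queue
)
deferred = route_request(
    "Do I have to book on PC?", toy_classify, toy_chatbot, queue
)
```

The first request receives an immediate automated reply, while the second is left in `queue` for a human agent.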
Deep learning outperformed IR in all the measures. This is
primarily because deep learning, as a statistical
approach, is much better at handling unseen data and is thus
more flexible than keyword search approaches. For
instance, given a reference reply to the request “my flight is
delayed” and one to “my order is cancelled”, a deep
learning based system is able to generalize the reply in both
scenarios and provide meaningful replies to unseen
questions such as “my flight is cancelled”, for which the
most appropriate replies can hardly be retrieved from
limited requests/topics available in the training data.
The performance of deep learning and IR systems
decreased when requests switched from emotional to
informational, especially in the case of empathy ratings.
One explanation is that users’ informational needs are more
diverse than their emotional situations. As a result, it is
more challenging to learn and apply the knowledge to
informational requests. The drop in empathy ratings is
probably due to the lack of emotional words in
informational requests. Machine learning techniques are not
able to recognize subtle emotions in these requests and
respond empathetically. Future systems could consider
additional contextual information such as users’ social
media profiles to better understand their emotional status.
We observed that the deep learning based system was able to
learn writing styles from one brand and transfer them to
another. Future work can explore this functionality in a
more supervised fashion by filtering the training data for
certain styles and specifying the target style for output
sentences. This raises new opportunities for developing
impression management tools on social media. As written
text from brands and individual users affects how they are
perceived on social media [18], such a tool can help them
create the images of themselves they wish to present.
Finally, chatbots on social media offer a new opportunity to
provide individualized attention to users at scale and
encourage interactions between users and brands, which can
not only enhance brand performance but also help users
gain social, information and economic benefits [3]. Future
studies can be designed to understand how chatbots affect
the relationship between users and brands in the long term.
Figure 3. BLEU scores of deep learning and IR systems.
REFERENCES
1. Natalya N. Bazarova, Yoon Hyung Choi, Victoria
Schwanda Sosik, Dan Cosley, and Janis Whitlock.
Social Sharing of Emotions on Facebook: Channel
Differences, Satisfaction, and Replies. In Proc. of
CSCW, 2015, 154-164.
2. Joan-Isaac Biel, Oya Aran, and Daniel Gatica-Perez.
You Are Known by How You Vlog: Personality
Impressions and Nonverbal Behavior in YouTube. In
Proc. of ICWSM, 2011, 446-449.
3. Keith S Coulter, Johanna Gummerus, Veronica
Liljander, Emil Weman, and Minna Pihlström.
Customer Engagement in a Facebook Brand
Community. Management Research Review, 2012, 35,
9, 857-877.
4. Ethan Fast, Binbin Chen, and Michael S. Bernstein.
Empath: Understanding Topic Signals in Large-Scale
Text. In Proc. of CHI, 2016, 4647-4657.
5. F. Maxwell Harper, Daniel Moy, and Joseph A.
Konstan. Facts or Friends?: Distinguishing
Informational and Conversational Questions in Social
Q&A Sites. In Proc. of CHI, 2009, 759-768.
6. F. Maxwell Harper, Daphne Raban, Sheizaf Rafaeli,
and Joseph A. Konstan. Predictors of Answer Quality
in Online Q&A Sites. In Proc. of CHI, 2008, 865-874.
7. Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias
Kaufmann, Andrew Tomkins, Balint Miklos, Greg
Corrado, Laszlo Lukacs, Marina Ganea, Peter Young,
and Vivek Ramavajjala. Smart Reply: Automated
Response Suggestion for Email. In Proc. of KDD,
2016, 955-964.
8. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S
Corrado, and Jeff Dean. Distributed Representations of
Words and Phrases and Their Compositionality. In
Proc. of NIPS, 2013, 3111-3119.
9. Keith B Murray. A Test of Services Marketing Theory:
Consumer Information Acquisition Activities. The
Journal of Marketing, 1991, 10-25.
10. NIELSEN. State of the Media: Social Media Report.
2011.
11. Kishore Papineni, Salim Roukos, Todd Ward, and
Wei-Jing Zhu. Bleu: A Method for Automatic
Evaluation of Machine Translation. In Proc. of ACL,
2002, 311-318.
12. James W Pennebaker, Martha E Francis, and Roger J
Booth. Linguistic Inquiry and Word Count: LIWC 2001.
Mahwah: Lawrence Erlbaum Associates, 71, 2001.
13. Tiziano Piccardi, Gregorio Convertino, Massimo
Zancanaro, Ji Wang, and Cedric Archambeau. Towards
Crowd-Based Customer Service: A Mixed-Initiative
Tool for Managing Q&A Sites. In Proc. of CHI, 2014,
2725-2734.
14. Leyland F Pitt, Richard T Watson, and C Bruce Kavan.
Service Quality: A Measure of Information Systems
Effectiveness. Management Information Systems
Quarterly, 1995, 173-187.
15. Alan Ritter, Colin Cherry, and William B Dolan. Data-
Driven Response Generation in Social Media. In Proc.
of EMNLP, 2011, 583-593.
16. Anselm L Strauss. Qualitative Analysis for Social
Scientists. Cambridge University Press, 1987.
17. Ilya Sutskever, Oriol Vinyals, and Quoc V Le.
Sequence to Sequence Learning with Neural Networks.
In Proc. of NIPS, 2014, 3104-3112.
18. Anbang Xu, Haibin Liu, Liang Gou, Rama Akkiraju,
Jalal Mahmud, Vibha Sinha, Yuheng Hu, and Mu
Qiao. Predicting Perceived Brand Personality with
Social Media. In Proc. of ICWSM, 2016, 436-445.
19. http://blog.hubspot.com/marketing/twitter-response-
time-data.
20. http://lucene.apache.org.
... As such, they lack the nuanced communication afforded in face-to-face interactions (J€ orling et al., 2019). It is also worth noting that Xu et al. (2017) found that more than 40% of requests made to chatbots were not informational but, rather, users expressing an emotional state, such as positive or negative feelings for the company. ...
Article
Full-text available
Purpose This paper aims to contribute to the discussion on integrating humans and technology in customer service within the framework of Society 5.0, which emphasizes the growing role of artificial intelligence (AI). It examines how effectively new generative AI-based chatbots can handle customer emotions and explores their impact on determining the point at which a customer–machine interaction should be transferred to a human agent to prevent customer disengagement, referred to as the Switch Point (SP). Design/methodology/approach To evaluate the capabilities of new generative AI-based chatbots in managing emotions, ChatGPT-3.5, Gemini and Copilot are tested using the Trait Emotional Intelligence Questionnaire Short-Form (TEIQue-SF). A reference framework is developed to illustrate the shift in the Switch Point (SP). Findings Using the four-intelligence framework (mechanical, analytical, intuitive and empathetic), this study demonstrates that, despite advancements in AI’s ability to address emotions in customer service, even the most advanced chatbots—such as ChatGPT, Gemini and Copilot—still fall short of replicating the empathetic capabilities of human intelligence (HI). The concept of artificial emotional awareness (AEA) is introduced to characterize the intuitive intelligence of new generative AI chatbots in understanding customer emotions and triggering the SP. A complementary rather than replacement perspective of HI and AI is proposed, highlighting the impact of generative AI on the SP. Research limitations/implications This study is exploratory in nature and requires further theoretical development and empirical validation. Practical implications The study has only an exploratory character with respect to the possible real impact of the introduction of the new generative AI-based chatbots on collaborative approaches to the integration of humans and technology in Society 5.0. 
Originality/value Customer Relationship Management managers can use the proposed framework as a guide to adopt a dynamic approach to HI–AI collaboration in AI-driven customer service.
... CAs are relatively successful in task-oriented interactions [267,268], the initial promise of building CAs that can carry out natural and coherent conversations with users has largely remained unfulfilled due to both design and technical challenges [269,270,271]. This "gulf" between user expectation and experience with CAs [272] has led to constant user frustration, frequent conversation breakdowns, and eventual abandonment of CAs [272,273,271]. ...
Thesis
Full-text available
AI systems are being equipped with human-like social capabilities while serving different social roles as our assistants and partners. Some recent AI systems are said to have Theory of Mind (ToM)-like capability that advances their social adeptness. ToM is a basic social and cognitive human capability of attributing mental states such as beliefs, emotions, knowledge, plans, and goals to oneself and others based on behavioral or verbal cues. As these AI systems exhibit such advanced social capability, humans are increasingly uncertain about how they should perceive such AI systems’ social roles and capabilities. Thus, managing and accounting for human perceptions of AI systems performing at various social capacities becomes crucial in improving user experience and mitigating harms in human-AI communications. Inspired by people’s usage of their ToM capability in human-human communication to constantly recognize, monitor, and respond to others’ perceptions of them, this thesis posits the Mutual Theory of Mind (MToM) framework to enhance human-AI communication. The MToM framework aims to guide the design of human-AI communication by breaking down this iterative communication process into three analyzable stages: (1) ToM construction: AI’s construction of human’s interpretation of the AI, (2) ToM recognition: human’s recognition of AI’s interpretation of the human, and (3) ToM revision: AI’s revision of its interpretation of the human. Each MToM stage represents a ToM process of one party’s communication feedback shaping the other’s interpretation of how they are perceived by others. Following the MToM framework, this thesis reports on a series of empirical studies that provide design implications for AI systems that can account for human perceptions of AI during communications. 
These studies were conducted in the context of AI-mediated social interaction in large-scale learning environments, where AI systems are already leveraging their ToM-like capability to provide personalized social recommendations to socially isolated adult learners based on information inferred from their digital footprints. This thesis begins by qualitatively examining the design requirements of AI’s social roles and capabilities in AI-mediated social interaction that can cater to adult learners’ current practices, challenges, and preferences in remote social interactions. The rest of the thesis empirically explores students’ perceptions of AI in AI-mediated social interaction at each stage of the MToM framework. At the ToM construction stage, I conducted a longitudinal survey study that established the feasibility for AI agents to construct students’ evolving perceptions of the AI by leveraging social cues embedded in students’ utterances to the AI. At the ToM recognition stage, I conducted a mixed-methods study and found that students continuously acquire knowledge from AI’s (mis)interpretations of them to shape their perceptions of AI, which can be inaccurate and harmful. At the ToM revision stage, I conducted a mixed-factorial vignette experiment and found that AI’s revision and communication of its misinterpretations can effectively mitigate students’ negative perceptions of AI after encountering AI misinterpretations. Overall, this dissertation makes theoretical, design, and empirical contributions to the fields of human-AI interaction, computer-supported cooperative work, and responsible AI. This work provides theoretical guidance, rich empirical descriptions, and actionable design implications for the next generation of AI systems that can continuously construct, recognize, and respond to human perceptions of AI in human-AI communication.
... Chatbot-based customer support services have become increasingly prevalent across industries [4,14,15]. Companies adopt these automated services to reduce operational costs and improve efficiency. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. ...
Preprint
Chatbot-based customer support services have significantly advanced with the introduction of large language models (LLMs), enabling enhanced response quality and broader application across industries. However, while these advancements focus on reducing business costs and improving customer satisfaction, limited attention has been given to the experiences of customer service agents, who are critical to the service ecosystem. A major challenge faced by agents is the stress caused by unnecessary emotional exhaustion from harmful texts, which not only impairs their efficiency but also negatively affects customer satisfaction and business outcomes. In this work, we propose an LLM-powered system designed to enhance the working conditions of customer service agents by addressing emotionally intensive communications. Our proposed system leverages LLMs to transform the tone of customer messages, preserving actionable content while mitigating the emotional impact on human agents. Furthermore, the application is implemented as a Chrome extension, making it highly adaptable and easy to integrate into existing systems. Our method aims to enhance the overall service experience for businesses, customers, and agents.
... With the success of deep learning architectures in generating synthetic data on images and text, state-of-the-art architectures like GANs were extended to synthesize tabular data. From [5] and [6], it is evident that robust real-world application models need large-scale training data to yield superior performance. But large-scale data is not often readily available. ...
Preprint
The present study aimed to address the issue of imbalanced data in classification tasks and evaluated the suitability of SMOTE, ADASYN, and GAN techniques in generating synthetic data to address the class imbalance and improve the performance of classification models in low-resource settings. The study employed the Generalised Linear Model (GLM) algorithm for class balancing experiments and the Random Forest (RF) algorithm for low-resource setting experiments to assess model performance under varying training data. Recall was the primary evaluation metric for all classification models. The results of the class balancing experiments showed that the GLM model trained on GAN-balanced data achieved the highest recall value. Similarly, in low-resource experiments, models trained on data enhanced with GAN-synthesized data exhibited better recall values than models trained on the original data alone. These findings demonstrate the potential of GAN-generated synthetic data for addressing the challenge of imbalanced data in classification tasks and improving model performance in low-resource settings.
Article
Purpose: This study investigates the impact of emoji use and user personality traits (conscientiousness vs extraversion) on user behavior in the context of academic advising. It uniquely considers the interaction between these chatbot characteristics and human users' dominant personality traits (conscientiousness and extraversion). Design/methodology/approach: A mixed-factor design experiment involving 153 university students was employed. Participants interacted with four different chatbot conditions: a conscientious bot and an extroverted bot, each with and without emojis. Findings: The inclusion of emojis negatively influenced users' intentions to use the chatbots but did not affect trust, perceived authenticity or intended engagement with the bots. Additionally, the students' personality traits played a role in evaluating the different chatbot types. Originality/value: This research introduces a novel approach by integrating emoji use and human personality traits into chatbot communication, focusing on academic advising. It examines the interaction effects of emojis and personality traits (conscientiousness and extraversion) on user behavior, also considering the user’s personality traits. This work enriches the human-computer interaction field and guides future chatbot development.
Article
OpenAI, a name associated with the latest breakthroughs in artificial intelligence (AI) research, has captured the imagination since its announcement at the end of 2015 [1]. This non-profit research organization, unlike its for-profit rivals, has an ambitious vision: to ensure that Artificial General Intelligence (AGI), the highest form of AI development where machines can outperform humans in a variety of applications, works for humanity at large. This paper discusses how to turn design papers into a knowledge base via Generative AI (Gen-AI). Using LLMs and a strong search engine like OpenSearch, we explore how to create a robust chatbot that can answer questions and provide insight directly from the design document. We walk through all the key components, from data preparation and indexing to model selection and integration, and show how to mine valuable data from design files, preprocess them for optimal LLM performance, and provide a slick search solution with OpenSearch. By the end of this article, readers should know enough to create their own intelligent chatbot that helps teams effectively access and make sense of important design information.
Conference Paper
Brand personality has been shown to affect a variety of user behaviors such as individual preferences and social interactions. Despite intensive research efforts in human personality assessment, little is known about brand personality and its relationship with social media. Leveraging the theory in marketing, we analyze how brand personality associates with its contributing factors embodied in social media. Based on the analysis of over 10K survey responses and a large corpus of social media data from 219 brands, we quantify the relative importance of factors driving brand personality. The brand personality model developed with social media data achieves predicted R² values as high as 0.67. We conclude by illustrating how modeling brand personality can help users find brands suiting their personal characteristics and help companies manage brand perceptions.
Article
Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.
Conference Paper
People often share emotions with others in order to manage their emotional experiences. We investigate how social media properties such as visibility and directedness affect how people share emotions on Facebook, and their satisfaction after doing so. 141 participants rated 1,628 of their own recent status updates, posts they made on others' timelines, and private messages they sent for intensity, valence, personal relevance, and overall satisfaction felt after sharing each message. For network-visible channels (status updates and posts on others' timelines), they also rated their satisfaction with replies they received. People shared differently between channels, with more intense and negative emotions in private messages. People felt more satisfied after sharing more positive emotions in all channels and after sharing more personally relevant emotions in network-visible channels. Finally, people's overall satisfaction after sharing emotions in network-visible channels is strongly tied to their reply satisfaction. Quality of replies, not just quantity, matters, suggesting the need for designs that help people receive valuable responses to their shared emotions.
Conference Paper
In this paper, we propose a mixed-initiative approach to integrate a Q&A site based on a crowd of volunteers with a standard operator-based help desk, ensuring quality of customer service. Q&A sites have emerged as an efficient way to address questions in various domains by leveraging crowd knowledge. However, they lack sufficient reliability to be the sole basis of customer service applications. We built a proof-of-concept mixed-initiative tool that helps a crowd-manager to decide if a question will get a satisfactory and timely answer by the crowd or if it should be redirected to a dedicated operator. A user experiment found that our tool reduced the participants’ cognitive load and improved their performance, in terms of their precision and recall. In particular, those with higher performance benefited more than those with lower performance.
Article
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Conference Paper
In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning. We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data.
Article
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Article
The teaching of qualitative analysis in the social sciences is rarely undertaken in a structured way. This handbook is designed to remedy that and to present students and researchers with a systematic method for interpreting qualitative data, whether derived from interviews, field notes, or documentary materials. The special emphasis of the book is on how to develop theory through qualitative analysis. The reader is provided with the tools for doing qualitative analysis, such as codes, memos, memo sequences, theoretical sampling and comparative analysis, and diagrams, all of which are abundantly illustrated by actual examples drawn from the author's own varied qualitative research and research consultations, as well as from his research seminars. Many of the procedural discussions are concluded with rules of thumb that can usefully guide the researchers' analytic operations. The difficulties that beginners encounter when doing qualitative analysis and the kinds of persistent questions they raise are also discussed, as is the problem of how to integrate analyses. In addition, there is a chapter on the teaching of qualitative analysis and the giving of useful advice during research consultations, and there is a discussion of the preparation of material for publication. The book has been written not only for sociologists but for all researchers in the social sciences and in such fields as education, public health, nursing, and administration who employ qualitative methods in their work.