DigiMo - towards developing an emotional intelligent
chatbot in Singapore
Andreea I. Niculescu
Institute for Infocomm Research
Singapore
andreea-n@i2r.a-star.edu.sg
Ivan Kukanov
Institute for Infocomm Research
Singapore
ivan_kukanov@i2r.a-star.edu.sg
Bimlesh Wadhwa
National University of Singapore
Singapore
bimlesh@nus.edu.sg
ABSTRACT
The paper is a work-in-progress report on the development of
DigiMo, a chatbot with emotional intelligence. The chatbot
development is based on the collection and annotation of real
dialogues between local Singaporeans expressing genuine
emotions. The models were trained with Cakechat, an open-source
sequence-to-sequence deep neural network framework. Perplexity
measurements from automatic testing, as well as feedback from six
expert evaluators, confirmed that the chatbot's answers have high
accuracy. Future research directions and developments are briefly
discussed.
Author Keywords
Natural language interaction; deep
learning; data annotation; emotion; chatbot; expert
evaluation.
CCS Concepts
Human-centered computing~Human computer
interaction (HCI); Natural language interfaces;
INTRODUCTION
Chatbots are conversational software agents that use natural
language to interact with human users. Chatbots have been
developed since the 1960s - ELIZA, the chatbot psychologist,
being one famous example [14] - but only recently has there
been growing interest in this particular technology across
many industry sectors. It is estimated that around 80% of all
businesses globally would like to use chatbots by 2020 [2].
The interest is motivated by increased customer demand for
services accessible over messaging platforms. Studies show
that customers prefer to contact service providers over
instant messages rather than over phone or email [4].
Additionally, bots have 24/7 availability and are efficient in
handling repetitive tasks, thereby cutting down significant
costs for companies. As a result, chatbots are an appealing
asset for most organizations.
Another desirable feature the industry wants chatbots to
have is emotional intelligence. Emotional intelligence goes
beyond informational or transactional tasks and enables
chatbots to be successfully deployed as customer service
assistants. Such chatbots would “understand” users’
feelings and respond accordingly. An example of a chatbot that
uses emotion detection and reacts empathetically is Replika
[11]. Replika learns a pattern of behavior from the user over
time aiming to become a virtual second-self. Replika offers
only emotional support and does not have any other task-
oriented functionality.
Following the global trend, many chatbots were developed
and are currently used in Singapore, for example JIM, the
DBS virtual bank recruiter [5]; AskJamie, e-governance
virtual assistant [9]; Kris, Singapore Airlines’ chatbot for
flights and travel queries [13]; the Bus Uncle, the joke-
loving bus schedule assistant for commuters [1]; SARA, the
virtual assistant for tourists [10]. While being informative,
helpful and even witty, these applications lack emotion
embedding and empathic reaction when interacting with
users.
When chatbots mimic humans, they can effectively provide
emotional support. Emotionally intelligent chatbots could
help in addressing sensitive issues, e.g. enabling
humans to anonymously report improper behaviour
without conversing with a human. Among the many challenges
in designing an emotionally intelligent chatbot, perhaps the
hardest is to model the emotion across the conversation for
effective, courteous response generation.
In this study, we present our work-in-progress in
developing an emotionally intelligent chatbot for Singapore
users that would eventually combine emotional
“intelligence” with task-oriented capabilities. The chatbot
development is based on the data of real dialogues between
local Singaporeans expressing genuine emotions. It is part
of a larger project on integrative approaches to emotion
recognition from multi-modal cues called “Digital
Emotions” [6]. In our next research stage, we plan to
answer specific questions such as: (i) how can the collected
and annotated data be best incorporated into our training
model to maximize our chatbot’s emotional ‘skills’? and (ii)
how can we ascertain the effect of the local data used for
training on the overall satisfaction of Singaporean users?
DATA COLLECTION
While standard information requests and transactional tasks
may show very similar patterns across businesses around
the world, emotion expression is intrinsically related to a
person’s culture and personality. Therefore, it is
fundamental to understand how local people express
emotions when they chat with each other.
Research in this area usually deploys open data collections
based on movie subtitles [12] or Twitter corpora [8]. The
advantage is that the vast amount of data can generate solid
models. On the other hand, the content of such data might not
be optimal for the development of an emotionally intelligent
chatbot: firstly, movie subtitles are transcribed spoken
interaction with enacted, i.e. not real, emotions. Secondly,
transcribed spoken interactions are significantly different
from chat interactions. Thirdly, Twitter data are public
messages to which people react with comments: even
though it happens in real time, the interaction is rather
asynchronous. Therefore, in our study we opted for a
different approach: we collected conversations in English
exchanged between local Singaporeans, and we used them
to adjust models pre-trained on Twitter data for our chatbot.
For the data collection purpose, experts were engaged to
lead dialogue conversations on 3 pre-selected topics with
potential emotional load: customer experiences, weight
management & nutrition, and events with psychological
impact. Participants were encouraged to talk about their
own experiences during a chat session that lasted 30 min.
A total of 60 participants interacted over a chat platform
with our experts - a customer representative, a psychologist,
and a nutritionist. In total, 60 dialogues comprising 7027 turns were
collected. The sub-topics covered in the dialogues included
experiences in retail and restaurant services, issues
concerning home renovation, work, unemployment,
personal relationships and family matters (e.g. death of a
relative, getting married and going through break-ups),
education (e.g. studies, course training, school), hobbies,
army, health issues (e.g. lack of sleep, depression, illness,
weight management, nutrition, etc.), holiday and experience
abroad. The data was collected and annotated over a period
of three months. An example extracted from our data
collection is given below:
“09:41 Participant: they just assumed I couldn’t afford it
09:41 Expert: Omg that is bad
09:41 Expert: and you walked away?
09:42 Participant: Yup
09:42 Participant: i did something childish and totally immature after that
09:42 Expert: what was it?
09:42 Participant went to withdrew $1000 and went back in and showed
them that I COULD HAVE BOUGHT IT IF I WANTED…”
ANNOTATION SCHEME
Humans can express emotions over single or mixed
channels. These channels make use of vocal cues, words or
facial expressions. Emotions expressed through mixed
channels (e.g. vocal cues & words or vocal cues & facial
expressions) are easier to interpret, as they are less
ambiguous. Single-channel emotions, on the other hand, such
as words in written communication, can be more
challenging to decode.
To help our annotators uncover emotions in our data, we
developed an annotation scheme based on Ekman’s six
basic emotions model [7]: anger, disgust, fear, happiness,
sadness, and surprise. The scheme was enhanced with
additional values that basic emotions could embrace (see
Figure 1). Its role was to help annotators identify the correct
emotion expressed in the dialogues. Three intensity levels
for emotion expression were defined: low (1),
middle (2) and high (3).
Figure 1 Emotion Annotation Scheme
We also defined the expression mode, i.e. whether the
emotion was expressed empathetically. The scheme was
developed in an iterative process based on the first dialogue
samples from the newly collected data. An elaborated
guideline with numerous annotation examples was handed
to the annotators at the end of the data collection. Table 1
shows examples extracted from our guideline for the
annotation of ‘surprise’.
Table 1 Examples of annotations for ‘surprise’
ANNOTATION RESULTS
2 annotators performed annotations. The inter-agreement
reliability calculations were performed using Krippendorf
alpha measurements on 10% of the entire data. The results
indicate a high inter-agreement rate of 0.817%.
The data collection enabled us to tag about 801 emotions
with most frequent emotions being angry (29%) and happy
(28%) as shown in Figure 2.
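For illustration, such an agreement score can be computed with the open-source krippendorff Python package; the snippet below is a sketch with hypothetical label values, not the authors' actual tooling.

import krippendorff  # pip install krippendorff

# One row per annotator, one column per annotated turn from the 10% sample;
# emotion labels are mapped to nominal integer codes (values are hypothetical).
EMOTIONS = {"anger": 0, "disgust": 1, "fear": 2, "happiness": 3,
            "sadness": 4, "surprise": 5, "neutral": 6}
annotator_1 = ["anger", "happiness", "neutral", "surprise", "anger"]
annotator_2 = ["anger", "happiness", "neutral", "sadness", "anger"]
reliability_data = [[EMOTIONS[label] for label in row]
                    for row in (annotator_1, annotator_2)]

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")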
<SURPRISE ="1" VALUE=NEG: oh, I didn't know that > - the surprise is signalled by "oh". The intensity value is 1 since there are no other markers of emphasis (no adjectives, no uppercase letters, no exclamation mark).
<SURPRISE ="2" VALUE=POS MODE=EMPHATIC: wow great news > - here the intensity is 2 because of "wow" and the adjective "great"; however, it is expressed empathically, so the intensity is below 3.
<SURPRISE="3" VALUE=POS> OMG, Really?? </SURPRISE>
<SURPRISE="3" VALUE=NEG> He died?? OMG what a shock! </SURPRISE> - uppercase letters and double question or exclamation marks indicate a higher level of emotion, in this case "3".
A brief examination of the emotions expressed by our
Singaporean participants revealed several interesting
observations:
Happiness is often expressed using the verb “to feel” in
combination with superlative positive adjectives: “I am
feeling really happy”, “I feel a sense of satisfaction”, “very
excited / proud / good /glad”, “absolutely amazing/ damn
siok”, “excellent”, “magic” and “wonderful”.
Other verbs often used in sentences annotated as “happy”
are “to like” and “to love”. Happy expressions are
sometimes combined with transliteration of laughs. Upper
case letters and exclamation marks are used to emphasize
the expression of happiness.
Anger is expressed similarly to happiness but with the
opposite polarity: verbs + superlative negative adjectives,
with upper case letters & exclamation marks for emphasis.
Interestingly, anger seems to deploy a larger variation of
negative verbs such as “to refuse”, “to fail”, “to anger”, “to
upset”, “to insult”, “to annoy”, and “to feel” + negative adjectives
in superlative form: “really disappointed”, “so angry”,
“very rude”, and also “bad”, “unfair”, “really sucked”,
“spoiled mood”. Apart from larger linguistic diversity,
statements expressing anger tend to be longer and more
descriptive than their “happy” counterparts, sometimes
spreading over more than one dialogue turn.
Figure 2 Emotion distribution frequencies
Sadness is often expressed in conjunction with the verb “to
feel” + adjective (“sad”, “lonely”, “much worse”, “guilty”,
“uncomfortable”, “regretful”) or using the passive
construction: “I was affected”. Often descriptions of events
are presented as “It was” + “depressing”, “tough”, “awful”,
“a pity”. Feelings of despair are also described using
expressions like “to cry” / “I was in tears” or through
rhetorical questions and observations: “Why am I here?”, “I
have no meaning”. Sad emotions are sometimes
accompanied by a sad smiley.
Surprise is mostly marked through interjections such as
“oh”, “ah”, “wow” and “OMG”, followed by a statement of
appreciation or disapproval. Surprise statements may also
mention the cause of surprise: “I didn’t know” or “I thought
[…]” in situations where the exact opposite was believed to
be true. Surprise statements often end with one or more
exclamation or question marks. Surprise can also be
expressed using phrases like “I was surprised / puzzled /
shocked”, “better than I expected” or “surprisingly”.
Disgust is more difficult to spot, as there are no distinctive
linguistic patterns that would separate it from anger.
Disgust often expresses sarcasm: “Only Asian love to ask
such questions!” It appears in statements containing
criticisms of observed behaviors or events that are highly
disliked or held in contempt. However, these events do not
harm the observer directly. Negative adjectives are used,
such as “bad urine smell”, “pushy”, “unfriendly”, “poor”,
“selfish”, “infamous airline”. Only the context can
determine whether it is an expression of anger or disgust.
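These surface cues (emphatic modifiers, uppercase words, repeated punctuation) lend themselves to a simple intensity heuristic; the function below is a hypothetical sketch based on the patterns described above and in the annotation guideline, not part of the DigiMo pipeline.

import re

EMPHASIS = re.compile(r"\b(wow|omg|really|so|very|absolutely|damn|totally)\b", re.I)

def intensity(text: str) -> int:
    """Heuristic intensity (1 = low, 2 = middle, 3 = high) from surface cues:
    emphatic markers, uppercase words and repeated punctuation raise the level."""
    score = 1
    if EMPHASIS.search(text):
        score += 1
    if re.search(r"\b[A-Z]{2,}\b", text):          # uppercase words, e.g. "OMG"
        score += 1
    if "!" in text or re.search(r"\?{2,}", text):  # "!", "!!", "??"
        score += 1
    return min(score, 3)

print(intensity("oh, I didn't know that"))        # 1
print(intensity("wow great news"))                # 2
print(intensity("He died?? OMG what a shock!"))   # 3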
ARCHITECTURE
To train DigiMo - our chatbot - we used Cakechat [3], an
open source project. Cakechat implements a sequence-to-
sequence dialogue model (HRED) and provides pre-trained
models. These models were trained on 11 GB of carefully
pre-processed Twitter data.
We fed our own data collection to Cakechat, using a
dictionary of 50k words. Statements with no emotional
expression were labeled as ‘neutral’. The learning rate was
set at 0.01. A maximum of 30 tokens is allowed per
encoded/decoded sequence; as such, we trimmed and readjusted
the turns to fit this condition. Both encoder and decoder
contain 2 GRU layers with 512 hidden units each.
As this study is a work in progress, we have not yet
implemented an emotion classifier for the user input.
However, we were able to test the chatbot answers by
manually adding the emotion classification to each user
input and testing the system over the terminal, e.g. user_input:
“I am really scared of this exam” [FEAR]. The architecture
of DigiMo is depicted in Figure 3.
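For illustration, the configuration described above (50k-word vocabulary, 2 GRU layers with 512 hidden units in both encoder and decoder, 30-token sequences, responses conditioned on an emotion label) can be sketched as a simplified, non-hierarchical emotion-conditioned sequence-to-sequence model in PyTorch; this is an assumption-laden sketch, not the actual Cakechat/HRED implementation, and the emotion label set and index are hypothetical.

import torch
import torch.nn as nn

VOCAB_SIZE = 50_000   # dictionary of 50k words
HIDDEN = 512          # hidden units per GRU layer
LAYERS = 2            # GRU layers in encoder and decoder
EMOTIONS = 7          # Ekman's six emotions + 'neutral' (assumed label set)
MAX_TOKENS = 30       # maximum encoded/decoded sequence length

class EmotionSeq2Seq(nn.Module):
    """Simplified emotion-conditioned encoder-decoder (not the HRED used by Cakechat)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.emo_embed = nn.Embedding(EMOTIONS, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, LAYERS, batch_first=True)
        # decoder input = word embedding concatenated with the emotion embedding
        self.decoder = nn.GRU(2 * HIDDEN, HIDDEN, LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, src, tgt, emotion):
        _, h = self.encoder(self.embed(src))        # encode the user turn
        emo = self.emo_embed(emotion).unsqueeze(1)  # (batch, 1, HIDDEN)
        emo = emo.expand(-1, tgt.size(1), -1)
        dec_in = torch.cat([self.embed(tgt), emo], dim=-1)
        dec_out, _ = self.decoder(dec_in, h)        # condition on encoder state
        return self.out(dec_out)                    # next-token logits

# e.g. one turn truncated to 30 tokens, labelled with a hypothetical FEAR index
model = EmotionSeq2Seq()
src = torch.randint(0, VOCAB_SIZE, (1, MAX_TOKENS))
tgt = torch.randint(0, VOCAB_SIZE, (1, MAX_TOKENS))
logits = model(src, tgt, emotion=torch.tensor([3]))
print(logits.shape)   # torch.Size([1, 30, 50000])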
AUTOMATIC EVALUATION
We tested our model experimenting with different numbers of
epochs and batch sizes. During training, we monitored context-
sensitive and context-free perplexity values2. The best results,
with context-free and context-sensitive perplexity both at a low
value of 29, were achieved with a system configuration using a
batch size of 128, 15 epochs and a context size equal to 3.
2 Perplexity is a measure that shows how well a probability model predicts
test data; in this case, it shows how good the language model is. A lower
perplexity means a better model.
Figure 3 DigiMo architecture
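As a reference point for the value reported above, perplexity can be obtained from the average negative log-likelihood per token; the snippet below is a minimal illustrative sketch, not the Cakechat evaluation code.

import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns each token a probability of about 1/29 on average
# has a perplexity of about 29.
print(perplexity([math.log(1 / 29)] * 100))   # ~29.0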
EXPERT EVALUATION
To test how a perplexity of 29 translates into human
judgment, we performed a test on a subset of emotions. We
chose ‘happy’ (positive), ‘angry’ (negative) and ‘no
emotion’ (neutral statements); we also chose ‘disgust’
(negative) as an emotion most difficult to detect and
correctly annotate in our corpus. We manually generated
12x4 user input sentences that would be neutral or express
emotions such as happy, angry and disgust. Then, we
programmed DigiMo to respond automatically to each of
these sentences using the same repertoire of emotions in an
experimental 4x4 matrix setting (see Table 2). In this way,
a total number of 192 (48x4) questions & answers were
generated. Further, six experts - three linguists and three
chatbot developers - were asked to classify DigiMo’s
replies as suitable, neutral (i.e. suitable in certain contexts
only) or unsuitable. To calculate DigiMo’s answer accuracy
we used (Nsuitable + Nneutral) / Ntotal * 100.
From a total of 1152 evaluations (192x6), 665 were rated
suitable, 236 neutral and 251 unsuitable; thus the total
accuracy on our sample data is 78.20%.
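Plugging the reported counts into the accuracy formula above reproduces this figure:

Accuracy = (Nsuitable + Nneutral) / Ntotal * 100 = (665 + 236) / 1152 * 100 ≈ 78.2%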
Table 2 Expert evaluation of answer accuracy (rows: user emotion; columns: DigiMo response emotion)

User      Neutral   Happy    Angry    Avg.
Neutral   90.27%    88.88%   61.11%   77.07%
Happy     75%       95.83%   52.77%   73.95%
Disgust   83.33%    69.44%   79.16%   77.77%
Angry     81.94%    88.88%   75%      84.02%
Avg.      82.63%    85.75%   67.01%   78.20%
The highest average values were achieved for chatbot answers
expressing happiness (85.75%), with the highest individual
value achieved for combinations matching happy emotions for
both user and DigiMo (95.83%).
An interesting case is the combination of user_angry and
chatbot_happy (88.88%). This combination, despite being a
mismatch, achieved many accurate responses. A closer look
revealed the fact that this combination creates sarcastic but
at the same time suitable responses, e.g. (User: “my phone
is out of battery”, DigiMo: “awesome”). Lowest average
values were achieved for chatbot answers loaded with angry
emotions (67.01%), the lowest value being achieved by the
combination: user_happy & chatbot_angry (52.77%).
DISCUSSION
Even though our chatbot is still under development, it
shows very promising results. It responds in an emotionally
appropriate way to our input thanks to our data collection and
careful annotation. Our next target is to develop and test an
emotion classifier and incorporate the models into a task-
oriented chatbot deployed in customer service.
Unlike sentiment analysis, emotion detection focuses on
detecting several emotion categories going beyond the
binary negative/positive classification. This classification
complexity makes emotion detection a more challenging
task. Secondly, relying on text only, i.e. in the absence of
speech, which is the typical case for text-based chatbots,
makes emotion detection much harder. Intensity varies
according to user personality, chatting habits and
culture. In Singapore, we would expect people to express
emotions rather moderately, which is typical for Asian
cultures, known for their tendency towards introverted
patterns of expression.
As mentioned earlier, in the long term we will investigate the
following research questions:
(i) How can the collected and annotated data be best
incorporated into our training model to maximize our
chatbot’s emotional ‘skills’? At the moment, we took into
account only emotion-relevant dialogues; however,
context could play an additional important role. We aim to
investigate how much context would be required and how
to efficiently incorporate it into our training model.
(ii) How can we evaluate the effect of the local data collected
and used for training on the overall satisfaction of
Singaporean users? In other words, is it worth collecting
and modeling a chatbot using local data or would any type
of data do a similar job? These questions will be answered
in our future research.
ACKNOWLEDGMENTS
We thank our experts for participating in our study. This
research was supported by the SERC Strategic Fund from the
Science & Engineering Research Council (SERC), A*STAR
(project no. a1718g0046).
REFERENCES
[1] Bus Uncle. Retrieved January 5th 2020 from
https://www.busuncle.sg/
[2] Business Insider. Retrieved January 5th 2020 from
https://www.businessinsider.com/80-of-businesses-want-
chatbots-by-2020-2016-12?IR=T
[3] Cakechat Github. Retrieved January 5th 2020 from
https://github.com/lukalabs/cakechat#network-architecture-
and-features
[4] Chatbots Magazine. Retrieved January 5th, 2020 from
https://chatbotsmagazine.com/the-role-of-emotional-
intelligence-in-ai-1e078ac0e328
DBS website. Retrieved January 5th, 2020 from
https://www.dbs.com/newsroom/DBS_introduces_Jim_South
east_Asias_first_virtual_bank_recruiter
[6] Digital Emotions. Retrieved January 5th 2020 from
http://projectdigitalemotion.net/
[7] Paul Ekman. 1999. Basic Emotions. In: Handbook of
cognition and emotions. T. Dalgleish & M.J. Power (eds.),
Wiley, US, 45-60
[8] Boris Galitsky. 2019. Developing Enterprise Chatbots:
Learning Linguistic Structures. Springer.
[9] GovTech Singapore. Retrieved January 5th 2020 from
https://www.tech.gov.sg/products-and-services/ask-jamie/
[10] A.I. Niculescu, K.H. Yeo, L.F. D’Haro, S. Kim, R. Jiang,
R.E. Banchs. 2014. Design and evaluation of a conversational
agent for the touristic domain. In: Proc. of APSIPA
[11] Replika. Retrieved January 5th, 2020 from https://replika.ai/
[12] C. Segura, A. Palau, J. Luque, M. R. Costa-Jussà, R.E.
Banchs. 2019. Chatbol, a Chatbot for the Spanish “La Liga”.
In: D'Haro L., Banchs R., Li H. (eds.) 9th Int. Workshop on
Spoken Dialogue System Technology. LNEE, vol. 579.
Springer, Singapore. 319-330
[13] SilverKris. Retrieved January 5th 2020 from
https://www.silverkris.com/meet-kris-the-new-beta-chatbot-
for-singapore-airlines/
[14] Joseph Weizenbaum. 1966. ELIZA - a computer program for the
study of natural language communication between man and
machine. Communications of the ACM, vol. 9, no. 1: 36-45