Learning from Interaction: An Intelligent Networked-
based Human-bot and Bot-bot Chatbot System
Jordan J. Bird, Anikó Ekárt and Diego R. Faria
Aston Lab for Intelligent Collectives Engineering (ALICE)
School of Engineering and Applied Science
Aston University, Birmingham, B4 7ET, UK.
{birdj1, a.ekart, d.faria}@aston.ac.uk
Abstract. In this paper we propose an approach to a chatbot software that is able
to learn from interaction via text messaging between human-bot and bot-bot. The
bot listens to a user and decides whether or not it knows how to reply to the
message accurately based on current knowledge, otherwise it will set about to
learn a meaningful response to the message through pattern matching based on
its previous experience. Similar methods are used to detect offensive messages, and prove effective at overcoming the issues that other chatbots have experienced in the open domain. Given the failure of Microsoft Tay, a philosophy of preferring too much censorship over too little is employed. In this work, a layered approach is devised to conduct each process, leaving the architecture open to improvement with more advanced methods in the future. Preliminary results show an improvement over time as the bot learns more responses. A novel message simplification approach is added to the bot's architecture; the results suggest that the algorithm improves the bot's conversational performance by a factor of three.
Keywords: Artificial Intelligence, Natural Language Processing, Chatbot
1 Introduction
Both Artificial Intelligence (AI) researchers and industry are increasingly recognising
the importance of chatbot systems. Chatbots are embedded in everyday life, performing
various roles serving as assistants for end-users. Examples include consumer interac-
tion geared towards problem-solving and addressing customer requests in a variety of
industries [1] and even performing as a virtual home-assistant in home-automation du-
ties within a modern household [2].
In this work, a Chatbot system learns through experience to respond to certain messages
based on previous interactions, and gains more knowledge over time based on its inter-
action with users.
The main contributions of this work are three-fold:
1. A modular web-based chatbot system, useful for different applications and with potential for industrial use, is presented.
2. Natural Language Processing (NLP) and Sentiment Analysis are applied for
offensive vocabulary detection to prevent the current issues faced by chatbots
in the open domain [12].
3. A novel approach is introduced for a message simplification layer, in which colloquialisms and synonyms with identical meaning are mitigated by reducing them to flags, with responses built in real time.
This paper will first explore the related works (Section 2), highlighting milestones as
well as open issues currently existing in the field. The proposed approach (Section 3)
will then be detailed, followed by a preliminary set of results (Section 4). The paper
then concludes with a discussion of results, future work and impact (Section 5).
2 Related Work
Sentiment analysis (a.k.a. opinion mining) identifies people’s opinions, sentiments,
emotions, appraisals and attitudes towards entities such as services and products, or-
ganizations and/or individuals, events and their attributes. Chatbot systems are tools
that employ natural language processing and sentiment analysis for multiple purposes,
examples include human-machine communication applied to business, health, and ed-
ucation. The objective measurement of the user's experience and emotional states is a key factor in understanding the level of satisfaction or mood in a given context. Tackling this problem is very challenging, since the measurements of satisfaction levels or
internal states might be too ambiguous, but new emerging technologies combined with
computational intelligence can provide a new perspective for approaching the problem
in a more effective manner. There are many channels to detect and perform sentiment
analysis such as affection through facial expressions [17][18], voice [19], and also
through text communication via natural language processing [4]. Body gesture can also
be an interesting alternative for user behaviour or emotional expression understanding
[20]. This work falls into the category of text communication between two agents (hu-
man-bot or bot-bot) using natural language processing and sentiment analysis via an
intelligent web-based chatbot system. Related works are presented below to show the
progress of chatbot systems in different domains, their challenges and open issues in
this field.
Weizenbaum’s ELIZA [22] system is often considered as the first ever chatbot system,
which quickly became famous in the 1960s due to people engaging in conversations
with it beyond the expectations of its creator.
Shawar and Atwell’s exploration of using dialogue corpora to generate chatbots [3]
found that a non-bot human experience (such as the script of two human beings con-
versing) was more than capable of forming the basis for a chatbot. This therefore sug-
gests that rather than starting with no knowledge whatsoever, a dataset of conversations
could be provided to give the chatbot a useful head start in learning from interaction. A response format would need to be developed into which this data could be organised.
In 2004, the effectiveness of chatbots deployed to teach students foreign languages was
demonstrated [4]. The effectiveness of a robotic agent became clear when it was noted
that the students were more open to admit fault to the bot rather than a person, showing
that students were far less intimidated by a feeling of inadequacy during the learning
process. This research is very promising, as it shows an application of a chatbot in ed-
ucation which would not only aid the education system, but further improve it.
Freudbot is an Artificial Intelligence Markup Language (AIML) based chatbot, an
XML based approach to message pattern and response pairings. For experimental pur-
poses it was designed to speak to students to aid with distance learning when staff were
not available out of hours [5]. Heller et al. found that the experience was itself neutral,
neither good nor bad, but explained that with more responses it would have been ex-
pected to be better. This suggests that more responses create a better experience for the
user, and so learning from experience could constantly evolve a chatbot to become bet-
ter with each and every interaction.
Tatai et al. ran tests of chatbots that responded with differing sentiments, to see how users would react, and recorded how they found the experience [6]. Every user, regardless of input sentiment, found a better experience when the chatbot response was positive.
This suggests that a learning bot should respond to negative messages, but not risk re-
iterating them in the future, which would result in only positive responses, regardless
of input. In terms of impact, this means the bot must analyse the sentiment of the mes-
sages in order to make use of this important work and as such allow the implementation
of a chatbot that will not learn from negative messages.
The Loebner prize [7] is awarded to a chatbot which can fool a human into thinking
they are conversing with another human. This is based on the “Imitation game” pro-
posed by Turing in 1950 [21]. Shawar and Atwell [8] found that case-by-case training
was required depending on localisation and audience. They also suggest that this prob-
lem could be overcome by having a more massive dataset to draw from. This suggests
that larger datasets would improve the chatbot, but to contextualise it to an environ-
ment, the level of training required is dynamic and depends on its uses.
ALICE is an XML based chatbot [9] that makes use of pattern matching between user
inputs as well as a hidden person to generate responses to a user’s message. The method
of pattern matching to a stored set of messages was effective enough for the bot to win the Loebner Prize three times [10], suggesting that learning from interaction is an effective way of creating competition-winning chatbots.
Tay, an AI chatbot that learns from previous interaction [11] caused major controversy
due to it being targeted by internet trolls on Twitter [12]. The bot was exploited, and
after 16 hours began to send extremely offensive Tweets to users. This suggests that although the bot learnt effectively from experience, adequate protection was not put in place to prevent misuse. Two important lessons follow: firstly, learning from interaction is reinforced as an effective technique; secondly, the bot must have protection in place to prevent such an event from occurring.
The Google Cloud NLP API [13] is a library that can process a message and calculate
sentiment and magnitude thereof. The API, once authorised, will receive a message as
raw text, and then produce a response array containing pieces of sentiment information
gathered from the input, therefore enabling the sentiment of a message to be measured and deemed positive or negative. This knowledge allows for selective response elicitation based on positivity.
Despite the effort spent on these extensive and high-quality works on the science and philosophy of chatbots, most, except for [4], are yet to produce systems suitable for long-term deployment in real-world situations, concentrating instead on the learning theory behind their conception. This suggests that there is room for such systems in industry, making use of the aforementioned techniques already well documented and published in respected Machine Learning journals. To conclude, current scientific knowledge is seemingly ahead of what is employed in the real world, yet it has not been packaged into systems ready for real-world application.
3 Proposed Approach
For user interaction with the bot, the user will input a message into an arbitrary front-
end application. The bot makes use of an input/output structure of web-requests and
therefore any application that accommodates this can be used. The system outputs its
response in the form of a simple key-value JSON Array. To show the bot’s potential,
two front-end applications were developed (see Fig. 1), a browser-based application
using asynchronous JavaScript to populate a chat window with results, and an auto-
mated Twitter bot that would perform the same process when replying to Tweets.
Fig. 1. HTML and UI bot interfaces.
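As an illustrative sketch of this exchange (the endpoint URL, field names and response keys below are assumptions for illustration; the paper states only that the bot communicates through web requests and returns a simple key-value JSON structure), a front-end could interact with the back-end as follows:

```python
import requests

# Hypothetical back-end endpoint; the real URL and field names are not
# specified in the paper and are assumed here for illustration only.
BOT_URL = "http://localhost:8080/api/message"

def send_message(text: str) -> dict:
    """Send a user message to the bot back-end and return its JSON reply."""
    response = requests.post(BOT_URL, json={"message": text}, timeout=10)
    response.raise_for_status()
    # The paper states the reply is a simple key-value JSON structure,
    # e.g. {"response": "...", "sentiment": 0.4, "magnitude": 0.4}.
    return response.json()

if __name__ == "__main__":
    reply = send_message("Hello, how are you?")
    print(reply.get("response"))
```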
System architecture. A black-box system is developed for the bot itself (see Fig. 2), shown as steps 1 to 4.1.1; the message from the user is processed and results are generated accordingly. Firstly, to avoid the aforementioned Microsoft Tay disaster, a pattern-matching algorithm is executed, comparing the input message to a library of all of the known offensive words in the English language [16], together with a list of political terms. If the bot detects any of these terms, a response is not generated.
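A minimal sketch of this filtering step is given below, assuming a plain-text term list with one entry per line and whole-word matching; the paper specifies only that the input is compared against a library of offensive words [16] and a list of political terms:

```python
import re

def load_blocklist(path: str = "offensive_terms.txt") -> set:
    """Load one blocked term per line (offensive words plus political terms)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def is_blocked(message: str, blocklist: set) -> bool:
    """Return True if any blocked term appears as a whole word in the message."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return any(token in blocklist for token in tokens)

# Usage: if is_blocked(user_message, load_blocklist()), no response is generated.
```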
Secondly, the Google Cloud Sentiment Analysis API is requested to analyse the senti-
ment of the input message. This allows the bot to detect whether the message is con-
sidered positive, negative, or neutral (without emotion). Furthermore, following the work on user experience with emotional chatbots [6], the bot will only store an unknown message if, and only if, it is not considered to be of negative sentiment. This in
effect will prevent the bot from having the ability to produce a negative response to a
message regardless of the input sentiment. In addition, sentiment analysis is also per-
formed (See Fig.2) during step 2 in real-time after a response has been generated, to
mitigate a slight change in said message’s sentiment after the simplification and flag-
ging is reversed, effectively preventing any incorrect measurements or representations.
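A sketch of this step is shown below using the Cloud Natural Language REST endpoint for sentiment analysis; the decision threshold of 0.0 and the API-key handling are assumptions, while the rule of storing only non-negative messages follows the paper:

```python
import requests

ANALYZE_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"

def analyze_sentiment(text: str, api_key: str) -> tuple:
    """Return (score, magnitude) for a message via the Cloud Natural Language API.
    The score ranges from -1.0 (most negative) to 1.0 (most positive)."""
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    r = requests.post(ANALYZE_URL, params={"key": api_key}, json=body, timeout=10)
    r.raise_for_status()
    sentiment = r.json()["documentSentiment"]
    return sentiment["score"], sentiment["magnitude"]

def should_store(text: str, api_key: str) -> bool:
    """Store an unknown message only if its sentiment is not negative,
    as described in the paper (the exact threshold of 0.0 is an assumption)."""
    score, _ = analyze_sentiment(text, api_key)
    return score >= 0.0
```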
Fig. 2. Proposed architecture overview and flow of system process. [Figure: the user's message is passed from the front-end application as a JSON request to the back-end application, which runs (1) offensiveness analysis ('Internet trolling'), (2) sentiment analysis, (3) message simplification (flagging) and (4) response selection, including (4.1) deciding whether learning is required and (4.1.1) flagging unknown messages, before returning a formatted response.]
LSTM Overview. The Google Sentiment Analysis tool [13] is trained using Long Short-Term Memory (LSTM), a recurrent neural network (RNN) architecture whose units predict an output based on their input and their current state [14]. The general idea is as follows.
Firstly, a logical forget gate at the current timestep, $f_t$, decides which information to discard: $W_f$ is the learnt weight matrix, $h_{t-1}$ is the output vector of the unit at timestep $t-1$, $x_t$ is the input vector at timestep $t$, and $b_f$ is a bias vector applied to the process.

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (1)
Secondly, the cell must decide which information to store. This is done through a logical input gate $i_t$, with $\tilde{C}_t$ being the vector of new candidate values generated by the process.

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (2)

$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (3)
Thus, the cell state is updated using the variables calculated in (1)-(3) via an element-wise operation:

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (4)
An output is consequently generated, where $o_t$ represents the cell's output gate at the current timestep $t$. The internal (hidden) state of the cell is subsequently updated to match its new value:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (5)

$h_t = o_t \odot \tanh(C_t)$ (6)
This LSTM paradigm is used extensively in modern Machine Learning applications due to its record-breaking effectiveness, most notably in Microsoft's speech recognition system reaching human parity [15]. A user's message is passed to the API at the second step of the process (see Fig. 2) and the responses are used accordingly. The Google Sentiment Analysis toolkit uses the LSTM to derive a sentiment score on a scale of -1.0 to 1.0 (most negative, through neutral, to most positive) as well as its magnitude (the strength of the derived sentiment regardless of its sign). These values are used both on input, to adhere to the findings of Tatai et al. [6] on the impact of sentiment on user experience, and in the output provided by the system to the front-end application, where they can be used in a platform-appropriate manner.
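The following NumPy sketch illustrates a single LSTM cell step following equations (1)-(6); the weight shapes are illustrative and this is not the internal implementation of the Cloud API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step following equations (1)-(6).
    x_t: input vector; h_prev, c_prev: previous hidden and cell states.
    Each W_* has shape (hidden, hidden + input); each b_* has shape (hidden,)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)             # (1) forget gate
    i_t = sigmoid(W_i @ z + b_i)             # (2) input gate
    c_tilde = np.tanh(W_c @ z + b_c)         # (3) candidate values
    c_t = f_t * c_prev + i_t * c_tilde       # (4) cell state update (element-wise)
    o_t = sigmoid(W_o @ z + b_o)             # (5) output gate
    h_t = o_t * np.tanh(c_t)                 # (6) hidden state / output
    return h_t, c_t
```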
Proposed Simplification Step. A novel approach of message simplification is employed in step 3 (see Fig. 2). Messages containing colloquialisms or otherwise identical synonyms, which do not change the meaning of a message although they greatly impact its structure, are simplified by replacing said terms with a flag.
The number of possible variants of a message $s$, where $X$ is the set of all known flag-phrase pairs and $X_s \subseteq X$ is the set of flags present in $s$, is given as the product of the number of phrases $|x|$ belonging to each such flag:

$C(s) = \prod_{x \in X_s} |x|$ (7)
An iterative process steps through the stored flag-phrase relationships and replaces each phrase with its parent flag. The number of possibilities can therefore be calculated as the product of the sizes of the sets of sibling phrases for the flags present in message $s$.
The reason for the introduction of this simplification step is that the usage of this layer
would expand the learning capabilities by overcoming the differentiation of spoken
sentences in terms of societal colloquialisms and synonyms with identical meaning.
This is done by denoting them with flags rather than retaining the original string data.
These are arbitrarily stored sets of strings that have been created manually, where the index simply defines either a flag or a phrase, e.g. {"[GREETING]", "Hello", "Hi"}, where the flag "[GREETING]" sits at index 0. All values at index i > 0 are therefore children of said flag.
For example, if the bot knows ten phrases for 'hello' and five for 'happy', the sentence 'Hello! I am happy!' is simplified to '[GREETING]! I am [ADJ_HAPPY]!', which in turn covers 50 possible combinations. Without this method, fifty messages would have to be learned individually, whereas with it, only one exchange is needed to learn responses to all of them.
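The sketch below illustrates the flagging mechanism under these assumptions; the flag names and phrase lists are toy examples rather than the bot's actual stored sets:

```python
from math import prod

# Each entry: index 0 is the flag, indices i > 0 are its child phrases.
FLAG_SETS = [
    ["[GREETING]", "Hello", "Hi", "Hey there"],
    ["[ADJ_HAPPY]", "happy", "glad", "cheerful"],
]

def simplify(message: str) -> str:
    """Replace every known phrase with its parent flag (longest phrases first)."""
    for flag_set in FLAG_SETS:
        flag, phrases = flag_set[0], flag_set[1:]
        for phrase in sorted(phrases, key=len, reverse=True):
            message = message.replace(phrase, flag)
    return message

def combinations(simplified: str) -> int:
    """Number of surface variants covered by the simplified message (equation (7))."""
    counts = [len(s) - 1 for s in FLAG_SETS if s[0] in simplified]
    return prod(counts) if counts else 1

print(simplify("Hello! I am happy!"))                  # "[GREETING]! I am [ADJ_HAPPY]!"
print(combinations("[GREETING]! I am [ADJ_HAPPY]!"))   # 3 * 3 = 9 with these toy sets
```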
Fig. 3. Previous interaction comparison between lingual simplification algorithm active and in-
active.
Fig. 3 details the usage of lingual simplification for two messages that, other than a slight differentiation of sentiment (through wording), have identical meanings. Without the layer active, the two messages must be learnt from individually. With lingual simplification active, the experience gathered from one of the messages is applied to the other due to their identical meaning.
[Figure: "Hello, how are you?" and "Hey there, how're you doing?" match at only 59% as raw strings; after simplification both become "[GREETING], [MISC_HOW_ARE_YOU]" and match at 100%.]
Response Selection. Step 4 (see Fig. 2) comprises three sub-layers. Firstly, pattern matching is performed against the bot's stored dataset of message-response pairs. Thresholds are preliminarily defined as 60% and 90% (see Fig. 4): the former defines a response accurate enough to give but which should still be learnt from, and the latter a response accurate enough to require no further learning. Results below 90% flag the message for further learning, whereas a result below 60% causes the bot to change conversation, the response being too inaccurate to give.
Learning is only performed on non-negative messages, following the findings on user experience of chatbot response sentiment [6].
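A sketch of this threshold logic is given below; difflib's similarity ratio stands in for the paper's unspecified pattern-matching measure, and the data structures are illustrative:

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

LEARN_THRESHOLD = 0.60   # below this: too inaccurate, change conversation
KNOWN_THRESHOLD = 0.90   # at or above this: reply without further learning

def best_match(message: str, known: Dict[str, str]) -> Tuple[Optional[str], Optional[str], float]:
    """Return (stored_message, response, similarity) for the closest known pair.
    `known` maps simplified (flagged) messages to responses."""
    scored = [(m, r, SequenceMatcher(None, message, m).ratio())
              for m, r in known.items()]
    return max(scored, key=lambda item: item[2], default=(None, None, 0.0))

def select_response(message: str, known: Dict[str, str], unknowns: List[str]) -> Optional[str]:
    """Apply the 60%/90% thresholds described in the paper (see Fig. 4).
    The sentiment gate on learning described above is omitted here for brevity."""
    _, response, similarity = best_match(message, known)
    if similarity < LEARN_THRESHOLD:
        unknowns.append(message)      # too inaccurate: change conversation, no reply
        return None
    if similarity < KNOWN_THRESHOLD:
        unknowns.append(message)      # accurate enough to give, but keep learning
    return response
```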
Testing will be performed by having three individuals hold a twenty-message conversation with the bot (10 user messages and 10 bot responses), and will be automated by recreating these conversations on the system with and without the simplification algorithm active. The conversational dataset persists throughout the three conversations. This method of testing is followed so that two identical conversations can be compared on two identical chatbot states.
Learning Methodology. The learning process of the bot is proposed as follows. If the bot does not understand an input message, it changes conversation. The conversation is changed to a message that the bot has previously not understood (from another user), and the user's response is stored as a potential candidate for a message-response pair, along with its measured sentiment (see Fig. 4). The similarity calculation is simply a pattern match between the two messages (the input and the closest previously seen message), which is logically sound because the simplification layer has already been executed, meaning the simplified flags are taken into account in said pattern match (see Fig. 3).
Fig. 4. Bot reactions to detected percentage similarity. [Figure: at 0-59% similarity the response is not accurate enough; the bot changes conversation, does not reply, and adds the message to the 'unknowns' memory. At 60-89% the response is considered accurate enough to give; the bot replies but also adds the message to the 'unknowns' memory. At 90%+ the bot replies with no further learning needed. Messages are added to the 'unknowns' memory only if their sentiment is neutral to positive.]
4 Preliminary Results
Testing was performed by observing three individuals conversing with the bot for a
period of 20 messages; conversations were repeated on identical datasets both with the
simplification layer turned off, and on. The environment selected was a web-based in-
terface that asynchronously requested responses from the bot server. A small general
conversational dataset of 200 message-response pairs was deployed to give the bot a
starting knowledge. The message dataset was produced by having simple, general con-
versations with the bot.
For the simplification tests, the 200 messages were pre-processed via the simplification layer, which gave a starting point of 478 combinations; here, "known responses" refers to both the stored responses and their flag combinations, since each combination is treated as a message in its own right (see the proposed approach). Thus, the effect of a larger dataset was produced.
Table 1 details the bot’s Known Response Increase (KRI) over the course of three con-
versations. Without the novel method of message simplification, a total of 27 new re-
sponses were learnt and on average, a response was deemed accurate enough 58.3% of
the time. On the other hand, with the simplification layer activated, the exact same con-
versations resulted in 82 new responses learnt and on average created an accurate mes-
sage response 68.3% of the time. The percentage success is simply the ratio of the number of messages the bot replied to over the total number of messages, i.e. contrasting those that were replied to with the user messages that were not understood. This is seen further in Figures 5 and 6, in which success is indicated as a binary result of 0/100 (could not reply/did reply).
Table 1. Conversation success with and without Message Simplification.

Conversation     No simplification           With simplification
                 KRI     Success [%]         KRI     Success [%]
1                8       65                  21      80
2                8       60                  28      70
3                11      50                  33      55
Total / Avg.     27      58.3 (Avg.)         82      68.3 (Avg.)
Figures 5 and 6 give a graphical representation of the three sequential conversations over time, where the x-axis is the message number within the total of 60 messages. Success (%) shows the accuracy of the response, where 0 denotes an inaccurate (unanswered) message and 100 denotes a reply from the bot. Known responses shows the learning process over time through the increasing number of known responses the bot has to input messages.
Fig. 5. Performance through three iterative twenty-message conversations (no simplification).
Fig. 6. Performance through three iterative twenty-message conversations (with simplification).
5 Conclusion
The learning ability is clearly indicated by the preliminary results. The number of
known message-response pairs grows over time, with experience. Message success
does not seem to improve, but further extensive research is needed to explore the rela-
tionships between known responses and conversational success. The experiment only
covered sixty messages (three twenty-message conversations), but many more would be needed to explore said relationship.
Furthermore, the conversational re-creation during testing shows the considerable improvement gained when the message simplification layer is employed to mitigate colloquialisms; even though this system tends to get fewer opportunities to learn (due to its higher success rate), it makes far better use of these opportunities than the system without it.
Future Work and Impact. The decrease in message success gives the counter-intuitive impression that the bot performs worse as it accumulates more knowledge. This is likely due to the users' differing conversational subjects, since more knowledge directly corresponds to more responses in this system. More extensive conversations with the bot must be obtained to gain a more accurate figure of success over time. The pattern-matching learning thresholds (60%, 90%) used during the learning process were set arbitrarily at levels that made logical sense, and they performed as expected during the testing stage (see Table 1). Further improvement to an accuracy-based learning system over string patterns can be expected by setting more effective threshold values.
Jia found that chatbots had the potential to not only aid in the education system, but
also effectively improve it [4]. The further introduction of selective learning from ex-
perts, and through this, the formation of an expert system could ultimately lead to a
conversational aid in required situations. For example, a teaching assistant may answer
many uniquely-phrased questions with logically identical, or close-to, answers. An ex-
pert chatbot born from the knowledge of said teaching assistants could, for example,
lead to a more efficient University Laboratory paradigm in which students are aided by
both human and machine.
Following this vein of thought, a system deployed post-learning from many experts in
the field of psychotherapy could introduce the usage of knowledge re-application in
situations such as mental health counseling. This is a strong social impact that, as of
yet, has not been achieved.
References
1. Kuligowska, K. (2015). Commercial chatbot: performance evaluation, usability metrics and
quality standards of embodied conversational agents.
2. Amazon. (2014). Amazon Alexa.
3. Shawar, Bayan Abu, and Eric Atwell. (2003). "Machine Learning from dialogue corpora to
generate chatbots." Expert Update journal 6.3: 25-29.
4. Jia, Jiyou. "The study of the application of a web-based chatbot system on the teaching of
foreign languages." (2004). Society for Information Technology & Teacher Education In-
ternational Conference. Association for the Advancement of Computing in Education
(AACE). (pp. 1201-1207)
5. Heller, B., Proctor, M., Mah, D., Jewell, L., & Cheung, B. (2005). Freudbot: An investiga-
tion of chatbot technology in distance education. In EdMedia: World Conference on Educa-
tional Media and Technology (pp. 3913-3918). Association for the Advancement of Com-
puting in Education (AACE).
6. Tatai, G., Csordás, A., Kiss, Á., Szaló, A., & Laufer, L. (2003). Happy chatbot, happy user.
In Intelligent Virtual Agents (pp. 5-12). Springer Berlin/Heidelberg.
7. Mauldin, M. L. (1994). Chatterbots, tinymuds, and the turing test: Entering the loebner prize
competition. In AAAI (Vol. 94, pp. 16-21).
8. Shawar, B. A., & Atwell, E. (2007). Different measurements metrics to evaluate a chatbot
system. In Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Re-
search in Dialog Technologies (pp. 89-96). Association for Computational Linguistics.
9. Wallace, R. (2001). Artificial linguistic internet computer entity (ALICE). Retrieved from
https://www.chatbots.org/chatbot/a.l.i.c.e/. Last accessed 25/5/2018
10. The Exeter Blog. (2014). The Loebner Prize, a Turing Test competition at Bletchley Park.
Retrieved from https://blogs.exeter.ac.uk/exeterblog/blog/2014/12/08/the-loebner-prize-a-
turing-test-competition-at-bletchley-park/
11. Microsoft. (2016, March). Tay AI. Retrieved from https://twitter.com/tayandyou. Last accessed
25/5/2018.
12. Wakefield, J. (2016). BBC News. Microsoft chatbot is taught to swear on Twitter. Retrieved
April 12, 2018, from http://www.bbc.co.uk/news/technology-35890188.
13. "Google Cloud Products" (n.d.). Retrieved March 28, 2018 from https://cloud.google.com.
Last accessed 25/5/2018
14. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation,
9(8), 1735-1780.
15. Haridy, Rich (2017). "Microsoft's speech recognition system is now as good as a human".
newatlas.com. Last accessed April 6, 2018 from https://newatlas.com/microsoft-speech-
recognition-equals-humans/50999/.
16. AllSlang. (n.d.). Swear Word List, Dictionary, Filter, and API. Retrieved March 11, 2018,
from https://www.noswearing.com/
17. D. R. Faria, M. Vieira, F. C. C. Faria, C. Premebida (2017). Affective Facial Expressions
Recognition for Human-Robot Interaction. IEEE International Symposium on Robot and
Human Interactive Communication, 805-810.
18. D. R. Faria, M. Vieira, F. C. C. Faria (2017). Towards the Development of Affective
Facial Expression Recognition for Human-Robot Interaction. International Conference on
Pervasive Technologies Related to Assistive Environments, 300-304.
19. D. Bertero, F. Siddique, C. Wu, Y. Wan, R. Chan, P. Fung (2016). Real-Time Speech Emo-
tion and Sentiment Recognition for Interactive Dialogue Systems. Conference on Empirical
Methods in Natural Language Processing, 1042-1047.
20. M. Vieira, D. R. Faria, U.Nunes (2015). Real-time Application for Monitoring Human Daily
Activities and Risk Situations in Robot-assisted Living. 2nd Iberian Robotics Conference,
449-461.
21. A. M. Turing (1950) Computing Machinery and Intelligence. Mind 49: 433-460.
22. J. Weizenbaum (1976). Computer Power and Human Reason: From Judgment to Calcula-
tion. New York: W.H. Freeman and Company.