Healthcare Analytics 3 (2023) 100198
Contents lists available at ScienceDirect
Healthcare Analytics
journal homepage: www.elsevier.com/locate/health
A comparative study of retrieval-based and generative-based chatbots using
Deep Learning and Machine Learning
Sumit Pandey, Srishti Sharma
The NorthCap University School of Engineering & Technology, Gurugram, Haryana, 122017, India
ARTICLE INFO
Keywords:
Artificial Intelligence
Chatbot
Deep Learning
Machine Learning
Mental health
ABSTRACT
Increased screen time may cause significant health impacts, including harmful effects on mental health. Studies
on the association between technological obsessions and their influence on health have been conducted
using Deep Learning (DL) and Machine Learning (ML) techniques. The deployment of chatbots in different
industries has been proven as a game-changer. We study conversational Artificial Intelligence (AI) systems
enabling operators to conduct conversations with machines that resemble those with humans. We design and
develop two types of chatbots, retrieval-based and generative-based, with six designs in total. Among the retrieval-based
chatbots, Vanilla Recurrent Neural Network (RNN) has an accuracy of 83.22%, Long Short Term Memory
(LSTM) is 89.87% accurate, Bidirectional LSTM (Bi-LSTM) is 91.57% accurate, Gated Recurrent Unit (GRU) is
65.57% accurate, and Convolution Neural Network (CNN) is 82.33% accurate. In comparison, generative-based
chatbots have encoder–decoder designs that are 94.45% accurate. The most significant distinction is that while
generative-based chatbots can generate new text, retrieval-based chatbots are restricted to selecting the best-matching response from the outputs they already know.
1. Introduction
Psychological behavioural changes caused by mental diseases may
have an impact on a person’s development. People of all ages, cultures, and nations are susceptible to them. Frequently,
odd thoughts, feelings, behaviour, and perceptions are indicators of
mental illnesses. Mental health conditions include developmental and
neurodegenerative illnesses like autism,¹ as well as schizophrenia,² bipolar disorder,³ depression,⁴ and other psychoses.⁵ One in seven
Indians, according to a 2017 study, suffer from a mental condition such
as schizophrenia or bipolar disorder. As people’s awareness of mental
health issues grows, many researchers are turning their attention to this
area as a key area for improvement. To provide a uniform approach to
mental health facilities, including treatment, support, and prevention,
as well as a comprehensive means of addressing the nation’s psycholog-
ical well-being, India’s first National Mental Health Survey⁶ (NMHS)
Corresponding author.
E-mail addresses: sumit18csd004@ncuindia.edu (S. Pandey), srishti@ncuindia.edu (S. Sharma).
¹ A neurodevelopmental disease called autism impacts behaviour, social interaction, and communication.
² Schizophrenia is a psychiatric disorder that hinders an individual’s ability to engage in coherent cognitive processes, experience emotions in a stable manner, and exhibit appropriate behaviours.
³ Bipolar disorder, a mental illness, is typified by alternating episodes of depressive and manic or hypomanic states.
⁴ Depression, a mental disorder, is characterized by prolonged periods of experiencing emotions such as sadness, despair, and a lack of interest in previously enjoyed activities.
⁵ A range of mental health conditions known as psychoses have an impact on a person’s capacity to think, feel, and perceive reality.
⁶ The National Mental Health Survey (NMHS) had the goal of estimating the frequency and burden of mental health disorders in India, identifying current treatment gaps and existing patterns of healthcare use, and understanding the effect and disability caused by these diseases.
was published [1]. Twelve states in total participated in the NMHS. It used both quantitative and qualitative methods, such as Focus-Group Discussions (FGDs) and Key Informant Interviews (KIIs), to assess adults, as stated in [2]. India had a population of about
150 million individuals who needed assistance, with men outnum-
bering women. As is commonly assumed, men are much more likely
than women to experience behavioural and mental health problems.
However, schizotypal, psychoneurotic diseases, mood disorders, and
psychotic syndromes are associated with physical abnormalities. The
majority of those with mental health issues were between the ages of
40 and 49. It was also demonstrated that the middle class and lower
classes were burdened more than the wealthy [2,3]. The NMHS System
has raised attention to mental health issues in individuals and increased
the use of Psychological Therapies (PSIs), and prescription drugs. From
70% to 92% more people with various mental diseases are receiving
treatment. Currently, 1.3% of all health spending in India is allocated
to supporting psychological well-being [4].
https://doi.org/10.1016/j.health.2023.100198
Received 2 April 2023; Received in revised form 8 May 2023; Accepted 17 May 2023
2772-4425/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
The importance of School-Based Mental Health Services (SBMHSs) must also be prioritized, because adolescents face comparable challenges: roughly one in every thirty-three children and one in every eight adolescents may deal with clinical depression, which in adolescents increases the risk of suicide. This study therefore emphasizes the value of a multi-tiered approach applied at different stages to overcome potential obstacles [5,6]. A multi-tiered approach
can be extremely valuable in overcoming potential obstacles in any
complex project, such as the development of chatbots for mental health
care. In this study, the authors propose a multi-tiered approach at
different stages to address the challenges faced in developing effec-
tive chatbots for mental health care. At the initial stage, the authors
suggest compiling pertinent data to address the scarcity of large-scale, high-quality data. This approach involves
gathering relevant data from various sources and using Machine Learn-
ing (ML) approaches to filter out irrelevant data or noise, which can
help to increase data quality and decrease bias. In the next stage,
with the help of Neural Networks (NNs), the authors of this paper have created two different types of chatbots, a retrieval-based chatbot and a generative-based chatbot, which specifically target mental health issues in college students by reducing stress and, in turn, helping to determine the root cause of their issues. Furthermore, the authors suggest customizing the chatbot by adapting it to the operator’s questions, which can improve its ability to communicate with humans and give it a ‘‘human-like’’ feel. With the advancement of digital technology, there is an opportunity to create novel and easily accessible mental health therapies, such as telepsychiatry, online counselling, and mental health chatbots. These innovations have
the potential to improve access to mental health care, lessen the stigma
around asking for assistance, and offer more specialized and efficient
treatment alternatives. The utilization of technology can also involve
the collection and analysis of vast amounts of data, which can offer
insights into the occurrence and potential causes of mental health
disorders, as well as the effectiveness of various treatments. We can
address the escalating mental health crisis and enhance the wellness
of people and communities by using technology to improve mental
health. Finally, the authors emphasize the importance of conducting
follow-up studies with larger sample sizes to evaluate the effectiveness
of chatbots in boosting well-being and reducing stress. This multi-tiered
approach involves evaluating the chatbots’ performance at different
stages and addressing potential challenges, such as the risk of placebo
effects and false positives. Chatbots are a scalable option because they give people an interactive way to take part in Artificial Intelligence (AI)-driven behavioural health interventions and can be available 24/7, providing immediate support and assistance during times of crisis. However, people are growing more and more dependent
on digital devices for work, play, and communication as technology
becomes more pervasive in modern culture. But this increased reliance
on technology has also been linked to several detrimental health effects,
such as sedentary behaviour, poor posture, eye strain, disturbed sleep
cycles, and psychological issues like anxiety, depression, and addiction.
We can better understand the effects of technology on people and soci-
ety as a whole by investigating the relationship between technological
engrossment and health. We can then develop measures to encourage
healthy technology use and lessen the detrimental consequences on
health and wellbeing. Therefore, understanding the user behaviours
of chatbots for depression is a crucial first step in creating chatbot
designs and sharing information about the benefits and drawbacks
of chatbots. Although some chatbots’ early efficacy results have been
encouraging, there is little data on how users use these chatbots.
For instance, chatbots in different industries have proven to be game-changers [7–10]. However, in mental health, people tend
to forget unpleasant circumstances; therefore, certain stress-related
instances could go unrecognized which makes data collecting difficult
when it comes to felt stress [11,12]. Therefore, because collecting
Fig. 1. Retrieval-based chatbots.
Fig. 2. Generative-based chatbots.
data can have different structures such as Comma Separated Values
(CSV) and JavaScript Object Notation (JSON), there is a need to
construct two separate kinds of chatbots: retrieval-based chatbots and
generative-based chatbots as shown in Figs. 1 and 2. Retrieval-based
models utilize a set of predetermined responses and experiences to
determine the most suitable reply, taking into account the input and
context provided. The evaluation standards could be anything from
straightforward rule-based charting to intricate ML ensembles. These
techniques select a response from a list of options rather than creating
fresh text. Retrieval-based chatbots are often error-free when
trained on a large and diverse dataset with sufficient annotation and
feedback. But because they seem too stiff and the responses do not
seem ‘‘human’’, they are constrained in their approach. Pre-determined
responses are not used in complex generative-based chatbot models.
They compose original responses from the ground up. Because generative-based chatbot models learn to generate text rather than select it, they are used to build sophisticated chatbots with far more flexible behaviour.
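As a concrete illustration of the retrieval-based selection step just described, the sketch below matches an operator query against a fixed pattern/response list using bag-of-words cosine similarity. The pairs, tokenizer, and scoring function are illustrative assumptions, not the implementation evaluated later in this paper.

```python
from collections import Counter
import math

def cosine_sim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, pairs):
    """Return the canned response whose stored pattern best matches the query."""
    q = Counter(query.lower().split())
    best = max(pairs, key=lambda p: cosine_sim(q, Counter(p[0].lower().split())))
    return best[1]

# Hypothetical pattern/response pairs
pairs = [
    ("what is depression", "Depression is a mood disorder marked by persistent sadness."),
    ("how do i manage stress", "Regular sleep, exercise, and talking to someone can help."),
]
print(retrieve("what is depression exactly", pairs))
```

A real system would replace the bag-of-words match with the neural classifiers described in Section 3, but the select-from-known-responses structure is the same.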
This paper is organized as follows: a thorough literature analysis of the suggested chatbot approaches is provided in Section 2. Additionally, Section 3 explains the implementation and the dataset, along with the
Table 1
Retrieval-based Chatbots.
Criteria (grouped). Efficacy: Empirical testing (a), Information (b). Privacy and Confidentiality: Transparency (c), Privacy agreements (d), Availability (e). Safety: Transparency (f), Traditional support (g), Automatic conversation termination (h). Studies surveyed:
Zhang et al. [13]
Moore et al. [14]
Akkineni et al. [15]
Shi et al. [16]
Wang and Fang [17]
Qian and Dou [18]
Lan et al. [19]
Kadam et al. [20]
Wang and Fang [21]
Kim et al. [22]
Aksu et al. [23]
Patchava and Kiran [24]
Lopez-Rodriguez et al. [25]
(a) Empirical testing of chatbots?
(b) Providing information on the theoretical approach of chatbots to users?
(c) Transparency of the privacy policy in chatbots for patients?
(d) Providing privacy agreements to patients before starting the conversation with chatbots?
(e) Availability of privacy agreement review for patients during chatbot conversation?
(f) Transparency of chatbots to patients regarding their intelligent nature?
(g) Availability of traditional support in chatbots for patients?
(h) Automatic conversation termination by chatbots while interacting with patients?
explanation of the experimental results for the suggested retrieval-based and generative-based chatbots. A comparison of JSON and CSV files across the various models is demonstrated in Section 4. The paper concludes in Section 5 with a few observations and suggestions for further study.
2. Related works
The two main categories of conversational frameworks used to build
chatbots are retrieval-based and generative-based [26,27]. Retrieval-
based chatbots find matching responses in a database of pre-defined conversational phrases, which is the main distinction between
the two. Additionally, generative-based chatbots use ML approaches to
automatically generate responses.
As of now, the retrieval-based approach is used by the majority
of therapeutic chatbots [28,29]. Lommatzsch and Katins [30]
state that retrieval-based chatbots keep track of the conversation using
dialogue management frameworks and determine what to do next depending on the responses they discover [31]. In contrast, Wang et al. [32]
state that many therapy chatbots manage their user conversations
using pre-trained dialogue management frameworks. Retrieval-based
frameworks can be divided into two categories: finite-state, proposed by Sutton et al. [33] in 1998, and frame-based, proposed by Goddeau et al. [34] in 1996. Parmar et al. [35] state
that a chatbot with a finite state framework restricts the dialogue to a
predetermined set of steps. At each level, users are only provided with a
limited number of response possibilities, and the chatbot can only react
using those options. This indicates that the dialogue is constrained and
does not permit natural conversation [36]. For simple, structured activi-
ties where the chatbot can direct the dialogue, a finite state framework
works well [29]. The dialogue flow is not predetermined in frame-based frameworks [29]. Instead, the chatbot poses targeted queries to the user and
methodically gathers data. The ‘‘slots’’ that make up this structured data
are collections of well-known notions according to Wei et al. [37]. The
chatbot then moves forward under pre-specified actions for each group
concept of slots, enabling it to offer more individualized responses
and manage increasingly challenging jobs. Chen et al. [38] state that
this concept is frequently applied in interactions involving information-
seeking, where users have information based on a set of constraints.
Users providing data to fill in specified slots, such as their departure and
arrival cities when looking for a route, is an example of a frame-based
framework. This kind of framework, nevertheless, may have trouble
adjusting to different talks [39] that do not follow a predetermined
plan. Due to the non-predetermined dialogue flow, it might sometimes
result in consumers revealing more information than is necessary [29].
Building interaction management frameworks for chatbots typically
involves using Artificial Intelligence Markup Language (AIML) and
ChatScript [26], two well-liked methods. Artificial Linguistic Internet
Computer Entity (ALICE), a pioneering chatbot that was able to have
basic interactions with users, was the first to use AIML. A tree structure
is used by the Extensible Markup Language (XML)-compliant language
AIML to quickly match patterns and get the right answers. AIML
was used to create the chatbot systems for several treatment chatbots
that have been mentioned in the literature, including a Virtual Agent
Equipped with Voice Communication (VICA) which was proposed by
Sakurai et al. [40], an alcohol misuse intervention chatbot proposed by
Dulin et al. [41], Barnett et al. [42], Win et al. [43] and a consultant
chatbot proposed by Parviainen and Rantala [44], Bharti et al. [45],
Shinde et al. [46]. Decision tree topologies have been employed by
several treatment chatbots, including Vivibot which was proposed by
Greer et al. [47], Woebot was proposed by Fitzpatrick et al. [48],
and a chatbot for post-traumatic stress disorder was proposed by Han
et al. [49], Chaix et al. [50], Ahn et al. [51], Tielman et al. [52].
Users were given the option to react in an option-choice manner by an
embodied conversational agent for education [53]. On the other hand,
the generative-based approach limits free dialogues [54,55] due to pre-
determined outputs [27], whereas the retrieval-based approach enables
chatbots to answer with more meaningful responses [26]. Because it
relies on a decision tree mechanism, the option-choice format used by
some chatbots is inappropriate for multi-linear conversations [26]. Fur-
thermore, the usability of the system becomes more difficult to enhance
since it will not successfully complete the task in cases where user
inputs fail to match any information in the database [26]. Table 1 lists
more retrieval-based chatbots in addition to those already discussed.
In contrast to retrieval-based chatbots, generative chatbots employ
ML approaches to learn how to respond. These chatbots are trained on a huge quantity of data [26] and use that knowledge to produce responses to users’ inputs rather than relying on pre-determined
responses. RNN, LSTM, and Seq2Seq models are a few of the common
AI approaches utilized in generative-based chatbots. But there have not
been many studies that have used generative-based methods to create
therapy chatbots. Bidirectional Encoder Representations from Trans-
formers (BERT) [56] and the OpenAI Generative Pre-Training-2 Model
Fig. 3. CSV file converted to the JSON file.
Table 2
Generative-based chatbots.
Criteria (grouped as in Table 1). Efficacy: Empirical testing, Information. Privacy and Confidentiality: Transparency, Privacy agreements, Availability. Safety: Transparency, Traditional support, Automatic conversation termination. Studies surveyed:
See and Manning [62]
Sheikh et al. [63]
Hirosawa et al. [64]
Sawant et al. [65]
Si et al. [66]
Raj and Phridviraj [67]
Bachtiar et al. [68]
Khadija et al. [69]
(GPT-2) [57], which has been improved into the Third-Generation Au-
toregressive Language Model (GPT-3) [58], are two of the most sophis-
ticated generative-based models. These models are frequently utilized
in tasks involving Natural Language Processing (NLP) because they
have demonstrated notable gains in producing human-like language.
For producing task-oriented discourse, these models are open-source,
simple to train, and adaptable [55,59]. In 2019, OpenAI introduced
the GPT-2, an unsupervised generative model that had undergone
pre-training on a substantial unannotated dataset. This model’s advan-
tages include the ability to support deep language models, reduce the
cost of manual annotation, and avoid the need to train a new model
from scratch. On language-related tasks including summarizing, read-
ing comprehension, answering questions, and translation, the model
did well, according to Radford et al. [57]. The chatbot can also be enhanced
with various domain data to serve certain objectives for its target
customers [60,61]. The OpenAI GPT-2 model has benefits, but there
are still some problems that need to be solved, such as users having
trouble understanding the responses and the model producing mistakes
that do not make sense in the context of the conversation according
to Zhang et al. [55]. Incorporating pre-trained models that have been
specifically designed for particular domains and fine-tuning them with
datasets that are specific to those domains could be a solution to these
problems. Table 2 lists more generative-based chatbots in addition to those already discussed.
3. Implementation and analysis
A variety of data collection techniques [70], including observa-
tional research, case studies using focus groups, and a quasi-statistical
method, have been employed in the development of chatbots [71].
Kaggle [72], GitHub [73], scraping data from Reddit [74], and clinical
Table 3
Dataset description.
Name Mental_Health_FAQ
Category Comma-Separated Values (CSV)
No. of Rows 98
No. of Columns 3 i.e. (Question_ID, Questions, Answers)
This CSV file does not include a tag for each query, so it cannot be used directly with a retrieval-based classification method. The authors therefore manually added tags to the rows of the database.
data [75] are some sources for mental health chatbots that have a
variety of datasets available. Technical details on the parts that make
up the chatbot models are provided in this section. The classification
issue has been addressed in the past using a variety of ML techniques.
However, given their extensive feature-extraction capabilities, NNs frequently outperform classical ML methods. The authors create a conversational AI that
specifically highlights issues with college students’ mental health by
reducing stress and, as a result, assisting in the discovery of the root
of their difficulties. The authors are attempting to create an AI-based
chatbot. This paradigm uses NN to classify user input depending on a
preset set of replies and connect user input to intent. The authors chose
the Kaggle psychological well-being Frequently Asked Question (FAQ)
database [76] since there was no publicly available mental health ser-
vices database designed specifically for creating a chatbot. The chosen
dataset’s description is shown in Table 3.
3.1. Dataset preprocessing
The conversion of CSV data into a JSON file was necessary for the
development of a retrieval-based chatbot, as tags play a pivotal role
in the process. Fig. 3 provides an illustration of the JSON file, which
contains intent tags with defined forms and corresponding responses.
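A minimal sketch of this CSV-to-JSON conversion is shown below, assuming the column names from Table 3 (Question_ID, Questions, Answers) and an invented tag-naming scheme; the authors added their tags manually.

```python
import csv
import io
import json

# Hypothetical two-row excerpt standing in for the Mental_Health_FAQ file.
csv_text = """Question_ID,Questions,Answers
1,What is mental health?,Mental health is emotional and psychological well-being.
2,What causes anxiety?,Anxiety can stem from stress or genetic factors.
"""

# Build the intents-style JSON a retrieval-based chatbot expects:
# each intent has a tag, input patterns, and canned responses.
intents = {"intents": []}
for row in csv.DictReader(io.StringIO(csv_text)):
    intents["intents"].append({
        "tag": f"faq_{row['Question_ID']}",  # placeholder tag name
        "patterns": [row["Questions"]],
        "responses": [row["Answers"]],
    })

print(json.dumps(intents, indent=2))
```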
Fig. 4. Vocabulary lists with index.
Table 4
Final learning accuracies and losses of Vanilla RNN.
Metrics Value
Loss 1.1619
Accuracy 0.8322
Validation loss 2.0922
Validation accuracy 0.2013
This is an example of an intent tag: the chatbot replies with the answer provided in the ‘‘responses’’ field of the intent. By grouping the patterns and responses under tags, the authors produced two data structures. During preprocessing, the pattern and response strings were tokenized, and extra spaces and punctuation were removed. A vocabulary, that is, a dictionary of unique words and their frequencies, was built from the tokenized terms. Using this vocabulary, a list of unique words is created and assigned an index, as seen in Fig. 4.
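The vocabulary-building step can be sketched as follows; the tokenizer and the reserved padding index are assumptions for illustration, not the authors' exact code.

```python
import re
from collections import Counter

def build_vocab(sentences):
    """Tokenize, count word frequencies, and assign each unique word an index."""
    tokens = []
    for s in sentences:
        s = re.sub(r"[^\w\s]", "", s.lower())  # strip punctuation, lowercase
        tokens.extend(s.split())
    freq = Counter(tokens)                      # word -> frequency
    # Index 0 is reserved for padding, as is conventional.
    index = {w: i + 1 for i, w in enumerate(sorted(freq))}
    return freq, index

freq, index = build_vocab(["What is mental health?", "Mental health matters."])
print(freq["mental"], index["mental"])
```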
The authors constructed the training and validation data from this structure by extracting the patterns from each tag and encoding them. Each row of the final training data is padded to ten columns, the length of the longest pattern. The label array has length 16, the total number of tags, and each label matches the corresponding text in the training data (identifying the tag it belongs to).
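The padding and labelling scheme described above (patterns padded to ten columns, one-hot labels over 16 tags) might look like this in outline; the token ids are invented for illustration.

```python
def pad_sequence(ids, max_len=10, pad_id=0):
    """Right-pad (or truncate) a token-id list to a fixed length."""
    return (ids + [pad_id] * max_len)[:max_len]

def one_hot(tag_index, num_tags=16):
    """One-hot label vector over the tag set."""
    v = [0] * num_tags
    v[tag_index] = 1
    return v

print(pad_sequence([4, 7, 2]))  # [4, 7, 2, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(3))
```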
3.2. Retrieval-based models
This subsection displays the outcomes of the retrieval-based chatbot
employing several NN types, including the following:
Vanilla Recurrent Neural Network (RNN)⁷: The Embedding Layer (EL), the first layer, represents each vocabulary word in the texts as an N-Dimensional Vector (NDV), where N is the dimensionality the authors assign to the words. These Feature Vectors (FVs) are learned during training, and words with similar meanings end up with nearby vectors in the embedding space. A basic RNN was then built using Keras Layers (KL). The model was trained for 50 epochs; Fig. 5 illustrates the learning curves of the Vanilla RNN, and Table 4 lists the model’s final learning accuracy and losses. The discrepancy in accuracy between validation and training reveals overfitting. After four epochs, the validation loss increases, demonstrating that the model generalizes poorly to unseen data.
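The recurrence that footnote 7 attributes to a vanilla RNN, a hidden state updated at each time step, can be sketched in plain NumPy; the sizes and random weights below are illustrative, not the trained Keras model.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b); returns final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # each step mixes input with past state
    return h

rng = np.random.default_rng(0)
embed_dim, hidden = 8, 16                     # illustrative sizes
seq = rng.normal(size=(5, embed_dim))         # five embedded tokens
W_x = rng.normal(size=(hidden, embed_dim)) * 0.1
W_h = rng.normal(size=(hidden, hidden)) * 0.1
b = np.zeros(hidden)
h = rnn_forward(seq, W_x, W_h, b)
print(h.shape)
```

In the actual model this final state would feed a softmax layer over the 16 intent tags.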
Long Short Term Memory (LSTM)⁸: The LSTM architecture outperforms the vanilla architecture because it has more gates than
⁷ Using a hidden state that is updated at each time step, a vanilla RNN is a type of NN created to process sequential data. Each input in the sequence is processed along with data from previous inputs in a recursive manner. It has issues with vanishing and exploding gradients and is the most basic type of RNN.
⁸ LSTM is a type of RNN architecture that addresses the problem of vanishing gradients in conventional RNNs by incorporating memory cells that allow the network to selectively retain or discard information over time. This makes it particularly advantageous for tasks that involve sequential input.
Table 5
Final learning accuracies and losses of LSTM.
Metrics Value
Loss 0.7420
Accuracy 0.8987
Validation loss 2.4457
Validation accuracy 0.1990
Table 6
Final learning accuracies and losses of Bi-LSTM.
Metrics Value
Loss 1.2129
Accuracy 0.9157
Validation loss 2.3987
Validation accuracy 0.1010
a simple RNN, making it more sophisticated. KLs were employed by
the authors to train the architecture. Table 5 lists the final learning
accuracies and losses after 50 LSTM epochs, and Fig. 6 displays the
LSTM learning curves. The discrepancy between training and valida-
tion accuracy indicates overfitting. The gap increases with increasing
overfitting. A low training error indicates low bias.
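The gating mechanism that footnote 8 credits for addressing vanishing gradients can be sketched as a single LSTM step; the stacked-weights layout and sizes here are illustrative assumptions, not the Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: forget, input, output gates plus a candidate cell state."""
    z = W @ x + U @ h + b                  # stacked pre-activations, shape (4*hidden,)
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c_new = f * c + i * np.tanh(g)         # selectively retain old / admit new info
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
d, n = 8, 16                               # illustrative sizes
x = rng.normal(size=d)
h, c = np.zeros(n), np.zeros(n)
W = rng.normal(size=(4 * n, d)) * 0.1
U = rng.normal(size=(4 * n, n)) * 0.1
b = np.zeros(4 * n)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```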
Bidirectional LSTM (Bi-LSTM)⁹: A Bi-LSTM differs fundamentally from an LSTM in that it uses twice the information: it processes the sequence from beginning to end and vice versa. Unlike a unidirectional LSTM, it uses bidirectional processing to capture information from future time steps as well, so at any moment the network has access to context from both the past and the future. Because they capture context in both directions, Bi-LSTMs ought to produce better results.
The dense layer’s input and output sizes are constrained in the same way as in the LSTM architecture. The authors trained the architecture using KL’s bidirectional LSTM. The final
learning accuracies and losses after 50 epochs of Bi-LSTM are provided
in Table 6 and the learning curves of Bi-LSTM are shown in Fig. 7.
The model generalizes poorly, given the stark contrast between the validation and training results. However, Bi-LSTM is more accurate than LSTM.
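The forward-plus-backward processing described above can be sketched as follows, using a simple tanh recurrence in place of full LSTM cells for brevity; note how concatenation doubles the representation size.

```python
import numpy as np

def run_direction(seq, step, h0):
    """Scan a recurrent step function over a sequence; return the final state."""
    h = h0
    for x in seq:
        h = step(x, h)
    return h

def bidirectional(seq, step_fwd, step_bwd, h0):
    """Bi-LSTM-style encoding: process forward and backward, then concatenate."""
    h_f = run_direction(seq, step_fwd, h0)
    h_b = run_direction(seq[::-1], step_bwd, h0)
    return np.concatenate([h_f, h_b])       # doubled representation

rng = np.random.default_rng(2)
d, n = 8, 16                                # illustrative sizes
Wf, Uf = rng.normal(size=(n, d)) * 0.1, rng.normal(size=(n, n)) * 0.1
Wb, Ub = rng.normal(size=(n, d)) * 0.1, rng.normal(size=(n, n)) * 0.1
step_f = lambda x, h: np.tanh(Wf @ x + Uf @ h)
step_b = lambda x, h: np.tanh(Wb @ x + Ub @ h)
seq = list(rng.normal(size=(5, d)))
h = bidirectional(seq, step_f, step_b, np.zeros(n))
print(h.shape)  # (32,): twice the unidirectional size
```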
Gated Recurrent Unit (GRU)¹⁰: The authors also utilized the GRU, another popular NN algorithm. Given that it has fewer gates than an LSTM, it operates more rapidly. Because of the GRU’s reduced complexity, the authors expected the results to be less precise than those of the LSTM architecture. However, LSTM overfitting was a possibility because the authors’ dataset was so small. The sizes of the ELs and dense layers are constrained in the same way as in the LSTM model. The authors employed KL’s GRU to train the architecture. The overall learning errors and losses
for GRU after 50 epochs are shown in Table 7 and Fig. 8 respectively.
The validation loss initially decreases but starts to grow after about 10 epochs, indicating overfitting, while the training loss falls continuously. Table 7 demonstrates that although the model is less accurate and incurs a higher loss than the LSTM architecture, it still has a respectable accuracy
⁹ The classic LSTM is extended by the Bi-LSTM, which concurrently captures
past and future context by processing the input sequence both forward and
backward across time. In NLP tasks like text categorization, sentiment analysis,
and machine translation, it is frequently employed.
¹⁰ A specific kind of RNN architecture called a GRU includes gating methods
to regulate the information flow between time steps. With fewer parameters
and comparable performance on some applications, it was introduced as a
more straightforward alternative to the LSTM design. A reset gate and an
update gate are two gates in the GRU that control how much of the previous
state should be forgotten and how much of the current input should be
incorporated, respectively. As a result, the GRU is quicker to train and more
computationally economical than the LSTM.
Fig. 5. Learning curves of Vanilla RNN.
Fig. 6. Learning curves of LSTM.
Fig. 7. Learning curves of Bi-LSTM.
Fig. 8. Learning curves of GRU.
Table 7
Final learning accuracies and losses of GRU.
Metrics Value
Loss 1.0214
Accuracy 0.6557
Validation loss 2.8854
Validation accuracy 0.1035
of 0.6557. Additionally, with a validation accuracy of only 0.1035, the generalization performance is poor. As the final results show, the GRU is far less accurate and more loss-prone than the LSTM architecture. The gap between validation and training accuracies, in line with the earlier findings for the other architectures, also shows that the architecture is overfitting.
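The reset and update gates described in footnote 10 can be sketched as one GRU step; the weight shapes and gating convention follow the common formulation and are illustrative, not the authors' Keras configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step with an update gate z and a reset gate r."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate: how much new state to admit
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate: how much old state to forget
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_cand           # blend old state with candidate

rng = np.random.default_rng(3)
d, n = 8, 16                                  # illustrative sizes
x, h = rng.normal(size=d), np.zeros(n)
Wz, Wr, Wh = (rng.normal(size=(n, d)) * 0.1 for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(n, n)) * 0.1 for _ in range(3))
h = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)
```

With two gates instead of the LSTM's three, each step needs fewer matrix products, which is why the GRU trains faster.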
The authors then attempted a little more complex architecture to
see if they could prevent overfitting.
Convolutional Neural Network (CNN)¹¹: The next design evaluated consisted of an EL, a CNN layer, and fully connected layers. The output of the EL is fed into a 1D Convolutional Layer (CL), which is then flattened and fed into two fully connected layers. A Max-Pooling Layer (MPL) follows the convolutional layer to control the size of its output. These settings are the result of several iterative experiments meant to produce the best possible design.
employed by the authors to train the architecture. Table 8 lists the
final learning accuracies and losses for CNN after 50 epochs, and Fig. 9
displays the CNN learning curves. CNN’s learning curves show how
the model performs when learning from training and validation sets of
data. The vertical axis shows the performance parameter, such as loss
or accuracy, while the horizontal axis reflects the number of epochs.
Fig. 9 shows an orange line for the validation loss and a blue line
for the training loss. The validation accuracy is displayed in red, and
the training accuracy is displayed in green. The accuracy is initially
poor, and both the training and validation losses are large. The model
gets better at fitting the training data as the training goes on, which
lowers the training loss and raises the training accuracy. But if the
model gets too complicated, it might begin to overfit the training
data, which would lower validation accuracy and raise validation loss.
According to Fig. 9, the model has learnt from the training data when
¹¹ It is frequently used for image and video recognition applications. A
CNN is a type of NN that processes input data using filters (kernels) to
extract features, convolutions to combine the features, and pooling to reduce
dimensionality.
Table 8
Final learning accuracies and losses of CNN.
Metrics Value
Loss 1.9127
Accuracy 0.8233
Validation loss 2.1981
Validation accuracy 0.1980
the training loss and accuracy improve over time after about 20 epochs.
The validation loss and accuracy follow at the same time but begin to settle after 30 epochs. The small gap between the training and validation losses suggests the model is not overfitting the training data.
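The 1-D convolution and max-pooling operations used in this design can be sketched in NumPy on a toy signal (as in most DL libraries, the "convolution" is really cross-correlation); the filter and input are invented for illustration.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as implemented in DL libraries)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling to reduce dimensionality."""
    trimmed = x[: len(x) // size * size]      # drop any trailing remainder
    return trimmed.reshape(-1, size).max(axis=1)

x = np.array([1.0, 3.0, 2.0, 5.0, 0.0, 4.0])
feat = conv1d(x, np.array([1.0, -1.0]))       # simple difference filter
print(feat)                                   # [-2.  1. -3.  5. -4.]
print(max_pool1d(feat))                       # [1. 5.]
```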
3.3. Generative-based models
This subsection presents the outcomes of the generative-based chatbot:
Encoder–Decoder architecture: All of the models seen so far have been retrieval-based approaches: a NN was used on numerous models to choose the best answer to the operator’s query, and the responses were encoded. Instead of choosing from a predefined response list, the authors now generate an answer from the training corpus. The encoder–decoder is a seq2seq paradigm that produces output word by word. Put simply, it takes the phrase that the operator provides and forecasts each following word based on the probability that the word will appear. Because tags are not necessary for producing predictions, the database for this design may simply be a CSV file. The ‘‘<END>’’ tag was appended to the Target Tags (TT), while the input column was left unchanged. The design uses three matrices of One-Hot Vectors (OHV): encoder input data, decoder input data, and decoder output data. The seq2seq structure uses the two decoder matrices during training to enable teacher forcing: the goal is to help the architecture predict the current target token by feeding it the input token from the previous time step. The encoder architecture requires an input layer that creates a matrix for storing OHVs and an LSTM layer with a predetermined number of hidden units. The decoder design is fundamentally comparable to the encoder architecture, except that the encoder’s state information is passed along with the decoder inputs. The decoder operates as follows:
S. Pandey and S. Sharma Healthcare Analytics 3 (2023) 100198
Fig. 9. Learning curves of CNN.
Fig. 10. Encoder–decoder model summary.
1. Obtain the output states of the encoder.
2. Give the decoder these states so that it can decode the phrase word by word.
3. After decoding each word, update the decoder’s hidden state so that previously decoded words can be used to assist in decoding new words.
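The three one-hot matrices and the teacher-forcing shift described above can be sketched as follows; the vocabulary and question–answer pairs are illustrative stand-ins, not the paper’s dataset:

```python
# Toy question-answer pairs; each answer ends with the "<END>" tag.
PAIRS = [("hi", "hello <END>"), ("bye", "goodbye there <END>")]

vocab = sorted({w for q, a in PAIRS for w in (q + " " + a).split()})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

# Encoder input: one-hot vectors for the operator's phrase.
encoder_input_data = [[one_hot(w) for w in q.split()] for q, _ in PAIRS]

# Teacher forcing: the decoder input is the full target sequence, and
# the decoder output is the same sequence shifted one step left, so at
# step t the decoder sees token t and must predict token t + 1.
decoder_input_data = [[one_hot(w) for w in a.split()] for _, a in PAIRS]
decoder_target_data = [seq[1:] for seq in decoder_input_data]
```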
In Fig. 10, the model summary is displayed. The authors used KLs to train the architecture. Table 9 lists the final learning accuracies and losses after 50 epochs, and Fig. 11 displays the learning curves for the encoder–decoder. Particular inferences may be drawn from the training and validation curves. The smaller the distance between the validation and training curves, the better the architecture fits the datasets; a notable discrepancy between the loss curves indicates that an overfit condition is emerging. A training loss that is still declining at the end of the epochs may likewise point to an architecture that has not finished learning.
Table 9
Final learning accuracies and losses of encoder–decoder.

Metrics               Value
Loss                  0.2010
Accuracy              0.9445
Validation loss       2.1554
Validation accuracy   0.7017
4. Comparative analysis
This section compares the data-file formats and the models used for the mental health chatbots.
4.1. Comparison between JSON file and CSV file
Fig. 11. Learning curves of encoder–decoder.
For the retrieval-based chatbots, the authors used a JSON file, and for the generative-based chatbot, a CSV file. Examining how a retrieval-based chatbot chooses its responses helps to explain this. A retrieval-based chatbot is trained to provide the optimal response from a collection of pre-written responses. This approach is best suited for situations where the range of possible user inputs is limited and well-defined, as it can quickly provide accurate responses without the need for extensive training data. However, it may struggle to handle more complex or open-ended interactions where the user’s input is less predictable. Each group of predetermined responses therefore carries a tag, and the output of the prediction must be a tag: the operator’s input (the context) is classified into one of the tags. Once the most appropriate tag is identified, the operator is provided with one of the pre-established responses for that tag. Due to its structured and organized design, storing such information in a JSON file is straightforward. A generative chatbot, on the other hand, ‘‘creates’’ answers from the ground up: the final prediction for the following word is generated by mapping the input, the outcome, and the prior words, and at each step the next word is selected based on the predicted probabilities and the previous words. As only the input-script and output-script columns are necessary for this data format, capturing it in a CSV file is uncomplicated, and inserting or removing data is more straightforward than in a JSON file.
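The difference between the two storage formats can be sketched with toy data; the tags, patterns, and responses below are illustrative, not taken from the authors’ files:

```python
import csv
import io
import json

# Retrieval-based chatbot data: each group of predetermined responses
# carries a tag, which is what the classifier must predict.
intents = json.loads("""
{"intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello"],
     "responses": ["Hello! How are you feeling today?"]}
]}
""")

# Generative chatbot data: only input and output text columns are
# needed, so a flat CSV file suffices.
csv_text = "input,output\r\nHi,Hello! How are you feeling today?\r\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(intents["intents"][0]["tag"])  # greeting
print(rows[0]["input"])              # Hi
```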
4.2. Model comparison
To cover the two different chatbot varieties, retrieval-based and generative-based, the authors designed six architectures and compared them. The six architectures share some features and differ in others:
1. Among the retrieval-based chatbots (LSTM, GRU, Bi-LSTM, and vanilla RNN), CNN performs excellently in terms of overfitting and accuracy. Overfit is defined by the discrepancy between validation loss and training loss, and CNN’s curves are fairly uniform compared to those of the other systems.
2. Among the RNN variants (vanilla RNN, LSTM, GRU, and Bi-LSTM), some perform better than others. GRU is a more recent technique that is computationally cheaper than LSTM. On limited training data, GRUs outperform LSTMs in training speed and implementation. GRUs are also simpler, which makes them less difficult to modify; additional gates can be incorporated into the network if more input is needed, for example.
3. Comparing LSTM and Bi-LSTM, the authors observe that the latter continues to reduce its loss during the later stages of training. This is supported by its descending loss curve, which results in a smaller gap between the validation and training loss curves.
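The computational claim in point 2 can be made concrete by counting per-layer trainable parameters; the classic gate formulations are used here (Keras’s default GRU adds a second bias term, which this sketch ignores):

```python
# An LSTM cell has 4 gate blocks, a GRU cell only 3; each block holds
# a recurrent matrix (units x units), an input matrix
# (units x input_dim), and a bias vector (units).
def lstm_params(input_dim, units):
    return 4 * (units * (units + input_dim) + units)

def gru_params(input_dim, units):
    return 3 * (units * (units + input_dim) + units)

print(lstm_params(100, 128))  # 117248
print(gru_params(100, 128))   # 87936
```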
Fig. 12 illustrates that the generative-based chatbot, which is based on the encoder–decoder model, performs better than the retrieval-based chatbots when the models are compared. It has a much higher validation accuracy than the earlier chatbots. The considerable gap between validation and training loss, together with the low training loss after training, indicates that the chatbot is still acceptable but that minimizing the validation loss is crucial. This might be a result of the chatbot’s current rudimentary condition, which prevents it from learning long output sequences. The authors suggest solving this issue by adding attention layers, which enhance forecasting by recalling only the pertinent preceding information.
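The suggested attention mechanism can be sketched minimally as dot-product attention, which scores each preceding encoder state against the current decoder state; the vectors below are toy values, not the paper’s implementation:

```python
import math

def attend(query, states):
    """Dot-product attention: weight each preceding state by its
    similarity to the current query, so the decoder recalls only the
    pertinent earlier information, then return the weighted context."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(len(query))]
    return weights, context

weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# The first state matches the query, so it receives the larger weight.
```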
5. Conclusions and future works
ML has the potential to enhance the delivery of mental health services, but the efficacy of current approaches is unclear due to a dearth of high-quality data. The initial step towards addressing this is to acquire relevant data through techniques such as topic-noise modelling. Once sufficient data has been gathered, the chatbot can be trained and validated, and the authors can then consider the trade-off between bias and variance. Instead of looking for a larger dataset, further research can give a deeper understanding of the data, and the suggested approach can be adjusted to get the best results. Users can pick between retrieval-based and generative-based chatbots. While generative-based chatbots allow for more experimentation through the addition of new layers such as transformer structures and attention layers, retrieval-based chatbots demand annotated input from a medical expert. The authors use the Keras-provided EL to ease grammatical inspection, but other pretrained word representations, such as Global Vectors for Word Representations (GloVe), can improve the model’s base layer. The generative chatbot can also be combined with the operators’ questions in the current database, an Excel file with responses from operators and bots. The ultimate objective is to build a chatbot that interacts with people in a ‘‘human-like’’ way, which might be included in a web application or a mobile application. There are limitations to this strategy, though: further research with larger sample sizes is required to establish whether the chatbot is truly helpful in enhancing well-being and reducing stress. The chatbot should be available to participants for
Fig. 12. Generative-based chatbots vs. retrieval-based chatbots.
as long as necessary, and follow-up measurements at one, three, and
six months should be included to see if benefits hold up over time. The
authors note that future studies with more advanced designs may be
required to uncover ‘‘hidden’’ cases of depression that the chatbot failed
to correctly identify and that the lack of an effective control group
increases the potential for placebo effects.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
Data will be made available on request.
... The retrieval-based model works on the principles of retrieving a response from a set of predefined responses using various techniques, such as a straightforward rule base or ML ensemble techniques, to provide the best possible response given the context [28]. While training such models on a large dataset can make them less prone to errors, they are still prone to errors due to other limitations. ...
... While training such models on a large dataset can make them less prone to errors, they are still prone to errors due to other limitations. One notable limitation of retrieval-based models is that they can only generate a response from the available options rather than generating a new text [28]. Additionally, due to its limitations, the model's responses do not sound human-like, thus making the model unsuitable for chatbots that require complex queries [28]. ...
... One notable limitation of retrieval-based models is that they can only generate a response from the available options rather than generating a new text [28]. Additionally, due to its limitations, the model's responses do not sound human-like, thus making the model unsuitable for chatbots that require complex queries [28]. ...
Article
Full-text available
With the emergence of artificial intelligence (AI), machine-learning (ML), and chatbot technologies, the field of education has been transformed drastically. The latest advancements in AI chatbots (such as ChatGPT) have proven to offer several benefits for students and educators. However, these benefits also come with inherent challenges, that can impede students’ learning and create hurdles for educators. The study aims to explore the benefits and challenges of AI chatbots in educational settings, with the goal of identifying how they can address existing barriers to learning. The paper begins by outlining the historical evolution of chatbots along with key elements that encompass the architecture of an AI chatbot. The paper then delves into the challenges and limitations associated with the integration of AI chatbots into education. The research findings from this narrative review reveal several benefits of using AI chatbots in education. AI chatbots like ChatGPT can function as virtual tutoring assistants, fostering an adaptive learning environment by aiding students with various learning activities, such as learning programming languages and foreign languages, understanding complex concepts, assisting with research activities, and providing real-time feedback. Educators can leverage such chatbots to create course content, generate assessments, evaluate student performance, and utilize them for data analysis and research. However, this technology presents significant challenges concerning data security and privacy. Additionally, ethical concerns regarding academic integrity and reliance on technology are some of the key challenges. Ultimately, AI chatbots offer endless opportunities by fostering a dynamic and interactive learning environment. However, to help students and teachers maximize the potential of this robust technology, it is essential to understand the risks, benefits, and ethical use of AI chatbots in education.
... In the context of organic chemistry, the study [30] assessed the accuracy of ChatGPT and Bard in understanding structural notations and answering related questions, revealing limitations in their ability to handle complex tasks and highlighting the need for further training and development. Additionally, the study [31] compared the accuracy of retrieval-based and generative-based chatbots in the mental healthcare domain, with retrieval-based models achieving an accuracy of 65.5% and generative-based models achieving 71.2%. These studies underscore the ongoing efforts to evaluate and enhance the accuracy of AI chatbots across various domains, paving the way for their reliable and effective integration into diverse fields. ...
... While 43 out of the 52 reviewed papers conducted performance evaluations of their AI-Chatbots [1][2][3][4][5][6][7][8][10][11][12][13][14][15][16][17][18][19][20][21][22][23]25,26,[28][29][30][31][32][33][34][37][38][39][40][42][43][44][45], this research presents a more rigorous and comprehensive evaluation, leveraging a significantly larger dataset (as shown in Table 6) and diverse deployment strategies. A user study involving 35 participants across the US, UK, and Australia, who executed 321 news queries tailored to various categories and geographic locations (Table 7), demonstrated the chatbot's superior performance, achieving precision, recall, and F1-score of 0.96, 0.99, and 0.97, respectively. ...
Article
Full-text available
This study advances AI-powered news delivery by introducing an innovative chatbot capable of providing personalized news summaries and real-time event analysis. This approach addressed a critical gap identified through a comprehensive review of 52 AI chatbot studies. Unlike prior models limited to static information retrieval or predefined interactions, this chatbot harnesses generative AI and real-time data integration to deliver a dynamic and tailored news experience. Its unique architecture combines conversational AI, robotic process automation (RPA), a comprehensive news database (989,432 reports from 2342 sources spanning 27 October 2023 to 30 September 2024), and a large language model (LLM). Within this architecture, the LLM generates dynamic queries against the news database to obtain tailored news for users: the approach interprets user intent and delivers LLM-based summaries of the fetched news. Empirical testing with 35 users across 321 diverse news queries validated its robustness in navigating a combinatorial classification space of 53,916,650 potential news categorizations, achieving an F1-score of 0.97, recall of 0.99, and precision of 0.96. Deployed on Microsoft Teams and as a standalone web app, this research lays the foundation for transformative AI applications in news analysis, promising to revolutionize news consumption and empower a more informed citizenry.
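The relationship between the reported precision (0.96), recall (0.99), and F1-score (0.97) can be checked with the standard definitions. The confusion-matrix counts below are illustrative values chosen to reproduce those figures, not the study's actual data.

```python
# Standard precision / recall / F1 from confusion-matrix counts.
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)          # of retrieved items, how many were right
    recall = tp / (tp + fn)             # of relevant items, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For example, tp=96, fp=4, fn=1 yields precision, recall, and F1 that round to 0.96, 0.99, and 0.97, consistent with the abstract's figures.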
... The field of dentistry has undergone significant transformation in recent decades, with AI-based technologies playing a critical role in this shift (14). Unlike search engines that provide general information from various sources on a specific topic, AI-powered chatbots present information in a conversational style, making it easier to understand complex subjects (15,16). Although there has been growing interest in the use of chatbots in medical research, concerns persist regarding the accuracy and reliability of the health information provided by these chatbots (17). ...
Article
Aim: This study aimed to evaluate the reliability and consistency of four artificial intelligence (AI) chatbots—ChatGPT 3.5, Google Gemini, Bing, and Claude AI—as public sources of information on the management of primary tooth trauma. Materials and Methods: A total of 31 dichotomous questions were developed based on common issues and concerns related to dental trauma, particularly those frequently raised by parents. Each question, sequentially presented to the four AI chatbots, was repeated three times daily, with a one-hour interval between repetitions, over a five-day period, to assess the reliability and reproducibility of responses. Accuracy was determined by calculating the proportion of correct responses, with 95% confidence intervals estimated using the Wald binomial method. Reliability was assessed using Fleiss’ kappa coefficient. Results: All AI chatbots demonstrated high accuracy. Bing emerged as the most accurate model, achieving an accuracy rate of 96.34%, while Claude had the lowest accuracy at 88.17%. Consistency was classified as “almost perfect” for ChatGPT, Bing, and Gemini, whereas Claude exhibited a “substantial” level of agreement. These findings underscore the relative performance of AI models in tasks requiring high accuracy and reliability. Conclusion: These results emphasize the importance of critically evaluating AI-based systems for their potential use in clinical applications. Continuous improvements and updates are essential to enhance their reliability and ensure their effectiveness as public information tools.
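The abstract names the Wald binomial method for the 95% confidence intervals around each accuracy proportion. A short sketch of that standard formula (a textbook normal approximation, not the authors' code):

```python
import math

def wald_ci(correct: int, total: int, z: float = 1.96) -> tuple:
    """Wald (normal-approximation) confidence interval for a proportion.

    z = 1.96 gives the conventional 95% interval; the interval is
    clipped to [0, 1] since a proportion cannot leave that range.
    """
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)
```

For instance, 90 correct answers out of 100 gives roughly (0.841, 0.959). Note the Wald interval is known to behave poorly for proportions near 0 or 1, which is relevant when accuracies approach the 96% reported here.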
... A mental health chatbot, "Ted the Therapist" [28], used deep learning and NLP for support but struggled with long-term dependencies in conversations. A rule-based e-commerce chatbot [29] for Covenant University relied on stored templates, leading to repetitive interactions. ...
... The use of social media has grown rapidly over the last decade, significantly impacting patterns of interaction, communication, and societal habits. Quick access, ease of communication, and the availability of various information at the click of a button are some of the main benefits of social media [1]. However, behind these benefits lies the phenomenon of social media addiction, which is increasingly widespread, particularly among teenagers and young adults. ...
Article
The rapid growth of social media has transformed interaction and communication patterns, but it has also led to the rise of social media addiction, particularly among teenagers and young adults. This addiction, marked by compulsive usage and negative impacts on mental health and daily life, necessitates effective interventions. This research explores the development and evaluation of an AI-based chatbot designed to mitigate social media addiction by employing cognitive and behavioural strategies. The study utilizes the Waterfall model—a structured, sequential approach—in the chatbot’s development, encompassing stages from needs analysis to maintenance. The chatbot’s effectiveness was assessed through rigorous testing and user feedback. The methodology included problem analysis, system design, implementation, testing, and iterative improvements. A comprehensive needs analysis identified the psychological and behavioural factors contributing to social media addiction, leading to the design of a prototype chatbot integrated with AI for dynamic content adaptation and real-time feedback. The implementation phase focused on coding and system integration, followed by rigorous testing using Black Box Testing and the System Usability Scale (SUS) to ensure functionality and user-friendliness. Results indicate that the chatbot significantly reduced social media addiction scores, with a mean decrease from 55.21 to 50.17, supported by a highly significant p-value of <0.0001. User satisfaction was high, particularly regarding ease of use and information quality. However, user engagement declined over time, highlighting the need for ongoing content updates and feature enhancements. This study contributes to the field by providing insights into the application of the Waterfall model in AI chatbot development and offers a scalable solution for addressing social media addiction, with implications for future digital interventions in mental health.
... In recent years, scholars have begun exploring automated Q&A for psychological wellness, with particular enthusiasm for generating empathetic responses [60] [61]. Many researchers have utilized Transformer-based encoder–decoder frameworks [62] or pre-trained language models [63] to generate empathetic responses under the guidance of user emotions. ...
Article
Full-text available
Psychological wellness has become an increasingly significant global health issue, with millions affected worldwide. In response to the growing demand for accessible psychological support, we propose a novel model, Psychological Snapshot Guided Pairing (PSG-Pair), designed to enhance query-response pairing in Community Question Answering for Psychological Wellness (CQA-PW). Unlike traditional methods that focus solely on conceptual-level matching, PSG-Pair integrates role-based psychological snapshots derived from the historical posts of help-seekers and supporters. The model operates in two phases: the initial screening phase, which utilizes a BERT-based retrieval model to filter relevant supportive posts, and the pairing phase, which incorporates psychological snapshots using a stacked attention mechanism to refine conceptual pairings based on the psychological characteristics of users. Extensive experiments conducted on the CLPsych 2022 Shared Task dataset demonstrate that PSG-Pair significantly outperforms traditional single-phase models, enhancing both precision and recall in pairing processes. The inclusion of psychological snapshots allows the model to better handle the complexities of psychological wellness scenarios, thereby improving the overall effectiveness of automated psychological support systems. However, this study has several limitations. Firstly, the dataset used for experiments, although rich, still suffers from data imbalance and noise due to the high proportion of irrelevant negative samples, which could potentially impact the model’s performance. Secondly, while the approach demonstrates promising results in the context of psychological wellness, the generalizability of the model to other domains or applications remains uncertain. Further exploration into the adaptability of PSG-Pair to diverse scenarios is required. 
Additionally, while the current evaluation metrics adequately reflect the retrieval and pairing capabilities, there is a need for the development of more tailored evaluation systems to assess models within the unique context of psychological wellness support. Future work should also investigate how to mitigate biases in user-generated content, as the quality and authenticity of answers in non-factual Q&A platforms can vary significantly, potentially affecting the accuracy of the pairing.
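The two-phase design described above (a cheap BERT-based screening pass followed by a richer, snapshot-aware pairing pass) follows a common retrieve-then-rerank pattern. A minimal sketch of that pattern, with toy scoring functions standing in for the BERT retriever and the stacked-attention pairer (all names here are hypothetical, not PSG-Pair's code):

```python
# Retrieve-then-rerank sketch: a cheap score prunes the candidate pool,
# then an expensive score ranks only the survivors.

def screen(query, candidates, cheap_score, k=2):
    """Phase 1: keep the top-k candidates by a cheap relevance score."""
    ranked = sorted(candidates, key=lambda c: cheap_score(query, c), reverse=True)
    return ranked[:k]

def pair(query, shortlist, rich_score):
    """Phase 2: pick the best candidate by a richer (slower) score."""
    return max(shortlist, key=lambda c: rich_score(query, c))
```

The design choice is the usual efficiency trade-off: the expensive model only ever sees the small shortlist, so the overall cost stays close to the cheap retriever's.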
Article
Full-text available
Substance abuse remains a critical public health challenge in the United States, impacting individuals, families, and communities. Behavioral therapies, a cornerstone of substance abuse treatment, have seen advancements with the integration of technology. This paper delves into the application of Generative Artificial Intelligence (Gen AI) in enhancing behavioral therapies for substance abuse treatment and recovery. Gen AI, with its ability to generate personalized, adaptive, and context-aware responses, offers new avenues for addressing the unique needs of individuals in treatment programs. The study outlines the architecture and mechanics of Gen AI algorithms, detailing their design, training methodologies, and real-world applications in therapeutic settings. It examines the integration of Gen AI in therapy delivery, including personalized virtual therapy sessions, real-time relapse prevention tools, and recovery support systems. Additionally, the paper provides step-by-step implementation guides, supported by practical code examples, flow diagrams, and illustrative figures, to demonstrate how these technologies can be deployed in treatment settings. Furthermore, the paper explores the ethical considerations, challenges, and potential limitations of using Gen AI in substance abuse treatment, such as data privacy, bias mitigation, and the need for human oversight. By combining theoretical insights with practical applications, this study aims to inform researchers, clinicians, and policymakers on leveraging Gen AI to improve treatment outcomes, enhance recovery experiences, and ultimately address the ongoing substance abuse crisis in the United States.
Article
Background This study assessed the accuracy and consistency of responses provided by six Artificial Intelligence (AI) applications, ChatGPT version 3.5 (OpenAI), ChatGPT version 4 (OpenAI), ChatGPT version 4.0 (OpenAI), Perplexity (Perplexity.AI), Gemini (Google), and Copilot (Bing), to questions related to emergency management of avulsed teeth. Materials and Methods Two pediatric dentists developed 18 true-or-false questions regarding dental avulsion and posed them to the publicly available chatbots over three days. The responses were recorded and compared with the correct answers. SPSS was used to calculate the obtained accuracies and their consistency. Results ChatGPT 4.0 achieved the highest accuracy rate of 95.6% over the entire time frame, while Perplexity (Perplexity.AI) had the lowest accuracy rate of 67.2%. ChatGPT version 4.0 (OpenAI) was the only AI that achieved perfect agreement with the correct answers, except at noon on day 1. ChatGPT version 3.5 (OpenAI) showed the weakest agreement (6 times). Conclusions With the exception of ChatGPT's paid version, 4.0, AI chatbots do not seem ready for use as the main resource in managing avulsed teeth during emergencies. It might prove beneficial to incorporate the International Association of Dental Traumatology (IADT) guidelines in chatbot databases, enhancing their accuracy and consistency.
Article
Traditional health-seeking behaviors, particularly in menstrual health, are deeply rooted in local languages, sociocultural norms, and religious beliefs. These factors significantly influence women in South Asia and other developing regions, often proving detrimental to their physical and psychological health. Recognizing the multifaceted nature of these challenges, this paper investigates the role of generative domain-specific chatbots as a supportive mechanism to enhance menstrual health-seeking behaviors among South Asian women. Specifically, it aims to create and assess Mai, a conversational agent, which utilizes Roman-Urdu and English models to provide accurate information, thereby contributing to the broader effort of empowering women with knowledge and resources. Due to a lack of conversational data in this domain, our research involves creating a dataset from existing English sources and translating it into Roman Urdu, presenting its own set of challenges. These datasets, specifically curated for this research, addressed a wide range of menstrual health topics. Despite being an English monolingual model, DialoGPT’s transformer architecture produced favorable results when fine-tuned on a Roman Urdu dataset. While the Roman Urdu version occasionally faced challenges with contextual understanding, both models achieved promising results. Automatic evaluation metrics indicated strong performance for Mai compared to baseline models. Likewise, user evaluations yielded average ratings of 4.2 and 4.3 (out of 5) for the Roman Urdu and English chatbots, respectively, and more than half of the medical professionals surveyed expressed willingness to recommend the dialogue system. Overall, our findings highlight the transformative potential of domain-specific dialogue systems in health education, with room for improvement in overcoming cultural and linguistic barriers to women’s health information access.
Article
Full-text available
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3) is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.
Article
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts to power chatbots to carry on naturalistic conversations while pursuing a given goal such as collecting self-report data from users. We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this aim, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) where participants conversed with chatbots driven by different designs of prompts, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the designs of prompts and topics significantly influenced the conversation flows and the data collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.
Chapter
Conversational modeling is an important task in natural language understanding and machine intelligence. It makes sense for natural language to become the primary way in which we interact with devices, because that is how humans communicate with each other. Thus, the possibility of having conversations with machines would make our interaction much smoother and more human-like. Natural language techniques need to evolve to match the level of power and sophistication that users expect from virtual assistants. Although previous approaches exist, they are often restricted to specific domains and require handcrafted rules. The obvious problem lies in their inability to answer questions for which the rules were not written. To overcome this problem, we build a generative neural conversation system using a deep LSTM Sequence-to-Sequence model with an attention mechanism. Our main emphasis is to build a generative open-domain chatbot that can have a meaningful conversation with humans. We train the model on Reddit conversation datasets and apply the Turing test to the proposed model. The proposed chatbot model is compared with Cleverbot and the results are presented. Keywords: Chatbot, Recurrent Neural Networks, LSTM
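The defining trait of the generative sequence-to-sequence approach above is its decoding loop: the decoder emits one token at a time, conditioned on everything emitted so far, until an end-of-sequence marker. The sketch below shows that loop with a toy lookup table standing in for a trained LSTM decoder; the table and token names are purely illustrative.

```python
# Toy greedy-decoding loop for a generative model. TOY_LM maps a token
# prefix to the next token; a real seq2seq model would instead run the
# decoder LSTM (plus attention over encoder states) at each step.

TOY_LM = {
    ("<s>",): "hello",
    ("<s>", "hello"): "there",
    ("<s>", "hello", "there"): "</s>",
}

def greedy_decode(max_len: int = 10) -> list:
    """Emit tokens one at a time until the end marker or max_len."""
    out = ["<s>"]                             # start-of-sequence token
    for _ in range(max_len):
        nxt = TOY_LM.get(tuple(out), "</s>")  # "model" predicts next token
        if nxt == "</s>":                     # end-of-sequence: stop
            break
        out.append(nxt)
    return out[1:]                            # drop the start marker
```

Because each token is generated rather than retrieved, such a model can in principle produce replies that never appeared in its training data, which is exactly the contrast with the retrieval-based designs discussed elsewhere on this page.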
Chapter
Building a personalized chatbot has drawn much attention recently. A personalized chatbot is considered to have a consistent personality. There are two types of methods to learn the personality. The first mainly models the personality from explicit user profiles (e.g., manually created persona descriptions). The second learns implicit user profiles from the user’s dialogue history, which contains rich, personalized information. However, a user’s dialogue history can be long and noisy, as it contains long-time, multi-topic historical dialogue records. Such data noise and redundancy impede the model’s ability to thoroughly and faithfully learn a consistent personality, especially when applied with models that have an input length limit (e.g., BERT). In this paper, we propose deconstructing the long and noisy dialogue history into topic-dependent segments. We only use the topically related dialogue segment as context to learn the topic-aware user personality. Specifically, we design a Topic-enhanced personalized Retrieval-based Chatbot, TopReC. It first deconstructs the dialogue history into topic-dependent dialogue segments and filters out segments irrelevant to the current query via a Heter-Merge-Reduce framework. It then measures the matching degree between the response candidates and the current query conditioned on each topic-dependent segment. We also consider the matching degree between the response candidate and the cross-topic user personality. The final matching score is obtained by combining the topic-dependent and cross-topic matching scores. Experimental results on two large datasets show that TopReC outperforms all previous state-of-the-art methods. Keywords: Personalization, Dialogue systems
Article
Binge drinking is one type of harmful alcohol use with a variety of negative health impacts on both the drinker and others, globally and in Malaysia. According to previous research, one in two current drinkers in Malaysia aged 13 years and older reported having engaged in binge drinking. Therefore, increased attention should be given to understanding the drinking pattern of an individual and proposing a solution that can help with addiction relapse. Thus, this study identified interventions that could assist alcohol relapse recovery and proposed a new generation of relapse-prevention solution based on artificial intelligence (AI). Using a deep learning approach and a machine learning based recommendation technique, it predicts the relapse rate of users and provides recovery consultation based on the user’s data and clinical data through a chatbot. This study involved data collection, advanced data modeling, and prediction analysis to support the alcohol relapse recovery journey. Hence, the proposed AI solution acts as a personalized virtual therapist to help addicts stay sober. The objective is to present the design and realization of the AI-based solution for the sobriety journey. The proposed solution was tested in a pilot study, and significant benefits of virtual therapists for alcohol addiction relapse are reported in this paper.