All content in this area was uploaded by Kim Phuc TRAN on Jul 06, 2022
Decision Support Systems for Healthcare based on
Probabilistic Graphical Models: a survey and
perspective
Ali Raza1,2, Kim Phuc Tran*1, Ludovic Koehl1, and Shujun Li2
1University of Lille, ENSAIT, GEMTEX, Lille, France,
*Corresponding author. Email: kim-phuc.tran@ensait.fr
2School of Computing & Institute of Cyber Security for Society
(iCSS), University of Kent, UK
November 21, 2021
Contents
1 Introduction
1.1 Probabilistic modeling
1.2 Applications of PGMs
2 Decision Support Systems in Healthcare
2.1 Probabilistic Graphical Models
2.2 Bayesian networks: Directed graphical models
2.3 Markov Random fields
2.4 Deep Neural Networks
2.5 Neural networks with Probabilistic Graphical Models
3 Artificial Intelligence in Healthcare Applications
4 Healthcare Decision Support Systems based on Probabilistic Graphical Models
5 Perspectives for Healthcare Decision Support Systems based on Probabilistic Graphical Models
6 Case Studies
6.1 Logistic Regression for ECG classification
6.2 Variational Autoencoder for ECG anomaly Detection
7 Conclusions
Acronyms
AI    Artificial Intelligence
NN    Neural Network
CNN   Convolutional Neural Network
DSS   Decision Support System
DNN   Deep Neural Network
MLP   Multilayer Perceptron
BN    Bayesian Network
MRF   Markov Random Field
IoT   Internet of Things
ECG   Electrocardiogram
RMSE  Root Mean Squared Error
CE    Cross Entropy
CS    Computer Science
SVM   Support Vector Machine
VAE   Variational Autoencoder
CPD   Conditional Probability Distribution
RNN   Recurrent Neural Network
PCA   Principal Component Analysis
1 Introduction
Probabilistic graphical modeling (PGM) is the branch of machine learning
that uses probability distributions to describe a given event and to make
useful predictions about it. PGMs are widely used throughout machine
learning and in many real-time applications. Such techniques can be used
to address problems in fields such as medicine, language processing,
computer vision, and many more. This combination of theory and powerful
applications makes PGMs one of the most interesting topics in the modern
era of artificial intelligence (AI) and computer science (CS). The major
advantage of probabilistic models is that they quantify the uncertainty
associated with their predictions. Such estimates of uncertainty and
confidence are extremely useful in sensitive and critical machine
learning applications, such as clinical healthcare. To understand
probabilistic models at an abstract level, consider a classification
problem with N classes. A probabilistic model provides a probability for
each of the N classes given an input, i.e., a probability distribution
over the N classes. Usually, we take the class with the highest
probability as the output class. Note that logistic regression based on
the sigmoid function can be considered an exception, as it provides the
probability of one class only. Typical examples of probabilistic models
in machine learning are logistic regression, hidden Markov models,
Bayesian classifiers, and neural networks with a softmax output (each
example is discussed in detail in the upcoming sections).
Another way to understand the difference between probabilistic and
non-probabilistic models is to compare their objective functions. For
example, in linear regression the objective function is based on the
squared error: the objective is to minimize the Mean Squared Error or
the Root Mean Squared Error (RMSE), given by equation 1.
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}, \tag{1}
\]
Here, n is the total number of data samples, y_i is the true label, and
y'_i is the predicted label of data sample i. The intuition is to
average, over all data points, the squared difference between the actual
value and the predicted value. Because this objective function is based
not on probabilities but on the difference between the actual value and
the predicted value, such models can be considered non-probabilistic.
Typical examples of non-probabilistic models are Support Vector Machines
(SVMs) and linear regression (here we are talking about a regression
problem, not a classification problem). For probabilistic models, such
as neural networks with a softmax output function, the objective
function is usually cross-entropy (binary cross-entropy in the case of a
binary classifier), given by equation 2. Here, p(y_i) is the probability
that the model assigns to the true label y_i of data sample i. The
intuition behind cross-entropy is that if the probabilistic model
predicts the true class of a data point with high confidence, the loss
is low.
\[
\mathrm{CE} = -\frac{1}{n}\sum_{i=1}^{n}\log\left(p(y_i)\right), \tag{2}
\]
Since cross-entropy is based on probabilities, such models can be
regarded as probabilistic models. Therefore, one of the easiest ways to
differentiate between probabilistic and non-probabilistic models is to
analyze the loss function of the model.
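To make the contrast between the two objective functions concrete, the following minimal sketch evaluates equations 1 and 2 with NumPy. The function names and all numbers are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error of equation 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cross_entropy(probs, labels, eps=1e-12):
    """Cross-entropy of equation 2.

    probs  : (n, N) array, one predicted class distribution per sample
    labels : (n,) array of true class indices
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # p(y_i): probability assigned to the true class of each sample
    p_true = probs[np.arange(len(labels)), labels]
    # clip to avoid log(0) for a model that assigns zero probability
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

# Hypothetical two-class predictions for two samples with true labels 0 and 1
labels = [0, 1]
confident = [[0.9, 0.1], [0.2, 0.8]]
unsure = [[0.6, 0.4], [0.45, 0.55]]

regression_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
loss_confident = cross_entropy(confident, labels)
loss_unsure = cross_entropy(unsure, labels)
```

Both sets of predictions pick the correct class, yet the more confident one receives the lower cross-entropy, which is the behavior described in the text.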
1.1 Probabilistic modeling
The simplest way to understand probabilistic modeling is to start from a
real-world model written as a mathematical equation:
\[
y = \alpha x, \tag{3}
\]
where y is the dependent variable that we want to predict, and x is the
independent variable on which y depends. For example, y may be the price
of a car, and x the features that affect the price, e.g., color, the
number of seats, the engine size, etc. We assume that y is a linear
function of x, parameterized by α. However, real-world events are very
complicated to model because they involve a certain amount of
uncertainty. Therefore, we model such events in the form of probability
distributions, represented as p(x, y), where p is the probability
distribution. The probabilistic aspect of modeling is important because
we cannot perfectly predict the future: we rarely have enough knowledge
about the event, and often the world itself is stochastic. Moreover, we
need to assess the confidence of our predictions; often, predicting a
single value is not enough, and we need the system to output its beliefs
about what is going on in the event. A full joint distribution over many
variables, however, requires a number of parameters that grows
exponentially with the number of variables. To overcome this, we can
write the probability model as a product of factors.
\[
p(y, x_1, x_2, \dots, x_n) = p(y)\prod_{i=1}^{n} p(x_i \mid y) \tag{4}
\]
A small number of parameters can be used to describe each factor
p(x_i | y). Graphs are a convenient way to represent such independence
assumptions, and the graphical representation is easy to understand. We
also use graphs to analyze the speed of learning algorithms and to
quantify the computational complexity (e.g., NP-hardness) of different
learning tasks.
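The factorization in equation 4 can be illustrated with a minimal sketch for binary variables. All probability values below are hypothetical and chosen only for illustration; with n = 3 binary features the factorized model needs 1 + 2n = 7 parameters, whereas an unrestricted joint table over the same four binary variables needs 2^4 - 1 = 15.

```python
import numpy as np

# Hypothetical parameters: y in {0, 1} and three binary features x_1..x_3.
p_y = np.array([0.7, 0.3])                 # p(y = 0), p(y = 1)
p_x1_given_y = np.array([[0.1, 0.8],       # p(x_1 = 1 | y = 0), p(x_1 = 1 | y = 1)
                         [0.3, 0.6],       # same for x_2
                         [0.5, 0.9]])      # same for x_3

def joint(y, x):
    """p(y, x_1, ..., x_n) = p(y) * prod_i p(x_i | y), as in equation 4."""
    x = np.asarray(x)
    on = p_x1_given_y[:, y]                # p(x_i = 1 | y) for every feature
    per_feature = np.where(x == 1, on, 1.0 - on)
    return p_y[y] * per_feature.prod()

def posterior(x):
    """p(y | x) obtained by normalizing the joint over y."""
    scores = np.array([joint(0, x), joint(1, x)])
    return scores / scores.sum()

belief = posterior([1, 1, 1])              # belief about y after seeing all features on
```

The output is a distribution over y rather than a single value, which is the kind of "belief" the text asks the system to report.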
1.2 Applications of PGMs
PGMs have a number of diverse real-world applications. Typical examples
include image generation, inpainting, denoising, language translation,
speech recognition, and diagnosis in clinical healthcare and medicine.
Here we provide an overview of applications in healthcare and medicine.
PGMs can assist doctors in diagnosis and prognosis. For example, in [1]
a model based on Bayesian networks (discussed later) was developed for
diagnosing pneumonia. The model was able to distinguish patients with
pneumonia from patients with other diseases with high sensitivity (0.95)
and specificity (0.965), and was used for many years in the clinic.
Figure 1 outlines the network proposed in [1].
Figure 1: Structure of Bayesian network. Figure adopted from [1]
Regarding the application of PGMs in healthcare, probabilistic methods
lie primarily in the realm of Artificial Intelligence (AI). The AI
community first encountered these methods while trying to build
computerized systems designed to perform complex tasks, such as medical
diagnosis, at an expert level. Researchers in this domain quickly
realized the need for methods that allow the integration of evidence and
information to support decision-making under uncertainty. Furthermore,
academia has recognized that Decision Support Systems (DSSs) are among
the most important computer-based information systems and play a crucial
role in supporting managers in their semi-structured or unstructured
decision-making activities. By using a predefined set of rules, a DSS
extracts knowledge from complex data and presents it at an appropriate
time. For instance, [2] claimed that information systems should exist
only to support decisions. Since then, the amount of research in the
area of DSSs has grown rapidly. Medical diagnosis is one of the most
important research subjects in medical informatics; hence, a lot of
research is being carried out on the application of DSSs in healthcare.
By adopting a proper DSS, healthcare can become more easily accessible
to remote and large populations. Furthermore, physicians can remotely
access medical records, medical test results, medical images, and
information about medication at any time [3]. Moreover, healthcare
requires the responsible management of a large amount of health-related
information. With proper modeling of this information, field experts can
continuously build a strong welfare policy. The main goal of a DSS is to
provide experts with information when it is needed. Such systems provide
knowledge, models, and data processing tools that help experts make
better and more efficient decisions in many situations. They aim to
resolve several problems in healthcare and to help patients, their
families, and clinical practitioners manage healthcare by providing
better access to these services [4].
A lot of research has been carried out on the application of DSSs in
healthcare. [5] suggests that data mining methods are promising for DSSs
in healthcare. A prototype of a self-management system that assists
patients with diabetes in tracking their blood glucose levels was
developed in [6]. [7] used a web-based DSS in emergency departments to
assess the performance features of the recommendations generated by
experts. The results show that a remote clinical decision support system
decreases the time-to-trial from decision support to clinical
interventions.
Given the importance of DSSs in healthcare, this chapter reviews
research work on healthcare DSSs based on Probabilistic Graphical Models
(PGMs) [8] and machine learning. The rest of the chapter is organized as
follows: Section 2 discusses decision support systems in healthcare.
Section 3 reviews the application of artificial intelligence in
healthcare. Section 4 discusses healthcare DSSs based on Probabilistic
Graphical Models. Section 5 provides perspectives for healthcare DSSs
based on Probabilistic Graphical Models. Section 6 provides case studies
of DSSs in healthcare. Section 7 concludes the chapter.
2 Decision Support Systems in Healthcare
Decision support systems (DSSs) in healthcare are intended to assist
physicians and other health practitioners in decision-making tasks. A
healthcare DSS can also be defined as a computerized algorithm that uses
data from a number of patients to generate case-specific or
encounter-specific advice [9]. Such systems have been studied and
explored extensively in the healthcare industry. They link observations
with health knowledge to help health practitioners determine health
options for improved healthcare. The main idea of a healthcare DSS is a
set of rules, derived from medical professionals, applied to dynamic
knowledge. Data mining is well suited to provide decision support for
healthcare. Several probabilistic classification techniques are
available for healthcare decision support systems, and various
techniques are being used for differential diagnosis. Health decision
support systems use a number of soft computing techniques to derive
useful information from data repositories, human knowledge, and
literature to support decision-making across operational and clinical
healthcare processes.
DSSs are an important part of modern healthcare organizations. They
assist patients, practitioners, and healthcare stakeholders by providing
patient-centered information and expert health knowledge [10]. To
improve the efficiency and quality of healthcare, healthcare
decision-making uses the knowledge obtained from smart decision systems.
For example, automated DSSs for ECG are available in primary health care
units and hospitals to meet the increasing demand for prognosis in the
domain of heart diseases. Many studies have used healthcare DSSs to
promote individualized cardiovascular prevention [11; 12; 13]. DSSs
provide timely information at the point of care to inform patient care
decisions. The use cases of decision support systems can be summarized
as follows:
1. Clinical Management: DSSs can alert healthcare practitioners to reach
out to patients who have not followed management schedules or are
due for a follow-up, and help identify patients eligible for research
based on specific criteria [14].
2. Diagnosis Support: DSSs for healthcare diagnosis, known as diagnostic
decision support systems (DDSSs), have traditionally provided
computerized support: they are given an input (data or user
selections) and output a list of possible diagnoses [15; 16; 17].
Moreover, the healthcare industry generates a large amount of data.
Consequently, DSSs are used extensively to capture and transfer
information. In this section, we therefore briefly review various
classification techniques for healthcare decision support systems. The
section summarizes historical and state-of-the-art decision support
systems in healthcare, analyzes the success factors needed for
widespread deployment, and postulates future trends of the field in the
context of a new decision management paradigm.
2.1 Probabilistic Graphical Models
PGMs were developed in the 1980s by researchers working in mathematics,
artificial intelligence, and economics with the purpose of solving
complex problems that could not be solved by the methods existing at the
time. PGMs are a rich framework for encoding probability distributions
over complex domains: joint (multivariate) distributions over large
numbers of random variables that interact with each other. These
representations lie at the intersection of statistics and computer
science, drawing on concepts from probability theory, graph algorithms,
machine learning, and more. They are the basis for state-of-the-art
methods in a number of applications, such as healthcare, image
processing, speech, and natural language processing. They are also the
building blocks in the formulation of many machine learning problems.
PGMs make it possible to deal with problems that were not solvable with
traditional probabilistic methods or other artificial intelligence
techniques.
Two representations are commonly used: (1) Bayesian networks and (2)
Markov random fields, as shown in Figure 2.
Figure 2: Types of PGMs
Depending on whether the graph is directed or undirected, we can classify
graphical models into Bayesian networks and Markov networks, respectively.
Both types have the properties of factorization and independence.
However, they differ in the sets of independencies they can encode and
in the factorization of the distribution that they induce. Each type is
discussed below.
2.2 Bayesian networks: Directed graphical models
A Bayesian network is a knowledge-based graphical representation [18; 19]
that depicts a set of variables and their probabilistic relationships,
for example between diseases and the corresponding symptoms. In other
words, Bayesian networks represent probability distributions that can be
calculated as a product of local conditional probability distributions.
Let I(p) denote the set of all independencies that hold for a joint
distribution p. For example, if p(x, y) = p(x)p(y), we say that
x ⊥ y ∈ I(p).
A Bayesian network can describe many of the independencies in I(p); such
independencies can be read off the directed graph. For example, a Bayes
net G with three nodes A, B, and C can have essentially three different
structures with different independence assumptions, as shown in
Figure 3.
Figure 3: Bayesian networks over three variables, encoding different types
of dependencies
These structures describe the independencies in a three-variable Bayesian
net. We can extend them to general networks by applying them recursively
over any larger graph. This leads to a notion called d-separation (where
d stands for directed).
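As a minimal numerical sketch of how a directed structure encodes independencies, consider a chain A → B → C. This assumes the chain is one of the three-node structures meant above, and the CPD values are hypothetical. In such a chain, A and C are dependent in general, but become independent once B is observed.

```python
import numpy as np

# Hypothetical CPDs for binary variables in the chain A -> B -> C.
p_a = np.array([0.6, 0.4])                   # p(A)
p_b_given_a = np.array([[0.9, 0.1],          # rows: a, columns: b
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.7, 0.3],          # rows: b, columns: c
                        [0.1, 0.9]])

# joint_abc[a, b, c] = p(a) p(b | a) p(c | b)
joint_abc = (p_a[:, None, None]
             * p_b_given_a[:, :, None]
             * p_c_given_b[None, :, :])

def independent(table, tol=1e-9):
    """True if a 2-D table p(x, y) equals p(x) p(y) after normalization."""
    table = table / table.sum()
    product = np.outer(table.sum(axis=1), table.sum(axis=0))
    return bool(np.allclose(table, product, atol=tol))

# Marginally, A and C are dependent ...
a_indep_c = independent(joint_abc.sum(axis=1))
# ... but conditioned on either value of B they are independent.
a_indep_c_given_b = all(independent(joint_abc[:, b, :]) for b in range(2))
```

With these numbers `a_indep_c` is false and `a_indep_c_given_b` is true, i.e., A ⊥ C | B belongs to I(p) while A ⊥ C does not.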
Bayesian networks are used to find the probability of possible diseases
given their symptoms. These networks encode the knowledge and
conclusions of domain experts in the form of probabilities. However,
this approach is not viable for large, complex systems with multiple
symptoms. To understand Bayesian networks, let us consider a canonical
example of a researcher network. The graph, shown in Figure 4, consists
of four variables, which are given as follows.
Figure 4: Bayesian networks: Directed graphical models
1. Difficulty: Takes values 0 and 1 for minimum and maximum difficulty,
respectively.
2. Intelligence: Takes values 0 and 1 for not intelligent and intelligent,
respectively.
3. Research output: Takes values 1, 2, and 3 for good, average, and bad
research, respectively.
4. Research articles: Shows the number of research articles published.
The edges in the graph show the dependencies. The Research Output of a
researcher depends on the Difficulty of the research area and the
Intelligence of the researcher. The Research Output, in turn, determines
whether the researcher publishes a good, average, or bad number of
publications. Note that the direction of the arrows shows the
cause-effect relationships: Difficulty affects the Research Output, but
the Research Output does not influence the Difficulty. Finally, let us
look at the tables associated with each of the nodes. Formally, these
are called conditional probability distributions (CPDs), as shown in
Figure 5. The CPDs for Difficulty and Intelligence are easy to specify,
because these variables do not depend on any other variable. Their
tables simply encode the probabilities of these variables taking 0 or 1
as values. The values in each row must sum to 1.
Figure 5: Bayesian network with conditional probability distributions
(CPDs)
Next, let us look at the CPD for Research Output. Each row corresponds
to a pair of values that its parents (Difficulty and Intelligence) can
take, and each column corresponds to a value that Research Output can
take. Each cell holds the conditional probability
p(ResearchOutput = RS | Intelligence = I, Difficulty = D), that is, the
probability of Research Output being RS given the values of Intelligence
and Difficulty. For example,
p(ResearchOutput = RS1 | Difficulty = D1, Intelligence = I1) is 0.5: if
the intelligence of the researcher and the difficulty of the research
area are both high, the probability that the research output is good is
0.5. The CPD for “Research Articles” is easy to understand with the
above knowledge. Because it has one parent, its conditional
probabilities are of the form
p(ResearchArticles = RA | ResearchOutput = RS), that is, the probability
of “Research Articles” being RA given that the value of “Research
Output” is RS. Each row now corresponds to a single value of “Research
Output”. Again, the row values add up to 1. An essential requirement for
Bayesian networks is that the graph must be a directed acyclic graph
(DAG).
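The researcher network can be written down as a minimal sketch in which the joint distribution is the product of the four CPDs. Only the entry p(ResearchOutput = good | Difficulty = 1, Intelligence = 1) = 0.5 comes from the text; every other number, and the choice of three levels for Research Articles, is a hypothetical assumption because the CPD tables of Figure 5 are not reproduced here.

```python
import numpy as np

p_d = np.array([0.6, 0.4])        # p(Difficulty): index 0 = minimum, 1 = maximum
p_i = np.array([0.7, 0.3])        # p(Intelligence): index 0 = not intelligent, 1 = intelligent

# p(ResearchOutput | Difficulty, Intelligence), indexed [d, i, rs]
# rs = 0, 1, 2 stands for good, average, bad research.
p_rs = np.array([
    [[0.30, 0.40, 0.30], [0.90, 0.08, 0.02]],    # d = 0: i = 0, i = 1
    [[0.05, 0.25, 0.70], [0.50, 0.30, 0.20]],    # d = 1: i = 0, i = 1 (0.5 is from the text)
])

# p(ResearchArticles | ResearchOutput), indexed [rs, ra]
# ra = 0, 1, 2 stands for a good, average, bad number of articles (assumed levels).
p_ra = np.array([
    [0.80, 0.15, 0.05],
    [0.30, 0.50, 0.20],
    [0.05, 0.25, 0.70],
])

# Every row of every CPD sums to 1, as required.
rows_ok = (np.allclose(p_rs.sum(axis=-1), 1.0)
           and np.allclose(p_ra.sum(axis=-1), 1.0))

# joint[d, i, rs, ra] = p(d) p(i) p(rs | d, i) p(ra | rs)
joint = (p_d[:, None, None, None]
         * p_i[None, :, None, None]
         * p_rs[:, :, :, None]
         * p_ra[None, None, :, :])

# Inference by enumeration: p(Intelligence = 1 | ResearchArticles = good)
p_intelligent_given_good_ra = joint[:, 1, :, 0].sum() / joint[:, :, :, 0].sum()
```

Enumeration over the full joint table is only practical for toy networks like this one; it is used here because it follows directly from the factorization.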
In the literature, [20] designed a clinical decision support system to
help general practitioners assess the need for orthodontic treatment in
patients with permanent dentition. In particular, a Bayesian network
(BN) is used as the underlying model for assessing the need for
orthodontic treatment. Data sets of around one thousand patients with
permanent dentition, chosen from a hospital record system, were
prepared, in which each data element represented one participant with
information on all variables and the stated need for orthodontic
treatment. The proposed system provided promising results: it showed a
high classification accuracy in classifying patients into groups needing
and not needing orthodontic treatment.
2.3 Markov Random fields
Bayesian networks can compactly represent many interesting probability
distributions. Nevertheless, some distributions have independence
assumptions that cannot be well represented by the structure of a
Bayesian network. To address such cases, there exists another technique
for compactly representing and visualizing a probability distribution,
based on the language of undirected graphs, called Markov random fields
[21; 22]. Let us take a motivating example. Suppose that we are modeling
voting preferences among persons W, X, Y, and Z. Let us say that (W, X),
(X, Y), (Y, Z), and (Z, W) are relatives, and that relatives have
similar voting preferences. These relationships can be naturally
depicted by an undirected graph, as shown in Figure 6.
Figure 6: Undirected graphical representation of a joint probability of voting
preferences over four individuals. Colors illustrate the pairwise preferences
present in the model
Scheme  Method  Use Case
[27]    MRF     Vertebral Tumor Prediction
[28]    MRF     Tumor segmentation and gene-expression based classification
[29]    MRF     Segmented MRI-based partial volume correction in PET
[30]    MRF     Unsupervised 4D myocardium segmentation
[31]    MRF     EMR-based medical knowledge representation and inference
[32]    BN      Identifying Risk Factors of Depression in Middle-Aged Persons
[33]    BN      ECG or Heart rate monitoring
[34]    BN      Human activity recognition
Table 1: Applications of MRF and BN in healthcare
A Markov Random Field (MRF) is a probability distribution p over
variables x_1, ..., x_n defined by an undirected graph G in which nodes
represent the variables x_i. The probability p is given by
\[
p(x_1, \dots, x_n) = \frac{1}{Z}\prod_{c \in C}\beta_c(x_c),
\]
where C denotes the set of cliques of G, and each factor β_c is a
non-negative function over the variables in a clique. The partition
function
\[
Z = \sum_{x_1, \dots, x_n}\prod_{c \in C}\beta_c(x_c)
\]
is a normalizing constant that ensures that the distribution sums to
one. Hence, given a graph G, the probability distribution may contain
factors whose scope is any clique in G, which can be a single node, an
edge, etc. It is important to note that there is no need to specify a
factor for each clique. In the example above, a factor is defined over
each edge (which is a clique of two nodes). However, no factors over
single-node cliques have been specified.
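The voting example and the partition function can be sketched by brute-force enumeration. The factor values (10 when two relatives agree, 1 otherwise) are a hypothetical choice that merely expresses "relatives have similar voting preferences"; the name `partition` is used for the normalizing constant to avoid confusion with person Z.

```python
import itertools
import numpy as np

people = ["W", "X", "Y", "Z"]
edges = [("W", "X"), ("X", "Y"), ("Y", "Z"), ("Z", "W")]   # pairs of relatives

def beta(vote_a, vote_b):
    """Hypothetical pairwise factor: agreement is weighted 10, disagreement 1."""
    return 10.0 if vote_a == vote_b else 1.0

def unnormalized(assignment):
    """Product of the edge factors for one joint assignment of votes."""
    return float(np.prod([beta(assignment[a], assignment[b]) for a, b in edges]))

# All 2^4 joint assignments of binary votes.
assignments = [dict(zip(people, votes))
               for votes in itertools.product([0, 1], repeat=len(people))]

# Partition function: sum of the unnormalized scores over all assignments.
partition = sum(unnormalized(a) for a in assignments)

def probability(assignment):
    return unnormalized(assignment) / partition

p_all_agree = probability({"W": 1, "X": 1, "Y": 1, "Z": 1})
p_split = probability({"W": 1, "X": 0, "Y": 1, "Z": 0})
```

Enumerating all assignments costs time exponential in the number of variables, which is why computing the partition function is the hard part of working with MRFs in general.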
Regarding applications of Bayesian networks and Markov random fields,
[23] presents a multi-threshold method based on the Firefly Algorithm
and Shannon Entropy (FA+SE) to enhance pneumonia lesions and applies
Markov random field segmentation to identify the lesions with better
accuracy. [24] developed a system based on a Bayesian network that uses
Bayesian reasoning to compute posterior probabilities of possible
diagnoses given the symptoms. This system was developed for diagnosis in
internal medicine and now covers about 1500 diagnoses in this domain,
based on thousands of findings. [25] proposed a system called DXplain,
which uses a modified form of Bayesian networks. It generates a ranked
list of diagnoses associated with the given symptoms. It is particularly
useful for healthcare practitioners who lack computer expertise, and it
is also used as a reference with a searchable database of diseases and
clinical manifestations. Furthermore, SimulConsult [26] uses Bayesian
networks to input data in a scalable fashion and compute probabilities,
accomplishing this by focusing specialty by specialty. It uses a
statistical pattern-matching method that takes into account the onset
and offset of the findings in each disease. Table 1 presents a summary
of a few past developments and applications of BNs and MRFs in the
healthcare sector.
2.4 Deep Neural Networks
Deep Neural Networks (DNNs) [35; 36] are non-knowledge-based decision
support systems that are adaptive in nature. They learn from existing
knowledge and experience (data). A typical workflow of neural networks
in healthcare is shown in Figure 7.
Figure 7: Major phases in the development of machine learning (ML) based
healthcare systems. Figure adopted from [37]
The architecture of a neural network mainly consists of three types of
layers: input, hidden, and output layers. These networks are made of
nodes called neurons. Weights and biases define the connections between
nodes of different layers and are used to propagate the input through
the network. A neural network is able to work with incomplete data by
making educated guesses about missing data, and it improves with
adaptive learning. Autoencoders [38] are a method for training in an
unsupervised fashion. An autoencoder is a type of NN that learns to
reconstruct its input at its output, thereby learning features of a
dataset, typically of lower dimension. It has an internal representation
layer that describes a code used to represent the input, and it is made
up of an encoder that maps the input into the latent space and a decoder
that maps the latent space to the reconstructed input. Many improvements
have been made to the architecture and training algorithms of NNs so
that they can learn without any unsupervised pretraining, for example
the use of the ReLU activation f(z) = max(z, 0), which learns more
efficiently in a multi-layer model. A typical NN is depicted in
Figure 8.
Figure 8: Deep Neural Network (DNN) architecture
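The encoder/decoder structure of an autoencoder, the ReLU activation f(z) = max(z, 0), and a softmax output can be sketched as a forward pass in NumPy. The layer sizes are hypothetical and the weights are random and untrained, so the sketch shows only how the input is propagated, not a learned model.

```python
import numpy as np

def relu(z):
    """ReLU activation f(z) = max(z, 0), applied element-wise."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Softmax over the last axis, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_inputs, n_latent = 6, 2                      # hypothetical layer sizes

# Weights and biases connect the layers (random here, normally learned).
w_enc = rng.normal(scale=0.5, size=(n_inputs, n_latent))
b_enc = np.zeros(n_latent)
w_dec = rng.normal(scale=0.5, size=(n_latent, n_inputs))
b_dec = np.zeros(n_inputs)

def encode(x):
    """Encoder: maps the input into the lower-dimensional latent space."""
    return relu(x @ w_enc + b_enc)

def decode(code):
    """Decoder: maps the latent code back to a reconstruction of the input."""
    return code @ w_dec + b_dec

x = rng.normal(size=(4, n_inputs))             # four hypothetical input samples
code = encode(x)
reconstruction = decode(code)
reconstruction_error = np.mean((x - reconstruction) ** 2, axis=1)

# A softmax output layer turns raw scores into a distribution over classes.
class_probs = softmax(np.array([[2.0, 1.0, 0.1]]))
```

Training would adjust the weights to minimize `reconstruction_error`; that step is omitted to keep the sketch short.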
In the DNN shown in Figure 8, the convolutional layers together with the
max-pooling layers are used for feature extraction, and the dense layer
is used for classification. The output dense layer often uses a sigmoid
function for binary classification and a softmax function for multiclass
classification. Deep neural networks have been used in several
applications, such as image classification, computer vision, activity
recognition, and deep reinforcement learning. For example, [39] proposed
a healthcare decision support system based on a Jordan/Elman neural
network for the diagnosis of epilepsy. The proposed system obtained a
comparatively high overall training accuracy of 99.83%, and a
cross-validation and testing accuracy of 99.92%. [40] proposed a
neural-network-based decision support system that classifies
heart-related diseases into 5 categories with 97.5% accuracy, using a
multilayer perceptron trained with the backpropagation algorithm. In
[41], researchers proposed a decision support system using an artificial
neural network to classify the fetal delivery method as normal or
surgical. They used three different algorithms to train the neural
network: radial basis function, backpropagation, and learning vector
quantization networks, obtaining accuracies of 99%, 93.75%, and 87.5%,
respectively. Researchers have proposed a large number of methods for
applying decision support systems in healthcare. For example, [42]
explains the role played by a decision support system, a computer-based
system that aids the decision-making process, in ensuring the correct
diagnosis of an illness. [43] describes a medical decision support
system for the mediastinal staging of non-small cell lung cancer, called
Mediastinet. Table 2 presents a summary of a few works on the
applications of neural networks in healthcare.
Scheme  Method  Use Case
[44]    MLP     Diagnosing diabetes
[45]    RNN     Clinical intervention prediction and understanding
[46]    NN      Emotion Recognition for Healthcare Surveillance
[47]    CNN     ECG biometric recognition
[48]    RNN     ECG signal denoising
[49]    DNN     ECG-based cardiac arrest pulse detection
[50]    CNN     ECG Classification
[51]    CNN     ECG Arrhythmias detection
Table 2: Applications of Neural networks in healthcare
2.5 Neural networks with Probabilistic Graphical Models
There are certain limitations and challenges for NNs.
1. Interpretability: DNNs are like black boxes; hence it is often
complex and difficult to explain the reasons behind their predictions
and decisions. Such explanations are important for some applications,
such as medical prognosis and diagnosis [52].
2. Uncertainty measure: DNNs cannot provide a quantification of the
uncertainty of their decisions and outputs.
3. Robustness: similar to the first point, it is hard to know which
aspects of the input they use to make decisions.
Probabilistic graphical models (PGMs) can help to overcome these
shortcomings, so there is an opportunity to combine the two approaches
and take advantage of their complementary strengths. For example, PGMs
provide a practical way to represent dependence relationships between
variables and spatial relations [53; 54], while DNNs outperform other
approaches in classification. One way to integrate PGMs and DNNs is
therefore to represent the structure of a complex problem through a PGM
and to use DNNs as classifiers for the different elements of the
underlying problem. The DNNs, trained on labeled data, provide initial
estimates; these estimates can then be combined and improved through
belief propagation in the graphical model. This approach also allows
efficient training of the model, as each DNN only considers a particular
dataset. Spatial analysis problems are one class of problems in which
such hybrid systems can be useful. An instance is human activity
estimation, in which body parts have a certain spatial structure; this
structure provides constraints that can be used by a graphical model.
The spatial constraints between the distinct elements in the model can
be represented in terms of a Markov network, which expresses the
constraints as local joint probabilities of neighboring elements. These
elements are detected and classified using a DNN. Another class of such
problems is temporal modeling, in which the output changes and evolves
over time, usually depending on the previous state, as in time series.
Markov chains and hidden Markov models are often used to represent such
problems. In the hybrid system, DNNs can be used to classify the state
based on the observation, and the Markov model can be used for encoding
the temporal relations. Applications of such systems include human
activity recognition and speech recognition. A toy example of such a
hybrid system is shown in Figure 9. Variational Autoencoders (VAEs) are
another such example. Owing to their increasing popularity in anomaly
detection, VAEs have been used in various fields, such as healthcare
[55] and cybersecurity [56], and new applications are constantly being
discovered. One important use of VAEs in healthcare is anomaly
detection. The idea is to train the VAE on normal data and record the
corresponding reconstruction error. When the VAE is presented with
anomalous data, the reconstruction error is usually high. Hence, data
with a reconstruction error higher than that of the normal data are
considered anomalous. Owing to the growing utility of VAEs in anomaly
detection, we discuss a case study on VAE-based anomaly detection in
healthcare in more depth in Section 6.2.
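The reconstruction-error rule described above can be sketched without training a VAE. The sketch below deliberately swaps in a linear encoder/decoder obtained from an SVD (a PCA projection) as a stand-in for the VAE, and uses synthetic data; only the thresholding logic corresponds to the text. The threshold choice (the largest error seen on normal training data) is one possible reading of "more than that of the normal data".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data: 8-dimensional samples close to a 2-D subspace.
mixing = rng.normal(size=(2, 8))
normal = rng.normal(size=(200, 2)) @ mixing + 0.01 * rng.normal(size=(200, 8))
# Synthetic anomalies: samples that do not follow that structure.
anomalies = 3.0 * rng.normal(size=(5, 8))

# Stand-in for the trained VAE: a linear encoder/decoder fitted on normal data.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]                      # basis of the 2-D latent space

def reconstruction_error(x):
    """Mean squared error between each sample and its reconstruction."""
    code = (x - mean) @ components.T     # encode into the latent space
    recon = code @ components + mean     # decode back to the input space
    return np.mean((x - recon) ** 2, axis=1)

# Record the reconstruction error on normal data and use its maximum as threshold.
threshold = reconstruction_error(normal).max()

def is_anomalous(x):
    return reconstruction_error(x) > threshold

flags_normal = is_anomalous(normal)
flags_anomalies = is_anomalous(anomalies)
```

With a real VAE, `reconstruction_error` would call the trained encoder and decoder instead; a percentile of the normal errors is a common, less strict alternative to the maximum.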
Figure 9: A hybrid DNN classifier and a hidden Markov model architecture
Researchers have proposed such hybrid architectures for human activity
recognition. For example, in [57] an architecture for the recognition of
human posture in video sequences was developed. The proposed model
consists of a Convolutional Neural Network (CNN)-based detector and a
Hidden Markov Model (CoHMM). The integration of both models allows
learning spatial and temporal dependencies. The detector recognizes the
different joints using a CNN and exploits the spatial correlations
between neighboring regions through a Conditional Random Field (CRF)
[58], whereas the CoHMM computes the best possible movement sequence
among interacting processes.
Further research that combines deep neural networks and graphical
models, such as conditional random fields as recurrent neural networks,
has been summarized in59.
3 Artificial Intelligence in Healthcare Applications
In the last decade, Artificial intelligence (AI) has revolutionized the health
care system60;61. AI is bringing a paradigm shift to healthcare, powered
by the increasing availability of healthcare data and rapid progress of an-
alytic techniques. AI can be applied to various types of healthcare data
(structured and unstructured) to help practitioners diagnose the underlying
health issue in the early stages with more accuracy and efficiency. In this
section, we review the current AI applications in healthcare and also discuss
their future applications. The usefulness and the advantages of AI have been
extensively discussed in62;63;64. Machine learning (ML) constructs data
analysis algorithms to extract features from data. The inputs to an ML
algorithm include patient "traits" and sometimes medical outcomes of
interest. A patient's traits commonly include baseline data, such as age,
gender, disease history, X-ray images, and electrocardiograms (ECG). They
may also include disease-specific data, such as gene expressions, EP
tests, physical examination results, clinical symptoms, and medication.
Depending on the outcomes and the input data, ML algorithms can
be classified into two major categories: unsupervised learning and
supervised learning. Unsupervised learning (UL) is a type of algorithm
that learns patterns from untagged data. In unsupervised learning, the
machine is forced, through mimicry, to build a compact internal
representation of the underlying traits of the data. In supervised
learning (SL), the data is tagged or labeled by a human, e.g., as "car"
or "fish". Unsupervised learning is well known for feature extraction,
while supervised learning is suitable for predictive modeling because it
builds relationships between the input and the outcome of interest.
Recently, a hybrid of unsupervised and supervised learning, known as
semi-supervised learning, has been proposed; it is suitable for scenarios
where the outcome is missing for certain subjects. Clustering and
principal component analysis (PCA) are two well-known
and extensively used unsupervised learning techniques. Clustering groups
data points with similar features together into clusters, without using
the labeled outcome information. Clustering algorithms output a cluster
label for each input data point by maximizing the similarity of the data
within clusters and minimizing it between clusters. Popular clustering
algorithms include k-means, hierarchical, and Gaussian mixture
clustering. PCA, on the other hand, plays a key role in the dimension
reduction of complex data, especially when the recorded data has many
dimensions, for example, the number of genes in a genome-wide
association study. PCA works by projecting data onto a few principal
component (PC)
directions, without losing too much information about the underlying data.
It is sometimes recommended to first apply PCA to high-dimensional data
and then cluster the reduced data, which can make the clustering more
efficient.
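The PCA-then-cluster recipe can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production implementation: the first principal direction is found by power iteration on the covariance matrix, and the projected scores are clustered with k-means. The patient measurements and the function names are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Top principal direction via power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # project every centered row onto the principal direction
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means (Lloyd's algorithm) on scalar values."""
    ordered = sorted(values)
    # deterministic initialization: spread the centers over the sorted values
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Six hypothetical patients, three measurements each.
patients = [
    [1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.9, 2.2, 0.6],
    [5.0, 6.1, 0.5], [5.2, 5.8, 0.4], [4.9, 6.0, 0.6],
]
direction, scores = first_principal_component(patients)
cluster_labels, cluster_centers = kmeans_1d(scores, k=2)
print(cluster_labels)
```

Here the third measurement carries almost no variance, so a single principal component is enough to separate the two groups before clustering.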
Supervised learning considers the subjects' outcomes together with
their features, and goes through a training process to determine the
outputs, associated with the inputs, that are on average closest to the
observed outcomes. Usually, the output formulation varies and depends on
the underlying method and the problem being solved. For example, the
outcome can be the probability of a particular clinical event, the
expected value of a disease level, or the expected survival duration.
The application of artificial intelligence in healthcare is well
studied in the literature65. For example, the Internet of Medical Things
(IoMT) integrates healthcare devices, sensors, and machine learning
algorithms to provide new applications in healthcare66. Machines based on
artificial intelligence can support healthcare by providing continuous
automatic monitoring and by notifying the healthcare provider or clinical
practitioners through an alert system. Moreover, these devices can also
help in decision-making through DSSs. One of the major advantages of this
transformation is the transition of tasks from manual, tedious, and
time-consuming methodologies to smart, automatic, and time-efficient
systems in healthcare. Additionally, these systems help clinical
practitioners to attend to patients
in emergency cases by providing timely information. DNNs have often
performed strongly in healthcare, and hybrid architectures and blended
concepts such as convolutional neural networks (CNNs) enable new
healthcare solutions. Because healthcare data is so varied, including
clinical data and human activity recognition (HAR) data, it is difficult
for humans to interpret the data for decision making. Accordingly, ML
has been used in healthcare for a better understanding of the data and a
better decision-making process67. For example,68 proposed a CNN-based
classifier architecture for a healthcare case study on ECG
classification, and69 proposed ML algorithms for the early prediction of
pathological complete response (PCR) to neoadjuvant chemotherapy and of
the survival outcome of breast cancer patients, using Multiparametric
Magnetic Resonance Imaging (mpMRI) data and eight different deep neural
network-based classifiers. In this regard, decision-making is
incorporated at the edge, so that notifications are sent to the user
when a disease is detected. This gives timely information for
decision-making at the initial stage of healthcare monitoring and
improves the healthcare system.
In conclusion, we believe that AI has an important role to play in
healthcare in the future. In the form of machine learning, it plays a
primary role in the development of precision medicine and healthcare
solutions. Although early efforts at providing prognosis, diagnosis, and
treatment recommendations have proven challenging, we expect that AI will
ultimately cope with these challenges as well. Given the fast research
advances in AI for imaging
analysis, it is likely that most radiology and pathology images will be
examined at some point by machines using machine learning models. For
now, automatic speech and text recognition systems are already employed
for tasks such as patient communication and clinical notes, and the usage
of such systems is continuously increasing.
One of the greatest challenges to AI applications in healthcare is to
ensure their adoption in daily clinical practice. For adoption to take
place, AI systems must be approved by regulators and standardized to the
extent that similar products work in a similar way. Such challenges will
ultimately be overcome, but it will take time for the technologies
themselves to become practical enough. As a result, we expect limited use
of AI in clinical practice in the coming decade, although rapidly
improving research may bring such systems into real-life use sooner. For
more interesting works on the application of artificial intelligence in
decision support systems, we recommend that readers consult70;71.
4 Healthcare Decision Support Systems based on
Probabilistic Graphical Models
Decision support systems (DSSs)72;73;74 have been widely used in the field
of healthcare for assisting physicians and other healthcare professionals with
decision-making tasks, for example, for analyzing patient data75;76;77;78.
DSSs are mainly based on two mainstream approaches: knowledge-based
and nonknowledge-based. A knowledge-based DSS consists of two principal
components: the knowledge database and the inference engine. The
knowledge database contains the rules and associations of compiled data,
which often take the form of if–then rules, whereas the inference engine
combines the rules from the knowledge database with the real patients'
data in order to generate new knowledge and to propose a set of suitable
actions. Different methodologies have been proposed for designing
healthcare knowledge databases and inference engines, such as the
ontological representation of information79. Nonknowledge-based DSSs have
no direct clinical knowledge about a particular healthcare process;
instead, they learn clinical rules from past experience and by finding
patterns in clinical data. For example, machine learning algorithms such
as decision trees represent methodologies for learning healthcare and
clinical knowledge. Both of these approaches could be used in conjunction
with ambient intelligence (AmI) technologies.
Indeed, the sensitive, adaptive, and unobtrusive nature of AmI is
particularly suitable for designing decision support systems capable of
supporting medical staff in critical decisions. In particular, AmI
technology enables the design of the third generation of telecare
systems. The first generation consisted of panic-alarm gadgets, often
worn as pendants or around the wrist, that allow a person to summon help
in the case of a fall or another kind of
health emergency. The second generation of telecare systems uses sensors
to automatically detect situations where assistance or medical decisions
are needed. Finally, the third generation represents AmI-based systems
that move away from the simple reactive approach and adopt a proactive
strategy capable of anticipating emergency situations. As a result, DSSs
could be used with multimodal sensing and wearable computing technologies
to constantly monitor all important signs of a patient and to analyze
such data in order to make real-time decisions and support people in a
timely manner. Finally, DSSs are used jointly with the AmI paradigm to
enhance communication among health personnel such as doctors and nurses.
For example, in80 researchers introduced a DSS based on context-aware
knowledge modeling, aimed at facilitating communication and improving
decision-making capability among clinical practitioners at different
locations.
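The split between a knowledge database and an inference engine in a knowledge-based DSS can be illustrated with a minimal Python sketch. The rule names, vital-sign fields, and thresholds below are hypothetical and chosen purely for illustration; they are not clinical guidance and are not taken from any of the cited systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # the "if" part
    action: str                                    # the "then" part


# Knowledge database: hypothetical if-then rules with made-up thresholds.
KNOWLEDGE_BASE: List[Rule] = [
    Rule("tachycardia", lambda p: p["heart_rate"] > 100, "flag elevated heart rate"),
    Rule("fever", lambda p: p["temperature_c"] >= 38.0, "flag fever"),
    Rule("low_spo2", lambda p: p["spo2"] < 92, "alert clinician: low oxygen saturation"),
]


def infer(patient: Dict[str, float], rules: List[Rule] = KNOWLEDGE_BASE) -> List[str]:
    """Inference engine: apply every rule to the patient record and collect the actions."""
    return [rule.action for rule in rules if rule.condition(patient)]


patient_record = {"heart_rate": 112, "temperature_c": 37.2, "spo2": 90}
proposed_actions = infer(patient_record)
print(proposed_actions)
```

Keeping the rules as data, separate from the function that applies them, mirrors the design described above: the knowledge database can be extended without touching the inference engine.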
5 Perspectives for Healthcare Decision Support
Systems based on Probabilistic Graphical Mod-
els
Probabilistic models such as deep neural networks have become popular
in medical applications, especially as healthcare support for
computer-aided diagnosis and prognosis. Although such probabilistic
models provide promising results and attract attention in healthcare
research, their real-life implementation is not easy. Firstly, there are
no clear regulations: current regulations lack standards to measure the
safety and efficacy of probabilistic models. To address such issues, the
US FDA has provided guidance for assessing systems based on probabilistic
models81. It classifies probabilistic models as general wellness
products, which are loosely regulated as long as the models are intended
for general wellness and pose a low risk to users. It also provides
guidance for adaptive design in healthcare trials. Secondly, since
healthcare data is highly sensitive, exchanging it among geographically
distant parties raises privacy and security challenges. Moreover, the
data should be protected in accordance with the GDPR. Techniques such as
encryption, differential privacy, and federated learning can be applied
to provide security and privacy for the data. However, such techniques
come with a trade-off between privacy, security, and accuracy. Here,
accuracy is a broad term covering metrics such as precision, recall, and
F1-score. Another hurdle in the implementation of probabilistic systems
in healthcare is data ownership and incentives. Currently, there are no
clearly defined regulations for the ownership of data. Moreover, the
current healthcare environment does not provide incentives to data owners
for sharing data on the system. Nevertheless, research is underway to
stimulate data sharing. The research is oriented toward changing the
health service
payment systems. Many payers, such as insurance companies, have shifted
from rewarding physicians for treatment volume to rewarding them for
treatment outcomes. Additionally, payers also reimburse a medication or
a treatment procedure based on its efficiency. Under such an environment,
all the parties in the healthcare system, including clinical physicians,
pharmaceutical companies, and patients, have more incentives to compile
and exchange information.
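The trade-off between privacy and accuracy mentioned above can be made concrete with a small sketch of the Laplace mechanism from differential privacy. This is an illustrative example under stated assumptions, not a vetted privacy implementation: the heart-rate readings, the clipping bounds, and the function names are hypothetical, and a smaller privacy parameter epsilon (stronger privacy) yields a noisier released mean.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of values clipped to [lower, upper] with Laplace noise."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Changing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


heart_rates = [62, 71, 80, 95, 58, 110, 76, 88]  # hypothetical readings (bpm)
exact_mean = sum(heart_rates) / len(heart_rates)


def mean_abs_error(epsilon, trials=500, seed=0):
    """Average absolute error of the released mean over repeated releases."""
    rng = random.Random(seed)
    errors = [abs(private_mean(heart_rates, 40, 200, epsilon, rng) - exact_mean)
              for _ in range(trials)]
    return sum(errors) / trials


for eps in (0.1, 1.0, 10.0):
    print("epsilon =", eps, "mean absolute error =", round(mean_abs_error(eps), 3))
```

The printed errors shrink as epsilon grows, which is the accuracy side of the trade-off: weaker privacy guarantees permit more accurate statistics.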
Other than the regulations, the key challenges and perspectives for the
implementation of probabilistic systems in healthcare include those intrinsic
to the science of machine learning, logistical difficulties in implementation,
and consideration of the barriers to adoption as well as of the necessary
socio-cultural or pathway changes. Robust peer-reviewed clinical evaluation
as part of randomized controlled trials should be developed as a standard
for evidence generation, but in practice, it may not always be appropriate
or feasible. Performance evaluations should aim to capture real clinical
applicability and be interpretable and understandable to the intended
users. Research on regulation that assesses innovations with the
potential for harm, alongside thoughtful post-market surveillance, is
needed to ensure that patients are not exposed to dangerous health and
financial risks. Methodologies should be developed to enable direct
comparisons of probabilistic models, including the use of independent,
local, and baseline test datasets.
Research and development of probabilistic algorithms must be vigilant to
potential dangers, including dataset shift, accidental fitting of confounders,
unintended bias, the issues of generalization to new datasets, and the unin-
tended bad consequences of new algorithms on health outcomes.
In summary, the key future perspectives on the implementation of
probabilistic models in healthcare are as follows.
1. The data should be regulated properly by clear regulatory policies.
Proper mechanisms should be developed to ensure the security
and privacy of data in accordance with the GDPR. For example,
techniques such as federated learning82 can address such issues, but
open issues in federated learning, such as data heterogeneity and
privacy leakage, need to be addressed83;84.
2. Proper metrics should be developed to measure the risk and unin-
tended harm to users by probabilistic models for healthcare and clin-
ical practice.
3. Proper interpretable guidance and mechanisms should be developed
to understand the results of probabilistic models for healthcare and
clinical practice. For example, explainable artificial intelligence can
be used to interpret the result of deep neural networks51 .
4. Proper incentive mechanisms should be developed to reward the data
owners85.
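To illustrate the federated learning idea referred to in the first item, the sketch below shows only the server-side aggregation step in the style of federated averaging: each site trains locally and shares model parameters rather than patient records, and the server combines them weighted by local dataset size. The parameter vectors and dataset sizes are hypothetical, and local training, secure communication, and the heterogeneity and leakage issues noted above are out of scope.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Three hypothetical hospitals, each holding a locally trained two-parameter model.
hospital_weights = [[0.2, 1.0], [0.4, 0.0], [1.0, 2.0]]
hospital_sizes = [100, 300, 600]

global_weights = federated_average(hospital_weights, hospital_sizes)
print(global_weights)
```

Weighting by dataset size means that the largest site dominates the global model, which is one reason data heterogeneity across sites matters in practice.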
6 Case Studies
Discriminative and generative models are widely used machine learning
models for ECG classification in healthcare. For example, logistic
regression and support vector machines are popular discriminative models,
while variational autoencoders (VAEs) and autoencoders are examples of
generative models. In
this section, we provide a case study to explore the discriminative model’s
graphical structure as PGMs, using logistic regression as an example for
ECG classification. We also provide a case study to explore the generative
model’s graphical structure as PGMs, using VAEs as an example for ECG
anomaly detection.
6.1 Logistic Regression for ECG classification
Suppose that we are solving a classification problem to decide whether an
ECG signal is benign or not. We have a joint model over labels Y = y and
features X = (x1, ..., xn). The joint distribution of the model is
represented as p(Y, X) = P(y, x1, ..., xn). Our aim is to estimate the
probability of a benign ECG signal, P(Y = 1|X). To get the conditional
probability P(Y|X), discriminative models assume a functional form for
P(Y|X) and estimate the parameters of P(Y|X) directly from training data.
Figure 10: Directed graphical model
In Figure 10, the circles represent variable(s) and the arrow indicates
what probabilities can be inferred. In our example, X is the ECG signal
and Y is the unknown class of the ECG signal. We see that the arrow is
pointing from X to Y, indicating that we can infer P(Y|X) directly from
the given X.
Figure 11: Graphical representation of input and output relationship
Figure 11 represents the probability distribution of the model when the
feature X is expanded. We can see that each feature xi depends on all the
previous features. This has no effect, as the model simply treats X as
given and estimates P(Y|X). As mentioned earlier, the model
estimates the probability from the training data:
$$P(Y \mid X) = p(y \mid x_1, \ldots, x_n) \tag{5}$$
In logistic regression, we parameterize the probability as
$$P(Y = 1 \mid X; \beta) = \frac{1}{1 + \exp\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)} \tag{6}$$
Here, maximum likelihood estimation is used to estimate the parameters,
followed by classification into benign and not benign ECG.
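The estimation step can be sketched as follows. This is a minimal illustration that assumes the parameterization of Equation 6 exactly as written (so a larger linear score means a lower probability of a benign signal) and maximizes the log-likelihood by plain batch gradient ascent. The two features per ECG signal and their values are hypothetical, and the function names are our own.

```python
import math


def prob_benign(beta, x):
    """Equation (6): P(Y = 1 | X; beta) = 1 / (1 + exp(beta_0 + sum_i beta_i * x_i))."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp so that exp() cannot overflow
    return 1.0 / (1.0 + math.exp(z))


def fit(features, labels, lr=1.0, epochs=5000):
    """Maximum likelihood estimation by batch gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, y in zip(features, labels):
            # With the sign convention of Equation (6), d logL / d z = p - y.
            residual = prob_benign(beta, x) - y
            grad[0] += residual
            for i, xi in enumerate(x):
                grad[i + 1] += residual * xi
        # ascent step on the mean log-likelihood
        beta = [b + lr * g / len(features) for b, g in zip(beta, grad)]
    return beta


# Hypothetical two-feature summaries of ECG signals; label 1 = benign.
features = [[0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2],
            [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.6, 0.9]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

beta = fit(features, labels)
predictions = [1 if prob_benign(beta, x) >= 0.5 else 0 for x in features]
print(predictions)
```

Classification then follows by thresholding the estimated probability, here at 0.5. Many references write Equation 6 with a negated exponent, which corresponds to flipping the sign of every coefficient.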
6.2 Variational Autoencoder for ECG anomaly Detection
In this section, we present a very influential probabilistic model, the
variational autoencoder (VAE), as a case study. VAEs are a type of
generative deep learning method that learns latent representations.
Figure 12 shows a typical structure of a VAE. VAEs have also been used to
draw images and to achieve state-of-the-art results in semi-supervised
learning for healthcare.
Figure 12: The Structure of the Variational Autoencoder
VAEs consist of two main parts, an encoder and a decoder. The encoder
models E(Z|X), where Z is the latent representation and X is the
data; E(Z|X) is the function that maps the data to the latent variables.
The decoder function D(X|Z) learns to generate new data using the latent
variables. It should be noted that in VAEs, unlike in autoencoders, the
distribution of Z is forced to be as close to a normal distribution as
possible. With VAEs, a parametric distribution over the latent space is
thus obtained. Hence, at run time, we can draw new samples from the
normal distribution and feed them to the decoder function to generate
new data, as depicted in Figure 14. The main difference between
traditional autoencoders and variational autoencoders is that the former
have no continuous latent space, while the latter have a continuous
latent space (a sample is mapped to a probability distribution with a
certain mean and variance). Figure 13 depicts a comparison between the
mapping of input data to latent space by an autoencoder and by a
variational autoencoder.
Figure 13: (a) mapping of an input to latent space by autoencoder, (b)
mapping of an input to latent space by variational autoencoder.
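The difference shown in Figure 13 can be illustrated with a short sketch. An autoencoder would map an input to a single latent point, whereas a VAE encoder outputs a mean and a variance from which a latent vector is sampled. The code below uses the standard reparameterization z = mean + sigma * eps, a technique commonly used with VAEs although it is not described in the text above; the encoder outputs are hypothetical numbers rather than the result of a trained network.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Draw z ~ N(mean, exp(log_var)) per dimension as z = mean + sigma * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


rng = random.Random(0)

# Hypothetical encoder output for one ECG beat: a 2-D mean and log-variance.
latent_mean = [1.5, -0.5]
latent_log_var = [math.log(0.04), math.log(0.25)]  # standard deviations 0.2 and 0.5

draws = [sample_latent(latent_mean, latent_log_var, rng) for _ in range(5000)]
empirical_mean = [sum(d[j] for d in draws) / len(draws) for j in range(2)]
print(empirical_mean)
```

Repeated draws for the same input scatter around the encoder mean, which is what makes the latent space continuous rather than a set of isolated points.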
The main objective of this case study is to use the VAE to learn the
latent representations of the data. We will use the VAE to map the data
to a latent representation. Thereafter, we will visualize the features to
see whether the model has generalized enough to learn the data clustering
or to differentiate data as normal or not normal. Note that we do not use
labels because VAEs are unsupervised machine learning approaches. To show
the applicability, we will use the ECG healthcare dataset for anomaly
detection and visualize the features that the model has learned. The
reason for using a VAE is to avoid labeling the data, as labeling can be
a tedious task. In this case study, we trained a convolutional VAE for
ECG anomaly detection. We trained the VAE on normal (a particular
distribution) ECG signals so that, when not normal (a different
distribution) ECG signals are fed into the VAE, the reconstruction loss is
expected to be higher than a certain threshold. The threshold is usually
the reconstruction error of the VAE for normal data. If the
reconstruction error for certain data crosses the threshold, we regard
that data point as not normal. In other words, via the reconstruction
loss, we can keep track of whether an ECG signal belongs to a particular
distribution or not.
The VAE is optimized over two losses: the KL loss and the reconstruction
loss (the difference between the input ECG and the reconstructed ECG).
The KL loss measures the difference between the distribution of the
latent space and a standard Gaussian with mean zero and standard
deviation one. In other words, the KL loss is used to minimize the
distance between the distributions of distinct classes while keeping them
separable. The KL loss, or KL divergence, between two distributions A and
B can be calculated as the negative sum, over all events, of the
probability of each event in A multiplied by the log of the probability
of that event in B over its probability in A, as given by Equation 7.
$$KL(A \,\|\, B) = -\sum_{x \in X} A(x) \log\left(\frac{B(x)}{A(x)}\right) \tag{7}$$
Here, || denotes the divergence of A from B. Minimizing this term
compresses the distribution of the latent space toward the standard
normal distribution, which helps the decoder to map from every area of
the latent space when decoding the input ECG signal. Figure 14 shows a
graphical representation of the VAE for new sample generation.
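Equation 7 can be checked numerically with a few lines of Python. The first function below implements the discrete sum of Equation 7 directly. The second uses the well-known closed form for the divergence between a one-dimensional Gaussian and the standard Gaussian, which is the form the KL term typically takes when the latent distribution is Gaussian; whether the case study used exactly this form is an assumption. The example distributions are hypothetical.

```python
import math


def kl_divergence(a, b):
    """Equation (7): KL(A || B) = -sum_x A(x) * log(B(x) / A(x)).

    Events with A(x) = 0 contribute nothing to the sum and are skipped.
    """
    return -sum(pa * math.log(pb / pa) for pa, pb in zip(a, b) if pa > 0)


def kl_to_standard_normal(mean, std):
    """Closed-form KL( N(mean, std^2) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (std ** 2 + mean ** 2 - 1.0) - math.log(std)


uniform = [0.5, 0.5]
skewed = [0.9, 0.1]

print(kl_divergence(uniform, skewed))
print(kl_divergence(skewed, uniform))
print(kl_to_standard_normal(1.5, 0.2))
```

The two printed divergences between the same pair of distributions differ, which shows that KL divergence is not symmetric, and both functions return zero when the two distributions coincide.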
Figure 14: VAE as a graphical model and its use to generate new samples
We used the public baseline ECG dataset86 to train and test our VAE
for anomaly detection. Figure 15 shows a scatter plot of the latent space
generated by the encoder for the test dataset, after training for 50
epochs with SGD optimization. The color of each point reflects its
associated reconstruction error: each data point that has crossed the
error threshold is marked as an anomaly in dark violet, and each normal
data point is marked in yellow. We can clearly see one large cluster of
points that appear normal (yellow dots), with a dark-colored cluster with
a relatively high error on the sides. It should be noted that this VAE
was just a toy example without any hyper-parameter tuning. The
performance can be enhanced by optimizing the hyper-parameters and
adjusting the layer structure of the VAE.
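The thresholding step behind Figure 15 can be sketched independently of the VAE itself. The code below assumes that reconstruction errors have already been computed, derives a threshold from the errors on normal training data, and flags test points that exceed it. The rule of mean plus three standard deviations and all error values are illustrative assumptions; the text above only states that the threshold is derived from the reconstruction error on normal data.

```python
import statistics


def fit_threshold(normal_errors, n_std=3.0):
    """Threshold = mean + n_std * standard deviation of the errors on normal data."""
    return statistics.mean(normal_errors) + n_std * statistics.pstdev(normal_errors)


def flag_anomalies(errors, threshold):
    """Mark every sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors (for example, mean squared error per ECG beat).
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.012, 0.011]
test_errors = [0.011, 0.045, 0.012, 0.090]

threshold = fit_threshold(normal_errors)
flags = flag_anomalies(test_errors, threshold)
print(round(threshold, 4), flags)
```

A stricter or looser rule (for instance a high percentile of the normal errors) trades missed anomalies against false alarms, so the choice should be validated on held-out data.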
Figure 15: Anomaly detection using VAE
VAEs are widely used for a variety of machine learning tasks. This case
study provided a simple practical example that can be used to prototype
and test VAEs in healthcare applications. The applicability of VAEs
can be enhanced by using explainable artificial intelligence (XAI). While
reconstructing a sample in anomaly detection, methods such as class
activation maps can be utilized to tap the neurons that fire for a given
input, and by applying max-pooling over the activation maps we can
generate a spatial saliency map that shows the regions of the input
signal that contribute most to the output signal. For a reconstruction
that is marked as an anomaly, the regions with high values in the
gradient mapping will be the main contributors. Hence, the region that
causes the anomalous behaviour can be traced. The example shown in
Figure 16 was introduced in51 to trace back the regions that are
responsible for predicting a particular class. The regions in red are
responsible for the classification of the ECG signal into the output
class (the color-weighted scale shows the contribution of each region).
Similar approaches
can be adopted in anomaly detection to trace and explain the regions which
cause the anomalous behaviour.
Figure 16: Explanation of the regions responsible for a particular class
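As a rough illustration of how anomalous regions might be localized, the sketch below uses occlusion sensitivity, a simpler, model-agnostic technique than the class activation maps discussed above: each window of the signal is replaced by the signal mean, and the change in the anomaly score is recorded as the saliency of that window. The scoring function is a hypothetical stand-in for a trained model's reconstruction error, and the signal is synthetic.

```python
def occlusion_saliency(signal, score_fn, window=4):
    """Saliency of each sample = score change when its window is replaced by the mean."""
    baseline = score_fn(signal)
    fill = sum(signal) / len(signal)
    saliency = [0.0] * len(signal)
    for start in range(0, len(signal), window):
        end = min(start + window, len(signal))
        occluded = list(signal)
        occluded[start:end] = [fill] * (end - start)
        delta = abs(baseline - score_fn(occluded))
        for i in range(start, end):
            saliency[i] = delta
    return saliency


def toy_anomaly_score(signal):
    """Hypothetical stand-in for a model's score: the largest jump between samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Synthetic signal: flat, with one spike at sample 9.
ecg_like = [0.0] * 8 + [0.0, 3.0, 0.0, 0.0] + [0.0] * 8

saliency = occlusion_saliency(ecg_like, toy_anomaly_score)
most_salient = max(range(len(saliency)), key=lambda i: saliency[i])
print(most_salient, saliency)
```

Only the window containing the spike receives a non-zero saliency here, which is the kind of localization that the saliency maps in Figure 16 provide for a trained classifier.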
To summarize, traditional PGMs work well with discrete variables,
whereas neural network based PGMs extend their abilities to continuous,
high-dimensional data. Generative Adversarial Networks (GANs),
Variational Autoencoders (VAEs), and Deep Belief Networks are well-studied
examples of such PGMs that are common in practice. Hence, we explored the
graphical structure of discriminative and generative models as PGMs to
find the hidden boundary that separates ECG signals into two different
classes, i.e., benign and not benign. A similar approach can be used with
other DNNs as well.
7 Conclusions
In this chapter, we discussed different decision support systems (DSSs)
based on probabilistic graphical models and machine learning. DSSs have
proven to be a useful tool. They help to reduce prescription errors and
support prognosis, with a higher capacity than previously used methods.
DSSs have been shown to help healthcare practitioners and providers in a
variety of decision-making and diagnosis tasks, and they now actively and
efficiently support the provision of quality healthcare services.
Moreover, they have proved useful in the standardization of protocols,
adjustments with a target, and warning systems. We noticed that both DSSs
based on classical PGMs and DSSs based on advanced machine learning
methods are extensively used in healthcare. However, PGMs have fallen
somewhat out of favor due to the ubiquity of probabilistic methods such
as neural networks. Nevertheless, we believe they still have the
potential to be relevant in the future because of their explanatory and
intuitive nature. They can be used for modeling causal relationships and
can be useful in learning the representation of abstract or high-level
concepts. As we saw in this chapter, combining neural networks with
graphical models can be very useful in the domain of machine learning,
especially in healthcare DSSs.
Meanwhile, we must take extra measures and precautions, and carry out
careful analysis, when creating, implementing, and maintaining DSSs. In
this regard, complete solutions will be required in practice, especially
as DSSs continue to evolve in complexity through advances in AI,
interoperability, interpretability, and new sources of data.
References
[1] D. Aronsky, P. J. Haug, Diagnosing community-acquired pneumonia
with a bayesian network., in: Proceedings of the AMIA Symposium,
American Medical Informatics Association, p. 632.
[2] G. A. Gorry, M. S. Scott Morton, A framework for management infor-
mation systems (1971).
[3] J. Horsky, J. Aarts, L. Verheul, D. L. Seger, H. van der Sijs, D. W.
Bates, Clinical reasoning in the context of active decision support dur-
ing medication prescribing, International journal of medical informatics
97 (2017) 1–11.
[4] M. W. L. Moreira, J. J. P. C. Rodrigues, V. Korotaev, J. Al-Muhtadi,
N. Kumar, A comprehensive review on smart decision support systems
for health care, IEEE Systems Journal 13 (2019) 3536–3545.
[5] D. Feinleib, Big data bootcamp: What managers need to know to profit
from the big data revolution, Apress, 2014.
[6] A. Sunyaev, D. Chornyi, Supporting chronic disease care quality: de-
sign and implementation of a health service and its integration with
electronic health records, Journal of Data and Information Quality
(JDIQ) 3 (2012) 1–21.
[7] H. S. Goldberg, M. D. Paterno, R. W. Grundmeier, B. H. Rocha, J. M.
Hoffman, E. Tham, M. Swietlik, M. H. Schaeffer, D. Pabbathi, S. J.
Deakyne, et al., Use of a remote clinical decision support service for a
multicenter trial to implement prediction rules for children with minor
blunt head trauma, International journal of medical informatics 87
(2016) 101–110.
[8] D. Koller, N. Friedman, Probabilistic graphical models: principles and
techniques, MIT press, 2009.
[9] J. Wyatt, J. Liu, Basic concepts in medical informatics, Journal of
Epidemiology & Community Health 56 (2002) 808–812.
[10] S. B. Clauser, E. H. Wagner, E. J. A. Bowles, L. Tuzzio, S. M. Greene,
Improving modern cancer care through information technology, Amer-
ican journal of preventive medicine 40 (2011) S198–S207.
[11] J. Chiang, J. Furler, D. Boyle, M. Clark, J.-A. Manski-Nankervis, et al.,
Electronic clinical decision support tool for the evaluation of cardiovas-
cular risk in general practice: a pilot study, Australian family physician
46 (2017) 764.
[12] P. A. Williams, R. D. Furberg, J. E. Bagwell, K. A. LaBresh, Usability
testing and adaptation of the pediatric cardiovascular risk reduction
clinical decision support tool, JMIR human factors 3 (2016) e17.
[13] A. A. Montgomery, T. Fahey, T. J. Peters, C. MacIntosh, D. J. Sharp,
Evaluation of computer based clinical decision support system and risk
chart for management of hypertension in primary care: randomised
controlled trial, Bmj 320 (2000) 686–690.
[14] R. T. Sutton, D. Pincock, D. C. Baumgart, D. C. Sadowski, R. N. Fe-
dorak, K. I. Kroeker, An overview of clinical decision support systems:
benefits, risks, and strategies for success, NPJ digital medicine 3 (2020)
1–10.
[15] E. S. Berner, T. J. La Lande, Overview of clinical decision support
systems, in: Clinical decision support systems, Springer, 2007, pp.
3–22.
[16] S. Belciug, F. Gorunescu, Intelligent Decision Support Systems-A Jour-
ney to Smarter Healthcare, Springer, 2020.
[17] C. Schaarup, L. B. Pape-Haugaard, O. K. Hejlesen, et al., Models used
in clinical decision support systems supporting healthcare professionals
treating chronic wounds: systematic literature review, JMIR diabetes
3 (2018) e8316.
[18] J. Pearl, Bayesian networks (2011).
[19] D. Heckerman, A tutorial on learning with bayesian networks, Innova-
tions in Bayesian networks (2008) 33–82.
[20] B. Thanathornwong, Bayesian-based decision support system for as-
sessing the needs for orthodontic treatment, Healthcare informatics
research 24 (2018) 22.
[21] G. R. Cross, A. K. Jain, Markov random field texture models, IEEE
Transactions on Pattern Analysis and Machine Intelligence (1983) 25–
39.
[22] C. Wang, N. Komodakis, N. Paragios, Markov random field modeling,
inference & learning in computer vision & image understanding: A
survey, Computer Vision and Image Understanding 117 (2013) 1610–
1627.
[23] V. Rajinikanth, S. Kadry, K. P. Thanaraj, K. Kamalanand, S. Seo,
Firefly-algorithm supported scheme to detect covid-19 lesion in lung ct
scan images using shannon entropy and markov-random-field, arXiv
preprint arXiv:2004.09239 (2020).
[24] I. , Decision support system, ????
[25] G. O. Barnett, J. J. Cimino, J. A. Hupp, E. P. Hoffer, Dxplain: an
evolving diagnostic decision-support system, Jama 258 (1987) 67–74.
[26] S. , A simultaneous consult on your patient’s diagnosis,, ????
[27] A. Alsiddiky, W. Awwad, K. Bakarman, H. Fouad, N. M. Mahmoud,
Magnetic resonance imaging evaluation of vertebral tumor prediction
using hierarchical hidden markov random field model on internet of
medical things (iomt) platform, Measurement 159 (2020) 107772.
[28] A. B. Ashraf, S. C. Gavenonis, D. Daye, C. Mies, M. A. Rosen, D. Kon-
tos, A multichannel markov random field framework for tumor segmen-
tation with an application to classification of gene expression-based
breast cancer recurrence risk, IEEE transactions on medical imaging
32 (2012) 637–648.
[29] A. Bousse, S. Pedemonte, B. A. Thomas, K. Erlandsson, S. Ourselin,
S. Arridge, B. F. Hutton, Markov random field and gaussian mixture
for segmented mri-based partial volume correction in pet, Physics in
Medicine & Biology 57 (2012) 6681.
[30] L. Cordero-Grande, G. Vegas-Sánchez-Ferrero, P. Casaseca-de-la-Higuera,
J. A. San-Román-Calvar, A. Revilla-Orodea, M. Martín-Fernández,
C. Alberola-López, Unsupervised 4d myocardium segmentation with a
markov random field based deformable model, Medical image analysis 15
(2011) 283–301.
[31] C. Zhao, J. Jiang, Y. Guan, X. Guo, B. He, Emr-based medical knowl-
edge representation and inference via markov random fields and dis-
tributed representation learning, Artificial intelligence in medicine 87
(2018) 49–59.
[32] F. J. Costello, C. Kim, C. M. Kang, K. C. Lee, Identifying high-
risk factors of depression in middle-aged persons with a novel sons and
spouses bayesian network model 8 (2020) 562.
[33] P. Gupta, B. Bhowmick, A. Pal, MOMBAT: Heart rate monitoring from
face video using pulse modeling and Bayesian tracking, Computers in
Biology and Medicine 121 (2020) 103813.
[34] Z. Zhou, H. Yu, H. Shi, Human activity recognition based on improved
Bayesian convolution network to analyze health care data using
wearable IoT device, IEEE Access 8 (2020) 86411–86418.
[35] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed,
H. Arshad, State-of-the-art in artificial neural network applications: A
survey, Heliyon 4 (2018) e00938.
[36] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F. E. Alsaadi, A survey of
deep neural network architectures and their applications, Neurocom-
puting 234 (2017) 11–26.
[37] A. Qayyum, J. Qadir, M. Bilal, A. Al-Fuqaha, Secure and robust ma-
chine learning for healthcare: A survey, IEEE Reviews in Biomedical
Engineering 14 (2021) 156–180.
[38] I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press,
Cambridge, 2016.
[39] P. Kharat, S. Dudul, Clinical decision support system based on
Jordan/Elman neural networks, in: 2011 IEEE Recent Advances in
Intelligent Computational Systems, IEEE, pp. 255–259.
[40] M. Gudadhe, K. Wankhade, S. Dongre, Decision support system for
heart disease based on support vector machine and artificial neural
network, in: 2010 International Conference on Computer and Commu-
nication Technology (ICCCT), IEEE, pp. 741–745.
[41] R. Janghel, A. Shukla, R. Tiwari, P. Tiwari, Clinical decision support
system for fetal delivery using artificial neural network, in: 2009 Inter-
national Conference on New Trends in Information and Service Science,
IEEE, pp. 1070–1075.
[42] K. Rajalakshmi, S. C. Mohan, S. D. Babu, Decision support system in
healthcare industry, International Journal of computer applications 26
(2011) 42–44.
[43] M. Luque Gallego, Probabilistic graphical models for decision making
in medicine (2009).
[44] O. Karan, C. Bayraktar, H. Gümüşkaya, B. Karlık, Diagnosing diabetes
using neural networks on small mobile devices, Expert Systems with
Applications 39 (2012) 54–60.
[45] H. Suresh, N. Hunt, A. Johnson, L. A. Celi, P. Szolovits, M. Ghassemi,
Clinical intervention prediction and understanding with deep neural
networks (2017) 322–337.
[46] M. Dhuheir, A. Albaseer, E. Baccour, A. Erbad, M. Abdallah,
M. Hamdi, Emotion recognition for healthcare surveillance systems
using neural networks: A survey, in: 2021 International Wireless Com-
munications and Mobile Computing (IWCMC), IEEE, pp. 681–687.
[47] R. D. Labati, E. Muñoz, V. Piuri, R. Sassi, F. Scotti, Deep-ECG:
Convolutional neural networks for ECG biometric recognition, Pattern
Recognition Letters 126 (2019) 78–85.
[48] K. Antczak, Deep recurrent neural networks for ECG signal denoising,
arXiv preprint arXiv:1807.11551 (2018).
[49] A. Elola, E. Aramendi, U. Irusta, A. Picón, E. Alonso, P. Owens,
A. Idris, Deep neural networks for ECG-based pulse detection during
out-of-hospital cardiac arrest, Entropy 21 (2019) 305.
[50] Y. Wu, F. Yang, Y. Liu, X. Zha, S. Yuan, A comparison of 1-D and 2-D
deep convolutional neural networks in ECG classification, arXiv preprint
arXiv:1810.07088 (2018).
[51] A. Raza, K. P. Tran, L. Koehl, S. Li, Designing ECG monitoring
healthcare system with federated transfer learning and explainable AI,
arXiv preprint arXiv:2105.12497 (2021).
[52] F. K. Došilović, M. Brčić, N. Hlupić, Explainable artificial intelligence:
A survey, in: 2018 41st International Convention on Information and
Communication Technology, Electronics and Microelectronics (MIPRO),
IEEE, pp. 0210–0215.
[53] M. J. Johnson, D. Duvenaud, A. B. Wiltschko, S. R. Datta, R. P.
Adams, Structured VAEs: Composing probabilistic graphical models
and variational autoencoders, arXiv preprint arXiv:1603.06277 (2016).
[54] F. Le, M. Srivatsa, K. K. Reddy, K. Roy, Using graphical models as
explanations in deep neural networks (2019) 283–289.
[55] T. Fernando, H. Gammulle, S. Denman, S. Sridharan, C. Fookes,
Deep learning for medical anomaly detection–a survey, arXiv preprint
arXiv:2012.02364 (2020).
[56] T. T. Huong, T. P. Bac, D. M. Long, T. D. Luong, N. M. Dan, B. D.
Thang, K. P. Tran, et al., Detecting cyberattacks using anomaly de-
tection in industrial control systems: A federated learning approach,
Computers in Industry 132 (2021) 103509.
[57] R. B. Zebadúa, Human body pose tracking based on spatio-temporal
joints dependency learning (2018).
[58] X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, X. Wang, Multi-
context attention for human pose estimation, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp.
1831–1840.
[59] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du,
C. Huang, P. H. Torr, Conditional random fields as recurrent neural
networks, in: Proceedings of the IEEE international conference on
computer vision, pp. 1529–1537.
[60] A. Panesar, Machine learning and AI for healthcare, Springer, 2019.
[61] R. Shah, A. Chircu, IoT and AI in healthcare: A systematic literature
review, Issues in Information Systems 19 (2018).
[62] S. E. Dilsizian, E. L. Siegel, Artificial intelligence in medicine and
cardiac imaging: harnessing big data and advanced computing to pro-
vide personalized medical diagnosis and treatment, Current cardiology
reports 16 (2014) 441.
[63] V. L. Patel, E. H. Shortliffe, M. Stefanelli, P. Szolovits, M. R. Berthold,
R. Bellazzi, A. Abu-Hanna, The coming of age of artificial intelligence
in medicine, Artificial intelligence in medicine 46 (2009) 5–17.
[64] S. Jha, E. J. Topol, Adapting to artificial intelligence: radiologists and
pathologists as information specialists, JAMA 316 (2016) 2353–2354.
[65] A. Qayyum, J. Qadir, M. Bilal, A. Al-Fuqaha, Secure and ro-
bust machine learning for healthcare: A survey, arXiv preprint
arXiv:2001.08103 (2020).
[66] S. Durga, R. Nag, E. Daniel, Survey on machine learning and deep
learning algorithms used in Internet of Things (IoT) healthcare, in: 2019
3rd International Conference on Computing Methodologies and
Communication (ICCMC), pp. 1018–1022.
[67] K. R. Bisaso, G. T. Anguzu, S. A. Karungi, A. Kiragga, B. Castelnuovo,
A survey of machine learning applications in HIV clinical research and
care, Computers in Biology and Medicine 91 (2017) 366–371.
[68] I. Azimi, J. Takalo-Mattila, A. Anzanpour, A. M. Rahmani,
J.-P. Soininen, P. Liljeberg, Empowering healthcare IoT systems with
hierarchical edge-based deep learning, in: Proceedings of the 2018
IEEE/ACM International Conference on Connected Health: Applications,
Systems and Engineering Technologies, pp. 63–68.
[69] A. Tahmassebi, G. J. Wengert, T. H. Helbich, Z. Bago-Horvath,
S. Alaei, R. Bartsch, P. Dubsky, P. Baltzer, P. Clauser, P. Kapetas,
et al., Impact of machine learning with multiparametric magnetic res-
onance imaging of the breast for early prediction of response to neoad-
juvant chemotherapy and survival outcomes in breast cancer patients,
Investigative radiology 54 (2019) 110.
[70] S. Montani, M. Striani, Artificial intelligence in clinical decision sup-
port: a focused literature survey, Yearbook of medical informatics 28
(2019) 120–127.
[71] M. Fernandes, S. M. Vieira, F. Leite, C. Palos, S. Finkelstein, J. M.
Sousa, Clinical decision support systems for triage in the emergency
department using intelligent systems: a review, Artificial intelligence
in medicine 102 (2020) 101762.
[72] E. Turban, Implementing decision support systems: a survey, in: 1996
IEEE International Conference on Systems, Man and Cybernetics. In-
formation Intelligence and Systems (Cat. No. 96CH35929), volume 4,
IEEE, pp. 2540–2545.
[73] S. B. Eom, S. M. Lee, E. Kim, C. Somarajan, A survey of decision
support system applications (1988–1994), Journal of the Operational
Research Society 49 (1998) 109–120.
[74] S. Eom, E. Kim, A survey of decision support system applications
(1995–2001), Journal of the Operational Research Society 57 (2006)
1264–1278.
[75] M. Omichi, Y. Maki, T. Ohta, Y. Sekita, S. Fujisaku, A decision support
system for regional health care planning in a metropolitan area,
Japan-hospitals: the journal of the Japan Hospital Association 3 (1984) 19–23.
[76] G. Acampora, D. J. Cook, P. Rashidi, A. V. Vasilakos, A survey on
ambient intelligence in healthcare, Proceedings of the IEEE 101 (2013)
2470–2494.
[77] R. Snyder-Halpern, Assessing health care setting readiness for point of
care computerized clinical decision support system innovations,
Outcomes Management for Nursing Practice 3 (1999) 118–127.
[78] M. Raza Perwez, N. Ahmad, M. Sajid Javaid, et al., A critical analysis
on efficacy of clinical decision support systems in health care domain,
in: Advanced Materials Research, volume 383, Trans Tech Publ, pp.
4043–4050.
[79] M. C. Kaptein, P. Markopoulos, B. De Ruyter, E. Aarts, Persuasion in
ambient intelligence, Journal of Ambient Intelligence and Humanized
Computing 1 (2010) 43–56.
[80] O. Anya, H. Tawfik, S. Amin, A. Nagar, K. Shaalan, Context-aware
knowledge modelling for decision support in e-health, in: The 2010
International Joint Conference on Neural Networks (IJCNN), pp. 1–7.
[81] J. Graham, Artificial intelligence, machine learning, and the FDA. 2016,
????
[82] M. Aledhari, R. Razzak, R. M. Parizi, F. Saeed, Federated learning:
A survey on enabling technologies, protocols, and applications, IEEE
Access 8 (2020) 140699–140725.
[83] J. H. Yoo, H. Jeong, J. Lee, T.-M. Chung, Federated learning: Issues
in medical application, 2021.
[84] N. Papernot, P. McDaniel, A. Sinha, M. Wellman, Towards the sci-
ence of security and privacy in machine learning, arXiv preprint
arXiv:1611.03814 (2016).
[85] Y. Zhan, J. Zhang, Z. Hong, L. Wu, P. Li, S. Guo, A survey of incen-
tive mechanism design for federated learning, IEEE Transactions on
Emerging Topics in Computing (2021).
[86] G. B. Moody, R. G. Mark, The impact of the MIT-BIH arrhythmia
database, IEEE Engineering in Medicine and Biology Magazine 20
(2001) 45–50.