
Heart Disease Diagnosis via Nonparametric Mixture Models

Heart disease diagnosis can be made more effective and efficient by clustering individuals with heterogeneous characteristics into groups of similar risk. This paper focuses on nonparametric density-based cluster analysis of heart disease risk via nonparametric mixtures. Cluster densities for the nonparametric mixture model are estimated with Gaussian kernel density estimators using graph theory techniques. The quality of the resulting clusters was analysed and diagnosed via a density-based silhouette information criterion. Although the number of components is not assumed to equal the number of clusters, the results show that individuals at risk of heart disease can be grouped into two categories using a two-component model. It was also concluded that individuals in different clusters have varying risk levels for heart disease.
Journal of Advances in Mathematics and Computer Science
27(5): 1-17, 2018; Article no.JAMCS.40440
ISSN: 2456-9968
(Past name: British Journal of Mathematics & Computer Science, Past ISSN: 2231-0851)
Heart Disease Diagnosis via Nonparametric Mixture
Models
Chipo Mufudza¹ and Hamza Erol²
¹Department of Applied Mathematics, National University of Science and Technology,
Corner Cecil Avenue and Gwanda Road, Bulawayo, Zimbabwe.
²Department of Computer Engineering, Mersin University, Çiftlikköy Campus, TR-33343,
Mersin, Turkey.
Authors’ contributions
This work was carried out in collaboration between both authors. Author CM designed the study,
performed the statistical analysis, wrote the protocol and wrote the first draft of the manuscript.
Author HE managed literature searches and the analyses of the study. Both authors read and
approved the final manuscript.
Article Information
DOI: 10.9734/JAMCS/2018/40440
Editor(s):
(1) Morteza Seddighin, Professor, Indiana University East Richmond, USA.
Reviewers:
(1) Ramesh M. Mirajkar, Dr. Babasaheb Ambedkar College, India.
(2) Michael Chen, California State University, USA.
Complete Peer review History: http://www.sciencedomain.org/review-history/25018
Received: 28th February 2018
Accepted: 17th May 2018
Published: 6th June 2018
Original Research Article
Abstract
Aims/Objectives: Effective and efficient heart disease prediction via nonparametric mixture
regression models.
Data Source: Data used in this paper is from the UCI database of the Cleveland Clinic
Foundation for heart disease. The original data source contains 76 raw attributes with 303
observations each. For the purpose of this paper only 14 attributes were used as explained in
section 4.
Methodology: Cluster analysis was applied via mixture models in the form of nonparametric
density-based models. The clusters were identified using a graph theory based technique. Voronoi
diagrams were used, and the cluster distributions were estimated nonparametrically through a
mixture model with Gaussian kernels. The optimal number of clusters and components of the
*Corresponding author: E-mail: chipo.mufudza@nust.ac.zw
Mufudza and Erol; JAMCS, 27(5): 1-17, 2018; Article no.JAMCS.40440
identified clusters were determined, analysed and diagnosed using a density-based silhouette
information criterion. All the data analysis and model diagnosis were performed in R using the
PdfCluster package.
Results: Different numbers of components resulted in different numbers of clusters when
nonparametric mixtures were used on the heart disease data. However, the optimal grouping of
individuals at risk of heart disease was found to be two clusters from a two-component model,
according to the density-based silhouette information criterion. Both clusters were well separated
and well classified, as indicated by the lack of spurious clusters and the high positive
density-based silhouette values (see Figs. 2 and 4). Their properties are given in Table 2. The
result holds regardless of the flexible, assumption-free conditions on the shape of the
distribution, the number of components and the number of clusters.
Conclusion: When nonparametric mixture models are used, individuals at risk of heart
disease can be classified as either high or low risk depending on the dominant characteristics
of the individual. Those at high risk have attributes that make them progress to heart
disease faster than those at low risk. Therefore, by classifying individuals into
these categories, medical personnel can quickly diagnose heart disease and efficiently identify
the characteristics associated with each category.
Keywords: Density based silhouette information; heart disease; Kernel Density Estimator;
Nonparametric Mixture Models.
2010 Mathematics Subject Classification: 53C25; 83C05; 57N16.
1 Introduction
Data clustering aims to partition data points given in a certain space. In general there exists
no universal rule to define clusters, thus many methods, both parametric and nonparametric,
have been proposed. In particular, density-based clustering methods, also referred to as
mixture models, have been of interest due to their ability to capture diverse heterogeneous properties
of the data. Parametric mixture models, although very informative and easy to interpret, impose
many restrictions and assumptions on the shape and distribution of clusters, which can be
misleading and lead to wrong interpretations. Overcoming these restrictions can involve algorithms
that are assumption free or that minimise assumptions, a direction which makes use of nonparametric
density estimation. It is the aim of this paper to concentrate on these nonparametric mixture
models due to their flexibility. Nonparametric methods derive their strength from the sample
data, and because they limit the number of assumptions they reduce the assumption burden posed by
parametric methods. The nonparametric densities used in nonparametric mixture models can be
represented by histograms or by a variety of density estimation methods, although
we will concentrate on kernel density estimators (KDE). Clusters formed from nonparametric mixtures
are derived from their own cluster density functions, which are unknown but estimated, rather than
from an assumed shape and distribution for each cluster. It is with this view to improved flexibility
that we focus on nonparametric clustering methods through mixture models. Given the general
parametric mixture model shown by equation (1.1):
$$f(x) = \sum_{j=1}^{n} \pi_j f_j(x) \qquad (1.1)$$
it is evident that clusters are associated with the components $f_j$, and hence the cluster shapes and
properties are assumed to follow those of the components, whilst no such assumptions are made for
nonparametric clusters, which are usually associated with regions of high density. Although the two
approaches may sometimes lead to the same results, they are very different, and it is our purpose in this paper
to dwell on the nonparametric approach without imposing any assumptions on shape, density or the
cluster-component relationship. This is a more robust way of identifying and inferring subpopulations, as it
can use inference on the number of modes, as alluded to by [1], in contrast to the parametric
and hierarchical approaches, which estimate the number of clusters as an output. Nonparametric
density-based clustering can also be done via smooth polynomials which model the different cluster density
estimates, as shown by [2], who used Legendre polynomials and Gamma and Beta
mixtures to approximate nonparametric mixture models. In this paper we focus on nonparametric
mixture models for multivariate mixed count data on heart disease using Gaussian kernel density
estimators. Although most nonparametric mixture models have been built under the assumption of
independent and identically distributed components for identifiability, latent variable models with
Gaussian kernel mixture distributions can be built from observed data of mixed type, which can be ordinal,
binary or continuous, and with component independence, as explained in [3, 4].
2 Materials and Methods
2.1 Nonparametric density based clustering
Density-based clustering can be done both nonparametrically and parametrically, as explained by
[5, 6]. In particular, it was observed that heart disease can be predicted via two risk groups, namely
high and low risk groups, using a Poisson mixture regression model, as explained in [7]. Nonparametric
mixture models, however, identify clusters as regions of high density separated by regions of low
density, which can be done by identifying local maxima of the estimated density, that is, the modes
of the data. Whilst nonparametric methods associate the clusters with the regions around the modes
of the probability distribution of the data, clusters in the model-based parametric approach correspond
to the components of a mixture of distributions. The difference emerges clearly since the number of
modes in a mixture of distributions does not necessarily match the number of components, as
explained by [8].
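The mismatch between modes and components can be illustrated with a minimal Python sketch. It is not taken from the paper: the equal weights, unit standard deviations, component means and grid search are all illustrative assumptions. It counts the local maxima of a two-component Gaussian mixture of the form (1.1) on a regular grid.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Finite Gaussian mixture density, a special case of model (1.1)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def count_modes(weights, means, sds, lo=-10.0, hi=10.0, steps=4001):
    """Count strict local maxima of the mixture density on a regular grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    dens = [mixture_pdf(x, weights, means, sds) for x in grid]
    return sum(
        1
        for i in range(1, steps - 1)
        if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]
    )

# Illustrative (assumed) settings: two components in both cases.
close = count_modes([0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])  # overlapping components
apart = count_modes([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])  # well separated components
```

With these assumed settings the overlapping pair gives a single mode while the separated pair gives two, so a mode-based method would report one cluster in the first case even though the mixture has two components.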
There has been great interest in nonparametric density-based clustering methods due to their
flexibility in cluster determination and inference for any given data type. Different approaches
have been applied, making use of nonparametric density estimation and sometimes
distances. Nonparametric density-based estimation techniques also allow the data to model
relationships among variables, making them robust to functional form specification and hence
able to detect structure which sometimes remains undetected by traditional parametric
estimation techniques [9]. The idea of the parametric mixture model (1.1) is to assume that
each component is identified by a parametric density function $f_j$, so that the shape of each cluster is
approximated by the same distribution. The clustering problem then becomes the estimation of the
mixing parameters $\pi_j$ and the parameters associated with the density functions $f_j$. This is done under
conditions which make model (1.1) identifiable. The most commonly used densities
are Gaussian, although many variations have been considered, including
skewed t distributions, to allow more flexible cluster shapes. Nonparametric
density-based clustering offers relief from the need to tie individual clusters to a given
density shape, and so allows exploration of clusters free of density assumptions. In many cases parametric
mixture modeling comes with a serious disparity between a component and a cluster, caused by compliance
with geometric heuristics, as alluded to by [10]. If the cluster shapes do not match the shapes of the
densities $f_j$, the parametric mixture approach may face difficulties, which motivates considering cluster
densities with assumption-free shapes via a nonparametric formulation. A second challenge
involves variability in the mixing proportions $\pi_j$, which may vary instead of being constants.
Nonparametric clustering involves different approaches, although the general idea uses kernel density
estimators (KDE), which are themselves a representation of mixture functions. Although KDE can use either
fixed or adaptive smoothing (bandwidth), the latter is usually used to cater for the adaptive and
changing mixing proportions of the different clusters. A nonparametric representation
which replaces model (1.1) can therefore be achieved via clusters with kernel density functions.
Researchers including [10] used these kernel density functions as mixture densities for clusters, and
used the Modal EM (MEM) algorithm to find points of local maximum density, called hilltops, which
identify clusters. Different clusters were linked via ridgelines joining two hilltops, creating a more
accurate way of nonparametric clustering through geometric heuristics. Practical procedures for
nonparametric cluster quality diagnosis have been developed through the use of mean integrated
errors as well as the nonparametric density-based silhouette information of [8]. The authors of [11] used
nonparametric estimates for finite mixtures from data on repeated measurements with the aim of
advancing statistical inference on such models.
Density-based nonparametric clustering methods normally define clusters through density estimates,
which determine the associated connected regions. Unlike parametric methods, nonparametric
density-based clustering methods do not choose a particular density function but normally use a kernel
density estimate with a kernel of one's choice. The Gaussian kernel has so far proved to be
the most commonly used [12, 13, 14, 15]. Researchers generally use the multidimensional kernel density
estimator with adaptive bandwidth suggested by [16] and represented as equation (2.1):
$$\hat{f}(x) = \sum_{i=1}^{n} \frac{1}{n\, h_{i,1} \cdots h_{i,d}} \prod_{j=1}^{d} K\!\left(\frac{x_j - x_{i,j}}{h_{i,j}}\right) \qquad (2.1)$$
where $h$ is the bandwidth and $K$ is the kernel function. Thus nonparametric
density-based clustering in general represents a mixture model.
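As a rough illustration of how an estimator of the form (2.1) can be evaluated, the following Python sketch computes a product Gaussian kernel estimate with one bandwidth per observation and coordinate. The function names and the data layout (lists of lists) are assumptions made for this example and are not the PdfCluster implementation.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def product_kde(x, data, bandwidths):
    """Evaluate a product-kernel density estimate at the point x.

    data       : list of n observations, each a list of d coordinates
    bandwidths : list of n lists, bandwidths[i][j] = h_{i,j} > 0
                 (use the same row for every i to get a fixed bandwidth)
    """
    n = len(data)
    total = 0.0
    for x_i, h_i in zip(data, bandwidths):
        term = 1.0
        for x_j, x_ij, h_ij in zip(x, x_i, h_i):
            # Each coordinate contributes K((x_j - x_ij) / h_ij) / h_ij.
            term *= gaussian_kernel((x_j - x_ij) / h_ij) / h_ij
        total += term
    return total / n
```

Each observation contributes one kernel "bump" with weight 1/n, which is why the estimate can itself be read as a mixture with n components.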
Several authors have represented nonparametric mixture models in different ways, with the general
assumption that each cluster is generated by its own unknown density function [15]. Nonparametric
mixture models can therefore also mean that no assumptions are made about the form of the densities
$f_j$ of model (1.1), even though the weights $\pi_j$ may be scalar parameters, as alluded to by [17]. It
should, however, be noted that the weights are not restricted to scalars since they can
be variables. In [18], nonparametric mixture modeling is defined in a different sense: the family
$F$ from which the component densities come is fully specified up to a parameter $\Theta$, but the mixing
distribution from which they are drawn is assumed to be completely unspecified and unknown, rather
than having finite support of known cardinality. Nonparametric mixture models can thus
describe the case in which no assumptions are made about the distributional form of the
mixture model, even though the mixing parameter can be Euclidean. Semiparametric
models normally refer to cases where the distributional form is partly specified by a finite-valued
parameter, such as the case in which $f_j(x) = f(x - \mu_j)$ for a symmetric but otherwise completely
unspecified density $f(\cdot)$, as proposed by [19], an idea used in [17, 20] for the multivariate case. The
assumption that component distributions come from a family of densities that may be indexed by a
finite-dimensional parameter vector is normally dropped when dealing with nonparametric mixtures.
It is still necessary, however, to restrict the family of multivariate density functions from which the
component densities are drawn in order to avoid the problem of model non-identifiability [20].
A number of nonparametric mixture models assume that the observed variables are jointly
conditionally independent given the latent class and use kernel methods to identify the finite
mixtures of nonparametric distributions [21, 22]. Suppose $\theta$ is a vector of parameters, including the
mixing proportions $\lambda_1, \ldots, \lambda_m$ and the univariate densities $f_{jk}$, where $j$ indexes the component and
$k$ indexes the coordinate, so $1 \le k \le r$ and $1 \le j \le m$. Then, under the assumption of conditional
independence, the nonparametric mixture density evaluated at $x_i = (x_{i1}, \ldots, x_{ir})^t$ is given as:
$$f_\theta(x_i) = \sum_{j=1}^{m} \lambda_j \prod_{k=1}^{r} f_{jk}(x_{ik}) \qquad (2.2)$$
Let $z$ be the latent random variable with probability mass function (PMF) mapping $\chi \to [0, 1]$, where
$\chi = \{z_1, z_2, \ldots, z_K\}$ for an integer $K$, and let $x = (x_1, x_2, \ldots, x_r)$ be a vector of repeated
measurements on an observable outcome variable whose marginal probability density function (PDF) takes the
form of equation (2.2), where $f_{jk}$ denotes the PDF of $x_i$ conditional on $z = z_k$. It should be noted
that the $x_i$ are sometimes not only conditionally independent but also identically distributed, which
can be represented in terms of blocks as alluded to in the work of [20], where each block represents
coordinates that follow the same univariate distribution.
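The conditional independence density (2.2) is a weighted sum over components of a product over coordinates. A minimal Python sketch follows, assuming for illustration that each univariate density $f_{jk}$ is supplied as a callable; in the nonparametric setting these would be kernel estimates, while normal densities are used here only as stand-ins.

```python
import math

def normal_pdf(mean, sd):
    """Return a univariate normal density as a callable (illustrative stand-in for f_jk)."""
    def pdf(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return pdf

def conditional_independence_density(x, weights, densities):
    """Evaluate sum_j lambda_j * prod_k f_jk(x_k).

    x         : list of r coordinates of one observation
    weights   : list of m mixing proportions lambda_j (assumed to sum to 1)
    densities : m lists of r callables, densities[j][k] = f_jk
    """
    total = 0.0
    for lam, component in zip(weights, densities):
        product = 1.0
        for x_k, f_jk in zip(x, component):
            product *= f_jk(x_k)
        total += lam * product
    return total
```

For example, with two components whose coordinate densities are all standard normal, the value at the origin in two dimensions equals $1/(2\pi)$ whatever the mixing proportions, since they sum to one.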
Graph theory techniques such as Voronoi diagrams can also be used to identify clusters by observing
closely concentrated connections; estimation of the density function along an interval of connections
has been explored in [12]. This method treats the existence of a valley between two points
as a disconnection, using the Voronoi diagram as a partition and the Delaunay graph for edge connections
between points. The same idea was developed by [13, 14], although [14] further improved the
measure of the valleys by using the minimum probability mass necessary to fill the
valley. In this paper we focus on heart disease diagnosis using graph theory and KDE to estimate
the nonparametric mixture models.
2.2 Nonparametric mixtures using graph theory
When graph theory methods are used in mixture models, clusters are connected components
estimated using graph-based algorithms. The number of clusters can then be chosen based on a
nearest-neighbourhood graph on the points $X_i$ defined through $\hat{K}_h$ and $k$, where $\hat{K}_h$ is the
kernel density estimator and $k$ is the number of clusters. The number of clusters is normally determined
from the number of connected components of a level set $f > c$ in the given graph, a scenario explained
by [23]. In [24] it was proposed that a KDE combined with a single linkage graph can be used to estimate
the number of clusters. Other authors, including [25], have also used density clustering via connected
regions, with data-driven bandwidth selection and measures of the stability of the identified clusters.
In this sense mixtures are formed by KDEs where an edge between two points indicates that they belong to
the same ball $\rho$, and hence to the same cluster. The number of connected components then corresponds to
the number of clusters found in the sample. The same idea was used by [26] to investigate the stability of
density-based clustering by altering different tuning parameters of the mixture models, including
the kernels.
Graph theory can also be used to build cluster trees and to prune them to remove spurious
clusters where they exist. A plug-in approach to cluster tree estimation was explored in [27]:
estimate the cluster tree of the feature density by the cluster tree of a density estimate. In [12, 13]
nonparametric mixture models were described differently, via graph theory and the
KDE. Clusters were identified using connected points by observing the density function
along a given interval in a multidimensional space. Clusters are then identified by the existence of
disconnections in the Voronoi diagram, indicated through a valley, using a mode function
which is associated with both components and probability. Points with many connections thus indicate
a mode of the kernel density and hence a single cluster. Connected regions are then identified
using Delaunay triangulation, or pairwise connections in a high-dimensional space. This was demonstrated
via the PdfCluster package in R [13, 14], which makes use of the density-based silhouette
information to differentiate clusters and make inferences on them, as described by [8]. Nonparametric
mixture models thus have the flexibility to identify clusters effectively through many methods, which
is why we concentrate on some of these methods on a real dataset. In this
context we look at heart disease data using nonparametric mixture models, to help in early
and efficient heart disease diagnosis. This can also help to alleviate and reduce complex cardiological
problems.
2.2.1 Nonparametric mixtures: Graph theory procedure
We explore some of the procedures used by [12]: if the estimated function is unimodal then
we have a connected cluster, otherwise it is disconnected. Suppose we have $\chi = (x_1, \ldots, x_n)$,
$x_i \in \Re^d$, to be clustered in a $d$-dimensional space with unknown bounded and differentiable density
function $f$. Then for each constant $c$, define $R(c) = \{x : x \in \Re^d, f(x) \ge c\}$. In this method a mode
function is described such that the function $f$ is replaced by its nonparametric estimate $\hat{f}$. Therefore,
in a multidimensional space where there is no obvious way to determine the connected regions, the
interest is in the sample set
$$S(c) = \{x : x \in \Re^d, \hat{f}(x) \ge c\} \qquad (2.3)$$
Thus we have a multidimensional problem of identifying connected regions. Graph theory is then
used: the connected regions are detected as the connected components of a graph $G$
whose vertices are the elements of $S(c)$ and in which the choice of edges is key. The task is done via the
Voronoi diagram, using Delaunay triangulation for $d \le 3$ and pairwise connections for $d > 3$, according to
some measure of distance. The Voronoi diagram is a partition of $\Re^d$ given $\chi$ into $n$ regions
$\Upsilon(x_1), \ldots, \Upsilon(x_n)$, where $\Upsilon(x_i)$ is the set of all points of $\Re^d$ closer to $x_i$ than to any
other point in the set according to some measure of distance. The connected components are identified as
unions of connected pairs that share at least one vertex, and in this way cluster cores are determined. The
cluster cores are formed by data lying in the regions around the detected modes. Clusters can also be
detected by a minimum spanning tree, as explained by [28], which is a subgraph of the Delaunay triangulation
[14].
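The detection of connected components of the level set can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the PdfCluster algorithm: the density estimates at the sample points and the list of edges (for example Delaunay edges or pairwise connections) are assumed to be already available, and the function name is hypothetical.

```python
def level_set_components(density, edges, c):
    """Connected components of the sample points whose estimated density is >= c.

    density : list with the estimated density at each sample point
    edges   : list of (a, b) index pairs, assumed given by some graph construction
    c       : density threshold defining the level set S(c)
    """
    keep = {i for i, f in enumerate(density) if f >= c}
    parent = {i: i for i in keep}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        # Only edges joining two retained points connect the level set.
        if a in keep and b in keep:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for i in sorted(keep):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For five points on a chain with densities `[0.9, 0.8, 0.1, 0.7, 0.85]` and edges between neighbours, a threshold of `c = 0.5` drops the middle point and yields the two components `[[0, 1], [3, 4]]`, while a threshold of `c = 0.05` keeps all points in a single component.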
Fig. 1. Voronoi tessellation and superimposed Delaunay triangulation
Fig. 1 shows an example of how connected points share a facet of the Delaunay triangulation
when they belong to adjacent Voronoi regions. When the dimension is high, pairwise connections are
used, as in our case where $d = 14$. The basic idea is to examine the behaviour of the KDE $\hat{f}(x)$
when we move along the segment $[x_1, x_2]$, since it depends on whether the sample values $x_1$
and $x_2$ belong to the same connected set of $S(c)$ or not. We can view the set $S(c)$ as a union of
the two intervals of the group. If $x_1$ and $x_2$ belong to the same interval, then the corresponding
portion of the density along the segment $[x_1, x_2]$ has no local minimum. On the contrary, if $x_1$ and
$x_2$ belong to different subsets of $S(c)$, then at some point along $[x_1, x_2]$ the density exhibits a
local minimum, which we shall refer to as the presence of a valley [13]. The density estimates which
determine the connected regions are not linked to any particular density function but are purely
estimated. The commonly used density estimate is the multidimensional kernel density estimate
given in equation (2.1). The choice of the kernel normally has little effect on the
estimate, unlike the smoothing parameter. The bandwidth can be fixed or adaptive, as suggested
by [27]. In this paper, although both the fixed and adaptive bandwidths were developed and
analysed for heart disease, only the adaptive bandwidth is presented. Nonparametric mixture
models via graph theory can be described by use of the KDE, where clusters are identified by the
existence of disconnections in the Voronoi diagram, indicated through a valley. Thus
points with many connections indicate a mode of the kernel density and hence a single cluster. This was
demonstrated via the PdfCluster [13, 14] algorithm, which makes use of the density-based
silhouette information to differentiate clusters and make inferences on them, as described in [8].
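The valley criterion for a pair of points can be sketched in Python as follows. This is a simplified illustration, not the measure used in [13, 14]: the density is evaluated on an evenly spaced grid along the segment, and a valley is declared when some interior value lies below the highest value on both sides of it. The function names, the grid size and the tolerance are assumptions, and `two_bump_density` is a toy density used only for the example.

```python
import math

def has_valley(density, x1, x2, steps=50, tol=1e-12):
    """Return True if the density dips along the segment from x1 to x2.

    density : callable taking a point (list of coordinates)
    x1, x2  : end points of the segment, lists of equal length
    """
    values = []
    for s in range(steps + 1):
        t = s / steps
        point = [a + t * (b - a) for a, b in zip(x1, x2)]
        values.append(density(point))
    for i in range(1, steps):
        left_peak = max(values[:i])
        right_peak = max(values[i + 1:])
        # A dip: lower than something on the left AND something on the right.
        if values[i] + tol < left_peak and values[i] + tol < right_peak:
            return True
    return False

def two_bump_density(point):
    """Toy one-dimensional density with modes near -3 and 3 (illustrative only)."""
    x = point[0]
    bump = lambda m: math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2.0 * math.pi)
    return 0.5 * bump(-3.0) + 0.5 * bump(3.0)
```

With the toy density, the segment from -3 to 3 crosses the low-density region around zero and is flagged as having a valley, so the two end points would not be connected; the segment from -3 to -2 stays on one side of a single mode and is not flagged.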
2.2.2 Density Based Silhouette Information (DBS) criteria
In general, silhouette analysis is a form of internal cluster validation which can be used to study
the separation between and within the resulting clusters. It validates the clustering
performance based on the pairwise difference of between-cluster and within-cluster distances. The index
can also be used to determine the optimal number of clusters by maximizing its
value [29]. The silhouette plot displays a measure in the range $[-1, 1]$ which shows how close each point
in one cluster is to points in the neighboring clusters, and thus provides a way to assess parameters
such as the number of clusters visually. A silhouette index near +1 indicates
that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample
is on or very close to the decision boundary between two neighboring clusters, and negative values
indicate that samples might have been assigned to the wrong cluster. The thickness of the
silhouette plot also gives an idea of the cluster size. Silhouette information can also be used to
describe cluster compactness, that is, how closely related objects are within a single
cluster, although this can also be assessed via variance analysis. Therefore, silhouette information can be
used for both within-cluster and between-cluster diagnosis.
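For reference, the classical distance-based silhouette described above can be computed with the short Python sketch below. It assumes Euclidean distances and the usual definition $s_i = (b_i - a_i)/\max(a_i, b_i)$, where $a_i$ is the mean distance to the other members of the point's own cluster and $b_i$ the smallest mean distance to another cluster; singleton clusters are given a value of 0, which is a common convention adopted here as an assumption.

```python
import math

def silhouette_values(points, labels):
    """Classical silhouette value for each point (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    result = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            result.append(0.0)  # convention for singletons or a single cluster
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result
```

Well separated groups give values close to +1, while a point placed in a cluster whose members are far from it gets a negative value, which matches the interpretation given in the text.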
The between and within distances are not always easy to calculate and apply with nonparametric
methods. The use of silhouette information in nonparametric density-based clustering
was developed by [8]. This was made possible by combining the idea of within-cluster
distances with cluster posterior probabilities, the same idea employed by [13, 14]. The dbs values
are calculated from the fact that if $x_i \in \chi$ is drawn from a probability density function $f$, one can
evaluate the posterior probability that it belongs to group $\upsilon_m$, $m = 1, \ldots, M$, as:
$$\tau_m(x_i) = \frac{\pi_m f_m(x_i)}{\sum_{k=1}^{M} \pi_k f_k(x_i)} \qquad (2.4)$$
where $\pi_m$ is the prior probability of $\upsilon_m$ and $f_m$ is the density of group $\upsilon_m$ at $x_i$. Then the dbs is
defined as follows:
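$$dbs_i = \frac{\log\left(\dfrac{\tau_{m_0}(x_i)}{\tau_{m_1}(x_i)}\right)}{\max_{j=1:n} \left|\log\left(\dfrac{\tau_{m_0}(x_j)}{\tau_{m_1}(x_j)}\right)\right|} \qquad (2.5)$$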
where $m_0$ is such that $x_i$ has been classified in $\upsilon_{m_0}$ and $m_1$ is the group for which $\tau_m$ is maximum
over $m \ne m_0$. Thus dbs is a vector reporting the density-based silhouette information of the clustered
data [8, 13, 14]. It can be used as a diagnostic tool for cluster quality evaluation:
a high positive dbs value indicates well classified observations and clusters that
are well separated, whilst a negative dbs indicates observations that might have been assigned to the wrong
cluster. Partitioning within a cluster's dbs plot may indicate the existence of hidden (spurious) clusters [8].
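The dbs computation can be sketched in Python as below. The sketch assumes that the posterior probabilities $\tau_m(x_i)$ are already available and strictly positive, and that the normalising constant is the largest absolute log-ratio over the sample, so that the values fall in $[-1, 1]$. The function name and the input layout are assumptions for illustration and do not reproduce the PdfCluster code.

```python
import math

def density_based_silhouette(posteriors, labels):
    """Density-based silhouette for each observation.

    posteriors : list of n lists, posteriors[i][m] = tau_m(x_i), all > 0
    labels     : list of n group indices, labels[i] = m0 for observation i
    """
    log_ratios = []
    for tau, m0 in zip(posteriors, labels):
        # m1 is the best competing group, i.e. the largest posterior with m != m0.
        tau_m1 = max(t for m, t in enumerate(tau) if m != m0)
        log_ratios.append(math.log(tau[m0] / tau_m1))
    scale = max(abs(r) for r in log_ratios)
    return [r / scale for r in log_ratios]
```

An observation assigned to the group with the highest posterior gets a positive value, and one assigned to a group that is less likely than its best competitor gets a negative value, which is how negative dbs values flag possibly misclassified observations.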
2.2.3 Bandwidth selection
The selection of an appropriate value of the bandwidth parameter $h$, which affects the density
estimates dramatically, is vital for good cluster results and
estimates in nonparametric methods. Since the choice of kernel is not as influential, we use Gaussian kernels
for all the estimations. Bandwidth selection methods have been studied intensively, especially in
non-mixture structures, by authors including [16, 30, 31]. Bandwidth selection in nonparametric
mixtures can be very challenging. Although some standard ideas can still be incorporated,
under- or over-smoothing remains a challenge, as explained by [32] for the case of
conditional independence. However, when Gaussian kernels are used under the assumption of
multivariate normality, [12] suggested that a bandwidth of the form in equation (2.6) can
be used for the adaptive Gaussian kernel in equation (2.1), namely
$$h_j = \sigma_j \left(\frac{4}{(d+2)\,n}\right)^{\frac{1}{d+4}}, \quad j = 1, \ldots, d, \qquad (2.6)$$
where the standard deviation $\sigma_j$ of the $j$th variable is replaced by an estimate. The bandwidths $h_j$
are then multiplied by a shrinkage factor of 3/4 in order to relieve the oversmoothing caused
by computing the bandwidths under the assumption of multivariate normality. A square root law
can be used to allow variation for each dimension and on different scales. This brings flexibility in
such a way that different components are allowed to have different properties. Moreover, since the
bandwidth is iterative, the bandwidth estimation can be done prior to knowledge of the mixture
structure.
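Equation (2.6) with the 3/4 shrinkage can be computed directly, as in the following Python sketch. The function name is hypothetical, and the use of the sample standard deviation (divisor n - 1) as the estimate of $\sigma_j$ is an assumption, since the text only says that $\sigma_j$ is replaced by an estimate.

```python
import math
import statistics

def normal_reference_bandwidths(data, shrink=0.75):
    """Per-variable bandwidths h_j from equation (2.6), times a shrinkage factor.

    data   : list of n observations, each a list of d numeric coordinates
    shrink : multiplicative shrinkage factor (3/4 in the text)
    """
    n = len(data)
    d = len(data[0])
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    # zip(*data) iterates over the d columns (variables).
    return [shrink * statistics.stdev(column) * factor for column in zip(*data)]
```

The common factor depends only on the sample size and the dimension, so variables measured on larger scales simply receive proportionally larger bandwidths.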
3 Results and Discussion
3.1 Data
The data used in this paper are from the UCI database for the Cleveland Clinic Foundation
heart disease study, as given by [33]. The original data are mixed, with 76 raw attributes and 303
observations for each attribute, of which only 14 attributes are used in this paper. These include
number of diagnosis (num), age, sex, chest pain, cholesterol, fasting blood sugar level (fbs),
resting blood pressure (trestbp), maximum heart rate (thalcd), resting electrocardiographic results (restecg),
exercise-induced angina (exang), depression induced by exercise (oldpeak), slope of peak exercise (slope),
vessels (ca) and defects (thal). The data had some missing values, which we replaced using the mean
response method. In the analysis continuous variables were taken to be log normal variables and
sometimes changed to log-normal, whilst discrete variables with more than two levels were considered
ordinal. Nonparametric cluster analysis is done without prior information assumed or known. The
cluster density estimates, clusters and inferences on the clusters are based solely on the data,
hence the use of nonparametric methods. Kernel estimation methods with a Gaussian kernel were
used in this paper to estimate the density distributions. Thus the ultimate distribution of the
data is a nonparametric mixture model which comprises Gaussian kernel density estimates.
3.2 Packages used
Clustering of the data was done nonparametrically using graph theory methods as explained in
section 2.2.1. The dissimilarities among the points were calculated using the Gower
coefficient [28] with the cluster package in R. Multidimensional scaling of the data was then
carried out with the cmdscale function as described by [34], since the
data used were mixed. Nonparametric density approximations are then applied to the
mixed heart disease data using Gaussian kernel density estimators. This was implemented through
the PdfCluster package by Azzalini and Menardi [13]. PdfCluster automatically selects the
procedure to be used for detecting connected components of the density level sets, depending on the
data dimensionality. This is enabled by an internal call to the function kepdf, both to estimate
the density underlying the data and to build the connection network when the pairwise connection
criterion is selected. In this paper a kernel density estimate with a Gaussian kernel is chosen and
built, with the vector of smoothing parameters set to the one asymptotically optimal under the
assumption of multivariate normality. We used both a fixed shrunken smoothing parameter and an
adaptive bandwidth since the data are highly multidimensional. Cluster diagnosis in this method
uses the dbs criterion described in section 2.2.2. The number of groups is identified as
the number of modes of the estimated density. Analysis was done with different numbers of
components, and diagnosis of the clusters was done using the dbs criterion from the PdfCluster package.
The dbs method is therefore applied to evaluate the cluster quality of the Cleveland heart disease
data under nonparametric density-based clustering using different numbers of principal components
for the same data.
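To make the dissimilarity step concrete, the following Python sketch computes a basic Gower dissimilarity between two mixed-type records. It is a simplified illustration and not the implementation in the R cluster package: numeric variables contribute a range-scaled absolute difference, categorical variables contribute 0 or 1, all variables are weighted equally, and ordinal variables and missing values are not handled. The function name, the `kinds` codes and the example records are hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Basic Gower dissimilarity between two records a and b.

    kinds  : list with "num" or "cat" for each variable
    ranges : list with the observed range of each numeric variable
             (ignored for categorical variables)
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, in [0, 1] for in-range values.
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching for categorical variables.
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Hypothetical records: (age, sex), with an assumed age range of 40 years.
d = gower_distance([50.0, "m"], [60.0, "f"], ["num", "cat"], [40.0, None])
```

Here the age difference contributes 10/40 = 0.25 and the differing sex contributes 1, so the dissimilarity is their average, 0.625. A matrix of such pairwise values is the kind of input that a classical multidimensional scaling routine takes.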
3.3 Analysis covered
Analysis of the Cleveland heart disease data was done using PdfCluster to determine the number
of groups possible for heart disease prediction in a nonparametric way. Each cluster is
represented by a Gaussian kernel density function, and hence the whole sample as a mixture.
Both the fixed smoothing parameter (not shown here) and the adaptive smoothing parameter, via the
adaptive bandwidth explained in section 2.2.3, were used. Different numbers of components were used to
determine the number of clusters nonparametrically via connected regions as explained in section
2.2.1. Results of the analysis are shown in the next sections (3.4 - 3.6).
3.4 Cluster diagnosis
In order to determine the optimal number of clusters and their quality, the dbs criterion is used
for cluster diagnosis as explained in section 2.2.2. Models with different numbers of
components produce different sets of clusters, hence the need to assess the quality
and optimality of the clusters. Fig. 2 shows the dbs plots and the median dbs value of each cluster
for models with different numbers of principal components. The cluster information
for models with p = 2, 3, 4 and 5 principal components is given in Fig. 2(a), (b), (c) and (d),
respectively. The 2 component model has 2 clusters, both with positive median dbs values and with no
partitions within either cluster. The positive median dbs values (0.11 and 0.21) indicate
that the observations have been well classified. The clusters are also well separated: the absence of
partitions within the dbs plot is evidence that there are no hidden (spurious) clusters within either
cluster. The quality of the clusters decreases as the number of principal components increases to 3,
as shown by Fig. 2(b), where 3 of the 5 clusters are spurious. Furthermore,
the other 2 clusters have negative median dbs values, a pointer that most of their observations were
wrongly classified. The 4 component model in Fig. 2(c) has 5 clusters, whilst the 5 component model
in Fig. 2(d) has 4 clusters, most of which have hidden clusters and wrongly classified
observations, as indicated by the partitions in the dbs plot and the negative median dbs values.
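To make the reading of positive and negative dbs values concrete, here is a minimal sketch of how a density based silhouette could be computed for a given partition. It assumes the usual definition: for each observation, the log ratio between the posterior probability of its own cluster and the largest posterior probability among the other clusters, divided by the largest absolute log ratio in the sample. The function names, the bandwidth 0.5 and the simulated data are hypothetical, at least two clusters are required, and this is not the PdfCluster code.

```python
import numpy as np

def log_kde(x, points, h):
    """Log of a product-Gaussian kernel density estimate, computed stably."""
    d = x.shape[1]
    z = (points[:, None, :] - x[None, :, :]) / h
    log_k = (-0.5 * (z ** 2).sum(axis=2)
             - np.log(h).sum() - 0.5 * d * np.log(2.0 * np.pi))
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()

def dbs(x, labels, h):
    """Density based silhouette of each observation for the partition `labels`."""
    groups = np.unique(labels)
    # log(pi_m * f_m(x_i)) for every observation i and cluster m;
    # the common normalising constant cancels in the ratio below.
    log_post = np.column_stack([
        np.log(np.mean(labels == g)) + log_kde(x[labels == g], x, h)
        for g in groups
    ])
    own = labels[:, None] == groups[None, :]
    log_own = log_post[own]                                   # assigned cluster
    log_other = np.where(own, -np.inf, log_post).max(axis=1)  # best alternative
    ratio = log_own - log_other
    return ratio / np.abs(ratio).max()

# Simulated two-cluster data (assumption, not the Cleveland data).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
               rng.normal(4.0, 1.0, size=(40, 2))])
labels = np.repeat([1, 2], [60, 40])
scores = dbs(x, labels, h=np.array([0.5, 0.5]))
medians = {int(g): float(np.median(scores[labels == g])) for g in np.unique(labels)}
```

A negative value means that another cluster has a higher estimated posterior probability at that observation than the cluster it was assigned to, which is why a negative cluster median is read as widespread misclassification.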
The dbs plots and median dbs values for models with 8 and 10 components are given in Fig. 3(a)
and (b), respectively. The 8 component model has 5 clusters, whilst the 10 component model has only
2. Although the 10 component model gives the same number of clusters as the 2
principal component model, the median dbs values of both its clusters are negative, a strong indication
that the observations in these clusters were wrongly classified and ill partitioned. Therefore, given the
mean dbs values for each cluster, we conclude that the data are fitted well by the model with 2 components
and 2 clusters. The dbs graphs for the 2 component 2 cluster model are also noticeably wider
than those of any other model, an indication that most observations fall within these clusters. This is because
the clusters are well separated, have no hidden clusters and contain well classified observations. Thus,
here the number of principal components coincides with the number of clusters, although it should be noted
that different results can be found in other situations. It can be deduced that individuals exposed
to heart disease risks can be classified into 2 categories, depending on which risks they are exposed
to at a given time. These can be high risk and low risk individuals, as observed by [7] using Poisson
mixture models.
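The selection argument can be restated as a small check on the cluster median dbs values reported in the panels of Fig. 2. The sketch below simply counts, for each number of principal components p, how many clusters have a negative median; the variable and function names are hypothetical, and the values for the 8 and 10 component models are omitted because they are not legible in Fig. 3.

```python
# Cluster median dbs values as printed in Fig. 2, keyed by p.
fig2_medians = {
    2: [0.11, 0.21],
    3: [0.14, 0.21, 0.26, -0.07, -0.15],
    4: [0.4, 0.25, 0.04, -0.07, -0.12],
    5: [0.15, 0.18, -0.07, -0.12],
}

def summarise(medians):
    """Number of clusters and of clusters with a negative median, per model."""
    return {
        p: {"clusters": len(v), "negative": sum(m < 0 for m in v)}
        for p, v in medians.items()
    }

summary = summarise(fig2_medians)
# Models in which no cluster has a negative median dbs value.
well_classified = [p for p, s in summary.items() if s["negative"] == 0]
print(well_classified)  # [2]
```

This check covers only the sign of the medians; the absence of partitions within each dbs plot still has to be judged from the plots themselves.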
Fig. 2 panels (dbs plots; horizontal axis: dbs value). Cluster median dbs values:
a) 0.11, 0.21
b) 0.14, 0.21, 0.26, −0.07, −0.15
c) 0.4, 0.25, 0.04, −0.07, −0.12
d) 0.15, 0.18, −0.07, −0.12
Fig. 2. Dbs diagrams for different numbers of principal components p: a) p = 2, b) p = 3, c) p = 4
and d) p = 5
3.4.1 Cluster scatter plots
The cluster scatter plots show how observations are classified, separated and partitioned within
and between clusters. They also offer a view of the partitioning within a component
and cluster that complements the dbs diagnosis plot. To view these cluster partitions and
classifications we use Fig. 4, which shows the cluster scatter plots for models
with different numbers of components. The cluster scatter plots for the 2 and 3 component models are
shown in Fig. 4(a) and (b), respectively. The 2 component model has 2 clusters which are well
separated in the scatter plot, as the groupings can easily be identified. Another 2 component model
using a fixed bandwidth (not shown in this work) had 3 clusters which were poorly separated.
The 3 component model, on the other hand, has 5 clusters which are very difficult to identify on the
scatter plot. The same holds when the number of components is increased to 4, as
shown in Fig. 4(c), where it is very difficult to discretely separate observations among the
different clusters. This strongly indicates that the clusters are ill separated and that observations are
most likely wrongly classified. Although the 10 component model has the same number of clusters
as the 2 component model, Fig. 4(d) shows that their elements can rarely be distinguished from
each other. Thus, using cluster scatter plots, we again deduce that the 2 component 2 cluster model
is the best model for heart disease diagnosis.
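The scatter plots are drawn in the space of the retained components. As a rough sketch of how such coordinates might be obtained, the code below computes the scores on the first p principal components from a singular value decomposition of the centred data. It assumes plain PCA without rescaling of the variables; the function name and the simulated data are hypothetical, and the preprocessing actually used for the Cleveland data may differ.

```python
import numpy as np

def principal_component_scores(x, p):
    """Scores of the rows of `x` on the first `p` principal components."""
    xc = x - x.mean(axis=0)                        # centre each variable
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:p].T                           # project on the leading directions

# Simulated stand-in for the data matrix (assumption).
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 6)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.2])
scores = principal_component_scores(x, p=2)        # columns play the role of the two plot axes
```

Plotting the two columns of `scores` against each other, with each point drawn as its cluster label, gives a display of the same kind as a two component cluster scatter plot.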
Fig. 3. Dbs diagrams for different numbers of principal components p: a) p = 8, b) p = 10.
3.5 Cluster summary for different models
In this section we present a summary of all the models considered in the analysis, with their different
numbers of components and clusters, as given in Table 1. It shows the cluster properties of
the nonparametric mixture models analysed using the PdfCluster package in R. The number of clusters
varies between 2 and 5 as the number of components increases. However, increasing the number of
components seems to increase the number of spurious clusters and of wrongly classified observations,
as indicated by the growing number of clusters with negative median dbs values, thus reducing the
cluster quality. Judging from the cluster properties in the table, the best clusters are produced by the
model with 2 components, as it has no spurious clusters and all of its median dbs values are positive,
as we have already seen. Although the 7 and 10 component models produce the same number of clusters
as the 2 component model, both clusters from the 7 component
model are spurious and have very low positive median dbs values compared with the 2 component
model. This indicates poor cluster separation and relatively high observation misclassification. In
the same manner, the 10 component model has negative dbs values for both of its clusters, a strong
indication that observations were wrongly classified. We therefore conclude that individuals at risk
of heart disease can be classified by a 2 component nonparametric density model with 2 clusters.
3.6 Best model properties (2 Component 2 Cluster Model)
The best model for the heart disease data is the 2 component 2 cluster model. In this section we
explore some of the properties of the chosen
model to gain a better understanding of its clusters. This involves examining the properties
of each of the clusters produced by the PdfCluster algorithm. Table 2 shows a summary of some
of the statistical measures for the model.
The table summarises the dbs information for each cluster as well as for the whole data. It shows
that cluster 2 has high median and mean dbs values of 0.21 and 0.29, respectively, compared to
cluster 1, which has a value even lower than that of the whole sample data. Cluster 2 also has a high
maximum dbs of 1, indicating a compact, well clustered group.
[Fig. 4, cluster scatter plots; each observation is plotted as its cluster label. a) V1 against V2, labels 1 and 2. b) Pairwise plots of var 1, var 2 and var 3, labels 1 to 5. c) Pairwise plots beginning with var 1, var 2 and var 3, labels 1 to 5.]