Healthy Cognitive Aging: a Hybrid Random Vector Functional-Link Model for the
Analysis of Alzheimer's Disease
Peng Dai1,2,3, Femida Gwadry-Sridhar1,2,3, Michael Bauer1, Michael Borrie4, Xue Teng3, for the ADNI
1Department of Computer Science, University of Western Ontario, London, ON, Canada
2Robarts Research, London, ON, Canada
3Pulse Infoframe Inc., London, ON, Canada
4Division of Geriatric Medicine, University of Western Ontario, London, ON, Canada
Alzheimer’s disease (AD) is a genetically complex neurode-
generative disease, which leads to irreversible brain damage,
severe cognitive problems and ultimately death. A number
of clinical trials and study initiatives have been set up to
investigate AD pathology, leading to large amounts of high
dimensional heterogeneous data (biomarkers) for analysis.
This paper focuses on combining clinical features from dif-
ferent modalities, including medical imaging, cerebrospinal
fluid (CSF), etc., to diagnose AD and predict potential pro-
gression. Due to privacy and legal issues involved with clin-
ical research, the study cohort (number of patients) is rela-
tively small, compared to thousands of available biomark-
ers (predictors). We propose a hybrid pathological analysis
model, which integrates manifold learning and Random Vec-
tor functional-link network (RVFL) so as to achieve better
ability to extract discriminant information with limited train-
ing materials. Furthermore, we model (current and future)
cognitive healthiness as a regression problem about age. By
comparing the difference between predicted age and actual
age, we manage to show statistical differences between dif-
ferent pathological stages. Verification tests are conducted
based on the Alzheimer's Disease Neuroimaging Initiative
(ADNI) database. Extensive comparison is made against dif-
ferent machine learning algorithms, i.e. Support Vector Ma-
chine (SVM), Random Forest (RF), Decision Tree and Mul-
tilayer Perceptron (MLP). Experimental results show that our
proposed algorithm achieves better results than the compari-
son targets, which indicates promising robustness for practi-
cal clinical implementation.
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database ( As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report.

Copyright © 2017, Association for the Advancement of Artificial Intelligence ( All rights reserved.

According to the 2015 World Alzheimer Report, there are an estimated 46 million people worldwide living with dementia, at a total cost of over $818 billion, which is estimated to increase to a trillion dollars by 2018 (Alzheimer's Disease International, 2015). Alzheimer's disease (AD) is one of the most common causes of dementia, accounting for about 60% of the total. The disease presents a tremendous burden
and challenge to public health, health care delivery, social
services and the family (Alzheimers Disease International,
2015). AD usually develops in situ while the patient is cog-
nitively normal. At some point in time, sufficient brain dam-
age accumulates to result in cognitive symptoms, which may
further deteriorate to disability and ultimately death. There is
currently no effective cure to reverse the damages caused by
Alzheimer’s. Treatments are mainly to ease cognitive symp-
toms, delay progression and improve quality of life via as-
sistive technologies. Therefore, it is crucial to diagnose or predict AD as early as possible so as to allow treatment to start early, which helps patients maintain cognitive function.

Clinical diagnosis of AD often includes establishing the
presence of dementia, amnesia and a deficit in one or
more cognitive functions, such as skilled movements (limb
apraxia), language (aphasia) or executive function (e.g.,
planning, attention and abstract reasoning)(Scott and Barrett
2007; American Psychiatric Association, 2013). The diag-
nosis process is complex, which involves a number of as-
sessments, e.g. medical history review, physical examina-
tion, neurological examination, cognitive testing, laboratory
testing and brain imaging. Physicians usually evaluate the above mentioned tests based on experience together with quantitative guidelines. This is especially challenging for early AD patients without clear cognitive symptoms.
With recent advances in artificial intelligence, evidence
has shown that effective application of machine learning al-
gorithms can greatly improve the efficiency of many tasks.
Machine learning offers valuable tools for advanced diag-
nostic techniques, which can assist the clinicians to bet-
ter understand the information underlying the high dimen-
sional heterogeneous medical variables. Automatic diagno-
sis of AD can be formulated as a classification problem. The
problem is particularly challenging due to the inherent dif-
ficulty in distinguishing between normal aging, mild cogni-
tive impairment (MCI), and early signs of AD. For example,
patients with dementia may not complain of cognitive dif-
ficulty owing to loss of self-awareness, while patients with
depression often complain of memory difficulties and seek
medical attention on their own initiative (Scott and Barrett 2007).

Because of the rapid development of medical technology, a
number of medical tests have been developed for AD anal-
ysis, which yields a large amount of high dimensional het-
erogeneous data. However, due to privacy and legal issues of
clinical research, the study cohort (i.e. the number of avail-
able patients) is relatively small. State-of-the-art machine learning algorithms, e.g. deep learning, can usually achieve very good performance if well trained (Hinton et al. 2012), but they require a large amount of training material, i.e. a large cohort. Therefore, we have to balance
between algorithm complexity and required data. In this pa-
per, we propose a hybrid system which combines manifold
learning and random vector functional link network (RVFL)
to achieve better ability to capture high dimensional non-
linear information from clinical data. Different from tradi-
tional Artificial Neural Network (ANN), RVFL sets most of
the parameters completely at random, which do not need to
be tuned during training. Besides, manifold learning is inte-
grated as part of the system, which helps to construct a better
representation of the high dimensional heterogeneous data.
Combined with manifold learning, RVFL is able to obtain a satisfying approximation of the original problem. It is particularly suitable for problems where only limited data are available.

Related Work
Alzheimer's disease progressively damages the human brain, causing massive brain cell death and thus atrophy in various brain regions. Magnetic resonance imaging (MRI)
techniques utilize strong magnetic fields to form anatomi-
cal images of the body, which provides a valuable tool to
directly observe brain changes such as cerebral atrophy or
ventricular expansion. Therefore, MRI has become one of
the most widely used means to assist AD diagnosis. A large
amount of work has been done about applying image pro-
cessing techniques to MRIs. For example, Keraudren et al.
proposed to use Scale-Invariant Feature Transform (SIFT) to
analyze brain atrophy (Keraudren et al. 2013). Another im-
portant approach is to establish 3D brain model and extract
volumetric information of various brain regions. FreeSurfer
is one of the most commonly used packages for the analysis
and visualization of structural and functional neuroimaging
data (Dale, Fischl, and Sereno 1999). In our current implementation, brain volume information together with genome and demographics (age, gender, education) forms the feature vector.

The diagnosis of Alzheimer's disease can be formulated
as a classification problem, where the clinical diagnosis can
serve as labels and the high dimensional medical variables
can serve as features. Therefore, a number of related works have been reported during the past decades. Lebedev et al. utilized the random forest algorithm for AD diagnosis (Lebedev et al. 2014). López et al. utilized support vector machine (SVM) to detect early signs of AD (López et al.
2011). Feature selection algorithms, e.g. statistical signifi-
cance, are widely used for dimension reduction in clinical
studies. Recently, manifold learning algorithms have been introduced into such studies. Conventional manifold learning
refers to nonlinear dimensionality reduction methods based
on the assumption that high-dimensional input data are sam-
pled from a smooth manifold so that one can embed these
data into the low-dimensional manifold while preserving
some structural (or geometric) properties that exist in the
original input space (Lin and Zha 2008). Instead of remov-
ing redundant feature dimensions, manifold learning algo-
rithms construct a low dimensional representation based on
the original data. López et al. implement PCA as part of their system (López et al. 2011). Dai et al. proposed an improved isometric mapping algorithm for feature embedding and utilized ensemble learning algorithms for similar tasks (Dai et al. 2015; 2016a).
Recently, deep learning has become one of the most pow-
erful machine learning techniques, which has shown supe-
rior performance in various practical applications, e.g. nat-
ural language processing and image recognition (Hinton et
al. 2012; LeCun, Bengio, and Hinton 2015). Inspired by its
promising performance, researchers have been trying to im-
plement deep learning in dementia research. Li et al. pro-
posed a hybrid system which combined principal component
analysis (PCA) and deep learning autoencoder to extract dis-
criminative features for AD diagnosis (Li et al. 2015). Payan
et al. proposed a deep learning algorithm based on 3D con-
volution of MRI images (Payan and Montana 2015). Dai et al. utilized a multilayer perceptron (MLP) for AD diagnosis and prognosis (Dai et al. 2016b). It has to be noted that all the above mentioned works mainly implement a relatively 'easy' or 'shallow' version of the deep learning algorithms. They use only a small number of hidden layers and rarely include complex structures, e.g. convolutional layers. This is
mainly due to the fact that the study cohort is relatively small
compared with the imaging database used in deep learning
studies. Therefore, we have to balance between algorithm
complexity and the issues caused by limited training data.
In this paper, we study Random vector functional link net-
work (RVFL), in which only the output weights are chosen
as adaptable parameters, while the remaining parameters are
constrained to random values independently selected in ad-
vance (Husmeier 1999). RVFL simplifies the artificial neu-
ral network as a linear regression problem on top of a series
of randomly assigned transition functions (hidden layers),
which is an efficient approximation of the original nonlinear
optimization problem.
Problem Formulation
As the patient develops AD, there are pathological changes
in various regions of the human brain, which can be mea-
sured by volumetric changes. Besides, medical history, labo-
ratory testing, physical examination and cognitive testing are
all closely related to the final diagnosis (American Psychi-
atric Association, 2013). All these medical variables form the original feature vector, f. There are in total N_p participants in the study. For each participant u, the diagnostic analysis is repeated (follow-up medical tests) every 6 months, which forms a feature vector f of dimension 1 × (r · n), where n is the number of features obtained in each test and r is the number of follow-up tests.
There are generally two problems in Alzheimer’s disease
research, i.e. diagnosis and prognosis. Diagnosis intends to
identify if the patient is cognitively normal or AD (see Prob-
lem 1). Prognosis is to tell how the patient will evolve in
AD pathology (see Problem 2). For example, if the patient
is currently healthy, the prognosis task is to determine if the
patient will stay healthy or likely develop AD.
Problem 1 (Diagnosis) Given different patients, described
as feature B, how to decide the patient’s pathological sta-
tus, e.g. Healthy, Mild Cognitive Impairment (MCI), or
Alzheimer’s disease (AD)?
Problem 2 (Prognosis) Given a patient, described as B,
and his/her historical mental status label, D, how to pre-
dict if the patient will stay at the same stage or progress in
the pathological path?
The aging process can be understood as an interactive
process between the human body and the environment, de-
scribed as a sequence of medical variables. Diagnosis is to
reveal the current cognitive status and thus a classification
problem. For prognosis, a straightforward approach is to construct a time series model, e.g. a Hidden Markov Model, to capture the temporal evolution trajectory. Nevertheless, due to the lack of valid data, it is very difficult to fully train advanced time series models. We instead formulate the prognosis problem as a classification problem based on the sequence of clinical diagnoses from ADNI. The prognosis is generated as 'progression' or 'no progression'. In the prognosis task, we group the current and preceding observations, {B_{t,f}, B_{t-1,f}}, to form the new feature vector so as to account for the temporal cognitive changes.
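A minimal sketch of this coupled-observation construction (the function and array names below are hypothetical, not from the paper):

```python
import numpy as np

def pair_visits(visits):
    """Stack each visit's features with those of the preceding visit.

    visits: (r, n) array of r follow-up observations with n features each.
    Returns an (r - 1, 2 * n) array of coupled observations {B_t, B_{t-1}}.
    """
    visits = np.asarray(visits)
    return np.hstack([visits[1:], visits[:-1]])

# Example: 4 visits with 3 features each -> 3 coupled vectors of length 6
B = np.arange(12).reshape(4, 3)
X = pair_visits(B)   # X[0] couples visit 1 with visit 0
```

Each row of the resulting matrix then serves as one classification sample for the progression / no-progression task.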
Problem 3 (Healthy Aging) Given different patients, described as feature B, what causes some people to develop AD while other people remain healthy?
Moreover, we investigate a third problem about healthy
aging within our proposed framework. With aging, various functions of the human body gradually degrade. The healthy aging problem intends to explain the differences that characterize the various aging pathways, i.e. healthy aging or dementia.
Figure 1 shows the diagram of our proposed system. It mainly consists of two parts, i.e. manifold learning and the Random Vector Functional-Link network (RVFL), which are discussed in detail in the following sections.

Manifold Learning
The available clinical data are high dimensional heteroge-
neous, which are obtained from different sources, e.g. med-
ical imaging and blood tests. It is of vital importance to pre-
process the data so as to remove noise, normalize scaling
factors, etc. Another important step is to reduce feature di-
mension. Because of the curse of dimensionality, the data
required to fully represent the hidden mechanism increase
exponentially as the feature dimension increases. Therefore,
dimension reduction is one of the simplest and most effective ways to boost system performance.
Manifold learning algorithms are designed to construct a
low dimensional representation, which preserves the topological or structural properties of the original data (Lee and Verleysen 2007).

Figure 1: Schematic overview of the proposed methodology (clinical features, e.g. amyloid-β, age and ApoE, are processed by manifold learning and an RVFL network to produce diagnosis & prognosis and aging-speed estimates).

In this paper, we compare the performance
of different manifold learning algorithms in our RVFL
based framework, including Principal Component Anal-
ysis (PCA), Neighborhood Preserving Embedding (NPE)
(Xiaofei He et al. 2005), Locality Preserving Projections
(LPP) (Xiaofei He 2003) and stochastic proximity embed-
ding (SPE). NPE and SPE are based on neighborhood graphs. Our experimental results show that LPP achieves the best performance in our current framework.
Locality Preserving Projection (LPP) is a linear approx-
imation of the nonlinear Laplacian Eigenmap (Belkin and
Niyogi 2001). A neighborhood graph is first constructed with weights defined as

W_{i,j} = e^{-||f_i - f_j||},

where f_i is the feature vector of patient i. Then, the LPP embedding can be calculated from the generalized eigenvector problem

F L F^T a = λ F D F^T a,

where F is the matrix whose columns are the feature vectors f_i, D is a diagonal matrix whose entries are the column sums of W, D_ii = Σ_j W_ji, and L = D − W is the Laplacian matrix.
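As a concrete sketch (not the authors' implementation), the LPP projection can be computed with NumPy/SciPy as follows; the k-nearest-neighbor rule and the small regularization term are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(F, n_components=2, n_neighbors=5):
    """Minimal Locality Preserving Projections sketch.

    F: (n_samples, n_features) data matrix (rows are patients' feature vectors).
    Returns a projection matrix of shape (n_features, n_components).
    """
    n = F.shape[0]
    # pairwise squared distances and heat-kernel weights on a kNN graph
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]     # skip self at position 0
        W[i, idx] = np.exp(-np.sqrt(d2[i, idx]))
    W = np.maximum(W, W.T)                             # symmetrize the graph
    D = np.diag(W.sum(axis=0))                         # degree matrix
    L = D - W                                          # graph Laplacian
    # generalized eigenproblem F^T L F a = lambda F^T D F a
    M1 = F.T @ L @ F
    M2 = F.T @ D @ F + 1e-9 * np.eye(F.shape[1])       # regularize for stability
    vals, vecs = eigh(M1, M2)
    return vecs[:, :n_components]                      # smallest eigenvalues
```

Projecting the data is then a single matrix product, `F @ lpp(F)`, which yields the low-dimensional embedding fed to the classifier.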
Random vector functional-link (RVFL) network
The idea of functional link network was suggested by Pao
and co-workers in 1988 (Klassen, Pao, and Chen 1988). A
typical artificial neural network consists of a linear link of
inputs together with nonlinear activation functions, while
Pao and co-workers suggested that a link can also be non-
linear. In a semi-linear net, the pattern vector at any layer is
multiplied linearly by a matrix of link weights to yield the
vector input to the next layer. Pao suggested that a nonlinear
functional transform be carried out along a nonlinear func-
tional link to yield a new pattern vector in a larger space
(Klassen, Pao, and Chen 1988). Since functional link net-
work incorporates nonlinearity by variations of additional
input nodes, generally ‘flat’ nets with no hidden layers are
sufficient for most practical tasks (Klassen, Pao, and Chen 1988).

Random vector functional link network (RVFL) is one of the practical implementations of the functional link network.
Figure 2: Diagram of (a) hidden-layer network architecture (d input nodes, N hidden nodes, k output nodes) and (b) random vector functional-link network (RVFL) architecture (d input nodes, N random enhancement neurons, equivalent to hidden nodes, k output nodes, with output weights β_1, β_2, ..., β_j).
It is a multilayer perceptron (MLP) in which only the output weights are chosen as adaptable parameters, while the remaining parameters are constrained to random values independently selected in advance (Husmeier 1999). A standard single-layer neural network can be modeled as

y = β g(Ax + b),

where A and b are the input weights and biases, g(·) is the activation function and β contains the output weights. The random-vector version of the functional-link net generates A and b randomly and learns only β.
Figure 2(b) shows the net architecture of a single hidden
layer RVFL net. Although random features have generally been believed to be less powerful than learned features, they have shown reasonable success in many practical applications (Saxe et al. 2011; Rahimi and Recht 2008). Recently, Huang et al. extended RVFL to the extreme learning machine (ELM), which achieves very promising results with a simple network structure (Huang, Zhu, and Siew 2006). Random parameters significantly reduce algorithmic complexity, since a large fraction of the parameters are randomly selected and do not need to be tuned. The RVFL
can also be considered to consist of input and output lay-
ers, but no hidden layers. An input layer has enhanced input
values which are created by various functional links with
original input values.
The key advantage of RVFL-related approaches is the ability to obtain promising results with limited data, where state-of-the-art algorithms, e.g. deep learning, probably cannot be properly trained. In our present implementation, we use the Extreme Learning Machine (ELM) version of RVFL 1.
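A minimal RVFL sketch following the description above (random input weights, closed-form ridge readout for β; the names and the ridge parameter are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_train(X, T, n_enh=100, reg=1e-3):
    """Train a minimal RVFL: random enhancement nodes plus a ridge readout.

    X: (n_samples, d) inputs; T: (n_samples, k) targets.
    A and b are drawn at random and never tuned; only beta is learned.
    """
    d = X.shape[1]
    A = rng.normal(size=(d, n_enh))   # random input weights
    b = rng.normal(size=n_enh)        # random biases
    H = np.tanh(X @ A + b)            # enhancement (hidden) features
    D = np.hstack([X, H])             # direct input links + enhancement nodes
    # closed-form ridge-regression solution for the output weights beta
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ T)
    return A, b, beta

def rvfl_predict(X, A, b, beta):
    return np.hstack([X, np.tanh(X @ A + b)]) @ beta
```

Because only β is learned, training reduces to a single linear solve, which is why the approach remains trainable on small cohorts where deep networks would overfit or fail to converge.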
Healthy Aging
Although people are on different aging paths (healthy or dementia), there will always be degradation in various functions of daily living. Patients with dementia 'move faster' in the aging process than their healthy aging counterparts. Therefore, a straightforward approach to evaluating cognitive health is to study the 'pathological' aging status.
A regression model is constructed based on RVFL to estimate the patient's age,

A_p = RVFL(B),

where B is the feature matrix consisting of the relevant medical variables. The model is trained using the healthy participants. We assume the predictive power of our proposed algorithm is reasonably good. Therefore, the predicted age can
reflect the actual status of the patient’s brain. When applied
to an AD patient, it reflects how old the patient should be if
he/she is healthy.
Then, we study the difference between the predicted age
and the actual age.
A_dif = A_p − A_real,   (5)

where A_real is the actual age and A_p is the predicted age. Based on our study, A_dif follows a normal distribution. It reflects
the aging speed of the target patient and shows different sta-
tistical properties for different AD stages. More details will
be given in the results section.
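The aging-speed computation and normality check can be sketched as follows; the predicted ages here are synthetic placeholders (drawn using the moments reported later in the paper) standing in for the RVFL regression output:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Placeholder for the RVFL regressor's output: actual ages plus an error term
# drawn with the reported moments (mean 1.33, std 8.72) for illustration only.
actual_age = rng.uniform(55, 90, size=500)
predicted_age = actual_age + rng.normal(1.33, 8.72, size=500)

a_dif = predicted_age - actual_age    # Equation (5): A_dif = A_p - A_real

# Normality checks in the spirit of the Kolmogorov-Smirnov / Anderson-Darling tests
z = (a_dif - a_dif.mean()) / a_dif.std()
ks_stat, ks_p = stats.kstest(z, "norm")
ad_res = stats.anderson(a_dif, dist="norm")  # normal if statistic < critical values
print(f"mean(A_dif)={a_dif.mean():.2f}, std(A_dif)={a_dif.std():.2f}, KS p={ks_p:.3f}")
```

A large KS p-value (and an Anderson-Darling statistic below its critical values) is consistent with modeling A_dif as normally distributed, as reported in the results section.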
Results and Discussion
In this section, detailed descriptions about the database and
experiment settings are presented. Extensive comparisons are made to demonstrate the performance of our proposed algorithm.

Data acquisition and pre-processing
Verification tests are performed based on the Alzheimer's
Disease Neuroimaging Initiative (ADNI) database
( The ADNI was launched in 2003
by the National Institute on Aging (NIA), the National In-
stitute of Biomedical Imaging and Bioengineering (NIBIB),
the Food and Drug Administration (FDA), private pharma-
ceutical companies and non-profit organizations, as a $60
million, 5-year public/private partnership. The primary goal
of ADNI has been to test whether serial magnetic resonance
imaging (MRI), positron emission tomography (PET), other
biological markers, and clinical and neuropsychological
assessment can be combined to measure the progression
of mild cognitive impairment (MCI) and early Alzheimer's disease (AD) (Weiner et al. 2012).
The Principal Investigator of this initiative is Michael W.
Weiner, MD, VA Medical Center and University of California - San Francisco. ADNI is the result of the efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects, but ADNI has been followed by ADNI-GO and ADNI-2. For up-to-date information, see .

1 The implementation codes are provided by the authors at (Huang, Zhu, and Siew 2006).
Experiment setup
As described in previous sections, there are multiple feature points for the same patient, corresponding to the patient's different visits to an ADNI site. After removing invalid entries, there are in total 2158 data points, with 636 Healthy Control (HC) records, 1056 MCI records, and 466 AD records. Ten-fold cross validation is used in our experiments. Comparison
is made against Multilayer Perceptron (MLP), Support Vec-
tor Machine (SVM), Random Forest (Breiman 2001) and
Decision Tree. The optimal result of MLP is achieved with 2
hidden layers. SVM is implemented with Radial basis func-
tion (RBF) kernel.
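The evaluation protocol can be sketched with scikit-learn (a hypothetical stand-in for the authors' setup; the features and labels below are random placeholders and the hyperparameters are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))            # placeholder feature matrix
y = rng.integers(0, 3, size=300)          # placeholder HC/MCI/AD labels

models = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Decision Tree": DecisionTreeClassifier(),
    "MLP (2 hidden layers)": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # ten-fold cross validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With the real ADNI features in place of the random placeholders, the same loop yields the per-classifier accuracies compared in Table 2.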
Experimental results
Aging Speed
Alzheimer’s disease is a geriatric disease, and thus age is a
strong risk factor of the disease. Since aging is (to the best of our knowledge) inevitable, what makes the difference
is aging speed. Aging speed is defined as the difference be-
tween structural age (predicted age) and the demographic
age (actual age). Higher aging speed indicates more like-
lihood (or vulnerability) to dementia. We fit the proposed
algorithm into a regression task to estimate the patient age
based on the healthy control set. Then, we calculate the estimation difference, A_dif, as in Equation (5). Figure 3 shows the results. It can be seen that at younger ages (e.g. <70) the predicted age tends to be smaller than the actual age, while at older ages (e.g. >70) the predicted age tends to be larger than the actual age. This is mainly due to the fact that age is a strong indicator of AD; therefore, there are more occurrences in the senior population. In our current experiment settings, mean(A_dif) = 1.33 and std(A_dif) = 8.72.
Based on normality tests, i.e. Kolmogorov-Smirnov test and
Anderson-Darling test, the difference between predicted age
and real age can be modeled by a normal distribution.
Figure 3: Predicted age vs. actual age (legend: predicted age ± std).
The mean estimation differences, mean(A_dif), for HC, MCI and AD are -1.02, 1.65 and 2.44, respectively. It can be
seen that HC participants tend to be older than the predicted
age, while MCI and AD patients tend to be younger than
the predicted age. The physical meaning of the results is that healthy people possess a younger brain (or one of nearly the same age). In contrast, the brain of an AD patient appears older than the patient's actual age. Besides, the estimates for MCI patients are more concentrated around the predicted age, while those for AD patients are more scattered. Different pathological phases show different statistical properties. Although HC, MCI and AD participants show different mean estimation differences, the corresponding standard deviations are relatively large: 4.35, 6.31 and 6.71, respectively. There is large overlap between neighboring categories. This is mainly due to the complex pathology of AD.
Although severe brain damage can lead to cognitive dis-
order, there is no direct causal relationship between struc-
tural anomaly and dementia symptoms in geriatric cohorts.
The underlying anatomical mechanism of dementia (cog-
nitive disorder) is still unclear. The majority of the input
features are brain volumes extracted from MRIs. Therefore,
our proposed framework is more suitable to identify struc-
tural anomaly. When it comes to cognitive disorder, cogni-
tive assessment scores, e.g. Mini Mental State Examination
(MMSE), may be more suitable, since they directly answer
all the criteria (in the form of interactive questionnaires) of
clinical dementia diagnosis. However, the objective of our
research project is to help identify risk factors associated
with brain structural changes and other related biomark-
ers, while cognitive assessments treat internal pathological
changes as a black box. This work offers a valuable tool
to model the aging process as different pathways featured
by various aging speed, which can be calculated based on
anatomical properties of the brain. It explains how brain
pathological aging affects the aging pathways of different
patients (Problem 3).
Automatic Diagnosis One of the most important prob-
lems in AD study is diagnosis. The key contribution of this
paper is an automatic diagnosis system based on RVFL net-
work. Manifold learning is incorporated as part of the sys-
tem to remove noise and construct low dimensional repre-
sentation of the original high dimensional data. Figure 4
shows the comparison results. In the experiments all the re-
sults are based on RVFL. We compare the results from prin-
cipal component analysis (PCA), Neighborhood Preserving
Embedding (NPE), Locality Preserving Projections (LPP)
and stochastic proximity embedding (SPE). It can be seen that manifold learning algorithms generally improve the performance of the classification algorithm. In particular, NPE and LPP improve the system performance by about 2%. The optimal results are achieved at about 40 selected dimensions.

Table 1 gives the experimental results. It can be seen
that our proposed algorithm achieves very promising results,
overall 93.28% accuracy. The precisions for HC, MCI and
AD are 92.91%, 92.21% and 96.64%, respectively. The sen-
sitivities for HC, MCI and AD are 94.81%, 95.36% and
86.48%, respectively. Although the sensitivity for AD is relatively low, 86.48%, AD is mainly misclassified as MCI, which is not very harmful. It has to be noted that the progression of AD is a gradual process, which may take decades. The boundaries between different pathological phases are relatively vague. Therefore, classification of the transition stages is very challenging.

Figure 4: Comparison of different manifold learning algorithms.
Table 1: Confusion Matrix, ten-fold cross validation.

                     Predicted Class
                HC      MCI     AD      Rate
True   HC       603     31      2       94.81%
Class  MCI      37      1007    12      95.36%
       AD       9       54      403     86.48%
Rate            92.91%  92.21%  96.64%  93.28%
Extensive comparison is made against Support Vector
Machine (SVM), Random Subspace (RS), Multilayer Per-
ceptron (MLP), Random Forest and Decision Tree. Table 2
shows the results for comparison targets. It can be seen that
our proposed algorithm achieves superior results. The im-
provements are 9.45%, 3.95%, 9.07%, and 25.91%, respec-
tively. Besides, we also show that with only medical imaging
data (‘Imaging’ in Table 2), the proposed system achieves
84.13% accuracy. The integration of multiple sources of medical data shows clear synergy and adds value to the overall performance.
Table 2: Recognition results for comparison targets (%).

Accuracy    83.83   89.33   84.21   67.37   84.13
Rel. Imp.    9.45    3.95    9.07   25.91    9.15
Another important problem in Alzheimer’s disease research
is prognosis, i.e. the prediction of AD progression (Problem
2). In this task, only patients with clear pathological progression are included in the experiments. Besides, since we utilize coupled observations as input, only those patients with at least two consecutive observations at the same stage are chosen. There are in total 425 valid records in the prognosis task.
Table 3: Confusion Matrix for prognosis, ten-fold cross validation.

                         Predicted Class
                    Progression   No Prog.   Rate
True   Progression       32          26      55.17%
Class  No Prog.           1         366      99.73%
Rate                   96.97%      93.37%    93.65%
Table 3 shows the confusion matrix for the prognosis task.
It can be seen that our proposed algorithm achieves a nearly
perfect result to predict no progression, 99.73%. However,
when it comes to progression, the results are two-fold. On
the one hand, the proposed algorithm achieves very high
precision, 96.97%. On the other hand, the sensitivity is rel-
atively low, 55%. This is mainly due to the fact that clinical
diagnosis of various stages of Alzheimer’s consists of many
subjective decisions. Moreover, the anatomical structures of the brain are closely correlated with dementia symptoms. However, there is no clear qualitative causal relationship between structural change and the corresponding symptoms. Besides, as shown in the table, the data in the prognosis problem are extremely unbalanced. The 'progression' category contains less than one tenth as many records as the 'No Progression' category, which makes classifiers tend to overfit on the 'No Progression' class.
Conclusion

In this study, we show that the Random Vector Functional-link
(RVFL) network is very suitable for Alzheimer’s disease
analysis, due to its ability to incorporate nonlinear relation-
ship with a single layer structure. The proposed algorithm
achieves very promising results in the diagnosis task. On
the other hand, our proposed algorithm achieves very high
prognosis precision with relatively low sensitivity. This may
due to the complex nature of Alzheimer’s disease pathology.
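The appeal of the single-layer structure is that only the output weights are learned, in closed form. The sketch below is an illustrative NumPy implementation of a generic RVFL with direct input-output links and ridge-regularized output weights; it is not the exact configuration used in our experiments, and the function names are ours:

```python
import numpy as np

def rvfl_train(X, y, n_hidden=100, reg=1e-6, seed=0):
    """Train a basic RVFL: random hidden weights stay fixed;
    only the output weights are solved in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random, never tuned
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                            # nonlinear random features
    D = np.hstack([X, H])                             # direct links + hidden layer
    # ridge-regularized least squares for the output weights
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return np.hstack([X, np.tanh(X @ W + b)]) @ beta
```

Because no backpropagation is involved, training reduces to a single linear solve, which keeps the model practical for small clinical cohorts.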
Moreover, we present a novel analysis framework to study the aging speed of the participants, which closely follows the biological aging process. Preliminary results show the potential of aging speed as a strong indicator for diagnosis and prognosis, as well as a tool to assess future machine learning algorithms. Future work will focus on the investigation of a complete aging model to describe different aging styles.
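As a minimal illustration of the aging-speed idea (the numbers below are hypothetical, not from ADNI): the gap between the age predicted by the healthy-aging regression model and the chronological age serves as the indicator, with a positive gap suggesting faster-than-normal cognitive aging:

```python
import numpy as np

# Hypothetical values: age predicted by a healthy-aging regression model
# vs. each participant's chronological age.
predicted_age = np.array([72.5, 80.1, 68.3])
actual_age    = np.array([70.0, 74.0, 69.0])

aging_gap = predicted_age - actual_age   # > 0: aging faster than the healthy norm
```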
Acknowledgments
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). Please refer to the ADNI website for more details.
References
Alzheimer’s Disease International. 2015. World Alzheimer Report 2015: The Global Impact of Dementia.
American Psychiatric Association. 2013. Diagnostic and Statistical Manual of Mental Disorders. 5th edition.
Belkin, M., and Niyogi, P. 2001. Laplacian eigenmaps and
spectral techniques for embedding and clustering. In Ad-
vances in Neural Information Processing Systems 14, 585–
591. MIT Press.
Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.
Dai, P.; Gwadry-Sridhar, F.; Bauer, M.; and Borrie, M. 2015.
A hybrid manifold learning algorithm for the diagnosis and
prognostication of Alzheimer’s disease. In AMIA 2015 An-
nual Symposium, San Francisco, CA, USA, 14-18 Nov.
Dai, P.; Gwadry-Sridhar, F.; Bauer, M.; and Borrie, M.
2016a. Bagging Ensembles for the Diagnosis and Prognosti-
cation of Alzheimer’s Disease. In the Thirtieth AAAI Confer-
ence on Artificial Intelligence (AAAI-16), Phoenix, Arizona,
USA, 12-17 Feb.
Dai, P.; Gwadry-Sridhar, F.; Bauer, M.; and Borrie,
M. 2016b. Longitudinal Brain Structure Changes
in Health/MCI Patients: A Deep Learning Approach
for the Diagnosis and Prognosis of Alzheimer’s Dis-
ease. In Alzheimer’s Association International Conference
(AAIC),Toronto, ON, Canada, 24-27 July.
Dale, A.; Fischl, B.; and Sereno, M. I. 1999. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9(2):179–194.
Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.-r.;
Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.;
and Kingsbury, B. 2012. Deep Neural Networks for Acous-
tic Modeling in Speech Recognition: The Shared Views of
Four Research Groups. IEEE Signal Processing Magazine 29(6):82–97.
Huang, G.-B.; Zhu, Q.-Y.; and Siew, C.-K. 2006. Extreme
learning machine: Theory and applications. Neurocomput-
ing 70(1):489–501.
Husmeier, D. 1999. Neural Networks for Conditional Prob-
ability Estimation. Springer-Verlag London.
Keraudren, K.; Kyriakopoulou, V.; Rutherford, M.; Hajnal, J. V.; and Rueckert, D. 2013. Localisation of the brain in fetal MRI using bundled SIFT features. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 16(Pt 1):582–589.
Klassen, M.; Pao, Y.-H.; and Chen, V. 1988. Characteristics of the functional link net: a higher order delta rule net. In IEEE International Conference on Neural Networks, vol. 1, 507–513.
Lebedev, A. V.; Westman, E.; Van Westen, G. J. P.; Kram-
berger, M. G.; Lundervold, A.; Aarsland, D.; Soininen, H.;
Koszewska, I.; Mecocci, P.; Tsolaki, M.; Vellas, B.; Love-
stone, S.; and Simmons, A. 2014. Random Forest ensem-
bles for detection and prediction of Alzheimer’s disease with
a good between-cohort robustness. NeuroImage: Clinical 6:115–125.
LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning.
Nature 521(7553):436–444.
Lee, J. A., and Verleysen, M. 2007. Nonlinear Dimensionality Reduction. Springer.
Li, F.; Tran, L.; Thung, K.-H.; Ji, S.; Shen, D.; and Li, J.
2015. A Robust Deep Model for Improved Classification of
AD/MCI Patients. IEEE journal of biomedical and health
informatics 19(5):1610–6.
Lin, T., and Zha, H. 2008. Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(5):796–809.
López, M.; Ramírez, J.; Górriz, J. M.; Álvarez, I.; Salas-Gonzalez, D.; Segovia, F.; Chaves, R.; Padilla, P.; and Gómez-Río, M. 2011. Principal component analysis-based techniques and supervised classification schemes for the early detection of Alzheimer’s disease. Neurocomputing.
Payan, A., and Montana, G. 2015. Predicting Alzheimer’s disease: a neuroimaging study with 3D convolutional neural networks. CoRR abs/1502.02506.
Rahimi, A., and Recht, B. 2008. Random features for large-
scale kernel machines. In Platt, J. C.; Koller, D.; Singer,
Y.; and Roweis, S. T., eds., Advances in Neural Information
Processing Systems 20. Curran Associates, Inc. 1177–1184.
Saxe, A.; Koh, P. W.; Chen, Z.; Bhand, M.; Suresh, B.; and
Ng, A. Y. 2011. On random weights and unsupervised
feature learning. In Getoor, L., and Scheffer, T., eds., Pro-
ceedings of the 28th International Conference on Machine
Learning (ICML-11), 1089–1096. New York, NY, USA.
Scott, K. R., and Barrett, A. M. 2007. Dementia syndromes:
evaluation and treatment. Expert review of neurotherapeu-
tics 7(4):407–22.
Weiner, M.; Veitch, D.; Aisen, P.; Beckett, L.; Cairns, N.;
Green, R.; Harvey, D.; Jack, C.; Jagust, W.; Liu, E.; Mor-
ris, J.; Petersen, R.; Saykin, A.; Schmidt, M.; Shaw, L.;
Siuciak, J. A.; Soares, H.; Toga, A.; and Trojanowski, J.
2012. The Alzheimer’s Disease Neuroimaging Initiative: a
review of papers published since its inception. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 8(1 Suppl):S1–S68.
He, X.; Cai, D.; Yan, S.; and Zhang, H.-J. 2005. Neighborhood preserving embedding. In Tenth IEEE International Conference on Computer Vision (ICCV’05), volume 2, 1208–1213. IEEE.
He, X., and Niyogi, P. 2003. Locality Preserving Projections. In Advances in Neural Information Processing Systems 16.