Article

Prospectives of Big Data Analytics and Explainable Machine Learning in Identification of Probable Biomarkers of Alzheimer's disease

... This system provides users with individualized, model-agnostic explanations and interactive visualizations, as well as the ability to forecast future stress behavior. In an effort to find possible biomarkers linked to Alzheimer's disease, [46] modeled the large Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset using big data technology and machine learning correlation approaches. Using a real-world dataset, [47] proposes a unique approach that combines transparent machine learning with methods to enable prescriptive analytics to identify learners who are at risk of not finishing their certificate. ...
Article
Alzheimer’s disease is still a field of research with lots of open questions. The complexity of the disease prevents early diagnosis before visible symptoms regarding the individual’s cognitive capabilities occur. This research presents an in-depth analysis of a huge data set encompassing medical, cognitive and lifestyle measurements from more than 12,000 individuals. Several hypotheses were established whose validity has been questioned considering the obtained results. The importance of appropriate experimental design is highly stressed in the research. Thus, a sequence of methods for handling missing data, redundancy, data imbalance, and correlation analysis has been applied for appropriate preprocessing of the data set, and consequently an XGBoost model has been trained and evaluated with special attention to hyperparameter tuning. The model was explained by using the Shapley values produced by the SHAP method. XGBoost produced an F1-score of 0.84 and as such is considered to be highly competitive among those published in the literature. This achievement, however, was not the main contribution of this paper. This research’s goal was to perform global and local interpretability of the intelligent model and derive valuable conclusions over the established hypotheses. Those methods led to a single scheme which presents either the positive or the negative influence of the values of each of the features whose importance has been confirmed by means of Shapley values. This scheme might be considered as an additional source of knowledge for physicians and other experts whose concern is the exact diagnosis of early-stage Alzheimer’s disease. The conclusions derived from the intelligent model’s data-driven interpretability confronted all the established hypotheses. This research clearly showed the importance of an explainable machine learning approach that opens the black box and clearly unveils the relationships among the features and the diagnoses.
Conference Paper
Timely uncovering of various dementia stages is vital in formulating effective treatment strategies for Alzheimer disease. High-resolution MRI can be progressively exploited in the classification of various stages of dementia, which in turn helps in the development of efficient therapeutic stratagems. Considering the fact that analysis of the huge volume of data is a tenacious challenge, we explored the competence of machine learning (ML) based algorithms to identify stages of dementia in afflicted patients. The employed dimensionality reduction approach relied on the cross-sectional dataset of 434 MRI sessions of 416 subjects, aged 18 to 96 years. A five-step strategy involving Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Neighbourhood Component Analysis (NCA), Factor Analysis (FA), and Fast Independent Component Analysis (FastICA) was developed. Next, numerous supervised machine learning algorithms were explored to classify the input data. The as-developed method attained an overall accuracy of 87 percent, a noteworthy improvement over the existing classical ML approach.
Conference Paper
Dementia, a fatal and progressive neurodegenerative disease, poses a huge challenge to the health care fraternity. A timely diagnosis of this intricate disorder can help in designing and formulating strategies helpful in effective regulation of the disease. There exists a huge amount of multivariate heterogeneous clinical data which can be developed by examining MRI of the brain of the afflicted patients. Discretization is a widely employed method which transforms continuous attributes into discrete attributes by building a group of contiguous intervals. This helps in extending the range of the attribute’s values. It is employed to handle the outlier problem that may have a noteworthy effect on the generalization performance of the model. In this study, we propose a machine learning-based probabilistic approach aimed to evaluate the dementia progression in cross-sectional Magnetic Resonance Imaging (MRI) data of demented and nondemented adults. We analyzed a dataset based on 416 subjects, aged 18 to 96 years. Initially, we built an ML model employing discretization and non-discretization on the MRI dataset. Next, we employed the k-means strategy and three encoding techniques, viz. ordinal, onehot-dense and onehot. Clinical Dementia Rating (CDR) score aided in determining the classes of dementia. Besides overcoming inherent adverse outlier issues, the proposed approach helped in spreading the values of a skewed attribute throughout a set of bins along with the same number of observations. We compared the performance of 18 employed ML classifiers. Experimental results on the MRI data showed noteworthy relative improvement in the pattern analysis of both discretization and non-discretization approach.
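The discretization step described in the abstract above can be sketched as follows. This is a minimal, standard-library illustration of k-means binning of one continuous attribute followed by ordinal and dense one-hot encoding. It is not the authors' code: the function names and the toy `ages` values are hypothetical, and the abstract does not say which software was used (its strategy and encoding names match the options of scikit-learn's `KBinsDiscretizer`, but that link is an assumption).

```python
def kmeans_bin_edges(values, n_bins, n_iter=100):
    """1-D k-means (Lloyd's algorithm). Returns sorted centers and the
    bin edges placed midway between neighbouring centers."""
    lo, hi = min(values), max(values)
    # Start from evenly spaced centers across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / n_bins for i in range(n_bins)]
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(n_bins), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    edges = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return centers, edges


def ordinal_encode(values, edges):
    """Bin index of each value: the number of edges it exceeds."""
    return [sum(v > e for e in edges) for v in values]


def onehot_encode(codes, n_bins):
    """Dense one-hot rows, one column per bin."""
    return [[1 if i == c else 0 for i in range(n_bins)] for c in codes]


# Hypothetical ages, for illustration only (not taken from the study data).
ages = [18, 20, 22, 25, 48, 50, 52, 55, 88, 90, 93, 96]
centers, edges = kmeans_bin_edges(ages, n_bins=3)
codes = ordinal_encode(ages, edges)
onehot = onehot_encode(codes, 3)
print(edges)   # bin boundaries learned from the data
print(codes)   # ordinal bin index per subject
```

Because the bin edges follow the cluster centers, an extreme value falls into the outermost bin instead of stretching the scale, which is how discretization limits the influence of outliers.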
Article
Over the past decade, data recorded (due to digitization) in healthcare sectors have continued to increase, prompting interest in big data in healthcare. There already exists plenty of information, ready for analysis. Researchers are always putting their best effort into finding valuable insight from healthcare big data for quality medical services. This article provides a systematic review study on healthcare big data based on the systematic literature review (SLR) protocol. In particular, the present study highlights some valuable research aspects of healthcare big data, evaluating 34 journal articles (between 2015 and 2019) according to the defined inclusion-exclusion criteria. More specifically, the present study aims to determine the extent of healthcare big data analytics together with its applications and challenges in healthcare adoption. Besides, the article discusses big data produced by these healthcare systems, big data characteristics, and various issues in dealing with big data, as well as how big data analytics contributes to achieving meaningful insight into these data sets. In short, the article summarizes the existing literature on healthcare big data, and it also provides researchers with a foundation for future study in healthcare contexts.
Article
Objective: To evaluate and compare the performance of the Random Forest (RF) ensemble classifier with and without imputation of missing data values, and its impact on diagnosing Alzheimer's disease (AD) based on longitudinal MRI data. Method: We studied 373 MRI sessions involving 150 AD subjects aged 60 to 90 years [Mean age ± SD = 77.01 ± 7.64]. T1-weighted MRI of each subject on a 1.5-T Vision scanner was used for the image acquisition. The MRI dataset was taken from the OASIS (Open Access Series of Imaging Studies) database. Based upon the MRI-acquired features in the dataset, we applied missing data imputation using the RF ensemble to classify the subjects as demented or non-demented. We then compared the two methods to determine which is more precise in the AD diagnosis. Result: RF model-based imputation analysis outperforms the RF non-imputation method in accuracy.
Article
Alzheimer’s disease (AD) is the most common type of neurological disorder that leads to the brain’s cell death over time. It is one of the major causes of memory loss and cognitive decline in elderly subjects around the globe. Early detection and streamlining of diagnostic practices are prime domains of interest to the healthcare community. Machine learning (ML) algorithms and numerous multivariate data exploratory tools have been extensively used in the field of AD research. The primary purpose of this study is to present an automated classification system to retrieve information patterns. We proposed a five-stage ML pipeline, where each stage was further categorized into different sub-levels. The study relied on the Open Access Series of Imaging Studies (OASIS) database of MRI (Magnetic Resonance Imaging) brain images for the analysis. The dataset comprised 343 MRI sessions involving 150 subjects. Three different scores, namely MMSE (Mini-Mental State Examination), CDR (Clinical Dementia Rating), and ASF (Atlas Scaling Factor), were used in the analysis. The proposed ML pipeline constitutes a classifier system along with data transformation and feature selection techniques that have been embedded inside an experimental and data analysis design. Performance metrics for the Random Forest (RF) classifier showed the highest classification accuracy.
Article
Clinical data are often multimodal and consist of both structured data and unstructured data. The modeling of clinical data has become a very important and challenging problem in healthcare big data analytics. Most existing systems focus on only one type of data. In this paper, we propose a knowledge graph based method to build the linkage between various types of multimodal data. First, we build a semantic-rich knowledge base using both medical dictionaries and practical clinical data collected from hospitals. Second, we propose a graph modeling method to bridge the gap between different types of data, and the multimodal clinical data of each patient are fused and modeled as one unified profile graph. To capture the temporal evolution of the patient’s clinical case, the profile graph is represented as a sequence of evolving graphs. Third, we develop a lazy learning algorithm for automatic diagnosis based on graph similarity search. To evaluate our method, we conduct experimental studies on ICU patient diagnosis and Orthopaedics patient classification. The results show that our method could outperform the baseline algorithms. We also implement a real automatic diagnosis system for clinical use. The results obtained from the hospital demonstrate high precision.
Conference Paper
Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
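The idea of assigning each feature an additive importance value for one prediction can be illustrated with an exact Shapley computation. The sketch below is a brute-force illustration under simplifying assumptions, not the SHAP library or its approximation algorithms. "Missing" features are replaced by a single fixed `baseline` vector (SHAP itself averages over a background distribution), and `toy_model`, `x` and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x for model(x), where
    absent features are set to the corresponding baseline value.
    Cost grows as 2**n, so this only suits a handful of features."""
    n = len(x)

    def value(subset):
        # Features in `subset` come from x, all others from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # the interaction z[1] * z[2] is split equally between features 1 and 2
```

The attributions sum to `toy_model(x) - toy_model(baseline)`, the additivity (local accuracy) property that the class of explanation models in the abstract is built around.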
Article
Big Data (BD), with their potential to ascertain valued insights for an enhanced decision-making process, have recently attracted substantial interest from both academics and practitioners. Big Data Analytics (BDA) is increasingly becoming a trending practice that many organizations are adopting with the purpose of constructing valuable information from BD. The analytics process, including the deployment and use of BDA tools, is seen by organizations as a tool to improve operational efficiency, though it also has strategic potential to drive new revenue streams and gain competitive advantages over business rivals. However, there are different types of analytic applications to consider. Therefore, prior to hasty use and buying costly BD tools, there is a need for organizations to first understand the BDA landscape. Given the significant nature of BD and BDA, this paper presents a state-of-the-art review that offers a holistic view of the BD challenges and BDA methods theorized/proposed/employed by organizations to help others understand this landscape with the objective of making robust investment decisions. In doing so, it systematically analyses and synthesizes the extant research published in the BD and BDA area. More specifically, the authors seek to answer the following two principal questions: Q1 – What are the different types of BD challenges theorized/proposed/confronted by organizations? and Q2 – What are the different types of BDA methods theorized/proposed/employed to overcome BD challenges? This systematic literature review (SLR) is carried out through observing and understanding the past trends and extant patterns/themes in the BDA research area, evaluating contributions, summarizing knowledge, thereby identifying limitations, implications and potential further research avenues to support the academic community in exploring research themes/patterns.
Thus, to trace the implementation of BD strategies, a profiling method is employed to analyze articles (published in English-speaking peer-reviewed journals between 1996 and 2015) extracted from the Scopus database. The analysis presented in this paper has identified relevant BD research studies that have contributed both conceptually and empirically to the expansion and accrual of intellectual wealth to the BDA in technology and organizational resource management discipline.
Article
We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other techniques that have been proposed. Then we give conditions under which our method should yield good results in general. Our method extends to high dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context we compare our method with two other methods. Finally, we extend our method to high dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give examples to show that our method continues to provide good results, in particular, better in the context of genome sequences than clusterings suggested by phylogenetic trees.
Article
Neurofibrillary tangles are associated with cognitive dysfunction, and hippocampal atrophy with increased CSF tau markers. However, the plasma tau levels of Alzheimer's disease (AD) have not been well studied. We investigated plasma tau by using an immunomagnetic reduction assay in 20 patients with mild cognitive impairment (MCI) due to AD, 10 early AD dementia, and 30 healthy elders (HE). All received a 3D-brain MRI scan and a set of cognitive function tests. We explored their relationships with both brain structure and cognitive functions. Images were analyzed to determine the brain volumes and gray matter densities. Patients with MCI or early AD had significantly increased plasma tau levels compared with HE. Plasma tau levels were negatively associated with the performance of logical memory, visual reproduction, and verbal fluency; also negatively associated with volume of total gray matter, hippocampus, amygdala; and gray matter densities of various regions. Regression analyses indicated that logical memory explained 0.394 and hippocampus volume predicted 0.608 of the variance of plasma tau levels, both P < 0.001. Education years were negatively associated with the gray matter densities of the supramarginal (r = -0.407), middle temporal gyrus (r = -0.40) and precuneus (r = -0.377; all P < 0.05) in HE; and negatively associated with plasma tau levels in patients (r = -0.626). We propose that plasma tau may serve as a window to both structure and function of the brain. Higher education is a protective factor against AD and is associated with lower plasma tau levels in patients. Hum Brain Mapp, 2013. © 2013 Wiley Periodicals, Inc.
Article
Alzheimer’s disease (AD) affects approximately 35 million people worldwide. Increasing evidence suggests that many risk factors for AD are modifiable. AD pathology develops over decades. Hence risk reduction interventions require very long follow-ups to show effects on AD incidence. Focussing on AD risk, instead of diagnosis, provides a more realistic target for prevention strategies. We developed a novel methodology that yields a global approach to risk assessment for AD for use in population-based settings and interventions. The methodology was used to develop a risk assessment tool that can be updated as more evidence becomes available. First, a systematic search strategy identified risk and protective factors for AD. Eleven risk factors and four protective factors for AD were identified for which odds ratios were published or could be calculated (age, sex, education, body mass index, diabetes, depression, serum cholesterol, traumatic brain injury, smoking, alcohol intake, social engagement, physical activity, cognitive activity, fish intake, and pesticide exposure). An algorithm was developed to combine the odds ratios into an AD risk score. The approach allows for interactions among risk factors, which provides for their varying impact over the life-course, as current evidence suggests midlife is a critical period for some risk factors. Finally, a questionnaire was developed to assess the risk and protective factors by self-report. Compared with developing risk indices on single cohort studies, this approach allows for more risk factors to be included, greater generalizability of results, and incorporation of interactions based on findings from different stages of the life-course. Electronic supplementary material: The online version of this article (doi:10.1007/s11121-012-0313-2) contains supplementary material, which is available to authorized users.
Article
With increasing life expectancy in developed countries, the incidence of Alzheimer's disease (AD) and its socioeconomic impact are growing. Increasing knowledge of the mechanisms of AD facilitates the development of treatment strategies aimed at slowing down or preventing neuronal death. AD treatment trials using clinical outcome measures require long observation times and large patient samples. There is increasing evidence that neuroimaging and cerebrospinal fluid and blood biomarkers may provide information that may reduce sample sizes and observation periods. The Alzheimer's Disease Neuroimaging Initiative will help identify clinical, neuroimaging, and biomarker outcome measures that provide the highest power for measurement of longitudinal changes and for prediction of transitions.
Article
Here I recap the scientific and personal background of the delineation of the amyloid cascade hypothesis for Alzheimer's disease that I wrote with Gerry Higgins and the events leading to the writing of that influential review.
Article
Background: Alzheimer disease (AD) is a degenerative progressive brain disorder where symptoms of dementia and cognitive impairment intensify over time. Numerous factors exist that may or may not be related to the lifestyle of a patient that result in a higher risk for AD. Diagnosing the disorder in its beginning period is important, and several techniques are used to diagnose AD. A number of studies have been conducted on the detection and diagnosis of AD. This paper reports the empirical study performed on the longitudinal-based magnetic resonance imaging (MRI) Open Access Series of Brain Imaging dataset. Furthermore, the study highlights several factors that influence the prediction of AD. Objective: This study aimed to correlate the effect of various factors such as age, gender, education, and socioeconomic background of patients with the development of AD. The effect of patient-related factors on the severity of AD was assessed on the basis of MRI features, Mini-Mental State Examination (MMSE), Clinical Dementia Rating (CDR), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas Scaling Factor (ASF). Methods: In this study, we attempted to establish the role of longitudinal MRI in an exploratory data analysis (EDA) of AD patients. EDA was performed on the dataset of 150 patients for 343 MRI sessions (mean age 77.01 [SD 7.64] years). The T1-weighted MRI of each subject on a 1.5-Tesla Vision (Siemens) scanner was used for image acquisition. Scores of three features, MMSE, CDR, and ASF, were used to characterize the AD patients included in this study. We assessed the role of various features (ie, age, gender, education, socioeconomic status, MMSE, CDR, eTIV, nWBV, and ASF) on the prognosis of AD. Results: The analysis further establishes the role of gender in the prevalence and development of AD in older people. Moreover, a considerable relationship has been observed between education and socioeconomic position on the progression of AD. Also, outliers and linearity of each feature were determined to rule out the extreme values in measuring the skewness. The differences in nWBV between CDR=0 (nondemented), CDR=0.5 (very mild dementia), and CDR=1 (mild dementia) are significant (ie, P<.01). Conclusions: A substantial correlation has been observed between the pattern and other related features of longitudinal MRI data that can significantly assist in the diagnosis and determination of AD in older patients.
Article
Age, apolipoprotein E ε4 (APOE) and chromosomal sex are well-established risk factors for late-onset Alzheimer’s disease (LOAD; AD). Over 60% of persons with AD harbor at least one APOE-ε4 allele. The sex-based prevalence of AD is well documented, with over 60% of persons with AD being female. Evidence indicates that the APOE-ε4 risk for AD is greater in women than men, which is particularly evident in heterozygous women carrying one APOE-ε4 allele. Paradoxically, men homozygous for APOE-ε4 are reported to be at greater risk for mild cognitive impairment and AD. Herein, we discuss the complex interplay between the three greatest risk factors for Alzheimer’s disease: age, APOE-ε4 genotype and female sex. We propose that the convergence of these three risk factors, and specifically the bioenergetic aging perimenopause to menopause transition unique to the female, creates a risk profile for AD unique to the female. Further, we discuss the unique risk of the APOE4 positive male which appears to emerge early in the aging process. Evidence for impact of the triad of AD risk factors is most evident in the temporal trajectory of AD progression and burden of pathology in relation to APOE genotype, age and sex. Collectively, the data indicate complex interactions between age, APOE genotype and gender that belie a one size fits all approach and argue for a precision medicine approach that integrates across the three main risk factors for Alzheimer’s disease.
Article
Background: Atrial fibrillation (AF) has been linked with an increased risk for cognitive impairment and dementia. Purpose: To complete a meta-analysis of studies examining the association between AF and cognitive impairment. Data Sources: Search of MEDLINE, PsycINFO, Cochrane Library, CINAHL, and EMBASE databases and hand search of article references. Study Selection: Prospective and nonprospective studies reporting adjusted risk estimates for the association between AF and cognitive impairment. Data Extraction: Two abstracters independently extracted data on study characteristics, risk estimates, methods of AF and outcome ascertainment, and methodological quality. Data Synthesis: Twenty-one studies were included in the meta-analysis. Atrial fibrillation was significantly associated with a higher risk for cognitive impairment in patients with first-ever or recurrent stroke (relative risk [RR], 2.70 [95% CI, 1.82 to 4.00]) and in a broader population including patients with or without a history of stroke (RR, 1.40 [CI, 1.19 to 1.64]). The association in the latter group remained significant independent of clinical stroke history (RR, 1.34 [CI, 1.13 to 1.58]). However, there was significant heterogeneity among studies of the broader population (I² = 69.4%). Limiting the analysis to prospective studies yielded similar results (RR, 1.36 [CI, 1.12 to 1.65]). Restricting the analysis to studies of dementia eliminated the significant heterogeneity (P = 0.137) but did not alter the pooled estimate substantially (RR, 1.38 [CI, 1.22 to 1.56]). Limitations: There is an inherent bias because of confounding variables in observational studies. There was significant heterogeneity among included studies. Conclusion: Evidence suggests that AF is associated with a higher risk for cognitive impairment and dementia, with or without a history of clinical stroke. Further studies are required to elucidate the association between AF and subtypes of dementia as well as the cause of cognitive impairment. Primary Funding Source: Deane Institute for Integrative Research in Atrial Fibrillation and Stroke at the Massachusetts General Hospital.
Article
At present, about 33·9 million people worldwide have Alzheimer's disease (AD), and prevalence is expected to triple over the next 40 years. The aim of this Review was to summarise the evidence regarding seven potentially modifiable risk factors for AD: diabetes, midlife hypertension, midlife obesity, smoking, depression, cognitive inactivity or low educational attainment, and physical inactivity. Additionally, we projected the effect of risk factor reduction on AD prevalence by calculating population attributable risks (the percent of cases attributable to a given factor) and the number of AD cases that might be prevented by risk factor reductions of 10% and 25% worldwide and in the USA. Together, up to half of AD cases worldwide (17·2 million) and in the USA (2·9 million) are potentially attributable to these factors. A 10-25% reduction in all seven risk factors could potentially prevent as many as 1·1-3·0 million AD cases worldwide and 184,000-492,000 cases in the USA.
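The population attributable risk calculation mentioned in the abstract above can be sketched for a single risk factor. The abstract does not state which formula was used; the sketch assumes Levin's standard formula, PAR = P(RR - 1) / (1 + P(RR - 1)), where P is the prevalence of the factor and RR its relative risk. The function names and every input number are hypothetical, chosen only to show the arithmetic.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula: fraction of cases attributable to one risk factor."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def cases_prevented(total_cases, prevalence, relative_risk, relative_reduction):
    """Cases attributable to the factor now, minus cases attributable after
    its prevalence is cut by `relative_reduction` (e.g. 0.10 for 10%)."""
    par_now = population_attributable_risk(prevalence, relative_risk)
    par_after = population_attributable_risk(
        prevalence * (1.0 - relative_reduction), relative_risk)
    return total_cases * (par_now - par_after)


# Hypothetical inputs: a factor present in 50% of people that doubles risk.
par = population_attributable_risk(prevalence=0.5, relative_risk=2.0)
prevented = cases_prevented(total_cases=1000, prevalence=0.5,
                            relative_risk=2.0, relative_reduction=0.10)
print(round(par, 4), round(prevented, 1))
```

This handles one factor at a time. Combining several factors is not a simple sum of their individual PARs, because risk factors overlap in the same people.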
Article
The relationships between alcohol consumption and dementia and cognitive decline were investigated in a systematic review including meta-analyses of 15 prospective studies. Follow-ups ranged from 2 to 8 years. Meta-analyses were conducted on samples including 14,646 participants evaluated for Alzheimer disease (AD), 10,225 participants evaluated for vascular dementia (VaD), and 11,875 followed for any type of dementia (Any dementia). The pooled relative risks (RRs) of AD, VaD, and Any dementia for light to moderate drinkers compared with nondrinkers were 0.72 (95% CI = 0.61-0.86), 0.75 (95% CI = 0.57-0.98), and 0.74 (95% CI = 0.61-0.91), respectively. When the more generally classified "drinkers," were compared with "nondrinkers," they had a reduced risk of AD (RR = 0.66, 95% CI = 0.47-0.94) and Any dementia (RR = 0.53, 95% CI = 0.53-0.82) but not cognitive decline. There were not enough data to examine VaD risk among "drinkers." Those classified as heavy drinkers did not have an increased risk of Any dementia compared with nondrinkers, but this may reflect sampling bias. Our results suggest that alcohol drinkers in late life have reduced risk of dementia. It is unclear whether this reflects selection effects in cohort studies commencing in late life, a protective effect of alcohol consumption throughout adulthood, or a specific benefit of alcohol in late life.
Conference Paper
Estimating statistical significance of detected differences between two groups of medical scans is a challenging problem due to the high dimensionality of the data and the relatively small number of training examples. In this paper, we demonstrate a non-parametric technique for estimation of statistical significance in the context of discriminative analysis (i.e., training a classifier function to label new examples into one of two groups). Our approach adopts permutation tests, first developed in classical statistics for hypothesis testing, to estimate how likely we are to obtain the observed classification performance, as measured by testing on a hold-out set or cross-validation, by chance. We demonstrate the method on examples of both structural and functional neuroimaging studies.
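The permutation-test idea in the abstract above can be sketched in a few lines: measure cross-validated accuracy with the true group labels, then repeat with shuffled labels to see how often chance alone does as well. This is an illustration under stated assumptions, not the authors' implementation. The nearest-centroid classifier, the leave-one-out scheme, the function names and the twelve synthetic two-feature "scans" are all hypothetical stand-ins for a real classifier and real imaging features.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1)."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        cents = {c: centroid([f for f, l in train if l == c]) for c in (0, 1)}
        pred = min(cents, key=lambda c: sq_dist(x, cents[c]))
        correct += pred == y
    return correct / len(features)


def permutation_p_value(features, labels, n_permutations=200, seed=0):
    """Estimate how likely the observed accuracy is under shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loo_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Synthetic, well-separated groups (illustrative values only).
features = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [-0.2, 0.3], [0.3, -0.1], [-0.1, 0.0],
            [3.1, 2.9], [2.8, 3.2], [3.3, 3.0], [2.9, 2.7], [3.0, 3.4], [3.2, 3.1]]
labels = [0] * 6 + [1] * 6
observed, p_value = permutation_p_value(features, labels)
print(observed, p_value)
```

Shuffling keeps the group sizes fixed, so the null distribution reflects the same class balance as the real data. The smallest attainable p-value is 1 / (n_permutations + 1), which is why more permutations are needed to claim stronger significance.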
Article
In this study, the authors describe how the Clinical Dementia Rating (CDR) scale fits into the overall evaluation process in an outpatient memory clinic. Based on a retrospective review of 329 patients attending the clinic from 1994 to 1999, the evidence for the validity of the Clinical Dementia Rating's overall ability to stage dementia severity is presented. The Clinical Dementia Rating showed convergent validity when compared against clinical features, mental status, and psychometric test scores, and DSM III-R measures of dementia severity, thus underscoring the trans-cultural feasibility of the Clinical Dementia Rating instrument. The Clinical Dementia Rating is also congruent with the DSM-IV approach of identifying dementia, and demonstrates better discriminatory ability in the milder dementia stages compared with DSM III-R. Future research should focus on addressing the limitations of the Clinical Dementia Rating in other social settings, advanced cases, as well as detecting clinically significant change.
Hara, Y., 2018, How does Alzheimer's affect women and men differently, https://www.alzdiscovery.org/cognitive-vitality/blog/how-doesalzheimers-affect-women-and-men-differently.
Weiner, M.W., Veitch, D.P., Aisen, P.S., et al., 2012, The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception, Alzheimers Dement., 8(1).
World Alzheimer Report 2019: Attitudes to dementia, 2019.