Chapter

Multi-modal Neuroimaging Data Fusion via Latent Space Learning for Alzheimer’s Disease Diagnosis: First International Workshop, PRIME 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Proceedings

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Recent studies have shown that fusing multi-modal neuroimaging data can improve the performance of Alzheimer’s Disease (AD) diagnosis. However, most existing methods simply concatenate features from each modality without appropriate consideration of the correlations among multi-modalities. Besides, existing methods often employ feature selection (or fusion) and classifier training in two independent steps without consideration of the fact that the two pipelined steps are highly related to each other. Furthermore, existing methods that make prediction based on a single classifier may not be able to address the heterogeneity of the AD progression. To address these issues, we propose a novel AD diagnosis framework based on latent space learning with ensemble classifiers, by integrating the latent representation learning and ensemble of multiple diversified classifiers learning into a unified framework. To this end, we first project the neuroimaging data from different modalities into a common latent space, and impose a joint sparsity constraint on the concatenated projection matrices. Then, we map the learned latent representations into the label space to learn multiple diversified classifiers and aggregate their predictions to obtain the final classification result. Experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset show that our method outperforms other state-of-the-art methods.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Clinically, it is important to be able to discriminate multiple classes, so the proper clinical intervention (specific for a particular type of brain disorder) can be implemented in a timely manner. Indeed, multi-view feature learning for MCI and dementia diagnosis has been explored recently [10][11][12]. However, to the best of our knowledge, three-class classification analysis fully utilizing the two complementary MEG modalities (magnetometers and gradiometers) has yet to be thoroughly explored and reported in the literature. ...
Article
Full-text available
Magnetoencephalography (MEG) has been combined with machine learning techniques, to recognize the Alzhei-mer's disease (AD), one of the most common forms of dementia. However, most of the previous studies are limited to binary classification and do not fully utilize the two available MEG modalities (extracted using magnetometer and gradiometer sensors). AD consists of several stages of progression, this study addresses this limitation by using both magnetometer and gradiometer data to discriminate between participants with AD, AD-related mild cognitive impairment (MCI), and healthy control (HC) participants in the form of a three-class classification problem. A series of wavelet-based biomarkers are developed and evaluated, which concurrently leverage the spatial, frequency and time domain characteristics of the signal. A bimodal recognition system based on an improved score-level fusion approach is proposed to reinforce interpretation of the brain activity captured by magnetometers and gradiometers. In this preliminary study, it was found that the markers derived from gradiometer tend to outperform the magnetometer-based markers. Interestingly, out of the total 10 regions of interest, left-frontal lobe demonstrates about 8% higher mean recognition rate than the second-best performing region (left temporal lobe) for AD/MCI/HC classification. Among the four types of markers proposed in this work, the spatial marker developed using wavelet coefficients provided the best recognition performance for the three-way classification. Overall, the proposed approach provides promising results for the potential of AD/MCI/HC three-way classification utilizing the bimodal MEG data.
... Second, we did not extensively investigate the predictive power of different markers. Neuropsychological data are shown to be predictive of MCI progression to dementia due to AD 19,36-41 , with predictive performance improving when incorporating biological information 19,[42][43][44] . Further, recent studies have shown that blood based AD biomarkers have substantial predictive power in modelling AD trajectories 45,46 . ...
Preprint
Full-text available
The earliest stages of Alzheimer’s disease (AD) involve interactions between multiple pathophysiological processes. Although these processes are well studied, we still lack robust tools to predict individualised trajectories of disease progression. Here, we employ a robust and interpretable machine learning approach to combine multimodal biological data and predict future tau accumulation, translating predictive information from deep phenotyping cohorts at early stages of AD to cognitively normal individuals. In particular, we use machine learning to quantify interactions between key pathological markers (β-amyloid, medial temporal atrophy, tau and APOE 4) at early and asymptomatic stages of AD. We next derive a predictive index that stratifies individuals based on future pathological tau accumulation, highlighting two critical features for optimal clinical trial design. First, future tau accumulation provides a better outcome measure compared to changes in cognition. Second, stratification based on multimodal data compared to β-amyloid alone reduces the sample size required to detect a clinically meaningful change in tau accumulation. Further, we extend our machine learning approach to derive individualised trajectories of future pathological tau accumulation in early AD patients and accurately predict regional future rate of tau accumulation in an independent sample of cognitively unimpaired individuals. Our results propose a robust approach for fine scale stratification and prognostication with translation impact for clinical trial design at asymptomatic and early stages of AD.
... Second, we did not extensively investigate the predictive power of different markers. Neuropsychological data are shown to be predictive of MCI progression to dementia due to AD 19,36-41 , with predictive performance improving when incorporating biological information 19,[42][43][44] . Further, recent studies have shown that blood based AD biomarkers have substantial predictive power in modelling AD trajectories 45,46 . ...
Preprint
The earliest stages of Alzheimer’s disease (AD) involve interactions between multiple pathophysiological processes. Although these processes are well studied, we still lack robust tools to predict individualised trajectories of disease progression. Here, we employ a robust and interpretable machine learning approach to combine multimodal biological data and predict future tau accumulation, translating predictive information from deep phenotyping cohorts at early stages of AD to cognitively normal individuals. In particular, we use machine learning to quantify interactions between key pathological markers (β-amyloid, medial temporal atrophy, tau and APOE 4) at early and asymptomatic stages of AD. We next derive a predictive index that stratifies individuals based on future pathological tau accumulation, highlighting two critical features for optimal clinical trial design. First, future tau accumulation provides a better outcome measure compared to changes in cognition. Second, stratification based on multimodal data compared to β-amyloid alone reduces the sample size required to detect a clinically meaningful change in tau accumulation. Further, we extend our machine learning approach to derive individualised trajectories of future pathological tau accumulation in early AD patients and accurately predict regional future rate of tau accumulation in an independent sample of cognitively unimpaired individuals. Our results propose a robust approach for fine scale stratification and prognostication with translation impact for clinical trial design at asymptomatic and early stages of AD. One Sentence Summary Our machine learning approach combines baseline multimodal data to make individualised predictions of future pathological tau accumulation at prodromal and asymptomatic stages of Alzheimer’s disease with high accuracy and regional specificity.
... Modern datasets are often collected from diverse domains and views in many real-world applications thereby leading to the significantly increased interest in multiview (or multimodal) learning and analysis techniques [19]- [23]. For example, single images and continuous stream videos can both be described by using different visual descriptors, such as SIFT [24], Gabor [25], LBP [26], HOG [27], etc. ...
Article
Full-text available
Multiview subspace clustering has received significant attention as the availability of diverse of multidomain and multiview real-world data has rapidly increased in the recent years. Boosting the performance of multiview clustering algorithms is challenged by two major factors. First, since original features from multiview data are highly redundant, reconstruction based on these attributes inevitably results in inferior performance. Second, since each view of such multiview data may contain unique knowledge as against the others, it remains a challenge to exploit complimentary information across multiple views while simultaneously investigating the uniqueness of each view. In this paper, we present a novel dual shared-specific multiview subspace clustering (DSS-MSC) approach that simultaneously learns the correlations between shared information across multiple views and also utilizes view-specific information to depict specific property for each independent view. Further, we formulate a dual learning framework to capture shared-specific information into the dimensional reduction and self-representation processes, which strengthens the ability of our approach to exploit shared information while preserving view-specific property effectively. The experimental results on several benchmark datasets have demonstrated the effectiveness of the proposed approach against other state-of-the-art techniques.
Article
The fusion of complementary information contained in multi-modality data (e.g., Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and genetic data) has advanced the progress of automated Alzheimer’s disease (AD) diagnosis. However, multi-modality based AD diagnostic models are often hindered by the missing data, i.e., not all the subjects have complete multi-modality data. One simple solution used by many previous studies is to discard samples with missing modalities. However, this significantly reduces the number of training samples, thus leading to a sub-optimal classification model. Furthermore, when building the classification model, most existing methods simply concatenate features from different modalities into a single feature vector without considering their underlying associations. As features from different modalities are often closely related (e.g., MRI and PET features are extracted from the same brain region), utilizing their inter-modality associations may improve the robustness of the diagnostic model. To this end, we propose a novel latent representation learning method for multi-modality based AD diagnosis. Specifically, we use all the available samples (including samples with incomplete modality data) to learn a latent representation space. Within this space, we not only use samples with complete multimodality data to learn a common latent representation, but also use samples with incomplete multi-modality data to learn independent modality-specific latent representations. We then project the latent representations to the label space for AD diagnosis. We perform experiments using 737 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, and the experimental results verify the effectiveness of our proposed method.
Conference Paper
Full-text available
In this paper, we aim to maximally utilize multimodality neuroimaging and genetic data to predict Alzheimer's disease (AD) and its prodromal status, i.e., a multi-status dementia diagnosis problem. Multimodality neuroimaging data such as MRI and PET provide valuable insights to abnormalities, and genetic data such as Single Nucleotide Polymorphism (SNP) provide information about a patient's AD risk factors. When used in conjunction, AD diagnosis may be improved. However, these data are heterogeneous (e.g., having different data distributions), and have different number of samples (e.g., PET data is having far less number of samples than the numbers of MRI or SNPs). Thus, learning an effective model using these data is challenging. To this end, we present a novel three-stage deep feature learning and fusion framework , where the deep neural network is trained stage-wise. Each stage of the network learns feature representations for different combination of modalities, via effective training using maximum number of available samples . Specifically, in the first stage, we learn latent representations (i.e., high-level features) for each modality independently, so that the heterogeneity between modalities can be better addressed and then combined in the next stage. In the second stage, we learn the joint latent features for each pair of modality combination by using the high-level features learned from the first stage. In the third stage, we learn the diagnostic labels by fusing the learned joint latent features from the second stage. We have tested our framework on Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for multi-status AD diagnosis, and the experimental results show that the proposed framework outperforms other methods.
Article
Full-text available
Recently, there have been great interests for computer-aided diagnosis of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI). Unlike the previous methods that considered simple low-level features such as gray matter tissue volumes from MRI, and mean signal intensities from PET, in this paper, we propose a deep learning-based latent feature representation with a stacked auto-encoder (SAE). We believe that there exist latent non-linear complicated patterns inherent in the low-level features such as relations among features. Combining the latent information with the original features helps build a robust model in AD/MCI classification, with high diagnostic accuracy. Furthermore, thanks to the unsupervised characteristic of the pre-training in deep learning, we can benefit from the target-unrelated samples to initialize parameters of SAE, thus finding optimal parameters in fine-tuning with the target-related samples, and further enhancing the classification performances across four binary classification problems: AD vs. healthy normal control (HC), MCI vs. HC, AD vs. MCI, and MCI converter (MCI-C) vs. MCI non-converter (MCI-NC). In our experiments on ADNI dataset, we validated the effectiveness of the proposed method, showing the accuracies of 98.8, 90.7, 83.7, and 83.3 % for AD/HC, MCI/HC, AD/MCI, and MCI-C/MCI-NC classification, respectively. We believe that deep learning can shed new light on the neuroimaging data analysis, and our work presented the applicability of this method to brain disease diagnosis.
Article
Full-text available
Ensemble approaches to classification and regression have attracted a great deal of interest in recent years. These methods can be shown both theoretically and empirically to outperform single predictors on a wide range of tasks. One of the elements required for accurate prediction when using an ensemble is recognised to be error “diversity”. However, the exact meaning of this concept is not clear from the literature, particularly for classification tasks. In this paper we first review the varied attempts to provide a formal explanation of error diversity, including several heuristic and qualitative explanations in the literature. For completeness of discussion we include not only the classification literature but also some excerpts of the rather more mature regression literature, which we believe can still provide some insights. We proceed to survey the various techniques used for creating diverse ensembles, and categorise them, forming a preliminary taxonomy of diversity creation methods. As part of this taxonomy we introduce the idea of implicit and explicit diversity creation methods, and three dimensions along which these may be applied. Finally we propose some new directions that may prove fruitful in understanding classification error diversity.
Article
Full-text available
In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVM''s. The approach is illustrated on a two-spiral benchmark classification problem.
Article
In this paper, a Brain-Wide and Genome-Wide Association (BW-GWA) study is presented to identify the associations between the brain imaging phenotypes (i.e., regional volumetric measures) and the genetic variants (i.e., Single Nucleotide Polymorphism (SNP)) in Alzheimer's Disease (AD). The main challenges of this study include the data heterogeneity, complex phenotype-genotype associations, high-dimensional data (e.g., thousands of SNPs), and the existence of phenotype outliers. Previous BW-GWA studies, while addressing some of these challenges, did not consider the diagnostic label information in their formulations, thus limiting their clinical applicability. To address these issues, we present a novel joint projection and sparse regression model to discover the associations between the phenotypes and genotypes. Specifically, to alleviate the negative influence of data heterogeneity, we first map the genotypes to an intermediate imaging-phenotype-like space. Then, to better reveal the complex phenotype-genotype associations, we project both the mapped genotypes and the original imaging phenotypes to a diagnostic-label-guided joint feature space, where the intra-class projected points are constrained to be close to each other. In addition, we use L2,1-norm minimization on both the regression loss function and the transformation coefficient matrices, to reduce the effect of phenotype outliers, and to encourage sparse feature selections of both the genotypes and phenotypes. We evaluate our method using Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and the results show that the proposed method outperforms several state-of-the-art methods in term of average Root-Mean-Square Error (RMSE) of genome-to-phenotype predictions. Besides, the associated SNPs and brain regions found in this study have also been shown AD-related in previously studies, thus verifying the effectiveness and potential of the proposed method in AD pathogenesis study.
Article
In this paper, we aim to predict conversion and time-to-conversion of mild cognitive impairment (MCI) patients using multi-modal neuroimaging data and clinical data, via cross-sectional and longitudinal studies. However, such data are often heterogeneous, high-dimensional, noisy, and incomplete. We thus propose a framework that includes sparse feature selection, low-rank affinity pursuit denoising (LRAD), and low-rank matrix completion (LRMC) in this study. Specifically, we first use sparse linear regressions to remove unrelated features. Then, considering the heterogeneity of the MCI data, which can be assumed as a union of multiple subspaces, we propose to use a low rank subspace method (i.e., LRAD) to denoise the data. Finally, we employ LRMC algorithm with three data fitting terms and one inequality constraint for joint conversion and time-to-conversion predictions. Our framework aims to answer a very important but yet rarely explored question in AD study, i.e., when will the MCI convert to AD? This is different from survival analysis, which provides the probabilities of conversion at different time points that are mainly used for global analysis, while our time-to-conversion prediction is for each individual subject. Evaluations using the ADNI dataset indicate that our method outperforms conventional LRMC and other state-of-the-art methods. Our method achieves a maximal pMCI classification accuracy of 84% and time prediction correlation of 0.665.
Conference Paper
The diversity of base learners is of utmost importance to a good ensemble. This paper defines a novel measurement of diversity, termed as exclusivity. With the designed exclusivity, we further propose an ensemble SVM classifier, namely Exclusivity Regularized Machine (ExRM), to jointly suppress the training error of ensemble and enhance the diversity between bases. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to effectively and efficiently seek the optimal solution of ExRM. Theoretical analysis on convergence, global optimality and linear complexity of the proposed algorithm, as well as experiments are provided to reveal the efficacy of our method and show its superiority over state-of-the-arts in terms of accuracy and efficiency.
Article
Accurate identification and understanding informative feature is important for early Alzheimer's disease (AD) prognosis and diagnosis. In this paper, we propose a novel discriminative sparse learning method with relational regularization to jointly predict the clinical score and classify AD disease stages using multimodal features. Specifically, we apply a discriminative learning technique to expand the class-specific difference and include geometric information for effective feature selection. In addition, two kind of relational information are incorporated to explore the intrinsic relationships among features and training subjects in terms of similarity learning. We map the original feature into the target space to identify the informative and predictive features by sparse learning technique. A unique loss function is designed to include both discriminative learning and relational regularization methods. Experimental results based on a total of 805 subjects [including 226 AD patients, 393 mild cognitive impairment (MCI) subjects, and 186 normal controls (NCs)] from AD neuroimaging initiative database show that the proposed method can obtain a classification accuracy of 94.68% for AD versus NC, 80.32% for MCI versus NC, and 74.58% for progressive MCI versus stable MCI, respectively. In addition, we achieve remarkable performance for the clinical scores prediction and classification label identification, which has efficacy for AD disease diagnosis and prognosis. The algorithm comparison demonstrates the effectiveness of the introduced learning techniques and superiority over the state-of-the-arts methods.
Article
Fusing information from different imaging modalities is crucial for more accurate identification of the brain state because imaging data of different modalities can provide complementary perspectives on the complex nature of brain disorders. However, most existing fusion methods often extract features independently from each modality, and then simply concatenate them into a long vector for classification, without appropriate consideration of the correlation among modalities. In this paper, we propose a novel method to transform the original features from different modalities to a common space, where the transformed features become comparable and easy to find their relation, by canonical correlation analysis. We then perform the sparse multi-task learning for discriminative feature selection by using the canonical features as regressors and penalizing a loss function with a canonical regularizer. In our experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we use Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images to jointly predict clinical scores of Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) and Mini-Mental State Examination (MMSE) and also identify multi-class disease status for Alzheimer's disease diagnosis. The experimental results showed that the proposed canonical feature selection method helped enhance the performance of both clinical score prediction and disease status identification, outperforming the state-of-the-art methods.
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in ℝn. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
Article
Low-rank representation (LRR) is an effective method for subspace clustering and has found wide applications in computer vision and machine learning. The existing LRR solver is based on the alternating direction method (ADM). It suffers from $O(n^3)$ computation complexity due to the matrix-matrix multiplications and matrix inversions, even if partial SVD is used. Moreover, introducing auxiliary variables also slows down the convergence. Such a heavy computation load prevents LRR from large scale applications. In this paper, we generalize ADM by linearizing the quadratic penalty term and allowing the penalty to change adaptively. We also propose a novel rule to update the penalty such that the convergence is fast. With our linearized ADM with adaptive penalty (LADMAP) method, it is unnecessary to introduce auxiliary variables and invert matrices. The matrix-matrix multiplications are further alleviated by using the skinny SVD representation technique. As a result, we arrive at an algorithm for LRR with complexity $O(rn^2)$, where $r$ is the rank of the representation matrix. Numerical experiments verify that for LRR our LADMAP method is much faster than state-of-the-art algorithms. Although we only present the results on LRR, LADMAP actually can be applied to solving more general convex programs.
Conference Paper
We study the problem of classifying mild Alzheimer's disease (AD) subjects from healthy individuals (controls) using multi-modal image data, to facilitate early identification of AD related pathologies. Several recent papers have demonstrated that such classification is possible with MR or PET images, using machine learning methods such as SVM and boosting. These algorithms learn the classifier using one type of image data. However, AD is not well characterized by one imaging modality alone, and analysis is typically performed using several image types - each measuring a different type of structural/functional characteristic. This paper explores the AD classification problem using multiple modalities simultaneously. The difficulty here is to assess the relevance of each modality (which cannot be assumed a priori), as well as to optimize the classifier. To tackle this problem, we utilize and adapt a recently developed idea called Multi-Kernel learning (MKL). Briefly, each imaging modality spawns one (or more kernels) and we simultaneously solve for the kernel weights and a maximum margin classifier. To make the model robust, we propose strategies to suppress the influence of a small subset of outliers on the classifier - this yields an alternative minimization based algorithm for robust MKL. We present promising multi-modal classification experiments on a large dataset of images from the ADNI project.
Article
This letter shows a computer-aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) based on single photon emission computed tomography (SPECT) image feature selection and a statistical learning theory classifier. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data and defining normalized mean squared error features over regions of interest (ROI) that are selected by a t-test feature selection with feature correlation weighting. Thus, normalized mean square error (NMSE) features of cubic blocks located in the temporo-parietal brain region yields peak accuracy values of 98.3% for almost linear kernel support vector machine (SVM) defined over the 20 most discriminative features extracted. This new method outperformed recent developed methods for early AD diagnosis.
Article
We present a general method using kernel canonical correlation analysis to learn a semantic representation to web images and their associated text. The semantic space provides a common representation and enables a comparison between the text and images. In the experiments, we look at two approaches of retrieving images based on only their content from a text query. We compare orthogonalization approaches against a standard cross-representation retrieval technique known as the generalized vector space model.
  • N J Kabani
Kabani, N.J.: 3D anatomical atlas of the human brain. NeuroImage 7, P-0717 (1998)
Alzheimer's disease facts and figures
Alzheimer's Association: 2013 Alzheimer's disease facts and figures. Alzheimer's Dement. 9(2), 208-245 (2013) [PubMed: 23507120]
MKL for robust multi-modality AD classification
  • C Hinrichs
  • V Singh
  • G Xu
  • S Johnson
Hinrichs, C., Singh, V., Xu, G., Johnson, S.: MKL for robust multi-modality AD classification. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009. LNCS, vol. 5762, pp. 786-794. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04271-3 95
Feature learning and fusion of multimodality neuroimaging and genetic data for multi-status dementia diagnosis
  • T Zhou
  • K.-H Thung
  • X Zhu
  • D Shen
Zhou, T., Thung, K.-H., Zhu, X., Shen, D.: Feature learning and fusion of multimodality neuroimaging and genetic data for multi-status dementia diagnosis. In: Wang, Q., Shi, Y., Suk, H.-I., Suzuki, K. (eds.) MLMI 2017. LNCS, vol. 10541, pp. 132-140. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67389-9 16