Multi-Modal Multi-Task Learning for Joint Prediction of Multiple Regression and Classification Variables in Alzheimer’s Disease

Daoqiang Zhang (a,b), Dinggang Shen (a,*), and the Alzheimer’s Disease Neuroimaging Initiative (1)

(a) Department of Radiology and BRIC, University of North Carolina at Chapel Hill, NC 27599

(b) Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Abstract

Many machine learning and pattern classification methods have been applied to the diagnosis of

Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI).

Recently, rather than predicting categorical variables as in classification, several pattern regression

methods have also been used to estimate continuous clinical variables from brain images.

However, most existing regression methods focus on estimating multiple clinical variables

separately and thus cannot utilize the intrinsic useful correlation information among different

clinical variables. On the other hand, in those regression methods, only a single modality of data

(usually only the structural MRI) is often used, without considering the complementary

information that can be provided by different modalities. In this paper, we propose a general

methodology, namely Multi-Modal Multi-Task (M3T) learning, to jointly predict multiple

variables from multi-modal data. Here, the variables include not only the clinical variables used

for regression but also the categorical variable used for classification, with different tasks

corresponding to prediction of different variables. Specifically, our method contains two key

components, i.e., (1) a multi-task feature selection which selects the common subset of relevant

features for multiple variables from each modality, and (2) a multi-modal support vector machine

which fuses the above-selected features from all modalities to predict multiple (regression and

classification) variables. To validate our method, we perform two sets of experiments on ADNI

baseline MRI, FDG-PET, and cerebrospinal fluid (CSF) data from 45 AD patients, 91 MCI

patients, and 50 healthy controls (HC). In the first set of experiments, we estimate two clinical

variables, namely Mini Mental State Examination (MMSE) and Alzheimer’s Disease Assessment

Scale - Cognitive Subscale (ADAS-Cog), as well as one categorical variable (with value of ‘AD’,

‘MCI’ or ‘HC’), from the baseline MRI, FDG-PET, and CSF data. In the second set of

experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the

conversion of MCI to AD from the baseline MRI, FDG-PET, and CSF data. The results on both

sets of experiments demonstrate that our proposed M3T learning scheme can achieve better

performance on both regression and classification tasks than the conventional learning methods.

1Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database

(www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or

provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:

www.loni.ucla.edu\ADNI\Collaboration\ADNI_Authorship_list.pdf.

© 2011 Elsevier Inc. All rights reserved.

*Corresponding author. dqzhang@nuaa.edu.cn (D. Zhang), dgshen@med.unc.edu (D. Shen).

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our

customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of

the resulting proof before it is published in its final citable form. Please note that during the production process errors may be

discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

NIH Public Access

Author Manuscript

Neuroimage. Author manuscript; available in PMC 2013 January 16.

Published in final edited form as:

Neuroimage. 2012 January 16; 59(2): 895–907. doi:10.1016/j.neuroimage.2011.09.069.

NIH-PA Author Manuscript

Keywords

Alzheimer’s disease (AD); multi-modal multi-task (M3T) learning; multi-task feature selection;

multi-modality; MCI conversion; MMSE; ADAS-Cog

Introduction

Alzheimer’s disease (AD) is the most common form of dementia diagnosed in people over

65 years of age. It is reported that there are 26.6 million AD sufferers worldwide, and 1 in

85 people will be affected by 2050 (Ron et al., 2007). Thus, accurate diagnosis of AD and

especially its early stage, i.e., mild cognitive impairment (MCI), is very important for timely

therapy and possible delay of the disease. Over the past decade, many machine learning and

pattern classification methods have been used for early diagnosis of AD and MCI based on

different modalities of biomarkers, e.g., the structural brain atrophy measured by magnetic

resonance imaging (MRI) (de Leon et al., 2007; Du et al., 2007; Fjell et al., 2010; McEvoy

et al., 2009), metabolic alterations in the brain measured by fluorodeoxyglucose positron

emission tomography (FDG-PET) (De Santi et al., 2001; Morris et al., 2001), and

pathological amyloid depositions measured through cerebrospinal fluid (CSF) (Bouwman et

al., 2007b; Fjell et al., 2010; Mattsson et al., 2009; Shaw et al., 2009), etc. In all these

methods, classification models are learned from training subjects to predict a categorical

classification variable (i.e., the class label) of test subjects.

Recently, rather than predicting categorical variables as in classification, several studies

have begun to estimate continuous clinical variables from brain images (Duchesne et al., 2009;

Duchesne et al., 2005; Fan et al., 2010; Stonnington et al., 2010; Wang et al., 2010). This

kind of research is important because it can help evaluate the stage of AD pathology and

predict future progression. Unlike classification, which assigns a subject to one of two

or more categories, regression must estimate continuous values and is thus more

challenging. In the literature, a number of regression methods have been used for estimating

clinical variables based on neuroimaging data. For example, linear regression models were

used to estimate the 1-year Mini Mental State Examination (MMSE) changes from structural

MR brain images (Duchesne et al., 2009; Duchesne et al., 2005). A high-dimensional kernel-based

regression method, i.e., the Relevance Vector Machine (RVM), was also used to estimate

clinical variables, including MMSE and Alzheimer’s Disease Assessment Scale - Cognitive

Subscale (ADAS-Cog), from structural MR brain images (Fan et al., 2010; Stonnington et

al., 2010; Wang et al., 2010). Besides clinical variables, regression methods have also been

used for estimating the age of individual subjects from MR brain images (Ashburner, 2007;

Franke et al., 2010).

In the practical diagnosis of AD, multiple clinical variables are generally acquired, e.g.,

MMSE and ADAS-Cog, etc. Specifically, MMSE is used to examine the orientation to time

and place, the immediate and delayed recall of three words, the attention and calculations,

language, and visuoconstructional functions (Folstein et al., 1975), while ADAS-Cog is a

global measure encompassing the core symptoms of AD (Rosen et al., 1984). It is known

that there exist inherent correlations among multiple clinical variables of a subject, since the

underlying pathology is the same (Fan et al., 2010; Stonnington et al., 2010). However, most

existing regression methods model different clinical variables separately, without

considering their inherent correlations that may be useful for robust and accurate estimation

of clinical variables from brain images. Moreover, to our knowledge, none of the existing

regression methods used for estimating clinical variables exploits the class labels, which

are often available for the training subjects and can aid the accurate estimation of

regression variables, and vice versa.

On the other hand, although multi-modal data are often acquired for AD diagnosis, e.g.,

MRI, PET, and CSF biomarkers, nearly all existing regression methods developed for

estimation of clinical variables were based only on one imaging modality, i.e., mostly on the

structural MRI. Recent studies have indicated that the biomarkers from different modalities

provide complementary information, which is very useful for AD diagnosis (Fjell et al.,

2010; Landau et al., 2010; Walhovd et al., 2010b). More recently, a number of research

works have used multi-modal data for AD or MCI classification and obtained improved

performance compared with the methods based only on single-modal data (Fan et al., 2008;

Hinrichs et al., 2011; Vemuri et al., 2009; Walhovd et al., 2010a; Zhang et al., 2011).

However, to the best of our knowledge, the same type of study in imaging-based regression,

i.e., estimation of clinical variables from multi-modal data, has not been investigated previously.

Inspired by the above problems, in this paper, we propose a general methodology, namely

Multi-Modal Multi-Task (M3T) learning, to jointly predict multiple variables from multi-

modal data. Here, the variables include not only the continuous clinical variables for

regression (MMSE, ADAS-Cog) but also the categorical variable for classification (i.e.,

class label). We treat the estimation of different regression or classification variables as

different tasks, and use a multi-task learning method (Argyriou et al., 2008; Obozinski et al.,

2006) developed in the machine learning community for joint regression and classification

learning. Specifically, at first, we assume that the related tasks share a common relevant

feature subset but with a varying amount of influence on each task, and thus adopt a multi-

task feature selection method to obtain a common feature subset for different tasks

simultaneously. Then, we use a multi-modal support vector machine (SVM) method to fuse

the above-selected features from each modality to estimate multiple regression and

classification variables.

We validate our method on two sets of experiments. In the first set of experiments, we

estimate two regression variables (MMSE and ADAS-Cog) and one classification variable

(with value of ‘AD’, ‘MCI’ or ‘HC’) from the baseline MRI, PET, and CSF data. In the

second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and

also the conversion of MCI to AD from the baseline MRI, PET, and CSF data. We

hypothesize that the joint estimation or prediction of multiple regression and classification

variables would perform better than estimating or predicting each individual variable

separately, and that the use of multi-modal data (MRI, PET and CSF) would perform better

on joint regression and classification than the use of only single-modal data.

Method

The data used in the preparation of this paper were obtained from the Alzheimer’s Disease

Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). The ADNI was

launched in 2003 by the National Institute on Aging (NIA), the National Institute of

Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration

(FDA), private pharmaceutical companies, and non-profit organizations, as a $60 million, 5-

year public-private partnership. The primary goal of ADNI has been to test whether the

serial MRI, PET, other biological markers, and clinical and neuropsychological assessment

can be combined to measure the progression of MCI and early AD. Determination of

sensitive and specific markers of very early AD progression is intended to aid researchers

and clinicians to develop new treatments and monitor their effectiveness, as well as lessen

the time and cost of clinical trials.

ADNI is the result of efforts of many coinvestigators from a broad range of academic

institutions and private corporations, and subjects have been recruited from over 50 sites

across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, aged 55 to

90, to participate in the research – approximately 200 cognitively normal older individuals

to be followed for 3 years, 400 people with MCI to be followed for 3 years, and 200 people

with early AD to be followed for 2 years (see www.adni-info.org for up-to-date

information). The research protocol was approved by each local institutional review board

and written informed consent was obtained from each participant.

Subjects

The ADNI general eligibility criteria are described at www.adni-info.org. Briefly, subjects

are between 55 and 90 years of age and have a study partner able to provide an independent

evaluation of functioning. Use of specific psychoactive medications is an exclusion criterion. General

inclusion/exclusion criteria are as follows: 1) healthy subjects: MMSE scores between 24–

30, a Clinical Dementia Rating (CDR) of 0, non-depressed, non MCI, and nondemented; 2)

MCI subjects: MMSE scores between 24–30, a memory complaint, having objective

memory loss measured by education adjusted scores on Wechsler Memory Scale Logical

Memory II, a CDR of 0.5, absence of significant levels of impairment in other cognitive

domains, essentially preserved activities of daily living, and an absence of dementia; and 3)

Mild AD: MMSE scores between 20–26, CDR of 0.5 or 1.0, and meets the National Institute

of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and

Related Disorders Association (NINCDS/ADRDA) criteria for probable AD.

In this paper, 186 ADNI subjects with all corresponding baseline MRI, PET, and CSF data

are included. In particular, this set contains 45 AD patients, 91 MCI patients (including 43 MCI

converters (MCI-C) and 48 MCI non-converters (MCI-NC)), and 50 healthy controls. Table

1 lists the demographics of all these subjects. Subject IDs are also given in Supplemental

Table 4.

MRI, PET, and CSF

A detailed description of the acquisition of MRI, PET, and CSF data from ADNI as used in this

paper can be found in (Zhang et al., 2011). Briefly, structural MR scans were acquired from

1.5T scanners. Raw Digital Imaging and Communications in Medicine (DICOM) MRI scans

were downloaded from the public ADNI site (www.loni.ucla.edu/ADNI), reviewed for

quality, and automatically corrected for spatial distortion caused by gradient nonlinearity

and B1 field inhomogeneity. PET images were acquired 30–60 minutes post-injection,

averaged, spatially aligned, interpolated to a standard voxel size, intensity normalized, and

smoothed to a common resolution of 8mm full width at half maximum. CSF data were

collected in the morning after an overnight fast using a 20- or 24-gauge spinal needle, frozen

within 1 hour of collection, and transported on dry ice to the ADNI Biomarker Core

laboratory at the University of Pennsylvania Medical Center. In this study, CSF Aβ42, CSF

t-tau and CSF p-tau are used as features.

Image analysis

Image pre-processing is performed for all MR and PET images following the same

procedures as in (Zhang et al., 2011). First, we do anterior commissure (AC) - posterior

commissure (PC) correction on all images, and use the N3 algorithm (Sled et al., 1998) to

correct the intensity inhomogeneity. Next, we do skull-stripping on structural MR images

using both brain surface extractor (BSE) (Shattuck et al., 2001) and brain extraction tool

(BET) (Smith, 2002), followed by manual editing and intensity inhomogeneity correction.

After removal of cerebellum, FAST in the FSL package (Zhang et al., 2001) is used to

segment structural MR images into three different tissues: grey matter (GM), white matter

(WM), and cerebrospinal fluid (CSF). After registration using HAMMER (Shen and

Davatzikos, 2002), we obtain the subject-labeled image based on a template with 93

manually labeled ROIs (Kabani et al., 1998). For each of the 93 ROI regions in the labeled

MR image, we compute the GM tissue volume of that region as a feature. For each PET image,

we first align it to the MR image of the same subject using a rigid transformation,

and then compute the average intensity of each ROI in the PET image as a feature.

Therefore, for each subject, we obtain a total of 93 features from the MR image, another 93 features

from the PET image, and 3 features from the CSF biomarkers.

Multi-Modal Multi-Task (M3T) learning

A new learning method, namely Multi-Modal Multi-Task (M3T) learning, is presented here

to simultaneously learn multiple tasks from multi-modal data. Fig. 1 illustrates the new

learning problem with comparison to the existing standard Single-Modal Single-Task

(SMST) learning, Multi-Task learning, and Multi-Modal learning. As can be seen from Fig.

1, in SMST and Multi-Task learning (Fig. 1(a–b)), each subject has only one modality of

data represented as $x_i$, while, in M3T and Multi-Modal learning (Fig. 1(c–d)), each subject

has multiple modalities of data represented as $\{x_i^{(1)}, \ldots, x_i^{(M)}\}$. On the other hand, Fig. 1 also

shows that, in SMST and Multi-Modal learning (Fig. 1(a) and (c)), each subject corresponds

to only one task denoted as $t_i$, while, in M3T and Multi-Task learning (Fig. 1(b) and (d)),

each subject corresponds to multiple tasks denoted as $\{t_i^{1}, \ldots, t_i^{T}\}$.

Now we can formally define the M3T learning as below. Given N training subjects, each

having M modalities of data, represented as $x_i = \{x_i^{(1)}, \ldots, x_i^{(m)}, \ldots, x_i^{(M)}\}$, $i = 1, \ldots, N$, our

M3T method jointly learns a series of models corresponding to T different tasks denoted as

$t_i = \{t_i^{1}, \ldots, t_i^{j}, \ldots, t_i^{T}\}$, $i = 1, \ldots, N$. It is worth noting that M3T is a general learning

framework, and here we implement it through two successive major steps, i.e., (1) multi-task

feature selection (MTFS) and (2) multi-modal support vector machine (SVM) (for both

regression and classification). Fig. 2 illustrates the flowchart of the proposed M3T method,

where M = 3 modalities of data (e.g., MRI, PET, and CSF) are used for jointly learning

models corresponding to different tasks. Note that, for CSF modality, the original 3

measures (i.e., Aβ42, t-tau, and p-tau) are directly used as features without any feature

selection step.

Multi-task feature selection (MTFS)—For imaging modalities such as MRI and PET,

even after feature extraction, the number of features (extracted from brain regions) may

still be large. Besides, not all features are relevant to the disease under study. So, feature

selection is commonly used for dimensionality reduction, as well as for removal of

irrelevant features. Different from the conventional single-task feature selection, the multi-

task feature selection simultaneously selects a common feature subset relevant to all tasks.

This point is especially important for diagnosis of neurological diseases, since multiple

regression/classification variables are essentially determined by the same underlying

pathology, i.e., the diseased brain regions. Also, simultaneously performing feature selection

for multiple regression/classification variables helps to suppress noise in the

individual variables.

Denote $X^{(m)} = [x_1^{(m)}, \ldots, x_i^{(m)}, \ldots, x_N^{(m)}]^T$ as the training data matrix on the m-th modality from

N training samples, and $y_j = [t_1^{j}, \ldots, t_i^{j}, \ldots, t_N^{j}]^T$ as the response vector on the j-th task

from the same N training samples. Following the method proposed in (Obozinski et al.,

2006), linear models are used to model the multi-task feature selection (MTFS) as below:

$$\hat{t}^{j}(x^{(m)}; v_j^{(m)}) = (x^{(m)})^T v_j^{(m)}, \quad j = 1, \ldots, T,$$

where $v_j^{(m)}$ is the weight vector for the j-th task on the m-th modality, and $x^{(m)}$ is the m-th

modal data of a certain subject. The weight vectors for all T tasks form a weight matrix

$V^{(m)} = [v_1^{(m)}, \ldots, v_j^{(m)}, \ldots, v_T^{(m)}]$, which can be optimized by the following objective function:

$$\min_{V^{(m)}} \; \frac{1}{2} \sum_{j=1}^{T} \sum_{i=1}^{N} \left( t_i^{j} - (x_i^{(m)})^T v_j^{(m)} \right)^2 + \lambda \sum_{d=1}^{D^{(m)}} \left\| V^{(m)}|_d \right\|_2$$

where V(m)|d denotes the d-th row of V(m), D(m) is the dimension of the m-th modal data, and

λ is the regularization coefficient controlling the relative contributions of the two terms.

Note that λ also controls the ‘sparsity’ of the linear models, with a higher value

corresponding to sparser models (i.e., more values in V(m) are zero). It is easy to see

that the above equation reduces to the standard l1-norm regularized optimization problem in

Lasso (Tibshirani, 1996) when there is only one task. In our case, this is a multi-task

learning for the given m-th modal data.
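
The single-task reduction can be checked in one line. This is a sketch that assumes the objective has the group-sparse form described here, i.e., a squared-error term (the factor 1/2 is an assumption) plus λ times the sum of the l2-norms of the rows of V(m). With T = 1, the matrix V(m) has a single column v, each row V(m)|d is the scalar v_d, and its l2-norm is simply |v_d|:

$$\lambda \sum_{d=1}^{D^{(m)}} \left\| V^{(m)}|_d \right\|_2 = \lambda \sum_{d=1}^{D^{(m)}} |v_d| = \lambda \|v\|_1, \qquad \text{so the objective becomes} \quad \frac{1}{2} \left\| y - X^{(m)} v \right\|_2^2 + \lambda \|v\|_1 .$$

The penalty therefore couples weights only across tasks; with one task there is nothing to couple and the usual Lasso penalty remains.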

The key point of the above objective function of MTFS is the use of l2-norm for V(m)|d,

which forces the weights corresponding to the d-th feature (of the m-th modal data) across

multiple tasks to be grouped together and tends to select features based on the strength of T

tasks jointly. Note that, because of the characteristic of ‘group sparsity’, the solution of

MTFS results in a weight matrix V(m) whose elements in some rows are all zeros. For

feature selection, we just keep those features with non-zero weights. At present, there are

many algorithms developed to solve MTFS; in this paper we adopt the SLEP toolbox (Liu et

al., 2009), which has been shown to be very effective on many datasets.
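
To illustrate how a group-sparse problem of this kind can be solved, the following minimal sketch uses proximal gradient descent with row-wise soft-thresholding in NumPy. It is not the SLEP implementation used in this paper; the function names, the iteration count, the tolerance, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def mtfs_proximal_gradient(X, Y, lam, n_iter=500):
    """Minimise 0.5 * ||Y - X V||_F^2 + lam * sum_d ||V[d, :]||_2.

    X: (N, D) features of one modality; Y: (N, T) targets, one column per task.
    Returns V of shape (D, T); all-zero rows correspond to discarded features.
    """
    n_features, n_tasks = X.shape[1], Y.shape[1]
    V = np.zeros((n_features, n_tasks))
    # Step size 1/L, with L the Lipschitz constant of the squared-error gradient
    # (the squared spectral norm of X).
    lipschitz = max(np.linalg.norm(X, 2) ** 2, 1e-12)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ V - Y)
        Z = V - step * grad
        # Proximal operator of the l2,1 penalty: shrink each row towards zero,
        # setting it exactly to zero when its norm is below step * lam.
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        V = shrink * Z
    return V


def selected_features(V, tol=1e-8):
    """Indices of features whose weight row is non-zero across the tasks."""
    return np.flatnonzero(np.linalg.norm(V, axis=1) > tol)


# Synthetic example (hypothetical data): only the first 3 of 10 features
# influence the two tasks.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((60, 10))
V_true = np.zeros((10, 2))
V_true[:3] = [[1.0, -1.0], [2.0, 0.5], [-1.5, 1.0]]
Y_demo = X_demo @ V_true + 0.01 * rng.standard_normal((60, 2))
V_hat = mtfs_proximal_gradient(X_demo, Y_demo, lam=5.0)
print(selected_features(V_hat))
```

Because a whole row is shrunk at once, a feature is either kept for all tasks or dropped for all tasks, which is the 'group sparsity' behaviour described above.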

Multi-modal support vector machine—In our previous work (Zhang et al., 2011), the

multi-modal support vector classification (SVC) has been developed for multi-modal

classification of AD and MCI. Following (Zhang et al., 2011), in this paper, we derive the

corresponding multi-modal support vector regression (SVR) algorithm as below. Assume

that we have N training subjects with the corresponding target output {zi ∈ ℝ, i = 1, …, N},

and each subject has M modalities of data with the features selected by the above proposed

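method and denoted as $x'_i = \{{x'}_i^{(1)}, \ldots, {x'}_i^{(m)}, \ldots, {x'}_i^{(M)}\}$. Multi-modal SVR solves the

following primal problem:

$$\min_{w^{(m)},\, b,\, \xi,\, \xi^*} \; \frac{1}{2} \sum_{m=1}^{M} \beta_m \left\| w^{(m)} \right\|^2 + C \sum_{i=1}^{N} \xi_i + C \sum_{i=1}^{N} \xi_i^*$$

$$\text{s.t.} \quad z_i - \sum_{m=1}^{M} \beta_m (w^{(m)})^T \phi^{(m)}({x'}_i^{(m)}) - b \le \varepsilon + \xi_i,$$

$$\sum_{m=1}^{M} \beta_m (w^{(m)})^T \phi^{(m)}({x'}_i^{(m)}) + b - z_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N,$$

where $w^{(m)}$, $\phi^{(m)}$, and $\beta_m \ge 0$ denote the normal vector of hyperplane, the kernel-induced

mapping function, and the combining weight on the m-th modality, respectively. Here, we

constrain $\sum_m \beta_m = 1$. The parameter b is the offset. Note that the ε-insensitive loss function is

used in the above objective function, and $\xi$ and $\xi^*$ are the two sets of slack variables.

Similar to the conventional SVR, the dual form of the multi-modal SVR can be represented

as below:

$$\max_{\alpha,\, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \sum_{m=1}^{M} \beta_m k^{(m)}({x'}_i^{(m)}, {x'}_j^{(m)}) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) z_i$$

$$\text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, N,$$

where $k^{(m)}({x'}_i^{(m)}, {x'}_j^{(m)}) = \phi^{(m)}({x'}_i^{(m)})^T \phi^{(m)}({x'}_j^{(m)})$ is the kernel function for the two training

samples on the m-th modality.

For a test sample with the selected features $x = \{x^{(1)}, \ldots, x^{(m)}, \ldots, x^{(M)}\}$, we denote

$k^{(m)}({x'}_i^{(m)}, x^{(m)}) = \phi^{(m)}({x'}_i^{(m)})^T \phi^{(m)}(x^{(m)})$ as the kernel between each training sample $x'_i$ and

the test sample on the m-th modality. Then, the regression function takes the following form:

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) \sum_{m=1}^{M} \beta_m k^{(m)}({x'}_i^{(m)}, x^{(m)}) + b$$

Similar to the multi-modal SVC (Zhang et al., 2011), the multi-modal SVR can also be

solved with the conventional SVR, e.g., through the LIBSVM toolbox, if we define the

mixed kernel $k(x'_i, x'_j) = \sum_{m=1}^{M} \beta_m k^{(m)}({x'}_i^{(m)}, {x'}_j^{(m)})$ between multi-modal training samples $x'_i$

and $x'_j$, and $k(x'_i, x) = \sum_{m=1}^{M} \beta_m k^{(m)}({x'}_i^{(m)}, x^{(m)})$ between multi-modal training sample $x'_i$ and

test sample x. Here, the $\beta_m$ are the nonnegative weight parameters used to balance the

contributions of different modalities, and their values are optimized through a coarse-grid

search by cross-validation on the training samples.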

After obtaining the common feature subset for all different tasks by MTFS as described

above, we use multi-modal SVM, including multi-modal SVC and multi-modal SVR, to

train the final support vector classification and regression models, respectively. Here, we

train a model for each corresponding variable (task). Specifically, we train support vector

regression models corresponding to the regression variables, and support vector

classification models corresponding to the classification variable, respectively. It is worth

noting that, since we use the common subset of features (learned by MTFS during the

feature selection stage) to train both regression and classification models, our models are

actually performing the multi-modal multi-task learning.
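
The mixed-kernel construction used by the multi-modal SVM can be sketched as follows: one linear kernel per modality, combined with nonnegative weights that sum to one, plus an enumeration of candidate weight vectors for a coarse-grid search. The function names and the grid step of 0.1 are assumptions for illustration, not details taken from this paper; the resulting matrix would be passed to a solver that accepts precomputed kernels (LIBSVM provides such a kernel type).

```python
import itertools

import numpy as np


def linear_kernel(A, B):
    """Linear kernel between the rows of A (n_a, d) and the rows of B (n_b, d)."""
    return A @ B.T


def mixed_kernel(mods_a, mods_b, betas):
    """Weighted sum of per-modality linear kernels.

    mods_a, mods_b: lists with one feature matrix per modality.
    betas: nonnegative modality weights summing to 1.
    """
    if not (len(mods_a) == len(mods_b) == len(betas)):
        raise ValueError("one matrix pair and one weight per modality required")
    if min(betas) < 0 or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(b * linear_kernel(A, B) for b, A, B in zip(betas, mods_a, mods_b))


def beta_grid(n_modalities, step=0.1):
    """Yield all weight vectors on a coarse grid of the probability simplex."""
    n = round(1 / step)
    for combo in itertools.product(range(n + 1), repeat=n_modalities - 1):
        if sum(combo) <= n:
            # The last weight is whatever remains so that the weights sum to 1.
            yield tuple(c / n for c in combo) + ((n - sum(combo)) / n,)


# Hypothetical example with three modalities of 4, 4, and 3 selected features.
rng = np.random.default_rng(0)
train = [rng.standard_normal((5, d)) for d in (4, 4, 3)]
test = [rng.standard_normal((2, d)) for d in (4, 4, 3)]
K_train = mixed_kernel(train, train, (0.4, 0.4, 0.2))  # shape (5, 5)
K_test = mixed_kernel(test, train, (0.4, 0.4, 0.2))    # shape (2, 5)
```

In a grid search, each weight vector from `beta_grid` would be scored by an inner cross-validation on the training samples and the best one retained.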

Validation

To evaluate the performance of different methods, we perform two sets of experiments on

the baseline MRI, PET, and CSF data of 186 ADNI subjects, namely 45 AD, 91 MCI (including

43 MCI-C and 48 MCI-NC), and 50 HC. In the first set of experiments (Experiment 1), we

estimate two clinical variables (including MMSE and ADAS-Cog) and one categorical

variable (with class label of ‘AD’, ‘MCI’ or ‘HC’) from the baseline brain data of all 186

subjects. It is worth noting that only the baseline data of MRI, PET, and CSF are used in our

experiments, but, in order to alleviate the effect of noise in the measured clinical scores, we

use the mean clinical score at both baseline and immediate follow-up time points as the

ground truth for each subject. The same strategy has also been adopted in (Wang et al.,

2010). In the second set of experiments (Experiment 2), we predict the 2-year changes of

MMSE and ADAS-Cog scores and the conversion of MCI to AD from the baseline brain

data of 167 subjects (since 19 subjects do not have the 2-year follow-up MMSE or ADAS-

Cog scores and are thus discarded, as shown in Supplemental Table 4). Also, only the

baseline data of MRI, PET, and CSF are used in the experiment, and the corresponding

ground truths for the two regression tasks are the MMSE and ADAS-Cog changes from

baseline to the 2-year follow-up. For classification task, we will discriminate between MCI-

C and MCI-NC subjects, using the baseline MRI, PET, and CSF data.

We use a 10-fold cross-validation strategy and compute the Pearson’s correlation coefficient

(for measuring the correlation between the predicted clinical scores and the actual clinical

scores in the regression tasks) and the classification accuracy (for measuring the proportion

of subjects correctly classified in the classification task). Specifically, the whole set of

subject samples is equally partitioned into 10 subsets, and each time the subject samples

within one subset are selected as the testing samples and all remaining subject samples in the

other 9 subsets are used for training the SVM models. This process is repeated 10 times.
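
The evaluation protocol can be sketched in plain Python as below. The helper names, the random shuffling with a fixed seed, and the pooling of held-out predictions over all folds before computing the metrics are assumptions for illustration; the text does not specify these details.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n_samples, fit_predict, k=10, seed=0):
    """Collect held-out predictions for every sample.

    fit_predict(train_idx, test_idx) must train on train_idx only and return
    one prediction per index in test_idx.
    """
    preds = [None] * n_samples
    for test_idx in k_fold_indices(n_samples, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        for i, p in zip(test_idx, fit_predict(train_idx, test_idx)):
            preds[i] = p
    return preds


def pearson_r(pred, actual):
    """Pearson's correlation coefficient between predicted and actual scores."""
    n = len(pred)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)


def accuracy(pred_labels, true_labels):
    """Proportion of subjects whose predicted label matches the true label."""
    hits = sum(p == t for p, t in zip(pred_labels, true_labels))
    return hits / len(true_labels)
```

With 186 subjects and k = 10, the folds produced this way contain 18 or 19 subjects each, and every subject is used for testing exactly once.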

It is worth noting that, in Experiment 1, two binary classifiers (i.e., AD vs. HC and MCI vs.

HC, respectively) are built. Specifically, for AD vs. HC classification, we ignore the MCI

subjects at each cross-validation trial and use only the AD and HC subjects. Similarly, for

MCI vs. HC classification, we ignore the AD subjects at each cross-validation trial and use

only the MCI and HC subjects. On the other hand, in Experiment 2, only one binary

classifier (i.e., MCI-C vs. MCI-NC) is built involving only the MCI subjects. In both

experiments, SVM is implemented using the LIBSVM toolbox (Chang and Lin, 2001), and

a linear kernel is used after normalizing each feature vector to unit norm. For all respective

methods, the values of the parameters (e.g., λ and βm) are determined by performing another

round of cross-validation on the training data. Also, at the preprocessing stage, we perform a

common feature normalization step, i.e., subtracting the mean and then dividing by the standard

deviation (of all training subjects) for each feature value.
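
A minimal NumPy sketch of the two normalization steps follows. The statistics are estimated on the training subjects only and then reused for the test subjects. The function names, the use of the population standard deviation, the guard against constant features, and the order in which the two steps are applied are assumptions for illustration.

```python
import numpy as np


def fit_zscore(X_train):
    """Per-feature mean and standard deviation, computed on training subjects only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Leave constant features unscaled instead of dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_zscore(X, mean, std):
    """Subtract the training mean and divide by the training standard deviation."""
    return (X - mean) / std


def unit_norm_rows(X):
    """Scale each subject's feature vector (row) to unit Euclidean norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


# Hypothetical example: two training subjects and one test subject, two features.
X_train = np.array([[1.0, 2.0], [3.0, 6.0]])
X_test = np.array([[2.0, 5.0]])
mean, std = fit_zscore(X_train)
X_train_n = unit_norm_rows(apply_zscore(X_train, mean, std))
X_test_n = unit_norm_rows(apply_zscore(X_test, mean, std))
```

Reusing the training statistics for the test subjects keeps information from the held-out fold out of the preprocessing.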

Results

Experiment 1: Estimating clinical stages (MMSE, ADAS-Cog, and class label)

We first estimate the clinical stages, including two regression variables (MMSE and ADAS-

Cog) and one classification variable (i.e., class label with a value of ‘AD’, ‘MCI’ or ‘HC’),

from the baseline MRI, PET, and CSF data. It is worth noting that the original multi-class

classification problem is formulated as two binary classification problems, i.e., AD vs. HC

and MCI vs. HC, as mentioned above. Table 2 shows the performances of the proposed

M3T method, compared with three methods each using individual modality, as well as the

CONCAT method (as detailed below). Specifically, in Table 2, MRI-, PET-, and CSF-based

methods denote the classification results using only the respective individual modality of

data. Similar to our M3T method, the MRI-based and PET-based methods contain

two successive steps, i.e., (1) the single-task feature selection method using Lasso

(Tibshirani, 1996), and (2) the standard SVM for both regression and classification. The

CSF-based method uses the original 3 features without any further feature selection, and

performs the standard SVM for both regression and classification. Obviously, the MRI-, PET-,

and CSF-based methods all belong to SMST learning as shown in Fig. 1. For

comparison, we also implement a simple concatenation method (denoted as CONCAT) for

using multi-modal data. In the CONCAT method, we first concatenate 93 features from

MRI, 93 features from PET, and 3 features from CSF into a 189 dimensional vector, and

then perform the same two steps (i.e., Lasso feature selection and SVM regression/

classification) as in MRI-, PET- and CSF-based methods. It is worth noting that the same

experimental settings are used in all five methods as compared in Table 2. Figs. 3–4 further

show the scatter plots of the estimated scores vs. the actual scores by five different methods

for MMSE and ADAS-Cog, respectively.

As can be seen from Table 2 and Figs. 3–4, our proposed M3T method consistently achieves

better performance than the other four methods. Specifically, for estimating MMSE and ADAS-Cog

scores, our method achieves the correlation coefficients of 0.697 and 0.739,

respectively, while the best performance using individual modality is only 0.658 and 0.670

(when using PET), respectively. On the other hand, for AD vs. HC and MCI vs. HC

classification, our method achieves the accuracies of 0.933 and 0.832, respectively, while

the best performance using individual modality is only 0.848 (when using MRI) and 0.797

(when using PET), respectively. Table 2 and Figs. 3–4 also indicate that our proposed M3T

method consistently outperforms the CONCAT method on each performance measure,

although the latter also achieves better performance than the three MRI-, PET-, and CSF-based

methods in most cases, because of using multimodal imaging data. However, CSF-based

method always achieves the worst performances in all tasks, and is significantly inferior to

MRI- and PET-based methods in this experiment. Finally, for each group (i.e., AD, MCI or

HC), we compute its average estimated clinical scores using M3T, with respective values of

24.8 (AD), 25.5 (MCI) and 28.1 (HC) for MMSE, and 14.9 (AD), 13.3 (MCI) and 8.3 (HC)

for ADAS-Cog. These results show certain consistency with the actual clinical scores as

shown in Table 1.

We also compare M3T with its two variants, i.e., the Multi-Modal (Single-Task) and

(Single-Modal) Multi-Task methods described in Fig. 1. Briefly, in the Multi-Modal (Single-Task)

method, the single-task feature selection method (Lasso) and the multi-modal SVM (for both

regression and classification) are performed successively, while in the (Single-Modal) Multi-Task

method, the multi-task feature selection method (MTFS) and the standard SVM (for both regression and

classification) are performed successively. In addition, we also include the SMST methods

(MRI-, PET-, or CSF-based) for comparison. Fig. 5 shows the comparison results of those

methods on Experiment 1. It is worth noting that, using only CSF, (Single-Modal) Multi-

Task method is equivalent to SMST since in this case the original CSF measures are directly

used as features without any feature selection step. As can be seen from Fig. 5, our M3T

method consistently outperforms the other three methods: SMST, Multi-Modal (Single-

Task), and (Single-Modal) Multi-Task. Fig. 5 also indicates that, when performing MMSE

regression and AD vs. HC classification using MRI-based or PET-based method, and when

performing ADAS-Cog regression and MCI vs. HC classification using PET-based method,

(Single-Modal) Multi-Task method, which jointly estimates multiple regression/

classification variables, achieves better performance than SMST which estimates each

variable separately. On the other hand, Fig. 5 also shows that Multi-Modal (Single-Task)

method consistently outperforms SMST on both regression and classification, which

validates and further complements the existing conclusion on the advantage of multi-modal

classification using MRI, PET, and CSF data (Zhang et al., 2011). Finally, the t-test (at the 95%

confidence level) results between M3T and the second-best method, i.e., the Multi-Modal

(Single-Task) method, show that the former is significantly better than the latter on the tasks of

estimating the ADAS-Cog score and classifying between MCI and HC.
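
The significance comparison above can be illustrated with a small sketch. The text does not state which t-test variant was used, so this example assumes a paired t-test on matched per-fold scores of two methods; the function name `paired_t_statistic`, the small critical-value table, and the numbers are all hypothetical and for illustration only.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Two-sided critical values of Student's t at alpha = 0.05 for a few
# degrees of freedom (standard table values).
T_CRIT_05 = {4: 2.776, 9: 2.262, 19: 2.093}

# Hypothetical per-fold accuracies of two methods (not taken from the paper).
method_a = [0.84, 0.82, 0.85, 0.83, 0.81, 0.86, 0.83, 0.82, 0.84, 0.85]
method_b = [0.80, 0.81, 0.82, 0.80, 0.79, 0.83, 0.81, 0.80, 0.81, 0.82]

t_value, dof = paired_t_statistic(method_a, method_b)
is_significant = abs(t_value) > T_CRIT_05[dof]
```

Pairing by fold removes fold-to-fold variability from the comparison, which is why it is a common choice when two methods are evaluated on the same cross-validation splits.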

Experiment 2: Predicting 2-year MMSE and ADAS-Cog changes and MCI conversion

In this experiment, we predict the 2-year changes of MMSE and ADAS-Cog scores and the

conversion of MCI to AD, from the baseline MRI, PET, and CSF data. Here, we have two

regression tasks corresponding to the prediction of the regression variables of MMSE and

ADAS-Cog changes from baseline to 2-year follow-up, respectively, and one classification

task corresponding to prediction of the classification variable of MCI conversion to AD, i.e.,

MCI-C vs. MCI-NC. It is worth noting that as in Experiment 1, only the baseline MRI, PET,

and CSF data are used for all prediction tasks. We use the same subjects as in Experiment 1,

except for 19 subjects without 2-year MMSE or ADAS-Cog scores, leaving a total of

167 subjects (40 AD, 80 MCI (38 MCI-C and 42 MCI-NC), and 47 HC) that are finally

used in Experiment 2. Table 3 shows the performance of the proposed M3T method

compared with three individual-modality based methods and also the CONCAT method,

which are the same methods as those used in Experiment 1. Here, for MCI-C vs. MCI-NC

classification, besides reporting the classification accuracy, we also give other performance

measures including sensitivity (i.e., the proportion of MCI-C subjects correctly classified)

and the specificity (i.e., the proportion of MCI-NC subjects correctly classified). In addition,

we also plot the ROC curves of five different methods for classification between MCI-C and

MCI-NC, as shown in Fig. 6. Here, the individual-modality based methods (using MRI, CSF

or PET) and the CONCAT method are defined in the same way as in Experiment 1.
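
As a minimal sketch of the measures defined above, the following computes accuracy, sensitivity, and specificity for MCI-C vs. MCI-NC classification, treating MCI-C as the positive class. The function name `classification_metrics` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
def classification_metrics(y_true, y_pred, positive="MCI-C"):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # converter correctly classified
            else:
                fn += 1  # converter missed
        else:
            if pred == positive:
                fp += 1  # non-converter wrongly flagged
            else:
                tn += 1  # non-converter correctly classified
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Proportion of MCI-C subjects correctly classified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Proportion of MCI-NC subjects correctly classified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy example: 4 converters and 6 non-converters (hypothetical labels).
truth = ["MCI-C"] * 4 + ["MCI-NC"] * 6
preds = ["MCI-C", "MCI-C", "MCI-C", "MCI-NC",
         "MCI-NC", "MCI-NC", "MCI-NC", "MCI-NC", "MCI-C", "MCI-C"]
metrics = classification_metrics(truth, preds)
```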

Table 3 and Fig. 6 show that, as in Experiment 1, M3T also consistently outperforms the

individual-modality based methods and the CONCAT method, on both regression and

classification tasks. Specifically, our method achieves the correlation coefficients of 0.511

and 0.531 and the accuracy of 0.739, for predicting the 2-year changes of MMSE and

ADAS-Cog scores and the MCI conversion, respectively, while the best performances of

individual-modality based methods are 0.434 (when using PET), 0.455 (when using MRI),

and 0.639 (when using PET), respectively. In addition, the area under the ROC curve (AUC)

is 0.797 for MCI-C vs. MCI-NC classification with our M3T method, while the best AUC

using the individual-modality based method is 0.70 (when using PET) and the AUC of the

CONCAT method is 0.729. On the other hand, comparing Table 3 with Table 2, we can

see that there is a marked decline in the corresponding performances. This implies that

predicting future MMSE and ADAS-Cog changes and the MCI conversion is much more

difficult than estimating the MMSE and ADAS-Cog scores and the class labels.
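
For readers less familiar with the AUC values quoted above, the following sketch computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive subject receives a higher decision score than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the toy scores are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` uses 1 for the positive class (e.g., MCI-C) and 0 for the
    negative class (e.g., MCI-NC); `scores` are classifier decision values.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy decision values (hypothetical, not from the paper).
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Unlike accuracy, this quantity does not depend on a particular decision threshold, which is why it is often reported alongside accuracy, sensitivity, and specificity.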

Finally, as in Experiment 1, we compare our M3T method with Multi-Modal (Single-Task),

(Single-Modal) Multi-Task, and SMST (including MRI, CSF or PET) methods, with the

results shown in Fig. 7. As can be seen from Fig. 7, M3T consistently achieves the best

performances among all methods. Fig. 7 also shows that, in all cases except on CSF-based

regression/classification (where no feature selection is performed), (Single-Modal) Multi-

Task method achieves better performance than the corresponding SMST method. On the

other hand, Fig. 7 also indicates that Multi-Modal (Single-Task) method consistently

outperforms the corresponding SMST method in both regression and classification tasks.

These results validate the superiority of both the Multi-Modal (Single-Task) and

(Single-Modal) Multi-Task methods over the SMST method, with each improving the

performance of SMST in a different respect. By fusing Multi-Modal (Single-Task) and

(Single-Modal) Multi-Task methods together in a unified framework, M3T further improves

the performance. In particular, the t-test (at the 95% confidence level) results between M3T

and the second best method, i.e., Multi-Modal (Single-Task) method, show that the former is

significantly better than the latter on tasks of predicting 2-year change of MMSE score and

predicting the conversion of MCI to AD.

Group comparisons of multiple variables

To investigate the relationship between multiple regression and classification variables in

Experiment 1 (including MMSE score, ADAS-Cog score, and class label (AD/MCI/HC))

and Experiment 2 (including MMSE change, ADAS-Cog change, and class label (MCI-C/

MCI-NC)), we perform group comparisons on them through the computation of the

correlation between each brain region and each variable across all subjects. Fig. 8 shows the

top 25% brain regions (with their names listed in Supplemental Table 5) that have the

highest correlation with class label (AD/MCI/HC), MMSE and ADAS-Cog scores using

MRI on Experiment 1, where different colors represent different correlation coefficients. For

comparison, we also list the bottom 25% brain regions with the lowest correlation with class

label, MMSE and ADAS-Cog scores in Supplemental Table 7. As can be seen from Fig. 8,

the selected brain regions with the highest correlations are very consistent across multiple

variables (i.e., class label, MMSE and ADAS-Cog scores). This implies that there exist

inherent correlations among multiple variables, since the underlying pathology is the same.

A close inspection of Fig. 8 indicates that most of the commonly selected top regions

(from multiple variables), e.g., hippocampal, amygdala and uncus regions, are known to be

related to AD from many studies using group comparison methods (Chetelat et al., 2002;

Convit et al., 2000; Fox and Schott, 2004; Jack et al., 1999; Misra et al., 2009).
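
The group comparison described above can be sketched as follows: compute the Pearson correlation between each region's feature and a target variable across subjects, then keep the top 25% of regions. Ranking by absolute correlation is an assumption made here (the text says only "highest correlation"), and the region names, function names, and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_regions(region_features, target, fraction=0.25):
    """Return [(region, r)] for the top `fraction` of regions by |r| with `target`.

    `region_features` maps a region name to its per-subject feature values
    (e.g., regional gray-matter volume); `target` is a per-subject variable
    such as a clinical score or a numerically coded class label.
    """
    corr = {name: pearson(values, target) for name, values in region_features.items()}
    ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return [(name, corr[name]) for name in ranked[:k]]


# Hypothetical data for four subjects and four regions.
scores = [1.0, 2.0, 3.0, 4.0]
regions = {
    "region_a": [1.0, 2.0, 3.0, 4.0],
    "region_b": [4.0, 3.0, 2.0, 2.0],
    "region_c": [1.0, 3.0, 2.0, 4.0],
    "region_d": [2.0, 2.0, 2.0, 2.0],
}
top = top_regions(regions, scores)
```

Repeating the ranking for each variable and comparing the resulting region lists gives a simple way to check how consistent the top regions are across variables.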

On the other hand, Fig. 9 shows the top 25% brain regions (with their names listed in

Supplemental Table 6) that have the highest correlation with class label (MCI-C/MCI-NC),

MMSE and ADAS-Cog changes using MRI on Experiment 2, where different colors again

represent different correlation coefficients. For comparison, we also list the bottom 25% brain regions

with the lowest correlation with class label, MMSE and ADAS-Cog changes in

Supplemental Table 8. Fig. 9 indicates that there still exists consistency between the selected

top regions across multiple variables (i.e., class label, MMSE and ADAS-Cog changes), but

it is not as apparent as in Experiment 1. This partly explains

why lower performance is achieved in predicting the future changes of MMSE and

ADAS-Cog scores and the MCI conversion to AD, compared to estimating the MMSE and

ADAS-Cog scores and the class labels, as shown in Tables 2–3. That is, the tasks of

predicting the future changes of clinical variables and the conversion of MCI to AD are

more challenging than the tasks of estimating clinical variables and class label.

Discussion

In this paper, we have proposed a new Multi-Modal Multi-Task (M3T) learning method

with two successive steps, i.e., multi-task feature selection and multi-modal support vector

machine, to jointly predict multiple regression and classification variables from multi-modal

data. Our proposed method has been validated on 186 baseline subjects from ADNI through

two different sets of experiments. In the first set of experiments, we tested its performance in

jointly estimating the MMSE and ADAS-Cog scores and the class label (AD/MCI/HC) of

subjects, from the baseline MRI, PET, and CSF data. In the second set of experiments, we

tested its performance in jointly predicting the 2-year changes of MMSE and ADAS-Cog

scores and the conversion of MCI to AD, also from the baseline MRI, PET, and CSF data.

Multi-task learning

Multi-task learning is a recent machine learning technique, which learns a set of related

models for predicting multiple related tasks (Argyriou et al., 2008; Obozinski et al., 2006;

Yang et al., 2009). Because multi-task learning uses the commonality among different tasks,

it often leads to a better model than learning the individual tasks separately. In multi-task

learning, one key issue is how to characterize and use the task relatedness among multiple

tasks, with several strategies used in the existing multi-task learning methods: (1) sharing

parameters or prior distributions of the hyperparameters of the models across multiple tasks

(Bi et al., 2008), and (2) sharing a common underlying representation across multiple tasks

(Argyriou et al., 2008; Obozinski et al., 2006; Yang et al., 2009). A few studies have used

multi-task learning in medical imaging. For example, multi-task learning, which is based

on sharing common prior distribution in the parameters of different models, is used to detect

different types of clinically related abnormal structures in medical images (Bi et al., 2008).

In another work which is the most related to the current study, a joint Bayesian classifier by

sharing the same hyperparameters for model parameters is used to estimate the MMSE and

ADAS-Cog scores from the baseline MRI data (Fan et al., 2010). In contrast, the multi-task

feature selection method used in this paper belongs to the second strategy, i.e., it assumes

that multiple related tasks share a subset of relevant features and thus selects them from a

large common space of features based on group sparsity constraints. Moreover, besides the

regression tasks on estimating clinical scores including MMSE and ADAS-Cog, our method

also learns classification tasks in a unified multi-task learning framework. Our experimental

results have shown the advantage of jointly estimating multiple regression and classification

variables. Also, it is worth noting that, in (Fan et al., 2010), their method achieved the

correlation coefficients of 0.569 and 0.522 in estimating MMSE and ADAS-Cog scores,

respectively, from the ADNI baseline MRI data of 52 AD, 148 MCI, and 64 HC, which are

inferior to our corresponding results in Table 2.
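
To illustrate what a group sparsity constraint does, the sketch below assumes the penalty takes the commonly used l2,1-norm form on a weight matrix W whose rows correspond to features and whose columns correspond to tasks. It shows the norm itself and the row-wise shrinkage (proximal) step that drives whole rows to zero, so that a feature is kept or discarded jointly for all tasks. The function names and the toy matrix are hypothetical, and this is not a full MTFS solver.

```python
import math


def l21_norm(W):
    """Sum over features (rows) of the Euclidean norm of that row across tasks."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def prox_l21(W, threshold):
    """Row-wise soft-thresholding, the proximal operator of threshold * ||W||_{2,1}.

    Each row is shrunk toward zero by `threshold` in Euclidean norm; rows
    whose norm is at most `threshold` become exactly zero for every task.
    """
    shrunk = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk


def selected_features(W):
    """Indices of features with a nonzero weight for at least one task."""
    return [i for i, row in enumerate(W) if any(w != 0.0 for w in row)]


# Toy weight matrix: 3 features (rows) x 2 tasks (columns).
W = [[3.0, 4.0], [0.1, 0.1], [0.0, 0.0]]
W_shrunk = prox_l21(W, threshold=1.0)
kept = selected_features(W_shrunk)
```

In contrast, a per-task Lasso penalty shrinks each weight independently, so different tasks may end up with different feature subsets.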

It is worth noting that feature selection (including both MTFS and Lasso) is performed on

the training data only. Thus, the selected features at each cross-validation trial may be

different. Accordingly, we checked the selected features by MTFS at each cross-validation

trial in Experiment 1, and found that the selected features do vary across different cross-

validation trials. But we also found that some important features such as hippocampal

regions, which are highly relevant to the disease, are always selected in each cross-

validation trial.
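
The stability check described above can be sketched as a loop that runs feature selection on the training portion of each cross-validation fold and records how often each feature is chosen. The selector below (top-k features by absolute correlation with the target) is only a stand-in for MTFS or Lasso, and the fold assignment, function names, and data are assumptions for illustration.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_top_k(X, y, k):
    """Stand-in selector: indices of the k features most correlated with y."""
    n_features = len(X[0])
    strength = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: strength[j], reverse=True)[:k]


def selection_frequency(X, y, n_folds, k):
    """Fraction of folds in which each feature is selected, using training data only."""
    counts = Counter()
    n_samples = len(X)
    for fold in range(n_folds):
        # Subject i is held out in fold (i mod n_folds); the rest form the training set.
        train = [i for i in range(n_samples) if i % n_folds != fold]
        counts.update(select_top_k([X[i] for i in train], [y[i] for i in train], k))
    return {j: counts[j] / n_folds for j in range(len(X[0]))}


# Toy data: feature 0 tracks the target, feature 1 alternates, feature 2 is constant.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = [[v, float(i % 2 == 0), 1.0] for i, v in enumerate(y)]
freq = selection_frequency(X, y, n_folds=3, k=1)
```

A feature with a frequency of 1.0 is selected in every fold, which is the behavior reported above for the hippocampal regions.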

Multi-modal classification and regression

In recent studies on AD and MCI, it has been shown that biomarkers from different

modalities contain complementary information for diagnosis of diseases (Apostolova et al.,

2010; de Leon et al., 2007; Fjell et al., 2010; Foster et al., 2007; Landau et al., 2010;

Walhovd et al., 2010b), and thus many studies on combining different modalities of

biomarkers have been reported for multi-modal classification (Bouwman et al., 2007a;

Chetelat et al., 2005; Fan et al., 2008; Fellgiebel et al., 2007; Geroldi et al., 2006; Vemuri et

al., 2009; Visser et al., 2002; Walhovd et al., 2010a). Typically in those methods, features

from all different modalities are concatenated into a longer feature vector for the purpose of

multi-modal classification. More recently, multiple-kernel methods have been used for multi-modal

data fusion and classification, and achieve better performance than the baseline feature

concatenation method (Hinrichs et al., 2011; Zhang et al., 2011).
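
The difference between feature concatenation and kernel-level fusion can be made concrete with a small sketch. It assumes linear kernels and a weighted sum of per-modality kernel matrices; the weights, function names, and toy features are hypothetical. With unit weights, the summed linear kernel equals the linear kernel of the concatenated features, so concatenation is a special case in which every modality is forced to contribute with the same weight.

```python
def linear_kernel(A, B):
    """Gram matrix of inner products between the rows of A and the rows of B."""
    return [[sum(a * b for a, b in zip(x, y)) for y in B] for x in A]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices that all have the same shape."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def concatenate(modalities):
    """Join the per-subject feature vectors of all modalities into one long vector."""
    n_subjects = len(modalities[0])
    return [[v for m in modalities for v in m[i]] for i in range(n_subjects)]


# Toy features for two subjects and two modalities (hypothetical values).
mri = [[1, 2], [3, 4]]
pet = [[5], [6]]

K_sum = combine_kernels([linear_kernel(mri, mri), linear_kernel(pet, pet)], [1, 1])
K_concat = linear_kernel(concatenate([mri, pet]), concatenate([mri, pet]))

# Unequal weights let one modality contribute more than another.
K_weighted = combine_kernels([linear_kernel(mri, mri), linear_kernel(pet, pet)], [0.7, 0.3])
```

The combined matrix can then be passed to any SVM implementation that accepts a precomputed kernel.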

On the other hand, in contrast to the abundant work on multi-modal classification, to the

best of our knowledge, there are no previous studies on using multi-modal data for

estimating clinical variables, i.e., using multi-modal regression. Instead, nearly all existing

studies on estimating clinical variables use only the structural MRI. However, as shown in

Tables 2–3, in some cases using PET data achieves better performance than using MRI data,

and by further combining MRI, PET, and CSF data, the multi-modal regression methods

always outperform the individual-modality based methods. Also, our experimental results

suggest that, although using CSF data alone achieves the worst performance in most

cases, it can help build powerful multi-modal regression models when combined with MRI

and PET data. Similar conclusions have also been drawn in the multi-modal classification

(Zhang et al., 2011).

Another general scheme for fusing multi-modal data is ensemble learning (Zhang et al.,

2011) (denoted as ENSEMBLE in this paper), which trains multiple learners for each

modality and then aggregates them by majority voting (for classification) or averaging (for

regression) at the decision-making level. For comparison, we perform the ENSEMBLE

method on Experiment 1, where the ENSEMBLE method achieves the correlation

coefficients of 0.677 and 0.727 in estimating MMSE and ADAS-Cog scores, respectively,

and the classification accuracies of 0.888 and 0.769 in classifying AD and MCI from HC,

respectively. These results are inferior to the corresponding results of M3T in Table 2. Also,

similar to regression, we found that the ENSEMBLE method cannot achieve satisfactory

results on classification, which implies that simple majority voting based on only 3

individual classifiers (from MRI, PET and CSF modalities, respectively) may not be

sufficient for achieving a better ensemble classification on this dataset.
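
The decision-level fusion used by the ENSEMBLE scheme can be sketched as below: majority voting over the per-modality class predictions and averaging over the per-modality regression outputs. The function names and toy predictions are hypothetical; with three binary classifiers no ties can occur, and for other settings this sketch breaks ties in favor of the label seen first.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse class predictions from several learners by per-subject majority vote.

    `predictions` is a list with one entry per learner, each a list of
    per-subject labels. Ties go to the label encountered first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def average_outputs(predictions):
    """Fuse regression outputs from several learners by per-subject averaging."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Hypothetical per-modality class predictions for four subjects (+1 / -1).
mri_labels = [1, 1, -1, -1]
pet_labels = [1, -1, -1, 1]
csf_labels = [-1, 1, -1, 1]
fused_labels = majority_vote([mri_labels, pet_labels, csf_labels])

# Hypothetical per-modality score estimates for two subjects.
fused_scores = average_outputs([[24.0, 28.0], [26.0, 30.0], [25.0, 29.0]])
```

With only three voters, a single weak classifier needs just one other classifier to agree with its error for the ensemble to be wrong, which is one way to read the limitation noted above.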

Our current model adopts (multi-modal) SVM for both regression and classification. For

SVM, a linear kernel is used after normalizing each feature vector to unit norm. The

advantage of using a linear kernel is that it has no free kernel parameter to be adjusted. In fact, we

also tried the Gaussian kernel under different values of the kernel width (i.e., sigma) (see

Supplemental Fig. 10). We found that using the linear kernel plus our normalization step can

achieve similar performance as using the Gaussian kernel with the best value of the kernel

width. Moreover, besides SVM, there also exist other models for regression and

classification, e.g., multiple regression, logistic regression, etc. (Hastie et al., 2001).

However, our experimental results indicate that these models achieve much poorer

performance than SVM.
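
A minimal sketch of the normalization and the two kernels discussed above follows. The Gaussian kernel is written as exp(-||x - y||^2 / (2 sigma^2)), which is one common parameterization and an assumption here; the function names and toy vectors are hypothetical. For unit-norm vectors, ||x - y||^2 = 2 - 2 x·y, so the Gaussian kernel value is a monotone function of the linear kernel value.

```python
import math


def unit_normalize(x):
    """Scale a feature vector to unit Euclidean norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def linear_kernel_value(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel_value(x, y, sigma):
    """Gaussian (RBF) kernel with width `sigma` (assumed parameterization)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Toy feature vectors (hypothetical values).
x = unit_normalize([3.0, 4.0])
y = unit_normalize([1.0, 0.0])

k_linear = linear_kernel_value(x, y)
k_gauss = gaussian_kernel_value(x, y, sigma=1.0)

# On unit-norm inputs the squared distance is determined by the inner product.
k_gauss_from_linear = math.exp(-(2.0 - 2.0 * k_linear) / 2.0)
```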

Finally, in this paper, for measuring the performance of different methods in regression

tasks, we use Pearson’s correlation coefficient throughout our experiments. In fact,

besides correlation coefficient, there also exist other performance evaluation metrics, e.g.,

the PRESS statistic which is defined as the sum of squares of the prediction residuals

computed under the Leave-One-Out (LOO) strategy. Here, we compare the performance of

different regression methods on Experiment 1 using a variant of the PRESS metric, i.e.,

PRESS RMSE (root mean square prediction error). Specifically, for estimating MMSE

score, the PRESS RMSE measures of MRI-based, PET-based, CSF-based, CONCAT and

M3T methods are 3.073, 2.669, 3.112, 2.667 and 2.563, respectively. On the other hand, for

estimating ADAS-Cog score, the PRESS RMSE measures of MRI-based, PET-based, CSF-

based, CONCAT and M3T methods are 6.306, 5.966, 6.851, 5.834 and 5.652, respectively.

The above results further show that under the PRESS RMSE metric, our M3T method still

achieves the best performance on both regression tasks, followed by the CONCAT and

PET-based methods, which is consistent with our previous results in Table 2, which use the

correlation coefficient as the performance measure.
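
The PRESS RMSE metric described above can be sketched as follows: each subject is left out in turn, a model is fitted on the remaining subjects, and the squared prediction residual for the held-out subject is accumulated; the root of the mean of these squared residuals is then reported. The one-variable least-squares line used here is only a stand-in for the actual regression models, and the function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def press_rmse(xs, ys):
    """Root mean square of leave-one-out prediction residuals."""
    press = 0.0  # PRESS: sum of squared leave-one-out residuals
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        residual = ys[i] - (intercept + slope * xs[i])
        press += residual ** 2
    return math.sqrt(press / len(xs))


# Toy data (hypothetical): three subjects with one feature each.
toy_press_rmse = press_rmse([0.0, 1.0, 2.0], [0.0, 0.0, 3.0])
```

Unlike the correlation coefficient, this metric is expressed in the units of the clinical score, so lower values indicate better prediction.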

Prediction of conversion and future decline of MCI

Recent interest in the early diagnosis of AD has increasingly shifted to identifying the

MCI subjects who will progress to clinical AD, i.e., MCI converters (MCI-C), from those

who remain stable, i.e., MCI non-converters (MCI-NC) (Davatzikos et al., 2010; Leung et

al., 2010; Misra et al., 2009). Although our method was not specifically designed for

predicting conversion of MCI to AD, the achieved performances, i.e., an accuracy of 0.739 and an AUC

of 0.797 on 38 MCI-C and 42 MCI-NC, are very comparable to the best results reported in

several recent studies on ADNI. For example, in (Misra et al., 2009), an accuracy between

0.75 and 0.80 and an AUC of 0.77 were reported on 27 MCI-C and 76 MCI-NC using

structural MRI data of ADNI. In (Davatzikos et al., 2010), a maximum accuracy of 0.617

and AUC of 0.734 were reported on 69 MCI-C and 170 MCI-NC using both MRI and CSF

data. In (Leung et al., 2010), a maximum AUC of 0.67 was reported on 86 MCI-C and 128

MCI-NC using the hippocampal atrophy rates calculated by the boundary shift integral

within ROIs.

On the other hand, a few recent studies also investigated the problem of predicting future

cognitive decline of MCI subjects. For example, a study based on group comparison on 85

MCI from ADNI in (Landau et al., 2010) indicated that CSF and PET could predict

longitudinal cognitive decline. In (Wang et al., 2010), a Bagging relevance vector machine

(RVM) was adopted to predict the future decline of MMSE score from baseline MRI data

and a correlation coefficient of 0.537 was achieved on 16 MCI-C, 5 MCI-NC, and 5 AD. In

contrast, our method achieved the correlation coefficients of 0.511 and 0.531 on 38 MCI-C

and 42 MCI-NC, as well as 40 AD and 47 HC, which are comparable to the result in (Wang

et al., 2010). Finally, in (Duchesne et al., 2009), a principal component analysis (PCA)

based model is used on MRI data of 49 MCI (including 20 MCI-C and 29 MCI-NC) to

predict the one-year change in MMSE score, and a correlation coefficient of 0.31 was

reported. This low correlation coefficient suggests that it is more difficult to predict

one-year changes than two-year changes, since the MCI converters (who convert to AD after

two years) had not progressed to AD completely after one year and thus the corresponding

measured cognitive scores did not accurately reflect the underlying pathological changes in

brain regions.

Limitations

The current study is limited by several factors. First, the proposed method is based

on multi-modal data, i.e., MRI, PET, and CSF, and thus requires each subject to have the

corresponding modality data, which limits the number of subjects that can be used for study. For

example, there are approximately 800 subjects in ADNI database, while there are only

around 200 subjects having all baseline MRI, PET, and CSF data. Second, besides MRI,

PET, and CSF, there also exist other modalities of data, e.g., APOE. However, since not

every subject has this data and the number of subjects with all modality data (including

APOE) is too small for reasonable learning, the current study does not consider APOE for

multimodal classification and regression. Finally, besides MMSE and ADAS-Cog scores,

there exist other clinical scores in the ADNI database. However, for similar reasons (i.e.,

not every subject has all clinical scores available), we did not investigate those clinical

variables in the current study, although in principle including more related clinical variables

is not difficult and may further improve the regression/classification performance.

Conclusion

In summary, our experimental results have shown that our proposed Multi-Modal Multi-Task

(M3T) method can effectively perform multi-task learning from multi-modal data.

Specifically, it can effectively estimate the MMSE and ADAS-Cog scores and the

classification label in both AD vs. HC and MCI vs. HC classifications, and can also predict

the 2-year MMSE and ADAS-Cog changes and the classification label in MCI-C vs.

MCI-NC classification. To the best of our knowledge, this is the first investigation on jointly

predicting multiple regression and classification variables from the baseline multi-modal

data. In future work, we will investigate incomplete multi-modal multi-task learning with

missing values in both modalities and tasks, to increase the number of ADNI subjects that

can be used for training our method. Moreover, we will develop new models which can

iteratively use multi-modal and multi-task information, i.e., using regression/classification

results to guide feature selection, for further improving the final performance. This general

wrapper-like framework can embrace a series of feature selection methods, e.g., SVM-RFE

(Guyon et al., 2002), which has been widely used in the neuroimaging area. We will extend it for

multi-modal multi-task learning and compare with our current model. Finally, it is

interesting to investigate the integration of the existing domain knowledge in AD research

into our current model, for not only achieving good prediction accuracy but also providing

good interpretability in understanding the biology of AD.

Research Highlights

➢ We jointly predict regression and classification variables from multi-modal data

➢ Two sets of experiments are performed on baseline MRI, PET, and CSF data from ADNI

➢ We first estimate MMSE and ADAS-Cog clinical scores and class label (AD/MCI/HC)

➢ We then predict 2-year change of MMSE and ADAS-Cog scores and MCI conversion to AD

➢ Our method achieves better performance on both experiments than conventional ones

Supplementary Material

Refer to Web version on PubMed Central for supplementary material.

Acknowledgments

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI)

(National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the

National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the

following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical

Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson,

Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-

Plough, Synarc, Inc., as well as non-profit partners the Alzheimer's Association and Alzheimer's Drug Discovery

Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI

are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is

the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's

Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the

Laboratory for Neuro Imaging at the University of California, Los Angeles.

References

Apostolova LG, Hwang KS, Andrawis JP, Green AE, Babakchanian S, Morra JH, Cummings JL, Toga

AW, Trojanowski JQ, Shaw LM, Jack CR Jr, Petersen RC, Aisen PS, Jagust WJ, Koeppe RA,

Mathis CA, Weiner MW, Thompson PM. 3D PIB and CSF biomarker associations with

hippocampal atrophy in ADNI subjects. Neurobiol Aging. 2010; 31:1284–1303. [PubMed:

20538372]

Argyriou A, Evgeniou T, Pontil M. Convex multi-task feature learning. Machine Learning. 2008;

73:243–272.

Ashburner J. A fast diffeomorphic image registration algorithm. Neuroimage. 2007; 38:95–113.

[PubMed: 17761438]

Bi J, Xiong T, Yu S, Dundar M, Rao B. An improved multi-task learning approach with

applications in medical diagnosis. Proceedings of the 2008 European Conference on Machine

Learning and Knowledge Discovery in Databases. 2008:117–132.

Bouwman FH, Schoonenboom SN, van der Flier WM, van Elk EJ, Kok A, Barkhof F, Blankenstein

MA, Scheltens P. CSF biomarkers and medial temporal lobe atrophy predict dementia in mild

cognitive impairment. Neurobiol Aging. 2007a; 28:1070–1074. [PubMed: 16782233]

Bouwman FH, van der Flier WM, Schoonenboom NS, van Elk EJ, Kok A, Rijmen F, Blankenstein

MA, Scheltens P. Longitudinal changes of CSF biomarkers in memory clinic patients. Neurology.

2007b; 69:1006–1011. [PubMed: 17785669]

Chang CC, Lin CJ. LIBSVM: a library for support vector machines. 2001

Chetelat G, Desgranges B, de la Sayette V, Viader F, Eustache F, Baron J-C. Mapping gray matter loss

with voxel-based morphometry in mild cognitive impairment. Neuroreport. 2002; 13:1939–1943.

[PubMed: 12395096]

Chetelat G, Eustache F, Viader F, De La Sayette V, Pelerin A, Mezenge F, Hannequin D, Dupuy B,

Baron JC, Desgranges B. FDG-PET measurement is more accurate than neuropsychological

assessments to predict global cognitive deterioration in patients with mild cognitive impairment.

Neurocase. 2005; 11:14–25. [PubMed: 15804920]

Convit A, de Asis J, de Leon MJ, Tarshish CY, De Santi S, Rusinek H. Atrophy of the medial

occipitotemporal, inferior, and middle temporal gyri in non-demented elderly predict decline to

Alzheimer's disease. Neurobiol Aging. 2000; 21:19–26. [PubMed: 10794844]

Davatzikos C, Bhatt P, Shaw LM, Batmanghelich KN, Trojanowski JQ. Prediction of MCI to AD

conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol Aging. 2010
