Linked Independent Component Analysis for Multimodal

Data Fusion

Adrian R. Groves∗,1, Christian F. Beckmann1,2, Steve M. Smith1,

and Mark W. Woolrich1,3

1FMRIB Centre, University of Oxford, UK.

2Imperial College, London, UK.

3Oxford Centre for Human Brain Activity, University of Oxford, UK.

∗Corresponding author: adriang@fmrib.ox.ac.uk.

Note: this document is a preprint, incorporating all revisions from peer review.

The full article is in NeuroImage, doi:10.1016/j.neuroimage.2010.09.073.

Abstract

In recent years, neuroimaging studies have increasingly been acquiring multiple modal-

ities of data and searching for task- or disease-related changes in each modality separately.

A major challenge in analysis is to find systematic approaches for fusing these differing data

types together to automatically find patterns of related changes across multiple modali-

ties, when they exist. Independent Component Analysis (ICA) is a popular unsupervised

learning method that can be used to find the modes of variation in neuroimaging data

across a group of subjects. When multimodal data is acquired for the subjects, ICA is typi-

cally performed separately on each modality, leading to incompatible decompositions across

modalities. Using a modular Bayesian framework, we develop a novel “Linked ICA” model

for simultaneously modelling and discovering common features across multiple modalities,

which can potentially have completely different units, signal- and contrast-to-noise ratios,

voxel counts, spatial smoothnesses and intensity distributions. Furthermore, this general

model can be configured to allow tensor ICA or spatially-concatenated ICA decompositions,

or a combination of both at the same time. Linked ICA automatically determines the opti-

mal weighting of each modality, and also can detect single-modality structured components

when present. This is a fully probabilistic approach, implemented using Variational Bayes.

We evaluate the method on simulated multimodal data sets, as well as on a real data set

of Alzheimer’s patients and age-matched controls that combines two very different types of

structural MRI data: morphological data (grey matter density) and diffusion data (fractional

anisotropy, mean diffusivity, and tensor mode).

1 Introduction

One of the greatest strengths of MR neuroimaging is its flexibility; by using different pulse

sequences in a single scanning session, one can acquire information about the subject’s tissue

volume and morphology (using high-resolution structural scans), functional activity (using BOLD

FMRI), white matter integrity (using diffusion-weighted imaging), perfusion (using ASL), and

other distinct acquisition types. The result of this is that many recent studies have acquired

these multimodal MRI data sets for each subject and analysed them separately to find changes

in different aspects of the brain. For example, several recent studies have used structural and

diffusion tensor imaging (DTI) to find changes in grey matter density and white matter tracts that

are related to schizophrenia (Douaud et al., 2007) or learning (Scholz et al., 2009). Other possible

combinations are DTI and task-related FMRI (Watkins et al., 2008) or structural, diffusion, and

resting-state FMRI (Filippini et al., 2009).

A major challenge is to find systematic approaches for fusing data across multiple MRI

modalities, in order to find any patterns of related change that may be present. We develop

a model based on Bayesian ICA to extract linked components from multimodal data, using

as inputs the subject-wise contrast images from modality-specific analyses. For example, these

inputs could be GLM contrasts from FMRI, cortical-thickness or VBM maps from structural

MRI, and skeletonised tensor measures from diffusion-weighted imaging. ICA is a particularly

effective model for finding meaningful, spatially-independent components in an unsupervised

setting because it searches for non-Gaussian spatial sources that are likely to represent real

structured features in the data. This is because linear mixing processes tend to turn non-Gaussian independent sources into more Gaussian observed signals, so seeking non-Gaussianity

is an unsupervised way of isolating the original independent sources.
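The central limit effect described above can be demonstrated numerically: a linear mixture of two sparse (heavy-tailed) sources has a lower excess kurtosis, i.e. is closer to Gaussian, than either source alone. A minimal sketch, using hypothetical Laplace-distributed sources rather than real imaging data:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, positive for sparse/heavy-tailed data."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

# Two independent, highly non-Gaussian (sparse) sources
s1 = rng.laplace(size=100_000)
s2 = rng.laplace(size=100_000)

# A fixed linear mixture of the two sources
mixed = 0.6 * s1 + 0.8 * s2

# The mixture is measurably closer to Gaussian than the original sources,
# so maximizing non-Gaussianity recovers the unmixed directions
print(excess_kurtosis(s1), excess_kurtosis(mixed))
```

For a Laplace source the excess kurtosis is about 3, while this mixture falls to roughly half that; ICA exploits exactly this gap when searching for the unmixing directions.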

Standard ICA decompositions treat the input data as a 2D matrix, typically voxels × time-

points or voxels × subjects. Multimodal data does not naturally fit into this form and there

are a number of different configurations one could consider for performing combined ICA on

multimodal data:

• Separate ICA analysis of each modality reveals the salient features for each modality.

Since some of these features are caused by distributed neurological variations they could

be visible (to varying degrees) in all modalities, with similar subject-courses.

Corresponding components can then be matched up using heuristics; however there is no

guarantee that components with strongly-correlated subject-courses will be extracted, for

example a single component in one modality might be explained as a mixture of components

in another. When potential matches are found, it can be difficult to determine if they are

simply noisy estimates of the same subject-course or whether the underlying subject-courses

are different but correlated.

A slightly more sophisticated approach to this is the Parallel ICA method described by

Liu et al. (2009) which runs separate ICAs on each modality simultaneously; when cor-

related components are detected, it adds terms to the cost function to encourage these

components to become more correlated in later iterations. This relies on a number of tun-

able constraints (learning rates and weights) to ensure convergence and balance between

modalities. Furthermore, it is still not clear how to interpret paired components where the

subject-courses are significantly, but not perfectly, correlated.

• Spatial concatenation has also been used for analysing multimodal data, combining all

of the data from each subject into a single dataset with more voxels. This “joint ICA”

method has been used before for simultaneously analysing functional maps and gray matter

maps (Calhoun et al., 2006), and has been used to extract correlations in structural grey-

matter/white-matter density data (Xu et al., 2009). Since concatenation is a preprocessing

step, the ICA model is completely unaware of which voxels belong to which modality.

However, different modalities may have different spatial source histograms. ICA effectively

assumes that each component has a single, non-Gaussian histogram as the prior distribu-

tion for all voxels in its spatial map. If this map consists of voxels from several different

modalities, the modelled histogram (which is effectively an estimate of the source distribu-

tion) may have to compromise. For example, this can occur if one modality has a small area

of strong activation (or signal change in the case of structural modalities), while the other

has a large region of weak activation. This can cause sub-optimal estimates of intensities

in spatial maps.

A related problem is that the contribution each modality makes to the ICA cost function

greatly depends on the scaling. One of the difficulties of concatenating multimodal data

is that the modalities may have different noise levels and different numbers of voxels. If

the scaling is mismatched, unsupervised methods such as PCA and ICA will be dominated

by the largest-variance modalities, or those with the most voxels. Typically these concate-

nation methods also require the same resolution and smoothing for all modalities, rather

than using optimized values for each.

There is also an issue of noise co-variance, for example due to spatial smoothing; in partic-

ular, adding more smoothing to one modality reduces the noise level but leaves the number

of voxels unchanged. The proposed method deals with this explicitly using a precalculated

correction for the number of effective degrees of freedom (eDOF), which is closely related

to the number of resolution elements (RESELs) in the image (Worsley et al., 1995).

We also expect that some of the structured signals modelled by ICA will be observable

in only one modality, and may be extremely weak or even absent in some of the other

modalities. It would therefore be useful for sources to be “switched off” in the models

where they are not needed, just as it is important to eliminate unneeded components in

the single-modality Bayesian ICA model (Choudrey and Roberts, 2001).

• Tensor ICA stacks the modalities to create a 3D data matrix. This has been used for

multi-subject FMRI analysis, with dimensions of voxels × time × subjects (Beckmann and

Smith, 2005). In the multimodal scenario this would most likely translate into voxels ×

subjects × modalities. This is related to the PARAFAC model (see Nielsen 2004 for a

VB-based implementation) but with the addition of spatial-independence priors. This method assumes that each component has a single spatial map for all modalities, applied to each modality with different weightings. This can be a beneficial feature because it avoids unnecessary duplication of the spatial maps and can allow them to be inferred more accurately when the assumption holds. However, this is effectively a strong prior on the nature of the spatial distribution and it may be inappropriate, for example if the number of voxels is different or if the spatial maps in different modalities are not similar.
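The scaling problem raised for spatial concatenation above is easy to reproduce: if two concatenated modalities carry the same subject-course but one is stored in units 100 times larger, an unsupervised decomposition of the stacked matrix is dominated by the large-scale modality. A minimal sketch with hypothetical dimensions, using PCA (SVD) as the simplest unsupervised example:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 50                # subjects (the shared dimension)
N1, N2 = 1000, 1000   # voxels per modality (hypothetical)

# Both modalities contain the same underlying subject-course,
# but modality 1 is stored in units that make its values 100x larger
subject_course = rng.normal(size=R)
map1 = rng.normal(size=N1)
map2 = rng.normal(size=N2)
Y1 = 100.0 * (np.outer(map1, subject_course) + 0.5 * rng.normal(size=(N1, R)))
Y2 = 1.0 * (np.outer(map2, subject_course) + 0.5 * rng.normal(size=(N2, R)))

# Spatial concatenation: stack the voxels of both modalities into one matrix
Y = np.vstack([Y1, Y2])

# First principal component of the concatenated data
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
pc1 = U[:, 0]

# Fraction of the first component's energy in each modality's voxels:
# essentially all of it sits in the large-scale modality
energy1 = np.sum(pc1[:N1] ** 2)
energy2 = np.sum(pc1[N1:] ** 2)
```

In this toy setting the small-scale modality contributes a negligible fraction of the first component, even though it carries exactly the same subject-course; this is the imbalance that the proposed model addresses by learning per-modality noise levels and weightings.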

Using a modular Bayesian framework, we have developed a novel “Linked ICA” general model

that allows for either tensor ICA or spatially-concatenated ICA, or a combination of both at the

same time. The same subject loading matrix is shared between all of the modalities, so each

component consists of a single subject-course and one spatial map in each of the modalities.

The subject weighting matrix automatically balances information from all of the modalities.

This novel Linked ICA method will be applied to a data set with four different modalities, ac-

quired from 93 subjects (probable-Alzheimer’s patients and age-matched controls). One of these

modalities is a grey matter partial volume map (“GM”) derived from Voxel-Based Morphome-

try (VBM) methods (Ashburner and Friston, 2000), and the other three are measures of white

matter integrity: Fractional Anisotropy (FA), Mean Diffusivity (MD), and an orthogonal Tensor

Mode (MO) described in Ennis and Kindlmann (2006). These last three modalities have been

projected onto a two-dimensional white matter surface (the “skeleton”) using a Tract-Based

Spatial Statistics (TBSS) analysis (Smith et al., 2006).

Figure 1: (a) The main matrices of the Linked ICA that models multimodal data Y. Note that the same subject loading matrix H is used for all of the modality groups, but otherwise they are K separate Tensor ICAs, each with separate data dimensions N_k × T_k × R (voxels × modalities × subjects). Each of the modality groups contains one or more modalities stacked together, expressed in terms of spatial maps X^(k), modality weights W^(k), a shared subject-weighting matrix H, and additive noise E^(k). (b) The spatially-concatenated ICA configuration, for comparison. This model is almost identical to a standard Bayesian ICA.

2 Theory

2.1 Linked ICA Model for Multimodal Data Sets

We assume that the data set is from a group of R subjects, each scanned using several different

modalities. It should be noted that the proposed method has the potential to be applied in

any situation where multiple modalities have been collected across a single shared dimension

(subjects, trials, timepoints, etc.). Each of the scans is prepared for analysis using whatever

methods are recommended for a linear regression analysis (or a single-modality ICA) of the

group data. This produces maps for each modality, which can have different spatial masks and

different numbers of voxels. In this model, “modality” is defined as referring to a single contrast

image (per subject) that refers to a particular output extracted from the data. Typically, different

modalities will have different units, different scalings and different noise levels. In some cases,

a single analysis may result in several different contrast images; for example, a diffusion tensor

imaging (DTI) analysis can produce maps of FA (fractional anisotropy), MD (mean diffusivity)

and MO (tensor mode). These are treated as separate modalities as they contain distinct, complementary biophysical information.

However, to maintain the benefits of tensor ICA (inferring the same spatial patterns across

modalities) as much as possible, similar modalities can be collected into K “modality groups”.

Modalities in the same modality group must be observations of the same points in space; this

means the modalities must be spatially aligned to each other and have the same spatial mask, and

should also have similar spatial properties (for example, the same amount of smoothing). A good example of this is the set of multiple diffusion-derived measures projected onto a white matter skeleton using TBSS. The data can then be packed into a set of 3D arrays Y^(k) ∈ R^(N_k × T_k × R), where N_k is the number of voxels in the shared spatial map and T_k ≥ 1 is the number of modalities in the k-th modality group. Each modality group is modelled using a Bayesian tensor ICA model. This general configuration is shown in figure 1. Note that the Bayesian ICA differs from

standard methods like FastICA (Hyvärinen and Oja, 2000) in that it incorporates dimensionality-reduction into the ICA method itself by the use of automatic relevance determination (ARD)

priors on components (Choudrey and Roberts, 2001; Bishop, 1999). The model works on the

full-dimensionality data directly and has an additive noise model. The Bayesian ICA also models

an explicitly parametrized non-Gaussian source model (in this case a Gaussian mixture model)

instead of maximizing negentropy (as used in FastICA).
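The packing of modalities into modality groups described above can be sketched directly. This is an illustrative example only: the voxel counts are far smaller than in real data, and the per-subject maps are random placeholders rather than actual FA/MD/MO or GM images.

```python
import numpy as np

rng = np.random.default_rng(2)
R = 93           # subjects, the shared "repeats" dimension
N_tbss = 1200    # skeleton voxels (illustrative, far fewer than real data)
N_gm = 3000      # grey matter voxels (illustrative)

# Hypothetical per-subject maps, each of shape (voxels, subjects)
FA = rng.normal(size=(N_tbss, R))
MD = rng.normal(size=(N_tbss, R))
MO = rng.normal(size=(N_tbss, R))
GM = rng.normal(size=(N_gm, R))

# Modality group 1: the three TBSS measures share the same skeleton mask,
# so they stack along the T (modality) axis into one 3D array Y^(1)
Y1 = np.stack([FA, MD, MO], axis=1)   # shape (N_tbss, 3, R)

# Modality group 2: GM has its own mask and voxel count, so it forms its
# own group with T_k = 1
Y2 = GM[:, np.newaxis, :]             # shape (N_gm, 1, R)
```

Only the subjects axis R is shared between Y1 and Y2; N_k and T_k are free to differ per group, exactly as the model requires.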

2.2 Bayesian Tensor ICA Model

Within each modality-group k the data is modelled as a sum of components using a tensor

decomposition. Each component i = 1...L can be expressed as the tensor product of one

spatial map, one subject-course, and one modality-course. These model the data in modality

group k = 1:K, modality t = 1:T_k, subject r = 1:R and voxel n = 1:N_k as

$$Y^{(k)}_{n,t,r} = \sum_{i=1}^{L} X^{(k)}_{n,i} W^{(k)}_{t,i} H_{i,r} + E^{(k)}_{n,t,r} \qquad (1)$$

where X^(k)_{n,i} are the spatial maps for component i in modality group k, W^(k)_{t,i} are the modality weightings for component i in modality t (of modality group k), and H_{i,r} are the weights for component i in subject r. For simplicity this model is used even when T_k = 1, so that W^(k)_{·,i} is just a scalar. Crucially, the same H matrix is shared between all of the modality-groups; this forms a link between the different modality groups, which are otherwise modelled completely separately. The i-th component has the same subject weightings across modality groups but each group has its own spatial map. Thus the number of repeats R and the maximum number of components L must be the same everywhere, because these dimensions are shared, while N_k and T_k are not. Uncorrelated Gaussian residuals are assumed, with the modality-dependent noise precision (inverse variance) λ^(k)_t:

$$E^{(k)}_{n,t,r} \sim \mathcal{N}\left(0,\ 1/\lambda^{(k)}_t\right). \qquad (2)$$
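The generative model of equations (1) and (2) can be sketched as a forward simulation. All dimensions below are hypothetical, and the sparse Laplace spatial maps are just one plausible choice of non-Gaussian source:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, R, L = 500, 3, 40, 5   # voxels, modalities in the group, subjects, components

X = rng.laplace(size=(N, L))        # sparse spatial maps X^(k)_{n,i}
W = rng.normal(size=(T, L))         # modality weightings W^(k)_{t,i}
H = rng.normal(size=(L, R))         # shared subject-loading matrix H_{i,r}
lam = np.array([4.0, 1.0, 0.25])    # per-modality noise precisions lambda^(k)_t

# Equation (1): Y_{n,t,r} = sum_i X_{n,i} W_{t,i} H_{i,r} + E_{n,t,r}
signal = np.einsum('ni,ti,ir->ntr', X, W, H)

# Equation (2): E ~ N(0, 1/lambda_t); the precision varies by modality only,
# not by voxel or subject
noise = rng.normal(size=(N, T, R)) / np.sqrt(lam)[None, :, None]
Y = signal + noise
```

In the full Linked ICA, each modality group k would have its own X, W and noise precisions generated this way, while the single H would be reused across all groups.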

Note that this assumes the same noise variance for each voxel, while in the original data there

may actually be large (orders-of-magnitude) differences in the white noise intensity. To correct

for this, we rely on a robust preprocessing method called variance normalization which is widely

used for ICA on functional MRI (Beckmann and Smith, 2004).
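A simplified sketch of this preprocessing step, on hypothetical data: each voxel's values are rescaled to unit standard deviation across the shared dimension. Note this toy version normalizes by the total sample variance, whereas the full method of Beckmann and Smith (2004) estimates and normalizes by the noise variance specifically.

```python
import numpy as np

rng = np.random.default_rng(4)
N, R = 1000, 40   # voxels, subjects (hypothetical)

# Voxels with wildly different white-noise scales, orders of magnitude apart
scales = 10.0 ** rng.uniform(-2, 2, size=N)
Y = scales[:, None] * rng.normal(size=(N, R))

# Simplified variance normalization: rescale each voxel so its values have
# unit standard deviation across subjects, making the shared per-modality
# noise precision in equation (2) a reasonable assumption
Y_norm = Y / Y.std(axis=1, ddof=1, keepdims=True)
```

After this step, a single noise precision per modality can describe all voxels, which is what the model in equation (2) assumes.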

The sketch of the Linked ICA matrices is shown in figure 1, and figure 2 shows how these

variables fit into the full Linked ICA graphical model; this also includes the hyperparameters

explained in the next two sections. Aside from the shared matrix H, the Linked ICA model is identical

to performing separate tensor ICA analyses: each modality group k has its own separate source

mixture model, as well as having its own noise model and separate ARD priors to drive different

patterns of sparsity. Note that r indexes the “repeats” dimension and is the dimension that is

shared across modality groups, for example r indexes subjects in the multi-subject application.

2.2.1 Adaptive Modality-weighting

The tensor model (equation 1) implies that the same spatial sources X^(k)_{·,i} are used for all of the different maps t ∈ 1..T_k, with weightings given by W^(k)_{t,i}. In previous tensor ICA applications (Beckmann and Smith, 2005), this t dimension indexes over repeats of the same scan, such as in multi-subject FMRI data from a study with identical stimulus timings. In that case it makes sense to assume the same noise level for all timepoints, i.e. use only a scalar λ^(k). Instead, the
