Linked Independent Component Analysis for Multimodal Data Fusion
Adrian R. Groves∗,1, Christian F. Beckmann1,2, Steve M. Smith1,
and Mark W. Woolrich1,3
1FMRIB Centre, University of Oxford, UK.
2Imperial College, London, UK.
3Oxford Centre for Human Brain Activity, University of Oxford, UK.
∗Corresponding author: email@example.com.
Note: this document is a preprint, incorporating all revisions from peer review.
The full article is in NeuroImage, doi:10.1016/j.neuroimage.2010.09.073.
In recent years, neuroimaging studies have increasingly been acquiring multiple modal-
ities of data and searching for task- or disease-related changes in each modality separately.
A major challenge in analysis is to find systematic approaches for fusing these differing data
types together to automatically find patterns of related changes across multiple modali-
ties, when they exist. Independent Component Analysis (ICA) is a popular unsupervised
learning method that can be used to find the modes of variation in neuroimaging data
across a group of subjects. When multimodal data is acquired for the subjects, ICA is typi-
cally performed separately on each modality, leading to incompatible decompositions across
modalities. Using a modular Bayesian framework, we develop a novel “Linked ICA” model
for simultaneously modelling and discovering common features across multiple modalities,
which can potentially have completely different units, signal- and contrast-to-noise ratios,
voxel counts, spatial smoothnesses and intensity distributions. Furthermore, this general
model can be configured to allow tensor ICA or spatially-concatenated ICA decompositions,
or a combination of both at the same time. Linked ICA automatically determines the opti-
mal weighting of each modality, and also can detect single-modality structured components
when present. This is a fully probabilistic approach, implemented using Variational Bayes.
We evaluate the method on simulated multimodal data sets, as well as on a real data set
of Alzheimer’s patients and age-matched controls that combines two very different types of
structural MRI data: morphological data (grey matter density) and diffusion data (fractional
anisotropy, mean diffusivity, and tensor mode).
One of the greatest strengths of MR neuroimaging is its flexibility; by using different pulse
sequences in a single scanning session, one can acquire information about the subject’s tissue
volume and morphology (using high-resolution structural scans), functional activity (using BOLD
FMRI), white matter integrity (using diffusion-weighted imaging), perfusion (using ASL), and
other distinct acquisition types. As a result, many recent studies have acquired such
multimodal MRI data sets for each subject and analysed them separately to find changes
in different aspects of the brain. For example, several recent studies have used structural and
diffusion tensor imaging (DTI) to find changes in grey matter density and white matter tracts that
are related to schizophrenia (Douaud et al., 2007) or learning (Scholz et al., 2009). Other possible
combinations are DTI and task-related FMRI (Watkins et al., 2008) or structural, diffusion, and
resting-state FMRI (Filippini et al., 2009).
A major challenge is to find systematic approaches for fusing data across multiple MRI
modalities, in order to find any patterns of related change that may be present. We develop
a model based on Bayesian ICA to extract linked components from multimodal data, using
as inputs the subject-wise contrast images from modality-specific analyses. For example, these
inputs could be GLM contrasts from FMRI, cortical-thickness or VBM maps from structural
MRI, and skeletonised tensor measures from diffusion-weighted imaging. ICA is a particularly
effective model for finding meaningful, spatially-independent components in an unsupervised
setting because it searches for non-Gaussian spatial sources that are likely to represent real
structured features in the data. This is because linear mixing processes tend to turn non-Gaussian
independent sources into more Gaussian observed signals, so seeking non-Gaussianity
is an unsupervised way of isolating the original independent sources.
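This intuition can be demonstrated with a toy numerical sketch (not from the paper): mixing two independent heavy-tailed sources yields a signal whose excess kurtosis, a simple measure of non-Gaussianity, is closer to the Gaussian value of zero than that of either source.

```python
import numpy as np

# Toy demonstration: a linear mixture of independent non-Gaussian sources
# is more Gaussian than the sources themselves, so maximizing measures of
# non-Gaussianity (here, excess kurtosis) helps recover the originals.
rng = np.random.default_rng(42)
s1 = rng.laplace(size=200_000)        # heavy-tailed source, excess kurtosis ~3
s2 = rng.laplace(size=200_000)
mix = (s1 + s2) / np.sqrt(2.0)        # equal unit-variance linear mixture

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)  # 0 for a Gaussian

k_src = excess_kurtosis(s1)   # roughly 3: strongly non-Gaussian
k_mix = excess_kurtosis(mix)  # roughly 1.5: noticeably closer to Gaussian
```

This is the central-limit effect in miniature: each mixed observation is a sum of independent variables, so its distribution moves toward a Gaussian.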
Standard ICA decompositions treat the input data as a 2D matrix, typically voxels × timepoints
or voxels × subjects. Multimodal data does not naturally fit into this form, and there
are a number of different configurations one could consider for performing combined ICA on
multimodal data:
• Separate ICA analysis of each modality reveals the salient features for each modality.
Since some of these features are caused by distributed neurological variations they could
be visible (to varying degrees) in all modalities, with similar subject-courses.
Corresponding components can then be matched up using heuristics; however there is no
guarantee that components with strongly-correlated subject-courses will be extracted, for
example a single component in one modality might be explained as a mixture of components
in another. When potential matches are found, it can be difficult to determine if they are
simply noisy estimates of the same subject-course or whether the underlying subject-courses
are different but correlated.
A slightly more sophisticated approach to this is the Parallel ICA method described by
Liu et al. (2009) which runs separate ICAs on each modality simultaneously; when cor-
related components are detected, it adds terms to the cost function to encourage these
components to become more correlated in later iterations. This relies on a number of tun-
able constraints (learning rates and weights) to ensure convergence and balance between
modalities. Furthermore, it is still not clear how to interpret paired components where the
subject-courses are significantly, but not perfectly, correlated.
• Spatial concatenation has also been used for analysing multimodal data, combining all
of the data from each subject into a single dataset with more voxels. This “joint ICA”
method has been used to simultaneously analyse functional maps and grey matter
maps (Calhoun et al., 2006), and to extract correlations in structural grey-matter/white-matter
density data (Xu et al., 2009). Since concatenation is a preprocessing
step, the ICA model is completely unaware of which voxels belong to which modality.
However, different modalities may have different spatial source histograms. ICA effectively
assumes that each component has a single, non-Gaussian histogram as the prior distribu-
tion for all voxels in its spatial map. If this map consists of voxels from several different
modalities, the modelled histogram (which is effectively an estimate of the source distribu-
tion) may have to compromise. For example, this can occur if one modality has a small area
of strong activation (or signal change in the case of structural modalities), while the other
has a large region of weak activation. This can cause sub-optimal estimates of intensities
in spatial maps.
A related problem is that the contribution each modality makes to the ICA cost function
greatly depends on the scaling. One of the difficulties of concatenating multimodal data
is that the modalities may have different noise levels and different numbers of voxels. If
the scaling is mismatched, unsupervised methods such as PCA and ICA will be dominated
by the largest-variance modalities, or those with the most voxels. Typically these concate-
nation methods also require the same resolution and smoothing for all modalities, rather
than using optimized values for each.
There is also an issue of noise co-variance, for example due to spatial smoothing; in partic-
ular, adding more smoothing to one modality reduces the noise level but leaves the number
of voxels unchanged. The proposed method deals with this explicitly using a precalculated
correction for the number of effective degrees of freedom (eDOF), which is closely related
to the number of resolution elements (RESELs) in the image (Worsley et al., 1995).
We also expect that some of the structured signals modelled by ICA will be observable
in only one modality, and may be extremely weak or even absent in some of the other
modalities. It would therefore be useful for sources to be “switched off” in the models
where they are not needed, just as it is important to eliminate unneeded components in
the single-modality Bayesian ICA model (Choudrey and Roberts, 2001).
• Tensor ICA stacks the modalities to create a 3D data matrix. This has been used for
multi-subject FMRI analysis, with dimensions of voxels × time × subjects (Beckmann and
Smith, 2005). In the multimodal scenario this would most likely translate into voxels ×
subjects × modalities. This is related to the PARAFAC model (see Nielsen 2004 for a
VB-based implementation) but with the addition of spatial-independence priors. This
method assumes that each component has a single spatial map for all modalities, applied
to each modality with different weightings. This can be a beneficial feature because it
avoids unnecessary duplication of the spatial maps and can allow them to be inferred more
accurately when the assumption holds. However this is effectively a strong prior on the
nature of the spatial distribution and it may be inappropriate, for example if the number
of voxels is different or if the spatial maps in different modalities are not similar.
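The scaling pitfalls of plain spatial concatenation described above can be made concrete with a small sketch. This is a hand-rolled preprocessing heuristic, not the adaptive weighting that Linked ICA infers within the model; the per-modality noise standard deviations and eDOF values are assumed to come from separate preliminary estimates.

```python
import numpy as np

# Hypothetical helper: before concatenating along the voxel dimension,
# rescale each modality to unit noise variance, and downweight modalities
# whose smoothing leaves few effective degrees of freedom (eDOF << voxels),
# so no single modality dominates the decomposition by scale alone.
def concat_modalities(mods, noise_stds, edofs):
    scaled = []
    for Y, sd, edof in zip(mods, noise_stds, edofs):
        scaled.append((Y / sd) * np.sqrt(edof / Y.shape[0]))
    return np.concatenate(scaled, axis=0)  # (total voxels) x subjects

# Two toy modalities with very different scales and voxel counts.
rng = np.random.default_rng(0)
gm = 100.0 * rng.normal(size=(4000, 20))   # e.g. grey matter maps
fa = 0.01 * rng.normal(size=(1000, 20))    # e.g. FA skeleton maps
Y = concat_modalities([gm, fa], [100.0, 0.01], [4000.0, 1000.0])
```

After rescaling, both blocks of the concatenated matrix have comparable variance, which a plain "joint ICA" would otherwise not guarantee.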
Using a modular Bayesian framework, we have developed a novel “Linked ICA” general model
that allows for either tensor ICA or spatially-concatenated ICA, or a combination of both at the
same time. The same subject loading matrix is shared between all of the modalities, so each
component consists of a single subject-course and one spatial map in each of the modalities.
The subject weighting matrix automatically balances information from all of the modalities.
This novel Linked ICA method will be applied to a data set with four different modalities, ac-
quired from 93 subjects (probable-Alzheimer’s patients and age-matched controls). One of these
modalities is a grey matter partial volume map (“GM”) derived from Voxel-Based Morphome-
try (VBM) methods (Ashburner and Friston, 2000), and the other three are measures of white
matter integrity: Fractional Anisotropy (FA), Mean Diffusivity (MD), and an orthogonal Tensor
Mode (MO) described in Ennis and Kindlmann (2006). These last three modalities have been
projected onto a two-dimensional white matter surface (the “skeleton”) using a Tract-Based
Spatial Statistics (TBSS) analysis (Smith et al., 2006).
Figure 1: (a) The main matrices of the Linked ICA model of multimodal data Y. Note
that the same subject loading matrix H is used for all of the modality groups, but otherwise
they are K separate tensor ICAs, each with separate data dimensions Nk × Tk × R (voxels ×
modalities × subjects). Each of the modality groups contains one or more modalities stacked
together, expressed in terms of spatial maps X(k), modality weights W(k), a shared subject-
weighting matrix H, and additive noise E(k). (b) The spatially-concatenated ICA configuration,
for comparison. This model is almost identical to a standard Bayesian ICA.
2.1 Linked ICA Model for Multimodal Data Sets
We assume that the data set is from a group of R subjects, each scanned using several different
modalities. It should be noted that the proposed method has the potential to be applied in
any situation where multiple modalities have been collected across a single shared dimension
(subjects, trials, timepoints, etc.). Each of the scans is prepared for analysis using whatever
methods are recommended for a linear regression analysis (or a single-modality ICA) of the
group data. This produces maps for each modality, which can have different spatial masks and
different numbers of voxels. In this model, a “modality” is defined as a single contrast
image (per subject) representing a particular output extracted from the data. Typically, different
modalities will have different units, different scalings and different noise levels. In some cases,
a single analysis may result in several different contrast images; for example, a diffusion tensor
imaging (DTI) analysis can produce maps of FA (fractional anisotropy), MD (mean diffusivity)
and MO (tensor mode). These are treated as separate modalities because they contain distinct,
complementary biophysical information.
However, to maintain the benefits of tensor ICA (inferring the same spatial patterns across
modalities) as much as possible, similar modalities can be collected into K “modality groups”.
Modalities in the same modality group must be observations of the same points in space; this
means the modalities must be spatially aligned to each other and have the same spatial mask, and
should also have similar spatial properties (for example, the same amount of smoothing). A good
example is a set of diffusion-derived measures projected onto a white matter skeleton
using TBSS. The data can then be packed into a set of 3D arrays Y(k) ∈ ℝ^(Nk × Tk × R), where
Nk is the number of voxels in the shared spatial map and Tk ≥ 1 is the number of modalities
in the kth modality group. Each modality group is modelled using a Bayesian tensor ICA
model. This general configuration is shown in figure 1. Note that the Bayesian ICA differs from
standard methods like FastICA (Hyvärinen and Oja, 2000) in that it incorporates dimensionality-reduction
into the ICA method itself by the use of automatic relevance determination (ARD)
priors on components (Choudrey and Roberts, 2001; Bishop, 1999). The model works on the
full-dimensionality data directly and has an additive noise model. The Bayesian ICA also models
an explicitly parametrized non-Gaussian source model (in this case a Gaussian mixture model)
instead of maximizing negentropy (as used in FastICA).
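The pruning effect of the ARD priors can be sketched schematically. The update below is a simplified point estimate in the style of classic ARD (alpha_i = R / E[||H_i||²]), not the full variational update used in the paper.

```python
import numpy as np

# Schematic ARD relevance determination: each component i has a precision
# hyperparameter alpha_i on its subject-course H_i. A component whose
# subject-course collapses toward zero receives an enormous precision,
# effectively switching it off and removing it from the model.
def ard_precisions(H, eps=1e-12):
    # H: (components L) x (subjects R); eps guards against division by zero
    return H.shape[1] / (np.sum(H**2, axis=1) + eps)

H = np.vstack([np.ones(20),            # component carrying real signal
               1e-6 * np.ones(20)])    # component explaining almost nothing
alpha = ard_precisions(H)              # alpha[1] >> alpha[0]
```

This is how the model chooses its own effective dimensionality: superfluous components are driven to zero rather than being removed by a separate PCA step.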
2.2 Bayesian Tensor ICA Model
Within each modality-group k the data is modelled as a sum of components using a tensor
decomposition. Each component i = 1...L can be expressed as the tensor product of one
spatial map, one subject-course, and one modality-course. These model the data in modality
group k = 1:K, modality t = 1:Tk, subject r = 1:R and voxel n = 1:Nk as

    Y(k)_{n,t,r} = Σ_{i=1}^{L} X(k)_{n,i} W(k)_{t,i} H_{i,r} + E(k)_{n,t,r},    (1)

where X(k)_{n,i} are the spatial maps for component i in modality group k, W(k)_{t,i} are the modality
weightings for component i in modality t (of modality group k), and H_{i,r} are the weights for
component i in subject r. For simplicity this model is used even when Tk = 1, so that W(k) is
just a scalar. Crucially, the same H matrix is shared between all of the modality-groups; this
forms a link between the different modality groups, which are otherwise modelled completely
separately. The ith component has the same subject weightings across modality groups but each
group has its own spatial map. Thus the number of repeats R and the maximum number of
components L must be the same everywhere, because these dimensions are shared, while Nk and
Tk are not. Uncorrelated Gaussian residuals E(k)_{n,t,r} are assumed, with a modality-dependent noise
precision (inverse variance) λ(k).
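As a concrete illustration, the generative model of equation (1) can be simulated directly. The dimensions below are arbitrary illustrative choices, and this is forward simulation only, not the variational inference used to fit the model.

```python
import numpy as np

# Toy forward simulation of one modality group under equation (1).
rng = np.random.default_rng(7)
L, R = 3, 10                       # components, subjects (shared dimensions)
Nk, Tk = 100, 2                    # voxels and modalities in group k
H = rng.normal(size=(L, R))        # shared subject-courses
Xk = rng.laplace(size=(Nk, L))     # sparse (non-Gaussian) spatial maps
Wk = rng.normal(size=(Tk, L))      # modality weightings
# Y(k)_{n,t,r} = sum_i X(k)_{n,i} W(k)_{t,i} H_{i,r} + E(k)_{n,t,r}
Yk = np.einsum('ni,ti,ir->ntr', Xk, Wk, H)
Yk = Yk + rng.normal(scale=0.1, size=Yk.shape)   # Gaussian residuals E(k)
```

A second modality group would reuse the same H but draw its own Xk, Wk, and noise, which is exactly the link the model exploits.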
Note that this assumes the same noise variance for each voxel, while in the original data there
may actually be large (orders-of-magnitude) differences in the white noise intensity. To correct
for this, we rely on a robust preprocessing method called variance normalization which is widely
used for ICA on functional MRI (Beckmann and Smith, 2004).
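A minimal stand-in for this step is sketched below, assuming the raw standard deviation across subjects approximates the residual noise level; the actual preprocessing estimates the noise after removing structured signal.

```python
import numpy as np

# Simplified variance normalization: divide each voxel's subject-series by
# its standard deviation, so voxels with very different white-noise levels
# contribute comparably. A real implementation would use the residual
# noise std after removing structured signal (Beckmann & Smith, 2004).
def variance_normalize(Y, eps=1e-12):
    sd = Y.std(axis=1, keepdims=True)   # Y: voxels x subjects
    return Y / (sd + eps)

rng = np.random.default_rng(1)
Y = rng.normal(scale=np.arange(1.0, 6.0)[:, None], size=(5, 1000))
Yn = variance_normalize(Y)   # every voxel now has unit standard deviation
```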
A sketch of the Linked ICA matrices is shown in figure 1, and figure 2 shows how these
variables fit into the full Linked ICA graphical model; this also includes the hyperparameters
explained in the next two sections. Aside from the shared matrix H, the Linked ICA model is identical
to performing separate tensor ICA analyses: each modality group k has its own separate source
mixture model, as well as having its own noise model and separate ARD priors to drive different
patterns of sparsity. Note that r indexes the “repeats” dimension and is the dimension that is
shared across modality groups, for example r indexes subjects in the multi-subject application.
2.2.1 Adaptive Modality-weighting
The tensor model (equation 1) implies that the same spatial sources X(k)_{·,i} are used for all of the
different maps t ∈ 1..Tk, with weightings given by W(k)_{t,i}. In previous tensor ICA applications
(Beckmann and Smith, 2005), this t dimension indexes over repeats of the same scan, such as in
multi-subject FMRI data from a study with identical stimulus timings. In that case it makes
sense to assume the same noise level for all timepoints, i.e. to use only a scalar λ(k). Instead, the