Model-based clustering of meta-analytic functional imaging data.
ABSTRACT We present a method for the analysis of meta-analytic functional imaging data. It is based on Activation Likelihood Estimation (ALE) and subsequent model-based clustering using Gaussian mixture models, expectation-maximization (EM) for model fitting, and the Bayesian Information Criterion (BIC) for model selection. Our method facilitates the clustering of activation maxima from previously performed imaging experiments in a hierarchical fashion. Regions with a high concentration of activation coordinates are first identified using ALE. Activation coordinates within these regions are then subjected to model-based clustering for a more detailed cluster analysis. We demonstrate the usefulness of the method in a meta-analysis of 26 fMRI studies investigating the well-known Stroop paradigm.
- Franco Cauda, Sara Palermo, Tommaso Costa, Riccardo Torta, Sergio Duca, Ugo Vercelli, Giuliano Geminiani, Diana M E Torta[Show abstract] [Hide abstract]
ABSTRACT: Several studies have attempted to characterize morphological brain changes due to chronic pain. Although it has repeatedly been suggested that longstanding pain induces gray matter modifications, there is still some controversy surrounding the direction of the change (increase or decrease in gray matter) and the role of psychological and psychiatric comorbidities. In this study, we propose a novel, network-oriented, meta-analytic approach to characterize morphological changes in chronic pain. We used network decomposition to investigate whether different kinds of chronic pain are associated with a common or specific set of altered networks. Representational similarity techniques, network decomposition and model-based clustering were employed: i) to verify the presence of a core set of brain areas commonly modified by chronic pain; ii) to investigate the involvement of these areas in a large-scale network perspective; iii) to study the relationship between altered networks and; iv) to find out whether chronic pain targets clusters of areas. Our results showed that chronic pain causes both core and pathology-specific gray matter alterations in large-scale networks. Common alterations were observed in the prefrontal regions, in the anterior insula, cingulate cortex, basal ganglia, thalamus, periaqueductal gray, post-and pre-central gyri and inferior parietal lobule. We observed that the salience and attentional networks were targeted in a very similar way by different chronic pain pathologies. Conversely, alterations in the sensorimotor and attention circuits were differentially targeted by chronic pain pathologies. Moreover, model-based clustering revealed that chronic pain, in line with some neurodegenerative diseases, selectively targets some large-scale brain networks. Altogether these findings indicate that chronic pain can be better conceived and studied in a network perspective. c 2014 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http: // creativecommons.org / licenses / by-nc-nd / 3.0 /).Frontiers in Human Neuroscience 01/2014; 4:676-686. · 2.91 Impact Factor - [Show abstract] [Hide abstract]
ABSTRACT: Age-related increases in right frontal cortex activation are a common finding in the neuroimaging literature. However, neurocognitive factors contributing to right frontal over-recruitment remain poorly understood. Here we investigated the influence of age-related reaction time (RT) slowing and white matter (WM) microstructure reductions as potential explanatory factors for age-related increases in right frontal activation during task switching. Groups of younger (N = 32) and older (N = 33) participants completed a task switching paradigm while functional magnetic resonance imaging (fMRI) was performed, and rested while diffusion tensor imaging (DTI) was performed. Two right frontal regions of interest (ROIs), the dorsolateral prefrontal cortex (DLPFC) and insula, were selected for further analyses from a common network of regions recruited by both age groups during task switching. Results demonstrated age-related activation increases in both ROIs. In addition, the older adult group showed longer RT and decreased fractional anisotropy in regions of the corpus callosum with direct connections to the fMRI ROIs. Subsequent mediation analyses indicated that age-related increases in right insula activation were mediated by RT slowing and age-related increases in right DLPFC activation were mediated by WM microstructure. Our results suggest that age-related RT slowing and WM microstructure declines contribute to age-related increases in right frontal activation during cognitive task performance.Neuroscience 08/2014; · 3.12 Impact Factor - SourceAvailable from: Angela R Laird[Show abstract] [Hide abstract]
ABSTRACT: Spatial normalization-applying standardized coordinates as anatomical addresses within a reference space-was introduced to human neuroimaging research nearly 30 years ago. Over these three decades, an impressive series of methodological advances have adopted, extended, and popularized this standard. Collectively, this work has generated a methodologically coherent literature of unprecedented rigor, size, and scope. Large-scale online databases have compiled these observations and their associated meta-data, stimulating the development of meta-analytic methods to exploit this expanding corpus. Coordinate-based meta-analytic methods have emerged and evolved in rigor and utility. Early methods computed cross-study consensus, in a manner roughly comparable to traditional (nonimaging) meta-analysis. Recent advances now compute coactivation-based connectivity, connectivity-based functional parcellation, and complex network models powered from data sets representing tens of thousands of subjects. Meta-analyses of human neuroimaging data in large-scale databases now stand at the forefront of computational neurobiology.Annual Review of Neuroscience 07/2014; 37:409-434. · 20.61 Impact Factor
Page 1
Model-Based Clustering of Meta-Analytic Functional Imaging Data
Jane Neumann*, D. Yves von Cramon, and Gabriele Lohmann
Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103
Leipzig/Germany
Abstract
We present a method for the analysis of meta-analytic functional imaging data. It is based on
Activation Likelihood Estimation (ALE) and subsequent model-based clustering using Gaussian
mixture models, expectation-maximization (EM) for model fitting, and the Bayesian Information
Criterion (BIC) for model selection. Our method facilitates the clustering of activation maxima from
previously performed imaging experiments in a hierarchical fashion. Regions with a high
concentration of activation coordinates are first identified using ALE. Activation coordinates within
these regions are then subjected to model-based clustering for a more detailed cluster analysis. We
demonstrate the usefulness of the method in a meta-analysis of 26 fMRI studies investigating the
well-known Stroop paradigm.
Keywords
fMRI; clustering; ALE; meta-analysis
INTRODUCTION
Functional neuroimaging has become a powerful tool in cognitive neuroscience, which enables
us to investigate the relationship between particular cortical activations and cognitive tasks
performed by a test subject or patient. However, the rapidly growing number of imaging studies
still provides a quite variable picture, in particular of higher-order brain functioning.
Considerable variation can be observed in the results of imaging experiments addressing even
closely related experimental paradigms. The analysis of the consistency and convergence of
results across experiments is therefore a crucial prerequisite for correct generalizations about
human brain functions. This calls for analysis techniques on a meta-level, i.e. methods that
facilitate the post-hoc combination of results from independently performed imaging studies.
Moreover, functional neuroimaging is currently advancing from the simple detection and
localization of cortical activation to the investigation of complex cognitive processes and
associated functional relationships between cortical areas. Such research questions can no
longer be addressed by the isolated analysis of single experiments alone, but necessitate the
consolidation of results across different cognitive tasks and experimental paradigms. This again
makes meta-analyses an increasingly important part in the evaluation of functional imaging
results. Several methodological approaches to the automated meta-analysis of functional
imaging data have recently been proposed, for example, by Turkeltaub et al. (2002); Chein et
al. (2002); Nielsen and Hansen (2004); Nielsen (2005); Neumann et al. (2005); Lancaster et
al. (2005) and Laird et al. (2005a).
© 2007 Wiley-Liss, Inc.
*Correspondence to: Dr. Jane Neumann, Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103
Leipzig, Germany. neumann@cbs.mpg.de.
NIH Public Access
Author Manuscript
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
Published in final edited form as:
Hum Brain Mapp. 2008 February ; 29(2): 177–192. doi:10.1002/hbm.20380.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 2
In coordinate-based meta-analyses activation coordinates reported from independently
performed imaging experiments are analyzed in search of functional cortical areas that are
relevant for the investigated cognitive function. In this article we propose to apply a
combination of Activation Likelihood Estimation (ALE) and model-based clustering to this
problem. The former is a form of kernel density estimation, which was recently adapted for
the automated meta-analysis of functional imaging data (Chein et al., 2002; Turkeltaub et al.,
2002). The latter provides a general framework for finding groups in data by formulating the
clustering problem in terms of the estimation of parameters in a finite mixture of probability
distributions (Everitt et al., 2001; Fraley and Raftery, 2002). In the context of functional
imaging, mixture modeling has been used previously for the detection of brain activation in
single-subject functional Magnetic Resonance Imaging (fMRI) data. For example, Everitt and
Bullmore (1999) modeled a test statistic estimated at each voxel as mixture of central and non-
central χ2 distributions. This approach was extended by Hartvig and Jensen (2000) to account
for the spatial coherency of activated regions. Penny and Friston (2003) used mixtures of
General Linear Models in a spatio-temporal analysis in order to find clusters of voxels showing
task-related activity.
The combination of model-based clustering and ALE presented in this article should be viewed
as an extension rather than a replacement of ALE, which is currently the state-of-the-art
approach to the meta-analysis of functional imaging data. ALE is based on representing
activation maxima from individual experiments by three-dimensional Gaussian probability
distributions from which activation likelihood estimates for all voxels can be inferred. These
estimates are then compared to a null-distribution derived from permutations of randomly
placed activation maxima. Successful application of ALE has been demonstrated by Chein et
al. (2002); Turkeltaub et al. (2002); Wager et al. (2004), and by several authors contributing
to Fox et al. (2005). However, one drawback of the method in its current form is its strong
dependency on the standard deviation of the Gaussian. Choosing the standard deviation too
small results in many small activation foci which cover only a small part of the original input
data and do not carry significantly more information than provided by the individual activation
maxima alone. In contrast, using a large standard deviation results in activation foci, which
represent more of the original activation maxima. However, as will be seen in our experimental
data, the size of such foci can by far exceed the extent of corresponding activations typically
found in single fMRI studies. Such ALE foci might thus comprise more than one functional
unit. This can be observed, in particular, in studies with a very inhomogeneous distribution of
activation coordinates. In this case a certain adaptiveness of the method or a hierarchical
approach would be desirable.
We propose to alleviate this problem by first applying ALE to the original data and then
subjecting activation maxima lying within the resulting activation foci to further clustering.
Using a large standard deviation of the Gaussian in the first step yields a new set of activation
maxima from which coordinates with no other activation maxima in their vicinity are removed.
The subsequent model-based clustering then explores the statistical distribution of the
remaining coordinates.
Model-based clustering assumes that the observed data are generated by a finite mixture of
underlying probability distributions. Each probability distribution corresponds to a cluster. Our
particular implementation closely follows the general model-based clustering approach
proposed by Fraley and Raftery (2002). This approach considers mixtures of multivariate
Gaussians. Maximum likelihood estimation of the mixture models is performed via the
expectation-maximization (EM) algorithm (Hartley, 1958; Dempster et al., 1977), which
determines the parameters of the mixture components as well as the posterior probability for
a data point to belong to a specific component or cluster. Since a suitable initialization is critical
Neumann et al. Page 2
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 3
in the successful application of EM, hierarchical agglomerative clustering is performed as an
initializing step.
Varying the parameterization of the covariance matrix of a Gaussian mixture provides a set of
models with different geometric characteristics, reaching from spherical components of equal
shape and volume to ellipsoidal components with variable shape, volume, and orientation
(Banfield and Raftery, 1993). We use a set of 10 different parameterizations. The best
parameterization of the model and the optimal number of clusters are determined using the
Bayesian Information Criterion (BIC) (Schwarz, 1978).
In the following, we provide the methodological background of ALE, Gaussian mixture
models, and BIC for model selection. We then present experimental data showing the
application of the method in a meta-analysis of 26 fMRI experiments investigating the well-
known Stroop paradigm.
METHODS
ALE
ALE, concurrently but independently developed by Turkeltaub et al. (2002) and Chein et al.
(2002), was among the first methods aimed at modeling cortical areas of activation from meta-
analytic imaging data. It was recently extended by Laird et al. (2005a) to account for multiple
comparisons and to enable statistical comparisons between two or more meta-analyses.
Moreover, it has been used in combination with replicator dynamics for the analysis of
functional networks in meta-analytic functional imaging data (Neumann et al., 2005). For the
presented meta-analysis, ALE was implemented as part of the software package LIPSIA
(Lohmann et al., 2001).
In ALE, activation maxima are modeled by three-dimensional Gaussian probability
distributions centered at their Talairach coordinates. Specifically, the probability that a given
activation maximum lies within a particular voxel is
(1)
where σ is the standard deviation of the distribution and d is the Euclidean distance of the voxel
to the activation maximum. For each voxel, the union of these probabilities calculated for all
activation maxima yields the ALE. In regions with a relatively high density of reported
activation maxima, voxels will be assigned a high ALE in contrast to regions where few and
widely spaced activation maxima have been reported.
From the resulting ALE maps, one can infer whether activation maxima reported from different
experiments are likely to represent the same functional activation. A non-parametric
permutation test is utilized to test against the null-hypothesis that the activation maxima are
spread uniformly throughout the brain. Given some desired level of significance α, ALE maps
are thresholded at the 100(1–α)th percentile of the null-distribution. Topologically connected
voxels with significant ALE values are then considered activated functional regions.
The extent and separability of the resulting regions critically depends on the choice of σ in Eq.
(1). As observed, for example, by Derrfuss et al. (2005), decreasing σ leads to smaller regions
of significant voxels and to an increase in the number of discrete above threshold regions which,
however, represent only few of the original activation maxima. Increasing σ has the opposite
effect with larger regions representing more of the original data. Most commonly σ is chosen
Neumann et al.Page 3
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 4
to correspond to the size of spatial filters typically applied to fMRI data. In previously published
ALE analyses (see Fox et al. (2005) for some examples) we found σ to vary between 9.4 and
10 mm FWHM, in rare cases 15 mm were used. In the vast majority of analyses, the standard
deviation of the Gaussian was set to 10 mm FWHM. As we view ALE as a preprocessing step
to model-based clustering, the activation likelihood should not be estimated too conservatively.
Therefore, we use a relatively large standard deviation of σ = 5 mm, corresponding to 11.8 mm
FWHM.
Model-Based Clustering
ALE leads to a reduced list of activation maxima containing only those maxima which have
one or more other maxima in their vicinity. These coordinates are then subjected to clustering
based on a finite mixture of probability distributions. Here, we will closely follow the procedure
suggested by Fraley and Raftery (1998, 2002), who propose a group of Gaussian mixture
models, maximum likelihood estimation via EM, hierarchical agglomeration as initial
clustering, and model and parameter selection via BIC. In the following, the individual parts
of the clustering procedure are described in detail. These parts were implemented for our
application using the software package MCLUST (Fraley and Raftery, 1999, 2003).
Gaussian Mixture Models
For n independent multivariate observations x = (x1, …, xn), the likelihood of a mixture model
with M components or clusters can be written as
(2)
where fk is the density of the cluster k with parameter vector θk, and p = (p1,…,pM) is the vector
of mixing proportions with pk ≥ 0 and ∑k pk = 1. Since any distribution can be effectively
approximated by a mixture of Gaussians (Silverman, 1985; Scott, 1992), the probability density
function is most commonly represented by
(3)
for d-dimensional data with mean μk and covariance matrix ∑k. Geometrical features of the
components can be varied by parameterization of the covariance matrices ∑k. Banfield and
Raftery (1993) suggest various parameterizations through the eigenvalue decomposition
(4)
Dk is the matrix of eigenvectors, Ak is a diagonal matrix with elements that are proportional to
the eigenvalues of ∑k such that |Ak| = 1, and λk is a scalar. Treating Dk, λk, and Ak as independent
parameters and keeping them either constant or variable across clusters varies the shape,
volume, and orientation of the components. In the simplest case ∑k = λI, all clusters are
spherical and of equal size. The least constraint case given in Eq. (4) accounts for ellipsoidal
clusters of variable shape, volume, and orientation. All parameterizations available in
MCLUST and applied to our experimental data are presented in Table I. The first two models
have spherical, all other models have ellipsoidal components, whereby components in models
with diagonal covariance matrices (c–f) are oriented along the coordinate axes. Models with
Neumann et al.Page 4
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 5
identical matrix A for all components have equally shaped components, whereas models with
identical λ for all components have components of the same volume.
Maximum Likelihood Estimation
Maximum likelihood estimation of a Gaussian mixture model as defined in Eqs. (2) and (3)
can be performed via the widely used EM algorithm, which provides a general approach to
parameter estimation in incomplete data problems (Dempster et al., 1977;Hartley, 1958;Neal
and Hinton, 1998). In general, given a likelihood function L(θ|y) = Πi f (yi|θ), for parameters
θ and data y = (y1…,yn), we wish to find θ̂ such that
In the presence of some hidden data z such that y = (x,z) with x observed and z unobserved, we
can equivalently maximize the so-called complete-data log likelihood and find θ̂ such that
Starting from an initial guess, the EM algorithm proceeds by alternately estimating the
unobservable data z and the unknown parameters θ. Specifically, in the E-step, the algorithm
calculates the expected value of the complete-data log likelihood with respect to z given x and
the current estimate of θ. In the M-step, this expected value is maximized in terms of θ, keeping
z fixed as computed in the previous E-step.
In our application, the complete data y = (y1…,yn), consists of yi = (xi,zi) where each xi is a
three-dimensional vector containing coordinates of activation maxima in Talairach space and
zi = (zi1,…,ziM) is the unknown membership of xi in one of the M clusters, i.e.
With the density of observation xi given zi written as Πk fk(xi|μk,∑k)zik, the complete-data log
likelihood in our problem can be formulated as
(5)
assuming that each zi is independently and identically distributed according to a multinomial
distribution of one draw from M categories with probabilities p1,…pM (Fraley and Raftery,
1998).
Maximum likelihood estimation is performed by alternating between the calculation of zik
given xi, μk, and ∑k (E-step) and maximizing Eq. (5) with respect to μk, ∑k, and pk with zik
fixed (M-step). Mathematical details of the algorithm are given in Appendix A. The EM
algorithm terminates after the difference between successive values of ℓ falls below some
threshold ε, which in our application was set to ε = 0.00001. The value of zik at the maximum
of Eq. (5) is the estimated probability that xi belongs to cluster k, and the maximum likelihood
classification of xi is the cluster k, with
Neumann et al.Page 5
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 6
Initialization by Hierarchical Agglomeration
Following the suggestion by Fraley and Raftery (1998), we employ model-based hierarchical
agglomeration provided in MCLUST as initializing partitioning method. This method tends to
yield reasonable clusterings in the absence of any information about a possible clustering
inherent in the data (Fraley and Raftery, 2002).
Hierarchical agglomeration techniques typically start with a pre-defined number of clusters
and in each step merge the two closest clusters into a new cluster, thereby reducing the number
of clusters by one. The implementation used here starts with n clusters, each containing a single
observation xi. Then, two clusters are chosen such that merging them increases the so-called
classification likelihood, given as
(6)
with fk(xi) given in Eq. (3). The vector c = (c1,…,cn) encodes the classification of the data, i.e.
ci = k, if xi is classified as member of cluster k. For an unrestricted covariance matrix as defined
in Eq. (4), approximately maximizing the classification likelihood (6) amounts to minimizing
where nk is the number of elements in cluster k and Wk is the within-cluster scattering matrix
of cluster k as defined in Eq. (8) in Appendix A (Banfield and Raftery, 1993). Computational
issues on this clustering procedure are discussed in detail by Banfield and Raftery (1993) and
Fraley (1998), in particular regarding the initial stages with a single data point in each cluster,
which leads to |W| = 0.
From the values of c at the maximum of C, initializations for the unknown membership values
zik are derived, and first estimates for the parameters of the Gaussian components can be
obtained from an M-step of the EM algorithm as described in Appendix A.
Model Selection via BIC
A problem of most clustering techniques is to determine the number of clusters inherent in the
data. One common technique in model-based clustering is to apply several models with
different pre-defined numbers of components and subsequently choose the best model
according to some model selection criterion. For models with equal number of parameters, the
simplest approach is to compare estimated residual variances. This is not applicable, however,
when models with varying number of parameters are considered.
An advantage of using mixture models for clustering is that approximate Bayes factors can be
used for model selection. Bayes factors were developed originally as a Bayesian approach to
hypothesis testing by Jeffreys (1935, 1961). In the context of model comparison, a Bayes factor
describes the posterior odds for one model against another given equal prior probabilities. It
is determined from the ratio of the integrated likelihoods of the models. In conjunction with
EM for maximum likelihood estimation, the integrated likelihood of a model can be
Neumann et al.Page 6
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 7
approximated under certain regularity conditions by the BIC (Schwarz, 1978), which is defined
as
(7)
where ℓ̂ is the maximized mixture log likelihood of the model, m is the number of independent
parameters of the model, and n the number of data points. With this definition, a large BIC
value provides strong evidence for a model and the associated number of clusters.
The relationship between Bayes factors and BIC, the regularity conditions, and the use of Bayes
factors for model comparison are discussed in more detail, e.g., by Kass and Raftery (1995).
They also provide guidelines for the strength of evidence for or against some model: A
difference of less than 2 between the BIC of two models corresponds to weak, a difference
between 2 and 6 to positive, between 6 and 10 to strong, and a difference greater than 10 to
very strong evidence for the model with the higher BIC value.
Putting Things Together
Taking together the individual parts described above, our algorithm for deriving activated
functional regions from meta-analytic imaging data can be summarized as follows:
1.
Given a list of coordinates encoding activation maxima in Talairach space from a
number of individual studies, calculate ALEs for all voxels using a large standard
deviation of the Gaussian. Determine those coordinates that fall within the regions
above the ALE threshold.
2.
Determine a maximum number of clusters M. Perform hierarchical agglomeration for
up to M clusters using the reduced coordinate list obtained in Step 1 as input, thereby
approximately maximizing the classification likelihood as defined in Eq. (6).
3.
For each parameterization and number of clusters of the model as defined in Eq. (5)
perform EM, using the classification obtained in Step 2 as initialization.
4.
Calculate the BIC for each parameterization and number of clusters in the model
according to Eq. (7)
5.
Choose the parameterization and number of clusters with a decisive maximum BIC
value as solution according to the guidelines above.
Experimental Data
Our method was applied in a meta-analysis of 26 fMRI experiments employing the well-known
Stroop paradigm (Stroop, 1935). A list of included studies is given in Appendix B. The Stroop
paradigm is designed to investigate interference effects in the processing of a stimulus while
a competing stimulus has to be suppressed. For example, subjects are asked to name a color
word, say “red,” which is presented on a screen in the color it stands for (congruent condition)
or in a different color (incongruent condition). Other variants of the Stroop paradigm include
the spatial word Stroop task (the word “above” is written below a horizontal line), the counting
Stroop task (the word “two” appears three times on the screen) and the object-color Stroop task
(an object is presented in an atypical color, e.g. a blue lemon).
This particular paradigm was chosen as a test case for our method, because the interference
effect and the associated cortical activations are known to be produced very reliably.
Activations are most commonly reported in the left inferior frontal region, the left inferior
parietal region, and the left and right anterior cingulate (Banich et al., 2000; Liu et al., 2004;
McKeown et al., 1998). Our own previous meta-analysis based on ALE and subsequent
Neumann et al. Page 7
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 8
application of replicator dynamics (Neumann et al., 2005) revealed a frontal network including
the presupplementory motor area (preSMA), the inferior frontal sulcus (IFS) extending onto
the middle frontal gyrus, the anterior cingulate cortex (ACC) of both hemispheres, and the
inferior frontal junction area (IFJ). Other frequently reported areas include frontopolar cortex,
occipital cortex, fusiform gyrus, and insula (Laird et al., 2005b; Zysset et al., 2001).
Despite the high agreement in the reported activated areas, the actual location of associated
coordinates in Talairach space differs widely between studies. For example, the left IFJ was
localized in previous studies at Talairach coordinates x between −47 and −35, y between −4
and 10, and z between 27 and 40 (Brass et al., 2005; Derrfuss et al., 2004, 2005; Neumann et
al., 2005). Such high variability makes the classification of the data into distinct functional
units difficult.
We applied our analysis to data extracted from the BrainMap database (Fox and Lancaster,
2002). This database provides Talairach coordinates of activation maxima from functional
neuroimaging experiments covering a variety of experimental paradigms and imaging
modalities. At the time of writing the database contained over 27,500 activation coordinates
reported in 790 papers.
Searching the database for fMRI experiments investigating the Stroop interference task resulted
in 26 peer-reviewed journal publications. Within these studies, 728 Talairach coordinates for
activation maxima were found. The majority of these coordinates (550 out of 728) represented
the Stroop interference effect, i.e. significant activation found for the contrasts incongruent ≥
congruent, incongruent ≥ control, or incongruent + congruent ≥ control. As control condition,
either the presentation of a neutral object (e.g. “XXXX” instead of a color word) or a simple
visual fixation were used. Fifty-five coordinates were marked as deactivation in the database,
i.e. they represent the contrast congruent ≥ incongruent. The remaining coordinates were
reported to represent other contrasts such as the contrast between different Stroop modalities
or a conjunction of Stroop interference, spatial interference, and the Flanker task. Note that 26
coordinates came from a meta-analysis on Stroop interference, nine coordinates represented
the interference effect in pathological gamblers, and all remaining coordinates were taken from
group studies with healthy subjects.
As the focus of our work is on the development of meta-analysis tools rather than the
investigation of the Stroop paradigm, all 728 coordinates were subjected to the subsequent
analysis without any further selection. This not only enabled us to test our method on a
reasonably large data set, it also introduced some “realistic” noise into our data.
Plots of all coordinates projected onto a single axial, sagittal, and coronal slice are shown in
the top row of Figure 1. Coordinates reported from different studies are represented by different
colors. As can be seen, activation maxima are distributed over large parts of the cortex, although
some areas with a higher density of activation coordinates are already apparent, in particular
in the left lateral prefrontal cortex and the medial frontal cortex. These can be seen more clearly
in the example slices in the bottom row of Figure 1.
Experimental Results
Activation coordinates were first subjected to an ALE analysis with standard deviations of σ
= 5 mm, corresponding to 11.8 mm FWHM. The null distribution was derived from 1,000
iterations of randomly placing 728 activation coordinates over a mask brain volume defined
by the minimum and maximum Talairach coordinates in the original data set. The brain mask
spanned a volume of 61,408 voxels, each 3 × 3 × 3 mm3 in size. As suggested by Turkeltaub
et al. (2002), the resulting ALE map was thresholded at an α-level of α = 0.01%. This
Neumann et al. Page 8
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 9
corresponded to an ALE threshold of 0.0156. Figure 2 shows sagittal and axial example slices
of the ALE map containing only voxels above threshold.
The ALE analysis yielded 13 regions of topologically connected voxels above threshold, which
covered a total volume of 54,810 mm3 and contained 210 of the original activation maxima.
Table II shows size, maximum ALE value, location of the center in Talairach space, and the
number of original activation coordinates covered by the detected ALE regions.
Note that the four largest regions cover 89.65% (49,140 mm3) of the total ALE regions’ volume.
They contain 83.8% of all above-threshold coordinates. This can be explained by the very
inhomogeneous distribution of the original input coordinates: More than 40% of the original
activation maxima fell within regions spanned by the minimum and maximum Talairach
coordinates of the four largest ALE regions. The remaining coordinates were distributed more
evenly over other parts of the cortex.
Note further that some smaller regions surviving the ALE threshold contain only single
activation maxima. This seems counterintuitive at first, as a single coordinate should not result
in a relatively high ALE value. However, imagine, for example, a situation where three
coordinates are arranged in a “row,” i.e. at three voxels in the same row of a slice with one
voxel between them. The voxel in the middle will get a higher empirical ALE value than the
ones at both ends, as it has two other coordinates in close distance (only two voxels away)
whereas the other two voxels have one coordinate in close distance and another one four voxels
further away. Depending on the distribution of other coordinates, thresholding the ALE values
could now shape the surviving ALE region such that only the coordinate in the middle will be
inside the region, whereas the value at the other two voxels might just be too small to survive
the thresholding. Thus, ALE regions containing only a single coordinate are caused by very
small groups of activation maxima that are quite isolated from the remaining ones. The fact
that some of our ALE regions contain only a single coordinate indicates that all remaining
activation coordinates, not surviving the thresholding, are very isolated from each other. They
can therefore be regarded as noise.
Despite the use of a very small α-level in ALE thresholding, some of the determined ALE foci
clearly exceed the size of cortical activations typically found in these regions for the Stroop
paradigm (see, e.g. Zysset et al. (2001) for a comparison). Moreover, as seen in Figure 2, within
such foci, in particular in the left prefrontal cortex, sub-maxima of ALE values are visible,
indicating a possible sub-clustering of the represented activation coordinates. All above-
threshold activation coordinates were therefore subjected to model-based clustering as the
second part of our method.
Hierarchical agglomeration of the above-threshold coordinates was first performed for up to
30 clusters. Using the results as initialization for the EM algorithm, models as defined in Eq.
(5) with the parameterizations introduced in Section Model-Based Clustering with up to 30
clusters were then applied to the data set, and BIC values were calculated for each number of
clusters and parameterization.
The three models with λk = λ, i.e. models with components of equal volume, outperformed the
remaining models, which all allowed for components of variable volume. This seems
counterintuitive at first, as a more variable model would be expected to fit the data better than
a more restricted one. However, as described above, the BIC value penalizes model complexity,
which is larger for models with variable components than for models with equal components.
Thus, for our data, allowing the components’ volume to vary did not increase the log likelihood
of the models sufficiently in order to justify the increased number of model parameters. Note
also that for very large cluster numbers, some more variable models failed to provide a
Neumann et al.Page 9
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 10
clustering due to the singularity of the associated covariance matrices. This was not the case
for models with fewer free parameters, however.
Figure 3 shows plots of the BIC values of the best three models for up to 30 clusters. BIC
values of these models are very similar, in particular for models with more than 20 clusters.
The right side shows an enlarged plot of the BIC values for models with 20 up to 25 clusters.
All three models yielded the highest BIC value when applied with 24 clusters. The more
complex models with ellipsoidal components slightly outperformed the spherical one, whereby
the difference between a variable and a fixed orientation of the components was negligible.
Figure 4 shows the results of the model-based clustering exemplified for the two largest ALE
regions, which were situated in the left lateral prefrontal cortex (left LPFC) and the medial
frontal cortex (MFC), respectively (cf. Table II). The categorization of activation coordinates
within the left LPFC is shown in five consecutive sagittal functional slices at Talairach
coordinates between x = −34 and x = −46. The coordinates in this ALE region were subdivided
into five groups in anterior-posterior and superior-inferior direction. In the most posterior and
superior part of the region a further division in lateral-medial direction can be observed (shown
in green and blue). Interestingly, cluster centers of the more anterior and inferior clusters
corresponded closely to the sub-maxima in the ALE focus visible in Figure 2. However, the
division of posterior and superior parts of the region into two clusters could not have been
predicted from the ALE sub-maxima. The same holds for the clustering of coordinates in the
MFC, where no sub-maxima could be observed in the ALE map. The categorizations of
coordinates in the MFC is shown in the right panel of Figure 4 in four consecutive sagittal
slices. The best model provided four clusters, again dividing the region in anterior-posterior
and superior-inferior direction. Thus, model-based clustering revealed some additional
structure in the data that would have remained undetected when using ALE alone. To get some
feeling for the actual shape of the clusters and their relative location, the extracted clusters are
presented again in views from different angles in Figure 5.
The robustness of our method against noisy input data was tested in a post-hoc analysis
including only the 550 activation coordinates that truly represented the Stroop interference
effect. The results did not significantly differ from the results of the original analysis. The noise
in the original input data thus did not have a noteworthy impact on the results of the model-
based clustering.
DISCUSSION
ALE facilitates the detection of cortical activation from activation maxima reported in
independently performed functional imaging studies. The resulting areas reflect the distribution
of activation maxima over the cortex. In particular, clusters of activation maxima in a region
reflect the likely involvement of this region in processing a cognitive task, whereas isolated
activation maxima are regarded as noise.
Our analysis shows that the extent of ALE regions can vary considerably due to the
heterogeneous distribution of the input data across different parts of the cortex. As seen in
Table II and Figure 2, the size of some ALE foci obtained in the first step of our analysis by
far exceeded the extent of comparable activations reported in single fMRI experiments. For
example, activation maxima reported by Zysset et al. (2001) for two separated activations in
the posterior (Tal: −38, 5, 30) and the anterior (Tal: −38, 35, 5) inferior frontal sulcus are both
located within the same ALE region in our analysis. This is caused by the high number of
activation coordinates within this region together with their high spatial variability. Moreover,
within the largest ALE focus located in the left LPFC, sub-maxima could be observed,
indicating a possible sub-clustering of the region.
Neumann et al. Page 10
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 11
One simple way to separate several areas within such a large ALE region would be the choice
of a higher ALE threshold. However, this is problematic if a whole brain analysis is performed,
since ALE values in other regions might be significantly lower despite a high concentration of
activation coordinates. For example, in Figure 2b a cluster of activation coordinates can clearly
be seen in the anterior part of the left intraparietal sulcus. However, the resulting ALE focus
representing no less than 25 activation coordinates has a maximum ALE value of only 0.027
in comparison to 0.05 in the left LPFC. Thus, by simply choosing a higher ALE threshold,
some clusters of activation coordinates might remain undetected.
We tried to alleviate this problem by following a hierarchical approach. In a first step, ALE is
used to identify regions with high concentration of activation coordinates. In a second step,
large ALE regions are further investigated in search for a possible sub-division.
Applying this two-step procedure to activation maxima from 26 Stroop experiments first
resulted in relatively large ALE regions, in particular in the frontal lobe (cf. Fig. 2). This is in
line with earlier findings on frontal lobe activity, in particular in a meta-analysis by Duncan
and Owen (2000) who reported cortical regions of large extent to be recruited by a variety of
cognitive tasks. However, in contrast to this study, our analysis pointed to a possible further
sub-clustering of these areas. The two largest ALE regions found in the left lateral prefrontal
cortex and the medial frontal wall were partitioned into five and four clusters, respectively.
While our exploratory analysis technique does not have the power to associate specific
cognitive functions to these clusters, this finding could serve as a hypothesis for a further
functional specialization of these regions.
The main directions of the clustering were in parallel to the coordinate axes, primarily in
anterior-posterior and superior-inferior direction. This corresponds well with recent results
from single-subject and group analyses obtained from a variety of analysis techniques as well
as from other meta-analyses, see e.g. Neumann et al. (2006); Forstmann et al. (2005); Koechlin
et al. (2003); Müller et al. (2003) for LPFC, and Forstmann et al. (2005) and Amodio and Frith
(2006) for MFC clustering.
It is important to be clear about the implicit assumptions made in the application of our analysis
technique. Meta-analyses are aimed at consolidating results from several studies in order to
find general mechanisms related to a particular task, class of paradigms, etc. Thus, if we want
to generalize the findings of any meta-analysis, we must assume that the data extracted from
the included studies are a representative sample of all the data collected for the investigated
phenomenon. Note, however, that this must be assumed in any empirical analysis relying on
sampled data. A second, closely related, assumption specific to clustering activation
coordinates is that the inherent distribution of activation for the investigated phenomenon is
completely represented by the investigated data.
In a meta-analysis, these assumptions are sometimes hard to meet because of the selective
publication of activation coordinates from particular cortical regions, a problem often referred
to as “publication or literature bias.” In the majority of experimental studies, only a specific
aspect of a paradigm or a particular cortical region are investigated and, consequently, some
significantly activated regions found for a stimulus might be neglected in the publication of
the results. This can result in overemphasizing some regions while neglecting others, which in
turn can lead to a nonrepresentative distribution of our input data. A careful and informed
selection of studies included in such an analysis and the inclusion of as much data as possible
is thus indispensable.
For our example analysis we used a very large data set, in order to reduce the effects of the
publication bias. Note, however, that our method also works for smaller analyses. For very
small numbers of activation maxima, the maximum number of clusters might have to be
Neumann et al. Page 11
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 12
reduced, to avoid singularity problems in the estimation of the covariance matrix. Moreover,
for very small or very homogeneously distributed data sets, the problem of very large ALE
regions might not arise in the first place. In this case, the results of the model-based clustering
should not differ significantly from the application of ALE alone.
The clustering technique presented here is purely data-driven. That is, the results are
exclusively derived from the spatial distribution of the input data and restricted only by the
constraints on the geometry of the mixture model components. Here, additional constraints
such as anatomical or cytoarchitectonic boundaries between cortical regions are conceivable.
How such constraints can be incorporated into the mathematical framework of mixture
modeling is a question that will be addressed in future work.
As noted earlier, in ALE the extent and number of above threshold clusters critically depend
on the choice of a suitable standard deviation of the Gaussian. Nielsen and Hansen (2002) offer
an interesting approach to this problem by optimizing the standard deviation of a Gaussian
kernel when modeling the relation between anatomical labels and corresponding focus
locations. Similar to ALE, activation maxima are modeled by three-dimensional Gaussian
probability distributions and the standard deviation is optimized by leave-one-out cross
validation (Nielsen and Hansen, 2002). In our hierarchical approach, the choice of σ is less
critical and the use of a large standard deviation is feasible, as ALE is used only as a pre-
processing step for model-based clustering. We can thus make use of as much information
present in the data as possible. Note that the use of an even larger standard deviation did not
have any effect on the choice of activation coordinates entering the second step of our analysis,
although some ALE regions were merged and slightly extended. The results of the model-based
clustering for a larger standard deviation would therefore be identical to the results presented
here for σ = 5 mm.
A second parameter, influencing the outcome of an ALE analysis, is the size of the mask volume
used for deriving the null-hypothesis. Clearly, the size of the volume has some influence on
the ALE threshold corresponding to the desired α-level. Therefore, the mask volume chosen
should match the volume spanned by the empirical activation maxima included in the analysis.
In our example, the activation coordinates obtained from the database were distributed over
the entire brain volume, including subcortical regions and even some white matter. We
therefore chose as a mask the entire volume of a brain, normalized to the standard size provided
by the software package LIPSIA (Lohmann et al., 2001). The distribution of the random
activation foci was then restricted to the area spanned by the minimum and maximum Talairach
coordinates of the 728 empirical maxima. Note, however, that the particular choice of the mask
volume is less critical than might appear at first sight. This is due to the large ratio between
the empirical maxima and the number of voxels in the mask (in our analysis 728 and 61,408
voxels, respectively). For example, reducing the mask volume by 1/2 in our example analysis
would change the ALE threshold only from 0.0156 to 0.018. The resulting thresholded ALE
map would still contain the vast majority of the activation maxima that exceed the threshold
when the full mask volume is used. This shows that slight variations in the mask volume do
not significantly change the outcome of the subsequent model-based clustering.
Note that in our example data, ALE values were not corrected for multiple comparison (Laird
et al., 2005a). Rather, as suggested in the original work by Turkeltaub et al. (2002), values
were thresholded at a very small α-level of 0.01% (P = 0.0001) to protect from family-wise
Type I errors. Correction was omitted for the sake of simplicity, keeping in mind that (1) in
our approach ALE serves as a pre-processing step to model-based clustering and therefore
should not be performed too conservatively, and (2) the aim of model-based clustering is the
sub-clustering of large ALE foci which would in any case survive the correction procedure.
Moreover, Laird and colleagues, when introducing multiple comparison correction for ALE,
Neumann et al.Page 12
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 13
compared it to uncorrected thresholding with small thresholds and observed: “It is clear that
thresholding the ALE maps at P < 0.0001 (uncorrected) produced results that most closely
matched the FDR-corrected results (Laird et al., 2005a, p. 161).” This confirms our own
empirical observation that correcting ALE values, though statistically sound, in practical terms
often amounts to using a smaller threshold without correction, as was done in the example
provided here. However, we wish to point out that model-based clustering can in principle be
applied to any activation coordinates. Thus, there are no restrictions on using it in conjunction
with ALE values corrected for multiple comparisons.
The second step of our analysis procedure pertains to fitting Gaussian mixtures to the activation
coordinates that survived the ALE threshold in the first analysis step. Although Gaussians are
the most commonly used components in mixture modeling, they have a well-known limitation:
Gaussian mixture models have a relatively high sensitivity to outliers which can lead to an
over-estimation of the number of components (Svensén and Bishop, 2004). However, we
would argue that this is not a critical issue in our particular application, since such outliers are
removed by ALE before the actual clustering.
Like in many clustering problems, the true number of clusters for a given set of activation
maxima is not known in advance. This can be problematic as most clustering techniques require
the number of clusters to be pre-specified. In the model-based clustering approach suggested
here, this problem is solved by fitting a set of models with different numbers of clusters to the
data and applying a model selection criterion afterwards. The use of the BIC as model selection
criterion allows us to select the best number of clusters and the model parameterization
simultaneously. Like most model selection criteria, the BIC follows the principle of Occam’s
razor and favors from two or more candidate models the model that fits the data sufficiently
well in the least complex way. In our context, this idea can be expressed formally using the
estimated log likelihood of the models and a fixed penalizing term encoding the number of
parameters of each model. Here, alternative approaches such as the Akaike Information
Criterion (AIC) (Akaike, 1973) or the Deviance Information Criterion (DIC) (Spiegelhalter et
al., 2002) are conceivable. AIC, for example, is strongly related to BIC as it only differs in the
simpler penalty term 2 m (cf. Eq. 7). This means, however, that for large sample sizes, AIC
tends to favor more complex models compared to BIC. Other conceivable strategies include
model selection procedures based on data-driven rather than fixed penalty terms (e.g. Shen and
Ye, 2002), or stochastic methods which allow an automatic determination of the number of
components in the process of modelling (e.g. Abd-Almageed et al., 2005; Richardson and
Green, 1997; Svensén and Bishop, 2004). The application of different model selection criteria
and their influence on the result of the clustering will be one direction of future research.
Finally, note the relationship of different parameterizations of the Gaussians to other clustering
criteria. For example, for the spherical model ∑k = λI, maximizing the complete-data log
likelihood in Eq. (5) refers to minimizing the standard k-means clustering criterion tr(W) where
W is the within-cluster scatter matrix as defined in Eq. (A1) and Eq. (A2) in Appendix A.
Maximizing the likelihood of the ellipsoidal model ∑k = λDADT is related to the minimization
of det(W). Thus, allowing the parameterization of the covariance matrices to vary, model-based
clustering encompasses and generalizes a number of classical clustering procedures.1 The
general problems of choosing an appropriate clustering technique and the optimal number of
clusters are then formulated as model selection problem (Fraley and Raftery, 2002).
1For a more detailed discussion on the relation between classical cluster criteria and constraints on the model covariance matrix see, e.g.,
Everitt et al. (2001); Celeux and Govaert (1995); Banfield and Raftery (1993).
Neumann et al.Page 13
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 14
CONCLUSION
We have presented a new method for the coordinate-based meta-analysis of functional imaging
data that facilitates the clustering of activation maxima obtained from independently performed
imaging studies. The method provides an extension to ALE and overcomes two of its
drawbacks: the strong dependency of the results on the chosen standard deviation of the
Gaussian and the relatively large extent of some ALE regions for very inhomogeneously
distributed input data. When applied in a meta-analysis of 26 comparable fMRI experiments,
the method resulted in functional regions that correspond well with the literature. Further
developments of our method could include the use of different model selection criteria and
further constraints on the model components incorporating additional anatomical or
cytoarchitectonic information.
Acknowledgments
We wish to thank Chris Fraley and Adrian Raftery for helpfully commenting on parts of the manuscript. We thank
the BrainMap development team for providing access to the database and for very helpful technical support.
Contract grant sponsor: NIH; Contract grant number: R01 MH74457; Contract grant sponsors: The National Institute
of Mental Health and the National Institute of Biomedical Imaging and Bioengineering.
REFERENCES
Abd-Almageed W, El-Osery A, Smith CE. Estimating time-varying densities using a stochastic learning
automaton. Soft Comput J 2005;10:1007–1020.
Akaike, H. Information theory and an extension of the maximum likelihood principle. Proceeding of the
Second International Symposium on Information Theory; Budapest. 1973. p. 267-281.
Amodio DA, Frith CD. Meeting of minds: The medial frontal cortex and social cognition. Nat Rev
Neurosc 2006;7:268–277.
Banfield J, Raftery A. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993;49:803–
821.
Banich MT, Milham MP, Atchley RA, Cohen NJ, Webb A, Wszalek T, Kramer AF, Liang Z-P, Wright
A, Shenker J, Magin R, Barad V, Gullett D, Shah C, Brown C. fMRI studies of stroop tasks reveal
unique roles of anterior and posterior brain systems in attentional selection. J Cogn Neurosci
2000;12:988–1000. [PubMed: 11177419]
Brass M, Derrfuss J, Forstmann B, von Cramon DY. The role of the inferior frontal junction area in
cognitive control. Trends Cogn Sci 2005;9:314–316. [PubMed: 15927520]
Celeux G, Govaert G. Gaussian parsimonious clustering model. Pattern Recognit 1995;28:781–793.
Chein JM, Fissell K, Jacobs S, Fiez JA. Functional heterogeneity within broca’s area during verbal
working memory. Physiol Behav 2002;77:635–639. [PubMed: 12527011]
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the em algorithm.
J R Stat Soc B 1977;39:1–38.
Derrfuss J, Brass M, Neumann J, von Cramon DY. Involvement of the inferior frontal junction in
cognitive control: Meta-analyses of switching and stroop studies. Hum Brain Mapp 2005;25:22–34.
[PubMed: 15846824]
Derrfuss J, Brass M, von Cramon DY. Cognitive control in the posterior frontolateral cortex: Evidence
from common activations in task coordination, interference control, and working memory.
NeuroImage 2004;23:604–612. [PubMed: 15488410]
Duncan J, Owen AM. Common regions of the human frontal lobe recruited by diverse cognitive demands.
Trends Neurosci 2000;23:475–483. [PubMed: 11006464]
Everitt BS, Bullmore ET. Mixture model mapping of brain activation in functional magnetic resonance
images. Hum Brain Mapp 1999;7:1–14. [PubMed: 9882086]
Everitt, BS.; Landau, S.; Leese, M. Cluster Analysis. 4th ed. New York: Oxford University Press; 2001.
Neumann et al. Page 14
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript
Page 15
Forstmann BU, Brass M, Koch I, von Cramon DY. Internally generated and directly cued task sets: An
investigation with fMRI. Neuropsychologia 2005;43:943–952. [PubMed: 15716164]
Fox PT, Laird AR, Lancaster JL. Meta-Analysis in Functional Brain Mapping (Special Issue). Hum Brain
Mapp 2005;25
Fox PT, Lancaster JL. Mapping context and content: The BrainMap model. Nat Rev Neurosci
2002;3:319–321. [PubMed: 11967563]
Fraley C. Algorithms for model-based Gaussian hierarchical clustering. J Sci Comput 1998;20:270–281.
Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster
analysis. Comput J 1998;41:578–588.
Fraley C, Raftery AE. MCLUST: Software for model-based cluster analysis. J Classification
1999;16:297–206.
Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat
Assoc 2002;97:611–631.
Fraley C, Raftery AE. Enhanced software for model-based clustering, discriminant analysis, and density
estimation: MCLUST. J Classification 2003;20:263–286.
Hartley H. Maximum likelihood estimation from incomplete data. Biometrics 1958;14:174–194.
Hartvig NV, Jensen JL. Spatial mixture modeling of fMRI data. Hum Brain Mapp 2000;11:233–248.
[PubMed: 11144753]
Jeffreys, H. Some tests of significance, treated by the theory of probability; Proceedings of the Cambridge
Philosophical Society; 1935. p. 203-222.
Jeffreys, H. Theory of Probability. 3rd ed. Oxford: Oxford University Press; 1961.
Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc 1995;90:773–795.
Koechlin E, Ody C, Kouneiher F. The architecture of cognitive control in the human prefrontal cortex.
Science 2003;302:1181–1185. [PubMed: 14615530]
Laird AR, Fox PM, Price CJ, Glahn DC, Uecker AM, Lancaster JL, Turkeltaub PE, Kochunov P, Fox
PT. ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Hum
Brain Mapp 2005a;25:155–164. [PubMed: 15846811]
Laird AR, McMillan KM, Lancaster JL, Kochunov P, Turkeltaub PE, Pardo JV, Fox PT. A comparison
of label-based and ALE meta-analysis in the Stroop task. Hum Brain Mapp 2005b;25:6–21. [PubMed:
15846823]
Lancaster J, Laird A, Glahn D, Fox P, Fox P. Automated analysis of meta-analysis networks. Hum Brain
Mapp 2005;25:174–184. [PubMed: 15846809]
Liu X, Banich MT, Jacobson BL, Tanabe JL. Common and distinct neural substrates of attentional control
in an integrated simon and spatial stroop task as assessed by event-related fMRI. NeuroImage
2004;22:1097–1106. [PubMed: 15219581]
Lohmann G, Müller K, Bosch V, Mentzel H, Hessler S, Chen L, Zysset S, von Cramon DY. LIPSIA—
A new software system for the evaluation of functional magnetic resonance images of the human
brain. Comput Med Imaging Graph 2001;25:449–457. [PubMed: 11679206]
McKeown MJ, Jung T-P, Makeig S, Brown G, Kindermann SS, Lee T-W, Sejnowski TJ. Spatially
independent activity patterns in functional MRI data during the stroop color-naming task. Proc Natl
Acad Sci USA 1998;95:803–810. [PubMed: 9448244]
Müller K, Lohmann G, Zysset S, von Cramon DY. Wavelet statistics of functional MRI data and the
general linear model. J Magn Reson Imaging 2003;17:20–30. [PubMed: 12500271]
Neal, RM.; Hinton, GE. A view of the EM algorithm that justifies incremental, sparse, and other variants.
In: Jordan, MI., editor. Learning in Graphical Models. Norwell, MA: Kluwer Academic; 1998. p.
355-368.
Neumann J, von Cramon DY, Forstmann BU, Zysset S, Lohmann G. The parcellation of cortical areas
using replicator dynamics in fMRI. NeuroImage 2006;32:208–219. [PubMed: 16647272]
Neumann J, Lohmann G, Derrfuss J, von Cramon DY. The meta-analysis of functional imaging data
using replicator dynamics. Hum Brain Mapp 2005;25:165–173. [PubMed: 15846812]
Nielsen, FA. Mass meta-analysis in Talairach space. In: Saul, LK.; Weiss, Y.; Bottou, L., editors.
Advances in Neural Information Processing Systems. Vol. Vol. 17. Cambridge, MA: MIT; 2005. p.
985-992.
Neumann et al.Page 15
Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.
NIH-PA Author Manuscript
NIH-PA Author Manuscript
NIH-PA Author Manuscript