# Model-based clustering of meta-analytic functional imaging data.




Jane Neumann*, D. Yves von Cramon, and Gabriele Lohmann

Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103

Leipzig/Germany

Abstract

We present a method for the analysis of meta-analytic functional imaging data. It is based on

Activation Likelihood Estimation (ALE) and subsequent model-based clustering using Gaussian

mixture models, expectation-maximization (EM) for model fitting, and the Bayesian Information

Criterion (BIC) for model selection. Our method facilitates the clustering of activation maxima from

previously performed imaging experiments in a hierarchical fashion. Regions with a high

concentration of activation coordinates are first identified using ALE. Activation coordinates within

these regions are then subjected to model-based clustering for a more detailed cluster analysis. We

demonstrate the usefulness of the method in a meta-analysis of 26 fMRI studies investigating the

well-known Stroop paradigm.

Keywords

fMRI; clustering; ALE; meta-analysis

INTRODUCTION

Functional neuroimaging has become a powerful tool in cognitive neuroscience, which enables

us to investigate the relationship between particular cortical activations and cognitive tasks

performed by a test subject or patient. However, the rapidly growing number of imaging studies

still provides a quite variable picture, in particular of higher-order brain functioning.

Considerable variation can be observed in the results of imaging experiments addressing even

closely related experimental paradigms. The analysis of the consistency and convergence of

results across experiments is therefore a crucial prerequisite for correct generalizations about

human brain functions. This calls for analysis techniques on a meta-level, i.e. methods that

facilitate the post-hoc combination of results from independently performed imaging studies.

Moreover, functional neuroimaging is currently advancing from the simple detection and

localization of cortical activation to the investigation of complex cognitive processes and

associated functional relationships between cortical areas. Such research questions can no

longer be addressed by the isolated analysis of single experiments alone, but necessitate the

consolidation of results across different cognitive tasks and experimental paradigms. This again

makes meta-analyses an increasingly important part of the evaluation of functional imaging

results. Several methodological approaches to the automated meta-analysis of functional

imaging data have recently been proposed, for example, by Turkeltaub et al. (2002); Chein et

al. (2002); Nielsen and Hansen (2004); Nielsen (2005); Neumann et al. (2005); Lancaster et

al. (2005) and Laird et al. (2005a).

© 2007 Wiley-Liss, Inc.

*Correspondence to: Dr. Jane Neumann, Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103

Leipzig, Germany. neumann@cbs.mpg.de.

NIH Public Access

Author Manuscript

Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.

Published in final edited form as:

Hum Brain Mapp. 2008 February ; 29(2): 177–192. doi:10.1002/hbm.20380.

NIH-PA Author Manuscript


In coordinate-based meta-analyses, activation coordinates reported from independently

performed imaging experiments are analyzed in search of functional cortical areas that are

relevant for the investigated cognitive function. In this article we propose to apply a

combination of Activation Likelihood Estimation (ALE) and model-based clustering to this

problem. The former is a form of kernel density estimation, which was recently adapted for

the automated meta-analysis of functional imaging data (Chein et al., 2002; Turkeltaub et al.,

2002). The latter provides a general framework for finding groups in data by formulating the

clustering problem in terms of the estimation of parameters in a finite mixture of probability

distributions (Everitt et al., 2001; Fraley and Raftery, 2002). In the context of functional

imaging, mixture modeling has been used previously for the detection of brain activation in

single-subject functional Magnetic Resonance Imaging (fMRI) data. For example, Everitt and

Bullmore (1999) modeled a test statistic estimated at each voxel as a mixture of central and non-central χ² distributions. This approach was extended by Hartvig and Jensen (2000) to account

for the spatial coherency of activated regions. Penny and Friston (2003) used mixtures of

General Linear Models in a spatio-temporal analysis in order to find clusters of voxels showing

task-related activity.

The combination of model-based clustering and ALE presented in this article should be viewed

as an extension rather than a replacement of ALE, which is currently the state-of-the-art

approach to the meta-analysis of functional imaging data. ALE is based on representing

activation maxima from individual experiments by three-dimensional Gaussian probability

distributions from which activation likelihood estimates for all voxels can be inferred. These

estimates are then compared to a null-distribution derived from permutations of randomly

placed activation maxima. Successful application of ALE has been demonstrated by Chein et

al. (2002); Turkeltaub et al. (2002); Wager et al. (2004), and by several authors contributing

to Fox et al. (2005). However, one drawback of the method in its current form is its strong

dependency on the standard deviation of the Gaussian. Choosing the standard deviation too

small results in many small activation foci which cover only a small part of the original input

data and do not carry significantly more information than provided by the individual activation

maxima alone. In contrast, using a large standard deviation results in activation foci that

represent more of the original activation maxima. However, as will be seen in our experimental

data, the size of such foci can by far exceed the extent of corresponding activations typically

found in single fMRI studies. Such ALE foci might thus comprise more than one functional

unit. This can be observed, in particular, in studies with a very inhomogeneous distribution of

activation coordinates. In this case a certain adaptiveness of the method or a hierarchical

approach would be desirable.

We propose to alleviate this problem by first applying ALE to the original data and then

subjecting activation maxima lying within the resulting activation foci to further clustering.

Using a large standard deviation of the Gaussian in the first step yields a new set of activation

maxima from which coordinates with no other activation maxima in their vicinity are removed.

The subsequent model-based clustering then explores the statistical distribution of the

remaining coordinates.

Model-based clustering assumes that the observed data are generated by a finite mixture of

underlying probability distributions. Each probability distribution corresponds to a cluster. Our

particular implementation closely follows the general model-based clustering approach

proposed by Fraley and Raftery (2002). This approach considers mixtures of multivariate

Gaussians. Maximum likelihood estimation of the mixture models is performed via the

expectation-maximization (EM) algorithm (Hartley, 1958; Dempster et al., 1977), which

determines the parameters of the mixture components as well as the posterior probability for

a data point to belong to a specific component or cluster. Since a suitable initialization is critical


in the successful application of EM, hierarchical agglomerative clustering is performed as an

initializing step.

Varying the parameterization of the covariance matrix of a Gaussian mixture provides a set of

models with different geometric characteristics, ranging from spherical components of equal

shape and volume to ellipsoidal components with variable shape, volume, and orientation

(Banfield and Raftery, 1993). We use a set of 10 different parameterizations. The best

parameterization of the model and the optimal number of clusters are determined using the

Bayesian Information Criterion (BIC) (Schwarz, 1978).

In the following, we provide the methodological background of ALE, Gaussian mixture

models, and BIC for model selection. We then present experimental data showing the

application of the method in a meta-analysis of 26 fMRI experiments investigating the well-

known Stroop paradigm.

METHODS

ALE

ALE, concurrently but independently developed by Turkeltaub et al. (2002) and Chein et al.

(2002), was among the first methods aimed at modeling cortical areas of activation from meta-

analytic imaging data. It was recently extended by Laird et al. (2005a) to account for multiple

comparisons and to enable statistical comparisons between two or more meta-analyses.

Moreover, it has been used in combination with replicator dynamics for the analysis of

functional networks in meta-analytic functional imaging data (Neumann et al., 2005). For the

presented meta-analysis, ALE was implemented as part of the software package LIPSIA

(Lohmann et al., 2001).

In ALE, activation maxima are modeled by three-dimensional Gaussian probability

distributions centered at their Talairach coordinates. Specifically, the probability that a given

activation maximum lies within a particular voxel is

$$p = \frac{1}{(2\pi)^{3/2}\,\sigma^{3}} \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right) \tag{1}$$

where σ is the standard deviation of the distribution and d is the Euclidean distance of the voxel

to the activation maximum. For each voxel, the union of these probabilities calculated for all

activation maxima yields the ALE. In regions with a relatively high density of reported

activation maxima, voxels will be assigned a high ALE in contrast to regions where few and

widely spaced activation maxima have been reported.
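The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the LIPSIA implementation: the function names are hypothetical, the density is taken to be an isotropic three-dimensional Gaussian, and converting the density to a per-voxel probability by multiplying with the voxel volume is an assumption of the sketch. The union of the probabilities is computed as one minus the product of the complements.

```python
import math

def gaussian_density(d, sigma):
    """Isotropic 3-D Gaussian density at Euclidean distance d (mm) from a focus."""
    norm = (2.0 * math.pi) ** 1.5 * sigma ** 3
    return math.exp(-d * d / (2.0 * sigma * sigma)) / norm

def ale_value(voxel, maxima, sigma=5.0, voxel_volume=27.0):
    """ALE at one voxel: union of the per-focus probabilities, 1 - prod(1 - p_i).

    Assumption of this sketch: the probability that a focus lies in the voxel
    is approximated by density * voxel volume (mm^3).
    """
    prod_not = 1.0
    for focus in maxima:
        d = math.dist(voxel, focus)  # Euclidean distance in mm
        p = min(1.0, gaussian_density(d, sigma) * voxel_volume)
        prod_not *= 1.0 - p
    return 1.0 - prod_not

# Three nearby foci yield a higher ALE at the origin than one distant focus.
dense = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
sparse = [(60.0, 60.0, 60.0)]
print(ale_value((0.0, 0.0, 0.0), dense), ale_value((0.0, 0.0, 0.0), sparse))
```

Evaluating `ale_value` for every voxel of a brain mask would give an ALE map of the kind thresholded in the next step.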

From the resulting ALE maps, one can infer whether activation maxima reported from different

experiments are likely to represent the same functional activation. A non-parametric

permutation test is utilized to test against the null-hypothesis that the activation maxima are

spread uniformly throughout the brain. Given some desired level of significance α, ALE maps

are thresholded at the 100(1–α)th percentile of the null-distribution. Topologically connected

voxels with significant ALE values are then considered activated functional regions.
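As a rough sketch of the permutation test, the following code builds a null distribution from randomly placed foci and returns the 100(1–α)th percentile as threshold. It is a toy version under stated assumptions: the helper names are hypothetical, foci are drawn uniformly from a bounding box rather than from a brain mask, the density-times-voxel-volume probability is an assumption, and the iteration count and grid are kept small so that the example runs quickly.

```python
import math
import random

def ale_at(voxel, foci, sigma=5.0, voxel_volume=27.0):
    """ALE at one voxel as the union of per-focus probabilities."""
    norm = (2.0 * math.pi) ** 1.5 * sigma ** 3
    prod_not = 1.0
    for f in foci:
        p = math.exp(-math.dist(voxel, f) ** 2 / (2.0 * sigma ** 2)) / norm * voxel_volume
        prod_not *= 1.0 - min(1.0, p)
    return 1.0 - prod_not

def null_threshold(n_foci, voxels, bounds, alpha, n_iter=200, seed=0):
    """ALE threshold at the 100(1 - alpha)th percentile of the null distribution."""
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_iter):
        # Null hypothesis: foci are spread uniformly over the volume.
        foci = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n_foci)]
        null_values.extend(ale_at(v, foci) for v in voxels)
    null_values.sort()
    idx = math.ceil((1.0 - alpha) * len(null_values)) - 1
    return null_values[max(0, min(idx, len(null_values) - 1))]

# Toy volume: a 4 x 4 x 4 grid of voxel centers inside a 30 mm cube.
grid = [(x, y, z) for x in range(0, 40, 10) for y in range(0, 40, 10) for z in range(0, 40, 10)]
box = [(0.0, 30.0)] * 3
print(null_threshold(10, grid, box, alpha=0.05, n_iter=50))
```

Voxels of the observed ALE map whose value exceeds the returned threshold would then be grouped into topologically connected regions.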

The extent and separability of the resulting regions critically depends on the choice of σ in Eq.

(1). As observed, for example, by Derrfuss et al. (2005), decreasing σ leads to smaller regions

of significant voxels and to an increase in the number of discrete above threshold regions which,

however, represent only few of the original activation maxima. Increasing σ has the opposite

effect with larger regions representing more of the original data. Most commonly σ is chosen


to correspond to the size of spatial filters typically applied to fMRI data. In previously published ALE analyses (see Fox et al. (2005) for some examples), we found the width of the Gaussian to vary between 9.4 and 10 mm FWHM; in rare cases, 15 mm was used. In the vast majority of analyses, the Gaussian was set to 10 mm FWHM, i.e. σ ≈ 4.2 mm. As we view ALE as a preprocessing step to model-based clustering, the activation likelihood should not be estimated too conservatively. Therefore, we use a relatively large standard deviation of σ = 5 mm, corresponding to 11.8 mm FWHM (FWHM = 2√(2 ln 2) σ ≈ 2.355 σ).
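Because kernel widths are reported sometimes as standard deviation and sometimes as FWHM, a small conversion helper is useful when comparing ALE settings across studies. The sketch below uses the standard Gaussian relation FWHM = 2√(2 ln 2) σ; the function names are illustrative.

```python
import math

# Conversion factor between standard deviation and full width at half maximum.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # approximately 2.355

def sigma_to_fwhm(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return FWHM_PER_SIGMA * sigma

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / FWHM_PER_SIGMA

print(round(sigma_to_fwhm(5.0), 1))   # sigma = 5 mm expressed as FWHM in mm
print(round(fwhm_to_sigma(10.0), 2))  # 10 mm FWHM expressed as sigma in mm
```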

Model-Based Clustering

ALE leads to a reduced list of activation maxima containing only those maxima which have

one or more other maxima in their vicinity. These coordinates are then subjected to clustering

based on a finite mixture of probability distributions. Here, we will closely follow the procedure

suggested by Fraley and Raftery (1998, 2002), who propose a group of Gaussian mixture

models, maximum likelihood estimation via EM, hierarchical agglomeration as initial

clustering, and model and parameter selection via BIC. In the following, the individual parts

of the clustering procedure are described in detail. These parts were implemented for our

application using the software package MCLUST (Fraley and Raftery, 1999, 2003).

Gaussian Mixture Models

For n independent multivariate observations x = (x1, …, xn), the likelihood of a mixture model

with M components or clusters can be written as

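$$L(\theta_1,\ldots,\theta_M;\, p \mid x) = \prod_{i=1}^{n} \sum_{k=1}^{M} p_k\, f_k(x_i \mid \theta_k) \tag{2}$$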

where fk is the density of the cluster k with parameter vector θk, and p = (p1,…,pM) is the vector

of mixing proportions with pk ≥ 0 and ∑k pk = 1. Since any distribution can be effectively

approximated by a mixture of Gaussians (Silverman, 1985; Scott, 1992), the probability density

function is most commonly represented by

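$$f_k(x_i \mid \mu_k, \Sigma_k) = \frac{\exp\left\{-\tfrac{1}{2}(x_i-\mu_k)^{T}\,\Sigma_k^{-1}\,(x_i-\mu_k)\right\}}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \tag{3}$$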

for d-dimensional data with mean μk and covariance matrix Σk. Geometrical features of the components can be varied by parameterization of the covariance matrices Σk. Banfield and Raftery (1993) suggest various parameterizations through the eigenvalue decomposition

$$\Sigma_k = \lambda_k D_k A_k D_k^{T} \tag{4}$$

Dk is the matrix of eigenvectors, Ak is a diagonal matrix with elements that are proportional to the eigenvalues of Σk such that |Ak| = 1, and λk is a scalar. Treating Dk, λk, and Ak as independent parameters and keeping them either constant or variable across clusters varies the shape, volume, and orientation of the components. In the simplest case, Σk = λI, all clusters are spherical and of equal size. The least constrained case given in Eq. (4) accounts for ellipsoidal

clusters of variable shape, volume, and orientation. All parameterizations available in

MCLUST and applied to our experimental data are presented in Table I. The first two models

have spherical, all other models have ellipsoidal components, whereby components in models

with diagonal covariance matrices (c–f) are oriented along the coordinate axes. Models with


identical matrix A for all components have equally shaped components, whereas models with

identical λ for all components have components of the same volume.
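To make the roles of λ, D, and A concrete, the following sketch assembles covariance matrices from the decomposition Σ = λ D A Dᵀ for three cases: spherical, axis-aligned ellipsoidal, and rotated ellipsoidal. The helper names and the numerical values are purely illustrative and are not taken from MCLUST or from Table I.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def covariance(lam, D, A_diag):
    """Sigma = lam * D * diag(A_diag) * D^T, where prod(A_diag) should equal 1."""
    n = len(A_diag)
    A = [[A_diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    S = matmul(matmul(D, A), transpose(D))
    return [[lam * v for v in row] for row in S]

def rotation_z(angle):
    """Orthogonal matrix of eigenvectors: rotation about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

identity = rotation_z(0.0)
spherical = covariance(4.0, identity, [1.0, 1.0, 1.0])   # Sigma = lambda * I
diagonal = covariance(4.0, identity, [4.0, 0.5, 0.5])    # ellipsoid along the axes
rotated = covariance(4.0, rotation_z(math.pi / 4), [4.0, 0.5, 0.5])  # same shape and volume, rotated
for name, S in [("spherical", spherical), ("diagonal", diagonal), ("rotated", rotated)]:
    print(name, [[round(v, 3) for v in row] for row in S])
```

Holding λ or A fixed across clusters while letting the remaining factors vary corresponds to the equal-volume and equal-shape models described above.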

Maximum Likelihood Estimation

Maximum likelihood estimation of a Gaussian mixture model as defined in Eqs. (2) and (3)

can be performed via the widely used EM algorithm, which provides a general approach to

parameter estimation in incomplete data problems (Dempster et al., 1977; Hartley, 1958; Neal and Hinton, 1998). In general, given a likelihood function L(θ|y) = Πi f(yi|θ) for parameters θ and data y = (y1,…,yn), we wish to find θ̂ such that

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid y).$$

In the presence of some hidden data z such that y = (x,z) with x observed and z unobserved, we

can equivalently maximize the so-called complete-data log likelihood and find θ̂ such that

$$\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid x, z).$$

Starting from an initial guess, the EM algorithm proceeds by alternately estimating the

unobservable data z and the unknown parameters θ. Specifically, in the E-step, the algorithm

calculates the expected value of the complete-data log likelihood with respect to z given x and

the current estimate of θ. In the M-step, this expected value is maximized in terms of θ, keeping

z fixed as computed in the previous E-step.
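The alternation of E-step and M-step can be illustrated with a toy example. The sketch below fits a one-dimensional Gaussian mixture, which is a deliberate simplification of the three-dimensional case treated in this article. The function names are hypothetical, and, as an assumption of the sketch, convergence is monitored on the mixture log likelihood.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, means, variances, weights, tol=1e-5, max_iter=500):
    """EM for a 1-D Gaussian mixture; returns (means, variances, weights, z, loglik)."""
    means, variances, weights = list(means), list(variances), list(weights)
    M, n = len(means), len(xs)
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: posterior membership probabilities z[i][k] under current parameters.
        z, loglik = [], 0.0
        for x in xs:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(M)]
            total = sum(joint)
            loglik += math.log(total)
            z.append([j / total for j in joint])
        # Stop once the log likelihood improves by less than tol.
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: re-estimate the parameters with the memberships z held fixed.
        for k in range(M):
            nk = sum(z[i][k] for i in range(n))
            weights[k] = nk / n
            means[k] = sum(z[i][k] * xs[i] for i in range(n)) / nk
            var = sum(z[i][k] * (xs[i] - means[k]) ** 2 for i in range(n)) / nk
            variances[k] = max(1e-6, var)  # small floor avoids a degenerate component
    return means, variances, weights, z, loglik

xs = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.8, 3.1, 2.9]
means, variances, weights, z, loglik = em_gmm_1d(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
labels = [max(range(2), key=lambda k: zi[k]) for zi in z]
print(means, labels)
```

The final line assigns each observation to the component with the largest membership probability, which corresponds to a maximum likelihood classification.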

In our application, the complete data y = (y1,…,yn) consist of yi = (xi,zi), where each xi is a three-dimensional vector containing the coordinates of an activation maximum in Talairach space and zi = (zi1,…,ziM) is the unknown membership of xi in one of the M clusters, i.e.

$$z_{ik} = \begin{cases} 1 & \text{if } x_i \text{ belongs to cluster } k, \\ 0 & \text{otherwise.} \end{cases}$$

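With the density of observation xi given zi written as $\prod_{k} f_k(x_i \mid \mu_k, \Sigma_k)^{z_{ik}}$, the complete-data log likelihood in our problem can be formulated as

$$\ell(\mu_k, \Sigma_k, p_k \mid y) = \sum_{i=1}^{n} \sum_{k=1}^{M} z_{ik}\, \log\left[p_k\, f_k(x_i \mid \mu_k, \Sigma_k)\right] \tag{5}$$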

assuming that each zi is independently and identically distributed according to a multinomial

distribution of one draw from M categories with probabilities p1,…pM (Fraley and Raftery,

1998).

Maximum likelihood estimation is performed by alternating between the calculation of zik given xi, μk, and Σk (E-step) and maximizing Eq. (5) with respect to μk, Σk, and pk with zik fixed (M-step). Mathematical details of the algorithm are given in Appendix A. The EM algorithm terminates when the difference between successive values of ℓ falls below some threshold ε, which in our application was set to ε = 0.00001. The value of zik at the maximum of Eq. (5) is the estimated probability that xi belongs to cluster k, and the maximum likelihood classification of xi is the cluster k with

$$z_{ik} = \max_{j} z_{ij}.$$


Initialization by Hierarchical Agglomeration

Following the suggestion by Fraley and Raftery (1998), we employ model-based hierarchical

agglomeration provided in MCLUST as the initializing partitioning method. This method tends to

yield reasonable clusterings in the absence of any information about a possible clustering

inherent in the data (Fraley and Raftery, 2002).

Hierarchical agglomeration techniques typically start with a pre-defined number of clusters

and in each step merge the two closest clusters into a new cluster, thereby reducing the number

of clusters by one. The implementation used here starts with n clusters, each containing a single

observation xi. Then, two clusters are chosen such that merging them increases the so-called

classification likelihood, given as

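$$C(\theta_1,\ldots,\theta_M;\, c_1,\ldots,c_n \mid x) = \prod_{i=1}^{n} f_{c_i}(x_i \mid \theta_{c_i}) \tag{6}$$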

with fk(xi) given in Eq. (3). The vector c = (c1,…,cn) encodes the classification of the data, i.e. ci = k if xi is classified as a member of cluster k. For an unrestricted covariance matrix as defined in Eq. (4), approximately maximizing the classification likelihood (6) amounts to minimizing

$$\sum_{k=1}^{M} n_k \log\left|\frac{W_k}{n_k}\right|$$

where nk is the number of elements in cluster k and Wk is the within-cluster scattering matrix

of cluster k as defined in Eq. (8) in Appendix A (Banfield and Raftery, 1993). Computational

issues on this clustering procedure are discussed in detail by Banfield and Raftery (1993) and

Fraley (1998), in particular regarding the initial stages with a single data point in each cluster,

which leads to |W| = 0.

From the values of c at the maximum of C, initializations for the unknown membership values

zik are derived, and first estimates for the parameters of the Gaussian components can be

obtained from an M-step of the EM algorithm as described in Appendix A.
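The agglomeration scheme can be sketched as follows. Note the swapped-in simplification: instead of the determinant criterion for unrestricted covariance matrices, this sketch uses the sum-of-squares criterion tr(W), which corresponds to the simplest spherical, equal-volume model (Σk = λI) and coincides with Ward's method. It also avoids the |W| = 0 problem for singleton clusters. Function names are hypothetical, and the brute-force pair search is only suitable for small coordinate lists.

```python
def sum_of_squares(cluster):
    """Within-cluster sum of squared distances to the centroid, i.e. tr(W_k)."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = [sum(p[j] for p in cluster) / n for j in range(dim)]
    return sum((p[j] - centroid[j]) ** 2 for p in cluster for j in range(dim))

def agglomerate(points, n_clusters):
    """Start with one cluster per point; merge the pair with the smallest criterion increase."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = (sum_of_squares(clusters[a] + clusters[b])
                        - sum_of_squares(clusters[a])
                        - sum_of_squares(clusters[b]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical coordinates (mm): two well separated groups.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10)]
print(agglomerate(points, 2))
```

The resulting partition can be converted into initial membership values (1 for the assigned cluster, 0 otherwise) before the first M-step.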

Model Selection via BIC

A problem of most clustering techniques is to determine the number of clusters inherent in the

data. One common technique in model-based clustering is to apply several models with

different pre-defined numbers of components and subsequently choose the best model

according to some model selection criterion. For models with equal number of parameters, the

simplest approach is to compare estimated residual variances. This is not applicable, however,

when models with varying number of parameters are considered.

An advantage of using mixture models for clustering is that approximate Bayes factors can be

used for model selection. Bayes factors were developed originally as a Bayesian approach to

hypothesis testing by Jeffreys (1935, 1961). In the context of model comparison, a Bayes factor

describes the posterior odds for one model against another given equal prior probabilities. It

is determined from the ratio of the integrated likelihoods of the models. In conjunction with

EM for maximum likelihood estimation, the integrated likelihood of a model can be


approximated under certain regularity conditions by the BIC (Schwarz, 1978), which is defined

as

$$\mathrm{BIC} = 2\hat{\ell} - m \log(n) \tag{7}$$

where ℓ̂ is the maximized mixture log likelihood of the model, m is the number of independent

parameters of the model, and n the number of data points. With this definition, a large BIC

value provides strong evidence for a model and the associated number of clusters.

The relationship between Bayes factors and BIC, the regularity conditions, and the use of Bayes

factors for model comparison are discussed in more detail, e.g., by Kass and Raftery (1995).

They also provide guidelines for the strength of evidence for or against some model: A

difference of less than 2 between the BIC of two models corresponds to weak, a difference

between 2 and 6 to positive, between 6 and 10 to strong, and a difference greater than 10 to

very strong evidence for the model with the higher BIC value.
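The BIC and the evidence guidelines translate directly into code. The sketch below uses the sign convention adopted here, in which larger BIC values are better; the function names and the example log likelihoods are hypothetical.

```python
import math

def bic(max_loglik, n_params, n_obs):
    """BIC = 2 * loglik - m * log(n); larger values indicate a better model."""
    return 2.0 * max_loglik - n_params * math.log(n_obs)

def evidence(bic_a, bic_b):
    """Verbal grade of the evidence for the higher-BIC model (Kass and Raftery, 1995)."""
    diff = abs(bic_a - bic_b)
    if diff < 2:
        return "weak"
    if diff <= 6:
        return "positive"
    if diff <= 10:
        return "strong"
    return "very strong"

# Hypothetical example: two models fitted to 210 coordinates.
bic_small = bic(-1150.0, 8, 210)   # fewer parameters, lower log likelihood
bic_large = bic(-1120.0, 14, 210)  # more parameters, higher log likelihood
print(bic_small, bic_large, evidence(bic_small, bic_large))
```

The penalty term m log(n) grows with the number of free parameters, so a more flexible model is preferred only if its gain in log likelihood outweighs the added complexity.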

Putting Things Together

Taking together the individual parts described above, our algorithm for deriving activated

functional regions from meta-analytic imaging data can be summarized as follows:

1. Given a list of coordinates encoding activation maxima in Talairach space from a number of individual studies, calculate ALEs for all voxels using a large standard deviation of the Gaussian. Determine those coordinates that fall within the regions above the ALE threshold.

2. Determine a maximum number of clusters M. Perform hierarchical agglomeration for up to M clusters using the reduced coordinate list obtained in Step 1 as input, thereby approximately maximizing the classification likelihood as defined in Eq. (6).

3. For each parameterization and number of clusters of the model as defined in Eq. (5), perform EM, using the classification obtained in Step 2 as initialization.

4. Calculate the BIC for each parameterization and number of clusters in the model according to Eq. (7).

5. Choose the parameterization and number of clusters with a decisive maximum BIC value as solution according to the guidelines above.
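The final selection step can be sketched as a search over a table of BIC values indexed by parameterization and number of clusters. The table entries below are invented for illustration, the parameterization labels are placeholders, and treating a margin of more than 10 over the runner-up as decisive is an assumption derived from the guidelines above.

```python
def select_model(bic_table, decisive_margin=10.0):
    """Return the (parameterization, n_clusters) key with the largest BIC,
    its margin over the runner-up, and whether that margin is decisive."""
    ranked = sorted(bic_table.items(), key=lambda kv: kv[1], reverse=True)
    best_key, best_bic = ranked[0]
    margin = best_bic - ranked[1][1] if len(ranked) > 1 else float("inf")
    return best_key, margin, margin > decisive_margin

# Hypothetical BIC values for two parameterizations and two or three clusters.
bic_table = {
    ("spherical, equal volume", 2): -2300.0,
    ("spherical, equal volume", 3): -2280.0,
    ("ellipsoidal, unconstrained", 3): -2295.5,
}
best, margin, decisive = select_model(bic_table)
print(best, margin, decisive)
```

If the margin is not decisive, the competing solutions would need to be inspected side by side rather than accepting the maximum automatically.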

Experimental Data

Our method was applied in a meta-analysis of 26 fMRI experiments employing the well-known

Stroop paradigm (Stroop, 1935). A list of included studies is given in Appendix B. The Stroop

paradigm is designed to investigate interference effects in the processing of a stimulus while

a competing stimulus has to be suppressed. For example, subjects are asked to name a color

word, say “red,” which is presented on a screen in the color it stands for (congruent condition)

or in a different color (incongruent condition). Other variants of the Stroop paradigm include

the spatial word Stroop task (the word “above” is written below a horizontal line), the counting

Stroop task (the word “two” appears three times on the screen) and the object-color Stroop task

(an object is presented in an atypical color, e.g. a blue lemon).

This particular paradigm was chosen as a test case for our method, because the interference

effect and the associated cortical activations are known to be produced very reliably.

Activations are most commonly reported in the left inferior frontal region, the left inferior

parietal region, and the left and right anterior cingulate (Banich et al., 2000; Liu et al., 2004;

McKeown et al., 1998). Our own previous meta-analysis based on ALE and subsequent


application of replicator dynamics (Neumann et al., 2005) revealed a frontal network including

the presupplementary motor area (preSMA), the inferior frontal sulcus (IFS) extending onto

the middle frontal gyrus, the anterior cingulate cortex (ACC) of both hemispheres, and the

inferior frontal junction area (IFJ). Other frequently reported areas include frontopolar cortex,

occipital cortex, fusiform gyrus, and insula (Laird et al., 2005b; Zysset et al., 2001).

Despite the high agreement in the reported activated areas, the actual location of associated

coordinates in Talairach space differs widely between studies. For example, the left IFJ was

localized in previous studies at Talairach coordinates x between −47 and −35, y between −4

and 10, and z between 27 and 40 (Brass et al., 2005; Derrfuss et al., 2004, 2005; Neumann et

al., 2005). Such high variability makes the classification of the data into distinct functional

units difficult.

We applied our analysis to data extracted from the BrainMap database (Fox and Lancaster,

2002). This database provides Talairach coordinates of activation maxima from functional

neuroimaging experiments covering a variety of experimental paradigms and imaging

modalities. At the time of writing the database contained over 27,500 activation coordinates

reported in 790 papers.

Searching the database for fMRI experiments investigating the Stroop interference task resulted

in 26 peer-reviewed journal publications. Within these studies, 728 Talairach coordinates for

activation maxima were found. The majority of these coordinates (550 out of 728) represented

the Stroop interference effect, i.e. significant activation found for the contrasts incongruent ≥

congruent, incongruent ≥ control, or incongruent + congruent ≥ control. As the control condition, either the presentation of a neutral object (e.g. “XXXX” instead of a color word) or simple visual fixation was used. Fifty-five coordinates were marked as deactivation in the database,

i.e. they represent the contrast congruent ≥ incongruent. The remaining coordinates were

reported to represent other contrasts such as the contrast between different Stroop modalities

or a conjunction of Stroop interference, spatial interference, and the Flanker task. Note that 26

coordinates came from a meta-analysis on Stroop interference, nine coordinates represented

the interference effect in pathological gamblers, and all remaining coordinates were taken from

group studies with healthy subjects.

As the focus of our work is on the development of meta-analysis tools rather than the

investigation of the Stroop paradigm, all 728 coordinates were subjected to the subsequent

analysis without any further selection. This not only enabled us to test our method on a

reasonably large data set, it also introduced some “realistic” noise into our data.

Plots of all coordinates projected onto a single axial, sagittal, and coronal slice are shown in

the top row of Figure 1. Coordinates reported from different studies are represented by different

colors. As can be seen, activation maxima are distributed over large parts of the cortex, although

some areas with a higher density of activation coordinates are already apparent, in particular

in the left lateral prefrontal cortex and the medial frontal cortex. These can be seen more clearly

in the example slices in the bottom row of Figure 1.

Experimental Results

Activation coordinates were first subjected to an ALE analysis with standard deviations of σ

= 5 mm, corresponding to 11.8 mm FWHM. The null distribution was derived from 1,000

iterations of randomly placing 728 activation coordinates over a mask brain volume defined

by the minimum and maximum Talairach coordinates in the original data set. The brain mask

spanned a volume of 61,408 voxels, each 3 × 3 × 3 mm³ in size. As suggested by Turkeltaub

et al. (2002), the resulting ALE map was thresholded at an α-level of α = 0.01%. This

Neumann et al. Page 8

Hum Brain Mapp. Author manuscript; available in PMC 2010 June 15.


corresponded to an ALE threshold of 0.0156. Figure 2 shows sagittal and axial example slices

of the ALE map containing only voxels above threshold.
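The quantities in this step can be illustrated with a minimal sketch. It assumes the ALE definition of Turkeltaub et al. (2002), in which each focus is modeled as a three-dimensional Gaussian and the ALE value of a voxel is the probability that at least one focus lies in it. The function names, the toy mask, and the small iteration count are illustrative assumptions and not the implementation used for the reported results.

```python
import math
import random


def fwhm_from_sigma(sigma_mm):
    """Full width at half maximum of a Gaussian with standard deviation sigma_mm."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_mm


def ale_value(voxel, foci, sigma_mm=5.0, voxel_volume_mm3=27.0):
    """ALE at one voxel centre: 1 - prod_i (1 - p_i), where p_i is the Gaussian
    density of focus i at the voxel centre times the voxel volume (assumed form)."""
    norm = voxel_volume_mm3 / ((2.0 * math.pi) ** 1.5 * sigma_mm ** 3)
    prob_no_focus = 1.0
    for focus in foci:
        d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
        prob_no_focus *= 1.0 - norm * math.exp(-d2 / (2.0 * sigma_mm ** 2))
    return 1.0 - prob_no_focus


def null_threshold(mask, n_foci, alpha, n_iter, rng, sigma_mm=5.0):
    """ALE value exceeded by at most a fraction alpha of the null values obtained
    when n_foci foci are placed uniformly at random on the mask voxels."""
    null_values = []
    for _ in range(n_iter):
        random_foci = [rng.choice(mask) for _ in range(n_foci)]
        null_values.extend(ale_value(v, random_foci, sigma_mm) for v in mask)
    null_values.sort()
    index = min(len(null_values) - 1, math.ceil((1.0 - alpha) * len(null_values)))
    return null_values[index]


# Toy mask: a 6 x 6 x 6 grid of 3 mm voxels (the mask in the text had 61,408 voxels).
toy_mask = [(3.0 * i, 3.0 * j, 3.0 * k)
            for i in range(6) for j in range(6) for k in range(6)]
rng = random.Random(0)
threshold = null_threshold(toy_mask, n_foci=8, alpha=0.01, n_iter=20, rng=rng)
print(f"FWHM for sigma = 5 mm: {fwhm_from_sigma(5.0):.1f} mm")
print(f"toy ALE threshold at alpha = 1%: {threshold:.4f}")
```

The FWHM conversion reproduces the 11.8 mm quoted for σ = 5 mm. The toy threshold is not comparable to the reported value of 0.0156, since it depends on the mask size, the number of foci, and the α-level.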

The ALE analysis yielded 13 regions of topologically connected voxels above threshold, which

covered a total volume of 54,810 mm³ and contained 210 of the original activation maxima.

Table II shows size, maximum ALE value, location of the center in Talairach space, and the

number of original activation coordinates covered by the detected ALE regions.

Note that the four largest regions cover 89.65% (49,140 mm³) of the total volume of the ALE regions.

They contain 83.8% of all above-threshold coordinates. This can be explained by the very

inhomogeneous distribution of the original input coordinates: More than 40% of the original

activation maxima fell within regions spanned by the minimum and maximum Talairach

coordinates of the four largest ALE regions. The remaining coordinates were distributed more

evenly over other parts of the cortex.
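The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text; the count of 176 coordinates is inferred from the stated 83.8% and is not reported explicitly.

```python
VOXEL_VOLUME_MM3 = 3 * 3 * 3      # one voxel of 3 x 3 x 3 mm
total_volume_mm3 = 54_810         # all 13 ALE regions
largest_four_mm3 = 49_140         # the four largest ALE regions
mask_voxels = 61_408              # voxels in the brain mask
coords_above_threshold = 210

total_voxels = total_volume_mm3 // VOXEL_VOLUME_MM3
largest_four_voxels = largest_four_mm3 // VOXEL_VOLUME_MM3
volume_share = largest_four_mm3 / total_volume_mm3
mask_share = total_voxels / mask_voxels
# Inferred, not stated in the text: the integer count closest to 83.8% of 210.
coords_in_largest_four = round(0.838 * coords_above_threshold)

print(f"ALE regions: {total_voxels} voxels, four largest: {largest_four_voxels} voxels")
print(f"volume share of the four largest regions: {volume_share:.4f}")
print(f"share of the mask above threshold: {mask_share:.4f}")
print(f"coordinates in the four largest regions (inferred): {coords_in_largest_four}")
```

Under these numbers, the thresholded ALE map occupies only a few percent of the mask, and almost all of that volume belongs to four regions.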

Note further that some smaller regions surviving the ALE threshold contain only single

activation maxima. This seems counterintuitive at first, as a single coordinate should not result

in a relatively high ALE value. However, imagine, for example, a situation where three

coordinates are arranged in a “row,” i.e. at three voxels in the same row of a slice with one

voxel between them. The voxel in the middle will get a higher empirical ALE value than the

ones at both ends, as it has two other coordinates in close distance (only two voxels away)

whereas the other two voxels have one coordinate in close distance and another one four voxels

further away. Depending on the distribution of other coordinates, thresholding the ALE values

could now shape the surviving ALE region such that only the coordinate in the middle will be

inside the region, whereas the value at the other two voxels might just be too small to survive

the thresholding. Thus, ALE regions containing only a single coordinate are caused by very

small groups of activation maxima that are quite isolated from the remaining ones. The fact

that some of our ALE regions contain only a single coordinate indicates that all remaining

activation coordinates, not surviving the thresholding, are very isolated from each other. They

can therefore be regarded as noise.
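The three-coordinate example can be checked numerically. The sketch below again assumes the ALE definition of Turkeltaub et al. (2002), with σ = 5 mm and 3 mm voxels as in the analysis; the function name and the isolated placement of the foci are illustrative assumptions.

```python
import math


def ale_value(voxel, foci, sigma_mm=5.0, voxel_volume_mm3=27.0):
    """ALE at one voxel centre: 1 - prod_i (1 - p_i), where p_i is the Gaussian
    density of focus i at the voxel centre times the voxel volume (assumed form)."""
    norm = voxel_volume_mm3 / ((2.0 * math.pi) ** 1.5 * sigma_mm ** 3)
    prob_no_focus = 1.0
    for focus in foci:
        d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
        prob_no_focus *= 1.0 - norm * math.exp(-d2 / (2.0 * sigma_mm ** 2))
    return 1.0 - prob_no_focus


VOXEL_MM = 3.0
# Three foci in one row with one empty voxel between neighbours: voxel indices 0, 2, 4.
foci = [(0 * VOXEL_MM, 0.0, 0.0), (2 * VOXEL_MM, 0.0, 0.0), (4 * VOXEL_MM, 0.0, 0.0)]
ale_at_foci = [ale_value(focus, foci) for focus in foci]
for focus, value in zip(foci, ale_at_foci):
    print(f"x = {focus[0]:4.1f} mm  ALE = {value:.5f}")
```

The middle focus receives the largest value and the two outer foci receive equal, smaller values. Any threshold that falls between the two values leaves a region containing only the middle coordinate.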

Despite the use of a very small α-level in ALE thresholding, some of the determined ALE foci

clearly exceed the size of cortical activations typically found in these regions for the Stroop

paradigm (see, e.g. Zysset et al. (2001) for a comparison). Moreover, as seen in Figure 2, within

such foci, in particular in the left prefrontal cortex, sub-maxima of ALE values are visible,

indicating a possible sub-clustering of the represented activation coordinates. All above-

threshold activation coordinates were therefore subjected to model-based clustering as the

second part of our method.

Hierarchical agglomeration of the above-threshold coordinates was first performed for up to

30 clusters. Using the results as initialization for the EM algorithm, the models defined in Eq.

(5), with the parameterizations introduced in the section “Model-Based Clustering” and up to 30

clusters, were then fitted to the data set, and BIC values were calculated for each number of

clusters and parameterization.
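The fit-and-score loop described here can be sketched for the simplest parameterization, a spherical mixture whose components share one volume parameter λ. This is a simplified illustration under stated assumptions: it uses farthest-point initialization instead of hierarchical agglomeration, synthetic data instead of the Stroop coordinates, the form BIC = 2 log L − m log n with larger values preferred, and function names that are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def e_step(points, weights, means, lam):
    """Responsibilities and log likelihood for components N(mean_k, lam * I)."""
    dim = len(points[0])
    log_const = -0.5 * dim * math.log(2.0 * math.pi * lam)
    resp, log_lik = [], 0.0
    for p in points:
        logs = [math.log(w) + log_const - sq_dist(p, m) / (2.0 * lam)
                for w, m in zip(weights, means)]
        top = max(logs)  # log-sum-exp for numerical stability
        log_total = top + math.log(sum(math.exp(v - top) for v in logs))
        log_lik += log_total
        resp.append([math.exp(v - log_total) for v in logs])
    return resp, log_lik


def fit_equal_volume_spherical(points, n_clusters, n_iter=50):
    """EM for a mixture with covariance lam * I shared by all components."""
    n, dim = len(points), len(points[0])
    # Farthest-point initialization: a simple stand-in for hierarchical agglomeration.
    means = [list(points[0])]
    while len(means) < n_clusters:
        far = max(points, key=lambda p: min(sq_dist(p, m) for m in means))
        means.append(list(far))
    weights = [1.0 / n_clusters] * n_clusters
    lam = sum(sq_dist(p, means[0]) for p in points) / (n * dim)
    for _iteration in range(n_iter):
        resp, _unused = e_step(points, weights, means, lam)
        for k in range(n_clusters):
            n_k = max(sum(r[k] for r in resp), 1e-12)  # guard against empty components
            weights[k] = n_k / n
            means[k] = [sum(r[k] * p[j] for r, p in zip(resp, points)) / n_k
                        for j in range(dim)]
        lam = sum(r[k] * sq_dist(p, means[k])
                  for r, p in zip(resp, points) for k in range(n_clusters)) / (n * dim)
    _unused, log_lik = e_step(points, weights, means, lam)
    return log_lik


def bic(log_lik, n_clusters, n_points, dim):
    """BIC = 2 log L - m log n; m counts means, mixing proportions, and lam."""
    n_params = n_clusters * dim + (n_clusters - 1) + 1
    return 2.0 * log_lik - n_params * math.log(n_points)


# Synthetic data: two well-separated spherical groups of 40 points each.
rng = random.Random(1)
points = ([[rng.gauss(c, 3.0) for c in (0.0, 0.0, 0.0)] for _ in range(40)]
          + [[rng.gauss(c, 3.0) for c in (30.0, 0.0, 0.0)] for _ in range(40)])
scores = {g: bic(fit_equal_volume_spherical(points, g), g, len(points), 3)
          for g in (1, 2, 3)}
for g, score in scores.items():
    print(f"{g} cluster(s): BIC = {score:.1f}")
```

In the full procedure the same loop would also run over the other covariance parameterizations, and the combination with the largest BIC would be selected.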

The three models with λ_k = λ, i.e. models with components of equal volume, outperformed the

remaining models, which all allowed for components of variable volume. This seems

counterintuitive at first, as a more variable model would be expected to fit the data better than

a more restricted one. However, as described above, the BIC value penalizes model complexity,

which is larger for models with variable components than for models with equal components.

Thus, for our data, allowing the components’ volume to vary did not increase the log likelihood

of the models sufficiently in order to justify the increased number of model parameters. Note

also that for very large cluster numbers, some more variable models failed to provide a


clustering due to the singularity of the associated covariance matrices. This was not the case

for models with fewer free parameters, however.
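The trade-off between fit and complexity can be made concrete by counting free parameters. The sketch below assumes the usual volume, shape, and orientation decomposition of the component covariances and the form BIC = 2 log L − m log n. The model labels are illustrative and may not match the exact set of parameterizations used in the analysis.

```python
import math


def n_free_parameters(model, n_clusters, dim=3):
    """Free parameters of a Gaussian mixture for a few covariance parameterizations."""
    means_and_weights = n_clusters * dim + (n_clusters - 1)
    orientation = dim * (dim - 1) // 2   # rotation parameters of one matrix D
    shape = dim - 1                      # free entries of one normalized matrix A
    covariance = {
        "lambda*I": 1,                                    # spherical, equal volume
        "lambda_k*I": n_clusters,                         # spherical, variable volume
        "lambda*D*A*D^T": 1 + shape + orientation,        # ellipsoidal, fixed orientation
        "lambda*D_k*A*D_k^T": 1 + shape + n_clusters * orientation,
        "lambda_k*D_k*A_k*D_k^T": n_clusters * (1 + shape + orientation),
    }[model]
    return means_and_weights + covariance


def required_loglik_gain(simple, complex_, n_clusters, n_points, dim=3):
    """Increase in log likelihood the more complex model needs to reach the same BIC."""
    extra = (n_free_parameters(complex_, n_clusters, dim)
             - n_free_parameters(simple, n_clusters, dim))
    return 0.5 * extra * math.log(n_points)


# 24 clusters and 210 above-threshold coordinates, as in the analysis.
gain = required_loglik_gain("lambda*I", "lambda_k*I", n_clusters=24, n_points=210)
print(n_free_parameters("lambda*I", 24), n_free_parameters("lambda_k*I", 24))
print(f"required gain in log likelihood: {gain:.2f}")
```

With 24 clusters, letting the volume vary in the spherical model adds 23 parameters, so the variable-volume model must raise the log likelihood by about 61 before its BIC matches that of the equal-volume model.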

Figure 3 shows plots of the BIC values of the best three models for up to 30 clusters. BIC

values of these models are very similar, in particular for models with more than 20 clusters.

The right side shows an enlarged plot of the BIC values for models with 20 up to 25 clusters.

All three models yielded the highest BIC value when applied with 24 clusters. The more

complex models with ellipsoidal components slightly outperformed the spherical one, while

the difference between a variable and a fixed orientation of the components was negligible.

Figure 4 shows the results of the model-based clustering exemplified for the two largest ALE

regions, which were situated in the left lateral prefrontal cortex (left LPFC) and the medial

frontal cortex (MFC), respectively (cf. Table II). The categorization of activation coordinates

within the left LPFC is shown in five consecutive sagittal functional slices at Talairach

coordinates between x = −34 and x = −46. The coordinates in this ALE region were subdivided

into five groups in anterior-posterior and superior-inferior direction. In the most posterior and

superior part of the region a further division in lateral-medial direction can be observed (shown

in green and blue). Interestingly, cluster centers of the more anterior and inferior clusters

corresponded closely to the sub-maxima in the ALE focus visible in Figure 2. However, the

division of posterior and superior parts of the region into two clusters could not have been

predicted from the ALE sub-maxima. The same holds for the clustering of coordinates in the

MFC, where no sub-maxima could be observed in the ALE map. The categorization of

coordinates in the MFC is shown in the right panel of Figure 4 in four consecutive sagittal

slices. The best model provided four clusters, again dividing the region in anterior-posterior

and superior-inferior direction. Thus, model-based clustering revealed some additional

structure in the data that would have remained undetected when using ALE alone. To give an

impression of the actual shape of the clusters and their relative location, the extracted clusters are

presented again in views from different angles in Figure 5.

The robustness of our method against noisy input data was tested in a post-hoc analysis

including only the 550 activation coordinates that truly represented the Stroop interference

effect. The results did not significantly differ from the results of the original analysis. The noise

in the original input data thus did not have a noteworthy impact on the results of the model-

based clustering.

DISCUSSION

ALE facilitates the detection of cortical activation from activation maxima reported in

independently performed functional imaging studies. The resulting areas reflect the distribution

of activation maxima over the cortex. In particular, clusters of activation maxima in a region

reflect the likely involvement of this region in processing a cognitive task, whereas isolated

activation maxima are regarded as noise.

Our analysis shows that the extent of ALE regions can vary considerably due to the

heterogeneous distribution of the input data across different parts of the cortex. As seen in

Table II and Figure 2, the size of some ALE foci obtained in the first step of our analysis by

far exceeded the extent of comparable activations reported in single fMRI experiments. For

example, activation maxima reported by Zysset et al. (2001) for two separated activations in

the posterior (Tal: −38, 5, 30) and the anterior (Tal: −38, 35, 5) inferior frontal sulcus are both

located within the same ALE region in our analysis. This is caused by the high number of

activation coordinates within this region together with their high spatial variability. Moreover,

within the largest ALE focus located in the left LPFC, sub-maxima could be observed,

indicating a possible sub-clustering of the region.


One simple way to separate several areas within such a large ALE region would be the choice

of a higher ALE threshold. However, this is problematic if a whole brain analysis is performed,

since ALE values in other regions might be significantly lower despite a high concentration of

activation coordinates. For example, in Figure 2b a cluster of activation coordinates can clearly

be seen in the anterior part of the left intraparietal sulcus. However, the resulting ALE focus

representing no less than 25 activation coordinates has a maximum ALE value of only 0.027

in comparison to 0.05 in the left LPFC. Thus, by simply choosing a higher ALE threshold,

some clusters of activation coordinates might remain undetected.

We tried to alleviate this problem by following a hierarchical approach. In a first step, ALE is

used to identify regions with high concentration of activation coordinates. In a second step,

large ALE regions are further investigated in search of a possible sub-division.

Applying this two-step procedure to activation maxima from 26 Stroop experiments first

resulted in relatively large ALE regions, in particular in the frontal lobe (cf. Fig. 2). This is in

line with earlier findings on frontal lobe activity, in particular in a meta-analysis by Duncan

and Owen (2000) who reported cortical regions of large extent to be recruited by a variety of

cognitive tasks. However, in contrast to this study, our analysis pointed to a possible further

sub-clustering of these areas. The two largest ALE regions found in the left lateral prefrontal

cortex and the medial frontal wall were partitioned into five and four clusters, respectively.

While our exploratory analysis technique does not have the power to associate specific

cognitive functions to these clusters, this finding could serve as a hypothesis for a further

functional specialization of these regions.

The main directions of the clustering were parallel to the coordinate axes, primarily in

anterior-posterior and superior-inferior direction. This corresponds well with recent results

from single-subject and group analyses obtained from a variety of analysis techniques as well

as from other meta-analyses, see e.g. Neumann et al. (2006); Forstmann et al. (2005); Koechlin

et al. (2003); Müller et al. (2003) for LPFC, and Forstmann et al. (2005) and Amodio and Frith

(2006) for MFC clustering.

It is important to be clear about the implicit assumptions made in the application of our analysis

technique. Meta-analyses are aimed at consolidating results from several studies in order to

find general mechanisms related to a particular task, class of paradigms, etc. Thus, if we want

to generalize the findings of any meta-analysis, we must assume that the data extracted from

the included studies are a representative sample of all the data collected for the investigated

phenomenon. Note, however, that this must be assumed in any empirical analysis relying on

sampled data. A second, closely related, assumption specific to clustering activation

coordinates is that the inherent distribution of activation for the investigated phenomenon is

completely represented by the investigated data.

In a meta-analysis, these assumptions are sometimes hard to meet because of the selective

publication of activation coordinates from particular cortical regions, a problem often referred

to as “publication or literature bias.” In the majority of experimental studies, only a specific

aspect of a paradigm or a particular cortical region is investigated and, consequently, some

significantly activated regions found for a stimulus might be neglected in the publication of

the results. This can result in overemphasizing some regions while neglecting others, which in

turn can lead to a nonrepresentative distribution of our input data. A careful and informed

selection of studies included in such an analysis and the inclusion of as much data as possible

is thus indispensable.

For our example analysis we used a very large data set, in order to reduce the effects of the

publication bias. Note, however, that our method also works for smaller analyses. For very

small numbers of activation maxima, the maximum number of clusters might have to be


reduced, to avoid singularity problems in the estimation of the covariance matrix. Moreover,

for very small or very homogeneously distributed data sets, the problem of very large ALE

regions might not arise in the first place. In this case, the results of the model-based clustering

should not differ significantly from the application of ALE alone.

The clustering technique presented here is purely data-driven. That is, the results are

exclusively derived from the spatial distribution of the input data and restricted only by the

constraints on the geometry of the mixture model components. Here, additional constraints

such as anatomical or cytoarchitectonic boundaries between cortical regions are conceivable.

How such constraints can be incorporated into the mathematical framework of mixture

modeling is a question that will be addressed in future work.

As noted earlier, in ALE the extent and number of above threshold clusters critically depend

on the choice of a suitable standard deviation of the Gaussian. Nielsen and Hansen (2002) offer

an interesting approach to this problem by optimizing the standard deviation of a Gaussian

kernel when modeling the relation between anatomical labels and corresponding focus

locations. Similar to ALE, activation maxima are modeled by three-dimensional Gaussian

probability distributions and the standard deviation is optimized by leave-one-out cross

validation (Nielsen and Hansen, 2002). In our hierarchical approach, the choice of σ is less

critical and the use of a large standard deviation is feasible, as ALE is used only as a pre-

processing step for model-based clustering. We can thus make use of as much information

present in the data as possible. Note that the use of an even larger standard deviation did not

have any effect on the choice of activation coordinates entering the second step of our analysis,

although some ALE regions were merged and slightly extended. The results of the model-based

clustering for a larger standard deviation would therefore be identical to the results presented

here for σ = 5 mm.

A second parameter influencing the outcome of an ALE analysis is the size of the mask volume

used for deriving the null-hypothesis. Clearly, the size of the volume has some influence on

the ALE threshold corresponding to the desired α-level. Therefore, the mask volume chosen

should match the volume spanned by the empirical activation maxima included in the analysis.

In our example, the activation coordinates obtained from the database were distributed over

the entire brain volume, including subcortical regions and even some white matter. We

therefore chose as a mask the entire volume of a brain, normalized to the standard size provided

by the software package LIPSIA (Lohmann et al., 2001). The distribution of the random

activation foci was then restricted to the area spanned by the minimum and maximum Talairach

coordinates of the 728 empirical maxima. Note, however, that the particular choice of the mask

volume is less critical than might appear at first sight. This is because the number of empirical

maxima is small relative to the number of voxels in the mask (in our analysis 728 maxima and 61,408

voxels, respectively). For example, halving the mask volume in our example analysis

would change the ALE threshold only from 0.0156 to 0.018. The resulting thresholded ALE

map would still contain the vast majority of the activation maxima that exceed the threshold

when the full mask volume is used. This shows that slight variations in the mask volume do

not significantly change the outcome of the subsequent model-based clustering.

Note that in our example data, ALE values were not corrected for multiple comparisons (Laird

et al., 2005a). Rather, as suggested in the original work by Turkeltaub et al. (2002), values

were thresholded at a very small α-level of 0.01% (P = 0.0001) to protect from family-wise

Type I errors. Correction was omitted for the sake of simplicity, keeping in mind that (1) in

our approach ALE serves as a pre-processing step to model-based clustering and therefore

should not be performed too conservatively, and (2) the aim of model-based clustering is the

sub-clustering of large ALE foci which would in any case survive the correction procedure.

Moreover, Laird and colleagues, when introducing multiple comparison correction for ALE,


compared it to uncorrected thresholding with small thresholds and observed: “It is clear that

thresholding the ALE maps at P < 0.0001 (uncorrected) produced results that most closely

matched the FDR-corrected results” (Laird et al., 2005a, p. 161). This confirms our own

empirical observation that correcting ALE values, though statistically sound, in practical terms

often amounts to using a smaller threshold without correction, as was done in the example

provided here. However, we wish to point out that model-based clustering can in principle be

applied to any activation coordinates. Thus, there are no restrictions on using it in conjunction

with ALE values corrected for multiple comparisons.
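For comparison, FDR control can be sketched with the standard Benjamini-Hochberg step-up procedure. This is an illustration only: the procedure is assumed to be the form of FDR control meant here, the function name is hypothetical, and the p-values are made up.

```python
def fdr_cutoff(p_values, q=0.05):
    """Largest p-value declared significant by the Benjamini-Hochberg step-up
    procedure at false discovery rate q, or None if nothing is significant."""
    ordered = sorted(p_values)
    m = len(ordered)
    cutoff = None
    for rank, p in enumerate(ordered, start=1):
        # Keep the largest p-value that lies below its rank-dependent bound.
        if p <= q * rank / m:
            cutoff = p
    return cutoff


# Made-up voxel-wise p-values, for illustration only.
voxel_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
cutoff = fdr_cutoff(voxel_p, q=0.05)
significant = [p for p in voxel_p if cutoff is not None and p <= cutoff]
print(cutoff, significant)
```

The cutoff adapts to the observed distribution of p-values, whereas uncorrected thresholding applies one fixed cutoff such as P < 0.0001 regardless of the data.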

The second step of our analysis procedure pertains to fitting Gaussian mixtures to the activation

coordinates that survived the ALE threshold in the first analysis step. Although Gaussians are

the most commonly used components in mixture modeling, they have a well-known limitation:

Gaussian mixture models have a relatively high sensitivity to outliers which can lead to an

over-estimation of the number of components (Svensén and Bishop, 2004). However, we

would argue that this is not a critical issue in our particular application, since such outliers are

removed by ALE before the actual clustering.

As in many clustering problems, the true number of clusters for a given set of activation

maxima is not known in advance. This can be problematic as most clustering techniques require

the number of clusters to be pre-specified. In the model-based clustering approach suggested

here, this problem is solved by fitting a set of models with different numbers of clusters to the

data and applying a model selection criterion afterwards. The use of the BIC as model selection

criterion allows us to select the best number of clusters and the model parameterization

simultaneously. Like most model selection criteria, the BIC follows the principle of Occam’s

razor and favors, among two or more candidate models, the one that fits the data sufficiently

well in the least complex way. In our context, this idea can be expressed formally using the

estimated log likelihood of the models and a fixed penalizing term encoding the number of

parameters of each model. Here, alternative approaches such as the Akaike Information

Criterion (AIC) (Akaike, 1973) or the Deviance Information Criterion (DIC) (Spiegelhalter et

al., 2002) are conceivable. AIC, for example, is strongly related to BIC as it only differs in the

simpler penalty term 2m (cf. Eq. 7). This means, however, that for large sample sizes, AIC

tends to favor more complex models compared to BIC. Other conceivable strategies include

model selection procedures based on data-driven rather than fixed penalty terms (e.g. Shen and

Ye, 2002), or stochastic methods which allow an automatic determination of the number of

components in the process of modelling (e.g. Abd-Almageed et al., 2005; Richardson and

Green, 1997; Svensén and Bishop, 2004). The application of different model selection criteria

and their influence on the result of the clustering will be one direction of future research.
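The difference between the two penalty terms can be written out explicitly. The following assumes that Eq. (7) has the form BIC = 2 log L − m log n, with L the maximized likelihood, m the number of free parameters, n the number of observations, and larger values preferred.

```latex
\begin{align*}
\mathrm{BIC} &= 2\log L - m\log n, &
\mathrm{AIC} &= 2\log L - 2m,\\
\mathrm{BIC} - \mathrm{AIC} &= -m\,(\log n - 2) < 0
  \quad\text{for } n > e^{2} \approx 7.4 .
\end{align*}
```

For the n = 210 above-threshold coordinates analyzed here, log n ≈ 5.35, so under this form each additional parameter is penalized about 2.7 times more strongly by BIC than by AIC.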

Finally, note the relationship of different parameterizations of the Gaussians to other clustering

criteria. For example, for the spherical model Σ_k = λI, maximizing the complete-data log

likelihood in Eq. (5) corresponds to minimizing the standard k-means clustering criterion tr(W), where

W is the within-cluster scatter matrix as defined in Eq. (A1) and Eq. (A2) in Appendix A.

Maximizing the likelihood of the ellipsoidal model Σ_k = λDAD^T is related to the minimization

of det(W). Thus, by allowing the parameterization of the covariance matrices to vary, model-based

clustering encompasses and generalizes a number of classical clustering procedures.¹ The

general problems of choosing an appropriate clustering technique and the optimal number of

clusters are then formulated as a model selection problem (Fraley and Raftery, 2002).
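The link to k-means can be made explicit in a few lines. The sketch assumes that the complete-data log likelihood is taken over hard cluster assignments C_k without mixing proportions, and that W has the standard form of a within-cluster scatter matrix; n is the number of coordinates and d = 3 their dimension.

```latex
\begin{align*}
\ell(\mu, \lambda, C)
  &= \sum_{k} \sum_{i \in C_k} \log \mathcal{N}\!\left(x_i \mid \mu_k, \lambda I\right)
   = -\frac{nd}{2}\log(2\pi\lambda)
     - \frac{1}{2\lambda} \sum_{k} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^{2},\\
W &= \sum_{k} \sum_{i \in C_k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^{T},
\qquad
\operatorname{tr}(W) = \sum_{k} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^{2},\\
\max_{\mu,\lambda}\, \ell
  &= -\frac{nd}{2}\left[\log\!\left(\frac{2\pi \operatorname{tr}(W)}{nd}\right) + 1\right]
  \quad\text{at } \hat{\mu}_k = \bar{x}_k,\ \hat{\lambda} = \frac{\operatorname{tr}(W)}{nd}.
\end{align*}
```

The maximized log likelihood is a decreasing function of tr(W), so the assignment that maximizes it is the one that minimizes the k-means criterion.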

¹For a more detailed discussion of the relation between classical cluster criteria and constraints on the model covariance matrix see, e.g.,

Everitt et al. (2001); Celeux and Govaert (1995); Banfield and Raftery (1993).


CONCLUSION

We have presented a new method for the coordinate-based meta-analysis of functional imaging

data that facilitates the clustering of activation maxima obtained from independently performed

imaging studies. The method provides an extension to ALE and overcomes two of its

drawbacks: the strong dependency of the results on the chosen standard deviation of the

Gaussian and the relatively large extent of some ALE regions for very inhomogeneously

distributed input data. When applied in a meta-analysis of 26 comparable fMRI experiments,

the method resulted in functional regions that correspond well with the literature. Further

developments of our method could include the use of different model selection criteria and

further constraints on the model components incorporating additional anatomical or

cytoarchitectonic information.

Acknowledgments

We wish to thank Chris Fraley and Adrian Raftery for helpfully commenting on parts of the manuscript. We thank

the BrainMap development team for providing access to the database and for very helpful technical support.

Contract grant sponsor: NIH; Contract grant number: R01 MH74457; Contract grant sponsors: The National Institute

of Mental Health and the National Institute of Biomedical Imaging and Bioengineering.

REFERENCES

Abd-Almageed W, El-Osery A, Smith CE. Estimating time-varying densities using a stochastic learning

automaton. Soft Comput J 2005;10:1007–1020.

Akaike, H. Information theory and an extension of the maximum likelihood principle. Proceedings of the

Second International Symposium on Information Theory; Budapest. 1973. p. 267-281.

Amodio DM, Frith CD. Meeting of minds: The medial frontal cortex and social cognition. Nat Rev

Neurosci 2006;7:268–277.

Banfield J, Raftery A. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993;49:803–

821.

Banich MT, Milham MP, Atchley RA, Cohen NJ, Webb A, Wszalek T, Kramer AF, Liang Z-P, Wright

A, Shenker J, Magin R, Barad V, Gullett D, Shah C, Brown C. fMRI studies of Stroop tasks reveal

unique roles of anterior and posterior brain systems in attentional selection. J Cogn Neurosci

2000;12:988–1000. [PubMed: 11177419]

Brass M, Derrfuss J, Forstmann B, von Cramon DY. The role of the inferior frontal junction area in

cognitive control. Trends Cogn Sci 2005;9:314–316. [PubMed: 15927520]

Celeux G, Govaert G. Gaussian parsimonious clustering model. Pattern Recognit 1995;28:781–793.

Chein JM, Fissell K, Jacobs S, Fiez JA. Functional heterogeneity within Broca’s area during verbal

working memory. Physiol Behav 2002;77:635–639. [PubMed: 12527011]

Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm.

J R Stat Soc B 1977;39:1–38.

Derrfuss J, Brass M, Neumann J, von Cramon DY. Involvement of the inferior frontal junction in

cognitive control: Meta-analyses of switching and Stroop studies. Hum Brain Mapp 2005;25:22–34.

[PubMed: 15846824]

Derrfuss J, Brass M, von Cramon DY. Cognitive control in the posterior frontolateral cortex: Evidence

from common activations in task coordination, interference control, and working memory.

NeuroImage 2004;23:604–612. [PubMed: 15488410]

Duncan J, Owen AM. Common regions of the human frontal lobe recruited by diverse cognitive demands.

Trends Neurosci 2000;23:475–483. [PubMed: 11006464]

Everitt BS, Bullmore ET. Mixture model mapping of brain activation in functional magnetic resonance

images. Hum Brain Mapp 1999;7:1–14. [PubMed: 9882086]

Everitt, BS.; Landau, S.; Leese, M. Cluster Analysis. 4th ed. New York: Oxford University Press; 2001.


Forstmann BU, Brass M, Koch I, von Cramon DY. Internally generated and directly cued task sets: An

investigation with fMRI. Neuropsychologia 2005;43:943–952. [PubMed: 15716164]

Fox PT, Laird AR, Lancaster JL. Meta-Analysis in Functional Brain Mapping (Special Issue). Hum Brain

Mapp 2005;25

Fox PT, Lancaster JL. Mapping context and content: The BrainMap model. Nat Rev Neurosci

2002;3:319–321. [PubMed: 11967563]

Fraley C. Algorithms for model-based Gaussian hierarchical clustering. SIAM J Sci Comput 1998;20:270–281.

Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster

analysis. Comput J 1998;41:578–588.

Fraley C, Raftery AE. MCLUST: Software for model-based cluster analysis. J Classification

1999;16:297–306.

Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat

Assoc 2002;97:611–631.

Fraley C, Raftery AE. Enhanced software for model-based clustering, discriminant analysis, and density

estimation: MCLUST. J Classification 2003;20:263–286.

Hartley H. Maximum likelihood estimation from incomplete data. Biometrics 1958;14:174–194.

Hartvig NV, Jensen JL. Spatial mixture modeling of fMRI data. Hum Brain Mapp 2000;11:233–248.

[PubMed: 11144753]

Jeffreys, H. Some tests of significance, treated by the theory of probability; Proceedings of the Cambridge

Philosophical Society; 1935. p. 203-222.

Jeffreys, H. Theory of Probability. 3rd ed. Oxford: Oxford University Press; 1961.

Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc 1995;90:773–795.

Koechlin E, Ody C, Kouneiher F. The architecture of cognitive control in the human prefrontal cortex.

Science 2003;302:1181–1185. [PubMed: 14615530]

Laird AR, Fox PM, Price CJ, Glahn DC, Uecker AM, Lancaster JL, Turkeltaub PE, Kochunov P, Fox

PT. ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Hum

Brain Mapp 2005a;25:155–164. [PubMed: 15846811]

Laird AR, McMillan KM, Lancaster JL, Kochunov P, Turkeltaub PE, Pardo JV, Fox PT. A comparison

of label-based and ALE meta-analysis in the Stroop task. Hum Brain Mapp 2005b;25:6–21. [PubMed:

15846823]

Lancaster J, Laird A, Glahn D, Fox P, Fox P. Automated analysis of meta-analysis networks. Hum Brain

Mapp 2005;25:174–184. [PubMed: 15846809]

Liu X, Banich MT, Jacobson BL, Tanabe JL. Common and distinct neural substrates of attentional control

in an integrated Simon and spatial Stroop task as assessed by event-related fMRI. NeuroImage

2004;22:1097–1106. [PubMed: 15219581]

Lohmann G, Müller K, Bosch V, Mentzel H, Hessler S, Chen L, Zysset S, von Cramon DY. LIPSIA—

A new software system for the evaluation of functional magnetic resonance images of the human

brain. Comput Med Imaging Graph 2001;25:449–457. [PubMed: 11679206]

McKeown MJ, Jung T-P, Makeig S, Brown G, Kindermann SS, Lee T-W, Sejnowski TJ. Spatially

independent activity patterns in functional MRI data during the Stroop color-naming task. Proc Natl

Acad Sci USA 1998;95:803–810. [PubMed: 9448244]

Müller K, Lohmann G, Zysset S, von Cramon DY. Wavelet statistics of functional MRI data and the

general linear model. J Magn Reson Imaging 2003;17:20–30. [PubMed: 12500271]

Neal, RM.; Hinton, GE. A view of the EM algorithm that justifies incremental, sparse, and other variants.

In: Jordan, MI., editor. Learning in Graphical Models. Norwell, MA: Kluwer Academic; 1998. p.

355-368.

Neumann J, von Cramon DY, Forstmann BU, Zysset S, Lohmann G. The parcellation of cortical areas

using replicator dynamics in fMRI. NeuroImage 2006;32:208–219. [PubMed: 16647272]

Neumann J, Lohmann G, Derrfuss J, von Cramon DY. The meta-analysis of functional imaging data

using replicator dynamics. Hum Brain Mapp 2005;25:165–173. [PubMed: 15846812]

Nielsen, FA. Mass meta-analysis in Talairach space. In: Saul, LK.; Weiss, Y.; Bottou, L., editors.

Advances in Neural Information Processing Systems. Vol. 17. Cambridge, MA: MIT; 2005. p.

985-992.
