# Learning Task-Optimal Registration Cost Functions for Localizing Cytoarchitecture and Function in the Cerebral Cortex

**Abstract**

Image registration is typically formulated as an optimization problem with multiple tunable, manually set parameters. We present a principled framework for learning thousands of parameters of registration cost functions, such as a spatially-varying tradeoff between the image dissimilarity and regularization terms. Our approach belongs to the classic machine learning framework of model selection by optimization of cross-validation error. This second layer of optimization of cross-validation error over and above registration selects parameters in the registration cost function that result in good registration as measured by the performance of the specific application in a training data set. Much research effort has been devoted to developing generic registration algorithms, which are then specialized to particular imaging modalities, particular imaging targets and particular postregistration analyses. Our framework allows for a systematic adaptation of generic registration cost functions to specific applications by learning the "free" parameters in the cost functions. Here, we consider the application of localizing underlying cytoarchitecture and functional regions in the cerebral cortex by alignment of cortical folding. Most previous work assumes that perfectly registering the macro-anatomy also perfectly aligns the underlying cortical function even though macro-anatomy does not completely predict brain function. In contrast, we learn 1) optimal weights on different cortical folds or 2) optimal cortical folding template in the generic weighted sum of squared differences dissimilarity measure for the localization task. We demonstrate state-of-the-art localization results in both histological and functional magnetic resonance imaging data sets.

1424 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 29, NO. 7, JULY 2010


B. T. Thomas Yeo*, Mert R. Sabuncu, Tom Vercauteren, Daphne J. Holt, Katrin Amunts, Karl Zilles, Polina Golland, and Bruce Fischl


Manuscript received October 24, 2009; revised April 21, 2010; accepted April 22, 2010. Date of publication June 07, 2010; date of current version June 30, 2010. This work was supported in part by the NAMIC (NIH NIBIB NAMIC U54-EB005149), in part by the NAC (NIH NCRR NAC P41-RR13218), in part by the mBIRN (NIH NCRR mBIRN U24-RR021382), in part by the NIH NINDS R01-NS051826 Grant, in part by the NSF CAREER 0642971 Grant, in part by the National Institute on Aging (AG02238), in part by the NCRR (P41-RR14075, R01 RR16594-01A1), in part by the NIBIB (R01 EB001550, R01EB006758), in part by the NINDS (R01 NS052585-01), and in part by the MIND Institute. Additional support was provided by The Autism & Dyslexia Project funded by the Ellison Medical Foundation. The work of B. T. Thomas Yeo was supported by the A*STAR, Singapore. Asterisk indicates corresponding author.

*B. T. T. Yeo is with the Computer Science and Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: ythomas@csail.mit.edu).

P. Golland is with the Computer Science and Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: polina@csail.mit.edu).

M. R. Sabuncu is with the Computer Science and Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA and also with the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129 USA (e-mail: msabuncu@csail.mit.edu).

T. Vercauteren is with Mauna Kea Technologies, 75010 Paris, France (e-mail: tom.vercauteren@maunakeatech.com).

D. J. Holt is with the Massachusetts General Hospital Psychiatry Department, Harvard Medical School, Charlestown, MA 02139 USA (e-mail: dholt@partners.org).

K. Amunts is with the Department of Psychiatry and Psychotherapy, RWTH Aachen University and the Institute of Neuroscience and Medicine, Research Center Jülich, 52425 Jülich, Germany (e-mail: kamunts@ukaachen.de).

K. Zilles is with the Institute of Neuroscience and Medicine, Research Center Jülich and the C.&O. Vogt-Institute for Brain Research, University of Düsseldorf, 52425 Jülich, Germany (e-mail: k.zilles@fz-juelich.de).

B. Fischl is with the Computer Science and Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA, and also with the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129 USA, and also with the Department of Radiology, Harvard Medical School and the Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: fischl@nmr.mgh.harvard.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMI.2010.2049497


Index Terms—Cross validation error, functional magnetic resonance imaging (fMRI), histology, ill-posed, leave one out error, local maxima, local minima, model selection, objective function, parameter tuning, registration parameters, regularization, space of local optima, tradeoff.

I. INTRODUCTION

In medical image analysis, registration is necessary to establish spatial correspondence across two or more images. Traditionally, registration is considered a preprocessing step [Fig. 1(a)]. Images are registered and are then used for other image analysis applications, such as voxel-based morphometry and shape analysis. Here, we argue that the quality of image registration should be evaluated in the context of the application. In particular, we propose a framework for learning the parameters of registration cost functions that are optimal for a specific application. Our framework is therefore equivalent to classic machine learning approaches of model selection by optimization of cross-validation error [33], [43], [58].

A. Motivation

Image registration is typically formulated as an optimization problem with a cost function that comprises an image dissimilarity term and a regularization term [Fig. 1(a)]. The parameters of the cost function are frequently determined manually by inspecting the quality of the image alignment to account for the characteristics (e.g., resolution, modality, signal-to-noise ratio) of the image data. During this process, the final task is rarely considered in a principled fashion. Furthermore, the variability of the results due to these tunable parameters is rarely reported in the literature. Yet, recent work has shown that taking into account the tradeoff between the regularization and similarity measure in registration can significantly improve population analysis [40] and segmentation quality [10], [79].

0278-0062/$26.00 © 2010 IEEE

Authorized licensed use limited to: MIT Libraries. Downloaded on July 22,2010 at 11:10:18 UTC from IEEE Xplore. Restrictions apply.

YEO et al.: LEARNING TASK-OPTIMAL REGISTRATION COST FUNCTIONS FOR LOCALIZING CYTOARCHITECTURE 1425

Fig. 1. Traditional and proposed frameworks for image registration. $\{I_n\}$ indicates a collection of images. In image registration, we seek a deformation $\Gamma_n^*$ for each image $I_n$. The resulting deformations $\{\Gamma_n^*\}$ are then used for other applications, such as segmentation or group analysis. The registration cost function typically contains multiple parameters, such as the tradeoff parameter $\lambda$ and the template $T$. Changes in these parameters alter the deformations $\{\Gamma_n^*\}$ and thus the outcomes of downstream applications. In our framework (b), we assume a training data set, which allows us to evaluate the quality of the registration as measured by the application performance (or cross-validation error metric) $g_n$ for each training subject. This allows us to pick the best parameters that result in good registration as measured by $\{g_n\}$. Subsequent new subjects are registered using these learned parameters.

Fig. 2. Examples of ambiguities in image registration, which can potentially be resolved by taking the application at hand into account. (a) Postcentral sulci with different topology. (b) BAs overlaid on cortical surfaces.

In addition to improving the performance of applications downstream, taking into account the end-goal of registration could help resolve ambiguities and the ill-posed nature of image registration.

1) The variability of the folding pattern in the human cerebral cortex is well-documented (see e.g., [45]). Fig. 2(a) shows postcentral sulci of two different subjects. Note the differences in topology between the two sulci. When matching cortical folds, even neuroanatomical experts disagree on whether to join the ends of the broken sulcus or to break up the uninterrupted sulcus.

2) In population studies of human brain mapping, it is common to align subjects into a single coordinate system by aligning macroanatomy or cortical folding patterns. The pooling of functional data in this common coordinate system boosts the statistical power of group analysis and allows functional findings to be compared across different studies. However, substantial cytoarchitectonic [3], [4], [18] and functional [41], [62]–[64], [77], [78] variability is widely reported. One reason for this variability is certainly misregistration of the highly variable macroanatomy. However, even if we perfectly align the macroanatomy, the underlying function and cellular architecture of the cortex will not be aligned because the cortical folds do not completely predict the underlying brain function [54], [62]. To illustrate this, Fig. 2(b) shows nine Brodmann areas (BAs) projected onto the cortical surfaces of two different subjects, obtained from histology. BAs define cytoarchitectonic parcellation of the cortex closely related to brain function [9]. Here, we see that perfectly aligning the inferior frontal sulcus [Fig. 2(b)] will misalign the superior end of BA44 (Broca's language area). If our goal is to segment sulci and gyri, perfect alignment of the cortical folding pattern is ideal. However, it is unclear that perfectly aligning cortical folds is optimal for function localization.


In this paper, we propose a task-optimal registration framework that optimizes parameters of any smooth family of registration cost functions on a training set, with the aim of improving the performance of a particular task for a new image [Fig. 1(b)]. The key idea is to introduce a second layer of optimization over and above the usual registration. This second layer of optimization assumes the existence of a smooth cost function or cross-validation error metric [$g_n$ in Fig. 1(b)] that evaluates the performance of a particular task given the output of the registration step for a training data set. The training data provides additional information not present in a test image, allowing the task-specific cost function to be evaluated during training. For example, if the task is segmentation, we assume the existence of a training data set with ground truth segmentation and a smooth cost function (e.g., Dice overlap measure) that evaluates segmentation accuracy. If the registration cost function employs a single parameter, then the optimal parameter value can be found by exhaustive search [79]. With multiple parameters, exhaustive search is not feasible. Here, we establish conditions for which the space of local minima is locally smooth and demonstrate the optimization of thousands of parameters by gradient descent on the space of local minima, selecting registration parameters that result in good registration local minima as measured by the task-specific cost function in the training data set.

We validate our framework on two datasets. The first dataset consists of 10 ex vivo brains with the BAs of each subject obtained via histology [4], [84] and mapped onto the cortical surface representation of each subject obtained via MRI [18]. The second dataset consists of 42 in vivo brains with functional region MT+ (V5) defined using functional magnetic resonance imaging (fMRI). Here, our task is defined to be the localization of BAs and MT+ in the cortical surface representation via the registration of the cortical folding pattern. While it is known that certain cytoarchitectonically or functionally-defined areas, such as V1 or BA28, are spatially consistent with respect to local cortical geometry, other areas, such as BA44, present a challenge for existing localization methods [18], [20]. We learn the weights of the weighted sum of squared differences (wSSD) family of registration cost functions and/or estimate an optimal macroanatomical template for localizing the cytoarchitectural and functional regions using only the cortical folding pattern. We demonstrate improvement over existing methods [18].

B. Related Work

An alternative approach to overcome the imperfect correlation between anatomy and function is to directly use the functional data for establishing across-subject functional correspondence [54], [56]. However, these approaches require extra data acquisition (such as fMRI scans) of all future test subjects. In contrast, our method aims to learn the relationship between macro-anatomy and function (or cytoarchitectonics) in a training data set containing information about both macro-anatomy and function (or cytoarchitectonics). We use this information to localize function (or cytoarchitectonics) in future subjects, for which only macro-anatomical information is available.

Our approach belongs to the class of "wrapper methods" for model or feature selection in the machine learning literature [27], [34]. In particular, our model selection criterion of application-specific performance is equivalent to the use of cross-validation error in various model selection algorithms [33], [43], [58]. Unlike feature selection methods that operate in a discrete parameter space, we work in a continuous parameter space. Consequently, standard algorithms in the "wrapper methods" literature do not apply to this problem.

Instead, our resulting optimization procedure borrows heavily from the mathematical field of continuation methods [2]. Continuation methods have been recently introduced to the machine learning community for computing the entire path of solutions of learning problems (e.g., SVM or Lasso) as a function of a single regularization parameter [16], [28], [46]. For example, the cost function in Lasso [67] consists of the tradeoff between a least-squares term and a regularization term. Least-angles regression (LARS) allows one to compute the entire set of solutions of Lasso as a function of the tradeoff parameter [16]. Because we deal with multiple (thousands of) parameters, it is impossible for us to compute the entire solution manifold. Instead, we trace a path within the solution manifold that improves the task-specific cost function. Furthermore, registration is not convex (unlike SVM and Lasso), resulting in several theoretical and practical conundrums that we have to overcome, some of which we leave for future work.

The wSSD similarity measure implicitly assumes an independent Gaussian distribution on the image intensities, where the weights correspond to the precision (reciprocal of the variance) and the template corresponds to the mean of the Gaussian distribution. The weights can be set to a constant value [6], [31] or a spatially-varying variance can be estimated from the intensities of registered images [19]. However, depending on the wSSD regularization tradeoff, the choice of the scale of the variance is still arbitrary [79]. With weaker regularization, the training images will be better aligned, resulting in lower variance estimates.
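The correspondence between wSSD and an independent Gaussian model can be checked numerically. The following is a minimal sketch with made-up three-pixel "images"; the function names and values are illustrative assumptions, not part of the paper. It shows that the gap in Gaussian negative log-likelihood between two images equals half the gap in wSSD, because the normalization term does not depend on the image.

```python
import math

def wssd(template, image, weights):
    """Weighted sum of squared differences between a template and an image."""
    return sum(w * (t - x) ** 2 for t, x, w in zip(template, image, weights))

def gaussian_nll(mean, image, precision):
    """Negative log-likelihood of the image under independent Gaussians."""
    total = 0.0
    for m, x, p in zip(mean, image, precision):
        total += 0.5 * p * (x - m) ** 2 - 0.5 * math.log(p / (2.0 * math.pi))
    return total

# Hypothetical toy data: the template plays the role of the Gaussian mean,
# the weights play the role of the precision (1 / variance).
template = [0.0, 1.0, 2.0]
weights = [1.0, 4.0, 0.25]
image_a = [0.5, 1.5, 2.5]
image_b = [0.0, 1.0, 4.0]

nll_gap = gaussian_nll(template, image_a, weights) - gaussian_nll(template, image_b, weights)
wssd_gap = wssd(template, image_a, weights) - wssd(template, image_b, weights)
print(nll_gap, 0.5 * wssd_gap)  # the two numbers agree up to rounding
```

Rescaling all weights by a constant changes the balance against any regularization term, which is one way to see why the scale of the variance is tied to the tradeoff.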

Recent work in probabilistic template construction resolves this problem by either marginalizing the tradeoff under a Bayesian framework [1] or by estimating the tradeoff with the minimum description length principle [71]. While these methods are optimal for "explaining the images" under the assumed generative models, it is unclear whether the estimated parameters are optimal for application-specific tasks. After all, the parameters for optimal image segmentation might be different from those for optimal group analysis. In contrast, Van Leemput [74] proposes a generative model for image segmentation. The estimated parameters are therefore Bayesian-optimal for segmentation. When considering one global tradeoff parameter, a more direct approach is to employ cross-validation of segmentation accuracy and to perform an exhaustive search over the values of the tradeoff parameter [79]. This is infeasible for multiple parameters.

By learning the weights of the wSSD, we implicitly optimize the tradeoff between the dissimilarity measure and regularization. Furthermore, the tradeoff we learn is spatially varying. Previous work on learning a spatially varying regularization prior suffers from the lack of ground truth (nonlinear) deformations. For example, [10], [25], [35] assume that the deformations obtained from registering a set of training images can be used to estimate a registration regularization to register new images. However, a change in the parameters of the registration cost function used by these methods to register the training images would lead to a different set of training deformations and thus a different prior for registering new images. Furthermore, the methods are inconsistent in the sense that the learned prior applied on the training images will not result in the same training deformations obtained previously.

While there have been efforts in obtaining ground truth human-annotated deformation fields [37], the images considered typically have well-defined correspondences, rather than, for example, the brain images of two different subjects. As suggested in the previously presented examples (Fig. 2), the concept of "ground truth deformations" may not always be well-defined, since the optimal registration may be a function of the application at hand. In contrast, image segmentation is generally better defined in the sense that ground truth segmentation is usually known. Our problem therefore differs from recent work on learning segmentation cost functions [42], [70], [83]. In this paper, we avoid the need for ground truth deformations by focusing on the application of registration-based segmentation, where ground truth segmentations are better defined and available. However, our framework is general and can be applied whenever a postregistration application can be well quantified by a smooth application-specific performance cost function.

This paper is organized as follows. In the next section, we introduce the task-optimal registration framework. We specialize the framework to align hidden labels in Section III. We present localization experiments in Section IV and discuss outstanding issues in Section V. This paper extends a previously presented conference article [80] and contains detailed derivations, discussions and experiments that were omitted in the conference version. To summarize, our contributions are as follows.

1) We present a framework for learning the parameters of registration cost functions with respect to specific applications. We present an algorithm sufficiently efficient for optimizing thousands of parameters.

2) We specialize the framework for the alignment of hidden labels, which are not necessarily well-predicted by local image features.

3) We apply the framework to localizing cytoarchitectural and functional regions using only the cortical folding pattern and demonstrate improvements over existing localization methods [18].

II. TASK-OPTIMAL FRAMEWORK

In this section, we present the task-optimal registration framework for learning the parameters of a registration cost function. Given an image $I$, let $f(w, \Gamma)$ denote a smooth registration cost function, with parameters $w$ and a spatial transformation $\Gamma$. For example

$$f(w = \{\lambda, T\}, \Gamma) = \mathrm{Dissimilarity}(T, I(\Gamma)) + \lambda\, \mathrm{Reg}(\Gamma) \tag{2.1}$$

where $T$ is the template image, $\lambda$ is the tradeoff between the image dissimilarity measure and the regularization on the transformation $\Gamma$, and $I(\Gamma)$ denotes the deformed and resampled image $I$. $f$ is therefore also a function of the image $I$, which we suppress for conciseness. The optimal transformation $\Gamma^*$ minimizes the cost function for a given set of parameters $w$

$$\Gamma^*(w) = \arg\min_{\Gamma} f(w, \Gamma). \tag{2.2}$$

We emphasize that $\Gamma^*$ is a function of $w$ since a different set of parameters $w$ will result in a different solution to (2.2) and thus will effectively define a different image coordinate system.
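To make the dependence of the optimal transformation on the cost-function parameters concrete, here is a small sketch of a cost of the form (2.1), assuming a one-dimensional translation as the transformation, Gaussian bumps as template and image, and a squared-shift regularizer. All of these choices, and the grid search used as the optimizer, are illustrative assumptions rather than the registration used in the paper.

```python
import math

def bump(x, center):
    """A Gaussian bump standing in for a 1-D image intensity profile."""
    return math.exp(-0.5 * (x - center) ** 2)

def registration_cost(lam, shift, template_center=0.0, image_center=2.0):
    """Dissimilarity(T, I(Gamma)) + lam * Reg(Gamma) for a translation Gamma."""
    xs = [0.1 * k for k in range(-100, 101)]
    dissimilarity = sum(
        (bump(x, template_center) - bump(x + shift, image_center)) ** 2 for x in xs
    )
    return dissimilarity + lam * shift ** 2

def register(lam):
    """Approximate the minimizing shift by grid search over candidate shifts."""
    candidates = [0.01 * k for k in range(0, 301)]
    return min(candidates, key=lambda s: registration_cost(lam, s))

# A different tradeoff parameter yields a different optimal transformation.
print(register(0.0))  # no regularization: the bumps are aligned exactly
print(register(1.0))  # with regularization: a smaller shift is preferred
```

The point of the sketch is only that the minimizer is a function of the tradeoff parameter, so changing the parameter changes the coordinate system in which downstream analysis takes place.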

The resulting deformation is used to warp the input image or is itself used for further tasks, such as image segmentation or voxel-based morphometry. We assume that the task performance can be measured by a smooth cost function (or cross-validation error metric) $g(\Gamma^*(w))$, so that a smaller value of $g(\Gamma^*(w))$ corresponds to better task performance. $g$ is typically a function of additional input data associated with a subject (e.g., manual segmentation labels if the task is automatic segmentation), although we suppress this dependency in the notation for conciseness. This auxiliary data is only available in the training set; $g$ cannot be evaluated for the new image.

Given a set of $N$ training subjects, let $\Gamma_n^*(w)$ denote the solution of (2.2) for training subject $n$ for a fixed set of parameters $w$ and $g_n(\Gamma_n^*(w))$ denote the task performance for training subject $n$ using the deformation $\Gamma_n^*(w)$ and other information available for the $n$th training subject. A different set of parameters $w$ would lead to different task performance $g_n(\Gamma_n^*(w))$. We seek the parameters $w^*$ that generalize well to a new subject: registration of a new subject with $w^*$ yields the transformation $\Gamma^*(w^*)$ with a small task-specific cost $g(\Gamma^*(w^*))$. One approach to solve this functional approximation problem [17] is regularized risk minimization. Let $\mathrm{Reg}(w)$ denote regularization on $w$ and define

$$G(w) \triangleq \sum_{n=1}^{N} g_n(\Gamma_n^*(w)) + \mathrm{Reg}(w). \tag{2.3}$$

Regularized risk minimization seeks

$$w^* = \arg\min_{w} G(w). \tag{2.4}$$

The optimization is difficult because while we assume $g_n$ to be smooth, the input to $g_n(\cdot)$ is itself the local minimum of another nonlinear cost function $f$. Furthermore, evaluating the cost function $G$ for only one particular set of parameters $w$ requires performing $N$ different registrations!
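The two-layer structure can be illustrated with a toy problem in which the inner registration has a closed-form solution. The sketch below assumes a scalar transformation, a per-subject cost of the form w·(Γ − feature)² + Γ², and a task cost that measures squared distance to a hidden label; these are hypothetical stand-ins chosen so the example stays short. With a single parameter, the outer layer can be solved by exhaustive search, and each evaluation of the regularized risk runs one registration per training subject.

```python
def register(w, feature):
    """Closed-form minimizer of the toy cost w * (G - feature)**2 + G**2."""
    return w * feature / (w + 1.0)

def task_cost(gamma, label):
    """Toy task cost: squared distance between registered position and hidden label."""
    return (gamma - label) ** 2

def regularized_risk(w, features, labels, reg_strength=0.0):
    """Sum of task costs over training subjects plus a regularizer on w."""
    risk = sum(task_cost(register(w, a), b) for a, b in zip(features, labels))
    return risk + reg_strength * w ** 2

# Hypothetical training set: the hidden label sits halfway to the image feature.
features = [1.0, 2.0, 3.0]
labels = [0.5, 1.0, 1.5]

# Exhaustive search over a single parameter (the outer optimization layer).
grid = [0.1 * k for k in range(1, 51)]
best_w = min(grid, key=lambda w: regularized_risk(w, features, labels))
print(best_w, regularized_risk(best_w, features, labels))
```

A grid of 50 values is affordable for one parameter; the same grid over thousands of parameters is not, which motivates the gradient-based approach developed next.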

A. Characterizing the Space of Local Minima

In this section, we provide theoretical characterizations of the optimization problem in (2.4). If $\Gamma^*(w)$ is defined strictly to be a global registration optimum, then $\Gamma^*(w)$ is clearly not a smooth function of $w$, since a small change in $w$ can result in a big change in the global registration optimum. This definition is also impractical, since the global optimum of a nonlinear optimization problem cannot be generally found in practice. Instead, we relax the definition of $\Gamma^*(w)$ to be a local minimum of the registration cost function for fixed values of $w$. Here, we derive conditions in which $\Gamma^*(w)$ is locally a smooth function of $w$, so we can employ gradient descent to optimize (2.4).

Let $\Gamma^*(w_0)$ denote a local minimum of the registration cost function for a fixed $w = w_0$. Suppose we perturb $w$ by an infinitesimally small $\delta w$, so that $\Gamma^*(w_0)$ is no longer the registration local minimum for $w = w_0 + \delta w$. We consider two representations of this change in local minimum.

Additive deformation models arise when the space of deformations is a vector space, such as the space of displacement fields or positions of B-spline control points. At each iteration of the registration algorithm, deformation updates are added to the current deformation estimates. The additive model is general and applies to many non-convex, smooth optimization problems outside of registration. Most registration algorithms can in fact be modeled with the additive framework.

In some registration algorithms, including that used in this paper, it is more natural to represent deformation changes through composition rather than additions [7], [61], [75]. For example, in the diffeomorphic variants of the demons algorithm [75], [81], [82], the diffeomorphic transformation is represented as a dense displacement field. At each iteration, the transformation update is restricted to be a one-parameter subgroup of diffeomorphisms parameterized by a stationary velocity field. The diffeomorphic transformation update is then composed with, rather than added to, the current estimate of the transformation, thus ensuring that the resulting transformation is diffeomorphic.
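The difference between the two update rules can be seen in a deliberately tiny setting. The sketch below assumes one-dimensional linear maps x ↦ slope·x as transformations and a stationary velocity field v(x) = c·x, whose time-1 flow is x ↦ exp(c)·x. This is an illustrative toy, not the spherical demons implementation used in the paper.

```python
import math

def additive_update(slope, c):
    """Addition model: the update is added to the current transformation parameter."""
    return slope + c

def compositional_update(slope, c):
    """Composition model: compose with the time-1 flow of the velocity v(x) = c * x."""
    return slope * math.exp(c)

slope = 0.5  # current transformation x -> 0.5 * x, invertible since slope > 0
c = -0.8     # a large update

added = additive_update(slope, c)         # negative slope: orientation is reversed
composed = compositional_update(slope, c) # stays positive for any value of c

print(added, composed)
```

In this toy, a zero velocity corresponds to the identity update, and composing invertible maps can never leave the set of invertible maps, whereas adding updates can.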

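1) Addition Model: Let $\Gamma^*(w_0 + \delta w) = \Gamma^*(w_0) + \delta\Gamma^*(w_0, \delta w)$ denote the new locally optimal deformation for the updated set of parameters $w_0 + \delta w$. The following proposition characterizes the existence and uniqueness of $\delta\Gamma^*(w_0, \delta w)$ as $\delta w$ is varied. In particular, we show that under some mild conditions, $\delta\Gamma^*(w_0, \delta w)$ is a well-defined smooth function in the neighborhood of $(w_0, \Gamma^*(w_0))$. In the remainder, we use $\partial_\Gamma f$, $\partial_\Gamma^2 f$, and $\partial_{w,\Gamma}^2 f$ to denote the corresponding partial derivatives.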

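Proposition 1: If the Hessian¹ $\partial_\Gamma^2 f(w, \Gamma)$ is positive definite at $(w_0, \Gamma^*(w_0))$, then there exists an $\epsilon > 0$, such that for all $\delta w$ with $\|\delta w\| < \epsilon$, a unique continuous function $\delta\Gamma^*(w_0, \delta w)$ exists with $\delta\Gamma^*(w_0, 0) = 0$. Furthermore, $\delta\Gamma^*$ has the same order of smoothness as $f$.

¹Here, we assume that the transformation $\Gamma$ is finite dimensional, such as the parameters of affine transformations, positions of spline control points or dense displacement fields defined on the voxels or vertices of the image domain.

Proof: We define the vector-valued function $h(w, \Gamma) \triangleq \partial_\Gamma f(w, \Gamma)$. Since $\Gamma^*(w_0)$ is a local minimum of $f(w_0, \Gamma)$, we have

$$h(w_0, \Gamma^*(w_0)) = \partial_\Gamma f(w, \Gamma)\big|_{w_0, \Gamma^*(w_0)} = 0. \tag{2.5}$$

At $(w_0, \Gamma^*(w_0))$, the Hessian matrix $\partial_\Gamma h(w, \Gamma) = \partial_\Gamma^2 f(w, \Gamma)$ is positive definite by the assumption of the proposition and is therefore invertible. By the Implicit Function Theorem [51], there exists an $\epsilon > 0$, such that for all $\delta w$ with $\|\delta w\| < \epsilon$, there is a unique continuous function $\delta\Gamma^*(w_0, \delta w)$, such that $h(w_0 + \delta w, \Gamma^*(w_0) + \delta\Gamma^*(w_0, \delta w)) = 0$ and $\delta\Gamma^*(w_0, 0) = 0$. Furthermore, $\delta\Gamma^*(w_0, \delta w)$ has the same order of smoothness as $h$.

Because the Hessian of $f$ is smooth and the eigenvalues of a matrix depend continuously on the matrix [72], there exists a small neighborhood around $(w_0, \Gamma^*(w_0))$ in which the eigenvalues of $\partial_\Gamma^2 f(w, \Gamma)$ are all greater than 0. Since both sufficient conditions for a local minimum are satisfied (zero gradient and positive definite Hessian), $\Gamma^*(w_0) + \delta\Gamma^*(w_0, \delta w)$ is indeed a new local minimum close to $\Gamma^*(w_0)$.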

Observe that the conditions in Proposition 1 are stronger than those of typical nonlinear optimization problems. In particular, we do not just require the cost functions $f$ and $g$ to be smooth, but also that the Hessian $\partial_\Gamma^2 f(w, \Gamma)$ be positive definite at the local minimum. At $(w_0, \Gamma^*(w_0))$, by definition, the Hessian $\partial_\Gamma^2 f(w, \Gamma)$ is positive semi-definite, so the positive definite condition in Proposition 1 should not be too restrictive. Unfortunately, degeneracies may arise for local minima with a singular Hessian. For example, let $\Gamma$ be the $1 \times 2$ vector $[a \;\; b]$ and $f(w, \Gamma) = (ab - w)^2$. Then for any value of $w$, there is an infinite number of local minima corresponding to $ab = w$. Furthermore, the Hessian at any local minimum is singular. In this case, even though $f$ is infinitely differentiable, there is an infinite number of local minima near the current local minimum $\Gamma^*(w_0)$, i.e., $\delta\Gamma^*(w_0, \delta w)$ is not a well-defined function and the gradient is not defined. Consequently, the parameters $w$ of local registration minima whose Hessians are singular are also local minima of the task-optimal optimization (2.4). The proof of Proposition 1 follows the ideas of the Continuation Methods literature [2]. We include the proof here to motivate the more complex composition of deformations model.
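The degenerate case can be verified by hand or in a few lines of code. The sketch below assumes the example cost is f(w, [a b]) = (ab − w)², which is our reading of the example in the text; the Hessian entries are derived analytically from that assumed form. At every point with ab = w the cost is zero and the Hessian determinant vanishes.

```python
def cost(a, b, w):
    """Assumed example cost f(w, [a, b]) = (a * b - w) ** 2."""
    return (a * b - w) ** 2

def hessian(a, b, w):
    """Analytic 2x2 Hessian of the assumed cost with respect to (a, b)."""
    f_aa = 2.0 * b * b
    f_bb = 2.0 * a * a
    f_ab = 4.0 * a * b - 2.0 * w
    return [[f_aa, f_ab], [f_ab, f_bb]]

def determinant(h):
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]

w = 2.0
# Three of the infinitely many minimizers on the curve a * b = w.
minima = [(1.0, 2.0), (2.0, 1.0), (4.0, 0.5)]
for a, b in minima:
    print(cost(a, b, w), determinant(hessian(a, b, w)))  # both are zero
```

Because the minimizers form a continuous curve, a small change in w does not select a unique nearby minimizer, which is exactly the situation Proposition 1 excludes.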

2) Composition Model: Let $\Gamma^*(w_0)$ be the registration local minimum at $w_0$ and $\delta\Gamma(v)$ denote an update transformation parameterized by $v$, so that $v = 0$ corresponds to the identity transform. For example, $v$ could be a stationary [75], [81], [82] or nonstationary [8] velocity field parameterization of diffeomorphism, positions of spline control points [52] or simply displacement fields [59]. In the composition model, $\Gamma^*(w_0)$ is a local minimum if and only if there exists an $\epsilon > 0$, such that $f(w_0, \Gamma^*(w_0)) \le f(w_0, \Gamma^*(w_0) \circ \delta\Gamma(v))$ for all values of $\|v\| < \epsilon$.

Let $\Gamma^*(w_0) \circ \delta\Gamma(v^*(w_0, \delta w))$ denote the new locally optimal deformation for the new parameters $w_0 + \delta w$. In general, there might not exist a single update transformation $\delta\Gamma(v^*(w_0, \delta w))$ that leads to a new local minimum under a perturbation of the parameters $w$, so that there is no corresponding version of Proposition 1 for the general composition model. However, in the special case of the composition of diffeomorphisms model [75], [81], [82] employed in this paper, the following proposition characterizes the existence and uniqueness of $v^*(w_0, \delta w)$ as $\delta w$ is varied.

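Proposition 2: If the Hessian $\partial_v^2 f(w_0, \Gamma^*(w_0) \circ \delta\Gamma(v))$ is positive definite at $v = 0$, then there exists an $\epsilon > 0$, such that for all $\delta w$ with $\|\delta w\| < \epsilon$, a unique continuous function $v^*(w_0, \delta w)$ exists, such that $\Gamma^*(w_0) \circ \delta\Gamma(v^*(w_0, \delta w))$ is the new local minimum for parameters $w_0 + \delta w$ and $v^*(w_0, 0) = 0$. Furthermore, $v^*(w_0, \delta w)$ has the same order of smoothness as $f$.

Proof: The proof is a more complicated version of Proposition 1 and so we leave the details to Appendix A.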

Just like in the case of the additive deformation model, the parameters $w$ of local registration minima that do not satisfy the conditions of Proposition 2 are also local minima of the task-optimal optimization (2.4). In the next section, we derive exact and approximate gradients of $G$.

B. Optimizing Registration Parameters

We now discuss the optimization of the regularized task performance $G$.

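1) Addition Model: In the previous section, we showed that at $(w_0, \Gamma^*(w_0))$ with a positive definite Hessian, $\delta\Gamma^*(w_0, \delta w)$ is a smooth well-defined function such that $\Gamma^*(w_0) + \delta\Gamma^*(w_0, \delta w)$ is the new local minimum at $w_0 + \delta w$. Therefore, we can compute the derivatives of $\Gamma^*$ with respect to $w$ at $w_0$, allowing us to traverse a curve of local optima, finding values of $w$ that improve the task-specific cost function for the training images. We first perform a Taylor expansion of $\partial_\Gamma f(w, \Gamma)$ at $(w_0, \Gamma^*(w_0))$

$$\partial_\Gamma f(w_0 + \delta w, \Gamma^*(w_0) + \delta\Gamma) = \partial_\Gamma^2 f(w_0, \Gamma^*(w_0))\,\delta\Gamma + \partial_w \partial_\Gamma f(w_0, \Gamma^*(w_0))\,\delta w \tag{2.6}$$

where we dropped the term $O(\|\delta w\|^2, \|\delta\Gamma\|^2)$ and used the fact that the gradient $\partial_\Gamma f$ vanishes at the local minimum $(w_0, \Gamma^*(w_0))$. For $\delta\Gamma = \delta\Gamma^*(w_0, \delta w)$, the left-hand side is equal to 0 and we can write

$$\delta\Gamma^*(w_0, \delta w) = -\left[\partial_\Gamma^2 f(w_0, \Gamma^*(w_0))\right]^{-1} \partial_w \partial_\Gamma f(w_0, \Gamma^*(w_0))\,\delta w \tag{2.7}$$

Therefore, by taking the limit $\delta w \to 0$, we get

$$\partial_w \Gamma^*\big|_{w_0} = -\left[\partial_\Gamma^2 f(w_0, \Gamma^*(w_0))\right]^{-1} \partial_w \partial_\Gamma f(w_0, \Gamma^*(w_0)) \tag{2.8}$$

Equation (2.8) tells us the direction of change of the local minimum at $(w_0, \Gamma^*(w_0))$. In practice, the matrix inversion in (2.8) is computationally prohibitive for high-dimensional warps $\Gamma$. Here, we consider a simplification of (2.8) by setting the Hessian to be the identity

$$\partial_w \Gamma^*\big|_{w_0} \approx -\partial_w \partial_\Gamma f(w_0, \Gamma^*(w_0)) \tag{2.9}$$

Since $-\partial_\Gamma f$ is the direction of gradient descent of the cost function (2.2), we can interpret (2.9) as approximating the new local minimum to be in the same direction as the change in the direction of gradient descent as $w$ is perturbed.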
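The sensitivity of a local minimum to the cost-function parameters can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the quadratic cost `f(w, x) = 0.5 x^T A x - x^T B w`, the matrices `A` and `B`, and all function names are hypothetical stand-ins (with `x` playing the role of the warp), not the registration cost used in this paper. It compares the implicit-function derivative of (2.8) with finite differences, and shows the identity-Hessian simplification of (2.9) alongside it.

```python
import numpy as np

def local_min(A, B, w):
    """Minimizer of the toy cost f(w, x) = 0.5 x^T A x - x^T B w (A symmetric positive definite)."""
    return np.linalg.solve(A, B @ w)

def exact_sensitivity(A, B):
    """Derivative of the minimizer w.r.t. w: -(d2f/dx2)^{-1} (d2f/dw dx), cf. (2.8)."""
    hessian = A   # d2f/dx2 for the toy cost
    mixed = -B    # mixed derivative d2f/(dw dx) for the toy cost
    return -np.linalg.solve(hessian, mixed)

def approx_sensitivity(B):
    """Identity-Hessian simplification, cf. (2.9): minus the mixed derivative."""
    return B.copy()

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)      # symmetric positive definite Hessian
B = rng.standard_normal((4, 2))
w0 = np.array([0.5, -1.0])

# Central finite differences of the minimizer with respect to each parameter.
eps = 1e-6
fd = np.column_stack([
    (local_min(A, B, w0 + eps * e) - local_min(A, B, w0 - eps * e)) / (2.0 * eps)
    for e in np.eye(2)
])
exact = exact_sensitivity(A, B)
approx = approx_sensitivity(B)
```

In this toy setting the approximate sensitivity differs from the exact one by the inverse Hessian, so the two have a positive inner product whenever the Hessian is positive definite, but the approximation is generally not a rescaled copy of the exact derivative.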

Differentiating the cost function in (2.4), using the chain rule,

we obtain

(2.10)

(2.11)

(2.12)

We note the subscript on indicates the dependency of the

registration cost function on the th training image.

2) Composition Model: In the previous section, we have

shown that at , assuming the conditions of Propo-

sition 2 are true, is a smooth well-deﬁned function

such that is the new local minimum.

Therefore, we can compute the derivatives of with respect to

. As before, by performing a Taylor expansion, we obtain

(2.13)

(2.14)

Appendix B provides the detailed derivations. Differentiating

the cost function in (2.4), using the chain rule, we get

(2.15)

(2.16)

Once again, the subscript on indicates the dependency of

the registration cost function on the th training image.

Algorithm 1 summarizes the method for learning the task-op-

timal registration parameters. Each line search involves eval-

uating the cost function multiple times, which in turn re-

quires registering the training subjects, resulting in a compu-

tationally intensive process. However, since we are initializing

from a local optimum, for a small change in , each registration

converges quickly.

Algorithm 1. Task-Optimal Registration

Data: A set of training images

Result: Parameters that minimize the regularized task

performance [see (2.4)]

Initialize .

repeat

Step 1. Given current values of , estimate

, i.e., perform

registration of each training subject .

Step 2. Given current estimates ), compute the

gradient using either

1) Eq. (2.12) via in (2.9) for the addition

model or

2) Eq. (2.16) via in (2.14) for the

composition model.

Step 3. Perform line search in the direction opposite to

[47].
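The three steps of Algorithm 1 can be sketched on a one-dimensional toy problem. Everything below is a hypothetical illustration rather than the paper's implementation: the scalar "warp" `x`, the cost `f(w, x) = w^2 (x - t)^2 + x^2`, the task cost `(x - target)^2`, and the function names are assumptions chosen so that the answer is known in closed form (the task cost is zero at `w = sqrt(3)` for `t = 2`, `target = 1.5`). The gradient uses the identity-Hessian approximation, and the line search re-registers with a warm start from the current local minimum.

```python
def register(w, t, x_init=0.0, n_iter=100):
    """Step 1: gradient descent on the toy cost f(w, x) = w^2 (x - t)^2 + x^2."""
    x = x_init
    lr = 0.25 / (w * w + 1.0)  # step size chosen to be stable for this quadratic toy cost
    for _ in range(n_iter):
        grad_x = 2.0 * w * w * (x - t) + 2.0 * x
        x -= lr * grad_x
    return x

def task_cost(x, target):
    """Toy task-specific cost evaluated at the registration local minimum."""
    return (x - target) ** 2

def learn_weight(t=2.0, target=1.5, w=1.0, n_outer=50):
    x = register(w, t)
    best = task_cost(x, target)
    for _ in range(n_outer):
        # Step 2: approximate gradient, replacing the Hessian by the identity.
        dg_dx = 2.0 * (x - target)      # derivative of the task cost w.r.t. x
        mixed = 4.0 * w * (x - t)       # mixed derivative d2f/(dw dx)
        grad_w = dg_dx * (-mixed)
        # Step 3: backtracking line search; each trial re-registers from the current x.
        step, improved = 1.0, False
        while step > 1e-8:
            w_try = w - step * grad_w
            x_try = register(w_try, t, x_init=x)
            cost_try = task_cost(x_try, target)
            if cost_try < best:
                w, x, best, improved = w_try, x_try, cost_try, True
                break
            step *= 0.5
        if not improved:
            break
    return w, x, best

w_opt, x_opt, g_opt = learn_weight()
```

Because the toy registration cost is convex in `x`, initialization does not matter here; the sketch therefore omits the periodic re-registration from the identity transform that the text describes for the nonconvex case.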

Fig. 3. Illustration of the differences between our approach and the pairwise registration approach. In our approach, we use training images and labels to learn a cost function that is optimal for aligning the labels of the training and template subjects. This cost function is then used to register and predict the hidden label in a new subject. (a) Pairwise registration without training using ground truth labels. (b) Task-optimal registration framework.

Since nonlinear registration is dependent on initialization, the

current estimates , which were initialized from pre-

vious estimates, might not be achievable when initializing the

registration with the identity transform. The corresponding parameters might therefore not generalize well to a new subject, whose registration is typically initialized with the identity transform.

To put this more concretely, suppose our current estimates of

and the registration local minima are ( , ). Next,

we perform the gradient descent step and update w accordingly.

For argument’s sake, let our new estimates of and the registra-

tion local minima be ( , ). Note that this

particular value of is achieved by initializing the

registration with . Had we initialized the registration with

the identity transform (such as for a new subject), then

might instead be equal to 2.1, with possibly poorer application

performance than ( , ). To avoid this form of

overﬁtting, after every few iterations, we reregister the training

images by initializing with the identity transform, and verify that

the value of is better than the current best value of com-

puted with initialization from the identity transform.

The astute reader will observe that the preceding discussion

on “Addition Model” makes no assumptions speciﬁc to the task-

optimal registration problem. The framework can therefore also

be applied to learn the cost functions in other applications that

are formulated as nonlinear optimization problems solved by

gradient descent.

III. LEARNING WSSD FOR HIDDEN LABEL ALIGNMENT

We now instantiate the task-optimal registration framework

for localizing hidden labels in images. We demonstrate schemes

for either 1) learning the weights of the wSSD family of registra-

tion cost functions or 2) estimating an optimal template image

for localizing these hidden labels. We emphasize that the op-

timal template is not necessarily the average of the training im-

ages, since the goal is not to align image intensities across sub-

jects, but to localize the hidden labels.

Suppose we have a set of training images with some

underlying ground truth structure manually labeled or obtained

from another imaging modality (e.g., Brodmann areas from his-

tology mapped onto cortical surface representations). We de-

ﬁne our task as localizing the hidden structure in a test image.

In the traditional pairwise registration approach [Fig. 3(a)], a

single training subject is chosen as the template. After pairwise

registration between the template and test images, the ground

truth label of the template subject is used to predict that of

the test subject. The goal of predicting the hidden structure in

the test subject is typically not considered when choosing the

training subject or registration algorithm. For hidden labels that

are poorly predicted by local image intensity (e.g., BA44 discussed in Section I-A), blind alignment of image intensities leads to poor localization.

In contrast, we pick one training subject as the initial template

and use the remaining training images and labels [Fig. 3(b)] to

learn a registration cost function that is optimal for aligning the

labels of the training and template subjects: perfect alignment of the labels leads to perfect prediction of the labels in the training subjects by the template labels. After pairwise registration between the template and test subject using the optimal registration cost function, the ground truth label of the template subject

is used to predict that of the test subject.

We limit ourselves to spherical images (i.e., images deﬁned

on a unit sphere), although it should be clear that the discus-

sion readily extends to volumetric images. Our motivation for

using spherical images comes from the representation of the

human cerebral cortex as a closed 2-D mesh in 3-D. There has

been much effort focused on registering cortical surfaces in 3-D

[14], [15], [24], [30], [65]. Since cortical areas—both structure

and function—are arranged in a mosaic across the cortical sur-

face, an alternative approach is to warp the underlying spher-

ical coordinate system [19], [48], [60], [66], [69], [73], [79],

[81]. Warping the spherical coordinate system establishes cor-

respondences across the surfaces without actually deforming the

surfaces in 3-D. We assume that the meshes have already been

spherically parameterized and represented as spherical images:

a geometric attribute is associated with each mesh vertex, de-

scribing local cortical geometry.

A. Instantiating Registration Cost Function

To register a given image to the template image , we define the following cost function:

where transformation maps a point on the sphere to

another point . The ﬁrst term corresponds to the

wSSD image similarity. The second term is a percentage metric

distortion regularization on the transformation where is

a predeﬁned neighborhood around vertex and is the orig-

inal distance between the neighbors [79].

The weights ’s are generalizations of the tradeoff param-

eter , allowing for a spatially-varying tradeoff between the

image dissimilarity term and regularization: a higher weight

corresponds to placing more emphasis on matching the tem-

plate image at spatial location relative to the regularization.

The parameterization of the weights as ensures nonnegative

weights.
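The structure of this cost function can be sketched in a few lines. The code below is a minimal illustration under explicit assumptions, since the formula itself is not reproduced here: the weights are assumed to be parameterized as squares of free parameters `lam` (one way to guarantee nonnegativity), the regularization is assumed to average the squared relative change in neighbor distances over each vertex neighborhood, and the function and argument names are hypothetical.

```python
import numpy as np

def wssd_cost(lam, template, warped_image, warped_pts, neighbors, orig_dist):
    """Toy wSSD registration cost: weighted dissimilarity plus metric-distortion penalty.

    lam          : (N,) free parameters; the weights are lam**2 (assumed parameterization)
    template     : (N,) template intensity at each vertex
    warped_image : (N,) subject image sampled at the warped vertex positions
    warped_pts   : (N, 3) warped vertex positions
    neighbors    : list of index arrays, neighbors[n] = neighborhood of vertex n
    orig_dist    : list of arrays, original distances from vertex n to its neighbors
    """
    weights = lam ** 2  # squaring keeps every weight nonnegative
    dissim = np.sum(weights * (template - warped_image) ** 2)
    reg = 0.0
    for n, nbrs in enumerate(neighbors):
        d_new = np.linalg.norm(warped_pts[n] - warped_pts[nbrs], axis=1)
        # Relative (percentage) change in distance, averaged over the neighborhood.
        reg += np.mean(((d_new - orig_dist[n]) / orig_dist[n]) ** 2)
    return dissim + reg

# Tiny example: three vertices, identity warp (no distortion), intensity offset of 1.
pts = np.eye(3)
neighbors = [np.array([1, 2]), np.array([0, 2]), np.array([0, 1])]
orig_dist = [np.linalg.norm(pts[n] - pts[nb], axis=1) for n, nb in enumerate(neighbors)]
image = np.array([0.0, 1.0, 2.0])
template = image + 1.0
lam = np.array([1.0, 2.0, 0.0])
cost = wssd_cost(lam, template, image, pts, neighbors, orig_dist)
```

With the identity warp the regularization term is zero, so the cost reduces to the weighted sum of squared intensity differences; a vertex with zero weight contributes nothing to the dissimilarity term.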

In this work, we consider either learning the weights or

the template for localizing BA labels or functional labels

by aligning cortical folding pattern. Since the weights of the

wSSD correspond to the precision of the Gaussian model, by

learning the weights of wSSD, we are learning the precision of

the Gaussian model and hence the uncertainty of the sulcal ge-

ometry. Optimizing leads to placing nonuniform importance

on matching different cortical folds with the aim of aligning

the underlying cytoarchitectonics or function. For example, sup-

pose there is a sulcus with functional regions that appear on

either side of the sulcus depending on the subject. The algo-

rithm may decide to place low weight on the “poorly predictive”

sulcus. On the other hand, optimizing corresponds to learning

a cortical folding template that is optimal for localizing the un-

derlying cytoarchitectonics or functional labels of the training

subjects. In the case of the previously mentioned “unpredic-

tive sulcus,” the algorithm might learn that the optimal cortical

folding template should not contain this sulcus.

We choose to represent the transformation as a composi-

tion of diffeomorphic warps parameterized by a stationary

velocity ﬁeld, so that [75], [81], [82].

We note that our choice of regularization is different from the

implicit hierarchical regularization used in Spherical Demons

[81] since the Demons regularization is not compatible with our

derivations from the previous section. Instead of the efﬁcient

2-Step Spherical Demons algorithm, we will use steepest de-

scent. The resulting registration algorithm is still relatively fast,

requiring about 15 min for registering full-resolution meshes

with more than 100k vertices, compared with 5 min of computa-

tion for Spherical Demons on a Xeon 2.8-GHz single processor

machine.

In general, a smooth stationary velocity ﬁeld parameter-

izes a diffeomorphism via a stationary ODE:

with an initial condition . The solution

at is denoted as , where we

have dropped the time index. A solution can be computed efﬁ-

ciently using scaling and squaring [5]. This particular choice of

representing deformations provides a computationally efﬁcient

method of achieving invertible transformations, which is a de-

sirable property in many medical imaging applications. In our

case, the velocity ﬁeld is a tangent vector ﬁeld on the sphere

and not an arbitrary 3-D vector ﬁeld.
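Scaling and squaring can be illustrated in a simplified setting. The sketch below is an assumption-laden toy: it works on a flat 1-D grid with linear interpolation rather than on tangent vector fields on the sphere, and the function name and the choice of eight squaring steps are hypothetical. The velocity is first divided by a power of two so that the small deformation is approximately the identity plus the scaled velocity, and the result is then composed with itself repeatedly.

```python
import numpy as np

def exp_velocity_1d(grid, velocity, n_steps=8):
    """Approximate the time-1 flow of a stationary 1-D velocity field by scaling and squaring."""
    # Scaling: for a small velocity, the flow is approximately x + v(x) / 2^n_steps.
    disp = velocity / (2.0 ** n_steps)
    for _ in range(n_steps):
        # Squaring: phi(phi(x)) = x + disp(x) + disp(x + disp(x)).
        disp = disp + np.interp(grid + disp, grid, disp)
    return grid + disp

# Linear velocity v(x) = a * x has the exact flow x * exp(a).
grid = np.linspace(-1.0, 1.0, 201)
a = -0.5
phi = exp_velocity_1d(grid, a * grid)
```

A contracting field is used in the example so that warped points stay inside the grid; `np.interp` clamps to the boundary values outside the sampled range, which would otherwise distort the result near the edges.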

B. Optimizing Registration Cost Function

To register subject to the template image for a ﬁxed set

of parameters , let be the current estimate of . We seek

an update transformation parameterized by a stationary

velocity ﬁeld

(3.1)

Let be the velocity vector tangent to vertex , and

be the entire velocity ﬁeld. We adopt the techniques in the

Spherical Demons algorithm [81] to differentiate (3.1) with re-

spect to , evaluated at . Using the fact that the differential

of at is the identity [44], i.e., ,

we conclude that a change in velocity at vertex does not

affect for up to the ﬁrst order derivatives.

Defining to be the 1 × 3 spatial gradient of the warped image at and to be the 3 × 3 Jacobian matrix of at , we get the 1 × 3 derivative

(3.2)

We can perform gradient descent of the registration cost func-

tion using (3.2) to obtain , which can be used to eval-

uate the regularized task performance to be described in the

next section. We also note that (3.2) instantiates within

the mixed derivatives term in the task-optimal gradient (2.16)

for this application.

C. Instantiating Regularized Task Performance

We represent the hidden labels in the training subjects as

signed distance transforms on the sphere [36]. We con-

sider a pairwise approach, where we assume that the template

image has corresponding labels with distance transform

and set the task-speciﬁc cost function to be

(3.3)

A low value of indicates good alignment of the hidden label

maps between the template and subject , suggesting good pre-

diction of the hidden label.

We experimented with a prior that encourages spatially constant weights and template, but did not find that the regularization led to improvements in the localization results. In particular, we considered the following smoothness regularization on

the registration parameters depending on whether we are opti-

mizing for the weights or the template :

(3.4)

(3.5)

A possible reason for this lack of improvement is that the rereg-

istration after every few line searches already helps to regularize

against bad parameter values. Another possible reason is that the

above regularization assumes a smooth variation in the relation-

ship between structure and function, which may not be true in

reality. Unfortunately, the relationship between macro-anatom-

ical structure and function is poorly understood, making it difﬁ-

cult to design a more useful regularization that could potentially

improve the results. In the experiments that follow, we will dis-

card the regularization term of the registration parameters (i.e.,

set ). We also note that is typically set to

0 in machine learning approaches of model selection by opti-

mization of cross-validation error [33], [43], [58].

D. Optimizing Task Performance

To optimize the task performance over the set of parameters

, we have to instantiate the task-optimal gradient speciﬁed in

(2.16). We ﬁrst compute the derivative of the task-speciﬁc cost

function with respect to the optimal update . Once again, we

represent as the collection , where is a velocity vector

at vertex . Defining to be the 1 × 3 spatial gradient of the warped distance transform of the th subject at , we get the 1 × 3 derivative

(3.6)

Weight Update: To update the weights of the wSSD,

we compute the derivative of the registration local minimum

update with respect to the weights. Using the approximation

in (2.14), we obtain the 3 × 1 derivative of the velocity update

with respect to the weights of the wSSD cost function

(3.7)

(3.8)

(3.9)

(3.10)

Here if and is zero otherwise. Since (3.10)

is in the same direction as the ﬁrst term of the gradient descent

direction of registration [negative of (3.2)], increasing will

improve the intensity matching of vertex of the template.

Substituting (3.10) and (3.6) into (2.16) provides the gradient

for updating the weights of the wSSD cost function.

Template Update: To update the template image used for

registration, we compute the 3 × 1 derivative of the velocity

update with respect to the template

(3.11)

(3.12)

(3.13)

(3.14)

Since (3.14) is in the same direction as the ﬁrst term of the gra-

dient descent direction of registration [negative of (3.2)], when

is larger than , increasing the value of

will warp vertex of the template further along the direction of

increasing intensity in the subject image. Conversely, if

is smaller than , decreasing the value of will

warp vertex of the template further along the direction of de-

creasing intensity in the subject image. Substituting (3.14) and

(3.6) into (2.16) provides the gradient for updating the template

used for registration. Note that the template subject’s hidden la-

bels are considered ﬁxed in template space and are not modiﬁed

during training.

We can in principle optimize both the weights and the

template . However, in practice, we ﬁnd that this does not

lead to better localization, possibly because of too many de-

grees-of-freedom, suggesting the need to design better regular-

ization of the parameters. A second reason might come from

the fact that we are only using an approximate gradient rather

than the true gradient for gradient descent. Previous work [82]

has shown that while using an approximate gradient can lead to

reasonable solutions, using the exact gradient can lead to sub-

stantially better local minima. Computing the exact gradient is

a challenge in our framework. We leave exploration of efﬁcient

means of computing better approximations of the gradient to fu-

ture work.

IV. EXPERIMENTS

We now present experiments on localizing BAs and fMRI-de-

ﬁned MT+ (V5) using macro-anatomical cortical folding in two

different data sets. For both experiments, we compare the frame-

work with using uniform weights [31] and FreeSurfer [19].

A. BA Localization

We consider the problem of localizing BAs in the surface

representations of the cortex using only cortical folding pat-

terns. In this study, ten human brains were analyzed histolog-

ically postmortem using the techniques described in [57] and

[84]. The histological sections were aligned to postmortem MR

with nonlinear warps to build a 3-D histological volume. These

volumes were segmented to separate white matter from other

tissue classes, and the segmentation was used to generate topo-

logically correct and geometrically accurate surface represen-

tations of the cerebral cortex using a freely available suite of

tools [21]. Six manually labeled BA maps (V1, V2, BA2, BA44,

BA45, MT) were sampled onto the surface representations of

each hemisphere, and errors in this sampling were manually

corrected (e.g., when a label was erroneously assigned to both

banks of a sulcus). A morphological close was then performed

on each label to remove small holes. Finally, the left and right

hemispheres of each subject were mapped onto a spherical coor-

dinate system [19]. The BAs on the resulting cortical represen-

tations for two subjects are shown in Fig. 2(b). We do not con-

sider BA4a, BA4p, and BA6 in this paper because they were not

histologically mapped by the experts in two of the ten subjects

in this particular data set (even though they exist in all human

brains).

As illustrated in Fig. 2(c) and discussed in multiple studies

[3], [4], [18], we note that V1, V2, and BA2 are well-predicted

by local cortical geometry, while BA44, BA45, and MT are not.

For all the BAs however, a spherical morph of cortical folding

was shown to improve their localization compared with only

Talairach or nonlinear spatial normalization in the Euclidean

3-D space [18]. Even though each subject has multiple BAs, we

focus on each structure independently. This allows for an easier

interpretation of the estimated parameters, such as the optimal

template example we provide in Section IV-A3. A clear future

direction is to learn a registration cost function that is jointly

optimal for localizing multiple cytoarchitectural or functional

areas.

We compare the following algorithms.

1) Task-Optimal. We perform leave-two-out

cross-validation to predict BA location. For each test

subject, we use one of the remaining nine subjects as

the template subject and the remaining eight subjects

for training. When learning the weights of the wSSD,

the weights are globally initialized to 1 and

the template image is ﬁxed to the geometry of the

template subject. When learning the cortical folding

template , the template image is initialized to that of

the template subject and the weights are globally

set to 1.

Once the weights or template are learned, we use them

to register the test subject and predict the BA of the

test subject by transferring the BA label from the

template to the subject. We compute the symmetric

mean Hausdorff distance between the boundary of

the true BA and the predicted BA on the cortical

surface of the test subject—smaller Hausdorff distance

corresponds to better localization [13]. The symmetric

mean Hausdorff distance between two curves is deﬁned

as follows. For each boundary point of the ﬁrst curve,

the shortest distance to the second curve is computed

and averaged. We repeat by computing and averaging

the shortest distance from each point of the second

curve to the ﬁrst curve. The symmetric mean Hausdorff

distance is obtained by averaging the two values. We

consider all 90 possibilities of selecting the test subject

and template, resulting in a total of 90 trials and 90

mean Hausdorff distances for each BA and for each

hemisphere.

2) Uniform-Weights. We repeat the process for the

uniform-weight method that ﬁxes the template

to the geometry of the template subject, and sets all

the weights to a global ﬁxed value without

training. We explore 14 different values of global

weight , chosen such that the deformations range

from rigid to ﬂexible warps. For each BA and each

hemisphere, we pick the best value of leading

to the lowest mean Hausdorff distances. Because

there is no cross-validation in selecting the weights,

the uniform-weight method is using an unrealistic

oracle-based version of the strategy proposed in [79].

3) FreeSurfer. Finally, we use FreeSurfer [19] to register the 10 ex vivo subjects to the FreeSurfer Buckner40 atlas, constructed from the MRI of 40 in vivo subjects [21]. Once registered into this in vivo atlas space, for the same 90 pairs of subjects, we can use the BAs of one ex vivo subject to predict those of another ex vivo subject. We note that FreeSurfer also uses the wSSD cost function, but a more sophisticated regularization that penalizes both metric and areal distortion. For a particular tradeoff between the similarity measure and regularization, the Buckner40 template consists of the empirical mean and variance of the 40 in vivo subjects registered to template space. We use the reported FreeSurfer tradeoff parameters that were used to produce prior state-of-the-art BA alignment [18].

Fig. 4. FreeSurfer’s atlas-based registration approach. Training and test subjects are registered to an atlas. The BA of a training subject can then be used to predict that of the test subject.
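The symmetric mean Hausdorff distance defined in the Task-Optimal item above can be sketched as follows. This is a simplified illustration under stated assumptions: each boundary is represented by a finite set of sampled points, the shortest distance to a curve is approximated by the distance to its nearest sampled point, and Euclidean distance stands in for distance measured on the cortical surface. The function name is hypothetical.

```python
import numpy as np

def symmetric_mean_hausdorff(curve_a, curve_b):
    """Symmetric mean Hausdorff distance between two sampled curves.

    curve_a : (Na, D) array of boundary points of the first curve
    curve_b : (Nb, D) array of boundary points of the second curve
    """
    # Pairwise distances between every point of curve_a and every point of curve_b.
    diff = curve_a[:, None, :] - curve_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    a_to_b = dist.min(axis=1).mean()  # mean shortest distance from curve_a to curve_b
    b_to_a = dist.min(axis=0).mean()  # mean shortest distance from curve_b to curve_a
    return 0.5 * (a_to_b + b_to_a)

curve_a = np.array([[0.0, 0.0], [1.0, 0.0]])
curve_b = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
d_ab = symmetric_mean_hausdorff(curve_a, curve_b)
```

Averaging the two directed means makes the measure symmetric in its arguments, so swapping the true and predicted boundaries does not change the reported value.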

We note that both the task-optimal and uniform-weights

methods use a pairwise registration framework, while

FreeSurfer uses an atlas-based registration framework. Under

the atlas-based framework, all the ex vivo subjects are regis-

tered to an atlas (Fig. 4). To use the BA of a training subject to

predict a test subject, we have to compose the deformations of

the training subject to the atlas with the inverse deformation of

the test subject to the atlas. Despite this additional source of

error from composing two warps, it has been shown that with

carefully constructed atlases, using the atlas-based strategy

leads to better registration because of the removal of template

bias in the pairwise registration framework [6], [23], [26], [31],

[32], [39], [79].

We run the task-optimal and uniform-weights methods on a

low-resolution subdivided icosahedron mesh containing 2562

vertices, whereas FreeSurfer results were computed on high-res-

olution meshes of more than 100k vertices. In our implementa-

tion, training on eight subjects takes on average 4 h on a standard

PC (AMD Opteron, 2GHz, 4GB RAM). Despite the use of the

low-resolution mesh, we achieve state-of-the-art localization ac-

curacy. We also emphasize that while training is computation-

ally intensive, registration of a new subject only requires one

minute of processing time since we are working with low-reso-

lution meshes.

1) Quantitative Results: Fig. 5 displays the mean and stan-

dard errors from the 90 trials of leave-two-out. On average, task-

optimal template performs the best, followed by task-optimal

weights. Permutation tests show that task-optimal template out-

performs FreeSurfer in ﬁve of the six areas, while task-optimal

Fig. 5. Mean Hausdorff distances over an entire range of harmonic energy for BA44, BA45, and MT. First row corresponds to left hemisphere. Second row

corresponds to right hemisphere. indicates that task-optimal template is statistically significantly better than FreeSurfer. indicates that task-optimal weights is

statistically signiﬁcantly better than FreeSurfer. Statistical threshold is set at 0.05, FDR corrected with respect to the 24 statistical tests performed in this section.

FreeSurfer is not statistically better than either of the task-optimal methods in any of the Brodmann areas. (a) Left BA44 . (b) Left BA45 . (c) Left MT.

(d) Right BA44 . (e) Right BA45 . (f) Right MT .

weights outperforms FreeSurfer in four of the six areas after cor-

rections for multiple comparisons (see Fig. 5 for more details).

For the Broca’s areas (BA44 and BA45) and MT, this is not

surprising. Since local geometry poorly predicts these regions,

by taking into account the ﬁnal goal of aligning BAs instead of

blindly aligning the cortical folds, our method achieves better

BA localization. FreeSurfer and the uniform-weights method

have similar performance because a better alignment of the cortical folds on a finer resolution mesh does not necessarily improve

the alignment of these areas.

Since local cortical geometry is predictive of V1, V2, and

BA2, we expect the advantages of our framework to vanish.

Surprisingly, as shown in Fig. 6, task-optimal template again achieves significant improvement in BA alignment over the uniform-weights method and FreeSurfer. Task-optimal weights is also significantly better than the uniform-weights method, but only slightly better than FreeSurfer. Permutation tests show that task-optimal template outperforms FreeSurfer in five of the six areas, while task-optimal weights outperforms FreeSurfer in three of the six areas after corrections for multiple comparisons (see Fig. 6 for more details). This suggests that even when

local geometry is predictive of the hidden labels and anatomy-

based registration achieves reasonable localization of the labels,

tuning the registration cost function can further improve the task

performance. We also note that in this case, FreeSurfer performs

better than the uniform-weights method on average. Since local

cortical folds are predictive of these areas, aligning cortical folds

on a higher resolution mesh yields more precise alignment of the

cortical geometry and of the BAs.

We note that the FreeSurfer Buckner40 atlas utilizes 40

in vivo subjects consisting of 21 males and 19 females of a wide range of ages. Of these, 30 are healthy subjects whose ages range from 19 to 87. Ten of the subjects are Alzheimer’s patients with ages ranging from 71 to 86. The average age of the group

is 56 (see [12] for more details). The T1-weighted scans were

acquired on a 1.5T Vision system (Siemens, Erlangen, Germany), with the following protocol: two sagittal acquisitions,

FOV , matrix , resolution mm,

TR ms, TE ms, Flip angle ,TI ms

and TD ms. Two acquisitions were averaged together

to increase the contrast-to-noise ratio. The histological data set

includes ﬁve male and ﬁve female subjects, with age ranging

from 37 to 85 years old. The subjects had no previous his-

tory of neurologic or psychiatric diseases (see [4] for more

details). The T1-weighted scans of the subjects were obtained

on a 1.5T system (Siemens, Erlangen, Germany) with the

following protocol: ﬂip angle 40 ,TR ms, TE ms and

resolution mm. While there are demographic

and scanning differences between the in vivo and ex vivo data

sets, the performance differences between FreeSurfer and the

task-optimal framework cannot be solely attributed to this

difference. In particular, we have shown in previous work that

FreeSurfer’s results are worse when we use an ex vivo atlas

for registering ex vivo subjects (see [81, Table III]). Furthermore, FreeSurfer’s results are comparable with those of the

uniform-weights baseline algorithm, as well as previously pub-

lished results [18], where we have checked for gross anatomical

misregistration. We emphasize that since the goal is to optimize

Brodmann area localization, the learning algorithm might take

into account the idiosyncrasies of the registration algorithm

in addition to the relationship between macro-anatomy and

cytoarchitecture. Consequently, it is possible that the perfor-

mance differences are partly a result of our algorithm learning

a registration cost function with better local minima, thus

avoiding possible misregistration of anatomy.

2) Qualitative Results: Fig. 7 illustrates representative lo-

calization of the BAs for FreeSurfer and task-optimal template.

We note that the task-optimal boundaries (red) tend to be in

better visual agreement with the ground truth (yellow) bound-

aries, such as the right hemisphere BA44 and BA45.

Fig. 6. Mean Hausdorff distances over an entire range of harmonic energy for V1, V2, and BA2. First row corresponds to left hemisphere. Second row corresponds

to right hemisphere. indicates that task-optimal template is statistically signiﬁcantly better than FreeSurfer. indicates that task-optimal weights is statistically

signiﬁcantly better than FreeSurfer. Statistical threshold is set at 0.05, FDR corrected with respect to the 24 statistical tests performed in this section. FreeSurfer

is not statistically better than either of the task-optimal methods in any of the Brodmann areas. (a) Left V1 . (b) Left V2 . (c) Left BA2 . (d) Right V1 .

(e) Right V2 . (f) Right BA2.

Fig. 7. Representative BA localization in 90 trials of leave-two-out for FreeSurfer and task-optimal template. Yellow indicates ground truth boundary. Green

indicates FreeSurfer prediction. Red indicates Task-Optimal prediction. The representative samples were selected by ﬁnding subjects whose localization errors are

close to the mean localization errors for each BA. Furthermore, for a given BA, the same subject was selected for both methods to simplify the comparison.

3) Interpreting the Template: Fig. 8 illustrates an example of

learning a task-optimal template for localizing BA2. Fig. 8(a)

shows the cortical geometry of a test subject together with its

BA2. In this subject, the central sulcus is more prominent than

the postcentral sulcus. Fig. 8(b) shows the initial cortical ge-

ometry of a template subject with its corresponding BA2 in

black outline. In this particular subject, the postcentral sulcus

is more prominent than the central sulcus. Consequently, in the

uniform-weights method, the central sulcus of the test subject is

incorrectly mapped to the postcentral sulcus of the template, so

Fig. 8. Template estimation in the task-optimal framework improves localization of BA2. (a) Cortical geometry of test subject with corresponding BA2 (in green).

(b) Initial cortical geometry of template subject with corresponding BA2 (in black). In (b), we also show the BA2 of the test subject (in green) after registration to

the initial template. (c) Final cortical geometry of template subject after task-optimal training. BA2 of the test subject (in green) after registration to the task-optimal

template demonstrates signiﬁcantly better alignment with the BA2 of the template subject.

that BA2 is misregistered. Fig. 8(b) also shows the BA2 of the

test subject (green) overlaid on the cortical geometry of the tem-

plate subject after registration to the initial template geometry.

During task-optimal training, our method interrupts the geom-

etry of the postcentral sulcus in the template because the unin-

terrupted postcentral sulcus in the template is inconsistent with

localizing BA2 in the training subjects. The ﬁnal template is

shown in Fig. 8(c). We see that the BA2 of the subject (green)

and the task-optimal template (black) are well-aligned, although

there still exists localization error in the superior end of BA2.

In the next section, we turn our attention to an fMRI data set. Since the task-optimal template performed better than the task-optimal weights, we will focus on the comparison between the task-optimal template and FreeSurfer.

B. fMRI-MT+ Localization

We now consider the application of localizing fMRI-defined functional areas in the cortex using only cortical folding patterns. Here, we focus on the so-called MT+ area localized in 42 in vivo subjects using fMRI. The functionally defined MT+ area is thought to include primarily the cytoarchitectonically defined MT and a small part of the medial superior temporal (MST) area (hence the name MT+). The imaging paradigm involved subjects viewing alternating 16 s blocks of moving and stationary concentric circles. The structural scans were processed using the FreeSurfer pipeline [21], resulting in spherically parameterized cortical surfaces [11], [19]. The functional data were analyzed using the general linear model [22]. The resulting activation maps were thresholded by drawing the activation boundary centered around the vertex with maximal activation. The threshold was varied across subjects in order to maintain a relatively fixed ROI area of about 120 mm² (±5%) as suggested in [68]. The subjects consist of 10 females and 32 males, with ages ranging from 21 to 58 years. Of the 42 subjects, 23 are clinically diagnosed with schizophrenia, while the other 19 are healthy controls. Imaging took place on a 3T MR scanner (Siemens Trio) with echoplanar (EP) imaging capability. Subjects underwent two conventional high-resolution 3-D structural scans, constituting a spoiled GRASS (SPGR) sequence (128 sagittal slices, 1.33 mm thickness, TR ms, TE ms, Flip angle , voxel size mm). Each functional run lasted 224 s during which T2*-weighted echoplanar (EP) images were acquired (33 3-mm-thick slices, mm voxel size) using a gradient echo (GR) sequence (TR ms; TE ms; Flip angle ). To maximize training data, no distinction is made between the healthy controls and schizophrenia patients.

1) Ex Vivo MT Prediction of In Vivo MT+: In this experiment, we use each of the 10 ex vivo subjects as a template and the remaining nine subjects for training a task-optimal template for localizing MT. We then register each task-optimal template to each of the 42 in vivo subjects and use the template subject's MT to predict the test subject's MT+. This yields 420 Hausdorff distances for each hemisphere. For FreeSurfer, we align the 42 in vivo subjects to the Buckner40 atlas. Once registered in this space, we can use MT of the ex vivo subjects to predict MT+ of the in vivo subjects.
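As a rough illustration of the error measure reported here, the following sketch computes a symmetric mean Hausdorff distance between two sets of boundary points. It is a minimal example under stated assumptions rather than the evaluation code used in this paper: the function names are hypothetical, Euclidean distances in the plane stand in for distances on the cortical surface, and averaging the two directed distances is only one common convention (the modified Hausdorff distance of [13] takes their maximum instead). Ten templates applied to 42 test subjects give the 10 × 42 = 420 distances per hemisphere.

```python
import numpy as np

def directed_mean_distance(a, b):
    """Mean over points in `a` of the distance to the closest point in `b`."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def mean_hausdorff(a, b):
    """Symmetric variant (assumed): average of the two directed mean distances."""
    return 0.5 * (directed_mean_distance(a, b) + directed_mean_distance(b, a))

# Toy boundaries: corners of a unit square and the same corners shifted by 1 in x.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pred = truth + np.array([1.0, 0.0])

print(mean_hausdorff(truth, pred))  # localization error of the toy prediction
print(10 * 42)                      # templates x test subjects, per hemisphere
```

On a real surface mesh, the pairwise distance matrix would be replaced by geodesic distances between boundary vertices.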

Fig. 9 reports the mean and standard errors of the Hausdorff distances for both methods on both hemispheres. Once again, we find that the task-optimal template significantly outperforms the FreeSurfer template ( for both hemispheres). We note that the errors in the in vivo subjects (Fig. 9) are significantly worse than those in the ex vivo subjects (Fig. 5). This is not surprising since functionally defined MT+ is slightly different from cytoarchitectonically defined MT. Furthermore, the ex vivo surfaces tend to be noisier and less smooth than those acquired from in vivo subjects [81]. Since our framework attempts to leverage domain-specific knowledge about MT from the ex vivo data, one would expect these mismatches between the data sets to be highly detrimental to our framework. Instead, FreeSurfer appears to suffer more than our framework.
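The kind of permutation test reported with the figures can be sketched as follows. This is a minimal paired sign-flip test, assuming that both methods produce one error per test case so that the errors can be paired. The exact permutation scheme used in the paper is not described in this section, and the function name and toy data are hypothetical.

```python
import numpy as np

def paired_permutation_pvalue(errors_a, errors_b, n_perm=10000, seed=0):
    """Two-sided p-value for the mean paired difference via random sign flips."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    observed = abs(diff.mean())
    # Under the null hypothesis the method labels are exchangeable within
    # each pair, so every paired difference may flip its sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = np.abs((signs * diff).mean(axis=1))
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy data: method A is consistently 2 mm worse than method B.
errors_b = np.linspace(5.0, 9.0, 20)
errors_a = errors_b + 2.0
print(paired_permutation_pvalue(errors_a, errors_b))
```

A consistent difference across all pairs gives a very small p-value, whereas differences that cancel out on average give a p-value near one.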

2) In Vivo MT Prediction of In Vivo MT+: To understand the effects of the training set size on localization accuracy, we perform cross-validation within the fMRI data set. For each randomly selected template subject, we consider 9, 19, or 29 training subjects. The resulting task-optimal template is used to register and localize MT+ in the remaining 32, 22, or 12 test subjects, respectively. The cross-validation trials were repeated 100, 200, and 300 times, respectively, resulting in a total of 3200, 4400, and 3600 Hausdorff distances. This constitutes thousands of hours of computation time. For FreeSurfer, we perform a pairwise prediction of MT+ among the in vivo subjects after registration to the Buckner40 atlas, resulting in 1722 Hausdorff distances per hemisphere.
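The bookkeeping of this cross-validation experiment can be checked with a short sketch. The splitting function below is hypothetical (the text states only that template subjects are selected randomly), but the counts follow directly from the numbers given in the text: one template plus the training subjects are removed from the 42 subjects, and the rest are test subjects.

```python
import numpy as np

N_SUBJECTS = 42

def random_split(n_train, rng):
    """Pick one template, n_train training subjects, and test on the rest."""
    order = rng.permutation(N_SUBJECTS)
    template = order[0]
    train = order[1:1 + n_train]
    test = order[1 + n_train:]
    return template, train, test

rng = np.random.default_rng(0)
for n_train, n_trials in [(9, 100), (19, 200), (29, 300)]:
    template, train, test = random_split(n_train, rng)
    # training size, test size, total number of Hausdorff distances
    print(n_train, len(test), n_trials * len(test))

# FreeSurfer baseline: every ordered pair of distinct in vivo subjects.
print(N_SUBJECTS * (N_SUBJECTS - 1))
```

The loop reproduces the 3200, 4400, and 3600 totals, and the final line gives the 1722 pairwise predictions per hemisphere.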

Fig. 10 reports the mean and standard errors of the Hausdorff distances for FreeSurfer and task-optimal template on both hemispheres. We see that the FreeSurfer alignment errors are now commensurate with the ex vivo results (Fig. 5). However, the task-optimal template still outperforms FreeSurfer ( for all cases). We also note that the accuracy of MT+ localization improves with the size of the training set. The resulting localization error with a training set of 29 subjects is less than 7 mm for both hemispheres. For all training set sizes, the localization errors are also better than the ex vivo MT experiment (Fig. 5).

Fig. 9. Mean Hausdorff distances using ex vivo MT to predict MT+ in in vivo scans. Permutation testing shows that the differences between FreeSurfer and task-optimal template are statistically significant.

V. DISCUSSION AND FUTURE WORK

The experiments in the previous section demonstrate the feasibility of learning registration cost functions with thousands of degrees of freedom from training data. We find that the learned registration cost functions generalize well to unseen test subjects of the same imaging modality (Sections IV-A and IV-B2), as well as of a different imaging modality (Section IV-B1). The almost linear improvement with increasing training subjects in the fMRI-defined MT+ experiment (Fig. 10) suggests that further improvements can be achieved (in particular in the histological data set) with a larger training set. Unfortunately, histological data over a whole human hemisphere is difficult to obtain, while fMRI localization experiments tend to focus on single functional areas. Therefore, a future direction of research is to combine histological and functional information obtained from different subjects and imaging modalities during training.

Since our measure of localization accuracy uses the mean Hausdorff distance, ideally we should incorporate it into our task-specific objective function instead of the SSD on the distance transform representing the BA. Unfortunately, the resulting derivative is difficult to compute. Furthermore, the gradient will be zero everywhere except at the BA boundaries, causing the optimization to proceed slowly. On the other hand, it is unclear how aligning the distance transform values far from the boundary helps to align the boundary. Since distance transform values far away from the boundary are larger, they can dominate the task-specific objective function. Consequently, we utilize the distance transform over the entire surface to compute the gradient, but only consider the distance transform within the boundary of the template BA when evaluating the task performance criterion.
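The distinction between the two uses of the distance transform can be illustrated on a toy example. The sketch below is a simplification under several assumptions: it uses a planar pixel grid rather than a spherical mesh, a brute-force signed distance with an assumed sign convention (negative inside the area), a subject mask presumed to be already warped into the template space, and hypothetical function names. It compares the SSD evaluated over the whole grid with the SSD restricted to the interior of the template area.

```python
import numpy as np

def signed_distance(mask):
    """Brute-force signed distance: negative inside the region, positive outside."""
    inside = np.argwhere(mask)
    outside = np.argwhere(~mask)
    # Pairwise distances between every inside pixel and every outside pixel.
    d = np.linalg.norm(inside[:, None, :] - outside[None, :, :], axis=-1)
    dist = np.empty(mask.shape, dtype=float)
    dist[mask] = -d.min(axis=1)   # inside: distance to nearest outside pixel
    dist[~mask] = d.min(axis=0)   # outside: distance to nearest inside pixel
    return dist

def task_cost(template_mask, subject_mask, inside_only):
    """SSD between distance transforms, optionally restricted to the template area."""
    sq = (signed_distance(template_mask) - signed_distance(subject_mask)) ** 2
    return sq[template_mask].sum() if inside_only else sq.sum()

# Toy areas: a 4x4 square and the same square shifted by one pixel.
template = np.zeros((12, 12), dtype=bool)
template[4:8, 4:8] = True
subject = np.zeros((12, 12), dtype=bool)
subject[4:8, 5:9] = True

print(task_cost(template, subject, inside_only=False))  # whole grid
print(task_cost(template, subject, inside_only=True))   # template interior only
```

The whole-grid value includes contributions from pixels far from the boundary, which is the dominance effect the restricted criterion is meant to avoid.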

The idea of using multiple atlases for segmentation has gained recent popularity [29], [49], [50], [53], [55], [76]. While we have focused on building a single optimal template, our method can complement the multiatlas approach. For example, one could simply fuse the results of multiple individually optimal templates for image segmentation. A more ambitious task would be to optimize for multiple jointly optimal templates for segmentation.
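As one possible reading of the simple fusion strategy, the sketch below combines binary label predictions from several templates by per-vertex majority vote. Majority voting is an illustrative choice here, not a method evaluated in this paper, and the function name and toy labels are hypothetical.

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse binary label maps (one row per template) by per-vertex majority vote."""
    votes = np.asarray(label_maps, dtype=int)
    # A vertex is labeled 1 when strictly more than half of the templates agree.
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)

# Predictions of three templates on five vertices (1 = inside the area).
predictions = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
fused = majority_vote(predictions)
print(fused)
```

More elaborate fusion rules, such as the performance-weighted schemes of [50] and [76], would replace the equal-weight vote.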

In this work, we select one of the training subjects as the template subject and use the remaining subjects for training. The task-specific cost function evaluates the localization of the hidden labels via the template subject. During training (either for learning the weights or template in the registration cost function), the Brodmann areas of the template subject are held constant. Because the fixed Brodmann areas are specific to the template subject, the geometry of the template subject should in fact be the best and most natural initialization. It does not make sense to use the geometry of another subject (or average geometry of the training subjects) as initialization for the template subject's Brodmann areas, especially since the geometry of this other subject (or average geometry) is not registered to the geometry of the template subject. However, the use of a single subject's Brodmann (or functional) area can bias the learning process. An alternative groupwise approach modifies the task-specific cost function to minimize the variance of the distance transforms across training subjects after registration. In this case, both the template geometry and Brodmann (functional) area are estimated from all the training subjects and dynamically updated at each iteration of the algorithm. The average geometry of the training subjects provided a reasonable template initialization. However, our initial experiments in the ex vivo data set do not suggest an improvement in task performance over the pairwise formulation in this paper.

While this paper focuses mostly on localization of hidden labels, different instantiations of the task-specific cost function can lead to other applications. For example, in group analysis, the task-specific cost function could maximize differences between diseased and control groups, while minimizing intra-group differences, similar to a recent idea proposed for discriminative Procrustes alignment [38].

VI. CONCLUSION

In this paper, we present a framework for optimizing the parameters of any smooth family of registration cost functions, such as the image dissimilarity-regularization tradeoff, with respect to a specific task. The only requirement is that the task performance can be evaluated by a smooth cost function on an available training data set. We demonstrate state-of-the-art localization of Brodmann areas and fMRI-defined functional regions by optimizing the weights of the wSSD image-similarity measure and estimating an optimal cortical folding template. We believe this work presents an important step towards the automatic selection of parameters in image registration. The generality of the framework also suggests potential applications to other problems in science and engineering formulated as optimization problems.

Fig. 10. Plot of mean Hausdorff errors for MT+ from cross-validation of the fMRI data set using either FreeSurfer or in vivo trained task-optimal template. For the task-optimal framework, we tried different numbers of training subjects. Test errors decrease as we go from 9 to 19 to 29 training subjects. Once again, permutation testing shows that the differences between FreeSurfer and task-optimal template are statistically significant.

APPENDIX A

PROOF OF PROPOSITION 2

In this appendix, we prove Proposition 2: If the Hessian

is positive deﬁnite at , then

there exists an , such that for all , a unique

continuous function exists, such that

is the new local minimum for parameters and

. Furthermore, has the same order of

smoothness as .

In the next section, we ﬁrst prove that the Hessian

is equal to the mix-deriva-

tives matrix

under the composition of diffeomorphisms model [75], [81],

[82]. We then complete the proof of Proposition 2.

A. Proof of the Equivalence Between the Hessian

and Mix-Derivatives Matrix for the Composition of

Diffeomorphisms Model

We will only provide the proof for when the image is defined in so as not to obscure the main ideas behind the proof. To extend the proof to a manifold (e.g., ), one simply needs to extend the notations and bookkeeping by locally parameterizing the velocity fields and using coordinate charts. The same proof follows.

Let us deﬁne some notations. Suppose the image and there

are voxels. Let be the rasterized coordinates of the

voxels. For conciseness, we deﬁne for the ﬁxed parameters

(A.1)

Therefore, is a function from to . Under the compo-

sition of diffeomorphisms model, is the diffeomorphism

parameterized by the stationary velocity ﬁeld deﬁned on the

voxels, so that is a function from to . To

make the dependence of on explicit, we deﬁne

(A.2)

and so is a function from to . In other words,

we can rewrite

(A.3)

and

(A.4)

With the notation in place, we now show that

(A.5)

Hessian: We ﬁrst compute the Jacobian via the chain

rule

(A.6)

From the above equation, we can equivalently write down the

th component of the Jacobian

(A.7)

where and denote the th and th components of and

, respectively. Now, we compute the th component of the

Hessian using the product rule

(A.8)

(A.9)

(A.10)

Because is the identity matrix and the Jaco-

bian (because

derivative is zero at local minimum), we get , and

so the second term in (A.10) is zero.

To simplify the ﬁrst term of (A.10), we once again use the

fact that is the identity matrix, and so the summand

is zero unless and . Consequently, (A.10) simpliﬁes

to

(A.11)

or equivalently

(A.12)

Mix-Derivatives Matrix: We ﬁrst compute the Jaco-

bian via the chain rule

(A.13)

(A.14)

From the above equation, we can equivalently write down the

th component of the Jacobian

(A.15)

Now, we compute the th component of the mix-

derivatives matrix using the product rule

(A.16)

(A.17)

Like before, we have , and so the second term

is zero. Because is the identity, is zero unless

. Since , is also equal to zero

unless . Therefore, we get

(A.18)

or equivalently

(A.19)

B. Completing the Proof of Proposition 2

We now complete the proof of Proposition 2. Let

. Since ,

we have

(A.20)

(A.21)

(A.22)

where the last equality comes from the deﬁnition of

being a local minimum for the composition model.

Since the mix-derivatives matrix is invert-

ible by the positive-deﬁnite assumption of this proposition, by

the Implicit Function Theorem, there exists an , such

that for all , there is a unique continuous func-

tion , such that and

. Furthermore, has the same order of

smoothness as .

Let .

Then is positive deﬁnite at by the assumption

of the proposition. By the smoothness of derivatives and conti-

nuity of eigenvalues, there exists a small neighborhood around

in which the eigenvalues of are all greater

than zero. Therefore, does indeed de-

ﬁne a new local minimum close to .

APPENDIX B

COMPUTING THE DERIVATIVE

To compute , we perform a Taylor expansion

(B.1)

(B.2)

and rearranging the terms for , we get

(B.3)
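A Taylor-expansion argument of this kind leads, in generic form, to the standard implicit-function-theorem expression: the derivative of the minimizer with respect to the parameters equals minus the inverse Hessian times the mix-derivatives matrix. The sketch below checks this numerically on a toy problem. It assumes a Euclidean deformation vector v, a weighted SSD term with weights w plus a quadratic smoothness penalty, and hypothetical function names. It does not implement the composition of diffeomorphisms model of Appendix A.

```python
import numpy as np

def solve_v(w, t, lam, L):
    """Minimizer of f(w, v) = sum_i w_i (v_i - t_i)^2 + lam * v^T L v."""
    return np.linalg.solve(np.diag(w) + lam * L, w * t)

def dv_dw(w, t, lam, L):
    """Implicit derivative: -(d2f/dv2)^{-1} (d2f/dv dw), evaluated at the minimizer."""
    v = solve_v(w, t, lam, L)
    hessian = 2.0 * (np.diag(w) + lam * L)   # second derivative in v
    mixed = 2.0 * np.diag(v - t)             # mixed derivative in (v, w)
    return -np.linalg.solve(hessian, mixed)

n = 5
D = np.diff(np.eye(n), axis=0)   # first-difference operator on a chain
L = D.T @ D                      # symmetric positive semidefinite smoothness matrix
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
t = np.array([0.0, 1.0, 3.0, 2.0, 1.0])
lam = 0.3

analytic = dv_dw(w, t, lam, L)

# Central finite differences: column j holds the derivative with respect to w_j.
eps = 1e-6
numeric = np.column_stack([
    (solve_v(w + eps * e, t, lam, L) - solve_v(w - eps * e, t, lam, L)) / (2 * eps)
    for e in np.eye(n)
])

print(np.max(np.abs(analytic - numeric)))  # discrepancy between the two estimates
```

Because the weights are positive, the Hessian in this toy problem is positive definite, which mirrors the assumption of Proposition 2 that makes the inverse well defined.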

ACKNOWLEDGMENT

The authors would like to thank P. Parillo for discussion on

the optimization aspects of this paper. The authors would also

like to thank C. Brun, S. Durrleman, T. Fletcher, and W. Mio

for helpful feedback on this work.

REFERENCES

[1] S. Allassonniere, Y. Amit, and A. Trouvé, “Toward a coherent statis-

tical framework for dense deformable template estimation,” J. R. Stat.

Soc., Series B, vol. 69, no. 1, pp. 3–29, 2007.

[2] E. Allgower and K. Georg, Introduction to Numerical Continuation

Methods. Philadelphia, PA: SIAM, 2003.

[3] K. Amunts, A. Malikovic, H. Mohlberg, T. Schormann, and K. Zilles,

“Brodmann’s areas 17 and 18 brought into stereotaxic space—Where

and how variable?,” NeuroImage, vol. 11, pp. 66–84, 2000.

[4] K. Amunts, A. Schleicher, U. Burgel, H. Mohlberg, H. Uylings, and

K. Zilles, “Broca’s region revisited: Cytoarchitecture and intersubject

variability,” J. Comparative Neurol., vol. 412, no. 2, pp. 319–341, 1999.

[5] V. Arsigny, O. Commowick, X. Pennec, and N. Ayache, “A log-eu-

clidean framework for statistics on diffeomorphisms,” in Proc. Int.

Conf. Med. Image Computing Computer Assist. Intervent. (MICCAI),

2006, vol. 4190, LNCS, pp. 924–931.

[6] B. Avants and J. Gee, “Geodesic estimation for large deformation

anatomical shape averaging and interpolation,” NeuroImage, vol. 23,

pp. 139–150, 2004.

[7] S. Baker and I. Matthews, “Lucas-Kanade 20 years on: A unifying

framework,” Int. J. Comput. Vis., vol. 56, no. 3, pp. 221–255, 2004.

[8] M. Beg, M. Miller, A. Trouvé, and L. Younes, “Computing large defor-

mation metric mappings via geodesic ﬂows of diffeomorphisms,” Int.

J. Comput. Vis., vol. 61, no. 2, pp. 139–157, 2005.

[9] K. Brodmann, Vergleichende Lokalisationslehre der Großhirnrinde in

Ihren Prinzipien Dargestellt auf Grund des Zellenbaues 1909.

[10] O. Commowick, R. Stefanescu, P. Fillard, V. Arsigny, N. Ayache,

X. Pennec, and G. Malandain, “Incorporating statistical measures of

anatomical variability in atlas-to-subject registration for conformal

brain radiotherapy,” in Proc. Int. Conf. Med. Image Computing and

Computer Assist. Intervent. (MICCAI), 2005, vol. 3750, LNCS, pp.

927–934.

[11] A. M. Dale, B. Fischl, and M. I. Sereno, “Cortical surface-based anal-

ysis I: Segmentation and surface reconstruction,” NeuroImage, vol. 9,

pp. 179–194, 1999.

[12] R. Desikan, F. Segonne, B. Fischl, B. Quinn, B. Dickerson, D. Blacker,

R. Buckner, A. Dale, R. Maguire, B. Hyman, M. Albert, and R. Kil-

liany, “An automated labeling system for subdividing the human cere-

bral cortex on MRI scans into gyral based regions of interest,” Neu-

roImage, vol. 31, no. 3, pp. 968–980, 2006.

[13] M. Dubuisson and A. Jain, “A modiﬁed Hausdorff distance for object

matching,” in Proc. 12th IAPR Int. Conf. Pattern Recognit., 1994, vol.

1, pp. 566–568.

[14] S. Durrleman, X. Pennec, A. Trouvé, P. Thompson, and N. Ayache, “Inferring

brain variability from diffeomorphic deformations of currents: An in-

tegrative approach,” Med. Image Anal., vol. 12, no. 5, pp. 626–637,

2008, PMID: 18658005.

[15] I. Eckstein, A. Joshi, C. J. Kuo, R. Leahy, and M. Desbrun, “General-

ized surface ﬂows for deformable registration and cortical matching,”

in Proc. Int. Conf. Med. Image Computing Computer Assist. Intervent.

(MICCAI), 2007, vol. 4791, LNCS, pp. 692–700.

[16] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle re-

gression,” Ann. Stat., pp. 407–451, 2004.

[17] T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks

and support vector machines,” in Advances in Computational Mathe-

matics. Cambridge, MA: MIT Press, 2000, pp. 1–50.

[18] B. Fischl, N. Rajendran, E. Busa, J. Augustinack, O. Hinds, B. T. Yeo,

H. Mohlberg, K. Amunts, and K. Zilles, “Cortical folding patterns

and predicting cytoarchitecture,” Cerebral Cortex, vol. 18, no. 8, pp.

1973–1980, 2008.

[19] B. Fischl, M. Sereno, R. Tootell, and A. Dale, “High-resolution in-

tersubject averaging and a coordinate system for the cortical surface,”

Human Brain Mapp., vol. 8, no. 4, pp. 272–284, 1999.

[20] B. Fischl, A. Stevens, N. Rajendran, B. T. Yeo, D. Greve, K. Van

Leemput, J. Polimeni, S. Kakunoori, R. Buckner, J. Pacheco, D. Salat,

J. Melcher, M. Frosch, B. Hyman, P. E. Grant, B. R. Rosen, A. van

der Kouwe, G. Wiggins, L. Wald, and J. Augustinack, “Predicting the

location of entorhinal cortex from MRI,” Neuroimage, vol. 47, no. 1,

pp. 8–17, 2009.

[21] Freesurfer Wiki [Online]. Available: http://surfer.nmr.mgh.harvard.edu/fswiki/freesurferwiki/

[22] K. Friston, A. Holmes, K. Worsley, J.-P. Poline, C. Frith, and R. Frack-

owiak, “Statistical parametric maps in functional imaging: A general

linear approach,” Human Brain Mapp., vol. 2, no. 4, pp. 189–210, 1995.

[23] X. Geng, G. Christensen, H. Gu, T. Ross, and Y. Yang, “Implicit refer-

ence-based group-wise image registration and its application to struc-

tural and functional MRI,” NeuroImage, vol. 47, no. 4, pp. 1341–1351,

2009.

[24] X. Geng, D. Kumar, and G. Christensen, “Transitive inverse-consistent

manifold registration,” in Proc. Int. Conf. Inf. Process. Med. Imag.,

2005, vol. 3564, LNCS, pp. 468–479.

[25] B. Glocker, N. Komodakis, N. Navab, G. Tziritas, and N. Paragios,

“Dense registration with deformation priors,” in Proc. Int. Conf. Inf.

Process. Med. Imag., 2009, vol. 5636, LNCS, pp. 540–551.

[26] A. Guimond, J. Meunier, and J.-P. Thirion, “Average brain models: A

convergence study,” Comput. Vis. Image Understand., vol. 77, no. 2,

pp. 192–210, 2000.

[27] I. Guyon and A. Elisseeff, “An introduction to variable and feature

selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, 2003.

[28] T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, “The entire regulariza-

tion path for the support vector machine,” J. Mach. Learn. Res., vol. 5,

pp. 1391–1415, 2004.

[29] R. Heckemann, J. Hajnal, P. Aljabar, D. Rueckert, and A. Hammers,

“Automatic anatomical brain MRI segmentation combining label prop-

agation and decision fusion,” NeuroImage, vol. 33, no. 1, pp. 115–126,

2006.

[30] S. Jaume, M. Ferrant, S. Warﬁeld, and B. Macq, “Multiresolution pa-

rameterization of meshes for improved surface-based registration,” in

Proc. SPIE Med. Imag., 2001, vol. 4322, pp. 633–642.

[31] S. Joshi, B. Davis, M. Jomier, and G. Gerig, “Unbiased diffeomorphic

atlas construction for computational anatomy,” NeuroImage, vol. 23,

pp. 151–160, 2004.

[32] A. Klein, S. S. Ghosh, B. Avants, B. T. Yeo, B. Fischl, B. Ardekani, J.

C. Gee, J. J. Mann, and R. V. Parsey, “Evaluation of volume-based and

surface-based brain image registration methods,” Neuroimage, 2010.

[33] R. Kohavi, “A study of cross-validation and bootstrap for accuracy es-

timation and model selection,” in Int. Joint Conf. Artif. Intell., 1995,

vol. 14, pp. 1137–1145.

[34] R. Kohavi and G. John, “Wrappers for feature subset selection,” Artif.

Intell., vol. 97, no. 1-2, pp. 273–324, 1997.

[35] D. Lee, M. Hofmann, F. Steinke, Y. Altun, N. Cahill, and B. Schölkopf,

“Learning the similarity measure for multi-modal 3-D image reg-

istration,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern

Recognit., Jun. 2009, pp. 186–193.

[36] M. Leventon, W. Grimson, and O. Faugeras, “Statistical shape inﬂu-

ence in geodesic active contours,” in Proc. Int. Conf. Comput. Vis. Pat-

tern Recognit., 2000, pp. 1316–1323.

[37] C. Liu, W. T. Freeman, E. H. Adelson, and Y. Weiss, “Human-assisted

motion annotation,” in Proc. Int. Conf. Comput. Vis. Pattern Recognit.,

2008, pp. 1–8.

[38] M. Loog and M. de Bruijne, “Discriminative shape alignment,” in

Proc. Int. Conf. Inf. Process. Med. Imag., 2009, vol. 5636, LNCS, pp.

459–466.

[39] O. Lyttelton, M. Boucher, S. Robbins, and A. Evans, “An unbiased it-

erative group registration template for cortical surface analysis,” Neu-

roImage, vol. 34, no. 4, pp. 1535–1544, 2007.

[40] S. Makrogiannis, R. Verma, B. Karacali, and C. Davatzikos, “A joint

transformation and residual image descriptor for morphometric image

analysis using an equivalence class formulation,” in Proc. Workshop

Math. Methods Biomed. Image Anal., Int. Conf. Comput. Vis. Pattern

Recognit., New York, 2006.

[41] D. McGonigle, A. Howseman, B. Athwal, K. Friston, R. Frackowiak,

and A. Holmes, “Variability in fMRI: An examination of intersession

differences,” NeuroImage, vol. 11, no. 6, pp. 708–734, 2000.

[42] C. McIntosh and G. Hamarneh, “Is a single energy functional suf-

ﬁcient? Adaptive energy functionals and automatic initialization,” in

Proc. Int. Conf. Med. Image Computing Computer Assisted Intervent.

(MICCAI), 2007, vol. 4792, LNCS, pp. 503–510.

[43] A. Moore and M. Lee, “Efﬁcient algorithms for minimizing cross vali-

dation error,” in Proc. 11th Int. Conf. Mach. Learn., 1994, pp. 190–198.

[44] P. Olver, Applications of Lie Groups to Differential Equations, 2nd

ed. New York: Springer-Verlag, 1993.

[45] M. Ono, S. Kubick, and C. Abernathey, Atlas of the Cerebral Sulci, 1st

ed. Germany: Georg Thieme Verlag, 1990.

[46] M. Park and T. Hastie, “L1-regularization path algorithm for generalized linear models,” J. R. Stat. Soc., Series B, vol. 69, pp. 659–677, 2007.

[47] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical

Recipes in C: The Art of Scientiﬁc Computing, 2nd ed. Cambridge,

U.K.: Cambridge Univ. Press, 1992.

[48] A. Qiu and M. Miller, “Cortical hemisphere registration via large defor-

mation diffeomorphic metric curve mapping,” in Proc. Int. Conf. Med-

ical Image Computing Computer Assisted Intervent. (MICCAI), 2007,

vol. 4791, LNCS, pp. 186–193.

[49] T. Rohﬂing, R. Brandt, R. Menzel, and C. Maurer, Jr., “Evaluation of

atlas selection strategies for atlas-based image segmentation with ap-

plication to confocal microscopy images of bee brains,” NeuroImage,

vol. 21, no. 4, pp. 1428–1442, 2004.

[50] T. Rohﬂing, D. Russakoff, and C. Maurer, “Performance-based classi-

ﬁer combination in atlas-based image segmentation using expectation-

maximization parameter estimation,” IEEE Trans. Med. Imag., vol. 23,

no. 8, pp. 983–994, Aug. 2004.

[51] W. Rudin, Principles of Mathematical Analysis. New York: Mc-

Graw-Hill, 1976.

[52] D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, and D. Hawkes,

“Non-rigid registration using free-form deformations: Application

to breast MR images,” IEEE Trans. Med. Imag., vol. 18, no. 8, pp.

712–720, Aug. 1999.

[53] M. R. Sabuncu, S. Balci, M. Shenton, and P. Golland, “Image-driven

population analysis through mixture-modeling,” IEEE Trans. Med.

Imag., vol. 28, no. 9, pp. 1473–1487, Sep. 2009.

[54] M. R. Sabuncu, B. Singer, B. Conroy, R. Bryan, P. Ramadge, and

J. Haxby, “Function-based inter-subject alignment of the cortical

anatomy,” Cerebral Cortex, vol. 20, no. 1, pp. 130–140, 2010.

[55] M. R. Sabuncu, B. T. Yeo, K. Van Leemput, B. Fischl, and P. Golland,

“Supervised nonparametric image parcellation,” in Proc. Int. Conf.

Med. Image Computing and Computer Assisted Intervent. (MICCAI),

2009, vol. 5762, LNCS, pp. 1075–1083.

[56] R. Saxe, M. Brett, and N. Kanwisher, “Divide and conquer: A defense

of functional localizers,” NeuroImage, vol. 30, no. 4, pp. 1088–1096,

2006.

[57] T. Schormann and K. Zilles, “Three-dimensional linear and non-linear

transformations: An integration of light microscopical and MRI data,”

Human Brain Mapp., vol. 6, pp. 339–347, 1998.

[58] J. Shao, “Linear model selection by cross-validation,” J. Am. Stat.

Assoc., pp. 486–494, 1993.

[59] D. Shen and C. Davatzikos, “HAMMER: Hierarchical attribute

matching mechanism for elastic registration,” IEEE Trans. Med.

Imag., vol. 21, no. 11, pp. 1421–1439, Nov. 2002.

[60] Y. Shi, J. Morra, P. Thompson, and A. Toga, “Inverse-consistent sur-

face mapping with laplace-beltrami eigen-features,” in Proc. Int. Conf.

Inf. Process. Med. Imag., 2009, vol. 5636, LNCS, pp. 467–478.

[61] H.-Y. Shum and R. Szeliski, “Construction of panoramic image mo-

saics with global and local alignment,” Int. J. Comput. Vis., vol. 16, no.

1, pp. 63–84, 2000.

[62] B. Thirion, G. Flandin, P. Pinel, A. Roche, P. Ciuciu, and J.-B. Poline,

“Dealing with the shortcomings of spatial normalization: Multi-sub-

ject parcellation of fMRI datasets,” Human Brain Mapp., vol. 27, pp.

678–693, 2006.

[63] B. Thirion, P. Pinel, S. Mériaux, A. Roche, S. Dehaene, and J.-B. Po-

line, “Analysis of a large fMRI cohort: Statistical and methodological

issues for group analyses,” NeuroImage, vol. 35, pp. 105–120, 2007.

[64] B. Thirion, P. Pinel, A. Tucholka, A. Roche, P. Ciuciu, J.-F. Mangin,

and J. Poline, “Structural analysis of fMRI data revisited: Improving

the sensitivity and reliability of fMRI group studies,” IEEE Trans. Med.

Imag., vol. 26, no. 9, pp. 1256–1269, Sep. 2007.

[65] P. Thompson and A. Toga, “A surface-based technique for warping

3-dimensional images of the brain,” IEEE Trans. Med. Imag., vol. 15,

no. 4, pp. 1–16, Aug. 1996.

[66] P. Thompson, R. Woods, M. Mega, and A. Toga, “Mathematical/com-

putational challenges in creating deformable and probabilistic atlases

of the human brain,” Human Brain Mapp., vol. 9, no. 2, pp. 81–92,

2000.

[67] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. R.

Stat. Soc. Series B (Methodological), pp. 267–288, 1996.

[68] R. Tootell and J. Taylor, “Anatomical evidence for MT and additional

cortical visual areas in humans,” Cerebral Cortex, vol. 5, pp. 39–55,

1995.

[69] D. Tosun and J. Prince, “Cortical surface alignment using geometry

driven multispectral optical ﬂow,” in Proc. Int. Conf. Inf. Process. Med.

Imag., 2005, vol. 3565, LNCS, pp. 480–492.

[70] Z. Tu, K. Narr, P. Dollar, I. Dinov, P. M. Thompson, and A. W.

Toga, “Brain anatomical structure segmentation by hybrid discrimina-

tive/generative models,” IEEE Trans. Med. Imag., vol. 27, no. 4, pp.

495–508, Apr. 2008.

[71] C. Twining, T. Cootes, S. Marsland, V. Petrovic, R. Schestowitz, and C.

Taylor, “A uniﬁed information-theoretic approach to groupwise non-

rigid registration and model building,” in Proc. Int. Conf. Inf. Process.

Med. Imag., 2005, vol. 3565, LNCS, pp. 1611–3349.

[72] E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis. Boston,

MA: Birkhäuser, 1997.

[73] D. Van Essen, H. Drury, S. Joshi, and M. Miller, “Functional and struc-

tural mapping of human cerebral cortex: Solutions are in the surfaces,”

Proc. Nat. Acad. Sci., vol. 95, no. 3, pp. 788–795, 1996.

[74] K. Van Leemput, “Encoding probabilistic brain atlases using bayesian

inference,” IEEE Trans. Med. Imag., vol. 28, no. 6, pp. 822–837, Jun.

2009.

[75] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, “Dif-

feomorphic demons: Efﬁcient non-parametric image registration,”

NeuroImage, vol. 45, no. 1, pp. S61–S72, 2009.

[76] S. K. Warﬁeld, K. H. Zou, and W. M. Wells, “Simultaneous Truth and

Performance Level Estimation (STAPLE): An algorithm for the vali-

dation of image segmentation,” IEEE Trans. Med. Imag., vol. 23, no.

7, p. 903, Jul. 2004.

[77] T. White, D. O’Leary, V. Magnotta, S. Arndt, M. Flaum, and N. An-

dreasen, “Anatomic and functional variability: The effects of ﬁlter size

in group fMRI data analysis,” NeuroImage, vol. 13, no. 4, pp. 577–588,

2001.

[78] J. Xiong, S. Rao, P. Jerabek, F. Zamarripa, M. Woldorff, J. Lancaster,

and P. Fox, “Intersubject variability in cortical activations during a

complex language task,” NeuroImage, vol. 12, no. 3, pp. 326–339,

2000.

[79] B. T. Yeo, M. Sabuncu, R. Desikan, B. Fischl, and P. Golland, “Effects

of registration regularization and atlas sharpness on segmentation ac-

curacy,” Med. Image Anal., vol. 12, no. 5, pp. 603–615, 2008.

[80] B. T. Yeo, M. Sabuncu, P. Golland, and B. Fischl, “Task-optimal reg-

istration cost functions,” in Proc. Int. Conf. Med. Image Computing

and Computer Assist. Intervent. (MICCAI), 2009, vol. 5761, LNCS,

pp. 598–606.

[81] B. T. Yeo, M. R. Sabuncu, T. Vercauteren, N. Ayache, B. Fischl, and P.

Golland, “Spherical demons: Fast diffeomorphic landmark-free surface

registration,” IEEE Trans. Med. Imag., vol. 29, no. 3, pp. 650–668, Mar.

2010.

[82] B. T. Yeo, T. Vercauteren, P. Fillard, J.-M. Peyrat, X. Pennec, P. Golland, N. Ayache, and O. Clatz, “DT-REFinD: Diffusion tensor registration with exact finite-strain differential,” IEEE Trans. Med. Imag., vol. 28, no. 12, pp. 1914–1928, Dec. 2009.

[83] S. K. Zhou and D. Comaniciu, “Shape regression machine,” in Proc. Int. Conf. Inf. Process. Med. Imag., 2007, vol. 4584, LNCS, pp. 13–25.

[84] K. Zilles, A. Schleicher, N. Palomero-Gallagher, and K. Amunts, Quantitative Analysis of Cyto- and Receptor Architecture of the Human Brain. New York: Elsevier, 2002.
