
International Journal of Computer Vision

https://doi.org/10.1007/s11263-020-01343-w

Learning the spatiotemporal variability in longitudinal shape data sets

Alexandre Bône1,2,3,4,5 · Olivier Colliot1,2,3,4,5 · Stanley Durrleman1,2,3,4,5 · for the Alzheimer's Disease Neuroimaging Initiative†

Received: 12 April 2019 / Accepted: 19 May 2020

Abstract In this paper, we propose a generative statistical model to learn the spatiotemporal variability in longitudinal shape data sets, which contain repeated observations of a set of objects or individuals over time. From all the short-term sequences of individual data, the method estimates a long-term normative scenario of shape changes and a tubular coordinate system around this trajectory. Each individual data sequence is therefore (i) mapped onto a specific portion of the trajectory accounting for differences in pace of progression across individuals, and (ii) shifted in the shape space to account for intrinsic shape differences across individuals that are independent of the progression of the observed process. The parameters of the model are estimated using a stochastic approximation of the expectation-maximization algorithm. The proposed approach is validated on a simulated data set, illustrated on the analysis of facial expression in video sequences, and applied to the modeling of the progressive atrophy of the hippocampus in Alzheimer's disease patients. These experiments show that one can use the method to reconstruct data at the precision of the noise, to highlight significant factors that may modulate the progression, and to simulate entirely synthetic longitudinal data sets reproducing the variability of the observed process.

Keywords Longitudinal data · Statistical shape analysis · Large deformation diffeomorphic metric mapping · Medical imaging · Disease progression modeling

1 Institut du Cerveau, ICM, F-75013, Paris, France
2 Inserm, U 1127, F-75013, Paris, France
3 CNRS, UMR 7225, F-75013, Paris, France
4 Sorbonne Université, F-75013, Paris, France
5 Inria, Aramis project-team, F-75013, Paris, France

† Data used in preparation of this article were partly obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: adni.loni.usc.edu.

1 Introduction

1.1 Motivation

Video sequences of smiling faces, repeated measurements of growing plants or developing cells, medical images collected at multiple visits from a population of patients affected by a chronic disease: all these examples can be understood as data collections where individual instances of a common underlying process are observed at multiple time-points. Such collections are called longitudinal data sets. The individual processes are thought to result from random variations of a common underlying process (or a few of them). Because of the dynamic nature of the observed processes, one might decompose the variability into two components: the dynamic or temporal variability on the one hand, and the time-independent or spatial variability on the other hand. In our examples, the variability in the pace of growth or in the age at disease onset is understood as temporal variability. By contrast, there are also intrinsic inter-individual differences in height, weight or shape that are independent of the pace at which the plant grows or the disease progresses, which we call spatial variability. The main difficulty here is that growth or disease progression also affect height, weight or shape, so that the differences between two observations of two different samples are due to (i) the fact that the two individuals are observed at different stages of the process, and (ii) the fact that they have different intrinsic characteristics. Disentangling these two sources of variability would not be possible if one had only one observation per individual. Having repeated observations of the individuals over time, as in longitudinal data sets, makes it possible to separate the changes due to the progression of the process from those due to intrinsic differences that are independent of this progression.


The goal of this paper is to propose a statistical learning method which can describe the spatiotemporal variability in a longitudinal data set. We focus here on shape data, where the shape may be encoded by an image, or by geometrical objects extracted from images such as curves, surface meshes or segmented volumes.

One of the main difficulties is that the experimental design often provides little control over the temporal sampling of the observations. We are interested here in processes for which there is no clear marker of progression, such as the progression of neurodegenerative diseases, for which the age at disease onset is hard to determine. Therefore there is no easy way to re-align the individual data sequences in time so as to analyze the inter-individual variability at each stage of the process. Instead, the method needs to learn how the individual data sequences position themselves in relation to each other. Furthermore, the follow-up period of the observations rarely covers the whole process, but often just a small part of it. In clinical studies for example, patients may be followed for a few years whereas the disease may progress over decades. Finally, dealing with shape data raises the need for generic representations of such data that can be included in computational approaches.

1.2 Related work

Structured data like shapes can be advantageously represented as elements of curved spaces, such as Riemannian manifolds, in order to account for the prior on their structure. Whether defined by invariance [34, 56, 57] or by topology-preserving properties [7, 15, 21, 33, 51], shape spaces define distance metrics adapted to the geometry of a well-identified class of objects, such as brain magnetic resonance images or segmented organs. These data representations allow the generalization of mean-variance analysis [3, 28, 50, 65], which learns the geometrical distribution of a cross-sectional data set in terms of an average shape and variability-encoding parameters. Typical healthy or pathological configurations can be summarized in this manner, thus opening the way to automatic diagnosis at the individual level. Time-series data sets, consisting of repeated observations of the same object at successive time-points, can be described by generalized regression approaches on the same shape spaces [6, 26, 27, 29, 39, 49]. A time-continuous scenario of geometrical transformation is then estimated, offering in turn individualized interpolation and extrapolation methods. The statistical analysis of longitudinal data sets requires extending the concept of generalized mean-variance analysis to such time series. In other words, it requires the definition of a statistical distribution of curves drawn on a shape space.

Shape spaces are usually equipped with a differential structure of infinite dimension. In particular, the large deformation diffeomorphic metric mapping (LDDMM) approach defines shape spaces as orbits of template shapes under the action of an infinite-dimensional parametric group of diffeomorphisms of the 2D/3D ambient space [63]. With this approach, the geometrical differences between two objects are captured by estimating the diffeomorphic transformation that warps one into the other. More recent works propose finite-dimensional approaches built on the same principles: [64] uses truncated Fourier transforms to build a finite-dimensional Lie algebra, and [20] constructs a finite-dimensional Riemannian manifold based on a set of self-interacting particles.

Such structures are favorable to the analysis of longitudinal data sets because they naturally offer the parallel transport operator [40], which makes it possible to compare tangent-space vectors at distant points in a relevant manner. This operator is key to comparing trajectories on the manifold, and therefore to analyzing longitudinal data. In [56, 57] for instance, trajectories on manifolds are compared by parallel-transporting their initial velocity vectors back to some privileged point of the manifold, thereby handling the spatial variability if a reference configuration and reference time-point are known. A similar approach is followed in [35] where medical images are analyzed in a voxel-wise fashion, or in [54] with the co-adjoint transport instead of the parallel transport. In [16], the variability of a large longitudinal data set of thalamus shapes is analyzed by transporting individual residual deformations along a common and pre-computed trajectory, back to a baseline point. In [52, 53] the authors define the exp-parallelization operator, which extends the notion of parallel lines to Riemannian manifolds. The works [10, 36] build on this operator to analyze dynamic networks and shape objects respectively. Other approaches propose to work on a space of trajectories, such as [47] where the Sasaki metric is used to define distances between geodesic curves on a manifold, or [12] which requires the same number of observations per subject.

If parallel transport makes it possible to spatially align manifold-valued trajectories, a temporal alignment mechanism is also needed for data sets with variability in the individual progression dynamics. For instance, two patients developing the same neurological disease have no reason to reach the same disease stage at the same age, nor to have synchronous progressions. A solution is to use time-warp functions, which define a mapping between an abstract common reference time frame and the individual time lines [9, 10, 21, 35, 36, 45, 52, 53]. In [56, 57], the authors build on the square-root velocity fields framework to quotient the space of spatiotemporal paths by diffeomorphic time-warps. In [48], a monotonic Gaussian process is built from a set of temporal sources.


1.3 Contributions

In this paper, we propose a method that learns an average progression and its spatiotemporal variability from a longitudinal shape data set. The average progression takes the form of a geodesic curve in the finite-dimensional Riemannian approximation of the LDDMM framework of [22]. The concept of exp-parallelization introduced in [52, 53] is then applied in this context to define a tubular coordinate system, also called Fermi coordinates, around the average geodesic. The average trajectory and its coordinate system are automatically learned by the method, so that every individual data sequence is mapped to a specific portion of the average trajectory to account for the temporal variability, and shifted in the shape space to account for the spatial variability. The calibration of the resulting generative statistical model is done by adapting a stochastic approximation EM method. This paper extends the conference paper [10], with a finer modeling of the variability in the individual paces of progression, and an original optimization method for accelerated model calibration. The proposed approach is validated on a simulated data set, illustrated on a facial expression recognition task, and applied to hippocampus shape progression modeling in Alzheimer's disease.

Section 2 defines the concept of shape spatiotemporal coordinate systems, which allows the introduction of the generative statistical model in Section 3. Section 4 details the calibration, personalization and simulation algorithms, which are evaluated and illustrated in Section 5. These experiments evaluate the goodness-of-fit of the model, the relevance of the representation of the spatiotemporal variability for the identification of factors explaining this variability, and the ability of the model to generate synthetic data sets that reproduce the observed variability in the training data set.

2 Shape spatiotemporal reference frame

Within LDDMM frameworks, shapes are positioned with respect to a reference shape, often called atlas or template. A coordinate system is defined in the tangent-space at the template shape. We propose here to replace the template (which is a single shape) by a curve (i.e. a shape trajectory), and the coordinate system by a tubular spatiotemporal coordinate system centered around the template trajectory. We first review the usual construction of a static template shape before extending it to the spatiotemporal case.

2.1 Positioning a shape with respect to a static atlas

Positioning a target shape $y$ with respect to a static reference $y_0$ is called the registration problem. Deformation-based morphometry solves it by estimating a diffeomorphism $\phi_1$ of the ambient space $\mathbb{R}^d$ ($d = 2$ or $3$) that transforms $y_0$ into $y$, which we note $\phi_1 \star y_0 = y$. In the context of LDDMM, diffeomorphisms are constructed by following the streamlines of dynamic vector fields $t \to v_t \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d)$ over $[0, 1]$:

$$\partial_t \phi_t = v_t \circ \phi_t \quad \text{with} \quad \phi_0 = \mathrm{Id}. \qquad (1)$$

Following the finite-dimensional approach of [22], we further assume that any $v_t$ writes as the Gaussian convolution of $p$ momentum vectors $m_t = (m_t^{(1)}, ..., m_t^{(p)}) \in \mathbb{R}^d$ over a corresponding set of control points $c_t = (c_t^{(1)}, ..., c_t^{(p)}) \in \mathbb{R}^d$:

$$v_t : x \in \mathbb{R}^d \to \sum_{k=1}^{p} g\big(c_t^{(k)}, x\big) \cdot m_t^{(k)} \in \mathbb{R}^d \qquad (2)$$

with $g : x, x' \in \mathbb{R}^d \to \exp\big(-\|x' - x\|_{\ell_2}^2 / \sigma^2\big)$ the Gaussian kernel function of kernel width $\sigma > 0$. Many other diffeomorphisms constructed in this manner might actually transform $y_0$ into $\phi_1 \star y_0$: we call solution of the registration problem the most regular transformation, i.e. the one that minimizes its "kinetic" energy:

$$\frac{1}{2} \int_{t=0}^{1} \|v_t\|^2_{G_{c_t}} \, dt = \frac{1}{2} \int_{t=0}^{1} m_t^\top \cdot G_{c_t} \cdot m_t \, dt \qquad (3)$$

where for all $t \in [0, 1]$, $G_{c_t}$ is the $p \times p$ "kernel" symmetric positive-definite matrix of general term $g\big[c_t^{(k)}, c_t^{(l)}\big]$, and $(.)^\top$ is the matrix transposition. Such energy-minimizing curves, also called geodesics, are such that the control point and momentum vector trajectories are fully determined by their initial values and the following Hamiltonian equations [46]:

$$\dot{c}_t = G_{c_t} \cdot m_t \ ; \qquad \dot{m}_t = -\frac{1}{2} \nabla_{c_t}\big(m_t^\top \cdot G_{c_t} \cdot m_t\big) \qquad (4)$$

where $\nabla_x(.)$ is the gradient operator with respect to $x$. Assuming that there exists a diffeomorphism $\phi_1$ constructed according to equations (1, 2) such that $\phi_1 \star y_0 = y$, this last result allows us to compactly represent the positioning of $y$ with respect to $y_0$ with a set of $p$ control points $c_0$ and attached momenta $m_0$. In other words, $m_0$ is the coordinate of $y$ in the coordinate system defined by $(c_0, y_0)$. In practice, the perfect registration constraint $\phi_1 \star y_0 = y$ is relaxed, and we call solution to the registration problem the extremal-path diffeomorphism $\phi_1$ that warps $y_0$ as close as possible to $y$, for some extrinsic error measure $d_E(y_0, y)$. In this paper, the following choices are considered for $d_E$, depending on the nature of $y_0$ and $y$:

– the $\ell_2$ metric for meshes with point-to-point correspondence (i.e. the sum of squared differences between point positions),

– the current metric [14, 59] for oriented surface meshes without point-to-point correspondence (details are given in appendix for the reader's convenience).
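To make the shooting procedure of equations (2) and (4) concrete, the following minimal sketch (not the Deformetrica implementation; all names and numerical values are illustrative) integrates the Hamiltonian equations with explicit Euler steps and flows a toy template shape along the resulting velocity field:

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Matrix of g(x_i, y_j) = exp(-||x_i - y_j||^2 / sigma^2), cf. equation (2)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / sigma ** 2)

def velocity(points, c, m, sigma):
    """Equation (2): Gaussian convolution of the momenta m over the control points c."""
    return gaussian_kernel(points, c, sigma) @ m

def hamiltonian_rhs(c, m, sigma):
    """Right-hand side of the Hamiltonian equations (4)."""
    K = gaussian_kernel(c, c, sigma)              # p x p kernel matrix G_c
    dc = K @ m                                    # c_dot = G_c . m
    diff = c[:, None, :] - c[None, :, :]          # p x p x d array of c_k - c_l
    mm = m @ m.T                                  # p x p array of m_k . m_l
    # m_dot = -(1/2) grad_c (m^T G_c m) = (2 / sigma^2) sum_l K_kl (m_k . m_l) (c_k - c_l)
    dm = (2.0 / sigma ** 2) * ((K * mm)[:, :, None] * diff).sum(axis=1)
    return dc, dm

def shoot(c0, m0, y0, sigma, n_steps=20):
    """Integrate (4) from t = 0 to t = 1 and flow the shape points y0 along (1)."""
    c, m, y, dt = c0.copy(), m0.copy(), y0.copy(), 1.0 / n_steps
    for _ in range(n_steps):
        dc, dm = hamiltonian_rhs(c, m, sigma)
        y = y + dt * velocity(y, c, m, sigma)     # follow the streamlines of v_t
        c, m = c + dt * dc, m + dt * dm
    return c, m, y

rng = np.random.default_rng(0)
c0 = rng.normal(size=(5, 2))                      # p = 5 control points in 2D
m0 = 0.3 * rng.normal(size=(5, 2))                # attached momentum vectors
y0 = rng.normal(size=(30, 2))                     # toy template shape with 30 vertices
_, _, y1 = shoot(c0, m0, y0, sigma=1.0)
print("mean vertex displacement:", float(np.abs(y1 - y0).mean()))
```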


Noting $y_0$ as a collection $y_0^{(1)}, ..., y_0^{(K)}$ of $K$ points of $\mathbb{R}^d$, $\phi_1$ acts independently and directly on each point $y_0^{(k)}$ according to $\phi_1 \star y_0^{(k)} = \phi_1 \circ y_0^{(k)}$. Note that the methodology introduced in this section can be adapted straightforwardly to image data, by defining the action of the diffeomorphisms $\phi$ on the image $I$ as $I \circ \phi^{-1}$ and using the sum of squared differences between image intensities as the error measure.

2.2 Riemannian structure

Let $c_0$ be a set of $p$ control points. We define:

$$\mathcal{D}_{c_0} = \big\{\phi_1 \ \big|\ \partial_t \phi_t = v_t \circ \phi_t,\ \phi_0 = \mathrm{Id},\ v_t = \mathrm{Conv}(c_t, m_t),\ (\dot{c}_t, \dot{m}_t) = \mathrm{Ham}(c_t, m_t),\ m_0 \in \mathbb{R}^{p \times d}\big\} \qquad (5)$$

where $\mathrm{Conv}(., .)$ and $\mathrm{Ham}(., .)$ are compact notations for the convolution operator defined by equation (2) and the Hamiltonian equations (4) respectively.


Fig. 1: [Left]. Shape spatiotemporal reference frame $(y_0, c_0, m_0, t_0)$ with respect to which a shape $y$ admits coordinates $t \in \mathbb{R}$, $v \in v_0^\perp \subset T_{\gamma(t_0)}\mathcal{D}_{c_0}$. Three spaces are involved: the manifold of control points $\mathbb{R}^{p \times d}$ (top), the manifold of diffeomorphisms $\mathcal{D}_{c_0}$ (middle), and the shape submanifold $\mathcal{S}_{y_0, c_0}$ of the extrinsic shape space $E$ (bottom). The momenta $m_0, m$ and the velocity fields $v_0, v$ are in one-to-one correspondence. The velocity field $v$, also called space-shift, is parallel-transported along the geodesic $\gamma$ by the operator $t \to P^v_\gamma(t)$. Figure 2 illustrates the effect of parallel transport on $\mathcal{D}_{c_0}$.
[Right]. Illustrations of the manifolds abstractly depicted on the left side of the figure. Each row displays two example elements of the corresponding geodesic (solid black lines on the left panel). The two columns correspond respectively to the times $t_0$ and $t$.


Fig. 2: [Bottom]. Illustration of a shape geodesic $t \to \gamma(t) \star y_0$: the man-like shape (solid black contour) raises his left arm. This geodesic is parametrized by a single set of control points $c_0$ (black dots) and attached momentum vectors $m_0$ (bold blue arrows), to which corresponds the velocity field $v_0$ (light blue arrows). A second set of momentum vectors $m$ (bold red arrows) attached to the same control points $c_0$ parametrizes the exp-parallelization of this shape geodesic.
[Top]. Exp-parallel shape curve $t \to \eta(t) \star y_0$ to the shape geodesic $\gamma \star y_0$: the exp-parallelization transfers the arm-raising motion from one man-like shape to another.

Equipped at any $\phi \in \mathcal{D}_{c_0}$ with the local metric $G^{-1}_{\phi(c_0)}$, $\mathcal{D}_{c_0}$ has the structure of a Riemannian manifold of dimension $p \times d$. The tangent-space at $\phi$ is the set of velocity fields obtained by convolving any set of momenta on $\phi(c_0)$:

$$T_\phi \mathcal{D}_{c_0} = \big\{\mathrm{Conv}(\phi(c_0), m) \ \big|\ m \in \mathbb{R}^{p \times d}\big\}. \qquad (6)$$

The geodesics of $\mathcal{D}_{c_0}$ are the curves $t \to \phi_t$ of constant kinetic energy (see equation (3)), i.e. such that the corresponding control point and momentum trajectories $t \to (c_t, m_t)$ satisfy the Hamiltonian equations (4). We define the exponential operator on $\mathcal{D}_{c_0}$:

$$\mathrm{Exp}^{t_0, t}_\phi : v_0 \in T_\phi \mathcal{D}_{c_0} \to \phi_t \in \mathcal{D}_{c_0} \qquad (7)$$

where $\phi_t$ is the diffeomorphism reached at time $t$ by the geodesic path obtained by integration from some reference time $t_0 \in \mathbb{R}$ with initial conditions $\phi(c_0)$, $m_0$ such that $v_0 = \mathrm{Conv}(\phi(c_0), m_0)$, and $\phi_{t_0} = \phi$. The momentum vector $m_0$ is the dual of the velocity field $v_0$. The particular case $\mathrm{Exp}^{0,1}_\phi$ corresponds to the usual Riemannian exponential map and will be noted $\mathrm{Exp}_\phi$. Diffeomorphisms $\phi \in \mathcal{D}_{c_0}$ act on shapes $y$ of the ambient space through the action $\star$ previously defined. Let $y_0$ be a reference shape. We define its orbit under the action $\star$:

$$\mathcal{S}_{y_0, c_0} = \mathcal{D}_{c_0} \star y_0 = \{\phi \star y_0 \ |\ \phi \in \mathcal{D}_{c_0}\}. \qquad (8)$$

$\mathcal{S}_{y_0, c_0}$ is a submanifold of the extrinsic shape space $E$ in which the distance $d_E$ is defined.

2.3 Positioning a shape with respect to a dynamic atlas

Instead of positioning shapes with respect to a static atlas $y_0$, we now aim to position shapes with respect to a shape geodesic $t \to \gamma(t) \star y_0$, where $\gamma$ is a geodesic of $\mathcal{D}_{c_0}$ of the form $\gamma(t) = \mathrm{Exp}^{t_0, t}_{\mathrm{Id}}(v_0)$ with $v_0 = \mathrm{Conv}(c_0, m_0)$. Similarly to cylindrical coordinates in Euclidean spaces, under some conditions (see [30, 43]) a shape $y \in \mathcal{S}_{y_0, c_0}$ admits a unique spatiotemporal coordinate, also known as Fermi coordinates, $t \in \mathbb{R}$ and $v \in T_{\gamma(t_0)}\mathcal{D}_{c_0}$ such that $v \perp \dot{\gamma}(t_0)$:

$$y = \mathrm{ExpP}^v_\gamma(t) \star y_0 \quad \text{with} \quad \mathrm{ExpP}^v_\gamma(t) = \mathrm{Exp}_{\gamma(t)}\big(P^v_\gamma(t)\big) \qquad (9)$$

where $P^v_\gamma(t)$ denotes the parallel transport of $v$ along $\gamma$ from $t_0$ to $t$. The curves $\gamma$ and $\eta : t \to \mathrm{ExpP}^v_\gamma(t)$ are said to be exp-parallel, and the mapping $\gamma \to \eta$ is called exp-parallelization along $v$ [52, 53]. In other words, a choice of $(y_0, c_0, m_0, t_0)$ defines a spatiotemporal reference frame, with respect to which a shape $y$ can be unambiguously positioned in terms of a time $t$ and a velocity field $v$ orthogonal to $v_0 = \dot{\gamma}(t_0)$. The time $t$ is the temporal component of the coordinate, which positions the shape along the reference trajectory given by the direction $v_0$. The velocity $v$ is the spatial component of the coordinate, which positions the shape in the hyperplane that is orthogonal to $v_0$. This decomposition can also be understood as the orthogonal projection of $y$ onto the one-dimensional submanifold $\gamma \star y_0$, hence the condition $v \perp v_0$.


Figures 1 and 2 illustrate this concept of spatiotemporal reference frame, in which any shape $y$ admits the coordinates $(t, v)$. Note that the time-point $t_0$ does not play any particular role, in the sense that $y$ can be described in the same manner for any other choice $t_0'$; a one-to-one transformation of the spatiotemporal reference frame can actually be derived as:

$$t' = t + t_0' - t_0 \quad \text{and} \quad v' = P^v_\gamma(t_0'). \qquad (10)$$

In general, the target shape $y$ might not exactly belong to $\mathcal{S}_{y_0, c_0}$. Similarly to the static atlas case, equation (9) is relaxed and we call solution to the longitudinal registration problem the pair $(t, v)$ such that $y_0$ is warped as close as possible to $y$, in the sense of the extrinsic metric $d_E$.

3 Statistical model for longitudinal data sets of shapes

3.1 Hierarchical generative model

Let $\{y_{i,j}, t_{i,j}\}_{i,j}$ be a longitudinal data set of shapes, which is the collection of repeated individual measurements $y_{i,1}, ..., y_{i,n_i}$ for $i = 1, ..., n$, where each shape $y_{i,j}$ corresponds to a time $t_{i,j} \in \mathbb{R}$. Measurements are considered as sample points along individual trajectories, which are in turn considered exp-parallel to a reference geodesic curve, therefore having a constant spatial coordinate in the spatiotemporal reference frame centered around this reference geodesic. Noting $(y_0, c_0, m_0, t_0)$ the parameters of the spatiotemporal coordinate system, $v_0 = \mathrm{Conv}(c_0, m_0)$ and $\gamma : t \to \mathrm{Exp}^{t_0, t}_{c_0}(v_0)$ the reference geodesic, the statistical model writes:

$$\mathrm{ExpP}^{v_i}_\gamma\big(\psi_i(t_{i,j})\big) \star y_0 \ \overset{\text{iid}}{\sim}\ \mathcal{N}_E\big(y_{i,j},\ \sigma_\epsilon^2\big), \quad \text{where} \quad \psi_i : t \to \alpha_i \cdot (t - \tau_i) + t_0, \quad v_i = \mathrm{Conv}(c_0, m_i), \quad m_i = A_{0, m_0^\perp} \cdot s_i,$$
$$\text{and} \quad \alpha_i \overset{\text{iid}}{\sim} \mathcal{N}_{[0, +\infty[}(1, \sigma_\alpha^2), \qquad \tau_i \overset{\text{iid}}{\sim} \mathcal{N}(t_0, \sigma_\tau^2), \qquad s_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1) \qquad (11)$$

where the noise distribution $\mathcal{N}_E(\mu, \sigma_\epsilon^2)$ is defined such that the likelihood is proportional to $p(y) \propto \exp\big(-d_E(y, \mu)^2 / 2\sigma_\epsilon^2\big)$. Model (11) is hierarchical in the sense that individual trajectories $t \to \mathrm{ExpP}^{v_i}_\gamma \circ \psi_i(t)$ are independently defined as spatiotemporal transformations of a common, population-level geodesic $t \to \gamma(t)$.

The time-warp functions $\psi_i$ encode the temporal variability of the observed individual trajectories in terms of pace of progression $\alpha_i$ and onset time $\tau_i$. They map the index $t_{i,j}$ of the $j$-th shape of the $i$-th individual (e.g. the age of the subject at a given visit) to a time-point $\psi_i(t_{i,j})$ on the reference geodesic (e.g. the disease stage).

The spatial variability is encoded by the space-shifts $v_i \in v_0^\perp \subset T_{\gamma(t_0)}\mathcal{D}_{c_0}$ along which $\gamma$ is exp-parallelized. Those space-shifts admit dual representations under the form of the momenta $m_i$, which are assumed to derive from $q$ source parameters $s_i = (s_i^{(1)}, ..., s_i^{(q)})$, in the spirit of independent component analysis (ICA) [31]. The orthogonality $v_i \perp v_0$, necessary for the identifiability of the model, is ensured by the projection of each column of the $(p \cdot d) \times q$ mixing matrix $A_0$ onto the hyperplane $m_0^\perp$ of $\mathbb{R}^{p \times d}$ for the cometric $G_{c_0}$, noted $A_{0, m_0^\perp}$. The individual parameters are modeled as independent samples from normal distributions:

– a truncated normal distribution with fixed mean for the acceleration factor $\alpha_i$, allowing the identifiability of $m_0$;

– a normal distribution for the onset time $\tau_i$;

– a normal distribution with fixed mean and variance for the sources $s_i$, allowing the identifiability of $y_0$ and $A_{0, m_0^\perp}$ respectively.

These parameters define individual trajectories as random spatiotemporal transformations of a common reference trajectory. The spatial and temporal transformations commute, in the sense that for all $t \in \mathbb{R}$, $\mathrm{ExpP}^{v_i}_\gamma\big(\psi_i(t)\big) = \mathrm{ExpP}^{v_i}_{\gamma \circ \psi_i}(t)$. The population trajectory is fully parameterized by the template shape $y_0$, the control points $c_0$, the momenta $m_0$ and the reference time $t_0$. The individual variability is unambiguously represented by two reduced sets of scalar parameters: the acceleration $\alpha_i$ and the onset time $\tau_i$ for the temporal part, and the sources $s_i^{(1)}, ..., s_i^{(q)}$ for the spatial part. In practice, it is possible to choose a number of sources $q \ll p \times d$, much lower than the dimension of the tangent-space $T_{\gamma(t_0)}\mathcal{D}_{c_0}$, while still capturing most of the geometrical variability in the data.
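As an illustration of the random-effects layer of model (11), the short sketch below (purely illustrative values; the truncated normal is sampled by simple rejection) draws individual parameters $(\alpha_i, \tau_i, s_i)$ and applies the time-warp $\psi_i$ to a few hypothetical visit ages:

```python
import numpy as np

rng = np.random.default_rng(0)
t0, sigma_tau, sigma_alpha, q = 70.0, 2.0, 0.2, 4   # illustrative fixed effects

def sample_truncated_normal(mean, std):
    """Rejection sampling of N_[0,+inf[(mean, std^2)."""
    while True:
        x = rng.normal(mean, std)
        if x >= 0.0:
            return x

def sample_individual():
    alpha = sample_truncated_normal(1.0, sigma_alpha)   # acceleration factor
    tau = rng.normal(t0, sigma_tau)                     # onset time
    s = rng.normal(0.0, 1.0, size=q)                    # ICA-like sources
    return alpha, tau, s

def time_warp(t, alpha, tau):
    """psi_i(t) = alpha_i * (t - tau_i) + t0."""
    return alpha * (t - tau) + t0

alpha_i, tau_i, s_i = sample_individual()
ages = np.array([68.0, 69.5, 71.0])                     # hypothetical visit ages t_{i,j}
print("time-warped ages:", time_warp(ages, alpha_i, tau_i))
```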

3.2 Mixed-effects and Bayesian modeling

We further specify the formulation of the model (11) to fit the framework of mixed-effects models. We distinguish:

– the fixed effects $\theta = (\theta_1, \theta_2)$ with $\theta_1 = (t_0, \sigma_\tau, \sigma_\alpha, \sigma_\epsilon)$ and $\theta_2 = (y_0, c_0, m_0, A_0)$, also called the model parameters,

– the random effects $z = (z_i)_i$ where $z_i = (\alpha_i, \tau_i, s_i)$.

We choose to work in a Bayesian framework, in order to theoretically ensure the existence of the maximum a posteriori (MAP) estimate of the parameters $\theta^m$. Such priors also regularize and guide the estimation procedure thanks to reasonable and mild prior assumptions on the optimal fixed-effects values. The following standard conjugate distributions are selected as Bayesian priors on the model parameters:

$$t_0 \sim \mathcal{N}(\bar{t}_0, \varsigma_t^2), \qquad y_0 \sim \mathcal{N}(\bar{y}_0, \varsigma_y^2), \qquad \sigma_\tau^2 \sim \mathcal{IG}(m_\tau, \varsigma_\tau^2), \qquad c_0 \sim \mathcal{N}(\bar{c}_0, \varsigma_c^2),$$
$$\sigma_\alpha^2 \sim \mathcal{IG}(m_\alpha, \varsigma_\alpha^2), \qquad m_0 \sim \mathcal{N}(\bar{m}_0, \varsigma_m^2), \qquad \sigma_\epsilon^2 \sim \mathcal{IG}(m_\epsilon, \varsigma_\epsilon^2), \qquad A_0 \sim \mathcal{N}(\bar{A}_0, \varsigma_A^2),$$

where $\mathcal{IG}(., .)$ denotes the inverse-gamma distribution.


4 Algorithms: calibration, personalization, simulation

4.1 Objectives

Given a longitudinal data set of shapes $\{y_{i,j}, t_{i,j}\}_{i,j}$, which we may note more compactly $\{y, t\}$, we formulate three algorithmic objectives:

– Calibration, which consists in computing the MAP parameters $\theta^m$, unconditionally to any random effect $z$:

$$\theta^m = \mathrm{argmax}_\theta \int p\big(\{y\}, z, \theta \,;\, \{t\}\big) \cdot dz. \qquad (12)$$

– Personalization, which consists in computing the MAP random effects $z^m$ that best represent some longitudinal shape data set $\{y, t\}$ (which may or may not be the one used for calibration), given the calibrated model $\theta^m$:

$$z^m = \mathrm{argmax}_z\, p\big(\{y\}, z, \theta^m \,;\, \{t\}\big). \qquad (13)$$

– Simulation, which consists in generating a new data set $\{y^s\}$ that resembles the original data set $\{y\}$.

We now give the details of the algorithms used to solve these optimization problems. Their implementation is freely available in the software Deformetrica (install instructions and documentation at www.deformetrica.org).

4.2 Computation of the complete log-likelihood

Evaluating the joint log-likelihood $\log p(\{y\}, z, \theta \,;\, \{t\}) = \sum_{i=1}^n \sum_{j=1}^{n_i} \log p(y_{i,j}, z_i, \theta \,;\, t_{i,j})$ for some set of parameters $\theta$ and random effects $z = (z_i)_i$ is central for both the calibration and personalization algorithms. The computationally most intensive part is the computation of the conditional log-likelihood $\log p(y_{i,j} \,|\, z_i, \theta \,;\, t_{i,j})$, which amounts to synthesizing the candidate data for the current values of the fixed and random effects $(\theta, z_i)$ and measuring its discrepancy with the true observation $y_{i,j}$. The synthesis of the data follows the generative model introduced in Section 2 and essentially requires the integration of ordinary differential equations. Algorithm 1 details the procedure, where $|E|$ denotes the dimension of the extrinsic shape space $E$, and $F(.)$ the cumulative distribution function of the standard Gaussian. The "source index" refers to the ICA components.

Algorithm 1: Compute the complete log-likelihood.

input : Longitudinal data set of shapes $\{y, t\} = \{y_{i,j}, t_{i,j}\}_{i,j}$.
        Population parameters $\theta = (y_0, c_0, m_0, A_0, t_0, \sigma_\tau, \sigma_\alpha, \sigma_\epsilon)$.
        Individual parameters $z = (z_i)_i$ with $z_i = (\alpha_i, \tau_i, s_i)$.
output: The complete log-likelihood $Q = \log p(\{y\}, z, \theta \,;\, \{t\})$.

Set $Q = 0$.  // initialization
/* compute the squared residuals $\epsilon^2_{i,j}$ for each visit */
Compute the initial velocity field $v_0 = \mathrm{Conv}(c_0, m_0)$.
Compute the geodesic $\gamma : t \to \mathrm{Exp}^{t_0, t}_{\mathrm{Id}}(v_0)$.  // see [22]
for the source index $l = 1$ to $q$:
    Compute the $l$-th column of $A_{0, m_0^\perp}$, projecting $\mathrm{Col}_l(A_0)$ on $m_0^\perp$.
    Compute the initial velocity field $w_l = \mathrm{Conv}\big(c_0, \mathrm{Col}_l(A_{0, m_0^\perp})\big)$.
    Compute the parallel transport $w_l : t \to P^{w_l}_\gamma(t)$.  // see [41]
end
for the individual index $i = 1$ to $n$:
    for the visit index $j = 1$ to $n_i$:
        Compute the time-warped age $\psi_{i,j} = \alpha_i \cdot (t_{i,j} - \tau_i) + t_0$.
        Compute the initial velocity field $v_{i,j} = \sum_{l=1}^{q} s_i^{(l)} \cdot w_l(\psi_{i,j})$.
        Compute $\phi_{i,j} = \mathrm{Exp}_{\gamma(\psi_{i,j})}(v_{i,j}) \circ \gamma(\psi_{i,j})$.  // see [22]
        Compute the squared residual $\epsilon^2_{i,j} = d_E(y_{i,j}, \phi_{i,j} \star y_0)^2$.
        /* add the model log-likelihood $\log p(y_{i,j} \,|\, z_i, \theta \,;\, t_{i,j})$ */
        Update $Q \leftarrow Q - \frac{1}{2}\big[|E| \cdot \log \sigma_\epsilon^2 + \epsilon^2_{i,j} / \sigma_\epsilon^2\big]$.
    end
    /* add the random effects log-likelihood $\log p(z_i \,|\, \theta)$ */
    Update $Q \leftarrow Q - \frac{1}{2}\big[\log \sigma_\tau^2 + (\tau_i - t_0)^2/\sigma_\tau^2 + \|s_i\|^2_{\ell_2} + \log \sigma_\alpha^2 + \log\big(1 - F(-1/\sigma_\alpha)\big)^2 + (\alpha_i - 1)^2/\sigma_\alpha^2\big]$.
end
/* add the Bayesian prior log-likelihood $\log p(\theta)$ */
Update $Q \leftarrow Q - \frac{1}{2}\big[(t_0 - \bar{t}_0)^2/\varsigma_t^2 + m_\tau(\log \sigma_\tau^2 + \varsigma_\tau^2/\sigma_\tau^2) + m_\alpha(\log \sigma_\alpha^2 + \varsigma_\alpha^2/\sigma_\alpha^2) + m_\epsilon(\log \sigma_\epsilon^2 + \varsigma_\epsilon^2/\sigma_\epsilon^2)\big]$.  // $\log p(\theta_1)$
Update $Q \leftarrow Q - \frac{1}{2}\big[\|y_0 - \bar{y}_0\|^2_{\ell_2}/\varsigma_y^2 + \|c_0 - \bar{c}_0\|^2_{\ell_2}/\varsigma_c^2 + \|m_0 - \bar{m}_0\|^2_{\ell_2}/\varsigma_m^2 + \|A_0 - \bar{A}_0\|^2_{\ell_2}/\varsigma_A^2\big]$.  // $\log p(\theta_2)$

4.3 Calibration

4.3.1 Initialization procedure for model calibration

A good choice of initial parameters $\theta[0]$ and latent variables $z[0]$ improves the convergence speed of the calibration algorithm. We propose in this section an initialization procedure that combines several elementary shape analysis tools. Given a longitudinal data set of shapes $\{y_{i,j}, t_{i,j}\}_{i,j}$:

1. estimate a Bayesian atlas model (see [28]) from the baseline shapes $\{y_{i,1}\}_i$, to get an approximate population-level average geometry $(y_0', c_0')$ as well as $n$ space-shift momenta $m_i'$ mapping this geometry to the baseline observations, and an estimate of the noise level $\sigma_\epsilon'$;

2. for $i = 1, ..., n$, estimate a geodesic regression model (see [26]) from the individual time-series $\{y_{i,j}\}_j$, then parallel transport (see [41]) the computed individual initial momenta back to the mean geometry $(y_0', c_0')$ along the corresponding space-shift $m_i'$, and finally compute the Euclidean average of those $w_i'$ to get an approximate population-level mean momenta $m_0' = \langle w_i' \rangle_i$;

3. for $i = 1, ..., n$, initialize the individual temporal parameters with $\tau_i = \langle t_{i,j} \rangle_j$, $\alpha_i^2 = \frac{w_i' \cdot G_{c_0'} \cdot m_0'}{m_0' \cdot G_{c_0'} \cdot m_0'}$ if this value is positive and $\alpha_i = 1$ otherwise, then compute $\sigma_\tau'$ and $\sigma_\alpha'$ according to equations (16) and (17) respectively (a small sketch of this rule is given after this list);

4. solve a standard ICA problem with $q$ components from the collection of space-shift momenta $w_{i, m_0'^\perp}'$, preliminarily projected on the orthogonal space to $m_0'$, and set $A_0'$ as the estimated mixing matrix;

5. shoot forward the mean geometry $(y_0', c_0')$ in the direction $m_0'$ with length $t_0'' - t_0'$, where $t_0' = \langle t_{i,1} \rangle_i$ and $t_0'' = \langle t_{i,j} \rangle_{i,j}$, to get longitudinally centered estimates $(c_0'', y_0'', m_0'')$, and parallel-transport the $q$ columns of $A_0'$ along the same geodesic to obtain $A_0''$;

6. personalize the model given by the initial parameters $\theta[0] = (y_0'', c_0'', m_0'', A_0'', t_0'', \sigma_\tau', \sigma_\alpha', \sigma_\epsilon')$ to obtain $z[0]$.
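The following small sketch illustrates step 3 above under simplifying assumptions (a toy 2D configuration, kernel width 1; all values are illustrative): the initial pace of progression is read from the component of the transported individual momenta $w_i'$ along the mean momenta $m_0'$, measured with the kernel matrix $G_{c_0'}$ of equation (3).

```python
import numpy as np

def kernel_matrix(c, sigma=1.0):
    """Gaussian kernel matrix G_c of equation (3)."""
    sq_dist = ((c[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / sigma ** 2)

def initial_acceleration(w_i, m_0, c_0):
    """alpha_i^2 = (w_i' . G m_0') / (m_0' . G m_0'), falling back to 1 if non-positive."""
    G = kernel_matrix(c_0)
    ratio = float(np.sum(w_i * (G @ m_0))) / float(np.sum(m_0 * (G @ m_0)))
    return np.sqrt(ratio) if ratio > 0.0 else 1.0

rng = np.random.default_rng(0)
c_0, m_0 = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))   # toy initial estimates
w_i = m_0 + 0.1 * rng.normal(size=(5, 2))                     # toy transported momenta
print("initial alpha_i:", round(initial_acceleration(w_i, m_0, c_0), 3))
```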

4.3.2 The MCMC-SAEM-GD algorithm

Calibration is a computationally intensive task for mainly two reasons. First, the optimized variable $\theta$ is of high dimension $|\theta| = 4 + |y_0| + d \cdot p \cdot (2 + q)$, where $d$ is the dimension of the ambient space, $p$ the number of control points, $q$ the number of sources, and $|y_0|$ the number of vertices necessary to describe the template mesh. Second, the optimized function requires the computation of an integral over the latent variables. The term $p(\{y\}, z, \theta \,;\, \{t\})$ can only be evaluated for some given random-effect values $z$, by solving sets of ordinary differential equations (see Algorithm 1). In this paper, we propose to address this computational challenge by combining the MCMC-SAEM algorithm with gradient descent (GD). The backbone of this algorithm is the SAEM algorithm [18], which is a stochastic approximation (SA) of the classical expectation-maximization (EM) algorithm [19]: a stochastic simulation step $z[k] \sim p(z \,|\, \{y\}, \theta[k-1] \,;\, \{t\})$ of the latent variables is alternated with a deterministic update of the model parameters $\theta[k] \leftarrow \theta^\star(z[k])$.

Algorithm 2: Calibration with MCMC-SAEM-GD.

input : Dataset $y$. Initial parameters $\theta[0]$ and $z[0]$.
        Sequence of step-sizes $(\rho[k])_k$. Sampling variances $(\sigma_{(b)})_b$.
output: Estimation of $\theta^m \approx \theta[k]$.

Set $k = 0$ and $S_1[0] = S_1(z[0])$.  // initialization, eq. (14)
repeat
    Set $k \leftarrow k + 1$.
    /* block Gibbs symmetric random walk sampling */
    foreach random variable $z^{(b)}$ in $(z^{(1)}, z^{(2)}, z^{(3)}) = (\alpha, \tau, s)$ do
        Draw a candidate $z^{(b)} \sim \mathcal{N}\big(z[k-1]^{(b)}, \sigma_b^2\big)$.
        Let $z = \big(z[k]^{(1)}, ..., z[k]^{(b-1)}, z^{(b)}, z[k-1]^{(b+1)}, ...\big)$.
        Compute the ratio $\omega = \log \frac{p(\{y\}, z, \theta[k-1] \,;\, \{t\})}{p(\{y\}, z[k-1], \theta[k-1] \,;\, \{t\})}$.  // alg. 1
        Draw $u$ according to the uniform distribution $u \sim \mathcal{U}(0, 1)$.
        if $\log u < \omega$ then $z[k]^{(b)} \leftarrow z^{(b)}$ else $z[k]^{(b)} \leftarrow z[k-1]^{(b)}$.
    end
    Adapt the proposal variances $(\sigma_{(b)})_b$.  // see [5]
    /* analytical update rule for $\theta_1$ (classical SAEM) */
    Set $S_1[k] \leftarrow S_1[k-1] + \rho[k] \cdot \big(S_1(z[k]) - S_1[k-1]\big)$.  // eq. (14)
    Set $\theta_1[k] \leftarrow \theta_1^\star(S[k])$.  // eqs. (15)-(18)
    /* gradient-descent-based update heuristic for $\theta_2$ */
    Solve $\theta_2^\star = \mathrm{argmax}_{\theta_2}\, p(\{y\}, z[k], \theta_1[k], \theta_2 \,;\, \{t\})$ by GD.  // alg. 1
    Set $\theta_2[k] \leftarrow \theta_2[k-1] + \rho[k] \cdot \big(\theta_2^\star - \theta_2[k-1]\big)$.  // heuristic
until convergence

In [37], the authors introduce the MCMC-SAEM algorithm, where the simulation step is replaced by a Markov chain Monte Carlo (MCMC) step while still preserving the theoretical convergence properties. In this paper, an analytical update rule $\theta^\star$ cannot be found for all the parameters $\theta$: we use a gradient descent approach to overcome this difficulty, and we name MCMC-SAEM-GD the resulting global algorithm. Algorithm 2 gives a high-level pseudo-code of the proposed procedure. The sufficient statistics write:

$$S_t = \frac{1}{n} \sum_{i=1}^{n} \tau_i, \qquad S_\alpha = \frac{1}{n} \sum_{i=1}^{n} (\alpha_i - 1)^2, \qquad S_\tau = \frac{1}{n} \sum_{i=1}^{n} \tau_i^2, \qquad S_\epsilon = \frac{1}{|E| \cdot n \cdot \langle n_i \rangle_i} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \epsilon^2_{i,j}, \qquad (14)$$

where $\epsilon^2_{i,j} = d_E\big(y_{i,j},\, \mathrm{ExpP}^{v_i}_\gamma(\psi_i(t_{i,j})) \star y_0\big)^2$ and $\langle n_i \rangle_i$ is the average number of longitudinal observations per subject. The update rules write:

$$t_0^\star = \Big[\varsigma_t^2\, S_t + \frac{\sigma_\tau^{\star 2}}{n}\, \bar{t}_0\Big] \cdot \Big[\varsigma_t^2 + \frac{\sigma_\tau^{\star 2}}{n}\Big]^{-1} \qquad (15)$$

$$\sigma_\tau^\star = \Big[S_\tau - 2\, t_0^\star S_t + t_0^{\star 2} + \frac{m_\tau}{n}\, \varsigma_\tau^2\Big]^{\frac{1}{2}} \cdot \Big[1 + \frac{m_\tau}{n}\Big]^{-\frac{1}{2}} \qquad (16)$$

$$\sigma_\alpha^\star = \Big[S_\alpha + \frac{m_\alpha}{n}\, \varsigma_\alpha^2\Big]^{\frac{1}{2}} \cdot \Big[1 - \frac{f(-1/\sigma_\alpha^\star)/\sigma_\alpha^\star}{1 - F(-1/\sigma_\alpha^\star)} + \frac{m_\alpha}{n}\Big]^{-\frac{1}{2}} \qquad (17)$$

$$\sigma_\epsilon^\star = \Big[S_\epsilon + \frac{m_\epsilon}{|E|\, n\, \langle n_i \rangle_i}\, \varsigma_\epsilon^2\Big]^{\frac{1}{2}} \cdot \Big[1 + \frac{m_\epsilon}{|E|\, n\, \langle n_i \rangle_i}\Big]^{-\frac{1}{2}} \qquad (18)$$

where $f(.)$ is the probability density function of the standard normal distribution. Both the coupled set of equations (15)-(16) and the implicit equation (17) can easily be solved by iterative updates. Equation (18) is closed-form.
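For concreteness, here is a rough numerical sketch of the sufficient-statistics accumulation (14) and of the fixed-point iterations used for equations (15)-(17); all hyper-parameter values below are arbitrary and only serve to make the sketch runnable.

```python
import numpy as np
from scipy.stats import norm

def sufficient_statistics(alphas, taus, sq_residuals, dim_E, mean_ni):
    """Equation (14)."""
    n = len(taus)
    S_t = taus.mean()
    S_alpha = ((alphas - 1.0) ** 2).mean()
    S_tau = (taus ** 2).mean()
    S_eps = sq_residuals.sum() / (dim_E * n * mean_ni)
    return S_t, S_alpha, S_tau, S_eps

def update_t0_sigma_tau(S_t, S_tau, n, t0_bar, var_t, m_tau, var_tau, n_iter=50):
    """Iteratively solve the coupled equations (15)-(16)."""
    t0, sigma_tau = t0_bar, 1.0
    for _ in range(n_iter):
        t0 = (var_t * S_t + (sigma_tau ** 2 / n) * t0_bar) / (var_t + sigma_tau ** 2 / n)
        sigma_tau = np.sqrt((S_tau - 2 * t0 * S_t + t0 ** 2 + (m_tau / n) * var_tau)
                            / (1.0 + m_tau / n))
    return t0, sigma_tau

def update_sigma_alpha(S_alpha, n, m_alpha, var_alpha, n_iter=50):
    """Iteratively solve the implicit equation (17)."""
    sigma_alpha = 0.2
    for _ in range(n_iter):
        x = -1.0 / sigma_alpha
        correction = 1.0 - (norm.pdf(x) / sigma_alpha) / (1.0 - norm.cdf(x))
        sigma_alpha = np.sqrt((S_alpha + (m_alpha / n) * var_alpha)
                              / (correction + m_alpha / n))
    return sigma_alpha

# Toy usage on simulated random effects:
rng = np.random.default_rng(0)
n = 100
alphas, taus = rng.normal(1.0, 0.2, n), rng.normal(70.0, 2.0, n)
S_t, S_alpha, S_tau, _ = sufficient_statistics(alphas, taus, rng.random(700), 60, 7)
print(update_t0_sigma_tau(S_t, S_tau, n, t0_bar=70.0, var_t=25.0, m_tau=1.0, var_tau=4.0))
print(update_sigma_alpha(S_alpha, n, m_alpha=1.0, var_alpha=0.04))
```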

4.3.3 Implementation details

The sequence of $\rho[k]$ required by Algorithm 2 is chosen to be constantly equal to 1 in a preliminary "burn-in" phase of the calibration procedure, and then decreases across the iterations with an exponential decay. The fanning numerical scheme is used to compute the parallel transport along geodesics in a scalable manner [41, 42, 62]. A block Metropolis-Hastings-within-Gibbs approach is used for the MCMC sampling step, where the variables $\alpha_i$, $\tau_i$ and $s_i$ are successively sampled. Several transition kernels can be chained in order to decrease the correlation between $z[k-1]$ and $z[k]$. Proposal variances are dynamically adapted during the iterations to ensure that the acceptance rates remain close to 30% [5]. The optimization problem for the update of $\theta_2$ is solved by steepest gradient descent. The gradients of the complete log-likelihood are obtained by autodifferentiation using the PyTorch library. The PyKeops library [13] implements smart autodifferentiation methods for convolution-intensive computations to avoid memory overflows.
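The following toy sketch (not the Deformetrica code) illustrates the kind of symmetric random-walk Metropolis-Hastings step and proposal-variance adaptation used here, on a stand-in log-posterior; the adaptation rule and its constants are illustrative, only the 30% acceptance target comes from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_block_step(z, log_post, proposal_std):
    """One symmetric random-walk proposal on a block z; returns (z, accepted)."""
    candidate = z + proposal_std * rng.standard_normal(z.shape)
    log_ratio = log_post(candidate) - log_post(z)
    if np.log(rng.uniform()) < log_ratio:
        return candidate, True
    return z, False

def adapt_std(proposal_std, acceptance_rate, target=0.3, rate=0.1):
    """Increase the proposal std if we accept too often, decrease it otherwise."""
    return proposal_std * np.exp(rate * (acceptance_rate - target))

# Toy target: standard normal posterior on a 2D block of random effects.
log_post = lambda z: -0.5 * float(z @ z)
z, std, accepted = np.zeros(2), 1.0, 0
for k in range(1, 2001):
    z, ok = mh_block_step(z, log_post, std)
    accepted += ok
    if k % 50 == 0:                       # adapt every 50 iterations
        std = adapt_std(std, accepted / 50.0)
        accepted = 0
print("final proposal std:", round(std, 2))
```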

4.4 Personalization

Once the model is calibrated using a training data set, any individual data sequence $\{y_{i,j}, t_{i,j}\}_j$ can be reconstructed by fitting the model (whether this sequence was part of the training set or not, i.e. $i \leq n$ or $i > n$ respectively). This procedure, called here personalization, consists in solving the optimization problem defined by equation (13). Note that all individuals can be treated independently, i.e. equation (13) is equivalent to solving several sub-problems of the form $z_i^m = \mathrm{argmax}_{z_i} \log p(\{y_{i,j}\}_j, z_i, \theta^m \,;\, \{t_{i,j}\}_j)$. The computed optimal latent variables $z_i^m$ give in turn the spatiotemporal coordinates of the individual trajectory in the reference frame of the calibrated model $\theta^m$. We use the L-BFGS optimization method [38], where gradients are automatically computed using the PyTorch autodifferentiation library.
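A minimal sketch of this personalization step is given below. It assumes a calibrated model is available through a differentiable reconstruction function; `reconstruct` is a hypothetical stand-in (a toy linear model) for the ExpP-based synthesis of Algorithm 1, and only the use of torch.optim.LBFGS with autodifferentiated gradients reflects the actual procedure.

```python
import torch

def reconstruct(alpha, tau, sources, ages):
    # Hypothetical placeholder for the model prediction ExpP_gamma^{v_i}(psi_i(t)) * y_0:
    # a toy linear model, just to make the sketch runnable end-to-end.
    psi = alpha * (ages - tau) + 70.0
    return psi[:, None] * torch.ones(1, 3) + sources.sum()

def personalize(observations, ages, n_sources=4, n_steps=50):
    alpha = torch.ones(1, requires_grad=True)
    tau = torch.full((1,), float(ages.mean()), requires_grad=True)
    sources = torch.zeros(n_sources, requires_grad=True)
    optimizer = torch.optim.LBFGS([alpha, tau, sources], max_iter=n_steps)

    def closure():
        optimizer.zero_grad()
        loss = ((reconstruct(alpha, tau, sources, ages) - observations) ** 2).sum()
        loss.backward()
        return loss

    optimizer.step(closure)
    return alpha.detach(), tau.detach(), sources.detach()

ages = torch.tensor([68.0, 69.5, 71.0])
observations = torch.randn(3, 3)           # toy "shapes" with 3 points each
print(personalize(observations, ages))
```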

4.5 Simulation

The purpose of the simulation is to take advantage of the generative nature of the model to generate an entirely synthetic data set that reproduces the characteristics of the original training data set.

Given a longitudinal data set, the calibration followed by the personalization to the training data yields a normative model of progression, a spatiotemporal coordinate system (both being encoded by the parameters $\theta^m$) and the coordinates of each individual in this reference frame (i.e. $z^m = (z_i^m)_i$). We denote $\hat{p}(z^m, \{t\})$ the empirical joint distribution of those individual parameters and of the corresponding time-indices $\{t\}$. We simulate synthetic data $\{y^s\}$ by sampling random variables from this empirical distribution, and generate data following the generative model (11).

We use statistic functions $\zeta$ (most often not sufficient, similarly to [44]) to evaluate to which extent the simulated data resemble the training data:

$$\zeta(\{y^s\}) \approx \zeta(\{y\}), \quad \text{with} \quad \{y^s\} \overset{\text{iid}}{\sim} p\big(\{y\} \,|\, z^s, \theta^m \,;\, \{t^s\}\big), \quad (z^s, \{t^s\}) \overset{\text{iid}}{\sim} \hat{p}\big(z^m, \{t\}\big). \qquad (19)$$

For visualization purposes, we may choose to ignore the calibrated noise variance $\sigma_\epsilon^m$ and replace it with the degenerate value $\sigma_\epsilon = 0$. This choice generates smoother shapes, and we will call such simulations "without noise".
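The sketch below illustrates this simulation step under simplifying assumptions: individuals are resampled with replacement from the empirical distribution of the personalized random effects and visit ages, then pushed through a stand-in generative function (`toy_generate` is a hypothetical replacement for the actual ExpP-based shape synthesis).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(individuals, generate_shape, noise_std=0.0, n_simulated=100):
    """individuals: list of dicts with keys 'alpha', 'tau', 'sources', 'ages'."""
    simulated = []
    for _ in range(n_simulated):
        picked = individuals[rng.integers(len(individuals))]   # draw from the empirical distribution
        shapes = [generate_shape(picked["alpha"], picked["tau"],
                                 picked["sources"], age) for age in picked["ages"]]
        if noise_std > 0.0:                                     # optional observation noise
            shapes = [s + noise_std * rng.standard_normal(s.shape) for s in shapes]
        simulated.append({"ages": picked["ages"], "shapes": shapes})
    return simulated

# Toy usage with a linear stand-in for the shape model:
toy_generate = lambda a, t, s, age: (a * (age - t)) * np.ones((10, 2)) + s.mean()
individuals = [{"alpha": rng.normal(1, 0.2), "tau": rng.normal(70, 2),
                "sources": rng.normal(size=4), "ages": np.array([68.0, 70.0, 72.0])}
               for _ in range(50)]
print(len(simulate(individuals, toy_generate, noise_std=0.02)))
```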

5 Experiments

5.1 Validation on synthetic shape data

In this section, the calibration, personalization and simulation algorithms are validated on a synthetic data set in 2D. The simulation algorithm is first used in Section 5.1.1 to generate a synthetic shape data set from a chosen ground truth model. We then use the calibration method to infer the model parameters from the synthetic data set. The performance and the stability of the calibration algorithm are evaluated in various settings in Section 5.1.2. The calibrated model is personalized in Section 5.1.3, and the learned individual parameters are compared to the true values. Eventually, we re-simulate a synthetic data set from the calibrated model, and assess in Section 5.1.4 to which extent this new synthetic data set has statistics similar to those of the original data set.

5.1.1 Simulating synthetic data from a ground truth model

We choose values of the fixed effects $\theta = (y_0, c_0, m_0, A_0, t_0, \sigma_\tau, \sigma_\alpha, \sigma_\epsilon)$, which specify a normal distribution of shape trajectories. The chosen geometrical parameters $y_0, c_0, m_0, A_0$ are shown in Figure 3. In addition, we choose $t_0 = 0$, $\sigma_\tau = 2$, $\sigma_\alpha = 0.2$ and $\sigma_\epsilon \in \{0.00, 0.01, 0.02, 0.03, 0.05\}$.

We use the generative model to simulate a total of $n \in \{50, 100, 200\}$ individual trajectories and to sample them at several time-points $\{t_{i,j}\}_{j=1}^{n_i}$. We draw the number of observations for each individual $n_i \geq 2$ according to a shifted Poisson distribution with parameter $E(n_i) \in \{3, 5, 7, 9\}$.


Fig. 3: Visualization of the parameters $\theta_2 = (y_0, c_0, m_0, A_0)$. The template shape $y_0$ is in solid black, the control points $c_0$ are the five dot points in either blue or red, the momenta $m_0$ are the bold blue arrows, and the four columns of $A_0$ are the bold red arrows. The velocity fields corresponding to the momenta $m_0$ or to the geometrical components of $A_0$ are respectively represented with light blue or light red arrows. The green dots and arrows on the top figure mark the four landmark positions that will be considered for the statistic $\zeta$, in order to validate the simulation algorithm in Section 5.1.4.


We finally impose that the individual time-points $\{t_{i,j}\}_j$ are uniformly distributed in the observation interval $[t_{i,1}, t_{i,n_i}] = [t_{i,0} - \Delta t_i/2,\ t_{i,0} + \Delta t_i/2]$, where both the observation time window $\Delta t_i = t_{i,n_i} - t_{i,1}$ and the mid-point $t_{i,0}$ are drawn according to normal distributions: $\Delta t_i \overset{\text{iid}}{\sim} \mathcal{N}(E(n_i) - 2, \sigma_\tau^2)$ and $t_{i,0} \overset{\text{iid}}{\sim} \mathcal{N}(t_0, \sigma_\tau^2)$. Figure 4 displays some generated data in the reference case where $\sigma_\epsilon = 0.02$ and $E(n_i) = 7$.
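A small sketch of this visit-age sampling protocol is given below; reading the shifted Poisson as $2 + \mathrm{Poisson}(E(n_i) - 2)$, spreading the visits evenly over the window and clamping negative window widths are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
t0, sigma_tau, expected_ni = 0.0, 2.0, 7      # reference configuration

def sample_visit_ages():
    n_i = 2 + rng.poisson(expected_ni - 2)                    # shifted Poisson, n_i >= 2
    width = max(rng.normal(expected_ni - 2, sigma_tau), 0.0)  # window Delta t_i, clamped at 0
    mid = rng.normal(t0, sigma_tau)                           # mid-point t_{i,0}
    return np.linspace(mid - width / 2.0, mid + width / 2.0, n_i)

print(sample_visit_ages())
```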

5.1.2 Model calibration

The model calibration outputs are the estimated population parameters $\theta = (\theta_1, \theta_2)$ with $\theta_1 = (t_0, \sigma_\tau, \sigma_\alpha, \sigma_\epsilon)$ and $\theta_2 = (y_0, c_0, m_0, A_0)$. They are expected to be close to the MAP $\theta^m$ defined by equation (12).

Computing the MAP. Because only a finite number of data points are available and a Bayesian prior $p(\theta)$ is assumed on the parameters $\theta$, the MAP $\theta^m$ does not correspond exactly to the ground truth parameters $\theta^t$. The calibrated parameters $\theta$ are expected to converge towards the corresponding MAP parameters $\theta^m$ when the number of iterations goes to infinity – and this section experimentally verifies it – whereas the MAP parameters $\theta^m$ are known to converge towards the ground truth parameters $\theta^t$ when the number of observations goes to infinity. For each configuration of ground truth parameters $\theta^t$ and particular random sampling $z^t$ of the generative model they define, $\theta_1^m$ can be analytically computed with equations (15-18) and $\theta_2^m$ approximated by a steepest gradient descent approach initialized with $\theta^t$ and $z^t$ (see Algorithm 2). The calibration error between $\theta$ and $\theta^m$ is analyzed and discussed in detail in the rest of this section, whereas the statistical error between $\theta^m$ and $\theta^t$ is only computed in the reference configuration. Using the performance metrics introduced in the following paragraph, the second line of Table 1 (in italicized text) gives the corresponding quantitative normalized distances, which remain below 6% in all cases.

Normalized error metrics. The error for the scalar parameters $\theta_1 = (t_0, \sigma_\tau, \sigma_\alpha, \sigma_\epsilon)$ is measured by the absolute difference between the estimated and MAP values. The error for $t_0$ is normalized by the characteristic population observation window, which we define as $2 \cdot (1 + \sigma_\alpha) \cdot [E(\Delta t_i)/2 + \sigma_\tau]$. The remaining errors for $\sigma_\tau, \sigma_\alpha, \sigma_\epsilon$ are respectively normalized by the true standard deviations $\sigma_\tau = 2$, $\sigma_\alpha = 0.2$, and the noise level estimated by the Bayesian atlas model [28] computed during the initialization pipeline described in Section 4.3.1. The error on the template shape $y_0$ is assessed as the maximum point-to-point residual distance, and normalized by the conservative value of 3 spatial units for the characteristic size of the considered shape (see Figure 3). The control points $c_0$ and the momenta $m_0$ are jointly evaluated through the $\ell_2$ distance of the estimated velocity field $v_0 = \mathrm{Conv}(c_0, m_0)$ to the MAP value, normalized by the $\ell_2$ norm of this MAP velocity field.

| Configuration     | $y_0$ (%)  | $c_0, m_0$ (%) | $c_0, A_0$ (%) | $t_0$ (%)   | $\sigma_\tau$ (%) | $\sigma_\alpha$ (%) | $\sigma_\epsilon$ (%) |
| reference         | 2.5 ± 0.01 | 6.2 ± 0.10     | 2.1 ± 0.02     | 8.8 ± 0.06  | 1.7 ± 0.28        | 7.0 ± 2.73          | 7.7 ± 0.01            |
| statistical error | 0.0        | 0.4            | 0.1            | 2.5         | 5.6               | 3.0                 | 0.0                   |
| σε = 0.00         | 2.7 ± 0.01 | 5.9 ± 0.17     | 2.5 ± 0.05     | 7.4 ± 0.24  | 2.6 ± 0.60        | 8.1 ± 5.24          | 36.4 ± 0.03           |
| σε = 0.01         | 6.0 ± 0.01 | 2.5 ± 0.09     | 1.7 ± 0.01     | 2.7 ± 0.07  | 0.7 ± 0.21        | 0.8 ± 0.44          | 15.1 ± 0.03           |
| σε = 0.02         | 2.5 ± 0.01 | 6.2 ± 0.10     | 2.1 ± 0.02     | 8.8 ± 0.06  | 1.7 ± 0.28        | 7.0 ± 2.73          | 7.7 ± 0.01            |
| σε = 0.03         | 1.4 ± 0.01 | 1.9 ± 0.09     | 1.5 ± 0.03     | 4.8 ± 0.08  | 2.3 ± 0.38        | 3.8 ± 1.23          | 5.1 ± 0.01            |
| σε = 0.05         | 1.8 ± 0.02 | 4.1 ± 0.21     | 1.9 ± 0.04     | 1.5 ± 0.10  | 1.1 ± 0.34        | 0.9 ± 0.33          | 1.7 ± 0.01            |
| E(ni) = 3         | 1.6 ± 0.02 | 5.7 ± 0.29     | 2.1 ± 0.05     | 5.2 ± 0.22  | 3.9 ± 1.12        | 7.1 ± 2.03          | 6.5 ± 0.03            |
| E(ni) = 5         | 4.4 ± 0.01 | 5.9 ± 0.32     | 2.7 ± 0.03     | 1.1 ± 0.15  | 5.8 ± 0.56        | 0.7 ± 0.36          | 6.7 ± 0.02            |
| E(ni) = 7         | 2.5 ± 0.01 | 6.2 ± 0.10     | 2.1 ± 0.02     | 8.8 ± 0.06  | 1.7 ± 0.28        | 7.0 ± 2.73          | 7.7 ± 0.01            |
| E(ni) = 9         | 2.4 ± 0.01 | 3.2 ± 0.05     | 1.7 ± 0.04     | 7.2 ± 0.03  | 1.8 ± 0.11        | 2.7 ± 0.17          | 6.3 ± 0.01            |
| q = 2             | 2.5 ± 0.04 | 18.0 ± 0.28    | 51.3 ± 0.06    | 10.6 ± 0.16 | 9.3 ± 0.52        | 3.1 ± 0.84          | 172.4 ± 0.07          |
| q = 4             | 2.5 ± 0.01 | 6.2 ± 0.10     | 2.1 ± 0.02     | 8.8 ± 0.06  | 1.7 ± 0.28        | 7.0 ± 2.73          | 7.7 ± 0.01            |
| q = 6             | 2.5 ± 0.01 | 6.8 ± 0.10     | 1.9 ± 0.03     | 8.9 ± 0.07  | 1.7 ± 0.20        | 9.3 ± 1.80          | 7.0 ± 0.02            |
| n = 50            | 3.4 ± 0.01 | 3.6 ± 0.16     | 2.2 ± 0.03     | 4.7 ± 0.04  | 0.2 ± 0.16        | 2.0 ± 0.17          | 7.6 ± 0.02            |
| n = 100           | 2.5 ± 0.01 | 6.2 ± 0.10     | 2.1 ± 0.02     | 8.8 ± 0.06  | 1.7 ± 0.28        | 7.0 ± 2.73          | 7.7 ± 0.01            |
| n = 200           | 1.8 ± 0.00 | 3.9 ± 0.08     | 1.5 ± 0.02     | 2.6 ± 0.04  | 0.8 ± 0.12        | 3.7 ± 0.19          | 6.9 ± 0.01            |

Table 1: Final average normalized performance metrics and associated standard deviations, obtained after 10 independent runs of the MCMC-SAEM algorithm in varied configurations. The algorithm is run for 200 iterations in all configurations. The reference configuration corresponds to a noise level $\sigma_\epsilon = 0.02$, an average number of visits per subject $E(n_i) = 7$, $q = 4$ allowed components of geometrical variability, and $n = 100$ input subjects. The second line gives the discrepancy between the ground truth (used for generating the data) and the MAP (used for evaluating the calibration performance), in the reference case. We call this discrepancy the statistical error, by opposition to the calibration error.



Fig. 4: Illustration of the evaluation procedure for model calibration, and subsequent personalization or simulation from the learned model. The ground truth population geodesic is plotted in black on the central line. From this model are simulated $n = 100$ individual spatiotemporal trajectories: three randomly-picked samples are plotted in black on the top lines. The population geodesic of the calibrated model is plotted in green on the central line, superimposed with the ground truth geodesic. This calibrated model can then be personalized to the training observations, as plotted in red, or leveraged to simulate new spatiotemporal trajectories that resemble the original data set, as plotted in blue.

Finally, the convergence of the modulation matrix $A_0$ is assessed by measuring the mismatch between the sets of space-shifts that can be generated from the pair of parameters $(c_0, A_0)$: (i) the estimated modulation matrix is first re-projected on the MAP control points, (ii) the linear subspaces generated by the columns of the MAP and estimated modulation matrices are defined, (iii) the matrix representations of the projectors onto those subspaces are computed, (iv) the average of the four greatest eigenvalues of their difference captures the mismatch between those projectors, (v) the result is normalized by the largest eigenvalue of the MAP projector.
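A sketch of this subspace-comparison metric is given below; the re-projection onto the MAP control points (step i) is omitted and the matrices are random stand-ins, so this only illustrates steps (ii)-(v).

```python
import numpy as np

def subspace_projector(columns):
    """Orthogonal projector onto the span of the given column vectors (step iii)."""
    q_basis, _ = np.linalg.qr(columns)
    return q_basis @ q_basis.T

def modulation_mismatch(A_map, A_est, n_eigenvalues=4):
    p_map = subspace_projector(A_map)
    p_est = subspace_projector(A_est)
    eigenvalues = np.linalg.eigvalsh(p_map - p_est)           # symmetric difference
    mismatch = np.sort(eigenvalues)[-n_eigenvalues:].mean()   # step (iv)
    return mismatch / np.abs(np.linalg.eigvalsh(p_map)).max() # normalization, step (v)

rng = np.random.default_rng(0)
A_map = rng.normal(size=(20, 4))                              # toy (p*d) x q matrices
print(modulation_mismatch(A_map, A_map + 0.05 * rng.normal(size=(20, 4))))
```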

Evaluation setups and results. In addition to the previously introduced setups, configurations with a varying allowed number of sources $q \in \{2, 4, 6\}$ are also evaluated. We call the configuration with $\sigma_\epsilon = 0.02$, $q = 4$, $n = 100$, and $E(n_i) = 7$ the reference one. Augmented with the 11 configurations differing from this reference by a single parameter, a total of 12 calibration problems are defined. Each is solved 10 times by running the stochastic MCMC-SAEM-GD algorithm.


Figure 5 plots the evolution of the error metrics across the allowed 200 iterations for the reference configuration: the black lines correspond to the 10 different runs, and their mean and standard deviation are represented in green. The algorithm is stable, i.e. it converges to similar results at each run: the final standard deviation of the error is smaller than 10% of the maximal (initial) error, for all parameters. The two regimes of the algorithm can be identified: the burn-in phase for the first half of the iterations, where the step-sizes $\rho[k]$ remain fixed to 1, followed by the concentration phase, where the step-sizes decrease geometrically. We can finally notice that $\sigma_\alpha$ is estimated with more variance than the other parameters, and than $\sigma_\tau$ in particular. This suggests that adding higher-order components to the time-warp functions $\psi_i$ would come with estimation challenges.

Table 1 gives the average final error metrics and the associated standard deviations. Those standard deviations remain in all but one case below 3%, underlining the stability of the estimation algorithm. In most cases, the parameters are estimated with less than 10% of error: exceptions only appear in the configurations with very low noise levels $\sigma_\epsilon \leq 0.01$ or an underestimated number of geometrical sources $q = 2$. Interestingly, a higher noise level $\sigma_\epsilon$ does not necessarily correlate with a degraded estimation of the parameters. The presence of noise can actually help the algorithm to better explore the space of parameters: local minima seem to be harder to escape for very low levels of noise. In particular, the estimation performance of the noise variance $\sigma_\epsilon^2$ improves when the true value increases. Table 1 also studies the impact of the length of the observation period $E(n_i)$. Long periods generally favour a more accurate estimation of the parameters $v_0 = \mathrm{Conv}(c_0, m_0)$, which encodes the direction of the progression, and $(\sigma_\tau, \sigma_\alpha)$, which capture its dynamical variability. However, because of compensation mechanisms that may take place at the individual level between $\alpha_i$ and $\tau_i$, it is rather the joint quantity $\sigma_\tau + \sigma_\alpha$ that is clearly better estimated when $E(n_i)$ increases, rather than $\sigma_\tau$ and $\sigma_\alpha$ independently.

The same table also compares the estimation quality when the true number of sources is underestimated ($q = 2$), perfectly chosen ($q = 4$) or overestimated ($q = 6$). The reconstruction ability of our model, measured by $\sigma_\epsilon$, increases with $q$, and seems to saturate once the optimal number of sources has been reached. The large estimation error made on $\sigma_\epsilon$ in the case $q = 2$ comes from the fact that the data was simulated from exactly four geometrical components of comparable importance (see Figure 3), thus creating a strong thresholding effect on the reconstruction performance when choosing $q < 4$. One can expect smoother variations of the reconstructive performance on real data sets, which do not result from the exact simulation of the generative model. Parameters are mostly less well estimated in the $q = 2$ configuration, and at comparable distance to the MAP in the two remaining ones.

[Figure 5: seven panels showing the error (%) on $y_0$, $(c_0, m_0)$, $(c_0, A_0)$, $t_0$, $\sigma_\tau$, $\sigma_\alpha$ and $\sigma_\epsilon$ across the 200 iterations.]

Fig. 5: Evolution of the error metrics across the 200 allowed iterations of the MCMC-SAEM algorithm, for the reference configuration: noise standard deviation $\sigma_\epsilon = 0.02$, $q = 4$ estimated components of geometrical variability, learning on a data set composed of $n = 100$ subjects with on average $E(n_i) = 7$ visits per subject, spanning 5 time units. The 10 solid black curves correspond to 10 independent runs of the same – stochastic – MCMC-SAEM algorithm; the bold green curve is their average and the light green region indicates the associated standard deviation. The algorithm consistently converges towards similar parameters at each run, and those estimated parameters are satisfyingly close to the MAP estimate.


Finally, Table 1 shows that the number of training subjects $n$ has a major influence on the quality of the estimation: almost all metrics are improved in the configuration with $n = 200$ subjects.

In conclusion, the proposed MCMC-SAEM-GD algorithm successfully solves our model calibration problem in varied configurations. The stochastic procedure is stable across independent repetitions. The presence of noise in the training data is well handled, and actually seems to act as a good regularizer for the estimation procedure. An underestimated number of sources does not harm the convergence of the procedure, but mostly impairs the reconstruction ability of the learned model. This number should therefore be gradually increased to meet the reconstruction goals of the experimenter, keeping in mind that an intrinsic optimal performance will be reached once $q$ is large enough. Finally, both increasing the number of subjects and increasing the number of visits are beneficial for model calibration.

5.1.3 Personalization after calibration

Once calibrated, the longitudinal shape models are personalized to the training data. The estimated individual parameters $\alpha_i, \tau_i, s_i$ are compared to their true values. In order to be comparable with the true sources, the estimated sources are first brought back to the cotangent space defined by the true control points $c_0^t$ by solving $\mathrm{Conv}(c_0^t, m_i^t) = \mathrm{Conv}(c_0, m_i)$.

Figure 6 plots the estimated $z_i$ against the corresponding true values. The acceleration factors are well aligned on the bisector. The onset ages and sources are also estimated with a low variance, but with a non-negligible bias.

Fig. 6: Comparison of the estimated individual parameters $z_i = (\alpha_i, \tau_i, s_i)$ after personalization of the mean calibrated model to the simulated observations, in the reference scenario. In each scatter plot, the identity is represented by the solid black line. [Panels: estimated versus true $\alpha_i$ ($R^2 = 0.920$), $\tau_i$ ($R^2 = 0.965$) and $s_i$ ($R^2 = 0.999$).] The $R^2$ value for the sources is an average over the four geometrical components.

This effect is due to the fact that the estimation of the individual parameters during personalization may compensate for errors made during the estimation of the population parameters during calibration. Time-shifts $\tau_i$ may compensate for an error on the reference time $t_0$. Acceleration factors $\alpha_i$ may compensate for an error in the norm of $v_0$. Sources $s_i$ may compensate for an error in the norm of the columns of $A_0$. These effects do not question the identifiability of the model, but rather suggest that, for a finite number of observations, the likelihood may have a rather flat maximum, for which a range of parameter values may reconstruct the data almost equally well. Finally, two outliers can be noticed in Figure 6 for the pace of progression $\alpha_i$, as well as for the onset age $\tau_i$. These outliers correspond to extremely reduced windows of observation $t_{i,n_i} - t_{i,1}$, respectively equal to 0.07 and 0.20, whereas the theoretical mean is equal to 5.

Table 2 summarizes the results in all configurations, giving for each of the twelve considered setups the median error and associated median absolute deviation on $z_i = (\alpha_i, \tau_i, s_i)$ when personalizing the average calibrated model. The median is reported instead of the mean because it is more robust to outliers. Focusing on the estimation variability, it appears that the sources $s_i$ are the best estimated parameters, followed by the pace of progression $\alpha_i$ and the onset ages $\tau_i$.

                      ∆αi (%)          ∆τi (%)          ∆si (%)
reference             4.4 ± 11.9       44.2 ± 15.3      −11.7 ± 3.3
σ = 0.00              −3.8 ± 4.3       42.4 ± 24.8      −5.4 ± 6.8
σ = 0.01              −2.1 ± 6.4       11.8 ± 9.4       −11.1 ± 12.0
σ = 0.02               4.4 ± 11.9      44.2 ± 15.3      −11.7 ± 3.3
σ = 0.03              −7.6 ± 15.7      25.5 ± 15.0      −6.7 ± 2.5
σ = 0.05             −21.8 ± 23.4       4.4 ± 14.7      −9.1 ± 3.0
E(ni) = 3               9.3 ± 32.7     −12.1 ± 19.3     −8.8 ± 3.2
E(ni) = 5               2.9 ± 13.0       3.2 ± 20.8     −3.6 ± 7.5
E(ni) = 7               4.4 ± 11.9      44.2 ± 15.3     −11.7 ± 3.3
E(ni) = 9              −8.2 ± 7.3       46.0 ± 16.1     −7.6 ± 2.3
q = 2                  −1.6 ± 18.0      48.3 ± 50.7     −2.4 ± 69.4
q = 4                   4.4 ± 11.9      44.2 ± 15.3     −11.7 ± 3.3
q = 6                   9.8 ± 12.3      45.3 ± 15.8     −11.7 ± 3.3
n = 50                −12.9 ± 8.1       24.8 ± 18.3     −6.9 ± 6.1
n = 100                 4.4 ± 11.9      44.2 ± 15.3     −11.7 ± 3.3
n = 200                 9.8 ± 10.5      13.2 ± 10.9     −13.1 ± 2.4

Table 2: Median of the residual errors and associated median absolute deviation (times 1.4826) for the estimated individual parameters, expressed as a percentage of the corresponding ground-truth standard deviations σα = 0.2, στ = 2 and σs = 1. The results are given for the reference scenario plus eleven perturbed scenarios, where either the noise level σ, the average number of visits per subject E(ni), the allowed number of geometrical components q, or the number of subjects n is varied.
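As a reminder of how these robust statistics are obtained, the following minimal sketch (our own helper, with illustrative names) computes the median error and the scaled median absolute deviation reported in Table 2; the factor 1.4826 makes the MAD a consistent estimator of the standard deviation under a Gaussian assumption.

    import numpy as np

    def robust_error_summary(estimated, true, ground_truth_std):
        """Median residual error and 1.4826 x MAD, both expressed in percent
        of the ground-truth standard deviation of the parameter."""
        residuals = np.asarray(estimated, float) - np.asarray(true, float)
        median = np.median(residuals)
        mad = np.median(np.abs(residuals - median))
        return (100.0 * median / ground_truth_std,
                100.0 * 1.4826 * mad / ground_truth_std)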


Fig. 7: Distribution of the positions of the landmarks of interest (left and right arm vertical position, left and right leg horizontal position) in the raw (i.e. original), reconstructed (by personalization of the calibrated model) and simulated data sets, for the reference configuration; each panel shows the cumulative and probability distribution functions. Those landmarks of interest are indicated by green dots on Figure 3. The simulated distributions are similar to the corresponding raw ones, suggesting that the spatiotemporal variability of the original data set has been successfully captured.

The estimation of the pace of progression αi quickly deteriorates with increasing levels of noise σ, reaching almost 25% of the true standard deviation σα = 0.2 in the noisiest configuration. The estimation of the onset ages τi and of the sources seems more robust, with no clear tendency. The estimation of the pace αi improves when the number of visits per subject E(ni) increases. The same trend can be noticed for the onset age τi, although with a reduced amplitude. The sources si remain well estimated in all scenarios. No clear difference can be noticed between the reference q = 4 and the over-estimated number of geometrical components q = 6, suggesting that adding components does not hamper the personalization of a calibrated model. However, under-estimating this number of components with q = 2 deteriorates the estimation of the sources si, and of the dynamical parameters αi and τi to a lesser extent. As in the previous section, we attribute this large performance drop to the fact that the data were simulated according to exactly four geometrical sources of similar magnitude (see Figure 3): in real data sets, one may expect the estimation performance to vary more smoothly with q. Finally, an increased number of subjects n improves the performance of the personalization algorithm, especially for the onset ages τi and the sources si.

5.1.4 Simulation after calibration and personalization

After calibration and personalization, the learned model and

empirical distribution of the random effects can be used to

simulate entirely synthetic shape trajectories. Figure 4 gives

some randomly selected samples from such simulated tra-

jectories for the reference scenario, where (see equation (19)):

– the fixed effects θm are averages over the 10 calibrations;
– the random effects zs are drawn according to independent normal distributions, with means and standard deviations equal to the values given by Table 2 (see the sketch below);
– the visit ages ts are drawn according to the true procedure, based on the average calibrated values for t0 and στ, and on the empirical average ⟨ni⟩i for E(ni) (see Section 5.1.1).
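A minimal sketch of this sampling step is given below; it only draws the random effects from independent normal distributions, as described above, and leaves out the exact visit-age procedure of Section 5.1.1. The function name, argument names and the illustrative values used in the final call are ours.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def draw_random_effects(n_subjects, mean_alpha, std_alpha, t0, std_tau, q=4):
        """Draw z_s = (alpha_s, tau_s, s_s) for n_subjects synthetic subjects,
        each component following an independent normal distribution."""
        alpha = rng.normal(mean_alpha, std_alpha, size=n_subjects)   # acceleration factors
        tau = rng.normal(t0, std_tau, size=n_subjects)               # onset ages
        sources = rng.normal(0.0, 1.0, size=(n_subjects, q))         # geometrical sources
        return alpha, tau, sources

    # Illustrative call with ground-truth-like values of the reference scenario.
    alpha, tau, sources = draw_random_effects(1000, 1.0, 0.2, 70.0, 2.0)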

Figure 7 compares the distribution of vertical or horizontal

positions of the tips of the original (see Section 5.1.1), re-

constructed (see Section 5.1.3) and simulated observations.

Those landmarks of interest are indicated by green dots and

arrows on Figure 3, and form the statistic ζintroduced in

equation (19). A total of 1,000 subjects are simulated, whereas
only 100 were available for model calibration. The three dis-

tributions largely overlap, indicating that the learned distri-

bution of shape trajectories reproduces the true distribution.


Fig. 8: Learned emotion spatiotemporal models (happiness, sadness, surprise, fear, disgust and anger, each with a male and a female variant). The population geodesic is plotted in green, and the shifted progressions along the gender mode of geometrical variability are plotted in black.


5.2 Dynamic facial expression

5.2.1 Data and preprocessing

The Binghamton University 3D dynamic facial expression database [61] gathers short video sequences from 101 subjects (58 female, 43 male). Each subject mimics, in 6 distinct sequences, the basic emotions Anger, Disgust, Fear, Happiness, Sadness and Surprise. For each of those 606 sequences we uniformly extract 8 frames spanning from the first to the 36th one, which corresponds to a subsampling of the first 1.4 seconds of each video. We do not work directly with the images, but with a set of 75 semi-automatically extracted landmarks, which come with this data set. Every set of 3D landmarks is registered to a reference one by similarity-based Procrustes alignment.

5.2.2 Model calibration: learned emotion models

We learn 6 distinct longitudinal atlas models: one per emo-

tion, calibrated on the n= 101 sequences of ni= 8 frames

for all subjects i. We choose q= 10 sources. Figure 8 shows

in green the estimated average scenario for each emotion.

Qualitatively, those average scenarios show a typical pattern of facial expression. The Disgust, Fear, Happiness and Surprise models feature large displacements in the area of the mouth in particular. The Sadness expression is more muted, with a subtle displacement of the eyebrows. The Anger model shows a combined displacement of both eyes and eyebrows.

5.2.3 Gender-speciﬁc emotion patterns

The estimated models are personalized to the corresponding training data sets, giving for each sequence an optimal zi = (αi, τi, si). We only focus on exploiting the individual source parameters si ∈ R^q = R^10 in this section. For each model, we fit a 1D partial least squares regression model predicting the gender from a linear combination of the source variables si [1]. We then test whether this linear combination of the sources is significantly different between men and women using a Student t-test. All p-values are smaller than 10^−5, thus showing significant differences in the geometry of the face between genders that are independent of the pattern of expression.
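Under the assumption that the gender labels are encoded as a boolean array, this analysis can be sketched in a few lines with scikit-learn and scipy (illustrative names, not the exact analysis script):

    import numpy as np
    from scipy.stats import ttest_ind
    from sklearn.cross_decomposition import PLSRegression

    def gender_effect_on_sources(sources, is_male):
        """sources: (n_sequences, 10) array of individual source parameters s_i;
        is_male: boolean array of length n_sequences.
        Fits a 1-component PLS regression of gender on the sources and tests
        whether the resulting 1D scores differ between men and women."""
        pls = PLSRegression(n_components=1)
        pls.fit(sources, is_male.astype(float))
        scores = pls.transform(sources).ravel()   # learned linear combination of the sources
        return ttest_ind(scores[is_male], scores[~is_male])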

Figure 8 shows the typical scenarios for men and women, which are built by translating the mean scenario in the direction of the average of the sources for each gender (in black). For all emotion models, male subjects tend to have wider faces than female ones, as can very clearly be seen in the area of the cheeks and of the nose for the Anger and Surprise models.

            Angry   Disgust   Fear   Happy   Sad    Surprise
Angry       64.3    7.0       8.1    4.0     16.6   -
Disgust     13.7    55.1      12.4   14.8    1.9    2.0
Fear        1.0     16.6      58.6   13.9    7.0    3.0
Happy       1.9     6.0       13.0   79.1    -      -
Sad         16.5    2.0       14.2   1.1     66.2   -
Surprise    1.0     3.0       16.0   -       1.0    79.1

Table 3: Average confusion matrix (in %) across the 5-fold linear discriminant classification. The sequence features consist in a 12-scalar vector that stacks the 6 pairs of dynamical parameters αi, τi obtained by personalizing the 6 emotion models. The average accuracy is 67.08%.

5.2.4 Application to classiﬁcation

We propose to automatically recognize the emotion from a

sequence based on the personalization of each facial expres-

sion model to the sequence. We propose here to use the dy-

namic variables αi,τifor classiﬁcation.

More precisely, we perform a 5-fold cross-validation en-

suring that each group is gender-balanced. For each split:

–six longitudinal shape models are learned on the training

sequences for each emotion;

–these models are personalized to all the 606 sequences:

for each sequence a total of 6 zivectors are therefore

estimated;

–for each sequence, the estimated temporal parameters

αi, τiare stacked into vectors of 6×2 = 12 scalars;

–these feature vectors are used to train and test a simple

linear discriminant classiﬁer on the corresponding train

and test sequences.
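The last two steps correspond to a standard cross-validated classification; a minimal sketch with the default scikit-learn linear discriminant analysis is given below. It assumes that the 12-dimensional feature vectors have already been computed by personalizing the six emotion models, and it does not reproduce the per-fold re-calibration of those models nor the gender balancing of the folds.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import StratifiedKFold

    def cross_validated_confusion(features, labels, n_splits=5, seed=0):
        """features: (606, 12) array of stacked (alpha_i, tau_i) pairs;
        labels: emotion index of each sequence. Returns the confusion
        matrix averaged over the folds, each row normalized to 1."""
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        confusions = []
        for train_idx, test_idx in folds.split(features, labels):
            classifier = LinearDiscriminantAnalysis()   # default sklearn settings, no tuning
            classifier.fit(features[train_idx], labels[train_idx])
            predictions = classifier.predict(features[test_idx])
            confusions.append(confusion_matrix(labels[test_idx], predictions, normalize='true'))
        return np.mean(confusions, axis=0)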

Table 3 gives the confusion matrix obtained with this pro-

cedure, averaged over the 5 folds. The average classiﬁcation

accuracy is 67.08 %, above the chance level which amounts

to 16.67 %. For comparison, [4] reported an average accu-

racy of almost 100 %, [58] of 90.44 %, and [23] of 74.63 %.

We emphasize however that our performance is achieved:

– using the default linear discriminant analysis from the sklearn library, without any hyperparameter tuning, unlike [4] with random forests, [58] with hidden Markov models, or [23] with a radial-basis support vector machine;
– on all the 606 available sequences, without any manual selection of a subset of 60/101 subjects as done in [4, 58], or of 507/606 sequences as done in [23];

–based only on 12 intuitive scalar features per sequence,

that encode how an individual emotional pattern dynam-

ically compares to population models of basic emotions.

From this experiment, which was not particularly tuned to achieve the best classification performance, we conclude that our model captured shape characteristics that are specific to


(a) Left hippocampus mean progression.

(b) Right hippocampus mean progression.

Fig. 9: Typical model of hippocampus atrophy from MCI to Alzheimer’s disease stage. Physiological ages (from left to right,

in years): 58.6, 63.0, 67.4, 71.8, 76.2, 80.7, 85.1, 89.5, 93.9.

each emotion. It is worth noting that we only used dynamic parameters here, which capture how fast or slowly the face changes in the sequence, and with which delay.

5.3 Hippocampal atrophy in Alzheimer’s disease

5.3.1 Data and preprocessing

Data used in the preparation of this section were obtained

from the Alzheimer’s Disease Neuroimaging Initiative (ADNI)

database (adni.loni.usc.edu).

We select all the T1-weighted MRIs of subjects that were diagnosed with mild cognitive impairment at some visit, and diagnosed as having converted to Alzheimer's disease at some later visit. See Table 4 for summary statistics. This data set amounts to a total of 1993 visits from n = 322 subjects. Second-take "re-test" MR images are available for 1838 of those visits and will be used to estimate the noise

Number of subjects                                 322
Number of visits                                   1993
Average number of visits per subject (± std)       5.8 (± 2.4)
Average age (± std)                                74.0 (± 6.7)
Sex ratio (F/M, in %)                              41.2 / 58.8
Amyloid status (+/−/unknown, in %)                 73.2 / 7.1 / 19.7
APOE carriership (%)                               65.2
Education (mean ± std, in years)                   15.9 (± 2.8)
Marital status (married/not married, in %)         80.9 / 19.1

Table 4: Summary statistics of the medical data set of Alzheimer's disease patients.

in the data. All those 1993 + 1838 = 3831 images are pre-processed in exactly the same manner, starting with the longitudinal pipeline of FreeSurfer¹ (version 5.3.0) [24, 25]. The skull-stripped brains are then aligned onto the Colin27 average brain² with an affine 12-degrees-of-freedom transformation computed with FSL 5.0³ [60]. Meshes of the left and right hippocampus are obtained from the original images as follows:

– the volumetric segmentations of the hippocampus computed with FreeSurfer are transformed into meshes using the aseg2srf script of July 2009⁴;
– the resulting meshes are decimated by an 88% factor using Paraview 5.4.1⁵ [2];
– they are aligned using the previously computed global affine transformation estimated with the FSL software;
– residual pose differences among subjects are removed by rigidly aligning the meshes from the baseline image of each subject to the corresponding hippocampus mesh in the Colin27 atlas image, this 6-degrees-of-freedom transformation being computed with the GMMReg script of June 2008⁶ [32];
– the same transformation is finally used to align the meshes from the follow-up images of the same subject.

¹ Available at: https://surfer.nmr.mgh.harvard.edu
² Available at: http://www.bic.mni.mcgill.ca/ServicesAtlases/Colin27
³ Available at: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/
⁴ Available at: https://brainder.org
⁵ Available at: www.paraview.org
⁶ Available at: https://github.com/bing-jian/gmmreg


5.3.2 Models of atrophy of the hippocampus

We calibrate two longitudinal shape models on all the 1993

meshes of the left and right hippocampus respectively, choos-

ing in both cases q= 8 sources. The deformation kernel

width is set to σ= 10 mm. The current distance is used

to compute distances between meshes without point corre-

spondence, with a kernel width of σE= 5 mm [14, 59].
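For reference, the currents metric of [59] between two triangle meshes can be evaluated from the triangle centers and area-weighted normals, without any point correspondence. The following numpy sketch is an illustrative implementation of that formula with a Gaussian kernel of width σE, not the actual code used in our experiments.

    import numpy as np

    def centers_and_normals(vertices, faces):
        # Triangle centers and area-weighted normals of a triangle mesh.
        v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
        centers = (v0 + v1 + v2) / 3.0
        normals = 0.5 * np.cross(v1 - v0, v2 - v0)   # vector norm equals the triangle area
        return centers, normals

    def squared_current_distance(mesh_a, mesh_b, sigma_e=5.0):
        """Squared currents distance between two (vertices, faces) triangle meshes."""
        def inner_product(first, second):
            c1, n1 = first
            c2, n2 = second
            d2 = ((c1[:, None, :] - c2[None, :, :]) ** 2).sum(-1)
            kernel = np.exp(-d2 / sigma_e ** 2)
            return np.einsum('ij,ik,jk->', kernel, n1, n2)
        a = centers_and_normals(*mesh_a)
        b = centers_and_normals(*mesh_b)
        return inner_product(a, a) + inner_product(b, b) - 2.0 * inner_product(a, b)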

Figure 9 shows the estimated average progression, which consists of an overall atrophy of both the left and right hippocampus together with a specific deformation of their shape. It is worth noting that we reconstruct here the progressive atrophy of the hippocampus over more than 30 years of disease progression, although patients have never been observed for more than a few years. This can be achieved because the method automatically re-aligns in time the data of patients that are at different, but unknown, disease stages.

5.3.3 Personalization to unseen data

We assess the reconstruction performance of the calibrated

models using a 5-fold cross-validation. The n= 322 sub-

jects are split into 5 groups; 2×5distinct shape models

Fig. 10: Comparison of the generalization error to unseen data of the learned shape models and the intrinsic measurement error, shown as cumulative and probability distribution functions of the current-metric absolute residual (in mm²). (a) Left hippocampus: the mean error is 68.5 ± 15.9 mm² for the shape model, and 83.2 ± 36.0 mm² for the re-test measurement. (b) Right hippocampus: the mean error is 69.8 ± 15.0 mm² for the shape model, and 85.2 ± 40.1 mm² for the re-test measurement. The discrepancies between meshes are computed with the current metric with σE = 5 mm, without assuming any point-to-point correspondence.

are calibrated on the training sets for the left and right hippocampus. Those models are then personalized to the unseen test subjects. To assess the goodness of fit, we measure the residual errors and compare their distribution with the noise distribution. This noise distribution is determined by measuring the distance between the two meshes extracted from the "test" and "re-test" images acquired from the same patient on the same day, thus capturing all the variability due to varying image quality and its consequences on the processing. Figure 10 shows the superimposition of the distribution of the residual errors with the distribution of the differences between the meshes of the test and re-test images. The reconstruction errors are on average smaller than the intrinsic uncertainty on the data, and have a lower variance as well. The model therefore makes it possible to reconstruct individual data at the precision of the noise. It is worth noting that this could be achieved using a reduced set of 2 × 10 scalars, which are, for each hippocampus, the pace of progression αi, the onset age τi, and the eight sources si.

5.3.4 Association with co-factors

We calibrate and personalize the models on the whole data set, and aim to study how some genetic, biological and environmental co-factors may modulate the progression of Alzheimer's

                                         left hippocampus      right hippocampus
genetic
  gender              αi                 ×1.23 [**]            ×1.21 [**]
  (female vs. male)   τi                 −12.4 months [**]     −8.7 months [*]
                      si                 ±0.54 [***]           ±0.57 [****]
  APOE-4              αi                 ×1.22 [*]             –
  (carrier vs.        τi                 −35.8 months [***]    −32.5 months [**]
   non-carrier)       si                 –                     –
biological
  amyloid             αi                 ×1.52 [**]            ×1.67 [*]
  (positive vs.       τi                 –                     –
   negative)          si                 –                     –
environmental
  marital             αi                 ×1.14 [*]             –
  (married vs.        τi                 −42.5 months [***]    −36.3 months [**]
   non-married)       si                 –                     –
  education           αi                 –                     –
  (nb. of years       τi                 −3.7 months/y [**]    −5.1 months/y [***]
   of education)      si                 –                     –

Table 5: Significant associations of individual parameters with genetic, biological and environmental factors: effect sizes and significance levels of the adjusted p-values (thresholds 5% [*], 1% [**], 0.1% [***], 0.01% [****]). Time-shifts τi are in months; other quantities have no unit. Directions of space-shifts are not signed. Empty cells (–) indicate the absence of a significant association. The 23 subjects (out of n = 322) without amyloid information have been discarded.


disease in patients. We therefore aim to ﬁnd correlations be-

tween individual variables zi= (αi, τi, si)and the follow-

ing factors: gender, APOE-4 carriership, presence of amy-

loid plaques, education level and marital status.

To this end, the parameters αiand τiare regressed against

the ﬁve considered cofactors, and two-tailed t-tests are per-

formed on the coefficients. A 2-block partial least squares regression model [1] is used to regress the eight sources si against the five cofactors in a one-dimensional projection

space. A two-tailed t-test is then performed on the weights

of the multivariate regression of the linear combination of

sources against the cofactors. For each case, the obtained

ﬁve p-values are corrected with the Benjamini-Hochberg false

discovery rate procedure [8].
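For the dynamical parameters, this analysis boils down to an ordinary least squares regression followed by a Benjamini-Hochberg correction of the five coefficient p-values; a minimal sketch with statsmodels is given below (illustrative names; the PLS step for the sources is not reproduced).

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.multitest import multipletests

    def cofactor_associations(parameter, cofactors):
        """parameter: length-n array of a dynamical parameter (e.g. alpha_i);
        cofactors: (n, 5) array of the five considered cofactors.
        Returns the Benjamini-Hochberg-adjusted p-values of the two-tailed
        t-tests on the regression coefficients, and the rejection decisions."""
        design = sm.add_constant(cofactors)
        fit = sm.OLS(parameter, design).fit()
        raw_p_values = np.asarray(fit.pvalues)[1:]   # skip the intercept
        rejected, adjusted, _, _ = multipletests(raw_p_values, alpha=0.05, method='fdr_bh')
        return adjusted, rejected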

The obtained correlations for both the left and right hippocampus are summarized in Table 5. The first two rows indicate that the atrophy of the hippocampus develops faster and starts earlier in female subjects. Male and female subjects also present significantly different hippocampus shapes, regardless of the atrophy due to aging or disease progression; Figure 11 presents the corresponding mode of geometrical variability. Hippocampal atrophy also starts earlier in carriers of at least one ε4 allele of the APOE gene, with an effect size of almost three years. The atrophy occurs at an accelerated pace in amyloid-positive subjects, as well as in APOE-4 allele carriers and married subjects, although only significantly so in the left hemisphere of the brain. Finally, the atrophy starts earlier in married subjects, as well as in educated subjects.

The results obtained by correlating the estimated individual parameters zi with the genetic and biological factors are in line with current knowledge. The results obtained with respect to the marital status are more surprising, and should probably be taken with care, as the non-married group, which represents less than 20% of the considered 299 subjects (see Table 5), is very heterogeneous: it gathers widowed, divorced and never-married subjects. Finally, we show that the atrophy also starts earlier in subjects with a higher level of education. This fact is not as counter-intuitive as it appears, and is actually in line with the cognitive reserve theory [55], which supports the idea that education can help to compensate for damaged brain anatomy at the clinical level, maintaining unaltered cognitive capacities for a period of time. In other words, cognitive decline would be delayed with respect to the onset of brain atrophy in educated subjects. Since, in addition, the age at diagnosis is not correlated with the number of years of education in our data set (r = −0.02, p = 0.70 according to a two-tailed test based on Pearson's correlation coefficient), this explains why subjects with more years of education present an increased atrophy of their hippocampi: they enrolled at a more advanced stage of anatomical pathology, after some years of compensation.

Fig. 11: Superposition of the male-like (in blue) and the female-like (in pink) hippocampus geometries, in two standard views: (a) coronal view and (b) axial view. The letters L, R, A, P respectively indicate the left, right, anterior and posterior directions.

5.3.5 Simulation of hippocampus atrophy due to AD

The calibrated models and the empirical distribution of the random effects zi estimated by their personalization to the training data are used to simulate synthetic progressions of the hippocampus. In order to validate such a simulation method, the simulated trajectories are sampled at several ages, and the empirical distribution of the volumes of the simulated hippocampi is compared to the distribution of the original ones. The volume is commonly used as a biomarker in clinical studies, and we aim to assess whether the simulated cohort could be used instead of the original one.

To do so, we simulate the same number of subjects as in the training cohort (n = 322), with the same numbers of time-points and the same time intervals between visits. Note that we do not reuse the observed ages at baseline, so that the sequence of observation time-points of a synthetic subject may be shifted in time compared to the real ones. We simulate according to the empirical distribution of the individual parameters zi and of the age at baseline: there exists indeed a correlation between the estimated time-shift τi and the baseline age ti,1 of the enrolled subjects, as they tend to be included in the study at a similar disease stage. To be more precise:


–the empirical joint distribution of the time-related pa-

rameters αiand τiaugmented with the age at baseline

ti,1is computed using a kernel density estimation method;

–the empirical joint distribution of the time-related pa-

rameters augmented with the sources siis captured by

ﬁtting a multivariate Gaussian distribution.

A simulated data set is then created by applying the following procedure 322 times:
– draw the acceleration factor αi, the onset age τi and the baseline age ti,1 from the corresponding kernel density;
– draw the sources si from the multivariate Gaussian distribution conditioned on the already-drawn time-related parameters;
– draw without replacement the sequence of visits of one subject, i.e. the number of visits and the time intervals between them;

Fig. 12: Distribution of the left and right hippocampal volume in the raw, reconstructed and simulated data sets, shown as probability and cumulative distribution functions. (a) Left hippocampus: the mean volume is 2958 ± 779 mm³ for raw data, 2863 ± 693 mm³ for reconstructed data, and 2865 ± 746 mm³ for simulated data. (b) Right hippocampus: the mean volume is 3081 ± 862 mm³ for raw data, 3014 ± 754 mm³ for reconstructed data, and 3063 ± 763 mm³ for simulated data. The simulated volume distribution is very close to the volume distribution of the reconstructed data set. The remaining bias between those two distributions and the one corresponding to the raw data comes from the smoothing behavior of the current noise model, leveraged to deal with noisy meshes without point correspondence. See Figure 13.

Fig. 13: Several views of a single example of the recon-

struction of a right hippocampus structure by the longitu-

dinal shape model. The reconstruction is the smooth white

structure, and the raw data point is plotted in red.

–sample the individual hippocampus trajectory deﬁned by

zi= (αi, τi, si)at the baseline age ti,1and the follow-

up visits.

This protocol is repeated for both the left and right hip-

pocampus, and for men and women (meaning that the es-

timation of the empirical distributions is done for both gen-

ders separately).
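A minimal sketch of this sampling scheme is given below, combining a scipy kernel density estimate for the time-related parameters and baseline age with the conditional distribution of a joint Gaussian for the sources; the resampling of the visit schedules and the gender-wise split are omitted, and all names are illustrative.

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_simulators(alpha, tau, t_baseline, sources):
        """Fit (i) a kernel density on (alpha_i, tau_i, t_i1) and (ii) a joint
        Gaussian on (alpha_i, tau_i, sources_i) from the personalized parameters."""
        kde = gaussian_kde(np.column_stack([alpha, tau, t_baseline]).T)
        joint = np.column_stack([alpha, tau, sources])
        return kde, joint.mean(axis=0), np.cov(joint.T)

    def draw_subject(kde, mean, cov, rng, q=8):
        """Draw (alpha, tau, t_baseline) from the kernel density, then the sources
        from the Gaussian conditional distribution given (alpha, tau)."""
        alpha, tau, t_baseline = kde.resample(1).ravel()
        t_idx, s_idx = slice(0, 2), slice(2, 2 + q)
        gain = cov[s_idx, t_idx] @ np.linalg.inv(cov[t_idx, t_idx])
        cond_mean = mean[s_idx] + gain @ (np.array([alpha, tau]) - mean[t_idx])
        cond_cov = cov[s_idx, s_idx] - gain @ cov[t_idx, s_idx]
        sources = rng.multivariate_normal(cond_mean, cond_cov)
        return alpha, tau, t_baseline, sources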

Figure 12 shows the volume distributions of the raw, reconstructed and simulated data. The cumulative distribution functions associated with the simulated and reconstructed distributions of hippocampal volumes are superimposed. This result suggests that, for this volume statistic, the simulated and true data sets could be used interchangeably. The raw and reconstructed distributions do not superimpose as well, because the model reconstructs smooth shapes whereas raw meshes often have small protrusions pointing outward from the surface, which tend to bias the volume computation (see Figure 13). This volume difference between the raw and reconstructed meshes amounts on average to 84.5 mm³ for the left hippocampus and 67.3 mm³ for the right hippocampus.

Once validated in this way, the simulation algorithm could be used to synthesize a data set of left and right hippocampi for any number of subjects, with any desired visit sampling. The proposed gender-wise split further makes it possible to achieve any desired male-female balance.

6 Conclusion

We proposed a statistical modeling approach that represents

individual data sequences as samples along continuous tra-

jectories, these trajectories being considered as spatiotempo-

ral perturbations of a population-average progression. The

spatial warp is deﬁned thanks to the exp-parallelization op-

erator on manifolds. The time warps are afﬁne time-reparameterizing

functions. The spatial and temporal individual parameters


position the progression of each subject in a spatiotempo-

ral reference frame centered around the average trajectory

of the population.

We proposed calibration, personalization and simulation

algorithms to address different statistical questions. The cal-

ibration algorithm combines the MCMC-SAEM stochastic

approach with gradient descent to estimate the underlying

common process and its spatiotemporal variability from a

longitudinal data set of shapes. It does not require a common

time reference to be available across individual processes, each of which may furthermore be observed for only a short period of time. Personalizing such calibrated models to new individual data yields quantitative, low-dimensional and in-

terpretable measures of how the progression of an individ-

ual deviates from a normative scenario. These parameters

include an acceleration factor and a time-shift on the one

hand, and geometrical sources of variability on the other

hand. Such individual parameters offer relevant features for

classiﬁcation or correlation tasks, in a post-processing step.

The generative nature of the proposed model naturally of-

fers a simulation algorithm, which can generate entirely syn-

thetic data sets. Such data set may be sampled at any desired

temporal frequency, for any number of subjects and with a

full control over the population characteristics, for instance

in terms of gender balance.

We emphasize that the proposed modeling approach is

able to deal with meshes without any assumption on their

topology, in particular without assuming point-to-point cor-

respondence. It may be extended easily to deal with images

or other geometric primitives, provided that one can deﬁne

a metric between such objects.

The three proposed algorithms were validated in varied

simulated conﬁgurations, demonstrating their ability to re-

trieve the true parameters or reproduce the original data dis-

tribution. They were illustrated on a data set of facial ex-

pressions, showing the relevance of the learned normative

scenarios and the potential of the spatiotemporal parameters

for classification. We also applied the method to a large medical data set of patients who develop Alzheimer's disease. The average scenarios of atrophy of the hippocampus subcortical structures are in line with current medical knowledge.

Individual sequences are successfully parametrized by 10

scalar spatiotemporal coordinates in the calibrated reference

frames. Correlating these coordinates with genetic, biolog-

ical and environmental factors gives valuable insights into

protective factors inﬂuencing age at onset or pace of pro-

gression. We also evidence typical shape differences across

sub-groups, which are independent of the shape changes due

to ageing or disease progression.

The calibration algorithm is computationally intensive:

estimating a model of hippocampus progression took around

a day. Our code is already parallel, combines CPU and GPU

together, and offers a fine-grained initialization pipeline. Further optimization of our code (including multi-GPU support and fast Fourier transforms for convolutions) is planned, as well as evaluating the performance of variational

methods for calibration – which are not trivial to implement

in a longitudinal context without a ﬁxed number of observa-

tions per individual.

As for any modeling approach, our model relies on some

assumptions. For instance, subjects are considered to follow

trajectories that are parallel to the population average. This hypothesis may be relaxed by introducing drift parameters to model a progressive deviation from the average scenario. Such a development would add to the complexity of the model, which may then require even more data for calibration. Further extensions could also estimate not only one representative trajectory at the population level but several of them, for instance by estimating a mixture model along the lines of [17]. Nonetheless, it is worth

noting that in its current form the model is able to recon-

struct data at the precision of the noise.

The model also builds on the LDDMM framework for

modeling shape variability. This framework also relies on some assumptions about the geometry of the shape space. Future work will consider learning such a geometry from the data instead of relying on prior assumptions, along the lines of [11] for instance. Learning other parameters, such as the

number of sources, using automatic model selection meth-

ods for instance, would also add to the usability of the method.

Acknowledgements The research leading to this publication has been

funded in part by the European Research Council (ERC) under grant

agreement No 678304 (LEASP), European Union’s Horizon 2020 re-

search and innovation programme under grant agreement No 666992

(EuroPOND) and No 826421 (TVB-Cloud), and the program “Investisse-

ments d’avenir” ANR-10-IAIHU-06 (IHU ICM) and ANR-19-P3IA-

0001 (PRAIRIE 3IA Institute).
The facial expression data set used in Section 5.2 was built and shared by Binghamton University. The authors warmly thank Prof. Lijun Yin for granting data access, and Peng Liu for his help in down-

loading the data set.

Regarding Section 5.3, data collection and sharing was funded by the

Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Insti-

tutes of Health Grant U01 AG024904) and DOD ADNI (Department of

Defense award number W81XWH-12-2-0012). ADNI is funded by the

National Institute on Aging, the National Institute of Biomedical Imag-

ing and Bioengineering, and through generous contributions from the

following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Dis-

covery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-

Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Phar-

maceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-

La Roche Ltd and its afﬁliated company Genentech, Inc.; Fujirebio; GE

Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research

& Development, LLC.; Johnson & Johnson Pharmaceutical Research

& Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso

Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technolo-

gies; Novartis Pharmaceuticals Corporation; Pﬁzer Inc.; Piramal Imag-

ing; Servier; Takeda Pharmaceutical Company; and Transition Thera-

peutics. The Canadian Institutes of Health Research is providing funds

to support ADNI clinical sites in Canada. Private sector contributions

are facilitated by the Foundation for the National Institutes of Health

(www.fnih.org). The grantee organization is the Northern California


Institute for Research and Education, and the study is coordinated by

the Alzheimer’s Therapeutic Research Institute at the University of

Southern California. ADNI data are disseminated by the Laboratory

for Neuro Imaging at the University of Southern California.

References

1. Abdi, H.: Partial least square regression (pls regression). Encyclo-

pedia for research methods for the social sciences 6(4), 792–795

(2003)

2. Ahrens, J., Geveci, B., Law, C.: Paraview: An end-user tool for

large data visualization. The visualization handbook 717 (2005)

3. Allassonnière, S., Durrleman, S., Kuhn, E.: Bayesian mixed effect

atlas estimation with a diffeomorphic deformation model. SIAM

Journal on Imaging Science 8, 1367–1395 (2015)

4. Amor, B.B., Drira, H., Berretti, S., Daoudi, M., Srivastava, A.: 4-d

facial expression recognition by learning geometric deformations.

IEEE Trans. Cybernetics 44(12), 2443–2457 (2014)

5. Atchade, Y.F.: An adaptive version for the metropolis adjusted

langevin algorithm with a truncated drift. Methodology and Com-

puting in applied Probability 8(2), 235–254 (2006)

6. Banerjee, M., Chakraborty, R., Ofori, E., Okun, M.S., Viallan-

court, D.E., Vemuri, B.C.: A nonlinear regression technique for

manifold valued data with applications to medical image analysis.

In: Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, pp. 4424–4432 (2016)

7. Beg, M., Miller, M., Trouvé, A., Younes, L.: Computing large

deformation metric mappings via geodesic ﬂows of diffeomor-

phisms. IJCV 61(2), 139–157 (2005)

8. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate:

a practical and powerful approach to multiple testing. Journal

of the Royal statistical society: series B (Methodological) 57(1),

289–300 (1995)

9. Bilgel, M., Prince, J.L., Wong, D.F., Resnick, S.M., Jedynak,

B.M.: A multivariate nonlinear mixed effects model for longitudi-

nal image analysis: Application to amyloid imaging. Neuroimage

134, 658–670 (2016)

10. Bône, A., Colliot, O., Durrleman, S.: Learning distributions of

shape trajectories from longitudinal datasets: a hierarchical model

on a manifold of diffeomorphisms. In: Proceedings of the IEEE

Conference on Computer Vision and Pattern Recognition, pp.

9271–9280 (2018)

11. Bône, A., Louis, M., Colliot, O., Durrleman, S., Initiative, A.D.N.,

et al.: Learning low-dimensional representations of shape data

sets with diffeomorphic autoencoders. In: International Confer-

ence on Information Processing in Medical Imaging, pp. 195–207.

Springer (2019)

12. Chakraborty, R., Banerjee, M., Vemuri, B.C.: Statistics on the

space of trajectories for longitudinal data analysis. In: Biomedical

Imaging (ISBI 2017), 2017 IEEE 14th International Symposium

on, pp. 999–1002. IEEE (2017)

13. Charlier, B., Feydy, J., Glaunès, J.A., Trouvé, A.: An efficient ker-

nel product for automatic differentiation libraries, with applica-

tions to measure transport (2017)

14. Charon, N., Charlier, B., Glaunès, J., Gori, P., Roussillon, P.: Fi-

delity metrics between curves and surfaces: currents, varifolds,

and normal cycles. In: Riemannian Geometric Statistics in Medi-

cal Image Analysis, pp. 441–477. Elsevier (2020)

15. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable tem-

plates using large deformation kinematics. IEEE transactions on

image processing 5(10), 1435–1447 (1996)

16. Cury, C., Durrleman, S., Cash, D.M., Lorenzi, M., Nicholas, J.M.,

Bocchetta, M., van Swieten, J.C., Borroni, B., Galimberti, D.,

Masellis, M., et al.: Spatiotemporal analysis for detection of pre-

symptomatic shape changes in neurodegenerative diseases: Initial

application to the genﬁ cohort. NeuroImage 188, 282–290 (2019)

17. Debavelaere, V., Bône, A., Durrleman, S., Allassonnière, S., Ini-

tiative, A.D.N., et al.: Clustering of longitudinal shape data sets

using mixture of separate or branching trajectories. In: Interna-

tional Conference on Medical Image Computing and Computer-

Assisted Intervention, pp. 66–74. Springer (2019)

18. Delyon, B., Lavielle, M., Moulines, E.: Convergence of a stochas-

tic approximation version of the em algorithm. Annals of statistics

pp. 94–128 (1999)

19. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood

from incomplete data via the em algorithm. Journal of the royal

statistical society. Series B (methodological) pp. 1–38 (1977)

20. Durrleman, S., Allassonni`

ere, S., Joshi, S.: Sparse adaptive pa-

rameterization of variability in image ensembles. IJCV 101(1),

161–183 (2013)

21. Durrleman, S., Pennec, X., Trouvé, A., Braga, J., Gerig, G., Ay-

ache, N.: Toward a comprehensive framework for the spatiotem-

poral statistical analysis of longitudinal shape data. Interna-

tional Journal of Computer Vision 103(1), 22–59 (2013). DOI

10.1007/s11263-012-0592-x. URL https://doi.org/10.1007/s11263-012-0592-x

22. Durrleman, S., Prastawa, M., Charon, N., Korenberg, J.R., Joshi,

S., Gerig, G., Trouvé, A.: Morphometry of anatomical shape com-

plexes with dense deformations and sparse parameters. NeuroIm-

age (2014)

23. Fang, T., Zhao, X., Shah, S.K., Kakadiaris, I.A.: 4d facial expres-

sion recognition. In: Computer Vision Workshops (ICCV Work-

shops), 2011 IEEE International Conference on, pp. 1594–1601.

IEEE (2011)

24. Fischl, B., Dale, A.M.: Measuring the thickness of the human

cerebral cortex from magnetic resonance images. Proceedings of

the National Academy of Sciences 97(20), 11,050–11,055 (2000)

25. Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Hasel-

grove, C., Van Der Kouwe, A., Killiany, R., Kennedy, D., Klave-

ness, S., et al.: Whole brain segmentation: automated labeling of

neuroanatomical structures in the human brain. Neuron 33(3),

341–355 (2002)

26. Fishbaugh, J., Prastawa, M., Gerig, G., Durrleman, S.: Geodesic

regression of image and shape data for improved modeling of 4D

trajectories. In: ISBI 2014 - 11th International Symposium on

Biomedical Imaging, pp. 385 – 388 (2014)

27. Fletcher, T.: Geodesic regression and the theory of least squares

on riemannian manifolds. IJCV 105(2), 171–185 (2013)

28. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., Poupon,

C., Hartmann, A., Ayache, N., Durrleman, S.: A Bayesian Frame-

work for Joint Morphometry of Surface and Curve meshes in

Multi-Object Complexes. Medical Image Analysis 35, 458–474

(2017). DOI 10.1016/j.media.2016.08.011. URL https://hal.inria.fr/hal-01359423

29. Hinkle, J., Muralidharan, P., Fletcher, P.T., Joshi, S.: Polynomial

regression on riemannian manifolds. In: European Conference on

Computer Vision, pp. 1–14. Springer (2012)

30. Hirsch, M.W.: Differential topology, vol. 33. Springer Science &

Business Media (2012)

31. Hyvärinen, A., Karhunen, J., Oja, E.: Independent component

analysis, vol. 46. John Wiley & Sons (2004)

32. Jian, B., Vemuri, B.C.: Robust point set registration using gaus-

sian mixture models. IEEE transactions on pattern analysis and

machine intelligence 33(8), 1633–1645 (2011)

33. Joshi, S.C., Miller, M.I.: Landmark matching via large deforma-

tion diffeomorphisms. IEEE Transactions on Image Processing

9(8), 1357–1370 (2000)

34. Kendall, D.G.: Shape manifolds, procrustean metrics, and com-

plex projective spaces. Bulletin of the London Mathematical So-

ciety 16(2), 81–121 (1984)

35. Kim, H.J., Adluru, N., Suri, H., Vemuri, B.C., Johnson, S.C.,

Singh, V.: Riemannian nonlinear mixed effects models: Analyz-

ing longitudinal deformations in neuroimaging. In: Proceedings


of IEEE Conference on Computer Vision and Pattern Recognition

(CVPR) (2017)

36. Koval, I., Schiratti, J.B., Routier, A., Bacci, M., Colliot, O., Allassonnière, S., Durrleman, S., Initiative, A.D.N., et al.: Statistical

learning of spatiotemporal patterns from longitudinal manifold-

valued networks. In: International Conference on Medical Im-

age Computing and Computer-Assisted Intervention, pp. 451–

459. Springer (2017)

37. Kuhn, E., Lavielle, M.: Coupling a stochastic approximation ver-

sion of em with an mcmc procedure. ESAIM: Probability and

Statistics 8, 115–131 (2004)

38. Liu, D.C., Nocedal, J.: On the limited memory bfgs method for

large scale optimization. Mathematical programming 45(1-3),

503–528 (1989)

39. Lorenzi, M., Ayache, N., Frisoni, G., Pennec, X.: 4D registration

of serial brain’s MR images: a robust measure of changes applied

to Alzheimer’s disease. Spatio Temporal Image Analysis Work-

shop (STIA), MICCAI (2010)

40. Lorenzi, M., Ayache, N., Pennec, X.: Schild’s ladder for the paral-

lel transport of deformations in time series of images. In: Biennial

International Conference on Information Processing in Medical

Imaging, pp. 463–474. Springer (2011)

41. Louis, M., Bône, A., Charlier, B., Durrleman, S.: Parallel trans-

port in shape analysis: a scalable numerical scheme. In: Interna-

tional Conference on Geometric Science of Information, pp. 29–

37. Springer (2017)

42. Louis, M., Charlier, B., Jusselin, P., Pal, S., Durrleman, S.: A

fanning scheme for the parallel transport along geodesics on rie-

mannian manifolds. SIAM Journal on Numerical Analysis 56(4),

2563–2584 (2018)

43. Manasse, F., Misner, C.W.: Fermi normal coordinates and some

basic concepts in differential geometry. Journal of mathematical

physics 4(6), 735–745 (1963)

44. Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate

bayesian computational methods. Statistics and Computing 22(6),

1167–1180 (2012)

45. Marinescu, R.V., Eshaghi, A., Lorenzi, M., Young, A.L., Oxtoby,

N.P., Garbarino, S., Shakespeare, T.J., Crutch, S.J., Alexander,

D.C., Initiative, A.D.N., et al.: A vertex clustering model for dis-

ease progression: application to cortical thickness images. In:

International Conference on Information Processing in Medical

Imaging, pp. 134–145. Springer (2017)

46. Miller, M.I., Trouvé, A., Younes, L.: Geodesic shooting for com-

putational anatomy. Journal of Mathematical Imaging and Vision

24(2), 209–228 (2006)

47. Muralidharan, P., Fletcher, P.T.: Sasaki metrics for analysis of lon-

gitudinal data on manifolds. In: Computer Vision and Pattern

Recognition (CVPR), 2012 IEEE Conference on, pp. 1027–1034.

IEEE (2012)

48. Nader, C.A., Ayache, N., Robert, P., Lorenzi, M.: Monotonic gaus-

sian process for spatio-temporal trajectory separation in brain

imaging data. arXiv preprint arXiv:1902.10952 (2019)

49. Niethammer, M., Huang, Y., Vialard, F.X.: Geodesic regression

for image time-series. In: International Conference on Medical

Image Computing and Computer-Assisted Intervention, pp. 655–

662. Springer (2011)

50. Pennec, X.: Intrinsic statistics on riemannian manifolds: Basic

tools for geometric measurements. Journal of Mathematical Imag-

ing and Vision 25(1), 127–154 (2006)

51. Pennec, X., Fillard, P., Ayache, N.: A riemannian framework for

tensor computing. International Journal of Computer Vision 66(1),

41–66 (2006)

52. Schiratti, J.B., Allassonnière, S., Colliot, O., Durrleman, S.:

Learning spatiotemporal trajectories from manifold-valued lon-

gitudinal data. In: C. Cortes, N.D. Lawrence, D.D. Lee,

M. Sugiyama, R. Garnett (eds.) NIPS 28, pp. 2404–2412. Curran

Associates, Inc. (2015)

53. Schiratti, J.B., Allassonnière, S., Colliot, O., Durrleman, S.: A

bayesian mixed-effects model to learn trajectories of changes from

repeated manifold-valued observations. The Journal of Machine

Learning Research 18(1), 4840–4872 (2017)

54. Singh, N., Hinkle, J., Joshi, S., Fletcher, P.T.: Hierarchical

geodesic models in diffeomorphisms. IJCV 117(1), 70–92 (2016)

55. Stern, Y.: Cognitive reserve and alzheimer disease. Alzheimer

Disease & Associated Disorders 20(2), 112–117 (2006)

56. Su, J., Kurtek, S., Klassen, E., Srivastava, A., et al.: Statistical

analysis of trajectories on riemannian manifolds: bird migration,

hurricane tracking and video surveillance. The Annals of Applied

Statistics 8(1), 530–552 (2014)

57. Su, J., Srivastava, A., de Souza, F.D., Sarkar, S.: Rate-invariant

analysis of trajectories on riemannian manifolds with application

in visual speech recognition. In: Proceedings of the IEEE Confer-

ence on Computer Vision and Pattern Recognition, pp. 620–627

(2014)

58. Sun, Y., Yin, L.: Facial expression recognition based on 3d dy-

namic range model sequences. In: European Conference on Com-

puter Vision, pp. 58–71. Springer (2008)

59. Vaillant, M., Glaunès, J.: Surface matching via currents. In: Infor-

mation processing in medical imaging, pp. 1–5. Springer (2005)

60. Woolrich, M.W., Jbabdi, S., Patenaude, B., Chappell, M., Makni,

S., Behrens, T., Beckmann, C., Jenkinson, M., Smith, S.M.:

Bayesian analysis of neuroimaging data in fsl. Neuroimage 45(1),

S173–S186 (2009)

61. Yin, L., Chen, X., Sun, Y., Worm, T., Reale, M.: A high-resolution

3d dynamic facial expression database. In: Automatic Face &

Gesture Recognition, 2008. FG’08. 8th IEEE International Con-

ference on, pp. 1–6. IEEE (2008)

62. Younes, L.: Jacobi ﬁelds in groups of diffeomorphisms and appli-

cations. Quarterly of Applied Mathematics 65(1), 113–134 (2007)

63. Younes, L.: Shapes and Diffeomorphisms. Applied Mathematical

Sciences. Springer Berlin Heidelberg (2010). URL https://books.google.fr/books?id=SdTBtMGgeAUC

64. Zhang, M., Fletcher, P.T.: Finite-dimensional lie algebras for fast

diffeomorphic image registration. In: International Conference

on Information Processing in Medical Imaging, pp. 249–260.

Springer (2015)

65. Zhang, M., Singh, N., Fletcher, P.T.: Bayesian estimation of reg-

ularization and atlas building in diffeomorphic image registration.

In: IPMI, vol. 23, pp. 37–48 (2013)

A Background: meshes represented as currents

The theory of currents has been introduced in [59], and is

used in this paper to deﬁne a distance metric between pairs

of meshes without any assumption on their topology, and in

particular without assuming point-to-point correspondence.

See also [14] for more details.

A.1 Continuous theory

Let y be a surface mesh, that we represent as an infinite set of tuples (x, n(x)) where x is a point of $\mathbb{R}^3$ and n(x) the normal vector of y at this point. Let $g_E : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be a positive-definite kernel operator, and E the associated reproducing kernel Hilbert space. We define the current transform C(y