
Learning distributions of shape trajectories from longitudinal datasets:

a hierarchical model on a manifold of diffeomorphisms

Alexandre Bône, Olivier Colliot, Stanley Durrleman

The Alzheimer’s Disease Neuroimaging Initiative

Institut du Cerveau et de la Moelle épinière, Inserm, CNRS, Sorbonne Université, Paris, France

Inria, Aramis project-team, Paris, France

{alexandre.bone, olivier.colliot, stanley.durrleman}@icm-institute.org

Abstract

We propose a method to learn a distribution of shape trajectories from longitudinal data, i.e. the collection of individual objects repeatedly observed at multiple time-points. The method allows computing an average spatiotemporal trajectory of shape changes at the group level, and the individual variations of this trajectory both in terms of geometry and time dynamics. First, we formulate a non-linear mixed-effects statistical model as the combination of a generic statistical model for manifold-valued longitudinal data, a deformation model defining shape trajectories via the action of a finite-dimensional set of diffeomorphisms with a manifold structure, and an efficient numerical scheme to compute parallel transport on this manifold. Second, we introduce an MCMC-SAEM algorithm with a specific approach to shape sampling, an adaptive scheme for proposal variances, and a log-likelihood tempering strategy to estimate our model. Third, we validate our algorithm on 2D simulated data, and then estimate a scenario of alteration of the shape of the hippocampus, a 3D brain structure, during the course of Alzheimer's disease. The method shows for instance that hippocampal atrophy progresses more quickly in female subjects, and occurs earlier in APOE4 mutation carriers. We finally illustrate the potential of our method for classifying pathological trajectories versus normal ageing.

1. Introduction

1.1. Motivation

At the interface of geometry, statistics, and computer science, statistical shape analysis meets a growing number of applications in computer vision and medical image analysis. This research field has addressed two main statistical questions: atlas construction for cross-sectional shape datasets, and shape regression for shape time series. The former is the classical extension of a mean-variance analysis, which aims to estimate a mean shape and a covariance structure from observations of several individual instances of the same object or organ. The latter extends the concept of regression by estimating a spatiotemporal trajectory of shape changes from a series of observations of the same individual object at different time-points. The emergence of longitudinal shape datasets, which consist of individual objects repeatedly observed at multiple time-points, has raised the need for a combined approach. One needs statistical methods to estimate normative spatiotemporal models from series of individual observations which differ in shape and dynamics of shape changes across individuals. Such a model should capture and disentangle the inter-subject variability in shape at each time-point and the temporal variability due to shifts in time or scalings in pace of shape changes. Considering individual series as samples along a trajectory of shape changes, this approach amounts to estimating a spatiotemporal distribution of trajectories, and has potential applications in various fields including silhouette tracking in videos, analysis of growth patterns in biology, or modelling of disease progression in medicine.

1.2. Related work

The central difficulty in shape analysis is that shape spaces are either defined by invariance properties [21, 40, 41] or by the conservation of topological properties [5, 8, 13, 20, 34], and therefore intrinsically have the structure of infinite-dimensional Riemannian manifolds or Lie groups. Statistical Shape Models [9] are linear, but require consistent point labelling across observations and offer no topology-preservation guarantees. A now usual approach is to use the action of a group of diffeomorphisms to define a metric on a shape space [29, 43]. This approach has been used to compute a "Fréchet mean" together with a covariance matrix in the tangent space of the mean [1, 16, 17, 33, 44] from a cross-sectional dataset, and regression from time series of shape data [4, 15, 16, 18, 26, 32]. In [12, 14, 31], these tools have been used to estimate an average trajectory of shape changes from a longitudinal dataset, using the convenient assumption that the parameters encoding inter-individual variability are independent of time. The work in [27] introduced the idea of using the parallel transport to translate the spatiotemporal patterns seen in one individual into the geometry of another one. The co-adjoint transport is used in [38] for the same purpose. Both estimate a group average trajectory from individual trajectories. The proposed models do not account for inter-individual variability in the time dynamics, which is of key importance in the absence of temporal markers of the progression to align the sequences. The same remarks apply to [6], which introduces a nice theoretical setting for spaces of trajectories, in the case of a fixed number of temporal observations across subjects. The need for temporal alignment in longitudinal data analysis is highlighted for instance in [13] with a diffeomorphism-based morphometry approach, or in [40, 41] with quotient manifolds. In [24, 37], a generative mixed-effects model for the statistical analysis of manifold-valued longitudinal data is introduced for the analysis of feature vectors. This model describes both the variability in the direction of the individual trajectories, by introducing the concept of "exp-parallelization" which relies on parallel transport, and the pace at which those trajectories are followed, using "time-warp" functions. Similar time-warps are used by the authors of [23] to refine their linear modeling approach [22].

1.3. Contributions

In this paper, we propose to extend the approach of [37] from low-dimensional feature vectors to shape data. Using an approach designed for manifold-valued data on shape spaces defined by the action of a group of diffeomorphisms raises several theoretical and computational difficulties. Notably needed are: a finite-dimensional set of diffeomorphisms with a Riemannian manifold structure; stable and efficient numerical schemes to compute Riemannian exponential and parallel transport operators on this manifold, as no closed-form expressions are available; accelerated algorithmic convergence to cope with a hundreds-of-times larger dimensionality. To this end, we propose here:

• to formulate a generative non-linear mixed-effects model for a finite-dimensional set of diffeomorphisms defined by control points, to show that this set is stable under exp-parallelization, and to use an efficient numerical scheme for parallel transport;

• to introduce an adapted MCMC-SAEM algorithm with an adaptive block sampling of the latent variables, a specific sampling strategy for shape parameters based on random local displacements of the shape contours, and a vanishing tempering of the target log-likelihood;

• to validate our method on 2D simulated data and a large dataset of 3D brain structures in the context of Alzheimer's disease progression, and to illustrate the potential of our method for classifying spatiotemporal patterns, e.g. to discriminate pathological versus normal trajectories of ageing.

All in all, the proposed method estimates an average spatiotemporal trajectory of shape changes from a longitudinal dataset, together with distributions of space-shifts, time-shifts and acceleration factors, describing the variability in shape, onset and pace of shape changes respectively.

2. Deformation model

2.1. The manifold of diffeomorphisms Dc0

We follow the approach taken in [12], built on the principles of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework [30]. We note d ∈ {2, 3} the dimension of the ambient space. We choose k a Gaussian kernel of width σ ∈ R*₊, and c a set of n_cp ∈ N "control" points c = (c_1, ..., c_{n_cp}) of the ambient space R^d. For any set of "momentum" vectors m = (m_1, ..., m_{n_cp}), we define the "velocity" vector field v : R^d → R^d as the convolution v(x) = Σ_{k=1}^{n_cp} k(c_k, x) · m_k for any point x of the ambient space R^d. From initial sets of n_cp control points c_0 and corresponding momenta m_0, we obtain the trajectories t → (c_t, m_t) by integrating the Hamiltonian equations:

    ċ = K_c m ;    ṁ = −(1/2) ∇_c (mᵀ K_c m)    (1)

where K_c is the n_cp × n_cp "kernel" matrix [k(c_i, c_j)]_{ij}, ∇ the gradient operator, and (·)ᵀ the matrix transposition. Those trajectories prescribe the trajectory t → v_t in the space of velocity fields. The integration along such a path from the identity generates a flow of diffeomorphisms t → φ_t of the ambient space [5]. We can now define:

    D_{c_0} = { φ_1 ; ∂_t φ_t = v_t ∘ φ_t, φ_0 = Id, v_t = Conv(c_t, m_t), (ċ_t, ṁ_t) = Ham(c_t, m_t), m_0 ∈ R^{d·n_cp} }    (2)

where Conv(·, ·) and Ham(·, ·) are compact notations for the convolution operator and the Hamiltonian equations (1) respectively. D_{c_0} has the structure of a finite-dimensional manifold, where the metric at the tangent space T_Id D_{c_0} is given by K_{c_0}^{-1}. It is shown in [29] that the proposed paths t → φ_t are the paths of minimum deformation energy, and are therefore the geodesics of D_{c_0}. These geodesics are fully determined by an initial set of momenta m_0.
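As a concrete illustration, the Hamiltonian dynamics (1) can be integrated with a simple forward-Euler scheme. The sketch below is a minimal pure-Python version for d = 2, assuming a Gaussian kernel k(x, y) = exp(−‖x − y‖²/(2σ²)); function names, the integrator, and the step size are illustrative choices, not those of the actual implementation.

```python
import math

def kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-|x - y|^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def velocity(x, c, m, sigma=1.0):
    """Velocity field v(x) = sum_k k(c_k, x) m_k (the convolution above)."""
    return [sum(kernel(ck, x, sigma) * mk[i] for ck, mk in zip(c, m))
            for i in range(2)]

def hamiltonian_step(c, m, dt, sigma=1.0):
    """One forward-Euler step of equations (1):
    c' = K_c m  and  m' = -(1/2) grad_c (m^T K_c m)."""
    new_c = [[ck[i] + dt * velocity(ck, c, m, sigma)[i] for i in range(2)]
             for ck in c]
    new_m = []
    for ck, mk in zip(c, m):
        # Analytic gradient of the Gaussian kernel terms involving c_k.
        g = [0.0, 0.0]
        for cl, ml in zip(c, m):
            w = kernel(ck, cl, sigma) * (mk[0] * ml[0] + mk[1] * ml[1])
            for i in range(2):
                g[i] += w * (ck[i] - cl[i]) / sigma ** 2
        new_m.append([mk[i] + dt * g[i] for i in range(2)])
    return new_c, new_m

# A single control point keeps its momentum and moves in a straight line:
c, m = [[0.0, 0.0]], [[1.0, 0.0]]
for _ in range(100):
    c, m = hamiltonian_step(c, m, 0.01)
```

With a single control point the kernel gradient vanishes, so the geodesic is a straight line at constant speed; with several points the momenta interact through the kernel matrix K_c.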

Then, any point x ∈ R^d of the ambient space follows the trajectory t → φ_t(x). Such trajectories are used to deform any point cloud or mesh embedded in the ambient space, defining a diffeomorphic deformation of the shape. Formally, this operation defines a shape space S_{c_0,y_0} as the orbit of a reference shape y_0 under the action of D_{c_0}. The manifold of diffeomorphisms D_{c_0} is used as a proxy to manipulate shapes: all computations are performed in D_{c_0}, or more concretely on a finite set of control points and momentum vectors, and applied back to the template shape y_0 to obtain a result in S_{c_0,y_0}.

2.2. Riemannian exponentials on D_{c_0}

For any set of control points c_0, we define the exponential operator Exp_{c_0} : m_0 ∈ R^{d·n_cp} → φ_1 ∈ D_{c_0}. Note that D_{c_0} = { Exp_{c_0}(m_0) | m_0 ∈ R^{d·n_cp} }.

The following proposition ensures the stability of D_{c_0} under the exponential operator, i.e. that the control points obtained by applying successive compatible exponential maps with arbitrary momenta are reachable by a unique integration of the Hamiltonian equations from c_0:

Proposition. Let c_0 be a set of control points. ∀ φ_1 ∈ D_{c_0}, ∀ w momenta, we have Exp_{φ_1(c_0)}(w) ∈ D_{c_0}.

Proof. We note φ'_1 = Exp_{φ_1(c_0)}(w) ∈ D_{φ_1(c_0)} and c'_1 = φ'_1 ∘ φ_1(c_0). By construction, there exist two paths t → φ_t in D_{c_0} and s → φ'_s in D_{φ_1(c_0)} such that φ'_1 ∘ φ_1(c_0) = c'_1. Therefore there exists a diffeomorphic path u → ψ_u such that ψ_1(c_0) = c'_1. Concluding with [29], the path u → ψ_u of minimum energy exists, and is written u → Exp_{c_0}(u · m'_0) for some m'_0 ∈ R^{d·n_cp}. □

As a physical interpretation might be given to the integration time t when building a statistical model, we introduce the notation Exp_{c_0,t_0,t} : m_0 ∈ R^{d·n_cp} → φ_t ∈ D_{c_0}, where φ_t is obtained by integrating from t = t_0. Note that Exp_{c_0} = Exp_{c_0,0,1}.

On the considered manifold D_{c_0}, computing exponentials (i.e. geodesics) therefore consists in integrating ordinary differential equations. This operation is direct and computationally tractable. The top line of Figure 1 plots a geodesic γ : t → γ_t applied to the top-left shape y_0.

2.3. Parallel transport and exp-parallels on D_{c_0}

In [37] is introduced exp-parallelism, a generalization of Euclidean parallelism to geodesically complete manifolds. It relies on the Riemannian parallel transport operator, which we propose to compute using the fanning scheme [28]. This numerical scheme only requires the exponential operator to approximate the parallel transport along a geodesic, with proved convergence.

We note P_{c_0,m_0,t_0,t} : R^{d·n_cp} → R^{d·n_cp} the parallel transport operator, which transports any momenta w along the geodesic γ : t → γ_t = Exp_{c_0,t_0,t}(m_0) from t_0 to t. For any c_0, m_0, w and t_0, we can now define the curve:

    t → η_{c_0,m_0,t_0,t}(w) = Exp_{γ_t(c_0)} [ P_{c_0,m_0,t_0,t}(w) ].    (3)

This curve, which we will call exp-parallel to γ, is well-defined on the manifold D_{c_0}, according to the proposition of Section 2.2. Figure 1 illustrates the whole procedure. From the top-left shape, the computational scheme is as follows: integrate the Hamiltonian equations to obtain the control points c(t) (red crosses) and momenta m(t) (bold blue arrows); compute the associated velocity fields v_t (light blue arrows); compute the flow γ : t → γ_t (shape progression); transport the momenta w along γ (red arrows); compute the exp-parallel curve η by repeating the first three steps along the transported momenta.

Figure 1: Samples from a geodesic γ (top) and an exp-parallelized curve η (bottom) on S_{c_0,y_0}. Parameters encoding the geodesic are the blue momenta attached to control points, plotted together with the associated velocity fields. Momenta in red are parallel transported along the geodesic and define a deformation mapping each frame of the geodesic to a frame of η. Exp-parallelization allows transporting a shape trajectory from one geometry to another.
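To make the fanning idea concrete, here is a toy sketch (not the actual scheme of [28], which includes higher-order corrections and also transports the geodesic velocity): the transported vector is read off a finite-difference Jacobi field, using only calls to a generic exponential map `exp(x, v)`. In a flat space, where `exp(x, v) = x + v`, parallel transport must return the input vector unchanged, which gives a simple sanity check.

```python
def fanning_step(exp, x, v, w, h, eps=1e-4):
    """Transport w over one step of length h along the geodesic t -> exp(x, t v),
    approximating the Jacobi field J = d/d_eps exp(x, h (v + eps w)) with a
    central finite difference; the transported vector is J / h."""
    plus = exp(x, [h * (vi + eps * wi) for vi, wi in zip(v, w)])
    minus = exp(x, [h * (vi - eps * wi) for vi, wi in zip(v, w)])
    j = [(p - q) / (2.0 * eps * h) for p, q in zip(plus, minus)]
    return exp(x, [h * vi for vi in v]), j

def transport(exp, x, v, w, n_steps=10):
    """Chain fanning steps from t = 0 to t = 1. Here v is kept constant
    because the toy space is flat; on a curved manifold it must be updated
    along the geodesic as well."""
    h = 1.0 / n_steps
    for _ in range(n_steps):
        x, w = fanning_step(exp, x, v, w, h)
    return w

# Sanity check in Euclidean space, where parallel transport is the identity:
exp_eucl = lambda x, v: [a + b for a, b in zip(x, v)]
w = transport(exp_eucl, [0.0, 0.0], [1.0, 0.0], [0.3, 0.7])
```

The appeal of the approach is visible in the signature: only the exponential map is needed, which on D_{c_0} is exactly the ODE integration of Section 2.2.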

3. Statistical model

For each individual 1 ≤ i ≤ N are available n_i longitudinal shape measurements y = (y_{i,j})_{1≤j≤n_i} and associated times (t_{i,j})_{1≤j≤n_i}.

3.1. The generative statistical model

Let c_0 be a set of control points and m_0 associated momenta. We call γ the geodesic t → Exp_{c_0,t_0,t}(m_0) of D_{c_0}. Let y_0 be a template mesh shape embedded in the ambient space. For a subject i, the observed longitudinal shape measurements y_{i,1}, ..., y_{i,n_i} are modeled as sample points at times ψ_i(t_{i,j}) of an exp-parallel curve t → η_{c_0,m_0,t_0,t}(w_i) to this geodesic γ, plus additional noise ε_{i,j}:

    y_{i,j} = η_{c_0,m_0,t_0,ψ_i(t_{i,j})}(w_i) · y_0 + ε_{i,j}.    (4)

The time-warp function ψ_i and the space-shift momenta w_i respectively encode the individual time and space variability. The time-warp is defined as an affine reparametrization of the reference time t: ψ_i(t) = α_i (t − t_0 − τ_i) + t_0, where the individual time-shift τ_i ∈ R allows an inter-individual variability in the stage of evolution, and the individual acceleration factor α_i ∈ R*₊ a variability in the pace of evolution. For convenience, we write α_i = exp(ξ_i). In the spirit of Independent Component Analysis [19], the space-shift momenta w_i are modeled as the linear combination of n_s sources, gathered in the n_cp × n_s matrix A: w_i = A_{m_0^⊥} s_i. Before computing this superposition, each column c_l(A) of A has been projected on the hyperplane m_0^⊥ for the metric K_{c_0}, ensuring the orthogonality between m_0 and w_i. As argued in [37], this orthogonality is fundamental for the identifiability of the model. Without this constraint, the projection of the space-shifts (w_i)_i on m_0 could be confounded with the acceleration factors (α_i)_i.
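Both individual reparametrizations above are straightforward to implement. The sketch below (pure Python, illustrative names) computes the time-warp ψ_i and projects a column of A onto the hyperplane orthogonal to m_0 for a metric given by a kernel matrix K, so that the resulting space-shift satisfies ⟨m_0, w_i⟩_K = 0.

```python
import math

def time_warp(t, t0, tau_i, xi_i):
    """psi_i(t) = alpha_i (t - t0 - tau_i) + t0, with alpha_i = exp(xi_i)."""
    return math.exp(xi_i) * (t - t0 - tau_i) + t0

def project_column(a, m0, K):
    """Project a column of A onto the m0-orthogonal hyperplane for the
    metric K:  a -> a - (<m0, a>_K / <m0, m0>_K) m0,
    so that <m0, project_column(a)>_K = 0."""
    def inner(u, v):
        return sum(u[i] * K[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    coef = inner(m0, a) / inner(m0, m0)
    return [ai - coef * m0i for ai, m0i in zip(a, m0)]
```

With ξ_i = 0 and τ_i = 0 the time-warp is the identity; positive ξ_i accelerates the individual trajectory around its own onset t_0 + τ_i.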

3.2. Mixed-effects formulation

We use either the current [42] or varifold [7] noise model for the residuals ε_{i,j}, allowing our method to work with input meshes without any point correspondence. In this setting, we note ε_{i,j} ∼iid N(0, σ_ε²). The other previously introduced variables are modeled as random effects z, with: y_0 ∼ N(ȳ_0, σ_y²), c_0 ∼ N(c̄_0, σ_c²), m_0 ∼ N(m̄_0, σ_m²), A ∼ N(Ā, σ_A²), τ_i ∼iid N(0, σ_τ²), ξ_i ∼iid N(0, σ_ξ²), s_i ∼iid N(0, 1).

We define θ = (ȳ_0, c̄_0, m̄_0, Ā, t_0, σ_τ², σ_ξ², σ_ε²) the fixed effects, i.e. the parameters of the model. The remaining variance parameters σ_y², σ_c², σ_m² and σ_A² can be chosen arbitrarily small. Standard conjugate distributions are chosen as Bayesian priors on the model parameters: normal priors with variances ς_y², ς_c², ς_m², ς_A² and ς_t² on ȳ_0, c̄_0, m̄_0, Ā and t_0 respectively, and inverse-gamma priors σ_τ² ∼ IG(m_τ, σ_{τ,0}²), σ_ξ² ∼ IG(m_ξ, σ_{ξ,0}²), σ_ε² ∼ IG(m_ε, σ_{ε,0}²) on the variance parameters. Those priors ensure the existence of the maximum a posteriori (MAP) estimator. In practice, they regularize and guide the estimation procedure.

The proposed model belongs to the curved exponential family (see supplementary material, which gives the complete log-likelihood). In this setting, the algorithm introduced in the following section has a proved convergence.

We have defined a distribution of trajectories that could be noted t → y(t) = f_{θ,t}(z), where z is a random variable following a normal distribution. We call t → f_{θ,t}(E[z]) the average trajectory, which may not be equal to the expected trajectory t → E[f_{θ,t}(z)] in the general non-linear case.

4. Estimation

4.1. The MCMC-SAEM algorithm

The Expectation-Maximization (EM) algorithm [11] allows estimating the parameters of a mixed-effects model with latent variables, here the random effects z. It alternates between an expectation (E) step and a maximization (M) step. The E step is intractable in our case, due to the non-linearity of the model. In [10] is introduced and proved a stochastic approximation of the EM algorithm, where the E step is replaced by a simulation (S) step followed by an approximation (A) step. The S step requires sampling from q(z | y, θ^k), which is also intractable in our case. In the case of curved exponential models, the authors of [2] show that the convergence holds if the S step is replaced by a single transition of an ergodic Markov chain Monte Carlo (MCMC) method whose stationary distribution is q(z | y, θ^k). This global algorithm is called the Markov chain Monte Carlo Stochastic Approximation Expectation-Maximization (MCMC-SAEM), and is exploited in this paper to compute the MAP estimate of the model parameters θ_map = argmax_θ ∫ q(y, z | θ) dz.
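The S/A/M pattern can be illustrated on a deliberately tiny mixed-effects model, y_i = θ + z_i + ε_i with latent z_i ∼ N(0, 1): one Metropolis-Hastings transition per latent block (S), a Robbins-Monro averaging of the sufficient statistic (A), and a closed-form update of θ (M). This is only a sketch of the algorithmic skeleton, not the shape model of the paper; all constants are illustrative.

```python
import math
import random

def mcmc_saem(y, s=0.5, n_iter=2000, seed=0):
    """Toy MCMC-SAEM for y_i = theta + z_i + eps_i, z_i ~ N(0, 1),
    eps_i ~ N(0, s^2); theta is the only fixed effect.
    S: one random-walk Metropolis-Hastings transition per latent z_i;
    A: Robbins-Monro averaging of the sufficient statistic mean(y - z);
    M: closed-form update theta = S."""
    rng = random.Random(seed)
    n = len(y)
    z = [0.0] * n
    theta, S = 0.0, 0.0
    for k in range(1, n_iter + 1):
        for i in range(n):  # simulation step, block by block
            cand = z[i] + rng.gauss(0.0, 0.5)
            delta = (-cand ** 2 / 2 - (y[i] - theta - cand) ** 2 / (2 * s ** 2)) \
                  - (-z[i] ** 2 / 2 - (y[i] - theta - z[i]) ** 2 / (2 * s ** 2))
            if math.log(rng.random()) < delta:
                z[i] = cand
        rho = k ** -0.6  # vanishing step-size of the approximation step
        S += rho * (sum(yi - zi for yi, zi in zip(y, z)) / n - S)
        theta = S  # maximization step
    return theta

# Synthetic check: for this toy model the fixed point is the empirical mean of y.
rng = random.Random(1)
y = [2.0 + rng.gauss(0.0, 1.0) + rng.gauss(0.0, 0.5) for _ in range(200)]
theta_hat = mcmc_saem(y)
```

The same skeleton applies to the shape model, with the blocks of Section 4.2 in the simulation step and the sufficient statistics of Section 4.5 in the approximation and maximization steps.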

4.2. The adaptive block sampler

We use a block formulation of the Metropolis-Hastings within Gibbs (MHwG) sampler in the S-MCMC step. The latent variables z are decomposed into n_b natural blocks: z = { y_0, c_0, m_0, [c_l(A)]_l, [τ_i, ξ_i, s_i]_i }. Those blocks have highly heterogeneous sizes, e.g. a single scalar for τ_i versus possibly thousands of coordinates for y_0, for which we introduce a specific proposal distribution in Section 4.3.

For all the other blocks, we use a symmetric random-walk MHwG sampler with normal proposal distributions of the form N(0, σ_b² Id) to perturb the current block state z_b^k. In order to achieve reasonable acceptance rates ar, i.e. around ar* = 30% [36], the proposal standard deviations σ_b are dynamically adapted every n_adapt iterations by measuring the mean acceptance rates ar over the last n_detect iterations, and applying, for any b:

    σ_b ← σ_b · [ 1 + k^(−β) · ( (ar − ar*)/(1 − ar*) · 1_{ar ≥ ar*} + (ar − ar*)/ar* · 1_{ar < ar*} ) ]    (5)

with β > 0.5. Inspired by [3], this dynamic adaptation is performed with a decreasing step-size k^(−β), ensuring the vanishing property of the adaptation scheme and the convergence of the whole algorithm [2, 3]. It proved very efficient in practice with n_adapt = n_detect = 10 and β = 0.51, for any kind of data.
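The adaptation rule (5) amounts to a multiplicative update with a vanishing step-size; the sketch below is one plausible instantiation with the constants of this section (treat the exact functional form as indicative).

```python
def adapt_proposal_std(sigma_b, ar, k, ar_target=0.30, beta=0.51):
    """Adapt a proposal standard deviation from the measured acceptance
    rate ar: increase sigma_b when ar > ar_target, decrease it otherwise,
    with a vanishing step-size k^(-beta), beta > 0.5 (equation (5))."""
    step = k ** -beta
    if ar >= ar_target:
        factor = 1.0 + step * (ar - ar_target) / (1.0 - ar_target)
    else:
        factor = 1.0 + step * (ar - ar_target) / ar_target
    return sigma_b * factor
```

Because the step-size vanishes, the adaptation dies out as k grows, which is what preserves the ergodicity of the chain and the convergence guarantees of [2, 3].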

4.3. Efficient sampling of smooth template shapes

The first block z_1 = y_0, i.e. the coordinates of the points of the template mesh, is of very high dimension: naively sampling over each scalar value of its numerical description would result both in unnatural distorted shapes and in a daunting computational burden.

We propose to take advantage of the geometrical nature of y_0 and leverage the framework introduced in Section 2, by perturbing the current block state z_1^k with a small displacement field v, obtained by the convolution of random momenta on a pre-selected set of control points. This proposal distribution can be seen as a normal distribution N(0, σ_1² DᵀD), where σ_1² is the variance associated with the random momenta, and D the convolution matrix. In practice, dynamically adapting the proposal variance σ_1² and selecting regularly-spaced shape points as control points proved efficient.

4.4. Tempering

The MCMC-SAEM is proved convergent toward a local maximum of θ → ∫ q(y, z | θ) dz. In practice, the dimensionality of the energetic landscape q(y, z | θ) and the presence of multiple local maxima can make the estimation procedure sensitive to initialization. Inspired by the globally convergent simulated annealing algorithm, [25] proposes to carry out the optimization procedure in a smoothed version q_T(y, z | θ) of the original landscape. The temperature parameter T controls this smoothing, and should decrease from large values to 1, for which q_T = q.

We propose to introduce such a temperature parameter only for the population variables z_pop. The tempered version of the complete log-likelihood is given as supplementary material. In our experiments, the chosen temperature sequence T_k remains constant at first, and then geometrically decreases to unity. Implementing this "tempering" feature had a dramatic impact on the required number of iterations before convergence, and greatly improved the robustness of the whole procedure. Note that the theoretical convergence properties of the MCMC-SAEM are not degraded, since the tempered phase of the algorithm can be seen as an initializing heuristic, and may actually be improved.
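A temperature schedule of the kind described above (constant plateau, then geometric decay to unity) can be written as follows; all constants are hypothetical, as the exact values are not reported here.

```python
def temperature(k, T0=10.0, k_plateau=100, rate=0.98):
    """Temperature sequence T_k: constant at T0 for the first k_plateau
    iterations, then geometric decrease toward 1 (constants hypothetical)."""
    if k <= k_plateau:
        return T0
    return 1.0 + (T0 - 1.0) * rate ** (k - k_plateau)
```

Once T_k has effectively reached 1, the algorithm is running the untempered MCMC-SAEM, which is why the tempered phase can be viewed as an initialization heuristic.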

Algorithm 1: Estimation of the longitudinal deformations model with the MCMC-SAEM. Code publicly available at: www.deformetrica.org.

input : Longitudinal dataset of shapes y = (y_{i,j})_{i,j}. Initial parameters θ^0 and latent variables z^0. Geometrically decreasing sequence of step-sizes ρ_k.
output: Estimation of θ_map. Samples (z^s)_s approximately distributed following q(z | y, θ_map).

Initialization: set k = 0 and S_0 = S(z^0).
repeat
    Simulation: foreach block of latent variables z_b do
        Draw a candidate z_b^c ∼ p_b(· | z_b^k).
        Set z^c = (z_1^{k+1}, ..., z_{b−1}^{k+1}, z_b^c, z_{b+1}^k, ..., z_{n_b}^k).
        Compute the geodesic γ : t → Exp_{c_0,t_0,t}(m_0).
        ∀i, compute w_i = A_{m_0^⊥} s_i.
        ∀i, compute the transported momenta t → P_{c_0,m_0,t_0,t}(w_i).
        ∀i, j, compute Exp_{γ[ψ_i(t_{i,j})](c_0)} [ P_{c_0,m_0,t_0,ψ_i(t_{i,j})}(w_i) ].
        Compute the acceptance ratio ω = min[ 1, q(z^c | y, θ^k) / q(z^k | y, θ^k) ].
        if u ∼ U(0, 1) < ω then z_b^{k+1} ← z_b^c else z_b^{k+1} ← z_b^k.
    end
    Stochastic approx.: S_{k+1} ← S_k + ρ_k [ S(z^{k+1}) − S_k ].
    Maximization: θ^{k+1} ← θ*(S_{k+1}).
    Adaptation: if remainder(k + 1, n_adapt) = 0 then update the proposal variances (σ_b²)_b with equation (5).
    Increment: set k ← k + 1.
until convergence;

4.5. Sufficient statistics and maximization step

Exhibiting the sufficient statistics S_1 = y_0, S_2 = c_0, S_3 = m_0, S_4 = A, S_5 = Σ_i (t_0 + τ_i), S_6 = Σ_i (t_0 + τ_i)², S_7 = Σ_i ξ_i², and S_8 = Σ_i Σ_j ‖ y_{i,j} − η_{c_0,m_0,t_0,ψ_i(t_{i,j})}(w_i) · y_0 ‖², the update of the model parameters θ ← θ* in the M step of the MCMC-SAEM can be derived in closed form. The explicit expressions are given as supplementary material.
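Ignoring the priors (which the actual M step folds in), the maximum-likelihood updates implied by S_5 to S_8 are simple moment matches; a sketch, with illustrative argument names:

```python
def m_step(S5, S6, S7, S8, n_subjects, n_residual_dims):
    """Prior-free (maximum-likelihood) version of the closed-form M-step
    updates from the sufficient statistics S5..S8; the paper's actual
    updates also incorporate the inverse-gamma priors of Section 3.2."""
    t0 = S5 / n_subjects                      # mean of the t0 + tau_i
    var_tau = S6 / n_subjects - t0 ** 2       # variance of the tau_i
    var_xi = S7 / n_subjects                  # variance of the xi_i
    var_eps = S8 / n_residual_dims            # residual noise variance
    return t0, var_tau, var_xi, var_eps
```

With the conjugate priors of Section 3.2, each update becomes a weighted combination of this moment estimate and the prior hyperparameters.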

5. Experiments

5.1. Validation with simulated data in R²

Convergence study. To validate the estimation procedure, we first generate synthetic data directly from the model, without additional noise. Our choice of reference geodesic is plotted on the top line of the previously introduced Figure 1: the template y_0 is the top central shape, the chosen five control points c_0 are the red crosses, and the momenta m_0 the bold blue arrows. Those parameters are completed with t_0 = 70, σ_τ = 1, σ_ξ = 0.1. With n_s = 4 independent components, we simulate N = 100 individual trajectories and sample ⟨n_i⟩_i = 5 observations from each.

The algorithm is run ten times. Figure 2 plots the evolution of the error on the parameters along the estimation procedure, in log scale. Each color corresponds to a different run: the algorithm converges to the same point each time, as confirmed by the small variances of the residual errors indicated in Table 1. Those residual errors come from the finite number of observations of the generated dataset

[Figure 2 shows six log-scale panels over thousands of iterations, for the ten runs: the varifold error on y_0, the L2 error on v_0, and the L1 errors on t_0, σ_τ, σ_ξ and σ_ε.]

Figure 2: Error on the population parameters along the estimation procedure, with logarithmic scales. The residual on the template shape y_0 is computed with the varifold metric.

‖y_0‖²_var. | ‖v_0‖² | |t_0| | |σ_τ| | |σ_ξ| | |σ_ε| | ⟨‖v_i‖²⟩_i | ⟨|ξ_i|⟩_i | ⟨|τ_i|⟩_i
1.43 ± 5.6% | 0.89 ± 0.7% | 0.19 ± 2.7% | 0.029 ± 13.2% | 0.017 ± 7.6% | 0.11 ± 0.1% | 2.47 ± 1.7% | 0.022 ± 6.7% | 0.19 ± 0.8%

Table 1: Absolute residual errors on the estimated parameters and associated relative standard deviations across the 10 runs. We note v_0 = Conv(c_0, m_0) and v_i = Conv(c_0, w_i). The operator ⟨·⟩_i indicates an average over the index i. Residuals are satisfyingly small, as can be seen for |t_0| for instance when compared with the time-span max |t_{i,j}| = 4. The low standard deviations suggest that the stochastic estimation procedure is stable and reproduces very similar results at each run.

Figure 3: Estimated mean progression (bottom line, in bold), and three reconstructed individual scenarios (top lines). Input data is plotted in red in the relevant frames, demonstrating the reconstruction ability of the estimated model. Our method is able to disentangle the variability in shape, in starting time of the arm movement, and in speed.

and the Bayesian priors, but are satisfyingly small, as qualitatively confirmed by Figure 3. The estimated mean trajectory, in bold, matches the true one, given by the top line of Figure 1. Figure 3 also illustrates the ability of our method to reconstruct continuous individual trajectories.

Personalizing the model to unseen data. Once a model has been learned, i.e. the parameters θ_map have been estimated, it can easily be personalized to the observations y_new of a new subject by maximizing q(y_new, z_new | θ_map) over the low-dimensional latent variables z_new. We implemented this maximization procedure with Powell's method [35], and evaluated it by registering the simulated trajectories to the true model. Table 2 gathers the results for the previously introduced dataset with ⟨n_i⟩_i = 5 observations per subject, and extended ones with ⟨n_i⟩_i = 7 and 9. The parameters are satisfyingly estimated in all configurations: the reconstruction error measured by |σ_ε| remains as low as in the previous experiment (see Table 1, Figure 3). The acceleration factor is the most difficult parameter to estimate with small observation windows of the individual trajectories; at least two observations are needed to obtain a good estimate.

Experiment | |σ_ε| | ⟨|s_i|⟩_i | ⟨|ξ_i|⟩_i | ⟨|τ_i|⟩_i
⟨n_i⟩_i = 5 | 0.110 | 3.34% | 37.0% | 5.45%
⟨n_i⟩_i = 7 | 0.095 | 2.98% | 16.2% | 3.86%
⟨n_i⟩_i = 9 | 0.087 | 2.38% | 11.9% | 3.28%

Table 2: Residual error metrics for the longitudinal registration procedure, for three simulated datasets. The absolute residual error on σ_ε is given; the other errors are given as percentages of the simulation standard deviation.
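The personalization step is a low-dimensional derivative-free optimization. The actual implementation uses Powell's method [35] (available e.g. in scipy); the self-contained sketch below uses a cruder coordinate search with step halving just to illustrate the interface: a scalar objective, here standing in for −log q(y_new, z_new | θ_map), minimized over z_new.

```python
def personalize(neg_log_post, z0, n_sweeps=20, h0=1.0):
    """Derivative-free coordinate minimization in the spirit of the
    personalization step (simplified stand-in for Powell's method):
    for each coordinate of the latent vector, try steps of shrinking
    size in both directions and keep any improvement."""
    z = list(z0)
    for _ in range(n_sweeps):
        for i in range(len(z)):
            h = h0
            while h > 1e-6:
                for cand in (z[i] - h, z[i] + h):
                    trial = z[:i] + [cand] + z[i + 1:]
                    if neg_log_post(trial) < neg_log_post(z):
                        z = trial
                h *= 0.5
    return z

# Toy objective with minimum at z = (1, -2):
z_hat = personalize(lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2, [0.0, 0.0])
```

Because z_new is only a handful of scalars per subject (time-shift, log-acceleration, sources), such derivative-free searches are cheap even though each objective evaluation requires shooting and transporting a deformation.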

5.2. Hippocampal atrophy in Alzheimer's disease

Longitudinal deformations model on MCIc subjects. We extract the T1-weighted magnetic resonance imaging measurements of N = 100 subjects from the ADNI database, with ⟨n_i⟩_i = 7.6 datapoints on average. Those subjects present mild cognitive impairments, and are eventually diagnosed with Alzheimer's disease (MCI converters, noted MCIc). In a pre-processing phase, the 3D images are affinely aligned and the segmentations of the right-hemisphere hippocampi are transformed into surface meshes. Each affine transformation is then applied to the corresponding mesh, before rigid alignment of the follow-up meshes on the baseline one. The hippocampus is a subcortical brain structure which plays a central role in memory, and experiences atrophy during the development of Alzheimer's disease. We initialize the geodesic population parameters y_0, c_0, m_0 with a geodesic regression [15, 16] performed on a single subject. The reference time t_0 is initialized to the mean of the observation times (t_{i,j})_{i,j}, and σ_τ² to the corresponding variance. We choose to estimate n_s = 4 independent components and initialize the corresponding matrix A to zero, as well as the individual latent variables τ_i, ξ_i, s_i. After 10,000 iterations, the parameter estimates stabilized.

Figure 4: Estimated mean progression of the right hippocampi. Successive ages: 69.3y, 71.8y (i.e. the template y_0), 74.3y, 76.8y, 79.3y, 81.8y, and 84.3y. The color map gives the norm of the velocity field ‖v_0‖ on the meshes.

Figure 5: Third independent component. The plotted hippocampi correspond to s_{i,3} successively equal to: −3, −2, −1, 0 (i.e. the template y_0), 1, 2 and 3. Note that this component is orthogonal to the temporal trajectory displayed in Figure 4.

Figure 4 plots the estimated mean progression, which exhibits a complex spatiotemporal atrophy pattern during disease progression: a pinching effect at the "joint" between the head and the body, combined with a specific atrophy of the medial part of the tail. Figure 5 plots an independent component, which is orthogonal to the mean progression by construction. This component seems to account for the inter-subject variability in the relative size of the hippocampus head compared to its tail.

We further examine the correlation between individual parameters and several patient characteristics. Figure 6 exhibits the strong correlation between the estimated individual time-shifts τ_i and the age at diagnosis t_i^diag, suggesting that the hippocampal atrophy correlates well with the cognitive symptoms. The few outliers above the regression line

Figure 6: Comparison of the estimated individual time-shifts τ_i, displayed as onset ages t_0 + τ_i (from early to late onset), versus the age at diagnosis t_i^diag. R² = 0.74.

might have resisted better the atrophy of their hippocampus thanks to a higher brain plasticity, in line with the cognitive reserve theory [39]. The few outliers below this line could have developed a subform of the disease, with delayed atrophy of their hippocampi. Further investigation is required to rule out potential convergence issues in the optimization procedure. Figures 7, 8 and 9 propose group comparisons based on the estimated individual parameters: the acceleration factor α_i, the time-shift τ_i, and the space-shift s_{i,3} in the direction of the third component (see Figure 5). The distributions of those parameters are significantly different under the Mann-Whitney statistical test when dividing the N = 100 MCIc subjects according to gender, APOE4 mutation status, and onset age t_0 + τ_i respectively.

Figure 7: Distributions of acceleration factors α_i (from slow to fast) according to gender. Hippocampal atrophy is faster in female subjects (p = 0.045).

Figure 8: Distributions of time-shifts τ_i, displayed as onset ages t_0 + τ_i, according to the number of APOE4 alleles (zero, one, or two). Hippocampal atrophy occurs earlier in carriers of 1 or 2 alleles (p = 0.017 and 0.015).

Figure 9: Distribution of the third source term s_{i,3} according to the categories {τ_i ≤ −3} (early onset), {−3 < τ_i < 3} (average onset), and {3 ≤ τ_i} (late onset). Hippocampal atrophy seems to occur later in subjects presenting a lower volume ratio of the hippocampus tail over the hippocampus head (p = 0.0049).

⟨n_i⟩_i | Shape features | Naive feature | All features
1 | 71% ± 4.5 [lr] | 50% ± 5.0 [nb] | 58% ± 5.0 [lr]
2 | 77% ± 4.3 [lr] | 58% ± 4.9 [5nn] | 68% ± 4.7 [dt]
4 | 79% ± 4.1 [svm] | 67% ± 4.7 [5nn] | 80% ± 4.0 [lr]
5 | 77% ± 4.2 [nn] | 77% ± 4.2 [lr] | 82% ± 3.8 [nb]
6.86 | 83% ± 3.7 [lr] | 80% ± 4.0 [lr] | 86% ± 3.4 [lr]

Table 3: Mean classification scores and associated standard deviations, computed on 10,000 bootstrap samples from the test dataset. Across all tested classifiers (sklearn default hyperparameters), only the best performing one is reported in each case: [lr] logistic regression, [nb] naive Bayes, [5nn] 5 nearest neighbors, [dt] decision tree, [nn] neural network, [svm] linear support vector machine.

Classifying pathological trajectories vs. normal ageing.
We processed another hundred individuals from the ADNI database (⟨ni⟩i = 7.37), choosing this time control subjects (CN). We formed two balanced datasets, each containing 50 MCIc and 50 CN subjects. We learned two distinct longitudinal deformation models on the training MCIc (N = 50, ⟨ni⟩i = 8.14) and CN (N = 50, ⟨ni⟩i = 8.08) subjects. We personalized both models to all 200 subjects, and used the scaled and normalized differences z_i^MCIc − z_i^CN as feature vectors of dimension 6, on which a set of standard classifiers was trained and tested to predict the label in {MCIc, CN}. For several configurations of the number of observations per test subject ⟨ni⟩i, we computed confidence intervals by bootstrapping the test set. Table 3 compares the results with a naive approach, which uses as unique feature the slope of individually-fitted linear regressions of the hippocampal volume with age. Classifiers performed consistently better with the features extracted from the longitudinal deformations model, even with a single observation. The classification performance increases with the number of available observations per subject. Interestingly, from ⟨ni⟩i = 4 onwards, pooling the shape and volume features yields an improved performance, suggesting that they are complementary.
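The confidence intervals of Table 3 are obtained by resampling the test set. Below is a minimal stdlib-only sketch of this bootstrap procedure, assuming the classifier predictions are already available; the label vectors in the usage example are hypothetical:

```python
import random

def bootstrap_accuracy(y_true, y_pred, n_boot=10_000, seed=0):
    """Mean and standard deviation of the classification accuracy over
    bootstrap resamples of the test set, as reported in Table 3."""
    rng = random.Random(seed)
    n = len(y_true)
    accs = []
    for _ in range(n_boot):
        # Resample the test set with replacement and score the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        accs.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    mean = sum(accs) / n_boot
    std = (sum((a - mean) ** 2 for a in accs) / n_boot) ** 0.5
    return mean, std

# Hypothetical predictions on a balanced 100-subject test set, 80 of
# them correct: the bootstrap mean then sits near 80%.
y_true = [0] * 50 + [1] * 50
y_pred = [1] * 10 + [0] * 40 + [0] * 10 + [1] * 40
mean, std = bootstrap_accuracy(y_true, y_pred, n_boot=2_000)
```

Only the test set is resampled here; the classifiers themselves are trained once, which matches the reported "mean ± std over 10,000 bootstrap samples from the test dataset".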

6. Conclusion

We proposed a hierarchical model on a manifold of diffeomorphisms that estimates the spatiotemporal distribution of longitudinal shape data. The observed shape trajectories are represented as individual variations of a group-average trajectory, which can be seen as the mean progression of the population. Both the spatial and the temporal variability are estimated directly from the data, allowing the use of unaligned sequences. This feature is key for applications where no objective temporal markers are available, as is the case for instance with Alzheimer's disease, whose onset age and pace of progression vary among individuals. Our model builds on the principles of a generic longitudinal model for manifold-valued data [37]. We provided a coherent theoretical framework for its application to shape data, along with the needed algorithmic solutions for parallel transport and sampling on our specific manifold. We estimated our model with the MCMC-SAEM algorithm on both simulated and real data. The simulation experiments confirmed the ability of the proposed algorithm to retrieve the optimal parameters in realistic scenarios. The application to medical imaging data, namely segmented hippocampi of patients with Alzheimer's disease, delivered results coherent with medical knowledge, and provided more detailed insights into the complex atrophy pattern of the hippocampus and its variability across patients. In future work, the proposed method will be leveraged for automatic diagnosis and prognosis purposes. Further investigations are also needed to evaluate the algorithm convergence with respect to the number of individual samples.

Acknowledgments. This work has been partly funded by the European Research Council with grant 678304, the European Union's Horizon 2020 research and innovation program with grant 666992, and the program Investissements d'avenir ANR-10-IAIHU-06.

References

[1] S. Allassonnière, S. Durrleman, and E. Kuhn. Bayesian mixed effect atlas estimation with a diffeomorphic deformation model. SIAM Journal on Imaging Sciences, 8:1367–1395, 2015.
[2] S. Allassonnière, E. Kuhn, and A. Trouvé. Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study. Bernoulli, 16(3):641–678, 2010.
[3] Y. F. Atchade. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodology and Computing in Applied Probability, 8(2):235–254, 2006.
[4] M. Banerjee, R. Chakraborty, E. Ofori, M. S. Okun, D. E. Vaillancourt, and B. C. Vemuri. A nonlinear regression technique for manifold valued data with applications to medical image analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4424–4432, 2016.
[5] F. Beg, M. Miller, A. Trouvé, and L. Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV, 2005.
[6] R. Chakraborty, M. Banerjee, and B. C. Vemuri. Statistics on the space of trajectories for longitudinal data analysis. In Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on, pages 999–1002. IEEE, 2017.
[7] N. Charon and A. Trouvé. The varifold representation of nonoriented shapes for diffeomorphic registration. SIAM Journal on Imaging Sciences, 6(4):2547–2580, 2013.
[8] G. E. Christensen, R. D. Rabbitt, and M. I. Miller. Deformable templates using large deformation kinematics. IEEE Transactions on Image Processing, 5(10):1435–1447, 1996.
[9] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[10] B. Delyon, M. Lavielle, and E. Moulines. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics, pages 94–128, 1999.
[11] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[12] S. Durrleman, S. Allassonnière, and S. Joshi. Sparse adaptive parameterization of variability in image ensembles. IJCV, 101(1):161–183, 2013.
[13] S. Durrleman, X. Pennec, A. Trouvé, J. Braga, G. Gerig, and N. Ayache. Toward a comprehensive framework for the spatiotemporal statistical analysis of longitudinal shape data. International Journal of Computer Vision, 103(1):22–59, May 2013.
[14] S. Durrleman, X. Pennec, A. Trouvé, G. Gerig, and N. Ayache. Spatiotemporal atlas estimation for developmental delay detection in longitudinal datasets. In Med Image Comput Comput Assist Interv, pages 297–304. Springer, 2009.
[15] J. Fishbaugh, M. Prastawa, G. Gerig, and S. Durrleman. Geodesic regression of image and shape data for improved modeling of 4D trajectories. In ISBI 2014 - 11th International Symposium on Biomedical Imaging, pages 385–388, Apr. 2014.
[16] T. Fletcher. Geodesic regression and the theory of least squares on Riemannian manifolds. IJCV, 105(2):171–185, 2013.
[17] P. Gori, O. Colliot, L. Marrakchi-Kacem, Y. Worbe, C. Poupon, A. Hartmann, N. Ayache, and S. Durrleman. A Bayesian framework for joint morphometry of surface and curve meshes in multi-object complexes. Medical Image Analysis, 35:458–474, Jan. 2017.
[18] J. Hinkle, P. Muralidharan, P. T. Fletcher, and S. Joshi. Polynomial Regression on Riemannian Manifolds, pages 1–14. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[19] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.
[20] S. C. Joshi and M. I. Miller. Landmark matching via large deformation diffeomorphisms. IEEE Transactions on Image Processing, 9(8):1357–1370, 2000.
[21] D. G. Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.
[22] H. J. Kim, N. Adluru, M. D. Collins, M. K. Chung, B. B. Bendlin, S. C. Johnson, R. J. Davidson, and V. Singh. Multivariate general linear models (MGLM) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2705–2712, 2014.
[23] H. J. Kim, N. Adluru, H. Suri, B. C. Vemuri, S. C. Johnson, and V. Singh. Riemannian nonlinear mixed effects models: Analyzing longitudinal deformations in neuroimaging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[24] I. Koval, J.-B. Schiratti, A. Routier, M. Bacci, O. Colliot, S. Allassonnière, S. Durrleman, A. D. N. Initiative, et al. Statistical learning of spatiotemporal patterns from longitudinal manifold-valued networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 451–459. Springer, 2017.
[25] M. Lavielle. Mixed Effects Models for the Population Approach: Models, Tasks, Methods and Tools. CRC Press, 2014.
[26] M. Lorenzi, N. Ayache, G. Frisoni, and X. Pennec. 4D registration of serial brain MR images: a robust measure of changes applied to Alzheimer's disease. Spatio Temporal Image Analysis Workshop (STIA), MICCAI, 2010.
[27] M. Lorenzi, N. Ayache, and X. Pennec. Schild's ladder for the parallel transport of deformations in time series of images. Pages 463–474. Springer, 2011.
[28] M. Louis, A. Bône, B. Charlier, S. Durrleman, A. D. N. Initiative, et al. Parallel transport in shape analysis: a scalable numerical scheme. In International Conference on Geometric Science of Information, pages 29–37. Springer, 2017.
[29] M. I. Miller, A. Trouvé, and L. Younes. Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 24(2):209–228, 2006.
[30] M. I. Miller and L. Younes. Group actions, homeomorphisms, and matching: A general framework. International Journal of Computer Vision, 41(1-2):61–84, 2001.
[31] P. Muralidharan and P. T. Fletcher. Sasaki metrics for analysis of longitudinal data on manifolds. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1027–1034. IEEE, 2012.
[32] M. Niethammer, Y. Huang, and F.-X. Vialard. Geodesic regression for image time-series. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 655–662. Springer, 2011.
[33] X. Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, 2006.
[34] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, 2006.
[35] M. J. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964.
[36] G. O. Roberts, A. Gelman, W. R. Gilks, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120, 1997.
[37] J.-B. Schiratti, S. Allassonnière, O. Colliot, and S. Durrleman. Learning spatiotemporal trajectories from manifold-valued longitudinal data. In NIPS 28, pages 2404–2412, 2015.
[38] N. Singh, J. Hinkle, S. Joshi, and P. T. Fletcher. Hierarchical geodesic models in diffeomorphisms. IJCV, 117(1):70–92, 2016.
[39] Y. Stern. Cognitive reserve and Alzheimer disease. Alzheimer Disease & Associated Disorders, 20(2):112–117, 2006.
[40] J. Su, S. Kurtek, E. Klassen, A. Srivastava, et al. Statistical analysis of trajectories on Riemannian manifolds: bird migration, hurricane tracking and video surveillance. The Annals of Applied Statistics, 8(1):530–552, 2014.
[41] J. Su, A. Srivastava, F. D. de Souza, and S. Sarkar. Rate-invariant analysis of trajectories on Riemannian manifolds with application in visual speech recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 620–627, 2014.
[42] M. Vaillant and J. Glaunès. Surface matching via currents. In Information Processing in Medical Imaging, pages 1–5. Springer, 2005.
[43] L. Younes. Shapes and Diffeomorphisms. Applied Mathematical Sciences. Springer Berlin Heidelberg, 2010.
[44] M. Zhang, N. Singh, and P. T. Fletcher. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In IPMI, volume 23, pages 37–48, 2013.

Appendix: supplementary material

We introduce the onset age individual random variable $t_i = \bar t_0 + \tau_i \sim \mathcal{N}(\bar t_0, \sigma_\tau^2)$ instead of the time-shift $\tau_i$. The obtained hierarchical model is equivalent to the one presented in Section 3, with unchanged parameters $\theta = (\bar y_0, \bar c_0, \bar m_0, \bar A, \bar t_0, \sigma_\tau^2, \sigma_\xi^2, \sigma_\epsilon^2)$ and equivalent random effects $z = (z_{\mathrm{pop}}, z_1, \ldots, z_N)$, where $z_{\mathrm{pop}} = (y_0, c_0, m_0, A)$ and, for all $i \in [\![1, N]\!]$, $z_i = (t_i, \xi_i, s_i)$. The complete log-likelihood writes:

$$\log q(y, z, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} \log q(y_{i,j} \,|\, z, \theta) + \log q(z_{\mathrm{pop}} \,|\, \theta) + \sum_{i=1}^{N} \log q(z_i \,|\, \theta) + \log q(\theta) \qquad (6)$$

where the densities $q(y_{i,j}\,|\,z,\theta)$, $q(z_{\mathrm{pop}}\,|\,\theta)$, $q(z_i\,|\,\theta)$ and $q(\theta)$ are given, up to an additive constant, by:

$$-2\log q(y_{i,j}\,|\,z,\theta) + \mathrm{cst} = \Lambda \log \sigma_\epsilon^2 + \big\| y_{i,j} - \eta_{c_0,m_0,t_0,\,\psi_i(t_{i,j})}(w_i) \star y_0 \big\|^2 / \sigma_\epsilon^2 \qquad (7)$$

$$-2\log q(z_{\mathrm{pop}}\,|\,\theta) + \mathrm{cst} = |y_0|\log\sigma_y^2 + \|y_0-\bar y_0\|^2/\sigma_y^2 + |c_0|\log\sigma_c^2 + \|c_0-\bar c_0\|^2/\sigma_c^2 \\ + |m_0|\log\sigma_m^2 + \|m_0-\bar m_0\|^2/\sigma_m^2 + |A|\log\sigma_A^2 + \|A-\bar A\|^2/\sigma_A^2 \qquad (8)$$

$$-2\log q(z_i\,|\,\theta) + \mathrm{cst} = \log\sigma_\tau^2 + (t_i-\bar t_0)^2/\sigma_\tau^2 + \log\sigma_\xi^2 + \xi_i^2/\sigma_\xi^2 + \|s_i\|^2 \qquad (9)$$

$$-2\log q(\theta) + \mathrm{cst} = \|\bar y_0-\bar{\bar y}_0\|^2/\varsigma_y^2 + \|\bar c_0-\bar{\bar c}_0\|^2/\varsigma_c^2 + \|\bar m_0-\bar{\bar m}_0\|^2/\varsigma_m^2 + \|\bar A-\bar{\bar A}\|^2/\varsigma_A^2 \\ + (\bar t_0-\bar{\bar t}_0)^2/\varsigma_t^2 + m_\tau\log\sigma_\tau^2 + m_\tau\,\sigma_{\tau,0}^2/\sigma_\tau^2 + m_\xi\log\sigma_\xi^2 + m_\xi\,\sigma_{\xi,0}^2/\sigma_\xi^2 + m_\epsilon\log\sigma_\epsilon^2 + m_\epsilon\,\sigma_{\epsilon,0}^2/\sigma_\epsilon^2 \qquad (10)$$

noting $\Lambda$ the dimension of the space in which the residual $\|y_{i,j} - \eta_{c_0,m_0,t_0,\,\psi_i(t_{i,j})}(w_i)\star y_0\|^2$ is computed, and $|y_0|$, $|c_0|$, $|m_0|$ and $|A|$ the total dimensions of $y_0$, $c_0$, $m_0$ and $A$ respectively. We chose either the current [42] or the varifold [7] norm for the residuals.
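Equation (9) translates directly into code, which can serve as a sanity check when implementing the model. A minimal sketch (the numerical values used in the comments and tests are hypothetical):

```python
import math

def neg2_log_q_zi(t_i, xi_i, s_i, t0_bar, sigma_tau2, sigma_xi2):
    """-2 log q(z_i | theta), up to the additive constant, as in Eq. (9):
    Gaussian terms for the onset age t_i and the log-acceleration xi_i,
    and a standard-normal term for the sources s_i."""
    return (math.log(sigma_tau2) + (t_i - t0_bar) ** 2 / sigma_tau2
            + math.log(sigma_xi2) + xi_i ** 2 / sigma_xi2
            + sum(s ** 2 for s in s_i))

# At the mode (t_i = t0_bar, xi_i = 0, s_i = 0) with unit variances,
# the expression reduces to zero.
```

Each term penalizes the deviation of one individual random effect from its population-level mean, which is what makes the subsequent M-step updates closed-form.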

Noticing the identity $\eta_{c_0,m_0,t_0,\,\psi_i(t_{i,j})} = \eta_{c_0,m_0,0,\,\psi_i(t_{i,j})-t_0}$, the complete log-likelihood can be decomposed into $\log q(y,z,\theta) = \langle S(y,z), \phi(\theta)\rangle - \psi(\theta)$, i.e. the proposed mixed-effects model belongs to the curved exponential family. In this setting, the MCMC-SAEM algorithm presented in Section 4 has a proven convergence.

Exhibiting the sufficient statistics $S_1 = y_0$, $S_2 = c_0$, $S_3 = m_0$, $S_4 = A$, $S_5 = \sum_i t_i$, $S_6 = \sum_i t_i^2$, $S_7 = \sum_i \xi_i^2$ and $S_8 = \sum_i \sum_j \|y_{i,j} - \eta_{c_0,m_0,t_0,\,\psi_i(t_{i,j})}(w_i)\star y_0\|^2$ (see Section 4.5), the update of the model parameters $\theta \leftarrow \theta^\star$ in the M step of the MCMC-SAEM algorithm can be derived in closed form:

$$\bar y_0^{\,\star} = \big[\varsigma_y^2\,S_1 + \sigma_y^2\,\bar{\bar y}_0\big] / \big[\varsigma_y^2 + \sigma_y^2\big] \qquad \bar t_0^{\,\star} = \big[\varsigma_t^2\,S_5 + \sigma_\tau^{2\,\star}\,\bar{\bar t}_0\big] / \big[N\varsigma_t^2 + \sigma_\tau^{2\,\star}\big] \qquad (11)$$

$$\bar c_0^{\,\star} = \big[\varsigma_c^2\,S_2 + \sigma_c^2\,\bar{\bar c}_0\big] / \big[\varsigma_c^2 + \sigma_c^2\big] \qquad \sigma_\tau^{2\,\star} = \big[S_6 - 2\bar t_0^{\,\star}S_5 + N(\bar t_0^{\,\star})^2 + m_\tau\,\sigma_{\tau,0}^2\big] / \big[N + m_\tau\big] \qquad (12)$$

$$\bar m_0^{\,\star} = \big[\varsigma_m^2\,S_3 + \sigma_m^2\,\bar{\bar m}_0\big] / \big[\varsigma_m^2 + \sigma_m^2\big] \qquad \sigma_\xi^{2\,\star} = \big[S_7 + m_\xi\,\sigma_{\xi,0}^2\big] / \big[N + m_\xi\big] \qquad (13)$$

$$\bar A^{\,\star} = \big[\varsigma_A^2\,S_4 + \sigma_A^2\,\bar{\bar A}\big] / \big[\varsigma_A^2 + \sigma_A^2\big] \qquad \sigma_\epsilon^{2\,\star} = \big[S_8 + m_\epsilon\,\sigma_{\epsilon,0}^2\big] / \big[\Lambda N \langle n_i\rangle_i + m_\epsilon\big] \qquad (14)$$

The intricate updates of the parameters $\bar t_0 \leftarrow \bar t_0^{\,\star}$ and $\sigma_\tau^2 \leftarrow \sigma_\tau^{2\,\star}$ can be solved by iterative replacement.

Similarly to Equation 6, the tempered complete log-likelihood writes:

$$\log q_T(y, z, \theta) = \sum_{i=1}^{N}\sum_{j=1}^{n_i} \log q_T(y_{i,j}\,|\,z,\theta) + \log q_T(z_{\mathrm{pop}}\,|\,\theta) + \sum_{i=1}^{N}\log q(z_i\,|\,\theta) + \log q(\theta) \qquad (15)$$

with:

$$-2\log q_T(y_{i,j}\,|\,z,\theta) + \mathrm{cst} = \Lambda\log(T\sigma_\epsilon^2) + \big\|y_{i,j}-\eta_{c_0,m_0,t_0,\,\psi_i(t_{i,j})}(w_i)\star y_0\big\|^2/(T\sigma_\epsilon^2) \qquad (16)$$

$$-2\log q_T(z_{\mathrm{pop}}\,|\,\theta) + \mathrm{cst} = |y_0|\log(T\sigma_y^2) + \|y_0-\bar y_0\|^2/(T\sigma_y^2) + |c_0|\log(T\sigma_c^2) + \|c_0-\bar c_0\|^2/(T\sigma_c^2) \\ + |m_0|\log(T\sigma_m^2) + \|m_0-\bar m_0\|^2/(T\sigma_m^2) + |A|\log(T\sigma_A^2) + \|A-\bar A\|^2/(T\sigma_A^2) \qquad (17)$$

Tempering can therefore be understood as an artificial increase of the variances $\sigma_\epsilon^2$, $\sigma_y^2$, $\sigma_c^2$, $\sigma_m^2$ and $\sigma_A^2$ when computing the associated acceptance ratios in the S-MCMC step of the algorithm. This intuition is well explained in [25].
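In the Metropolis-Hastings steps, for fixed parameters θ and temperature T, the log-variance terms are identical for the current and proposed states and cancel in the ratio, so tempering amounts to dividing the log-likelihood difference by T. A minimal sketch, assuming a symmetric random-walk proposal:

```python
import math

def mh_accept(log_lik_current, log_lik_proposal, temperature):
    """Metropolis acceptance probability for a symmetric proposal: the
    log-likelihood difference is divided by the temperature T >= 1,
    which flattens the target and eases exploration early on."""
    log_alpha = (log_lik_proposal - log_lik_current) / temperature
    return math.exp(min(0.0, log_alpha))

# With T = 1 we recover the standard Metropolis ratio; a larger T makes
# the same unfavorable proposal more likely to be accepted.
```

As T is annealed back to 1 over the iterations, the sampler targets the original (untempered) conditional distribution again.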