Learning distributions of shape trajectories from longitudinal datasets:
a hierarchical model on a manifold of diffeomorphisms
Alexandre Bône, Olivier Colliot, Stanley Durrleman
for the Alzheimer's Disease Neuroimaging Initiative
Institut du Cerveau et de la Moelle épinière, Inserm, CNRS, Sorbonne Université, Paris, France
Inria, Aramis project-team, Paris, France
{alexandre.bone, olivier.colliot, stanley.durrleman}@icm-institute.org
Abstract
We propose a method to learn a distribution of shape trajectories from longitudinal data, i.e. the collection of individual objects repeatedly observed at multiple time-points. The method computes an average spatiotemporal trajectory of shape changes at the group level, and the individual variations of this trajectory both in terms of geometry and time dynamics. First, we formulate a non-linear mixed-effects statistical model as the combination of a generic statistical model for manifold-valued longitudinal data, a deformation model defining shape trajectories via the action of a finite-dimensional set of diffeomorphisms with a manifold structure, and an efficient numerical scheme to compute parallel transport on this manifold. Second, we introduce an MCMC-SAEM algorithm with a specific approach to shape sampling, an adaptive scheme for proposal variances, and a log-likelihood tempering strategy to estimate our model. Third, we validate our algorithm on 2D simulated data, and then estimate a scenario of alteration of the shape of the hippocampus, a 3D brain structure, during the course of Alzheimer's disease. The method shows for instance that hippocampal atrophy progresses more quickly in female subjects, and occurs earlier in APOE4 mutation carriers. We finally illustrate the potential of our method for classifying pathological trajectories versus normal ageing.
1. Introduction
1.1. Motivation
At the interface of geometry, statistics, and computer science, statistical shape analysis meets a growing number of applications in computer vision and medical image analysis. This research field has addressed two main statistical questions: atlas construction for cross-sectional shape datasets, and shape regression for shape time series. The former is the classical extension of a mean-variance analysis, which aims to estimate a mean shape and a covariance structure from observations of several individual instances of the same object or organ. The latter extends the concept of regression by estimating a spatiotemporal trajectory of shape changes from a series of observations of the same individual object at different time-points. The emergence of longitudinal shape datasets, which consist of collections of individual objects repeatedly observed at multiple time-points, has raised the need for a combined approach. One needs statistical methods to estimate normative spatiotemporal models from series of individual observations which differ in shape and dynamics of shape changes across individuals. Such a model should capture and disentangle the inter-subject variability in shape at each time-point and the temporal variability due to shifts in time or scalings in the pace of shape changes. Considering individual series as samples along a trajectory of shape changes, this approach amounts to estimating a spatiotemporal distribution of trajectories, and has potential applications in various fields, including silhouette tracking in videos, analysis of growth patterns in biology, or modelling disease progression in medicine.
1.2. Related work
The central difficulty in shape analysis is that shape spaces are either defined by invariance properties [21,40,41] or by the conservation of topological properties [5,8,13,20,34], and therefore intrinsically have the structure of infinite-dimensional Riemannian manifolds or Lie groups. Statistical Shape Models [9] are linear but require consistent point labelling across observations and offer no topology-preservation guarantees. A now usual approach is to use the action of a group of diffeomorphisms to define a metric on a shape space [29,43]. This approach has been used to compute a "Fréchet mean" together with a covariance matrix in the tangent space of the mean [1,16,17,33,44] from a cross-sectional dataset, and regression from time series of shape data [4,15,16,18,26,32]. In [12,14,31], these tools have been used to estimate an average trajectory of shape changes from a longitudinal dataset using the convenient assumption that the parameters encoding inter-individual variability are independent of time. The work in [27] introduced the idea of using parallel transport to translate the spatiotemporal patterns seen in one individual into the geometry of another one. The co-adjoint transport is used in [38] for the same purpose. Both estimate a group average trajectory from individual trajectories. The proposed models do not account for inter-individual variability in the time dynamics, which is of key importance in the absence of temporal markers of the progression to align the sequences. The same remarks apply to [6], which introduces a nice theoretical setting for spaces of trajectories, in the case of a fixed number of temporal observations across subjects. The need for temporal alignment in longitudinal data analysis is highlighted for instance in [13] with a diffeomorphism-based morphometry approach, or in [40,41] with quotient manifolds. In [24,37], a generative mixed-effects model for the statistical analysis of manifold-valued longitudinal data is introduced for the analysis of feature vectors. This model describes both the variability in the direction of the individual trajectories, by introducing the concept of "exp-parallelization" which relies on parallel transport, and the pace at which those trajectories are followed, using "time-warp" functions. Similar time-warps are used by the authors of [23] to refine their linear modeling approach [22].
1.3. Contributions
In this paper, we propose to extend the approach of [37] from low-dimensional feature vectors to shape data. Using an approach designed for manifold-valued data on shape spaces defined by the action of a group of diffeomorphisms raises several theoretical and computational difficulties. Notably needed are: a finite-dimensional set of diffeomorphisms with a Riemannian manifold structure; stable and efficient numerical schemes to compute Riemannian exponential and parallel transport operators on this manifold, as no closed-form expressions are available; accelerated algorithmic convergence to cope with a dimensionality hundreds of times larger. To this end, we propose here:
- to formulate a generative non-linear mixed-effects model for a finite-dimensional set of diffeomorphisms defined by control points, to show that this set is stable under exp-parallelization, and to use an efficient numerical scheme for parallel transport;
- to introduce an adapted MCMC-SAEM algorithm with an adaptive block sampling of the latent variables, a specific sampling strategy for shape parameters based on random local displacements of the shape contours, and a vanishing tempering of the target log-likelihood;
- to validate our method on 2D simulated data and a large dataset of 3D brain structures in the context of Alzheimer's disease progression, and to illustrate the potential of our method for classifying spatiotemporal patterns, e.g. to discriminate pathological versus normal trajectories of ageing.
Altogether, the proposed method estimates an average spatiotemporal trajectory of shape changes from a longitudinal dataset, together with distributions of space-shifts, time-shifts and acceleration factors describing the variability in shape, onset and pace of shape changes respectively.
2. Deformation model
2.1. The manifold of diffeomorphisms $\mathcal{D}_{c_0}$
We follow the approach taken in [12], built on the principles of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework [30]. We note $d \in \{2, 3\}$ the dimension of the ambient space. We choose $k$ a Gaussian kernel of width $\sigma \in \mathbb{R}_+$ and $c$ a set of $n_{cp} \in \mathbb{N}$ "control" points $c = (c_1, \dots, c_{n_{cp}})$ of the ambient space $\mathbb{R}^d$. For any set of "momentum" vectors $m = (m_1, \dots, m_{n_{cp}})$, we define the "velocity" vector field $v : \mathbb{R}^d \to \mathbb{R}^d$ as the convolution $v(x) = \sum_{k=1}^{n_{cp}} k(c_k, x)\, m_k$ for any point $x$ of the ambient space $\mathbb{R}^d$. From initial sets of $n_{cp}$ control points $c_0$ and corresponding momenta $m_0$, we obtain the trajectories $t \to (c_t, m_t)$ by integrating the Hamiltonian equations:
$$\dot{c} = K_c\, m, \qquad \dot{m} = -\frac{1}{2} \nabla_c \left( m^T K_c\, m \right) \tag{1}$$
where $K_c$ is the $n_{cp} \times n_{cp}$ "kernel" matrix $[k(c_i, c_j)]_{ij}$, $\nabla_{(.)}$ the gradient operator, and $(.)^T$ the matrix transposition.
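As a concrete illustration, the Hamiltonian system (1) can be integrated with a few lines of NumPy. The sketch below is ours, not the paper's implementation: it uses an explicit Euler scheme for brevity (practical implementations such as Deformetrica use higher-order integrators), and all function names are illustrative.

```python
import numpy as np

def gaussian_kernel(ca, cb, sigma):
    # Kernel matrix K_ij = k(ca_i, cb_j) = exp(-||ca_i - cb_j||^2 / sigma^2).
    d2 = np.sum((ca[:, None, :] - cb[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma ** 2)

def hamiltonian_rhs(c, m, sigma):
    # Right-hand side of equations (1): c_dot = K_c m, m_dot = -1/2 grad_c(m^T K_c m).
    K = gaussian_kernel(c, c, sigma)
    c_dot = K @ m
    mm = m @ m.T                              # pairwise dot products m_i . m_j
    diff = c[:, None, :] - c[None, :, :]      # differences c_i - c_j
    grad = -(4.0 / sigma ** 2) * np.sum((mm * K)[:, :, None] * diff, axis=1)
    return c_dot, -0.5 * grad

def shoot(c0, m0, sigma, n_steps=50, t1=1.0):
    # Explicit Euler integration of control points and momenta from t=0 to t=t1.
    c, m = c0.astype(float).copy(), m0.astype(float).copy()
    h = t1 / n_steps
    for _ in range(n_steps):
        c_dot, m_dot = hamiltonian_rhs(c, m, sigma)
        c, m = c + h * c_dot, m + h * m_dot
    return c, m
```

With a single control point the kernel term is constant, so the point simply translates along its momentum, which provides a simple sanity check of the scheme.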
Those trajectories prescribe the trajectory $t \to v_t$ in the space of velocity fields. The integration along such a path from the identity generates a flow of diffeomorphisms $t \to \phi_t$ of the ambient space [5]. We can now define:
$$\mathcal{D}_{c_0} = \big\{ \phi_1 \;;\; \partial_t \phi_t = v_t \circ \phi_t,\; \phi_0 = \mathrm{Id},\; v_t = \mathrm{Conv}(c_t, m_t),\; (\dot{c}_t, \dot{m}_t) = \mathrm{Ham}(c_t, m_t),\; m_0 \in \mathbb{R}^{d \cdot n_{cp}} \big\} \tag{2}$$
where $\mathrm{Conv}(.,.)$ and $\mathrm{Ham}(.,.)$ are compact notations for the convolution operator and the Hamiltonian equations (1) respectively. $\mathcal{D}_{c_0}$ has the structure of a manifold of finite dimension, where the metric at the tangent space $T_{\mathrm{Id}}\mathcal{D}_{c_0}$ is given by $K_{c_0}^{-1}$. It is shown in [29] that the proposed paths $t \to \phi_t$ are the paths of minimum deformation energy, and are therefore the geodesics of $\mathcal{D}_{c_0}$. These geodesics are fully determined by an initial set of momenta $m_0$.
Then, any point $x \in \mathbb{R}^d$ of the ambient space follows the trajectory $t \to \phi_t(x)$. Such trajectories are used to deform any point cloud or mesh embedded in the ambient space, defining a diffeomorphic deformation of the shape. Formally, this operation defines a shape space $\mathcal{S}_{c_0, y_0}$ as the orbit of a reference shape $y_0$ under the action of $\mathcal{D}_{c_0}$. The manifold of diffeomorphisms $\mathcal{D}_{c_0}$ is used as a proxy to manipulate shapes: all computations are performed in $\mathcal{D}_{c_0}$, or more concretely on a finite set of control points and momentum vectors, and applied back to the template shape $y_0$ to obtain a result in $\mathcal{S}_{c_0, y_0}$.
2.2. Riemannian exponentials on $\mathcal{D}_{c_0}$
For any set of control points $c_0$, we define the exponential operator $\mathrm{Exp}_{c_0} : m_0 \in \mathbb{R}^{d \cdot n_{cp}} \to \phi_1 \in \mathcal{D}_{c_0}$. Note that $\mathcal{D}_{c_0} = \{\mathrm{Exp}_{c_0}(m_0) \;|\; m_0 \in \mathbb{R}^{d \cdot n_{cp}}\}$.
The following proposition ensures the stability of $\mathcal{D}_{c_0}$ under the exponential operator, i.e. that the control points obtained by applying successive compatible exponential maps with arbitrary momenta are reachable by a unique integration of the Hamiltonian equations from $c_0$:
Proposition. Let $c_0$ be a set of control points. $\forall \phi_1 \in \mathcal{D}_{c_0}$, $\forall w$ momenta, we have $\mathrm{Exp}_{\phi_1(c_0)}(w) \in \mathcal{D}_{c_0}$.
Proof. We note $\phi'_1 = \mathrm{Exp}_{\phi_1(c_0)}(w) \in \mathcal{D}_{\phi_1(c_0)}$ and $c'_1 = \phi'_1 \circ \phi_1(c_0)$. By construction, there exist two paths $t \to \phi_t$ in $\mathcal{D}_{c_0}$ and $s \to \phi'_s$ in $\mathcal{D}_{\phi_1(c_0)}$ such that $\phi'_1 \circ \phi_1(c_0) = c'_1$. Therefore there exists a diffeomorphic path $u \to \chi_u$ such that $\chi_1(c_0) = c'_1$. Concluding with [29], the path $u \to \chi_u$ of minimum energy exists, and is written $u \to \mathrm{Exp}_{c_0}(u \cdot m'_0)$ for some $m'_0 \in \mathbb{R}^{d \cdot n_{cp}}$.
As a physical interpretation might be given to the integration time $t$ when building a statistical model, we introduce the notation $\mathrm{Exp}_{c_0, t_0, t} : m_0 \in \mathbb{R}^{d \cdot n_{cp}} \to \phi_t \in \mathcal{D}_{c_0}$, where $\phi_t$ is obtained by integrating from $t = t_0$. Note that $\mathrm{Exp}_{c_0} = \mathrm{Exp}_{c_0, 0, 1}$.
On the considered manifold $\mathcal{D}_{c_0}$, computing exponentials, i.e. geodesics, therefore consists in integrating ordinary differential equations. This operation is direct and computationally tractable. The top line of Figure 1 plots a geodesic $\gamma : t \to \phi_t$ applied to the top-left shape $y_0$.
2.3. Parallel transport and exp-parallels on $\mathcal{D}_{c_0}$
In [37] is introduced exp-parallelism, a generalization of Euclidean parallelism to geodesically-complete manifolds. It relies on the Riemannian parallel transport operator, which we propose to compute using the fanning scheme [28]. This numerical scheme only requires the exponential operator to approximate the parallel transport along a geodesic, with proved convergence.
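To make the idea behind the fanning scheme concrete, here is a heavily simplified, manifold-agnostic sketch (function names ours): it only calls a generic exponential routine `exp(x, v, t)`, shooting at each step a geodesic with a perturbed initial velocity and reading the transported vector off the resulting finite difference (a Jacobi-field estimate). The velocity update below is a crude Euclidean-style approximation for illustration, not the scheme's exact rule.

```python
import numpy as np

def fanning_transport(exp, x0, v0, w0, n_steps=10, eps=1e-3):
    # Approximate parallel transport of w0 along the geodesic t -> exp(x0, v0, t),
    # advancing one small step h = 1/n_steps at a time.
    h = 1.0 / n_steps
    x, v, w = np.asarray(x0, float), np.asarray(v0, float), np.asarray(w0, float)
    for _ in range(n_steps):
        x_next = exp(x, v, h)              # advance the main geodesic
        x_pert = exp(x, v + eps * w, h)    # geodesic shot with perturbed initial velocity
        w = (x_pert - x_next) / (eps * h)  # Jacobi-field estimate of the transported vector
        v = (x_next - x) / h               # crude velocity update (Euclidean-style)
        x = x_next
    return w
```

In Euclidean space, where parallel transport leaves vectors unchanged, this sketch is exact; on a curved manifold it is a first-order approximation refined by taking smaller steps.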
We note $P_{c_0, m_0, t_0, t} : \mathbb{R}^{d \cdot n_{cp}} \to \mathbb{R}^{d \cdot n_{cp}}$ the parallel transport operator, which transports any momenta $w$ along the geodesic $\gamma : t \to \phi_t = \mathrm{Exp}_{c_0, t_0, t}(m_0)$ from $t_0$ to $t$. For any $c_0$, $m_0$, $w$ and $t_0$, we can now define the curve:
$$t \to \eta_{c_0, m_0, t_0, t}(w) = \mathrm{Exp}_{\gamma(t)(c_0)}\big( P_{c_0, m_0, t_0, t}(w) \big). \tag{3}$$
This curve, which we will call exp-parallel to $\gamma$, is well-defined on the manifold $\mathcal{D}_{c_0}$, according to the proposition of Section 2.2. Figure 1 illustrates the whole procedure. From the top-left shape, the computational scheme is as follows: integrate the Hamiltonian equations to obtain the control points $c(t)$ (red crosses) and momenta $m(t)$ (bold blue arrows); compute the associated velocity fields $v_t$ (light blue arrows); compute the flow $\gamma : t \to \phi_t$ (shape progression); transport the momenta $w$ along $\gamma$ (red arrows); compute the exp-parallel curve by repeating the first three steps along the transported momenta.

Figure 1: Samples from a geodesic (top) and an exp-parallelized curve (bottom) on $\mathcal{S}_{c_0, y_0}$. Parameters encoding the geodesic are the blue momenta attached to control points, plotted together with the associated velocity fields. Momenta in red are parallel transported along the geodesic and define a deformation mapping each frame of the geodesic to a frame of the exp-parallelized curve. Exp-parallelization allows the transport of a shape trajectory from one geometry to another.
3. Statistical model
For each individual $1 \leq i \leq N$ are available $n_i$ longitudinal shape measurements $y_i = (y_{i,j})_{1 \leq j \leq n_i}$ and associated times $(t_{i,j})_{1 \leq j \leq n_i}$.
3.1. The generative statistical model
Let $c_0$ be a set of control points and $m_0$ associated momenta. We call $\gamma$ the geodesic $t \to \mathrm{Exp}_{c_0, t_0, t}(m_0)$ of $\mathcal{D}_{c_0}$. Let $y_0$ be a template mesh shape embedded in the ambient space. For a subject $i$, the observed longitudinal shape measurements $y_{i,1}, \dots, y_{i,n_i}$ are modeled as sample points at times $\psi_i(t_{i,j})$ of an exp-parallel curve $t \to \eta_{c_0, m_0, t_0, t}(w_i)$ to this geodesic $\gamma$, plus additional noise $\epsilon_{i,j}$:
$$y_{i,j} = \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \cdot y_0 + \epsilon_{i,j}. \tag{4}$$
The time-warp function $\psi_i$ and the space-shift momenta $w_i$ respectively encode the individual time and space variability. The time-warp is defined as an affine reparametrization of the reference time $t$: $\psi_i(t) = \alpha_i (t - t_0 - \tau_i) + t_0$, where the individual time-shift $\tau_i \in \mathbb{R}$ allows an inter-individual variability in the stage of evolution, and the individual acceleration factor $\alpha_i \in \mathbb{R}_+$ a variability in the pace of evolution. For convenience, we write $\alpha_i = \exp(\xi_i)$. In the spirit of Independent Component Analysis [19], the space-shift momenta $w_i$ are modeled as the linear combination of $n_s$ sources, gathered in the $n_{cp} \times n_s$ matrix $A$: $w_i = A_{m_0^\perp} s_i$. Before computing this superposition, each column $cl(A)$ of $A$ has been projected onto the hyperplane $m_0^\perp$ for the metric $K_{c_0}$, ensuring the orthogonality between $m_0$ and $w_i$. As argued in [37], this orthogonality is fundamental for the identifiability of the model. Without this constraint, the projection of the space-shifts $(w_i)_i$ on $m_0$ could be confounded with the acceleration factors $(\alpha_i)_i$.
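For concreteness, the time-warp and the construction of the space-shifts can be sketched in a few lines of NumPy. This is our own hedged illustration (function names ours; arrays are flattened so that the metric is a single matrix): the projection enforces the $K_{c_0}$-orthogonality constraint described above.

```python
import numpy as np

def time_warp(t, t0, tau_i, xi_i):
    # psi_i(t) = alpha_i * (t - t0 - tau_i) + t0, with alpha_i = exp(xi_i).
    return np.exp(xi_i) * (t - t0 - tau_i) + t0

def project_columns(A, m0, K):
    # Project each column of A onto the hyperplane K-orthogonal to m0,
    # i.e. remove the component <A_l, m0>_K / <m0, m0>_K * m0 from each column.
    m0_norm2 = m0 @ K @ m0
    coef = (m0 @ K @ A) / m0_norm2
    return A - np.outer(m0, coef)

def space_shift(A, m0, K, s_i):
    # w_i = A_{m0-perp} s_i: K-orthogonal to m0 by construction.
    return project_columns(A, m0, K) @ s_i
```

For instance, a subject with time-shift $\tau_i = 2$ and acceleration $\alpha_i = 1$ observed at age 75 is mapped two years earlier on the reference timeline.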
3.2. Mixed-effects formulation
We use either the current [42] or varifold [7] noise model for the residuals $\epsilon_{i,j}$, allowing our method to work with input meshes without any point correspondence. In this setting, we note $\epsilon_{i,j} \sim_{iid} \mathcal{N}(0, \sigma_\epsilon^2)$. The other previously introduced variables are modeled as random effects $z$, with: $y_0 \sim \mathcal{N}(\bar{y}_0, \sigma_y^2)$, $c_0 \sim \mathcal{N}(\bar{c}_0, \sigma_c^2)$, $m_0 \sim \mathcal{N}(\bar{m}_0, \sigma_m^2)$, $A \sim \mathcal{N}(\bar{A}, \sigma_A^2)$, $\xi_i \sim_{iid} \mathcal{N}(0, \sigma_\xi^2)$, $\tau_i \sim_{iid} \mathcal{N}(0, \sigma_\tau^2)$, $s_i \sim_{iid} \mathcal{N}(0, 1)$.
We define $\theta = (\bar{y}_0, \bar{c}_0, \bar{m}_0, \bar{A}, t_0, \sigma_\xi^2, \sigma_\tau^2, \sigma_\epsilon^2)$ the fixed effects, i.e. the parameters of the model. The remaining variance parameters $\sigma_y^2$, $\sigma_c^2$, $\sigma_m^2$ and $\sigma_A^2$ can be chosen arbitrarily small. Standard conjugate distributions are chosen as Bayesian priors on the model parameters: normal priors $\bar{y}_0 \sim \mathcal{N}(\bar{y}_0^{\,0}, \varsigma_y^2)$, $\bar{c}_0 \sim \mathcal{N}(\bar{c}_0^{\,0}, \varsigma_c^2)$, $\bar{m}_0 \sim \mathcal{N}(\bar{m}_0^{\,0}, \varsigma_m^2)$, $\bar{A} \sim \mathcal{N}(\bar{A}^{\,0}, \varsigma_A^2)$, $t_0 \sim \mathcal{N}(\bar{t}_0, \varsigma_t^2)$, and inverse-gamma priors $\sigma_\xi^2 \sim \mathcal{IG}(m_\xi, \sigma_{\xi,0}^2)$, $\sigma_\tau^2 \sim \mathcal{IG}(m_\tau, \sigma_{\tau,0}^2)$, $\sigma_\epsilon^2 \sim \mathcal{IG}(m_\epsilon, \sigma_{\epsilon,0}^2)$ on the variance parameters. Those priors ensure the existence of the maximum a posteriori (MAP) estimator. In practice, they regularize and guide the estimation procedure.
The proposed model belongs to the curved exponential family (see supplementary material, which gives the complete log-likelihood). In this setting, the algorithm introduced in the following section has a proved convergence.
We have defined a distribution of trajectories that could be noted $t \to y(t) = f_{\theta,t}(z)$, where $z$ is a random variable following a normal distribution. We call $t \to f_{\theta,t}(\mathbb{E}[z])$ the average trajectory, which may not be equal to the expected trajectory $t \to \mathbb{E}[f_{\theta,t}(z)]$ in the general non-linear case.
4. Estimation
4.1. The MCMC-SAEM algorithm
The Expectation-Maximization (EM) algorithm [11] estimates the parameters of a mixed-effects model with latent variables, here the random effects $z$. It alternates between an expectation (E) step and a maximization (M) one. The E step is intractable in our case, due to the non-linearity of the model. In [10] is introduced and proved a stochastic approximation of the EM algorithm, where the E step is replaced by a simulation (S) step followed by an approximation (A) one. The S step requires sampling from $q(z | y, \theta_k)$, which is also intractable in our case. In the case of curved exponential models, the authors in [2] show that the convergence holds if the S step is replaced by a single transition of an ergodic Monte Carlo Markov chain (MCMC) whose stationary distribution is $q(z | y, \theta_k)$. This global algorithm is called the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM), and is exploited in this paper to compute the MAP estimate of the model parameters $\theta_{MAP} = \arg\max_\theta \int q(y, z | \theta)\, dz$.
4.2. The adaptive block sampler
We use a block formulation of the Metropolis-Hastings within Gibbs (MHwG) sampler in the S-MCMC step. The latent variables $z$ are decomposed into $n_b$ natural blocks: $z = \{y_0, c_0, m_0, [cl(A)]_l, [\xi_i, \tau_i, s_i]_i\}$. Those blocks have highly heterogeneous sizes, e.g. a single scalar for $\xi_i$ versus possibly thousands for $y_0$, for which we introduce a specific proposal distribution in Section 4.3.
For all the other blocks, we use a symmetric random walk MHwG sampler with normal proposal distributions of the form $\mathcal{N}(0, \sigma_b^2\, \mathrm{Id})$ to perturb the current block state $z_b^k$. In order to achieve reasonable acceptance rates $ar$, i.e. around $ar^\star = 30\%$ [36], the proposal standard deviations $\sigma_b$ are dynamically adapted every $n_{adapt}$ iterations by measuring the mean acceptance rates $ar$ over the last $n_{detect}$ iterations, and applying, for any $b$:
$$\sigma_b \leftarrow \sigma_b \left( 1 + \lambda_k \left[ \frac{ar - ar^\star}{1 - ar^\star}\, \mathbb{1}_{ar \geq ar^\star} + \frac{ar - ar^\star}{ar^\star}\, \mathbb{1}_{ar < ar^\star} \right] \right) \tag{5}$$
with $\gamma > 0.5$. Inspired by [3], this dynamic adaptation is performed with a geometrically decreasing step-size $\lambda_k$, ensuring the vanishing property of the adaptation scheme and the convergence of the whole algorithm [2,3]. It proved very efficient in practice with $n_{adapt} = n_{detect} = 10$ and $\gamma = 0.51$, for any kind of data.
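The adaptation rule (5) simply inflates or deflates each proposal standard deviation multiplicatively, according to the signed distance between the measured and target acceptance rates, with a step-size that vanishes over iterations. A hedged one-step sketch (function name and default values ours):

```python
def adapt_proposal_std(sigma_b, ar, step, ar_target=0.30):
    # Equation (5): increase sigma_b when the measured acceptance rate ar is
    # above the target (proposals too timid), decrease it when below (too bold).
    if ar >= ar_target:
        factor = (ar - ar_target) / (1.0 - ar_target)
    else:
        factor = (ar - ar_target) / ar_target
    return sigma_b * (1.0 + step * factor)
```

With `step` vanishing in the iteration index (e.g. a decay exponent slightly above 0.5), the adjustment ranges between halving and multiplying by `1 + step`, and dies out as the chain matures.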
4.3. Efficient sampling of smooth template shapes
The first block $z_1 = y_0$, i.e. the coordinates of the points of the template mesh, is of very high dimension: naively sampling over each scalar value of its numerical description would result both in unnatural distorted shapes and a daunting computational burden.
We propose to take advantage of the geometrical nature of $y_0$ and leverage the framework introduced in Section 2, perturbing the current block state $z_1^k$ with a small displacement field $v$ obtained by the convolution of random momenta on a pre-selected set of control points. This proposal distribution can be seen as a normal distribution $\mathcal{N}(0, \sigma_1^2\, D^T D)$, where $\sigma_1^2$ is the variance associated with the random momenta, and $D$ the convolution matrix. In practice, dynamically adapting the proposal variance $\sigma_1^2$ and selecting regularly-spaced shape points as control points proved efficient.
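A hedged sketch of this smooth shape proposal (function names ours): random momenta drawn on a coarse set of control points are convolved with a Gaussian kernel, yielding a spatially correlated displacement of every vertex rather than independent per-coordinate noise.

```python
import numpy as np

def smooth_shape_proposal(vertices, control_points, sigma_kernel, sigma_1, rng):
    # Draw random momenta on the control points, then convolve them with a
    # Gaussian kernel to obtain a smooth displacement field over the mesh.
    momenta = rng.normal(scale=sigma_1, size=control_points.shape)
    d2 = np.sum((vertices[:, None, :] - control_points[None, :, :]) ** 2, axis=-1)
    D = np.exp(-d2 / sigma_kernel ** 2)   # convolution matrix (n_vertices x n_cp)
    return vertices + D @ momenta         # perturbed template candidate
```

Nearby vertices receive nearly identical displacements, so the proposed template stays smooth regardless of the mesh resolution.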
4.4. Tempering
The MCMC-SAEM is proved convergent toward a local maximum of $\theta \to \int q(y, z | \theta)\, dz$. In practice, the dimensionality of the energetic landscape $q(y, z | \theta)$ and the presence of multiple local maxima can make the estimation procedure sensitive to initialization. Inspired by the globally-convergent simulated annealing algorithm, [25] proposes to carry out the optimization procedure in a smoothed version $q_T(y, z | \theta)$ of the original landscape. The temperature parameter $T$ controls this smoothing, and should decrease from large values to 1, for which $q_T = q$.
We propose to introduce such a temperature parameter only for the population variables $z_{pop}$. The tempered version of the complete log-likelihood is given as supplementary material. In our experiments, the chosen temperature sequence $T_k$ remains constant at first, and then geometrically decreases to unity. Implementing this "tempering" feature had a dramatic impact on the required number of iterations before convergence, and greatly improved the robustness of the whole procedure. Note that the theoretical convergence properties of the MCMC-SAEM are not degraded, since the tempered phase of the algorithm can be seen as an initializing heuristic, and may actually be improved.
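The temperature sequence described above (a constant plateau followed by a geometric decrease to unity) can be sketched as follows; the constants here are illustrative placeholders, not the values used in our experiments:

```python
def temperature(k, T0=10.0, plateau=100, decay=0.99):
    # Constant at T0 for the first `plateau` iterations,
    # then geometric decrease of (T - 1) toward T = 1.
    if k < plateau:
        return T0
    return 1.0 + (T0 - 1.0) * decay ** (k - plateau)
```

Writing the decay on $T_k - 1$ rather than on $T_k$ itself guarantees that the sequence converges exactly to 1, where the tempered likelihood coincides with the original one.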
Algorithm 1: Estimation of the longitudinal deformations model with the MCMC-SAEM. Code publicly available at: www.deformetrica.org.
input : Longitudinal dataset of shapes $y = (y_{i,j})_{i,j}$. Initial parameters $\theta_0$ and latent variables $z_0$. Geometrically decreasing sequence of step-sizes $(\lambda_k)_k$.
output: Estimation of $\theta_{MAP}$. Samples $(z_s)_s$ approximately distributed following $q(z | y, \theta_{MAP})$.
Initialization: set $k = 0$ and $S_0 = S(z_0)$.
repeat
    Simulation: foreach block of latent variables $z_b$ do
        Draw a candidate $z_b^c \sim p_b(\,.\,| z_b^k)$.
        Set $z^c = (z_1^{k+1}, \dots, z_{b-1}^{k+1}, z_b^c, z_{b+1}^k, \dots, z_{n_b}^k)$.
        Compute the geodesic $\gamma : t \to \mathrm{Exp}_{c_0, t_0, t}(m_0)$.
        $\forall i$, compute $w_i = A_{m_0^\perp} s_i$.
        $\forall i$, compute $t \to P_{c_0, m_0, t_0, t}(w_i)$.
        $\forall i, j$, compute $\mathrm{Exp}_{\gamma(\psi_i(t_{i,j}))(c_0)}\big( P_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \big)$.
        Compute the acceptance ratio $\omega = \min\big[ 1, \, q(z^c | y, \theta_k) / q(z^k | y, \theta_k) \big]$.
        if $u \sim \mathcal{U}(0, 1) < \omega$ then $z_b^{k+1} \leftarrow z_b^c$ else $z_b^{k+1} \leftarrow z_b^k$.
    end
    Stochastic approx.: $S_{k+1} \leftarrow S_k + \lambda_k \big( S(z^{k+1}) - S_k \big)$.
    Maximization: $\theta_{k+1} \leftarrow \theta^\star(S_{k+1})$.
    Adaptation: if $\mathrm{remainder}(k+1, n_{adapt}) = 0$ then update the proposal variances $(\sigma_b^2)_b$ with equation (5).
    Increment: set $k \leftarrow k + 1$.
until convergence;
4.5. Sufficient statistics and maximization step
Exhibiting the sufficient statistics $S_1 = y_0$, $S_2 = c_0$, $S_3 = m_0$, $S_4 = A$, $S_5 = \sum_i (t_0 + \tau_i)$, $S_6 = \sum_i (t_0 + \tau_i)^2$, $S_7 = \sum_i \xi_i^2$ and $S_8 = \sum_i \sum_j \| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \cdot y_0 \|^2$, the update $\theta^\star$ of the model parameters in the M step of the MCMC-SAEM can be derived in closed form. The explicit expressions are given as supplementary material.
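As an illustration, and ignoring the Bayesian prior terms for readability (the full expressions, which include them, are in the supplementary material), the time-related parameters follow the usual Gaussian maximum-likelihood formulas in terms of $S_5$, $S_6$ and $S_7$; function name ours:

```python
def m_step_time_parameters(S5, S6, S7, N):
    # t0: empirical mean of the onset ages t0 + tau_i (from S5);
    # var_tau: their empirical variance (from S5 and S6);
    # var_xi: empirical variance of the centered log-accelerations xi_i (from S7).
    t0 = S5 / N
    var_tau = S6 / N - t0 ** 2
    var_xi = S7 / N
    return t0, var_tau, var_xi
```

For instance, three subjects with onset ages 68, 70 and 72 give $S_5 = 210$ and $S_6 = 14708$, hence $t_0 = 70$ and $\sigma_\tau^2 = 8/3$.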
5. Experiments
5.1. Validation with simulated data in $\mathbb{R}^2$
Convergence study. To validate the estimation procedure, we first generate synthetic data directly from the model without additional noise. Our choice of reference geodesic is plotted on the top line of the previously introduced Figure 1: the template $y_0$ is the top central shape, the chosen five control points $c_0$ are the red crosses, and the momenta $m_0$ the bold blue arrows. Those parameters are completed with $t_0 = 70$, $\sigma_\tau = 1$, $\sigma_\xi = 0.1$. With $n_s = 4$ independent components, we simulate $N = 100$ individual trajectories and sample $\langle n_i \rangle_i = 5$ observations from each.
The algorithm is run ten times. Figure 2 plots the evolution of the error on the parameters along the estimation procedure in log scale. Each color corresponds to a different run: the algorithm converges to the same point each time, as confirmed by the small variances on the residual errors indicated in Table 1. Those residual errors come from the finite number of observations of the generated dataset
[Figure 2 here: six log-scale panels plotting, over 50 thousand iterations, the varifold error on $y_0$, the L2 error on $v_0$, and the L1 errors on $t_0$, $\sigma_\tau$, $\sigma_\xi$ and $\sigma_\epsilon$.]
Figure 2: Error on the population parameters along the esti-
mation procedure, with logarithmic scales. The residual on
the template shape y0is computed with the varifold metric.
$\|\Delta y_0\|^2_{var}$ = 1.43 ±5.6% | $\|\Delta v_0\|^2$ = 0.89 ±0.7% | $|\Delta t_0|$ = 0.19 ±2.7% | $|\Delta\sigma_\tau|$ = 0.029 ±13.2% | $|\Delta\sigma_\xi|$ = 0.017 ±7.6% | $|\Delta\sigma_\epsilon|$ = 0.11 ±0.1% | $\langle\|\Delta v_i\|^2\rangle_i$ = 2.47 ±1.7% | $\langle|\Delta\tau_i|\rangle_i$ = 0.022 ±6.7% | $\langle|\Delta\xi_i|\rangle_i$ = 0.19 ±0.8%
Table 1: Absolute residual errors on the estimated parameters and associated relative standard deviations across the 10 runs. We note $v_0 = \mathrm{Conv}(c_0, m_0)$ and $v_i = \mathrm{Conv}(c_0, w_i)$; the operator $\langle . \rangle_i$ indicates an average over the index $i$. Residuals are satisfyingly small, as can be seen for $|\Delta t_0|$ for instance when compared with the time-span $\max |t_{ij}| = 4$. The low standard deviations suggest that the stochastic estimation procedure is stable and reproduces very similar results at each run.
Figure 3: Estimated mean progression (bottom line, in bold), and three reconstructed individual scenarios (top lines). Input data is plotted in red in the relevant frames, demonstrating the reconstruction ability of the estimated model. Our method is able to disentangle the variability in shape, starting time of the arm movement, and speed.
and the Bayesian priors, but are satisfyingly small, as qualitatively confirmed by Figure 3. The estimated mean trajectory, in bold, matches the true one, given by the top line of Figure 1. Figure 3 also illustrates the ability of our method to reconstruct continuous individual trajectories.
Personalizing the model to unseen data. Once a model has been learned, i.e. the parameters $\theta_{MAP}$ have been estimated, it can easily be personalized to the observations $y_{new}$ of a new subject by maximizing $q(y_{new}, z_{new} | \theta_{MAP})$ over the low-dimensional latent variables $z_{new}$. We implemented this maximization procedure with Powell's method [35], and evaluated it by registering the simulated trajectories to the true model. Table 2 gathers the results for the previously-introduced dataset with $\langle n_i \rangle_i = 5$ observations per subject, and extended ones with $\langle n_i \rangle_i = 7$ and 9. The parameters are satisfyingly estimated in all configurations: the reconstruction error measured by $|\Delta\epsilon|$ remains as low as in the previous experiment (see Table 1, Figure 3). The acceleration factor is the most difficult parameter to estimate with small observation windows of the individual trajectories; at least two observations are needed to obtain a good estimate.

$\langle n_i \rangle_i = 5$: $|\Delta\epsilon|$ = 0.110 | $s_i$: 3.34% | $\xi_i$: 37.0% | $\tau_i$: 5.45%
$\langle n_i \rangle_i = 7$: $|\Delta\epsilon|$ = 0.095 | $s_i$: 2.98% | $\xi_i$: 16.2% | $\tau_i$: 3.86%
$\langle n_i \rangle_i = 9$: $|\Delta\epsilon|$ = 0.087 | $s_i$: 2.38% | $\xi_i$: 11.9% | $\tau_i$: 3.28%
Table 2: Residual error metrics for the longitudinal registration procedure, for three simulated datasets. The absolute residual error on $\epsilon$ is given; the other errors are given in percentage of the simulation standard deviation.
5.2. Hippocampal atrophy in Alzheimer's disease
Longitudinal deformations model on MCIc subjects. We extract the T1-weighted magnetic resonance imaging measurements of $N = 100$ subjects from the ADNI database, with $\langle n_i \rangle_i = 7.6$ datapoints on average. Those subjects present mild cognitive impairments, and are eventually diagnosed with Alzheimer's disease (MCI converters, noted MCIc). In a pre-processing phase, the 3D images are affinely aligned and the segmentations of the right-hemisphere hippocampi are transformed into surface meshes. Each affine transformation is then applied to the corresponding mesh, before rigid alignment of follow-up meshes on the baseline one. The hippocampus is a subcortical brain structure which plays a central role in memory, and experiences atrophy during the development of Alzheimer's disease. We initialize the geodesic population parameters $y_0$, $c_0$, $m_0$ with a geodesic regression [15,16] performed on a single subject. The reference time $t_0$ is initialized to the mean of the observation times $(t_{i,j})_{i,j}$ and $\sigma_\tau^2$ to the corresponding variance. We choose to estimate $n_s = 4$ independent components and initialize the corresponding matrix $A$ to zero, as well as the individual latent variables $\xi_i$, $\tau_i$, $s_i$. After 10,000 iterations, the parameter estimates stabilized.
Figure 4: Estimated mean progression of the right hippocampus. Successive ages: 69.3y, 71.8y (i.e. the template $y_0$), 74.3y, 76.8y, 79.3y, 81.8y, and 84.3y. The color map gives the norm of the velocity field $\|v_0\|$ on the meshes.
Figure 5: Third independent component. The plotted hippocampi correspond to $s_{i,3}$ successively equal to: -3, -2, -1, 0 (i.e. the template $y_0$), 1, 2 and 3. Note that this component is orthogonal to the temporal trajectory displayed in Figure 4.
Figure 4 plots the estimated mean progression, which exhibits a complex spatiotemporal atrophy pattern during disease progression: a pinching effect at the "joint" between the head and the body, combined with a specific atrophy of the medial part of the tail. Figure 5 plots an independent component, which is orthogonal to the mean progression by construction. This component seems to account for the inter-subject variability in the relative size of the hippocampus head compared to its tail.
We further examine the correlation between individual parameters and several patient characteristics. Figure 6 exhibits the strong correlation between the estimated individual time-shifts $\tau_i$ and the age of diagnosis $t_i^{diag}$, suggesting that the hippocampal atrophy correlates well with the cognitive symptoms. The few outliers above the regression line
Figure 6: Comparison of the estimated individual time-shifts $\tau_i$, expressed as onset ages $t_0 + \tau_i$ (from early to late onset), versus the age of diagnosis $t_i^{diag}$. $R^2 = 0.74$.
might have better resisted the atrophy of their hippocampus thanks to a higher brain plasticity, in line with the cognitive reserve theory [39]. The few outliers below this line could have developed a subform of the disease, with delayed atrophy of their hippocampi. Further investigation is required to rule out potential convergence issues in the optimization procedure. Figures 7, 8 and 9 propose group comparisons based on the estimated individual parameters: the acceleration factor $\alpha_i$, time-shift $\tau_i$ and space-shift $s_{i,3}$ in the direction of the third component (see Figure 5). The distributions of those parameters are significantly different under the Mann-Whitney statistical test when dividing the $N = 100$ MCIc subjects according to gender, APOE4 mutation status, and onset age $t_0 + \tau_i$ respectively.
Figure 7: Distributions of acceleration factors $\alpha_i$ according to gender. Hippocampal atrophy is faster in female subjects ($p = 0.045$).
Figure 8: Distributions of time-shifts $\tau_i$, expressed as onset ages $t_0 + \tau_i$ (from early to late onset), according to the number of APOE4 alleles (none, one, or two). Hippocampal atrophy occurs earlier in carriers of 1 or 2 alleles ($p = 0.017$ and $0.015$).
[Figure 9 histograms; x-axis: source parameter s_{i,3}; groups: early, average, and late onset.]
Figure 9: Distribution of the third source term s_{i,3} according to the categories {τ_i ≤ −3}, {−3 < τ_i < 3}, and {3 ≤ τ_i}. Hippocampal atrophy seems to occur later in subjects presenting a lower volume ratio of the hippocampus tail over the hippocampus head (p = 0.0049).
⟨n_i⟩_i   Shape features     Naive feature     All features
1         71% ± 4.5 [lr]     50% ± 5.0 [nb]    58% ± 5.0 [lr]
2         77% ± 4.3 [lr]     58% ± 4.9 [5nn]   68% ± 4.7 [dt]
4         79% ± 4.1 [svm]    67% ± 4.7 [5nn]   80% ± 4.0 [lr]
5         77% ± 4.2 [nn]     77% ± 4.2 [lr]    82% ± 3.8 [nb]
6.86      83% ± 3.7 [lr]     80% ± 4.0 [lr]    86% ± 3.4 [lr]

Table 3: Mean classification scores and associated standard deviations, computed on 10,000 bootstrap samples from the test dataset. Across all tested classifiers (sklearn default hyperparameters), only the best performing one is reported in each case: [lr] logistic regression, [nb] naive Bayes, [5nn] 5 nearest neighbors, [dt] decision tree, [nn] neural network, [svm] linear support vector machine.
Classifying pathological trajectories vs. normal ageing.
We processed another hundred individuals from the ADNI
database (⟨n_i⟩_i = 7.37), choosing this time control
subjects (CN). We form two balanced datasets, each
containing 50 MCIc and 50 CN subjects. We learn two
distinct longitudinal deformation models on the training
MCIc (N = 50, ⟨n_i⟩_i = 8.14) and CN (N = 50, ⟨n_i⟩_i =
8.08) subjects. We personalize both models to all 200
subjects, and use the scaled and normalized differences
z_i^MCIc − z_i^CN as feature vectors of dimension 6, on
which a list of standard classifiers are trained and tested
to predict the label in {MCIc, CN}. For several
configurations of the number of observations per test
subject ⟨n_i⟩_i, we compute confidence intervals by
bootstrapping the test set. Table 3 compares the results
with a naive approach, which uses as unique feature the
slope of individually-fitted linear regressions of the
hippocampal volume against age. Classifiers performed
consistently better with the features extracted from the
longitudinal deformation model, even with a single
observation. The classification performance increases with
the number of available observations per subject.
Interestingly, from ⟨n_i⟩_i = 4 onwards, pooling the shape
and volume features yields an improved performance,
suggesting complementarity.
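The bootstrap procedure behind the confidence intervals of Table 3 can be sketched as follows, assuming a hypothetical vector of per-subject classification outcomes (the 80% base accuracy is an illustrative stand-in, not a reported figure):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-subject correctness of a classifier on a test set of
# 100 subjects (True = correctly labeled MCIc/CN, False = misclassified).
correct = rng.random(100) < 0.80

# Bootstrap the test set: resample subjects with replacement and recompute
# the accuracy, as done for the 10,000 bootstrap samples of Table 3.
n_boot = 10_000
idx = rng.integers(0, len(correct), size=(n_boot, len(correct)))
scores = correct[idx].mean(axis=1)

# Mean score, standard deviation, and a 95% percentile interval.
mean_score = scores.mean()
std_score = scores.std()
lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"{mean_score:.0%} +/- {std_score:.1%}, 95% CI [{lo:.0%}, {hi:.0%}]")
```

Resampling whole subjects (rather than individual visits) keeps the repeated observations of a subject together, which is the appropriate unit of resampling for longitudinal data.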
6. Conclusion
We proposed a hierarchical model on a manifold of
diffeomorphisms estimating the spatiotemporal distribution
of longitudinal shape data. The observed shape trajectories
are represented as individual variations of a group-average
trajectory, which can be seen as the mean progression of
the population. Both spatial and temporal variability are
estimated directly from the data, allowing the use of
unaligned sequences. This feature is key for applications
where no objective temporal markers are available, as is
the case for instance with Alzheimer's disease, whose onset
age and pace of progression vary among individuals. Our
model builds on the principles of a generic longitudinal
model for manifold-valued data [37]. We provided a coherent
theoretical framework for its application to shape data,
along with the needed algorithmic solutions for parallel
transport and sampling on our specific manifold. We
estimated our model with the MCMC-SAEM algorithm on both
simulated and real data. The simulation experiments
confirmed the ability of the proposed algorithm to retrieve
the optimal parameters in realistic scenarios. The
application to medical imaging data, namely segmented
hippocampi of patients with Alzheimer's disease, delivered
results coherent with medical knowledge, and provided more
detailed insights into the complex atrophy pattern of the
hippocampus and its variability across patients. In future
work, the proposed method will be leveraged for automatic
diagnosis and prognosis purposes. Further investigations
are also needed to evaluate the algorithm's convergence
with respect to the number of individual samples.
Acknowledgments. This work has been partly funded by the Eu-
ropean Research Council with grant 678304, European Union’s
Horizon 2020 research and innovation program with grant 666992,
and the program Investissements d’avenir ANR-10-IAIHU-06.
References
[1] S. Allassonnière, S. Durrleman, and E. Kuhn. Bayesian mixed effect atlas estimation with a diffeomorphic deformation model. SIAM Journal on Imaging Sciences, 8:1367–1395, 2015. 1
[2] S. Allassonnière, E. Kuhn, and A. Trouvé. Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study. Bernoulli, 16(3):641–678, 2010. 4
[3] Y. F. Atchadé. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodology and Computing in Applied Probability, 8(2):235–254, 2006. 4
[4] M. Banerjee, R. Chakraborty, E. Ofori, M. S. Okun, D. E.
Viallancourt, and B. C. Vemuri. A nonlinear regression tech-
nique for manifold valued data with applications to medi-
cal image analysis. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 4424–
4432, 2016. 1
[5] F. Beg, M. Miller, A. Trouvé, and L. Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV, 2005. 1, 2
[6] R. Chakraborty, M. Banerjee, and B. C. Vemuri. Statistics
on the space of trajectories for longitudinal data analysis. In
Biomedical Imaging (ISBI 2017), 2017 IEEE 14th Interna-
tional Symposium on, pages 999–1002. IEEE, 2017. 2
[7] N. Charon and A. Trouvé. The varifold representation of nonoriented shapes for diffeomorphic registration. SIAM Journal on Imaging Sciences, 6(4):2547–2580, 2013. 4, 11
[8] G. E. Christensen, R. D. Rabbitt, and M. I. Miller. De-
formable templates using large deformation kinematics.
IEEE transactions on image processing, 5(10):1435–1447,
1996. 1
[9] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appear-
ance models. IEEE Transactions on pattern analysis and
machine intelligence, 23(6):681–685, 2001. 1
[10] B. Delyon, M. Lavielle, and E. Moulines. Convergence of a
stochastic approximation version of the em algorithm. An-
nals of statistics, pages 94–128, 1999. 4
[11] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum
likelihood from incomplete data via the em algorithm. Jour-
nal of the royal statistical society. Series B (methodological),
pages 1–38, 1977. 4
[12] S. Durrleman, S. Allassonnière, and S. Joshi. Sparse adaptive parameterization of variability in image ensembles. IJCV, 101(1):161–183, 2013. 1, 2
[13] S. Durrleman, X. Pennec, A. Trouvé, J. Braga, G. Gerig, and N. Ayache. Toward a comprehensive framework for the spatiotemporal statistical analysis of longitudinal shape data. International Journal of Computer Vision, 103(1):22–59, May 2013. 1, 2
[14] S. Durrleman, X. Pennec, A. Trouvé, G. Gerig, and N. Ayache. Spatiotemporal atlas estimation for developmental delay detection in longitudinal datasets. In Med Image Comput Comput Assist Interv, pages 297–304. Springer, 2009. 1
[15] J. Fishbaugh, M. Prastawa, G. Gerig, and S. Durrleman.
Geodesic regression of image and shape data for improved
modeling of 4D trajectories. In ISBI 2014 - 11th Interna-
tional Symposium on Biomedical Imaging, pages 385 – 388,
Apr. 2014. 1,6
[16] T. Fletcher. Geodesic regression and the theory of least
squares on riemannian manifolds. IJCV, 105(2):171–185,
2013. 1,6
[17] P. Gori, O. Colliot, L. Marrakchi-Kacem, Y. Worbe,
C. Poupon, A. Hartmann, N. Ayache, and S. Durrleman. A
Bayesian Framework for Joint Morphometry of Surface and
Curve meshes in Multi-Object Complexes. Medical Image
Analysis, 35:458–474, Jan. 2017. 1
[18] J. Hinkle, P. Muralidharan, P. T. Fletcher, and S. Joshi. Poly-
nomial Regression on Riemannian Manifolds, pages 1–14.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2012. 1
[19] A. Hyvärinen, J. Karhunen, and E. Oja. Independent component analysis, volume 46. John Wiley & Sons, 2004. 3
[20] S. C. Joshi and M. I. Miller. Landmark matching via large
deformation diffeomorphisms. IEEE Transactions on Image
Processing, 9(8):1357–1370, 2000. 1
[21] D. G. Kendall. Shape manifolds, procrustean metrics, and
complex projective spaces. Bulletin of the London Mathe-
matical Society, 16(2):81–121, 1984. 1
[22] H. J. Kim, N. Adluru, M. D. Collins, M. K. Chung, B. B.
Bendlin, S. C. Johnson, R. J. Davidson, and V. Singh. Multi-
variate general linear models (mglm) on riemannian man-
ifolds with applications to statistical analysis of diffusion
weighted images. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 2705–
2712, 2014. 2
[23] H. J. Kim, N. Adluru, H. Suri, B. C. Vemuri, S. C. Johnson,
and V. Singh. Riemannian nonlinear mixed effects models:
Analyzing longitudinal deformations in neuroimaging. In
Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2017. 2
[24] I. Koval, J.-B. Schiratti, A. Routier, M. Bacci, O. Colliot, S. Allassonnière, S. Durrleman, A. D. N. Initiative, et al. Statistical learning of spatiotemporal patterns from longitudinal manifold-valued networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 451–459. Springer, 2017. 2
[25] M. Lavielle. Mixed effects models for the population ap-
proach: models, tasks, methods and tools. CRC press, 2014.
5,11
[26] M. Lorenzi, N. Ayache, G. Frisoni, and X. Pennec. 4D reg-
istration of serial brains MR images: a robust measure of
changes applied to Alzheimer’s disease. Spatio Temporal
Image Analysis Workshop (STIA), MICCAI, 2010. 1
[27] M. Lorenzi, N. Ayache, and X. Pennec. Schild’s ladder for
the parallel transport of deformations in time series of im-
ages. pages 463–474. Springer, 2011. 2
[28] M. Louis, A. Bône, B. Charlier, S. Durrleman, A. D. N. Initiative, et al. Parallel transport in shape analysis: a scalable numerical scheme. In International Conference on Geometric Science of Information, pages 29–37. Springer, 2017. 3
[29] M. I. Miller, A. Trouvé, and L. Younes. Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 24(2):209–228, 2006. 1, 2, 3
[30] M. I. Miller and L. Younes. Group actions, homeomor-
phisms, and matching: A general framework. International
Journal of Computer Vision, 41(1-2):61–84, 2001. 2
[31] P. Muralidharan and P. T. Fletcher. Sasaki metrics for anal-
ysis of longitudinal data on manifolds. In Computer Vision
and Pattern Recognition (CVPR), 2012 IEEE Conference on,
pages 1027–1034. IEEE, 2012. 1
[32] M. Niethammer, Y. Huang, and F.-X. Vialard. Geodesic re-
gression for image time-series. In International Conference
on Medical Image Computing and Computer-Assisted Inter-
vention, pages 655–662. Springer, 2011. 1
[33] X. Pennec. Intrinsic statistics on riemannian manifolds: Ba-
sic tools for geometric measurements. Journal of Mathemat-
ical Imaging and Vision, 25(1):127–154, 2006. 1
[34] X. Pennec, P. Fillard, and N. Ayache. A riemannian frame-
work for tensor computing. International Journal of Com-
puter Vision, 66(1):41–66, 2006. 1
[35] M. J. Powell. An efficient method for finding the minimum
of a function of several variables without calculating deriva-
tives. The computer journal, 7(2):155–162, 1964. 6
[36] G. O. Roberts, A. Gelman, W. R. Gilks, et al. Weak con-
vergence and optimal scaling of random walk metropolis al-
gorithms. The annals of applied probability, 7(1):110–120,
1997. 4
[37] J.-B. Schiratti, S. Allassonnière, O. Colliot, and S. Durrleman. Learning spatiotemporal trajectories from manifold-valued longitudinal data. In NIPS 28, pages 2404–2412. 2015. 2, 3, 4, 8
[38] N. Singh, J. Hinkle, S. Joshi, and P. T. Fletcher. Hierarchical
geodesic models in diffeomorphisms. IJCV, 117(1):70–92,
2016. 2
[39] Y. Stern. Cognitive reserve and alzheimer disease. Alzheimer
Disease & Associated Disorders, 20(2):112–117, 2006. 7
[40] J. Su, S. Kurtek, E. Klassen, A. Srivastava, et al. Statistical
analysis of trajectories on riemannian manifolds: bird migra-
tion, hurricane tracking and video surveillance. The Annals
of Applied Statistics, 8(1):530–552, 2014. 1,2
[41] J. Su, A. Srivastava, F. D. de Souza, and S. Sarkar. Rate-
invariant analysis of trajectories on riemannian manifolds
with application in visual speech recognition. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 620–627, 2014. 1,2
[42] M. Vaillant and J. Glaunès. Surface matching via currents. In Information processing in medical imaging, pages 1–5. Springer, 2005. 4, 11
[43] L. Younes. Shapes and Diffeomorphisms. Applied Mathe-
matical Sciences. Springer Berlin Heidelberg, 2010. 1
[44] M. Zhang, N. Singh, and P. T. Fletcher. Bayesian estimation
of regularization and atlas building in diffeomorphic image
registration. In IPMI, volume 23, pages 37–48, 2013. 1
Appendix: supplementary material
We introduce the onset age individual random variable $t_i = t_0 + \tau_i \sim \mathcal{N}(t_0, \sigma_\tau^2)$ instead of the time-shift $\tau_i$. The obtained hierarchical model is equivalent to the one presented in Section 3, with unchanged parameters $\theta = (\bar{y}_0, \bar{c}_0, \bar{m}_0, \bar{A}, t_0, \sigma_\tau^2, \sigma_\xi^2, \sigma^2)$ and equivalent random effects $z = (z_{\mathrm{pop}}, z_1, \ldots, z_N)$, where $z_{\mathrm{pop}} = (y_0, c_0, m_0, A)$ and, for all $i \in \{1, \ldots, N\}$, $z_i = (t_i, \xi_i, s_i)$. The complete log-likelihood writes:

$\log q(y, z, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} \log q(y_{i,j} \mid z, \theta) + \log q(z_{\mathrm{pop}} \mid \theta) + \sum_{i=1}^{N} \log q(z_i \mid \theta) + \log q(\theta)$  (6)
where the densities $q(y_{i,j} \mid z, \theta)$, $q(z_{\mathrm{pop}} \mid \theta)$, $q(z_i \mid \theta)$ and $q(\theta)$ are given, up to an additive constant, by:

$-2 \log q(y_{i,j} \mid z, \theta) + \mathrm{cst} = \lambda \log \sigma^2 + \| y_{i,j} - \phi_{c_0, m_0, t_0,\, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2 / \sigma^2$  (7)

$-2 \log q(z_{\mathrm{pop}} \mid \theta) + \mathrm{cst} = |y_0| \log \sigma_y^2 + \| y_0 - \bar{y}_0 \|^2 / \sigma_y^2 + |c_0| \log \sigma_c^2 + \| c_0 - \bar{c}_0 \|^2 / \sigma_c^2 + |m_0| \log \sigma_m^2 + \| m_0 - \bar{m}_0 \|^2 / \sigma_m^2 + |A| \log \sigma_A^2 + \| A - \bar{A} \|^2 / \sigma_A^2$  (8)

$-2 \log q(z_i \mid \theta) + \mathrm{cst} = \log \sigma_\tau^2 + (t_i - t_0)^2 / \sigma_\tau^2 + \log \sigma_\xi^2 + \xi_i^2 / \sigma_\xi^2 + \| s_i \|^2$  (9)

$-2 \log q(\theta) + \mathrm{cst} = \| \bar{y}_0 - \breve{y}_0 \|^2 / \varsigma_y^2 + \| \bar{c}_0 - \breve{c}_0 \|^2 / \varsigma_c^2 + \| \bar{m}_0 - \breve{m}_0 \|^2 / \varsigma_m^2 + \| \bar{A} - \breve{A} \|^2 / \varsigma_A^2 + (t_0 - \breve{t}_0)^2 / \varsigma_t^2 + m \log \sigma_\tau^2 + m \sigma_{\tau,0}^2 / \sigma_\tau^2 + m \log \sigma_\xi^2 + m \sigma_{\xi,0}^2 / \sigma_\xi^2 + m \log \sigma^2 + m \sigma_0^2 / \sigma^2$  (10)

noting $\lambda$ the dimension of the space where the residual $\| y_{i,j} - \phi_{c_0, m_0, t_0,\, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2$ is computed, and $|y_0|$, $|c_0|$, $|m_0|$ and $|A|$ the total dimensions of $y_0$, $c_0$, $m_0$ and $A$ respectively. We chose either the current [42] or the varifold [7] norm for the residuals.
Noticing the identity $\phi_{c_0, m_0, t_0,\, \psi_i(t_{i,j})} = \phi_{c_0, m_0, 0,\, \psi_i(t_{i,j}) - t_0}$, the complete log-likelihood can be decomposed into $\log q(y, z, \theta) = \langle S(y, z),\, \varphi(\theta) \rangle - \Psi(\theta)$, i.e. the proposed mixed-effects model belongs to the curved exponential family. In this setting, the MCMC-SAEM algorithm presented in Section 4 has a proved convergence.
Exhibiting the sufficient statistics $S_1 = y_0$, $S_2 = c_0$, $S_3 = m_0$, $S_4 = A$, $S_5 = \sum_i t_i$, $S_6 = \sum_i t_i^2$, $S_7 = \sum_i \xi_i^2$ and $S_8 = \sum_i \sum_j \| y_{i,j} - \phi_{c_0, m_0, t_0,\, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2$ (see Section 4.5), the update of the model parameters $\theta^*$ in the M step of the MCMC-SAEM algorithm can be derived in closed form:
$\bar{y}_0^* = \left( \varsigma_y^2 S_1 + \sigma_y^2 \breve{y}_0 \right) / \left( \varsigma_y^2 + \sigma_y^2 \right)$ ; $t_0^* = \left( \varsigma_t^2 S_5 + \sigma_\tau^{2*} \breve{t}_0 \right) / \left( N \varsigma_t^2 + \sigma_\tau^{2*} \right)$  (11)

$\bar{c}_0^* = \left( \varsigma_c^2 S_2 + \sigma_c^2 \breve{c}_0 \right) / \left( \varsigma_c^2 + \sigma_c^2 \right)$ ; $\sigma_\tau^{2*} = \left( S_6 - 2 t_0^* S_5 + N (t_0^*)^2 + m \sigma_{\tau,0}^2 \right) / \left( N + m \right)$  (12)

$\bar{m}_0^* = \left( \varsigma_m^2 S_3 + \sigma_m^2 \breve{m}_0 \right) / \left( \varsigma_m^2 + \sigma_m^2 \right)$ ; $\sigma_\xi^{2*} = \left( S_7 + m \sigma_{\xi,0}^2 \right) / \left( N + m \right)$  (13)

$\bar{A}^* = \left( \varsigma_A^2 S_4 + \sigma_A^2 \breve{A} \right) / \left( \varsigma_A^2 + \sigma_A^2 \right)$ ; $\sigma^{2*} = \left( S_8 + m \sigma_0^2 \right) / \left( \lambda N \langle n_i \rangle_i + m \right)$  (14)

The intricate, coupled updates $t_0 \mapsto t_0^*$ and $\sigma_\tau^2 \mapsto \sigma_\tau^{2*}$ can be solved by iterative replacement.
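This iterative replacement can be sketched as follows: each of the two coupled updates is re-evaluated with the latest value of the other until a fixed point is reached. The sufficient statistics are computed from simulated onset ages, and all hyperparameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100

# Simulated sampled onset ages t_i, and the sufficient statistics S5, S6.
t_i = rng.normal(72.0, 7.0, size=N)
S5 = t_i.sum()          # sum_i t_i
S6 = (t_i ** 2).sum()   # sum_i t_i^2

# Hypothetical prior hyperparameters (values are illustrative only):
t0_prior, var_t_prior = 70.0, 25.0    # breve{t}_0 and varsigma_t^2
m, var_tau_prior = 1.0, 16.0          # m and sigma_{tau,0}^2

# Fixed-point iteration over the coupled updates of Equations (11)-(12).
t0, var_tau = t_i.mean(), t_i.var()   # initialization
for _ in range(50):
    t0 = (var_t_prior * S5 + var_tau * t0_prior) / (N * var_t_prior + var_tau)
    var_tau = (S6 - 2 * t0 * S5 + N * t0 ** 2 + m * var_tau_prior) / (N + m)
```

Each update is a contraction around the empirical mean and variance of the sampled onset ages, so the alternation converges in a few iterations in practice.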
Similarly to Equation 6, the tempered complete log-likelihood writes:

$\log q_T(y, z, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} \log q_T(y_{i,j} \mid z, \theta) + \log q_T(z_{\mathrm{pop}} \mid \theta) + \sum_{i=1}^{N} \log q(z_i \mid \theta) + \log q(\theta)$  (15)

with:

$-2 \log q_T(y_{i,j} \mid z, \theta) + \mathrm{cst} = \lambda \log (T \sigma^2) + \| y_{i,j} - \phi_{c_0, m_0, t_0,\, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2 / (T \sigma^2)$  (16)

$-2 \log q_T(z_{\mathrm{pop}} \mid \theta) + \mathrm{cst} = |y_0| \log (T \sigma_y^2) + \| y_0 - \bar{y}_0 \|^2 / (T \sigma_y^2) + |c_0| \log (T \sigma_c^2) + \| c_0 - \bar{c}_0 \|^2 / (T \sigma_c^2) + |m_0| \log (T \sigma_m^2) + \| m_0 - \bar{m}_0 \|^2 / (T \sigma_m^2) + |A| \log (T \sigma_A^2) + \| A - \bar{A} \|^2 / (T \sigma_A^2)$  (17)

Tempering can therefore be understood as an artificial increase of the variances $\sigma^2$, $\sigma_y^2$, $\sigma_c^2$, $\sigma_m^2$ and $\sigma_A^2$ when computing the associated acceptance ratios in the S-MCMC step of the algorithm. This intuition is well-explained in [25].
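This equivalence between tempering and variance inflation can be sketched on a single Gaussian likelihood term; the residual values, dimension, and temperatures below are illustrative assumptions:

```python
import math
import random

random.seed(5)

def log_q(residual_sq, var, dim):
    """log density of a centered Gaussian term, up to an additive constant."""
    return -0.5 * (dim * math.log(var) + residual_sq / var)

def accept(residual_sq_new, residual_sq_old, var, dim, temperature):
    """Metropolis acceptance test: dividing the log-likelihood by T amounts
    to replacing the variance var by T * var in the acceptance ratio."""
    log_ratio = (log_q(residual_sq_new, temperature * var, dim)
                 - log_q(residual_sq_old, temperature * var, dim))
    return random.random() < math.exp(min(0.0, log_ratio))

# A proposal that degrades the residual: at high temperature the chain
# accepts such moves far more often, which helps early exploration.
worse_move = (120.0, 100.0)  # (proposed residual, current residual)
hot = sum(accept(*worse_move, var=1.0, dim=1, temperature=50.0)
          for _ in range(1000))
cold = sum(accept(*worse_move, var=1.0, dim=1, temperature=1.0)
           for _ in range(1000))
```

At T = 1 the acceptance probability of this move is exp(−10) ≈ 0, while at T = 50 it rises to exp(−0.2) ≈ 0.82, which is the intuition behind the annealing schedule.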
... Focusing more specifically on medical applications, several works have analysed longitudinal data through the prism of progression models using in particular mixed-effects models (Schiratti et al., 2015;Bône et al., 2018). In these approaches, patients are assumed to follow a given trajectory that deviates from a reference curve that may, for example, represent the average progression of a given disease. ...
... In such literature, mixed-effect models (Laird & Ware, 1982) that parameterise a patient's evolution as a deviation from a reference trajectory have become more and more popular (Diggle et al., 2002;Singer et al., 2003). First applied on Euclidean data (Bernal-Rusiel et al., 2013), they were then extended with a Riemannian geometry viewpoint (Schiratti et al., 2015;Singh et al., 2016;Koval et al., 2017;Bône et al., 2018) or combined with dimensionality reduction (Louis et al., 2019;Sauty & Durrleman, 2022). Despite being adapted to model disease progression, it is unclear how these models would apply to datasets where there is no clear average evolution. ...
... For these experiments, we consider 5 different databases that mimic longitudinal datasets. The first one is a synthetic longitudinal dataset composed of 1,000, 64x64 images of starmen raising their left arm and generated according to the diffeomorphic model of (Bône et al., 2018). The second one consists of 8 evenly separated rotations applied to the MNIST database (LeCun, 1998) from 0 to 360 degrees, we call it rotMNIST. ...
Preprint
This paper introduces a new latent variable generative model able to handle high dimensional longitudinal data and relying on variational inference. The time dependency between the observations of an input sequence is modelled using normalizing flows over the associated latent variables. The proposed method can be used to generate either fully synthetic longitudinal sequences or trajectories that are conditioned on several data in a sequence and demonstrates good robustness properties to missing data. We test the model on 6 datasets of different complexity and show that it can achieve better likelihood estimates than some competitors as well as more reliable missing data imputation. A code is made available at \url{https://github.com/clementchadebec/variational_inference_for_longitudinal_data}.
... A recent overview of existing LDDMM methods can be found in [19]. Related LDDMM software packages include Demons [9], ANTs [38][39][40], DARTEL [41], deformetrica [42][43][44][45], FLASH [15], LDDMM [5,46], ARDENT [47], ITKNDReg [48], and PyCA [49]. Surveys of GPU-accelerated image registration solvers can be found in [50][51][52]; particular examples for various formulations are [14,16,43,[53][54][55][56][57][58][59][60][61][62][63][64][65][66]. ...
Article
Full-text available
We study the performance of CLAIRE—a diffeomorphic multi-node, multi-GPU image-registration algorithm and software—in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then register them using existing tools. Our main contribution is an extensive analysis of the impact of downsampling on registration performance. We study this impact by comparing full-resolution registrations obtained with CLAIRE to lower resolution registrations for synthetic and real-world imaging datasets. Our results suggest that registration at full resolution can yield a superior registration quality—but not always. For example, downsampling a synthetic image from 10243 to 2563 decreases the Dice coefficient from 92% to 79%. However, the differences are less pronounced for noisy or low contrast high resolution images. CLAIRE allows us not only to register images of clinically relevant size in a few seconds but also to register images at unprecedented resolution in reasonable time. The highest resolution considered are CLARITY images of size 2816×3016×1162. To the best of our knowledge, this is the first study on image registration quality at such resolutions.
... Since L align is a ℓ 2 loss in the latent space, it can be seen as the log-likelihood of a Gaussian prior z i,j " N pη i ψ pt i;j q, Iq in the latent space, which defines an elementary Gaussian Process, and which supports the addition of GP priors in the latent space of VAEs to model longitudinal data [13,26]. Besides, X can be seen as a Riemannian manifold, the metric of which is given by the pushforward of the Euclidean metric of Z through the decoder, such that trajectories in X are geodesics, in accordance with the Riemannian modeling of longitudinal data [5,14,24,28,29]. The metric on X thus allows to recover non linear dynamics, as is often the case for biomarkers. ...
Chapter
Full-text available
Disease progression models are crucial to understanding degenerative diseases. Mixed-effects models have been consistently used to model clinical assessments or biomarkers extracted from medical images, allowing missing data imputation and prediction at any timepoint. However, such progression models have seldom been used for entire medical images. In this work, a Variational Auto Encoder is coupled with a temporal linear mixed-effect model to learn a latent representation of the data such that individual trajectories follow straight lines over time and are characterised by a few interpretable parameters. A Monte Carlo estimator is devised to iteratively optimize the networks and the statistical model. We apply this method on a synthetic data set to illustrate the disentanglement between time dependant changes and inter-subjects variability, as well as the predictive capabilities of the method. We then apply it to 3D MRI and FDG-PET data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to recover well documented patterns of structural and metabolic alterations of the brain.KeywordsVariational auto encodersMixed-effects modelsDisease progression modelsAlzheimer’s Disease
... Our work applies the DSCM framework to model 3D meshes. Many recent works on spatiotemporal neuroimaging require longitudinal data [59,6], model only population-level, average shapes [36], or do not consider causality in the imaged outcome [75]. ...
Preprint
Causal reasoning provides a language to ask important interventional and counterfactual questions beyond purely statistical association. In medical imaging, for example, we may want to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes, extracted from automated image segmentation, can be reliably constructed, there is a lack of computational tooling to enable causal reasoning about morphological variations. To tackle this problem, we propose deep structural causal shape models (CSMs), which utilise high-quality mesh generation techniques, from geometric deep learning, within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation ("How would this patient's brain structure change if they were ten years older?"), which is in contrast to most current works on purely population-level statistical shape modelling. We demonstrate the capabilities of CSMs at all levels of Pearl's causal hierarchy through a number of qualitative and quantitative experiments leveraging a large dataset of 3D brain structures.
Article
Full-text available
Functional data analysis (FDA) is a fast-growing area of research and development in statistics. While most FDA literature imposes the classical $$\mathbb {L}^2$$ L 2 Hilbert structure on function spaces, there is an emergent need for a different, shape-based approach for analyzing functional data. This paper reviews and develops fundamental geometrical concepts that help connect traditionally diverse fields of shape and functional analyses. It showcases that focusing on shapes is often more appropriate when structural features (number of peaks and valleys and their heights) carry salient information in data. It recaps recent mathematical representations and associated procedures for comparing, summarizing, and testing the shapes of functions. Specifically, it discusses three tasks: shape fitting, shape fPCA, and shape regression models. The latter refers to the models that separate the shapes of functions from their phases and use them individually in regression analysis. The ensuing results provide better interpretations and tend to preserve geometric structures. The paper also discusses an extension where the functions are not real-valued but manifold-valued. The article presents several examples of this shape-centric functional data analysis using simulated and real data.
Article
Accurate prediction of progression in subjects at risk of Alzheimer's disease is crucial for enrolling the right subjects in clinical trials. However, a prospective comparison of state-of-the-art algorithms for predicting disease onset and progression is currently lacking. We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better than average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of patient-specific biomarkers. 
On a limited, cross-sectional subset of the data emulating clinical trials, performance of the best algorithms at predicting clinical diagnosis decreased only slightly (2 percentage points) compared to the full longitudinal dataset. The submission system remains open via the website https://tadpole.grand-challenge.org, while TADPOLE SHARE (https://tadpole-share.github.io/) collates code for submissions. TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials.
Chapter
Causal reasoning provides a language to ask important interventional and counterfactual questions beyond purely statistical association. In medical imaging, for example, we may want to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes, extracted from automated image segmentation, can be reliably constructed, there is a lack of computational tooling to enable causal reasoning about morphological variations. To tackle this problem, we propose deep structural causal shape models (CSMs), which utilise high-quality mesh generation techniques, from geometric deep learning, within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation (“How would this patient’s brain structure change if they were ten years older?”), which is in contrast to most current works on purely population-level statistical shape modelling. We demonstrate the capabilities of CSMs at all levels of Pearl’s causal hierarchy through a number of qualitative and quantitative experiments leveraging a large dataset of 3D brain structures.
Article
Full-text available
We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guessing. Two ensemble methods based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better than average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as patient-specific biomarker trends. The submission system remains open via the website https://tadpole.grand-challenge.org, while code for submissions is being collated by TADPOLE SHARE: https://tadpole-share.github.io/. Our work suggests that current prediction algorithms are accurate for biomarkers related to clinical diagnosis and ventricle volume, opening up the possibility of cohort refinement in clinical trials for Alzheimer's disease.
Chapter
Longitudinal medical image data are becoming increasingly important for monitoring patient progression. However, such datasets are often small, incomplete, or have inconsistencies between observations. Thus, we propose a generative model that not only produces continuous trajectories of fully synthetic patient images, but also imputes missing data in existing trajectories, by estimating realistic progression over time. Our generative model is trained directly on features extracted from images and maps these into a linear trajectory in a Euclidean space defined with velocity, delay, and spatial parameters that are learned directly from the data. We evaluated our method on toy data and face images, both showing simulated trajectories mimicking progression in longitudinal data. Furthermore, we applied the proposed model on a complex neuroimaging database extracted from ADNI. All datasets show that the model is able to learn overall (disease) progression over time.KeywordsLongitudinal dataGenerative modelSynthetic images
Conference Paper
Full-text available
We introduce a mixed-effects model to learn spatiotemporal patterns on a network by considering longitudinal measures distributed on a fixed graph. The data come from repeated observations of subjects at different time points which take the form of measurement maps distributed on a graph such as an image or a mesh. The model learns a typical group-average trajectory characterizing the propagation of measurement changes across the graph nodes. The subject-specific trajectories are defined via spatial and temporal transformations of the group-average scenario, thus estimating the variability of spatiotemporal patterns within the group. To estimate population and individual model parameters, we adapted a stochastic version of the Expectation-Maximization algorithm, the MCMC-SAEM. The model is used to describe the propagation of cortical atrophy during the course of Alzheimer’s Disease. Model parameters show the variability of this average pattern of atrophy in terms of trajectories across brain regions, age at disease onset and pace of propagation. We show that the personalization of this model yields accurate prediction of maps of cortical thickness in patients.
Conference Paper
Full-text available
The analysis of manifold-valued data requires efficient tools from Riemannian geometry to cope with the computational complexity at stake. This complexity arises from the ever-increasing dimension of the data and the absence of closed-form expressions for basic operations such as the Riemannian logarithm. In this paper, we adapt a generic numerical scheme recently introduced for computing parallel transport along geodesics in a Riemannian manifold to finite-dimensional manifolds of diffeomorphisms. We provide a qualitative and quantitative analysis of its behavior on high-dimensional manifolds, and investigate an application to the prediction of brain structure progression.
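As a self-contained illustration of the family of ladder-type numerical transport schemes this abstract refers to, here is Schild's ladder on the unit 2-sphere, where the exponential and logarithm maps are closed-form. This is a didactic stand-in on a toy manifold, not the paper's scheme for diffeomorphisms:

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit 2-sphere."""
    n = np.linalg.norm(v)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Logarithm map on the unit 2-sphere."""
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    w = q - c * p
    return theta * w / np.linalg.norm(w)

def schilds_ladder(p, v, w, n_rungs=50, eps=0.1):
    """Transport tangent vector w from p along the geodesic t -> exp_p(t v),
    t in [0, 1], using n_rungs geodesic-parallelogram (ladder) steps."""
    pts = [sphere_exp(p, t * v) for t in np.linspace(0.0, 1.0, n_rungs + 1)]
    cur = w
    for k in range(n_rungs):
        p0, p1 = pts[k], pts[k + 1]
        x = sphere_exp(p0, eps * cur)                    # top of the rung
        m = sphere_exp(x, 0.5 * sphere_log(x, p1))       # diagonal midpoint
        x_new = sphere_exp(p0, 2.0 * sphere_log(p0, m))  # reflect p0 through m
        cur = sphere_log(p1, x_new) / eps                # transported vector
    return cur

# Transport a "north-pointing" vector a quarter of the way around the equator:
p = np.array([1.0, 0.0, 0.0])
v = 0.5 * np.pi * np.array([0.0, 1.0, 0.0])
res = schilds_ladder(p, v, np.array([0.0, 0.0, 0.3]))
```

By symmetry the transported vector should still point north with norm 0.3: parallel transport preserves the metric, which ladder schemes approximate rung by rung.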
Conference Paper
Full-text available
Statistical machine learning models that operate on manifold-valued data are being extensively studied in vision, motivated by applications in activity recognition, feature tracking and medical imaging. While non-parametric methods have been relatively well studied in the literature, efficient formulations for parametric models (which may offer benefits in small sample size regimes) have only emerged recently. So far, manifold-valued regression models (such as geodesic regression) are restricted to the analysis of cross-sectional data, i.e., the so-called "fixed effects" in statistics. But in most "longitudinal analysis" (e.g., when a participant provides multiple measurements over time) the application of fixed effects models is problematic. In an effort to answer this need, this paper generalizes the non-linear mixed effects model to the regime where the response variable is manifold-valued, i.e., f : ℝ^d → ℳ. We derive the underlying model and estimation schemes and demonstrate the immediate benefits such a model can provide, both for group-level and individual-level analysis, on longitudinal brain imaging data. The direct consequence of our results is that longitudinal analysis of manifold-valued measurements (especially, the symmetric positive definite manifold) can be conducted in a computationally tractable manner.
Conference Paper
Full-text available
Regression is an essential tool in Statistical analysis of data with many applications in Computer Vision, Machine Learning, Medical Imaging and various disciplines of Science and Engineering. Linear and nonlinear regression in a vector space setting has been well studied in the literature. However, generalizations to manifold-valued data are only recently gaining popularity. With the exception of a few, most existing methods of regression for manifold-valued data are limited to geodesic regression, which is a generalization of linear regression in vector spaces. In this paper, we present a novel nonlinear kernel-based regression method that is applicable to manifold-valued data. Our method is applicable to cases when the independent and dependent variables in the regression model are both manifold-valued, or one is manifold-valued and the other is vector or scalar valued. Further, unlike most methods, our method does not require any imposed ordering on the manifold-valued data. The performance of our model is tested on a large number of real data sets acquired from Alzheimer's and movement disorder (Parkinson's and Essential Tremor) patients. We present an extensive set of results along with statistical validation and comparisons.
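One concrete instance of kernel-based regression with a manifold-valued dependent variable is the Nadaraya-Watson estimator, where the usual weighted average is replaced by a kernel-weighted Fréchet mean. The sketch below does this on the unit sphere with a Gaussian kernel; it is a minimal illustration under these assumptions, not the authors' estimator:

```python
import numpy as np

def sphere_exp(p, v):
    n = np.linalg.norm(v)
    return p.copy() if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    w = q - c * p
    return theta * w / np.linalg.norm(w)

def kernel_regress(x, xs, ys, h=0.5, n_iter=25):
    """Nadaraya-Watson regression with sphere-valued responses ys: the
    estimate is the kernel-weighted Frechet mean of the responses,
    computed by fixed-point iteration on the sphere."""
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)   # Gaussian kernel weights
    w = w / w.sum()
    y = ys[np.argmax(w)]                     # start at the heaviest point
    for _ in range(n_iter):
        grad = sum(wi * sphere_log(y, yi) for wi, yi in zip(w, ys))
        y = sphere_exp(y, grad)
    return y

# Responses lying on a great circle, regressed against a scalar covariate:
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([[np.cos(a), np.sin(a), 0.0] for a in (0.0, 0.5, 1.0)])
est = kernel_regress(1.0, xs, ys)
```

Because no ordering of the responses is used, only kernel weights on the covariate, this mirrors the abstract's point that the method needs no imposed ordering on the manifold-valued data.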
Article
Full-text available
Hierarchical linear models (HLMs) are a standard approach for analyzing data where individuals are measured repeatedly over time. However, such models are only applicable to longitudinal studies of Euclidean data. This paper develops the theory of hierarchical geodesic models (HGMs), which generalize HLMs to the manifold setting. Our proposed model quantifies longitudinal trends in shapes as a hierarchy of geodesics in the group of diffeomorphisms. First, individual-level geodesics represent the trajectory of shape changes within individuals. Second, a group-level geodesic represents the average trajectory of shape changes for the population. Our proposed HGM is applicable to longitudinal data from unbalanced designs, i.e., varying numbers of timepoints for individuals, which is typical in medical studies. We derive the solution of HGMs on diffeomorphisms to estimate individual-level geodesics, the group geodesic, and the residual diffeomorphisms. We also propose an efficient parallel algorithm that easily scales to solve HGMs on a large collection of 3D images of several individuals. Finally, we present an effective model selection procedure based on cross validation. We demonstrate the effectiveness of HGMs for longitudinal analysis of synthetically generated shapes and 3D MRI brain scans.
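In the Euclidean case an HGM reduces to a two-stage hierarchical linear model: fit a line per individual, then summarize at the group level. A minimal simulation of that Euclidean analogue, on an assumed unbalanced design, looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an unbalanced longitudinal design: 3 to 7 visits per subject.
true_slope, true_icpt = 2.0, 1.0
subjects = []
for _ in range(30):
    t = np.sort(rng.uniform(0.0, 5.0, rng.integers(3, 8)))
    slope = true_slope + rng.normal(0.0, 0.3)   # individual random effects
    icpt = true_icpt + rng.normal(0.0, 0.3)
    y = slope * t + icpt + rng.normal(0.0, 0.1, t.size)
    subjects.append((t, y))

# Stage 1: individual-level fits (the "individual geodesics").
fits = np.array([np.polyfit(t, y, 1) for t, y in subjects])

# Stage 2: group-level average trajectory (the "group geodesic").
group_slope, group_icpt = fits.mean(axis=0)
```

On a manifold, stage 1 becomes geodesic regression per subject and stage 2 a mean-like summary of the individual geodesics in the group of diffeomorphisms, which is what motivates the parallel algorithm the abstract mentions.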
Article
Full-text available
In this paper we introduce a diffeomorphic constraint on the deformations considered in the deformable Bayesian mixed-effect template model. Our approach is built on a generic group of diffeomorphisms, which is parameterized by an arbitrary set of control point positions and momentum vectors. This enables us to estimate the optimal positions of control points together with a template image and parameters of the deformation distribution which compose the atlas. We propose to use a stochastic version of the expectation-maximization algorithm where the simulation is performed using the anisotropic Metropolis-adjusted Langevin algorithm. We also propose an extension of the model including a sparsity constraint to select an optimal number of control points with relevant positions. Experiments are carried out on the United States Postal Service database, on mandibles of mice, and on three-dimensional murine dendrite spine images.
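The sampler named here, the Metropolis-adjusted Langevin algorithm (MALA), proposes moves along the gradient of the log-density and corrects them with a Metropolis-Hastings step; the paper's version is anisotropic, i.e. preconditioned. A minimal isotropic 1D sketch targeting a standard normal:

```python
import numpy as np

def mala(log_pi, grad_log_pi, x0, eps, n_steps, rng):
    """Isotropic 1D MALA: Langevin-drift proposals plus a Metropolis-Hastings
    correction that makes pi the invariant distribution of the chain."""
    def log_q(to, frm):  # log proposal density q(to | frm), up to a constant
        mean = frm + 0.5 * eps ** 2 * grad_log_pi(frm)
        return -((to - mean) ** 2) / (2.0 * eps ** 2)
    x, chain = x0, []
    for _ in range(n_steps):
        prop = x + 0.5 * eps ** 2 * grad_log_pi(x) + eps * rng.normal()
        log_a = log_pi(prop) - log_pi(x) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.uniform()) < log_a:
            x = prop
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(0)
# Standard normal target: log pi(x) = -x^2 / 2, gradient -x.
chain = mala(lambda x: -0.5 * x ** 2, lambda x: -x, 0.0, 0.9, 20000, rng)
```

An anisotropic variant replaces the scalar eps**2 by a preconditioning matrix adapted to the local geometry of the posterior, which is the refinement this abstract refers to.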
Article
We present a Bayesian framework for atlas construction of multi-object shape complexes composed of both surface and curve meshes. It is general and can be applied to any parametric deformation framework and to all shape models with which it is possible to define probability density functions (PDF). Here, both curve and surface meshes are modelled as Gaussian random varifolds, using a finite-dimensional approximation space on which PDFs can be defined. Using this framework, we can automatically estimate the parameters balancing data terms and deformation regularity, which previously required user tuning. Moreover, it is also possible to estimate a well-conditioned covariance matrix of the deformation parameters. We also extend the proposed framework to datasets with multiple group labels. Groups share the same template and their deformation parameters are modelled with different distributions. We can statistically compare the groups' distributions since they are defined on the same space. We test our algorithm on 20 Gilles de la Tourette patients and 20 control subjects, using three sub-cortical regions and their incident white matter fiber bundles. We compare their morphological characteristics and variations using a single diffeomorphism in the ambient space. The proposed method will be integrated with the Deformetrica software package, publicly available at www.deformetrica.org.