Huiling Le’s research while affiliated with University of Nottingham and other places


Publications (67)


Figure 1: Three possible topologies for data extracted from metazoan data [Nye, 2011].
Figure 2: log R_n(x) for metazoan data when x is on leg 1. Red, blue and green lines are values of −c_1((1 − α)/2) at α = 0.1, 0.05, 0.001, respectively. The purple, brown and orange lines are α = 0.99998, 1, 1, respectively.
Figure 3: Confidence sets for the non-sticky Fréchet mean tree of the metazoan data depending on confidence level α. Clockwise: (i) when α ⩽ α_1; (ii) when α_1 < α ⩽ α_2; (iii) when α_2 < α ⩽ α_3; (iv) when α > α_3.
Empirical likelihood for Fréchet means on open books
Preprint · December 2024 · 13 Reads · Huiling Le · Xi Yan

Empirical Likelihood (EL) is a type of nonparametric likelihood that is useful in many statistical inference problems, including confidence region construction and k-sample problems. It enjoys some remarkable theoretical properties, notably Bartlett correctability. One area where EL has potential but is under-developed is in non-Euclidean statistics where the Fréchet mean is the population characteristic of interest. Only recently has a general EL method been proposed for smooth manifolds. In this work, we continue progress in this direction and develop an EL method for the Fréchet mean on a stratified metric space that is not a manifold: the open book, obtained by gluing copies of a Euclidean space along their common boundaries. The structure of an open book captures the essential behaviour of the Fréchet mean around certain singular regions of more general stratified spaces for complex data objects, and relates intimately to the local geometry of non-binary trees in the well-studied phylogenetic treespace. We derive a version of Wilks' theorem for the EL statistic, and elucidate the delicate interplay between the asymptotic distribution and topology of the neighbourhood around the population Fréchet mean. We then present a bootstrap calibration of the EL and prove that, under mild conditions, bootstrap-calibrated EL confidence regions have coverage error of size O(n^{-2}) rather than O(n^{-1}).


Relevant points on the minimal geodesic β_z from x_0 to z when z ∉ C_{x_0}
Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds

June 2024 · 18 Reads · 3 Citations · Probability Theory and Related Fields

We prove a central limit theorem (CLT) for the Fréchet mean of independent and identically distributed observations in a compact Riemannian manifold assuming that the population Fréchet mean is unique. Previous general CLT results in this setting have assumed that the cut locus of the Fréchet mean lies outside the support of the population distribution. In this paper we present a CLT under some mild technical conditions on the manifold plus the following assumption on the population distribution: in a neighbourhood of the cut locus of the population Fréchet mean, the population distribution is absolutely continuous with respect to the volume measure on the manifold and in this neighbourhood the Radon–Nikodym derivative has a version that is continuous. So far as we are aware, the CLT given here is the first which allows the cut locus to have co-dimension one or two when it is included in the support of the distribution. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus is one or greater than one: in the former case a non-standard term appears but not in the latter case. This is the first paper to give a general and explicit expression for the non-standard term which arises when the co-dimension of the cut locus is one.



Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds

October 2022 · 44 Reads

We prove a central limit theorem (CLT) for the Fréchet mean of independent and identically distributed observations in a compact Riemannian manifold assuming that the population Fréchet mean is unique. Previous general CLT results in this setting have assumed that the cut locus of the Fréchet mean lies outside the support of the population distribution. So far as we are aware, the CLT in the present paper is the first which allows the cut locus to have co-dimension one or two when it is included in the support of the distribution. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus is one or greater than one: in the former case a non-standard term appears but not in the latter case. This is the first paper to give a general and explicit expression for the non-standard term which arises when the co-dimension of the cut locus is one.


A diffusion approach to Stein's method on Riemannian manifolds

March 2020 · 80 Reads

We detail an approach to develop Stein's method for bounding integral metrics on probability measures defined on a Riemannian manifold M. Our approach exploits the relationship between the generator of a diffusion on M with target invariant measure and its characterising Stein operator. We consider a pair of such diffusions with different starting points, and investigate properties of the solution to the Stein equation based on analysis of the distance process between the pair. Several examples elucidating the role of the geometry of M in these developments are presented.



Principal nested shape space analysis of molecular dynamics data

March 2019 · 43 Reads

Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean sub-space based PCA (Jung, Dryden and Marron, 2012, Biometrika). Sub-spaces of successively lower dimension are fitted to the data in a backwards manner, with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D sub-shape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.


Rate-Invariant Analysis of Covariance Trajectories

October 2018 · 123 Reads · 28 Citations · Journal of Mathematical Imaging and Vision · [...]

Statistical analysis of dynamic systems, such as videos and dynamic functional connectivity, is often translated into a problem of analyzing trajectories of relevant features, particularly covariance matrices. As an example, in video-based action recognition, a natural mathematical representation of activity videos is as parameterized trajectories on the set of symmetric, positive-definite matrices (SPDMs). The variable execution-rates of actions, implying arbitrary parameterizations of trajectories, complicate their analysis and classification. To handle this challenge, we represent covariance trajectories using transported square-root vector fields (TSRVFs), constructed by parallel translating scaled-velocity vectors of trajectories to their starting points. The space of such representations forms a vector bundle on the SPDM manifold. Using a natural Riemannian metric on this vector bundle, we approximate geodesic paths and geodesic distances between trajectories in the quotient space of this vector bundle. This metric is invariant to the action of the reparameterization group, and leads to a rate-invariant analysis of trajectories. In the process, we remove the parameterization variability and temporally register trajectories during analysis. We demonstrate this framework in multiple contexts, using both generative statistical models and discriminative data analysis. The latter is illustrated using several applications involving video-based action recognition and dynamic functional connectivity analysis.


Bayesian Linear Size-and-Shape Regression with Applications to Face Data

June 2018 · 134 Reads · 3 Citations · Sankhya A

Regression models for size-and-shape analysis are developed, where the model is specified in the Euclidean space of the landmark coordinates. Statistical models in this space (which is known as the top space or ambient space) are often easier for practitioners to understand than alternative models in the quotient space of size-and-shapes. We consider a Bayesian linear size-and-shape regression model in which the response variable is given by a labelled configuration matrix, and the covariates represent quantities such as gender and age. It is important to parameterize the model so that it is identifiable, and we use the LQ decomposition in the intercept term in the model for this purpose. Gamma priors for the inverse variance of the error term, matrix Fisher priors for the random rotation matrix, and flat priors for the regression coefficients are used. Markov chain Monte Carlo algorithms are used for sampling from the posterior distribution, in particular by using combinations of Metropolis-Hastings updates and a Gibbs sampler. The proposed Bayesian methodology is illustrated with an application to forensic facial data in three dimensions, where we investigate the main changes in growth by describing relative movements of landmarks for each gender over time.


Smoothing Splines on Riemannian Manifolds, with Applications to 3D Shape Space

January 2018 · 111 Reads · 46 Citations · Journal of the Royal Statistical Society Series B (Statistical Methodology)

There has been increasing interest in statistical analysis of data lying in manifolds. This paper generalizes a smoothing spline fitting method to Riemannian manifold data based on the technique of unrolling and unwrapping originally proposed in Jupp and Kent (1987) for spherical data. In particular we develop such a fitting procedure for shapes of configurations in general m-dimensional Euclidean space, extending our previous work for two-dimensional shapes. We show that parallel transport along a geodesic on Kendall shape space is linked to the solution of a homogeneous first-order differential equation, some of whose coefficients are implicitly defined functions. This finding enables us to approximate the procedure of unrolling and unwrapping by simultaneously solving such equations numerically, and so to find numerical solutions for smoothing splines fitted to higher-dimensional shape data. This fitting method is applied to the analysis of simulated 3D shapes and to some dynamic 3D peptide data. Model selection procedures are also developed.


Citations (42)


... When M is a smooth finite-dimensional manifold, considerable effort has been made to understand how the geometry of M influences existence and uniqueness of the Fréchet mean [Karcher, 1977, Afsari, 2011], and to identify corresponding conditions that ensure consistency and a Central Limit Theorem (CLT) for its empirical version based on a random sample from µ [Bhattacharya and Patrangenaru, 2003]; the limiting distribution is a Gaussian with full support on the tangent space of the population Fréchet mean. Theoretical complications in the CLT theory can arise even in the case of a smooth compact manifold; see Eltzner et al. [2021] and Hotz et al. [2024]. ...

Reference:

Empirical likelihood for Fréchet means on open books
Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds

Probability Theory and Related Fields

... In this case we redefine x_1 to be γ(t) for any t ∈ (0, t*_z) which gives uniqueness of t_z when it exists, provided x_0 and x_1 are sufficiently close. See also Lemma 3 in the Supplementary Material of [18] for the description of sufficiently small neighbourhoods of non-conjugate parts of cut loci. ...

A diffusion approach to Stein’s method on Riemannian manifolds
  • Citing Article
  • May 2024

Bernoulli

... This notably means that the data embeddings at two different dimensions might drastically differ, which is a pitfall for data analysis. For more details about the importance of nestedness in statistics, one can refer to Huckemann et al. (2010); Jung et al. (2012); Damon and Marron (2014); Huckemann and Eltzner (2018); Pennec (2018); Lerman and Maunu (2018b); Dryden et al. (2019); Yang and Vemuri (2021); Fan et al. (2022). We illustrate in Figure 1 the nestedness issue on toy datasets related to three important machine learning problems: robust subspace recovery, linear discriminant analysis and sparse spectral clustering (Lu et al., 2016; Wang et al., 2017). ...

Principal nested shape space analysis of molecular dynamics data
  • Citing Article
  • December 2019

The Annals of Applied Statistics

... More recently, the Fisher-Rao Riemannian metric has been used to separate the phase and amplitude parts of 1D functional data on [0, 1] in [28]. Alignment of functional data on the manifold, i.e., g : [0, 1] → M where M is a nonlinear manifold, has been investigated in [29]-[31]. However, the ConCon function f is defined on a product manifold domain Ω × Ω, which is significantly different from the previous works. ...

Rate-Invariant Analysis of Covariance Trajectories

Journal of Mathematical Imaging and Vision

... Among intrinsic methods, partial intrinsic methods define and operate on intrinsic residuals via the Riemannian logarithmic map and use parallel transport (see Lee, 2003 for definitions), which allows estimation in the tangent space at the mean direction, a linear subspace of R^d (Jupp and Kent, 1987; Zhu et al., 2009; Shi et al., 2009; Yuan et al., 2012; Cornea et al., 2017; Lin and Yao, 2019; Kim et al., 2021). Another class of intrinsic regression models, Fréchet regression, measures the deviation of the response from the mean direction using geodesic distance, allowing estimation by minimizing this deviation. ...

Smoothing Splines on Riemannian Manifolds, with Applications to 3D Shape Space

Journal of the Royal Statistical Society Series B (Statistical Methodology)

... [14] investigated the dynamic continuous-time assets and liabilities management problem with delay in the mean-variance framework, and derived analytical expressions for the pre-commitment strategies of the mean-variance assets and liabilities management problem with delay. [15] used the conjugate duality approach to study a class of stochastic optimal control problems with delay of state systems described by stochastic differential equations and obtained expressions for the corresponding dual problem. [16] considered the optimal expected-variance reinsurance problem with delay under the dependent-risk model, obtaining analytical expressions for the optimal strategies. ...

Conjugate duality in stochastic controls with delay
  • Citing Article
  • July 2017

Advances in Applied Probability

... We focus on the open book for the following reasons. First, every stratified space that is singular along a stratum of codimension one is locally homeomorphic to the open book [Goresky et al., 1988]; developing an EL method for the open book that accommodates a sticky Fréchet mean on its codimension one strata will shed light on the challenges, and corresponding mitigation strategies, when moving onto general stratified spaces with strata of codimension greater than one [Barden and Le, 2018]. Second, from a methodological perspective, the open book relates intimately to the neighbourhood structure of certain non-binary trees in the space of phylogenetic trees [Billera et al., 2001], whose geometry has now been extensively studied, both from statistical [e.g., Nye, 2011, Willis, 2019] and computational perspectives [e.g., Miller et al., 2015, Owen, 2011], and used in various applications involving tree-structured data [e.g., Feragen et al., 2013]. ...

The Logarithm Map, its Limits and Frechet Means in Orthant Spaces
  • Citing Article
  • March 2017

Proceedings of The London Mathematical Society

... By "shape data", we mean objects whose predominantly interesting features are of geometric and topological nature; examples of which include functions, curves, surfaces or probability densities. Naturally, this prompted the emergence of new mathematical and algorithmic approaches for the analysis of such objects, which led to the development of the growing fields of geometric shape analysis and topological data analysis, see e.g. Younes (2010), Srivastava and Klassen (2016), Kendall et al. (1999), Edelsbrunner and Harer (2022), Carlsson (2014), Bronstein et al. (2008, 2021). ...

Wiley Series in Probability and Statistics
  • Citing Chapter
  • May 2008

... Unlike Euclidean spaces, operations like addition or averaging cannot be defined in Diff(M). Techniques such as Principal Geodesic Analysis (PGA) (Fletcher et al., 2004), Fréchet means (Le and Kume, 2000), and geodesic regression (Fletcher, 2011) attempt to address statistical estimation for Riemannian manifolds but require significant adaptation for Fréchet Lie groups. Therefore, alternative metrics are necessary to model relationships between deformation fields and ensure anatomical relevance. ...

The Fréchet mean shape and the shape of the means
  • Citing Article
  • March 2000

Advances in Applied Probability

... Fréchet [70] conducted some of the earliest work on generalizing the concept of mean and variance to distributions in manifolds rather than Euclidean spaces. These concepts were rediscovered for analysis of nonlinear variation in curves [56,54,53], and statistical shape analysis [63,64,11,32]. The Fréchet mean in the manifold coincides with the usual mean when the support of the distribution is Euclidean with the usual metric. ...

Estimating Fréchet means in Bookstein's shape space
  • Citing Article
  • September 2000

Advances in Applied Probability
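
Several excerpts above refer to the Fréchet mean, the point minimising the mean squared geodesic distance to the data, and one notes that it coincides with the usual mean in Euclidean space. The following is a minimal illustrative sketch, not taken from any of the listed papers: it computes a sample Fréchet mean on the unit sphere by a standard fixed-step gradient iteration using the sphere's log and exp maps. All function names (log_map, exp_map, frechet_mean_sphere, euclidean_mean) are hypothetical helpers introduced here, and the iteration is only expected to behave well when the data are concentrated away from each other's cut loci (antipodal points).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]


def log_map(p, q):
    """Tangent vector at p pointing towards q, with length d(p, q) on the unit sphere."""
    c = max(-1.0, min(1.0, dot(p, q)))  # clamp against rounding error
    theta = math.acos(c)                # geodesic distance
    w = [b - c * a for a, b in zip(p, q)]  # part of q orthogonal to p
    n = math.sqrt(dot(w, w))
    if theta < 1e-12 or n < 1e-12:
        # q equals p, or q is antipodal to p (log map undefined there): return zero
        return [0.0] * len(p)
    return [theta * x / n for x in w]


def exp_map(p, v):
    """Point reached from p by following the geodesic with initial velocity v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        return list(p)
    return normalize([math.cos(n) * a + math.sin(n) * b / n for a, b in zip(p, v)])


def frechet_mean_sphere(points, iters=100, tol=1e-12):
    """Iterate mu <- exp_mu(average of log_mu(x_i)) until the average is ~ 0."""
    mu = normalize(points[0])
    for _ in range(iters):
        logs = [log_map(mu, q) for q in points]
        g = [sum(col) / len(points) for col in zip(*logs)]
        if math.sqrt(dot(g, g)) < tol:
            break
        mu = exp_map(mu, g)
    return mu


def euclidean_mean(points):
    """In Euclidean space the Fréchet mean is the arithmetic mean."""
    return [sum(col) / len(points) for col in zip(*points)]


# Three points placed symmetrically around the north pole (0, 0, 1).
a = 0.3
points = [[math.sin(a) * math.cos(phi), math.sin(a) * math.sin(phi), math.cos(a)]
          for phi in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
mu = frechet_mean_sphere(points)
print(mu)                      # intrinsic mean, a point on the sphere
print(euclidean_mean(points))  # arithmetic mean in R^3, which lies inside the sphere
```

By symmetry the intrinsic mean of this toy sample should sit at the north pole, whereas the arithmetic mean of the same coordinates falls strictly inside the sphere, which is one way to see why a manifold-specific notion of mean is needed.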