Leo Duan

University of Florida | UF · Department of Statistics

PhD

37
Publications
4,220
Reads
708
Citations
Citations since 2017
28 Research Items
674 Citations
Additional affiliations
August 2018 - present
University of Florida
Position
  • Professor (Assistant)
March 2016 - August 2018
Duke University
Position
  • PostDoc Position
September 2011 - June 2015
University of Cincinnati
Position
  • PhD

Publications

Article
In network analysis, it is common to work with a collection of graphs that exhibit heterogeneity. For example, neuroimaging data from patient cohorts are increasingly available. A critical analytical task is to identify communities, and graph Laplacian-based methods are routinely used. However, these methods are currently limited to a single networ...
Preprint
Full-text available
Vector autoregressions have been widely used for modeling and analysis of multivariate time series data. In high-dimensional settings, model parameter regularization schemes inducing sparsity have achieved good forecasting performances. However, in many data applications such as those in neuroscience, the graph estimates from existing methods still...
Preprint
Full-text available
Spectral clustering algorithms are very popular. Starting from a pairwise similarity matrix, spectral clustering gives a partition of data that approximately minimizes the total similarity scores across clusters. Since there is no need to model how data are distributed within each cluster, such a method enjoys algorithmic simplicity and robustness...
Chapter
A better understanding of how the brain's structure and connections give rise to motivation and behavior will lead to better therapies for psychiatric and neurological conditions, including neurodegenerative conditions such as Alzheimer's disease (AD). Magnetic resonance imaging (MRI) allows us to evaluate how these metrics change in various brain...
Article
In Bayesian applications, there is a huge interest in rapid and accurate estimation of the posterior distribution, particularly for high dimensional or hierarchical models. In this article, we propose to use optimization to solve for a joint distribution (random transport plan) between two random variables, θ from the posterior distribution and β f...
Preprint
Full-text available
In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. Examples include fused lasso regression and matrix recovery under an unknown low rank. Despite the ease of obtaining a point estimate via optimization, it is much more challenging to quantify their uncertainty --- in...
Preprint
Full-text available
In multivariate data analysis, it is often important to estimate a graph characterizing dependence among p variables. A popular strategy uses the non-zero entries in a p × p covariance or precision matrix, typically requiring restrictive modeling assumptions for accurate graph recovery. To improve model robustness, we instead focus on estimating t...
Article
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance cl...
Preprint
Full-text available
In mixture modeling and clustering applications, the number of components is often not known. The stick-breaking model is an appealing construction that assumes infinitely many components, while shrinking most of the redundant weights to near zero. However, it has been discovered that such shrinkage is unsatisfactory: even when the component distri...
Preprint
Full-text available
Markov chain Monte Carlo is routinely used for posterior estimation in Bayesian models; however, it can suffer from computing inefficiency, especially in high dimensional or hierarchical models, due to the high correlation appearing in the Markov chain. While approximate solutions have become popular, there are concerns about accuracy. Inspired by...
Article
Full-text available
Background: Beginning at a young age, children with cystic fibrosis (CF) embark on demanding care regimens that pose challenges to parents. We examined the extent to which clinical, demographic and psychosocial features inform patterns of adherence to pulmonary therapies and how these patterns can be used to develop clinical personas, defined as a...
Preprint
Full-text available
Lasso and $l_1$-regularization play a dominating role in high dimensional statistics and machine learning. The most attractive property is that it produces a sparse parameter estimate containing exact zeros. For uncertainty quantification, popular Bayesian approaches choose a continuous prior that puts concentrated mass near zero; however, as a lim...
Preprint
Full-text available
Background: Beginning at a young age, children with cystic fibrosis (CF) embark on demanding care regimens that pose challenges to parents. We examined the extent to which clinical, demographic and psychosocial features inform patterns of adherence to pulmonary therapies and how these patterns can be used to develop clinical personas, defined as as...
Preprint
Full-text available
Background: Beginning at a young age, children with cystic fibrosis (CF) embark on demanding care regimens that pose challenges to parents. We examined the extent to which clinical, demographic and psychosocial features inform patterns of adherence to pulmonary therapies and how these patterns can be used to develop clinical personas, defined as as...
Preprint
Full-text available
Background: Beginning at a young age, children with cystic fibrosis (CF) embark on demanding care regimens that pose challenges to parents. We examined the extent to which clinical, demographic and psychosocial features inform patterns of adherence to pulmonary therapies and how these patterns can be used to develop clinical personas, defined as as...
Preprint
Multi-scale factor models are particularly appealing for analyzing matrix- or tensor-valued data, due to their adaptiveness to local geometry and intuitive interpretation. However, the reliance on the binary tree for recursive partitioning creates high complexity in the parameter space, making it extremely challenging to quantify its uncertaint...
Preprint
In network data analysis, it is common to encounter a large population of graphs: each has a small-to-moderate number of nodes, but together they show substantial variation from one graph to another. The graph Laplacian, a linear transform of the adjacency matrix, is routinely used in community detection algorithms; however, it is limited to si...
Preprint
In representation learning and non-linear dimension reduction, there is great interest in learning 'disentangled' latent variables, where each sub-coordinate almost uniquely controls a facet of the observed data. While many regularization approaches have been proposed for variational autoencoders, heuristic tuning is required to balance between di...
Preprint
High dimensional data often contain multiple facets, and several clustering patterns (views) can co-exist under different feature subspaces. While multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult: a particular challenge is the high complexity of estimating the cluster assignment probability unde...
Preprint
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance cl...
Article
There has been considerable interest in making Bayesian inference more scalable. In big data settings, most of the focus has been on reducing the computing time per iteration rather than reducing the number of iterations needed in Markov chain Monte Carlo (MCMC). This article considers data augmentation MCMC (DA-MCMC), a widely used technique. DA-M...
Preprint
Full-text available
Gaussian processes (GPs) are commonplace in spatial statistics. Although many non-stationary models have been developed, there is arguably a lack of flexibility compared to equipping each location with its own parameters. However, the latter suffers from intractable computation and can lead to overfitting. Taking the instantaneous stationarity idea...
Article
A two-level Gaussian process (GP) joint model is proposed to improve personalized prediction of medical monitoring data. The proposed model is applied to jointly analyze multiple longitudinal biomedical outcomes, including continuous measurements and binary outcomes, to achieve better prediction in disease progression. At the population level of th...
Article
Full-text available
Prior information often takes the form of parameter constraints. Bayesian methods include such information through prior distributions having constrained support. By using posterior sampling algorithms, one can quantify uncertainty without relying on asymptotic approximations. However, sharply constrained priors are (a) unrealistic in many settings...
Article
Modern environmental and climatological studies produce multiple outcomes at high spatial resolutions. Multivariate spatial modeling is an established means to quantify cross-correlation among outcomes. However, existing models typically suffer from poor computational efficiency and lack the flexibility to simultaneously estimate auto- and cross-co...
Article
Objective: More than 70% of hospitals in the United States have electronic health records (EHRs). Clinical decision support (CDS) presents clinicians with electronic alerts during the course of patient care; however, alert fatigue can influence a provider's response to any EHR alert. The primary goal was to evaluate the effects of alert burden on...
Article
Full-text available
Data augmentation is a common technique for building tuning-free Markov chain Monte Carlo algorithms. Although these algorithms are very popular, autocorrelations are often high in large samples, leading to poor computational efficiency. This phenomenon has been attributed to a discrepancy between Gibbs step sizes and the rate of posterior concentr...
Article
Full-text available
Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern...
Article
Full-text available
Objective: To identify phenotypes of type 1 diabetes control and associations with maternal/neonatal characteristics based on blood pressure (BP), glucose, and insulin curves during gestation, using a novel functional data analysis approach that accounts for sparse longitudinal patterns of medical monitoring during pregnancy. Methods: We performed a...
Article
Full-text available
Lower airway biomarkers of restored cystic fibrosis transmembrane conductance regulator (CFTR) function are limited. We hypothesized that fractional excretion of nitric oxide (FENO), typically low in CF patients, would demonstrate reproducibility during CFTR-independent therapies, and increase during CFTR-specific intervention (ivacaftor) in patien...
Article
Full-text available
Gaussian process regression is a commonly used nonparametric approach in spatial statistics and functional data analyses. The parameters in the covariance function provide a clear interpretation of the decay pattern of the correlation. However, its computational cost obstructs its use with extremely large data or more sophisticated modeling. I...
Article
Full-text available
A novel extrapolation method is proposed for longitudinal forecasting. A hierarchical Gaussian process model is used to combine nonlinear population change and individual memory of the past to make predictions. The prediction error is minimized through the hierarchical design. The method is further extended to joint modeling of continuous measuremen...
Article
Full-text available
We propose a novel "tree-averaging" model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble trees (BET) and model them as an infinite mixture Dirichlet process. We show that BET adapts to data heterogeneit...
Article
Detecting the onset of rapid lung function decline is important to reduce mortality rates in cystic fibrosis (CF) and other lung diseases. The most common approach is conventional linear mixed modeling, which estimates a population-level slope of lung function decline and uses random effects to address serial correlation, but this ignores nonlinear featu...
Article
Actin plays crucial roles in the life of the cell while being notorious for its inability to reach a functional conformation without the help of assistant proteins. In eukaryotes, for example, the cytosolic chaperonin containing TCP-1 (CCT) and prefoldin (PFD) are required for actin folding assistance and prevention of protein aggregation in the cr...
Article
Human synaptotagmin 1 (Syt1) plays a crucial role in the bending of the membrane during neurotransmitter release at the synapse. Hence, resolving the structural details of Syt1 that underlie its biological function is fundamental for providing mechanistic insights into the nature of the synaptic response. We explored the unfolding micromechanics of...

Projects

Projects (2)
Archived project
Development of Bayesian modelling framework for analysis of community data.