About
37
Publications
4,220
Reads
708
Citations
Additional affiliations
August 2018 - present
March 2016 - August 2018
September 2011 - June 2015
Publications (37)
In network analysis, it is common to work with a collection of graphs that exhibit heterogeneity. For example, neuroimaging data from patient cohorts are increasingly available. A critical analytical task is to identify communities, and graph Laplacian-based methods are routinely used. However, these methods are currently limited to a single networ...
Vector autoregressions have been widely used for modeling and analysis of multivariate time series data. In high-dimensional settings, model parameter regularization schemes inducing sparsity have achieved good forecasting performance. However, in many data applications such as those in neuroscience, the graph estimates from existing methods still...
Spectral clustering algorithms are very popular. Starting from a pairwise similarity matrix, spectral clustering gives a partition of data that approximately minimizes the total similarity scores across clusters. Since there is no need to model how data are distributed within each cluster, such a method enjoys algorithmic simplicity and robustness...
A better understanding of how the brain's structure and connections give rise to motivation and behavior will lead to better therapies for psychiatric and neurological conditions, including neurodegenerative conditions such as Alzheimer's disease (AD). Magnetic resonance imaging (MRI) allows us to evaluate how these metrics change in various brain...
In Bayesian applications, there is a huge interest in rapid and accurate estimation of the posterior distribution, particularly for high dimensional or hierarchical models. In this article, we propose to use optimization to solve for a joint distribution (random transport plan) between two random variables, θ from the posterior distribution and β f...
In statistical applications, it is common to encounter parameters supported on a space of varying or unknown dimension. Examples include fused lasso regression and matrix recovery under an unknown low rank. Despite the ease of obtaining a point estimate via optimization, it is much more challenging to quantify their uncertainty --- in...
In multivariate data analysis, it is often important to estimate a graph characterizing dependence among p variables. A popular strategy uses the non-zero entries in a p × p covariance or precision matrix, typically requiring restrictive modeling assumptions for accurate graph recovery. To improve model robustness, we instead focus on estimating t...
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on properties of pairwise differences between data points, we propose a class of Bayesian distance cl...
In mixture modeling and clustering applications, the number of components is often unknown. The stick-breaking model is an appealing construction that assumes infinitely many components, while shrinking most of the redundant weights to near zero. However, it has been discovered that such shrinkage is unsatisfactory: even when the component distri...
Markov chain Monte Carlo is routinely used for posterior estimation in Bayesian models; however, it can suffer from computing inefficiency, especially in high dimensional or hierarchical models, due to the high correlation appearing in the Markov chain. While approximate solutions have become popular, there are concerns about accuracy. Inspired by...
Background: Beginning at a young age, children with cystic fibrosis (CF) embark on demanding care regimens that pose challenges to parents. We examined the extent to which clinical, demographic and psychosocial features inform patterns of adherence to pulmonary therapies and how these patterns can be used to develop clinical personas, defined as a...
Lasso and $\ell_1$-regularization play a dominant role in high dimensional statistics and machine learning. Their most attractive property is that they produce a sparse parameter estimate containing exact zeros. For uncertainty quantification, popular Bayesian approaches choose a continuous prior that puts concentrated mass near zero; however, as a lim...
The multi-scale factor models are particularly appealing for analyzing matrix- or tensor-valued data, due to their adaptiveness to local geometry and intuitive interpretation. However, the reliance on the binary tree for recursive partitioning creates high complexity in the parameter space, making it extremely challenging to quantify its uncertaint...
In network data analysis, it is common to encounter a large population of graphs: each has a small-to-moderate number of nodes, but together they show substantial variation from one graph to another. The graph Laplacian, a linear transform of the adjacency matrix, is routinely used in community detection algorithms; however, it is limited to si...
In representation learning and non-linear dimension reduction, there is great interest in learning 'disentangled' latent variables, where each sub-coordinate almost uniquely controls a facet of the observed data. While many regularization approaches have been proposed for variational autoencoders, heuristic tuning is required to balance between di...
High dimensional data often contain multiple facets, and several clustering patterns (views) can co-exist under different feature subspaces. While multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult; a particular challenge is the high complexity of estimating the cluster assignment probability unde...
There has been considerable interest in making Bayesian inference more scalable. In big data settings, most of the focus has been on reducing the computing time per iteration rather than reducing the number of iterations needed in Markov chain Monte Carlo (MCMC). This article considers data augmentation MCMC (DA-MCMC), a widely used technique. DA-M...
Gaussian processes (GPs) are commonplace in spatial statistics. Although many non-stationary models have been developed, there is arguably a lack of flexibility compared to equipping each location with its own parameters. However, the latter suffers from intractable computation and can lead to overfitting. Taking the instantaneous stationarity idea...
A two-level Gaussian process (GP) joint model is proposed to improve personalized prediction of medical monitoring data. The proposed model is applied to jointly analyze multiple longitudinal biomedical outcomes, including continuous measurements and binary outcomes, to achieve better prediction in disease progression. At the population level of th...
Prior information often takes the form of parameter constraints. Bayesian methods include such information through prior distributions having constrained support. By using posterior sampling algorithms, one can quantify uncertainty without relying on asymptotic approximations. However, sharply constrained priors are (a) unrealistic in many settings...
Modern environmental and climatological studies produce multiple outcomes at high spatial resolutions. Multivariate spatial modeling is an established means to quantify cross-correlation among outcomes. However, existing models typically suffer from poor computational efficiency and lack the flexibility to simultaneously estimate auto- and cross-co...
Objective: More than 70% of hospitals in the United States have electronic health records (EHRs). Clinical decision support (CDS) presents clinicians with electronic alerts during the course of patient care; however, alert fatigue can influence a provider's response to any EHR alert. The primary goal was to evaluate the effects of alert burden on...
Data augmentation is a common technique for building tuning-free Markov chain Monte Carlo algorithms. Although these algorithms are very popular, autocorrelations are often high in large samples, leading to poor computational efficiency. This phenomenon has been attributed to a discrepancy between Gibbs step sizes and the rate of posterior concentr...
Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern...
Objective: To identify phenotypes of type 1 diabetes control and associations with maternal/neonatal characteristics based on blood pressure (BP), glucose, and insulin curves during gestation, using a novel functional data analysis approach that accounts for sparse longitudinal patterns of medical monitoring during pregnancy. Methods: We performed a...
Lower airway biomarkers of restored cystic fibrosis transmembrane conductance regulator (CFTR) function are limited. We hypothesized that fractional excretion of nitric oxide (FENO), typically low in CF patients, would demonstrate reproducibility during CFTR-independent therapies, and increase during CFTR-specific intervention (ivacaftor) in patien...
Gaussian process regression is a commonly used nonparametric approach in spatial statistics and functional data analyses. The parameters in the covariance function provide nice interpretation regarding the decaying pattern of the correlation. However, its computational cost obstructs its use in extremely large data or more sophisticated modeling. I...
A novel extrapolation method is proposed for longitudinal forecasting. A hierarchical Gaussian process model is used to combine nonlinear population change and individual memory of the past to make prediction. The prediction error is minimized through the hierarchical design. The method is further extended to joint modeling of continuous measuremen...
We propose a novel "tree-averaging" model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble trees (BET) and model them as an infinite mixture Dirichlet process. We show that BET adapts to data heterogeneit...
Detecting the onset of rapid lung function decline is important to reduce mortality rates in cystic fibrosis (CF) and other lung diseases. The most common approach is conventional linear mixed modeling (estimating a population-level slope of lung function decline and using random effects to address serial correlation), but this ignores nonlinear featu...
Actin plays crucial roles in the life of the cell while being notorious for its inability to reach a functional conformation without the help of assistant proteins. In eukaryotes, for example, the cytosolic chaperonin containing TCP-1 (CCT) and prefoldin (PFD) are required for actin folding assistance and prevention of protein aggregation in the cr...
Human synaptotagmin 1 (Syt1) plays a crucial role in the bending of the membrane during neurotransmitter release at the synapse. Hence, resolving the structural details of Syt1 that underlie its biological function is fundamental for providing mechanistic insights into the nature of the synaptic response. We explored the unfolding micromechanics of...
Projects
Projects (2)
Development of Bayesian modelling framework for analysis of community data.