## About

Publications: 26

Reads: 3,142 (a read is counted each time someone views a publication summary, such as the title, abstract, and list of authors, clicks on a figure, or views or downloads the full text)

Citations: 191

Additional affiliations

March 2018 - present

March 2017 - February 2018

Education

October 2010 - December 2016

## Publications

We consider the problem of semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple approach to this problem is to first use both the labeled and unlabeled data to estimate the manifold geodesic distance between pairs of points, and then apply a k nearest neighbor regressor based on these distance estimat...

The boundary crossing probability of a Poisson process with n jumps is a fundamental quantity with numerous applications. We present a fast O(n^2 log n) algorithm to calculate this probability for arbitrary upper and lower boundaries.

Continuous goodness-of-fit testing is a classical problem in statistics. Despite having low power for detecting deviations at the tail of a distribution, the most popular test is based on the Kolmogorov-Smirnov statistic. While similar variance-weighted statistics, such as Anderson-Darling and the Higher Criticism statistic, give more weight to tail...
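For reference, the Kolmogorov-Smirnov statistic mentioned in this abstract is the largest absolute gap between the empirical CDF of the sample and the hypothesized CDF. The following is a minimal sketch, assuming a sample without ties; the names `ks_statistic` and `uniform_cdf` are illustrative and do not come from the paper.

```python
def ks_statistic(samples, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution (hypothetical null model)."""
    return min(1.0, max(0.0, x))


d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(round(d, 6))
```

Every point of the sample contributes an unweighted gap here, whereas the variance-weighted statistics named in the abstract rescale each gap before taking the maximum.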

Cross‐validation is the de facto standard for predictive model evaluation and selection. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo various forms of data‐dependent preprocessing, such as mean‐centring, rescaling, dimensionality reduction and outlier removal. It is often beli...
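To make the setting concrete, here is a minimal sketch of k-fold cross-validation in which a data-dependent preprocessing step (mean-centring and rescaling) is refitted on each training fold, so the held-out fold never influences the transformation. All function names, the one-dimensional feature, and the least-squares line model are assumptions made for illustration; this is not the procedure analyzed in the paper.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn centring/rescaling parameters from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant feature
    return mu, sigma


def apply_scaler(values, params):
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def kfold_indices(n, k):
    """Yield k (train, test) index lists using contiguous test folds."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


def fit_line(x, y):
    """Least-squares line through (x, y); returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cv_with_fold_local_scaling(x, y, k, fit_model, predict):
    """Mean squared error from k-fold CV, refitting the scaler per training fold."""
    squared_errors = []
    for train, test in kfold_indices(len(x), k):
        params = fit_scaler([x[i] for i in train])  # test fold is not used here
        x_train = apply_scaler([x[i] for i in train], params)
        x_test = apply_scaler([x[i] for i in test], params)
        model = fit_model(x_train, [y[i] for i in train])
        for xi, i in zip(x_test, test):
            squared_errors.append((predict(model, xi) - y[i]) ** 2)
    return mean(squared_errors)


x = [float(i) for i in range(10)]
y = [2.0 * v + 1.0 for v in x]
mse = cv_with_fold_local_scaling(x, y, k=5, fit_model=fit_line, predict=predict_line)
```

The alternative pipeline calls `fit_scaler` once on the full data set before splitting; the only structural difference is whether the held-out samples contribute to `params`.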

Background and Objective: One of the strengths of single-particle cryo-EM compared to other structural determination techniques is its ability to image heterogeneous samples containing multiple molecular species, different oligomeric states or distinct conformations. This is achieved using routines for in-silico 3D classification that are now well...

Manifold learning methods play a prominent role in nonlinear dimensionality reduction and other tasks involving high-dimensional data sets with low intrinsic dimensionality. Many of these methods are graph-based: they associate a vertex with each data point and a weighted edge with each pair. Existing theory shows that the Laplacian matrix of the g...

Manifold learning methods play a prominent role in nonlinear dimensionality reduction and other tasks involving high-dimensional data sets with low intrinsic dimensionality. Many of these methods are graph-based: they associate a vertex with each data point and a weighted edge between each pair of close points. Existing theory shows, under certain...

Motivated by the 2D class averaging problem in single-particle cryo-electron microscopy (cryo-EM), we present a k-means algorithm based on a rotationally-invariant Wasserstein metric for images. Unlike existing methods that are based on Euclidean ($L_2$) distances, we prove that the Wasserstein metric better accommodates the out-of-plane angula...

We consider problems of dimensionality reduction and learning data representations for continuous spaces with two or more independent degrees of freedom. Such problems occur, for example, when observing shapes with several components that move independently. Mathematically, if the parameter space of each continuous independent motion is a manifold,...

We present a method for computing exact p-values for a large family of one-sided continuous goodness-of-fit statistics. This includes the higher criticism statistic, one-sided weighted Kolmogorov-Smirnov statistics, and the one-sided Berk-Jones statistics. For a sample size of 10,000, our method takes merely 0.15 seconds to run and it scales to sam...

Unsupervised Determination of the Number of Conformations in Single-particle cryo-EM - Ye Zhou, Amit Moscovich, Priyamvada Acharya, Alberto Bartesaghi

In this paper, we propose a novel approach for manifold learning that combines the Earthmover's distance (EMD) with the diffusion maps method for dimensionality reduction. We demonstrate the potential benefits of this approach for learning shape spaces of proteins and other flexible macromolecules using a simulated dataset of 3-D density maps that...

Single-particle cryo-electron microscopy (EM) has become a popular technique for determining the structure of challenging biomolecules that are inaccessible to other technologies. Recent advances in automation, both in data collection and data processing, have significantly lowered the barrier for non-expert users to successfully execute the struct...

Single-particle cryo-Electron Microscopy (EM) has become a popular technique for determining the structure of challenging biomolecules that are inaccessible to other technologies. Recent advances in automation, both in data collection and data processing, have significantly lowered the barrier for non-expert users to successfully execute the struct...

Single-particle electron cryomicroscopy is an essential tool for high-resolution 3D reconstruction of proteins and other biological macromolecules. An important challenge in cryo-EM is the reconstruction of non-rigid molecules with parts that move and deform. Traditional reconstruction methods fail in these cases, resulting in smeared reconstructio...

Cross-validation of predictive models is the de facto standard for model selection and evaluation. In proper use, it provides an unbiased estimate of a model's predictive performance. However, data sets often undergo a preliminary data-dependent transformation, such as feature rescaling or dimensionality reduction, prior to cross-validation. It is...

We propose a new semiparametric approach to binary classification that exploits the modeling flexibility of sparse graphical models. Specifically, we assume that each class can be represented by a forest-structured graphical model. Under this assumption, the optimal classifier is linear in the log of the one- and two-dimensional marginal densities....

We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple two-step approach to this problem is to: (i) estimate the manifold geodesic distance between any pair of points using both the labeled and unlabeled instances; and (ii) apply a k nearest neighbor regressor based on these distance estimate...
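The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: geodesic distances are approximated by shortest paths in a symmetric nearest-neighbor graph built over all points, and the names `knn_graph`, `shortest_paths`, and `geodesic_knn_predict`, as well as the toy data, are hypothetical.

```python
import heapq
import math


def knn_graph(points, n_neighbors):
    """Symmetric nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:n_neighbors]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: graph distance from source to every vertex."""
    dist = [math.inf] * len(graph)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def geodesic_knn_predict(points, labels, query, k, n_neighbors):
    """Average the responses of the k labeled points closest to `query`
    in graph distance. `labels` maps point index -> response and
    `query` is an index into `points`."""
    graph = knn_graph(points, n_neighbors)  # step (i): uses all points
    dist = shortest_paths(graph, query)
    nearest = sorted(labels, key=lambda i: dist[i])[:k]  # step (ii)
    return sum(labels[i] for i in nearest) / len(nearest)


# Toy horseshoe: two parallel arms joined at one end by point 4.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3.5, 1.5),
          (3, 3), (2, 3), (1, 3), (0, 3)]
labels = {4: 1.0, 8: 10.0}  # only two points are labeled
prediction = geodesic_knn_predict(points, labels, query=0, k=1, n_neighbors=2)
```

In this toy data set, point 8 is closer to the query in Euclidean distance, but point 4 is closer along the curve, so the graph-based regressor uses the label of point 4. The unlabeled points matter only through the graph they help to build.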

We present a fast $O(n^2 \log n)$ algorithm for calculating the probability
that a one-dimensional Poisson process will stay within arbitrary boundaries
that are bounded by $n$. This algorithm is faster than previous $O(n^3)$
methods, and can be used to compute $p$-values for continuous goodness-of-fit
statistics.

Consider the empirical CDF of n samples generated from a known continuous distribution. This poster describes several methods of computing the boundary crossing probability of the empirical CDF given arbitrary boundary functions, including our new O(n^2 log n) algorithm. For more details, see “Fast calculation of boundary crossing probabilities for...

Continuous goodness-of-fit (GOF) is a classical hypothesis testing problem in
statistics. Despite numerous suggested methods, the Kolmogorov-Smirnov (KS)
test is, by far, the most popular GOF test used in practice. Unfortunately, it
lacks power at the tails, which is important in many practical applications.
In this paper we make two main contribut...