## About

40

Publications

8,523

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

834

Citations

Citations since 2017

Introduction

Additional affiliations

October 2015 - September 2016

January 2012 - June 2015

## Publications

Publications (40)

Improving the efficiency of algorithms for fundamental computations can have a widespread impact, as it can affect the overall speed of a large amount of computations. Matrix multiplication is one such primitive task, occurring in many systems—from neural networks to scientific computing routines. The automatic discovery of algorithms using machine...

This paper proposes a method for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category, but selects from multiple categories in parallel. The consumer’s utility is additive in the different categories. Her preferences about product attributes as well as her price sensitivity vary across...

A graph generative model defines a distribution over graphs. One type of generative model is constructed by autoregressive neural networks, which sequentially add nodes and edges to generate a graph. However, the likelihood of a graph under the autoregressive model is intractable, as there are numerous sequences leading to the given graph; this mak...

We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, whic...

The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood. Fitting its parameters via maximum likelihood is challenging since the computation of the likelihood involves an intractable integral over the latent space; thus the V...

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the embedded topic model (etm), a generative model of documents that marries traditional topic models with word embeddings. More spe...

Generative adversarial networks (GANs) are a powerful approach to unsupervised learning. They have achieved state-of-the-art performance in the image domain. However, GANs are limited in two ways. They often learn distributions with low support---a phenomenon known as mode collapse---and they do not guarantee the existence of a probability density,...

Topic modeling analyzes documents to learn meaningful patterns of words. Dynamic topic models capture how these patterns vary over time for a set of documents that were collected over a large time span. We develop the dynamic embedded topic model (D-ETM), a generative model of documents that combines dynamic latent Dirichlet allocation (D-LDA) and...

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In parti...

This paper proposes a method for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category, but selects from multiple categories in parallel. The consumer's utility is additive in the different categories. Her preferences about product attributes as well as her price sensitivity vary across...

We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replac...

Expanded View Figures PDF

Common approaches to gene signature discovery in single‐cell RNA‐sequencing (scRNA‐seq) depend upon predefined structures like clusters or pseudo‐temporal order, require prior normalization, or do not account for the sparsity of single‐cell data. We present single‐cell hierarchical Poisson factorization (scHPF), a Bayesian factorization method that...

We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner using a simple reparameterizable distribution whose variational parameters are defined by arbi...

Common approaches to gene signature discovery in single cell RNA-sequencing (scRNA-seq) depend upon predefined structures like clusters or pseudo-temporal order, require prior normalization, or do not account for the sparsity of single cell data. We present single cell Hierarchical Poisson Factorization (scHPF), a Bayesian factorization method that...

Categorical distributions are ubiquitous in machine learning, e.g., in classification, language models, and recommendation systems. They are also at the core of discrete choice models. However, when the number of possible outcomes is very large, using categorical distributions becomes computationally expensive, as the complexity scales linearly wit...

This paper analyzes consumer choices over lunchtime restaurants using data from a sample of several thousand anonymous mobile phone users in the San Francisco Bay Area. The data is used to identify users' approximate typical morning location, as well as their choices of lunchtime restaurants. We build a model where restaurants have latent character...

New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system...

We develop SHOPPER, a sequential probabilistic model of market baskets. SHOPPER uses interpretable components to model the forces that drive how a customer chooses products; in particular, we designed SHOPPER to capture how items interact with other items. We develop an efficient posterior inference algorithm to estimate these forces from large-sca...

Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (S-EFE), a method for discovering embeddings that vary across related groups of data. We study how the word usage of U.S. Congressional speeches varies a...

This paper addresses the mapping problem. Using a conjugate prior form, we derive the exact theoretical batch multiobject posterior density of the map given a set of measurements. The landmarks in the map are modeled as extended objects, and the measurements are described as a Poisson process, conditioned on the map. We use a Poisson process prior...

The goal of causal inference is to understand the outcome of alternative courses of action. However, all causal inference requires assumptions. Such assumptions can be more influential than in typical tasks for probabilistic modeling, and testing those assumptions is important to assess the validity of causal inference. We develop model criticism f...

Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a (differentiable) deterministic funct...

The reparameterization gradient has become a widely used method to obtain Monte Carlo gradients to optimize the variational objective. However, this technique does not easily apply to commonly used distributions such as beta or gamma without further approximations, and most practical applications of the reparameterization gradient fit Gaussian dist...

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we studied neural data with real-valued observations, count data from a mark...

We introduce overdispersed black-box variational inference, a method to reduce the variance of the Monte Carlo estimator of the gradient in black-box variational inference. Instead of taking samples from the variational distribution, we use importance sampling to take samples from an overdispersed distribution in the same exponential family as the...

This paper presents a novel application of Bayesian nonparametrics (BNP) for marathon data modeling. We make use of two well-known BNP priors, the single-p dependent Dirichlet process and the hierarchical Dirichlet process, in order to address two different problems. First, we study the impact of age, gender and environment on the runners' performa...

Data used in this study as a.mat file.
Further descriptions are found in the README files inside.
(MAT)

We aim at finding the comorbidity patterns of substance abuse, mood and personality disorders using the diagnoses from the National Epidemiologic Survey on Alcohol and Related Conditions database. To this end, we propose a novel Bayesian nonparametric latent feature model for categorical observations, based on the Indian buffet process, in which th...

There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. H...

In many modern multiuser communication systems, users are allowed to enter and leave the system at any given time. Thus, the number of active users is an unknown and time-varying parameter, and the performance of the system depends on how accurately this parameter is estimated over time. We address the problem of blind joint channel parameter and d...

We show that a classical model for soccer can also provide competitive results in predicting basketball outcomes. We modify the classical model in two ways in order to capture both the specific behavior of each National collegiate athletic association (NCAA) conference and different strategies of teams and conferences. Through simulated bets on six...

Bayesian nonparametric models allow solving estimation and detection problems with an unbounded number of degrees of freedom. In multiuser multiple-input multiple-output (MIMO) communication systems we might not know the number of active users and the channel they face, and assuming maximal scenarios (maximum number of transmitters and maximum chan...

The analysis of comorbidity is an open and complex research field in the
branch of psychiatry, where clinical experience and several studies suggest
that the relation among the psychiatric disorders may have etiological and
treatment implications. In this paper, we are interested in applying latent
feature modeling to find the latent structure behi...

The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model...

In this paper, we propose nontrivial codes that achieve a non-zero zero-error rate for several odd-letter noisy-typewriter channels. Some of these codes (specifically, those which are defined for a number of letters of the channel of the form 2n + 1) achieve the best-known lower bound on the zero-error capacity. We build the codes using linear code...

## Projects

Project (1)

We apply exponential family embeddings (Rudolph et al., 2016) on shopping data in order to uncover users' shopping patterns and reveal pairs of items that are substitutes and complements of each other.