Publications (335)
330.16 Total impact
ABSTRACT: Single Index Models (SIMs) are simple yet flexible semiparametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions has not been forthcoming. We propose three variants of a computationally and statistically efficient algorithm for SIM inference in high dimensions. We establish excess risk bounds for the proposed algorithms and experimentally validate the advantages that our SIM learning methods provide relative to Generalized Linear Model (GLM) and low dimensional SIM based learning methods.
ABSTRACT: This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm to the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems.
ABSTRACT: Low-rank matrix completion (LRMC) problems arise in a wide variety of applications. Previous theory mainly provides conditions for completion under missing-at-random samplings. An incomplete $d \times N$ matrix is $\textit{finitely completable}$ if there are at most finitely many rank-$r$ matrices that agree with all its observed entries. Finite completability is the tipping point in LRMC, as a few additional samples of a finitely completable matrix guarantee its $\textit{unique}$ completability. The main contribution of this paper is a full characterization of finitely completable observation sets. We use this characterization to derive sufficient deterministic sampling conditions for unique completability. We also show that under uniform random sampling schemes, these conditions are satisfied with high probability if at least $\mathscr{O}(\max\{r,\log d \})$ entries per column are observed.
ABSTRACT: This paper considers the problem of recovering an unknown sparse p×p matrix X from an m×m matrix Y = AXB^T, where A and B are known m×p matrices with m≪p. The main result shows that there exist constructions of the sketching matrices A and B so that even if X has O(p) nonzeros, it can be recovered exactly and efficiently using a convex program as long as these nonzeros are not concentrated in any single row/column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(√(# nonzeros in X) × log p). The results also show that the recovery is robust and stable in the sense that if X is equal to a sparse matrix plus a perturbation, then the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure). Therefore, proving recovery guarantees requires nonstandard techniques. Indeed, our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. This problem is motivated by the following application, among others. Consider a p×n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X := DD^T is (approximately) sparse in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data S = AD with A ∈ R^{m×p}.
Article: Sparse Dueling Bandits
ABSTRACT: The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms. This paper focuses on a new approach for finding the "best" arm according to the Borda criterion using noisy comparisons. We prove that in the absence of structural assumptions, the sample complexity of this problem is proportional to the sum of the inverse squared gaps between the Borda scores of each suboptimal arm and the best arm. We explore this dependence further and consider structural constraints on the pairwise comparison matrix (a particular form of sparsity natural to this problem) that can significantly reduce the sample complexity. This motivates a new algorithm called Successive Elimination with Comparison Sparsity (SECS) that exploits sparsity to find the Borda winner using fewer samples than standard algorithms. We also evaluate the new algorithm experimentally with synthetic and real data. The results show that the sparsity model and the new algorithm can provide significant improvements over standard approaches.
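As a rough illustration of the Borda criterion mentioned in the abstract above, the sketch below computes Borda scores from a pairwise preference matrix, picks the Borda winner, and evaluates the sum of inverse squared gaps that the abstract ties to sample complexity. The matrix `P` (where `P[i][j]` is taken to be the probability that arm `i` beats arm `j`), the function names, and the example values are assumptions made for illustration. The code uses known probabilities rather than noisy comparisons, so it is not the SECS algorithm.

```python
def borda_scores(P):
    """Borda score of arm i: average probability that i beats each other arm."""
    n = len(P)
    return [sum(P[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def borda_winner(P):
    """Index of the arm with the largest Borda score."""
    scores = borda_scores(P)
    return max(range(len(scores)), key=lambda i: scores[i])


def inverse_squared_gap_sum(P):
    """Sum over suboptimal arms of 1 / (best score - arm score)^2.

    Assumes a unique Borda winner, so every gap is strictly positive.
    """
    scores = borda_scores(P)
    best = borda_winner(P)
    return sum(
        1.0 / (scores[best] - s) ** 2
        for i, s in enumerate(scores)
        if i != best
    )


# Hypothetical 3-arm preference matrix (illustrative values only).
P = [
    [0.5, 0.6, 0.8],
    [0.4, 0.5, 0.7],
    [0.2, 0.3, 0.5],
]

print(borda_scores(P))
print(borda_winner(P))
print(inverse_squared_gap_sum(P))
```

Arms whose Borda scores are close to the best one dominate the sum, which is why small gaps make the problem harder.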
ABSTRACT: Classification with a sparsity constraint on the solution plays a central role in many high dimensional signal processing applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or discarded. In many applications, however, this can be too restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: We assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. The Sparse Group Lasso (SGL) was proposed to solve problems of this form. The main contributions of this paper are a new procedure called Sparse Overlapping Group (SOG) lasso, an extension of the SGL to overlapping groups, and theoretical sample complexity bounds for it. We establish model selection error bounds that specialize to many other cases. We experimentally validate our proposed method on both real and toy datasets.
ABSTRACT: Consider a generic $r$-dimensional subspace of $\mathbb{R}^d$, $r<d$, and suppose that we are only given projections of this subspace onto small subsets of the canonical coordinates. The paper establishes necessary and sufficient deterministic conditions on the subsets for subspace identifiability.
Article: Sparse Estimation with Strongly Correlated Variables using Ordered Weighted L1 Regularization
ABSTRACT: This paper studies ordered weighted L1 (OWL) norm regularization for sparse estimation problems with strongly correlated variables. We prove sufficient conditions for clustering based on the correlation/collinearity of variables using the OWL norm, of which the so-called OSCAR is a particular case. Our results extend previous ones for OSCAR in several ways: for the squared error loss, our conditions hold for the more general OWL norm and under weaker assumptions; we also establish clustering conditions for the absolute error loss, which is, as far as we know, a novel result. Furthermore, we characterize the statistical performance of OWL norm regularization for generative models in which certain clusters of regression variables are strongly (even perfectly) correlated, but variables in different clusters are uncorrelated. We show that if the true p-dimensional signal generating the data involves only s of the clusters, then O(s log p) samples suffice to accurately estimate the signal, regardless of the number of coefficients within the clusters. The estimation of s-sparse signals with completely independent variables requires just as many measurements. In other words, using the OWL we pay no price (in terms of the number of measurements) for the presence of strongly correlated variables.
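For readers unfamiliar with the OWL norm named in the abstract above, here is a minimal sketch of how it is usually defined: the magnitudes of the coefficients are sorted in decreasing order and paired with a non-increasing, non-negative weight vector. The function names are hypothetical, and the OSCAR weights use the common parametrization w_i = λ1 + λ2 (p − i), which is an assumption here since the abstract does not spell it out.

```python
def owl_norm(x, w):
    """Ordered weighted L1 norm: sum_i w[i] * (i-th largest |x_j|).

    Assumes w is non-negative and non-increasing, and len(w) == len(x).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(w[i] < w[i + 1] for i in range(len(w) - 1)) or any(v < 0 for v in w):
        raise ValueError("w must be non-negative and non-increasing")
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * m for wi, m in zip(w, magnitudes))


def oscar_weights(p, lam1, lam2):
    """OSCAR as a special case of OWL (assumed form): w_i = lam1 + lam2 * (p - i)."""
    return [lam1 + lam2 * (p - i) for i in range(1, p + 1)]


x = [1.0, -3.0, 2.0]
print(owl_norm(x, [3.0, 2.0, 1.0]))             # largest magnitude gets the largest weight
print(owl_norm(x, [1.0, 1.0, 1.0]))             # equal weights reduce to the plain L1 norm
print(owl_norm(x, oscar_weights(3, 1.0, 0.5)))  # OSCAR-style weights
```

With all weights equal the penalty is the ordinary L1 norm; it is the decreasing weights that encourage correlated coefficients to take equal magnitudes.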
Article: To lie or not to lie in a subspace
ABSTRACT: We give deterministic necessary and sufficient conditions to guarantee that if a subspace fits certain partially observed data from a union of subspaces, it is because such data really lies in a subspace. Furthermore, we give deterministic necessary and sufficient conditions to guarantee that if a subspace fits certain partially observed data, such subspace is unique. We do this by characterizing when and only when a set of incomplete vectors behaves as a single but complete one.
ABSTRACT: We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known, however, that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel algorithm that provably improves over all previous methods in a regime of interest.
Conference Paper: On the sample complexity of subspace clustering with missing data
ABSTRACT: Subspace clustering is a useful tool for analyzing large complex data, but in many relevant applications missing data are common. Existing theoretical analysis of this problem shows that subspace clustering from incomplete data is possible, but that analysis requires the number of samples (i.e., partially observed vectors) to be superpolynomial in the dimension d. Such huge sample sizes are unnecessary when no data are missing and uncommon in applications. There are two main contributions in this paper. First, it is shown that if subspaces have rank at most r and the number of partially observed vectors is greater than d^{r+1} (times a polylogarithmic factor), then with high probability the true subspaces are the only subspaces that agree with the observed data. We may conclude that subspace clustering may be possible without impractically large sample sizes and that we can certify the output of any subspace clustering algorithm by checking its fit to the observed data. The second main contribution is a novel EM-type algorithm for subspace clustering with missing data. We demonstrate and compare it to several other algorithms. Experiments with simulated and real data show that such algorithms work well in practice.
ABSTRACT: We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.
ABSTRACT: This paper studies graphical model selection, i.e., the problem of estimating a graph of statistical relationships among a collection of random variables. Conventional graphical model selection algorithms are passive, i.e., they require all the measurements to have been collected before processing begins. We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. We prove that, under certain conditions, our active learning algorithm requires fewer scalar measurements than any passive algorithm to reliably estimate a graph. A range of numerical results validates our theory and demonstrates the benefits of active learning.
Conference Paper: Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting
ABSTRACT: This paper is concerned with identifying the arm with the highest mean in a multi-armed bandit problem using as few independent samples from the arms as possible. While the so-called “best arm problem” dates back to the 1950s, only recently were two qualitatively different algorithms proposed that achieve the optimal sample complexity for the problem. This paper reviews these recent advances and shows that most best-arm algorithms can be described as variants of the two recent optimal algorithms. For each algorithm type we consider a specific instance to analyze both theoretically and empirically, thereby exposing the core components of the theoretical analysis of these algorithms and intuition about how the algorithms work in practice. The derived sample complexity bounds are novel, and in certain cases improve upon previous bounds. In addition, we compare a variety of state-of-the-art algorithms empirically through simulations for the best-arm problem.
ABSTRACT: Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high dimensional machine learning applications. In some cases, the features can be grouped together, so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. This is sometimes referred to as a "sparse group" lasso procedure, and it allows for more flexibility than traditional group lasso methods. Our framework generalizes conventional sparse group lasso further by allowing for overlapping groups, an additional flexibility that presents further challenges. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for learning in high dimensions. We establish consistency results for the SOSlasso for classification problems using the logistic regression setting, which specializes to results for the lasso and the group lasso, some known and some new. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in Magnetoencephalography (MEG), and analyzing gene activation patterns in microarray data analysis. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.
ABSTRACT: Second-harmonic generation (SHG) imaging can help reveal interactions between collagen fibers and cancer cells. Quantitative analysis of SHG images of collagen fibers is challenged by the heterogeneity of collagen structures and low signal-to-noise ratio often found while imaging collagen in tissue. The role of collagen in breast cancer progression can be assessed post acquisition via enhanced computation. To facilitate this, we have implemented and evaluated four algorithms for extracting fiber information, such as number, length, and curvature, from a variety of SHG images of collagen in breast tissue. The image-processing algorithms included a Gaussian filter, SPIRAL-TV filter, Tubeness filter, and curvelet-denoising filter. Fibers are then extracted using an automated tracking algorithm called fiber extraction (FIRE). We evaluated the algorithm performance by comparing length, angle and position of the automatically extracted fibers with those of manually extracted fibers in twenty-five SHG images of breast cancer. We found that the curvelet-denoising filter followed by FIRE, a process we call CT-FIRE, outperforms the other algorithms under investigation. CT-FIRE was then successfully applied to track collagen fiber shape changes over time in an in vivo mouse model for breast cancer.
ABSTRACT: The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
ABSTRACT: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.
ABSTRACT: In many applications in signal and image processing, communications, and system identification, one aims to recover a signal that has a simple representation in a given basis or frame. Key devices for obtaining such representations are objects called atoms, and functions called atomic norms. These concepts unify the idea of simple representations across several known applications, and motivate extensions to new problem classes of interest. In important special cases, fast and efficient algorithms are available to solve the reconstruction problems, but an approach that works well for the general atomic-norm paradigm has not been forthcoming to date. In this paper, we combine a greedy selection scheme with a backward step that sparsifies the basis by removing less significant elements that were included at earlier iterations. We show that the overall scheme achieves the same convergence rate as the forward greedy scheme alone, provided that backward steps are taken only when they do not degrade the solution quality too badly. Finally, we validate our method by describing applications to three problems of interest.
ABSTRACT: This paper studies the sample complexity of searching over multiple populations. We consider a large number of populations, each corresponding to either distribution P0 or P1. The goal of the search problem studied here is to find one population corresponding to distribution P1 with as few samples as possible. The main contribution is to quantify the number of samples needed to correctly find one such population. We consider two general approaches: nonadaptive sampling methods, which sample each population a predetermined number of times until a population following P1 is found, and adaptive sampling methods, which employ sequential sampling schemes for each population. We first derive a lower bound on the number of samples required by any sampling scheme. We then consider an adaptive procedure consisting of a series of sequential probability ratio tests, and show it comes within a constant factor of the lower bound. We give explicit expressions for this constant when samples of the populations follow Gaussian and Bernoulli distributions. An alternative adaptive scheme is discussed which does not require full knowledge of P1, and comes within a constant factor of the optimal scheme. For comparison, a lower bound on the sampling requirements of any nonadaptive scheme is presented.
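The adaptive procedure in the abstract above is described as a series of sequential probability ratio tests. The sketch below shows one way such a search could look, assuming P0 = N(0, 1) and P1 = N(mu1, 1) with known `mu1` and Wald's classical approximate thresholds. The function names, the thresholds, and the default error levels are illustrative assumptions and are not taken from the paper.

```python
import math


def sprt_gaussian(samples, mu1, alpha=0.05, beta=0.05):
    """Wald SPRT for H0: N(0, 1) versus H1: N(mu1, 1).

    Returns (decision, number of samples used), where decision is
    "P1", "P0", or "undecided" if the samples run out first.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    used = 0
    for x in samples:
        used += 1
        # Log-likelihood ratio increment for unit-variance Gaussians.
        llr += mu1 * x - 0.5 * mu1 ** 2
        if llr >= upper:
            return "P1", used
        if llr <= lower:
            return "P0", used
    return "undecided", used


def search_populations(populations, mu1, alpha=0.05, beta=0.05):
    """Test populations one at a time; stop at the first one declared P1.

    Returns (index of that population or None, total samples drawn).
    """
    total = 0
    for idx, samples in enumerate(populations):
        decision, used = sprt_gaussian(samples, mu1, alpha, beta)
        total += used
        if decision == "P1":
            return idx, total
    return None, total


# Deterministic toy data: the first population looks like P0, the second like P1.
populations = [[-1.0] * 5, [2.0] * 5]
print(search_populations(populations, mu1=1.0))
```

Each test stops as soon as the evidence is strong enough either way, so populations that clearly follow P0 are discarded after only a few samples, which is the intuition behind the savings of adaptive over non-adaptive sampling.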
Publication Stats
13k  Citations  
330.16  Total Impact Points  
Top Journals
Institutions

22015

University of Wisconsin–Madison
 Department of Electrical and Computer Engineering
Madison, Wisconsin, United States


2011

University of Adelaide
 School of Mathematical Sciences
Tarndarnya, South Australia, Australia


2010

University of Houston
 Department of Electrical & Computer Engineering
Houston, TX, United States


1996–2010

Rice University
 Department of Electrical and Computer Engineering
Houston, Texas, United States


2008

Technical University of Lisbon
 Instituto de Telecomunicações (IT)
Lisbon, Lisbon, Portugal


2007

McGill University
 Department of Electrical & Computer Engineering
Montréal, Quebec, Canada


1999–2005

Boston University
 Department of Mathematics and Statistics
Boston, Massachusetts, United States


2004

Georgia Institute of Technology
 School of Electrical & Computer Engineering
Atlanta, Georgia, United States


1996–1999

Michigan State University
 Department of Electrical and Computer Engineering
East Lansing, Michigan, United States
