Publications (233) · 187.56 Total Impact

Article: Sparse Estimation with Strongly Correlated Variables using Ordered Weighted L1 Regularization
ABSTRACT: This paper studies ordered weighted L1 (OWL) norm regularization for sparse estimation problems with strongly correlated variables. We prove sufficient conditions for clustering based on the correlation/collinearity of variables using the OWL norm, of which the so-called OSCAR is a particular case. Our results extend previous ones for OSCAR in several ways: for the squared error loss, our conditions hold for the more general OWL norm and under weaker assumptions; we also establish clustering conditions for the absolute error loss, which is, as far as we know, a novel result. Furthermore, we characterize the statistical performance of OWL norm regularization for generative models in which certain clusters of regression variables are strongly (even perfectly) correlated, but variables in different clusters are uncorrelated. We show that if the true p-dimensional signal generating the data involves only s of the clusters, then O(s log p) samples suffice to accurately estimate the signal, regardless of the number of coefficients within the clusters. The estimation of s-sparse signals with completely independent variables requires just as many measurements. In other words, using the OWL we pay no price (in terms of the number of measurements) for the presence of strongly correlated variables.
09/2014;
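As a concrete illustration of the regularizer studied here: the OWL norm is simply the inner product of a non-increasing weight vector with the sorted magnitudes of the coefficient vector. A minimal sketch (function names are illustrative; the OSCAR weights use the standard linear-decay parameterization):

```python
import numpy as np

def owl_norm(w, weights):
    """Ordered weighted L1 (OWL) norm: inner product of the
    non-increasing weight vector with the magnitudes of w
    sorted in decreasing order."""
    mags = np.sort(np.abs(w))[::-1]   # |w| sorted largest-first
    return float(np.dot(weights, mags))

def oscar_weights(p, lam1, lam2):
    """OSCAR is the special case with linearly decreasing weights:
    weight i equals lam1 + lam2 * (p - i) for i = 1, ..., p."""
    return lam1 + lam2 * np.arange(p - 1, -1, -1, dtype=float)
```

With all weights equal, `owl_norm` reduces to the ordinary L1 norm; strictly decreasing weights are what induce the clustering of correlated coefficients described above.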
ABSTRACT: We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary histories of individual genes (gene trees) might be topologically distinct from one another and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.
04/2014;
ABSTRACT: Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high-dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. This is sometimes referred to as a "sparse group" lasso procedure, and it allows for more flexibility than traditional group lasso methods. Our framework generalizes conventional sparse group lasso further by allowing for overlapping groups, an additional flexibility that presents further challenges. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for learning in high dimensions. We establish consistency results for the SOSlasso for classification problems in the logistic regression setting, which specialize to results for the lasso and the group lasso, some known and some new. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, source localization problems in magnetoencephalography (MEG), and the analysis of gene activation patterns in microarray data.
02/2014;
ABSTRACT: Second-harmonic generation (SHG) imaging can help reveal interactions between collagen fibers and cancer cells. Quantitative analysis of SHG images of collagen fibers is challenged by the heterogeneity of collagen structures and the low signal-to-noise ratio often found while imaging collagen in tissue. The role of collagen in breast cancer progression can be assessed post-acquisition via enhanced computation. To facilitate this, we have implemented and evaluated four algorithms for extracting fiber information, such as number, length, and curvature, from a variety of SHG images of collagen in breast tissue. The image-processing algorithms included a Gaussian filter, SPIRAL-TV filter, Tubeness filter, and curvelet-denoising filter. Fibers are then extracted using an automated tracking algorithm called fiber extraction (FIRE). We evaluated algorithm performance by comparing the length, angle, and position of the automatically extracted fibers with those of manually extracted fibers in twenty-five SHG images of breast cancer. We found that the curvelet-denoising filter followed by FIRE, a process we call CTFIRE, outperforms the other algorithms under investigation. CTFIRE was then successfully applied to track collagen fiber shape changes over time in an in vivo mouse model of breast cancer.
Journal of Biomedical Optics 01/2014; 19(1):16007. · 2.88 Impact Factor
ABSTRACT: The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game, in the fixed-confidence setting, using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm, we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state of the art.
12/2013;
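The LIL-style anytime confidence bound at the heart of such a procedure is easy to write down. Below is a simplified sketch of a best-arm identification loop built on it; the constants, the noise model, and the stopping rule are illustrative stand-ins, not the paper's exact choices:

```python
import math

def lil_bound(t, delta, sigma=0.5, eps=0.01):
    """Anytime confidence radius inspired by the law of the iterated
    logarithm: valid for all sample counts t simultaneously, unlike a
    fixed-t Hoeffding bound (simplified constants)."""
    return (1 + math.sqrt(eps)) * math.sqrt(
        2 * sigma ** 2 * (1 + eps)
        * math.log(math.log((1 + eps) * t + 2) / delta) / t)

def best_arm_ucb(pull, n_arms, delta=0.1, budget=5000):
    """Repeatedly pull the arm with the highest upper confidence bound;
    stop once one arm has been sampled far more than all the others
    combined (a lil'UCB-style stopping rule)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for i in range(n_arms):                      # pull each arm once
        means[i], counts[i] = pull(i), 1
    for _ in range(budget):
        ucb = [means[i] + lil_bound(counts[i], delta / n_arms)
               for i in range(n_arms)]
        i = max(range(n_arms), key=ucb.__getitem__)
        x = pull(i)
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]   # running mean update
        if counts[i] > 1 + 9 * (sum(counts) - counts[i]):
            return i                             # confident in arm i
    return max(range(n_arms), key=counts.__getitem__)
```

Note the `delta / n_arms` split is the union bound over arms that the paper's novel stopping time avoids; it is kept here for simplicity.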
ABSTRACT: Multi-task learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multi-task learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.
11/2013;
ABSTRACT: This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be near-optimal in that it succeeds at the lowest possible signal-to-noise ratio (SNR) levels. Like traditional compressed sensing based on random non-adaptive design matrices, the CASS algorithm requires only k log n measurements to recover a k-sparse signal of dimension n. However, CASS succeeds at SNR levels that are a factor of log n lower than required by standard compressed sensing. From the point of view of constructing and implementing the sensing operation, as well as computing the reconstruction, the proposed algorithm is substantially less computationally intensive than standard compressed sensing. CASS is also demonstrated, through simulation, to perform considerably better in practice. To the best of our knowledge, this is the first demonstration of an adaptive compressed sensing algorithm with near-optimal theoretical guarantees and excellent practical performance. This paper also shows that methods like compressed sensing, group testing, and pooling have an advantage beyond simply reducing the number of measurements or tests: adaptive versions of such methods can also improve detection and estimation performance when compared to non-adaptive direct (uncompressed) sensing.
06/2013;
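The sense-and-search idea can be illustrated in its simplest form: for a noiseless 1-sparse nonnegative signal, aggregate measurements over halves of the current interval locate the nonzero with about 2 log2(n) measurements instead of n direct samples. This is a deliberate caricature (CASS itself handles k-sparse signals and noise, and allocates sensing energy adaptively):

```python
import numpy as np

def adaptive_search(x):
    """Locate the support of a 1-sparse nonnegative signal x by
    repeatedly taking one aggregate (sum) measurement over each half
    of the current interval and descending into the more energetic
    half. Returns the located index and the measurement count."""
    lo, hi = 0, len(x)
    measurements = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = x[lo:mid].sum()    # one aggregate measurement per half
        right = x[mid:hi].sum()
        measurements += 2
        if left >= right:
            hi = mid              # energy is in the left half
        else:
            lo = mid              # energy is in the right half
    return lo, measurements
```

For n = 16 this uses 8 measurements; the logarithmic measurement count is what lets an adaptive scheme concentrate its sensing budget and tolerate the lower SNR described above.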
ABSTRACT: Sampling from distributions to find the one with the largest mean arises in a broad range of applications, and it can be mathematically modeled as a multi-armed bandit problem in which each distribution is associated with an arm. This paper studies the sample complexity of identifying the best arm (largest mean) in a multi-armed bandit problem. Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that are necessary and sufficient to find the best arm scales linearly with the number of arms. We present a single-parameter multi-armed bandit model that spans the range from linear to superlinear sample complexity. We also give a new algorithm for best-arm identification, called PRISM, with linear sample complexity for a wide range of mean distributions. The algorithm, like most exploration procedures for multi-armed bandits, is adaptive in the sense that the next arms to sample are selected based on previous samples. We compare the sample complexity of adaptive procedures with that of simpler non-adaptive procedures using new lower bounds. For many problem instances, the increased sample complexity required by non-adaptive procedures is a polynomial factor of the number of arms.
06/2013;
Article: Sketching Sparse Matrices
ABSTRACT: This paper considers the problem of recovering an unknown sparse p×p matrix X from an m×m matrix Y = AXB^T, where A and B are known m×p matrices with m ≪ p. The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) nonzeros, it can be recovered exactly and efficiently using a convex program, as long as these nonzeros are not concentrated in any single row/column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(√(number of nonzeros in X) × log p). The results also show that the recovery is robust and stable, in the sense that if X is equal to a sparse matrix plus a perturbation, the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure); therefore, proving recovery guarantees requires nonstandard techniques. Indeed, our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. This problem is motivated by the following application, among others. Consider a p×n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X := DD^T is (approximately) sparse in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data S = AD with A ∈ R^{m×p}.
03/2013;
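The sketching operation itself is just a pair of matrix multiplies. The toy snippet below forms Y = A X B^T to show the dimension reduction; Gaussian sketching matrices are used purely as a stand-in (the paper's constructions are based on bipartite graphs, and recovery from Y requires the convex program described above):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 100, 20                       # ambient and sketch dimensions, m << p

# A sparse p x p matrix with a few nonzeros spread across rows/columns.
X = np.zeros((p, p))
idx = rng.choice(p * p, size=30, replace=False)
X.flat[idx] = rng.standard_normal(30)

# Illustrative random sketching matrices (stand-ins for the paper's
# graph-based constructions of A and B).
A = rng.standard_normal((m, p)) / np.sqrt(m)
B = rng.standard_normal((m, p)) / np.sqrt(m)

Y = A @ X @ B.T                      # the m x m sketch of X
```

Here the sketch stores m² = 400 entries in place of p² = 10,000, a 25x compression of the matrix being observed.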
ABSTRACT: This special issue features papers that highlight a variety of anomaly detection techniques in the context of many important applications, including networks, homeland security, and healthcare. We accepted 13 papers for publication out of a large number of submitted manuscripts. The first seven papers in the special issue focus on new problem structures; the remaining six are geared more toward specific applications. Several papers in the first half of the issue are motivated by applications arising in sensor, communication, and social networks, and a set of three papers describes techniques for homeland security and surveillance applications.
IEEE Journal of Selected Topics in Signal Processing 02/2013; 7(1):13. · 3.30 Impact Factor
ABSTRACT: This paper studies the sample complexity of searching over multiple populations. We consider a large number of populations, each corresponding to either distribution P0 or P1. The goal of the search problem studied here is to find one population corresponding to distribution P1 with as few samples as possible. The main contribution is to quantify the number of samples needed to correctly find one such population. We consider two general approaches: non-adaptive sampling methods, which sample each population a predetermined number of times until a population following P1 is found, and adaptive sampling methods, which employ sequential sampling schemes for each population. We first derive a lower bound on the number of samples required by any sampling scheme. We then consider an adaptive procedure consisting of a series of sequential probability ratio tests, and show it comes within a constant factor of the lower bound. We give explicit expressions for this constant when samples of the populations follow Gaussian and Bernoulli distributions. An alternative adaptive scheme is discussed which does not require full knowledge of P1, and comes within a constant factor of the optimal scheme. For comparison, a lower bound on the sampling requirements of any non-adaptive scheme is presented.
IEEE Transactions on Information Theory 01/2013; 59(8):5039-5050. · 2.62 Impact Factor
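The adaptive procedure can be sketched for Bernoulli samples: run Wald's sequential probability ratio test on each population in turn, abandoning a population when its log-likelihood ratio crosses the lower boundary and declaring success at the upper one. The parameters and thresholds below are the classical Wald choices, used here for illustration rather than the paper's exact constants:

```python
import math

def sprt_search(draw, p0=0.2, p1=0.8, alpha=1e-3, max_pops=10 ** 6):
    """Search for a population following Bernoulli(p1) among many
    following Bernoulli(p0). draw(k) returns one 0/1 sample from
    population k. Returns (index found or None, total sample count)."""
    upper = math.log(1 / alpha)               # declare "follows P1"
    lower = math.log(alpha)                   # declare "follows P0", move on
    llr_pos = math.log(p1 / p0)               # LLR increment for a 1
    llr_neg = math.log((1 - p1) / (1 - p0))   # LLR increment for a 0
    total = 0
    for k in range(max_pops):
        llr = 0.0
        while lower < llr < upper:
            x = draw(k)                       # one sample from population k
            total += 1
            llr += llr_pos if x else llr_neg
        if llr >= upper:
            return k, total                   # found an atypical population
    return None, total
```

The sequential test spends only a handful of samples on each typical population before discarding it, which is where the constant-factor optimality over non-adaptive sampling comes from.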
ABSTRACT: This paper studies sequential methods for recovery of sparse signals in high dimensions. When compared to fixed-sample-size procedures, in the sparse setting, sequential methods can result in a particularly large reduction in the number of samples needed for reliable signal support recovery. Starting with a lower bound, we show that any sequential sampling procedure fails in the high-dimensional limit provided the average number of measurements per dimension is less than log s / D(P0∥P1), where s is the level of sparsity and D(P0∥P1) is the Kullback-Leibler divergence between the underlying distributions. An extension of the sequential probability ratio test (SPRT), which requires complete knowledge of the underlying distributions, is shown to achieve this bound. We introduce a simple procedure, termed sequential thresholding, which can be implemented with limited knowledge of the underlying distributions and guarantees exact support recovery provided the average number of measurements per dimension grows faster than log s / D(P0∥P1), achieving the lower bound. For comparison, we show that any non-sequential procedure fails provided the number of measurements grows at a rate less than log n / D(P1∥P0), where n is the total dimension of the problem.
12/2012;
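The thresholding idea can be sketched in a few lines: keep a surviving set of coordinates, take a few samples of each survivor per pass, and discard coordinates whose sample mean falls below a threshold. Most null coordinates are discarded in the first passes, so the measurement budget concentrates on the sparse support. This is an illustrative sketch; the paper's procedure and constants differ:

```python
import math

def sequential_thresholding(draw, n, passes=None, m_per_pass=4, thresh=0.5):
    """Support recovery by repeated pass-and-discard. draw(i) returns
    one noisy sample of coordinate i; coordinates whose sample mean in
    a pass falls below thresh are eliminated from later passes."""
    if passes is None:
        passes = max(1, math.ceil(math.log2(n)))  # logarithmic in dimension
    survivors = list(range(n))
    for _ in range(passes):
        kept = []
        for i in survivors:
            mean = sum(draw(i) for _ in range(m_per_pass)) / m_per_pass
            if mean > thresh:
                kept.append(i)
        survivors = kept
    return survivors
```

Because roughly half (or more) of the null coordinates are eliminated each pass, the average number of samples per dimension stays small even though surviving coordinates are measured logarithmically many times.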
ABSTRACT: Location-specific Internet services are predicated on the ability to accurately identify the geographic position of IP hosts. Fundamental to current state-of-the-art geolocation techniques is a reliance on heavyweight traceroute-like probes that put a significant traffic load on networks. In this paper, we introduce a new lightweight approach to IP geolocation that we call Posit. This methodology requires only a small number of delay measurements to end-host targets, in conjunction with a computationally efficient statistical embedding technique. We demonstrate that Posit performs better than all existing geolocation tools across a wide spectrum of measurement infrastructures with varying geographic densities. Specifically, Posit is shown to geolocate hosts with median error improvements of over 55% relative to all current measurement-based IP geolocation methodologies.
ACM SIGMETRICS Performance Evaluation Review 10/2012; 40(2):211.
ABSTRACT: In applications ranging from communications to genetics, signals can be modeled as lying in a union of subspaces. Under this model, signal coefficients that lie in certain subspaces are active or inactive together. The potential subspaces are known in advance, but the particular set of subspaces that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of the subspaces can further reduce the number of measurements required for exact signal recovery, and we derive universal bounds on the number of measurements needed. The bound is universal in the sense that it depends only on the number of subspaces under consideration and their orientation relative to each other. The particulars of the subspaces (e.g., compositions, dimensions, extents, overlaps) do not affect the results we obtain. In the process, we derive sample complexity bounds for the special case of the group lasso with overlapping groups (the latent group lasso), which is used in a variety of applications. Finally, we also show that wavelet transform coefficients of images can be modeled as lying in groups, and hence can be efficiently recovered using group lasso methods.
09/2012;
ABSTRACT: This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access only to function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near-optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
09/2012;
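A comparison-only optimizer can be sketched as randomized descent: propose a random perturbation of the current point and keep it only if a Boolean comparison says it improves the objective. This is an illustrative sketch of the comparison-based setting, not the paper's algorithm or its near-optimal rates:

```python
import random

def comparison_descent(compare, x0, step0=1.0, iters=200, seed=0):
    """Zeroth-order descent driven only by Boolean comparisons:
    compare(a, b) returns True when a is (judged) better than b.
    Probes a random unit direction each iteration, keeps the move if
    the comparison prefers it, and shrinks the step size over time."""
    rng = random.Random(seed)
    x = list(x0)
    for t in range(1, iters + 1):
        step = step0 / t ** 0.5                      # decaying step size
        d = [rng.gauss(0, 1) for _ in x]             # random direction
        norm = sum(v * v for v in d) ** 0.5 or 1.0
        cand = [xi + step * di / norm for xi, di in zip(x, d)]
        if compare(cand, x):                         # accept only improvements
            x = cand
    return x
```

The `compare` callback is exactly the kind of paired-comparison oracle (e.g., a human judging two candidates) for which function values are never observed.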
ABSTRACT: A key challenge in wireless networking is the management of interference between transmissions. Identifying which transmitters interfere with each other is a crucial first step. Complicating the task is the fact that the topology of wireless networks changes with time, so identification may need to be performed on a regular basis. Injecting probing traffic to assess interference can lead to unacceptable overhead, so this paper focuses on interference estimation based on passive traffic monitoring. We concentrate on networks that use the CSMA/CA protocol, although our model is more general. We cast the task of estimating the interference environment as a graph learning problem. Nodes represent transmitters and edges represent the presence of interference between pairs of transmitters. We passively observe network traffic transmission patterns and collect information on transmission successes and failures. We establish bounds on the number of observations required to identify the interference graph reliably with high probability. Our main results are scaling laws telling us how the number of observations must grow in terms of the total number of nodes $n$ in the network and the maximum number of interfering transmitters $d$ per node (maximum node degree). The effects of hidden terminal interference on the observation requirements are also quantified. We show that it is necessary and sufficient that the observation period grows like $d^2 \log n$, and we propose a practical algorithm that reliably identifies the graph from this length of observation. The observation requirements scale quite mildly with network size, and networks with sparse interference (small $d$) can be identified more rapidly. Computational experiments based on realistic simulations of the traffic and protocol lend additional support to these conclusions.
08/2012;
ABSTRACT: Many real-world phenomena can be represented by a spatiotemporal signal: where, when, and how much. Social media is a tantalizing data source for those who wish to monitor such signals. Unlike most prior work, we assume that the target phenomenon is known and we are given a method to count its occurrences in social media. However, counting is plagued by sample bias, incomplete data, and, paradoxically, data scarcity, issues inadequately addressed by prior work. We formulate signal recovery as a Poisson point process estimation problem. We explicitly incorporate human population bias, time delays and spatial distortions, and spatiotemporal regularization into the model to address the noisy count issues. We present an efficient optimization algorithm and discuss its theoretical properties. We show that our model is more accurate than commonly used baselines. Finally, we present a case study on wildlife roadkill monitoring, where our model produces qualitatively convincing results.
04/2012;
ABSTRACT: We propose a simple modification to the recently proposed compressive binary search. The modification removes an unnecessary and suboptimal factor of log log n from the SNR requirement, making the procedure optimal (up to a small constant). Simulations show that the new procedure also performs significantly better in practice. We also contrast this problem with the better-known problem of noisy binary search.
03/2012;
Conference Paper: Quickest search for a rare distribution
ABSTRACT: We consider the problem of finding one sequence of independent random variables following a rare atypical distribution, P1, amongst a large number of sequences following some null distribution, P0. We quantify the number of samples needed to correctly identify one atypical sequence as the atypical sequences become increasingly rare. We show that the known optimal procedure, which consists of a series of sequential probability ratio tests, succeeds with high probability provided the number of samples grows at a rate equal to a constant times 1/(π D(P1∥P0)), where π is the prior probability of a sequence being atypical and D(P1∥P0) is the Kullback-Leibler divergence. Using techniques from sequential analysis, we show that if the number of samples grows at a rate equal to 1/(π D(P1∥P0)), any procedure fails. This is then compared to sequential thresholding [1], a simple procedure that can be implemented without exact knowledge of the distribution P1. We also show that the SPRT and sequential thresholding are fairly robust to our knowledge of π. Lastly, a lower bound for non-sequential procedures is derived for comparison.
2012 46th Annual Conference on Information Sciences and Systems (CISS); 01/2012
Publication Stats
6k Citations
187.56 Total Impact Points
Top Journals
Institutions

2–2012

University of Wisconsin–Madison
 Department of Electrical and Computer Engineering
Madison, Wisconsin, United States


1996–2010

Rice University
 Department of Electrical and Computer Engineering
Houston, Texas, United States


2008

Technical University of Lisbon
 Instituto de Telecomunicações (IT)
Lisbon, Lisbon, Portugal


2007–2008

McGill University
 Department of Electrical & Computer Engineering
Montréal, Quebec, Canada 
Duke University
 Department of Electrical and Computer Engineering (ECE)
Durham, North Carolina, United States


1999–2004

Boston University
 Department of Mathematics and Statistics
Boston, Massachusetts, United States


2003

Institute of Telecommunications
Lisbon, Lisbon, Portugal


1996–1999

Michigan State University
 Department of Electrical and Computer Engineering
East Lansing, Michigan, United States
