Robert Nowak

University of Wisconsin–Madison, Madison, Wisconsin, United States

Publications (272) · 237.53 Total Impact Points

  • ABSTRACT: We consider the problem of estimating the evolutionary history of a set of species (a phylogeny or species tree) from several genes. It is known that the evolutionary histories of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under this criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.
    04/2014;
  • ABSTRACT: Binary logistic regression with a sparsity constraint on the solution plays a vital role in many high-dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or zeroed out. In many applications, however, this can be very restrictive. In this paper, we are interested in a less restrictive form of structured sparse feature selection: we assume that while features can be grouped according to some notion of similarity, not all features in a group need be selected for the task at hand. This is sometimes referred to as a "sparse group" lasso procedure, and it allows for more flexibility than traditional group lasso methods. Our framework generalizes the conventional sparse group lasso further by allowing for overlapping groups, an additional flexibility that presents further challenges. The main contribution of this paper is a new procedure called the Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for learning in high dimensions. We establish consistency results for the SOSlasso for classification problems in the logistic regression setting; these specialize to results for the lasso and the group lasso, some known and some new. In particular, the SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features, by source localization problems in magnetoencephalography (MEG), and by the analysis of gene activation patterns in microarray data. Experiments with real and synthetic data demonstrate the advantages of the SOSlasso compared to the lasso and group lasso.
    02/2014;
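    The in-group sparsity idea above can be illustrated with the proximal step of the (non-overlapping) sparse group lasso, which soft-thresholds individual coordinates and then shrinks whole groups; overlapping groups, as in the SOSlasso, additionally require replicating shared coordinates. The sketch below is a minimal proximal-gradient solver for the logistic case, not the paper's implementation; the group structure, regularization weights, and step-size choice are illustrative assumptions.

      import numpy as np

      def soft_threshold(x, t):
          # entrywise soft-thresholding operator
          return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

      def prox_sparse_group(w, groups, step, lam1, lam2):
          # prox of lam1*||w||_1 + lam2*sum_g ||w_g||_2 for disjoint groups:
          # entrywise shrinkage followed by groupwise shrinkage
          w = soft_threshold(w, step * lam1)
          for g in groups:
              nrm = np.linalg.norm(w[g])
              if nrm > 0:
                  w[g] *= max(0.0, 1.0 - step * lam2 / nrm)
          return w

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def sparse_group_logistic(X, y, groups, lam1=0.05, lam2=0.1, iters=500):
          # proximal gradient descent on the logistic loss; labels y in {-1, +1}
          n, p = X.shape
          w = np.zeros(p)
          step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)   # 1/L for logistic loss
          for _ in range(iters):
              grad = -(X.T @ (y * sigmoid(-y * (X @ w)))) / n
              w = prox_sparse_group(w - step * grad, groups, step, lam1, lam2)
          return w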
  • ABSTRACT: Second-harmonic generation (SHG) imaging can help reveal interactions between collagen fibers and cancer cells. Quantitative analysis of SHG images of collagen fibers is challenged by the heterogeneity of collagen structures and by the low signal-to-noise ratio often encountered when imaging collagen in tissue. The role of collagen in breast cancer progression can be assessed after acquisition via enhanced computation. To facilitate this, we have implemented and evaluated four algorithms for extracting fiber information, such as number, length, and curvature, from a variety of SHG images of collagen in breast tissue. The image-processing algorithms included a Gaussian filter, a SPIRAL-TV filter, a Tubeness filter, and a curvelet-denoising filter. Fibers are then extracted using an automated tracking algorithm called fiber extraction (FIRE). We evaluated algorithm performance by comparing the length, angle, and position of automatically extracted fibers with those of manually extracted fibers in twenty-five SHG images of breast cancer. We found that the curvelet-denoising filter followed by FIRE, a process we call CT-FIRE, outperforms the other algorithms under investigation. CT-FIRE was then successfully applied to track collagen fiber shape changes over time in an in vivo mouse model of breast cancer.
    Journal of Biomedical Optics 01/2014; 19(1):16007. · 2.88 Impact Factor
  • ABSTRACT: The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting using a small number of total samples. The procedure cannot be improved upon in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm, we avoid the union bound over the arms that appears in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state of the art.
    12/2013;
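    A minimal simulation sketch of a LIL-style UCB rule for best-arm identification follows. The confidence radius, the constants (beta = 1, lambda = ((2 + beta)/beta)^2 = 9), and the per-arm delta split are simplified stand-ins rather than the paper's exact choices; in particular, splitting delta across arms reintroduces the union bound that the paper's analysis avoids. The Gaussian arm means are made up for illustration.

      import numpy as np

      def lil_bound(T, delta, sigma=0.5, eps=0.01):
          # LIL-flavored confidence radius; the "+ 2" inside the inner log is
          # a simplification that keeps the argument positive for small T
          return (1 + np.sqrt(eps)) * np.sqrt(
              2 * sigma**2 * (1 + eps)
              * np.log(np.log((1 + eps) * T + 2) / delta) / T)

      def lil_ucb(means, delta=0.05, lam=9.0, beta=1.0, sigma=0.5, seed=0):
          rng = np.random.default_rng(seed)
          n = len(means)
          T = np.ones(n)                                      # pulls per arm
          S = np.array([rng.normal(m, sigma) for m in means])  # reward sums
          # stop once one arm has been pulled lam times more than the rest
          while T.max() < 1 + lam * (T.sum() - T.max()):
              ucb = S / T + (1 + beta) * lil_bound(T, delta / n, sigma)
              i = int(np.argmax(ucb))          # pull the most promising arm
              S[i] += rng.normal(means[i], sigma)
              T[i] += 1
          return int(np.argmax(T))             # most-pulled arm is declared best

      print(lil_ucb([0.8, 0.5, 0.4, 0.3]))     # prints 0 with high probability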
  • ABSTRACT: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called the Sparse Overlapping Sets (SOS) lasso, a convex optimization program that automatically selects similar features for related learning tasks. Error bounds are derived for the SOSlasso and its consistency is established for squared error loss. In particular, the SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of the SOSlasso compared to the lasso and group lasso.
    11/2013;
  • ABSTRACT: Sampling from distributions to find the one with the largest mean arises in a broad range of applications, and it can be mathematically modeled as a multi-armed bandit problem in which each distribution is associated with an arm. This paper studies the sample complexity of identifying the best arm (the one with the largest mean) in a multi-armed bandit problem. Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that are necessary and sufficient to find the best arm scales linearly with the number of arms. We present a single-parameter multi-armed bandit model that spans the range from linear to superlinear sample complexity. We also give a new algorithm for best-arm identification, called PRISM, with linear sample complexity for a wide range of mean distributions. The algorithm, like most exploration procedures for multi-armed bandits, is adaptive in the sense that the next arms to sample are selected based on previous samples. We compare the sample complexity of adaptive procedures with that of simpler non-adaptive procedures using new lower bounds. For many problem instances, the increased sample complexity required by non-adaptive procedures is a polynomial factor in the number of arms.
    06/2013;
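    PRISM itself is specified in the paper; as a generic illustration of why adaptivity helps, the sketch below contrasts non-adaptive uniform allocation with sequential halving, a different but related adaptive strategy, under a fixed sampling budget. The arm means, budget, and Gaussian rewards are hypothetical.

      import numpy as np

      rng = np.random.default_rng(0)

      def pull(mean, k):
          # draw k noisy rewards from an arm
          return rng.normal(mean, 1.0, size=k)

      def uniform_allocation(means, budget):
          # non-adaptive: spend the budget evenly across all arms
          k = budget // len(means)
          return int(np.argmax([pull(m, k).mean() for m in means]))

      def sequential_halving(means, budget):
          # adaptive: halve the candidate set each round, so the surviving
          # arms receive more and more samples per round
          arms = list(range(len(means)))
          rounds = int(np.ceil(np.log2(len(arms))))
          for _ in range(rounds):
              k = max(1, budget // (rounds * len(arms)))
              scores = [pull(means[a], k).mean() for a in arms]
              keep = np.argsort(scores)[::-1][: max(1, len(arms) // 2)]
              arms = [arms[i] for i in keep]
          return arms[0]

      means = [0.5] + [0.0] * 63               # one good arm among 64
      for alg in (uniform_allocation, sequential_halving):
          hits = sum(alg(means, 1920) == 0 for _ in range(200))
          print(alg.__name__, hits / 200)      # empirical success rates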
  • ABSTRACT: This paper considers the problem of recovering an unknown sparse p × p matrix X from an m × m matrix Y = AXB^T, where A and B are known m × p matrices with m ≪ p. The main result shows that there exist constructions of the "sketching" matrices A and B such that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program, as long as the non-zeros are not concentrated in any single row or column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(sqrt(nnz(X)) · log p), where nnz(X) denotes the number of non-zeros in X. The results also show that recovery is robust and stable, in the sense that if X is equal to a sparse matrix plus a perturbation, the proposed convex program produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it has a tensor product structure), so proving recovery guarantees requires non-standard techniques; indeed, our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest. The problem is motivated by the following application, among others. Consider a p × n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X := DD^T is (approximately) sparse, in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access only to a sketch of the data, S = AD with A ∈ R^(m×p).
    03/2013;
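    The recovery program can be exercised numerically: form Y = A X B^T and recover X by entrywise l1 minimization subject to the sketch constraint. The dimensions, the Gaussian choice of A and B, and the use of cvxpy are illustrative assumptions; the paper's constructions are graph-based rather than Gaussian.

      import numpy as np
      import cvxpy as cp

      rng = np.random.default_rng(1)
      p, m, nnz = 30, 15, 20

      # sparse ground truth with non-zeros spread across rows and columns
      X = np.zeros((p, p))
      X.flat[rng.choice(p * p, size=nnz, replace=False)] = rng.standard_normal(nnz)

      A = rng.standard_normal((m, p)) / np.sqrt(m)
      B = rng.standard_normal((m, p)) / np.sqrt(m)
      Y = A @ X @ B.T                       # the m x m sketch

      # entrywise l1 minimization subject to matching the sketch
      Xhat = cp.Variable((p, p))
      problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(Xhat))),
                           [A @ Xhat @ B.T == Y])
      problem.solve()
      print("relative error:",
            np.linalg.norm(Xhat.value - X) / np.linalg.norm(X))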
  • ABSTRACT: This special issue features papers that highlight a variety of anomaly detection techniques in the context of many important applications, including networks, homeland security, and healthcare. We accepted 13 papers for publication out of a large number of submitted manuscripts. The first seven papers in the issue focus on new problem structures, several of them motivated by applications arising in sensor, communication, and social networks. The remaining six papers are oriented more toward specific applications, with three of them describing techniques for homeland security and surveillance.
    IEEE Journal of Selected Topics in Signal Processing 02/2013; 7(1):1-3. · 3.30 Impact Factor
  • ABSTRACT: Location-specific Internet services are predicated on the ability to accurately identify the geographic position of IP hosts. Fundamental to current state-of-the-art geolocation techniques is a reliance on heavyweight traceroute-like probes that place a significant traffic load on networks. In this paper, we introduce a new lightweight approach to IP geolocation that we call Posit. The methodology requires only a small number of delay measurements to end-host targets, in conjunction with a computationally efficient statistical embedding technique. We demonstrate that Posit performs better than all existing geolocation tools across a wide spectrum of measurement infrastructures with varying geographic densities. Specifically, Posit geolocates hosts with median error improvements of over 55% relative to all current measurement-based IP geolocation methodologies.
    ACM SIGMETRICS Performance Evaluation Review 10/2012; 40(2):2-11.
  • Nikhil Rao, Benjamin Recht, Robert Nowak
    ABSTRACT: In applications ranging from communications to genetics, signals can be modeled as lying in a union of subspaces. Under this model, signal coefficients that lie in certain subspaces are active or inactive together. The potential subspaces are known in advance, but the particular set of subspaces that are active (i.e., in the signal support) must be learned from measurements. We show that exploiting knowledge of the subspaces can further reduce the number of measurements required for exact signal recovery, and we derive universal bounds on that number. The bounds are universal in the sense that they depend only on the number of subspaces under consideration and their orientation relative to each other; the particulars of the subspaces (e.g., compositions, dimensions, extents, overlaps) do not affect the results we obtain. In the process, we derive sample complexity bounds for the special case of the group lasso with overlapping groups (the latent group lasso), which is used in a variety of applications. Finally, we also show that wavelet transform coefficients of images can be modeled as lying in groups, and hence can be efficiently recovered using group lasso methods.
    09/2012;
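    The latent (overlapping) group lasso mentioned above reduces to an ordinary group lasso after replicating coordinates shared between groups. Below is a minimal proximal-gradient sketch of that reduction for squared error loss; the group definitions, penalty weight, and iteration count are illustrative assumptions.

      import numpy as np

      def replicate(groups, p):
          # expand overlapping groups on p coordinates into disjoint blocks
          # of latent variables; x = E @ v sums the latent copies
          cols = [i for g in groups for i in g]
          E = np.zeros((p, len(cols)))
          E[cols, np.arange(len(cols))] = 1.0
          blocks, start = [], 0
          for g in groups:
              blocks.append(slice(start, start + len(g)))
              start += len(g)
          return E, blocks

      def latent_group_lasso(X, y, groups, lam=0.1, iters=500):
          # minimize 0.5/n * ||y - X E v||^2 + lam * sum_g ||v_g||_2
          n, p = X.shape
          E, blocks = replicate(groups, p)
          Xe = X @ E
          v = np.zeros(Xe.shape[1])
          t = n / np.linalg.norm(Xe, 2) ** 2        # 1/L step size
          for _ in range(iters):
              v -= t * (Xe.T @ (Xe @ v - y)) / n    # gradient step
              for b in blocks:                      # group soft-thresholding
                  nrm = np.linalg.norm(v[b])
                  if nrm > 0:
                      v[b] *= max(0.0, 1.0 - t * lam / nrm)
          return E @ v   # estimate supported on a union of selected groups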
  • Kevin G. Jamieson, Robert D. Nowak, Benjamin Recht
    ABSTRACT: This paper provides lower bounds on the convergence rate of derivative-free optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access only to function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
    09/2012;
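    A toy sketch of descent driven only by noisy Boolean comparisons: repeatedly compare the current point against a randomly perturbed candidate, deciding each duel by majority vote, and keep the winner. This illustrates the comparison-oracle model only; the objective, noise level, vote count, and step schedule are made up, and the paper's algorithm and rates are not reproduced here.

      import numpy as np

      rng = np.random.default_rng(0)

      def noisy_compare(f, x, y, reps=15, noise=0.1):
          # Boolean oracle for "f(x) < f(y)?", decided by majority vote
          # over noisy function evaluations
          votes = sum(f(x) + noise * rng.standard_normal()
                      < f(y) + noise * rng.standard_normal()
                      for _ in range(reps))
          return votes > reps / 2

      def comparison_descent(f, x0, steps=300, step0=1.0):
          x = np.asarray(x0, dtype=float)
          for k in range(steps):
              step = step0 / np.sqrt(k + 1)     # shrinking step size
              d = rng.standard_normal(x.size)
              d /= np.linalg.norm(d)            # random unit direction
              cand = x + step * d
              if noisy_compare(f, cand, x):     # keep the duel winner
                  x = cand
          return x

      f = lambda z: np.sum((z - 3.0) ** 2)      # strongly convex test function
      print(comparison_descent(f, np.zeros(5)))  # approaches [3, 3, 3, 3, 3]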
  • ABSTRACT: Many real-world phenomena can be represented by a spatio-temporal signal: where, when, and how much. Social media is a tantalizing data source for those who wish to monitor such signals. Unlike most prior work, we assume that the target phenomenon is known and that we are given a method to count its occurrences in social media. However, counting is plagued by sample bias, incomplete data, and, paradoxically, data scarcity; these issues are inadequately addressed by prior work. We formulate signal recovery as a Poisson point process estimation problem. We explicitly incorporate human population bias, time delays, spatial distortions, and spatio-temporal regularization into the model to address the noisy-count issues. We present an efficient optimization algorithm and discuss its theoretical properties. We show that our model is more accurate than commonly used baselines. Finally, we present a case study on wildlife roadkill monitoring, where our model produces qualitatively convincing results.
    04/2012;
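    A minimal sketch of the modeling idea: counts are Poisson with intensity equal to the underlying signal modulated by a known population bias, and the signal is recovered by penalized maximum likelihood with a smoothness penalty, here over time only. The chain-graph penalty, plain gradient descent, and all data are synthetic simplifications of the paper's spatio-temporal formulation.

      import numpy as np

      def estimate_intensity(counts, population, lam=2.0, iters=3000, lr=0.02):
          # model: counts[t] ~ Poisson(population[t] * exp(theta[t]))
          # objective: Poisson negative log-likelihood
          #            + lam * sum_t (theta[t+1] - theta[t])^2
          T = len(counts)
          theta = np.zeros(T)
          for _ in range(iters):
              grad = population * np.exp(theta) - counts     # NLL gradient
              grad[:-1] += 2 * lam * (theta[:-1] - theta[1:])
              grad[1:] += 2 * lam * (theta[1:] - theta[:-1])
              theta -= lr * grad
          return np.exp(theta)    # estimated intensity per unit population

      rng = np.random.default_rng(0)
      truth = 1.0 + np.sin(np.arange(100) / 8.0) ** 2    # true signal
      pop = rng.uniform(0.5, 2.0, size=100)              # known sampling bias
      counts = rng.poisson(pop * truth)
      print(np.round(estimate_intensity(counts, pop)[:8], 2))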
  • ABSTRACT: Accurate and timely identification of the router-level topology of the Internet is one of the major unresolved problems in Internet research. Topology recovery via tomographic inference is potentially an attractive complement to standard methods that use TTL-limited probes. Unfortunately, limitations of prior tomographic techniques make timely resolution of large-scale topologies impossible because they require an infeasible number of measurements. In this paper, we describe new techniques that aim toward efficient tomographic inference for accurate router-level topology measurement. We introduce methodologies based on Depth-First Search (DFS) ordering that cluster end-hosts based on shared infrastructure and enable the logical tree topology of a network to be recovered accurately and efficiently. We evaluate the capabilities of our algorithms in large-scale simulation and find that our methods reconstruct topologies using less than 2% of the measurements required by exhaustive methods and less than 15% of the measurements needed by the current state-of-the-art tomographic approach. We also present results from a study of the live Internet in which our DFS-based methodologies recover the logical router-level topology more accurately and with fewer probes than prior techniques.
    IEEE/ACM Transactions on Networking 01/2012; 20(3):931-943. · 2.01 Impact Factor
  • ABSTRACT: Linear subspace models have recently been successfully employed to model highly incomplete high-dimensional data, but they are sometimes too restrictive to model the data well. Modeling data as a union of subspaces gives more flexibility and leads to the problem of subspace clustering: clustering vectors into groups that lie in or near the same subspace. Low-rank matrix completion allows one to estimate a single subspace from incomplete data, and this work has recently been extended to the union-of-subspaces problem [3]. However, the algorithm analyzed there is computationally demanding. Here we present a fast algorithm, k-GROUSE, that combines GROUSE, an incremental matrix completion algorithm, with k-subspaces, the alternating minimization heuristic for solving the subspace clustering problem. k-GROUSE is two orders of magnitude faster than the algorithm proposed in [3] and relies on a slightly more general projection theorem, which we present here.
    Statistical Signal Processing Workshop (SSP), 2012 IEEE; 01/2012
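    For context, the k-subspaces heuristic that k-GROUSE builds on alternates between fitting a rank-r basis to each cluster (via SVD) and reassigning points to the subspace with the smallest projection residual. The sketch below is the complete-data version with made-up dimensions; the paper's contribution is handling missing entries incrementally via GROUSE.

      import numpy as np

      def k_subspaces(X, k, r, iters=20, seed=0):
          # X: n x N matrix; cluster its columns into k rank-r subspaces
          rng = np.random.default_rng(seed)
          n, N = X.shape
          labels = rng.integers(k, size=N)
          for _ in range(iters):
              bases = []
              for j in range(k):
                  cols = X[:, labels == j]
                  if cols.shape[1] < r:     # reseed an underfull cluster
                      cols = X[:, rng.choice(N, size=r, replace=False)]
                  U, _, _ = np.linalg.svd(cols, full_matrices=False)
                  bases.append(U[:, :r])    # best rank-r basis for the cluster
              # reassign each column by its smallest projection residual
              resid = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0)
                                for U in bases])
              labels = np.argmin(resid, axis=0)
          return labels, bases

      # demo: columns drawn from two random 2-d subspaces in R^20
      rng = np.random.default_rng(1)
      X = np.hstack([rng.standard_normal((20, 2)) @ rng.standard_normal((2, 50))
                     for _ in range(2)])
      labels, _ = k_subspaces(X, k=2, r=2)
      print(labels)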
  • Conference Paper: Covariance sketching
    ABSTRACT: Learning covariance matrices from high-dimensional data is an important problem that has received a lot of attention recently. We are particularly interested in the high-dimensional setting, where the number of samples one has access to is smaller than the number of variates. Fortunately, in many applications of interest, the underlying covariance matrix is sparse and hence has limited degrees of freedom. In most existing work, however, it is assumed that one can obtain samples of all the variates simultaneously, which can be very expensive or physically infeasible in some applications. As a means of overcoming this limitation, we propose a new procedure that "pools" the variates into a small number of groups and then samples each pooled group. We show that in certain cases it is possible to recover the covariance matrix from the pooled samples using an efficient convex optimization program, and so we call the procedure "covariance sketching".
    Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on; 01/2012
  • ABSTRACT: A sequential adaptive compressed sensing procedure for signal support recovery is proposed and analyzed. The procedure is based on the principle of distilled sensing and makes use of sparse sensing matrices to perform sketching observations that can quickly identify irrelevant signal components. It is shown that adaptive compressed sensing enables the recovery of weaker sparse signals than those recoverable using traditional non-adaptive compressed sensing approaches.
    Statistical Signal Processing Workshop (SSP), 2012 IEEE; 01/2012
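    The distilled sensing principle, shown here in its simplest direct-observation (non-compressed) form: spend the measurement budget over a few rounds, and after each round discard coordinates whose observations fall below zero, so the remaining budget concentrates on a shrinking candidate set. The signal, noise model, and round/budget choices are illustrative; the paper's procedure applies the idea with sparse sensing matrices.

      import numpy as np

      def distilled_sensing(x, rounds=4, budget_per_round=1.0, seed=0):
          # x: signal with non-negative entries; find its sparse support
          rng = np.random.default_rng(seed)
          survivors = np.arange(x.size)
          for _ in range(rounds):
              # measurement precision grows as the candidate set shrinks
              prec = budget_per_round * x.size / survivors.size
              y = (x[survivors]
                   + rng.standard_normal(survivors.size) / np.sqrt(prec))
              survivors = survivors[y > 0]   # distill: keep positive signs
          return survivors

      rng = np.random.default_rng(1)
      x = np.zeros(10000)
      support = rng.choice(10000, size=50, replace=False)
      x[support] = 3.0
      found = set(distilled_sensing(x))
      print(len(found & set(support)), "of 50 signal coordinates kept;",
            len(found - set(support)), "null coordinates remain")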
  • Brian Eriksson, Laura Balzano, Robert Nowak
    ABSTRACT: This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. This generalizes the standard low-rank matrix completion problem to situations in which the matrix rank can be quite high or even full rank. Since the columns belong to a union of subspaces, this problem may also be viewed as a missing-data version of the subspace clustering problem. Let X be an n × N matrix whose (complete) columns lie in a union of at most k subspaces, each of rank at most r < n, and assume N ≫ kn. The main result of the paper shows that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version, so long as at least C r N log²(n) entries of X are observed uniformly at random, with C > 1 a constant depending on the usual incoherence conditions, the geometric arrangement of the subspaces, and the distribution of columns over the subspaces. The result is illustrated with numerical experiments and an application to Internet distance matrix completion and topology identification.
    12/2011;
  • ABSTRACT: A cross-validation (CV) method based on a state-space framework is introduced for comparing the fidelity of different cortical interaction models to measured scalp electroencephalogram (EEG) or magnetoencephalography (MEG) data. A state equation models the cortical interaction dynamics, and an observation equation represents the scalp measurement of cortical activity and noise. The measured data are partitioned into training and test sets: the training set is used to estimate model parameters, and model quality is evaluated by computing test-data innovations for the estimated model. Two CV metrics, normalized mean squared error and log-likelihood, are estimated by averaging over different training/test partitions of the data. The effectiveness of this method of model selection is illustrated by comparing two linear and two nonlinear modeling methods on simulated EEG data derived using both known dynamic systems and electrocorticography data measured from an epilepsy patient.
    IEEE transactions on bio-medical engineering 11/2011; 59(2):504-14. · 2.15 Impact Factor
  • ABSTRACT: Despite many efforts over the past decade, the ability to generate topological maps of the Internet at the router level accurately and in a timely fashion remains elusive. Mapping campaigns commonly involve traceroute-like probing that is usually non-adaptive and incomplete, thus revealing only a portion of the underlying topology. In this paper we demonstrate that standard probing methods yield datasets that implicitly contain information about much more than just the directly observed links and routers. Each probe yields information that places constraints on the underlying topology, and by integrating a large number of such constraints it is possible to accurately infer the existence of unseen components of the Internet (i.e., links and routers not directly revealed by the probing). Moreover, we show that this information can be used to adaptively refocus the probing in order to discover the topology more quickly. These findings suggest radically new and more efficient approaches to Internet mapping. Our work focuses on the discovery of the core of the Internet, which we define as the set of routers roughly bounded by the ingress/egress routers of stub autonomous systems. We describe a novel data analysis methodology designed to accurately infer (i) the number of unseen core routers, (ii) the unseen hop-count distances between observed routers, and (iii) the unseen links between observed routers. We use a large experimental dataset to validate the proposed methods. For our dataset, we show that our methods can predict the number of unseen routers to within a 13% error level, estimate 60% of the unseen distances between observed routers to within a one-hop error, and robustly detect over 35% of the unseen links between observed routers. Furthermore, we use the information extracted by our inference methodology to drive an adaptive active-probing scheme, which allows us to generate maps of our dataset using 50% fewer probes than standard non-adaptive approaches.
    IEEE Journal on Selected Areas in Communications 11/2011; · 3.12 Impact Factor
  • Kevin G. Jamieson, Robert D. Nowak
    ABSTRACT: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2(n) pairwise comparisons. We are interested in natural situations in which relationships among the objects allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in R^d. We show that under this assumption the number of possible rankings grows like n^(2d), and we demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log(n) adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that requires only that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
    09/2011;
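    A small sketch of the comparison-oracle setting: rank points by their distance from a hidden reference using only pairwise comparisons, here with binary-insertion sort, which realizes the generic n log2(n) query count mentioned above. The paper's algorithm further exploits the low-dimensional embedding to answer most queries for free; that logic is not reproduced here, and the data are synthetic.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 50, 2
      objects = rng.standard_normal((n, d))   # points in R^d
      ref = rng.standard_normal(d)            # hidden common reference point
      queries = 0

      def closer(i, j):
          # pairwise-comparison oracle: is object i closer to ref than j?
          global queries
          queries += 1
          return (np.linalg.norm(objects[i] - ref)
                  < np.linalg.norm(objects[j] - ref))

      ranking = []                            # indices ordered by distance
      for i in range(n):
          lo, hi = 0, len(ranking)
          while lo < hi:                      # binary search for the slot
              mid = (lo + hi) // 2
              if closer(ranking[mid], i):
                  lo = mid + 1
              else:
                  hi = mid
          ranking.insert(lo, i)

      print("objects:", n, "comparisons used:", queries)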

Publication Stats

8k Citations
237.53 Total Impact Points

Institutions

  • 2–2012
    • University of Wisconsin–Madison
      • Department of Electrical and Computer Engineering
      Madison, Wisconsin, United States
  • 2009–2011
    • Princeton University
      Princeton, New Jersey, United States
  • 1999–2011
    • Boston University
      • Department of Computer Science
      • Department of Mathematics and Statistics
      Boston, Massachusetts, United States
  • 2010
    • University of Houston
      • Department of Electrical & Computer Engineering
      Houston, Texas, United States
    • University of Minnesota Twin Cities
      • Department of Electrical and Computer Engineering
      Minneapolis, Minnesota, United States
  • 1996–2010
    • Rice University
      • Department of Electrical and Computer Engineering
      Houston, Texas, United States
  • 2008
    • Technical University of Lisbon
      • Instituto de Telecomunicações (IT)
      Lisbon, Portugal
  • 2007–2008
    • Duke University
      • Department of Electrical and Computer Engineering (ECE)
      Durham, North Carolina, United States
    • McGill University
      • Department of Electrical & Computer Engineering
      Montréal, Quebec, Canada
    • University of California, Los Angeles
      Los Angeles, California, United States
  • 2006
    • Institute of Electrical and Electronics Engineers
      Washington, D.C., United States
  • 2003
    • Institute of Telecommunications
      Lisbon, Portugal
  • 1996–1999
    • Michigan State University
      • Department of Electrical and Computer Engineering
      East Lansing, Michigan, United States