Conference Paper

Proximal Methods for Sparse Hierarchical Dictionary Learning.

Conference: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel
Source: DBLP

ABSTRACT We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks; on the other hand, recent work on structured sparsity provides a natural framework for modeling dependencies between dictionary elements. We thus consider a tree-structured sparse regularization to learn dictionaries embedded in a hierarchy. The involved proximal operator is computable exactly via a primal-dual method, allowing the use of accelerated gradient techniques. Experiments show that for natural image patches, learned dictionary elements organize themselves in such a hierarchical structure, leading to improved performance on restoration tasks. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
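For tree-structured sums of l2 group norms, the proximal operator mentioned in the abstract has a known exact form: composing the per-group soft-thresholdings in leaves-to-root order. The sketch below illustrates that idea; the function and variable names are ours, not the paper's, and this is a minimal illustration rather than the authors' implementation.

```python
import numpy as np

def tree_prox(v, groups, lam):
    """Sketch of the exact proximal operator of lam * sum_g ||v_g||_2 for
    tree-structured groups. `groups` must list index sets from leaves to
    root (every group before any group containing it); under that
    ordering, composing the per-group soft-thresholdings gives the exact
    prox. Names here are illustrative, not from the paper."""
    v = np.asarray(v, dtype=float).copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > 0.0:
            # Block soft-thresholding of the coefficients in group g.
            v[g] *= max(0.0, 1.0 - lam / norm)
    return v

# Example: a two-level hierarchy on four coefficients, with leaf groups
# {2} and {3} nested inside the root group {0, 1, 2, 3}.
v = np.array([3.0, 4.0, 0.1, 0.2])
groups = [np.array([2]), np.array([3]), np.arange(4)]
w = tree_prox(v, groups, lam=0.5)
```

Note how small leaf coefficients are zeroed by their own groups before the root group shrinks the remainder; a coefficient zeroed at a child stays zero after the parent step, which is the hierarchical selection effect the paper exploits.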

  • ABSTRACT: We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or output variables. We consider two widely adopted types of penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. For both types of penalties, due to their nonseparability and nonsmoothness, developing an efficient optimization method remains a challenging problem. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It achieves a convergence rate significantly faster than the standard first-order methods, subgradient methods, and is much more scalable than the most widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.
    The Annals of Applied Statistics, 6 (2012).
  • ABSTRACT: In many machine learning and pattern analysis applications, grouping of features during model development and the selection of a small number of relevant groups can be useful to improve the interpretability of the learned parameters. Although this problem has been receiving a significant amount of attention lately, most of the approaches require the manual tuning of one or more hyper-parameters. In order to overcome this drawback, this work presents a novel hierarchical Bayesian formulation of a generalized linear model and estimates the posterior distribution of the parameters and hyper-parameters of the model within a completely Bayesian paradigm based on variational inference. All the required computations are analytically tractable. The performance and applicability of the proposed framework are demonstrated on synthetic and real world examples.
    International Journal of Machine Learning and Cybernetics, 12/2013.
  • ABSTRACT: Mapping images to a high-dimensional feature space, either by considering patches of images or other features, has led to state-of-the-art results in signal processing tasks such as image denoising and inpainting, and in various machine learning and computer vision tasks on images. Understanding the geometry of the embedding of images into high-dimensional feature space is a challenging problem. Finding efficient representations and learning dictionaries for such embeddings is also problematic, often leading to expensive optimization algorithms. Many such algorithms scale poorly with the dimension of the feature space, for example with the size of patches of images if these are chosen as features. This contrasts with the need for a multi-scale approach in the analysis of images, as details at multiple scales are essential in image understanding as well as in many signal processing tasks. Here we exploit a recent dictionary learning algorithm based on Geometric Wavelets, and we extend it to perform multi-scale dictionary learning on image patches, with efficient algorithms for both the learning of the dictionary and the computation of coefficients onto that dictionary. We also discuss how invariances in images may be introduced in the dictionary learning phase, by generalizing the construction of such dictionaries to non-Euclidean spaces.
    Proc. SPIE, 09/2013.
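The smoothing step behind the SPG method in the first related abstract replaces a nonsmooth penalty with a smooth surrogate before applying gradient methods. The sketch below illustrates the idea on the simplest nonsmooth penalty, the l1 norm, using its Huber (Nesterov-smoothed) surrogate and plain gradient descent; the names, the lasso setting, and the lack of acceleration are our simplifications, not the paper's method, which targets structured penalties such as overlapping-group and graph-guided-fused lasso.

```python
import numpy as np

def huber_grad(x, mu):
    # Gradient of the mu-smoothed absolute value:
    # f_mu(x) = x^2 / (2*mu) if |x| <= mu, else |x| - mu/2.
    return np.clip(x / mu, -1.0, 1.0)

def spg_lasso(A, b, lam, mu=0.01, iters=2000):
    """Minimal sketch of the smoothing idea on min 0.5||Ax-b||^2 + lam||x||_1:
    smooth the l1 term, then run gradient descent with step 1/L, where L
    bounds the Lipschitz constant of the smoothed objective's gradient."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2 + lam / mu
    step = 1.0 / L
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + lam * huber_grad(x, mu)
        x -= step * grad
    return x
```

Smaller mu makes the surrogate a tighter approximation of the l1 norm but inflates L and hence slows the iteration, which is exactly the accuracy/speed trade-off that the accelerated scheme in the paper is designed to balance.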
