Conference Paper

Proximal Methods for Sparse Hierarchical Dictionary Learning

Conference: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel
Source: DBLP

ABSTRACT We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. On the other hand, recent work on structured sparsity provides a natural framework for modeling dependencies between dictionary elements. We thus consider a tree-structured sparse regularization to learn dictionaries embedded in a hierarchy. The involved proximal operator is computable exactly via a primal-dual method, allowing the use of accelerated gradient techniques. Experiments show that for natural image patches, learned dictionary elements organize themselves in such a hierarchical structure, leading to improved performance for restoration tasks. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
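The key computational step in the abstract, the proximal operator of a tree-structured sparse norm, admits a compact sketch. For ℓ2-norms over groups forming a hierarchy, processing groups children-before-parents with one pass of group soft-thresholding is known to compute the operator exactly; the function names, group encoding, and weights below are illustrative, not the paper's implementation:

```python
import numpy as np

def group_soft_threshold(w, idx, lam):
    """Shrink the coefficients of one group toward zero (l2 group shrinkage)."""
    norm = np.linalg.norm(w[idx])
    if norm <= lam:
        w[idx] = 0.0
    else:
        w[idx] *= 1.0 - lam / norm
    return w

def tree_prox(w, groups, lam):
    """Proximal operator of a tree-structured sum of l2 group norms.

    `groups` is a list of (index_array, weight) pairs ordered so every child
    group appears before its ancestors (leaves first, root last); for such
    tree-structured groups a single pass is exact.
    """
    w = w.copy()
    for idx, weight in groups:
        w = group_soft_threshold(w, idx, lam * weight)
    return w
```

Note the intuition this encodes: zeroing a child group before its parent means a variable can only be selected if all its ancestors in the hierarchy are selected, which is exactly the hierarchical sparsity pattern described in the abstract.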



Available from: Francis Bach, Sep 26, 2015
  • Source
    • "In recent years, multi-scale dictionaries have received much attention due to their better depiction of the data geometry [3] compared with their single-scale counterparts [2]. Multi-scale representation achieves stronger signal restoration ability [4] and has been applied effectively to many machine learning tasks, including classification [5], novelty detection [6] and topic modelling [7]. We introduce multi-scale dictionary based representation into solving the Poisson compressive sensing (CS) inverse problem. "
    ABSTRACT: A novel multi-scale dictionary based Bayesian reconstruction algorithm is proposed for compressive X-ray imaging, which encodes the material's spectrum by Poisson measurements. Inspired by recently developed compressive X-ray imaging systems [1], this work aims to recover the material's spectrum from the compressive coded image by leveraging a reference spectrum library. Instead of directly using the huge and redundant library as a dictionary, which is cumbersome in computation and difficult for selecting those active dictionary atoms, a multi-scale tree structured dictionary is refined from the spectrum library, and following this a Bayesian reconstruction algorithm is developed. Experimental results on real data demonstrate superior performance in comparison with traditional methods.
    ICASSP 2015; 04/2015
  • Source
    • "As an extension of the lasso [31], the group lasso [5] incorporates the correlation of features within groups and successfully induces a grouping effect. Furthermore, many other structured-sparsity methods have been proposed, such as the fused lasso [32], the elastic net [45], graph-guided sparsity [10], and tree-structured sparsity [12]. Recently, Mairal et al. proposed path coding penalties [22] to take into account the correlations among features that are embedded in a long-range path on a graph. "
    ABSTRACT: The selection of a subset of discriminative features for semantic recognition is crucial to making multimedia analysis more interpretable. This paper proposes a model of spatial path coding (SPC) that uses a supervised technique to select sparse features. SPC is a regularized penalty that encodes the spatial correlations of features obtained by the spatial pyramid model. In SPC, each feature dimension is considered as a vertex in a direct acyclic graph (DAG), and the spatial correlations among features are considered as directed edges associated with predefined weights. Thus, the process of supervised feature selection can be directly formulated to solve a path selection problem with minimum cost. Experiments are conducted to evaluate the performance of supervised feature selection with SPC for the tasks of scene classification and action recognition using four benchmark datasets. The results show that SPC can be used to automatically select a subgraph of the DAG with a small number of discriminative features for a certain category. In addition, the method proposed in this paper shows better performance in terms of classification and recognition accuracy as compared with state-of-the-art algorithms.
    Information Sciences 10/2014; 281:523–535. DOI:10.1016/j.ins.2014.03.093 · 4.04 Impact Factor
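The structured penalties named in the excerpt above differ only in how they aggregate coefficients. A minimal sketch of each standard formula, with illustrative function signatures rather than any particular library's API:

```python
import numpy as np

def lasso_penalty(w, lam):
    """l1 penalty: promotes entry-wise sparsity."""
    return lam * np.abs(w).sum()

def group_lasso_penalty(w, groups, lam):
    """Sum of l2 norms over non-overlapping groups: zeros out whole groups."""
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

def elastic_net_penalty(w, lam, alpha):
    """Convex mix of l1 and squared l2: sparsity plus grouping of correlated features."""
    return lam * (alpha * np.abs(w).sum() + 0.5 * (1.0 - alpha) * (w ** 2).sum())

def fused_lasso_penalty(w, lam1, lam2):
    """l1 on entries plus l1 on successive differences: sparse and piecewise constant."""
    return lam1 * np.abs(w).sum() + lam2 * np.abs(np.diff(w)).sum()
```

Tree-structured sparsity, the penalty studied in the paper above, generalizes the group lasso by letting the groups overlap according to a hierarchy.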
  • Source
    • "It is known that setting the step size to the reciprocal of the Lipschitz constant of ∇L guarantees convergence [22]. Given a hierarchical sparsity regularization, such as Ω_SPLAM, the proximal operator can be solved very efficiently in the dual by a single pass of block coordinate descent [20]. SPLAM has two tuning parameters, λ and α, and typically the problem is solved over a grid of (λ, α) pairs. "
    ABSTRACT: The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing them to have either a linear or nonlinear effect on the response. However, the assignment of features to the linear and nonlinear groups is typically assumed known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which features to treat nonlinearly. We introduce sparse partially linear additive models (SPLAMs), which combine model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the Lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p ≫ N) setting. We develop efficient algorithms that are applied to real data sets with half a million samples and over 45,000 features with excellent predictive performance.
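The step-size rule quoted in the excerpt above (the reciprocal of the Lipschitz constant of ∇L) is easy to illustrate with plain proximal gradient descent (ISTA) for the lasso. A minimal sketch, with names chosen for illustration and not tied to the cited papers' code:

```python
import numpy as np

def prox_l1(w, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista(X, y, lam, n_iter=200):
    """Proximal gradient for 0.5 * ||y - X w||^2 + lam * ||w||_1.

    The gradient of the smooth part is Lipschitz with constant
    L = ||X||_2^2 (largest eigenvalue of X^T X), so step = 1/L converges.
    """
    L = np.linalg.norm(X, 2) ** 2
    step = 1.0 / L
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)        # gradient of the squared loss
        w = prox_l1(w - step * grad, step * lam)
    return w
```

With an orthonormal X this reduces to one-step soft-thresholding of X^T y; swapping `prox_l1` for a structured proximal operator such as the tree-structured one gives exactly the accelerated-gradient setting the paper exploits.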