Conference Proceeding

# Proximal Methods for Sparse Hierarchical Dictionary Learning.

01/2010; In proceeding of: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel
Source: DBLP

ABSTRACT We propose to combine two approaches for mod- eling data admitting sparse representations: on the one hand, dictionary learning has proven ef- fective for various signal processing tasks. On the other hand, recent work on structured spar- sity provides a natural framework for modeling dependencies between dictionary elements. We thus consider a tree-structured sparse regulariza- tion to learn dictionaries embedded in a hierar- chy. The involved proximal operator is com- putable exactly via a primal-dual method, allow- ing the use of accelerated gradient techniques. Experiments show that for natural image patches, learned dictionary elements organize themselves in such a hierarchical structure, leading to an im- proved performance for restoration tasks. When applied to text documents, our method learns hi- erarchies of topics, thus providing a competitive alternative to probabilistic topic models.

0 0
·
0 Bookmarks
·
53 Views
• Source
##### Conference Proceeding: Efficient sparse coding algorithms.
[hide abstract]
ABSTRACT: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that cap- ture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1-regularized least squares problem and an L2-constrained least squares problem. We propose novel algorithms to solve both of these optimiza- tion problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field sur- round suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.
Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006; 01/2006
• Source
##### Article: The composite absolute penalties family for grouped and hierarchical variable selection
[hide abstract]
ABSTRACT: Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the $L_1$-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including $L_1$ to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the $L_1$ and $L_{\infty}$ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter is selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p\gg n$ and possibly mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments. Comment: Published in at http://dx.doi.org/10.1214/07-AOS584 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
The Annals of Statistics 09/2009; · 2.53 Impact Factor
• Source
##### Article: Learning the parts of objects by non-negative matrix factorization.
[hide abstract]
ABSTRACT: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.
Nature 11/1999; 401(6755):788-91. · 38.60 Impact Factor