Conference Paper

# Proximal Methods for Sparse Hierarchical Dictionary Learning

Conference: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel
Source: DBLP

ABSTRACT

We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. On the other hand, recent work on structured sparsity provides a natural framework for modeling dependencies between dictionary elements. We thus consider a tree-structured sparse regularization to learn dictionaries embedded in a hierarchy. The involved proximal operator is computable exactly via a primal-dual method, allowing the use of accelerated gradient techniques. Experiments show that for natural image patches, learned dictionary elements organize themselves in such a hierarchical structure, leading to an improved performance for restoration tasks. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
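As an illustration of the exactly computable proximal operator mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the tree-structured penalty is a weighted sum of ℓ2 norms over groups, where each group contains a node and all of its descendants; under that assumption the proximal operator can be obtained in a single pass of group soft-thresholding that visits nested groups before the groups containing them. The function names, the toy three-node tree, and the weights are all hypothetical.

```python
import numpy as np


def group_soft_threshold(v, idx, thresh):
    """Shrink the sub-vector v[idx] toward zero in l2 norm, in place."""
    norm = np.linalg.norm(v[idx])
    if norm <= thresh:
        v[idx] = 0.0                   # the whole group is zeroed
    else:
        v[idx] *= 1.0 - thresh / norm  # the group is scaled down


def tree_prox(u, groups, lam):
    """Sketch of the prox of lam * sum_g weight_g * ||w_g||_2.

    Assumption: `groups` is ordered so that every group appears after
    all groups nested inside it (leaves first, root last).
    """
    v = np.asarray(u, dtype=float).copy()
    for idx, weight in groups:
        group_soft_threshold(v, idx, lam * weight)
    return v


# Hypothetical tree: node 0 is the root, nodes 1 and 2 are its children.
# Each group is (indices of a node and its descendants, weight).
groups = [
    ([1], 1.0),        # leaf
    ([2], 1.0),        # leaf
    ([0, 1, 2], 1.0),  # root together with its descendants
]

u = np.array([3.0, 0.5, 4.0])
w = tree_prox(u, groups, lam=1.0)
print(w)
```

In this toy example the small coefficient on node 1 is set to zero while the root stays active, which is consistent with the hierarchical pattern the penalty is meant to encourage: a node can be nonzero only if its ancestors are.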

### Full-text

Available from: Francis Bach
• "To avoid the non-convex optimization problem incurred by $\ell^{0}$-norm, most of the sparse subspace clustering or sparse graph based clustering methods use $\ell^{1}$-norm [26] [4] [7] [8] [28] or $\ell^{2}$-norm with thresholding [20] to impose the sparsity on the constructed similarity graph. In addition, $\ell^{1}$-norm has been widely used as a convex relaxation of $\ell^{0}$-norm for efficient sparse coding algorithms [11] [14] [15]. On the other hand, sparse representation methods such as [16] that directly optimize objective function involving $\ell^{0}$-norm demonstrate compelling performance compared to its $\ell^{1}$-norm counterpart."
##### Research: Learning with L0-Graph: L0-Induced Sparse Subspace Clustering
DESCRIPTION: Sparse subspace clustering methods, such as Sparse Subspace Clustering (SSC) \cite{ElhamifarV13} and $\ell^{1}$-graph \cite{YanW09,ChengYYFH10}, are effective in partitioning the data that lie in a union of subspaces. Most of those methods use $\ell^{1}$-norm or $\ell^{2}$-norm with thresholding to impose the sparsity of the constructed sparse similarity graph, and certain assumptions, e.g. independence or disjointness, on the subspaces are required to obtain the subspace-sparse representation, which is the key to their success. Such assumptions are not guaranteed to hold in practice and they limit the application of sparse subspace clustering on subspaces with general location. In this paper, we propose a new sparse subspace clustering method named $\ell^{0}$-graph. In contrast to the required assumptions on subspaces for most existing sparse subspace clustering methods, it is proved that subspace-sparse representation can be obtained by $\ell^{0}$-graph for arbitrary distinct underlying subspaces almost surely under the mild i.i.d. assumption on the data generation. We develop a proximal method to obtain the sub-optimal solution to the optimization problem of $\ell^{0}$-graph with proved guarantee of convergence. Moreover, we propose a regularized $\ell^{0}$-graph that encourages nearby data to have similar neighbors so that the similarity graph is more aligned within each cluster and the graph connectivity issue is alleviated. Extensive experimental results on various data sets demonstrate the superiority of $\ell^{0}$-graph compared to other competing clustering methods, as well as the effectiveness of regularized $\ell^{0}$-graph.
Full-text · Research · Dec 2015
• "Dictionary learning finds a low rank factored-matrix approximation to the observation matrix, whose columns span this collection. Many different methods for this approximation problem have been proposed [23] [24]. Among the simplest methods is the K-SVD approach [25] which uses a spectral decomposition to determine the best low rank approximation to the observed matrix. "
##### Article: Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion Detection
ABSTRACT: Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the Fiedler vector of centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles of graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.
Full-text · Article · Dec 2015
• "To avoid the non-convex optimization problem incurred by $\ell^{0}$-norm, most of the sparse graph based methods [19] [4] [8] [9] [21] [22] replaces $\ell^{0}$-norm with $\ell^{1}$-norm so as to solve a convex optimization problem. In addition, $\ell^{1}$-norm has been widely used as a convex relaxation of $\ell^{0}$-norm for efficient sparse coding algorithms [12] [13] [14]. [9] points out that in case that the data are drawn from linear independent subspaces, sparse representation by $\ell^{1}$-norm can recover the underlying subspaces."
##### Article: Learning $\ell^{0}$-Graph for Data Clustering
ABSTRACT: $\ell^{1}$-graph \cite{YanW09,ChengYYFH10}, a sparse graph built by reconstructing each datum with all the other data using sparse representation, has been demonstrated to be effective in clustering high dimensional data and recovering independent subspaces from which the data are drawn. It is well known that $\ell^{1}$-norm used in $\ell^{1}$-graph is a convex relaxation of $\ell^{0}$-norm for enforcing the sparsity. In order to handle general cases when the subspaces are not independent and follow the original principle of sparse representation, we propose a novel $\ell^{0}$-graph that employs $\ell^{0}$-norm to encourage the sparsity of the constructed graph, and develop a proximal method to solve the associated optimization problem with the proved guarantee of convergence. Extensive experimental results on various data sets demonstrate the superiority of $\ell^{0}$-graph compared to other competing clustering methods including $\ell^{1}$-graph.
Preview · Article · Oct 2015