Proximal Methods for Sparse Hierarchical Dictionary Learning

5. Discussion

We have shown in this paper that tree-structured sparse decomposition problems can be solved at the same computational cost as classical decompositions based on the ℓ1 norm. We have used this approach to learn dictionaries embedded in trees, with application to the representation of natural image patches and text documents.
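The computational claim above rests on a composition property: for ℓ2-groups that are hierarchically nested along a tree, the proximal operator of the tree-structured norm can be obtained by a single leaves-to-root pass of elementary group soft-thresholdings, i.e. at essentially the same cost as ℓ1 soft-thresholding. A minimal sketch of this idea (the function name and the `(indices, weight)` group encoding are our own illustrative choices, not from the paper):

```python
import numpy as np

def prox_tree_l1l2(u, groups, lam):
    """Proximal operator of a tree-structured sum of weighted l2 norms.

    `groups` is a list of (indices, weight) pairs ordered so that every
    group appears before any group containing it (leaves to root).  For
    such hierarchically nested groups, composing the elementary group
    soft-thresholdings in this order yields the exact prox.
    """
    v = np.asarray(u, dtype=float).copy()
    for idx, w in groups:
        norm = np.linalg.norm(v[idx])
        # Group soft-thresholding: shrink the group's l2 norm by lam * w.
        scale = max(0.0, 1.0 - lam * w / norm) if norm > 0 else 0.0
        v[idx] = scale * v[idx]
    return v

# Toy tree on 3 variables: leaf groups {1} and {2}, root group {0, 1, 2}.
groups = [([1], 1.0), ([2], 1.0), ([0, 1, 2], 1.0)]
print(prox_tree_l1l2([3.0, 4.0, 0.0], groups, lam=1.0))
```

The leaves-to-root ordering is essential: applying the same shrinkages in a different order would not in general give the proximal operator of the combined norm.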

We believe that the connection established between sparse methods and probabilistic topic models should prove fruitful, as the two lines of work have focused on different aspects of the same unsupervised learning problem: our approach is based on convex optimization tools and experimentally provides more stable data representations. Moreover, it can easily be extended with the same tools to other types of structures corresponding to other norms (Jenatton et al., 2009; Jacob et al., 2009). However, unlike Bayesian methods, it cannot elegantly and automatically learn model parameters such as the dictionary size or the tree topology. Finally, another interesting common line of research to pursue is the supervised design of dictionaries, which has proved useful in both frameworks (Mairal et al., 2009; Blei & McAuliffe, 2008).

Acknowledgments

This paper was partially supported by grants from the Agence Nationale de la Recherche (MGA Project) and from the European Research Council (SIERRA Project).

References

Bach, F. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report, arXiv:0909.0844, 2009.

Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci., 2(1):183–202, 2009.

Bengio, S., Pereira, F., Singer, Y., and Strelow, D. Group sparse coding. In Adv. NIPS, 2009.

Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 2009.

Bertsekas, D. P. Nonlinear programming. Athena Scientific, Belmont, 1999.

Blei, D., Ng, A., and Jordan, M. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, 2003.

Blei, D. and McAuliffe, J. Supervised topic models. In Adv. NIPS, 2008.

Blei, D., Griffiths, T., and Jordan, M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 2010.

Boyd, S. P. and Vandenberghe, L. Convex optimization. Cambridge University Press, 2004.

Buntine, W. L. Variational extensions to EM and multinomial PCA. In Proc. ECML, 2002.

Elad, M. and Aharon, M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736–3745, 2006.

Jacob, L., Obozinski, G., and Vert, J.-P. Group Lasso with overlap and graph Lasso. In Proc. ICML, 2009.

Jenatton, R., Audibert, J.-Y., and Bach, F. Structured variable selection with sparsity-inducing norms. Technical report, arXiv:0904.3523, 2009.

Jenatton, R., Obozinski, G., and Bach, F. Structured sparse principal component analysis. In Proc. AISTATS, 2010.

Ji, S. and Ye, J. An accelerated gradient method for trace norm minimization. In Proc. ICML, 2009.

Kavukcuoglu, K., Ranzato, M., Fergus, R., and LeCun, Y. Learning invariant features through topographic filter maps. In Proc. CVPR, 2009.

Kim, S. and Xing, E. P. Tree-guided group lasso for multi-task regression with structured sparsity. Technical report, arXiv:0909.1373, 2009.

Lee, D. D. and Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

Lee, H., Battle, A., Raina, R., and Ng, A. Y. Efficient sparse coding algorithms. In Adv. NIPS, 2007.

Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. Supervised dictionary learning. In Adv. NIPS, 2009.

Mairal, J., Bach, F., Ponce, J., and Sapiro, G. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res., 11:19–60, 2010.

Nesterov, Y. Gradient methods for minimizing composite objective function. Technical report, CORE, 2007.

Olshausen, B. A. and Field, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

Zhao, P., Rocha, G., and Yu, B. The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat., 37(6A):3468–3497, 2009.

Zhu, J., Ahmed, A., and Xing, E. P. MedLDA: maximum margin supervised topic models for regression and classification. In Proc. ICML, 2009.