Proximal Methods for Sparse Hierarchical Dictionary Learning
We have shown in this paper that tree-structured sparse decomposition problems can be solved at the same computational cost as classical decompositions based on the ℓ1-norm. We have used this approach to learn dictionaries embedded in trees, with applications to the representation of natural image patches and text documents.
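The computational claim above rests on the fact that the proximal operator of the tree-structured norm can be computed in a single pass of group soft-thresholdings, visiting each group of variables with children processed before their ancestors. The sketch below is a minimal illustration of that idea under the assumption of ℓ2 group norms; the function names and the toy tree are hypothetical, not the paper's implementation.

```python
import numpy as np

def group_soft_threshold(v, idx, lam):
    """Shrink the coordinates of v indexed by idx by l2 soft-thresholding."""
    norm = np.linalg.norm(v[idx])
    scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    v[idx] = scale * v[idx]

def tree_prox(v, groups, lam):
    """Proximal operator of a tree-structured l1/l2 norm.

    `groups` lists (coordinate indices, weight) for each node of the tree,
    ordered so that every group appears before the groups containing it
    (children before ancestors); one pass then suffices.
    """
    v = v.copy()
    for idx, weight in groups:
        group_soft_threshold(v, idx, lam * weight)
    return v

# Toy tree over 3 variables: two leaves {0} and {1}, root {0, 1, 2}.
groups = [([0], 1.0), ([1], 1.0), ([0, 1, 2], 1.0)]
u = np.array([3.0, 0.5, 2.0])
print(tree_prox(u, groups, lam=0.5))
```

Each group costs time linear in its size, so the whole pass is linear in the total size of the groups, matching the cost of one pass of scalar soft-thresholding in the plain ℓ1 case.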
We believe that the connection established between sparse methods and probabilistic topic models should prove fruitful, as the two lines of work have focused on different aspects of the same unsupervised learning problem: our approach is based on convex optimization tools and experimentally provides more stable data representations. Moreover, it can easily be extended with the same tools to other types of structures corresponding to other norms (Jenatton et al., 2009; Jacob et al., 2009). However, it cannot elegantly and automatically learn model parameters such as the dictionary size or the tree topology, which Bayesian methods can. Finally, another interesting common line of research to pursue is the supervised design of dictionaries, which has been proved useful in both frameworks (Mairal et al., 2009; Blei & McAuliffe, 2008).
Acknowledgments
This paper was partially supported by grants from the Agence Nationale de la Recherche (MGA Project) and from the European Research Council (SIERRA Project).
References
Bach, F. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report.
Beck, A. and Teboulle, M. A fast iterative shrinkage-
thresholding algorithm for linear inverse problems.
SIAM J. Imag. Sci., 2(1):183–202, 2009.
Bengio, S., Pereira, F., Singer, Y., and Strelow, D. Group
sparse coding. In Adv. NIPS, 2009.
Bengio, Y. Learning deep architectures for AI. Foundations
and Trends in Machine Learning, 2(1), 2009.
Bertsekas, D. P. Nonlinear programming. Athena Scientific.
Blei, D., Ng, A., and Jordan, M. Latent Dirichlet allocation.
J. Mach. Learn. Res., 3:993–1022, 2003.
Blei, D. and McAuliffe, J. Supervised topic models. In
Adv. NIPS, 2008.
Blei, D., Griffiths, T., and Jordan, M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 2010.
Boyd, S. P. and Vandenberghe, L. Convex optimization.
Cambridge University Press, 2004.
Buntine, W. L. Variational extensions to EM and multinomial PCA. In Proc. ECML, 2002.
Elad, M. and Aharon, M. Image denoising via sparse
and redundant representations over learned dictionaries.
IEEE Trans. Image Process., 15(12):3736–3745, 2006.
Jacob, L., Obozinski, G., and Vert, J.-P. Group Lasso with
overlap and graph Lasso. In Proc. ICML, 2009.
Jenatton, R., Audibert, J.-Y., and Bach, F. Structured vari-
able selection with sparsity-inducing norms. Technical
report, arXiv:0904.3523, 2009.
Jenatton, R., Obozinski, G., and Bach, F. Structured sparse
principal component analysis. In Proc. AISTATS, 2010.
Ji, S. and Ye, J. An accelerated gradient method for trace norm minimization. In Proc. ICML, 2009.
Kavukcuoglu, K., Ranzato, M., Fergus, R., and LeCun, Y. Learning invariant features through topographic filter maps. In Proc. CVPR, 2009.
Kim, S. and Xing, E. P. Tree-guided group lasso for multi-task regression with structured sparsity. Technical report.
Lee, D. D. and Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
Lee, H., Battle, A., Raina, R., and Ng, A. Y. Efficient sparse
coding algorithms. In Adv. NIPS, 2007.
Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. Supervised dictionary learning. In Adv. NIPS, 2009.
Mairal, J., Bach, F., Ponce, J., and Sapiro, G. Online learn-
ing for matrix factorization and sparse coding. J. Mach.
Learn. Res., 11:19–60, 2010.
Nesterov, Y. Gradient methods for minimizing composite
objective function. Technical report, CORE, 2007.
Olshausen, B. A. and Field, D. J. Sparse coding with an
overcomplete basis set: A strategy employed by V1? Vi-
sion Research, 37:3311–3325, 1997.
Zhao, P., Rocha, G., and Yu, B. The composite absolute
penalties family for grouped and hierarchical variable se-
lection. Ann. Stat., 37(6A):3468–3497, 2009.
Zhu, J., Ahmed, A., and Xing, E. P. MedLDA: maximum margin supervised topic models for regression and classification. In Proc. ICML, 2009.