Preprint

Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations


Abstract

Recently, over-parameterized neural networks have been extensively analyzed in the literature. However, previous studies cannot satisfactorily explain why fully trained neural networks are successful in practice. In this paper, we present a new theoretical framework for analyzing over-parameterized neural networks, which we call neural feature repopulation. Our analysis can satisfactorily explain the empirical success of two-level neural networks trained by standard learning algorithms. Our key theoretical result is that, in the limit of an infinite number of hidden neurons, over-parameterized two-level neural networks trained via standard (noisy) gradient descent learn a well-defined feature distribution (population), and under certain conditions this limiting feature distribution is nearly optimal for the underlying learning task. Empirical studies confirm that the predictions of our theory are consistent with results observed in practice.
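To make the setting concrete, below is a minimal sketch (not the paper's code) of the kind of model the abstract refers to: an over-parameterized two-level (one-hidden-layer) network f(x) = (1/m) * sum_j a_j * relu(<w_j, x>) trained with noisy gradient descent on a toy regression task. The first-layer weights w_j play the role of "features", and with many hidden units their empirical distribution is the feature population whose infinite-width limit the paper studies. All hyperparameters, the toy target, and the 1/m scaling convention used here are illustrative assumptions.

# Minimal sketch, assuming a two-level network f(x) = (1/m) * sum_j a_j * relu(<w_j, x>)
# trained with noisy gradient descent on a toy 1-D regression task.
# Illustrative only; hyperparameters and the toy target are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(3x) on inputs augmented with a constant bias coordinate.
n, d, m = 256, 2, 2048                      # samples, input dim (x, bias), hidden units
x1 = rng.uniform(-1.0, 1.0, size=n)
X = np.stack([x1, np.ones(n)], axis=1)      # (n, d)
y = np.sin(3.0 * x1)                        # (n,)

W = rng.normal(size=(m, d))                 # first-layer weights: the "features"
a = rng.normal(size=m)                      # second-layer weights

lr, noise_std, weight_decay, steps = 0.1, 1e-3, 1e-4, 3000

for t in range(steps):
    z = X @ W.T                             # (n, m) pre-activations
    h = np.maximum(z, 0.0)                  # ReLU features
    pred = h @ a / m                        # mean-field output scaling 1/m
    err = pred - y                          # (n,)

    # Per-neuron gradients of 0.5 * MSE + L2 regularization,
    # scaled by m so each neuron moves at O(1) speed despite the 1/m output scaling.
    grad_a = h.T @ err / n + weight_decay * a
    grad_W = ((err[:, None] * (z > 0) * a[None, :]).T @ X) / n + weight_decay * W

    # Noisy gradient descent: plain gradient step plus small Gaussian perturbations.
    a -= lr * grad_a + noise_std * rng.normal(size=a.shape)
    W -= lr * grad_W + noise_std * rng.normal(size=W.shape)

# The learned feature "population": the empirical distribution of the rows of W.
angles = np.arctan2(W[:, 1], W[:, 0])
hist, _ = np.histogram(angles, bins=16, range=(-np.pi, np.pi), density=True)
print("training MSE:", float(np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2)))
print("feature-angle histogram (density):", np.round(hist, 3))

Comparing the feature-angle histogram before and after training is one simple way to see the empirical feature distribution move away from its random Gaussian initialization, which is the kind of feature-learning behavior the abstract's limiting analysis is about.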

