ABSTRACT: We study the connection between the highly non-convex loss function of a
simple model of the fully-connected feed-forward neural network and the
Hamiltonian of the spherical spin-glass model under the assumptions of: i)
variable independence, ii) redundancy in network parametrization, and iii)
uniformity. These assumptions enable us to explain the complexity of the fully
decoupled neural network through the prism of the results from the random
matrix theory. We show that for large-size decoupled networks the lowest
critical values of the random loss function are located in a well-defined
narrow band lower-bounded by the global minimum. Furthermore, they form a
layered structure. We show that the number of local minima outside the narrow
band diminishes exponentially with the size of the network. We empirically
demonstrate that the mathematical model exhibits behavior similar to that of
computer simulations, despite the presence of high dependencies in real
networks. We conjecture that both simulated annealing and SGD converge to the
band containing the largest number of critical points, and that all critical
points found there are local minima and correspond to the same high learning
quality measured by the test error. This emphasizes a major difference between
large- and small-size networks: for the latter, poor-quality local minima
have a non-zero probability of being recovered. Simultaneously, we prove that
recovering the global minimum becomes harder as the network size increases and
that it is in practice irrelevant, as the global minimum often leads to overfitting.
ABSTRACT: These notes cover one of the topics programmed for the St Petersburg School
in Probability and Statistical Physics of June 2012.
The aim is to review recent mathematical developments in the field of random
walks in random environment. Our main focus will be on directionally transient
and reversible random walks on different types of underlying graph structures,
such as $\mathbb{Z}$, trees and $\mathbb{Z}^d$ for $d\geq 2$.
ABSTRACT: We take a first small step to extend the validity of Rudelson-Vershynin type
estimates to some sparse random matrices, here random permutation matrices. We
give lower (and upper) bounds on the smallest singular value of a large random
matrix D+M where M is a random permutation matrix, sampled uniformly, and D is
diagonal. When D is itself random with i.i.d. terms on the diagonal, we obtain a
Rudelson-Vershynin type estimate, using the classical theory of random walks
with negative drift.
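The quantity discussed in the abstract above, the smallest singular value of D+M, can be explored numerically. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and the choice of standard normal diagonal entries for D is an assumption made only for illustration (the abstract says only that the entries are i.i.d.).

```python
import numpy as np


def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix sampled uniformly at random."""
    perm = rng.permutation(n)
    m = np.zeros((n, n))
    # Row i has its single 1 in column perm[i].
    m[np.arange(n), perm] = 1.0
    return m


def smallest_singular_value(n, rng):
    """Smallest singular value of D + M for one random draw.

    Assumption (not from the abstract): the diagonal of D is filled
    with i.i.d. standard normal entries.
    """
    d = np.diag(rng.standard_normal(n))
    m = random_permutation_matrix(n, rng)
    # Singular values are returned in descending order; take the minimum.
    return np.linalg.svd(d + m, compute_uv=False).min()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (50, 100, 200):
        samples = [smallest_singular_value(n, rng) for _ in range(20)]
        print(n, float(np.median(samples)))
```

Comparing the printed medians across several values of n gives a rough empirical picture of how the smallest singular value shrinks with the matrix size; it does not replace the bounds stated in the paper.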
ABSTRACT: We analyze the landscape of general smooth Gaussian functions on the sphere in dimension N, when N is large. We give an explicit formula for the asymptotic complexity of the mean number of critical points of finite and diverging index at any level of energy and for the mean Euler characteristic of level sets. We then find two possible scenarios for the bottom landscape, one that has a layered structure of critical values and a strong correlation between indexes and critical values and another where even at levels below the limiting ground state energy the mean number of local minima is exponentially large. We end the paper by discussing how these results can be interpreted in the language of spin-glass models.
The Annals of Probability 11/2013; 41(6). DOI:10.1214/13-AOP862 · 1.42 Impact Factor
ABSTRACT: We introduce a general model of trapping for random walks on graphs. We give
the possible scaling limits of these "Randomly Trapped Random Walks" on Z.
These scaling limits include the well known Fractional Kinetics process, the
Fontes-Isopi-Newman singular diffusion as well as a new broad class we call
Spatially Subordinated Brownian Motions. We give sufficient conditions for
convergence and illustrate these on two important examples.
The Annals of Probability 02/2013; 43(5). DOI:10.1214/14-AOP939 · 1.42 Impact Factor
ABSTRACT: We give an asymptotic evaluation of the complexity of spherical p-spin
spin-glass models via random matrix theory. This study enables us to obtain
detailed information about the bottom of the energy landscape, including the
absolute minimum (the ground state) and the other local minima, and to describe an
interesting layered structure of the low critical values for the Hamiltonians
of these models. We also show that our approach allows us to compute the
related TAP-complexity and extend the results known in the physics literature.
As an independent tool, we prove an LDP for the k-th largest eigenvalue of the
GOE, extending the results of Ben Arous, Dembo and Guionnet (2001).
Communications on Pure and Applied Mathematics 02/2013; 66(2). DOI:10.1002/cpa.21422 · 3.13 Impact Factor
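For orientation, the objects named in the abstract above can be written down as follows. This is a sketch of the standard setup; the notation ($H_{N,p}$, $J$, $\mathrm{Crt}_N$, $\Theta_p$) is assumed here and does not appear in the abstract itself. The spherical p-spin Hamiltonian is the Gaussian function on the sphere of radius $\sqrt{N}$ given by

$$
H_{N,p}(\sigma) = \frac{1}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N} J_{i_1,\dots,i_p}\, \sigma_{i_1} \cdots \sigma_{i_p},
\qquad \sigma \in S^{N-1}(\sqrt{N}),
$$

where the couplings $J_{i_1,\dots,i_p}$ are independent standard Gaussian random variables. Writing $\mathrm{Crt}_N(u)$ for the number of critical points of $H_{N,p}$ with value at most $Nu$, the complexity is the exponential growth rate of its mean:

$$
\Theta_p(u) = \lim_{N\to\infty} \frac{1}{N} \log \mathbb{E}\, \mathrm{Crt}_N(u).
$$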
ABSTRACT: As a model of trapping by biased motion in random structure, we study the
time taken for a biased random walk to return to the root of a subcritical
Galton-Watson tree. We do so for trees in which these biases are randomly
chosen, independently for distinct edges, according to a law that satisfies a
logarithmic non-lattice condition. The mean return time of the walk is in
essence given by the total conductance of the tree. We determine the asymptotic
decay of this total conductance, finding it to have a pure power-law decay. In
the case of the conductance associated to a single vertex at maximal depth in
the tree, this asymptotic decay may be analysed by the classical defective
renewal theorem, due to the non-lattice edge-bias assumption.
However, the derivation of the decay for total conductance requires computing
an additional constant multiple outside the power-law that allows for the
contribution of all vertices close to the base of the tree. This computation
entails a detailed study of a convenient decomposition of the tree, under
conditioning on the tree having high total conductance. As such, our principal
conclusion may be viewed as a development of renewal theory in the context of
random environments.
For randomly biased random walk on a supercritical Galton-Watson tree with
positive extinction probability, our main results may be regarded as a
description of the slowdown mechanism caused by the presence of subcritical
trees adjacent to the backbone that may act as traps that detain the walker.
Indeed, this conclusion is exploited in \cite{GerardAlan} to obtain a stable
limiting law for walker displacement in such a tree.
Communications on Pure and Applied Mathematics 11/2012; 65(11). DOI:10.1002/cpa.21416 · 3.13 Impact Factor
ABSTRACT: We consider a biased random walk $X_n$ on a Galton–Watson tree with leaves in the sub-ballistic regime. We prove that there exists an explicit constant $\gamma = \gamma(\beta) \in (0, 1)$, depending on the bias $\beta$, such that $|X_n|$ is of order $n^{\gamma}$. Denoting $\Delta_n$ the hitting time of level $n$, we prove that $\Delta_n/n^{1/\gamma}$ is tight. Moreover, we show that $\Delta_n/n^{1/\gamma}$ does not converge in law (at least for large values of $\beta$). We prove that along the sequences $n_{\lambda}(k) = \lfloor \lambda \beta^{\gamma k} \rfloor$, $\Delta_n/n^{1/\gamma}$ converges to certain infinitely divisible laws. Key tools for the proof are the classical Harris decomposition for Galton–Watson trees, a new variant of regeneration times and the careful analysis of triangular arrays of i.i.d. heavy-tailed random variables.
The Annals of Probability 01/2012; 40(1). DOI:10.1214/10-AOP620 · 1.42 Impact Factor
ABSTRACT: We study the many-body quantum evolution of bosonic systems in the mean-field
limit. The dynamics is known to be well approximated by the Hartree equation.
So far, the available results have the form of a law of large numbers. In this
paper we go one step further and we show that the fluctuations around the
Hartree evolution satisfy a central limit theorem. Interestingly, the variance
of the limiting Gaussian distribution is determined by a time-dependent
Bogoliubov transformation describing the dynamics of initial coherent states in
a Fock space representation of the system.
ABSTRACT: The speed $v(\beta)$ of a $\beta$-biased random walk on a Galton-Watson tree
without leaves is increasing for $\beta \geq 717$.
ABSTRACT: We analyze the landscape of general smooth Gaussian functions on the sphere
in dimension N, when N is large. We give an explicit formula for the asymptotic
complexity of the mean number of critical points of finite and diverging index
at any level of energy and for the mean Euler characteristic of level sets. We
then find two possible scenarios for the bottom energy landscape, one that has
a layered structure of critical values and a strong correlation between indexes
and critical values and another where even at energy levels below the limiting
ground state energy the mean number of local minima is exponentially large.
These two scenarios should correspond to the distinction between one-step
replica symmetry breaking and full replica symmetry breaking of the physics
literature on spin glasses. In the former, we find a new way to derive the
asymptotic complexity function as a function of the 1RSB Parisi functional.
ABSTRACT: We prove the Einstein relation, relating the velocity under a small
perturbation to the diffusivity in equilibrium, for certain biased random walks
on Galton-Watson trees. This provides the first example where the Einstein
relation is proved for motion in random media with arbitrarily deep traps.
Annales de l'Institut Henri Poincaré Probabilités et Statistiques 06/2011; 49(3). DOI:10.1214/12-AIHP486 · 1.06 Impact Factor
ABSTRACT: Smooth linear statistics of random permutation matrices, sampled under a
general Ewens distribution, exhibit an interesting non-universality phenomenon.
Though they have bounded variance, their fluctuations are asymptotically
non-Gaussian but infinitely divisible. The fluctuations are asymptotically
Gaussian for less smooth linear statistics for which the variance diverges. The
degree of smoothness is measured in terms of the quality of the trapezoidal
approximations of the integral of the observable.
Annales de l'Institut Henri Poincaré Probabilités et Statistiques 06/2011; 51(2). DOI:10.1214/13-AIHP569 · 1.06 Impact Factor
ABSTRACT: This is a brief survey of some of the important results in the study of the eigenvalues and the eigenvectors of Wigner random matrices, i.e. random Hermitian (or real symmetric) matrices with i.i.d. entries. We briefly review the known universality results, which show how insensitive the behavior of the spectrum is to the distribution of the entries.
ABSTRACT: We consider Random Hopping Time (RHT) dynamics of the Sherrington-Kirkpatrick (SK) model and p-spin models of spin glasses. For any of these models and for any inverse temperature we prove that, on time scales that are sub-exponential in the dimension, the properly scaled clock process (time-change process) of the dynamics converges to an extremal process. Moreover, on these time scales, the system exhibits aging-like behavior, which we call extremal aging. In other words, the dynamics of these models ages as the random energy model (REM) does. Hence, by extension, this confirms Bouchaud's REM-like trap model as a universal aging mechanism for a wide range of systems which, for the first time, includes the SK model.
ABSTRACT: This paper studies the extreme gaps between eigenvalues of random matrices.
We give the joint limiting law of the smallest gaps for Haar-distributed
unitary matrices and matrices from the Gaussian unitary ensemble. In
particular, the kth smallest gap, normalized by a factor $n^{-4/3}$, has a
limiting density proportional to $x^{3k-1}e^{-x^3}$. Concerning the largest
gaps, normalized by $n/\sqrt{\log n}$, they converge in ${\mathrm{L}}^p$ to a
constant for all $p>0$. These results are compared with the extreme gaps
between zeros of the Riemann zeta function.
The Annals of Probability 10/2010; 41(4). DOI:10.1214/11-AOP710 · 1.42 Impact Factor
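The $n^{-4/3}$ scale of the smallest gaps mentioned in the abstract above can be examined by simulation in the Haar-unitary case. The sketch below is illustrative only: the function names are hypothetical, and the sampler uses the common QR-with-phase-correction recipe for Haar-distributed unitary matrices, which is a choice made here and is not taken from the paper.

```python
import numpy as np


def haar_unitary(n, rng):
    """Sample an n x n unitary matrix from Haar measure.

    Uses the QR decomposition of a complex Ginibre matrix, with each
    column of Q multiplied by the phase of the matching diagonal entry
    of R so that the resulting distribution is Haar.
    """
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))


def eigenangle_gaps(u):
    """Gaps between consecutive eigenangles of u, including the wrap-around gap."""
    angles = np.sort(np.angle(np.linalg.eigvals(u)))
    gaps = np.diff(angles)
    wrap = 2 * np.pi - (angles[-1] - angles[0])
    return np.append(gaps, wrap)


def k_smallest_gaps(n, k, rng):
    """The k smallest eigenangle gaps of one Haar unitary sample, ascending."""
    return np.sort(eigenangle_gaps(haar_unitary(n, rng)))[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (100, 200, 400):
        smallest = [k_smallest_gaps(n, 1, rng)[0] for _ in range(20)]
        # Rescale by n^(4/3), i.e. divide by the n^(-4/3) normalization.
        print(n, float(np.median(smallest)) * n ** (4 / 3))
```

If the $n^{-4/3}$ normalization is the right one, the rescaled medians should stay of comparable size as n grows; the constants depend on the exact normalization conventions, which this sketch does not attempt to match.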
ABSTRACT: We report here on the recent works [3] and [4]. There we consider a model of diffusion in random media with a two-way coupling (i.e. a model in which the randomness of the medium influences the diffusing particles, and where the diffusing particles change the medium). In this particular model, particles are injected at the origin with a time-dependent rate, and diffuse among random traps. Each trap has a finite (random) depth, so that when it has absorbed a finite (random) number of particles it is "saturated", and it no longer acts as a trap. Related models have been studied recently by Gravner and Quastel [10] and by Funaki [9] using hydrodynamic limit tools. We compute the asymptotic behaviour of the probability of survival of a particle born at some given time, both in the annealed and quenched cases, and show that three different situations occur depending on the injection rate. For weak injection, the typical survival strategy of the particle is as in Sznitman [16], and this survival probability behaves asymptotically as if there were no saturation effect. For medium injection rates, the picture is closer to that of Internal DLA, as given by Lawler, Bramson and Griffeath [13]. For large injection rates, the picture is less understood except in dimension one.