## About

30

Publications

2,189

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

202

Citations

Citations since 2016

Introduction

**Skills and Expertise**

## Publications

Publications (30)

This book presents a unified theory of random matrices for applications in machine learning, offering a large-dimensional data vision that exploits concentration and universality phenomena. This enables a precise understanding, and possible improvements, of the core mechanisms at play in real-world machine learning algorithms. The book opens with a...

This book presents a unified theory of random matrices for applications in machine learning, offering a large-dimensional data vision that exploits concentration and universality phenomena. This enables a precise understanding, and possible improvements, of the core mechanisms at play in real-world machine learning algorithms. The book opens with a...

This chapter discusses the fundamentally different mental images of large-dimensional machine learning (versus its small-dimensional counterpart), through the examples of sample covariance matrices and kernel matrices, on both synthetic and real data. Random matrix theory is presented as a flexible and powerful tool to assess, understand, and impro...

This chapter discusses the generalized linear classifier that results from convex optimization problem and takes in general nonexplicit form. Random matrix theory is combined with leave-one-out arguments to handle the technical difficulty due to implicity. Again, counterintuitive phenomena arise in popular machine learning methods such as logistic...

This chapter covers the basics of random matrix theory, within the unified framework of resolvent- and deterministic-equivalent approach. Historical and foundational random matrix results are presented in the proposed framework, together with heuristic derivations as well as detailed proofs. Topics such as statistical inference and spiked models ar...

This book presents a unified theory of random matrices for applications in machine learning, offering a large-dimensional data vision that exploits concentration and universality phenomena. This enables a precise understanding, and possible improvements, of the core mechanisms at play in real-world machine learning algorithms. The book opens with a...

This chapter discusses the fundamental kernel methods, with applications to supervised (kernel ridge regression or LS-SVM), semi-supervised (graph Laplacian-based learning), or unsupervised learning (such as kernel spectral clustering) schemes. By focusing on the typical examples of distance and inner-product-type kernels, we show how large-dimensi...

In this paper, we propose a geometric framework to analyze the convergence properties of gradient descent trajectories in the context of linear neural networks. We translate a well-known empirical observation of linear neural nets into a conjecture that we call the overfitting conjecture which states that, for almost all training data and initial c...

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples n , their dimension p , and the dimension of feature space N are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel ma...

In this article, we investigate the spectral behavior of random features kernel matrices of the type ${\bf K} = \mathbb{E}_{{\bf w}} \left[\sigma\left({\bf w}^{\sf T}{\bf x}_i\right)\sigma\left({\bf w}^{\sf T}{\bf x}_j\right)\right]_{i,j=1}^n$, with nonlinear function $\sigma(\cdot)$, data ${\bf x}_1, \ldots, {\bf x}_n \in \mathbb{R}^p$, and random...

Given an optimization problem, the Hessian matrix and its eigenspectrum can be used in many ways, ranging from designing more efficient second-order algorithms to performing model analysis and regression diagnostics. When nonlinear models and non-convex problems are considered, strong simplifying assumptions are often made to make Hessian spectral...

For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimizati...

In this paper, we provide a precise characterize of generalization properties of high dimensional kernel ridge regression across the under- and over-parameterized regimes, depending on whether the number of training data $n$ exceeds the feature dimension $d$. By establishing a novel bias-variance decomposition of the expected excess risk, we show t...

Given a large data matrix, sparsifying, quantizing, and/or performing other entry-wise nonlinear operations can have numerous benefits, ranging from speeding up iterative algorithms for core numerical linear algebra problems to providing nonlinear filters to design state-of-the-art neural network models. Here, we exploit tools from random matrix th...

It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are t...

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kerne...

Large dimensional data and learning systems are ubiquitous in modern machine learning. As opposed to small dimensional learning, large dimensional machine learning algorithms are prone to various counterintuitive phenomena and behave strikingly differently from the low dimensional intuitions upon which they are built. Nonetheless, by assuming the d...

This article investigates the eigenspectrum of the inner product-type kernel matrix $\sqrt{p} \mathbf{K}=\{f( \mathbf{x}_i^{\sf T} \mathbf{x}_j/\sqrt{p})\}_{i,j=1}^n $ under a binary mixture model in the high dimensional regime where the number of data $n$ and their dimension $p$ are both large and comparable. Based on recent advances in random mat...

In this article, we investigate a family of classification algorithms defined by the principle of empirical risk minimization, in the high dimensional regime where the feature dimension $p$ and data number $n$ are both large and comparable. Based on recent advances in high dimensional statistics and random matrix theory, we provide under mixture da...

In this article we present a geometric framework to analyze convergence of gradient descent trajectories in the context of neural networks. In the case of linear networks of an arbitrary number of hidden layers, we characterize appropriate quantities which are conserved along the gradient descent system (GDS). We use them to prove boundedness of ev...

Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today. In this paper, we introduce a random matrix-based framework to analyze the learning dynamics of a single-layer linear network on a bin...

Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper, we leverage the "concentration" phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feat...

This article proposes a performance analysis of kernel least squares support vector machines (LS-SVMs) based on a random matrix approach, in the regime where both the dimension of data $p$ and their number $n$ grow large at the same rate. Under a two-class Gaussian mixture model for the input data, we prove that the LS-SVM decision function is asym...

This article studies the Gram random matrix model $G=\frac1T\Sigma^{\rm T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in random neural networks, where $X=[x_1,\ldots,x_T]\in\mathbb{R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in\mathbb{R}^{n\times p}$ is a matrix of independent zero-mean unit variance entries, and $\sigma:\mathbb{R}\t...

In this article, a large dimensional performance analysis of kernel least squares support vector machines (LS-SVMs) is provided under the assumption of a two-class Gaussian mixture model for the input data. Building upon recent advances in random matrix theory, we show, when the dimension of data
$p$
and their number
$n$
are both large, that the...