About
60
Publications
5,381
Reads
441
Citations
Publications (60)
Let $X_{1},\ldots,X_{n}$ be an i.i.d. sample in $\mathbb{R}^{p}$ with zero mean and covariance matrix $\mathbf{\Sigma}$. The problem of recovering the projector onto an eigenspace of $\mathbf{\Sigma}$ from these observations naturally arises in many applications. A recent technique from [Koltchinskii, Lounici, 2015] helps to study the asymptotic dis...
We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ with upper triangular entries being i.i.d. random variables with mean zero and unit variance. We additionally suppose that $\mathbb E |X_{11}|^{4 + \delta} =: \mu_{4+\delta} < \infty$ for some $\delta > 0$. The aim of this paper is to significantly extend recent result of the auth...
We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ in which
the upper triangular entries are independent identically distributed random
variables with mean zero and unit variance. We additionally suppose that
$\mathbb E |X_{11}|^{4 + \delta} =: \mu_4 < \infty$ for some $\delta > 0$.
Under these conditions we show that the typical...
We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ with
upper triangular entries being independent identically distributed random
variables with mean zero and unit variance. We additionally suppose that
$\mathbb E |X_{11}|^{4 + \delta} =: \mu_{4+\delta} < C$ for some $\delta > 0$
and some absolute constant $C$. Under these conditi...
In this paper we consider the product of two independent random matrices $\mathbb X^{(1)}$ and $\mathbb X^{(2)}$. Assume that $X_{jk}^{(q)}, 1 \le j,k \le n, q = 1, 2,$ are i.i.d. random variables with $\mathbb E X_{jk}^{(q)} = 0, \mathbb E (X_{jk}^{(q)})^2 = 1$. Denote by $s_1, \ldots, s_n$ the singular values of $\mathbb W: = \frac{1}{n} \mathbb X^{...
In this paper, we establish non-asymptotic convergence rates in the central limit theorem for Polyak-Ruppert-averaged iterates of stochastic gradient descent (SGD). Our analysis builds on the result of the Gaussian approximation for nonlinear statistics of independent random variables of Shao and Zhang (2022). Using this result, we prove the non-as...
We address the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested to combine the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation technique to reduce the asymptotic bias of SGD at the expense of a mild i...
In this paper, we introduce a novel approach for bounding the cumulant generating function (CGF) of a Dirichlet process (DP) $X \sim \text{DP}(\alpha \nu_0)$, using superadditivity. In particular, our key technical contribution is the demonstration of the superadditivity of $\alpha \mapsto \log \mathbb{E}_{X \sim \text{DP}(\alpha \nu_0)}[\exp( \mat...
Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insight...
In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Our findings reveal that the fastest rate of normal approximation is achieved when setting the most aggressive step size $\alpha_{k} \asymp k...
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased obse...
In this paper, we establish moment and Bernstein-type inequalities for additive functionals of geometrically ergodic Markov chains. These inequalities extend the corresponding inequalities for independent random variables. Our conditions cover Markov chains converging geometrically to the stationary distribution either in weighted total variation n...
In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis method...
In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance und...
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are...
We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinfor...
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$, for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only...
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural netw...
We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For B...
We develop an Explore-Exploit Markov chain Monte Carlo algorithm ($\operatorname{Ex^2MCMC}$) that combines multiple global proposals and local moves. The proposed method is massively parallelizable and extremely computationally efficient. We prove $V$-uniform geometric ergodicity of $\operatorname{Ex^2MCMC}$ under realistic conditions and compute e...
In this paper, we establish moment and Bernstein-type inequalities for additive functionals of geometrically ergodic Markov chains. These inequalities extend the corresponding inequalities for independent random variables. Our conditions cover Markov chains converging geometrically to the stationary distribution either in $V$-norms or in weighted W...
Two-sided bounds are constructed for a probability density function of a weighted sum of chi-square variables. Both cases of central and non-central chi-square variables are considered. The upper and lower bounds have the same dependence on the parameters of the sum and differ only in absolute constants. The estimates obtained will be useful, in pa...
This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A}\theta = \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $...
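As a rough illustration of the fixed-step LSA recursion described above, the following sketch runs $\theta_{k+1} = \theta_k - \alpha (A_k \theta_k - b_k)$ on a toy 2-dimensional system and keeps a Polyak-Ruppert running average of the iterates. The matrix `a_bar`, the vector `b_bar`, the Gaussian noise level `0.1`, and the function name `lsa_polyak_ruppert` are all assumptions made for illustration; they do not come from the paper, and the i.i.d. noise model is only the simplest setting one might consider.

```python
import random


def lsa_polyak_ruppert(n_steps, step, seed=0):
    """Fixed-step LSA with Polyak-Ruppert averaging on a toy 2-d system."""
    rng = random.Random(seed)
    # Hypothetical target system a_bar * theta = b_bar (illustrative values).
    a_bar = [[2.0, 0.5], [0.5, 1.0]]
    b_bar = [1.0, -1.0]
    theta = [0.0, 0.0]
    avg = [0.0, 0.0]
    for k in range(1, n_steps + 1):
        # Unbiased noisy observations (a_k, b_k) of (a_bar, b_bar); i.i.d. noise is assumed.
        a_k = [[a_bar[i][j] + 0.1 * rng.gauss(0.0, 1.0) for j in range(2)]
               for i in range(2)]
        b_k = [b_bar[i] + 0.1 * rng.gauss(0.0, 1.0) for i in range(2)]
        # LSA update: theta <- theta - step * (a_k theta - b_k).
        resid = [a_k[i][0] * theta[0] + a_k[i][1] * theta[1] - b_k[i]
                 for i in range(2)]
        theta = [theta[i] - step * resid[i] for i in range(2)]
        # Running (Polyak-Ruppert) average of the iterates.
        avg = [avg[i] + (theta[i] - avg[i]) / k for i in range(2)]
    return theta, avg


if __name__ == "__main__":
    last, averaged = lsa_polyak_ruppert(20000, 0.1, seed=1)
    # Exact solution of the toy system is (1.5 / 1.75, -2.5 / 1.75).
    print("last iterate:", last)
    print("averaged iterate:", averaged)
```

In this toy setting the last iterate keeps fluctuating at a scale governed by the step size, while the averaged iterate settles much closer to the exact solution.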
In this work we present an approach for building tight model-free confidence intervals for the optimal value function $V^\star$ in general infinite horizon MDPs via the upper solutions. We suggest a novel upper value iterative procedure (UVIP) to construct upper solutions for a given agent's policy. UVIP leads to a model free method of policy evalu...
In this paper we propose a novel and practical variance reduction approach for additive functionals of dependent sequences. Our approach combines the use of control variates with the minimization of an empirical variance estimate. We analyze finite sample properties of the proposed method and derive finite-time bounds of the excess asymptotic varia...
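The idea of choosing a control variate by minimizing an empirical variance can be sketched in a much simpler setting than the one the paper treats. The example below uses i.i.d. Gaussian samples (not a dependent sequence), a single linear control variate $g(x) = x$ with known mean zero, and the plain sample variance as the objective; these simplifications, the target $f(x) = e^x$, and the function name `control_variate_estimate` are assumptions for illustration only.

```python
import math
import random


def control_variate_estimate(n, seed=0):
    """Estimate E[exp(X)], X ~ N(0, 1), using the control variate g(x) = x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    f = [math.exp(x) for x in xs]
    g = xs  # control variate with known mean 0

    f_bar = sum(f) / n
    g_bar = sum(g) / n
    cov_fg = sum((fi - f_bar) * (gi - g_bar) for fi, gi in zip(f, g)) / n
    var_g = sum((gi - g_bar) ** 2 for gi in g) / n

    # Coefficient minimizing the empirical variance of f - c * g.
    c = cov_fg / var_g
    adjusted = [fi - c * gi for fi, gi in zip(f, g)]

    adj_bar = sum(adjusted) / n
    var_f = sum((fi - f_bar) ** 2 for fi in f) / n
    var_adj = sum((ai - adj_bar) ** 2 for ai in adjusted) / n
    return adj_bar, var_f, var_adj


if __name__ == "__main__":
    estimate, var_plain, var_cv = control_variate_estimate(10000, seed=0)
    print("estimate:", estimate, "exact:", math.exp(0.5))
    print("sample variance without / with control variate:", var_plain, var_cv)
```

Because `c = 0` is always admissible, the minimized empirical variance can never exceed the plain sample variance; the dependent-sequence case replaces the sample variance with an estimate of the asymptotic variance.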
This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as u...
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ i...
Two-sided bounds are constructed for a probability density function of a weighted sum of chi-square variables. Both cases of central and non-central chi-square variables are considered. The upper and lower bounds have the same dependence on the parameters of the sum and differ only in absolute constants. The estimates obtained will be useful, in p...
We consider a random symmetric matrix \(\mathbf{X}= [X_{jk}]_{j,k=1}^n\) with upper triangular entries being independent random variables with mean zero and unit variance. Assuming that \(\max_{jk} \mathbb{E}|X_{jk}|^{4+\delta} < \infty\), \(\delta > 0\), it was proved in Götze et al. (Bernoulli 24(3):2358–2400, 2018) that with hi...
In this paper we propose a novel and practical variance reduction approach for additive functionals of dependent sequences. Our approach combines the use of control variates with the minimisation of an empirical variance estimate. We analyse finite sample properties of the proposed method and derive finite-time bounds of the excess asymptotic varia...
In this paper, we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample varia...
Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise...
In this paper we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample varian...
We consider products of independent [Formula: see text] non-Hermitian random matrices [Formula: see text]. Assume that their entries, [Formula: see text], are independent identically distributed random variables with zero mean, unit variance. Götze and Tikhomirov [On the asymptotic spectrum of products of independent random matrices, preprint (2010...
We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ with upper triangular entries being independent random variables with mean zero and unit variance. Assuming that $\max_{jk} {\mathbb E} |X_{jk}|^{4+\delta} < \infty, \delta > 0$, it was proved in [Götze, Naumov and Tikhomirov, Bernoulli, 2018] that with high probability the typi...
We consider symmetric random matrices with independent mean zero and unit variance entries in the upper triangular part. Assuming that the distributions of matrix entries have finite moment of order four, we prove optimal bounds for the distance between the Stieltjes transforms of the empirical spectral distribution function and the semicircle law....
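A quick numerical sanity check of the semicircle law is to compare low-order spectral moments of a normalized symmetric random matrix with those of the semicircle distribution on $[-2, 2]$, whose second and fourth moments are $1$ and $2$. The sketch below does this with Gaussian entries and plain Python lists. It is only a moment check and says nothing about Stieltjes transforms or rates; the function name `wigner_moments` and the choice of Gaussian entries are assumptions for illustration.

```python
import math
import random


def wigner_moments(n, seed=0):
    """Return (1/n) tr(W^2) and (1/n) tr(W^4) for W = X / sqrt(n), X symmetric."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            # Independent upper triangular entries with mean 0 and variance 1.
            x = rng.gauss(0.0, 1.0) * scale
            w[j][k] = x
            w[k][j] = x
    # w2 = W * W, computed directly.
    w2 = [[sum(w[j][l] * w[l][k] for l in range(n)) for k in range(n)]
          for j in range(n)]
    m2 = sum(w2[j][j] for j in range(n)) / n
    # W^2 is symmetric, so tr(W^4) equals the sum of its squared entries.
    m4 = sum(w2[j][k] ** 2 for j in range(n) for k in range(n)) / n
    return m2, m4


if __name__ == "__main__":
    m2, m4 = wigner_moments(80, seed=0)
    print("second moment:", m2, "(semicircle: 1)")
    print("fourth moment:", m4, "(semicircle: 2)")
```

For moderate `n` the empirical moments already sit near the semicircle values, with deviations that shrink as `n` grows.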
We consider symmetric random matrices \(\mathbf{X}_n = [X_{jk}]_{j,k=1}^n\), \(n \geqslant 1\), whose upper triangular entries are independent random variables with zero mean and unit variance. Under the assumption \(\mathbb{E}|X_{jk}|^4 < C\), \(j, k = 1, 2, \ldots, n\), it is shown that the fluctuations of th...
Upper bounds for the closeness of two centered Gaussian measures in the class of balls in a separable Hilbert space are obtained. The bounds are optimal with respect to the dependence on the spectra of the covariance operators of the Gaussian measures. The inequalities cannot be improved in the general case.
A sample $X_1,\ldots,X_n$ consisting of independent identically distributed vectors in $\mathbb{R}^p$ with zero mean and a covariance matrix $\Sigma$ is considered. The recovery of spectral projectors of high-dimensional covariance matrices from a sample of observations is a key problem in statistics arising in numerous applications. In their 2015 work, V. Koltchinskii and...
We derive tight non-asymptotic bounds for the Kolmogorov distance between the probabilities of
two Gaussian elements to hit a ball in a Hilbert space. The key property of these bounds is that
they are dimension-free and depend on the nuclear (Schatten-one) norm of the difference between
the covariance operators of the elements and on the norm of th...
In this paper we consider asymptotic expansions for a class of sequences of
symmetric functions of many variables. Applications to classical and free
probability theory are discussed.
http://rdcu.be/u8N8
We derive tight non-asymptotic bounds for the Kolmogorov distance between the probabilities of two Gaussian elements to hit a ball in a Hilbert space. The key property of these bounds is that they are dimension-free and depend on the nuclear (Schatten-one) norm of the difference between the covariance operators of the elements and on the norm of th...
The aim of this paper is to prove a local version of the circular law for non-Hermitian random matrices and its generalization to the product of non-Hermitian random matrices under weak moment conditions. More precisely we assume that the entries $X_{jk}^{(q)}$ of non-Hermitian random matrices ${\bf X}^{(q)}, 1 \le j,k \le n, q = 1, \ldots, m, m \g...
The aim of this paper is to prove a local version of the circular law for non-Hermitian random matrices and its generalization to the product of non-Hermitian random matrices under weak moment conditions. More precisely we assume that the entries $X_{jk}^{(q)}$ of non-Hermitian random matrices ${\bf X}^{(q)}, 1 \le j,k \le n, q = 1, \ldots, m, m \g...
We consider a random symmetric matrix ${\bf X} = [X_{jk}]_{j,k=1}^n$ where the upper triangular entries are independent identically distributed random variables with zero mean and unit variance. We additionally suppose that $\mathbb E |X_{11}|^{4 + \delta} < \infty$ for some $\delta > 0$. Under these conditions we show that the typical distance between the Stieltjes transform of the empir...
Symmetric random matrices are considered whose upper triangular entries are independent identically distributed random variables with zero mean, unit variance, and a finite moment of order 4 + δ, δ > 0. It is shown that the distances between the Stieltjes transforms of the empirical spectral distribution function and the semicircle law are of order...
In this note, we consider ensembles of random symmetric matrices with Gaussian elements. Assume that \( \mathbb{E} X_{ij} = 0 \) and \( \mathbb{E} X_{ij}^2 = \sigma_{ij}^2 \). We do not assume that all the \( \sigma_{ij} \) are equal. Assuming that the average of the normalized sums of variances in each row converges to one and the Lindeberg condition holds, we p...
We consider the products of $m \ge 2$ independent large real random matrices with independent tuples $(X_{jk}^{(q)}, X_{kj}^{(q)})$, $1 \le j < k \le n$, of entries. The entries $X_{jk}^{(q)}, X_{kj}^{(q)}$ are standardized and correlated with correlation coefficient $\rho = \mathbb{E}[X_{jk}^{(q)} X_{kj}^{(q)}]$. The limit distribution of the empirical spectral distribution of the eigenva...
Let $\mathbf X$ be a random matrix whose pairs of entries $X_{jk}$ and $X_{kj}$ are correlated and vectors $(X_{jk},X_{kj})$, for $1\le j<k\le n$, are mutually independent. Assume that the diagonal entries are independent from off-diagonal entries as well. We assume that $\mathbb{E} X_{jk}=0$, $\mathbb{E} X_{jk}^2=1$, for any $j,k=1,\ldots,n$ and...
We consider the products of $m\ge 2$ independent large real random matrices with independent vectors $(X_{jk}^{(q)},X_{kj}^{(q)})$ of entries. The entries $X_{jk}^{(q)},X_{kj}^{(q)}$ are correlated with $\rho=\mathbb E X_{jk}^{(q)}X_{kj}^{(q)}$. The limit distribution of the empirical spectral distribution of the eigenvalues of such products doesn'...
In this paper we study random symmetric matrices with dependent entries. Suppose that all entries have zero mean and finite variances, which can be different. Assuming that the average of normalized sums of variances in each row converges to one and the Lindeberg condition holds true, we prove that the empirical spectral distribution of eigenvalues...
In this paper we consider an ensemble of random matrices $\mathbf{X}_n$ with independent identically distributed vectors $(X_{ij}, X_{ji})_{i \neq j}$ of entries. Under the assumption of a finite fourth moment of the matrix entries, it is proved that the empirical spectral distribution of eigenvalues converges in probability to the uniform distribution on an ellipse. The axi...
In this paper we study ensembles of random symmetric matrices $\mathbf{X}_n = \{X_{ij}\}_{i,j = 1}^n$ with dependent entries such that $\mathbb{E} X_{ij} = 0$, $\mathbb{E} X_{ij}^2 = \sigma_{ij}^2$, where the $\sigma_{ij}$ may be different numbers. Assuming that the average of the normalized sums of variances in each row converges to one and the Lindeberg condition holds, we prove t...
Analogs of the Kolmogorov, Marcinkiewicz-Zygmund, and Brunk-Prokhorov strong laws of large numbers are proved for martingales with continuous parameter. A new generalization of the Brunk-Prokhorov strong law of large numbers is given for martingales with discrete time. Along with convergence almost everywhere, we also prove the average convergence....