# Dmitry Kamzolov | Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

Dmitry Kamzolov, Ph.D. in Computer Science

## About

**Publications:** 30 · **Reads:** 1,943 · **Citations:** 157
**Introduction**

Dmitry Kamzolov is a research associate at MBZUAI, Abu Dhabi, UAE, working in the Optimization and Machine Learning Lab under the supervision of Prof. Martin Takáč.
His research focuses on second-order and higher-order methods for convex and non-convex optimization; he has also published on distributed optimization, federated learning, and various applications.


## Publications


We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which typically occur in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves optimal convergence under both gradient and Hessian inexactness in this key setting. We further introduce a tensor g...

In this paper, we propose Cubic Regularized Quasi-Newton Methods for (strongly) star-convex and Accelerated Cubic Regularized Quasi-Newton for convex optimization. The proposed class of algorithms combines the global convergence of the Cubic Newton with the affordability of constructing the Hessian approximation via a Quasi-Newton update. To constr...

In this paper, we present the first stepsize schedule for the Newton method that results in fast global and local convergence guarantees. In particular, a) we prove an $O\left( \frac 1 {k^2} \right)$ global rate, which matches the state-of-the-art global rate of the cubically regularized Newton method of Polyak and Nesterov (2006) and of regularized Newton me...
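For intuition, the cubic-regularized Newton step underlying such global rates has a closed form in one dimension. The sketch below is illustrative only (the function names and the plain, non-accelerated loop are our assumptions, not the paper's schedule): the step minimizes the cubic model $g h + \tfrac{H}{2} h^2 + \tfrac{M}{6}|h|^3$, where $M$ upper-bounds the Hessian's Lipschitz constant.

```python
import math

def cubic_newton_step(g, H, M):
    """Minimizer of the scalar cubic model g*h + 0.5*H*h**2 + (M/6)*|h|**3.
    Setting the model's derivative to zero gives (M/2)*t**2 + H*t = |g|
    for the step length t, taken in the direction opposite to g."""
    if g == 0.0:
        return 0.0
    t = (-H + math.sqrt(H * H + 2.0 * M * abs(g))) / M
    return -math.copysign(t, g)

def cubic_newton(grad, hess, x0, M, iters=50):
    # plain (non-accelerated) cubic Newton loop on a scalar function
    x = x0
    for _ in range(iters):
        x += cubic_newton_step(grad(x), hess(x), M)
    return x
```

On a simple quadratic such as $f(x) = (x-2)^2$, the iterates converge to the minimizer for any $M > 0$, since the regularization only shortens the step.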

Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed S...

Collaboration among multiple data-owning entities (e.g., hospitals) can accelerate the training process and yield better machine learning models due to the availability and diversity of data. However, privacy concerns make it challenging to exchange data while preserving confidentiality. Federated Learning (FL) is a promising solution that enables...

Exploiting higher-order derivatives in convex optimization has been known at least since the 1970s. In each iteration, higher-order (also called tensor) methods minimize a regularized Taylor expansion of the objective function, which leads to faster convergence rates if the corresponding higher-order derivative is Lipschitz-continuous. Recently a series of l...


Inspired by the recent work FedNL (Safaryan et al., FedNL: Making Newton-Type Methods Applicable to Federated Learning), we propose a new communication-efficient second-order framework for Federated Learning, namely FLECS. The proposed method reduces the high memory requirements of FedNL by using an L-SR1-type update for the Hessian approxima...

This work considers non-convex finite sum minimization. There are a number of algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner that is based upon Hutch...
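For context, a Hutchinson-style diagonal estimator of the kind such preconditioners build on can be sketched as follows. This is an illustrative implementation, not the paper's code; `hvp` (a Hessian-vector-product callback) and the function name are our assumptions:

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=100, seed=0):
    """Estimate diag(H) as the sample mean of z * (H @ z) over Rademacher
    vectors z: E[z_i * (H z)_i] = H_ii because E[z z^T] = I."""
    rng = np.random.default_rng(seed)
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * hvp(z)  # accumulate one Rademacher sample
    return est / num_samples
```

For an exactly diagonal Hessian the estimator is exact even with one sample, since $z_i^2 = 1$; for general matrices the off-diagonal cross terms average out over samples.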

Gradient-free/zeroth-order methods for black-box convex optimization have been extensively studied in the last decade, with the main focus on oracle call complexity. In this paper, besides the oracle complexity, we also focus on iteration complexity and propose a generic approach that, based on optimal first-order methods, allows one to obtain in a bl...
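For illustration, a standard two-point random-direction gradient estimator, the basic building block of such zeroth-order methods, might look like this. It is a sketch under Gaussian smoothing; the exact estimator and parameters in the paper may differ:

```python
import numpy as np

def two_point_grad(f, x, h=1e-4, num_samples=10000, seed=0):
    """Average the two-point estimator (f(x + h*u) - f(x - h*u)) / (2h) * u
    over random directions u ~ N(0, I); its expectation approximates
    the gradient of f at x, using only function evaluations."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u
    return g / num_samples
```

Each sample costs two function evaluations; averaging reduces the estimator's variance at the price of more oracle calls, which is exactly the oracle-complexity trade-off the abstract refers to.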

Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with class...

We consider distributed stochastic optimization problems that are solved with a master/workers computation architecture. Statistical arguments allow us to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves lower communication com...

Statistical preconditioning can be used to design fast methods for distributed large-scale empirical risk minimization problems, for strongly convex and smooth loss functions, allowing fewer communication rounds. Multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxili...

We propose a general non-accelerated tensor method under inexact information on higher-order derivatives, analyze its convergence rate, and provide sufficient conditions for this method to have similar complexity as the exact tensor method. As a corollary, we propose the first stochastic tensor method for convex optimization and obtain sufficient m...

Over the last two decades, the PageRank problem has received increased interest from the academic community as an efficient tool to estimate web-page importance in information retrieval. Despite numerous developments, the design of efficient optimization algorithms for the PageRank problem is still a challenge. This paper proposes three new algorit...


We consider the minimization problem of a sum of a number of functions having Lipschitz p-th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing us to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle...

In this paper, we present a new Hyperfast Second-Order Method with convergence rate $O(N^{-5})$ up to a logarithmic factor for convex functions with Lipschitz 3rd derivative. This method is based on two ideas. The first comes from the superfast second-order scheme of Yu. Nesterov (CORE Discussion Paper 2020/07, 2020). It allows implementing the third-order sche...

We propose an accelerated meta-algorithm, which allows us to obtain accelerated methods for convex unconstrained minimization in different settings. As an application of the general scheme, we propose nearly optimal methods for minimizing smooth functions with Lipschitz derivatives of an arbitrary order, as well as for smooth minimax optimization probl...

In this paper, we present a new Hyperfast Second-Order Method with convergence rate $O(N^{-5})$ up to a logarithmic factor for convex functions with Lipschitz $3$rd derivative. This method is based on two ideas. The first comes from the superfast second-order scheme of Yu. Nesterov (CORE Discussion Paper 2020/07, 2020). It allows implementing the th...

In this paper, we consider the resource allocation problem, stated as a convex minimization problem with linear constraints. To solve this problem, we use gradient and accelerated gradient descent applied to the dual problem and prove the convergence rate for both the primal and the dual iterates. We obtain faster convergence rates than the one...


In this paper, we propose new first-order methods for minimization of a convex function on a simple convex set. We assume that the objective function is a composite function given as a sum of a simple convex function and a convex function with an inexact Hölder-continuous subgradient. We propose the Universal Intermediate Gradient Method. Our method enjoy...

We propose a simple way to explain the Universal Method of Yu. Nesterov. Based on this method and using the restart technique, we propose a Universal Method for strictly convex optimization problems. We consider a general proximal setup (not necessarily a Euclidean one).

In this paper we propose three methods for solving the PageRank problem for matrices with both row and column sparsity. All the methods lead to a convex optimization problem over the simplex. The first is based on gradient descent in the L1 norm instead of the Euclidean one. The idea behind the second method is the Frank–Wolfe conditional gradient...
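To illustrate the conditional-gradient idea, here is a toy sketch: minimizing $\|(I-P)x\|^2$ over the probability simplex with Frank–Wolfe, whose linear minimization oracle on the simplex simply returns the vertex at the smallest gradient coordinate. The 3-node transition matrix is an assumed toy example, not from the paper:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=20000):
    """Frank-Wolfe over the probability simplex with step 2/(k+2):
    the LMO min_{s in simplex} <grad(x), s> is attained at a coordinate
    vertex e_i with i = argmin_i grad(x)_i."""
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0       # simplex vertex minimizing <g, s>
        gamma = 2.0 / (k + 2.0)     # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# assumed toy 3-node column-stochastic transition matrix
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
A = np.eye(3) - P
grad = lambda x: 2.0 * A.T @ (A @ x)   # gradient of ||(I - P) x||^2
pr = frank_wolfe_simplex(grad, np.ones(3) / 3.0)
```

Every iterate stays on the simplex because it is a convex combination of simplex points, so no projection is ever needed; for this symmetric toy matrix the stationary distribution is uniform.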

In the paper we generalize the universal gradient method (Yu. Nesterov) to the strongly convex case and to the Intermediate Gradient Method (Devolder, Glineur, Nesterov). We also consider possible generalizations to stochastic and online contexts. We show how these results can be generalized to gradient-free methods and methods of random direction search. But the m...
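The backtracking idea behind universal gradient methods can be sketched as follows (a simplified illustration for the unconstrained Euclidean case, not the paper's full scheme): the inverse step size L is doubled until a sufficient-decrease condition, relaxed by the target accuracy ε/2, holds, and halved after each accepted step so it can also adapt downward.

```python
import numpy as np

def universal_gradient(f, grad, x0, eps=1e-6, L0=1.0, iters=100):
    """Gradient descent with Nesterov-style backtracking on L:
    double L until f(x - g/L) <= f(x) - ||g||^2 / (2L) + eps/2,
    then halve L for the next iteration. No Lipschitz constant
    (or even Lipschitz continuity of the gradient) is assumed upfront."""
    x, L = np.asarray(x0, dtype=float), L0
    for _ in range(iters):
        g = grad(x)
        while True:
            x_new = x - g / L
            if f(x_new) <= f(x) - (g @ g) / (2.0 * L) + eps / 2.0:
                break
            L *= 2.0  # step was too long: raise the local curvature estimate
        x = x_new
        L /= 2.0      # optimistic decrease for the next iteration
    return x
```

The ε/2 slack is what makes the method "universal": on merely Hölder-smooth problems the backtracking line search still terminates, with L absorbing the mismatch between the model and the true objective.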