Dmitry Kovalev
King Abdullah University of Science and Technology | KAUST · Department of Computer Science

About

27 Publications · 1,016 Reads
29 Citations

Publications (27)
Preprint
Full-text available
Optimization problems under affine constraints appear in various areas of machine learning. We consider the task of minimizing a smooth strongly convex function $F(x)$ under the affine constraint $K x = b$, with an oracle providing evaluations of the gradient of $F$ and matrix-vector multiplications by $K$ and its transpose. We provide lower bounds...
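A restatement of the problem setting described above, in the abstract's notation (the saddle-point form is a standard reformulation added here for context, not a claim about the paper's proof technique):

```latex
\min_{x \in \mathbb{R}^d} F(x) \quad \text{s.t.} \quad Kx = b
\qquad\Longleftrightarrow\qquad
\min_{x} \max_{y} \; F(x) + \langle y, Kx - b \rangle .
```

The Lagrangian form on the right explains the oracle model: each iteration only needs $\nabla F(x)$ together with products $Kv$ and $K^\top u$.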
Preprint
We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is...
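For context, the dual (Fenchel-conjugate) oracle mentioned here is the textbook one: for a strongly convex local loss $f_i$,

```latex
f_i^*(y) = \sup_{x} \, \langle y, x \rangle - f_i(x),
\qquad
\nabla f_i^*(y) = \arg\max_{x} \, \big( \langle y, x \rangle - f_i(x) \big).
```

This is the standard definition, not a detail specific to ADOM's analysis.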
Preprint
We propose a family of lossy integer compressions for Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors with a number known to every device and then rounding to an integer number. Our theory shows that the iteration complexity of SGD does not change up to a constant factor...
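A minimal sketch of the scale-then-round idea described above (illustrative only; the variable names and the shared scale `alpha` are assumptions, not the paper's exact compressor):

```python
import numpy as np

def compress_to_int(v: np.ndarray, alpha: float) -> np.ndarray:
    """Multiply by a scale known to every device, then round to integers."""
    return np.rint(v * alpha).astype(np.int64)

def decompress(q: np.ndarray, alpha: float) -> np.ndarray:
    """Undo the scaling on the receiving side; only integers were communicated."""
    return q.astype(np.float64) / alpha

# toy round trip
g = np.random.randn(5)                      # a local gradient
q = compress_to_int(g, alpha=1024.0)        # integers go over the wire
g_hat = decompress(q, alpha=1024.0)         # approximate reconstruction
```

A larger `alpha` means finer rounding (less compression error) at the cost of larger integers on the wire.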
Preprint
Full-text available
We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to have global and local groups of variables. For the proposed algorithm we prove non-asymptotic convergence rate...
Preprint
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. In many scenarios, communication between devices is energy-demanding and time-consuming, and it forms the bottleneck of the entire system. We propose a new randomized first-order method which tackles the communication bottleneck by appl...
Article
Recently, it has been shown how the usual accelerated gradient method for smooth convex optimization can be used to obtain accelerated methods for more complex (structured) problems and for problems solved using various kinds of local information about the behavior of a function (stochastic gradients, Hessians, etc.)...
Preprint
Full-text available
In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive the complexity results for all the methods tha...
Preprint
We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after making each gradient update, thus ensuring approximate consensus at each iteration, while the outer loop is based on Nesterov's accelerated scheme. The algorithm achie...
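A hedged sketch of what a multi-step gossip (consensus) subroutine looks like in code; `W` is a mixing matrix of the current network, and the number of gossip steps `T` is left as a free parameter here rather than the value used in the paper:

```python
import numpy as np

def multi_step_gossip(X: np.ndarray, W: np.ndarray, T: int) -> np.ndarray:
    """X has one row per node; repeated averaging with neighbours drives the
    rows toward their common mean (approximate consensus)."""
    for _ in range(T):
        X = W @ X
    return X
```

In the method described above, such a consensus step is run after every gradient update, inside an outer Nesterov-type accelerated loop.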
Preprint
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network. For this problem, lower bounds on the number of gradient computations and the number of communication rounds required to achieve $\varepsilon$ accuracy have recently been proven. We propose two new algorithms for this...
Preprint
Full-text available
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this...
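The generic problem referred to here, written out (directly from the abstract's description):

```latex
\text{find } x^\star \text{ such that } \quad x^\star = \frac{1}{n}\sum_{i=1}^{n} \mathcal{T}_i(x^\star),
```

where each operator $\mathcal{T}_i$ is held by one device, matching the federated-learning setting in which the operators cannot all be evaluated in one place.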
Preprint
Full-text available
Since the late 1950's when quasi-Newton methods first appeared, they have become one of the most widely used and efficient algorithmic paradigms for unconstrained optimization. Despite their immense practical success, there is little theory that shows why these methods are so efficient. We provide a semi-local rate of convergence for the randomized...
Preprint
Full-text available
Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of iterations, there are no methods...
Preprint
We propose an accelerated version of stochastic variance-reduced coordinate descent, ASVRCD. Like other variance-reduced coordinate descent methods such as SEGA or SVRCD, our method can deal with problems that include a non-separable and non-smooth regularizer, while accessing only a random block of partial derivatives in each iteration. However, A...
Preprint
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed. This problem is motivated by the practice of federated learning, where a large model stored in the cloud is compressed before it is sent to a mobile device, which then proceeds with training based on local data. We develop s...
Preprint
We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions. The first is a stochastic variant of Newton's method (SN), and the second is a stochastic variant of cubically regularized Newton's method (SCN). We establish local linear-quadr...
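A hedged sketch of a generic subsampled Newton step of this flavour (an illustration of the idea only, not necessarily the exact SN or SCN update from the paper; the batch size `tau` is an assumption):

```python
import numpy as np

def subsampled_newton_step(x, grads, hessians, tau, rng):
    """One step using gradients and Hessians of a random subset of the n functions.
    `grads` and `hessians` are lists of callables, one per function f_i."""
    n = len(grads)
    S = rng.choice(n, size=tau, replace=False)          # sample a few functions
    g = np.mean([grads[i](x) for i in S], axis=0)       # averaged sampled gradient
    H = np.mean([hessians[i](x) for i in S], axis=0)    # averaged sampled Hessian
    return x - np.linalg.solve(H, g)                    # Newton step on the sampled model
```

This only conveys the "sample a few functions, use their second-order information" pattern; the paper establishes local linear-quadratic rates for its specific stochastic Newton and cubically regularized Newton variants.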
Preprint
We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access...
Preprint
We consider a new extension of the extragradient method that is motivated by approximating implicit updates. Since in a recent work (Chavdarova et al., 2019) it was shown that the existing stochastic extragradient algorithm (called mirror-prox) of Juditsky et al. (2011) diverges on a simple bilinear problem, we prove guarantees for solving...
Preprint
We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We d...
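A hedged sketch of the kind of sketched Newton direction described (the step size $\gamma$ and the shape of the sketch are assumptions for illustration, not the paper's prescriptions):

```latex
x_{k+1} = x_k - \gamma\, S_k \big( S_k^\top \nabla^2 f(x_k)\, S_k \big)^{\dagger} S_k^\top \nabla f(x_k),
```

where $S_k \in \mathbb{R}^{d \times s}$ is a random sketch matrix with $s \ll d$, so only an $s \times s$ system has to be solved per iteration instead of a $d \times d$ one.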
Article
Full-text available
The Monteiro-Svaiter accelerated hybrid proximal extragradient method (2013) with one step of Newton’s method used at every iteration for the approximate solution of an auxiliary problem is considered. The Monteiro-Svaiter method is optimal (with respect to the number of gradient and Hessian evaluations for the optimized function) for sufficiently...
Preprint
Full-text available
We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq...
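For orientation, the variance parameter $\omega$ attached to a compression operator $\mathcal{C}$ is conventionally defined in this literature by

```latex
\mathbb{E}\,[\mathcal{C}(x)] = x,
\qquad
\mathbb{E}\,\|\mathcal{C}(x) - x\|^2 \le \omega \|x\|^2 \quad \text{for all } x,
```

with $\omega = 0$ recovering no compression; this is the standard unbiased-compressor assumption, stated here only as context for the truncated abstract.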
Preprint
Full-text available
The stochastic variance-reduced gradient method (SVRG) and its accelerated variant (Katyusha) have attracted enormous attention in the machine learning community in the last few years due to their superior theoretical properties and empirical behaviour on training supervised machine learning models via the empirical risk minimization paradigm. A ke...
Article
The Monteiro–Svaiter accelerated hybrid proximal extragradient method (2013) with one step of Newton’s method used at every iteration for the approximate solution of an auxiliary problem is considered. The Monteiro–Svaiter method is optimal (with respect to the number of gradient and Hessian evaluations for the optimized function) for sufficiently...
Preprint
In this work we consider the Monteiro-Svaiter accelerated hybrid proximal extragradient (A-HPE) framework and the accelerated Newton proximal extragradient (A-NPE) framework. The latter contains optimal methods for sufficiently smooth convex optimization problems with a second-order oracle. We generalize the A-NPE framework to higher-order derivative oracles (s...
Article
Full-text available
In this paper we discuss lower bounds on the convergence of high-order convex optimization methods and the attainability of these bounds. We formulate a hypothesis that covers all the cases; note that we state this hypothesis without a proof. Newton's method is the most famous method that uses the gradient and Hessian of the optimized function. Howev...
Article
Full-text available
The state-of-the-art methods for solving high-dimensional optimization problems are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number...