Saverio Salzo
Istituto Italiano di Tecnologia | IIT · The Laboratory for Computational Statistics and Machine Learning

PhD

About

35 Publications
4,691 Reads
971 Citations
Citations since 2017: 857 (25 research items)
[Chart: citations per year, 2017–2023]
Additional affiliations
October 2020 - present
University College London
Position
  • Honorary Lecturer
January 2016 - present
Istituto Italiano di Tecnologia and Massachusetts Institute of Technology
Position
  • Postdoctoral Researcher
January 2015 - December 2015
Università degli Studi di Genova
Position
  • Postdoctoral Researcher
Description
  • Regularization methods for learning in high dimensional data
Education
January 2008 - May 2012
Università degli Studi di Genova
Field of study
  • Computer Science
January 2002 - December 2002
September 1997 - July 2001
Università degli Studi di Bari Aldo Moro
Field of study
  • Pure Mathematics

Publications (35)
Article
Full-text available
This paper investigates learning theory in abstract Banach spaces of features via regularized empirical risk minimization. The main result establishes the consistency of such a learning scheme, under appropriate conditions on the loss function, the geometry of the feature space, the regularization function, and the regularization parameters. We focus...
Article
Full-text available
In this paper we study the variational problem associated with support vector regression in Banach function spaces. Using Fenchel-Rockafellar duality theory, we give an explicit formulation of the dual problem as well as of the related optimality conditions. Moreover, we provide a new computational framework for solving the problem which relies on a...
Article
Full-text available
We study the variable metric forward-backward splitting algorithm for convex minimization problems without the standard assumption of the Lipschitz continuity of the gradient. In this setting, we prove that, by requiring only mild assumptions on the smooth part of the objective function and using several types of backtracking line search procedures...
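As a rough illustration of this line of work, the sketch below implements a plain Euclidean forward-backward step in which the step size is chosen by a simple sufficient-decrease backtracking test instead of a global Lipschitz constant. It is only a minimal sketch: the variable-metric machinery and the specific line-search procedures analyzed in the paper are not reproduced, and the function names and the toy lasso problem are illustrative assumptions.

```python
import numpy as np

def forward_backward_backtracking(grad_f, f, prox_g, x0, gamma0=1.0,
                                  beta=0.5, n_iter=100):
    """Forward-backward (proximal gradient) iteration with a backtracking
    step size; the test below replaces the usual Lipschitz-constant bound."""
    x, gamma = x0.copy(), gamma0
    for _ in range(n_iter):
        g = grad_f(x)
        while True:
            y = prox_g(x - gamma * g, gamma)      # forward-backward step
            d = y - x
            # sufficient-decrease condition in place of Lipschitz continuity
            if f(y) <= f(x) + g @ d + (d @ d) / (2 * gamma):
                break
            gamma *= beta                          # shrink the step size and retry
        x = y
    return x

# toy usage: lasso, min_x 0.5*||Ax - b||^2 + lam*||x||_1
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 10)), rng.standard_normal(20), 0.1
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_hat = forward_backward_backtracking(grad_f, f, prox_g, np.zeros(10))
```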
Preprint
Full-text available
In this work we study the method of Bregman projections for deterministic and stochastic convex feasibility problems with three types of control sequences for the selection of sets during the algorithmic procedure: greedy, random, and adaptive random. We analyze in depth the case of affine feasibility problems, showing that the iterates generated by...
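For intuition, the snippet below shows the simplest special case of this setting: with the squared Euclidean distance as the Bregman function, the Bregman projection onto an affine set {x : a_i^T x = b_i} reduces to the orthogonal projection, and a random control sequence yields a Kaczmarz-type method. The greedy and adaptive random controls and the general Bregman distances studied in the paper are not shown; the function name and the toy linear system are illustrative.

```python
import numpy as np

def random_affine_projections(A, b, x0, n_iter=2000, rng=None):
    """Random control: at each step pick one affine set {x : a_i^T x = b_i}
    and project the current iterate orthogonally onto it."""
    rng = rng or np.random.default_rng(0)
    x = x0.astype(float)
    m = A.shape[0]
    for _ in range(n_iter):
        i = rng.integers(m)                                # random selection of a set
        a_i = A[i]
        x = x - (a_i @ x - b[i]) / (a_i @ a_i) * a_i       # orthogonal (Bregman) projection
    return x

# toy usage: a consistent linear system, i.e. a nonempty affine feasibility problem
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = A @ rng.standard_normal(10)
x_hat = random_affine_projections(A, b, np.zeros(10))
```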
Article
Full-text available
In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stoc...
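A minimal sketch of the kind of iteration described here is given below: at every step one block of coordinates is drawn according to an arbitrary probability vector and updated by a proximal-gradient step with its own step size. The asynchronous aspect, the stochastic quasi-Fejér analysis and the general setting of the paper are not reproduced; the block structure, step sizes and probabilities are illustrative assumptions.

```python
import numpy as np

def random_block_prox_grad(grad_f, prox_blocks, x0, blocks, gammas,
                           probs, n_iter=2000, rng=None):
    """Randomized block-coordinate proximal-gradient sketch for a composite
    objective f(x) + sum_k g_k(x_k): only the sampled block is updated."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(n_iter):
        k = rng.choice(len(blocks), p=probs)       # block drawn from an arbitrary distribution
        idx = blocks[k]
        g = grad_f(x)[idx]                         # partial gradient of the smooth part
        x[idx] = prox_blocks[k](x[idx] - gammas[k] * g, gammas[k])
    return x

# toy usage: lasso split into two coordinate blocks with different step sizes
rng = np.random.default_rng(2)
A, b, lam = rng.standard_normal((40, 10)), rng.standard_normal(40), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
blocks = [np.arange(0, 5), np.arange(5, 10)]
x_hat = random_block_prox_grad(grad_f, [soft, soft], np.zeros(10),
                               blocks, gammas=[1e-2, 2e-2], probs=[0.7, 0.3])
```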
Article
Full-text available
In this work, we study the method of randomized Bregman projections for stochastic convex feasibility problems, possibly with an infinite number of sets, in Euclidean spaces. Under very general assumptions, we prove almost sure convergence of the iterates to a random almost common point of the sets. We then analyze in depth the case of affine sets...
Preprint
In this work we study high probability bounds for stochastic subgradient methods under heavy-tailed noise. In this case the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which it is known that standard subgradient methods enjoy high probability bounds. We analyze a clipped version of the projected sto...
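The core mechanism mentioned in the abstract, clipping the stochastic subgradient before the projected step, can be sketched in a few lines. This is only an illustrative toy (an absolute-loss objective over the unit ball with Student-t gradient noise); the clipping level, the step-size schedule and the averaging needed for the actual high-probability guarantees are not taken from the paper.

```python
import numpy as np

def clipped_projected_subgradient(sub_grad, project, x0, clip=1.0,
                                  gamma=1e-2, n_iter=2000, rng=None):
    """Projected stochastic subgradient step where the noisy subgradient is
    rescaled whenever its norm exceeds the clipping level."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for t in range(1, n_iter + 1):
        g = sub_grad(x, rng)                        # stochastic subgradient with heavy-tailed noise
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)                   # clipping step
        x = project(x - gamma / np.sqrt(t) * g)     # projected step, decaying step size
    return x

# toy usage: minimize E|a^T x - 1| over the unit ball, gradient noise with heavy tails
rng = np.random.default_rng(3)
a = rng.standard_normal(5)
sub_grad = lambda x, r: np.sign(a @ x - 1.0) * a + r.standard_t(df=3, size=x.shape)
project = lambda v: v / max(1.0, np.linalg.norm(v))
x_hat = clipped_projected_subgradient(sub_grad, project, np.zeros(5))
```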
Preprint
We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problem includes instances of meta-learning, hyperparameter optimization and data poisoning adversarial attacks....
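To make the hypergradient idea concrete, the toy sketch below computes the gradient of a validation loss with respect to a ridge regularization parameter by (approximate) implicit differentiation: the lower-level solution satisfies a linear optimality system, and the associated linear system is solved by a few fixed-point iterations. The ridge problem, the solver and all names are illustrative assumptions and only mirror the general fixed-point framework of the paper in spirit.

```python
import numpy as np

def hypergradient_ridge(X_tr, y_tr, X_val, y_val, lam, n_inner=100):
    """Approximate implicit differentiation for the bilevel toy problem
    min_lam E(w*(lam)), with w*(lam) = argmin_w ||X_tr w - y_tr||^2 + lam*||w||^2."""
    d = X_tr.shape[1]
    H = X_tr.T @ X_tr + lam * np.eye(d)            # lower-level Hessian
    w = np.linalg.solve(H, X_tr.T @ y_tr)          # lower-level solution
    grad_val = X_val.T @ (X_val @ w - y_val)       # upper-level gradient in w
    # approximately solve H v = grad_val by fixed-point iterations
    alpha = 1.0 / np.linalg.norm(H, 2)
    v = np.zeros(d)
    for _ in range(n_inner):
        v = v - alpha * (H @ v - grad_val)
    # implicit function theorem: dw*/dlam = -H^{-1} w*, hence dE/dlam = -v^T w*
    return -v @ w

# toy usage
rng = np.random.default_rng(4)
X_tr, X_val = rng.standard_normal((50, 8)), rng.standard_normal((20, 8))
w0 = rng.standard_normal(8)
y_tr, y_val = X_tr @ w0, X_val @ w0
hg = hypergradient_ridge(X_tr, y_tr, X_val, y_val, lam=0.5)
```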
Preprint
Full-text available
In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stoc...
Chapter
Full-text available
Convex optimization plays a key role in data sciences. The objective of this work is to provide basic tools and methods at the core of modern nonlinear convex optimization. Starting from the gradient descent method, we will focus on a comprehensive convergence analysis for the proximal gradient algorithm and its state-of-the-art variants, including...
Preprint
Full-text available
In this work we propose a batch version of the Greenkhorn algorithm for multimarginal regularized optimal transport problems. Our framework is general enough to cover, as particular cases, some existing algorithms, such as the Sinkhorn and Greenkhorn algorithms for the bi-marginal setting, and (greedy) MultiSinkhorn for multimarginal optimal transport. We p...
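For reference, the bi-marginal special case mentioned here, the classical Sinkhorn algorithm for entropically regularized optimal transport, can be written in a few lines; the batch Greenkhorn scheme of the paper generalizes this to several marginals with greedy/batch updates of the dual variables, which the sketch below does not attempt. The regularization level and the toy cost matrix are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Bi-marginal Sinkhorn iterations: alternately rescale the rows and the
    columns of the Gibbs kernel so that the plan matches the two marginals."""
    K = np.exp(-C / eps)                    # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                     # match the first marginal
        v = b / (K.T @ u)                   # match the second marginal
    return u[:, None] * K * v[None, :]      # transport plan

# toy usage: uniform marginals on a 1-D grid with a squared-distance cost
x = np.linspace(0.0, 1.0, 20)
C = (x[:, None] - x[None, :]) ** 2
a = b = np.full(20, 1.0 / 20)
P = sinkhorn(a, b, C)
```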
Conference Paper
Full-text available
We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of...
Conference Paper
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation sch...
Article
Full-text available
We study the block-coordinate forward–backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dime...
Preprint
Full-text available
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step in the design of optimization algorithms for these bilevel problems is the efficient computation of the gradient of the upper-level objective (h...
Preprint
Full-text available
We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of th...
Preprint
Full-text available
Recently, classical kernel methods have been extended by the introduction of suitable tensor kernels so as to promote sparsity in the solution of the underlying regression problem. Indeed, they solve an lp-norm regularization problem, with p = m/(m-1) and m an even integer, which happens to be close to a lasso problem. However, a major drawback of the meth...
Conference Paper
We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distri...
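The incremental growth of the support is the Frank-Wolfe mechanism: each iteration calls a linear minimization oracle that activates a single atom. The sketch below shows a generic Frank-Wolfe iteration over the probability simplex with a smooth quadratic objective; it does not implement the Sinkhorn divergence or the continuous-support machinery of the paper, and the objective and names are illustrative.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iter=200):
    """Frank-Wolfe over the probability simplex: the linear minimization
    oracle returns a vertex, so each step activates at most one new atom."""
    x = x0.copy()
    for k in range(n_iter):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0                 # vertex selected by the LMO
        step = 2.0 / (k + 2.0)                # standard step-size schedule
        x = (1.0 - step) * x + step * s       # convex combination stays feasible
    return x

# toy usage: least-squares projection of a point onto the simplex
target = np.array([0.1, 0.5, 0.2, 0.9])
grad = lambda x: x - target
x_hat = frank_wolfe_simplex(grad, np.full(4, 0.25))
```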
Preprint
Full-text available
We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dime...
Preprint
We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distri...
Conference Paper
In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL),...
Conference Paper
We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take eithe...
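One way to read "taking into explicit account the optimization dynamics" is sketched below: the inner problem (a toy ridge regression) is solved by T gradient-descent steps, and the derivative of the iterate with respect to the hyperparameter is propagated alongside, so that the hypergradient of a validation loss follows by the chain rule. This forward-mode toy only illustrates the idea; the framework in the paper is more general (and typically uses reverse mode), and all names and parameters here are illustrative.

```python
import numpy as np

def unrolled_hypergradient(X_tr, y_tr, X_val, y_val, lam, alpha=1e-2, T=300):
    """Differentiate a validation loss through T inner gradient-descent steps
    on a ridge objective, propagating dw/dlam along the dynamics (forward mode)."""
    d = X_tr.shape[1]
    H = X_tr.T @ X_tr
    w, dw = np.zeros(d), np.zeros(d)               # inner iterate and its derivative in lam
    for _ in range(T):
        grad = H @ w - X_tr.T @ y_tr + lam * w     # inner gradient
        dgrad = H @ dw + lam * dw + w              # its derivative with respect to lam
        w, dw = w - alpha * grad, dw - alpha * dgrad
    grad_val = X_val.T @ (X_val @ w - y_val)       # outer (validation) gradient in w
    return grad_val @ dw                           # chain rule through the unrolled dynamics

# toy usage
rng = np.random.default_rng(5)
X_tr, X_val = rng.standard_normal((50, 8)), rng.standard_normal((20, 8))
w0 = rng.standard_normal(8)
y_tr, y_val = X_tr @ w0, X_val @ w0
hg = unrolled_hypergradient(X_tr, y_tr, X_val, y_val, lam=0.5)
```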
Preprint
Full-text available
In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exa...
Preprint
Full-text available
We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take eithe...
Article
Full-text available
In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL),...
Article
Full-text available
In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of $\ell^p$ regularized learning methods. Our main contribution is proposing a fast dual algorithm, and showing that it allows us to solve the problem efficiently. Our results contrast with recent findings suggesting kernel methods ca...
Article
Full-text available
We investigate random design least-squares regression with prediction functions which are linear combinations of elements of a possibly infinite-dimensional dictionary. We propose a new flexible composite regularization model, which makes it possible to apply various priors to the coefficients of the prediction function, including hard constraints....
Article
We present an algorithm for dictionary learning that is based on the alternating proximal algorithm studied by Attouch, Bolte, Redont, and Soubeyran (2010), coupled with a reliable and efficient dual algorithm for computation of the related proximity operators. This algorithm is suitable for a general dictionary learning model composed of a Bregman...
Article
Full-text available
We propose a convergence analysis of accelerated forward-backward splitting methods for composite function minimization, when the proximity operator is not available in closed form, and can only be computed up to a certain precision. We prove that the $1/k^2$ convergence rate for the function values can be achieved if the admissible errors are of a...
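For orientation, the exact-prox version of the accelerated forward-backward scheme (a FISTA-type iteration with the standard extrapolation sequence) is sketched below; the point of the paper is precisely what happens when the proximity operator is computed only approximately, which the sketch does not model. The toy lasso problem and the names are illustrative.

```python
import numpy as np

def accelerated_forward_backward(grad_f, prox_g, x0, L, n_iter=300):
    """FISTA-type iteration: forward-backward step at an extrapolated point,
    with the classical t_k update, assuming the prox is computed exactly."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1.0 / L)       # forward-backward step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)      # Nesterov extrapolation
        x, t = x_new, t_new
    return x

# toy usage: lasso with L = ||A||_2^2 as the Lipschitz constant of the smooth part
rng = np.random.default_rng(6)
A, b, lam = rng.standard_normal((30, 12)), rng.standard_normal(30), 0.1
L = np.linalg.norm(A, 2) ** 2
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_hat = accelerated_forward_backward(grad_f, prox_g, np.zeros(12), L)
```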
Article
The advent of Comparative Genomic Hybridization (CGH) data led to the development of new mathematical models and computational methods to automatically infer chromosomal alterations. In this work we tackle a standard clustering problem exploiting the good representation properties of a novel method based on dictionary learning. The identified dicti...
Article
Full-text available
We present inexact accelerated proximal point algorithms for minimizing a proper lower semicontinuous and convex function. We carry out a convergence analysis under different types of errors in the evaluation of the proximity operator, and we provide corresponding convergence rates for the objective function values. The proof relies on a generalizat...
Article
Full-text available
An extension of the Gauss-Newton algorithm is proposed to find local minimizers of penalized nonlinear least squares problems, under generalized Lipschitz assumptions. Convergence results of local type are obtained, as well as an estimate of the radius of the convergence ball. Some applications for solving constrained nonlinear equations are discus...
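As a baseline for comparison, the classical (unpenalized) Gauss-Newton step solves a linearized least-squares problem at each iteration, as in the sketch below; the paper's contribution concerns the penalized setting and local convergence under generalized Lipschitz assumptions, neither of which is reproduced here. The toy exponential-fitting problem is illustrative.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, n_iter=20):
    """Classical Gauss-Newton for min_x 0.5*||r(x)||^2: each step solves the
    linearized least-squares problem J(x) step = -r(x)."""
    x = x0.astype(float)
    for _ in range(n_iter):
        r, J = residual(x), jacobian(x)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)    # linearized least-squares step
        x = x + step
    return x

# toy usage: fit y = c0 * exp(c1 * t) to noiseless data
t = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda c: c[0] * np.exp(c[1] * t) - y
jacobian = lambda c: np.stack([np.exp(c[1] * t), c[0] * t * np.exp(c[1] * t)], axis=1)
c_hat = gauss_newton(residual, jacobian, np.array([1.0, 0.0]))
```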
Conference Paper
We propose an algorithm for the construction of a nearly optimal integer-to-integer approximation of the Karhunen-Loève transform. The algorithm is based on the method of P. Hao and Q. Shi as described in [1], but, unlike in that paper, we vary the pivoting in order to obtain a better approximation of the linear transform. We have then develo...
