## About

40

Publications

6,210

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

1,297

Citations

Introduction

Additional affiliations

Education

January 2008 - May 2012

January 2002 - December 2002

September 1997 - July 2001

## Publications

Publications (40)

This paper investigates learning theory in abstract Banach spaces of features
via regularized empirical risk minimization. The main result establishes the
consistency of such learning scheme, under appropriate conditions on the loss
function, the geometry of the feature space, the regularization function, and
the regularization parameters. We focus...

In this paper we study the variational problem associated to support vector regression in Banach function spaces. Using the Fenchel-Rockafellar duality theory, we give explicit formulation of the dual problem as well as of the related optimality conditions. Moreover, we provide a new computational framework for solving the problem which relies on a...

We study the variable metric forward-backward splitting algorithm for convex minimization problems without the standard assumption of the Lipschitz continuity of the gradient. In this setting, we prove that, by requiring only mild assumptions on the smooth part of the objective function and using several types of backtracking line search procedures...

In this work we study the method of Bregman projections for deterministic and stochastic convex feasibility problems with three types of control sequences for the selection of sets during the algorithmic procedure: greedy, random, and adaptive random. We analyze in depth the case of affine feasibility problems showing that the iterates generated by...

In this paper, we provide novel optimal (or near optimal) convergence rates in expectation for the last iterate of a clipped version of the stochastic subgradient method. We consider nonsmooth convex problems, over possibly unbounded domains, under heavy-tailed noise that only possesses the first $p$ moments for $p \in (1,2]$. Our rates are of the...

In the context of finite sums minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as well as their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms sin...

In recent years, bilevel approaches have become very popular to efficiently estimate high-dimensional hyperparameters of machine learning models. However, to date, binary parameters are handled by continuous relaxation and rounding strategies, which could lead to inconsistent solutions. In this context, we tackle the challenging optimization of mix...

In the context of finite sums minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as well as their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms sin...

In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stoc...

In this work, we study the method of randomized Bregman projections for stochastic convex feasibility problems, possibly with an infinite number of sets, in Euclidean spaces. Under very general assumptions, we prove almost sure convergence of the iterates to a random almost common point of the sets. We then analyze in depth the case of affine sets...

In this work we study high probability bounds for stochastic subgradient methods under heavy tailed noise. In this case the noise is only assumed to have finite variance as opposed to a sub-Gaussian distribution for which it is known that standard subgradient methods enjoys high probability bounds. We analyzed a clipped version of the projected sto...

We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problems include instances of meta-learning, hyperparameter optimization and data poisoning adversarial attacks....

In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stoc...

Convex optimization plays a key role in data sciences. The objective of this work is to provide basic tools and methods at the core of modern nonlinear convex optimization. Starting from the gradient descent method we will focus on a comprehensive convergence analysis for the proximal gradient algorithm and its state-of-the art variants, including...

In this work we propose a batch version of the Greenkhorn algorithm for multimarginal regularized optimal transport problems. Our framework is general enough to cover, as particular cases, some existing algorithms like Sinkhorn and Greenkhorn algorithm for the bi-marginal setting, and (greedy) MultiSinkhorn for multimarginal optimal transport. We p...

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a para-metric fixed-point equation. Important instances arising in machine learning include hyperparame-ter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of...

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation sch...

We study the block-coordinate forward–backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dime...

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems in the design of optimization algorithms for bilevel optimization is the efficient computation of the gradient of the upper-level objective (h...

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of th...

Recently, classical kernel methods have been extended by the introduction of suitable tensor kernels so to promote sparsity in the solution of the underlying regression problem. Indeed, they solve an lp-norm regularization problem, with p=m/(m-1) and m even integer, which happens to be close to a lasso problem. However, a major drawback of the meth...

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distri...

We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dime...

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distri...

In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL),...

We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take eithe...

In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exa...

We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take eithe...

In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL),...

In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of $\ell^p$ regularized learning methods. Our main contribution is proposing a fast dual algorithm, and showing that it allows to solve the problem efficiently. Our results contrast recent findings suggesting kernel methods ca...

We investigate random design least-squares regression with prediction
functions which are linear combination of elements of a possibly
infinite-dimensional dictionary. We propose a new flexible composite
regularization model, which makes it possible to apply various priors to the
coefficients of the prediction function, including hard constraints....

We present an algorithm for dictionary learning that is based on the alternating proximal algorithm studied by Attouch, Bolte, Redont, and Soubeyran (2010), coupled with a reliable and efficient dual algorithm for computation of the related proximity operators. This algorithm is suitable for a general dictionary learning model composed of a Bregman...

We propose a convergence analysis of accelerated forward-backward splitting methods for composite function minimization, when the proximity operator is not available in closed form, and can only be computed up to a certain precision. We prove that the $1/k^2$ convergence rate for the function values can be achieved if the admissible errors are of a...

The advent of Comparative Genomic Hybridization (CGH) data led to the development of new mathematical models and computational methods to automatically infer chromosomal alterations. In this work we tackle a standard clustering problem exploiting the good representation properties of a novel method based on dictionary learning. The identified dicti...

We present inexact accelerated proximal point algorithms for minimizing a proper lower semicontinuous and convex function. We carry on a convergence analysis under different types of errors in the evaluation of the proximity operator, and we provide corresponding convergence rates for the objective function values. The proof relies on a generalizat...

An extension of the Gauss-Newton algorithm is proposed to find local
minimizers of penalized nonlinear least squares problems, under generalized
Lipschitz assumptions. Convergence results of local type are obtained, as well
as an estimate of the radius of the convergence ball. Some applications for
solving constrained nonlinear equations are discus...

We propose an algorithm for the construction of a nearly optimal integer to integer approximation of the Karhunen-Loeve Transform. The algorithm is based on the method of P. Hao and Q. Shi as described in [1] but-unlike described in the paper-we vary the pivoting in order to obtain a better approximation of the linear transform. We have then develo...