Stephen A. Vavasis’s research while affiliated with University of Waterloo and other places


Publications (126)


Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization
  • Preprint
  • File available

January 2025 · 5 Reads · Stephen Vavasis

Parameter estimation is a fundamental challenge in machine learning, crucial for tasks such as neural network weight fitting and Bayesian inference. This paper focuses on the complexity of estimating translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters for a distribution of the form $\frac{1}{\sigma^l} f_0\left(\frac{\boldsymbol{x} - \boldsymbol{\mu}}{\sigma}\right)$, where $f_0$ is a known density in $\mathbb{R}^l$, given $n$ samples. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly}(1/\varepsilon)$ time using the Wasserstein distance.
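As a rough illustration of the estimation task (not of the paper's algorithm), the one-dimensional case $l = 1$ with a standard normal $f_0$ can be attacked by directly minimizing the empirical 1-D Wasserstein-1 distance between the sorted sample and the model's quantiles; the function names and the optimizer choice below are ours.

```python
# Illustrative sketch (not the paper's algorithm): estimate a location mu and
# scale sigma by minimizing the 1-D Wasserstein-1 distance between the
# empirical sample and the model mu + sigma * Z, Z ~ f0 (standard normal here).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def w1_to_model(params, x_sorted):
    """Approximate W1 between the empirical distribution of x and the model
    f0((x - mu)/sigma)/sigma, using model quantiles at mid-point plotting
    positions."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)            # keep sigma > 0
    n = x_sorted.size
    probs = (np.arange(n) + 0.5) / n     # mid-point plotting positions
    model_q = mu + sigma * norm.ppf(probs)
    return np.mean(np.abs(x_sorted - model_q))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=2000)
res = minimize(w1_to_model, x0=np.array([0.0, 0.0]), args=(np.sort(x),),
               method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)   # close to (2.0, 0.5)
```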



Nonlinear conjugate gradient for smooth convex functions

June 2024 · 11 Reads · 5 Citations · Mathematical Programming Computation

The method of nonlinear conjugate gradients (NCG) is widely used in practice for unconstrained optimization, but it satisfies weak complexity bounds at best when applied to smooth convex functions. In contrast, Nesterov’s accelerated gradient (AG) method is optimal up to constant factors for this class. However, when specialized to quadratic functions, conjugate gradient is optimal in a strong sense among function-gradient methods. Therefore, there is seemingly a gap in the menu of available algorithms: NCG, the optimal algorithm for quadratic functions that also exhibits good practical performance for general functions, has poor complexity bounds compared to AG. We propose an NCG method called C+AG (“conjugate plus accelerated gradient”) to close this gap, that is, it is optimal for quadratic functions and still satisfies the best possible complexity bound for more general smooth convex functions. It takes conjugate gradient steps until insufficient progress is made, at which time it switches to accelerated gradient steps, and later retries conjugate gradient. The proposed method has the following theoretical properties: (i) It is identical to linear conjugate gradient (and hence terminates finitely) if the objective function is quadratic; (ii) Its running-time bound is $O(\epsilon^{-1/2})$ gradient evaluations for an $L$-smooth convex function, where $\epsilon$ is the desired residual reduction; (iii) Its running-time bound is $O(\sqrt{L/\ell}\,\ln(1/\epsilon))$ if the function is both $L$-smooth and $\ell$-strongly convex. We also conjecture and outline a proof that a variant of the method has the property: (iv) It is $n$-step quadratically convergent for a function whose second derivative is smooth and invertible at the optimizer. Note that the bounds in (ii) and (iii) match AG and are the best possible, i.e., they match lower bounds up to constant factors for the classes of functions under consideration. On the other hand, (i) and (iv) match NCG. In computational tests, the function-gradient evaluation count for the C+AG method typically behaves as whichever is better of AG or classical NCG. In some test cases it outperforms both.
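A minimal sketch of the switching logic, assuming only an L-smooth convex objective; the progress test (comparing a CG step against a plain 1/L gradient step), the retry schedule, and the helper structure below are simplified stand-ins for the precise criteria in the paper.

```python
# Hedged sketch of the C+AG idea: run nonlinear CG while it makes sufficient
# progress, otherwise fall back to accelerated-gradient steps, and periodically
# retry CG.  The progress test here is an illustrative stand-in only.
import numpy as np

def c_plus_ag(f, grad, x0, L, iters=500, retry_every=20):
    x = x0.copy()
    d = -grad(x)                      # current CG search direction
    y, t = x.copy(), 1.0              # AG auxiliary point and momentum
    mode = "cg"
    for k in range(iters):
        g = grad(x)
        if mode == "cg":
            if g @ d >= 0:            # safeguard: reset to steepest descent
                d = -g
            alpha, fx = 1.0 / L, f(x)  # backtracking (Armijo) line search
            while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):
                alpha *= 0.5
            x_new = x + alpha * d
            # "sufficient progress": the CG step must beat a plain 1/L gradient step
            if f(x_new) <= f(x - g / L):
                g_new = grad(x_new)
                beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # Polak-Ribiere+
                d = -g_new + beta * d
                x = x_new
            else:
                mode, y, t = "ag", x.copy(), 1.0                # switch to AG
        else:
            x_new = y - grad(y) / L                             # accelerated-gradient step
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            x, t = x_new, t_new
            if k % retry_every == 0:                            # periodically retry CG
                mode, d = "cg", -grad(x)
    return x

# usage on a small convex quadratic (where CG alone terminates quickly)
A = np.diag(np.linspace(1.0, 100.0, 50))
b = np.ones(50)
x_opt = c_plus_ag(lambda x: 0.5 * x @ A @ x - b @ x,
                  lambda x: A @ x - b, np.zeros(50), L=100.0)
```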


MGProx: A nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization

May 2024 · 83 Reads

We study the combination of proximal gradient descent with multigrid for solving a class of possibly nonsmooth strongly convex optimization problems. We propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid, based on using hierarchical information of the optimization problem. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide a theoretical characterization of MGProx. First we show that the MGProx update operator exhibits a fixed-point property. Next, we show that the coarse correction is a descent direction for the fine variable of the original fine-level problem in the general nonsmooth case. Lastly, under some assumptions we provide the convergence rate for the algorithm. In numerical tests on the elastic obstacle problem, which is an example of a nonsmooth convex optimization problem to which the multigrid method can be applied, we show that MGProx has a faster convergence speed than competing methods.
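MGProx itself requires a mesh hierarchy and the adaptive restriction operator; as a baseline for comparison, here is a minimal single-level proximal gradient sketch on a 1-D obstacle-type problem (quadratic energy plus the indicator of {u ≥ φ}), i.e., the plain method that MGProx is designed to accelerate. The discretization and parameters are illustrative.

```python
# Single-level proximal gradient baseline (the method MGProx accelerates),
# illustrated on a 1-D obstacle-type problem:
#   min_u  0.5 u'Au - f'u + indicator(u >= phi),
# where A is the 1-D finite-difference Laplacian.  The prox of the indicator
# is the pointwise projection u -> max(u, phi).  Illustrative sketch only.
import numpy as np

n = 127
h = 1.0 / (n + 1)
main = 2.0 * np.ones(n) / h**2
off = -1.0 * np.ones(n - 1) / h**2
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)   # 1-D Laplacian
f = np.ones(n)                                            # constant load
x_grid = np.linspace(h, 1.0 - h, n)
phi = 0.05 - 0.3 * (x_grid - 0.5)**2                      # obstacle

L = np.linalg.eigvalsh(A).max()      # Lipschitz constant of the gradient
u = np.maximum(np.zeros(n), phi)     # feasible start
for _ in range(2000):
    grad = A @ u - f
    u = np.maximum(u - grad / L, phi)    # prox-grad step: gradient + projection
print(float(0.5 * u @ A @ u - f @ u))    # final objective value
```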


Computational Complexity of Decomposing a Symmetric Matrix as a Sum of Positive Semidefinite and Diagonal Matrices

December 2023 · 24 Reads · 4 Citations · Foundations of Computational Mathematics

We study several variants of decomposing a symmetric matrix into a sum of a low-rank positive-semidefinite matrix and a diagonal matrix. Such decompositions have applications in factor analysis, and they have been studied for many decades. On the one hand, we prove that when the rank of the positive-semidefinite matrix in the decomposition is bounded above by an absolute constant, the problem can be solved in polynomial time. On the other hand, we prove that, in general, these problems, as well as certain approximation versions of them, are all NP-hard. Finally, we prove that many of these low-rank decomposition problems are complete in the first-order theory of the reals, i.e., given any system of polynomial equations, we can write down a low-rank decomposition problem in polynomial time so that the original system has a solution iff our corresponding decomposition problem has a feasible solution of certain (lowest) rank.
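For a concrete picture of the decomposition being studied (this illustrates the problem, not the paper's algorithms or hardness reductions), the classical minimum-trace factor analysis heuristic replaces the rank constraint with a trace objective and becomes a semidefinite program; a sketch using the cvxpy modeling package:

```python
# Convex surrogate for the decomposition A = (low-rank PSD) + (diagonal):
# minimum-trace factor analysis.  This illustrates the object of study,
# not the paper's algorithms or reductions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, r = 8, 2
B = rng.standard_normal((n, r))
A = B @ B.T + np.diag(rng.uniform(0.5, 1.5, n))   # planted PSD-plus-diagonal

P = cp.Variable((n, n), PSD=True)       # low-rank PSD part (trace surrogate)
d = cp.Variable(n, nonneg=True)         # diagonal part
prob = cp.Problem(cp.Minimize(cp.trace(P)), [P + cp.diag(d) == A])
prob.solve()
print(np.linalg.matrix_rank(P.value, tol=1e-6))   # ideally recovers rank r
```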


Re-embedding data to strengthen recovery guarantees of clustering

January 2023 · 12 Reads

We propose a clustering method that involves chaining four known techniques into a pipeline yielding an algorithm with stronger recovery guarantees than any of the four components separately. Given $n$ points in $\mathbb{R}^d$, the first component of our pipeline, which we call leapfrog distances, is reminiscent of density-based clustering, yielding an $n\times n$ distance matrix. The leapfrog distances are then translated to new embeddings using multidimensional scaling and spectral methods, two other known techniques, yielding new embeddings of the $n$ points in $\mathbb{R}^{d'}$, where $d'$ satisfies $d'\ll d$ in general. Finally, sum-of-norms (SON) clustering is applied to the re-embedded points. Although the fourth step (SON clustering) can in principle be replaced by any other clustering method, our focus is on provable guarantees of recovery of underlying structure. Therefore, we establish that the re-embedding improves the recovery guarantees of SON clustering, since SON clustering is a well-studied method that already has provable guarantees.
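As a loose sketch of the first two stages only: the exact definition of leapfrog distances is given in the paper, and here we substitute a shortest-path distance with squared-Euclidean edge weights (our stand-in, chosen because chains of short hops become cheaper than one long hop), followed by classical multidimensional scaling; the spectral step and SON clustering are omitted.

```python
# Rough sketch of the first two pipeline stages.  The "leapfrog" distance is
# approximated by shortest paths over the complete graph with squared-Euclidean
# edge weights (an assumed stand-in, not the paper's definition), and the
# re-embedding uses classical multidimensional scaling (MDS).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def re_embed(X, d_new=2):
    n = X.shape[0]
    D2 = squareform(pdist(X, "sqeuclidean"))
    # chains of short hops are cheap, long jumps are expensive
    G = shortest_path(csr_matrix(D2), directed=False)
    # classical MDS, treating the path lengths as dissimilarities
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d_new]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# usage: two noisy clusters in R^10, re-embedded into R^2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 10)), rng.normal(3.0, 1.0, (50, 10))])
Y = re_embed(X)     # feed Y to any clustering method, e.g. SON clustering
```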


Computational complexity of decomposing a symmetric matrix as a sum of positive semidefinite and diagonal matrices

September 2022 · 19 Reads · 2 Citations

We study several variants of decomposing a symmetric matrix into a sum of a low-rank positive semidefinite matrix and a diagonal matrix. Such decompositions have applications in factor analysis and they have been studied for many decades. On the one hand, we prove that when the rank of the positive semidefinite matrix in the decomposition is bounded above by an absolute constant, the problem can be solved in polynomial time. On the other hand, we prove that, in general, these problems, as well as certain approximation versions of them, are all NP-hard. Finally, we prove that many of these low-rank decomposition problems are complete in the first-order theory of the reals; i.e., given any system of polynomial equations, we can write down a low-rank decomposition problem in polynomial time so that the original system has a solution iff our corresponding decomposition problem has a feasible solution of certain (lowest) rank.


[Figures] Computational performance of the DCA under different settings · Recovery performance of different algorithms given different sizes of the linear map · Computational performance of different algorithms given different linear map sizes · Recovery performance of k2 and t-nuclear for different ranks · Computational performance of k2 and t-nuclear for different ranks (+2 more)

Low-rank matrix recovery with Ky Fan 2-k-norm

April 2022 · 66 Reads · 3 Citations · Journal of Global Optimization

The low-rank matrix recovery problem is difficult due to its non-convex properties and it is usually solved using convex relaxation approaches. In this paper, we formulate the non-convex low-rank matrix recovery problem exactly using novel Ky Fan 2-k-norm-based models. A general difference-of-convex-functions algorithm (DCA) is developed to solve these models. A proximal point algorithm (PPA) framework is proposed to solve sub-problems within the DCA, which allows us to handle large instances. Numerical results show that the proposed models achieve high recoverability rates as compared to the truncated nuclear norm method and the alternating bilinear optimization approach. The results also demonstrate that the proposed DCA with the PPA framework is efficient in handling larger instances.
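For reference, the Ky Fan 2-k-norm of a matrix is the ℓ2 norm of its k largest singular values, so it equals the Frobenius norm exactly when the rank is at most k; this identity is the kind of fact that exact non-convex reformulations build on. A quick numerical check (our own illustration):

```python
# The Ky Fan 2-k-norm is the l2 norm of the k largest singular values; it
# equals the Frobenius norm exactly when rank(X) <= k.  Quick numerical check.
import numpy as np

def ky_fan_2k(X, k):
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt(np.sum(np.sort(s)[::-1][:k] ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # rank 3
print(np.isclose(ky_fan_2k(X, 3), np.linalg.norm(X, "fro")))     # True
print(ky_fan_2k(X, 2) < np.linalg.norm(X, "fro"))                # True
```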


Nonlinear conjugate gradient for smooth convex functions

November 2021 · 15 Reads

The method of nonlinear conjugate gradients (NCG) is widely used in practice for unconstrained optimization, but it satisfies weak complexity bounds at best when applied to smooth convex functions. In contrast, Nesterov's accelerated gradient (AG) method is optimal up to constant factors for this class. However, when specialized to quadratic functions, conjugate gradient is optimal in a strong sense among function-gradient methods. Therefore, there is seemingly a gap in the menu of available algorithms: NCG, the optimal algorithm for quadratic functions that also exhibits good practical performance for general functions, has poor complexity bounds compared to AG. We propose an NCG method called C+AG ("conjugate plus accelerated gradient") to close this gap, that is, it is optimal for quadratic functions and still satisfies the best possible complexity bound for more general smooth convex functions. It takes conjugate gradient steps until insufficient progress is made, at which time it switches to accelerated gradient steps, and later retries conjugate gradient. The proposed method has the following theoretical properties: (i) It is identical to linear conjugate gradient (and hence terminates finitely) if the objective function is quadratic; (ii) Its running-time bound is $O(\epsilon^{-1/2})$ gradient evaluations for an $L$-smooth convex function, where $\epsilon$ is the desired residual reduction; (iii) Its running-time bound is $O(\sqrt{L/\ell}\,\ln(1/\epsilon))$ if the function is both $L$-smooth and $\ell$-strongly convex. In computational tests, the function-gradient evaluation count for the C+AG method typically behaves as whichever is better of AG or classical NCG. In most test cases it outperforms both.


Robust Correlation Clustering with Asymmetric Noise

October 2021 · 7 Reads

Graph clustering problems typically aim to partition the graph nodes such that two nodes belong to the same partition set if and only if they are similar. Correlation Clustering is a graph clustering formulation which: (1) takes as input a signed graph with edge weights representing a similarity/dissimilarity measure between the nodes, and (2) requires no prior estimate of the number of clusters in the input graph. However, the combinatorial optimization problem underlying Correlation Clustering is NP-hard. In this work, we propose a novel graph generative model, called the Node Factors Model (NFM), which is based on generating feature vectors/embeddings for the graph nodes. The graphs generated by the NFM contain asymmetric noise in the sense that there may exist pairs of nodes in the same cluster which are negatively correlated. We propose a novel Correlation Clustering algorithm, called $\ell_2$-norm-diag, using techniques from semidefinite programming. Using a combination of theoretical and computational results, we demonstrate that $\ell_2$-norm-diag recovers nodes with sufficiently strong cluster membership in graph instances generated by the NFM, thereby making progress towards establishing the provable robustness of our proposed algorithm.
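For background, a standard semidefinite relaxation of Correlation Clustering on a signed weight matrix W maximizes ⟨W, X⟩ over PSD matrices X with unit diagonal and nonnegative entries; the sketch below shows this generic relaxation and is not necessarily the exact program behind $\ell_2$-norm-diag.

```python
# Generic SDP relaxation of correlation clustering on a signed weight matrix W:
# maximize <W, X> over PSD X with unit diagonal and nonnegative entries, where
# X_ij ~ 1 if nodes i and j are co-clustered.  Background only; not necessarily
# the exact program behind the l2-norm-diag algorithm.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 12
labels = np.repeat([0, 1], n // 2)
W = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
W += 0.5 * rng.standard_normal((n, n))          # noisy similarity scores
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(W, X))),
                  [cp.diag(X) == 1, X >= 0])
prob.solve()
# round X (e.g. by clustering its rows) to obtain the final partition
```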


Citations (76)


... Constraints were subsequently incorporated in two distinct ways: (i) by allowing composite objectives of the form f(x) + g(x), where f is a smooth data term and g encodes the constraint as an indicator function, g(x) = 0 for x ∈ C and g(x) = ∞ for x ∉ C, or (ii) by building the constraints directly into the multilevel design. Notable instances of the first approach [2,28,22,23] all assume Lipschitz smoothness of the gradient ∇f(x), a significant limitation in imaging contexts. A key example arises in Poisson linear inverse problems, which show up naturally whenever the imaging process involves counting photons arriving in the image domain [35,15], such as image deconvolution in microscopy and astronomy, or tomographic reconstruction in PET. ...

Reference:

Multilevel Bregman Proximal Gradient Descent
MGProx: A Nonsmooth Multigrid Proximal Gradient Method with Adaptive Restriction for Strongly Convex Optimization
  • Citing Article
  • August 2024

SIAM Journal on Optimization

... While NCG performs well for general functions, its complexity bounds are inferior to those of AG. Karimi et al. [16] introduced the Conjugate plus Accelerated Gradient (C + AG) method, which integrates conjugate gradient and accelerated gradient steps. This approach is optimal for quadratic functions and achieves the best possible complexity bound for smooth convex functions by dynamically switching between the two types of steps based on progress. ...

Nonlinear conjugate gradient for smooth convex functions
  • Citing Article
  • June 2024

Mathematical Programming Computation

... Important ∃R-completeness results include the realizability of abstract order types [40,52], geometric linkages [45], and the recognition of geometric intersection graphs, as further discussed below. More results concern graph drawing [20,21,31,46], the Hausdorff distance [27], polytopes [19,43], Nash-equilibria [8,10,11,24,48], training neural networks [4,9], matrix factorization [17,49,50,51,58], continuous constraint satisfaction problems [38], geometric packing [5], the art gallery problem [2,56], and covering polygons with convex polygons [1]. ...

Computational Complexity of Decomposing a Symmetric Matrix as a Sum of Positive Semidefinite and Diagonal Matrices
  • Citing Article
  • December 2023

Foundations of Computational Mathematics

... 10 Moreover, the convergence of modern IPM is super linear and the number of iterations for desirable convergence is almost unaffected by the number of unknowns. 11,12 In light of its advantages, this solution strategy has been extended for solving many challenging problems in geotechnical and geological engineering such as limit analysis, 13,14 elastoplastic analysis, 10,[15][16][17] viscoplastic analysis, 18,19 contact analysis 20,21, analyses of fracture propagation 22,23, etc. Many recent publications indicate that this MP strategy is very efficient when analyzing engineering problems with plasticity. ...

Second-order cone interior-point method for quasistatic and moderate dynamic cohesive fracture

Computer Methods in Applied Mechanics and Engineering

... The inaccuracy may lead to the failure of known properties of sum-of-norms clustering such as the recovery of a mixture of Gaussians and the agglomeration property. It has been established that for the appropriate choice of λ, (1) exactly recovers a mixture of Gaussians due to Panahi et al. [10], Sun et al. [13], and Jiang et al. [7]. However, it is unknown if the recovery result still holds when the approximate test is applied. ...

Recovery of a mixture of Gaussians by sum-of-norms clustering
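For context, the sum-of-norms (SON) clustering problem referred to as (1) in the excerpt above is usually written as the following convex program, in which each data point a_i is assigned a representative x_i and the fusion term with parameter λ drives representatives of points in the same cluster to coincide:

```latex
% Standard sum-of-norms (SON) clustering formulation (the "(1)" in the excerpt):
\min_{x_1,\ldots,x_n \in \mathbb{R}^d} \;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - a_i \rVert^2
  \;+\; \lambda \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert
```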

... We remark that C+AG requires prior knowledge of L, ℓ; we return to this point in Section 9. In the case of a smooth convex function that is not strongly convex, we take ℓ = 0. We conclude this introductory section with a few remarks about our previous unpublished manuscript [11]. That work explored connections between AG, GD, and LCG, and proposed a hybrid of NCG and GD. ...

A single potential governing convergence of conjugate gradient, accelerated gradient and geometric descent
  • Citing Article
  • December 2017

... A paradigmatic class of optimization problems is that of Quadratic Unconstrained Binary Optimization (QUBO) [7], where the loss function is a quadratic form of binary variables. Despite their simple formulation, QUBO problems are known to be computationally challenging, as they belong to the NP-Hard complexity class [8,9]. This means that no algorithm is known to solve an arbitrary QUBO instance in polynomial time, and the required computational resources grow exponentially with the problem size. ...

Complexity Theory: Quadratic Programming
  • Citing Chapter
  • January 2001

... It is worth mentioning that a unification of the conjugate-gradient method, i.e. Hestenes and Stiefel [5], was proposed by Karimi and co-workers [6], with a state-of-the-art paper on first-order methods published in 2020, cf. Drori and Taylor [7], pointing to a unification of the first-order algorithms. ...

A unified convergence bound for conjugate gradient and accelerated gradient
  • Citing Article
  • May 2016

... Indeed, the discontinuity is manifested in the traction vector in multiaxial states. It has been termed a time discontinuity (Papoulia et al. 2003; Sam et al. 2005) because it happens at the time of activation of an interface; however, it is not a discontinuity in time, as many known discontinuities are, e.g., shock waves, but rather the governing equations are discontinuous functions of the displacements. Differential equations which are discontinuous in the solution variable are known to suffer from lack of a solution or nonuniqueness of solution, e.g., Hairer et al. (1993). ...

Obtaining initially rigid cohesive finite element models that are temporally convergent (vol 72, pg 2247, 2005)
  • Citing Article
  • May 2006

Engineering Fracture Mechanics