Zvonimir Bujanović’s research while affiliated with University of Zagreb and other places


Publications (21)


Subspace embedding with random Khatri-Rao products and its application to eigensolvers
  • Preprint
  • File available

May 2024 · 42 Reads

Zvonimir Bujanović · … · Hei Yin Lam

Various iterative eigenvalue solvers have been developed to compute parts of the spectrum of a large sparse matrix, including the power method, Krylov subspace methods, contour integral methods, and preconditioned solvers such as the so-called LOBPCG method. All of these solvers rely on random matrices to determine, e.g., starting vectors that have, with high probability, a non-negligible overlap with the eigenvectors of interest. For this purpose, unstructured Gaussian random matrices are a safe and common choice. In this work, we investigate the use of random Khatri-Rao products in eigenvalue solvers. On the one hand, we establish a novel subspace embedding property that provides theoretical justification for the use of such structured random matrices. On the other hand, we highlight the potential algorithmic benefits when solving eigenvalue problems with Kronecker product structure, as they arise frequently from the discretization of eigenvalue problems for differential operators on tensor product domains. In particular, we consider the use of random Khatri-Rao products within a contour integral method and LOBPCG. Numerical experiments indicate that the gains for the contour integral method strongly depend on the ability to efficiently and accurately solve (shifted) matrix equations with low-rank right-hand sides. The flexibility of LOBPCG to directly employ preconditioners makes it easier to benefit from the Khatri-Rao product structure, at the expense of less theoretical justification.
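For illustration, here is a minimal Python sketch (ours, not code from the paper) of a random Khatri-Rao sketching matrix whose columns are Kronecker products of Gaussian vectors, and of how Kronecker structure lets the sketch be applied without forming the full matrix; the dimensions n1, n2, k and the 1/sqrt(k) scaling are illustrative choices.

```python
import numpy as np
from scipy.linalg import khatri_rao

rng = np.random.default_rng(0)
n1, n2, k = 40, 50, 300                 # ambient dimension n = n1 * n2 = 2000

# Column i of Omega is the Kronecker product of column i of G and column i of H.
G = rng.standard_normal((n1, k))
H = rng.standard_normal((n2, k))
Omega = khatri_rao(G, H)                # shape (n1 * n2, k)

# For a vector with Kronecker structure x = u ⊗ v, the sketch Omega^T x
# factors into small products: (g_i ⊗ h_i)^T (u ⊗ v) = (g_i^T u)(h_i^T v).
u = rng.standard_normal(n1)
v = rng.standard_normal(n2)
x = np.kron(u, v)
sketch_fast = (G.T @ u) * (H.T @ v) / np.sqrt(k)
sketch_full = Omega.T @ x / np.sqrt(k)  # identical, but forms Omega explicitly

print(np.allclose(sketch_fast, sketch_full))
print(np.linalg.norm(sketch_fast), np.linalg.norm(x))  # close with high probability
```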


Speedup attained by Algorithm 4 for Example 2: the time Advanpix needs to compute a high-precision Schur decomposition divided by the time needed by Algorithm 4 (including the time to compute the Schur decomposition in double precision)
Absolute values of the entries of the matrix L obtained from solving the triangular matrix equation (6) within Algorithm 4, applied to the matrix from Example 3. Left: second iteration of Algorithm 4. Right: sixth iteration. Color indicates the base-10 logarithm of each entry's absolute value
Iterative refinement of Schur decompositions

June 2022 · 76 Reads · 6 Citations

Numerical Algorithms

The Schur decomposition of a square matrix A is an important intermediate step of state-of-the-art numerical algorithms for addressing eigenvalue problems, matrix functions, and matrix equations. This work is concerned with the following task: Compute a (more) accurate Schur decomposition of A from a given approximate Schur decomposition. This task arises, for example, in the context of parameter-dependent eigenvalue problems and mixed precision computations. We have developed a Newton-like algorithm that requires the solution of a triangular matrix equation and an approximate orthogonalization step in every iteration. We prove local quadratic convergence for matrices with mutually distinct eigenvalues and observe fast convergence in practice. In a mixed low-high precision environment, our algorithm essentially reduces to only four high-precision matrix-matrix multiplications per iteration. When refining double to quadruple precision, it often needs only 3–4 iterations, which reduces the time of computing a quadruple precision Schur decomposition by up to a factor of 10–20.
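The abstract pins down the per-iteration structure: one triangular matrix equation solve plus an approximate orthogonalization. The sketch below is our reconstruction of a Newton-type step of that shape, assuming a complex Schur form and mutually distinct eigenvalues; the signs, substitution order, and QR-based orthogonalization are our assumptions, not the paper's algorithm.

```python
import numpy as np

def refine_schur_step(A, Q, T):
    """One Newton-type step for an approximate Schur decomposition
    A ≈ Q T Q^*: solve a triangular matrix equation for a strictly
    lower-triangular correction L, then re-orthogonalize.  Sketch only;
    requires T[i, i] != T[j, j] (mutually distinct eigenvalues)."""
    n = A.shape[0]
    M = Q.conj().T @ A @ Q
    T = np.triu(M)                       # current triangular estimate
    E = np.tril(M, -1)                   # residual: strictly lower part of M
    L = np.zeros_like(M)
    # Solve tril(T L - L T) = -E by substitution: for i > j,
    # (T[i,i] - T[j,j]) L[i,j] = -E[i,j] - sum_{k>i} T[i,k] L[k,j]
    #                                    + sum_{k<j} L[i,k] T[k,j],
    # sweeping columns left to right and rows bottom to top.
    for j in range(n):
        for i in range(n - 1, j, -1):
            s = T[i, i + 1:] @ L[i + 1:, j] - L[i, :j] @ T[:j, j]
            L[i, j] = -(E[i, j] + s) / (T[i, i] - T[j, j])
    W = L - L.conj().T                   # first-order unitary update I + W
    Q, _ = np.linalg.qr(Q @ (np.eye(n) + W))   # approximate orthogonalization
    return Q, np.triu(Q.conj().T @ A @ Q)
```

In a mixed-precision setting, essentially only the products forming M and the final update of Q would need high precision, which is consistent with the four high-precision matrix-matrix multiplications per iteration mentioned above.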


Iterative Refinement of Schur decompositions

March 2022 · 90 Reads




Figure 2: Performance of trace estimators for 8 different matrices. See Section 4.1 for details.
Norm and trace estimation with random rank-one vectors

April 2020 · 228 Reads

A few matrix-vector multiplications with random vectors are often sufficient to obtain reasonably good estimates for the norm of a general matrix or the trace of a symmetric positive semi-definite matrix. Several such probabilistic estimators have been proposed and analyzed for standard Gaussian and Rademacher random vectors. In this work, we consider the use of rank-one random vectors, that is, Kronecker products of (smaller) Gaussian or Rademacher vectors. It is not only cheaper to sample such vectors, but it can sometimes also be much cheaper to multiply a matrix with a rank-one vector instead of a general vector. We give theoretical and numerical evidence that the use of rank-one instead of unstructured random vectors still leads to good estimates. In particular, it is shown that our rank-one estimators multiplied with a modest constant constitute, with high probability, upper bounds of the quantity of interest. Partial results are provided for the case of lower bounds. The application of our techniques to condition number estimation for matrix functions is illustrated.
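As a toy illustration of why rank-one vectors can be much cheaper (ours, not the paper's experiment), take a symmetric positive semi-definite matrix with Kronecker structure A = A1 ⊗ A2: with Rademacher rank-one vectors v = v1 ⊗ v2 the quadratic form factors into two small ones, and the estimator remains unbiased since E[v v^T] = I.

```python
import numpy as np

rng = np.random.default_rng(1)
n1 = n2 = 30
B1 = rng.standard_normal((n1, n1)); A1 = B1 @ B1.T    # SPSD factors
B2 = rng.standard_normal((n2, n2)); A2 = B2 @ B2.T

def rank_one_trace_estimate(num_samples=500):
    est = 0.0
    for _ in range(num_samples):
        v1 = rng.choice([-1.0, 1.0], size=n1)         # Rademacher factors
        v2 = rng.choice([-1.0, 1.0], size=n2)
        # (v1 ⊗ v2)^T (A1 ⊗ A2) (v1 ⊗ v2) = (v1^T A1 v1) * (v2^T A2 v2),
        # so no matvec with the full n1*n2 x n1*n2 matrix is ever needed.
        est += (v1 @ A1 @ v1) * (v2 @ A2 @ v2)
    return est / num_samples

print(rank_one_trace_estimate(), np.trace(A1) * np.trace(A2))
```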



New robust ScaLAPACK routine for computing the QR factorization with column pivoting

October 2019 · 30 Reads

In this note we describe two modifications of the ScaLAPACK subroutines PxGEQPF for computing the QR factorization with Businger-Golub column pivoting. First, we resolve a subtle numerical instability in the same way as we did for the LAPACK subroutines xGEQPF and xGEQP3 in 2006 [LAPACK Working Note 176 (2006); ACM Trans. Math. Softw. 2008]. The problem originates in the first release of LINPACK in the 1970s: due to severe cancellation in the down-dating of partial column norms, the pivoting procedure may have no reliable information about the true norms of the pivot column candidates. This can cause mispivoting and, as a result, loss of the important rank-revealing structure of the computed triangular factor, with severe consequences for other solvers that rely on rank-revealing pivoting. The instability is so subtle that, e.g., inserting a WRITE statement or changing the process topology can drastically change the result. Second, we correct a programming error in the complex subroutines PCGEQPF and PZGEQPF, which also causes wrong pivoting because of an erroneous use of PSCNRM2 and PDZNRM2 for the explicit norm computation.
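The cancellation is easy to reproduce. The toy snippet below (ours, not the ScaLAPACK code) shows how down-dating a partial column norm can lose every correct digit once a pivot step removes almost all of a column's mass, together with a recompute-on-threshold safeguard in the spirit of the fixed routines; the particular threshold is an assumption.

```python
import numpy as np

c = np.array([1.0, 3e-9])                  # column before the elimination step
norm2 = np.dot(c, c)                       # stored squared norm: rounds to 1.0
r = c[0]                                   # component removed by the pivot row

downdated = np.sqrt(max(norm2 - r * r, 0.0))   # -> 0.0: catastrophic cancellation
true_norm = abs(c[1])                          # -> 3e-9

# Safeguard: if the down-dated value has shrunk too much relative to the
# original norm, too many digits are suspect; recompute the norm explicitly.
if downdated <= np.sqrt(np.finfo(float).eps) * np.sqrt(norm2):
    downdated = np.linalg.norm(c[1:])

print(downdated, true_norm)                # both 3e-9 after the safeguard
```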


A numerical comparison of solvers for large-scale, continuous-time algebraic Riccati equations

November 2018 · 104 Reads

In this paper, we discuss numerical methods for solving large-scale continuous-time algebraic Riccati equations. These methods have been the focus of intensive research in recent years, and significant progress has been made in both the theoretical understanding and the efficient implementation of various competing algorithms. This manuscript has several goals: first, to gather in one place an overview of different approaches for solving large-scale Riccati equations and to point to the recent advances in each of them; second, to analyze and compare the main computational ingredients of these algorithms and to detect their strong points and potential bottlenecks; and finally, to compare effective implementations of all methods on a set of relevant benchmark examples, giving an indication of their relative performance.
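For orientation, the equation compared at scale here is the continuous-time algebraic Riccati equation A^T X + X A − X B R^{-1} B^T X + Q = 0. The snippet below solves a small dense instance with SciPy's direct solver as a reference baseline; it is not one of the large-scale methods surveyed, and the test matrices are arbitrary.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(2)
n, m = 50, 2
A = rng.standard_normal((n, n)) - 5 * np.eye(n)   # shifted to make A stable
B = rng.standard_normal((n, m))
Q = np.eye(n)
R = np.eye(m)

X = solve_continuous_are(A, B, Q, R)
residual = A.T @ X + X @ A - X @ B @ np.linalg.solve(R, B.T) @ X + Q
print(np.linalg.norm(residual) / np.linalg.norm(X))   # small relative residual
```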


Each point $\sigma$ of the complex plane is colored according to the residual reduction obtained when $\sigma$ is taken as the shift in the 14th iteration of RADI. (a) Ratios $\rho^{\mathsf{proj}}(\sigma) = \Vert R^{\mathsf{proj}}_{14}(\sigma)\Vert / \Vert R^{\mathsf{proj}}_{13}\Vert$ for the projected equation of dimension 13. (b) Ratios $\rho(\sigma) = \Vert R_{14}(\sigma)\Vert / \Vert R_{13}\Vert$ for the original equation of dimension 10648
Algorithm performance for benchmark CUBE ($n = 10648$, $m = p = 10$). (a) Relative residual versus the subspace dimension used by an algorithm. (b) Relative residual versus time
Algorithm performance for benchmark IFISS ($n = 66049$, $m = p = 5$). (a) Relative residual versus the subspace dimension used by an algorithm. (b) Relative residual versus time
RADI: a low-rank ADI-type algorithm for large scale algebraic Riccati equations

February 2018 · 257 Reads · 57 Citations

Numerische Mathematik

This paper introduces a new algorithm for solving large-scale continuous-time algebraic Riccati equations (CARE). The advantage of the new algorithm is in its immediate and efficient low-rank formulation, which is a generalization of the Cholesky-factored variant of the Lyapunov ADI method. We discuss important implementation aspects of the algorithm, such as reducing the use of complex arithmetic and shift selection strategies. We show that there is a very tight relation between the new algorithm and three other algorithms for CARE previously known in the literature—all of these seemingly different methods in fact produce exactly the same iterates when used with the same parameters: they are algorithmically different descriptions of the same approximation sequence to the Riccati solution.
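For context, the sketch below shows a residual-based low-rank Lyapunov ADI iteration of the kind RADI generalizes, applied to A X + X A^T + B B^T = 0 with stable A; the hand-picked real shifts and the setup are purely illustrative, and this is not the paper's algorithm.

```python
import numpy as np

def lr_adi(A, B, shifts):
    """Low-rank ADI for A X + X A^T + B B^T = 0; X ≈ Z Z^T.
    The residual stays in factored form R_k = W_k W_k^T throughout."""
    n = A.shape[0]
    W = B.copy()
    Z_blocks = []
    for mu in shifts:                               # real shifts, mu < 0
        V = np.linalg.solve(A + mu * np.eye(n), W)  # one linear solve per shift
        W = W - 2.0 * mu * V                        # update residual factor
        Z_blocks.append(np.sqrt(-2.0 * mu) * V)
    return np.hstack(Z_blocks)

rng = np.random.default_rng(3)
n = 100
A = -2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)  # stable A
B = rng.standard_normal((n, 1))
Z = lr_adi(A, B, shifts=[-0.5, -1.0, -2.0, -4.0, -8.0])
X = Z @ Z.T
print(np.linalg.norm(A @ X + X @ A.T + B @ B.T))   # Lyapunov residual norm
```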


Table 1: Execution time of HouseHT relative to DGGHRD for various benchmark examples (Test Suite 2), on a single core and on eight cores.
Table 2: Execution time of HouseHT relative to DGGHD3 for matrix pencils with saddle-point structure (Test Suite 4). The numbers in parentheses correspond to results obtained with preprocessing.
A Householder-Based Algorithm for Hessenberg-Triangular Reduction

October 2017 · 784 Reads · 7 Citations

SIAM Journal on Matrix Analysis and Applications

The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil $A - \lambda B$ requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations partially accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern lead to inefficient use of the processor and make it difficult to parallelize the algorithm in a scalable manner. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into (compact) WY representations. Even though the new algorithm requires more floating-point operations than the state-of-the-art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work.
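The central tool here, the compact WY representation H_1 H_2 ⋯ H_k = I − V T V^T of a product of Householder reflectors, turns a sequence of rank-one updates into matrix-matrix multiplies. The sketch below follows the standard accumulation recurrence (as in LAPACK's xLARFT) and is a generic illustration, not code from the paper.

```python
import numpy as np

def compact_wy(V, taus):
    """V: n-by-k with column i holding the i-th Householder vector
    (unit diagonal, zeros above); taus: the k reflector coefficients.
    Returns T such that H_1 H_2 ... H_k = I - V T V^T."""
    k = V.shape[1]
    T = np.zeros((k, k))
    for i in range(k):
        T[i, i] = taus[i]
        if i > 0:
            # Append reflector i: T[:i, i] = -tau_i * T[:i, :i] (V[:, :i]^T v_i)
            T[:i, i] = -taus[i] * (T[:i, :i] @ (V[:, :i].T @ V[:, i]))
    return T

# Quick check against applying the reflectors one at a time.
rng = np.random.default_rng(4)
n, k = 8, 3
V = np.tril(rng.standard_normal((n, k)))
for i in range(k):
    V[i, i] = 1.0
taus = [2.0 / (V[:, i] @ V[:, i]) for i in range(k)]   # makes each H_i orthogonal
Q = np.eye(n)
for i in range(k):
    Q = Q @ (np.eye(n) - taus[i] * np.outer(V[:, i], V[:, i]))
T = compact_wy(V, taus)
print(np.linalg.norm(Q - (np.eye(n) - V @ T @ V.T)))   # ~ machine epsilon
```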


Citations (13)


... Topics related to eigenvalues and singular values are more complicated. Still, there is a body of literature on this matter, including the symmetric eigensolvers [13,21,23,24,25] and nonsymmetric eigensolvers [5]. ...

Reference:

Mixed precision iterative refinement for least squares with linear equality constraints and generalized least squares problems
Iterative refinement of Schur decompositions

Numerical Algorithms

... The ARE (3.24) reduces to a linear Lyapunov equation whenever B = 0. AREs, and their adjoint variants, which we do not consider here, are important nonlinear matrix equations that appear in many problems in control theory [21]. In such problems, one is typically only interested in the unique stabilizing solution X such that the spectrum of (… [12,24]; as cited by [5]. ...

A Numerical Comparison of Different Solvers for Large-Scale, Continuous-Time Algebraic Riccati Equations and LQR Problems
  • Citing Article
  • April 2020

SIAM Journal on Scientific Computing

... The level-3 BLAS potential in this approach is limited since distinct shifts result in distinct RQ factorizations. This problem has been addressed by Bosner et al. [6, Section 3], which gives the second idea: level-3 BLAS can be introduced in spite of distinct shifts. ...

Parallel Solver for Shifted Systems in a Hybrid CPU--GPU Framework

SIAM Journal on Scientific Computing

... The first formulation of ADI that iterates the residual alongside is given in [7], and [50] shows its reinterpretation in the Krylov subspace projection setting. Both point the way to a generalization for Riccati equations: RADI [4]. The scenarios above also occur for this algorithm. ...

RADI: a low-rank ADI-type algorithm for large scale algebraic Riccati equations

Numerische Mathematik

... However, this is not possible for large-scale problems where the iterates are too large (i.e., d is too large) to fit in memory. Instead, iterative solvers of large-scale matrix equations, such as RADI, exploit the fact that, for many practical problems, the desired solution can be well-approximated by a low-rank matrix [3]. Such solvers produce low-rank iterates in factorized form: ...

On the solution of large-scale algebraic Riccati equations by using low-dimensional invariant subspaces
  • Citing Article
  • January 2016

Linear Algebra and its Applications

... , m}, $q_m = q_k \cdot q_{m-k}$, and $p_k \in P_k$ is a polynomial with (formal) roots (infinity allowed) in the region we want to filter out. In applications this technique is usually used to deal with large memory requirements or orthogonalization costs for $V_{m+1}$, or to purge unwanted or spurious eigenvalues (see, e.g., [19,23,24] and the references given therein). Implicit filtering for RADs was first introduced in [24] and further studied in [23]. ...

A new framework for implicit restarting of the Krylov–Schur algorithm
  • Citing Article
  • July 2014

Numerical Linear Algebra with Applications

... The algorithms described in [8], [10] overcome this obstacle. In the first phase, the pair (A, B) is reduced to a so-called controller Hessenberg form: an orthogonal matrix Q is constructed such that $\tilde{A} = Q^* A Q$ is m-Hessenberg ($\tilde{A}_{i,j} = 0$ for all $i > j + m$) and $\tilde{B} = Q^* B$ is upper triangular; the matrix $\tilde{C} = C Q$ has no particular structure. ...

Efficient Generalized Hessenberg Form and Applications

ACM Transactions on Mathematical Software