Book

TEMPLATES for the Solution of Linear Systems: Building Blocks for Iterative Methods

Abstract

We have divided this book into five main chapters. Chapter 1 gives the motivation for this book and the use of templates. Chapter 2 describes stationary and nonstationary iterative methods. In this chapter we present both historical development and state-of-the-art methods for solving some of the most challenging computational problems facing researchers. Chapter 3 focuses on preconditioners. Many iterative methods depend in part on preconditioners to improve performance and ensure fast convergence. Chapter 4 provides a glimpse of issues related to the use of iterative methods. This chapter, like the preceding one, is especially recommended for the experienced user who wishes to have further guidelines for tailoring a specific code to a particular machine. It includes information on complex systems, stopping criteria, data storage formats, and parallelism. Chapter 5 includes overviews of related topics such as the close connection between the Lanczos algorithm and the Conjugate Gradient algorithm, block iterative methods, red/black orderings, domain decomposition methods, multigrid-like methods, and row projection schemes. The Appendices contain information on how the templates and BLAS software can be obtained. A glossary of important terms used in the book is also provided.
... While iterative methods require significantly less memory than direct ones, 3D nonsymmetric problems can still prove challenging. As shown in [40,6], iterative solvers from the Krylov family approach the issue of nonsymmetry in two different ways, leading to two groups of algorithms. One of them, with biconjugate gradient (BiCG) as a concrete example, uses relatively inexpensive iterations with a constant cost, but may have nonmonotonic convergence. ...
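The convergence behaviour mentioned in this excerpt can be observed with a small experiment. The following is a minimal sketch, not taken from the citing paper: it assumes SciPy's `scipy.sparse.linalg.bicg`, and the nonsymmetric tridiagonal matrix, its size, and the right-hand side are hypothetical test data. The callback records the true residual norm after each iteration so that any non-monotonic steps can be counted.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicg

n = 100
# Hypothetical nonsymmetric test matrix (unequal sub- and super-diagonals).
A = sp.diags([-1.2, 3.0, -0.8], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

residual_norms = []

def record_residual(xk):
    # Called by bicg after each iteration with the current iterate xk.
    residual_norms.append(np.linalg.norm(b - A @ xk))

# Default tolerance is used; info == 0 signals successful convergence.
x, info = bicg(A, b, callback=record_residual)

# Count how often the residual norm went up from one iteration to the next.
increases = sum(r1 > r0 for r0, r1 in zip(residual_norms, residual_norms[1:]))
print(f"info={info}, iterations={len(residual_norms)}, residual increases={increases}")
```

Each BiCG iteration costs a fixed number of matrix-vector products (one with the matrix and one with its transpose), which is the "constant cost" the excerpt refers to. Whether the residual history is monotone depends on the problem.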
... Choices other than FOM could be algorithms that deal with nonsymmetry using biorthogonalization, such as BiCG and its successors, which are described in [40,6]. For example, when solving nonlinear equations, one can employ a linearization via Chebyshev interpolation, with the resulting system to be solved by BiCG, as done in [11] for parameterized systems. ...
... This is due to the Schur complement potentially having a much higher condition number and leading to faster accumulation of errors. See [6,Chapter 2.3.3], [40,Chapter 8.1], and [37] for similar discussions regarding the original CRAIG and CG on the normal equation. ...
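The remark about CG on the normal equations rests on a standard fact: in the 2-norm, the condition number of A^T A is the square of the condition number of A. The sketch below illustrates this with NumPy only. The matrix is a hypothetical example built from prescribed singular values and is not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Build A = Q1 * diag(sigma) * Q2^T with known singular values (hypothetical data).
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.logspace(0, -3, n)  # singular values from 1 down to 1e-3
A = Q1 @ np.diag(sigma) @ Q2.T

kappa_A = np.linalg.cond(A)             # 2-norm condition number of A
kappa_normal = np.linalg.cond(A.T @ A)  # condition number of the normal-equations matrix

print(f"cond(A)     = {kappa_A:.3e}")
print(f"cond(A^T A) = {kappa_normal:.3e}")
```

A larger condition number generally means slower CG convergence and greater sensitivity to rounding errors, which is the concern the excerpt raises for the Schur complement.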
... High memory consumption: A matrix of shape $M \times N$ is generally referred to as sparse if the number of its nonzeros (nnzs) is small enough compared to $O(MN)$. Compressed storage formats are used in SpMM, such as Compressed Sparse Row (CSR) [47] and Coordinate (COO) [48] format, to make it possible to process sparse matrices with very large scales. SpMM is a memory-bound kernel for the irregular data access pattern brought by sparse data structures. ...
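To make the CSR layout concrete, here is a minimal sketch in plain NumPy. The helper names `dense_to_csr` and `csr_spmm` are hypothetical and written for illustration only; they are not the Acc-SpMM or cuSPARSE interfaces. CSR stores the nonzero values, their column indices, and a row-pointer array marking where each row starts.

```python
import numpy as np

def dense_to_csr(dense):
    """Convert a dense 2D array to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        cols = np.nonzero(row)[0]
        data.extend(row[cols])
        indices.extend(cols)
        indptr.append(len(data))  # end of this row in data/indices
    return np.array(data), np.array(indices, dtype=int), np.array(indptr, dtype=int)

def csr_spmm(data, indices, indptr, B):
    """Multiply a CSR matrix by a dense matrix B, one sparse row at a time."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # Gather the rows of B selected by this row's column indices.
        C[i] = data[start:end] @ B[indices[start:end]]
    return C

A = np.array([[5.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 4.0]])
B = np.arange(8, dtype=float).reshape(4, 2)

data, indices, indptr = dense_to_csr(A)
C = csr_spmm(data, indices, indptr, B)
print(indptr)  # row pointers: [0 2 3 3 5]
```

The gather `B[indices[start:end]]` is the irregular memory access that the excerpt identifies as the reason SpMM is memory-bound.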
Preprint
General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM acceleration. However, in order to fully unleash the power of hardware performance, systematic optimization is required. In this paper, we propose Acc-SpMM, a high-performance SpMM library on TCs, with multiple optimizations, including data-affinity-based reordering, memory efficient compressed format, high-throughput pipeline, and adaptive sparsity-aware load balancing. In contrast to the state-of-the-art SpMM kernels on various NVIDIA GPU architectures with a diverse range of benchmark matrices, Acc-SpMM achieves significant performance improvements, on average 2.52x (up to 5.11x) speedup on RTX 4090, on average 1.91x (up to 4.68x) speedup on A800, and on average 1.58x (up to 3.60x) speedup on H100 over cuSPARSE.
... The implementation was executed using the Python programming language, where sparse linalg (linear algebra) Python functions were used for solving the linear system of equations. Specifically, a Super Incomplete LU factorization was employed as a preconditioner applying the conjugate gradient method, as described in Barrett et al. [8]. The heat transfer model exhibits nonlinearity due to the convective heat transfer coefficient in eq. ...
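A setup of the kind described in this excerpt might look as follows. This is a sketch under stated assumptions, not the citing authors' code: it uses SciPy's `spilu` (the SuperLU-based incomplete LU factorization) wrapped in a `LinearOperator` as the preconditioner for `cg`, and the 2D Poisson matrix and right-hand side are hypothetical stand-ins for their heat transfer system. Note that an incomplete LU factor is not guaranteed to be symmetric, whereas CG theory assumes a symmetric positive definite preconditioner, so this pairing is a practical choice rather than a guaranteed one.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, spilu

# Hypothetical SPD test problem: 2D Poisson matrix on an m-by-m grid.
m = 20
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
I = sp.identity(m)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()  # spilu expects CSC format
b = np.ones(A.shape[0])

# Incomplete LU factorization with default drop tolerance and fill factor.
ilu = spilu(A)

# Wrap the triangular solves so cg can apply M ~ A^{-1} as a preconditioner.
M = LinearOperator(A.shape, matvec=ilu.solve)

x, info = cg(A, b, M=M)  # info == 0 signals successful convergence
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"info={info}, relative residual={rel_res:.2e}")
```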
Conference Paper
This study introduces a new strategy for capturing the interface between water and the fouling layer on heat transfer surfaces. This approach fits within moving boundary problems and is referred to as an Eulerian-like Interface-Capturing Approach for Modeling the Fouling Process by Crystallization (ELICAFC). Fouling is a prevalent industrial phenomenon characterized by the deposition of undesirable compounds on heat transfer surfaces. Several mathematical models have been proposed in the literature to predict the average fouling growth over time, with one of the key challenges being the tracking or capturing of the fouling layer’s interface. In this context, capturing the fouling layer movement not only aids in estimating the increase in thermal resistance but also in controlling heat transfer efficiency over space and time. The Finite Element Method is employed within the ELICAFC strategy, utilizing the Bohnet model to estimate the net deposition rate resulting from the crystallization process. The ELICAFC procedure is used to estimate temperature distribution, fouling growth, and thermal resistance in an idealized fouling scenario. Numerical results align with the expected physical behavior and verify the consistency of the ELICAFC approach.
... However, one disadvantage of such upwinding approaches in LES is the attenuation of energy at higher wavenumbers, see, e.g., [85]. Both cases use the implicit backward scheme for time-stepping (see, e.g., [86]) and the preconditioned bi-conjugate gradient (PBiCGStab) [87,88] for the iterative solution of the momentum and pressure equations. For the iterative solvers, we use the Diagonal-based Incomplete Cholesky (DIC) preconditioner (pressure equation) and the Diagonal-based Incomplete LU (DILU) preconditioner (momentum equation). ...
Preprint
This work investigates the current wall-modeled large-eddy simulation (WMLES) capabilities of the open-source computational fluid dynamics solver OpenFOAM, which is used widely in academia and industry. This is achieved by a simulation campaign that covers both attached and smooth body separation cases. The campaign includes simulations using four different wall models and aims to investigate the sensitivity of the results to changes in numerics, mesh resolution, and subgrid-scale modeling. The results demonstrate that two main factors largely determine OpenFOAM-based WMLES performance. These are the discretization of the convective term and wall modeling. For the former, the best performance in the attached case is achieved with low-dissipation numerics, however, for the smooth body separation case, more dissipative numerics give the best performance. For the latter, we find that both equilibrium and non-equilibrium wall models perform well in the attached case but that the non-equilibrium models significantly improve the prediction of smooth body separation. Still, the non-equilibrium wall model results do not show a uniform improvement over equilibrium models. This is explained by an inconsistent accounting of non-equilibrium physics in these models, i.e., including the pressure gradient term without also including the convective term. This highlights the potential for future performance improvements by using non-equilibrium wall models that consistently account for both the convective and pressure gradient terms.
Article
The present study numerically investigates the effects of a magnetic field on mixed convection flow and entropy generation within a double lid‐driven square cavity filled with a hybrid nanofluid. The flow is induced by two isothermally heated semi‐circles located on the bottom and left walls of the cavity. The cavity is filled with a ternary composition of hybrid nanofluid (aluminum oxide/silver/copper oxide‐water) and is exposed to a uniform magnetic field. The velocity ratio of the moving lids and the radius ratio of the semi‐circles are key parameters in the analysis. The study employs the finite volume method and full multigrid acceleration to solve the coupled continuity, momentum, energy, and entropy generation equations, along with the relevant boundary conditions. Key dimensionless parameters considered include the Hartmann number (0 ≤ Ha ≤ 100), Richardson number (0.01 ≤ Ri ≤ 1), hybrid nanofluid volume fraction (3% ≤ ϕ ≤ 12%), internal semi‐circle radius ratio (β = 0.5 and 1), and velocity ratio (−2 ≤ λ ≤ 2). Results revealed that the optimal heat transfer is achieved for Ri = 0.04, Ha = 100, ϕ = 0%, β = 1, and λ = 0.5 with 63% enhancement. Moreover, the maximum entropy generation rates are obtained for the same parameters with a rate of 47%, reflecting the complex balance of enhanced heat transfer and associated irreversibilities. The results also reveal that heat transfer and entropy generation are decreasing functions of the Hartmann number, implying a suppression of fluid motion by the Lorentz force. This study provides a valuable resource and parametric analysis for researchers and engineers, aiding in the design and optimization of thermal management systems for various industrial applications, including heat exchangers, nuclear reactors, and energy systems.
Article
Electron hydrodynamics refers to the transport regime where electrons collectively behave like a fluid. Its realization requires pure materials, some of which, such as bilayer graphene or PdCoO2, are anisotropic so that different in-plane transport directions can be defined. Collective electron flow also benefits from geometrically engineered devices because it is highly dependent on the nonuniformity of the electron flow. Here we analyze carrier transport in anisotropic materials where remarkable effects emerge after the proper directional design of the device. Simulations based on the Boltzmann transport equation demonstrate that electrical properties are clearly different when the device is set in the easy or the hard transport directions, namely, when the transport channel is aligned or not aligned to the group velocity at the Fermi level, respectively. Most importantly, the standard signatures of viscous electron flow, such as Poiseuille flow, superballistic conduction, and the formation of whirlpools, are enhanced when the anisotropic device operates in the hard transport directions. As a result, we demonstrate that electron hydrodynamics leads to a different route for efficient charge transport in the hard in-plane transport directions. Published by the American Physical Society 2025
Article
An application of the eXtended Element-Free Galerkin method to a boundary-value problem reduces to an asymmetric EFG-type Saddle-Point (EFG-SP) problem. However, saddle-point problems are difficult to solve even with iterative methods. For the purpose of resolving the difficulties, four types of high-performance solvers were developed for asymmetric EFG-SP problems in the previous study. In the four solvers, after eliminating Lagrange multipliers from the problems, the resulting linear systems are solved with Krylov subspace methods. In the present study, the Lagrange-multiplier-elimination method is generalized. As a result, an infinite number of solvers can be derived in principle. Three categories of solvers are introduced and their performance is investigated numerically. Consequently, it is found that the resulting solvers are effective especially for large-scale asymmetric EFG-SP problems.
Article
The natural convection heat transfer of a trihybrid nanofluid comprising Fe2O3, MoS2, and CuO nanoparticles dispersed in water (Fe2O3 + MoS2 + CuO/H2O) has been investigated within a cavity exposed to a uniform magnetic field. Three cold fins were strategically positioned on the top, right, and left walls of the enclosure. The study employs numerical simulations conducted using a custom-developed FORTRAN code. The computational approach integrates the finite volume method and full multigrid acceleration to solve the coupled governing equations for continuity, momentum, energy, and entropy generation, along with the associated boundary conditions. Prior to obtaining the results, a meticulous parameterization process was undertaken to accurately capture the fluid dynamics and thermal behavior characteristic of this geometric configuration. The findings underscored the key parameters’ significant impact on the flow structure and thermal performance. The results revealed that natural convection is more dominant at high Rayleigh and low Hartmann numbers, leading to higher Nusselt numbers and stronger dependence on the tilt angle α. Moreover, the optimal heat transfer conditions were obtained for the following parameters: Ha = 25, α = 45°, ϕ = 6%, and Ra = 10^6 with a rate of 4.985. This study offers valuable insights into achieving a balance between these competing factors by determining the optimal conditions for maximizing heat transfer while minimizing entropy generation. The findings contribute to enhancing the design of thermal systems that utilize magnetic nanofluids for efficient heat dissipation, making the research particularly relevant to advanced cooling technologies and compact thermal management solutions.
Article
We consider the problem of solving the algebraic system of equations which arise from the discretization of symmetric elliptic boundary value problems via finite element methods. A new class of preconditioners for these discrete systems is developed based on substructuring (also known as domain decomposition). The resulting preconditioned algorithms are well suited to emerging parallel computing architectures. The proposed methods are applicable to problems on general domains involving differential operators with rather general coefficients. A basic theory for the analysis of the condition number of the preconditioned system (which determines the iterative convergence rate of the algorithm) is given. Techniques for applying the theory and algorithms to problems with irregular geometry are discussed and the results of extensive numerical experiments are reported.
Article
A large collection of public domain mathematical software is now available via electronic mail. Messages sent to "netlib@anl-mcs" (on the Arpanet/CSNET) or to "research!netlib" (on the UNIX network) wake up a server that distributes items from the collection. For example, the one-line message "send index" gets a library catalog by return mail. We describe how to use the service and some of the issues in its implementation.
Article
The marching and generalized marching algorithms of part I are extended to nonconstant coefficient problems in which the elliptic operator is separable, once a suitable set of polynomials, which play a role analogous to the Chebyshev polynomials in the constant coefficient case, has been determined. These methods require $O(n^2)$ and $O(n^2 \log(n/k))$ operations, respectively, to solve a problem on an $n \times n$ grid, and have numerical stability characteristics similar to their constant coefficient counterparts. Problems in which the elliptic operator is not separable are treated using a D’Yakanov–Gunn iteration in which a sequence of separable problems is solved. The rate of convergence of this iteration is shown to be essentially independent of $n$.
Article
A new iterative method has been developed for solving the large sets of algebraic equations that arise in the approximate solution of multidimensional partial differential equations by implicit numerical techniques. This method has several advantages over those now in use. First, its rate of convergence does not depend strongly on the nature of the coefficient matrix of the equations to be solved. Second, it is not sensitive to the choice of iteration parameters, and as a result, suitable parameters can be estimated from the coefficient matrix. Finally, it reduces significantly the computational effort needed to solve a set of equations. For a typical set of 961 equations, it was found to reduce the number of calculations by a factor of three, when compared to the most competitive of the older methods. It is expected that this advantage will be even greater for larger sets of equations.

1. Introduction. Approximate solutions of multidimensional differential equations often are obtained by the application of implicit finite difference analogues. A difference equation is written for each grid point in the region of interest, and the resulting set of simultaneous equations must be solved for each time step. Such sets of equations can be solved directly by elimination or by one of several iterative methods, such as relaxation, successive overrelaxation, or ADI (alternating direction iteration). The purpose of this paper is to describe a new iterative procedure that converges much faster than any of these methods.

The simplest method of solving these sets of equations is direct solution by elimination. Although this approach is the most efficient method available for small sets of equations, it is not for large sets. The procedure requires 2n^2 arithmetic operations to solve n equations of the type being considered. When n becomes relatively large, it is more efficient to use an