Parallel Maxwell Eigensolver Using Trilinos Software Framework

Abstract

We report on a parallel implementation of the Jacobi–Davidson algorithm to compute a few eigenvalues and corresponding eigenvectors of a large real symmetric generalized matrix eigenvalue problem. The eigenvalue problem stems from the design of cavities of particle accelerators. It is obtained by the finite element discretization of the time-harmonic Maxwell equation in weak form by a combination of Nédélec (edge) and Lagrange (node) elements. We found the Jacobi–Davidson (JD) method to be a very effective solver provided that a good preconditioner is available for the correction equations. The parallel code makes extensive use of the Trilinos software framework. In our examples from accelerator physics we observe satisfactory speedup and efficiency.
Parallel Numerics ’05, 25–34. M. Vajteršic, R. Trobec, P. Zinterhof, A. Uhl (Eds.)
Chapter 2: Matrix Algebra ISBN 961-6303-67-8
Peter Arbenz 1, Martin Bečka 2, Roman Geus 3,
Ulrich Hetmaniuk 4, Tiziano Mengotti 1
1Institute of Computational Science, Swiss Federal Institute of Technology,
CH-8092 Zurich, Switzerland
2Department of Informatics, Institute of Mathematics,
Slovak Academy of Sciences,
Dúbravská cesta 9, 841 04 Bratislava, Slovak Republic
3Paul Scherrer Institut,
CH-5232 Villigen PSI, Switzerland
4Sandia National Laboratories,
Albuquerque, NM 87185-1110, U.S.A.
Corresponding author. E-mail: martin.becka@savba.sk
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin
Company, for the United States Department of Energy’s National Nuclear Security
Administration under contract DE-AC04-94AL85000.
1 Introduction
Many applications in electromagnetics require the computation of some of the
eigenpairs of the curl-curl operator,
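\[ \operatorname{curl}\, \mu_r^{-1} \operatorname{curl}\, e(x) - k_0^2\, \varepsilon_r\, e(x) = 0, \qquad \operatorname{div}\, e(x) = 0, \qquad x \in \Omega. \tag{1} \]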
Equations (1) are obtained from the Maxwell equations after separation of
the time and space variables and after elimination of the magnetic field
intensity. The discretization of (1) by finite elements leads to a real symmetric
generalized matrix eigenvalue problem
\[ A x = \lambda M x, \qquad C^T x = 0, \tag{2} \]
where A is positive semidefinite and M is positive definite. In order to avoid
spurious modes we approximate the electric field e by Nédélec (or edge)
elements [17]. The Lagrange multiplier, a function introduced to enforce the
divergence-free condition, is approximated by Lagrange (or nodal)
finite elements [3].
In this paper we consider a parallel eigensolver for computing a few of
the smallest eigenvalues and corresponding eigenvectors of (2) as efficiently as
possible with regard to execution time and memory cost. In earlier studies [3]
we found the Jacobi–Davidson algorithm [18, 9] a very effective solver for this
task. We have parallelized this solver in the framework of the Trilinos parallel
solver environment [10].
In section 2 we briefly review the symmetric Jacobi–Davidson eigensolver
and the preconditioner that is needed for its efficient application. In section 3
we discuss data distribution and issues involving the use of Trilinos.
In section 4 we report on experiments that we conducted by means of
problems originating in the design of the RF cavity of the 590 MeV ring
cyclotron installed at the Paul Scherrer Institute (PSI) in Villigen, Switzerland.
These experiments indicate that the implemented solution procedure is almost
optimal in that the number of iteration steps until convergence depends only
slightly on the problem size.
2 The eigensolver
The Jacobi–Davidson algorithm has been introduced by Sleijpen and van der
Vorst [18]. There are variants for all types of eigenvalue problems [5]. Here
we use a variant (JDSYM) adapted to the generalized symmetric eigenvalue
problem (2) as described in detail in [2, 9]. This algorithm is well-suited since
it does not require the factorization of the matrices A or M. In [2, 3, 4] we
found JDSYM to be the method of choice for this problem.
In addition to the standard JDSYM algorithm, we keep solutions of the
correction equation orthogonal to C by applying the projector
$(I - Y H^{-1} C^T)$ in each iteration step. Note that $Y = M^{-1} C$ is a very
sparse basis of the null space of A and that $H = Y^T C$ is the discretization
of the Laplace operator in the nodal element space [3].
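That the operator $I - Y H^{-1} C^T$ enforces the constraint $C^T x = 0$ follows directly from the definitions $Y = M^{-1} C$ and $H = Y^T C$. The short derivation below is added for illustration; it assumes only that M is symmetric, as stated for problem (2), and that H is invertible.

```latex
\begin{align*}
C^T Y &= C^T M^{-1} C = (M^{-1} C)^T C = Y^T C = H, \\
C^T \left(I - Y H^{-1} C^T\right) &= C^T - (C^T Y)\, H^{-1} C^T = C^T - H H^{-1} C^T = 0, \\
\left(I - Y H^{-1} C^T\right)^2 &= I - 2\, Y H^{-1} C^T + Y H^{-1} (C^T Y)\, H^{-1} C^T = I - Y H^{-1} C^T .
\end{align*}
```

Hence every vector of the form $(I - Y H^{-1} C^T)\, t$ satisfies the constraint, and applying the operator a second time changes nothing, which is the defining property of a projector.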
Our preconditioner $K \approx A - \sigma M$, where σ is a fixed shift, is a
combination of a hierarchical basis preconditioner and an algebraic multigrid
(AMG) preconditioner.
Since our finite element spaces consist of Nédélec and Lagrange finite
elements of degree 2 and since we are using hierarchical bases, we employ the
hierarchical basis preconditioner that we used successfully in [3]. Numbering
the linear before the quadratic degrees of freedom, the matrices A, M and
K get a 2-by-2 block structure. The (1,1)-blocks correspond to the bilinear
forms involving linear basis functions. The hierarchical basis preconditioners
as discussed by Bank [6] are stationary iteration methods that respect the
2-by-2 block structure of A and M. We use the symmetric block Gauss–Seidel
iteration as the underlying stationary method.
For very large problems (order $10^5$ and more), we solve with $K_{11}$ by a
single V-cycle of an AMG preconditioner. This makes our preconditioner a
true multilevel preconditioner. We found ML [16] the AMG solver of choice as
it can handle unstructured systems that originate from the Maxwell equation
discretized by linear Nédélec finite elements. ML implements a smoothed
aggregation AMG method [20] that extends the straightforward aggregation
approach of Reitzinger and Schöberl [14]. ML is part of Trilinos, which is
discussed in the next section.
The approximation $\tilde K_{22}$ of $K_{22}$ again represents a stationary
iteration method of which we execute a single iteration step.
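For orientation, one common way to write a symmetric block Gauss–Seidel preconditioner with approximate diagonal blocks is sketched below. The factored form and the symbols $\tilde K$, $\tilde K_{11}$, $r$, $z$ and $y_1$ are assumptions made for illustration; the text above specifies only the 2-by-2 block structure, the symmetric block Gauss–Seidel iteration, and the approximate solves with $K_{11}$ and $K_{22}$.

```latex
\[
\tilde K =
\begin{pmatrix} \tilde K_{11} & 0 \\ K_{21} & \tilde K_{22} \end{pmatrix}
\begin{pmatrix} \tilde K_{11}^{-1} & 0 \\ 0 & \tilde K_{22}^{-1} \end{pmatrix}
\begin{pmatrix} \tilde K_{11} & K_{12} \\ 0 & \tilde K_{22} \end{pmatrix}
\]
% Applying z = \tilde K^{-1} r with r = (r_1, r_2), z = (z_1, z_2):
\begin{align*}
y_1 &= \tilde K_{11}^{-1} r_1, \\
z_2 &= \tilde K_{22}^{-1} \left(r_2 - K_{21}\, y_1\right), \\
z_1 &= y_1 - \tilde K_{11}^{-1} K_{12}\, z_2 .
\end{align*}
```

Written this way, one application of the preconditioner costs two approximate solves with the (1,1)-block and one approximate solve with the (2,2)-block.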
3 Parallelization issues
For very large problems, the data must be distributed over a set of
processors. To make the solution of these large problems feasible, an efficient
parallel implementation of the algorithm is necessary. Such a parallelization of
the algorithm requires proper data structures and data layout, some parallel
direct and iterative solvers, and some parallel preconditioners. For our project,
we found the Trilinos Project [19] to be an efficient environment to develop such
a complex parallel application.
3.1 Trilinos
The Trilinos Project is an ongoing effort to design, develop, and integrate
parallel algorithms and libraries within an object-oriented software framework
for the solution of large-scale, complex multi-physics engineering and scientific
applications [19, 10, 15]. Trilinos is a collection of compatible software
packages. Their capabilities include parallel linear algebra computations, parallel
algebraic preconditioners, the solution of linear and non-linear equations, the
solution of eigenvalue problems, and related capabilities. Trilinos is primarily
written in C++ and provides interfaces to essential Fortran and C libraries.
For our project, we use the following packages:
• Epetra, the fundamental package for basic parallel algebraic operations.
  It provides a common infrastructure to the higher level packages,
• Amesos, the Trilinos wrapper for linear direct solvers (SuperLU,
  UMFPACK, KLU, etc.),
• AztecOO, an object-oriented descendant of the Aztec library of parallel
  iterative solvers and preconditioners,
• ML, the multilevel preconditioner package, which implements a smoothed
  aggregation AMG preconditioner capable of handling Maxwell
  equations [7, 16].
For a detailed overview of Trilinos and its packages, we refer the reader to [10].
3.2 Data structures
Real valued double precision distributed vectors, multivectors (collections of
one or more vectors) and (sparse) matrices are fundamental data structures,
which are implemented in Epetra. The distribution of the data is done by
specifying a communicator and a map, both Epetra objects.
The notion of a communicator is known from MPI [13]. A communicator
defines a context of communication, a group of processes and their topology,
and it provides the scope for all communication operations. Epetra
implements communicators for serial and MPI use. Moreover, communicator classes
provide methods similar to other MPI functions.
Vectors, multivectors and matrices are distributed row-wise. The distribution
is defined by means of a map. A map can be defined as the distribution of a
set of integers across the processes; it relates the global and local row
indices. To create a map object, a communicator, the global and local numbers
of elements (rows), and the global numbering of all local elements have to be
provided. So, a map completely describes the distribution of vector elements
or matrix rows. Note that rows can be stored on several processors
redundantly. To create a distributed vector object, in addition to a map, one
must assign values to the vector elements. The Epetra vector class offers
standard functions for doing this and other common vector manipulations.
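The relation between global and local row indices that a map encodes can be illustrated with a small sketch. The class `SimpleMap` and its methods are hypothetical names chosen for this example and are not the Epetra interface; the two "processes" are simulated in a single Python program.

```python
class SimpleMap:
    """Toy stand-in for a distribution map; hypothetical, not the Epetra class."""

    def __init__(self, num_global, my_global_ids):
        self.num_global = num_global              # global number of rows
        self.my_global_ids = list(my_global_ids)  # global ids stored on this process
        # Inverse lookup: global id -> local index on this process.
        self._local_of = {gid: lid for lid, gid in enumerate(self.my_global_ids)}

    def num_local(self):
        """Number of rows stored on this process."""
        return len(self.my_global_ids)

    def local_index(self, gid):
        """Local index of a global row, or -1 if the row is not stored here."""
        return self._local_of.get(gid, -1)

    def global_index(self, lid):
        """Global id of the row with the given local index."""
        return self.my_global_ids[lid]


# Two simulated processes sharing 6 rows; row 3 is stored redundantly on both.
maps = [SimpleMap(6, [0, 1, 2, 3]), SimpleMap(6, [3, 4, 5])]
for rank, m in enumerate(maps):
    print(rank, m.num_local(), [m.local_index(g) for g in range(6)])
```

The sketch mirrors the ingredients listed above: the global number of rows, the local number of rows, and the global numbering of the local rows.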
Trilinos supports dense and sparse matrices. Sparse matrices are stored
locally in the compressed row storage (CRS) format [5]. A matrix is
constructed row by row or element by element. Afterwards, a transformation
of the matrix is required in order to perform the matrix-(multi)vector product
Y = A × X efficiently; this step specifies the maps of the vectors X and Y.
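As a reminder of how the CRS format works, the following sketch stores a small matrix in three arrays (non-zero values, their column indices, and row pointers) and uses them for a matrix-vector product. It is a serial, pure Python illustration; the function names `dense_to_crs` and `crs_matvec` are made up for this example and do not correspond to Epetra calls.

```python
def dense_to_crs(rows):
    """Convert a dense matrix (list of rows) to CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)    # non-zero value
                col_idx.append(j)   # its column index
        row_ptr.append(len(values)) # end of this row in values/col_idx
    return values, col_idx, row_ptr


def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A given in CRS format."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
values, col_idx, row_ptr = dense_to_crs(A)
y = crs_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(values, col_idx, row_ptr, y)
```

Only the non-zero entries are visited in the product, so the cost is proportional to the number of non-zeros rather than to the square of the matrix order.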
Some algorithms require only the application of a linear operator, so that
the underlying matrix need not be available as an object. Epetra handles this
by means of a virtual operator class. Epetra also supports block sparse
matrices. Unfortunately, there is no particular support for symmetric
matrices.
To redistribute data, one defines a new, so-called target map and creates
an empty data object according to this new map, as well as an Epetra
import/export object from the original and the new map. The new data object
can then be filled with the values of the original data object using the
import/export object, which describes the communication plan.
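The idea of a communication plan derived from a source map and a target map can be sketched as follows. This is a single-program simulation with hypothetical helper names (`build_plan`, `redistribute`); it is not the Epetra import/export interface, and it assumes that every global id in the target map is stored by at least one process in the source map.

```python
def build_plan(source_ids, target_ids):
    """List of (source process, target process, global id) transfers.

    source_ids[p] and target_ids[p] are the global ids held by process p
    before and after the redistribution.
    """
    owner = {}
    for proc, ids in enumerate(source_ids):
        for gid in ids:
            owner.setdefault(gid, proc)  # first process holding gid supplies it
    return [(owner[gid], dst, gid)
            for dst, ids in enumerate(target_ids)
            for gid in ids]


def redistribute(source_data, source_ids, target_ids):
    """Fill new per-process data lists according to the target distribution."""
    plan = build_plan(source_ids, target_ids)
    lookup = [dict(zip(ids, vals)) for ids, vals in zip(source_ids, source_data)]
    received = [{} for _ in target_ids]
    for src, dst, gid in plan:
        received[dst][gid] = lookup[src][gid]
    return [[received[dst][gid] for gid in ids]
            for dst, ids in enumerate(target_ids)]


source_ids = [[0, 1, 2], [3, 4, 5]]   # block distribution
source_data = [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]]
target_ids = [[0, 2, 4], [1, 3, 5]]   # new distribution
target_data = redistribute(source_data, source_ids, target_ids)
# Transfers that cross process boundaries, i.e. actual messages.
messages = [(s, d, g) for s, d, g in build_plan(source_ids, target_ids) if s != d]
print(target_data, messages)
```

The plan depends only on the two maps, so it can be built once and reused for every data object that shares the same distributions.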
3.3 Data distribution
A suitable data distribution can reduce communication costs and balance the
computational load. In general, the gain from such a redistribution outweighs
the cost of this preprocessing step.
Zoltan [21, 8] is a library that contains tools for load balancing and parallel
data management. It provides a common interface to graph partitioners like
METIS and ParMetis [12, 11]. Zoltan is not a Trilinos package, but the
Trilinos package EpetraExt provides an interface between Epetra and Zoltan.
In our experiments, we use ParMetis to distribute the data. This partitioner
tries to distribute a graph such that (I) the number of graph vertices per
processor is balanced and (II) the number of edge cuts is minimized. The
former balances the work load. The latter minimizes the communication
overhead by concentrating elements in diagonal blocks and minimizing the
number of non-zero off-diagonal blocks. In our experiments, we define a graph
G, which contains connectivity information for each node, edge, and face of
the finite element mesh. G is constructed from portions of the sparse matrices
M, H, and C. To reduce the overhead of the redistribution we also work with a
graph smaller than our artificial graph G: we determine a good parallel
distribution just for the vertices and then adjust the edges and faces
accordingly.
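The two partitioning criteria, balanced vertex counts and few cut edges, can be made concrete with a toy graph. The sketch below does not call ParMetis; the functions `edge_cut` and `imbalance` and the 6-vertex cycle graph are assumptions chosen purely for illustration.

```python
def edge_cut(edges, part):
    """Number of graph edges whose end points lie on different processors."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, nparts):
    """Largest vertex count per processor divided by the ideal count (1.0 is perfect)."""
    counts = [0] * nparts
    for owner in part.values():
        counts[owner] += 1
    return max(counts) * nparts / len(part)


# Cycle graph on 6 vertices: 0-1-2-3-4-5-0.
edges = [(i, (i + 1) % 6) for i in range(6)]
contiguous = {v: v // 3 for v in range(6)}   # {0,1,2} | {3,4,5}
interleaved = {v: v % 2 for v in range(6)}   # {0,2,4} | {1,3,5}

for name, part in (("contiguous", contiguous), ("interleaved", interleaved)):
    print(name, edge_cut(edges, part), imbalance(part, 2))
```

Both partitions are perfectly balanced, but the contiguous one cuts 2 edges while the interleaved one cuts all 6, which is the kind of difference a graph partitioner tries to exploit.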
4 Numerical experiments
In this section, we discuss the numerical experiments used to assess the parallel
implementation. Results have been presented in [1].
The experiments have been executed in dedicated mode on a PC cluster with 32
dual-processor nodes. Each node has 2 AMD Athlon 1.4 GHz processors, 2 GB
main memory, and 160 GB local disk. The nodes are connected by a Myrinet
network providing a communication bandwidth of 2000 Mbit/s. The system
runs Linux.
For these experiments, we use the developer version of Trilinos on top of
MPICH. We computed the 5 smallest positive eigenvalues and corresponding
eigenvectors using JDSYM with the multilevel preconditioner.
Test problems originate in the design of the RF cavity of the 590 MeV ring
cyclotron installed at the Paul Scherrer Institute (PSI) in Villigen,
Switzerland. We deal with two problem sizes, labelled cop40k and cop300k.
Their characteristics are given in Table 1, where we list the order n and the
number of non-zeros nnz for the shifted operator A − σM and for the discrete
Laplacian H.
Table 1: Matrix characteristics
grid      n_{A−σM}    nnz_{A−σM}    n_H       nnz_H
cop40k      231,668    4,811,786     46,288    1,163,834
cop300k   1,822,854   39,298,588    373,990   10,098,456
Here the eigenvalues to be computed are
\[ \lambda_1 \approx 1.13, \quad \lambda_2 \approx 4.05, \quad \lambda_3 \approx 9.89, \quad \lambda_4 \approx 11.3, \quad \lambda_5 \approx 14.2. \]
We set σ = 1.5.
In Table 2, we report the execution times t = t(p) for solving the eigenvalue
problem with various numbers p of processors. These times do not include
preparatory work, such as the assembly of matrices or the data redistribution.
E(p) describes the parallel efficiency with respect to the simulation run with
the smallest number of processors. $t_{prec}$ and $t_{proj}$ indicate the
percentage of the time the solver spent applying the preconditioner and the
projector, respectively. $n_{inner}^{avg}$ is the average number of inner (QMRS)
iterations per outer iteration. The total number of applications of the
preconditioner K is approximately $n_{outer} \cdot n_{inner}^{avg}$. Here we use
an AMG preconditioner for the block $K_{11}$, Jacobi steps for $K_{22}$, and an
AMG preconditioner for the whole H.
In Table 3, we use the AMG preconditioner for the block $K_{11}$, Jacobi
steps for $K_{22}$, and a similar strategy for H (AMG preconditioner for
$H_{11}$ and Jacobi steps for $H_{22}$). We investigate the effect of
redistributing the matrices.
Table 2: Results for matrix cop40k
 p   t [sec]   E(p)   t_prec [%]   t_proj [%]   n_outer   n_inner^avg
 1    2092     1.00       37           18          53        19.02
 2    1219     0.86       38           17          54        18.96
 4     642     0.81       37           17          54        19.43
 8     321     0.81       38           18          53        19.23
12     227     0.77       40           19          53        19.47
16     174     0.75       43           20          53        18.96
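The efficiency column of Table 2 can be reproduced from the timings. The sketch below assumes the usual definition E(p) = p0 · t(p0) / (p · t(p)), where p0 is the smallest processor count of the run; the text states only that efficiency is measured relative to that run, so both the formula and the function name `parallel_efficiency` are assumptions.

```python
def parallel_efficiency(timings):
    """Map processor count p -> efficiency relative to the smallest p.

    timings: dict mapping processor count to execution time in seconds.
    """
    p0 = min(timings)
    base = p0 * timings[p0]  # processor-seconds of the reference run
    return {p: base / (p * t) for p, t in sorted(timings.items())}


# Execution times t(p) in seconds, taken from Table 2 (cop40k).
table2 = {1: 2092, 2: 1219, 4: 642, 8: 321, 12: 227, 16: 174}
efficiency = parallel_efficiency(table2)
for p, e in efficiency.items():
    print(f"p={p:2d}  E(p)={e:.2f}")
```

Rounded to two digits, the values obtained this way agree with the E(p) column of Table 2.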
Results in Table 3 show that the quality of the data distribution is important.
For the largest number of processors (p = 16), the execution time with the
redistributed matrices is half the time obtained with the original matrices,
which used a straightforward block distribution.
Table 3: cop40k: Comparison of results with and without redistribution
        t [sec]           E(p)           n_outer         n_inner^avg
 p    with  without    with  without   with  without    with   without
 1    1957   2005      1.00   1.00      53     53       19.02   19.02
 2    1159   1297      0.84   0.77      54     53       19.06   19.66
 4     622    845      0.79   0.59      54     55       19.43   19.18
 8     318    549      0.77   0.45      53     54       19.23   19.67
12     231    451      0.71   0.37      53     54       20.47   19.78
16     184    366      0.66   0.34      53     54       19.00   19.04
Finally, in Table 4, we report results for our larger problem size cop300k.
We use the 2-level preconditioner for K and H: an appropriate AMG
preconditioner for the blocks $K_{11}$ and $H_{11}$ and one step of Jacobi for
the blocks $K_{22}$ and $H_{22}$. Table 4 shows that, for these experiments, the
iteration counts are nearly independent of the number of processors and that
efficiencies stay above 0.9.
Table 4: Results for matrix cop300k
 p   t [sec]   E(p)   n_outer   n_inner^avg
 8    4346     1.00     62        28.42
12    3160     0.91     62        28.23
16    2370     0.92     61        28.52
5 Conclusions
In conclusion, the parallel algorithm shows very satisfactory behavior. With
redistributed matrices, the efficiency of the parallelized code does not drop
below 65 percent for 16 processors. We usually observe a large efficiency loss
initially. Efficiency then decreases slowly as the number of processors
increases. This is natural due to the growing communication-to-computation
ratio.
The accuracy of the results is satisfactory. The computed eigenvectors
were M-orthogonal and orthogonal to C to machine precision.
We plan to compare the Jacobi–Davidson approach with other eigenvalue
solvers like the locally optimal block preconditioned conjugate gradient
method (LOBPCG).
References
[1] P. Arbenz, M. Bečka, R. Geus, U. Hetmaniuk, and T. Mengotti, On a
Parallel Multilevel Preconditioned Maxwell Eigensolver. Technical Report
465, Institute of Computational Science, ETH Zürich, December 2004.
[2] P. Arbenz and R. Geus, A comparison of solvers for large eigenvalue
problems originating from Maxwell’s equations. Numer. Linear Algebra
Appl., 6(1):3–16, 1999.
[3] P. Arbenz and R. Geus, Multilevel preconditioners for solving eigenvalue
problems occuring in the design of resonant cavities. Applied Numerical
Mathematics, 2004. Article in press. Corrected proof available from doi:
10.1016/j.apnum.2004.09.026.
[4] P. Arbenz, R. Geus, and S. Adam, Solving Maxwell eigenvalue problems
for accelerating cavities. Phys. Rev. ST Accel. Beams, 4:022001, 2001.
(Electronic journal available from http://prst-ab.aps.org/).
[5] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates
for the Solution of Algebraic Eigenvalue Problems: A Practical Guide.
SIAM, Philadelphia, PA, 2000.
[6] R. E. Bank, Hierarchical bases and the finite element method. Acta
Numerica, 5:1–43, 1996.
[7] P. B. Bochev, C. J. Garasi, J. J. Hu, A. C. Robinson, and R. S. Tuminaro,
An improved algebraic multigrid method for solving Maxwell’s equations.
SIAM J. Sci. Comput., 25(2):623–642, 2003.
[8] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, and C. Vaughan,
Zoltan data management services for parallel dynamic applications.
Computing in Science and Engineering, 4(2):90–97, 2002.
[9] R. Geus, The Jacobi–Davidson algorithm for solving large sparse
symmetric eigenvalue problems. PhD Thesis No. 14734, ETH Zürich, 2002.
(Available at URL http://e-collection.ethbib.ethz.ch/show?type=diss&nr=14734).
[10] M. Heroux, R. Bartlett, V. Howle, R. Hoekstra, J. Hu, T. Kolda,
R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist,
R. Tuminaro, J. Willenbring, and A. Williams, An overview of the
Trilinos Project. ACM Trans. Math. Softw., 5:1–23, 2003.
[11] G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme
for irregular graphs. SIAM Rev., 41(2):278–300, 1999.
[12] METIS: A family of programs for partitioning unstructured graphs and
hypergraphs and computing fill-reducing orderings of sparse matrices. See
URL http://www-users.cs.umn.edu/~karypis/metis/.
[13] P. S. Pacheco, Parallel programming with MPI. Morgan Kaufmann, San
Francisco CA, 1997.
[14] S. Reitzinger and J. Schöberl, An algebraic multigrid method for finite
element discretizations with edge elements. Numer. Linear Algebra Appl.,
9(3):223–238, 2002.
[15] M. Sala, M. A. Heroux, and D. D. Day, Trilinos 4.0 Tutorial. Technical
Report SAND2004-2189, Sandia National Laboratories, May 2004.
[16] M. Sala, J. Hu, and R. S. Tuminaro, ML 3.1 Smoothed Aggregation
User’s Guide. Tech. Report SAND2004-4819, Sandia National Laboratories,
September 2004.
[17] P. P. Silvester and R. L. Ferrari, Finite Elements for Electrical Engineers.
Cambridge University Press, Cambridge, 3rd edition, 1996.
[18] G. L. G. Sleijpen and H. A. van der Vorst, A Jacobi–Davidson iteration
method for linear eigenvalue problems. SIAM J. Matrix Anal. Appl.,
17(2):401–425, 1996.
[19] The Trilinos Project Home Page, http://software.sandia.gov/trilinos/.
[20] P. Vaněk, J. Mandel, and M. Brezina, Algebraic multigrid based on
smoothed aggregation for second and fourth order problems. Computing,
56(3):179–196, 1996.
[21] Zoltan Home Page, http://www.cs.sandia.gov/Zoltan/.