
Fast Methods applied to BEM Solvers

for Acoustic Propagation Problems

N. Balin∗, G. Sylvand†, and J. Robert‡

Airbus Group Innovations, Toulouse, France

For the numerical simulation of wave propagation in acoustics, Airbus Group Innovations

relies on integral equations solved with the Boundary Elements Method (BEM), leading

to the need to solve dense linear systems. In this article, we intend to present two families

of fast solvers (Fast Multipole Method and H-Matrix method) that can be used on these

systems. We propose to underline their similarities, their connections and their diﬀerences,

to present their complementarity in future high performance solvers and to illustrate their

performances on industrial class applications.

I. Context

Airbus Group Innovations is the Airbus Group research center, dedicated to upstream research applied to

all Business Units (Airbus, Airbus Helicopters, Airbus Defence and Space). The applied mathematics team

has developed over the years a software called ACTIPOLE,1 intended to solve various acoustic propagation
problems using integral equations and boundary element methods. This software suite is used in design
and research departments to work on noise reduction.

The advantages of integral equations and BEM solvers are well known: mainly accuracy, and a simpler
(surface-only) mesh. The main algorithmic drawback is the need to cope with a dense matrix whose size can

be quite large for wave propagation problems, where the mesh step is governed by the wavelength of the

physical problem treated (in frequency domain).

For example, acoustic problems on a full-size aircraft at 20,000 Hz (upper limit of audible frequencies)
can involve more than 10⁸ unknowns. Solving such linear systems with a standard method is simply impossible
(storage would require 80,000 terabytes of disk, and factorization would take 100 years on all Airbus HPC
facilities).

Since the late 90’s, fast methods have been introduced to deal with these limitations. First, the Fast Multipole
Method (FMM) made it possible to compute fast matrix-vector products (in O(n log²(n)) instead of O(n²) for
the standard algorithm), and hence to design fast solvers using iterative methods. More recently, H-Matrix methods
have gained wide acceptance by introducing fast direct solvers, allowing systems to be solved in O(n log²(n)) –
or less – without the hassle of iterative solvers (unknown convergence rate and the difficulty of finding a good
preconditioner).

All these methods make it possible to compute acoustic noise propagation for industrial configurations in the
presence of a uniform flow.

II. Boundary Element Method and Classical Resolution

A. Boundary Element Method

We are interested here in modelling the propagation of an acoustic field generated by an acoustic
source in a uniform flow (domain Ω). The acoustic source is assumed to be harmonic, defined by the
angular frequency ω0. In the following we consider the exp(−iω0t) time convention. The diffracted acoustic pressure
p verifies the convected Helmholtz equation with associated boundary conditions on Γ = ∂Ω.

∗Research Engineer, AGI

†Expert, AGI/Inria

‡Research Engineer, AGI

1 of 13

American Institute of Aeronautics and Astronautics

Using the following Prandtl-Glauert transformation2,3

\begin{cases}
x' = x + C_\infty \,(M_0 \cdot x)\, M_0 \\
p(x) = p'(x')\, e^{i k M_0 \cdot x'}
\end{cases} \qquad (1)

with $C_\infty = \frac{1}{M_0^2}\left(\frac{1}{\sqrt{1-M_0^2}} - 1\right)$ and $k = \frac{k_0}{\sqrt{1-M_0^2}}$, the problem is reduced to the classical Helmholtz equation
on the acoustic diffracted pressure

\begin{cases}
\Delta' p' + k^2 p' = 0 & \text{in } \Omega' \\
\nabla' p' \cdot n' = -\nabla' p'_{inc} \cdot n' & \text{on } \Gamma' \\
\lim_{|r'| \to \infty} |r'| \left( \nabla' p' \cdot \frac{r'}{|r'|} - i k p' \right) = 0
\end{cases} \qquad (2)

with $\Omega'$ and $\Gamma'$ the domain $\Omega$ and boundary $\Gamma$ after the transformation, $n'$ the normal to $\Gamma'$ oriented
outward from $\Omega'$, and $p'_{inc}$ the transformed incident acoustic pressure.

For simplicity of notation, we consider here only a rigid body, and in the following the prime is
dropped.
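As an illustration of transformation (1), here is a minimal NumPy sketch applying it to a set of mesh points (function and variable names are ours, purely illustrative, not from ACTIPOLE):

```python
import numpy as np

def prandtl_glauert(points, M0, k0):
    """Apply the Prandtl-Glauert transformation of eq. (1).

    points : (n, 3) array of coordinates x
    M0     : (3,) Mach-number vector of the uniform flow
    k0     : wavenumber of the physical problem
    Returns the transformed points x' and the modified wavenumber k.
    """
    M2 = np.dot(M0, M0)                      # squared Mach number |M0|^2
    beta = np.sqrt(1.0 - M2)                 # Prandtl-Glauert factor
    C = (1.0 / M2) * (1.0 / beta - 1.0)      # C_infinity in eq. (1)
    xp = points + C * (points @ M0)[:, None] * M0[None, :]
    return xp, k0 / beta

# Usage: a Mach-0.3 flow along x, unit wavenumber
pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
xp, k = prandtl_glauert(pts, np.array([0.3, 0.0, 0.0]), 1.0)
```

Note that points orthogonal to the flow are left unchanged, while coordinates along the flow are stretched by the factor 1/√(1−M₀²).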

The knowledge of the acoustic pressure on Γ entirely solves the problem, since the following representation
theorem allows the diffracted pressure $p$ to be computed at any point $x \notin \Gamma$:4

p(x) = \int_\Gamma -\frac{\partial G(x,y)}{\partial n_y}\, p(y)\, dy, \qquad (3)

where $G(x,y)$ is the Green's function, solution of $\Delta u + k^2 u = -\delta_0$ and given by $G(x,y) = \frac{e^{ik\|x-y\|}}{4\pi\|x-y\|}$. We
have the following result for $x \in \Gamma$:

\frac{\partial p}{\partial n}(x) = -Dp(x), \qquad (4)

with:

Dp(x) = \fint_\Gamma \frac{\partial^2 G(x,y)}{\partial n_x\, \partial n_y}\, p(y)\, dy. \qquad (5)

Using the rigid boundary condition $\frac{\partial p_{tot}}{\partial n} = 0$, we obtain the variational formulation:
find $p$ such that $\forall p_t$, we have:

\fint_{\Gamma \times \Gamma} \frac{\partial^2 G}{\partial n_x\, \partial n_y}\, p(y)\, p_t(x)\, dy\, dx = \int_\Gamma \frac{\partial p_{inc}(x)}{\partial n}\, p_t(x)\, dx. \qquad (6)

In order to discretize this system, we use a triangular surface mesh of the boundary Γ. The pressure trace
is discretized with P1 linear basis functions. We end up with a complex, dense and symmetric system to
solve:

[D]\,[p] = \left[\frac{\partial p_{inc}}{\partial n}\right]. \qquad (7)
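The density of the system (7) comes from the Green kernel coupling every pair of degrees of freedom. A minimal sketch of a dense kernel-matrix fill (point-to-point collocation rather than the actual P1 Galerkin assembly; names are illustrative):

```python
import numpy as np

def green(x, y, k):
    """Helmholtz Green's function G(x, y) = exp(ik|x-y|) / (4 pi |x-y|)."""
    r = np.linalg.norm(x - y)
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def dense_kernel_matrix(points, k):
    """Fill the full n x n interaction matrix: O(n^2) storage and work,
    the bottleneck that fast methods (FMM, H-Matrix) are designed to avoid."""
    n = len(points)
    A = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            if i != j:          # the singular self-term is handled separately
                A[i, j] = green(points[i], points[j], k)
    return A

pts = np.random.rand(50, 3)
A = dense_kernel_matrix(pts, k=2.0)
```

The matrix is symmetric (the kernel depends only on |x−y|), consistent with the symmetric system (7).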

B. Classical Resolution

As we have seen in the previous section, the BEM formulation leads to the need to solve dense linear

systems, which is quite speciﬁc in the world of ﬁnite element methods. This has led to the development of

various solvers speciﬁcally designed for this purpose. In this section, we present a direct solver based on a

block-L.U or block-L.D.tLfactorization adapted to HPC platforms.5This is the most natural way to solve

a dense linear system, but its cost in terms of CPU time and storage grows fast with respect to the number

of unknowns. This has led to the introduction of advanced solver (FMM, H-mat) that we will present in the

next sections.


Direct vs. Iterative solvers    There are two main families of solvers for linear systems. Direct
solvers apply a predetermined set of operations to compute the solution; the costs, in terms of storage and
floating-point operations, are known in advance. For the linear systems we are dealing with, L.U or L.D.tL
factorizations are the most common approaches. The L.D.tL algorithm is adapted to symmetric matrices,
and for a matrix of order N it requires storing N²/2 scalar values and computing 4N³/3 operations
(for complex scalars). The L.U algorithm is for non-symmetric matrices and requires twice the cost in storage and in
operation count.
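These cost formulas translate directly into resource estimates; a small illustrative helper (assuming 16 bytes per complex double-precision scalar):

```python
def ldlt_cost(n):
    """Storage (bytes) and flop count for a complex L.D.tL factorization
    of a dense symmetric matrix of order n, using the counts quoted above."""
    storage = (n**2 / 2) * 16          # N^2/2 complex scalars, 16 bytes each
    flops = 4 * n**3 / 3               # 4N^3/3 operations
    return storage, flops

def lu_cost(n):
    """L.U for non-symmetric matrices: twice the storage and operations."""
    s, f = ldlt_cost(n)
    return 2 * s, 2 * f

# For N = 1.3e6 unknowns (the test case of the Performance paragraph below):
s, f = ldlt_cost(1.3e6)
print(f"storage ~ {s / 1e12:.1f} TB, flops ~ {f / 1e18:.1f} Eflop")
```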

On the other hand, iterative solvers repeatedly apply a set of operations to compute an increasingly precise
approximation of the solution as the computation goes on. The CG (Conjugate Gradient6) and GMRES
(Generalized Minimal Residual7) algorithms are the most widely used. For dense matrices, this kind of
approach is especially efficient when used in conjunction with a fast matrix-vector product such as the Fast
Multipole Method (see section III).
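In SciPy terms, an iterative solver only needs a matrix-vector product callback, which is exactly where a fast method plugs in; an illustrative sketch (the dense matvec below merely stands in for an FMM-accelerated product):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n = 200
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # well-conditioned stand-in
b = rng.standard_normal(n)

# The iterative solver only ever needs y = A @ x: this callback is
# where an FMM-accelerated product would be substituted.
op = LinearOperator((n, n), matvec=lambda x: A @ x)
x, info = gmres(op, b)       # info == 0 signals convergence
```

The solver never sees the matrix entries, only the action of the operator, which is what makes the matrix-free FMM combination possible.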

The SPIDO2 solver    SPIDO2 is an in-house solver developed since the mid 90’s by Airbus Group Innovations.
It is a parallel direct solver, implementing both the L.U and L.D.tL algorithms, to solve dense linear
systems coming from the boundary element method applied to wave propagation problems in electromagnetism
and acoustics. This solver features hybrid parallelism (with MPI and threads) and out-of-core execution
(allowing the matrix to be stored on disk during factorization, which is necessary since for large values of N it
no longer fits in memory).

Accuracy    At the moment, the SPIDO2 solver is our reference solver in terms of accuracy for solving
BEM problems. It has been tested over the years against analytic results, measurements and other software
for electromagnetism and acoustic propagation problems.

Performance    We will consider a problem with N = 1.3 × 10⁶ unknowns. This number of degrees of
freedom makes it possible to compute a full A321 aircraft, as represented in figure 12, at an
acoustic frequency of 1500 Hz. For a nacelle computation, as seen in figure 9, this problem size would
allow an acoustic frequency of 3100 Hz to be treated.

The test platform is Airbus Group’s current HPC system, called HPC4. It is a cluster based on Xeon E5-2697
processors at 2.7 GHz, with a total of ≈68,000 cores, an InfiniBand interconnect, and an overall peak
performance of 1.5 Pflops (that is, 1.5 × 10¹⁵ floating-point operations per second).

On this machine, using 80 nodes (or 1920 cores), a problem with N = 1.3 × 10⁶ unknowns is solved using
the block L.D.tL solver in 27.5 hours. The factorization, the most expensive part of the computation,
takes 25.8 hours, with a performance above 65% of the machine peak. In this computation,
the number of right-hand sides only impacts the solve time, which is very small compared to the
factorization time: only 27 minutes for 362 RHS. This illustrates the fact that a direct solver is well suited for
problems with a large number of right-hand sides. The matrix is stored distributed among the nodes and on
disk, representing approximately 160 GB of data per node.

Nevertheless, on today’s machines, with this type of approach (storage growing like O(N²) and
CPU time growing like O(N³)), it is difficult to consider computations larger than this. Fortunately,
several new approaches have emerged to solve this type of linear system more efficiently.

III. Fast Multipole Methods

The initial FMM was introduced in the late 80’s for particle simulation.8 Basically, the idea is to gather
the particles into clusters and to compute all the interactions not point-to-point, but cluster-to-cluster, using
approximations adapted to the considered kernel. A hierarchical approach for building the clusters leads to
a multi-level algorithm, which we refer to as “the” FMM. The introduction of the FMM for the Helmholtz kernel9,10
paved the way to a very broad use of the FMM in the field of wave propagation.11,12

Since then, new FMM formulations have been introduced. The directional FMM13 extends the black-box
FMM to all oscillatory kernels, for instance for 2D applications. The advantage of this “black box” approach
is that it relies only on kernel evaluations, and not on an analytical decomposition of the kernel. In Ref. 14, the
authors deal with a numerical breakdown that prevents the classical FMM for Helmholtz from handling
low-frequency problems. Recently, this method has been improved and simplified,15 leading to a new FMM
scheme for Helmholtz that is stable at all frequencies. In our implementation of this method, we still rely on the
original approach available in the late 90’s.


The Fast Multipole Method (FMM) is a way to compute fast but approximate matrix-vector products.
In conjunction with any iterative solver, it is an efficient way to solve an integral equation problem such as
(7). The method is fast in the sense that CPU time is O(n log²(n)) instead of O(n²) for standard matrix-vector
products, and approximate in the sense that there is a relative error ε ≈ 10⁻³ between the “old” and the “new”
matrix-vector products. Nevertheless, this error is not a problem, since it is usually below the error
introduced by the iterative solver or by the approximation due to the mesh.

In practice, we want to compute the right hand side of (6) for all pt. In the mono-level FMM method,
we split the object into equally sized domains using, for instance, a cubic grid (as shown in Figure 1). The
degrees of freedom are then dispatched between these cubic cells. Interactions of basis functions located in
neighbouring cells (that is, cells that share at least one vertex) are treated classically, without any multipole
acceleration.
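The dispatch of degrees of freedom into cubic cells, and the vertex-sharing neighbour test on cells, can be sketched as follows (illustrative names):

```python
import numpy as np
from collections import defaultdict

def dispatch(points, cell_size):
    """Group point indices by the cubic grid cell that contains them."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple((p // cell_size).astype(int))   # integer cell coordinates
        cells[key].append(i)
    return cells

def are_neighbours(c1, c2):
    """Cells sharing at least one vertex: every coordinate differs by <= 1."""
    return all(abs(a - b) <= 1 for a, b in zip(c1, c2))

pts = np.random.rand(1000, 3)
cells = dispatch(pts, cell_size=0.25)
```

Only the pairs failing the neighbour test are candidates for multipole acceleration; neighbouring pairs keep the classical treatment.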

Figure 1: Use of a cubic grid to split the mesh

The interactions of unknowns located in non-neighbouring cells are accelerated with the FMM. The
basis of this algorithm is the following addition theorem: given two points $x$ and $y$ located in two distant
(= non-neighbouring) boxes $C$ and $C'$ centered at $M$ and $M'$, we have

G(|y - x|) = \frac{ik}{16\pi^2} \lim_{L \to +\infty} \int_{s \in S} e^{ik\, s \cdot \vec{xM}}\, T^L_{MM'}(s)\, e^{ik\, s \cdot \vec{M'y}}\, ds, \qquad (8)

where $S$ denotes the unit sphere in $\mathbb{R}^3$, and $T^L_{MM'}$ is the transfer function defined on $S$ by

T^L_{MM'}(s) = \sum_{0 \le l \le L} (2l+1)\, i^l\, h^{(1)}_l(k\,|MM'|)\, P_l(\cos(s, \vec{MM'})), \qquad (9)

with $h^{(1)}_l$ the spherical Hankel function and $P_l$ the Legendre polynomial. The parameter $L$, called the
number of poles, is chosen in accordance with the size of the box edge $a$, in order to have good accuracy
in (8) and no divergence in (9): $L = \sqrt{3}\,ka$ satisfies these two conditions. Using (8) within (6), we can
replace an O(n²) computation by a sequence of three operations (traditionally called P2M, M2L and L2P,
where P stands for Particle, M for Multipole expansion and L for Local expansion). Each of these operations
uses a discretization of the unit sphere $S$ chosen in accordance with the value of $L$. In practice, for
an optimal size of the grid, the complexity of the single-level FMM is O(n^{3/2}).
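The truncated transfer function (9) can be evaluated directly; a minimal sketch assuming SciPy (names illustrative):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def transfer_function(cos_theta, k, dist, L):
    """Transfer function T^L_{MM'}(s) of eq. (9), truncated at L poles.

    cos_theta : cosine of the angle between s and the vector MM'
    k, dist   : wavenumber and distance |MM'| between the box centers
    """
    total = 0.0 + 0.0j
    for l in range(L + 1):
        # spherical Hankel function of the first kind: h_l^(1) = j_l + i y_l
        h1 = spherical_jn(l, k * dist) + 1j * spherical_yn(l, k * dist)
        total += (2 * l + 1) * (1j**l) * h1 * eval_legendre(l, cos_theta)
    return total

# Number of poles for a box of edge a: L = sqrt(3) k a
k, a = 2.0, 1.0
L = int(np.ceil(np.sqrt(3.0) * k * a))
t = transfer_function(0.5, k, 5.0 * a, L)
```

Note how the pole count L, and hence the cost of each transfer, grows linearly with ka, i.e. with the box size measured in wavelengths.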

One can then introduce a more advanced variant of the FMM using a multilevel approach based on a
recursive subdivision of the diffracting object through an octree (see figure 2). The idea is to explore this tree
from the root (largest box) to the leaves and, at each level, to use the single-level multipole method to treat
interactions between non-neighbouring domains that have not yet been treated at an upper level. At the
finest level, the interactions between neighbouring domains are treated classically. This improvement adds two
new operations to the algorithm (M2M and L2L) to connect the different levels, but the operations used for


Figure 2: Subdivision of a plane through an octree

single-level FMM remain unchanged. The optimal complexity of the multilevel FMM is O(n log²(n)). Hence,
for an iterative resolution of a linear system with N_RHS right-hand sides requiring N_ITER iterations to reach
convergence, the complexity of the whole computation is N_RHS · N_ITER · O(n log²(n)). The algorithm uses
many functions and parameters; for a full description of the implementation and parallelisation, please
refer e.g. to Ref. 11. For examples of applications using FMM, see section B.
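The recursive octree subdivision underlying the multilevel algorithm can be sketched as follows (illustrative; subdivision here stops at a leaf-size threshold rather than at a wavelength-dependent box size):

```python
import numpy as np

def build_octree(points, indices, center, half, leaf_size=32):
    """Recursively split a cubic box into up to 8 children until few points
    remain. Returns a nested dict: either a leaf holding point indices,
    or a mapping from octant keys to child nodes."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    children = {}
    for i in indices:
        # octant key: 3 bits, one per axis, comparing the point to the center
        key = tuple((points[i] > center).astype(int))
        children.setdefault(key, []).append(i)
    node = {}
    for key, idx in children.items():
        child_center = center + (np.array(key) - 0.5) * half
        node[key] = build_octree(points, idx, child_center, half / 2, leaf_size)
    return node

pts = np.random.rand(500, 3)
tree = build_octree(pts, list(range(500)), np.full(3, 0.5), 0.5)
```

The tree is then traversed root-to-leaves, treating at each level the non-neighbouring box pairs not already covered at a coarser level.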

IV. H-Matrix

H-Matrix16 is a lossy, hierarchical storage scheme for matrices that, along with an associated arithmetic,
provides a rich enough set of approximate operations to perform matrix addition, multiplication, factorization
(e.g. LU or LDLT) and inversion. It relies on two core ideas: (a) nested clustering of the degrees
of freedom (figure 3), and of their products; and (b) adaptive compression of these clusters. Several choices
exist in the literature for these two ingredients, the most common being Binary Space Partitioning for the
clustering and Adaptive Cross Approximation for the compression.

Figure 3: Example of geometry cluster tree

Each pair of clusters far enough from each other is said to be admissible (figure 4). Such pairs
yield compressible blocks in the tree-like matrix structure (figure 5).

Once created, the structure is “ﬁlled” in a second step with low-rank approximations of the corresponding

matrix blocks, representing the interaction of two clusters. The algorithms then perform the operations on

this structure, using adaptive recompression to avoid inﬂating the matrix as the algorithm progresses.

The compression consists in replacing each block by a product of two low-rank matrices (figure 6). During
the initial filling of the matrix, this is achieved using the ACA+ algorithm, which has the big advantage of
needing not the full block but only a few relevant rows and columns. This makes the initial filling very fast
compared to an uncompressed assembly. For each block, the algorithm ensures that ‖B − B_compressed‖ < ε‖B‖, where ε is
a user-controlled parameter.
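The principle of cross approximation can be sketched with a fully pivoted variant (the actual ACA+ pivots partially, so that only a few rows and columns of each block are ever generated; this illustrative version needs the whole block but shows the rank-1 peeling idea):

```python
import numpy as np

def aca_full(B, eps):
    """Build a low-rank approximation U @ V of B such that
    ||B - U V||_F <= eps * ||B||_F, peeling off one rank-1 cross at a time."""
    R = B.astype(float).copy()          # residual
    norm_B = np.linalg.norm(B)
    U_cols, V_rows = [], []
    while np.linalg.norm(R) > eps * norm_B:
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)  # full pivot
        u = R[:, j] / R[i, j]           # pivot column, scaled
        v = R[i, :].copy()              # pivot row
        R -= np.outer(u, v)             # remove the rank-1 cross from R
        U_cols.append(u)
        V_rows.append(v)
    return np.column_stack(U_cols), np.vstack(V_rows)

# A smooth (hence numerically low-rank) interaction block
x = np.linspace(0.0, 1.0, 60)
B = 1.0 / (2.0 + x[:, None] - x[None, :])
U, V = aca_full(B, eps=1e-6)
```

For such smooth far-field blocks, the number of crosses (the numerical rank) is far smaller than the block dimension, which is where the storage gain comes from.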

Together, they allow for the construction of a fast direct solver with complexity O(n log²(n)) in some


(a) relative distance between clusters = 1 (b) relative distance between clusters = 3

Figure 4: Impact of diﬀerent admissibility criteria (in blue, the admissible interactions with red group).

Figure 5: H-Matrix structure. Red blocks will not be compressed. White ones will.


Figure 6: Compression of a matrix block


cases,17 which is especially important for BEM applications as it gracefully handles a large number of Right-

Hand Sides (RHS). They also provide a kernel-independent fast solver, allowing one to use the method for

diﬀerent physics.

Airbus Group Innovations has recently implemented the H-Matrix arithmetic and successfully applied it

to a wide range of industrial applications in electromagnetism and acoustics. A sequential version of this

implementation has been published as open source.18 These algorithms are hard to parallelize efficiently,
as the very scarce literature on the subject shows.19 We developed a parallel solver that goes beyond the

aforementioned reference, using innovative techniques on top of a state-of-the-art runtime system.20,21 This

enables very large problems to be solved with very good efficiency. In this paper, we show some
results on the accuracy of this method on several challenging applications, and on its fast solving time and
efficient use of resources.

V. Computation chain

In order to perform high-frequency computations, an efficient and automated meshing methodology is
required. An in-house methodology has been developed.

(a) Tesselated geometry (b) Half-automated with

commercial software

(c) Amibe (d) Amibe with Afront

Figure 7: Comparison of different meshers on a 1250 Hz mesh

The geometry is first transformed into a clean and sewed tessellated surface (figure 7a). Then a mesh is
created for each frequency using a batch mesher named Amibe.22 Amibe is a Delaunay mesher, so it is robust
but not good at creating high-quality triangles. Robustness is what makes batch meshing possible, as no human
must be involved in fixing the mesh. To get better triangles we use Afront.23 Afront is not robust but, being
a frontal mesher, it creates triangles of very good quality. Hybridizing Amibe with Afront therefore gives a
robust, high-quality mesher (as shown in figure 8), with more than 90% of the edge lengths within ±5% of the
target. In our case, the number of degrees of freedom has been reduced by 30% with this method.

Good-quality triangles help reduce the number of degrees of freedom, which increases the effectiveness
of our FMM algorithm and thus reduces the computation time.

Frequency    Number of triangles    Time
3150 Hz      9,347,590              52 min
4000 Hz      15,065,208             1 h 7 min
5000 Hz      23,525,739             2 h 18 min
6300 Hz      36,263,028             3 h 50 min

Table 1: Time for the generation of the mesh

VI. Numerical results

Two examples of application are presented here to highlight the possibilities of the Fast Multipole Method

and the H-Matrix solvers. They have been run on a machine with two Intel Xeon “Ivy Bridge EP” processors
with 12 cores each, running at 2.7 GHz with 192 GB RAM per node, and an InfiniBand QDR network to connect
the nodes.


Figure 8: Distribution of edge lengths on a 1250 Hz mesh (representative of the mesh quality)

A. Nacelle treatment characterisation

We are interested here in characterising the effects of nacelle acoustic treatments. Figure 9 shows a nacelle
geometry mesh provided by Airbus. The computation configurations are: uniform flows (with different flow
conditions depending on the pre-defined flight phase analysed), a modal source, and several frequencies
related to the engine regime (through the fan blade count and rotation speed).

Figure 9: Example of mesh for a nacelle simulation

We study the takeoff case for different frequencies (cf. table 2). The Mach number of the carrying flow
is around 0.6 and all the propagative modes have an energy of 100 dB. Due to the number of propagative
modes (Right-Hand-Side (RHS) terms), the Fast Multipole Method is not effective, so the H-Matrix solver
is compared only to the direct solver. In the following computations we used ε = 10⁻³.

                               1500 Hz    2400 Hz
Elements                       201,780    877,305
Total unknowns                 268,416    783,648
Number of propagative modes    466        1,690

Table 2: Nacelle computations characteristics

For the mode coefficients, the relative errors between the H-Matrix and the classical direct solvers are
given in table 3.

Another quantity of interest is the broadband noise obtained by equi-repartition of the energy on the


Type       Relative Error
Min        5.887 × 10⁻⁴
Average    1.839 × 10⁻⁴
Max        4.144 × 10⁻³

Table 3: Nacelle computations validation

azimuthal modes:

\mathrm{RMS}_{dB}(\theta) = 10 \log_{10}\!\left( \frac{1}{M} \sum_{m} \frac{1}{N_m} \sum_{n=1}^{N_m} P_{mn}(\theta)^2 \right) - 10 \log_{10}\!\left( P_{ref}^2 \right) \qquad (10)

with $\theta$ the radiation direction, $P_{ref}$ the reference pressure ($2 \times 10^{-5}$ Pa), $N_m$ the number of radial modes for
azimuthal mode $m$, and $P_{mn}$ the radiated pressure of mode $(m, n)$.
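Equation (10) can be evaluated directly; a minimal sketch (illustrative: the input is a list of per-azimuthal-mode arrays of radiated pressures at one direction θ):

```python
import numpy as np

P_REF = 2e-5  # reference pressure in Pa

def rms_db(modal_pressures):
    """Broadband noise of eq. (10) at one radiation direction.

    modal_pressures : list of 1-D arrays; entry m holds the radiated
    pressures P_mn of the N_m radial modes of azimuthal mode m.
    """
    M = len(modal_pressures)
    # mean over radial modes within each azimuthal mode, then over modes
    mean_sq = sum(np.mean(np.asarray(p)**2) for p in modal_pressures) / M
    return 10.0 * np.log10(mean_sq) - 10.0 * np.log10(P_REF**2)

# Usage: two azimuthal modes with 3 and 2 radial modes respectively
level = rms_db([np.array([0.1, 0.2, 0.05]), np.array([0.15, 0.1])])
```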

Figures 10 and 11 illustrate the broadband noise obtained at 2400 Hz, on an arc of radius 46 m
centered on the inlet and lying in the vertical plane, for the H-Matrix and standard (SPIDO2) methods, together
with the relative error between these two methods. The error is very low, and larger for the lowest noise
levels.

Figure 10: Nacelle computation - broadband noise on an arc at 2400Hz

The computation time at 2400 Hz is presented in table 4. With the new H-Matrix solver, a speed-up
of 60 in CPU time is observed and the maximum memory is reduced by a factor of 15 compared to the
classical direct solver.

                                Total (s)    Assembly (s)    Solver (s)    Max mem. (GB)
direct (480 cores)              85406        32874           34448         107 × 20
H-Matrix ε = 10⁻³ (24 cores)    29195        14019           10833         143

Table 4: Nacelle computations time


Figure 11: Nacelle computation - error on the broadband noise at 2400Hz

B. Ramp Noise

The other application presented here is the prediction of installation effects for ramp noise sources. Here,
we consider a rigid A321 aircraft and we are interested in computing the noise generated by Environmental
Control System (ECS) or Auxiliary Power Unit (APU) sources at servicing points, in a frequency range
between 500 Hz and 10 kHz. The ground plane is taken into account in the Boundary Element Method
by changing the Green kernel. The sources considered are modal sources.

The high frequency imposes the use of the FMM algorithm with an iterative resolution. Figure 12
shows the mesh adapted to the frequency of 1250 Hz, and table 5 gives the computation characteristics.

Figure 12: Mesh at 1250Hz

Figures 13 and 14 illustrate the broadband noise obtained using equation (10) at 1250Hz on the receivers

for the H-Matrix and FMM methods and the relative error between these two methods. A very good

agreement is observed between the two methods in this case.

The FMM method proves efficient and accurate for this kind of computation, allowing ramp noise to be
computed on the whole aircraft at high frequencies (≥ 5 kHz). At the moment, the H-Matrix
solver is not able to run the 2500 Hz and 5000 Hz cases.


Frequency    # dof     # RHS    # cores    Time
1250 Hz      0.9 M     3        24         0.9 h
2500 Hz      3.6 M     9        96         3.5 h
5000 Hz      14.3 M    32       88         86 h

Table 5: Computation characteristics on A321 Ramp Noise

Figure 13: Ramp noise computation - broadband noise at 1250Hz

Figure 14: Ramp noise computation - error on the broadband noise at 1250Hz


VII. Conclusion

High-performance solvers have been implemented in the ACTIPOLE software and make it possible to run
large-scale industrial applications. A very good agreement has been obtained, in terms of accuracy, between
the three solvers tested here.

The three solvers are complementary: SPIDO remains the reference solver in terms of accuracy but is very
expensive and cannot address large problems; the H-Matrix solver is to be preferred for medium-size problems,
especially with a large number of RHS; and at the moment, the FMM solver remains the reference solver for
very large problems. For instance, the FMM solver is able to compute the ramp noise of an Airbus A321 at 5 kHz.

An automated meshing chain has also been developed in order to manage huge problems with minimal

human interaction.

All these features, combined with the current computing power, make it possible to address large industrial
problems during the design phases, as well as optimization, uncertainty and sensitivity analyses, and hence
to propose a strategy based on acoustic numerical modelling. Work is ongoing to extend the modelling to
non-uniform flows.2

Acknowledgements

This work has been funded by the French Government in the frame of the Hi-BOX and ABRICOT projects.
The authors would like to thank D. Lizarazu from the AIRBUS Acoustics team for her contribution to this
work.

References

1Delnevo, A., Le Saint, S., Sylvand, G., and Terrasse, I., “Numerical methods: Fast multipole method for shielding effects,”
AIAA Paper 2005-2971, 2005.

2Balin, N., Casenave, F., Dubois, F., Duceau, E., Duprey, S., and Terrasse, I., “Boundary element and ﬁnite element

coupling for aeroacoustics simulations,” Journal of Computational Physics, Vol. 294, 2015, pp. 274–296.

3Dubois, F., Duceau, E., Maréchal, F., and Terrasse, I., “Lorentz Transform and Staggered Finite Differences for Advective

Acoustics,” Tech. rep., EADS and arXiv:1105.1458, 2002.

4Nédélec, J., Acoustic and Electromagnetic Equations: Integral Representations for Harmonic Problems, Vol. 144 of
Applied Math. Sciences, Springer, 2001.

5Lizé, B., Solveur Direct Haute Performance, Master’s thesis, Ecole Centrale Paris, 2009.

6Hestenes, M. R. and Stiefel, E., Methods of conjugate gradients for solving linear systems , Vol. 49, NBS, 1952.

7Saad, Y. and Schultz, M. H., “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear

systems,” SIAM Journal on scientiﬁc and statistical computing , Vol. 7, No. 3, 1986, pp. 856–869.

8Greengard, L. and Rokhlin, V., “A fast algorithm for particle simulations,” Journal of computational physics, Vol. 73,

No. 2, 1987, pp. 325–348.

9Coifman, R., Rokhlin, V., and Wandzura, S., “The fast multipole method for the wave equation: a pedestrian prescription,”
IEEE Antennas and Propagation Magazine, Vol. 35, No. 3, 1993, pp. 7–12.

10Darve, E., “The fast multipole method I: error analysis and asymptotic complexity,” SIAM Journal on Numerical

Analysis, Vol. 38, No. 1, 2000, pp. 98–128.

11Sylvand, G., La méthode multipôle rapide en électromagnétisme. Performances, parallélisation, applications, Ph.D.

thesis, Ecole des Ponts ParisTech, 2002.

12Sylvand, G., “Performance of a parallel implementation of the FMM for electromagnetics applications,” Int. J. Numer.

Meth. Fluids, Vol. 43, 2003, pp. 865–879.

13Messner, M., Schanz, M., and Darve, E., “Fast directional multilevel summation for oscillatory kernels based on Chebyshev

interpolation,” Journal of Computational Physics, Vol. 231, No. 4, 2012, pp. 1175–1196.

14Darve, E. and Havé, P., “Efficient fast multipole method for low-frequency scattering,” Journal of Computational Physics,

Vol. 197, No. 1, 2004, pp. 341–363.

15Collino, F., “Analyse théorique d’une méthode multipôle stable à toutes échelles pour le noyau de Helmholtz,” Tech. rep.,

CERFACS, 2013.

16Hackbusch, W., “A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices,” Computing,

Vol. 62, No. 2, 1999, pp. 89–108.

17Grasedyck, L. and Hackbusch, W., “Construction and arithmetics of H-matrices,” Computing, Vol. 70, No. 4, 2003,

pp. 295–334.

18“hmat-oss,” https://github.com/jeromerobert/hmat-oss, Accessed: 2016-04-20.

19Kriemann, R., “Parallel H-Matrix Arithmetics on Shared Memory Systems,” Computing, Vol. 74, No. 3, 2005, pp. 273–297.

20Lizé, B., Résolution directe rapide pour les éléments finis de frontière en électromagnétisme et acoustique: H-Matrices.
Parallélisme et applications industrielles, Ph.D. thesis, Paris 13, 2014.


21Augonnet, C., Thibault, S., Namyst, R., and Wacrenier, P.-A., “StarPU: a uniﬁed platform for task scheduling on

heterogeneous multicore architectures,” Concurrency and Computation: Practice and Experience , Vol. 23, No. 2, 2011, pp. 187–

198.

22“Amibe,” http://jcae.sourceforge.net/amibe.html, Accessed: 2016-04-20.

23Schreiner, J., Scheidegger, C., Fleishman, S., and Silva, C., “Direct (Re)Meshing for Eﬃcient Surface Processing,”

Computer Graphics Forum (Proceedings of Eurographics 2006), Vol. 25, No. 3, 2006, pp. 527–536.
