Performance and scalability of finite-difference and finite-element wave-propagation modeling on Intel's Xeon Phi
Elena Zhebel, Shell Global Solutions International B.V.; Sara Minisini, Shell Global Solutions International B.V.; Alexey Kononov, Source Contracting; Wim Mulder, Shell Global Solutions International B.V. and Delft University of Technology
SUMMARY
With the rapid developments in parallel compute architectures, algorithms for seismic modeling and imaging need to be reconsidered in terms of parallelization. The aim of this paper is to compare the scalability of three seismic modeling algorithms: finite differences, continuous mass-lumped finite elements and discontinuous Galerkin finite elements. The performance of these methods is considered for a given accuracy. The experiments were performed on an Intel Sandy Bridge dual 8-core machine and on Intel's 61-core Xeon Phi, which is based on the Many Integrated Core architecture. The codes ran without any modifications. On the Sandy Bridge, the scalability is similar for all methods. On the Xeon Phi, the finite elements outperform the finite differences in terms of scalability at larger numbers of cores.
INTRODUCTION
High-performance computing develops rapidly and new technologies keep appearing on the market. A few years ago, a new generation of graphics processing units (GPUs) appeared, offering tera-FLOPS performance on a single card. The GPU was originally designed to accelerate the manipulation of images in a frame buffer that was mapped to an output display. Later on, dedicated GPUs found use as general-purpose coprocessors suited for parallelizable applications. In 2012, Intel released its Xeon Phi coprocessor based on the Many Integrated Core (MIC) architecture.
With the changes in compute architectures, the algorithms for seismic modeling and imaging need to be reconsidered in terms of parallelization and scalability. Scalability refers to the ability of an algorithm to sustain increasing performance on an increasing number of cores.
The finite-difference method has become the workhorse for time-domain modelling of the wave equation, with applications in acquisition optimization, development and testing of seismic processing algorithms, reverse-time migration and full waveform inversion. Its advantages are the relative ease of coding and parallelization, the use of high-order spatial discretization schemes and explicit time stepping, and its computational speed. The common opinion and experience is that finite differences perform well on rectangular domains with smooth velocity variations. However, in the case of an irregular free surface or sharp contrasts in the medium properties, they lose their accuracy when using a Cartesian coordinate system. If the interface does not follow the grid, the staircasing effect generates first-order errors. Because the solution is continuous but not differentiable across an impedance contrast, a local second-order error will be incurred as well.
Finite-element methods have some advantages over finite differences because they can easily handle geometric or property discontinuities by using unstructured meshes and local spatial refinement. For instance, meshes can be used that consist of tetrahedral elements whose size scales with the local velocity. In this way, the resulting meshes have fewer elements while maintaining the same number of points per wavelength in the computational domain. Finite elements that follow sharp interfaces do not suffer from a loss of accuracy. Moreover, they offer flexibility in mixing discretization orders, mixing element geometries and deploying hybrid discretizations. The choice of a suitable time discretization scheme enables explicit time stepping. If the standard finite-element approach is used, a large sparse linear system of equations has to be solved at each time step, which has a negative impact on performance. There are a number of finite-element techniques that avoid the inversion of the large sparse matrix, for example, spectral elements, discontinuous Galerkin finite elements and continuous mass-lumped finite elements. The last two will be considered here.
Several authors compared the various methods (e.g., Fornberg, 1987; Moczo et al., 2011; Chaljub et al., 2007; Wang et al., 2010; Pasquetti and Rapetti, 2004; De Basabe and Sen, 2007). Zhebel et al. (2012a) considered the continuous mass-lumped and the discontinuous Galerkin finite elements in terms of accuracy, stability and computational cost. Numerical experiments on 3-D problems showed that both methods have similar stability conditions and require a comparable computational time to obtain a result with a given accuracy, assuming that the stiffness and mass matrices are pre-assembled. Here, we will recompute them on the fly to save storage. Another paper (Zhebel et al., 2013) compares the continuous mass-lumped finite elements to the finite-difference method. Zhebel et al. (2012b) cover the stability analysis.
The goal of the present paper is to compare finite-difference and finite-element algorithms in terms of scalability on many-core architectures. The next section reviews the methods. Then, we provide details of the compute architectures, followed by the results of experiments with the three methods on the different architectures.
METHODS
We consider the 3-D constant-density acoustic wave equation
\[
\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}
- \frac{\partial^2 u}{\partial x^2}
- \frac{\partial^2 u}{\partial y^2}
- \frac{\partial^2 u}{\partial z^2} = s, \qquad (1)
\]
on a bounded domain Ω ⊂ R³, where u(t,x,y,z) is the pressure wave field, c(x,y,z) is the velocity of the medium and s(t,x,y,z) is the source function, with (x,y,z) ∈ Ω. If the domain has topography, the free surface can have reflecting (zero pressure) boundary conditions. Absorbing boundary conditions are imposed elsewhere.
By choosing a symmetric time-marching scheme with a time step Δt, for example, the leap-frog method, we obtain the following algebraic system of equations,
\[
u^{n+1} = 2u^{n} - u^{n-1} + \Delta t^{2}\left( L u^{n} + s^{n} \right), \qquad (2)
\]
where L denotes the spatial discretization. The only unknown is the vector u^{n+1}. The values of the solution at the previous time steps, n and (n−1), are known. For the spatial discretization, we consider a finite-difference method (FD), symmetric interior penalty discontinuous Galerkin finite elements (DG) and continuous mass-lumped finite elements (ML).
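For completeness, a standard step not spelled out above: equation 2 follows from replacing the second time derivative by a central difference,
\[
\left.\frac{\partial^2 u}{\partial t^2}\right|_{t_n} \approx \frac{u^{n+1} - 2u^{n} + u^{n-1}}{\Delta t^{2}},
\]
and solving for u^{n+1} in the spatially discretized equation, with the factor c² absorbed into L and the discrete source s^n.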
Finite differences
Problem 1 is discretized by central finite differences on a regular Cartesian grid in the rectangular domain represented by Ω = {(x_i, y_j, z_k) | x_i = x_0 + iΔx, y_j = y_0 + jΔy, z_k = z_0 + kΔz, i,j,k ∈ N}, where Δx, Δy and Δz are the grid spacings in the x-, y- and z-directions, respectively, and i = 0,1,...,N_x−1, j = 0,1,...,N_y−1, k = 0,1,...,N_z−1. With this, the acoustic wave equation in the second-order formulation at one point in space and time becomes
\[
u^{n+1}_{i,j,k} = 2u^{n}_{i,j,k} - u^{n-1}_{i,j,k}
+ \Delta t^{2}\, c^{2}_{i,j,k}\left( [D_{xx} u^{n}]_{i,j,k} + [D_{yy} u^{n}]_{i,j,k} + [D_{zz} u^{n}]_{i,j,k} + s^{n}_{i,j,k} \right), \qquad (3)
\]
where D_xx, D_yy and D_zz denote the discretized second derivatives in the x-, y- and z-directions, respectively. Time is sampled at t_n = T_min + nΔt, with Δt = (T_max − T_min)/N_T and n = 0,...,N_T. The pressure field at the next time step, u^{n+1}, depends on the current (n) and the previous (n−1) time steps.
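As an illustration of the update in equation 3, a minimal serial sketch is given below. It uses second-order stencils for D_xx, D_yy and D_zz for brevity, whereas the production code uses higher-order stencils and MPI; the array names and data layout are our assumptions, not the authors' implementation.

```c
/* Minimal sketch of the leap-frog/finite-difference update of equation 3.
 * Second-order stencils are used for brevity; the actual code uses
 * higher-order stencils and is parallelized with MPI. */
#include <stddef.h>

#define IDX(i, j, k, ny, nz) (((size_t)(i) * (ny) + (j)) * (nz) + (k))

void fd_step(const float *u_cur, const float *u_prev, float *u_next,
             const float *c2,      /* squared velocity per grid point */
             const float *src,     /* source term s^n per grid point  */
             int nx, int ny, int nz,
             float dt, float dx, float dy, float dz)
{
    const float wx = 1.0f / (dx * dx);
    const float wy = 1.0f / (dy * dy);
    const float wz = 1.0f / (dz * dz);

    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j)
            for (int k = 1; k < nz - 1; ++k) {
                size_t m = IDX(i, j, k, ny, nz);
                /* second-order central differences in x, y and z */
                float dxx = (u_cur[IDX(i+1,j,k,ny,nz)] - 2.0f*u_cur[m]
                           + u_cur[IDX(i-1,j,k,ny,nz)]) * wx;
                float dyy = (u_cur[IDX(i,j+1,k,ny,nz)] - 2.0f*u_cur[m]
                           + u_cur[IDX(i,j-1,k,ny,nz)]) * wy;
                float dzz = (u_cur[IDX(i,j,k+1,ny,nz)] - 2.0f*u_cur[m]
                           + u_cur[IDX(i,j,k-1,ny,nz)]) * wz;
                /* equation 3: explicit leap-frog update in time */
                u_next[m] = 2.0f*u_cur[m] - u_prev[m]
                          + dt*dt * c2[m] * (dxx + dyy + dzz + src[m]);
            }
}
```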
The legacy finite-difference code makes use of distributed-memory parallelization with MPI, also when used on a single many-core node. Grid partitioning is applied to divide the domain into a number of subdomains. From expression 3, it is clear that the computation of a single grid point (i,j,k) requires information from its neighbors in the three coordinate directions. The subdomains therefore require additional points for the calculations, called halos. At each time step, the points in the halos need to be exchanged to synchronize their values. The parallelization for a distributed-memory design therefore requires extra memory and has a communication overhead.
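A minimal sketch of such a halo exchange is shown below, assuming a 1-D decomposition along x and a one-point-wide halo. The actual code partitions the grid in all three directions and may use wider halos for the higher-order stencil, so this only illustrates the communication pattern, not the authors' implementation.

```c
/* Sketch of a halo exchange for a 1-D domain decomposition along x.
 * The buffer layout and the one-point-wide halo are illustrative assumptions. */
#include <mpi.h>

/* u holds (nx_local + 2) * ny * nz floats: one halo plane on each x side. */
void exchange_halos(float *u, int nx_local, int ny, int nz, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    int plane = ny * nz;          /* number of points in one x-plane */

    /* send the first interior plane to the left neighbor,
       receive the right neighbor's plane into the right halo */
    MPI_Sendrecv(u + 1 * plane,              plane, MPI_FLOAT, left,  0,
                 u + (nx_local + 1) * plane, plane, MPI_FLOAT, right, 0,
                 comm, MPI_STATUS_IGNORE);

    /* send the last interior plane to the right neighbor,
       receive the left neighbor's plane into the left halo */
    MPI_Sendrecv(u + nx_local * plane,       plane, MPI_FLOAT, right, 1,
                 u + 0 * plane,              plane, MPI_FLOAT, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```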
Discontinuous Galerkin
We briefly review the formulation of the problem when using the symmetric interior penalty discontinuous Galerkin formulation. A detailed derivation can be found in Rivière (2008). We choose the symmetric and conservative form of the scheme. In this method, the solution is allowed to be discontinuous across element boundaries. Because in seismics we want continuous solutions, a penalty is added to make the discontinuities small, of the order of the numerical discretization error. To discretize the wave equation 1 with finite elements, we use the weak formulation
\[
\int_{\Omega} \frac{1}{c^{2}} \frac{\partial^2 u}{\partial t^2}\, v \,\mathrm{d}\Omega
+ \int_{\Omega} \nabla u \cdot \nabla v \,\mathrm{d}\Omega
- \int_{\Gamma} (\mathbf{n} \cdot \nabla u)\, v \,\mathrm{d}\Gamma
= \int_{\Omega} s\, v \,\mathrm{d}\Omega, \qquad (4)
\]
for all test functions v that are chosen as polynomials up to degree M. Here, n denotes the outward normal and Γ consists of the internal and external boundaries of the domain Ω.
The domain Ω can be partitioned into tetrahedral elements Ω_τ. To have a certain number of points per wavelength, we require the diameter of the inscribed sphere to scale with the dominant wavelength, hence with the local velocity. The meshing approach has been described elsewhere (Kononov et al., 2012).
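Written out in our notation (the proportionality itself is stated above, the symbols are ours), this sizing rule ties the inscribed-sphere diameter d_τ of element Ω_τ to the local dominant wavelength,
\[
d_{\tau} \;\propto\; \lambda_{\mathrm{dom}}(\mathbf{x}) \;=\; \frac{c(\mathbf{x})}{f_{\mathrm{dom}}},
\]
so that low-velocity regions receive proportionally smaller tetrahedra while the number of points per wavelength stays roughly constant.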
Since the solution is discontinuous across the internal boundaries, additional computations of so-called fluxes are needed to make the solution continuous. The term with the normal in equation 4 is a flux term that is given on the faces of the elements. It consists of incoming and outgoing fluxes.
By discretizing the weak formulation 4 with leap-frog, we obtain for each element Ω_τ,
\[
u^{n+1}_{\tau} = 2u^{n}_{\tau} - u^{n-1}_{\tau}
+ \Delta t^{2}\, M^{-1}\left( K u^{n}_{\tau} - F^{+} u^{n}_{\tau} - F^{-} u^{n}_{\tau'} + s^{n}_{\tau} \right),
\]
where M and K are the local mass and stiffness matrix, respectively. Depending on the element degree M, the matrices have different sizes, see Table 1. We also have the contribution of the fluxes. The term F^+ denotes the sum of the outgoing fluxes over the four faces of the given element and can be stored in one matrix. The second term, F^−, contains the incoming fluxes from the four neighboring elements Ω_τ′ and requires the storage of four matrices. Note that for elements that lie on the boundary, fewer flux matrices need to be stored, since one or more neighbors are absent.
The implementation of DG uses shared-memory parallelization with OpenMP on many-core architectures. The parallelization is performed over the elements of the mesh. That means that, for each time step, the computations for the different elements are performed in parallel. The information from the neighboring elements is read from shared memory. To save memory, the matrices M, K, F^+ and the four F^− of each element are reassembled at each time step.
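As an illustration of this strategy, the sketch below shows how one DG time step might be organized, with the local matrices rebuilt on the fly inside a parallel loop over elements. The struct layout, the routine assemble_local, the fixed local size N and the sign convention (taken from the reconstructed element update above) are our assumptions, not the authors' implementation.

```c
/* Sketch of one DG time step with shared-memory parallelism over elements.
 * Local matrices are rebuilt on the fly to save memory, as described above.
 * N is the number of local degrees of freedom per element (see Table 1). */
#include <omp.h>

enum { N = 20 };   /* e.g. degree M = 3 for DG */

typedef struct {
    double u_cur[N], u_prev[N], u_next[N], src[N];
    int    neighbor[4];   /* indices of the 4 face neighbors, -1 if absent */
} Element;

/* Hypothetical assembly routine, assumed implemented elsewhere: rebuilds the
 * local operators from reference-element matrices and the element geometry. */
void assemble_local(const Element *e, double Minv[N][N], double K[N][N],
                    double Fplus[N][N], double Fminus[4][N][N]);

void dg_time_step(Element *elems, int num_elems, double dt)
{
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < num_elems; ++t) {
        double Minv[N][N], K[N][N], Fplus[N][N], Fminus[4][N][N];
        assemble_local(&elems[t], Minv, K, Fplus, Fminus);

        double rhs[N] = {0.0};
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j)
                rhs[i] += (K[i][j] - Fplus[i][j]) * elems[t].u_cur[j];
            /* incoming fluxes use the neighbors' current solution */
            for (int f = 0; f < 4; ++f) {
                int nb = elems[t].neighbor[f];
                if (nb < 0) continue;          /* boundary face: no neighbor */
                for (int j = 0; j < N; ++j)
                    rhs[i] -= Fminus[f][i][j] * elems[nb].u_cur[j];
            }
            rhs[i] += elems[t].src[i];
        }
        /* leap-frog update: u^{n+1} = 2u^n - u^{n-1} + dt^2 M^{-1} rhs */
        for (int i = 0; i < N; ++i) {
            double acc = 0.0;
            for (int j = 0; j < N; ++j)
                acc += Minv[i][j] * rhs[j];
            elems[t].u_next[i] = 2.0*elems[t].u_cur[i] - elems[t].u_prev[i]
                               + dt*dt*acc;
        }
    }
}
```

Threads only write to their own element's u_next while reading neighbors' u_cur, so no synchronization is needed within a time step.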
Continuous mass-lumped finite elements
The weak formulation of equation 1 for the continuous mass-lumped finite-element method is given by equation 4, but now Γ only refers to the external boundaries, because the test functions and the solution are assumed to be continuous. Discretizing equation 4 with mass-lumped finite elements in space and with second-order finite differences in time, we obtain a global, fully algebraic system of the form
\[
u^{n+1} = 2u^{n} - u^{n-1} + \Delta t^{2}\, M^{-1}\left( K u^{n} + s^{n} \right). \qquad (5)
\]
The matrix M is diagonal, avoiding the need to solve a large sparse linear system of equations.
Table 1: The size of the matrices per element used in DG and ML is N × N and depends on the degree M.

  degree M    N (DG)    N (ML)
     1           4         4
     2          10        23
     3          20        50
     4          35         –
Chin-Joe-Kong et al. (1999) describe the construction of the mass-lumped tetrahedral elements in detail.
The global discretized problem 5 can be reformulated element by element, improving the parallelization of the time stepping. In this case, depending on the element degree M, the matrices have different sizes, see Table 1. The mass matrix was pre-assembled before the time stepping started, but the stiffness matrices were reassembled on the fly during the time stepping. The contribution per element was determined from the 6 precomputed matrices of size N × N for the reference element and 6 scalar factors related to the local coordinate transformation of each element. In this way, only two vectors for the solution and one for the inverse mass matrix need to be stored, each of the size of the total number of degrees of freedom. The velocity c(x,y,z) can be absorbed in the inverse mass matrix. The shared-memory parallelization of ML uses OpenMP. During each time step, the elements Ω_τ are treated in parallel.
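A corresponding sketch for the ML update is given below: the stiffness action per element is rebuilt from 6 reference-element matrices and 6 geometry factors, accumulated into a global right-hand side, and the diagonal inverse mass matrix is then applied as in equation 5. The names and data layout (Kref, geom, minv) are illustrative assumptions, not the authors' code.

```c
/* Sketch of one ML time step: parallel loop over elements, with the element
 * stiffness action rebuilt on the fly from 6 reference-element matrices and
 * 6 geometry factors per element. */
#include <omp.h>
#include <stdlib.h>

enum { NML = 50 };   /* e.g. degree M = 3 mass-lumped element, see Table 1 */

typedef struct {
    int    dof[NML];   /* global indices of this element's degrees of freedom */
    double geom[6];    /* scalar factors from the local coordinate transform  */
} MLElement;

/* 6 precomputed reference-element matrices, assumed defined elsewhere. */
extern const double Kref[6][NML][NML];

void ml_time_step(const MLElement *elems, int num_elems,
                  const double *u_cur, const double *u_prev, double *u_next,
                  const double *minv,  /* diagonal of M^{-1}, velocity absorbed */
                  const double *src, int num_dof, double dt)
{
    /* accumulate K u^n into a global right-hand-side vector */
    double *rhs = calloc((size_t)num_dof, sizeof *rhs);

    #pragma omp parallel for schedule(static)
    for (int t = 0; t < num_elems; ++t) {
        double local[NML] = {0.0};
        for (int r = 0; r < 6; ++r)          /* on-the-fly stiffness action */
            for (int i = 0; i < NML; ++i)
                for (int j = 0; j < NML; ++j)
                    local[i] += elems[t].geom[r] * Kref[r][i][j]
                              * u_cur[elems[t].dof[j]];
        for (int i = 0; i < NML; ++i) {
            #pragma omp atomic               /* dofs are shared between elements */
            rhs[elems[t].dof[i]] += local[i];
        }
    }

    /* leap-frog update with the diagonal inverse mass matrix (equation 5) */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < num_dof; ++i)
        u_next[i] = 2.0*u_cur[i] - u_prev[i]
                  + dt*dt * minv[i] * (rhs[i] + src[i]);

    free(rhs);
}
```

Only three global vectors (two solution vectors and the inverse mass diagonal) persist between time steps, matching the storage argument above.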
COMPUTE ARCHITECTURES
The codes were run on two architectures: Intel's standard x86 processor, Sandy Bridge, and a Xeon Phi based on Intel's Many Integrated Core architecture.
The Sandy Bridge-based compute node has two eight-core Intel E5-2670 2.6 GHz processors.
The Xeon Phi is a coprocessor based on Intel's Many Integrated Core architecture. The difference between a processor and a coprocessor is that a processor does not require another compute device to be present in a working system. A coprocessor cannot be used as an independent device; it has to be connected to a processor (host). In our case, an Intel Xeon Phi card is connected to a host through a PCIe ×16 bus.
There are two ways to use the Intel Xeon Phi card. The first is to run the whole application on the Intel Xeon Phi, called its native mode. The second is to identify the parallel parts of the algorithm and run the program on the host, letting only those parallel parts run on the MIC, also called offload mode. In our experiments, we only used the Xeon Phi in the first way, by running the whole application on a single card. For this, the codes were recompiled with the latest Intel Compiler (ICS2013). The Intel Xeon Phi coprocessor runs Linux. Each card has its own IP address. Each Xeon Phi coprocessor combines 61 Intel cores on a single chip, running at a frequency of 1.1 GHz. One core is responsible for the OS. The maximum number of threads per core is 4, allowing a total of 244 threads. The Xeon Phi has 8 GB of GDDR5 memory and a bandwidth of 320 GB/s.

Figure 1: (a) Velocity model for the model problem. The source is denoted by a red star and the receivers by yellow triangles. (b) Snapshot of the wave field.
The FD code, containing MPI calls, and the DG and ML codes, parallelized with OpenMP pragmas, did not require any modification.
RESULTS
We consider seismic modelling with equation 1, which represents the essential part of migration and inversion algorithms. The problem size was chosen such that it would fit in the 8 GB of Xeon Phi memory and was kept the same for the experiments on Sandy Bridge. This means we only consider strong scaling, as opposed to weak scaling, where the problem size per core is kept constant.
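For reference, the speed-up reported in the figures below is understood in the usual strong-scaling sense (our notation, not spelled out in the paper),
\[
S(p) = \frac{T(1)}{T(p)},
\]
the ratio of the single-core runtime to the runtime on p cores for a fixed total problem size, with ideal scaling corresponding to S(p) = p.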
The goal of the experiments is to compare the finite-element and finite-difference algorithms in terms of their scalability on many-core architectures.
The model problem is the dipping interface shown in Figure 1(a). We consider a 3-D domain of size (2 km)³ with two halfspaces having velocities of 1.5 and 3.0 km/s. The interface runs from 0.7 to 1.3 km depth between 0 and 2 km in the x-direction, so the dip angle is 16.7°. A shot is located at (779.7, 1000, 516.3) m, 350 m above the interface in the shallow low-velocity part of the model. The receivers are located 250 m above the interface and have offsets from 100 to 700 m at a 25-m interval, parallel to the interface in the down-dip direction from the source.
Figure 2 shows the errors as a function of the total number of degrees of freedom N, using either the finite-difference scheme with a 4th-order approximation of the second derivatives in each coordinate direction, the M = 3 type 2 continuous mass-lumped finite-element method, or the discontinuous Galerkin finite elements of degree 3. The finite-element methods therefore also have 4th-order spatial accuracy and a 2nd-order temporal error. Note that the 4th-order accuracy of the finite-difference scheme is lost near discontinuities in the model. The finite-difference code was run on grids with 201³, 251³, 301³, 351³, 401³ and 501³ points. The continuous mass-lumped finite-element code and the discontinuous Galerkin finite-element code ran on meshes with 294,508, 567,071 and 2,320,289 elements, respectively.
Figure 2: Errors for the model problem as a function of 1/h = N^{1/3}, where N is the number of grid points for the finite-difference method (FD) or the number of degrees of freedom for the finite-element methods (ML and DG). The dashed lines represent the maximum norm, the solid lines the 2-norm. The green line corresponds to a spatial fourth-order error. Second-order time stepping was used.
Figure 3: Speed-up on Intel's Sandy Bridge. The red line with crosses denotes FD, the black line with circles denotes ML and the blue line with squares denotes DG. The solid line represents ideal speed-up.
For the finite differences and for the finite elements, the direct wave was modelled on the same mesh but with a constant velocity of 1.5 km/s and then subtracted, to retain only the reflected/refracted event. It is obvious that, asymptotically, the ML and DG finite elements are more accurate than the finite differences, because the mesh follows the interface.
For the experiments on the Intel Xeon Phi, we selected the FD problem of size 301³ and the mesh with 567,071 tetrahedra for the finite elements. For these problems, FD, ML and DG have a similar accuracy of about 3%. In total, ML had 28,353,550 and DG had 11,341,420 degrees of freedom, respectively. Figures 3 and 4 show the speed-up for the finite differences and the two finite-element methods on Intel's Sandy Bridge and Xeon Phi, respectively.
Figure 4: Speed-up on Intel's Xeon Phi. The red crosses denote FD, the black circles denote ML and the blue line with squares denotes DG. The solid line represents ideal speed-up.
The results on Sandy Bridge show that the speed-up measured on the cores of one socket (up to 8 cores) is close to optimal for all three methods. To make sure that each process keeps running on the same core and is not dynamically relocated, with the associated cache losses, we enforced affinity. When using more than 8 cores, affinity is switched off. Since the DG code is more computationally intensive than ML and FD, due to the calculation of the fluxes, its speed-up is larger. The results on the Xeon Phi show that the speed-up is almost linear with fewer than 50 threads. For a larger number of threads, the efficiency is reduced. Similarly to the experiments with FD, we note a performance peak at 60, 120 and 180 threads, where the cores run at maximum capacity. This is especially obvious with DG. FD uses one thread for I/O and program control, which is mostly idle during the computations. This thread is excluded from the core count in the graphs. The code attempts to find a subdivision that favors cube-like subdomains, but the choice of the number of cores limits the options. The specific choice of subdivision impacts the performance. With the 301³ grid points, a subdivision on, for instance, 173 + 1 cores into 173 subdomains in one coordinate direction does not make sense, so such cases are absent from the graph.
CONCLUSIONS
We have compared the scalability of three seismic modeling algorithms: finite differences, continuous mass-lumped finite elements and discontinuous Galerkin finite elements. Their performance was considered for a given accuracy. Asymptotically, the ML and DG finite elements are more accurate than the finite differences when the finite-element mesh follows the interfaces between sharp contrasts in the medium properties. The experiments were run on an Intel Sandy Bridge dual 8-core machine and an Intel Xeon Phi based on the Many Integrated Core architecture. On the Xeon Phi, we observed that the finite elements have a similar speed-up to the finite differences if a proper subdivision is chosen; only beyond 120 cores does FD degrade. On Sandy Bridge, all methods show similar scalability.
REFERENCES
Chaljub, E., D. Komatitsch, J. Vilotte, Y. Capdeville, B. Valette, and G. Festa, 2007, Spectral element analysis in seismology: in R. S. Wu and V. Maupin, eds., Advances in geophysics: Advances in wave propagation in heterogeneous media: Elsevier, 365–419.
Chin-Joe-Kong, M. J. S., W. A. Mulder, and M. van Veldhuizen, 1999, Higher-order triangular and tetrahedral finite elements with mass lumping for solving the wave equation: Journal of Engineering Mathematics, 35, 405–426, http://dx.doi.org/10.1023/A:1004420829610.
De Basabe, J. D., and M. K. Sen, 2007, Grid dispersion and stability criteria of some common finite-element methods for acoustic and elastic wave equations: Geophysics, 72, no. 6, T81–T95, http://dx.doi.org/10.1190/1.2785046.
Fornberg, B., 1987, The pseudospectral method: Comparisons with finite-differences for the elastic wave equation: Geophysics, 52, 483–501, http://dx.doi.org/10.1190/1.1442319.
Kononov, A., S. Minisini, E. Zhebel, and W. A. Mulder, 2012, A 3D tetrahedral mesh generator for seismic problems: 74th Conference & Exhibition, EAGE, Extended Abstracts, 58922.
Moczo, P., J. Kristek, M. Galis, E. Chaljub, and V. Etienne, 2011, 3-D finite-difference, finite-element, discontinuous-Galerkin and spectral-element schemes analysed for their accuracy with respect to P-wave to S-wave speed ratio: Geophysical Journal International, 187, 1645–1667, http://dx.doi.org/10.1111/j.1365-246X.2011.05221.x.
Pasquetti, R., and F. Rapetti, 2004, Spectral element methods on triangles and quadrilaterals: comparisons and applications: Journal of Computational Physics, 198, 349–362, http://dx.doi.org/10.1016/j.jcp.2004.01.010.
Rivière, B., 2008, Discontinuous Galerkin methods for solving elliptic and parabolic equations: Theory and implementation: SIAM Frontiers in Mathematics 35.
Wang, X., W. W. Symes, and T. Warburton, 2010, Comparison of discontinuous Galerkin and finite difference methods for time domain acoustics: 80th Annual International Meeting, SEG, Expanded Abstracts, 3060–3065, http://dx.doi.org/10.1190/1.3513482.
Zhebel, E., S. Minisini, A. Kononov, and W. A. Mulder, 2012a, A comparison of continuous mass-lumped finite elements and finite differences for 3-D acoustics in the time domain: 74th Conference & Exhibition, EAGE, Extended Abstracts, 59474.
Zhebel, E., S. Minisini, A. Kononov, and W. A. Mulder, 2012b, On the time-stepping stability of continuous mass-lumped and discontinuous Galerkin finite elements for the 3D acoustic wave equation: Proceedings of the 6th European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2012), CD-ROM, 1041–1051.
Zhebel, E., S. Minisini, A. Kononov, and W. A. Mulder, 2013, Personal communication.