All content in this area was uploaded by Andrea D. Beck on Jan 12, 2021
Highlights
An Efficient Sliding Mesh Interface Method for High-Order Discontinuous Galerkin Schemes
Jakob Dürrwächter, Marius Kurz, Patrick Kopper, Daniel Kempf, Claus-Dieter Munz, Andrea Beck
• An efficient parallelization strategy for a high-order accurate sliding mesh method
• Investigation of the method's scaling behavior on high performance computing systems
• A wall-resolved large eddy simulation of a 1-1/2 stage turbine
An Efficient Sliding Mesh Interface Method for High-Order Discontinuous
Galerkin Schemes
Jakob Dürrwächter^{a,1,∗}, Marius Kurz^{a,1}, Patrick Kopper^{b}, Daniel Kempf^{a}, Claus-Dieter Munz^{a}, Andrea Beck^{c}
^a Institute of Aerodynamics and Gas Dynamics, University of Stuttgart, Pfaffenwaldring 21, 70569 Stuttgart, Germany
^b Institute of Aircraft Propulsion Systems, University of Stuttgart, Pfaffenwaldring 6, 70569 Stuttgart, Germany
^c Laboratory of Fluid Dynamics and Technical Flows, University of Magdeburg "Otto von Guericke", Universitätsplatz 2, 39106 Magdeburg,
Germany
Abstract
Sliding meshes are a powerful method to treat deformed domains in computational fluid dynamics, where different
parts of the domain are in relative motion. In this paper, we present an efficient implementation of a sliding mesh
method into a discontinuous Galerkin compressible Navier-Stokes solver and its application to a large eddy simulation
of a 1-1/2 stage turbine. The method is based on the mortar method and is high-order accurate. It can handle
three-dimensional sliding mesh interfaces with various interface shapes. For plane interfaces, which are the most
common case, conservativity and free-stream preservation are ensured. We put an emphasis on efficient parallel
implementation. Our implementation generates little computational and storage overhead. Inter-node communication
via MPI in a dynamically changing mesh topology is reduced to a bare minimum by ensuring a priori information
about communication partners and data sorting. We provide performance and scaling results showing the capability
of the implementation strategy. Apart from analytical validation computations and convergence results, we present a
wall-resolved implicit LES of the 1-1/2 stage Aachen turbine test case as a large scale practical application example.
Keywords: Sliding mesh, Discontinuous Galerkin, High-order methods, High-performance computing, Large eddy
simulation, Turbine flow
1. Introduction
High-order methods have significantly gained popularity over the last decade, due to their potential advantages in
terms of accuracy and efficiency for many applications [1]. A variety of high-order, element-based schemes has been
proposed in the literature, such as reconstruction-based approaches like the WENO schemes [2, 3], h/p finite element
schemes [4, 5], the flux reconstruction method [6] and the spectral difference method [7, 8], with the latter two also
being closely related to the discontinuous Galerkin methods [9, 10]. In particular the discontinuous Galerkin spectral
element method (DGSEM) [11, 12] has shown its suitability for simulations of unsteady turbulent flows in large-scale
applications and high-performance computing (HPC) as a consequence of its high order of accuracy as well as its
excellent scaling properties [1, 13, 14, 15, 16]. While the development of novel formulations and efficient schemes is
still ongoing, there is sufficient evidence from published results that DG and related methods have reached a certain
level of maturity. Thus, adding and exploring new features to the existing formulations - while retaining the original
properties - has now become another focus of ongoing development.
∗ Corresponding author
Email addresses: jd@iag.uni-stuttgart.de (Jakob Dürrwächter), m.kurz@iag.uni-stuttgart.de (Marius Kurz),
kopper@ila.uni-stuttgart.de (Patrick Kopper), kempf@iag.uni-stuttgart.de (Daniel Kempf), munz@iag.uni-stuttgart.de
(Claus-Dieter Munz), beck@iag.uni-stuttgart.de (Andrea Beck)
1 J. Dürrwächter and M. Kurz share first authorship.
Preprint submitted to Computers & Fluids, December 11, 2020
Many fields of engineering interest, such as turbomachinery, wind turbines and rotorcraft, are characterized by moving
geometries and large periodic displacements. A common technique to incorporate this movement into numerical
schemes is the arbitrary Lagrangian-Eulerian (ALE) approach [17], which introduces a time-dependent mapping from
an arbitrarily deformed domain to the undeformed reference. Mesh movement can also be used for fully Lagrangian
formulations [18] and has in this context also been considered on curvilinear meshes [19, 20]. Since the mesh topology
and connectivity remain unchanged for the ALE approach, it is often limited to small or moderate relative
displacements to retain valid grid cells. In order to accommodate larger displacements, the mesh topology has to be dynamic.
To this end, several approaches based on the ALE method have been developed: A quite recent high-order accurate
family of methods allows for rather general topology changes and large displacements. It allows for re-meshing
through a continuous mesh movement between timesteps, which leads to straight polyhedral space-time meshes
(two-dimensional in space), which are unstructured both in space and time [21, 22]. Hanging nodes and sliding lines [23]
have also been incorporated into this approach [24]. In the overset mesh (Chimera) method [25, 26, 27, 28, 29],
several independent meshes inside the computational domain are overlaid, which can be moved independently and are
coupled by means of the overlapping elements and associated inter-mesh interpolation operators. The sliding mesh
approach, on the other hand, is less general, which allows it to be conceptually simpler, easier to implement and
potentially more computationally efficient. In the sliding mesh method [30, 31, 32, 33, 34], the computational domain is
divided into non-overlapping sub-domains, which can slide along a common interface while preserving the mesh
geometry inside each sub-domain. This approach can be seen as a special case of the Chimera method, where the overlap
area is restricted to a linear (in 2D) or planar (in 3D) shared region, and the movement is prescribed accordingly. For
both approaches, special care must be taken to construct high-order variants, which guarantee global conservation and
overall error scaling behavior, especially if the consistent geometry representation of curved interfaces has to be taken
into account [35, 36, 37, 38, 39, 40, 41]. Schematics of the discussed methods are shown in Figure 1.
In this work, we thus focus on the challenge of combining such an approach with a high-order discontinuous Galerkin
solver. This not only requires a conservative, high-order accurate method at the sub-domain interface, but also the
design of an efficient and dynamic parallelization strategy which retains the scaling of the static version. These efforts
then give us the ability to conduct high-order sliding mesh simulations at an industrial scale, and provide a novel tool
for investigating the highly non-linear and unsteady flow physics in these scenarios. The implementation considered
in the present work is open source and can be retrieved from GitHub (see footnote 2).
Figure 1: Schematics of common moving mesh techniques: the arbitrary Lagrangian-Eulerian approach (ALE) (left), the sliding mesh method
(middle) and the overset mesh or Chimera method (right)
The method was used to investigate an interesting and challenging industrial application, the Aachen 1-1/2 stage axial
flow turbine [42, 43, 44]. It serves to study both complex turbulent flow phenomena and stage interaction, the
prediction of which is crucial to a turbine's performance. Multi-stage turbomachines consist of several alternating rows
of static and rotating blades. The large deformations due to the unbounded relative displacement between static and
passing rotating blade rows pose serious challenges to numerical codes, which can, however, be handled well with
the sliding mesh method. The Aachen 1-1/2 stage turbine is a subsonic rig featuring a stator-rotor-stator configuration
with identical blade geometries for both stators, thereby offering the opportunity to study the influences of transient
behavior and wake interaction in the same setup.
The outline of this paper is as follows: The governing equations on the moving domain are given in Section 2 and
the DGSEM scheme as well as the treatment of non-conforming meshes are discussed in Section 3. Section 4 gives
details on the proposed implementation and parallelization strategy for the sliding mesh method. In Section 5, we
demonstrate the error convergence of the method, its scaling properties and the suitability for large-scale applications.
To this end, an implicit, wall-resolved large eddy simulation (LES) of the 1-1/2 stage turbine test case Aachen Turbine
is presented and discussed. Section 6 concludes the paper, and we give an outlook on further developments.
2 https://github.com/flexi-framework/flexi-extensions/tree/sliding-mesh
2. Governing Equations
In this work, we consider the three-dimensional compressible Navier-Stokes equations, which can be written in
conservative form as
\[ U_t + \nabla_x \cdot F^c(U) - \nabla_x \cdot F^v(U, \nabla_x U) = 0, \tag{1} \]
with the vector of conserved variables in usual notation $U = [\rho, \rho v_1, \rho v_2, \rho v_3, \rho e]^T$ and its time derivative $U_t$, the
convective fluxes $F^c$, the viscous fluxes $F^v$ and the differential operator $\nabla_x$ with respect to the physical coordinates
$x = [x_1, x_2, x_3]^T$. To account for the moving frame of reference in case of mesh movement, the fluxes can be written
in an arbitrary Lagrangian-Eulerian (ALE) formulation [45], which yields the fluxes with columns $i = 1, 2, 3$ as
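\[
F^c_i = \begin{bmatrix} \rho v_i \\ \rho v_1 v_i + \delta_{1i}\, p \\ \rho v_2 v_i + \delta_{2i}\, p \\ \rho v_3 v_i + \delta_{3i}\, p \\ \rho e v_i + p v_i \end{bmatrix}
- v^g_i \begin{bmatrix} \rho \\ \rho v_1 \\ \rho v_2 \\ \rho v_3 \\ \rho e \end{bmatrix},
\qquad
F^v_i = \begin{bmatrix} 0 \\ \tau_{1i} \\ \tau_{2i} \\ \tau_{3i} \\ \tau_{ij} v_j - q_i \end{bmatrix}.
\tag{2}
\]
Here, $v^g = [v^g_1, v^g_2, v^g_3]^T$ denotes the velocity of the grid in Cartesian coordinates, which induces an additional flux
contribution, and $\delta$ denotes the Kronecker delta. The stress tensor $\tau_{ij}$ and the heat flux $q_i$ can be written as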
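\[ \tau_{ij} = \mu \left( \frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} \right) + \lambda\, \delta_{ij} \frac{\partial v_k}{\partial x_k}, \tag{3} \]
\[ q_i = -k \frac{\partial T}{\partial x_i}, \tag{4} \]
where $k$ denotes the heat conductivity and $T$ the static temperature. Moreover, Stokes' hypothesis states $\lambda = -\frac{2}{3}\mu$ with
$\mu$ as dynamic viscosity. The equation system is closed with the perfect gas assumption which yields the equation of
state
\[ p = \rho R T = \rho (\gamma - 1) \left[ e - \frac{1}{2} \left( v_1^2 + v_2^2 + v_3^2 \right) \right] \tag{5} \]
with the ratio of specific heats $\gamma$ and the specific gas constant $R$.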
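As an illustration of Eqs. (2) and (5), the following minimal Python sketch evaluates the pressure and one column of the ALE convective flux for a single state vector. The function names `pressure` and `ale_convective_flux`, the default $\gamma = 1.4$ and the 0-based direction index are assumptions made for this example and are not taken from the FLEXI code.

```python
import numpy as np

def pressure(U, gamma=1.4):
    """Eq. (5): p = rho (gamma - 1) [e - 0.5 |v|^2].

    U = [rho, rho v1, rho v2, rho v3, rho e]; gamma = 1.4 is an assumed default.
    """
    rho = U[0]
    v = U[1:4] / rho
    e = U[4] / rho
    return rho * (gamma - 1.0) * (e - 0.5 * np.dot(v, v))

def ale_convective_flux(U, v_grid, i, gamma=1.4):
    """Column i (0-based, hypothetical convention) of F^c in Eq. (2)."""
    rho = U[0]
    v = U[1:4] / rho
    p = pressure(U, gamma)
    flux = U * v[i]          # rho v_i, rho v_k v_i, rho e v_i
    flux[1 + i] += p         # Kronecker delta term delta_{ki} p
    flux[4] += p * v[i]      # pressure work p v_i
    return flux - v_grid[i] * U   # grid velocity contribution

# Example state: rho = 1, v = (2, 0, 0), p = 1
U = np.array([1.0, 2.0, 0.0, 0.0, 4.5])
F_static = ale_convective_flux(U, np.zeros(3), 0)
F_comoving = ale_convective_flux(U, np.array([2.0, 0.0, 0.0]), 0)
```

If the grid moves with the fluid velocity, the convective transport cancels and only the pressure terms remain, which is a quick sanity check for the sign of the grid velocity term.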
3. Numerical Methods
3.1. Discontinuous Galerkin Spectral Element Method
For the discontinuous Galerkin spectral element method (DGSEM), the computational domain is subdivided into
non-overlapping elements. Each element is mapped into the reference element $E = [-1, 1]^3$ using a polynomial
mapping $x = x(\xi)$ with degree $N_{geo}$ as described in more detail in [46]. By defining the Jacobian of the mapping
$J = \det(\partial x_i / \partial \xi_j)$ and introducing the contravariant fluxes $\mathcal{F}$ (again, see [46] for details), Eq. (1) can be written in the
reference space as
\[ J(\xi)\, U_t + \nabla_\xi \cdot \mathcal{F}(U, \nabla_\xi U) = 0. \tag{6} \]
The projection of Eq. (6) onto the polynomial test space, spanned by test functions $\phi(\xi)$ in the reference element $E$,
and subsequent integration by parts yields the weak formulation as
\[ \int_E J(\xi)\, U_t\, \phi(\xi)\, dE + \int_{\partial E} (\mathcal{F} \cdot N)\, \phi(\xi)\, dS - \int_E \mathcal{F}(U) \cdot \nabla_\xi \phi(\xi)\, dE = 0, \tag{7} \]
where $N$ is the outward pointing face normal vector. For DGSEM, the solution $U$ and the fluxes $\mathcal{F}$ are approximated
by polynomial basis functions, which are the tensor product of one-dimensional nodal Lagrange polynomials $\ell(\xi^k)$
that satisfy the cardinal property $\ell_i(\xi^k_j) = \delta_{ij}$ on a given set of interpolation points $\{\xi^k_j\}$ with $j = 0, \dots, N$ and $k$ indicating the
spatial dimension. The solution and the fluxes in three dimensions can therefore be written as:
\[ U \approx \sum_{i,j,k=0}^{N} \hat{U}_{ijk}\, \ell_i(\xi^1)\, \ell_j(\xi^2)\, \ell_k(\xi^3), \tag{8} \]
\[ \mathcal{F} \approx \sum_{i,j,k=0}^{N} \hat{\mathcal{F}}_{ijk}\, \ell_i(\xi^1)\, \ell_j(\xi^2)\, \ell_k(\xi^3). \tag{9} \]
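The tensor-product representation in Eq. (8) can be sketched in a few lines of Python. The helper names `lgl_nodes`, `lagrange_basis` and `evaluate_3d` are hypothetical, and the Legendre-Gauss-Lobatto nodes are computed here as the endpoints plus the roots of $P_N'$, which is one standard way to obtain them and not necessarily how FLEXI does it.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def lgl_nodes(N):
    """Legendre-Gauss-Lobatto nodes: -1, +1 and the roots of P_N'."""
    interior = Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(np.real(interior)), [1.0]))

def lagrange_basis(nodes, x):
    """Return L with L[i, m] = l_i(x[m]) for the Lagrange basis on `nodes`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    L = np.ones((len(nodes), len(x)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[i] *= (x - xj) / (xi - xj)
    return L

def evaluate_3d(U_hat, nodes, xi1, xi2, xi3):
    """Evaluate the tensor-product expansion of Eq. (8) at one point."""
    l1 = lagrange_basis(nodes, xi1)[:, 0]
    l2 = lagrange_basis(nodes, xi2)[:, 0]
    l3 = lagrange_basis(nodes, xi3)[:, 0]
    return np.einsum("ijk,i,j,k->", U_hat, l1, l2, l3)

N = 3
nodes = lgl_nodes(N)
cardinal = lagrange_basis(nodes, nodes)      # should be the identity matrix

# Nodal values of a test polynomial with degree <= N in each direction
f = lambda x, y, z: x**3 + x * y**2 * z
X, Y, Z = np.meshgrid(nodes, nodes, nodes, indexing="ij")
U_hat = f(X, Y, Z)
value = evaluate_3d(U_hat, nodes, 0.3, -0.2, 0.7)
```

Because the nodal coefficients are simply point values, a polynomial of degree at most $N$ per direction is reproduced exactly, which the example checks at an arbitrary point.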
The test functions are chosen identical to the basis functions and the integrals in Eq. (7) are evaluated numerically
with collocation of the integration points and interpolation points, for which the Legendre-Gauss and the Legendre-
Gauss-Lobatto points are common choices. The choice of shared nodes for both operators leads to a highly efficient
numerical scheme with significantly reduced operation counts in 2D and 3D cases. The gradients of the solution
vector for evaluation of the viscous fluxes are obtained with the lifting method by Bassi and Rebay commonly referred
to as their "first method" [47]. The single elements are only coupled weakly by the fluxes across the elements' faces
in the surface integral, which are approximated by a Riemann solver. An appropriate Runge-Kutta method is used to
advance the solution in time.
Throughout this work all computations were carried out using Legendre-Gauss-Lobatto interpolation points in
conjunction with the Split-DG formulation by Pirozzoli [48] and Roe's approximate Riemann solver [49] with an entropy
fix by Harten and Hyman [50]. Unless stated otherwise, a fourth-order low-storage Runge-Kutta method [51] is
employed for time integration. A complete description of the method, its implementation and parallelization in the code
framework FLEXI as well as validation and application examples can be found in [14].
3.2. Non-conforming Meshes
The sliding mesh approach, representing the focus of this work, naturally introduces non-conforming element
interfaces (sometimes also called hanging nodes) at the sub-domain interface. Even if the initial topology is conforming,
the relative movement of adjacent meshes creates a time-dependent interface architecture, in which element neighbors
constantly change. In this section, we briefly summarize the static case of such a non-conforming mesh, and extend it
to the moving case in Section 3.3.
A common approach for the coupling of non-conforming domains in spectral element schemes is the mortar method,
originally proposed by Mavriplis in [52] for the incompressible Navier-Stokes equations and applied to compressible
flows by Kopriva in [53, 54]. It was recently shown by Laughton et al. [55] that this mortar approach yields superior
accuracy in comparison to interpolation-based methods, especially for underresolved flows. In the mortar method, the
non-conforming interface is subdivided into two-dimensional mortars in such a way that each mortar has only one
adjacent element face on each side of the interface, as depicted in Figure 2. The sub-domains do not interact directly
with each other across the interface. Instead, the solution on each element face at the interface is first projected onto
its adjacent mortars. The unique fluxes across the interface are then computed on the mortars and the two mortar
fluxes corresponding to each element face are projected back onto these respective element faces on both sides of the
interface. A more detailed description of the method implemented in FLEXI can be found in e.g. [14].
For the configuration depicted in Figure 2, the solution points of the mortars and the element face line up along the
dotted lines in Figure 2(b). This is intended by design - it stems from the fact that we choose the same solution
representation on the mortars as for the usual elements. This choice entails that for the chosen tensor product basis,
the problem decomposes into individual one-dimensional operations along the dotted lines, as shown in Figure 2(c).
Therefore, only the one-dimensional case along one representative line will be considered in the following.
(a) (b) (c)
Figure 2: Left: Schematic view of the mortar method used in three dimensions. Middle: The distribution of the solution points of the element face $\Omega^1$
and the mortars $\Xi^1$, $\Xi^2$ with dotted lines to indicate alignment. Right: The resulting quasi-one-dimensional mortar configuration. Additional spacing
between the elements is added for clarification and the mortars are shown hatched.
Projection Domain → Mortar
Analogous to Eq. (8), the polynomial approximation of the solution on the domain faces $\Omega^k$ is given by
\[ U^k \approx \sum_{i=0}^{N} \hat{U}^k_i\, \ell_i(\xi), \tag{10} \]
where $\xi$ denotes the one-dimensional coordinate on the element side. To express the solution on the mortars $\Xi^k$, we
define a local coordinate $z^k \in [-1, 1]$ such that
\[ \xi = \begin{cases} \sigma + \dfrac{1-\sigma}{2} \left( z^1 + 1 \right) & \text{for } \xi > \sigma, \\[1ex] -1 + \dfrac{1+\sigma}{2} \left( z^2 + 1 \right) & \text{for } \xi \le \sigma, \end{cases} \tag{11} \]
where $\sigma$ denotes the position of the hanging node in reference space as depicted in Figure 2(c). As shown in [53],
an unweighted $L_2$-projection from the side onto the mortars is sufficient to ensure conservation for straight-edged
elements. Using the same basis functions as for the solution $U$ in Eq. (10) on each mortar, the left and right solution
of the mortar $\Xi^k$ can be written as
\[ Q^{\Xi^k, L/R} \approx \sum_{i=0}^{N} \hat{Q}^{\Xi^k, L/R}_i\, \ell_i(z^k). \tag{12} \]
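Inserting the polynomial representations Eq. (10) and Eq. (12) into the $L_2$-projection of the solution $U^1$ on side $\Omega^1$
onto mortar $\Xi^1$ yields
\[ \int_{-1}^{1} \left( \sum_{i=0}^{N} \hat{Q}^{1,L}_i\, \ell_i(z^1) - \sum_{i=0}^{N} \hat{U}^1_i\, \ell_i(\xi) \right) \ell_j(z^1)\, dz^1 = 0 \quad \text{for } j = 0, \dots, N. \tag{13} \]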
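Introducing the definitions
\[ M_{ij} := \int_{-1}^{1} \ell_i(z^1)\, \ell_j(z^1)\, dz^1 \quad \text{for } i, j = 0, \dots, N, \tag{14} \]
\[ S^{\Omega^1 \to \Xi^1}_{ij} := \int_{-1}^{1} \ell_i\!\left(\xi(z^1)\right) \ell_j(z^1)\, dz^1 \quad \text{for } i, j = 0, \dots, N, \tag{15} \]
finally leads to the projection operation as
\[ \hat{Q}^{1,L} = M^{-1} S^{\Omega^1 \to \Xi^1} \hat{U}^1 := P^{\Omega^1 \to \Xi^1} \hat{U}^1, \tag{16} \]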
where $P^{\Omega^1 \to \Xi^1}$ is defined as the projection matrix from face $\Omega^1$ to mortar $\Xi^1$. The projection from face $\Omega^1$ onto
mortar $\Xi^2$ can be obtained accordingly. In addition, it seems worth noting that the mass matrix $M$ depends only on
the basis functions used and is therefore independent of the interface configuration. Furthermore, the projection from
the element faces onto the mortars is formally exact, provided that the same polynomial degree $N$ for the solution on
mortars and element faces is employed. Since the lifting routine employs a DGSEM discretization for the gradients,
these can be obtained by the same procedure as is described here. The resulting gradients on the element faces can be
projected onto the mortars analogously using Eq. (16).
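The projection of Eqs. (13) to (16) can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the integrals are evaluated with an exact Gauss quadrature (which may differ from the quadrature used in FLEXI), the matrices are stored with the test function index first so that the projection reads `P = M^{-1} S`, and the names `lgl_nodes`, `lagrange_basis` and `mortar_projection` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def lgl_nodes(N):
    """Legendre-Gauss-Lobatto nodes: -1, +1 and the roots of P_N'."""
    interior = Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(np.real(interior)), [1.0]))

def lagrange_basis(nodes, x):
    """Return L with L[i, m] = l_i(x[m]) for the Lagrange basis on `nodes`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    L = np.ones((len(nodes), len(x)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[i] *= (x - xj) / (xi - xj)
    return L

def mortar_projection(nodes, a, b):
    """Projection matrix from a face onto the mortar covering xi in [a, b]."""
    N = len(nodes) - 1
    zq, wq = leggauss(N + 1)                 # exact up to degree 2N + 1
    Lz = lagrange_basis(nodes, zq)           # l_j(z_q) on the mortar
    xi_q = a + 0.5 * (b - a) * (zq + 1.0)    # affine map of Eq. (11)
    Lxi = lagrange_basis(nodes, xi_q)        # l_i(xi(z_q)) on the face
    M = (Lz * wq) @ Lz.T                     # mass matrix, Eq. (14)
    S = (Lz * wq) @ Lxi.T                    # S[j, i] = int l_j(z) l_i(xi(z)) dz
    return np.linalg.solve(M, S)

N, sigma = 4, 0.3
nodes = lgl_nodes(N)
P1 = mortar_projection(nodes, sigma, 1.0)    # mortar Xi^1, xi > sigma
P2 = mortar_projection(nodes, -1.0, sigma)   # mortar Xi^2, xi <= sigma

u = lambda xi: 1.0 + xi - 2.0 * xi**3 + 0.5 * xi**4   # degree N test function
U_hat = u(nodes)
Q1 = P1 @ U_hat
Q2 = P2 @ U_hat
```

Since the mapping in Eq. (11) is affine, a face polynomial of degree $N$ is again a polynomial of degree $N$ in the mortar coordinate, so the projected coefficients coincide with point values of the face polynomial at the mapped mortar nodes.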
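Projection Mortar → Domain
Once the left and the right solution on the mortars $\Xi^1$ and $\Xi^2$ are obtained, the left and right fluxes $\mathcal{F}^{L/R,\, \Xi^{1/2}}$ can
be evaluated. The resulting fluxes are projected back onto the domain face $\Omega^1$ using an $L_2$-projection in the form of
\[ \int_{\sigma}^{1} \left( \mathcal{F}^{\Omega^1}(\xi) - \mathcal{F}^{\Xi^1, L}(z^1) \right) \ell_j(\xi)\, d\xi + \int_{-1}^{\sigma} \left( \mathcal{F}^{\Omega^1}(\xi) - \mathcal{F}^{\Xi^2, L}(z^2) \right) \ell_j(\xi)\, d\xi = 0 \quad \text{for } j = 0, \dots, N, \tag{17} \]
where, consistent with Eq. (11), mortar $\Xi^1$ covers $\xi > \sigma$ and mortar $\Xi^2$ covers $\xi \le \sigma$.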
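By inserting the polynomial representation Eq. (9) of the fluxes and reordering we obtain
\[ \hat{\mathcal{F}}^{\Omega} = P^{\Xi^1 \to \Omega} \hat{\mathcal{F}}^{\Xi^1, L} + P^{\Xi^2 \to \Omega} \hat{\mathcal{F}}^{\Xi^2, L} := \frac{1-\sigma}{2} M^{-1} S^{\Xi^1 \to \Omega} \hat{\mathcal{F}}^{\Xi^1, L} + \frac{1+\sigma}{2} M^{-1} S^{\Xi^2 \to \Omega} \hat{\mathcal{F}}^{\Xi^2, L}, \tag{18} \]
where $M$ is identical to the matrix defined in Eq. (14) and the matrices $S^{\Xi^1 \to \Omega}$ and $S^{\Xi^2 \to \Omega}$ are the transposes of $S^{\Omega \to \Xi^1}$
and $S^{\Omega \to \Xi^2}$ in Eq. (15), respectively.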
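The back-projection of Eq. (18) and its conservation property can be checked numerically with a short Python sketch. As before, this is an illustration under assumptions: exact Gauss quadrature is used for all integrals, the test function index is stored first, and the names `lgl_nodes`, `lagrange_basis` and `mortar_operators` are hypothetical rather than FLEXI routines.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def lgl_nodes(N):
    """Legendre-Gauss-Lobatto nodes: -1, +1 and the roots of P_N'."""
    interior = Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(np.real(interior)), [1.0]))

def lagrange_basis(nodes, x):
    """Return L with L[i, m] = l_i(x[m]) for the Lagrange basis on `nodes`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    L = np.ones((len(nodes), len(x)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[i] *= (x - xj) / (xi - xj)
    return L

def mortar_operators(nodes, a, b):
    """Forward and backward projection for the mortar covering xi in [a, b]."""
    N = len(nodes) - 1
    zq, wq = leggauss(N + 1)
    Lz = lagrange_basis(nodes, zq)
    Lxi = lagrange_basis(nodes, a + 0.5 * (b - a) * (zq + 1.0))
    M = (Lz * wq) @ Lz.T
    S = (Lz * wq) @ Lxi.T                      # face -> mortar
    P_fwd = np.linalg.solve(M, S)
    # Back-projection uses the transpose of S and the length ratio (b - a) / 2
    P_back = 0.5 * (b - a) * np.linalg.solve(M, S.T)
    weights = (Lz * wq).sum(axis=1)            # integrals of l_i over [-1, 1]
    return P_fwd, P_back, weights

N, sigma = 4, -0.2
nodes = lgl_nodes(N)
P1, B1, w = mortar_operators(nodes, sigma, 1.0)     # mortar Xi^1
P2, B2, _ = mortar_operators(nodes, -1.0, sigma)    # mortar Xi^2

rng = np.random.default_rng(0)
F1, F2 = rng.standard_normal(N + 1), rng.standard_normal(N + 1)
F_face = B1 @ F1 + B2 @ F2                          # Eq. (18)

flux_face = w @ F_face
flux_mortars = 0.5 * (1 - sigma) * (w @ F1) + 0.5 * (1 + sigma) * (w @ F2)

# Round trip: face polynomial -> mortars -> face
f_hat = np.cos(nodes)
f_round_trip = B1 @ (P1 @ f_hat) + B2 @ (P2 @ f_hat)
```

The integrated flux on the face equals the length-weighted sum of the integrated mortar fluxes for arbitrary mortar data, which is the discrete conservation statement, and projecting a face polynomial to the mortars and back returns the original coefficients.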
3.3. Sliding Mesh Interface
We briefly lay out the sliding mesh idea as proposed and described in detail in [35]. At a sliding mesh interface,
two sub-domains perform a sliding relative motion along the interface. For simplicity, it can be assumed that one
sub-domain is static while the other is moving (although in principle, both sub-domains can be moving). We first
consider two-dimensional cases, i.e. where the interface is a 1D object, and we restrict ourselves to periodic interfaces.
For straight interfaces, periodicity can be ensured with periodic boundary conditions. The alternative is a circular
interface, where one sub-domain is inside the circle and the other outside, and the relative motion is a rotation.
Starting from an initial configuration where element faces are conforming and equi-spaced along the interface, the
relative motion leads to a non-conforming pattern, as shown in Figure 3. We note that our algorithm can also work
with non-equispaced interface elements, but none of the applications we envision requires
such a mesh topology. We will thus stick to the described spacing for the rest of this work.
Figure 3: Schematic of the straight sliding mesh interface. The gap between the domains makes it possible to show the mortar configuration at the interface
(gray). The striped mortars are identical for periodic boundaries at the interface.
Extending the idea for the static case from Section 3.2 to the moving case follows these steps:
1. Introduce a dynamic definition of the interface configuration.
2. Introduce two-sided mortars between the non-conforming sub-domains as shown in Figure 3.
3. Interpolate the solution from the faces of both sub-domains onto these mortars.
4. Solve the Riemann problem on the mortars to obtain numerical fluxes.
5. Project the numerical flux of the mortars back onto the faces of both sub-domains.
Due to the relative motion, the interface is now necessarily dynamic, i.e. once the definition has been updated to reflect
that situation in item 1, the rest of the algorithm can follow the static procedure for this instance in time, and compute
the appropriate projection and interpolation operators. The dynamic definition of the interface has to account for two
aspects: Firstly, the size of the overlap regions of elements changes, i.e. the mortar definition has to be adjusted
accordingly. Secondly, the neighboring information across the interface is now also dynamic and changes during
the computation. The restriction to equidistant spacing at the interface has two consequences: Each element face is
always represented by two mortars (with the exception of the singular moments of conforming sub-domains, which
are in practice handled by one mortar of the size of the element faces, and the other of size 0). Moreover, the position
of the hanging nodes $\sigma$ in reference space is the same for all faces, such that the same interpolation and projection
matrices can be used for all faces along the interface.
(a) (b) (c) (d)
Figure 4: Possible mesh geometries in three spatial dimensions for the described sliding mesh method. The moving sub-domains are scaled or
shifted to reveal the structured mesh at the interfaces. From left to right the interface geometries allow for translational movement, rotation with a
conical interface, rotation with a cylindrical radial interface and rotation with a plane annular axial interface.
For three-dimensional domains, the interfaces become two-dimensional objects. A coordinate perpendicular to the
interface movement can be introduced (in principle, the relative movement is not confined to one coordinate, but
in the present work, we restrict ourselves to this case). Perpendicular to the movement, several element layers can
be introduced, but the face mesh at the interface has to be structured. For circular and straight interfaces,
three-dimensional equivalents are shown in Figure 4 along with an axial interface of two annular sub-domains with relative
rotation. Note that for the mortar method as well as the DG scheme itself, information is only exchanged via the
surface fluxes. Hence, elements with only a single vertex (or a single edge in 3D) situated at the interface do not
interfere with the sliding mesh interface and thus do not need any special treatment. Such a non-interfering vertex/edge
at the interface can be seen in Figure 4(a).
The geometric conservation law (GCL) states that mesh movement does not induce artificial perturbations in a constant
solution [17]. For our ALE formulation, it was ensured that the underlying scheme itself satisfies the GCL by solving
the discrete GCL with a DGSEM discretization, as is discussed in more detail in [17, 56]. Since the sliding mesh
interface by definition allows only tangential movements, it should satisfy the GCL by construction. Following
[22], it was verified numerically that the GCL is indeed fulfilled exactly for planar sliding mesh interfaces, such as
Figure 4(a) and Figure 4(d). For curved interfaces (e.g. Figure 4(b) and Figure 4(c)), the approximation of the circular
geometry by polynomials causes minor surface normal velocity components, which are different for non-conforming
elements, and thus lead to perturbations in a constant solution. These are, however, minuscule for typical mesh
resolutions as is demonstrated in Section 5.1, and they diminish with the geometric order of accuracy $N_{geo} + 1$ if mesh
resolution is increased. This issue is a subject of ongoing research.
4. Parallel Sliding Mesh Implementation
With the mathematical operators for the sliding mesh interface in place, we now present a strategy for an efficient
implementation in our in-house code FLEXI and possibly other element-based high-order schemes. It is designed
to minimize the thread-level computational overhead, but more importantly to keep communication as efficient as
possible. To this end, global communication is avoided altogether and local communication is kept to a minimum by
passing only data and no metadata like indices or identifiers.
In Section 4.1, a very brief introduction to the MPI-based parallelization strategy of our baseline code is given.
The basic approach to the sliding mesh implementation is given in Section 4.2. The most challenging aspect of the
parallelization is to generate information about size and sorting of a set of data for each passed message in a setting
where the communication partners are dynamically changing. To facilitate the description of our approach to this
dynamic configuration, some index definitions are introduced in Section 4.3. On this basis, the data sorting and index
mapping is described in Section 4.4. An illustrative example for the index mapping is given in Appendix A.
4.1. Prerequisites: FLEXI Parallelization
In order to formulate requirements to the sliding mesh implementation, some principles of the underlying FLEXI
code are first briefly laid out. FLEXI uses a pure distributed memory (MPI) parallelization. In the mesh building
process using our in-house High-Order Pre-Processor (HOPR) [57], elements are sorted along a space filling curve [14].
During mesh decomposition in FLEXI, one or several complete DG elements are assigned to each process following
the sorting along the space filling curve. This yields compact sub-domains for each process. No element is
split between two processes. At each interface between two elements, one of the elements is defined as primary and
the other as replica with respect to that interface. We note that the connection between elements in a DG scheme is
achieved by the numerical flux function akin to a finite volume scheme. If the two elements adjacent to an interface
are handled by two different processes, two communication steps are to be carried out during each computation of a
DG operator to compute a common interface flux: First, the solution $U$ on the boundary is passed from the replica
to the primary element. Second, the Riemann flux is calculated on the primary element and the result is passed back to the
replica element (two more analogous communication steps are required for the lifting procedure for the computation
of viscous terms). The solution $U$ and the fluxes $F$ at the boundary are stored in separate arrays for primary and
replica (yielding four arrays $U_{primary}$, $U_{replica}$, $F_{primary}$ and $F_{replica}$). In these arrays, the values are sorted according
to an element interface index $i_{face}$, which is assigned to each element interface during the initialization phase of the
simulation. The index $i_{face}$ is process-local.
For all element interfaces forming a process interface (i.e. where the neighboring element is handled by another
process), these interface indices $i_{face}$ are sorted in a nested manner by two criteria:
• The outer sorting is by the rank of the neighboring process, i.e. all faces where the opposing elements are
handled by one specific other rank are grouped together. This ensures that the data sent to this other process is
contiguous in memory.
• For each set of faces shared between two processes, it has to be ensured that the order of those faces within the
set is the same on both processes, e.g. that the face which is the first in the set on one process is the first in
that set on the other process as well.
In FLEXI mesh files, each face has a unique global ID. It is read in for each face by every process. Simply
sorting each set of faces shared between two processes by this global face ID ensures that the data communicated
between two processes is sorted consistently.
A similar strategy will be employed for the sliding mesh interface.
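The nested sorting described above can be sketched in Python as follows. The data layout (a list of `(neighbor_rank, global_face_id)` tuples per process) and the function names `sort_process_faces` and `message_slices` are assumptions made for this illustration and do not reflect the actual FLEXI data structures.

```python
def sort_process_faces(faces):
    """Sort faces by neighbor rank (outer) and global face ID (inner).

    `faces` is a list of (neighbor_rank, global_face_id) tuples, a
    hypothetical layout chosen for this sketch.
    """
    return sorted(faces, key=lambda face: (face[0], face[1]))

def message_slices(sorted_faces):
    """Contiguous index range [start, stop) of the face data per neighbor rank."""
    slices = {}
    for idx, (rank, _) in enumerate(sorted_faces):
        start, _ = slices.get(rank, (idx, idx))
        slices[rank] = (start, idx + 1)
    return slices

# Faces as seen from rank 0 and from rank 1 (they share faces 13 and 42)
faces_rank0 = sort_process_faces([(1, 42), (2, 7), (1, 13)])
faces_rank1 = sort_process_faces([(0, 13), (2, 99), (0, 42)])

slices_rank0 = message_slices(faces_rank0)
slices_rank1 = message_slices(faces_rank1)

# Order of the shared faces in the message buffers on both sides
start, stop = slices_rank0[1]
ids_0_to_1 = [gid for _, gid in faces_rank0[start:stop]]
start, stop = slices_rank1[0]
ids_1_to_0 = [gid for _, gid in faces_rank1[start:stop]]
```

Both processes arrive at the same ordering of the shared faces without exchanging any indices, so the message itself only needs to carry the face data.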
In order to achieve high compute throughputs without having to wait for communications to finish, latency hiding
is employed in FLEXI: The operations necessary in preparation of a communication step are always carried out first
at the earliest possible instance. Also, the communication is initiated as early as possible in a non-blocking manner.
The communication window is then filled with local arithmetic operations to give the message passing as much time
as possible to complete.
4.2. Sliding Mesh: Implementation Basics
We define the sliding mesh interfaces when building the mesh in the pre-processing stage of the simulation. In
principle, every slice in the mesh with the equi-spaced structured topology described in Section 3.3 can be defined as a
sliding mesh interface with little computational overhead. However, in our envisaged applications, only one or a few
interfaces are needed.
For each sliding mesh interface, there is an adjacent static and a moving mesh sub-domain. The MPI domain
decomposition occurs in two steps: First, each process is assigned to either the static or the moving domain, so that no
process handles elements on both sides of a sliding mesh interface. This choice eases implementation, but also
increases efficiency, as a process sub-domain across a sliding mesh interface would get torn apart and lose its compact
shape due to the movement. Second, within each of these sub-domains, elements are already sorted along a space filling curve
during mesh generation and the elements handled by each process are assigned accordingly.
The elements belonging to the static domain are defined to be the primary elements for the sliding mesh interface.
The solution $U$ is first interpolated from the element faces to the mortars on both sides of the interface. The additional
mortar arrays $U^{sm}_{primary}$ and $U^{sm}_{replica}$ exist for this purpose, alongside the additional arrays $F^{sm}_{primary}$ and $F^{sm}_{replica}$. The
communication procedure is similar to the one for conforming faces: The solution $U^{sm}_{replica}$ is passed from the moving
(replica) to the static (primary) domain, where the Riemann flux is evaluated. The flux $F^{sm}_{replica}$ is passed back to the
moving (replica) process and on both sides of the interface, the two mortar fluxes are projected onto the DG basis for
the respective element face. Communication hiding is employed in the same manner as for the standard conforming
element interfaces.
Figure 5: Mortar structure and coordinate definitions for a straight interface with periodic boundaries. The left (red) sub-domain is
moving vertically, resulting in a displacement $\Delta$. The right (blue) sub-domain is static. As shown, $\tilde{\eta}_\parallel = \eta_\parallel - \Delta$ for the parallel coordinates.
Figure 6: Index definition example for a straight interface. Left (red) sub-domain is moving, right (blue) is static. Plane view: the
normal coordinate and the according index are omitted for clarity. Note that the mortars use the index $i_\parallel$ of the static sub-domain.
4.3. Parallelization: Index Definitions
In the following, some index definitions are introduced to ease the description of index mapping and sorting.
Particularly, it will be necessary to uniquely address each mortar, each static and each moving face by a set of indices.
The following definitions are also illustrated in Figures 5 and 6. Variables on the static side (and in a static
undisplaced frame of reference) are noted without an accent, while variables on the moving side (and in a frame of
reference displaced with the moving sub-domain) are marked with a tilde $\tilde{\cdot}$. Let us first consider the static and
undisplaced frame of reference: We define the coordinates $\eta_\parallel$ and $\eta_\perp$ at the sliding mesh interface, where the subscripts
$\parallel$ and $\perp$ indicate their direction relative to the mesh movement (cf. Figure 5). The faces at the interface are placed in
a structured grid, so two indices $i_\parallel$ and $i_\perp$ can be assigned to each static element face at the interface, numbering them
along $\eta_\parallel$ and $\eta_\perp$.
On the moving side, a parallel coordinate $\tilde{\eta}_\parallel$ displaced with the moving sub-domain and a
corresponding index $\tilde{i}_\parallel$ are introduced, such that $\tilde{\eta}_\parallel = \eta_\parallel - \Delta$,
where $\Delta$ is the displacement of the moving domain. This is illustrated in Figure 5. Since there is no
displacement in the perpendicular direction, the coordinate $\eta_\perp$ and the index $i_\perp$ can be used for the
moving domain, too, and no additional corresponding variables for the moving side have to be introduced. Faces on
the moving side are uniquely defined by the two indices $\tilde{i}_\parallel$ and $i_\perp$. The displacement $\Delta$
can be expressed in terms of the number of surpassed faces $n_\Delta$ and the fraction of the currently surpassed
face $s_\Delta$, i.e.
$$\Delta = (n_\Delta + s_\Delta)\, l_\parallel, \qquad n_\Delta \in \mathbb{Z}, \tag{19}$$
where $l_\parallel$ is the face length along the direction of sub-domain movement.
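The mortars inherit the index $i_\parallel$ (as well as $i_\perp$) from the static domain. In order to uniquely
define each mortar, a third index $i_{\mathrm{sub}} \in \{0, 1\}$ is introduced to distinguish between the two
mortars adjacent to an element face. It is defined as
$$i_{\mathrm{sub}} = \begin{cases} 0 & \text{for } \eta_\parallel - i_\parallel l_\parallel < s_\Delta l_\parallel, \\ 1 & \text{else.} \end{cases} \tag{20}$$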
Following this definition, for each face on the static side, the mortar with the smaller $\eta_\parallel$ (i.e. the
"lower" mortar in Figure 6) has index $i_{\mathrm{sub}} = 0$ and the "upper" one has $i_{\mathrm{sub}} = 1$, while
the order is reversed on the moving side. The indices $\tilde{i}_\parallel$, $i_\parallel$ and $i_{\mathrm{sub}}$ of
a mortar and the adjacent faces are finally linked via the relation
$$\tilde{i}_\parallel = i_\parallel - n_\Delta + i_{\mathrm{sub}} - 1, \tag{21}$$
which can be verified by way of example in Figure 6 for $n_\Delta = 1$. The normal index $i_\perp$ is of course the
same for a mortar and its adjacent element faces.
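The index relations (19) to (21) can be sketched in a few lines of Python. This is a minimal illustration, not the
FLEXI implementation: all function names are hypothetical, and it is assumed that the static face with index
$i_\parallel$ covers $\eta_\parallel \in [i_\parallel l_\parallel, (i_\parallel + 1) l_\parallel)$, as Eq. (20)
suggests, and that the moving faces are numbered in the same way in the displaced frame. The periodic wrap-around of
the indices is omitted.

```python
import math


def split_displacement(delta, l_par):
    """Split the displacement as in Eq. (19): delta = (n_delta + s_delta) * l_par."""
    n_delta = math.floor(delta / l_par)   # number of fully surpassed faces (integer)
    s_delta = delta / l_par - n_delta     # fraction of the currently surpassed face, in [0, 1)
    return n_delta, s_delta


def mortar_sub_index(eta_par, i_par, l_par, s_delta):
    """Eq. (20): 0 for the mortar below the split point of static face i_par, 1 otherwise."""
    return 0 if eta_par - i_par * l_par < s_delta * l_par else 1


def moving_face_index(i_par, n_delta, i_sub):
    """Eq. (21): index of the moving face adjacent to the mortar (i_par, i_sub)."""
    return i_par - n_delta + i_sub - 1


def moving_face_index_geometric(eta_par, delta, l_par):
    """Cross-check under the assumed numbering: locate a point directly in the displaced frame."""
    return math.floor((eta_par - delta) / l_par)


# Hypothetical numbers: unit face length and a displacement of 1.25 face lengths.
n_delta, s_delta = split_displacement(1.25, 1.0)
for eta in (3.125, 3.5):                  # one point in each mortar of static face 3
    i_sub = mortar_sub_index(eta, 3, 1.0, s_delta)
    print(i_sub, moving_face_index(3, n_delta, i_sub),
          moving_face_index_geometric(eta, 1.25, 1.0))
```

Under these assumptions, Eq. (21) and the direct geometric lookup return the same moving face index for every point
on a static face, which is a convenient sanity check when the index bookkeeping is modified.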
4.4. Parallelization: Mortar Sorting and Index Mapping
Communication in the presence of changing mesh topology and changing communication partners poses unique
challenges. In order to avoid additional communication, several requirements have to be met:
1. The communication partners’ ranks as well as the size of the communicated data sets need to be known a priori
by both sides.
2. The data communicated from one process to another should be contiguous in memory on both processes.
3. The data communicated from one process to another has to be sorted by universal criteria, such that the receiving
process knows beforehand how the data is sorted.
We start by addressing the first requirement. To this end, the ranks handling the elements belonging to each
static face $\Omega_{i_\parallel, i_\perp}$ and each moving face $\tilde{\Omega}_{\tilde{i}_\parallel, i_\perp}$ are
communicated globally during the initialization phase prior to the actual simulation and are stored as two mapping
arrays $r(i_\parallel, i_\perp)$ and $\tilde{r}(\tilde{i}_\parallel, i_\perp)$. This is the only global
communication procedure in the proposed implementation. Subsequently, all dynamic information regarding the
configuration of the interface and the partners in a communication step across the interface can be deduced without
the need for further message passing.
We now have all necessary ingredients in place to meet the above three criteria:
1. For each mortar, the ranks handling both adjacent faces are known via $r$ and $\tilde{r}$ (we use Eq. (21) to
translate between $\tilde{i}_\parallel$ and $i_\parallel$).
2. Sorting the mortars on each process by their communication partner yields contiguous data chunks to be passed.
3. The index triple $(i_\parallel, i_\perp, i_{\mathrm{sub}})$ of each mortar is a globally unique criterion for the
inner sorting of the communicated data sets.
The sorting procedure now works as follows: An index array $A$ is set up, which contains a tuple of five entries
for each mortar adjacent to the faces of the own rank. On each process of the static domain, these entries are: the
FLEXI face index $i_{\mathrm{face}}$, the rank $\tilde{r}$ of the opposing moving-domain process, the
movement-parallel index $i_\parallel$, the normal index $i_\perp$, and the mortar index $i_{\mathrm{sub}}$. The only
difference on the moving domain is that here, the ranks $r$ of the static processes are stored instead of
$\tilde{r}$. These index arrays are then sorted in a nested fashion by the four indices $r$ (or $\tilde{r}$,
respectively), $i_\parallel$, $i_\perp$ and $i_{\mathrm{sub}}$ (from outer to inner in that order) using a quicksort
algorithm. The FLEXI face index $i_{\mathrm{face}}$ is not a sorting key; it is merely carried along with the other
entries. The resulting order determines the sorting of mortar data on both the static and the moving side. The order
of the entries in $A$ defines a process-local index $i_{\mathrm{mortar}}$, which determines the data sorting in the
arrays $U^{\mathrm{sm}}_{\mathrm{primary}}$, $U^{\mathrm{sm}}_{\mathrm{replica}}$, $F^{\mathrm{sm}}_{\mathrm{primary}}$
and $F^{\mathrm{sm}}_{\mathrm{replica}}$.
An index array $m$ is set up, in which $i_{\mathrm{mortar}}$ is given for each $i_{\mathrm{face}}$ and
$i_{\mathrm{sub}}$. It is used to store the data in sorted order during the interpolation and the projection steps of
the mortar procedure.
The sorting and mapping process is illustrated with the help of an example in Appendix A.
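The nested sort and the construction of the mapping array can be sketched as follows. This is a minimal Python
illustration rather than the FLEXI implementation: the names `Mortar`, `sort_mortars` and `build_mapping` are
hypothetical, and Python's built-in sort stands in for the quicksort mentioned above (the resulting order is the same
because the sort key is unique). The input data are the ten mortars of the static process from the example in
Appendix A.

```python
from typing import NamedTuple


class Mortar(NamedTuple):
    rank: int    # rank of the opposing process (r~ on the static side)
    i_par: int   # movement-parallel index
    i_perp: int  # normal index
    i_sub: int   # mortar sub-index, 0 or 1
    i_face: int  # process-local face index, carried along passively


def sort_mortars(mortars):
    """Nested sort by (rank, i_par, i_perp, i_sub), outermost key first."""
    return sorted(mortars, key=lambda mo: (mo.rank, mo.i_par, mo.i_perp, mo.i_sub))


def build_mapping(sorted_mortars):
    """Map (i_face, i_sub) to the 1-based position i_mortar in the sorted list."""
    return {(mo.i_face, mo.i_sub): pos
            for pos, mo in enumerate(sorted_mortars, start=1)}


# Mortars of the Appendix A example, listed face by face (unsorted).
mortars = [
    Mortar(7, 4, 1, 0, 5), Mortar(4, 4, 1, 1, 5),
    Mortar(4, 4, 2, 0, 6), Mortar(4, 4, 2, 1, 6),
    Mortar(4, 5, 1, 0, 7), Mortar(4, 5, 1, 1, 7),
    Mortar(7, 3, 2, 0, 8), Mortar(4, 3, 2, 1, 8),
    Mortar(7, 3, 1, 0, 9), Mortar(7, 3, 1, 1, 9),
]

A = sort_mortars(mortars)
m = build_mapping(A)
print([mo.i_face for mo in A])   # passively sorted face indices
print(m[(5, 0)], m[(8, 1)])      # i_mortar of two selected mortars
```

In the sorted list, all mortars shared with rank 4 come first and those shared with rank 7 follow, so each
communication partner receives one contiguous chunk whose inner order both sides can reconstruct independently.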
Updates of the index arrays $A$ and $m$ are only carried out whenever the communication structure changes, i.e.
whenever the moving sub-domain displacement surpasses a full face length. This is, in fact, the only necessary
procedure to account for the changed mesh topology. At all other time stages, $m$ remains the same and only
$s_\Delta$ changes, so only the interpolation and projection operators have to be updated.
For extension to multiple sliding mesh interfaces, this strategy can be performed individually for each interface.
The mortar sorting can then be kept unique by introducing a new (outermost) sorting index to distinguish between
interfaces.
5. Results
In this section, the high-order accuracy of the implemented sliding mesh method is verified by convergence tests
for a curved interface in Section 5.1 and a straight interface in Section 5.2. We then investigate the parallel perfor-
mance of the novel method and compare it against the static baseline scheme in Section 5.3, before the method is
applied to a large scale LES test case in Section 5.4.
5.1. Isentropic Vortex
To verify that the implemented method is indeed high-order accurate for DGSEM and especially for curved interfaces
which are approximated by high-order polynomials, we follow Zhang and Liang [35] and investigate the order
of accuracy of the method using the transport of an isentropic Euler vortex. For this two-dimensional test case, an
isentropic vortex is superimposed on a constant freestream. For details on the exact solution and the notation we
refer the reader to [35]. The vortex parameters are chosen as $\epsilon = 1$, $r_c = 1$, which can be interpreted as
vortex intensity and vortex size, respectively. At the beginning, $t = 0$, the vortex is located in the center of the
domain. The freestream is initialized with $\rho_\infty = 1$, $v_\infty = 1$, $\theta = \arctan\frac{1}{2}$,
$Ma_\infty = 0.3$ as the freestream density, velocity magnitude, flow angle and Mach number, respectively. The ratio
of specific heats is set to $\gamma = 1.4$ and the freestream pressure
is set consistent with the Mach number via the ideal gas relation.
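For reference, the freestream pressure implied by these values can be worked out as follows, assuming a perfect gas
with speed of sound $a_\infty = \sqrt{\gamma p_\infty / \rho_\infty}$ and $Ma_\infty = v_\infty / a_\infty$. The
numerical value is derived here from the stated parameters and is not quoted from the original text:

$$p_\infty = \frac{\rho_\infty v_\infty^2}{\gamma\, Ma_\infty^2} = \frac{1 \cdot 1^2}{1.4 \cdot 0.3^2} \approx 7.94 .$$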
Figure 7: Left: The coarsest mesh with 1854 elements at $t = 4.0$; the sliding mesh interface is highlighted in red.
Right: Exact solution of the isentropic vortex for the density $\rho$ at $t = 4.0$.
Three unstructured meshes with 1854 to 15003 elements are used, with the coarsest one depicted in Figure 7 together
with the exact solution. The computational domain is a square with a side length of $L = 20$ and periodic boundary
conditions. The inner sub-domain rotates with an angular velocity of $\omega = 0.1$. All errors in Table 1 are
reported at $t = 4.0$, when the center of the vortex reaches the sliding mesh interface. For all cases, the explicit
time step is reduced artificially to highlight the behavior of the spatial discretization error. The method indeed
shows the expected convergence behavior for all investigated orders.
         N=2                N=3                N=4                N=5
#Elem    L2-Error   Order   L2-Error   Order   L2-Error   Order   L2-Error   Order
1854     8.58e-05   –       8.60e-06   –       1.09e-06   –       1.51e-07   –
7094     1.52e-05   2.58    8.40e-07   3.47    4.94e-08   4.61    3.13e-09   5.78
15003    4.35e-06   3.34    1.65e-07   4.35    7.71e-09   4.96    3.85e-10   5.60
Table 1: L2-errors of the density $\rho$ for the isentropic vortex at $t = 4.0$ and the resultant orders of accuracy
for several polynomial degrees $N$.
5.2. Manufactured Solution
To verify the high-order accuracy of the implemented method for fully three-dimensional flows, a smooth manufactured
solution from [58] is investigated. The corresponding parameters are set to $\omega = 1$, $\alpha = 0.1$,
$R = 287.058$, $\mu = 0.001$. This manufactured solution describes an oblique sine wave advected with a constant
speed, as shown in Figure 8. The computational domain is set to $x \in [0,2]^3$ with a Cartesian mesh and periodic
boundary conditions, while the central sub-domain (i.e. the sub-domain $x_2 \in [\frac{2}{3}, \frac{4}{3}]$) moves
with a velocity of $v_g = [1,0,0]^T$, as depicted in Figure 8. Starting from a cube with $3^3$ elements, the number
of elements in each spatial direction is doubled in every refinement step.
Figure 8: Left: Computational mesh with $6^3$ elements at $t = 0.3$; the sliding mesh interfaces are highlighted in
red. Right: Exact solution of the manufactured solution for the density $\rho$ at $t = 0.3$.
The time step is chosen small enough to inhibit any influence of the time integration scheme on the observed errors.
The L2-errors for the density $\rho$ at $t = 1.0$ and the resultant orders of accuracy are reported in Table 2. As
before, the sliding mesh method retains the high-order accuracy of the DGSEM scheme as expected.
         N=2                N=3                N=4                N=5
#Elem    L2-Error   Order   L2-Error   Order   L2-Error   Order   L2-Error   Order
3^3      4.21e-02   –       5.08e-03   –       3.16e-04   –       2.98e-05   –
6^3      3.82e-03   3.46    1.60e-04   4.99    9.80e-06   5.01    6.86e-07   5.44
12^3     4.93e-04   2.95    1.02e-05   3.97    3.51e-07   4.81    1.07e-08   6.00
24^3     6.93e-05   2.83    6.79e-07   3.91    1.13e-08   4.95    1.64e-10   6.03
48^3     8.47e-06   3.03    4.44e-08   3.93    3.05e-10   5.21    2.36e-12   6.12
Table 2: L2-errors of the density $\rho$ for the manufactured solution at $t = 1.0$ and the resultant orders of
accuracy for several polynomial degrees $N$.
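The orders of accuracy in Tables 1 and 2 can be reproduced from the tabulated errors with a short Python sketch. It
assumes that the characteristic element size scales as $h \propto \#\mathrm{Elem}^{-1/d}$ in $d$ dimensions, which
is an assumption for the unstructured meshes of Table 1; the function name `observed_order` is hypothetical.

```python
import math


def observed_order(err_coarse, err_fine, n_coarse, n_fine, dim):
    """Observed order of accuracy between two meshes with n_coarse and n_fine elements.

    Assumes a characteristic element size h proportional to n**(-1/dim).
    """
    h_ratio = (n_fine / n_coarse) ** (1.0 / dim)   # h_coarse / h_fine
    return math.log(err_coarse / err_fine) / math.log(h_ratio)


# N=2 column of Table 1 (two-dimensional, unstructured meshes)
print(round(observed_order(8.58e-05, 1.52e-05, 1854, 7094, dim=2), 2))
# N=2 column of Table 2 (three-dimensional, 3^3 -> 6^3 elements)
print(round(observed_order(4.21e-02, 3.82e-03, 3**3, 6**3, dim=3), 2))
```

For the two pairs shown, the sketch returns the tabulated orders of 2.58 and 3.46 to the two decimals given.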
5.3. Scaling Tests
Having established that the sliding mesh method produces accurate results and the interface treatment does not
introduce spurious errors, we now investigate the performance of the implementation and parallelization strategy
described in Section 4. Favourable scaling behavior is essential to exploit today’s massively parallel hardware
resources, which in turn are necessary for LES of complex applications found in industry. The baseline open source
code FLEXI has been shown to scale efficiently up to over 100,000 cores [14, 59], which allows us to investigate the
impact of the implemented sliding mesh method on its scaling efficiency. To this end, scaling tests were performed on
the supercomputer Hazel Hen at the High-Performance Computing Center Stuttgart (HLRS). The Cray XC40 system consists
of 7712 nodes, each equipped with two Intel Xeon E5-2680 v3 processors and 128 GB of main memory. The computational
meshes for the tests are based on a cubical Cartesian mesh as shown in Figure 8 with $x \in [0,1]^3$ and sliding mesh
interfaces parallel to the $x_1 x_3$-plane. For the refined meshes, the number of elements of the baseline mesh with
$6 \times 6 \times 6$ elements is successively doubled in the $x_1$-, $x_2$- and $x_3$-direction in turn, up to the
finest mesh with $96 \times 48 \times 48$ elements.
The simulation is initialized with a constant freestream and the mesh velocity is set to $v_g = [0,1,0]^T$. Every simulation
Figure 9: Strong and weak scaling of the sliding mesh implementation for different numbers of elements and computing
cores on the HLRS supercomputer Hazel Hen. In all plots, the mean over all five runs is given with the minimum and
maximum as error bars. Left: the performance index (PID) over the number of degrees of freedom (DOF) per core.
Middle: strong scaling as parallel efficiency over the total number of cores for the different meshes. The results
for the baseline open source code (OS-Flexi) without sliding mesh for the finest grid with
$6^3 \cdot 2^{10} = 221{,}184$ elements are shown dashed for comparison. Right: weak scaling as parallel efficiency
for loads ranging between 36 and 288 elements per core.
is run for exactly 100 time steps using a five-stage Runge-Kutta method, and only the computational time excluding
I/O and initialization is considered for the performance analysis. The communication partners at the sliding mesh
interface change at most three times during the computation. Starting from 1 computing node (24 cores), the number
of nodes for each mesh is doubled until the maximum of 512 nodes (12,288 cores) is reached. However, only
cases with at least 1944 DOF per core are considered, corresponding to 9 elements per core for the polynomial
degree $N = 5$, which is used for all scaling tests. Each run is repeated five times to account for potential
statistical influences like the overall network load caused by other jobs on the system. As a consistent performance
measure, the performance index PID is used, which is defined as
$$\mathrm{PID} = \frac{\text{wall-clock time} \cdot \#\text{cores}}{\#\text{DOF} \cdot \#\text{time steps} \cdot \#\text{RK stages}}, \tag{22}$$
and indicates the averaged computation wall time per spatial degree of freedom on each core for the computation of
one Runge-Kutta stage. The results of the scaling tests are shown in three different plots in Figure 9. While the
left plot shows the PID over the computational load per core, the plot in the center shows the strong scaling as
parallel efficiency over the total number of cores. The parallel efficiency is defined as the ratio between the PID
of the respective number of cores and the PID of the baseline simulation using the minimum of 24 cores. The results
for the baseline code without sliding mesh for the finest mesh with $96 \times 48 \times 48$ elements are also
plotted dashed for comparison. The
right plot shows the weak scaling for four distinct loads per core.
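Eq. (22) and the derived parallel efficiency can be written down in a few lines of Python. The sketch below is only
illustrative: the wall-clock times are hypothetical, the function names are not taken from FLEXI, and the efficiency
is assumed to be the reference PID divided by the current PID, so that a value of 1 corresponds to ideal scaling.

```python
def performance_index(wall_time, n_cores, n_dof, n_steps, n_stages):
    """Eq. (22): core time per DOF and Runge-Kutta stage, in seconds."""
    return wall_time * n_cores / (n_dof * n_steps * n_stages)


def parallel_efficiency(pid_reference, pid):
    """Assumed convention: reference PID divided by the current PID (1.0 = ideal)."""
    return pid_reference / pid


# Finest mesh (96 x 48 x 48 elements, N = 5); the wall-clock times are hypothetical.
n_dof = 96 * 48 * 48 * (5 + 1) ** 3
pid_24 = performance_index(wall_time=2000.0, n_cores=24, n_dof=n_dof, n_steps=100, n_stages=5)
pid_48 = performance_index(wall_time=1100.0, n_cores=48, n_dof=n_dof, n_steps=100, n_stages=5)
print(n_dof, pid_24, parallel_efficiency(pid_24, pid_48))
```

Because the PID is normalized by the number of cores, the same quantity can be compared directly across strong and
weak scaling runs.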
For more than $10^4$ DOF per core, the sliding mesh implementation shows a decrease in performance by about 15% in
comparison to the baseline code. This is mainly due to the increased workload of the implemented ALE formulation
and the additional overhead introduced by the sliding mesh method. For a decreasing load per core, the
memory consumption per core decreases as well, which allows a greater share of the data to be placed in the CPU cache.
The baseline code can exploit these caching effects and shows significant performance increases for small loads per
core. In contrast, the overall performance of the sliding mesh code decreases significantly for low loads and large
numbers of cores. The first reason for this is the load imbalance introduced by the sliding mesh interface. With
an increasing number of cores, the additional work at the interface is distributed to a smaller share of the cores,
which leads to higher work imbalances and to a decrease of parallel efficiency. The other reason is the increased
communication effort at the sliding mesh interfaces, since the number of communication partners at the interface
potentially doubles in comparison to a conforming mesh. As described in Section 4, FLEXI relies on non-blocking
communication which is hidden by local work. For very small loads, however, the processors at the interface do not
have enough local work to hide the additional communication effectively. This leads to poor scaling performance for
very small loads per core. For practical applications, however, the authors have usually encountered loads of more than
                   x+                     y+                     z+
                   max    mean   std      max    mean   std      max    mean   std
Rotor     T=0.0    36.6   23.8   8.1      2.2    1.4    0.4      26.6   16.3   4.4
          T=0.5    38.2   24.2   8.1      2.5    1.4    0.4      29.4   16.7   4.5
Stator 2  T=0.0    42.8   24.1   9.5      2.2    1.3    0.4      29.1   17.6   5.1
          T=0.5    44.5   25.4   9.3      2.2    1.4    0.4      29.2   18.7   5.3
Table 3: The dimensionless wall distances for the wall-nearest grid cells of the rotor blade and the second stator
vane in viscous wall units. The wall distances are normalized with the factor $(N+1)$ to account for the multiple DOF
in each direction inside a DG element. Given are the respective maximum, mean and standard deviation of the
phase-averaged wall distances at two distinct phase angles $T = 0.0$ and $T = 0.5$.
10,000 DOF per core, for which the proposed method shows excellent weak and strong scaling. The performance
loss of about 15% compared to the baseline code is well within acceptable limits, and could likely be further reduced
by an a priori static load balancing.
5.4. LES of the Aachen Turbine
To demonstrate the suitability of the proposed sliding mesh implementation for large-scale simulations with the
presented high-order DG scheme, it is applied to a wall-resolved implicit LES of the 1-1/2 stage Aachen turbine test
case [42, 43, 44], which is investigated extensively in literature, e.g. [60, 61, 62, 63].
Test Case Definition
The 1-1/2 stage Aachen turbine consists of stator vanes with modified VKI design and rotor blades with a Traupel
profile [64] in a stator-rotor-stator setup. All blades are untwisted and the inner and outer diameter of the turbine
are constant. The leading edges of the two stators are not in line, but rotated circumferentially by 3°. Following
[62], the original blade count of 36-41-36 is modified to a uniform blade count, which allows the computational
domain to be reduced to a periodic sector with a single blade pitch per cascade. However, in contrast to [62], the
blade count is modified to 38-38-38 while retaining the blades’ original profile geometry. To further reduce the
computational cost, the turbine is approximated by a planar cascade, which neglects any influence of the casing and
the effects of the rotor’s tip clearance. The modified geometry is obtained by extruding the two-dimensional profile
geometries in $x_3$-direction to a length of 6 mm, which corresponds to 10% of the rotor’s chord length and
approximately 10.9% of the rotor’s original blade span. The remaining geometric quantities are obtained with respect
to the turbine’s mean radius of $r = 272.5$ mm and considering the modified blade count. For the investigated
operating point with a rotational speed of 3500 rpm, this leads to a planar velocity of 99.88 m/s. The inflow Mach
number is $Ma \approx 0.1$ and the Reynolds number with respect to the outflow velocity and the chord length of the
stator vane is $Re \approx 800{,}000$. More details on
the test case and the turbine geometry can be found in e.g. [60].
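The planar velocity quoted above follows from the rotational speed and the mean radius. The following Python sketch
reproduces it and, as an illustration only, also derives the blade pitch and the duration of one rotor period from
the modified blade count; these two derived values are not stated in the text and rest on the assumption that the
pitch is the mean circumference divided by the blade count.

```python
import math

RPM = 3500.0            # rotational speed from the text
MEAN_RADIUS = 0.2725    # mean radius in m
BLADE_COUNT = 38        # modified uniform blade count

omega = 2.0 * math.pi * RPM / 60.0                   # angular velocity in rad/s
planar_velocity = omega * MEAN_RADIUS                # translational speed of the rotor row in m/s
pitch = 2.0 * math.pi * MEAN_RADIUS / BLADE_COUNT    # assumed blade pitch at the mean radius in m
period = pitch / planar_velocity                     # time needed to traverse one pitch in s

print(f"velocity = {planar_velocity:.2f} m/s")
print(f"pitch    = {pitch * 1e3:.2f} mm")
print(f"period   = {period * 1e3:.4f} ms")
```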
Mesh Generation
The mesh is based on a two-dimensional unstructured mesh with structured O-type meshes around the blades,
which is extruded in $x_3$-direction with 24 equispaced elements, resulting in 966,288 hexahedral elements for the
entire mesh. The sliding mesh interfaces are centered between the trailing and leading edges of consecutive blades.
The resolution at the walls is comparable to wall-resolved LES in literature [65, 66]; the dimensionless wall
distances for the present simulation are given in viscous wall units in Table 3. Our in-house high-order
pre-processor HOPR [57] is then used to generate a fifth-order geometry approximation of the curved blade geometry,
and an RBF method is used to expand the curving into the surrounding volume, as described in more detail in [14].
The boundary conditions are set periodic in $x_2$- and $x_3$-direction and the blade walls are modelled as adiabatic
no-slip walls. The measured inflow state at the mean diameter of the turbine from the test case’s experimental data
is imposed as Dirichlet boundary condition at the inflow position. Similarly, the measured pressure behind the second
stator is used to impose a pressure outflow condition [67]. Both states are given in Table 4. A sponge zone with
ramping function is employed before the outflow to avoid spurious reflections at the outflow, as depicted in
Figure 10. More details on the sponge zone and ramping function used can be found in [68]. While the rotor blade moves continuously
Figure 10: Cross section of the computational mesh at two distinct time instants. The upper figure shows the initial
conforming configuration (referred to as $T = 0$) and the lower figure shows the mesh with a relative displacement of
half a period ($T = \frac{1}{2}$). The sliding mesh interfaces are highlighted in red and the sponge zone at the
outflow is shaded blue. The magnified section exhibits details of the transition between the structured O-grids
around the blades, the equidistant mesh at the sliding mesh interface and the unstructured mesh in the remaining
domain.
           ρ                p               v1            v2             v3
Inflow     1.7765 kg/m^3    157305.88 Pa    37.200 m/s    −2.010 m/s     0.0 m/s
Outflow    1.3651 kg/m^3    110357.08 Pa    55.992 m/s    160.019 m/s    0.0 m/s
Table 4: The measured inflow and outflow conditions at the mean diameter of the turbine from the test case’s
experimental data. The inflow was measured 143 mm in front of the leading edge of the first stator and the outflow
state is given 8.8 mm behind the trailing edge of the second stator. The velocity $v_3$ is set to zero at the inflow
and outflow on account of the planar cascade assumption.
upwards in this planar simulation, the periodic boundary conditions cause the blade to reappear from beneath, as is
indicated in Figure 10.
Computational Setup
For the simulation, a sixth-order scheme is employed, which results in approximately 208 million spatial degrees
of freedom. A fourth-order Runge-Kutta method by Niegemann et al. [69] is used, since its optimized stability region
allows for larger time steps. The specific gas constant is set to $R = 287.058\ \mathrm{J/(kg\,K)}$, the viscosity
to $\mu = 1.8 \cdot 10^{-5}\ \mathrm{kg/(m\,s)}$, the ratio of specific heats to $\gamma = 1.4$ and the Prandtl
number is chosen as $Pr = 0.72$. The LES is filtered implicitly by the discretization, with the discretization error
also acting as an implicit subgrid scale model, following e.g. [15].
The computation was carried out on Hazel Hen at the HLRS with a varying number of up to 4800 cores, resulting in
a minimum of 41,300 DOF per core. The PID was approx. 1.3 µs per DOF for all considered cases and shows good
agreement with the results of the scaling test. After attaining a quasi-periodic solution, 10 periods of the turbine
flow
were computed with a computational cost of about 27,000 CPU-hours per period.
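The quoted number of degrees of freedom follows directly from the mesh size and the polynomial degree. Assuming, as
in the scaling tests, that the sixth-order scheme corresponds to $N = 5$, i.e. $N + 1 = 6$ solution points per
direction in each hexahedral element, one obtains

$$966{,}288 \cdot (N+1)^3 = 966{,}288 \cdot 216 = 208{,}718{,}208 \approx 2.09 \cdot 10^{8} .$$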
Results
The time-averaged surface pressures of the vanes and the rotor blade are given in Figure 11 together with the results
of Yao et al. [62] for comparison, who conducted a URANS simulation of a three-dimensional blade passage based
on a 36-36-36 blade configuration. The results show good qualitative agreement even though the LES predicts an
overall higher pressure level than the URANS simulation. Furthermore, the LES shows excellent agreement with the
available experimental data at the blades’ trailing edges.
[Figure 11, panels: (a) Upstream vane, (b) Rotor, (c) Downstream vane. Horizontal axis: dimensionless surface
position; vertical axis: $p/p_\infty$. Legend: URANS [62], LES, Experiment [42, 43, 44].]
Figure 11: Comparison of static pressure. The results of the LES (black dashed) are compared with the unsteady pressure envelopes by Yao et
al. [62] (blue) and the pressure at the trailing edges from experimental data [42, 43, 44] (black squares).
The instantaneous flow fields for two distinct phase angles $T = 0$ and $T = \frac{1}{2}$ are given in Figure 12,
with $T$ as the relative rotor position and $T = 1$ corresponding to one completed period of the rotor. The flow
around the first stator vane remains laminar, with transition just before the trailing edge on the suction side, due
to the strong favourable pressure gradient. The induced vortex shedding causes aeroacoustic noise, which is also
convected upstream, as exhibited by the numerical pseudo-schlieren. The vane’s wake impinges on the rotor and wraps
around the rotor’s leading edge. The wake’s axis is then rotated counter-clockwise before it is passively advected by
the freestream, as detailed in [70]. The wake acts as a perturbation in the rotor passage, creating unsteady pressure
distributions on the rotor surface, e.g. [71]. To quantify these effects and their impact on the rotor, the averaged
lift force acting on the rotor blade as well as its spectrum are given in Figure 13. The data are obtained by
evaluating the lift force and its Fourier transform on intervals of 2 periods each and averaging the obtained
results. The minimum lift force is reached at around $T \approx 0.8$, when the wake contacts the suction side of the
rotor. In contrast, the rotor lift force increases as the point of impact shifts towards the rotor’s pressure side,
reaching the maximum lift force at $T \approx 0.3$. Interestingly, only the third harmonic of the blade passing
frequency (BPF) is distinguishable, while the second and fourth harmonic show no considerable contribution to the
lift force. The increasing amplitudes for frequencies at around 10 BPFs are
caused by the vortex shedding of the first stator vane, as their frequencies coincide.
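The interval averaging of the lift force and its spectrum described above can be sketched as follows. This is a
minimal Python illustration on a synthetic signal, not the post-processing used for Figure 13: the function name, the
signal values, the naive discrete Fourier transform and the single-sided amplitude normalization are all assumptions
made for the example.

```python
import cmath
import math


def interval_average(signal, n_interval_samples):
    """Average a signal and its DFT amplitudes over consecutive intervals.

    Returns the interval-averaged signal and the averaged single-sided
    amplitude spectrum (index k corresponds to k cycles per interval).
    """
    n = n_interval_samples
    n_intervals = len(signal) // n
    mean_signal = [0.0] * n
    mean_amplitude = [0.0] * (n // 2 + 1)
    for s in range(n_intervals):
        segment = signal[s * n:(s + 1) * n]
        for idx, value in enumerate(segment):
            mean_signal[idx] += value / n_intervals
        for k in range(n // 2 + 1):
            coeff = sum(v * cmath.exp(-2j * math.pi * k * idx / n)
                        for idx, v in enumerate(segment))
            # Mean value and Nyquist bin appear once, all other bins twice.
            scale = 1.0 if k == 0 or 2 * k == n else 2.0
            mean_amplitude[k] += scale * abs(coeff) / n / n_intervals
    return mean_signal, mean_amplitude


# Synthetic lift signal (hypothetical values): mean 2.65, BPF amplitude 0.1,
# third-harmonic amplitude 0.02, sampled 16 times per rotor period for 10 periods.
samples_per_period = 16
lift = [2.65 + 0.1 * math.cos(2 * math.pi * idx / samples_per_period)
        + 0.02 * math.cos(2 * math.pi * 3 * idx / samples_per_period)
        for idx in range(10 * samples_per_period)]

# Intervals of two periods: bin k corresponds to k / 2 times the BPF.
mean_lift, amplitude = interval_average(lift, 2 * samples_per_period)
print(amplitude[2], amplitude[6])   # BPF and its third harmonic
```

With intervals of two rotor periods, the frequency resolution is half the BPF, so the BPF and its harmonics fall
exactly onto every second bin of the averaged spectrum.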
As this initial analysis of the resulting flow field reveals, a number of non-linear interactions are triggered by
the complex stator/rotor interactions, which warrant further detailed analysis. Of particular interest will be the
boundary layer state and its interaction with the wakes, as well as the resulting load transients. As the focus of
this paper is on the methodology and the performance of the described sliding mesh implementation, a more thorough
analysis of the flow physics is the subject of a separate publication [73]. However, this test case already
highlights the potential of the presented high-order sliding mesh method for scale-resolving simulations in
turbomachinery and related applications.
6. Conclusion
In this work, we have proposed an efficient implementation and parallelization strategy of a mortar-based sliding
mesh method for high-order discontinuous Galerkin methods. The method retains the high-order accuracy of the DG
method as well as the excellent strong and weak scaling properties of the baseline code, and is thus well-prepared to
tackle problems at an industrial scale.
The challenge in designing a parallelization strategy lies in the dynamic communication structure. At the heart of
our proposed approach lies the avoidance of additional global communication as well as the passing of metadata.
Instead, only essential solution data is communicated, while process-local mapping arrays handle the identification
of communication partners at each timestep. The presented, globally unique sorting strategy keeps the message
data contiguous in memory and avoids local rearranging of the data on arrival. After validating the accuracy and
convergence property of our approach, we demonstrate that due to the careful design of the parallelization scheme,
[Figure 12, panels: $T = 0$: configuration with low rotor lift; $T = \frac{1}{2}$: configuration with high rotor lift.]
Figure 12: Instantaneous flow field visualized with iso-surfaces of the $\lambda_2$-criterion [72] colored by the
Mach number in front of pseudo-schlieren computed at $x_3 = 0$ mm, with an offset of half a period between the left
and right figures. The lower figures show a close-up of the passage between two rotor blades at the respective time
instants.
[Figure 13, left panel: rotor lift [N] over periods [−]. Right panel: amplitude [N] over normalized frequency [−],
with annotations for the BPF, the third harmonic and the vortex shedding of the first stator.]
Figure 13: The lift force acting on the rotor blade. The results are averaged by dividing the obtained lift force into intervals of two rotor periods.
These intervals are then averaged to obtain the temporal evolution of the lift force on the left. Shown on the right is the lift force in the frequency
domain as average of the discrete Fourier transforms of the individual time intervals. Highlighted are the blade passing frequency (BPF), its third
harmonic and the frequency of the first stator vane’s vortex shedding. The frequency is normalized with the BPF.
the sliding mesh implementation achieves excellent strong and weak scaling. We note that the optimum load per core
is shifted slightly towards higher loads, and that the performance deteriorates for very low loads per core. This is to
be expected and will be improved in the future with additional load balancing. Compared to the baseline scheme, a
performance loss of only about 15% is incurred by the novel method, which is acceptable and allows us to conduct
large scale simulations with sliding mesh interfaces on the available supercomputers. We present an example of such
an application for the case of a turbine flow with stator-rotor-stator interaction. To the authors’ best knowledge, this is
the first time a high-order sliding mesh method for DG has been applied to large scale problems of industrial relevance.
In the future, we plan to apply this framework to a range of interesting cases, where high unsteadiness and non-linear
interactions pose challenging problems for traditional models like the RANS equations. A typical example of such
applications can be found in turbomachinery components, where our simulation framework can contribute to the
understanding of complex 3D flows. Beyond the pure fluid phase however, coupling the sliding mesh approach with a
particle tracking method as developed in [74] will establish simulation capabilities that can investigate particle-laden
flows in rotating geometry, which are rarely tractable with an experimental approach.
Acknowledgment
The research presented in this paper was funded by Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) under Germany’s Excellence Strategy - EXC 2075 - 390740016 and by Friedrich und Elisabeth Boysen-
Stiftung as part of the project BOY-143.
The authors gratefully acknowledge the support and the computing time on "Hazel Hen" provided by the HLRS
through the project "hpcdg".
The measurements on the test case "Aachen Turbine" were carried out at the Institute of Jet Propulsion and
Turbomachinery at RWTH Aachen University, Germany.
Appendix A. Illustrative Mortar Sorting Example
Here, the construction of the mapping arrays $\tilde{r}$, $A$ and $m$ laid out in Section 4.4 is illustrated with an
example. The example setup is shown in Figure A.14 and described in the following: The considered process on the
static subdomain handles five sliding mesh sides (thick black lines). Each of them contains two sliding mesh mortars
(fine black lines). The moving subdomain moves from left to right. At the considered time instant, the adjacent sides
in the moving subdomain are handled by the processes with ranks 4 and 7, which are given by $\tilde{r}$
(Sub-Figure a). The moving sides are outlined with red dotted lines. They are not aligned with the static sides.
Furthermore, in this example, the five sliding mesh sides handled by the considered static process have global static
indices $i_\parallel$ ranging from 3 to 5 (Sub-Figure b) and $i_\perp$ ranging from 1 to 2 (Sub-Figure c). Within
each side, the mortar sub-index is ascending along the direction of movement as stated in Eq. (20) (Sub-Figure d).
The side indices assigned to each side by FLEXI are process-local and arbitrary, but consecutive (Sub-Figure e).
[Figure A.14, panels: (a) $\tilde{r}$, (b) $i_\parallel$, (c) $i_\perp$, (d) $i_{\mathrm{sub}}$, (e) $i_{\mathrm{face}}$]
Figure A.14: Example indices of sliding mesh sides of a process in the static subdomain. The movement of the
neighboring moving domain is from left to right.
The ranks of all processes on the moving domain adjacent to the interface are stored in the array $\tilde{r}$, which
is sent to every static process adjacent to the interface. The column index of $\tilde{r}$ is $\tilde{i}_\parallel$
and the row index is $i_\perp$. For each
static side, the static process knows the static global indices $i_\parallel$ and $i_\perp$. It calculates the column
indices $\tilde{i}_\parallel$ from $i_\parallel$ using Eq. (21) and reads the ranks shown in Sub-Figure a from the
corresponding entries of $\tilde{r}$, which are
$$\tilde{r} = \begin{pmatrix} \cdots & 7 & 7 & 4 & 4 & \cdots \\ \cdots & 7 & 4 & 4 & \ldots & \cdots \\ \ddots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{A.1}$$
Note that compared to the figure, the orientation of $i_\perp$ is flipped.
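The indices of all mortars handled by one process are stored in $A$. The columns of $A$ contain the different
mortars, the rows contain the different types of indices. For the considered example, the array $A$ after sorting
looks as follows:
$$A = \begin{pmatrix}
4 & 4 & 4 & 4 & 4 & 4 & 7 & 7 & 7 & 7 \\
3 & 4 & 4 & 4 & 5 & 5 & 3 & 3 & 3 & 4 \\
2 & 1 & 2 & 2 & 1 & 1 & 1 & 1 & 2 & 1 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
8 & 5 & 6 & 6 & 7 & 7 & 9 & 9 & 8 & 5
\end{pmatrix}
\begin{matrix} \tilde{r} \\ i_\parallel \\ i_\perp \\ i_{\mathrm{sub}} \\ i_{\mathrm{face}} \end{matrix} \tag{A.2}$$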
This is the result of sorting the mortars (that is, the columns) hierarchically by the entries of the upper four rows: The
upper row r̃ determines the highest (outermost) sorting criterion and the fourth row isub the lowest. The entries of the
last row iface are not a sorting criterion and are transported passively.
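The hierarchical sort can be illustrated with a minimal Python sketch. It is not taken from FLEXI; the tuple layout, the function name `sort_mortars` and the arbitrary input order are assumptions made for illustration, while the index values are those of the example in Figure A.14.

```python
# Each mortar is a tuple (r_tilde, i_par, i_perp, i_sub, i_face).
# Values follow the example of Figure A.14; the input order is arbitrary.
mortars = [
    (7, 4, 1, 0, 5), (4, 4, 1, 1, 5),  # side with i_face = 5
    (4, 4, 2, 0, 6), (4, 4, 2, 1, 6),  # side with i_face = 6
    (4, 5, 1, 0, 7), (4, 5, 1, 1, 7),  # side with i_face = 7
    (7, 3, 2, 0, 8), (4, 3, 2, 1, 8),  # side with i_face = 8
    (7, 3, 1, 0, 9), (7, 3, 1, 1, 9),  # side with i_face = 9
]


def sort_mortars(mortars):
    """Sort lexicographically by the first four entries.

    The fifth entry (i_face) is not part of the key and is carried along.
    """
    return sorted(mortars, key=lambda col: col[:4])


sorted_mortars = sort_mortars(mortars)

# Transpose so that rows hold index types and columns hold mortars, as in A.
A = [list(row) for row in zip(*sorted_mortars)]
for name, row in zip(("r_tilde", "i_par", "i_perp", "i_sub", "i_face"), A):
    print(f"{name:8s}", row)
```

Because the key uses only globally defined quantities, every process that sorts the same set of mortars arrives at the same order, independent of its local side numbering.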
The columns of A are now indexed from left to right, that is, from 1 to 10. This index is called imortar. It is depicted
in Figure A.15. For example, the upper left mortar belongs to the ninth column of A. As can be verified in the different
sub-figures of Figure A.14, the column entries match the indices of the upper left mortar.
Figure A.15: Mortar index imortar as a result of sorting the columns of A.
While the upper four rows of A are needed to create a globally unique sorting, the lower two rows are needed
locally for the mapping from local sides to global sorting. To this end, the mapping array m is filled. It represents the
inverse mapping of the last two rows of A. Its rows correspond to isub (ranging from 0 to 1) and its columns to iface.
Note that the column index does not necessarily start at 1, but in our example ranges from 5 to 9. The entries of m are
the mortar indices imortar, such that in the considered example,
\[
m =
\begin{pmatrix}
10 & 3 & 5 & 9 & 7 \\
2 & 4 & 6 & 1 & 8
\end{pmatrix}.
\tag{A.3}
\]
As a verifying example, the upper left entry of m, with row index 0 and column index 5, has the value imortar = 10,
while in the tenth column of A, isub = 0 and iface = 5.
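The construction of m from the last two rows of A can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the FLEXI implementation: the function name `build_mapping` and the explicit `face_offset` (needed because Python lists start at 0, whereas the column index of m starts at the smallest iface) are hypothetical.

```python
# Last two rows of the sorted array A from Eq. (A.2).
i_sub = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
i_face = [8, 5, 6, 6, 7, 7, 9, 9, 8, 5]


def build_mapping(i_sub, i_face):
    """Return (m, face_offset) such that m[s][f - face_offset] = i_mortar.

    i_mortar is the 1-based column position in A.
    """
    face_offset = min(i_face)
    n_faces = max(i_face) - face_offset + 1
    n_sub = max(i_sub) + 1
    m = [[None] * n_faces for _ in range(n_sub)]
    for i_mortar, (s, f) in enumerate(zip(i_sub, i_face), start=1):
        m[s][f - face_offset] = i_mortar
    return m, face_offset


m, face_offset = build_mapping(i_sub, i_face)
print(m)                      # rows: i_sub = 0, 1; columns: i_face = 5..9
print(m[0][5 - face_offset])  # entry for i_sub = 0, i_face = 5
```

With this lookup, a process can go from its local pair (isub, iface) to the globally sorted position imortar in constant time.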
As described in Section 4.4, the array m is now used to store the solution and flux on the mortars in their own arrays
U^sm_primary, U^sm_replica, F^sm_primary and F^sm_replica, using only one index imortar to distinguish between the
mortars. This ordering has the desired properties described in Section 4.4: contiguous chunks of data are sent and
received during communication, and the order of the data is globally defined and thus identical for the sending and
the receiving process. To this end, a similar sorting procedure is carried out on the moving subdomain, too.
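To illustrate why the sorted order yields contiguous communication chunks, the following Python sketch derives the imortar range belonging to each neighboring rank from the first row of A. The function name `chunks_per_rank` is hypothetical, and the sketch only computes index ranges; the actual MPI calls are omitted.

```python
from itertools import groupby

# First row of the sorted array A from Eq. (A.2): neighbor rank of each mortar.
r_tilde = [4, 4, 4, 4, 4, 4, 7, 7, 7, 7]


def chunks_per_rank(ranks):
    """Return {rank: (first, last)} as 1-based, inclusive i_mortar ranges.

    Assumes `ranks` is already sorted, so each rank forms a single run.
    """
    chunks = {}
    pos = 1
    for rank, run in groupby(ranks):
        n = len(list(run))
        chunks[rank] = (pos, pos + n - 1)
        pos += n
    return chunks


print(chunks_per_rank(r_tilde))
```

In the example, mortars 1 to 6 are exchanged with rank 4 and mortars 7 to 10 with rank 7, so each neighbor corresponds to one contiguous slice of the mortar arrays.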
References
[1] Z. Wang, K. Fidkowski, R. Abgrall, F. Bassi, D. Caraeni, A. Cary, H. Deconinck, R. Hartmann, K. Hillewaert, H. Huynh, N. Kroll, G. May,
P.-O. Persson, B. van Leer, M. Visbal, High-order CFD methods: current status and perspective, International Journal for Numerical Methods
in Fluids 72 (8) (2013) 811–845.
[2] X.-D. Liu, S. Osher, T. Chan, Weighted essentially non-oscillatory schemes, Journal of Computational Physics 115 (1) (1994) 200–212.
[3] D. S. Balsara, C.-W. Shu, Monotonicity preserving weighted essentially non-oscillatory schemes with increasingly high order of accuracy,
Journal of Computational Physics 160 (2) (2000) 405 – 452.
[4] P. F. Fischer, J. W. Lottes, S. G. Kerkemeier, nek5000 Web page, http://nek5000.mcs.anl.gov (2008).
[5] C. Cantwell, D. Moxey, A. Comerford, A. Bolis, G. Rocco, G. Mengaldo, D. De Grazia, S. Yakovlev, J.-E. Lombard, D. Ekelschot, B. Jordi,
H. Xu, Y. Mohamied, C. Eskilsson, B. Nelson, P. Vos, C. Biotto, R. Kirby, S. Sherwin, Nektar++: An open-source spectral/hp element
framework, Computer Physics Communications 192 (2015) 205–219.
[6] H. T. Huynh, A flux reconstruction approach to high-order schemes including discontinuous Galerkin methods, in: 18th AIAA Computational
Fluid Dynamics Conference, 2007, p. 4079.
[7] Y. Liu, M. Vinokur, Z. Wang, Spectral difference method for unstructured grids I: Basic formulation, Journal of Computational Physics
216 (2) (2006) 780 – 801.
[8] Z. J. Wang, Y. Liu, G. May, A. Jameson, Spectral difference method for unstructured grids II: Extension to the Euler equations, Journal of
Scientific Computing 32 (1) (2007) 45–71.
[9] G. May, On the connection between the spectral difference method and the discontinuous Galerkin method, Communications in
Computational Physics 9 (4) (2011) 1071–1080.
[10] D. De Grazia, G. Mengaldo, D. Moxey, P. E. Vincent, S. J. Sherwin, Connections between the discontinuous Galerkin method and high-order
flux reconstruction schemes, International Journal for Numerical Methods in Fluids 75 (12) (2014) 860–877.
[11] J. S. Hesthaven, T. Warburton, Nodal discontinuous Galerkin methods: algorithms, analysis, and applications, Springer Science & Business
Media, 2007.
[12] F. Hindenlang, G. J. Gassner, C. Altmann, A. Beck, M. Staudenmaier, C.-D. Munz, Explicit discontinuous Galerkin methods for unsteady
problems, Computers & Fluids 61 (2012) 86 – 93, high Fidelity Flow Simulations Onera Scientific Day.
[13] M. Atak, A. Beck, T. Bolemann, D. Flad, H. Frank, C.-D. Munz, High fidelity scale-resolving computational fluid dynamics using the high
order discontinuous Galerkin spectral element method, in: W. E. Nagel, D. H. Kröner, M. M. Resch (Eds.), High Performance Computing in
Science and Engineering ’15, Springer International Publishing, Cham, 2016, pp. 511–530.
[14] N. Krais, A. Beck, T. Bolemann, H. Frank, D. Flad, G. Gassner, F. Hindenlang, M. Hoffmann, T. Kuhn, M. Sonntag, C.-D. Munz, FLEXI:
A high order discontinuous Galerkin framework for hyperbolic-parabolic conservation laws, Computers & Mathematics with Applications
(2020).
[15] A. D. Beck, T. Bolemann, D. Flad, H. Frank, G. J. Gassner, F. Hindenlang, C.-D. Munz, High-order discontinuous Galerkin spectral element
methods for transitional and turbulent flow simulations, International Journal for Numerical Methods in Fluids 76 (8) (2014) 522–548.
[16] T. Bolemann, A. Beck, D. Flad, H. Frank, V. Mayer, C.-D. Munz, High-order discontinuous Galerkin schemes for large-eddy simulations
of moderate Reynolds number flows, in: IDIHOM: Industrialization of High-Order Methods-A Top-Down Approach, Springer, 2015, pp.
435–456.
[17] C. A. A. Minoli, D. A. Kopriva, Discontinuous Galerkin spectral element approximations on moving meshes, Journal of Computational
Physics 230 (5) (2011) 1876 – 1902.
[18] X. Liu, N. R. Morgan, D. E. Burton, A Lagrangian discontinuous Galerkin hydrodynamic method, Computers & Fluids 163 (2018) 68–85.
[19] V. A. Dobrev, T. V. Kolev, R. N. Rieben, High-Order Curvilinear Finite Element Methods for Lagrangian Hydrodynamics, SIAM Journal on
Scientific Computing (Sep 2012).
[20] R. W. Anderson, V. A. Dobrev, T. V. Kolev, R. N. Rieben, V. Z. Tomov, High-Order Multi-Material ALE Hydrodynamics, SIAM Journal on
Scientific Computing (Jan 2018).
[21] L. Wang, P.-O. Persson, A high-order discontinuous Galerkin method with unstructured space–time meshes for two-dimensional compressible
flows on domains with large deformations, Computers & Fluids 118 (2015) 53–68.
[22] E. Gaburro, W. Boscheri, S. Chiocchetti, C. Klingenberg, V. Springel, M. Dumbser, High order direct Arbitrary-Lagrangian-Eulerian schemes
on moving Voronoi meshes with topology changes, Journal of Computational Physics 407 (2020) 109167.
[23] E. J. Caramana, The implementation of slide lines as a combined force and velocity boundary condition, Journal of Computational Physics
228 (11) (2009) 3911–3916.
[24] E. Gaburro, A unified framework for the solution of hyperbolic PDE systems using high order direct Arbitrary-Lagrangian-Eulerian schemes
on moving unstructured meshes with topology change, Archives of Computational Methods in Engineering (2020) 1–73.
[25] G. Wang, F. Duchaine, D. Papadogiannis, I. Duran, S. Moreau, L. Y. Gicquel, An overset grid method for large eddy simulation of turboma-
chinery stages, Journal of Computational Physics 274 (2014) 333–355.
[26] J. Ahmad, E. P. Duque, Helicopter rotor blade computation in unsteady flows using moving overset grids, Journal of Aircraft 33 (1) (1996)
54–60.
[27] H. Pomin, S. Wagner, Navier-Stokes analysis of helicopter rotor aerodynamics in hover and forward flight, Journal of Aircraft 39 (5) (2002)
813–821.
[28] V. Sankaran, A. Wissink, A. Datta, J. Sitaraman, M. Potsdam, B. Jayaraman, A. Katz, S. Kamkar, B. Roget, D. Mavriplis, H. Saberi, W.-B.
Chen, W. Johnson, R. Strawn, Overview of the Helios Version 2.0 Computational Platform for Rotorcraft Simulations.
[29] F. Zahle, N. N. Sørensen, J. Johansen, Wind turbine rotor-tower interaction using an incompressible overset grid method, Wind Energy: An
International Journal for Progress and Applications in Wind Power Conversion Technology 12 (6) (2009) 594–619.
[30] E. van der Weide, G. Kalitzin, J. Schluter, J. Alonso, Unsteady turbomachinery computations using massively parallel platforms, in: 44th
AIAA Aerospace Sciences Meeting and Exhibit, 2006, p. 421.
[31] A. Bakker, R. D. LaRoche, M.-H. Wang, R. V. Calabrese, Sliding mesh simulation of laminar flow in stirred reactors, Chemical Engineering
Research and Design 75 (1) (1997) 42–44.
[32] Z. Jaworski, M. Wyszynski, I. Moore, A. Nienow, Sliding mesh computational fluid dynamics-a predictive tool in stirred tank design, Pro-
ceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering 211 (3) (1997) 149–156.
[33] K. Ng, N. Fentiman, K. Lee, M. Yianneskis, Assessment of sliding mesh CFD predictions and LDA measurements of the flow in a tank stirred
by a rushton impeller, Chemical Engineering Research and Design 76 (6) (1998) 737–747.
[34] J. McNaughton, I. Afgan, D. Apsley, S. Rolfo, T. Stallard, P. Stansby, A simple sliding-mesh interface procedure and its application to the
CFD simulation of a tidal-stream turbine, International Journal for Numerical Methods in Fluids 74 (4) (2014) 250–269.
[35] B. Zhang, C. Liang, A simple, efficient, and high-order accurate curved sliding-mesh interface approach to spectral difference method on
coupled rotating and stationary domains, Journal of Computational Physics 295 (2015) 147–160.
[36] B. Zhang, C. Liang, J. Yang, Y. Rong, A 2d parallel high-order sliding and deforming spectral difference method, Computers & Fluids 139
(2016) 184 – 196, 13th USNCCM International Symposium of High-Order Methods for Computational Fluid Dynamics - A special issue
dedicated to the 60th birthday of Professor David Kopriva.
[37] Z. Qiu, B. Zhang, C. Liang, M. Xu, A high-order solver for simulating vortex-induced vibrations using the sliding-mesh spectral difference
method and hybrid grids, International Journal for Numerical Methods in Fluids 90 (4) (2019) 171–194.
[38] E. Ferrer, R. H. Willden, A high order discontinuous Galerkin-Fourier incompressible 3d Navier-Stokes solver with rotating sliding meshes,
Journal of Computational Physics 231 (21) (2012) 7037 – 7056.
[39] L. Ramírez, C. Foulquié, X. Nogueira, S. Khelladi, J.-C. Chassaing, I. Colominas, New high-resolution-preserving sliding mesh techniques for
higher-order finite volume schemes, Computers & Fluids 118 (2015) 114 – 130.
[40] M. Wurst, M. Keßler, E. Krämer, A high-order discontinuous Galerkin Chimera method for laminar and turbulent flows, Computers & Fluids
121 (2015) 102 – 113.
[41] M. J. Brazell, J. Sitaraman, D. J. Mavriplis, An overset mesh approach for 3d mixed element high-order discretizations, Journal of Computa-
tional Physics 322 (2016) 33 – 51.
[42] H. Gallus, ERCOFTAC test case 6: axial flow turbine stage, in: Seminar and Workshop on 3D Turbomachinery flow prediction III, Les Arcs,
France, 1995.
[43] R. Walraevens, H. Gallus, Testcase 6–1-1/2 stage axial flow turbine, ERCOFTAC Testcase 6 (1997) 201–212.
[44] T. Volmar, B. Brouillet, H. Benetschik, H. Gallus, Test case 6: 1-1/2 stage axial flow turbine-unsteady computation, in: ERCOFTAC Turbo-
machinery Seminar and Workshop, 1998.
[45] C. Hirt, A. Amsden, J. Cook, An arbitrary Lagrangian-Eulerian computing method for all flow speeds, Journal of Computational Physics
14 (3) (1974) 227–253.
[46] D. A. Kopriva, Implementing spectral methods for partial differential equations: Algorithms for scientists and engineers (2009).
[47] F. Bassi, S. Rebay, A high-order accurate discontinuous finite element method for the numerical solution of the compressible Navier-Stokes
equations, Journal of Computational Physics 131 (2) (1997) 267–279.
[48] S. Pirozzoli, Numerical methods for high-speed flows, Annual Review of Fluid Mechanics 43 (2011) 163–194.
[49] P. L. Roe, Approximate Riemann solvers, parameter vectors, and difference schemes, Journal of Computational Physics 43 (2) (1981) 357–
372.
[50] A. Harten, J. M. Hyman, Self adjusting grid methods for one-dimensional hyperbolic conservation laws, Journal of Computational Physics
50 (2) (1983) 235 – 269.
[51] M. H. Carpenter, C. A. Kennedy, Fourth-order 2n-storage Runge-Kutta schemes, Tech. Rep. NASA-TM-109112, NASA Langley Research
Center (June 1994).
[52] C. Mavriplis, Nonconforming discretizations and a posteriori error estimators for adaptive spectral element techniques, Ph.D. thesis, Mas-
sachusetts Institute of Technology (1989).
[53] D. A. Kopriva, A conservative staggered-grid Chebyshev multidomain method for compressible flows. II. A semi-structured method, Journal
of Computational Physics 128 (2) (1996) 475–488.
[54] D. A. Kopriva, A staggered-grid multidomain spectral method for the compressible Navier-Stokes equations, Journal of Computational
Physics 143 (1) (1998) 125–158.
[55] E. Laughton, G. Tabor, D. Moxey, A comparison of interpolation techniques for non-conformal high-order discontinuous Galerkin methods
(2020). arXiv:2007.15534.
[56] N. Krais, G. Schnücke, T. Bolemann, G. J. Gassner, Split form ALE discontinuous Galerkin methods with applications to under-resolved
turbulent low-Mach number flows, Journal of Computational Physics 421 (2020) 109726. doi:10.1016/j.jcp.2020.109726.
URL http://www.sciencedirect.com/science/article/pii/S0021999120305003
[57] F. Hindenlang, T. Bolemann, C.-D. Munz, Mesh curving techniques for high order discontinuous Galerkin simulations, in: IDIHOM: Indus-
trialization of high-order methods-a top-down approach, Springer, 2015, pp. 133–152.
[58] G. J. Gassner, F. Lörcher, C.-D. Munz, J. S. Hesthaven, Polymorphic nodal elements and their application in discontinuous Galerkin methods,
Journal of Computational Physics 228 (5) (2009) 1573–1590.
[59] C. Altmann, A. D. Beck, F. Hindenlang, M. Staudenmaier, G. J. Gassner, C.-D. Munz, An efficient high performance parallelization of a
discontinuous Galerkin spectral element method, in: Facing the Multicore-Challenge III, Springer, 2013, pp. 37–47.
[60] R. E. Walraevens, H. E. Gallus, A. R. Jung, J. F. Mayer, H. Stetter, Experimental and computational study of the unsteady flow in a 1.5 stage
axial turbine with emphasis on the secondary flow in the second stator, in: ASME 1998 International Gas Turbine and Aeroengine Congress
and Exhibition, American Society of Mechanical Engineers, 1998, pp. V001T01A069–V001T01A069.
[61] T. W. Volmar, B. Brouillet, H. E. Gallus, H. Benetschik, Time-accurate three-dimensional Navier-Stokes analysis of one-and-one-half stage
axial-flow turbine, Journal of Propulsion and Power 16 (2) (2000) 327–335.
[62] J. Yao, R. L. Davis, J. J. Alonso, A. Jameson, Massively parallel simulation of the unsteady flow in an axial turbine stage, Journal of Propulsion
and Power 18 (2) (2002) 465–471.
[63] Unsteady Simulation of a 1.5 Stage Turbine Using an Implicitly Coupled Nonlinear Harmonic Balance Method, Vol. Volume 8: Turboma-
chinery, Parts A, B, and C of Turbo Expo: Power for Land, Sea, and Air.
[64] C. Utz, Experimentelle Untersuchung der Strömungsverluste in einer mehrstufigen Axialturbine, Ph.D. thesis, ETH Zürich, Switzerland
(1972). doi:10.3929/ETHZ-A-000088584.
[65] N. Gourdain, Prediction of the unsteady turbulent flow in an axial compressor stage. part 1: Comparison of unsteady RANS and LES with
experiments, Computers & Fluids 106 (2015) 119 – 129.
[66] W. Rodi, DNS and LES of some engineering flows, Fluid Dynamics Research 38 (2-3) (2006) 145–173.
[67] J.-R. Carlson, Inflow/outflow boundary conditions with application to fun3d, Tech. Rep. NASA-TM2011-217181, NASA Langley Research
Center (October 2011).
[68] D. Flad, A. D. Beck, G. Gassner, C.-D. Munz, A discontinuous Galerkin spectral element method for the direct numerical simulation of
aeroacoustics, in: 20th AIAA/CEAS Aeroacoustics Conference, 2014, p. 2740.
[69] J. Niegemann, R. Diehl, K. Busch, Efficient low-storage Runge-Kutta schemes with optimized stability regions, Journal of Computational
Physics 231 (2) (2012) 364 – 372.
[70] X. Wu, P. A. Durbin, Evidence of longitudinal vortices evolved from distorted wakes in a turbine passage, Journal of Fluid Mechanics 446
(2001) 199–228.
[71] J. D. Coull, H. P. Hodson, Unsteady boundary-layer transition in low-pressure turbines, Journal of Fluid Mechanics 681 (2011) 370–410.
[72] J. Jeong, F. Hussain, On the identification of a vortex, Journal of Fluid Mechanics 285 (1995) 69–94.
[73] P. Kopper, M. Kurz, C. Wenzel, J. Dürrwächter, C. Koch, A. Beck, Boundary layer dynamics in wall-resolved LES across multiple turbine
stages, manuscript submitted for publication (2020).
[74] A. Beck, P. Ortwein, P. Kopper, N. Krais, D. Kempf, C. Koch, Towards high-fidelity erosion prediction: On time-accurate particle tracking in
turbomachinery, International Journal of Heat and Fluid Flow 79 (2019) 108457.