Parallel LBM Methods for Pore Scale Resolved Complex Flows

We will present parallel lattice Boltzmann methods for pore-scale resolved flows, analysing their efficiency and accuracy. As an advanced application scenario we will show results for simulating a powder based 3D printing process where each grain is geometrically resolved.
Complex Flows — Ulrich Rüde
Lehrstuhl für Simulation
Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
Ulrich Rüde
LSS Erlangen and CERFACS Toulouse
ulrich.ruede@fau.de
Centre Européen de Recherche et de
Formation Avancée en Calcul Scientifique
www.cerfacs.fr
Tokyo, March 8, 2018
Parallel LBM Methods for
Pore Scale Resolved Complex Flows
Venue
Waseda University – Nishi Waseda Campus
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
Nearest Station : Tokyo Metro Fukutoshin Line, Nishi-waseda Station
SIAM Conference on Parallel Processing for Scientific Computing (PP18)
March 7-10, 2018, Waseda University, Tokyo, Japan
Outline
Motivation
Direct Simulation of Complex Flows
1. Solid phase - rigid body dynamics
2. Fluid phase - Lattice Boltzmann method
3. Gas phase - free surface tracking, volume of fluids
Multi-Physics Simulations
Additive Manufacturing
Perspectives
The LBM stream step
Move PDFs
into neighboring cells
Non-local part,
Linear propagation to neighbors
(stream step)
Local part,
Non-linear operator,
(collide step)
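As a minimal sketch of the stream step (not the waLBerla implementation), the following assumes a two-dimensional D2Q9 lattice, periodic boundaries, and a PDF array of shape (9, nx, ny). The names C and stream are illustrative.

```python
import numpy as np

# D2Q9 lattice velocities: rest, four axis-aligned, four diagonal directions.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])


def stream(f):
    """Move each PDF one cell along its lattice velocity (periodic domain)."""
    out = np.empty_like(f)
    for i, (cx, cy) in enumerate(C):
        # np.roll shifts the whole population i by (cx, cy) cells.
        out[i] = np.roll(f[i], shift=(int(cx), int(cy)), axis=(0, 1))
    return out


# Usage: a single PDF travelling in +x moves from cell (0, 0) to cell (1, 0).
f = np.zeros((9, 4, 4))
f[1, 0, 0] = 1.0
f_new = stream(f)
```

Streaming only copies data between neighboring cells, so it is linear and typically limited by memory bandwidth.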
The LBM collide step
Compute new PDFs modeling molecular collisions
Most collision operators can be expressed as
Equilibrium function: non-linear,
depending on the conserved moments (density and the momentum components).
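One common special case of the collide step is single-relaxation-time (BGK) collision, in which each PDF relaxes toward the equilibrium at rate 1/tau. The sketch below assumes BGK on a D2Q9 lattice with the usual second-order equilibrium. It is an illustrative choice and not necessarily the operator shown on the slide (the scaling results in this talk use TRT). The names equilibrium, collide_bgk, and tau are assumptions.

```python
import numpy as np

# D2Q9 lattice velocities and weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order equilibrium PDFs, shape (9, nx, ny)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1.0 + 3.0 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def collide_bgk(f, tau):
    """Relax PDFs toward equilibrium; purely local to each cell."""
    rho = f.sum(axis=0)                               # density
    ux = np.tensordot(C[:, 0], f, axes=1) / rho       # velocity, x component
    uy = np.tensordot(C[:, 1], f, axes=1) / rho       # velocity, y component
    return f - (f - equilibrium(rho, ux, uy)) / tau
```

Because the equilibrium is built from the same density and momentum as the incoming PDFs, the collision leaves both unchanged in every cell.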
The Lattice Boltzmann Algorithm
Where have all my cycles gone?
evaluating single node performance
SuperMUC
JUQUEEN
vectorized
optimized
standard
Pohl, T., Deserno, F., Thürey, N., UR, Lammers, P., Wellein, G., & Zeiser, T. (2004). Performance evaluation of parallel large-
scale lattice Boltzmann applications on three supercomputing architectures. Proceedings of the 2004 ACM/IEEE conference
on Supercomputing (p. 21). IEEE Computer Society.
Donath, S., Iglberger, K., Wellein, G., Zeiser, T., Nitsure, A., & UR (2008). Performance comparison of different parallel lattice
Boltzmann implementations on multi-core multi-socket systems. International Journal of Computational Science and
Engineering, 4(1), 3-11.
Büro für Gestaltung Wangler & Abele 04. April 2011
Weak scaling for TRT
lid driven cavity - uniform grids
JUQUEEN
16 processes per node
4 threads per process
SuperMUC
4 processes per node
4 threads per process
0.837 × 10^12 cell updates per second (TLups)
2.1 × 10^12 cell updates per second (TLups)
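The metric on this slide, lattice cell updates per second, can be computed as in the following sketch. The function names and all numbers in the usage example are hypothetical and are not measurements from the talk.

```python
def lups(n_cells, time_steps, wall_seconds):
    """Lattice cell updates per second for a run on n_cells cells."""
    return n_cells * time_steps / wall_seconds


def weak_scaling_efficiency(lups_total, n_procs, lups_single):
    """Ratio of the measured rate to ideal scaling of the one-process rate."""
    return lups_total / (n_procs * lups_single)


# Hypothetical run: 100^3 cells, 1000 time steps, 50 s wall-clock time.
rate = lups(100 ** 3, 1000, 50.0)
print(f"{rate / 1e6:.1f} MLups")
```

In a weak-scaling test the number of cells per process stays fixed, so an efficiency close to 1 means the total update rate grows in proportion to the process count.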
History of Data Locality Techniques for
Node Level Performance Optimization
Stals, L., & Rüde, U. (1997). Techniques for improving the data locality of iterative methods. Australian National
University, Centre for Mathematics and its Applications, School of Mathematical Sciences.
Weiß, C., Karl, W., Kowarschik, M., & Rüde, U. (1999, November). Memory characteristics of iterative methods. In
Supercomputing, ACM/IEEE 1999 Conference (pp. 31-31). IEEE.
Douglas, C. C., Hu, J., Kowarschik, M., Rüde, U., & Weiß, C. (2000). Cache optimization for structured and
unstructured grid multigrid. Electronic Transactions on Numerical Analysis, 10, 21-40.
Iglberger, K. (2003). Cache optimizations for the lattice Boltzmann method in 3D. Lehrstuhl für Informatik, 10.
Wilke, J., Pohl, T., Kowarschik, M., & Rüde, U. (2003). Cache performance optimizations for parallel lattice Boltzmann
codes. In European Conference on Parallel Processing (pp. 441-450). Springer, Berlin, Heidelberg.
Pohl, T., Kowarschik, M., Wilke, J., Iglberger, K., & Rüde, U. (2003). Optimization and profiling of the cache
performance of parallel lattice Boltzmann codes. Parallel Processing Letters, 13(04), 549-560.
Pohl, T., Deserno, F., Thürey, N., Rüde, U., Lammers, P., Wellein, G., & Zeiser, T. (2004, November). Performance
evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures. In
Supercomputing, 2004. Proceedings of the ACM/IEEE SC2004 Conference (pp. 21-21). IEEE.
Zeiser, T., Wellein, G., Nitsure, A., Iglberger, K., Rüde, U., & Hager, G. (2008). Introducing a parallel cache oblivious
blocking approach for the lattice Boltzmann method. Progress in Computational Fluid Dynamics, an International
Journal, 8(1-4), 179-188.
Coupled Flow for ExaScale — Ulrich Rüde 9
Pore scale resolved
flow in porous media
Direct numerical simulation of flow
through sphere packings
Beetstra, R., Van der Hoef, M. A., & Kuipers, J. A. M. (2007). Drag force of intermediate Reynolds number
flow past mono-and bidisperse arrays of spheres. AIChE Journal, 53(2), 489-501.
Tenneti, S., Garg, R., & Subramaniam, S. (2011). Drag law for monodisperse gas–solid systems using
particle-resolved direct numerical simulation of flow past fixed assemblies of spheres. International journal
of multiphase flow, 37(9), 1072-1092.
Flow field and vorticity
2D slice visualized
Domain size:
Re = 300
Volume fraction:
Drag Correlations
Fluid-solid systems
Important in chemical engineering (fluidized beds,
hydrocyclone, thickener, flotation columns)
Relate the drag force per particle to the
local particle Reynolds number (relative velocity) and
solid volume fraction
Examples: Wen & Yu (1966), Ergun (1952)
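As an illustration of such a correlation, the sketch below evaluates the Ergun (1952) equation in its standard packed-bed form, which gives the pressure gradient as a viscous term plus an inertial term. The function names, the definition of the particle Reynolds number via the superficial velocity (conventions vary), and the example values are assumptions made for this sketch.

```python
def particle_reynolds(rho, u_superficial, d, mu):
    """Particle Reynolds number based on superficial velocity and diameter d."""
    return rho * u_superficial * d / mu


def ergun_pressure_gradient(rho, mu, u_superficial, d, solid_fraction):
    """Pressure drop per unit bed length (Pa/m) from the Ergun equation."""
    eps = 1.0 - solid_fraction                      # void fraction (porosity)
    viscous = 150.0 * mu * solid_fraction ** 2 * u_superficial / (eps ** 3 * d ** 2)
    inertial = 1.75 * rho * solid_fraction * u_superficial ** 2 / (eps ** 3 * d)
    return viscous + inertial


# Hypothetical water-like fluid through 1 mm spheres at 60 % solid volume fraction.
re_p = particle_reynolds(rho=1000.0, u_superficial=0.01, d=1e-3, mu=1e-3)
grad_p = ergun_pressure_gradient(1000.0, 1e-3, 0.01, 1e-3, 0.6)
```

The viscous term dominates at low Reynolds number and the inertial term at high Reynolds number, which is why correlations of this kind depend on both the Reynolds number and the solid volume fraction.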
Macroscopic drag correlation
Finally, the drag correlation reads
Average absolute percentage error: 9.7 %
Bogner, S., Mohanty, S., & UR (2015). Drag correlation for dilute and moderately dense fluid-particle systems
using the lattice Boltzmann method, International Journal of Multiphase Flow 68, 71-79.
[Figure: velocity profile; streamwise velocity and porosity plotted against the height of the channel]
Flow over porous structure
Fattahi E., Waluga C., Wohlmuth B., Rüde U. (2016) Large Scale Lattice Boltzmann Simulation for the Coupling of Free
and Porous Media Flow. In: Kozubek T., Blaheta R., Šístek J., Rozložník M., Čermák M. (eds) High Performance
Computing in Science and Engineering. HPCSE 2015. Lecture Notes in Computer Science, vol 9611. Springer, Cham.
(a) Pore geometry and streamlines
(b) Planar average of stream-wise velocity
Fig. 4. Pore-scale simulation of free flow over porous media.
in the OTW model, the jump coefficient and the effective viscosity µ_e are
unknown and in the Br model, the effective viscosity µ_e is unknown.
By using the DNS solution, we calculate the optimal value for the unknown
parameters. The domain that is used is a channel which is periodic in stream-wise
and span-wise directions (Fig. 5). A free fluid flows over a porous medium.
To make the comparison independent of the setup, all of the flow properties are
non-dimensionalized.
Fig. 5. Schematic of the simulation domain and averaged velocity profile in the open
and porous regions.
The value of the interface velocity U_int can be directly obtained from the
averaged velocity profile of the DNS. In order to obtain the velocity gradient on
Pore geometry and streamlines
Setup
Setup of random spherical structure for
porous media with the PE
Re 3000 direct numerical simulation
Volume rendering of velocity magnitude
Periodic in X and Y direction
I10 cluster, 7x32x19=4256 core hours
1,300,000 timesteps
8 times more time steps on the finest level
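The factor of 8 is consistent with three refinement levels if, as is common for LBM grid refinement, each level halves both the cell size and the time step. The slide does not state the refinement ratio or the number of levels, so both are assumptions in this sketch, as are the function names and the example step count.

```python
def fine_steps(coarse_steps, level, ratio=2):
    """Time steps on refinement `level` for `coarse_steps` steps on level 0."""
    return coarse_steps * ratio ** level


def levels_for_factor(factor, ratio=2):
    """Smallest number of refinement levels whose step factor reaches `factor`."""
    level = 0
    while ratio ** level < factor:
        level += 1
    return level


# With ratio 2, a factor of 8 corresponds to 3 levels below the coarsest grid.
n_levels = levels_for_factor(8)
steps_finest = fine_steps(1000, n_levels)  # hypothetical 1000 coarse steps
```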
Turbulent flow over a permeable region
Flow through structure of thin crystals (filter)
Gil, A., Galache, J. P. G., Godenschwager, C., & Rüde, U. (2017). Optimum
configuration for accurate simulations of chaotic porous media with Lattice
Boltzmann Methods considering boundary conditions, lattice spacing and domain
size. Computers & Mathematics with Applications, 73(12), 2515-2528.
Free Surface Flows
Volume-of-Fluids like approach
Flag field: Compute only in fluid
Special “free surface” conditions in interface cells
Reconstruction of curvature for surface tension
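The flag field idea can be sketched as follows, assuming each cell stores a fill level between 0 (gas) and 1 (liquid). This is a simplification for illustration: a production free surface LBM tracks the flags explicitly and updates them with cell conversion rules. The names GAS, INTERFACE, LIQUID, flags_from_fill, and interface_is_closed are hypothetical.

```python
import numpy as np

GAS, INTERFACE, LIQUID = 0, 1, 2


def flags_from_fill(fill):
    """Classify cells by fill level: empty, partially filled, or full."""
    flags = np.full(fill.shape, INTERFACE, dtype=np.int8)
    flags[fill <= 0.0] = GAS
    flags[fill >= 1.0] = LIQUID
    return flags


def interface_is_closed(flags):
    """True if no liquid cell touches a gas cell along any grid axis."""
    liquid = flags == LIQUID
    gas = flags == GAS
    for axis in range(flags.ndim):
        lo = [slice(None)] * flags.ndim
        hi = [slice(None)] * flags.ndim
        lo[axis] = slice(0, -1)      # every cell except the last along axis
        hi[axis] = slice(1, None)    # its neighbor in the positive direction
        lo, hi = tuple(lo), tuple(hi)
        if np.any(liquid[lo] & gas[hi]) or np.any(gas[lo] & liquid[hi]):
            return False
    return True


# Usage: a column of liquid, one interface cell, then gas.
flags = flags_from_fill(np.array([1.0, 1.0, 0.4, 0.0, 0.0]))
compute_mask = flags != GAS   # LBM work is done only in liquid and interface cells
```

A closed layer of interface cells between liquid and gas is what allows the free surface conditions to be applied in interface cells only.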
Simulation of Metal Foams
Example application:
Engineering: metal foam simulations
Based on LBM:
Free surfaces
Surface tension
Disjoining pressure to stabilize thin liquid
films
Parallelization with MPI and load
balancing
Collaboration with C. Körner (Dept. of
Material Sciences, Erlangen)
Other applications:
Food processing
Fuel cells
Additive Manufacturing
Fast Electron Beam
Melting
Bikas, H., Stavropoulos, P., & Chryssolouris, G. (2015). Additive manufacturing methods and modelling approaches: a critical
review. The International Journal of Advanced Manufacturing Technology, 1-17.
Klassen, A., Scharowsky, T., & Körner, C. (2014). Evaporation model for beam based additive manufacturing using free
surface lattice Boltzmann methods. Journal of Physics D: Applied Physics, 47(27), 275303.
Körner, C., Thies, M., Hofmann, T., Thürey, N., & UR (2005). Lattice Boltzmann model for free surface flow for modeling
foaming. Journal of Statistical Physics, 121(1-2), 179-196.
Donath, S., Mecke, K., Rabha, S., Buwa, V., & UR (2011). Verification of surface tension in the parallel free surface lattice
Boltzmann method in waLBerla. Computers & Fluids, 45(1), 177-186.
Thürey, N., & UR (2009). Stable free surface flows with the lattice Boltzmann method on adaptively coarsened grids.
Computing and Visualization in Science, 12(5), 247-263.
Motivating Example: Simulation of Electron
Beam Melting Process (Additive Manufacturing)
EU-Project Fast-EBM
ARCAM (Gothenburg)
TWI (Cambridge)
WTM (FAU)
ZISC (FAU)
Generation of
powder bed
Energy transfer by
electron beam
penetration depth
heat transfer
Flow dynamics
melting
solidification
melt flow
surface tension
wetting
capillary forces
contact angles
Ammer, R., Markl, M., Ljungblad, U., Körner, C., & UR (2014).
Simulating fast electron beam melting with a parallel thermal free
surface lattice Boltzmann method. Computers & Mathematics with
Applications, 67(2), 318-330.
Ammer, R., UR, Markl, M., Jüchter V., & Körner, C. (2014).
Validation experiments for LBM simulations of electron beam
melting. International Journal of Modern Physics C.
Simulation of Electron Beam Melting
Simulating powder bed generation
using the PE framework
High speed camera shows
melting step for manufacturing a
hollow cylinder
WaLBerla Simulation
Study of AM process strategies
Chapter 9: Simulation for Application: EBM
enable a speed up of the build times. However, developers and customers are interested in faster production times since they reduce the overall costs and increase the success of the EBM process in other industry branches. In the following, the process window in Figure 9.8, which shows scan velocities up to 6.4 m/s, is extended numerically up to 30 m/s. The scan velocities are increased while studying porosity and swelling limits in order to find the best possible parameter configuration of scan velocity and line energy.
[Plot: line energy in kJ/m (0.00 to 0.30) versus scan velocity in m/s (0 to 35); sample classes: swelling, porous, good; beam power curves: 1.2 kW, 2.4 kW, 4.8 kW]
Figure 9.10: Extended numerical process window of the EBM process with
100 µm line offset.
Figure 9.10 shows the numerically extended process window with scan velocities up to around 30 m/s. It also contains the previous simulation results, which were produced for the validation in Section 9.1.3.3 (see left-hand side of the vertical line). The blue downward-pointing triangles stand for porous samples, the green circles for samples with sufficient properties, and the red upward-pointing triangles for samples where swelling effects may occur. In addition, three strictly decreasing functions denote three different beam powers where the beam diameter is reliable: 1.2 kW stands for the existing electron beam gun, and 2.4 kW and 4.8 kW for future electron beam guns.
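The three strictly decreasing curves are consistent with the usual definition of line energy as beam power divided by scan velocity, E = P / v. The excerpt does not state this relation explicitly, so it is an assumption here, as are the function name and the sampled velocities.

```python
def line_energy(beam_power_kw, scan_velocity):
    """Line energy in kJ/m from beam power (kW) and scan velocity (m/s)."""
    return beam_power_kw / scan_velocity   # kW / (m/s) = kJ/m


# Tabulate the curve for each beam power named in the text.
velocities = (5.0, 10.0, 20.0, 30.0)
curves = {
    power: [line_energy(power, v) for v in velocities]
    for power in (1.2, 2.4, 4.8)
}
for power, energies in curves.items():
    print(f"{power} kW:", [round(e, 3) for e in energies])
```

Under this assumption, doubling the beam power doubles the line energy available at a given scan velocity, which is why higher beam powers extend the reachable part of the process window toward faster scans.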
In Figure 9.10 the last numerical simulation classified as "good" has a scan velocity of 29 m/s, and the range of possible good samples closes at around 30 m/s. An almost constant lower porosity border and an upper swelling border can be identified. Scan velocities higher than 30 m/s result in samples which are porous
Markl, M., Ammer, R., Rüde, U., & Körner, C. (2015). Numerical investigations on hatching process
strategies for powder-bed-based additive manufacturing using an electron beam. The International
Journal of Advanced Manufacturing Technology, 78(1-4), 239-247.
Conclusions
Thank you for your attention!
Bogner, S., & UR. (2013). Simulation of floating bodies with the lattice Boltzmann method. Computers & Mathematics with
Applications, 65(6), 901-913.
Anderl, D., Bogner, S., Rauh, C., UR, & Delgado, A. (2014). Free surface lattice Boltzmann with enhanced bubble model.
Computers & Mathematics with Applications, 67(2), 331-339.
Bogner, S., Harting, J., & UR (2017). Simulation of liquid-gas-solid flow with a free surface lattice Boltzmann method. Submitted.