Figure 2: On the left, a structured mesh with a fixed number of neighbors for both cells and vertices. On the right, an unstructured mesh with the same number of nodes. In the unstructured mesh the number of cell neighbors is the same for all interior elements (m = 3), but the number of neighbors of interior vertices varies from 5 to 7.
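To make the caption's contrast concrete, here is a minimal C++ sketch (illustrative only, not code from the thesis): on a structured grid the neighbors of a cell follow from index arithmetic, whereas an unstructured mesh needs an explicit, variable-length adjacency, stored here in a hypothetical CSR-style (offsets plus indices) layout.

```cpp
#include <cstddef>
#include <vector>

// Structured mesh: the neighbor of cell (i, j) on an Nx x Ny grid follows
// from index arithmetic alone; no connectivity table needs to be stored.
// dir: 0 = west, 1 = east, 2 = south, 3 = north; returns -1 at the boundary.
inline int structuredNeighbor(int i, int j, int Nx, int Ny, int dir) {
    switch (dir) {
        case 0: return i > 0      ? j * Nx + (i - 1) : -1;
        case 1: return i < Nx - 1 ? j * Nx + (i + 1) : -1;
        case 2: return j > 0      ? (j - 1) * Nx + i : -1;
        case 3: return j < Ny - 1 ? (j + 1) * Nx + i : -1;
        default: return -1;
    }
}

// Unstructured mesh: vertex-to-vertex connectivity stored in compressed
// (CSR-like) form, because the number of neighbors per interior vertex
// varies (5 to 7 in the figure) and cannot be computed from indices.
struct VertexAdjacency {
    std::vector<std::size_t> offsets;   // size = number of vertices + 1
    std::vector<int>         neighbors; // concatenated neighbor lists

    int degree(std::size_t v) const {   // number of neighbors of vertex v
        return static_cast<int>(offsets[v + 1] - offsets[v]);
    }
};
```

On a GPU this difference matters: the structured case yields regular, predictable memory accesses, while the CSR indirection leads to irregular, poorly coalesced accesses, which is one reason unstructured solvers are harder to accelerate.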

Source publication
Thesis
Full-text available
Many algorithms are nowadays designed with parallel execution in mind, but running an algorithm efficiently on a massively parallel device such as a GPU remains challenging. For CFD applications, speedups ranging from one to two orders of magnitude, or even beyond, are reported in the literature [Niemeyer and Sung, 2014b]. Different speed...

Citations

... Applied as plug-ins without modifying the original CFD code, these implementations were intended to improve memory bandwidth, one of the main restrictions on applying OpenFOAM in HPC [17]. However, reported issues with memory copies and inconsistent speedups have raised concerns about the appropriateness of hardware investments [18]. The GPU-enabled libraries have since been updated through RapidCFD [19], an open-source fork of OpenFOAM 2.3.1 able to run almost the entire simulation on NVIDIA GPUs [20]. ...
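The memory-copy concern in [18] can be illustrated with a toy cost model (an assumption-laden sketch, not data from the cited works): if every accelerated solve has to move its matrix and vectors over PCIe, the transfer time alone caps the effective speedup, however fast the GPU kernel is.

```cpp
#include <cstdio>

// Toy model of an offloaded linear solve: data is copied to the device, the
// solve runs 'kernelSpeedup' times faster than on the CPU, and the result is
// copied back. The transfer time bounds the effective end-to-end speedup.
double effectiveSpeedup(double cpuSolveSec, double kernelSpeedup,
                        double bytesMoved, double linkBytesPerSec) {
    const double transferSec = bytesMoved / linkBytesPerSec;
    const double gpuTotalSec = cpuSolveSec / kernelSpeedup + transferSec;
    return cpuSolveSec / gpuTotalSec;
}

int main() {
    // Hypothetical numbers: a 0.5 s CPU solve, a 20x faster GPU kernel,
    // 2 GB of matrix/vector data over a ~12 GB/s PCIe 3.0 link.
    std::printf("effective speedup: %.2fx\n",
                effectiveSpeedup(0.5, 20.0, 2.0e9, 12.0e9));
    return 0;
}
```

Under these made-up numbers the 20x kernel yields only about 2.6x end to end, which motivates running almost the entire simulation on the GPU, as RapidCFD does.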
Article
Full-text available
Driven by the emergence of Graphics Processing Units (GPUs), the solution of increasingly large and intricate numerical problems has become feasible. Yet, the integration of GPUs into Computational Fluid Dynamics (CFD) codes still presents a significant challenge. This study undertakes an evaluation of the computational performance of GPUs for CFD applications. Two Compute Unified Device Architecture (CUDA)-based implementations within the Open Field Operation and Manipulation (OpenFOAM) environment were employed for the numerical solution of a 3D Kaplan turbine draft tube workbench. A series of tests were conducted to assess the fixed-size grid problem speedup in accordance with Amdahl’s Law. Additionally, tests were performed to identify the optimal configuration utilizing various linear solvers, preconditioners, and smoothers, along with an analysis of memory usage.
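The fixed-size speedup tests mentioned above are governed by Amdahl's law; the following sketch (with hypothetical numbers, not the paper's measurements) shows how the non-accelerated fraction of the runtime bounds the overall gain.

```cpp
#include <cstdio>
#include <initializer_list>

// Amdahl's law for a fixed-size problem: if a fraction p of the serial
// runtime is accelerated by a factor s (e.g. the GPU-offloaded linear
// solver), the overall speedup is
//   S(s) = 1 / ((1 - p) + p / s),   with   S -> 1 / (1 - p)  as  s -> inf.
double amdahlSpeedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main() {
    const double p = 0.9;  // hypothetical: 90% of runtime in the accelerated part
    for (double s : {2.0, 8.0, 32.0, 1.0e6}) {
        std::printf("acceleration %8.0fx -> overall speedup %5.2fx\n",
                    s, amdahlSpeedup(p, s));
    }
    return 0;
}
```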
Chapter
Full-text available
Recently, several research groups have demonstrated significant speedups of scientific computations using General-Purpose Graphics Processing Units (GPGPUs) as massively parallel “co-processors” to the Central Processing Unit (CPU). However, the tremendous computational power of GPGPUs comes at a high price, since their integration into Computational Fluid Dynamics (CFD) solvers is still a challenge. To achieve this integration, the RapidCFD library was developed from the Open Field Operation and Manipulation (OpenFOAM) CFD software so that multiple GPGPUs can run almost the entire simulation in parallel. The parallel performance, in terms of fixed-size speedup, efficiency, and parallel fraction according to Amdahl’s law, was compared on two massively parallel multi-GPGPU architectures using Nvidia Tesla C1060 and M2090 units. The simulations were executed on a 3D turbomachinery benchmark consisting of a structured grid domain of 1 million cells. The results obtained from the implementation of the new library on different software and hardware layouts show that, by transferring all the computations executed by the linear system solvers directly to the GPGPU, it is possible to make a typical CFD simulation up to 9 times faster. Additionally, a grid convergence analysis and pressure recovery measurements were performed on scaled computational domains. Thus, an affordable computational cost is expected when the domain is scaled to achieve a high flow resolution.
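The parallel fraction referred to in this chapter can be estimated by inverting Amdahl's law from a measured fixed-size speedup; a small sketch with made-up numbers follows (the 9x figure comes from the abstract, the 30x solver acceleration is an assumption).

```cpp
#include <cstdio>

// Inverting Amdahl's law: given a measured fixed-size speedup S obtained by
// accelerating the parallel part by a factor n, the implied parallel
// fraction is  p = (1 - 1/S) / (1 - 1/n)  (a Karp-Flatt style estimate).
double impliedParallelFraction(double measuredSpeedup, double n) {
    return (1.0 - 1.0 / measuredSpeedup) / (1.0 - 1.0 / n);
}

int main() {
    // 9x overall speedup (from the abstract) under an assumed 30x
    // acceleration of the offloaded linear solvers.
    const double p = impliedParallelFraction(9.0, 30.0);
    std::printf("implied parallel fraction: %.3f\n", p);                    // ~0.92
    std::printf("Amdahl upper bound on speedup: %.1fx\n", 1.0 / (1.0 - p)); // ~12x
    return 0;
}
```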
Article
This paper describes the main features of a pioneering unsteady solver for simulating ideal two-fluid plasmas on unstructured grids, taking advantage of GPGPU (general-purpose computing on graphics processing units). The code, which has been implemented within the open-source COOLFluiD platform, is implicit and second-order in time and space, relying upon a Finite Volume method for the spatial discretization and a three-point backward Euler scheme for the time integration. In particular, the convective fluxes are computed by a multi-fluid version of the AUSM+up scheme for the plasma equations, in combination with a modified Rusanov scheme with tunable dissipation for the Maxwell equations. Source terms are integrated with a one-point rule, using the cell-centered value. Some critical aspects of the porting to GPUs are discussed, as well as the performance of two open-source linear system solvers (PETSc and PARALUTION). The code design allows both flux and source terms, along with their Jacobians, to be computed on the GPU, giving a noticeable decrease in computational time compared with the original CPU-based solver. The code has been tested on a wide range of mesh sizes and on three different systems, each with a different GPU. The increased performance (up to 14x) is demonstrated on two representative 2D benchmarks: the propagation of circularly polarized waves and the more challenging Geospace Environmental Modeling (GEM) magnetic reconnection challenge.
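As an illustration of one ingredient named in this abstract, here is a generic sketch of a Rusanov (local Lax-Friedrichs) flux with a tunable dissipation coefficient; it shows the idea only and is not the COOLFluiD implementation (for the Maxwell sub-system the maximum wave speed would be set by the speed of light and the divergence-cleaning factors).

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Rusanov numerical flux with tunable dissipation 'alpha':
//   F_num = 0.5 * (F(uL) + F(uR)) - 0.5 * alpha * lambdaMax * (uR - uL)
// alpha = 1 recovers the standard Rusanov flux; smaller values reduce the
// added numerical dissipation.
template <std::size_t N>
std::array<double, N> rusanovFlux(const std::array<double, N>& uL,   // left state
                                  const std::array<double, N>& uR,   // right state
                                  const std::array<double, N>& fL,   // physical flux F(uL)
                                  const std::array<double, N>& fR,   // physical flux F(uR)
                                  double lambdaL, double lambdaR,    // max wave speeds
                                  double alpha) {                    // dissipation knob
    const double lambdaMax = std::max(std::abs(lambdaL), std::abs(lambdaR));
    std::array<double, N> flux{};
    for (std::size_t i = 0; i < N; ++i) {
        flux[i] = 0.5 * (fL[i] + fR[i]) - 0.5 * alpha * lambdaMax * (uR[i] - uL[i]);
    }
    return flux;
}
```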