This paper focuses on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on large heterogeneous supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order numerical scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method due to Engsig-Karup et al. (2011) that is designed for high-throughput processing and massive parallelism. The parallel implementation is designed using software abstractions that enable code reuse and hide many hardware details. In this work, the FNPF model uses a multi-block discretization, which allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for curvilinear representation of horizontal block boundaries, which allows accurate representation of geometric features such as surface-piercing bottom-mounted structures, e.g., mono-pile foundations, as demonstrated. In the numerical benchmarks presented, we demonstrate the use of 8,192 modern Nvidia GPUs, enabling nonlinear marine hydrodynamics applications at unprecedented scale and resolution.
We implement and evaluate a massively parallel and scalable algorithm based on a multigrid preconditioned defect correction method for the simulation of fully nonlinear free surface flows. The simulations are based on a potential model that describes wave propagation over uneven bottoms in three space dimensions and is useful for fast analysis and prediction purposes in coastal and offshore engineering. A dedicated numerical model based on the proposed algorithm is executed in parallel by utilizing affordable modern special-purpose graphics processing units (GPUs). The model is based on a low-storage flexible-order accurate finite difference method that is known to be efficient and scalable on a CPU core (single thread). To achieve parallel performance of the relatively complex numerical model, we investigate a new trend in high-performance computing where many-core GPUs are utilized as high-throughput co-processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4GB global device memory available. A new code base has been developed in C++ and Compute Unified Device Architecture (CUDA) C. When executed on a single modern GPU, it is found to improve the runtime by more than an order of magnitude in double precision arithmetic, for the same accuracy, over an existing CPU (single thread) Fortran 90 code. These significant improvements are achieved by carefully implementing the algorithm to minimize data transfer and take advantage of the massive multi-threading capability of the GPU device. Copyright (c) 2011 John Wiley & Sons, Ltd.
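The grid sizes quoted above imply a rough per-point storage budget. The following back-of-the-envelope sketch assumes that "4GB" means 4 GiB and that the whole device memory is available for grid arrays; the function name is hypothetical, and the result is an inferred bound, not a figure reported in the abstract.

```python
def values_per_grid_point(memory_bytes, grid_points, bytes_per_value):
    """Number of floating-point values that fit per grid point."""
    return memory_bytes / (grid_points * bytes_per_value)


GIB = 2**30  # assumption: "4GB" is read as 4 GiB

# 50 million points in double precision (8 bytes per value)
double_budget = values_per_grid_point(4 * GIB, 50_000_000, 8)
# 100 million points in single precision (4 bytes per value)
single_budget = values_per_grid_point(4 * GIB, 100_000_000, 4)

print(f"double precision: about {double_budget:.1f} values per point")
print(f"single precision: about {single_budget:.1f} values per point")
```

Both cases give the same budget of between ten and eleven stored values per grid point, which is consistent with the description of the method as low-storage.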
The flexible-order, finite difference based fully nonlinear potential flow model described in [H.B. Bingham, H. Zhang, On the accuracy of finite difference solutions for nonlinear water waves, J. Eng. Math. 58 (2007) 211–228] is extended to three dimensions (3D). In order to obtain an optimal scaling of the solution effort, multigrid is employed to precondition a GMRES iterative solution of the discretized Laplace problem. A robust multigrid method based on Gauss–Seidel smoothing is found to require special treatment of the boundary conditions along solid boundaries, and in particular on the sea bottom. A new discretization scheme using one layer of grid points outside the fluid domain is presented and shown to provide convergent solutions over the full physical and discrete parameter space of interest. Linear analysis of the fundamental properties of the scheme with respect to accuracy, robustness and energy conservation is presented, together with demonstrations of grid-independent iteration count and optimal scaling of the solution effort. Calculations are made for 3D nonlinear wave problems, namely steep nonlinear waves and a shoaling problem, which show good agreement with experimental measurements and other calculations from the literature.
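The notion of a grid-independent iteration count can be illustrated on a much simpler model problem. The sketch below is not the solver described in the abstract: it assumes a 1D Poisson problem with homogeneous Dirichlet conditions, uses multigrid V-cycles with Gauss–Seidel smoothing as a standalone solver rather than as a GMRES preconditioner, and all function names are hypothetical.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for tridiag(-1, 2, -1)/h^2 * u = f."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])


def residual(u, f, h):
    """Return f - A u for the same operator."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def v_cycle(u, f, h):
    """One V(2,2) cycle; len(u) must be of the form 2**k - 1."""
    n = len(u)
    if n == 1:
        u[0] = 0.5 * h * h * f[0]  # exact solve on the coarsest grid
        return
    gauss_seidel(u, f, h, 2)
    r = residual(u, f, h)
    nc = (n - 1) // 2
    # Full-weighting restriction: coarse point j sits at fine index 2j + 1.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
          for j in range(nc)]
    ec = [0.0] * nc
    v_cycle(ec, rc, 2.0 * h)
    # Linear interpolation of the coarse-grid correction.
    for j in range(nc):
        u[2 * j + 1] += ec[j]
    for j in range(nc + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < nc else 0.0
        u[2 * j] += 0.5 * (left + right)
    gauss_seidel(u, f, h, 2)


def cycles_to_converge(n, tol=1e-8, max_cycles=50):
    """Count V-cycles needed to reduce the residual by the factor tol."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = max(abs(x) for x in residual(u, f, h))
    for k in range(1, max_cycles + 1):
        v_cycle(u, f, h)
        if max(abs(x) for x in residual(u, f, h)) < tol * r0:
            return k
    return max_cycles


for n in (63, 127, 255):
    print(n, cycles_to_converge(n))
```

If the cycle count stays essentially constant as `n` doubles, the total work grows only linearly with the number of unknowns, which is the sense in which the scaling of the solution effort is called optimal.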
In this chapter, we use our library for heterogeneous and massively parallel GPU implementations. The library is written in Compute Unified Device Architecture (CUDA) C/C++ and a fully nonlinear and dispersive free surface water wave model is implemented. We describe how flexible-order finite difference (stencil) approximations to the partial differential equations of the model can be prototyped using library components provided in an in-house library. In this library hardware-specific implementation details are hidden via

FIGURE 11.1. Snapshot of steady state wave field generated by a Series 60 ship hull.
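To make the idea of a flexible-order finite difference stencil concrete, the following sketch computes stencil weights for an arbitrary set of grid offsets by requiring exactness for monomials. It is a generic textbook construction, not the library's API; the function name `fd_weights` and its arguments are hypothetical, and exact rational arithmetic is used only to keep the example easy to check.

```python
from fractions import Fraction
from math import factorial


def fd_weights(offsets, deriv):
    """Weights w_j such that sum_j w_j * u(x + offsets[j] * h) / h**deriv
    approximates the deriv-th derivative of u at x.

    The weights are found by requiring exactness for the monomials
    1, x, ..., x**(n-1), which gives a small Vandermonde system.
    """
    n = len(offsets)
    # Row k enforces exactness for x**k.
    A = [[Fraction(x) ** k for x in offsets] for k in range(n)]
    b = [Fraction(factorial(k)) if k == deriv else Fraction(0)
         for k in range(n)]
    # Gauss-Jordan elimination with exact rationals.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        inv = 1 / A[col][col]
        A[col] = [a * inv for a in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return b


# Three-point and five-point centered stencils for the first derivative.
print(fd_weights([-1, 0, 1], 1))
print(fd_weights([-2, -1, 0, 1, 2], 1))
```

Widening the list of offsets raises the formal order of accuracy without changing the calling code, which is the sense in which such stencils are flexible-order. One-sided offsets such as `[0, 1, 2, 3]` give the off-centered stencils needed near boundaries.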
Robust computational procedures for the solution of non-hydrostatic, irrotational and inviscid free-surface water waves in three space dimensions can be based on iterative preconditioned defect correction (PDC) methods. Such methods can be made efficient and scalable to enable prediction of free-surface wave transformation and accurate wave kinematics in both deep and shallow waters in large marine areas or for predicting the outcome of experiments in large numerical wave tanks. We revisit the classical governing equations, which are the fully nonlinear and dispersive potential flow equations. We present new detailed fundamental analysis using finite-amplitude wave solutions for iterative solvers. We demonstrate that the PDC method in combination with a high-order discretization method enables efficient and scalable solution of the linear system of equations arising in potential flow models. Our study is particularly relevant for fast and efficient simulation of non-breaking fully nonlinear water waves over varying bottom topography that may be limited by computational resources or requirements. To gain insight into algorithmic properties and proper choices of discretization parameters for different PDC strategies, we systematically study the limits of accuracy, convergence rate, algorithmic and numerical efficiency, and scalability of the most efficient known PDC methods. These strategies are of interest because they enable generalization of geometric multigrid methods to high-order accurate discretizations and enable significant improvement in numerical efficiency while incurring minimal storage requirements. We demonstrate robustness using such PDC methods for practical ranges of interest for coastal and maritime engineering, that is, from shallow to deep water, and report details of numerical experiments that can be used for benchmarking purposes.
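The basic PDC iteration, u <- u + M^{-1}(f - A u), can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the method analysed in the abstract: A is a fourth-order finite difference approximation of -u'' in 1D with homogeneous Dirichlet conditions (using odd reflection for the one value needed outside the boundary), and the preconditioner M is the second-order tridiagonal operator solved directly, where a real solver would typically use a multigrid cycle. All names are hypothetical.

```python
import math


def apply_high_order(u, h):
    """Fourth-order approximation of -u'' with u = 0 at both ends."""
    n = len(u)

    def val(j):
        # Indices -1 and n are the boundary points (u = 0). One step further
        # out we use odd reflection, an assumption made for this sketch.
        if 0 <= j < n:
            return u[j]
        if j == -1 or j == n:
            return 0.0
        return -u[0] if j == -2 else -u[n - 1]

    return [
        (val(i - 2) - 16 * val(i - 1) + 30 * val(i)
         - 16 * val(i + 1) + val(i + 2)) / (12 * h * h)
        for i in range(n)
    ]


def solve_low_order(r, h):
    """Solve tridiag(-1, 2, -1)/h^2 * e = r with the Thomas algorithm."""
    n = len(r)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] * h * h / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] * h * h + d[i - 1]) / denom
    e = [0.0] * n
    e[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        e[i] = d[i] - c[i] * e[i + 1]
    return e


def defect_correction(f, h, tol=1e-8, max_iter=50):
    """Iterate u <- u + M^{-1}(f - A u) until the defect is below tol."""
    u = [0.0] * len(f)
    history = []
    for _ in range(max_iter):
        Au = apply_high_order(u, h)
        defect = [fi - ai for fi, ai in zip(f, Au)]
        norm = max(abs(x) for x in defect)
        history.append(norm)
        if norm < tol:
            break
        e = solve_low_order(defect, h)
        u = [ui + ei for ui, ei in zip(u, e)]
    return u, history


n = 63
h = 1.0 / (n + 1)
x = [(i + 1) * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]  # exact u = sin(pi x)
u, history = defect_correction(f, h)
error = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
print(len(history), history[-1], error)
```

Only the low-order operator is ever inverted, while the converged solution satisfies the high-order equations. For this model problem a Fourier argument gives a defect reduction factor of at most 1/3 per iteration, independent of the grid size.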
We present an arbitrary-order spectral element method for general-purpose simulation of non-overturning water waves, described by fully nonlinear potential theory. The method can be viewed as a high-order extension of the classical finite element method proposed by Cai et al. (1998), although the numerical implementation differs greatly. Features of the proposed spectral element method include nodal Lagrange basis functions, a general quadrature-free approach and gradient recovery using global L2 projections. The quartic nonlinear terms present in the Zakharov form of the free surface conditions can cause severe aliasing problems and consequently numerical instability for marginally resolved or very steep waves. We show how the scheme can be stabilised through a combination of over-integration of the Galerkin projections and a mild spectral filtering on a per-element basis. This effectively removes any aliasing-driven instabilities while retaining the high-order accuracy of the numerical scheme. The additional computational cost of the over-integration is found to be insignificant compared to the cost of solving the Laplace problem. The model is applied to several benchmark cases in two dimensions. The results confirm the high-order accuracy of the model (exponential convergence), and demonstrate the potential for accuracy and speedup. The results of numerical experiments are in excellent agreement with both analytical and experimental results for strongly nonlinear and irregular dispersive wave propagation. Using a high-order (possibly adapted) spatial discretization for accurate water wave propagation over long times and distances is particularly attractive for marine hydrodynamics applications.
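A per-element spectral filter of the kind mentioned above can be sketched as a damping factor applied to the modal coefficients of each element. The abstract does not state which filter is used, so the exponential form, the parameters `alpha` and `order`, and the function names below are all assumptions chosen for illustration. The transform between nodal values and modal coefficients is also omitted.

```python
import math


def exponential_filter(num_modes, alpha=1.0, order=8):
    """Damping factors sigma_k = exp(-alpha * (k / K)**order), k = 0..K.

    num_modes must be at least 2. The lowest mode is left untouched and
    the highest mode is scaled by exp(-alpha).
    """
    top = num_modes - 1
    return [math.exp(-alpha * (k / top) ** order) for k in range(num_modes)]


def apply_filter(modal_coeffs, alpha=1.0, order=8):
    """Scale the modal coefficients of one element by the filter."""
    sigma = exponential_filter(len(modal_coeffs), alpha, order)
    return [s * c for s, c in zip(sigma, modal_coeffs)]


# Filter factors for a hypothetical element with 7 modes.
print(exponential_filter(7))
```

With a large `order`, the factors stay very close to one for the low modes and act mainly on the highest modes, where aliasing errors accumulate. This is how a mild filter can stabilise a scheme without degrading its formal accuracy.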