## No full-text available

To read the full text of this research, you can request a copy directly from the author.

... The objective of this work is to detail and benchmark a newly developed distributed multigrid framework, which we refer to as the DTU Compute GPUlab Library [20]. The framework offers scalable execution on supercomputers and compute clusters with heterogeneous architectures equipped with many-core coprocessors such as GPUs. ...

... Fast hydrodynamics codes have shown potential in real-time naval hydrodynamics simulations and visualisation [12,20,34], efficient uncertainty quantification [5], tsunami propagation [23], and the development, design, and analysis of marine structures such as naval vessels and wave-energy conversion devices [45]. A comprehensive introduction to hydrodynamics is given in [43], while details on numerical modeling are found in [33]. ...

The focus of this paper is on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on large heterogeneous supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order numerical scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method due to Engsig-Karup et al. (2011) that is designed for high-throughput processing and massive parallelism. The parallel implementation is designed using software abstractions that enable code reuse and that hide many hardware details. In this work, the FNPF discretization is based on a multi-block discretization which allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for curvilinear representation of horizontal block boundaries in order to allow for the accurate representation of geometric features such as surface-piercing bottom-mounted structures, e.g. mono-pile foundations, as demonstrated. In the numerical benchmarks presented, we demonstrate the use of 8,192 modern Nvidia GPUs, enabling nonlinear marine hydrodynamics applications at unprecedented scale and resolution.

... CFD models are typically too dissipative as a result of the low-order accuracy imposed by computational limitations for large-scale wave simulations. In contrast, already today FNPF models can be used for long-time and large-scale wave simulations [12,23]. FNPF solvers can be used for resolution of full sea states in large marine or coastal areas where nonlinear waves interact with fixed or floating structures. ...

... We note that BEM is particularly attractive as a near-field solver for cases where waves interact with complex geometries [73] and may be combined with a far-field solver such as FEM [71]. The overall efficiency and scalability of BEM [28] can be compared to efficient and massively parallel free surface hydrodynamics solvers such as [19,16,55], which can achieve very high efficiency and scalability using multigrid-type methods [42,14] for arbitrarily sized discrete problems, in particular when the (possibly curvilinear multiblock) meshes are logically structured, e.g. as in [23]. ...

We present an arbitrary-order spectral element method for general-purpose simulation of non-overturning water waves, described by fully nonlinear potential theory. The method can be viewed as a high-order extension of the classical finite element method proposed by Cai et al. (1998), although the numerical implementation differs greatly. Features of the proposed spectral element method include: nodal Lagrange basis functions, a general quadrature-free approach and gradient recovery using global L2 projections.
The quartic nonlinear terms present in the Zakharov form of the free surface conditions can cause severe aliasing problems and consequently numerical instability for marginally resolved or very steep waves. We show how the scheme can be stabilised through a combination of over-integration of the Galerkin projections and a mild spectral filtering on a per-element basis. This effectively removes any aliasing-driven instabilities while retaining the high-order accuracy of the numerical scheme. The additional computational cost of the over-integration is found to be insignificant compared to the cost of solving the Laplace problem.
The model is applied to several benchmark cases in two dimensions. The results confirm the high-order accuracy of the model (exponential convergence), and demonstrate the potential for accuracy and speedup. The results of numerical experiments are in excellent agreement with both analytical and experimental results for strongly nonlinear and irregular dispersive wave propagation. A high-order, possibly adapted, spatial discretization for accurate water wave propagation over long times and distances is particularly attractive for marine hydrodynamics applications.

Massively parallel processors, such as graphical processing units (GPUs), have in recent years proven to be effective for a vast number of scientific applications. Today, most desktop computers are equipped with one or more powerful GPUs, offering heterogeneous high-performance computing to a broad range of scientific researchers and software developers. Though GPUs are now programmable and can be highly effective computing units, they still pose challenges for software developers to fully utilize their efficiency. Sequential legacy codes are not always easily parallelized, and the time spent on conversion might not pay off in the end. This is particularly true for heterogeneous computers, where the architectural differences between the main processor and the coprocessor can be so significant that they require completely different optimization strategies. The cache hierarchy management of CPUs and GPUs is an evident example hereof. In the past, industrial companies were able to boost application performance solely by upgrading their hardware systems, with an overt balance between investment and performance speedup. Today, the picture is different; not only do they have to invest in new hardware, but they also must account for the adaptation and training of their software developers. What traditionally used to be a hardware problem, addressed by the chip manufacturers, has now become a software problem for application developers.

The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG methods with different selector and interpolation strategies, along with a variety of smoothers and preconditioners, including block-Jacobi, Gauss-Seidel, and incomplete-LU factorization. The library contains many of the standard and flexible preconditioned Krylov subspace iterative methods, which can be combined with any of the available multigrid methods or simpler preconditioners. The parallelism in the aggregation scheme exploits parallel graph matching techniques, while the smoothers and preconditioners often rely on parallel graph coloring algorithms. The AMG algorithm implemented in the AmgX library achieves 2-5x speedup on a single GPU against a competitive implementation on the CPU. As will be shown in the numerical experiments section, both setup and solve phases scale well across multiple nodes, sustaining this performance advantage.

Current trends in high performance computing (HPC) are advancing towards the use of graphics processing units (GPUs) to achieve speed-ups for linear algebra matrix operations that are common in applied computational fluid dynamics (CFD) solvers. In recent years GPUs have been developed exclusively for computational tasks as massively parallel co-processors to x86-based CPUs, and they provide new HPC opportunities for industry application of CFD software from commercial vendors, who mostly deploy implicit sparse iterative solvers.

A major challenge in next-generation industrial applications is to improve numerical analysis by quantifying uncertainties in predictions. In this work we present a formulation of a fully nonlinear and dispersive potential flow water wave model with random inputs for the probabilistic description of the evolution of waves. The model is analyzed using random sampling techniques and non-intrusive methods based on generalized Polynomial Chaos (PC). These methods allow accurate and efficient estimation of the probability distribution of the solution and require only the computation of the solution at different points in the parameter space, allowing for the reuse of existing simulation software. The choice of the applied methods is driven by the number of uncertain input parameters and by the fact that finding the solution of the considered model is computationally intensive.
We revisit experimental benchmarks often used for validation of deterministic water wave models. Based on numerical experiments and assumed uncertainties in boundary data, our analysis reveals that some of the known discrepancies from deterministic simulation in comparison with experimental measurements could be partially explained by the variability in the model input. We finally present a synthetic experiment studying the variance based sensitivity of the wave load on an off-shore structure to a number of input uncertainties. In the numerical examples presented the PC methods have exhibited fast convergence, suggesting that the problem is amenable to being analyzed with such methods.
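The non-intrusive idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single uncertain input that is uniformly distributed on [-1, 1], a Legendre polynomial basis, and a tabulated 5-point Gauss-Legendre rule, and it uses `math.exp` as a hypothetical stand-in for the expensive wave model. The names `pc_coefficients` and `pc_mean_variance` are made up for this sketch.

```python
import math

# 5-point Gauss-Legendre rule on [-1, 1] (standard tabulated values).
NODES = [-0.9061798459386640, -0.5384693101056831, 0.0,
         0.5384693101056831, 0.9061798459386640]
WEIGHTS = [0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
           0.4786286704993665, 0.2369268850561891]


def legendre(k, x):
    """Evaluate the Legendre polynomial P_k(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * x * p - j * p_prev) / (j + 1)
    return p


def pc_coefficients(model, degree):
    """Project a black-box model onto Legendre polynomials up to `degree`.

    The model is only evaluated at the quadrature nodes, so an existing
    solver can be reused unchanged (the non-intrusive property).
    """
    samples = [model(xi) for xi in NODES]
    coeffs = []
    for k in range(degree + 1):
        proj = sum(w * y * legendre(k, xi)
                   for w, y, xi in zip(WEIGHTS, samples, NODES))
        # For a uniform density 1/2 on [-1, 1], E[P_k^2] = 1 / (2k + 1).
        coeffs.append((2 * k + 1) * 0.5 * proj)
    return coeffs


def pc_mean_variance(coeffs):
    """Mean and variance follow directly from the expansion coefficients."""
    mean = coeffs[0]
    variance = sum(c * c / (2 * k + 1) for k, c in enumerate(coeffs) if k >= 1)
    return mean, variance


# Stand-in model (assumption): y = exp(xi), whose exact mean is sinh(1).
coeffs = pc_coefficients(math.exp, 4)
mean, variance = pc_mean_variance(coeffs)
print(mean, variance)
```

With several uncertain inputs the same projection would use a tensor-product or sparse quadrature grid, which is where the number of input parameters starts to drive the choice of method.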

We present performance results of a mixed-precision strategy developed to improve a recently developed massively parallel GPU-accelerated tool for fast and scalable simulation of unsteady fully nonlinear free surface water waves over uneven depths (Engsig-Karup et al. 2011). The underlying wave model is based on a potential flow formulation, which requires efficient solution of a Laplace problem at large scales. We report recent results on a new mixed-precision strategy for efficient iterative high-order accurate and scalable solution of the Laplace problem using a multigrid-preconditioned defect correction method. The new strategy improves performance by exploiting architectural features of modern GPUs for mixed-precision computations and is tested in a recently developed generic library for fast prototyping of PDE solvers. The new wave tool is applicable to solve and analyze large-scale wave problems in coastal and offshore engineering.

In this chapter, we use our library for heterogeneous and massively parallel GPU implementations. The library is written in Compute Unified Device Architecture (CUDA) C/C++ and a fully nonlinear and dispersive free surface water wave model [18] is implemented. We describe how flexible-order finite difference (stencil) approximations to the partial differential equations of the model can be prototyped using library components provided in an in-house library. In this library hardware-specific implementation details are hidden via ...

Figure 11.1: Snapshot of steady state wave field generated by a Series 60 ship hull.

Robust computational procedures for the solution of non-hydrostatic, irrotational and inviscid free-surface water waves in three space dimensions can be based on iterative preconditioned defect correction (PDC) methods. Such methods can be made efficient and scalable to enable prediction of free-surface wave transformation and accurate wave kinematics in both deep and shallow waters in large marine areas or for predicting the outcome of experiments in large numerical wave tanks. We revisit the classical governing equations, which are the fully nonlinear and dispersive potential flow equations. We present new detailed fundamental analysis using finite-amplitude wave solutions for iterative solvers. We demonstrate that the PDC method in combination with a high-order discretization method enables efficient and scalable solution of the linear system of equations arising in potential flow models. Our study is particularly relevant for fast and efficient simulation of non-breaking fully nonlinear water waves over varying bottom topography that may be limited by computational resources or requirements. To gain insight into algorithmic properties and proper choices of discretization parameters for different PDC strategies, we systematically study the limits of accuracy, convergence rate, algorithmic and numerical efficiency and scalability of the most efficient known PDC methods. These strategies are of interest because they enable generalization of geometric multigrid methods to high-order accurate discretizations and enable significant improvement in numerical efficiency while incurring minimal storage requirements. We demonstrate robustness using such PDC methods for practical ranges of interest for coastal and maritime engineering, that is, from shallow to deep water, and report details of numerical experiments that can be used for benchmarking purposes.
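To make the defect correction idea concrete, the following is a minimal sketch on a 1D model problem, -u'' = f with u(0) = u(1) = 0, rather than the 3D Laplace problem of the paper. The high-order operator is a fourth-order central stencil, and the preconditioner is the second-order operator. Two simplifications are assumptions of this sketch and not taken from the source: the low-order system is solved directly with the Thomas algorithm (the paper uses multigrid for this step), and the boundary closure uses an odd extension of the solution, which suits the sine test function chosen here. All function names are hypothetical.

```python
import math


def apply_high_order(u, h):
    """Fourth-order approximation of -u'' at interior nodes, u = 0 at both ends."""
    n = len(u)

    def val(i):
        # Assumption of this sketch: odd extension across the Dirichlet boundaries.
        if i == -1 or i == n:
            return 0.0
        if i == -2:
            return -u[0]
        if i == n + 1:
            return -u[n - 1]
        return u[i]

    return [(val(i - 2) - 16.0 * val(i - 1) + 30.0 * val(i)
             - 16.0 * val(i + 1) + val(i + 2)) / (12.0 * h * h)
            for i in range(n)]


def solve_low_order(r, h):
    """Solve the second-order system tridiag(-1, 2, -1) / h^2 * x = r (Thomas algorithm)."""
    n = len(r)
    off, diag = -1.0 / (h * h), 2.0 / (h * h)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = r[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (r[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def defect_correction(f, h, iterations=30):
    """Iterate u <- u + M^{-1} (f - A u), with A high-order and M low-order."""
    u = [0.0] * len(f)
    for _ in range(iterations):
        defect = [fi - ai for fi, ai in zip(f, apply_high_order(u, h))]
        correction = solve_low_order(defect, h)
        u = [ui + ci for ui, ci in zip(u, correction)]
    return u


# Test problem: exact solution u(x) = sin(pi x).
n = 31
h = 1.0 / (n + 1)
x = [(i + 1) * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
u_low = solve_low_order(f, h)
u_high = defect_correction(f, h)
err_low = max(abs(a - math.sin(math.pi * xi)) for a, xi in zip(u_low, x))
err_high = max(abs(a - math.sin(math.pi * xi)) for a, xi in zip(u_high, x))
print(err_low, err_high)
```

Only the low-order operator is ever inverted, while the converged iterate carries the accuracy of the high-order operator. That is the property that lets a standard low-order multigrid solver serve as the preconditioner.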

The main objective of the present study has been to develop a numerical model and investigate solution techniques for solving the recently derived high-order Boussinesq equations of \cite{MBL02} in irregular domains in one and two horizontal dimensions. The Boussinesq-type methods are the simplest alternative to solving full three-dimensional wave problems by e.g. Navier-Stokes equations, which can capture all the important wave phenomena such as diffraction, refraction, nonlinear wave-wave interactions and interaction with structures.
The main goal can be reached by using multi-domain methods with support for a spatial discretization based on unstructured grids. In the current work, a standard method of lines approach has been adopted, and the method of choice for the spatial discretization is the nodal Discontinuous Galerkin Finite element method (DG-FEM), which provides a highly flexible basis for the model. This method is combined with an explicit Runge-Kutta method for the temporal discretization. The resulting discrete set of equations enables us to simulate water waves accurately in complex geometric settings and possibly employ local adaptation techniques to optimize the computational effort.
The high-order Boussinesq equations constitute a highly complex system of coupled equations which put any numerical method to the test. The main problems that need to be overcome to solve the equations are the treatment of strongly nonlinear convection-type terms and spatially varying coefficient terms; efficient and robust solution of the resultant time-dependent linear system; and the numerical treatment of high-order and cross-differential derivatives. The suggested solution strategy of the current work is based on a collocation approach where the DG-FEM is used to approximate spatial derivatives and the boundary conditions are imposed weakly using a symmetry technique. Since collocation methods are prone to aliasing errors, various anti-aliasing strategies are applied for the stabilization of the models. A practical and relatively straightforward discretization is applied, which is based on a simple treatment of slip boundary conditions at wall surfaces.
A linear Fourier analysis has been applied to obtain generic analytical results which can be used for validating the discrete implementation and provide the basis for choosing stable discretization parameters as well as giving new insight into the properties of the high-order Boussinesq equations. Remarkably, it is demonstrated that the linear eigenspectrum of the linearized semi-discrete equation system is bounded and hence the stable time increment is not dictated by the spatial discretization. This is a favorable property for explicit time-integration schemes as the stable time increment is not subject to severe restrictions which can affect the performance of the scheme. It is demonstrated that the discrete properties of both DG-FEM and finite difference methods can be discretized to mimic the analytical properties.
It is investigated mathematically and demonstrated numerically how the relaxation method of \cite{LD83} can be applied in spectral/$hp$ multi-domain methods for both accurate internal wave generation of arbitrary wave fields and efficient absorption near domain boundaries. The method is considered to be particularly attractive for wave generation purposes for use with high-order Boussinesq models as it alleviates the need for specifying consistent boundary conditions, and importantly, it is a very straightforward and flexible method.
The DG-FEM models have been applied to a number of tests in both one and two horizontal dimensions with the objective of both validating the setup against known analytical and experimental test results, and at the same time demonstrating the attractive properties of the method. It has been demonstrated that difficult nonlinear and dispersive wave problems can be solved accurately in one horizontal dimension. In two horizontal dimensions it has been demonstrated that the model can solve problems in both regular and irregular geometries and by comparison with analytical results it is shown that the results are in general in excellent agreement.
Thus, it has been established that the DG-FEM can be used to solve this relatively complicated system of equations. The computational efficiency of the method has yet to be demonstrated.

This contribution presents our recent progress on developing an efficient fully-nonlinear potential flow model for simulating 3D wave-wave and wave-structure interaction over arbitrary depths (i.e. in coastal and offshore environments). The model is based on the high-order finite difference scheme OceanWave3D presented in [1, 2]. A nonlinear decomposition of the solution into incident and scattered fields is used to increase the efficiency of the wave-structure interaction problem resolution. Application of the method to the diffraction of nonlinear waves around a fixed, bottom-mounted circular cylinder is presented and compared to the fully nonlinear potential code XWAVE as well as to experiments.

The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation.
A multidisciplinary group of Berkeley researchers met for nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.
We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
- The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
- The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
- Instead of traditional benchmarks, use 13 "Dwarfs" to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
- "Autotuners" should play a larger role than conventional compilers in translating parallel programs.
- To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
- To be successful, programming models should be independent of the number of processors.
- To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
- Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
- Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
- To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.

Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

We implement and evaluate a massively parallel and scalable algorithm based on a multigrid preconditioned Defect Correction method for the simulation of fully nonlinear free surface flows. The simulations are based on a potential model that describes wave propagation over uneven bottoms in three space dimensions and is useful for fast analysis and prediction purposes in coastal and offshore engineering. A dedicated numerical model based on the proposed algorithm is executed in parallel by utilizing an affordable modern special-purpose graphics processing unit (GPU). The model is based on a low-storage flexible-order accurate finite difference method that is known to be efficient and scalable on a CPU core (single thread). To achieve parallel performance of the relatively complex numerical model, we investigate a new trend in high-performance computing where many-core GPUs are utilized as high-throughput co-processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4GB global device memory available. A new code base has been developed in C++ and compute unified device architecture C and is found to improve the runtime by more than an order of magnitude in double precision arithmetic for the same accuracy over an existing CPU (single thread) Fortran 90 code when executed on a single modern GPU. These significant improvements are achieved by carefully implementing the algorithm to minimize data transfer and take advantage of the massive multi-threading capability of the GPU device. Copyright (c) 2011 John Wiley & Sons, Ltd.

This paper considers the relative accuracy and efficiency of low- and high-order finite-difference discretisations of the
exact potential-flow problem for nonlinear water waves. The method developed is an extension of that employed by Li and Fleming
(Coastal Engng 30: 235–238, 1997) to allow arbitrary-order finite-difference schemes and a variable grid spacing. Time-integration
is performed using a fourth-order Runge–Kutta scheme. The linear accuracy, stability and convergence properties of the method
are analysed and high-order schemes with a stretched vertical grid are found to be advantageous relative to second-order schemes
on an even grid. Comparison with highly accurate periodic solutions shows that these conclusions carry over to nonlinear problems
and that the advantages of high-order schemes improve with both increasing nonlinearity and increasing accuracy tolerance.
The combination of non-uniform grid spacing in the vertical and fourth-order schemes is suggested as optimal for engineering
purposes.
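As an illustration of what arbitrary-order stencils on a variable grid involve, the sketch below computes finite-difference weights for any set of distinct nodes by solving the Taylor moment conditions exactly with rational arithmetic. This is one generic way to obtain such weights and is not claimed to be the procedure used in the paper; the function name `fd_weights` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def fd_weights(nodes, x0, m):
    """Weights w_j such that sum_j w_j * f(nodes[j]) approximates f^(m)(x0).

    The weights satisfy sum_j w_j * (x_j - x0)^k = k! if k == m else 0
    for k = 0 .. n-1, which makes the formula exact for polynomials of
    degree below n. The nodes must be distinct and n must exceed m.
    """
    n = len(nodes)
    d = [Fraction(x) - Fraction(x0) for x in nodes]
    # Augmented matrix of the moment conditions.
    rows = [[dj ** k for dj in d] + [Fraction(factorial(k) if k == m else 0)]
            for k in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


# Second derivative on a uniform 5-point stencil (fourth-order accurate).
uniform = fd_weights([-2, -1, 0, 1, 2], 0, 2)
# First derivative on a stretched 3-point stencil, evaluated at the middle node.
stretched = fd_weights([Fraction(0), Fraction(1, 4), Fraction(1)], Fraction(1, 4), 1)
print(uniform)
print(stretched)
```

Because the node positions are arbitrary, the same routine covers both the even grid and the stretched vertical grid compared in the abstract; only the list of nodes changes.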

The flexible-order, finite difference based fully nonlinear potential flow model described in [H.B. Bingham, H. Zhang, On the accuracy of finite difference solutions for nonlinear water waves, J. Eng. Math. 58 (2007) 211–228] is extended to three dimensions (3D). In order to obtain an optimal scaling of the solution effort, multigrid is employed to precondition a GMRES iterative solution of the discretized Laplace problem. A robust multigrid method based on Gauss–Seidel smoothing is found to require special treatment of the boundary conditions along solid boundaries, and in particular on the sea bottom. A new discretization scheme using one layer of grid points outside the fluid domain is presented and shown to provide convergent solutions over the full physical and discrete parameter space of interest. Linear analysis of the fundamental properties of the scheme with respect to accuracy, robustness and energy conservation is presented together with demonstrations of grid-independent iteration count and optimal scaling of the solution effort. Calculations are made for 3D nonlinear wave problems for steep nonlinear waves and a shoaling problem which show good agreement with experimental measurements and other calculations from the literature.

This report describes software tools in Diffpack that make it easy to parallelize an existing sequential solver. The scope is limited to solvers that employ explicit finite difference methods. This class of problems allows parallelization via exact domain decomposition procedures. The main emphasis of the report is devoted to user-friendly abstractions for communicating field values between different processes. Both standard and staggered grids can be handled. The software setup is in principle general and not limited to explicit finite difference schemes.

1 Introduction

In this report we describe software tools that make it easy to take a standard sequential Diffpack simulation code and develop a version of it that can run effectively on parallel computers. The basic idea of this approach is to formulate the original mathematical problem as a set of subproblems over different parts of the domain. This strategy is usually referred to as domain decomposition. All the subproblems can t...
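The sense in which domain decomposition is exact for explicit finite difference schemes can be shown with a small sketch. It does not use Diffpack or any of its abstractions: the "processes" are simply Python lists, the exchange of field values is a list copy, and the scheme is a hypothetical 1D explicit heat-equation update with fixed end values. All function names are invented for this illustration.

```python
def step(u, r):
    """One explicit heat-equation step; the two end values are held fixed."""
    interior = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def decompose(u, parts):
    """Split the global field into contiguous blocks of owned points."""
    bounds = [len(u) * p // parts for p in range(parts + 1)]
    return [u[bounds[p]:bounds[p + 1]] for p in range(parts)]


def exchange_ghosts(blocks):
    """Collect the neighbour edge values each block needs (None at a physical boundary)."""
    ghosts = []
    for p in range(len(blocks)):
        left = blocks[p - 1][-1] if p > 0 else None
        right = blocks[p + 1][0] if p < len(blocks) - 1 else None
        ghosts.append((left, right))
    return ghosts


def parallel_step(blocks, r):
    """Advance every block one step using only its own points plus ghost values."""
    ghosts = exchange_ghosts(blocks)
    new_blocks = []
    for block, (left, right) in zip(blocks, ghosts):
        padded = (([] if left is None else [left]) + block
                  + ([] if right is None else [right]))
        updated = step(padded, r)
        start = 0 if left is None else 1
        # Keep only the owned points; ghost entries are discarded.
        new_blocks.append(updated[start:start + len(block)])
    return new_blocks


# Compare a sequential run with a four-block run of the same scheme.
u0 = [float(i * (20 - i)) for i in range(21)]
serial = list(u0)
blocks = decompose(u0, 4)
for _ in range(50):
    serial = step(serial, 0.25)
    blocks = parallel_step(blocks, 0.25)
gathered = [value for block in blocks for value in block]
print(gathered == serial)
```

Each owned point is updated from the same neighbour values and with the same arithmetic as in the sequential code, so the decomposed run reproduces the sequential result rather than approximating it. In a distributed setting, `exchange_ghosts` is the step that would become message passing between processes.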

Wave energy converters (WECs) need to be deployed in large numbers in an array layout in order to have a significant power production. Each WEC has an impact on the incoming wave field, diffracting, reflecting and radiating waves. Simulating the wave transformations within and around a WEC farm is complex; it is difficult to simulate both near-field and far-field effects with a single numerical model with relatively fast computing times. Within this research a numerical tool is developed to model near-field and far-field wave transformations caused by WECs. The tool is based on the coupling of a wave-structure interaction solver and a wave propagation model, both based on potential flow theory. This paper discusses the coupling method and illustrates the functionality with a proof-of-concept. Additionally, a projection of the evolution of the numerical tool is given. It can be concluded that the coupling of the two solvers is an efficient and promising numerical tool for simulating near- and far-field wave elevations and kinematics near WEC farms.

This paper presents a parallel implementation and validation of an accurate and efficient three-dimensional computational model (3D numerical wave tank), based on fully nonlinear potential flow (FNPF) theory, and its extension to incorporate the motion of a laboratory snake piston wavemaker, as well as an absorbing beach, to simulate experiments in a large-scale 3D wave basin. This work is part of a long-term effort to develop a "virtual" computational wave basin to facilitate and complement large-scale physical wave-basin experiments. The code is based on a higher-order boundary-element method combined with a fast multipole algorithm (FMA). Particular efforts were devoted to making the code efficient for large-scale simulations using high-performance computing platforms. The numerical simulation capability can be tailored to serve as an optimization tool at the planning and detailed design stages of large-scale experiments at a specific basin by duplicating its exact physical and algorithmic features. To date, waves that can be generated in the numerical wave tank (NWT) include solitary, cnoidal, and Airy waves. In this paper we detail the wave-basin model, mathematical formulation, and wave generation, and analyze the performance of the parallelized FNPF-BEM-FMA code as a function of numerical parameters. Experimental or analytical comparisons with NWT results are provided for several cases to assess the accuracy and applicability of the numerical model to practical engineering problems.

Higher stakes from deep-ocean drilling, increasing complexity from unconventional reservoirs, and an overarching desire for a higher-fidelity subsurface description have led to a demand for reservoir simulators capable of modelling many millions of cells in minutes. Recent advances in heterogeneous computing hardware offer the promise of faster simulation times, particularly through the use of GPUs. Thus far, efforts to take advantage of hardware accelerators have been primarily focused on linear solvers and, in particular, simple preconditioners which often sacrifice rapid convergence for the sake of easy parallelism. This relatively weak convergence, the remaining unaccelerated code paths, and communication bottlenecks have prevented dramatic reductions in run time. A comprehensive approach, however, built from the ground up for accelerators, can deliver on the hardware’s promise to meet industry demand for fast, scalable reservoir simulation. We present the results of our efforts to fully accelerate reservoir simulations on multiple GPUs in an extended black-oil formulation discretized using a fully-implicit finite volume method. We implement all major computational aspects, including property evaluation, Jacobian construction, and robust solvers/preconditioners on GPUs. The CPR-AMG preconditioner we employ allows low iteration count and near-optimal order(N) scaling of computational effort with system size. This combination of algorithms and hardware enables the simulation of fine-scale models with many millions of cells in minutes on a single workstation without any upscaling of the original problem. We discuss the algorithms and methods we employ, give performance and accuracy results on a range of benchmark problems and real assets, and discuss the strong and weak scaling behavior of performance with model size and GPU count. This work was supported by the Marathon Oil Corporation.

Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid methods on massively parallel throughput-oriented processors, such as graphics processing units, demands algorithms with abundant fine-grained parallelism. In this paper, we develop a parallel algebraic multigrid method which exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy as well as the cycling or solve stage. Our algorithms are expressed in terms of scalable parallel primitives that are efficiently implemented on the GPU. The resulting solver achieves an average speedup of 1.8× in the setup phase and 5.7× in the cycling phase when compared to a representative CPU implementation.

Thrust is a parallel algorithms library which resembles the C++ Standard
Template Library (STL). Thrust's high-level interface greatly enhances
programmer productivity while enabling performance portability between
GPUs and multicore CPUs. Interoperability with established technologies
(such as CUDA, TBB, and OpenMP) facilitates integration with existing
software.

A set of fully nonlinear Boussinesq equations is solved numerically in generalized coordinates to develop a Boussinesq-type wave model that handles irregular computational boundaries in complex nearshore regions and facilitates grid refinement in simulations. The governing equations, expressed in contravariant components of the velocity vectors under curvilinear coordinates, are derived, and a high-order finite difference scheme on a staggered grid is employed for the numerical implementation. The developed model is used to simulate nearshore wave propagation under curvilinear coordinates, and the numerical results are in good agreement with analytical and experimental data.

This survey covers and contrasts the numerical methods that are actively being used to solve free-surface flow problems. The literature cited is slanted towards body-wave problems. However, relevant techniques related to 'free-wave' calculations are also mentioned to illustrate different facets of a methodology. Attention is restricted to problems based on a potential or stream-function formulation. The use of the velocity potential is justifiable only when the inertial forces dominate the viscous forces. Cases of bluff bodies in steady motion and small bodies in a wave field of large amplitude are well-known exceptions. Methods based on the use of the primitive-variables equation (velocity-pressure formulations) are not included. Three major categories of methods are reviewed: finite differences, finite elements, and boundary-integral equations.

An efficient numerical method for improved Boussinesq equations is proposed. Because of the use of the boundary-fitted coordinate system, the method facilitates the solution of wave problems with complicated boundaries and topography. The iterative method, combined with an efficient predictor-corrector scheme, is adopted for the numerical solution of the governing differential equations. The proposed numerical scheme is verified by three test cases where laboratory data were available for comparison. The successful simulation of wave runup around a circular cylinder and twin-tandem cylinders and wave propagation over an elliptical shoal shows that the proposed numerical method is both stable and accurate.

A three-dimensional numerical model based on the complete Navier-Stokes equations is developed and presented in this paper. The model can be used for the problem of propagation of fully nonlinear water waves. The Navier-Stokes equations are first transformed from an irregular calculation domain to a regular one using sigma coordinates. The projection method is used to separate the advection and diffusion terms from the pressure terms in the Navier-Stokes equations. MacCormack's explicit scheme is used for the advection and diffusion terms, and it has second-order accuracy in both space and time. The pressure variable is further separated into hydrostatic and hydrodynamic pressures so that computer rounding errors can be largely avoided. The resulting hydrodynamic pressure equation is solved by a multigrid method. A staggered mesh and a central spatial finite-difference scheme are used. The model is tested against the experimental data of Luth et al., and the comparison shows that higher harmonics can be modeled well. Comparison of the model solutions with the elliptic shoal case confirms that the present model works well for wave refraction and diffraction with strong wave focusing.
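As an illustration of the predictor-corrector structure of MacCormack's explicit scheme, here is a minimal sketch for the much simpler model problem u_t + c u_x = 0 with c = 1 on a periodic unit interval. The model problem, the function names, and the grid sizes are assumptions chosen for brevity; the abstract applies the scheme to the advection and diffusion terms of the Navier-Stokes equations, which is not reproduced here.

```python
import numpy as np

def maccormack_step(u, nu):
    """One MacCormack step for u_t + c u_x = 0 on a periodic grid; nu = c*dt/dx."""
    # Predictor: forward difference in space
    u_pred = u - nu * (np.roll(u, -1) - u)
    # Corrector: backward difference on the predicted values, averaged with the old state
    return 0.5 * (u + u_pred - nu * (u_pred - np.roll(u_pred, 1)))

def advect_sine(n, nu=0.5):
    """Advect sin(2*pi*x) once around the unit periodic domain; return the max error."""
    x = np.arange(n) / n
    u = np.sin(2 * np.pi * x)
    steps = int(round(n / nu))  # c = 1, dx = 1/n, dt = nu*dx, so steps*dt = 1
    for _ in range(steps):
        u = maccormack_step(u, nu)
    return np.max(np.abs(u - np.sin(2 * np.pi * x)))

err_coarse = advect_sine(50)
err_fine = advect_sine(100)
```

Halving the grid spacing (and the time step with it) should reduce the error by roughly a factor of four, which is what second-order accuracy in space and time means in practice.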

Although circular piling is a much-used structural element in shore protection, harbor, and other maritime structures, only recently have significant advances been made toward gaining a quantitative understanding of the forces developed by wave action against piling. The present report deals with this subject.

In this paper, generalized 2D shallow sea dynamic equations in movable curvilinear co-ordinates are derived. Through a differential co-ordinate transformation a self-adaptive grid is proposed to treat a continuously deforming lateral boundary and a kinematical boundary condition is adopted. The self-adaptive grid method (SAM) is used to simulate numerically the storm surge flooding in the Bohai Sea on 23 April 1969, which was one of the largest storm surge inundations in China.

A closed-form solution is developed for the velocity potential resulting from the interaction of second-order Stokes waves with a large vertical circular cylinder. At first-order, the solution is the usual linear diffraction theory. At second-order, the solution consists of forced wave motions, due to nonlinear wave-wave interactions in the free surface boundary condition, plus scattered free wave motions, due to the interaction of the forced waves with the fixed cylinder. The velocity potentials are then used to determine the theoretical free surface elevations around the cylinder consistent to second-order. Second-order terms are found to significantly alter wave envelopes around the cylinder as a result of nonlinear diffraction. For example, the maximum wave crest run-up on the cylinder from the nonlinear theory is found to exceed that predicted by the linear diffraction theory by up to 50%. A brief comparison of the nonlinear theory with the measured run-up data is found to largely confirm the theoretical solution.

Using the perturbation method, a time-dependent parabolic equation is developed based on the elliptic mild-slope equation with a dissipation term. With the time-dependent parabolic equation employed as the governing equation, a numerical model for wave propagation, including the dissipation term, in water of slowly varying topography is presented in curvilinear coordinates. In the model, the self-adaptive grid generation method is employed to generate a boundary-fitted mesh with varying spacing. The numerical tests show that the effects of the dissipation term should be taken into account if the distance of wave propagation is large, and that the outgoing boundary conditions can be treated more effectively by introducing the dissipation term into the numerical model. The numerical model gives good results when simulating wave propagation in waters with complicated boundaries and effectively predicts the physical processes of wave propagation. Moreover, the errors of the analytical solution deduced by Kirby et al. (1994) [Kirby, J.T., Dalrymple, R.A., Kabu, H., 1994. Parabolic approximation for water waves in conformal coordinate systems. Coastal Engineering 23, 185–213.] from the small-angle parabolic approximation of the mild-slope equation for the case of waves between diverging breakwaters in a polar coordinate system are corrected.

Theoretical results for second-order wave run-up around a large diameter vertical circular cylinder are compared to results of 22 laboratory experiments conducted in regular nonlinear waves. In general, the second-order theory explains a significant portion of the nonlinear wave run-up distribution measured at all angles around the cylinder. At the front of the cylinder, for example, measured maximum run-up exceeds linear theory by 44% on average but exceeds the nonlinear theory by only 11% on average. In some cases, both measured run-up and the second-order theory exceed the linear prediction by more than 50%. Similar results are found at the rear of the cylinder where the second-order theory predicts a large increase in wave amplitude for cases where the linear diffraction theory predicts little or no increase. Overall, the nonlinear diffraction theory is found to be valid for the same relative depth and wave steepness conditions applicable to Stokes second-order plane-wave theory. In the last section of the paper, design curves are presented for estimating the maximum second-order wave run-up for a wide range of conditions in terms of the relative depth, relative cylinder size, and wave steepness.

A three-dimensional (3D) numerical wave tank (NWT) solving fully nonlinear potential flow theory, with a higher-order boundary element method (BEM), is modified to simulate tsunami generation by underwater landslides. New features are added to the NWT to model underwater landslide geometry and motion and specify corresponding boundary conditions in the BEM model. In particular, a new snake absorbing piston boundary condition is implemented to remove reflection from the onshore and offshore boundaries of the NWT. Model results compare favorably with recent laboratory experiments. Sensitivity analyses of numerical results to the width and length of the discretization are conducted to determine optimal numerical parameters. The effect of landslide width on the generated tsunami is estimated. Results show that the two-dimensional approximation is applicable when the ratio of landslide width to landslide length is greater than 2. Numerical accuracy is examined and found to be excellent in all cases.

Based on the fully nonlinear Boussinesq equations in Cartesian coordinates, the equations in generalized coordinates are derived to adapt computations to irregularly shaped shorelines, such as harbors, bays and tidal inlets, and to make computations more efficient in large near-shore regions. Contravariant components of velocity vectors are employed in the derivation instead of the normal components in curvilinear coordinates or original components in Cartesian coordinates, which greatly simplifies the equations in generalized curvilinear coordinates. A high-order finite difference scheme with staggered grids in the image domain is adopted in the numerical model. The model is applied to five examples involving curvilinear coordinate systems. The results of these cases are in good agreement with analytical results, experimental data, and the results from the uniform grid model, which shows that the model has good accuracy and efficiency in dealing with the computations of nonlinear surface gravity waves in domains with complicated geometries.

Methods for the numerical computation of freely propagating irrotational water waves are reviewed. The emphasis is on the methods, not on the results. The primary focus is on methods for time-dependent fully nonlinear water waves, but aspects of steady waves are also discussed. For time-dependent waves, a range of topics from two-dimensional time-periodic waves over a flat bottom to unsteady three-dimensional waves over an arbitrary topography, including the statistical description of water waves, are discussed.

This paper describes a general parallel multi-subdomain strategy for solving the weakly dispersive and nonlinear Boussinesq water wave equations. The parallelization strategy is derived from the additive Schwarz method based on overlapping subdomains. Besides allowing the subdomains to independently solve their local problems, the strategy is also flexible in the sense that different discretization schemes, or even different mathematical models, are allowed in different subdomains. The parallelization strategy is particularly attractive from an implementational point of view, because it promotes the reuse of existing serial software and opens up the possibility of using different software in different subdomains. We study the strategy’s performance with respect to accuracy, convergence properties of the Schwarz iterations, and scalability through numerical experiments concerning waves in a basin, solitary waves, and waves generated by a moving vessel. We find that the proposed technique is promising for large-scale parallel wave simulations. In particular, we demonstrate that satisfactory accuracy and convergence speed of the Schwarz iterations are obtainable independent of the number of subdomains, provided there is sufficient overlap. Moreover, existing serial wave solvers are readily reusable when implementing the parallelization strategy.
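The overlapping-subdomain idea can be sketched on a far simpler problem than the Boussinesq equations. The example below is an assumption-laden illustration, not the paper's method: it uses a parallel (Jacobi-type) Schwarz iteration, a close relative of additive Schwarz, on the 1D Poisson problem -u'' = f with two overlapping subdomains. The names `solve_dirichlet` and `schwarz_iterations`, the grid size, and the overlap are all hypothetical choices.

```python
import numpy as np

def solve_dirichlet(f_local, left, right, h):
    """Direct solve of -u'' = f on a subdomain with Dirichlet values left/right."""
    n = len(f_local)
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    rhs = f_local.copy()
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    return np.linalg.solve(A, rhs)

def schwarz_iterations(f, h, a, b, iters):
    """Two overlapping subdomains: nodes 1..b and a..N (a <= b), zero outer BCs."""
    N = len(f) - 2            # f and u include the two boundary nodes
    u = np.zeros(N + 2)
    mid = (a + b) // 2        # ownership split inside the overlap
    for _ in range(iters):
        # Both local solves read only the previous iterate, so they are independent
        u1 = solve_dirichlet(f[1:b + 1], u[0], u[b + 1], h)
        u2 = solve_dirichlet(f[a:N + 1], u[a - 1], u[N + 1], h)
        u[1:mid + 1] = u1[:mid]
        u[mid + 1:N + 1] = u2[mid + 1 - a:]
    return u

N = 99
h = 1.0 / (N + 1)
x = np.linspace(0.0, 1.0, N + 2)
f = np.pi**2 * np.sin(np.pi * x)

# Reference: one global solve on the whole interval
u_ref = np.zeros(N + 2)
u_ref[1:N + 1] = solve_dirichlet(f[1:N + 1], 0.0, 0.0, h)

u_schwarz = schwarz_iterations(f, h, a=41, b=60, iters=60)
schwarz_error = np.max(np.abs(u_schwarz - u_ref))
```

Because each local solve only needs interface values from the previous iterate, the two calls to `solve_dirichlet` could run on different processes, and either one could be replaced by a different solver without touching the other. In this toy setting the iterates approach the global solution faster when the overlap is widened.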

The first part of this paper surveys co-processor approaches for commodity based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes. Starting with a conventional commodity based cluster we leverage the high bandwidth of graphics processing units (GPUs) to increase the overall system bandwidth that is the decisive performance factor in this scenario. Thus, even the addition of low-end, out of date GPUs leads to improvements in both performance- and power-related metrics.

In this paper, a three-dimensional multigrid model is developed for linear and fully nonlinear water wave propagation. The Laplace equation is transformed from an irregular calculation domain to a regular one, so that the boundary conditions on the water surface and sea bottom can be implemented precisely. The multigrid method is used to solve the governing equation, and the computer storage requirement is very small. The difference in computer time for running the linear model and the fully nonlinear model is not significant. The present model is valid over the complete range of water depths. For fully nonlinear water wave problems the present model is particularly efficient. The model is used to investigate the validity of the mild-slope equation for the case of strong wave focusing behind an elliptical shoal and is also applied to Whalin's experiment. Simulation of wave breaking is not included in the present model.
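For readers unfamiliar with multigrid, a minimal geometric V-cycle for the 1D model problem -u'' = f with zero Dirichlet boundaries is sketched below. This is a textbook simplification under stated assumptions (weighted Jacobi smoothing, full-weighting restriction, linear interpolation, a power-of-two number of intervals), not the three-dimensional transformed-domain solver described in the abstract; all function names are hypothetical.

```python
import numpy as np

def residual(u, f, h):
    """Residual f - A u of the 3-point discretization of -u'' = f."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps (diagonal of A is 2/h^2)."""
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * h**2 * residual(u, f, h)[1:-1]
    return u

def v_cycle(u, f, h):
    n = len(u) - 1            # number of intervals, assumed to be a power of two
    if n == 2:
        u[1] = 0.5 * (h**2 * f[1] + u[0] + u[2])   # exact solve, one unknown
        return u
    u = smooth(u, f, h)                             # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]  # full weighting
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)      # coarse-grid error equation
    e = np.zeros_like(u)
    e[::2] = ec                                     # inject coarse values
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])              # linear interpolation
    return smooth(u + e, f, h)                      # post-smoothing

n = 128
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n + 1)
r0 = np.linalg.norm(residual(u, f, 1.0 / n))
for _ in range(10):
    u = v_cycle(u, f, 1.0 / n)
reduction = np.linalg.norm(residual(u, f, 1.0 / n)) / r0
```

The sketch stores only one solution, one right-hand side, and one residual vector per level, and the coarse levels shrink geometrically, which is one reason multigrid storage requirements stay small.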

In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implementation for both the stencil-only computation and the discretization of the wave equation, which is currently of great interest in seismic computing. For the larger stencils, the described approach achieves a throughput of between 2,400 and over 3,000 million output points per second on a single Tesla 10-series GPU. This is roughly an order of magnitude higher than a 4-core Harpertown CPU running a similar code from the seismic industry. Multi-GPU parallelization is also described, achieving linear scaling with GPUs by overlapping inter-GPU communication with computation.
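A 3D finite difference stencil of the kind discussed above can be sketched on the CPU with array slicing. The example below is a hedged illustration rather than the paper's CUDA kernel: it applies a radius-2 star stencil using the standard fourth-order coefficients for the second derivative, and the function name `laplacian_star` and the test grid are assumptions.

```python
import numpy as np

# Standard 4th-order central coefficients for a second derivative (center, +-1, +-2)
COEFFS = (-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0)

def laplacian_star(u, h):
    """Apply a radius-2 star stencil to the interior points of a 3D array."""
    r = len(COEFFS) - 1
    n0, n1, n2 = u.shape
    core = u[r:-r, r:-r, r:-r]
    out = 3.0 * COEFFS[0] * core          # center weight, once per axis
    for k in range(1, r + 1):
        out = out + COEFFS[k] * (
            u[r + k:n0 - r + k, r:-r, r:-r] + u[r - k:n0 - r - k, r:-r, r:-r]
            + u[r:-r, r + k:n1 - r + k, r:-r] + u[r:-r, r - k:n1 - r - k, r:-r]
            + u[r:-r, r:-r, r + k:n2 - r + k] + u[r:-r, r:-r, r - k:n2 - r - k]
        )
    return out / h**2

# Assumed test field: u = x^2 + y^2 + z^2, whose Laplacian is exactly 6
h = 0.1
axis = h * np.arange(16)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
lap = laplacian_star(X**2 + Y**2 + Z**2, h)
output_points = lap.size
```

Each output point reads 13 input values here, and neighboring output points share most of them. That reuse is what the data access redundancy metric in the abstract is meant to capture, and dividing `output_points` by the measured run time gives the throughput figure the abstract reports.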

The main contribution of this dissertation is to show that graphics processors (GPUs), as representatives of the trend towards many-core architectures, are very well suited to the fast and accurate solution of large, sparse linear systems of equations, in particular with parallel multigrid methods on heterogeneous compute clusters. Such systems arise, for example, in the discretization of (elliptic) partial differential equations with finite elements. We demonstrate speedups of at least one order of magnitude over conventional, highly optimized CPU implementations, without loss of accuracy or functionality. In detail, this dissertation makes the following contributions:
Computations in single-precision floating point may not suffice for the problem classes considered here. We revisit the method of mixed-precision iterative refinement, not only to improve the accuracy of computed solutions but, more importantly, to increase the efficiency of the solution process as a whole. On both CPUs and GPUs, we demonstrate a significant performance gain without loss of accuracy compared with computing in higher floating-point precision.
We present efficient parallelization techniques for multigrid solvers on graphics hardware, in particular for numerically strong smoothers and preconditioners that are suited to highly anisotropic grids and operators. One example is the development of an efficient reformulation of the cyclic reduction method for solving tridiagonal systems of equations. With regard to hardware-oriented numerics, we carefully analyze the trade-off between numerical and runtime efficiency for inexact parallelization techniques that decouple some of the inherently sequential characteristics of such strong smoothers in favor of better parallelization properties.
Reimplementing large, established software packages to adapt them to new hardware platforms is often unacceptably expensive. We develop a "minimally invasive" approach to integrating co-processors such as GPUs into FEAST, an exemplary finite element discretization and solver package. The main advantage of our technique is that applications built on FEAST do not need to be changed in order to benefit from the acceleration provided by such co-processors. We evaluate our approach on large GPU-accelerated compute clusters for classical benchmark problems from linearized elasticity and the simulation of stationary laminar flows, and observe good speedups and good weak scalability. The maximum achievable speedup is also analyzed and modeled theoretically, for example to enable predictions.
Furthermore, we summarize the historical development of the research field of "scientific computing on graphics hardware" since 2001/2002, i.e. the evolution of GPGPU from an obscure niche topic to its interdisciplinary use today. The account covers both the hardware and the programming model and includes an extensive bibliography of publications on the simulation of PDE problems on GPUs.
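Mixed-precision iterative refinement, one of the techniques this dissertation abstract names, can be sketched in a few lines. The example is an assumption-based illustration, not the dissertation's solver: residuals are formed in double precision while corrections come from a single-precision dense solve. The function name `refine` and the small tridiagonal test matrix are hypothetical, and a practical code would factor the single-precision matrix once or use a multigrid solver instead of calling a dense solve in every pass.

```python
import numpy as np

def refine(A, b, iters=5):
    """Mixed-precision iterative refinement: float64 residuals, float32 corrections."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x                                   # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # cheap low-precision solve
        x = x + d.astype(np.float64)                    # accumulate in float64
    return x

# Assumed test problem: tridiagonal [-1, 2, -1] matrix with a known solution
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x_true = np.sin(np.arange(1, n + 1))
b = A @ x_true

x_single = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)
x_refined = refine(A, b)
err_single = np.max(np.abs(x_single - x_true))
err_refined = np.max(np.abs(x_refined - x_true))
```

The bulk of the arithmetic happens in the low-precision solve, while the few double-precision residual evaluations are enough to bring the final error down to roughly what a full double-precision solve would give on this well-conditioned example.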

Inspired by the attractive Flops/dollar ratio and the incredible growth in the speed of modern graphics processing units (GPUs), we propose to use a cluster of GPUs for high performance scientific computing. As an example application, we have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, our simulation can compute a 480x400x80 LBM in 0.31 second/step, a speed which is 4.6 times faster than that of our CPU cluster implementation. Besides the LBM, we also discuss other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM.
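The figures quoted above can be turned into a throughput number with a short calculation. The sketch below assumes that "4.6 times faster" means a 4.6× speedup in time per step; the unit of million lattice-cell updates per second is a common convention but does not appear in the abstract.

```python
# Figures taken from the abstract
nx, ny, nz = 480, 400, 80
gpu_seconds_per_step = 0.31
speedup = 4.6                      # assumed to be a ratio of times per step

cells = nx * ny * nz                                     # lattice cells per step
gpu_mlups = cells / gpu_seconds_per_step / 1e6           # million cell updates per second
cpu_seconds_per_step = gpu_seconds_per_step * speedup    # implied CPU cluster time
```

Under these assumptions the 30-node GPU cluster updates about 15.4 million cells per step, or roughly 50 million cell updates per second, and the CPU cluster would need about 1.4 seconds per step.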

Hino T, Carrica P, Broglia R, et al. (2014) Specialist Committee on CFD in Marine Hydrodynamics. In: Proceedings of the 27th International Towing Tank Conference (ITTC), Copenhagen, DK, 31 August-5 September 2014, pp. 522-567.

Holewinski J, Pouchet LN and Sadayappan P (2012) High-performance Code Generation for Stencil Computations on GPU Architectures. In: Proceedings of the 26th ACM International Conference on Supercomputing, Venice, IT, 25-29 June 2012, pp. 311-320. New York: ACM.

Lindberg O, Glimberg SL, Harry BB, et al. (2013) Towards real time simulation of ship-ship interaction - part II: double body flow linearization and GPU implementation. In: Proceedings of The 28th International Workshop on Water Waves and Floating Bodies, Marseille, FR, 7-10 April 2013, pp. 125-128.

Svendsen IA and Jonsson IG (1976) Hydrodynamics of coastal regions. Vol. 3. Denmark: Den private ingeniørfond, Technical University of Denmark.

Luke N Olson is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He received his PhD in applied mathematics from the University of Colorado at Boulder in 2003, and his research interests include sparse matrix computations, multigrid methods, finite element methods, and parallel numerical algorithms.