Figure 1. Thread hierarchy on the GPU

Source publication
Conference Paper
Vectorization of a computer code offers significant speedup of execution time on parallel computing architectures. Vectorized Monte Carlo (MC) simulations require major changes to a conventional algorithm, which generally follows a history-based structure. The non-trivial task of implementation has already been addressed at the time of the first ap...
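The abstract contrasts the conventional history-based structure with a vectorized one. As a hedged illustration only (the Particle struct, the kernel names, and the fixed 0.9 weight factor are hypothetical stand-ins, not taken from the paper), the CUDA sketch below shows the structural difference: a history-based kernel lets each thread follow one particle to termination, while an event-based layout advances every live particle by a single event per launch.

#include <cuda_runtime.h>

// Toy particle state: only what the structural contrast needs (hypothetical).
struct Particle { float weight; int eventsLeft; };

// History-based: one thread carries one particle from birth to termination.
// Histories have different lengths, so threads within a warp diverge.
__global__ void historyBased(Particle *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    while (p[i].eventsLeft > 0) {
        p[i].weight *= 0.9f;       // stand-in for sampling one collision
        p[i].eventsLeft--;
    }
}

// Event-based (vectorized): each launch advances every live particle by exactly
// one event, so active threads follow the same code path; the host loops over
// launches until no particle has events left.
__global__ void oneEventStep(Particle *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || p[i].eventsLeft <= 0) return;
    p[i].weight *= 0.9f;
    p[i].eventsLeft--;
}

int main()
{
    const int n = 1 << 16;
    Particle *d_p;
    cudaMalloc(&d_p, n * sizeof(Particle));
    cudaMemset(d_p, 0, n * sizeof(Particle));           // zero-initialised toy data
    historyBased<<<(n + 255) / 256, 256>>>(d_p, n);
    for (int step = 0; step < 4; ++step)                // fixed number of event steps
        oneEventStep<<<(n + 255) / 256, 256>>>(d_p, n);
    cudaDeviceSynchronize();
    cudaFree(d_p);
    return 0;
}

In the event-based layout the per-particle loop moves to the host, which is the kind of restructuring the abstract refers to when it says vectorization requires major changes to a history-based algorithm.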

Context in source publication

Context 1
... blocks, which are required to execute independently. This also ensures automatic scalability of the program, as blocks of threads can be scheduled on any multiprocessor of the device, yielding faster execution when more multiprocessors are available. To better understand the execution structure of the GPU, the thread hierarchy is presented in Fig. 1. Functions executed in parallel are called kernels in CUDA terminology. Kernels are launched by specifying the number of threads per block and the total number of blocks. In general, it is a good idea to choose the number of threads per block as a multiple of the warp size (32); however, CUDA offers an opportunity to maximize kernel ...
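To make the hierarchy concrete, here is a minimal, hedged CUDA sketch (the scaleWeights kernel, the array size, and the block size of 256 are illustrative assumptions, not from the cited code): a grid of blocks is launched, each block holds a multiple of the warp size (32) threads, and each thread derives its global index from blockIdx, blockDim, and threadIdx.

#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one array element.
// The global index mirrors the grid -> block -> thread hierarchy of Fig. 1.
__global__ void scaleWeights(float *weights, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: the grid may contain more threads than n
        weights[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_weights;
    cudaMalloc(&d_weights, n * sizeof(float));
    cudaMemset(d_weights, 0, n * sizeof(float));

    // Threads per block chosen as a multiple of the warp size (32), as the
    // excerpt recommends; the block count is rounded up to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleWeights<<<blocks, threadsPerBlock>>>(d_weights, n, 0.5f);
    cudaDeviceSynchronize();

    cudaFree(d_weights);
    return 0;
}

Because the blocks execute independently, the same launch configuration scales automatically across devices with different numbers of multiprocessors, as the excerpt notes.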

Citations

... The code GUARDYAN harnesses the power of GPUs to meet these computing needs. An investigation of two different GPU implementation strategies for the Monte Carlo method using GUARDYAN is presented in [5]. ...
Conference Paper
The novel GPU-assisted Monte Carlo code GUARDYAN, targeting applications in reactor transient analysis, has been compared to MCNP simulations for verification purposes. In 2000 separate calculations using 412 isotopes, about 445 000 data points were generated and compared with MCNP6. The results showed agreement within statistical uncertainty.
Article
A novel 3D Monte Carlo (MC) neutron transport code, GUARDYAN, was developed to simulate direct time dependence in nuclear reactors. GUARDYAN (GpU Assisted Reactor DYnamic ANalysis) addresses the huge computational need by exploiting the massive parallelism available on modern Graphics Processing Units (GPUs). While the code is still under development, transient analysis of large-scale problems is already feasible. The implementation is verified via comparison of differential and integral quantities to MCNP6 results, including several criticality safety benchmarks. Unlike most conventional MC codes, GUARDYAN is intentionally designed for time-dependent calculations supporting parallel scalability on state-of-the-art high-performance computing platforms. The methodology of transport simulation thus differs in many aspects: generation-by-generation tracking is replaced by a time-step method; branching of neutron histories and neutron banking are eliminated by statistical weight manipulations; and a robust delayed neutron treatment is implemented. These concepts, along with advanced acceleration techniques for improving the performance of the point-in-cell search routine and the delta tracking method, resulted in an efficient MC tool that appears to outperform existing methods for kinetic MC simulation. Transient analysis was performed on an LWR core, demonstrating that simulation of one second of a transient requires around 50 h on a single GeForce GTX 1080 GPU. The power evolution produced by GUARDYAN during this transient was also compared to experimental data; remarkably close agreement was found despite the uncertainties in the MC model.
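The abstract mentions delta tracking among the acceleration techniques. The kernel below is a generic, hedged sketch of Woodcock/delta tracking for a one-dimensional toy material, not GUARDYAN's actual implementation: flight lengths are sampled from a constant majorant cross section SIGMA_MAJ, and a tentative collision is accepted as real with probability sigmaTotal(x)/SIGMA_MAJ, otherwise it is treated as a virtual collision. The cross-section model, the majorant value, and all identifiers are assumptions introduced for illustration.

#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical 1-D material: position-dependent total cross section, bounded
// everywhere by the majorant SIGMA_MAJ (a requirement of delta tracking).
__device__ float sigmaTotal(float x) { return 0.2f + 0.1f * fabsf(sinf(x)); }
#define SIGMA_MAJ 0.3f

// One particle per thread: sample flight lengths from the majorant, then accept
// the collision as real with probability sigmaTotal(x)/SIGMA_MAJ; otherwise the
// collision is virtual and the particle keeps flying in the same direction.
__global__ void deltaTrack(float *collisionSite, int n, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    curandState rng;
    curand_init(seed, i, 0, &rng);

    float x = 0.0f;
    for (;;) {
        x += -logf(curand_uniform(&rng)) / SIGMA_MAJ;   // flight to tentative site
        if (curand_uniform(&rng) * SIGMA_MAJ < sigmaTotal(x))
            break;                                      // real collision accepted
        // otherwise: virtual collision, continue tracking
    }
    collisionSite[i] = x;
}

int main()
{
    const int n = 1 << 16;
    float *d_sites;
    cudaMalloc(&d_sites, n * sizeof(float));
    deltaTrack<<<(n + 255) / 256, 256>>>(d_sites, n, 1234ULL);
    cudaDeviceSynchronize();
    cudaFree(d_sites);
    return 0;
}

Because the majorant is constant, flight sampling needs no geometry lookups between collisions, which is part of what makes the method attractive for massively parallel tracking on GPUs.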