Figure 1. Thread hierarchy on the GPU

Source publication
Conference Paper
Vectorization of a computer code offers significant speedup of execution time on parallel computing architectures. Vectorized Monte Carlo (MC) simulations require major changes to a conventional algorithm, which generally follows a history-based structure. The non-trivial task of implementation has already been addressed at the time of the first ap...
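The abstract contrasts the conventional history-based structure with a vectorized one. As a hedged illustration only (the Particle struct, the kernel names, and the fixed 0.9 weight factor are hypothetical stand-ins, not taken from the paper), the CUDA sketch below shows the structural difference: a history-based kernel lets each thread follow one particle to termination, while an event-based layout advances every live particle by a single event per launch.

#include <cuda_runtime.h>

// Toy particle state: only what the structural contrast needs (hypothetical).
struct Particle { float weight; int eventsLeft; };

// History-based: one thread carries one particle from birth to termination.
// Histories have different lengths, so threads within a warp diverge.
__global__ void historyBased(Particle *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    while (p[i].eventsLeft > 0) {
        p[i].weight *= 0.9f;       // stand-in for sampling one collision
        p[i].eventsLeft--;
    }
}

// Event-based (vectorized): each launch advances every live particle by exactly
// one event, so active threads follow the same code path; the host loops over
// launches until no particle has events left.
__global__ void oneEventStep(Particle *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || p[i].eventsLeft <= 0) return;
    p[i].weight *= 0.9f;
    p[i].eventsLeft--;
}

int main()
{
    const int n = 1 << 16;
    Particle *d_p;
    cudaMalloc(&d_p, n * sizeof(Particle));
    cudaMemset(d_p, 0, n * sizeof(Particle));           // zero-initialised toy data
    historyBased<<<(n + 255) / 256, 256>>>(d_p, n);
    for (int step = 0; step < 4; ++step)                // fixed number of event steps
        oneEventStep<<<(n + 255) / 256, 256>>>(d_p, n);
    cudaDeviceSynchronize();
    cudaFree(d_p);
    return 0;
}

In the event-based layout the per-particle loop moves to the host, which is the kind of restructuring the abstract refers to when it says vectorization requires major changes to a history-based algorithm.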

Context in source publication

Context 1
... blocks, which are required to execute independently. This also ensures automatic scalability of the program, as blocks of threads can be scheduled on any multiprocessor of the device, yielding faster execution when more multiprocessors are available. To better understand the execution structure of the GPU, the thread hierarchy is presented in Fig. 1. Functions executed in parallel are called kernels in CUDA terminology. Kernels are launched by specifying the number of threads per block and the total number of blocks. In general, it is a good idea to choose the number of threads per block as a multiple of the warp size (32); however, CUDA offers an opportunity to maximize kernel ...
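To make the hierarchy concrete, here is a minimal, hedged CUDA sketch (the scaleWeights kernel, the array size, and the block size of 256 are illustrative assumptions, not from the cited code): a grid of blocks is launched, each block holds a multiple of the warp size (32) threads, and each thread derives its global index from blockIdx, blockDim, and threadIdx.

#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one array element.
// The global index mirrors the grid -> block -> thread hierarchy of Fig. 1.
__global__ void scaleWeights(float *weights, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     // guard: the grid may contain more threads than n
        weights[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_weights;
    cudaMalloc(&d_weights, n * sizeof(float));
    cudaMemset(d_weights, 0, n * sizeof(float));

    // Threads per block chosen as a multiple of the warp size (32), as the
    // excerpt recommends; the block count is rounded up to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleWeights<<<blocks, threadsPerBlock>>>(d_weights, n, 0.5f);
    cudaDeviceSynchronize();

    cudaFree(d_weights);
    return 0;
}

Because the blocks execute independently, the same launch configuration scales automatically across devices with different numbers of multiprocessors, as the excerpt notes.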

Citations

... The code GUARDYAN harnesses the power of GPUs to meet these computing needs. An investigation of two different GPU implementation strategies for the Monte Carlo method using GUARDYAN is presented in [5]. ...
Conference Paper
The novel GPU-assisted Monte Carlo code GUARDYAN, targeting applications in reactor transient analysis, has been compared to MCNP simulations for verification purposes. In 2000 separate calculations using 412 isotopes, about 445 000 data points were generated and compared with MCNP6. The results showed agreement within statistical uncertainty.
Article
A novel 3D Monte Carlo (MC) neutron transport code, GUARDYAN, was developed to simulate direct time dependence in nuclear reactors. GUARDYAN (GpU Assisted Reactor DYnamic ANalysis) addresses the huge computational need by exploiting the massive parallelism available on modern Graphics Processing Units (GPUs). While the code is still under development, transient analysis of large-scale problems is already feasible. The implementation is verified via comparison of differential and integral quantities to MCNP6 results, including several criticality safety benchmarks. Unlike most conventional MC codes, GUARDYAN is intentionally designed for time-dependent calculations supporting parallel scalability on state-of-the-art high-performance computing platforms. The methodology of transport simulation thus differs in many aspects: generation-by-generation tracking is replaced by a time-step method; branching of neutron histories and neutron banking are eliminated by statistical weight manipulations; and a robust delayed neutron treatment is implemented. These concepts, along with advanced acceleration techniques for improving the performance of the point-in-cell search routine and the delta tracking method, resulted in an efficient MC tool that appears to outperform existing methods for kinetic MC simulation. Transient analysis was performed on an LWR core, demonstrating that simulation of one second of a transient requires around 50 h on a single GeForce GTX 1080 GPU. The power evolution produced by GUARDYAN during this transient was also compared to experimental data; remarkably close agreement was found despite the uncertainties in the MC model.
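The abstract mentions delta tracking among the acceleration techniques. The kernel below is a generic, hedged sketch of Woodcock/delta tracking for a one-dimensional toy material, not GUARDYAN's actual implementation: flight lengths are sampled from a constant majorant cross section SIGMA_MAJ, and a tentative collision is accepted as real with probability sigmaTotal(x)/SIGMA_MAJ, otherwise it is treated as a virtual collision. The cross-section model, the majorant value, and all identifiers are assumptions introduced for illustration.

#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical 1-D material: position-dependent total cross section, bounded
// everywhere by the majorant SIGMA_MAJ (a requirement of delta tracking).
__device__ float sigmaTotal(float x) { return 0.2f + 0.1f * fabsf(sinf(x)); }
#define SIGMA_MAJ 0.3f

// One particle per thread: sample flight lengths from the majorant, then accept
// the collision as real with probability sigmaTotal(x)/SIGMA_MAJ; otherwise the
// collision is virtual and the particle keeps flying in the same direction.
__global__ void deltaTrack(float *collisionSite, int n, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    curandState rng;
    curand_init(seed, i, 0, &rng);

    float x = 0.0f;
    for (;;) {
        x += -logf(curand_uniform(&rng)) / SIGMA_MAJ;   // flight to tentative site
        if (curand_uniform(&rng) * SIGMA_MAJ < sigmaTotal(x))
            break;                                      // real collision accepted
        // otherwise: virtual collision, continue tracking
    }
    collisionSite[i] = x;
}

int main()
{
    const int n = 1 << 16;
    float *d_sites;
    cudaMalloc(&d_sites, n * sizeof(float));
    deltaTrack<<<(n + 255) / 256, 256>>>(d_sites, n, 1234ULL);
    cudaDeviceSynchronize();
    cudaFree(d_sites);
    return 0;
}

Because the majorant is constant, flight sampling needs no geometry lookups between collisions, which is part of what makes the method attractive for massively parallel tracking on GPUs.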