Article

Student Cluster Competition 2018, Team Northeastern University: Reproducing Performance of a Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake on the AMD EPYC 7551 Architecture


Abstract

This paper evaluates the reproducibility of a Supercomputing 17 (SC17) paper titled "Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake." We evaluate reproducibility on a significantly smaller computer system than the one used in the original work. We found that we were able to demonstrate reproducibility of the multi-physics simulations on a single-node system, as well as confirm multi-node scaling. However, reproducibility of the visual and geophysical simulation results was inconclusive due to issues related to the input parameters provided to our model. The SC17 paper provided results for both CPU-based and Xeon Phi-based simulations. Since our cluster uses NVIDIA V100s for acceleration, we were only able to assess the CPU-based results in terms of reproducibility.
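
As a rough illustration of what confirming multi-node scaling involves (the node counts and timings below are hypothetical placeholders, not measurements from this study), strong-scaling speed-up and parallel efficiency can be computed from measured wall-clock times as in this short Python sketch:

# Hypothetical strong-scaling check: speed-up S(N) = T(1) / T(N),
# parallel efficiency E(N) = S(N) / N. All timings are placeholders.
wall_times = {1: 3600.0, 2: 1900.0, 4: 1020.0, 8: 560.0}  # nodes -> seconds

t_ref = wall_times[1]
for nodes in sorted(wall_times):
    speedup = t_ref / wall_times[nodes]
    efficiency = speedup / nodes
    print(f"{nodes:2d} node(s): speed-up {speedup:4.2f}, efficiency {efficiency:5.1%}")

Efficiency staying close to 1 as nodes are added is what a multi-node scaling check looks for in practice.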


... Each core has a private 32 KB L1 data cache and a 32 KB L1 instruction cache. The AMD EPYC CPU features a peak double-precision performance of 3.5 Tflops [28,29]. The AMD CPU used has two sockets, each with 64 AVX2 cores and 256 MB of L3 cache. ...
Article
Full-text available
Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no “one-size-fits-all” format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format for the target matrix and hardware. However, existing hybrid approaches are inadequate for exploiting the SIMD units of modern multi-core CPUs, and it remains unclear how best to mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMD units. Our approach partitions the target sparse matrix into two segments based on the regularity of its memory access patterns, where each segment is stored in a format suited to its access pattern. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model that automatically determines the partition threshold on a per-matrix basis. Our predictive model is trained offline, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5%, and 16% on average on KNL, AMD, and Phytium, respectively.
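
As a rough sketch of the hybrid idea described above (an illustrative assumption, not the authors' implementation): one simple row-wise partition keeps short, regular rows in a zero-padded, SIMD-friendly ELL-style block and spills long, irregular rows to CSR. Here the partition threshold is fixed by hand, whereas the paper's predictive model chooses it per matrix; all names below are illustrative.

import numpy as np
from scipy.sparse import random as sparse_random

def hybrid_spmv(A_csr, x, threshold):
    """Toy hybrid SpMV: rows with <= `threshold` nonzeros go into a zero-padded
    ELL-style block (regular, vectorizable access); the rest stay in CSR."""
    n_rows = A_csr.shape[0]
    nnz_per_row = np.diff(A_csr.indptr)
    regular = np.where(nnz_per_row <= threshold)[0]
    irregular = np.where(nnz_per_row > threshold)[0]

    # ELL-style storage for the regular rows: fixed width, zero-padded.
    width = max(int(threshold), 1)
    ell_cols = np.zeros((len(regular), width), dtype=np.int64)
    ell_vals = np.zeros((len(regular), width))
    for i, r in enumerate(regular):
        lo, hi = A_csr.indptr[r], A_csr.indptr[r + 1]
        ell_cols[i, : hi - lo] = A_csr.indices[lo:hi]
        ell_vals[i, : hi - lo] = A_csr.data[lo:hi]

    y = np.zeros(n_rows)
    # Regular segment: uniform-width rows, amenable to SIMD.
    y[regular] = (ell_vals * x[ell_cols]).sum(axis=1)
    # Irregular segment: plain CSR row-by-row dot products.
    for r in irregular:
        lo, hi = A_csr.indptr[r], A_csr.indptr[r + 1]
        y[r] = A_csr.data[lo:hi] @ x[A_csr.indices[lo:hi]]
    return y

A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(hybrid_spmv(A, x, threshold=12), A @ x)

In practice the regular segment would use explicit SIMD kernels rather than NumPy broadcasting; the point here is only the two-way split and the per-matrix threshold.
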
Conference Paper
Full-text available
We present a high-resolution simulation of the 2004 Sumatra-Andaman earthquake, including non-linear frictional failure on a megathrust-splay fault system. Our method exploits unstructured meshes that capture the complicated geometries of subduction zones, which are crucial for understanding large earthquakes and tsunami generation. These simulations, the largest and longest dynamic rupture simulations to date, enable analysis of dynamic source effects on seafloor displacements. To tackle the extreme size of this scenario, an end-to-end optimization of the simulation code SeisSol was necessary. We implemented a new cache-aware wave propagation scheme and optimized the dynamic rupture kernels using code generation. We also established a novel clustered local-time-stepping scheme for dynamic rupture. In total, we achieved a speed-up of 13.6 compared to the previous implementation. For the Sumatra scenario with 221 million elements, this reduced the time-to-solution to 13.9 hours on 86,016 Haswell cores. Furthermore, we used asynchronous output to overlap I/O and compute time.
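
To make the clustered local-time-stepping idea above concrete, here is a minimal sketch (with an assumed rate-2 binning rule and invented names, not taken from SeisSol's source): each element is binned into the largest cluster k whose step dt_min * 2**k still respects that element's own stability limit, so elements with similar stable time steps advance together.

import numpy as np

def assign_lts_clusters(stable_dt, rate=2):
    """Bin elements into time-step clusters where cluster k advances with
    dt_min * rate**k; each element takes the largest admissible k.
    Conceptual sketch only, not SeisSol's actual implementation."""
    stable_dt = np.asarray(stable_dt, dtype=float)
    dt_min = stable_dt.min()
    clusters = np.floor(np.log(stable_dt / dt_min) / np.log(rate)).astype(int)
    return clusters, dt_min

# Hypothetical per-element stable time steps (e.g. from a CFL condition on the mesh).
rng = np.random.default_rng(1)
stable_dts = rng.uniform(1e-4, 4e-3, size=100_000)
clusters, dt_min = assign_lts_clusters(stable_dts)
for k in sorted(set(clusters)):
    members = int((clusters == k).sum())
    print(f"cluster {k}: dt = {dt_min * 2**k:.2e} s, elements = {members}")

A real local-time-stepping scheme additionally constrains neighbouring elements to differ by at most one cluster so that fluxes can be exchanged consistently; the sketch omits that step.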