PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets
SCI Inst., Univ. of Utah, Salt Lake City, UT, USA
DOI: 10.1109/CLUSTER.2011.19
Conference: Cluster Computing (CLUSTER), 2011 IEEE International Conference on
The IDX data format provides efficient, cache-oblivious, and progressive access to large-scale scientific datasets by storing the data in a hierarchical Z (HZ) order. Data stored in the IDX format can be visualized in an interactive environment, allowing for meaningful exploration with minimal resources. This technology enables real-time, interactive visualization and analysis of large datasets on a variety of systems, ranging from desktop and laptop computers to portable devices such as iPhones and iPads, and over the web. While the existing ViSUS API for writing IDX data is serial, there are obvious advantages to applying the IDX format to the output of large-scale scientific simulations. We have therefore developed PIDX, a parallel API for writing data in the IDX format. With PIDX it is now possible to generate IDX datasets directly from large-scale scientific simulations, with the added advantage of real-time monitoring and visualization of the generated data. In this paper, we provide an overview of the IDX file format and how it is generated using PIDX. We then present a data model description and a novel aggregation strategy to enhance the scalability of the PIDX library. The S3D combustion application is used as an example to demonstrate the efficacy of PIDX for a real-world scientific simulation. S3D is used for fundamental studies of turbulent combustion requiring exceptionally high-fidelity simulations. PIDX achieves up to 18 GiB/s I/O throughput at 8,192 processes when writing S3D data in the IDX format. This allows for interactive analysis and visualization of S3D data, thereby enabling in situ analysis of S3D simulations.
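The hierarchical Z (HZ) order named in the abstract can be illustrated with a minimal Python sketch. It is not taken from PIDX or ViSUS: the function names (`morton2d`, `hz_from_z`, `hz_level`, `hz_layout`) and the 2D bit-interleaving convention are assumptions made for illustration, following the commonly described construction in which a Z-order (Morton) index is regrouped by resolution level.

```python
def morton2d(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x occupies even bit positions (assumed convention)
        z |= ((y >> i) & 1) << (2 * i + 1)   # y occupies odd bit positions
    return z


def hz_from_z(z, total_bits):
    """Map a Z-order index to an HZ index (assumed construction, see lead-in)."""
    z |= 1 << total_bits   # sentinel bit just above the index bits
    z //= z & -z           # strip trailing zeros by dividing by the lowest set bit
    return z >> 1          # drop the lowest set bit itself


def hz_level(hz):
    """Resolution level of an HZ index: 0 is coarsest; level k > 0 spans [2**(k-1), 2**k)."""
    return hz.bit_length()


def hz_layout(bits):
    """Return {hz_index: (x, y)} for a square grid with 2**bits samples per side."""
    side = 1 << bits
    layout = {}
    for y in range(side):
        for x in range(side):
            layout[hz_from_z(morton2d(x, y, bits), 2 * bits)] = (x, y)
    return layout


layout = hz_layout(2)                            # 4x4 grid, 16 samples
coarse = sorted(layout[i] for i in range(4))     # samples stored in the first 4 slots
```

In this sketch the first four HZ positions of the 4x4 grid hold the samples at even coordinates, so reading only a prefix of the one-dimensional array yields a coarse subsampling of the data. That is the sense in which the layout supports progressive access.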
Available from: Peer-Timo Bremer
- "An important feature utilized by I/O libraries in general is aggregation, which is a stage in the write pipeline that passes data between nodes such that aggregator nodes can do more efficient block-based writes. PIDX is a parallel I/O API that stores data in the IDX format. PIDX also uses aggregators, and recently added a restructuring phase that increases efficiency on writes of data in grids that are not powers of 2"
ABSTRACT: We present an efficient, flexible, adaptive-resolution I/O framework that is suitable for both uniform and Adaptive Mesh Refinement (AMR) simulations. In an AMR setting, current solutions typically represent each resolution level as an independent grid which often results in inefficient storage and performance. Our technique coalesces domain data into a unified, multiresolution representation with fast, spatially aggregated I/O. Furthermore, our framework easily extends to importance-driven storage of uniform grids, for example, by storing regions of interest at full resolution and nonessential regions at lower resolution for visualization or analysis. Our framework, which is an extension of the PIDX framework, achieves state of the art disk usage and I/O performance regardless of resolution of the data, regions of interest, and the number of processes that generated the data. We demonstrate the scalability and efficiency of our framework using the Uintah and S3D large-scale combustion codes on the Mira and Edison supercomputers.
Proc. ACM/IEEE Conference on Supercomputing (SC14); 01/2014
Available from: Sidharth Kumar
- "Parallel scientific simulations often produce large volumes of data. In order to aid in structure and efficient access of these huge data sets, a variety of high level I/O libraries such as pnetCDF, Parallel HDF5 and Parallel IDX have been proposed. A significant amount of research has been devoted to characterize and model the I/O and network behavior of High Performance Computing (HPC) systems."
ABSTRACT: Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.
Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, Colorado; 01/2013
Available from: Valerio Pascucci
- "IV. THREE-PHASE I/O: The two-phase aggregation algorithm for PIDX distributes data to a subset of processes after the HZ encoding step. This technique is efficient only in cases in which the local HZ encoding at each process produces a dense buffer."
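The two-phase idea in the excerpt above (processes first hand their data to a subset of aggregators, which then issue large contiguous writes) can be sketched as a single-process simulation in Python. This is an illustrative sketch only, not the PIDX implementation: the names `two_phase_aggregate` and `write_file`, the dictionary-based rank buffers, and the even split of the index space across aggregators are all assumptions.

```python
def two_phase_aggregate(rank_buffers, index_space, num_aggregators):
    """Phase 1 (simulated): route each sample to the aggregator owning its index.

    rank_buffers: one dict per simulated process, mapping a global 1D index
    (for example an HZ index) to a value. Each aggregator owns one contiguous
    slice of the index space; None marks indices that no process supplied.
    """
    chunk = -(-index_space // num_aggregators)   # ceiling division
    buffers = []
    for a in range(num_aggregators):
        start = a * chunk
        size = max(0, min(chunk, index_space - start))
        buffers.append([None] * size)
    for local in rank_buffers:
        for index, value in local.items():
            buffers[index // chunk][index % chunk] = value
    return buffers


def write_file(buffers):
    """Phase 2 (simulated): each aggregator appends its block in one large write."""
    out = []
    for block in buffers:
        out.extend(block)
    return out


# Four simulated processes, each holding a strided (non-contiguous) set of indices.
ranks = [{i: i * 10 for i in range(r, 16, 4)} for r in range(4)]
blocks = two_phase_aggregate(ranks, index_space=16, num_aggregators=2)
file_contents = write_file(blocks)
```

Each simulated process holds scattered indices, yet every aggregator ends up with one contiguous block. When the per-process data is sparse in the index space, the aggregator blocks contain gaps (the `None` entries here), which is one way to picture the inefficiency the excerpt describes.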
ABSTRACT: Hierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a hierarchical Z (HZ) ordering that translates data from a Cartesian coordinate scheme to a one-dimensional array ordered by locality at different resolution levels. However, when the dimensions of the simulation data are not an even power of 2, parallel HZ ordering produces sparse memory and network access patterns that inhibit I/O performance. This work presents a new technique for parallel HZ ordering of simulation datasets that restructures simulation data into large (power of 2) blocks to facilitate efficient I/O aggregation. We perform both weak and strong scaling experiments using the S3D combustion application on both Cray-XE6 (65,536 cores) and IBM Blue Gene/P (131,072 cores) platforms. We demonstrate that data can be written in hierarchical, multiresolution format with performance competitive to that of native data-ordering methods.
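The sparse access pattern described in this abstract can be made concrete with a small Python sketch that counts how many contiguous runs of HZ indices a box of samples occupies. It is an illustration under stated assumptions, not the paper's restructuring algorithm: the helper names (`morton2d`, `hz_from_z`, `hz_runs`), the 2D setting, and the HZ construction itself (a commonly described one) are not taken from the source.

```python
def morton2d(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def hz_from_z(z, total_bits):
    """Map a Z-order index to an HZ index (assumed construction)."""
    z |= 1 << total_bits   # sentinel bit above the index bits
    z //= z & -z           # strip trailing zeros
    return z >> 1


def hz_runs(x0, y0, width, height, bits):
    """Count contiguous runs of HZ indices covered by a box of samples."""
    indices = sorted(
        hz_from_z(morton2d(x, y, bits), 2 * bits)
        for y in range(y0, y0 + height)
        for x in range(x0, x0 + width)
    )
    # A new run starts wherever two consecutive sorted indices are not adjacent.
    return 1 + sum(1 for a, b in zip(indices, indices[1:]) if b != a + 1)


# 8x8 domain (bits=3): a 4x4 box aligned to a power-of-2 block boundary
# versus a 4x4 box shifted by one sample in each direction.
aligned_runs = hz_runs(4, 0, 4, 4, bits=3)
unaligned_runs = hz_runs(1, 1, 4, 4, bits=3)
```

In this toy setting the aligned box maps to one run per resolution level, while the shifted box of the same size fragments into noticeably more runs. Per-process boxes whose extents are not powers of 2 generally start at such unaligned offsets, which suggests why regrouping data into power-of-2 blocks before HZ encoding helps aggregation.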
SC Conference for High Performance Computing, Networking, Storage, and Analysis; 11/2012