PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets
ABSTRACT: The IDX data format provides efficient, cache-oblivious, and progressive access to large-scale scientific datasets by storing the data in a hierarchical Z (HZ) order. Data stored in IDX format can be visualized in an interactive environment, allowing for meaningful exploration with minimal resources. This technology enables real-time, interactive visualization and analysis of large datasets on a variety of systems, ranging from desktop and laptop computers to portable devices such as iPhones and iPads, and over the web. While the existing ViSUS API for writing IDX data is serial, there are obvious advantages to applying the IDX format to the output of large-scale scientific simulations. We have therefore developed PIDX, a parallel API for writing data in the IDX format. With PIDX it is now possible to generate IDX datasets directly from large-scale scientific simulations, with the added advantage of real-time monitoring and visualization of the generated data. In this paper, we provide an overview of the IDX file format and how it is generated using PIDX. We then present a data model description and a novel aggregation strategy to enhance the scalability of the PIDX library. The S3D combustion application is used as an example to demonstrate the efficacy of PIDX for a real-world scientific simulation. S3D is used for fundamental studies of turbulent combustion requiring exceptionally high-fidelity simulations. PIDX achieves up to 18 GiB/s I/O throughput at 8,192 processes when writing S3D data in the IDX format. This allows for interactive analysis and visualization of S3D data, thus enabling in situ analysis of S3D simulations.
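The abstract does not spell out how the HZ order is computed. As a minimal sketch, assuming the commonly described construction (interleave coordinate bits into a Z-order index, then group indices into resolution levels by their number of trailing zero bits), a 2D version might look like the following. The function names and the 2D restriction are illustrative assumptions and are not part of the PIDX or ViSUS APIs.

```python
def morton_encode_2d(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) index.

    x occupies the even bit positions and y the odd ones (an assumed convention).
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_to_hz(z: int, total_bits: int) -> int:
    """Map a Z-order index of `total_bits` bits to a hierarchical Z (HZ) index.

    A sentinel bit is placed just above the index, then the value is shifted
    right until the lowest set bit has been shifted out. Indices with many
    trailing zeros (coarse samples) therefore land at the front of the order.
    """
    marked = z | (1 << total_bits)
    trailing_zeros = (marked & -marked).bit_length() - 1
    return marked >> (trailing_zeros + 1)


def hz_level(hz: int) -> int:
    """Resolution level of an HZ index: 0 is the coarsest."""
    return hz.bit_length()


if __name__ == "__main__":
    bits = 2  # 4 x 4 grid, so Z-order indices use 4 bits in total
    for y in range(1 << bits):
        for x in range(1 << bits):
            z = morton_encode_2d(x, y, bits)
            hz = z_to_hz(z, 2 * bits)
            print(f"(x={x}, y={y}) z={z:2d} hz={hz:2d} level={hz_level(hz)}")
```

In this sketch, the first four HZ indices of a 4 x 4 grid correspond to the samples at (0,0), (0,2), (2,0), and (2,2), that is, a 2 x 2 subsampling of the grid. Reading a prefix of the HZ-ordered data therefore yields a coarse version of the whole domain, which is the property that makes progressive access possible.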
Source available from: Peer-Timo Bremer
Conference Paper: Efficient I/O and storage of adaptive resolution data. Proc. ACM/IEEE Conference on Supercomputing (SC14); 01/2014
Conference Paper: Fast Multi-Resolution Reads of Massive Simulation Datasets. Proc. Int. Supercomputing Conference 2014; 01/2014
ABSTRACT: Fast-growing large-scale systems enable scientific applications to run at a much larger scale and accordingly produce gigantic volumes of simulation output. Such data poses a grand challenge for post-processing tasks such as visualization and data analysis, because these tasks are often performed on a host machine that is remotely located and equipped with far less memory and storage. During simulation runs, it is also desirable for scientists to be able to interactively monitor and steer the progress of the simulation. This requires scientific data to be represented in an efficient form for initial exploration and computational steering. In this paper, we propose DynaM, a software framework that can represent scientific data in a multiresolution form and dynamically organize data blocks into an optimized layout for efficient scientific analysis. DynaM supports a convolution-based multiresolution data representation for abstracting scientific data for visualization at a wide spectrum of resolutions. To support the efficient generation and retrieval of different data granularities from such a representation, DynaM uses a dynamic data organization that caters to the distinct characteristics of data blocks of different sizes, for efficient and balanced I/O performance. Our experimental results demonstrate that DynaM can efficiently represent large scientific datasets and speed up the visualization of multidimensional scientific data. A speedup of up to 29 times is achieved on the Jaguar supercomputer at Oak Ridge National Laboratory.
Networking, Architecture and Storage (NAS), 2013 IEEE Eighth International Conference on; 01/2013