Project log
A growing disparity between supercomputer computation speeds and I/O rates makes it increasingly infeasible for applications to save all results for offline analysis. Instead, applications must analyze and reduce data online so as to output only those results needed to answer target scientific question(s). This change in focus complicates application and experiment design and introduces algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of supercomputer systems. We review these challenges and describe methods and tools that we are developing to enable experimental exploration of algorithmic, software, and system design alternatives.
Because of the vast volume of data being produced by today's scientific simulations and experiments, lossy data compressors that allow user-controlled loss of accuracy during compression are a relevant solution for significantly reducing data size. However, lossy compressor developers and users lack a tool to explore the features of scientific datasets and to understand how the data are altered by compression in a systematic and reliable way. To address this gap, we have designed and implemented a generic framework called Z-checker. On the one hand, Z-checker combines a battery of data analysis components for data compression. On the other hand, Z-checker is implemented as an open-source community tool to which users and developers can contribute new analysis components based on their additional analysis demands. In this paper, we present a survey of existing lossy compressors. Then we describe the design of Z-checker, in which we integrate evaluation metrics proposed in prior work as well as other analysis tools. Specifically, for lossy compressor developers, Z-checker can be used to characterize critical properties of any dataset to improve compression strategies. For lossy compression users, Z-checker can assess compression quality, providing various global distortion analyses that compare the original data with the decompressed data, as well as statistical analyses of the compression error. Z-checker can perform the analysis with either coarse or fine granularity, so that users and developers can select the best-fit, adaptive compressors for different parts of a dataset. Z-checker features a visualization interface displaying all analysis results in addition to basic views of the datasets, such as time series. To the best of our knowledge, Z-checker is the first tool designed to assess lossy compression comprehensively for scientific datasets.
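As an illustration of the kind of global distortion statistics such an assessment automates, the following minimal Python sketch compares an original field with its decompressed counterpart. It is not Z-checker's actual interface; the metric selection, the toy quantization "compressor", and the array shapes are assumptions made only for this demonstration.

import numpy as np

def distortion_stats(original: np.ndarray, decompressed: np.ndarray) -> dict:
    """Compare a decompressed field against the original field."""
    err = decompressed.astype(np.float64) - original.astype(np.float64)
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {
        "max_abs_error": float(np.abs(err).max()),
        "rmse": rmse,
        # Peak signal-to-noise ratio relative to the data's value range.
        "psnr_db": float(20 * np.log10(value_range / rmse)) if rmse > 0 else float("inf"),
        "pearson_corr": float(np.corrcoef(original.ravel(), decompressed.ravel())[0, 1]),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = np.cumsum(rng.normal(size=(256, 256)), axis=1)  # smooth-ish toy field
    # Stand-in for a lossy compressor: quantize to a fixed absolute error bound.
    error_bound = 1e-2
    decompressed = np.round(field / (2 * error_bound)) * (2 * error_bound)
    print(distortion_stats(field, decompressed))

In practice a tool would report such statistics per variable and per region, which is what enables selecting different compressors for different parts of a dataset.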
High-accuracy scientific simulations on high performance computing (HPC) platforms generate large amounts of data. To allow data to be efficiently analyzed, simulation outputs need to be refactored, compressed, and properly mapped onto storage tiers. This paper presents Canopus, a progressive data management framework for storing and analyzing big scientific data. Canopus allows simulation results to be refactored into a much smaller dataset along with a series of deltas, with fairly low overhead. Then, the refactored data are compressed, mapped, and written onto storage tiers. For data analytics, refactored data are selectively retrieved to restore data at a specific level of accuracy that satisfies analysis requirements. Canopus enables end users to make trade-offs between analysis speed and accuracy on the fly. Canopus is demonstrated and thoroughly evaluated using blob detection on fusion simulation data.
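A minimal sketch of the progressive-refactoring idea described above, assuming a simple decimate-by-two hierarchy with a nearest-neighbour predictor. This is not the Canopus implementation; the level count, predictor, and function names are illustrative assumptions. Restoring with only the first deltas yields a coarser but cheaper-to-fetch view, while applying all deltas recovers the original field exactly.

import numpy as np

def refactor(data: np.ndarray, levels: int):
    """Split `data` into a coarse base plus per-level deltas (coarse to fine)."""
    deltas = []
    current = data
    for _ in range(levels):
        coarse = current[::2]                              # decimate by 2
        predicted = np.repeat(coarse, 2)[: current.size]   # cheap predictor
        deltas.append(current - predicted)                 # what the coarse level misses
        current = coarse
    return current, deltas[::-1]                           # base, deltas from coarse to fine

def restore(base: np.ndarray, deltas, upto: int):
    """Restore the field using only the first `upto` deltas (0 = base accuracy)."""
    current = base
    for delta in deltas[:upto]:
        current = np.repeat(current, 2)[: delta.size] + delta
    return current

if __name__ == "__main__":
    field = np.sin(np.linspace(0, 8 * np.pi, 1024))
    base, deltas = refactor(field, levels=3)
    coarse_view = restore(base, deltas, upto=1)    # fast, lower accuracy
    exact_view = restore(base, deltas, upto=3)     # full accuracy
    print(base.size, coarse_view.size, exact_view.size,
          np.allclose(exact_view, field))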