Conference Paper

24/7 Characterization of Petascale I/O Workloads

Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
DOI: 10.1109/CLUSTR.2009.5289150 Conference: Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Source: IEEE Xplore

ABSTRACT Developing and tuning computational science applications to run on extreme scale systems are increasingly complicated processes. Challenges such as managing memory access and tuning message-passing behavior are made easier by tools designed specifically to aid in these processes. Tools that can help users better understand the behavior of their application with respect to I/O have not yet reached the level of utility necessary to play a central role in application development and tuning. This deficiency in the tool set means that we have a poor understanding of how specific applications interact with storage. Worse, the community has little knowledge of what sorts of access patterns are common in today's applications, leading to confusion in the storage research community as to the pressing needs of the computational science community. This paper describes the Darshan I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with the minimum possible overhead. This characterization can shed important light on the I/O behavior of applications at extreme scale. Darshan also can enable researchers to gain greater insight into the overall patterns of access exhibited by such applications, helping the storage community to understand how to best serve current computational science applications and better predict the needs of future applications. In this work we demonstrate Darshan's ability to characterize the I/O behavior of four scientific applications and show that it induces negligible overhead for I/O intensive jobs with as many as 65,536 processes.
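The abstract describes Darshan as a low-overhead tool that intercepts application I/O and accumulates per-file statistics. The sketch below is a hypothetical Python illustration of that wrapper-based counter idea only — it is not Darshan's actual implementation (which interposes on MPI-IO and POSIX calls in C); the class and counter names are invented for this example.

```python
import os
import tempfile
import time

class IOCounters:
    """Accumulates Darshan-style per-file statistics: operation
    counts, bytes moved, and cumulative time spent in I/O calls."""
    def __init__(self):
        self.reads = 0
        self.writes = 0
        self.bytes_read = 0
        self.bytes_written = 0
        self.io_time = 0.0

class CharacterizedFile:
    """Thin wrapper around a file object that updates counters on
    every read/write, analogous in spirit to Darshan's interposed
    I/O wrappers (illustrative only)."""
    def __init__(self, path, mode, counters):
        self._f = open(path, mode)
        self._c = counters

    def read(self, n=-1):
        t0 = time.perf_counter()
        data = self._f.read(n)
        self._c.io_time += time.perf_counter() - t0
        self._c.reads += 1
        self._c.bytes_read += len(data)
        return data

    def write(self, data):
        t0 = time.perf_counter()
        n = self._f.write(data)
        self._c.io_time += time.perf_counter() - t0
        self._c.writes += 1
        self._c.bytes_written += n
        return n

    def close(self):
        self._f.close()

# Counters outlive the file handle, so a compact summary can be
# emitted once at program shutdown rather than per operation.
counters = IOCounters()
path = os.path.join(tempfile.gettempdir(), "darshan_demo.dat")
f = CharacterizedFile(path, "wb", counters)
f.write(b"x" * 4096)
f.close()
print(counters.writes, counters.bytes_written)  # → 1 4096
```

Because only counters are kept (not a full trace), the memory and I/O cost of the characterization stays constant regardless of how many operations the application performs — consistent with the paper's claim of negligible overhead at scale.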

  • ABSTRACT: The lasting memory-wall problem, combined with the newly emerged big-data problem, makes data access delay a first-order concern in performance optimization for cluster computing. Reducing data access delay, however, is application dependent: it depends on the data access behaviors of the underlying applications. Learning and understanding data access behaviors is therefore a prerequisite for effective data access optimization. Modern microprocessors are equipped with hardware data prefetchers, which predict data access patterns and prefetch data for the CPU; memory systems, however, are not designed with the capability to understand data access behaviors for performance optimization. In this study, we propose a novel approach, named KNOWAC, to collect I/O information automatically through high-level I/O libraries. KNOWAC accumulates I/O knowledge and reveals data usage patterns by exploring the collected high-level I/O characteristics. The discovered data usage patterns can be used for different I/O optimizations. In this study we apply KNOWAC to I/O prefetching under the framework of PnetCDF. Experimental results on a real-world application show that KNOWAC is promising and has practical value in mitigating the I/O bottleneck.
    Proceedings of the 2012 IEEE International Conference on Cluster Computing; 09/2012
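The KNOWAC abstract above hinges on recognizing data usage patterns well enough to prefetch ahead of the application. A minimal sketch of that idea, assuming nothing about KNOWAC's actual interfaces (the function names here are invented), is detecting a fixed stride in observed access offsets and extrapolating the next request:

```python
def detect_stride(offsets):
    """Return the constant stride if the observed access offsets form
    a fixed-stride sequence, else None. Requires at least three
    observations to distinguish a pattern from coincidence."""
    if len(offsets) < 3:
        return None
    stride = offsets[1] - offsets[0]
    if all(b - a == stride for a, b in zip(offsets, offsets[1:])):
        return stride
    return None

def predict_next(offsets):
    """Predict the next access offset from an observed pattern;
    a prefetcher would issue a read for this offset ahead of the
    application. Returns None when no pattern is recognized."""
    stride = detect_stride(offsets)
    if stride is None:
        return None
    return offsets[-1] + stride

# A strided record read, e.g. one variable in a netCDF-like layout
# accessed every 4 KiB:
trace = [0, 4096, 8192, 12288]
print(predict_next(trace))  # → 16384
```

The point of collecting access information at the high-level library (PnetCDF) rather than the block layer is that offsets arrive already grouped per variable, so simple detectors like this one see clean strides instead of interleaved noise.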
  • ABSTRACT: Input/output (I/O) operations can represent a significant proportion of the run-time when large scientific applications are run in parallel. Although there have been advances in the form of file-format libraries, file-system design, and I/O hardware, a growing divergence exists between the performance of parallel file-systems and compute processing rates. The effect is often a bottleneck when any form of file-system interaction is required. In this paper we present RIOT, an input/output tracing toolkit being developed at the University of Warwick for dynamic attachment to parallel applications. The two-stage tracing process includes a lightweight library to record I/O events and an in-depth post-execution analysis tool to extract performance metrics such as MPI-IO bandwidth, effective POSIX/file-system bandwidth, the time spent obtaining or releasing file locks (individually or in aggregate), and temporal information relating to parallel file activity. We present a case study on the use of RIOT for three standard industry I/O benchmarks: the BT-IO micro-application from NASA's Parallel Benchmark suite; FLASH-IO, a benchmark which replicates the checkpointing operations of the FLASH thermonuclear star modelling code; and IOR, an industry-standard I/O benchmark using HDF-5 and MPI-IO. Furthermore, we utilise RIOT to assess these codes when running with the Parallel Log-structured File System (PLFS) middleware developed by the Los Alamos National Laboratory.
  • Euro-Par 2014 Parallel Processing; 08/2014
