Conference Paper

Performance Profiling and Analysis of DoD Applications Using PAPI and TAU

University of Tennessee-Knoxville, Knoxville, TN
DOI: 10.1109/DODUGC.2005.50 Conference: Users Group Conference, 2005
Source: IEEE Xplore


Large scientific applications developed as recently as five to ten years ago are often at a disadvantage in current computing environments. Because acquisition decisions are frequently made for reasons such as price-performance, continuing production runs often requires porting large scientific applications to architectures completely different from the ones on which they were developed. Since the porting step does not include the optimizations necessary for the new architecture, performance often suffers due to various architectural features. The Programming Environment and Training (PET) Computational Environments (CE) team has developed and deployed procedures and mechanisms for collecting performance data and for profiling and optimizing these applications based on that data. The paper illustrates some of these procedures and mechanisms.

Available from: Shirley V Moore, Mar 22, 2014
  • Source
    • "Existing performance tools such as TAU [12], PAPI[9], KOJAK [8], Paradyn [7], IBM's XProfiler and others [3] [1], each have different functionalities to provide different aspects of performance testing of software. For example, TAU provides aggregate profiling data, KOJAK has tracing and analyzing functionality, PAPI can be used to access hardware counter information, and Paradyn can be use for binary code instrumentation. "
    ABSTRACT: Modern performance tools provide methods for easy integration into an application for performance evaluation. For a large-scale scientific software package that has been under development for decades and with developers around the world, several obstacles must be overcome in order to utilize modern performance tools and explore performance bottlenecks. In this paper, we present our experience in integrating performance tools with one popular computational chemistry package. We discuss the difficulties we encountered and the mechanisms developed to integrate performance tools into this code. With performance tools integrated, we show one of the initial performance evaluation results, and discuss what other challenges we are facing to conduct performance evaluation for large-scale scientific packages.
    21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA; 01/2007
  • Source
    ABSTRACT: Benchmarks that measure memory bandwidth, such as STREAM, Apex-MAPS and MultiMAPS, are increasingly popular due to the "Von Neumann" bottleneck of modern processors which causes many calculations to be memory-bound. We present a scheme for predicting the performance of HPC applications based on the results of such benchmarks. A Genetic Algorithm approach is used to "learn" bandwidth as a function of cache hit rates per machine with MultiMAPS as the fitness test. The specific results are 56 individual performance predictions including 3 full-scale parallel applications run on 5 different modern HPC architectures, with various CPU counts and inputs, predicted within 10% average difference with respect to independently verified runtimes.
    Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, SC 2007, November 10-16, 2007, Reno, Nevada, USA; 01/2007