Conference Paper

Patterns and statistical analysis for understanding reduced resource computing.

DOI: 10.1145/1869459.1869525 Conference: Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA
Source: DBLP

ABSTRACT We present several general, broadly applicable mechanisms that enable computations to execute with reduced resources, typically at the cost of some loss in the accuracy of the result they produce.We identify several general computational patterns that interact well with these resource reduction mechanisms, present a concrete manifestation of these patterns in the form of simple model programs, perform simulationbased explorations of the quantitative consequences of applying these mechanisms to our model programs, and relate the model computations (and their interaction with the resource reduction mechanisms) to more complex benchmark applications drawn from a variety of fields.

  • [Show abstract] [Hide abstract]
    ABSTRACT: As HPC system sizes grow to millions of cores and chip feature sizes continue to decrease, HPC applications become increasingly exposed to transient hardware faults. These faults can cause aborts and performance degradation. Most importantly, they can corrupt results. Thus, we must evaluate the fault vulnerability of key HPC algorithms to develop cost-effective techniques to improve application resilience. We present an approach that analyzes the vulnerability of applications to faults, systematically reduces it by protecting the most vulnerable components and predicts application vulnerability at large scales. Weinitially focus on sparse scientific applications and apply our approachin this paper to the Algebraic Multi Grid (AMG) algorithm. We empirically analyze AMG's vulnerability to hardware faults in both sequential and parallel (hybrid MPI/OpenMP) executions on up to 1,600 cores and propose and evaluate the use of targeted pointer replication to reduce it. Our techniques increase AMG's resilience to transient hardware faults by 50-80% and improve its scalability on faulty computational environments by 35%. Further, we show how to model AMG's scalability in fault-prone environments to predict execution times of large-scale runs accurately.
  • [Show abstract] [Hide abstract]
    ABSTRACT: Synchronization overhead is a major bottleneck in scaling parallel applications to a large number of cores. This continues to be true in spite of various synchronization-reduction techniques that have been proposed. Previously studied synchronization-reduction techniques tacitly assume that all synchronizations specified in a source program are essential to guarantee quality of the results produced by the program. Recently there have been proposals to relax the synchronizations in a parallel program and compute approximate results. A fundamental challenge in using relaxed synchronization is guaranteeing that the relaxed program always produces results with a specified quality. We propose a methodology that addresses this challenge in programming with relaxed synchronization. Using our methodology programmers can systematically relax synchronization while always producing results that are of same quality as the original (un-relaxed) program. We demonstrate significant speedups using our methodology on a variety of benchmarks (e.g., up to 15x on KMeans benchmark, and up to 3x on a already highly tuned kernel from Graph500 benchmark).
    Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability; 10/2012
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Dynamic Voltage and Frequency Scaling (DVFS) exhibits fundamental limitations as a method to reduce energy consumption in computing systems. In the HPC domain, where performance is of highest priority and codes are heavily optimized to minimize idle time, DVFS has limited opportunity to achieve substantial energy savings. This paper explores if operating processors Near the transistor Threshold Voltage (NTV) is a better alternative to DVFS for breaking the power wall in HPC. NTV presents challenges, since it compromises both performance and reliability to reduce power consumption. We present a first of its kind study of a significance-driven execution paradigm that selectively uses NTV and algorithmic error tolerance to reduce energy consumption in performance-constrained HPC environments. Using an iterative algorithm as a use case, we present an adaptive execution scheme that switches between near-threshold execution on many cores and above-threshold execution on one core, as the computational significance of iterations in the algorithm evolves over time. Using this scheme on state-of-the-art hardware, we demonstrate energy savings ranging between 35% to 67%, while compromising neither correctness nor performance.
    EnA-HPC 2014; 09/2014


1 Download
Available from