Conference Paper

The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software.

DOI: 10.1145/1383422.1383440 Conference: Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 23-27 June 2008, Boston, MA, USA
Source: DBLP

ABSTRACT Previous studies have revealed that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads, while exposing numerous benefits for this field. In this study, we are investigating the memory hierarchy characteristics of paravirtualized systems and their impact on automatically-tuned software systems. We are presenting an accurate characterization of memory attributes using hardware counters and user-process accounting. For that, we examine the proficiency of ATLAS, a quintessential example of an autotuning software system, in tuning the BLAS library routines for paravirtualized systems. In addition, we examine the effects of paravirtualization on the performance boundary. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles. Our research thus exposes new benefits to memory-intensive applications arising from the ability to slim down the guest OS without influencing the system performance. In addition, our findings support a novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.


Available from: Haihang You, Jul 06, 2015
  • Source
    ABSTRACT: Scientists are currently evaluating the cloud as a new platform. Many important scientific applications, however, perform poorly in the cloud. These applications proceed in highly parallel discrete time-steps or "ticks," using logical synchronization barriers at tick boundaries. We observe that network jitter in the cloud can severely increase the time required for communication in these applications, significantly increasing overall running time. In this paper, we propose a general parallel framework to process time-stepped applications in the cloud. Our framework exposes a high-level, data-centric programming model which represents application state as tables and dependencies between states as queries over these tables. We design a jitter-tolerant runtime that uses these data dependencies to absorb latency spikes by (1) carefully scheduling computation and (2) replicating data and computation. Our data-driven approach is transparent to the scientist and requires little additional code. Our experiments show that our methods improve performance up to a factor of three for several typical time-stepped applications.
  • Source
    ABSTRACT: We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology for sharing resources in a precise and controlled manner. We justify our approach and propose several job scheduling algorithms. We present results obtained in simulations for synthetic and real-world High Performance Computing (HPC) workloads, in which we compare our proposed algorithms with standard batch scheduling algorithms. We find that our approach widely outperforms batch scheduling. We also identify a few promising algorithms that perform well across most experimental scenarios. Our results demonstrate that virtualization technology coupled with lightweight scheduling strategies affords dramatic improvements in performance for HPC workloads.
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on; 05/2010