Conference Paper

The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software.

DOI: 10.1145/1383422.1383440 Conference: Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 23-27 June 2008, Boston, MA, USA
Source: DBLP

ABSTRACT Previous studies have revealed that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads while exposing numerous benefits for this field. In this study, we investigate the memory hierarchy characteristics of paravirtualized systems and their impact on automatically-tuned software systems. We present an accurate characterization of memory attributes using hardware counters and user-process accounting. To this end, we examine the proficiency of ATLAS, a quintessential example of an autotuning software system, in tuning the BLAS library routines for paravirtualized systems. In addition, we examine the effects of paravirtualization on the performance boundary. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles. Our research thus exposes new benefits to memory-intensive applications arising from the ability to slim down the guest OS without influencing system performance. In addition, our findings support a novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.
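As an illustrative sketch (our own, not code from the paper), the kind of memory-hierarchy characterization the abstract describes can be approximated by timing strided traversals over working sets of increasing size: jumps in per-access latency hint at cache-level boundaries, which is the signal an autotuner such as ATLAS exploits when choosing blocking factors. All names and parameters below are hypothetical.

```python
import time

def probe_working_sets(sizes_kb, stride=64, reps=20_000):
    """Time strided touches over buffers of increasing size.

    Returns a list of (size_kb, seconds_per_access) pairs. On real
    hardware (and in a paravirtualized guest), plotting these timings
    against working-set size exposes latency steps at cache boundaries.
    This is a coarse sketch; ATLAS uses far more careful micro-benchmarks.
    """
    results = []
    for kb in sizes_kb:
        n = kb * 1024
        buf = bytearray(n)          # working set of `kb` kilobytes
        idx = 0
        start = time.perf_counter()
        for _ in range(reps):
            buf[idx] ^= 1           # touch one byte per stride
            idx = (idx + stride) % n
        elapsed = time.perf_counter() - start
        results.append((kb, elapsed / reps))
    return results
```

A native-vs-guest comparison of such profiles is essentially what the paper's "nearly identical memory hierarchy performance profiles" claim refers to.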

  • International Journal of Computer Networks, 01/2012; 4(2).
    ABSTRACT: Large-scale computing platforms provide tremendous capabilities for scientific discovery. These systems have hundreds of thousands of computing cores, hundreds of terabytes of memory, and enormous high-performance interconnection networks, and they face enormous challenges in achieving performance at such scale. Failures are an Achilles heel of these systems. As applications and system software scale up to multi-petaflop and beyond to exascale platforms, the occurrence of failure will be much more common. This has given rise to a push in fault-tolerance and resilience research for HPC systems. This includes work on log analysis to identify types of failures, enhancements to the Message Passing Interface (MPI) to incorporate fault awareness, and a variety of fault tolerance mechanisms that span redundant computation, algorithm-based fault tolerance, and advanced checkpoint/restart techniques. While there is much work to be done on the FT/resilience mechanisms for such large-scale systems, there is also a profound gap in the tools for experimentation. This gap is compounded by the fact that HPC environments have stringent performance requirements and are often highly customized. The tool chain for these systems is often tailored to the platform, and while the majority of systems on the Top500 Supercomputer list run Linux, these operating environments typically contain many site/machine-specific enhancements. Therefore, it is desirable to maintain a consistent execution environment to minimize end-user (scientist) interruption. Work on system-level virtualization for HPC systems offers a unique opportunity to maintain a consistent execution environment via a virtual machine (VM). Recent work on virtualization for HPC has shown that low-overhead, high-performance systems can be realized. Virtualization also provides a clean abstraction for building experimental tools to investigate the effects of failures in HPC and the related research on FT/resilience mechanisms and policies. In this paper we discuss the motivation for tools to perform fault injection in an HPC context, and outline an approach that can leverage virtualization.
    ABSTRACT: We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or extensions to existing batch scheduling systems, while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with job performance. We derive absolute performance bounds and develop algorithms for the online nonclairvoyant version of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach improves over batch scheduling by orders of magnitude in terms of job stretch, while leading to comparable or better resource utilization. Our results demonstrate that virtualization technology coupled with lightweight online scheduling strategies can afford dramatic improvements in performance for executing HPC workloads.
    IEEE Transactions on Parallel and Distributed Systems, 04/2012.
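The job-stretch metric this abstract evaluates against can be made concrete with a short sketch (our own illustration, not code from the paper): a job's stretch is its turnaround time divided by its runtime on dedicated resources, so 1.0 means no slowdown and larger values mean more waiting or resource sharing.

```python
def stretch(arrival, completion, ideal_runtime):
    """Turnaround time relative to an uninterrupted dedicated run."""
    return (completion - arrival) / ideal_runtime

def max_stretch(jobs):
    """Worst-case stretch over a workload given as
    (arrival, completion, ideal_runtime) tuples."""
    return max(stretch(a, c, r) for a, c, r in jobs)
```

For example, a job that arrives at t=0, finishes at t=10, but needs only 5 time units on a dedicated node has stretch 2.0; schedulers that share fractional node resources via VMs aim to keep the worst such value low.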
