Scientific Computing in the Cloud

Univ. of Washington, Seattle, WA, USA
Computing in Science and Engineering, 07/2010. DOI: 10.1109/MCSE.2010.70
Source: IEEE Xplore

ABSTRACT: Large, virtualized pools of computational resources raise the possibility of a new, advantageous computing paradigm for scientific research. To help achieve this, new tools make the cloud platform behave much like a local homogeneous computer cluster, giving users access to high-performance clusters without requiring them to purchase or maintain sophisticated hardware.

  • ABSTRACT: Academic data centers handle a large share of scientific computing. The user-generated workload changes with upcoming research projects, and in phases of high computational demand it can be useful to temporarily extend the local site by leasing computing resources from a cloud computing provider, e.g. Amazon EC2, to improve service for the local user community. We present a reinforcement learning-based policy that controls the maximum leasing size in an online, adaptive fashion, with regard to the current resource/workload state and the balance between scheduling benefits and costs. Further, we provide an appropriate model to evaluate such policies and present heuristics that determine upper and lower reference values for performance evaluation under the given model. Using event-driven simulation and real workload traces, we investigate the dynamics of the learning policy and demonstrate its adaptivity to workload changes. By reporting its performance as a ratio between costs and scheduling improvement relative to the upper and lower reference heuristics, we demonstrate the benefit of our concept.
    Proceedings of the 18th International Conference on Parallel Processing; 08/2012
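The leasing policy described in the abstract above can be sketched, very loosely, as a tabular Q-learning loop. Everything below — the state bucketing, the action set of leasing limits, and the cost/benefit reward — is an illustrative assumption, not the paper's actual formulation:

```python
import random
from collections import defaultdict

ACTIONS = [0, 2, 4, 8]          # candidate leasing limits (cloud nodes)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def reward(queue_len, leased):
    # Scheduling benefit (shorter queue) traded off against leasing cost.
    return -queue_len - 0.5 * leased

q_table = defaultdict(float)    # (state, action) -> learned value

def choose(state):
    if random.random() < EPSILON:                            # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])   # exploit

def update(state, action, r, next_state):
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        r + GAMMA * best_next - q_table[(state, action)])

# Toy online loop: the state is a coarse bucket of the queue length.
queue = 20
for step in range(200):
    state = queue // 5
    leased = choose(state)
    # Leased nodes drain the queue; new jobs arrive randomly.
    queue = max(0, queue - leased + random.randint(0, 3))
    update(state, leased, reward(queue, leased), queue // 5)
```

The point of the sketch is the online-adaptive structure: the agent keeps re-estimating, per workload state, whether the scheduling benefit of leasing more nodes outweighs the cost.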
  • ABSTRACT: Cloud computing offers new approaches for High Performance Computing (HPC), as it provides dynamically scalable resources as a service over the Internet. In addition, General-Purpose computation on Graphics Processing Units (GPGPU) has gained much attention in scientific computing across multiple domains, becoming an important programming model in HPC. Compute Unified Device Architecture (CUDA) has been established as a popular programming model for GPGPU, removing the need to use graphics APIs for computing applications. Open Computing Language (OpenCL) is an emerging alternative not only for GPGPU but for any parallel architecture. GPU clusters, usually programmed with a hybrid parallel paradigm mixing the Message Passing Interface (MPI) with CUDA/OpenCL, are currently gaining popularity. Cloud providers are therefore deploying clusters with multiple GPUs per node and high-speed network interconnects to make them a feasible option for HPC as a Service (HPCaaS). This paper evaluates GPGPU for high-performance cloud computing on a public cloud infrastructure, Amazon EC2 Cluster GPU Instances (CGI), equipped with NVIDIA Tesla GPUs and a 10 Gigabit Ethernet network. The analysis of the results, obtained using up to 64 GPUs and 256 processor cores, shows that GPGPU is a viable option for high-performance cloud computing, despite the significant impact that virtualized environments still have on network overhead, which hampers the adoption of communication-intensive GPGPU applications. Copyright © 2012 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience, 08/2013; 25(12).
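The network-overhead effect reported in the abstract above can be pictured with a simple Amdahl-style scaling model. The serial fraction and per-GPU communication costs below are made-up illustrative numbers, not measurements from the paper:

```python
# Toy Amdahl-style model of GPU-cluster scaling with a per-GPU
# communication cost; virtualization is modeled only as a higher
# communication cost. All coefficients are illustrative assumptions.

def speedup(n_gpus, serial_frac, comm_cost_per_gpu):
    """Speedup over one GPU when communication cost grows with GPU count."""
    parallel_frac = 1.0 - serial_frac
    total_time = serial_frac + parallel_frac / n_gpus \
                 + comm_cost_per_gpu * n_gpus
    return 1.0 / total_time

# Compare a low-overhead network with a virtualized, higher-overhead one.
for n in (1, 16, 64):
    bare = speedup(n, serial_frac=0.02, comm_cost_per_gpu=0.0005)
    virt = speedup(n, serial_frac=0.02, comm_cost_per_gpu=0.002)
    print(f"{n:3d} GPUs: low overhead {bare:5.1f}x, virtualized {virt:5.1f}x")
```

Even this crude model reproduces the qualitative finding: scaling remains useful in the virtualized case, but the gap widens with GPU count, which is why communication-intensive applications suffer most.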
  • ABSTRACT: The efficient deployment of high-performance computing applications on clouds poses many challenges, particularly for communication-intensive applications. One strategy to mitigate performance overheads caused by high communication latency is to schedule requested virtual machines (VMs) effectively onto physical resources by optimizing VM placement. In current approaches based on Infrastructure-as-a-Service resource provisioning, the task of selecting VM profiles is either left to the user or resolved with policies independent of usage patterns, with potential for performance degradation caused by virtualization overhead. Moreover, most VM placement heuristics disregard the topology of virtual clusters, the resource usage patterns of each application, and competing workloads. In this paper, we study the case of scientific applications in virtual clusters by analyzing how different VM profiles and placements affect the observed performance of a parallel application that uses distributed memory. We propose a description of the characteristics of a virtual cluster through the placement of virtual cores, analyze different configurations with software for the systematic execution of virtual clusters, and explain the observed performance in terms of virtual cluster features and the application's resource usage patterns. Our analysis shows that factors other than the number of cores, such as virtual core mappings and VM spreading, can have a significant effect on performance. We discuss how our methodology can be extended toward developing performance models that are aware of resource contention and virtual cluster topology. Copyright © 2014 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience, 08/2014.
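One way to picture the "VM spreading" factor discussed in the abstract above is a toy placement heuristic that packs a virtual cluster's cores onto as few hosts as possible. The host capacities and the first-fit-decreasing policy are illustrative assumptions, not the paper's method:

```python
# Hypothetical first-fit-decreasing placement: map each VM of a virtual
# cluster onto a physical host, preferring to fill hosts before opening
# new ones, so the cluster "spreads" across fewer hosts and less traffic
# crosses the (slow, virtualized) network.

def place_cluster(vm_cores, host_free_cores):
    """Map each VM name to a host index, largest VMs first."""
    placement = {}
    # Placing the largest VMs first tends to reduce fragmentation.
    for vm, cores in sorted(vm_cores.items(), key=lambda kv: -kv[1]):
        for host, free in enumerate(host_free_cores):
            if free >= cores:
                placement[vm] = host
                host_free_cores[host] -= cores
                break
        else:
            raise RuntimeError(f"no host can fit {vm} ({cores} cores)")
    return placement

hosts = [8, 8, 8]                     # free cores on each physical host
vms = {"vm0": 4, "vm1": 4, "vm2": 2, "vm3": 2}
mapping = place_cluster(vms, hosts)
spread = len(set(mapping.values()))   # how many hosts the cluster touches
print(mapping, "spread:", spread)
```

Here the 12-core virtual cluster lands on two hosts instead of three; the abstract's point is that this kind of mapping decision, not just the core count, shapes observed performance.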
