Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

This paper introduces a platform to support serverless computing for scalable event-driven data processing that features a multi-level elasticity approach combined with virtualization of GPUs. The platform supports the execution of applications based on Docker containers in response to file uploads to a data storage in order to perform the data processing in parallel. This is managed by an elastic Kubernetes cluster whose size automatically grows and shrinks depending on the number of files to be processed. To accelerate the processing time of each file, several approaches involving virtualized access to GPUs, either local or remote, have been evaluated. A use case involving deep learning inference on transthoracic echocardiography imaging has been carried out to assess the benefits and limitations of the platform. The results indicate that the combination of serverless computing and GPU virtualization introduces an efficient and cost-effective event-driven accelerated computing approach that can be applied to a wide variety of scientific applications.
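The multi-level elasticity described above can be sketched as a simple scaling rule: the worker pool grows with the backlog of uploaded files and shrinks back to zero when idle. This is an illustrative sketch only; the function and parameter names (`desired_cluster_size`, `files_per_node`, `max_nodes`) are assumptions, not the platform's actual API.

```python
import math

def desired_cluster_size(pending_files: int, files_per_node: int = 4,
                         min_nodes: int = 0, max_nodes: int = 10) -> int:
    """Derive the worker-node count from the event backlog (scale-to-zero allowed)."""
    if pending_files <= 0:
        return min_nodes
    wanted = math.ceil(pending_files / files_per_node)
    return max(min_nodes, min(wanted, max_nodes))

print(desired_cluster_size(0))    # idle cluster shrinks to the minimum (0)
print(desired_cluster_size(9))    # 9 files at 4 per node -> 3 nodes
print(desired_cluster_size(100))  # large backlog is capped at max_nodes (10)
```

In a real deployment, a controller would compare this target against the current Kubernetes cluster size and add or drain nodes accordingly.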



... In order to share a physical GPU among containers, some works directly assign an entire physical GPU to one or more containers, while others partition physical GPUs into multiple virtual GPUs (vGPUs) and allocate one or more vGPUs to the container applying for GPU resources. Furthermore, prior work [10] has extended Kubernetes to enable remote GPU virtualization, which lets containers running on non-GPU nodes share the GPU resources of GPU nodes for task acceleration. Remote GPU virtualization allows the nodes of the cluster to share the GPUs present in the computing facility, which increases overall GPU utilization and reduces energy consumption and the number of GPUs installed in the cluster [11]. ...
... The API forwarding solution can overcome the limitations of the black-box GPU driver by virtualizing GPUs at the library level, but it is limited by its performance overhead and functional incompleteness [13]. Previous work [10] on remote GPU virtualization in Kubernetes mainly focuses on GPU acceleration; thus, the problems of communication overhead and shared-resource interference remain unsolved. ...
... Several studies [16,27-29] further extend ConVGPU to support compute-resource usage isolation on kernel execution based on API forwarding. Moreover, OSCAR [10] uses API forwarding to provide serverless functions with access to remote GPUs from the containers of a Kubernetes cluster. ...
Article
Full-text available
As an increasing number of new containerized applications, such as high-performance computing and deep learning applications, come to rely on GPUs, efficiently supporting GPUs in container clouds becomes essential. While GPU sharing has been extensively studied for VMs, limited work has been done for containers. Existing works use only a single specific GPU virtualization technique to deploy containers, such as GPU pass-through or API forwarding, and lack remote GPU virtualization optimization. These limitations lead to low system throughput and container performance degradation, due to the dynamic and heterogeneous nature of container resource requirements and GPU virtualization techniques, and to communication overhead and resource racing. Therefore, we designed and implemented KubeGPU, which extends Kubernetes to enable GPU sharing with an adaptive sharing strategy. This strategy gives KubeGPU the ability to dynamically choose a GPU virtualization technique when deploying containers, according to the available GPU resources and the containers' configuration parameters (such as their GPU resource requirements), in order to achieve good container performance and system throughput. In addition, a network-aware scheduling approach and fine-grained allocation of remote GPU resources are proposed to optimize remote GPU virtualization. Finally, using representative real-world workloads for HPC and deep learning, we demonstrate the superiority of KubeGPU over existing works, and its effectiveness in minimizing communication overhead and eliminating remote GPU resource racing.
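The adaptive sharing strategy described in the abstract above can be illustrated with a small decision rule: pick a virtualization technique per container from its GPU demand and what is free locally, falling back to a remote GPU otherwise. This is a hypothetical sketch, not KubeGPU's actual code; the function name, thresholds, and mode labels are all illustrative assumptions.

```python
def choose_gpu_mode(requested_mem_gb: float, local_free_gb: float,
                    whole_gpu_gb: float = 16.0) -> str:
    """Pick a GPU virtualization technique for one container (illustrative)."""
    if requested_mem_gb >= whole_gpu_gb and local_free_gb >= whole_gpu_gb:
        return "passthrough"        # dedicate an entire physical GPU
    if requested_mem_gb <= local_free_gb:
        return "local-vgpu"         # carve a vGPU slice out of a local GPU
    return "remote-api-forwarding"  # borrow a GPU on another node via API forwarding

print(choose_gpu_mode(16, 16))  # whole-GPU demand, whole GPU free -> passthrough
print(choose_gpu_mode(2, 8))    # small demand fits locally -> local-vgpu
print(choose_gpu_mode(4, 1))    # no local room -> remote-api-forwarding
```

A real scheduler would also weigh network distance to the remote GPU node, which is what the paper's network-aware scheduling addresses.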
... As a result, research focus has been increasingly directed towards expanding FaaS service offerings by adding access to hardware accelerators. Naranjo et al. introduce a GPU-enabled serverless framework which links virtual GPUs with the OpenFaaS serverless framework via the rCUDA [42] remote GPU virtualization service [91]. Ringlein et al. propose a system architecture involving disaggregated FPGAs within a FaaS offering [102]. ...
... With increased exploitation of serverless for different application domains, there has been a rise in demand for access to specialized hardware where required. Research on such flexible serverless frameworks is growing [91] and has great potential. ...
Article
Full-text available
Serverless computing has emerged as an attractive deployment option for cloud applications in recent times. The unique features of this computing model include rapid auto-scaling, strong isolation, fine-grained billing options and access to a massive service ecosystem which autonomously handles resource management decisions. This model is increasingly being explored for deployments in geographically distributed edge and fog computing networks as well, due to these characteristics. Effective management of computing resources has always gained a lot of attention among researchers. The need to automate the entire process of resource provisioning, allocation, scheduling, monitoring and scaling, has resulted in the need for specialized focus on resource management under the serverless model. In this article, we identify the major aspects covering the broader concept of resource management in serverless environments and propose a taxonomy of elements which influence these aspects, encompassing characteristics of system design, workload attributes and stakeholder expectations. We take a holistic view on serverless environments deployed across edge, fog and cloud computing networks. We also analyse existing works discussing aspects of serverless resource management using this taxonomy. This article further identifies gaps in literature and highlights future research directions for improving capabilities of this computing model.
... Heterogeneous accelerator support. Besides studies of GPU support [125,160] and FPGA support [62,176], other accelerators such as the Tensor Processing Unit (TPU) should also be considered by cloud providers of serverless computing. ...
Preprint
Serverless computing is an emerging cloud computing paradigm that has become an attractive development option for software developers building cloud-based applications. Its most significant advantage is that it frees developers from the burden of complex underlying management tasks and allows them to focus solely on application logic. Owing to these benign characteristics and bright prospects, it has become an increasingly hot topic in various scenarios, such as machine learning, scientific computing, video processing, and the Internet of Things. However, no existing study offers a comprehensive analysis of the current state of the art of serverless computing research in terms of scope and depth. To fill this knowledge gap, we present a comprehensive literature review summarizing the current research state of the art of serverless computing. This review is based on 164 selected research papers and answers three key questions: research directions (What), existing solutions (How), and platforms and venues (Where). Specifically, first, we construct a taxonomy linked to research directions in the serverless computing literature. Our taxonomy has 18 research categories covering performance optimization, programming frameworks, application migration, multi-cloud development, cost, testing, debugging, etc. Second, we classify the related studies of each research direction and elaborate on their specific solutions. Third, we investigate the distributions of experimental platforms and publication venues for existing techniques. Finally, based on our analysis, we discuss some key challenges and envision promising opportunities for future research on the serverless platform, serverless application, and serverless computing community sides.
... We think multiplexing accelerators in serverless is the key to overcoming these obstacles. For example, some works [98,150] integrate GPUs into serverless systems, and BlastFunction [14] makes FPGAs available in serverless. However, the current works are still insufficient. ...
Preprint
The development of cloud infrastructures inspires the emergence of cloud-native computing. As the most promising architecture for deploying microservices, serverless computing has recently attracted more and more attention in both industry and academia. Due to its inherent scalability and flexibility, serverless computing becomes attractive and more pervasive for ever-growing Internet services. Despite the momentum in the cloud-native community, the existing challenges and compromises still wait for more advanced research and solutions to further explore the potentials of the serverless computing model. As a contribution to this knowledge, this article surveys and elaborates the research domains in the serverless context by decoupling the architecture into four stack layers: Virtualization, Encapsule, System Orchestration, and System Coordination. Inspired by the security model, we highlight the key implications and limitations of these works in each layer, and make suggestions for potential challenges to the field of future serverless computing.
... Numba also provides support for generating code for accelerators such as Nvidia/AMD GPUs using NVVM [30] and HLC [31]. Using GPUs to accelerate FaaS functions [32] is of interest for future investigation, but is out of scope for this work. ...
Preprint
Full-text available
FaaS allows an application to be decomposed into functions that are executed on a FaaS platform. The FaaS platform is responsible for the resource provisioning of the functions. Recently, there is a growing trend towards the execution of compute-intensive FaaS functions that run for several seconds. However, due to the billing policies followed by commercial FaaS offerings, the execution of these functions can incur significantly higher costs. Moreover, due to the abstraction of underlying processor architectures on which the functions are executed, the performance optimization of these functions is challenging. As a result, most FaaS functions use pre-compiled libraries generic to x86-64 leading to performance degradation. In this paper, we examine the underlying processor architectures for Google Cloud Functions (GCF) and determine their prevalence across the 19 available GCF regions. We modify, adapt, and optimize three compute-intensive FaaS workloads written in Python using Numba, a JIT compiler based on LLVM, and present results wrt performance, memory consumption, and costs on GCF. Results from our experiments show that the optimization of FaaS functions can improve performance by 12.8x (geometric mean) and save costs by 73.4% on average for the three functions. Our results show that optimization of the FaaS functions for the specific architecture is very important. We achieved a maximum speedup of 1.79x by tuning the function especially for the instruction set of the underlying processor architecture.
Article
Serverless computing and, in particular, the functions as a service model has become a convincing paradigm for the development and implementation of highly scalable applications in the cloud. This is due to the transparent management of three key functionalities: triggering of functions due to events, automatic provisioning and scalability of resources, and fine-grained pay-per-use. This article presents a serverless web-based scientific gateway to execute the inference phase of previously trained machine learning and artificial intelligence models. The execution of the models is performed both in Amazon Web Services and in on-premises clouds with the OSCAR framework for serverless scientific computing. In both cases, the computing infrastructure grows elastically according to the demand adopting scale-to-zero approaches to minimize costs. The web interface provides an improved user experience by simplifying the use of the models. The usage of machine learning in a computing platform that can use both on-premises clouds and public clouds constitutes a step forward in the adoption of serverless computing for scientific applications.
Article
Full-text available
Serverless computing has become a new trending paradigm in cloud computing, allowing developers to focus on core application logic and to prototype applications quickly. Serverless computing offers lower costs and convenience to users, who do not need to concern themselves with server management. Because of these good prospects, most major cloud vendors have launched their own commodity serverless computing platforms in recent years. However, the characteristics of these platforms have not been studied systematically. A qualitative analysis of these platforms, covering development, deployment, and runtime aspects, is needed to form a taxonomy of their characteristics. Google Cloud Platform offers several kinds of serverless computing; this article compares several of them: Cloud Functions, App Engine, Cloud Run, and Google Kubernetes Engine (GKE).
Article
Full-text available
Serverless computing has gained importance over the last decade as an exciting new field, owing to its large influence in reducing costs, decreasing latency, improving scalability, and eliminating server-side management, to name a few. However, to date there is a lack of in-depth surveys that would help developers and researchers better understand the significance of serverless computing in different contexts. Thus, it is essential to present research evidence that has been published in this area. In this systematic survey, 275 research papers that examined serverless computing from well-known literature databases were extensively reviewed to extract useful data. Then, the obtained data were analyzed to answer several research questions regarding state-of-the-art contributions of serverless computing, its concepts, its platforms, its usage, etc. We moreover discuss the challenges that serverless computing faces nowadays and how future research could enable its implementation and usage.
Article
Serverless computing is an emerging event‐driven programming model that accelerates the development and deployment of scalable web services on cloud computing systems. Though widely integrated with the public cloud, serverless computing use is nascent for edge‐based, Internet of Things (IoT) deployments. In this work, we present STOIC (serverless teleoperable hybrid cloud), an IoT application deployment and offloading system that extends the serverless model in three ways. First, STOIC adopts a dynamic feedback control mechanism to precisely predict latency and dispatch workloads uniformly across edge and cloud systems using a distributed serverless framework. Second, STOIC leverages hardware acceleration (e.g., GPU resources) for serverless function execution when available from the underlying cloud system. Third, STOIC can be configured in multiple ways to overcome deployment variability associated with public cloud use. We overview the design and implementation of STOIC and empirically evaluate it using real‐world machine learning applications and multitier IoT deployments (edge and cloud). Specifically, we show that STOIC can be used for training image processing workloads (for object recognition)—once thought too resource‐intensive for edge deployments. We find that STOIC reduces overall execution time (response latency) and achieves placement accuracy that ranges from 92% to 97%.
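The feedback-driven dispatch STOIC describes, predicting latency and placing each workload on edge or cloud accordingly, can be sketched as follows. The target names, bandwidth figures, and latency model (transfer + startup + compute) are illustrative assumptions, not STOIC's actual implementation.

```python
def predict_latency(size_mb: float, bw_mb_per_s: float,
                    startup_s: float, rate_mb_per_s: float) -> float:
    """Estimate end-to-end latency: transfer time + startup + compute time."""
    return size_mb / bw_mb_per_s + startup_s + size_mb / rate_mb_per_s

def dispatch(size_mb: float, targets: dict) -> str:
    """Place the workload on the target with the lowest predicted latency."""
    return min(targets, key=lambda t: predict_latency(size_mb, *targets[t]))

# (bandwidth MB/s, startup s, compute rate MB/s) -- made-up numbers
targets = {
    "edge":  (1000.0, 0.5, 20.0),   # fast local link, modest compute
    "cloud": (100.0, 5.0, 200.0),   # slower link, GPU-fast compute
}
print(dispatch(10, targets))    # small job: transfer dominates -> edge
print(dispatch(5000, targets))  # large job: compute dominates -> cloud
```

STOIC additionally refines these predictions online with feedback from observed execution times, which this static sketch omits.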
Article
Full-text available
MapReduce is one of the most widely used programming models for analysing large-scale datasets, i.e. Big Data. In recent years, serverless computing and, in particular, Functions as a Service (FaaS) has surged as an execution model in which no explicit management of servers (e.g. virtual machines) is performed by the user. Instead, the Cloud provider dynamically allocates resources to the function invocations and fine-grained billing is introduced depending on the execution time and allocated memory, as exemplified by AWS Lambda. In this article, a high-performance serverless architecture has been created to execute MapReduce jobs on AWS Lambda using Amazon S3 as the storage backend. In addition, a thorough assessment has been carried out to study the suitability of AWS Lambda as a platform for the execution of High Throughput Computing jobs. The results indicate that AWS Lambda provides a convenient computing platform for general-purpose applications that fit within the constraints of the service (15 min of maximum execution time, 3008 MB of RAM and 512 MB of disk space) but it exhibits an inhomogeneous performance behaviour that may jeopardise adoption for tightly coupled computing jobs.
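The MapReduce-on-FaaS pattern the abstract describes can be reduced to a minimal sketch: each map invocation processes one input object, results are shuffled by key, and reducers aggregate each partition. This runs locally for illustration; on AWS Lambda each mapper would be a separate invocation reading from and writing to S3, which this sketch deliberately omits.

```python
from collections import defaultdict

def mapper(text):
    """One map invocation per input object: emit (word, 1) pairs."""
    for word in text.split():
        yield word.lower(), 1

def reducer(key, values):
    """One reduce invocation per key partition: aggregate the counts."""
    return key, sum(values)

def run_job(objects):
    shuffled = defaultdict(list)
    for obj in objects:                 # fan-out of map invocations
        for k, v in mapper(obj):
            shuffled[k].append(v)       # shuffle by key
    return dict(reducer(k, vs) for k, vs in shuffled.items())

print(run_job(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The paper's architecture handles the hard parts this sketch skips: coordinating thousands of concurrent invocations and staging intermediate data through object storage rather than memory.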
Article
Full-text available
In modern virtual computing environment, existing GPU virtualization techniques are unable to take full advantage of a GPU's powerful 2D/3D hardware-accelerated graphics rendering performance or parallel computing potential, or it has not been considered that the internal resources of a GPU domain are fairly allocated between VMs with different performance requirements. Therefore, we propose a multi-channel GPU virtualization architecture (VMCG), model the corresponding credit allocating and transferring mechanisms, and redesign the virtual multi-channel GPU fair-scheduling algorithm. VMCG provides a separate V-Channel for each guest VM (DomU) that competes with other VMs for the same physical GPU resources, and each DomU submits command request blocks to its respective V-Channel according to the corresponding DomU ID. Through the virtual multi-channel GPU fair-scheduling algorithm, not only do multiple DomUs make full use of native GPU hardware acceleration, but the fairness of GPU resource allocation is significantly improved during GPU-intensive workloads from multiple DomUs running on the same host. Experimental results show that, for 2D/3D graphics applications, performance is close to 96% of that of the native GPU, performance is improved by approximately 500% for parallel computing applications, and GPU resource-allocation fairness is improved by approximately 60%-80%.
Article
Full-text available
New architectural patterns (e.g. microservices), the massive adoption of Linux containers (e.g. Docker containers), and improvements in key features of Cloud computing such as auto-scaling, have helped developers to decouple complex and monolithic systems into smaller stateless services. In turn, Cloud providers have introduced serverless computing, where applications can be defined as a workflow of event-triggered functions. However, serverless services, such as AWS Lambda, impose serious restrictions on these applications (e.g. using a predefined set of programming languages, or hindering the installation and deployment of external libraries). This paper addresses such issues by introducing a framework and a methodology to create Serverless Container-aware ARchitectures (SCAR). The SCAR framework can be used to create highly-parallel event-driven serverless applications that run on customized runtime environments defined as Docker images on top of AWS Lambda. This paper describes the architecture of SCAR together with the cache-based optimizations applied to minimize cost, exemplified on a massive image processing use case. The results show that, by means of SCAR, AWS Lambda becomes a convenient platform for High Throughput Computing, especially for highly-parallel bursty workloads of short stateless jobs.
Conference Paper
Full-text available
In line with cloud computing emergence as the dominant enterprise computing paradigm, our conceptualization of the cloud computing reference architecture and service construction has also evolved. For example, to address the need for cost reduction and rapid provisioning, virtualization has moved beyond hardware to containers. More recently, serverless computing or Function-as-a-Service has been presented as a means to introduce further cost-efficiencies, reduce configuration and management overheads, and rapidly increase an application's ability to speed up, scale up and scale down in the cloud. The potential of this new computation model is reflected in the introduction of serverless computing platforms by the main hyperscale cloud service providers. This paper provides an overview and multi-level feature analysis of seven enterprise serverless computing platforms. It reviews extant research on these platforms and identifies the emergence of AWS Lambda as a de facto base platform for research on enterprise serverless cloud computing. The paper concludes with a summary of avenues for further research.
Conference Paper
Full-text available
Cloud computing enables an entire ecosystem of developing, composing, and providing IT services. An emerging class of cloud-based software architectures, serverless, focuses on providing software architects the ability to execute arbitrary functions with small overhead in server management, as Function-as-a-Service (FaaS). However useful, serverless and FaaS suffer from a community problem that faces every emerging technology, and which indeed also hampered cloud computing a decade ago: lack of clear terminology, and a scattered vision of the field. In this work, we address this community problem. We clarify the term serverless, by reducing it to cloud functions as programming units, and a model of executing simple and complex (e.g., workflows of) functions with operations managed primarily by the cloud provider. We propose a research vision, where 4 key directions (perspectives) present 17 technical opportunities and challenges.
Conference Paper
Full-text available
As more scientific workloads are moved into the cloud, the need for high performance accelerators increases. Accelerators such as GPUs offer improvements in both performance and power efficiency over traditional multi-core processors; however, their use in the cloud has been limited. Today, several common hypervisors support GPU passthrough, but their performance has not been systematically characterized. In this paper we show that low overhead GPU passthrough is achievable across 4 major hypervisors and two processor microarchitectures. We compare the performance of two generations of NVIDIA GPUs within the Xen, VMWare ESXi, and KVM hypervisors, and we also compare the performance to that of Linux Containers (LXC). We show that GPU passthrough to KVM achieves 98-100% of the base system's performance across two architectures, while Xen and VMWare achieve 96-99% of the base system's performance, respectively. In addition, we describe several valuable lessons learned through our analysis and share the advantages and disadvantages of each hypervisor/GPU passthrough solution.
Article
Full-text available
Cloud infrastructures are becoming an appropriate solution to address the computational needs of scientific applications. However, the use of public or on-premises Infrastructure as a Service (IaaS) clouds requires users to have non-trivial system administration skills. Resource provisioning systems provide facilities to choose the most suitable Virtual Machine Images (VMI) and basic configuration of multiple instances and subnetworks. Other tasks such as the configuration of cluster services, computational frameworks or specific applications are not trivial on the cloud, and normally users have to manually select the VMI that best fits, including undesired additional services and software packages. This paper presents a set of components that ease the access and the usability of IaaS clouds by automating the VMI selection, deployment, configuration, software installation, monitoring and update of Virtual Appliances. It supports APIs from a large number of virtual platforms, making user applications cloud-agnostic. In addition it integrates a contextualization system to enable the installation and configuration of all the user required applications, providing the user with a fully functional infrastructure. Therefore, golden VMIs and configuration recipes can be easily reused across different deployments. Moreover, the contextualization agent included in the framework supports horizontal (increase/decrease the number of resources) and vertical (increase/decrease resources within a running Virtual Machine) elasticity by properly reconfiguring the installed software, considering the configuration of the multiple running resources. This paves the way for automatic virtual infrastructure deployment, customization and elastic modification at runtime for IaaS clouds.
Article
Full-text available
The use of virtualization to abstract underlying hardware can aid in sharing such resources and in efficiently managing their use by high performance applications. Unfortunately, virtualization also prevents efficient access to accelerators, such as Graphics Processing Units (GPUs), that have become critical components in the design and architecture of HPC systems. Supporting General Purpose computing on GPUs (GPGPU) with accelerators from different vendors presents significant challenges due to proprietary programming models, heterogeneity, and the need to share accelerator resources between different Virtual Machines (VMs). To address this problem, this paper presents GViM, a system designed for virtualizing and managing the resources of a general purpose system accelerated by graphics processors. Using the NVIDIA GPU as an example, we discuss how such accelerators can be virtualized without additional hardware support and describe the basic extensions needed for resource management. Our evaluation with a Xen-based implementation of GViM demonstrates efficiency and flexibility in system usage coupled with only small performance penalties for the virtualized vs. non-virtualized solutions.
Article
Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.
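The stateless-function model PyWren exemplifies amounts to an embarrassingly parallel map where each task carries all the state it needs. The sketch below stands in for cloud function invocations with a local thread pool; the names `stateless_task` and `serverless_map` are illustrative, and PyWren's actual API differs.

```python
from concurrent.futures import ThreadPoolExecutor

def stateless_task(x):
    """No shared state: the input fully determines the output."""
    return x * x

def serverless_map(func, inputs, workers=4):
    """Fan a pure function out over inputs, as a FaaS platform would
    fan out invocations; the pool stands in for remote executors."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, inputs))

print(serverless_map(stateless_task, range(5)))  # [0, 1, 4, 9, 16]
```

Because tasks share nothing, this model needs no cluster configuration: the provider can schedule each invocation independently, which is exactly the elasticity argument the abstract makes.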
Conference Paper
The computational power and memory bandwidth of graphics processing units (GPUs) have turned them into attractive platforms for general-purpose applications at significant speed gains versus their CPU counterparts [1]. In addition, an increasing number of today's state-of-the-art supercomputers [2] include commodity GPUs to bring us unprecedented levels of high performance and low cost. In this paper, we describe CUDA as the software and hardware paradigm behind those achievements. We summarize its evolution over the past decade, explain its major features and provide insights about future trends for CUDA to continue as a flagship within high performance computing.
Article
Background: Accurate estimates of Rheumatic Heart Disease (RHD) burden are needed to justify improved integration of RHD prevention and screening into the public health systems, but data from Latin America are still sparse. Objective: To determine the prevalence of RHD among socioeconomically disadvantaged youth (5-18 years) in Brazil and examine risk factors for the disease. Methods: The PROVAR program utilizes non-expert screeners, telemedicine, and handheld and standard portable echocardiography to conduct echocardiographic screening in socioeconomically disadvantaged schools in Minas Gerais, Brazil. Cardiologists in the US and Brazil provide expert interpretation according to the 2012 World Heart Federation Guidelines. Here we report prevalence data from the first 14 months of screening, and examine risk factors for RHD. Results: 5996 students were screened across 21 schools. Median age was 11.9 [9.0/15.0] years, 59% females. RHD prevalence was 42/1000 (n=251): 37/1000 borderline (n=221) and 5/1000 definite (n=30). Pathologic mitral regurgitation was observed in 203 (80.9%), pathologic aortic regurgitation in 38 (15.1%), and mixed mitral/aortic valve disease in 10 (4.0%) children. Older children had higher prevalence (50/1000 vs. 28/1000, p<0.001), but no difference was observed between northern (lower resourced) and central areas (34/1000 vs. 44/1000, p=0.31). Females had higher prevalence (48/1000 vs. 35/1000, p=0.016). Age (OR=1.15, 95% CI:1.10-1.21, p<0.001) was the only variable independently associated with RHD findings. Conclusions: RHD continues to be an important and under-recognized condition among socioeconomically disadvantaged Brazilian schoolchildren. Our data add to the compelling case for renewed investment in RHD prevention and early detection in Latin America.
Chapter
There is a trend towards using graphics processing units (GPUs) not only for graphics visualization, but also for accelerating scientific applications. But their use for this purpose is not without disadvantages: GPUs increase costs and energy consumption and are generally underutilized. Using virtual machines could be a possible solution to address these problems; however, current solutions for providing GPU acceleration to virtual machine environments, such as KVM or Xen, present some issues. In this paper we propose the use of remote GPUs to accelerate scientific applications running inside KVM virtual machines. Our analysis shows that this approach could be a viable solution, with low overhead when used over InfiniBand networks.
Article
Graphics processing units (GPUs) provide massively parallel computational power and encourage the use of general-purpose computing on GPUs (GPGPU). The distinctive design of discrete GPUs helps them provide the high throughput, scalability, and energy efficiency needed for GPGPU applications. Despite previous studies on GPU virtualization, the tradeoffs between the virtualization approaches remain unclear, because of a lack of designs for, or quantitative evaluations of, hypervisor-level virtualization for discrete GPUs. Shedding light on these tradeoffs and the technical requirements of hypervisor-level virtualization would facilitate the development of an appropriate GPU virtualization solution. GPUvm, an open architecture for hypervisor-level GPU virtualization with a particular emphasis on the Xen hypervisor, is presented in this paper. GPUvm offers three virtualization modes: full, naive para-, and high-performance para-virtualization. GPUvm exposes low- and high-level interfaces such as memory-mapped I/O and DRM APIs to the guest virtual machines (VMs). Our experiments using a relevant commodity GPU showed that GPUvm incurs different overheads as the level of the exposed interfaces is changed. The results also showed that coarse-grained fairness on the GPU among multiple VMs can be achieved using GPU scheduling.
Conference Paper
The use of graphics processing units (GPUs) to accelerate some portions of applications is widespread nowadays. To avoid the usual inconveniences associated with these accelerators (high acquisition cost, high energy consumption, and low utilization), one possible solution is sharing them among several nodes of the cluster. Several years ago, remote GPU virtualization middleware systems appeared to implement this solution. Although these systems tackled the aforementioned inconveniences, their performance was usually impaired by the low bandwidth attained by the underlying network. However, the recent advances in InfiniBand fabrics have changed this trend. In this paper we analyze how the high bandwidth provided by the new EDR 100G InfiniBand fabric allows remote GPU virtualization middleware systems not only to perform very similarly to local GPUs, but also to improve overall performance for some applications.
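Why link bandwidth dominates remote-GPU overhead can be seen with a back-of-the-envelope model: the time to move a payload to a remote GPU is roughly its size divided by the link bandwidth. The sketch below compares an EDR InfiniBand link (~100 Gbit/s) with 10 Gbit Ethernet and with a local PCIe 3.0 x16 connection (~15.75 GB/s, i.e. ~126 Gbit/s); it is an idealized model that ignores latency and protocol overhead, and the payload size is an illustrative assumption.

```python
def transfer_time_s(size_bytes, bandwidth_gbit_s):
    """Idealized host<->GPU transfer time: payload size over link bandwidth."""
    bytes_per_s = bandwidth_gbit_s * 1e9 / 8  # Gbit/s -> bytes/s
    return size_bytes / bytes_per_s

size = 1 * 1024**3  # 1 GiB of input data (illustrative)
t_edr = transfer_time_s(size, 100)   # EDR InfiniBand, ~100 Gbit/s
t_10g = transfer_time_s(size, 10)    # 10 Gbit Ethernet
t_pcie = transfer_time_s(size, 126)  # PCIe 3.0 x16, ~15.75 GB/s

# EDR is within ~1.3x of a local PCIe transfer, while 10 GbE is 10x slower
# than EDR -- which is why faster fabrics narrow the remote-GPU gap.
ratio_edr_vs_pcie = t_edr / t_pcie
ratio_10g_vs_edr = t_10g / t_edr
```

Under this model, the data-transfer penalty of a remote GPU over EDR is bounded by the ratio of the fabric bandwidth to the local PCIe bandwidth, consistent with the near-local performance the paper reports.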
Conference Paper
Using GPUs reduces the execution time of many applications but increases acquisition cost and power consumption. Furthermore, GPUs usually attain a relatively low utilization. In this context, remote GPU virtualization solutions were recently created to overcome the drawbacks of using GPUs. Currently, many different remote GPU virtualization frameworks exist, each presenting very different characteristics. These differences among them may lead to differences in performance. In this work we present a performance comparison among the only three CUDA remote GPU virtualization frameworks publicly available at no cost. Results show that performance greatly depends on the exact framework used, with the rCUDA virtualization solution standing out among them. Furthermore, rCUDA doubles the performance of CUDA for pageable memory copies.
Conference Paper
Graphics Processing Units (GPUs) have become important components in high performance computing (HPC) systems for their massively parallel computing capability and energy efficiency. Virtualization technologies are increasingly applied to HPC to reduce administration costs and improve system utilization. However, virtualizing the GPU to support general-purpose computing presents many challenges because of the complexity of this device. On VMware's ESX hypervisor, DirectPath I/O can give virtual machines (VMs) high-performance access to physical GPUs. However, this technology does not allow multiplexing for sharing GPUs among VMs and is not compatible with vMotion, VMware's technology for transparently migrating VMs among hosts inside clusters. In this paper, we address these issues by implementing a solution that uses "remote API execution" and takes advantage of DirectPath I/O to enable general-purpose GPU computing on ESX. This solution, named vmCUDA, allows CUDA applications running concurrently in multiple VMs on ESX to share GPU(s). Our solution requires neither recompilation nor even editing of the source code of CUDA applications. Our performance evaluation has shown that vmCUDA introduced an overhead of 0.6% - 3.5% for applications with moderate data sizes and 14% - 20% for those with large data (e.g. 12.5 GB - 237.5 GB in our experiments).
Article
This paper presents a general energy management system for High Performance Computing (HPC) clusters and cloud infrastructures that powers off cluster nodes when they are not being used, and conversely powers them on when they are needed. This system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Management Systems, and can also use different mechanisms for powering the computing nodes on and off. The presented system makes it possible to implement different energy-saving policies depending on the priorities and particularities of the cluster. It also provides a hook system to extend the functionality, and a sensor system to take environmental information into account. The paper describes the successful integration of the proposed system with some popular Batch-Queuing Systems, and also with some Cloud Management middlewares, presenting two real use cases that show significant energy/cost savings of 27% and 17%.
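An energy-saving policy of the kind described above can be sketched in a few lines: power off any node that has been idle longer than a threshold, while keeping a small warm pool of idle nodes on to absorb sudden job arrivals. The node records, the 600-second timeout, and the `keep_alive` parameter below are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch of an idle-node power-off policy (illustrative only).
def nodes_to_power_off(nodes, now, idle_timeout_s=600, keep_alive=1):
    """Return names of nodes idle longer than idle_timeout_s, retaining
    `keep_alive` idle nodes powered on as a warm pool for new jobs."""
    idle = [n for n in nodes if not n["busy"]]
    # Consider the longest-idle nodes first.
    idle.sort(key=lambda n: now - n["idle_since"], reverse=True)
    candidates = [n for n in idle if now - n["idle_since"] >= idle_timeout_s]
    # Never shrink the idle pool below keep_alive nodes.
    return [n["name"] for n in candidates[:max(0, len(idle) - keep_alive)]]

cluster = [
    {"name": "wn1", "busy": True,  "idle_since": 0},    # running a job
    {"name": "wn2", "busy": False, "idle_since": 100},  # idle 1900 s at t=2000
    {"name": "wn3", "busy": False, "idle_since": 900},  # idle 1100 s at t=2000
]
victims = nodes_to_power_off(cluster, now=2000)
```

The hook system mentioned in the abstract is where such a policy would plug in; the actual power-off action (e.g. IPMI or a cloud API call) is middleware-specific and omitted here.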
Conference Paper
This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can benefit the performance of a class of high-performance computing (HPC) applications. The key insights in our design include API call interception and redirection and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device and achieve high computing performance in a transparent way. In the current study, vCUDA achieved near-native performance with the dedicated RPC system. We carried out a detailed analysis of the performance of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty in comparison with the native environment, thereby demonstrating the viability of the vCUDA architecture.
Conference Paper
The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house hosted computing clusters equipped with GPGPU devices and pay-per-use high performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs in a transparent and hypervisor-independent way, with an overhead only slightly greater than that of a real machine/GPGPU setup. The performance of the components of gVirtuS is assessed through a suite of tests in different deployment scenarios, such as providing GPGPU power to cloud-based HPC clusters and sharing remotely hosted GPGPUs among HPC nodes.
Conference Paper
The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general-purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature considerable energy consumption, so attaching a GPU to every node of a cluster has a strong impact on its overall power consumption. In this paper we detail a framework that enables remote GPU acceleration in HPC clusters, thus allowing a reduction in the number of accelerators installed in the cluster. This leads to energy, acquisition, maintenance, and space savings.
Computer Aided Diagnosis for Rheumatic Heart Disease by AI Applied to Features Extraction from Echocardiography
  • E Camacho-Ramos
  • A Jimenez-Pastor
  • I Blanquer
  • F García-Castro
  • A Alberich-Bayarri
Google Cloud Functions
  • Google
Automatic visceral fat characterisation on CT scans through deep learning and CNN for the assessment of metabolic syndrome
  • A Jimenez-Pastor
  • A Alberich-Bayarri
  • F Garcia-Castro
  • L Marti-Bonmati