Research Items (19)
This proposal addresses, from two different approaches, the improvement of data center productivity through efficient resource management. On the one hand, the research conducted in the predoctoral period showed that combining GPU remote virtualization technologies with workload managers in HPC clusters yields a promising increase in throughput, measured as completed jobs per unit of time. The dissertation begins with an extended study of its impact not only on productivity, but also on resource utilization and energy consumption. Efficient management of access to these accelerators is therefore crucial for increasing the number of jobs completed per unit of time. Likewise, cloud computing environments (public or private) also deal with GPUs, since virtual machines can be equipped with these devices. As detailed in this document, adopting a GPU remote virtualization technology together with a resource manager introduces new working modes aimed at improving global throughput. On the other hand, the second approach involves reconfiguring jobs by varying their number of processes during execution (commonly referred to as MPI malleability) in order to increase system throughput. MPI jobs currently account for a high percentage of the total load in an HPC facility. To ease the adoption of malleability in scientific applications, this manuscript presents two solutions, one based on an OmpSs-like programming model and one based on an MPI-friendly syntax, which provide the necessary tools for easily converting an application into a malleable one. Performance evaluations reveal a non-negligible improvement not only in throughput, but also in job waiting time and energy consumption.
- Sep 2018
Several studies have demonstrated the benefits of job malleability, that is, the capacity of an application to adapt its parallelism to a dynamically changing number of allocated processors. The most remarkable advantages of executing malleable jobs as part of a high performance computing workload are the increase in throughput and the more efficient utilization of the underlying resources. Malleability has mostly been applied to iterative applications in which all the processes execute the same operations over different sets of data with a balanced per-process load. Unfortunately, not all scientific applications adhere to this process-level malleable job structure: some are noniterative or present an irregular per-process load distribution. Unlike many other reconfiguration tools, the Dynamic Management of Resources Application Programming Interface (DMR API) provides the flexibility needed to make these out-of-target applications malleable. In this article, we study the particular case of using the DMR API to generate a malleable version of HPG aligner, a distributed-memory noniterative genomic sequencer featuring an irregular communication pattern among processes. Through this first conversion of an out-of-target application into a malleable job, we both illustrate how the DMR API may be used to make this type of application malleable and test the benefits of this conversion in production clusters. Our experimental results reveal an important reduction in the completion time of malleable HPG aligner jobs compared to the original HPG aligner version. Furthermore, workloads of malleable HPG aligner jobs achieve a greater throughput than their fixed counterparts.
Cloud technology is an attractive infrastructure solution that provides customers with an almost unlimited on-demand computational capacity under a pay-per-use approach, and allows data centers to increase their energy and economic savings by adopting a virtualized resource sharing model. However, resources such as graphics processing units (GPUs) have not been fully adapted to this model. Although general-purpose computing on graphics processing units (GPGPU) is becoming increasingly popular, cloud providers lack flexibility in managing accelerators because of the widespread use of peripheral component interconnect (PCI) passthrough techniques to attach GPUs to virtual machines (VMs). For this reason, we design, develop, and evaluate a service that provides complete management of cloudified GPUs (cGPUs) in public cloud platforms. Our solution enables effective, anonymous, and transparent access from VMs to cGPUs that have previously been scheduled and assigned by a full resource manager, taking into account new GPU selection policies and new working modes based on the locality of the physical accelerators and on exclusivity of access to them. This easy-to-adopt tool improves resource availability through different cGPU configurations for end users, while cloud providers achieve better utilization of their infrastructures and can offer more competitive services. Scalability results in a real cloud environment demonstrate that our solution introduces a negligible overhead in the deployment of VMs. In addition, performance experiments reveal that GPU-enabled clusters based on cloud infrastructures can benefit from our proposal not only by making better use of the accelerators, but also by serving more job requests per unit of time.
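The working modes mentioned above combine two criteria: locality (prefer a physical GPU on the VM's own host) and exclusivity (a GPU either serves one VM alone or is shared). The following is a minimal sketch under stated assumptions: the `PhysicalGPU` model, `select_gpu`, `release_gpu`, and the tie-breaking rule are all hypothetical illustrations, not the interface or policy of the service described in the abstract.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class PhysicalGPU:
    """Hypothetical model of a physical accelerator in the cloud."""
    host: str
    index: int
    tenants: set[str] = field(default_factory=set)  # VMs currently attached
    exclusive: bool = False  # True while one VM holds it in exclusive mode


def select_gpu(gpus: list[PhysicalGPU], vm_id: str, vm_host: str,
               exclusive: bool, local_first: bool = True) -> PhysicalGPU | None:
    """Assign a GPU to a VM; return None if no GPU satisfies the mode."""
    # A GPU held exclusively is never eligible; an exclusive request
    # additionally needs a GPU with no tenants at all.
    candidates = [g for g in gpus
                  if not g.exclusive and not (exclusive and g.tenants)]
    if not candidates:
        return None

    def key(g: PhysicalGPU) -> tuple[bool, str, int]:
        # False sorts before True, so local GPUs win when local_first is set.
        is_remote = g.host != vm_host
        return (is_remote and local_first, g.host, g.index)

    best = min(candidates, key=key)
    best.tenants.add(vm_id)
    best.exclusive = exclusive
    return best


def release_gpu(gpu: PhysicalGPU, vm_id: str) -> None:
    """Detach a VM; the GPU becomes fully free once no tenants remain."""
    gpu.tenants.discard(vm_id)
    if not gpu.tenants:
        gpu.exclusive = False
```

For example, with one GPU on `node1` and one on `node2`, an exclusive request from a VM on `node2` gets the local GPU, a later shared request from the same host falls back to the remote GPU on `node1`, and a further exclusive request fails until the first VM releases its GPU.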
- Jul 2018
Adaptive workloads can change the configuration of their jobs, in terms of number of processes, on the fly. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager (aware of the job queue and the resource allocation) and the parallel runtime (able to transparently handle the processes and the program data) is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns the appropriate action: i) expand, if there are spare resources; ii) shrink, if doing so allows queued jobs to start; or iii) none, if no change can improve global productivity. In this paper, we describe the internals of our framework and demonstrate how it reduces the global workload completion time while making more efficient use of the underlying resources. For this purpose, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments.
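The three-way decision returned by the resource manager can be sketched as a small pure function. This is one possible reading of the rule, under assumptions that are not in the abstract: resources are counted in whole nodes, only the job at the head of the queue is considered, and expansion is attempted only when the queue is empty. All names (`ReconfigRequest`, `ClusterState`, `decide`) are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXPAND = "expand"
    SHRINK = "shrink"
    NONE = "none"


@dataclass
class ReconfigRequest:
    current_nodes: int  # nodes the malleable job holds now
    min_nodes: int      # smallest size the job accepts
    max_nodes: int      # largest size the job accepts


@dataclass
class ClusterState:
    idle_nodes: int
    queued_job_sizes: list[int]  # nodes requested by each pending job, in queue order


def decide(req: ReconfigRequest, state: ClusterState) -> tuple[Action, int]:
    """Return (action, new_node_count) for a reconfiguration request."""
    if not state.queued_job_sizes:
        # i) Expand: nobody is waiting, so absorb spare nodes up to max_nodes.
        target = min(req.max_nodes, req.current_nodes + state.idle_nodes)
        if target > req.current_nodes:
            return Action.EXPAND, target
        return Action.NONE, req.current_nodes
    # ii) Shrink: release just enough nodes for the head job to start,
    # provided the job stays at or above its minimum size.
    missing = state.queued_job_sizes[0] - state.idle_nodes
    releasable = req.current_nodes - req.min_nodes
    if 0 < missing <= releasable:
        return Action.SHRINK, req.current_nodes - missing
    # iii) None: no change would improve global productivity.
    return Action.NONE, req.current_nodes
```

For a job holding 4 nodes (allowed range 2 to 8), three idle nodes and an empty queue lead to expanding to 7 nodes; one idle node and a queued 3-node job lead to shrinking to 2 nodes; a queued 6-node job that cannot be satisfied leaves the job unchanged.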
The C language has long been used for application development in multidisciplinary environments. In academia, however, it is being replaced by higher-level languages because they are easier to understand, learn, and apply. Moreover, the demand for professionals with a good knowledge of those high-level languages keeps increasing because of the boom in mobile devices. This scenario leads to a shortage of low-level language programmers, who are required in other less trendy but equally or more important fields, such as science, engineering, and research. To revive interest in low-level languages and provide those minority fields with well-prepared staff, in this work we present a C programming MOOC addressed to anyone, with or without an IT background. A feature that differentiates this course from other online programming courses is that it focuses mainly on C syntax, providing, via a self-tuned virtual machine, an encapsulated environment that hides any interaction with the command line of the underlying operating system. A secondary goal of this work is to encourage computer science degree students to enrol in the computer architecture specialization at the Universitat Jaume I (Spain). For this purpose, the High Performance Computing and Architectures research group of that university has decided to use this C course as a tool to fill a gap in the current syllabus. The results show that half of the participants who completed the first session of the course finished it satisfactorily, and the number of computer science degree students who chose the computer architecture specialization the following academic year tripled.
- Feb 2017
Graphics processing units (GPUs) are being adopted in many computing facilities given their extraordinary computing power, which makes it possible to accelerate many general-purpose applications from different domains. However, GPUs also have several drawbacks, such as increased acquisition costs and larger space requirements. They also require more powerful power supplies. Furthermore, GPUs still consume some energy while idle, and their utilization is usually low for most workloads. Similarly to virtual machines, the use of virtual GPUs may address these concerns. In this regard, the remote GPU virtualization mechanism allows an application running on one node of the cluster to transparently use the GPUs installed in other nodes. Moreover, this technique allows the GPUs present in the computing facility to be shared among the applications running in the cluster. In this way, several applications running on different (or the same) cluster nodes can share one or more GPUs located in other nodes of the cluster. Sharing GPUs should increase overall GPU utilization, thus reducing the negative impact of the drawbacks mentioned above. It may also be possible to reduce the total number of GPUs installed in the cluster. In this paper, we explore some of the benefits that remote GPU virtualization brings to clusters. For instance, this mechanism allows an application to use all the GPUs present in the computing facility. Another benefit of this technique is that cluster throughput, measured as jobs completed per time unit, increases noticeably: for some workloads, it can be doubled. Furthermore, in addition to increasing overall GPU utilization, total energy consumption can be reduced by up to 40%. This may be key in the context of exascale computing facilities, which face an important energy constraint.
Other benefits are related to the cloud computing domain, where a GPU can be easily shared among several virtual machines. Finally, GPU migration (and therefore server consolidation) is one more benefit of this novel technique.
The use of accelerators, such as graphics processing units (GPUs), to reduce the execution time of compute-intensive applications has become popular in the past few years. These devices increase the computational power of a node thanks to their parallel architecture. This trend has led cloud service providers such as Amazon, and middleware such as OpenStack, to offer virtual machine (VM) instances that include GPUs. To meet these needs, the hosts must be equipped with GPUs which, unfortunately, will be barely utilized whenever a VM that does not use GPUs is running on the host. The solution presented in this work relies on GPU virtualization and sharing to reach a balance between the service supply and the applications' demand for accelerators. Specifically, we propose decoupling the physical GPUs from the nodes by using the rCUDA virtualization technology. With this software configuration, GPUs can be accessed from any VM, avoiding the need to install a physical GPU in each host. Moreover, we study the viability of this approach using a public cloud service configuration, and we develop a module for OpenStack that adds support for the virtualized devices and the logic to manage them. The results demonstrate that this is a viable configuration that adds flexibility to current, well-known cloud solutions.
The remarkable evolution of graphics processing units (GPUs), together with their good cost/performance ratio and their excellent performance/energy ratio, has made GPU-based computing widespread today. However, although GPUs offer numerous advantages, they also have some drawbacks. One of them is that, in general, their utilization is low. To increase the utilization of these accelerators, several GPU virtualization frameworks have been created. Among them, rCUDA stands out as the most modern one and the one that delivers the best performance. rCUDA allows a process running on one node of the cluster to use remote GPUs located in another node. However, the GPU virtualization framework must be accompanied by the corresponding cluster job scheduler, such as SLURM, which needs to be extended so that it can properly schedule the use of remote GPUs. In this work we present a study in which we extend SLURM to use different policies for assigning remote GPUs to jobs. The performance evaluation was carried out on a cluster of 9 nodes interconnected by InfiniBand FDR, each equipped with one NVIDIA Tesla K20 GPU.
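The abstract does not name the assignment policies it evaluates, so the sketch below only illustrates what "different policies for assigning remote GPUs" can mean. It assumes, hypothetically, that each GPU may be shared by up to `capacity` jobs, and compares two generic policies: first-fit (lowest-numbered GPU with room) and least-loaded (GPU with the fewest jobs). Neither is claimed to be one of the paper's policies.

```python
from __future__ import annotations


def pick_gpu(load: list[int], capacity: int, policy: str) -> int | None:
    """Return the index of the GPU to assign, or None if every GPU is full.

    load[i] is the number of jobs currently using GPU i (hypothetical model).
    """
    free = [i for i, jobs in enumerate(load) if jobs < capacity]
    if not free:
        return None
    if policy == "first-fit":
        # Fill GPUs in index order, packing jobs onto few devices.
        return free[0]
    if policy == "least-loaded":
        # Spread jobs across devices; ties go to the lowest index.
        return min(free, key=lambda i: (load[i], i))
    raise ValueError(f"unknown policy: {policy}")
```

With loads `[1, 0, 2]` and a capacity of 2 jobs per GPU, first-fit chooses GPU 0 while least-loaded chooses GPU 1. Packing tends to leave whole GPUs idle (useful if idle GPUs can be powered down), whereas spreading reduces contention among jobs sharing a device.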
SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs running in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Specifically, although SLURM can use its generic resource plug-in (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by a job running on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim to provide user-transparent access to all the GPUs in the cluster, regardless of where the node running the application is located with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", which allows any application node to access any GPU node in the cluster using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, since SLURM schedules the tasks taking into account all the graphics accelerators available in the whole cluster. We present experimental results that show the benefits of this new approach in terms of increased job scheduler flexibility.
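The difference between node-local GRes allocation and the cluster-wide "rgpu" device can be illustrated with two toy allocators. This is a simplified model, not SLURM code: `gres_alloc` and `rgpu_alloc` are hypothetical functions over a map from node name to the indices of its free GPUs, and the real scheduler involves far more state.

```python
from __future__ import annotations

GpuId = tuple[str, int]  # (node name, GPU index on that node)


def gres_alloc(free: dict[str, list[int]], job_node: str,
               n: int) -> list[GpuId] | None:
    """GRes-like: only GPUs attached to the job's own node are eligible."""
    local = free.get(job_node, [])
    if len(local) < n:
        return None
    return [(job_node, g) for g in local[:n]]


def rgpu_alloc(free: dict[str, list[int]], n: int) -> list[GpuId] | None:
    """rgpu-like: any free GPU in the cluster is eligible, wherever the job runs."""
    pool = [(node, g) for node in sorted(free) for g in free[node]]
    if len(pool) < n:
        return None
    return pool[:n]
```

For a job on a GPU-less node `n3` requesting two GPUs in a cluster where `n1` and `n2` each have one free GPU, the node-local allocator fails, while the cluster-wide one returns `[("n1", 0), ("n2", 0)]`; with rCUDA, the job would then reach those devices over the network.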
SLURM is a cluster resource manager that allows a set of heterogeneous resources to be shared among the running jobs. However, SLURM is not designed to share resources such as graphics processors (GPUs). In fact, although SLURM supports generic resource plugins to manage GPUs, these can only be accessed exclusively by a job running on the node that hosts them. This is a serious drawback for remote GPU virtualization technologies, whose purpose is to provide the user with completely transparent access to all the GPUs in the cluster, regardless of the specific location of both the job and the GPU. In this work we present a new type of device in SLURM, "rgpu", which allows an application to access any GPU in the cluster from its own node by means of the remote GPU virtualization technology rCUDA. Moreover, with this new scheduling mechanism, a job can use as many GPUs as exist in the cluster, as long as they are available. Finally, we present the results of several simulations that show the benefits of this new approach in terms of increased job scheduling flexibility.
SLURM is a resource scheduler for managing a heterogeneous cluster, sharing some of the cluster's resources among the processes that request them for execution. However, SLURM cannot share certain generic resources, such as GPUs, across nodes as it does with CPUs: the scheduler manages GPU usage, but a GPU can only be used by the node on which it is physically installed. This characteristic of SLURM becomes a limitation when GPU virtualization solutions such as rCUDA are used, since their purpose is to provide transparent access to the GPUs of a cluster even when they are installed in another node. To make SLURM scheduling compatible with rCUDA, a new shared resource, the rgpu, has been created, and the necessary logic has been added to the SLURM code so that it accepts this new resource and treats it like the non-generic resources.
- Jul 2012
In recent years power consumption of high performance computing (HPC) clusters has become a growing problem due, e.g., to the economic cost of electricity, the emission of carbon dioxide (with negative impact on the environment), and the generation of heat (which reduces hardware reliability). In past work, we developed EnergySaving cluster, a software package that regulates the number of active nodes in an HPC facility to match the users' demands. In this paper, we extend this work by presenting a simulator for this tool that allows the evaluation and analysis of the benefits of applying different energy-saving strategies and policies, under realistic workloads, to different cluster configurations.
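The trade-off that such a simulator evaluates can be shown with a toy model: powering nodes down saves energy but risks leaving demand unserved when load rises faster than nodes can be switched back on. The sketch below is a deliberately simplified illustration and not EnergySaving cluster's actual policy; the time-step model, the one-step reaction delay, the `spare` margin, and the use of powered-on node-steps as an energy proxy are all assumptions.

```python
def simulate(demand: list[int], total_nodes: int, spare: int) -> tuple[int, int]:
    """Toy simulation of a reactive node power-management rule.

    demand[t] is the number of busy nodes requested at time step t.
    Returns (powered_on_node_steps, unmet_node_steps).
    """
    on = total_nodes  # start with every node powered on
    energy = 0        # node-steps spent powered on (proxy for energy)
    unmet = 0         # node-steps of demand that found no powered-on node
    for busy in demand:
        energy += on
        unmet += max(0, busy - on)
        # Reactive rule: for the next step, keep the current demand plus
        # a margin of spare nodes on, never exceeding the cluster size.
        on = min(total_nodes, busy + spare)
    return energy, unmet
```

On the trace `[2, 2, 6, 1]` in an 8-node cluster, a margin of one spare node uses 21 node-steps but leaves 3 node-steps of demand unserved during the load spike, whereas keeping all nodes on (a margin of 8) uses 32 node-steps with no unmet demand. Comparing strategies and margins in this way, under realistic workloads, is the kind of analysis the simulator is meant to support.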
In recent years power consumption of high performance computing (HPC) clusters has become a growing problem due, e.g., to the economic cost of electricity, the emission of carbon dioxide (with negative impact on the environment), and the generation of heat (which reduces hardware reliability). In past work, we developed EnergySaving cluster, a software package that regulates the number of active nodes in an HPC facility to match the users' current demands. In this paper, we extend this work by presenting a simulator for this tool that allows the evaluation and analysis of the benefits of applying different energy-saving strategies and rules, under realistic workloads, to different cluster configurations.