Miguel Caballer

Universitat Politècnica de València | UPV · Institute for Molecular Imaging Technologies (I3M)

PhD in Computer Science

About

69
Publications
21,459
Reads
820
Citations
Introduction
I obtained the B.Sc., M.Sc. and PhD degrees in Computer Science from the Universidad Politécnica de Valencia (UPV), Spain, in 2000, 2012 and 2014 respectively. I have been part of the Grid and High Performance Computing group of UPV since 2001. I have participated in different research projects about the application of Grid and Cloud computing techniques to several areas of engineering. Other fields of interest include green computing.
Additional affiliations
May 2001 - present
Universitat Politècnica de València
Education
September 2012 - April 2014
Universitat Politècnica de València
Field of study
  • Computer Science
September 2008 - February 2012
Universitat Politècnica de València
Field of study
  • Computer Science
September 1995 - November 2000
Universitat Politècnica de València
Field of study
  • Computer Science

Publications

Article
Full-text available
Virtual clusters are widely used computing platforms that can be deployed in multiple cloud platforms. The ability to dynamically grow and shrink the number of nodes has paved the way for customised elastic computing both for High Performance Computing and High Throughput Computing workloads. However, elasticity is typically restricted to a single...
Preprint
Full-text available
Virtual clusters are widely used computing platforms that can be deployed in multiple cloud platforms. The ability to dynamically grow and shrink the number of nodes has paved the way for customised elastic computing both for High Performance Computing and High Throughput Computing workloads. However, elasticity is typically restricted to a single...
Article
Full-text available
In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle: from model creation, training, validation and testing to model serving as a service, sharing and publication. In this respect, the DEEP-Hybrid...
Article
Full-text available
The Fusion Science Demonstrator in the European Open Science Cloud for Research Pilot Project aimed to demonstrate that the fusion community can make use of distributed cloud resources. We developed a platform, Prominence, which enables users to transparently exploit idle cloud resources for running scientific workloads. In addition to standard HTC...
Conference Paper
Full-text available
Serverless computing has introduced unprecedented levels of scalability and parallelism for the execution of High Throughput Computing tasks. This represents a challenge and an opportunity for different scientific workloads to be adapted to upcoming programming models that simplify the usage of such platforms. In this paper we introduce a serverles...
Article
Full-text available
Computer clusters are widely used platforms to execute different computational workloads. Indeed, the advent of virtualization and Cloud computing has paved the way to deploy virtual elastic clusters on top of Cloud infrastructures, which are typically backed by physical computing clusters. In turn, the advances in Green computing have fostered the...
Article
Full-text available
MapReduce is one of the most widely used programming models for analysing large-scale datasets, i.e. Big Data. In recent years, serverless computing and, in particular, Functions as a Service (FaaS) has surged as an execution model in which no explicit management of servers (e.g. virtual machines) is performed by the user. Instead, the Cloud provid...
Article
Full-text available
This article describes the development of an automated configuration of a software platform for Data Analytics that supports horizontal and vertical elasticity to guarantee meeting a specific deadline. It specifies all the components, software dependencies and configurations required to build up the cluster, and analyses the deployment times of dif...
Article
Full-text available
In the framework of the H2020 INDIGO-DataCloud project, we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, such as Docker and Apache Mesos, and standard specifications to describe application archi...
Presentation
Full-text available
In the framework of the H2020 INDIGO-DataCloud project we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, like Docker and Apache Mesos, and standard interfaces like TOSCA we are able to provide a s...
Article
Full-text available
New architectural patterns (e.g. microservices), the massive adoption of Linux containers (e.g. Docker containers), and improvements in key features of Cloud computing such as auto-scaling, have helped developers to decouple complex and monolithic systems into smaller stateless services. In turn, Cloud providers have introduced serverless computing...
Article
Full-text available
Private cloud infrastructures are now widely deployed and adopted across technology industries and research institutions. Although cloud computing has emerged as a reality, it is now known that a single cloud provider cannot fully satisfy complex user requirements. This has resulted in a growing interest in developing hybrid cloud solutions that bi...
Article
Climate and biodiversity systems are closely linked across a wide range of scales. To better understand the mutual interaction between climate change and biodiversity there is a strong need for multidisciplinary skills, scientific tools, and access to a large variety of heterogeneous, often distributed, data sources. Related to that, the EUBrazilCl...
Presentation
Full-text available
In the framework of the European H2020 project “INDIGO-DataCloud” we have designed and developed an advanced solution for deploying complex data analytics platforms on distributed and heterogeneous e-infrastructures in a transparent and easy way, removing operational complexities for users. The mission of the INDIGO-DataCloud project is to provide...
Conference Paper
Full-text available
The storage and analysis of Big Data are today among the most important trends in research and industry, from medicine to computer security, from high-energy physics to the social sciences. But Big Data analysis is far from a simple operation and requires techniques and technologies different from...
Article
Scientific Workflows (SWFs) are widely used to model processes in e-Science. SWFs are executed by means of Workflow Management Systems (WMSs), which orchestrate the workload on top of computing infrastructures. The advent of cloud computing infrastructures has opened the door to using on-demand infrastructures to complement or even replace local in...
Article
Full-text available
This paper describes the adoption and extension of the TOSCA standard by the INDIGO-DataCloud project for the definition and deployment of complex computing clusters together with the required support in both OpenStack and OpenNebula, carried out in close collaboration with industry partners such as IBM. Two examples of these clusters are described...
Conference Paper
After a sequence of creation and destruction of virtual machines (VMs) in an on-premises Cloud computing platform, the scheduling decisions to host the VMs are far from being optimal and the fragmentation of the physical resources may prevent the platform from hosting some VMs despite the free available virtualization resources. This paper describes a Vi...
Article
Scientific workflows (SWFs) are widely used to model processes in e-Science. SWFs are executed by means of workflow management systems (WMSs), which orchestrate the workload on top of computing infrastructures. The advent of cloud computing infrastructures has opened the door to using on-demand infrastructures to complement or even replace local in...
Conference Paper
A case study on climate model intercomparison data analysis addressing several classes of multi-model experiments is being implemented in the context of the EU H2020 INDIGO-DataCloud project. Such experiments require the availability of large amounts of data (multi-terabyte order) related to the output of several climate model simulations as well...
Conference Paper
Full-text available
One of the challenges a scientific computing center has to face is to keep delivering well consolidated computational frameworks (i.e. the batch computing farm), while conforming to modern computing paradigms. The aim is to ease system administration at all levels (from hardware to applications) and to provide a smooth end-user experience. Within t...
Article
Full-text available
In this paper we describe the architecture of a Platform as a Service (PaaS) oriented to computing and data analysis. In order to clarify the choices we made, we explain the features using practical examples, applied to several known usage patterns in the area of HEP computing. The proposed architecture is devised to provide researchers with a unif...
Article
This paper describes the developments to produce EC3 (Elastic Cloud Computing Cluster), a tool that creates self-managed cost-efficient virtual hybrid elastic clusters on top of Infrastructure as a Service (IaaS) Clouds. Using spot instances, together with checkpointing techniques, EC3 can significantly reduce the total cost of executions while int...
Conference Paper
Full-text available
This paper describes the application of a Cloud Computing platform (ODISEA) to deploy and manage the infrastructure required to support remote computational labs across subjects that address computer-related topics such as Cloud Computing, Big Data and Scalable Architectures. ODISEA enables the lecturer to describe using a high level language the r...
Article
Full-text available
Hypervisors and Operating Systems support vertical elasticity techniques such as memory ballooning to dynamically assign the memory of Virtual Machines (VMs). However, current Cloud Management Platforms (CMPs), such as OpenNebula or OpenStack, do not support dynamic vertical elasticity. This paper describes a system that integrates with t...
Conference Paper
Full-text available
Scientific experimentation is facing a data deluge in which the amount of data generated is reaching the order of terabytes per day, and thus huge capacity is required to process this data. Computationally, these processes are modelled using Scientific Workflows. However, the execution of a Scientific Workflow can be a complex and resource-dema...
Conference Paper
Full-text available
This paper describes the research work in the context of the CLUVIEM project towards achieving migratable, self-managed virtual elastic clusters on hybrid Cloud infrastructures. These virtual clusters can span across on-premises and public Cloud infrastructures thus leveraging hybrid Cloud platforms. They are elastic since working nodes are automa...
Article
Full-text available
This paper presents a software platform to dynamically deploy complex scientific virtual computing infrastructures, on top of Infrastructure as a Service Clouds. The platform orchestrates different services to provision the virtual computing resources. It dynamically installs the appropriate software to satisfy the requirements of a researcher, bot...
Article
Full-text available
This article describes the software architecture designed to cope with the computing demand of research usage of complex data from the Imaging Biobank of the Regional Ministry of Health in the Valencia Region (CS). It proposes the use of self-configured virtual clusters on top of on-premise and public cloud infrastructures. It uses a model based on...
Article
Full-text available
The increasing interest in online learning is unquestionable nowadays, with MOOCs being taken by thousands of students. However, for online learning to go mainstream it is necessary that professors perceive that the effort required to prepare and manage an online course is manageable. Today, a myriad of inexpensive tools and services can be used to...
Conference Paper
Full-text available
Clusters of PCs are one of the most widely used computing platforms in science and engineering, supporting different programming models. However, they suffer from a lack of customizability, difficult extensibility and complex workload-balancing. To this end, this work introduces virtual hybrid elastic clusters that can simultaneously harness on-premis...
Conference Paper
Full-text available
Nowadays e-Science experiments require computational and storage resources that most research centres cannot afford. Fortunately, researchers have at their disposal many distributed computing platforms on which to run their experiments: clusters, supercomputers, grids and clouds. None of these platforms has proved to be the ideal choice and each of th...
Conference Paper
This article describes the use of the ODISEA platform in four courses of the Master's Degree in Parallel and Distributed Computing (Máster Universitario en Computación Paralela y Distribuida, MUCPD) at the Universitat Politècnica de València (UPV). The platform makes it possible to deploy computational resources on Cloud providers, specifically configured to support educational activities...
Conference Paper
Full-text available
This paper presents a software platform to dynamically deploy complex scientific virtual computing infrastructures, on top of Infrastructure as a Service (IaaS) Clouds. The platform orchestrates different services to provision the virtual computing resources. It dynamically installs the appropriate software to satisfy the requirements of a research...
Conference Paper
Atomization involves complex physical processes and gas-liquid interaction. Primary atomization in diesel sprays is not well understood due to the difficulty of performing experimental measurements in the near-nozzle field. Hence computational fluid dynamics (CFD) has been used as a key element to understand and improve diesel sprays. A recently new...
Conference Paper
Full-text available
The exponential increase in biological data resulting from improvements in Next Generation Sequencers has revealed the need for powerful hardware that can cope with it. Despite the development of several tools for dealing with this kind of experiment, the main problem with almost all of them is the lack of scalability. In order to address it, this ar...
Article
Full-text available
Cloud infrastructures are becoming an appropriate solution to address the computational needs of scientific applications. However, the use of public or on-premises Infrastructure as a Service (IaaS) clouds requires users to have non-trivial system administration skills. Resource provisioning systems provide facilities to choose the most suitable Vi...
Article
This paper presents a platform that supports the execution of scientific applications covering different programming models (such as Master/Slave, Parallel/MPI, MapReduce and Workflows) on Cloud infrastructures. The platform includes i) a high-level declarative language to express the requirements of the applications featuring software customizatio...
Article
This paper presents a general energy management system for High Performance Computing (HPC) clusters and cloud infrastructures that powers off cluster nodes when they are not being used, and conversely powers them on when they are needed. This system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Man...
Article
Full-text available
This paper introduces Elastic Cloud Computing Cluster (EC3), a tool that creates elastic virtual clusters on top of Infrastructure as a Service (IaaS) Clouds. The clusters are self-managed entities that scale out to a larger number of nodes on demand, up to a maximum size specified by the user. Whenever idle resources are detected, the clusters aut...
Conference Paper
Full-text available
This paper describes the migration of a scientific application, related to the structural analysis of buildings and civil engineering structures, to the Cloud. For that, two different approaches have been carried out: one of them based on the Generic Worker, a web-role implementation that manages the execution of the remote tasks in a Windows Azure...
Conference Paper
Full-text available
TRENCADIS is a Grid infrastructure to store and process large amounts of medical images and their associated data in DICOM objects. This system enables radiologists to effectively group, search and manipulate images and structured reports in order to relate clinical findings and to be of practical value in the diagnosis and treatment of diseases....
Conference Paper
Full-text available
This paper describes an architecture to deploy scalable Software Practice Environments (SPE) to support practice lessons that require remotely accessible computer resources. The architecture makes it possible (i) to provision the required computing resources dynamically and on demand from different IaaS Cloud providers, (ii) to perform the auto...
Conference Paper
Full-text available
This paper addresses the impact of vertical elasticity for applications with dynamic memory requirements when running on a virtualized environment. Vertical elasticity is the ability to scale up and scale down the capabilities of a Virtual Machine (VM). In particular, we focus on dynamic memory management to automatically fit at runtime the underly...
Article
It is well known that the cavitation phenomenon in diesel injector nozzles has a strong influence on the internal flow during the injection process and spray development. However, its influence on the flow during the needle opening and closing remains unclear due to the huge difficulties related to performing experiments at partial needle lifts....
Conference Paper
Full-text available
This paper presents a general energy management system for HPC clusters and cloud infrastructures that powers off cluster nodes when they are not being used, and conversely powers them on when they are needed. This system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Management Systems, by using a s...
Article
In this article we present the strategies foreseen to foster the usage of the Iberian production infrastructure by regional scientific communities. The main focus is on describing the user support mechanisms implemented through a cooperative effort from the Portuguese and Spanish user support teams, and on the services and tools offered to the r...
Article
This paper compares the total cost of ownership of a physical cluster with the cost of a virtual cloud-based cluster. For that purpose, cost models for both a physical cluster and a cluster on a cloud have been developed. The model for the physical cluster takes into account previous works and incorporates a more detailed study of the costs related...
Conference Paper
Full-text available
This paper summarizes the works towards a Service Oriented Architecture to abstract the execution of scientific applications under different programming models, with a special focus on High Throughput Computing. The platform features SLA-aware capabilities based on WSAgreements and the ability to deploy customized virtual infrastructures with suppor...
Conference Paper
Full-text available
With the advent of cloud technologies the scientists have access to different cloud infrastructures in order to deploy all the virtual machines they need to perform the computations required in their research works. This paper describes a software architecture and a description language to simplify the creation of all the needed resources, and the...
Conference Paper
Full-text available
The widespread usage of virtualization has caused a major impact in disparate areas such as scientific computing, industrial businesses and academic environments. This has led to a massive production of Virtual Machine Images (VMIs). The management of this broad spectrum of VMIs should consider the variety of operating systems, applications and hyp...
Conference Paper
Full-text available
In the last few years there have appeared different initiatives for creating workflow environments to be used in grid deployments. This paper analyzes the state of the art of the proposals for workflow applied to grid computing and proposes an alternative providing new features, focusing on multigrid and extensibility capabilities. The workflow sys...
Article
Full-text available
This paper proposes Grid technology as a method for integrating information, existing procedures and resources in Public Administration. From the point of view of electronic government, the work presented anticipates future trends through the use of Grid technology. On the other hand, from the perspective of Grid technol...