Guillermo L. Taboada

University of A Coruña | UDC · Department of Computer Engineering

Ph.D. Computer Engineering

About

76
Publications
32,441
Reads
15,232
Citations
Introduction
Guillermo L. Taboada currently works at the Computer Architecture Group (CAG), at the Department of Computer Engineering, University of A Coruña. Guillermo does research in Big Data, Big Data Analytics, Parallel Computing, Computer Communications (Networks) and Computer Architecture. He is the CEO of Torusware, a university spin-off devoted to transferring R&D results to industry.
Additional affiliations
August 2010 - present
University of A Coruña
Position
  • Professor (Associate)
November 2002 - present
University of A Coruña
Education
September 2004 - May 2009
University of A Coruña
Field of study
  • Computer Engineering

Publications (76)
Article
The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being del...
Article
Full-text available
This paper deals with the efficiency and sustainability of urban rail transit (URT) using exploratory data analytics (EDA) and data envelopment analysis (DEA). The first stage of the proposed methodology is EDA with already available indicators (e.g., the number of stations and passengers), and suggested indicators (e.g., weekly frequencies, link o...
Article
Full-text available
This paper deals with the efficiency and sustainability of Construction and Demolition Waste (CDW) management in 30 Member States of the European Economic Area (EEA) (the 28 European Union countries plus Norway and Iceland) for the period 2010-2016 using Exploratory Data Analytics (EDA) and Data Envelopment Analysis (DEA). The first stage of the pr...
Article
Full-text available
As the memory capacity of computational systems increases, the in-memory data management of Big Data processing frameworks becomes more crucial for performance. This paper analyzes and improves the memory efficiency of Flame-MR, a framework that accelerates Hadoop applications, providing valuable insight into the impact of memory management on perf...
Article
Full-text available
Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the need for improving the Hadoop performance also grows. Existing modifications of Hadoop (e.g., Mellanox Unstructured Data Accelerator) atte...
Conference Paper
Full-text available
The increasing adoption of Big Data analytics has led to a high demand for efficient technologies in order to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by emerging ones like Spark or Flink, which improve both the programming APIs and performance. However, few works have focused on comparing th...
Article
Full-text available
The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case depends on its design and implementation, the underlying system resources and the type of application to be run. Therefore, the appropriate...
Article
Full-text available
The ever growing needs of Big Data applications are demanding challenging capabilities which cannot be handled easily by traditional systems, and thus more and more organizations are adopting High Performance Computing (HPC) to improve scalability and efficiency. Moreover, Big Data frameworks like Hadoop need to be adapted to leverage the available...
Article
Accelerators have revolutionised the high performance computing (HPC) community. Despite their advantages, their very specific programming models and limited communication capabilities have kept them in a supporting role of the main processors. With the introduction of Xeon Phi, this is no longer true, as it can be programmed as the main processor...
Article
Providing high-performance inter-node communication is a key capability for running high performance computing applications efficiently on parallel architectures. In fact, current systems deployments are aggregating a significant number of cores interconnected via advanced networking hardware with Remote Direct Memory Access (RDMA) mechanisms, that...
Article
The advent of cloud computing technologies, which dynamically provide on-demand access to computational resources over the Internet, is offering new possibilities to many scientists and researchers. Nowadays, Infrastructure as a Service (IaaS) cloud providers can offset the increasing processing requirements of data-intensive computing applications...
Patent
Full-text available
Disclosed embodiments include a Java messaging method for efficient inter-node and intra-node communications on computer systems with multi-core processors interconnected via high-speed network interconnections. According to one embodiment, the Java messaging method accesses the high-speed networks and memory more directly and reduces message buffe...
Article
The performance and scalability of communications are key for HPC applications in the current multi-core era. Despite the significant benefits (e.g., productivity, multithreading) of Java for parallel programming, its poor communications support has hindered its adoption in HPC. This paper presents FastMPJ (http://fastmpj.com), an efficient Message-P...
Article
The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. Th...
Article
This paper presents a Java implementation of the recently published MPI 3.0 nonblocking message passing collectives in order to analyze and assess the feasibility of taking advantage of these operations in shared memory systems using Java. Nonblocking collectives aim to exploit the overlapping between computation and communication for collective op...
Article
This paper presents the high-performance computing (HPC) support of jModelTest2, the most popular bioinformatic tool for the statistical selection of models of DNA substitution. As this can demand vast computational resources, especially in terms of processing power, jModelTest2 implements three parallel algorithms for model selection: (1) a multit...
Article
Full-text available
The selection of models of nucleotide substitution is one of the major steps of modern phylogenetic analysis. Different tools exist to accomplish this task, among which jModelTest 2 (jMT2) is one of the most popular. Still, in order to deal with large DNA alignments with hundreds or thousands of loci, users of jMT2 need to have access to High Perfo...
Article
Cloud computing is posing several challenges, such as security, fault tolerance, access interface singularity, and network constraints, both in terms of latency and bandwidth. In this scenario, the performance of communications depends both on the network fabric and its efficient support in virtualized environments, which ultimately determines the...
Article
Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsy...
Article
Servet is a suite of benchmarks focused on extracting a set of parameters with high influence on the overall performance of multicore clusters. These parameters can be used to optimize the performance of parallel applications by adapting part of their behavior to the characteristics of the machine. Up to now the tool considered network bandwidth as...
Article
The simulation of particle dynamics is an essential method to analyze and predict the behavior of molecules in a given medium. This work presents the design and implementation of a parallel simulation of Brownian dynamics with hydrodynamic interactions for shared memory systems using two approaches: (1) OpenMP directives and (2) the Partitioned Glo...
Article
Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has increased during the last years owing to its high programmability and reasonable performance through an efficient exploitation of data locality, especially on hierarchical architectures like multicore clusters. However, the performance issues that ar...
Conference Paper
The presence of many-core units as accelerators has been increasing due to their ability to improve the performance of highly parallel workloads. General Purpose GPU (GPGPU) computing has allowed the graphical units to emerge as successful co-processors that can be employed to improve the performance of many different non-graphical applications with...
Article
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications o...
Article
Amazon Web Services (AWS) is a well-known public Infrastructure-as-a-Service (IaaS) provider whose Elastic Compute Cloud (EC2) offering includes some instances, known as cluster instances, aimed at High-Performance Computing (HPC) applications. In previous work, authors have shown that the scalability of HPC communication-intensive applications do...
Article
The popularity of Partitioned Global Address Space (PGAS) languages has increased during the last years thanks to their high programmability and performance through an efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense mat...
Conference Paper
Full-text available
Partitioned Global Address Space (PGAS) languages offer programmers a shared memory view that increases their productivity and allow locality exploitation to obtain good performance on current large-scale distributed memory systems. UPCBLAS is a parallel numerical library for dense matrix computations using the PGAS Unified Parallel C (UPC) languag...
Article
This paper presents ibvdev, a scalable and efficient low-level Java message-passing communication device over InfiniBand. The continuous increase in the number of cores per processor underscores the need for efficient communication support for parallel solutions. Moreover, current system deployments are aggregating a significant number of cores thro...
Article
This paper presents F-MPJ (Fast MPJ), a scalable and efficient Message-Passing in Java (MPJ) communication middleware for parallel computing. The increasing interest in Java as the programming language of the multi-core era demands scalable performance on hybrid architectures (with both shared and distributed memory spaces). However, current Java c...
Article
Servet is a suite of benchmarks focused on detecting a set of parameters with high influence on the overall performance of multicore systems. These parameters can be used for autotuning codes to increase their performance on multicore clusters. Although Servet has been proved to detect accurately cache hierarchies, bandwidths and bottlenecks in mem...
Article
This paper presents smdev, a shared memory communication middleware for multi-core systems. smdev provides a simple and powerful messaging API that is able to exploit the underlying multi-core architecture replacing inter-process and network-based communications by threads and shared memory transfers. The performance evaluation of smdev on several...
Article
Full-text available
The popularity of Partitioned Global Address Space (PGAS) languages has increased during the last years thanks to their high programmability and performance through an efficient exploitation of data locality. This paper describes the implementation of efficient parallel dense triangular solvers in the PGAS language Unified Parallel C (UPC). The solve...
Article
Since its release, the Java programming language has attracted considerable attention from the high-performance computing (HPC) community because of its portability, high programming productivity, and built-in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high-performance Java message-pass...
Article
Java is a commonly used programming language, although its use in High Performance Computing (HPC) remains relatively low. One of the reasons is a lack of libraries offering specific HPC functions to Java applications. In this paper we present a Java-based framework, called DpcbTools, designed to provide a set of functions that fill this gap. It in...
Conference Paper
Statistical model selection has become an essential step for the estimation of phylogenies from DNA sequence alignments. The program jModelTest offers different strategies to identify best-fit models for the data at hand, but for large DNA alignments, this task can demand vast computational resources. This paper presents a High Performance Computin...
Conference Paper
The uptrend in the number of cores in cluster architectures underscores the need for scalable communication middleware on these systems. One of the strategies to take advantage of this increase in the available computational power is the use of efficient message-passing middleware for inter-node communications and thread-based shared memory transfe...
Conference Paper
Full-text available
Efficient data access is extremely important for many applications in HPC. In many cases, processes running in one node will need to access data held in another node, as well as access data held in some central storage device. In I/O-intensive applications, accessing data not held in the local node can become a bottleneck, especially in cases where...
Article
This paper presents a scalable and efficient Message-Passing in Java (MPJ) collective communication library for parallel computing on multi-core architectures. The continuous increase in the number of cores per processor underscores the need for scalable parallel solutions. Moreover, current system deployments are usually multi-core clusters, a hyb...
Article
Full-text available
We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel in multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uv...
Conference Paper
In this work, the application of genetic algorithms to the elaboration of land use plans is studied. These plans follow the national legal rules and experts' considerations. Two optimization criteria are applied: aptitude and compactness. As the number of affected plots can be large and, consequently, the execution time of the algorithm can be pote...
Conference Paper
The use of probabilistic models of amino acid replacement is essential for the study of protein evolution, and programs like ProtTest implement different strategies to identify the best-fit model for the data at hand. For large protein alignments, this task can demand vast computational resources, preventing the justification of the model used in t...
Article
This paper reports on the design, implementation and benchmarking of a Java version of the NAS Parallel Benchmarks. We first briefly describe the implementation and the performance pitfalls. We then compare the overall performance of the Fortran MPI (PGI) version with a Java implementation using the ProActive middleware for distribution. All Java e...
Conference Paper
The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as the high-speed interconnects, demands the use of new optimization techniques and libraries that take advantage of their features. In this paper Servet...
Article
This article describes our experience teaching Computer Architecture and Engineering in the Master's in Computer Science at the Universidade da Coruña, a setting that combined a newly introduced EHEA (European Higher Education Area) degree with a small number of students. The professional orientation of the master's programme motivated us to...
Article
This paper presents a performance analysis of message-passing overhead on high-speed clusters. Communication performance is critical for the overall high-speed cluster performance. In order to analyze the communication overhead, a new linear model proposed in this work is used for its characterization. Performance models have been derived using ou...
Conference Paper
The current trend towards multicore architectures underscores the need for parallelism. While new languages and alternatives for supporting these systems more efficiently are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI perfo...
Conference Paper
Full-text available
This paper presents a library of collective operations for Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language that allows a distributed memory architecture to be programmed as if it were a shared memory system. The developed library: (1) extends the standard library of co...
Conference Paper
Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language that exhibits high performance and portability on a broad class of shared and distributed memory parallel architectures. This paper describes the design and implementation of a parallel numerical library for UPC built on top of the sequential BLAS routines. The develope...
Conference Paper
Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the conce...
Conference Paper
This paper presents our current research efforts on efficient Java communication libraries over InfiniBand. The use of Java for network communications still delivers insufficient performance and does not exploit the performance and other special capabilities (RDMA and QoS) of high-speed networks, especially for this interconnect. In order to increa...
Conference Paper
Unified Parallel C (UPC) is an extension of ANSI C designed for parallel programming. UPC collective primitives, which are part of the UPC standard, increase programming productivity while reducing the communication overhead. This paper presents an up-to-date performance evaluation of two publicly available UPC collective implementations on three s...
Conference Paper
The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being del...
Article
As the size and architectural complexity of High Performance Computing systems increase, the need for productive programming tools and languages becomes more important. The UPC language aims to be a good choice for productive parallel programming. However, productivity is influenced not only by expressiveness of the language, but also by its perfo...
Article
The study of a language in terms of programmability is a very interesting issue in parallel programming. Traditional approaches in this field have studied different methods, such as the number of Lines of Code or the analysis of programs, in order to prove the benefits of using a paradigm compared to another. Nevertheless, these methods usually foc...
Article
Although Java is among the most used programming languages, its use for HPC applications is still marginal. This article reports on the design, implementation and benchmarking of a Java version of the NAS Parallel Benchmarks translated from their original Fortran / MPI implementation. We have based our version on ProActive, an open source middlewar...
Article
This paper presents Java Fast Sockets (JFS), an optimized Java socket implementation on clusters for high performance computing. Current socket libraries do not efficiently support high-speed cluster interconnects and impose substantial communication overhead. JFS overcomes these performance constraints by: (1) enabling high-speed communication on...
Article
About ten years after the Java Grande effort, this paper aims at providing a snapshot of the current status of Java for High Performance Computing. Multi-core chips are becoming mainstream, offering many ways for a Java Virtual Machine (JVM) to take advantage of such systems for critical tasks such as Just-In-Time compilation or Garbage Collection....
Conference Paper
This paper presents a more efficient Java remote method invocation (RMI) implementation for high-speed clusters. The use of Java for parallel programming on clusters is limited by the lack of efficient communication middleware and high-speed cluster interconnect support. This implementation overcomes these limitations through a more efficient Java...
Conference Paper
The use of Java for parallel programming on clusters relies on the need of efficient communication middleware and high-speed cluster interconnect support. Nevertheless, currently there are no solutions that fully fulfill these issues. In this paper, a Java sockets library has been tailored to increase the efficiency of Java parallel applicati...
Conference Paper
This paper presents communication strategies for achieving efficient parallel and distributed Java applications on clusters with high-speed interconnects. Communication performance is critical for the overall cluster performance. Previous efforts at obtaining efficient Java communications have a limited applicability on high-speed interconnects as...
Conference Paper
This paper presents communication strategies for supporting efficient non-blocking Java communication on clusters. The communication performance is critical for the overall cluster performance. It is possible to use non-blocking communications to reduce the communication overhead. Previous efforts to efficiently support non-blocking communication i...
Conference Paper
This paper aims at designing communication strategies for parallel and distributed Java applications to obtain higher degrees of performance on clusters. Several specific approaches exist to increase the efficiency of Java communications, especially of high-level APIs like RMI, although their applicability is relatively limited on clusters, since th...
Conference Paper
The use of Java for parallel programming on clusters according to the message-passing paradigm is an attractive choice. In this case, the overall application performance will largely depend on the performance of the underlying Java message-passing library. This paper evaluates, models and compares the performance of MPI-like point-to-point and coll...