Eduard Ayguadé's research while affiliated with Universitat Politècnica de Catalunya and other places

Publications (512)

Article
Full-text available
The most common approach in image classification tasks is to resize all images in the dataset to a unique shape, while reducing their resolution to a size that makes experimentation at scale easier. This practice has benefits from a computational perspective, but it entails negative side-effects on performance due to loss of information and image de...
Conference Paper
Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost on high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Fie...
Preprint
Full-text available
Domain-Specific Languages (DSLs) improve programmers' productivity by decoupling problem descriptions from algorithmic implementations. However, DSLs for High-Performance Computing (HPC) have two critical non-functional requirements: performance and scalability. This paper presents the automated process of generating, from abstract mathematical spec...
Article
Domain-Specific Languages (DSLs) improve programmers' productivity by decoupling problem descriptions from algorithmic implementations. However, DSLs for High-Performance Computing (HPC) have two additional critical requirements: performance and scalability. This paper presents the automated process of generating, from abstract mathematical specific...
Article
Full-text available
The authors present the parallel‐hybrid recursive least square (PH‐RLS) algorithm for an accurate self‐mixing interferometric laser vibration sensor coupled with an accelerometer under industrial conditions. Previously, this was achieved by using a conventional RLS algorithm to cancel the parasitic vibrations where the sensor itself is not...
Article
Full-text available
Phase unwrapping is an integral part of multiple algorithms with diverse applications. Detailed phase unwrapping is also necessary for achieving high-accuracy metric sensing using laser feedback-based self-mixing interferometry (SMI). Among SMI specific phase unwrapping approaches, a technique called Improved Phase Unwrapping Method (IPUM) provides...
Conference Paper
This paper proposes the extension of task-based programming models with recurrent workloads concepts. The proposal introduces new clauses in the OmpSs task directive to efficiently model recurrent workloads. The clauses define the task period and/or the number of task body repetitions. Although the new clauses are suitable for any device, their supp...
Article
Full-text available
This paper presents the new features of the OmpSs@FPGA framework. OmpSs is a data-flow programming model that supports task nesting and dependencies to target asynchronous parallelism and heterogeneity. OmpSs@FPGA is the extension of the programming model addressed specifically to FPGAs. The OmpSs environment is built on top of the Mercurium source-to-sour...
Article
Full-text available
In recent years, the demand for alternative medical diagnostics of the human kidney is growing, and some of the reasons behind this relate to its non-invasive, early, real-time, and pain-free mechanism. Chronic kidney disease is one of the major kidney problems, and it requires an early-stage diagnosis. Therefore, in this work, we have p...
Conference Paper
Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to investigate the complex relationship that exists between the phenome and the genome. The interpretation of the encoded information requires methods that efficiently interoperate between multiple ontologies pro...
Preprint
Task-based programming models are emerging as a promising alternative to make the most of multi-/many-core systems. These programming models rely on runtime systems, and their goal is to improve application performance by properly scheduling application tasks to cores. Additionally, these runtime systems offer policies to cope with application phas...
Preprint
Full-text available
In this work, we leverage ensemble learning as a tool for the creation of faster, smaller, and more accurate deep learning models. We demonstrate that we can jointly optimize for accuracy, inference time, and the number of parameters by combining DNN classifiers. To achieve this, we combine multiple ensemble strategies: bagging, boosting, and an or...
Preprint
Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per task that the runtime uses to order the tasks' execution. This order is calculated using shared graphs, which a...
Chapter
Task-based programming models are emerging as a promising alternative to make the most of multi-/many-core systems. These programming models rely on runtime systems, and their goal is to improve application performance by properly scheduling application tasks to cores. Additionally, these runtime systems offer policies to cope with application phas...
Conference Paper
Full-text available
Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming...
Preprint
Art is an expression of human creativity, skill and technology, and an exceptionally rich source of visual content. In the context of AI image processing systems, artworks represent one of the most challenging domains conceivable: properly perceiving art requires attention to detail, a huge generalization capacity, and recognizing both simple and compl...
Preprint
Full-text available
One of the major challenges in using extreme scale systems efficiently is to mitigate the impact of faults. Application-level checkpoint/restart (CR) methods provide the best trade-off between productivity, robustness, and performance. There are many solutions implementing CR at the application level. They all provide advanced I/O capabilities to m...
Article
Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per task that the runtime uses to order the tasks' execution. This order is calculated using shared graphs, which a...
Article
One of the major challenges in using extreme scale systems efficiently is to mitigate the impact of faults. Application-level checkpoint/restart (CR) methods provide the best trade-off between productivity, robustness, and performance. There are many solutions implementing CR at the application level. They all provide advanced I/O capabilities to m...
Preprint
Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism, while the latter relies on fine-grained synchronization among tasks and a flexible data-flow execution model to exploit dynamic, irregular, and nested parallelism. On appli...
Article
Full-text available
Purpose: To assess the performance of deep learning algorithms for different tasks in retinal fundus images: (1) detection of retinal fundus images versus optical coherence tomography (OCT) or other images, (2) evaluation of good quality retinal fundus images, (3) distinction between right eye (OD) and left eye (OS) retinal fundus images, (4) detec...
Article
Application performance on novel memory systems is typically estimated using a hardware simulator. The simulation is, however, time consuming, which limits the number of design options that can be explored within a practical length of time. Also, although memory simulators are typically well validated, current CPU simulators have various shortcoming...
Preprint
Finding tumour genetic markers is essential to biomedicine due to their relevance for cancer detection and therapy development. In this paper, we explore a recently released dataset of chromosome rearrangements in 2,586 cancer patients, where different sorts of alterations have been detected. Using a Random Forest classifier, we evaluate the releva...
Preprint
High-resolution and variable-shape images have not yet been properly addressed by the AI community. The approach of down-sampling data often used with convolutional neural networks is sub-optimal for many tasks, and has too many drawbacks to be considered a sustainable alternative. In sight of the increasing importance of problems that can benefit...
Article
The AXIOM project aims at providing an environment for Cyber-Physical Systems. Smart Video Surveillance targets public environments, involving real-time face detection in crowds. Smart Home Living targets home environments and access control. These applications are used as experimental use cases for the AXIOM platform, currently based on the Xilinx...
Preprint
Full-text available
The purpose of feature extraction on convolutional neural networks is to reuse deep representations learnt by a pre-trained model to solve a new, potentially unrelated problem. However, raw feature extraction from all layers is unfeasible given the massive size of these networks. Recently, a supervised method using complexity reduction was propose...
Conference Paper
Full-text available
This paper summarizes our two-year study of corrected and uncorrected errors on the MareNostrum 3 supercomputer, covering 2000 billion MB-hours of DRAM in the field. The study analyzes 4.5 million corrected and 71 uncorrected DRAM errors, and it compares the reliability of DIMMs from all three major memory manufacturers, built in three different te...
Conference Paper
Application performance on novel memory systems is typically estimated using a hardware simulator. The simulation is, however, time consuming, which limits the number of design options that can be explored within a practical length of time. Also, although memory simulators are typically well validated, current CPU simulators have various shortcoming...
Article
The approaching end of DRAM scaling and expansion of emerging memory technologies is motivating a lot of research in future memory systems. Novel memory systems are typically explored by hardware simulators that are slow and often have a simplified or obsolete abstraction of the CPU. This study presents PROFET, an analytical model that predicts how...
Conference Paper
As we move towards exascale computing, an abstraction for effective parallel computation is increasingly needed to address the maintainability and portability challenges of scientific applications while ensuring the efficient and full exploitation of high-performance systems. These circumstances require computer and domain scientists to work jointly toward a...
Preprint
Full-text available
Continual learning based on data stream mining deals with ubiquitous sources of Big Data arriving at high-velocity and in real-time. Adaptive Random Forest ({\em ARF}) is a popular ensemble method used for continual learning due to its simplicity in combining adaptive leveraging bagging with fast random Hoeffding trees. While the default ARF size p...
Article
Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow), AMCs are able to provide high performance under the facility power budget. This paper performs the first extensive evaluation of how portable the current HPC applications are for such su...
Conference Paper
Human lungs are essential respiratory organs. Different Obstructive Lung Diseases (OLD), such as bronchitis, asthma, and lung cancer, affect respiration. Diagnosing OLD at the initial stage is better than diagnosing and curing it later. The delay in diagnosing OLD is due to expensive diagnostic tools and the requirement for experts. Therefore, a non-...
Article
Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce relevant overhead when targeting fine-grained tasks, resulting in performance losses. To overcome this drawba...
Chapter
Fundus imaging is an effective and low-cost tool to screen for common retinal diseases. At the same time, Deep Learning (DL) algorithms have been shown capable of achieving similar or even better performance accuracies than physicians in certain image classification tasks. One of the key aspects to improve the performance of DL models is to use data...
Chapter
Stream programming is a promising step towards portable, efficient, correct use of parallelism. A stream program is built from kernels that communicate only through point-to-point streams. The stream compiler maps a portable stream program onto the target, automatically sizing communications buffers and applying optimizing transformations such as b...
Conference Paper
Full-text available
OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous execution, we use OmpSs and OmpSs@FPGA as prototype implementation to develop new ideas for OpenMP. OmpSs@FPGA implements the tas...
Article
Full-text available
To manage power and memory wall effects, the HPC industry supports FPGA reconfigurable accelerators and vector processing cores for data-intensive scientific applications. FPGA-based vector accelerators are used to increase the performance of high-performance application kernels. Adding more vector lanes does not affect the performance if the proc...
Conference Paper
Full-text available
We introduce a community detection algorithm (Fluid Communities) based on the idea of fluids interacting in an environment, expanding and contracting as a result of that interaction. Fluid Communities is based on the propagation methodology, which represents the state-of-the-art in terms of computational cost and scalability. While being highly eff...
Article
Full-text available
The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different com...
Article
Full-text available
This paper presents a work consisting in using deep convolutional neural networks (CNNs) to facilitate the curation of brand-related social media images. The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically anno...
Conference Paper
The community has accepted the need for a detailed simulation of main memory. Currently, CPU simulators are usually coupled with cycle-accurate main memory simulators. However, coupling CPU and memory simulators is not a straightforward task, because some pieces of the circuitry between the last-level cache and the memory DIMMs could be easily
Article
Sampled simulation is a mature technique for reducing simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, like task-based execution, to provide both more accurate and faster simulation. Task-based programming models allow the programmer to specify program segments a...
Conference Paper
Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are issued and the storage devices containing the requested data. In this context, techniques to manage and mitigate...
Conference Paper
Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited memory capacity is insufficient for modern HPC systems. For this reason, both stacked DRAM and off-chip memories are expected to...
Article
Full-text available
Biological sequence comparison algorithms that compute the optimal local and global alignments calculate a dynamic programming (DP) matrix with quadratic time complexity. The DP matrix H is calculated with a recurrence relation in which the value of each cell H(i,j) is the result of a maximum operation on the values of cells H(i-1,j-1), H(i-1,j) and H(i,j...
Preprint
Full-text available
Measuring the distance between concepts is an important field of study of Natural Language Processing, as it can be used to improve tasks related to the interpretation of those same concepts. WordNet, which includes a wide variety of concepts associated with words (i.e., synsets), is often used as a source for computing those distances. In this pap...
Article
Full-text available
The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating-point arithmetic to enhance neural network training in terms of performance and energy efficiency. In the first part of this paper, the behaviour of the 12-bit fixed-point arithme...
Article
Full-text available
With the increase in the density and performance of digital electronics, the demand for a power-efficient high-performance computing (HPC) system has been increased for embedded applications. The existing embedded HPC systems suffer from issues like programmability, scalability, and portability. Therefore, a parameterizable and programmable high-pe...
Article
Current trends and projections show that faults in computer systems become increasingly common. Such errors may be detected, and possibly corrected transparently, e.g. by Error Correcting Codes (ECC). For a program to be fault-tolerant, it needs to also handle the Errors that are Detected and Uncorrected (DUE), such as an ECC encountering too many...
Conference Paper
Fundus imaging is an effective and low-cost tool to screen for common retinal diseases. At the same time, Deep Learning (DL) algorithms have been shown capable of achieving similar or even better performance accuracies than physicians in certain image classification tasks. One of the key aspects to improve the performance of DL models is to use data...
Article
This paper presents the description of a compulsory parallel programming course in the bachelor degree in Informatics Engineering at the Barcelona School of Informatics, Universitat Politècnica de Catalunya (UPC-BarcelonaTech). The main focus of the course is on the shared-memory programming paradigm, which facilitates the presentation of fundamental...