João M P Cardoso

João M P Cardoso
University of Porto | UP · Departamento de Engenharia Informática

PhD

About

266
Publications
32,865
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
2,843
Citations
Additional affiliations
September 2008 - present
University of Porto
Position
  • Professor (Associate)

Publications

Publications (266)
Preprint
Full-text available
Human Activity Recognition (HAR) has been a popular research field due to the widespread of devices with sensors and computational power (e.g., smartphones and smartwatches). Applications for HAR systems have been extensively researched in recent literature, mainly due to the benefits of improving quality of life in areas like health and fitness mo...
Article
This article presents Kadabra, a Java source-to-source compiler that allows users to make code queries, code analysis and code transformations, all user-programmable using the domain-specific language LARA. We show how Kadabra can be used as the basis for developing a runtime autotuning and adaptivity framework, able to adapt existing source Java c...
Article
As applications move to the edge, efficiency in computing power and power/energy consumption is required. Heterogeneous computing promises to meet these requirements through application-specific hardware accelerators. Runtime adaptivity might be of paramount importance to realize the potential of hardware specialization, but further study is requir...
Article
Full-text available
MATLAB is a software based analysis environment that supports a high-level programing language and is widely used to model and analyze systems in various domains of engineering and sciences. Traditionally, the analysis of MATLAB models is done using simulation and debugging/testing frameworks. These methods provide limited coverage due to their inh...
Article
Full-text available
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that...
Article
Full-text available
The kNN machine learning method is widely used as a classifier in Human Activity Recognition (HAR) systems. Although the kNN algorithm works similarly both online and in offline mode, the use of all training instances is much more critical online than offline due to time and memory restrictions in the online mode. Some methods propose decreasing th...
Article
Full-text available
Directive-driven programming models, such as OpenMP, are one solution for exploring the potential parallelism when targeting multicore architectures. Although these approaches significantly help developers, code parallelization is still a non-trivial and time-consuming process, requiring parallel programming skills. Thus, many efforts have been mad...
Article
Full-text available
High Level Synthesis (HLS) tools targeting Field Programmable Gate Arrays (FPGAs) aim to provide a method for programming these devices via high-level abstractions. Initially, HLS support for FPGAs focused on compiling C/C++ to hardware circuits. This raised the issue of determining the programming practices which resulted in the best performing ci...
Article
Full-text available
This article presents Clava, a Clang-based source-to-source compiler, that accepts scripts written in LARA, a JavaScript-based DSL with special constructs for code queries, analysis and transformations. Clava improves Clang’s source-to-source capabilities by providing a more convenient and flexible way to analyze, transform and generate C/C++ code,...
Article
Developing and optimizing software applications for high performance and energy efficiency is a very challenging task, even when considering a single target machine. For instance, optimizing for multicore-based computing systems requires in-depth knowledge about programming languages, application programming interfaces, compilers, performance tunin...
Article
In order to take advantage of the processing power of current computing platforms, programmers typically need to develop software versions for different target devices. This task is time‐consuming and requires significant programming and computer architecture expertise. A possible and more convenient alternative is to start with a single high‐level...
Article
Full-text available
In this article, we focus on the acceleration of a chemical reaction simulation that relies on a system of stiff ordinary differential equation (ODEs) targeting heterogeneous computing systems with CPUs and field-programmable gate arrays (FPGAs). Specifically, we target an essential kernel of the coupled chemistry aerosol-tracer transport model to...
Data
This dataset gathers product information for desktop processor devices (and a small subset of mobile devices), ranging from the years of 1970 to 2019. The data were gathered from several sources, which can be found under folder "sources". Sources include: CPUDB from Stanford University (http://cpudb.stanford.edu/), data gathered from AMD and Inte...
Article
Full-text available
The breakdown of Dennard scaling has resulted in a decade-long stall of the maximum operating clock frequencies of processors. To mitigate this issue, computing shifted to multi-core devices. This introduced the need for programming flows and tools that facilitate the expression of workload parallelism at high abstraction levels. However, not all w...
Chapter
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, specially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of informatio...
Chapter
The Classifier kNN is largely used in Human Activity Recognition systems. Research efforts have proposed methods to decrease the high computational costs of the original kNN by focusing, e.g., on approximate kNN solutions such as the ones relying on Locality-sensitive Hashing (LSH). However, embedded kNN implementations need to address the target d...
Chapter
Smartphones are increasingly present in human’s life. For example, for entertainment many people use their smartphones to watch videos or listen to music. Many users, however, stream or play videos with the intention to only listen to the audio track. This way, some battery energy, which is critical to most users, is unnecessarily consumed thus and...
Chapter
Nowadays most people carry a smartphone with built-in sensors (e.g., accelerometers, gyroscopes) capable of providing useful data for Human Activity Recognition (HAR). Machine learning classification methods have been intensively researched and developed for HAR systems, each with different accuracy and performance levels. However, acquiring sensor...
Article
Incorporating speed probability distribution to the computation of the route planning in car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples used for the Monte Carlo simulation to solve the Probabilistic Time-Dependent Routing (PTDR) proble...
Article
Full-text available
This article presents Nonio, a modular, easy-to-use, design space exploration framework focused on exploring custom combinations of compiler flags and compiler sequences. We describe the framework and discuss its use with two of the most popular compiler toolchains, GCC and Clang+LLVM. Particularly, we discuss implementation details in the context...
Article
Full-text available
Improving execution time and energy efficiency is needed for many applications and usually requires sophisticated code transformations and compiler optimizations. One of the optimization techniques is memoization, which saves the results of computations so that future computations with the same inputs can be avoided. In this article we present a fr...
Article
The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra functional properties such as energy-efficiency and performance and to optimize Quality of Service (QoS) in an adaptive way. The DSL approach allows the definition of energy-efficiency, performanc...
Preprint
Full-text available
The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra functional properties such as energy-efficiency and performance and to optimize Quality of Service (QoS) in an adaptive way. The DSL approach allows the definition of energy-efficiency, performanc...
Preprint
Incorporating speed probability distribution to the computation of the route planning in car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples used for the Monte Carlo simulation to solve the Probabilistic Time-Dependent Routing (PTDR) proble...
Chapter
Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically needs to deal with large solution space. A previous approach, evaluated by targeting an x86 CPU...
Preprint
Full-text available
Human activity recognition (HAR) is a classification task that aims to classify human activities or predict human behavior by means of features extracted from sensors data. Typical HAR systems use wearable sensors and/or handheld and mobile devices with built-in sensing capabilities. Due to the widespread use of smartphones and to the inclusion of...
Preprint
Full-text available
Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders on 15 OpenCL benchmarks to an NVIDIA GPU. We analyze the generated...
Article
The use of specialized accelerator circuits is a feasible solution to address performance and energy issues in embedded systems. This paper extends a previous field-programmable gate array-based approach that automatically generates pipelined customized loop accelerators (CLAs) from runtime instruction traces. Despite efficient acceleration, the ap...
Article
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target programming language (e.g., AspectJ for Java and AspectC++ for C++). Although providing AOP support with target language extensions may ease the adoption of an approach, it may impose constraints related with constructs and semantics. Furthermore, by tightly...
Preprint
Full-text available
Compiler writers typically focus primarily on the performance of the generated program binaries when selecting the passes and the order in which they are applied in the standard optimization levels, such as GCC -O3. In some domains, such as embedded systems and High-Performance Computing (HPC), it might be sometimes acceptable to slowdown computati...
Conference Paper
Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy efficient HPC systems. In particular, the modules of the ANTAREX toolbox are descr...
Conference Paper
Full-text available
Compared to general purpose programming languages , domain specific languages (DSLs) targeting adaptive embedded software development provide a very promising alternative for developing clean and error free run-time adaptations. However, the ability to use several rules in a single adaptation strategy, as allowed by some DSLs, may lead to conflicts...
Conference Paper
In the context of compiler optimizations, tuning of parameters and selection of algorithms, runtime adaptivity and autotuning are becoming increasingly important, especially due to the complexity of applications, workloads, computing devices and execution environments. For identifying and specifying adaptivity, different phases are required: analys...
Chapter
Research in compiler pass phase ordering (i.e., selection of compiler analysis/transformation passes and their order of execution) has been mostly performed in the context of CPUs and, in a small number of cases, FPGAs. In this paper we present experiments regarding compiler pass phase ordering specialization of OpenCL kernels targeting NVIDIA GPUs...
Conference Paper
Writing mixed-precision kernels allows to achieve higher throughput together with outputs whose precision remain within given limits. The recent introduction of native half-precision arithmetic capabilities in several GPUs, such as NVIDIA P100 and AMD Vega 10, contributes to make precision-tuning even more relevant as of late. However, it is not tr...
Conference Paper
Since the introduction of Single Instruction Multiple Thread (SIMT) GPU architectures, vectorization has seldom been recommended. However, for efficient use of 8-bit and 16-bit data types, vector types are necessary even on these GPUs. When only integer types were natively supported in sizes of less than 32-bits, the usefulness of vectors was limit...
Conference Paper
Automatic parallelization of sequential code has become increasingly relevant in multicore programming. In particular, loop parallelization continues to be a promising optimization technique for scientific applications, and can provide considerable speedups for program execution. Furthermore, if we can verify that there are no true data dependencie...
Chapter
This chapter focuses on the need to control and guide design flows to achieve efficient application implementations. The chapter highlights the importance of starting and dealing with high abstraction levels when developing embedded computing applications. The MATLAB language is used to illustrate typical development processes, especially when high...
Chapter
This chapter describes additional topics, such as design space exploration (DSE), hardware/software codesign, runtime adaptability, and performance/energy autotuning (offline and online). More specifically, this chapter provides a starting point for developers needing to apply these concepts, especially in the context of high-performance embedded c...
Chapter
This chapter covers code retargeting for heterogeneous platforms, including the use of directives and DSLs specific to GPUs as well as FPGA-based accelerators. This chapter highlights important aspects when mapping computations described in C/C++ programming language to GPUs and FPGAs using OpenCL and high-level synthesis tools, respectively. We co...
Chapter
This chapter provides an overview of the main concepts and representative computer architectures for high-performance embedded computing systems. As the multicore and many-core trends are shaping the organization of computing devices on all computing domain spectrums, from high-performance computing (HPC) to embedded computing, this chapter briefly...
Chapter
This chapter focuses on the problem of mapping applications to CPU-based platforms, covering general code retargeting mechanisms, compiler options and phase ordering, loop vectorization, exploiting multicore/multiprocessor platforms, and cache optimizations. This chapter provides the fundamental concepts used for code targeting for shared and/or di...
Chapter
This chapter introduces embedded systems and embedded computing in general while highlighting their importance in everyday life. We provide an overview of their main characteristics and possible external environment interfaces. In addition to introducing these topics, this chapter highlights the trends in terms of target architectures and design fl...
Chapter
This chapter describes source code analysis and instrumentation techniques to uncover runtime data later used to decide about the most suitable compilation and/or execution strategy. We describe program metrics derived by static analysis and runtime profiling. This chapter introduces data dependence analysis and graph-based representations to captu...
Chapter
Describes relevant code transformations and optimizations, and how they can be used in real-life codes. We include descriptions of the various high-level code transformations and their main benefits and goals. We emphasize on the use of loop and function-based transformations for performance improvement using several illustrative examples. We inclu...
Conference Paper
Software developers have always found it difficult to adopt Field-Programmable Gate Arrays (FPGAs) as computing platforms. Recent advances in HLS tools aim to ease the mapping of computations to FPGAs by abstracting the hardware design effort via a standard OpenCL interface and execution model. However, OpenCL is a low-level programming language an...
Conference Paper
Full-text available
Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying the manifestations of lack of modularity support in that kind of languages. The study is focused on MATLAB as a...
Conference Paper
MATLAB is a high-level language used in various scientific and engineering fields. Deployment of well-tested MATLAB code to production would be highly desirable, but in practice a number of obstacles prevent this, notably performance and portability. Although MATLAB-to-C compilers exist, the performance of the generated C code may not be sufficient...
Conference Paper
Hierarchical Data Format (HDF5) is a popular binary storage solution in high performance computing (HPC) and other scientific fields. It has bindings for many popular programming languages, including C++, which is widely used in the HPC field. Its C++ API requires mapping of the native C++ data types to types native to the HDF5 API. This task can b...
Conference Paper
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target language (e.g., AspectJ for Java and AspectC++ for C++). This coupling can impose drawbacks such as arbitrary limitations to the aspect language. LARA is a DSL for source-to-source transformations inspired by AOP concepts, and has been designed to be independ...
Article
A summary of contributions made by significant papers from the first 25 years of the Field-Programmable Logic and Applications conference (FPL) is presented. The 27 papers chosen represent those which have most strongly influenced theory and practice in the field.
Book
Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, the book offers a comprehensive description of techn...
Conference Paper
This paper describes the mapping and the acceleration of an object detection algorithm on a multiprocessor system based on an FPGA. We use HOG (Histogram of Oriented Gradients), one of the most popular algorithms for detection of different classes of objects and currently being used in smart embedded systems. The use of HOG on such systems requires...
Conference Paper
Object detection in images is a computing demanding task which usually needs to deal with the detection of different classes of objects, and thus requiring variations and adaptations easily provided by software solutions. Object detection algorithms are being part of real-time smarter embedded systems, such as automotive, medical, robotics and secu...
Article
Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a moduloschedu...
Chapter
The compilation of high-level languages, such as software programming languages, to FPGAs is of paramount importance for the mainstream adoption of FPGAs. An efficient compilation process will improve designer productivity and will make the use of FPGA technology viable for software programmers. When targeting the hardware resources provided by FPG...
Conference Paper
Nowadays compilers include tens or hundreds of optimization passes, which makes it difficult to find sequences of optimizations that achieve compiled code more optimized than the one obtained using typical compiler options such as -O2 and -O3. The problem involves both the selection of the compiler passes to use and their ordering in the compilatio...
Article
Nowadays compilers include tens or hundreds of optimization passes, which makes it difficult to find sequences of optimizations that achieve compiled code more optimized than the one obtained using typical compiler options such as -O2 and -O3. The problem involves both the selection of the compiler passes to use and their ordering in the compilatio...
Conference Paper
Many fields of engineering, science and finance use models that are developed and validated in high-level languages such as MATLAB. However, when moving to environments with resource constraints or portability challenges, these models often have to be rewritten in lower-level languages such as C. Doing so manually is costly and error-prone, but aut...
Chapter
A common use for Field-Programmable Gate Arrays (FPGAs) is the implementation of hardware accelerators. A way of doing so is to specify the internal logic of such accelerators by using Hardware Description Languages (HDLs). However, HDLs rely on the expertise of developers and their knowledge about hardware development with FPGAs. Regarding this, e...
Conference Paper
The ANTAREX project aims at expressing the application self-adaptivity through a Domain Specific Language (DSL) and to runtime manage and autotune applications for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as we...
Article
A large number of compiler optimizations are nowadays available to users. These optimizations interact with each other and with the input code in several and complex ways. The sequence of application of optimization passes can have a significant impact on the performance achieved. The effect of the optimizations is both platform and application dep...
Article
Abstract In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper proposes fine- and coarse-grained data synchronization approaches to achieve pipelining execution of producer/consumer tasks in FPGA-based multicore...
Article
Full-text available
This paper describes MATISSE, a compiler able to translate a MATLAB subset to C targeting embedded systems. MATISSE uses LARA, an aspect-oriented programming language, to specify additional information and transformations to the input MATLAB code, for example, insertion of code for initialization of variables, and specification of types and shapes...
Book
This book constitutes the proceedings of the 29th International Conference on Architecture of Computing Systems, ARCS 2016, held in Nuremberg, Germany, in April 2016. The 29 full papers presented in this volume were carefully reviewed and selected from 87 submissions. They were organized in topical sections named: configurable and in-memory acceler...