Leonel Sousa
  • PhD (1996), Habilitation (2004), Universidade de Lisboa, Portugal
  • Professor (Full) at IST, Universidade de Lisboa, Portugal

About

Publications: 438
Reads: 92,075
Citations: 5,865
Current institution
IST, Universidade de Lisboa, Portugal
Current position
  • Professor (Full)
Additional affiliations
March 2003 - September 2003
February 1992 - December 2012
Technical University of Lisbon
January 1987 - December 2012
Inesc-ID
Education
January 1992 - June 1996
Technical University of Lisbon
Field of study
  • Computer Engineering

Publications (438)
Article
This article presents an energy-efficient serial multiply-accumulate (MAC) unit based on the most significant digit first (MSDF) approach, specifically aimed at neural networks operating in resource-constrained and low-energy environments. The proposed MAC unit has been integrated into two neural network architectures: a multilayer perceptron (MLP)...
Preprint
Full-text available
This paper proposes hardware converters for the microscaling format (MX-format), a reduced representation of floating-point numbers. We present an algorithm and a memory-free hardware model for converting 32 single-precision floating-point numbers to MX-format. The proposed model supports six different types of MX-format: E5M2, E4M3, E3M2, E2M3, E2...
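The abstract is truncated before the conversion algorithm, so as a software reference for what such a converter computes, the Python sketch below assumes the scaling rule we recall from the OCP Microscaling (MX) v1.0 specification: one power-of-two scale shared by a block of 32 values, with shared exponent floor(log2(max|v|)) minus the exponent of the element format's largest normal value, followed by per-element rounding. The `FORMATS` table, its max-normal values, and all function names are our assumptions; E2M1 is included only as an extra FP4 example, not as a guess at the format cut off in the abstract. The sketch saturates instead of producing NaN/Inf, does not clamp the shared exponent to the E8M0 range, and is not the paper's memory-free hardware model.

```python
import math

# Element formats: name -> (exponent bits, mantissa bits, largest normal value).
# Values follow our reading of the OCP MX v1.0 element definitions (assumption).
FORMATS = {
    "E5M2": (5, 2, 57344.0),
    "E4M3": (4, 3, 448.0),
    "E3M2": (3, 2, 28.0),
    "E2M3": (2, 3, 7.5),
    "E2M1": (2, 1, 6.0),
}


def quantize(v, ebits, mbits, max_normal):
    """Round v to the nearest ExMy value (ties to even), saturating at max_normal."""
    if v == 0.0:
        return 0.0
    bias = 2 ** (ebits - 1) - 1
    emin = 1 - bias                         # smallest normal exponent; below it, subnormals
    a = abs(v)
    e = max(math.frexp(a)[1] - 1, emin)     # floor(log2(a)), clamped at emin
    step = 2.0 ** (e - mbits)               # spacing of representable values at exponent e
    q = min(round(a / step) * step, max_normal)
    return math.copysign(q, v)


def to_mx(block, fmt="E4M3"):
    """Convert one block of floats (32 in MX) to (shared exponent, quantized elements)."""
    ebits, mbits, max_normal = FORMATS[fmt]
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    emax_elem = math.frexp(max_normal)[1] - 1            # exponent of largest normal element
    shared_exp = (math.frexp(amax)[1] - 1) - emax_elem   # floor(log2(amax)) - emax_elem
    scale = 2.0 ** shared_exp
    return shared_exp, [quantize(x / scale, ebits, mbits, max_normal) for x in block]
```

For example, a block of 32 ones converted to E4M3 gets shared exponent −8 and elements equal to 256.0, since 256 · 2^−8 = 1.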
Article
Full-text available
Understanding the genetic basis of complex diseases is one of the most important challenges in current precision medicine. To this end, Genome-Wide Association Studies aim to correlate Single Nucleotide Polymorphisms (SNPs) to the presence or absence of certain traits. However, these studies do not consider interactions between several SNPs, known...
Article
Full-text available
Online arithmetic, where computations are performed from the most significant digit first, has shown benefits in improving throughput and latency within high-performance computing. This computational mode is particularly beneficial for intensive data processing in sequential operations, enabling streaming computations. One such intensive operation...
Conference Paper
Full-text available
This paper proposes a method for hardware integer division by a constant, based only on combinational logic, i.e. without requiring storage and feedback in calculations. The proposed scheme for division consists of adders and encoders, where encoders are systems of Boolean functions. The proposed divisor provides at the output the quotient and the...
Chapter
The growing need for inference on edge devices brings with it a necessity for efficient hardware, optimized for particular computational kernels, such as Sparse Matrix-Vector Multiplication (SpMV). With the RISC-V Instruction Set Architecture (ISA) providing unprecedented freedom to hardware designers, there is now a greater opportunity to tailor t...
Article
Computer architects employ a series of performance optimizations at the micro-architecture level. These optimizations are meant to be invisible to the programmer but they are implicitly programmed alongside the architectural state. Critically, the incorrect results of these optimizations are not scrubbed off the micro-architectural state. This side...
Chapter
A new technique for designing efficient modular multipliers based on splitting the operands is proposed. The operands are segmented into small range sub-vectors, which are processed as systems of Boolean functions. Minimization of Boolean functions includes two- and multi-level minimizations using standard tools, like ABC, Espresso, and others. Exp...
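The abstract does not spell out the decomposition, so the following is only a generic Python illustration of the operand-splitting idea: with k-bit segments a_i and b_j, x·y mod m equals the sum of the terms a_i·b_j·2^((i+j)k) mod m, reduced mod m, and each term depends on just 2k input bits, so it could be tabulated or minimized as a small Boolean function. `split`, `build_tables`, and `modmul_split` are hypothetical names, and the dictionary lookups merely stand in for the minimized logic the work describes.

```python
def split(x, k, segments):
    """Split x into `segments` k-bit sub-vectors, least significant first."""
    mask = (1 << k) - 1
    return [(x >> (i * k)) & mask for i in range(segments)]


def build_tables(m, k, segments):
    """tables[s][(a, b)] = a * b * 2^(s*k) mod m for every pair of k-bit segments a, b."""
    return [
        {(a, b): (a * b * pow(2, s * k, m)) % m
         for a in range(1 << k) for b in range(1 << k)}
        for s in range(2 * segments - 1)     # s = i + j ranges over 0 .. 2*(segments-1)
    ]


def modmul_split(x, y, m, k, segments, tables):
    """x * y mod m from segment-pair lookups; each lookup has only 2k input bits."""
    xs, ys = split(x, k, segments), split(y, k, segments)
    acc = 0
    for i, a in enumerate(xs):
        for j, b in enumerate(ys):
            acc += tables[i + j][(a, b)]
    return acc % m                           # final multi-operand modular reduction
```

For 8-bit operands and m = 251, two 4-bit segments per operand give three tables of 256 entries each.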
Article
Full-text available
One of the most challenging problems in cloud datacenters is the degradation of performance and energy efficiency due to the overutilization of hosts and their exposure to excessive workload. Virtual machine (VM) consolidation and migration from one host to another are strategies that have been proven to successfully bring about performance impro...
Conference Paper
Full-text available
This paper proposes a new technique for designing efficient modular multipliers based on splitting the operands. Operands are segmented into small-range sub-vectors, which are processed as systems of Boolean functions. Minimization of Boolean functions includes two- and multi-level minimizations using standard tools, such as ABC, Espresso and others. We c...
Article
Full-text available
The combination of cloud computing with the Internet of Things has made fundamental changes in areas from industry, healthcare, traffic, and transportation to home appliances and even personal lives. Billions of devices and users are connected through these platforms disseminating enormous amounts of data leading to performance degradation, which h...
Preprint
Developments in Genome-Wide Association Studies have led to the increasing notion that future healthcare techniques will be personalized to the patient, by relying on genetic tests to determine the risk of developing a disease. To this end, the detection of gene interactions that cause complex diseases constitutes an important application. Similarl...
Article
Infrastructure-as-a-Service (IaaS) clouds not only have to meet business requirements but also need to consider other important metrics that influence the quality of service, such as performance, availability, and power consumption. In this paper, we present a Stochastic Activity Network (SAN) based analytical approach that simultaneously computes...
Chapter
Low-precision floating-point formats are used to enhance performance by providing only the minimal precision the application requires. General-purpose formats, such as the IEEE half-precision floating-point (FP) format, are not practical for all applications. When dealing with small bit widths, careful adjustment of precision and dynamic...
Preprint
Full-text available
Increased attention to RISC-V in Cloud, Data Center, Automotive and Networking applications has been fueling the move of RISC-V to the high-performance computing scenario. However, a lack of powerful performance monitoring tools will result in poorly optimized applications and, consequently, limited computing performance. While the RISC-V ISA alre...
Article
Recently, there has been much interest in the use of convolutional neural networks (CNN) for mobile user localization in massive multiple-input multiple-output (MIMO) systems operating at millimeter wave (mmWave) frequencies. However, current CNN-based approaches cannot predict the confidence interval bounds for the localization accuracy. While the...
Article
Optimization problems are becoming increasingly difficult challenges as a result of the definition of more realistic formulations and the availability of larger input data. Fortunately, the computing capabilities of state-of-the-art heterogeneous systems represent an opportunity to deal with the main complexity factors of these problems. These plat...
Article
This paper proposes a novel framework to design efficient variable-latency speculative adders based on a method that mixes serial/parallel prefix structures. In comparison to the conventional speculative parallel prefix adders, the proposed method eliminates the dependency of error signals, and the corresponding late completion error correction. In...
Article
Continuous enhancements and diversity in modern multi-core hardware, such as wider and deeper core pipelines and memory subsystems, pose a set of hard-to-solve challenges when modeling their upper-bound capabilities and identifying the main application bottlenecks. Insightful roofline models are widely used for this purpose, but the ex...
Article
Full-text available
The substitution of nucleotides at specific positions in the genome of a population, known as single-nucleotide polymorphisms (SNPs), has been correlated with a number of important diseases. Complex conditions such as Alzheimer's disease or Crohn's disease are significantly linked to genetics when the impact of multiple SNPs is considered. SNPs oft...
Article
Arithmetic plays a major role in a computer's performance and efficiency. Building new computing platforms supported by the traditional binary arithmetic and silicon-based technologies to meet the requirements of today's applications is becoming increasingly challenging, regardless of whether we consider embedded devices or high-performance compu...
Article
This paper investigates the performance of epidemic routing in mobile social networks considering several communities which are frequently visited by nodes. To this end, a monolithic Stochastic Reward Net (SRN) is proposed to evaluate the delivery delay and the average number of transmissions under epidemic routing by considering skewed location vi...
Book
This book constitutes the proceedings of the 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021, held in Lisbon, Portugal, in August 2021. The conference was held virtually due to the COVID-19 pandemic. The 38 full papers presented in this volume were carefully reviewed and selected from 136 submissions. They deal wi...
Article
Full-text available
Technological trends, alongside the unprecedented growth of data generated by sparsely distributed devices (most of them mobile), cannot be supported by traditional approaches and processing systems. The requirements for computation at the edge are very stringent in terms of security, bandwidth, computational speed, latency and power...
Article
Full-text available
The analysis of complex biological datasets beyond DNA scenarios is gaining increasing interest in current bioinformatics. Particularly, protein sequence data introduce additional complexity layers that impose new challenges from a computational perspective. This work is aimed at investigating GPU solutions to address these issues in a representati...
Chapter
A Single Nucleotide Polymorphism (SNP) is a DNA variation occurring when a single nucleotide differs between individuals of a species. Some conditions can be explained with a single SNP. However, the combined effect of multiple SNPs, known as epistasis, allows genotype to be better correlated with a number of complex traits. We propose a highly optimiz...
Article
Inter-algorithm cooperative approaches are increasingly gaining interest as a way to boost the search capabilities of evolutionary algorithms (EAs). However, the growing complexity of real-world optimization problems demands new cooperative designs that implement performance-driven strategies to improve the solution quality. This article explores m...
Chapter
Epistasis detection represents a fundamental problem in biomedicine for understanding the occurrence of complex phenotypic traits (diseases) across a population of individuals. Exhaustively examining all possible interactions of multiple Single-Nucleotide Polymorphisms provides the most reliable way to identify accurate solutions, but it i...
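The exhaustive search mentioned in this abstract can be sketched in Python for second-order interactions. Each SNP pair yields a 9 × 2 contingency table (three genotypes per SNP, control/case), scored here with the K2 Bayesian score, one common objective in epistasis detection where lower means a stronger association; the truncated abstract does not say which objective or interaction order the chapter uses, and the function names are hypothetical. With S SNPs there are S(S−1)/2 pairs, which is why exhaustive search becomes costly.

```python
import itertools
import math


def k2_score(table):
    """K2 score of a genotype-combination x phenotype table (lower = stronger association)."""
    score = 0.0
    for controls, cases in table.values():
        n = controls + cases
        # ln((n+1)!) - ln(controls!) - ln(cases!), computed with the log-gamma function
        score += math.lgamma(n + 2) - math.lgamma(controls + 1) - math.lgamma(cases + 1)
    return score


def best_pair(genotypes, phenotype):
    """Exhaustively score all SNP pairs. genotypes[s][i] in {0,1,2}; phenotype[i] in {0,1}."""
    best = None
    for a, b in itertools.combinations(range(len(genotypes)), 2):
        table = {}  # (genotype of SNP a, genotype of SNP b) -> [controls, cases]
        for ga, gb, y in zip(genotypes[a], genotypes[b], phenotype):
            table.setdefault((ga, gb), [0, 0])[y] += 1
        s = k2_score(table)
        if best is None or s < best[0]:
            best = (s, a, b)
    return best
```

Empty genotype cells are simply absent from `table`; they would contribute ln(1!) = 0 to the score anyway.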
Article
Full-text available
This article presents a parallelism exploration of the depth modeling mode 1 (DMM-1) encoding algorithm of the 3D high-efficiency video coding (3D-HEVC) standard and applies the proposed solutions to a multicore central processing unit (CPU) and two graphics processing units (GPUs). The article evaluates efficient parallel algorithms for DMM-1, w...
Article
In recent years, positioning systems have become increasingly pervasive, covering most of the planet’s surface. Although they are accurate enough for a large number of uses, their precision, power consumption, and hardware requirements establish the limits for their adoption in mobile devices. In this paper, the energy consumption of a proposed...
Article
Full-text available
Number representation systems establish ways in which numbers are mapped to computer architectures, and how operations over the numbers are translated into computer instructions. The efficiency of public-key cryptography is strongly affected by the number representations used, as these systems are constructed from mathematically inspired problems t...
Chapter
Full-text available
The Enhanced Privacy ID (EPID) scheme is currently used for hardware enclave attestation by an increasingly large number of platforms that implement Intel Software Guard Extensions (SGX). However, the scheme currently deployed by Intel is based on Elliptic Curve Cryptography (ECC), and will become insecure should a large quantum computer become...
Preprint
This paper investigates the performance of epidemic routing in mobile social networks. It first analyzes the time taken for a node to meet the first node of a set of nodes restricted to move in a specific subarea. Afterwards, a monolithic Stochastic Reward Net (SRN) is proposed to evaluate the delivery delay and the average number of transmissions...
Article
Full-text available
The introduction of 5G’s millimeter wave transmissions brings a new paradigm to wireless communications. Whereas physical obstacles were mostly associated with signal attenuation, their presence now adds complex, non-linear phenomena, including reflections and scattering. The result is a multipath propagation environment, shaped by the obstacles en...
Article
In the coming exascale era, the complexity of modern applications and hardware resources imposes significant challenges for boosting the efficiency via execution fine-tuning. To abstract this complexity in an intuitive way, recent application analysis tools rely on insightful modeling, e.g., Intel® Advisor with Cache-aware Roofline Model. However,...
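A minimal Python sketch of the roofline bound under the usual formulation: attainable performance at arithmetic intensity AI is min(peak compute, AI × bandwidth), and in the cache-aware flavour named in the abstract each memory level contributes its own bandwidth roof. The function names and the numbers in the example are hypothetical, not measurements of any machine or output of Intel® Advisor.

```python
def attainable_gflops(ai, peak_gflops, bandwidths_gbs):
    """Roofline bound per memory level: min(peak compute, AI * bandwidth of that level).

    ai: arithmetic intensity in flop/byte; bandwidths_gbs: {level name: GB/s}.
    """
    return {level: min(peak_gflops, ai * bw) for level, bw in bandwidths_gbs.items()}


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) above which a level's roof no longer limits."""
    return peak_gflops / bandwidth_gbs
```

With a hypothetical 1000 GFLOP/s peak, a 100 GB/s DRAM roof, and a 4000 GB/s L1 roof, a kernel at AI = 2 flop/byte is bounded at 200 GFLOP/s by DRAM but reaches the 1000 GFLOP/s compute roof under L1; the DRAM ridge point sits at 10 flop/byte.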
Article
Most of the scheduling algorithms proposed for real-time embedded systems with energy constraints try to reduce power consumption. However, reducing the power consumption may decrease the computation speed and impact the makespan. Therefore, for real-time embedded systems, makespan and power consumption need to be considered simultaneously. Since...
Article
Full-text available
Autonomous and intelligent systems based on deep learning continuously attract the attention of researchers and engineers. With the progress in applying deep learning to modern applications comes the challenge of reaching real-time processing. To face this challenge, Field Programmable Gate Arrays (FPGAs) can be used; however, deep lea...
Article
Multiple studies provide evidence of the impact of certain gene interactions on the occurrence of diseases. Due to the complexity of genotype–phenotype relationships, highly efficient algorithmic strategies are required to successfully identify high-order interactions according to different evaluation criteria. This work inve...
Article
Full-text available
The consumer electronics markets have increased the demand for high‐speed and low‐power adders with large operands to be integrated in modern portable systems. Traditional fast adder architectures, such as parallel‐prefix adders, exhibit high‐power consumption for large operands. The hybrid design is one of the most promising techniques to achieve...
Article
This paper proposes an architectural parallelism exploration for 3D-High Efficiency Video Coding (3D-HEVC) depth map intra-frame prediction. This work targets data parallelism using the pattern- and block-based approaches for the entire intra-frame prediction flow, encompassing Depth Modeling Modes (DMMs), HEVC intra-frame prediction and Depth Intra...
Article
The conversion from a Residue Number System (RNS) to a weighted representation is a costly inter-modulo operation that introduces delay and area overhead to RNS processors, while also increasing power consumption. This paper proposes a new approach to decompose the reverse conversion into operations that can be processed by the arithmetic units alr...
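For context, the conventional reverse converter is based on the Chinese Remainder Theorem (CRT); the Python sketch below is that textbook baseline, not the decomposition proposed in the paper, and it shows where the inter-modulo cost comes from: wide products with M/m_i and a final reduction modulo the full dynamic range M. Function names and the example moduli {16, 15, 31} are our own choices.

```python
from math import prod


def to_rns(x, moduli):
    """Forward conversion: one residue per modulus."""
    return [x % m for m in moduli]


def crt_reverse(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem (pairwise-coprime moduli)."""
    M = prod(moduli)                         # dynamic range
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        inv = pow(Mi, -1, m)                 # multiplicative inverse of Mi modulo m (Python 3.8+)
        x += r * Mi * inv                    # wide products: the costly inter-modulo step
    return x % M                             # final reduction modulo the full range M
```

With moduli {16, 15, 31} the dynamic range is 7440, and every integer in [0, 7440) round-trips through `to_rns` and `crt_reverse`.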
Article
In an ever more data-centric economy, machine learning models have risen in importance. With the large amounts of data companies collect, they are able to develop highly accurate models to predict the behaviours of their customers. It is thus important to safeguard the data used to build these models to prevent competitors from mimicking their serv...
Article
Full-text available
The Cloud-Edges (CE) framework, wherein small groups of Internet of Things (IoT) devices are serviced by local edge devices, enables a more scalable solution to IoT networks. The trustworthiness of the network may be ensured with Trusted Platform Modules (TPMs). This small hardware chip is capable of measuring and reporting a representation of the...
Article
Full-text available
With successive scaling of CMOS technology, power density and cooling costs significantly increase. Consequently, the cooling system of processors can no longer be designed for the worst‐case situation in each generation of CMOS technology and there is an essential need for run‐time techniques to control the operating temperature. Task scheduling a...
Chapter
Throughout the years, decomposition approaches have been gaining major research attention as a promising way to solve complex multiobjective optimization problems. This work investigates the application of decomposition-based optimization techniques to address a challenging problem from the bioinformatics domain: the reconstruction of ancestral re...
Article
Full-text available
The three-moduli set {2^n, 2^n − 1, 2^(n+1) − 1} has recently received more attention. This moduli set is considered arithmetic-friendly because it avoids the demanding (2^n + 1) channel of the traditional three-moduli set {2^n, 2^n − 1, 2^n + 1}. This work considers an enhanced form of the above moduli set, {2^(n+k), 2^n − 1, 2^(n+1) − 1}, and proposes a sig...
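A small Python check of why this set is convenient, under our own reading rather than the paper's (truncated) method: the moduli are pairwise coprime for every n, since gcd(2^n − 1, 2^(n+1) − 1) = 2^gcd(n, n+1) − 1 = 1; the dynamic range is their product; the residue modulo 2^(n+k) is just the low n + k bits; and residues modulo 2^p − 1 need only additions of p-bit chunks, because 2^p ≡ 1 (mod 2^p − 1). The function names are hypothetical.

```python
from math import gcd


def moduli_set(n, k):
    """The enhanced set {2^(n+k), 2^n - 1, 2^(n+1) - 1} named in the abstract."""
    return (2 ** (n + k), 2 ** n - 1, 2 ** (n + 1) - 1)


def pairwise_coprime(moduli):
    """True if every pair of moduli shares no common factor."""
    return all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])


def mod_2p_minus_1(x, p):
    """x mod (2^p - 1) for x >= 0 by folding p-bit chunks, since 2^p ≡ 1 (mod 2^p - 1)."""
    m = (1 << p) - 1
    while x > m:
        x = (x & m) + (x >> p)               # add the high part back onto the low part
    return 0 if x == m else x                # all-ones is a second encoding of zero
```

For n = 4 and k = 2 the set is {64, 15, 31}, with dynamic range 64 · 15 · 31 = 29,760.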
Article
Residue number systems (RNSs) are efficient alternatives to positional number systems, providing fast and power-efficient computational systems. The key feature of the RNS benefitting modern embedded systems and Internet-of-Things (IoT) edge devices is its energy efficiency. Modular addition is the most important and frequent operation applied o...
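The abstract is cut off before the paper's own adder design, so the Python sketch below only models two well-known modular adder structures behaviourally: a generic adder that computes a + b and a + b − m and keeps the non-negative result, and the special case for modulus 2^n − 1, where the carry out is fed back in (end-around carry). The function names are hypothetical.

```python
def add_mod(a, b, m):
    """Generic modular adder for a, b in [0, m): compute a + b and a + b - m, keep the valid one."""
    s = a + b
    t = s - m
    return t if t >= 0 else s                # in hardware, the sign of t drives a multiplexer


def add_mod_2n_minus_1(a, b, n):
    """Adder modulo 2^n - 1 with end-around carry (a, b in [0, 2^n - 2])."""
    mask = (1 << n) - 1
    s = a + b
    s = (s & mask) + (s >> n)                # carry out of bit n-1 re-enters at bit 0
    return 0 if s == mask else s             # map the all-ones pattern back to zero
```

The end-around-carry form needs no comparison with m, which is one reason moduli of the form 2^n − 1 are popular in RNS designs.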
Article
Attacks such as Meltdown and Spectre have shown that traditional cloud computing isolation mechanisms are not sufficient to guarantee the confidentiality of processed data. With Fully Homomorphic Encryption (FHE), data may be processed while encrypted in the cloud, making any leaked information look random to an attacker. Furthermore, a client might also...
Preprint
Full-text available
The Cloud-Edges (CE) framework, wherein small groups of Internet of Things (IoT) devices are serviced by local edge devices, enables a more scalable solution to IoT networks. The trustworthiness of the network may be ensured with Trusted Platform Modules (TPMs). This small hardware chip is capable of measuring and reporting a representation of the...
Article
Complex optimization problem solving is a constant issue in a wide range of scientific domains. Robust bioinspired procedures with accurate search capabilities are therefore required to address the challenge that such optimization problems represent. This work explores different design alternatives for the metaheuristic Multiobjective Shuffled Frog...
Article
Objective functions provide measurements of solution quality that represent the core calculations required to tackle NP-hard optimization problems. Since their complexity keeps growing with the introduction of more realistic data, research efforts have turned toward proposing efficient objective function implementations that take...
Article
Full-text available
NUMA platforms and emerging memory architectures with on-package high-bandwidth memories bring new opportunities and challenges to bridge the gap between computing power and memory performance. Heterogeneous memory machines feature several performance trade-offs, depending on the kind of memory used and on whether it is written or read. Finding memory performa...
Conference Paper
The interest in developing cognitive-aware systems, especially for vision applications based on artificial neural networks, has grown exponentially in recent years. While high performance systems are key for the success of current Convolutional Neural Network (CNN) implementations, there is a trend to bring these capabilities to embedded real-time...
Article
In this paper, the performance of a grid resource is modeled and evaluated using stochastic reward nets (SRNs), wherein the failure–repair behavior of its processors is taken into account. The proposed SRN is used to compute the blocking probability and service time of a resource for two different types of tasks: grid and local tasks. After modelin...
Conference Paper
Nowadays, parallel metaheuristics represent one of the preferred choices to address complex optimization problems. However, one of the main problems that arise when using this kind of technique lies in the potential emergence of load imbalance issues. In fact, the complexity of current optimization problems makes it mandatory to adopt multiple...
Article
Full-text available
Specific information about types of appliances and their use in a specific time window could help determine electrical energy consumption in detail. However, conventional main power meters fail to provide any specific information. One of the best ways to solve these problems is through non-intrusive load monitoring, which is chea...
Chapter
When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the employed devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, for a better interpretation of the collecte...
Article
With millimeter wave wireless communications, the resulting radiation reflects on most visible objects, creating rich multipath environments, namely in urban scenarios. The radiation captured by a listening device is thus shaped by the obstacles encountered, which carry latent information regarding their relative positions. In this paper, a system...
Article
CMOS technology scaling brings new challenges in temperature, reliability, performance and leakage power. Most thermal management techniques compromise performance to control the thermal behavior of the system by slowing down or turning off processors. In this paper, we use Stochastic Activity Networks (SANs) to model and evaluate the power...
Chapter
To fulfill the needs of modern applications, computing systems become more powerful, heterogeneous and complex. NUMA platforms and emerging high-bandwidth memories offer new opportunities for performance improvements. However, they also increase hardware and software complexity, thus making application performance analysis and optimization an even...
Article
Full-text available
It is unlikely that a hacker is able to compromise sensitive data that is stored in an encrypted form. However, when data is to be processed, it has to be decrypted, becoming vulnerable to attacks. Homomorphic encryption fixes this vulnerability by allowing one to compute directly on encrypted data. In this survey, both previous and current Somewha...
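To make "computing directly on encrypted data" concrete, the Python sketch below uses textbook Paillier, an additively homomorphic scheme chosen here only for brevity. It is not representative of the lattice-based schemes that dominate somewhat and fully homomorphic encryption, and the tiny fixed primes make it completely insecure. Function names and parameters are illustrative assumptions.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from small fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                  # valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """E(m) = (n+1)^m * r^n mod n^2 with a fresh random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m: L(u) = (u - 1) / n with u = c^lam mod n^2, then multiply by mu mod n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)
```

Decrypting the product of encryptions of 20 and 22 yields 42, although neither addend is decrypted along the way.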
Article
Full-text available
In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of the offered parallelism mostly sui...