ABSTRACT: The growing concerns of power efficiency, silicon reliability, and performance scalability motivate research in the area of adaptive embedded systems, i.e., systems endowed with decisional capacity and capable of online decision making so as to meet certain performance criteria. The scope of possible adaptation strategies depends on the specifics of the targeted architecture, and may range from simple scenario-driven frequency/voltage scaling to rather complex heuristic-driven algorithm selection. This paper advocates the design of distributed memory homogeneous multiprocessor systems as a suitable template for best exploiting adaptation features, thereby tackling the aforementioned challenges. The proposed solution lies in the combined use of a typical application processor for global orchestration along with such an adaptive multiprocessor core for the handling of data-intensive computation. This paper describes an exploratory homogeneous multiprocessor template designed from the ground up for scalability and adaptation. The proposed contributions aim at increasing architecture efficiency through smart distributed control of architectural parameters such as frequency, and enhanced techniques for load balancing such as task migration and dynamic multithreading.
Full-text · Article · Aug 2013 · IEEE Transactions on Computers
ABSTRACT: The mapping of tasks to processing elements of an MPSoC has a critical impact on system performance and energy consumption. To cope with the complex dynamic behavior of applications, it is common to perform task mapping at runtime so that the utilization of processors and interconnect can be taken into account when deciding the allocation of each task. This paper has two major contributions, one targeting the general problem of evaluating dynamic mapping heuristics in NoC-based MPSoCs, and the other focusing on the specific problem of finding a task mapping that optimizes energy consumption in those architectures.
No preview · Article · Mar 2013 · ACM Transactions on Embedded Computing Systems
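The runtime mapping idea in the abstract above can be illustrated with a minimal sketch. It is not the heuristic evaluated in the paper: the 2D-mesh coordinates, the cost model (communication volume times hop count times a per-hop energy) and all names such as `map_task` and `energy_per_hop` are assumptions made for illustration.

```python
def manhattan(a, b):
    """Hop count between two PEs on a 2D mesh NoC (assumed XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_task(task, comm, placement, free_pes, energy_per_hop=1.0):
    """Greedily place `task` on the free PE with the lowest estimated
    communication energy towards its already-mapped peers.

    comm:      {task: {peer: volume}} hypothetical traffic estimates
    placement: {task: (x, y)} tasks mapped so far (updated in place)
    free_pes:  set of (x, y) coordinates still available (updated in place)
    """
    def cost(pe):
        return sum(volume * manhattan(pe, placement[peer]) * energy_per_hop
                   for peer, volume in comm.get(task, {}).items()
                   if peer in placement)

    # sorted() makes tie-breaking deterministic (lowest coordinate wins).
    best = min(sorted(free_pes), key=cost)
    placement[task] = best
    free_pes.remove(best)
    return best
```

A caller would invoke `map_task` each time a new task is requested at runtime, passing the current placement and the set of idle PEs. A real heuristic would also weigh processor load and link congestion, which this sketch leaves out.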
ABSTRACT: Scalability and programmability are important issues in large homogeneous MPSoCs. Such architectures often rely on explicit message-passing among processors, each of which possesses a local private memory. This paper presents a low-overhead hardware/software distributed shared memory approach that makes such architectures multithreading-capable. The proposed solution is implemented in an open-source message-passing MPSoC by developing a POSIX-like thread API, which shows excellent scalability on application kernels used for benchmarking shared-memory systems. This approach efficiently draws on the strengths of the on-chip distributed private memory, which opens the way to exposing the multithreading programmability and capabilities of that component as a general-purpose accelerator.
ABSTRACT: This paper demonstrates that magnitude squared incoherence (MSI) analysis is an efficient means of localizing hot spots, i.e., points at which focused electromagnetic (EM) analyses can be applied successfully. It is also demonstrated that MSI may be applied to enhance differential EM analyses (DEMA) based on difference of means (DoM).
No preview · Article · Mar 2012 · IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ABSTRACT: Ubiquitous computing is now the new computing trend; such systems, which interact with their environment, require self-adaptability. Bioinspiration is a natural candidate to provide the capability to handle complex and changing scenarios. This paper presents a framework dedicated to programming pervasive platforms. This bioinspired and agent-oriented framework has been developed within the PERPLEXUS European project, which is intended to provide support for bioinspiration-driven system adaptability. The framework enables the platform to adapt itself to application requirements at a high level while using hardware acceleration at node level. The resulting programming solution has been used to program three collaborative robotic applications in which robots learn tasks and evolve to achieve a better adaptation to their environment.
Preview · Article · Jan 2012 · International Journal of Distributed Sensor Networks
ABSTRACT: Message-passing is an increasingly popular design style for MPSoCs that usually yields systems that outperform external shared-memory designs in both performance and power, because data transfers with external memory are greatly reduced. This scheme relies on explicit communications between the processing tasks that participate in the application. Contrary to shared-memory multiprocessors, tasks usually get assigned to processors at design time. In order to cope with transient performance losses originating from various phenomena such as increased processing workload or peak traffic in the communication subsystem, various adaptation mechanisms based on task migration have been proposed in the literature. As message-passing systems usually use a PE-private memory architecture, these mechanisms imply migrating application code from processor to processor, which incurs a penalty in performance and power consumption. This paper proposes a local shared-memory strategy in which processors execute code hosted in a remote processor.
ABSTRACT: As the complexity of embedded systems increases, configurable hardware is becoming more attractive because it provides a fast and efficient basis for design development. As a consequence, one of the most promising embedded architectures consists of replicated Processing Elements (PEs) connected through a Network-on-Chip (NoC). Such architectures provide computation parallelism, scalability, and reduced design time thanks to reusability. This paper describes the development of a scalable, distributed memory, open source NoC-based platform called Open-Scale and its implementation on FPGA devices. The main objective of this platform is to provide a complete framework for research on NoC-based distributed memory MPSoCs.
ABSTRACT: Adaptive multiprocessor systems are appearing as a promising solution for dealing with complex and unpredictable scenarios. Given the large variety of possible use cases that these platforms must support and the resulting workload variability, offline approaches are no longer sufficient because they cannot cope with time-changing workloads. This letter presents a novel approach based on PI and PID controllers, widely used in control automation, for optimizing resource utilization in Multiprocessor Systems-on-Chip (MPSoCs). Several architecture characteristics such as response time during frequency changes, noise, and perturbations are modeled and validated in a high-level model, and results are compared to information obtained on a homogeneous MPSoC platform prototype. Power and energy consumption figures are discussed and two controllers are proposed: 1) a PI-based controller and 2) a PID-based controller. Results show the system's capability of adapting under disturbing conditions while meeting application performance constraints and reducing energy consumption.
Full-text · Article · Oct 2011 · IEEE embedded systems letters
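As a rough illustration of the feedback idea described in the letter above, the following sketch shows a discrete PI controller that selects a clock frequency from the gap between a target and a measured throughput. It is not the authors' controller: the gains, the frequency bounds, the clamped integral term used as anti-windup, and the linear "throughput proportional to frequency" plant in the usage note are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PIFrequencyController:
    """Discrete PI controller choosing a processor frequency (hypothetical)."""
    kp: float          # proportional gain (assumed value, must be tuned)
    ki: float          # integral gain (assumed value, must be tuned)
    f_min: float       # lowest allowed frequency, e.g. in MHz
    f_max: float       # highest allowed frequency, e.g. in MHz
    i_term: float = 0.0

    def _clamp(self, value: float) -> float:
        return min(self.f_max, max(self.f_min, value))

    def update(self, target_rate: float, measured_rate: float,
               dt: float = 1.0) -> float:
        """Return the frequency to apply for the next control period."""
        error = target_rate - measured_rate
        # Clamping the integral term is a simple anti-windup measure.
        self.i_term = self._clamp(self.i_term + self.ki * error * dt)
        return self._clamp(self.kp * error + self.i_term)
```

With an assumed plant in which the decoder delivers 0.25 frames per second per MHz, calling `update(25.0, measured)` once per period and feeding back `measured = 0.25 * frequency` settles near 100 MHz, the frequency that just meets a 25 frames-per-second target. A PID variant would add a term proportional to the change in error between periods.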
ABSTRACT: To compensate for variability effects in advanced technologies, Process, Voltage, Temperature (PVT) monitors are required in order to use Adaptive Voltage Scaling (AVS) or Adaptive Body Biasing (ABB) techniques. This paper describes a new monitoring system that allows failures to be anticipated in real time by observing the timing slack of a pre-defined set of observable flip-flops. The system is made of dedicated sensor structures located near the monitored flip-flops, coupled with a specific timing detection window generator embedded within the clock tree. Validation and performance simulations in a 45 nm low-power technology demonstrate a scalable, low-power, and low-area system, as well as its compatibility with a standard CAD flow. Gains of an AVFS scheme based on those structures over a standard DVFS scheme are given for a 32-bit VLIW DSP.
Full-text · Article · May 2011 · Microelectronics Journal
ABSTRACT: Side Channel Analysis (SCA) is a powerful class of attacks to extract cryptographic keys used in a wide variety of electronic devices that involve authentication, digital signatures, or secure storage. Cryptographic systems are made up of cryptographic primitives implemented in Complementary Metal-Oxide-Semiconductor (CMOS) technology. CMOS logic gates are designed to minimize their energy usage when their output does not change; energy is consumed or dissipated mainly during the '0 to 1' and '1 to 0' transitions. Side channel information such as power consumption, electromagnetic radiation, or light emission changes the conventional black box model of a cryptographic system into a gray one. In this way, it is possible to extract a secret key in a couple of hours. Within this context, this paper introduces the basic concepts of some CMOS logic families that are interesting from a security point of view, and a design of a cryptographic system based on Secure Triple Track Logic (STTL). This logic is efficient against Differential Power Analysis (DPA), and is evaluated in this work against Differential Electro Magnetic Analysis (DEMA) based on Difference of Means (DoM) and against Correlation Electro Magnetic Analysis (CEMA). Index Terms: Cryptographic System, Side-Channel Analysis, DEMA, CEMA, Differential Logic, Dynamic Logic, Secure Triple Track Logic
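The Difference of Means (DoM) distinguisher mentioned in the abstract above can be sketched in a few lines. This is a generic textbook-style illustration, not the evaluation setup of the paper: the function names, the trace layout (one list of samples per measurement), and the toy selection function in the usage note are assumptions.

```python
def difference_of_means(traces, selection_bits):
    """Per-sample difference between the mean trace of the '1' partition
    and the mean trace of the '0' partition."""
    ones = [t for t, b in zip(traces, selection_bits) if b == 1]
    zeros = [t for t, b in zip(traces, selection_bits) if b == 0]
    if not ones or not zeros:
        raise ValueError("both partitions must be non-empty")
    n_samples = len(traces[0])
    mean1 = [sum(t[i] for t in ones) / len(ones) for i in range(n_samples)]
    mean0 = [sum(t[i] for t in zeros) / len(zeros) for i in range(n_samples)]
    return [a - b for a, b in zip(mean1, mean0)]


def best_key_guess(traces, inputs, predict_bit, key_candidates):
    """Score each key guess by the largest absolute DoM peak it produces.

    predict_bit(x, k) is a hypothetical selection function returning the
    predicted value (0 or 1) of one intermediate bit for input x and guess k.
    """
    scores = {}
    for k in key_candidates:
        bits = [predict_bit(x, k) for x in inputs]
        if len(set(bits)) < 2:
            continue  # degenerate partition, the guess cannot be scored
        dom = difference_of_means(traces, bits)
        scores[k] = max(abs(v) for v in dom)
    return max(scores, key=scores.get), scores
```

For a toy experiment, `predict_bit` could return one output bit of a small S-box applied to `x ^ k`, with synthetic traces that leak that bit at a single sample. The guess whose partition matches the leakage then shows the highest peak.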
ABSTRACT: This paper proposes a novel strategy for enabling dynamic task mapping on heterogeneous NoC-based MPSoC architectures. The solution considers three different platforms with different area constraints and applications with distinct efficiency characteristics. We propose a solution that uses a unified model-based framework, which is calibrated according to area information obtained from FPGA synthesis. In addition, we present the performance of various applications running on different processors on FPGAs, with the aim of obtaining application efficiency characteristics for calibrating the proposed high-level model. The paper also presents three different scenarios and discusses the reduction in energy consumption as well as the end-to-end communication cost for different applications such as MPEG and ADPCM, among other multimedia benchmarks.
ABSTRACT: The power evaluation of NoC-based MPSoCs is an important and time-consuming process. Mapping tasks onto processing elements (PEs) has a critical impact on system performance, as well as on power dissipation. To cope with the complex dynamic behavior of applications, it is common to perform task mapping at runtime so that the utilization of processors and interconnect can be taken into account when deciding the most appropriate PE to host each task. On the other hand, accurately comparing different mapping heuristics can be very costly, since each candidate solution has to be evaluated using simulations that can take hours or even days in the case of large MPSoCs. In this context, this paper has two major contributions: (i) evaluation of dynamic mapping by employing a model-based framework that unifies abstract models of applications, NoC-based platforms, and mapping heuristics, and (ii) power consumption evaluation of such heuristics by using a rate-based power model.
ABSTRACT: In this paper we propose a strategy for better exploiting Multi-Processor System-on-Chip (MPSoC) resources by means of a control-loop feedback mechanism. We apply the proposed techniques to a purely distributed memory MPSoC architecture that includes a frequency scaling module responsible for tuning the frequency of processors at run-time. Results are very promising in terms of adaptation capabilities for systems with dynamic workloads. Performance results demonstrate the effectiveness of the proposed approach when the workload requirements of applications vary, affecting the overall performance of the system. To validate the proposed approach, we have implemented a multithreaded MJPEG decoder application and created an architecture model with and without perturbations in the system.
ABSTRACT: Electro-Magnetic Analysis has been identified as an efficient technique to retrieve the secret key of cryptographic algorithms. Although mathematically similar, Power and Electro-Magnetic Analysis have different advantages in practice. Among the advantages of EM Analysis, the feasibility of attacking a limited and bounded area of an integrated system is the key one. Within this context, the contribution of this paper is a countermeasure against local EM attacks performed with tiny magnetic probes. The basic idea is to design circuits such that all datapaths and D-type Flip-Flops involved in the computation of intermediate values of cryptographic elements randomly change within a set of logically equivalent electrical paths that are spatially distributed within the Integrated Circuit (IC) die.
ABSTRACT: Dynamic reconfiguration provides attractive features such as hardware flexibility and adaptability. Unfortunately, the lack of programming tools to manage it has limited its use in current SoCs. This paper presents a method to abstract dynamic reconfiguration management at design time. Dynamic hardware multiplexing is a generic principle based on a scheduler dedicated to the management of reconfigurable resources at run-time. Formal background, implementation, simulation results, and validations are presented to illustrate the contribution of this study.
No preview · Article · Dec 2010 · International Journal of Embedded Systems
ABSTRACT: Differential Power Analysis (DPA) is a powerful Side-Channel Attack (SCA) targeting symmetric as well as asymmetric ciphers. Its principle is based on a statistical treatment of power consumption measurements monitored on an Integrated Circuit (IC) computing cryptographic operations. Many works have proposed improvements to the attack, but none focuses on the ordering of measurements. Our proposal consists of a statistical preprocessing step that ranks measurements in a statistically optimized order to accelerate DPA and reduce the number of measurements required to disclose the key.
ABSTRACT: For several years, the electromagnetic (EM) radiation of ICs has been considered a source of noise and interference from an EM Compatibility (EMC) point of view. Electromagnetic interferences (EMI) are unwanted disturbances that affect digital circuits due to electromagnetic conduction or electromagnetic radiation emitted from an internal or external source. Today, the electromagnetic radiation of ICs also represents a vulnerability for hardware security modules such as smartcards. Recently, from a hardware security point of view, researchers, industry, and governmental agencies have taken a strong interest in this radiation. This paper introduces a pragmatic technique, WGMSI (Weighted Global Magnitude Squared Incoherence), for localizing hardware security modules within the overall EM noise generated by the environment or by other modules of the circuit.