Angeliki Kritikakou

Université de Rennes 1 | UR1 · IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires

PhD

About

85
Publications
6,030
Reads
444
Citations
Additional affiliations
September 2013 - August 2014
The French Aerospace Lab ONERA
Position
  • PostDoc Position
March 2009 - July 2013
University of Patras
Position
  • Research Assistant


Publications (85)
Preprint
RISC-V architectures have gained importance in recent years due to their flexibility and open-source Instruction Set Architecture (ISA), allowing developers to efficiently adopt RISC-V processors in several domains at a reduced cost. For safety-critical and mission-critical application domains, the execution must be reliable as a fault...
Article
The Instruction Level Parallelism (ILP) of applications is typically limited and varies over time, so during application execution some processor Function Units (FUs) may sit idle. These idle FUs can be used to execute replicated instructions, improving reliability. However, existing approaches either schedule the execution...
Article
Full-text available
On multicore platforms, reliable task execution, as well as low energy consumption, are essential. Dynamic Voltage/Frequency Scaling (DVFS) is typically used for energy savings, but with a negative impact on reliability, especially when the applied frequency is low. Using high frequencies, required to meet reliability constraints, or replicating ta...
Conference Paper
Full-text available
Task deployment plays an important role in the overall system performance, especially for complex architectures, including several cores with Dynamic Voltage and Frequency Scaling (DVFS) and Network-on-Chips (NoC). Task deployment affects not only the energy consumption but also the real-time response and reliability of the system. In this work, a...
Technical Report
Full-text available
Cyber-Physical Systems (CPS) usually consist of a set of embedded systems (CPS nodes) connected through wireless communication, providing multiple functionalities that support different types of applications. During CPS deployment, application tasks are mapped on the CPS nodes with the objective of enhancing real-time performance, energy efficiency...
Article
Over the past several decades, fault tolerance has become a major research field due to transistor shrinking and the increasing number of cores in Systems-on-Chip (SoCs). In particular, faults occurring in the Networks-on-Chip (NoCs) of those systems have a significant impact, due to the large amount of data crossing the NoC for communication among Intellectual Proper...
Article
Full-text available
Networked systems are useful for a wide range of applications, many of which require distributed and collaborative data processing to satisfy real-time requirements. On the one hand, networked systems are usually resource-constrained, mainly regarding the energy supply of the nodes and their computation and communication abilities. On the other han...
Chapter
Full-text available
An efficient task execution on multicore platforms can lead to low energy consumption. To achieve that, an Integer Non-Linear Programming (INLP) formulation is proposed that performs task mapping by jointly addressing task allocation, task frequency assignment, and task duplication. The goal is to minimize energy consumption under real-time and rel...
Article
Due to technology scaling and harsh environments, a wide range of fault-tolerant techniques exists to deal with error occurrences. Selecting a fault-tolerant technique is not trivial, and more overhead than necessary is usually inserted during system design. To avoid over-designing, it is necessary to have an in-depth understanding...
Article
Full-text available
By allocating a set of tasks onto a set of nodes and adjusting the execution time of tasks, task mapping is an efficient approach to realize distributed computing. Cyber-Physical Systems (CPS), as a particular case of distributed systems, raise new challenges in task mapping, because of the heterogeneity and other properties traditionally associate...
Article
Wireless charging provides a dynamic power supply for Wireless Sensor Networks (WSNs). Such systems are typically considered under the scenario of Wireless Rechargeable Sensor Networks (WRSNs). With the use of mobile chargers (MCs), the flexibility of WRSNs is further enhanced. However, the use of MCs poses several challenges during the system desig...
Article
In cyber-physical systems, mobile actuators can enhance the system’s flexibility and scalability, but at the same time incur complex couplings in the scheduling and control of the actuators. In this paper, we propose a novel event-driven method aiming at satisfying a required level of control accuracy and reducing the energy consumption of the actua...
Conference Paper
Full-text available
Asymmetric Multicore Processors (AMP) are a very promising architecture for dealing efficiently with a wide diversity of applications. In real-time application domains, approximate results delivered in time are preferred over accurate but late results. In this work, we propose a deployment approach that exploits the heterogeneity provided by AMP architect...
Conference Paper
Full-text available
The design of fast and effective coordination among sensors and actuators in Cyber-Physical Systems (CPS) is a fundamental, but challenging issue, especially when the system model is a priori unknown and multiple random events can simultaneously occur. We propose a novel collaborative state estimation and actuator scheduling algorithm with two phas...
Chapter
Full-text available
The design of fast and effective coordination among sensors and actuators in Cyber-Physical Systems (CPS) is a fundamental, but challenging issue, especially when the system model is a priori unknown and multiple random events can simultaneously occur. We propose a novel collaborative state estimation and actuator scheduling algorithm with two phas...
Article
Full-text available
Multicore architectures have great potential for energy-constrained embedded systems, such as energy-harvesting wireless sensor networks. Some embedded applications, especially the real-time ones, can be modeled as imprecise computation tasks. A task is divided into a mandatory subtask that provides a baseline Quality-of-Service (QoS) and an option...
Article
Full-text available
Multicore architectures have been used to enhance computing capabilities, but the energy consumption is still an important concern. Embedded application domains usually require less accurate, but always in-time, results. Imprecise Computation (IC) can be used to divide a task into a mandatory subtask providing a baseline QoS and an optional subtask...
Article
Full-text available
Wireless Sensor and Actuator Networks (WSANs) are emerging as a new generation of Wireless Sensor Networks (WSNs). Due to the coupling between the sensing areas of the sensors and the action areas of the actuators, the efficient coordination among the nodes is a great challenge. In this paper, we address the problem of distributed node coordination...
Conference Paper
Full-text available
Multicore architectures are now widely used in energy-constrained real-time systems, such as energy-harvesting wireless sensor networks. To take advantage of these multicores, there is a strong need to balance system energy, performance and Quality-of-Service (QoS). The Imprecise Computation (IC) model splits a task into mandatory and optional part...
Article
Full-text available
In real-time mixed-critical systems, Worst-Case Execution Time (WCET) analysis is required to guarantee that timing constraints are respected—at least for high-criticality tasks. However, the WCET is pessimistic compared to the real execution time, especially for multicore platforms. As WCET computation considers the worst-case scenario, it means t...
Conference Paper
Full-text available
Parallel architectures are no longer confined to the domain of high performance computing; they are also increasingly used in embedded time-critical systems. The ARGO H2020 project provides a programming paradigm and associated tool flow to exploit the full potential of architectures in terms of development productivity, time-to-market, ex...
Article
Full-text available
Current compilers cannot generate code that can compete with hand-tuned code in efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and number of levels of tiling. The selection of scheduling parameter values is a very dif...
Article
Full-text available
In this paper, a new methodology is presented for computing Dense Matrix Vector Multiplication on both embedded processors (without a SIMD unit) and general-purpose processors (single-core and multi-core, with a SIMD unit). This methodology achieves higher execution speed than the state-of-the-art ATLAS library (speedup from 1.2 up to 1.45). T...
Article
It is well known that today's compilers and state-of-the-art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because optimizing the sub-problems separately gives a different schedule for each sub-problem, and these schedules cannot coexist, as refining one causes the degra...
Article
The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for codes with large loops and irregularly occurring memory accesses. They hav...
Article
The Matrix Vector Multiplication (MVM) algorithm is an important kernel in a wide variety of domains and application areas, and the performance of its implementations depends heavily on memory utilization and data locality. In this paper, a new methodology for MVM including different types of matrices, i.e. Regular, Toeplitz and Bisymmetric Toeplitz, is pres...
Article
Full-text available
When integrating mixed critical systems on a multi/many-core, one challenge is to ensure predictability for high criticality tasks and an increased utilization for low criticality tasks. In this paper, we address this problem when several high criticality tasks with different deadlines, periods and offsets are concurrently executed on the system. W...
Conference Paper
Full-text available
Although multi/many-core platforms enable the parallel execution of tasks, the sharing of resources may lead to long WCETs that fail to meet the real-time constraints of the system. Then, a safe solution is the execution of the most critical tasks in isolation followed by the execution of the remaining tasks. To improve the system performance, we p...
Article
In this paper, a new methodology is presented for speeding up Matrix–Matrix Multiplication using the Single Instruction Multiple Data (SIMD) unit on one or more cores with a shared cache. This methodology achieves higher execution speed than the state-of-the-art ATLAS library (speedup from 1.08 up to 3.5) by decreasing the number of instructions (load/store...
Patent
A method and device for converting first program code into second program code, such that the second program code has an improved execution on a targeted programmable platform, is disclosed. In one aspect, the method includes grouping operations on data for joint execution on a functional unit of the targeted platform, scheduling operations on data...
Chapter
Embedded systems usually have hard real-time constraints, which require custom HW designs. Although these improve performance, they have a high design cost and very limited flexibility, even when they are made partly configurable. SW designs provide the required flexibility for a wide range of applications at the cost of reduced performanc...
Chapter
Storage size management techniques take as input the memory access scheme, which describes the global valid iteration space and is defined by the application structure, e.g. the loops, the condition statements and the memory access statements. For instance, commonly used applications in embedded systems, such as image, video and signal proce...
Chapter
In this chapter we describe how the pattern representation of Chap. 4 is used to describe the memory accesses and the parametric templates for the size computation step of the intra-signal in-place optimization methodology of Chap. 3 for the case of the overlapping stores and loads access scheme, i.e.
Chapter
In this book, we have proposed a reusable DSE methodology that yields near-optimal and scalable DSE frameworks. We have applied the proposed reusable DSE methodology to both the background memory part and the processing part of embedded systems.
Chapter
Storage size management techniques search for the minimum number of resources required to store the elements of an application, without imposing inefficient addressing during element accessing.
Chapter
Our goal is to provide a way to solve complex, dependent and large DSE problems in a near-optimal and scalable way. In this target domain, as shown in Chap. 1, the conventional DSE methodologies are less appropriate, because they are inherently based on bottom-up approaches and ad-hoc splits that are not driven by constraint propagation.
Chapter
In this chapter we describe how the pattern representation of Chap. 4 is used in the translation step, and we define the parametric templates for the size computation step of the intra-signal in-place optimization methodology of Chap. 3 for the different cases of the non-overlapping store and load access scheme, i.e. all the stores are executed before the load...
Chapter
Embedded systems are computer systems that execute applications dedicated to a specific goal, without being intended as general-purpose computers. Embedded systems contain a collection of programmable parts and components, which interact with the environment. Examples of embedded systems are mobile devices, bio-medical devices, security devices, mul...
Chapter
The scheduling and assignment techniques heavily affect the system design and performance, as they are responsible for meeting the system specifications, e.g. real-time behavior, minimal energy consumption, reliability etc. The scheduling technique assigns operations, groups of operations, memory references or communication transactions to control...
Chapter
The scheduling problem is an optimization problem with fundamental principles applicable in several fields [139], e.g. computer science, economics, job scheduling, project management, production etc. Both the research and development communities have already invested decades of research and experiments in techniques for solving several instances of the sc...
Article
Memory management searches for the resources required to store the concurrently alive elements. The solution quality is affected by the representation of the element accesses: a sub-optimal representation leads to overestimation and a non-scalable representation increases the exploration time. We propose a methodology to near-optimal and scalable r...
Book
This book describes scalable and near-optimal, processor-level design space exploration (DSE) methodologies. The authors present design methodologies for data storage and processing in real-time, cost-sensitive data-dominated embedded systems. Readers will be enabled to reduce time-to-market, while satisfying system requirements for performance, ar...
Article
Full-text available
Although multi/many-core platforms enable the parallel execution of tasks, the sharing of resources may lead to long WCETs that fail to meet the real-time constraints of the system. Then, a safe solution is the execution of the most critical tasks in isolation followed by the execution of the remaining tasks. To improve the system performance, we p...
Article
Storage-size management techniques aim to reduce the resources required to store elements and to concurrently provide efficient addressing during element accessing. Existing techniques are less appropriate for large iteration spaces with increased numbers of irregularly spread holes. They either have to approximate the accessed regions, leading to...
Article
A systematic methodology for near-optimal software/hardware codesign mapping onto an FPGA platform with microprocessor and HW accelerators is proposed. The mapping steps deal with the inter-organization, the foreground memory management, and the datapath mapping. A step is described by parameters and equations combined in a scalable template. Mappi...
Article
In this paper, a new methodology for speeding up edge and line detection algorithms is presented, achieving improved performance over the state of the art software library OpenCV (speedup from 1.35 up to 2.22) and other conventional implementations, in both general and embedded processors, by reducing the number of load/store and arithmetic instruc...
Article
The scheduling problem is an important, partially solved topic related to a wide range of scientific fields. As it applies to design-time mapping on multiprocessing platforms, with emphasis on ordering in time and assignment in place, significant improvements can be achieved. To support this improvement, this article presents a complete systematic clas...
Conference Paper
Full-text available
Wireless Sensor Networks (WSNs) have limited power capabilities, whereas they serve applications which usually require specific packets, i.e. High Priority Packets (HPP), to be delivered before a deadline. Hence, it is essential to reduce the energy consumption and to have real-time behavior. To achieve this goal we propose a hybrid technique which...
Conference Paper
Embedded applications usually require Software/Hardware (SW/HW) designs to meet the hard timing constraints and the required design flexibility. Exhaustive exploration for SW/HW designs is a very time-consuming task, while ad hoc approaches and the use of partially automatic tools usually lead to less efficient designs. To support a more efficie...
Conference Paper
Full-text available
Wireless Sensor Networks (WSNs) have limited power capabilities, yet they require a long network lifetime. To increase the latter, techniques to reduce energy consumption are needed. This study proposes such a technique which explores the benefits of data aggregation without size reduction, under various parameters in data flows scena...
Article
Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms and the performance of its implementations depends on the memory utilization and data locality. There are MMM algorithms, such as standard, Strassen–Winograd variant, and many recursive array layouts, such as Z-Morton or U-Morton. However, their data locali...
Article
Several SOA (state of the art) self-tuning software libraries exist, such as the Fastest Fourier Transform in the West (FFTW) for fast Fourier transform (FFT). FFT is a highly important kernel and the performance of its software implementations depends on the memory hierarchy's utilization. FFTW minimizes register spills and data cache accesses b...
Conference Paper
Full-text available
Wireless Sensor Networks (WSNs) have limited power and communication capabilities, combined with the requirement for a long network lifetime. To increase that lifetime, methods to reduce energy consumption are needed. To achieve this goal, we study a data aggregation technique without size reduction, i.e. data merge. It is a generic technique, sinc...
Conference Paper
Many signal processing applications demand highly energy-efficient, flexible implementations. In this paper, we propose a novel Domain Specific Instruction-set Processor (DSIP) architecture template which is tuned for deployment in the targeted domain of on-line surveillance. The architecture, when implemented using a 40-nm CMOS standard cell library...
Chapter
The work presented in this book targets nomadic battery operated embedded systems. In this context, a large amount of related work exists. The goal of this chapter is to present a structured overview of the relevant related work in the design of embedded systems, which forms the broad context. The presented ordering will cover both the architectura...
Chapter
Current embedded systems are built of many interacting components. While optimizing the system, it is important to track the impact of the different parts and their interaction on the global optimality metrics. In this chapter, a representative case study is presented that estimates and compares the most important parts of an embedded platform: nam...
Chapter
This chapter introduces the concept of executing multiple incompatible loops in parallel and thereby enabling multi-threading in an efficient way in a VLIW processor. The proposed multi-threading is enabled by the use of a distributed instruction memory organization with a minimal hardware overhead. This forms one of the core contributions of this...
Chapter
This chapter introduces one of the core contributions of the book which helps improve the energy efficiency of the register file. It presents a novel register file/foreground memory organization which is motivated across application, architecture and physical design abstraction layers. It is fully compatible with the energy-efficient scratchpad mem...
Chapter
This chapter presents a conversion technique for constant multiplications. The targeted multiplication operations (or MULs), which form a significant part of all MULs, are converted into (a number of) less complex, cheaper operations. Multiplier strength reduction is a well-known technique in hardware synthesis and has been used extensively for fil...
Chapter
This chapter presents one of the core contributions of the book, which also forms one of the main stepping stones for the rest of the book. It presents the compilation, simulation and energy estimation framework for modeling the large architecture design space of single core platforms for low power embedded systems. For multi-core platforms, the intr...
Chapter
This chapter describes the application of the main techniques proposed in this book to a realistic application benchmark, namely a bioimaging detection and tracking algorithm for on-line animal monitoring. Most of the components and contributions presented in this book have been applied and illustrated in this realistic demonstrator. In particular...
Chapter
Optimizing the energy efficiency of an embedded platform has to be tackled at different abstraction levels and for all relevant components. In the previous chapters we have seen that the platform components that initially dominate the power/energy pie chart have been reduced, one by one, by a substantial factor: the instruction memory organisation...