Conference Paper

Design space exploration for complex automotive applications: an engine control system case study

Abstract

With technological advances, significant changes are taking place in the automotive domain. Modern automobiles combine functionalities ranging from safety-critical functions, such as engine control systems, to navigation and infotainment. To meet the performance requirements of these systems, the automotive industry is shifting to multi-core systems, which increases design complexity. Efficient and fast design space exploration frameworks are required to deal with this complexity. This paper presents a framework for exploring automotive application design on multi-core systems. It considers Amalthea, an automotive-specific application modeling language, and a distributed-memory multi-core architecture for execution. The effectiveness of our framework is shown on an engine control application.
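To make the exploration problem concrete, here is a minimal sketch, not the authors' framework. It assumes hypothetical tasks with fixed execution times (in µs), identical cores, and no communication cost. It enumerates every task-to-core mapping and keeps the one with the smallest makespan (the largest per-core load). The search space grows as cores^tasks, which shows why exhaustive exploration quickly becomes impractical for realistic applications.

```python
from itertools import product

# Hypothetical tasks and execution times in microseconds (illustrative values only).
TASKS = {"t1": 40, "t2": 70, "t3": 55, "t4": 30, "t5": 25}

def makespan(mapping, times, num_cores):
    """Largest total execution time assigned to any single core."""
    loads = [0] * num_cores
    for name, core in mapping.items():
        loads[core] += times[name]
    return max(loads)

def explore(times, num_cores):
    """Exhaustively evaluate all num_cores ** len(times) mappings and return the best one."""
    names = list(times)
    best_mapping, best_cost = None, float("inf")
    for assignment in product(range(num_cores), repeat=len(names)):
        mapping = dict(zip(names, assignment))
        cost = makespan(mapping, times, num_cores)
        if cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost

mapping, cost = explore(TASKS, num_cores=2)
print(cost, mapping)
```

A real exploration framework would replace `makespan` with a model that also accounts for inter-core communication over the distributed memory, and would replace brute-force enumeration with a heuristic search.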

... In addition, the crossbar becomes more difficult to arbitrate as the number of cores increases. The crossbar is suitable for a small number of nodes but is not scalable, as wire cost becomes even more expensive with manycores [1]. ...
... d0 to d3 in Fig. 5.17) to encode 1 bit of data to be transmitted. Fig. 5.18 shows the transition of two logical bits (i.e. from [1,1] to [1,1]) using the 1-of-2 and 1-of-4 protocols. A total of 4 wire transitions occur for 4-phase 1-of-2, as shown in Fig. 5.18a, while only two wire transitions are observed for 4-phase 1-of-4. ...
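The transition counts quoted in the excerpt can be reproduced with a short sketch. It assumes standard 4-phase (return-to-zero) signaling, where each 1-of-N symbol raises exactly one data wire and later returns it to zero, and a 1-of-N code carries log2(N) bits per symbol. Acknowledge-wire transitions are ignored. The helper names are hypothetical.

```python
def encode_1_of_n(bits, n):
    """Group bits into log2(n)-bit symbols; each symbol is the index i of the wire d<i> raised."""
    k = n.bit_length() - 1            # bits carried per symbol
    if k < 1 or n != 1 << k or len(bits) % k:
        raise ValueError("n must be a power of two >= 2 whose log2 divides len(bits)")
    symbols = []
    for i in range(0, len(bits), k):
        value = 0
        for b in bits[i:i + k]:
            value = (value << 1) | b  # most significant bit first
        symbols.append(value)
    return symbols

def data_wire_transitions_4phase(bits, n):
    """Data-wire transitions for 4-phase (return-to-zero) 1-of-n signaling.

    Each symbol raises exactly one wire (1 transition) and then returns it to zero
    (1 transition), so every symbol costs 2 transitions. Acknowledge-wire
    transitions are not counted.
    """
    return 2 * len(encode_1_of_n(bits, n))

print(data_wire_transitions_4phase([1, 1], 2))  # 4: two symbols of 1 bit each
print(data_wire_transitions_4phase([1, 1], 4))  # 2: one symbol of 2 bits
```

Because 1-of-4 packs two bits into one symbol, it halves the data-wire transitions per bit compared with 1-of-2, at the cost of twice as many wires.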
Thesis
More computing cores are now being integrated on a single chip to meet ever-growing application demands for high-performance, low-power computing systems. As the number of cores continues to grow, so does the demand for scalable on-chip communication networks that can deliver high-speed communication among the cores. Compared with traditional on-chip interconnects, Networks-on-Chip (NoCs) have emerged as a mature alternative for manycore architectures, since they provide enhanced scalability and power efficiency. Typical NoC routers contain buffers that serve as temporary data storage. However, studies have shown that buffers are often underutilized (i.e. idle or only partially used), especially when executing applications with non-uniform traffic patterns or bursty behaviour. This is because most typical routers dedicate a set of buffers to their input and/or output ports, and these buffers can only be exploited by the dataflows that use them, which leads to significant performance degradation. Router architectures capable of maximizing buffer utilization are therefore indispensable for performance gains. To maximize buffer resource utilization, this thesis proposes a novel NoC router concept called Roundabout NoC (R-NoC), inspired by real-life multi-lane traffic roundabouts. Unlike existing approaches, R-NoC provides intrinsic and effective resource utilization. However, roundabout-inspired routers are susceptible to deadlocks because of their ring-like architecture. Unlike existing solutions, R-NoC achieves deadlock freedom and better network performance than typical NoCs without compromising network area or power. This thesis further exploits R-NoC's highly parametric architecture to produce different router configurations with varying topological trade-offs for performance gains without sacrificing area.
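The utilization argument above, that dedicated per-port buffers sit idle under bursty, non-uniform traffic, can be illustrated with a deliberately simplified model. This is not R-NoC's architecture. It assumes a hypothetical 5-port router with 4 flit slots per port and looks at a single cycle, ignoring draining, routing and deadlock-avoidance constraints. It compares dedicated per-port buffers with an idealized fully shared pool of the same total size.

```python
def accepted_flits(arrivals, slots_per_port, shared):
    """Flits that can be buffered in one cycle, given the flits arriving at each port.

    Dedicated: each port can only use its own slots_per_port slots.
    Shared: any flit can use any free slot in a pool of len(arrivals) * slots_per_port slots.
    """
    if shared:
        return min(sum(arrivals), len(arrivals) * slots_per_port)
    return sum(min(a, slots_per_port) for a in arrivals)

def utilization(arrivals, slots_per_port, shared):
    """Fraction of the router's total buffer slots that end up occupied."""
    total_slots = len(arrivals) * slots_per_port
    return accepted_flits(arrivals, slots_per_port, shared) / total_slots

bursty = [8, 0, 0, 0, 0]   # a burst concentrated on one input port
uniform = [2, 2, 2, 2, 2]  # evenly spread traffic
for name, traffic in (("bursty", bursty), ("uniform", uniform)):
    print(name, utilization(traffic, 4, shared=False), utilization(traffic, 4, shared=True))
```

Under uniform traffic both organizations behave the same. Under the burst, dedicated buffers accept only 4 of 8 flits while 16 slots on other ports stay empty, which is the inefficiency R-NoC targets.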
... This work was extended [12] to include, among other things, a design space exploration method that proposes Pareto-optimal solutions. A design space exploration method for automotive applications is also proposed by K. Latif et al. [94]. A graph- and automata-oriented approach is also presented by X. ...
... The methods described above mainly concern embedded systems [133,13,94] and MPSoCs [11,12]. However, there is also interest in modeling HPC systems. ...
Thesis
The rise of learning techniques and algorithms (i.e. machine learning) and their use in increasingly varied fields demonstrate often surprising capabilities, both in interpreting input data and building relevant abstract representations (e.g. supervised learning) and in the dynamic control of complex systems (e.g. reinforcement learning). Optimizing the energy efficiency of computing systems has become a major challenge. From hardware design to software control, various levers exist to act on the execution of computations. In this thesis, we consider the optimization of parallel computing running on complex architectures, from the multicore processor to the computing cluster. These systems have a large number of design and control parameters, giving rise to a number of combinations that is often too large to be considered exhaustively. This thesis therefore aims to build on recent machine learning techniques based on neural networks to construct control and design solutions for computing systems. These techniques can consider a significant set of parameters in order to propose optimal solutions. The proposed techniques are intended to be multi-level and can therefore be applied at the scale of an embedded system or its various subcomponents, but also at the scale of a distributed system such as a computing cluster. Promising solutions are proposed along two distinct research axes. The first axis addresses the dynamic control of parallel computing: the real-time optimization of the energy efficiency of a system running an application parallelized with OpenMP, using, among other things, reinforcement learning.
The second axis concerns the design of optimized communication networks. Communication networks account for a significant share of the energy consumption of computing systems. We therefore propose a design-assistance tool based on generative AI that generates networks optimized according to user criteria such as energy efficiency.
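The first research axis, learning online which configuration of an OpenMP application minimizes energy, can be pictured in its simplest reinforcement-learning form: an epsilon-greedy multi-armed bandit over candidate thread counts. This is a swapped-in illustration, not the thesis's agent, state or reward. `measure_energy` is a hypothetical model that combines Amdahl's-law runtime with made-up static and per-core power figures. On a real system, it would be replaced by an energy measurement around the parallel region.

```python
import random

THREAD_OPTIONS = [1, 2, 4, 8, 16]

def measure_energy(threads, t1=10.0, parallel_fraction=0.9, p_static=20.0, p_core=5.0):
    """Hypothetical energy model (J): Amdahl's-law runtime times static + per-core power."""
    runtime = t1 * ((1 - parallel_fraction) + parallel_fraction / threads)
    return (p_static + p_core * threads) * runtime

def epsilon_greedy(options, measure, steps=200, epsilon=0.1, seed=0):
    """Learn which option minimizes measured energy; reward is the negative energy."""
    rng = random.Random(seed)
    estimates = {o: 0.0 for o in options}  # running mean of the reward per option
    counts = {o: 0 for o in options}
    for _ in range(steps):
        untried = [o for o in options if counts[o] == 0]
        if untried:
            choice = untried[0]                       # try every option once first
        elif rng.random() < epsilon:
            choice = rng.choice(options)              # explore a random option
        else:
            choice = max(options, key=estimates.get)  # exploit the best estimate so far
        reward = -measure(choice)
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]  # incremental mean
    return max(options, key=estimates.get)

print(epsilon_greedy(THREAD_OPTIONS, measure_energy))
```

With this toy model, using all 16 threads is not the most energy-efficient choice: per-core power grows faster than the serial fraction lets runtime shrink. The energy-efficient configuration therefore has to be discovered rather than assumed.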
... These simulators are quite accurate, but they are often slow and cannot accommodate large design spaces. With transaction-level modeling (TLM) [19][20][21][22], complex systems can be modeled and simulated faster than at the cycle-accurate level. The transmission delay of communications is not taken into account when simulating communication transactions. ...
Article
The execution of machine learning (ML) algorithms on resource-constrained embedded systems is very challenging in edge computing. To address this issue, ML accelerators are among the most efficient solutions. They are the result of aggressive architecture customization. Finding energy-efficient mappings of ML workloads on accelerators, however, is a very challenging task. In this paper, we propose a design methodology by combining different abstraction levels to quickly address the mapping of convolutional neural networks on ML accelerators. Starting from an open-source core adopting the RISC-V instruction set architecture, we define in RTL a more flexible and powerful multiply-and-accumulate (MAC) unit, compared to the native MAC unit. Our proposal contributes to improving the energy efficiency of the RISC-V cores of PULPino. To effectively evaluate its benefits at system level, while considering CNN execution, we build a corresponding analytical model in the Timeloop/Accelergy simulation and evaluation environment. This enables us to quickly explore CNN mappings on a typical RISC-V system-on-chip model, manufactured under the name of GAP8. The modeling flexibility offered by Timeloop makes it possible to easily evaluate our novel MAC unit in further CNN accelerator architectures such as Eyeriss and DianNao. Overall, the resulting bottom-up methodology assists designers in the efficient implementation of CNNs on ML accelerators by leveraging the accuracy and speed of the combined abstraction levels.
... The transaction-level modeling (TLM) level [100][101][102][103][104] [111]. The systems studied in this article adopt different memory organizations (cf. Figure 3.6). ...
Thesis
Edge computing is a recent distributed computing paradigm that addresses the problem of massive data, particularly in the context of connected objects. These objects are taking an ever more prominent place in our lives, with examples ranging from smartwatches to smart homes and connected cars. For reasons of responsiveness, owing to network overload, and of energy efficiency, the processing of the data generated by these objects has gradually moved from centralized cloud infrastructures to distributed systems that integrate powerful servers and embedded systems deployed as close as possible to the data sources. Today, this processing increasingly integrates artificial intelligence algorithms (typically for data analysis and decision making) into edge computing. To make this viable on embedded platforms, it is important to study new architectures that are sufficiently powerful and energy-frugal. This thesis addresses embedded computing dedicated to edge computing. In particular, it focuses on the design of low-power architectures capable of running machine learning algorithms. First, it explores an approach based on a heterogeneous multicore architecture to determine to what extent it can meet a broad range of algorithmic demands. This innovative architecture relies on processor technology offered by the French company Cortus S.A. The thesis then focuses on accelerating deep networks by proposing a new MAC (multiply-accumulate) unit that is both flexible and energy-efficient. The gains provided by this MAC unit are evaluated through high-level modeling of convolutional neural network accelerator architectures.
More generally, the work presented in this thesis offers valuable insights into the choice between general-purpose multicore architectures and dedicated artificial intelligence accelerator architectures for energy-efficient edge computing nodes.
Article
Application mapping in multicore embedded systems plays a central role in their energy efficiency. The present paper addresses this issue by focusing on predicting the performance and energy consumption induced by task and data allocation on computing resources. It answers three fundamental questions: (i) how to encode mappings for training performance prediction models; (ii) how to define an adequate criterion for assessing the quality of mapping performance predictors; and (iii) which technique, regression or classification, yields the best predictions. Here, the prediction models are obtained by applying carefully selected supervised machine learning techniques to raw data generated off-line from system executions. These techniques are Support Vector Machines, Adaptive Boosting (AdaBoost) and Artificial Neural Networks (ANNs). Our study is validated on an automotive application case study. The experimental results show that with a limited set of training information, AdaBoost and ANNs can provide very good outcomes (up to 84.8% and 89.05% correct prediction score in some cases, respectively), making them attractive for the addressed problem.
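The first question, how to encode a mapping as input for a learning model, has many possible answers. The abstract does not say which one the paper uses. One common choice is a one-hot encoding, sketched below under that assumption. Each (task, core) pair becomes one binary feature, so the feature vector has len(tasks) * len(cores) entries with exactly one 1 per task. The task and core names are hypothetical.

```python
def encode_mapping(mapping, tasks, cores):
    """One-hot encode a task-to-core mapping as a flat list of len(tasks) * len(cores) bits.

    Bit (i * len(cores) + j) is 1 iff tasks[i] is mapped onto cores[j].
    """
    core_index = {c: j for j, c in enumerate(cores)}
    features = [0] * (len(tasks) * len(cores))
    for i, task in enumerate(tasks):
        features[i * len(cores) + core_index[mapping[task]]] = 1
    return features

tasks = ["t0", "t1", "t2"]
cores = ["core0", "core1"]
print(encode_mapping({"t0": "core1", "t1": "core0", "t2": "core1"}, tasks, cores))
# [0, 1, 1, 0, 0, 1]
```

A more compact alternative stores one integer core index per task (here `[1, 0, 1]`). That encoding, however, imposes an artificial numeric ordering on cores, which models such as SVMs or ANNs may wrongly interpret as meaningful.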
Conference Paper
The growing demand for smarter high-performance embedded systems leads to the integration of multiple functionalities in on-chip systems with tens (even hundreds) of cores. This trend raises a very challenging question about optimal resource allocation in such manycore systems. Answering this question is key to meeting the performance and energy requirements. This paper presents a learning technique applicable to manycore systems for predicting mapping-related performance. The resulting prediction models can help improve dynamic resource allocation decisions. Our proposal is demonstrated on two automotive applications with very promising results.
Article
Modern vehicles integrate a multitude of embedded hard real-time control functionalities and a host of advanced information and entertainment (infotainment) features. The true paradigm shift for future vehicles (cybercars) is not only a result of this increasing plurality of subsystems and functions, but is also driven by unprecedented levels of intra- and inter-car connections and communications, as well as networking with external entities. Several new cybercar security and safety challenges arise simultaneously. On one hand, many challenges arise from increasing system complexity, as well as from new functionalities that must work jointly with existing legacy protocols and technologies; such systems are unlikely to guarantee full security and dependability without afterthoughts. On the other hand, challenges arise from the escalating number of interconnections among the real-time control functions, infotainment components, and the accessible surrounding external devices, vehicles, networks, and cloud services. The arrival of cybercars calls for novel abstractions, models, protocols, design methodologies, and testing and evaluation tools to automate the integration and analysis of safety and security requirements.
Article
A main challenge for Network-on-Chip (NoC) design is to select a network architecture that suits a particular application. NNSE enables one to analyze the performance impact of NoC configuration parameters. It allows one to (1) configure a network with respect to topology, flow control, routing algorithm, etc.; (2) configure various regular and application-specific traffic patterns; and (3) evaluate the network with these traffic patterns in terms of latency and throughput.
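The three configuration steps listed above (topology, traffic pattern, evaluation) can be pictured with a small sketch. This is not NNSE's interface. It assumes a k×k mesh topology, dimension-ordered XY routing and the regular transpose traffic pattern, in which node (x, y) sends to (y, x). As a first-order latency indicator, it computes the average hop count rather than simulating contention.

```python
def xy_route(src, dst):
    """Dimension-ordered XY routing on a 2D mesh: move along X first, then along Y.

    Returns the list of (x, y) routers visited, including source and destination.
    """
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def transpose_traffic(k):
    """Transpose pattern on a k x k mesh: (x, y) sends to (y, x); diagonal nodes send nothing."""
    return [((x, y), (y, x)) for x in range(k) for y in range(k) if x != y]

def average_hops(k):
    """Mean number of links traversed per flow under transpose traffic with XY routing."""
    flows = transpose_traffic(k)
    return sum(len(xy_route(s, d)) - 1 for s, d in flows) / len(flows)

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(average_hops(4))
```

Swapping in another routing function or traffic generator, while keeping the evaluation loop unchanged, mirrors the kind of parameter sweep such a tool automates.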
Conference Paper
Simulation is a bottleneck in the design flow of on-chip multiprocessors. This paper addresses that problem by reducing the simulation time of complex on-chip interconnects through transaction-level modelling (TLM). A particular on-chip interconnect architecture was chosen, namely a wormhole network-on-chip with priority preemptive virtual channel arbitration, because its mechanisms can be modelled at transaction level in such a way that accurate figures for communication latency can be obtained with less simulation time than a cycle-accurate model. The proposed model produced latency figures with more than 90% accuracy and simulated more than 1000 times faster than a cycle-accurate model.
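Part of what makes transaction-level latency estimation fast is that, for an uncontended packet, wormhole latency has a closed form and no per-cycle simulation is needed. The sketch below uses the textbook zero-load approximation, not the paper's model. The header flit spends router_delay + link_delay cycles per hop, and the remaining flits follow in a pipeline, one every link_delay cycles. Under contention, priority-preemptive virtual channel arbitration adds interference from higher-priority flows, which this formula ignores.

```python
def zero_load_latency(hops, flits, router_delay=1, link_delay=1):
    """Contention-free wormhole packet latency in cycles (textbook approximation).

    The header needs (router_delay + link_delay) cycles per hop; the remaining
    flits - 1 flits arrive behind it, one every link_delay cycles.
    """
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be >= 1")
    return hops * (router_delay + link_delay) + (flits - 1) * link_delay

# Hypothetical example: 8-flit packet over 4 hops, 3-cycle routers, 1-cycle links.
print(zero_load_latency(4, 8, router_delay=3, link_delay=1))  # 23
```

A TLM model can evaluate such an expression once per transaction and then add the interference delays caused by preemption. This is far cheaper than advancing every flit cycle by cycle.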
Conference Paper
Driven by increasing complexity and reliability demands, the Japan Aerospace Exploration Agency (JAXA) in 2004 commissioned development of ELEGANT, a complete SpecC-based environment for electronic system-level (ESL) design of space and satellite electronics. As an integral part of ELEGANT, the Center for Embedded Computer Systems (CECS) has developed and supplied the SER tool set. Following a Specify-Explore-Refine methodology, SER supports system-level design space exploration, interactive platform development, and automatic model refinement and model generation. The SER engine has been successfully integrated into ELEGANT. With SER at its core, ELEGANT provides a seamless tool chain for modeling, verification and synthesis, from top-level specification down to embedded HW/SW implementation. ELEGANT and SER have been successfully delivered to JAXA and its suppliers. The tools are currently being deployed in companies such as NEC Toshiba Space Systems. Evaluation results demonstrate the feasibility of the approach for design space exploration, rapid virtual prototyping and system synthesis, resulting in tremendous productivity and reliability gains. In addition, ELEGANT has been commercialized for general market availability. The SER component has been licensed to InterDesign Technologies, Inc. (IDT), which makes it available, sells it and supports it.
Conference Paper
With the current trend toward integrating more complex systems on chip, there is a need for better on-chip communication infrastructure that increases the available bandwidth and simplifies interface verification. We have previously proposed a circuit-switched two-dimensional mesh network known as SoCBUS that increases performance and lowers the cost of verification. In this paper, SoCBUS is explained together with the working principles of its transaction handling. We also introduce the concept of the packet connected circuit (PCC), in which a packet is switched through the network, locking the circuit as it goes. PCC is deadlock-free and does not impose any unnecessary restrictions on the system, while being simple and efficient to implement. SoCBUS uses this PCC scheme to set up routes through the network. We introduce a possible application, a telephone to voice-over-IP gateway, and use it to show that SoCBUS has very good bandwidth, latency and complexity properties when used in a hard real-time system with scheduled traffic. Simulation analysis of SoCBUS in this application shows that a certain SoCBUS setup can handle 48000 channels of voice data, including buffer swapping, in a single chip. We also show that SoCBUS is not suitable for general-purpose computing platforms that exhibit random traffic patterns, but that it shows acceptable performance when the traffic is mainly local.
Conference Paper
The reliance on multi/many-core systems to satisfy the high performance requirements of complex embedded software applications is increasing. This creates the need for efficient mapping methodologies for such complex computing platforms. This paper provides an extensive survey and categorization of state-of-the-art mapping methodologies and highlights the emerging trends for multi/many-core systems. The methodologies aim at optimizing the system's resource usage, performance, power consumption, temperature distribution and reliability for varying application models. They perform design-time and run-time optimization for static and dynamic workload scenarios, respectively. These optimizations are necessary to fulfill end-user demands. The methodologies are compared according to their optimization aim. The trends followed by the methodologies and open research challenges are also discussed.
Article
With increasing complexity of today's embedded systems, research has focused on developing fast, yet accurate high-level and executable models of complete platforms. These models address the need for hardware/software co-simulation of the entire system at early stages of the design. Traditional models tend to be either slow or inaccurate. In this paper, we present ingredients for a class of abstract, high-level platform models that enable fast yet accurate performance and power simulation of application execution on heterogeneous multi-core/-processor architectures. Models are based on host-compiled simulation of the application code, which is instrumented with timing and power information. Back-annotated source code is further augmented with abstract OS and processor models that are integrated into standard co-simulation backplanes. The efficiency of the modeling platform has been evaluated by applying an industrial-strength benchmark, demonstrating the feasibility and benefits of such models for rapid, early exploration of the power, performance and cost design space. Results show that an accurate Pareto set of solutions can be obtained in a fraction of the time needed with traditional simulation and modeling approaches.
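Back-annotation, as described above, means that the application code runs natively on the host while statements inserted at basic-block boundaries accumulate target timing and energy estimates. The sketch below illustrates only that idea. The per-block cycle and energy numbers are hypothetical (a real flow would derive them from a target characterization), and the abstract OS and processor models are not shown.

```python
class TimingModel:
    """Accumulates back-annotated cycle and energy estimates during native execution."""

    def __init__(self, freq_hz):
        self.freq_hz = freq_hz
        self.cycles = 0
        self.energy_nj = 0.0

    def consume(self, cycles, energy_nj):
        """Called by annotations inserted at basic-block boundaries."""
        self.cycles += cycles
        self.energy_nj += energy_nj

    @property
    def time_s(self):
        return self.cycles / self.freq_hz

def fir_filter(samples, coeffs, model):
    """Application code instrumented with hypothetical per-block annotations."""
    out = []
    model.consume(cycles=12, energy_nj=0.5)         # annotation: function prologue block
    for i in range(len(coeffs) - 1, len(samples)):
        acc = 0
        for j, c in enumerate(coeffs):
            acc += c * samples[i - j]
            model.consume(cycles=3, energy_nj=0.1)  # annotation: inner MAC block
        out.append(acc)
        model.consume(cycles=5, energy_nj=0.2)      # annotation: outer-loop block
    return out

model = TimingModel(freq_hz=100e6)
print(fir_filter([1, 2, 3, 4], [1, 1], model), model.cycles, model.time_s)
```

The functional result comes from native execution, while the estimated target time and energy come from the annotations. This separation is why host-compiled models run much faster than instruction-set simulation.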
Article
This paper gives an overview of methods used for design space exploration (DSE) of micro-architectures and systems. The DSE problem generally considers two orthogonal issues: (I) How can a single design point be evaluated, (II) how can the design space be covered during the exploration process? The latter question arises since an exhaustive exploration of the design space is usually prohibitive due to the sheer size of the design space. We explain trade-offs linked to the choice of appropriate evaluation and coverage methods. The designer has to balance the following issues: the accuracy of the evaluation, the time it takes to evaluate one design point (including the implementation of the evaluation model), the precision/granularity of the design space coverage, and, last but not least, the possibilities for automating the exploration process. We also summarize common representations of the design space and compare current system and micro-architecture level design frameworks. This review eases the choice of a decent exploration policy by providing a comprehensive survey and classification of recent related work. It is focused on system-on-a-chip designs, particularly those used for network processors. These systems are heterogeneous in nature using multiple computation, communication, memory, and peripheral resources.
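Once design points have been evaluated (issue I), exploration typically retains only the non-dominated ones, i.e. the Pareto set. The sketch below is a minimal brute-force O(n²) filter that assumes all objectives are minimized. The design points are hypothetical (latency in ms, power in mW).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points (all objectives minimized), in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical evaluated design points: (latency_ms, power_mw)
designs = [(10, 300), (12, 250), (15, 200), (11, 320), (16, 210)]
print(pareto_front(designs))  # [(10, 300), (12, 250), (15, 200)]
```

For large design spaces, the coverage strategy (issue II), such as random sampling or evolutionary search, decides which points are evaluated at all. The filter above only ranks what has already been evaluated.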
Conference Paper
Current multiprocessor systems-on-chip (MPSoC) architectures integrate a massive number of IPs that need to exchange data using complex and diverse synchronization schemes. The key challenge when designing an MPSoC is that the communication architecture must be decided at the beginning of the design, before all the details about mapping the application onto the architecture are known. These early decisions cause two difficulties: how to select the best communication architecture, and how to estimate the effect of mapping the application onto the communication resources. In this paper, we propose high-level communication models that allow early, accurate performance estimation of both the communication architecture and the communication mapping. We applied the proposed modeling methods to analyze the performance impact of two network topologies and several communication mapping schemes for the H.264 encoder application.
Article
The continuous advances in semiconductor technology enable the integration of increasing numbers of IP blocks in a single SoC. Interconnect infrastructures, such as buses, switches, and networks on chips (NoCs), combine the IPs into a working SoC. Moreover, the industry expects platform-based SoC design to evolve to communication-centric design, with NoCs as a central enabling technology. In this article, we introduce the AEthereal NoC. The tenet of the AEthereal NoC is that guaranteed services (GSs) - such as uncorrupted, lossless, ordered data delivery; guaranteed throughput; and bounded latency - are essential for the efficient construction of robust SoCs. To exploit the NoC capacity unused by the GS traffic, we provide best-effort services.
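In Æthereal, guaranteed throughput and bounded latency are obtained by reserving time-division-multiplexed (TDM) slots in a cyclic slot table. The sketch below assumes a single link with a hypothetical 8-slot table and a link bandwidth of 1000 MB/s. It shows how a connection's reserved share of slots translates into guaranteed bandwidth, and how the largest gap between its slots bounds the wait for service. Pipelining across multiple hops and flit sizes are not modeled. Slots marked "-" are left free, and best-effort traffic could use them.

```python
def guaranteed_bandwidth(slot_table, connection, link_bw):
    """Bandwidth reserved for `connection` in a cyclic TDM slot table."""
    reserved = sum(1 for owner in slot_table if owner == connection)
    return link_bw * reserved / len(slot_table)

def worst_case_slot_wait(slot_table, connection):
    """Most slots that can pass between data becoming ready and the connection's next slot."""
    owned = [i for i, owner in enumerate(slot_table) if owner == connection]
    if not owned:
        raise ValueError("connection has no reserved slot")
    n = len(slot_table)
    # Cyclic distance between consecutive reserved slots (a single slot gives a gap of n).
    gaps = [(owned[(k + 1) % len(owned)] - owned[k]) % n or n for k in range(len(owned))]
    return max(gaps) - 1

table = ["A", "B", "A", "-", "A", "B", "-", "-"]   # hypothetical reservations
print(guaranteed_bandwidth(table, "A", 1000))      # 375.0
print(worst_case_slot_wait(table, "A"))            # 3
```

Spreading a connection's slots evenly around the table lowers its worst-case wait without changing its reserved bandwidth.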
P. Frey. Case Study: Engine Control Application. Technical Report 2010-03, Ulmer Informatik-Berichte, 2010.
DreamCloud. Deliverable 3.2 - Dynamic power management, 2015. http://www.dreamcloud-project.org/results.
F. Koushanfar, A.-R. Sadeghi, and H. Seudie. EDA for secure and dependable cybercars: Challenges and opportunities. In 49th ACM/EDAC/IEEE Design Automation Conference (DAC), pages 220-228, 2012.
A. Gerstlauer, S. Chakravarty, M. Kathuria, and P. Razaghi. Abstract system-level models for early performance and power exploration. In 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 213-218, 2012.