Fig 3 - uploaded by Saptarsi Das
Schematic diagram of REDEFINE.  

Source publication
Article
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilizat...

Contexts in source publication

Context 1
... coarse grained operations are called Hyper Operations (HyperOps). A block schematic of REDEFINE is presented in figure 3. Details of the same appear in section 2. The compilation techniques to transform applications into HyperOps are discussed in section 3. ...
Context 2
... execution engine of REDEFINE (referred to as the Reconfigurable Hardware Fabric in figure 3) comprises a matrix of tiles as shown in figure 4. Each tile comprises a CE which is either an ALU or a FU (depending on the granularity of operations that constitute HyperOps) and router that connects the CEs. The set of routers together serve as the Network on Chip (NoC). ...
Context 3
... HyperOps are synthesized on the CEs that constitute the reconfigurable hardware fabric (see figure 3). As shown in figure 4 figure 3). ...
Context 4
... HyperOps are synthesized on the CEs that constitute the reconfigurable hardware fabric (see figure 3). As shown in figure 4 figure 3). These communications have a longer latency when compared to the communication within the interconnect. ...
Context 5
... communications have a longer latency when compared to the communication within the interconnect. A Hardware Resource Manager (see figure 3) is responsible for scheduling HyperOps for execution. A HyperOp is ready for execution when all its input operands are available (which are input operands of the operations in the HyperOp). ...
Context 6
... architecture (see figure 3) includes an Operation and Data Store. The Operation Store contains the Compute and Transport Metadata of all the HyperOps. ...
Context 7
... data are not sent directly to the consumer. Producer stores the data in the Data Store and the consumer loads it from there (refer figure 3). Hence tags need not be generated for these variables. ...
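The readiness rule quoted in Context 5 (a HyperOp is ready for execution when all its input operands are available) can be illustrated with a small Python sketch. The names `HyperOp`, `deliver`, `is_ready` and `ready_hyperops` are hypothetical and do not come from the source publication; the sketch models only the firing rule, not the behaviour of the actual Hardware Resource Manager.

```python
from dataclasses import dataclass, field


@dataclass
class HyperOp:
    """Hypothetical model of a HyperOp: a name plus the operands it waits for."""
    name: str
    required: frozenset                           # names of the expected input operands
    received: dict = field(default_factory=dict)  # operands delivered so far

    def deliver(self, operand, value):
        """Record the arrival of one input operand."""
        if operand not in self.required:
            raise KeyError(f"{self.name} does not expect operand {operand!r}")
        self.received[operand] = value

    def is_ready(self):
        """Firing rule: ready only when every required operand has arrived."""
        return self.required.issubset(self.received)


def ready_hyperops(hyperops):
    """Return the names of the HyperOps that could be scheduled now."""
    return [h.name for h in hyperops if h.is_ready()]


add = HyperOp("add", frozenset({"a", "b"}))
scale = HyperOp("scale", frozenset({"x"}))

add.deliver("a", 3)
scale.deliver("x", 10)
print(ready_hyperops([add, scale]))  # "add" is still waiting for operand "b"

add.deliver("b", 4)
print(ready_hyperops([add, scale]))  # both HyperOps now have all operands
```

A scheduler built on this rule would only ever pick from the list returned by `ready_hyperops`, which is the essence of dataflow-style execution.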

Similar publications

Conference Paper
A case study exploring multi-frequency design is presented for a low energy and high performance FFT circuit implementation. An FFT architecture with concurrent data stream computation is selected. Asynchronous and synchronous implementations of a 16-point and a 64-point FFT circuit were designed and compared for energy, performance and area. B...

Citations

... A reconfigurable multi-core architecture that could host safety critical tasks, see [2,38,41], for instance, can become an example of a safe avionics processor by taking advantage of the inherent redundancy that enables graceful degradation [17]: when some core fails, we can use the remaining ones by reallocating affected applications to a healthy area of the chip [36]. The inherent redundancy in such parallel architecture can thus be seen as an opportunity to increase the reliability of aerospace computing systems, be it in safety critical embedded systems or for computing centers requiring guarantees of continuity of service. ...
... Constraints specific to the architecture: Additional constraints can be added to respect specific aspects of the considered architecture. For example, in some multi-core architectures [2], where intra-application communication between CUs can happen only in a specific way as illustrated in Fig. 5, the orientation of the applications on the architecture matters, because nodes that can communicate in a given orientation will not be able to do so if they are rotated on the architecture. Therefore, the orientation as computed by the compiler must be enforced. ...
Article
This work presents an online decentralized allocation algorithm of a safety-critical application on parallel computing architectures, where individual Computational Units can be affected by faults. The described method includes representing the architecture by an abstract graph where each node represents a Computational Unit. Applications are also represented by the graph of Computational Units they require for execution. The problem is then to decide how to allocate Computational Units to applications to guarantee execution of a safety-critical application. The problem is formulated as an optimization problem with the form of an Integer Linear Program. A state-of-the-art solver is then used to solve the problem. Decentralizing the allocation process is achieved through redundancy of the allocator executed on the architecture. No centralized element decides on the allocation of the entire architecture, thus improving the reliability of the system. Inspired by multi-core architectures in avionics systems, an experimental illustration of the work is also presented. It is used to demonstrate the capabilities of the proposed allocation process to maintain the operation of a physical system in a decentralized way while individual components fail.
... Dynamic logic reconfiguration is a concept that allows for efficient on-the-fly modifications of combinational circuit behavior in both ASIC [13,14] and FPGA devices. In FPGAs, combinational circuits are typically implemented using Look-Up Tables (LUTs), i.e., configurable primitives which store truth tables of k-input Boolean functions f : B^k → B. Dynamic logic reconfiguration allows for the run-time alteration of the circuit behavior by modifying the content of specific look-up tables, while leaving the routing intact. ...
Article
Dynamic logic reconfiguration is a concept that allows for efficient on-the-fly modifications of combinational circuit behavior in both ASIC and FPGA devices. The reconfiguration of Boolean functions is achieved by modification of their generators (e.g., shift register-based look-up tables) and it can be controlled from within the chip, without the necessity of any external intervention. This hardware polymorphism can be utilized for the implementation of side-channel attack countermeasures, as demonstrated by Sasdrich et al. for the lightweight cipher PRESENT. In this work, we adapt these countermeasures to two of the AES finalists, namely Rijndael and Serpent. Just like PRESENT, both Rijndael and Serpent are block ciphers based on a substitution-permutation network. We describe the countermeasures and adjustments necessary to protect these ciphers using the resources available in modern Xilinx FPGAs. We describe our implementations and evaluate the side-channel leakage and effectiveness of different countermeasure combinations using a methodology based on Welch’s t-test. Furthermore, we attempt to break the protected AES/Rijndael implementation using second-order DPA/CPA attacks. We did not detect any significant first-order leakage from the fully protected versions of our implementations. Using one million power traces, we detect second-order leakage from Serpent encryption, while AES encryption second-order leakage is barely detectable. We show that the countermeasures proposed by Sasdrich et al. are, with some modifications, successfully applicable to AES and Serpent.
... A popular alternative to FPGAs is Coarse-Grained Reconfigurable Architectures (CGRA) which have the advantages of shorter reconfiguration time and lower power consumption [9]. Some examples include REDEFINE [10], ADRES [11], DySER [12], and LAC [13,14,15], where the data-path functional units can be reconfigured based on the computation requirements. These architectures achieve ASIC-level efficiencies while supporting multiple memory access patterns. ...
... The aerospace industry is yet undertaking to take up this challenge. In collaboration with the French aerospace company Safran, the Indian Institute of Science along with Morphing Machines developed REDEFINE 1 [6], a reconfigurable multi-core architecture that could host safety critical applications. REDEFINE can become an example of a safe multi-core processor by taking advantage of the inherent redundancy of such processors that enables graceful degradation [7]: when some core fails, we can use the multiple remaining ones by reallocating affected applications to a healthy area of the chip. ...
... The REDEFINE many-core architecture [6] that inspired this work is an architecture in development at the company Morphing Machines and the Indian Institute of Science. Unlike other architectures, REDEFINE features a dynamic ...
... All of them are connected to a common routing switch in a local area network (LAN). At this stage of the work, we did not try to accurately reproduce the communication protocol used on the REDEFINE Fabric [6]. Also, for simplicity of visualization, the network is considered to be a square mesh, instead of a toroidal mesh. ...
Preprint
This work presents a decentralized allocation algorithm of safety-critical application on parallel computing architectures, where individual Computational Units can be affected by faults. The described method consists in representing the architecture by an abstract graph where each node represents a Computational Unit. Applications are also represented by the graph of Computational Units they require for execution. The problem is then to decide how to allocate Computational Units to applications to guarantee execution of the safety-critical application. The problem is formulated as an optimization problem, with the form of an Integer Linear Program. A state-of-the-art solver is then used to solve the problem. Decentralizing the allocation process is achieved through redundancy of the allocator executed on the architecture. No centralized element decides on the allocation of the entire architecture, thus improving the reliability of the system. Experimental reproduction of a multi-core architecture is also presented. It is used to demonstrate the capabilities of the proposed allocation process to maintain the operation of a physical system in a decentralized way while individual components fail.
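One of the citation contexts for this preprint notes that its experimental network is treated as a square mesh instead of a toroidal mesh. The practical difference lies in how edge tiles find their neighbours, which the following Python sketch illustrates. The function name `mesh_neighbors`, the 4-connected neighbourhood and the 4x4 grid size are assumptions made for illustration; they are not taken from the cited work or from the REDEFINE fabric.

```python
def mesh_neighbors(row, col, rows, cols, toroidal=False):
    """Return the 4-connected neighbours of tile (row, col).

    In a square mesh, tiles on the border have fewer neighbours.
    In a toroidal mesh, links wrap around, so every tile has four.
    """
    neighbors = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if toroidal:
            # Wrap indices around the grid edges.
            neighbors.append((r % rows, c % cols))
        elif 0 <= r < rows and 0 <= c < cols:
            # Square mesh: drop neighbours that fall outside the grid.
            neighbors.append((r, c))
    return neighbors


# Corner tile of a hypothetical 4x4 grid.
print(mesh_neighbors(0, 0, 4, 4))                 # [(1, 0), (0, 1)]
print(mesh_neighbors(0, 0, 4, 4, toroidal=True))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```

An allocator that reasons over the graph of Computational Units would see a different adjacency for border tiles depending on which of the two topologies is assumed.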
... This work is applied to a multi-core architecture called REDEFINE 1 [3], developed by the company Morphing Machines and the Indian Institute of Science (IISc), and applied to avionics control applications in collaboration with Safran Electronics & Defense. ...
Conference Paper
The development of multi-core processors has been providing benefits to many industries. However, some issues may arise among these benefits because of the potential conflicts between parallel applications. One of the important concerns is the reliability of the safety critical applications. This paper proposes two algorithms for computing a probability of failure of the safety-critical application on reconfigurable multi-core architecture. One is based on the exact analytical computation, whereas the other is an approximation of the former that is less computationally expensive. These algorithms adopt a combination of a fault tree analysis and a subgraph isomorphism problem for application to avionics. Their numerical results and performance analysis are also included.
... The realization of MFA with dgeqrf is termed as MFA while the realization of MFA with dgeqr2ht is termed as M 2 FA. We choose REDEFINE CGRA as a platform for parallel realization of MFA and M 2 FA since REDEFINE is capable of composing and executing custom instructions at run-time [14] [15]. Further description of micro-architecture of REDEFINE is provided in section II-A2. ...
... REDEFINE [3] is a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. The high-level compiler invoked creates substructures that contain sets of compute elements. ...
Conference Paper
This paper presents DARSA, a Dataflow Application Resource and Sub-graph Analysis tool. DARSA can be used for early and accurate results for design space exploration of dataflow applications. Additionally DARSA can be used to extrapolate Coarse-Grain Reconfigurable Arrays to improve performance by analysing target applications. DARSA helps the user identify common patterns and hot-spots in the high-level Dataflow Graph, and subsequently create custom hardware for nodes that have high impact on the total resource utilization. The user can also use DARSA results to find similarities among different applications. Ultimately the analysis provided is platform independent and can be used to target any modern reconfigurable platform. Additionally this analysis could be used for determining a Coarse-Grain Reconfigurable Array, this way we can further accelerate the application with ASIC-specific elements, as well as, create energy efficient designs. The analysis provided, attempts to identify common characteristics and frequently occurring sub-graphs in four applications: Hayashi-Yoshida, Transfer Entropy, Mutual Information and Spec-FEM3D. During our experiments we observed that DARSA identified abstract nodes that contribute over 90% in the application's total resource utilization, and discovered frequent sub-graphs that contribute on average ≈10%. The obtained knowledge can help in creating specialized hardware in the form of CGRAs able to efficiently map the target application.
... Special care has to be taken in designing an accelerator that is capable of attaining the desired performance while maintaining the generality of the accelerator in supporting other operations in the domain. Coarse-grained Reconfigurable Architectures (CGRAs) are a good candidate for the domain of DLA computations since they are capable of attaining the performance of Application Specific Integrated Circuits (ASICs) while retaining the flexibility of Field Programmable Gate Arrays (FPGAs) [6][7][8]. ...
... REDEFINE CGRA is a customizable massively parallel Multiprocessor System on Chip (MPSoC) where several Tiles are connected through Network-on-Chip (NoC) [14]. Each Tile consists of a Compute Element (CE) and a Router. ...
Article
We present efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore, and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over classical Givens Rotation (GR) operation that can annihilate multiple elements of rows and columns of an input matrix simultaneously. GGR takes 33% fewer multiplications compared to GR. For custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to pipeline of a Processing Element (PE). In PE, GGR attains speed-up of 1.1x over Modified Householder Transform (MHT) presented in the literature. For parallel realization of GGR, we use REDEFINE, a scalable massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. GGR also outperforms General Matrix Multiplication (gemm) by 10% in terms of Gflops/watt which is counter-intuitive.
... We present a static timing analysis method for good estimates of best and worst case time that meets the Safeness and Tightness constraints. REDEFINE Execution Model and Architecture: REDEFINE [1] is a distributed macro dataflow execution engine for accelerating execution of application kernels (hotspots to be accelerated) specified as some partial order between HyperOps [1]. Each HyperOp is a convex schedulable data-race free partition of the kernel's data ...
Conference Paper
REDEFINE is a distributed dynamic dataflow architecture, designed for exploiting parallelism at various granularities as an embedded system-on-chip (SoC). This paper dwells on the flexibility of REDEFINE architecture and its execution model in accelerating real-time applications coupled with a WCET analyzer that computes execution time bounds of real time applications.
... II. REDEFINE ARCHITECTURE REDEFINE [10] is a reconfigurable multi-core architecture developed at the Indian Institute of Science (IISc). The architecture and its execution model are detailed in this section, and its suitability for hosting safety-critical embedded applications is studied. ...