Conference Paper

A highly resilient routing algorithm for fault-tolerant NoCs

DOI: 10.1109/DATE.2009.5090627 Conference: Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009
Source: DBLP


Current trends in technology scaling foreshadow worsening transistor reliability as well as greater numbers of transistors in each system. The combination of these factors will soon make long-term product reliability extremely difficult in complex modern systems such as systems on a chip (SoC) and chip multiprocessor (CMP) designs, where even a single device failure can cause fatal system errors. Resiliency to device failure will be a necessary condition at future technology nodes. In this work, we present a network-on-chip (NoC) routing algorithm to boost the robustness in interconnect networks, by reconfiguring them to avoid faulty components while maintaining connectivity and correct operation. This distributed algorithm can be implemented in hardware with less than 300 gates per network router. Experimental results over a broad range of 2D-mesh and 2D-torus networks demonstrate 99.99% reliability on average when 10% of the interconnect links have failed.
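
To give a feel for what "maintaining connectivity" means when a fraction of links has failed, the following minimal sketch measures how many node pairs in a 2D mesh remain physically connected after random link failures. It is not the routing algorithm presented in the paper, and the paper's reliability metric may be defined differently. All names (`mesh_links`, `connected_pair_fraction`, `random_failures`) are hypothetical and chosen for illustration only. The result is an upper bound: no rerouting scheme can connect nodes that the surviving links no longer join.

```python
import random
from collections import deque


def mesh_links(width, height):
    """Return the undirected links of a width x height 2D mesh as node-pair tuples."""
    links = []
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.append(((x, y), (x + 1, y)))
            if y + 1 < height:
                links.append(((x, y), (x, y + 1)))
    return links


def connected_pair_fraction(width, height, failed):
    """Fraction of unordered node pairs still joined by a path of working links.

    Assumes the mesh has at least two nodes.
    """
    adjacency = {(x, y): [] for x in range(width) for y in range(height)}
    for a, b in mesh_links(width, height):
        if (a, b) not in failed:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen = set()
    connected_pairs = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this connected component.
        seen.add(start)
        queue = deque([start])
        size = 1
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
                    size += 1
        connected_pairs += size * (size - 1) // 2

    total = len(adjacency)
    return connected_pairs / (total * (total - 1) // 2)


def random_failures(width, height, failure_rate, rng):
    """Pick a random set of failed links; failure_rate is a fraction such as 0.10."""
    links = mesh_links(width, height)
    return set(rng.sample(links, round(failure_rate * len(links))))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    failed = random_failures(8, 8, 0.10, rng)
    print(connected_pair_fraction(8, 8, failed))
```

Averaging `connected_pair_fraction` over many random failure sets gives a baseline against which a fault-tolerant routing scheme's delivered connectivity could be compared.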

Available from: Valeria Bertacco, Apr 25, 2015
    • "Once a packet faces a faulty component, it is rerouted around the fault to reach the destination. The methods using this technique can also be divided into two subgroups, depending on the mechanism they use to create a deadlock-free path around the faulty region; some approaches [12], [15] are designed based on the turn model, and some [4], [16] use VCs. As discussed earlier, the turn model and VCs were first introduced to guarantee the deadlock-freedom and provide adaptivity in routing. "
    ABSTRACT: Networks-on-Chip (NoCs) are becoming more susceptible to faults due to the increasing density in the VLSI circuits. As a result, designing reliable and efficient routing methods is highly desirable. Most of the existing fault-tolerant routing techniques use nonminimal paths to reroute the packets around the faulty regions. Using these approaches, the network performance degrades drastically not only by taking unnecessary longer paths, but also by creating hotspots around the faults. Moreover, they are designed statically and cannot adapt to the dynamic traffic distribution in the network. In this paper, a reconfigurable and fault-tolerant routing method is proposed which is designed based on the Abacus Turn Model (AbTM). The presented deadlock-free routing technique is dynamically tuned based on the location of faults and congestion in the network. Thus, it is able to tolerate all single router failures without exploiting virtual channels. Moreover, it can grant full adaptiveness to the hotspot regions of the network. Using this scheme, the rerouting is minimized by forwarding the packets through the available shortest paths. This efficiency makes the proposed method a powerful asset for reliable routing in NoCs.
    • "Finally, concluding remarks are presented in Section V. II. RELATED WORK Solutions based on forwarding tables (source or distributed based) [3], [4], [5] are flexible to deal with irregular topologies but suffer from the scalability and high cost associated with tables [6]. On the other side, low cost deterministic solutions such as FDOR (Flexible Dimension Order Routing [2]) or turn model based adaptive approaches [7], [8] offer limited flexibility and coverage as compared to tables. "
    ABSTRACT: To deal with the communication challenges of current and future many-core architectures, Network-on-Chip (NoC) has been proposed as a promising alternative. Regular 2D mesh topology is the most preferred design choice for NoCs. Hardware failures owing to manufacturing, wear-out, aging, etc., however, may disrupt the regularity of the 2D mesh. Sustaining routing under these circumstances becomes a challenge. Though the traditional table-based routing method is flexible enough to handle any irregularity, it is neither a scalable nor a cost-effective solution. Scalable distributed logic-based solutions like uLBDR have limited flexibility and work only in a restricted architectural space despite a complex switch design. To overcome these limitations, this paper presents CERI (Cost-Effective Routing Implementation), an efficient logic-based routing scheme capable of handling failure-induced irregularities in a 2D mesh. Implementation of the proposed approach does not require tables or a complex switch design. Performance analysis of CERI demonstrates its cost effectiveness: area and power requirements are reduced by 14% and 16%, respectively, compared with the previously proposed logic-based solution uLBDR.
    2015 28th International Conference on VLSI Design (VLSID); 02/2015
    • "As for the counter-clockwise bead, all North-West turns above it are prohibited and all East-North turns below it are forbidden. Fick presented a similar method [13]. A block based fault model [14] sacrifices system processing capability because some fault-free nodes are isolated and marked as faulty in order to form rectangular or convex regions."

    IEEE Transactions on Computers 01/2015; DOI:10.1109/TC.2015.2425887 · 1.66 Impact Factor