Conference Paper

A highly resilient routing algorithm for fault-tolerant NoCs.

DOI: 10.1109/DATE.2009.5090627 Conference: Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009
Source: DBLP

ABSTRACT Current trends in technology scaling foreshadow worsening transistor reliability as well as greater numbers of transistors in each system. The combination of these factors will soon make long-term product reliability extremely difficult in complex modern systems such as systems on a chip (SoC) and chip multiprocessor (CMP) designs, where even a single device failure can cause fatal system errors. Resiliency to device failure will be a necessary condition at future technology nodes. In this work, we present a network-on-chip (NoC) routing algorithm to boost the robustness in interconnect networks, by reconfiguring them to avoid faulty components while maintaining connectivity and correct operation. This distributed algorithm can be implemented in hardware with less than 300 gates per network router. Experimental results over a broad range of 2D-mesh and 2D-torus networks demonstrate 99.99% reliability on average when 10% of the interconnect links have failed.


Available from: Valeria Bertacco, Apr 25, 2015
  • Source
    ABSTRACT: In this paper we present a reliable on-chip interconnection network design using spare links. It helps to mitigate the problem of fault chain formation due to the failure of boundary links. The modified router design uses the redundant ports in boundary routers, along with spare links, to establish connections with adjacent routers in case of link faults. This design modification to a mesh-based network, together with the proposed routing algorithm, improves system reliability under single and multiple link failures. Network latency is also improved compared to recent works, with minimal area overhead.
    VLSI Design and Test, 18th International Symposium on, Coimbatore, India; 07/2014
  • Source
    ABSTRACT: Adaptive routing algorithms have been proposed for deadlock avoidance and load balancing. Furthermore, they can be used to avoid failures on physical links. The classical algorithms divide the NoC into several zones and route the packets using a single routing criterion. In this paper we present Gradient, a novel adaptive fault-tolerant routing algorithm. It considers a sequence of alternative paths for packets when the main path fails. The proposed algorithm tolerates faults under the worst traffic conditions in NoCs. To evaluate the performance of the proposed algorithm, scenarios with various link-fault and node-failure schemes are created. The number of hops to the destination nodes and the number of alternative paths in the faulty network are then determined and compared with other adaptive routing algorithms. The results show that Gradient has more alternative routes with minimum hops, lower latency, and higher throughput than other routing algorithms.
    Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on; 01/2012
  • Source
    ABSTRACT: As technology evolves, networks-on-chip will need to survive manufacturing faults in order to sustain yield. An effective configuration strategy implies the design of an efficient routing infrastructure that enables fast and efficient configuration of the NoC system to route around faulty links and switches. The strategy must minimize the resource overhead and guarantee that the entire system is deadlock-free. A centralized approach, using a monitoring controller, is appealing because it provides global network visibility. This paper proposes a centralized routing configuration strategy that meets these requirements by means of a fast configuration algorithm for the most common failure patterns. The strategy is designed for reduced configuration time and high coverage (the maximum number of supported failure patterns). No extra resources (virtual channels) are needed for the final configuration of the system. Results show the effectiveness of the proposed configuration algorithm.
    18th International Conference on High Performance Computing, HiPC 2011, Bengaluru, India, December 18-21, 2011; 01/2011