Conference Paper

A highly resilient routing algorithm for fault-tolerant NoCs

DOI: 10.1109/DATE.2009.5090627 Conference: Design, Automation and Test in Europe, DATE 2009, Nice, France, April 20-24, 2009
Source: DBLP

ABSTRACT Current trends in technology scaling foreshadow worsening transistor reliability as well as greater numbers of transistors in each system. The combination of these factors will soon make long-term product reliability extremely difficult in complex modern systems such as systems on a chip (SoC) and chip multiprocessor (CMP) designs, where even a single device failure can cause fatal system errors. Resiliency to device failure will be a necessary condition at future technology nodes. In this work, we present a network-on-chip (NoC) routing algorithm to boost the robustness in interconnect networks, by reconfiguring them to avoid faulty components while maintaining connectivity and correct operation. This distributed algorithm can be implemented in hardware with less than 300 gates per network router. Experimental results over a broad range of 2D-mesh and 2D-torus networks demonstrate 99.99% reliability on average when 10% of the interconnect links have failed.
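The abstract does not explain how the reconfiguration works. The Python sketch below is a rough, hypothetical illustration of the underlying problem. It is not the authors' distributed, sub-300-gate hardware algorithm. It fails a random fraction of links in a 2D mesh, rebuilds per-router next-hop tables over the surviving links with a breadth-first search from every destination, and reports the fraction of source-destination pairs that still have a route. The function names, the centralized BFS, and the use of pair reachability as a stand-in for "reliability" are all assumptions. The sketch also ignores deadlock freedom, which a real NoC routing algorithm must guarantee.

```python
import random
from collections import deque

def mesh_links(width, height):
    """All bidirectional links of a width x height 2D mesh, as frozensets of (x, y) nodes."""
    links = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links

def build_routing_tables(width, height, alive_links):
    """Hypothetical helper: tables[node][dest] = next hop on a shortest surviving path."""
    nodes = [(x, y) for x in range(width) for y in range(height)]
    neighbors = {n: [] for n in nodes}
    for link in alive_links:
        a, b = tuple(link)
        neighbors[a].append(b)
        neighbors[b].append(a)
    tables = {n: {} for n in nodes}
    for dest in nodes:
        # BFS outward from the destination; the node we arrived from is the
        # next hop toward dest. Nodes never reached get no entry (unreachable).
        seen = {dest}
        queue = deque([dest])
        while queue:
            cur = queue.popleft()
            for nb in neighbors[cur]:
                if nb not in seen:
                    seen.add(nb)
                    tables[nb][dest] = cur
                    queue.append(nb)
    return tables

def reachable_fraction(width, height, alive_links):
    """Assumed reliability proxy: fraction of ordered (src, dst) pairs, src != dst, with a route."""
    tables = build_routing_tables(width, height, alive_links)
    n = width * height
    routed = sum(len(t) for t in tables.values())
    return routed / (n * (n - 1))

def fail_links(links, fraction, rng):
    """Remove round(fraction * len(links)) randomly chosen links."""
    ordered = sorted(links, key=sorted)  # deterministic order for a seeded rng
    dead = set(rng.sample(ordered, round(fraction * len(links))))
    return links - dead
```

For example, `reachable_fraction(8, 8, fail_links(mesh_links(8, 8), 0.10, random.Random(0)))` evaluates one random 8x8 mesh with 11 of its 112 links failed. Averaging over many seeds would give a Monte Carlo estimate in the spirit of the abstract's 10%-failure experiment, though not a reproduction of it.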



Available from: Valeria Bertacco, Apr 25, 2015
  • Source
    • "In [6] it was shown that reachability increases with the number of regions, but performance decreases. The authors in [7] and [8] also propose fault-tolerant routing algorithms using tables and disabling certain turns to avoid deadlock without requiring virtual channels or duplicated physical links. However, this approach does not ensure full reachability. "
    ABSTRACT: This paper presents a novel low-cost routing algorithm for regular (mesh) topology networks-on-chip. While deterministic NoC routing algorithms such as XY routing are still widely used, they can fail when a link or router in the NoC fails temporarily or permanently, because they provide no adaptivity. However, switching to a topology-agnostic routing algorithm can be a very costly approach in terms of performance and/or area. Instead, we propose switching the dimension order from XY to YX routing, and back, as needed. Preliminary experimental results show that this approach maintains the simplicity of dimension-order routing and, therefore, its hardware efficiency, while greatly improving reachability. This approach can be combined with topology-agnostic approaches to reduce packet loss during algorithm reconfiguration time.
    21st IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2014, Marseille, France; 12/2014
  • Source
    • "In this context we focus our attention on link faults. In [10], the authors presented a highly resilient, table-based distributed routing algorithm in which routers can be reconfigured after a fault to maintain correct operation without using virtual channels (VCs). A bidirectional fault-tolerant NoC that addresses both transient and permanent link faults by utilizing bidirectional channels was presented in [9]. "
    ABSTRACT: In this paper we present a reliable on-chip interconnection network design using spare links. It helps mitigate the problem of fault-chain formation caused by the failure of boundary links. The modified router design uses the redundant ports of boundary routers, along with spare links, to establish connections with adjacent routers in case of link faults. This design modification to the mesh-based network, together with the proposed routing algorithm, improves system reliability under single and multiple link failures. Network latency is also improved compared to recent works, with minimal area overhead.
    VLSI Design and Test, 18th International Symposium on, Coimbatore, India; 07/2014
  • Source
    • "The algorithm reconfigures the routing tables through reinforcement learning based on 2-hop fault information. In [15] a routing algorithm that boosts the robustness of interconnect networks by reconfiguration to avoid faulty components while maintaining connectivity and correct operation has been proposed. A lightweight fault-tolerant mechanism based on the notion of default backup paths (DBPs) has been proposed in [16]. "
    ABSTRACT: In this paper, we propose a new fault-tolerant and congestion-aware adaptive routing algorithm for Networks-on-Chip (NoCs). The proposed algorithm is based on the ball-and-string model and employs a distributed approach based on partitioning of the regular NoC architecture into regions controlled by local monitoring units. Each local monitoring unit runs a shortest-path computation procedure to identify the best routing path so that highly congested routers and faulty links are avoided while latency is improved. To react dynamically to continuously changing traffic conditions, the shortest-path computation procedure is invoked periodically. Because this procedure is based on the ball-and-string model, the hardware overhead and computation time are minimal. Experimental results based on an actual Verilog implementation demonstrate that the proposed adaptive routing algorithm significantly improves network throughput compared to traditional XY routing and the DyXY adaptive algorithm.
    IEEE Congress on Evolutionary Computation (CEC); 01/2014
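The ICECS 2014 abstract in the first citing entry only outlines its idea of falling back from XY to YX dimension-order routing. The following is a minimal sketch under assumed details. The source tries the XY path first, switches to YX if the XY path crosses a known faulty link, and reports the destination as unreachable if both paths are blocked. The per-packet decision and the helper names are hypothetical. Allowing XY and YX routes at the same time permits all turns, so this sketch alone does not guarantee deadlock freedom.

```python
def dimension_order_path(src, dst, order):
    """Hops of a dimension-order route; 'XY' corrects x first, 'YX' corrects y first."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    for axis in order:
        if axis == "X":
            step = 1 if dx > x else -1
            while x != dx:
                x += step
                path.append((x, y))
        else:
            step = 1 if dy > y else -1
            while y != dy:
                y += step
                path.append((x, y))
    return path

def uses_faulty_link(path, faulty):
    """True if any consecutive hop pair in path is a faulty (undirected) link."""
    return any(frozenset({a, b}) in faulty for a, b in zip(path, path[1:]))

def choose_route(src, dst, faulty):
    """Hypothetical policy: try XY, fall back to YX; None if both cross a fault."""
    for order in ("XY", "YX"):
        path = dimension_order_path(src, dst, order)
        if not uses_faulty_link(path, faulty):
            return order, path
    return None
```

For instance, with the link between (1, 0) and (2, 0) marked faulty, `choose_route((0, 0), (2, 1), faulty)` falls back to the YX path [(0, 0), (0, 1), (1, 1), (2, 1)]. Only two candidate paths are considered, which keeps the logic as simple as plain dimension-order routing but cannot reach destinations when both minimal paths are blocked.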