Conference Paper

Express virtual channels: towards the ideal interconnection fabric.

DOI: 10.1145/1250662.1250681
Conference: 34th International Symposium on Computer Architecture (ISCA 2007), June 9-13, 2007, San Diego, California, USA
Source: DBLP

ABSTRACT: Due to wire delay scalability and bandwidth limitations inherent in shared buses and dedicated links, packet-switched on-chip interconnection networks are fast emerging as the pervasive communication fabric to connect different processing elements in many-core chips. However, current state-of-the-art packet-switched networks rely on complex routers, which increase the communication overhead and energy consumption compared to the ideal interconnection fabric. In this paper, we try to close the gap between the state-of-the-art packet-switched network and the ideal interconnect by proposing express virtual channels (EVCs), a novel flow control mechanism that allows packets to virtually bypass intermediate routers along their path in a completely non-speculative fashion, thereby lowering the energy/delay towards that of a dedicated wire while simultaneously approaching ideal throughput with a practical design suitable for on-chip networks. Our evaluation results using a detailed cycle-accurate simulator on a range of synthetic traffic and SPLASH benchmark traces show up to 84% reduction in packet latency and up to 23% improvement in throughput while reducing the average router energy consumption by up to 38% over an existing state-of-the-art packet-switched design. When compared to the ideal interconnect, EVCs add just two cycles to the no-load latency and are within 14% of the ideal throughput. Moreover, we show that the proposed design incurs minimal hardware overhead while exhibiting excellent scalability with increasing network sizes.

  • ABSTRACT: With the emergence of many-core multiprocessor systems-on-chip (MPSoCs), on-chip networks face serious challenges in providing fast communication for various tasks and cores. One promising solution shown in recent studies is to add express channels to the network as shortcuts that bypass intermediate routers, thereby reducing packet latency. However, this approach also greatly changes the packet delay estimation and traffic behavior of the network, neither of which has yet been exploited in existing mapping algorithms. In this paper, we explore the opportunities in optimizing application mapping for express channel-based on-chip networks. Specifically, we derive a new delay model for this type of network, identify its unique characteristics, and propose an efficient heuristic mapping algorithm that increases the bypassing opportunities by reducing unnecessary turns that would otherwise impose the entire router pipeline delay on packets. Simulation results show that the proposed algorithm can achieve a 2~4X reduction in the number of turns and a 10~26% reduction in the average packet delay.
    Design Automation and Test in Europe; 01/2014
  • ABSTRACT: A new asynchronous arbitration node is introduced for use as a building block in an asynchronous interconnection network. The target network topology is a variant Mesh-of-Trees (MoT), combining a binary fan-out (i.e. routing) network and a binary fan-in (i.e. arbitration) network, which is becoming widely used for multi-core shared-memory interfaces. The two key features are: (i) each fan-in node can resolve its arbitration and pre-allocate the corresponding input channel before the actual data arrives; and (ii) a lightweight shadow monitoring network fast-forwards information as soon as data enters the network, without synchronization to a fixed-rate clock, notifying each fan-in node on its path to enable the early arbitration. Simulations of the new arbitration node, using IBM 90nm technology and an ARM standard cell library, indicate latency reductions of up to 54.4% over prior designs, while maintaining roughly comparable throughput. Network-level simulations were then performed on eight diverse synthetic benchmarks, comparing the new approach ("early arbitration") with two earlier alternative asynchronous MoT networks ("baseline" and "predictive"), using a mix of random and deterministic traffic. Considerable improvements in system latency were obtained on all benchmarks, ranging from 13.0% to 38.7%, with especially strong benefits for the two most adversarial benchmarks.
    2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC); 01/2014
  • ABSTRACT: The network-on-chip is becoming an increasingly important component of chip multiprocessors. Recently, bufferless deflection routers were proposed, aiming to reduce hardware cost in comparison to classic virtual channel based routers by eliminating router buffers. We propose RIDER, a low-cost deflection router based on an internal rotating ring structure with a minimal number of buffers. We compare RIDER with 16 buffers to a wormhole router with 12 buffers; a virtual channel buffered router with 64 buffers; CHIPPER, a bufferless deflection router with no buffers; and MinBD, a buffered deflection router with four buffers.
    Design Automation for Embedded Systems 01/2014; DOI:10.1007/s10617-014-9130-0 · 0.24 Impact Factor
