Conference Paper

Express virtual channels: Towards the ideal interconnection fabric

DOI: 10.1145/1250662.1250681
Conference: 34th International Symposium on Computer Architecture (ISCA 2007), June 9-13, 2007, San Diego, California, USA
Source: DBLP


ABSTRACT Due to wire delay scalability and bandwidth limitations inherent in shared buses and dedicated links, packet-switched on-chip interconnection networks are fast emerging as the pervasive communication fabric to connect different processing elements in many-core chips. However, current state-of-the-art packet-switched networks rely on complex routers, which increase the communication overhead and energy consumption as compared to the ideal interconnection fabric. In this paper, we try to close the gap between the state-of-the-art packet-switched network and the ideal interconnect by proposing express virtual channels (EVCs), a novel flow control mechanism which allows packets to virtually bypass intermediate routers along their path in a completely non-speculative fashion, thereby lowering the energy/delay towards that of a dedicated wire while simultaneously approaching ideal throughput with a practical design suitable for on-chip networks. Our evaluation results using a detailed cycle-accurate simulator on a range of synthetic traffic and SPLASH benchmark traces show up to 84% reduction in packet latency and up to 23% improvement in throughput while reducing the average router energy consumption by up to 38% over an existing state-of-the-art packet-switched design. When compared to the ideal interconnect, EVCs add just two cycles to the no-load latency, and are within 14% of the ideal throughput. Moreover, we show that the proposed design incurs a minimal hardware overhead while exhibiting excellent scalability with increasing network sizes.



Available from: Partha Kundu, Oct 16, 2015
  • Source
    • "Therefore the main drawback of conventional NoCs is their inadequacy in latency predictability. A few techniques, such as express virtual channels [6], dedicated virtual channels, priority-based NoC routing [7], QoS at the network level [8], and RTOS support for NoC-based architectures [9], have been used to mitigate the lack of deterministic latency guarantees in NoC-based communications. Still, they have not been able to meet the hard real-time constraints required by many distributed real-time applications."
    ABSTRACT: The increasing complexity of embedded systems is accelerating the use of multicore processors in these systems. This trend gives rise to new problems such as the sharing of on-chip network resources among hard real-time and normal best effort data traffic. We propose a network-on-chip router that provides predictable and deterministic communication latency for hard real-time data traffic while maintaining high concurrency and throughput for best-effort/general-purpose traffic with minimal hardware overhead. The proposed router requires less area than non-interfering networks, and provides better Quality of Service (QoS) in terms of predictability and determinism to hard real-time traffic than priority-based routers. We present a deadlock-free algorithm for decoupled routing of the two types of traffic. We compare the area and power estimates of three different router architectures with various QoS schemes using the IBM 45-nm SOI CMOS technology cell library. Performance evaluations are done using three realistic benchmark applications: a hybrid electric vehicle application, a utility grid connected photovoltaic converter system, and a variable speed induction motor drive application.
    Preview · Article · Feb 2015
  • Source
    • "Subsequently, the authors of [11] discuss the performance advantages achievable by inserting a few long-range wireline links following principles of small-world graphs [12]. The concept of express virtual channels is introduced in [13]. Despite significant performance gains, in all of the above schemes the long-range links are designed with conventional wires."
    ABSTRACT: Multiple Voltage Frequency Island (VFI)-based designs can reduce the energy dissipation in multicore chips. Indeed, by tailoring the voltages and frequencies of each VFI domain, we can achieve significant energy savings subject to specific performance constraints. The achievable performance of VFI-based multicore platforms depends on the overall communication backbone, which relies predominantly on Networks-on-Chip (NoCs). Traditionally mesh-based NoCs have been used in VFI-based systems. However, the mesh-based NoCs have large latency and energy overheads due to their inherently long multihop paths. Emerging paradigms such as the millimeter (mm)-wave small-world wireless Networks-on-Chip (mSWNoCs) have lately been observed to help reduce the impact of the communication backbone on the performance of the multicore chips. In this work, we demonstrate that not only do mSWNoC-enabled VFI designs mitigate some of the full-system performance degradation inherent in VFI-partitioned multicore designs, but they also help in eliminating it entirely for certain applications. We also demonstrate that the VFI-partitioned designs used in conjunction with a novel NoC architecture like mSWNoC can achieve significant energy savings while minimizing the impact on the performance for each application under consideration.
    Full-text · Article · Jan 2015 · IEEE Transactions on Computers
  • Source
    • "Data in NoCs is mainly transmitted in the form of packets, which causes high latency when different nodes compete for a transfer channel. In order to provide low-latency and high-bandwidth communication in NoCs, fast routers are proposed in [4] [5] [6] and new network topologies are proposed in [7] [8] [9]."
    ABSTRACT: To reduce the traffic congestion caused by different kinds of data competing for channels, we present a low-delay, energy-efficient network-on-chip with three channels for different types of data. Hence, the transmission of control data between cores is not congested by the large amount of data transmitted from caches to cores, which yields better latency and energy performance. Our strategy is to add a direct long wire connecting two nodes in the same row or column, and to distribute these connecting wires across different layers, which are connected by 3D stacking technology. In a many-core system using this topology, every pair of core-cache nodes is at most 5 hops apart, while real-time and short control information is transmitted by a 2D mesh network. The experimental results show up to 23% network latency reduction and up to 15% energy reduction when compared to a 3D network-on-chip.
    Full-text · Article · Dec 2013