Conference Paper

Corona: System Implications of Emerging Nanophotonic Technology.

Univ. of Wisconsin - Madison, Madison, WI
DOI: 10.1109/ISCA.2008.35 Conference: 35th International Symposium on Computer Architecture (ISCA 2008), June 21-25, 2008, Beijing, China
Source: DBLP

ABSTRACT We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on-stack bandwidth requirements at acceptable power levels. Corona is a 3 D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.

  • [Show abstract] [Hide abstract]
    ABSTRACT: Optical network-on-chip (ONoC) is a promising solution for its high bandwidth and low energy consumption. However, optical circuit switching requires an electrical control network, which leads to network congestion, low network utilization, high latency, and an overhead in energy consumption. In this letter, we propose an architecture called RPNoC based on ring topology using packet switching. A wavelength assignment method and a deadlock-free deterministic routing algorithm are jointly designed, which significantly reduce the maximum number of hops using much fewer optical devices compared with other packet-switched ONoCs. The simulation is carried out for 64-node RPNoC under uniform and realistic traffic pattern. The evaluation results show that it yields higher throughput, lower latency, and lower energy consumption than mesh and ring networks.
    IEEE Photonics Technology Letters 01/2015; 27(4):423-426. DOI:10.1109/LPT.2014.2376972 · 2.18 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: In manycore systems, the silicon-photonic link technology is projected to replace electrical link technology for global communication in network-on-chip (NoC) as it can provide as much as an order of magnitude higher bandwidth density and lower data-dependent power. However, a large amount of fixed power is dissipated in the laser sources required to drive these silicon-photonic links, which negates any bandwidth density advantages. This large laser power dissipation depends on the number of on-chip silicon-photonic links, the bandwidth of each link, and the photonic losses along each link. In this paper, we propose to reduce the laser power dissipation at runtime by dynamically activating/deactivating L2 cache banks and switching ON/OFF the corresponding silicon-photonic links in the NoC. This method effectively throttles the total on-chip NoC bandwidth at runtime according to the memory access features of the applications running on the manycore system. Full-system simulation utilizing Princeton application repository for shared-memory computers and Stanford parallel applications for shared-memory-2 parallel benchmarks reveal that our proposed technique achieves on an average 23.8% (peak value 74.3%) savings in laser power, and 9.2% (peak value 26.9%) lower energy-delay product for the whole system at the cost of 0.65% loss (peak value 2.6%) in instructions per cycle on average when compared to the cases where all L2 cache banks are always active.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2015; DOI:10.1109/TCAD.2015.2402172 · 1.20 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: During the past decade, the very large scale integration (VLSI) community has migrated towards incorporating multiple cores on a single chip to sustain the historic performance improvement in computing systems. As the core count continuously increases, the performance of network-on-chip (NoC), which is responsible for the communication between cores, caches and memory controllers, is increasingly becoming critical for sustaining the performance improvement. In this dissertation, we propose several methods to improve the energy efficiency of both electrical and silicon-photonic NoCs. Firstly, for electrical NoC, we propose a flow control technique, Express Virtual Channel with Taps (EVC-T), to transmit both broadcast and data packets efficiently in a mesh network. A low-latency notification tree network is included to maintain the order of broadcast packets. The EVC-T technique improves the NoC latency by 24% and the system energy efficiency in terms of energy-delay product (EDP) by 13%. In the near future, the silicon-photonic links are projected to replace the electrical links for global on-chip communication due to their lower datadependent power and higher bandwidth density, but the high laser power can more than offset these advantages. Therefore, we propose a silicon-photonic multi-bus NoC architecture and a methodology that can reduce the laser power by 49% on average through bandwidth reconfiguration at runtime based on the variations in bandwidth requirements of applications. We also propose a technique to reduce the laser power by dynamically activating/deactivating the L2 cache banks and switching ON/OFF the corresponding silicon-photonic links in a crossbar NoC. This cache-reconfiguration based technique can save laser power by 23.8% and improves system EDP by 5.52% on average. In addition, we propose a methodology for placing and sharing on-chip laser sources by jointly considering the bandwidth requirements, thermal constraints and physical layout constraints. Our proposed methodology for placing and sharing of on-chip laser sources reduces laser power. In addition to reducing the laser power to improve the energy efficiency of silicon-photonic NoCs, we propose to leverage the large bandwidth provided by silicon-photonic NoC to share computing resources. The global sharing of floating-point units can save system area by 13.75% and system power by 10%.
    01/2014, Degree: Doctor of Philosophy, Supervisor: Ajay Joshi

Full-text (2 Sources)

Available from
Jun 10, 2014