Jin Ouyang

Pennsylvania State University, University Park, Pennsylvania, United States

Publications (11); total impact: 0

  • ABSTRACT: Hard real-time embedded systems impose a strict latency requirement on interconnection subsystems. In the case of network-on-chip (NoC), this means each packet of a traffic stream has to be delivered within a time interval. In addition, with the increasing complexity of NoC, it consumes a significant portion of total chip power, which boosts the power footprint of such chips. In this work, we propose a methodology to minimize the energy consumption of NoC without violating the prespecified latency deadlines of real-time applications. First, we develop a formal approach based on network calculus to obtain the worst-case delay bound of all packets, from which we derive a safe estimate of the number of cycles that a packet can be further delayed in the network without violating its deadline, i.e., the worst-case slack. With this information, we then develop an optimization algorithm that trades the slacks for lower NoC energy. Our algorithm recognizes the distribution of slacks for different traffic streams, and assigns different voltages and frequencies to different routers to achieve NoC energy efficiency, while meeting the deadlines for all packets.
    Design Automation Conference (DAC), 2013 50th ACM / EDAC / IEEE; 01/2013
  • ABSTRACT: The increasing number of cores in contemporary and future many-core processors will continue to demand high-throughput, scalable, and energy-efficient on-chip interconnection networks. To overcome the intrinsic inefficiency of electrical interconnects, researchers have leveraged recent developments in chip photonics to design novel optical networks-on-chip (NoCs). However, existing optical NoCs are mostly based on passively switched, channel-guided optical interconnect in which a large amount of power is wasted in heating the micro-rings and maintaining the optical signal integrity. In this paper we present an optical NoC based on free-space optical interconnect in which optical signals emitted from the transmitter are propagated in the free space in the package. With lower attenuation and no coupling effects, free-space optical interconnects incur lower overheads for maintaining signal integrity and waste no energy heating micro-rings. In addition, we propose a novel cost-effective wavelength-switching method where a refractive grating layer directs optical signals in different wavelengths to different photodetectors without collision. Based on the above interconnect and switching technologies, we propose the free flattened butterfly (F2BFLY) NoC, which features both a high-radix network and dense free-space optical interconnects to improve performance while reducing power. Our experimental results, comparing F2BFLY with state-of-the-art electrical and optical on-chip networks, show that it is a highly competitive interconnect substrate for many-core architectures.
    05/2011;
  • ABSTRACT: Phase Change Random Access Memory (PRAM) has great potential as a replacement for DRAM as main memory, due to its advantages of high density, non-volatility, fast read speed, and excellent scalability. However, poor endurance and high write energy are challenges to be tackled before PRAM can be adopted as main memory. In order to mitigate these limitations, prior research focuses on reducing write intensity at the bit level. In this work, we study the data pattern of memory write operations, and explore the frequent-value locality in data written back to main memory. Based on the fact that many data are written to memory repeatedly, an architecture of frequent-value storage is proposed for PRAM memory. It can significantly reduce the write intensity to PRAM memory so that the lifetime is improved and the write energy is reduced. The trade-off between endurance and capacity of PRAM memory is explored for different configurations. After using the frequent-value storage, the endurance of PRAM is improved to about 1.6X on average, and the write energy is reduced by 20%.
    Proceedings of the 16th Asia South Pacific Design Automation Conference, ASP-DAC 2011, Yokohama, Japan, January 25-27, 2011; 01/2011
  • ABSTRACT: The emerging three-dimensional (3D) integration technology is one of the promising solutions to overcome the barriers in interconnection scaling, thereby offering an opportunity to continue performance improvements using CMOS technology. As the fabrication of 3D integrated circuits has become viable, developing CAD tools and architectural techniques is imperative for the successful adoption of 3D integration technology. In this article, we first give a brief introduction to 3D integration technology, then review the EDA challenges and solutions that can enable the adoption of 3D ICs, and finally present design and architectural techniques for the application of 3D ICs, including a survey of various approaches to designing future 3D ICs that leverage the benefits of low latency, higher bandwidth, and heterogeneous integration capability offered by 3D technology.
    Foundations and Trends in Electronic Design Automation. 01/2011; 5:1-151.
  • Jin Ouyang, Yuan Xie
    ABSTRACT: With the recent development in silicon photonics, researchers have developed optical network-on-chip (NoC) architectures that achieve both low latency and low power, which are beneficial for future large-scale chip-multiprocessors (CMPs). However, none of the existing optical NoC architectures has quality-of-service (QoS) support, which is a desired feature of an efficient interconnection network. QoS support provides contending flows with differentiated bandwidths according to their priorities (or weights), which is crucial to account for application-specific communication patterns and to provide bandwidth guarantees for real-time applications. In this paper, we propose a quality-of-service framework for optical network-on-chip based on frame-based arbitration. We show that the proposed approach achieves excellent differentiated bandwidth allocation with only simple hardware additions and low performance overheads. To the best of our knowledge, this is the first work that provides QoS support for optical network-on-chip.
    Proceedings of the 16th Asia South Pacific Design Automation Conference, ASP-DAC 2011, Yokohama, Japan, January 25-27, 2011; 01/2011
  • Conference Paper: F
    Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31 - June 04, 2011; 01/2011
  • Jin Ouyang, Yuan Xie
    ABSTRACT: Providing quality-of-service (QoS) for concurrent tasks in many-core architectures is becoming important, especially for real-time applications. QoS support for on-chip shared resources (such as shared cache, bus, and memory controllers) in chip-multiprocessors has been investigated in recent years. Unlike other shared resources, network-on-chip (NoC) does not typically have central arbitration of accesses to the shared resource. Instead, each router shares the responsibility of resource allocation. While this distributed nature benefits the scalable performance of NoC, it also dramatically complicates the problem of providing QoS support for individual flows. Existing approaches to address this problem suffer from various shortcomings such as low network utilization and weak QoS guarantees. In this work, we propose the LOFT NoC architecture, which features both high network utilization and strong QoS guarantees. LOFT is based on the combination of two mechanisms: a) locally-synchronized frames (LSF), which is a distributed frame-based scheduling mechanism that provides flexible QoS guarantees to different flows, and b) flit-reservation (FRS), which is a flow-control mechanism integrated in LSF that improves network utilization. The experimental results show that LOFT delivers flexible and reliable QoS guarantees while sufficiently utilizing available network capacity to gain high overall throughput.
    43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010, 4-8 December 2010, Atlanta, Georgia, USA; 01/2010
  • ABSTRACT: In recent 3D IC studies, through-silicon vias (TSVs) are usually employed as the vertical interconnects in the 3D stack. Despite their benefits of short latency and low power, forming TSVs adds complexity to the fabrication process. Recently, inductive/capacitive-coupling links have been proposed to replace TSVs in 3D stacking because their fabrication complexity is lower. Although state-of-the-art inductive/capacitive-coupling links show bandwidth and power comparable to TSVs, the relatively large footprints of those links compromise their area efficiency. In this work, we study the design of 3D network-on-chip (NoC) using inductive/capacitive-coupling links. We propose three techniques to mitigate the area overhead introduced by using these links: (a) serialization, (b) in-transceiver data compression, and (c) high-speed asynchronous transmission. With the combination of these three techniques, evaluation results show that the overheads of all aspects caused by using inductive/capacitive-coupling vertical links can be bounded under 10%.
    2010 International Conference on Computer-Aided Design (ICCAD'10), November 7-11, 2010, San Jose, CA, USA; 01/2010
  • ABSTRACT: Networks-on-chip (NoC) is emerging as a key on-chip communication architecture for multiprocessor systems-on-chip (MPSoC). In traditional electronic NoCs, high bandwidth can be obtained by increasing the number of parallel metallic wires at the cost of more energy consumption. Optical NoCs are thus proposed to achieve low-power ultra-high-bandwidth data transmission in the optical domain. Electronic control technology could be a complement to the optical networks. Besides NoCs, three-dimensional integrated circuits (3D ICs) are another attractive solution for system performance improvement by reducing the interconnect length. The investigation of using 3D IC as a platform for the realization of mixed-technology electronic-controlled optical NoC has not been addressed until recently. In this paper, we propose a 3D electronic-controlled optical NoC implemented in a TSV-based (through-silicon via) two-layer 3D chip. The upper device layer is an optical layer. It integrates an optical data transmission network, which is responsible for optical payload packet transmission. The bottom device layer is an electronic layer. It contains an electronic control network, which is used to route control packets and configure the optical network. We built an 8×8 mesh-based 3D optical NoC, with a 45 nm electronic control network. Power comparison with a matched 2D electronic NoC shows that the optical NoC can reduce power consumption significantly. For 2048 B packets, it has a 70% power reduction. End-to-end delay (ETE delay) and network throughput of the two NoCs under varying injection rates were evaluated for comparison. The results show that the ETE delay of the optical NoC is much smaller than that of the electronic NoC when the network becomes congested. For 4096 B packets, for example, it is 18.7 μs in the optical NoC with an injection rate of 0.5, versus 33.5 μs in the electronic one. A maximum throughput of 478 Gbps can be offered by the optical NoC using 32 Gbps optical link bandwidth. Because of the low resource utilization of circuit switching, the maximum throughput of the optical NoC is slightly lower than that of the electronic one.
    3D System Integration, 2009. 3DIC 2009. IEEE International Conference on; 10/2009
  • ABSTRACT: Embedded processors have become increasingly complex, resulting in variable execution behavior and reduced timing predictability. On such processors, safe timing specifications expressed as bounds on the worst-case execution time (WCET) are generally too loose due to conservative assumptions about complex architectural features, timing anomalies and programmatic complexities. Hence, exploiting the latest architectures may not be an option for embedded systems with hard real-time constraints where deadline misses cannot be tolerated. This work addresses these shortcomings by contributing CheckerCore. CheckerCore is a mode-enhanced SPARC v8 soft core processor synthesized on an FPGA. During regular execution the core adheres to its original specifications. But when operating in a special time-checking configuration, CheckerCore executes programs irrespective of inputs and steers execution along selected control flow paths. Such execution allows systematic derivation of worst-case execution time (WCET) bounds. This paper presents the design and implementation of CheckerCore and illustrates its use in deriving accurate WCET bounds for a set of embedded benchmarks. Overall, CheckerCore proposes a realistic processor core enhancement that encapsulates processor details without revealing them to users while supporting safe bounding of WCETs. To the best of our knowledge, this is the first contribution of a WCET-enhancing microarchitectural feature besides full processor encapsulations.
    Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2009, Grenoble, France, October 11-16, 2009; 01/2009
  • Yibo Chen, Jin Ouyang, Yuan Xie
    ABSTRACT: The impact of process variations on circuit timing increases rapidly as technology scales. Consequently, it is important to consider timing variations at the early stages of circuit design. Conventional high-level synthesis relies on worst-case delay analysis to guide the design space exploration; however, such worst-case timing analysis can result in overly conservative designs with pessimistic performance estimation. This paper presents a 0-1 integer linear programming (ILP) formulation that aims at reducing the impact of timing variations in high-level synthesis, by integrating overall timing yield constraints into scheduling and resource binding. The proposed approach focuses on how to achieve the maximum performance (minimum latency) under given timing yield constraints with affordable computation time. Experimental results show that significant latency reduction is achieved under different timing yield constraints, compared to the traditional worst-case-based approach.
    SOC Conference, 2008 IEEE International; 10/2008

Publication Stats

19 Citations

Institutions

  • 2008–2011
    • Pennsylvania State University
      • Department of Computer Science and Engineering
      University Park, Pennsylvania, United States
  • 2009
    • The Hong Kong University of Science and Technology
      • Department of Electronic and Computer Engineering
      Kowloon, Hong Kong