Keun Sup Shim

Massachusetts Institute of Technology, Cambridge, MA, USA

Publications (11)

  • Conference Proceeding: Scalable, accurate multicore simulation in the 1000-core era
    ABSTRACT: We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued wormhole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed a factor of 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.
    Performance Analysis of Systems and Software (ISPASS), 2011 IEEE International Symposium on; 05/2011
  • Conference Proceeding: Memory coherence in the age of multicores.
    IEEE 29th International Conference on Computer Design, ICCD 2011, Amherst, MA, USA, October 9-12, 2011; 01/2011
  • Conference Proceeding: Deadlock-free fine-grained thread migration.
    NOCS 2011, Fifth ACM/IEEE International Symposium on Networks-on-Chip, Pittsburgh, Pennsylvania, USA, May 1-4, 2011; 01/2011
  • Conference Proceeding: Brief announcement: distributed shared memory based on computation migration.
    SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, USA, June 4-6, 2011 (Co-located with FCRC 2011); 01/2011
  • Conference Proceeding: Path-Diverse In-Order Routing
    M. Lis, Myong Hyon Cho, Keun Sup Shim, S. Devadas
    ABSTRACT: We present Path-Diverse In-Order Routing (PDIOR), an oblivious routing method which guarantees network-level in-order delivery for multi-path routing. Based on Exclusive Dynamic Virtual Channel Allocation (EDVCA), which allows efficient single-path in-order delivery with dynamic virtual channel allocation, PDIOR extends the same guarantees to routing schemes where each flow may be routed via more than one path. As with EDVCA, PDIOR avoids the overheads inherent in reordering packets at the destination core, and requires only minor, inexpensive changes to traditional oblivious router architectures: for example, an implementation of PDIOR on an 8×8 mesh network with 4 VCs per port requires 492 bytes of memory per node, while in-order packet delivery in a comparable conventional network may require tens to hundreds of kilobytes of reorder buffer memory at each node.
    Green Circuits and Systems (ICGCS), 2010 International Conference on; 07/2010
  • Conference Proceeding: Static virtual channel allocation in oblivious routing.
    Third International Symposium on Networks-on-Chips, NOCS 2009, May 10-13 2009, La Jolla, CA, USA. Proceedings; 01/2009
  • Article: Guaranteed in-order packet delivery using Exclusive Dynamic Virtual Channel Allocation
    ABSTRACT: In-order packet delivery, a critical abstraction for many higher-level protocols, can severely limit the performance potential in low-latency networks (common, for example, in network-on-chip designs with many cores). While basic variants of dimension-order routing guarantee in-order delivery, improving performance by adding multiple dynamically allocated virtual channels or using other routing schemes compromises this guarantee. Although this can be addressed by reordering out-of-order packets at the destination core, such schemes incur significant overheads, and, in the worst case, raise the specter of deadlock or require expensive retransmission. We present Exclusive Dynamic VCA, an oblivious virtual channel allocation scheme which combines the performance advantages of dynamic virtual channel allocation with in-network, deadlock-free in-order delivery. At the same time, our scheme reduces head-of-line blocking, often significantly improving throughput compared to equivalent baseline (out-of-order) dimension-order routing when multiple virtual channels are used, and so may be desirable even when in-order delivery is not required. Implementation requires only minor, inexpensive changes to traditional oblivious dimension-order router architectures, more than offset by the removal of packet reorder buffers and logic.
  • Article: Oblivious Routing in On-Chip Bandwidth-Adaptive Networks
    ABSTRACT: Oblivious routing can be implemented on simple router hardware, but network performance suffers when routes become congested. Adaptive routing attempts to avoid hot spots by re-routing flows, but requires more complex hardware to determine and configure new routing paths. We propose on-chip bandwidth-adaptive networks to mitigate the performance problems of oblivious routing and the complexity issues of adaptive routing. In a bandwidth-adaptive network, the bisection bandwidth of a network can adapt to changing network conditions. We describe one implementation of a bandwidth-adaptive network in the form of a two-dimensional mesh with adaptive bidirectional links, where the bandwidth of the link in one direction can be increased at the expense of the other direction. Efficient local intelligence is used to reconfigure each link, and this reconfiguration can be done very rapidly in response to changing traffic demands. We compare the hardware designs of unidirectional and bidirectional links and evaluate the performance gains provided by a bandwidth-adaptive network in comparison to a conventional network under uniform and bursty traffic when oblivious routing is used.
  • Article: Static virtual channel allocation in oblivious routing
    ABSTRACT: Most virtual channel routers have multiple virtual channels to mitigate the effects of head-of-line blocking. When there are more flows than virtual channels at a link, packets or flows must compete for channels, either in a dynamic way at each link or by static assignment computed before transmission starts. In this paper, we present methods that statically allocate channels to flows at each link when oblivious routing is used, and ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available. We then experimentally explore the performance trade-offs of static and dynamic virtual channel allocation for various oblivious routing methods, including DOR, ROMM, Valiant and a novel bandwidth-sensitive oblivious routing scheme (BSORM). Through judicious separation of flows, static allocation schemes often exceed the performance of dynamic allocation schemes.
    Srinivas Devadas.
  • Article: DARSIM: a parallel cycle-level NoC simulator
    ABSTRACT: We present DARSIM, a parallel, highly configurable, cycle-level network-on-chip simulator based on an ingress-queued wormhole router architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization, permitting tradeoffs between perfect accuracy and high speed with very good accuracy. When run on four separate physical cores, speedups can exceed a factor of 3.5, while when eight threads are mapped to the same cores via hyperthreading, simulation speeds up as much as five-fold. Most hardware parameters are configurable, including geometry, bandwidth, crossbar dimensions, and pipeline depths. A highly parametrized table-based design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. DARSIM can run in network-only mode using traces or directly emulate a MIPS-based multicore. Sources are freely available under the open-source MIT license.
    MoBS 2010 - Sixth Annual Workshop on Modeling, Benchmarking and Simulation.
  • Article: Path-Based, Randomized, Oblivious, Minimal Routing
    ABSTRACT: Path-based, Randomized, Oblivious, Minimal routing (PROM) is a family of oblivious, minimal, path-diverse routing algorithms especially suitable for Network-on-Chip applications with n × n mesh geometry. Rather than choosing among all possible paths at the source node, PROM algorithms achieve the same effect progressively through efficient, local randomized decisions at each hop. Routing is deadlock-free in all PROM algorithms when the routers have at least two virtual channels. While the approach we present can be viewed as a generalization of both ROMM and O1TURN routing, it combines the low hardware cost of O1TURN with the routing diversity offered by the most complex n-phase ROMM schemes. As all PROM algorithms employ the same hardware, a wide range of routing behaviors, from O1TURN-equivalent to uniformly path-diverse, can be effected by adjusting just one parameter, even while the network is live and continues to forward packets. Detailed simulation on a set of benchmarks indicates that, on equivalent hardware, the performance of PROM algorithms compares favorably to existing oblivious routing algorithms, including dimension-ordered routing, two-phase ROMM, and O1TURN.
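Several of the abstracts above use dimension-order routing (DOR) as their baseline and note that basic DOR variants guarantee in-order delivery. A rough illustration of why: under XY routing on a 2D mesh, a packet first travels along x until its column matches the destination and then along y, so every packet between the same source and destination follows exactly the same path. The function name `xy_route` and the (x, y) coordinate convention are assumptions for this sketch; it is not code taken from HORNET or DARSIM.

```python
from typing import List, Tuple

# Hypothetical convention: (x, y) position of a router in a 2D mesh.
Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited under XY dimension-order routing:
    the x coordinate is corrected completely before the y coordinate."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along x toward the destination column.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Phase 2: move along y toward the destination row.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the path is a deterministic function of source and destination, the routing itself never lets one packet of a flow overtake another; the abstracts point out that adding dynamically allocated virtual channels or other routing schemes is what breaks this guarantee.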
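The PROM abstract describes obtaining path diversity through local randomized decisions at each hop rather than by choosing a complete path at the source. The sketch below is not the PROM algorithm, whose exact probabilities and single tuning parameter are defined in the paper. It is a simpler, hypothetical hop-by-hop rule that steps in x with probability dx / (dx + dy), where dx and dy are the remaining hop counts. With this choice every minimal path is equally likely, which corresponds to the "uniformly path-diverse" end of the range the abstract mentions. Virtual channels and deadlock avoidance are not modeled, and the name `random_minimal_route` is an assumption.

```python
import random
from typing import List, Optional, Tuple

Coord = Tuple[int, int]  # hypothetical (x, y) router coordinate


def random_minimal_route(src: Coord, dst: Coord,
                         rng: Optional[random.Random] = None) -> List[Coord]:
    """Build a minimal path one hop at a time using only local information.

    At each router, take a productive x step with probability
    dx / (dx + dy), otherwise a productive y step. Multiplying these
    probabilities along any minimal path gives dx! * dy! / (dx + dy)!,
    so all minimal paths are equally likely."""
    if rng is None:
        rng = random.Random()
    x, y = src
    path = [(x, y)]
    while (x, y) != dst:
        dx, dy = abs(dst[0] - x), abs(dst[1] - y)
        # dx + dy > 0 here because we have not reached dst yet.
        if rng.random() < dx / (dx + dy):
            x += 1 if dst[0] > x else -1
        else:
            y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For example, between (0, 0) and (2, 2) there are six minimal paths. Repeated calls with a seeded `random.Random` eventually produce all six, each of length five routers. XY routing, by contrast, always produces the same one.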
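The EDVCA and PDIOR abstracts rely on an exclusivity idea: dynamic virtual channel allocation is kept, but packets of one flow are never spread across several VCs on the same link, where they could overtake one another. The sketch below is one plausible reading of that idea, not the paper's allocator. Each VC is modeled as a hypothetical queue of flow IDs with a fixed capacity. A packet whose flow already occupies a VC may only join that VC (or wait if it is full). Otherwise it may claim any empty VC. The function name `edvca_pick_vc` and the queue model are assumptions.

```python
from typing import List, Optional


def edvca_pick_vc(vcs: List[List[int]], flow: int,
                  capacity: int) -> Optional[int]:
    """Return the index of the VC this packet may enter, or None if it must wait.

    vcs[i] lists the flow IDs of packets currently buffered in VC i.
    Exclusivity assumption: at most one VC on the link holds a given flow."""
    holding = [i for i, queue in enumerate(vcs) if flow in queue]
    if holding:
        i = holding[0]
        # Joining the same VC keeps this flow's packets in FIFO order;
        # using a different VC could let a later packet overtake.
        return i if len(vcs[i]) < capacity else None
    # The flow is not on this link: dynamically claim any empty VC.
    for i, queue in enumerate(vcs):
        if not queue:
            return i
    return None
```

With `vcs = [[1, 1], [], [2]]`, a packet of flow 1 is sent to VC 0 even though VC 1 is empty, and with capacity 2 it must wait rather than take VC 1. A packet of a new flow 3 takes VC 1. Under this rule each VC holds at most one flow, so order is preserved inside the network without destination reorder buffers.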