Conference Paper

A reconfigurable architecture for multicore systems

DOI: 10.1109/IPDPSW.2010.5470753
Conference: Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
Source: IEEE Xplore

ABSTRACT Various studies have concluded that bus-based multiprocessor architectures outperform Network-on-Chip (NoC) architectures when the number of processors is relatively small. On the other hand, NoC architectures offer distinct performance advantages when the number of processors is large. This led to recent proposals for hybrid architectures where each node in a mesh-style packet-switched NoC architecture contains a bus-based subsystem with a small number of processors. Experimental results using select benchmarks demonstrated that these hybrid architectures offer superior performance when compared with purely bus-based or purely NoC-style architectures. Our studies indicate that while a hybrid architecture is preferable, the optimal number of processors on each bus subsystem varies based on the application. This number appears to vary between 1 and 8 depending on the communication requirements of the application. Further, various applications simultaneously executing on the same system require differing numbers of processors on each bus-based subsystem to minimize the overall throughput time. In this paper, we present a new reconfigurable NoC architecture which allows scalable bus-based multiprocessor subsystems on each node in the NoC. Following configuration, the system provides a multi-bus execution environment where each processor is connected to a bus and the bus-based subsystems communicate via routers connected in a mesh-style configuration. The system can be reconfigured to vary the number of bus subsystems and the number of processors on each subsystem. Each processor contains a Level 1 (L1) cache and each bus, connected to a router, has access to a Level 2 (L2) cache. The L2 caches distributed across the network together form a large virtual L2 that can be shared by all the processors in the system via the router network.
We present the architecture in detail, discuss a configuration algorithm, and report experimental results (using the NS2 and SIMICS simulators) on standard and synthetic benchmarks that indicate the performance advantages of the proposed architecture.
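
The trade-off described in the abstract can be illustrated with a small model. The following is a minimal sketch, not the paper's configuration algorithm: it assumes processors are assigned to buses in ID order, buses are placed row-major on the most nearly square mesh, and packets follow a minimal (Manhattan-distance) route. The class and method names (`HybridConfig`, `router_hops`, and so on) are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical model of one configuration: a mesh of bus-based subsystems."""
    total_processors: int
    processors_per_bus: int  # the abstract reports optima between 1 and 8

    def __post_init__(self):
        if self.total_processors < 1 or self.processors_per_bus < 1:
            raise ValueError("processor counts must be positive")
        if self.total_processors % self.processors_per_bus:
            raise ValueError("processors must divide evenly across buses")

    @property
    def num_buses(self) -> int:
        return self.total_processors // self.processors_per_bus

    @property
    def mesh_shape(self) -> tuple[int, int]:
        # Assumption: pick the most nearly square rows x cols grid
        # whose area equals the number of bus subsystems.
        rows = math.isqrt(self.num_buses)
        while self.num_buses % rows:
            rows -= 1
        return rows, self.num_buses // rows

    def node_of(self, processor_id: int) -> tuple[int, int]:
        # Assumption: consecutive processor IDs share a bus,
        # and buses are laid out row-major on the mesh.
        bus = processor_id // self.processors_per_bus
        _, cols = self.mesh_shape
        return divmod(bus, cols)

    def router_hops(self, src: int, dst: int) -> int:
        # Processors on the same bus need no router-to-router hops;
        # otherwise count links on a minimal path between their routers.
        (r1, c1), (r2, c2) = self.node_of(src), self.node_of(dst)
        return abs(r1 - r2) + abs(c1 - c2)


if __name__ == "__main__":
    for per_bus in (1, 4, 8):
        cfg = HybridConfig(total_processors=64, processors_per_bus=per_bus)
        print(per_bus, cfg.mesh_shape, cfg.router_hops(0, 63))
```

Under these assumptions, larger bus subsystems shrink the mesh and shorten the worst-case router path, at the cost of more processors contending for each bus. The model captures only hop counts; it says nothing about bus contention or cache behavior, which the paper evaluates by simulation.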

  • ABSTRACT: The mesh-connected processor array is a popular architecture in parallel processing. Extensive studies have been conducted on reconfiguration algorithms for processor arrays with faults, but little work has addressed parallel algorithms to accelerate the reconfiguration. This paper presents a fast algorithm to reconfigure two-dimensional mesh-connected processor arrays with faults. A traditional algorithm is accelerated using multithreading, without loss of harvest. The proposed algorithm reconfigures the processor array using a route-distance mechanism in order to avoid routing errors. Simulation results show that the proposed algorithm accelerates reconfiguration by nearly 15 times on a 64 × 64 array compared with the traditional algorithm cited in this paper.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • ABSTRACT: Proposed multicore architectures are usually evaluated using two types of benchmarks: application and synthetic. Application benchmarks use well-understood computations to generate well-defined workloads. In contrast, synthetic benchmarks are tunable to generate a range of custom workloads. Both classes are currently limited: existing application benchmarks are inflexible, and the options offered by synthetic benchmarks are too limited to generate a large variety of workload patterns. In this paper we propose novel workload generation methodologies that allow system developers to generate custom benchmarks for desired workload conditions for a variety of existing and emerging multicore architectures. Specifically, we describe two configurable workload generators, which we name ConWork and CompWork. ConWork is a configurable synthetic workload generator that produces artificial traffic among the processors and memories. CompWork is a configurable computational workload generator that can be used to specify vector and matrix applications so as to elicit the desired computational workloads among the processors. Together the two generators provide a number of options to generate workloads to evaluate a variety of performance metrics of existing and emerging multicore architectures, including bus-based SoCs, packet-switched NoCs, and hybrids.
    IEEE 24th International SoC Conference, SOCC 2011, Taipei, Taiwan, September 26-28, 2011; 01/2011
  • ABSTRACT: In this work, we present a NoC router architecture with five-port support that uses a dual-crossbar arrangement. The latency that arises from the dual-crossbar architecture is reduced by using a predominant routing algorithm. This arrangement is more efficient and reduces device utilization by about 10%.
    Emerging Trends in VLSI, Embedded System, Nano Electronics and Telecommunication System (ICEVENT), 2013 International Conference on; 01/2013