Conference Paper

A reconfigurable architecture for multicore systems

DOI: 10.1109/IPDPSW.2010.5470753
Conference: Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
Source: IEEE Xplore


Various studies have concluded that bus-based multiprocessor architectures outperform Network-on-Chip (NoC) architectures when the number of processors is relatively small. On the other hand, NoC architectures offer distinct performance advantages when the number of processors is large. This led to recent proposals for hybrid architectures where each node in a mesh-style packet-switched NoC architecture contains a bus-based subsystem with a small number of processors. Experimental results using select benchmarks demonstrated that these hybrid architectures offer superior performance when compared with purely bus-based or purely NoC-style architectures. Our studies indicate that while a hybrid architecture is preferable, the optimal number of processors on each bus subsystem varies based on the application. This number appears to vary between 1 and 8 depending on the communication requirements of the application. Further, various applications simultaneously executing on the same system require differing numbers of processors on each bus-based subsystem to minimize the overall throughput time. In this paper, we present a new reconfigurable NoC architecture which allows scalable bus-based multiprocessor subsystems on each node in the NoC. Following configuration, the system provides a multi-bus execution environment where each processor is connected to a bus and the bus-based subsystems communicate via routers connected in a mesh-style configuration. The system can be reconfigured to vary the number of bus subsystems and the number of processors on each subsystem. Each processor contains a Level 1 (L1) cache and each bus, connected to a router, has access to a Level 2 (L2) cache. The L2 caches distributed across the network together form a large virtual L2 that can be shared by all the processors in the system via the router network.
We present the architecture in detail, discuss a configuration algorithm, and discuss experimental results (using the NS2 and SIMICS simulators) on standard and synthetic benchmarks indicating the performance advantages of the proposed architecture.
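
As a rough illustration of the organization described in the abstract, the sketch below groups processors into bus subsystems, attaches each subsystem to a router in a mesh, and picks which L2 slice of the virtual L2 would hold a given address. It is not the paper's configuration algorithm. The names `BusSubsystem`, `configure`, `l2_home`, and `mesh_hops` are hypothetical, and equally sized subsystems, a near-square mesh, cache-line interleaving of the L2 slices, and XY routing are all assumptions made only for this example.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class BusSubsystem:
    router_xy: tuple[int, int]   # position of the attached router in the mesh
    processors: tuple[int, ...]  # processor ids sharing this bus (one L1 each)


def configure(total_processors: int, per_bus: int) -> list[BusSubsystem]:
    """Group processors into bus subsystems placed on a near-square mesh."""
    if not 1 <= per_bus <= 8:
        raise ValueError("the abstract reports bus sizes between 1 and 8")
    if total_processors % per_bus != 0:
        raise ValueError("this sketch assumes equally sized bus subsystems")
    n_buses = total_processors // per_bus
    cols = math.ceil(math.sqrt(n_buses))  # assumed mesh width
    return [
        BusSubsystem(
            router_xy=(b % cols, b // cols),
            processors=tuple(range(b * per_bus, (b + 1) * per_bus)),
        )
        for b in range(n_buses)
    ]


def l2_home(address: int, n_buses: int, line_bytes: int = 64) -> int:
    """Index of the bus whose L2 slice holds the address (assumed interleaving)."""
    return (address // line_bytes) % n_buses


def mesh_hops(a: BusSubsystem, b: BusSubsystem) -> int:
    """Router-to-router hop count under assumed XY (Manhattan) routing."""
    return abs(a.router_xy[0] - b.router_xy[0]) + abs(a.router_xy[1] - b.router_xy[1])
```

Calling `configure(16, 4)` and then `configure(16, 1)` models reconfiguring the same 16 processors from four 4-processor buses into a pure NoC with one processor per router, the two ends of the range the abstract discusses.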

  • "This software is used in the academic and research fields for exploration into the performance of many aspects of networking, including NoC [8], [9]. Since the software is open source, new code was written to extend the functionality."
    ABSTRACT: Polymorphic processors attempt to merge the benefits of general purpose processors with performance gains from reconfigurable elements. In this paper, we present a novel polymorphic processor architecture. The integration of a network-on-a-chip (NoC) architecture as a replacement for the processor datapath creates unique requirements for the NoC design. We explore multiple NoC topologies as potential candidates for the creation of a polymorphic processor. A simulator based around the network simulator 2 (ns-2) software platform is created. Standard embedded processor benchmark programs are simulated to explore critical parameters and NoC design decisions impacting the performance of the polymorphic processor.
    Conference Paper · Jun 2011
  • ABSTRACT: Proposed multicore architectures are usually evaluated using two types of benchmarks: application and synthetic. Application benchmarks use well-understood computations to generate well-defined workloads. In contrast, synthetic benchmarks are tunable to generate a range of custom workloads. Both classes are currently limited: existing application benchmarks are inflexible, and the options offered by synthetic benchmarks are too narrow to generate a large variety of workload patterns. In this paper we propose novel workload generation methodologies that allow system developers to generate custom benchmarks for desired workload conditions for a variety of existing and emerging multicore architectures. Specifically, we describe two configurable workload generators, which we name ConWork and CompWork. ConWork is a configurable synthetic workload generator that produces artificial traffic among the processors and memories. CompWork is a configurable computational workload generator, which can be used to specify vector and matrix applications so as to elicit the desired computational workloads among the processors. Together the two generators provide a number of options to generate workloads to evaluate a variety of performance metrics of existing and emerging multicore architectures, including bus-based SoCs, packet-switching NoCs, and hybrids.
    Conference Paper · Sep 2011
  • ABSTRACT: The mesh-connected processor array is a popular architecture used in parallel processing. Extensive studies have been conducted on reconfiguration algorithms for processor arrays with faults, but little work has addressed parallel algorithms to accelerate the reconfiguration. This paper presents a fast algorithm to reconfigure two-dimensional mesh-connected processor arrays with faults. A traditional algorithm is successfully accelerated using multithreading, without loss of harvest. The proposed algorithm reconfigures the processor array using the mechanism of route distance in order to avoid routing errors. Simulation results show that the proposed algorithm can accelerate the reconfiguration by nearly 15 times on a 64 × 64 array in comparison to the traditional algorithm cited in this paper.
    Conference Paper · Dec 2012