Conference Paper

A generic reconfigurable neural network architecture as a network on chip

Pennsylvania State Univ., USA
DOI: 10.1109/SOCC.2004.1362404
Conference: SOC Conference, 2004. Proceedings. IEEE International
Source: IEEE Xplore

ABSTRACT Neural networks are widely used in pattern recognition, security applications and data manipulation. We propose a hardware architecture for a generic neural network, using network on chip (NoC) interconnect. The proposed architecture allows for expandability, mapping of more than one logical unit onto a single physical unit, and dynamic reconfiguration based on application-specific demands. Simulation results show that this architecture has significant performance benefits over existing architectures.

  • ABSTRACT: Providing highly flexible connectivity is a major architectural challenge for hardware implementation of reconfigurable neural networks. We perform an analytical evaluation and comparison of different configurable interconnect architectures (mesh NoC, tree, shared bus and point-to-point) emulating variants of two neural network topologies (having full and random exponential configurable connectivity). We derive analytical expressions and asymptotic limits for performance (in terms of bandwidth) and cost (in terms of area and power) of the interconnect architectures considering three communication methods (unicast, multicast and broadcast). It is shown that multicast mesh NoC provides the highest performance/cost ratio and consequently it is the most suitable interconnect architecture for configurable neural network implementation. Simulation results successfully validate the analytical models and the asymptotic behavior of the network as a function of its size.
    Networks-on-Chip (NOCS), 2010 Fourth ACM/IEEE International Symposium on; 01/2010
  • ABSTRACT: Implementations of Artificial Neural Networks (ANNs) and their training often have to deal with a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism, whereas hardware realizations usually rely on optimizations that reduce the range of applicable network topologies, or attempt to increase processing efficiency by means of low-precision data representation. This paper describes a mixed approach to ANN training, based on a system-on-chip architecture on a reconfigurable device, where a coprocessor with a large number of parallel neural processing units is controlled by software running on an embedded processor. Software control and the use of floating-point arithmetic guarantee system generality, and replication of processing logic is used to exploit parallelism. Implementation of the proposed architecture on a low-cost Altera FPGA achieves a performance of 431 MCUPS (millions of connection updates per second).
    International Journal on Advances in Systems and Measurements. 06/2009; 2(1):44-55.
  • ABSTRACT: Spiking neural networks (SNNs) attempt to emulate information processing in the mammalian brain based on massively parallel arrays of neurons that communicate via spike events. SNNs offer the possibility to implement embedded neuromorphic circuits, with high parallelism and low power consumption compared to the traditional von Neumann computer paradigms. Nevertheless, the lack of modularity and poor connectivity shown by traditional neuron interconnect implementations based on shared bus topologies is prohibiting scalable hardware implementations of SNNs. This paper presents a novel hierarchical network-on-chip (H-NoC) architecture for SNN hardware, which aims to address the scalability issue by creating a modular array of clusters of neurons using a hierarchical structure of low and high-level routers. The proposed H-NoC architecture incorporates a spike traffic compression technique to exploit SNN traffic patterns and locality between neurons, thus reducing traffic overhead and improving throughput on the network. In addition, adaptive routing capabilities between clusters balance local and global traffic loads to sustain throughput under bursting activity. Analytical results show the scalability of the proposed H-NoC approach under different scenarios, while simulation and synthesis analysis using 65-nm CMOS technology demonstrate high-throughput, low-cost area, and power consumption per cluster, respectively.
    IEEE Transactions on Parallel and Distributed Systems 12/2013; 24(12):2451-2461. · 2.17 Impact Factor

