Article (publisher preview available)

A fast training method for memristor crossbar based multi-layer neural networks

Authors: Raqibul Hasan, Tarek M. Taha, Chris Yakopcic

Abstract and Figures

Memristor crossbar arrays carry out multiply–add operations, the dominant operation in neural network applications, in parallel in the analog domain. On-chip training of memristor neural network systems has the significant advantage of being able to work around device variability and faults. This paper presents a novel technique for on-chip training of multi-layer neural networks implemented using a single crossbar per layer and two memristors per synapse. Using two memristors per synapse provides double the synaptic weight precision compared to a design that uses only one memristor per synapse. The proposed system utilizes a novel variant of the back-propagation (BP) algorithm to reduce both circuit area and training time. During training, all the memristors in a crossbar are updated in parallel in four steps. We evaluated the training of the proposed system on several nonlinearly separable datasets through detailed SPICE simulations that take crossbar wire resistance and sneak paths into consideration. The proposed training algorithm learned the nonlinearly separable functions with only a slight loss in accuracy compared to training with the traditional BP algorithm.
A fast training method for memristor crossbar based multi-layer neural networks

Raqibul Hasan, Tarek M. Taha, Chris Yakopcic
Department of Electrical and Computer Engineering, University of Dayton, Dayton, OH, USA

Received: 26 January 2017 / Accepted: 25 September 2017 / Published online: 5 October 2017
© Springer Science+Business Media, LLC 2017
Keywords: Neural networks · Memristor crossbars · Training algorithm · On-chip training
1 Introduction

Reliability and power consumption are among the main obstacles to continued performance improvement in future computing systems. Embedded neural network based processing systems have significant advantages to offer, such as the ability to solve complex problems while consuming very little power and area [1, 2]. Memristors [3, 4] have received significant attention as a potential building block for neuromorphic systems [5, 6]. In these systems, memristors are arranged in a crossbar structure. Memristor devices in a crossbar can evaluate many multiply–add operations, the dominant operations in neural networks, in parallel in the analog domain very efficiently. This enables highly dense neuromorphic systems with great computational efficiency [1].
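As a rough illustration of this analog multiply–add (a sketch under idealized assumptions, not circuitry from the paper), the Python snippet below models an ideal crossbar in which each column current is the dot product of the input voltage vector with that column's conductances. Wire resistance and sneak paths, which the paper's SPICE simulations do capture, are ignored here, and all numerical values are illustrative.

```python
import numpy as np

# Idealized memristor crossbar: rows are driven by input voltages and
# columns are read as currents. Each crosspoint conductance G[i, j]
# contributes V[i] * G[i, j] to column j (Kirchhoff's current law),
# so one read evaluates a whole matrix-vector product in parallel.

def crossbar_column_currents(voltages, conductances):
    """voltages: shape (rows,); conductances: shape (rows, cols) in siemens."""
    return voltages @ conductances          # I_j = sum_i V_i * G[i, j]

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # illustrative conductance values
V = np.array([0.5, -0.2, 0.1, 0.3])        # input voltages in volts
print(crossbar_column_currents(V, G))      # three column currents in amperes
```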
An efficient training system is necessary for memristor neural network based systems. The two approaches to training are off-chip and on-chip training. The key benefit of off-chip training is that any training algorithm can be implemented in software and run on powerful computer clusters. However, memristor crossbars are difficult to model in software due to sneak paths and device variations [7, 8]. On-chip training has the advantage that it can take into account variations between devices and can use the full analog range of the device (as opposed to the set of discrete resistances that off-chip training typically targets).
This paper presents circuits for on-chip training of memristor crossbars that utilize two memristors per synapse. The use of two memristors per synapse has significant advantages over using a single memristor per synapse, and most recent memristor crossbar circuit fabrications for neuromorphic computing have used two memristors per synapse [9, 10]. Using two memristors per synapse provides double the synaptic weight precision compared to a design that uses only one memristor per synapse.
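A minimal software picture of the two-memristor synapse follows (my own sketch, with illustrative conductance bounds and scaling constant, not values from the paper): the effective weight is proportional to the difference of the two conductances, which yields signed weights and roughly twice the usable weight range of a single device.

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4                  # illustrative conductance bounds (siemens)

def weight(g_pos, g_neg, k=1.0):
    """Effective signed weight of one differential memristor pair."""
    return k * (g_pos - g_neg)

def layer_output(v_in, G_pos, G_neg):
    """Difference of the paired column currents gives a signed weighted sum."""
    return v_in @ G_pos - v_in @ G_neg

# A single device only reaches weights in [k*G_MIN, k*G_MAX]; the pair
# spans [-k*(G_MAX - G_MIN), +k*(G_MAX - G_MIN)], i.e., signed weights
# and roughly twice the usable range.
print(weight(G_MAX, G_MIN), weight(G_MIN, G_MAX))

rng = np.random.default_rng(1)
G_pos = rng.uniform(G_MIN, G_MAX, size=(4, 2))
G_neg = rng.uniform(G_MIN, G_MAX, size=(4, 2))
print(layer_output(np.array([0.3, -0.1, 0.2, 0.4]), G_pos, G_neg))
```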
Author contacts: Raqibul Hasan (corresponding author), hasanm1@udayton.edu; Tarek M. Taha, tarek.taha@udayton.edu; Chris Yakopcic, cyakopcic1@udayton.edu
Analog Integr Circ Sig Process (2017) 93:443–454, DOI 10.1007/s10470-017-1051-y
... First, most network training is based on offline data sets [4], [20], [23], [25], [38], whereas the proposed network is trained through online data sets. Second, some learning schemes are implemented off-chip, for example on a computer or FPGA. Some perform the calculations of each on-chip learning iteration on-chip [23], [25], [38], but the storage of programming signals, such as in buffers, computers, and FPGAs, was hard to integrate on-chip. To address this problem, we introduce the learning algorithm into the circuit design of the neural network. ...
Article
Full-text available
The analog circuit design of memristive neural networks (MNNs) that can automatically perform online learning algorithms is an open question. In this paper, a memristive self-learning neuron circuit implementing the online least mean squares (LMS) algorithm is designed. Building on the designed neuron circuit, circuit implementations of monolayer and multilayer neural networks are proposed. The proposed neural network automatically converges its output to the set target according to the input. Application-level validation of the circuits is done using pattern recognition and license plate detection. The performance of the designed MNN circuits and the effect of memristive variation are analyzed through PSpice simulations. The learning accuracy of the proposed circuit for license plate detection reaches 93%. Circuit simulation results reveal that the proposed MNN circuits can accelerate the training speed and tolerate variations of the memristors.
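For reference, the online LMS rule that the cited circuit realizes in analog hardware can be restated in a few lines of Python; the function name, learning rate, and toy data below are illustrative assumptions, not details of the cited design.

```python
import numpy as np

def lms_train(x_stream, d_stream, n_inputs, eta=0.05):
    """Online least mean squares: w <- w + eta * (d - w.x) * x per sample."""
    w = np.zeros(n_inputs)
    for x, d in zip(x_stream, d_stream):
        y = np.dot(w, x)          # neuron output
        e = d - y                 # instantaneous error
        w += eta * e * x          # LMS weight update
    return w

# Learn a 2-input linear target y = 0.7*x0 - 0.3*x1 from streamed samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 2))
D = X @ np.array([0.7, -0.3])
print(lms_train(X, D, n_inputs=2))   # approaches [0.7, -0.3]
```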
... Then, the neural network circuit based on memristors is used to realize character recognition. In [22][23][24], differential input signals were applied to two rows of a memristor crossbar array. The summed output voltage was expressed in terms of the difference between the resistance values of the two memristors, thus obtaining positive, zero, and negative weights. ...
Article
Full-text available
The memristor-based neural network configuration is a promising approach to realizing artificial neural networks (ANNs) at the hardware level. Memristors can effectively emulate the strength of synaptic connections between neurons in neural networks due to their diverse significant characteristics such as nonvolatility, nanoscale dimensions, and variable conductance. This work presents a new synaptic circuit based on memristors and Complementary Metal Oxide Semiconductor (CMOS) technology, which can realize the adjustment of positive, negative, and zero synaptic weights using only one control signal. The relationship between synaptic weights and the duration of control signals is also explained in detail. Accordingly, Widrow–Hoff algorithm-based memristive neural network (MNN) circuits are proposed to recognize three types of character images. The functionality of the proposed configurations is verified using SPICE simulation.
... PIM for DL training. Another body of work leverages PIM techniques to accelerate DL training [196], [247]–[258]. These works mainly utilize the analog computation capabilities of non-volatile memory (NVM) technologies to implement training of deep neural networks [247]–[250], [252], [254], [255], [257]. ...
Preprint
Full-text available
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is 27× faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and 1.34× faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is 2.8× and 3.2× faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.
... Emotional learning is context-dependent: the context can influence some classical emotional learning processes [23,24], such as habituation, acquisition and extinction, which is demonstrated by the simulations of the proposed memristive circuit. Besides, considering that most memristive neural networks are usually used to deal with a single problem or task [25][26][27][28][29], we propose a multi-input multi-output circuit of the context-dependent emotional learning network and apply it to multi-task classification. In the implementation of the circuit, the multiple tasks are trained in parallel based on contextual information, which breaks with the traditional idea of divide-and-conquer. ...
Article
Full-text available
Emotional intelligence plays an important role in artificial intelligence. The brain circuitry of emotion mainly includes the prefrontal cortex, the amygdala, the hippocampus, and other regions. Many brain emotional learning (BEL) models have been proposed in recent years, but the existing BEL models fail to consider contextual information in practical applications and do not discuss the corresponding circuit implementation. In this article, a context-dependent emotional learning network (CD-ELN) and its memristive circuit implementation are introduced. The added context-dependent module is used to process the contextual information, which makes the network context dependent when receiving the same input signals. For circuit implementation, the memristive circuit design mainly contains the amygdala module and the orbitofrontal cortex module, which imitate the emotion learning process in the brain. Besides, a multi-input multi-output memristive circuit of the context-dependent emotional network is applied to multi-task classification. PSPICE simulation results verified the adaptability and flexibility of the CD-ELN.
Conference Paper
We present a CMOS-Memristor hybrid analog design of a neuromorphic crossbar array with integrated inference and training. Each crosspoint on the crossbar includes a memristor to store synaptic weights. Integrate-and-fire (IF) neurons are designed using CMOS transistors and placed along the rows and columns of the crossbar. Learning of synaptic weights is facilitated using the trace-driven spike timing-dependent plasticity (TrSTDP) rule, where the trace (i.e., difference in spike timings) collected during forward propagation (i.e., inference) is used to compute weight updates. The key novelty of our design is an interface circuit that captures the trace during inference and autonomously controls the learning circuit (designed using memristor) to generate the appropriate voltage pulse width necessary to update the synaptic weight, without requiring any software/system support. Our interface circuit consists of a voltage-to-time converter (VTC), adder, and a voltage amplifier, all of which are designed using CMOS transistors. We implement the proposed design using Synopsys HSPICE at 90nm technology node and thoroughly evaluate the accuracy, latency, area, and power overheads of the interface circuit.
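A software analogy of the trace-driven STDP rule described above may help; the exponential traces, time constants, and learning rates below are assumptions for illustration, not the cited circuit's values.

```python
import numpy as np

def trstdp(pre_spikes, post_spikes, steps, dt=1e-3,
           tau=20e-3, a_plus=0.01, a_minus=0.012, w0=0.5):
    """Trace-driven STDP: pre/post spike trains are boolean arrays of length `steps`."""
    w, x_pre, x_post = w0, 0.0, 0.0
    decay = np.exp(-dt / tau)                # exponential trace decay per step
    for t in range(steps):
        x_pre *= decay
        x_post *= decay
        if pre_spikes[t]:
            x_pre += 1.0
            w -= a_minus * x_post            # post-before-pre ordering: depress
        if post_spikes[t]:
            x_post += 1.0
            w += a_plus * x_pre              # pre-before-post ordering: potentiate
        w = min(max(w, 0.0), 1.0)            # keep weight inside the device range
    return w

steps = 200
pre = np.zeros(steps, dtype=bool);  pre[50] = True
post = np.zeros(steps, dtype=bool); post[55] = True   # post fires 5 ms after pre
print(trstdp(pre, post, steps))              # slightly above the initial 0.5
```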
Article
In this brief, an efficient training method for a memristor-based array (crossbar) with one-transistor one-memristor (1T1M) synapses is proposed, which enables parallel updating of memristor-based arrays trained by stochastic gradient descent within two steps. The Voltage ThrEshold Adaptive Memristor (VTEAM) model is utilized to describe memristor characteristics in the simulations. On this basis, a circuit parameter optimization method compensating for the asymmetric and nonlinear weight updates is provided for better training results. The effectiveness of the proposed training method is evaluated on OR and AND functions and a digit recognition task. Simulation results demonstrate the robustness of the proposed training method to electrical noise and imperfections of the memristors.
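One way to picture a two-step parallel array update of the kind described above (a hedged sketch; the cited brief's actual pulse scheme and VTEAM-based compensation are not modeled here): split the full stochastic-gradient update matrix by sign, then apply all potentiating updates in one phase and all depressing updates in a second phase.

```python
import numpy as np

def two_phase_update(G, delta):
    """Apply a full update matrix `delta` to conductances `G` in two parallel phases."""
    inc = np.clip(delta, 0, None)    # phase 1: all potentiating updates at once
    dec = np.clip(delta, None, 0)    # phase 2: all depressing updates at once
    return G + inc + dec

rng = np.random.default_rng(2)
G = rng.uniform(1e-6, 1e-4, size=(3, 4))
x = rng.uniform(-1, 1, 3)                 # layer input
err = rng.uniform(-1, 1, 4)               # back-propagated error at the outputs
delta = -0.01 * np.outer(x, err)          # SGD outer-product update
print(np.allclose(two_phase_update(G, delta), G + delta))   # True
```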
Chapter
Neural networks (NNs) are utilized in a wide range of applications such as image classification, speech recognition, and forecasting events. The energy consumption associated with the use of these networks should be reduced as much as possible. There are different reasons behind the need for this reduction, such as the limited battery capacity of portable devices as well as the air-cooling challenge in data centers. An effective approach for reducing the energy consumption of NNs as well as increasing the computation speed is to employ memristor crossbars to perform the matrix-vector multiplication (MVM) operations required for generating the weighted sum of the inputs. A complementary approach is to use inverters for the implementation of the activation functions, consuming considerably less power compared to the conventional case of using operational amplifiers (Op-Amps). In this chapter, recent advances in the hardware implementation of inverter-based memristive NNs (IM-NNs) are reviewed. This includes the overall structure of IM-NNs and their input/output interfaces to the digital domain. The interfaces, which consist of ultralow-power ADC and DAC circuits, comprise inverters and memristors too. Furthermore, a comprehensive study of offline/online training methods for these NNs is provided. In the study, the loading effect of the memristive crossbars on the voltage transfer characteristic (VTC) of the inverters is discussed. The effects of weight (and activation function) variations, which are in turn caused by variations of memristor conductance (and transistor characteristics), on the accuracy of these NNs are also investigated, and state-of-the-art approaches to overcome these issues are presented. Keywords: Memristive neural networks · Non-idealities of circuit elements · Mathematical analysis · Offline/online training · Input/output interface · Variation mitigation training techniques
Article
This letter presents a pulse-width modulator (PWM) using a novel high-speed comparator, whose duty cycle changes with respect to the variation in the input reference voltage. Measurements show a variation of 44% in the duty cycle of the PWM wave for an input reference voltage variation of 2 V. The circuit shows an input-referred offset voltage of 40 mV, an input rms noise voltage of 4 mV, and a power consumption of 56 μW. Further, the circuit is able to discriminate a minimum differential input voltage of 40 mV. Reliable operation was demonstrated with a maximum input signal frequency of 10 kHz and a clock frequency of up to 1 MHz at a supply voltage of 4 V under normal ambient conditions. The circuit was fabricated on a flexible 30-μm-thick polyamide substrate using an a-IGZO TFT technology.
Article
Full-text available
Artificial neural networks (ANNs) are finding increasing use as tools to model and solve problems in almost every discipline in today's world. The successful implementation of ANNs in software, particularly in the fields of deep learning and machine learning, has sparked interest in designing hardware architectures that are custom-made to implement ANNs. Several categories of ANNs exist. The two-layer bidirectional associative memory (BAM) is a particular class of hetero-associative memory networks that is extremely efficient and exhibits good performance for storing and retrieving pattern pairs. The memristor is a novel hardware element that is well suited to modelling neural synapses because it exhibits tunable resistance. In this work, in order to create a device that can perform Braille–Latin conversion, we have implemented a circuit realization of a BAM neural network. The implemented hardware BAM uses a memristor crossbar array for modelling neural synapses and a neuron circuit comprising an I-to-V converter (resistor), voltage comparator, D flip-flop, and inverter. The efficiency of the implemented hardware BAM was tested initially using 2 × 2 and 3 × 3 patterns. Upon successfully verifying the ability of the implemented BAM to store and retrieve simple pattern pairs, it was trained for a pattern-recognition application, namely mapping Braille alphabets to their Latin counterparts and vice versa. The performance of the implemented BAM network is robust even with the introduction of noise. The application can recognize the input patterns with accuracies of 100% in either direction when tested with up to 30% noise.
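The stored-pattern behavior of a BAM can be restated compactly in software (a generic Kosko-style bipolar BAM with made-up patterns, not the cited memristor hardware): the weight matrix is a sum of outer products of the stored pairs, and recall bounces a noisy pattern between the two layers until it settles.

```python
import numpy as np

def bam_train(pairs):
    """Weight matrix W = sum of outer products x y^T over stored bipolar pairs."""
    return sum(np.outer(x, y) for x, y in pairs)

def bam_recall(W, x, iters=10):
    """Retrieve the pattern pair associated with x by bouncing between the layers."""
    sign = lambda v: np.where(v >= 0, 1, -1)
    for _ in range(iters):
        y = sign(W.T @ x)        # forward pass to the second layer
        x = sign(W @ y)          # backward pass to the first layer
    return x, y

pairs = [(np.array([1, -1, 1, -1]), np.array([1, 1, -1])),
         (np.array([-1, -1, 1, 1]), np.array([-1, 1, 1]))]
W = bam_train(pairs)
noisy = np.array([1, 1, 1, -1])           # first pattern with one flipped bit
print(bam_recall(W, noisy))               # recovers ([1,-1,1,-1], [1,1,-1])
```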
Article
Full-text available
We investigated batch and stochastic Manhattan Rule algorithms for training multilayer perceptron classifiers implemented with memristive crossbar circuits. In Manhattan Rule training, the weights are updated using only the sign information of the classical backpropagation algorithm. The main advantage of the Manhattan Rule is its simplicity, which leads to more compact hardware implementation and faster training time. Additionally, in the case of stochastic training, the Manhattan Rule allows all weight updates to be performed in parallel, which further speeds up the training procedure. The tradeoff for simplicity is slightly worse classification performance. For example, simulation results showed that classification fidelity on the Proben1 benchmark for the memristor-based implementation trained with the batch Manhattan Rule was comparable to that of the classical backpropagation algorithm, and about 2.8 percent worse than the best reported results.
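A minimal sketch of the Manhattan Rule idea mentioned above, for a single logistic neuron on a toy linearly separable task (the learning rate, data, and network size are illustrative assumptions): each backpropagation gradient is reduced to its sign, so every weight moves by a fixed step per update.

```python
import numpy as np

def manhattan_step(w, x, target, eta=0.05):
    """Backprop gradient for one logistic neuron, then keep only its sign."""
    y = 1.0 / (1.0 + np.exp(-np.dot(w, x)))      # sigmoid output
    grad = (y - target) * y * (1.0 - y) * x      # dE/dw for squared error
    return w - eta * np.sign(grad)               # fixed-size update per weight

rng = np.random.default_rng(3)
w = rng.normal(0, 0.1, 3)
data = [(np.array([x0, x1, 1.0]), float(x0 > x1))      # bias folded into the input
        for x0, x1 in rng.uniform(-1, 1, size=(200, 2))]
for epoch in range(20):
    for x, t in data:
        w = manhattan_step(w, x, t)
acc = np.mean([(1 / (1 + np.exp(-w @ x)) > 0.5) == bool(t) for x, t in data])
print(w, acc)    # weights separating x0 > x1, with high training accuracy
```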
Conference Paper
Full-text available
The artificial neural network (ANN) is among the most widely used methods in data processing applications. The memristor-based neural network further provides a power-efficient hardware realization of ANNs. The training phase is the critical operation of a memristor-based neural network. However, the traditional training method for memristor-based neural networks is time consuming and energy inefficient: users have to first work out the parameters of the memristors in a digital computing system and then tune each memristor to the corresponding state. In this work, we introduce a mixed-signal training acceleration framework, which realizes self-training of the memristor-based neural network. We first modify the original stochastic gradient descent algorithm by approximating calculations and designing an alternative computing method. We then propose a mixed-signal acceleration architecture for the modified training algorithm by equipping the original memristor-based neural network architecture with the copy crossbar technique, weight update units, sign calculation units, and other assistant units. The experiment on the MNIST database demonstrates that the proposed mixed-signal acceleration is 3 orders of magnitude faster and 4 orders of magnitude more energy efficient than the CPU implementation counterpart, at the cost of a slight decrease in recognition accuracy (< 5%).
Conference Paper
Full-text available
As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are becoming critical to improving the performance and energy efficiency of general-purpose processors. We propose a solution, spanning circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate "approximable" code, i.e., code that can tolerate imprecise execution. We utilize an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a hardware/software interface, neural-network training techniques, and compiler support. Analog neural acceleration provides whole-application speedup of 3.7× and energy savings of 6.3×, with quality loss less than 10% for all except one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of approximation-tolerant, emerging applications including financial analysis, signal processing, robotics, 3D gaming, compression, and image processing.
Article
Full-text available
Learning in multilayer neural networks (MNNs) relies on continuous updating of large matrices of synaptic weights by local rules. Such locality can be exploited for massive parallelism when implementing MNNs in hardware. However, these update rules require a multiply and accumulate operation for each synaptic weight, which is challenging to implement compactly using CMOS. In this paper, a method for performing these update operations simultaneously (incremental outer products) using memristor-based arrays is proposed. The method is based on the fact that, approximately, given a voltage pulse, the conductivity of a memristor will increment proportionally to the pulse duration multiplied by the pulse magnitude if the increment is sufficiently small. The proposed method uses a synaptic circuit composed of a small number of components per synapse: one memristor and two CMOS transistors. This circuit is expected to consume between 2% and 8% of the area and static power of previous CMOS-only hardware alternatives. Such a circuit can compactly implement hardware MNNs trainable by scalable algorithms based on online gradient descent (e.g., backpropagation). The utility and robustness of the proposed memristor-based circuit are demonstrated on standard supervised learning tasks.
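The incremental outer-product update described above can be mimicked numerically (a hedged sketch with made-up constants, not the cited synaptic circuit): if row i is pulsed for a duration proportional to the layer input x_i while column j carries a magnitude proportional to the error δ_j, each conductance changes by approximately c·x_i·δ_j, so the whole outer-product update lands on the array at once.

```python
import numpy as np

def outer_product_update(G, x, delta, c=1e-6):
    """Small-signal approximation: dG[i, j] ≈ c * duration_i * magnitude_j."""
    durations = x            # row pulse widths encode the layer inputs
    magnitudes = delta       # column pulse amplitudes encode the errors
    return G + c * np.outer(durations, magnitudes)

rng = np.random.default_rng(4)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))
x = rng.uniform(0, 1, 4)          # presynaptic activations
delta = rng.uniform(-1, 1, 3)     # backpropagated errors
G_new = outer_product_update(G, x, delta)
print(np.allclose(G_new - G, 1e-6 * np.outer(x, delta)))   # True
```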
Article
Motivated by energy constraints, future heterogeneous multi-cores may contain a variety of accelerators, each targeting a subset of the application spectrum. Beyond energy, the growing number of faults steers accelerator research towards fault-tolerant accelerators. In this article, we investigate a fault-tolerant and energy-efficient accelerator for signal processing applications. We depart from traditional designs by introducing an accelerator which relies on unary coding, a concept which is well adapted to the continuous real-world inputs of signal processing applications. Unary coding enables a number of atypical micro-architecture choices which bring down area cost and energy; moreover, unary coding provides graceful output degradation as the number of transient faults increases. We introduce a configurable hybrid digital/analog micro-architecture capable of implementing a broad set of signal processing applications based on these concepts, together with a back-end optimizer which takes advantage of the special nature of these applications. For a set of five signal applications, we explore the different design tradeoffs and obtain an accelerator with an area cost of 1.63 mm². On average, this accelerator requires only 2.3% of the energy of an Atom-like core to implement similar tasks. We then evaluate the accelerator resilience to transient faults, and its ability to trade accuracy for energy savings.
Conference Paper
This paper describes memristor-based neuromorphic circuits for nonlinearly separable pattern recognition. We initially describe a memristor-based neuron circuit and then show how multilayer neural networks can be constructed from this neuron circuit. By applying neural network learning algorithms to these circuits, we demonstrate the learning of both linearly and nonlinearly separable logic functions. The simulations are carried out in SPICE using a detailed memristor model so that the crossbar is simulated as accurately as possible. We also examine the system-level performance of multicore memristor crossbar based neuromorphic processors. We consider the impact of on-chip routing, calculate the chip areas, and evaluate the timing of the systems in the study. The results indicate that such architectures can enable over 300,000 times greater energy efficiency than traditional high-performance computing architectures when processing large neural networks.