
Abstract and Figures

Reinforcement learning algorithms that use deep neural networks are a promising approach for the development of machines that can acquire knowledge and solve problems without human input or supervision. At present, however, these algorithms are implemented in software running on relatively standard complementary metal–oxide–semiconductor digital platforms, where performance will be constrained by the limits of Moore’s law and von Neumann architecture. Here, we report an experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for our hybrid analogue–digital platform. To illustrate the capabilities of our approach in robust in situ training without the need for a model, we performed two classic control problems: the cart–pole and mountain car simulations. We also show that, compared with conventional digital systems in real-world reinforcement learning tasks, our hybrid analogue–digital computing system has the potential to achieve a significant boost in speed and energy efficiency. A reinforcement learning algorithm can be implemented on a hybrid analogue–digital platform based on memristive arrays for parallel and energy-efficient in situ training.
In-memristor reinforcement learning in the mountain car environment a, Schematic illustration of the mountain car environment. The car starts at the bottom of the valley. Its engine is too weak to overcome the gravitational force and reach the goal marked by the flag. At each discrete time step the learning agent can push left, push right, or apply no push, with the aim of reaching the target in the shortest time. b, Experimental curve of the hybrid analogue–digital system and digitally simulated curves with 0, 4 and 8 µS programming noise, tracking the number of rewards per epoch. The experimental memristor performance is similar to the simulation with 4 µS programming noise, following the same trend as the noise-free double-precision floating-point simulation. The simulated curve with 8 µS programming noise shows relatively poor performance, with more negative rewards in the early epochs, as illustrated in the zoomed inset. c, The time evolution of the car position x. The agent quickly discovered the technique of driving the car back and forth, as indicated by the oscillation patterns. d, Left: value map of all possible input states over the two-dimensional input domain, given by the maximum Q-value associated with the allowed actions of a particular input state. Notably, the bottom right corner has the highest action value, since a positive position and velocity guarantee success. Similarly, the upper left corner is a local maximum, because the altitude gives the car a large potential energy with which to accelerate in the subsequent rightward motion. Right: learned map of actions for all possible input states over the two-dimensional input domain. The agent exerts a left (right) push if the car moves left (right), which helps the car to quickly build up momentum.
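The mountain car behaviour described in this caption can be reproduced with a minimal tabular Q-learning sketch. This is plain Python/NumPy, not the paper's memristive deep-Q network; the dynamics constants follow the classic formulation of the environment, and the discretization and hyperparameters are illustrative choices:

```python
import numpy as np

# Minimal tabular Q-learning on the classic mountain car dynamics
# (constants from the standard formulation of the environment); a plain
# software sketch, not the paper's memristive deep-Q network.

def step(pos, vel, action):
    """One time step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    vel = np.clip(vel + 0.001 * (action - 1) - 0.0025 * np.cos(3 * pos),
                  -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos <= -1.2:                      # inelastic collision with the wall
        vel = 0.0
    return pos, vel, pos >= 0.5          # done once the flag at x = 0.5 is reached

def discretize(pos, vel, bins=20):
    i = int((pos + 1.2) / 1.8 * (bins - 1))
    j = int((vel + 0.07) / 0.14 * (bins - 1))
    return i, j

def train(episodes=200, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((20, 20, 3))            # Q(position bin, velocity bin, action)
    for _ in range(episodes):
        pos, vel = rng.uniform(-0.6, -0.4), 0.0
        for _ in range(500):             # cap the episode length
            s = discretize(pos, vel)
            a = int(rng.integers(3)) if rng.random() < eps else int(np.argmax(q[s]))
            pos, vel, done = step(pos, vel, a)
            s2 = discretize(pos, vel)
            target = -1.0 + (0.0 if done else gamma * np.max(q[s2]))
            q[s + (a,)] += alpha * (target - q[s + (a,)])
            if done:
                break
    return q

q = train()
```

The learned table q plays the role of panel d: taking np.max over the action axis gives the value map, and np.argmax gives the action map.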
https://doi.org/10.1038/s41928-019-0221-6
1Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, USA. 2Binghamton University, Binghamton, NY, USA.
3Hewlett Packard Labs, Hewlett Packard Enterprise, Palo Alto, CA, USA. 4Air Force Research Laboratory, Information Directorate, Rome, NY, USA.
5College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA. 6Department of Electrical Engineering and Computer
Science, Syracuse University, Syracuse, NY, USA. 7Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.
8These authors contributed equally: Zhongrui Wang, Can Li. *e-mail: qxia@umass.edu; jjyang@umass.edu
A primary goal of machine learning is to equip machines with behaviours that optimize their control over different environments. Unlike supervised or unsupervised learning, reinforcement learning, which is inspired by cognitive neuroscience, provides a way to formulate a decision-making process that is learned without a supervisor providing labelled training examples. It instead uses less informative evaluations in the form of ‘rewards’, and learning is directed towards maximizing the amount of reward received over time1.
Developments in deep neural networks have advanced reinforcement learning2,3, as exemplified by the recent achievements of AlphaGo4,5. However, the first generation of AlphaGo (AlphaGo Fan) ran on 1,920 central processing units (CPUs) and 280 graphics processing units (GPUs), consuming a peak power of half a megawatt. Application-specific integrated circuits (ASICs) such as DaDianNao6, the tensor processing unit (TPU)7 and Eyeriss8 offer potential enhancements in speed and reductions in power consumption. However, since the majority of neural network parameters (for example, weights) are still stored in dynamic random-access memory (DRAM), moving data back and forth between the DRAM and the caches (for example, static random-access memory, SRAM) of processing units increases both latency and power consumption. The growing challenge of this communication bottleneck, together with the saturation of Moore’s law, limits the speed and energy efficiency of complementary metal–oxide–semiconductor (CMOS)-based reinforcement learning in the era of big data.
A processing-in-memory architecture could provide a highly parallel and energy-efficient approach to address these challenges, relying on dense, power-efficient, fast and scalable building blocks such as ionic transistors9, phase-change memory10–13 and redox memristors14–26. A key advantage of networks based on these emerging devices is ‘compute by physics’, where vector–matrix multiplications are performed intrinsically via Ohm’s law (for multiplication) and Kirchhoff’s current law (for summation)27,28. Such a network computes exactly where the data are stored and thus avoids the communication bottleneck. Furthermore, it is able to compute in parallel and in the analogue domain. Applications including signal processing29,30, scientific computing31,32, hardware security33 and neuromorphic computing22,28,34–41 have recently been demonstrated (see Supplementary Table 1).
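The ‘compute by physics’ principle above can be checked numerically: applying Ohm’s law per device and Kirchhoff’s current law per column reproduces an ordinary matrix product in a single step. The conductance and voltage ranges below are illustrative, not measured values from the article:

```python
import numpy as np

# One-step analogue MVM: applying a voltage vector V to the rows of a
# conductance matrix G gives per-device currents I_ij = G_ij * V_i (Ohm's
# law); each column wire sums its currents (Kirchhoff's current law), so
# the column currents equal G^T @ V. Ranges below are illustrative only.

rng = np.random.default_rng(0)
G = rng.uniform(10e-6, 100e-6, size=(128, 64))   # conductances (siemens)
V = rng.uniform(-0.2, 0.2, size=128)             # read voltages (volts)

I_physics = (G * V[:, None]).sum(axis=0)   # Ohm's law, then KCL per column
I_digital = G.T @ V                        # the equivalent digital product
```

In a physical array the two lines labelled I_physics happen simultaneously in the analogue domain, which is the source of the parallelism discussed above.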
Memristor arrays have shown potential speed and energy enhancements for in situ supervised learning (for spatial/temporal pattern classification)22,34–37,39,41,42 and unsupervised learning (for data clustering)40,43,44. Although a memristor crossbar implementation of reinforcement learning could significantly benefit the reward predictions based on forward passes of the deep-Q network, which repeatedly replays historical observations from the ‘experience’ to optimize decision-making in unknown environments3, it has yet to be demonstrated owing to a lack of both the hardware and a corresponding algorithm.
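The experience replay referred to above can be sketched as follows. The buffer sizes and the linear Q-function are illustrative stand-ins, not the paper's three-layer memristive network:

```python
import random
from collections import deque
import numpy as np

# Experience replay as used in deep-Q learning: transitions are stored in a
# ring buffer and replayed in random minibatches, so each observation is
# reused many times. The linear Q-function below is an illustrative
# stand-in for the paper's three-layer memristive network.

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.buf), batch_size)
        s, a, r, s2, done = map(np.array, zip(*batch))
        return s, a, r, s2, done

def q_targets(r, s2, done, W, gamma=0.99):
    """Bellman targets r + gamma * max_a' Q(s', a'); zero beyond terminal."""
    q_next = s2 @ W                       # Q(s', .) for a linear model
    return r + gamma * q_next.max(axis=1) * (1.0 - done)

# Fill the buffer with random 4-dimensional transitions and form one batch.
rng = np.random.default_rng(0)
buf = ReplayBuffer()
for _ in range(100):
    buf.push(rng.normal(size=4), int(rng.integers(3)), -1.0,
             rng.normal(size=4), False)
W = rng.normal(size=(4, 3))               # weights of the stand-in Q-model
s, a, r, s2, done = buf.sample(32)
targets = q_targets(r, s2, done, W)
```

In the memristive setting, the forward pass s2 @ W is exactly the operation a crossbar accelerates, which is why replay-based deep-Q learning maps naturally onto the hardware.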
In this Article, we report an experimental demonstration of reinforcement learning in analogue memristor arrays. Parallel and energy-efficient in situ reinforcement learning with a three-layer fully connected memristive deep-Q network is implemented on a 128 × 64 1-transistor 1-memristor (1T1R) array. We show that the learning can be applied generally to classic reinforcement learning environments, including the cart–pole45 and mountain car46 problems. Our results indicate that in-memristor reinforcement learning can achieve a 4 to 5 bit representation capability per weight using a two-pulse write-without-verification scheme to program the 1T1R array, with potential improvements in computing speed and energy efficiency (see Supplementary Note 1).
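A back-of-envelope check of the quoted 4 to 5 bit capability per weight: the 4 µS figure matches the programming-noise level discussed with the mountain car results, while the 100 µS conductance window is an assumed, illustrative value, not one taken from the article:

```python
import numpy as np

# Back-of-envelope precision estimate. The 4 uS figure matches the
# programming noise discussed in the article; the 100 uS conductance
# window is an assumed, illustrative value.

window_uS = 100.0                  # assumed usable conductance range
noise_uS = 4.0                     # write-without-verification noise
levels = window_uS / noise_uS      # ~25 reliably distinguishable levels
bits = np.log2(levels)             # ~4.6 bits per device
```

Under these assumptions the array resolves about 4.6 bits per device, consistent with the 4 to 5 bit range stated above.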
Reinforcement learning with analogue memristor arrays
Zhongrui Wang1,8, Can Li1,8, Wenhao Song1, Mingyi Rao1, Daniel Belkin1, Yunning Li1, Peng Yan1, Hao Jiang1, Peng Lin1, Miao Hu2, John Paul Strachan3, Ning Ge3, Mark Barnell4, Qing Wu4, Andrew G. Barto5, Qinru Qiu6, R. Stanley Williams7, Qiangfei Xia1* and J. Joshua Yang1*
NATURE ELECTRONICS | VOL 2 | MARCH 2019 | 115–124 | www.nature.com/natureelectronics
... Meanwhile, in 2017, researchers from UMass Amherst and Hewlett Packard Labs demonstrated, for the first time, analog input, analog weight and analog output on an integrated 128×64 1T1M array with discrete off-chip peripheral circuits for analog signal processing and image compression tasks [10]. Soon after, various ML algorithms were experimentally implemented on the same platform, including in-situ training of multilayer perceptrons (MLPs) [11], convolutional neural networks (CNNs) [12], long short-term memory (LSTM) [13] and reinforcement learning (RL) [14]. ...
Preprint
The unprecedented advancement of artificial intelligence has placed immense demands on computing hardware, but traditional silicon-based semiconductor technologies are approaching their physical and economic limits, prompting the exploration of novel computing paradigms. The memristor offers a promising solution, enabling in-memory analog computation and massive parallelism, which lead to low latency and power consumption. This manuscript reviews the current status of memristor-based machine learning accelerators, highlighting the milestones achieved in developing prototype chips that not only accelerate neural network inference but also tackle other machine learning tasks. More importantly, it discusses our opinion on the key challenges that remain in this field, such as device variation, the need for efficient peripheral circuitry, and systematic co-design and optimization. We also share our perspective on potential future directions, some of which address existing challenges while others explore untouched territories. By addressing these challenges through interdisciplinary efforts spanning device engineering, circuit design and systems architecture, memristor-based accelerators could significantly advance the capabilities of AI hardware, particularly for edge applications where power efficiency is paramount.
... The memristor is widely recognized as a promising "compute-with-physics" device that directly implements matrix-vector multiplication (MVM) using physical laws, namely Ohm's law for multiplication and Kirchhoff's law for summation9. As MVM is the most frequently used operation in deep learning, this implementation has resulted in greatly improved energy efficiency [17–28]. However, since memristors are tailored to fit these algorithms, problems such as the large number of parameters and computational operations still exist. ...
Article
Full-text available
Inspired by biological processes, feature learning techniques such as deep learning have achieved great success in various fields. However, since biological organs may operate differently from semiconductor devices, deep models usually require dedicated hardware and are computationally complex. High energy consumption has made deep model growth unsustainable. We present an approach that directly implements feature learning using semiconductor physics to minimize the disparity between model and hardware. Following this approach, a feature learning technique based on memristor drift-diffusion kinetics is proposed, leveraging the dynamic response of a single memristor to learn features. The model parameters and computational operations of the kinetics-based network are reduced by up to 2 and 4 orders of magnitude, respectively, compared with deep models. We experimentally implement the proposed network on 180 nm memristor chips for pattern classification tasks of various dimensionalities. Compared with memristor-based deep learning hardware, the memristor kinetics-based hardware can further reduce energy and area consumption significantly. We propose that innovations in hardware physics could create an intriguing solution for intelligent models by balancing model complexity and performance.
... One solution here is memristor-based analogue computing [9–11]. This approach improves energy efficiency by eliminating the emulation layers required to operate an artificial neural network on von Neumann architecture-based computers [12–14]. Memristors can achieve a higher density in crossbar arrays compared to conventional digital memory devices due to their simple two-terminal structure and ability to achieve multi-level conductance states [15]. ...
Article
Full-text available
Memristor-based platforms could be used to create compact and energy-efficient artificial intelligence (AI) edge-computing systems due to their parallel computation ability in the analogue domain. However, systems based on memristor arrays face challenges implementing real-time AI algorithms with fully on-device learning due to reliability issues, such as low yield, poor uniformity and endurance problems. Here we report an analogue computing platform based on a selector-less analogue memristor array. We use interfacial-type titanium oxide memristors with a gradual oxygen distribution that exhibit high reliability, high linearity, forming-free attribute and self-rectification. Our platform—which consists of a selector-less (one-memristor) 1 K (32 × 32) crossbar array, peripheral circuitry and digital controller—can run AI algorithms in the analogue domain by self-calibration without compensation operations or pretraining. We illustrate the capabilities of the system with real-time video foreground and background separation, achieving an average peak signal-to-noise ratio of 30.49 dB and a structural similarity index measure of 0.81; these values are similar to those of simulations for the ideal case.
... These nanoscale devices offer exceptional performance metrics, combining durability, rapid switching capabilities and high scalability [4–6]. A particularly compelling feature is their unique ability to both store and process information within the same physical space, enabling energy-efficient computing for both in-memory and parallel processing applications [7–18]. ...
Preprint
Full-text available
Achieving reliable resistive switching in oxide-based memristive devices requires precise control over conductive filament (CF) formation and behavior, yet the fundamental relationship between oxide material properties and switching uniformity remains incompletely understood. Here, we develop a comprehensive physical model to investigate how electrical and thermal conductivities influence CF dynamics in TaOx-based memristors. Our simulations reveal that higher electrical conductivity promotes oxygen vacancy generation and reduces forming voltage, while higher thermal conductivity enhances heat dissipation, leading to increased forming voltage. The uniformity of resistive switching is strongly dependent on the interplay between these transport properties. We identify two distinct pathways for achieving optimal High Resistance State (HRS) uniformity with standard deviation-to-mean ratios as low as 0.045, each governed by different balances of electrical and thermal transport mechanisms. For the Low Resistance State (LRS), high uniformity (0.009) can be maintained when either electrical or thermal conductivity is low. The resistance ratio between HRS and LRS shows a strong dependence on these conductivities, with higher ratios observed at lower conductivity values. These findings provide essential guidelines for material selection in RRAM devices, particularly for applications demanding high reliability and uniform switching characteristics.
Article
Full-text available
Machine learning algorithms have proven to be effective for essential quantum computation tasks such as quantum error correction and quantum control. Efficient hardware implementation of these algorithms at cryogenic temperatures is essential. Here we utilize magnetic topological insulators as memristors (termed magnetic topological memristors) and introduce a cryogenic in-memory computing scheme based on the coexistence of a chiral edge state and a topological surface state. The memristive switching and reading of the giant anomalous Hall effect exhibit high energy efficiency, high stability and low stochasticity. We achieve high accuracy in a proof-of-concept classification task using four magnetic topological memristors. Furthermore, our algorithm-level and circuit-level simulations of large-scale neural networks demonstrate software-level accuracy and lower energy consumption for image recognition and quantum state preparation compared with existing magnetic memristor and complementary metal-oxide-semiconductor technologies. Our results not only showcase a new application of chiral edge states but also may inspire further topological quantum-physics-based novel computing schemes.
Article
Recently, the analog matrix computing (AMC) concept has been proposed for fast, efficient matrix operations, configuring global feedback loops with crosspoint resistive memory arrays and operational amplifiers (OAs). The implementation of a general real-valued matrix (containing both positive and negative elements) is enabled by using a set of analog inverters, which, however, is inefficient in terms of circuit compactness, power consumption and temporal response. Here, with the assistance of a conductance compensation (CC) strategy that takes full advantage of the inherent differential inputs of OAs, new AMC circuits without analog inverters are designed. Such a design saves the area occupation and power dissipation of the analog inverters, and thus turns out to be smaller and lower-power. Simulation results reveal that the new circuit also shows a faster response towards the steady state, thanks to the reduction of poles in the circuit, which, again, is due to the elimination of analog inverters. Along with all of these benefits, extensive simulations demonstrate that the CC-AMC circuits do not compromise computing performance in terms of the relative error caused by various non-ideal factors in the circuit.
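A common baseline that the circuit above improves on is the differential mapping of a signed matrix onto two nonnegative conductance arrays, A = G+ − G−. The sketch below shows only this underlying mapping, not the conductance-compensation circuit itself; sizes and values are illustrative:

```python
import numpy as np

# Differential mapping of a signed matrix onto nonnegative conductances:
# A = Gp - Gn with Gp, Gn >= 0, so A @ x is the difference of two crossbar
# MVMs. This is only the baseline mapping, not the conductance-compensation
# circuit of the article above.

def split_signed(A):
    Gp = np.maximum(A, 0.0)        # positive entries -> one array
    Gn = np.maximum(-A, 0.0)       # negative entries -> companion array
    return Gp, Gn

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))        # an arbitrary real-valued matrix
x = rng.normal(size=8)
Gp, Gn = split_signed(A)
y = Gp @ x - Gn @ x                # subtract the two column-current vectors
```

The subtraction is what analog inverters or differential OA inputs implement in hardware; the CC strategy discussed above targets exactly this step.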
Article
Full-text available
Learning rate scheduling (LRS) is a critical factor influencing the performance of neural networks by accelerating the convergence of learning algorithms and enhancing the generalization capabilities. The escalating computational demands in artificial intelligence (AI) necessitate advanced hardware solutions capable of supporting neural network training with LRS. This not only requires linear and symmetric analog programming capabilities but also the precise adjustment of channel conductance to achieve tunable slope in weight update behaviors. Here, a cascaded duplex organic vertical memory is proposed with the coupling of ferroelectric polarization effect and Schottky gate control on the same semiconducting channel, exhibiting adjustable‐slope conductance update with high linearity and symmetry. Therefore, in the chest X‐ray image detection, a fast‐to‐slow LRS is used for a bi‐layer ANN training, achieving a rapid, stable convergence behavior within only 15 epochs and a high recognition accuracy. Moreover, the proposed LRS training is also suitable for the Mackey Glass prediction task using long short‐term memory networks. This work integrates LRS into synaptic devices, enabling efficient hardware implementation of neural networks and thus enhancing AI performance in practical applications.
Article
Full-text available
Deep neural networks are increasingly popular in data-intensive applications, but are power-hungry. New types of computer chips that are suited to the task of deep learning, such as memristor arrays where data handling and computing take place within the same unit, are required. A well-used deep learning model called long short-term memory, which can handle temporal sequential data analysis, is now implemented in a memristor crossbar array, promising an energy-efficient and low-footprint deep learning platform.
Article
Full-text available
Memristive devices have been extensively studied for data-intensive tasks such as artificial neural networks. These types of computing tasks are considered to be ‘soft’ as they can tolerate low computing precision without suffering from performance degradation. However, ‘hard’ computing tasks, which require high precision and accurate solutions, dominate many applications and are difficult to implement with memristors because the devices normally offer low native precision and suffer from high device variability. Here we report a complete memristor-based hardware and software system that can perform high-precision computing tasks, making memristor-based in-memory computing approaches attractive for general high-performance computing environments. We experimentally implement a numerical partial differential equation solver using a tantalum oxide memristor crossbar system, which we use to solve static and time-evolving problems. We also illustrate the practical capabilities of our memristive hardware by using it to simulate an argon plasma reactor.
Article
Full-text available
Memristors with tunable resistance states are emerging building blocks of artificial neural networks. However, in situ learning on a large-scale multiple-layer memristor network has yet to be demonstrated because of challenges in device property engineering and circuit integration. Here we monolithically integrate hafnium oxide-based memristors with a foundry-made transistor array into a multiple-layer neural network. We experimentally demonstrate in situ learning capability and achieve competitive classification accuracy on a standard machine learning dataset, which further confirms that the training algorithm allows the network to adapt to hardware imperfections. Our simulation using the experimental parameters suggests that a larger network would further increase the classification accuracy. The memristor neural network is a promising hardware platform for artificial intelligence with high speed-energy efficiency.
Article
Full-text available
Neural-network training can be slow and energy intensive, owing to the need to transfer the weight data for the network between conventional digital memory chips and processor chips. Analogue non-volatile memory can accelerate the neural-network training algorithm known as backpropagation by performing parallelized multiply-accumulate operations in the analogue domain at the location of the weight data. However, the classification accuracies of such in situ training using non-volatile-memory hardware have generally been less than those of software-based training, owing to insufficient dynamic range and excessive weight-update asymmetry. Here we demonstrate mixed hardware-software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with 'polarity inversion' to cancel out inherent device-to-device variations. We achieve generalization accuracies (on previously unseen data) equivalent to those of software-based training on various commonly used machine-learning test datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100). The computational energy efficiency of 28,065 billion operations per second per watt and throughput per area of 3.6 trillion operations per second per square millimetre that we calculate for our implementation exceed those of today's graphical processing units by two orders of magnitude. This work provides a path towards hardware accelerators that are both fast and energy efficient, particularly on fully connected neural-network layers.
Article
Full-text available
As complementary metal–oxide–semiconductor (CMOS) scaling reaches its technological limits, a radical departure from traditional von Neumann systems, which involve separate processing and memory units, is needed in order to extend the performance of today’s computers substantially. In-memory computing is a promising approach in which nanoscale resistive memory devices, organized in a computational memory unit, are used for both processing and memory. However, to reach the numerical accuracy typically required for data analytics and scientific computing, limitations arising from device variability and non-ideal device characteristics need to be addressed. Here we introduce the concept of mixed-precision in-memory computing, which combines a von Neumann machine with a computational memory unit. In this hybrid system, the computational memory unit performs the bulk of a computational task, while the von Neumann machine implements a backward method to iteratively improve the accuracy of the solution. The system therefore benefits from both the high precision of digital computing and the energy/areal efficiency of in-memory computing. We experimentally demonstrate the efficacy of the approach by accurately solving systems of linear equations, in particular, a system of 5,000 equations using 998,752 phase-change memory devices. A hybrid system that combines a von Neumann machine with a computational memory unit can offer both the high precision of digital computing and the energy/areal efficiency of in-memory computing, which is illustrated by accurately solving a system of 5,000 equations using 998,752 phase-change memory devices.
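The mixed-precision scheme described above can be sketched in a few lines: an imprecise solver, emulated here by injecting relative noise into an exact solve, does the bulk of the work, while a digital loop refines the answer through the residual. All sizes, noise levels and iteration counts below are illustrative:

```python
import numpy as np

# Mixed-precision iterative refinement: a low-precision solver (emulated by
# adding relative noise to an exact solve) performs the bulk of the work,
# and a high-precision digital loop improves the answer via the residual
# r = b - A @ x. Sizes, noise level and iteration count are illustrative.

def noisy_solve(A, b, rng, rel_sigma=1e-2):
    """Stand-in for the analogue solver: exact solve plus ~1% relative noise."""
    x = np.linalg.solve(A, b)
    noise = rel_sigma * np.linalg.norm(x) / np.sqrt(x.size)
    return x + noise * rng.standard_normal(x.shape)

def refine(A, b, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x                    # high-precision residual (digital)
        x = x + noisy_solve(A, r, rng)   # low-precision correction (analogue)
    return x

rng = np.random.default_rng(42)
A = rng.normal(size=(50, 50)) + 50 * np.eye(50)  # well-conditioned system
b = rng.normal(size=50)
x = refine(A, b)
```

Each pass shrinks the error by roughly the solver's relative noise level, so a handful of iterations recovers near-double-precision accuracy from a ~1%-accurate solver.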
Article
Full-text available
Hardware-intrinsic security primitives employ instance-specific and process-induced variations in electronic hardware as a source of cryptographic data. Among various emerging technologies, memristors offer unique opportunities in such security applications due to their underlying stochastic operation. Here we show that the analogue tuning and nonlinear conductance variations of memristors can be used to build a basic building block for implementing physically unclonable functions that are resilient, dense, fast and energy-efficient. Using two vertically integrated 10 × 10 metal-oxide memristive crossbar circuits, we experimentally demonstrate a security primitive that offers a near ideal 50% average uniformity and diffuseness, as well as a minimum bit error rate of around 1.5 ± 1%. Readjustment of the conductances of the devices allows nearly unique security instances to be implemented with the same crossbar circuit. By exploiting the nonlinear and analogue tuning properties of memristors, robust security primitives can be fabricated using integrated memristive crossbar circuits.
Article
Full-text available
Neuromorphic computers comprised of artificial neurons and synapses could provide a more efficient approach to implementing neural network algorithms than traditional hardware. Recently, artificial neurons based on memristors have been developed, but with limited bio-realistic dynamics and no direct interaction with the artificial synapses in an integrated network. Here we show that a diffusive memristor based on silver nanoparticles in a dielectric film can be used to create an artificial neuron with stochastic leaky integrate-and-fire dynamics and tunable integration time, which is determined by silver migration alone or its interaction with circuit capacitance. We integrate these neurons with nonvolatile memristive synapses to build fully memristive artificial neural networks. With these integrated networks, we experimentally demonstrate unsupervised synaptic weight updating and pattern classification. Leaky integrate-and-fire artificial neurons based on diffusive memristors enable unsupervised weight updates of drift-memristor synapses in an integrated convolutional neural network capable of pattern recognition.
Article
Memristor-based neuromorphic networks have been actively studied as a promising candidate for overcoming the von Neumann bottleneck in future computing applications. Several recent studies have demonstrated the memristor network's capability to perform unsupervised learning on unlabeled datasets, where features inherent in the input are identified and analyzed by comparison with features stored in the memristor network. However, even though in some cases the stored feature vectors can be normalized so that the winning neurons can be found directly from the (input vector)–(stored vector) dot products, in many other cases normalization of the feature vectors is not trivial or practically feasible, and calculation of the actual Euclidean distance between the input vector and the stored vector is required. Here we report the experimental implementation of memristor crossbar hardware systems that allow direct comparison of Euclidean distances without normalizing the weights. The experimental system enables an unsupervised K-means clustering algorithm through online learning, and produces high classification accuracy (93.3%) on the standard IRIS dataset. These approaches and devices can be used in other unsupervised learning systems, and significantly broaden the range of problems that memristor-based networks can solve.
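The trick that avoids weight normalization rests on the identity ||x − w||² = ||x||² − 2x·w + ||w||²: since ||x||² is shared by all stored vectors, the closest stored vector is the one maximizing x·w − ||w||²/2, a quantity a crossbar can produce with one extra column holding ||w||². A numpy illustration of the identity, not the reported circuit:

```python
import numpy as np

# Finding the nearest stored vector with dot products only:
# ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, and ||x||^2 is shared by all
# stored vectors, so argmin ||x - w||^2 = argmax (x.w - ||w||^2 / 2).
# A crossbar supplies x.w; ||w||^2 can sit in one extra column.

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))       # ten stored, unnormalized vectors
x = rng.normal(size=4)             # an input vector

scores = W @ x - 0.5 * (W * W).sum(axis=1)  # crossbar-friendly quantity
true_d2 = ((W - x) ** 2).sum(axis=1)        # actual squared distances
winner = int(np.argmax(scores))
```

Because the mapping from scores to squared distances is a fixed affine transform of each score, the winner found from the scores always coincides with the true nearest stored vector.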