About
285 Publications
89,034 Reads
38,122 Citations
Introduction
Professor Wei Lu is with the Department of Electrical Engineering and Computer Science (EECS), University of Michigan. His current research topics include high-density memory based on two-terminal resistive devices (RRAM), memristors and memristive systems, neuromorphic circuits, aggressively scaled nanowire transistors, and other emerging electrical devices. He is an IEEE Fellow and co-founder of Crossbar Inc.
Publications (285)
Analog compute in memory (CIM) with multilevel cell (MLC) resistive random access memory (ReRAM) promises highly dense and efficient compute support for machine learning and scientific computing. This article introduces analog-to-digital converter (ADC)-assisted bit-serial processing for efficient, high-throughput compute. Bit-serial digital to ana...
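For readers unfamiliar with the scheme, a minimal sketch of generic bit-serial compute (an idealized model, not the article's specific circuit): each input bit-plane drives one analog crossbar pass, the column sums are digitized by an ADC, and the digital partial results are shift-and-added.

```python
import numpy as np

def bit_serial_vmm(x, G, n_bits=8):
    """Bit-serial vector-matrix multiply: integer inputs x (shape [n]),
    conductance matrix G (shape [n, m]). Each input bit-plane is applied
    as a one-bit vector; column sums are digitized and shift-and-added."""
    acc = np.zeros(G.shape[1])
    for b in range(n_bits):
        bit_plane = (x >> b) & 1          # one bit of every input element
        analog_sum = bit_plane @ G        # one crossbar pass (ideal model)
        digital = np.round(analog_sum)    # stand-in for the ADC step
        acc += digital * (1 << b)         # shift-and-add partial sums
    return acc

x = np.array([13, 7, 255, 0])
G = np.random.randint(0, 4, size=(4, 3))   # toy 2-bit MLC conductances
assert np.allclose(bit_serial_vmm(x, G), x @ G)
```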
A memristor array has emerged as a potential computing hardware for artificial intelligence (AI). It has an inherent memory effect that allows information storage in the form of easily programmable electrical conductance, making it suitable for efficient data processing without shuttling of data between the processor and memory. To realize its full...
Cutting-edge humanoid machine vision merely mimics human systems and lacks the polarimetric functionality that conveys navigation information and authentic images. Interspecies-chimera vision preserving multiple hosts’ capacities will lead to advanced machine vision. However, implementing the visual functions of multiple species (human and non-h...
Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a...
Neuromorphic technologies aim to use the organizing principles of the brain to build efficient and intelligent systems, making them the centerpiece bridging biological systems and current Artificial Intelligence (AI) systems. Specifically, in conventional AI systems, one of the dominant sources of power consumption is the data movement between the memo...
Neuromorphic computing systems promise high energy efficiency and low latency. In particular, when integrated with neuromorphic sensors, they can be used to produce intelligent systems for a broad range of applications. An event‐based camera is such a neuromorphic sensor, inspired by the sparse and asynchronous spike representation of the biologica...
Memristive devices are of potential use in a range of computing applications. However, many of these devices are based on amorphous materials, where systematic control of the switching dynamics is challenging. Here we report tunable and stable memristors based on an entropy-stabilized oxide. We use single-crystalline (Mg,Co,Ni,Cu,Zn)O films grown o...
Artificial Intelligence (AI) is currently experiencing a boom driven by deep learning (DL) techniques, which rely on networks of connected simple computing units operating in parallel. The low communication bandwidth between memory and processing units in conventional von Neumann machines does not support the requirements of emerging applications...
Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. Process-in-memory (PIM) architectures can minimize off-chip data...
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs)...
Deep learning accelerators (DLAs) based on compute-in-memory (CIM) technologies have been considered promising candidates to drastically improve the throughput and energy efficiency for running deep neural network models. In this review, we analyze DLA designs reported in the past decade, including both fully digital DLAs and analog CIM based DLAs,...
With the rising popularity of post-quantum cryptographic schemes, realizing practical implementations for real-world applications is still a major challenge. A major bottleneck in such schemes is the fetching and processing of large polynomials in the Number Theoretic Transform (NTT), which makes non-von Neumann paradigms, such as near-memory proce...
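As background, the NTT mentioned here is a discrete Fourier transform over a finite field. A textbook radix-2 version can be sketched as follows; the toy parameters q=17, g=3 are illustrative and unrelated to the paper's near-memory design.

```python
def ntt(a, q=17, g=3):
    """Textbook recursive radix-2 NTT over Z_q. len(a) must be a power
    of two dividing q-1; g is a generator of the multiplicative group."""
    n = len(a)
    if n == 1:
        return a
    w = pow(g, (q - 1) // n, q)          # primitive n-th root of unity
    even = ntt(a[0::2], q, g)
    odd = ntt(a[1::2], q, g)
    out = [0] * n
    for k in range(n // 2):
        t = pow(w, k, q) * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
    return out

print(ntt([1, 2, 3, 4]))   # forward transform of a length-4 polynomial
```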
The need for deep neural network (DNN) models with higher performance and better functionality leads to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor‐based compute‐in‐memory (CIM) modules can perform vector‐matrix multiplication (VMM) in situ and in parallel, and have show...
Analog compute‐in‐memory (CIM) systems are promising candidates for deep neural network (DNN) inference acceleration. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. Herein, a potential security vulnerability is identified wherein an adversary can reconstruct the user's private input data from a...
The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This article serves as a tutorial and perspective showing how to apply the lessons learned from several decades of research in deep le...
Memristive technology has been rapidly emerging as a potential alternative to traditional CMOS technology, which is facing fundamental limitations in its development. Since oxide-based resistive switches were demonstrated as memristors in 2008, memristive devices have garnered significant attention due to their biomimetic memory properties, which p...
The need for deep neural network (DNN) models with higher performance and better functionality leads to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor-based compute-in-memory (CIM) modules can perform vector-matrix multiplication (VMM) in situ and in parallel, and have show...
Analog compute-in-memory (CIM) accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their energy efficiency and in-situ vector-matrix multiplication (VMM) capabilities. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. In this paper, we identify a security...
Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are difficult to train. In this work, we propose a neural network architect...
Fully automated retail vehicles will have stringent, real-time operational safety, sensing, communication, inference, planning, and control requirements. Meeting them with existing technologies will impose prohibitive energy-provisioning and thermal management requirements. This article summarizes the research challenges facing the designers of com...
Automated vehicles (AV) hold great promise for improving safety, as well as reducing congestion and emissions. In order to make automated vehicles commercially viable, a reliable and high-performance vehicle-based computing platform that meets ever-increasing computational demands will be key. Given the state of existing digital computing technology...
In-memory computing (IMC) systems have great potential for accelerating data-intensive tasks such as deep neural networks (DNNs). As DNN models are generally highly proprietary, the neural network architectures become valuable targets for attacks. In IMC systems, since the whole model is mapped on chip and weight memory read can be restricted, the...
Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach for accelerating Convolutional Neural Network (CNN) computations. The growing number of parameters in state-of-the-art CNN models, however, creates challenges for on-chip weight storage in CIM implementations, and CNN compres...
Electronic switches based on the migration of high-density point defects, or memristors, are poised to revolutionize post-digital electronics. Despite significant research, key mechanisms for filament formation and oxygen transport remain unresolved, thus hindering our ability to predict and design crucial device properties. For example, predicted...
As more cloud computing resources are used for machine learning training and inference processes, privacy-preserving techniques that protect data from revealing at the cloud platforms attract increasing interest. Homomorphic encryption (HE) is one of the most promising techniques that enable privacy-preserving machine learning because HE allows dat...
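To illustrate the property HE relies on, here is a toy additively homomorphic (Paillier-style) sketch; the parameters are deliberately tiny and insecure, and the keys and messages are made up for illustration only.

```python
from math import gcd

# Toy Paillier keypair (textbook parameters, far too small for real use).
p, q = 293, 433
n, lam = p * q, (p - 1) * (q - 1) // gcd(p - 1, q - 1)
n2, g = n * n, p * q + 1            # g = n + 1 is a standard choice

def enc(m, r):
    """Encrypt message m with randomizer r coprime to n."""
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    """Standard Paillier decryption via the L function."""
    def L(u): return (u - 1) // n
    mu = pow(L(pow(g, lam, n2)), -1, n)
    return L(pow(c, lam, n2)) * mu % n

c1, c2 = enc(20, 17), enc(22, 31)
print(dec(c1 * c2 % n2))   # 42: multiplying ciphertexts adds plaintexts
```

The last line shows the key point for privacy-preserving ML: the cloud can combine encrypted values without ever seeing the plaintexts.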
We present MEMprop, the adoption of gradient-based learning to train fully memristive spiking neural networks (MSNNs). Our approach harnesses intrinsic device dynamics to trigger naturally arising voltage spikes. These spikes emitted by memristive dynamics are analog in nature, and thus fully differentiable, which eliminates the need for surrogate...
Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach for Deep Neural Network (DNN) acceleration. As the DNN size continues to grow, the finite on-chip weight storage has become a challenge for CIM implementations. Pruning can reduce network size, but unstructured pruning is not compatible...
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs)...
Network features found in the brain may help implement more efficient and robust neural networks. Spiking neural networks (SNNs) process spikes in the spatiotemporal domain and can offer better energy efficiency than deep neural networks. However, most SNN implementations rely on simple point neurons that neglect the rich neuronal and dendritic dyn...
In-memory computing (IMC) systems have great potential for accelerating data-intensive tasks such as deep neural networks (DNNs). As DNN models are generally highly proprietary, the neural network architectures become valuable targets for attacks. In IMC systems, since the whole model is mapped on chip and weight memory read can be restricted, the...
We present MEMprop, the adoption of gradient-based learning to train fully memristive spiking neural networks (MSNNs). Our approach harnesses intrinsic device dynamics to trigger naturally arising voltage spikes. These spikes emitted by memristive dynamics are analog in nature, and thus fully differentiable, which eliminates the need for surrogate...
Memristive devices, which combine a resistor with memory functions such that voltage pulses can change their resistance (and hence their memory state) in a nonvolatile manner, are beginning to be implemented in integrated circuits for memory applications. However, memristive devices could have applications in many other technologies, such as non-vo...
Memristive arrays are a natural fit to implement spiking neural network (SNN) acceleration. Representing information as digital spiking events can improve noise margins and tolerance to device variability compared to analog bitline current summation approaches to multiply–accumulate (MAC) operations. Restricting neuron activations to single-bit spi...
In this letter, we demonstrate a physical unclonable function (PUF) system based on fingerprint-like random planar structures through pattern transfer of self-assembled binary polymer mixtures. With properly designed electrode structures and materials, different types of conductance distributions with large variations are achieved, allowing the PUF...
Research on electronic devices and materials is currently driven by both the slowing down of transistor scaling and the exponential growth of computing needs, which make present digital computing increasingly capacity-limited and power-limited. A promising alternative approach consists in performing computing based on intrinsic device dynamics, suc...
In-memory computing on RRAM crossbars enables efficient and parallel vector-matrix multiplication. The neural network weight matrix is mapped onto the crossbar, and multiplication is performed in the analog domain. This article discusses different RRAM crossbar implementations, the associated mixed-signal circuits, and their challenges. As a proof-...
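As a concrete picture of the mapping described here (an idealized model, not the article's mixed-signal circuit), a signed weight matrix can be stored as a differential pair of conductance arrays, with column currents performing the multiply-accumulate:

```python
import numpy as np

def map_weights(W, g_max=1e-4):
    """Map a signed weight matrix onto a differential conductance pair
    (G_pos, G_neg), scaled so the largest |w| uses full conductance."""
    scale = g_max / np.abs(W).max()
    G_pos = np.clip(W, 0, None) * scale
    G_neg = np.clip(-W, 0, None) * scale
    return G_pos, G_neg, scale

def crossbar_vmm(v, G_pos, G_neg, scale):
    """Ideal crossbar: column currents I = v @ G (Ohm's law + KCL);
    the differential read recovers the signed product."""
    return (v @ G_pos - v @ G_neg) / scale

W = np.random.randn(4, 3)
v = np.random.randn(4)
G_pos, G_neg, s = map_weights(W)
assert np.allclose(crossbar_vmm(v, G_pos, G_neg, s), v @ W)
```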
Memristive devices have demonstrated rich switching behaviors that closely resemble synaptic functions and provide a building block to construct efficient neuromorphic systems. It is demonstrated that resistive switching effects are controlled not only by the external field, but also by the dynamics of various internal state variables that facilita...
Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms. However, these networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds. The broadly accepted trick to overcoming this is...
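The widely used workaround in this literature is a surrogate gradient: keep the hard threshold in the forward pass but substitute a smooth derivative in the backward pass. A minimal PyTorch sketch of the generic idea (not this paper's exact surrogate function):

```python
import torch

class SpikeSTE(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate
    derivative in the backward pass (fast-sigmoid style)."""
    @staticmethod
    def forward(ctx, mem):
        ctx.save_for_backward(mem)
        return (mem > 0).float()              # hard threshold, no gradient

    @staticmethod
    def backward(ctx, grad_output):
        mem, = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + 10.0 * mem.abs()) ** 2   # smooth stand-in
        return grad_output * surrogate

mem = torch.randn(5, requires_grad=True)
spikes = SpikeSTE.apply(mem)
spikes.sum().backward()
print(mem.grad)      # nonzero despite the hard threshold in forward
```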
Spiking neural networks can compensate for quantization error by encoding information either in the temporal domain, or by processing discretized quantities in hidden states of higher precision. In theory, a wide dynamic range state-space enables multiple binarized inputs to be accumulated together, thus improving the representational capacity of i...
In analog in-memory computing systems based on nonvolatile memories such as resistive random-access memory (RRAM), neural network models are often trained offline and then the weights are programmed onto memory devices as conductance values. The programmed weight values inevitably deviate from the target values during the programming process. This...
The impact of device and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifest as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies based on algorithmic features. These include network architecture, capacity, weight distribution, and the type of inter-laye...
Advances in electronics have revolutionized the way people work, play, and communicate with each other. Historically, these advances were mainly driven by CMOS transistor scaling following Moore’s law, where new generations of devices are smaller, faster, and cheaper, leading to more powerful circuits and systems. However, conventional scaling is n...
Neuromorphic systems that can emulate the structure and the operations of biological neural circuits have long been viewed as a promising hardware solution to meet the ever-growing demands of big-data analysis and AI tasks. Recent studies on resistive switching or memristive devices have suggested such devices may form the building blocks of biorea...
We present and experimentally validate two minimal compact memristive models for spiking neuronal signal generation using commercially available low-cost components. The first neuron model is called the Memristive Integrate-and-Fire (MIF) model, for neuronal signaling with two voltage levels: the spike peak and the rest potential. The second model...
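For orientation, a generic leaky integrate-and-fire loop with the two voltage levels the abstract names (spike peak and rest potential) might look like the following; the parameters are illustrative, not the MIF circuit values.

```python
import numpy as np

def lif(i_in, v_rest=0.0, v_peak=1.0, tau=20.0, thresh=0.5, dt=1.0):
    """Generic leaky integrate-and-fire: the membrane leaks toward v_rest,
    integrates input current, and emits a spike at v_peak on threshold."""
    v, trace = v_rest, []
    for i_t in i_in:
        v += dt / tau * (v_rest - v) + i_t
        if v >= thresh:
            trace.append(v_peak)          # spike-peak level
            v = v_rest                    # reset to rest-potential
        else:
            trace.append(v)
    return np.array(trace)

spikes = lif(np.full(100, 0.03))
print(int((spikes == 1.0).sum()), "spikes")
```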
The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This paper shows how to apply the lessons learnt from several decades of research in deep learning, gradient descent, backpropagation...
Reservoir computing (RC) offers efficient temporal data processing with a low training cost by separating recurrent neural networks into a fixed network with recurrent connections and a trainable linear network. The quality of the fixed network, called reservoir, is the most important factor that determines the performance of the RC system. In this...
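A minimal echo state network conveys the split the abstract describes: the recurrent reservoir is fixed and random, and only a linear readout is trained (here by ridge regression on a toy one-step forecasting task; all parameters are illustrative, not the paper's device-based reservoir).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 100

# Fixed random reservoir: only the linear readout is ever trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x)
    return np.array(states)

# One-step-ahead prediction of a sine wave via ridge regression.
u = np.sin(np.linspace(0, 20 * np.pi, 1000))
X, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```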
We present TAICHI, a general in-memory computing deep neural network accelerator design based on RRAM crossbar arrays heterogeneously integrated with local arithmetic units and global co-processors to allow the system to efficiently map different models while maintaining high energy efficiency and throughput. A hierarchical mesh network-onchip is i...
Knowing the connectivity patterns in neural circuitry is essential to understand the operating mechanism of the brain, as it allows the analysis of how neural signals are processed and flown through the neural system. With the recent advances in neural recording technologies in terms of channel size and time resolution, a simple and efficient syste...
The advances of neural recording techniques have fostered rapid growth of the number of simultaneously recorded neurons, opening up new possibilities to investigate the interactions and dynamics inside neural circuitry. The high recording channel counts, however, pose significant challenges for data analysis because the required time and computatio...
Reservoir computing (RC) offers efficient temporal data processing with a low training cost by separating recurrent neural networks into a fixed network with recurrent connections and a trainable linear network. The quality of the fixed network, called reservoir, is the most important factor that determines the performance of the RC system. In this...
Memristors have emerged as transformative devices to enable neuromorphic and in‐memory computing, where success requires the identification and development of materials that can overcome challenges in retention and device variability. Here, high‐entropy oxide composed of Zr, Hf, Nb, Ta, Mo, and W oxides is first demonstrated as a switching material...
Stochastic Computing (SC) is a computing paradigm that allows for the low-cost and low-power computation of various arithmetic operations using stochastic bit streams and digital logic. In contrast to conventional representation schemes used within the binary domain, the sequence of bit streams in the stochastic domain is inconsequential, and compu...
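The order-independence noted here is easy to see in a sketch: a unipolar stochastic stream encodes a value as the probability of a 1, and a single AND gate multiplies two independent streams.

```python
import numpy as np

rng = np.random.default_rng(42)

def to_stream(p, length=4096):
    """Encode a value p in [0, 1] as a unipolar stochastic bit stream:
    each bit is independently 1 with probability p."""
    return rng.random(length) < p

a, b = to_stream(0.6), to_stream(0.5)
product = a & b        # a single AND gate multiplies the encoded values
print(product.mean())  # ~0.30 = 0.6 * 0.5; bit order is inconsequential
```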
Stochastic Computing (SC) is a computing paradigm that allows for the low-cost and low-power computation of various arithmetic operations using stochastic bit streams and digital logic. In contrast to conventional representation schemes used within the binary domain, the sequence of bit streams in the stochastic domain is inconsequential, and compu...
Resistive random-access memory (RRAM) devices structured in crossbar arrays typically use selectors to improve noise margins by suppressing sneak path currents and leakages. Without selectors, crossbar array dimensions would be prohibitively small for practical use in storage-class memory (SCM) and compute-in-memory (CIM) applications. Most one-selector o...
In article number 2003984, Yiyang Li, A. Alec Talin, and co‐workers design a deterministic nonvolatile resistive memory cell without nanosized filaments. By using the statistical ensemble behavior of all point defects within the 3D bulk for information storage, they solve the challenge of stochastic switching that has plagued filament‐based memrist...
Biologically plausible computing systems require fine‐grain tuning of analog synaptic characteristics. In this study, lithium‐doped silicate resistive random access memory with a titanium nitride (TiN) electrode mimicking biological synapses is demonstrated. Biological plausibility of this RRAM device is thought to occur due to the low ionization e...
Digital computing is nearing its physical limits as computing needs and energy consumption rapidly increase. Analogue-memory-based neuromorphic computing can be orders of magnitude more energy efficient at data-intensive tasks like deep neural networks, but has been limited by the inaccurate and unpredictable switching of analogue resistive memory....
To tackle important combinatorial optimization problems, a variety of annealing-inspired computing accelerators, based on several different technology platforms, have been proposed, including quantum-, optical- and electronics-based approaches. However, to be of use in industrial applications, further improvements in speed and energy efficiency are...
To address the von Neumann bottleneck that leads to both energy and speed degradations, in-memory processing architectures have been proposed as a promising alternative for future computing applications. In this paper, we present an in-memory computing system based on resistive random-access memory (RRAM) crossbar arrays that is reconfigurable and...
The ability to efficiently analyze the activities of biological neural networks can significantly promote our understanding of neural communications and functionalities. However, conventional neural signal analysis approaches need to transmit and store large amounts of raw recording data, followed by extensive processing offline, posing significant...
Analog compute-in-memory with resistive random access memory (RRAM) devices promises to overcome the data movement bottleneck in data-intensive AI and machine learning. RRAM crossbar arrays improve the efficiency of vector-matrix multiplications (VMM), which is a vital operation in these applications. The prototype IC is the first complete, fully-i...
Oxide-based memristors are two-terminal devices whose resistance can be modulated by the history of applied stimulation. Memristors have been extensively studied as memory (as resistive random-access memory (RRAM)) and synaptic devices for neuromorphic computing applications. Understanding the internal dynamics of memristors is essential for contin...
With the slowing down of Moore’s law and fundamental limitations due to the von Neumann bottleneck, continued improvements in computing hardware performance become increasingly challenging. Resistive switching (RS) devices are being extensively studied as promising candidates for next generation memory and computing applications due to the...
Advances in computing power have historically been driven by transistor scaling, commonly known as Moore’s law. However, transistor scaling is now close to an end as device sizes approach fundamental physical limits. In the meantime, demands in the application space have been evolving, often requiring processing large amounts of data at high throug...
We present an optimized conductance-based retina microcircuit simulator which transforms light stimuli into a series of graded and spiking action potentials through photo transduction. We use discrete retinal neuron blocks based on a collation of single-compartment models and morphologically realistic formulations, and successfully achieve a biolog...
Near infrared (NIR) synaptic devices offer a remote-control approach to implement neuromorphic computing for data safety and artificial retinal system applications. In upconverting nanoparticles (UCNPs)-mediated optogenetics biosystems, NIR regulation of membrane ion channels allows remote and selective control of the Ca²⁺ flux to modulate synaptic...
Time-series analysis including forecasting is essential in a range of fields from finance to engineering. However, long-term forecasting is difficult, particularly for cases where the underlying models and parameters are complex and unknown. Neural networks can effectively process features in temporal units and are attractive for such purposes. Res...
Silicon (Si) nanostructures are widely used in microelectronics and nanotechnology. Brittle to ductile transition in nanoscale Si is of great scientific and technological interest, but this phenomenon and its underlying mechanism remain elusive. By conducting in situ temperature-controlled nanomechanical testing inside a transmission electron micro...
Memristors and memristor crossbar arrays have been widely studied for neuromorphic and other in-memory computing applications. To achieve optimal system performance, however, it is essential to integrate memristor crossbars with peripheral and control circuitry. Here, we report a fully functional, hybrid memristor chip in which a passive crossbar a...
This paper is concerned with mode-dependent impulsive hybrid systems driven by a deterministic finite automaton (DFA) with mixed-mode effects. In the hybrid systems, a complex phenomenon called mixed mode, which arises in time-varying delay switching systems, is considered explicitly. Furthermore, mode-dependent impulses, which can exist not only at the in...
Advances in the understanding of nanoscale ionic processes in solid‐state thin films have led to the rapid development of devices based on coupled ionic–electronic effects. For example, ion‐driven resistive‐switching (RS) devices have been extensively studied for future memory applications due to their excellent performance in terms of switching sp...
We describe a hybrid analog-digital computing approach to solve important combinatorial optimization problems that leverages memristors (two-terminal nonvolatile memories). While previous memristor accelerators have had to minimize analog noise effects, we show that our optimization solver harnesses such noise as a computing resource. Here we descr...
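The "noise as a computing resource" idea can be sketched with a Hopfield-style max-cut solver in which injected noise is annealed over time to help the state escape poor local minima; this is a generic software analogue, not the paper's memristor circuit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random max-cut instance: J[i, j] = 1 where an edge joins vertices i, j.
n = 20
J = np.triu(rng.random((n, n)) < 0.3, k=1).astype(float)
J = J + J.T

s = rng.choice([-1.0, 1.0], n)       # one spin per vertex; sign = partition
for step in range(20000):
    i = rng.integers(n)              # asynchronous single-spin update
    noise = rng.normal(0.0, 2.0 * (1 - step / 20000))   # annealed noise
    s[i] = -1.0 if J[i] @ s + noise > 0 else 1.0        # anti-align to cut

cut = np.sum(J * (s[:, None] != s[None, :])) / 2
print("edges cut:", int(cut))
```

Early on, large noise lets the state cross energy barriers; as the noise amplitude shrinks, the dynamics settle into a good cut, which is the annealing role the analog noise plays in the hardware.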
Resistive random-access memory (RRAM) devices have attracted broad interest as promising building blocks for high-density non-volatile memory and neuromorphic computing applications. Atomic-level thermodynamic and kinetic descriptions of resistive switching (RS) processes are essential for continued device design and optimization, but are relativel...
Stochastic computing is a low-cost form of computing. To perform stochastic computing, inputs need to be converted to stochastic bit streams using stochastic number generators (SNGs). The random number generation presents a significant overhead, which partially defeats the benefits of stochastic computing. In this work, we show that stochastic comp...
Coupled ionic–electronic effects present intriguing opportunities for device and circuit development. In particular, layered two-dimensional materials such as MoS2 offer highly anisotropic ionic transport properties, facilitating controlled ion migration and efficient ionic coupling among devices. Here, we report reversible modulation of MoS2 films...
Resistive switching (RS) is an interesting property shown by some materials systems that, especially during the last decade, has attracted considerable interest for the fabrication of electronic devices, with electronic nonvolatile memories receiving the most attention. The presence and quality of the RS phenomenon in a materials syst...
Memristors based on 2D layered materials could provide bio-realistic ionic interactions and potentially enable construction of energy-efficient artificial neural networks capable of faithfully emulating neuronal interconnections in human brains. To build reliable 2D-material-based memristors suitable for constructing working neural networks, the me...
Self-limited and forming-free Cu-based CBRAM devices with a double Al₂O₃ atomic layer deposition layer (D-ALD) structure were developed. The prop...