Preprint

Abstract

In the modern era of artificial intelligence, increasingly sophisticated artificial neural networks (ANNs) are being deployed, which poses challenges in terms of execution speed and power consumption. To tackle this problem, recent research on reduced-precision ANNs has opened the possibility of exploiting analog hardware for neuromorphic acceleration. In this scenario, photonic-electronic engines are emerging as a short-to-medium-term solution that exploits the high speed and inherent parallelism of optics for the linear computations needed in ANNs, while resorting to electronic circuitry for signal conditioning and memory storage. In this paper we introduce a precision-scalable integrated photonic-electronic multiply-accumulate neuron, namely PEMAN. The proposed device relies on (i) an analog photonic engine to perform reduced-precision multiplications at high speed and low power, and (ii) an electronic front end for accumulation and application of the nonlinear activation function by means of a nonlinear encoding in the analog-to-digital converter (ADC). The device, based on the iSiPP50G SOI process for the photonic engine and a commercial 28 nm CMOS process for the electronic front end, has been numerically validated through co-simulations of multiply-accumulate (MAC) operations. PEMAN exhibits a multiplication accuracy of 6.1 ENOB up to 10 GMAC/s, and it can perform computations up to 56 GMAC/s with accuracy reduced down to 2.1 ENOB. The device can trade off speed against resolution and power consumption; it outperforms its analog electronic counterparts in terms of both speed and power consumption, and also brings substantial improvements compared to a leading GPU.
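
The MAC-plus-activation flow described in the abstract can be pictured with a minimal behavioral model. The sketch below is an illustrative assumption, not the actual PEMAN design: the bit widths, the sigmoid shape and the function names are placeholders, and the photonic multiplier is modeled simply as a reduced-precision multiplication.

    import numpy as np

    def quantize(x, bits):
        """Uniform quantizer to the given bit width (illustrative)."""
        levels = 2 ** bits - 1
        return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

    def peman_like_neuron(inputs, weights, mult_bits=6, adc_bits=6):
        """Behavioral sketch of a photonic-electronic MAC neuron:
        - reduced-precision multiplications (photonic engine),
        - analog accumulation (electronic front end),
        - nonlinear activation folded into the ADC transfer curve."""
        products = quantize(inputs, mult_bits) * quantize(weights, mult_bits)
        acc = products.sum()                                   # analog accumulation
        activated = 1.0 / (1.0 + np.exp(-8.0 * (acc - 0.5)))   # assumed sigmoid encoding
        return quantize(activated, adc_bits)                   # nonlinearly encoded ADC output

    # Example: one 4-input neuron
    x = np.array([0.2, 0.7, 0.5, 0.9])
    w = np.array([0.4, 0.1, 0.8, 0.3])
    print(peman_like_neuron(x, w))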

References
Conference Paper
Full-text available
We present a reduced-precision integrated photonic-electronic multiply-accumulate (MAC) neuron with ADC-embedded nonlinearity. The proposed device trades off speed with resolution, outperforming both analog and digital electronic solutions in terms of speed and energy consumption. © 2021 The Author(s)

1. Introduction
In recent years, deep neural networks (DNNs) have become one of the most powerful tools in machine learning, achieving unprecedented milestones in various fields such as computer vision, robotics and autonomous driving [1]. However, the energy consumption for computation and data movement is now becoming a major limiting factor [2]. In this context, photonic solutions are being investigated as an energy-efficient alternative to electronics-based DNNs because of their inherent parallelism, high processing rate with low latency, and the possibility to exploit passive optical elements [3]. As the path towards truly deep photonic neural networks is still long [4], hybrid photonic-electronic systems are emerging to carry out the linear (i.e., MAC) operations in each neuron. This paper presents the novel Photonic-Electronic Multiply-Accumulate Neuron (PEMAN), readily integrable in commercial photonic and electronic platforms, performing both MAC and nonlinear operations at once, and outperforming analog and digital electronic neuromorphic hardware in terms of speed and energy consumption.
Article
Full-text available
Reconfigurable linear optical processors can be used to perform linear transformations and are instrumental in effectively computing the matrix-vector multiplications required in each neural network layer. In this paper, we characterize and compare two thermally tuned photonic integrated processors realized in silicon-on-insulator and silicon nitride platforms suited for extracting feature maps in convolutional neural networks. The reduction in bit resolution when crossing the processor is mainly due to optical losses, and amounts to 2.3–3.3 bits for the silicon-on-insulator chip and 1.3–2.4 bits for the silicon nitride chip. However, the lower extinction ratio of Mach–Zehnder elements in the latter platform limits their expressivity (i.e., the capacity to implement any transformation) to 75%, compared to 97% of the former. Finally, the silicon-on-insulator processor outperforms the silicon nitride one in terms of footprint and energy efficiency.
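
As a rough illustration of how such processors realize an arbitrary linear layer, a weight matrix can be factored by singular-value decomposition into two unitary stages (implementable as MZI meshes) and a diagonal attenuation stage. The sketch below only checks the linear-algebra identity; it does not model losses, phase errors, extinction ratio or any specific mesh topology from the paper.

    import numpy as np

    W = np.random.randn(4, 4)            # weight matrix of one small layer
    U, s, Vh = np.linalg.svd(W)          # W = U @ diag(s) @ Vh

    x = np.random.randn(4)               # input vector encoded on optical amplitudes
    y = U @ (np.diag(s) @ (Vh @ x))      # three cascaded optical stages
    assert np.allclose(y, W @ x)         # identical to the direct matrix-vector product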
Article
Full-text available
We propose a CMOS analog vector-matrix multiplier for deep neural networks, implemented in a standard single-poly 180 nm CMOS technology. The learning weights are stored in analog floating-gate memory cells embedded in current mirrors that implement the multiplication operations. We experimentally verify the analog storage capability of the designed single-poly floating-gate cells, the accuracy of the multiplying function of the proposed tunable current mirrors, and the effective number of bits of the analog operation. We perform system-level simulations to show that an analog deep neural network based on the proposed vector-matrix multiplier can achieve an inference accuracy comparable to digital solutions with an energy efficiency of 26.4 TOPs/J, a layer latency close to 100 μs, and an intrinsically high degree of parallelism. Our proposed design also has a cost advantage, considering that it can be implemented in a standard single-poly CMOS process flow.
Article
Full-text available
Photonics offers exciting opportunities for neuromorphic computing. This paper specifically reviews the prospects of integrated optical solutions for accelerating inference and training of artificial neural networks. Calculating the synaptic function, thereof, is computationally very expensive and does not scale well on state-of-the-art computing platforms. Analog signal processing, using linear and nonlinear properties of integrated optical devices, offers a path toward substantially improving performance and power efficiency of these artificial intelligence workloads. The ability of integrated photonics to operate at very high speeds opens opportunities for time-critical real-time applications, while chip-level integration paves the way to cost-effective manufacturing and assembly.
Article
Full-text available
Microelectronic computers have encountered challenges in meeting all of today’s demands for information processing. Meeting these demands will require the development of unconventional computers employing alternative processing models and new device physics. Neural network models have come to dominate modern machine learning algorithms, and specialized electronic hardware has been developed to implement them more efficiently. A silicon photonic integration industry promises to bring manufacturing ecosystems normally reserved for microelectronics to photonics. Photonic devices have already found simple analog signal processing niches where electronics cannot provide sufficient bandwidth and reconfigurability. In order to solve more complex information processing problems, they will have to adopt a processing model that generalizes and scales. Neuromorphic photonics aims to map physical models of optoelectronic systems to abstract models of neural networks. It represents a new opportunity for machine information processing on sub-nanosecond timescales, with application to mathematical programming, intelligent radio frequency signal processing, and real-time control. The strategy of neuromorphic engineering is to externalize the risk of developing computational theory alongside hardware. The strategy of remaining compatible with silicon photonics externalizes the risk of platform development. In this perspective article, we provide a rationale for a neuromorphic photonics processor, envisioning its architecture and a compiler. We also discuss how it can be interfaced with a general purpose computer, i.e. a CPU, as a coprocessor to target specific applications. This paper is intended for a wide audience and provides a roadmap for expanding research in the direction of transforming neuromorphic photonics into a viable and useful candidate for accelerating neuromorphic computing.
Article
Full-text available
Recent progress in artificial intelligence is largely attributed to the rapid development of machine learning, especially in the algorithm and neural network models. However, it is the performance of the hardware, in particular the energy efficiency of a computing system that sets the fundamental limit of the capability of machine learning. Data-centric computing requires a revolution in hardware systems, since traditional digital computers based on transistors and the von Neumann architecture were not purposely designed for neuromorphic computing. A hardware platform based on emerging devices and new architecture is the hope for future computing with dramatically improved throughput and energy efficiency. Building such a system, nevertheless, faces a number of challenges, ranging from materials selection, device optimization, circuit fabrication, and system integration, to name a few. The aim of this Roadmap is to present a snapshot of emerging hardware technologies that are potentially beneficial for machine learning, providing the Nanotechnology readers with a perspective of challenges and opportunities in this burgeoning field.
Article
Full-text available
This paper presents the performance analysis of a phase error- and loss-tolerant multiport field-programmable MZI-based structure for optical neural networks (ONNs). Compared to the triangular (Reck) mesh, our proposed diamond mesh makes use of a larger number of MZIs, leading to a symmetric topology and adding additional degrees of freedom for the weight matrix optimization in the backpropagation process. Furthermore, the additional MZIs enable the diamond mesh to optimally eliminate the excess light intensity that degrades the performance of the ONNs through the tapered out waveguides. Our results show that the diamond topology is more robust to the inevitable imperfections in practice, i.e., insertion loss of the constituent MZIs and the phase errors. This robustness allows for better classification accuracy in the presence of experimental imperfections. The practical performance and the scalability of the two structures implementing different sizes of optical neural networks are analytically compared. The obtained results confirm that the diamond mesh is more error- and loss-tolerant in classifying the data samples in different sizes of ONNs.
Article
Full-text available
Microring resonators (MRRs) are reconfigurable optical elements ubiquitous in photonic integrated circuits. Owing to its high sensitivity, MRR control is very challenging, especially in large-scale optical systems. In this work, we experimentally demonstrate continuous, multi-channel control of MRR weight banks using simple calibration procedures. A record-high accuracy and precision are achieved for all the controlled MRRs with negligible inter-channel crosstalk. Our approach allows accurate transmission calibration without the need for direct access to the output of the microring weight bank and without the need to lay out electrical and optical I/Os specific for calibration purpose. These features mean that our MRR control approach can be applied to large-scale photonic integrated circuits while maintaining its accuracy with manageable cost of chip area and I/O complexity.
Article
Full-text available
Photonic artificial neural networks have garnered enormous attention due to their potential to perform multiply-accumulate (MAC) operations at much higher clock rates while consuming significantly lower power and chip real estate compared to digital electronic alternatives. Herein, we present a comprehensive power-consumption analysis of photonic neurons, taking into account global design parameters and arriving at analytical expressions for the neuron's energy and footprint efficiencies. We identify the optimal design space and analyze the performance plateaus and their dependence on a range of physical parameters, highlighting the existence of an optimal data rate for maximizing the energy efficiency. Following a survey of the best-in-class integrated photonic devices, including on-chip lasers, photodetectors, modulators and weighting elements, the mathematically calculated energy and footprint efficiencies are mapped onto realistic photonic neuron deployment scenarios. We reveal that silicon photonics can compete with the best-performing currently available digital electronic neural-network engines, reaching TMAC/s/mm² footprint efficiency and sub-pJ/MAC energy efficiency. Simultaneously, neuromorphic plasmonics, plasmo-photonics and sub-wavelength photonics hold the credentials for one to three orders of magnitude of improvement even when the laser requirements and a reasonable waveguide pitch are accounted for, promising performance at a few fJ/MAC and up to a few TMAC/s/mm².
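
The existence of an optimal data rate can be illustrated with a toy energy model: static power (laser, thermal tuning, bias) is amortized over more MACs as the rate grows, while a rate-dependent dynamic term eventually dominates. The numbers below are placeholders, not the paper's device survey.

    import numpy as np

    def energy_per_mac_fj(rate_gmacs, p_static_mw=10.0, e_dyn_fj=50.0):
        """Toy model: static power amortized per MAC + dynamic energy that
        grows with the data rate (placeholder parameters)."""
        static_fj = (p_static_mw * 1e-3) / (rate_gmacs * 1e9) * 1e15
        dynamic_fj = e_dyn_fj * (1.0 + 0.1 * rate_gmacs)
        return static_fj + dynamic_fj

    rates = np.linspace(1, 100, 200)                    # GMAC/s
    energies = [energy_per_mac_fj(r) for r in rates]
    print(f"optimal rate ≈ {rates[int(np.argmin(energies))]:.0f} GMAC/s")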
Article
Full-text available
Deep learning is revolutionizing many aspects of our society, addressing a wide variety of decision-making tasks, from image classification to autonomous vehicle control. Matrix multiplication is an essential and computationally intensive step of deep-learning calculations. The computational complexity of deep neural networks requires dedicated hardware accelerators for additional processing throughput and improved energy efficiency in order to enable scaling to larger networks in the upcoming applications. Silicon photonics is a promising platform for hardware acceleration due to recent advances in CMOS-compatible manufacturing capabilities, which enable efficient exploitation of the inherent parallelism of optics. This article provides a detailed description of recent implementations in the relatively new and promising platform of silicon photonics for deep learning. Opportunities for multiwavelength microring silicon photonic architectures codesigned with field-programmable gate array (FPGA) for pre- and postprocessing are presented. The detailed analysis of a silicon photonic integrated circuit shows that a codesigned implementation based on the decomposition of large matrix-vector multiplication into smaller instances and the use of nonnegative weights could significantly simplify the photonic implementation of the matrix multiplier and allow increased scalability. We conclude this article by presenting an overview and a detailed analysis of design parameters. Insights for ways forward are explored.
Article
Full-text available
Photonic solutions are today a mature industrial reality concerning high speed, high throughput data communication and switching infrastructures. It is still a matter of investigation to what extent photonics will play a role in next-generation computing architectures. In particular, due to the recent outstanding achievements of artificial neural networks, there is a big interest in trying to improve their speed and energy efficiency by exploiting photonic-based hardware instead of electronic-based hardware. In this work we review the state-of-the-art of photonic artificial neural networks. We propose a taxonomy of the existing solutions (categorized into multilayer perceptrons, convolutional neural networks, spiking neural networks, and reservoir computing) with emphasis on proof-of-concept implementations. We also survey the specific approaches developed for training photonic neural networks. Finally we discuss the open challenges and highlight the most promising future research directions in this field.
Article
Full-text available
The current trend for deep learning has come with an enormous computational need for billions of multiply-accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precision ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends for reducing computation costs in neural-network processing.
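
The arithmetic idea behind many precision-scalable designs is to build a full-precision multiplication out of small sub-multiplications that can alternatively serve as independent low-precision MAC units. The sketch below shows only this decomposition (8-bit operands from 2-bit chunks); it does not reproduce any specific circuit from the survey.

    def split_2bit(x, n_chunks=4):
        """Split an unsigned 8-bit integer into 2-bit chunks, LSB first."""
        return [(x >> (2 * i)) & 0b11 for i in range(n_chunks)]

    def scalable_mult(a, b):
        """8-bit x 8-bit product assembled from 2-bit x 2-bit partial products,
        recombined by shift-and-add; each partial product could come from a
        small sub-multiplier reused on its own for 2-bit operands."""
        total = 0
        for i, ai in enumerate(split_2bit(a)):
            for j, bj in enumerate(split_2bit(b)):
                total += (ai * bj) << (2 * (i + j))
        return total

    assert scalable_mult(173, 94) == 173 * 94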
Article
Full-text available
It has long been known that photonic communication can alleviate the data-movement bottlenecks that plague conventional microelectronic processors. More recently, there has also been interest in its capability to implement low-precision linear operations, such as matrix multiplications, fast and efficiently. We characterize the performance of photonic and electronic hardware underlying neural-network and deep-learning models using multiply-accumulate operations. First, we investigate the fundamental limits of analog electronic crossbar arrays and on-chip photonic linear computing systems. Photonic processors are shown to be superior in the limit of large processor sizes (>100 μm), large vector sizes (N > 100), and low precision (≤4 bits). We discuss several proposed tunable photonic MAC systems, and provide a concrete comparison between deep-learning and photonic hardware using several empirically validated device and system models. We show significant potential improvements over digital electronics in energy (>10²), speed (>10³), and compute density (>10²).
Article
Full-text available
Conventional neural networks provide a powerful framework for background subtraction in video acquired by static cameras. Indeed, the well-known SOBS method and its variants based on neural networks were for a long time the leading methods on the large-scale CDnet 2012 dataset. Recently, convolutional neural networks, which belong to deep learning methods, were employed with success for background initialization, foreground detection and deep learned features. Currently, the top background subtraction methods in CDnet 2014 are based on deep neural networks, with a large performance gap over the conventional unsupervised approaches based on multi-feature or multi-cue strategies. Furthermore, a large number of papers has been published since 2016, when Braham and Van Droogenbroeck published their first work on CNNs applied to background subtraction, providing a regular gain in performance. In this context, we provide the first review of deep neural network concepts in background subtraction for novices and experts, in order to analyze this success and to provide further directions. For this, we first survey the methods used for background initialization, background subtraction and deep learned features. Then, we discuss the adequacy of deep neural networks for background subtraction. Finally, experimental results are presented on the CDnet 2014 dataset.
Article
Full-text available
We introduce an all-optical Diffractive Deep Neural Network (D2NN) architecture that can learn to implement various functions after deep learning-based design of passive diffractive layers that work collectively. We experimentally demonstrated the success of this framework by creating 3D-printed D2NNs that learned to implement handwritten digit classification and the function of an imaging lens at terahertz spectrum. With the existing plethora of 3D-printing and other lithographic fabrication methods as well as spatial-light-modulators, this all-optical deep learning framework can perform, at the speed of light, various complex functions that computer-based neural networks can implement, and will find applications in all-optical image analysis, feature detection and object classification, also enabling new camera designs and optical components that can learn to perform unique tasks using D2NNs.
Article
Full-text available
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural networks, improving energy efficiency and throughput without sacrificing accuracy or increasing hardware cost, are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it provides an overview of DNNs, discusses various platforms and architectures that support DNNs, and highlights key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It also summarizes various development resources that enable researchers and practitioners to quickly get started on DNN design, and highlights important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Article
Full-text available
As society’s appetite for information continues to grow, so does our need to process this information with increasing speed and versatility. Many believe that the one-size-fits-all solution of digital electronics is becoming a limiting factor in certain areas such as data links, cognitive radio, and ultrafast control. Analog photonic devices have found relatively simple signal processing niches where electronics can no longer provide sufficient speed and reconfigurability. Recently, the landscape for commercially manufacturable photonic chips has been changing rapidly and now promises to achieve economies of scale previously enjoyed solely by microelectronics. By bridging the mathematical prowess of artificial neural networks to the underlying physics of optoelectronic devices, neuromorphic photonics could breach new domains of information processing demanding significant complexity, low cost, and unmatched speed. In this article, we review the progress in neuromorphic photonics, focusing on photonic integrated devices. The challenges and design rules for optoelectronic instantiation of artificial neurons are presented. The proposed photonic architecture revolves around the processing network node composed of two parts: a nonlinear element and a network interface. We then survey excitable lasers in the recent literature as candidates for the nonlinear node and microring-resonator weight banks as the network interface. Finally, we compare metrics between neuromorphic electronics and neuromorphic photonics and discuss potential applications.
Article
Full-text available
We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low-precision (e.g., 1-bit) weights and activations at run-time. At train time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix-multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
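
For the 1-bit case, replacing arithmetic with bit-wise operations amounts to an XNOR followed by a population count: with ±1 values packed as bits, the dot product equals twice the number of matching bit positions minus the vector length. A minimal sketch of this idea (illustrative, not the authors' GPU kernel):

    import numpy as np

    def binary_dot(x_bits, w_bits, n):
        """Dot product of two ±1 vectors packed as n-bit integers:
        popcount(XNOR) counts matching positions; dot = 2*matches - n."""
        xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)
        matches = bin(xnor).count("1")
        return 2 * matches - n

    # Check against the ±1 arithmetic it replaces
    n = 16
    x = np.random.choice([-1, 1], n)
    w = np.random.choice([-1, 1], n)
    x_bits = sum(1 << i for i, v in enumerate(x) if v == 1)
    w_bits = sum(1 << i for i, v in enumerate(w) if v == 1)
    assert binary_dot(x_bits, w_bits, n) == int(np.dot(x, w))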
Article
Full-text available
Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing computer vision beyond human-level accuracy. The computational effort required by CNNs today calls for power-hungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-on-Chip (SoC) integration have achieved very promising results. Unfortunately, even these highly optimized engines are still above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. On the algorithmic side, highly competitive classification accuracy can be achieved by properly training CNNs with binary weights. This novel algorithmic approach brings major optimization opportunities in the arithmetic core, by removing the need for expensive multiplications, as well as in the weight storage and I/O costs. In this work, we present a HW accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE and with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V. Our accelerator outperforms the state of the art in terms of ASIC energy efficiency as well as area efficiency, with 61.2 TOp/s/W and 1135 GOp/s/MGE, respectively.
Article
Full-text available
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Article
Full-text available
Many advanced optical functions, including spatial mode converters, linear optics quantum computing gates, and arbitrary linear optical processors for communications and other applications could be implemented using meshes of Mach‐Zehnder interferometers in technologies such as silicon photonics, but performance is limited by beam splitters that deviate from the ideal 50:50 split. We propose a new architecture and a novel self‐adjustment approach that automatically compensate for imperfect fabricated split ratios anywhere from 85:15 to 15:85. The entire mesh can be both optimized and programmed after initial fabrication, with progressive algorithms, without calculations or calibration, and even using only sources and detectors external to the mesh. Hence, one universal field‐programmable linear array (FPLA) optical element could be mass‐fabricated, with broad process tolerances, and then configured automatically for a wide range of complex and precise linear optical functions.
Article
Full-text available
The spiking neural network architecture (SpiNNaker) project aims to deliver a massively parallel million-core computer whose interconnect architecture is inspired by the connectivity characteristics of the mammalian brain, and which is suited to the modeling of large-scale spiking neural networks in biological real time. Specifically, the interconnect allows the transmission of a very large number of very small data packets, each conveying explicitly the source, and implicitly the time, of a single neural action potential or “spike.” In this paper, we review the current state of the project, which has already delivered systems with up to 2500 processors, and present the real-time event-driven programming model that supports flexible access to the resources of the machine and has enabled its use by a wide range of collaborators around the world.
Article
Full-text available
In this paper, we describe the design of Neurogrid, a neuromorphic system for simulating large-scale neural models in real time. Neuromorphic systems realize the function of biological neural systems by emulating their structure. Designers of such systems face three major design choices: 1) whether to emulate the four neural elements—axonal arbor, synapse, dendritic tree, and soma—with dedicated or shared electronic circuits; 2) whether to implement these electronic circuits in an analog or digital manner; and 3) whether to interconnect arrays of these silicon neurons with a mesh or a tree network. The choices we made were: 1) we emulated all neural elements except the soma with shared electronic circuits; this choice maximized the number of synaptic connections; 2) we realized all electronic circuits except those for axonal arbors in an analog manner; this choice maximized energy efficiency; and 3) we interconnected neural arrays in a tree network; this choice maximized throughput. These three choices made it possible to simulate a million neurons with billions of synaptic connections in real time—for the first time—using 16 Neurocores integrated on a board that consumes three watts.
Article
Full-text available
This article presents a comprehensive overview of the hardware realizations of artificial neural network (ANN) models, known as hardware neural networks (HNN), appearing in academic studies as prototypes as well as in commercial use. HNN research has witnessed steady progress for more than two decades, though commercial adoption of the technology has been relatively slow. We study the overall progress in the field across all major ANN models, hardware design approaches, and applications. We outline underlying design approaches for mapping an ANN model onto a compact, reliable, and energy-efficient hardware entailing computation and communication and survey a wide range of illustrative examples. Chip design approaches (digital, analog, hybrid, and FPGA based) at neuronal level and as neurochips realizing complete ANN models are studied. We specifically discuss, in detail, neuromorphic designs including spiking neural network hardware, cellular neural network implementations, reconfigurable FPGA based implementations, in particular, for stochastic ANN models, and optical implementations. Parallel digital implementations employing bit-slice, systolic, and SIMD architectures, implementations for associative neural memories, and RAM based implementations are also outlined. We trace the recent trends and explore potential future research directions.
Article
We present an approach for the generation of an adaptive sigmoid-like and PReLU nonlinear activation function of an all-optical perceptron, exploiting the bistability of an injection-locked Fabry-Perot semiconductor laser. The profile of the activation function can be tailored by adjusting the injection-locked side-mode order, frequency detuning of the input optical signal, Henry factor, or bias current. The universal fitting function for both families of the activation functions is presented.
Article
NVIDIA A100 Tensor Core GPU is NVIDIA's latest flagship GPU. It has been designed with many new innovative features to provide performance and capabilities for HPC, AI, and data analytics workloads. Feature enhancements include a Third-Generation Tensor Core, new asynchronous data movement & programming model, enhanced L2 cache, HBM2 DRAM, and third-generation NVIDIA NVLink I/O.
Article
Artificial intelligence tasks across numerous applications require accelerators for fast and low-power execution. Optical computing systems may be able to meet these domain-specific needs but, despite half a century of research, general-purpose optical computing systems have yet to mature into a practical technology. Artificial intelligence inference, however, especially for visual computing applications, may offer opportunities for inference based on optical and photonic systems. In this Perspective, we review recent work on optical computing for artificial intelligence applications and discuss its promise and challenges.
Article
Neuromorphic computing takes inspiration from the brain to create energy-efficient hardware for information processing, capable of highly sophisticated tasks. Systems built with standard electronics achieve gains in speed and energy by mimicking the distributed topology of the brain. Scaling-up such systems and improving their energy usage, speed and performance by several orders of magnitude requires a revolution in hardware. We discuss how including more physics in the algorithms and nanoscale materials used for data processing could have a major impact in the field of neuromorphic computing. We review striking results that leverage physics to enhance the computing capabilities of artificial neural networks, using resistive switching materials, photonics, spintronics and other technologies. We discuss the paths that could lead these approaches to maturity, towards low-power, miniaturized chips that could infer and learn in real time.
Article
With an ongoing trend in computing hardware toward increased heterogeneity, domain-specific coprocessors are emerging as alternatives to centralized paradigms. The tensor core unit has been shown to outperform graphic processing units by almost 3 orders of magnitude, enabled by a stronger signal and greater energy efficiency. In this context, photons bear several synergistic physical properties while phase-change materials allow for local nonvolatile mnemonic functionality in these emerging distributed non-von Neumann architectures. While several photonic neural network designs have been explored, a photonic tensor core to perform tensor operations is yet to be implemented. In this manuscript, we introduce an integrated photonics-based tensor core unit by strategically utilizing (i) photonic parallelism via wavelength division multiplexing, (ii) high throughputs of 2 peta-operations per second enabled by tens-of-picosecond-short delays from optoelectronics and compact photonic integrated circuitry, and (iii) near-zero static power-consuming novel photonic multi-state memories based on phase-change materials featuring vanishing losses in the amorphous state. Combining these physical synergies of material, function, and system, we show, supported by numerical simulations, that the performance of this 4-bit photonic tensor core unit can be 1 order of magnitude higher for electrical data. The full potential of this photonic tensor processor is delivered for optical data being processed, where we find a 2–3 orders of magnitude higher performance (operations per joule), as compared to an electrical tensor core unit, while featuring similar chip areas. This work shows that photonic specialized processors have the potential to augment electronic systems and may perform exceptionally well in network-edge devices in the looming 5G networks and beyond.
Article
Machine learning has emerged as the dominant tool for implementing complex cognitive tasks that require supervised, unsupervised, and reinforcement learning. While the resulting machines have demonstrated in some cases even superhuman performance, their energy consumption has often proved to be prohibitive in the absence of costly supercomputers. Most state-of-the-art machine-learning solutions are based on memoryless models of neurons. This is unlike the neurons in the human brain that encode and process information using temporal information in spike events. The different computing principles underlying biological neurons and how they combine together to efficiently process information is believed to be a key factor behind their superior efficiency compared to current machine-learning systems.
Article
Convolutional Neural Networks (CNNs) are powerful and highly ubiquitous tools for extracting features from large datasets for applications such as computer vision and natural language processing. However, a convolution is a computationally expensive operation in digital electronics. In contrast, neuromorphic photonic systems, which have experienced a recent surge of interest over the last few years, propose higher bandwidth and energy efficiencies for neural network training and inference. Neuromorphic photonics exploits the advantages of optical electronics, including the ease of analog processing, and busing multiple signals on a single waveguide at the speed of light. Here, we propose a Digital Electronic and Analog Photonic (DEAP) CNN hardware architecture that has potential to be 2.8 to 14 times faster while maintaining the same power usage of current state-of-the-art graphical processing units (GPUs).
Article
Neuromorphic photonics relies on efficiently emulating analog neural networks at high speeds. Prior work showed that transducing signals from the optical to the electrical domain and back with transimpedance gain was an efficient approach to implementing analog photonic neurons and scalable networks. Here, we examine modulator-based photonic neuron circuits with passive and active transimpedance gains, with special attention to the sources of noise propagation. We find that a modulator nonlinear transfer function can suppress noise, which is necessary to avoid noise propagation in hardware neural networks. In addition, while efficient modulators can reduce power for an individual neuron, signal-to-noise ratios must be traded off with power consumption at a system level. Active transimpedance amplifiers may help relax this tradeoff for conventional p-n junction silicon photonic modulators, but a passive transimpedance circuit is sufficient when very efficient modulators (i.e. low C and low V-pi) are employed.
Article
Recently, along with the rapid development of mobile communication technology, edge computing theory and techniques have been attracting more and more attention from global researchers and engineers, which can significantly bridge the capacity of cloud and requirement of devices by the network edges, and thus can accelerate content delivery and improve the quality of mobile services. In order to bring more intelligence to edge systems, compared to traditional optimization methodology, and driven by the current deep learning techniques, we propose to integrate the Deep Reinforcement Learning techniques and Federated Learning framework with mobile edge systems, for optimizing mobile edge computing, caching and communication. And thus, we design the “In-Edge AI” framework in order to intelligently utilize the collaboration among devices and edge nodes to exchange the learning parameters for a better training and inference of the models, and thus to carry out dynamic system-level optimization and application-level enhancement while reducing the unnecessary system communication load. “In-Edge AI” is evaluated and proved to have near-optimal performance but relatively low overhead of learning, while the system is cognitive and adaptive to mobile communication systems. Finally, we discuss several related challenges and opportunities for unveiling a promising upcoming future of "In-Edge AI."
Article
We introduce an electro-optic hardware platform for nonlinear activation functions in optical neural networks. The optical-to-optical nonlinearity operates by converting a small portion of the input optical signal into an analog electric signal, which is used to intensity-modulate the original optical signal with no reduction in processing speed. Our scheme allows for complete nonlinear on-off contrast in transmission at relatively low optical power thresholds and eliminates the requirement of having additional optical sources between each layer of the network. Moreover, the activation function is reconfigurable via electrical bias, allowing it to be programmed or trained to synthesize a variety of nonlinear responses. Using numerical simulations, we demonstrate that this activation function significantly improves the expressiveness of optical neural networks, allowing them to perform well on two benchmark machine learning tasks: learning a multi-input exclusive-OR (XOR) logic function and classification of images of handwritten numbers from the MNIST dataset. The addition of the nonlinear activation function improves test accuracy on the MNIST task from 85% to 94%.
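
The scheme can be pictured as tapping a small fraction of the input optical power, photodetecting it, and using the resulting electrical signal to drive an intensity modulator acting on the remaining light. The transfer function below follows that structure with a generic Mach-Zehnder-like response and illustrative parameters; it is a sketch of the principle, not the paper's exact model.

    import numpy as np

    def electro_optic_activation(p_in, tap=0.1, gain=4.0, bias=0.25 * np.pi):
        """Sketch of an electro-optic activation (placeholder parameters):
        a fraction `tap` of the input power is photodetected, the resulting
        voltage (~ gain * tapped power) shifts the phase of a Mach-Zehnder
        modulator acting on the remaining (1 - tap) share of the light."""
        p_through = (1.0 - tap) * p_in
        phase = bias + gain * tap * p_in
        transmission = 0.5 * (1.0 + np.cos(phase))     # raised-cosine MZ response
        return p_through * transmission

    p = np.linspace(0.0, 2.0, 5)
    print(electro_optic_activation(p))                 # nonlinear; reshaped via `bias`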
Article
Neuromorphic photonics has experienced a recent surge of interest over the last few years, promising orders of magnitude improvements in both speed and energy efficiency over digital electronics. This paper provides a tutorial overview of neuromorphic photonic systems and their application to optimization and machine learning problems. We discuss the physical advantages of photonic processing systems, and we describe underlying device models that allow practical systems to be constructed. We also describe several real-world applications for control and deep learning inference. Lastly, we discuss scalability in the context of designing a full-scale neuromorphic photonic processing system, considering aspects such as signal integrity, noise, and hardware fabrication platforms. The paper is intended for a wide audience and teaches how theory, research, and device concepts from neuromorphic photonics could be applied in practical machine learning systems.
Article
With the recent successes of neural networks (NNs) in performing machine-learning tasks, photonic-based NN designs may enable high-throughput and low-power neuromorphic compute paradigms since they bypass the parasitic charging of capacitive wires. Thus, engineering data-information processors capable of executing NN algorithms with high efficiency is of major importance for applications ranging from pattern recognition to classification. Our hypothesis is, therefore, that if the time-limiting electro-optic conversion of current photonic NN designs could be postponed until the very end of the network, then the execution time of the photonic algorithm is simply the time-of-flight delay of photons through the NN, which is on the order of picoseconds for integrated photonics. Exploring such all-optical NNs, in this work we discuss two independent approaches for implementing the optical perceptron's nonlinear activation function based on nanophotonic structures exhibiting (i) induced transparency and (ii) reverse saturable absorption. Our results show that the all-optical nonlinearity provides about 3 and 7 dB extinction ratio for the two systems considered, respectively, and classification accuracies on an exemplary MNIST task of 97% and near 100% are found, which rivals that of software-trained NNs, albeit with noise in the network neglected. Together with a developed concept for an all-optical perceptron, these findings point to the possibility of realizing pure photonic NNs with potentially unmatched throughput and even energy consumption for next-generation information-processing hardware.
Article
Initially developed for gaming and 3-D rendering, graphics processing units (GPUs) were recognized to be a good fit to accelerate deep learning training. Its simple mathematical structure can easily be parallelized and can therefore take advantage of GPUs in a natural way. Further progress in compute efficiency for deep learning training can be made by exploiting the more random and approximate nature of deep learning work flows. In the digital space that means to trade off numerical precision for accuracy at the benefit of compute efficiency. It also opens the possibility to revisit analog computing, which is intrinsically noisy, to execute the matrix operations for deep learning in constant time on arrays of nonvolatile memories. To take full advantage of this in-memory compute paradigm, current nonvolatile memory materials are of limited use. A detailed analysis and design guidelines how these materials need to be reengineered for optimal performance in the deep learning space shows a strong deviation from the materials used in memory applications.
Article
Microring weight banks present novel opportunities for reconfigurable, high-performance analog signal processing in photonics. Controlling microring filter response is a challenge due to fabrication variations and thermal sensitivity. Prior work showed continuous weight control of multiple wavelength-division multiplexed signals in a bank of microrings based on calibration and feedforward control. Other prior work has shown resonance locking based on feedback control by monitoring photoabsorption-induced changes in resistance across in-ring photoconductive heaters. In this work, we demonstrate continuous, multi-channel control of a microring weight bank with an effective 5.1 bits of accuracy on 2Gbps signals. Unlike resonance locking, the approach relies on an estimate of filter transmission versus photo-induced resistance changes. We introduce an estimate still capable of providing 4.2 bits of accuracy without any direct transmission measurements. Furthermore, we present a detailed characterization of this response for different values of carrier wavelength offset and power. Feedback weight control renders tractable the weight control problem in reconfigurable analog photonic networks.
Article
A highly power-efficient silicon (Si) photonic PAM4 transmitter was developed by integrating a Si segmented Mach-Zehnder modulator and a CMOS driver chip. Si PIN-type phase shifters are directly driven with a CMOS inverter driver array to realize low-power operation. A passive RC equalizing technique was adopted to extend the modulation bandwidth up to 20 GHz while maintaining low power consumption. By integrating a passive RC filter within the photonic chip, we achieved a very compact footprint for the transmitter (450 × 950 μm). The fabricated modulator exhibited a low VπL of 0.19 V·cm and a moderate insertion loss of 23.7 dB/cm. The transmitter successfully demonstrated clear eye openings of the PAM4 signal up to 56 Gbps together with a record-high efficiency of 1.59 mW/Gbps. A low bit error rate below the KP4 FEC limit (< 2.0 × 10⁻⁴) was also confirmed at 50-Gbps PAM4 operation, even with an un-equalized receiver.
Article
Potential advantages of analog and mixed-signal nanoelectronic circuits, based on floating-gate devices with adjustable conductance, for neuromorphic computing were recognized a long time ago. However, practical realizations of this approach suffered from using rudimentary floating-gate cells of relatively large area. Here, we report a prototype 28 × 28 binary-input, ten-output, three-layer neuromorphic network based on arrays of highly optimized embedded nonvolatile floating-gate cells, redesigned from a commercial 180-nm NOR flash memory. All active blocks of the circuit, including 101,780 floating-gate cells, have a total area below 1 mm². The network has shown a 94.7% classification fidelity on the common Modified National Institute of Standards and Technology (MNIST) benchmark, close to the 96.2% obtained in simulation. The classification of one pattern takes a sub-1-μs time and a sub-20-nJ energy, both numbers much better than in the best reported digital implementations of the same task. Estimates show that a straightforward optimization of the hardware and its transfer to the already available 55-nm technology may increase this advantage to more than 10²× in speed and 10⁴× in energy efficiency.
Article
An analog implementation of a deep machine learning system for efficient feature extraction is presented in this work. It features online unsupervised trainability and non-volatile floating-gate analog storage. It utilizes a massively parallel reconfigurable current-mode analog architecture to realize efficient computation, and leverages algorithm-level feedback to provide robustness to circuit imperfections in analog signal processing. A 3-layer, 7-node analog deep machine-learning engine was fabricated in a 0.13 µm standard CMOS process, occupying 0.36 mm² of active area. At a processing speed of 8300 input vectors per second, it consumes 11.4 µW from the 3 V supply, achieving a peak energy efficiency of 1×10¹² operations per second per watt. Measurements demonstrate real-time cluster analysis, and feature extraction for pattern recognition with 8-fold dimensionality reduction, with an accuracy comparable to the floating-point software simulation baseline.
Article
We propose an on-chip optical architecture to support massive parallel communication among high-performance spiking laser neurons. Designs for a network protocol, computational element, and waveguide medium are described, and novel methods are considered in relation to prior research in optical on-chip networking, neural networking, and computing. Broadcast-and-weight is a new approach for combining neuromorphic processing and optoelectronic physics, a pairing that is found to yield a variety of advantageous features. We discuss properties and design considerations for architectures for scalable wavelength reuse and biologically relevant organizational capabilities, in addition to aspects of practical feasibility. Given recent developments commercial photonic systems integration and neuromorphic computing, we suggest that a novel approach to photonic spike processing represents a promising opportunity in unconventional computing.
Article
Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts.
Article
A single-channel, asynchronous successive-approximation (SA) ADC with improved feedback delay is fabricated in 40 nm CMOS. Compared with a conventional SAR structure that employs a single quantizer controlled by a digital feedback logic loop, the proposed SAR ADC employs multiple quantizers, one for each conversion bit, clocked by an asynchronous ripple clock that is generated after each quantization. Hence, the sampling rate of the 6-bit ADC is limited only by the six delays of the capacitive-DAC settling and each comparator's quantization delay, as the digital logic delay is eliminated. The 40 nm CMOS SAR ADC achieves a measured peak SNDR of 32.9 dB and 30.5 dB at 1 GS/s and 1.25 GS/s, consuming 5.28 mW and 6.08 mW, leading to an FoM of 148 fJ/conv-step and 178 fJ/conv-step, respectively, in a core area of less than 170 μm × 85 μm.
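
The quoted conversion figures of merit follow from the standard Walden FoM, P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76)/6.02; the quick check below reproduces the stated values to within rounding of the reported SNDR.

    def walden_fom_fj(p_mw, sndr_db, fs_gsps):
        """Walden FoM in fJ/conversion-step from power, SNDR and sampling rate."""
        enob = (sndr_db - 1.76) / 6.02
        return p_mw * 1e-3 / (2 ** enob * fs_gsps * 1e9) * 1e15

    print(walden_fom_fj(5.28, 32.9, 1.0))    # ≈ 146 fJ/conv-step (reported: 148)
    print(walden_fom_fj(6.08, 30.5, 1.25))   # ≈ 178 fJ/conv-step (reported: 178)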
Conference Paper
Modeling neural tissue is an important tool to investigate biological neural networks. Until recently, most of this modeling has been done using numerical methods. In the European research project "FACETS" this computational approach is complemented by different kinds of neuromorphic systems. A special emphasis lies in the usability of these systems for neuroscience. To accomplish this goal an integrated software/hardware framework has been developed which is centered around a unified neural system description language, called PyNN, that allows the scientist to describe a model and execute it in a transparent fashion on either a neuromorphic hardware system or a numerical simulator. A very large analog neuromorphic hardware system developed within FACETS is able to use complex neural models as well as realistic network topologies, i.e. it can realize more than 10000 synapses per neuron, to allow the direct execution of models which previously could have been simulated numerically only.
Article
Contents: Energy bands in solids; Transport phenomena in semiconductors; Junction-diode characteristics; Diode circuits; Transistor characteristics; Digital circuits; Integrated circuits: fabrication and characteristics; The transistor at low frequencies; Transistor biasing and thermal stability; Field-effect transistors; The transistor at high frequencies; Multistage amplifiers; Feedback amplifiers; Stability and oscillators; Operational amplifiers; Integrated circuits as analog system building blocks; Integrated circuits as digital system building blocks; Power circuits and systems; Physics of semiconductor devices.
T. B. Brown et al., "Language models are few-shot learners," arXiv preprint arXiv:2005.14165, 2020.