Article | Literature Review

Backpropagation and the brain

Authors:
Timothy P. Lillicrap, Adam Santoro, Luke Marris, Colin J. Akerman and Geoffrey Hinton

Abstract

During learning, the brain modifies synapses to improve behaviour. In the cortex, synapses are embedded within multilayered networks, making it difficult to determine the effect of an individual synaptic modification on the behaviour of the system. The backpropagation algorithm solves this problem in deep artificial neural networks, but historically it has been viewed as biologically problematic. Nonetheless, recent developments in neuroscience and the successes of artificial neural networks have reinvigorated interest in whether backpropagation offers insights for understanding learning in the cortex. The backpropagation algorithm learns quickly by computing synaptic updates using feedback connections to deliver error signals. Although feedback connections are ubiquitous in the cortex, it is difficult to see how they could deliver the error signals required by strict formulations of backpropagation. Here we build on past and recent developments to argue that feedback connections may instead induce neural activities whose differences can be used to locally approximate these signals and hence drive effective learning in deep networks in the brain.
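The closing idea, that differences in feedback-induced activity can locally approximate error signals, can be made concrete with a small numerical sketch. The example below is not from the paper; the two-layer network, the reuse of the transposed forward weights as the feedback path, and the nudging strength beta are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network x -> h -> y (all shapes are illustrative).
n_in, n_hid, n_out = 4, 8, 3
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))

x = rng.normal(size=n_in)
t = rng.normal(size=n_out)                     # target output

# Feedforward ("free") phase.
a1 = W1 @ x
h = np.tanh(a1)
y = W2 @ h

# Exact backpropagated error signal at the hidden layer.
e_out = y - t
delta_bp = (W2.T @ e_out) * (1.0 - h ** 2)     # chain rule through tanh

# Feedback-driven ("nudged") phase: feedback (here simply W2.T) pushes the
# hidden layer toward an activity state that would reduce the output error.
beta = 0.1                                     # small nudging strength (assumption)
h_nudged = np.tanh(a1 - beta * (W2.T @ e_out))

# The difference of the two activity states, available locally at the hidden
# layer, approximates the backpropagated signal as beta becomes small.
delta_diff = (h - h_nudged) / beta

print(np.corrcoef(delta_bp, delta_diff)[0, 1])  # close to 1 for small beta
```

For small beta the two signals become nearly proportional, which is the sense in which locally available activity differences could drive learning that tracks backpropagation.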


... We also address common misconceptions regarding perturbative training methods and show that they can indeed scale, contrary to prevailing sentiment in the field. 2,3 Designing dedicated neuromorphic hardware introduces several challenges, particularly when considering the methods for training such systems. Training an ML model amounts to minimizing a specified loss function. ...
... Critically, gradient estimation time was assumed to be a good proxy for training time. 2,3,26 In this paper, we test this assumption and find that the connection between gradient and training accuracy is much more nuanced. Our findings are bolstered by a number of recent papers that have also found perturbative techniques to be more effective than assumed [27][28][29][30] and similarly are convenient to implement in hardware. ...
... Finally, we introduce the distinction between gradient convergence and network accuracy. While the former has generally been the focal point for prior studies in perturbative training of neural networks, 3,26 our numerical studies in the following sections call into question whether this attention is fully warranted. ...
Article
Full-text available
In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic gradient descent. We extend the framework to include both weight and node perturbation and discuss the advantages and disadvantages of each approach. We investigate the time to train networks using MGD as a function of network size and task complexity. Previous research has suggested that perturbative training methods do not scale well to large problems since in these methods, the time to estimate the gradient scales linearly with the number of network parameters. However, in this work, we show that the time to reach a target accuracy—that is, actually solve the problem of interest—does not follow this undesirable linear scaling and in fact often decreases with network size. Furthermore, we demonstrate that MGD can be used to calculate a drop-in replacement for the gradient in stochastic gradient descent, and therefore, optimization accelerators such as momentum can be used alongside MGD, ensuring compatibility with existing machine learning practices. Our results indicate that MGD can efficiently train large networks on hardware, achieving accuracy comparable with backpropagation, thus presenting a practical solution for future neuromorphic computing systems.
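As a rough sketch of the perturbative idea behind such zeroth-order methods (this is not the authors' MGD implementation), the snippet below estimates the gradient of a toy loss from how a small random weight perturbation changes that loss, and feeds the estimate to plain gradient descent. The quadratic loss, perturbation scale and learning rate are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem (illustrative stand-in for a hardware loss).
X = rng.normal(size=(32, 5))
true_w = rng.normal(size=5)
y = X @ true_w

def loss(w):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(5)
lr, sigma = 0.05, 1e-3               # learning rate and perturbation scale (assumptions)

for step in range(2000):
    dw = rng.normal(size=w.shape) * sigma             # random probe of the parameters
    g_est = (loss(w + dw) - loss(w)) / sigma**2 * dw  # zeroth-order gradient estimate
    w -= lr * g_est                                   # plain SGD on the estimate

print(loss(w))   # ends up small, with no backward pass ever computed
```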
... Perhaps even more prominently, the reverse calculation of gradients in deep neural networks naturally requires knowledge of the forward weights [4][5][6]. While this weight transport is inconsequential when calculations are simply carried out by an arithmetic logic unit, models of error backpropagation in the brain require the corresponding backward transport circuitry to mirror the forward one [7][8][9]. ...
... Similar issues have initially led to a strong pushback against BP-like learning in the brain [9,24]. With the emergence of biologically plausible adaptations of BP [25][26][27][28][29][30][31][32][33][34][35][36][37], several issues of standard BP have been mitigated; however, many of these algorithms still (at least implicitly) rely on copying the weights from the bottom-up pathways to the top-down pathways for correct transportation of errors; for example, approaches such as the Kolen-Pollack algorithm and variants [31][32][33][34] defer the weight transport problem to a weight update transport problem. ...
... Each input consists of a voltage that is converted into spike trains using eqs. (9) and (10). The input for the training phases consists of nine equally spaced voltages between −3 and 3; the input for the validation consists of six voltages between −2.7 and 2.7. ...
Preprint
Full-text available
In both machine learning and in computational neuroscience, plasticity in functional neural networks is frequently expressed as gradient descent on a cost. Often, this imposes symmetry constraints that are difficult to reconcile with local computation, as is required for biological networks or neuromorphic hardware. For example, wake-sleep learning in networks characterized by Boltzmann distributions builds on the assumption of symmetric connectivity. Similarly, the error backpropagation algorithm is notoriously plagued by the weight transport problem between the representation and the error stream. Existing solutions such as feedback alignment tend to circumvent the problem by deferring to the robustness of these algorithms to weight asymmetry. However, they are known to scale poorly with network size and depth. We introduce spike-based alignment learning (SAL), a complementary learning rule for spiking neural networks, which uses spike timing statistics to extract and correct the asymmetry between effective reciprocal connections. Apart from being spike-based and fully local, our proposed mechanism takes advantage of noise. Based on an interplay between Hebbian and anti-Hebbian plasticity, synapses can thereby recover the true local gradient. This also alleviates discrepancies that arise from neuron and synapse variability -- an omnipresent property of physical neuronal networks. We demonstrate the efficacy of our mechanism using different spiking network models. First, we show how SAL can significantly improve convergence to the target distribution in probabilistic spiking networks as compared to Hebbian plasticity alone. Second, in neuronal hierarchies based on cortical microcircuits, we show how our proposed mechanism effectively enables the alignment of feedback weights to the forward pathway, thus allowing the backpropagation of correct feedback errors.
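The Kolen-Pollack mechanism mentioned in the excerpts above can be seen in a few lines: forward and feedback weights receive the same increment plus identical weight decay, so the feedback matrix converges toward the transpose of the forward one without any weight copying. The random stand-in for the shared learning signal, the decay constant and the step count are illustrative assumptions, not details of SAL or of the original algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

W = rng.normal(size=(10, 20))     # forward weights
B = rng.normal(size=(20, 10))     # feedback weights, initially unrelated to W.T

lam, lr = 0.1, 0.01               # weight decay and learning rate (assumptions)

def alignment(W, B):
    """Cosine similarity between B and W.T (1 means perfectly aligned)."""
    return np.sum(B * W.T) / (np.linalg.norm(B) * np.linalg.norm(W))

print(alignment(W, B))            # near 0 at the start

for step in range(5000):
    # Both matrices receive the *same* update (a random stand-in for the
    # learning signal) plus identical decay; their difference decays to zero.
    dW = rng.normal(size=W.shape)
    W += lr * (dW - lam * W)
    B += lr * (dW.T - lam * B)

print(alignment(W, B))            # approaches 1: B has aligned with W.T
```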
... The human cerebral cortex learns through updating synapses based on input signals that activate multiple regions involved in the learning process (Hebb, 2005;Markram et al., 1997;Bliss & Lømo, 1973). The complex multilayered structure of our brain's neural network makes it challenging to determine the exact mechanism responsible for learning in the brain (Lillicrap et al., 2020). A common way to address this problem in Deep Artificial Neural Networks is through a credit assignment algorithm responsible for updating synaptic connections based on feedback signals. ...
... Backpropagation, while efficient, is often criticized for its biological implausibility, especially regarding the weight transport problem (Grossberg, 1987;Crick, 1989;Schwartz, 1993). Unlike the brain, which uses asymmetrical feedback signals, backpropagation employs identical weights for forward and backward passes (Lillicrap et al., 2020). Efforts to create biologically plausible credit assignment methods focus on reducing weight transport. ...
... Feedback Alignment (FA) (Lillicrap et al., 2016) uses fixed random feedback matrices, avoids weight transport entirely and demonstrates that symmetry is unnecessary for training. This mechanism aligns forward synaptic connections with synthetic feedback, making errors derived by feedforward weights converge toward those calculated by synthetic backward matrices (Lillicrap et al., 2020). Additionally, biologically plausible methods could overcome backpropagation's sequential nature, which limits computational efficiency. ...
Preprint
Full-text available
Backpropagation (BP) has long been the predominant method for training neural networks due to its effectiveness. However, numerous alternative approaches, broadly categorized under feedback alignment, have been proposed, many of which are motivated by the search for biologically plausible learning mechanisms. Despite their theoretical appeal, these methods have consistently underperformed compared to BP, leading to a decline in research interest. In this work, we revisit the role of such methods and explore how they can be integrated into standard neural network training pipelines. Specifically, we propose fine-tuning BP-pre-trained models using Sign-Symmetry learning rules and demonstrate that this approach not only maintains performance parity with BP but also enhances robustness. Through extensive experiments across multiple tasks and benchmarks, we establish the validity of our approach. Our findings introduce a novel perspective on neural network training and open new research directions for leveraging biologically inspired learning rules in deep learning.
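A minimal sketch of the sign-symmetry idea referenced above: the error is fed back through the element-wise signs of the forward weights rather than the weights themselves, so only the sign pattern needs to be shared between the forward and feedback pathways. Layer sizes and weight scales below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

n_in, n_hid, n_out = 20, 50, 10
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))

x = rng.normal(size=n_in)
t = rng.normal(size=n_out)

h = np.tanh(W1 @ x)
y = W2 @ h
e = y - t

# Exact backpropagation feeds the error back through the transposed forward weights...
delta_bp = (W2.T @ e) * (1 - h ** 2)

# ...whereas sign-symmetry uses only the signs of those weights.
delta_ss = (np.sign(W2).T @ e) * (1 - h ** 2)

# The two hidden-layer signals are positively correlated (roughly 0.8 for
# Gaussian weights), which is why sign-symmetric feedback can still drive
# useful weight updates.
print(np.corrcoef(delta_bp, delta_ss)[0, 1])
```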
... A core task in neural network training is synaptic "credit assignment" for error in downstream layers [1][2][3]. Although backpropagation has been established as the standard approach to this problem, its biological plausibility has been questioned [3][4][5]. ...
... A core task in neural network training is synaptic "credit assignment" for error in downstream layers [1][2][3]. Although backpropagation has been established as the standard approach to this problem, its biological plausibility has been questioned [3][4][5]. Backpropagation requires bidirectional synaptic communication, which is incompatible with the unidirectional transmission of neural action potentials [3]. Consequently, backpropagation would require either symmetric neural connectivity or a parallel reversed network for error feedback to earlier layers [3,6,7]. ...
... Although backpropagation has been established as the standard approach to this problem, its biological plausibility has been questioned [3][4][5]. Backpropagation requires bidirectional synaptic communication, which is incompatible with the unidirectional transmission of neural action potentials [3]. Consequently, backpropagation would require either symmetric neural connectivity or a parallel reversed network for error feedback to earlier layers [3,6,7]. ...
Preprint
Full-text available
State-of-the-art methods for backpropagation-free learning employ local error feedback to direct iterative optimisation via gradient descent. In this study, we examine the more restrictive setting where retrograde communication from neuronal outputs is unavailable for pre-synaptic weight optimisation. To address this challenge, we propose Forward Projection (FP). This novel randomised closed-form training method requires only a single forward pass over the entire dataset for model fitting, without retrograde communication. Target values for pre-activation membrane potentials are generated layer-wise via nonlinear projections of pre-synaptic inputs and the labels. Local loss functions are optimised over pre-synaptic inputs using closed-form regression, without feedback from neuronal outputs or downstream layers. Interpretability is a key advantage of FP training; membrane potentials of hidden neurons in FP-trained networks encode information which is interpretable layer-wise as label predictions. We demonstrate the effectiveness of FP across four biomedical datasets. In few-shot learning tasks, FP yielded more generalisable models than those optimised via backpropagation. In large-sample tasks, FP-based models achieve generalisation comparable to gradient descent-based local learning methods while requiring only a single forward propagation step, achieving significant speed up for training. Interpretation functions defined on local neuronal activity in FP-based models successfully identified clinically salient features for diagnosis in two biomedical datasets. Forward Projection is a computationally efficient machine learning approach that yields interpretable neural network models without retrograde communication of neuronal activity during training.
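To make the forward-only, closed-form flavour of this approach concrete, here is a generic sketch of layer-wise fitting with label-informed random targets and ridge regression. It illustrates the general idea only and is not the authors' Forward Projection method; the data, projection matrices, tanh nonlinearity, ridge penalty and layer widths are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy classification data: 3 well-separated Gaussian classes (illustrative).
means = rng.normal(scale=2.0, size=(3, 30))
labels = rng.integers(0, 3, size=200)
X = means[labels] + rng.normal(size=(200, 30))
Y = np.eye(3)[labels]                        # one-hot labels

def ridge_fit(A, T, lam=1e-2):
    """Closed-form ridge regression: argmin_W ||A W - T||^2 + lam ||W||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ T)

H = X
for width in (64, 32):
    # Layer-wise targets from a random nonlinear projection of the layer's
    # inputs and the labels (an illustrative choice, not the paper's recipe).
    R = rng.normal(size=(H.shape[1] + Y.shape[1], width)) / np.sqrt(H.shape[1])
    Z = np.tanh(np.concatenate([H, Y], axis=1) @ R)
    W = ridge_fit(H, Z)                      # one closed-form fit, no feedback
    H = np.tanh(H @ W)                       # forward pass to the next layer

W_out = ridge_fit(H, Y)                      # closed-form readout
print(np.mean((H @ W_out).argmax(axis=1) == labels))   # high training accuracy
```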
... In particular, the backpropagation algorithm requires information that is spatially nonlocal (e.g., encoded by synaptically distant neurons) and collected at different timepoints to be accurately distributed to all neurons in the network (Lillicrap et al., 2020). The idea that apical dendrites might be key to addressing this problem has gained a lot of traction in the past decade. ...
... Learning in predictive coding is thought to rely on Hebbian plasticity rules, but involves several problematic assumptions. First, this learning algorithm assumes that feedforward and feedback weights are symmetric, which is also called the "weight transport problem" (Lillicrap et al., 2020). This pre-established perfect weight symmetry is implausible, but it has been shown that through an additional weight decay term, weights can be aligned sufficiently to enable learning without symmetry (Alonso and Neftci, 2021). ...
Preprint
Full-text available
This review synthesizes advances in predictive processing within the sensory cortex. Predictive processing theorizes that the brain continuously predicts sensory inputs, refining neuronal responses by highlighting prediction errors. We identify key computational primitives, such as stimulus adaptation, dendritic computation, excitatory/inhibitory balance and hierarchical processing, as central to this framework. Our review highlights convergences, such as top-down inputs and inhibitory interneurons shaping mismatch signals, and divergences, including species-specific hierarchies and modality-dependent layer roles. To address these conflicts, we propose experiments in mice and primates using in-vivo two-photon imaging and electrophysiological recordings to test whether temporal, motor, and omission mismatch stimuli engage shared or distinct mechanisms. The resulting dataset, collected and shared via the OpenScope program, will enable model validation and community analysis, fostering iterative refinement and refutability to decode the neural circuits of predictive processing.
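The weight-symmetry issue raised in the excerpt above comes from the structure of a textbook predictive-coding circuit: the same weights that generate the top-down prediction are reused, transposed, to propagate the prediction error during inference, while learning is a Hebbian product of error and activity. A minimal single-layer sketch, with sizes and rates chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# One predictive-coding layer: latent causes r predict the input x through W.
n_in, n_lat = 16, 8
W = rng.normal(scale=0.1, size=(n_in, n_lat))    # top-down prediction weights
x = rng.normal(size=n_in)

r = np.zeros(n_lat)
lr_r, lr_W = 0.1, 0.01                           # inference / learning rates (assumptions)

for step in range(200):
    e = x - W @ r                 # prediction error, computed locally
    r += lr_r * (W.T @ e)         # inference: the error is fed back through W.T
    W += lr_W * np.outer(e, r)    # learning: Hebbian product of error and activity

print(np.linalg.norm(x - W @ r))  # the prediction error shrinks as r and W adapt
```

The appearance of W.T in the inference step is exactly the implicit symmetry assumption discussed above.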
... Extensive research explores whether and how BP can be realized in biological neural networks (Lillicrap et al., 2020). Pineda proposed a neural network model formulated as a dynamical system, where each node's activity evolves over time (Pineda, 1987). ...
... These challenges drive the search for alternative learning algorithms. Beyond biological plausibility (Lillicrap et al., 2020;Ernoult et al., 2022;Hinton, 2022), practical effectiveness in solving real-world problems is also a key requirement for new methods. ...
Preprint
Efficient training of artificial neural networks remains a key challenge in deep learning. Backpropagation (BP), the standard learning algorithm, relies on gradient descent and typically requires numerous iterations for convergence. In this study, we introduce Expectation Reflection (ER), a novel learning approach that updates weights multiplicatively based on the ratio of observed to predicted outputs. Unlike traditional methods, ER maintains consistency without requiring ad hoc loss functions or learning rate hyperparameters. We extend ER to multilayer networks and demonstrate its effectiveness in performing image classification tasks. Notably, ER achieves optimal weight updates in a single iteration. Additionally, we reinterpret ER as a modified form of gradient descent incorporating the inverse mapping of target propagation. These findings suggest that ER provides an efficient and scalable alternative for training neural networks.
... Modern deep learning methods rely on high-dimensional gradients that propagate backward through layers, ensuring a precise credit assignment to each parameter [1,2]. However, this process is computationally intensive and biologically implausible, given the lack of direct evidence for backpropagation in the brain [3,4]. If the loss gradient does not propagate back through the network, it must be obtained through alternative routes [5]. ...
... 3. We show that more complex yet highly useful architectures, such as convolutional networks and transformers, can also be effectively trained with low-dimensional feedback. 4. We reveal that error dimensionality shapes the receptive fields in a model of the ventral visual system, offering new insights into the relationship between learning mechanisms and biological neural representations. ...
Preprint
Full-text available
Training deep neural networks typically relies on backpropagating high-dimensional error signals, a computationally intensive process with little evidence supporting its implementation in the brain. However, since most tasks involve low-dimensional outputs, we propose that low-dimensional error signals may suffice for effective learning. To test this hypothesis, we introduce a novel local learning rule based on Feedback Alignment that leverages indirect, low-dimensional error feedback to train large networks. Our method decouples the backward pass from the forward pass, enabling precise control over error signal dimensionality while maintaining high-dimensional representations. We begin with a detailed theoretical derivation for linear networks, which forms the foundation of our learning framework, and extend our approach to nonlinear, convolutional, and transformer architectures. Remarkably, we demonstrate that even minimal error dimensionality on the order of the task dimensionality can achieve performance matching that of traditional backpropagation. Furthermore, our rule enables efficient training of convolutional networks, which have previously been resistant to Feedback Alignment methods, with minimal error. This breakthrough not only paves the way toward more biologically accurate models of learning but also challenges the conventional reliance on high-dimensional gradient signals in neural network training. Our findings suggest that low-dimensional error signals can be as effective as high-dimensional ones, prompting a reevaluation of gradient-based learning in high-dimensional systems. Ultimately, our work offers a fresh perspective on neural network optimization and contributes to understanding learning mechanisms in both artificial and biological systems.
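A small sketch of the underlying mechanism, direct feedback alignment with a task-dimensional error: the output error, here only 2-dimensional, is delivered to every hidden layer through fixed random matrices instead of transposed forward weights. The regression task, layer sizes and learning rate are illustrative assumptions rather than the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(6)

# Network 100 -> 256 -> 256 -> 2; the fed-back error is only 2-dimensional.
sizes = [100, 256, 256, 2]
Ws = [rng.normal(scale=1/np.sqrt(m), size=(n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
Bs = [rng.normal(scale=0.1, size=(n, sizes[-1])) for n in sizes[1:-1]]  # fixed feedback

def forward(x):
    hs = [x]
    for W in Ws[:-1]:
        hs.append(np.tanh(W @ hs[-1]))
    return hs, Ws[-1] @ hs[-1]

X = rng.normal(size=(500, sizes[0]))
T = np.stack([np.sin(X[:, 0]), np.cos(X[:, 1])], axis=1)   # toy 2-D targets
lr = 0.01

for epoch in range(20):
    for x, t in zip(X, T):
        hs, y = forward(x)
        e = y - t                                   # low-dimensional error
        Ws[-1] -= lr * np.outer(e, hs[-1])          # delta rule at the output
        for i, (W, B) in enumerate(zip(Ws[:-1], Bs)):
            delta = (B @ e) * (1 - hs[i + 1] ** 2)  # direct random feedback, no W.T
            W -= lr * np.outer(delta, hs[i])

mse = np.mean([(forward(x)[1] - t) ** 2 for x, t in zip(X, T)])
print(mse)   # decreases over training despite the 2-D feedback signal
```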
... In machine learning, Artificial Neural Networks (ANNs), the core of most solutions in this field, are a very simple model of the brain. However, it has been argued that the algorithm responsible for training these structures is very different from what happens in the brain [1]. This well-known algorithm is called backpropagation (BP) [2]. ...
... This approach is different from the traditional BP algorithm, which employs the CE loss function (but is not limited to this loss) at the end of the network. From this point of view, FF is more similar to brain function, whereas BP has been shown not to be [1]. ...
Preprint
Although backpropagation is widely accepted as a training algorithm for artificial neural networks, researchers are always looking for inspiration from the brain to find ways with potentially better performance. Forward-Forward is a new training algorithm that is more similar to what occurs in the brain, although there is a significant performance gap compared to backpropagation. In the Forward-Forward algorithm, the loss functions are placed after each layer, and the updating of a layer is done using two local forward passes and one local backward pass. Forward-Forward is in its early stages and has been designed and evaluated on simple multi-layer perceptron networks to solve image classification tasks. In this work, we have extended the use of this algorithm to a more complex and modern network, namely the Vision Transformer. Inspired by insights from contrastive learning, we have attempted to revise this algorithm, leading to the introduction of Contrastive Forward-Forward. Experimental results show that our proposed algorithm performs significantly better than the baseline Forward-Forward leading to an increase of up to 10% in accuracy and boosting the convergence speed by 5 to 20 times on Vision Transformer. Furthermore, if we take Cross Entropy as the baseline loss function in backpropagation, it will be demonstrated that the proposed modifications to the baseline Forward-Forward reduce its performance gap compared to backpropagation on Vision Transformer, and even outperforms it in certain conditions, such as inaccurate supervision.
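For orientation, the core of the Forward-Forward recipe can be written down for a single layer: run a forward pass on "positive" and on "negative" data, and adjust the weights locally so that the layer's goodness (sum of squared activities) is high for the former and low for the latter. The toy data, threshold and learning rate below are illustrative choices, not those of the paper above.

```python
import numpy as np

rng = np.random.default_rng(7)

n_in, n_hid, theta, lr = 20, 64, 2.0, 0.03      # sizes, goodness threshold, step size

W = rng.normal(scale=1/np.sqrt(n_in), size=(n_hid, n_in))
pos = rng.normal(loc=+0.5, size=(256, n_in))    # stand-in for real data
neg = rng.normal(loc=-0.5, size=(256, n_in))    # stand-in for corrupted data

def goodness(h):
    return np.sum(h ** 2, axis=-1)

for epoch in range(100):
    for x, sign in ((pos, +1.0), (neg, -1.0)):
        a = x @ W.T
        h = np.maximum(a, 0.0)                               # ReLU activities
        p = 1 / (1 + np.exp(-sign * (goodness(h) - theta)))  # P(correct side of theta)
        # Local gradient step on the logistic loss -log p; only this layer's
        # input and activity are needed (two forward passes, one local update).
        g = (sign * (1 - p))[:, None] * 2 * h * (a > 0)
        W += lr * g.T @ x / len(x)

print(goodness(np.maximum(pos @ W.T, 0)).mean(),   # goodness of positives ends up
      goodness(np.maximum(neg @ W.T, 0)).mean())   # well above that of negatives
```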
... Neural networks (NN) play a crucial role in the integration of AI within CDSS [41]. These algorithms are designed to function in a way that mimics the pattern recognition capabilities of the human brain, as shown in Fig. 4. In health care, NNs can analyze complex data sets, such as medical images or patient histories, to detect patterns that may be overlooked by health care professionals. ...
... Recent evidence in large models demonstrates the potential of reinforcement learning in replacing classical supervised learning. Furthermore, there is no direct evidence that the brain uses a backprop-like algorithm for learning 55 , which raises questions about backpropagation's suitability for continual scenarios where precise gradient calculations conflict with the dynamic stability-plasticity balance. In addition, it has been shown that biological cognitive systems typically employ dual-process mechanisms 56 , i.e., a fast system quickly performs intuition-driven processing for familiar inputs, while a slow system engages in deliberative reasoning, drawing on memory retrieval and algorithmic processing. ...
Preprint
Humans and most animals inherently possess a distinctive capacity to continually acquire novel experiences and accumulate worldly knowledge over time. This ability, termed continual learning, is also critical for deep neural networks (DNNs) to adapt to the dynamically evolving world in open environments. However, DNNs notoriously suffer from catastrophic forgetting of previously learned knowledge when trained on sequential tasks. In this work, inspired by the interactive human memory and learning system, we propose a novel biomimetic continual learning framework that integrates semi-parametric memory and the wake-sleep consolidation mechanism. For the first time, our method enables deep neural networks to retain high performance on novel tasks while maintaining prior knowledge in real-world challenging continual learning scenarios, e.g., class-incremental learning on ImageNet. This study demonstrates that emulating biological intelligence provides a promising path to enable deep neural networks with continual learning capabilities.
... A key strength of RCs lies in their training efficiency; by keeping the reservoir fixed and training only the output layer, RCs achieve faster training and lower computational demand [3][4][5] compared with other machine learning approaches, such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and transformers [6]. These alternatives typically rely on feed-forward architectures requiring extensive multiple-layer training via back-propagation [7] and substantial data and computational resources. ...
Preprint
Full-text available
Reservoir computers (RCs) provide a computationally efficient alternative to deep learning while also offering a framework for incorporating brain-inspired computational principles. By using an internal neural network with random, fixed connections-the 'reservoir'-and training only the output weights, RCs simplify the training process but remain sensitive to the choice of hyperparameters that govern activation functions and network architecture. Moreover, typical RC implementations overlook a critical aspect of neuronal dynamics: the balance between excitatory and inhibitory (E-I) signals, which is essential for robust brain function. We show that RCs characteristically perform best in balanced or slightly over-inhibited regimes, outperforming excitation-dominated ones. To reduce the need for precise hyperparameter tuning, we introduce a self-adapting mechanism that locally adjusts E/I balance to achieve target neuronal firing rates, improving performance by up to 130% in tasks like memory capacity and time series prediction compared with globally tuned RCs. Incorporating brain-inspired heterogeneity in target neuronal firing rates further reduces the need for fine-tuning hyperparameters and enables RCs to excel across linear and non-linear tasks. These results support a shift from static optimization to dynamic adaptation in reservoir design, demonstrating how brain-inspired mechanisms improve RC performance and robustness while deepening our understanding of neural computation.
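The training-efficiency point in the excerpt above (a fixed random reservoir with only the linear readout trained) is easy to see in a minimal echo state network. The reservoir size, spectral radius, washout length and the one-step-ahead prediction task are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

# Fixed random reservoir; only the linear readout will be trained.
n_res, rho = 200, 0.9
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W_res = rng.normal(size=(n_res, n_res))
W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))    # set the spectral radius

# Task: predict the next value of a simple quasi-periodic signal.
T = 2000
u = np.sin(0.2 * np.arange(T)) + 0.5 * np.sin(0.311 * np.arange(T))

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T - 1):
    x = np.tanh(W_res @ x + W_in * u[t])    # reservoir update, never trained
    states[t] = x

# Closed-form ridge regression for the readout weights only.
washout = 100
A, y = states[washout:T - 1], u[washout + 1:T]
W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ y)

print(np.mean((A @ W_out - y) ** 2))        # small one-step prediction error
```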
... Authors claim that standard ML algorithms for optimization like backpropagation may have, at least partially, some correspondences in biological brains and they stress the importance of determining "whether and how brains implement these algorithms". Indeed, the bio-plausibility of backpropagation has been a historically contentious matter, but one that has of late been successfully defended [11]. The use of DL as a "framework of neuroscience" has recently been advocated in [12], arguing that the same building blocks investigated for DL design: objective functions, learning rules and model architectures, would benefit Systems Neuroscience Research. ...
... Although BP-based approaches achieve strong performance, they face criticism for their lack of biological plausibility and computational inefficiency [12]- [16]. These challenges include issues such as the weight transport problem [17]- [19], global error propagation [14], [20], and reliance on explicit gradient calculations [21], [22]. In response, recent research has investigated alternative training methods to overcome the limitations of BP-based methods. ...
Preprint
Full-text available
Recent research has shown the vulnerability of Spiking Neural Networks (SNNs) under adversarial examples that are nearly indistinguishable from clean data in the context of frame-based and event-based information. The majority of these studies are constrained in generating adversarial examples using Backpropagation Through Time (BPTT), a gradient-based method which lacks biological plausibility. In contrast, local learning methods, which relax many of BPTT's constraints, remain under-explored in the context of adversarial attacks. To address this problem, we examine adversarial robustness in SNNs through the framework of four types of training algorithms. We provide an in-depth analysis of the ineffectiveness of gradient-based adversarial attacks to generate adversarial instances in this scenario. To overcome these limitations, we introduce a hybrid adversarial attack paradigm that leverages the transferability of adversarial instances. The proposed hybrid approach demonstrates superior performance, outperforming existing adversarial attack methods. Furthermore, the generalizability of the method is assessed under multi-step adversarial attacks, adversarial attacks in black-box FGSM scenarios, and within the non-spiking domain.
... Reciprocal interareal connections have been characterized as conveying predictions or expectations and errors respectively. However, it remains unclear how local circuits process predictions, particularly whether or how they explicitly or implicitly compute differences from local activity or other afferent signals (de Lange, Heilbron, and Kok 2018;Keller and Mrsic-Flogel 2018;Marques et al. 2018;Sacramento et al. 2018;Whittington and Bogacz 2019;Jordan and Keller 2020;Lillicrap et al. 2020;Payeur et al. 2021;Garner and Keller 2022;Greedy et al. 2022;Hertäg and Clopath 2022;Audette and Schneider 2023;Aceituno et al. 2024;Dias et al. 2024;Ellenberger et al. 2024;Furutachi et al. 2024;Seignette et al. 2024). A clearer picture of the functional role of feedback connections will certainly come from their direct manipulations in awake primates (Debes and Dragoi 2023;Andrei and Dragoi 2025), complemented by computational perspectives (Tugsbayar et al. 2025). ...
Preprint
Full-text available
Neuronal circuits of the cerebral cortex are the structural basis of mammalian cognition. The same qualitative components and connectivity motifs are repeated across functionally specialized cortical areas and mammalian species, suggesting a single underlying algorithmic motif. Here, we propose a perspective on current knowledge of the cortical structure, from which we extract two core principles for computational modeling. The first principle is that cortical cell types fulfill distinct computational roles. The second principle is that cortical connectivity can be efficiently characterized by only a few canonical blueprints of connectivity between cell types. Starting with these two foundational principles, we outline a general framework for building functional and mechanistic models of cortical circuits.
... In this study, the activation functions include a linear function at the output layer, while hyperbolic tangent sigmoid functions are used in the other layers, and the data is scaled within [−1, 1]. The network's weights are trained using the Levenberg-Marquardt backpropagation algorithm [31], which minimizes the sum of squared errors as the loss function. The architecture of the neural networks was determined through a trial-and-error approach. ...
Preprint
Full-text available
This paper presents a novel method to optimize thermal balance in parabolic trough collector (PTC) plants. It uses a market-based system to distribute flow among loops combined with an artificial neural network (ANN) to reduce computation and data requirements. This auction-based approach balances loop temperatures, accommodating varying thermal losses and collector efficiencies. Validation across different thermal losses, optical efficiencies, and irradiance conditions-sunny, partially cloudy, and cloudy-show improved thermal power output and intercept factors compared to a no-allocation system. It demonstrates scalability and practicality for large solar thermal plants, enhancing overall performance. The method was first validated through simulations on a realistic solar plant model, then adapted and successfully tested in a 50 MW solar trough plant, demonstrating its advantages. Furthermore, the algorithms have been implemented, commissioned, and are currently operating in 13 commercial solar trough plants.
... While there is no evidence that backpropagation exists in natural intelligence (Lillicrap et al., 2020), some studies have put effort into designing biologically plausible forward-only learning algorithms. For example, Nøkland (2016) employs direct feedback alignment to train hidden layers independently. ...
Preprint
Full-text available
Differential privacy (DP) in deep learning is a critical concern as it ensures the confidentiality of training data while maintaining model utility. Existing DP training algorithms provide privacy guarantees by clipping and then injecting external noise into sample gradients computed by the backpropagation algorithm. Different from backpropagation, forward-learning algorithms based on perturbation inherently add noise during the forward pass and utilize randomness to estimate the gradients. Although these algorithms are non-privatized, the introduction of noise during the forward pass indirectly provides internal randomness protection to the model parameters and their gradients, suggesting the potential for naturally providing differential privacy. In this paper, we propose a privatized forward-learning algorithm, Differential Private Unified Likelihood Ratio (DP-ULR), and demonstrate its differential privacy guarantees. DP-ULR features a novel batch sampling operation with rejection, of which we provide theoretical analysis in conjunction with classic differential privacy mechanisms. DP-ULR is also underpinned by a theoretically guided privacy controller that dynamically adjusts noise levels to manage privacy costs in each training step. Our experiments indicate that DP-ULR achieves competitive performance compared to traditional differential privacy training algorithms based on backpropagation, maintaining nearly the same privacy loss limits.
... After feature extraction, facial data is recognized using ANN algorithms, namely backpropagation and CNN. Backpropagation algorithms learn quickly by computing synaptic updates using feedback connections to send error signals [27]. CNN was chosen as a classification method because of its compatibility with image data, where CNN can independently learn and extract features from an image [28]. ...
Article
Full-text available
Facial recognition is a biometric system used to identify individuals through faces. Although this technology has many advantages, it still faces several challenges. One of the main challenges is that the level of accuracy has yet to reach its maximum potential. This research aims to improve facial recognition performance by applying the discrete cosine transform (DCT) and Gaussian mixture model (GMM), which are then trained with backward propagation of errors (backpropagation) and convolutional neural networks (CNN). The research results show low DCT and GMM feature extraction accuracy with backpropagation of 4.88%. However, the combination of DCT, GMM, and CNN feature extraction produces an accuracy of up to 98.2% and a training time of 360 seconds on the Olivetti Research Laboratory (ORL) dataset, an accuracy of 98.9% and a training time of 1210 seconds on the Yale dataset, and 100% accuracy and training time 1749 seconds on the Japanese female facial expression (JAFFE) dataset. This improvement is due to the combination of DCT, GMM, and CNN's ability to remove noise and study images accurately. This research is expected to significantly contribute to overcoming accuracy challenges and increasing the flexibility of facial recognition systems in various practical situations, as well as the potential to improve security and reliability in security and biometrics.
... To go beyond backpropagation, biologically-inspired algorithms aim to replicate learning processes from the brain [14], [37], [48], a prime example of efficient learning in physical networks. In contrast to backpropagation, the brain is hypothesized to learn solely through local neuronal activity, avoiding any energy-intensive data shuffling [27], [36]. As research interest in in-memory computing architectures increases, previously separated information becomes physically locally available. ...
Preprint
Full-text available
Equilibrium Propagation (EP) is a supervised learning algorithm that trains network parameters using local neuronal activity. This is in stark contrast to backpropagation, where updating the parameters of the network requires significant data shuffling. Avoiding data movement makes EP particularly compelling as a learning framework for energy-efficient training on neuromorphic systems. In this work, we assess the ability of EP to learn on hardware that contain physical uncertainties. This is particularly important for researchers concerned with hardware implementations of self-learning systems that utilize EP. Our results demonstrate that deep, multi-layer neural network architectures can be trained successfully using EP in the presence of finite uncertainties, up to a critical limit. This limit is independent of the training dataset, and can be scaled through sampling the network according to the central limit theorem. Additionally, we demonstrate improved model convergence and performance for finite levels of uncertainty on the MNIST, KMNIST and FashionMNIST datasets. Optimal performance is found for networks trained with uncertainties close to the critical limit. Our research supports future work to build self-learning hardware in situ with EP.
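For readers unfamiliar with EP, the sketch below shows the two-phase recipe on a tiny symmetric network: relax to a free fixed point, relax again while the output is weakly nudged toward the target, and update every weight from the difference of local activity products between the two phases. The network size, nudging strength, relaxation schedule and the OR-like task are illustrative assumptions, not the hardware setting studied above.

```python
import numpy as np

rng = np.random.default_rng(9)

# Tiny Equilibrium Propagation demo: 2 input, 6 hidden, 1 output unit with
# symmetric weights; all sizes, rates and phase lengths are illustrative.
n_in, n_hid, n_out = 2, 6, 1
n = n_in + n_hid + n_out
W = rng.normal(scale=0.5, size=(n, n)); W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
out = slice(n - n_out, n)

rho = lambda s: np.clip(s, 0.0, 1.0)                       # hard-sigmoid activation
drho = lambda s: ((s >= 0) & (s <= 1)).astype(float)

def relax(x, t=None, beta=0.0, steps=100, dt=0.2):
    s = np.zeros(n)
    for _ in range(steps):
        s[:n_in] = x                                       # inputs stay clamped
        ds = drho(s) * (W @ rho(s)) - s                    # descend the energy
        if t is not None:                                  # nudged phase: pull the
            ds[out] += beta * (t - rho(s[out]))            # output toward the target
        s += dt * ds
    s[:n_in] = x
    return s

data = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 1.0)]   # OR task
lr, beta = 0.2, 0.5
for epoch in range(300):
    for x, t in data:
        s_free = relax(np.array(x, float))
        s_nudge = relax(np.array(x, float), t=np.array([t]), beta=beta)
        # Contrastive, local update: difference of activity products between phases.
        dW = (np.outer(rho(s_nudge), rho(s_nudge)) - np.outer(rho(s_free), rho(s_free))) / beta
        W += lr * dW
        W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)

for x, t in data:
    y = rho(relax(np.array(x, float)))[out][0]
    print(x, round(float(y), 2))     # outputs approach the OR targets 0, 1, 1, 1
```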
... However, computational limitations led to an "AI winter" where progress stalled. AI research revived in the 1980s with the introduction of machine learning and neural networks, with some exciting works in backpropagation [85], speech and image recognition, and robotic applications. Despite this progress, limited data and processing power continued to slow development. ...
Article
Full-text available
The Industrial Revolution (IR) involves a centuries-long process of economic and societal transformation driven by industrial and technological innovation. From agrarian, craft-based societies to modern systems powered by Artificial Intelligence (AI), each IR has brought significant societal advancements yet raised concerns about future implications. As we transition from the Fourth Industrial Revolution (IR4.0) to the emergent Fifth Industrial Revolution (IR5.0), similar questions arise regarding human employment, technological control, and adaptation. During all these shifts, a recurring theme emerges as we fear the unknown and bring a concern that machines may replace humans’ hard and soft skills. Therefore, comprehensive preparation, critical discussion, and future-thinking policies are necessary to successfully navigate any industrial revolution. While IR4.0 emphasized cyber-physical systems, IoT (Internet of Things), and AI-driven automation, IR5.0 aims to integrate these technologies, keeping human, emotion, intelligence, and ethics at the center. This paper critically examines this transition by highlighting the technological foundations, socioeconomic implications, challenges, and opportunities involved. We explore the role of AI, blockchain, edge computing, and immersive technologies in shaping IR5.0, along with workforce reskilling strategies to bridge the potential skills gap. Learning from historic patterns will enable us to navigate this era of change and mitigate any uncertainties in the future.
... Moreover, BP lacks support from biological evidence regarding how information is transmitted through the brain: signals in the cortex propagate in the forward direction across multiple pathways [36], and BP does not replicate this unidirectional flow. In order to provide a more faithful representation of the brain structure in neural networks, a novel algorithm, named Forward-Forward (FF) propagation, has been developed by Nobel Laureate Prof. Geoffrey Hinton to train fully-connected networks [37]. ...
... SPMs lack support for AD because they smooth the expected loss landscape through stochasticity and therefore require the calculation of expected values at the network output. While output smoothing allows the implementation of the finite difference algorithm, this algorithm does not scale to large models and is therefore of little practical use for training ANNs (Werfel et al., 2004;Lillicrap et al., 2020). The application of AD, however, requires differentiable models, like standard ANNs, so that the chain rule can be used to decompose the gradient computation into simple primitives. ...
Article
Full-text available
Training spiking neural networks to approximate universal functions is essential for studying information processing in the brain and for neuromorphic computing. Yet the binary nature of spikes poses a challenge for direct gradient-based training. Surrogate gradients have been empirically successful in circumventing this problem, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to the lack of support for automatic differentiation, are impractical for training multilayer spiking neural networks but provide derivatives equivalent to surrogate gradients for single neurons. On the other hand, we investigate stochastic automatic differentiation, which is compatible with discrete randomness but has not yet been used to train spiking neural networks. We find that the latter gives surrogate gradients a theoretical basis in stochastic spiking neural networks, where the surrogate derivative matches the derivative of the neuronal escape noise function. This finding supports the effectiveness of surrogate gradients in practice and suggests their suitability for stochastic spiking neural networks. However, surrogate gradients are generally not gradients of a surrogate loss despite their relation to stochastic automatic differentiation. Nevertheless, we empirically confirm the effectiveness of surrogate gradients in stochastic multilayer spiking neural networks and discuss their relation to deterministic networks as a special case. Our work gives theoretical support to surrogate gradients and the choice of a suitable surrogate derivative in stochastic spiking neural networks.
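The surrogate-gradient trick itself fits in a few lines: keep the hard threshold in the forward pass, but substitute a smooth surrogate for its derivative in the backward pass. The fast-sigmoid surrogate shape, the toy classification task and all constants below are common but illustrative choices, not taken from the article above.

```python
import numpy as np

rng = np.random.default_rng(10)

theta = 1.0                                           # firing threshold

def spike(v):
    return (v > theta).astype(float)                  # forward: Heaviside, derivative 0 a.e.

def surrogate_grad(v, slope=5.0):
    return 1.0 / (1.0 + slope * np.abs(v - theta)) ** 2   # backward: smooth stand-in

# Toy task: one "neuron" should spike for class-1 inputs and stay silent otherwise.
X = np.concatenate([rng.normal(+1.0, 1.0, size=(250, 8)),
                    rng.normal(-1.0, 1.0, size=(250, 8))])
y = np.concatenate([np.ones(250), np.zeros(250)])

w = rng.normal(scale=0.1, size=8)
lr = 0.2
for epoch in range(300):
    v = X @ w                                         # membrane potentials
    s = spike(v)                                      # non-differentiable spikes
    # Gradient of the squared error, with d(spike)/dv replaced by the surrogate.
    grad_w = (((s - y) * surrogate_grad(v))[:, None] * X).mean(axis=0)
    w -= lr * grad_w

print(np.mean(spike(X @ w) == y))    # accuracy well above chance
```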
... Most recent machine learning models have shown great effectiveness at solving a wide range of complex cognitive tasks (LeCun et al., 2015;Whittington & Bogacz, 2019), and backpropagation algorithms seem to be at the core of the majority of those models, proving it to be one of the most reliable and fast ways for machines to learn (Bartunov et al., 2018;Lillicrap et al., 2020;Marblestone et al., 2016). Visual pattern recognition is one of the many fields in which backpropagation algorithms thrive (Goodfellow et al., 2016;LeCun et al., 2015;Sutskever et al., 2013). ...
Article
Full-text available
A vast majority of the current research in the field of machine learning is done using algorithms with strong arguments pointing to their biological implausibility such as backpropagation, deviating the field’s focus from understanding its original organic inspiration to a compulsive search for optimal performance. Yet there have been a few proposed models that respect most of the biological constraints present in the human brain and are valid candidates for mimicking some of its properties and mechanisms. In this letter, we focus on guiding the learning of a biologically plausible generative model called the Helmholtz machine in complex search spaces using a heuristic based on the human image perception mechanism. We hypothesize that this model’s learning algorithm is not fit for deep networks due to its Hebbian-like local update rule, rendering it incapable of taking full advantage of the compositional properties that multilayer networks provide. We propose to overcome this problem by providing the network’s hidden layers with visual cues at different resolutions using multilevel data representation. The results on several image data sets showed that the model was able to not only obtain better overall quality but also a wider diversity in the generated images, corroborating our intuition that using our proposed heuristic allows the model to take more advantage of the network’s depth growth. More importantly, they show the unexplored possibilities underlying brain-inspired models and techniques.
... In order to train the multilayer feed-forward network, the backpropagation algorithm [47] is generally employed. This is a supervised learning method in which the network's output is compared to a known target during training to indicate how well the network is performing. ...
Preprint
In this work, we introduce Virology-Informed Neural Networks (VINNs), a powerful tool for capturing the intricate dynamics of viral infection when data for some compartments of the model are not available. VINNs, an extension of the widely known Physics-Informed Neural Networks (PINNs), offer an alternative approach to traditional numerical methods for solving systems of differential equations. We apply this VINN technique on a recently proposed hepatitis B virus (HBV) infection dynamics model to predict the transmission of the infection within the liver more accurately. This model consists of four compartments, namely uninfected and infected hepatocytes, rcDNA-containing capsids, and free viruses, along with the consideration of capsid recycling. Leveraging the power of VINNs, we study the impacts of variations in parameter range, experimental noise, data variability, network architecture, and learning rate in this work. In order to demonstrate the robustness and effectiveness of VINNs, we employ this approach on the data collected from nine HBV-infected chimpanzees, and it is observed that VINNs can effectively estimate the model parameters. VINNs reliably capture the dynamics of infection spread and accurately predict their future progression using real-world data. Furthermore, VINNs efficiently identify the most influential parameters in HBV dynamics based solely on experimental data from the capsid component. It is also expected that this framework can be extended beyond viral dynamics, providing a powerful tool for uncovering hidden patterns and complex interactions across various scientific and engineering domains.
... Unlike simple artificial neural networks, it uses both unsupervised and supervised learning strategies. Unsupervised learning, i.e., competitive learning [32] based on Euclidean distance (also called the k-means clustering algorithm) is used between the input and hidden layer, whereas supervised learning based on the back-propagation algorithm [33] is used between hidden and output layers. The Gaussian function [34,35] shown in Equation (5) is used as an activation function in the hidden layer and the linear activation function is used in the output layer. ...
Article
Full-text available
Electric load forecasting is an essential task for Distribution System Operators in order to achieve proper planning, high integration of small-scale production from renewable energy sources, and to define effective marketing strategies. In this framework, machine learning and data dimensionality reduction techniques can be useful for building more efficient tools for electrical energy load prediction. In this paper, a machine learning model based on a combination of a radial basis function neural network and an autoencoder is used to forecast the electric load on a 33/11 kV substation located in Godishala, Warangal, India. One year of historical data on an electrical substation and weather are considered to assess the effectiveness of the proposed model. The impact of weather, day, and season status on load forecasting is also considered. The input dataset dimensionality is reduced using autoencoder to build a light-weight machine learning model to be deployed on edge devices. The proposed methodology is supported by a comparison with the state of the art based on extensive numerical simulations.
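The hybrid scheme described in the excerpt above (unsupervised k-means between the input and hidden layer, Gaussian hidden units, supervised training of the output layer) corresponds to a classic radial basis function network. A minimal sketch with illustrative data, widths and rates:

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy 1-D regression task (illustrative).
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0])

# --- unsupervised stage: k-means picks the RBF centres ---
k, sigma = 12, 0.5
centres = X[rng.choice(len(X), k, replace=False)]
for _ in range(20):                                    # Lloyd's algorithm
    assign = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(assign == j):
            centres[j] = X[assign == j].mean(axis=0)

# --- hidden layer: Gaussian activation around each centre ---
def hidden(X):
    d2 = ((X[:, None, :] - centres[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# --- supervised stage: gradient descent (delta rule) on the linear readout only ---
H = hidden(X)
w, lr = np.zeros(k), 0.1
for epoch in range(2000):
    err = H @ w - y
    w -= lr * (H.T @ err) / len(X)

print(np.mean((H @ w - y) ** 2))   # small compared with the target's variance (~0.5)
```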
... The non-locality and update-locking features of BP, among others, have been argued as reasons that make BP unlikely as the learning rule used by the brain [19]. Different local learning mechanisms that may not rely on the propagation of errors using symmetric weights have been explored in many works [6,8,10,13,27]. ...
Conference Paper
Full-text available
Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption, particularly for on-device learning where computational resources are limited. Various alternatives to BP, including random feedback alignment, forward-forward, and local classifiers, have been explored to address these challenges. These methods have their advantages, but they can encounter difficulties when dealing with intricate visual tasks or demand considerable computational resources. In this paper, we propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain. LLS utilizes fixed periodic basis vectors to synchronize neuron activity within each layer, enabling efficient training without the need for additional trainable parameters. We demonstrate the effectiveness of LLS and its variations, LLS-M and LLS-MxM, on multiple image classification datasets, achieving accuracy comparable to BP with reduced computational complexity and minimal additional parameters. Specifically, LLS achieves comparable performance with up to 300× fewer multiply-accumulate (MAC) operations and half the memory requirements of BP. Furthermore, the performance of LLS on the Visual Wake Word (VWW) dataset highlights its suitability for on-device learning tasks, making it a promising candidate for edge hardware implementations.
... [Lillicrap et al., 2020] argue that the acquisition and structuring of subjective experience in the human cortex works in the same way as the error backpropagation mechanism widely used to train neural networks. Thus, the researchers note, the latter can be used to explain human neurocognitive processes. ...
Article
The article is devoted to the description of the differences in the conceptualization of space observed in informants, large language models and computer vision models capable of generating a text describing what they “saw”. We use the concept of a cognitive agent and substantiate the distinction between “natural vs artificial cognitive agent”: the first is understood as a person, the second is an AI model capable of making decisions and performing tasks adequately in a given situation. The aim of the study is to compare the ways of understanding the location of an object in space in natural cognitive agents and artificial cognitive agents of two types: large language models and models created for the Image to Text task. The main methods are the method of linguistic experiment and the method of semantic description based on the theory of topological semantics by L. Talmi. As stimulus material, six paintings from the collection of the State Hermitage Museum were used, divided into three groups: portraits, monofigure paintings on mythological or religious themes, and multifigure compositions. The participants of the experiments were: 63 informants (Mean age = 19.1, 48 females, 15 males), 5 LLMs, and 6 Image to Text models based on computer vision technology and capable of generating descriptions of recognized images in English. Using the typology of configurational topological schemes and “figure – background” type schemes, we compared the ways of understanding space that the models rely on. As a result, we have formulated a number of conclusions, the most important of which is that natural cognitive agents differ from artificial cognitive agents in their ability to integrate the process of conceptualization of an object in space into other cognitive processes: entity recognition and categorization, attention mechanisms, awareness of cause-and-effect relationships. Artificial cognitive agents are only learning such integrativity and mutual coordination: for example, generative models conceptualize objects they are unsure about, since these are products of hallucination, as objects with fuzzy boundaries, and Image to Text models combine a human and the most striking original detail of their environment into a single heterogeneous object, because they “believe” that this is the most important thing for description tasks.
... In addition, the lack of a perfect control-transformation mapping on typically noisy and lossy analogue computation platforms 22,23 means that the stringent requirements of standard BP cannot be met. The mismatch between the user's interpretation and the PNN's actual state can lead to failed training or dramatic performance degradation 24,25 . Therefore, some non-gradient-based [26][27][28][29][30][31] or model-free/stochastic methods [32][33][34][35] are proposed to bypass the requirement for formulating a model description. ...
Article
Full-text available
Photonic neural networks (PNNs) are fast in-propagation and high bandwidth paradigms that aim to popularize reproducible NN acceleration with higher efficiency and lower cost. However, the training of PNN is known to be challenging, where the device-to-device and system-to-system variations create imperfect knowledge of the PNN. Despite backpropagation (BP)-based training algorithms being the industry standard for their robustness, generality, and fast gradient convergence for digital training, existing PNN-BP methods rely heavily on accurate intermediate state extraction or extensive computational resources for deep PNNs (DPNNs). The truncated photonic signal propagation and the computation overhead bottleneck DPNN’s operation efficiency and increase system construction cost. Here, we introduce the asymmetrical training (AsyT) method, tailored for encapsulated DPNNs, where the signal is preserved in the analogue photonic domain for the entire structure. AsyT offers a lightweight solution for DPNNs with minimum readouts, fast and energy-efficient operation, and minimum system footprint. AsyT’s ease of operation, error tolerance, and generality aim to promote PNN acceleration in a widened operational scenario despite the fabrication variations and imperfect controls. We demonstrated AsyT for encapsulated DPNN with integrated photonic chips, repeatably enhancing the performance from in-silico BP for different network structures and datasets.
... Despite many successes in training spiking neural networks, backpropagation has faced persistent criticism for its lack of alignment with the mechanisms of the brain [5], raising doubts about its viability as a model for credit assignment in neural systems [6]. One of the major limitations lies in its requirement to explicitly store all neural activity for later use during synaptic adjustments. ...
Preprint
Full-text available
Spiking Neural Networks (SNNs) offer a biologically inspired computational paradigm that emulates neuronal activity through discrete spike-based processing. Despite their advantages, training SNNs with traditional backpropagation (BP) remains challenging due to computational inefficiencies and a lack of biological plausibility. This study explores the Forward-Forward (FF) algorithm as an alternative learning framework for SNNs. Unlike backpropagation, which relies on forward and backward passes, the FF algorithm employs two forward passes, enabling localized learning, enhanced computational efficiency, and improved compatibility with neuromorphic hardware. We introduce an FF-based SNN training framework and evaluate its performance across both non-spiking (MNIST, Fashion-MNIST, CIFAR-10) and spiking (Neuro-MNIST, SHD) datasets. Experimental results demonstrate that our model surpasses existing FF-based SNNs by over 5% on MNIST and Fashion-MNIST while achieving accuracy comparable to state-of-the-art backpropagation-trained SNNs. On more complex tasks such as CIFAR-10 and SHD, our approach outperforms other SNN models by up to 6% and remains competitive with leading backpropagation-trained SNNs. These findings highlight the FF algorithm's potential to advance SNN training methodologies and neuromorphic computing by addressing key limitations of backpropagation.
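As a rough illustration of the layer-local training principle described in this abstract, the sketch below implements a single rate-based Forward-Forward update in Python. It assumes a ReLU layer, a sum-of-squares "goodness", and arbitrary values for the threshold and learning rate, and it deliberately omits the spiking machinery that the cited work actually uses.

```python
import numpy as np

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One local Forward-Forward update for a single fully connected layer.

    'Goodness' is the sum of squared ReLU activities; the layer is nudged to
    push goodness above the threshold `theta` for positive (real) data and
    below it for negative data. This is a rate-based sketch of the FF rule,
    not the paper's spiking variant; all constants are illustrative.
    """
    def grad(x, sign):
        h = np.maximum(0.0, x @ W)                     # forward pass with ReLU
        g = (h ** 2).sum(axis=1)                       # goodness per sample
        p = 1.0 / (1.0 + np.exp(-sign * (g - theta)))  # prob. of the correct label
        dgoodness = -sign * (1.0 - p)                  # d(-log p)/d(goodness)
        dpre = 2.0 * h * dgoodness[:, None] * (h > 0)  # back through goodness and ReLU
        return x.T @ dpre / len(x)
    return W - lr * (grad(x_pos, +1.0) + grad(x_neg, -1.0))
```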
... To begin with, the chosen model is many orders of magnitude smaller than the number of neurons in the human brain, and it has been trained via backpropagation, an update mechanism which most likely does not exist in nature in the same form as it does in artificial neural networks. 32 Moreover, Figure 2 visualises the activations only from one layer of the model, even though many layers are stacked beneath it, each contributing to the final output (i.e., there are many heatmaps). Another key difference is input segmentation: for simplicity, I divided the text into equal-length chunks, which does not reflect how humans engage with texts. ...
Article
Full-text available
This article draws upon recent developments in cognitive neuroscience and natural language processing to contribute a techno-cognitive perspective into the ‘deep reading’ versus ‘surface reading’ debate in literary studies. Research at the intersection of humanities and sciences suggests that narrative experience, including both production (decoding) and reception (encoding) of stories, constitutes a sequentially and hierarchically complex process shaped simultaneously by socio-cultural contexts, sensory-emotional dynamics, and cognitive integration across multiple levels of complexity. This interdisciplinary view contrasts with traditional humanities methodologies such as area studies, which privileges identity-based accounts of literary phenomena, or Marxist genealogy, which neglects extra-political sources of meaning. The article surveys relevant research findings across multiple domains and discusses the hermeneutic implications of the techno-cognitive approach for literary studies, exemplified in a reading of Zhang Xianliang's 1985 novel Half of Man is Woman.
... However, supervised learning usually requires a large number of labels, while the brain can learn more efficiently with fewer (or sometimes without) labels. The necessity of backpropagation for training deep neural networks also challenges their feasibility in biological systems, although some recent studies proposed alternative methods (11,12). The potential performance of simpler biological circuits than deep neural networks for obtaining good representations, especially in an unsupervised manner and without backpropagation, has not been sufficiently addressed. ...
Article
Obtaining appropriate low-dimensional representations from high-dimensional sensory inputs in an unsupervised manner is essential for straightforward downstream processing. Although nonlinear dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE) have been developed, their implementation in simple biological circuits remains unclear. Here, we develop a biologically plausible dimensionality reduction algorithm compatible with t-SNE, which uses a simple three-layer feedforward network mimicking the Drosophila olfactory circuit. The proposed learning rule, described as three-factor Hebbian plasticity, is effective for datasets such as entangled rings and MNIST, comparable to t-SNE. We further show that the algorithm could operate in olfactory circuits in Drosophila by analyzing multiple experimental datasets from previous studies. Lastly, we suggest that the algorithm is also beneficial for association learning between inputs and rewards, allowing these associations to generalize to other inputs not yet associated with rewards.
... Training an NN involves iteratively adjusting its weights to minimize the error between predicted and actual outputs. This is achieved through backpropagation, an algorithm that computes error gradients and updates weights accordingly using optimization methods like Stochastic Gradient Descent (SGD) or Adam [31,32]. During training, the dataset is processed over multiple epochs, with data divided into shuffled mini-batches to balance memory usage and computational efficiency [33]. ...
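For readers unfamiliar with the training loop sketched in the passage above, the following minimal Python fragment shows one epoch of shuffled mini-batch stochastic gradient descent on a single linear layer with a mean-squared-error loss; the function name and all hyperparameters are illustrative choices rather than anything taken from the cited study.

```python
import numpy as np

def sgd_epoch(W, b, X, y, lr=0.1, batch_size=32, seed=0):
    """One epoch of shuffled mini-batch SGD on a single linear layer.

    Purely illustrative: a linear model trained with a mean-squared-error loss,
    standing in for the backprop + SGD loop described above. Hyperparameters
    (lr, batch_size) are arbitrary choices, not the cited work's.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))                 # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ W + b                           # forward pass
        err = pred - yb                             # error signal
        W -= lr * xb.T @ err / len(xb)              # gradient of 0.5*MSE w.r.t. W
        b -= lr * err.mean(axis=0)                  # gradient w.r.t. bias
    return W, b
```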
Article
Full-text available
This study explores the extraction of remote Photoplethysmography (rPPG) signals from images using various neural network architectures, addressing the challenge of accurate signal estimation in biomedical contexts. The objective is to evaluate the effectiveness of different models in capturing rPPG signals from dataset snapshots. Two training strategies were investigated: pre-training models with only the fully connected layer being fine-tuned and training the entire network from scratch. The analysis reveals that models trained from scratch consistently outperform their pre-trained counterparts in extracting rPPG signals. Among the architectures assessed, DenseNet121 demonstrated superior performance, offering the most reliable results in this context. These findings underscore the potential of neural networks in advancing rPPG signal extraction, which has promising applications in fields such as clinical monitoring and personalized medical care. This study contributes to the integration of advanced imaging techniques and neural network-based analysis in biomedical engineering, paving the way for more robust and efficient methodologies.
... The above definitions of N-GNN, HN-GNN, and n-SHN-GNN assume typical forward-pass, layer-by-layer neural network operations. Training is done by gradient-based optimization (e.g., backpropagation [120,170,216,279]) on a loss function that measures predictive performance. The novel aspect is the representation of edges and vertices with (hyper)neutrosophic or n-superhyperneutrosophic sets of membership values, enabling richer modeling of uncertainty and ambiguity in graph-structured data. ...
Chapter
Full-text available
This paper delves into the advancements of classical set theory to address the complexities and uncertainties inherent in real-world phenomena. It highlights three major extensions of traditional set theory, Fuzzy Sets [288], Neutrosophic Sets [237], and Plithogenic Sets [243], and examines their further generalizations into Hyperfuzzy [106], HyperNeutrosophic [90], and Hyperplithogenic Sets [90]. Building on previous research [83], this study explores the potential applications of HyperNeutrosophic Sets and SuperHyperNeutrosophic Sets across various domains. Specifically, it extends fundamental concepts such as Neutrosophic Logic, Cognitive Maps, Graph Neural Networks, Classifiers, and Triplet Groups through these advanced set structures and briefly analyzes their mathematical properties.
Article
Robotic vision is essential for enabling intelligent and autonomous systems across diverse applications, including manufacturing, healthcare, autonomous navigation, and surveillance. However, conventional vision systems, which rely on rigid imaging hardware, face challenges such as limited adaptability, high energy consumption, and processing latency. Recently, flexible and stretchable photodetectors (PDs) have emerged as promising alternatives due to their advantages over rigid counterparts, making them ideal for robotic vision that requires multifunctionality and high energy efficiency to perform environment‐specific tasks. Despite their potential, current research studies on deformable PDs have largely focused on improving basic properties such as softness and responsivity, limiting their practical implementation in robotic vision. To unlock their full potential, next‐generation flexible and stretchable vision systems must integrate advanced image acquisition and processing capabilities. This review explores recent progress in vision systems with a focus on these two aspects. First, we examine bio‐inspired vision systems that mimic structural and functional features of biological eyes to enhance image acquisition. Next, we describe vision systems integrated with in‐sensor computing architecture that enables simultaneous image acquisition and processing. Finally, we discuss remaining challenges and propose future directions for developing next‐generation flexible and stretchable vision systems to meet the growing demands of advanced robotic vision.
Article
Accurate wind speed prediction is crucial for optimizing renewable energy utilization and enhancing operational safety in wind farms. However, existing methods face challenges due to data noise, mode mixing in decomposition, and limited model adaptability for multi-step forecasting. This paper proposes a novel hybrid framework (HPMTC-CVMD-IBTA) integrating three innovations: (1) A spatial-temporal denoising method (HPMTC) combining high-order polynomial fitting with M-estimator correction and temporal clustering to preserve signal integrity while removing noise; (2) A decomposition-optimization approach (CVMD) that adaptively weights variational mode decomposition (VMD) components via convolutional neural networks, reducing reconstruction errors compared to traditional methods; and (3) An Informer-BiGRU-Temporal Attention (IBTA) model that leverages multi-variable dependencies and long-sequence patterns through bidirectional gated units and attention mechanisms. Experiments on real-world wind farm datasets (Guangdong and Gansu, China) demonstrate the framework’s superiority: It achieves over 99% prediction accuracy (R2), reduces MAE by 15–40% against benchmarks (e.g., LSTM, BiGRU), and improves multi-step forecasting robustness across seasons. The proposed system addresses critical limitations in noise sensitivity, decomposition instability, and temporal feature decay, offering a reliable solution for energy management and disaster prevention.
Article
Full-text available
Cognitive brain functions rely on experience-dependent internal representations of relevant information. Such representations are organized by attractor dynamics or other mechanisms that constrain population activity onto “neural manifolds”. Quantitative analyses of representational manifolds are complicated by their potentially complex geometry, particularly in the absence of attractor states. Here we trained juvenile and adult zebrafish in an odor discrimination task and measured neuronal population activity to analyze representations of behaviorally relevant odors in telencephalic area pDp, the homolog of piriform cortex. No obvious signatures of attractor dynamics were detected. However, olfactory discrimination training selectively enhanced the separation of neural manifolds representing task-relevant odors from other representations, consistent with predictions of autoassociative network models endowed with precise synaptic balance. Analytical approaches using the framework of manifold capacity revealed multiple geometrical modifications of representational manifolds that supported the classification of task-relevant sensory information. Manifold capacity predicted odor discrimination across individuals better than other descriptors of population activity, indicating a close link between manifold geometry and behavior. Hence, pDp and possibly related recurrent networks store information in the geometry of representational manifolds, resulting in joint sensory and semantic maps that may support distributed learning processes.
Preprint
Full-text available
Efficient spatial navigation is a hallmark of the mammalian brain, inspiring the development of neuromorphic systems that mimic biological principles. Despite progress, implementing key operations like back-tracing and handling ambiguity in bio-inspired spiking neural networks remains an open challenge. This work proposes a mechanism for activity back-tracing in arbitrary, uni-directional spiking neuron graphs. We extend the existing replay mechanism of the spiking hierarchical temporal memory (S-HTM) by our spike timing-dependent threshold adaptation (STDTA), which enables us to perform path planning in networks of spiking neurons. We further present an ambiguity dependent threshold adaptation (ADTA) for identifying places in an environment with less ambiguity, enhancing the localization estimate of an agent. Combined, these methods enable efficient identification of the shortest path to an unambiguous target. Our experiments show that a network trained on sequences reliably computes shortest paths with fewer replays than the steps required to reach the target. We further show that we can identify places with reduced ambiguity in multiple, similar environments. These contributions advance the practical application of biologically inspired sequential learning algorithms like the S-HTM towards neuromorphic localization and navigation.
Article
Full-text available
The description of a combination of technologies as ‘artificial intelligence’ (AI) is misleading. To ascribe intelligence to a statistical model without human attribution points towards an attempt at shifting legal, social, and ethical responsibilities to machines. This paper exposes the deeply flawed characterisation of AI and the unearned assumptions that are central to its current definition, characterisation, and efforts at controlling it. The contradictions in the framing of AI lie at the root of the incapacity to regulate it. A revival of applied definitional framing of AI across disciplines has produced a plethora of conceptions and inconclusiveness. Therefore, the research advances this position with two fundamental and interrelated arguments. First, the difficulty in regulating AI is tied to its characterisation as artificial intelligence. This has triggered existing and new conflicting notions of the meaning of ‘artificial’ and ‘intelligence’, which are broad and largely unsettled. Second, difficulties in developing a global consensus on responsible AI stem from this inconclusiveness. To advance these arguments, this paper utilises functional contextualism to analyse the fundamental nature and architecture of artificial intelligence and human intelligence. There is a need to establish a test for ‘artificial intelligence’ in order to ensure appropriate allocation of rights, duties, and responsibilities. Therefore, this research proposes, develops, and recommends an adaptive three-element, three-step threshold for achieving responsible artificial intelligence.
Article
Despite the impressive performance of biological and artificial networks, an intuitive understanding of how their local learning dynamics contribute to network-level task solutions remains a challenge to date. Efforts to bring learning to a more local scale have indeed led to valuable insights; however, a general constructive approach to describe local learning goals that is both interpretable and adaptable across diverse tasks is still missing. We have previously formulated a local information processing goal that is highly adaptable and interpretable for a model neuron with compartmental structure. Building on recent advances in Partial Information Decomposition (PID), we here derive a corresponding parametric local learning rule, which allows us to introduce “infomorphic” neural networks. We demonstrate the versatility of these networks to perform tasks from supervised, unsupervised, and memory learning. By leveraging the interpretable nature of the PID framework, infomorphic networks represent a valuable tool to advance our understanding of the intricate structure of local learning.
Article
Full-text available
Theoretical neuroscientists and machine learning researchers have proposed a variety of learning rules to enable artificial neural networks to effectively perform both supervised and unsupervised learning tasks. It is not always clear, however, how these theoretically-derived rules relate to biological mechanisms of plasticity in the brain, or how these different rules might be mechanistically implemented in different contexts and brain regions. This study shows that the calcium control hypothesis, which relates synaptic plasticity in the brain to the calcium concentration ([Ca²⁺]) in dendritic spines, can produce a diverse array of learning rules. We propose a simple, perceptron-like neuron model, the calcitron, that has four sources of [Ca²⁺]: local (following the activation of an excitatory synapse and confined to that synapse), heterosynaptic (resulting from the activity of other synapses), postsynaptic spike-dependent, and supervisor-dependent. We demonstrate that by modulating the plasticity thresholds and calcium influx from each calcium source, we can reproduce a wide range of learning and plasticity protocols, such as Hebbian and anti-Hebbian learning, frequency-dependent plasticity, and unsupervised recognition of frequently repeating input patterns. Moreover, by devising simple neural circuits to provide supervisory signals, we show how the calcitron can implement homeostatic plasticity, perceptron learning, and BTSP-inspired one-shot learning. Our study bridges the gap between theoretical learning algorithms and their biological counterparts, not only replicating established learning paradigms but also introducing novel rules.
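A hedged sketch of the calcium-threshold logic described in this abstract is given below: per-synapse calcium is summed from four hypothetical sources and compared against a depression and a potentiation threshold, following the calcium control hypothesis. The coefficients, thresholds, and function name are illustrative assumptions, not parameters from the cited model.

```python
import numpy as np

def calcitron_step(w, x, postsyn_spike, supervisor,
                   c_local=1.0, c_het=0.1, c_spike=0.6, c_sup=1.5,
                   theta_d=0.5, theta_p=1.2, eta=0.05):
    """One plasticity step of a calcitron-like neuron (hedged sketch).

    Per-synapse calcium is a weighted sum of four sources: local (that synapse's
    own input x_i), heterosynaptic (the other synapses' summed input), a
    postsynaptic-spike-dependent term, and a supervisor-dependent term.
    Following the calcium control hypothesis, calcium between theta_d and
    theta_p depresses the synapse and calcium above theta_p potentiates it.
    All coefficients and thresholds here are illustrative assumptions.
    """
    het = x.sum() - x                                       # input to the other synapses
    ca = c_local * x + c_het * het + c_spike * postsyn_spike + c_sup * supervisor
    dw = np.where(ca >= theta_p, eta,                       # potentiation zone
         np.where(ca >= theta_d, -eta, 0.0))                # depression zone
    return w + dw
```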
Article
Full-text available
We show that deep networks can be trained using Hebbian updates yielding similar performance to ordinary back-propagation on challenging image datasets. To overcome the unrealistic symmetry in connections between layers, implicit in back-propagation, the feedback weights are separate from the feedforward weights. The feedback weights are also updated with a local rule, the same as the feedforward weights—a weight is updated solely based on the product of activity of the units it connects. With fixed feedback weights as proposed in Lillicrap et al. (2016) performance degrades quickly as the depth of the network increases. If the feedforward and feedback weights are initialized with the same values, as proposed in Zipser and Rumelhart (1990), they remain the same throughout training thus precisely implementing back-propagation. We show that even when the weights are initialized differently and at random, and the algorithm is no longer performing back-propagation, performance is comparable on challenging datasets. We also propose a cost function whose derivative can be represented as a local Hebbian update on the last layer. Convolutional layers are updated with tied weights across space, which is not biologically plausible. We show that similar performance is achieved with untied layers, also known as locally connected layers, corresponding to the connectivity implied by the convolutional layers, but where weights are untied and updated separately. In the linear case we show theoretically that the convergence of the error to zero is accelerated by the update of the feedback weights.
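The core of the scheme summarised above, updating a forward weight and its separate feedback counterpart with the same purely local product of the activities they connect, can be sketched in a few lines of Python. The function below is an assumption-laden illustration (including an optional weight decay that the abstract does not mention), not the cited implementation.

```python
import numpy as np

def local_product_update(W, B, pre_activity, error_activity, lr=0.01, decay=0.0):
    """Update forward weights W and separate feedback weights B with the same
    local product rule (sketch of the scheme summarised above).

    Each weight changes only as a function of the two units it connects: the
    presynaptic activity of the lower layer and the error-related activity
    arriving at the upper layer. Because W and its feedback counterpart B
    receive the same outer-product update, equal initial values keep them
    equal (recovering backprop); random, unequal initial values give the
    approximate but still effective algorithm described in the abstract.
    The optional weight decay is an assumption, not part of the cited rule.
    """
    update = np.outer(pre_activity, error_activity)   # product of connected units
    W = (1.0 - decay) * W + lr * update               # forward weight (lower -> upper)
    B = (1.0 - decay) * B + lr * update.T             # feedback weight (upper -> lower)
    return W, B
```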
Article
Full-text available
This review article summarises recently proposed theories on how neural circuits in the brain could approximate the error back-propagation algorithm used by artificial neural networks. Computational models implementing these theories achieve learning as efficient as artificial neural networks, but they use simple synaptic plasticity rules based on activity of presynaptic and postsynaptic neurons. The models have similarities, such as including both feedforward and feedback connections, allowing information about error to propagate throughout the network. Furthermore, they incorporate experimental evidence on neural connectivity, responses, and plasticity. These models provide insights on how brain networks might be organised such that modification of synaptic weights on multiple levels of cortical hierarchy leads to improved performance on tasks.
Article
Full-text available
Ventral visual stream neural responses are dynamic, even for static image presentations. However, dynamical neural models of visual cortex are lacking as most progress has been made modeling static, time-averaged responses. Here, we studied population neural dynamics during face detection across three cortical processing stages. Remarkably, ~30 milliseconds after the initially evoked response, we found that neurons in intermediate level areas decreased their responses to typical configurations of their preferred face parts relative to their response for atypical configurations even while neurons in higher areas achieved and maintained a preference for typical configurations. These hierarchical neural dynamics were inconsistent with standard feedforward circuits. Rather, recurrent models computing prediction errors between stages captured the observed temporal signatures. This model of neural dynamics, which simply augments the standard feedforward model of online vision, suggests that neural responses to static images may encode top-down prediction errors in addition to bottom-up feature estimates.
Article
Full-text available
The neocortex contains a multitude of cell types that are segregated into layers and functionally distinct areas. To investigate the diversity of cell types across the mouse neocortex, here we analysed 23,822 cells from two areas at distant poles of the mouse neocortex: the primary visual cortex and the anterior lateral motor cortex. We define 133 transcriptomic cell types by deep, single-cell RNA sequencing. Nearly all types of GABA (γ-aminobutyric acid)-containing neurons are shared across both areas, whereas most types of glutamatergic neurons were found in one of the two areas. By combining single-cell RNA sequencing and retrograde labelling, we match transcriptomic types of glutamatergic neurons to their long-range projection specificity. Our study establishes a combined transcriptomic and projectional taxonomy of cortical cell types from functionally distinct areas of the adult mouse cortex.
Article
Full-text available
Significance: Understanding the neural code means attributing proper meaning to temporal sequences of action potentials. We report a simple neural code based on distinguishing single spikes from spikes in close succession, commonly called “bursts.” By separating these two types of responses, we show that ensembles of neurons can communicate rapidly changing and graded information from two sources simultaneously and with minimal cross-talk. Second, we show that this multiplexing can optimize the information transferred per action potential when bursts are relatively rare. Finally, we show that neurons can demultiplex these two streams of information. We propose that this multiplexing may be particularly important in hierarchical communication where bottom-up and top-down information must be distinguished.
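A minimal way to make the demultiplexing idea concrete is to group spikes into events by an inter-spike-interval criterion and read out the event rate and burst fraction as two separate channels. The Python sketch below does exactly that; the 16 ms threshold and the function name are illustrative assumptions, not values from the cited study.

```python
import numpy as np

def demultiplex(spike_times, isi_threshold=0.016):
    """Split a spike train into events and label each one as burst or singlet.

    Spikes separated by less than `isi_threshold` seconds are grouped into a
    single burst; isolated spikes are singlets. The event rate and the burst
    fraction can then carry two multiplexed streams, as in the scheme described
    above. The 16 ms threshold is an illustrative choice, not a fitted value.
    """
    spike_times = np.sort(np.asarray(spike_times, dtype=float))
    events, is_burst = [], []
    i = 0
    while i < len(spike_times):
        j = i
        while j + 1 < len(spike_times) and spike_times[j + 1] - spike_times[j] < isi_threshold:
            j += 1                                    # extend the current burst
        events.append(spike_times[i])                 # event time = first spike in group
        is_burst.append(j > i)                        # more than one spike => burst
        i = j + 1
    burst_fraction = float(np.mean(is_burst)) if is_burst else 0.0
    return np.array(events), np.array(is_burst), burst_fraction
```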
Article
Full-text available
Understanding visual perceptual learning (VPL) has become increasingly more challenging as new phenomena are discovered with novel stimuli and training paradigms. Although existing models aid our knowledge of critical aspects of VPL, the connections shown by these models between behavioral learning and plasticity across different brain areas are typically superficial. Most models explain VPL as readout from simple perceptual representations to decision areas and are not easily adaptable to explain new findings. Here, we show that a well-known instance of a deep neural network (DNN), although not designed specifically for VPL, provides a computational model of VPL with enough complexity to be studied at many levels of analysis. After learning a Gabor orientation discrimination task, the DNN model reproduced key behavioral results, including increasing specificity with higher task precision, and also suggested that learning precise discriminations could transfer asymmetrically to coarse discriminations when the stimulus conditions varied. Consistent with the behavioral findings, the distribution of plasticity moved toward lower layers when task precision increased, and this distribution was also modulated by tasks with different stimulus types. Furthermore, learning in the network units demonstrated close resemblance to extant electrophysiological recordings in monkey visual areas. Altogether, the DNN fulfilled predictions of existing theories regarding specificity and plasticity and reproduced findings of tuning changes in neurons of the primate visual areas. Although the comparisons were mostly qualitative, the DNN provides a new method of studying VPL, can serve as a test bed for theories, and assists in generating predictions for physiological investigations.
Article
Full-text available
Humans and many other animals have an enormous capacity to learn about sensory stimuli and to master new skills. However, many of the mechanisms that enable us to learn remain to be understood. One of the greatest challenges of systems neuroscience is to explain how synaptic connections change to support maximally adaptive behaviour. Here, we provide an overview of factors that determine the change in the strength of synapses, with a focus on synaptic plasticity in sensory cortices. We review the influence of neuromodulators and feedback connections in synaptic plasticity and suggest a specific framework in which these factors can interact to improve the functioning of the entire network.
Article
Full-text available
Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different brain areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this model synaptic learning is driven by a local dendritic prediction error that arises from a failure to predict the top-down input given the bottom-up activities. Such errors occur at apical dendrites of pyramidal neurons where both long-range excitatory feedback and local inhibitory predictions are integrated. When local inhibition fails to match excitatory feedback an error occurs which triggers plasticity at bottom-up synapses at basal dendrites of the same pyramidal neurons. We demonstrate the learning capabilities of the model in a number of tasks and show that it approximates the classical error backpropagation algorithm. Finally, complementing this cortical circuit with a disinhibitory mechanism enables attention-like stimulus denoising and generation. Our framework makes several experimental predictions on the function of dendritic integration and cortical microcircuits, is consistent with recent observations of cross-area learning, and suggests a biological implementation of deep learning.
Article
Full-text available
Deep learning has led to significant advances in artificial intelligence, in part, by adopting strategies motivated by neurophysiology. However, it is unclear whether deep learning could occur in the real brain. Here, we show that a deep learning algorithm that utilizes multi-compartment neurons might help us to understand how the neocortex optimizes cost functions. Like neocortical pyramidal neurons, neurons in our model receive sensory information and higher-order feedback in electrotonically segregated compartments. Thanks to this segregation, neurons in different layers of the network can coordinate synaptic weight updates. As a result, the network learns to categorize images better than a single layer network. Furthermore, we show that our algorithm takes advantage of multilayer architectures to identify useful higher-order representations-the hallmark of deep learning. This work demonstrates that deep learning can be achieved using segregated dendritic compartments, which may help to explain the morphology of neocortical pyramidal neurons.
Article
Full-text available
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
Article
Full-text available
Successful recognition of partially occluded objects is presumed to involve dynamic interactions between brain areas responsible for vision and cognition, but neurophysiological evidence for the involvement of feedback signals is lacking. Here, we demonstrate that neurons in the ventrolateral prefrontal cortex (vlPFC) of monkeys performing a shape discrimination task respond more strongly to occluded than unoccluded stimuli. In contrast, neurons in visual area V4 respond more strongly to unoccluded stimuli. Analyses of V4 response dynamics reveal that many neurons exhibit two transient response peaks, the second of which emerges after vlPFC response onset and displays stronger selectivity for occluded shapes. We replicate these findings using a model of V4/vlPFC interactions in which occlusion-sensitive vlPFC neurons feed back to shape-selective V4 neurons, thereby enhancing V4 responses and selectivity to occluded shapes. These results reveal how signals from frontal and visual cortex could interact to facilitate object recognition under occlusion.
Article
Full-text available
A different form of synaptic plasticity: How do synaptic or other neuronal changes support learning? This subject has been dominated by Hebb's postulate of synaptic change. Although there is strong experimental support for Hebbian plasticity in a number of preparations, alternative ideas have also been developed over the years. Bittner et al. provide in vivo, in vitro, and modeling data to support the view that non-Hebbian plasticity may underlie the formation of hippocampal place fields (see the Perspective by Krupic). Instead of multiple pairings, a single strong Ca²⁺ plateau potential in neuronal dendrites paired with spatial inputs may be sufficient to produce place cells. Science, this issue p. 1033; see also p. 974.
Article
Full-text available
The neocortex is central to mammalian cognitive ability, playing critical roles in sensory perception, motor skills and executive function. This thin, layered structure comprises distinct, functionally specialized areas that communicate with each other through the axons of pyramidal neurons. For the hundreds of such cortico-cortical pathways to underlie diverse functions, their cellular and synaptic architectures must differ so that they result in distinct computations at the target projection neurons. In what ways do these pathways differ? By originating and terminating in different laminae, and by selectively targeting specific populations of excitatory and inhibitory neurons, these “interareal” pathways can differentially control the timing and strength of synaptic inputs onto individual neurons, resulting in layer-specific computations. Due to the rapid development in transgenic techniques, the mouse has emerged as a powerful mammalian model for understanding the rules by which cortical circuits organize and function. Here we review our understanding of how cortical lamination constrains long-range communication in the mammalian brain, with an emphasis on the mouse visual cortical network. We discuss the laminar architecture underlying interareal communication, the role of neocortical layers in organizing the balance of excitatory and inhibitory actions, and highlight the structure and function of layer 1 in mouse visual cortex.
Article
Full-text available
We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and the second phase of training (after the target or prediction error is revealed). Although this algorithm computes the gradient of an objective function just like Backpropagation, it does not need a special computation or circuit for the second phase, where errors are implicitly propagated. Equilibrium Propagation shares similarities with Contrastive Hebbian Learning and Contrastive Divergence while solving the theoretical issues of both algorithms: our algorithm computes the gradient of a well-defined objective function. Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point or stationary distribution) toward a configuration that reduces prediction error. In the case of a recurrent multi-layer supervised network, the output units are slightly nudged toward their target in the second phase, and the perturbation introduced at the output layer propagates backward in the hidden layers. We show that the signal “back-propagated” during this second phase corresponds to the propagation of error derivatives and encodes the gradient of the objective function, when the synaptic update corresponds to a standard form of spike-timing dependent plasticity. This work makes it more plausible that a mechanism similar to Backpropagation could be implemented by brains, since leaky integrator neural computation performs both inference and error back-propagation in our model. The only local difference between the two phases is whether synaptic changes are allowed or not. We also show experimentally that multi-layer recurrently connected networks with 1, 2, and 3 hidden layers can be trained by Equilibrium Propagation on the permutation-invariant MNIST task.
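The contrastive, local weight update at the heart of Equilibrium Propagation can be written compactly once the free-phase and nudged-phase fixed points are available. The sketch below shows only that update step, under the assumption of a hard-sigmoid rate function and arbitrary values of beta and the learning rate; the relaxation dynamics that produce the two states are omitted.

```python
import numpy as np

def eqprop_weight_update(W, s_free, s_nudged, beta=0.5, lr=0.05):
    """Contrastive Equilibrium Propagation update from two relaxed states (sketch).

    `s_free` is the network state at the free-phase fixed point (inputs clamped),
    `s_nudged` the state after the output units have been weakly nudged toward
    their targets with strength `beta`. The purely local rule below contrasts
    the Hebbian co-activities of the two phases; the hard-sigmoid rate function
    and the constants are assumptions made for illustration.
    """
    rho = lambda s: np.clip(s, 0.0, 1.0)                 # hard-sigmoid firing rate
    co_nudged = np.outer(rho(s_nudged), rho(s_nudged))   # Hebbian term, nudged phase
    co_free = np.outer(rho(s_free), rho(s_free))         # Hebbian term, free phase
    return W + (lr / beta) * (co_nudged - co_free)
```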
Article
Full-text available
To efficiently learn from feedback, cortical networks need to update synaptic weights on multiple levels of cortical hierarchy. An effective and well-known algorithm for computing such changes in synaptic weights is the error backpropagation algorithm. However, in this algorithm, the change in synaptic weights is a complex function of weights and activities of neurons not directly connected with the synapse being modified, whereas the changes in biological synapses are determined only by the activity of presynaptic and postsynaptic neurons. Several models have been proposed that approximate the backpropagation algorithm with local synaptic plasticity, but these models require complex external control over the network or relatively complex plasticity rules. Here we show that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity. Furthermore, for certain parameters, the weight change in the predictive coding model converges to that of the backpropagation algorithm. This suggests that it is possible for cortical networks with simple Hebbian synaptic plasticity to implement efficient learning algorithms in which synapses in areas on multiple levels of hierarchy are modified to minimize the error on the output.
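A minimal sketch of the weight update used in such predictive coding networks is shown below: once inference has settled, each weight matrix changes according to the product of the local prediction error and the presynaptic activity. The choice of tanh nonlinearity and the learning rate are assumptions for illustration, and the inference dynamics themselves are not shown.

```python
import numpy as np

def predictive_coding_update(W, x_below, x_here, lr=0.01):
    """Local Hebbian weight update in a predictive-coding layer (sketch).

    The layer predicts its own activity from the layer below as W @ f(x_below);
    the mismatch is carried by dedicated error units, and the weight change is
    simply the product of that error and the presynaptic activity. After the
    inference dynamics (not shown) have settled, this update approaches the
    corresponding backpropagation gradient. The nonlinearity f is an assumption.
    """
    f = np.tanh
    error = x_here - W @ f(x_below)                 # activity of the error units
    return W + lr * np.outer(error, f(x_below))     # Hebbian: error x presynaptic rate
```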
Conference Paper
Full-text available
Back-propagation has been the workhorse of recent successes of deep learning but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment. This could become a serious issue as one considers deeper and more non-linear functions, e.g., consider the extreme case of non-linearity where the relation between parameters and cost is actually discrete. Inspired by the biological implausibility of back-propagation, a few approaches have been proposed in the past that could play a similar credit assignment role. In this spirit, we explore a novel approach to credit assignment in deep networks that we call target propagation. The main idea is to compute targets rather than gradients, at each layer. Like gradients, they are propagated backwards. In a way that is related but different from previously proposed proxies for back-propagation which rely on a backwards network with symmetric weights, target propagation relies on auto-encoders at each layer. Unlike back-propagation, it can be applied even when units exchange stochastic bits rather than real numbers. We show that a linear correction for the imperfectness of the auto-encoders, called difference target propagation, is very effective to make target propagation actually work, leading to results comparable to back-propagation for deep networks with discrete and continuous units and denoising auto-encoders and achieving state of the art for stochastic networks.
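The key step of difference target propagation, assigning each hidden layer a target via the next layer's learned approximate inverse plus a linear correction, can be stated in one line, as in the sketch below; the decoder g and the commented usage example are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

def difference_target(h_l, h_above, t_above, g):
    """Compute a hidden layer's target with difference target propagation (sketch).

    `h_l` and `h_above` are the forward activations of layers l and l+1,
    `t_above` is the target already assigned to layer l+1, and `g` is layer
    l+1's learned approximate inverse (the decoder of a per-layer auto-encoder).
    The correction term g(h_above) cancels the decoder's reconstruction error.
    Layer l is then trained locally to move its output toward the returned
    target, e.g. by gradient descent on the squared distance.
    """
    return h_l + g(t_above) - g(h_above)

# Hypothetical usage with a toy linear decoder:
# g = lambda v: v @ V.T        # V would be trained as an auto-encoder for layer l+1
# t_l = difference_target(h_l, h_above, t_above, g)
```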
Article
Full-text available
Recent work in computer science has shown the power of deep learning driven by the backpropagation algorithm in networks of artificial neurons. But real neurons in the brain are different from most of these artificial ones in at least three crucial ways: they emit spikes rather than graded outputs, their inputs and outputs are related dynamically rather than by piecewise-smooth functions, and they have no known way to coordinate arrays of synapses in separate forward and feedback pathways so that they change simultaneously and identically, as they do in backpropagation. Given these differences, it is unlikely that current deep learning algorithms can operate in the brain, but we show that these problems can be solved by two simple devices: learning rules can approximate dynamic input-output relations with piecewise-smooth functions, and a variation on the feedback alignment algorithm can train deep networks without having to coordinate forward and feedback synapses. Our results also show that deep spiking networks learn much better if each neuron computes an intracellular teaching signal that reflects that cell’s nonlinearity. With this mechanism, networks of spiking neurons show useful learning in synapses at least nine layers upstream from the output cells and perform well compared to other spiking networks in the literature on the MNIST digit recognition task.
Article
Full-text available
The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals with all the synaptic weights on each neuron's axon and further downstream. However, this involves a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by even random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
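The mechanism is simple enough to state in a few lines: the backward pass of backpropagation is kept, but the transposed forward weights are replaced by a fixed random matrix. The Python sketch below illustrates this for one hidden layer, assuming a ReLU nonlinearity; names and shapes are illustrative.

```python
import numpy as np

def feedback_alignment_delta(delta_out, B, hidden_pre):
    """Hidden-layer teaching signal computed with fixed random feedback weights.

    A sketch of the mechanism described above: the output error `delta_out` is
    sent backward through a fixed random matrix `B` instead of the transposed
    forward weights demanded by exact backpropagation, and is gated by the
    local derivative of the hidden nonlinearity (ReLU assumed here). The
    forward weights are then updated as usual from this signal, while B never
    changes.
    """
    relu_deriv = (hidden_pre > 0).astype(float)     # local derivative of ReLU
    return (B @ delta_out) * relu_deriv             # error delivered to hidden units
```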
Article
Full-text available
Deep learning has led to significant advances in artificial intelligence in recent years, in part by adopting architectures and functions motivated by neurophysiology. However, current deep learning algorithms are biologically infeasible, because they assume non-spiking units, discontinuous-time, and non-local synaptic weight updates. Here, we build on recent discoveries in artificial neural networks to develop a spiking, continuous-time neural network model that learns to categorize images from the MNIST data-set with local synaptic weight updates. The model achieves this via a three-compartment cellular architecture, motivated by neocortical pyramidal cell neurophysiology, wherein feedforward sensory information and feedback from higher layers are received at separate compartments in the neurons. We show that, thanks to the separation of feedforward and feedback information in different dendrites, our learning algorithm can coordinate learning across layers, taking advantage of multilayer architectures to identify abstract categories - the hallmark of deep learning. Our model demonstrates that deep learning can be achieved within a biologically feasible framework using segregated dendritic compartments, which may help to explain the anatomy of neocortical pyramidal neurons.
Article
Full-text available
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.
Article
Full-text available
In the last decade dendrites of cortical neurons have been shown to nonlinearly combine synaptic inputs by evoking local dendritic spikes. It has been suggested that these nonlinearities raise the computational power of a single neuron, making it comparable to a 2-layer network of point neurons. But how these nonlinearities can be incorporated into the synaptic plasticity to optimally support learning remains unclear. We present a theoretically derived synaptic plasticity rule for supervised and reinforcement learning that depends on the timing of the presynaptic, the dendritic and the postsynaptic spikes. For supervised learning, the rule can be seen as a biological version of the classical error-backpropagation algorithm applied to the dendritic case. When modulated by a delayed reward signal, the same plasticity is shown to maximize the expected reward in reinforcement learning for various coding scenarios. Our framework makes specific experimental predictions and highlights the unique advantage of active dendrites for implementing powerful synaptic plasticity rules that have access to downstream information via backpropagation of action potentials.
Article
Full-text available
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Article
Full-text available
Since the work of Ramón y Cajal in the late 19th and early 20th centuries, neuroscientists have speculated that a complete understanding of neuronal cell types and their connections is key to explaining complex brain functions. However, a complete census of the constituent cell types and their wiring diagram in mature neocortex remains elusive. By combining octuple whole-cell recordings with an optimized avidin-biotin-peroxidase staining technique, we carried out a morphological and electrophysiological census of neuronal types in layers 1, 2/3, and 5 of mature neocortex and mapped the connectivity between more than 11,000 pairs of identified neurons. We categorized 15 types of interneurons, and each exhibited a characteristic pattern of connectivity with other interneuron types and pyramidal cells. The essential connectivity structure of the neocortical microcircuit could be captured by only a few connectivity motifs.
Article
Full-text available
Gradient backpropagation (BP) requires symmetric feedforward and feedback connections: the same weights must be used for forward and backward passes. This "weight transport problem" [1] is thought to be the crux of BP's biological implausibility. Using 15 different classification datasets, we systematically study to what extent BP really depends on weight symmetry. Surprisingly, the results indicate: (1) the magnitudes of feedback weights do not matter to performance; (2) the signs of feedback weights do matter: the more concordant the signs between feedforward connections and their corresponding feedback connections, the better; (3) with feedback weights having random magnitudes and 100% concordant signs, we were able to achieve the same or even better performance than SGD; (4) some normalizations/stabilizations are indispensable for such asymmetric BP to work, namely Batch Normalization (BN) [2] and/or a "Batch Manhattan" (BM) update rule.
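A sign-concordant feedback matrix of the kind studied in this work can be constructed directly from the forward weights, as in the sketch below; the uniform distribution for the random magnitudes is an arbitrary illustrative choice.

```python
import numpy as np

def sign_concordant_feedback(W, seed=0):
    """Build feedback weights that share only the signs of the forward weights.

    The magnitudes are redrawn at random (uniform in [0, 1) here, an arbitrary
    choice); per the findings summarised above, keeping all signs concordant
    while discarding the magnitudes supports learning comparable to exact
    backpropagation, typically together with Batch Normalization or a
    "Batch Manhattan" update.
    """
    rng = np.random.default_rng(seed)
    return np.sign(W) * rng.uniform(0.0, 1.0, size=W.shape)
```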
Article
Full-text available
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Article
Full-text available
Feature-selective firing allows networks to produce representations of the external and internal environments. Despite its importance, the mechanisms generating neuronal feature selectivity are incompletely understood. In many cortical microcircuits the integration of two functionally distinct inputs occurs nonlinearly through generation of active dendritic signals that drive burst firing and robust plasticity. To examine the role of this processing in feature selectivity, we recorded CA1 pyramidal neuron membrane potential and local field potential in mice running on a linear treadmill. We found that dendritic plateau potentials were produced by an interaction between properly timed input from entorhinal cortex and hippocampal CA3. These conjunctive signals positively modulated the firing of previously established place fields and rapidly induced new place field formation to produce feature selectivity in CA1 that is a function of both entorhinal cortex and CA3 input. Such selectivity could allow mixed network level representations that support context-dependent spatial maps.
Chapter
State-of-the-art algorithms and theory in a novel domain of machine learning, prediction when the output has structure. Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning's greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and analyzes the state of the art in machine learning algorithms and theory in this novel field. The contributors discuss applications as diverse as machine translation, document markup, computational biology, and information extraction, among others, providing a timely overview of an exciting field. Contributors Yasemin Altun, Gökhan Bakir, Olivier Bousquet, Sumit Chopra, Corinna Cortes, Hal Daumé III, Ofer Dekel, Zoubin Ghahramani, Raia Hadsell, Thomas Hofmann, Fu Jie Huang, Yann LeCun, Tobias Mann, Daniel Marcu, David McAllester, Mehryar Mohri, William Stafford Noble, Fernando Pérez-Cruz, Massimiliano Pontil, Marc'Aurelio Ranzato, Juho Rousu, Craig Saunders, Bernhard Schölkopf, Matthias W. Seeger, Shai Shalev-Shwartz, John Shawe-Taylor, Yoram Singer, Alexander J. Smola, Sandor Szedmak, Ben Taskar, Ioannis Tsochantaridis, S.V.N Vishwanathan, Jason Weston
Book
Policy search is a subfield of Reinforcement Learning (RL) that focuses on finding good parameters for a given policy parameterization. It is well suited for robotics as it can cope with high-dimensional state and action spaces, which is one of the main challenges in robot learning. A Survey on Policy Search for Robotics reviews recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learn policies based on sampled trajectories. This text classifies model-free methods based on their policy evaluation, policy update, and exploration strategies, and presents a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, for each sampled trajectory, it is necessary to interact with the robot, which can be time consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, A Survey on Policy Search for Robotics reviews their respective properties and their applicability to robotic systems. It is an invaluable reference for anyone working in the area.
Article
Distributing learning across multiple layers has proven extremely powerful in artificial neural networks. However, little is known about how multi-layer learning is implemented in the brain. Here, we provide an account of learning across multiple processing layers in the electrosensory lobe (ELL) of mormyrid fish and report how it solves problems well known from machine learning. Because the ELL operates and learns continuously, it must reconcile learning and signaling functions without switching its mode of operation. We show that this is accomplished through a functional compartmentalization within intermediate layer neurons in which inputs driving learning differentially affect dendritic and axonal spikes. We also find that connectivity based on learning rather than sensory response selectivity assures that plasticity at synapses onto intermediate-layer neurons is matched to the requirements of output neurons. The mechanisms we uncover have relevance to learning in the cerebellum, hippocampus, and cerebral cortex, as well as in artificial systems.
Article
Sensory experience and perceptual learning changes receptive field properties of cortical pyramidal neurons (PNs), largely mediated by synaptic long-term potentiation (LTP). The circuit mechanisms underlying cortical LTP remain unclear. In the mouse somatosensory cortex, LTP can be elicited in layer 2/3 PNs by rhythmic whisker stimulation. We dissected the synaptic circuitry underlying this type of plasticity in thalamocortical slices. We found that projections from higher-order, posterior medial thalamic complex (POm) are key to eliciting N-methyl-D-aspartate receptor (NMDAR)-dependent LTP of intracortical synapses. Paired activation of cortical and higher-order thalamocortical inputs increased vasoactive intestinal peptide (VIP) and parvalbumin (PV) interneuron (IN) activity and decreased somatostatin (SST) IN activity, which together disinhibited the PNs. VIP IN-mediated disinhibition was critical for inducing LTP. This study reveals a circuit motif in which higher-order thalamic inputs gate synaptic plasticity via disinhibition. This motif may allow contextual feedback to shape synaptic circuits that process first-order sensory information.
Article
A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy-primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.
Article
Artificial intelligence has seen a number of breakthroughs in recent years, with games often serving as significant milestones. A common feature of games with these successes is that they involve information symmetry among the players, where all players have identical information. This property of perfect information, though, is far more common in games than in real-world problems. Poker is the quintessential game of imperfect information, and it has been a longstanding challenge problem in artificial intelligence. In this paper we introduce DeepStack, a new algorithm for imperfect information settings such as poker. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition about arbitrary poker situations that is automatically learned from self-play games using deep learning. In a study involving dozens of participants and 44,000 hands of poker, DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold'em. Furthermore, we show this approach dramatically reduces worst-case exploitability compared to the abstraction paradigm that has been favored for over a decade.
Article
In primary visual cortex, a subset of neurons responds when a particular stimulus is encountered in a certain location in visual space. This activity can be modeled using a visual receptive field. In addition to visually driven activity, there are neurons in visual cortex that integrate visual and motor-related input to signal a mismatch between actual and predicted visual flow. Here we show that these mismatch neurons have receptive fields and signal a local mismatch between actual and predicted visual flow in restricted regions of visual space. These mismatch receptive fields are aligned to the retinotopic map of visual cortex and are similar in size to visual receptive fields. Thus, neurons with mismatch receptive fields signal local deviations of actual visual flow from visual flow predicted based on self-motion and could therefore underlie the detection of objects moving relative to the visual flow caused by self-motion.
Article
Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
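A toy sketch of the autoregressive idea follows (it is not the published architecture, which uses two-dimensional recurrent layers and residual connections): pixels are read in raster order and each pixel's 256-way discrete distribution is conditioned on all preceding pixels. The PyTorch code below uses an ordinary LSTM and illustrative sizes.

```python
import torch
import torch.nn as nn

class TinyPixelRNN(nn.Module):
    """Toy autoregressive pixel model: each pixel's 256-way discrete
    distribution is conditioned on all preceding pixels in raster order."""
    def __init__(self, hidden=128, levels=256):
        super().__init__()
        self.embed = nn.Embedding(levels, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, levels)

    def forward(self, pixels):                 # pixels: (batch, n) ints in [0, 255]
        # Shift right by one so position i only sees pixels before i
        # (a zero start token is a toy simplification).
        inp = torch.cat([torch.zeros_like(pixels[:, :1]), pixels[:, :-1]], dim=1)
        h, _ = self.rnn(self.embed(inp))
        return self.readout(h)                 # logits: (batch, n, 256)

model = TinyPixelRNN()
img = torch.randint(0, 256, (4, 28 * 28))      # flattened 28x28 grayscale images
logits = model(img)
nll = nn.functional.cross_entropy(logits.reshape(-1, 256), img.reshape(-1))
print(float(nll))                              # per-pixel negative log-likelihood (nats)
```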
Article
This work follows Bengio and Fischer (2015), in which theoretical foundations were laid to show how iterative inference can backpropagate error signals. Neurons move their activations towards configurations corresponding to lower energy and smaller prediction error: a new observation creates a perturbation at visible neurons that propagates into hidden layers, with these propagated perturbations corresponding to the back-propagated gradient. This avoids the need for a lengthy relaxation in the positive phase of training (when both inputs and targets are observed), as was believed necessary in previous work on fixed-point recurrent networks. We show experimentally that energy-based neural networks with several hidden layers can be trained on discriminative tasks by using iterative inference and an STDP-like learning rule. The main result of this paper is that we can train neural networks with 1, 2 and 3 hidden layers on the permutation-invariant MNIST task and get the training error down to 0.00%. The results presented here make it more biologically plausible that a mechanism similar to back-propagation may take place in brains in order to achieve credit assignment in deep networks. The paper also discusses some of the remaining open problems for a biologically plausible implementation of backprop in brains.
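The two-phase procedure can be sketched in a few lines of numpy for a small energy-based network: a free phase that relaxes to a fixed point, a weakly clamped phase that nudges the outputs toward the target, and a local contrastive update computed from the two fixed points. This is a simplified illustration in the spirit of the paper (no biases, a single example, arbitrary hyperparameters), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0.0, 1.0)            # hard-sigmoid activation
drho = lambda s: ((s > 0.0) & (s < 1.0)).astype(float)

nx, nh, ny = 4, 16, 2
W1 = rng.normal(0, 0.1, (nx, nh))               # symmetric input-hidden coupling
W2 = rng.normal(0, 0.1, (nh, ny))               # symmetric hidden-output coupling

def relax(x, h, y, beta=0.0, target=None, steps=50, eps=0.1):
    """Let hidden/output units settle towards a fixed point of the energy
    dynamics; beta > 0 weakly clamps the outputs towards the target."""
    for _ in range(steps):
        dh = drho(h) * (W1.T @ rho(x) + W2 @ rho(y)) - h
        dy = drho(y) * (W2.T @ rho(h)) - y
        if beta > 0.0:
            dy += beta * (target - y)           # weak clamping of outputs
        h, y = h + eps * dh, y + eps * dy
    return h, y

x = rng.random(nx)                              # an input pattern (clamped throughout)
target = np.array([1.0, 0.0])                   # desired output
h0, y0 = relax(x, np.zeros(nh), np.zeros(ny))            # free phase
beta = 0.5
h1, y1 = relax(x, h0, y0, beta=beta, target=target)      # weakly clamped phase

lr = 0.05
# Local, contrastive (STDP-like) updates computed from the two fixed points.
W1 += lr / beta * (np.outer(rho(x), rho(h1)) - np.outer(rho(x), rho(h0)))
W2 += lr / beta * (np.outer(rho(h1), rho(y1)) - np.outer(rho(h0), rho(y0)))
```

Note that each weight update depends only on the activities of the two neurons the synapse connects, measured in the two phases, which is what makes the rule local.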
Article
In this work we explore recent advances in Recurrent Neural Networks for large-scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: the sizes of corpora and vocabularies, and the complex, long-term structure of language. We perform an exhaustive study of techniques such as character Convolutional Neural Networks and Long Short-Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
Article
Significance: Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recent successes in computational models of visual recognition naturally raise the question: do computer systems and the human brain use similar or different computations? We show, by combining a novel method (minimal images) with simulations, that the human recognition system uses features and learning processes that are critical for recognition but are not used by current models. The study uses a “phase transition” phenomenon in minimal images, in which minor changes to the image have a drastic effect on its recognition. The results show fundamental limitations of current approaches and suggest directions to produce more realistic and better-performing models.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8x deeper than VGG nets, yet still of lower complexity). An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are the foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
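The core idea is easy to sketch: a block's stacked layers learn a residual function F(x) that is added back to the block's input through an identity shortcut, so the block outputs F(x) + x. A minimal PyTorch residual block (illustrative channel counts; the published networks stack many such blocks with downsampling stages) might look as follows.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the stacked layers learn a residual F(x)
    that is added back to the input, so the block outputs F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(residual + x)          # identity shortcut

block = ResidualBlock(64)
out = block(torch.randn(1, 64, 32, 32))         # same shape in and out
```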
Article
Recent advances in neural network modeling have enabled major strides in computer vision and other artificial intelligence applications. Human-level visual recognition abilities are coming within reach of artificial systems. Artificial neural networks are inspired by the brain, and their computations could be implemented in biological neurons. Convolutional feedforward networks, which now dominate computer vision, take further inspiration from the architecture of the primate visual hierarchy. However, the current models are designed with engineering goals, not to model brain computations. Nevertheless, initial studies comparing internal representations between these models and primate brains find surprisingly similar representational spaces. With human-level performance no longer out of reach, we are entering an exciting new era, in which we will be able to build biologically faithful feedforward and recurrent computational models of how biological brains perform high-level feats of intelligence, including vision.
Article
It is often assumed that the action of cortical feedback connections is slow and modulatory, whereas feedforward connections carry a rapid drive to their target neurons. Recent results from our laboratory showed a very rapid effect of feedback connections on the visual responses of neurons in lower-order areas. We wanted to determine whether such a rapid action is mediated by fast-conducting axons. Using electrical stimulation, we compared the conduction velocities along feedforward and feedback axons between areas V1 and V2 of the macaque monkey. We conclude that feedback and feedforward connections between V1 and V2 have comparably fast conduction velocities (around 3.5 m/s).
Chapter
Threshold functions and related operators are widely used as basic elements of adaptive and associative networks [Nakano 72, Amari 72, Hopfield 82]. There exist numerous learning rules for finding a set of weights that achieves a particular correspondence between input-output pairs. But early work in the field showed that the number of threshold functions (or linearly separable functions) of N binary variables is small compared to the number of all possible Boolean mappings of N variables, especially when N is large. This problem is one of the main limitations of most neural network models in which the state is fully specified by the environment during learning: they can only learn linearly separable functions of their inputs. Moreover, a learning procedure that requires the outside world to specify the state of every neuron during the learning session can hardly be considered a general learning rule, because in real-world conditions only partial information about the “ideal” network state for each task is available from the environment. It is possible to use a set of so-called “hidden units” [Hinton, Sejnowski, Ackley 84], without direct interaction with the environment, which can compute intermediate predicates. Unfortunately, the global response depends on the output of a particular hidden unit in a highly non-linear way; moreover, the nature of this dependence is influenced by the states of the other cells.
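The linear-separability limitation and the role of hidden units can be made concrete with XOR: no single threshold unit realises it, but two hidden units computing intermediate predicates (OR and AND) make the problem separable for the output unit. The numpy sketch below is illustrative only.

```python
import numpy as np

step = lambda z: (z > 0).astype(int)            # threshold unit

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.array([0, 1, 1, 0])

# A single threshold unit cannot realise XOR (it is not linearly separable):
# trying many weight/bias combinations never reproduces the mapping.
grid = np.linspace(-2, 2, 21)
single_ok = any(
    np.array_equal(step(X @ np.array([w1, w2]) + b), xor)
    for w1 in grid for w2 in grid for b in grid
)
print("single threshold unit solves XOR:", single_ok)        # False

# Two hidden units computing intermediate predicates (OR and AND)
# make the problem linearly separable for the output unit.
h = np.stack([step(X @ np.array([1, 1]) - 0.5),              # OR
              step(X @ np.array([1, 1]) - 1.5)], axis=1)     # AND
out = step(h @ np.array([1, -1]) - 0.5)                      # OR and not AND = XOR
print("with hidden units:", np.array_equal(out, xor))        # True
```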
Article
In a typical scene with many different objects, attentional mechanisms are needed to select relevant objects for visual processing and control over behavior. To test the role of area V4 in the selection of objects based on non-spatial features, we recorded from V4 neurons in the monkey, using a visual search paradigm. A cue stimulus was presented at the center of gaze, followed by a blank delay period. After the delay, a two-stimulus array was presented extrafoveally, and the monkey was rewarded for detecting the target stimulus matching the cue. The array was composed of one ‘good’ stimulus (effective in driving the cell when presented alone) and one ‘poor’ stimulus (ineffective in driving the cell when presented alone). When the choice array was presented in the receptive field (RF) of the neuron, many cells showed suppressive interactions between the stimuli as well as strong attention effects. Within 150–200 ms of array onset, responses to the array were determined by the target stimulus. If the target was the good stimulus, the response to the array became equal to the response to the good stimulus presented alone. If the target was the poor stimulus, the response approached the response to that stimulus presented alone. Thus the influence of the nontarget stimulus was filtered out. These effects were reduced or eliminated when the poor stimulus was located outside the RF and, therefore, no longer competing for the cell's response. Overall, the results support a ‘biased competition’ model of attention, according to which objects in the visual field compete for representation in the cortex, and this competition is biased in favor of the behaviorally relevant object.
Article
Cortical neurons receive synaptic inputs from thousands of afferents that fire action potentials at rates ranging from less than 1 hertz to more than 200 hertz. Both the number of afferents and their large dynamic range can mask changes in the spatial and temporal pattern of synaptic activity, limiting the ability of a cortical neuron to respond to its inputs. Modeling work based on experimental measurements indicates that short-term depression of intracortical synapses provides a dynamic gain-control mechanism that allows equal percentage rate changes on rapidly and slowly firing afferents to produce equal postsynaptic responses. Unlike inhibitory and adaptive mechanisms that reduce responsiveness to all inputs, synaptic depression is input-specific, leading to a dramatic increase in the sensitivity of a neuron to subtle changes in the firing patterns of its afferents.
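The gain-control argument can be illustrated with a toy mean-field depression model (an assumption for illustration, not the paper's exact model): at steady state the synaptic efficacy falls roughly as the inverse of the presynaptic rate, so the transient response to a given percentage rate change becomes nearly independent of the baseline rate.

```python
import numpy as np

# Toy mean-field short-term depression: each presynaptic spike uses a
# fraction U of the available resources R, which recover with time
# constant tau_rec. For a steady rate r,
#   R_ss(r) = 1 / (1 + U * r * tau_rec),   steady drive = U * r * R_ss(r).
U, tau_rec = 0.5, 0.5                            # release fraction, recovery (s)

def transient_increment(r, frac):
    """Extra postsynaptic drive just after the rate steps from r to r*(1+frac),
    before the depression variable has re-equilibrated."""
    R_ss = 1.0 / (1.0 + U * r * tau_rec)
    return U * (r * frac) * R_ss

for rate in [5.0, 20.0, 80.0]:                   # slowly and rapidly firing afferents (Hz)
    print(f"{rate:5.1f} Hz, +10% rate -> transient increment "
          f"{transient_increment(rate, 0.10):.3f}")
# At high rates the increment approaches frac / tau_rec, so equal percentage
# changes on slow and fast afferents produce comparable postsynaptic responses,
# whereas without depression the same change would scale with the absolute rate.
```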
Article
A fundamental issue in cortical processing of sensory information is whether top-down control circuits from higher brain areas to primary sensory areas not only modulate but actively engage in perception. Here, we report the identification of a neural circuit for top-down control in the mouse somatosensory system. The circuit consisted of a long-range reciprocal projection between M2 secondary motor cortex and S1 primary somatosensory cortex. In vivo physiological recordings revealed that sensory stimulation induced sequential S1 to M2 followed by M2 to S1 neural activity. The top-down projection from M2 to S1 initiated dendritic spikes and persistent firing of S1 layer 5 (L5) neurons. Optogenetic inhibition of M2 input to S1 decreased L5 firing and the accurate perception of tactile surfaces. These findings demonstrate that recurrent input to sensory areas is essential for accurate perception and provide a physiological model for one type of top-down control circuit.
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
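The update at the heart of such an agent can be sketched compactly: a network maps observations to action values and is regressed toward the bootstrapped temporal-difference target computed with a periodically synchronized target network. The PyTorch sketch below uses illustrative dimensions and omits the replay buffer, exploration, and convolutional front end.

```python
import torch
import torch.nn as nn

# Minimal sketch of a DQN-style update: regress Q(s, a) toward the
# bootstrapped target r + gamma * max_a' Q_target(s', a').
obs_dim, n_actions, gamma = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # periodically synced copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A fake batch of replayed transitions (s, a, r, s', done) for illustration.
s = torch.randn(32, obs_dim)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
s_next = torch.randn(32, obs_dim)
done = torch.zeros(32)

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a)
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
loss = nn.functional.smooth_l1_loss(q_sa, target)             # Huber-style loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```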