February 2025
·
11 Reads
Knowledge-Based Systems
January 2025
Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers that can be used off-the-shelf after meta-training is strong meta-generalization: the ability to apply the optimizer to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources (4,000 TPU-months) to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess the quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.
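A minimal sketch of the general learned-optimizer pattern this abstract builds on, not the Celo or VeLO architecture itself: a small network, whose weights would come from a meta-training phase omitted here, maps per-parameter features such as the gradient and a momentum trace to an update. The class name and feature choices below are illustrative assumptions.

import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):  # hypothetical name, not the paper's
    def __init__(self, hidden=32):
        super().__init__()
        # Tiny per-parameter MLP: features (gradient, momentum) -> update
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def step(self, params, grads, momenta, beta=0.9):
        new_params, new_momenta = [], []
        for p, g, m in zip(params, grads, momenta):
            m = beta * m + (1 - beta) * g                      # running momentum feature
            feats = torch.stack([g.flatten(), m.flatten()], dim=-1)
            update = self.net(feats).view_as(p)                # learned update rule
            new_params.append(p - update)
            new_momenta.append(m)
        return new_params, new_momenta

Meta-training would unroll this step over many inner tasks and optimize the network's weights against the resulting training losses; meta-generalization is then the question of how well the learned rule transfers to tasks outside that distribution.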
December 2024
·
51 Reads
Neurons in the brain have rich and adaptive input-output properties. Features such as heterogeneous f-I curves and spike frequency adaptation are known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it is still unclear how brain circuits exploit single-neuron flexibility, and how network-level requirements may have shaped such cellular function. To answer this question, a multi-scale approach is needed in which the computations of single neurons and neural circuits are considered as a complete system. In this work, we use artificial neural networks to systematically investigate single-neuron input-output adaptive mechanisms, optimized in an end-to-end fashion. Throughout the optimization process, each neuron is free to modify its nonlinear activation function, parametrized to mimic the f-I curves of biological neurons, either by learning an individual static function or via a learned, shared adaptation mechanism that modifies activation functions in real time during a task. We find that such adaptive networks show much-improved robustness to noise and changes in input statistics. Using tools from dynamical systems theory, we analyze the role of these emergent single-neuron properties and argue that neural diversity and adaptation play an active regularization role, enabling neural circuits to optimally propagate information across time. Finally, we outline similarities between these optimized solutions and known coding strategies found in biological neurons, such as gain scaling and fractional-order differentiation/integration.
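One way to realize the per-neuron adaptive input-output functions described above is a learnable, f-I-curve-like activation. The sigmoidal parametrization below (gain, threshold, saturation) is an assumption chosen for illustration rather than the paper's exact form.

import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):  # hypothetical name
    def __init__(self, n_units):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n_units))        # slope of the f-I curve
        self.threshold = nn.Parameter(torch.zeros(n_units))  # input at half-maximal rate
        self.r_max = nn.Parameter(torch.ones(n_units))       # saturation firing rate

    def forward(self, x):
        # Sigmoidal f-I curve: r = r_max * sigmoid(gain * (x - threshold)),
        # learned separately for each unit.
        return self.r_max * torch.sigmoid(self.gain * (x - self.threshold))

layer = AdaptiveActivation(n_units=128)
rates = layer(torch.randn(10, 128))  # each of the 128 units applies its own curve

The shared, real-time adaptation mechanism mentioned in the abstract would additionally make these parameters a function of recent activity rather than fixed learned constants.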
October 2024
·
55 Reads
·
1 Citation
Computational neuroscience relies on gradient descent (GD) for training artificial neural network (ANN) models of the brain. The advantage of GD is that it is effective at learning difficult tasks. However, it produces ANNs that are a poor phenomenological fit to biology, making them less relevant as models of the brain. Specifically, it violates Dale's law by allowing synapses to change from excitatory to inhibitory, and it leads to synaptic weights that are not log-normally distributed, contradicting experimental data. Here, starting from first principles of optimisation theory, we present an alternative learning algorithm, exponentiated gradient (EG), that respects Dale's law and produces log-normal weights, without losing the power of learning with gradients. We also show that in biologically relevant settings EG outperforms GD, including learning from sparsely relevant signals and dealing with synaptic pruning. Altogether, our results show that EG is a superior learning algorithm for modelling the brain with ANNs.
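The contrast drawn in the abstract can be made concrete with the textbook update rules. The sketch below shows plain gradient descent against the unnormalised exponentiated-gradient update, which is multiplicative and therefore sign-preserving, so an excitatory weight stays excitatory, consistent with Dale's law. This is the generic EG/EGU rule, not necessarily the authors' exact implementation.

import torch

def gd_step(w, grad, lr=0.1):
    # Additive gradient descent: weights can change sign.
    return w - lr * grad

def eg_step(w, grad, lr=0.1):
    # Exponentiated gradient (unnormalised EGU): multiplicative, sign-preserving.
    return w * torch.exp(-lr * grad)

w = torch.tensor([0.5, 1.5, 0.01])  # positive (excitatory) weights
g = torch.tensor([2.0, -1.0, 3.0])  # an arbitrary gradient
print(gd_step(w, g))  # third weight flips to negative
print(eg_step(w, g))  # all weights remain positive

Because the multiplicative update scales each weight in proportion to its current size, it also tends toward the heavy-tailed, approximately log-normal weight distributions the abstract refers to.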
October 2024
·
37 Reads
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
October 2024
·
26 Reads
Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution generalization, in which a model systematically adapts to novel combinations of known concepts. However, while we have strong intuitions about what compositionality is, there currently exists no formal definition for it that is measurable and mathematical. Here, we propose such a definition, which we call representational compositionality, that accounts for and extends our intuitions about compositionality. The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation. Intuitively, representational compositionality states that a compositional representation satisfies three properties. First, it must be expressive. Second, it must be possible to re-describe the representation as a function of discrete symbolic sequences with re-combinable parts, analogous to sentences in natural language. Third, the function that relates these symbolic sequences to the representation, analogous to semantics in natural language, must be simple. Through experiments on both synthetic and real-world data, we validate our definition of compositionality and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We also show that representational compositionality, while theoretically intractable, can be readily estimated using standard deep learning tools. Our definition has the potential to inspire the design of novel, theoretically driven models that better capture the mechanisms of compositional thought.
October 2024
·
27 Reads
The goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
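The equivalence referred to above can be stated compactly. As a sketch of the standard prequential-coding identity (notation chosen here, not necessarily the paper's), the code length a sequential predictor assigns to data x_1, ..., x_T is

L_{\text{preq}}(x_{1:T}) = \sum_{t=1}^{T} -\log p(x_t \mid x_{<t}),

which is exactly the summed next-token prediction loss of an in-context learner run over the sequence. Minimizing it therefore rewards models that both fit each observation and are quickly learnable from the preceding context, which is the Occam's razor trade-off described above.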
October 2024
·
30 Reads
During development, neural circuits are shaped continuously as we learn to control our bodies. The ultimate goal of this process is to produce neural dynamics that enable the rich repertoire of behaviors we perform with our limbs. What begins as a series of "babbles" coalesces into skilled motor output as the brain rapidly learns to control the body. However, the nature of the teaching signal underlying this normative learning process remains elusive. Here, we test two well-established and biologically plausible theories, supervised learning (SL) and reinforcement learning (RL), that could explain how neural circuits develop the capacity for skilled movements. We trained recurrent neural networks to control a biomechanical model of a primate arm using either SL or RL and compared the resulting neural dynamics to populations of neurons recorded from the motor cortex of monkeys performing the same movements. Intriguingly, only RL-trained networks produced neural activity that matched their biological counterparts in terms of both the geometry and dynamics of population activity. We show that the similarity between RL-trained networks and biological brains depends critically on matching biomechanical properties of the limb. We then demonstrate that monkeys and RL-trained networks, but not SL-trained networks, show a strikingly similar capacity for robust short-term behavioral adaptation to a movement perturbation, indicating a fundamental and general commonality in the neural control policy. Together, our results support the hypothesis that neural dynamics for behavioral control emerge through a process akin to reinforcement learning. The resulting neural circuits offer numerous advantages for adaptable behavioral control over simpler and more efficient learning rules and expand our understanding of how developmental processes shape neural dynamics.
September 2024
·
67 Reads
Classical psychedelics induce complex visual hallucinations in humans, generating percepts that are coherent at a low level, but which have surreal, dream-like qualities at a high level. While there are many hypotheses as to how classical psychedelics could induce these effects, there are no concrete mechanistic models that capture the variety of observed effects in humans, while remaining consistent with the known pharmacological effects of classical psychedelics on neural circuits. In this work, we propose the "oneirogen hypothesis," which posits that the perceptual effects of classical psychedelics are a result of their pharmacological actions inducing neural activity states that are genuinely more similar to dream-like states. We simulate classical psychedelics' effects by manipulating neural network models trained on perceptual tasks with the Wake-Sleep algorithm. This established machine learning algorithm leverages two activity phases, a perceptual phase (wake) where sensory inputs are encoded, and a generative phase (dream) where the network internally generates activity consistent with stimulus-evoked responses. We simulate the action of psychedelics by partially shifting the model toward the generative (dream) phase, which entails a greater influence of top-down connections, in line with the impact of psychedelics on apical dendrites. The effects resulting from this manipulation capture a number of experimentally observed phenomena including the emergence of hallucinations, increases in stimulus-conditioned variability, and large increases in synaptic plasticity. We further provide a number of testable predictions which could be used to validate or invalidate our oneirogen hypothesis.
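A minimal sketch of the two phases of the Wake-Sleep algorithm as described above, for a one-layer Helmholtz-machine-style model. The architecture, the squared-error losses, and the blending parameter used to partially shift toward the generative phase are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

recognition = nn.Linear(784, 64)   # bottom-up: sensory input -> latent (wake)
generation  = nn.Linear(64, 784)   # top-down: latent -> generated input (dream)
opt_g = torch.optim.SGD(generation.parameters(), lr=1e-2)
opt_r = torch.optim.SGD(recognition.parameters(), lr=1e-2)

def wake_step(x):
    # Wake phase: encode real input, train the generative (top-down) weights
    # to reconstruct it from the inferred latent.
    z = torch.sigmoid(recognition(x)).detach()
    loss = ((generation(z) - x) ** 2).mean()
    opt_g.zero_grad(); loss.backward(); opt_g.step()

def sleep_step(batch_size=32):
    # Sleep phase: sample a latent, generate a "dream", train the recognition
    # (bottom-up) weights to recover the latent from the dream.
    z = torch.rand(batch_size, 64)
    dream = torch.sigmoid(generation(z)).detach()
    loss = ((torch.sigmoid(recognition(dream)) - z) ** 2).mean()
    opt_r.zero_grad(); loss.backward(); opt_r.step()

def oneirogen_forward(x, alpha=0.3):
    # Assumed form of the psychedelic manipulation: blend bottom-up input with
    # top-down generated activity, increasing the influence of the dream phase.
    z = torch.sigmoid(recognition(x))
    return (1 - alpha) * x + alpha * torch.sigmoid(generation(z))

At alpha = 0 perception is purely stimulus-driven; increasing alpha injects more internally generated, dream-like activity, the regime the hypothesis associates with hallucinations.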
September 2024
·
8 Reads
Neuroscience employs diverse neuroimaging techniques, each offering distinct insights into brain activity, from electrophysiological recordings such as EEG, which have high temporal resolution, to hemodynamic modalities such as fMRI, which have increased spatial precision. However, integrating these heterogeneous data sources remains a challenge, which limits a comprehensive understanding of brain function. We present the Spatiotemporal Alignment of Multimodal Brain Activity (SAMBA) framework, which bridges the spatial and temporal resolution gaps across modalities by learning a unified latent space free of modality-specific biases. SAMBA introduces a novel attention-based wavelet decomposition for spectral filtering of electrophysiological recordings, graph attention networks to model functional connectivity between functional brain units, and recurrent layers to capture temporal autocorrelations in brain signals. We show that the training of SAMBA, aside from achieving translation, also learns a rich representation of brain information processing. We showcase this by classifying external stimuli driving brain activity from the representations learned in the hidden layers of SAMBA, paving the way for broad downstream applications in neuroscience research and clinical contexts.
... Moreover, exponentiated gradient (EGU and EG) updates can converge faster than GD when the target weight vector is sparse [5], [8], [11]. Recent findings about synapses in biology indicate that EG algorithms are more biologically plausible than additive GD updates [54]. EG updates are typically viewed as appropriate for problems where the geometry of the optimization domain is described by the Kullback-Leibler divergence or relative entropy, as is often the case when optimizing over probability distributions. ...
October 2024
... Additionally, this approach can help tackle the credit assignment problem (Y. H. Liu et al., 2021) and promotes multi-task learning by regulating different modes of neuronal dynamics (Munn et al., 2023; Williams et al., 2024), offering a more robust framework for continuous and adaptive learning in artificial neural systems. ...
July 2024
... In such a setting, intra-individual variability cannot be accounted for. This shortcoming can be potentially addressed by studies focusing on datasets featuring hours of scanning time per individual [69][70][71]. ...
July 2024
... For an intraoperative procedure, optimizing neural biomarkers is thus preferable over optimizing physiological ones. We investigate the relationship between eCAPs and physiological responses in detail in a companion study (Berthon et al., 2023). We have shown that OBOES is suitable for optimizing B-fibre activation, a prime example of indirect optimization of heart rate using a neural biomarker. ...
June 2024
Bioelectronic Medicine
... Future work is left to combine our model with biological constraints that induce additional effects during perturbations, e.g., through non-normal synaptic connectivity (O'Shea et al., 2022; Kim et al., 2023; Bondanelli and Ostojic, 2020; Logiaco et al., 2021). Third, our work connects to the setting of BCI, where the experimenter chooses the output weights at the beginning of learning (Sadtler et al., 2014; Golub et al., 2018; Willett et al., 2021; Rajeswaran et al., 2024). Typically, the output weights are set to lie 'within the manifold' of the leading PCs so that we expect aligned dynamics (Sadtler et al., 2014). ...
March 2024
... This advanced approach enhances therapeutic outcomes by specifically targeting distinct portions of the vagus nerve (Bowles et al., 2022), achieving greater efficacy and minimizing side effects compared to traditional whole-nerve stimulation methods. Precision VNS operates by selectively stimulating different fiber bundles within the vagus nerve (Mourdoukoutas et al., 2018), enabling exact control over physiological pathways responsible for functions like heart rate regulation (Wernisch et al., 2024), without impacting unrelated systems such as digestion or immune responses (Ahmed et al., 2022). ...
April 2024
... Recent work (Ji et al., 2023) suggests a measurement approach to the ineffability incurred during the mental representation and ascription of thoughts, beliefs, and desires to others. Leveraging multi-modal sensor information (Radford et al., 2021; Ramesh et al., 2021; OpenAI, 2023) can improve the richness of module communication and obtain refined cross-modal representations that can be potentially reused for different downstream tasks. ...
Reference:
Meta Neural Coordination
March 2024
Neuroscience of Consciousness
... This approach iteratively refreshes its assessment of the response surface by utilizing a probabilistic model that effectively converges on ideal settings while maintaining a balance between exploration and exploitation (Shahriari et al., 2016). Bayesian optimization has also been extensively used in neurostimulation (Choinière et al., 2024). Furthermore, Bayesian optimization may be especially effective for individualizing therapy, as it can adapt to each patient's unique physiological profile. ...
February 2024
STAR Protocols
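A minimal sketch of the Bayesian-optimization loop described in the snippet above, using a Gaussian-process surrogate and an expected-improvement acquisition function. The objective and parameter range are placeholders, not any particular neurostimulation setup.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                       # placeholder response curve
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))     # a few initial settings
y = objective(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)

for _ in range(15):                     # exploration/exploitation loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    imp = mu - y.max()
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[np.argmax(ei)].reshape(1, 1)     # most promising next setting
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best setting:", X[np.argmax(y)], "response:", y.max())

In a clinical loop the placeholder objective would be replaced by a measured physiological or neural biomarker, and the acquisition function governs how aggressively new stimulation settings are explored versus refined.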
... This new computational scheme, which does not require updating the internal synaptic weights, is called reservoir computing. It is further assumed that the local cortical microcircuit as well as the global cortical network can be viewed as reservoir networks [16,21,31]. It is well documented that binocular rivalry is the result of the interaction of numerous brain regions, including the lateral geniculate nucleus, the thalamic reticular nucleus, and the visual areas V1, V2, V4, MT, and IT. ...
January 2024
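A minimal sketch of the reservoir-computing idea mentioned in the snippet above, in the echo-state-network style: the recurrent weights stay fixed and random, and only a linear readout is fit (here by ridge regression). Sizes, spectral radius, and the toy task are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200
W_in = rng.normal(0, 0.5, (n_res, n_in))                   # fixed input weights
W_res = rng.normal(0, 1.0, (n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))    # spectral radius < 1

def run_reservoir(u):
    # u: (T, n_in) input sequence; the reservoir itself never learns.
    x = np.zeros(n_res)
    states = []
    for t in range(len(u)):
        x = np.tanh(W_in @ u[t] + W_res @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: predict the next value of a sine wave from the reservoir state.
u = np.sin(np.linspace(0, 20 * np.pi, 1000)).reshape(-1, 1)
S = run_reservoir(u[:-1])
target = u[1:, 0]
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_res), S.T @ target)  # readout only
pred = S @ W_out
print("train MSE:", np.mean((pred - target) ** 2))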
... Behavior prediction metrics for BCI performance are also more intuitive for benchmarking progress than neural data prediction or the abstract goal of providing scientific insight (e.g. with latent variable models or in silico models) (Pei et al., 2021; Wang et al., 2023b). Finally, recent work has shown that deep networks are able to transfer learn across motor cortical datasets collected at different timepoints, subjects, or tasks (Azabou et al., 2024; Ye et al., 2023; Schneider et al., 2023). These ingredients provide the motivation and means for scaling neural data modeling. ...
Reference:
A Generalist Intracortical Motor Decoder
October 2023