Aaron R. Voelker’s research while affiliated with University of Waterloo and other places


Publications (4)


Programming Neuromorphics Using the Neural Engineering Framework
  • Chapter

February 2023 · 40 Reads · 7 Citations

Aaron R. Voelker

As neuromorphic hardware begins to emerge as a viable target platform for artificial intelligence (AI) applications, there is a need for tools and software that can effectively compile a variety of AI models onto such hardware. Nengo (http://nengo.ai) is an ecosystem of software designed to fill this need with a suite of tools for creating, training, deploying, and visualizing neural networks for various hardware backends, including CPUs, GPUs, FPGAs, microcontrollers, and neuromorphic hardware. While backpropagation-based methods are powerful and fully supported in Nengo, there is also a need for frameworks that are capable of efficiently mapping dynamical systems onto such hardware while best utilizing its computational resources. The neural engineering framework (NEF) is one such method that is supported by Nengo. Most prominently, Nengo and the NEF have been used to engineer the world’s largest functional model of the human brain. In addition, as a particularly efficient approach to training neural networks for neuromorphics, the NEF has been ported to several neuromorphic platforms. In this chapter, we discuss the mathematical foundations of the NEF and a number of its extensions and review several recent applications that use Nengo to build models for neuromorphic hardware. We focus in-depth on a particular class of dynamic neural networks, Legendre Memory Units (LMUs), which have demonstrated advantages over state-of-the-art approaches in deep learning with respect to energy efficiency, training time, and accuracy.
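As a concrete illustration of how the NEF maps dynamical systems onto neurons, the following sketch uses Nengo's public Python API to build an integrator, dx/dt = u(t), out of a single recurrently connected spiking ensemble. This is a generic textbook-style example, not code from the chapter; the parameter values are arbitrary, and a neuromorphic backend would be targeted by swapping in that backend's Simulator class.

    import nengo

    # A minimal NEF-style model: a spiking ensemble recurrently connected so
    # that its decoded value integrates the input, i.e. dx/dt = u(t).
    with nengo.Network() as model:
        stim = nengo.Node(lambda t: 1.0 if t < 0.5 else 0.0)  # step input u(t)
        ens = nengo.Ensemble(n_neurons=100, dimensions=1)     # represents x(t)

        tau = 0.1  # synaptic time constant used by the NEF dynamics mapping
        # NEF recipe for dx/dt = f(x) + u with a first-order synapse:
        # feed back tau*f(x) + x and scale the input by tau. For an
        # integrator f(x) = 0, so the feedback is simply the identity.
        nengo.Connection(stim, ens, transform=tau, synapse=tau)
        nengo.Connection(ens, ens, synapse=tau)

        probe = nengo.Probe(ens, synapse=0.01)

    with nengo.Simulator(model) as sim:
        sim.run(1.0)
    # sim.data[probe] holds the decoded estimate of the integral of u(t).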


Figures and tables from the preprint below:

  • Figure 2: The LMU and implicit self-attention architecture along with output dimensions. In the illustration, n refers to the sequence length, q is the order (a reduced-order variant also appears in the figure), and d is the embedding dimension. Normalization layers and skip connections are not shown. One variant uses the FFN component right after the input, and the other variant uses global attention.
  • Figure 3: Cross-entropy scores in nats, averaged across all tokens in the sequence. Transformer and LSTM fits are from Kaplan et al. (2020). Our models perform better than Transformer and LSTM models up to 1 million non-embedding parameters.
  • Figure 5: Approximately matching the loss between transformers and LMUs requires 10x more training for the transformer. The LMU and attention model continues to significantly outperform transformers with 10x less training.
  • Table: Parameter counts and compute (forward pass) for one layer of the network, per token. The first row indicates the number of FLOPs when following the implementation in Section A.1.

Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
  • Preprint
  • File available

October 2021 · 299 Reads

Narsimha Chilkuri · Aaron Voelker · [...]

Recent studies have demonstrated that the performance of transformers on the task of language modeling obeys a power-law relationship with model size over six orders of magnitude. While transformers exhibit impressive scaling, their performance hinges on processing large amounts of data, and their computational and memory requirements grow quadratically with sequence length. Motivated by these considerations, we construct a Legendre Memory Unit based model that introduces a general prior for sequence processing and exhibits an O(n) and O(n ln n) (or better) dependency for memory and computation, respectively. Over three orders of magnitude, we show that our new architecture attains the same accuracy as transformers with 10x fewer tokens. We also show that for the same amount of training our model improves the loss over transformers about as much as transformers improve over LSTMs. Additionally, we demonstrate that adding global self-attention complements our architecture and the augmented model improves performance even further.
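The memory at the core of an LMU is a small linear time-invariant system, which is what makes the per-token update cheap. The sketch below constructs the (A, B) state-space matrices from the original LMU paper (Voelker et al., 2019) and discretizes them with a zero-order hold; the helper name, window length, and order are illustrative assumptions rather than settings reported in the preprint.

    import numpy as np
    from scipy.signal import cont2discrete

    def lmu_matrices(order, theta, dt):
        # Continuous-time LMU memory cell (Voelker et al., 2019):
        #     theta * dm/dt = A m + B u,
        # whose state m holds the coefficients of a Legendre-polynomial
        # reconstruction of the input u over a sliding window of length theta.
        i = np.arange(order)
        row, col = np.meshgrid(i, i, indexing="ij")
        A = np.where(row < col, -1.0, (-1.0) ** (row - col + 1)) * (2 * i + 1)[:, None]
        B = ((-1.0) ** i * (2 * i + 1))[:, None]
        # Zero-order-hold discretization for a fixed time step dt.
        C, D = np.eye(order), np.zeros((order, 1))
        Ad, Bd, *_ = cont2discrete((A / theta, B / theta, C, D), dt, method="zoh")
        return Ad, Bd

    # Stream a 1 kHz-sampled sine wave through an order-8, 0.5 s memory window.
    Ad, Bd = lmu_matrices(order=8, theta=0.5, dt=0.001)
    m = np.zeros((8, 1))
    for u in np.sin(2 * np.pi * np.arange(0.0, 1.0, 0.001)):
        m = Ad @ m + Bd * u  # constant-cost linear memory update per sample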


Simulating and Predicting Dynamical Systems With Spatial Semantic Pointers

June 2021 · 56 Reads · 18 Citations · Neural Computation

While neural networks are highly effective at learning task-relevant representations from data, they typically do not learn representations with the kind of symbolic structure that is hypothesized to support high-level cognitive processes, nor do they naturally model such structures within problem domains that are continuous in space and time. To fill these gaps, this work exploits a method for defining vector representations that bind discrete (symbol-like) entities to points in continuous topological spaces in order to simulate and predict the behavior of a range of dynamical systems. These vector representations are spatial semantic pointers (SSPs), and we demonstrate that they can (1) be used to model dynamical systems involving multiple objects represented in a symbol-like manner and (2) be integrated with deep neural networks to predict the future of physical trajectories. These results help unify what have traditionally appeared to be disparate approaches in machine learning.
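To make the representational idea concrete, the sketch below builds a spatial semantic pointer for a point in continuous 2D space by fractional binding of two random axis vectors, and then binds a symbol-like object vector to that location with circular convolution. The dimensionality, coordinates, and helper names are illustrative assumptions and are not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 512  # SSP dimensionality (even)

    def unitary_phases(d, rng):
        # Fourier phases of a random unitary base vector; DC and Nyquist
        # terms are zeroed so that the inverse FFT is real-valued.
        p = rng.uniform(-np.pi, np.pi, size=d // 2 + 1)
        p[0] = p[-1] = 0.0
        return p

    phi_x, phi_y = unitary_phases(d, rng), unitary_phases(d, rng)

    def position_ssp(x, y):
        # Fractional binding: raise the X and Y axis vectors to real-valued
        # powers (x, y) by scaling their Fourier phases, then bind them.
        return np.fft.irfft(np.exp(1j * (x * phi_x + y * phi_y)), n=d)

    def bind(a, b):
        # Circular convolution, the binding operator for semantic pointers.
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=d)

    # Bind a symbol-like object vector to a location in continuous 2D space.
    obj = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)
    memory = bind(obj, position_ssp(1.3, -0.4))

    # Unbinding with the inverse position recovers the object almost exactly,
    # while querying a distant location yields near-zero similarity.
    at_true_loc = bind(memory, position_ssp(-1.3, 0.4)) @ obj
    at_far_loc = bind(memory, position_ssp(-3.0, 2.0)) @ obj
    print(f"similarity at true location: {at_true_loc:.3f}, far away: {at_far_loc:.3f}")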


A short letter on the dot product between rotated Fourier transforms

July 2020 · 8 Reads

Spatial Semantic Pointers (SSPs) have recently emerged as a powerful tool for representing and transforming continuous space, with numerous applications to cognitive modelling and deep learning. Fundamental to SSPs is the notion of "similarity" between vectors representing different points in n-dimensional space -- typically the dot product or cosine similarity between vectors with rotated unit-length complex coefficients in the Fourier domain. The similarity measure has previously been conjectured to be a Gaussian function of Euclidean distance. Contrary to this conjecture, we derive a simple trigonometric formula relating spatial displacement to similarity, and prove that, in the case where the Fourier coefficients are uniform i.i.d., the expected similarity is a product of normalized sinc functions: ∏_{k=1}^{n} sinc(a_k), where a ∈ ℝ^n is the spatial displacement between the two n-dimensional points. This establishes a direct link between space and the similarity of SSPs, which in turn helps bolster a useful mathematical framework for architecting neural networks that manipulate spatial structures.
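The sinc relationship is straightforward to check numerically. The sketch below encodes scalar displacements using random Fourier phases drawn i.i.d. from U(-pi, pi), averages the resulting dot products over many random base vectors, and compares them against the normalized sinc function. The dimensionality and trial count are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 512       # SSP dimensionality (even)
    trials = 500  # number of random base vectors to average over

    def ssp(x, phases):
        # Encode the scalar x by scaling every Fourier phase of the base
        # vector by x (fractional binding), then return to the real domain.
        return np.fft.irfft(np.exp(1j * x * phases), n=d)

    displacements = np.linspace(0.0, 3.0, 7)
    sims = np.zeros_like(displacements)
    for _ in range(trials):
        # Uniform i.i.d. phases in (-pi, pi); DC and Nyquist zeroed for realness.
        phases = rng.uniform(-np.pi, np.pi, size=d // 2 + 1)
        phases[0] = phases[-1] = 0.0
        origin = ssp(0.0, phases)
        sims += np.array([origin @ ssp(a, phases) for a in displacements])
    sims /= trials

    for a, s in zip(displacements, sims):
        print(f"a = {a:3.1f}  mean dot product = {s:+.3f}  sinc(a) = {np.sinc(a):+.3f}")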

Citations (2)


... The LMU is a relevant Recurrent Neural Network (RNN) architecture for classification tasks of time-based signals [3,17,38] originally implemented in the Nengo framework [4]. We here summarize its main properties by reviewing the key definitions directly from [37] for ease of reference. ...

Reference:

Natively neuromorphic LMU architecture for encoding-free SNN-based HAR on commercial edge devices
Programming Neuromorphics Using the Neural Engineering Framework
  • Citing Chapter
  • February 2023

... An alternative approach to preserving similarity for nearby positions is proposed by Komer et al. (2019), Voelker et al. (2021), and Frady et al. (2022), who make use of fractional binding. For this, two random HVs x and y are assigned to represent the x- and y-axes, respectively. ...

Simulating and Predicting Dynamical Systems With Spatial Semantic Pointers
  • Citing Article
  • June 2021

Neural Computation