Analysis of oscillatory weight changes from online
learning with filtered spiking feedback
Aaron R. Voelker and Chris Eliasmith
Centre for Theoretical Neuroscience technical report.
October 1, 2017
Abstract

Prescribed Error Sensitivity (PES) is a biologically plausible supervised learning rule that is frequently used with the Neural Engineering Framework (NEF). PES modifies the connection weights between populations of spiking neurons to minimize an error signal. Continuing the work of Voelker (2015), we solve for the dynamics of PES, while filtering the error with an arbitrary linear synapse model. For the most common case of a lowpass filter, the continuous-time weight changes are characterized by a second-order bandpass filter with frequency ω = sqrt(τ^-1 κ ||a||^2) and bandwidth Q = sqrt(τ κ ||a||^2), where τ is the exponential time constant, κ is the learning rate, and a is the activity vector. Therefore, the error converges to zero, yet oscillates if and only if τ κ ||a||^2 > 1/4. This provides a heuristic for setting κ based on the synaptic τ, and a method for engineering remarkably accurate decaying oscillators using only a single spiking leaky integrate-and-fire neuron.
1 Introduction
The Neural Engineering Framework (NEF; Eliasmith and Anderson, 2003) is a
method for constructing biologically plausible spiking networks. To build and
simulate such models, the Centre for Theoretical Neuroscience makes extensive
use of the open-source software, Nengo (Bekolay et al., 2014). Nengo typically
learns its connection weights offline, but also supports a number of biologically
plausible supervised and unsupervised learning rules to learn its weights on-
line. By far, the most commonly used learning rule in Nengo is the Prescribed
Error Sensitivity (PES; Bekolay et al., 2013) rule, which learns a function by
minimizing a supervised error signal from external and recurrent feedback.
Previously, Voelker (2015) fully characterized the discrete-time dynamics of
PES under the restricted setting of a constant input signal, constant reference
signal, and no noise. Due to the absence of noise, no filter was required for the
error signal. However, for spiking networks considered in practice, a lowpass filter is applied to the error to filter out spike noise (e.g., DeWolf et al., 2016; Rasmussen et al., 2017).
In this report, we relax the assumption of a constant reference signal, and
apply an arbitrary linear filter to the error signal. For simplicity, we do so for
the case of a continuous-time simulation, but our analysis can also be applied
to the discrete-time setting via the Z-transform. To keep our analysis tractable, we still assume a constant input signal, and briefly discuss implications for the general dynamic setting.

[Figure 1: network diagram; see caption below.]

Figure 1: Network diagram used to analyze the PES rule. A constant input x is represented by a population of n spiking neurons with the activity vector a ∈ R^n. A dynamic reference signal r(t) determines the error e(t) = ((y − r) ∗ h)(t) (equation 3), which in turn drives y(t) towards r(t) by modulating the connection weights via PES (equation 2). These learned connection weights decode y(t) via the decoders d(t) ∈ R^n (equation 1). A linear filter h(t) models the postsynaptic current induced by each spike.
We begin by formulating a mathematical description of the network in sec-
tion 2. We present our theoretical results in section 3, and prove them in
section 4. In section 5, we validate our results with numerical simulations, and
demonstrate the utility of this analysis by engineering oscillators with prede-
termined frequencies and decay rates. Finally, we conclude in section 6 by dis-
cussing some implications of this report for learning spiking dynamical networks
online.
2 Prescribed Error Sensitivity
Consider a network in Nengo, containing a population of n spiking neurons, encoding the constant scalar input x. Let a ∈ R^n be the average (i.e., rate) activity of each neuron in response to this encoding.^1 This vector is determined by the first principle of the NEF, and remains fixed for constant x. The decoders d(t) ∈ R^n determine the scalar output y(t) via the dot-product:

    y(t) = a^T d(t).    (1)

The PES rule learns these decoders, online, according to the following dynamics:

    ḋ(t) = −κ e(t) a,    (2)

where κ > 0 is the learning rate,^2 and e(t) is the chosen error signal:^3

    e(t) = ((y − r) ∗ h)(t),    (3)
where r(t) is the reference (i.e., ideal) output, and h(t) is some arbitrary linear filter modeling the postsynaptic current (PSC) induced by a spike arriving at the synaptic cleft. Typically, h(t) is a first-order lowpass filter with time constant τ > 0 (i.e., modeling an exponentially decaying PSC):

    h(t) = (1/τ) e^(−t/τ)  ⟺  H(s) = 1/(τs + 1).^4    (4)

The final network is summarized in Figure 1. This also naturally extends to the case where x and y are vectors (using a population code), but we consider the scalar case for simplicity.

^1 Here on, we assume that a ≠ 0, otherwise PES will have no effect.
^2 κ is automatically scaled by n^-1 in Nengo, to balance the linear scaling of ||a||^2.
^3 Signs are flipped in Voelker (2015); equations 2 and 3 are consistent with Nengo.
^4 Capital-case variables denote the Laplace transforms of their corresponding lower-case (time-domain) variables.
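For concreteness, the network in Figure 1 can be assembled in a few lines of Nengo. The following is a minimal sketch of our own (assuming Nengo 2.x; the names stim, ref, ens, error, and conn are ours, and the reference signal here is an arbitrary band-limited process):

    import nengo

    tau = 0.1     # synaptic time constant (s)
    kappa = 1e-3  # learning rate supplied to nengo.PES

    with nengo.Network() as model:
        stim = nengo.Node(0.0)  # constant input x = 0
        ref = nengo.Node(nengo.processes.WhiteSignal(period=10.0, high=10))  # r(t)
        ens = nengo.Ensemble(n_neurons=1, dimensions=1)
        error = nengo.Node(size_in=1)  # accumulates y - r

        nengo.Connection(stim, ens)
        # learned connection; the decoders d(t) are initialized to zero
        conn = nengo.Connection(ens, error, function=lambda x: 0.0,
                                synapse=None,
                                learning_rule_type=nengo.PES(learning_rate=kappa))
        nengo.Connection(ref, error, transform=-1, synapse=None)
        # the lowpass synapse h filters the error that modulates PES (equation 3)
        nengo.Connection(error, conn.learning_rule, synapse=tau)
        probe = nengo.Probe(error, synapse=tau)  # records e(t)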
Now, we aim to characterize the dynamics of e(t) in response to the control signal r(t). Alternatively, we could characterize the dynamics of y(t) or d(t), but e(t) is easier to work with, and it determines the other two via equations 1 and 2 (i.e., by integrating e(t)).
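Before solving this system in closed form, it can be instructive to integrate equations 1–3 directly. Below is a rough rate-based Euler sketch of our own (an actual simulation uses spikes; see footnote 6):

    import numpy as np

    dt, tau, kappa = 0.001, 0.1, 1e-3
    a = np.array([262.0])  # rate activity for the constant input x
    d = np.zeros(1)        # decoders, learned online
    e, r = 0.0, 1.0        # filtered error state; constant reference
    history = []
    for _ in range(5000):
        y = a.dot(d)                     # equation 1
        e += (dt / tau) * ((y - r) - e)  # equation 3 (lowpass h)
        d += dt * (-kappa) * e * a       # equation 2
        history.append(e)
    # phi = tau*kappa*||a||^2 ~ 6.86 > 1/4, so e(t) oscillates as it decays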
3 Results
Let φ = τ κ ||a||^2. For the network described in Figure 1 and section 2, we have:

    e(t) = −(r ∗ f)(t),    (5)

where:

    F(s) = s / (s H(s)^-1 + κ ||a||^2),    (6)

hence −F(s) is the transfer function from R(s) to E(s). For the case of a first-order lowpass filter (equation 4),

    F(s) = s / (τ s^2 + s + κ ||a||^2)
         = (κ ||a||^2)^-1 s / ((1/ω^2) s^2 + (1/(ωQ)) s + 1)    (7)

    ω = sqrt(τ^-1 κ ||a||^2) = τ^-1 sqrt(φ)    (8)

    Q = sqrt(τ κ ||a||^2) = sqrt(φ).    (9)

Thus, F(s) is a second-order Q-bandpass filter with frequency ω in radians per second (ω/(2π) is the frequency in hertz) (Zumbahlen et al., 2011, pp. 8.9–8.10). The poles of F(s) are:

    s = (−1 ± sqrt(1 − 4φ)) / (2τ).    (10)

Since φ > 0, this system is exponentially stable, and, moreover, the impulse response, f(t), is a decaying oscillator if and only if φ > 1/4.
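To make equations 8–10 concrete, the following is a small helper of our own (not from the report) that computes φ, ω, and Q. With the constants used later in section 5 (τ = 0.1 s, κ = 10^-3, a ≈ [262]), it gives φ ≈ 6.86 > 1/4, i.e., a decaying oscillation at roughly 4.2 Hz:

    import numpy as np

    def pes_dynamics(tau, kappa, a):
        """Return (phi, omega, Q) for PES with a lowpass-filtered error."""
        a = np.atleast_1d(np.asarray(a, dtype=float))
        phi = tau * kappa * a.dot(a)  # oscillatory iff phi > 1/4
        omega = np.sqrt(phi) / tau    # equation 8, in radians per second
        Q = np.sqrt(phi)              # equation 9
        return phi, omega, Q

    phi, omega, Q = pes_dynamics(tau=0.1, kappa=1e-3, a=[262.0])
    print(phi, omega / (2 * np.pi), Q)  # -> ~6.86, ~4.2 (Hz), ~2.62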
4 Proof

We begin by transforming equations 1–3 into the Laplace domain:

    Y(s) = a^T D(s)
    s D(s) = −κ E(s) a
    E(s) = (Y(s) − R(s)) H(s).^5

^5 This has a nice form since a is a constant – otherwise multiplication of two time-varying signals becomes a complex integral in the Laplace domain.
Substituting the first two equations into the last yields:

    E(s) = (a^T D(s) − R(s)) H(s)
         = (a^T s^-1 (−κ E(s) a) − R(s)) H(s)
         = −κ ||a||^2 s^-1 H(s) E(s) − R(s) H(s)
    (1 + κ ||a||^2 s^-1 H(s)) E(s) = −R(s) H(s)
    E(s) = −R(s) H(s) / (1 + κ ||a||^2 s^-1 H(s))
         = −R(s) · s / (s H(s)^-1 + κ ||a||^2)
         = −R(s) F(s).

Equations 5 and 6 follow from the convolution theorem. Equations 7–9 are verified by substituting H(s)^-1 = τs + 1 into equation 6.

The poles of the system (equation 10) are obtained by applying the quadratic formula to the denominator polynomial from equation 7 (τ s^2 + s + κ ||a||^2). Exponential stability is implied by both poles being strictly in the left half-plane. Lastly, f(t) oscillates if and only if the poles are complex, if and only if the discriminant (1 − 4φ) is negative, if and only if φ > 1/4.
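This is easy to sanity-check numerically; a brief sketch of our own (constants as in section 5):

    import numpy as np

    tau, kappa, norm_a_sq = 0.1, 1e-3, 262.0 ** 2
    phi = tau * kappa * norm_a_sq
    poles = np.roots([tau, 1.0, kappa * norm_a_sq])  # denominator of equation 7
    # np.roots returns a complex array exactly when the discriminant is negative
    assert np.iscomplexobj(poles) == (phi > 0.25)
    print(poles)  # -> approximately -5 +/- 25.7j, matching equation 10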
5 Validation
We construct the network from Figure 1 using Nengo 2.5.0 (Bekolay et al., 2014), a single spiking leaky integrate-and-fire neuron (n = 1; mean firing rate of 262 Hz),^6 τ = 0.1 s (equation 4), x = 0, and κ such that φ > 1/4. We construct the transfer function from equation 7 using nengolib 0.4.0 (Voelker, 2017):
    import nengolib
    from nengolib.signal import s

    H = nengolib.Lowpass(tau)
    F = s / (s / H + kappa * a.dot(a))
where tau is the time constant τ of the synapse, kappa is the learning rate κ supplied to Nengo (divided by n), and a is the NumPy array of the population's activities. We evaluate −(r ∗ f)(t) using -F.filt(r, dt=dt) (equation 5), and compare this to the e(t) obtained numerically in simulation. The filt method automatically discretizes F(s) according to the simulation time-step (dt = 1 ms) using zero-order hold (ZOH).^7
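Spelled out as a usage sketch of our own (here r and e denote the sampled reference and simulated error, and normalizing the RMSE by the range of e is one common convention):

    import numpy as np

    dt = 0.001
    e_model = -F.filt(r, dt=dt)  # -(r * f)(t), per equation 5
    rmse = np.sqrt(np.mean((e_model - e) ** 2))
    nrmse = rmse / (np.max(e) - np.min(e))  # Figure 2 reports ~3.7%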
In Figure 2, we confirm that −(r ∗ f)(t) approximates the numerical e(t) given white noise r(t). In Figure 3-Top, we exploit our knowledge of the impulse response, f(t), to engineer a number of decaying oscillators by controlling r(t). In Figure 3-Bottom, we evaluate np.abs(F.evaluate(freqs)) at a variety of frequencies (freqs) to visualize the bandpass behaviour of each filter.
^6 Spikes are used in place of a in equations 1 and 2.
^7 Technically, for a discrete-time simulation, the problem and results should have been formulated in the discrete-time domain using the Z-transform to begin with, as opposed to discretizing at the end, but the difference is quite subtle.
Figure 2: Comparison of the analytical error (equation 7) to the numerical e(t) obtained by simulating the network from Figure 1 (κ = 10^-3). The control signal r(t) is randomly sampled white noise with a cutoff frequency of 10 Hz. The normalized root-mean-square error is approximately 3.7%.
Figure 3: Harnessing the dynamics of PES with various κ to engineer decaying oscillators with predetermined frequencies (ω) and bandwidths (Q). The time constant of the first-order lowpass filter is fixed at τ = 0.1 s, while κ is set to achieve the desired ω via equation 8. (Top) Once every second, r(t) is set to a unit-area impulse. Consequently, e(t) oscillates according to the impulse response, f(t). (Bottom) Visualizing the ideal frequency response of F(s) (equation 7). Dashed lines at ω (equation 8) align with the peak of each bandpass filter, or equivalently the frequency of each oscillation. The width of each filter is proportional to the decay rate Q^-1 (equation 9).
6 Discussion
Since φ = τ κ ||a||^2 > 1/4 if and only if the weights oscillate, this motivates a simple heuristic for setting the learning rate to prevent oscillatory weight changes: set κ ≤ 1/(4 τ ||a||^2), where ||a||^2 is maximal over all possible activity vectors. In this case, equation 7 factors into a (differentiated) double-exponential:

    F(s) = τ1 τ2 s / (τ (τ1 s + 1)(τ2 s + 1)) = (τ1 τ2 τ^-1 s) · (1/(τ1 s + 1)) · (1/(τ2 s + 1)),

that is, two first-order lowpass filters chained together, where:

    (τ1, τ2) = 2τ / (1 ∓ sqrt(1 − 4φ)),

by equation 10. In other words, the non-oscillatory regime (0 < φ ≤ 1/4) of PES is characterized by the dynamics of a double-exponential. We remark that τ1 = τ2 = 2τ (i.e., an alpha filter) lies directly on the point of bifurcation from double-exponential to oscillatory behaviour (φ = 1/4; see Figure 4).
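Both directions of this analysis reduce to one-liners; a sketch of our own (not from the report): the largest non-oscillatory rate from the heuristic above, and, conversely, the κ that targets a desired ω via equation 8:

    import numpy as np

    def max_smooth_kappa(tau, a):
        """Largest kappa keeping phi <= 1/4 (double-exponential regime)."""
        return 1.0 / (4.0 * tau * np.dot(a, a))

    def kappa_for_omega(omega, tau, a):
        """Learning rate yielding oscillation frequency omega (equation 8)."""
        return tau * omega ** 2 / np.dot(a, a)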
In all applications involving online learning (that we are aware of), oscillatory weight changes are viewed as problematic, and so the relevant constants (τ, κ, and ||a||^2) are tweaked until the issue disappears. In contrast, we have shown that not only can the relationship between these constants and the oscillations be fully understood, but they can be harnessed to engineer bandpass filters (with respect to the transformation r(t) ↦ e(t)) with specific frequencies (ω) and bandwidths (Q). More generally, the PES learning rule can be used to construct dynamical systems whose transfer function (equation 6) depends on H(s), κ, and ||a||^2. As we used only a single spiking neuron, the accuracy of these systems relies solely on the accuracy of the PES implementation, the model of H(s), and the constancy of (a ∗ h)(t) in practice (i.e., given spiking activity).
Although we have analyzed the continuous-time setting, the same proof technique can be applied to the discrete-time domain by use of the Z-transform. Likewise, although we have assumed x is a constant, we can apply a "separation of timescales" argument (i.e., assuming x(t) changes on a slower timescale than f(t)) to carry this same analysis over to dynamic x(t). By equation 10, this analysis holds approximately for x(t) with frequencies ≪ (4πτ)^-1 Hz, by applying a time-varying filter to d(t) that depends on the current activity vector.
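For reference, the end-of-pipeline ZOH discretization (footnote 7) can also be reproduced outside of nengolib; a sketch of our own using SciPy, with the constants from section 5:

    from scipy.signal import cont2discrete

    tau, kappa, norm_a_sq, dt = 0.1, 1e-3, 262.0 ** 2, 0.001
    num, den = [1.0, 0.0], [tau, 1.0, kappa * norm_a_sq]  # equation 7
    numd, dend, _ = cont2discrete((num, den), dt, method='zoh')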
In conclusion, we have extended our previous analysis of PES to include
linearly filtered feedback and a dynamic reference signal. This fully characterizes
the rule in the context of NEF networks representing a constant value, as a
transfer function from the reference signal to the error signal. This transfer
function may then be readily analyzed and exploited using linear systems theory.
This demonstrates a more general principle of recurrently coupling available
dynamical primitives in biological models (here, a PSC that is integrated by an
online learning rule) to improve network-level computations.
Acknowledgements
We thank Terrence C. Stewart for inspiring this work, in part by providing his
perspective on the PES rule applied to adaptive control in Nengo (DeWolf et al.,
2016), at the 2017 Telluride Neuromorphic Cognition Engineering Workshop.
Figure 4: Visualizing the poles of F(s) (equation 10) by sweeping κ > 0 (while ||a||^2 and τ = 0.1 s remain fixed). Arrows follow the direction of increasing κ. When φ ≤ 1/4, the dynamics of PES are a double-exponential. The learning rule becomes an alpha filter when the two poles collide: φ = 1/4 ⟺ s = −1/(2τ) (marked by a solid circle). When φ > 1/4, the weight changes become oscillatory (due to complex poles). As κ increases, the oscillatory frequency, ω, scales as O(sqrt(κ)). As κ decreases, the first pole converges to s = −1/τ (marked by a solid x) while the second pole cancels the zero at s = 0.
References
Trevor Bekolay, Carter Kolbeck, and Chris Eliasmith. Simultaneous unsupervised and supervised learning of cognitive functions in biologically plausible spiking neural networks. In 35th Annual Conference of the Cognitive Science Society, pages 169–174. Cognitive Science Society, 2013.

Trevor Bekolay, James Bergstra, Eric Hunsberger, Travis DeWolf, Terrence C. Stewart, Daniel Rasmussen, Xuan Choo, Aaron Russell Voelker, and Chris Eliasmith. Nengo: A Python tool for building large-scale functional brain models. Frontiers in Neuroinformatics, 7(48), 2014. ISSN 1662-5196. doi: 10.3389/fninf.2013.00048.

Travis DeWolf, Terrence C. Stewart, Jean-Jacques Slotine, and Chris Eliasmith. A spiking neural model of adaptive arm control. Proceedings of the Royal Society B, 283(48), 2016. doi: 10.1098/rspb.2016.2134. URL http://dx.doi.org/10.1098/rspb.2016.2134.

Chris Eliasmith and Charles H. Anderson. Neural engineering: Computation, representation, and dynamics in neurobiological systems. MIT Press, Cambridge, MA, 2003.

Daniel Rasmussen, Aaron R. Voelker, and Chris Eliasmith. A neural model of hierarchical reinforcement learning. PLoS ONE, 12(7):1–39, 2017. doi: 10.1371/journal.pone.0180234. URL https://doi.org/10.1371/journal.pone.0180234.

Aaron R. Voelker. A solution to the dynamics of the Prescribed Error Sensitivity learning rule. Technical report, Centre for Theoretical Neuroscience, Waterloo, ON, October 2015.

Aaron R. Voelker. Nengolib – Additional extensions and tools for modelling dynamical systems in Nengo. https://github.com/arvoelke/nengolib/, 2017. Accessed: 2017-08-12.

Hank Zumbahlen et al. Linear circuit design handbook. Newnes, 2011.