Training Spiking Deep Networks for Neuromorphic Hardware

Authors:
Eric Hunsberger
Centre for Theoretical Neuroscience
University of Waterloo
Waterloo, ON N2L 3G1
ehunsber@uwaterloo.ca
Chris Eliasmith
Centre for Theoretical Neuroscience
University of Waterloo
Waterloo, ON N2L 3G1
celiasmith@uwaterloo.ca
Abstract
We describe a method to train spiking deep networks that can be run using leaky
integrate-and-fire (LIF) neurons, achieving state-of-the-art results for spiking LIF
networks on five datasets, including the large ImageNet ILSVRC-2012 bench-
mark. Our method for transforming deep artificial neural networks into spik-
ing networks is scalable and works with a wide range of neural nonlinearities.
We achieve these results by softening the neural response function, such that its
derivative remains bounded, and by training the network with noise to provide
robustness against the variability introduced by spikes. Our analysis shows that
implementations of these networks on neuromorphic hardware will be many times
more power-efficient than the equivalent non-spiking networks on traditional hard-
ware.
1 Introduction
Deep artificial neural networks (ANNs) have recently been very successful at solving image cate-
gorization problems. Early successes with the MNIST database [1] were subsequently tested on the
more difficult but similarly sized CIFAR-10 [2] and Street-view house numbers [3] datasets. Re-
cently, many groups have achieved better results on these small datasets (e.g. [4]), as well as on
larger datasets (e.g. [5]). This work has culminated with the application of deep convolutional neu-
ral networks to ImageNet [6], a very large and challenging dataset with 1.2 million images across
1000 categories.
There has recently been considerable effort to introduce neural “spiking” into deep ANNs [7, 8, 9,
10, 11, 12], such that connected nodes in the network transmit information via instantaneous single
bits (spikes), rather than transmitting real-valued activities. While one goal of this work is to better
understand the brain by trying to reverse engineer it [7], another goal is to build energy-efficient
neuromorphic systems that use a similar spiking communication method, for image categorization
[10, 11, 12] or other applications [13].
In this paper, we present a novel method for translating deep ANNs into spiking networks for im-
plementation on neuromorphic hardware. Unlike previous methods, our method is applicable to
a broad range of neural nonlinearities, allowing for implementation on hardware with idiosyncratic
neuron types (e.g. [14]). We extend our previous results [15] to additional datasets, and most notably
demonstrate that it scales to the large ImageNet dataset. We also perform an analysis demonstrating
that neuromorphic implementations of these networks will be many times more power-efficient than
the equivalent non-spiking networks running on traditional hardware.
2 Methods
We first train a network on static images using traditional deep learning techniques; we call this the
ANN. We then take the parameters (weights and biases) from the ANN and use them to connect
spiking neurons, forming the spiking neural network (SNN). A central challenge is to train the ANN
in such a way that it can be transferred into a spiking network, and such that the classification error
of the resulting SNN is minimized.
2.1 Convolutional ANN
We base our network on that of Krizhevsky et al. [6], which won the ImageNet ILSVRC-2012
competition. A smaller variant of the network achieved 11% error on the CIFAR-10 dataset. The
network makes use of a series of generalized convolutional layers, where one such layer is composed
of a set of convolutional weights, followed by a neural nonlinearity, a pooling layer, and finally a
local contrast normalization layer. These generalized convolutional layers are followed by either
locally-connected layers, fully-connected layers, or both, all with a neural nonlinearity. In the case
of the original network, the nonlinearity is a rectified linear (ReLU) function, and pooling layers
perform max-pooling. The details of the network can be found in [6] and code is available1.
To make the ANN transferable to spiking neurons, a number of modifications are necessary. First,
we remove the local response normalization layers. This computation would likely require some
sort of lateral connections between neurons, which are difficult to add in the current framework
since the resulting network would not be feedforward and we are using methods focused on training
feedforward networks.
Second, we change the pooling layers from max pooling to average pooling. Again, computing max
pooling would likely require lateral connections between neurons, making it difficult to implement
without significant changes to the training methodology. Average pooling, on the other hand, is very
easy to compute in spiking neurons, since it is simply a weighted sum.
The other modifications—using leaky integrate-and-fire neurons and training with noise—are the
main focus of this paper, and are described in detail below.
2.2 Leaky integrate-and-fire neurons
Our network uses a modified leaky integrate-and-fire (LIF) neuron nonlinearity instead of the recti-
fied linear nonlinearity. Past work has kept the rectified linear nonlinearity for the ANN and substi-
tuted in the spiking integrate-and-fire (IF) neuron model in the SNN [11, 10], since the static firing
curve of the IF neuron model is a rectified line. Our motivation for using the LIF neuron model is
that it demonstrates that more complex, nonlinear neuron models can be used in such networks.
Thus, these methods can be extended to the idiosyncratic neuron types employed by some
neuromorphic hardware (e.g. [14]).
The LIF neuron dynamics are given by the equation
$$\tau_{RC} \dot{v}(t) = -v(t) + J(t) \qquad (1)$$
where $v(t)$ is the membrane voltage, $\dot{v}(t)$ is its derivative with respect to time, $J(t)$ is the input
current, and $\tau_{RC}$ is the membrane time constant. When the voltage reaches $V_{th} = 1$, the neuron
fires a spike, and the voltage is held at zero for a refractory period of $\tau_{ref}$. Once the refractory
period is finished, the neuron obeys Equation 1 until another spike occurs.
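As a rough illustration of these dynamics, the following Python sketch integrates Equation 1 with a forward-Euler step for a constant input current. The function name `simulate_lif`, the step size `dt`, and the run length are assumptions made for this example; the default time constants are the values quoted in the Figure 2 caption.

```python
import numpy as np

def simulate_lif(j, t_end=1.0, dt=1e-4, tau_rc=0.02, tau_ref=0.004, v_th=1.0):
    """Forward-Euler integration of Equation 1 for a constant input current j.

    Returns an array of spike times in seconds.
    """
    n_steps = int(round(t_end / dt))
    ref_steps = int(round(tau_ref / dt))  # refractory period in whole steps
    v = 0.0     # membrane voltage
    hold = 0    # remaining refractory steps
    spike_times = []
    for step in range(n_steps):
        if hold > 0:
            hold -= 1                   # voltage stays at zero while refractory
            continue
        v += dt * (j - v) / tau_rc      # tau_rc * dv/dt = -v + j
        if v >= v_th:
            spike_times.append((step + 1) * dt)
            v = 0.0                     # reset after the spike
            hold = ref_steps
    return np.array(spike_times)
```

With these defaults, a sub-threshold current such as `simulate_lif(0.5)` produces no spikes, while `simulate_lif(2.0)` gives a regular spike train at roughly 55 to 56 Hz.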
Given a constant input current J(t) = j, we can solve Equation 1 for the time it takes the voltage to
rise from zero to one, and thereby find the steady-state firing rate
$$r(j) = \left[ \tau_{ref} + \tau_{RC} \log\left( 1 + \frac{V_{th}}{\rho(j - V_{th})} \right) \right]^{-1} \qquad (2)$$
where ρ(x) = max(x, 0).
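Equation 2 can be evaluated directly. The sketch below is one possible NumPy version; the name `lif_rate` and the default parameter values (taken from the Figure 2 caption) are illustrative choices rather than part of the method description. It relies on floating-point infinity to return a rate of zero wherever ρ(j − V_th) = 0.

```python
import numpy as np

def lif_rate(j, tau_rc=0.02, tau_ref=0.004, v_th=1.0):
    """Steady-state LIF firing rate of Equation 2, in Hz."""
    x = np.maximum(np.asarray(j, dtype=float) - v_th, 0.0)   # rho(j - V_th)
    # Where x == 0, v_th / x is +inf, the log is +inf, and the rate is 0.
    with np.errstate(divide="ignore"):
        return 1.0 / (tau_ref + tau_rc * np.log1p(v_th / x))

rates = lif_rate(np.array([0.5, 1.0, 2.0, 10.0]))
```

The rate is zero at or below threshold and saturates below 1/τ_ref (250 Hz with these defaults) for large currents.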
Theoretically, we should be able to train a deep neural network using Equation 2 as the static
nonlinearity and make a reasonable approximation of the network in spiking neurons, assuming that
the spiking network has a synaptic filter that sufficiently smooths a spike train to give a good
approximation of the firing rate. The LIF steady state firing rate has the particular problem that the
derivative approaches infinity as $j \to 0^+$, which causes problems when employing backpropagation.
To address this, we added smoothing to the LIF rate equation.
Figure 1: Comparison of LIF and soft LIF response functions. The left panel shows the response
functions themselves. The LIF function has a hard threshold at $j = V_{th} = 1$; the soft LIF function
smooths this threshold. The right panel shows the derivatives of the response functions. The hard
LIF function has a discontinuous and unbounded derivative at $j = 1$; the soft LIF function has a
continuous bounded derivative, making it amenable to use in backpropagation.
1https://github.com/akrizhevsky/cuda-convnet2
If we replace the hard maximum $\rho(x) = \max(x, 0)$ with a softer maximum $\rho_1(x) = \log(1 + e^x)$,
then the LIF neuron loses its hard threshold and the derivative becomes bounded. Further, we can
use the substitution
$$\rho_2(x) = \gamma \log\left[ 1 + e^{x/\gamma} \right] \qquad (3)$$
to allow us control over the amount of smoothing, where $\rho_2(x) \to \max(x, 0)$ as $\gamma \to 0$. Figure 1
shows the result of this substitution.
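To make the substitution concrete, here is a minimal sketch of the soft LIF rate, obtained by using the soft maximum of Equation 3 inside Equation 2. The names `soft_max` and `soft_lif_rate` and the default `gamma=0.02` are assumptions for illustration (the smoothing value used in the experiments is not stated in this section); `np.logaddexp` is used only to evaluate log(1 + e^{x/γ}) without overflow.

```python
import numpy as np

def soft_max(x, gamma):
    """Equation 3: gamma * log(1 + exp(x / gamma)), computed stably."""
    return gamma * np.logaddexp(0.0, np.asarray(x, dtype=float) / gamma)

def soft_lif_rate(j, gamma=0.02, tau_rc=0.02, tau_ref=0.004, v_th=1.0):
    """Equation 2 with the hard maximum replaced by the soft maximum."""
    x = soft_max(np.asarray(j, dtype=float) - v_th, gamma)
    # If x underflows to 0 far below threshold, the rate evaluates to 0.
    with np.errstate(divide="ignore"):
        return 1.0 / (tau_ref + tau_rc * np.log1p(v_th / x))

hard_like = soft_lif_rate(2.0, gamma=1e-3)   # close to the hard LIF rate
smoothed = soft_lif_rate(0.9, gamma=0.1)     # nonzero just below threshold
```

Above threshold, a small γ recovers the hard LIF rate, while a larger γ gives a nonzero response just below threshold, which is what removes the unbounded derivative.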
2.3 Training with noise
Training neural networks with various types of noise on the inputs is not a new idea. Denoising
autoencoders [16] have been successfully applied to datasets like MNIST, learning more robust
solutions with lower generalization error than their non-noisy counterparts.
In a biological spiking neural network, synapses between neurons perform some measure of filtering
on the spikes, due to the fact that the post-synaptic current induced by the neurotransmitter release
is distributed over time. We employ a similar mechanism in our networks to attenuate some of
the variability introduced by spikes. The α-function $\alpha(t) = (t/\tau_s) e^{-t/\tau_s}$ is a simple second-order
lowpass filter, inspired by biology [17]. We chose this as a synaptic filter for our networks since it
provides better noise reduction than a first-order lowpass filter.
The filtered spike train can be viewed as an estimate of the neuron activity. For example, if the
neuron is firing regularly at 200 Hz, filtering the spike train will result in a signal fluctuating around 200
Hz. We can view the neuron output as being 200 Hz, with some additional “noise” around this value.
By training our ANN with some random noise added to the output of each neuron for each training
example, we can simulate the effects of using spikes on the signal received by the post-synaptic
neuron.
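The 200 Hz example can be reproduced with a short sketch. It assumes a discretized α-kernel normalized to unit area (so that the filtered signal is in Hz), a time step of 0.1 ms, and τ_s = 5 ms; the function name `alpha_filter` and these numerical choices are illustrative, not taken from the authors' code.

```python
import numpy as np

def alpha_filter(spikes, dt=1e-4, tau_s=0.005):
    """Filter a spike train with a unit-area alpha kernel."""
    t = np.arange(0.0, 10 * tau_s, dt)
    kernel = (t / tau_s**2) * np.exp(-t / tau_s)
    kernel /= kernel.sum() * dt          # discrete kernel integrates to one
    return np.convolve(spikes, kernel)[: len(spikes)] * dt

dt = 1e-4
spikes = np.zeros(10000)                 # one second of simulated time
spikes[::50] = 1.0 / dt                  # regular 200 Hz train, unit-area impulses
filtered = alpha_filter(spikes, dt=dt)
steady = filtered[2000:]                 # discard the initial transient

print(steady.mean())                     # close to 200
print(steady.std())                      # the "noise" around that value
```

The mean of the steady-state signal matches the firing rate, and its standard deviation gives a sense of the spike-induced variability that the training noise is meant to imitate.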
Figure 2: Variability in filtered spike trains versus input current for the LIF neuron ($\tau_{RC} = 0.02$,
$\tau_{ref} = 0.004$). The solid line shows the mean of the filtered spike train (which matches
the analytical rate of Equation 2), the ‘x’-points show the median, the solid error bars show the 25th
and 75th percentiles, and the dotted error bars show the minimum and maximum. The spike train
was filtered with an α-filter with $\tau_s = 0.003$ s.
Figure 2 shows how the variability of filtered spike trains depends on input current for the LIF
neuron. Since the impulse response of the α-filter has an integral of one, the mean of the filtered
spike trains is equal to the analytical rate of Equation 2. However, the statistics of the filtered signal
vary significantly across the range of input currents. Just above the firing threshold, the distribution
is skewed towards higher firing rates (i.e. the median is below the mean), since spikes are infrequent
so the filtered signal has time to return to near zero between spikes. At higher input currents, on the
other hand, the distribution is skewed towards lower firing rates (i.e. the median is above the mean).
In spite of this, we used a Gaussian distribution to generate the additive noise during training, for
simplicity. We found the average standard deviation to be approximately $\sigma = 10$ across all positive
input currents for an α-filter with $\tau_s = 0.005$. During training, we add Gaussian noise $\eta \sim G(0, \sigma)$
to the firing rate $r(j)$ (Equation 2) when $j > 0$, and add no noise when $j \le 0$.
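A minimal sketch of this noise injection is given below. The function name `noisy_rate`, its signature, and the stand-in rate function `toy_rate` are hypothetical and exist only to keep the example self-contained; in the method described here the rate would come from the (soft) LIF response.

```python
import numpy as np

def noisy_rate(j, rate_fn, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise to rate_fn(j), but only where j > 0."""
    rng = np.random.default_rng() if rng is None else rng
    j = np.asarray(j, dtype=float)
    noise = rng.normal(0.0, sigma, size=j.shape)
    return rate_fn(j) + np.where(j > 0, noise, 0.0)

def toy_rate(j):
    # Hypothetical stand-in for the neuron response function.
    return 100.0 * np.maximum(j, 0.0)

rng = np.random.default_rng(0)
out = noisy_rate(np.array([-1.0, 0.0, 0.5, 2.0]), toy_rate, rng=rng)
many = noisy_rate(np.full(100000, 2.0), toy_rate, rng=rng)
```

Entries with non-positive input are returned unchanged, and for positive inputs the output scatters around the noiseless rate with standard deviation σ.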
2.4 Conversion to a spiking network
Finally, we convert the trained ANN to an SNN. The parameters in the spiking network (i.e. weights
and biases) are all identical to those of the ANN. The convolution operation also remains the same,
since convolution can be rewritten as simple connection weights (synapses) $w_{ij}$ between
pre-synaptic neuron $i$ and post-synaptic neuron $j$. (How the brain might learn connection weight
patterns, i.e. filters, that are repeated at various points in space, is a much more difficult problem that
we will not address here.) Similarly, the average pooling operation can be written as a simple con-
nection weight matrix, and this matrix can be multiplied by the convolutional weight matrix of the
following layer to get direct connection weights between neurons.2
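The folding of average pooling into the next layer's weights can be checked numerically. The sketch below uses a one-dimensional, non-overlapping pooling window and small dense matrices purely for brevity; the helper `avg_pool_matrix` and all sizes are hypothetical.

```python
import numpy as np

def avg_pool_matrix(n_in, k):
    """Weight matrix for non-overlapping 1-D average pooling with window k."""
    n_out = n_in // k
    P = np.zeros((n_out, n_in))
    for o in range(n_out):
        P[o, o * k:(o + 1) * k] = 1.0 / k
    return P

rng = np.random.default_rng(0)
x = rng.random(8)                        # activities of 8 presynaptic neurons
W_next = rng.standard_normal((3, 4))     # weights of the following layer
P = avg_pool_matrix(8, 2)

W_direct = W_next @ P                    # direct neuron-to-neuron weights
two_step = W_next @ x.reshape(4, 2).mean(axis=1)   # pool, then apply weights
one_step = W_direct @ x                            # single combined connection
```

Both routes give the same input to the post-synaptic neurons, because pooling and the following weights are both linear maps.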
The only component of the network that changes when moving from the ANN to the SNN is the
neurons themselves. The most significant change is that we replace the soft LIF rate model (Equation 2
with the soft maximum of Equation 3) with the LIF spiking model (Equation 1). We remove the additive
Gaussian noise used in training. We also add post-synaptic filters to the neurons, which remove a
significant portion of the high-frequency variation produced by spikes.
3 Results
We tested our methods on five datasets: MNIST [1], SVHN [18], CIFAR-10 and CIFAR-100 [19],
and the large ImageNet ILSVRC-2012 dataset [20]. Our best result for each dataset is shown in
Table 1. Using our methods has allowed us to build spiking networks that perform nearly as well as
their non-spiking counterparts using the same number of neurons. All datasets show minimal loss
in accuracy when transforming from the ANN to the SNN. 3
2For computational efficiency, we actually compute the convolution and pooling separately.
3The ILSVRC-2012 dataset actually shows a marginal increase in accuracy, though this is likely not statisti-
cally significant and could be because the spiking LIF neurons have harder firing thresholds than their soft-LIF
rate counterparts. Also, the CIFAR-100 dataset shows a considerable increase in performance when using soft-
Dataset        ReLU ANN           LIF ANN            LIF SNN
MNIST          0.79%              0.84%              0.88%
SVHN           5.65%              5.79%              6.08%
CIFAR-10       16.48%             16.28%             16.46%
CIFAR-100      50.05%             44.35%             44.87%
ILSVRC-2012    45.4% (20.9%)^a    48.3% (24.1%)^a    48.2% (23.8%)^a
^a Results from the first 3072-image test batch.
Table 1: Results for spiking LIF networks (LIF SNN), compared with ReLU ANN and LIF ANN
(both using the same network structure, but with ReLU and LIF rate neurons respectively). The
spiking versions of each network perform almost as well as the rate-based versions. The ILSVRC-2012
(ImageNet) results show the error for the top result, with the top-5 result in brackets.
Dataset        This Paper               TN 1-chip      TN 8-chip      Best Other
MNIST          0.88% (27k)              None           None           0.88% (22k) [10]
SVHN           6.08% (27k)              3.64% (1M)     2.83% (8M)     None
CIFAR-10       16.46% (50k)             17.50% (1M)    12.50% (8M)    22.57% (28k) [11]
CIFAR-100      44.87% (50k)             47.27% (1M)    36.95% (8M)    None
ILSVRC-2012    48.2%, 23.8% (493k)^a    None           None           None
^a Results from the first 3072-image test batch.
Table 2: Our error rates compared with recent results on the TrueNorth (TN) neuromorphic
chip [12], as well as other best results in the literature. Approximate numbers of neurons are shown
in parentheses. The TrueNorth networks use significantly more neurons than our networks (about
20× more for the 1-chip network and 160× more for the 8-chip network). The first number for
ILSVRC-2012 (ImageNet) indicates the error for the top result, and the second number the more
commonly reported top-5 result.
Table 2 compares our results to the best spiking network results on these datasets in the litera-
ture. The most significant recent results are from [12], who implemented networks for a number of
datasets on both one and eight TrueNorth chips. Their results are impressive, but are difficult to com-
pare with ours since they use between 20 and 160 times more neurons. We surpass a number of their
one-chip results while using an order of magnitude fewer neurons. Furthermore, we demonstrate
that our method scales to the large ILSVRC-2012 dataset, which no other SNN implementation to
date has done. The most significant difference between our results and those of [10] and [11] is that
we use LIF neurons and can generalize to other neuron types, whereas their methods (and those of
[12]) are specific to IF neurons.
We examined our methods in more detail on the CIFAR-10 dataset. This dataset is composed of
60000 32×32 pixel labelled images from ten categories. We used the first 50000 images for training
and the last 10000 for testing, and augmented the dataset by taking random 24 ×24 patches from the
training images and then testing on the center patches from the testing images. This methodology
is similar to Krizhevsky et al. [6], except that they also used multiview testing where the classifier
output is the average output of the classifier run on nine random patches from each testing image
(increasing the accuracy by about 2%).
Table 3 shows the effect of each modification on the network classification error. Rows 1-5 show that
most of the modifications required to make the network amenable to running in spiking neurons
add error; the exception is the switch to soft LIF neurons (row 3), which lowers it. Although training with noise adds error to the ANN,
rows 6-8 of the table show that in the spiking network, training with noise pays off, though training
with too much noise is not advantageous. Specifically, though training with σ= 20 versus σ= 10
decreased the error introduced when switching to spiking neurons, it introduced more error to the
ANN (Network 5), resulting in worse SNN performance (Network 8).
LIF neurons versus ReLUs in the ANN, but this could simply be due to the training hyperparameters chosen,
since these were not optimized in any way.
#    Modification                                      CIFAR-10 error
0    Original ANN based on Krizhevsky et al. [6]       14.03%
1    Network 0 minus local contrast normalization      14.38%
2    Network 1 minus max pooling                       16.70%
3    Network 2 with soft LIF                           15.89%
4    Network 3 with training noise (σ = 10)            16.28%
5    Network 3 with training noise (σ = 20)            16.92%
6    Network 3 (σ = 0) in spiking neurons              17.06%
7    Network 4 (σ = 10) in spiking neurons             16.46%
8    Network 5 (σ = 20) in spiking neurons             17.04%
Table 3: Effects of successive modifications to CIFAR-10 error. We first show the original ANN
based on [6], and then the effects of each subsequent modification. Rows 6-8 show the results of
running ANNs 3-5 in spiking neurons, respectively. Row 7 is the best spiking network, using a
moderate amount of training noise.
3.1 Efficiency
Running on standard hardware, spiking networks are considerably less efficient than their ANN
counterparts. This is because ANNs are static, requiring only one forward-pass through the network
to compute the output, whereas SNNs are dynamic, requiring the input to be presented for a number
of time steps and thus a number of forward passes. On hardware that can take full advantage of the
sparsity that spikes provide—that is, neuromorphic hardware—SNNs can be more efficient than the
equivalent ANNs, as we show here.
First, we need to compute the computational cost of the original network, specifically the number
of floating-point operations (flops) required to pass one image through the network. There are
two main sources of computation in the network: computing the neurons and computing the
connections.
$$\text{flops} = \frac{\text{flops}}{\text{neuron}} \times \text{neurons} + \frac{\text{flops}}{\text{connection}} \times \text{connections} \qquad (4)$$
Since a rectified linear unit is a simple max function, it requires only one flop to compute
(flops/neuron = 1). Each connection requires two flops, a multiply and an add (flops/connection = 2).
We can determine the number of connections by “unrolling” each convolution, so that the layer is in
the same form as a locally connected layer.
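As a worked instance of Equation 4, the sketch below counts the flops of a single unrolled convolutional layer. The helper name `conv_layer_flops` and the layer dimensions in the example are hypothetical and are not the dimensions of the networks reported here; only the per-neuron and per-connection costs come from the text.

```python
def conv_layer_flops(out_h, out_w, out_ch, k_h, k_w, in_ch,
                     flops_per_neuron=1, flops_per_connection=2):
    """Equation 4 for one convolutional layer treated as locally connected."""
    neurons = out_h * out_w * out_ch
    # After unrolling, every output neuron has one connection per kernel weight.
    connections = neurons * k_h * k_w * in_ch
    return flops_per_neuron * neurons + flops_per_connection * connections

# Hypothetical layer: 24x24 output map, 64 filters of size 5x5 over 3 channels.
layer_flops = conv_layer_flops(24, 24, 64, 5, 5, 3)
print(layer_flops)   # 5566464
```

Summing this quantity over all layers gives the flops per image for the whole network; the connection term dominates because each neuron has many incoming connections.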
To compute the SNN efficiency on a prospective neuromorphic chip, we begin by identifying the
energy cost of a synaptic event ($E_{synop}$) and neuron update ($E_{update}$), relative to standard hardware.
In consultation with neuromorphic experts, and examining current reports of neuromorphic chips
(e.g. [21]), we assume that each neuron update takes as much energy as 0.25 flops ($E_{update} = 0.25$),
and each synaptic event takes as much energy as 0.08 flops ($E_{synop} = 0.08$). (These numbers could
potentially be much lower for analog chips, e.g. [14].) Then, the total energy used by an SNN to
classify one image is (in units of the energy required by one flop on standard hardware)
$$E_{SNN} = \left( E_{synop} \frac{\text{synops}}{\text{s}} + E_{update} \frac{\text{updates}}{\text{s}} \right) \times \frac{\text{s}}{\text{image}} \qquad (5)$$
For our CIFAR-10 network, we find that on average, the network has rates of 2,693,315,174 synops/s
and 49,536,000 updates/s. This results in $E_{\text{CIFAR-10}} = 45{,}569{,}843$ when each image is
presented for 200 ms. Dividing the number of flops per image on standard hardware by this energy, we find that
the relative efficiency of the CIFAR-10 network is 0.76; that is, it is somewhat less efficient.
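The CIFAR-10 figure can be checked by plugging the reported rates into Equation 5. In this sketch the constants and rates are the ones given in the text; the function and variable names are illustrative.

```python
E_SYNOP = 0.08    # energy per synaptic event, in flop-equivalents
E_UPDATE = 0.25   # energy per neuron update, in flop-equivalents

def snn_energy_per_image(synops_per_s, updates_per_s, seconds_per_image):
    """Equation 5: energy to classify one image, in flop-equivalents."""
    return (E_SYNOP * synops_per_s + E_UPDATE * updates_per_s) * seconds_per_image

synops_per_s = 2_693_315_174
updates_per_s = 49_536_000
e_cifar10 = snn_energy_per_image(synops_per_s, updates_per_s, 0.200)

# Share of the energy spent on synaptic events rather than neuron updates.
synaptic_share = (E_SYNOP * synops_per_s
                  / (E_SYNOP * synops_per_s + E_UPDATE * updates_per_s))

print(round(e_cifar10))          # 45569843
print(round(synaptic_share, 3))  # 0.946
```

The result reproduces the reported energy, and the share of about 95% shows that synaptic events account for nearly all of the estimated energy in this configuration.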
Equation 5 shows that if we are able to lower the amount of time needed to present each image to
the network, we can lower the energy required to classify the image. Alternatively, we can lower
the number of synaptic events per second by lowering the firing rates of the neurons. Lowering
the number of neuron updates would have little effect on the overall energy consumption since the
synaptic events require the majority of the energy.
To lower the presentation time required for each input while maintaining accuracy, we need to
decrease the synapse time constant as well, so that the information is able to propagate through the
whole network in the decreased presentation time. Table 4 shows the effect of various alternatives
for the presentation time and synapse time constant on the accuracy and efficiency of the networks
for a number of the datasets.
Dataset        τs [ms]    c0 [ms]    c1 [ms]    Error     Efficiency
CIFAR-10       5          120        200        16.46%    0.76×
CIFAR-10       0          10         80         16.63%    1.64×
CIFAR-10       0          10         60         17.47%    2.04×
MNIST          5          120        200        0.88%     5.94×
MNIST          2          40         100        0.92%     11.98×
MNIST          2          50         60         1.14%     14.42×
MNIST          0          20         60         3.67%     14.42×
ILSVRC-2012    3          140        200        23.80%    1.39×
ILSVRC-2012    0          30         80         25.33%    2.88×
ILSVRC-2012    0          30         60         25.36%    3.51×
Table 4: Estimated efficiency of our networks on neuromorphic hardware, compared with traditional
hardware. For all datasets, there is a tradeoff between accuracy and efficiency, but we find many
configurations that are significantly more efficient while sacrificing little in terms of accuracy. τs is the
synapse time constant, c0 is the start time of the classification, c1 is the end time of the classification
(i.e. the total presentation time for each image).
Table 4 shows that for some datasets (e.g. CIFAR-10 and ILSVRC-2012) the synapses can be com-
pletely removed (τs= 0 ms) without sacrificing much accuracy. Interestingly, this is not the case
with the MNIST network, which requires at least some measure of synapses to function accurately.
We suspect that this is because the MNIST network has much lower firing rates than the other net-
works (average of 9.67 Hz for MNIST, 148 Hz for CIFAR-10, 93.3 Hz for ILSVRC-2012). This
difference in average firing rates is also why the MNIST network is significantly more efficient than
the other networks.
It is important to tune the classification time, both in terms of the total length of time each example
is shown for (c1), and when classification begins (c0). The optimal values for these parameters are
very dependent on the network: its number of layers, firing rates, and synapse time
constants. Figure 3 shows how the classification time affects accuracy for various networks.
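How the output is read out between c0 and c1 is not spelled out in this section. The sketch below assumes, purely for illustration, that the filtered output-layer activity is summed over the window from c0 to c1 and the largest total determines the class; the function name `classify_window` and the synthetic data are hypothetical.

```python
import numpy as np

def classify_window(outputs, dt, c0, c1):
    """Pick the class with the largest summed output between times c0 and c1.

    outputs: array of shape (n_steps, n_classes) sampled every dt seconds.
    """
    start = int(round(c0 / dt))
    stop = int(round(c1 / dt))
    return int(np.argmax(outputs[start:stop].sum(axis=0)))

# Synthetic example: class 0 dominates early, class 1 after the transient.
dt = 0.001
outputs = np.zeros((200, 2))
outputs[:100, 0] = 1.0
outputs[100:, 1] = 1.0

early = classify_window(outputs, dt, 0.0, 0.080)     # reads the transient
late = classify_window(outputs, dt, 0.120, 0.200)    # reads the settled output
```

In this toy case the early window returns class 0 and the late window returns class 1, which illustrates why starting classification before activity has propagated through the network can hurt accuracy.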
Given that the CIFAR-10 network performs almost as well with no synapses as with synapses, one
may question whether noise is required during training at all. We retrained the CIFAR-10 network
with no noise and ran with no synapses, but could not achieve accuracy better than 18.06%. This
suggests that noise is still beneficial during training.
4 Discussion
Our results show that it is possible to train accurate deep convolutional networks for image clas-
sification without adding neurons, while using more complex nonlinear neuron types—specifically
the LIF neuron—as opposed to the traditional rectified-linear or sigmoid neurons. We have shown
that networks can be run in spiking neurons, and training with noise decreases the amount of error
introduced when running in spiking versus rate neurons. These networks can be significantly more
energy-efficient than traditional ANNs when run on specialized neuromorphic hardware.
The first main contribution of this paper is to demonstrate that state-of-the-art spiking deep networks
can be trained with LIF neurons, while maintaining high levels of classification accuracy. For exam-
ple, we have described the first large-scale SNN able to provide good results on ImageNet. Notably,
all other state-of-the-art methods use integrate-and-fire (IF) neurons [11, 10, 12], which are straight-
forward to fit to the rectified linear units commonly used in deep convolutional networks. We show
that there is minimal drop in accuracy when converting from ANN to SNN. We also examine how
classification time affects accuracy and energy-efficiency, and find that networks can be made quite
efficient with minimal loss in accuracy.
Figure 3: Effects of classification time on accuracy, in four panels: CIFAR-10 ($\tau_s = 5$ ms),
CIFAR-10 ($\tau_s = 0$ ms), MNIST ($\tau_s = 2$ ms), and ILSVRC-2012 ($\tau_s = 0$ ms). Individual traces show
different starting classification times ($c_0$), and the x-axis shows the end classification time ($c_1$).
By smoothing the LIF response function so that its derivative remains bounded, we are able to use
this more complex and nonlinear neuron with a standard convolutional network trained by back-
propagation. Our smoothing method is extensible to other neuron types, allowing for networks to be
trained for neuromorphic hardware with idiosyncratic neuron types (e.g. [14]). We found that there
was very little error introduced by switching from the soft response function to the hard response
function with LIF neurons for the amount of smoothing that we used. However, for neurons with
harsh discontinuities that require more smoothing, it may be necessary to slowly relax the smoothing
over the course of the training so that, by the end of the training, the smooth response function is
arbitrarily close to the hard response function.
The second main contribution of this paper is to demonstrate that training with noise on neuron
outputs can decrease the error introduced when transitioning to spiking neurons. The error decreased
by 0.6% overall on the CIFAR-10 network, despite the fact that the ANN trained without noise
performs better. This is because noise on the output of the neuron simulates the variability that a
spiking network encounters when filtering a spike train. There is a tradeoff between training with
too little noise, which makes the SNN less accurate, and too much noise, which makes the initially
trained ANN less accurate.
These methods provide new avenues for translating traditional ANNs to spike-based neuromorphic
hardware. We have provided some evidence that such implementations can be significantly more
energy-efficient than their ANN counterparts. While our analyses only consider static image classi-
fication, we expect that the real efficiency of SNNs will become apparent when dealing with dynamic
inputs (e.g. video). This is because SNNs are inherently dynamic, and take a number of simulation
steps to process each image. This makes them best suited to processing dynamic sequences, where
adjacent frames in the video sequence are similar to one another, and the network does not have to
take time to constantly “reset” after sudden changes in the input.
Future work includes experimenting with lowering firing rates for greater energy-efficiency. This
could be done by changing the neuron refractory period τref to limit the firing below a particular
rate, optimizing for both accuracy and low rates, using adapting neurons, or adding lateral inhibition
in the convolutional layers. Other future work includes implementing max-pooling and local contrast
normalization layers in spiking networks. Networks could also be trained offline as described here
and then fine-tuned online using an STDP rule [22, 23] to help further reduce errors associated
with converting from rate-based to spike-based networks, while avoiding difficulties with training a
network in spiking neurons from scratch.
References
[1] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recogni-
tion,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[2] A. Krizhevsky, “Convolutional deep belief networks on CIFAR-10,” Tech. Rep., 2010.
[3] P. Sermanet, S. Chintala, and Y. LeCun, “Convolutional neural networks applied to house numbers digit
classification,” in International Conference on Pattern Recognition (ICPR), 2012.
[4] C.-Y. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in International
Conference on Artificial Intelligence and Statistics (AISTATS), vol. 38, 2015, pp. 562–570.
[5] R. Gens and P. Domingos, “Discriminative learning of sum-product networks,” in Advances in Neural
Information Processing Systems (NIPS), 2012, pp. 1–9.
[6] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural net-
works,” in Advances in Neural Information Processing Systems, 2012.
[7] C. Eliasmith, T. C. Stewart, X. Choo, T. Bekolay, T. DeWolf, C. Tang, and D. Rasmussen, “A Large-Scale
Model of the Functioning Brain,” Science, vol. 338, no. 6111, pp. 1202–1205, Nov. 2012.
[8] E. Neftci, S. Das, B. Pedroni, K. Kreutz-Delgado, and G. Cauwenberghs, “Event-driven contrastive
divergence for spiking neuromorphic systems,” Frontiers in Neuroscience, vol. 7, no. 272, 2013.
[9] P. O’Connor, D. Neil, S.-C. Liu, T. Delbruck, and M. Pfeiffer, “Real-time classification and sensor fusion
with a spiking deep belief network,” Frontiers in Neuroscience, vol. 7, Jan. 2013.
[10] P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, and M. Pfeiffer, “Fast-Classifying, High-Accuracy
Spiking Deep Networks Through Weight and Threshold Balancing,” in IEEE International Joint Confer-
ence on Neural Networks (IJCNN), 2015.
[11] Y. Cao, Y. Chen, and D. Khosla, “Spiking Deep Convolutional Neural Networks for Energy-Efficient
Object Recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, Nov. 2014.
[12] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L.
Mckinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S.
Modha, “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing,” arXiv preprint,
vol. 1603, no. 08270, pp. 1–7, 2016.
[13] P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni, and E. Neftci, “Conversion of Artificial Recurrent
Neural Networks to Spiking Neural Networks for Low-power Neuromorphic Hardware,” arXiv preprint,
vol. 1601, no. 04187, 2016.
[14] B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J.-M. Bussat, R. Alvarez-
Icaza, J. V. Arthur, P. A. Merolla, and K. Boahen, “Neurogrid: A mixed-analog-digital multichip system
for large-scale neural simulations,” Proceedings of the IEEE, vol. 102, no. 5, pp. 699–716, 2014.
[15] E. Hunsberger and C. Eliasmith, “Spiking Deep Networks with LIF Neurons,” arXiv:1510.08829 [cs],
pp. 1–9, 2015.
[16] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with
denoising autoencoders,” in International Conference on Machine Learning (ICML), 2008, pp. 1096–
1103.
[17] Z. F. Mainen and T. J. Sejnowski, “Reliability of spike timing in neocortical neurons.” Science (New York,
N.Y.), vol. 268, no. 5216, pp. 1503–6, Jun. 1995.
[18] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading Digits in Natural Images with
Unsupervised Feature Learning,” in NIPS workshop on deep learning and unsupervised feature learning,
2011, pp. 1–9.
[19] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Master’s thesis, University of
Toronto, 2009.
[20] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla,
M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,”
International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[21] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson,
N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D.
Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a
scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014.
[22] B. Nessler, M. Pfeiffer, L. Buesing, and W. Maass, “Bayesian computation emerges in generic cortical
microcircuits through spike-timing-dependent plasticity.” PLoS computational biology, vol. 9, no. 4, p.
e1003037, Apr. 2013.
[23] T. Bekolay, C. Kolbeck, and C. Eliasmith, “Simultaneous unsupervised and supervised learning of
cognitive functions in biologically plausible spiking neural networks,” in Proc. 35th Annual Conference
of the Cognitive Science Society, 2013, pp. 169–174.