This paper introduces an event-driven feedforward categorization system that takes data from a temporal-contrast address-event representation (AER) sensor. The proposed system extracts bio-inspired, cortex-like features and discriminates different patterns using an AER-based tempotron classifier (a network of leaky integrate-and-fire spiking neurons). One of the system’s most appealing characteristics is its event-driven processing, with both input and features taking the form of address events (spikes). The system was evaluated on an AER posture dataset and compared to two recently developed bio-inspired models. Experimental results show that it requires much less simulation time while maintaining comparable performance. In addition, experiments on the MNIST image dataset demonstrate that the proposed system works not only on raw AER data but also on images (with a preprocessing step to convert images into AER events) and that it maintains competitive accuracy even when noise is added. The system was further evaluated on the MNIST-DVS dataset (in which data is recorded using an AER dynamic vision sensor), achieving a testing accuracy of 88.14%.
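As a rough illustration of the classifier stage described above, the following is a minimal sketch of a tempotron-style readout over spike times, assuming the usual double-exponential postsynaptic potential (PSP) kernel; the constants and names (TAU_M, TAU_S, THRESHOLD) are illustrative, not the paper's parameters.

```python
import numpy as np

TAU_M, TAU_S = 20.0, 5.0   # membrane and synaptic time constants (ms)
THRESHOLD = 1.0            # firing threshold of the readout neuron

def psp_kernel(dt):
    """Normalized postsynaptic potential evoked dt ms after an input spike."""
    dt = np.asarray(dt, dtype=float)
    raw = np.where(dt >= 0.0, np.exp(-dt / TAU_M) - np.exp(-dt / TAU_S), 0.0)
    t_pk = TAU_M * TAU_S / (TAU_M - TAU_S) * np.log(TAU_M / TAU_S)
    return raw / (np.exp(-t_pk / TAU_M) - np.exp(-t_pk / TAU_S))  # peak = 1

def tempotron_fires(spike_trains, weights, t_grid):
    """Return True if the weighted sum of PSPs ever crosses THRESHOLD."""
    v = np.zeros_like(t_grid)
    for w, times in zip(weights, spike_trains):
        for t_i in times:
            v += w * psp_kernel(t_grid - t_i)
    return v.max() >= THRESHOLD

# toy usage: two afferents with nearly coincident spikes drive the neuron
t = np.arange(0.0, 100.0, 0.1)
print(tempotron_fires([[10.0], [12.0]], weights=[0.8, 0.7], t_grid=t))
```

In the actual tempotron learning rule, the weights are adapted so that the neuron crosses threshold only for its target class; the sketch covers only the forward readout.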
This paper describes CAVIAR, a massively parallel hardware implementation of a spike-based sensing-processing-learning-actuating system inspired by the physiology of the nervous system. CAVIAR uses the asynchronous address-event representation (AER) communication framework and was developed in the context of a European Union-funded project. It comprises four custom mixed-signal AER chips and five custom digital AER interface components, contains 45k spiking neurons and up to 5M synapses, performs 12G synaptic operations per second, and achieves millisecond object recognition and tracking latencies.
Conventional vision-based robotic systems that must operate quickly require high video frame rates and consequently high computational costs. Visual response latencies are lower-bounded by the frame period, e.g., 20 ms for a 50 Hz frame rate. This paper shows how an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build a fast self-calibrating robotic goalie that offers high update rates and low latency at low CPU load. Independent, asynchronous per-pixel illumination-change events from the DVS signify moving objects and are used in software to track multiple balls. Motor actions to block the most "threatening" ball are based on measured ball positions and velocities. The goalie also sees its single-axis arm and calibrates the motor output map during idle periods so that it can plan open-loop arm movements to desired visual locations. Blocking capability is about 80% for balls shot from 1 m from the goal, even for the fastest shots, and approaches 100% when the ball does not beat the servo motor's ability to move the arm to the necessary position in time. Running over standard USB buses under a standard preemptive multitasking operating system (Windows), the goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball movement to motor command at a peak CPU load of less than 4%. Practical observations and measurements of USB device latency are provided.
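A minimal sketch of the event-driven tracking idea (not the paper's exact implementation), assuming DVS events arrive as (timestamp_us, x, y) tuples; the Cluster class, radius, and smoothing constants are illustrative.

```python
import math

CLUSTER_RADIUS = 10.0   # px: events within this distance update a cluster
MIX = 0.05              # per-event smoothing factor for position updates

class Cluster:
    """One tracked ball: smoothed position plus a velocity estimate."""
    def __init__(self, t, x, y):
        self.t, self.x, self.y = t, x, y
        self.vx = self.vy = 0.0

    def update(self, t, x, y):
        dt = max(t - self.t, 1) * 1e-6            # event spacing in seconds
        nx = (1 - MIX) * self.x + MIX * x         # exponential smoothing
        ny = (1 - MIX) * self.y + MIX * y
        self.vx, self.vy = (nx - self.x) / dt, (ny - self.y) / dt
        self.t, self.x, self.y = t, nx, ny

def most_threatening(events, clusters):
    """Feed (t_us, x, y) events to the nearest cluster, then pick the ball
    predicted to cross the goal line (y = 0 here) soonest."""
    for t, x, y in events:
        near = min(clusters, key=lambda c: math.hypot(c.x - x, c.y - y),
                   default=None)
        if near and math.hypot(near.x - x, near.y - y) < CLUSTER_RADIUS:
            near.update(t, x, y)
        else:
            clusters.append(Cluster(t, x, y))
    approaching = [c for c in clusters if c.vy < 0]
    return min(approaching, key=lambda c: c.y / -c.vy, default=None)
```

Because each event updates only one cluster, the tracker's cost scales with event rate rather than frame rate, which is what keeps CPU load low.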
Event-driven visual sensors have attracted interest from a number of different research communities. They provide visual information in quite a different way from conventional video systems, which consist of sequences of still images rendered at a given "frame rate." Event-driven vision sensors take inspiration from biology: each pixel sends out an event (spike) when it senses something meaningful happening, without any notion of a frame. A special type of event-driven sensor is the so-called dynamic vision sensor (DVS), in which each pixel computes relative changes of light, or "temporal contrast." The sensor output consists of a continuous flow of pixel events that represent the moving objects in the scene. Pixel events become available with microsecond delays with respect to "reality." These events can be processed "as they flow" by a cascade of event (convolution) processors. As a result, input and output event flows are practically coincident in time, and objects can be recognized as soon as the sensor provides enough meaningful events. In this paper, we present a methodology for mapping a properly trained neural network from a conventional frame-driven representation to an event-driven representation. The method is illustrated with event-driven convolutional neural networks (ConvNets) trained to recognize rotating human silhouettes or high-speed poker card symbols. The event-driven ConvNet is fed with recordings obtained from a real DVS camera and is simulated with a dedicated event-driven simulator; it consists of a number of event-driven processing modules, the characteristics of which are obtained from individually manufactured hardware modules.
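The following sketch illustrates one event-driven convolution stage of the kind such a ConvNet is built from, under simplifying assumptions (no state leak, hard reset after firing); the function names and parameters are illustrative, not the paper's.

```python
import numpy as np

def make_conv_stage(shape, kernel, threshold):
    """Return an on_event() handler holding per-pixel membrane state."""
    state = np.zeros(shape)
    kh, kw = kernel.shape[0] // 2, kernel.shape[1] // 2

    def on_event(x, y):
        # add the kernel, clipped at the array borders, around the event
        x0, x1 = max(x - kh, 0), min(x + kh + 1, shape[0])
        y0, y1 = max(y - kw, 0), min(y + kw + 1, shape[1])
        state[x0:x1, y0:y1] += kernel[x0 - x + kh:x1 - x + kh,
                                      y0 - y + kw:y1 - y + kw]
        fired = np.argwhere(state >= threshold)   # pixels crossing threshold
        state[state >= threshold] = 0.0           # hard reset on firing
        return [tuple(p) for p in fired]          # outgoing address events

    return on_event

# feed input events through the stage as they arrive; output events would
# feed the next stage in the cascade
stage = make_conv_stage((64, 64), np.ones((5, 5)) / 25.0, threshold=0.1)
for x, y in [(10, 10), (11, 10), (10, 11)]:
    print(stage(x, y))
```

Each input event triggers only a local kernel update, so output events can be produced while input events are still arriving, which is why input and output flows stay nearly coincident in time.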
This paper presents a modular, scalable approach to assembling hierarchically structured neuromorphic Address Event Representation (AER) systems. The method consists of arranging modules in a 2D mesh, each communicating bidirectionally with all four neighbors. Address events include a module label. Each module includes an AER router which decides how to route address events. Two routing approaches have been proposed, analyzed and tested, using either destination or source module labels. Our analyses reveal that depending on traffic conditions and network topologies either one or the other approach may result in better performance. Experimental results are given after testing the approach using high-end Virtex-6 FPGAs. The approach is proposed for both single and multiple FPGAs, in which case a special bidirectional parallel-serial AER link with flow control is exploited, using the FPGA Rocket-I/O interfaces. Extensive test results are provided exploiting convolution modules of 64 × 64 pixels with kernels with sizes up to 11 × 11, which process real sensory data from a Dynamic Vision Sensor (DVS) retina. One single Virtex-6 FPGA can hold up to 64 of these convolution modules, which is equivalent to a neural network with 262 × 10³ neurons and almost 32 million synapses.
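As a rough sketch of the destination-label variant of the routing described above, the following assumes dimension-order (x-then-y) hops through the 2D mesh; the Port names and coordinate convention are illustrative, not the paper's.

```python
from enum import Enum

class Port(Enum):
    LOCAL = 0   # deliver to this module's own AER processor
    NORTH = 1
    SOUTH = 2
    EAST = 3
    WEST = 4

def route(here, dest):
    """Pick an output port for an event carrying destination module label dest."""
    (hx, hy), (dx, dy) = here, dest
    if (hx, hy) == (dx, dy):
        return Port.LOCAL              # event has reached its target module
    if dx != hx:                       # first resolve the x coordinate...
        return Port.EAST if dx > hx else Port.WEST
    return Port.NORTH if dy > hy else Port.SOUTH   # ...then the y coordinate

# an event labelled for module (3, 1) arriving at module (1, 1) goes east
print(route((1, 1), (3, 1)))   # Port.EAST
```

A source-label router would instead look the label up in a per-module table that maps source modules to output ports, which trades routing-table memory for flexibility under different traffic patterns.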
Dynamic Vision Sensors (DVS) have recently appeared as a new paradigm for vision sensing and processing. They feature unique characteristics such as contrast coding under wide illumination variation, microsecond latency response to fast stimuli, and low output data rates (which greatly improve the efficiency of post-processing stages). They can track extremely fast objects (with time resolution better than that of 100 kframe/s video) without special lighting conditions. Their availability has triggered a new range of vision applications in the fields of surveillance, motion analysis, robotics, and microscopic dynamic observations. One key DVS feature is contrast sensitivity, which has so far been reported to be in the 10-15% range. In this paper, a novel pixel photosensing and transimpedance pre-amplification stage improves contrast sensitivity by one order of magnitude (down to 1.5%), reduces power consumption (down to 4 mW), and reduces the best reported FPN (Fixed Pattern Noise) by a factor of 2 (down to 0.9%), while maintaining the shortest reported latency (3 μs) and good dynamic range (120 dB) and further reducing overall area (down to 30 × 31 μm per pixel). The only penalty is the limitation of intrascene dynamic range to 3 decades. A 128 × 128 DVS test prototype has been fabricated in standard 0.35 μm CMOS, and extensive experimental characterization results are provided.
The biomimetic CMOS dynamic vision and image sensor described in this paper is based on a QVGA (304 × 240) array of fully autonomous pixels containing event-based change detection and pulse-width-modulation (PWM) imaging circuitry. Exposure measurements are initiated and carried out locally by the individual pixel that has detected a change of brightness in its field-of-view. Pixels do not rely on external timing signals and independently and asynchronously request access to an (asynchronously arbitrated) output channel when they have new grayscale values to communicate. Pixels that are not stimulated visually do not produce output. The visual information acquired from the scene, temporal contrast and grayscale data, is communicated in the form of asynchronous address events (AER), with the grayscale values being encoded in inter-event intervals. The pixel-autonomous and massively parallel operation ideally results in lossless video compression through complete temporal redundancy suppression at the pixel level. Compression factors depend on scene activity and peak at ~1000 for static scenes. Due to the time-based encoding of the illumination information, very high dynamic range is achieved: intra-scene DR of 143 dB static and 125 dB at 30 fps equivalent temporal resolution. A novel time-domain correlated double sampling (TCDS) method yields array FPN of <0.25% rms. SNR is >56 dB (9.3 bit) for >10 lx illuminance.
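A minimal sketch of how a grayscale value could be recovered from the inter-event interval of one pixel, assuming brightness inversely proportional to the interval between a pixel's two exposure events; the calibration constant is illustrative and not from the paper.

```python
K_CAL = 10_000.0   # calibration constant: gray ~ K_CAL / interval_us

def gray_from_events(t_first_us, t_second_us):
    """Time-based intensity readout: shorter intervals mean brighter pixels."""
    interval = t_second_us - t_first_us
    return min(255.0, K_CAL / interval) if interval > 0 else 255.0

print(gray_from_events(1_000, 1_040))   # short interval -> bright (250.0)
print(gray_from_events(1_000, 6_000))   # long interval  -> dark   (2.0)
```

Encoding intensity in time rather than voltage is what gives the sensor its very wide dynamic range: dark pixels simply take longer to report, instead of saturating an amplitude scale.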
The computational power of formal models for networks of spiking neurons is compared with that of other neural network models based on McCulloch-Pitts neurons (i.e., threshold gates) or sigmoidal gates. In particular, it is shown that networks of spiking neurons are, with regard to the number of neurons needed, computationally more powerful than these other neural network models. A concrete, biologically relevant function is exhibited that can be computed by a single spiking neuron (for biologically reasonable values of its parameters) but requires hundreds of hidden units in a sigmoidal neural net. On the other hand, it is known that any function that can be computed by a small sigmoidal neural net can also be computed by a small network of spiking neurons. This article does not assume prior knowledge about spiking neurons, and it contains an extensive list of references to the currently available literature on computations in networks of spiking neurons and relevant results from neurobiology.
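As a toy illustration of the kind of timing computation behind this result (not the paper's formal construction), a single integrate-and-fire unit can detect near-coincident input spikes, a function that is expensive to express with units that see only rate- or amplitude-coded inputs; all constants below are illustrative.

```python
import math

TAU, THRESHOLD = 2.0, 1.5   # ms decay constant; two coincident EPSPs needed

def fires(spike_times_ms):
    """True if the sum of exponentially decaying EPSPs ever crosses threshold."""
    v_peak = 0.0
    for t in sorted(spike_times_ms):
        v = sum(math.exp(-(t - s) / TAU) for s in spike_times_ms if s <= t)
        v_peak = max(v_peak, v)
    return v_peak >= THRESHOLD

print(fires([10.0, 10.5]))   # near-coincident spikes -> True
print(fires([10.0, 18.0]))   # well-separated spikes  -> False
```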
The ability of learning networks to generalize can be greatly enhanced by providing constraints from the task domain. This paper demonstrates how such constraints can be integrated into a backpropagation network through the architecture of the network. This approach has been successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service. A single network learns the entire recognition operation, going from the normalized image of the character to the final classification.
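A minimal sketch of the kind of architectural constraint involved: output units share one small, local weight kernel (a convolution), so the layer learns far fewer free parameters than a dense layer covering the same image. Shapes and names are illustrative, not the paper's exact topology.

```python
import numpy as np

def shared_weight_layer(image, kernel):
    """Slide one small weight kernel over the image (valid convolution);
    every output unit reuses the same few weights."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.tanh(out)    # squashing nonlinearity, as in early ConvNets

img = np.random.rand(16, 16)
k = np.random.randn(5, 5) * 0.1
print(shared_weight_layer(img, k).shape)   # (12, 12): 144 units, 25 weights
```

Constraining the network this way builds in translation invariance and locality from the task domain, which is precisely what improves generalization on the digit images.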
This paper describes a CMOS vision sensor inspired by biological visual systems. Each pixel independently and in continuous time quantizes local relative intensity changes to generate spike events. These events appear at the output of the sensor as an asynchronous stream of digital pixel addresses. These address events signify scene reflectance change and have sub-millisecond timing precision. The output data rate depends on the dynamic content of the scene and is typically orders of magnitude lower than those of conventional frame-based imagers. By combining an active front-end logarithmic photoreceptor running in continuous time with a self-timed switched-capacitor differencing circuit, the sensor achieves an array mismatch of 2.1% in relative intensity event threshold and a pixel bandwidth of 3 kHz under 1 klux scene illumination. Dynamic range is >120 dB and chip power consumption is 23 mW. Event latency shows weak light dependency and decreases to 15 μs at >1 klux pixel illumination. The sensor is built in a 0.35 μm 4M2P process yielding 40 × 40 μm² pixels with 9.4% fill factor. By providing high pixel bandwidth, wide dynamic range, and precisely timed sparse digital output, this neuromorphic silicon retina offers an attractive combination of characteristics for low-latency dynamic vision under uncontrolled illumination with low post-processing requirements.
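A minimal behavioral model of the pixel operation described above (the circuit itself is analog): an event is emitted whenever the log intensity moves a fixed contrast threshold away from the value memorized at the previous event. The threshold value below is illustrative, not the characterized device parameter.

```python
import math

THETA = 0.15   # illustrative temporal-contrast threshold (~15% change)

def pixel_events(samples):
    """samples: (time_us, intensity) pairs for one pixel;
    yields (time_us, 'ON'/'OFF') address events."""
    mem = math.log(samples[0][1])        # log intensity at the last event
    for t, intensity in samples[1:]:
        logi = math.log(intensity)
        while logi - mem >= THETA:       # brightening: ON events
            mem += THETA
            yield (t, 'ON')
        while mem - logi >= THETA:       # dimming: OFF events
            mem -= THETA
            yield (t, 'OFF')

# a brightening ramp produces a handful of ON events; a static scene, none
ramp = [(t, 100.0 * (1.0 + 0.001 * t)) for t in range(0, 1000, 10)]
print(list(pixel_events(ramp)))
```

Working in log intensity is what makes the pixel respond to relative (reflectance) change rather than absolute illumination, giving the wide dynamic range reported above.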