Figure - available from: Biological Cybernetics
Second-stage outputs and weight matrices for a two-bar visual stimulus as the manner of generating temporal fluctuations was varied, with all other stimulus parameters held constant. The left column of panels shows all ten network outputs as they developed over time; refer to the upper left corner of Fig. 8 for a legend identifying each trace. The right column of panels shows the thresholded weight matrices at 15 s, the time at which training was concluded for each experiment. The top row (a and b) shows data from the reference stimulus, which used sinusoidal shadowing. In the second row (c and d), no shadowing was used; instead, the distance of the bars from the simulated camera (and thus, by perspective projection, their size in the image) was varied over time. In the bottom row (e and f), a pattern of multiplicative random shadow was used; in this case, network outputs are shown for 15 s due to the increased complexity of the stimulus. However, in all three cases, the final weight matrix develops a very similar representation of the visual stimulus.
a Network outputs with sinusoidal shadow
b Sine shadow weight matrix
c Network outputs with distance variation
d Distance variation weight matrix
e Network outputs with random shadow
f Random shadow weight matrix
Source publication
Visual binding is the process of associating the responses of visual interneurons in different visual submodalities, all of which are responding to the same object in the visual field. Recently identified neuropils in the insect brain, termed optic glomeruli, reside just downstream of the optic lobes and have an internal organization that could suppor...
Citations
We have developed a neural network model capable of performing visual binding, inspired by neuronal circuitry in the optic glomeruli of flies, a brain area that lies just downstream of the optic lobes where early visual processing is performed. This visual binding model detects objects in dynamic image sequences and binds together their characteristic visual features (such as color, motion, and orientation) by taking advantage of their common temporal fluctuations. Visual binding is represented in the form of an inhibitory weight matrix, which learns over time which features originate from a given visual object. In the present work, we show that information represented implicitly in this weight matrix can be used to explicitly count the number of objects present in the visual image, to enumerate their specific visual characteristics, and even to create an enhanced image in which one particular object is emphasized over others, thus implementing a simple form of visual attention. Further, we present a detailed analysis that reveals the function and theoretical limitations of the visual binding network, and in this context we describe a novel network learning rule optimized for visual binding.
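The core idea above, that features sharing common temporal fluctuations can be bound together and that the resulting binding matrix can be read out to count objects, can be sketched in a few lines. The following is a toy NumPy reconstruction, not the authors' implementation: the correlation-based binding matrix, the 0.5 threshold, the channel grouping, and the connected-components readout are all illustrative assumptions standing in for the learned inhibitory weight matrix and its analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical objects, each driving three feature channels
# (e.g. color, motion, orientation detectors) with a shared random
# temporal fluctuation -- the cue the binding model exploits.
T = 500
obj_signals = rng.standard_normal((2, T))
membership = [0, 0, 0, 1, 1, 1]  # channels 0-2 -> object A, 3-5 -> object B
responses = np.stack([obj_signals[m] + 0.2 * rng.standard_normal(T)
                      for m in membership])

# Stand-in for the learned, thresholded weight matrix: channels whose
# temporal fluctuations correlate strongly are treated as bound.
corr = np.corrcoef(responses)
bound = corr > 0.5  # hypothetical binding threshold

def count_objects(adj):
    """Count objects as connected components of the binding graph."""
    n = adj.shape[0]
    seen, count = set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(j for j in range(n) if adj[i, j])
    return count

print(count_objects(bound))  # expect 2 for this synthetic stimulus
```

With same-object noise this small, within-object correlations sit near 1 and cross-object correlations near 0, so the thresholded matrix cleanly splits into two components, mirroring how the paper's weight matrix implicitly encodes the number of objects.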