Thesis

Post-Cochlear Auditory Modelling for Sound Localisation using Bio-Inspired Techniques

Abstract

This thesis presents spiking neural architectures which simulate the sound localisation capability of the mammalian auditory pathways. This localisation ability is achieved by exploiting important differences in the sound stimulus received by each ear, known as binaural cues. Interaural time difference and interaural intensity difference are the two binaural cues which play the most significant role in mammalian sound localisation. These cues are processed by different regions within the auditory pathways and enable the localisation of sounds at different frequency ranges; interaural time difference is used to localise low frequency sounds whereas interaural intensity difference localises high frequency sounds. Interaural time difference refers to the different points in time at which a sound from a single location arrives at each ear and interaural intensity difference refers to the difference in sound pressure levels of the sound at each ear, measured in decibels. Taking inspiration from the mammalian brain, two spiking neural network topologies were designed to extract each of these cues. The architecture of the spiking neural network designed to process the interaural time difference cue was inspired by the medial superior olive. The lateral superior olive was the inspiration for the architecture designed to process the interaural intensity difference cue. The development of these spiking neural network architectures required the integration of other biological models, such as an auditory periphery (cochlea) model, models of bushy cells and the medial nucleus of the trapezoid body, leaky integrate and fire spiking neurons, facilitating synapses, receptive fields and the appropriate use of excitatory and inhibitory neurons. Two biologically inspired learning algorithms were used to train the architectures to perform sound localisation. 
Experimentally derived head-related transfer function (HRTF) acoustical data from adult domestic cats were employed to validate the localisation ability of the two architectures. The localisation abilities of the two models are comparable to those of other computational techniques employed in the literature. The experimental results demonstrate that the two spiking neural network (SNN) models behave in a similar way to the mammalian auditory system, i.e. the spiking neural network for interaural time difference extraction performs best when localising low frequency data, and the spiking neural network for interaural intensity difference extraction performs best when localising high frequency data. Thus, the combined models form a duplex system of sound localisation. Additionally, both spiking neural network architectures show a high degree of robustness when the HRTF acoustical data are corrupted by noise.
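The two binaural cues defined in the abstract can be illustrated with a conventional signal-processing sketch. This is not the spiking neural network approach of the thesis: it simply estimates the interaural time difference as the lag that maximises the cross-correlation of the two ear signals, and the interaural intensity difference as the ratio of RMS levels in decibels. The function names, the sign conventions and the `max_lag` parameter are assumptions made for illustration.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def estimate_iid_db(left, right):
    """Interaural intensity difference in dB (positive when the left ear is louder)."""
    return 20.0 * math.log10(rms(left) / rms(right))


def estimate_itd_seconds(left, right, sample_rate, max_lag):
    """Interaural time difference in seconds.

    Returns the lag (positive when `right` trails `left`) that maximises
    the cross-correlation, searched over -max_lag..+max_lag samples.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate
```

In practice `max_lag` would be bounded by the largest physically possible delay between the ears, and the estimate is only unambiguous for signals whose period exceeds that delay, which is consistent with the abstract's point that the time cue serves low frequencies.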
... A number of theoretical and numerical models of ITD and ILD function shapes can be found. For ITD three formulas for low and high frequencies can be found in literature [8,9,10,11]: ...
Article
Poznan Supercomputing and Networking Center (PSNC) developed an ambisonic installation and workflow as part of audio-visual 8K VR 360° immersive media experiments. This work aimed to investigate the performance of the PSNC setup through both subjective tests and simulations that provide objective parameters of interaural characteristics in the real-life scenario of the PSNC studio. For the objective part, an algorithm for angle estimation was proposed and computations were performed.
Article
Population coding is widely regarded as a key mechanism for achieving reliable behavioral responses in the face of neuronal variability. But in standard reinforcement learning a flip-side becomes apparent. Learning slows down with increasing population size since the global reinforcement becomes less and less related to the performance of any single neuron. We show that, in contrast, learning speeds up with increasing population size if feedback about the population response modulates synaptic plasticity in addition to global reinforcement. The two feedback signals (reinforcement and population-response signal) can be encoded by ambient neurotransmitter concentrations which vary slowly, yielding a fully online plasticity rule where the learning of a stimulus is interleaved with the processing of the subsequent one. The assumption of a single additional feedback mechanism therefore reconciles biological plausibility with efficient learning.
Article
Studies of cortical neurons in monkeys performing short-term memory tasks have shown that information about a stimulus can be maintained by persistent neuron firing for periods of many seconds after removal of the stimulus. The mechanism by which this sustained activity is initiated and maintained is unknown. In this article we present a spiking neural network model of short-term memory and use it to investigate the hypothesis that recurrent, or “re-entrant,” networks with constant connection strengths are sufficient to store graded information temporarily. The synaptic weights that enable the network to mimic the input-output characteristics of an active memory module are computed using an optimization procedure for recurrent networks with non-spiking neurons. This network is then transformed into one with spiking neurons by interpreting the continuous output values of the nonspiking model neurons as spiking probabilities. The behavior of the model neurons in this spiking network is compared with that of 179 single units previously recorded in monkey inferotemporal (IT) cortex during the performance of a short-term memory task. The spiking patterns of almost every model neuron are found to resemble closely those of IT neurons. About 40% of the IT neuron firing patterns are also found to be of the same types as those of model neurons. A property of the spiking model is that the neurons cannot maintain precise graded activity levels indefinitely, but eventually relax to one of a few constant activities called fixed-point attractors. The noise introduced into the model by the randomness of spiking causes the network to jump between these attractors. This switching between attractor states generates spike trains with a characteristic statistical temporal structure. We found evidence for the same kind of structure in the spike trains from about half of the IT neurons in our test set. 
These results show that the behavior of many real cortical memory neurons is consistent with an active storage mechanism based on recurrent activity in networks with fixed synaptic strengths.
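The step in which continuous outputs of non-spiking units are reinterpreted as spiking probabilities can be sketched as independent Bernoulli sampling per time step. This is only one plausible reading of the abstract; the function names, the clamping of activations to [0, 1] and the fixed seed are assumptions for illustration, not details from the article.

```python
import random


def sample_spikes(activations, n_steps, seed=0):
    """Turn continuous unit outputs into binary spike trains.

    Each activation is treated as the probability of emitting a spike
    in any one time step (values are clamped to the range [0, 1]).
    """
    rng = random.Random(seed)
    trains = []
    for p in activations:
        p = min(1.0, max(0.0, p))
        trains.append([1 if rng.random() < p else 0 for _ in range(n_steps)])
    return trains


def firing_rate(train):
    """Fraction of time steps in which the unit spiked."""
    return sum(train) / len(train)
```

Because each step is sampled at random, the empirical firing rate only approximates the underlying activation, which is the kind of spiking noise the abstract credits with moving the network between attractor states.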
Article
This report describes the binaural basis of the auditory space map in the optic tectum of the barn owl (Tyto alba). Single units were recorded extracellularly in ketamine-anesthetized birds. Unit tuning for interaural differences in timing and intensity of wideband noise was measured using digitally synthesized sound presented through earphones. Spatial receptive fields of the same units were measured with a free field sound source. Auditory units in the optic tectum are sharply tuned for both the azimuth and the elevation of a free field sound source. To determine the binaural cues that could be responsible for this spatial tuning, we measured in the ear canals the amplitude and phase spectra produced by a free field noise source and calculated from these measurements the interaural differences in time and intensity associated with each of 178 locations throughout the frontal hemisphere. For all frequencies, interaural time differences (ITDs) varied systematically and most strongly with source azimuth. The pattern of variation of interaural intensity differences (IIDs) depended on frequency. For low frequencies (below 4 kHz) IID varied primarily with source azimuth, whereas for high frequencies (above 5 kHz) IID varied primarily with source elevation. Tectal units were tuned for interaural differences in both time and intensity of dichotic stimuli. Changing either parameter away from the best value for the unit decreased the unit's response. The tuning of units to either parameter was sharp: the width of ITD tuning curves, measured at 50% of the maximum response with IID held constant (50% tuning width), ranged from 18 to 82 microseconds. The 50% tuning widths of IID tuning curves, measured with ITD held constant, ranged from 8 to 37 dB. For most units, tuning for ITD was largely independent of IID, and vice versa. 
A few units exhibited systematic shifts of the best ITD with changes in IID (or shifts of the best IID with changes in ITD); for these units, a change in the value of one parameter to favor one ear shifted the best value of the other parameter in favor of the same ear, i.e., in the direction opposite to that expected from "time-intensity trading." Overall sound intensity had little or no effect on ITD tuning, but did increase the best IIDs of units tuned to nonzero IIDs. The tuning of units for ITD and IID changed systematically along different dimensions of the optic tectum to create coextensive, independent neurophysiological maps of ITD and IID. (ABSTRACT TRUNCATED AT 400 WORDS)
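The 50% tuning width used in this abstract (the width of a tuning curve at half of its maximum response) can be computed from a sampled curve as follows. This is a minimal sketch that assumes a single-peaked curve sampled at increasing values of the tuning parameter and uses linear interpolation between samples; the function name and the example values are hypothetical and are not data from the report.

```python
def half_max_width(xs, responses):
    """Width of a single-peaked tuning curve at 50% of its peak response.

    `xs` holds increasing parameter values (e.g. ITD in microseconds) and
    `responses` the matching response magnitudes. The half-maximum
    crossings on each side of the peak are found by linear interpolation.
    """
    peak = max(responses)
    half = 0.5 * peak
    k = responses.index(peak)

    # Walk left from the peak to the first crossing below half maximum.
    left = xs[0]
    for i in range(k, 0, -1):
        if responses[i - 1] < half <= responses[i]:
            t = (half - responses[i - 1]) / (responses[i] - responses[i - 1])
            left = xs[i - 1] + t * (xs[i] - xs[i - 1])
            break

    # Walk right from the peak to the first crossing below half maximum.
    right = xs[-1]
    for i in range(k, len(xs) - 1):
        if responses[i + 1] < half <= responses[i]:
            t = (responses[i] - half) / (responses[i] - responses[i + 1])
            right = xs[i] + t * (xs[i + 1] - xs[i])
            break

    return right - left


# Hypothetical ITD tuning curve: ITD in microseconds, response in spikes.
itd_us = [-60, -40, -20, 0, 20, 40, 60]
spikes = [0, 2, 8, 10, 8, 2, 0]
width_us = half_max_width(itd_us, spikes)
```

If the curve never falls below half maximum on one side, the sketch falls back to the edge of the sampled range, so the returned width is then a lower bound.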
Book
How can we engineer systems capable of “cocktail party” listening? Human listeners are able to perceptually segregate one sound source from an acoustic mixture, such as a single voice from a mixture of other voices and music at a busy cocktail party. How can we engineer “machine listening” systems that achieve this perceptual feat? Albert Bregman's book Auditory Scene Analysis, published in 1990, drew an analogy between the perception of auditory scenes and visual scenes, and described a coherent framework for understanding the perceptual organization of sound. His account has stimulated much interest in computational studies of hearing. Such studies are motivated in part by the demand for practical sound separation systems, which have many applications including noise-robust automatic speech recognition, hearing prostheses, and automatic music transcription. This emerging field has become known as computational auditory scene analysis (CASA). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications provides a comprehensive and coherent account of the state of the art in CASA, in terms of the underlying principles, the algorithms and system architectures that are employed, and the potential applications of this exciting new technology. With a Foreword by Bregman, its chapters are written by leading researchers and cover a wide range of topics including: estimation of multiple fundamental frequencies; feature-based and model-based approaches to CASA; sound separation based on spatial location; processing for reverberant environments; segregation of speech and musical signals; automatic speech recognition in noisy environments; and neural and perceptual modeling of auditory organization. The text is written at a level that will be accessible to graduate students and researchers from related science and engineering disciplines. The extensive bibliography accompanying each chapter will also make this book a valuable reference source. 
A web site accompanying the text, http://www.casabook.org, features software tools and sound demonstrations. © 2006 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Article
1. We studied the sensitivity of cells in the medial superior olive (MSO) of the anesthetized cat to variations in interaural phase differences (IPDs) of low-frequency tones and in interaural time differences (ITDs) of tones and broad-band noise signals. Our sample consisted of 39 cells histologically localized to the MSO. 2. All but one of the cells had characteristic frequencies less than 3 kHz, and 79% were sensitive to ITDs and IPDs. More than one-half (56%) of the cells responded to monaural stimulation of either ear, and both the binaural and monaural responses were highly phase locked. All of the cells that were sensitive to IPDs and monaurally driven by either ear responded in accord with that predicted by the coincidence model of Jeffress, as judged by comparisons of the phases at which the monaural and binaural responses occurred. The optimal IPDs were tightly clustered between 0.0 and 0.2 cycles. Most cells exhibited facilitation of the response at favorable ITDs and inhibition at unfavorable ITDs compared with the monaural responses. 3. Cells in the MSO exhibited characteristic delay, as judged by a linear relationship between the mean interaural phase and stimulating frequency. Characteristic phases were clustered near 0, indicating that most cells responded maximally when the two input tones were in phase. With the use of the binaural beat stimulus we found no differential selectivity for either the direction or speed of interaural phase changes. 4. The cells were also sensitive to ITDs of broad-band noise signals. The ITD curve in response to broad-band noise was similar to that predicted by the composite curve, which was calculated by linearly summating the tonal responses over the frequencies in the response area of the cell. Most (93%) of the peaks of the composite curves were between 0 and +400 microseconds, corresponding to locations in the contralateral sound field. 
Moreover, computer cross correlations of the monaural spike trains were similar to the ITD curve generated binaurally for both correlated and uncorrelated noise signals to the two ears. Thus our data suggest that the cells in the MSO behave much like cross-correlators. 5. By combining data from different animals and locating each cell on a standard MSO, we found evidence for a spatial map of ITDs across the anterior-posterior (A-P) axis of the MSO. (ABSTRACT TRUNCATED AT 400 WORDS)
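The characteristic delay analysis mentioned in point 3 rests on a straight-line fit of mean interaural phase against stimulating frequency. Under the usual convention (assumed here, since the abstract does not spell it out), the slope of that line is the characteristic delay and the intercept is the characteristic phase. The sketch below is an ordinary least-squares fit; the function name is hypothetical, and the phases are assumed to be already unwrapped and expressed in cycles.

```python
def fit_characteristic_delay(freqs_hz, phases_cycles):
    """Least-squares line through (frequency, mean interaural phase) points.

    Returns (characteristic_delay_seconds, characteristic_phase_cycles),
    i.e. the slope and intercept of phase = CP + CD * frequency.
    Phases must be unwrapped and given in cycles.
    """
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(phases_cycles) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(freqs_hz, phases_cycles))
    delay = sxy / sxx
    phase = mean_p - delay * mean_f
    return delay, phase
```

A cell whose fitted intercept is near 0 cycles responds best when the two inputs are in phase after compensating for the delay, which corresponds to the clustering of characteristic phases near 0 that the abstract reports.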
Chapter
In this paper, we describe a novel sound source localization sensor that mimics the excellent auditory mechanisms of barn owls. It has two microphones separated by a distance along the azimuth axis and having different sensitivities along the elevation axis. Orientation information about a sound source, which is encoded in the interaural intensity difference (IID) and interaural time difference (ITD), is decoded by an optimum neuromorphic signal processing algorithm. We theoretically derive the optimal mechanisms (outer ear shape) and algorithms, which closely resemble the auditory system of the barn owl. We present a fabrication result and several experiments evaluating the sensor's spatial and temporal resolution.