
Simple spectral transformations capture auditory input to cortex


Abstract

This is the conference poster for our paper "Simple transformations capture auditory input to cortex". We explored both biologically-detailed and very simple models of the auditory periphery to find the appropriate input to encoding models of auditory cortical responses to natural sounds. For three different stimulus sets, we tested the capacity of a wide range of models, from the mechanistic to the phenomenological, to predict the time course of single-unit neural responses recorded in the ferret primary auditory cortex. We found that the complex properties of the auditory periphery and brainstem may together result in a simpler than expected functional transformation providing input to the auditory cortex. Please see the poster and the associated paper published in PNAS for more details.
Simple spectral transformations capture auditory input to cortex
Monzilur Rahman, Ben D. B. Willmore, Andrew J. King, Nicol S. Harper
Department of Physiology, Anatomy and Genetics, University of Oxford, Sherrington Building,
Sherrington Road, Oxford OX1 3PT, UK
Presenting author’s email: monzilur.rahman@dpag.ox.ac.uk
M Rahman, BDB Willmore, AJ King, NS Harper (2020) Simple transformations capture auditory input to cortex, PNAS. https://doi.org/10.1073/pnas.1922033117
Motivation
Sensory systems, from the sense organs up through the neural pathway, are typically very complex,
comprising many different structures and cell types that often interact in a non-linear fashion. The complexity
of these dynamic systems can make understanding their computations challenging. However, much of this
physiological complexity may reflect biological constraints or come into play only under unusual conditions.
Consequently, it could be that the signal transformations that they commonly compute are substantially
simpler than their physical implementations. Taking the auditory system as an example, we aimed to
empirically determine the computational transformation of auditory signals through the ear to the cortex. To
understand this transformation, we appended various models of the auditory periphery to neural encoding
models to predict auditory cortical responses to diverse sounds.
The models that best explain particular physiological characteristics of the auditory periphery may differ from the ones that best explain the impact of auditory nerve activity on cortical responses to natural sounds. This
is because neuronal responses are transformed through the central auditory pathway to the cortex, and the
periphery may operate differently with natural sounds.
Generating cochleagrams using various cochlear models.
A. A sound waveform, the input to a cochlear model.
B. The stages of transformation of sound through each of the cochlear models (from left to right). Biologically-detailed: Wang Shamma Ru (WSR) model, Lyon model, Bruce Erfani Zilany (BEZ) model, Meddis Sumner Steadman (MSS) model. Spectrogram-based: spec-log model, spec-log1plus model, spec-power model and spec-Hill model. OME, outer and middle ear; OHC, outer hair cell; IHC, inner hair cell; BM, basilar membrane; DRNL, dual resonance non-linear filter; lin, linear; nonlin, nonlinear; AN, auditory nerve; LSR, low spontaneous rate; MSR, medium spontaneous rate; HSR, high spontaneous rate.
C. The output of the cochlear models, the cochleagram. The example shown here is a 3 s excerpt of the sound of a wolf howling by a waterfall.
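
To make the spectrogram-based family just described concrete, below is a minimal sketch of how such cochleagrams can be generated: a short-time power spectrogram pooled into log-spaced frequency channels, followed by one of the compressive non-linearities. The window length, channel range, and constants here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.signal import spectrogram

def spec_cochleagrams(waveform, fs, n_channels=16):
    """Sketch of the spectrogram-based cochlear models: a power
    spectrogram pooled into log-spaced channels, then compressed.
    Window length, channel range, and constants are assumptions."""
    f, t, sxx = spectrogram(waveform, fs=fs, nperseg=512, noverlap=256)
    # Pool FFT bins into log-spaced channels, a crude stand-in for
    # the cochlear frequency axis (range assumed, not from the paper).
    edges = np.logspace(np.log10(500.0), np.log10(fs / 2), n_channels + 1)
    power = np.stack([sxx[(f >= lo) & (f < hi)].sum(axis=0)
                      for lo, hi in zip(edges[:-1], edges[1:])])
    k = np.median(power) + 1e-12  # placeholder gain / half-max constant
    return {
        "spec-log": np.log(np.maximum(power, 1e-10)),  # floored log
        "spec-log1plus": np.log1p(power / k),          # linear at low level, log at high
        "spec-power": power ** (1.0 / 3.0),            # power-law compression
        "spec-Hill": power / (power + k),              # saturating Hill function
    }
```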
Cochleagrams produced by each cochlear model for identical inputs.
A. Each column is a different stimulus (from left to right): a click, a 1 kHz pure tone, a 10 kHz pure tone, white noise, a natural sound, a 100 ms clip of human speech, and a 5 s clip of the same natural sound.
B. Each row is a different cochlear model.
Predicting the neural responses to natural sounds and estimating spectro-temporal receptive fields.
A. The encoding scheme: pre-processing by cochlear models to produce a cochleagram (in this case, with 16 frequency channels), followed by the linear (L)-nonlinear (N) encoding model. The parameters of the linear stage (the weight matrix) are commonly referred to as the spectro-temporal receptive field (STRF) of the neuron. Note how the choice of cochlear model influences estimation of the parameters of both the L and N stages of the encoding scheme and, in turn, prediction of neural responses by the model.
B. The STRF of an example neuron from natural sound dataset 1, estimated using different cochlear models. Each row is a different cochlear model and each column a different number of frequency channels.
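
As a rough illustration of this LN scheme, the sketch below fits the linear stage by ridge regression on a time-lagged cochleagram and applies a logistic output non-linearity. The fitting method, names, and constants are simplifying assumptions; the paper's actual estimation procedure differs in detail.

```python
import numpy as np

def make_design_matrix(cochleagram, n_history=20):
    """Stack n_history time lags of each frequency channel so each row
    holds the recent spectrotemporal context for one time bin."""
    n_freq, n_time = cochleagram.shape
    X = np.zeros((n_time, n_freq * n_history))
    for lag in range(n_history):
        X[lag:, lag * n_freq:(lag + 1) * n_freq] = cochleagram[:, :n_time - lag].T
    return X

def fit_ln_model(cochleagram, response, n_history=20, ridge=1.0):
    """Linear (L) stage: ridge regression of the response onto the
    lagged cochleagram; the weights, reshaped, are the STRF."""
    X = make_design_matrix(cochleagram, n_history)
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ response)
    strf = w.reshape(n_history, -1).T   # frequency x time-lag
    z = X @ w                           # linear drive
    # Nonlinear (N) stage: a logistic output non-linearity; its
    # parameters are fitted in the paper, placeholders here.
    rate = 1.0 / (1.0 + np.exp(-(z - z.mean()) / (z.std() + 1e-8)))
    return strf, rate
```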
Performance of different cochlear models in predicting neural responses for natural sound dataset 1.
A-H. Each gray dot represents the CCnorm between a neuron's recorded response and the model's prediction; the larger black dot represents the mean value across neurons, and the error bars are the standard error of the mean.
I. Comparison of all models. Color coding of the lines matches the other panels.
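
CCnorm is the correlation between prediction and response, normalized by the best correlation attainable given trial-to-trial noise. A minimal sketch following the signal-power formulation of Schoppe et al. (2016), which this line of work uses; the response is assumed to be an (n_trials x n_time) array:

```python
import numpy as np

def cc_norm(prediction, responses):
    """Normalized correlation coefficient (CCnorm) between a model
    prediction and single-trial responses (n_trials x n_time),
    after the signal-power formulation of Schoppe et al. (2016)."""
    n = responses.shape[0]
    mean_resp = responses.mean(axis=0)
    # Signal power: variance of the summed response minus the summed
    # per-trial variance, i.e. the stimulus-locked part of the response
    # (can be near zero for unreliable neurons; real analyses guard this).
    sp = (np.var(responses.sum(axis=0)) - responses.var(axis=1).sum()) \
         / (n * (n - 1))
    cov = np.mean((mean_resp - mean_resp.mean())
                  * (prediction - prediction.mean()))
    # CCnorm = Cov(mean response, prediction) / sqrt(Var(prediction) * SP)
    return cov / np.sqrt(np.var(prediction) * sp)
```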
Multi-fiber and multi-threshold cochlear models.
A. Cochleagram of a natural sound clip produced by the MSS model (left) and the spec-Hill model (right).
B. Cochleagram of the same natural sound clip produced by the multi-fiber MSS model (left) and the multi-threshold spec-Hill model (right).
C. Mean CCnorm for predicting the responses of all 73 cortical neurons in natural sound dataset 1, for the multi-fiber/threshold models and their single-fiber/threshold equivalents.
D. STRFs of an example neuron from natural sound dataset 1, estimated using the multi-fiber and multi-threshold models.
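
The multi-threshold idea can be sketched as several copies of the Hill compression with different half-maximal sound levels, loosely mimicking high-, medium-, and low-spontaneous-rate fiber types; the threshold values below are placeholders, not the paper's fitted ones.

```python
import numpy as np

def multi_threshold_hill(power_cochleagram, half_max=(1e-4, 1e-2, 1.0)):
    """Stack Hill-compressed copies of a power cochleagram at several
    thresholds (placeholders), loosely mimicking HSR, MSR, and LSR
    auditory nerve fibers. Returns (n_thresholds, n_freq, n_time);
    the channel sets then feed the encoding model together."""
    p = np.asarray(power_cochleagram, dtype=float)
    return np.stack([p / (p + k) for k in half_max])
```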
Performance of different cochlear models across datasets and encoding models.
A,B. Mean CCnorm between the LN encoding model prediction and data for neurons in natural sound dataset 2 (awake ferrets), for single-fiber models (A) and multi-fiber models (B).
C,D. Mean CCnorm between the LN encoding model prediction and data for neurons in the DRC dataset (anesthetized ferrets), for single-fiber models (C) and multi-fiber models (D).
E,F. Mean CCnorm between the NRF model prediction and data for neurons in natural sound dataset 1 (anesthetized ferrets), for single-fiber models (E) and multi-fiber models (F).
G,H. Mean CCnorm between the NRF model prediction and data for neurons in natural sound dataset 2, for single-fiber models (G) and multi-fiber models (H).
I,J. Mean CCnorm between the NRF model prediction and data for neurons in the DRC dataset, for single-fiber models (I) and multi-fiber models (J).
Main findings of the work
We considered a range of existing biologically-detailed models of the auditory periphery and adapted them to provide input to a number of encoding models of cortical responses. We also constructed a variety of simple spectrogram-based models, including a novel one accounting for the different types of auditory nerve fiber. Surprisingly, we found that the responses of neurons in the primary auditory cortex (A1) of ferrets can be explained equally well using the simple spectrogram-based cochlear models (spec-log, spec-power, spec-Hill) as when more complex biologically-detailed cochlear models are used. Furthermore, these simple models explain the cortical responses more consistently across different sound types and anesthetic states. Hence, much of the complexity present in auditory peripheral processing may not substantially impact cortical responses. This suggests that the intricate complexity of the cochlea and the central auditory pathway together results in a simpler than expected transformation of auditory inputs from ear to cortex.
Interaction between compressive cochlear non-linearities and LN-model output non-linearities. Average CCnorm of spectrogram-based models (spec-lin is the spectrogram-based model without any compressive cochlear non-linearity) for an LN encoding model with (LN) and without (L) the output non-linearity. CCnorm on A. natural sound dataset 1, B. natural sound dataset 2, and C. the DRC dataset.
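
For reference, the output non-linearity referred to here is typically a four-parameter sigmoid applied to the linear stage output z(t) (a common choice in this literature; the exact parameterization used in the paper may differ):

```latex
r(t) = a + \frac{b}{1 + \exp\left(-\frac{z(t) - c}{d}\right)}
```

where a sets the baseline firing rate, b the response range, c the threshold, and d the slope of the non-linearity.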
References
T. Chi, P. Ru, S. A. Shamma, Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).
R. F. Lyon, Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function. J. Acoust. Soc. Am. 130, 3893–3904 (2011).
I. C. Bruce, Y. Erfani, M. S. A. Zilany, A phenomenological model of the synapse between the inner hair cell and auditory nerve: Implications of limited neurotransmitter release sites. Hear. Res. 360, 40–54 (2018).
M. A. Steadman, C. J. Sumner, Changes in neuronal representations of consonants in the ascending auditory system and their role in speech recognition. Front. Neurosci. 12, 671 (2018).