Simple spectral transformations capture auditory
input to cortex
Monzilur Rahman, Ben D. B. Willmore, Andrew J. King, Nicol S. Harper
Department of Physiology, Anatomy and Genetics, University of Oxford, Sherrington Building,
Sherrington Road, Oxford OX1 3PT, UK
Presenting author’s email: monzilur.rahman@dpag.ox.ac.uk
M Rahman, BDB Willmore, AJ King, NS Harper (2020) Simple transformations capture
auditory input to cortex, PNAS. https://doi.org/10.1073/pnas.1922033117
Motivation
Sensory systems, from the sense organs up through the neural pathway, are typically very complex,
comprising many different structures and cell types that often interact in a non-linear fashion. The complexity
of these dynamic systems can make understanding their computations challenging. However, much of this
physiological complexity may reflect biological constraints or come into play only under unusual conditions.
Consequently, it could be that the signal transformations that they commonly compute are substantially
simpler than their physical implementations. Taking the auditory system as an example, we aimed to
empirically determine the computational transformation of auditory signals through the ear to the cortex. To
understand this transformation, we appended various models of the auditory periphery to neural encoding
models to predict auditory cortical responses to diverse sounds.
The models that best explain particular physiological characteristics of the auditory periphery may differ from those that best explain the impact of auditory nerve activity on cortical responses to natural sounds. This is because neuronal responses are transformed as they ascend the central auditory pathway to the cortex, and because the periphery may operate differently for natural sounds.
Generating cochleagrams using various cochlear models.
A. A sound waveform, the input to a cochlear model.
B. The stages of transformation of sound through each of the cochlear models (from left to right). Biologically-detailed: Wang Shamma Ru (WSR) model, Lyon model, Bruce Erfani Zilany (BEZ) model, Meddis Sumner Steadman (MSS) model. Spectrogram-based: spec-log model, spec-log1plus model, spec-power model and spec-Hill model. OME, outer and middle ear; OHC, outer hair cell; IHC, inner hair cell; BM, basilar membrane; DRNL, dual resonance non-linear filter; lin, linear; nonlin, nonlinear; AN, auditory nerve; LSR, low spontaneous rate; MSR, medium spontaneous rate; HSR, high spontaneous rate.
C. The output of the cochlear models, the cochleagram. The example shown here is a 3 s excerpt of a sound of a wolf howling by a waterfall.
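As a concrete illustration of the spectrogram-based models, the Python sketch below pools a short-time power spectrogram into log-spaced frequency channels and applies each of the four compressive non-linearities. The rectangular pooling scheme and all constants (EPS, GAIN, POW, HILL_N, HILL_K) are illustrative assumptions, not the filterbank or fitted values used in the paper.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative constants; the paper fits/chooses its own values.
EPS = 1e-8      # floor to keep the log finite
GAIN = 1e4      # input scaling for log1plus
POW = 0.3       # compressive exponent for spec-power
HILL_N = 2.0    # Hill exponent
HILL_K = 1e-3   # Hill half-maximum point

def spec_channels(wave, fs, n_channels=16):
    """Short-time power spectrogram pooled into log-spaced frequency
    channels (a crude stand-in for the paper's filterbank)."""
    f, t, S = spectrogram(wave, fs, nperseg=int(0.02 * fs))  # ~20 ms frames
    edges = np.logspace(np.log10(500.0), np.log10(fs / 2), n_channels + 1)
    return np.stack([S[(f >= lo) & (f < hi)].sum(axis=0)
                     for lo, hi in zip(edges[:-1], edges[1:])])

def compress(X, kind):
    """The four compressive non-linearities of the spectrogram-based models."""
    if kind == "log":       # spec-log: logarithm with a small floor
        return np.log(X + EPS)
    if kind == "log1plus":  # spec-log1plus: log(1 + gain * x), finite at zero
        return np.log1p(GAIN * X)
    if kind == "power":     # spec-power: power-law compression
        return X ** POW
    if kind == "hill":      # spec-Hill: saturating Hill function
        return X**HILL_N / (X**HILL_N + HILL_K**HILL_N)
    raise ValueError(kind)
```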
Cochleagrams produced by each cochlear model for identical inputs.
A. Each column is a different stimulus (from left to right): a click, a 1 kHz pure tone, a 10 kHz pure tone, white noise, a natural sound (a 100 ms clip of human speech), and a 5 s clip of the same natural sound.
B. Each row is a different cochlear model.
Predicting the neural responses to natural sounds and estimating spectro-temporal receptive fields.
A. The encoding scheme: pre-processing by cochlear models to produce a cochleagram (in this case, with 16 frequency channels), followed by the linear (L)-nonlinear (N) encoding model. The parameters of the linear stage (the weight matrix) are commonly referred to as the spectro-temporal receptive field (STRF) of the neuron. Note how the choice of cochlear model influences estimation of the parameters of both the L and N stages of the encoding scheme and, in turn, prediction of neural responses by the model.
B. The STRF of an example neuron from natural sound dataset 1, estimated using different cochlear models. Each row corresponds to a cochlear model and each column to a different number of frequency channels.
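A minimal sketch of the LN encoding model described above: the linear stage weights the recent cochleagram history by the STRF, and a sigmoid output stage maps the result to a predicted firing rate. The 4-parameter sigmoid used here is one common choice and is an assumption, not necessarily the paper's exact parameterization.

```python
import numpy as np

def ln_predict(cochleagram, w, bias, a, b, c, d):
    """LN encoding model sketch. cochleagram: (n_freq, n_time);
    w: STRF weights, (n_freq, n_lags); bias: scalar offset;
    a, b, c, d: parameters of an assumed 4-parameter sigmoid."""
    n_freq, n_lags = w.shape
    n_time = cochleagram.shape[1]
    z = np.full(n_time, bias, dtype=float)
    for lag in range(n_lags):
        # weight the stimulus `lag` time steps in the past by that STRF column
        z[lag:] += w[:, lag] @ cochleagram[:, :n_time - lag]
    return a + b / (1 + np.exp(-(z - c) / d))  # sigmoid output stage (N)
```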
Performance of different cochlear models in predicting neural responses in natural sound dataset 1.
A-H. Each gray dot represents the normalized correlation coefficient (CCnorm) between a neuron's recorded response and the model's prediction; the larger black dot represents the mean across neurons, and the error bars show the standard error of the mean.
I. Comparison of all models. The color coding of the lines matches the other panels.
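CCnorm corrects the raw prediction-response correlation for trial-to-trial noise in the recordings, so that a perfect model of the repeatable response scores 1. The sketch below follows the common signal-power definition (Schoppe et al., 2016); the paper's exact implementation may differ in detail.

```python
import numpy as np

def cc_norm(trials, prediction):
    """Normalized correlation coefficient (CCnorm) sketch.
    trials: (n_trials, n_time) recorded responses to repeats of the
    same stimulus; prediction: (n_time,) model output."""
    n = trials.shape[0]
    mean_resp = trials.mean(axis=0)
    # Signal power: variance of the summed response minus the summed
    # single-trial variances, divided by n(n-1)
    sp = (np.var(trials.sum(axis=0), ddof=1)
          - trials.var(axis=1, ddof=1).sum()) / (n * (n - 1))
    cov = np.cov(mean_resp, prediction)[0, 1]
    return cov / np.sqrt(sp * np.var(prediction, ddof=1))
```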
Multi-fiber and multi-threshold cochlear models.
A. Cochleagram of a natural sound clip produced by the MSS model (left) and the spec-Hill model (right).
B. Cochleagram of the same natural sound clip produced by the multi-fiber MSS model (left) and the multi-threshold spec-Hill model (right).
C. Mean CCnorm for predicting the responses of all 73 cortical neurons in natural sound dataset 1 for the multi-fiber/threshold models and their single-fiber/threshold equivalents.
D. STRFs of an example neuron from natural sound dataset 1, estimated using the multi-fiber and multi-threshold models.
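A minimal sketch of the multi-threshold idea for the spec-Hill model: the same Hill non-linearity is applied at several half-maximum points, loosely mimicking high-, medium- and low-spontaneous-rate auditory nerve fibers, and the resulting channels are stacked. The particular k values and exponent are illustrative assumptions.

```python
import numpy as np

def multi_threshold_hill(X, ks=(1e-4, 1e-3, 1e-2), n=2.0):
    """Apply a Hill function at several half-maximum points (ks) to a
    spectrogram X of shape (n_freq, n_time), stacking the outputs so
    each frequency channel is represented at multiple thresholds."""
    return np.concatenate([X**n / (X**n + k**n) for k in ks], axis=0)
```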
Performance of different cochlear models across datasets and encoding models.
A,B. Mean CCnorm between the LN encoding model prediction and the data for neurons in natural sound dataset 2 (awake ferrets), for single-fiber models (A) and multi-fiber models (B).
C,D. Mean CCnorm between the LN encoding model prediction and the data for neurons in the dynamic random chord (DRC) dataset (anesthetized ferrets), for single-fiber models (C) and multi-fiber models (D).
E,F. Mean CCnorm between the network receptive field (NRF) model prediction and the data for neurons in natural sound dataset 1 (anesthetized ferrets), for single-fiber models (E) and multi-fiber models (F).
G,H. Mean CCnorm between the NRF model prediction and the data for neurons in natural sound dataset 2, for single-fiber models (G) and multi-fiber models (H).
I,J. Mean CCnorm between the NRF model prediction and the data for neurons in the DRC dataset, for single-fiber models (I) and multi-fiber models (J).
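The sketch below illustrates an NRF-style encoding model: the single linear STRF stage of the LN model is replaced by a small hidden layer of non-linear units, each with its own STRF-like weights over the stimulus history. The single-hidden-layer sigmoid architecture is an assumption based on related NRF work, not a specification of the paper's exact network.

```python
import numpy as np

def nrf_predict(history, W_hidden, b_hidden, w_out, b_out):
    """NRF-style model sketch. history: (n_time, n_freq * n_lags)
    unrolled recent cochleagram; W_hidden: (n_freq * n_lags, n_hidden)
    per-unit STRF-like weights; w_out: (n_hidden,) output weights."""
    sig = lambda x: 1 / (1 + np.exp(-x))
    hidden = sig(history @ W_hidden + b_hidden)  # (n_time, n_hidden)
    return sig(hidden @ w_out + b_out)           # predicted firing rate
```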
Main findings of the work
We considered a range of existing biologically-detailed models of the auditory periphery, and adapted them to provide
input for a number of encoding models of cortical responses. We also constructed a variety of simple spectrogram-
based models, including a novel one accounting for the different types of auditory nerve fiber. Surprisingly, we found
that the responses of neurons in the primary auditory cortex (A1) of ferrets can be explained just as well using the simple spectrogram-based cochlear models (spec-log, spec-power, spec-Hill) as using the more complex, biologically-detailed cochlear models. Furthermore, the simple models explain the cortical responses more consistently across different sound types and anesthetic states. Hence, much of the complexity present in auditory peripheral
processing may not substantially impact cortical responses. This suggests that the intricate complexity of the cochlea
and the central auditory pathway together results in a simpler than expected transformation of auditory inputs from ear
to cortex.
Interaction between compressive cochlear non-linearities and LN-model output non-linearities.
Average CCnorm of the spectrogram-based models (spec-lin is the spectrogram-based model without any compressive cochlear non-linearity) for an LN encoding model with (LN) and without (L) the output non-linearity. CCnorm is shown for A. natural sound dataset 1, B. natural sound dataset 2, and C. the DRC dataset.
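In code terms, the L variant is simply the linear stage of the ln_predict sketch above with the sigmoid removed, so its prediction quality can be compared directly against the full LN model (e.g., using the cc_norm sketch) to measure how much the output non-linearity contributes for each front end. A minimal sketch:

```python
import numpy as np

def l_predict(cochleagram, w, bias):
    """L model: the linear (STRF) stage of the earlier ln_predict
    sketch alone, with no output non-linearity."""
    n_freq, n_lags = w.shape
    n_time = cochleagram.shape[1]
    z = np.full(n_time, bias, dtype=float)
    for lag in range(n_lags):
        z[lag:] += w[:, lag] @ cochleagram[:, :n_time - lag]
    return z
```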
References
T. Chi, P. Ru, S. A. Shamma, Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906 (2005).
R. F. Lyon, Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function. J. Acoust.
Soc. Am. 130, 3893–3904 (2011).
I. C. Bruce, Y. Erfani, M. S. A. Zilany, A phenomenological model of the synapse between the inner hair cell and auditory nerve:
Implications of limited neurotransmitter release sites. Hear. Res. 360, 40–54 (2018).
M. A. Steadman, C. J. Sumner, Changes in neuronal representations of consonants in the ascending auditory system and their
role in speech recognition. Front. Neurosci. 12, 671 (2018).