Article

Auditory Physiology: Cortical Assistance for the Auditory Signals-to-Symbols Transformation


Abstract

How the brain processes the temporal information crucial for understanding speech and music is poorly understood, but a new study shows that the auditory cortex is tuned to the spectro-temporal acoustic features characteristic of natural biological sounds.


Article
This study employs a voice transformation to overcome the limitations of brain mapping in studying brain representations of natural sounds such as speech. Brain mapping studies of natural sound representations, which present a fixed sound to many neurons with different acoustic frequency selectivity, are difficult to interpret because individual neurons exhibit considerable unexplained variability in the dynamical aspects of their evoked responses. This new approach samples how a single recording responds to an ensemble of sounds, instead of sampling an ensemble of neuronal recordings. A noise-excited filter-bank analysis and resynthesis vocoder systematically shifts the frequency band occupied by sounds in the ensemble. The quality of the voice transformation is assessed by evaluating the number of bands the filter bank must have to support emotional prosody identification. Perceptual data show that emotional prosody can be recognized within normal limits if the bandwidth of filter-bank channels is less than or equal to the bandwidth of perceptual auditory filters. Example physiological data show that stationary linear transfer functions cannot fully explain the responses of central auditory neurons to speech sounds, and that deviations from model predictions are not random: they may be related to acoustic or articulatory features of speech.
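The analysis/resynthesis step of a noise-excited filter-bank vocoder can be sketched as follows. This is a minimal illustration, not the paper's implementation: the band edges (logarithmically spaced), filter order, and Hilbert-envelope extraction are all assumptions, and `noise_vocode` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=4000.0, seed=0):
    """Noise-excited vocoder sketch: keep each band's envelope,
    replace its fine structure with band-limited noise."""
    rng = np.random.default_rng(seed)
    # Illustrative log-spaced band edges between f_lo and f_hi (Hz).
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)                 # analysis band
        env = np.abs(hilbert(band))                # Hilbert envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
        y += env * carrier                         # resynthesis
    return y
```

Varying `n_bands` reproduces the manipulation described above: with fewer, wider channels the envelope detail available to the listener decreases.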
Article
Full-text available
The power spectrum S(f) of many fluctuating physical variables V(t) is "1/f-like", varying as f^-γ (0.5 ≲ γ ≲ 1.5) over many decades of frequency (see ref. 1 for review). We have found that loudness fluctuations in music and speech, and pitch (melody) fluctuations in music, exhibit 1/f power spectra.
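A standard way to check for f^-γ behavior is to fit the slope of the periodogram on log-log axes. The sketch below synthesizes f^-γ noise by spectral shaping and recovers γ; the function names are illustrative and the estimator is the simplest possible one.

```python
import numpy as np

def one_over_f_noise(n, gamma=1.0, seed=0):
    """Shape white spectral coefficients so power falls as f**-gamma."""
    rng = np.random.default_rng(seed)
    f = np.fft.rfftfreq(n)
    z = rng.standard_normal(len(f)) + 1j * rng.standard_normal(len(f))
    z[1:] *= f[1:] ** (-gamma / 2.0)  # amplitude ~ f^(-gamma/2) => power ~ f^-gamma
    z[0] = 0.0                        # drop the DC component
    return np.fft.irfft(z, n)

def fit_gamma(x):
    """Slope of the periodogram on log-log axes estimates -gamma."""
    f = np.fft.rfftfreq(len(x))[1:]
    p = np.abs(np.fft.rfft(x))[1:] ** 2
    slope = np.polyfit(np.log(f), np.log(p), 1)[0]
    return -slope

gamma_hat = fit_gamma(one_over_f_noise(2**16, gamma=1.0))
```

Applied to a loudness or pitch contour instead of synthetic noise, the same fit would yield the γ exponents reported above.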
Conference Paper
Full-text available
In order to process incoming sounds efficiently, it is advantageous for the auditory system to be adapted to the statistical structure of natural auditory scenes. As a first step in investigating the relation between the system and its inputs, we study low-order statistical properties in several sound ensembles using a filter bank analysis. Focusing on the amplitude and phase in different frequency bands, we find simple parametric descriptions for their distribution and power spectrum that are valid for very different types of sounds. In particular, the amplitude distribution has an exponential tail and its power spectrum exhibits a modified power-law behavior, which is manifested by self-similarity and long-range temporal correlations. Furthermore, the statistics for different bands within a given ensemble are virtually identical, suggesting translation invariance along the cochlear axis. These results show that natural sounds are highly redundant, and have possible implications for the neural code used by the auditory system.
Article
Full-text available
The representation of sound information in the central nervous system relies on the analysis of time-varying features in communication and other environmental sounds. How are auditory physiologists and theoreticians to choose an appropriate method for characterizing spectral and temporal acoustic feature representations in single neurons and neural populations? A brief survey of currently available scientific methods and their potential usefulness is given, with a focus on the strengths and weaknesses of using noise analysis techniques for approximating spectrotemporal response fields (STRFs). Noise analysis has been used to foster several conceptual advances in describing neural acoustic feature representation in a variety of species and auditory nuclei. STRFs have been used to quantitatively assess spectral and temporal transformations across mutually connected auditory nuclei, to identify neuronal interactions between spectral and temporal sound dimensions, and to compare linear vs. nonlinear response properties through state-dependent comparisons. We propose that noise analysis techniques used in combination with novel stimulus paradigms and parametric experiment designs will provide powerful means of exploring acoustic feature representations in the central nervous system.
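The core of the noise-analysis approach is reverse correlation: averaging the stimulus spectrogram that precedes each spike. A minimal sketch, assuming a spectrally white stimulus so that the spike-triggered average approximates the STRF directly (for correlated stimuli a normalization by the stimulus autocorrelation would be needed); `strf_sta` is a hypothetical helper name.

```python
import numpy as np

def strf_sta(spectrogram, spikes, n_lags=10):
    """Spike-triggered average of the preceding stimulus spectrogram.
    spectrogram: (n_freq, n_time) array; spikes: (n_time,) spike counts."""
    n_freq, n_t = spectrogram.shape
    sta = np.zeros((n_freq, n_lags))
    total = 0.0
    for t in np.nonzero(spikes)[0]:
        if t >= n_lags:  # need a full window of stimulus history
            sta += spikes[t] * spectrogram[:, t - n_lags:t]
            total += spikes[t]
    return sta / max(total, 1.0)
```

For a simulated neuron driven by energy in one frequency band at a fixed latency, the recovered STRF peaks at exactly that band and lag, which is the sanity check usually run before applying the method to data.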
Article
Full-text available
Although single units in primary auditory cortex (A1) exhibit accurate timing in their phasic response to the onset of sound (precision of a few milliseconds), paradoxically, they are unable to sustain synchronized responses to repeated stimuli at rates much beyond 20 Hz. To explore the relationship between these two aspects of cortical response, we designed a broadband stimulus with a slowly modulated spectrotemporal envelope riding on top of a rapidly modulated waveform (or fine structure). Using this stimulus, we quantified the ability of cortical cells to encode independently and simultaneously the stimulus envelope and fine structure. Specifically, by reverse-correlating unit responses with these two stimulus dimensions, we measured the spectrotemporal response fields (STRFs) associated with the processing of the envelope, the fine structure, and the complete stimulus. A1 cells respond well to the slow spectrotemporal envelopes and produce a wide variety of STRFs. In over 70% of cases, A1 units also track the fine-structure modulations precisely, throughout the stimulus, and for frequencies up to several hundred Hertz. Such a dual response, however, is contingent on the cell being driven by both fast and slow modulations, in that the response to the slowly modulated envelope gates the expression of the fine structure. We also demonstrate that either a simplified model of synaptic depression and facilitation, and/or a cortical network of thalamic excitation and cortical inhibition can account for major trends in the observed findings. Finally, we discuss the potential functional significance and perceptual relevance of these coexistent, complementary dynamic response modes.
Article
Full-text available
A phenomenological model with time-varying excitation and inhibition was developed to study possible neural mechanisms underlying changes in the representation of temporal envelopes along the auditory pathway. A modified version of an existing auditory-nerve model [Zhang et al., J. Acoust. Soc. Am. 109, 648-670 (2001)] was used to provide inputs to higher hypothetical processing centers. Model responses were compared directly to published physiological data at three levels: the auditory nerve, ventral cochlear nucleus, and inferior colliculus. Trends and absolute values of both average firing rate and synchrony to the modulation period were accurately predicted at each level for a wide range of stimulus modulation depths and modulation frequencies. The diversity of central physiological responses was accounted for with realistic variations of model parameters. Specifically, enhanced synchrony in the cochlear nucleus and rate-tuning to modulation frequency in the inferior colliculus were predicted by choosing appropriate relative strengths and time courses of excitatory and inhibitory inputs to postsynaptic model cells. The proposed model is fundamentally different than others that have been used to explain the representation of envelopes in the mammalian midbrain, and it provides a computational tool for testing hypothesized relationships between physiology and psychophysics.
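The excitatory-inhibitory interplay such models rely on can be sketched as a postsynaptic cell receiving fast excitation and slower, delayed inhibition derived from the same input, followed by half-wave rectification. The time constants, inhibitory gain, and delay below are illustrative placeholders, not the paper's fitted parameters.

```python
import numpy as np

def ei_cell(inp, fs, tau_e=0.001, tau_i=0.002, g_i=0.9, delay_i=0.002):
    """Toy E-I cell: excitation minus slower, delayed inhibition, rectified.
    All parameters are illustrative (seconds; fs in Hz)."""
    n_k = int(5 * max(tau_e, tau_i) * fs)      # kernel length: ~5 time constants
    t = np.arange(n_k) / fs
    ke = np.exp(-t / tau_e); ke /= ke.sum()    # fast excitatory kernel
    ki = np.exp(-t / tau_i); ki /= ki.sum()    # slower inhibitory kernel
    e = np.convolve(inp, ke)[: len(inp)]
    i = np.convolve(inp, ki)[: len(inp)]
    d = int(delay_i * fs)                      # synaptic delay of inhibition
    i = np.concatenate([np.zeros(d), i[: len(inp) - d]])
    return np.maximum(e - g_i * i, 0.0)

out = ei_cell(np.ones(1000), fs=10000.0)
```

With a steady input, delayed inhibition nearly cancels excitation after the onset, so the cell responds transiently; changing the relative strengths and time courses shifts the cell's modulation tuning, which is the knob the paper turns to match cochlear nucleus and inferior colliculus data.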
Article
Full-text available
Neurons in primary auditory cortex (A1) of cats show strong stimulus-specific adaptation (SSA). In probabilistic settings, in which one stimulus is common and another is rare, responses to common sounds adapt more strongly than responses to rare sounds. This SSA could be a correlate of auditory sensory memory at the level of single A1 neurons. Here we studied adaptation in A1 neurons, using three different probabilistic designs. We showed that SSA has several time scales concurrently, spanning many orders of magnitude, from hundreds of milliseconds to tens of seconds. Similar time scales are known for the auditory memory span of humans, as measured both psychophysically and using evoked potentials. A simple model, with linear dependence on both short-term and long-term stimulus history, provided a good fit to A1 responses. Auditory thalamus neurons did not show SSA, and their responses were poorly fitted by the same model. In addition, SSA increased the proportion of failures in the responses of A1 neurons to the adapting stimulus. Finally, SSA caused a bias in the neuronal responses to unbiased stimuli, enhancing the responses to eccentric stimuli. Therefore, we propose that a major function of SSA in A1 neurons is to encode auditory sensory memory on multiple time scales. This SSA might play a role in stream segregation and in binding of auditory objects over many time scales, a property that is crucial for processing of natural auditory scenes in cats and of speech and music in humans.
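The "linear dependence on both short-term and long-term stimulus history" can be sketched with two leaky integrators of past presentations of a stimulus. The time constants and weights below are illustrative, chosen only to show that the same mechanism makes responses to a common stimulus adapt more than responses to a rare one.

```python
import numpy as np

def ssa_response(trials, tau_fast=2.0, tau_slow=20.0,
                 a_fast=0.2, a_slow=0.02, r0=1.0):
    """Response to each presentation depends linearly on exponentially
    decaying short- and long-term stimulus histories. Time constants are
    in units of trials; all parameters are illustrative."""
    h_fast = h_slow = 0.0
    responses = []
    for s in trials:  # s = 1 if the stimulus is presented on this trial
        if s:
            responses.append(max(r0 - a_fast * h_fast - a_slow * h_slow, 0.0))
        h_fast = h_fast * np.exp(-1.0 / tau_fast) + s
        h_slow = h_slow * np.exp(-1.0 / tau_slow) + s
    return responses
```

Presenting the stimulus on every trial drives both integrators high and suppresses the response strongly; presenting it on every tenth trial lets the fast history decay almost completely between presentations, so only the slow component adapts.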
Article
Full-text available
Response properties of primary auditory cortical neurons in the adult common marmoset monkey (Callithrix jacchus) were modified by extensive exposure to altered vocalizations that were self-generated and rehearsed frequently. A laryngeal apparatus modification procedure permanently lowered the frequency content of the native twitter call, a complex communication vocalization consisting of a series of frequency modulation (FM) sweeps. Monkeys vocalized shortly after this procedure and maintained voicing efforts until physiological evaluation 5-15 months later. The altered twitter calls improved over time, with FM sweeps approaching but never reaching the normal spectral range. Neurons with characteristic frequencies <4.3 kHz that had been weakly activated by native twitter calls were recruited to encode self-uttered altered twitter vocalizations. These neurons showed a decrease in response magnitude and an increase in temporal dispersion of response timing to twitter call and parametric FM stimuli but a normal response profile to pure tone stimuli. Tonotopic maps in voice-modified monkeys were not distorted. These findings suggest a previously unrecognized form of cortical plasticity that is specific to higher-order processes involved in the discrimination of more complex sounds, such as species-specific vocalizations.
Article
Full-text available
A sound embedded in an acoustic stream cannot be unambiguously segmented and identified without reference to its stimulus context. To understand the role of stimulus context in cortical processing, we investigated the responses of auditory cortical neurons to 2-sound sequences in awake marmosets, with a focus on stimulus properties other than carrier frequency. Both suppressive and facilitatory modulations of cortical responses were observed by using combinations of modulated tone and noise stimuli. The main findings are as follows. 1) Preceding stimuli could suppress or facilitate responses to succeeding stimuli for durations >1 s. These long-lasting effects were dependent on the duration, sound level, and modulation parameters of the preceding stimulus, in addition to the carrier frequency. They occurred regardless of whether the 2 stimuli were separated by a silent interval. 2) Suppression was often tuned such that preceding stimuli whose parameters were similar to succeeding stimuli produced the strongest suppression. However, the responses of many units could be suppressed, although often weaker, even when the 2 stimuli were dissimilar. In some cases, only a dissimilar preceding stimulus produced suppression in the responses to the succeeding stimulus. 3) In contrast to suppression, facilitation of responses to succeeding stimuli by the preceding stimulus was usually strongest when the 2 stimuli were dissimilar. 4) There was no clear correlation between the firing rate evoked by the preceding stimulus and the change in the firing rate evoked by the succeeding stimulus, indicating that the observed suppression was not simply a result of habituation or spike adaptation. These results demonstrate that persistent modulations of the responses of an auditory cortical neuron to a given stimulus can be induced by preceding stimuli. 
Decreases or increases of responses to the succeeding stimuli are dependent on the spectral, temporal, and intensity properties of the preceding stimulus. This indicates that cortical auditory responses to a sound are not static, but instead depend on the stimulus context in a stimulus-specific manner. The long-lasting impact of stimulus context and the prevalence of facilitation suggest that such cortical response properties are important for auditory processing beyond forward masking, such as for auditory streaming and segregation.
Article
Full-text available
Functional dissociations within the neural basis of auditory sentence processing are difficult to specify because phonological, syntactic and semantic information are all involved when sentences are perceived. In this review I argue that sentence processing is supported by a temporo-frontal network. Within this network, temporal regions subserve aspects of identification and frontal regions the building of syntactic and semantic relations. Temporal analyses of brain activation within this network support syntax-first models because they reveal that building of syntactic structure precedes semantic processes and that these interact only during a later stage.
Article
Full-text available
Auditory experience leads to myriad changes in processing in the central auditory system. We recently described task-related plasticity characterized by rapid modulation of spectro-temporal receptive fields (STRFs) in ferret primary auditory cortex (A1) during tone detection. We conjectured that each acoustic task may have its own "signature" STRF changes, dependent on the salient cues that the animal must attend to perform the task. To discover whether other acoustic tasks could elicit changes in STRF shape, we recorded from A1 in ferrets also trained on a frequency discrimination task. Overall, we found a distinct pattern of STRF change, characterized by an expected selective enhancement at target tone frequency but also by an equally selective depression at reference tone frequency. When single-tone detection and frequency discrimination tasks were performed sequentially, neurons responded differentially to identical tones, reflecting distinct predictive values of stimuli in the two behavioral contexts. All results were observed in multiunit as well as single-unit recordings. Our findings provide additional evidence for the presence of adaptive neuronal responses in A1 that can swiftly change to reflect both sensory content and the changing behavioral meaning of incoming acoustic stimuli.
Article
The representation of voice onset time (VOT) for 197 single units in cat primary auditory cortex was studied for a /ba/-/pa/ continuum in which VOT was varied in 5-ms steps from 0 to 70 ms. The effect of stimulus intensity, characteristic frequency of the neurons, and age of the animals was investigated. The minimum VOT represented in onset responses to both the voiceless and voiced parts of the sound (a "double-on" response) was dependent on overall stimulus level. An interaction was found between the efficacy of the burst in evoking neural activity and the size of the subsequent response to the onset of voicing. There was only a minor difference in the mean values for the minimal neural VOT for young (42 ms), juvenile (36 ms), and adult animals (46 ms), although values close to 10-15 ms were found more frequently in individual young and juvenile animals. The cumulative distribution for the adult group showed a relative lack of neural VOTs around 30-40 ms. No other cues related to the categorical perception boundary were found in the single-unit and local neuronal group firing-rate representation of VOT.
Article
Little is known about the mechanisms that allow the cortex to selectively improve the neural representations of behaviorally important stimuli while ignoring irrelevant stimuli. Diffuse neuromodulatory systems may facilitate cortical plasticity by acting as teachers to mark important stimuli. This study demonstrates that episodic electrical stimulation of the nucleus basalis, paired with an auditory stimulus, results in a massive progressive reorganization of the primary auditory cortex in the adult rat. Receptive field sizes can be narrowed, broadened, or left unaltered depending on specific parameters of the acoustic stimulus paired with nucleus basalis activation. This differential plasticity parallels the receptive field remodeling that results from different types of behavioral training. This result suggests that input characteristics may be able to drive appropriate alterations of receptive fields independently of explicit knowledge of the task. These findings also suggest that the basal forebrain plays an active instructional role in representational plasticity.
Article
Responses to various steady-state vowels were recorded in single units in the primary auditory cortex (AI) of the barbiturate-anaesthetized ferret. Six vowels were presented (/a/, /epsilon/, 2 different /i/'s, and 2 different /u/'s) in a natural voiced and a synthetic unvoiced mode. In addition, the responses to broadband stimuli with a sinusoidally shaped spectral envelope (called ripple stimuli) were recorded in each cell, and the response field (RF), which consists of both excitatory and inhibitory regions, was derived from the ripple transfer function. We examined whether the vowel responses could be predicted using a linear ripple analysis method [Shamma et al., Auditory Neurosci. 1, 233-254 (1995)], i.e., by cross correlating the RF of the single unit, and the smoothed spectral envelope of the vowel. We found that for most AI cells (71%) the relative responses to natural vowels could be predicted on the basis of this method. Responses and prediction results for unvoiced and voiced vowels were very similar, suggesting that the spectral fine structure may not play a significant role in the neuron's response to the vowels. Predictions on the basis of the entire RF were significantly better than based solely on best frequency (BF) (or "place"). These findings confirm the ripple analysis method as a valid method to characterize AI responses to broadband sounds as we proposed in a previous paper using synthesized spectra [Shamma and Versnel, Auditory Neurosci. 1, 255-270 (1995)].
Article
Because auditory cortical neurons have limited stimulus-synchronized responses, cortical representations of more rapidly occurring but still perceivable stimuli remain unclear. Here we show that there are two largely distinct populations of neurons in the auditory cortex of awake primates: one with stimulus-synchronized discharges that, with a temporal code, explicitly represented slowly occurring sound sequences and the other with non-stimulus-synchronized discharges that, with a rate code, implicitly represented rapidly occurring events. Furthermore, neurons of both populations displayed selectivity in their discharge rates to temporal features within a short time-window. Our results suggest that the combination of temporal and rate codes in the auditory cortex provides a possible neural basis for the wide perceptual range of temporal information.
Article
Receptive fields have been characterized independently in the lemniscal auditory thalamus and cortex, usually with spectrotemporally simple sounds tailored to a specific task. No studies have employed naturalistic stimuli to investigate the thalamocortical transformation in temporal, spectral, and aural domains simultaneously and under identical conditions. We recorded simultaneously in the ventral division of the medial geniculate body (MGBv) and in primary auditory cortex (AI) of the ketamine-anesthetized cat. Spectrotemporal receptive fields (STRFs) of single units (n = 387) were derived by reverse-correlation with a broadband and dynamically varying stimulus, the dynamic ripple. Spectral integration, as measured by excitatory bandwidth and spectral modulation preference, was similar across both stations (mean Q(1/e) thalamus = 5.8, cortex = 5.4; upper cutoff of spectral modulation transfer function, thalamus = 1.30 cycles/octave, cortex = 1.37 cycles/octave). Temporal modulation rates slowed by a factor of two from thalamus to cortex (mean preferred rate, thalamus = 32.4 Hz, cortex = 16.6 Hz; upper cutoff of temporal modulation transfer function, thalamus = 62.9 Hz, cortex = 37.4 Hz). We found no correlation between spectral and temporal integration properties, suggesting that the excitatory-inhibitory interactions underlying preference in each domain are largely independent. A small number of neurons in each station had highly asymmetric STRFs, evidence of frequency sweep selectivity, but the population showed no directional bias. Binaural preferences differed in their relative proportions, most notably an increased prevalence of excitatory contralateral-only cells in cortex (40%) versus thalamus (23%), indicating a reorganization of this parameter. By comparing simultaneously along multiple stimulus dimensions in both stations, these observations establish the global characteristics of the thalamocortical receptive field transformation.
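The dynamic ripple stimulus used for reverse correlation imposes a drifting sinusoidal envelope on a bank of carriers; a single ripple component has the standard form S(t, x) = 1 + ΔA·sin(2π(wt + Ωx) + φ), with x the position in octaves. The sketch below generates one such component (a full dynamic ripple superimposes many components with time-varying parameters; the function name and defaults are illustrative).

```python
import numpy as np

def ripple_envelope(duration, fs, n_oct=4.0, n_chan=40,
                    w=8.0, omega=0.5, depth=0.9, phi=0.0):
    """One moving-ripple spectrotemporal envelope:
    S(t, x) = 1 + depth * sin(2*pi*(w*t + omega*x) + phi),
    where w is ripple velocity (Hz), omega is ripple density
    (cycles/octave), and x is octaves above the lowest carrier."""
    t = np.arange(int(duration * fs)) / fs          # envelope time axis
    x = np.linspace(0.0, n_oct, n_chan)             # carrier positions (octaves)
    return 1.0 + depth * np.sin(
        2 * np.pi * (w * t[None, :] + omega * x[:, None]) + phi)

env = ripple_envelope(0.5, 1000.0)                  # (n_chan, n_samples) envelope
```

Modulation transfer functions like those reported above (upper cutoffs in Hz and cycles/octave) are read off from how strongly a neuron's response follows w and Ω across such components.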
Article
This paper examines how modulated broadband noises modulate the thalamic response evoked by brief probe sounds in the awake animal. We demonstrate that noise not only attenuates the response to probe sounds (masking) but also changes the temporal response pattern (scrambling). Two brief probe sounds, a Gaussian noise burst and a brief sinusoidal tone, were presented in silence and in three ongoing noises. The three noises were targeted at activating the auditory system in qualitatively distinct ways. Dynamic ripple noise, containing many random tone-like elements, is targeted at those parts of the auditory system that respond well to tones. International Collegium of Rehabilitative Audiology noise, comprising the sum of several simultaneous streams of Schroeder-phase speech, is targeted at those parts of the auditory system that respond well to modulated sounds but lack a well-defined response to tones. Gaussian noise is targeted at those parts of the auditory system that respond to acoustic energy regardless of modulation. All noises both attenuated and decreased the precise temporal repeatability of the onset response to probe sounds. In addition, the modulated noises induced context-specific changes in the temporal pattern of the response to probe sounds. Scrambling of the temporal response pattern may be a direct neural correlate of the unfortunate experience of being able to hear, but not understand, speech sounds in noisy environments.
Article
For researchers and clinical practitioners alike, evoked and event-related responses measured with MEG and EEG provide the means for studying human brain function and dysfunction. However, the generation mechanism of event-related responses remains unclear, hindering our ability to formulate viable theories of neural information processing. Event-related responses are assumed to be generated either (1) separately of ongoing, oscillatory brain activity or (2) through stimulus-induced reorganization of ongoing activity. Here, we approached this issue through examining single-trial auditory MEG data in humans. We demonstrate that phase coherence over trials observed with commonly used signal decomposition methods (e.g., wavelets) can result from both a phase-coherent state of ongoing oscillations and from the presence of a phase-coherent event-related response which is additive to ongoing oscillations. To avoid this problem, we introduce a method based on amplitude variance to establish the relationship between ongoing oscillations and event-related responses. We found that auditory stimuli do not give rise to phase reorganization of ongoing activity. Further, increases in spectral power accompany the emergence of event-related responses, and the relationship between spectral power and the amplitude of these responses can be accounted for by a linear summation of the event-related response and ongoing oscillation with a stochastically distributed phase. Thus, on the basis of our observations, auditory event-related responses are unique descriptors of neural information processing in humans, generated by processes separate from and additive to ongoing brain activity.
Article
It has been well documented that neurons in the auditory cortex of anaesthetized animals generally display transient responses to acoustic stimulation, and typically respond to a brief stimulus with one or fewer action potentials. The number of action potentials evoked by each stimulus usually does not increase with increasing stimulus duration. Such observations have long puzzled researchers across disciplines and raised serious questions regarding the role of the auditory cortex in encoding ongoing acoustic signals. Contrary to these long-held views, here we show that single neurons in both primary (area A1) and lateral belt areas of the auditory cortex of awake marmoset monkeys (Callithrix jacchus) are capable of firing in a sustained manner over a prolonged period of time, especially when they are driven by their preferred stimuli. In contrast, responses become more transient or phasic when auditory cortex neurons respond to non-preferred stimuli. These findings suggest that when the auditory cortex is stimulated by a sound, a particular population of neurons fire maximally throughout the duration of the sound. Responses of other, less optimally driven neurons fade away quickly after stimulus onset. This results in a selective representation of the sound across both neuronal population and time.
Article
The amplitude and pitch fluctuations of natural soundscapes often exhibit "1/f spectra" [1: Voss, R.F., and Clarke, J., 1/f noise in music and speech, Nature 258, 317-318 (1975); 2: De Coensel, B., Botteldooren, D., and De Muer, T., 1/f noise in rural and urban soundscapes, Acta Acustica 89, 287-295 (2003)], which means that large, abrupt changes in pitch or loudness occur proportionally less frequently in nature than gentle, gradual fluctuations. Furthermore, human listeners reportedly prefer 1/f-distributed random melodies to melodies with faster (1/f^0) or slower (1/f^2) dynamics [3: Voss, R.F., and Clarke, J., 1/f noise in music: Music from 1/f noise, J. Acoust. Soc. Am. 63, 258-263 (1978)]. One might therefore suspect that neurons in the central auditory system may be tuned to 1/f dynamics, particularly given that recent reports provide evidence for tuning to 1/f dynamics in primary visual cortex [4: Yu, Y., Romero, R., and Lee, T.S., Preference of sensory neural coding for 1/f signals, Phys. Rev. Lett. 94, 108103 (2005)]. To test whether neurons in primary auditory cortex (A1) are tuned to 1/f dynamics, we recorded responses to random tone complexes in which the fundamental frequency and the envelope were determined by statistically independent "1/f^γ random walks," with γ set to values between 0.5 and 4. Many A1 neurons showed clear evidence of tuning and responded with higher firing rates to stimuli with γ between 1 and 1.5. Response patterns elicited by 1/f^γ stimuli were more reproducible for values of γ close to 1. These findings indicate that auditory cortex is indeed tuned to the 1/f dynamics commonly found in the statistical distributions of natural soundscapes.
Article
In order to process incoming sounds efficiently, it is advantageous for the auditory system to be adapted to the statistical structure of natural auditory scenes. As a first step in investigating the relation between the system and its inputs, we study low-order statistical properties in several sound ensembles using a filter bank analysis. Focusing on the amplitude and phase in different frequency bands, we find simple parametric descriptions for their distribution and power spectrum that are valid for very different types of sounds. In particular, the amplitude distribution has an exponential tail and its power spectrum exhibits a modified power-law behavior, which is manifested by self-similarity and long-range temporal correlations. Furthermore, the statistics for different bands within a given ensemble are virtually identical, suggesting translation invariance along the cochlear axis. These results show that natural sounds are highly redundant, and have possible implications for the neural code used by the auditory system.
Article
Versnel, H., and Shamma, S. (1998). Representation of natural and synthetic vowels in the primary auditory cortex. J. Acoust. Soc. Am. 103, 2502–2514.