Chapter

Abstract

How are environmental sounds relevant to the neurobiology of language? As studied in the 20th century, the purported structure of language and its processing—a human-specific “faculty” characterized by an abstract system of rules governing the hierarchical recombination of symbols encoded by arbitrary sound units—is seemingly unrelated to the recognition and comprehension of environmental sounds. Environmental sounds have often been used as a means of defining what is “language-specific” in the brain. However, as research in both language and environmental sounds has matured, useful parallels between the two domains have emerged, as well as some illustrative differences. In this chapter, we first discuss what environmental sounds are (and are not), and then move through different aspects of environmental sounds research that parallel fields of study in language. We consider, in detail, the behavioral and neuroimaging evidence for how environmental sounds are processed, highlighting the range of perceptual, cross-modal, semantic, and contextual processes involved, and finish by considering how studying environmental sounds informs our understanding of language processing.

... To mimic aspects of the environment of a given phenomenon, more complex mappings of psychoacoustic cues are required. Studies suggest that the mental processing of environmental sounds is quite similar to that used for spoken language [11]. This could suggest that, since environmental sounds are processed as a form of language, mimicking these attributes might make the data more comprehensible to the user. ...
... Considering that sonification is listened to in the same way that speech is [12], this is also comparable to environmental sounds, which are processed by the same region of the brain [11]. Language is based on a process of learning how a word is associated with a concept in order to make sense to a user. ...
Conference Paper
Full-text available
This study investigated the design and evaluation of a sonification, created for an astronomer who studies exosolar accretion discs. User design methods were applied to sonify data that could allow the classification of accretion discs. The sonification was developed over three stages: a requirements gathering exercise that inquired about the astronomer's work and the data, design and development, as well as an evaluation. Twenty datasets were sonified and analysed. The sonification effectively represented the accretion discs allowing the astronomer to commence a preliminary, comparative classification. Multiple parameter mappings provide rich auditory stimuli. Spatial mapping and movement allow for easier identification of fast changes and peaks in the data which improved the understanding of the extent of these changes.
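As a concrete illustration of the parameter-mapping approach described above, the short Python sketch below maps each value in a data series to an oscillator frequency and its series position to a left/right pan. The mapping ranges and function names are invented for illustration and are not the design used in the study.
```python
# Hypothetical parameter-mapping sonification (invented mapping, for illustration):
# each data value sets an oscillator's pitch; its position in the series sets the pan.
import numpy as np

FS = 22050  # output sample rate (Hz)

def sonify(values, dur_per_point=0.2, f_lo=220.0, f_hi=880.0):
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    norm = (values - values.min()) / (span + 1e-12)   # rescale data to 0..1
    n = int(dur_per_point * FS)
    t = np.arange(n) / FS
    left, right = [], []
    for i, v in enumerate(norm):
        freq = f_lo + v * (f_hi - f_lo)               # value -> pitch
        pan = i / max(len(norm) - 1, 1)               # series position -> pan
        tone = np.sin(2 * np.pi * freq * t) * np.hanning(n)
        left.append((1 - pan) * tone)
        right.append(pan * tone)
    return np.stack([np.concatenate(left), np.concatenate(right)])

stereo = sonify([0.1, 0.4, 0.9, 0.3, 0.7])
print(stereo.shape)   # (2, n_samples) stereo buffer ready to write to a WAV file
```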
... The results from exemplar discrimination showed that, despite demonstrating robust statistical representations of sound textures, the older listeners' ability to access the fine temporal details of sound textures was severely limited. Factors such as task complexity, signal quality and loss of high frequency sensitivity, which have been shown to negatively affect older listeners' perception of environmental sounds (Dick et al., 2015; Gygi & Shafiro, 2013), do not explain the findings in this study: the task was simple, the textures were synthesized with the full set of statistics, and the experiments were conducted in quiet with audibility compensation up to 8 kHz. Non-auditory cognitive decline has been linked to deficits in auditory processing tasks (for a review, see Aydelott et al., 2010). ...
Article
Full-text available
Sound textures are a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that sound texture perception is mediated by time-averaged summary statistics measured from early stages of the auditory system. The ability of young normal-hearing (NH) listeners to identify synthetic sound textures increases as the statistics of the synthetic texture approach those of its real-world counterpart. In sound texture discrimination, young NH listeners utilize the fine temporal stimulus information for short-duration stimuli, whereas they switch to a time-averaged statistical representation as the stimulus’ duration increases. The present study investigated how younger and older listeners with a sensorineural hearing impairment perform in the corresponding texture identification and discrimination tasks in which the stimuli were amplified to compensate for the individual listeners’ loss of audibility. In both hearing impaired (HI) listeners and NH controls, sound texture identification performance increased as the number of statistics imposed during the synthesis stage increased, but hearing impairment was accompanied by a significant reduction in overall identification accuracy. Sound texture discrimination performance was measured across listener groups categorized by age and hearing loss. Sound texture discrimination performance was unaffected by hearing loss at all excerpt durations. The older listeners’ sound texture and exemplar discrimination performance decreased for signals of short excerpt duration, with older HI listeners performing better than older NH listeners. The results suggest that the time-averaged statistic representations of sound textures provide listeners with cues which are robust to the effects of age and sensorineural hearing loss.
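The notion of time-averaged summary statistics can be made concrete with a small Python sketch: sub-band envelopes are extracted from a waveform and reduced to a few statistics (envelope means, coefficients of variation, and cross-band correlations). The band edges and the particular statistics here are illustrative assumptions, not the synthesis statistics used in the study.
```python
# Illustrative sketch (not the study's code): time-averaged summary statistics
# of a sound texture, computed from sub-band Hilbert envelopes.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_envelopes(x, fs, edges=(80, 250, 800, 2500, 7000)):
    """Band-pass filter x into sub-bands and return their Hilbert envelopes."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        envs.append(np.abs(hilbert(sosfiltfilt(sos, x))))
    return np.array(envs)                       # shape: (n_bands, n_samples)

def texture_statistics(x, fs):
    """Envelope mean, coefficient of variation, and cross-band correlations."""
    envs = subband_envelopes(x, fs)
    mean = envs.mean(axis=1)
    cv = envs.std(axis=1) / (mean + 1e-12)
    return {"mean": mean, "cv": cv, "corr": np.corrcoef(envs)}

fs = 16000
x = np.random.randn(fs * 2)                     # stand-in for a 2-s texture excerpt
stats = texture_statistics(x, fs)
print(stats["mean"].shape, stats["corr"].shape)
```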
... Previous ERP studies compared the semantic processing of speech and environmental sounds by combining them with semantically incongruent images, videos or visual narratives (Cummings et al., 2008; Liu et al., 2011; Manfredi et al., 2018; Plante et al., 2000; Puce et al., 2007; see also Dick et al., 2016). In 2011, Liu and colleagues analyzed the integration of natural/non-natural sound and visual information by presenting videos of real-world events with semantically congruent or incongruent natural sound/speech. ...
Article
Full-text available
To investigate the processing of environmental sounds, previous researchers have compared the semantic processing of words and sounds, yielding mixed results. This study aimed to specifically investigate the electrophysiological mechanism underlying the semantic processing of environmental sounds presented in a naturalistic visual scene. We recorded event-related brain potentials in a group of young adults over the presentation of everyday life actions that were either congruent or incongruent with environmental sounds. Our results showed that incongruent environmental sounds evoked both a P400 and an N400 effect, reflecting sensitivity to physical and semantic violations of environmental sounds’ properties, respectively. In addition, our findings showed an enhanced late positivity in response to incongruous environmental sounds, probably reflecting additional reanalysis costs. In conclusion, these results indicate that the crossmodal processing of the environmental sounds might require the simultaneous involvement of different cognitive processes.
... However, how attentional selection and information extraction interact in audition compared to vision is likely to differ due to fundamental differences between the two sensory systems (for an example of this dissociation in film, see T. Smith 2014). Furthermore, auditory processing of dialogue would involve specific linguistic processes (word recognition, semantic role assignment, syntactic processing) (Dick et al. 2016), and their output would be passed along to the back-end. On the other hand, regardless of the information modality, the output of all of these front-end processes could be assumed to be amodal information about the event (i.e., "event indices": people, objects, actions, causal relationships, and characters' goals; Magliano et al. 2013; Zwaan and Radvansky 1997), which is sent to the back-end. ...
Article
Full-text available
This study tested the role of the audio soundtrack in the opening scene of Orson Welles’s Touch of Evil (Orson Welles and Albert Zugsmith, 1958) in supporting a predictive inference that a time bomb will explode, as the filmmakers intended. We designed two experiments and interpreted their results using the Scene Perception and Event Comprehension Theory (SPECT). Across both experiments, viewers watched the scene, we manipulated their knowledge of the bomb, and they made a predictive inference just before the bomb would explode. Experiment 1 found that the likelihood of predicting the explosion decreased when the soundtrack was absent. Experiment 2 showed that individual differences in working memory accounted for variability in generating the prediction when the soundtrack was absent. We explore the implications for filmmaking in general.
... Based on prior research, however, it seems unlikely that strictly isolated speech processing would occur. Not only have cross-modal effects involving rapid interaction of many types of nonlinguistic information with language been documented, but recent research has suggested that words and meaningful non-linguistic stimuli may have more in common in processing than previously thought given the neural resources involved in understanding both (Cummings et al., 2006; Dick, Krishnan, Leech, & Saygin, 2016; Leech & Saygin, 2011; Saygin, 2003; Saygin, Dick, & Bates, 2005). However, there is little research on how environmental sounds are understood, especially in comparison to speech sounds, and few studies directly comparing recognition and understanding of these two classes of sounds under a common contextual constraint. ...
Article
There is debate about how individuals use context to successfully predict and recognize words. One view argues that context supports neural predictions that make use of the speech motor system, whereas other views argue for a sensory or conceptual level of prediction. While environmental sounds can convey clear referential meaning, they are not linguistic signals, and are thus neither produced with the vocal tract nor typically encountered in sentence context. We compared the effect of spoken sentence context on recognition and comprehension of spoken words versus nonspeech, environmental sounds. In Experiment 1, sentence context decreased the amount of signal needed for recognition of spoken words and environmental sounds in similar fashion. In Experiment 2, listeners judged sentence meaning in both high and low contextually constraining sentence frames, when the final word was present or replaced with a matching environmental sound. Results showed that sentence constraint affected decision time similarly for speech and nonspeech, such that high constraint sentences (i.e., frame plus completion) were processed faster than low constraint sentences for speech and nonspeech. Linguistic context facilitates the recognition and understanding of nonspeech sounds in much the same way as for spoken words. This argues against a simple form of a speech-motor explanation of predictive coding in spoken language understanding, and suggests support for conceptual-level predictions.
... Successful perception of environmental sounds and auditory scenes requires integration of low-level sensory information with high-level cognitive representations from memory of distal objects and events [3]. Thus, as auditory stimuli, environmental sounds represent a useful contrast to speech, music and laboratory-generated acoustic stimuli in evaluation of the interaction between the sensory and cognitive aspects of auditory perception [26][27][28][29][30]. ...
Article
Full-text available
Objective: Sounds in everyday environments tend to follow one another as events unfold over time. The tacit knowledge of contextual relationships among environmental sounds can influence their perception. We examined the effect of semantic context on the identification of sequences of environmental sounds by adults of varying age and hearing abilities, with an aim to develop a nonspeech test of auditory cognition. Method: The familiar environmental sound test (FEST) consisted of 25 individual sounds arranged into ten five-sound sequences: five contextually coherent and five incoherent. After hearing each sequence, listeners identified each sound and arranged them in the presentation order. FEST was administered to young normal-hearing, middle-to-older normal-hearing, and middle-to-older hearing-impaired adults (Experiment 1), and to postlingual cochlear-implant users and young normal-hearing adults tested through vocoder-simulated implants (Experiment 2). Results: FEST scores revealed a strong positive effect of semantic context in all listener groups, with young normal-hearing listeners outperforming other groups. FEST scores also correlated with other measures of cognitive ability, and for CI users, with the intelligibility of speech-in-noise. Conclusions: Being sensitive to semantic context effects, FEST can serve as a nonspeech test of auditory cognition for diverse listener populations to assess and potentially improve everyday listening skills.
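A hypothetical scoring sketch for a FEST-style trial is given below: a listener reports five sound labels, and separate scores are computed for identification (regardless of position) and for correct ordering. The function and labels are invented for illustration; the published test has its own materials and scoring.
```python
# Hypothetical scoring sketch for a FEST-style trial (invented labels and function;
# not the published test's scoring code).
def score_trial(presented, reported):
    """presented/reported: lists of five sound labels in presentation/report order."""
    identification = sum(r in presented for r in reported) / len(presented)
    order = sum(p == r for p, r in zip(presented, reported)) / len(presented)
    return identification, order

ident, order = score_trial(
    ["dog bark", "car horn", "door slam", "phone ring", "keys jangling"],
    ["car horn", "dog bark", "door slam", "phone ring", "keys jangling"],
)
print(ident, order)   # 1.0 identified, 0.6 placed in the correct position
```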
Article
Full-text available
The study of patients with semantic dementia has revealed important insights into the cognitive and neural architecture of semantic memory. Patients with semantic dementia are known to have difficulty understanding the meanings of environmental sounds from an early stage but little is known about their knowledge for famous tunes, which might be preserved in some cases. Patients with semantic dementia (n = 13), Alzheimer's disease (n = 14) as well as matched healthy control participants (n = 20) underwent a battery of tests designed to assess knowledge of famous tunes, environmental sounds and famous faces, as well as volumetric magnetic resonance imaging. As a group, patients with semantic dementia were profoundly impaired in the recognition of everyday environmental sounds and famous tunes with consistent performance across testing modalities, which is suggestive of a central semantic deficit. A few notable individuals (n = 3) with semantic dementia demonstrated clear preservation of knowledge of known melodies and famous people. Defects in auditory semantics were mild in patients with Alzheimer's disease. Voxel-based morphometry of magnetic resonance brain images showed that the recognition of famous tunes correlated with the degree of right anterior temporal lobe atrophy, particularly in the temporal pole. This area was segregated from the region found to be involved in the recognition of everyday sounds but overlapped considerably with the area that was correlated with the recognition of famous faces. The three patients with semantic dementia with sparing of musical knowledge had significantly less atrophy of the right temporal pole in comparison to the other patients in the semantic dementia group. These findings highlight the role of the right temporal pole in the processing of known tunes and faces. Overlap in this region might reflect that having a unique identity is a quality that is common to both melodies and people.
Article
Full-text available
The clinical presentation of patients with semantic dementia is dominated by anomia and poor verbal comprehension. Although a number of researchers have argued that these patients have impaired comprehension of non-verbal as well as verbal stimuli, the evidence for semantic deterioration is mainly derived from tasks that include some form of verbal input or output. Few studies have investigated semantic impairment using entirely non-verbal assessments, and the few exceptions have been based on results from single cases ([3]: Breedin SD, Saffran EM, Coslett HB. Reversal of the concreteness effect in a patient with semantic dementia. Cognitive Neuropsychology 1994;11:617–660, [12]: Graham KS, Becker JT, Patterson K, Hodges JR. Lost for words: a case of primary progressive aphasia? In: Parkin A, editor. Case studies in the neuropsychology of memory, East Sussex: Lawrence Erlbaum, 1997. pp. 83–110, [21]: Lambon Ralph MA, Howard D. Gogi aphasia or semantic dementia? Simulating and assessing poor verbal comprehension in a case of progressive fluent aphasia. Cognitive Neuropsychology, in press).
Article
Full-text available
In this article we report on listener categorization of meaningful environmental sounds. A starting point for this study was the phenomenological taxonomy proposed by Gaver (1993b). In the first experimental study, 15 participants classified 60 environmental sounds and indicated the properties shared by the sounds in each class. In a second experimental study, 30 participants classified and described 56 sounds exclusively made by solid objects. The participants were required to concentrate on the actions causing the sounds independent of the sound source. The classifications were analyzed with a specific hierarchical cluster technique that accounted for possible cross-classifications, and the verbalizations were submitted to statistical lexical analyses. The results of the first study highlighted 4 main categories of sounds: solids, liquids, gases, and machines. The results of the second study indicated a distinction between discrete interactions (e.g., impacts) and continuous interactions (e.g., tearing) and suggested that actions and objects were not independent organizational principles. We propose a general structure of environmental sound categorization based on the sounds' temporal patterning, which has practical implications for the automatic classification of environmental sounds.
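The analysis idea of deriving categories from free classifications can be sketched as follows: participants' groupings are converted into a co-occurrence-based dissimilarity matrix, which is then submitted to hierarchical clustering. The data format and clustering options below are assumptions for illustration, not the cross-classification technique used in the study.
```python
# Illustrative sketch (assumed data format, not the study's analysis): build a
# dissimilarity matrix from free-sorting data and cluster it hierarchically.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cooccurrence_dissimilarity(sorts, n_sounds):
    """sorts maps each participant to a list of groups of sound indices."""
    co = np.zeros((n_sounds, n_sounds))
    for groups in sorts.values():
        for g in groups:
            for i in g:
                for j in g:
                    co[i, j] += 1
    co /= len(sorts)            # proportion of participants grouping i with j
    return 1.0 - co

sorts = {"p1": [[0, 1], [2, 3, 4]],
         "p2": [[0, 1, 2], [3, 4]]}
D = cooccurrence_dissimilarity(sorts, n_sounds=5)
Z = linkage(squareform(D, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # two-cluster solution
```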
Article
Full-text available
The unique way in which each of us perceives the world must arise from our brain representations. If brain imaging could reveal an individual's unique mental representation, it could help us understand the biological substrate of our individual experiential worlds in mental health and disease. However, imaging studies of object vision have focused on commonalities between individuals rather than individual differences and on category averages rather than representations of particular objects. Here we investigate the individually unique component of brain representations of particular objects with functional MRI (fMRI). Subjects were presented with unfamiliar and personally meaningful object images while we measured their brain activity on two separate days. We characterized the representational geometry by the dissimilarity matrix of activity patterns elicited by particular object images. The representational geometry remained stable across scanning days and was unique in each individual in early visual cortex and human inferior temporal cortex (hIT). The hIT representation predicted perceived similarity as reflected in dissimilarity judgments. Importantly, hIT predicted the individually unique component of the judgments when the objects were personally meaningful. Our results suggest that hIT brain representational idiosyncrasies accessible to fMRI are expressed in an individual's perceptual judgments. The unique way each of us perceives the world thus might reflect the individually unique representation in high-level visual areas.
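The representational-geometry logic can be illustrated with a minimal representational similarity analysis sketch, assuming activity-pattern matrices of shape (objects × voxels) from two scanning days; the simulated data and correlation-distance choice are for illustration only.
```python
# Minimal RSA-style sketch (simulated data; not the study's pipeline): characterise a
# representational geometry by the dissimilarity matrix of activity patterns and test
# its stability across two scanning days.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_objects, n_voxels = 24, 500
day1 = rng.standard_normal((n_objects, n_voxels))          # stand-in activity patterns
day2 = day1 + 0.5 * rng.standard_normal((n_objects, n_voxels))

rdm1 = pdist(day1, metric="correlation")                   # condensed dissimilarity matrices
rdm2 = pdist(day2, metric="correlation")

rho, p = spearmanr(rdm1, rdm2)                             # across-day stability
print(f"RDM stability: rho = {rho:.2f}, p = {p:.3g}")
```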
Article
Full-text available
In this paper we provide normative data along multiple cognitive and affective variable dimensions for a set of 110 sounds, including living and manmade stimuli. Environmental sounds are being increasingly utilized as stimuli in the cognitive, neuropsychological and neuroimaging fields, yet there is no comprehensive set of normative information for these types of stimuli available for use across these experimental domains. Experiment 1 collected data from 162 participants in an on-line questionnaire, which included measures of identification and categorization as well as cognitive and affective variables. A subsequent experiment collected response times to these sounds. Sounds were normalized to the same length (1 second) in order to maximize usage across multiple paradigms and experimental fields. These sounds can be freely downloaded for use, and all response data have also been made available so that researchers can choose one or many of the cognitive and affective dimensions along which they would like to control their stimuli. Our hope is that the availability of such information will assist researchers in the fields of cognitive and clinical psychology and the neuroimaging community in choosing well-controlled environmental sound stimuli, and allow comparison across multiple studies.
Article
Full-text available
Perceptual training with spectrally degraded environmental sounds results in improved environmental sound identification, with benefits shown to extend to untrained speech perception as well. The present study extended those findings to examine longer-term training effects as well as effects of mere repeated exposure to sounds over time. Participants received two pretests (1 week apart) prior to a week-long environmental sound training regimen, which was followed by two posttest sessions, separated by another week without training. Spectrally degraded stimuli, processed with a four-channel vocoder, consisted of a 160-item environmental sound test, word and sentence tests, and a battery of basic auditory abilities and cognitive tests. Results indicated significant improvements in all speech and environmental sound scores between the initial pretest and the last posttest with performance increments following both exposure and training. For environmental sounds (the stimulus class that was trained), the magnitude of positive change that accompanied training was much greater than that due to exposure alone, with improvement for untrained sounds roughly comparable to the speech benefit from exposure. Additional tests of auditory and cognitive abilities showed that speech and environmental sound performance were differentially correlated with tests of spectral and temporal-fine-structure processing, whereas working memory and executive function were correlated with speech, but not environmental sound perception. These findings indicate generalizability of environmental sound training and provide a basis for implementing environmental sound training programs for cochlear implant (CI) patients.
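For readers unfamiliar with vocoding, the sketch below shows a rough four-channel noise vocoder: the signal is band-pass filtered, each band's envelope modulates band-limited noise, and the bands are summed. The filter order, band edges, and envelope extraction are illustrative assumptions rather than the exact processing used in the study.
```python
# Rough noise-vocoder sketch (illustrative parameters; not the study's processing):
# split the signal into four bands, use each band's envelope to modulate band-limited
# noise, and sum the channels.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 500, 1500, 3500, 7000)):
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))             # channel envelope
        carrier = sosfiltfilt(sos, np.random.randn(len(x)))    # band-limited noise
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

fs = 16000
x = np.random.randn(fs)          # stand-in for a recorded environmental sound
print(noise_vocode(x, fs).shape)
```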
Article
Full-text available
The human brain is thought to process auditory objects along a hierarchical temporal “what” stream that progressively abstracts object information from the low-level structure (e.g., loudness) as processing proceeds along the middle-to-anterior direction. Empirical demonstrations of abstract object encoding, independent of low-level structure, have relied on speech stimuli, and non-speech studies of object-category encoding (e.g., human vocalizations) often lack a systematic assessment of low-level information (e.g., vocalizations are highly harmonic). It is currently unknown whether abstract encoding constitutes a general functional principle that operates for auditory objects other than speech. We combined multivariate analyses of functional imaging data with an accurate analysis of the low-level acoustical information to examine the abstract encoding of non-speech categories. We observed abstract encoding of the living and human-action sound categories in the fine-grained spatial distribution of activity in the middle-to-posterior temporal cortex (e.g., planum temporale). Abstract encoding of auditory objects appears to extend to non-speech biological sounds and to operate in regions other than the anterior temporal lobe. Neural processes for the abstract encoding of auditory objects might have facilitated the emergence of speech categories in our ancestors.
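The multivariate approach to detecting category information in fine-grained spatial activity patterns can be sketched with a simple cross-validated classifier on simulated voxel data; the simulation and classifier choice below are illustrative and do not reproduce the study's analysis.
```python
# Schematic multivariate decoding sketch (simulated data; not the study's analysis):
# classify sound category from voxel activity patterns with cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 200
labels = np.repeat([0, 1], n_trials // 2)          # e.g., living vs. human-action sounds
patterns = rng.standard_normal((n_trials, n_voxels))
patterns[labels == 1, :20] += 0.3                  # weak distributed category signal

acc = cross_val_score(LogisticRegression(max_iter=1000), patterns, labels, cv=5)
print(f"decoding accuracy: {acc.mean():.2f}")
```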
Article
Full-text available
Four experiments investigated the acoustical correlates of similarity and categorization judgments of environmental sounds. In Experiment 1, similarity ratings were obtained from pairwise comparisons of recordings of 50 environmental sounds. A three-dimensional multidimensional scaling (MDS) solution showed three distinct clusterings of the sounds, which included harmonic sounds, discrete impact sounds, and continuous sounds. Furthermore, sounds from similar sources tended to be in close proximity to each other in the MDS space. The orderings of the sounds on the individual dimensions of the solution were well predicted by linear combinations of acoustic variables, such as harmonicity, amount of silence, and modulation depth. The orderings of sounds also correlated significantly with MDS solutions for similarity ratings of imagined sounds and for imagined sources of sounds, obtained in Experiments 2 and 3—as was the case for free categorization of the 50 sounds (Experiment 4)—although the categorization data were less well predicted by acoustic features than were the similarity data.
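A three-dimensional MDS solution of the kind described can be obtained from a dissimilarity matrix as in the sketch below (random stand-in ratings; the study's own scaling procedure and acoustic regressions are not reproduced here). Acoustic variables such as harmonicity could then be regressed onto the resulting dimensions.
```python
# Sketch of a three-dimensional MDS solution from pairwise dissimilarities
# (random stand-in ratings; not the study's data or scaling procedure).
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_sounds = 50
D = rng.uniform(0, 1, size=(n_sounds, n_sounds))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)                           # a valid dissimilarity matrix

mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                      # one 3-D point per sound
print(coords.shape)
```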
Article
Full-text available
Numerous species possess cortical regions that are most sensitive to vocalizations produced by their own kind (conspecifics). In humans, the superior temporal sulci (STSs) putatively represent homologous voice-sensitive areas of cortex. However, superior temporal sulcus (STS) regions have recently been reported to represent auditory experience or "expertise" in general rather than showing exclusive sensitivity to human vocalizations per se. Using functional magnetic resonance imaging and a unique non-stereotypical category of complex human non-verbal vocalizations (human-mimicked versions of animal vocalizations), we found a cortical hierarchy in humans optimized for processing meaningful conspecific utterances. This left-lateralized hierarchy originated near primary auditory cortices and progressed into traditional speech-sensitive areas. Our results suggest that the cortical regions supporting vocalization perception are initially organized by sensitivity to the human vocal tract in stages before the STS. Additionally, these findings have implications for the developmental time course of conspecific vocalization processing in humans as well as its evolutionary origins.
Article
Full-text available
Whether viewed or heard, an object in action can be segmented from a background scene based on a number of different sensory cues. In the visual system, salient low-level attributes of an image are processed along parallel hierarchies, and involve intermediate stages, such as the lateral occipital cortices, wherein gross-level object form features are extracted prior to stages that show object specificity (e.g. for faces, buildings, or tools). In the auditory system, though relying on a rather different set of low-level signal attributes, a distinct acoustic event or “auditory object” can also be readily extracted from a background acoustic scene. However, it remains unclear whether cortical processing strategies used by the auditory system similarly extract gross-level aspects of “acoustic object form” that may be inherent to many real-world sounds. Examining mechanical and environmental action sounds, representing two distinct categories of non-biological and non-vocalization sounds, we had participants assess the degree to which each sound was perceived as a distinct object versus an acoustic scene. Using two functional magnetic resonance imaging (fMRI) task paradigms, we revealed bilateral foci along the superior temporal gyri (STG) showing sensitivity to the “object-ness” ratings of action sounds, independent of the category of sound and independent of task demands. Moreover, for both categories of sounds these regions also showed parametric sensitivity to spectral structure variations—a measure of change in entropy in the acoustic signals over time (acoustic form)—while only the environmental sounds showed parametric sensitivity to mean entropy measures. Thus, similar to the visual system, the auditory system appears to include intermediate feature extraction stages that are sensitive to the acoustic form of action sounds, and may serve as a stage that begins to dissociate different categories of real-world auditory objects.
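One simple way to operationalise "change in entropy of the acoustic signal over time" is to compute spectral entropy per time frame and summarise its mean and variability, as in the rough sketch below; this is a generic proxy, not the exact measure used in the study.
```python
# Rough proxy for "spectral structure variation" (illustrative only; not the study's
# exact metric): spectral entropy per STFT frame, summarised by its mean and spread.
import numpy as np
from scipy.signal import stft

def spectral_entropy_profile(x, fs, nperseg=512):
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    power = np.abs(Z) ** 2 + 1e-12
    p = power / power.sum(axis=0, keepdims=True)   # per-frame spectral distribution
    H = -(p * np.log2(p)).sum(axis=0)              # entropy per time frame (bits)
    return H.mean(), H.std()                       # mean entropy and its variation

fs = 16000
x = np.random.randn(fs)                            # stand-in action sound
print(spectral_entropy_profile(x, fs))
```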
Article
Full-text available
Picture naming is a widely used technique in psycholinguistic studies. Here, we describe new on-line resources that our project has compiled and made available to researchers on the world wide web at http://crl.ucsd.edu/~aszekely/ipnp/. The website provides access to a wide range of picture stimuli and related norms in seven languages. Picture naming norms, including indices of name agreement and latency, for 520 black-and-white drawings of common objects and 275 concrete transitive and intransitive actions are presented. Norms for age-of-acquisition, word-frequency, familiarity, goodness-of-depiction, and visual complexity are included. An on-line database query system can be used to select a specific range of stimuli, based on parameters of interest for a wide range of studies on healthy and clinical populations, as well as studies of language development.
Conference Paper
Full-text available
Auditory icons add valuable functionality to computer interfaces, particularly when they are parameterized to convey dimensional information. They are difficult to create and manipulate, however, because they usually rely on digital sampling techniques. This paper suggests that new synthesis algorithms, controlled along dimensions of events rather than those of the sounds themselves, may solve this problem. Several algorithms, developed from research on auditory event perception, are described in enough detail here to permit their implementation. They produce a variety of impact, bouncing, breaking, scraping, and machine sounds. By controlling them with attributes of relevant computer events, a wide range of parameterized auditory icons may be created.
Article
Full-text available
Three experiments tested the hypothesis that temporal patterning alone, without differences in quasi-stable spectral properties, provides effective information for adult listeners to categorize breaking and bouncing styles of change. Exp I, conducted with 15 college students, investigated whether natural sound would provide sufficient acoustic information for Ss to categorize the events of breaking and bouncing glass. Results showed that Ss could reliably categorize bouncing and breaking. Exp II, conducted with 15 college students, adjusted the temporal components of the sounds but kept their spectral properties constant. Results were similar to those in Exp I, and the performance of Ss was only about 10% lower. Exp III investigated whether initial noise, in addition to the pulse patterns, was necessary for 30 college students to categorize breaking and bouncing. Results indicate that removal of the initial noise did not reduce perceptual performance. Overall, the findings of these experiments suggest that higher order temporal properties of the acoustic signal provide information for the auditory perception of bouncing and breaking.
Article
Full-text available
Everyday listening is the experience of hearing events in the world rather than sounds per se. In this article, I explore the acoustic basis of everyday listening as a start toward understanding how sounds near the ear can indicate remote physical events. Information for sound-producing events and their dimensions is investigated using physical analyses of events to inform acoustic analyses of the sounds they produce. The result is a variety of algorithms which enable the synthesis of sounds made by basic-level events such as impacts, scraping, and dripping, as well as by more complex events such as bouncing, breaking, spilling, and machinery. These algorithms may serve as instantiated hypotheses about the acoustic information for events. Analysis and synthesis work together in their development: Just as analyses of the physics and acoustics of sound-producing events may inform synthesis, so listening to the results of synthesis may inform analysis. This raises several issues concerning evaluation, specification, and the tension between formal and informal physical analyses. In the end, however, the fundamental test of these algorithms is in the sounds they produce: I describe them in enough detail here that readers may implement, test, and extend them.
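In the spirit of the synthesis algorithms described above, the toy sketch below models an impact as a sum of exponentially decaying modes and bouncing as a train of impacts whose spacing and energy shrink with each bounce; all parameter values are invented for illustration and are not taken from the article.
```python
# Toy event-based synthesis sketch (invented parameters, in the spirit of the
# algorithms described above): an impact as decaying modes, bouncing as a train of
# impacts whose spacing and energy shrink on each bounce.
import numpy as np

FS = 22050

def impact(freqs=(180, 410, 900), decays=(8, 12, 20), dur=0.4, amp=1.0):
    t = np.arange(int(dur * FS)) / FS
    return amp * sum(np.exp(-d * t) * np.sin(2 * np.pi * f * t)
                     for f, d in zip(freqs, decays))

def bounce(n_bounces=6, first_gap=0.35, restitution=0.7):
    gaps = first_gap * restitution ** np.arange(n_bounces)
    out = np.zeros(int((gaps.sum() + 0.5) * FS))
    onset = 0.0
    for i, gap in enumerate(gaps):
        hit = impact(amp=restitution ** i)
        start = int(onset * FS)
        end = min(start + len(hit), len(out))
        out[start:end] += hit[:end - start]
        onset += gap
    return out / np.max(np.abs(out))

y = bounce()
print(f"{len(y) / FS:.2f} s of synthetic bouncing")
```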
Article
Full-text available
The effects of context on the identification of everyday sounds were examined in four experiments. In all experiments, the test sounds were selected as nearly homonymous pairs. That is, the members of a pair sounded similar but were aurally discriminable. These test sounds were presented in isolation to get baseline identification performance and within sequences of other everyday sounds to assess contextual effects. These sequences were: (a) semantically consistent with the correct response, (b) semantically biased toward the other member of the pair, or (c) composed of randomly arranged sounds. Two paradigms, binary choice and free identification, were used. Results indicate that context had significant negative effects and only minor positive effects. Performance was consistently poorest in biased context and best in both isolated and consistent context. A signal detection analysis indicated that performance in identifying an out-of-context sound remains constant for the two paradigms, and that response bias is conservative, especially with a free-response paradigm. Labels added to enhance context generally had little effect beyond the effects of sounds alone.
Article
Full-text available
Comparisons are made between the perception of environmental sound and the perception of speech. With both, two types of processing are involved, bottom-up and top-down, and with both, the detailed form of the processing is, in several respects, similar. Recognition of isolated speech and environmental sounds produces similar patterns of semantic interpretations. Environmental sound "homonyms" are ambiguous in much the same manner as speech homonyms. Environmental sounds become integrated on the basis of cognitive processes similar to those used to perceive speech. The general conclusion is that environmental sound is usefully thought of as a form of language.
Article
Full-text available
The attentional effects triggered by emotional stimuli in humans have been substantially investigated, but little is known about the impact of affective valence on the processing of meaning. Here, we used a cross-modal priming paradigm involving visually presented adjective-noun dyads and environmental sounds of controlled affective valence to test the contributions of conceptual relatedness and emotional congruence to priming. Participants undergoing event-related potential recording indicated whether target environmental sounds were related in meaning to adjective-noun dyads presented as primes. We tested spontaneous emotional priming by manipulating the congruence between the affective valence of the adjective in the prime and that of the sound. While the N400 was significantly reduced in amplitude by both conceptual relatedness and emotional congruence, there was no interaction between the 2 factors. The same pattern of results was found when participants judged the emotional congruence between environmental sounds and adjective-noun dyads. These results support the hypothesis that conceptual and emotional processes are functionally independent regardless of the specific cognitive focus of the comprehender.
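The logic of an N400 amplitude comparison can be illustrated with simulated single-channel epochs: mean amplitude in a 300-500 ms window is compared between related and unrelated targets. The simulation below is schematic and not the study's ERP pipeline.
```python
# Schematic ERP sketch (simulated single-channel epochs; not the study's pipeline):
# compare mean amplitude in the N400 window (300-500 ms) across conditions.
import numpy as np

FS = 250                                           # sampling rate (Hz)
t = np.arange(-0.2, 0.8, 1 / FS)                   # epoch from -200 to 800 ms
rng = np.random.default_rng(6)

def simulate_trials(n400_gain, n_trials=40):
    n400 = -n400_gain * np.exp(-0.5 * ((t - 0.4) / 0.05) ** 2)   # negative deflection
    return n400 + rng.standard_normal((n_trials, len(t)))

def mean_amplitude(trials, lo=0.3, hi=0.5):
    win = (t >= lo) & (t <= hi)
    return trials[:, win].mean()

related = simulate_trials(n400_gain=1.0)
unrelated = simulate_trials(n400_gain=3.0)
print(f"N400 effect ≈ {mean_amplitude(related) - mean_amplitude(unrelated):.2f} (a.u.)")
```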
Article
Full-text available
Semantic knowledge is supported by a widely distributed neuronal network, with differential patterns of activation depending upon experimental stimulus or task demands. Despite a wide body of knowledge on semantic object processing from the visual modality, the response of this semantic network to environmental sounds remains relatively unknown. Here, we used fMRI to investigate how access to different conceptual attributes from environmental sound input modulates this semantic network. Using a range of living and manmade sounds, we scanned participants whilst they carried out an object attribute verification task. Specifically, we tested visual perceptual, encyclopedic, and categorical attributes about living and manmade objects relative to a high-level auditory perceptual baseline to investigate the differential patterns of response to these contrasting types of object-related attributes, whilst keeping stimulus input constant across conditions. Within the bilateral distributed network engaged for processing environmental sounds across all conditions, we report here a highly significant dissociation within the left hemisphere between the processing of visual perceptual and encyclopedic attributes of objects.
Article
Full-text available
The effect of context on the identification of common environmental sounds (e.g., dogs barking or cars honking) was tested by embedding them in familiar auditory background scenes (street ambience, restaurants). Initial results with subjects trained on both the scenes and the sounds to be identified showed a significant advantage of about five percentage points better accuracy for sounds that were contextually incongruous with the background scene (e.g., a rooster crowing in a hospital). Further studies with naive (untrained) listeners showed that this incongruency advantage (IA) is level-dependent: there is no advantage for incongruent sounds lower than a Sound/Scene ratio (So/Sc) of -7.5 dB, but there is about five percentage points better accuracy for sounds with greater So/Sc. Testing a new group of trained listeners on a larger corpus of sounds and scenes showed that the effect is robust and not confined to a specific stimulus set. Modeling using spectral-temporal measures showed that neither analyses based on acoustic features, nor semantic assessments of sound-scene congruency can account for this difference, indicating the IA is a complex effect, possibly arising from the sensitivity of the auditory system to new and unexpected events, under particular listening conditions.
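Embedding a target sound in a background scene at a specified sound/scene ratio can be sketched by scaling the target's RMS relative to the scene before adding it, as below; the centring of the target and the RMS definition are illustrative assumptions, not the study's stimulus-generation code.
```python
# Simple sketch of embedding a target sound in a background scene at a given
# sound/scene (So/Sc) ratio in dB (illustrative; not the study's stimulus code).
import numpy as np

def mix_at_ratio(target, scene, so_sc_db):
    """Scale `target` so its RMS sits `so_sc_db` dB relative to `scene`, then add."""
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    gain = 10 ** (so_sc_db / 20) * rms(scene) / (rms(target) + 1e-12)
    mixed = scene.copy()
    start = (len(scene) - len(target)) // 2        # centre the target in the scene
    mixed[start:start + len(target)] += gain * target
    return mixed

fs = 16000
scene = 0.1 * np.random.randn(fs * 4)              # 4-s background scene stand-in
target = 0.5 * np.random.randn(fs)                 # 1-s target sound stand-in
print(mix_at_ratio(target, scene, so_sc_db=-7.5).shape)
```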
Article
Full-text available
It is widely accepted that hearing loss increases markedly with age, beginning in the fourth decade (ISO 7029, 2000). Age-related hearing loss is typified by high-frequency threshold elevation and associated reductions in speech perception because speech sounds, especially consonants, become inaudible. Nevertheless, older adults often report additional and progressive difficulties in the perception and comprehension of speech, particularly in adverse listening conditions, that exceed those reported by younger adults with a similar degree of high-frequency hearing loss (Dubno, Dirks, & Morgan), leading to communication difficulties and social isolation (Weinstein & Ventry). Some of the age-related decline in speech perception can be accounted for by peripheral sensory problems, but cognitive aging can also be a contributing factor. In this article, we review findings from the psycholinguistic literature, predominantly over the last four years, and present a pilot study illustrating how normal age-related changes in cognition and the linguistic context can influence speech-processing difficulties in older adults. For significant progress in understanding and improving the auditory performance of aging listeners to be made, we discuss how future research will have to be much more specific not only about which interactions between auditory and cognitive abilities are critical but also about how they are modulated in the brain.
Article
Full-text available
It is still unknown whether sonic environments influence the processing of individual sounds in a similar way as discourse or sentence context influences the processing of individual words. One obstacle to answering this question has been the failure to dissociate perceptual (i.e., how similar are sonic environment and target sound?) and conceptual (i.e., how related are sonic environment and target?) priming effects. In this study, we dissociate these effects by creating prime-target pairs with a purely perceptual or both a perceptual and conceptual relationship. Perceptual prime-target pairs were derived from perceptual-conceptual pairs (i.e., meaningful environmental sounds) by shuffling the spectral composition of primes and targets so as to preserve their perceptual relationship while making them unrecognizable. Hearing both original and shuffled targets elicited a more positive N1/P2 complex in the ERP when targets were related to a preceding prime as compared with unrelated. Only related original targets reduced the N400 amplitude. Related shuffled targets tended to decrease the amplitude of a late temporo-parietal positivity. Taken together, these effects indicate that sonic environments influence first the perceptual and then the conceptual processing of individual sounds. Moreover, the influence on conceptual processing is comparable to the influence linguistic context has on the processing of individual words.
Article
Full-text available
Cocktail parties and other natural auditory environments present organisms with mixtures of sounds. Segregating individual sound sources is thought to require prior knowledge of source properties, yet these presumably cannot be learned unless the sources are segregated first. Here we show that the auditory system can bootstrap its way around this problem by identifying sound sources as repeating patterns embedded in the acoustic input. Due to the presence of competing sounds, source repetition is not explicit in the input to the ear, but it produces temporal regularities that listeners detect and use for segregation. We used a simple generative model to synthesize novel sounds with naturalistic properties. We found that such sounds could be segregated and identified if they occurred more than once across different mixtures, even when the same sounds were impossible to segregate in single mixtures. Sensitivity to the repetition of sound sources can permit their recovery in the absence of other segregation cues or prior knowledge of sounds, and could help solve the cocktail party problem.
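A toy numerical demonstration of the core idea, that a source repeating across different mixtures leaves a detectable regularity, is given below: two mixtures sharing a target are correlated, whereas mixtures without a shared source are not. This is only an intuition pump, not the paper's generative model or listening task.
```python
# Toy numerical demonstration (not the paper's model): a source repeating across
# different mixtures induces a regularity absent when no sound repeats.
import numpy as np

rng = np.random.default_rng(3)
n = 8000
target = rng.standard_normal(n)                          # the repeating source
mix_a = target + rng.standard_normal(n)                  # target + distractor 1
mix_b = target + rng.standard_normal(n)                  # target + distractor 2
mix_c = rng.standard_normal(n) + rng.standard_normal(n)  # no shared source

shared = np.corrcoef(mix_a, mix_b)[0, 1]
unshared = np.corrcoef(mix_a, mix_c)[0, 1]
print(f"shared source: r = {shared:.2f}; no shared source: r = {unshared:.2f}")
```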
Article
Full-text available
How the brain processes complex sounds, like voices or musical instrument sounds, is currently not well understood. The features comprising the acoustic profiles of such sounds are thought to be represented by neurons responding to increasing degrees of complexity throughout auditory cortex, with complete auditory "objects" encoded by neurons (or small networks of neurons) in anterior superior temporal regions. Although specialized voice and speech-sound regions have been proposed, it is unclear how other types of complex natural sounds are processed within this object-processing pathway. Using functional magnetic resonance imaging, we sought to demonstrate spatially distinct patterns of category-selective activity in human auditory cortex, independent of semantic content and low-level acoustic features. Category-selective responses were identified in anterior superior temporal regions, consisting of clusters selective for musical instrument sounds and for human speech. An additional subregion was identified that was particularly selective for the acoustic-phonetic content of speech. In contrast, regions along the superior temporal plane closer to primary auditory cortex were not selective for stimulus category, responding instead to specific acoustic features embedded in natural sounds, such as spectral structure and temporal modulation. Our results support a hierarchical organization of the anteroventral auditory-processing stream, with the most anterior regions representing the complete acoustic signature of auditory objects.
Article
Full-text available
It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example, the perception of auditory textures, like running water, wind, fire and rain, depends on summary statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source. In order to analyse the statistics of sounds accurately it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and therefore it is natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages. For example, they perform more accurately on natural signals and are more robust to noise, they can also fill in missing sections of data, and they provide error bars. Furthermore, free parameters can be learned from the signal. Using these new algorithms we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic sounding auditory textures. Finally, we explored the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes. This suggests that the auditory system is optimised for the statistics of natural sounds.
Article
Full-text available
The influence of listener's expertise and sound identification on the categorization of environmental sounds is reported in three studies. In Study 1, the causal uncertainty of 96 sounds was measured by counting the different causes described by 29 participants. In Study 2, 15 experts and 15 nonexperts classified a selection of 60 sounds and indicated the similarities they used. In Study 3, 38 participants indicated their confidence in identifying the sounds. Participants reported using either acoustical similarities or similarities of the causes of the sounds. Experts used acoustical similarity more often than nonexperts, who used the similarity of the cause of the sounds. Sounds with a low causal uncertainty were more often grouped together because of the similarities of the cause, whereas sounds with a high causal uncertainty were grouped together more often because of the acoustical similarities. The same conclusions were reached for identification confidence. This measure allowed the sound classification to be predicted, and is a straightforward method to determine the appropriate description of a sound.
Article
Full-text available
Little is known about the processing of non-verbal sounds in the primary progressive aphasias. Here, we investigated the processing of complex non-verbal sounds in detail, in a consecutive series of 20 patients with primary progressive aphasia [12 with progressive non-fluent aphasia; eight with semantic dementia]. We designed a novel experimental neuropsychological battery to probe complex sound processing at early perceptual, apperceptive and semantic levels, using within-modality response procedures that minimized other cognitive demands and matching tests in the visual modality. Patients with primary progressive aphasia had deficits of non-verbal sound analysis compared with healthy age-matched individuals. Deficits of auditory early perceptual analysis were more common in progressive non-fluent aphasia, deficits of apperceptive processing occurred in both progressive non-fluent aphasia and semantic dementia, and deficits of semantic processing also occurred in both syndromes, but were relatively modality specific in progressive non-fluent aphasia and part of a more severe generic semantic deficit in semantic dementia. Patients with progressive non-fluent aphasia were more likely to show severe auditory than visual deficits as compared to patients with semantic dementia. These findings argue for the existence of core disorders of complex non-verbal sound perception and recognition in primary progressive aphasia and specific disorders at perceptual and semantic levels of cortical auditory processing in progressive non-fluent aphasia and semantic dementia, respectively.
Article
A 33-year-old woman had developed cortical deafness, with profound initial deafness lasting for eleven months, after pneumococcal meningitis ten years previously. CT scan demonstrated bilateral temporal lobe lesions, predominantly on the left side where the lesion extended into the adjacent parietal and frontal lobes. Audiometry suggested integrity of the internal ear and brain stem. Early auditory evoked potentials were present, while potentials of moderate latency and delayed potentials were abolished. Neuropsychological investigations demonstrated total absence of spoken language, contrasting with preservation of written language, though with agrammatism, and an inability to identify non-verbal noises, spoken language, and music. The patient could not identify rhythms, pitch, melodies or the different types of music. The musical quality of sound stimuli and musical pleasure were, however, spared, as shown by recognition of tape-recorded sound stimuli with written denomination and designation of images in multiple choice tests. The relations between auditory agnosia, 'pure' verbal deafness and cortical deafness are discussed. Reported cases are reviewed and an attempt is made to demonstrate the existence of several levels in the integration of musical stimuli, the most elementary of which could be the perception of the musical quality of sounds, as was the case in the present patient.
Article
Presents a standardized set of 260 pictures for use in experiments investigating differences and similarities in the processing of pictures and words. The pictures are black-and-white line drawings executed according to a set of rules that provide consistency of pictorial representation. They have been standardized on 4 variables of central relevance to memory and cognitive processing: name agreement, image agreement, familiarity, and visual complexity. The intercorrelations among the 4 measures were low, suggesting that they are indices of different attributes of the pictures. The concepts were selected to provide exemplars from several widely studied semantic categories. Sources of naming variance, and mean familiarity and complexity of the exemplars, differed significantly across the set of categories investigated. The potential significance of each of the normative variables to a number of semantic and episodic memory tasks is discussed. (34 ref) (PsycINFO Database Record (c) 2006 APA, all rights reserved).
Article
Twenty-five left-handed and 25 right-handed subjects performed three dichotic listening tasks, two verbal and one non-verbal. Comparisons were made between mean scores obtained at the right and left ears, as well as between the handedness groups. The following results were obtained:
Article
We examine the mechanisms by which the human auditory cortex processes the frequency content of natural sounds. Through mathematical modeling of ultra-high field (7 T) functional magnetic resonance imaging responses to natural sounds, we derive frequency-tuning curves of cortical neuronal populations. With a data-driven analysis, we divide the auditory cortex into five spatially distributed clusters, each characterized by a spectral tuning profile. Beyond neuronal populations with simple single-peaked spectral tuning (grouped into two clusters), we observe that ∼60% of auditory populations are sensitive to multiple frequency bands. Specifically, we observe sensitivity to multiple frequency bands (1) at exactly one octave distance from each other, (2) at multiple harmonically related frequency intervals, and (3) with no apparent relationship to each other. We propose that beyond the well known cortical tonotopic organization, multipeaked spectral tuning amplifies selected combinations of frequency bands. Such selective amplification might serve to detect behaviorally relevant and complex sound features, aid in segregating auditory scenes, and explain prominent perceptual phenomena such as octave invariance.
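The idea of deriving a frequency-tuning curve for a neuronal population can be sketched by fitting a Gaussian in log-frequency to a voxel's responses, yielding a best frequency and a tuning width; the simulated responses and single-Gaussian model below are simplifications of the encoding models used with natural sounds.
```python
# Schematic sketch (simulated responses; far simpler than the encoding models used
# with natural sounds): fit a Gaussian in log-frequency to a voxel's responses to
# obtain a best frequency and a tuning width.
import numpy as np
from scipy.optimize import curve_fit

freqs = np.array([0.25, 0.5, 1, 2, 4, 8])          # stimulus frequencies (kHz)
logf = np.log2(freqs)

def tuning(logf, amp, best, width):
    return amp * np.exp(-0.5 * ((logf - best) / width) ** 2)

rng = np.random.default_rng(4)
responses = tuning(logf, 1.0, np.log2(1.5), 0.8) + 0.05 * rng.standard_normal(len(freqs))

(amp, best, width), _ = curve_fit(tuning, logf, responses, p0=(1.0, 0.0, 1.0))
print(f"best frequency ≈ {2 ** best:.2f} kHz, tuning width ≈ {abs(width):.2f} octaves")
```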
Article
This study investigated the development of children's skills in identifying ecologically relevant sound objects within naturalistic listening environments, using a non-linguistic analogue of the classic 'cocktail-party' situation. Children aged 7 to 12.5 years completed a closed-set identification task in which brief, commonly encountered environmental sounds were presented at varying signal-to-noise ratios. To simulate the complexity of real-world acoustic environments, target sounds were embedded in either a single, stereophonically presented scene, or in one of two different scenes, with each scene presented to a single ear. Each target sound was either congruent or incongruent with the auditory context. Identification accuracy improved with increasing age, particularly in trials with low signal-to-noise ratios. Performance was most accurate when target sounds were incongruent with the background scene, and when sounds were presented in a single background scene. The presence of two backgrounds disproportionately disrupted children's performance relative to that of previously tested adults, and reduced children's sensitivity to contextual cues. Successful identification of familiar sounds in complex auditory contexts is the outcome of a protracted learning process, with children reaching adult levels of performance after a decade or more of experience.
Article
Humans can see and name thousands of distinct object and action categories, so it is unlikely that each category is represented in a distinct brain area. A more efficient scheme would be to represent categories as locations in a continuous semantic space mapped smoothly across the cortical surface. To search for such a space, we used fMRI to measure human brain activity evoked by natural movies. We then used voxelwise models to examine the cortical representation of 1,705 object and action categories. The first few dimensions of the underlying semantic space were recovered from the fit models by principal components analysis. Projection of the recovered semantic space onto cortical flat maps shows that semantic selectivity is organized into smooth gradients that cover much of visual and nonvisual cortex. Furthermore, both the recovered semantic space and the cortical organization of the space are shared across different individuals.
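The voxelwise-modelling-plus-PCA strategy can be caricatured in a few lines: category indicator regressors are fit to each voxel's time course, and the resulting weight matrix is reduced with principal components analysis to recover shared dimensions. The simulated data and ridge regression below are illustrative assumptions, not the authors' pipeline.
```python
# Highly simplified sketch of the voxelwise-modelling idea (simulated data; not the
# authors' pipeline): regress category indicators onto voxel responses, then summarise
# the weight matrix with PCA to recover shared dimensions of the category space.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_time, n_categories, n_voxels = 600, 40, 300
X = rng.binomial(1, 0.05, size=(n_time, n_categories)).astype(float)  # category indicators
true_W = rng.standard_normal((n_categories, n_voxels))
Y = X @ true_W + rng.standard_normal((n_time, n_voxels))              # voxel time courses

W = Ridge(alpha=10.0).fit(X, Y).coef_.T            # (n_categories, n_voxels) weights
pca = PCA(n_components=4).fit(W.T)                 # voxels as samples in category space
print(pca.explained_variance_ratio_.round(2))
```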
Article
"96 RIGHT-HANDED PATIENTS WITH UNILATERAL HEMISPHERIC DAMAGED (51 APHASICS, 16 NON-APHASIC LEFT BRAIN-DAMAGED AND 29 RIGHT BRAIN-DAMAGED PATIENTS) AND 35 CONTROL PATIENTS WITHOUT CEREBRAL LESIONS WERE GIVEN A SOUND RECOGNITION TEST REQUIRING THE IDENTIFICATION OF 10 MEANINGFUL SOUNDS OR NOISES." THE FINDINGS INDICATE THAT "THE IMPAIRED RECOGNITION OF MEANINGFUL SOUNDS, CHARACTERISTIC OF APHASICS, IS DUE TO A GREAT EXTENT TO THE INABILITY TO ASSOCIATE THE PERCEIVED SOUND TO ITS CORRECT MEANING, RATHER THAN TO A MERELY ACOUSTIC-DISCRIMINATIVE DEFECT. THE HYPOTHESIS IS ADVANCED THAT THIS SEMANTIC-ASSOCIATIVE DISORDER IS THE COMMON FACTOR UNDERLYING BOTH THE SOUND RECOGNITION AND THE AUDITORY LANGUAGE COMPREHENSION DEFECTS, AND THAT IT MAY MANIFEST ITSELF ALSO IN OTHER AREAS OF THE APHASIC'S BEHAVIOR." (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Auditory cortical processing of complex meaningful sounds entails the transformation of sensory (tonotopic) representations of incoming acoustic waveforms into higher-level sound representations (e.g., their category). However, the precise neural mechanisms enabling such transformations remain largely unknown. In the present study, we use functional magnetic resonance imaging (fMRI) and natural sounds stimulation to examine these two levels of sound representation (and their relation) in the human auditory cortex. In a first experiment, we derive cortical maps of frequency preference (tonotopy) and selectivity (tuning width) by mathematical modeling of fMRI responses to natural sounds. The tuning width maps highlight a region of narrow tuning that follows the main axis of Heschl's gyrus and is flanked by regions of broader tuning. The narrowly tuned portion on Heschl's gyrus contains two mirror-symmetric frequency gradients, presumably defining two distinct primary auditory areas. In addition, our analysis indicates that spectral preference and selectivity (and their topographical organization) extend well beyond the primary regions and also cover higher-order and category-selective auditory regions. In particular, regions with preferential responses to human voice and speech occupy the low-frequency portions of the tonotopic map. We confirm this observation in a second experiment, where we find that speech/voice selective regions exhibit a response bias toward the low frequencies characteristic of human voice and speech, even when responding to simple tones. We propose that this frequency bias reflects the selective amplification of relevant and category-characteristic spectral bands, a useful processing step for transforming a sensory (tonotopic) sound image into higher level neural representations.
Article
Regional cerebral blood flow (rCBF) was measured using positron emission tomography with oxygen-15 labeled water as 10 normal subjects listened to three types of auditory stimuli (environmental sounds, meaningless speech, and words) presented binaurally or dichotically. Binaurally presented environmental sounds and words caused similar bilateral rCBF increases in left and right superior temporal gyri. Dichotically presented stimuli (subjects attended to left or right ears) caused asymmetric activation in the temporal lobes, resulting from increased rCBF in temporal lobe regions contralateral to the attended ear and decreased rCBF in the opposite hemisphere. The results indicate that auditorily presented language and non-language stimuli activate similar temporal regions, that dichotic stimulation dramatically changes rCBF in temporal lobes, and that the change is due both to attentional mechanisms and to hemispheric specialization.
Article
Latent Semantic Analysis (Landauer & Dumais, 1997) and Hyperspace Analogue to Language (Burgess & Lund, 1997) model meaning as the relations among abstract symbols that are arbitrarily related to what they signify. These symbols are ungrounded in that they are not tied to perceptual experience or action. Because the symbols are ungrounded, they cannot, in principle, capture the meaning of novel situations. In contrast, participants in three experiments found it trivially easy to discriminate between descriptions of sensible novel situations (e.g., using a newspaper to protect one's face from the wind) and nonsense novel situations (e.g., using a matchbook to protect one's face from the wind). These results support the Indexical Hypothesis that the meaning of a sentence is constructed by (a) indexing words and phrases to real objects or perceptual, analog symbols; (b) deriving affordances from the objects and symbols; and (c) meshing the affordances under the guidance of syntax.
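For readers unfamiliar with how LSA derives "meaning" purely from co-occurrence statistics (the ungrounded property at issue here), the toy sketch below builds a word-by-document count matrix, reduces it with a truncated SVD, and compares two words by cosine similarity. The corpus and dimensionality are made up for illustration only.

```python
# Toy LSA sketch: meaning as relations among co-occurrence statistics,
# recovered by truncated SVD (illustrative four-"document" corpus).
import numpy as np

docs = ["newspaper protects face wind",
        "matchbook lights fire",
        "wind blows newspaper away",
        "fire burns matchbook"]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]          # low-dimensional word representations

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

i, j = vocab.index("newspaper"), vocab.index("matchbook")
print("LSA similarity(newspaper, matchbook):", round(cosine(word_vecs[i], word_vecs[j]), 3))
```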
Article
Three experiments examined repetition priming for meaningful environmental sounds (e.g., clock ticking, tooth brushing, toilet flushing, etc.) in a sound stem identification paradigm using brief sound cues. Prior encoding of target sounds together with their associated names facilitated subsequent identification of sound stems relative to nonstudied controls. In contrast, prior exposure to names alone in the absence of the environmental sounds did not prime subsequent sound stem identification performance at all (Experiments 1 and 3). Explicit and implicit memory were dissociated such that sound stem cued recall was higher following semantic than nonsemantic encoding, whereas sound priming was insensitive to manipulations of depth encoding (Experiments 2 and 3). These results extend the findings of long-term repetition priming into the auditory nonverbal domain and suggest that priming for environmental sounds is mediated primarily by perceptual processes.
Article
A case is reported of severe agnosia for verbal and non-verbal sounds without associated aphasic disorder. A CT scan revealed bilateral, temporal lobe lesions from two ischaemic accidents that had occurred 9 months apart. The search for subtle deficits in the patient showed normal sensitivity to changes in the intensity and frequency of simple sounds; in contrast, his ability to discriminate sound duration and musical note sequences was severely impaired. The simultaneous recording of the whole auditory-evoked response pattern revealed no abnormality in the early components, which reflect the activation of the auditory nuclei and pathways of the brain stem. However, the middle and late components were delayed and slowed. These results and others in the literature suggest that the neocortex in man, as in other mammals, plays an essential role in the temporal aspects of hearing. Also, the two main ingredients commonly recognized in auditory agnosia, i.e. word deafness and the inability to interpret non-verbal sounds, are caused by the disruption of elementary, bilaterally represented cortical functions which start the processing of every kind of auditory information.
Article
We propose a multisensory framework based on Glaser and Glaser's (1989) general reading-naming interference model to account for the semantic priming effect by naturalistic sounds and spoken words on visual picture sensitivity. Four experiments were designed to investigate two key issues: First, can auditory stimuli enhance visual sensitivity when the sound leads the picture as well as when they are presented simultaneously? And, second, do naturalistic sounds (e.g., a dog's "woofing") and spoken words (e.g., /dɔg/) elicit similar semantic priming effects? Here, we estimated participants' sensitivity and response criterion using signal detection theory in a picture detection task. The results demonstrate that naturalistic sounds enhanced visual sensitivity when the onset of the sounds led that of the picture by 346 ms (but not when the sounds led the pictures by 173 ms, nor when they were presented simultaneously, Experiments 1-3A). At the same SOA, however, spoken words did not induce semantic priming effects on visual detection sensitivity (Experiments 3B and 4A). When using a dual picture detection/identification task, both kinds of auditory stimulus induced a similar semantic priming effect (Experiment 4B). Therefore, we suggest that sufficient processing time is needed for the auditory stimulus to access its associated meaning before it can modulate visual perception. Moreover, the interactions between pictures and the two types of sounds depend not only on their processing route to access semantic representations, but also on the response to be made to fulfill the requirements of the task.
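The signal detection measures mentioned here can be computed directly from hit and false-alarm counts. The sketch below uses hypothetical counts and the standard Gaussian-model formulas for sensitivity (d') and response criterion (c); the numbers are not taken from the study.

```python
# Signal detection theory sketch: sensitivity (d') and criterion (c)
# from hypothetical hit and false-alarm counts in a detection task.
from scipy.stats import norm

hits, misses = 78, 22                 # picture-present trials (hypothetical)
false_alarms, correct_rej = 12, 88    # picture-absent trials (hypothetical)

# Log-linear correction avoids infinite z-scores when rates hit 0 or 1.
hit_rate = (hits + 0.5) / (hits + misses + 1)
fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rej + 1)

d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```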
Article
In the present study, we investigated the influence of object-scene relationships on eye movement control during scene viewing. We specifically tested whether an object that is inconsistent with its scene context is able to capture gaze from the visual periphery. In four experiments, we presented rendered images of naturalistic scenes and compared baseline consistent objects with semantically, syntactically, or both semantically and syntactically inconsistent objects within those scenes. To disentangle the effects of extrafoveal and foveal object-scene processing on eye movement control, we used the flash-preview moving-window paradigm: A short scene preview was followed by an object search or free viewing of the scene, during which visual input was available only via a small gaze-contingent window. This method maximized extrafoveal processing during the preview but limited scene analysis to near-foveal regions during later stages of scene viewing. Across all experiments, there was no indication of an attraction of gaze toward object-scene inconsistencies. Rather than capturing gaze, the semantic inconsistency of an object weakened contextual guidance, resulting in impeded search performance and inefficient eye movement control. We conclude that inconsistent objects do not capture gaze from an initial glimpse of a scene.
Article
Studies of semantic dementia and repetitive TMS have suggested that the bilateral anterior temporal lobes (ATLs) underpin a modality-invariant representational hub within the semantic system. However, it is not clear whether all ATL subregions contribute in the same way. We utilized distortion-corrected fMRI to investigate the pattern of activation in the left and right ATL when participants performed a semantic decision task on auditory words, environmental sounds, or pictures. This showed that the ATL is not functionally homogeneous but is more graded. Both left and right ventral ATL (vATL) responded to all modalities in keeping with the notion that this region underpins multimodality semantic processing. In addition, there were graded differences across the hemispheres. Semantic processing of both picture and environmental sound stimuli was associated with equivalent bilateral vATL activation, whereas auditory words generated greater activation in left than right vATL. This graded specialization for auditory stimuli would appear to reflect the input from the left superior ATL, which responded solely to semantic decisions on the basis of spoken words and environmental sounds, suggesting that this region is specialized to auditory stimuli. A final noteworthy result was that these regions were activated for domain level decisions to singly presented stimuli, which appears to be incompatible with the hypotheses that the ATL is dedicated (a) to the representation of specific entities or (b) for combinatorial semantic processes.
Article
Using functional MRI, we investigated whether auditory processing of both speech and meaningful non-linguistic environmental sounds in superior and middle temporal cortex relies on a complex and spatially distributed neural system. We found evidence for spatially distributed processing of speech and environmental sounds across a substantial extent of the temporal cortices. Most importantly, regions previously reported as selective for speech over environmental sounds also contained distributed information. The results indicate that temporal cortices supporting complex auditory processing, including regions previously described as speech-selective, are in fact highly heterogeneous.
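The claim that "speech-selective" regions nonetheless contain distributed information is typically tested with multivoxel pattern classification. The sketch below illustrates that logic on simulated voxel patterns; the data, classifier choice, and signal strength are arbitrary assumptions, not the authors' analysis.

```python
# Illustrative multivoxel pattern analysis: cross-validated classification of
# simulated activity patterns for two sound classes (speech vs. environmental).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 200
labels = np.repeat([0, 1], n_trials // 2)           # 0 = speech, 1 = environmental sound
patterns = rng.standard_normal((n_trials, n_voxels))
patterns[labels == 1, :50] += 0.4                   # weak signal spread across many voxels

clf = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(clf, patterns, labels, cv=5).mean()
print(f"cross-validated decoding accuracy: {accuracy:.2f}")  # >0.5 implies distributed information
```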
Article
In contrast to visual object processing, relatively little is known about how the human brain processes everyday real-world sounds, transforming highly complex acoustic signals into representations of meaningful events or auditory objects. We recently reported a fourfold cortical dissociation for representing action (nonvocalization) sounds correctly categorized as having been produced by human, animal, mechanical, or environmental sources. However, it was unclear how consistent those network representations were across individuals, given potential differences between each participant's degree of familiarity with the studied sounds. Moreover, it was unclear what, if any, auditory perceptual attributes might further distinguish the four conceptual sound-source categories, potentially revealing what might drive the cortical network organization for representing acoustic knowledge. Here, we used functional magnetic resonance imaging to test participants before and after extensive listening experience with action sounds, and tested for cortices that might be sensitive to each of three different high-level perceptual attributes relating to how a listener associates or interacts with the sound source. These included the sound's perceived concreteness, effectuality (ability to be affected by the listener), and spatial scale. Despite some variation of networks for environmental sounds, our results verified the stability of a fourfold dissociation of category-specific networks for real-world action sounds both before and after familiarity training. Additionally, we identified cortical regions parametrically modulated by each of the three high-level perceptual sound attributes. We propose that these attributes contribute to the network-level encoding of category-specific acoustic knowledge representations.
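The "parametric modulation" analysis described here pairs each event regressor with a copy scaled by a mean-centered perceptual rating, so that a reliable weight on the scaled copy indicates that a region's response tracks that attribute. The sketch below shows this logic on simulated data; the HRF approximation, ratings, and effect sizes are all illustrative assumptions.

```python
# Sketch of a parametric-modulation GLM: an event regressor plus a copy scaled
# by mean-centered ratings (e.g., perceived concreteness), both convolved with
# an HRF and fit by ordinary least squares (simulated data, illustrative only).
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
n_scans = 200

onsets = rng.choice(n_scans - 30, size=20, replace=False)
ratings = rng.uniform(1, 7, size=20)                 # hypothetical attribute ratings

events = np.zeros(n_scans)
events[onsets] = 1.0
modulator = np.zeros(n_scans)
modulator[onsets] = ratings - ratings.mean()         # mean-centered parametric modulator

t = np.arange(0, 30)                                 # crude double-gamma HRF approximation
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
X = np.column_stack([np.convolve(r, hrf)[:n_scans] for r in (events, modulator)])
X = np.column_stack([X, np.ones(n_scans)])           # add an intercept

y = X @ np.array([1.0, 0.8, 0.0]) + rng.standard_normal(n_scans)  # synthetic voxel signal
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated modulation beta: {betas[1]:.2f}")  # nonzero => response scales with the attribute
```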
Article
Autism is a pervasive developmental disorder characterized by deficits in social-emotional, social-communicative, and language skills. Behavioral and neuroimaging studies have found that children with autism spectrum disorders (ASD) evidence abnormalities in semantic processing, with particular difficulties in verbal comprehension. However, it is not known whether these semantic deficits are confined to the verbal domain or represent a more general problem with semantic processing. The focus of the current study was to investigate verbal and meaningful nonverbal semantic processing in high-functioning children with autism (mean age = 5.8 years) using event-related potentials (ERPs). ERPs were recorded while children attended to semantically matching and mismatching picture-word and picture-environmental sound pairs. ERPs of typically developing children exhibited evidence of semantic incongruency detection in both the word and environmental sound conditions, as indexed by elicitation of an N400 effect. In contrast, children with ASD showed an N400 effect in the environmental sound condition but not in the word condition. These results provide evidence for a deficiency in the automatic activation of semantic representations in children with ASD, and suggest that this deficit is somewhat more selective to, or more severe in, the verbal than the nonverbal domain.
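The N400 effect referred to here is typically quantified as the difference between ERPs to mismatching and matching pairs, averaged over a mid-latency window. The sketch below simulates single-channel data to show that computation; the timings, amplitudes, noise level, and analysis window are illustrative, not the study's parameters.

```python
# Minimal sketch: quantify an N400 effect as the mismatch-minus-match ERP
# difference averaged over a 300-500 ms window (simulated single-channel data).
import numpy as np

fs = 250                                    # sampling rate (Hz)
t = np.arange(-0.2, 0.8, 1 / fs)            # epoch from -200 to 800 ms
rng = np.random.default_rng(2)

def simulate_erp(n400_amp, n_trials=60):
    n400 = n400_amp * np.exp(-0.5 * ((t - 0.4) / 0.08) ** 2)   # negativity near 400 ms
    trials = n400 + rng.standard_normal((n_trials, t.size)) * 2.0
    return trials.mean(axis=0)              # trial-averaged ERP

erp_match = simulate_erp(n400_amp=-1.0)
erp_mismatch = simulate_erp(n400_amp=-4.0)

window = (t >= 0.3) & (t <= 0.5)            # 300-500 ms analysis window
n400_effect = (erp_mismatch - erp_match)[window].mean()
print(f"mean N400 effect (mismatch - match), 300-500 ms: {n400_effect:.2f} µV")
```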
Article
In a non-linguistic analog of the "cocktail-party" scenario, informational and contextual factors were found to affect the recognition of everyday environmental sounds embedded in naturalistic auditory scenes. Short environmental sound targets were presented in a dichotic background scene composed of either a single stereo background scene or a composite background scene created by playing different background scenes to the different ears. The side of presentation, time of onset, and number of target sounds were varied across trials to increase the uncertainty for the participant. Half the sounds were contextually congruent with the background sound (i.e., consistent with the meaningful real-world sound environment represented in the auditory scene) and half were incongruent. The presence of a single competing background scene decreased identification accuracy, suggesting an informational masking effect. In tandem, there was a contextual pop-out effect, with contextually incongruent sounds identified more accurately. However, when targets were incongruent with the real-world context of the background scene, informational masking was reduced. Acoustic analyses suggested that this contextual pop-out effect was driven by a mixture of perceptual differences between the target and background, as well as by higher-level cognitive factors. These findings indicate that identification of environmental sounds in naturalistic backgrounds is an active process that requires integrating perceptual, attentional, and cognitive resources.
Article
In an effort to clarify whether semantic integration is impaired in verbal and nonverbal auditory domains in children with developmental language impairment (a.k.a., LI and SLI), the present study obtained behavioral and neural responses to words and environmental sounds in children with language impairment and their typically developing age-matched controls (ages 7-15 years). Event-related brain potentials (ERPs) were recorded while children performed a forced-choice matching task on semantically matching and mismatching visual-auditory, picture-word and picture-environmental sound pairs. Behavioral accuracy and reaction time measures were similar for both groups of children, with environmental sounds eliciting more accurate responses than words. In picture-environmental sound trials, behavioral performance and the brain's response to semantic incongruency (i.e., the N400 effect) of the children with language impairment were comparable to those of their typically developing peers. However, in picture-word trials, children with LI tended to be less accurate than their controls and their N400 effect was significantly delayed in latency. Thus, the children with LI demonstrated a semantic integration deficit that was somewhat specific to the verbal domain. The particular finding of a delayed N400 effect is consistent with the storage deficit hypothesis of language impairment (Kail & Leonard, 1986) suggesting weakened and/or less efficient connections within the language networks of children with LI.
Article
We report the case of patient M, who suffered unilateral left posterior temporal and parietal damage, brain regions typically associated with language processing. Language function largely recovered since the infarct, with no measurable speech comprehension impairments. However, the patient exhibited a severe impairment in nonverbal auditory comprehension. We carried out extensive audiological and behavioral testing in order to characterize M's unusual neuropsychological profile. We also examined the patient's and controls' neural responses to verbal and nonverbal auditory stimuli using functional magnetic resonance imaging (fMRI). We verified that the patient exhibited persistent and severe auditory agnosia for nonverbal sounds in the absence of verbal comprehension deficits or peripheral hearing problems. Acoustical analyses suggested that his residual processing of a minority of environmental sounds might rely on his speech processing abilities. In the patient's brain, contralateral (right) temporal cortex as well as perilesional (left) anterior temporal cortex were strongly responsive to verbal, but not to nonverbal sounds, a pattern that stands in marked contrast to the controls' data. This substantial reorganization of auditory processing likely supported the recovery of M's speech processing.