Article

Discrimination of voice gender in the human auditory cortex

Authors: Weston, Hunter, Sokhi, Wilkinson and Woodruff

Abstract

Discerning a speaker's gender from their voice is a basic and crucial aspect of human communication. Voice pitch height, the perceptual correlate of fundamental frequency, is higher in females and provides a cue for gender discrimination. However, male and female voices are also differentiated by multiple other spectral and temporal characteristics, including mean formant frequency and spectral flux. The robust perceptual segregation of male and female voices is thought to result from processing the combination of discriminating features, which in neural terms may correspond to early sound object analysis occurring in non-primary auditory cortex. However, the specific mechanism for gender perception has been unclear. Here, using functional magnetic resonance imaging, we show that discrete sites in non-primary auditory cortex are differentially activated by male and female voices, with female voices consistently evoking greater activation in the upper bank of the superior temporal sulcus and posterior superior temporal plane. This finding was observed at the individual subject-level in all 24 subjects. The neural response was highly specific: no auditory regions were more activated by male than female voices. Further, the activation associated with female voices was 1) larger than can be accounted for by a sole effect of fundamental frequency, 2) not due to psychological attribution of female gender and 3) unaffected by listener gender. These results demonstrate that male and female voices are represented as distinct auditory objects in the human brain, with the mechanism for gender discrimination being a gender-dependent activation-level cue in non-primary auditory cortex.
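Since the abstract singles out fundamental frequency as the pitch cue separating male and female voices, a minimal sketch of f0 estimation may make the cue concrete. This is purely illustrative: the paper does not specify an estimation method, and the typical adult ranges quoted in the comments are textbook values, not figures from the study.

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=70.0, fmax=300.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation.

    Illustrative sketch: production systems add voicing detection, windowing,
    and interpolation around the autocorrelation peak.
    """
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag

# Synthetic example: a 220 Hz tone stands in for a voiced frame.
# Typical adult f0 ranges are roughly 85-180 Hz (male) and 165-255 Hz (female).
sr = 16000
t = np.arange(0, 0.05, 1.0 / sr)
print(estimate_f0_autocorr(np.sin(2 * np.pi * 220 * t), sr))  # ~220 Hz
```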


... Building on the findings of Wolfe et al., Weston et al. (2015) found that, when given a choice between male and female genders, listeners correctly identified the gender of a speaker even when the fundamental speaking pitch was altered, so long as the spectral envelope remained unaffected by pitch-shifting. Weston et al. asked listeners to gauge a speaker's gender in three distinct sources of auditory stimuli. ...
... They found that when the spectral envelope remained intact, listeners could correctly identify the speaker's gender regardless of the spoken pitch level. These studies show that pitch can determine gender perception in the voice even when the spectral envelope is altered; but if the envelope is left unchanged, spectral flux (the rate of spectral change over time), which is statistically greater in females (Weston et al. 2015: 210), remains the primary determinant of gender perception in the voice. Hence, using a time-stretching process to alter the speaker's inflection disproportionately to the original may affect perception of the speaker's gender. ...
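Spectral flux, defined in the snippet above as the rate of spectral change over time, is straightforward to compute from a short-time Fourier transform. A minimal sketch under common conventions (Hann window, L2 frame normalization, Euclidean frame-to-frame difference); the cited studies may normalize differently.

```python
import numpy as np

def spectral_flux(x, frame_len=1024, hop=512):
    """Mean frame-to-frame change of the normalized magnitude spectrum.

    One common definition; Weston et al. (2015) report this quantity to be
    statistically greater for female than for male voices.
    """
    window = np.hanning(frame_len)
    frames = [x[i:i + frame_len] * window
              for i in range(0, len(x) - frame_len, hop)]
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    # Normalize each frame so flux reflects spectral shape, not loudness.
    spectra /= np.linalg.norm(spectra, axis=1, keepdims=True) + 1e-12
    return float(np.mean(np.linalg.norm(np.diff(spectra, axis=0), axis=1)))
```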
Article
Full-text available
This analysis explores how Barry Truax’s Song of Songs (1992) for oboe d’amore, English horn and two digital soundtracks reorients prevailing norms of sexuality by playing with musical associations and aural conventions of how gender sounds. The work sets the erotic dialogue between King Solomon and Shulamite from the biblical Song of Solomon text. On the soundtracks we hear a Christian monk’s song, environmental sounds (birds, cicadas and bells), and two speakers who recite the biblical text in its entirety preserving the gendered pronouns of the original. By attending to established gender norms, Truax confirms the identity of each speaker, such that the speakers seemingly address one another as a duet, but the woman also addresses a female lover and the man a male. These gender categories are then progressively blurred with granular time-stretching and harmonisation (which transform the timbre of the voices), techniques that, together, resituate the presumed heteronormative text within a diverse constellation of possible sexual orientations.
... Furthermore, no research has examined these mechanisms in the context of advertising. Neuroimaging studies of mixed-gender samples confirm that the perception generated by the FV, as opposed to the MV, elicits greater activation of the right posterior superior temporal gyrus, left postcentral gyrus, and the bilateral parietal lobe (Lattner, Meyer, & Friederici, 2005; Sokhi, Hunter, Wilkinson, & Woodruff, 2005; Weston, Hunter, Sokhi, Wilkinson, & Woodruff, 2015). Conversely, the MV strongly activated the inferior temporal gyrus when pronouncing coherent phrases (Humphries, Willard, Buchsbaum, & Hickok, 2001). ...
... These findings agree with those of several studies that related the superior temporal and parietal lobes with the processing of the FV (vs. MV; Junger et al., 2014; Lattner et al., 2005; Weston et al., 2015). A specific neuroimaging study carried out by von Kriegstein, Eger, Kleinschmidt, and Giraud (2003) analyzed the cortical response to spoken phrases, comparing two recognition tasks targeting either the speaker's voice or the verbal content. ...
Article
Full-text available
This article examines the neural and behavioral effects of voice gender and message framing in ecological advertising by means of functional MRI in conjunction with a task presenting persuasive gain-framed (GF) or loss-framed (LF) messages pronounced by male voice (MV) and female voice (FV). Behavioral responses showed more positive attitudes toward ads comprising MVs and GF messages. A whole-brain analysis revealed that visual regions are strongly elicited by LF (vs. GF) messages, whereas an area related to personal value computing, future positive aspirations, and social benefits—namely, the anterior cingulate cortex—is strongly activated by GF (vs. LF) messages. The MV triggered stronger activation in areas related to pitch processing and visual scenes, whereas the FV provoked a higher neural–audiovisual integration. Furthermore, neural activation in the inferior frontal gyrus significantly predicted attitudes toward ads pronounced by the MV, and the activation in the orbitofrontal gyrus predicted attitudes toward ads comprising GF messages. Taken together, this article sheds light on a differential neural processing of gain and loss frames as well as MV and FV. It also suggests that messages expressing social benefits (e.g., environmental) could be processed differently from those reporting individual advantages (e.g., healthy). Finally, it advises managers and associations that market environmentally responsible products and ideas on the voice and message frame that generate the highest subconscious value.
... Only female participants were selected in order to have a shared definition of maleness and femaleness within a single sex. Importantly, to make sure that the listener extracted the sex of the speaker as a general feature beyond the acoustic features of voices, we used ten different speakers, five males and five females (for similar approaches, Deguchi et al., 2010; Van Berkum et al., 2008; Weston et al., 2015). We selected two semantically gendered words in French referring to human beings, for which there is a direct connection between the biological sex and the gender given to these words. ...
... All speakers were native French speakers. Following this approach, by introducing a high acoustic variability in the auditory stimuli, the expected MMN response to deviant stimuli should be elicited by the sex of the speaker as a general feature, and not by the unique acoustic features of a given voice (Deguchi et al., 2010; Van Berkum et al., 2008; Weston et al., 2015). The sound recording took place in a soundproof room with a unidirectional microphone. ...
Article
When exposed to a spoken message, a listener takes into account several sources of linguistic and indexical information. Using the mismatch negativity (MMN) response, we examined whether the indexical information about the sex of the speaker influenced the processing of semantically gendered spoken words. Female participants listened to two semantically gendered French words, one masculine and one feminine, representing human beings, said either by five male or by five female speakers. The opposite-sex voices produced an enhancement of the MMN response. In line with interactive connections between indexical and linguistic information processing through the activation of lexical memory traces, the results showed a more pronounced MMN response when the sex of the speaker matched the gender of the word. Furthermore, there was a later detection of the incongruence between the sex information about the speaker and the gender of the word, shown by an enhancement of the MMN response. Overall, these findings suggest that listeners integrate the indexical information about the sex of the speakers both at the lexical selection level and at a higher level of processing such as grammatical access.
... According to Weston et al. [57], the clear differentiation between male and female voices is believed to arise from the analysis of multiple discriminating features. This process occurs in the non-primary auditory cortex, where early sound object analysis takes place. ...
Article
Full-text available
Functional near-infrared spectroscopy (fNIRS) technology has been widely used to analyze biomechanics and diagnose brain activity. Despite being a promising tool for assessing the brain cortex status, this system is susceptible to disturbances and noise from electrical instrumentation and basal metabolism. In this study, an alternative filtering method, maximum likelihood generalized extended stochastic gradient (ML-GESG) estimation, is proposed to overcome the limitations of these disturbance factors. The proposed algorithm was designed to reduce multiple disturbances originating from heartbeats, breathing, shivering, and instrumental noises as multivariate parameters. To evaluate the effectiveness of the algorithm in filtering involuntary signals, a comparative analysis was conducted with a conventional filtering method, using hemodynamic responses to auditory stimuli and psycho-acoustic factors as quality indices. Using auditory sound stimuli consisting of 12 voice sources (six males and six females), the fNIRS test was configured with 18 channels and conducted on 10 volunteers. The psycho-acoustic factors of loudness and sharpness were used to evaluate physiological responses to the stimuli. Applying the proposed filtering method, the oxygenated hemoglobin concentration correlated better with the psychoacoustic analysis of each auditory stimulus than that of the conventional filtering method.
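The abstract benchmarks ML-GESG against "a conventional filtering method" without naming it; a common baseline in the fNIRS literature is zero-phase band-pass filtering that keeps the slow hemodynamic response while attenuating drift, respiratory (~0.3 Hz) and cardiac (~1 Hz) components. A sketch of that kind of conventional baseline, with cutoff values that are common defaults rather than the paper's settings:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_fnirs(hbo, fs, low=0.01, high=0.2, order=4):
    """Zero-phase Butterworth band-pass for an HbO time series.

    The 0.01-0.2 Hz passband is a common fNIRS default (not from the paper):
    it retains task-evoked hemodynamics while suppressing slow drift,
    respiration (~0.3 Hz) and cardiac pulsation (~1 Hz).
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, hbo)

# Example: clean a simulated channel sampled at 10 Hz.
fs = 10.0
t = np.arange(0, 60, 1 / fs)
raw = np.sin(2 * np.pi * 0.05 * t) + 0.3 * np.sin(2 * np.pi * 1.0 * t)
clean = bandpass_fnirs(raw, fs)
```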
... At this moment, an audible phoneme /BA/ or /BI/ was delivered to participants' headphones; this was produced by a male speaker, with a duration of 200 ms and loudness of 70 dB SPL (see Fig. 1C). Given that previous studies have reported distinct neural correlates and brain activation of voice gender processing (Sokhi et al. 2005; Zäske et al. 2009; Latinus and Taylor 2012; Weston et al. 2015), a single (male) voice sample was consistently used across all the experiments. Each audible phoneme (/BA/ and /BI/) was presented on 50% of trials within each trial block in a randomized order. ...
Article
Full-text available
Self-generated overt actions are preceded by a slow negativity as measured by electroencephalogram, which has been associated with motor preparation. Recent studies have shown that this neural activity is modulated by the predictability of action outcomes. It is unclear whether inner speech is also preceded by a motor-related negativity and influenced by the same factor. In three experiments, we compared the contingent negative variation elicited in a cue paradigm in an active vs. passive condition. In Experiment 1, participants produced an inner phoneme, at which an audible phoneme whose identity was unpredictable was concurrently presented. We found that while passive listening elicited a late contingent negative variation, inner speech production generated a more negative late contingent negative variation. In Experiment 2, the same pattern of results was found when participants were instead asked to overtly vocalize the phoneme. In Experiment 3, the identity of the audible phoneme was made predictable by establishing probabilistic expectations. We observed a smaller late contingent negative variation in the inner speech condition when the identity of the audible phoneme was predictable, but not in the passive condition. These findings suggest that inner speech is associated with motor preparatory activity that may also represent the predicted action-effects of covert actions.
... Our results have implications for speech (re)synthesis, particularly when the goal is to achieve perceptually gender ambiguous speech, and communication technology. Many studies that evaluate speaker gender perception attempt to achieve ambiguity by manipulating only fo and formant frequencies (Nagels et al., 2020; Smith et al., 2019; Sokhi et al., 2005; Weston et al., 2015). While manipulation of these two parameters may be sufficient to achieve gender ambiguity for short speech segments such as isolated vowels, our findings indicate that articulatory and intonation features should also be considered when manipulating connected (especially spontaneous) speech. ...
Thesis
A central issue in the study of speech perception is how listeners resolve the vast amount of variability present in speech signals. Gender diversity presents an opportunity to examine how listeners learn and represent one dimension of sociophonetic variability arising from an evolving social category, speaker gender. In this dissertation, a speech corpus of scripted and unscripted utterances was created inclusive of speakers with varying gender identities (e.g., non-binary, transgender men, and transgender women). Read utterances from the corpus were used in an auditory free classification paradigm, in which listeners categorized the speakers on perceived general similarity and gender identity. Cluster and multidimensional scaling analyses were used to ascertain listeners’ perceptual organization of speakers. Cluster solutions for listeners’ categorizations of general similarity revealed a complex hierarchical structure in which speakers were broadly differentiated based on gender prototypicality, and more finely differentiated by masculinity/femininity, age, dialect, vocal quality, and suprasegmental features. Further, listeners used different organizing factors depending on perception of speaker gender. In contrast, cluster solutions for categorizations of gender identity were simplified and demonstrated listeners’ attention to gender prototypicality and masculinity/femininity. Multidimensional scaling analyses revealed two-dimensional solutions, in which listeners demonstrated gradient organization of speakers for each dimension, as the best fit for all free classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values (prototypical) from more intermediate values (non-prototypical). Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a “male” versus “female” dichotomy. Assumptions of a gender binary in the study of speech communication may require a critical re-examination to accommodate multidimensional and gradient representation of speaker gender.
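The cluster and multidimensional scaling analyses described above can be sketched with standard tooling. The sketch below assumes a precomputed speaker-by-speaker dissimilarity matrix (for free classification, often 1 minus the proportion of listeners who grouped two speakers together); the dissertation's actual distance definition and software are not specified here.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical dissimilarity matrix for 6 speakers: symmetric, zero diagonal,
# e.g. 1 - proportion of listeners who grouped speakers i and j together.
rng = np.random.default_rng(0)
d = rng.random((6, 6))
d = (d + d.T) / 2
np.fill_diagonal(d, 0.0)

# Two-dimensional MDS configuration, as in the dissertation's solutions.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(d)

# Hierarchical clustering over the same dissimilarities (condensed form).
clusters = fcluster(linkage(d[np.triu_indices(6, k=1)], method="average"),
                    t=2, criterion="maxclust")
print(coords)
print(clusters)
```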
... Despite this promising work, calls are being made for further study, focused on achieving an objective understanding of the psychological mechanisms generating and maintaining sector bias. For example, although some studies have highlighted motivated reasoning as a factor that can bolster preexisting beliefs when faced by contradictory evidence, suggesting this may be the main psychological mechanism leading to sector bias (Taber & Lodge, 2006), others have emphasised the effects of reward and preference-related processes when individuals encounter attitude-congruent performance information, as a driver of sector bias (Weston et al., 2015). ...
Article
Full-text available
Governments, political parties and public institutions regularly design and launch communication campaigns emphasising their successes, fostering participation in democratic acts, promoting the use of public services and seeking to boost electoral support. Accordingly, researchers in the fields of politics and public administration have long sought to enhance our understanding of how individuals perceive the performance of services offered by the private and public sectors. In this respect, conflicting findings have been reported: some studies affirm there is an anti-public sector bias, others detect a preference towards public-sector providers, and some have found no evidence of a sector bias, pro or anti. We believe it crucially important to understand the mechanisms underlying sector bias, if it exists. To address the current research gap in this area, we make use of theories and tools drawn from neuropolitics (namely, functional Magnetic Resonance Imaging, fMRI) to elucidate the neurobiological foundations of perceptions regarding the performance of public-sector service providers. The neural findings obtained reveal that brain networks associated with reward and positive values provide a neurobiological explanation for pro-public sector bias, while neural mechanisms linked to aversion, risk, ambiguity and motivated reasoning are associated with an anti-public-sector bias. The implications of these findings should be considered by policymakers; for example, to promote acceptance of public-sector service provision, people must be clearly informed about the goals achieved and other positive aspects.
... Our results have implications for speech (re)synthesis, particularly when the goal is to achieve perceptually gender ambiguous speech. Many studies that evaluate speaker gender perception attempt to achieve ambiguity by manipulating only fo and formant frequencies (Nagels et al., 2020; Smith et al., 2019; Sokhi et al., 2005; Weston et al., 2015). Although manipulation of these two parameters may be sufficient to achieve gender ambiguity for short speech segments, such as isolated vowels, our findings indicate that articulatory and intonation features should also be considered when manipulating connected (especially spontaneous) speech. ...
Article
Examinations of speaker gender perception have primarily focused on the roles of fundamental frequency (fo) and formant frequencies from structured speech tasks using cisgender speakers. Yet, there is evidence to suggest that fo and formants do not fully account for listeners' perceptual judgements of gender, particularly from connected speech. This study investigated the perceptual importance of fo, formant frequencies, articulation, and intonation in listeners' judgements of gender identity and masculinity/femininity from spontaneous speech from cisgender male and female speakers as well as transfeminine and transmasculine speakers. Stimuli were spontaneous speech samples from 12 speakers who are cisgender (6 female and 6 male) and 12 speakers who are transgender (6 transfeminine and 6 transmasculine). Listeners performed a two-alternative forced choice (2AFC) gender identification task and masculinity/femininity rating task in two experiments that manipulated which acoustic cues were available. Experiment 1 confirmed that fo and formant frequency manipulations were insufficient to alter listener judgements across all speakers. Experiment 2 demonstrated that articulatory cues had greater weighting than intonation cues on the listeners' judgements when the fo and formant frequencies were in a gender ambiguous range. These findings counter the assumptions that fo and formant manipulations are sufficient to effectively alter perceived speaker gender.
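Since both snippets above note that studies often manipulate only fo (and formants) to probe gender perception, a sketch of the fo half of such a manipulation is shown below, using librosa's pitch shifter on its bundled speech example. Note the caveat in the comments: a plain pitch shift is not the independent fo/formant control used in the cited work, which requires a vocoder.

```python
import librosa

# Load a mono speech sample (librosa's bundled LibriSpeech excerpt).
y, sr = librosa.load(librosa.example("libri1"), sr=None)

# Shift the voice up 4 semitones toward a more gender-ambiguous fo range.
# Caveat: pitch_shift moves fo and the spectral envelope together; the studies
# cited above manipulate fo and formant frequencies independently, which needs
# a vocoder (e.g., WORLD, STRAIGHT, PSOLA) rather than this simple transform.
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)
```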
... Face-voice gender recognition is robust and precocious, appearing as early as 6-8 months in human development (Walker-Andrews et al., 1991; Patterson and Werker, 2002). The TVA and FFA are both involved in the unimodal recognition of gender in their respective modalities (Contreras et al., 2013; Weston et al., 2015). In addition, both of these areas along with a supramodal fronto-parietal network have been found to be activated during face-voice gender categorization with a functional magnetic resonance imaging (fMRI) protocol (Joassin et al., 2011). ...
Article
Full-text available
Cross-modal effects provide a model framework for investigating hierarchical inter-areal processing, particularly under conditions where unimodal cortical areas receive contextual feedback from other modalities. Here, using complementary behavioral and brain imaging techniques, we investigated the functional networks participating in face and voice processing during gender perception, a high-level feature of voice and face perception. Within the framework of a signal detection decision model, Maximum likelihood conjoint measurement (MLCM) was used to estimate the contributions of the face and voice to gender comparisons between pairs of audio-visual stimuli in which the face and voice were independently modulated. Top–down contributions were varied by instructing participants to make judgments based on the gender of either the face, the voice or both modalities (N = 12 for each task). Estimated face and voice contributions to the judgments of the stimulus pairs were not independent; both contributed to all tasks, but their respective weights varied over a 40-fold range due to top–down influences. Models that best described the modal contributions required the inclusion of two different top–down interactions: (i) an interaction that depended on gender congruence across modalities (i.e., difference between face and voice modalities for each stimulus); (ii) an interaction that depended on the within modalities' gender magnitude. The significance of these interactions was task dependent. Specifically, gender congruence interaction was significant for the face and voice tasks while the gender magnitude interaction was significant for the face and stimulus tasks. Subsequently, we used the same stimuli and related tasks in a functional magnetic resonance imaging (fMRI) paradigm (N = 12) to explore the neural correlates of these perceptual processes, analyzed with Dynamic Causal Modeling (DCM) and Bayesian Model Selection. Results revealed changes in effective connectivity between the unimodal Fusiform Face Area (FFA) and Temporal Voice Area (TVA) in a fashion that paralleled the face and voice behavioral interactions observed in the psychophysical data. These findings explore the role in perception of multiple unimodal parallel feedback pathways.
... In other words, bearing in mind: (a) the way in which voice gender is coded in the specialized areas of the brain (Zäske et al. 2009) to generate auditory representations in the cerebral cortex and at the neuronal level (Mullenix et al. 1995); (b) recent studies that have demonstrated how dimensions of timbre are processed in the non-primary auditory cortex (Weston et al. 2015), enabling instantaneous identification of a speaker's voice gender; and (c) differences in the formant structure of speakers of the same language, likewise making it possible to distinguish a speaker's gender (Charest et al. 2013; Strand 2000), it is clear that, well before the verbal, linguistic and terminological characteristics of a simultaneous interpretation are discerned, culturally attributed markers of gender and age, as perceived through the interpreter's voice, are registered immediately and unconsciously by the listener, forming an impression of the interpretation they are hearing that is consistent with the generally accepted acoustic characteristics of voice. Hence, the most professional and credible performance is attributed to the experienced male interpreter (EM), while the delivery with the highest quality is ascribed to the novice male interpreter (NM). ...
Chapter
Full-text available
This study investigates the impact that an interpreter's gender, as conveyed by her or his voice, has on the perception of the quality of simultaneous interpreting by conference audiences. In a controlled environment, we asked 63 subjects in Mexico City to evaluate the recorded performances of four interpreters.
... In humans, these areas develop around 7 months of postnatal age (Blasi et al., 2011; Grossmann et al., 2010), in parallel with adult-like processing of naturalistic sounds in the auditory system by 3-9-month-old infants (Wild et al., 2017). The TVA not only responds more strongly to voices than to other sounds, but some of its subregions show enhanced activity to various types of voice signals, such as voice identity (Latinus et al., 2013), gender (Weston et al., 2015), body size (Von Kriegstein et al., 2010), affect (Ethofer et al., 2012; Frühholz and Grandjean, 2013a), attractiveness (Bestelmeyer et al., 2012), dialect and accent. In deaf individuals, the TVA may even support visual face processing, demonstrating functional plasticity (Benetti et al., 2017). ...
Article
Full-text available
While humans have developed a sophisticated and unique system of verbal auditory communication, they also share a more common and evolutionarily important nonverbal channel of voice signaling with many other mammalian and vertebrate species. This nonverbal communication is mediated and modulated by the acoustic properties of a voice signal, and is a powerful – yet often neglected – means of sending and perceiving socially relevant information. From the viewpoint of dyadic (involving a sender and a signal receiver) voice signal communication, we discuss the integrated neural dynamics in primate nonverbal voice signal production and perception. Most previous neurobiological models of voice communication modelled these neural dynamics from the limited perspective of either voice production or perception, largely disregarding the neural and cognitive commonalities of both functions. Taking a dyadic perspective on nonverbal communication, however, it turns out that the neural systems for voice production and perception are surprisingly similar. Based on the interdependence of both production and perception functions in communication, we first propose a re-grouping of the neural mechanisms of communication into auditory, limbic, and paramotor systems, with special consideration for a subsidiary basal-ganglia-centered system. Second, we propose that the similarity in the neural systems involved in voice signal production and perception is the result of the co-evolution of nonverbal voice production and perception systems promoted by their strong interdependence in dyadic interactions.
... Bentin et al., 1996; Freeman et al., 2010); primary gender processing of voices, by contrast, takes place mainly in the superior temporal sulcus (Charest et al., 2013; Weston et al., 2015), with the N1 as the corresponding key EEG component (Latinus & Taylor, 2012; Zaske et al., 2009). Higher-level processing of face gender information mainly involves the insula, inferior frontal gyrus and orbitofrontal cortex (Kaul et al., 2011), with the P300 as the key EEG component (Tomelleri & Castelli, 2012); higher-level processing of voice gender information mainly involves the anterior cingulate cortex, inferior frontal gyrus and insula (Charest et al., 2013), with the P2 as the key EEG component (Latinus & Taylor, 2012; Zaske et al., 2009; Contreras et al., 2013; Wiese et al., 2012). For humans, however, gender itself is more important than other dimensions of individual information. On the other hand, the methods used to study gender processing vary widely and are difficult to integrate effectively. For example, there are many methodological differences in how gender effects are measured, which can be summarized as: differences between male and female stimuli (Ino et al., 2010; Zhang et al., 2018); differences between typical and ambiguous gender (Freeman et al., 2010); differences between gender and non-gender tasks (Wiese et al., 2012); and effects induced by gender information, such as conflict effects (Yokoyama et al., 2014), repetition suppression (Podrebarac et al., 2013) and gender aftereffects (Ng et al.; Hyde et al., 2019). Moreover, a growing number of researchers describe gender as a continuous variable, that is, a tendency toward masculinity or femininity (Gilani et al., 2014; Kozlowski, 2015). Judging from the behavioral results obtained with linearly transformed continuous stimuli (a sigmoid curve that is steep in the middle and shallow on both sides; Charest et al., 2013; Freeman et al., 2010; Zhou et al., 2014), ... Wu Binxing, Zhang Zhijun, & Sun Yusheng. (2014). ...
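The sigmoid categorization curves mentioned above (steep in the middle, shallow at the extremes) are typically obtained by fitting a logistic psychometric function to response proportions along a morphed male-female continuum. A minimal sketch with invented response proportions; the cited studies' exact fitting procedures may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Psychometric function: P("female" response) along the morph continuum."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical data: morph level 0 = prototypical male, 1 = prototypical female.
morph = np.linspace(0, 1, 9)
p_female = np.array([0.02, 0.05, 0.10, 0.30, 0.55, 0.75, 0.90, 0.97, 0.99])

(x0, k), _ = curve_fit(logistic, morph, p_female, p0=[0.5, 10.0])
print(f"category boundary at morph {x0:.2f}, slope {k:.1f}")
```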
... Presumably the most important factor, voice gender (female vs. male speaker), would be expected to cause variation in spectro-temporal voice characteristics (e.g. fundamental frequency) [76,77]. For instance, a previous ERP study on voice gender categorization found female and male voices to trigger larger P2 and N1 components, respectively [76]. ...
Article
Full-text available
Objective: Degradations of transmitted speech have been shown to affect perceptual and cognitive processing in human listeners, as indicated by the P3 component of the event-related brain potential (ERP). However, research suggests that previously observed P3 modulations might actually be traced back to earlier neural modulations in the time range of the P1-N1-P2 complex of the cortical auditory evoked potential (CAEP). This study investigates whether auditory sensory processing, as reflected by the P1-N1-P2 complex, is already systematically altered by speech quality degradations. Approach: Electrophysiological data from two studies were analyzed to examine effects of speech transmission quality (high-quality, noisy, bandpass-filtered) for spoken words on amplitude and latency parameters of individual P1, N1 and P2 components. Main results: In the resultant ERP waveforms, an initial P1-N1-P2 manifested at stimulus onset, while a second N1-P2 occurred within the ongoing stimulus. Bandpass-filtered versus high-quality word stimuli evoked a faster and larger initial N1 as well as a reduced initial P2, hence exhibiting effects as early as the sensory stage of auditory information processing. Significance: The results corroborate the existence of systematic quality-related modulations in the initial N1-P2, which may potentially have carried over into P3 modulations demonstrated by previous studies. In future psychophysiological speech quality assessments, rigorous control procedures are needed to ensure the validity of P3-based indication of speech transmission quality. An alternative CAEP-based assessment approach is discussed, which promises to be more efficient and less constrained than the established approach based on P3.
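The amplitude and latency parameters for individual P1, N1 and P2 components described above are typically read off the averaged waveform as peaks within component-specific time windows. A minimal numpy sketch; the windows used below are generic illustrations, not the study's analysis windows.

```python
import numpy as np

def component_peak(erp, times, window, polarity):
    """Peak amplitude and latency of an ERP component within a time window.

    polarity: +1 for positive components (P1, P2), -1 for negative ones (N1).
    """
    mask = (times >= window[0]) & (times <= window[1])
    idx = np.argmax(polarity * erp[mask])
    return erp[mask][idx], times[mask][idx]

# erp: average over epochs; times in seconds relative to word onset.
sr = 500
times = np.arange(-0.1, 0.5, 1 / sr)
erp = np.random.default_rng(1).normal(size=times.size)  # stand-in waveform

n1_amp, n1_lat = component_peak(erp, times, (0.08, 0.15), polarity=-1)  # assumed window
p2_amp, p2_lat = component_peak(erp, times, (0.15, 0.28), polarity=+1)  # assumed window
```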
... There is evidence that TVA and FFA are both involved in the unimodal recognition of gender in their respective modalities (Charest et al., 2012; Contreras et al., 2013; Kaul et al., 2011; Weston et al., 2015). In addition, both of these areas along with a supramodal frontoparietal network have been found to be activated during a face-voice gender categorization with a functional magnetic resonance imaging (fMRI) protocol (Joassin et al., 2011). ...
Preprint
Full-text available
Multimodal integration provides an ideal framework for investigating top-down influences in perceptual integration. Here, we investigate mechanisms and functional networks participating in face-voice multimodal integration during gender perception by using complementary behavioral (Maximum Likelihood Conjoint Measurement) and brain imaging (Dynamic Causal Modeling of fMRI data) techniques. Thirty-six subjects were instructed to judge pairs of face-voice stimuli either according to the gender of the face (face task), the voice (voice task) or the stimulus (stimulus task; no specific modality instruction given). Face and voice contributions to the tasks were not independent, as both modalities significantly contributed to all tasks. The top-down influences in each task could be modeled as a differential weighting of the contributions of each modality with an asymmetry in favor of the auditory modality in terms of magnitude of the effect. Additionally, we observed two independent interaction effects in the decision process that reflect both the coherence of the gender information across modalities and the magnitude of the gender difference from neutral. In a second experiment we investigated with functional MRI the modulation of effective connectivity between the Fusiform Face Area (FFA) and the Temporal Voice Area (TVA), two cortical areas implicated in face and voice processing. Twelve participants were presented with multimodal face-voice stimuli and instructed to attend either to face, voice or any gender information. We found specific changes in effective connectivity between these areas in the same conditions that generated behavioral interactions. Taken together, we interpret these results as converging evidence supporting the existence of multiple parallel hierarchical systems in multi-modal integration.
... Until now, little is known about the impact of visual face information on voice gender discrimination (one previous study focused on voice gender discrimination in isolation [34]), perhaps because there are generally fewer studies of voices than of faces [3]. However, a recent study [35] focused on a related issue, namely the effects of perceived visual gender on voice type categorization (bass, baritone, alto, and soprano) using stimuli consisting of notes sung by male and female singers at the same G3 pitch. ...
Article
The processing of voices and faces is known to interact, for example, when recognizing other persons. However, few studies focus on both directions of this interaction, including the influence of incongruent visual stimulation on voice perception. In the present study, we implemented an interference paradigm involving 1152 videos of faces with either gender-congruent or gender-incongruent voices. Participants were asked to categorize the gender of either the face or the voice via key press. Task (face-based vs. voice-based gender categorization task) was manipulated both block-wise (relatively low executive control demands) and in a mixed block (relatively high executive control demands due to trial-by-trial task switches). We aimed at testing whether and how gender-incongruent stimuli negatively affected gender categorization speed and accuracy. The results indicate significant congruency effects in both directions – gender-incongruent visual information negatively affected voice categorization time and errors, and gender-incongruent voices affected visual face categorization. However, the former effect was stronger, supporting theories postulating visual dominance in face-voice integration. Congruency effects, which were not significantly reduced over the course of the experiment, were larger under high executive control demands (task switches), suggesting the availability of fewer attentional resources for incongruency resolution. Overall, voices generally appear to be processed in conjunction with facial information, which yields enhanced processing for more authentic voices, that is, voices that do not violate face-based expectancies. The data strengthen theories of face-voice processing emphasizing strong interaction between both processing channels.
... Thus, when participants attend to a talker based on cues for spatial location or gender, activity or phase-locking may increase in neurons that preferentially encode the spatial location of the target talker (e.g. those tuned to the left or right hemifield; for a review, see Salminen, Tiitinen, & May, 2012) in combination with neurons that preferentially encode the f0 and/or VTL of the target talker (Mäkelä et al., 2002; Steinschneider, Nourski, & Fishman, 2013; Weston, Hunter, Sokhi, Wilkinson, & Woodruff, 2015); those features could potentially be 'bound' together by synchronous oscillatory activity (Clayton, Yeung, & Cohen Kadosh, 2015; Engel & Singer, 2001; Singer, 1993). Importantly, the same pattern of RTs for the trial-by-trial analyses was observed on attend-location and attend-gender trials, which is inconsistent with the alternative explanation that participants were either using space-based or feature-based attention on both types of trial. ...
Article
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1000, and 2000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker’s spatial location or their gender. Participants directed attention to location and gender simultaneously (‘objects’) at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
... Contrary to expectations, activations in the postcentral, ACC and amygdala areas are recorded when comparing young vs. old voices. On the one hand, some fMRI studies analyzing emotional speech found postcentral activations when comparing female vs. male voices (Weston et al. 2015). Since female voices (like young voices) have a higher timbre and pitch than male voices (like old voices), it is reasonable to identify activation of the postcentral gyrus in young vs. old voices. ...
Article
Full-text available
Ecological information offered to society through advertising enhances awareness of environmental issues, encourages development of sustainable attitudes and intentions, and can even alter behavior. This paper, by means of functional Magnetic Resonance Imaging (fMRI) and self-reports, explores the underlying mechanisms of processing ecological messages. The study specifically examines brain and behavioral responses to persuasive ecological messages that differ in temporal framing and in the age of the voice pronouncing them. The findings reveal that attitudes are more positive toward future-framed messages presented by young voices. The whole-brain analysis reveals that future-framed (FF) ecological messages trigger activation in brain areas related to imagery, prospective memories and episodic events, thus reflecting the involvement of past behaviors in future ecological actions. Past-framed messages (PF), in turn, elicit brain activations within the episodic system. Young voices (YV), in addition to triggering stronger activation in areas involved with the processing of high-timbre, high-pitched and high-intensity voices, are perceived as more emotional and motivational than old voices (OV), as reflected by activations in the anterior cingulate cortex and amygdala. Messages expressed by older voices, in turn, exhibit stronger activation in areas formerly linked to low-pitched voices and voice gender perception. Interestingly, a link is identified between neural and self-report responses indicating that certain brain activations in response to future-framed messages and young voices predicted higher attitudes toward future-framed and young voice advertisements, respectively. The results of this study provide invaluable insight into the unconscious origin of attitudes toward environmental messages and indicate which voice and temporal frame of a message generate the greatest subconscious value.
... In this research, a new parameter was applied to model spatio-temporal fluctuations of EEG spectral content: SF. It has been previously used in the context of acoustic signal processing to measure changes in time-frequency spectra [47]. In the case of neural signal processing, only a few studies applied SF to characterize, for instance, behavioral states in newborn babies [48]. ...
Article
Background: An accurate characterization of neural dynamics in mild cognitive impairment (MCI) is of paramount importance to gain further insights into the underlying neural mechanisms in Alzheimer's disease (AD). Nevertheless, there has been relatively little research on brain dynamics in prodromal AD. As a consequence, its neural substrates remain unclear. Methods: In the present research, electroencephalographic (EEG) recordings from patients with dementia due to AD, subjects with MCI due to AD and healthy controls (HC) were analyzed using relative power (RP) in conventional EEG frequency bands and a novel parameter useful to explore the spatio-temporal fluctuations of neural dynamics: the spectral flux (SF). Results: Our results suggest that dementia due to AD is associated with a significant slowing of EEG activity and several significant alterations in spectral fluctuations at low (i.e. theta) and high (i.e. beta and gamma) frequency bands compared to HC (p < 0.05). Furthermore, subjects with MCI due to AD exhibited a specific frequency-dependent pattern of spatio-temporal abnormalities, which can help identify neural mechanisms involved in cognitive impairment preceding AD. Classification analyses using linear discriminant analysis with a leave-one-out cross-validation procedure showed that the combination of RP and within-electrode SF at the beta band achieved 77.3% accuracy in discriminating between HC and AD patients. In the comparison between HC and MCI subjects, classification accuracy reached 79.2% when combining within-electrode SF at the beta and gamma bands. SF has proven to be a useful measure to obtain an original description of brain dynamics at different stages of AD. Conclusion: Consequently, SF may contribute to a more comprehensive understanding of the neural substrates underlying MCI, as well as to the development of potential early AD biomarkers.
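The two EEG descriptors combined in the classification analysis, relative power per band and within-electrode spectral flux, together with the leave-one-out linear discriminant analysis, can be sketched as follows. Band edges and the flux normalization are common conventions assumed for illustration, not the paper's exact definitions.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

BANDS = {"theta": (4, 8), "beta": (13, 30), "gamma": (30, 45)}  # assumed edges

def relative_power(eeg, fs, band):
    """Band power divided by total power for one electrode."""
    f, pxx = welch(eeg, fs=fs, nperseg=int(2 * fs))
    lo, hi = band
    sel = (f >= lo) & (f <= hi)
    # Uniform frequency grid, so sums stand in for integrals.
    return pxx[sel].sum() / pxx.sum()

def spectral_flux_eeg(eeg, fs, win_s=2.0, step_s=1.0):
    """Mean change of the normalized short-time spectrum across windows."""
    n, h = int(win_s * fs), int(step_s * fs)
    spectra = np.abs(np.fft.rfft(
        [eeg[i:i + n] for i in range(0, len(eeg) - n, h)], axis=1))
    spectra /= spectra.sum(axis=1, keepdims=True) + 1e-12
    return float(np.mean(np.linalg.norm(np.diff(spectra, axis=0), axis=1)))

# X: one row per subject, e.g. [RP_beta, SF_beta]; y: 0 = HC, 1 = AD.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(30, 2)), rng.integers(0, 2, size=30)
accuracy = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                           cv=LeaveOneOut()).mean()
```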
... It should be noted that in this study, only younger adult females' voices were used. Since previous studies have shown that male and female voices are represented as distinct auditory objects in the human nonprimary auditory cortex (Weston et al. 2015), it is necessary in the future to examine whether similar results can be obtained when male-voice stimuli are used. ...
Article
Full-text available
Background Under ‘cocktail party’ listening conditions, healthy listeners and listeners with schizophrenia can use temporally pre-presented auditory speech-priming (ASP) stimuli to improve target-speech recognition, even though listeners with schizophrenia are more vulnerable to informational speech masking. Method Using functional magnetic resonance imaging, this study searched for both brain substrates underlying the unmasking effect of ASP in 16 healthy controls and 22 patients with schizophrenia, and brain substrates underlying schizophrenia-related speech-recognition deficits under speech-masking conditions. Results In both controls and patients, introducing the ASP condition (against the auditory non-speech-priming condition) not only activated the left superior temporal gyrus (STG) and left posterior middle temporal gyrus (pMTG), but also enhanced functional connectivity of the left STG/pMTG with the left caudate. It also enhanced functional connectivity of the left STG/pMTG with the left pars triangularis of the inferior frontal gyrus (TriIFG) in controls and that with the left Rolandic operculum in patients. The strength of functional connectivity between the left STG and left TriIFG was correlated with target-speech recognition under the speech-masking condition in both controls and patients, but reduced in patients. Conclusions The left STG/pMTG and their ASP-related functional connectivity with both the left caudate and some frontal regions (the left TriIFG in healthy listeners and the left Rolandic operculum in listeners with schizophrenia) are involved in the unmasking effect of ASP, possibly through facilitating the following processes: masker-signal inhibition, target-speech encoding, and speech production. The schizophrenia-related reduction of functional connectivity between the left STG and left TriIFG augments the vulnerability of speech recognition to speech masking.
Article
Full-text available
Tonal discrimination is a process that involves multiple cognitive domains and has been associated with the regulation of various neuroanatomical areas. Musicians show heightened tonal perception, linked to both structural and functional brain changes. Key areas in tonal discrimination, and their relation to brain function, have been identified. Given the interest in understanding how the brain processes sounds, tonal discrimination represents a fundamental aspect, since it suggests differentiated processing between people with and without musical training. This review explores the diversity of correlations in specific areas of the nervous system and the functional differences between musicians and non-musicians. Heschl's gyrus, the superior temporal gyrus and the planum temporale have been identified as key areas in tonal discrimination, linked to little-explored regions, along with functional changes among musicians, musicians with absolute pitch and non-musicians. We suggest integrating areas related to tonal discrimination, applying paradigms that assess cognitive functions in musicians in order to identify differences in specific functional capacities, and expanding studies of musicians with and without absolute pitch to analyze possible differences in the central nervous system.
Article
Psychophysical methods were used to study the features of identifying the gender of a speaker based on voice characteristics under conditions of speech-like interference and stimulation through headphones. We used a set of speech signals and multi-talker noise from experiments in a free sound field – a spatial scene (Andreeva et al., 2019). The set included 8 disyllabic words spoken by 4 speakers: 2 male and 2 female voices with average fundamental frequencies of 117, 139, 208 and 234 Hz. The multi-talker noise was produced by mixing all audio files (8 words × 4 speakers). The signal-to-noise ratio was 1:1, which subjectively corresponded to the maximum noise level in the spatial scene (SNR = –14 dB). Adult subjects from 17 to 57 years old (n = 42) participated in the experiments. Additionally, 3 age subgroups were identified: 18.6±1.5 years (n = 27); 28±4.1 years (n = 7); 46±5.4 years (n = 8). All subjects had normal hearing. The results of the study, and their comparison with the data of the aforementioned work, confirmed the importance of voice characteristics for the auditory analysis of complex spatial (free sound field) and non-spatial (headphones) scenes, and also demonstrated the role of masking and binaural perception mechanisms, in particular the high-frequency mechanism of spatial hearing. A relation between the perceptual assessment of gender by voice in noise, the age of the subjects, and the gender of the speakers (male/female voice) was also found. The results are of practical importance for the organization of hearing-speech training, the early detection of impairments in the noise immunity of speech hearing, and the development of noise-resistant systems for automatic speaker verification and hearing aid technologies.
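A back-of-envelope decibel check, assuming the 32 mixed recordings (8 words × 4 speakers) add approximately in power; the abstract itself does not spell this arithmetic out. For amplitudes,

$$ \mathrm{SNR}_{\mathrm{dB}} = 20\log_{10}\!\left(\frac{A_{\mathrm{signal}}}{A_{\mathrm{noise}}}\right), $$

so a 1:1 amplitude ratio against the full mixture corresponds to 0 dB, while a single word at the level of one mixture component sits near $20\log_{10}(1/\sqrt{32}) \approx -15\ \mathrm{dB}$, in line with the −14 dB quoted for the spatial scene.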
Article
Background and Purpose: The dichotic listening test evaluates not only the asymmetry of the auditory system but also the associated cognitive processes. The aim of this study was to examine the effect of male and female voices on dichotic listening. Methods: Participants consisted of 10 men and 14 women, aged 18–45 (28.54±6.23), without neurological or auditory disorders. The dichotic listening test was administered in four different sessions: female voices in both ears; male voices in both ears; male and female voices in both ears (mono session); and male voices in the right ear with female voices in the left ear, followed by female voices in the right ear with male voices in the left ear (stereo session). Results: Brain lateralization was significantly reduced in the female session compared with the stereo session; in other words, right ear dominance decreased (p=0.026, d=-0.293). There was also a significant difference in the number of errors between the male and stereo sessions. Conclusion: Participants preferred the syllables voiced with a male voice more in the mono and stereo sessions. Female participants mostly preferred syllables voiced with a male voice, and male participants preferred syllables voiced with a female voice.
Article
Gender characteristics of speech, such as prosody and linguistic style, influence how a person is understood. We investigated their electrophysiological correlates. Twenty-four people listened to sentences spoken by a female who manipulated her prosody by using a feminine and an imitated masculine voice, and her linguistic style by using function words differentially associated with males and females. Event-related potentials unexpectedly showed a larger N400 elicited by imitated masculine compared to feminine prosody. A prosody by linguistic style interaction was also found in late positive components and a later window, where sentences congruent with speaker sex and gender (i.e. feminine prosody, linguistic style, and voice) were more negative-going than sentences that were not. Further results showed less upper-alpha (∼10–13 Hz) event-related desynchronisation with imitated masculine compared to feminine prosody in a late time-window. These results suggest that gender-atypical speech affects early semantic processing and reduces later semantic processing.
Chapter
Gender detection from a speech signal is a significant task, as it is a preliminary step in building a system to identify a person. Gender detection from speech is comparatively easier than detection from facial changes, and it is also required for security purposes. There are many approaches to gender detection: facial-image based, fingerprint based, and so on. For simplicity, in the present work we offer a simple low-level feature-based scheme for detecting the gender of a speaker. We work with pitch-based acoustic features to identify gender. Neural network (NN), Naïve Bayes, and random forest classifiers were employed for the classification task. The experimental results and comparative analysis demonstrate the strength of the proposed feature set.
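The chapter's pipeline (pitch-based acoustic features fed to NN, Naïve Bayes and random forest classifiers) can be sketched end-to-end with one of those classifiers. The feature choice (median f0 and its spread) and the synthetic data below are illustrative assumptions, not the chapter's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical per-utterance features: [median f0 (Hz), f0 standard deviation (Hz)].
# Male utterances are drawn around 120 Hz and female around 210 Hz, typical ranges.
male = np.column_stack([rng.normal(120, 15, 100), rng.normal(20, 5, 100)])
female = np.column_stack([rng.normal(210, 20, 100), rng.normal(25, 5, 100)])
X = np.vstack([male, female])
y = np.array([0] * 100 + [1] * 100)  # 0 = male, 1 = female

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # near-perfect on this separable toy data
```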
Thesis
Full-text available
DISSERTATION ABSTRACT This dissertation examines electronic and electroacoustic music written after 1950 with a focus on works to which composers attribute erotic connotations. It explores the ways in which music presents eroticism, how composers envision a musically erotic subject as well as what listeners find aurally stimulating or provocative about music. Compositions by Pierre Schaeffer, Luc Ferrari, Robert Normandeau, Annea Lockwood, Alice Shields, Barry Truax, Pauline Oliveros, Juliana Hodkinson, and Niels Rønsholdt exhibit common musical idioms, such as the drive to climax, use of the female voice, and visual or textual imagery. But beyond these commonalities, the dissertation's framing theoretical, critical, and philosophical analyses prove each work exhibits erotic qualities particular to its social, historical, and music-compositional climate. Early works aspire toward a Husserlian essence of the erotic, paralleling the scientific objectivity of the 1950s; in the 1980s and '90s, many erotic works deemphasize male sexual pleasure to mirror second-wave feminist critiques of pornography; and, on the heels of this corrective, composers at the turn of the twenty-first century use digital processing to reorient gender and sexual markers. Reacting to electronic music's historical disregard for gender and sexual difference, this dissertation exposes the philosophical, psychological, socio-cultural, and historical relevance of eroticism in electronic and electroacoustic works. Key words: new musicology, music theory, cultural musicology, female voice, music and sexuality, gender, electronic music, electroacoustic music, computer music, analysis of electroacoustic music, erotic art, representation, philosophy, pornography, digital signal processing, Pierre Schaeffer, Luc Ferrari, Robert Normandeau, Annea Lockwood, Alice Shields, Barry Truax, Pauline Oliveros, Juliana Hodkinson, Niels Rønsholdt *** "Sexus erklingt: Erotische Strömungen in der elektronischen Musik" ["Sex Resounds: Erotic Currents in Electronic Music"] by Danielle Sofer BA MMus MA. DISSERTATION ABSTRACT: This dissertation examines electronic and electroacoustic pieces composed after 1950 to which their composers have ascribed erotic connotations. It investigates how eroticism is represented in music, how composers have imagined the musically erotic subject, and what listeners perceive as stimulating or provocative in this music. Compositions by Pierre Schaeffer, Luc Ferrari, Robert Normandeau, Annea Lockwood, Alice Shields, Barry Truax, Pauline Oliveros, Juliana Hodkinson, and Niels Rønsholdt exhibit common musical idioms that are typically used to evoke eroticism in music, such as the striving toward climax, the use of the female voice, and visual or textual imagery. Beyond these commonalities, the theoretical, critical, and philosophical analysis of this dissertation further demonstrates that each work possesses particular erotic qualities specific to its social, historical, and compositional environment. Early compositions strive toward a Husserlian essence of the erotic, parallel to the scientific objectivity of the 1950s. Many erotic pieces of the 1980s and 1990s attenuate the markers of male sexual pleasure in order to mirror the second-wave feminist critique of pornography. Shortly after this corrective, at the threshold of the twenty-first century, composers have used digital processing methods to reorient relations of gender and sexuality. As a reaction to the historical indifference to gender and sexual difference in electronic music, this dissertation reveals the philosophical, psychological, socio-cultural, and historical relevance of eroticism in electronic and electroacoustic pieces.
Chapter
Mounting evidence shows that biological sex and gender impact how our bodies and brains work. As our traditional scientific model, heavily influenced by misguided policies and ingrained cultures, is rooted in the belief that males and females are interchangeable outside of our reproductive zones, it's time for a scientific reboot. Depending upon the context, our chromosomes, hormones, and life experiences affect our lives in ways that range from the inconsequential to the critically important. To practice up-to-date medicine and optimize our own resiliency, it's important to understand and openly discuss these very real differences. This introductory chapter is designed as a "sex and gender boot camp": it reviews basic definitions, explores clinical and professional examples of sex and gender differences, provides a template for framing differences, and shares the author's personal experiences in discovering this material and using it to become more resilient.
Article
Full-text available
The role of the left and right hemispheres in processing the gender of voices is controversial: some evidence suggests bilateral involvement, while other evidence suggests a right-hemispheric superiority. We investigated this issue in a gender categorization task involving healthy participants and a male split-brain patient: female or male natural voices were presented in one ear during the simultaneous presentation of white noise in the other ear (dichotic listening paradigm). Results revealed faster responses by the healthy participants for stimuli presented in the left than in the right ear, although no asymmetries emerged between the two ears in the accuracy of either the patient or the control group. Healthy participants were also more accurate at categorizing female than male voices, and an opposite-gender bias emerged, at least in females, with faster responses in categorizing voices of the opposite gender. The results support a bilateral hemispheric involvement in voice gender categorization, without asymmetries in the patient, but with faster categorization when voices are presented directly to the right hemisphere in the healthy sample. Moreover, when the two hemispheres interact directly with one another, a faster categorization of voices of the opposite gender emerges, which may be an evolutionarily grounded bias.
Thesis
Full-text available
KEYWORDS: phenomenology, voice, sound, presence, embodiment, artistic research How can voice be understood as a theme for philosophical inquiry? Using mostly phenomenological strategies and approximations, this thesis attempts to map a transdisciplinary constellation of nexus points where voice emerges as an expressive manifestation of presence and living processes. Simultaneously situated, bodily, and transgressive in the context of the notion of acoustic territories, the ambiguity of the voice, and its potential as phenomenon, concept, and tangible resonance of subjectivity, are explored via an inquiry informed by Ancient Philosophy, Sound Studies, Acoustics, Phenomenology, and Artistic Research.
Conference Paper
Automatic gender recognition is becoming increasingly important for a range of applications. Many state-of-the-art gender recognition approaches based on a variety of biometrics, such as face, body shape, and voice, have been proposed recently. Among these, relying on voice is suboptimal because of significant variations in pitch, emotion, and noise in real-world speech. Inspired by speaker recognition approaches relying on the i-vector representation in the NIST Speaker Recognition Evaluation (SRE), we assume that an i-vector contains information about gender as part of a speaker's characteristics, and that it works for gender recognition in complex environments as well as it does for speaker recognition. We therefore apply total variability space analysis to gender classification and propose i-vector-based discrimination for speaker gender recognition. Experiments on the TIMIT corpus and the NUST603_2014 database show that the proposed i-vector-based speaker gender recognition achieves accuracy of up to 99.9%, surpassing the pitch-based method and the UBM-SVM baseline subsystem.
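As a rough illustration of the classification stage described above, here is a minimal sketch that trains a linear SVM on precomputed i-vector embeddings. The i-vector front end itself (UBM training and total variability estimation) is assumed to exist elsewhere; the synthetic Gaussian data below merely stands in for real embeddings, so the printed accuracy is not comparable to the paper's.

```python
# Minimal sketch: gender classification from precomputed i-vectors.
# Assumes an external front end has already extracted the embeddings;
# synthetic Gaussian data stands in for real i-vectors here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
dim, n_per_class = 400, 200  # typical i-vector dimensionality (assumed)
X = np.vstack([
    rng.normal(-0.3, 1.0, (n_per_class, dim)),  # stand-in "male" i-vectors
    rng.normal(+0.3, 1.0, (n_per_class, dim)),  # stand-in "female" i-vectors
])
y = np.repeat([0, 1], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```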
Article
Full-text available
Normal listeners effortlessly determine a person's gender by voice, but the cerebral mechanisms underlying this ability remain unclear. Here, we demonstrate 2 stages of cerebral processing during voice gender categorization. Using voice morphing along with an adaptation-optimized functional magnetic resonance imaging design, we found that secondary auditory cortex, including the anterior part of the temporal voice areas in the right hemisphere, responded primarily to acoustical distance from the previously heard stimulus. In contrast, a network of bilateral regions involving inferior prefrontal and anterior and posterior cingulate cortex reflected perceived stimulus ambiguity. These findings suggest that voice gender recognition involves neuronal populations along the auditory ventral stream responsible for auditory feature extraction, operating in tandem with the prefrontal cortex in voice gender perception.
Article
Full-text available
Voice gender perception can be thought of as a mixture of low-level perceptual feature extraction and higher-level cognitive processes. Although it seems apparent that voice gender perception would rely on low-level pitch analysis, many lines of research suggest that this is not the case. Indeed, voice gender perception has been shown to rely on timbre perception and to be categorical, i.e., to depend on accessing a gender model or representation. Here, we used a unique combination of acoustic stimulus manipulation and mathematical modeling of human categorization performance to determine the relative contributions of pitch and timbre to this process. Contrary to the idea that voice gender perception relies on timbre only, we demonstrate that voice gender categorization can be performed using pitch only, but, more importantly, that pitch is used only when timbre information is ambiguous (i.e., for more androgynous voices).
Article
Full-text available
Gender is salient, socially critical information obtained from faces and voices, yet the brain processes underlying gender discrimination have not been well studied. We investigated neural correlates of gender processing of voices in two ERP studies. In the first, ERP differences were seen between female and male voices starting at 87 ms, in both spatial-temporal and peak analyses, particularly on the fronto-central N1 and P2. As pitch differences may drive gender differences, the second study used normal, high- and low-pitch voices. The results of these studies suggested that differences in pitch produced early effects (27-63 ms). Gender effects were seen on N1 (120 ms) with implicit pitch processing (study 1) but not with manipulations of pitch (study 2), demonstrating that N1 was modulated by attention. P2 (between 170 and 230 ms) discriminated male from female voices independent of pitch. Thus, these data show that there are two stages in voice gender processing: a very early pitch or frequency discrimination, and a later, more accurate determination of gender at the P2 latency.
Article
Full-text available
Key features of the voice, fundamental frequency (F0) and formant frequencies (Fn), can vary extensively among individuals. Some of this variation might cue fitness-related, biosocial dimensions of speakers. Three experiments tested the independent, joint, and relative effects of F0 and Fn on listeners' assessments of the body size, masculinity (or femininity), and attractiveness of male and female speakers. Experiment 1 replicated previous findings concerning the joint and independent effects of F0 and Fn on these assessments. Experiment 2 established frequency discrimination thresholds (just-noticeable differences, JNDs) for both vocal features to use in subsequent tests of their relative salience. JNDs for F0 and Fn were consistent, in the range of 5%-6% for each sex. Experiment 3 put the two voice features in conflict by equally discriminable amounts and found that listeners consistently tracked Fn over F0 in rating all three dimensions. Several non-exclusive possibilities for this outcome are considered, including that voice Fn provides more reliable cues to one or more dimensions and that listeners' assessments of the different dimensions are partially interdependent. Results highlight the value of first establishing JNDs for discrimination of specific features of natural voices in future work examining their effects on voice-based social judgments.
Article
Full-text available
In the mature adult brain, there are voice selective regions that are especially tuned to familiar voices. Yet, little is known about how the infant's brain treats such information. Here, we investigated, using electrophysiology and source analyses, how newborns process their mother's voice compared with that of a stranger. Results suggest that, shortly after birth, newborns distinctly process their mother's voice at an early preattentional level and at a later presumably cognitive level. Activation sources revealed that exposure to the maternal voice elicited early language-relevant processing, whereas the stranger's voice elicited more voice-specific responses. A central probably motor response was also observed at a later time, which may reflect an innate auditory-articulatory loop. The singularity of left-dominant brain activation pattern together with its ensuing sustained greater central activation in response to the mother's voice may provide the first neurophysiologic index of the preferential mother's role in language acquisition.
Article
Full-text available
This paper investigates the theoretical basis for estimating vocal-tract length (VTL) from the formant frequencies of vowel sounds. A statistical inference model was developed to characterize the relationship between vowel type and VTL, on the one hand, and formant frequency and vocal cavity size, on the other. The model was applied to two well known developmental studies of formant frequency. The results show that VTL is the major source of variability after vowel type and that the contribution due to other factors like developmental changes in oral-pharyngeal ratio is small relative to the residual measurement noise. The results suggest that speakers adjust the shape of the vocal tract as they grow to maintain a specific pattern of formant frequencies for individual vowels. This formant-pattern hypothesis motivates development of a statistical-inference model for estimating VTL from formant-frequency data. The technique is illustrated using a third developmental study of formant frequencies. The VTLs of the speakers are estimated and used to provide a more accurate description of the complicated relationship between VTL and glottal pulse rate as children mature into adults.
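As a rough point of reference for the relationship this paper models statistically, the sketch below estimates VTL from measured formants using the textbook uniform-tube approximation (formants of a tube closed at the glottis fall at odd multiples of c/4L). This simplification, and the example values, are assumptions of the sketch, not the paper's statistical-inference model.

```python
# Rough VTL estimate from formant frequencies via the uniform-tube
# approximation: Fn = (2n - 1) * c / (4 * L), so L = (2n - 1) * c / (4 * Fn).
# The speed of sound and example formants are illustrative assumptions.
SPEED_OF_SOUND = 35000.0  # cm/s in warm, moist air

def estimate_vtl(formants_hz):
    """Average the quarter-wave length estimates over the given formants."""
    estimates = [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * f)
                 for n, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Typical adult-male formants for a neutral (schwa-like) vowel:
print(f"VTL ~ {estimate_vtl([500.0, 1500.0, 2500.0]):.1f} cm")  # ~17.5 cm
```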
Article
Full-text available
By sucking on a nonnutritive nipple in different ways, a newborn human could produce either its mother's voice or the voice of another female. Infants learned how to produce the mother's voice and produced it more often than the other voice. The neonate's preference for the maternal voice suggests that the period shortly after birth may be important for initiating infant bonding to the mother.
Article
Full-text available
The purpose of this study was to replicate and extend the classic study of vowel acoustics by Peterson and Barney (PB) [J. Acoust. Soc. Am. 24, 175-184 (1952)]. Recordings were made of 45 men, 48 women, and 46 children producing the vowels /i, ɪ, e, ɛ, æ, ɑ, ɔ, o, ʊ, u, ʌ, ɝ/ in h-V-d syllables. Formant contours for F1-F4 were measured from LPC spectra using a custom interactive editing tool. For comparison with the PB data, formant patterns were sampled at a time that was judged by visual inspection to be maximally steady. Analysis of the formant data shows numerous differences between the present data and those of PB, both in terms of average frequencies of F1 and F2 and in the degree of overlap among adjacent vowels. As with the original study, listening tests showed that the signals were nearly always identified as the vowel intended by the talker. Discriminant analysis showed that the vowels were more poorly separated than in the PB data when based on a static sample of the formant pattern. However, the vowels can be separated with a high degree of accuracy if duration and spectral change information is included.
Article
Full-text available
The perceptual representation of voice gender was examined with two experimental paradigms: identification/discrimination and selective adaptation. The results from the identification and discrimination of a synthetic male-female voice continuum indicated that voice gender perception was not categorical. In addition, results from selective adaptation experiments with natural and synthetic voice stimuli indicated that the representation of voice that is adapted is auditory based. Overall, these findings suggest that the perceptual representation of voice gender is auditory based and is qualitatively different from the representation of phonetic information.
Article
Full-text available
In the present experiment, 25 adult subjects discriminated speech tokens ([ba]/[da]) or made pitch judgments on tone stimuli (rising/falling) under both binaural and dichotic listening conditions. We observed that when listeners performed tasks under the dichotic conditions, during which greater demands are made on auditory selective attention, activation within the posterior (parietal) attention system and at primary processing sites in the superior temporal and inferior frontal regions was increased. The cingulate gyrus within the anterior attention system was not influenced by this manipulation. Hemispheric differences between speech and nonspeech tasks were also observed, both at Broca's Area within the inferior frontal gyrus and in the middle temporal gyrus.
Article
Full-text available
Magnetic resonance imaging was used to quantify the vocal tract morphology of 129 normal humans, aged 2-25 years. Morphometric data, including midsagittal vocal tract length, shape, and proportions, were collected using computer graphic techniques. There was a significant positive correlation between vocal tract length and body size (either height or weight). The data also reveal clear differences in male and female vocal tract morphology, including changes in overall vocal tract length and the relative proportions of the oral and pharyngeal cavity. These sex differences are not evident in children, but arise at puberty, suggesting that they are part of the vocal remodeling process that occurs during puberty in males. These findings have implications for speech recognition, speech forensics, and the evolution of the human speech production system, and provide a normative standard for future studies of human vocal tract morphology and development.
Article
Full-text available
We measured the neural activity associated with the temporal structure of sound in the human auditory pathway from cochlear nucleus to cortex. The temporal structure includes regularities at the millisecond level and pitch sequences at the hundreds-of-milliseconds level. Functional magnetic resonance imaging (fMRI) of the whole brain with cardiac triggering allowed simultaneous observation of activity in the brainstem, thalamus and cerebrum. This work shows that the process of recoding temporal patterns into a more stable form begins as early as the cochlear nucleus and continues up to auditory cortex.
Article
Full-text available
We used positron emission tomography to examine the response of human auditory cortex to spectral and temporal variation. Volunteers listened to sequences derived from a standard stimulus, consisting of two pure tones separated by one octave alternating with a random duty cycle. In one series of five scans, spectral information (tone spacing) remained constant while speed of alternation was doubled at each level. In another five scans, speed was kept constant while the number of tones sampled within the octave was doubled at each level, resulting in increasingly fine frequency differences. Results indicated that (i) the core auditory cortex in both hemispheres responded to temporal variation, while the anterior superior temporal areas bilaterally responded to the spectral variation; and (ii) responses to the temporal features were weighted towards the left, while responses to the spectral features were weighted towards the right. These findings confirm the specialization of the left-hemisphere auditory cortex for rapid temporal processing, and indicate that core areas are especially involved in these processes. The results also indicate a complementary hemispheric specialization in right-hemisphere belt cortical areas for spectral processing. The data provide a unifying framework to explain hemispheric asymmetries in processing speech and tonal patterns. We propose that differences exist in the temporal and spectral resolution of corresponding fields in the two hemispheres, and that they may be related to anatomical hemispheric asymmetries in myelination and spacing of cortical columns.
Article
Full-text available
We used functional imaging of normal subjects to identify the neural substrate for the perception of voices in external auditory space. This fundamental process can be abnormal in psychosis, when voices that are not true external auditory objects (auditory verbal hallucinations) may appear to originate in external space. The perception of voices as objects in external space depends on filtering by the outer ear. Psychoses that distort this process involve the cerebral cortex. Functional magnetic resonance imaging was carried out on 12 normal subjects using an inside-the-scanner simulation of 'inside head' and 'outside head' voices in the form of typical auditory verbal hallucinations. Comparison between the brain activity associated with the two conditions allowed us to test the hypothesis that the perception of voices in external space ('outside head') is subserved by a temporoparietal network comprising association auditory cortex posterior to Heschl's gyrus [planum temporale (PT)] and inferior parietal lobule. Group analyses of response to 'outside head' versus 'inside head' voices showed significant activation solely in the left PT. This was demonstrated in three experiments in which the predominant lateralization of the stimulus was to the right, to the left or balanced. These findings suggest a critical involvement of the left PT in the perception of voices in external space that is not dependent on precise spatial location. Based on this, we suggest a model for the false perception of externally located auditory verbal hallucinations.
Article
Full-text available
Pitch, one of the primary auditory percepts, is related to the temporal regularity or periodicity of a sound. Previous functional brain imaging work in humans has shown that the level of population neural activity in centers throughout the auditory system is related to the temporal regularity of a sound, suggesting a possible relationship to pitch. In the current study, functional magnetic resonance imaging was used to measure activation in response to harmonic tone complexes whose temporal regularity was identical, but whose pitch salience (or perceptual pitch strength) differed, across conditions. Cochlear nucleus, inferior colliculus, and primary auditory cortex did not show significant differences in activation level between conditions. Instead, a correlate of pitch salience was found in the neural activity levels of a small, spatially localized region of nonprimary auditory cortex, overlapping the anterolateral end of Heschl's gyrus. The present data contribute to converging evidence that anterior areas of nonprimary auditory cortex play an important role in processing pitch.
Article
Full-text available
The size of a resonant source can be estimated by the acoustic-scale information in the sound [1-3]. Previous studies revealed that posterior superior temporal gyrus (STG) responds to acoustic scale in human speech when it is controlled for spectral-envelope change (unpublished data). Here we investigate whether the STG activity is specific to the processing of acoustic scale in human voice or whether it reflects a generic mechanism for the analysis of acoustic scale in resonant sources. In two functional magnetic resonance imaging (fMRI) experiments, we measured brain activity in response to changes in acoustic scale in different categories of resonant sound (human voice, animal call, and musical instrument). We show that STG is activated bilaterally for spectral-envelope changes in general; it responds to changes in category as well as acoustic scale. Activity in left posterior STG is specific to acoustic scale in human voices and not responsive to acoustic scale in other resonant sources. In contrast, the anterior temporal lobe and intraparietal sulcus are activated by changes in acoustic scale across categories. The results imply that the human voice requires special processing of acoustic scale, whereas the anterior temporal lobe and intraparietal sulcus process auditory size information independent of source category.
Article
Full-text available
The consistent, but often wrong, impressions people form of the size of unseen speakers are not random but rather point to a consistent misattribution bias, one that the advertising, broadcasting, and entertainment industries also routinely exploit. The authors report 3 experiments examining the perceptual basis of this bias. The results indicate that, under controlled experimental conditions, listeners can make relative size distinctions between male speakers using reliable cues carried in voice formant frequencies (resonant frequencies, or timbre) but that this ability can be perturbed by discordant voice fundamental frequency (F0, or pitch) differences between speakers. The authors introduce 3 accounts for the perceptual pull that voice F0 can exert on our routine (mis)attributions of speaker size and consider the role that voice F0 plays in additional voice-based attributions that may or may not be reliable but that have clear size connotations.
Article
Full-text available
Attention is one of the thorniest problems faced by cognitive scientists (Allport, 1989; Kahneman, 1973; Posner and Boies, 1971; Schneider and Shiffrin, 1977; Wickens, 1980). When considering attention at the cognitive level, we must consider the following functional issues. First, an individual operating in an environment is bombarded by a vast array of perceptual inputs simultaneously and must, in order to function effectively, somehow select certain things for enhanced processing while ignoring others (Allport, 1989; Posner, 1991). (Selective attention may, of course, be further subdivided into operations such as disengagement from a current focus, engagement of a new focus, and sustained focal attention over time (Posner and Peterson, 1990; Posner, 1991).) Second, there appear to be limits on the number of things that can be processed simultaneously; that is, a bottleneck or capacity limitation exists on the ability to divide attention between multiple stimuli or mental events. Much early laboratory…
Article
Full-text available
We present a straightforward and robust algorithm for periodicity detection, working in the lag (autocorrelation) domain. When it is tested on periodic signals and on signals with additive noise or jitter, it proves to be several orders of magnitude more accurate than the methods commonly used for speech analysis. This makes our method capable of measuring harmonics-to-noise ratios in the lag domain with an accuracy and reliability much greater than that of any of the usual frequency-domain methods. By definition, the best candidate for the acoustic pitch period of a sound can be found from the position of the maximum of the autocorrelation function of the sound, while the degree of periodicity (the harmonics-to-noise ratio) of the sound can be found from the relative height of this maximum. However, sampling and windowing cause problems in accurately determining the position and height of the maximum. These problems have led to inaccurate time-domain and cepstral methods for pitch measurement.
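The core of the method, reading pitch from the position of the autocorrelation maximum and the harmonics-to-noise ratio from its height, after compensating for the analysis window, can be sketched in a few lines. The frame size, pitch floor/ceiling, and test signal below are illustrative assumptions, not the paper's settings.

```python
# Sketch of lag-domain pitch and HNR estimation in the spirit of the method
# above: divide the windowed signal's autocorrelation by the window's own
# autocorrelation, then read pitch from the position of the maximum and the
# harmonics-to-noise ratio from its height.
import numpy as np

def pitch_and_hnr(frame, fs, f0_min=75.0, f0_max=500.0):
    frame = frame - frame.mean()
    n = len(frame)
    window = np.hanning(n)
    # Autocorrelations via full correlation, keeping non-negative lags.
    r_xw = np.correlate(frame * window, frame * window, mode="full")[n - 1:]
    r_w = np.correlate(window, window, mode="full")[n - 1:]
    r = (r_xw / r_w) / (r_xw[0] / r_w[0])   # window-corrected, r[0] == 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    lag = lo + int(np.argmax(r[lo:hi]))
    peak = min(r[lag], 0.999)               # clamp to keep the log finite
    hnr_db = 10.0 * np.log10(peak / (1.0 - peak))
    return fs / lag, hnr_db

fs = 16000
t = np.arange(int(0.04 * fs)) / fs           # one 40 ms frame
frame = np.sign(np.sin(2 * np.pi * 150 * t)) + 0.01 * np.random.randn(len(t))
print(pitch_and_hnr(frame, fs))              # ~150 Hz and a high HNR
```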
Article
Visual or auditory hallucinations may accompany epileptic seizures or auras. Penfield reproduced such experiences by electrical stimulation of the temporal lobe cortex. We have found that similar hallucinatory experiences may arise from subcortical stimulation of the temporal lobe. The study of these events yields information pertinent to the processes involved in perception, imagery formation, and memory.
Article
There has recently been a series of studies concerning the interaction of glottal pulse rate (GPR) and mean formant frequency (MFF) in the perception of speaker characteristics and speech recognition. This paper extends the research by comparing the recognition and discrimination performance achieved with voiced words to that achieved with whispered words. The recognition experiment shows that performance with whispered words is slightly worse than with voiced words at all MFFs when the GPR of the voiced words is in the middle of the normal range. But, as GPR decreases below this range, voiced-word performance decreases and eventually becomes worse than whispered-word performance. The discrimination experiment shows that the just noticeable difference (JND) for MFF is essentially independent of the mode of vocal excitation; the JND is close to 5% for both voiced and voiceless words for all speaker types. The interaction between GPR and vocal tract length (VTL, of which MFF is the acoustic correlate) is interpreted in terms of the stability of the internal representation of speech, which improves with GPR across the range of values used in these experiments.
Article
Dichotic listening means that two different stimuli are presented at the same time, one in each ear. This technique is frequently used in experimental and clinical studies as a measure of hemispheric specialization. The primary aim of the present study was to record regional changes in the distribution of cerebral blood flow (CBF) with the 15O-PET technique to dichotically presented consonant-vowel (CV) and musical instrument stimuli, in order to test the basic assumption of differential hemispheric involvement when stimuli presented to one ear dominate over stimuli presented in the other ear. All stimuli were 380 ms in duration with a 1000 ms interstimulus interval, and were presented in blocks of either CV-syllable or musical instrument pairs. Twelve normal healthy subjects had to press a button whenever they detected a CV-syllable or a musical instrument target in a stream of CV- and musical instrument distractor stimuli. The targets appeared equally often in the right and left ear channel. The CV-syllable and musical instrument targets activated bilateral areas in the superior temporal gyri. However, there were significant interactions with regard to asymmetry of the magnitude of peak activation in the significant activation clusters. The CV-syllables resulted in greater neural activation in the left temporal lobe while the musical instruments resulted in greater neural activation in the right temporal lobe. Within-subjects correlations between magnitude of dichotic listening and CBF asymmetry were, however, non-significant. The changes in neural activation were closely mimicked by the performance data which showed a right ear superiority in response accuracy for the CV-syllables, and a left ear superiority for the musical instruments. In addition to the temporal lobe activations, there were activation tendencies in the left inferior frontal lobe, right dorsolateral prefrontal cortex, left occipital lobe, and cerebellum.
Article
We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex.
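The dilation-invariance at the heart of the Mellin transform can be illustrated with a toy computation: resampling a signal onto a logarithmic time axis turns dilation into a shift, so the Fourier magnitude along log-time is approximately unchanged. The envelope shapes and parameters below are assumptions of this sketch, which illustrates only the principle, not the paper's auditory-image Mellin machinery.

```python
# Toy demonstration of Mellin-style scale invariance: after resampling onto
# a log-time axis, dilation becomes a shift, so the Fourier magnitude along
# log-time barely changes. Signals and parameters are illustrative only.
import numpy as np

def log_time_magnitude(envelope, t, n_points=512):
    """Resample envelope(t) uniformly in log(t), then take |FFT|."""
    log_t = np.linspace(np.log(t[0]), np.log(t[-1]), n_points)
    resampled = np.interp(np.exp(log_t), t, envelope)
    return np.abs(np.fft.rfft(resampled))

t = np.linspace(1e-3, 1.0, 4000)        # avoid t = 0 for the log axis
impulse = np.exp(-t / 0.05)             # decaying "vocal tract" envelope
dilated = np.exp(-t / (0.05 / 1.5))     # same shape on a 1.5x shorter scale

m1 = log_time_magnitude(impulse, t)
m2 = log_time_magnitude(dilated, t)
print(np.corrcoef(m1, m2)[0, 1])        # close to 1: nearly scale-invariant
```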
Article
We review in a common framework several algorithms that have been proposed recently to improve the voice quality of text-to-speech synthesis based on acoustic-unit concatenation (Charpentier and Moulines, 1988; Moulines and Charpentier, 1988; Hamon et al., 1989). These algorithms rely on a pitch-synchronous overlap-add (PSOLA) approach for modifying speech prosody and concatenating speech waveforms. The modifications of the speech signal are performed either in the frequency domain (FD-PSOLA), using the Fast Fourier Transform, or directly in the time domain (TD-PSOLA), depending on the length of the window used in the synthesis process. The frequency-domain approach offers great flexibility in modifying the spectral characteristics of the speech signal, while the time-domain approach provides very efficient solutions for real-time implementation of synthesis systems. We also discuss the different kinds of distortion involved in these algorithms.
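To make the time-domain variant concrete, here is a toy TD-PSOLA-style time stretch under strong simplifying assumptions: the input is perfectly periodic with a known, constant pitch period, so analysis pitch marks fall on exact multiples of that period (real systems must estimate pitch marks from the speech itself).

```python
# Toy TD-PSOLA time-stretch: Hann-windowed two-period grains are taken at
# analysis epochs and overlap-added at synthesis epochs spaced one period
# apart, so duration changes while pitch is preserved. Assumes a perfectly
# periodic input with a known constant period.
import numpy as np

def td_psola_stretch(x, period, stretch):
    n_out = int(len(x) * stretch)
    out = np.zeros(n_out)
    win = np.hanning(2 * period)
    for s in range(period, n_out - period, period):      # synthesis epochs
        # nearest analysis epoch to the time-scaled position s/stretch
        a = int(round(s / stretch / period)) * period
        a = min(max(a, period), len(x) - period)
        out[s - period:s + period] += x[a - period:a + period] * win
    return out

fs, f0 = 16000, 200
period = fs // f0                       # 80 samples per pitch period
t = np.arange(fs // 4) / fs             # 0.25 s test tone
x = np.sin(2 * np.pi * f0 * t)
y = td_psola_stretch(x, period, 1.5)    # 1.5x longer, pitch unchanged
print(len(x) / fs, "s ->", len(y) / fs, "s")
```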
Article
The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In Part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper a detailed, fine analysis of the characteristics of vowels is performed, including formant frequencies, bandwidths, and amplitudes, as well as speaker fundamental frequency of voicing. The fine analysis used a pitch-synchronous closed-phase analysis technique. Detailed formant features, including frequencies, bandwidths, and amplitudes, were extracted by a closed-phase weighted recursive least-squares method with a variable forgetting factor (WRLS-VFF). The electroglottograph signal was used to locate the closed-phase portion of the speech signal. A two-way statistical analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Numerous interesting results were obtained, including the fact that the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for males. The results suggest that redundant gender information is embedded in the fundamental frequency and vocal tract resonance characteristics. The feature vectors for female voices were observed to have higher within-group variation than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
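As a toy analogue of the single-feature comparison reported above (F2 versus F0 as gender recognizers), the sketch below trains one-dimensional linear discriminants on synthetic speaker features. The distribution parameters are rough, assumed textbook-style values rather than the paper's measurements, so the exact accuracies will differ.

```python
# Illustrative comparison of F0-only vs. F2-only gender classification.
# Gaussian parameters below are rough assumed values, not the paper's data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n = 500
# Male then female: F0 in Hz, F2 in Hz (assumed typical means/spreads).
f0 = np.concatenate([rng.normal(120, 20, n), rng.normal(210, 25, n)])
f2 = np.concatenate([rng.normal(1400, 150, n), rng.normal(1700, 180, n)])
y = np.repeat([0, 1], n)  # 0 = male, 1 = female

for name, feat in [("F0", f0), ("F2", f2)]:
    X = feat.reshape(-1, 1)
    acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
    print(f"{name}-only accuracy: {acc:.3f}")
```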
Article
Twenty speakers, diagnosed as male-to-female transsexuals, produced conversational recordings of speech and voice. The samples were submitted to perceptual evaluations and to acoustic analysis by means of a Visi-Pitch, Apple IIe microcomputer system. Transsexuals categorized as having female voices had higher fundamental frequencies (F0), less extensive downward intonations, a higher percentage of upward intonations and downward shifts, and a smaller percentage of level intonations and level shifts than transsexuals categorized as having male voices. The lowest average F0 identified as belonging to a female speaker was 155 Hz. Higher (more feminine) ratings on the masculinity-femininity dimension correlated with F0 (r = .89), percentage of level shifts (r = -.67), percentage of downward shifts (r = .50), percentage of level intonations (r = -.43), and percentage of upward intonations (r = .40). Findings are discussed in terms of the relative perceptual salience of average fundamental frequency and patterns of intonation for female voice quality.
Article
Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape. Two scale factors are proposed that are useful for explaining differences in fundamental frequency, sound power, mean airflow, and glottal efficiency. Fundamental frequency is scaled primarily according to the membranous length of the vocal folds (scale factor of 1.6), whereas mean airflow, sound power, glottal efficiency, and amplitude of vibration include another scale factor (1.2) that relates to overall larynx size. Some explanations are given for observed sex differences in glottographic waveforms. In particular, the simulated (computer-modeled) vocal fold contact area is used to infer male-female differences in the shape of the glottis. The female glottis appears to converge more linearly (from bottom to top) than the male glottis, primarily because of medial surface bulging of the male vocal folds.
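A quick worked check of the first scale factor, using an assumed, round value of 120 Hz for a typical male speaking fundamental (an illustrative figure, not the paper's data): since fundamental frequency scales inversely with membranous vocal fold length, the 1.6 length factor predicts

```latex
% Back-of-envelope check of the membranous-length scale factor
% (120 Hz is an assumed typical male speaking F0, not the paper's data;
% the result lands in the commonly cited adult female range).
\[
  F_{0,\mathrm{female}} \;\approx\; 1.6 \times F_{0,\mathrm{male}}
  \;\approx\; 1.6 \times 120~\mathrm{Hz} \;\approx\; 192~\mathrm{Hz}
\]
```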
Article
The need for a simply applied quantitative assessment of handedness is discussed and some previous forms reviewed. An inventory of 20 items with a set of instructions and response- and computational-conventions is proposed and the results obtained from a young adult population numbering some 1100 individuals are reported. The separate items are examined from the point of view of sex, cultural and socio-economic factors which might appertain to them and also of their inter-relationship to each other and to the measure computed from them all. Criteria derived from these considerations are then applied to eliminate 10 of the original 20 items and the results recomputed to provide frequency-distribution and cumulative frequency functions and a revised item-analysis. The difference of incidence of handedness between the sexes is discussed.
Article
The results of this study reveal that listeners are able to identify speaker sex from isolated productions of whispered vowels.
Article
A comprehensive semi-structured questionnaire was administered to 100 psychotic patients who had experienced auditory hallucinations. The aim was to extend the phenomenology of the hallucination into areas of both form and content and also to guide future theoretical development. All subjects heard 'voices' talking to or about them. The location of the voice, its characteristics and the nature of address were described. Precipitants and alleviating factors plus the effect of the hallucinations on the sufferer were identified. Other hallucinatory experiences, thought insertion and insight were examined for their inter-relationships. A pattern emerged of increasing complexity of the auditory-verbal hallucination over time by a process of accretion, with the addition of more voices and extended dialogues, and more intimacy between subject and voice. Such evolution seemed to relate to the lessening of distress and improved coping. These findings should inform both neurological and cognitive accounts of the pathogenesis of auditory hallucinations in psychotic disorders.
Article
This study investigated the perceptual and acoustical characteristics of vocal presentation in both the masculine and the feminine modes by the same group of male subjects. Listeners (N = 88) evaluated 22 voice samples by using 18 semantic differential scales and 57 adjectives. The 22 voice samples were provided by 11 biologically male speakers, who described themselves as heterosexual crossdressers. Each speaker read a standard passage under controlled conditions. In one reading, they demonstrated their typical masculine voice and in the other they spoke in their feminine voice. Acoustical analyses included mean fundamental frequency, frequency range, overall passage duration, and duration of a sample of stressed vowels. Results indicated that listeners heard significant differences between masculine and feminine presentations across the 11 speakers and the 18 semantic differential scales. Masculine-feminine and high-low pitch were the most salient scales in the perceptual judgments. Acoustical analyses indicated wide variation according to speaker and condition. Clinical applications are provided.
Article
We examined brain activity associated with visual imagery at episodic memory retrieval using positron emission tomography (PET). Twelve measurements of regional cerebral blood flow (rCBF) were taken in six right-handed, healthy, male volunteers. During six measurements, they were engaged in the cued recall of imageable verbal paired associates. During the other six measurements, they recalled nonimageable paired associates. Memory performance was equalized across all word lists. The subjects' use of an increased degree of visual imagery during the recall of imageable paired associates was confirmed using subjective rating scales after each scan. Memory-related imagery was associated with significant activation of a medial parietal area, the precuneus. This finding confirms a previously stated hypothesis about the precuneus and provides strong evidence that it is a key part of the neural substrate of visual imagery occurring in conscious memory recall.
Article
The use of functional magnetic resonance imaging (fMRI) to explore central auditory function may be compromised by the intense bursts of stray acoustic noise produced by the scanner whenever the magnetic resonance signal is read out. We present results evaluating the use of one method to reduce the effect of the scanner noise: "sparse" temporal sampling. Using this technique, single volumes of brain images are acquired at the end of stimulus and baseline conditions. To optimize detection of the activation, images are taken near the maxima and minima of the hemodynamic response during the experimental cycle. Thus, the effective auditory stimulus for the activation is not masked by the scanner noise. In experiment 1, the course of the hemodynamic response to auditory stimulation was mapped during continuous task performance. The mean peak of the response was at 10.5 sec after stimulus onset, with little further change until stimulus offset. In experiment 2, sparse imaging was used to acquire activation images. Despite the fewer samples acquired with sparse imaging, this method successfully delimited broadly the same regions of activation as conventional continuous imaging. However, the mean percentage MR signal change within the region of interest was greater using sparse imaging. Auditory experiments that use continuous imaging methods may measure activation that results from an interaction between the stimulus and task factors (e.g., attentive effort) induced by the intense background noise. We suggest that sparse imaging is advantageous in auditory experiments, as it ensures that the obtained activation depends on the stimulus alone.
Article
The potentially important effect of gradient switching sound on brain function during functional magnetic resonance imaging (fMRI) was studied by comparing experiments with low and high scanner sound levels. To provide a low sound level experiment, a sparse scanning method was used, characterized by long, 9 sec, periods of scanner silence interspersed with 1 sec echoplanar imaging (EPI) bursts. For the condition with high sound levels, extra EPI gradient modules were inserted in the 9 sec inter-image intervals. Visual, motor, or auditory stimuli were presented in the interval between imaging. It was found that with the addition of gradient sounds, auditory activation was significantly decreased while motor and visual activation were not significantly altered. Other general factors relating to fMRI were also examined, such as experimental duration and fatigue. For example, motion of the subjects during the experiments was found to be related to the time spent in the scanner, rather than to the ambient sound level.
Article
Previous positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) studies show that during attentive listening, processing of phonetic information is associated with higher activity in the left auditory cortex than in the right auditory cortex while the opposite is true for musical information. The present PET study determined whether automatically activated neural mechanisms for phonetic and musical information are lateralized. To this end, subjects engaged in a visual word classification task were presented with phonetic sound sequences consisting of frequent (P = 0.8) and infrequent (P = 0.2) phonemes and with musical sound sequences consisting of frequent (P = 0.8) and infrequent (P = 0.2) chords. The phonemes and chords were matched in spectral complexity as well as in the magnitude of frequency difference between the frequent and infrequent sounds (/e/ vs. /o/; A major vs. A minor). In addition, control sequences, consisting of either frequent (/e/; A major) or infrequent sounds (/o/; A minor) were employed in separate blocks. When sound sequences consisted of intermixed frequent and infrequent sounds, automatic phonetic processing was lateralized to the left hemisphere and musical to the right hemisphere. This lateralization, however, did not occur in control blocks with one type of sound (frequent or infrequent). The data thus indicate that automatic activation of lateralized neuronal circuits requires sound comparison based on short-term sound representations.
Article
It is increasingly recognized that the human planum temporale is not a dedicated language processor, but is in fact engaged in the analysis of many types of complex sound. We propose a model of the human planum temporale as a computational engine for the segregation and matching of spectrotemporal patterns. The model is based on segregating the components of the acoustic world and matching these components with learned spectrotemporal representations. Spectrotemporal information derived from such a 'computational hub' would be gated to higher-order cortical areas for further processing, leading to object recognition and the perception of auditory space. We review the evidence for the model and specific predictions that follow from it.
Article
An fMRI experiment was performed to identify the main stages of melody processing in the auditory pathway. Spectrally matched sounds that produce no pitch, fixed pitch, or melody were all found to activate Heschl's gyrus (HG) and planum temporale (PT). Within this region, sounds with pitch produced more activation than those without pitch only in the lateral half of HG. When the pitch was varied to produce a melody, there was activation in regions beyond HG and PT, specifically in the superior temporal gyrus (STG) and planum polare (PP). The results support the view that there is a hierarchy of pitch processing in which the center of activity moves anterolaterally away from primary auditory cortex as the processing of melodic sounds proceeds.
Article
Timbre is a major structuring force in music and one of the most important and ecologically relevant features of auditory events. We used sound stimuli selected on the basis of previous psychophysiological studies to investigate the neural correlates of timbre perception. Our results indicate that both the left and right hemispheres are involved in timbre processing, challenging the conventional notion that the elementary attributes of musical perception are predominantly lateralized to the right hemisphere. Significant timbre-related brain activation was found in well-defined regions of posterior Heschl's gyrus and superior temporal sulcus, extending into the circular insular sulcus. Although the extent of activation was not significantly different between left and right hemispheres, temporal lobe activations were significantly posterior in the left, compared to the right, hemisphere, suggesting a functional asymmetry in their respective contributions to timbre processing. The implications of our findings for music processing in particular and auditory processing in general are discussed.
Article
Objects are the building blocks of experience, but what do we mean by an object? Increasingly, neuroscientists refer to 'auditory objects', yet it is not clear what properties these should possess, how they might be represented in the brain, or how they might relate to the more familiar objects of vision. The concept of an auditory object challenges our understanding of object perception. Here, we offer a critical perspective on the concept and its basis in the brain.
Article
The present functional magnetic resonance imaging (fMRI) study examined the neurophysiological processing of voice information. The impact of the major acoustic parameters as well as the role of the listener's and the speaker's gender were investigated. Male and female, natural, and manipulated voices were presented to 16 young adults who were asked to judge the naturalness of each voice. The hemodynamic responses were acquired by a 3T Bruker scanner utilizing an event-related design. The activation was generally stronger in response to female voices as well as to manipulated voice signals, and there was no interaction with the listener's gender. Most importantly, the results suggest a functional segregation of the right superior temporal cortex for the processing of different voice parameters, whereby (1) voice pitch is processed in regions close and anterior to Heschl's Gyrus, (2) voice spectral information is processed in posterior parts of the superior temporal gyrus (STG) and areas surrounding the planum parietale (PP) bilaterally, and (3) information about prototypicality is predominately processed in anterior parts of the right STG. Generally, by identifying distinct functional regions in the right STG, our study supports the notion of a fundamental role of the right hemisphere in spoken language comprehension.
Article
Harmonic complex tones elicit a pitch sensation at their fundamental frequency (F0), even when their spectrum contains no energy at F0, a phenomenon known as "pitch of the missing fundamental." The strength of this pitch percept depends upon the degree to which individual harmonics are spaced sufficiently apart to be "resolved" by the mechanical frequency analysis in the cochlea. We investigated the resolvability of harmonics of missing-fundamental complex tones in the auditory nerve (AN) of anesthetized cats at low and moderate stimulus levels and compared the effectiveness of two representations of pitch over a much wider range of F0s (110-3,520 Hz) than in previous studies. We found that individual harmonics are increasingly well resolved in rate responses of AN fibers as the characteristic frequency (CF) increases. We obtained rate-based estimates of pitch dependent upon harmonic resolvability by matching harmonic templates to profiles of average discharge rate against CF. These estimates were most accurate for F0s above 400-500 Hz, where harmonics were sufficiently resolved. We also derived pitch estimates from all-order interspike-interval distributions, pooled over our entire sample of fibers. Such interval-based pitch estimates, which are dependent on phase-locking to the harmonics, were accurate for F0s below 1,300 Hz, consistent with the upper limit of the pitch of the missing fundamental in humans. The two pitch representations are complementary with respect to the F0 range over which they are effective; however, neither is entirely satisfactory in accounting for human psychophysical data.
Article
In this functional magnetic resonance imaging (fMRI) study, we investigated the neural basis of mental auditory imagery of familiar complex sounds that did not contain language or music. In the first condition (perception), the subjects watched familiar scenes and listened to the corresponding sounds that were presented simultaneously. In the second condition (imagery), the same scenes were presented silently and the subjects had to mentally imagine the appropriate sounds. During the third condition (control), the participants watched a scrambled version of the scenes without sound. To overcome the disadvantages of the stray acoustic scanner noise in auditory fMRI experiments, we applied a sparse temporal sampling technique with five functional clusters that were acquired at the end of each movie presentation. Compared to the control condition, we found bilateral activations in the primary and secondary auditory cortices (including Heschl's gyrus and planum temporale) during perception of complex sounds. In contrast, the imagery condition elicited bilateral hemodynamic responses only in the secondary auditory cortex (including the planum temporale). No significant activity was observed in the primary auditory cortex. The results show that imagery and perception of complex sounds that do not contain language or music rely on overlapping neural correlates of the secondary but not primary auditory cortex.
Article
In schizophrenia, auditory verbal hallucinations (AVHs) are likely to be perceived as gender-specific. Given that functional neuroimaging correlates of AVHs involve multiple brain regions, principally including auditory cortex, it is likely that the brain regions responsible for attribution of gender to speech are invoked during AVHs. We used functional magnetic resonance imaging (fMRI) and a paradigm utilising 'gender-apparent' (unaltered) and 'gender-ambiguous' (pitch-scaled) male and female voice stimuli to test the hypothesis that male and female voices activate distinct brain areas during gender attribution. The perception of female voices, when compared with male voices, elicited greater activation of the right anterior superior temporal gyrus, near the superior temporal sulcus. Similarly, male voice perception activated the mesio-parietal precuneus area. These different gender associations could not be explained by either simple pitch perception or behavioural response, because the regions we observed were conjointly activated by both 'gender-apparent' and 'gender-ambiguous' voices. The results of this study demonstrate that, in the male brain, the perception of male and female voices activates distinct brain regions.
Article
That auditory hallucinations are voices heard in the absence of external stimuli implies the existence of endogenous neural activity within the auditory cortex responsible for their perception. Further, auditory hallucinations occur across a range of healthy and disease states that include reduced arousal, hypnosis, drug intoxication, delirium, and psychosis. This suggests that, even in health, the auditory cortex has a propensity to spontaneously "activate" during silence. Here we report the findings of a functional MRI study designed to examine baseline activity in speech-sensitive auditory regions. During silence, we show that functionally defined speech-sensitive auditory cortex is characterized by intermittent episodes of significantly increased activity in a large proportion (in some cases >30%) of its volume. Bilateral increases in activity are associated with foci of spontaneous activation in the left primary and association auditory cortices and anterior cingulate cortex. We suggest that, within auditory regions, endogenous activity is modulated by anterior cingulate cortex, resulting in spontaneous activation during silence. Hence, an aspect of the brain's "default mode" resembles a (preprepared) substrate for the development of auditory hallucinations. These observations may help explain why such hallucinations are ubiquitous. Keywords: auditory system, baseline activity, functional MRI.
Article
Perceptual aftereffects following adaptation to simple stimulus attributes (e.g., motion, color) have been studied for hundreds of years. A striking recent discovery was that adaptation also elicits contrastive aftereffects in visual perception of complex stimuli and faces [1-6]. Here, we show for the first time that adaptation to nonlinguistic information in voices elicits systematic auditory aftereffects. Prior adaptation to male voices causes a voice to be perceived as more female (and vice versa), and these auditory aftereffects were measurable even minutes after adaptation. By contrast, crossmodal adaptation effects were absent, both when male or female first names and when silently articulating male or female faces were used as adaptors. When sinusoidal tones (with frequencies matched to male and female voice fundamental frequencies) were used as adaptors, no aftereffects on voice perception were observed. This excludes explanations for the voice aftereffect in terms of both pitch adaptation and postperceptual adaptation to gender concepts and suggests that contrastive voice-coding mechanisms may routinely influence voice perception. The role of adaptation in calibrating properties of high-level voice representations indicates that adaptation is not confined to vision but is a ubiquitous mechanism in the perception of nonlinguistic social information from both faces and voices.