Article

Vocomotor and Social Brain Networks Work Together to Express Social Traits in Voices


Abstract

Voice modulation is important when navigating social interactions: tone of voice in a business negotiation is very different from that used to comfort an upset child. While voluntary vocal behavior relies on a cortical vocomotor network, social voice modulation may require additional social cognitive processing. Using functional magnetic resonance imaging, we investigated the neural basis for social vocal control and whether it involves an interplay of vocal control and social processing networks. Twenty-four healthy adult participants modulated their voice to express social traits along the dimensions of the social trait space (affiliation and competence) or to express body size (control for vocal flexibility). Naïve listener ratings showed that vocal modulations were effective in evoking social trait ratings along the two primary dimensions of the social trait space. Whereas basic vocal modulation engaged the vocomotor network, social voice modulation specifically engaged social processing regions including the medial prefrontal cortex, superior temporal sulcus, and precuneus. Moreover, these regions showed task-relevant modulations in functional connectivity to the left inferior frontal gyrus, a core vocomotor control network area. These findings highlight the importance of integrating vocal motor control and social information processing for socially meaningful voice modulation.




... Together, these studies suggest that differences in functional activation in the VMN might underlie volitional voice modulation efficacy. Beyond the VMN, the precuneus, medial prefrontal cortex (mPFC), and superior temporal sulcus (STS)/temporo-parietal junction (TPJ) support the volitional expression of explicit social traits in the voice (Guldner et al., 2020), as well as explicit (McGettigan et al., 2013) and covert (Brown et al., 2019) vocal identity expression. ...
... Here we present novel individual-differences analyses using functional MRI and speech production data previously described with group-level analyses in Guldner et al. (2020). Data relating to individual social reactivity and trait scales are being reported and analysed here for the first time. ...
... Social traits included vocally expressed intelligence, likeability and hostility. Modulating the voice to express a large body size, as well as speaking in a non-modulated neutral voice, were implemented as control conditions (for detailed instructions on trait expressions see Guldner et al., 2020). Exemplars consisted of four two-syllable, five-letter pseudowords with a C-V-C-V-C (C = consonant, V = vowel) phonotactic structure (belam, lagod, minad, and namil; Frühholz et al., 2015). ...
Preprint
How we use our voice is central to how we express information about ourselves to others. A speaker’s dispositional social reactivity might contribute to how well they can volitionally modulate their voice to manage listener impressions. Here, we investigated individual differences in social vocal control performance in relation to social reactivity indices and underlying neural mechanisms. Twenty-four right-handed speakers of British English (twenty females) modulated their voice to communicate social traits (sounding likeable, hostile, intelligent) while undergoing a rapid-sparse fMRI protocol. Performance in social vocal control was operationalized as the specificity with which speakers evoked trait percepts in an independent group of naïve listeners. Speakers’ empathy levels, as well as psychopathic and Machiavellian traits, were assessed using self-report questionnaires. The ability to express specific social traits in voices was associated with activation in brain regions involved in vocal motor and social processing (left posterior TPJ, bilateral SMG, premotor cortex). While dispositional cognitive empathy predicted general vocal performance, self-reported levels of Machiavellianism were specifically related to better performance in expressing likeability. These findings highlight the psychological and neural mechanisms involved in strategic social voice modulation, suggesting differential processing in a combined network of vocal control and social processing streams.
... McGettigan and colleagues [55] asked participants to overtly express two different dimensions of social information: participants were asked to read non-words aloud and to express social traits of competence (e.g. intelligent) and affiliation (e.g. ...
... By contrast, the expression of social traits (over the nonsocial trait of size) resulted in activations in memory-related circuits such as the left hippocampus and bilateral retrosplenial cortex, visual imagery-related fields such as the left cuneus and precuneus and the bilateral lingual gyri, and semantic fields such as the medial prefrontal cortex and bilateral anterior STS. These regions have all been argued to fall within the social brain network [55]. This is an important study, showing how elements of the social brain network interact with voluntary voice modulation networks to effect vocal change. ...
... Another difference that may be important in these studies is that some instruct the participants to actively change their voices (e.g. the vocal changes used in [50,55]), while others give the participants tasks that will implicitly lead to vocal change, even if the participants are not directly aware of this (e.g. the auditory manipulations of speech production discussed by Meekings & Scott [53], the joint speech task used by Jasmin et al. [56]). What are the implications of these different kinds of task requirements on the patterns of neural activations seen in the VVN? ...
Article
The networks of cortical and subcortical fields that contribute to speech production have benefitted from many years of detailed study, and have been used as a framework for human volitional vocal production more generally. In this article, I will argue that we need to consider speech production as an expression of the human voice in a more general sense. I will also argue that the neural control of the voice can and should be considered to be a flexible system, into which more right hemispheric networks are differentially recruited, based on the factors that are modulating vocal production. I will explore how this flexible network is recruited to express aspects of non-verbal information in the voice, such as identity and social traits. Finally, I will argue that we need to widen out the kinds of vocal behaviours that we explore, if we want to understand the neural underpinnings of the true range of sound-making capabilities of the human voice. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.
... For example, vocal size exaggeration is effective in changing listeners' evaluations of talker height [17], which may provide advantages in competitive situations. Furthermore, recent evidence on social trait expression has shown that talkers can volitionally modulate their speaking voice to generate exaggerated impressions of specific traits in naïve listeners [21]. Beyond the mere demonstration of vocal modulation in humans, it is of interest to investigate how this skill might vary across individuals. ...
... An alternative possibility is that the larynx's intrinsic musculature may be represented neurally in a more fine-grained way linked to ongoing prosodic modulation rather than mean pitch. This argument is supported by recent work using electrocorticography in pre-surgical patients, in which the intonation contour of spoken sentences and sung phrases was tracked by high gamma activity of electrodes located in dorsal LMC [21]. ...
Article
Humans have a remarkable capacity to finely control the muscles of the larynx, via distinct patterns of cortical topography and innervation that may underpin our sophisticated vocal capabilities compared with non-human primates. Here, we investigated the behavioural and neural correlates of laryngeal control, and their relationship to vocal expertise, using an imitation task that required adjustments of larynx musculature during speech. Highly trained human singers and non-singer control participants modulated voice pitch and vocal tract length (VTL) to mimic auditory speech targets, while undergoing real-time anatomical scans of the vocal tract and functional scans of brain activity. Multivariate analyses of speech acoustics, larynx movements and brain activation data were used to quantify vocal modulation behaviour and to search for neural representations of the two modulated vocal parameters during the preparation and execution of speech. We found that singers showed more accurate task-relevant modulations of speech pitch and VTL (i.e. larynx height, as measured with vocal tract MRI) during speech imitation; this was accompanied by stronger representation of VTL within a region of the right somatosensory cortex. Our findings suggest a common neural basis for enhanced vocal control in speech and song. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.
... Among other functions, the LMCs are involved in raising or lowering the larynx [51] and participate in broader brain networks [52] involved in characteristically human modes of communication such as speaking and singing [53,54], as well as expressing emotions [55,56]. Vocal trait modulation similarly engages this system, alongside a broader network of brain areas associated with social reasoning [57]. Adjacent to the LMCs is a putative larynx somatosensory cortex, which is enhanced in highly trained opera singers [58,59] and may mediate the improved vocal modulation skill of singers [36]. ...
Article
Full-text available
The human voice carries socially relevant information such as how authoritative, dominant, and attractive the speaker sounds. However, some speakers may be able to manipulate listeners by modulating the shape and size of their vocal tract to exaggerate certain characteristics of their voice. We analysed the veridical size of speakers’ vocal tracts using real-time magnetic resonance imaging as they volitionally modulated their voice to sound larger or smaller, corresponding changes to the size implied by the acoustics of their voice, and their influence over the perceptions of listeners. Individual differences in this ability were marked, spanning from nearly incapable to nearly perfect vocal modulation, and were consistent across modalities of measurement. Further research is needed to determine whether speakers who are effective at vocal size exaggeration are better able to manipulate their social environment, and whether this variation is an inherited quality of the individual or the result of life experiences such as vocal training.
... At a proximate level, voice modulation across social contexts involves both the motor cortex [5] and regions of the brain involved in social processing, including the medial prefrontal cortex, superior temporal sulcus and precuneus [6]. From a developmental perspective, the capacity to formulate vocal expression and inflection is evident early in life and is seen when infants babble [7]. ...
Article
The human voice is dynamic, and people modulate their voices across different social interactions. This article presents a review of the literature examining natural vocal modulation in social contexts relevant to human mating and intrasexual competition. Altering acoustic parameters during speech, particularly pitch, in response to mating and competitive contexts can influence social perception and indicate certain qualities of the speaker. For instance, a lowered voice pitch is often used to exert dominance, display status and compete with rivals. Changes in voice can also serve as a salient medium for signalling a person's attraction to another, and there is evidence to support the notion that attraction and/or romantic interest can be distinguished through vocal tones alone. Individuals can purposely change their vocal behaviour in an attempt to sound more attractive and to facilitate courtship success. Several findings also point to the effectiveness of vocal change as a mechanism for communicating relationship status. As future studies continue to explore vocal modulation in the arena of human mating, we will gain a better understanding of how and why vocal modulation varies across social contexts and its impact on receiver psychology. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.
Preprint
Objective: This study explores the graduation of social traits in virtual characters by experimental manipulation of perceived trustworthiness, with the aim of validating an existing predictive model in animated whole-body avatars. Method: We created a set of 210 animated virtual characters, for which facial features were generated according to a predictive statistical model originally developed for 2D faces. In a first online study, participants (N = 34) rated mute video clips of the characters on the dimensions of trustworthiness, dominance, valence, and arousal. In a second study (N = 49), vocal expressions were added to the avatars, with voice recordings manipulated on the dimension of trustworthiness by their speakers. Results: In study one, as predicted, we found a significant positive linear (t(7071) = 47.67, p < .001) as well as quadratic (t(7071) = -4.76, p < .001) trend in trustworthiness ratings. We found a significant negative correlation between mean trustworthiness and arousal (r = -.37, p < .001), and a positive correlation with valence (r = .88, p < .001). In study two, we found significant linear (t(4465.96) = 33.91, p < .001), quadratic (t(4465.96) = -10.05, p < .001), cubic (t(4465.96) = -5.90, p < .001), quartic (t(4465.96) = 4.88, p < .001) and quintic (t(4465.96) = 3.20, p = .001) trends in trustworthiness ratings. As in study one, we found a significant negative correlation between mean trustworthiness and arousal (r = -.42, p < .001) and a positive correlation with valence (r = .76, p < .001). Conclusion: We successfully showed that a multisensory graduation of social traits, originally developed for 2D stimuli, can be applied to virtually animated characters to create a battery of animated virtual humanoid male characters. These virtual avatars have higher ecological validity than their 2D counterparts and allow for a targeted experimental manipulation of perceived trustworthiness. The stimuli could be used for social cognition research in neurotypical and psychiatric populations.
Article
Full-text available
The current study represents a first attempt at examining the neural basis of dramatic acting. While all people play multiple roles in daily life—for example, ‘spouse’ or ‘employee’—these roles are all facets of the ‘self’ and thus of the first-person (1P) perspective. Compared to such everyday role playing, actors are required to portray other people and to adopt their gestures, emotions and behaviours. Consequently, actors must think and behave not as themselves but as the characters they are pretending to be. In other words, they have to assume a ‘fictional first-person’ (Fic1P) perspective. In this functional MRI study, we sought to identify brain regions preferentially activated when actors adopt a Fic1P perspective during dramatic role playing. In the scanner, university-trained actors responded to a series of hypothetical questions from either their own 1P perspective or from that of Romeo (male participants) or Juliet (female participants) from Shakespeare's drama. Compared to responding as oneself, responding in character produced global reductions in brain activity and, particularly, deactivations in the cortical midline network of the frontal lobe, including the dorsomedial and ventromedial prefrontal cortices. Thus, portraying a character through acting seems to be a deactivation-driven process, perhaps representing a ‘loss of self’.
Article
Full-text available
We form very rapid personality impressions about speakers on hearing a single word. This implies that the acoustical properties of the voice (e.g., pitch) are very powerful cues when forming social impressions. Here, we aimed to explore how personality impressions for brief social utterances transfer across languages and whether acoustical properties play a similar role in driving personality impressions. Additionally, we examined whether evaluations are similar in the native and a foreign language of the listener. In two experiments we asked Spanish listeners to evaluate personality traits from different instances of the Spanish word “Hola” (Experiment 1) and the English word “Hello” (Experiment 2), the native and a foreign language respectively. The results revealed that listeners across languages form very similar personality impressions irrespective of whether the voices belong to the native or the foreign language of the listener. A social voice space was summarized by two main personality traits, one emphasizing valence (e.g., trust) and the other strength (e.g., dominance). Conversely, the acoustical properties that listeners pay attention to when judging others’ personality vary across languages. These results provide evidence that social voice perception contains certain elements invariant across cultures/languages, while others are modulated by the cultural/linguistic background of the listener.
Article
Full-text available
Inter-individual differences in human fundamental frequency (F0, perceived as voice pitch) predict mate quality and reproductive success, and affect listeners' social attributions. Although humans can readily and volitionally manipulate their vocal apparatus and resultant voice pitch, for instance, in the production of speech sounds and singing, little is known about whether humans exploit this capacity to adjust the non-verbal dimensions of their voices during social (including sexual) interactions. Here, we recorded full-length conversations of 30 adult men and women taking part in real speed-dating events and tested whether their voice pitch (mean, range and variability) changed with their personal mate choice preferences and the overall desirability of each dating partner. Within-individual analyses indicated that men lowered the minimum pitch of their voices when interacting with women who were overall highly desired by other men. Men also lowered their mean voice pitch on dates with women they selected as potential mates, particularly those who indicated a mutual preference (matches). Interestingly, although women spoke with a higher and more variable voice pitch towards men they selected as potential mates, women lowered both voice pitch parameters towards men who were most desired by other women and whom they also personally preferred. Between-individual analyses indicated that men in turn preferred women with lower-pitched voices, wherein women's minimum voice pitch explained up to 55% of the variance in men's mate preferences. These results, derived in an ecologically valid setting, show that individual- and group-level mate preferences can interact to affect vocal behaviour, and support the hypothesis that human voice modulation functions in non-verbal communication to elicit favourable judgements and behaviours from others, including potential mates.
Article
Full-text available
Our ability to understand others' communicative intentions in speech is key to successful social interaction. Indeed, misunderstanding an "excuse me" as an apology when it was meant as criticism may have important consequences. Recent behavioural studies have provided evidence that prosody, i.e., vocal tone, is an important indicator of speakers' intentions. Using a novel audio-morphing paradigm, the present fMRI study examined the neurocognitive mechanisms that allow listeners to 'read' speakers' intents from vocal-prosodic patterns. Participants categorised prosodic expressions that gradually varied in their acoustics between criticism, doubt, and suggestion. Categorising typical exemplars of the three intentions induced activations along the ventral auditory stream, complemented by the amygdala and the mentalizing system. These findings likely depict the step-wise conversion of external perceptual information into abstract prosodic categories and internal social semantic concepts, including the speaker's mental state. Ambiguous tokens, in turn, involved cingulo-opercular areas known to assist decision-making in the case of conflicting cues. Auditory and decision-making processes were flexibly coupled with the amygdala, depending on prosodic typicality, indicating enhanced categorisation efficiency of overtly relevant, meaningful prosodic signals. Altogether, the results point to a model in which auditory-prosodic categorisation and socio-inferential conceptualisation cooperate to translate perceived vocal tone into a coherent representation of the speaker's intent.
Article
Full-text available
Significance: In speech, social evaluations of a speaker’s dominance or trustworthiness are conveyed by distinguishing, but little-understood, pitch variations. This work describes how to combine state-of-the-art vocal pitch transformations with the psychophysical technique of reverse correlation and uses this methodology to uncover the prosodic prototypes that govern such social judgments in speech. This finding is of great significance, because the exact shape of these prototypes, and how they vary with sex, age, and culture, is virtually unknown, and because prototypes derived with the method can then be reapplied to arbitrary spoken utterances, thus providing a principled way to modulate personality impressions in speech.
Article
Full-text available
The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) produced with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and - in the case of visual stimuli - visual cortex. Responses were larger in the right pSTG and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions - in particular for the interpersonal evaluation of vocal emotions.
Article
Full-text available
When we hear a new voice we automatically form a "first impression" of the voice owner's personality; a single word is sufficient to yield ratings highly consistent across listeners. Past studies have shown correlations between personality ratings and acoustical parameters of voice, suggesting a potential acoustical basis for voice personality impressions, but its nature and extent remain unclear. Here we used data-driven voice computational modelling to investigate the link between acoustics and perceived trustworthiness in the single word "hello". Two prototypical voice stimuli were generated based on the acoustical features of voices rated low or high in perceived trustworthiness, respectively, as well as a continuum of stimuli inter- and extrapolated between these two prototypes. Five hundred listeners provided trustworthiness ratings on the stimuli via an online interface. We observed an extremely tight relationship between trustworthiness ratings and position along the trustworthiness continuum (r = 0.99). Not only were trustworthiness ratings higher for the high- than the low-prototypes, but the difference could be modulated quasi-linearly by reducing or exaggerating the acoustical difference between the prototypes, resulting in a strong caricaturing effect. The f0 trajectory, or intonation, appeared a parameter of particular relevance: hellos rated high in trustworthiness were characterized by a high starting f0 then a marked decrease at mid-utterance to finish on a strong rise. These results demonstrate a strong acoustical basis for voice personality impressions, opening the door to multiple potential applications.
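A minimal sketch of the caricaturing logic described above, assuming simple linear inter-/extrapolation of acoustic features between the two prototypes (the study's actual morphing procedure may differ):

voice(α) = P_low + α · (P_high − P_low)

where P_low and P_high are the acoustic feature sets of the low- and high-trustworthiness prototypes, α = 0 and α = 1 reproduce the prototypes, 0 < α < 1 yields interpolated stimuli, and α > 1 (or α < 0) exaggerates the acoustical difference, producing the caricatured voices whose trustworthiness ratings varied quasi-linearly with position on the continuum.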
Article
Full-text available
Aggressive, violent behaviour is a major burden and challenge for society. It has been linked to deficits in social understanding, but the evidence is inconsistent and the specifics of such deficits are unclear. Here, we investigated affective (empathy) and cognitive (Theory of Mind) routes to understanding other people in aggressive individuals. Twenty-nine men with a history of legally relevant aggressive behaviour (i.e. serious assault) and 32 control participants were tested using a social video task (EmpaToM) that differentiates empathy and Theory of Mind and completed questionnaires on aggression and alexithymia. Aggressive participants showed reduced empathic responses to emotional videos of others’ suffering, which correlated with aggression severity. Theory of Mind performance, in contrast, was intact. A mediation analysis revealed that reduced empathy in aggressive men was mediated by alexithymia. These findings stress the importance of distinguishing between socio-affective and socio-cognitive deficits for understanding aggressive behaviour and thereby contribute to the development of more efficient treatments.
Article
Full-text available
Despite known differences in the acoustic properties of children’s and adults’ voices, no work to date has examined the vocal cues associated with emotional prosody in youth. The current study investigated whether child (n = 24, 17 female, aged 9–15) and adult (n = 30, 15 female, aged 18–63) actors differed in the vocal cues underlying their portrayals of basic emotions (anger, disgust, fear, happiness, sadness) and social expressions (meanness, friendliness). We also compared the acoustic characteristics of meanness and friendliness to comparable basic emotions. The pattern of distinctions between expressions varied as a function of age for voice quality and mean pitch. Specifically, adults’ portrayals of the various expressions were more distinct in mean pitch than children’s, whereas children’s representations differed more in voice quality than adults’. Given the importance of pitch variables for the interpretation of a speaker’s intended emotion, expressions generated by adults may thus be easier for listeners to decode than those of children. Moreover, the vocal cues associated with the social expressions of meanness and friendliness were distinct from those of basic emotions like anger and happiness respectively. Overall, our findings highlight marked differences in the ways in which adults and children convey socio-emotional expressions vocally, and expand our understanding of the communication of paralanguage in social contexts. Implications for the literature on emotion recognition are discussed.
Article
Full-text available
Previous neuroimaging studies have revealed that a trait code is mainly represented in the ventral medial prefrontal cortex (vmPFC). However, those studies only investigated the neural code of warmth traits. According to the ‘Big Two’ model of impression formation, competence traits are the other major dimension when we judge others. The current study explored the neural representation of competence traits by using an fMRI repetition suppression paradigm, which exploits the rapid reduction of neuronal responses upon repeated presentation of the same implied trait. Participants had to infer an agent’s trait from brief behavioral descriptions that implied a competence trait. In each trial, the critical target sentence was preceded by a prime sentence that implied the same or the opposite competence-related trait, or no trait. The results revealed robust repetition suppression from prime to target in the vmPFC and precuneus during trait conditions. Critically, the suppression effect was much stronger after being primed with a similar or opposite competence trait compared with a trait-irrelevant prime. This suppression pattern was found nowhere else in the brain. Consistent with previous fMRI studies, we suggest that the neural code of competence traits is represented in these two brain areas with different levels of abstraction.
Conference Paper
Full-text available
Individual differences in human voice pitch (fundamental frequency, F0) have evolutionary relevance. Fundamental frequency indicates the sex, age, and even dominance of the speaker, and influences a host of social assessments including mate preferences. Yet, due to the almost exclusive utilization of cross-sectional designs in previous work, it remains unknown whether individual differences in F0 emerge before or after sexual maturation, and whether F0 remains stable throughout a person’s lifetime. In our study, we tracked within-individual variation in the F0s of male and female speakers whose voices were recorded from childhood into adulthood. Voice recordings were extracted from digital archives. Our results corroborate those of earlier cross-sectional studies indicating a sharp decrease in male F0 at puberty resulting in the emergence of sexual dimorphism in adult F0. Critically, our results further revealed that men’s pre-pubertal F0 strongly predicted their F0 at every subsequent adult age, and that F0 remained remarkably stable within-individuals throughout their adulthood. These findings suggest that adult men’s voice pitch may be linked to pre-natal/pre-pubertal androgen exposure and may function as a reliable and stable signal of mate quality, with implications for our understanding of the developmental mechanisms, adaptive functions, and social perception of human voice pitch.
Article
Full-text available
The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only, or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age-related vocal characteristics (f0 and speech rate) in an attempt to sound younger versus older than their true age, and whether the manipulations correspond to actual age-related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended.
Article
Full-text available
We aimed to progress understanding of prosodic emotion expression by establishing brain regions active when expressing specific emotions, those activated irrespective of the target emotion, and those whose activation intensity varied depending on individual performance. BOLD contrast data were acquired whilst participants spoke nonsense words in happy, angry or neutral tones, or performed jaw-movements. Emotion-specific analyses demonstrated that when expressing angry prosody, activated brain regions included the inferior frontal and superior temporal gyri, the insula, and the basal ganglia. When expressing happy prosody, the activated brain regions also included the superior temporal gyrus, insula, and basal ganglia, with additional activation in the anterior cingulate. Conjunction analysis confirmed that the superior temporal gyrus and basal ganglia were activated regardless of the specific emotion concerned. Nevertheless, disjunctive comparisons between the expression of angry and happy prosody established that anterior cingulate activity was significantly higher for angry prosody than for happy prosody production. Degree of inferior frontal gyrus activity correlated with the ability to express the target emotion through prosody. We conclude that expressing prosodic emotions (vs. neutral intonation) requires generic brain regions involved in comprehending numerous aspects of language, in emotion-related processes such as experiencing emotions, and in the time-critical integration of speech information.
Article
Full-text available
Several mammalian species scale their voice fundamental frequency (F0) and formant frequencies in competitive and mating contexts, reducing vocal tract and laryngeal allometry, thereby exaggerating apparent body size. Although humans' rare capacity to volitionally modulate these same frequencies is thought to subserve articulated speech, the potential function of voice frequency modulation in human nonverbal communication remains largely unexplored. Here, the voices of 167 men and women from Canada, Cuba, and Poland were recorded in a baseline condition and while volitionally imitating a physically small and large body size. Modulation of F0, formant spacing (∆F), and apparent vocal tract length (VTL) were measured using Praat. Our results indicate that men and women spontaneously and systemically increased VTL and decreased F0 to imitate a large body size, and reduced VTL and increased F0 to imitate small size. These voice modulations did not differ substantially across cultures, indicating potentially universal sound-size correspondences or anatomical and biomechanical constraints on voice modulation. In each culture, men generally modulated their voices (particularly formants) more than did women. This latter finding could help to explain sexual dimorphism in F0 and formants that is currently unaccounted for by sexual dimorphism in human vocal anatomy and body size.
Article
Full-text available
Psychiatric disorders can affect our ability to successfully and enjoyably interact with others. Conversely, having difficulties in social relations is known to increase the risk of developing a psychiatric disorder. In this article, the assumption that psychiatric disorders can be construed as disorders of social interaction is reviewed from a clinical point of view. Furthermore, it is argued that a psychiatrically motivated focus on the dynamics of social interaction may help to provide new perspectives for the field of social neuroscience. Such progress may be crucial to realize social neuroscience’s translational potential and to advance the transdiagnostic investigation of the neurobiology of psychiatric disorders.
Article
Full-text available
Deciphering the neural mechanisms of social behavior has propelled the growth of social neuroscience. The exact computations of the social brain, however, remain elusive. Here we investigated how the human brain tracks ongoing changes in social relationships using functional neuroimaging. Participants were lead characters in a role-playing game in which they were to find a new home and a job through interactions with virtual cartoon characters. We found that a two-dimensional geometric model of social relationships, a "social space" framed by power and affiliation, predicted hippocampal activity. Moreover, participants who reported better social skills showed stronger covariance between hippocampal activity and "movement" through "social space." The results suggest that the hippocampus is crucial for social cognition, and imply that beyond framing physical locations, the hippocampus computes a more general, inclusive, abstract, and multidimensional cognitive map consistent with its role in episodic memory.
Article
Full-text available
Recent neural network models for the production of primate vocalizations are largely based on research in nonhuman primates. These models do not yet seem fully capable of explaining the neural network dynamics underlying different types of human vocalizations. Unlike animal vocalizations, human affective vocalizations might involve higher levels of vocal control and monitoring demands, especially in the case of more complex vocal expressions of emotions superimposed on speech. Here we therefore investigated the functional cortico-subcortical network underlying different types of human affective vocalization production (evoked vs. repetition) in terms of affective prosody, especially examining the aggressive tone of a voice while producing meaningless speech-like utterances. Functional magnetic resonance imaging revealed, first, that bilateral auditory cortices showed a close functional interconnectivity during affective vocalizations, pointing to a bilateral exchange of relevant acoustic information about produced vocalizations. Second, bilateral motor cortices (MC) that directly control vocal motor behavior showed functional connectivity to the right inferior frontal gyrus (IFG) and the right superior temporal gyrus (STG). Thus, vocal motor behavior during affective vocalizations seems to be controlled by a right-lateralized network that provides vocal monitoring (IFG), probably based on auditory feedback processing (STG). Third, the basal ganglia (BG) showed both positive and negative modulatory connectivity with several frontal (ACC, IFG) and temporal brain regions (STG). Finally, the repetition of affective prosody compared to evoked vocalizations revealed a more extended neural network, probably based on higher control and vocal monitoring demands. Taken together, the functional brain network underlying human affective vocalizations revealed several features that have so far been neglected in models of primate vocalizations.
Article
Full-text available
In 2013, London Underground reinstated the actor Oswald Laurence's famous “Mind the gap” announcement at Embankment station, having learned that the widow of the actor had been regularly visiting this station since her husband's death in order to hear his voice again (Hope, 2013). Even in the absence of a personal connection to the couple, it is easy to find this an emotionally affecting story. Anecdotally, “It's so nice to hear your voice” is commonly encountered in telephone conversations with loved ones, yet there is relatively little known about the cognitive and neural underpinnings of this expression. Similarly, a sense of ownership of one's voice has important implications—companies like VocalID (www.vocalid.co) have recognized the impact of providing individualized voices to patients who rely upon synthesizers for communication—but, to date, the neuroscience of speech production has been predominantly concerned with the accurate formulation of linguistic messages. Although there are relatively unchanging aspects of every voice, due to the anatomical constraints of the talker's vocal tract as well as body size and shape (Kreiman and Sidtis, 2011), it is also important to note that the voice is not a static object. There is no such thing as a passive voice; the voice (like all sounds) demands an action to occur for it to exist at all (Scott and McGettigan, 2015). Much of our vocal expression is the result of voluntary motor acts, which can be modified consciously in response to changes in acoustic, informational and social demands (McGettigan and Scott, 2014; Scott and McGettigan, 2015). Sidtis and Kreiman (2012) write that the voice is “revelatory of ‘self,’ mental states, and consciousness,” reflecting “both the speaker and the context in which the voice is produced” (p. 150). It is thus a dynamic self that is modified according to the talker's goals, affecting both the talker and the addressee in their roles as perceivers and producers of verbal and non-verbal vocal signals. Disruption to paralinguistic aspects of voice perception and production has implications for psychosocial wellbeing. Most reports of Foreign Accent Syndrome—where patients produce altered speech that perceptually resembles a non-native accent (e.g., due to brain injury, or orofacial surgery)—concentrate on the phonetic, perceptual and neurological correlates of the disorder, yet there is evidence that there can also be significant impacts on the patient's sense of self-identity (Miller et al., 2011; DiLollo et al., 2014). In voice perception, difficulties in the recognition of emotional and attitudinal prosody have implications for effective psychosocial function in healthy aging, schizophrenia, and autism (Mitchell and Ross, 2013). It is thus crucial that neurobiological accounts of speech and voice processing consider not just what is said, but how it is said, in order to characterize the human aspects of vocal communication behaviors.
Article
Full-text available
Three studies examined Jones’ (Perspectives on Psychological Science, 9, 445-451, 2014) suggestion that psychopathic individuals use mimicry to avoid detection. In study 1, student, community, and offender participants posed fearful facial expressions while looking at a prototypical fear face. Expressions were coded for facial movements associated with fear and were rated on genuineness by a separate sample of undergraduates. Across samples, psychopathic traits were associated with increased use of typical action units for fearful facial expressions and with genuineness ratings. In study 2, undergraduates completed the Psychopathic Personality Inventory and told a story about a time when they did something that they should have felt remorseful for but did not. Factor 1 traits were found to positively relate to genuineness scores given by a separate sample of undergraduates. Finally, in study 3, four videos of false remorse stories told by violent offenders were rated by a sample of undergraduates. The two high factor 1 videos received significantly higher genuineness ratings, supporting the relationship between factor 1 and affective mimicry. Overall, findings suggest that psychopathic traits (specifically, factor 1) may be associated with the ability to accurately mimic emotional expressions (fear and remorse), leading others to perceive emotional genuineness.
Article
Full-text available
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
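As a concrete illustration of the interface described in this abstract, a minimal lmer call might look as follows (a sketch only: the data set 'ratings' and its variable names are hypothetical and not drawn from any study listed here):

library(lme4)

# Fixed- and random-effects terms are specified together in one formula.
# Hypothetical data: one listener rating per row, with columns
# 'rating', 'condition', 'speaker', and 'listener'.
fit <- lmer(rating ~ condition + (1 | speaker) + (1 | listener),
            data = ratings, REML = TRUE)   # REML is the default criterion

summary(fit)   # fixed-effect estimates and random-effect variance components

Setting REML = FALSE would instead optimize the profiled deviance (maximum likelihood), the other criterion described in the abstract.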
Article
Full-text available
Although the neural basis for the perception of vocal emotions has been described extensively, the neural basis for the expression of vocal emotions is almost unknown. Here, we asked participants both to repeat and to express high-arousing angry vocalizations to command (i.e., evoked expressions). First, repeated expressions elicited activity in the left middle superior temporal gyrus (STG), pointing to a short auditory memory trace for the repetition of vocal expressions. Evoked expressions activated the left hippocampus, suggesting the retrieval of long-term stored scripts. Second, angry compared with neutral expressions elicited activity in the inferior frontal cortex (IFC) and the dorsal basal ganglia (BG), specifically during evoked expressions. Angry expressions also activated the amygdala and anterior cingulate cortex (ACC), and the latter correlated with pupil size as an indicator of bodily arousal during emotional output behavior. Though uncorrelated, both ACC activity and pupil diameter were also increased during repetition trials, indicating increased control demands during the more constrained production type of precisely repeating prosodic intonations. Finally, different acoustic measures of angry expressions were associated with activity in the left STG, bilateral inferior frontal gyrus, and dorsal BG. http://cercor.oxfordjournals.org/content/early/2014/04/15/cercor.bhu074.abstract
Article
Full-text available
On hearing a novel voice, listeners readily form personality impressions of that speaker. Accurate or not, these impressions are known to affect subsequent interactions; yet the underlying psychological and acoustical bases remain poorly understood. Furthermore, studies have hitherto focussed on extended speech as opposed to analysing the instantaneous impressions we obtain from first experience. In this paper, through a mass online rating experiment, 320 participants rated 64 sub-second vocal utterances of the word 'hello' on one of 10 personality traits. We show that: (1) personality judgements of brief utterances from unfamiliar speakers are consistent across listeners; (2) a two-dimensional 'social voice space' with axes mapping Valence (Trust, Likeability) and Dominance, each driven by differing combinations of vocal acoustics, adequately summarises ratings in both male and female voices; and (3) a positive combination of Valence and Dominance results in increased perceived male vocal Attractiveness, whereas perceived female vocal Attractiveness is largely controlled by increasing Valence. Results are discussed in relation to the rapid evaluation of personality and, in turn, the intent of others, as being driven by survival mechanisms via approach or avoidance behaviours. These findings provide empirical bases for predicting personality impressions from acoustical analyses of short utterances and for generating desired personality impressions in artificial voices.
Article
Full-text available
Evidence suggests that people can manipulate their vocal intonations to convey a host of emotional, trait, and situational images. We asked 40 participants (20 men and 20 women) to intentionally manipulate the sound of their voices in order to portray four traits (attractiveness, confidence, dominance, and intelligence), and compared these samples with their normal speech. We then asked independent raters of the same and opposite sex to assess the degree to which each voice sample projected the given trait. Women’s manipulated voices were judged as sounding more attractive than their normal voices, but this was not the case for men. In contrast, men’s manipulated voices were rated by women as sounding more confident than their normal speech, but this did not hold true for women’s voices. Further, women were able to manipulate their voices to sound just as dominant as the men’s manipulated voices, and both sexes were able to modify their voices to sound more intelligent than their normal voice. We also assessed all voice samples objectively using spectrogram analyses and several vocal patterns emerged for each trait; among them we found that when trying to sound sexy/attractive, both sexes slowed their speech, and women lowered their pitch and had greater vocal hoarseness. Both sexes raised their pitch and spoke louder to sound dominant, and women had less vocal hoarseness. These findings are discussed using an evolutionary perspective and implicate voice modification as an important, deliberate aspect of communication, especially in the realm of mate selection and competition.
Article
Full-text available
We meta-analyzed imaging studies on theory of mind and formed individual task groups based on stimuli and instructions. Overlap in brain activation between all task groups was found in the mPFC and in the bilateral posterior TPJ. This supports the idea of a core network for theory of mind that is activated whenever we are reasoning about mental states, irrespective of the task- and stimulus-formats (Mar, 2011). In addition, we found a number of task-related activation differences surrounding this core-network. ROI based analyses show that areas in the TPJ, the mPFC, the precuneus, the temporal lobes and the inferior frontal gyri have distinct profiles of task-related activation. Functional accounts of these areas are reviewed and discussed with respect to our findings.
Article
Social life requires people to predict the future: people must anticipate others’ thoughts, feelings, and actions to interact with them successfully. The theory of predictive coding suggests that the social brain may meet this need by automatically predicting others’ social futures. If so, when representing others’ current mental state, the brain should already start representing their future states. To test this hypothesis, we used fMRI to measure female and male human participants’ neural representations of mental states. Representational similarity analysis revealed that neural patterns associated with mental states currently under consideration resembled patterns of likely future states more so than patterns of unlikely future states. This effect manifested in activity across the social brain network and in medial prefrontal cortex in particular. Repetition suppression analysis also supported the social predictive coding hypothesis: considering mental states presented in predictable sequences reduced activity in the precuneus relative to unpredictable sequences. In addition to demonstrating that the brain makes automatic predictions of others’ social futures, the results also demonstrate that the brain leverages a 3D representational space to make these predictions. Proximity between mental states on the psychological dimensions of rationality, social impact, and valence explained much of the association between state-specific neural pattern similarity and state transition likelihood. Together, these findings suggest that the way the brain represents the social present gives people an automatic glimpse of the social future.
Article
Stereotype research emphasizes systematic processes over seemingly arbitrary contents, but content also may prove systematic. On the basis of stereotypes' intergroup functions, the stereotype content model hypothesizes that (a) 2 primary dimensions are competence and warmth, (b) frequent mixed clusters combine high warmth with low competence (paternalistic) or high competence with low warmth (envious), and (c) distinct emotions (pity, envy, admiration, contempt) differentiate the 4 competence-warmth combinations. Stereotypically, (d) status predicts high competence, and competition predicts low warmth. Nine varied samples rated gender, ethnicity, race, class, age, and disability out-groups. Contrary to antipathy models, 2 dimensions mattered, and many stereotypes were mixed, either pitying (low competence, high warmth subordinates) or envying (high competence, low warmth competitors). Stereotypically, status predicted competence, and competition predicted low warmth.
Article
Feeling of knowing (or expressed confidence) reflects a speaker's certainty or commitment to a statement and can be associated with one's trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments), produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I'm sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Experiment 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, and contained more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker's feeling of (un)knowing.
Article
Explaining the evolution of speech and language poses one of the biggest challenges in biology. We propose a dual network model that posits a volitional articulatory motor network (VAMN) originating in the prefrontal cortex (PFC; including Broca's area) that cognitively controls vocal output of a phylogenetically conserved primary vocal motor network (PVMN) situated in subcortical structures. By comparing the connections between these two systems in human and nonhuman primate brains, we identify crucial biological preadaptations in monkeys for the emergence of a language system in humans. This model of language evolution explains the exclusiveness of non-verbal communication sounds (e.g., cries) in infants with an immature PFC, as well as the observed emergence of non-linguistic vocalizations in adults after frontal lobe pathologies.
Article
During political elections, voters rely on various cues that signal good social leadership, such as indicators of physical strength and masculinity. In adult men, masculine traits are related to testosterone levels, and one of those traits is low-pitched voice. Hence, lower pitch in a presidential candidate may be related to the election's outcome. This prediction is supported by experimental evidence showing that people prefer to vote for a candidate with a low-pitched voice. The aim of this study was to investigate the relationship between presidential candidates' vocal characteristics and actual election outcomes in 51 presidential elections held across the world. After analysis of the voices of opposing candidates, results showed that winners had lower-pitched voices with less pitch variability. Moreover, regression analysis revealed an interaction effect of voice pitch and voice pitch variability on the election outcome. Candidates with lower-pitched voices had greater likelihood of winning the election if they had higher pitch variability. This study extends previous findings, shows the importance of assessing vocal characteristics other than voice pitch, and offers external validity for the experimental evidence that candidates' vocal characteristics are related to the election outcome.
Article
Discriminating between auditory signals of different affective value is critical to successful social interaction. It is commonly held that acoustic decoding of such signals occurs in the auditory system, whereas affective decoding occurs in the amygdala. However, given that the amygdala receives direct subcortical projections that bypass the auditory cortex, it is possible that some acoustic decoding occurs in the amygdala as well, when the acoustic features are relevant for affective discrimination. We tested this hypothesis by combining functional neuroimaging with the neurophysiological phenomena of repetition suppression and repetition enhancement in human listeners. Our results show that both amygdala and auditory cortex responded differentially to physical voice features, suggesting that the amygdala and auditory cortex decode the affective quality of the voice not only by processing the emotional content from previously processed acoustic features, but also by processing the acoustic features themselves, when these are relevant to the identification of the voice’s affective value. Specifically, we found that the auditory cortex is sensitive to spectral high-frequency voice cues when discriminating vocal anger from vocal fear and joy, whereas the amygdala is sensitive to vocal pitch when discriminating between negative vocal emotions (i.e. anger and fear). Vocal pitch is an instantaneously recognized voice feature, which is potentially transferred to the amygdala by direct subcortical projections. These results together provide evidence that, besides the auditory cortex, the amygdala too processes acoustic information, when this is relevant to the discrimination of auditory emotions.
Article
The study of voice perception in congenitally blind individuals allows researchers rare insight into how a lifetime of visual deprivation affects the development of voice perception. Previous studies have suggested that blind adults outperform their sighted counterparts in low-level auditory tasks testing spatial localization and pitch discrimination, as well as in verbal speech processing; however, blind persons generally show no advantage in nonverbal voice recognition or discrimination tasks. The present study is the first to examine whether visual experience influences the development of social stereotypes that are formed on the basis of nonverbal vocal characteristics (i.e., voice pitch). Groups of 27 congenitally or early-blind adults and 23 sighted controls assessed the trustworthiness, competence, and warmth of men and women speaking a series of vowels, whose voice pitches had been experimentally raised or lowered. Blind and sighted listeners judged both men’s and women’s voices with lowered pitch as being more competent and trustworthy than voices with raised pitch. In contrast, raised-pitch voices were judged as being warmer than were lowered-pitch voices, but only for women’s voices. Crucially, blind and sighted persons did not differ in their voice-based assessments of competence or warmth, or in their certainty of these assessments, whereas the association between low pitch and trustworthiness in women’s voices was weaker among blind than sighted participants. This latter result suggests that blind persons may rely less heavily on nonverbal cues to trustworthiness compared to sighted persons. Ultimately, our findings suggest that robust perceptual associations that systematically link voice pitch to the social and personal dimensions of a speaker can develop without visual input.
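For illustration, the kind of pitch manipulation described above (experimentally raising or lowering voice pitch) can be approximated with a standard pitch-shifting routine. The sketch below uses librosa and soundfile on a hypothetical vowel recording; it is a stand-in, not the study's actual manipulation procedure.

```python
# Minimal sketch: create raised- and lowered-pitch versions of a recorded vowel.
import librosa
import soundfile as sf

y, sr = librosa.load("vowel.wav", sr=None)            # hypothetical recording

# n_steps is in semitones; +/- 1 semitone as an illustrative shift size
raised = librosa.effects.pitch_shift(y, sr=sr, n_steps=+1.0)
lowered = librosa.effects.pitch_shift(y, sr=sr, n_steps=-1.0)

sf.write("vowel_raised.wav", raised, sr)
sf.write("vowel_lowered.wav", lowered, sr)
```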
Article
Any account of “what is special about the human brain” (Passingham 2008) must specify the neural basis of our unique ability to produce speech and delineate how these remarkable motor capabilities could have emerged in our hominin ancestors. Clinical data suggest that the basal ganglia provide a platform for the integration of primate-general mechanisms of acoustic communication with the faculty of articulate speech in humans. Furthermore, neurobiological and paleoanthropological data point to a two-stage model of the phylogenetic evolution of this crucial prerequisite of spoken language: (i) monosynaptic refinement of the projections of motor cortex to the brainstem nuclei that steer laryngeal muscles, presumably as part of a “phylogenetic trend” associated with increasing brain size during hominin evolution; (ii) subsequent vocal-laryngeal elaboration of cortico-basal ganglia circuitries, driven by human-specific FOXP2 mutations. This concept implies vocal continuity of spoken language evolution at the motor level, elucidating the deep entrenchment of articulate speech into a “nonverbal matrix” (Ingold 1994), which is not accounted for by gestural-origin theories. Moreover, it provides a solution to the question of the adaptive value of the “first word” (Bickerton 2009), since even the earliest and most simple verbal utterances must have increased the versatility of vocal displays afforded by the preceding elaboration of monosynaptic corticobulbar tracts, giving rise to enhanced social cooperation and prestige. At the ontogenetic level, the proposed model assumes age-dependent interactions between the basal ganglia and their cortical targets, similar to vocal learning in some songbirds. In this view, the emergence of articulate speech builds on the “renaissance” of an ancient organizational principle and, hence, may represent an example of “evolutionary tinkering” (Jacob 1977).
Book
A guide to using S environments to perform statistical analyses, providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.
Article
Behavioral evidence and theory suggest gesture and language processing may be part of a shared cognitive system for communication. While much research demonstrates both gesture and language recruit regions along perisylvian cortex, relatively less work has tested functional segregation within these regions on an individual level. Additionally, while most work has focused on a shared semantic network, less has examined shared regions for processing communicative intent. To address these questions, functional and structural MRI data were collected from 24 adult participants while they viewed videos of an experimenter producing communicative, Participant-Directed Gestures (PDG) (e.g., "Hello, come here"), noncommunicative Self-adaptor Gestures (SG) (e.g., smoothing hair), and three written text conditions: (1) Participant-Directed Sentences (PDS), matched in content to PDG, (2) Third-person Sentences (3PS), describing a character's actions from a third-person perspective, and (3) meaningless sentences, Jabberwocky (JW). Surface-based conjunction and individual functional region of interest analyses identified shared neural activation between gesture (PDG vs. SG) and language processing using two different language contrasts. Conjunction analyses of gesture (PDG vs. SG) and Third-person Sentences versus Jabberwocky revealed overlap within left anterior and posterior superior temporal sulcus (STS). Conjunction analyses of gesture and Participant-Directed Sentences versus Third-person Sentences revealed regions sensitive to communicative intent, including the left middle and posterior STS and left inferior frontal gyrus. Further, parametric modulation using participants' ratings of stimuli revealed sensitivity of left posterior STS to individual perceptions of communicative intent in gesture. These data highlight an important role of the STS in processing participant-directed communicative intent through gesture and language.
Article
Theory of mind (ToM) is an important skill that refers broadly to the capacity to understand the mental states of others. A large number of neuroimaging studies have focused on identifying the functional brain regions involved in ToM, but many important questions remain with respect to the neural networks implicated in specific types of ToM task. In the present study, we conducted a series of activation likelihood estimation (ALE) meta-analyses on 144 datasets (involving 3150 participants) to address these questions. The ALE results revealed common regions shared across all ToM tasks and broader task parameters, but also some important dissociations. In terms of commonalities, consistent activation was identified in the medial prefrontal cortex and bilateral temporoparietal junction. On the other hand, ALE contrast analyses on our dataset, as well as meta-analytic connectivity modelling (MACM) analyses on the BrainMap database, indicated that different types of ToM tasks reliably elicit activity in unique brain areas. Our findings provide the most accurate picture to date of the neural networks that underpin ToM function.
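For readers unfamiliar with activation likelihood estimation, the sketch below outlines a coordinate-based ALE analysis using the NiMARE Python package on a hypothetical Sleuth-format foci file. It is a stand-in for the kind of analysis described above, not the pipeline or data used in that meta-analysis.

```python
# Minimal sketch: coordinate-based ALE meta-analysis with NiMARE (assumed installed).
from nimare.io import convert_sleuth_to_dataset
from nimare.meta.cbma import ALE
from nimare.correct import FWECorrector

dset = convert_sleuth_to_dataset("tom_foci_sleuth.txt")   # hypothetical foci file

ale = ALE()                      # default kernel settings
results = ale.fit(dset)          # unthresholded ALE and z maps

# Cluster-level FWE correction via Monte Carlo permutation (small n_iters for speed)
corrector = FWECorrector(method="montecarlo", n_iters=1000)
corrected = corrector.transform(results)
corrected.save_maps(output_dir="ale_results")
```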
Article
We examined neural activity, in the frontal lobes, associated with speech production during affective states. Using functional magnetic resonance imaging (fMRI), the blood oxygen level-dependent (BOLD) response to the overt reading of emotionally neutral sentences was measured before and after a happy or sad mood induction. There was no explicit demand to produce affect-congruent speech and a cover story was used to de-emphasize the significance of the speech task in light of our experimental aims. Each fMRI measurement was acquired 6 s after the onset of sentence presentation so that speech could be recorded while the scanner noise was minimal; speech parameters (e.g. pitch variation) were extracted from the sentences and regressed against fMRI data. In the sad group we found the predicted changes in affect and pitch variation. Further, the fMRI data confirmed our hypothesis in that the 'reading effect' (i.e. the BOLD response to reading minus the BOLD response to baseline stimuli) in the supracallosal anterior cingulate cortex covaried negatively with both pitch variation and affect. Our results suggest that the anterior cingulate cortex modulates paralinguistic features of speech during affective states, thus placing this neural structure at the interface between action and emotions.
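The regression of speech parameters (e.g., pitch variation) against fMRI data described above corresponds, in current tooling, to a first-level GLM with a parametric modulator. The sketch below uses nilearn with hypothetical onsets, pitch values, TR, and file names; it is an illustration of the modelling idea, not the authors' original pipeline.

```python
# Minimal sketch: first-level GLM with a mean-centred per-trial pitch-variation modulator.
import numpy as np
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

onsets = np.array([10.0, 40.0, 70.0, 100.0])         # sentence-reading onsets (s), hypothetical
pitch_var = np.array([0.8, 1.3, 0.6, 1.1])           # per-trial pitch variation, hypothetical

events = pd.concat([
    # main effect of reading aloud
    pd.DataFrame({"onset": onsets, "duration": 6.0,
                  "trial_type": "read", "modulation": 1.0}),
    # parametric modulator: mean-centred pitch variation scales event amplitude
    pd.DataFrame({"onset": onsets, "duration": 6.0,
                  "trial_type": "read_x_pitchvar",
                  "modulation": pitch_var - pitch_var.mean()}),
])

model = FirstLevelModel(t_r=2.0, hrf_model="spm")
model = model.fit("sub-01_task-reading_bold.nii.gz", events=events)

# A negative effect on this regressor would indicate BOLD decreasing as pitch
# variation increases, as reported for the supracallosal anterior cingulate cortex.
zmap = model.compute_contrast("read_x_pitchvar", output_type="z_score")
```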
Article
An unresolved issue in comparative approaches to speech evolution is the apparent absence of an intermediate vocal communication system between human speech and the less flexible vocal repertoires of other primates. We argue that humans’ ability to modulate nonverbal vocal features evolutionarily linked to expression of body size and sex (fundamental and formant frequencies) provides a largely overlooked window into the nature of this intermediate system. Recent behavioral and neural evidence indicates that humans’ vocal control abilities, commonly assumed to subserve speech, extend to these nonverbal dimensions. This capacity appears in continuity with context-dependent frequency modulations recently identified in other mammals, including primates, and may represent a living relic of early vocal control abilities that led to articulated human speech.
Article
This study investigates to what extent social and competence traits are represented in a similar or different neural trait code. To localize these trait codes, we used fMRI repetition suppression, which is a rapid reduction of neuronal responses upon repeated presentation of the same implied trait. Participants had to infer an agent's trait from brief trait-implying behavioral descriptions. In each trial, the critical target sentence was preceded by a prime sentence that implied the same trait or a different competence-related trait which was also opposite in valence. The results revealed robust repetition suppression from prime to target in the ventral medial prefrontal cortex (mPFC) given a similar (social) as well as a dissimilar (competence) prime. The suppression given a similar prime confirms earlier research demonstrating that a trait code is represented in the ventral mPFC. The suppression given a dissimilar prime is interpreted as indicating that participants categorize a combination of competence and social information into novel subcategories, reflecting nice (but incompetent) or nerdy (but socially awkward) traits. A multivoxel pattern analysis broadly confirmed these results, and pinpointed the inferior parietal cortex, cerebellum, temporo-parietal junction and mPFC as areas that differentiate between social and competence traits.
Article
Recent years have seen a major change in views on language and language use. Over the last decades, language use has increasingly been recognized as an intentional action (Grice 1957). In the form of speech acts (Austin 1962; Searle 1969), language expresses the speaker's attitudes and communicative intents to shape the listener's reaction. Notably, the speaker's intention is often not directly coded in the lexical meaning of a sentence, but rather conveyed implicitly, for example via nonverbal cues such as facial expressions, body posture, and speech prosody. The theoretical work of intonational phonologists seeking to define the meaning of specific vocal intonation profiles (Bolinger 1986; Kohler 1991) demonstrates the role of prosody in conveying the speaker's conversational goal. However, to date little is known about the neurocognitive architecture underlying the comprehension of communicative intents in general (Holtgraves 2005; Egorova, Shtyrov, Pulvermüller 2013), and about the distinctive role of prosody in particular. The present study therefore aimed to investigate this interpersonal role of prosody in conveying the speaker's intents, and its underlying acoustic properties. Taking speech act theory as a framework for intention in language (Austin 1962; Searle 1969), we created a novel set of short (non-)word utterances intoned to express different speech acts. Adopting an approach from emotional prosody research (Banse, Scherer 1996; Sauter, Eisner, Calder, Scott 2010), this stimulus set was employed in a combination of behavioral ratings and acoustic analyses to test the following hypotheses: if prosody codes for the communicative intention of the speaker, we expect (1) above-chance behavioral recognition of different intentions that are expressed via prosody alone, (2) acoustic markers in the prosody that identify these intentions, and (3) independence of acoustics and behavior from the overt lexical meaning of the utterance. The German words "Bier" (beer) and "Bar" (bar) and the non-words "Diem" and "Dahm" were recorded from four speakers (two female) expressing six different speech acts in their prosody: criticism, wish (expressives), warning, suggestion (directives), doubt, and naming (assertives). Acoustic features covering pitch, duration, intensity, and spectral properties were extracted with PRAAT. These measures were subjected to discriminant analyses, separately for words and non-words, to test whether the acoustic features have enough discriminant power to assign the stimuli to their corresponding speech act category. Furthermore, 20 participants were tested on the behavioral recognition of the speech act categories with a six-alternative forced-choice task. Finally, a new group of 40 participants gave subjective ratings of the different speech acts (e.g., "How much does the stimulus sound like criticism?") to obtain more detailed information on the perception of different intentions and to allow, as a quantitative variable, further analyses in combination with the acoustic measures. The discriminant analyses of the acoustic features yielded predictions well above chance for each speech act category, with an overall classification accuracy of about 90% for both words and non-words (chance level: 17%). Likewise, participants were well able to classify the stimuli into the correct category behaviorally, with slightly lower accuracy for non-words (73%) than for words (81%). Multiple regression analyses of participants' ratings of the different speech acts and the acoustic measures further identified distinct patterns of physical features that predicted the behavioral perception. These findings indicate that prosodic cues convey sufficient detail to classify short (non-)word utterances according to their underlying intention, at acoustic as well as perceptual levels. Lexical meaning seems to be supportive but not necessary for the comprehension of different intentions, given that participants performed well on the non-words but scored higher for the words. Overall, our results show that prosodic cues are powerful indicators of the speaker's intentions in interpersonal communication. The present carefully constructed stimulus set will serve as a useful tool for studying the neural correlates of intentional prosody in the future.
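The discriminant analysis described above can be illustrated with a short scikit-learn sketch. The feature matrix and labels below are random placeholders standing in for PRAAT-derived acoustic measurements, so this demonstrates the method only, not a reanalysis of the study's data.

```python
# Minimal sketch: cross-validated linear discriminant analysis classifying
# utterances into six speech act categories from acoustic features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli, n_features = 96, 8                  # hypothetical: 4 items x 4 speakers x 6 acts
X = rng.normal(size=(n_stimuli, n_features))   # placeholder acoustic features per stimulus
speech_acts = ["criticism", "wish", "warning", "suggestion", "doubt", "naming"]
y = np.repeat(speech_acts, n_stimuli // len(speech_acts))

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=6)      # chance level here is ~17% (1 in 6)
print("cross-validated accuracy:", scores.mean())
```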
Article
Affective prosody is that aspect of speech that conveys a speaker's emotional state through modulations in various vocal parameters, most prominently pitch. While a large body of research implicates the cingulate vocalization area in controlling affective vocalizations in monkeys, no systematic test of this functional homology has yet been reported in humans. In the present study, we used functional MRI to compare brain activations when subjects produced affective vocalizations in the form of exclamations versus non-affective vocalizations with similar pitch contours. We also examined the perception of affective vocalizations by having participants make judgments about either the emotions being conveyed by recorded affective vocalizations or the pitch contours of acoustically similar but non-affective vocalizations. Production of affective vocalizations and matched pitch contours activated a highly overlapping set of brain areas, including the larynx-phonation area of the primary motor cortex and a region of the anterior cingulate cortex that is consistent with the macro-anatomical position of the cingulate vocalization area. This overlap contradicts the dominant view that these areas form two distinct vocal pathways with dissociable functions. Instead, we propose that these brain areas are nodes in a single vocal network, with an emphasis on pitch modulation as a vehicle for affective expression.
Article
The human voice is the primary carrier of speech but also a fingerprint for person identity. Previous neuroimaging studies have revealed that speech and identity recognition is accomplished by partially different neural pathways, despite the perceptual unity of the vocal sound [Formisano, E., De Martino, F., Bonte, M., & Goebel, R. "Who" is saying "what"? Brain-based decoding of human voice and speech. Science, 322, 970-973, 2008; von Kriegstein, K., Eger, E., Kleinschmidt, A., & Giraud, A. L. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Research, Cognitive Brain Research, 17, 48-55, 2003]. Importantly, the right STS has been implicated in voice processing, with different contributions of its posterior and anterior parts. However, the time point at which vocal and speech processing diverge is currently unknown. Also, the exact role of right STS during voice processing is so far unclear because its behavioral relevance has not yet been established. Here, we used the high temporal resolution of magnetoencephalography and a speech task control to pinpoint transient behavioral correlates: we found, at 200 msec after stimulus onset, that activity in right anterior STS predicted behavioral voice recognition performance. At the same time point, the posterior right STS showed increased activity during voice identity recognition in contrast to speech recognition whereas the left mid STS showed the reverse pattern. In contrast to the highly speech-sensitive left STS, the current results highlight the right STS as a key area for voice identity recognition and show that its anatomical-functional division emerges around 200 msec after stimulus onset. We suggest that this time point marks the speech-independent processing of vocal sounds in the posterior STS and their successful mapping to vocal identities in the anterior STS.
Article
Research on mammals predicts that the anterior striatum is a central component of human motor learning. However, as vocalizations in most mammals are innate, much of the neurobiology of human vocal learning has been inferred from studies on songbirds. Essential for song learning is a pathway, the homologue of mammalian cortical-basal ganglia 'loops', which includes the avian striatum. The present functional magnetic resonance imaging (fMRI) study investigated adult human vocal learning, a skill that persists throughout life, albeit imperfectly as late-acquired languages are spoken with an accent. Monolingual adult participants were scanned while repeating novel non-native words. After training on the pronunciation of half the words for one week, they underwent a second scan. During scanning there was no external feedback on performance. Activity declined sharply in left and right anterior striatum, both within and between scanning sessions, and this change was independent of training and performance. This indicates that adult speakers rapidly adapt to the novel articulatory movements, possibly by using motor sequences from their native speech to approximate those required for the novel speech sounds. Improved accuracy correlated only with activity in motor-sensory perisylvian cortex. We propose that future studies on vocal learning, using different behavioral and pharmacological manipulations, will provide insights into adult striatal plasticity and its potential for modification in both educational and clinical contexts.
Article
Adult listeners are capable of identifying the gender of speakers as young as 4 years old from their voices. In the absence of a clear anatomical dimorphism in the dimensions of pre-pubertal boys' and girls' vocal apparatus, the observed gender differences may reflect children's regulation of their vocal behaviour. A detailed acoustic analysis was conducted of the utterances of 34 children aged 6 to 9 years, speaking in their normal voices and also when asked explicitly to speak like a boy or a girl. Results showed statistically significant shifts in fundamental and formant frequency values towards those expected from the sex dimorphism in adult voices. Directions for future research on the role of vocal behaviours in pre-pubertal children's expression of gender are considered.
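The fundamental and formant frequency measurements described above can be illustrated with Parselmouth, a Python interface to Praat. The sketch below uses a hypothetical recording and generic analysis settings; it is an assumed toolchain, not necessarily the one used in the study.

```python
# Minimal sketch: mean f0 and first two formants from a child's utterance.
import parselmouth

snd = parselmouth.Sound("child_utterance.wav")         # hypothetical recording

# Mean fundamental frequency over voiced frames (child-appropriate pitch floor)
pitch = snd.to_pitch(pitch_floor=100.0, pitch_ceiling=600.0)
f0 = pitch.selected_array["frequency"]
mean_f0 = f0[f0 > 0].mean()                            # unvoiced frames are stored as 0

# First two formants, sampled at the utterance midpoint
formant = snd.to_formant_burg(maximum_formant=5500.0)
t_mid = snd.duration / 2
f1 = formant.get_value_at_time(1, t_mid)
f2 = formant.get_value_at_time(2, t_mid)

print(f"mean f0 = {mean_f0:.1f} Hz, F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```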