Article

On how the brain decodes vocal cues about speaker confidence

Authors: Xiaoming Jiang and Marc D. Pell

... Confident utterances are also more likely to exhibit falling intonation, while uncertain utterances are more likely to exhibit rising intonation [SC93,BW95,SK05]. Some studies find a lower pitch with more confident utterances [JP15], while others find a higher pitch [SLW73]. These acoustic cues can also influence listeners' perception of a speaker's certainty. ...
... These acoustic cues can also influence listeners' perception of a speaker's certainty. Perceived confidence increases with faster speech rate [BW95, PBS11, GFVJ19]. Most of the existing work on the relationship between certainty and acoustic characteristics of speech addresses the speaker's confidence, e.g., based on instructions to speak confidently or doubtfully [SLW73, JP15] or when answering trivia questions that they feel more or less confident about [SC93, BW95]. The work in perception of these acoustic characteristics similarly asks listeners to evaluate how confident a speaker sounds [BW95, PBS11, JP17, KLSG22], rather than evaluating certainty of the information itself. ...
... In Experiment 2, we added variants to further capture the impact of content and speech attributes, including concrete text with E1 modifications, fuzzy text with no modifications, and fuzzy text with E1 modifications and a question contour. Because some acoustic correlates of uncertainty are also correlates of emphasis, we used a lower pitch modification so that the acoustic cues signaled uncertainty rather than emphasis; a higher pitch is a main correlate of emphasis [LP84, KS11], but results for its relationship to confidence are mixed [JP15, SLW73]. Speech stimuli were created using Google Speech Synthesis Markup Language (SSML) [Goo23], a standardized markup language that allows adjustments in synthesized speech. Experiment 1 stimuli are shown in Table 1, and Experiment 2 stimuli are shown in Table 2. ...
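For readers unfamiliar with SSML, the snippet below is a minimal sketch of how such stimuli can be generated programmatically with Google's Cloud Text-to-Speech API; the sentence, pitch offsets, and rate values are illustrative placeholders, not the parameters used in the experiments above.

```python
# Minimal sketch (not the authors' actual stimulus script): generating an
# "uncertain" variant with lowered pitch, slower rate, and a rising contour.
from google.cloud import texttospeech  # pip install google-cloud-texttospeech

ssml = """
<speak>
  <prosody pitch="-2st" rate="90%">
    The package will arrive around
    <prosody pitch="+3st">noon<break time="200ms"/></prosody>
  </prosody>
</speak>
"""

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(ssml=ssml),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    ),
)
with open("uncertain_question_contour.wav", "wb") as f:
    f.write(response.audio_content)
```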
Article
Full-text available
Understanding and communicating data uncertainty is crucial for making informed decisions in sectors like finance and healthcare. Previous work has explored how to express uncertainty in various modes. For example, uncertainty can be expressed visually with quantile dot plots or linguistically with hedge words and prosody. Our research aims to systematically explore how variations within each mode contribute to communicating uncertainty to the user; this allows us to better understand each mode's affordances and limitations. We completed an exploration of the uncertainty design space based on pilot studies and ran two crowdsourced experiments examining how speech, text, and visualization modes and variants within them impact decision‐making with uncertain data. Visualization and text were most effective for rational decision‐making, though text resulted in lower confidence. Speech garnered the highest trust despite sometimes leading to risky decisions. Results from these studies indicate meaningful trade‐offs among modes of information and encourage exploration of multimodal data representations.
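As a concrete illustration of one of the visual modes mentioned above, the following sketch draws a simple quantile dot plot with NumPy and Matplotlib; the distribution, dot count, and binning are arbitrary choices, not those used in the cited experiments.

```python
import numpy as np
import matplotlib.pyplot as plt

# A 20-dot quantile dot plot of a hypothetical predictive distribution;
# each dot carries 5% of the probability mass.
rng = np.random.default_rng(0)
samples = rng.normal(loc=30, scale=5, size=10_000)

n_dots = 20
quantiles = np.quantile(samples, (np.arange(n_dots) + 0.5) / n_dots)

# Bin the quantiles and stack dots within each bin.
bins = np.histogram_bin_edges(quantiles, bins=10)
centers = (bins[:-1] + bins[1:]) / 2
idx = np.clip(np.digitize(quantiles, bins) - 1, 0, len(bins) - 2)

fig, ax = plt.subplots()
for b in range(len(bins) - 1):
    for height in range(int(np.sum(idx == b))):
        ax.plot(centers[b], height + 0.5, "o", color="steelblue", markersize=12)
ax.set_xlabel("Predicted value")
ax.set_yticks([])
ax.set_title("Quantile dot plot (20 dots, 5% of probability each)")
plt.show()
```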
... Men are often rated as more confident and agentic than women (Fisk and Ridgeway, 2018), but when women exhibit assertiveness traits, they are often rated more severely and suffer social and professional backlash (e.g., being overlooked for professional advancement; Amanatullah and Tinsley, 2013). Perceptions of confidence are important because they allow us to command respect from others, influence our social status, persuade others, and communicate trust through knowledge and certainty (Heesacker et al., 1983; Booth-Butterfield and Gutowski, 1993; Driskell et al., 1993; Carli et al., 1995; Jiang and Pell, 2015, 2016, 2017; Mori and Pell, 2019), which may be important social goals to women and men alike. To close this assumed gender communication gap, it is important to consider how adapting the interpretation of socio-indexical cues (i.e., social cues that relate to the context; Clark, 1998; Pajak et al., 2016; Yu, 2022; e.g., social features of who is speaking may shape interpretation; Babel and Russell, 2015) away from common gender stereotypes could positively impact women in society. ...
... rising intonation, and declining intonation). Based on results from other studies, it was assumed that sound files with rising intonation would be perceived as less confident, while sound files with declining intonation would be perceived as more confident (Jiang and Pell, 2015; Roche et al., 2019, 2022). We had no a priori assumptions about the natural/flat productions and included them mainly to increase the number of trials so that listeners would not guess the purpose of the task (i.e., consistent with the matched guise technique, Ball and Giles, 1982), and these trials acted as fillers. ...
... Therefore, the listeners adapted or adjusted their social judgments based on the statistical properties of the listening context. In other words, because both women and men are known to produce the rising intonation cue in a lack-of-confidence context (Jiang and Pell, 2015; Roche et al., 2019, 2022), the sheer number of people producing this cue reduced its weight, which avoided triggering the heuristic, stereotypical response that women are less confident than men. In fact, when we visually inspected confidence ratings for the female and male-pitched voices over time in the mixed condition, there is a clear visual pattern that supports this assumption. ...
Article
Full-text available
Introduction Socio-indexical cues to gender and vocal affect often interact and sometimes lead listeners to make differential judgements of affective intent based on the gender of the speaker. Previous research suggests that rising intonation is a common cue that both women and men produce to communicate lack of confidence, but listeners are more sensitive to this cue when it is produced by women. Some speech perception theories assume that listeners will track conditional statistics of speech and language cues (e.g., frequency of the socio-indexical cues to gender and affect) in their listening and communication environments during speech perception. It is currently less clear if these conditional statistics will impact listener ratings when context varies (e.g., number of talkers). Methods To test this, we presented listeners with vocal utterances from one female and one male-pitched voice (single talker condition) or many female/male-pitched voices (4 female voices; 4 female voices pitch-shifted to a male range) to examine how they impacted perceptions of talker confidence. Results Results indicated that when one voice was evaluated, listeners defaulted to the gender stereotype that the female voice using rising intonation (a cue to lack of confidence) was less confident than the male-pitched voice (using the same cue). However, in the multi-talker condition, this effect went away and listeners equally rated the confidence of the female and male-pitched voices. Discussion Findings support dual process theories of information processing, such that listeners may rely on heuristics when speech perception is devoid of context, but when there are no differentiating qualities across talkers (regardless of gender), listeners may be ideal adapters who focus on only the relevant cues.
... A power analysis was run using G*power to determine a minimum sample size for the experiment (Faul et al., 2007). Expecting large effects of prosody, but conservatively assuming medium effect sizes regarding accent effects on the ERP components of interest in this experimental setting (Jiang and Pell, 2015;Mauchand et al., 2021), a minimum of 24 participants was determined to achieve a power over 80%. ...
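For readers who prefer simulation to analytic calculators such as G*Power, the sketch below estimates power for a single within-subject contrast at a medium standardized effect; it is a deliberate simplification of the repeated-measures ANOVA the authors computed, so the exact sample-size threshold will differ from theirs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power(n_subjects, dz=0.5, n_sims=5000, alpha=0.05):
    """Estimated power of a paired contrast with standardized effect dz."""
    hits = 0
    for _ in range(n_sims):
        diffs = rng.normal(loc=dz, scale=1.0, size=n_subjects)  # standardized paired differences
        _, p = stats.ttest_1samp(diffs, 0.0)
        hits += p < alpha
    return hits / n_sims

for n in (16, 20, 24, 28, 32):
    print(n, round(power(n), 3))
```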
... How the brain registers vocal cues to interpret the communicative intentions of a speaker is being actively explored in reference to various types of speech acts (Jiang and Pell, 2015;Mauchand et al., 2021;Rigoulot et al., 2020b;Vergis et al., 2020;Zougkou et al., 2017) and in the context of different indexical cues derived about a speaker (Foucart and Hartsuiker, 2021;Jiang et al., 2020). The present study is the first to provide ERP evidence demonstrating the time course for deriving meaning from third-party complaints, which serve primarily an expressive function during speech communication. ...
... Later, though, neutral speech elicited an enhanced, widespread negative deflection in the 650-900 ms latency range (peaking at ~800 ms), but only for ingroup statements produced in a neutral tone (compared to all other stimulus conditions). Interestingly, Jiang and Pell (2015) reported an increased late positivity beginning ~1000 ms post-onset of neutral-sounding (vs. overtly confident or unconfident) utterances when listeners evaluated the speaker's confidence level based on their vocal expression. ...
Article
Interpersonal communication often involves sharing our feelings with others; complaining, for example, aims to elicit empathy in listeners by vocally expressing a speaker's suffering. Despite the growing neuroscientific interest in the phenomenon of empathy, few have investigated how it is elicited in real time by vocal signals (prosody), and how this might be affected by interpersonal factors, such as a speaker's cultural background (based on their accent). To investigate the neural processes at play when hearing spoken complaints, twenty-six French participants listened to complaining and neutral utterances produced by in-group French and out-group Québécois (i.e., French-Canadian) speakers. Participants rated how hurt the speaker felt while their cerebral activity was monitored with electroencephalography (EEG). Principal Component Analysis of Event-Related Potentials (ERPs) taken at utterance onset showed culture-dependent time courses of emotive prosody processing. The high motivational relevance of ingroup complaints increased the P200 response compared to all other utterance types; in contrast, outgroup complaints selectively elicited an early posterior negativity in the same time window, followed by an increased N400 (due to ongoing effort to derive affective meaning from outgroup voices). Ingroup neutral utterances evoked a late negativity which may reflect re-analysis of emotively less salient, but culturally relevant ingroup speech. Results highlight the time-course of neurocognitive responses that contribute to emotive speech processing for complaints, establishing the critical role of prosody as well as social-relational factors (i.e., cultural identity) on how listeners are likely to “empathize” with a speaker.
... In addition, we analyzed data between 300-500 ms (N400) and between 600-800 ms (late positivity) to make inferences about the late effects. These time windows were informed by visual inspection and previous studies which show that they are sensitive to linguistic-pragmatic manipulations and processing demands created by vocal information in speech (Jiang et al., 2013a; Jiang & Pell, 2015). ...
... Because there is not a large body of literature documenting the topography of ERPs related to the processing of vocal politeness, we made no a priori assumptions regarding the spatial distribution of effects. Instead, we defined nine regions of interest (ROI) represented by 6-8 electrodes each (Jiang & Pell, 2015). Accordingly, ROI was included as an additional fixed factor in the analysis, allowing us to test for possible distribution effects in interaction with the tone of voice and politeness marker factors. Subjects and channels were included as random intercepts in all models (Payne et al., 2015). ...
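A minimal sketch of this kind of ROI-based mixed model using statsmodels in Python; the column names and data file are hypothetical, and the channel term is approximated as a variance component within subjects rather than a fully crossed random effect (which would require lme4 or similar).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per averaged amplitude, with columns
# subject, channel, roi, tone (polite/rude), marker (please/no_please), amp.
df = pd.read_csv("n400_window_amplitudes.csv")  # assumed file name

# Random intercepts for subjects; channel handled as a variance component
# within subjects (a statsmodels approximation of crossed random effects).
model = smf.mixedlm(
    "amp ~ C(tone) * C(marker) * C(roi)",
    data=df,
    groups=df["subject"],
    vc_formula={"channel": "0 + C(channel)"},
)
print(model.fit().summary())
```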
... Results of studies addressing other types of pragmatic distinctions align with this interpretation. In tasks requiring listeners to evaluate whether speakers sound confident, the P200 decreases for stimuli with an unconfident (vs confident or neutral) tone of voice (Jiang & Pell, 2015). In contrast, when listeners must evaluate whether the speakers are believable, the amplitude of the P200 increases for utterances spoken with an unconfident tone. ...
Article
Information in the tone of voice alters social impressions of a speaker and underlying brain activity as listeners evaluate the interpersonal relevance of an utterance. Here, we presented basic requests that expressed politeness distinctions through the speaker’s voice (polite/rude) and the use of explicit linguistic markers (half of the requests began with Please). Thirty participants performed a social perception task (rating friendliness) while their electroencephalogram was recorded. Behaviorally, vocal politeness strategies had a much stronger influence on the perceived friendliness of the speaker than the use of Please. Analysis of event-related potentials from stimulus onset revealed rapid effects of (im)polite voices on cortical brain activity prior to ~300ms; irrespective of whether speakers said Please, P200 amplitudes increased for polite versus rude voices, suggesting that the speaker’s polite stance was registered as more salient in our task. At later stages of meaning elaboration, politeness distinctions encoded by the speaker’s voice and their use of Please interacted, modulating activity in the N400 (300-500ms) and late positivity (600-800ms) time windows. Patterns of results suggest that initial attention deployment to politeness cues is rapidly influenced by the motivational significance of a speaker’s voice. At later stages, processes for integrating vocal and lexical information resulted in increased cognitive effort to reevaluate utterances with ambiguous or contradictory cues about speaker politeness. The potential influence of social anxiety on the P200 effect is also discussed.
... To ensure that their intention is accurately recognized, speakers use different forms of contextual and pragmatic cues, such as "tone of voice" (speech prosody), to help listeners detect their interpersonal stance and to retrieve meanings that go beyond the semantic content of the verbal message (Gibbs & Colston, 2007; Pexman, 2008). Prosodic information furnishes powerful cues about the affective disposition, mental state, and social (e.g., politeness) intentions of a speaker as listeners process language (Belin, Fecteau, & Bédard, 2004; Jiang & Pell, 2015; Van Lancker Sidtis, Pachana, Cummings, & Sidtis, 2006). ...
... To characterize how listeners incrementally construct representations of ironic utterances in daily interactions, it is necessary to establish a time course of irony processing that considers the uptake of prosodic cues that express different types of verbal irony (e.g., criticisms, compliments) and different time intervals during which listeners encounter pragmatic markers of irony and integrate them with linguistic information. Event-related potentials (ERPs) are well suited to this task, because they allow us to capture fine-grained differences in cognitive function as prosodic cues are first registered and then integrated into an utterance representation in real time (Jiang & Pell, 2015;Rigoulot, Vergis, Jiang & Pell, 2020). ...
... Documenting this dynamic process will require analysis of ERPs at multiple time points during irony processing (Kowatch et al., 2013). In an adjacent literature, it has been shown that vocal expressions that encode various facets of a speaker's affective or mental state are registered rapidly and automatically by listeners directly from speech onset (Jiang, Gossack-Keenan, & Pell, 2020;Jiang & Pell, 2015;Paulmann & Kotz, 2008). Notably, motivationally salient prosodic cues increase the P200 amplitude from utterance onset in a range of communicative contexts, depending on their potential relevance to the listener and/or the task they are engaged in (Hajcak, Weinberg, MacNamara, & Foti, 2012;Paulmann & Kotz, 2008;Pell et al., 2015). ...
Article
In social interactions, speakers often use their tone of voice (“prosody”) to communicate their interpersonal stance to pragmatically mark an ironic intention (e.g., sarcasm). The neurocognitive effects of prosody as listeners process ironic statements in real time are still poorly understood. In this study, 30 participants judged the friendliness of literal and ironic criticisms and compliments in the absence of context while their electrical brain activity was recorded. Event-related potentials reflecting the uptake of prosodic information were tracked at two time points in the utterance. Prosody robustly modulated P200 and late positivity amplitudes from utterance onset. These early neural responses registered both the speaker's stance (positive/negative) and their intention (literal/ironic). At a later timepoint (You are such a great/horrible cook), P200, N400, and P600 amplitudes were all greater when the critical word valence was congruent with the speaker’s vocal stance, suggesting that irony was contextually facilitated by early effects from prosody. Our results exemplify that rapid uptake of salient prosodic features allows listeners to make online predictions about the speaker’s ironic intent. This process can constrain their representation of an utterance to uncover nonliteral meanings without violating contextual expectations held about the speaker, as described by parallel-constraint satisfaction models.
... These questions were recently evaluated in a series of ERP studies (Jiang & Pell, 2015). Data show that, like emotional expressions, vocal expressions of confidence are rapidly assigned meaning from the acoustic onset of speech and refined with increased exposure to the input (Jiang & Pell, 2015). Vocally expressed confidence is robustly detected at the stage of salience detection, differentiating the P200 response to confident versus doubtful voices. ...
... Similar to when vocal emotion expressions are analyzed, the directionality of P200 effects depends on the listener's task focus (Paulmann et al., 2013); for example, while highly confident vocal expressions are more salient in certain contexts (Jiang & Pell, 2015), vocal cues marking the speaker's hesitation are more salient (increased P200 amplitude) when listeners must decide whether or not to trust the speaker (see also Caballero & Pell, 2020). Following initial semantic differentiation of the stimulus (confident vs. doubtful), fine-tuning of a mental representation of the speaker's vocal confidence level appears to build up in the 300-700 ms time window, ensuring that a finer gradient of meaning is achieved ("she seems just slightly uncommitted to this idea"). ...
Preprint
Neurocognitive models (e.g., Schirmer & Kotz, 2006) have helped to characterize how listeners incrementally derive meaning from vocal expressions of emotion in spoken language, what neural mechanisms are involved at different processing stages, and their relative time course. But how can these insights be applied to communicative situations in which prosody serves a predominantly interpersonal function? This comment examines recent data highlighting the dynamic interplay of prosody and language, when vocal attributes serve the sociopragmatic goals of the speaker or reveal interpersonal information that listeners use to construct a mental representation of what is being communicated. Our comment serves as a beacon to researchers interested in how the neurocognitive system "makes sense" of socioemotive aspects of prosody.
... Prosody refers to fluctuations in a speaker's pitch, the intensity of their voice, changes in voice quality, the rate and duration of speech events, and other acoustic cues that jointly convey meaning during interpersonal communication. Prosody guides the resolution of lexical ambiguities (Nygaard & Lunders, 2002; Nygaard & Queen, 2008), signals the pragmatic implications of an utterance to the listener (Pell, 2001), and encodes the speaker's emotional and mental state while speaking (Banse & Scherer, 1996; Jiang & Pell, 2015; Paulmann, Bleichner, & Kotz, 2013). Critical to this study, prosody is often pivotal for uncovering indirect meanings of the speaker that are not evident at the surface of language. ...
... One of the social functions of prosody is to mark the speaker's attitude or stance in relation to what is being said. Recent work has examined the role of prosody in communicating literal vs. sarcastic remarks (Cheang & Pell, 2008; Mauchand, Vergis, & Pell, 2020; Regel, Gunter, & Friederici, 2011), expressing (im)polite requests (Ofuka, McKeown, Waterman, & Roach, 2000), uttering (in)sincere compliments (Rigoulot, Fish, & Pell, 2014), and stating personal facts or opinions with different levels of commitment, or vocally-expressed confidence (Jiang & Pell, 2015, 2016). This research points to a rapid uptake and structuring of prosodic information by the neurocognitive system, directly from speech onset, to register potential differences in the social relevance of distinct types of vocal expressions (e.g., Jiang, Gossack-Keenan, & Pell, 2020). ...
... Speakers routinely use extra-linguistic cues to communicate their stance towards the listener or towards the content of their utterance, altering its meaning and/or social relevance to the listener (Caballero, Vergis, Jiang, & Pell, 2018; Jiang & Pell, 2015, 2016; Vergis, Jiang, & Pell, in press). An "atypical" use of prosody when performing a speech act, such as when making ironic compliments or criticisms (Bryant & Fox Tree, 2002; Mauchand, Vergis, & Pell, 2020), often serves a signal function by pointing the listener to alternative or indirect meanings intended by the speaker (Arndt & Janney, 1991). ...
Article
Speakers modulate their voice (prosody) to communicate non-literal meanings, such as sexual innuendo (She inspected his package this morning, where “package” could refer to a man’s penis). Here, we analyzed event-related potentials to illuminate how listeners use prosody to interpret sexual innuendo and what neurocognitive processes are involved. Participants listened to third-party statements with literal or ‘sexual’ interpretations, uttered in an unmarked or sexually evocative tone. Analyses revealed: 1) rapid neural differentiation of neutral vs. sexual prosody from utterance onset; (2) N400-like response differentiating contextually constrained vs. unconstrained utterances following the critical word (reflecting integration of prosody and word meaning); and (3) a selective increased negativity response to sexual innuendo around 600 ms after the critical word. Findings show that the brain quickly integrates prosodic and lexical-semantic information to form an impression of what the speaker is communicating, triggering a unique response to sexual innuendos, consistent with their high social relevance.
... Then, subsequent stages of cognitive analysis, beginning approximately 300 ms post-onset of the vocal expression, allow further specification of the vocal meaning, evaluation of voice information in relation to the social context, and the generation of pragmatic inferences (Jiang et al., 2020;Schirmer and Kotz, 2006). Evidence of these latter processes may be inferred from prosody-induced changes in the P300 (Jiang and Pell, 2015) and/or late centroparietal positivity (Late Positive Component/LPC) or late negativity responses beginning around 450 ms post-stimulus-onset, depending on the interpretative context and evaluative demands (see Jiang et al., 2020). ...
... Although several studies have documented late positivities from utterance onset (e.g. Jiang and Pell, 2015;Rigoulot et al., 2014;Zougkou et al., 2017), our design did not allow for this possibility. As the mean duration of the first constituent (verb phrase) was approximately 500ms, the time window for the late positivity response (after 500 ms) would overlap with the onset of the imposition word. ...
... A first set of analyses sought to determine if differences in a speaker's stance (polite vs. rude prosody) were registered from the acoustic onset of the request, as demonstrated for other types of vocal expressions (e.g., Jiang and Pell, 2015;Mauchand et al., submitted for publication;Paulmann et al., 2013). We looked at potential differences in acoustic-sensory processing (N100) and how attention is deployed to reinforce motivationally significant vocal patterns that might contribute to decisions about social compliance (P200). ...
Article
The way that speakers communicate their stance towards the listener is often vital for understanding the interpersonal relevance of speech acts, such as basic requests. To establish how interpersonal dimensions of an utterance affect neurocognitive processing, we compared event-related potentials elicited by requests that linguistically varied in how much they imposed on listeners (e.g., Lend me a nickel vs. hundred) and in the speaker's vocally-expressed stance towards the listener (polite or rude tone of voice). From utterance onset, effects of vocal stance were robustly differentiated by an early anterior positivity (P200) which increased for rude versus polite voices. At the utterance–final noun that marked the 'cost' of the request (nickel vs. hundred), there was an increased negativity between 300 and 500 ms in response to high-imposition requests accompanied by rude stance compared to the rest of the conditions. This N400 effect was followed by interactions of stance and imposition that continued to inform several effects in the late positivity time window (500–800 ms post-onset of the critical noun), some of which correlated significantly with prosody-related changes in the P200 response from utterance onset. Results point to rapid neural differentiation of voice-related information conveying stance (around 200 ms post-onset of speech) and exemplify the interplay of different sources of interpersonal meaning (stance, imposition) as listeners evaluate social implications of a request. Data show that representations of speaker meaning are actively shaped by vocal and verbal cues that encode interpersonal features of an utterance, promoting attempts to reanalyze and infer the pragmatic significance of speech acts in the 500–800 ms time window.
... Onset LPC: differential and sustained cognitive analysis of the tone of voice before critical semantic information is available [12,13]. ...
... Sarcastic speech exhibits atypical prosodic features [7][8][9], that listeners use efficiently to understand the speaker's target intent [10,11]. Prosody-related ERP components have recently been investigated [12,13], but little is known about their implication in sarcasm processing. ...
... Components of interest [13]: at utterance onset, the N1 (70-130 ms), P2 (150-280 ms), and LPC (600-1000 ms); at the critical word, the N400 (300-500 ms) and P600 (500-800 ms). One-way repeated measures ANOVAs were computed for these time windows. ...
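A sketch of how such time-window analyses can be scripted with MNE-Python; the file names and condition labels are assumed, and because the poster contrasts only two conditions (literal vs. sarcastic), the one-way repeated-measures ANOVA reduces to a paired t-test (F = t squared).

```python
import mne
import numpy as np
from scipy import stats

windows = {"N1": (0.070, 0.130), "P2": (0.150, 0.280), "LPC": (0.600, 1.000)}
files = [f"sub-{i:02d}-epo.fif" for i in range(1, 22)]  # hypothetical 21 subjects

def window_mean(epochs, condition, tmin, tmax):
    """Mean amplitude over epochs, channels, and samples in the window."""
    return epochs[condition].copy().crop(tmin=tmin, tmax=tmax).get_data().mean()

for name, (tmin, tmax) in windows.items():
    literal, sarcastic = [], []
    for fname in files:
        epochs = mne.read_epochs(fname, verbose=False)
        literal.append(window_mean(epochs, "literal", tmin, tmax))
        sarcastic.append(window_mean(epochs, "sarcastic", tmin, tmax))
    # Paired t-test; equivalent to a 2-level repeated-measures ANOVA.
    t, p = stats.ttest_rel(literal, sarcastic)
    print(f"{name}: t({len(files) - 1}) = {t:.2f}, p = {p:.3f}, F = {t**2:.2f}")
```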
Poster
Full-text available
The indirect nature of sarcasm renders it challenging to interpret: the actual speaker’s intent can only be retrieved when the incongruence between the content and pragmatic cues, such as context or tone of voice, is recognized. The cognitive processes underlying the interpretation of irony and sarcasm, in particular the effects of contextual incongruence on brain activity, have recently been examined through event-related potential (ERP) techniques. The role of the tone of voice (prosody) in the perception, processing and interpretation of sarcastic speech, however, remains to be understood. This study aims to investigate this role by assessing differences in the processing of sarcastic and literal speech in the absence of context, when the tone of voice is the only cue to disambiguate the speakers' intent. Literal and sarcastic stimuli were created by recording verbal compliments (e.g., You are such an awesome driver) with different tones of voice to suggest a literal interpretation (perceived as friendly in a validation phase), or a sarcastic one (a veiled criticism, perceived as unfriendly in a validation phase). Later, these recordings were presented to 21 subjects while their brain activity was recorded through EEG during a friendliness rating task. ERPs were computed for each type of stimulus at the utterance onset, to compare the isolated effects of prosody, and at the onset of the critical word (i.e., awesome), to investigate the point at which tone of voice allowed listeners to confirm the literal intent of the compliment or its sarcastic interpretation and suggested criticism. At sentence onset, early differentiation of sarcastic versus literal utterances appeared at the N1/P2 complex over frontal-central electrodes. Compared to literal speech, sarcasm yielded a reduced amplitude at the N1, known to index discrimination of acoustic features, and a greatly reduced amplitude at the P2, a processing stage for marking the emotional salience of sounds. A later, long-lasting differentiation was characterized by higher positivity for sarcastic utterances around 600-1000ms, resembling a late positive component (LPC) over right central-parietal electrodes, implying differential and sustained cognitive analysis of the sarcastic prosody before semantic information is fully available. At the critical word onset, a negative shift for sarcastic vs. literal utterances at 600-800ms over frontal-central electrodes was detected, in a period suggested to index pragmatic (re)interpretation processes. Results show that, even in the absence of context, sarcasm can be differentiated from literal speech at the neurocognitive level. Discrimination of sarcastic intentions starts very early based on perceptual differences in the tone of voice (N1) and differences in how strategic attention reinforces and is allocated to prosodic distinctions underlying literal and sarcastic utterances as they emerge (P2-LPC). Neural processes that encode differences in the prosodic form of utterances, once integrated with the key word, then impact late pragmatic interpretation processes when the incongruent nature of content and prosody in sarcastic messages has been made clear. Our preliminary results argue that the tone of voice plays a significant role at multiple processing stages during sarcasm perception, integration and interpretation.
... Confidence has been described as being represented by the demonstration of knowledge and certainty (Jiang and Pell, 2015), and it has been argued that the identification (by listeners) and the demonstration (by talkers) of knowing are tightly coupled with confidence (Caffi and Janney, 1994). Additionally, it has been suggested that some affective states may have physiological impacts on the articulatory system (e.g., nervousness or anxiety; Hagenaars and Van Minnen, 2005). ...
... Jiang and Pell (2015) found that unconfident talkers' speech may be marked by a higher average pitch and longer speaking durations relative to confident and close-to-confident speech. Talkers do seem to modulate their acoustics when not confident, but how listeners encode this acoustic information as related to knowing and (un)certainty to indicate confidence is relatively less clear. ...
... Recall that Jiang and Pell (2015) found that talkers typically produced a higher average pitch and longer speaking durations when they were unconfident. This may be due to the physiological impact of weakened control over the articulatory system, as exhibited in the anxious voice (Hagenaars and Van Minnen, 2005; Stephan and Stephan, 1985; Stephan et al., 1999). ...
Article
Full-text available
In the current study, an interactive approach is used to explore possible contributors to the misattributions listeners make about female talker expression of confidence. To do this, the expression and identification of confidence was evaluated through the evaluation of talker-specific factors (e.g., talker knowledge and affective acoustic modulation) and listener-specific factors (e.g., interaction between talker acoustic cues and listener knowledge). Talker and listener contexts were manipulated by implementing a social constraint for talkers and withholding information from listeners. Results indicated that listeners were sensitive to acoustic information produced by the female talkers in this study. However, when world knowledge and acoustics competed, judgments of talker confidence by listeners were less accurate. In fact, acoustic cues to female talker confidence were more accurately used by listeners as a cue to perceived confidence when relevant world knowledge was missing. By targeting speech dynamics between female talkers and both female and male listeners, the current study provides a better understanding of how confidence is realized acoustically and, perhaps more importantly, how those cues may be interpreted/misinterpreted by listeners.
... To examine the effect of spatial location on changes in event-related activity, we defined 9 regions of interest (ROI) composed of 6-9 electrodes (Jiang & Pell, 2015). As trial-to-trial latency variability can attenuate component amplitudes and hinder component discrimination, we implemented a Residue Iteration Decomposition (RIDE) procedure on the ERP data before analysis (Ouyang et al., 2015, 2017). The RIDE algorithm decomposes ERP clusters at predefined time windows into components that are invariant in latency and those that can vary. ...
... ERP measures cast new light on the neurocognitive mechanisms engaged by native- and foreign-accented expressions of evaluative attitudes as they are processed in real time. Previous evidence has shown that listeners rapidly perceive acoustic cues that reveal information about the speaker's identity (Romero-Rivas et al., 2015) and their expressed attitude (Jiang & Pell, 2015; Mauchand et al., 2021). In line with this, we found increased N100 and P200 responses to native- vs foreign-accented utterances at anterior ROIs, consistent with evidence that early perceptual components tend to be larger for native/familiar accents during speech comprehension (Foucart et al., 2020; Foucart & Hartsuiker, 2021; Jiang et al., 2020; Romero-Rivas et al., 2015). ...
Preprint
Evaluative statements are routine in interpersonal communication but may evoke different responses depending on the speaker’s identity. Here, thirty participants listened to direct compliments and criticisms spoken in native or foreign accents and rated the speaker's friendliness as their electroencephalogram was recorded. Event-related potentials (ERPs) were examined for (1) vocal speech cues time-locked to statement onset and (2) emotive semantic attributes at the sentence-final word. Criticisms from native speakers were perceived as less friendly than from foreign speakers, whereas compliments did not differ. ERPs revealed listeners rapidly used acoustic information to differentiate speaker identity and evaluative attitude (N100, P200), then selectively monitored vocal attitude expressed by native speakers in the LPP time window. When the evaluative word was heard, native accents contextually modulated early semantic processing (N400). Words of criticism increased the LPP irrespective of accent. Our data showcase unique neurocognitive and behavioural effects of speaker accent on interpersonal communication.
... A power analysis was run using G*power to determine a minimum sample size for the experiment (Faul et al., 2007). Expecting large effects of prosody, but conservatively assuming medium effect sizes regarding accent effects on the ERP components of interest (Jiang & Pell, 2015), a minimum of 24 participants was determined to achieve a power over 80%. Twenty-eight French participants were recruited in the region of Montreal, Canada. ...
... Negative effects have been reported in this time window in innuendo statements, when prosody was used as a cue to interpret otherwise ambiguous statements (Rigoulot et al., 2020), suggesting a prosody-based pragmatic resolution of such ambiguity. Alternatively, the resulting increased P600 for Neutral/Innocuous and Complaint/Painful statements may reflect fine-grained interpretation of speaker intentions in unambiguous speech (i.e., matched valence of prosody and statement): in this case, clear evidence of the speaker's stance would allow the listener to engage in more in-depth evaluation of its pragmatic implications (Jiang et al., 2013; Jiang & Pell, 2015; Rigoulot et al., 2014). Either way, the results suggest that prosody can act as a high-constraint signal that influences N400 and P600 effects (Molinaro et al., 2009; Zhang et al., 2021), effectively superseding other contextual and verbal cues in appraising others' discourse. ...
Article
When complaining, speakers can use their voice to convey a feeling of pain, even when describing innocuous events. Rapid detection of emotive and identity features of the voice may constrain how the semantic content of complaints is processed, as indexed by N400 and P600 effects evoked by the final, pain-related word. Twenty-six participants listened to statements describing painful and innocuous events expressed in a neutral or complaining voice, produced by ingroup and outgroup accented speakers. Participants evaluated how hurt the speaker felt under EEG monitoring. Principal Component Analysis of Event-Related Potentials from the final word onset demonstrated N400 and P600 increases when complainers described innocuous vs. painful events in a neutral voice, but these effects were altered when utterances were expressed in a complaining voice. Independent of prosody, N400 amplitudes increased for complaints spoken in outgroup vs. ingroup accents. Results demonstrate that prosody and accent constrain the processing of spoken complaints as proposed in a parallel-constraint-satisfaction model.
... The mel-frequency cepstral coefficient (MFCC) [22] is a feature parameter derived from this auditory mechanism; it corresponds nonlinearly to frequency and has been widely used in the fields of speech emotion recognition and deception detection. Research has shown that the early extraction and analysis of acoustic parameters affects the differentiation of early ERP responses, and that acoustic characteristics registered at early stages can affect brain cognition at later stages [23]. The nervous system encodes these evolving acoustic parameters to obtain a clear representation of different speech patterns, further enabling the brain to distinguish clearly between lies and truth. ...
... When people lie, they tend to use more complex language and take longer to respond to questions. This process is accompanied by changes in ERP activity in the amygdala, insula, and prefrontal regions of the brain, as well as by changes in acoustic feature parameters associated with lying, and some studies have demonstrated that these two changes are correlated [23]. Drawing on the work of Low et al. [47] and Pastoriza-Domínguez et al. [48], who used machine learning algorithms based on acoustic feature analysis for detecting major mental disorders, we focused, in this paper, on choosing the acoustic feature parameters associated with the act of lying and used a trained neural network model to detect subtle changes in the acoustic feature parameters under different speech patterns to discriminate between lies and truth. ...
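For illustration, a short librosa sketch of the kind of static MFCC-based acoustic features such a detector might consume; the file name, sampling rate, and feature statistics are placeholder choices, not the exact feature set used in the cited work.

```python
import numpy as np
import librosa

# Hypothetical utterance; 13 MFCCs plus deltas, summarised as a fixed-length
# statistical feature vector for a downstream classifier.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
delta = librosa.feature.delta(mfcc)                  # first-order dynamics

features = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),
    delta.mean(axis=1), delta.std(axis=1),
])
print(features.shape)  # (52,)
```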
Article
Full-text available
Human lying is influenced by cognitive neural mechanisms in the brain, and conducting research on lie detection in speech can help to reveal the cognitive mechanisms of the human brain. Inappropriate deception-detection features can easily lead to the curse of dimensionality and degrade the generalization ability of the widely used semi-supervised speech deception detection models. To address this, this paper proposes a semi-supervised speech deception detection algorithm combining acoustic statistical features and time-frequency two-dimensional features. Firstly, a hybrid semi-supervised neural network based on a semi-supervised autoencoder network (AE) and a mean-teacher network is established. Secondly, the static artificial statistical features are input into the semi-supervised AE to extract more robust advanced features, and the three-dimensional (3D) mel-spectrum features are input into the mean-teacher network to obtain features rich in time-frequency two-dimensional information. Finally, a consistency regularization method is introduced after feature fusion, effectively reducing over-fitting and improving the generalization ability of the model. This paper carries out experiments on a self-built corpus for deception detection. The experimental results show that the highest recognition accuracy of the proposed algorithm is 68.62%, which is 1.2% higher than the baseline system, effectively improving detection accuracy.
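The abstract's mean-teacher component can be sketched in a few lines of PyTorch: the teacher is an exponential moving average (EMA) of the student, and a consistency loss penalizes their disagreement on unlabeled batches. The toy model, tensors, and hyperparameters below are placeholders, not the published architecture.

```python
import copy
import torch
import torch.nn.functional as F

# Toy student/teacher pair; real systems would use the AE + mean-teacher
# networks described in the abstract rather than a single linear layer.
student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 2))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def ema_update(student, teacher, alpha=0.99):
    """Teacher weights track an exponential moving average of the student."""
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(alpha).add_(ps, alpha=1 - alpha)

def train_step(x_lab, y_lab, x_unlab, consistency_weight=1.0):
    sup = F.cross_entropy(student(x_lab), y_lab)
    # Consistency regularization: student and EMA teacher should agree
    # on unlabeled inputs.
    cons = F.mse_loss(
        F.softmax(student(x_unlab), dim=1),
        F.softmax(teacher(x_unlab), dim=1).detach(),
    )
    loss = sup + consistency_weight * cons
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(student, teacher)
    return loss.item()

# Random tensors standing in for mel-spectrogram patches and labels.
x_lab, y_lab = torch.randn(8, 1, 64, 64), torch.randint(0, 2, (8,))
x_unlab = torch.randn(16, 1, 64, 64)
print(train_step(x_lab, y_lab, x_unlab))
```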
... Vocal confidence expressions serve as "evidentiality" devices for inferring the reliability, correctness, or truth value of what is expressed from a speaker's tone of voice (Caffi and Janney, 1994; Jiang and Pell, 2015). In particular, a speaker's possession of confidence is typically encoded by external cues that provide evidence for the speaker's knowledge about the self-evaluated correctness or truth value of his own statements (London et al., 1970a,b, 1971; Scherer et al., 1973). ...
... This finding suggests that segmental and suprasegmental features in vowels can provide sufficient information to differentiate the speaker's intended high vs. low confidence, and to distinguish whether or not the speaker intended any emotion or confidence in the sound (Jiang and Pell, 2015). In addition, lexical tone modulated the acoustic encoding of speaker confidence levels in vowels. ...
Article
Full-text available
Introduction Wuxi dialect is a variation of Wu dialect spoken in eastern China and is characterized by a rich tonal system. Compared with standard Mandarin speakers, those who speak Wuxi dialect as their mother tongue can be more efficient in varying vocal cues to encode communicative meanings in speech communication. While literature has demonstrated that speakers encode high vs. low confidence in global prosodic cues at the sentence level, it is unknown how speakers’ intended confidence is encoded at a more local, phonetic level. This study aimed to explore the effects of speakers’ intended confidence on both prosodic and formant features of vowels in two lexical tones (the flat tone and the contour tone) of Wuxi dialect. Methods Words of a single vowel were spoken in a confident, unconfident, or neutral tone of voice by native Wuxi dialect speakers using a standard elicitation procedure. Linear mixed-effects modeling and parametric bootstrapping testing were performed. Results The results showed that (1) the speakers raised both F1 and F2 in the confident level (compared with the neutral-intending expression). Additionally, F1 can distinguish between the confident and unconfident expressions; (2) Compared with the neutral-intending expression, the speakers raised mean f0, had a greater variation of f0 and prolonged pronunciation time in the unconfident level, while they raised mean intensity, had a greater variation of intensity and prolonged pronunciation time in the confident level. (3) The speakers modulated mean f0 and mean intensity to a larger extent on the flat tone than the contour tone to differentiate between levels of confidence in the voice, while they modulated f0 and intensity range more only on the contour tone. Discussion These findings shed new light on the mechanisms of segmental and suprasegmental encoding of speaker confidence and lack of confidence at the vowel level, highlighting the interplay of lexical tone and vocal expression in speech communication.
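A minimal Praat/parselmouth sketch of how such vowel-level measures (duration, mean f0, intensity, and mid-vowel F1/F2) can be extracted in Python; the file name and analysis settings are assumptions, not the authors' exact measurement protocol.

```python
import parselmouth
from parselmouth.praat import call

# Hypothetical single-vowel recording; Praat default analysis settings.
snd = parselmouth.Sound("vowel.wav")
duration = call(snd, "Get total duration")

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                                   # drop unvoiced frames
intensity = snd.to_intensity()

formants = snd.to_formant_burg()
midpoint = duration / 2
f1 = call(formants, "Get value at time", 1, midpoint, "Hertz", "Linear")
f2 = call(formants, "Get value at time", 2, midpoint, "Hertz", "Linear")

print(f"duration = {duration:.3f} s")
print(f"mean f0 = {f0.mean():.1f} Hz (SD {f0.std():.1f})")
print(f"mean intensity = {intensity.values.mean():.1f} dB (SD {intensity.values.std():.1f})")
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz at the vowel midpoint")
```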
... As confident and unconfident assertions generally sound different, our neurocognitive credence-tracking systems can then use these differences in auditory cues to rapidly and automatically work out the confidence of others when they assert that p. This has been demonstrated quite conclusively by a series of studies from Jiang & Pell, which observed that participants can quickly and accurately judge a speaker's level of confidence in her assertion on the basis of prosodic cues alone, ultimately concluding that 'a listener's brain is rapidly attuned to vocal cues that signal one's [degree of confidence]' (2015, 24; see also Jiang and Pell 2016). Importantly for our purposes, neurophysiological evidence indicates that neutral-sounding statements are processed differently than all prosodically marked statements, whether marked for high or low speaker confidence (Jiang and Pell 2015, §3.2). ...
... Emotional states like anger, sadness, joy, and fear all produce characteristic (graded) prosodic cues that allow us to rapidly track these states in others (for a review, see Frühholz et al. 2016). It is then unsurprising that converging evidence from a variety of sources indicates that the tracking of speaker confidence through vocal pitch dynamics is supported by the same neural mechanisms responsible for processing speakers' emotional cues: EEG studies have observed neural time courses for the processing of speaker confidence consistent with those of other emotional vocal cues (Jiang and Pell 2015, 2016); hemodynamic activation during the processing of confident/unconfident utterances, especially in the superior temporal gyrus and inferior frontal gyrus, closely matches that of other emotional vocal cues (see Frühholz et al. 2016); and neuropsychological observations from patients with right hemisphere damage (Pell 2007) as well as Parkinson's disease (Monetta et al. 2008) indicate that deficits in tracking the confidence of others via prosodic cues closely coincide with deficits in tracking emotional prosody. In short, when credences are tracked in this way, they are tracked just as we might expect were credence some variety of emotional state. ...
Article
Full-text available
Here I explore a new line of evidence for belief–credence dualism, the thesis that beliefs and credences are distinct and equally fundamental types of mental states. Despite considerable recent disagreement over this thesis, little attention has been paid in philosophy to differences in how our mindreading systems represent the beliefs and credences of others. Fascinatingly, the systems we rely on to accurately and efficiently track others’ mental states appear to function like belief–credence dualists: Credence is tracked like an emotional state, composed of both representational and affective content, whereas belief is tracked like a bare representational state with no affective component. I argue on a preliminary basis that, in this particular case, the mechanics of mentalizing likely pick out a genuine affective dimension to credence that is absent for belief, further strengthening the converging case for belief–credence dualism.
... Currently, other studies show that listeners are adept at making cognitive appraisals of speaker confidence based on social and acoustic cues produced by the speaker (e.g., Jiang and Pell, 2016;Lempert et al., 2015;Roche et al., 2019). For example, Jiang and Pell (2015) and Roche et al. (2019) asked participants to answer questions posed by an experimenter. Jiang and Pell coached participants to feign confidence, while Roche et al. (2019) asked participants to self-report their state of confidence. ...
... Interestingly, interpersonal characteristics have been found to modulate decision making, as evidenced by modulation of brain activation and functional connectivity during believability judgements. Although Jiang and Pell (2015) note that unconfident expressions elicit a weaker P2, there appear to be gender differences in explicit social judgments about confidence. ...
Article
One's ability to express confidence is critical to achieve one's goals in a social context—such as commanding respect from others, establishing higher social status, and persuading others. How individuals perceive confidence may be shaped by the socio-indexical cues produced by the speaker. In the current production/perception study, we asked four speakers (two cisgender women/men) to answer trivia questions under three speaking contexts: natural, overconfident, and underconfident (i.e., lack of confidence). An evaluation of the speakers' acoustics indicated that the speakers significantly varied their acoustic cues as a function of speaking context and that the women and men had significantly different acoustic cues. The speakers' answers to the trivia questions in the three contexts (natural, overconfident, underconfident) were then presented to listeners ( N = 26) in a social judgment task using a computer mouse-tracking paradigm. Listeners were sensitive to the speakers' acoustic modulations of confidence and differentially interpreted these cues based on the perceived gender of the speaker, thereby impacting listeners' cognition and social decision making. We consider, then, how listeners' social judgments about confidence were impacted by gender stereotypes about women and men from social, heuristic-based processes.
... Finally, the last stage of auditory emotional processing includes a relatively more complex evaluation of the stimuli, including explicit judgements [22,23]. The late ERP component, the LPC, is associated with elaborate processing of emotional auditory stimuli [22,25,46,47]. Its amplitude has been shown to increase in response to compliments perceived as sarcastic (insincere, ironic) vs. genuine [27,52], suggesting a possible role in authenticity discrimination in non-verbal vocalizations as well. ...
... This may suggest that P200 amplitude is particularly triggered by lack of authenticity/genuineness (unlike the N100). The effect was in the direction we predicted given previous evidence linking increased amplitude to motivational salience, and supporting the P200 amplitude modulation as an early indicator of emotional significance [22,23,25,34,40,46,47]. The P200 effects we observed might thus reflect a higher motivational salience of the acted stimuli [9,53], serving to signal the need to resolve the expression's ambiguity and the intention of the speaker, while authentic emotions require less effort to decipher. ...
Article
Full-text available
Deciding whether others’ emotions are genuine is essential for successful communication and social relationships. While previous fMRI studies suggested that differentiation between authentic and acted emotional expressions involves higher-order brain areas, the time course of authenticity discrimination is still unknown. To address this gap, we tested the impact of authenticity discrimination on event-related potentials (ERPs) related to emotion, motivational salience, and higher-order cognitive processing (N100, P200, and the late positive complex, LPC), using vocalised non-verbal expressions of sadness (crying) and happiness (laughter) in a 32-participant, within-subject study. Using a repeated measures 2-factor (authenticity, emotion) ANOVA, we show that N100’s amplitude was larger in response to authentic than acted vocalisations, particularly in cries, while P200’s was larger in response to acted vocalisations, particularly in laughs. We suggest these results point to two different mechanisms: (1) a larger N100 in response to authentic vocalisations is consistent with its link to emotional content and arousal (putatively larger amplitude for genuine emotional expressions); (2) a larger P200 in response to acted ones is in line with evidence relating it to motivational salience (putatively larger for ambiguous emotional expressions). Complementarily, a significant main effect of emotion was found on P200 and LPC amplitudes, in that the two were larger for laughs than cries, regardless of authenticity. Overall, we provide the first electroencephalographic examination of authenticity discrimination and propose that authenticity processing of others’ vocalisations is initiated early, alongside that of their emotional content or category, attesting to its evolutionary relevance for trust and bond formation.
... comparison. This analysis was performed separately for each of the three scalp regions (single-trial magnitudes averaged across the electrodes of each region; Jiang and Pell, 2015). Throughout the study, multiple comparisons were corrected with the FDR (false discovery rate) procedure when LMEM analyses were implemented over multiple time windows and three scalp regions for each prosodic intention effect (suggestion-vs.-neutral and warning-vs.-neutral), and corrected p values were reported. ...
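The FDR step can be reproduced in one call with statsmodels; the p-values below are placeholders standing in for the window-by-region LMEM results.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Placeholder p-values standing in for LMEM results over several time windows
# x three scalp regions for one contrast (e.g., suggestion vs. neutral).
pvals = np.array([0.004, 0.021, 0.048, 0.030, 0.110, 0.650, 0.012, 0.260, 0.041])

reject, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for p, pc, r in zip(pvals, p_fdr, reject):
    print(f"raw p = {p:.3f} -> FDR-corrected p = {pc:.3f}  significant: {bool(r)}")
```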
Article
Understanding the correct intention of a speaker is critical for social interaction. Speech prosody is an important source for understanding speakers’ intentions during verbal communication. However, the neural dynamics by which the human brain translates prosodic cues into a mental representation of communicative intentions in real time remain unclear. Here, we recorded EEG (electroencephalography) while participants listened to dialogues. The prosodic features of the critical words at the end of sentences were manipulated to signal either suggestion, warning, or neutral intentions. The results showed that suggestion and warning intentions evoked enhanced late positive event-related potentials (ERPs) compared to the neutral condition. Linear mixed-effects model (LMEM) regression and representational similarity analysis (RSA) revealed that these ERP effects were distinctively correlated with prosodic acoustic analysis, emotional valence evaluation, and intention interpretation in different time windows; the onset latency significantly increased as the processing level of abstractness and communicative intentionality increased. Neural representations of intention and emotional information emerged and persisted in parallel over a long time window, guiding the correct identification of communicative intention. These results provide new insights into understanding the structural components of intention processing and their temporal neural dynamics underlying communicative intention comprehension from speech prosody in online social interactions.
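A toy sketch of the RSA logic described above, correlating a neural representational dissimilarity matrix (RDM) with a model RDM built from ratings; all data are random placeholders and the distance metrics are common defaults rather than the authors' choices.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Random placeholders: 30 items x 64 electrodes of ERP amplitudes, and one
# behavioural rating per item (e.g., perceived intention strength).
rng = np.random.default_rng(0)
erp_patterns = rng.normal(size=(30, 64))
ratings = rng.normal(size=(30, 1))

neural_rdm = pdist(erp_patterns, metric="correlation")  # condensed upper triangle
model_rdm = pdist(ratings, metric="euclidean")

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"RSA: Spearman rho = {rho:.3f}, p = {p:.3f}")
```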
... However, a worsened perception of the speaker's persuasiveness and pitch variation both work as cues to the prominence of the utterance focus (Zhao et al., 2018). From a neurophysiological perspective, studies (Jiang & Pell, 2015; Pell, 2007) show that melodic (vocal and pitch) variation is perceived by the brain in association with other cues, leading to an inference about the state of the speaker, which would signal the level of confidence and certainty of speech. Thus, the studies indicate that there are mechanisms of cognitive processing and individual variations in the interpretation of acoustic cues related to the mental state of the speaker in the speech signal. ...
Article
Full-text available
Purpose: To review the literature regarding prosodic acoustic features found in communicative attitudes related to confidence, certainty, and persuasion. Method: A systematic review was carried out in the databases VHL, Web of Science, Science Direct, SciELO, and SCOPUS with no temporal restriction. Data Extraction: The data from each article were extracted based on the STROBE Statement checklist. To analyze the prosodic variables, the data were subdivided according to Couper-Kuhlen's (1986) theoretical assumptions, and the variables were grouped into "temporal organization of speech," "intensity," and "pitch". Conclusion: The data suggested that there are relevant, though not consensually agreed upon, variables for characterizing persuasive speech, and that some variables can be considered positive, negative, or neutral in different language contexts. The variables that stood out as relevant and characteristic of persuasive or confident speech were a faster speech rate and higher intensity. The only negative variable that stood out regarding persuasion was an increase in pitch. Keywords: persuasion; prosody; communication; acoustics; speech
... New instruments, media communication, stage design, and the skill development of vocal performers themselves have all produced a series of changes [7][8][9][10]. How to strengthen the expressive power of the vocal stage and how to express the inner thoughts and emotions of vocal works are issues that vocal performers continually study in the process of vocal "teaching" and "learning" [11][12][13][14]. In vocal singing, the so-called control ability is the ability to sing, which includes control of voice strength, singing speed, and breath energy, all of which belong to the psychological characteristics of personality in vocal art psychology [15][16][17][18]. ...
Article
Full-text available
Due to the diverse development of modern media, new media arts and applications are being introduced into vocal performance teaching, with advantages of interactivity, immediacy, sharing, comprehensiveness, versatility, community, and personalization. In this paper, the EEG signal is decoded through a pipeline of EEG data pre-processing, feature extraction, feature identification, and classification; the significance of each element in the time-frequency matrix is calculated to obtain a mask matrix of the same dimensions. A conditional random field model is then established on random field theory to estimate the model parameters: the parameters are obtained by maximizing an entropy function, which is substituted into the Lagrangian to obtain its dual form. Finally, the EEG signal is decoded to realize self-control training for vocal performance teaching in the new media environment. The experimental results show that, in an intervention test of self-control and vocal performance insight, the mean total self-control score in self-control training was 61.99±11.45, and the intervention effect was stable. Therefore, improving self-control, forming correct expression and form, and enriching emotion are important for vocal performance.
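One step in the pipeline described above, testing the significance of each element of a time-frequency matrix to obtain a mask matrix of the same size, can be sketched as follows. This is a hedged illustration only: the simulated trials, the baseline comparison, and the uncorrected threshold are assumptions, and the conditional random field modelling mentioned in the abstract is not reproduced here.

```python
# Illustrative sketch: build a same-sized significance mask over a
# time-frequency matrix from single-trial EEG (assumed, simulated data).
import numpy as np
from scipy.signal import spectrogram
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
fs = 250                                  # assumed sampling rate in Hz
n_trials, n_samples = 40, fs * 2          # 40 trials of 2 s, single channel
eeg = rng.normal(size=(n_trials, n_samples))

# Per-trial time-frequency power.
powers = []
for trial in eeg:
    freqs, times, Sxx = spectrogram(trial, fs=fs, nperseg=64, noverlap=48)
    powers.append(Sxx)
powers = np.stack(powers)                 # (trials, freqs, times)

# Compare log-power in each bin against a pre-stimulus baseline mean.
log_power = np.log(powers + 1e-12)
baseline = log_power[:, :, times < 0.5].mean(axis=2, keepdims=True)
t_vals, p_vals = ttest_1samp(log_power - baseline, popmean=0.0, axis=0)

# Binary mask with the same dimensions as the time-frequency matrix.
mask = p_vals < 0.05                      # uncorrected threshold, for illustration
print(mask.shape, mask.mean())
```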
... For example, architectural color detailing can reconceptualize a space: adding white and gray tones creates a clean, quiet interior. Ceiling decoration is the finishing touch to the overall design, and a sky pattern in greens and blues can add a natural atmosphere to the interior space, visually giving people the illusion of standing beneath white clouds [8][9][10][11][12][13]. The psychological community calls this kind of thinking "theoretical psychology," which is roughly equivalent to philosophical reflection on a wide range of psychological content. ...
Article
Full-text available
Due to the diversified development of modern architectural design, the functional design and application of houses are reflected in the interior color layout of residential space, with advantages of interactivity, comprehensiveness, multi-functionality, and personalization. Taking this as the starting point, this paper analyzes survey respondents' psychological perception of the same spatial colors through big data and converts that perception into EEG signals for decoding. After a pipeline of EEG data pre-processing, feature extraction, feature identification, and classification, the significance of each element in the time-frequency matrix is calculated to obtain a mask matrix of the same dimensions. A conditional random field model is then established on random field theory to estimate the model parameters: the parameters are obtained by maximizing an entropy function, which is substituted into the Lagrangian to obtain its dual form. Finally, the EEG signal is decoded to realize self-control training of color perception under different mentalities. The experimental results show that, in an intervention test of self-control and color insight, the mean total self-control score in training was 61.99±11.45, and the intervention effect was stable. Therefore, improving self-control ability and forming correct psychological perceptions of color play a vital role in the color design of architectural space.
... For example, the five notes of the traditional Chinese pentatonic scale (gong, shang, jue, zhi, and yu) may be heard with only a thin timbre and reduced tonal quality. PD patients with selective hearing loss, in addition to articulation and airflow tremors over time, may talk and hear differently from other people and experience social impairment [62][63][64][65][66][67]. ...
Article
Full-text available
The clinical features of Parkinson’s disease (PD) include tremor and rigidity; however, paresthesia has not drawn clinical attention. PD involves the whole body and begins with gastrointestinal lesions: the pathology does not start in the midbrain substantia nigra, but spreads from the glossopharyngeal nerve nuclei at the beginning of the medulla oblongata, to the dorsal motor nucleus of the vagus nerve, to the pons and midbrain, and finally to the neocortex. The human eye, ear, nose, tongue, and body perceive the external world. (1) Visual impairment in patients with PD can easily be confused with senile eye disease; the change in retinal pigment cells has many similarities to the degeneration of dopaminergic neurons in the substantia nigra in PD. (2) Selective high-frequency hearing impairment can create communication barriers, such as understanding a son’s bass but not a daughter’s soprano, and there is a relationship between hearing and postural balance. (3) Olfactory loss is one of the earliest signs of PD and an important indicator for early screening. (4) Taste disorders, including loss of taste and taste memory, can contribute to cognitive impairment. (5) Abnormalities in the body’s sense of touch, pressure, pain, temperature, and position interfere with the motor symptoms of PD and seriously affect patients’ quality of life. This article discusses vision, hearing, smell, taste, and touch, together with analyses of neuroanatomy and pathology, and highlights their clinical significance.
... Specifically, an enhanced auditory cortex response elicited by rare relative to frequent sounds is further amplified for rare sounds that are positive or negative when compared with rare neutral sounds (Schirmer et al., 2005;Schirmer and Escoffier, 2010). Although these early sensory modulations may be influenced by higher-order mental processes, they are typically considered more stimulus-driven or bottom-up when compared with later ERPs (Schirmer and Kotz, 2006;Paulmann and Kotz, 2008;Jiang and Pell, 2015). ...
Article
Full-text available
Here we asked whether, similar to visual and auditory event-related potentials (ERPs), somatosensory ERPs reflect affect. Participants were stroked on hairy or glabrous skin at five stroking velocities (0.5, 1, 3, 10 and 20 cm/s). For stroking of hairy skin, pleasantness ratings related to velocity in an inverted u-shaped manner. ERPs showed a negativity at 400 ms following touch onset over somatosensory cortex contra-lateral to the stimulation site. This negativity, referred to as sN400, was larger for intermediate than for faster and slower velocities and positively predicted pleasantness ratings. For stroking of glabrous skin, pleasantness showed again an inverted u-shaped relation with velocity and, additionally, increased linearly with faster stroking. The sN400 revealed no quadratic effect and instead was larger for faster velocities. Its amplitude failed to significantly predict pleasantness. In sum, as was reported for other senses, a touch’s affective value modulates the somatosensory ERP. Notably, however, this ERP and associated subjective pleasantness dissociate between hairy and glabrous skin underscoring functional differences between the skin with which we typically receive touch and the skin with which we typically reach out to touch.
... Overall, these findings in auditory models are particularly relevant to human auditory brain function for acoustic communication and comprehension, which demands a balance between highly specific encoding of acoustic cue features to identify a speaker's voice or their emotional context (e.g., Dietrich et al., 2008;Jiang and Pell, 2015), and more general cue encoding strategies that allow humans to understand the same words from different speakers (e.g. Von Kriegstein et al., 2010). ...
Article
Full-text available
The burgeoning field of neuroepigenetics has introduced chromatin modification as an important interface between experience and brain function. For example, epigenetic mechanisms like histone acetylation and DNA methylation operate throughout a lifetime to powerfully regulate gene expression in the brain that is required for experiences to be transformed into long-term memories. This Review highlights emerging evidence from sensory models of memory that converge on the premise that epigenetic regulation of activity-dependent transcription in the sensory brain facilitates highly precise memory recall. Chromatin modifications may be key for neurophysiological responses to transient sensory cue features experienced in the “here and now” to be recapitulated over the long term. We conclude that the function of epigenetic control of sensory system neuroplasticity is to regulate the amount and type of sensory information retained in long-term memories by regulating neural representations of behaviorally relevant cues that guide behavior. This is of broad importance in the neuroscience field because there are few circumstances in which behavioral acts are devoid of an initiating sensory experience.
... Trials were labeled according to voice Form (speech, vocalization), Congruency (congruent, incongruent), and Face (angry, happy, sad). For each T-S factor, we built linear mixed-effects models (LMM) to evaluate the significance of the main condition effects and their interactions (e.g., Jiang & Pell, 2015). Maximal random-effects structures were kept to diminish Type I error (Barr, Levy, Scheepers, & Tily, 2013). ...
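The snippet above mentions linear mixed-effects models with maximal random-effects structures. A minimal Python sketch of a mixed model with a by-subject random intercept and slope is given below; the simulated data frame and the statsmodels backend are illustrative assumptions (analyses like the one cited are often fit in R with lme4), and a truly maximal structure would include further random terms.

```python
# Minimal mixed-effects sketch: amplitude ~ Congruency * Face with a
# by-subject random intercept and a random slope for Congruency.
# The data frame is simulated purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
subjects = np.repeat(np.arange(24), 60)
df = pd.DataFrame({
    "subject": subjects,
    "congruency": rng.choice(["congruent", "incongruent"], size=subjects.size),
    "face": rng.choice(["angry", "happy", "sad"], size=subjects.size),
    "amplitude": rng.normal(size=subjects.size),
})

model = smf.mixedlm(
    "amplitude ~ congruency * face",
    data=df,
    groups=df["subject"],
    re_formula="~congruency",   # by-subject random slope for congruency
)
print(model.fit().summary())
```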
Article
When we hear an emotional voice, does this alter how the brain perceives and evaluates a subsequent face? Here, we tested this question by comparing event-related potentials evoked by angry, sad, and happy faces following vocal expressions which varied in form (speech-embedded emotions, non-linguistic vocalizations) and emotional relationship (congruent, incongruent). Participants judged whether face targets were true exemplars of emotion (facial affect decision). Prototypicality decisions were more accurate and faster for congruent vs. incongruent faces and for targets that displayed happiness. Principal component analysis identified vocal context effects on faces in three distinct temporal factors: a posterior P200 (150-250ms), associated with evaluating face typicality; a slow frontal negativity (200-750ms) evoked by angry faces, reflecting enhanced attention to threatening targets; and the Late Positive Potential (LPP, 450-1000ms), reflecting sustained contextual evaluation of intrinsic face meaning (with independent LPP responses in posterior and prefrontal cortex). Incongruent faces and faces primed by speech (compared to vocalizations) tended to increase demands on face perception at stages of structure-building (P200) and meaning integration (posterior LPP). The frontal LPP spatially overlapped with the earlier frontal negativity response; these components were functionally linked to expectancy-based processes directed towards the incoming face, governed by the form of a preceding vocal expression (especially for anger). Our results showcase differences in how vocalizations and speech-embedded emotion expressions modulate cortical operations for predicting (prefrontal) versus integrating (posterior) face meaning in light of contextual details.
... In addition to offering a friendly pat on the back, we may make eye contact, smile and speak with a soft, melodic voice, or emit affectively regulated odors. These and other nonverbal activities have been likened to language and considered as code for an intended emotional message [5][6][7]. Moreover, researchers have tried to crack that code by mapping behavioral characteristics onto a sender's emotions. ...
Article
Traditionally, nonverbal behaviors have been understood as coded messages one person sends to another. Following this tradition, social touch has been pursued by asking what it communicates. We argue this question is misleading and ask instead how touch impacts on those giving and receiving it. Indeed, a growing literature investigating gentle physical contact highlights that both toucher and touchee may benefit because such contact is pleasurable, because it helps regulate stress and negative affect, or because it generates trust and good will. Together, published findings prompt a new perspective that understands tactile and other nonverbal behaviors as tools. This perspective seems better suited to explain existing data and to guide future research into the processes and consequences of social touch.
... Instead, in the above circumstances, others' tone, facial expression, choice of words, etc., are usually more readily available. Some of these features, such as pitch and speech rate, are thought to be associated with one's confidence [12,13]. In other words, confederates may still be able to convey confidence information to us even when only language features are prominent. ...
Article
Full-text available
Memory conformity may develop when people are confronted with distinct memories reported by others in social situations and knowingly or unknowingly adhere to these exogenous memories. Earlier research on memory conformity suggests that (1) subjects were more likely to conform to confederates with high confidence; (2) subjects with low confidence in their memory accuracy were more likely to conform; and (3) this subjective confidence could be adjusted by social manipulations. Nonetheless, it remains unclear how our own and others' confidence levels may interact and produce a combined effect on our degree of conformity. More importantly, is memory conformity, defined by a complete adoption of the opposite side, the result of a gradual accumulation of subtler changes at the confidence level, i.e., a buildup of confidence conformity? Here, we followed participants’ confidence transformation quantitatively over three confederate sessions in a memory test. After studying a set of human motion videos, participants had to simultaneously indicate whether a target or lure video had appeared before by choosing their side (i.e., Yes/No) and giving an associated confidence rating. Participants were allowed to adjust their responses as they were shown randomly generated confederates’ answers and confidence values. Results show that participants indeed demonstrated confidence conformity. Interestingly, they tended to become committed to their side early on and gain confidence gradually over subsequent sessions. This polarizing behaviour may be explained by two kinds of preferences: (1) participants’ confidence enhancement towards same-sided confederates was greater in magnitude than the decrement towards opposite-sided confederates; and (2) participants had the most effective confidence boost when same-sided confederates shared a confidence level similar, but not considerably different, to theirs. In other words, humans exhibit side- and similarity-biases during confidence conformity.
... Recent neuroimaging and electrophysiological studies looked at the impact of the voice on the perception of the speaker (e.g., Jiang et al., 2020;Jiang et al., 2017;Jiang and Pell, 2015). Jiang and collaborators (2020) showed that the believability of the speaker can be influenced by vocally-expressed confidence in the speech and by whether the speaker belongs to the same social group as the listener. ...
Article
This study investigated the impact of the speaker’s identity generated by the voice on sentence processing. We examined the relation between ERP components associated with the processing of the voice (N100 and P200) from voice onset and those associated with sentence processing (N400 and late positivity) from critical word onset. We presented Dutch native speakers with sentences containing true (and known) information, unknown (but true) information, or information violating world knowledge, and had them perform a truth evaluation task. Sentences were spoken either in a native or a foreign accent. Truth evaluation judgments did not differ for statements spoken by the native-accented and the foreign-accented speakers. Reduced N100 and P200 were observed in response to the foreign speaker’s voice compared to the native speaker’s. While statements containing unknown information or world knowledge violations generated a larger N400 than true statements in the native condition, they were not significantly different in the foreign condition, suggesting shallower processing of foreign-accented speech. The N100 was a significant predictor for the N400 in that the reduced N100 observed for the foreign speaker compared to the native speaker was related to a smaller N400 effect. These findings suggest that the impression of the speaker that listeners rapidly form from the voice affects semantic processing, which confirms that speaker identity and language comprehension cannot be dissociated.
... Sarcasm finally has a slower vocalization rate, higher voice intensity, and lower pitch (Rockwell, 2000). These few examples serve to show that the acoustic space of voice signals is large enough to allow signaling of rich social information, such as competence (Schroeder and Epley, 2015), power (McAleer et al., 2014), warnings and hesitation (Hellbernd and Sammler, 2016), politeness (Pell, 2007), confidence (Jiang and Pell, 2015), sincerity, lying and deception (Anolli and Ciceri, 1997), sexual orientation (Sulpizio et al., 2015), or social status (Leongómez et al., 2017; Oveis et al., 2016). In terms of referential voice signals, some monkey species have evolved sophisticated taxonomies of alarm calls, for example, that signal the presence of certain predators (Fichtel et al., 2005; Seyfarth et al., 1980). ...
Article
Full-text available
While humans have developed a sophisticated and unique system of verbal auditory communication, they also share a more common and evolutionarily important nonverbal channel of voice signaling with many other mammalian and vertebrate species. This nonverbal communication is mediated and modulated by the acoustic properties of a voice signal, and is a powerful – yet often neglected – means of sending and perceiving socially relevant information. From the viewpoint of dyadic (involving a sender and a signal receiver) voice signal communication, we discuss the integrated neural dynamics in primate nonverbal voice signal production and perception. Most previous neurobiological models of voice communication modelled these neural dynamics from the limited perspective of either voice production or perception, largely disregarding the neural and cognitive commonalities of both functions. Taking a dyadic perspective on nonverbal communication, however, it turns out that the neural systems for voice production and perception are surprisingly similar. Based on the interdependence of both production and perception functions in communication, we first propose a re-grouping of the neural mechanisms of communication into auditory, limbic, and paramotor systems, with special consideration for a subsidiary basal-ganglia-centered system. Second, we propose that the similarity in the neural systems involved in voice signal production and perception is the result of the co-evolution of nonverbal voice production and perception systems promoted by their strong interdependence in dyadic interactions.
... While accents rapidly (and involuntarily) reveal facets of a speaker's social identity, vocal behavior provides speakers with opportunities to strategically modulate their way of speaking to influence listeners' social impressions (Caffi & Janney, 1994). For example, people can volitionally signal different levels of confidence, or perceived commitment, through the quality of their vocal expressions (Jiang & Pell, 2015). The vocal expression of confidence, in turn, promotes inferences about how trustworthy and persuasive speakers are and how much their statements should be believed. ...
Article
Full-text available
People often evaluate speakers with nonstandard accents as being less competent or trustworthy, which is often attributed to in-group favoritism. However, speakers can also modulate social impressions in the listener through their vocal expression (e.g., by speaking in a confident vs. a doubtful tone of voice). Here, we addressed how both accents and vocally-expressed confidence affect social outcomes in an interaction setting using the Trust Game, which operationalizes interpersonal trust using a monetary exchange situation. In a first study, 30 English Canadians interacted with partners speaking English with a Canadian, Australian, or foreign (French) accent. Speakers with each accent vocally expressed themselves in different ways (confident, doubtful, or neutral voice). Results show that trust decisions were significantly modulated by a speaker’s accent (fewer tokens were given to foreign-accented speakers) and by vocally-expressed confidence (fewer tokens were given to doubtful-sounding speakers). Using the same paradigm, a second study then tested whether manipulating the social identity of the speaker-listener pair led to similar trust decisions in participants who spoke English as a foreign language (EFL; 60 native speakers of French or Spanish). Again, EFL participants trusted partners who spoke in a doubtful manner and those with a foreign accent less, regardless of the participants’ linguistic background. Taken together, results suggest that in social-interactive settings, listeners implicitly use different sources of vocal cues to derive social impressions and to guide trust-related decisions, effects not solely driven by shared group membership. The influence of voice information on trust decisions was very similar for native and non-native listeners.
... Yet such durations suffice only to extract a general emotional context, since the time estimates increase markedly for more precise recognition of emotional states. For instance, for vocal stimuli it has been reported that a time window of 980-1270 ms might be required [71]. Moreover, the performance and time required for identifying the basic emotions depend strongly on the intensity of the expressed emotion [72]. ...
Article
The advancement of Human-Robot Interaction (HRI) drives research into the development of advanced emotion identification architectures that fathom the audio-visual (A-V) modalities of human emotion. State-of-the-art methods in multi-modal emotion recognition mainly focus on the classification of complete video sequences, leading to systems with no online capability. Such techniques can predict emotions only after the videos have concluded, restricting their applicability in practical scenarios. This paper provides a novel paradigm for online emotion classification, which exploits both audio and visual modalities and produces a responsive prediction as soon as the system is confident enough. We propose two deep Convolutional Neural Network (CNN) models for extracting emotion features, one for each modality, and a Deep Neural Network (DNN) for their fusion. To capture the temporal quality of human emotion in interactive scenarios, we train in cascade a Long Short-Term Memory (LSTM) layer and a Reinforcement Learning (RL) agent, which monitors the speaker and decides when to stop feature extraction and make the final prediction. The comparison of our results on two publicly available A-V emotional datasets, RML and BAUM-1s, against other state-of-the-art models demonstrates the beneficial capabilities of our work.
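To make the described architecture concrete at a high level (per-modality feature extraction, temporal modelling, and a stopping decision once the system is confident), here is a minimal PyTorch sketch. The layer sizes, the single spectrogram-like input, and the fixed confidence threshold standing in for the RL stopping agent are all assumptions; this is not the authors' model.

```python
# Illustrative online emotion classifier: a small CNN extracts per-step
# features, an LSTM accumulates them, and a prediction is emitted once the
# softmax confidence exceeds a threshold (a stand-in for the RL stopping
# policy described in the abstract; all sizes are arbitrary assumptions).
import torch
import torch.nn as nn

class OnlineEmotionClassifier(nn.Module):
    def __init__(self, n_mels=40, n_classes=6, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, n_mels, time)
        feats = self.cnn(x).transpose(1, 2)    # (batch, time, 32)
        out, _ = self.lstm(feats)              # (batch, time, hidden)
        return self.head(out)                  # per-step logits

def predict_online(model, clip, threshold=0.9):
    """Return (step, class) at the first step whose confidence clears threshold."""
    model.eval()
    with torch.no_grad():
        logits = model(clip.unsqueeze(0))[0]   # (time, n_classes)
        probs = torch.softmax(logits, dim=-1)
        conf, cls = probs.max(dim=-1)
        for step in range(conf.size(0)):
            if conf[step] >= threshold:
                return step, int(cls[step])
        return conf.size(0) - 1, int(cls[-1])  # fall back to the last step

clip = torch.randn(40, 120)                    # fake 40-mel, 120-frame input
step, label = predict_online(OnlineEmotionClassifier(), clip)
print(f"decided at frame {step} with class {label}")
```

With untrained weights the threshold is rarely reached, so the helper simply falls back to the final frame; in a trained system the threshold, or a learned policy, determines how early a prediction is emitted.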
... Notably, although controlling tones communicating an immediate need to respond elicited the strongest P2 amplitude, autonomy-supportive tones also elicited stronger responses as compared to neutral tones of voice, indicating support offered meaningful motivational content, but suggesting a softer quality to messages that motivate listeners through inviting opportunity through choice and encouragement when control is the comparison. Overall, these data nicely complement the attitudinal and emotional prosody literature which has repeatedly shown that emotional and attitudinal signals as conveyed through prosody are processed rapidly (e.g., Paulmann and Kotz, 2008b;Paulmann et al., 2013;Schirmer et al., 2013;Jiang and Pell, 2015). Crucially, it has been proposed that prosodic stimuli that are intrinsically relevant (e.g., signaling that immediate action is needed) may lead to enhanced processing efforts. ...
Article
Motivating communications are a frequent experience in daily life. Recently, it has been found that two types of motivation are spoken with distinct tones of voice: control (pressure) is spoken with a low-pitched, loud tone of voice, fast speech rate, and harsh-sounding voice quality; autonomy (support) is spoken with a higher-pitched, quieter tone of voice and a slower speech rate. These two motivational tones of voice also differentially impact listeners’ well-being. Yet little is known about the brain mechanisms linked to motivational communications. Here, participants were asked to listen to semantically identical sentences spoken in controlling, neutral, or autonomy-supportive prosody. We also presented cross-spliced versions of these sentences for maximum control over information presentation across time. Findings showed that listeners quickly detected whether a speaker was providing support, being pressuring, or not using motivating tones at all. Also, listeners who were pressured did not seem to respond anew when a supportive motivational context arose, whereas those who had been supported were affected by a newly pressuring environment. Findings are discussed in light of the motivational and prosody literatures, and in terms of their significance for the role of motivational communications in behavior.
... Furthermore, largely based on non-verbal vocal information (such as pitch and intonation) rather than verbal content (i.e. what is said), it has been shown that rapid assessments are made about a speaker's affective state [17][18][19], confidence level [20], perceived intelligence [21], and personality [1,6,[22][23][24]. In turn, such rapid judgements impact our business decisions [25], voting and political preferences [26][27][28][29][30][31], whom we hire [21], whom we laugh with [19,32], and whom we are attracted to [22][23][24]. ...
Article
Full-text available
It has previously been shown that first impressions of a speaker’s personality, whether accurate or not, can be judged from short utterances of vowels and greetings, as well as from prolonged sentences and readings of complex paragraphs. From these studies, it is established that listeners’ judgements are highly consistent with one another, suggesting that different people judge personality traits in a similar fashion, with three key personality traits being related to measures of valence (associated with trustworthiness), dominance, and attractiveness. Yet, particularly in voice perception, limited research has established the reliability of such personality judgements across stimulus types of varying lengths. Here we investigate whether first impressions of trustworthiness, dominance, and attractiveness of novel speakers are related when a judgement is made on hearing one word versus one sentence from the same speaker. Second, we test whether what is said, that is, the content, influences the stability of personality ratings. Sixty Scottish voices (30 females) were recorded reading two texts: one of ambiguous content and one with socially relevant content. One word (~500 ms) and one sentence (~3000 ms) were extracted from each recording for each speaker. A total of 181 participants (138 females) rated either male or female voices across both content conditions (ambiguous, socially relevant) and both stimulus types (word, sentence) for one of the three personality traits (trustworthiness, dominance, attractiveness). Pearson correlations showed that personality ratings between words and sentences were strongly correlated, with no significant influence of content. In short, when establishing an impression of a novel speaker, judgements of three key personality traits are highly related whether you hear one word or one sentence, irrespective of what is being said. This finding is consistent with initial personality judgements serving as elucidators of approach or avoidance behaviour, without modulation by time or content. All data and sounds are available on OSF (osf.io/s3cxy).
Article
Remote pair programming is widely used in software development, but no research has examined how race affects these interactions between developers. We embarked on this study due to the historical underrepresentation of Black developers in the tech industry, with White developers comprising the majority. Our study involved 24 experienced developers, forming 12 gender-balanced same- and mixed-race pairs. Pairs collaborated on a programming task using the think-aloud method, followed by individual retrospective interviews. Our findings revealed elevated productivity scores for mixed-race pairs, with no differences in code quality between same- and mixed-race pairs. Mixed-race pairs excelled in task distribution, shared decision-making, and role exchange but encountered communication challenges, discomfort, and anxiety, shedding light on the complexity of diversity dynamics. Our study emphasizes race’s impact on remote pair programming and underscores the need for diverse tools and methods to address racial disparities in collaboration.
Article
Full-text available
Sound symbolism refers to the non-arbitrary relationship between phonemes and specific perceptual attributes. Few studies have focused on the sound symbolic associations between Mandarin phonemes and multiple perceptual dimensions, including social attitudes. The main purpose of the current study is to identify the acoustic cues crucial to perceptual judgments on the visual, tactile, and interpersonal dimensions based on Mandarin rimes. The study found that, in addition to the first and second formants, formant transitions and nasal codas were crucial in characterizing sound symbolic associations. Machine learning models also showed that critical acoustic parameters could successfully classify the different perceived attributes of rimes. The study further examined the mechanisms underlying the formation of a sound symbolic association. Through mediation analyses, the light/heavy dimension proved to have a suppression effect on the politeness and friendliness perception of Mandarin compound rimes (diphthongs and diphthongs with nasal codas /n/ or /ŋ/). This finding suggested that the two potential mechanisms of sound symbolism (i.e., the language pattern account and the shared-property account) could coexist and interact with each other.
Article
People are often advised to project confidence with their bodies and voices to convince others. Prior research has focused on the high and low thinking processes through which vocal confidence signals (e.g., fast speed, falling intonation, low pitch) can influence attitude change. In contrast, this research examines how the vocal confidence of speakers operates under more moderate elaboration levels, revealing that falling intonation only benefits persuasion under certain circumstances. In three experiments, we show that falling (vs. rising) vocal intonation at the ends of sentences can signal speaker confidence. Under moderate elaboration conditions, falling (vs. rising) vocal intonation increased message processing, bolstering the benefit of strong over weak messages, increasing the proportion of message-relevant thoughts, and increasing thought-attitude correspondence. In sum, the present work examined an unstudied role of vocal confidence in guiding persuasion, revealing new processes by which vocal signals increase or fail to increase persuasion.
Chapter
One of the outstanding characteristics of the human species is that it is endowed with speech. Other living beings have different ways of communicating: dances, plumage, trills, sounds, aromas, but we know of no other living being endowed with words, with language in the strict sense. In fact, Aristotle defines man as an animal endowed with logos, therefore with speech and reason. Since when have human beings spoken? What was the first word spoken? What sense or utility would it have had? Neurolinguistics explains that in order to produce words, a brain that controls a vocal apparatus is needed, but above all control of the intercostal muscles and the diaphragm. To control these muscles, the vagus nerve (cranial nerve X) must have a sufficient number of fibers. This nerve exits the base of the skull through the posterior foramen lacerum. Only about 80,000 years ago did that opening at the base of the skull reach a size similar to ours, so it is reasonable to think that the first human word was spoken around that time. Since we have no evidence of what that word was or who spoke it, we can freely imagine it. The human word is related to music; each language and each speaking region has its own music, so we can infer where a speaker comes from by listening to them. That is why I think the first word was sung. I think the first human person to utter a word was a woman, because the word requires a community, someone to listen. That first word must have been a magical sound, and whoever spoke it would have been a goddess to the others. The first images of gods in humanity are women. I also think it was maternal: the first woman who spoke was a mother, and the first who listened was her child. I also think it was an incantation, that is to say, evocative of peace and tranquility for a fearful child. That is why I believe that the first word in the history of humanity was spoken by a woman, a mother, singing to the child in her arms to ward off fear. That allowed the child to sleep and rest; in other words, the word was therapeutic. We develop this in the present chapter.
Article
When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
Preprint
Full-text available
This study investigated the capability of vocal-identity-cloning Artificial Intelligence (AI) to encode human-specific confident, doubtful, and neutral-intending emotive states. Linear mixed-effects models and machine learning classification with eXtreme Gradient Boosting were employed to examine the underlying acoustic signatures of 2,700 audio clips, comprising sentences spoken by human speakers and two sets of equivalents (AI-Geography/AI-Trivia, based on the trained text) generated by voice-cloning models designed to clone the human speakers’ identities. Compared with their neutral-intending voice, human speakers lengthened the vocal tract, raised the fundamental frequency, and increased Chroma constant-Q transform when they intended to sound confident; an opposite pattern appeared when they intended to sound doubtful. The two sets of AI voices displayed a pattern similar to human speech, suggesting a shared mechanism for encoding vocal expression across sources. Classification models trained and tested 1,000 times showed an in-group advantage for AI sources: algorithms trained on AI-Geography/AI-Trivia achieved higher accuracies when tested within these AI sources than when tested on human audio. All between-source classifications reached above-chance-level (1/3) accuracies. These findings highlight that voice-cloning AI, a widely used conversational agent technology, can learn and generate human-specific vocally expressed confidence.
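To illustrate the classification step, the hedged sketch below trains a three-class XGBoost model on an acoustic feature matrix and compares test accuracy against the 1/3 chance level mentioned in the abstract; the features and labels are random placeholders rather than the study's acoustic measurements.

```python
# Illustrative 3-class classification of acoustic feature vectors with XGBoost,
# compared against the 1/3 chance level; features and labels are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(3)
n_clips, n_features = 900, 20              # e.g., f0, formant, chroma summaries
X = rng.normal(size=(n_clips, n_features))
y = rng.integers(0, 3, size=n_clips)       # 0=confident, 1=doubtful, 2=neutral

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
print(f"accuracy = {acc:.3f} (chance = {1/3:.3f})")
```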
Chapter
Neural oscillations have emerged as a paradigm of reference for EEG and MEG research. In this chapter, we highlight some of the possibilities and limits of modelling the dynamics of complex stimulus perception as being shaped by internal oscillators. The reader is introduced to the main physiological tenets underpinning the use of neural oscillations in cognitive neuroscience. The concepts of entrainment and neural tracking are illustrated with particular reference to speech and language processes. Keywords: neural oscillations; neural entrainment; cortical tracking; synchrony; speech; language
Chapter
Prosody, that is, meaningful patterns in intonation, rhythm, stress, and tone, impacts a large body of other language operations, and it is likely one of the most undervalued and possibly understudied language-related functions. Drawing primarily upon evidence from event-related brain potentials (ERPs), but also referring to neural oscillation activity where possible, this chapter offers a concise review of the electrophysiological responses underlying prosody, describes how prosody impinges on other language functions, summarizes the effects of task demands and listener characteristics, and ends by sketching future directions for the field of studying how prosody is processed in the brain. Keywords: emotional prosody; tone of voice; affective prosody; social intonation; ERPs; neural oscillations
Article
Listeners spontaneously form impressions of a person from their voice: Is someone old or young? Trustworthy or untrustworthy? Some studies suggest that these impressions emerge rapidly (e.g., < 400 ms for traits), but it is unclear just how rapidly different impressions can emerge and whether the time courses differ across characteristics. I presented 618 adult listeners with voice recordings ranging from 25 ms to 800 ms in duration and asked them to rate physical (age, sex, health), trait (trustworthiness, dominance, attractiveness), and social (educatedness, poshness, professionalism) characteristics. I then used interrater agreement as an index for impression formation. Impressions of physical characteristics and dominance emerged fastest, showing high agreement after only 25 ms of exposure. In contrast, agreement for trait and social characteristics was initially low to moderate and gradually increased. Such a staggered time course suggests that there could be a temporo-perceptual hierarchy for person perception in which faster impressions could influence later ones.
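The study above uses interrater agreement as its index of impression formation. One common way to quantify agreement across raters is Cronbach's alpha computed over a stimuli-by-raters matrix, sketched below with placeholder ratings; the exact agreement measure used in the study may differ.

```python
# Cronbach's alpha over a (stimuli x raters) ratings matrix as one common
# interrater-agreement index; the ratings here are random placeholders.
import numpy as np

def cronbach_alpha(ratings):
    """ratings: 2-D array, rows = stimuli, columns = raters (treated as items)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                                 # number of raters
    rater_variances = ratings.var(axis=0, ddof=1).sum()  # per-rater variances
    total_variance = ratings.sum(axis=1).var(ddof=1)     # variance of summed scores
    return (k / (k - 1)) * (1 - rater_variances / total_variance)

rng = np.random.default_rng(4)
ratings = rng.integers(1, 8, size=(60, 25))  # 60 voices rated by 25 listeners
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.3f}")
```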
Article
Full-text available
This article unpacks the basic mechanisms by which paralinguistic features communicated through the voice can affect evaluative judgments and persuasion. Special emphasis is placed on exploring the rapidly emerging literature on vocal features linked to appraisals of confidence (e.g., vocal pitch, intonation, speech rate, loudness, etc.), and their subsequent impact on information processing and meta-cognitive processes of attitude change. The main goal of this review is to advance understanding of the different psychological processes by which paralinguistic markers of confidence can affect attitude change, specifying the conditions under which they are more likely to operate. In sum, we highlight the importance of considering basic mechanisms of attitude change to predict when and why appraisals of paralinguistic markers of confidence can lead to more or less persuasion.
Article
Full-text available
Evidence suggests that observers can accurately perceive a speaker's static confidence level, related to their personality and social status, by only assessing their visual cues. However, less is known about the visual cues that speakers produce to signal their transient confidence level in the content of their speech. Moreover, it is unclear what visual cues observers use to accurately perceive a speaker's confidence level. Observers are hypothesized to use visual cues in their social evaluations based on the cue's level of perceptual salience and/or their beliefs about the cues that speakers with a given mental state produce. We elicited high and low levels of confidence in the speech content by having a group of speakers answer general knowledge questions ranging in difficulty while their face and upper body were video recorded. A group of observers watched muted videos of these recordings to rate the speaker's confidence and report the face/body area(s) they used to assess the speaker's confidence. Observers accurately perceived a speaker's confidence level relative to the speakers' subjective confidence, and broadly differentiated speakers as having low compared to high confidence by using speakers' eyes, facial expressions, and head movements. Our results argue that observers use a speaker's facial region to implicitly decode a speaker's transient confidence level in a situation of low-stakes social evaluation, although the use of these cues differs across speakers. The effect of situational factors on speakers' visual cue production and observers' utilization of these visual cues are discussed, with implications for improving how observers in real world contexts assess a speaker's confidence in their speech content.
Article
Our decision to believe what another person says can be influenced by vocally expressed confidence in speech and by whether the speaker-listener are members of the same social group. The dynamic effects of these two information sources on neurocognitive processes that promote believability impressions from vocal cues are unclear. Here, English Canadian listeners were presented personal statements (She has access to the building) produced in a confident or doubtful voice by speakers of their own dialect (in-group) or speakers from two different "out-groups" (regional or foreign-accented English). Participants rated how believable the speaker is for each statement and event-related potentials (ERPs) were analysed from utterance onset. Believability decisions were modulated by both the speaker's vocal confidence level and their perceived in-group status. For in-group speakers, ERP effects revealed an early differentiation of vocally expressed confidence (i.e., N100, P200), highlighting the motivational significance of doubtful voices for drawing believability inferences. These early effects on vocal confidence perception were qualitatively different or absent when speakers had an accent; evaluating out-group voices was associated with increased demands on contextual integration and re-analysis of a non-native representation of believability (i.e., increased N400, late negativity response). Accent intelligibility and experience with particular out-group accents each influenced how vocal confidence was processed for out-group speakers. The N100 amplitude was sensitive to out-group attitudes and predicted actual believability decisions for certain out-group speakers. We propose a neurocognitive model in which vocal identity information (social categorization) dynamically influences how vocal expressions are decoded and used to derive social inferences during person perception.
Article
Full-text available
Male and female “senders” described their opinions on four controversial issues to target persons. Each sender expressed sincere agreement with the target on one of the issues and sincere disagreement on another (truthful messages), and also pretended to agree with the partner on one of the issues (an ingratiating lie) and pretended to disagree on another (a noningratiating lie). Groups of judges then rated the sincerity of each message on the basis of information available from one of four different channels: verbal (words only, in transcript form), audio (audiotape only), visual (videotape with no sound), and audiovisual (videotape with sound). Results showed that (a) lies told by women were more readily detected than lies told by men, (b) lies told to opposite-sex targets were more easily detected than lies told to same-sex targets, and (c) ingratiating lies were more successfully detected than were noningratiating lies, particularly when told to attractive targets. Furthermore, when senders talked to opposite-sex (relative to same-sex) targets, their lies were most easily detected from the three channels that included nonverbal cues. For ingratiating (relative to noningratiating) lies, detectability was greatest for the channels that included visual nonverbal cues. Senders addressing attractive targets were perceived as less sincere than senders addressing unattractive targets, both when lying and when telling the truth, and this difference in the degree of sincerity conveyed was especially pronounced in the channels that included nonverbal cues. Results are discussed in terms of the effects of motivation on verbal and nonverbal communicative success.
Article
Full-text available
Research on language comprehension using event-related potentials (ERPs) has reported distinct ERP components reliably related to the processing of semantic (N400) and syntactic information (P600). Recent ERP studies have challenged this well-defined distinction by showing P600 effects for semantic and pragmatic anomalies. So far, it is still unresolved whether the P600 reflects specific or rather common processes. The present study addresses this question by investigating ERPs in response to a syntactic and a pragmatic (irony) manipulation, as well as a combined syntactic and pragmatic manipulation. For the syntactic condition, a morphosyntactic violation was applied, whereas for the pragmatic condition a sentence such as "That is rich" received either an ironic or a literal interpretation, depending on the prior context. The ERPs at the critical word showed a LAN-P600 pattern for syntactically incorrect sentences relative to correct ones. For ironic compared to literal sentences, ERPs showed a P200 effect followed by a P600 component. When comparing the syntax-related P600 to the irony-related P600, distributional differences were found. Moreover, for the P600 time window (i.e., 500-900 ms), different changes in theta power were found for the syntax and pragmatics effects, suggesting that different patterns of neural activity contributed to each respective effect. Thus, both late positivities seem to be differently sensitive to these two types of linguistic information and might reflect distinct neurocognitive processes, such as reanalysis of the sentence structure versus pragmatic reanalysis.
Article
Full-text available
ERPLAB toolbox is a freely available, open-source toolbox for processing and analyzing event-related potential (ERP) data in the MATLAB environment. ERPLAB is closely integrated with EEGLAB, a popular open-source toolbox that provides many EEG preprocessing steps and an excellent user interface design. ERPLAB adds to EEGLAB’s EEG processing functions, providing additional tools for filtering, artifact detection, re-referencing, and sorting of events, among others. ERPLAB also provides robust tools for averaging EEG segments together to create averaged ERPs, for creating difference waves and other recombinations of ERP waveforms through algebraic expressions, for filtering and re-referencing the averaged ERPs, for plotting ERP waveforms and scalp maps, and for quantifying several types of amplitudes and latencies. ERPLAB’s tools can be accessed either from an easy-to-learn graphical user interface or from MATLAB scripts, and a command history function makes it easy for users with no programming experience to write scripts. Consequently, ERPLAB provides both ease of use and virtually unlimited power and flexibility, making it appropriate for the analysis of both simple and complex ERP experiments. Several forms of documentation are available, including a detailed user’s guide, a step-by-step tutorial, a scripting guide, and a set of video-based demonstrations.
Chapter
Full-text available
The last decade has seen an explosion of research in auditory perception and cognition. This growing activity encompasses neurophysiological research in nonhuman species, computational modeling of basic neurophysiological functions, and neuroimaging research in humans. Among the various neuroimaging techniques available, scalp recording of neuroelectric (electroencephalography [EEG]) and neuromagnetic (magnetoencephalography [MEG]) (see Nagarajan, Gabriel, and Herman, Chapter 5) brain activity have proven to be formidable tools in the arsenal available to cognitive neuroscientists interested in understanding audition. These techniques measure the dynamic pattern of electromagnetic fields at the scalp produced by the coherent activity of large neuronal populations in the brain. In cognitive neuroscience, the measurement of the electrical event-related brain potentials (ERPs) or magnetic event-related fields (ERFs) is among the major noninvasive techniques used for investigating sensory and cognitive information processing and for testing specific assumptions of cognitive theories that are not easily amenable to behavioral techniques. After identifying and characterizing the ERP/ERF signals that accompany the basic steps of processing discrete events, scientific interest has gradually shifted toward specifying the complex processing of more realistic stimulus configurations. In the auditory modality, recent years have seen an upsurge of research papers investigating the processes of auditory scene analysis (ASA) by ERP/ERF methods (for recent reviews, see Alain, 2007; Snyder & Alain, 2007; Winkler et al., 2009a).
Article
Full-text available
We propose a new functional-anatomical mapping of the N400 and the P600 to a minimal cortical network for language comprehension. Our work is an example of a recent research strategy in cognitive neuroscience, where researchers attempt to align data regarding the nature and time-course of cognitive processing (from ERPs) with data on the cortical organization underlying it (from fMRI). The success of this “alignment” approach critically depends on the functional interpretation of relevant ERP components. Models of language processing that have been proposed thus far do not agree on these interpretations, and present a variety of complicated functional architectures. We put forward a very basic functional-anatomical mapping based on the recently developed Retrieval-Integration account of language comprehension (Brouwer et al., 2012). In this mapping, the left posterior part of the Middle Temporal Gyrus (BA 21) serves as an epicenter (or hub) in a neurocognitive network for the retrieval of word meaning, the ease of which is reflected in N400 amplitude. The left Inferior Frontal Gyrus (BA 44/45/47), in turn, serves as a network epicenter for the integration of this retrieved meaning with the word's preceding context, into a mental representation of what is being communicated; these semantic and pragmatic integrative processes are reflected in P600 amplitude. We propose that our mapping describes the core of the language comprehension network, a view that is parsimonious, has broad empirical coverage, and can serve as the starting point for a more focused investigation into the coupling of brain anatomy and electrophysiology.
Article
Full-text available
The combined knowledge of word meanings and grammatical rules does not allow a listener to grasp the intended meaning of a speaker's utterance. Pragmatic inferences on the part of the listener are also required. The present work focuses on the processing of ironic utterances (imagine a slow day being described as "really productive") because these clearly require the listener to go beyond the linguistic code. Such utterances are advantageous experimentally because they can serve as their own controls in the form of literal sentences (now imagine an active day being described as "really productive") as we employ techniques from electrophysiology (EEG). Importantly, the results confirm previous ERP findings showing that irony processing elicits an enhancement of the P600 component (Regel et al., 2011). More original are the findings drawn from Time Frequency Analysis (TFA), and especially the increase of power in the gamma band in the 280-400 ms time window, which points to an integration among different streams of information relatively early in the comprehension of irony. This represents a departure from traditional accounts of language processing, which generally view pragmatic inferences as late-arriving. We propose that these results indicate that unification operations between the linguistic code and contextual information play a critical role throughout the course of irony processing, and earlier than previously thought.
Article
Full-text available
Previous research suggests that emotional prosody processing is a highly rapid and complex process. In particular, it has been shown that different basic emotions can be differentiated in an early event-related brain potential (ERP) component, the P200. Often, the P200 is followed by later, long-lasting ERPs such as the late positive complex. The current experiment set out to explore to what extent emotionality and arousal can modulate these previously reported ERP components. In addition, we also investigated the influence of task demands (implicit vs. explicit evaluation of stimuli). Participants listened to pseudo-sentences (sentences with no lexical content) spoken in six different emotions or in a neutral tone of voice while they either rated the arousal level of the speaker or their own arousal level. Results confirm that different emotional intonations can first be differentiated in the P200 component, reflecting a first emotional encoding of the stimulus, possibly including a valence tagging process. A marginally significant arousal effect was also found in this time window, with high-arousing stimuli eliciting a stronger P200 than low-arousing stimuli. The P200 component was followed by a long-lasting positive ERP between 400 and 750 ms. In this late time window, both emotion and arousal effects were found. No effects of task were observed in either time window. Taken together, results suggest that emotion-relevant details are robustly decoded during early and late processing stages, while arousal information is only reliably taken into consideration at a later stage of processing.
Article
Full-text available
Under adverse listening conditions, speech comprehension profits from the expectancies that listeners derive from the semantic context. However, the neurocognitive mechanisms of this semantic benefit are unclear: How are expectancies formed from context and adjusted as a sentence unfolds over time under various degrees of acoustic degradation? In an EEG study, we modified auditory signal degradation by applying noise-vocoding (severely degraded: four-band, moderately degraded: eight-band, and clear speech). Orthogonal to that, we manipulated the extent of expectancy: strong or weak semantic context (±con) and context-based typicality of the sentence-last word (high or low: ±typ). This allowed calculation of two distinct effects of expectancy on the N400 component of the evoked potential. The sentence-final N400 effect was taken as an index of the neural effort of automatic word-into-context integration; it varied in peak amplitude and latency with signal degradation and was not reliably observed in response to severely degraded speech. Under clear speech conditions in a strong context, typical and untypical sentence completions seemed to fulfill the neural prediction, as indicated by N400 reductions. In response to moderately degraded signal quality, however, the formed expectancies appeared more specific: Only typical (+con +typ), but not the less typical (+con −typ) context–word combinations led to a decrease in the N400 amplitude. The results show that adverse listening “narrows,” rather than broadens, the expectancies about the perceived speech signal: limiting the perceptual evidence forces the neural system to rely on signal-driven expectancies, rather than more abstract expectancies, while a sentence unfolds over time.
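Noise-vocoding, the degradation method used in the study above, replaces the fine structure in each frequency band of the speech signal with amplitude-modulated noise while preserving the band envelopes. The minimal sketch below illustrates the general procedure (band splitting, envelope extraction, noise modulation); the band edges, filter orders, and toy input signal are assumptions and do not reproduce the study's exact four- and eight-band settings.

```python
# Minimal noise-vocoder sketch: split the signal into N bands, extract each
# band's amplitude envelope, and use it to modulate band-limited noise.
# Filter design details (band edges, orders) are illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, f_lo=100.0, f_hi=7000.0):
    rng = np.random.default_rng(0)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        band = filtfilt(b, a, speech)
        envelope = np.abs(hilbert(band))            # band amplitude envelope
        noise = rng.standard_normal(speech.size)
        carrier = filtfilt(b, a, noise)             # band-limited noise carrier
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)  # toy "speech"
vocoded = noise_vocode(speech, fs, n_bands=4)
print(vocoded.shape)
```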
Article
Full-text available
Package: LMERConvenienceFunctions
Type: Package
Title: A suite of functions to back-fit fixed effects and forward-fit random effects, as well as other miscellaneous functions
Version: 2.4
Date: 2013-11-29
Author: Antoine Tremblay, Dalhousie University, and Johannes Ransijn, University of Copenhagen
Maintainer: Antoine Tremblay, Dalhousie University <trea26@gmail.com>
Description: Functions to back-fit fixed effects (on F or t values as well as log-likelihood ratio testing (llrt), AIC, BIC, relLik.AIC or relLik.BIC) and to forward-fit random effects (using log-likelihood ratio testing). Note that the back- and forward-fitting of generalized linear mixed-effects regression (glmer) models is now supported by the functions "bfFixefLMER_t.fnc" and "ffRanefLMER.fnc". The package also includes a function to compute ANOVAs with upper- and lower-bound p-values (anti-conservative and conservative, respectively), a function to graph model criticism plots, functions to trim data on model residuals or on a response variable (per subject), a function to perform posthoc analyses (with or without MCMC p-values), a function to generate summaries of mcposthoc objects, a function to generate (dynamic) 3d plots of (i) predicted values of an LMER model for interactions between two numeric variables, (ii) the raw data as a function of two numeric variables, and (iii) kernel density estimates (densities) of two numeric variables, and finally a function to calculate the relative log-likelihood between two models. As of version 2.4, the package gains the function "plotLMER.fnc" (revived from the archived package "languageR").
Depends: Matrix, lme4
Suggests: LCFdata, rgl, fields, mgcv, parallel
License: GPL-2
LazyLoad: yes
Article
Full-text available
Past research has identified an event-related potential (ERP) marker for vocal emotional encoding and has highlighted vocal-processing differences between male and female listeners. We further investigated this ERP vocal-encoding effect in order to determine whether it predicts voice-related changes in listeners' memory for verbal interaction content. Additionally, we explored whether sex differences in vocal processing would affect such changes. To these ends, we presented participants with a series of neutral words spoken with a neutral or a sad voice. The participants subsequently encountered these words, together with new words, in a visual word recognition test. In addition to making old/new decisions, the participants rated the emotional valence of each test word. During the encoding of spoken words, sad voices elicited a greater P200 in the ERP than did neutral voices. While the P200 effect was unrelated to a subsequent recognition advantage for test words previously heard with a neutral as compared to a sad voice, the P200 did significantly predict differences between these words in a concurrent late positive ERP component. Additionally, the P200 effect predicted voice-related changes in word valence. As compared to words studied with a neutral voice, words studied with a sad voice were rated more negatively, and this rating difference was larger, the larger the P200 encoding effect was. While some of these results were comparable in male and female participants, the latter group showed a stronger P200 encoding effect and qualitatively different ERP responses during word retrieval. Estrogen measurements suggested the possibility that these sex differences have a genetic basis.
Article
Full-text available
The influence of an adult model's degree of persistence and statements of confidence were studied with 100 1st- and 2nd-grade Black and Hispanic children from a lower-class, urban school. A male model unsuccessfully attempted to separate the 2 rings of a wire puzzle, and the child was subsequently presented a different insolvable ring puzzle to solve. A day later, the child was tested again with an insolvable embedded word puzzle. In addition to their actual persistence on the 2 tasks, the children's self-efficacy estimates were assessed at various points during the experiment. The model's long duration of performance and his statements of confidence significantly increased the children's degree of persistence on both the wire puzzle and the embedded word transfer task. These modeling treatments significantly affected the children's self-efficacy estimates as well. The implications of these findings are discussed in relation to A. Bandura's (1977) theory of self-efficacy. (13 ref)
Article
Full-text available
In 3 experiments, 61 undergraduates listened to recordings of male speakers answering 2 interview questions and rated the speakers on a variety of semantic differential scales. The recordings had been altered so that the pitch of the speakers' voices was raised or lowered by 20% or left at its normal level, and speech rate was expanded or compressed by 30% or left at its normal rate. The results provide clear evidence that listeners use these acoustic properties in making personal attributions to speakers. Speakers with high-pitched voices were judged less truthful, less emphatic, less "potent" (smaller, thinner, faster), and more nervous. Slow-talking speakers were judged less truthful, less fluent, and less persuasive and were seen as more "passive" (slower, colder, passive, weaker) but more "potent." However, the effects of the acoustic manipulations on personal attributions also depended on the particular question that elicited the response. (29 ref)
Article
Full-text available
This paper is about using prosody to automatically detect one aspect of a speaker's internal state: their level of certainty. While past work on classifying level of certainty used the perceived level of certainty as the value to predict, we find that this quantity often differs from a speaker's actual level of certainty as gauged by self-reports. In this work we build models to predict a speaker's self-reported level of certainty using prosodic features. Our data is a corpus of single-sentence utterances that are annotated with (1) whether the statement is correct or incorrect, (2) the perceived level of certainty, and (3) the self-reported level of certainty. Knowing the self-reported level of certainty, in conjunction with the perceived level of certainty, allows us to assess what we will refer to as the speaker's transparency. Knowing the self-reported level of certainty, in conjunction with the correctness of the answer, allows us to assess what we will refer to as self-awareness. Our models, trained on prosodic features, correctly classify the self-reported level of certainty 75% of the time. Intelligent systems can use this information to make inferences about the user's internal state, for example whether the user of a system has a misconception, makes a lucky guess, or needs encouragement.
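As a schematic sketch of this general approach (not the authors' models, features, or corpus), a prosody-based certainty classifier could be prototyped as follows; the feature matrix is synthetic and stands in for per-utterance measures such as mean f0, f0 range, intensity, and speaking rate.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_utterances, n_prosodic_features); y: 1 = "certain", 0 = "uncertain"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                        # placeholder prosodic features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=1.0, size=200) > 0).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print(f"cross-validated accuracy: {acc:.2f}")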
Article
Full-text available
This paper reviews the literature on the N1 wave of the human auditory evoked potential. It concludes that at least six different cerebral processes can contribute to the negative wave recorded from the scalp with a peak latency between 50 and 150 ms: a component generated in the auditory cortex on the supratemporal plane, a component generated in the association cortex on the lateral aspect of the temporal and parietal cortex, a component generated in the motor and premotor cortices, the mismatch negativity, a temporal component of the processing negativity, and a frontal component of the processing negativity. The first three, which can be considered ‘true’ N1 components, are controlled by the physical and temporal aspects of the stimulus and by the general state of the subject. The other three components are not necessarily elicited by a stimulus but depend on the conditions in which the stimulus occurs. They often last much longer than the true N1 components that they overlap.
Article
Full-text available
To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) were produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as a valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes. To access the database, please contact pan.liu@mail.mcgill.ca.
Article
Full-text available
Prosodic elements such as stress and intonation are generally seen as providing both ‘natural’ and properly linguistic input to utterance comprehension. They contribute not only to overt communication but to more covert or accidental forms of information transmission. They typically create impressions, convey information about emotions or attitudes, or alter the salience of linguistically-possible interpretations rather than conveying distinct propositions or concepts in their own right. These aspects of communication present a challenge to pragmatic theory: how should they be described and explained? This paper is an attempt to explore how the wealth of insights provided by the literature on the interpretation of prosody might be integrated into the relevance-theoretic framework (Sperber and Wilson, 1986/1995; Blakemore, 2002; Carston, 2002). We will focus on four main issues. First, how should the communication of emotions, attitudes and impressions be analysed? Second, how might prosodic elements function as ‘natural’ communicative devices? Third, what (if anything) do prosodic elements encode? Fourth, what light can the study of prosody shed on the place of pragmatics in the architecture of the mind? In each case, we hope to show that the study of prosody and the study of pragmatics can interact in ways that benefit both disciplines.
Article
Full-text available
We investigated the influence of English proficiency on ERPs elicited by lexical semantic violations in English sentences, in both native English speakers and native Spanish speakers who learned English in adulthood. All participants were administered a standardized test of English proficiency, and data were analyzed using linear mixed effects (LME) modeling. Relative to native learners, late learners showed reduced amplitude and delayed onset of the N400 component associated with reading semantic violations. As well, after the N400 late learners showed reduced anterior negative scalp potentials and increased posterior potentials. In both native and late learners, N400 amplitudes to semantically appropriate words were larger for people with lower English proficiency. N400 amplitudes to semantic violations, however, were not influenced by proficiency. Although both N400 onset latency and the late ERP effects differed between L1 and L2 learners, neither correlated with proficiency. Different approaches to dealing with the high degree of correlation between proficiency and native/late learner group status are discussed in the context of LME modeling. The results thus indicate that proficiency can modulate ERP effects in both L1 and L2 learners, and for some measures (in this case, N400 amplitude), L1–L2 differences may be entirely accounted for by proficiency. On the other hand, not all effects of L2 learning can be attributed to proficiency. Rather, the differences in N400 onset and the post-N400 violation effects appear to reflect fundamental differences in L1–L2 processing.
Article
An important issue in irony comprehension concerns when and how listeners integrate extra-linguistic and linguistic information to compute the speaker's intended meaning. To assess whether knowledge about the speaker's communicative style impacts the brain response to irony, ERPs were recorded as participants read short passages that ended either with literal or ironic statements made by one of two speakers. The experiment was carried out in two sessions in which each speaker's use of irony was manipulated. In Session 1, 70% of ironic statements were made by the ironic speaker, while the non-ironic speaker expressed 30% of them. For irony by the non-ironic speaker, an increased P600 was observed relative to literal utterances. By contrast, both ironic and literal statements made by the ironic speaker elicited similar P600 amplitudes. In Session 2, conducted 1 day later, both speakers' use of irony was balanced (i.e. 50% ironic, 50% literal). ERPs for Session 2 showed an irony-related P600 for the ironic speaker but not for the non-ironic speaker. Moreover, P200 amplitude was larger for sentences congruent with each speaker's communicative style (i.e. for irony made by the ironic speaker, and for literal statements made by the non-ironic speaker). These findings indicate that pragmatic knowledge about speakers can affect language comprehension 200 ms after the onset of a critical word, as well as neurocognitive processes underlying the later stages of comprehension (500-900 ms post-onset). Thus perceived speakers' characteristics dynamically impact the construction of appropriate interpretations of ironic utterances.
Article
This study aims to investigate the perceptual-acoustic correlates of vocal confidence. Statements with different communicative functions (e.g., stating facts, making judgments) were spoken in confident, close-to-confident, unconfident, and neutral voices. Statements with preceding linguistic cues (e.g., "I'm positive," "Most likely," "Maybe") or no linguistic cues were presented to sixty listeners in a perceptual study. The listeners were asked to judge whether statements conveyed some level of confidence, and if so, they were asked to evaluate the level of confidence of the speaker. The results demonstrated that the intended levels of confidence varied in a graded manner in the perceptual rating scores; the more confident the statement was intended to be, the higher the rating. In general, the neutral voice was judged to be more confident than the close-to-confident voice, but less confident than the confident voice. The presence of a linguistic cue tended to increase ratings of confident voices but decrease ratings of voices in the less confident conditions. To evaluate how specific prosodic cues are used to encode and decode confidence information, acoustic analyses were performed on the stimuli without linguistic cues, based on the mean perceptual rating of speaker confidence for each item. Results showed that statements rated as confident versus unconfident differed in the mean and variance of fundamental frequency (f0) as well as speech rate, with confident statements exhibiting lower mean f0, smaller f0 variance, and a faster speaking rate than unconfident statements. The perceived level of confidence was differentiated by mean fundamental frequency in a parametric way: the lower the level of confidence, the higher the mean f0. Confident voices were also distinct from the other three conditions in terms of mean and range of amplitude (i.e., loudness). These findings shed light on how linguistic and paralinguistic cues reveal confidence-related information to listeners during speech.
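For illustration, acoustic measures of the kind named above (mean f0, f0 variability, and amplitude mean and range) could be extracted for a single recording along the following lines. This is a hedged sketch using librosa's pYIN pitch tracker and RMS energy, not the analysis pipeline used in the study, and the file name is a placeholder.

import librosa
import numpy as np

y, sr = librosa.load("statement.wav", sr=None)        # placeholder audio file

f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
mean_f0 = np.nanmean(f0)                              # unvoiced frames are NaN
f0_sd = np.nanstd(f0)                                 # a simple f0 variability measure

rms = librosa.feature.rms(y=y)[0]                     # frame-wise RMS amplitude
mean_amp, amp_range = rms.mean(), rms.max() - rms.min()

print(f"mean f0 = {mean_f0:.1f} Hz, f0 SD = {f0_sd:.1f} Hz, "
      f"mean RMS = {mean_amp:.4f}, RMS range = {amp_range:.4f}")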
Article
This study assessed the dimensionality of the Empathy Quotient (EQ) using two statistical approaches: Rasch and Confirmatory Factor Analysis (CFA). Participants included N = 658 with an autism spectrum condition (ASC) diagnosis, N = 1375 family members of this group, and N = 3344 typical controls. Data were applied to the Rasch model (Rating Scale) using WINSTEPS. The Rasch model explained 83% of the variance. Reliability estimates were greater than .90. Analysis of differential item functioning (DIF) demonstrated item invariance between the sexes. Principal Components Analysis (PCA) of the residual factor showed separation into Agree and Disagree response subgroups. CFA suggested that a 26-item model with response factors had the best fit statistics (RMSEA = .05, CFI = .93). A shorter 15-item, three-factor model had an omega (ω) of .779, suggesting that a hierarchical factor of empathy underlies these sub-factors. The EQ is an appropriate measure of the construct of empathy and can be measured along a single dimension.
Article
Prosodic aspects of speech such as pitch, duration, and amplitude constitute nonverbal cues that supplement or modify the meaning of the spoken word and provide valuable clues to a speaker's state of mind. Prosody can thus indicate what emotion a person is feeling (emotional prosody), or their attitude towards an event, person, or object (attitudinal prosody). Whilst the study of emotional prosody has gathered pace, attitudinal prosody now deserves equal attention. In social cognition, understanding attitudinal prosody is important in its own right, since it can convey powerful constructs such as confidence, persuasion, sarcasm, and superiority. This review examines what prosody is, how it conveys attitudes, and which attitudes prosody can convey. The review finishes by considering the neuroanatomy associated with attitudinal prosody and puts forward the hypothesis that this cognition is mediated by the right cerebral hemisphere, particularly posterior superior lateral temporal cortex, with an additional role for the basal ganglia and limbic regions such as the medial prefrontal cortex and amygdala. It is suggested that further exploration of its functional neuroanatomy is greatly needed, since it could provide valuable clues about the value of current prosody nomenclature and its separability from other types of prosody at the behavioral level.
Article
People responding to questions are sometimes uncertain, slow, or unable to answer. They handle these problems of self-presentation, we propose, by the way they respond. Twenty-five respondents were each asked 40 factual questions in a conversational setting. Later, they rated for each question their feeling that they would recognize the correct answer, then took a recognition test on all 40 questions. As found previously, the weaker their feeling of knowing, the slower their answers, the faster their nonanswers ("I don't know"), and the worse their recognition. But further, as proposed, the weaker their feeling of knowing, the more often they answered with rising intonation, used hedges such as "I guess," responded "I don't know" instead of "I can't remember," and added "uh" or "um," self-talk, and other face-saving comments. They reliably used "uh" to signal brief delays and "um" longer ones.
Article
Principal components analysis (PCA) has attracted increasing interest as a tool for facilitating analysis of high-density event-related potential (ERP) data. While every researcher is exposed to this statistical procedure in graduate school, its complexities are rarely covered in depth and hence researchers are often not conversant with its subtleties. Furthermore, application to ERP datasets involves unique aspects that would not be covered in a general statistics course. This tutorial seeks to provide guidance on the decisions involved in applying PCA to ERPs and their consequences, using the ERP PCA Toolkit to illustrate the analysis process on a novelty oddball dataset.
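As a minimal, assumption-laden analogue of temporal PCA on ERP data (the tutorial itself works with the ERP PCA Toolkit, which additionally covers rotation and related decisions), the basic decomposition can be sketched as follows, with waveforms as rows and time points as columns; the data here are synthetic placeholders.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
erp = rng.normal(size=(300, 250))        # placeholder: 300 waveforms x 250 time samples

pca = PCA(n_components=10)
scores = pca.fit_transform(erp)          # component scores per waveform
loadings = pca.components_               # temporal loadings (10 x 250)
print(pca.explained_variance_ratio_[:3]) # variance explained by the first components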
Article
In previous studies (Johnson & London, 1968; London et al., 1970a) expressed confidence has been identified as a new ‘message’ variable causing persuasion. However, the variable has been extracted from naturalistic discussion in a dyad. In order to determine further its efficacy, we manipulated expressed confidence in two studies. In Study 1 expressed confidence was manipulated via language. In Study 2 expressed confidence was manipulated via body language. The results confirm earlier findings and indicate that a ‘channel’ notion is required to understand the expression of confidence.
Article
A courtroom simulation technique was employed to examine the effects of a communicator's looking behavior on observers' perceptions of his credibility. Half of the subjects heard testimony presented on behalf of a defendant by a witness (one of three confederates) who was visually presented as either looking directly toward the target of his communication (gaze maintenance) or slightly downward (gaze aversion) while testifying. The other half of the subjects merely heard the audio portion of the testimony. The results indicated that witnesses who averted their gaze were perceived to be less credible and, ultimately, the defendant for whom they testified was judged as more likely to be guilty. These results are discussed in terms of their implications for research concerned with the communicative effects of visual behavior.
Article
The relative contribution of verbal and nonverbal cues to the formation of impressions of confidence was assessed. Sequences, in which actresses gave street directions with different levels of verbal and nonverbal confidence, were recorded and replayed to subjects using television. Eighty male and eighty female first-year Psychology students rated the televised performances on five scales which included expressed confidence. Analysis of the ratings demonstrated that the nonverbal cues expressing confidence or its lack accounted for more than ten times the variance due to the verbal cues. The results are interpreted in the context of other experiments which similarly demonstrate that in the expression of feelings and emotions the nonverbal signals tend to dominate the verbal.
Article
Two studies addressed whether children consider speakers' knowledge states when establishing initial word-referent links. In Study 1, forty-eight 3- and 4-year-olds were taught two novel words by a speaker who expressed either knowledge or ignorance about the words' referents. Children showed better word learning when the speaker was knowledgeable. In Study 2, forty-eight 3- and 4-year-olds were taught two novel words by a speaker who expressed uncertainty about their referents. Whether the uncertainty truly reflected ignorance, however, differed across conditions. In one condition, the speaker said he made the object himself and thus, he was knowledgeable. In the other condition, the speaker stated that the object was made by a friend and thus, expressed ignorance about it. Four-year-olds learned better in the speaker-made than in the friend-made condition; 3-year-olds, however, showed relatively poor learning in both conditions. These findings suggest that theory-of-mind developments impact word learning.
Article
This study examined whether two paralinguistic variables, vocal loudness and response latency, were associated with confidence in answers to trivia questions. Audience presence and size were manipulated and subjects' assertiveness was measured. Subjects verbally responded to trivia questions by indicating their choice and how confident they were in each answer. Tapes of these responses were later analyzed for latency of response and loudness of speech. As expected, the more confident individuals were in their answers, the faster and louder they responded. Assertive subjects spoke louder. The presence of an audience had no effects on vocal responding. Apparently, the confidence of a speaker can be inferred from the speed and loudness of the speaker's responses.
Article
A standard speaker read linguistically confident and doubtful texts in a confident or doubtful voice. A computer-based acoustic analysis of the four tapes showed that paralinguistic confidence was expressed by increased loudness of voice, rapid rate of speech, and infrequent, short pauses. Under some conditions, higher pitch levels and greater pitch and energy fluctuations in the voice were related to paralinguistic confidence. In a 2 × 2 design, observers perceived and used these cues to attribute confidence and related personality traits to the speaker. Both text and voice cues are related to confidence ratings; in addition, the two types of cue are related to differing personality attributes.
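Two of the cues reported here, overall loudness and pausing, could be quantified roughly as in the sketch below. This is an assumed librosa-based approximation (RMS energy and an energy-threshold silence split), not the computer-based analysis used in the study, and the file name and threshold are placeholders.

import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)       # placeholder audio file

rms = librosa.feature.rms(y=y)[0]                    # frame-wise RMS energy
mean_loudness = float(np.mean(rms))

nonsilent = librosa.effects.split(y, top_db=35)      # non-silent intervals (in samples)
speech_dur = sum((end - start) for start, end in nonsilent) / sr
pause_dur = len(y) / sr - speech_dur                 # total silent time
n_pauses = max(len(nonsilent) - 1, 0)                # gaps between non-silent stretches

print(f"mean RMS: {mean_loudness:.4f}, pauses: {n_pauses}, paused for {pause_dur:.2f} s")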
Article
The task of developing a unified pragmatics of emotive communication poses many interesting challenges for future research. This paper outlines some areas in which more work could be done to help coordinate present linguistic research. After briefly reviewing some pioneering historical work on language and affect, the paper discusses the following concepts, all of which seem to be in need of further clarification: ‘emotive meaning’, ‘involvement’, ‘emotive markedness’, ‘degree of emotive divergence’, ‘objects of emotive choice’, ‘loci of emotive choice’, and ‘outer vs. inner deixis’. Competing categories of emotive devices in current studies of language and affect are reviewed, and a simplified framework is proposed, consisting of: (1) evaluation devices, (2) proximity devices, (3) specificity devices, (4) evidentiality devices, (5) volitionality devices, and (6) quantity devices. It is argued that only with consensual categories and objects of analysis can investigators start focusing on, and comparing findings about, emotive linguistic phenomena from a unified point of view. Finally, some distinctions between potential perspectives, units, and loci of emotive analysis are proposed, and the paper concludes with a call for increased discussion of how research on language and affect might be better coordinated in the future.
Article
In question-answering, speakers display their metacognitive states using filled pauses and prosody (Smith & Clark, 1993). We examined whether listeners are actually sensitive to this information. Experiment 1 replicated Smith and Clark's study; respondents were tested on general knowledge questions, surveyed about their FOK (feeling-of-knowing) for these questions, and tested for recognition of answers. In Experiment 2, listeners heard spontaneous verbal responses from Experiment 1 and were tested on their feeling-of-another's-knowing (FOAK) to see if metacognitive information was reliably conveyed by the surface form of responses. For answers, rising intonation and longer latencies led to lower FOAK ratings by listeners. For nonanswers, longer latencies led to higher FOAK ratings. In Experiment 3, electronically edited responses with 1-s latencies led to higher FOAK ratings for answers and lower FOAK ratings for nonanswers than those with 5-s latencies. Filled pauses led to lower ratings for answers and higher ratings for nonanswers than did unfilled pauses. There was no support for a filler-as-morpheme hypothesis, that "um" and "uh" contrast in meaning. We conclude that listeners can interpret the metacognitive information that speakers display about their states of knowledge in question-answering.
Article
This paper describes two experiments on the role of audiovisual prosody for signalling and detecting meta-cognitive information in question answering. The first study consists of an experiment, in which participants are asked factual questions in a conversational setting, while they are being filmed. Statistical analyses bring to light that the speakers’ Feeling of Knowing (FOK) is cued by a number of visual and verbal properties. It appears that answers tend to have a higher number of marked auditory and visual cues, including divergences from the neutral facial expression, when the FOK score is low, while the reverse is true for non-answers. The second study is a perception experiment, in which a selection of the utterances from the first study is presented to participants in one of three conditions: vision only, sound only, or vision + sound. Results reveal that human observers can reliably distinguish high FOK responses from low FOK responses in all three conditions, but that answers are easier than non-answers, and that a bimodal presentation of the stimuli is easier than the unimodal counterparts.
Article
This paper provides an introduction to mixed-effects models for the analysis of repeated measurement data with subjects and items as crossed random effects. A worked-out example of how to use recent software for mixed-effects modeling is provided. Simulation studies illustrate the advantages offered by mixed-effects analyses compared to traditional analyses based on quasi-F tests, by-subjects analyses, combined by-subjects and by-items analyses, and random regression. Applications and possibilities across a range of domains of inquiry are discussed.
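As a tentative sketch of a model with subjects and items as crossed random effects: the paper's worked examples use R's lme4, so the statsmodels pattern below, which treats the data as a single group and declares both factors as variance components, is offered only as an assumed Python analogue on synthetic data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_item = 20, 15
df = pd.DataFrame([(s, i) for s in range(n_subj) for i in range(n_item)],
                  columns=["subject", "item"])
df["condition"] = rng.integers(0, 2, len(df))
df["rt"] = (600 + 30 * df["condition"]
            + rng.normal(0, 40, n_subj)[df["subject"]]   # simulated subject effects
            + rng.normal(0, 25, n_item)[df["item"]]      # simulated item effects
            + rng.normal(0, 50, len(df)))                # residual noise

df["all"] = 1  # a single group, so subject and item intercepts are crossed
vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}
model = smf.mixedlm("rt ~ condition", df, groups="all",
                    re_formula="0", vc_formula=vc)
result = model.fit()
print(result.summary())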
Article
Despite the potentially infinite creativity of language, many words are patterned in ordered strings called collocations. Final words of these clusters are highly predictable; in addition, their overall meaning can vary on the literality dimension, ranging from (figurative) idiomatic strings to literal strings. These structures thus offer a natural linguistic scenario to contrast ERP correlates of contextual expectation and semantic integration processes during comprehension. In this study, expected endings elicited a positive peak around 300 ms compared to less expected synonyms, suggesting that the earlier recognition of the string leads to the specific pre-activation of the lexical items that conclude the expression. On the other hand, meaning variations of these fixed strings (either a literal or a figurative whole meaning) affected ERPs only around 400 ms, i.e. in the frontal portion of the N400. These findings are discussed within a more general cognitive framework as outlined in Kok's (2001) dual categorization model.
Article
A number of perceptual features have been utilized for the characterization of the emotional state of a speaker. However, for automatic recognition suitable objective features are needed. We have examined several features of the speech signal in relation to accentuation and traces of event-related brain potentials (ERPs) during affective speech perception. Concerning the features of the speech signal we focus on measures related to breathiness and roughness. The objective measures used were an estimation of the harmonics-to-noise ratio, the glottal-to-noise excitation ratio, a measure for spectral flatness, as well as the maximum prediction gain for a speech production model computed by the mutual information function and the ERPs. Results indicate that in particular the maximum prediction gain shows a good differentiation between neutral and non-neutral emotional speaker state. This differentiation is partly comparable to the ERP results that show a differentiation of neutral, positive and negative affect. Other objective measures are more related to accentuation than to emotional state of the speaker.
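Of the objective measures listed, spectral flatness is straightforward to approximate. The sketch below uses librosa's frame-wise estimate, which may differ in detail from the measure used in the paper, and the file name is a placeholder.

import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)           # placeholder audio file
flatness = librosa.feature.spectral_flatness(y=y)[0]     # one value per analysis frame
print(f"median spectral flatness: {np.median(flatness):.4f}")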
Article
Speech is an important carrier of emotional information. However, little is known about how different vocal emotion expressions are recognized in a receiver's brain. We used multivariate pattern analysis of functional magnetic resonance imaging data to investigate to which degree distinct vocal emotion expressions are represented in the receiver's local brain activity patterns. Specific vocal emotion expressions are encoded in a right fronto-operculo-temporal network involving temporal regions known to subserve suprasegmental acoustic processes and a fronto-opercular region known to support emotional evaluation, and, moreover, in left temporo-cerebellar regions covering sequential processes. The right inferior frontal region, in particular, was found to differentiate distinct emotional expressions. The present analysis reveals vocal emotion to be encoded in a shared cortical network reflected by distinct brain activity patterns. These results shed new light on theoretical and empirical controversies about the perception of distinct vocal emotion expressions at the level of large-scale human brain signals.
Article
In traditional theories of language comprehension, syntactic and semantic processing are inextricably linked. This assumption has been challenged by the 'semantic illusion effect' found in studies using event related brain potentials. Semantically anomalous sentences did not produce the expected increase in N400 amplitude but rather one in P600 amplitude. To explain these findings, complex models have been devised in which an independent semantic processing stream can arrive at a sentence interpretation that may differ from the interpretation prescribed by the syntactic structure of the sentence. We review five such multi-stream models and argue that they do not account for the full range of relevant results because they assume that the amplitude of the N400 indexes some form of semantic integration. Based on recent evidence we argue that N400 amplitude might reflect the retrieval of lexical information from memory. On this view, the absence of an N400-effect in semantic illusion sentences can be explained in terms of priming. Furthermore, we suggest that semantic integration, which has previously been linked to the N400 component, might be reflected in the P600 instead. When combined, these functional interpretations result in a single-stream account of language processing that can explain all of the Semantic Illusion data.
Article
In social interactions, humans can express how they feel in what (verbal) they say and how (non-verbal) they say it. Although decoding of vocal emotion expressions occurs rapidly, accumulating electrophysiological evidence suggests that this process is multilayered and involves temporally and functionally distinct processing steps. Neuroimaging and lesion data confirm that these processing steps, which support emotional speech and language comprehension, are anchored in a functionally differentiated brain network. The present review on emotional speech and language processing discusses concepts and empirical clinical and neuroscientific evidence on the basis of behavioral, event-related brain potential, and functional magnetic resonance imaging data. These data allow shaping our understanding of how we communicate emotions to others through speech and language. It leads to a multistep processing model of vocal and visual emotion expressions.
Article
The ability to identify emotions from the human voice is a crucial aspect of social cognition. Currently, very little is known about the neural correlates of nonverbal emotional vocalizations processing. We used electrophysiological measures to examine the processing of emotional versus neutral vocalizations. Participants listened to nonverbal angry, happy, and neutral vocalizations, as well as to monkey voices, which served as a response target. Angry sounds were processed differently than happy and neutral ones starting at 50 ms, whereas both vocal emotions were associated with decreased N100 and increased P200 components relative to neutral sounds. These findings indicate a rapid and automatic differentiation of emotional as compared with neutral vocalizations and suggest that this differentiation is not dependent on valence.