Article

How the voice persuades


Abstract

Research has examined persuasive language, but relatively little is known about how persuasive people are when they attempt to persuade through paralanguage, or acoustic properties of speech (e.g., pitch and volume). People often detect and react against what communicators say, but might they be persuaded by speakers' attempts to modulate how they say it? Four experiments support this possibility, demonstrating that communicators engaging in paralinguistic persuasion attempts (i.e., modulating their voice to persuade) naturally use paralinguistic cues that influence perceivers' attitudes and choice. Rather than being effective because they go undetected, however, the results suggest a subtler possibility. Even when they are detected, paralinguistic attempts succeed because they make communicators seem more confident without undermining their perceived sincerity. Consequently, speakers' confident vocal demeanor persuades others by serving as a signal that they more strongly endorse the stance they take in their message. Further, we find that paralinguistic approaches to persuasion can be uniquely effective even when linguistic ones are not. A cross-study exploratory analysis and replication experiment reveal that communicators tend to speak louder and vary their volume during paralinguistic persuasion attempts, both of which signal confidence and, in turn, facilitate persuasion. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
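The headline acoustic cues in this abstract, speaking louder and varying one's volume, correspond to two quantities that are straightforward to compute from a waveform: mean frame intensity and its variability. A minimal numpy sketch (the function name and frame size are illustrative choices, not taken from the paper):

```python
import numpy as np

def volume_profile(signal, sr, frame_ms=25):
    """Frame-wise intensity (dB) of a mono waveform: returns mean
    loudness and its variability, the two cues the study links to
    perceived confidence."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-10))  # floor avoids log(0)
    return float(np.mean(db)), float(np.std(db))

# Toy signal: a 1 s tone whose amplitude ramps up (rising volume),
# which should yield a clearly nonzero volume variability.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
wave = np.linspace(0.1, 1.0, sr) * np.sin(2 * np.pi * 220 * t)
mean_db, std_db = volume_profile(wave, sr)
```

In a real analysis these two numbers would be computed per utterance and compared across persuasion-attempt and control recordings.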


... Furthermore, if interacting via voice is more demanding for consumers, it might leave less (mental) room to identify persuasive attempts. Hence, cognitive load plays an important role when examining persuasion (Berry et al., 2005;Van Zant and Berger, 2019). Therefore, this research includes perceived human-likeness and cognitive load as two alternative mechanisms that can explain the persuasiveness of virtual assistants. ...
... Examining the relationship of perceived human-likeness and persuasiveness is crucial, as virtual assistants are characterized by their social or human-like cues that can potentially make them more influential. Research on paralinguistic cues builds on the idea that voice, in comparison to text, provides more opportunities for the illustration of human traits, states and feelings in the communication (Van Zant and Berger, 2019). This line of research has found voice to influence persuasiveness via positive perceptions of these human characteristics (i.e. ...
... Thirdly, we make a theoretical contribution by including cognitive load as an alternative explanation for persuasiveness. We expected that interacting via voice is more demanding for consumers and increases cognitive load (H2; Berry et al., 2005;Van Zant and Berger, 2019), which in turn also leads them to be less likely to activate their persuasion knowledge (RQ2 and RQ3; Campbell and Kirmani, 2000). Contrary to expectations, our results show that cognitive load did not suppress, but increased persuasion knowledge. ...
Article
Purpose: Virtual assistants are increasingly used for persuasive purposes, employing the different modalities of voice and text (or a combination of the two). In this study, the authors compare the persuasiveness of voice- and text-based virtual assistants. The authors argue for perceived human-likeness and cognitive load as underlying mechanisms that can explain why voice- and text-based assistants differ in their persuasive potential by suppressing the activation of consumers' persuasion knowledge. Design/methodology/approach: A pre-registered online experiment (n = 450) implemented a text-based and two voice-based (with and without interaction history displayed in text) virtual assistants. Findings: Findings show that, contrary to expectations, a text-based assistant is perceived as more human-like compared to a voice-based assistant (regardless of whether the interaction history is displayed), which in turn positively influences brand attitudes and purchase intention. The authors also find that voice as a communication modality can increase persuasion knowledge by being cognitively more demanding in comparison to text. Practical implications: Simply using voice as a presumably human cue might not suffice to give virtual assistants a human-like appeal. For the development of virtual assistants, it might be beneficial to actively engage consumers to increase awareness of persuasion. Originality/value: The current study adds to the emergent research stream considering virtual assistants in explicitly exploring modality differences between voice and text (and a combination of the two) and provides insights into the effects of persuasion coming from virtual assistants.
... We supplant this intuitive notion with research aligned with the two-dimensional model of affect, including theory of entrepreneurial passion (e.g., Cardon et al., 2009) and valence-arousal congruence theory (e.g., Robinson et al., 2004), which we use to predict how the valence and arousal of vocal expressions shape how the pitch is received by funders. [Footnote 2: What scholars have focused on are vocal characteristics and subjective properties derived from vocal characteristics (confidence, sincerity). These, in turn, have been argued to shape the persuasive influence of vocal expressions (e.g., Banse and Scherer, 1996; Burgoon et al., 1990; Juslin and Laukka, 2003; Klofstad, 2016; Lowe et al., 2017; Van Zant and Berger, 2020; Wang et al., 2021).] ...
... The study of these characteristics has provided evidence for their use to accurately index the valence and arousal of vocal expressions (e.g., Juslin and Scherer, 2005; Sauter et al., 2010). This has also led to the development of scale measures of positivity, negativity, confidence, and sincerity, as manifest vocally (Van Kleef et al., 2015; Van Zant and Berger, 2020; Zampetakis et al., 2017). ... (Frese et al., 2003; Niebuhr et al., 2017; Towler, 2003), lower pitch (frequency) promotes positive perceptions and outcomes for managers and elected officials (Klofstad, 2016; Mayew et al., 2013; Tigue et al., 2012), and specific technical vocal characteristics (e.g., tempo, frequency, loudness) shape persuasiveness (Niebuhr et al., 2017; Van Zant and Berger, 2020). ...
Article
Full-text available
The voice is often the only continuous channel of expression in pitch videos. We isolate the influence of entrepreneurs' vocal expressions on funding by examining how valence (positivity/negativity) and arousal (activation) shape funders' perceptions of passion and preparedness. We show that an entrepreneur's high-arousal vocal expressions, whether positive or negative, increase perceptions of their passion. Entrepreneurs are perceived as more prepared when the valence and arousal of their vocal expressions are congruent. We test our hypotheses in the context of rewards-based crowdfunding, using both an experiment and a speech affect analysis of real-world crowdfunding pitches.
... This is known to provide a distorted picture, as requesting participants to produce communicative displays leads them to produce highly stereotypical rather than genuine displays (Juslin, Laukka, & Bänziger, 2018). At a more fundamental level, measuring prosodic displays during social interactions necessarily leads to conflating the contribution of natural expressions of confidence (i.e., a behavior naturally means X when such behavior is typically associated with X) (Grice, 1957;Wharton, 2009), and that of socially induced, deliberate self-presentation mechanisms: speakers do not only show prosodic displays automatically, they can also shape these displays pragmatically, for instance in order to persuade (Van Zant & Berger, 2019) or to appear more dominant (Cheng, Tracy, Ho, & Henrich, 2016). Thus, past research leaves open the question of whether epistemic prosody is only displayed when the speaker has a communicative intention, or whether it is constitutively (or naturally) associated with confidence. ...
... Second, typical approaches to this question do not allow discriminating the respective influence of sensory evidence, accuracy and confidence on prosody, because typically the impact of these distinct variables is not measured separately (Dijkstra, Krahmer, & Swerts, 2006; Jiang, Gossack-Keenan, & Pell, 2020; Jiang & Pell, 2016; Kimble & Seidel, 1991; Van Zant & Berger, 2019). Thus, it remains unknown what exact psychological variable these prosodic manifestations reflect: do they reflect competence (how accurate speakers actually are), or do they genuinely reveal subjective feelings of confidence (how accurate speakers think they are), thus being akin to non-verbal variants of linguistic expressions such as "I don't know"? ...
... Finally, the fact that such epistemic prosodic markers were observed in the absence of an audience is consistent with past research (Kimble & Seidel, 1991), and shows that they are manifested constitutively and automatically as a function of the speaker's level of confidence and accuracy: i.e., they constitute natural signs of confidence and competence. Importantly, this is not to say that these displays are never under voluntary control: humans can obviously control the pitch, duration and volume of their voice, making it possible to deliberately use prosodic displays as "social tools" during conversation (Crivelli & Fridlund, 2018;Van Zant & Berger, 2019;Wharton, 2009) and past research has shown that, indeed, similar prosodic signatures as the ones we find here are exploited during communicative interactions: listeners perceive them to infer confidence and honesty in their partners (Goupil et al., 2021;Jiang & Pell, 2017), and speakers manipulate them in order to persuade their interlocutors (Van Zant & Berger, 2019). Thus, it will be important to extend our psychophysical approach to social interactions in future work, for instance by relying on dyadic collective decision-making paradigms (Bahrami et al., 2010;Fusaroli et al., 2012;Pescetelli & Yeung, 2020), in order to examine how specific social settings -such as the fact that speakers are engaged in cooperative or competitive interactions -impact how they display these prosodic signatures. ...
Article
Whether speech prosody truly and naturally reflects a speaker's subjective confidence, rather than other dimensions such as objective accuracy, is unclear. Here, using a new approach combining psychophysics with acoustic analysis and automatic classification of verbal reports, we tease apart the contributions of sensory evidence, accuracy, and subjective confidence to speech prosody. We find that subjective confidence and objective accuracy are distinctly reflected in the loudness, duration and intonation of verbal reports. Strikingly, we show that a speaker's accuracy is encoded in speech prosody beyond their own metacognitive awareness, and that it can be automatically decoded from this information alone with performances up to 60%. These findings demonstrate that confidence and accuracy have separable prosodic signatures that are manifested with different timings, and on different acoustic dimensions. Thus, both subjective mental states of confidence, and objective states related to competence, can be directly inferred from this natural behavior.
... Consequently, other streams of research have focused on paralinguistic markers and provided some evidence that listeners can infer speakers' levels of certainty from the insertion of pauses (i.e., hesitations) or fillers (e.g., "huum"), dedicated gestures (i.e., flipping palms or shrugging), and specific prosodic signatures 9,13-15 . Research in this field typically involves elicitation procedures comprising two phases 9,13,14,16 . First, encoders (trained actors 14,17 or speakers in a semi-naturalistic setting 13,16 ) are recorded while expressing utterances with various levels of certainty. ...
... Research in this field typically involves elicitation procedures comprising two phases 9,13,14,16 . First, encoders (trained actors 14,17 or speakers in a semi-naturalistic setting 13,16 ) are recorded while expressing utterances with various levels of certainty. In a second phase, acoustic analysis of these recordings is performed, and listeners are asked to recover the degree of certainty expressed by the speakers. ...
... In a second phase, acoustic analysis of these recordings is performed, and listeners are asked to recover the degree of certainty expressed by the speakers. Acoustic analyses of these recordings typically reveal that speakers' uncertainty is associated with decreased volume and rising intonation 9,13,14,18 , and to a lesser extent higher 14 (yet also sometimes lower 16 ) mean pitch as well as slower 13,14,16,19 (yet also sometimes faster 18 ) speech rate. ...
Article
Full-text available
The success of human cooperation crucially depends on mechanisms enabling individuals to detect unreliability in their conspecifics. Yet, how such epistemic vigilance is achieved from naturalistic sensory inputs remains unclear. Here we show that listeners’ perceptions of the certainty and honesty of other speakers from their speech are based on a common prosodic signature. Using a data-driven method, we separately decode the prosodic features driving listeners’ perceptions of a speaker’s certainty and honesty across pitch, duration and loudness. We find that these two kinds of judgments rely on a common prosodic signature that is perceived independently from individuals’ conceptual knowledge and native language. Finally, we show that listeners extract this prosodic signature automatically, and that this impacts the way they memorize spoken words. These findings shed light on a unique auditory adaptation that enables human listeners to quickly detect and react to unreliability during linguistic interactions.
... Much of the information conveyed by speech is transmitted through paralinguistic cues encoded in the audio signal. These paralinguistic cues are essential to the communication of emotions, intentions, and personality [1][2][3]. For the majority of our daily interactions, these paralinguistic cues are embedded among phonetic cues encoding linguistic meaning. ...
... Another method is to discard phonetic cues by separating the vocal signal into two parts, the signal representing the laryngeal source, and the signal representing the supralaryngeal filter. Whereas the filter is more typically associated with linguistic articulation [8][9][10], the source is more associated with paralinguistic features that are essential to affect, such as voice pitch, breathiness, roughness, and other varieties of voice quality [1,3,[11][12][13]. The most common method for separating the vocal source signal e(t) from the vocal tract impulse response h(t) is linear predictive coding (LPC) [14]. ...
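The source-filter separation described in this excerpt rests on linear prediction. As a minimal illustration (a plain autocorrelation-method LPC via the Levinson-Durbin recursion, not the exact pipeline of the cited work), the inverse filter A(z) applied to the signal yields a residual that approximates the laryngeal source e(t):

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns a = [1, a1, ..., ap]; filtering x with A(z) gives the
    prediction residual, an estimate of the source signal e(t)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Sanity check on a synthetic AR(2) source-filter signal:
# x[n] = 0.9 x[n-1] - 0.5 x[n-2] + e[n], so A(z) should be ~ [1, -0.9, 0.5].
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 0.9 * x[n - 1] - 0.5 * x[n - 2] + e[n]
a = lpc(x, 2)
residual = np.convolve(x, a)[: len(x)]  # inverse filtering recovers ~e
```

For speech, the same recursion is applied per frame with a higher order (commonly around 2 + sr/1000), and the residual carries pitch and voice-quality information largely stripped of articulation.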
Preprint
Full-text available
In this paper, we propose a method for removing linguistic information from speech for the purpose of isolating paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, which is impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglottographic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and amplitude envelope. The EGG signal supplies a direct correlate of voice source activity that is mostly independent of phonetic articulation. These signals are used to create a third signal designed to capture as much paralinguistic information from the vocal production system as possible -- maximizing the retention of bioacoustic cues to affect -- while eliminating phonetic cues to verbal meaning. To evaluate the success of this method, we studied the perception of corresponding speech audio and transformed EGG signals in an affect rating experiment with online listeners. The results show a high degree of similarity in the perceived affect of matched signals, indicating that our method is effective.
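The amplitude-envelope step mentioned in this abstract can be approximated very simply. This numpy-only sketch uses full-wave rectification plus a moving average; the authors' actual estimator may differ (e.g., a Hilbert-transform envelope), and the window length here is an illustrative choice:

```python
import numpy as np

def amplitude_envelope(signal, sr, win_ms=20):
    """Crude amplitude envelope: full-wave rectification followed by a
    moving-average smoother over win_ms milliseconds."""
    win = int(sr * win_ms / 1000)
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

# A constant-amplitude 200 Hz tone: the interior of the envelope should
# settle near the mean of |sin|, i.e. 2/pi ~ 0.637.
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
env = amplitude_envelope(np.sin(2 * np.pi * 200 * t), sr)
```

Such an envelope, imposed on a transformed source signal, preserves loudness dynamics while discarding phonetic detail.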
... Applicant ratings during virtual interviews may reflect teleconferencing conditions to some degree, in addition to traditional measures of applicant quality such as merit, character, and interpersonal communication skills. Prior studies have demonstrated that people infer personal attributes through nonverbal cues [15,21,22], with vocal pitch and amplitude affecting perceptions of speaker confidence [23,24], and poor eye contact and halting speech resulting in impressions of lower intelligence [16]. However, during virtual interviews, faulty equipment or unstable internet can distort these cues and influence interviewer impressions of applicants. ...
... Lastly, despite controlling for non-White race, gender, video duration, and teleconferencing conditions, mock applicants received significantly different ratings under control conditions, suggesting that some intrinsic applicant features affected interviewer impressions. Unmeasured variables such as vocal inflection, timbre, and appearance [15,23,24] of applicants may have influenced how they were perceived. Facial expression or grooming may have contributed to baseline rating discrepancies. ...
Article
Purpose: The objective of this study was to assess how teleconferencing variables influence faculty impressions of mock residency applicants. Methods: In October 2020, we conducted an online experiment studying five teleconferencing variables: background, lighting, eye contact, internet connectivity, and audio quality. We created interview videos of three mock residency applicants and systematically modified variables in control and intervention conditions. Faculty viewed the videos and rated their immediate impression on a 1–10 scale. The effect of each variable was measured as the mean difference between the intervention and control impression ratings. One-way analysis of variance (ANOVA) was performed to assess whether ratings varied across applicants. Paired-samples Wilcoxon signed-rank tests were conducted to assess the significance of the effect of each variable. Results: Of 711 faculty members who were emailed a link to the experiment, 97 participated (13.6%). The mean ratings for control videos were 8.1, 7.2, and 7.6 (P < .01). Videos with backlighting, off-center eye contact, choppy internet connectivity, or muffled audio quality had lower ratings when compared with control videos (P < .01). There was no rating difference between home and conference room backgrounds (P = .77). Many faculty participants reported that their immediate impressions were very much or extremely influenced by audio quality (60%), eye contact (57%), and internet connectivity (49%). Conclusions: Teleconferencing variables may serve as a source of assessment bias during residency interviews. Mock residency applicants received significantly lower ratings when they had off-center eye contact, muffled audio, or choppy internet connectivity, compared to optimal teleconferencing conditions.
... In the first group of studies, speakers were instructed to utter sentences in a confident vs. unconfident way, after which acoustic analysis was performed by measuring different prosodic characteristics of the speaker's voice according to the level of confidence the speakers intended. The results showed that speakers often spoke with a higher pitch and at a greater intensity when they intended to be confident (Scherer et al., 1973; Van Zant and Berger, 2020). In a second set of works, the same group of vocal stimuli was judged on speaker confidence by an independent group of listeners, and the acoustic analysis was performed based on the regrouping of the stimuli according to the listeners' perception. ...
Article
Full-text available
Introduction: Wuxi dialect is a variation of Wu dialect spoken in eastern China and is characterized by a rich tonal system. Compared with standard Mandarin speakers, those of Wuxi dialect as their mother tongue can be more efficient in varying vocal cues to encode communicative meanings in speech communication. While literature has demonstrated that speakers encode high vs. low confidence in global prosodic cues at the sentence level, it is unknown how speakers' intended confidence is encoded at a more local, phonetic level. This study aimed to explore the effects of speakers' intended confidence on both prosodic and formant features of vowels in two lexical tones (the flat tone and the contour tone) of Wuxi dialect. Methods: Words of a single vowel were spoken in confident, unconfident, or neutral tone of voice by native Wuxi dialect speakers using a standard elicitation procedure. Linear-mixed effects modeling and parametric bootstrapping testing were performed. Results: The results showed that (1) the speakers raised both F1 and F2 in the confident level (compared with the neutral-intending expression); additionally, F1 can distinguish between the confident and unconfident expressions; (2) compared with the neutral-intending expression, the speakers raised mean f0, had a greater variation of f0 and prolonged pronunciation time in the unconfident level, while they raised mean intensity, had a greater variation of intensity and prolonged pronunciation time in the confident level; (3) the speakers modulated mean f0 and mean intensity to a larger extent on the flat tone than the contour tone to differentiate between levels of confidence in the voice, while they modulated f0 and intensity range more only on the contour tone. Discussion: These findings shed new light on the mechanisms of segmental and suprasegmental encoding of speaker confidence and lack of confidence at the vowel level, highlighting the interplay of lexical tone and vocal expression in speech communication.
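Formants such as the F1 and F2 reported in this abstract are commonly estimated from the roots of an LPC polynomial. A hedged sketch of that idea (least-squares LPC rather than any particular phonetics toolchain; the one-resonance test signal is synthetic, standing in for a vowel frame):

```python
import numpy as np

def lpc_formants(x, sr, order=8):
    """Estimate resonance frequencies (formant candidates) from the
    angles of the complex roots of a least-squares LPC polynomial."""
    y = x[order:]
    X = np.column_stack(
        [x[order - k : len(x) - k] for k in range(1, order + 1)]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # x[n] ~ coef . past samples
    roots = np.roots(np.concatenate(([1.0], -coef)))
    roots = roots[roots.imag > 0.01]               # one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)     # pole angle -> frequency
    return np.sort(freqs)

# Synthetic one-resonance "vowel": a two-pole resonator at 500 Hz driven
# by white noise, so the estimator should return ~500 Hz.
sr, f0, r = 8000, 500.0, 0.97
theta = 2 * np.pi * f0 / sr
rng = np.random.default_rng(1)
x = np.zeros(8000)
e = rng.standard_normal(len(x))
for n in range(2, len(x)):
    x[n] = 2 * r * np.cos(theta) * x[n - 1] - r * r * x[n - 2] + e[n]
formants = lpc_formants(x, sr, order=2)
```

Real vowel analysis would use a higher LPC order on pre-emphasized, windowed frames and keep only poles with sufficiently narrow bandwidths.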
... human-resources/?lang=en). The tuning of certain voice parameters can be coached to induce positive impressions, e.g., by avoiding a slow speech rate and high pitch (indicating stress 24 ) and using low pitch (leadership 25 ) and a loud voice with large volume variations (confidence 26 ). ...
Article
Objectives: To investigate the impact of standardized mobile phone recordings passed through a telecom channel on acoustic markers of voice quality and on its perception by voice experts in normophonic speakers. Methods: Continuous speech and a sustained vowel were recorded for fourteen female and ten male normophonic speakers. The recordings were done simultaneously with a head-mounted high-quality microphone and through the telephone network on a receiving smartphone. Twenty-two acoustic voice quality, breathiness and pitch-related measures were extracted from the recordings. Nine vocologists perceptually rated the G, R and B parameters of the GRBAS scale on each voice sample. The reproducibility, the recording type, the stimulus type and the gender effects, as well as the correlation between acoustic and perceptual measures were investigated. Results: The sustained vowel samples are damped after one second. Only the frequencies between 100 and 3700Hz are passed through the telecom channel and the frequency response is characterized by peaks and troughs. The acoustic measures show a good reproducibility over the three repetitions. All measures significantly differ between the recording types, except for the local jitter, the harmonics-to-noise ratio by Dejonckere and Lebacq, the period standard deviation and all six pitch measures. The AVQI score is higher in telephone recordings, while the ABI score is lower. Significant differences between genders are also found for most of the measures; while the AVQI is similar in men and women, the ABI is higher in women in both recording types. For the perceptual assessment, the interrater agreement is rather low, while the reproducibility over the three repetitions is good. Few significant differences between recording types are observed, except for lower breathiness ratings on telephone recordings. 
G ratings are significantly more severe on the sustained vowel on both recording types, R ratings only on telephone recordings. While roughness is rated higher in men on telephone recordings by most experts, no gender effect is observed for breathiness on either recording type. Finally, neither the AVQI nor the ABI yield strong correlations with any of the perceptual parameters. Conclusions: Our results show that passing a voice signal through a telecom channel induces filter and noise effects that limit the use of common acoustic voice quality measures and indexes. The AVQI and ABI are both significantly impacted by the recording type. The most reliable acoustic measures seem to be pitch perturbation (local jitter and period standard deviation) as well as the harmonics-to-noise ratio from Dejonckere and Lebacq. Our results also underline that raters are not equally sensitive to the various factors, including the recording type, the stimulus type and the gender effects. None of the three perceptual parameters G, R and B seems to be reliably measurable on telephone recordings using the two investigated acoustic indexes. Future studies investigating the impact of voice quality in telephone conversations should thus focus on acoustic measures on continuous speech samples that are limited to the frequency response of the telecom channel and that are not too sensitive to environmental and additive noise.
... When funders perceive the entrepreneur's behaviors as being driven by impression management motives, this perception will engender negative reactions. By demonstrating the potential risk of displaying enthusiastic expressions, our research shows the possibility that this risk can also arise when entrepreneurs use other types of nonverbal or verbal behaviors for managing impressions, such as high energetic vocal tones (van Zant & Berger, 2020) and verbal self-promotion (e.g., . Thus, future research should examine which types of nonverbal or verbal behaviors can more easily trigger observers' perceptions of impression management motives, thereby triggering a negative pathway like the one we found in this paper. ...
Article
Full-text available
Displaying enthusiasm (an emotional manifestation of passion) is a common practice for entrepreneurs to attract crowdfunding. However, we propose that funders may attribute an entrepreneur’s displayed enthusiasm to impression management motives, which can in turn reduce their funding intentions. Moreover, this negative pathway is more likely to occur when the entrepreneur is perceived to have lower domain expertise. We found consistent support for these hypotheses from a survey and an experiment. Our findings suggest that displaying enthusiasm may not always be effective for entrepreneurs because there are both positive and negative pathways underlying the influence of displayed enthusiasm on funders.
... Importantly, while we are interested in attitude emotionality, this is not the only way expression mode might impact word of mouth recipients. Speaking also often involves less formal language, more words produced, paralinguistic cues, and other aspects (Chafe and Tannen 1987;Schroeder and Epley 2015;Van Zant and Berger 2020), all of which might independently impact observer attitudes. While an in-depth investigation into how modality impacts word of mouth recipients is beyond the scope of this article, we simply examine whether, by leading consumers to express less emotional attitudes, communicating attitudes through writing can reduce persuasion. ...
Article
Full-text available
Consumers often communicate their attitudes and opinions with others, and such word of mouth has an important impact on what others think, buy, and do. But might the way consumers communicate their attitudes (i.e., through speaking or writing) shape the attitudes they express? And, as a result, the impact of what they share? While a great deal of research has begun to examine drivers of word of mouth, there has been less attention to how communication modality might shape sharing. Six studies, conducted in the laboratory and field, demonstrate that compared to speaking, writing leads consumers to express less emotional attitudes. The effect is driven by deliberation. Writing offers more time to deliberate about what to say, which reduces emotionality. The studies also demonstrate a downstream consequence of this effect: by shaping the attitudes expressed, the modality consumers communicate through can influence the impact of their communication. This work sheds light on word of mouth, effects of communication modality, and the role of language in communication.
... During verbal communication, the matching of voice and intonation not only helps create emotional resonance between the communicators, but also facilitates the acceptance and internalization of the educational content. Several studies and experiments have also demonstrated that paralinguistic persuasion attempts (i.e., modulating the voice for persuasion) can be effective in influencing perceivers' attitudes and choices during the persuasion process [84]. The literature has reported that rhetorical devices make speech more appealing, enhance its expressive effect, guarantee the effective transmission of semantic messages, and improve the effectiveness of persuasion and education [83,85]. ...
Article
Full-text available
Background: Telehealth and online health information provide patients with increased access to healthcare services and health information in chronic disease management of older patients with chronic diseases, addressing the challenge of inadequate health resources and promoting active and informed participation of older patients in chronic disease management. There are few qualitative studies on the application of telehealth and online health information to chronic disease management in older patients. Chronic obstructive pulmonary disease is one of the most common chronic diseases in older adults. Telehealth is widely used in the management of chronic obstructive pulmonary disease. The purpose of this study was to explore the perceptions and experiences of older patients and healthcare providers in the application of telehealth and online health information to chronic disease management of chronic obstructive pulmonary disease. Methods: A qualitative descriptive study with data generated from 52 individual semi-structured interviews with 29 patients [Law of the People’s Republic of China on the protection of the rights and interests of older people (2018 Revised Version) = >60 years old] with chronic obstructive pulmonary disease and 23 healthcare providers. The inductive thematic analysis method was used for data analysis. Results: Four themes and 16 sub-themes were identified in this study. Four themes included: faced with a vast amount of online health information, essential competencies and personality traits ensuring older patients’ participation and sustained use, user experience with the use of technology, being in a complex social context. Conclusion: The ability of patients to understand health information should be fully considered while facilitating access to online health information for older patients.
The role of health responsibility and user experience in older patients’ participation and sustained use of telehealth and online health information needs to be emphasised. In addition, the complex social context is a determining factor to be considered, particularly the complex impact of a reliance on offspring and social prejudice on the behaviour of older adults using telehealth and online health information.
... Visual and auditory modalities may convey the same information as text, or something different. Tools like Praat (Boersma & Weenink, 2018) can be used to extract pitch and tone from audio files (e.g., Van Zant & Berger, 2020) and research has started to use computer vision to extract features from images (Li & Xie, 2020;Zhang et al., 2017). While there has been less work in these areas than in text analysis, emerging approaches will hopefully enable better analysis of these important information channels. ...
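Praat (and wrappers around it) is the standard tool for this kind of feature extraction; purely to illustrate the underlying idea in a self-contained way, here is a toy autocorrelation pitch estimator in plain NumPy (an illustrative sketch, far less robust than Praat's actual algorithm):

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) of a short audio frame via
    autocorrelation -- a toy stand-in for what Praat does more robustly."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(sr / fmax)   # shortest plausible pitch period, in samples
    lag_hi = int(sr / fmin)   # longest plausible pitch period, in samples
    best = lag_lo + np.argmax(corr[lag_lo:lag_hi])
    return sr / best

# Synthetic 220 Hz tone standing in for a voiced speech frame
sr = 16000
t = np.arange(2048) / sr
tone = np.sin(2 * np.pi * 220 * t)
print(f"{estimate_f0(tone, sr):.1f} Hz")   # close to 220 Hz
```

Real pitch trackers add voicing detection, octave-error correction, and interpolation around the peak, which is why researchers reach for Praat rather than rolling their own.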
Article
Language can provide important insights into people, and culture more generally. Further, the digitization of information has made more and more textual data available. But by itself, all that data are just that: data. Realizing its potential requires turning that data into insight. We suggest that automated text analysis can help. Recent advances have provided novel and increasingly accessible ways to extract insight from text. While some psychologists may be familiar with dictionary methods, fewer may be aware of approaches like topic modeling, word embeddings, and more advanced neural network language models. This article provides an overview of natural language processing and how it can be used to deepen understanding of people and culture. We outline the dual role of language (i.e., reflecting things about producers and impacting audiences), review some useful text analysis methods, and discuss how these approaches can help unlock a range of interesting questions. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
... For instance, perceptions are made about features and traits such as attractiveness (Apicella & Feinberg, 2009;Borkowska & Pawlowski, 2011;Bruckert et al., 2006;Feinberg et al., 2005;Hughes et al., 2010;Pisanski & Rendall, 2011;Tigue, 2014) and warmth and likeability (Baus et al., 2019;McAleer et al., 2014). Listeners can even utilize minimal acoustic information, such as low or high volume, to extract information about a speaker's aggressiveness or dominance (Armstrong et al., 2019;Guyer et al., 2018;Klofstad et al., 2015;Knowles, 2010;Peschard et al., 2018;Van Zant & Berger, 2020), trustworthiness (O'Connor & Barclay, 2017), persuasiveness and popularity (Ballew & Todorov, 2007;Klofstad, 2016), physical characteristics (Pisanski & Rendall, 2011), and nationality (Gallois & Callan, 1981). ...
Article
Purpose Voice and speech are rich with information about a speaker's personality and other features of identity. This study seeks to determine the extent to which listeners agree about speakers' social, physical, and personality attributes. Method Two experiments were conducted. In the first experiment, listeners rated a group of speakers who were unbalanced for sex and personality traits. The second experiment elaborated on the first by ensuring the speaker set was balanced for sex and personality traits. Both experiments played standard speech samples from speakers who provided personality information via the Multidimensional Personality Questionnaire–Brief Form. Groups of listeners rated each speaker on the same personality traits and other features of identity. Responses were analyzed for listener agreement. Results For both experiments, listeners showed consistently high levels of agreement on the personality attributes of a speaker. For certain speakers, listener agreement on some personality traits was as high as 92% and 97% in Experiments 1 and 2, respectively. Furthermore, a range of agreement across personality subscales was observed across speakers such that some were agreed-upon across all personality ratings and others were agreed-upon only for a few personality traits. Conclusions When it comes to judging personality traits and other features of identity, most listeners might not be “correct” about speakers' traits and attributes, but they broadly agree about how the speaker sounds. Some speakers send more salient voice and speech cues that drive agreement about their personality, whereas others speak in a manner that precludes consensus. Supplemental Material https://doi.org/10.23641/asha.16906990
... Speaking Activity is widely studied during social interactions, including inter-speaker influence [23], paralinguistic persuasion [94], but also in the context of interaction mediation systems that try to improve collaboration [49]. Furthermore, active speech is closely correlated with non-verbal features like gestures [33] and eye contact [41,48,78]. ...
... Indeed, past research indicates that voices perceived as being high (vs. low) in confidence are often more persuasive, for a variety of reasons Van Zant & Berger, 2020). ...
Article
Full-text available
Past research has largely focused on how emotional expressions provide information about the speaker’s emotional state, but has generally neglected vocal affect’s influence over communication effectiveness. This is surprising given that other nonverbal behaviors often influence communication between individuals. In the present theory paper, we develop a novel perspective called the Contextual Influences of Vocal Affect (CIVA) model to predict and explain the psychological processes by which vocal affect may influence communication through three broad categories of process: emotion origin/construal, changing emotions, and communication source inferences. We describe research that explores potential moderators (e.g., affective/cognitive message types, message intensity), and mechanisms (e.g., emotional assimilation, attributions, surprise) shaping the effects of vocally expressed emotions on communication. We discuss when and why emotions expressed through the voice can influence the effectiveness of communication. CIVA advances theoretical and applied psychology by providing a clear theoretical account of vocal affect’s diverse impacts on communication.
... In both studies, the ascription of masculine and feminine attributes as well as ratings of the speaker's competence and likability were the dependent variables, and participants' own gender identity was included as a potential moderator. Unlike previous studies that used experimental variations in the pronunciation of letters and words to examine the effects on impressions about the speaker (e.g., Borkowska & Pawlowski, 2011;Tsantani et al., 2016), our studies join a small body of research that manipulated voice pitch in longer speech samples of semantically meaningful text (e.g., Klofstad et al., 2012;van Zant & Berger, 2020). ...
Article
Two experiments examined the impact of voice pitch on gender stereotyping. Participants listened to a text read by a female (Study 1; N = 171) or male (Study 2, N = 151) speaker, whose voice pitch was manipulated to be high or low. They rated the speaker on positive and negative facets of masculinity and femininity, competence, and likability. They also indicated their own gendered self-concept. High pitch was associated with the ascription of more feminine traits and greater likability. The high-pitch female speaker was rated as less competent, and the high-pitch male speaker was perceived as less masculine. Text content and participants’ gendered self-concept did not moderate the pitch effect. The findings underline the importance of voice pitch for impression formation.
... However, recent studies suggest that prosody alone can reveal the intentions of a speaker in a powerful manner (Hellbernd and Sammler, 2016;Caballero et al., 2018;Truesdale and Pell, 2018). For example, in motivating and persuasive speech, prosody can "tag" verbal information as important and increase the persuasiveness of a speaker even when the verbal information is not credible (Zougkou et al., 2017;Van Zant and Berger, 2020). Prosody is thus an important emotive and persuasive device in low-involvement communicative situations (Gelinas-Chebat and Chebat, 1992), which is often the case of third-party complaints (Alicke et al., 1992;Boxer, 1993). ...
Article
Full-text available
Emotive speech is a social act in which a speaker displays emotional signals with a specific intention; in the case of third-party complaints, this intention is to elicit empathy in the listener. The present study assessed how the emotivity of complaints was perceived in various conditions. Participants listened to short statements describing painful or neutral situations, spoken with a complaining or neutral prosody, and evaluated how complaining the speaker sounded. In addition to manipulating features of the message, social-affiliative factors which could influence complaint perception were varied by adopting a cross-cultural design: participants were either Québécois (French Canadian) or French and listened to utterances expressed by both cultural groups. The presence of a complaining tone of voice had the largest effect on participant evaluations, while the nature of statements had a significant, but smaller influence. Marginal effects of culture on explicit evaluation of complaints were found. A multiple mediation analysis suggested that mean fundamental frequency was the main prosodic signal that participants relied on to detect complaints, though most of the prosody effect could not be linearly explained by acoustic parameters. These results highlight a tacit agreement between speaker and listener: what characterizes a complaint is how it is said (i.e., the tone of voice), more than what it is about or who produces it. More generally, the study emphasizes the central importance of prosody in expressive speech acts such as complaints, which are designed to strengthen social bonds and supportive responses in interactive behavior. This intentional and interpersonal aspect in the communication of emotions needs to be further considered in research on affect and communication.
... While the present study focused on verbal aspects of female communication, oral presentations come with a variety of other observable behaviors. Paralinguistic cues, such as pitch or volume, may affect persuasion [80], and future research will have to determine how stylistic features of female language in interaction with paralinguistic cues might predict impact. Future research will also be required to understand exactly in which contexts (e.g., oral versus written, academic versus non-academic) and for whom (e.g., junior versus senior professionals) female-typical language holds the persuasive potential we observed in TED Talks. ...
Article
Full-text available
The huge power for social influence of digital media may come with the risk of intensifying common societal biases, such as gender and age stereotypes. Speaker’s gender and age also behaviorally manifest in language use, and language may be a powerful tool to shape impact. The present study took the example of TED, a highly successful knowledge dissemination platform, to study online influence. Our goal was to investigate how gender- and age-linked language styles–beyond chronological age and identified gender–link to talk impact and whether this reflects gender and age stereotypes. In a pre-registered study, we collected transcripts of TED Talks along with their impact measures, i.e., views and ratios of positive and negative talk ratings, from the TED website. We scored TED Speakers’ (N = 1,095) language with gender- and age-morphed language metrics to obtain measures of female versus male, and younger versus more senior language styles. Contrary to our expectations and to the literature on gender stereotypes, more female language was linked to higher impact in terms of quantity, i.e., more talk views, and this was particularly the case among talks with a lot of views. Regarding quality of impact, language signatures of gender and age predicted different types of positive and negative ratings above and beyond main effects of speaker’s gender and age. The differences in ratings seem to reflect common stereotype contents of warmth (e.g., “beautiful” for female, “courageous” for female and senior language) versus competence (e.g., “ingenious”, “informative” for male language). The results shed light on how verbal behavior may contribute to stereotypical evaluations. They also illuminate how, within new digital social contexts, female language might be uniquely rewarded and, thereby, an underappreciated but highly effective tool for social influence.
... So far, empirical work suggests that three-step models can be usefully applied to situations in which the listener forms a mental representation of speaker (un)certainty or (un)commitment in the context of particular speech acts, such as statements of fact, opinion, or intentions. This research creates a starting point for broader work that examines how vocal expressions of confidence contribute to social cognitive processes related to competence, persuasion, and trust (e.g., Caballero & Pell, 2020; McAleer et al., 2014; van Zant & Berger, 2019), and which explore their neural underpinnings (e.g., Hellbernd & Sammler, 2018). Examining other communicative situations in which vocally expressed confidence is used as a pragmatic device—for example, in persuasive communication and marketing, to convince people of untruths in the political arena, or using cues of uncertainty solely to convey politeness, etc.—is an especially promising research area to explore further. ...
Preprint
Neurocognitive models (e.g., Schirmer & Kotz, 2006) have helped to characterize how listeners incrementally derive meaning from vocal expressions of emotion in spoken language, what neural mechanisms are involved at different processing stages, and their relative time course. But how can these insights be applied to communicative situations in which prosody serves a predominantly interpersonal function? This comment examines recent data highlighting the dynamic interplay of prosody and language, when vocal attributes serve the sociopragmatic goals of the speaker or reveal interpersonal information that listeners use to construct a mental representation of what is being communicated. Our comment serves as a beacon to researchers interested in how the neurocognitive system "makes sense" of socioemotive aspects of prosody.
... Paralinguistic cues such as intonation, pitch, or volume also can contribute to persuasion for speech, while these cues are necessarily absent for written text (van Zant and Berger 2020). The device used to listen (headphones vs. loudspeaker) can also affect feelings of closeness toward the speaker (Lieberman, Schroeder, and Amir 2022). ...
Preprint
Full-text available
Voice assistants often present choices where consumers listen to product options. But do consumers process information differently when listening compared to reading? Bridging theories on evaluability and memory, six experiments, including one conducted in consumers’ homes on Alexa voice speakers, demonstrate that consumers listening to speech utilize higher-evaluability product information (which can be understood without making comparisons to other options) to guide their judgments and choices relatively more than consumers reading the same text. A difference in memory drives this tendency. This is because (1) due to its ephemeral nature, processing speech requires greater reliance on memory and (2) information higher in evaluability is more easily remembered. Thus, higher-evaluability information is likely to be remembered regardless of presentation mode (speech vs. text), while memory for lower-evaluability information is likely to favor text, leading to the observed effect. The findings speak to the evaluability, memory, and auditory information processing literatures, and underscore that marketing managers presenting choices via speech will do well to highlight favorable highly-evaluable information about products such as recommendations, sales ranks, or descriptions such as “like new.” Substantively, a new format for presenting information is demonstrated which may improve voice-based sales.
Article
This study investigates how voice characteristics (i.e., speech rate, loudness, pitch) affect interpretation purchases in digital interpretation platforms through the lenses of signaling theory and nonverbal communication. Based on auditory and transactional data from a leading digital interpretation platform in China, this study uses voice mining techniques to extract voice characteristics and examine their effects on interpretation purchases. Findings demonstrate the significant positive effects of speech rate and the significant inverted U-effects of loudness and pitch on interpretation purchases. This study thus extends tourism interpretation research focused on traditional forms to digital interpretation platforms and provides empirical evidence that nonverbal signals (voice characteristics) matter in tourism interpretation purchases. Findings also offer practical implications for tourism interpretation innovation and platform operation.
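As a rough illustration of how loudness features like these can be pulled from a waveform, here is a minimal NumPy sketch (an illustrative toy, not the study's actual voice-mining pipeline): frame the signal, compute per-frame RMS energy, and summarize its mean level and spread:

```python
import numpy as np

def loudness_profile(signal, sr, frame_ms=25):
    """Mean frame-level RMS loudness and its variability over time --
    a minimal sketch of volume features, not a calibrated loudness measure."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return rms.mean(), rms.std()

# Synthetic "speakers": a 220 Hz tone at constant vs. swelling volume
sr = 16000
t = np.arange(2 * sr) / sr
voiced = np.sin(2 * np.pi * 220 * t)
steady = 0.5 * voiced                                  # constant volume
varied = (0.5 + 0.4 * np.sin(2 * np.pi * t)) * voiced  # varying volume

_, steady_var = loudness_profile(steady, sr)
_, varied_var = loudness_profile(varied, sr)
print(varied_var > steady_var)   # the varied voice shows more volume variability
```

The same mean/variability summary is the kind of signal the focal article links to perceived confidence: speaking louder overall and varying one's volume.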
Article
Full-text available
In two studies, we examined if correct and incorrect testimony statements were produced with vocally distinct characteristics. Participants watched a staged crime film and were interviewed as eyewitnesses. Witness responses were recorded and then analysed along 16 vocal dimensions. Results from Study 1 showed six vocal characteristics of accuracy, which included dimensions of frequency, energy, spectral balance and temporality. Study 2 attempted to replicate Study 1, and also examined effects of emotion on the vocal characteristic-accuracy relationship. Although the results from Study 1 were not directly replicated in Study 2, a mega-analysis of the two datasets showed four distinct vocal characteristics of accuracy; correct responses were uttered with a higher pitch (F0 [M]), greater energy in the first formant region (F1 [amp]), higher speech rate (VoicedSegPerSec) and shorter pauses (UnvoicedSegM). Taken together, this study advances previous knowledge by showing that accuracy is not only indicated by what we say, but also by how we say it.
Article
Language plays a fundamental role in every aspect of life. But only recently has research begun to understand the role of language in consumer behavior. This paper offers an integrative discussion of research on the language of consumer psychology. We review some of the main areas of inquiry and discuss some key methodological approaches (e.g., automated textual analysis) that have been crucial to the area's development. Further, we outline some broad issues and opportunities in the space and highlight potential directions for future research. We hope to encourage more consumer psychologists to consider the great potential in producing new conceptual and substantive wisdom from words.
Article
We explore the effect of recommendation modality on recommendation adherence. Results from five experiments run on various online platforms ( N = 6,103 adults from TurkPrime and Prolific) show that people are more likely to adhere to recommendations that they hear (auditory) than recommendations that they read (visual). This effect persists regardless of whether the auditory recommendation is spoken by a human voice or an automated voice and holds for hypothetical and consequential choices. We show that the effect is in part driven by the relative need for closure—manifested in a sense of urgency—that is evoked by the ephemerality of auditory messages. This work suggests that differences in the physical properties of auditory and visual modalities can lead to meaningful psychological and behavioral consequences.
Article
Full-text available
Brands and consumers alike have become creators and distributors of digital words, thus generating increasing interest in insights to be gained from text-based content. This work develops an algorithm to identify textual paralanguage, which are nonverbal parts of speech expressed in online communication. The paralanguage classifier (PARA) is developed and validated utilizing social media data from Twitter, YouTube, and Instagram (N = 1,241,489 posts). Based in auditory, tactile, and visual properties of text, this tool detects nonverbal communication cues, aspects of text often neglected by other word-based sentiment lexica. This work is the first to reveal the importance of textual paralanguage as a critical indicator of sentiment valence and intensity. Automatically detected textual paralanguage is further demonstrated to predict consumer engagement over and above existing text analytic tools. The algorithm is designed for researchers, scholars, and practitioners seeking to optimize marketing communications and offers a methodological advancement to quantify the importance of not only what is said verbally, but how it is said nonverbally.
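The PARA classifier itself is described in that article; purely to illustrate the kind of auditory-in-text cues such a tool inspects, here is a minimal regex sketch (the cue families and patterns below are hypothetical stand-ins, not the published lexicon):

```python
import re

# Hypothetical textual-paralanguage cue families (illustrative patterns only):
# nonverbal vocal signals as they tend to be typed into social-media text.
CUES = {
    "vocalization": re.compile(r"\b(haha+|lol|hmm+|ugh+|wow)\b", re.I),
    "letter_stretch": re.compile(r"([a-z])\1{2,}", re.I),   # e.g., "soooo"
    "shouting_caps": re.compile(r"\b[A-Z]{3,}\b"),          # e.g., "AMAZING"
    "emphatic_punct": re.compile(r"[!?]{2,}"),              # e.g., "!!!"
}

def paralanguage_cues(post):
    """Return which nonverbal cue families appear in a social-media post."""
    return sorted(name for name, pat in CUES.items() if pat.search(post))

print(paralanguage_cues("This is SOOO good!!! haha"))
# → ['emphatic_punct', 'letter_stretch', 'shouting_caps', 'vocalization']
```

A production classifier like PARA validates such cues against large labeled corpora and ties them to sentiment valence and intensity; this sketch only shows the detection step.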
Article
Full-text available
Voice quality, or type of phonation (e.g., a whispery voice) can prime specific sensory associations amongst consumers. In the realm of sensory and consumer science, a wide range of taste-sound correspondences have been documented. A growing body of research on crossmodal correspondences has revealed that people reliably associate sounds with basic taste qualities. Here, we examined the largely unexplored associations between basic tastes and sounds: namely taste-voice quality correspondences. Across three pre-registered studies, participants associated four types of voice qualities (modal, whispery, creaky, and falsetto) with the five basic tastes (sweet, sour, salty, bitter, and umami). Study 1 investigated the relations between voice qualities and taste words. Study 2 attempted to replicate the findings and revealed the underpinning psychological mechanisms in terms of semantic/emotional associations. Study 3 used the descriptions of food products that varied in terms of their taste in order to expand the applicability of the findings. The results demonstrated that participants reliably associate specific voice qualities with particular tastes. Falsetto voices are matched more strongly with sweetness than other voices. Creaky voices are matched more strongly with bitterness than with other voice qualities. Modal voices are matched more strongly with umami than creaky voices. Evaluation/positive valence might partially underlie the associations between sweet/bitter-voice quality correspondences. Taken together, these findings reveal a novel case of sound-taste correspondences and deepen our understanding of how people are able to associate attributes from different senses.
Article
Full-text available
This article unpacks the basic mechanisms by which paralinguistic features communicated through the voice can affect evaluative judgments and persuasion. Special emphasis is placed on exploring the rapidly emerging literature on vocal features linked to appraisals of confidence (e.g., vocal pitch, intonation, speech rate, loudness, etc.), and their subsequent impact on information processing and meta-cognitive processes of attitude change. The main goal of this review is to advance understanding of the different psychological processes by which paralinguistic markers of confidence can affect attitude change, specifying the conditions under which they are more likely to operate. In sum, we highlight the importance of considering basic mechanisms of attitude change to predict when and why appraisals of paralinguistic markers of confidence can lead to more or less persuasion.
Article
Past research has uncovered actions that would seem to undermine but in fact frequently enhance persuasion. For example, expressing doubt about one’s view or presenting arguments against it would seem to weaken one’s case, but can sometimes promote it. We propose a framework for understanding these findings. We posit that these actions constitute acts of receptiveness—behaviors that signal openness to new information and opposing viewpoints. We review four classes of acts of receptiveness: conveying uncertainty, acknowledging mistakes, highlighting drawbacks, and asking questions. We identify conditions under which and mechanisms through which these actions boost persuasion. Acts of receptiveness appear to be more persuasive when they come from expert or high-status sources, rather than non-expert or low-status sources, and to operate through two primary mechanisms: increased involvement and enhanced source perceptions. Following a review of this work, we delineate potentially novel acts of receptiveness and outline directions for future research.
Article
Full-text available
Persuasion success is often related to hard-to-measure characteristics, such as the way the persuader speaks. To examine how vocal tones impact persuasion in an online appeal, this research measures persuaders’ vocal tones in Kickstarter video pitches using novel audio mining technology. Connecting vocal tone dimensions with real-world funding outcomes offers insight into the impact of vocal tones on receivers’ actions. The core hypothesis of this paper is that a successful persuasion attempt is associated with vocal tones denoting (1) focus, (2) low stress, and (3) stable emotions. These three vocal tone dimensions—which are in line with the stereotype content model—matter because they allow receivers to make inferences about a persuader’s competence. The hypotheses are tested with a large-scale empirical study using Kickstarter data, which is then replicated in a different category. In addition, two controlled experiments provide evidence that perceptions of competence mediate the impact of the three vocal tones on persuasion attempt success. The results identify key indicators of persuasion attempt success and suggest a greater role for audio mining in academic consumer research.
Article
Full-text available
Contributions: Brunswik Society Newsletter 2019
Chapter
After defining the construct of attitude, the chapter addresses the cognitive and motivational functions of attitudes. Sections on attitude formation (genetic factors, learning processes, self-perception processes, and mere exposure) and attitude change follow. The latter focuses on the striving for consistency and cognitive dissonance, strategies for reducing dissonance, and persuasion, i.e., deliberate attempts to change attitudes. Ways of building resistance to attitude change are then presented. The chapter closes with a section on the relationship between attitudes and behavior, along with an overview of direct and indirect methods for measuring attitudes.
Article
This paper reviews the nonverbal display of power, status, and dominance (PSDom). While PSDom are theoretically and often practically separate constructs, in the domain of nonverbal behaviors (NVBs) they are more often expressed similarly. Experimental research and field observations on adult humans were harvested for this review. The goals of this review were to: (1) summarize the list of reliable NVBs of PSDom (with associated references), (2) distinguish behaviors merely thought to be associated with PSDom from those actually associated with PSDom, (3) describe the few existing distinctions between how power, status, and dominance are each displayed through NVBs, and describe new reports on NVBs associated with SES, social network size, and confidence, (4) address the quandary of whether nonverbal expressions of PSDom are universal across gender and culture, and (5) provide a resource for researchers wishing to code nonverbal behaviors associated with PSDom.
Article
Full-text available
In the marketing literature, the ‘K effect’ refers to the claim that the letter K is overrepresented as the initial letter of brand names. To date, however, most findings have only considered the frequency of the written letters incorporated into brand names. Here, we argue that since letters sometimes sound different when pronounced in different words (e.g., ‘C’ in Cartier vs. Cisco), a phonemic analysis of the initial phonemes is likely to be more insightful than merely a comparison of the written form (as reported by previous researchers). With this in mind, the initial phonemes of top brand names were analyzed and compared with: (1) words in the dictionary; (2) a corpus of contemporary American English; and (3) the most popular current children’s names in the USA. We also analyzed a different list of top brands, including both corporate brand names (e.g., Procter & Gamble) as well as the product-related brand names (e.g., Pantene). We conclude by reporting the most underrepresented [vowels (/aʊ/, /ɜː/, /ɔɪ/, /ɔː/) and consonants (/r/, /ʒ/, /l/, /θ/)] and overrepresented [vowels (/iː/, /əʊ/) and consonants (/j/, /z/, /f/, /dʒ/, /p/, /j/, /t/)] initial phonemes in the brand names vis-à-vis the current linguistic naming conventions.
Preprint
Speech prosody constitutes a fundamental way through which speakers communicate their levels of confidence. Yet, it remains unknown whether prosodic markers of uncertainty constitute mere indices, that are constitutively present when speakers feel doubtful, or rather, whether they reflect other underlying psychological variables. By combining a psychophysical procedure with an acoustic analysis of verbal reports, we tease apart the contributions of sensory evidence, accuracy, and subjective confidence to epistemic prosody. We find that loudness, duration and intonation reflect distinct underlying mental processes: while loudness is predominantly impacted by accuracy, duration and intonation truly reflect subjective confidence, over and beyond sensory evidence and accuracy. We also find that speakers’ accuracy can still be heard beyond their own metacognitive awareness, and that at the level of intonation, speakers who display better metacognitive sensitivity are also the best signalers. Our results highlight prosody as a fundamental interface through which confidence can be shared.
Article
Words are part of almost every marketplace interaction. Online reviews, customer service calls, press releases, marketing communications, and other interactions create a wealth of textual data. But how can marketers best use such data? This article provides an overview of automated textual analysis and details how it can be used to generate marketing insights. The authors discuss how text reflects qualities of the text producer (and the context in which the text was produced) and impacts the audience or text recipient. Next, they discuss how text can be a powerful tool both for prediction and for understanding (i.e., insights). Then, the authors overview methodologies and metrics used in text analysis, providing a set of guidelines and procedures. Finally, they further highlight some common metrics and challenges and discuss how researchers can address issues of internal and external validity. They conclude with a discussion of potential areas for future work. Along the way, the authors note how textual analysis can unite the tribes of marketing. While most marketing problems are interdisciplinary, the field is often fragmented. By involving skills and ideas from each of the subareas of marketing, text analysis has the potential to help unite the field with a common set of tools and approaches.