23 Acoustic Patterns of Emotions
Branka Zei Pollermann and Marc Archinard
Liaison Psychiatry, Geneva University Hospitals
CH-1205 Geneva, Switzerland
vox-institute@swissonline.ch
Introduction
Naturalness of synthesised speech is often judged by how well it reflects the speaker's emotions and/or how well it features the culturally shared vocal prototypes of emotions (Scherer, 1992). Emotionally coloured vocal output is thus characterised by a blend of features constituting patterns of a number of acoustic parameters related to F0, energy, rate of delivery and the long-term average spectrum.
Using the covariance model of acoustic patterning of emotional expression, the
chapter presents the authors' data on: (1) the inter-relationships between acoustic
parameters in male and female subjects; and (2) the acoustic differentiation of
emotions. The data also indicate that variations in F0, energy, and timing param-
eters mainly reflect different degrees of emotionally induced physiological arousal,
while the configurations of long-term average spectra (more related to voice quality) reflect both arousal and the hedonic valence of emotional states.
Psychophysiological Determinants of Emotional Speech Patterns
Emotions have been described as psycho-physiological processes that include cog-
nitions, visceral and immunological reactions, verbal and nonverbal expressive dis-
plays as well as activation of behavioural reactions (such as approach, avoidance,
repulsion). The latter reactions can vary from covert dispositions to overt behaviour. Both expressive displays and behavioural dispositions/reactions are supported by the autonomic nervous system, which influences the vocalisation process through respiration, phonation and articulation. According to the covariance model (Scherer et al., 1984; Scherer and Zei, 1988; Scherer, 1989), speech patterns covary with emotionally induced physiological changes in respiration, phonation and articulation. These changes affect vocalisation at three levels:
1. suprasegmental (overall pitch and energy levels and their variations as well as
timing);
2. segmental (tense/lax articulation and articulation rate);
3. intrasegmental (voice quality).
Emotions are usually characterised along two basic dimensions:
1. activation level (aroused vs. calm), which mainly refers to the physiological
arousal involved in the preparation of the organism for an appropriate reac-
tion;
2. hedonic valence (pleasant/positive vs. unpleasant/negative) which mainly refers
to the overall subjective hedonic feeling.
The precise relationship between physiological activation and vocal expression was first modelled by Williams and Stevens (1972) and has received considerable empirical support (Banse and Scherer, 1996; Scherer, 1981; Simonov et al., 1980; Williams and Stevens, 1981). The activation dimension of emotions is thus known to be mainly reflected in pitch and energy parameters such as mean F0, F0 range, general F0 variability (usually expressed either as the SD or the coefficient of variation), mean acoustic energy level, its range and its variability, as well as the rate of delivery. Compared with an emotionally unmarked (neutral) speaking style, an angry voice is typically characterised by increased values of many or all of the above parameters, while sadness is marked by a decrease in the same parameters. By contrast, the hedonic valence dimension appears to be mainly reflected in intonation patterns and in voice quality.
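As a concrete illustration of these arousal-related measures, the sketch below computes them from a pre-extracted voiced F0 contour. The function name, inputs, and the choice of Python/NumPy are ours, for illustration only; they do not reproduce the authors' procedure.

```python
import numpy as np

def f0_arousal_parameters(f0_hz, duration_s, n_syllables):
    """F0 and timing statistics of the kind linked to arousal above.

    f0_hz: voiced-frame F0 estimates in Hz (unvoiced frames removed).
    A hypothetical helper for illustration, not the authors' code.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    return {
        "f0_mean": f0.mean(),
        "f0_sd": f0.std(ddof=1),                    # general F0 variability
        "f0_cv": f0.std(ddof=1) / f0.mean(),        # coefficient of variation
        "f0_max_min_ratio": f0.max() / f0.min(),    # F0 range measure
        "delivery_rate": n_syllables / duration_s,  # syllables per second
    }
```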
While voice patterns related to emotions have the status of symptoms (i.e. signals emitted involuntarily), those influenced by socio-cultural and linguistic conventions have the status of a consciously controlled speaking style. Vocal output is therefore seen as the result of two forces: the speaker's physiological state and socio-cultural linguistic constraints (Scherer and Kappas, 1988).
As the physiological state exerts a direct causal influence on vocal behaviour, the model based on scalar covariance of continuous acoustic variables appears to have high cross-language validity. By contrast, the configuration model remains restricted to specific socio-linguistic contexts, as it is based on configurations of categorical variables (like pitch 'fall' or pitch 'rise') combined with linguistic choices. From the listener's point of view, naturalness of speech will thus depend upon a blend of acoustic indicators related, on the one hand, to emotional arousal and, on the other, to culturally shared vocal stereotypes and/or prototypes characteristic of a social group and its status.
Intra- and Inter-Emotion Patterning of Acoustic Parameters
Subjects and Procedure
Seventy-two French-speaking subjects' voices were used. Emotional states were induced through verbal recall of the subjects' own emotional experiences of joy,
sadness and anger (Mendolia and Kleck, 1993). At the end of each recall, the subjects said a standard sentence in the emotion-congruent tone of voice. The sentence was: 'Alors, tu acceptes cette affaire' ('So, you accept the deal'). Voices were digitally recorded, with mouth-to-microphone distance kept constant.
The success of emotion induction and the degree of emotional arousal experienced during the recall and the saying of the sentence were assessed through self-report. The voices of the 66 subjects who reported having felt emotional arousal while saying the sentence were taken into account (30 male and 36 female). Computerised analyses of the subjects' voices were performed with Signalyze, a Macintosh-based software package (Keller, 1994), which provided measurements of a number of vocal parameters related to emotional arousal (Banse and Scherer, 1996; Scherer, 1989). The following vocal parameters were used for the statistical analyses: mean F0, F0 SD, F0 max/min ratio, and voiced energy range. The latter was measured between two mid-point vowel nuclei corresponding to the lowest and the highest peak in the energy envelope and expressed in pseudo dB units (Zei and Archinard, 1998). The rate of delivery was expressed as the number of syllables uttered per second. Long-term average spectra were also computed.
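A minimal sketch of the two derived measures just described. It assumes that 'pseudo dB' denotes a 20·log10 amplitude ratio between the nucleus peaks; that reading is ours (the measure itself is defined in Zei and Archinard, 1998), and the names are hypothetical.

```python
import numpy as np

def voiced_energy_range(nucleus_peaks):
    """Range between the lowest and highest vowel-nucleus energy peaks.

    nucleus_peaks: peak amplitudes of the mid-point vowel nuclei taken
    from the energy envelope. The 20*log10 ratio is our reading of the
    'pseudo dB' unit, assumed for illustration.
    """
    peaks = np.asarray(nucleus_peaks, dtype=float)
    return 20.0 * np.log10(peaks.max() / peaks.min())

def delivery_rate(n_syllables, duration_s):
    """Rate of delivery as the number of syllables uttered per second."""
    return n_syllables / duration_s
```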
Results for Intra-Emotion Patterning
ANOVA revealed significant differences between male and female subjects. The differences concerned only pitch-related parameters: there was no significant gender difference for either voiced energy range or rate of delivery, for which male and female subjects had similar distributions of values. Table 23.1 presents the F0 parameters affected by speakers' gender, together with the ANOVA results.
Table 23.1 F0 parameters affected by speakers' gender

Emotions   F0 mean (Hz)    ANOVA                 F0 max/min ratio   ANOVA              F0 SD            ANOVA
anger      M 128; F 228    F(1, 64) = 84.6***    M 2.0; F 1.8       F(1, 64) = 5.6*    M 21.2; F 33.8   F(1, 64) = 11.0**
joy        M 126; F 236    F(1, 64) = 116.8***   M 1.9; F 1.9       F(1, 64) = .13     M 22.6; F 36.9   F(1, 64) = 14.5***
sadness    M 104; F 201    F(1, 64) = 267.4***   M 1.6; F 1.5       F(1, 64) = .96     M 10.2; F 19.0   F(1, 64) = 39.6***

Note: N = 66. *p < .05, **p < .01, ***p < .001; M = male; F = female.
As gender is both a sociological variable (related to social category and cultural status) and a physiological variable (related to the anatomy of the vocal tract), we assessed the relation between mean F0 and the other vocal parameters. This was done by computing partial correlations between mean F0 and the other vocal parameters, with sex of speaker partialled out. The results show that subjects with higher F0 also have a higher F0 range (expressed as the max/min ratio) across all emotions. In anger, subjects with higher F0 also exhibit higher pitch variability (expressed as F0 SD) and a faster delivery rate. In sadness, the F0 level is negatively correlated with voiced energy range. Table 23.2 presents the results.

Table 23.2 Partial correlation coefficients between mean F0 and other vocal parameters, with speaker's gender partialled out

                     F0 max/min ratio   F0 SD    Voiced energy range (pseudo dB)   Delivery rate
mean F0 in anger     .43**              .77**     .03                              .39**
mean F0 in joy       .36**              .66**     .08                              .16
mean F0 in sadness   .32**              .56**    −.43**                            .13

Note: N = 66. *p < .05, **p < .01, ***p < .001; all significance levels are 2-tailed.
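The partialling can be reproduced with a standard residual-based partial correlation, sketched below; sex is coded 0/1, and the code is a generic illustration rather than the software actually used in the study.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, control):
    """Correlation between x and y with `control` partialled out.

    Regresses x and y on the control variable (e.g., sex coded 0/1)
    and correlates the residuals; returns (r, two-tailed p).
    """
    x, y, c = (np.asarray(v, dtype=float) for v in (x, y, control))
    design = np.column_stack([np.ones_like(c), c])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return stats.pearsonr(res_x, res_y)
```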
Results for Inter-Emotion Patterning
The inter-emotion comparison of vocal data was performed separately for male and female subjects. A paired-samples t-test was applied, the pairs consisting of the same acoustic parameter measured for two emotions. The results presented in Tables 23.3 and 23.4 show significant differences mainly for emotions that differ in their level of physiological activation: anger vs. sadness, and joy vs. sadness. We thus concluded that F0-related parameters, voiced energy range, and the rate of delivery mainly contribute to the differentiation of emotions along the dimension of physiological arousal.

Table 23.3 Acoustic differentiation of emotions in male speakers

Emotions   F0 mean   t-test   F0 max/min   t-test   F0 SD   t-test   Voiced energy range   t-test   Delivery   t-test
compared   (Hz)      and p    ratio        and p            and p    (pseudo dB)           and p    rate       and p
sadness    104                1.6                   10.2             9.6                            3.9
anger      128       4.3***   2.0          6.0***   21.2    5.7***   14.2                  5.0***   4.6        2.2*
sadness    104                1.6                   10.2             9.6                            3.9
joy        126       4.6***   1.9          6.0***   22.7    7.5***   12.1                  2.5*     4.5        2.9**
joy        126                1.9                   22.7             12.0                           4.5
anger      128       .4       2.0          .9       21.2    .8       14.2                  2.8**    4.6        .2

Note: N = 30. *p < .05, **p < .01, ***p < .001; all significance levels are 2-tailed.

Table 23.4 Acoustic differentiation of emotions in female speakers

Emotions   F0 mean   t-test   F0 max/min   t-test   F0 SD   t-test   Voiced energy range   t-test   Delivery   t-test
compared   (Hz)      and p    ratio        and p            and p    (pseudo dB)           and p    rate       and p
sadness    201                1.5                   19.0             10.9                           4.2
anger      228       2.7**    1.8          3.4**    33.8    4.8***   14.2                  2.9**    5.0        3.7**
sadness    201                1.5                   19.0             10.9                           4.2
joy        236       3.7**    1.9          5.7***   37.0    6.1***   12.8                  2.2*     5.0        3.3**
joy        236                1.9                   37.0             12.8                           5.0
anger      228       .8       1.8          1.6      33.8    1.0      14.2                  1.0      5.0        .1

Note: N = 36. *p < .05, **p < .01, ***p < .001; all significance levels are 2-tailed.
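The comparison amounts to a within-subject test of one parameter across two emotions; a minimal sketch with SciPy follows (the values shown are placeholders, not the study's data).

```python
from scipy import stats

# Mean F0 (Hz) for the same speakers in two emotions -- placeholder values.
f0_anger   = [128, 135, 119, 142, 126]
f0_sadness = [104, 110,  98, 121, 101]

t, p = stats.ttest_rel(f0_anger, f0_sadness)  # paired, two-tailed
print(f"t = {t:.2f}, p = {p:.4f}")
```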
In order to find vocal indicators of emotional valence, we compared voice quality parameters for anger (a negative emotion with a high level of physiological arousal) with those for joy (a positive emotion with a high level of physiological arousal). This comparison was inspired by studies measuring the vocal differentiation of hedonic valence in spectral analyses of astronauts' voices (Popov et al., 1971; Simonov et al., 1980). We thus hypothesised that spectral parameters could significantly differentiate between the positive and negative valence of emotions which have similar levels of physiological activation. To this end, long-term average spectra (LTAS) were computed for each voice sample, yielding 128 data points over a range of 40–5 500 Hz.
We used a Bark-based strategy of spectral data analysis, where perceptually equal intervals of pitch are represented as equal distances on the scale. The frequencies covered by the 1.5 Bark intervals were the following: 40–161 Hz; 161–297 Hz;
297–453 Hz; 453–631 Hz; 631–838 Hz; 838–1 081 Hz; 1 081–1 370 Hz; 1 370–1 720 Hz; 1 720–2 152 Hz; 2 152–2 700 Hz; 2 700–3 400 Hz; 3 400–4 370 Hz; 4 370–5 500 Hz (Hassal and Zaveri, 1979; Pittam and Gallois, 1986; Pittam, 1987). Subsequently, the mean energy value for each band was computed. We thus obtained 13 spectral energy values per emotion and per subject.
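A sketch of this pipeline is given below: it averages short-time power spectra into an LTAS and then averages the dB values inside each of the 13 published bands. The FFT size, hop, windowing, and dB convention are our assumptions; the original measurements were made with Signalyze.

```python
import numpy as np

# The 13 intervals of 1.5 Bark listed in the text (Hz).
BARK_BANDS = [(40, 161), (161, 297), (297, 453), (453, 631), (631, 838),
              (838, 1081), (1081, 1370), (1370, 1720), (1720, 2152),
              (2152, 2700), (2700, 3400), (3400, 4370), (4370, 5500)]

def ltas_band_energies(signal, sr, n_fft=2048, hop=512):
    """Long-term average spectrum, reduced to 13 mean band energies."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    power = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    ltas_db = 10.0 * np.log10(power + 1e-12)   # relative (pseudo) dB scale
    return [ltas_db[(freqs >= lo) & (freqs < hi)].mean()
            for lo, hi in BARK_BANDS]
```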
Paired t-tests were applied, the pairs consisting of the same acoustic parameter (the value for the same frequency interval) compared across two emotions. The results showed that several frequency bands contributed significantly to the differentiation between anger and joy, thus confirming the hypothesis that the valence dimension of emotions can be reflected in the long-term average spectrum.
The results show that in a large portion of the spectrum, energy is higher in anger than in joy. In male subjects it is significantly higher from roughly 300 Hz up to 3 400 Hz, while in female subjects the spectral energy is higher in anger than in joy in the 800–3 400 Hz range. Thus our analysis of LTAS curves, based on 1.5 Bark intervals, shows that the overall difference in energy is not the consequence of major differences in the distribution of energy across the spectrum for anger and joy. This fact may lend itself to two interpretations: (1) those aspects of voice quality which are measured by spectral distribution are not relevant for the distinction between positive and negative valence of high-arousal emotions; or (2) anger and joy also differ in their level of arousal, which is reflected in spectral energy (both voiced and voiceless). Table 23.5 presents the details of the results for the Bark-based strategy of the LTAS analysis.

Table 23.5 Spectral differentiation between anger and joy utterances in 1.5 Bark frequency intervals

                       Male subjects                   Female subjects
Frequency bands (Hz)   Spectral energy     t-test      Spectral energy     t-test
                       (pseudo dB)         and p       (pseudo dB)         and p
40–161                 A 18.6; J 17.6      .69         A 12.2; J 13.8      1.2
161–297                A 23.5; J 20.8      2.0         A 19.1; J 18.9      .12
297–453                A 26.7; J 22.0      3.1*        A 21.9; J 20.8      .62
453–631                A 30.9; J 24.3      3.4**       A 24.2; J 21.3      1.5
631–838                A 28.5; J 21.0      4.4**       A 23.6; J 19.3      2.2
838–1 081              A 21.1; J 15.8      3.8**       A 19.4; J 14.7      2.6*
1 081–1 370            A 19.6; J 14.8      3.6**       A 16.9; J 12.6      2.9*
1 370–1 720            A 22.5; J 17.0      3.7**       A 17.5; J 12.9      3.3**
1 720–2 152            A 20.7; J 14.6      3.8**       A 19.7; J 16.1      2.5*
2 152–2 700            A 18.7; J 13.0      3.7**       A 15.2; J 12.4      2.4*
2 700–3 400            A 13.3; J 10.1      2.9*        A 14.7; J 11.3      2.7*
3 400–4 370            A 10.6; J 4.1       2.5         A 8.8; J 3.9        1.7
4 370–5 500            A 1.9; J .60        1.2         A 1.3; J .5         1.9

Note: N = 20. *p < .05, **p < .01, ***p < .001; A = anger; J = joy; all significance levels are 2-tailed.
Although we assumed that vocal signalling of emotion can function independently of the semantic and affective information inherent in the text (Banse and Scherer, 1996; Scherer, Ladd, and Silverman, 1984), the generally positive connotations of
the words 'accept' and 'deal' sometimes disturbed the subjects' ease of saying the sentence in an angry tone. Such cases were not taken into account in the statistical analyses. This fact, however, points to the influence of semantic content on vocal emotional expression. Most of the subjects reported that emotionally congruent semantic content could considerably help them produce the appropriate tone of voice. The authors also repeatedly noticed that in the subjects' spontaneous verbal expression, emotion words were usually said in an emotionally congruent tone.
Conclusion
In spite of remarkable individual differences in vocal tract configurations, vocal expression of emotions appears to exhibit similar patterning of vocal parameters across speakers. The similarities may be partly due to physiological factors and partly to contextually driven vocal adaptations governed by stereotypical representations of emotional voice patterns. Future research in this domain may further clarify the influence of cultural and socio-linguistic factors on intra-subject patterning of vocal parameters.
Acknowledgements
The authors thank Jacques Terken, Technische Universiteit Eindhoven, The Netherlands, for his constructive critical remarks. This work was carried out in the framework of COST 258.
References
Banse, R. and Scherer, K.R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614–636.
Hassal, J.H. and Zaveri, K. (1979). Acoustic Noise Measurements. Brüel and Kjær.
Keller, E. (1994). Signal Analysis for Speech and Sound. InfoSignal.
Mendolia, M. and Kleck, R.E. (1993). Effects of talking about a stressful event on arousal: Does what we talk about make a difference? Journal of Personality and Social Psychology, 64, 283–292.
Pittam, J. (1987). Discrimination of five voice qualities and prediction of perceptual ratings. Phonetica, 44, 38–49.
Pittam, J. and Gallois, C. (1986). Predicting impressions of speakers from voice quality: acoustic and perceptual measures. Journal of Language and Social Psychology, 5, 233–247.
Popov, V.A., Simonov, P.V., Frolov, M.V. et al. (1971). Frequency spectrum of speech as a criterion of the degree and nature of emotional stress (Dept. of Commerce, JPRS 52698). Zh. Vyssh. Nerv. Deiat. (Journal of Higher Nervous Activity), 1, 104–109.
Scherer, K.R. (1981). Vocal indicators of stress. In J. Darby (ed.), Speech Evaluation in Psychiatry (pp. 171–187). Grune and Stratton.
Scherer, K.R. (1989). Vocal correlates of emotional arousal and affective disturbance. In Handbook of Social Psychophysiology (pp. 165–197). Wiley.
Scherer, K.R. (1992). On social representations of emotional experience: Stereotypes, prototypes, or archetypes? In M.V.H. Cranach, W. Doise, and G. Mugny (eds), Social Representations and the Social Bases of Knowledge (pp. 30–36). Huber.
Scherer, K.R. (1993). Neuroscience projections to current debates in emotion psychology. Cognition and Emotion, 7, 1–41.
Scherer, K.R. and Kappas, A. (1988). Primate vocal expression of affective state. In D. Todt, P. Goedeking, and D. Symmes (eds), Primate Vocal Communication (pp. 171–194). Springer-Verlag.
Scherer, K.R., Ladd, D.R., and Silverman, K.E.A. (1984). Vocal cues to speaker affect: Testing two models. Journal of the Acoustical Society of America, 76, 1346–1356.
Scherer, K.R. and Zei, B. (1988). Vocal indicators of affective disorders. Psychotherapy and Psychosomatics, 49, 179–186.
Simonov, P.V., Frolov, M.V., and Ivanov, E.A. (1980). Psychophysiological monitoring of operator's emotional stress in aviation and astronautics. Aviation, Space, and Environmental Medicine, January 1980, 46–49.
Williams, C.E. and Stevens, K.N. (1972). Emotion and speech: Some acoustical correlates. Journal of the Acoustical Society of America, 52, 1238–1250.
Williams, C.E. and Stevens, K.N. (1981). Vocal correlates of emotional states. In J.K. Darby (ed.), Speech Evaluation in Psychiatry (pp. 221–240). Grune and Stratton.
Zei, B. and Archinard, M. (1998). La variabilité du rythme cardiaque et la différentiation prosodique des émotions. Actes des XXIIèmes Journées d'Études sur la Parole (pp. 167–170). Martigny.