ABSTRACT

Speech and song are universal forms of vocal expression that reflect distinct channels of communication. While these two forms of expression share a common means of sound production, differences in the acoustic properties of speech and song have not received significant attention. Here, we present evidence of acoustic differences in the speaking and singing voice. Twenty-four actors were recorded while speaking and singing different statements with five emotions, two emotional intensities, and two repetitions. Acoustic differences of speech and song were reported in several acoustic parameters, including vocal loudness, spectral properties, and vocal quality. Interestingly, emotion was conveyed similarly in many acoustic features across speech and song. These results demonstrate the entwined nature of speech and song, and provide evidence in support of the shared emergence of speech and song as a form of early proto-language. These recordings form part of our new Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) that will be freely released in 2013.
INTRODUCTION
Across cultures and throughout history, speech and song have often been regarded as a dyadic form of vocal
expression (Nettl, 2000; 2005). In Persia, vocal music was understood by the term khāndan, referring to an activity
involving reciting, reading, and singing (Nettl, 2005). For members of the Blackfoot tribe, saapup entailed a display
of singing, dancing, and ceremonial chanting (Nettl, 2000). Among the southern Nguni people, ngoma refers to
singing, divining, and the designation of people who engage in these activities (Janzen, 1992), while for the ancient
Greeks, the acts of singing and speaking were described interchangeably and did not exist as the distinct forms we
know today (Stamou, 2002). Evolutionary theorists have proposed that speech and song may once have existed
as a coupled means of vocal communication, a central goal of which was the expression of emotion (Darwin, 1871;
Brown, 2000; Mithen, 2005). Similarly, speech and song have long been considered to share a common ‘acoustic
code’ in the expression of emotion (Spencer, 1875; Scherer, 1995). In a major review of the subject, Juslin and
Laukka (2003) concluded that music performance, under which singing was classified, shared many of the same
acoustic features as speech in the expression of emotion. However, to the authors’ knowledge, despite their long
historical and academic association, there have been no direct comparative analyses of the acoustic properties of
emotional speech and song. In this paper we report preliminary findings on the acoustic commonalities of emotion
in matched productions of speech and song.
Acoustic analyses were run on six vocalists taken from the Ryerson Audio-Visual Database of Emotional Speech
and Song (RAVDESS) (Livingstone et al., 2012). The RAVDESS, which is being prepared for public release,
contains recordings of 24 professional actors speaking and singing matched statements with a large range of emotions,
each with two emotional intensities. The RAVDESS contains over 7000 files (audio-only, video-only, full audio-
video in 720p), and will be released with perceptual validation data, acoustic analyses, and facial motion analyses.
The purpose for creating the RAVDESS was to provide researchers with an open-access repository of high-quality,
audio-visual recordings of speech and song in North American English. Perceptual accuracy of the acoustic
recordings used in the present analysis was confirmed in a separate pilot experiment. Based on previous reviews of
the acoustic cues of emotion in speech (Cowie et al., 2001; Juslin and Laukka, 2001) and song (Sundberg, 1998), we
hypothesized that the two would exhibit similar patterns of change in fundamental frequency, vocal intensity,
utterance duration, first formant frequency, and spectral energy distribution.
METHOD
Participants
Six highly trained actors (mean age = 25.0 years, SD = 4.04; 3 female) were recruited from the Toronto community.
Participants were native English speakers with a North American accent, had at least six years of acting
experience (M = 10.17, SD = 3.72), and had varied amounts of singing experience (M = 6.83, SD = 2.85).
Stimulus and Materials
Two neutral English statements were used (“Kids are talking by the door”, “Dogs are sitting by the door”).
Statements were seven syllables in length and were matched in word frequency and familiarity using the MRC
psycholinguistic database (Coltheart, 1981). Statements were chosen to enable a matched production in speech and
song. In the song condition, three isochronous melodies were used; one each for emotionally neutral (F4, F4, G4,
G4, F4, E4, F4), positively valenced (F4, F4, A4, A4, F4, E4, F4), and negatively valenced (F4, F4, Ab4, Ab4, F4,
E4, F4) emotions. The neutral melody did not contain the 3rd scale degree, while the positive and negative melodies
were in the major and minor mode respectively (Kastner and Crowder, 1990). Stimuli were presented visually on a
15” MacBook Pro running Windows XP SP3 and Matlab 2009b, and auditorily over KRK Rokit 5 speakers,
controlled by Matlab and the Psychophysics Toolbox (3.0.8 SVN 1648, Brainard, 1997). Vocal utterances were
captured with an AKG C414 B-XLS cardioid microphone with a pop filter, positioned 30 cm from the actor, and
recorded to a Mac Pro computer running Pro Tools at 48 kHz. Recordings were edited in Adobe Premiere 6. To avoid perceptual
confusion between the three melodies, song trials were pitch corrected using Melodyne to ensure the mean
fundamental frequency of each note did not exceed ± 35 cents of the notated melody (Vurma and Ross, 2006).
Occasional pops were removed with a high-pass filter (100 Hz) in Adobe Audition. Vocal intensity was peak-normalized
within each actor to retain intensity variability across their emotions. The perceptual validity of the recordings was
tested in a pilot experiment with 8 raters. An average accuracy of 70.9% was recorded across the analyzed emotions.
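To make the pitch-correction criterion and the normalization step concrete, the sketch below illustrates how the ±35 cent check and the within-actor peak normalization could be implemented. It is an illustrative reconstruction only, not the authors' processing script; the reference frequencies assume equal temperament with A4 = 440 Hz, and note-level mean F0 values are assumed to have been extracted beforehand (e.g., in Praat or Melodyne).

```python
# Illustrative sketch (not the authors' script) of the +/-35 cent pitch check
# and the within-actor peak normalization described above.
import numpy as np

# Equal-tempered reference frequencies (A4 = 440 Hz) for the melody notes used.
NOTE_HZ = {"E4": 329.63, "F4": 349.23, "G4": 392.00, "Ab4": 415.30, "A4": 440.00}

def cents_from_target(mean_f0_hz, target_note):
    """Signed deviation of a sung note's mean F0 from its notated pitch, in cents."""
    return 1200.0 * np.log2(mean_f0_hz / NOTE_HZ[target_note])

def within_tolerance(mean_f0_hz, target_note, tol_cents=35.0):
    """True if the note meets the +/-35 cent criterion applied to the song trials."""
    return abs(cents_from_target(mean_f0_hz, target_note)) <= tol_cents

def peak_normalize(recordings, peak=0.99):
    """Peak-normalize all recordings from one actor jointly, so that relative
    intensity differences across that actor's emotions are preserved."""
    global_peak = max(np.max(np.abs(x)) for x in recordings)
    return [x * (peak / global_peak) for x in recordings]

# Example: a sung F4 with a mean F0 of 352 Hz deviates by about +13.7 cents,
# which falls within the +/-35 cent tolerance.
print(round(cents_from_target(352.0, "F4"), 1), within_tolerance(352.0, "F4"))
```

Normalizing to the loudest trial within each actor, rather than file by file, is what retains the intensity variability across that actor's emotions noted above.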
Design, Procedure, and Analysis
The experimental design was a 2 (Domain: speech, song) × 11 (Emotion: neutral, calm, very calm, happy, very
happy, sad, very sad, angry, very angry, fearful, very fearful) × 2 (Statement: Kids, Dogs) × 2 (Repetition: 1, 2)
within-subjects design, with 88 trials per participant¹. Trials were blocked by Domain, with speech presented first
to reduce any temporal influences from the regularity of the song condition. Within each Domain, trials were further
blocked by Emotion to reduce fatigue, and the order of Emotion blocks was counter-balanced across participants. A dialogue script was
used when working with the actors. Each emotion was described, along with a vignette depicting a scenario
involving that emotion. It was emphasized that actors were to produce realistic expressions of emotion, and that they
were to prepare themselves physiologically using method acting to induce the desired emotion prior to recording. In
the singing condition, actors were told to sing the basic notated pitches, but they were free to vary other acoustic
characteristics in order to convey the desired emotion.
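For clarity, the factorial structure described above can be enumerated directly. The sketch below simply lists the 88 trial combinations per participant; the labels are taken from the design description, and the blocking and counterbalancing logic is omitted.

```python
# Enumerating the 2 x 11 x 2 x 2 within-subjects design (88 trials per participant).
from itertools import product

domains = ["speech", "song"]
emotions = ["neutral", "calm", "very calm", "happy", "very happy",
            "sad", "very sad", "angry", "very angry", "fearful", "very fearful"]
statements = ["Kids are talking by the door", "Dogs are sitting by the door"]
repetitions = [1, 2]

trials = list(product(domains, emotions, statements, repetitions))
assert len(trials) == 88  # 2 * 11 * 2 * 2
```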
Acoustic recordings were analyzed with Praat (Boersma and Weenink, 2013). Vocal duration, fundamental
frequency (floor and range), vocal intensity, first formant frequency (F1, mean), and HF500 (the ratio of energy
above 500 Hz to energy below 500 Hz; Juslin and Laukka, 2001) were extracted. For this preliminary analysis,
measures were computed across the entire utterance. Statistical analyses were collapsed across actor, statement, and
repetition.
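Feature extraction in the study was carried out in Praat; the sketch below only illustrates how the HF500 measure and the utterance-level F0 floor and range could be computed. The spectral estimator, the treatment of unvoiced frames, and the use of the minimum F0 as the floor are assumptions made for illustration, not the study's exact settings.

```python
# Illustrative computation of HF500 and F0 summary measures
# (not the Praat scripts used in the study).
import numpy as np
from scipy.signal import welch

def hf500(signal, sr, cutoff_hz=500.0):
    """Ratio of spectral energy above the cutoff to energy below it,
    estimated from a Welch power spectral density."""
    freqs, psd = welch(signal, fs=sr, nperseg=2048)
    low = np.trapz(psd[freqs < cutoff_hz], freqs[freqs < cutoff_hz])
    high = np.trapz(psd[freqs >= cutoff_hz], freqs[freqs >= cutoff_hz])
    return high / low

def f0_floor_and_range(f0_track_hz):
    """Floor (minimum) and range (max - min) of an utterance's F0 track, in Hz.
    Unvoiced frames are assumed to be coded as 0 and are excluded."""
    voiced = f0_track_hz[f0_track_hz > 0]
    return float(voiced.min()), float(voiced.max() - voiced.min())
```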
RESULTS
Separate two-way analyses of variance by Domain and Emotion were conducted on Duration, Fundamental
Frequency (floor and range), Vocal Intensity, F1, and HF500, as listed in Table 1.
TABLE 1. Summary of results from the analyses of variance of vocal features, showing F values for the main effects
of Domain (speech, song) and Emotion (neutral, calm, very calm, happy, very happy, sad, very sad, angry, very angry,
fearful, very fearful), and their interaction (D × E). * p < .01, ** p < .001, otherwise p < .05, Dunn-Bonferroni corrected.
Feature                        Domain      Emotion     D × E
Duration                       124.73**    5.87**      6.16**
Fundamental freq. (Floor)      31.10       6.52**      3.72*
Fundamental freq. (Range)      n.s.        5.48**      6.38**
Vocal intensity (Mean)         132.70**    24.80**     10.98**
First formant freq. (Mean)     n.s.        14.48**     2.91
HF500 (Mean)                   n.s.        12.70**     n.s.
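As a rough illustration of the analyses summarized in Table 1, the sketch below fits one 2 (Domain) × 11 (Emotion) model for utterance duration. The data file and column names are hypothetical, and an ordinary least-squares model is shown only for simplicity; the study's within-subjects design would more properly call for a repeated-measures approach.

```python
# Hypothetical illustration of a 2 (Domain) x 11 (Emotion) ANOVA on duration.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed file and columns (actor, domain, emotion, duration, ...) for illustration.
df = pd.read_csv("utterance_features.csv")

model = smf.ols("duration ~ C(domain) * C(emotion)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # F-tests for Domain, Emotion, and their interaction
print(anova_table)
```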
The main effect of Domain (speech, song) was significant for three of the six analyzed features (duration, F0
floor, vocal intensity), with singing showing a longer duration (2.69 s vs 1.69 s), a higher pitch floor (202.56 Hz vs
147.72 Hz)², and a louder vocal intensity (50.45 dB vs 46.0 dB) than speech. Interestingly, the main
effect of Emotion was significant for all six analyzed features. Across speech and song, the low arousal emotions
calm and sad exhibited longer durations than the high arousal emotions happy, angry, and fearful. Similarly, the low
arousal emotions neutral, calm, and sad (but not very sad) exhibited a smaller fundamental frequency range than the
high arousal emotions happy, angry, and fearful. This pattern was again reflected in vocal intensity, displayed
in Figures 1 and 2, with high arousal emotions generally exhibiting greater vocal intensity than low arousal
emotions. First formant frequency and HF500 were also elevated for the happy, angry, and fearful emotions. Significant
interactions of Domain and Emotion were reported for five of the six analyzed features. Overall, the interactions did
not alter the general pattern of main effects. Pronounced differences were shown in fundamental frequency (floor and
range), though this was expected, as vocalists were required to sing the prescribed melody pitches.
¹ Four additional emotions in speech (surprised, very surprised, disgusted, very disgusted) were not included in the analyses as the singing
condition did not contain these emotions. It was felt that song could not adequately express these emotions.
² Future analyses will consider the effect of gender, which is important for song.
FIGURE 1. Mean vocal intensity across the 11 emotions for speech (a), and song (b). The figure illustrates the two main effects
of Domain and Emotion, with song being louder overall than speech, and with speech emotions appearing to show greater
variability in intensity than song.
CONCLUSION
Speech and song have long been considered an entwined form of vocal expression. Despite their long
association, there have until now been no direct comparisons of the acoustic similarities of speech and song in their
expression of emotion. In this paper we presented preliminary data from the Ryerson Audio-Visual Database of
Emotional Speech and Song (RAVDESS). We showed that speech and song shared many of the same acoustic
features in their expression of emotion, while also exhibiting differences that distinguish speech from song.
Collectively, these data support the notion that speech and song may once have emerged from a common vocal
origin.
ACKNOWLEDGMENTS
This research was supported through grants from AIRS (Advancing Interdisciplinary Research in Singing), a
Major Collaborative Research Initiative of the Social Sciences and Humanities Research Council of Canada,
awarded to the first and third authors, and by a Discovery grant from the Natural Sciences and Engineering Research
Council of Canada awarded to the third author. The authors thank Gabe Nespoli and Alex Andrews for their
assistance.
REFERENCES
Boersma, P., and Weenink, D. (2013). "Praat: doing phonetics by computer."
Brainard, D. H. (1997). "The psychophysics toolbox," Spatial Vision 10, 433-436.
Brown, S. (2000). "The 'musilanguage' model of musical evolution," in The origins of music, edited by N. L. Wallin, B. Merker,
and S. Brown (The MIT Press, Cambridge, MA), pp. 271-300.
Coltheart, M. (1981). "The MRC psycholinguistic database," The Quarterly Journal of Experimental Psychology Section A 33,
497-505.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., and Taylor, J. G. (2001). "Emotion
recognition in human-computer interaction," IEEE Signal Processing Magazine 18, 32-80.
Darwin, C. (1871). The descent of man and selection in relation to sex (John Murray, London).
Janzen, J. M. (1992). Ngoma: discourses of healing in Central and Southern Africa (University of California Press).
Juslin, P. N., and Laukka, P. (2001). "Impact of intended emotion intensity on cue utilization and decoding accuracy in vocal
expression of emotion," Emotion 1, 381.
Juslin, P. N., and Laukka, P. (2003). "Communication of emotions in vocal expression and music performance: Different
channels, same code?," Psychological Bulletin 129, 770.
Kastner, M. P., and Crowder, R. G. (1990). "Perception of the major/minor distinction: IV. Emotional connotations in young
children," Music Perception, 189-201.
Livingstone, S. R., Peck, K., and Russo, F. A. (2012). "RAVDESS: The Ryerson Audio-Visual Database of Emotional Speech
and Song," in 22nd Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science (CSBBCS)
(Kingston, ON).
Mithen, S. J. (2005). The singing Neanderthals: The origins of music, language, mind, and body (Harvard University Press).
Nettl, B. (2000). "An ethnomusicologist contemplates universals in musical sound and musical culture," in The origins of music,
edited by N. L. Wallin, B. Merker, and S. Brown (The MIT Press, Cambridge, MA), pp. 463-472.
Nettl, B. (2005). The study of ethnomusicology: thirty-one issues and concepts (University of Illinois Press).
Scherer, K. R. (1995). "Expression of emotion in voice and music," Journal of Voice 9, 235-248.
Spencer, H. (1875). "The origin and function of music," in Fraser's Magazine, pp. 396-408.
Stamou, L. (2002). "Plato and Aristotle on music and music education: Lessons from ancient Greece," International Journal of
Music Education 39, 3-16.
Sundberg, J. (1998). "Expressivity in singing. A review of some recent investigations," Logopedics Phoniatrics Vocology 23, 121-
127.
Vurma, A., and Ross, J. (2006). "Production and perception of musical intervals," Music Perception 23, 331-344.