
Singing ability is related to vocal emotion recognition: Evidence for shared sensorimotor processing across speech and music


Abstract

The ability to recognize emotion in speech is a critical skill for social communication. Motivated by previous work that has shown that vocal emotion recognition accuracy varies by musical ability, the current study addressed this relationship using a behavioral measure of musical ability (i.e., singing) that relies on the same effector system used for vocal prosody production. In the current study, participants completed a musical production task that involved singing four-note novel melodies. To measure pitch perception, we used a simple pitch discrimination task in which participants indicated whether a target pitch was higher or lower than a comparison pitch. We also used self-report measures to address language and musical background. We report that singing ability, but not self-reported musical experience nor pitch discrimination ability, was a unique predictor of vocal emotion recognition accuracy. These results support a relationship between processes involved in vocal production and vocal perception, and suggest that sensorimotor processing of the vocal system is recruited for processing vocal prosody.
Emma B. Greenspon & Victor Montanaro
Accepted: 3 November 2022
© The Psychonomic Society, Inc. 2022
Keywords Singing accuracy · Music ability · Emotion recognition · Prosody
Human language relies on emotional cues that are defined by a
number of non-verbal acoustic features, including pitch, tim-
bre, tempo, loudness, and duration (Coutinho & Dibben,
2013). Prosodic features such as fluctuations in vocal pitch
and loudness have been linked to physiological responses as-
sociated with the emotion that is being expressed in both
speech and music (Juslin & Laukka, 2003; Scherer, 2009).
According to arousal-based and multi-component theories of
emotion, these physiological changes underlie emotion ap-
praisal (James, 1884; Scherer, 2009), and, therefore, physio-
logical arousal may reflect one possible pathway by which
vocal cues can convey information to a listener about a
speaker's internal state. Furthermore, during in-person interactions,
vocal cues are closely coupled with changes in facial
behavior (Yehia et al., 1998), reflecting the dynamic and mul-
timodal nature of emotion cues during conversation.
Relatedly, automatic mimicry of facial gestures occurs when
processing emotional speech and singing (Livingstone et al.,
2009; Stel & van Knippenberg, 2008) and has been linked to
emotion recognition (Stel & van Knippenberg, 2008).
Recognition of vocal prosody has also been shown to relate
to music background (for a review, see Nussbaum &
Schweinberger, 2021). For instance, Dmitrieva et al. (2006)
found that musically gifted children showed enhanced vocal
emotion recognition compared to age-matched non-musicians.
This effect varied by age group, with the largest difference
observed for the youngest group (7–10 years old), which
may suggest that early music experience facilitates socio-
cognitive development (Gerry et al., 2012). Fuller et al.
(2014) reported that effects of musical experience persist in
adulthood, with adult musicians exhibiting better vocal emo-
tion recognition than adult non-musicians, and this effect held
even under degraded listening conditions. In line with Fuller
et al. (2014), Lima and Castro (2011) found that musicians are
better at recognizing emotions in speech than non-musicians,
even when controlling for other variables like general cognitive
abilities and personality traits. To address the
directionality of the musician effect, Thompson et al. (2004)
and Good et al. (2017) used early music interventions with
children. Thompson and colleagues (2004) found that children
with musical training in piano, but not voice, recognized vocal
emotion more accurately than children without musical training.
Similarly, Good et al. (2017) found that children with
cochlear implants showed enhanced vocal emotion
recognition after musical training in piano compared to a control
group that received training in painting.

* Emma B. Greenspon
Department of Psychology, Monmouth University, West Long
Branch, NJ, USA
Attention, Perception, & Psychophysics

However, a role of musical experience in vocal prosody
processing has not been consistently demonstrated in previous
research. For instance, in contrast to the study by Thompson et al.
(2004), Trimmer and Cuddy (2008), who used the same battery
as Thompson et al. (2004), reported that musical training did
not account for individual differences in vocal emotion recog-
nition. In that same study, emotional intelligence, on the other
hand, was a reliable predictor of vocal emotion recognition,
but did not reliably relate to years of musical training
(Schellenberg, 2011; cf. Petrides et al., 2006). In addition,
Dibben et al. (2018) found an effect of musical training on
emotion recognition in music, but not speech.
prosody, then one could expect individuals with poor mu-
sical abilities to exhibit impairments in recognizing vocal
emotion. This claim was addressed by Thompson et al.
(2012) and Zhang et al. (2018), who found that individuals
with congenital amusia, a deficit in music processing,
exhibited lower sensitivity to vocal emotion relative to
individuals without amusia. In order to build on work
demonstrating that vocal emotion recognition varies by
musical ability, the current study was designed to address
the role of musical ability in processing vocal emotion
using a musical task (i.e., singing) that recruits a shared
effector system with speech production.
In order to sing a specific pitch with one's voice, a
singer must be able to accurately associate a perceptual
representation of the target pitch with the exact motor
plan of the vocal system that would produce that pitch.
As such, singing is a vocal behavior that reflects sensori-
motor processing. Previous work on individual differ-
ences in singing ability has found that although inaccurate
singing can exist without impaired pitch perception
(Pfordresher & Brown, 2007), pitch perception has been
shown to correlate with pitch imitation ability (Greenspon
& Pfordresher, 2019), with stronger associations observed
across singing performance and performance on percep-
tual measures that assess higher-order musical representa-
tions (Pfordresher & Nolan, 2019). Although inaccurate
singers can show impairment in matching pitch with their
voice, but not when matching pitch using a tuning instru-
ment (Demorest, 2001; Demorest & Clements, 2007;
Hutchins & Peretz, 2012; Hutchins et al., 2014), these
individuals exhibit similar vocal ranges to accurate
singers, non-random imitation performance, and have in-
telligible speech production, suggesting that these singers
express at least some degree of vocal-motor precision
(Pfordresher & Brown, 2007). While neither a purely per-
ceptual nor motoric account may be able to fully explain
individual differences in singing ability, behavioral stud-
ies measuring auditory imagery, a mental process that
recruits both perceptual and motor planning areas of the
brain (Herholz et al., 2012; Lima et al., 2016), have sup-
ported a sensorimotor account of inaccurate singing
(Greenspon et al., 2017; Greenspon et al., 2020;
Greenspon & Pfordresher, 2019; Pfordresher & Halpern, 2013).
It is important to note that the ability to accurately vary
vocal pitch is not only a critical feature in singing but also
an important dimension for communicating spoken prosody,
another vocal behavior relying on sensorimotor processing
(Aziz-Zadeh et al., 2010; Banissy et al., 2010; Pichon &
Kell, 2013). Previous neuroimaging work has established
that vocal prosody production recruits overlapping sensori-
motor speech pathways used for vocal prosody perception
(Aziz-Zadeh et al., 2010). Furthermore, disrupting these
sensorimotor pathways through transcranial magnetic stimulation
disrupts one's ability to discriminate non-verbal vocal
emotions (Banissy et al., 2010). Complementing this find-
ing, Correia et al. (2019) reported that emotion recognition
is associated with individual differences in children's
sensorimotor processing. Together, these neuroimaging results
suggest a link between vocal prosody perception and the
vocal system.
Given that both singing and spoken prosody have been
linked to individual differences in sensorimotor processing
(Aziz-Zadeh et al., 2010; Pfordresher & Brown, 2007;
Pfordresher & Mantell, 2014), it is possible that a similar
mechanism that accounts for individual differences in vocal
imitation of pitch in the context of singing may also account
for individual differences in vocal emotion, as suggested by
the Multi-Modal Imagery Association (MMIA) model
(Pfordresher et al., 2015), a general model of sensorimotor
processing based on multi-modal imagery. Such a claim is
supported by neuroimaging research that consistently demon-
strates that motor planning regions are recruited during audi-
tory imagery for both speech and music (for a review, see
Lima et al., 2016). A shared sensorimotor network for singing
and vocal emotion also aligns with predictions made by the
OPERA hypothesis in which overlapping brain networks for
music and speech are proposed to account for the facilitatory
effects of music processing on speech processing (Patel, 2011,
2014). Furthermore, behavioral studies support evidence for at
least partially shared processes involved in vocal production
of speech and song (Christiner & Reiterer, 2013, 2015;
Christiner et al., 2022), and have shown that inaccurate
imitators of pitch in speech tend to also show impairments in
imitating pitch in song (Mantell & Pfordresher, 2013; Wang et al., 2021).
In addition to studies on vocal production, behavioral re-
sults have supported the role of vocal pitch perception in
speech processing. In a study conducted by Schelinski and
von Kriegstein (2019), individuals who were better at discrim-
inating vocal pitch tended to also be better at recognizing
vocal emotion. One disorder that has been linked to deficits in
vocal emotion recognition is autism spectrum disorder (ASD;
Globerson et al., 2015; Schelinski & von Kriegstein, 2019).
Individuals with ASD have been found to exhibit impairments
in both vocal pitch perception (Schelinski & von Kriegstein,
2019) and imitation of pitch in speech and song (Jiang et al.,
2015; Wang et al., 2021), though ASD can exist with
unimpaired non-vocal pitch perception (Schelinski & von
Kriegstein, 2019). Together, this pattern of findings suggests
that emotion recognition may recruit processes involved in the
vocal system and that for those who exhibit impaired emotion
recognition, these impairments may extend to behaviors in-
volving vocal production and vocal perception.
We addressed the role of sensorimotor processing in vocal
prosody perception for the following reasons. First, physio-
logical changes that occur during felt emotion have been
shown to influence vocal expression in both speech and song
(Juslin & Laukka, 2003; Scherer, 2009), suggesting that vocal
cues can provide information about another's internal state.
Second, previous work has found that vocal pitch perception
is associated with emotion recognition ability (Schelinski &
von Kriegstein, 2019) and that impairments in emotion recog-
nition, vocal production, and vocal perception co-occur (Jiang
et al., 2015; Schelinski & von Kriegstein, 2019; Wang et al.,
2021), suggesting a possible relationship between emotion
processing and the vocal system. Third, neuroimaging work
has provided evidence that perceiving vocal prosody recruits
overlapping sensorimotor networks involved in vocal produc-
tion (Aziz-Zadeh et al., 2010; Skipper et al., 2017), and that
individual differences in these sensorimotor pathways are re-
lated to emotion recognition (Correia et al., 2019). For these
reasons, we hypothesized that singing ability would relate to
vocal emotion recognition accuracy. Spoken pseudo-
sentences were used in the vocal emotion recognition task in
order to focus on prosodic features while controlling for se-
mantic information (Pell & Kotz, 2011). We assessed singing
ability using a singing protocol that has been found to produce
comparable assessments of singing accuracy for in-person and
online settings (Honda & Pfordresher, 2022). Pitch discrimi-
nation ability was measured in order to address whether vocal
emotion recognition ability can be accounted for by lower-
level pitch processing, and self-reported musical experience
was also assessed.
Seventy-nine undergraduate students at Monmouth
University participated in the study for course credit. Four
participants were removed from this sample due to problems
related to administering the experiment and four additional
participants were removed due to poor performance levels in
at least one task that suggested that participants either did not
follow instructions in the task or exhibited a deficit in pitch
This resulted in a sample of 71 participants (57
female participants, 14 male participants) who were between
18 and 53 years of age (M = 20.10, SD = 4.48). Music experience
ranged from 0 to 18 years (M = 3.30, SD = 4.86) and 13
participants reported the voice as their primary instrument.
Eight participants reported a language other than English as
their first language, and all participants reported learning
English by the age of eight years.
Singing task
Singing accuracy was measured by participants' performances
on the pattern pitch imitation task from the Seattle Singing
Accuracy Protocol (SSAP; Demorest et al., 2015) in which
participants heard and then imitated four-note novel melodies.
Melodies comprised pitches that reflected common comfort-
able female and male vocal ranges based on unpublished data
from the SSAP database. For female participants, melodies
were centered around a single pitch (A3) that is typically com-
fortable for female singers. Melodies were presented one oc-
tave lower for male participants, with melodies centered
around A2, a pitch that is typically comfortable for male singers.
Pitch discrimination task
Participants also completed a modified non-adaptive version
of the pitch discrimination task from the SSAP (Demorest
et al., 2015), in which participants heard two pitches and de-
termined whether the second pitch was higher or lower than
the initial 500-Hz pitch. There were ten comparison pitches:
300 Hz, 350 Hz, 400 Hz, 450 Hz, 475 Hz, 525 Hz, 550 Hz,
600 Hz, 650 Hz, and 700 Hz. Each comparison pitch was
presented five times for a total of 50 trials, and trials were
presented in a random order.
Three participants were dropped from this sample due to poor recording
quality, one participant was dropped due to experimenter error, one participant
was dropped for singing in the wrong octave, two participants were dropped
due to extreme contour errors in the singing task (> 3 SD from mean), and
based on a priori exclusion criteria one participant was dropped for exhibiting
chance-level performance (chance = .5 proportion correct) in the pitch discrim-
ination task.
Four participants reported Spanish as their first language, one participant
reported both English and Spanish as their first language, and three participants
reported Chinese, Gujarati, or Urdu as their first language.
Vocal emotion recognition task
Vocal emotion recognition was measured with a selection of
12 English-like pseudo-sentence stimuli (e.g., "The rivix
jolled the silling") from Pell and Kotz (2011). Stimuli were
pre-recorded by four speakers (two male and two female
speakers). Each speaker conveyed six different emotions (neu-
trality, happiness, sadness, anger, fear, disgust) for three
pseudo-sentences for a total set of 72 stimuli (4 speakers × 3
sentences × 6 emotions). As such, there were 12 trials per
emotion type. Participants were asked to listen to each sen-
tence and identify the target emotion in a six-option forced-
choice task. Stimuli were presented in one of two pseudo-
randomized orders, ordered so that no speaker, sentence, or
emotion appeared consecutively, and no stimulus was present-
ed in the same position in both orders.
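As a rough illustration of the design arithmetic, the following Python sketch enumerates a 4 × 3 × 6 stimulus set and checks the two ordering constraints described above. The speaker and sentence labels are hypothetical placeholders (the actual items come from Pell and Kotz, 2011), treating the three sentences as slots crossed with every speaker is an assumption, and the helper function names are illustrative rather than part of the study's materials.

```python
from itertools import product

# Hypothetical labels; the study used recordings from Pell and Kotz (2011).
SPEAKERS = ["F1", "F2", "M1", "M2"]
SENTENCES = ["S1", "S2", "S3"]  # assumed: three sentence slots per speaker
EMOTIONS = ["neutrality", "happiness", "sadness", "anger", "fear", "disgust"]

# Full crossing: 4 speakers x 3 sentences x 6 emotions = 72 stimuli.
stimuli = list(product(SPEAKERS, SENTENCES, EMOTIONS))
trials_per_emotion = sum(1 for _, _, emotion in stimuli if emotion == "anger")


def no_consecutive_repeats(order):
    """True if adjacent stimuli never share a speaker, sentence, or emotion."""
    return all(
        a[k] != b[k]
        for a, b in zip(order, order[1:])
        for k in range(3)
    )


def no_shared_positions(order_a, order_b):
    """True if no stimulus occupies the same serial position in both orders."""
    return all(a != b for a, b in zip(order_a, order_b))
```

A candidate presentation order would be accepted only if `no_consecutive_repeats` holds for it and `no_shared_positions` holds for the pair of orders.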
Participants completed the experiment in a private Zoom ses-
sion with the experimenter. Once in the session, participants
received a link to the study, which was administered through
the online platform FindingFive (FindingFive Team, 2019) in
Google Chrome on the participants' own computers. Audio
was presented and recorded by participants' own headphones/
speakers and microphone, and participant recordings were
saved to the FindingFive server as a compressed (ogg) file.
Participants remained in the Zoom session with their audio
connected but their video disabled while completing the ex-
periment through FindingFive. Participants were instructed to
sit upright in a chair in order to promote good singing posture
before completing a vocal warm-up task. For the vocal warm-
up task, participants were instructed to sing a pitch that they
found comfortable singing followed by the highest pitch and
then the lowest pitch that they could sing. Participants then
completed the singing task, which involved imitating a novel
pitch sequence of four notes for six trials. These trials were
preceded by a practice trial. Following the singing task, par-
ticipants completed a pitch discrimination task, which asked
participants to determine whether a second pitch was higher or
lower than the first. Participants then completed the vocal
emotion recognition task. On each trial of this task, partici-
pants listened to a spoken sentence and identified which one
out of six emotions was being conveyed through the
sentence's prosody. Participants were then directed to fill out
a musical experience and demographics questionnaire. The
experiment took approximately 30 minutes to complete.
Data analysis
In order to analyze performance in the singing task, the com-
pressed (ogg) files were first converted to wav files using the
file converter FFmpeg (FFmpeg, 2021). Singing accuracy was
then analyzed by extracting the median f0 for each sung note
using Praat (Boersma & Weenink, 2013). For each note, the
difference between the sung f0 and target f0 was calculated. A
correct imitation was defined as a sung pitch within the range
of 50 cents above or below the target pitch. An incorrect
imitation was defined as any sung pitch outside of the target
range. Correct imitations of a sung pitch were coded as 1 and
incorrect imitations were coded as 0. Singing accuracy was
averaged within a trial and across the six trials of the singing task.
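The scoring rule can be sketched in Python as follows. This is a minimal illustration and not the authors' analysis script: the cents conversion is the standard 1200 × log2 frequency ratio, the 50-cent tolerance is treated as inclusive (an assumption), and the function names and example frequencies are hypothetical.

```python
import math


def cents_between(sung_hz, target_hz):
    """Signed deviation of the sung pitch from the target pitch, in cents."""
    return 1200.0 * math.log2(sung_hz / target_hz)


def note_is_correct(sung_hz, target_hz, tolerance_cents=50.0):
    """Code a note as 1 if it falls within the tolerance of the target, else 0."""
    return int(abs(cents_between(sung_hz, target_hz)) <= tolerance_cents)


def singing_accuracy(trials):
    """Average note accuracy within each trial, then across trials.

    trials: list of trials, each a list of (sung_hz, target_hz) pairs.
    """
    trial_means = [
        sum(note_is_correct(sung, target) for sung, target in trial) / len(trial)
        for trial in trials
    ]
    return sum(trial_means) / len(trial_means)


# Hypothetical example: two four-note trials against a 220-Hz target.
example_trials = [
    [(220.0, 220.0), (225.0, 220.0), (230.0, 220.0), (300.0, 220.0)],
    [(220.0, 220.0)] * 4,
]
example_score = singing_accuracy(example_trials)
```

In the hypothetical example, the first trial contains two notes within 50 cents of the target and two outside it, and the second trial is entirely correct, so the two trial means are averaged to give the participant's score.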
Music experience was defined based on self-reported number
of years of music experience on the participants' primary instrument.
For the pitch discrimination task, responses that correctly
identified that the comparison pitch was higher or lower than the
target pitch were coded as 1, while all other responses were
coded as 0. Due to high performance in this task, we removed
trials with large pitch changes (i.e., greater than a 200-cent dif-
ference between the target and comparison pitch) to avoid a
ceiling effect and analyzed the remaining 20 trials.
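The 200-cent criterion can be checked directly against the frequencies listed for the pitch discrimination task. The sketch below is illustrative only; the variable and function names are hypothetical, and treating the 200-cent boundary as inclusive is an assumption that does not affect the outcome for these frequencies.

```python
import math

STANDARD_HZ = 500.0
COMPARISONS_HZ = [300, 350, 400, 450, 475, 525, 550, 600, 650, 700]
REPETITIONS = 5  # each comparison pitch was presented five times


def cents_from_standard(comparison_hz, standard_hz=STANDARD_HZ):
    """Signed interval between comparison and standard pitch, in cents."""
    return 1200.0 * math.log2(comparison_hz / standard_hz)


# Keep only comparisons within 200 cents of the 500-Hz standard.
retained = [hz for hz in COMPARISONS_HZ if abs(cents_from_standard(hz)) <= 200.0]
n_trials = len(retained) * REPETITIONS
```

Under these assumptions the retained comparison pitches are 450, 475, 525, and 550 Hz, which at five presentations each gives the 20 trials reported in the text.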
In the vocal emotion recognition task, raw hit rates were cal-
culated by coding a response that correctly identified the intended
emotion as 1, while all other responses were coded as 0. We also
evaluated accuracy by calculating unbiased hit rates (Wagner,
1993), which aligns with procedures for defining unbiased emotion
recognition accuracy in Pell and Kotz (2011). For the unbiased
hit rates (Hu), a value of 0 indicated that the emotion label
was never accurately matched with the intended emotion, and a
value of 1 indicated that the emotion label was always accurately
matched with the intended emotion. We did not have hypotheses
regarding emotion-specific associations across measures; for this
reason, accuracy was then averaged across emotion types in
order to provide an overall measure of vocal emotion recogni-
tion. This was done for both raw and unbiased hit rates. Bivariate
correlations and hierarchical linear regression were conducted to
evaluate individual differences in vocal emotion recognition ac-
curacy. All proportion data were arcsine square-root transformed
for the regression analyses.
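For readers unfamiliar with the measure, Wagner (1993) defines the unbiased hit rate for each category as the squared number of correct responses divided by the product of the number of stimuli in that category and the number of times the corresponding response label was used. The Python sketch below is a minimal illustration, assuming a confusion matrix with intended emotions as rows and responses as columns; the function names, the convention of returning 0 for a label that was never used, and the two-category example counts are hypothetical.

```python
import math


def unbiased_hit_rates(confusion):
    """Unbiased hit rate per category from a square confusion matrix.

    confusion[i][j] is the number of stimuli with intended category i
    that received response label j.
    """
    n = len(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    rates = []
    for i in range(n):
        denom = row_totals[i] * col_totals[i]
        # Assumed convention: a label that was never used scores 0.
        rates.append(confusion[i][i] ** 2 / denom if denom else 0.0)
    return rates


def arcsine_sqrt(proportion):
    """Arcsine square-root transform applied to a proportion in [0, 1]."""
    return math.asin(math.sqrt(proportion))


# Hypothetical two-category example (the study used six emotion categories).
example_rates = unbiased_hit_rates([[8, 2], [4, 6]])
overall = sum(example_rates) / len(example_rates)
transformed = arcsine_sqrt(overall)
```

Averaging the per-category rates before transforming mirrors the order of operations described above, although the exact order used in the original analysis is not stated in the text.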
The current study addressed whether individual differences in
singing accuracy, pitch discrimination ability, or self-reported
musical experience could best account for variability in emo-
tion recognition of spoken pseudosentences. Bivariate corre-
lations across all measures and descriptive statistics for each
measure are presented in Table 1. Singing accuracy and pitch
discrimination accuracy were calculated as the proportion of
correct responses in each task, vocal emotion recognition accuracy
was measured as raw and unbiased hit rates, and music
experience was a self-reported measure of the number of years
participants played their primary instrument. Bivariate correlations
between predictors and recognition accuracy for different
emotion types are presented in the Appendix.

A measure of relative pitch accuracy was calculated for the singing task in
addition to our measure of absolute pitch accuracy. Relative pitch accuracy
was strongly correlated with absolute pitch accuracy (r = .83, p < .05) and
replicated the relationship between singing accuracy and emotion recognition

Given the similar pattern observed for both raw and unbi-
ased hit rates shown in Table 1, the remaining analyses focus
on unbiased hit rates to measure vocal emotion recognition
accuracy while controlling for response bias. As shown in Fig.
1, there was a significant correlation between singing accuracy
and unbiased hit rates for vocal emotion recognition such that
individuals who were more accurate at imitating pitch tended
to be better at recognizing vocal emotion than less accurate
singers. In contrast, pitch discrimination (p = .06) and self-
reported musical experience (p = .43) were not correlated with
vocal emotion recognition. In addition to an association with
vocal emotion recognition, unsurprisingly, singing accuracy
was also positively correlated with self-reported musical
experience (p < .01).
We next conducted a three-step hierarchical linear regres-
sion with singing accuracy, pitch discrimination accuracy, and
self-reported musical experience as predictor variables and
unbiased hit rates for vocal emotion recognition as the depen-
dent variable. Predictors were ordered such that theoretically
relevant predictors or predictors that have been previously
shown to relate to vocal emotion recognition (Correia et al.,
2022; Globerson et al., 2013) were entered before the hypoth-
esized predictor of primary interest (i.e., singing accuracy). As
shown in Table 2, only singing accuracy predicted emotion
recognition performance above and beyond the other predic-
tors. Alternative orderings of the predictor variables in the
model produced the same pattern of results.
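The F statistic for the change in R² at each step follows the standard formula for comparing nested regression models. The sketch below assumes ordinary least squares with an intercept; the function name and the example R² values are hypothetical and are not the study's results.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R-squared increase between nested models.

    n is the number of observations; k_reduced and k_full are the numbers
    of predictors (excluding the intercept) in the reduced and full models.
    Returns the F value with its numerator and denominator degrees of freedom.
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Hypothetical values: adding one predictor to a two-predictor model, n = 71.
f_value, df_num, df_den = f_change(0.20, 0.30, n=71, k_reduced=2, k_full=3)
```

With a sample of 71 and three predictors in the final step, the test of the last predictor would have 1 and 67 degrees of freedom under these assumptions.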
The current study was designed to address how individual
differences in sensorimotor processes pertaining to the vocal
system, as measured by singing accuracy, may account for a
facilitatory effect of music experience on speech processing.
Correlational analyses revealed that singing accuracy was re-
lated to vocal emotion recognition and music experience, but
neither music experience nor pitch discrimination ability was
related to general vocal emotion recognition. Of particular
importance to the current study, we observed that singing
accuracy was a unique predictor of general vocal emotion
recognition ability when controlling for pitch discrimination
ability and self-reported musical experience.
Table 1 Bivariate correlations and descriptive statistics
Measure                          1     2      3     4     5
1. Emotion Raw Hit Rates         -     .99**  .25*  .17   .02
2. Emotion Unbiased Hit Rates          -      .26*  .19   .02
3. Singing Accuracy                           -     .12   .31**
4. Pitch Discrimination                             -     .05
5. Music Experience                                       -
M                                .75   .58    .57   .87   3.30
SD                               .08   .11    .30   .10   4.86
Note. * p < .05, ** p < .01 using a one-tailed test of significance

Fig. 1 Bivariate correlation between singing accuracy and vocal emotion

Table 2 Three-step hierarchical regression model predicting emotion recognition accuracy
Step/Predictor          Step 1 β   Step 2 β   Step 3 β
Music Experience        .04        .03        -.08
Pitch Discrimination               .18        .15
Singing Accuracy                              .31*
Adjusted R²             -.01       .006       .08*
F for ΔR²                          2.55       6.56*
Note. * p < .05, reporting standardized regression estimates

We interpret the association between singing accuracy and
vocal emotion recognition as evidence for the role of sensorimotor
processing in vocal prosody perception. This explanation
is motivated by evidence from previous research that
inaccurate singing is linked to a sensorimotor deficit
(Greenspon et al., 2017; Greenspon et al., 2020; Greenspon
& Pfordresher, 2019; Pfordresher & Brown, 2007;
Pfordresher & Halpern, 2013; Pfordresher & Mantell, 2014)
and that vocal prosody recognition is related to individual
differences in sensorimotor processing (Correia et al., 2019).
Furthermore, our finding that singing ability, but
not self-reported musical experience, is a unique predictor of
general vocal emotion recognition suggests that
sensorimotor processes involved in spoken prosody may
reflect an effector-specific and dimension-specific network of
the vocal system recruited for processing pitch in both speech
and song. Importantly, a sensorimotor network for processing
vocal pitch aligns with the domain-general framework of the
MMIA model, which is a model accounting for individual
differences in sensorimotor processes originally established
to account for variability in vocal pitch imitation
(Pfordresher et al., 2015). In support of a domain-general ef-
fect of sensorimotor processing, previous research has shown
that individuals who tend to be poor at imitating pitch in song
also tend to be poor at imitating pitch in speech (Liu et al.,
2013; Mantell & Pfordresher, 2013; cf. Yang et al., 2014).
Furthermore, the sensorimotor account of the relationship be-
tween singing accuracy and vocal emotion recognition in the
current study is also compatible with the framework proposed
by the OPERA hypothesis (Patel, 2011, 2014), in which musical
processing is expected to facilitate speech processing for
tasks that recruit shared networks involved in both music and speech.
In line with the current results, other studies that have relied
on self-report measures of music experience have shown that
although emotional intelligence, personality, and age relate to
vocal emotion perception, musical training does not (Dibben
et al., 2018; Trimmer & Cuddy, 2008). However, studies focused
on group comparisons between musicians and non-musicians
(Dmitrieva et al., 2006; Fuller et al., 2014; Lima
& Castro, 2011; Thompson et al., 2004) and musical training
interventions (Good et al., 2017; Thompson et al., 2004) have
reported enhanced vocal emotion processing for musically
trained individuals. Relatedly, comparisons between individ-
uals with and without a musical impairment (i.e., congenital
amusia) reveal that individuals with amusia tend to also ex-
hibit poor vocal emotion perception (Thompson et al., 2012)
and that these impairments extend to individuals with tonal
language experience (Zhang et al., 2018). Given that amusia
has been linked to a deficit specific to pitch processing (Ayotte
et al., 2002), one possible explanation for these findings is that
individual differences in pitch processing may account for
variability in vocal emotion recognition. However, in the cur-
rent study, pitch discrimination was not a unique predictor of
overall vocal emotion recognition. This finding aligns with
previous research, which has shown that vocal pitch percep-
tion is related to vocal emotion recognition ability; however,
pitch perception for non-vocal pitch is not (Schelinski & von
Kriegstein, 2019). Complementing these findings, previous
research has shown that ASD, which has been linked to diffi-
culty in emotion recognition (Globerson et al., 2015;
Schelinski & von Kriegstein, 2019), has also been linked to
impairments in vocal perception and vocal production (Jiang
et al., 2015; Schelinski & von Kriegstein, 2019; Wang et al.,
2021). Furthermore, neuroimaging research has shown that
overlapping neural resources are recruited for both vocal pro-
duction and perception (Aziz-Zadeh et al., 2010; Skipper
et al., 2017), including activity in the inferior frontal gyrus
(Aziz-Zadeh et al., 2010; Pichon & Kell, 2013).
Interestingly, Aziz-Zadeh et al. (2010) reported that activity
in this region during prosody perception correlated with self-
reported affective empathy scores (see also Banissy et al.,
2012), suggesting a possible link between vocal emotion pro-
cessing and affective empathy.
In addition to a sensorimotor account of the relationship
between singing accuracy and vocal emotion recognition,
we also consider whether this relationship can be conceptual-
ized as reflecting individual differences in how auditory infor-
mation is being prioritized by the listener. In support of this
alternative account, Atkinson et al. (2021) have found that
listeners can prioritize auditory information when that infor-
mation is deemed valuable. Furthermore, Sander et al. (2005),
who used a dichotic listening task in which participants were
instructed to identify a speaker's gender, report that different
brain networks are recruited when participants are attending or
not attending to angry prosody. Therefore, it may be the case
that individuals who are better singers are better than less
accurate singers at prioritizing prosodic cues such as pitch,
given that pitch is an important acoustic feature for both spo-
ken prosody and musical performance. This claim aligns with
findings from Greenspon and Pfordresher (2019), who found
that pitch short-term memory, pitch discrimination, and pitch
imagery were unique predictors of singing accuracy, but ver-
bal measures were not. In the current study, participants in the
final sample exhibited high levels of pitch discrimination ac-
curacy, suggesting that these individuals did not have difficul-
ty prioritizing pitch information. Furthermore, singing accura-
cy was a unique predictor of average emotion recognition
scores when controlling for individual differences in pitch
discrimination ability. However, a limitation of the current
study is that pitch perception was measured using a
non-adaptive pitch discrimination task with sine wave tones. The
current study therefore cannot address the degree to which
individual differences in vocal pitch perception or higher order
musical processes involved in melody perception may contribute
to the current findings; these questions should be addressed in
future work.
When considering the results of the current study with re-
spect to task modality, our findings suggest that when
assessing musical processes using production and
perception-based tasks, the production-based task is a stronger
predictor of vocal emotion recognition than the perception-
based task. This finding builds on the work by Correia et al.
(2022), who found that perceptual musical abilities (see also
Globerson et al., 2013) and verbal short-term memory were
both unique predictors of vocal emotion recognition, but mu-
sical training was not. However, one limitation of the current
study is that only prosody perception, not production, was
measured. Therefore, future research is needed to clarify
whether individual differences in prosody production relate
to singing ability, as found for vocal prosody perception in
the current study.
Although the current study focused on general vocal emo-
tion recognition, previous work on vocal expression of emo-
tion suggests that different emotions can be signaled through
specific acoustic features, such as variations in pitch contour
(Banse & Scherer, 1996; Frick, 1985), and that these cues
communicate emotions in both speech and music (Coutinho
& Dibben, 2013; Juslin & Laukka, 2003). In addition to being
characterized by different acoustic profiles, basic emotions
such as anger, disgust, fear, happiness, and sadness have also
been found to differ in recognition accuracy and processing
time (Pell & Kotz, 2011). For these reasons, we also explored
whether singing accuracy, pitch discrimination, and music
experience predicted vocal emotion recognition for specific
emotions, as discussed in the Appendix. Although all correla-
tions between singing accuracy and vocal emotion recognition
showed a positive association, only correlations involving rec-
ognition accuracy for sentences portraying fear and sadness
reached significance. Correlations between pitch discrimina-
tion accuracy and vocal emotion recognition were more vari-
able, with correlations for anger and disgust showing nega-
tive, albeit non-significant, relationships. However, pitch dis-
crimination accuracy did positively correlate with vocal emo-
tion recognition for sentences portraying fear, happiness, and
neutral emotion. In contrast, we did not find any significant
correlations between self-reported musical training and vocal
emotion recognition. The emotion-specific pattern reported
for these correlations aligns with neuroimaging work that
has found emotion-specific neural signatures that are related
across different modalities (Aubé et al., 2015; Saarimäki et al.,
2016). Furthermore, neuroimaging research has also found
that neural responses for specific emotions differ based on
musical training, with musicians showing different levels of
neural activation than non-musicians when listening to spoken
sentences portraying sadness (Park et al., 2015). In addition,
vocal expression of basic emotions has also been shown to be
influenced by physiological changes associated with emotion-
al reactions (Juslin & Laukka, 2003; Scherer, 2009). As such,
one pathway by which vocal prosody in speech and song may
communicate emotional states of a vocalist is through the
association between vocal cues and physiological responses.
Such a claim aligns with physiological-based and multi-
component models of emotion processing (James, 1884;
Scherer, 2009).
In sum, results of the current study address the degree to
which musical ability is associated with processing vocal
prosody using a musical production-based singing task that
recruits the same effector system as speech. Regression anal-
yses revealed that singing accuracy was the only unique pre-
dictor of average spoken prosody recognition, when control-
ling for pitch discrimination accuracy and self-reported musi-
cal experience. Together, our results support sensorimotor
processing of the vocal system as a possible mechanism for
the facilitatory effects of musical ability on speech processing.
Appendix
We evaluated whether vocal emotion accuracy for different
emotion types in the current study replicated the effect of
emotion type reported in Pell and Kotz (2011). A one-way
repeated-measures ANOVA on unbiased hit rates in the vocal
emotion recognition task revealed a main effect of emotion
type, F(5, 350) = 77.16, p < .05. Descriptive statistics for each
emotion (Anger, Disgust, Fear, Happy, Sad, and Neutral) are
shown in Appendix Table 3. We conducted pairwise contrasts
using a Holm-Bonferroni correction to evaluate differences
between emotion types. It is important to note that Pell and
Kotz (2011) used a gating procedure, whereas the current
study used only the full presentation of each sentence (i.e.,
gate 7); our discussion therefore focuses on the results Pell
and Kotz (2011) reported for later gates of the stimuli. We
replicated the pattern that fear was recognized with the highest
accuracy compared to all other emotions (all p < .001) and
disgust was recognized with the lowest accuracy compared to
all other emotion types (all p < .001). In addition, we replicated
the finding that accuracy for sentences intended to convey
happy emotion was not statistically different from accuracy
for sentences intended to convey sad (p = .17) or neutral
emotion (p = .17).
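To make the correction procedure concrete, the following minimal Python sketch illustrates a Holm-Bonferroni step-down test of the kind applied to the pairwise contrasts above. It is an illustration only and not the analysis code used in the current study: the function name holm_bonferroni and the example p-values are hypothetical. Six emotion types yield 15 pairwise contrasts; the example uses four hypothetical p-values for brevity.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Hypothetical helper for illustration. P-values are tested from
    smallest to largest against alpha / (m - rank), and testing stops
    at the first p-value that exceeds its threshold.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


# Hypothetical p-values, not values from the current study.
example = [0.001, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))  # [True, False, False, True]
```

In this example the third smallest p-value (.03) exceeds its threshold of .05 / 2 = .025, so it and the larger p-value (.04) are both retained even though each falls below the uncorrected alpha of .05.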
We next addressed whether singing accuracy, pitch dis-
crimination, and music experience were reliably associated
with recognition accuracy for each emotion type. As shown
in Appendix Table 3, singing accuracy was positively related
to emotion recognition for sentences intended to convey fear
and sadness. Correlations between singing accuracy and other
emotion types were also positive, but did not reach statistical
significance. As found for singing accuracy, pitch discrimina-
tion accuracy was positively related to vocal emotion recog-
nition for sentences intended to convey fear. In addition, pitch
discrimination was positively related to emotion recognition
scores for sentences intended to convey happiness and neutral
emotion. Unlike the associations found with singing accuracy,
associations between pitch discrimination and emotion recog-
nition for different emotion types were not consistently in a
positive direction. Finally, correlations between self-reported
musical experience and emotion recognition also did not show
consistently positive associations and did not reach statistical
significance for any emotion type.
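The recognition scores analyzed in this Appendix are unbiased hit rates (Wagner, 1993). As a minimal sketch, assuming the standard definition in which the squared number of correct responses for an emotion is divided by the product of the number of stimuli portraying that emotion and the number of times that response label was used, the measure can be computed from a confusion matrix as shown below. The function name and the two-category confusion matrix are hypothetical and are not data from the current study.

```python
def unbiased_hit_rates(confusion):
    """Compute an unbiased hit rate for each category.

    confusion[i][j] is the number of stimuli of category i that were
    labeled as category j. Hypothetical helper for illustration.
    """
    n = len(confusion)
    # Number of stimuli presented per category (row sums).
    row_totals = [sum(row) for row in confusion]
    # Number of times each response label was used (column sums).
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    rates = []
    for i in range(n):
        denom = row_totals[i] * col_totals[i]
        # A label that was never used (or never presented) scores zero.
        rates.append(confusion[i][i] ** 2 / denom if denom else 0.0)
    return rates


# Hypothetical two-category example (rows: intended, columns: response).
example = [
    [8, 2],  # 10 stimuli of category 0, 8 labeled correctly
    [4, 6],  # 10 stimuli of category 1, 6 labeled correctly
]
print(unbiased_hit_rates(example))
```

In this example the raw hit rate for the first category is .80, but its unbiased hit rate is 64 / (10 x 12), or about .53, because the first response label was also applied to four stimuli of the other category.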
Acknowledgements The authors would like to thank Marc D. Pell for the
stimuli in the vocal emotion recognition task, and Odalys A. Arango,
Arelis B. Bernal, Maryam Ettayebi, Joseph LaBarbera, Katherine R.
Rivera, Sydney P. Squier, and Adriana A. Zefutie for their assistance with
data collection.
Open practices statement We have provided information on participant
selection for the final sample, study design, and data analysis. Data for
this study is available at (
0080fadd74274c05b0c5dc13d92b887b). The experiment was not preregistered.
References
Atkinson, A. L., Allen, R. J., Baddeley, A. D., Hitch, G. J., & Waterman,
A. H. (2021). Can valuable information be prioritized in verbal
working memory? Journal of Experimental Psychology: Learning,
Memory, and Cognition, 47(5), 747–764.
Aubé, W., Angulo-Perkins, A., Peretz, I., Concha, L., & Armony, J. L.
(2015). Fear across the senses: brain responses to music,
vocalizations and facial expressions. Social Cognitive and Affective
Neuroscience, 10(3), 399–407.
Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group
study of adults afflicted with a music-specific disorder. Brain,
125(2), 238–251.
Aziz-Zadeh, L., Sheng, T., & Gheytanchi, A. (2010). Common premotor
regions for the perception and production of prosody and correla-
tions with empathy and prosodic ability. PLoS One, 5(1), e8759.
Banissy, M. J., Sauter, D. A., Ward, J., Warren, J. E., Walsh, V., & Scott,
S. K. (2010). Suppressing sensorimotor activity modulates the
discrimination of auditory emotions but not speaker identity. Journal of
Neuroscience, 30(41), 13552–13557.
Banissy, M. J., Kanai, R., Walsh, V., & Rees, G. (2012). Inter-individual
differences in empathy are reflected in human brain structure.
Neuroimage, 62(3), 2034–2039.
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion
expression. Journal of Personality and Social Psychology, 70(3),
Boersma, P., & Weenink, D. (2013). Praat: doing phonetics by computer
(Version 5.4.09). [Software] Available from
Christiner, M., & Reiterer, S. M. (2013). Song and speech: Examining the
link between singing talent and speech imitation ability. Frontiers in
Psychology, 4, 874.
Christiner, M., & Reiterer, S. M. (2015). A Mozart is not a Pavarotti:
singers outperform instrumentalists on foreign accent imitation.
Frontiers in Human Neuroscience, 9, 482.
Christiner, M., Bernhofs, V., & Groß, C. (2022). Individual Differences
in Singing Behavior during Childhood Predicts Language
Performance during Adulthood. Languages, 7, 72.
Correia, A. I., Branco, P., Martins, M., Reis, A. M., Martins, N., Castro,
S. L., & Lima, C. F. (2019). Resting-state connectivity reveals a role
for sensorimotor systems in vocal emotional processing in children.
NeuroImage, 201, 116052.
Correia, A. I., Castro, S. L., MacGregor, C., Müllensiefen, D.,
Schellenberg, E. G., & Lima, C. F. (2022). Enhanced recognition
of vocal emotions in individuals with naturally good musical
abilities. Emotion, 22(5), 894–906.
Coutinho, E., & Dibben, N. (2013). Psychoacoustic cues to emotion in
speech prosody and music. Cognition & Emotion, 27(4), 658–684.
Demorest, S. M. (2001). Pitch-matching performance of junior high boys:
A comparison of perception and production. Bulletin of the Council
for Research in Music Education, 63–70.
Demorest, S. M., & Clements, A. (2007). Factors influencing the
pitch-matching of junior high boys. Journal of Research in Music
Education, 55(3), 190–203.
Demorest, S. M., Pfordresher, P. Q., Bella, S. D., Hutchins, S., Loui, P.,
Rutkowski, J., & Welch, G. F. (2015). Methodological perspectives
on singing accuracy: An introduction to the special issue on singing
accuracy (part 2). Music Perception: An Interdisciplinary Journal,
32(3), 266–271.
Dibben, N., Coutinho, E., Vilar, J. A., & Estévez-Pérez, G. (2018). Do
individual differences influence moment-by-moment reports of
emotion perceived in music and speech prosody? Frontiers in
Behavioral Neuroscience, 12, 184.

Table 3 Descriptive statistics and bivariate correlations between emotion types and predictors

Emotion Type   Unbiased Hit Rates   Singing Accuracy   Pitch Discrimination   Music Experience
Anger          .65 (.15)            .07                -.01                   -.07
Disgust        .35 (.18)            .10                -.04                   .03
Fear           .76 (.15)            .33**              .30**                  -.01
Happy          .58 (.19)            .19                .20*                   .02
Sad            .54 (.14)            .26*               .14                    .14
Neutral        .62 (.15)            .17                .22*                   -.01

Note. *p < .05, **p < .01 using a one-tailed test of significance. Singing and pitch discrimination accuracy were assessed as the proportion of correct
responses in each task, vocal emotion recognition accuracy was assessed as unbiased hit rates, and music experience used a self-report measure expressed
in years.

Dmitrieva, E. S., Gelman, V. Y., Zaitseva, K. A., & Orlov, A. M. (2006).
Ontogenetic features of the psychophysiological mechanisms of per-
ception of the emotional component of speech in musically gifted
children. Neuroscience and Behavioral Physiology, 36(1), 53–62.
FFmpeg Developers. (2021). ffmpeg tool (Version 4.4). [Software]
Available from
FindingFive Team. (2019). FindingFive: A web platform for creating,
running, and managing your studies in one place. FindingFive
Corporation (nonprofit), NJ, USA.
Frick, R. W. (1985). Communicating emotions: The role of prosodic
features. Psychological Bulletin, 97(3), 412–429.
Fuller, C. D., Galvin, J. J., Maat, B., Free, R. H., & Başkent, D. (2014).
The musician effect: Does it persist under degraded pitch conditions
of cochlear implant simulations? Frontiers in Neuroscience, 8,
Article 179.
Gerry, D., Unrau, A., & Trainor, L. J. (2012). Active music classes in
infancy enhance musical, communicative and social development.
Developmental Science, 15(3), 398–407.
Globerson, E., Amir, N., Golan, O., Kishon-Rabin, L., & Lavidor, M.
(2013). Psychoacoustic abilities as predictors of emotion
recognition. Attention, Perception, & Psychophysics, 75(8), 1799–1810.
Globerson, E., Amir, N., Kishon-Rabin, L., & Golan, O. (2015). Prosody
recognition in adults with high-functioning autism spectrum disor-
ders: From psychoacoustics to cognition. Autism Research, 8(2),
Good, A., Gordon, K. A., Papsin, B. C., Nespoli, G., Hopyan, T., Peretz,
I., & Russo, F. A. (2017). Benefits of music training for perception
of emotional speech prosody in deaf children with cochlear im-
plants. Ear and Hearing, 38(4), 455.
Greenspon, E. B., & Pfordresher, P. Q. (2019). Pitch-specific
contributions of auditory imagery and auditory memory in vocal pitch
imitation. Attention, Perception, & Psychophysics, 81(7), 2473–2481.
Greenspon, E. B., Pfordresher, P. Q., & Halpern, A. R. (2017). Pitch
imitation ability in mental transformations of melodies. Music
Perception: An Interdisciplinary Journal, 34(5), 585–604.
Greenspon, E. B., Pfordresher, P. Q., & Halpern, A. R. (2020). The role of
long-term memory in mental transformations of pitch. Auditory
Perception & Cognition, 3(1–2), 76–93.
Herholz, S. C., Halpern, A. R., & Zatorre, R. J. (2012). Neuronal
correlates of perception, imagery, and memory for familiar tunes. Journal
of Cognitive Neuroscience, 24, 1382–1397.
Honda, C., & Pfordresher, P. Q. (2022). Remotely collected data can be
as good as laboratory collected data: A comparison between online
and in-person data collection in vocal production [Manuscript in
revision for publication].
Hutchins, S. M., & Peretz, I. (2012). A frog in your throat or in your ear?
Searching for the causes of poor singing. Journal of Experimental
Psychology: General, 141(1), 76–97.
Hutchins, S., Larrouy-Maestri, P., & Peretz, I. (2014). Singing ability is
rooted in vocal-motor control of pitch. Attention, Perception, &
Psychophysics, 76(8), 2522–2530.
James, W. (1884). What is an emotion? Mind, 9(34), 188–205.
Jiang, J., Liu, F., Wan, X., & Jiang, C. (2015). Perception of melodic
contour and intonation in autism spectrum disorder: Evidence from
Mandarin speakers. Journal of Autism and Developmental
Disorders, 45(7), 2067–2075.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal
expression and music performance: Different channels, same code?
Psychological Bulletin, 129(5), 770–814.
Lima, C. F., & Castro, S. L. (2011). Speaking to the trained ear: Musical
expertise enhances the recognition of emotions in speech prosody.
Emotion, 11(5), 1021–1031.
Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary
motor areas in auditory processing and auditory imagery. Trends in
Neurosciences, 39(8), 527–542.
Liu, F., Jiang, C., Pfordresher, P. Q., Mantell, J. T., Xu, Y., Yang, Y., &
Stewart, L. (2013). Individuals with congenital amusia imitate
pitches more accurately in singing than in speaking: Implications
for music and language processing. Attention, Perception, &
Psychophysics, 75(8), 1783–1798.
Livingstone, S., Thompson, W. F., & Russo, F. A. (2009). Facial expres-
sions and emotional singing: A study of perception and production
with motion capture and electromyography. Music Perception, 26,
Mantell, J. T., & Pfordresher, P. Q. (2013). Vocal imitation of song and
speech. Cognition, 127(2), 177–202.
Nussbaum, C., & Schweinberger, S. R. (2021). Links between musicality
and vocal emotion perception. Emotion Review, 13(3), 211–224.
Park, M., Gutyrchik, E., Welker, L., Carl, P., Pöppel, E., Zaytseva, Y.,
et al. (2015). Sadness is unique: neural processing of emotions in
speech prosody in musicians and non-musicians. Frontiers in
Human Neuroscience, 8, 1049.
Patel, A. D. (2011). Why would musical training benefit the neural
encoding of speech? The OPERA hypothesis. Frontiers in
Psychology, 2, 1–14.
Patel, A. D. (2014). Can nonlinguistic musical training change the way
the brain processes speech? The expanded OPERA hypothesis.
Hearing Research, 308, 98–108.
Pell, M. D., & Kotz, S. A. (2011). On the time course of vocal emotion
recognition. PLoS One, 6(11), e27256.
Petrides, K. V., Niven, L., & Mouskounti, T. (2006). The trait emotional
intelligence of ballet dancers and musicians. Psicothema, 18, 101
Pfordresher, P. Q., & Brown, S. (2007). Poor-pitch singing in the absence
of "tone deafness". Music Perception, 25(2), 95–115.
Pfordresher, P. Q., & Halpern, A. R. (2013). Auditory imagery and the
poor-pitch singer. Psychonomic Bulletin & Review, 20(4), 747–753.
Pfordresher, P. Q., & Mantell, J. T. (2014). Singing with yourself:
Evidence for an inverse modeling account of poor-pitch singing.
Cognitive Psychology, 70, 31–57.
Pfordresher, P. Q., & Nolan, N. P. (2019). Testing convergence between
singing and music perception accuracy using two standardized
measures. Auditory Perception & Cognition, 2(1–2), 67–81.
Pfordresher, P. Q., Halpern, A. R., & Greenspon, E. B. (2015). A
mechanism for sensorimotor translation in singing: The Multi-Modal
Imagery Association (MMIA) model. Music Perception: An
Interdisciplinary Journal, 32(3), 242–253.
Pichon, S., & Kell, C. A. (2013). Affective and sensorimotor components
of emotional prosody generation. Journal of Neuroscience, 33(4),
Saarimäki, H., Gotsopoulos, A., Jääskeläinen, I. P., Lampinen, J.,
Vuilleumier, P., Hari, R., ... & Nummenmaa, L. (2016). Discrete
neural signatures of basic emotions. Cerebral Cortex, 26(6), 2563-
Sander, D., Grandjean, D., Pourtois, G., Schwartz, S., Seghier, M. L.,
Scherer, K. R., & Vuilleumier, P. (2005). Emotion and attention
interactions in social cognition: brain regions involved in processing
anger prosody. Neuroimage, 28(4), 848–858.
Schelinski, S., & von Kriegstein, K. (2019). The relation between vocal
pitch and vocal emotion recognition abilities in people with autism
spectrum disorder and typical development. Journal of Autism and
Developmental Disorders, 49(1), 68–82.
Schellenberg, E. G. (2011). Music lessons, emotional intelligence, and
IQ. Music Perception, 29(2), 185–194.
Scherer, K. R. (2009). The dynamic architecture of emotion: Evidence for
the component process model. Cognition and Emotion, 23(7),
Skipper, J. I., Devlin, J. T., & Lametti, D. R. (2017). The hearing ear is
always found close to the speaking tongue: Review of the role of the
motor system in speech perception. Brain and Language, 164, 77
Stel, M., & van Knippenberg, A. (2008). The role of facial mimicry in the
recognition of affect. Psychological Science, 19(10), 984–985.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding
speech prosody: Do music lessons help? Emotion, 4(1), 46–64.
Thompson, W. F., Marin, M. M., & Stewart, L. (2012). Reduced sensi-
tivity to emotional prosody in congenital amusia rekindles the mu-
sical protolanguage hypothesis. Proceedings of the National
Academy of Sciences of the United States of America, 109(46), 19,
Trimmer, C. G., & Cuddy, L. L. (2008). Emotional intelligence, not
music training, predicts recognition of emotional speech prosody.
Emotion, 8(6), 838–849.
Wagner, H. L. (1993). On measuring performance in category judgment
studies of nonverbal behavior. Journal of Nonverbal Behavior,
17(1), 3–28.
Wang, L., Pfordresher, P. Q., Jiang, C., & Liu, F. (2021). Individuals with
autism spectrum disorder are impaired in absolute but not relative
pitch and duration matching in speech and song imitation. Autism
Research, 14(11), 2355–2372.
Yang, W. X., Feng, J., Huang, W. T., Zhang, C. X., & Nan, Y. (2014).
Perceptual pitch deficits coexist with pitch production difficulties in
music but not Mandarin speech. Frontiers in Psychology, 4, 1024.
Yehia, H., Rubin, P., & Vatikiotis-Bateson, E. (1998). Quantitative
association of vocal-tract and facial behavior. Speech Communication,
26(1–2), 23–43.
Zhang, Y., Geng, T., & Zhang, J. (2018, September 2-6). Emotional
prosody perception in Mandarin-speaking congenital amusics. In:
Proceedings of the Annual Conference of the International Speech
Communication Association (Interspeech 2018), 2196–2200.
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds
exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such
publishing agreement and applicable law.
Attention, Perception, & Psychophysics
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Research on singing and language abilities has gained considerable interest in the past decade. While several studies about singing ability and language capacity have been published, investigations on individual differences in singing behavior during childhood and its relationship to language capacity in adulthood have largely been neglected. We wanted to focus our study on whether individuals who had sung more often during childhood than their peers were also better in language and music capacity during adulthood. We used questionnaires to assess singing behavior of adults during childhood and tested them for their singing ability, their music perception skills, and their ability to perceive and pronounce unfamiliar languages. The results have revealed that the more often individuals had sung during childhood, the better their singing ability and language pronunciation skills were, while the amount of childhood singing was less predictive on music and language perception skills. We suggest that the amount of singing during childhood seems to influence the ability to sing and the ability to acquire foreign language pronunciation later in adulthood.
Full-text available
Links between musicality and vocal emotion perception skills have only recently emerged as a focus of study. Here we review current evidence for or against such links. Based on a systematic literature search, we identified 33 studies that addressed either (a) vocal emotion perception in musicians and nonmusicians, (b) vocal emotion perception in individuals with congenital amusia, (c) the role of individual differences (e.g., musical interests, psychoacoustic abilities), or (d) effects of musical training interventions on both the normal hearing population and cochlear implant users. Overall, the evidence supports a link between musicality and vocal emotion perception abilities. We discuss potential factors moderating the link between emotions and music, and possible directions for future research.
Full-text available
Individuals with autism spectrum disorder (ASD) often exhibit atypical imitation. However, few studies have identified clear quantitative characteristics of vocal imitation in ASD. This study investigated imitation of speech and song in English-speaking individuals with and without ASD and its modulation by age. Participants consisted of 25 autistic children and 19 autistic adults, who were compared to 25 children and 19 adults with typical development matched on age, gender, musical training , and cognitive abilities. The task required participants to imitate speech and song stimuli with varying pitch and duration patterns. Acoustic analyses of the imitation performance suggested that individuals with ASD were worse than controls on absolute pitch and duration matching for both speech and song imitation, although they performed as well as controls on relative pitch and duration matching. Furthermore, the two groups produced similar numbers of pitch contour, pitch interval-, and time errors. Across both groups, sung pitch was imitated more accurately than spoken pitch, whereas spoken duration was imitated more accurately than sung duration. Children imitated spoken pitch more accurately than adults when it came to speech stimuli, whereas age showed no significant relationship to song imitation. These results reveal a vocal imitation deficit across speech and music domains in ASD that is specific to absolute pitch and duration matching. This finding provides evidence for shared mechanisms between speech and song imitation, which involves independent implementation of relative versus absolute features. Lay Summary Individuals with autism spectrum disorder (ASD) often exhibit atypical imitation of actions and gestures. Characteristics of vocal imitation in ASD remain unclear. 
By comparing speech and song imitation, this study shows that individuals with ASD have a vocal imitative deficit that is specific to absolute pitch and duration matching, while performing as well as controls on relative pitch and duration matching, across speech and music domains.
Full-text available
Though there is substantial evidence that individuals can prioritize more valuable information in visual working memory, little research has examined this in the verbal domain. Four experiments were conducted to investigate this and the conditions under which effects emerge. In each experiment, participants listened to digit sequences and then attempted to recall them in the correct order. At the start of each block, participants were either told that all items were of equal value, or that an item at a particular serial position was worth more points. Recall was enhanced for these higher value items (Experiment 1a), a finding that was replicated while rejecting an alternative account based on distinctiveness (Experiment 1b). Thus, valuable information can be prioritized in verbal working memory. Two further experiments investigated whether these boosts remained when participants completed a simple concurrent task disrupting verbal rehearsal (Experiment 2), or a complex concurrent task disrupting verbal rehearsal and executive resources (Experiment 3). Under simple concurrent task conditions, prioritization boosts were observed, but with increased costs to the less valuable items. Prioritization effects were also observed under complex concurrent task conditions, although this was accompanied by chance-level performance at most of the less valuable positions. A substantial recency advantage was also observed for the final item in each sequence, across all conditions. Taken together, this indicates that individuals can prioritize valuable information in verbal working memory even when rehearsal and executive resources are disrupted, though they do so by neglecting or abandoning other items in the sequence.
Full-text available
Research on individual differences in musical abilities, and music-related deficits, has increased dramatically in the past 20 years. Although most studies to date concern music perception, in particular the deficit referred to as congenital amusia, a growing area of research has addressed individual differences in singing accuracy and poor-pitch singing. How closely associated are music perception and singing abilities? Several studies to date have reported dissociations between these abilities. However, these studies have tended to use small samples and have not compared leading standardized measures to each other. In the present study, we measured perception and singing abilities in a larger sample (N = 86) on standardized measures of singing accuracy (the Seattle Singing Accuracy Protocol) and music perception (the Online Test of Amuisa). Results revealed stronger associations between these higher-level musical abilities than either measure had with simple pitch discrimination. Analyses in which participants were classified as typical or deficient based on existing norms further suggested that deficits in music perception predict poor-pitch singing deficits more so than the reverse. Taken together, these results suggest that music perception and production may rely on shared higher-order representations of music that play a more important role than basic perception or motor control.
Full-text available
Vocal imitation guides both music and language development. Despite the developmental significance of this behavior, a sizable minority of individuals are inaccurate at vocal pitch imitation. Although previous research suggested that inaccurate pitch imitation results from deficient sensorimotor associations between pitch perception and vocal motor planning, the cognitive processes involved in sensorimotor translation are not clearly defined. In the present research, we investigated the roles of basic cognitive processes in the vocal imitation of pitch, as well as the degree to which these processes rely on pitch-specific resources. In the present study, participants completed a battery of pitch and verbal tasks to measure pitch perception, pitch and verbal auditory imagery, pitch and verbal auditory short-term memory, and pitch imitation ability. Information on participants’ music background was collected, as well. Pitch imagery, pitch short-term memory, pitch discrimination ability, and musical experience were unique predictors of pitch imitation ability. Furthermore, pitch imagery was a partial mediator of the relationship between pitch short-term memory and pitch imitation ability. These results indicate that vocal imitation recruits cognitive processes that rely on at least partially separate neural resources for pitch and verbal representations.
Most people can recognize and perform a musical piece under a variety of transformations such as altering the key or varying the tempo. However, we also know that other mental transformations of music can be difficult to generate and to recognize. Two factors that might affect this mental flexibility are the familiarity of the piece and musical ability of the listener, in this case singing accuracy. The current experiment addressed the accuracy and flexibility of representations of novel and traditional melodies among accurate, moderate, and inaccurate singers. Participants sang or recognized melodies in either their original form or as a transformation: a transposition, a shift of serial position, and a reversal of the melody. Participants showed an advantage for traditional melodies, but only when singing or recognizing tunes in their original form. Participants were similarly disrupted by mental transformations of traditional and novel tunes in both production and recognition tasks. Interestingly, we found that the only advantage for traditional melodies when singing repetitions of the melody occurred among the moderate singers, but all three groups showed an advantage for traditional melodies when recognizing exact repetitions.
Music training is widely assumed to enhance several nonmusical abilities, including speech perception, executive functions, reading, and emotion recognition. This assumption is based primarily on cross-sectional comparisons between musicians and nonmusicians. It remains unclear, however, whether training itself is necessary to explain the musician advantages, or whether factors such as innate predispositions and informal musical experience could produce similar effects. Here, we sought to clarify this issue by examining the association between music training, music perception abilities and vocal emotion recognition. The sample (N = 169) comprised musically trained and untrained listeners who varied widely in their musical skills, as assessed through self-report and performance-based measures. The emotion recognition tasks required listeners to categorize emotions in nonverbal vocalizations (e.g., laughter, crying) and in speech prosody. Music training was associated positively with emotion recognition across tasks, but the effect was small. We also found a positive association between music perception abilities and emotion recognition in the entire sample, even with music training held constant. In fact, untrained participants with good musical abilities were as good as highly trained musicians at recognizing vocal emotions. Moreover, the association between music training and emotion recognition was fully mediated by auditory and music perception skills. Thus, in the absence of formal music training, individuals who were "naturally" musical showed musician-like performance at recognizing vocal emotions. These findings highlight an important role for factors other than music training (e.g., predispositions and informal musical experience) in associations between musical and nonmusical domains. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
Voices are a primary source of emotional information in everyday interactions. Being able to process non-verbal vocal emotional cues, namely those embedded in speech prosody, impacts on our behaviour and communication. Extant research has delineated the role of temporal and inferior frontal brain regions for vocal emotional processing. A growing number of studies also suggest the involvement of the motor system, but little is known about such potential involvement. Using resting-state fMRI, we ask if the patterns of motor system intrinsic connectivity play a role in emotional prosody recognition in children. Fifty-five 8-year-old children completed an emotional prosody recognition task and a resting-state scan. Better performance in emotion recognition was predicted by a stronger connectivity between the inferior frontal gyrus (IFG) and motor regions including primary motor, lateral premotor and supplementary motor sites. This is mostly driven by the IFG pars triangularis and cannot be explained by differences in domain-general cognitive abilities. These findings indicate that individual differences in the engagement of sensorimotor systems, and in its coupling with inferior frontal regions, underpin variation in children's emotional speech perception skills. They suggest that sensorimotor and higher-order evaluative processes interact to aid emotion recognition, and have implications for models of vocal emotional communication.
Congenital amusia, a neurogenetic disorder affecting musical pitch processing, was recently found to affect not only speech perception but also emotional perception. Because previous studies examined only speakers of non-tonal languages, their findings cannot easily be generalized to speakers of tonal languages, who rely on pitch cues much more heavily in daily communication. To address this gap, this paper investigates emotional prosody perception in Mandarin speakers with congenital amusia. We recruited 19 amusics and a matched control group with a similar number of normal speakers, and carried out emotional perception experiments using speech and non-speech stimuli conveying six emotions: happy, sad, fear, angry, surprise, and neutral. Results showed that the amusics performed significantly worse than matched controls, indicating that tone-language expertise cannot compensate for pitch deficits in amusia for emotional perception. Further analyses demonstrated a positive correlation between emotional prosody performance and pitch perception ability. These findings further support the previous hypothesis that music and language share cognitive and neural resources, and provide a new perspective on the relation between music and language.