Acoustic-Prosodic and Physiological Response to Stressful Interactions
in Children with Autism Spectrum Disorder
Daniel Bone1, Julia Mertens2, Emily Zane2, Sungbok Lee1, Shrikanth Narayanan1, Ruth Grossman2,3
1Signal Analysis and Interpretation Laboratory (SAIL), USC, Los Angeles, CA, USA
2Face Lab, Emerson College, Boston, MA, USA
3Department of Communication Sciences and Disorders, Emerson College, Boston, MA, USA
dbone@usc.edu, http://sail.usc.edu
Abstract
Social anxiety is a prevalent condition affecting individuals to
varying degrees. Research on autism spectrum disorder (ASD),
a group of neurodevelopmental disorders marked by impair-
ments in social communication, has found that social anxiety
occurs more frequently in this population. Our study aims to
further understand the multimodal manifestation of social stress
for adolescents with ASD versus neurotypically developing
(TD) peers. We investigate this through objective measures of
speech behavior and physiology (mean heart rate) acquired dur-
ing three tasks: a low-stress conversation, a medium-stress in-
terview, and a high-stress presentation. Measurable differences
are found to exist for speech behavior and heart rate in relation
to task-induced stress. Additionally, we find the acoustic mea-
sures are particularly effective for distinguishing between diag-
nostic groups. Individuals with ASD produced higher prosodic
variability, agreeing with previous reports. Moreover, the most
informative features captured an individual’s vocal changes be-
tween low and high social-stress, suggesting an interaction be-
tween vocal production and social stressors in ASD.
Index Terms: stress, acoustic-prosody, physiology, autism
spectrum disorder, interaction
1. Introduction
Stressors are pervasive in our daily lives, impacting our mood,
our general sense of well-being, and even our health [1]. In fact,
our ability to deal with and adapt to stress is associated with
positive health outcomes. Anxiety disorders are the most prevalent class of mental disorders in the United States, affecting 18% of the population [2]. When stress causes anxiety, it leads to increased physiological arousal in the body [3], which we express in our verbal and non-verbal behavior. One such stressor is social in nature, as
with public speaking. Under stress, a person experiences un-
conscious sympathetic responses; e.g., the laryngeal folds may
tighten, leading to a rise in vocal pitch [4]. But there is still
much to learn about the ways in which individuals experience
and express stress; one viable approach uses scalable objective
measures of behavior, i.e., Behavioral Signal Processing [5].
A meta-analytic study reported that social anxiety occurs more frequently in individuals with autism spectrum disorder, or ASD [6], affecting roughly 40% of that population. ASD is a highly
heterogeneous, highly prevalent (1 in 68 [7]) neurodevelopmen-
tal disorder defined by impairments in social communication
and reciprocity, as well as restricted, repetitive behavioral pat-
terns and interests [8]. Given the prevalence of anxiety in ASD,
researchers are striving to better understand when individuals
become stressed and how they respond (e.g., from skin conduc-
tance responses [9]). Since it can be difficult for those with
ASD to understand and communicate their emotions, acoustic
analyses may provide an effective measurement of stress.
There has been limited work specifically focused on the
acoustic correlates of stress, likely due to the challenges of
collecting high-quality, naturalistic speech under stress [10].
Speech researchers have primarily focused on optimizing emo-
tion classification within a database [11], whether the target is
categorical or dimensional (i.e., arousal, valence, dominance).
Yet, studies continue to find that models tuned for one database do not readily transfer to another [12]; such transfer is critical to the realization of speech-based behavioral health systems operating “in the wild”. Approaches have included knowledge-
inspired system design [13], unsupervised neural-network adap-
tation [14], and multimodal behavioral integration [15, 16]. A
survey article by Juslin and Scherer reported several measures
that reliably increase with arousal or stress: pitch and intensity
mean and variability; the ratio of high-frequency energy; and
speaking rate [17]. In this study, we extract corresponding fea-
tures, but use functionals that were found to be more robust for
tracking arousal such as median or interquartile ratio [13].
The present work builds upon several of our previous stud-
ies which sought acoustic correlates of the “atypical prosody”
so commonly observed in autism spectrum disorders [18, 19,
20, 21, 22]. Our experiments [18, 19, 20] in a sample of 29
children from the USC CARE Corpus [23] found children with
increasing ASD severity spoke less, spoke slower, responded
later, had more variable prosody, and had more atypical voice
quality. Since atypicality is not universal in ASD, we have also
investigated human perception of atypicality or “awkwardness”.
We found that human agreement can be rather low for very spe-
cific dimensions of prosody, but that speech rate and rhythm
cues were highly predictive of overall perceived “awkwardness”
in the read speech task [21]. In a large-scale study, we found
that prosodic variability was significantly higher for individu-
als with ASD compared to peers with non-ASD developmen-
tal disorders [22], aligning with previous findings in smaller
databases [24, 25]. Additionally, we presented novel features
that measured reduced coordination between pitch and intensity
or duration, quantifying a previous qualitative perception [26].
Previous work has primarily focused on a single modality;
in this novel study, we investigate the multimodal presentation
of stress in individuals with ASD and their neurotypically devel-
oping peers as they participate in a series of progressively stress-
ful interactions. As a measure of latent physiology, we consider
mean heart rate, which generally correlates with acute increases
in stress [27]. We also explore a set of acoustic-prosodic fea-
tures that are expected to be modulated by changes in affect [13]
as well as ASD symptoms [22], observing global tendencies as
well as changes that occur within a person between different
tasks. Through this study, we aim to enhance our understand-
ing of signal-derived measures of stress, which are crucial to
development of clinical engineering systems.
2. Methodology
In this section, we discuss: three interactions of varying stress
in our study; data collection and participant demographics;
acoustic-prosodic and physiological features related to stress
and autism; and data analysis and machine learning models.
2.1. Social Interactions of Variable Stress
Subjects participated in three types of social interactions ex-
pected to be progressively more stressful: a low stress one-on-
one conversation, a medium stress one-on-one interview, and
a high stress presentation to an audience. In the first task, the
subjects watched YouTube clips, and then discussed the clips
with a researcher (this conversation typically lasted under one
minute). Because the interaction was casual (participants were
not aware they were being recorded) and because the topic was
impersonal, we assume that individuals experienced low levels
of stress during these chats.
In the second scenario, the research assistant interviewed
the subject about their hobbies, family, and school for twenty
minutes (we analyze two minutes). Since questions were per-
sonal, the context was more formal, and the subject was aware
they were being recorded, we expected the subjects to feel an in-
creased level of pressure compared to the casual conversations.
The third interaction, an oral presentation, is hypothesized
to be the most stressful. Subjects were to develop the ending
of a story within five minutes, and then present it in front of a seated audience of three adult judges, shown in a video edited to appear as a live Skype call. Further intensifying any social
anxiety, the subjects were told that their performance would be
judged against their peers’ performances (see [9, 28]).
2.2. Data Collection and Participants
Experimental data consists of video-recorded interactions from
the three stressful scenarios for all subjects. Data are from 17
children with autism spectrum disorder (ASD) and 24 subjects
with neurotypical development (TD). Participant demographics
are presented in Table 1, including: ADOS diagnosis, age, and
number of audio samples for each of the three tasks.
Data were collected at a single site as part of an IRB-
approved study. Efforts were made to ensure video and audio
quality consistency between subjects and tasks. All recordings
took place in the same room. The distance between the subject and the camera microphone was not constant across sessions, but it is not known to differ systematically between groups. Still, we did not
feel confident in using voice quality measures, which were pre-
viously shown to be characteristic of ASD speech [19], with
the present far-field recordings; instead, we focus on prosodic
measures that may be more robust to any recording variability.
Table 1: Demographic information of all subjects presented as mean (stdv.). Differences between ASD and TD subjects' age and gender are non-significant (p>0.05).

Group    N     Age in yr.    Female    Acquired Task Audio (Low / Medium / High)
ASD      17    13.7 (2.2)    19%       12 / 14 / 12
TD       24    13.4 (2.3)    39%       14 / 16 / 23
Total    41    13.5 (2.2)    31%       26 / 30 / 34
Presence/absence of the subjects’ speech was manually an-
notated. Audio for several sessions was not available due to
recording difficulties or corrupted files. The number of sessions
for which audio features were extracted is displayed by task
in Table 1. Similar data loss occurred with heart-rate record-
ings. This loss primarily affects the joint audio-HR analyses,
for which missing HR data reduces data size by 19%.
2.3. Acoustic-Prosodic and Physiological Features
We computed five classes of features: segmental pitch cues;
segmental spectral cues; speaking rate; coordination between
prosodic modalities (a novel feature type from [22]); and heart
rate. Details of the feature extraction are provided below.
2.3.1. Speaking Rate
Because transcripts were not available for these data—with
which we could perform forced alignment—we needed to de-
termine syllabic boundaries directly from the audio signal.
Speaking rate estimation from prosodic and spectral signals
has been of some interest to the speech processing commu-
nity [29, 30, 31], but accurate estimation of syllabic boundaries
remains challenging. We implemented a version of a pitch-
and intensity-based method that has reported competitive per-
formance [31]. Visual inspection suggested this syllabic seg-
mentation was adequate. We computed two features using syl-
lable boundaries: median speaking rate (syl/s) and syllable du-
ration inter-quartile ratio (s), or IQR.
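As a rough sketch of how these two rate features could be derived (not the exact implementation, which follows a pitch- and intensity-based nucleus detector in the style of [31]), assume syllable-nucleus times have already been estimated; the function and variable names below are illustrative:

```python
import numpy as np

def speaking_rate_features(nucleus_times):
    """Median speaking rate (syl/s) and syllable-duration IQR (s).

    nucleus_times: sorted syllable-nucleus times in seconds, assumed to come
    from a pitch/intensity-based detector; this is an illustrative sketch,
    not the paper's exact implementation.
    """
    t = np.asarray(nucleus_times, dtype=float)
    durations = np.diff(t)                      # inter-nucleus intervals, a proxy for syllable duration
    rate_median = np.median(1.0 / durations)    # median instantaneous rate in syllables/second
    duration_iqr = np.percentile(durations, 75) - np.percentile(durations, 25)
    return rate_median, duration_iqr

# Toy example with seven detected nuclei
rate, dur_iqr = speaking_rate_features([0.20, 0.48, 0.71, 1.10, 1.52, 1.98, 2.31])
```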
2.3.2. Segmental Prosodic Cues: Syllabic Contours
Pitch, volume, and the percentage of high-frequency energy are
all expected to increase with anxiety, stress, and arousal [17].
Further, segmental intonation that captures speaker idiosyncrasies in micro-prosodic production has been used to characterize the speech of individuals with ASD [19, 21, 22]. As
such, we compute nine segmental prosodic features from pitch
and intensity extracted via Praat [32], as well as median HF500
(the ratio of energy above 500Hz to that below) computed via
the vocal arousal score toolkit (VC-AS) [13].
In particular, we extracted syllable-level second-order poly-
nomial parametrization of pitch and intensity, then calculated
session-level medians and inter-quartile ratios of slope (four
features). The overall median and IQR of both log-pitch and
intensity are also calculated (four features). Aside from me-
dian log-pitch, all pitch analysis is performed in the OME (Octave MEdian) scale [33], a log-pitch transformation as in Eq. 1 through which speakers tend to have the same pitch range, i.e., one octave.

OME = log(f0_Hz) - log(median(f0_Hz))    (1)
Since a speaker’s range has been observed to reliably be one
OME around center in neutral speech, all speakers should have
a comparable range regardless of median pitch (unlike for Hz).
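To make the pitch processing concrete, the sketch below shows one plausible implementation of the OME transform of Eq. 1 (base-2 logarithms are assumed so that one unit corresponds to one octave) and of the syllable-level second-order polynomial parametrization with median/IQR functionals. The frame-level tracks and syllable spans are assumed to come from Praat and the syllabification step above; the naive HF500 function is only an illustration, since the paper computes HF500 via the VC-AS toolkit [13].

```python
import numpy as np

def to_ome(f0_hz):
    """OME transform (Eq. 1): log-pitch centered on the speaker's median.
    Base-2 logs are assumed so that one OME unit equals one octave."""
    f0 = np.asarray(f0_hz, dtype=float)
    return np.log2(f0) - np.log2(np.median(f0))

def slope_functionals(values, times, syllable_spans):
    """Session-level median and IQR of per-syllable contour slopes.

    values, times  : frame-level track (e.g., OME pitch or intensity in dB).
    syllable_spans : list of (start, end) times from the syllabification step.
    """
    values, times = np.asarray(values, dtype=float), np.asarray(times, dtype=float)
    slopes = []
    for start, end in syllable_spans:
        mask = (times >= start) & (times < end)
        if mask.sum() < 3:
            continue                              # too few frames to fit a parabola
        # value(t) ~ a*t^2 + b*t + c; 'b' is the syllable's linear slope term
        a, b, c = np.polyfit(times[mask] - start, values[mask], deg=2)
        slopes.append(b)
    slopes = np.asarray(slopes)
    return np.median(slopes), np.percentile(slopes, 75) - np.percentile(slopes, 25)

def hf500(power_spectrum, freqs):
    """Naive HF500: ratio of spectral energy above 500 Hz to energy below."""
    power_spectrum, freqs = np.asarray(power_spectrum), np.asarray(freqs)
    return power_spectrum[freqs >= 500].sum() / power_spectrum[freqs < 500].sum()
```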
2.3.3. Prosodic Coordination Features
In previous work, we found that subjects with ASD showed re-
duced coordination of pitch with other prosodic markers [22].
Aside from ASD diagnosis, stress may affect this prosodic co-
ordination. Following the same approach [22], we quantified
the simultaneous movements of pitch, duration, and intensity
across syllables. These three feature streams are concatenated
per session, and then the Spearman’s rank-correlation coeffi-
cient is calculated pairwise, producing three features.
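A minimal sketch of this coordination measure, assuming three equal-length syllable-level streams (e.g., median OME pitch, median intensity in dB, and duration per syllable) concatenated over the session:

```python
from scipy.stats import spearmanr

def prosodic_coordination(f0_per_syl, intensity_per_syl, duration_per_syl):
    """Pairwise Spearman rank correlations between syllable-level streams.

    Each argument holds one value per syllable, concatenated over the
    session. Returns the three coordination features.
    """
    rho_f0_dur, _ = spearmanr(f0_per_syl, duration_per_syl)
    rho_f0_int, _ = spearmanr(f0_per_syl, intensity_per_syl)
    rho_dur_int, _ = spearmanr(duration_per_syl, intensity_per_syl)
    return rho_f0_dur, rho_f0_int, rho_dur_int
```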
2.3.4. Physiological measure: heart rate
A person's heart rate generally quickens under acute stress, and
has been specifically shown to increase in stressful speech inter-
actions [27]. We compute mean heart rate per session, while ex-
cluding sensor artifacts. Although heart-rate variability (HRV)
is commonly employed as a robust measure of complexity dif-
ferentially affected by acute versus chronic stress [27], compu-
tation requires a minimum sampling period of five minutes [34],
whereas each session lasts between 30 seconds and three min-
utes. Unlike HRV, mean HR is robust to the sampling period.
Table 2: Correlations of features with ADOS severity and best-estimate diagnosis. * indicates p<0.05; n.s. is non-significant.

Category        Feature                  Task Stress Level      ASD Diagnosis
                                         Trend      Sp. ρ       Trend      Sp. ρ
Pitch cues      log-f0 median            higher     0.49*       n.s.       0.15
                log-f0 IQR               n.s.       0.12        higher     0.33*
                log-f0 slope median      n.s.       0.20        lower      0.27*
                log-f0 slope IQR         n.s.       0.13        n.s.       0.14
Spectral cues   intensity median         lower      0.47*       higher     0.29*
                intensity IQR            lower      0.40*       higher     0.36*
                intensity slope median   higher     0.56*       lower      0.22*
                intensity slope IQR      n.s.       0.20        n.s.       0.18
                HF500 median             lower      0.28*       higher     0.32*
Speaking Rate   syllable rate median     higher     0.23*       n.s.       0.10
                syllable duration IQR    lower      0.29*       higher     0.27*
Prosodic        corr. f0 & dur.          less       0.29*       n.s.       0.05
Coordination    corr. f0 & intensity     less       0.25*       n.s.       0.14
                corr. dur. & intensity   less       0.21*       n.s.       0.01
Physiology      heart rate median        n.s.       0.20        n.s.       0.22
2.4. Statistical Analysis and Machine Learning
We conducted both statistical correlation analyses and classification experiments (with a support vector machine via the Liblinear software [35]). Parameters are tuned using two-level nested cross-validation (CV), and averaged statistics of ten runs of leave-one-subject-out CV are reported. Spearman's rank-correlation coefficient and unweighted average recall (UAR, the mean of per-class recall) are selected as evaluation metrics. Note that when only two classes exist, the p-value for Pearson's correlation coefficient is equivalent to that from ANOVA; the same then holds for Spearman's rank-correlation coefficient, apart from the initial rank-based transformation of the features.
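The sketch below illustrates the evaluation protocol using scikit-learn's LinearSVC (which wraps the Liblinear library) with leave-one-subject-out cross-validation and UAR scoring. The inner grid search over the SVM cost C is a simplified stand-in for the two-level nested CV described above, and the ten-run averaging is omitted.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV
from sklearn.metrics import recall_score

def loso_uar(X, y, subjects):
    """Leave-one-subject-out classification with a linear SVM.

    X        : (n_sessions, n_features) feature matrix.
    y        : per-session labels (task stress level or diagnosis).
    subjects : per-session subject IDs; a subject's sessions never appear
               in both the training and test folds.
    Returns UAR, i.e., the unweighted mean of per-class recall.
    """
    X, y, subjects = np.asarray(X), np.asarray(y), np.asarray(subjects)
    preds = np.empty_like(y)
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
        pipe = make_pipeline(StandardScaler(), LinearSVC())
        # Inner search over the cost parameter C (simplified nested CV).
        grid = GridSearchCV(pipe, {"linearsvc__C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
        grid.fit(X[train_idx], y[train_idx])
        preds[test_idx] = grid.predict(X[test_idx])
    return recall_score(y, preds, average="macro")
```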
3. Results and Discussion
Relations between extracted behavioral features, task-induced
stress, and autism spectrum disorder (ASD) diagnosis can in-
form large-scale behavioral analyses. In Section 3.1, the objec-
tive speech and heart rate cues are analyzed versus the hypothe-
sized task-related stress level. Then, in Section 3.2, the cues are
used to differentiate task-type and to predict ASD diagnosis.
3.1. Correlational Feature Analysis
Acoustic-prosodic and heart rate feature correlations with task-
related stress level and ASD diagnosis are provided in Table 2.
The low stress (casual conversation), medium stress (interview),
and high stress (presentation) tasks are encoded with values of
0, 1, and 2, respectively, for purposes of analysis. Since the feature values depend on various sources of noise in the data collection and feature extraction processes, we cannot confidently state whether an underlying construct is or is not informative of a target variable, only whether an extracted feature is informative in this experiment.
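For illustration, the per-feature correlation reported in Table 2 amounts to the following computation; the values are toy numbers, not data from the study.

```python
from scipy.stats import spearmanr

# Ordinal encoding of task stress: 0 = conversation, 1 = interview, 2 = presentation.
stress_level  = [0, 0, 1, 1, 2, 2]                    # one entry per session (toy)
log_f0_median = [0.10, 0.08, 0.15, 0.12, 0.22, 0.25]  # illustrative feature values

rho, p_value = spearmanr(log_f0_median, stress_level)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```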
Segmental pitch cues capture short-term tendencies in us-
age of fundamental frequency. We expected median pitch to
shift upward with increasing stress for a given speaker [17]; in
fact, the only significant relation between pitch cues and task
stress-level is that speakers tend to increase their pitch in more
stressful tasks (p<0.05). Although one may expect pitch vari-
ability to have also increased, median pitch may be more robust,
as has been shown for a related percept, vocal arousal [13].
Regarding ASD, children with higher social-communicative
deficits have previously shown more negative pitch curvature—
which is possibly perceived as “flat” or “monotone” [18]—and
displayed more prosodic variability [22]. For both cases, we
confirm the previous findings; i.e., log-f0 variability is higher
for ASD subjects, while log-f0 slope is lower. The relative
(person-specific) changes in log-f0 between tasks are not signif-
icantly different between groups, while for both ASD and TD
subjects there is a significant increase in log-f0 between low and
high stress tasks (Figure 1a).
Figure 1: Mean relative changes (and standard deviation) between stressful interactions: low (conversation), medium (interview), and high (presentation), shown for the ASD and TD groups. (a) Relative fundamental frequency changes between interactions (f0 change, as a fraction), for the medium-low and high-low comparisons. (b) Heart rate (HR) changes between interactions (beats/min), for the same comparisons. For both groups and features, changes between low and high stress scenarios are statistically significant (p<0.05).

Segmental intensity cues may be similarly influenced by stress; however, sound level is also a function of distance and angle to the microphone. As such, we should interpret the intensity findings cautiously. Contrary to expectations, we find
that intensity median and IQR tend to decrease with increas-
ing task stress, as does HF500 (which is a primary correlate of
arousal [13]). Given the variability in audio conditions, it is dif-
ficult to ascertain if an incidental result of microphone distance
is being measured, or if, in fact, subjects reduce their volume
given increasing stress. We also observe that ASD subjects have
higher and more variable vocal intensity.
Speaking rate is another reported correlate of perceived vo-
cal arousal [17]. In our data, subjects in higher stress situations
speak faster, but also with lower durational variability; this in-
dicates a more rigid, tense speech production. Also, children
with ASD spoke with more durational variability—yet another
indicator of increased variability associated with ASD.
Following our previous quantitative support [22] of a qual-
itative finding [26, 36], we suspected that ASD subjects with
“atypical” prosody were sometimes modulating pitch incongru-
ously with other modalities. We quantified this prosodic coor-
dination as the pairwise correlation between three modalities:
syllabic fundamental frequency, vocal intensity, and duration.
In this study, we found no statistical difference between ASD
and TD groups. However, we did find that subjects in higher
stress tasks tended to have less coordination between prosodic
modalities, a possible result of reduced motor control as a phys-
iological response to stress.
Lastly, we investigate our single physiological measure,
mean heart rate, which is anticipated to increase with task stress.
We find that overall heart rate was not significantly higher in
the higher stress tasks, and that there is no relation with diag-
nosis. But because heart rate varies from person to person (due
to general health, respiration rate, etc.) we calculated relative
(per-person) increases between tasks; in fact, we find that mean
HR increased from low stress to high stress tasks (Figure 1b).
3.2. Prediction Experiments
Machine learning allows for building systems that incorporate
multivariate dependencies which are not obvious in statistical
observation. In this section, we analyze the performance of dif-
ferent feature categories for predicting tasks of varying stress
(Table 3) and for predicting ASD diagnosis (Table 4). In ad-
dition to session-level features, we introduce relative features
(as in Figure 1), which measure intra-personal changes between
tasks. We compute relative changes between five features (log-
f0, intensity, HF500, speaking rate, and HR) for all three task
comparisons (medium-low, high-medium, and high-low).
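A sketch of these relative features for one subject, assuming each task contributes a small session-level feature vector; plain differences are used here for simplicity, whereas Figure 1a reports the f0 change as a fraction.

```python
import numpy as np

def relative_features(low, medium, high):
    """Intra-personal change features between task pairs for one subject.

    low, medium, high : session-level feature vectors for the three tasks,
    e.g., [log-f0 median, intensity median, HF500, speaking rate, mean HR].
    Returns medium-low, high-medium, and high-low differences, concatenated.
    """
    low, medium, high = map(np.asarray, (low, medium, high))
    return np.concatenate([medium - low, high - medium, high - low])

# Toy example with the five per-task features
rel = relative_features([0.10, 62.0, 0.40, 4.1, 78.0],
                        [0.15, 60.5, 0.38, 4.3, 81.0],
                        [0.28, 59.0, 0.35, 4.6, 90.0])
```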
We initially examine the predictive power of acoustic and
heart rate features across diagnostic groups for task-stress as
shown in Table 3. We report both UAR and Spearman’s rank-
correlation coefficient, given that the task stress-labels are ordi-
nal. The acoustic features are significantly predictive of ASD
severity within both ASD and TD populations (p<0.05). This
is an intuitive finding, given the theoretical underpinnings and
empirical evidence for the relation between the acoustic features
and stress/anxiety/arousal. Interestingly, heart rate level alone
is only predictive of stress level for the ASD subjects. As stated
previously, dependence of resting heart rate on external factors may overcome the influence of certain acute stressors; thus, the relative HR features are most appropriate and useful. Feature fusion generally leads to nominal increases in performance.

Table 3: Classification of task stress level from acoustic-prosodic and HR features. Results are presented in terms of UAR (baseline = 33%) and Spearman's rank-correlation coefficient. Bolded statistics are significant at the α=0.05 level.

Group    Acoustic (UAR, ρ)    Heart Rate (UAR, ρ)    Combined (UAR, ρ)
ASD      56%, 0.59            54%, 0.45              62%, 0.70
TD       52%, 0.57            34%, 0.10              55%, 0.60
All      70%, 0.72            41%, 0.17              67%, 0.69

Table 4: Classification of ASD diagnosis from acoustic-prosodic and HR features in different stressful interactions. Results are presented in terms of UAR (baseline = 50%). Bolded statistics are significant at the α=0.05 level. "Session" refers to an individual task, while "relative" refers to comparisons between low/medium, medium/high, and low/high tasks, respectively.

Task      Acoustic: Session    Acoustic: Relative    Heart rate: Session    Combined: All
Low       70                   64 (M-L)              54                     65
Medium    73                   73 (H-M)              56                     77
High      70                   87 (H-L)              57                     84
All       69                   75 (All)              61                     73
Next we consider classification of ASD diagnosis with be-
havioral features as a function of stressful interaction type. We
hypothesized that there may be differences in the ways in which
individuals on the spectrum experience and express stress in
comparison to their typically developing peers. However, it
is unclear if such a difference exists. We do observe that for
the combined feature set classification performance is higher
for more stressful tasks (65%, 75%, and 84%, respectively);
but this appears driven primarily by the relative-change acoustic
features (i.e., 64%, 73%, and 87%, respectively). Thus, further
investigation is required to ascertain the degree to which stress
changes modulate vocal behavior in ASD. Our physiological
measure, task-level mean heart rate, did not achieve significant
prediction, nor did the relative heart rate features of Table 3 (not
shown due to space constraints).
The most informative features are certainly the relative
acoustic features, wherein vocal changes between low and high
stress tasks achieve a predictive performance of 87% UAR.
Session-specific features achieve a lower performance, although
one that is consistent across tasks. This highlights an important
concept, that each person has their own baseline, and the way in
which behavior deviates from that baseline is quite informative.
4. Conclusion
In this work, we examined vocal and physiological measures
of stress during social interactions designed to induce varying
levels of anxiety in individuals with autism spectrum disorder
and typically developing peers. Certain findings corroborate
previous reports regarding acoustic-prosodic markers of autis-
tic speech, including increased prosodic variability (pitch, in-
tensity, and speaking rate) and more negative pitch slope (a pos-
sible correlate of perceived “monotone” speech in ASD). Fur-
thermore, measurable differences in behavioral features were
demonstrated through classification experiments in which those
features could identify the corresponding stressful task as well
as diagnosis. It is compelling that intra-personal acoustic devi-
ation between low and high stress tasks was quite informative
of ASD diagnosis. Still, further investigation is needed to better
understand the covariation of covert and overt behavioral cues,
acute stress, and autism spectrum disorder.
5. Acknowledgments
This work was supported by funds from the National Science
Foundation and the National Institutes of Health. We thank the
children and families who generously gave their time.
6. References
[1] N. Schneiderman, G. Ironson, and S. D. Siegel, “Stress and health:
psychological, behavioral, and biological determinants,” Annu.
Rev. Clin. Psychol., vol. 1, pp. 607–628, 2005.
[2] R. C. Kessler, W. T. Chiu, O. Demler, and E. E. Walters, “Preva-
lence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication,” Archives of General Psychiatry, vol. 62, no. 6, pp. 617–627, 2005.
[3] T. Steimer, “The biology of fear- and anxiety-related behaviors,” Dialogues in Clinical Neuroscience, vol. 4, pp. 231–250, 2002.
[4] K. R. Scherer, “Vocal affect expression: a review and a model for
future research.” Psychological bulletin, vol. 99, no. 2, 1986.
[5] S. Narayanan and P. G. Georgiou, “Behavioral signal process-
ing: Deriving human behavioral informatics from speech and lan-
guage,” Proceedings of the IEEE, vol. PP, no. 99, pp. 1–31, 2013.
[6] F. J. van Steensel, S. M. Bögels, and S. Perrin, “Anxiety disorders in children and adolescents with autistic spectrum disorders: a meta-analysis,” Clinical Child and Family Psychology Review,
vol. 14, no. 3, p. 302, 2011.
[7] J. Baio, “Prevalence of autism spectrum disorder among children
aged 8 years – Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2010,” Morbidity and Mortality Weekly Report: Surveillance Summaries, vol. 63, no. 2, p. 1, 2014.
[8] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders (DSM-5). American Psychiatric Publishing, 2013.
[9] J. Mertens, E. Zane, K. Neumeyer, and R. Grossman, “How anx-
ious do you think I am? Relationship between state and trait anxiety in children with and without ASD during social tasks,” Journal
of Autism and Developmental Disorders, pp. 1–12, 2017.
[10] J. H. Hansen, S. E. Bou-Ghazale, R. Sarikaya, and B. Pellom,
“Getting started with susas: a speech under simulated and actual
stress database.” in Eurospeech, vol. 97, no. 4, 1997, pp. 1743–46.
[11] C.-C. Lee, E. Mower, C. Busso, S. Lee, and S. Narayanan,
“Emotion recognition using a hierarchical binary decision tree ap-
proach,” Speech Comm., vol. 53, no. 9, pp. 1162–1171, 2011.
[12] B. Schuller, B. Vlasenko, F. Eyben, M. Wollmer, A. Stuhlsatz,
A. Wendemuth, and G. Rigoll, “Cross-corpus acoustic emotion
recognition: Variances and strategies,” IEEE Transactions on Af-
fective Computing, vol. 1, no. 2, pp. 119–131, 2010.
[13] D. Bone, C.-C. Lee, and S. Narayanan, “Robust unsupervised
arousal rating: A rule-based framework with knowledge-inspired
vocal features,” IEEE Transactions on Affective Computing,
vol. 5, no. 1, pp. 201–213, 2014.
[14] J. Deng, Z. Zhang, F. Eyben, and B. Schuller, “Autoencoder-based
unsupervised domain adaptation for speech emotion recognition,”
IEEE Signal Proc. Letters, vol. 21, no. 9, pp. 1068–1072, 2014.
[15] C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee,
A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, “Anal-
ysis of emotion recognition using facial expressions, speech and
multimodal information,” in Proceedings of the 6th international
conference on Multimodal interfaces. ACM, 2004, pp. 205–211.
[16] M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. Tor-
res Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic, “Avec
2016: Depression, mood, and emotion recognition workshop and
challenge,” in Proceedings of the 6th International Workshop on
Audio/Visual Emotion Challenge. ACM, 2016, pp. 3–10.
[17] P. Juslin and K. Scherer, The New Handbook of Methods in Non-
verbal Behavior Research. Oxford: Oxford University Press.,
2005, ch. 3. Vocal Expression of Affect, pp. 65–135.
[18] D. Bone, M. P. Black, C.-C. Lee, M. E. Williams, P. Levitt, S. Lee,
and S. Narayanan, “Spontaneous-Speech Acoustic-Prosodic Fea-
tures of Children with Autism and the Interacting Psychologist.”
in Proceedings of Interspeech, 2012, pp. 1043–1046.
[19] ——, “The Psychologist as an Interlocutor in Autism Spectrum
Disorder Assessment: Insights from a Study of Spontaneous
Prosody,” Journal of Speech, Language, and Hearing Research,
vol. 57, pp. 1162–1177, 2014.
[20] D. Bone, C.-C. Lee, T. Chaspari, M. Black, M. Williams, S. Lee,
P. Levitt, and S. Narayanan, “Acoustic-prosodic, turn-taking, and
language cues in child-psychologist interactions for varying social
demand,” in Proceedings of Interspeech, 2013.
[21] D. Bone, M. P. Black, A. Ramakrishna, R. Grossman,
and S. Narayanan, “Acoustic-prosodic correlates of ‘awkward’
prosody in story retellings from adolescents with autism,” in Pro-
ceedings of Interspeech, 2015.
[22] D. Bone, S. Bishop, R. Gupta, S. Lee, and S. Narayanan,
“Acoustic-prosodic and turn-taking features in interactions with
children with neurodevelopmental disorders,” Proceedings of In-
terspeech, pp. 1185–1189, 2016.
[23] M. P. Black, D. Bone, M. E. Williams, P. Gorrindo, P. Levitt, and
S. S. Narayanan, “The USC CARE Corpus: Child-Psychologist
Interactions of Children with Autism Spectrum Disorders,” in
Proceedings of Interspeech, 2011.
[24] J. J. Diehl, D. Watson, L. Bennetto, J. McDonough, and C. Gun-
logson, “An Acoustic Analysis of Prosody in High-Functioning
Autism,” Applied Psycholinguistics, vol. 30, pp. 385–404, 2009.
[25] R. B. Grossman, L. R. Edelson, and H. Tager-Flusberg, “Emo-
tional facial and vocal expressions during story retelling by chil-
dren and adolescents with high-functioning autism,” Journal of
Speech, Language, and Hearing Research, vol. 56, no. 3, pp.
1035–1044, 2013.
[26] C. Baltaxe, J. Q. Simmons, and E. Zee, “Intonation patterns in
normal, autistic and aphasic children,” in Proceedings of the Tenth
International Congress of Phonetic Sciences. Foris Publications
Dordrecht, The Netherlands, 1984, pp. 713–718.
[27] C. Schubert, M. Lambertz, R. Nelesen, W. Bardwell, J.-B. Choi,
and J. Dimsdale, “Effects of stress on heart rate complexity – a
comparison between short-term and chronic stress,” Biological
psychology, vol. 80, no. 3, pp. 325–332, 2009.
[28] C. Kirschbaum, K.-M. Pirke, and D. H. Hellhammer, “The ‘trier
social stress test’–a tool for investigating psychobiological stress
responses in a laboratory setting,” Neuropsychobiology, vol. 28,
no. 1-2, pp. 76–81, 1993.
[29] N. Morgan and E. Fosler-Lussier, “Combining multiple estimators
of speaking rate,” in ICASSP, vol. 2. IEEE, 1998, pp. 729–732.
[30] D. Wang and S. S. Narayanan, “Robust speech rate estimation for
spontaneous speech,” IEEE Transactions on Audio, Speech, and
Language Processing, vol. 15, no. 8, pp. 2190–2201, 2007.
[31] N. H. De Jong and T. Wempe, “Praat script to detect syllable nu-
clei and measure speech rate automatically,” Behavior Research Methods, vol. 41, no. 2, pp. 385–390, 2009.
[32] P. Boersma, “Praat, a system for doing phonetics by computer,”
Glot International, vol. 5, no. 9/10, pp. 341–345, 2001.
[33] C. De Looze and D. Hirst, “The ome (octave-median) scale: A
natural scale for speech prosody,” in Proceedings of the 7th Inter-
national Conference on Speech Prosody (SP7), 2014.
[34] T. F. of the European Society of Cardiology et al., “Heart rate
variability: standards of measurement, physiological interpretation, and clinical use,” European Heart Journal, vol. 17, pp. 354–381, 1996.
[35] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin,
“Liblinear: A library for large linear classification,” The Journal
of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[36] L. D. Shriberg, R. Paul, J. L. McSweeny, A. Klin, D. J. Cohen,
and F. R. Volkmar, “Speech and Prosody Characteristics of Ado-
lescents and Adults with High-Functioning Autism and Asperger
Syndrome,” JSLHR, vol. 44, pp. 1097–1115, 2001.
Article
Speech and prosody-voice profiles for 15 male speakers with High-Functioning Autism (HFA) and 15 male speakers with Asperger syndrome (AS) were compared to one another and to profiles for 53 typically developing male speakers in the same 10- to 50-years age range. Compared to the typically developing speakers, significantly more participants in both the HFA and AS groups had residual articulation distortion errors, uncodable utterances due to discourse constraints, and utterances coded as inappropriate in the domains of phrasing, stress, and resonance. Speakers with AS were significantly more voluble than speakers with HFA, but otherwise there were few statistically significant differences between the two groups of speakers with pervasive developmental disorders. Discussion focuses on perceptual-motor and social sources of differences in the prosody-voice findings for individuals with Pervasive Developmental Disorders as compared with findings for typical speakers, including comment on the grammatical, pragmatic, and affective aspects of prosody.