
The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights From a Study of Spontaneous Prosody

American Speech-Language-Hearing Association
Journal of Speech, Language, and Hearing Research

JSLHR
Research Article
The Psychologist as an Interlocutor in
Autism Spectrum Disorder Assessment:
Insights From a Study of
Spontaneous Prosody
Daniel Bone (a), Chi-Chun Lee (a), Matthew P. Black (a), Marian E. Williams (b), Sungbok Lee (a), Pat Levitt (c,d), and Shrikanth Narayanan (a)
Purpose: The purpose of this study was to examine
relationships between prosodic speech cues and autism
spectrum disorder (ASD) severity, hypothesizing a mutually
interactive relationship between the speech characteristics
of the psychologist and the child. The authors objectively
quantified acoustic-prosodic cues of the psychologist and
of the child with ASD during spontaneous interaction,
establishing a methodology for future large-sample analysis.
Method: Speech acoustic-prosodic features were
semiautomatically derived from segments of semistructured
interviews (Autism Diagnostic Observation Schedule, ADOS;
Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with
28 children who had previously been diagnosed with ASD.
Prosody was quantified in terms of intonation, volume, rate,
and voice quality. Research hypotheses were tested via
correlation as well as hierarchical and predictive regression
between ADOS severity and prosodic cues.
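The correlation and predictive-regression step of such an analysis can be sketched as follows. This is an illustrative reconstruction only, using synthetic stand-in data that matches the study's sample size (28 children) but none of its actual values or code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins: ADOS severity scores and one prosodic cue
# (e.g., a turn-end pitch-slope summary) for 28 children.
severity = rng.integers(4, 11, size=28).astype(float)
cue = 0.5 * severity + rng.normal(0.0, 1.0, size=28)

# Correlation between rated severity and the prosodic cue
r, p = stats.pearsonr(severity, cue)

# Simple predictive regression: severity from the cue
slope, intercept, r_val, p_val, stderr = stats.linregress(cue, severity)
predicted = intercept + slope * cue
print(f"Pearson r = {r:.2f} (p = {p:.3g})")
```

The hierarchical step in the paper would add predictor blocks incrementally and compare explained variance; the sketch above shows only the single-predictor case.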
Results: Automatically extracted speech features
demonstrated prosodic characteristics of dyadic interactions.
As rated ASD severity increased, both the psychologist and
the child demonstrated effects for turn-end pitch slope, and
both spoke with atypical voice quality. The psychologist's
acoustic cues predicted the child's symptom severity better
than did the child's acoustic cues.
Conclusion: The psychologist, acting as evaluator and
interlocutor, was shown to adjust his or her behavior in
predictable ways based on the child's social-communicative
impairments. The results support future study of speech
prosody of both interaction partners during spontaneous
conversation, while using automatic computational methods
that allow for scalable analysis on much larger corpora.
Key Words: autism spectrum disorder, children, prosody,
social communication, assessment, dyadic interaction
Human social interaction necessitates that each
participant continually perceive, plan, and express
multimodal pragmatic and affective cues. Thus, a
person's ability to interact effectively may be compromised
when there is an interruption in any facet of this perception-production loop. Autism spectrum disorder (ASD) is a
developmental disorder defined clinically by impaired social
reciprocity and communication, jointly referred to as social
affect (Gotham, Risi, Pickles, & Lord, 2007), as well as
by restricted, repetitive behaviors and interests (American
Psychiatric Association, 2000).
Speech prosody, which refers to the manner in which
a person utters a phrase to convey affect, mark a communicative
act, or disambiguate meaning, plays a critical role in
social reciprocity. A central role of prosody is to enhance
communication of intent and, thus, enhance conversational
quality and flow. For example, a rising intonation can indicate
a request for response, whereas a falling intonation can
indicate finality (Cruttenden, 1997). Prosody can also be used to
indicate affect (Juslin & Scherer, 2005) or attitude (Uldall,
1960). Furthermore, speech prosody has been associated
with social-communicative behaviors such as eye contact in
children (Furrow, 1984).
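A turn-end pitch slope of the kind analyzed in this study can be estimated by fitting a line to the final stretch of a fundamental-frequency (f0) contour: a positive slope suggests rising intonation, a negative slope falling intonation. A minimal sketch, with a hypothetical frame rate and window length (the paper's exact procedure may differ):

```python
import numpy as np

def turn_end_pitch_slope(f0_hz, frame_rate_hz=100.0, window_s=0.5):
    """Slope (Hz/s) of a line fit to the final window of an f0 contour.
    Positive suggests rising intonation (e.g., inviting a response);
    negative suggests falling intonation (finality). Unvoiced frames
    are assumed to be NaN and are dropped before fitting."""
    n = int(window_s * frame_rate_hz)
    tail = np.asarray(f0_hz, dtype=float)[-n:]
    t = np.arange(tail.size) / frame_rate_hz
    voiced = ~np.isnan(tail)
    slope, _ = np.polyfit(t[voiced], tail[voiced], 1)
    return slope

# Toy contour: 1 s of f0 frames at 100 frames/s, rising 200 -> 250 Hz
rising = np.linspace(200.0, 250.0, 100)
print(turn_end_pitch_slope(rising))  # ~50.5 Hz/s over the final 0.5 s
```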
(a) Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles
(b) University Center for Excellence in Developmental Disabilities, Keck School of Medicine of University of Southern California and Children's Hospital Los Angeles
(c) Keck School of Medicine of University of Southern California
(d) Children's Hospital Los Angeles
Correspondence to Daniel Bone: dbone@usc.edu
Editor: Jody Kreiman
Associate Editor: Megha Sundara
Received March 14, 2013
Revision received August 25, 2013
Accepted September 5, 2013
DOI: 10.1044/2014_JSLHR-S-13-0062
Disclosure: The authors have declared that no competing interests existed at the
time of publication.
Journal of Speech, Language, and Hearing Research, Vol. 57, 1162-1177, August 2014. © American Speech-Language-Hearing Association
... These results demonstrate how adult speech prosody may be related to social and/or communicative abilities in children and highlight the importance of understanding the nuanced relationship between parent prosody and their children's outcomes. Further work suggests that an adult's (i.e., psychologist's) speech prosody may change depending on a child's autism characteristics (Bone et al., 2012, 2014). More specifically, measures of the psychologists' vocal quality and variation (i.e., jitter and shimmer) were found to be predictive of the children's communicative abilities. ...
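Jitter and shimmer, the voice-quality measures mentioned in this excerpt, are commonly defined as normalized cycle-to-cycle variation in glottal period and peak amplitude. A minimal sketch of the "local" variants (simplified relative to, e.g., Praat's full definitions), using hypothetical cycle-level measurements:

```python
import numpy as np

def jitter_local(periods_s):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, normalized by the mean period."""
    p = np.asarray(periods_s, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_local(amplitudes):
    """Local shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, normalized by the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Toy cycle-level measurements (hypothetical values, not study data)
periods = [0.0050, 0.0052, 0.0049, 0.0051, 0.0050]  # ~200 Hz voice
amps = [0.80, 0.78, 0.82, 0.79, 0.81]
print(f"jitter  = {jitter_local(periods):.2%}")  # ~3.97%
print(f"shimmer = {shimmer_local(amps):.2%}")    # ~3.44%
```

In practice the period and amplitude sequences come from a pitch tracker over voiced regions; higher values of both measures are associated with perceptually rough or breathy voice.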
... Interestingly, the prosodic features of the psychologists' speech were more predictive of a child's prosodic behaviours than the measure of the child's autism characteristics (Bone et al., 2012). In a subsequent study, it was found that psychologists demonstrated higher variability in their volume and pitch dynamics when interacting with autistic children, suggesting that they may be modifying their voice to match that of the child (Bone et al., 2014). Further, examining the amount of speech produced by the psychologists during administration of the Autism Diagnostic Observation Schedule (ADOS; Emotions and Social Difficulties subtests) revealed that psychologists spoke more when interacting with autistic children who had higher scores on the ADOS (Bone et al., 2014). ...
... In a subsequent study, it was found that psychologists demonstrated higher variability in their volume and pitch dynamics when interacting with autistic children, suggesting that they may be modifying their voice to match that of the child (Bone et al., 2014). Further, examining the amount of speech produced by the psychologists during administration of the Autism Diagnostic Observation Schedule (ADOS; Emotions and Social Difficulties subtests) revealed that psychologists spoke more when interacting with autistic children who had higher scores on the ADOS (Bone et al., 2014). However, we recognize that autistic children may be less comfortable discussing emotions and social difficulties than non-autistic children, which may have influenced these findings. ...
Article
Full-text available
Purpose: Autistic individuals often face challenges perceiving and expressing emotions, potentially stemming from differences in speech prosody. Here we explore how autism diagnoses between groups, and measures of social competence within groups may be related to, first, children’s speech characteristics (both prosodic features and amount of spontaneous speech), and second, to these two factors in mothers’ speech to their children. Methods: Autistic (n = 21) and non-autistic (n = 18) children, aged 7–12 years, participated in a Lego-building task with their mothers, while conversational speech was recorded. Mean F0, pitch range, pitch variability, and amount of spontaneous speech were calculated for each child and their mother. Results: The results indicated no differences in speech characteristics across autistic and non-autistic children, or across their mothers, suggesting that conversational context may have large effects on whether differences between autistic and non-autistic populations are found. However, variability in social competence within the group of non-autistic children (but not within autistic children) was predictive of children’s mean F0, pitch range and pitch variability. The amount of spontaneous speech produced by mothers (but not their prosody) predicted their autistic children’s social competence, which may suggest a heightened impact of scaffolding for mothers of autistic children. Conclusion: Together, results suggest complex interactions between context, social competence, and adaptive parenting strategies in driving prosodic differences in children’s speech.
... Several studies have shown the promising potential of analyzing reciprocal features of the conversation, such as turn-taking [40,41]. Finally, prosodic features of the adult interacting with the child have also been shown to be very informative, as caregivers or psychologists tend to attune their prosody to the child that they are addressing and align the complexity of their verbal production to the child's language structure [42,43]. As such, one could hypothesize that the entire audio soundtrack of a social interaction contains a rich set of information about both the characteristics of the child's vocal production as well as a signature of the mutual vocal & verbal choreography that has the potential to represent a powerful autism screening tool. ...
... Indeed, social interaction is a highly dynamic phenomenon; what each agent does or does not do will inevitably affect its quality. Bone and colleagues [42] have shown that trained psychologists attune their prosody to the severity of the child's autistic symptoms while administering the ADOS with children aged 5-17 years. The authors even reported that the psychologist's prosodic cues better predicted the child's autistic symptom severity than the child's own prosody. ...
Article
Full-text available
A timely diagnosis of autism is paramount to allow early therapeutic intervention in preschoolers. Deep Learning tools have been increasingly used to identify specific autistic symptoms. But they also offer opportunities for broad automated detection of autism at an early age. Here, we leverage a multi-modal approach by combining two neural networks trained on video and audio features of semi-standardized social interactions in a sample of 160 children aged 1 to 5 years old. Our ensemble model performs with an accuracy of 82.5% (F1 score: 0.816, Precision: 0.775, Recall: 0.861) for screening Autism Spectrum Disorders (ASD). Additional combinations of our model were developed to achieve higher specificity (92.5%, i.e., few false negatives) or sensitivity (90%, i.e. few false positives). Finally, we found a relationship between the neural network modalities and specific audio versus video ASD characteristics, bringing evidence that our neural network implementation was effective in taking into account different features that are currently standardized under the gold standard ASD assessment.
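As a quick sanity check on the figures quoted above, the F1 score is the harmonic mean of precision and recall, and the reported values are internally consistent:

```python
# F1 as the harmonic mean of precision and recall; plugging in the
# precision and recall quoted above recovers the reported F1 score.
precision, recall = 0.775, 0.861
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.816
```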
... Previously, conversational speech features have been used to identify differences between typically developing children and children on the autism spectrum. Vocal entrainment measures [7]-[10] and prosodic modulations by the clinician [11] have shown statistically significant correlations to clinical autism scores. While these works provide indicators of ASD in speech, they fail to offer a holistic view of the diagnostic session, as they often ignore the dynamics available through the co-occurring visual modality (such as repetitive behaviors, atypical eye gaze, and gesture patterns [12]-[14]). ...
Preprint
Full-text available
Clinical videos in the context of Autism Spectrum Disorder are often long-form interactions between children and caregivers/clinical professionals, encompassing complex verbal and non-verbal behaviors. Objective analyses of these videos could provide clinicians and researchers with nuanced insights into the behavior of children with Autism Spectrum Disorder. Manually coding these videos is a time-consuming task and requires a high level of domain expertise. Hence, the ability to capture these interactions computationally can augment the manual effort and enable supporting the diagnostic procedure. In this work, we investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions. We propose a unified methodology to combine multiple modalities by using large language models as reasoning agents. We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection. We find that the proposed multimodal pipeline provides robustness to modality-specific limitations and improves performance on the clinical video analysis compared to unimodal settings.
... Differences between children with ASD and TD children in other acoustic parameters have also been reported in other studies. For instance, Patel et al. [21] reported slower speech rate for autistic individuals, while Bone et al. [22] reported a positive association between ASD severity and median f0 slope as well as atypical voice quality measures like jitter and shimmer. It is worth mentioning, however, that there are also studies reporting no significant differences between the speech rate of individuals with and without ASD [23,24]. ...
Article
Full-text available
Abnormal speech prosody has been widely reported in individuals with autism. Many studies on children and adults with autism spectrum disorder speaking a non-tonal language showed deficits in using prosodic cues to mark focus. However, focus marking by autistic children speaking a tonal language is rarely examined. Cantonese-speaking children may face additional difficulties because tonal languages require them to use prosodic cues to achieve multiple functions simultaneously such as lexical contrasting and focus marking. This study bridges this research gap by acoustically evaluating the use of Cantonese speech prosody to mark information structure by Cantonese-speaking children with and without autism spectrum disorder. We designed speech production tasks to elicit natural broad and narrow focus production among these children in sentences with different tone combinations. Acoustic correlates of prosodic focus marking like f0, duration and intensity of each syllable were analyzed to examine the effect of participant group, focus condition and lexical tones. Our results showed differences in focus marking patterns between Cantonese-speaking children with and without autism spectrum disorder. The autistic children not only showed insufficient on-focus expansion in terms of f0 range and duration when marking focus, but also produced less distinctive tone shapes in general. There was no evidence that the prosodic complexity (i.e. sentences with single tones or combinations of tones) significantly affected focus marking in these autistic children and their typically-developing (TD) peers.
... Although descriptions of voice quality appear in the literature (e.g., "harsh," "nasal," or "hoarse"), voice quality measures are remarkably absent from the study of vocal atypicality in ASD. Two studies (Bone et al., 2014; Kissine & Geelhand, 2019) have investigated jitter, that is, cycle-to-cycle changes in the fundamental period, which is associated with the perceptual qualities of breathiness and hoarseness (Eskenazi et al., 1990; Wolfe et al., 1997). Other candidate acoustic features are shimmer (Kissine & Geelhand, 2019), which quantifies cycle-to-cycle fluctuations in the amplitude of the waveform and has been related to both breathiness and hoarseness (Wolfe et al., 1997); harmonics-to-noise ratio, which quantifies the relative amount of energy in harmonic portions of the spectrum with other, "noise" energy, and has been related to hoarseness (Yumoto et al., 1982); and H1-H2, the relative amplitudes of the first two harmonics, which is also linked to the perception of breathiness (Hillenbrand & Houde, 1994; Klatt & Klatt, 1990; Kreiman & Gerratt, 2010). ...
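Of the spectral measures listed in this excerpt, H1-H2 is straightforward to compute from a voiced frame. A minimal sketch (dB difference between the amplitudes of the first two harmonics), using a synthetic two-harmonic frame whose harmonics fall exactly on FFT bins; real speech requires pitch tracking and formant-correction steps this sketch omits:

```python
import numpy as np

def h1_h2_db(frame, sr, f0):
    """H1-H2 in dB: amplitude of the first harmonic relative to the
    second, read from the magnitude spectrum near f0 and 2*f0.
    Larger values are associated with perceived breathiness."""
    windowed = frame * np.hanning(frame.size)
    spec = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(frame.size, 1.0 / sr)
    h1 = spec[np.argmin(np.abs(freqs - f0))]
    h2 = spec[np.argmin(np.abs(freqs - 2.0 * f0))]
    return 20.0 * np.log10(h1 / h2)

# Synthetic frame: H1 has twice the amplitude of H2, and f0 = 250 Hz
# places both harmonics exactly on FFT bins for a 4096-sample frame.
sr, f0 = 16000, 250.0
t = np.arange(4096) / sr
frame = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
print(h1_h2_db(frame, sr, f0))  # ~6.02 dB, i.e., 20*log10(2)
```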
... In autistic individuals, acoustical alterations include differences in pitch variability [144-148], harshness of speech tone [149] and hypernasality [150-152]. As in psychosis, longer pauses [153-155] have been detected. ...
Article
Full-text available
Although the clinical phenotypes of autism and schizophrenia spectrum disorders are considered distinct, there are substantial areas of overlap between both diagnoses. The two conditions co-occur disproportionately often, and research has begun to explore the impact of their simultaneous occurrence on clinically relevant outcomes such as depression and quality of life. In this Review, we describe what is known about the rates of co-occurrence between autism and schizophrenia spectrum disorders, and delineate their unique and shared neuropsychological features regarding sensory processing, cognitive functioning (including language production) and social interactions. Despite this increasing body of literature, critical questions remain about how symptoms of autism and psychosis can best be differentiated, and which treatment options are best suited for people with co-occurring symptoms. We end by providing a research road map to direct efforts towards filling these knowledge gaps.
Article
Full-text available
Natural speech plays a pivotal role in communication and interactions between human beings. The prosody of natural speech, due to its high ecological validity and sensitivity, has been acoustically analyzed and more recently utilized in machine learning to identify individuals with autism spectrum disorders (ASDs). In this meta-analysis, we evaluated the findings of empirical studies on acoustic analysis and machine learning techniques to provide statistically supporting evidence for adopting natural speech prosody for ASD detection. Using a random-effects model, the results observed moderate-to-large pooled effect sizes for pitch-related parameters in distinguishing individuals with ASD from their typically developing (TD) counterparts. Specifically, the standardized mean difference (SMD) values for pitch mean, pitch range, pitch standard deviation, and pitch variability were 0.3528, 0.6744, 0.5735, and 0.5137, respectively. However, the differences between the two groups in temporal features could be unreliable, as the SMD values for duration and speech rate were only 0.0738 and −0.0547. Moderator analysis indicated task types were unlikely to influence the final results, whereas age groups showed a moderating role in pooling pitch range differences. Furthermore, promising accuracy rates on ASD identification were shown in our analysis of multivariate machine learning studies, indicating averaged sensitivity and specificity of 75.51% and 80.31%, respectively. In conclusion, these findings shed light on the efficacy of natural prosody in identifying ASD and offer insights for future investigations in this line of research.
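The standardized mean differences (SMDs) pooled in meta-analyses like this one are typically computed as Cohen's d with a pooled standard deviation; a minimal sketch with hypothetical (non-study) numbers:

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using a pooled SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical pitch-range samples in semitones (not study data)
asd = [14.0, 16.0, 18.0, 15.0, 17.0]
td = [12.0, 13.0, 14.0, 12.5, 13.5]
print(round(cohens_d(asd, td), 2))  # 2.4
```

A random-effects meta-analysis then weights each study's SMD by its inverse variance plus a between-study variance term; the sketch shows only the per-study effect size.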
Article
Full-text available
In an earlier study, we evaluated the effectiveness of several acoustic measures in predicting breathiness ratings for sustained vowels spoken by nonpathological talkers who were asked to produce nonbreathy, moderately breathy, and very breathy phonation (Hillenbrand, Cleveland, & Erickson, 1994). The purpose of the present study was to extend these results to speakers with laryngeal pathologies and to conduct tests using connected speech in addition to sustained vowels. Breathiness ratings were obtained from a sustained vowel and a 12-word sentence spoken by 20 pathological and 5 nonpathological talkers. Acoustic measures were made of (a) signal periodicity, (b) first harmonic amplitude, and (c) spectral tilt. For the sustained vowels, a frequency domain measure of periodicity provided the most accurate predictions of perceived breathiness, accounting for 92% of the variance in breathiness ratings. The relative amplitude of the first harmonic and two measures of spectral tilt correlated moderately with breathiness ratings. For the sentences, both signal periodicity and spectral tilt provided accurate predictions of breathiness ratings, accounting for 70%-85% of the variance.
Article
Full-text available
A companion paper includes rationale for the use of 10 metrics of articulation competence in conversational speech (Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997). The present paper reports lifespan reference data for these measures using records from a total of 836 3– to 40+-year-old speakers with normal and disordered speech. The reference data are subdivided by diagnostic classification based on extensions to an instrument titled the Speech Disorders Classification System (SDCS; Shriberg, 1993). Appendices provide procedural information on the SDCS and statistical rationale for the reference data.
Chapter
For many years the Handbook of Methods in Nonverbal Behavior Research (Scherer & Ekman, 1982) has been an invaluable text for researchers looking for methods to study nonverbal behavior and the expression of affect. A successor to this essential text, The New Handbook of Methods in Nonverbal Behavior Research is a substantially updated volume with 90% new material. It includes chapters on coding and methodological issues for a variety of areas in nonverbal behavior: facial actions, vocal behavior, and body movement. Issues relevant to judgment studies, methodology, reliability, analyses, etc. have also been updated. The topics are broad and include specific information about methodology and coding strategies in education, psychotherapy, deception, nonverbal sensitivity, and marital and group behavior. There is also a chapter detailing specific information on the technical aspects of recording the voice and face, and specifically in relation to deception studies. This volume will be valuable for both new researchers and those already working in the fields of nonverbal behavior, affect expression, and related topics. It will play a central role in further refining research methods and coding strategies, allowing a comparison of results from various laboratories where research on nonverbal behavior is being conducted. This will advance research in the field and help to coordinate results so that a more comprehensive understanding of affect expression can be developed.
Book
When originally published in 1986, this book was the first to survey intonation in all its aspects, both in English and universally. In this updated edition, while the basic descriptive facts of the form and use of intonation are presented in the British nuclear tone tradition, there is nevertheless extensive comparison with other theoretical frameworks, in particular with the ToBI framework, which has become widespread in the United States. In this new edition Alan Cruttenden has expanded the sections on historical background, different theoretical approaches and sociolinguistic variation. After introductory chapters on the physiology and acoustics of pitch, he describes in detail the forms and functions of intonation in English and discusses the sociolinguistic and dialectal variations in intonation. The concluding chapter provides an overview of the state of the art in intonational studies.
Article
This paper describes experiments in which Osgood's semantic differential was used to measure the attitude of listeners to a variety of intonation patterns. 16 pitch contours were applied by synthesis to recordings of four sentences and listeners were asked to rate the patterns with respect to 10 scales of the type BORED/ INTERESTED, POLITE/RUDE. From the results it was possible to draw some conclusions about the relative effectiveness of the chosen scales and about some general features of the intonation patterns which had particular weight with respect to three factors: Pleasant/Unpleasant, Interest/Lack of Interest and Authoritative/ Submissive.
Article
Impaired social communication and social reciprocity are the primary phenotypic distinctions between autism spectrum disorders (ASD) and other developmental disorders. We investigate quantitative conversational cues in child-psychologist interactions using acoustic-prosodic, turn-taking, and language features. Results indicate the conversational quality degraded for children with higher ASD severity, as the child exhibited difficulties conversing and the psychologist varied her speech and language strategies to engage the child. When interacting with children with increasing ASD severity, the psychologist exhibited higher prosodic variability, increased pausing, more speech, atypical voice quality, and less use of conventional conversational cues such as assents and non-fluencies. Children with increasing ASD severity spoke less, spoke slower, responded later, had more variable prosody, and used personal pronouns, affect language, and fillers less often. We also investigated the predictive power of features from interaction subtasks with varying social demands placed on the child. We found that acoustic-prosodic and turn-taking features were more predictive during higher social demand tasks, and that the most predictive features vary with context of interaction. We also observed that psychologist language features may be robust to the amount of speech in a subtask, showing significance even when the child is participating in minimal-speech, low social-demand tasks.