Conference Paper

Engagement detection for children with Autism Spectrum Disorder


Abstract

Children with Autism Spectrum Disorder (ASD) face several difficulties in social communication. Hence, analyzing their social interactions can provide insight into their social and cognitive skills. In this paper, we investigate the degree of engagement of children in interactions with their parents. We explore features derived from both participants, including acoustic, linguistic, and dialogue act features, and also investigate the effect of visual cues. We experimented on the task of engagement detection using video-recorded sessions of parent-child interactions with both typically developing (TD) and ASD children. Results show that engagement is easier to predict for TD children than for ASD children, and that the parent's actions/movements are better predictors of the child's degree of engagement.
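To make the setup concrete, here is a minimal sketch of the kind of engagement classifier the abstract describes, assuming pre-extracted, fused acoustic/linguistic/dialogue-act feature vectors per interaction segment and an SVM back-end (one of the citing snippets below notes an SVM classifier was used); the feature dimensions and data are placeholders, not the paper's actual setup:

```python
# Hypothetical sketch: engagement detection from fused parent/child features.
# Feature extraction (acoustics, dialogue-act tags, etc.) is assumed to have
# already produced one fixed-length vector per interaction segment.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))     # placeholder: 200 segments x 64 fused features
y = rng.integers(0, 2, size=200)   # placeholder: engaged (1) vs. not engaged (0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"5-fold F1: {scores.mean():.2f} +/- {scores.std():.2f}")
```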


... The 21 included studies were published between 2007 and 2020, with most of them (n = 13/21; 62%) published after 2016 (33-45). The sample size of the included studies ranged from 2 to 35 children and/or youth, with a mean age up to 20.8 years (see Table 1). ...
... Of the 12 studies that reported on child and/or youth gender, 11 studies (92%) included male-majority samples (33, 34, 38-41, 45, 46, 49, 50, 53). The vast majority of the included studies focused on children and/or youth with autism spectrum disorder (ASD) (n = 19/21; 91%) (33, 34, 37-53), followed by single instances of studies including children with Down syndrome (5%) (50), children with a visual impairment (5%) (35), and children with cerebral palsy (5%) (36). None of the studies reported on family socio-economic status, family income, or child or youth ethnicity [Hispanic, non-Hispanic]. ...
... Liu et al. (53) collected youth and caregiver report data (i.e., annotations done by youth and caregiver), in addition to therapist annotations; however, only therapist annotations together with collected physiological indices (e.g., heart sound) were included in the predictive model of participation. These annotated observations were paired with data collected from facial, skeleton, or eye recognition tools (n = 9/20; 45%) (33, 37, 40-42, 44, 45, 50, 52), sensors (n = 6/20; 30%) (34, 40, 44, 45, 51, 53), EEG (n = 4/20; 20%) (34, 36, 39, 46), via distance estimates (n = 3/20; 15%) (47-49), and/or other tools (n = 6/20; 30%) such as microphones or electrodes (34, 38, 40, 43-45). To capture or predict participation, 18 of the 21 participation assessment approaches used multiple types of AI (86%) (34-50, 52). ...
Article
Full-text available
Background There is increased interest in using artificial intelligence (AI) to provide participation-focused pediatric re/habilitation. Existing reviews on the use of AI in participation-focused pediatric re/habilitation focus on interventions and do not screen articles based on their definition of participation. AI-based assessments may help reduce provider burden and can support operationalization of the construct under investigation. To extend knowledge of the landscape on AI use in participation-focused pediatric re/habilitation, a scoping review on AI-based participation-focused assessments is needed. Objective To understand how the construct of participation is captured and operationalized in pediatric re/habilitation using AI. Methods We conducted a scoping review of literature published in PubMed, PsycInfo, ERIC, CINAHL, IEEE Xplore, ACM Digital Library, ProQuest Dissertation and Theses, ACL Anthology, AAAI Digital Library, and Google Scholar. Documents were screened by 2–3 independent researchers following a systematic procedure and using the following inclusion criteria: (1) focuses on capturing participation using AI; (2) includes data on children and/or youth with a congenital or acquired disability; and (3) published in English. Data from included studies were extracted [e.g., demographics, type(s) of AI used], summarized, and sorted into categories of participation-related constructs. Results Twenty-one of 3,406 documents were included. Included assessment approaches mainly captured participation through annotated observations (n = 20; 95%), were administered in person (n = 17; 81%), and applied machine learning (n = 20; 95%) and computer vision (n = 13; 62%). None integrated the child or youth perspective and only one included the caregiver perspective. All assessment approaches captured behavioral involvement, and none captured emotional or cognitive involvement or attendance. Additionally, 24% (n = 5) of the assessment approaches captured participation-related constructs like activity competencies and 57% (n = 12) captured aspects not included in contemporary frameworks of participation. Conclusions Main gaps for future research include lack of: (1) research reporting on common demographic factors and including samples representing the population of children and youth with a congenital or acquired disability; (2) AI-based participation assessment approaches integrating the child or youth perspective; (3) remotely administered AI-based assessment approaches capturing both child or youth attendance and involvement; and (4) AI-based assessment approaches aligning with contemporary definitions of participation.
... The study investigates how acoustic, linguistic, and dialogue act features obtained from both participants interact with one another; the impact of visual signals is also examined [4]. ...
Article
Autism spectrum disorder (ASD) is a behavioural condition that affects the child's social interaction, communication, and behaviour. Early identification of ASD is critical for effective and timely therapy. This study presents an enhanced prediction model for ASD based on features extracted from face images. Mallat's multi-resolution algorithm is employed for extracting facial features. Two distance-based classifiers, a Euclidean Distance Classifier (EDC) and an Absolute Distance Classifier (ADC), are employed for ASD prediction. The proposed system is evaluated on face images of autistic and non-autistic children obtained from the Kaggle data repository. A total of 2,940 facial images (1,470 autistic and 1,470 non-autistic) are employed for performance analysis. Experimental results show that the proposed ASD prediction system provides promising results, with an accuracy of 97.01% for the EDC and 96.87% for the ADC classifier. Keywords: Autism spectrum disorder, Computer aided diagnosis, Machine learning, Image based diagnosis, Prediction system.
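The two classifiers named in this abstract are simple nearest-class-mean rules under different metrics. A minimal sketch, with placeholder vectors standing in for the Mallat wavelet features:

```python
# Hypothetical sketch of the two distance-based classifiers named in the
# abstract: nearest class-mean under Euclidean (EDC) and absolute/L1 (ADC)
# distance. Wavelet feature extraction is replaced by placeholder vectors.
import numpy as np

def class_means(X, y):
    # Mean feature vector per class (autistic vs. non-autistic).
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(X, means, metric):
    labels = list(means)
    if metric == "euclidean":                      # EDC
        d = [np.linalg.norm(X - means[c], axis=1) for c in labels]
    else:                                          # ADC (absolute / L1)
        d = [np.abs(X - means[c]).sum(axis=1) for c in labels]
    return np.array(labels)[np.argmin(d, axis=0)]  # closest class mean wins

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 32))
y_train = rng.integers(0, 2, size=100)
means = class_means(X_train, y_train)
print(predict(X_train[:5], means, "euclidean"))
```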
... To ensure sufficient training volume, progressive sampling was used in both cases. After evaluating multiple machine learning algorithms, the author chose Random Forests for their robustness against overfitting [30]. ...
... However, those cues do not seem to appear as directly in children with ASD [100]. As a result, engagement is easier to detect in typically developing children than in children with ASD [101]. Our proposed approach therefore investigates the role of body movement, particularly the posture of the upper body, using 3D skeleton data as well as gaze direction. ...
Article
Full-text available
Early therapeutic intervention programs help children diagnosed with Autism Spectrum Disorder (ASD) to improve their socio-emotional and functional skills. To relieve the children’s caregivers while ensuring that the children are adequately supported in their training exercises, new technologies may offer suitable solutions. This study investigates the potential of a robotic learning assistant which is planned to monitor the children’s state of engagement and to intervene with appropriate motivational nudges when necessary. To analyze stakeholder requirements, interviews with parents as well as therapists of children with ASD were conducted. Besides a general positive attitude towards the usage of new technologies, we received some important insights for the design of the robot and its interaction with the children. One strongly accentuated aspect was the robot’s adequate and context-specific communication behavior, which we plan to address via an AI-based engagement detection system. Further aspects comprise for instance customizability, adaptability, and variability of the robot’s behavior, which should further be not too distracting while still being highly predictable.
... Uma Rani.R et al. [3] proposed a comparison of classification algorithms with statistical models on an autism dataset. Arodami Chorianopoulou et al. [5] used different modalities (audio, text, video), together with the parent's actions, in interactions of typically developing (TD) and ASD children. The moderate accuracy obtained was due to engagement being predicted largely from the parents' behaviour. ...
Article
Autism Spectrum Disorder (ASD) is a neurobiological developmental disorder characterized by impaired social interaction, stereotypic behaviours, and a lack of communication. Early detection of ASD will enhance the quality of life of the affected person. The objective of this paper is to focus on the application of various machine learning strategies to an autism dataset for diagnosing ASD. In this study, effective pre-processing techniques (one-hot encoding, splitting, and scaling) are used to standardize the dataset, and Principal Component Analysis (PCA) is applied for best feature selection. This setup is investigated with various machine learning techniques: Random Forest, SVM, Logistic Regression, KNN, and Naive Bayes. Comparatively, the effective pre-processing pipeline with the Random Forest model shows the best accuracy of 92% in diagnosing ASD, evaluated with metrics such as accuracy, precision, recall, F1-score, ROC, and error rate.
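A minimal sketch of the pipeline this abstract outlines (one-hot encoding, scaling, PCA, Random Forest), using a synthetic stand-in for the autism dataset; the column names are hypothetical:

```python
# Hypothetical sketch: one-hot encoding + scaling + PCA + Random Forest on a
# tabular ASD screening dataset. The DataFrame below is synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "age": np.random.randint(2, 12, 300),          # hypothetical columns
    "gender": np.random.choice(["m", "f"], 300),
    "score": np.random.rand(300),
    "asd": np.random.randint(0, 2, 300),           # diagnosis label
})
X, y = df.drop(columns="asd"), df["asd"]
pre = ColumnTransformer([
    ("cat", OneHotEncoder(), ["gender"]),          # one-hot encoding
    ("num", StandardScaler(), ["age", "score"]),   # scaling
])
model = Pipeline([("pre", pre), ("pca", PCA(n_components=3)),
                  ("rf", RandomForestClassifier(n_estimators=200, random_state=0))])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)                              # "splitting" = train/test split
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```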
... With engagement being an inherently internal mental state of the human interacting with the robot, observers (human or robot) have to resort to the analysis of external cues (vision, speech, audio) to estimate its level [14]. Furthermore, research results show that engagement is easier to predict for TD children than for ASD children [15]. Cues like eye-gaze, blinking, and head-pose, which are shown to be indicative of the engagement level in TD children, do not appear so directly connected with it in ASD children [16]. ...
Conference Paper
Full-text available
Estimating the engagement of children is an essential prerequisite for constructing natural Child-Robot Interaction. Especially in the case of children with Autism Spectrum Disorder, monitoring the engagement of the other party allows robots to adjust their actions according to the educational and therapeutic goals in hand. In this work, we delve into engagement estimation with a focus on children with autism spectrum disorder. We propose deep convolutional architectures for engagement estimation that outperform previous methods and explore their performance under variable conditions, in four databases depicting ASD and TD children interacting with robots or humans.
... [14-19] Chorianopoulou et al. collected structured home videos from participants and had expert annotators label the dataset with the actions, emotions, gaze fixations, utterances, and overall level of engagement in each video; this information was then used to train a classifier to identify specific engagement features that could be correlated with ASD [20]. Rudovic et al. trained a large and generalizable neural network to estimate engagement in children with ASD from different cultural backgrounds [21]. Engagement labels were manually annotated by trained individuals. ...
Preprint
Objective Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Children with ASD exhibit behavioral and social impairments, giving rise to the possibility of utilizing computational techniques to evaluate a child’s social phenotype from home videos. Methods Here, we use a mobile health application to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We utilize automated dataset annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns and (2) visual scanning methods. We compare the gaze fixation and visual scanning methods utilized by children during a 90-second gameplay video in order to identify statistically-significant differences between the two cohorts; we then train an LSTM neural network in order to determine if gaze indicators could be predictive of ASD. Results Our work identifies one statistically significant region of fixation and one significant gaze transition pattern that differ between our two cohorts during gameplay. In addition, our deep learning model demonstrates mild predictive power in identifying ASD based on coarse annotations of gaze fixations. Discussion Ultimately, our results demonstrate the utility of game-based mobile health platforms in quantifying visual patterns and providing insights into ASD. We also show the importance of automated labeling techniques in generating large-scale datasets while simultaneously preserving the privacy of participants. Our approaches can generalize to other healthcare needs.
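A minimal sketch of the kind of LSTM described here, assuming coarse per-frame gaze-region codes as input; the region vocabulary, clip length, and layer sizes are illustrative assumptions, not the authors' configuration:

```python
# Hypothetical sketch: an LSTM over per-frame gaze-fixation annotations
# (coarse region codes), predicting ASD vs. NT from a gameplay clip.
# All sequence data below is synthetic.
import torch
import torch.nn as nn

class GazeLSTM(nn.Module):
    def __init__(self, n_regions=6, emb=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(n_regions, emb)    # gaze region code per frame
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # ASD vs. NT logit

    def forward(self, regions):                    # regions: (batch, time)
        h, _ = self.lstm(self.emb(regions))
        return self.head(h[:, -1])                 # classify from last state

model = GazeLSTM()
x = torch.randint(0, 6, (8, 90))                   # 8 clips, 90 frames each
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits.squeeze(1), torch.rand(8).round())
loss.backward()
print(logits.shape)                                # torch.Size([8, 1])
```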
... In [26], wearable sensors were used to measure the electrodermal activity of the children, and a Support Vector Machine (SVM) classifier was applied to classify whether the children were engaged or not. In [27], acoustic and linguistic data were utilized to detect social engagement in conversational interactions of children with ASD and their parents, using an SVM classifier. The first in-depth study of measuring the engagement of children when interacting with social robots was proposed by Anzalone et al. [28]. ...
Article
Full-text available
The task of child engagement estimation when interacting with a social robot during a special educational procedure is studied. A multimodal machine learning-based methodology is proposed for estimating the engagement of children with learning difficulties participating in appropriately designed educational scenarios. For this purpose, visual and audio data are gathered during the child-robot interaction and processed to decide whether the child is engaged or not. Six single and three ensemble machine learning models are examined for their accuracy in providing confident decisions on in-house developed data. The conducted experiments revealed that, using multimodal data and the AdaBoost Decision Tree ensemble model, the children's engagement can be estimated with 93.33% accuracy. Moreover, an important outcome of this study is the need for explicitly defining the different engagement meanings for each scenario. The results are very promising and pave the way for research on closed-loop, human-centric special education activities using social robots.
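A minimal sketch of the best-performing model reported here, an AdaBoost ensemble of decision trees over early-fused audio-visual features; the feature contents and dimensions are placeholders:

```python
# Hypothetical sketch: AdaBoost ensemble of shallow decision trees over
# concatenated (early-fused) audio and visual feature vectors.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(150, 20))   # placeholder pose/face descriptors
X_audio = rng.normal(size=(150, 12))    # placeholder prosodic statistics
X = np.hstack([X_visual, X_audio])      # simple early fusion by concatenation
y = rng.integers(0, 2, size=150)        # engaged vs. not engaged

clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=2),
                         n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```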
... In addition, to further justify the effectiveness and robustness of the approach, we plan to evaluate it on additional large-scale emotional datasets where annotations from multiple raters are provided, such as SEWA [30]. Moreover, given that our framework can be deployed to other subjective recognition tasks, we would like to examine its generalisation properties on more tasks, such as sentiment analysis [11], personality estimation [35], and engagement detection [7]. Lastly, Bayesian learning-based approaches will be explored to model the uncertainty and learn interpretable representations of emotional instances in the future [24]. ...
Article
Full-text available
Predicting emotions automatically is an active field of research in affective computing. Considering the property of the individual's subjectivity, the label of an emotional instance is usually created based on opinions from multiple annotators. That is, the labelled instance is often accompanied by the corresponding inter-rater disagreement information, which we call here the perception uncertainty. Such uncertainty information, as shown in previous studies, can provide supplementary information for better recognition performance in such a subjective task. In this paper, we propose a multi-task learning framework that leverages the knowledge of perception uncertainty to improve the prediction performance. In particular, in our novel framework, the perception uncertainty is exploited in an explicit manner to manipulate an initial prediction dynamically, in contrast to merely estimating the emotional state and perception uncertainty simultaneously, as done in a conventional multi-task learning framework. To evaluate the feasibility and effectiveness of the proposed method, we perform extensive experiments for time- and value-continuous emotion predictions in audiovisual conversation and music listening scenarios. Compared with other state-of-the-art approaches, our approach yields remarkable performance improvements on both datasets. The obtained results indicate that integrating the perception uncertainty information can enhance the learning process.
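A minimal sketch of the core idea, a network that predicts both the emotion value and its perception uncertainty and then uses the uncertainty explicitly to adjust the initial prediction; the refinement rule shown is an illustrative assumption, not the paper's exact formulation:

```python
# Hypothetical sketch: multi-task regressor that predicts an emotion value
# and its perception uncertainty (inter-rater disagreement), then explicitly
# uses the uncertainty to refine the initial prediction.
import torch
import torch.nn as nn

class UncertaintyAwareRegressor(nn.Module):
    def __init__(self, d_in=40, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, 1)   # initial emotion prediction
        self.uncert_head = nn.Linear(hidden, 1)    # perception uncertainty
        self.refine = nn.Linear(2, 1)              # uncertainty-driven correction

    def forward(self, x):
        h = self.trunk(x)
        y0 = self.emotion_head(h)
        u = torch.relu(self.uncert_head(h))        # disagreement >= 0
        y = y0 + self.refine(torch.cat([y0, u], dim=1))  # explicit manipulation
        return y, u

model = UncertaintyAwareRegressor()
x = torch.randn(16, 40)                            # placeholder features
y, u = model(x)
# Joint loss: emotion target plus annotated inter-rater disagreement target.
loss = nn.MSELoss()(y.squeeze(1), torch.randn(16)) + \
       nn.MSELoss()(u.squeeze(1), torch.rand(16))
loss.backward()
```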
... Gaze alternation, in dyadic or group interaction, is used to assess joint attention [1]. Those cues were widely considered, for example, in understanding engagement in autistic children [10]. From the literature, we know that blind children have difficulties in detecting patterns of social interactions, while sighted people around them may have difficulties assessing where a blind child focuses her attention, since there is no visual orienting or pointing, and gaze and facial expressions are more neutral [5]. ...
Conference Paper
When developing children's interaction systems, such as serious games in an educational technology context, it is important to take into account and address the relevant cognitive and emotional experiences of the child that may influence learning outcomes. Some work has been done to analyze and automatically recognize these cognitive and affective states from nonverbal expressive behaviors. However, there is a lack of knowledge about visually impaired children and the body language they use to convey those states during learning tasks. In this paper, we present an analysis of the nonverbal expressive behaviors of both blind and low-vision children, aiming at understanding what type of body communication can be an indicator of two cognitive states: engagement and confidence. In the study we consider the data collected within the EU-ICT H2020 weDRAW Project, while children were asked to solve mathematical tasks with their body. For this dataset, we propose a list of 31 nonverbal behaviors, annotated both by visually impaired rehabilitators and naive observers. In the last part of the paper, we propose a preliminary study on automatic recognition of engagement and confidence states from 2D positional data. The classification results reach up to 0.71 (F-score) on a three-class classification task.
... The social characteristics that a robot should have when performing as a tutor were examined in [19], [20]. Specific focus is given to estimating the engagement of children with ASD interacting with adults [21] or robots [22]. A study analyzing the engagement of children participating in robot-assisted therapy can be found in [23]. ...
Preprint
Full-text available
In this work we tackle the problem of child engagement estimation while children freely interact with a robot in their room. We propose a deep-learning-based multi-view solution that takes advantage of recent developments in human pose detection. We extract the child's pose from different RGB-D cameras placed elegantly in the room, fuse the results, and feed them to a deep neural network trained for classifying engagement levels. The deep network contains a recurrent layer, in order to exploit the rich temporal information contained in the pose data. The resulting method outperforms a number of baseline classifiers and provides a promising tool for better automatic understanding of a child's attitude, interest, and attention while cooperating with a robot. The goal is to integrate this model into next-generation social robots as an attention-monitoring tool during various CRI tasks, both for Typically Developing (TD) children and children affected by autism (ASD).
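A minimal sketch of such a pipeline, assuming per-frame 3D skeleton joints from several RGB-D views, naive cross-view fusion by averaging, and a recurrent classifier; all sizes are illustrative, not the authors' architecture:

```python
# Hypothetical sketch: 3D skeleton joints from several RGB-D views are fused
# (here by simple averaging) and fed to a recurrent network that classifies
# engagement level over time.
import torch
import torch.nn as nn

N_VIEWS, T, N_JOINTS = 3, 60, 25            # cameras, frames, skeleton joints

class MultiViewEngagement(nn.Module):
    def __init__(self, n_levels=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(N_JOINTS * 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_levels)

    def forward(self, poses):                # (batch, views, time, joints, xyz)
        fused = poses.mean(dim=1)            # naive cross-view fusion
        h, _ = self.lstm(fused.flatten(2))   # (batch, time, joints*3)
        return self.head(h[:, -1])           # engagement-level logits

model = MultiViewEngagement()
batch = torch.randn(4, N_VIEWS, T, N_JOINTS, 3)  # synthetic pose sequences
print(model(batch).shape)                        # torch.Size([4, 3])
```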
... This may be particularly challenging due to the large individual and cultural heterogeneity in image data of this population. Also, most existing works on the analysis of facial cues in autism focus on eye-gaze, blinking, and head-pose [12], [29], which are shown to be a good proxy of joint attention and engagement, the lack of which is pertinent to ASC. Extracting these cues from face images is usually done using detectors specifically built for each facial cue. ...
Conference Paper
Full-text available
Many children on the autism spectrum have atypical behavioral expressions of engagement compared to their neurotypical peers. In this paper, we investigate the performance of deep learning models in the task of automated engagement estimation from face images of children with autism. Specifically, we use the video data of 30 children with different cultural backgrounds (Asia vs. Europe) recorded during a single session of a robot-assisted autism therapy. We perform a thorough evaluation of the proposed deep architectures for the target task, including within- and across-culture evaluations, as well as when using the child-independent and child-dependent settings. We also introduce a novel deep learning model, named CultureNet, which efficiently leverages the multi-cultural data when performing the adaptation of the proposed deep architecture to the target culture and child. We show that due to the highly heterogeneous nature of the image data of children with autism, the child-independent models lead to overall poor estimation of target engagement levels. On the other hand, when a small amount of data of target children is used to enhance the model learning, the estimation performance on the held-out data from those children increases significantly. This is the first time that the effects of individual and cultural differences in children with autism have empirically been studied in the context of deep learning performed directly from face images.
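A minimal sketch of the adaptation strategy described, pretraining on pooled multi-cultural data and then fine-tuning only the final layers on a small amount of target-child data; the stand-in backbone and layer sizes are assumptions, not CultureNet's actual architecture:

```python
# Hypothetical sketch: adapt a pretrained engagement model to a target
# culture/child by freezing the backbone and fine-tuning only the head
# on a few target samples.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stands in for a face CNN
head = nn.Linear(64, 1)                                   # engagement level
model = nn.Sequential(backbone, head)

# Stage 1: assume `model` was already trained on pooled multi-culture data.
# Stage 2: freeze the backbone, adapt only the head on a small target set.
for p in backbone.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x_target, y_target = torch.randn(20, 128), torch.rand(20, 1)  # few target samples
for _ in range(50):
    opt.zero_grad()
    loss = nn.MSELoss()(model(x_target), y_target)
    loss.backward()
    opt.step()
```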
... In the particular case of ASD, the emotions tied to communicative development are seriously affected; therefore, a person with this type of deficit cannot decode the emotional signals of their interlocutor, because they lack the empathy needed to do so. People with ASD feel socially isolated, and their condition thereby deepens further, precisely because they are deprived of what they need to train their underdeveloped emotional systems: socialization [8-15]. ...
Conference Paper
Full-text available
ABSTRACT Neuroscience research over the last two decades has demonstrated the importance of biometric platforms in many cognitive processes and in complex emotional-deficit conditions such as Autism Spectrum Disorder (ASD), which affects 1 in 100 people worldwide. The emergence of affective robotics, artificial intelligence, and machine learning (ML) systems is giving rise to a fascinating area of technology called "affective technology", which within a few years will change our current conception of our emotional life and of the way we relate to natural and artificial systems. Innovative applications of this type of technology to patients with ASD will shape its future. A description of ASD is presented, along with the progress of a pilot experimental design on the use of aBCI (affective Brain-Computer Interfaces) for therapeutic treatment.
Article
Engagement is critical to satisfaction and performance in a number of domains but is challenging to measure and sustain. Thus, there is considerable interest in developing affective computing technologies to automatically measure and enhance engagement, especially in the wild and at scale. This article provides an accessible introduction to affective computing research on engagement detection and enhancement using educational applications as an application domain. We begin with defining engagement as a multicomponential construct (i.e., a conceptual entity) situated within a context and bounded by time and review how the past six years of research has conceptualized it. Next, we examine traditional and affective computing methods for measuring engagement and discuss their relative strengths and limitations. Then, we move to a review of proactive and reactive approaches to enhancing engagement toward improving the learning experience and outcomes. We underscore key concerns in engagement measurement and enhancement, especially in digitally enhanced learning contexts, and conclude with several open questions and promising opportunities for future work.
Article
The current study aims to use the JASPER program components (joint attention, symbolic play, engagement, and regulation) to develop the skills of children with autism and reduce the level of the disorder. The study sample consisted of 7 children (3 males and 4 females), in addition to their mothers; the children were enrolled in one of the centers for people with disabilities in the United Arab Emirates. The chronological ages of the sample were between 6 and 8 years. The study used a set of tools to achieve its objectives, including the JASPER tasks scale for early intervention, the autism disorder scale, and a training program prepared by the researcher. The results of the study confirmed the effectiveness of the training program in improving the average scores of the experimental group children on the JASPER scale for the dimensions and total score. The results also revealed statistically significant differences between the mean scores before and after the application of the training program for the experimental group children, for the dimensions and the total degree of the autism disorder diagnosis scale, in the direction of the post-test. The results also showed no statistically significant differences between the post-tests and follow-up tests (a month after the end of the program application). Keywords: Tasks, Program, JASPER, Early intervention, Children with Autism Spectrum Disorder.
Article
Full-text available
Background Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. Objective In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. Methods Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual’s visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. Results Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P
Conference Paper
Full-text available
Atypical speech prosody is a primary characteristic of autism spectrum disorders (ASD), yet it is often excluded from diagnostic instrument algorithms due to poor subjective reliability. Robust, objective prosodic cues can enhance our understanding of those aspects which are atypical in autism. In this work, we connect objective signal-derived descriptors of prosody to subjective perceptions of prosodic awkwardness. Subjectively, more awkward speech is less expressive (more monotone) and more often has perceived awkward rate/rhythm, volume, and intonation. We also find expressivity can be quantified through objective intonation variability features, and that speaking rate and rhythm cues are highly predictive of perceived awkwardness. Acoustic-prosodic features are also able to significantly differentiate subjects with ASD from typically developing (TD) subjects in a classification task, emphasizing the potential of automated methods for diagnostic efficiency and clarity.
Conference Paper
Full-text available
Starting from the English affective lexicon ANEW (Bradley and Lang, 1999a), we have created the first Greek affective lexicon. It contains human ratings for the three continuous affective dimensions of valence, arousal, and dominance for 1,034 words. The Greek affective lexicon is compared with affective lexica in English, Spanish, and Portuguese. The lexicon is automatically expanded by selecting a small number of manually annotated words to bootstrap the process of estimating affective ratings of unknown words. We experimented with the parameters of the semantic-affective model in order to investigate their impact on its performance, which reaches 85% binary classification accuracy (positive vs. negative ratings). We share the Greek affective lexicon, which consists of 1,034 words, and the automatically expanded Greek affective lexicon, which contains 407K words.
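A minimal sketch of the bootstrap idea, estimating the valence of an unknown word as a similarity-weighted combination of manually annotated seed-word ratings; the seed words, embeddings, and weighting rule are toy assumptions rather than the paper's exact semantic-affective model:

```python
# Hypothetical sketch: affective lexicon expansion. The rating of an unknown
# word is estimated from its semantic similarity to a few manually rated
# seed words. Embeddings and ratings below are toy stand-ins.
import numpy as np

seed_ratings = {"happy": 0.9, "sad": -0.8, "calm": 0.4}     # manual valence ratings
seed_vecs = {w: np.random.randn(50) for w in seed_ratings}  # stand-in semantics

def estimate_valence(vec, seeds=seed_vecs, ratings=seed_ratings):
    # Cosine similarity of the unknown word to each seed word.
    sims = {w: float(vec @ v) / (np.linalg.norm(vec) * np.linalg.norm(v))
            for w, v in seeds.items()}
    weights = np.array([sims[w] for w in ratings])
    values = np.array(list(ratings.values()))
    # Similarity-weighted average of the seed ratings.
    return float(weights @ values) / (np.abs(weights).sum() + 1e-9)

print(estimate_valence(np.random.randn(50)))  # valence in roughly [-1, 1]
```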
Article
Full-text available
Recognition of intentions is a subconscious cognitive process vital to human communication. This skill enables anticipation and increases the quality of interactions between humans. Within the context of engagement, non-verbal signals are used to communicate the intention of starting an interaction with a partner. In this paper, we investigated methods to detect these signals in order to allow a robot to know when it is about to be addressed. The originality of our approach resides in taking inspiration from social and cognitive sciences to perform our perception task. We investigate meaningful features, i.e., human-readable features, and elicit which of these are important for recognizing someone's intention of starting an interaction. Classically, spatial information such as the human's position and speed and the human-robot distance is used to detect engagement. Our approach integrates multimodal features gathered using a companion robot equipped with a Kinect. The evaluation on our corpus, collected in spontaneous conditions, highlights its robustness and validates the use of such a technique in a real environment. Experimental validation shows that the multimodal feature set gives better precision and recall than using only spatial and speed features. We also demonstrate that 7 selected features are sufficient to provide a good starting-engagement detection score. In our last investigation, we show that, within our full set of 99 features, feature-space reduction is not a solved task. This result opens new research perspectives on multimodal engagement detection.
Conference Paper
Full-text available
Signal-derived measures can provide effective ways towards quantifying human behavior. Verbal Response Latencies (VRLs) of children with Autism Spectrum Disorders (ASD) during conversational interactions are able to convey valuable information about their cognitive and social skills. Motivated by the inherent gap between the external behavior and inner affective state of children with ASD, we study their VRLs in relation to their explicit but also implicit behavioral cues. Explicit cues include the children's language use, while implicit cues are based on physiological signals. Using these cues, we perform classification and regression tasks to predict the duration type (short/long) and value of VRLs of children with ASD while they interacted with an Embodied Conversational Agent (ECA) and their parents. Since parents are active participants in these triadic interactions, we also take into account their linguistic and physiological behaviors. Our results suggest an association between VRLs and these externalized and internalized signal information streams, providing complementary views of the same problem.
Conference Paper
Full-text available
Children with Autism Spectrum Conditions (ASC) may experience significant difficulties in recognising and expressing emotions. The ASC-Inclusion project is setting up an internet-based digital gaming experience that will assist children with ASC to improve their socio-emotional communication skills, combining voice, face, and body gesture analysis, and giving corrective feedback regarding the appropriateness of the child's expressions. The present contribution focuses on the recognition of emotion in speech and on feature analysis. For this purpose, a database of prompted phrases was collected in Hebrew, inducing nine emotions embedded in short stories. It contains speech of children with ASC and typically developing children under the same conditions. We evaluate the emotion task over the nine categories, including the binary valence/arousal discrimination. We further investigate the discrimination of each emotion against neutral. The results show performances for arousal and valence of up to 86.5% and for nine emotions including neutral of up to 42% unweighted average recall. Moreover, we compare and analyse manually selected prosodic features with automatically selected features with respect to their relevance for discriminating each of the eight emotion classes.
Conference Paper
Full-text available
Atypical prosody, often reported in children with Autism Spectrum Disorders, is described by a range of qualitative terms that reflect the eccentricities and variability among persons on the spectrum. We investigate various word- and phonetic-level features from spontaneous speech that may quantify the cues reflecting prosody. Furthermore, we introduce the importance of jointly modeling the psychologist's vocal behavior in this dyadic interaction. We demonstrate that acoustic-prosodic features of both participants correlate with the children's rated autism severity. For increasing perceived atypicality, we find children's prosodic features that suggest 'monotonic' speech, variable volume, atypical voice quality, and slower rate of speech. Additionally, we find that the psychologist's features inform their perception of a child's atypical behavior; e.g., the psychologist's pitch slope and jitter are increasingly variable and their speech rate generally decreases.
Article
Full-text available
This study aimed to identify the nature and extent of receptive and expressive prosodic deficits in children with high-functioning autism (HFA). Thirty-one children with HFA, 72 typically developing controls matched on verbal mental age, and 33 adults with normal speech completed the prosody assessment procedure, Profiling Elements of Prosodic Systems in Children. Children with HFA performed significantly less well than controls on 11 of 12 prosody tasks (p < .005). Receptive prosodic skills showed a strong correlation (p < .01) with verbal mental age in both groups, and to a lesser extent with expressive prosodic skills. Receptive prosodic scores also correlated with expressive prosody scores, particularly in grammatical prosodic functions. Prosodic development in the HFA group appeared to be delayed in many aspects of prosody and deviant in some. Adults showed near-ceiling scores in all tasks. The study demonstrates that receptive and expressive prosodic skills are closely associated in HFA. Receptive prosodic skills would be an appropriate focus for clinical intervention, and further investigation of prosody and the relationship between prosody and social skills is warranted.
Conference Paper
Full-text available
We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
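A minimal usage sketch, calling the openSMILE command-line extractor (SMILExtract) from Python with its standard -C/-I/-O options; the config file path is an example and varies between openSMILE releases:

```python
# Hypothetical usage sketch: invoke the openSMILE command-line extractor to
# pull low-level descriptors and functionals from a WAV file. Assumes
# SMILExtract is on PATH; input/output file names are placeholders.
import subprocess

subprocess.run([
    "SMILExtract",
    "-C", "config/IS09_emotion.conf",  # example config shipped with openSMILE
    "-I", "input.wav",                 # audio to analyze
    "-O", "features.arff",             # extracted features (ARFF format)
], check=True)
```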
Article
Full-text available
Colwyn Trevarthen, working on autism, discussed the importance of time, rhythm and temporal processing in brain function. The brains of new born infants show highly coherent and coordinated patterns of activity over time, and their rhythms are remarkably similar to those of adults. Since the cortex has not yet developed, this coordination must be subcortical in origin. The likely source is the emotional motor system. He noted that the cerebellum might regulate the intricate timing of the development and expression of emotional communication. He also pointed out that emotional and motivational factors have often been seriously neglected in psychology (largely owing to a misplaced focus on 'cognition' as some isolated entity) and emphasized the potential importance of empathetic support and music therapy in helping autistic children.
Article
Full-text available
This study investigated social attention impairments in autism (social orienting, joint attention, and attention to another's distress) and their relations to language ability. Three- to four-year-old children with autism spectrum disorder (ASD; n = 72), 3- to 4-year-old developmentally delayed children (n = 34), and 12- to 46-month-old typically developing children (n = 39), matched on mental age, were compared on measures of social orienting, joint attention, and attention to another's distress. Children with autism performed significantly worse than the comparison groups in all of these domains. Combined impairments in joint attention and social orienting were found to best distinguish young children with ASD from those without ASD. Structural equation modeling indicated that joint attention was the best predictor of concurrent language ability. Social orienting and attention to distress were indirectly related to language through their relations with joint attention. These results help to clarify the nature of social attention impairments in autism, offer clues to developmental mechanisms, and suggest targets for early intervention.
Article
Full-text available
Shriberg et al. [Shriberg, L. et al. (2001). Journal of Speech, Language and Hearing Research, 44, 1097-1115] described prosody-voice features of 30 high functioning speakers with autistic spectrum disorder (ASD) compared to age-matched control speakers. The present study reports additional information on the speakers with ASD, including associations among prosody-voice variables and ratings of communication social abilities. Results suggest that the inappropriate sentential stress and hypernasality previously identified in some of these speakers is related to communication/sociability ratings. These findings and associated trends are interpreted to indicate important links between prosodic performance and social and communicative competence. They suggest the need for careful assessment of inappropriate prosody and voice features in speakers with ASD, and for effective intervention programs aimed at reducing the stigmatization of individuals with these conditions.
Article
Full-text available
Attention-deficit/hyperactivity disorder (ADHD) is associated with functional impairments in different areas of daily life. One such area is social functioning. The purpose of this paper is to critically review research on social dysfunctioning in children with ADHD. Children with ADHD often have conflicts with adults and peers, and suffer from unpopularity, rejection by peers, and a lack of friendships, in part as a consequence of their ADHD symptoms. Comorbid oppositional defiant or conduct disorder aggravates these impairments. In some cases the inadequate social behavior of children with ADHD may be phenomenologically and etiologically related to pervasive developmental disorders (PDD). However, the causes and consequences of PDD symptoms in ADHD are understudied. Also, the relative contributions of ADHD, on the one hand, and comorbid disorders, on the other, to the course of social impairments are unknown. Social dysfunctioning in children with ADHD appears to increase their risk of later psychopathology other than ADHD. Thus far effective treatment for social dysfunctioning is lacking. Future research should address the exact nature and long-term consequences of social dysfunctioning in children with ADHD, and focus on development of effective treatment strategies.
Article
Contents: List of illustrations; Acknowledgements; 1. Asperger and his syndrome; 2. 'Autistic psychopathy' in childhood; 3. The relationship between Asperger's syndrome and Kanner's autism; 4. Clinical and neurobiological aspects of Asperger syndrome in six family studies; 5. Asperger syndrome in adulthood; 6. Living with Asperger's syndrome; 7. The autobiographical writings of three Asperger syndrome adults: problems of interpretation and implications for theory; Name index; Subject index.
Article
Impaired social communication and social reciprocity are the primary phenotypic distinctions between autism spectrum disorders (ASD) and other developmental disorders. We investigate quantitative conversational cues in child-psychologist interactions using acoustic-prosodic, turn-taking, and language features. Results indicate that conversational quality degraded for children with higher ASD severity, as the child exhibited difficulties conversing and the psychologist varied her speech and language strategies to engage the child. When interacting with children with increasing ASD severity, the psychologist exhibited higher prosodic variability, increased pausing, more speech, atypical voice quality, and less use of conventional conversational cues such as assents and non-fluencies. Children with increasing ASD severity spoke less, spoke slower, responded later, had more variable prosody, and used personal pronouns, affect language, and fillers less often. We also investigated the predictive power of features from interaction subtasks with varying social demands placed on the child. We found that acoustic-prosodic and turn-taking features were more predictive during higher social-demand tasks, and that the most predictive features vary with the context of interaction. We also observed that psychologist language features may be robust to the amount of speech in a subtask, showing significance even when the child is participating in minimal-speech, low social-demand tasks.
Article
Researchers from various disciplines are concerned with the study of affective phenomena, especially arousal. Expressed affective modulations, which reflect both an individual's internal state and external factors, are central to the communicative process. Bone et al. developed a robust, unsupervised (rule-based) method which provides a scale-continuous, bounded arousal rating from the vocal signal. In this study, we investigate the joint-dynamics of child and psychologist vocal arousal in autism spectrum disorder (ASD) diagnostic interactions. Arousal synchrony is assessed with multiple methods. Results indicate that children with higher ASD severity tend to lead the arousal dynamics more, seemingly because the children aren't as responsive to the psychologist's affective modulations. A vocal arousal model is also proposed which incorporates social and conversational constructs. The model captures conversational signal relations, and is able to distinguish between high and low ASD severity at accuracies well-above chance.
Chapter
The development of joint attention reflects and contributes to the early developmental processes necessary for social engagement and social competence in infants. Results of longitudinal studies suggest that the tendencies of infants to initiate joint attention (IJA) bids could be predictive of some aspects of social engagement and social competence during childhood. Observations further suggest that more frequent IJA bids during infancy could be used as a marker of at-risk children's vulnerability to poor social outcomes. IJA measures may be useful in identifying children who are likely to have hyperactivity and attention problems, or those who may have stronger resistance to the negative impact of moderate attachment disturbances. Measures of joint attention could provide unique data on processes affecting developmental continuity, risk, and social outcomes for children. © 2006 by Peter J. Marshall & Nathan A. Fox. All rights reserved.
Article
Student engagement is a key concept in contemporary education, where it is valued as a goal in its own right. In this paper we explore approaches for automatic recognition of engagement from students' facial expressions. We studied whether human observers can reliably judge engagement from the face; analyzed the signals observers use to make these judgments; and automated the process using machine learning. We found that human observers reliably agree when discriminating low versus high degrees of engagement (Cohen's $\kappa = 0.96$). When fine discrimination is required (four distinct levels) the reliability decreases, but is still quite high ($\kappa = 0.56$). Furthermore, we found that engagement labels of 10-second video clips can be reliably predicted from the average labels of their constituent frames (Pearson $r = 0.85$), suggesting that static expressions contain the bulk of the information used by observers. We used machine learning to develop automatic engagement detectors and found that for binary classification (e.g., high engagement versus low engagement), automated engagement detectors perform with comparable accuracy to humans. Finally, we show that both human and automatic engagement judgments correlate with task performance. In our experiment, student post-test performance was predicted with comparable accuracy from engagement labels ($r = 0.47$) as from pre-test scores ($r = 0.44$).
Article
Observational methods are fundamental to the study of human behavior in the behavioral sciences. For example, in the context of research on intimate relationships, psychologists’ hypotheses are often empirically tested by video recording interactions of couples and manually coding relevant behaviors using standardized coding systems. This coding process can be time-consuming, and the resulting coded data may have a high degree of variability because of a number of factors (e.g., inter-evaluator differences). These challenges provide an opportunity to employ engineering methods to aid in automatically coding human behavioral data. In this work, we analyzed a large corpus of married couples’ problem-solving interactions. Each spouse was manually coded with multiple session-level behavioral observations (e.g., level of blame toward other spouse), and we used acoustic speech features to automatically classify extreme instances for six selected codes (e.g., “low” vs. “high” blame). Specifically, we extracted prosodic, spectral, and voice quality features to capture global acoustic properties for each spouse and trained gender-specific and gender-independent classifiers. The best overall automatic system correctly classified 74.1% of the instances, an improvement of 3.95% absolute (5.63% relative) over our previously reported best results. We compare performance for the various factors: across codes, gender, classifier type, and feature type.
Conference Paper
The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN.