Article

Human Computer Interface Based on Tongue and Lips Movements and its Application for Speech Therapy System

... We found that researchers have addressed various types of SSDs in the literature. Out of the 24 studies reviewed, 3 did not specifically address any particular SSD (Bílková et al., 2020;Desolda et al., 2021;Hair et al., 2018). Instead, these studies proposed AI-based automated speech therapy for a general SSD population and experimentally evaluated these tools without specifying any particular SSD. ...
... PocketSphinx is a real-time, continuous, speaker-independent speech recognition system based on Sphinx II as its baseline (Huggins-Daines et al., 2006). The CNN technique was used by two studies for automatic evaluation of tongue/lip movement (Bílková et al., 2020) and for providing visual feedback of the user's vocal tract (Anjos et al., 2020). Another study evaluated the feasibility of their proposed tool using the Wizard of Oz (WoZ) technique to imitate therapy scenarios. ...
... The majority of the studies (13 out of 24) focused on fully automated tools, often lacking active stakeholder participation in the decision-making process (Anjos et al., 2018, 2020; Bílková et al., 2020; Y. J. Chen & Huang, 2007; Y. ...
Article
In response to the limited availability of speech therapy services, researchers are developing AI-based automated speech therapy tools for individuals with speech sound disorder (SSD). However, their effectiveness/efficacy compared to conventional speech therapy remains unclear, and no guidelines exist for designing these tools or their required automation levels compared to therapy by speech-language pathologists (SLPs). Moreover, AI applications raise concerns about job displacement, biased algorithms, and privacy issues. This systematic review aims to provide comprehensive insights into AI-based automated speech therapy, focusing on (i) types of SSD addressed; (ii) AI techniques used; (iii) autonomy levels achieved; (iv) delivery modes; and (v) effectiveness/efficacy of these tools. PRISMA guidelines were applied across five databases for studies published between January 2007 and February 2022. Twenty-four studies that met the inclusion and exclusion criteria were included. Results suggest that articulation disorders are the most frequently addressed SSD. Various AI techniques were applied, ranging from traditional automatic speech recognition (ASR) to advanced methods. Most studies proposed fully automated tools, often overlooking the role of other stakeholders. Computer-based and gamified applications were the most common intervention modes. The results suggest the potential of AI-based automated speech therapy tools for individuals with SSD; however, only a few studies have compared their effectiveness/efficacy with conventional speech therapy. Further research is needed to develop speech corpora for under-represented languages, apply a Human-Centered AI approach, conduct usability studies on intervention modes, and perform more rigorous effectiveness studies.
... Researchers have also devised AI-based tools specifically for persons with hearing impairment [23,24]. A novel tongue-based human-computer interaction tool [25] and a gamified AI-based tool [7] for persons with motor speech disorder have been proposed [28,29]. ...
... While Desolda et al. emphasized the role of caregivers and SLPs in the design of a remote therapy tool, "Pronuntia" [12], Ng et al. proposed a fully automated assessment tool using the CUChild 127 speech corpus in Cantonese [14]. In another study, Bílková et al. developed a novel lip, tongue, and teeth detection system using a Convolutional Neural Network (CNN) and Augmented Reality (AR) to support the automatic evaluation of speech therapy exercises [25]. ...
... In another similar study, Ramamurthy et al. proposed a therapy robot, "Buddy," allowing children to practice assigned exercises at home [31]. Many studies have incorporated serious games as an intervention tool for automatic speech therapy [7,21,25,31,34]. One of the studies incorporated augmented reality to build a serious game using tongue detection [25]. ...
Preprint
Full-text available
This paper presents a systematic literature review of published studies on AI-based automated speech therapy tools for persons with speech sound disorders (SSD). The COVID-19 pandemic has heightened the need for automated speech therapy tools that make speech therapy accessible and affordable for persons with SSD. However, there are no guidelines for designing such automated tools or for their required degree of automation compared to human experts. In this systematic review, we followed the PRISMA framework to address four research questions: 1) what types of SSD do AI-based automated speech therapy tools address, 2) what is the level of autonomy achieved by such tools, 3) what are the different modes of intervention, and 4) how effective are such tools in comparison with human experts. An extensive search was conducted on digital libraries to find research papers relevant to our study from 2007 to 2022. The results show that AI-based automated speech therapy tools for persons with SSD are increasingly gaining attention among researchers. Articulation disorders were the most frequently addressed SSD based on the reviewed papers. Further, our analysis shows that most researchers proposed fully automated tools without considering the role of other stakeholders. Our review indicates that mobile-based and gamified applications were the most frequent mode of intervention. The results further show that only a few studies compared the effectiveness of such tools with that of expert Speech-Language Pathologists (SLPs). Our paper presents the state-of-the-art in the field, contributes significant insights based on the research questions, and provides suggestions for future research directions.
... For instance, Desolda et al. developed a web application that enables SLPs to assign speech therapy exercises remotely to children with SSD, benefiting both children and their caregivers [9]. Additionally, various technologies such as tablet-based therapy tools [10][11][12][13][14][15], a computer-based prosody teaching system [16], robotic assistants [17,18], and an augmented reality system [19] have been proposed and integrated into the field. Many of these tools leverage Automatic Speech Recognition (ASR), an advanced AI-powered technology that transforms spoken language into written text, among other AI techniques. ...
Preprint
Full-text available
Background: While Speech-Language Pathologists (SLPs) are crucial in addressing Speech Sound Disorder (SSD), a global shortage of SLPs poses significant challenges in providing speech therapy services, particularly in impoverished and rural areas. Despite the potential of AI-based automated speech therapy tools, concerns such as job displacement, algorithmic bias, and privacy issues persist. Purpose: This study adopted a Human-Centered AI (HCAI) approach to understand the needs and perspectives of SLPs, aiming to inform the development of a Human-Centered AI-based Speech Therapy Tool (HCAI-STT) for children with SSD. Methods: A qualitative study using deductive reflexive thematic analysis with MAXQDA software was conducted to explore the needs and perspectives of SLPs. Results: The domain understanding theme highlighted the complexity of functional SSD, emphasizing unknown etiology, parental concerns, and the developmental nature of speech acquisition. Current practices involve using digital tools under supervision and adhering to therapy guidelines. Key challenges included accessibility issues, socio-economic constraints, and the absence of a standardized Assamese Photo Articulation Test (PAT). Future directions highlighted the need for technology-based interventions, culturally relevant audio-visual stimuli, mobile-based solutions, and affordable tools. Conclusion: The findings emphasize the necessity for a culturally tailored, technologically advanced approach to speech therapy. Recommendations include integrating Assamese PAT, culturally relevant audio-visual stimuli, AI-based diagnostic and feedback tools, and home-based therapy with supervision. These insights will guide the development of the HCAI-STT, enhancing AI integration in speech therapy and improving quality and accessibility. Future research will engage additional stakeholders and develop and evaluate the tool’s usability, efficacy, and effectiveness.
... For instance, Desolda et al. developed a web application that enables SLPs to assign speech therapy exercises remotely to children with SSD, benefiting both children and their caregivers [9]. Additionally, various technologies such as a tablet-based speech therapy game [16], a computer-based prosody teaching system [23], robotic assistants [24,26], and an augmented reality system [5] have been proposed and integrated into the field. Many of these tools leverage ASR, an advanced AI-powered technology that transforms spoken language into written text, among other AI techniques. ...
Chapter
With the advent of improved Artificial Intelligence (AI) algorithms and the availability of large datasets, researchers worldwide are developing numerous AI-based applications to replicate human capabilities. One such application is automating the task of Speech Language Pathologists (SLPs) and building automated speech therapy tools for children with Speech Sound Disorder (SSD). However, this development of AI focused on imitating human capabilities brings concerns such as algorithmic discrimination or biased algorithms, job displacement, and privacy issues. To address these challenges, researchers advocate for Human-Centered AI (HCAI) and have proposed various frameworks for AI-based systems. Although the proposed frameworks were developed for generalized AI applications, their relevance to specialized AI applications such as speech therapy remains unclear. This study aims to establish HCAI goals and a goal hierarchy specific to an HCAI-based Speech Therapy Tool (HCAI-STT) designed for children with SSD. Through an Affinity Mapping exercise, we identify seven top-level goals and sub-goals, which include fairness, responsibility and accountability, human-centered empowerment, trustworthiness, privacy, unbiased funding, and security. Our findings highlight the importance of considering not only the technical capabilities of AI systems, but also their ethical and social implications. By prioritizing these goals, we can help ensure that AI-based speech therapy tools are developed and deployed in a responsible and ethical manner that aligns with the needs and values of their users. Our findings have broader implications for the development and deployment of AI systems across domains, and future research can build on our findings by exploring how the goal hierarchy we developed can be operationalized in practice.
... In contrast, our application is unique in providing a direct real-time evaluation of exercise performance based on image processing of standard webcam data, allowing widespread use. Moreover, the proposed methods for detecting the tip of the tongue and the lips are applicable in other areas, such as a human-computer interface for disabled people, as described in [5]. ...
Chapter
This research paper presents a novel software tool designed to revolutionize speech therapy for individuals with Childhood Apraxia of Speech (CAS), also known as Developmental Verbal Dyspraxia. The software offers a comprehensive multi-modal analysis approach, utilizing video, audio, and speech-to-text data to extract valuable insights into articulation patterns, head pose, audio characteristics, and word usage. This information empowers Speech-Language Pathologists (SLPs) with data-driven tools for a more precise assessment and development of personalized treatment plans. While the current study employs data from just one healthy subject to evaluate the software’s overall accuracy and data coherence, the functionalities are promising to improve therapy effectiveness for individuals with CAS, giving space for further experiments, analysis and validations with experts. The paper explores how the software’s capabilities can complement existing therapies like PROMPT, Touch-Cue Method, and Melodic Intonation Therapy, based on a cognitive perspective, ultimately aiming to transform the field of speech-language pathology and enhance the lives of those affected by speech disorders.
Preprint
Full-text available
This paper presents a systematic literature review of published studies on AI-based automated speech therapy tools for persons with speech sound disorders (SSD). The COVID-19 pandemic has heightened the need for automated speech therapy tools that make speech therapy accessible and affordable for persons with SSD. However, there are no guidelines for designing such automated tools or for their required degree of automation compared to the conventional speech therapy given by Speech Language Pathologists (SLPs). In this systematic review, we followed the PRISMA framework to address four research questions: 1) what types of SSD do AI-based automated speech therapy tools address, 2) what is the level of autonomy achieved by such tools, 3) what are the different modes of intervention, and 4) how effective are such tools in comparison with the conventional mode of speech therapy. An extensive search was conducted on digital libraries to find research papers relevant to our study from 2007 to 2022. The results show that AI-based automated speech therapy tools for persons with SSD are increasingly gaining attention among researchers. Articulation disorders were the most frequently addressed SSD based on the reviewed papers. Further, our analysis shows that most researchers proposed fully automated tools without considering the role of other stakeholders. Our review indicates that mobile-based and gamified applications were the most frequent mode of intervention. The results further show that only a few studies compared the effectiveness of such tools with the conventional mode of speech therapy. Our paper presents the state-of-the-art in the field, contributes significant insights based on the research questions, and provides suggestions for future research directions.
Article
Full-text available
We propose two automatic methods for detecting bleeding in wireless capsule endoscopy videos of the small intestine. The first uses color information alone, whereas the second incorporates assumptions about blood-spot shape and size. The key original contribution is the definition of a new color space that provides good separability between blood pixels and the intestinal wall. The two methods can be applied individually, or their results can be fused for the final decision. We evaluate their individual performance and various fusion rules on real data manually annotated by an endoscopist. © 2016 Society of Photo-Optical Instrumentation Engineers (SPIE).
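The two-detector-plus-fusion pipeline described in this abstract can be sketched as follows. This is only an illustrative stand-in: the simple red-to-green-plus-blue ratio, the threshold value, and the minimum-size rule below are assumptions for demonstration, not the dedicated color space or decision rules defined in the paper.

```python
import numpy as np

def blood_mask_color(rgb, ratio_thresh=1.5):
    # Color-only detector: flag pixels that are strongly red relative to
    # green + blue. (Illustrative criterion; the paper derives a dedicated
    # color space for this separation.)
    r = rgb[..., 0].astype(float)
    gb = rgb[..., 1].astype(float) + rgb[..., 2].astype(float)
    return r / (gb + 1e-6) > ratio_thresh

def blood_mask_shape(mask, min_pixels=3):
    # Shape/size stand-in: keep the detection only if the flagged region is
    # large enough to be a plausible blood spot.
    return mask if mask.sum() >= min_pixels else np.zeros_like(mask)

def fuse(m1, m2, rule="and"):
    # Fuse the two detectors' per-pixel decisions for the final call.
    return m1 & m2 if rule == "and" else m1 | m2

frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1:3, 1:3] = [200, 30, 30]        # synthetic bright-red "blood spot"
color_m = blood_mask_color(frame)
final = fuse(color_m, blood_mask_shape(color_m))
assert final.sum() == 4                # the 2x2 spot survives both detectors
```

In this toy example the "and" rule keeps only pixels that both detectors accept, mirroring the idea that fusing color and shape evidence reduces false positives from isolated reddish pixels.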
Conference Paper
There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC), we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast: segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
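The contracting/expanding architecture with skip connections described above can be sketched at the shape level in plain Python. This is a minimal skeleton, not the U-Net itself: the max-pooling, nearest-neighbour upsampling, and averaging fusion below are simplified stand-ins for the paper's convolutional blocks and crop-and-concatenate step.

```python
import numpy as np

def max_pool2(x):
    # 2x2 max pooling: halves spatial resolution (contracting path)
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    # Nearest-neighbour upsampling: doubles spatial resolution (expanding path)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_skeleton(img, depth=3):
    # Contracting path: store each level's feature map for its skip connection
    skips = []
    x = img
    for _ in range(depth):
        skips.append(x)
        x = max_pool2(x)
    # Expanding path: upsample and fuse with the stored high-resolution map,
    # restoring the localization information lost during pooling
    for skip in reversed(skips):
        x = upsample2(x)
        x = (x + skip) / 2.0  # stand-in for crop-and-concatenate + convolution
    return x

out = unet_skeleton(np.zeros((64, 64)))
assert out.shape == (64, 64)  # symmetric paths recover the input resolution
```

The point of the sketch is the symmetry: each downsampling step on the way in is matched by an upsampling step on the way out, and the skip connections carry fine spatial detail across the bottleneck.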
Article
There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built-in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools.
Automatic Evaluation of Speech Therapy Exercises Based on Image Data
  • Zuzana Bílková
Zuzana Bílková et al., Automatic Evaluation of Speech Therapy Exercises Based on Image Data, Image Analysis and Recognition: 16th International Conference, ICIAR, pp. 397-404 (2019).
System and method for using a haptic device as an input device
  • Arthur E Quaid
Arthur E. Quaid III, System and method for using a haptic device as an input device, U.S. Patent No. 8,095,200, (2012).
Input device using eye-tracking
  • Yoon C. Seok
Yoon C. Seok, Input device using eye-tracking, U.S. Patent Application 15/357,184, (2017).