Speech Technologies - Science topic

Publications related to Speech Technologies (2,328)
Sorted by most recent
Article
Full-text available
This review investigates the development and influence of AI anchors in the broadcasting sector, focusing on their significance, theoretical underpinnings, and practical applications. It explores the shift from traditional human presenters to AI-driven systems, with attention to advancements in natural language processing, speech technology, and au...
Preprint
Full-text available
Speech technologies are transforming interactions across various sectors, from healthcare to call centers and robots, yet their performance on African-accented conversations remains underexplored. We introduce Afrispeech-Dialog, a benchmark dataset of 50 simulated medical and non-medical African-accented English conversations, designed to evaluate...
Article
Full-text available
The development of the digital world is increasingly rapid, especially in the era of Industry 4.0 which is marked by advances in information technology. One application of this technology is in learning the Qur'an, the holy book of Muslims which contains divine guidance. This study explores the potential of artificial intelligence (AI) technology,...
Preprint
Full-text available
There is a growing need for diverse, high-quality stuttered speech data, particularly in the context of Indian languages. This paper introduces Project Boli, a multi-lingual stuttered speech dataset designed to advance scientific understanding and technology development for individuals who stutter, particularly in India. The dataset constitutes (a)...
Article
Full-text available
Purpose: In this review article, we present an extensive overview of recent developments in the area of dysarthric speech research. One of the key objectives of speech technology research is to improve the quality of life of its users, as evidenced by the focus of current research trends on creating inclusive conversational interfaces that cater to...
Preprint
Full-text available
Spoken language datasets are vital for advancing linguistic research, Natural Language Processing, and speech technology. However, resources dedicated to Italian, a linguistically rich and diverse Romance language, remain underexplored compared to major languages like English or Mandarin. This survey provides a comprehensive analysis of 66 spoken I...
Preprint
Full-text available
While recent multilingual automatic speech recognition models claim to support thousands of languages, ASR for low-resource languages remains highly unreliable due to limited bimodal speech and text training data. Better multilingual spoken language understanding (SLU) can massively strengthen the robustness of multilingual ASR by leveraging language...
Preprint
Full-text available
The Mice Autism Detection via Ultrasound Vocalization (MAD-UV) Challenge introduces the first INTERSPEECH challenge focused on detecting autism spectrum disorder (ASD) in mice through their vocalizations. Participants are tasked with developing models to automatically classify mice as either wild-type or ASD models based on recordings with a high s...
Preprint
Full-text available
Advancements in spoken language technologies for neurodegenerative speech disorders are crucial for meeting both clinical and technological needs. This overview paper is vital for advancing the field, as it presents a comprehensive review of state-of-the-art methods in pathological speech detection, automatic speech recognition, pathological speech...
Article
Full-text available
Intelligent speech technology has emerged as a transformative tool in enhancing oral expression within primary school Chinese education. This study integrates key technological components, including voice data collection, recognition, evaluation, synthesis, and interactive feedback, to elevate instructional quality. The "Yi Dian Hui" system exempli...
Article
Full-text available
Speaker diarization, the process of segmenting audio into speaker-specific regions, plays a critical role in various speech technologies by determining "who spoke when" in a conversation. This technique is particularly valuable for enhancing automatic speech recognition (ASR) and conversational artificial intelligence systems. However, its applicati...
Article
Full-text available
Spoken language datasets are vital for advancing linguistic research, Natural Language Processing, and speech technology. However, resources dedicated to Italian, a linguistically rich and diverse Romance language, remain underexplored compared to major languages like English or Mandarin. This survey provides a comprehensive analysis of 66 spoken I...
Preprint
Full-text available
This study investigates the impact of integrating a dataset of disordered speech recordings (~1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. Contrary to what one might expect, despite the data being less than 1% of the training data of the ASR system, we find a considerable improvement in disordered speech re...
Book
Full-text available
Natural language processing (NLP) is a fast-growing research area that benefits the real world in various ways. It gives machines the ability to understand text and audio as efficiently as human beings. NLP drives program code that supports virtual assistants, voice-operated GPS systems, text-to-speech transformation, and many...
Conference Paper
Full-text available
Parenthetical clauses serve as essential elements in English discourse, providing supplementary information, clarification, or commentary. This study investigates the intonational patterns associated with parenthetical clauses and their discourse functions in spoken English. Using data from authentic conversations and corpus-based analysis, the res...
Preprint
Full-text available
Modern speech technologies enable the artificial replication, or cloning, of the human voice. In the present study, we investigated whether listeners’ perception and social evaluation of state-of-the-art voice clones depend on whether the clone being heard is a replica of the self, a friend, or a total stranger. We recorded and cloned the voices of...
Preprint
Full-text available
This study addresses the critical issue of gender-based violence's (GBV) impact on women's mental health. GBV, encompassing physical and sexual aggression, often results in long-lasting adverse effects for the victims, including anxiety, depression, post-traumatic stress disorder (PTSD), and substance abuse. Artificial Intelligence (AI)-based speec...
Article
Full-text available
Intelligent Speech Technology (IST) is revolutionizing healthcare by enhancing transcription accuracy, disease diagnosis, and medical equipment control in smart hospital environments. This study introduces an innovative approach employing federated learning with Multi-Layer Perceptron (MLP) and Gated Recurrent Unit (GRU) neural networks to improve...
Article
Full-text available
Variability in speech pronunciation is widely observed across different linguistic backgrounds, which impacts modern automatic speech recognition performance. Here, we evaluate the performance of a self-supervised speech model in phoneme recognition using direct articulatory evidence. Findings indicate significant differences in phoneme recognition...
Chapter
Full-text available
Speech technology has made significant advances with the introduction of deep learning and large datasets, enabling automatic speech recognition and synthesis at a practical level. Dialogue systems and conversational AI have also achieved dramatic advances based on the development of large language models. However, the application of these technolo...
Preprint
Full-text available
This article surveys convolution-based models (convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs) as speech signal processing models and provides their statistical backgrounds and their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through comparative training cost assessment, mode...
Preprint
Full-text available
This article surveys convolution-based models (convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs) as speech signal processing models and provides their statistical backgrounds and their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through comparative training cost assessment, model...
Article
Full-text available
This article explores disabled experience and the future of technologies relating to augmentative and alternative communication (AAC). This field includes people’s use of AAC devices, typically in combination with other modes of communication, including vocalising, revoicing and body language. Such devices have speech technology and digital voices...
Preprint
Full-text available
There has been a surge of interest in leveraging speech as a marker of health for a wide spectrum of conditions. The underlying premise is that any neurological, mental, or physical deficits that impact speech production can be objectively assessed via automated analysis of speech. Recent advances in speech-based Artificial Intelligence (AI) models...
Article
Full-text available
Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such...
Preprint
Full-text available
With the significant progress of speech technologies, spoken goal-oriented dialogue systems are becoming increasingly popular. One of the main modules of a dialogue system is typically the dialogue policy, which is responsible for determining system actions. This component usually relies only on audio transcriptions, being strongly dependent on the...
Article
Full-text available
Today, various interactive tools or partially available artificial intelligence applications are actively used in educational processes to solve multiple problems for resource-rich languages, such as English, Spanish, French, etc. Unfortunately, the situation is different and more complex for low-resource languages, like Kazakh, Uzbek, Mongolian, a...
Conference Paper
Full-text available
Computer-Assisted Pronunciation Training (CAPT) for non-native children leverages speech technology to aid in improving pronunciation accuracy. Hybrid automatic speech recognition (ASR) models, combining neural networks with statistical methods, are well-suited for CAPT due to their high accuracy and reduced latency, especially in limited search sp...
Article
Full-text available
Phonetics is the scientific field concerned with the study of how speech is produced, heard, and perceived. It abounds with data, such as acoustic speech recordings, neuroimaging data, or articulatory data. In this article, we provide an introduction to different areas of phonetics (acoustic phonetics, sociophonetics, speech perception, articulator...
Article
Full-text available
Purpose: The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data packa...
Article
Full-text available
We use field data in Haitian Creole, collected 40 years ago on cassette tapes and later digitized, to train a native self-supervised learning (SSL) speech model (WAV2VEC2) in Haitian. We use a continued pre-training (CPT) approach on SSL models pre-trained on two foreign languages: the lang...
Preprint
Full-text available
This paper presents a review of 107 research papers relating to speech and sex or gender in ISCA Interspeech publications between 2013 and 2023. We note the scarcity of work on this topic and find that terminology, particularly the word "gender", is used in ways that are underspecified and often out of step with the prevailing view in social...
Preprint
Full-text available
This literature review surveys the advancements of keyword spotting (KWS) technologies, specifically focusing on Urdu, Pakistan's low-resource language (LRL), which has complex phonetics. Despite the global strides in speech technology, Urdu presents unique challenges requiring more tailored solutions. The review traces the evolution from foundatio...
Preprint
Full-text available
The diagnosis and treatment of individuals with communication disorders offers many opportunities for the application of speech technology, but research so far has not adequately considered: the diversity of conditions, the role of pragmatic deficits, and the challenges of limited data. This paper explores how a general-purpose model of perceived p...
Conference Paper
Full-text available
Adopting a social interactionist approach to technology for speaking development (Egbert & Shahrokni, 2018), this paper evaluates the potential of mobile-based resources to promote semi-autonomous speaking practice for second language (L2) tourism students at a university in Central America. Tools were evaluated based on Chapelle and Jamieson's (20...
Research Proposal
Full-text available
Language Analysis in Asylum Procedures: Critical Reflections and Future Perspectives. Language analyses and linguistic expert reports are used time and again in asylum procedures in Austria and in other countries. The aim is to determine an unambiguous place of origin and/or an ethnic attribution for asylum seekers....
Conference Paper
Full-text available
Speech technology has been increasingly deployed in various areas of daily life including sensitive domains such as healthcare and law enforcement. For these technologies to be effective, they must work reliably for all users while preserving individual privacy. Although tradeoffs between privacy and utility, as well as fairness and utility, have b...
Preprint
Full-text available
Recent advances in neural text-to-speech (TTS) have made it possible to produce high-quality synthesised speech. However, such TTS models depend on extensive amounts of data that are costly to produce and hardly scalable to all existing languages, especially since low-resource languages seldom receive attention. With t...
Preprint
Full-text available
This paper addresses the persistent challenge in Keyword Spotting (KWS), a fundamental component in speech technology, regarding the acquisition of substantial labeled data for training. Given the difficulty in obtaining large quantities of positive samples and the laborious process of collecting new target samples when the keyword changes, we intr...
Preprint
Full-text available
The evolution and diversity of a language is evident from its various dialects. If the various dialects are not addressed in technological advancements like automatic speech recognition and speech synthesis, there is a chance that these dialects may disappear. Speech technology plays a role in preserving various dialects of a language from going e...
Preprint
Full-text available
Speech technology has been increasingly deployed in various areas of daily life including sensitive domains such as healthcare and law enforcement. For these technologies to be effective, they must work reliably for all users while preserving individual privacy. Although tradeoffs between privacy and utility, as well as fairness and utility, have b...
Article
Full-text available
This mixed-methods study investigated the potential of speech technology and machine learning to enhance inclusive education across rural India, sub-Saharan Africa, and Southeast Asia. A voice-based adaptive learning platform was developed and implemented, revealing significant improvements in language learning (d = 0.58) and basic numeracy (d = 0....
Preprint
Full-text available
This paper introduces FLEURS-R, a speech restoration applied version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus. FLEURS-R maintains an N-way parallel speech corpus in 102 languages as FLEURS, with improved audio quality and fidelity by applying the speech restoration model Miipher. The aim of FLEURS-R...
Article
Full-text available
Special education can be defined as specially designed instruction and other related services that meet special needs, ensuring that students with disabilities succeed both academically and personally. Briefly, special education challenges entail proper identification of students, effective and practical development of the Individualized Educatio...
Article
In the ever-evolving landscape of speech technology, achieving robust speech recognition and speaker identification in real-world noisy environments remains a significant challenge. This research proposes a novel hybrid Long Short-Term Memory-Convolutional Neural Network (LSTM-CNN) architecture designed to enhance the accuracy and reliability of sp...
Preprint
Full-text available
The neural codec model reduces speech data transmission delay and serves as the foundational tokenizer for speech language models (speech LMs). Preserving emotional information in codecs is crucial for effective communication and context understanding. However, there is a lack of studies on emotion loss in existing codecs. This paper evaluates neur...
Preprint
Full-text available
The VoicePrivacy Challenge promotes the development of voice anonymisation solutions for speech technology. In this paper we present a systematic overview and analysis of the second edition held in 2022. We describe the voice anonymisation task and datasets used for system development and evaluation, present the different attack models used for eva...
Article
Full-text available
With mobile and embedded devices getting more integrated in our daily lives, the focus is increasingly shifting toward human-friendly interfaces, making automatic speech recognition (ASR) a central player as the ideal means of interaction with machines. ASR is essential for many cognitive computing applications, such as speech-based assistants, dic...
Article
Full-text available
This article proposes a hierarchical and graded urban power grid disaster response method based on automatic voice warning technology. This method combines automatic speech technology and hierarchical classification methods to achieve fast and accurate information transmission and response. This article first introduces the background and significa...
Preprint
Full-text available
Forced alignment (FA) plays a key role in speech research through the automatic time alignment of speech signals with corresponding text transcriptions. Despite the move towards end-to-end architectures for speech technology, FA is still dominantly achieved through a classic GMM-HMM acoustic model. This work directly compares alignment performance...
Article
Full-text available
Background: Libraries play an important role in meeting the information needs of all their users without exception, including people with special needs. This study focuses on the information needs of people with visual impairments. People with visual impairments are individuals who have limited vision and therefore cannot receive...
Article
Full-text available
This study investigates gender-based phonological variations in the acoustic production of paired monophthongs among Pahari native speakers of Pakistani English (PakE). It addresses the research questions concerning how male and female speakers differ in the acoustic characteristics of English monophthongs within the PakE context. The study builds...
Preprint
Full-text available
The development of speech technologies for languages with limited digital representation poses significant challenges, primarily due to the scarcity of available data. This issue is exacerbated in the era of large, data-intensive models. Recent research has underscored the potential of leveraging weak supervision to augment the pool of available da...
Conference Paper
Full-text available
Bilingual children at a young age can benefit from exposure to dual language, impacting their language and literacy development. Speech technology can aid in developing tools to accurately quantify children's exposure to multiple languages, thereby helping parents, teachers, and early-childhood practitioners to better support bilingual children. Th...
Preprint
Full-text available
Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high-level performances. However, in many domains, among which health or forensics, there is not only a need for good performance but also for explanations about the output decision. Explanations derived...
Preprint
Full-text available
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-r...
Preprint
Full-text available
Self-supervised speech representations can hugely benefit downstream speech technologies, yet the properties that make them useful are still poorly understood. Two candidate properties related to the geometry of the representation space have been hypothesized to correlate well with downstream tasks: (1) the degree of orthogonality between the subsp...
Preprint
Full-text available
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language represen...
Preprint
Full-text available
One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this wor...
Article
Full-text available
The article addresses adaptive signal decomposition based on empirical mode decomposition. A modified empirical mode decomposition method is presented that eliminates redundancy in the decomposition. The properties of empirical modes are investigated. Comparative characteristics of empirical mode decomposition methods are presented, an...
Article
Full-text available
This Special Issue presents the latest advances in research and novel applications of speech and language technologies based on the works presented at the sixth edition of the IberSPEECH conference held in Granada in 2022, paying special attention to those focused on Iberian languages. IberSPEECH is the international conference of the Special Intere...
Chapter
Full-text available
Semantic searching offers significant advantages over full-text search, particularly because it allows users to formulate queries in natural language without knowing the precise indexed key phrases. By using vector databases that store and index data as high-dimensional vectors, we can search through large datasets in real-time. In this work, we pr...
Conference Paper
Full-text available
Despite the growing demand for digital therapeutics for children with Autism Spectrum Disorder (ASD), there is currently no speech corpus available for Korean children with ASD. This paper introduces a speech corpus specifically designed for Korean children with ASD, aiming to advance speech technologies such as pronunciation and severity evaluatio...
Conference Paper
Full-text available
Research projects incorporating spoken data require either a selection of existing speech corpora, or they plan to record new data. In both cases, recordings need to be transcribed to make them accessible to analysis. Underestimating the effort of transcribing can be risky. Automatic Speech Recognition (ASR) holds the promise to considerably reduce...
Preprint
Full-text available
Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed o...
Article
Full-text available
This study aims to investigate the use of mobile learning to provide pronunciation training for lecturers of English as a Foreign Language (EFL) from Vietnamese provincial universities. Mobile learning offers a potential solution for the delivery of professional development to lecturers based outside major cities thanks to its capacity to enable le...
Thesis
Full-text available
This thesis describes the development of an interactive digital edition of Kʷu Sqilxʷ /We are the People: A Trilogy of Okanagan Legends in nsyilxcən. It is the first digital edition to date to use automatic speech-to-text alignment for nsyilxcən. The written portion of this Digital Humanities (DH) project addresses a longstanding schism between wes...
Conference Paper
Full-text available
The integration of a speech technology into a digital edition to support the acquisition of a critically endangered Indigenous language is a complex task. More than simply consisting of technical challenges of working with an under-resourced language, researchers face the potential of re-enacting causes of language endangerment without rigorous adh...
Article
Full-text available
The emergence of Artificial Intelligence Generated Content (AIGC) represents a new opportunity for the development of intelligent communication. The development of AIGC technology has given rise to new media transformations, creating new content production and dissemination methods. As one of the core areas of the AIGC application, generative speec...
Thesis
Full-text available
This project aims to develop Geabaire, a customised communication tool for non-speaking users of the Irish language. Geabaire uses pictures matched with text to construct sentences spoken aloud through synthetic speech. The Phonetics and Speech Laboratory at Trinity College, Dublin, is working on this project as part of the larger ABAIR initiative....
Preprint
Full-text available
With the increasing role of speech technology in society, it has become indispensable for researchers to develop systems for low-resource languages that can cater to the needs of the common man residing in remote areas. This paper presents the inaugural attempt to develop an automatic continuous speech recognition system in Maithili, a low-resource...
Conference Paper
Full-text available
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are...