David Escudero-Mancebo

Universidad de Valladolid | UVA · Department of Informatics

Doctor of Engineering
Associate Professor of Computer Science, University of Valladolid

About

Publications: 104
Reads: 20,232
Citations: 1,211
Citations since 2017: 29 research items, 582 citations
[Chart: citations per year, 2017 to 2023]

Publications (104)
Article
Full-text available
The relation between scientific research and engineering design is fraught with controversy. While the number of academic PhD programs in design is growing, the discipline is still in its infancy, so there is no consolidated method for systematically approaching the generation of knowledge in this domain. This paper reviews recently published papers from...
Conference Paper
Full-text available
ABSTRACT People with intellectual disabilities are a group that deserves special attention when defining interaction in educational applications. This communication aims to present, at the Interacción 2022 conference, the current version of the educational video game "La piedra mágica" ("The Magic Stone"), a tool for training in the use of...
Article
Full-text available
Oral productions of speakers with Down syndrome exhibit special characteristics that have been studied for decades. Despite this attention, rich resources for their analysis are still scarce. In this paper, we present the definition and compilation procedure of a corpus of semi-controlled oral productions of speakers...
Preprint
The relation between scientific research and industrial design is fraught with controversy. While the number of academic PhD programs in product design is growing, the discipline is still in its infancy, so there is no consolidated method for systematically approaching the generation of knowledge in this domain. This paper aims to review recently publishe...
Article
The speech of people with Down syndrome (DS) shows prosodic features that are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply an incorrect expression of prosodic functions, atypical expression may hinder communication skills. The focus of this...
Article
Full-text available
General-purpose automatic speech recognition (ASR) systems have improved in quality and are being used for pronunciation assessment. However, the assessment of isolated short utterances, such as words in minimal pairs for segmental approaches, remains an important challenge, even more so for non-native speakers. In this work, we compare the perform...
Preprint
Full-text available
General-purpose automatic speech recognition (ASR) systems have improved in quality and are being used for pronunciation assessment. However, the assessment of isolated short utterances, such as words in minimal pairs for segmental approaches, remains an important challenge, even more so for non-native speakers. In this work, we compare the performance...
Article
Full-text available
ICT tools are widely used among people with intellectual disabilities, as are, to a lesser degree, learning tools such as learning games. Although learning games are widely accepted because of their high engagement capacity, few studies analyze their usability for people with intellectual disabilities....
Conference Paper
Recent advances in speech technologies (automatic speech recognition, ASR, and text-to-speech, TTS, synthesis) have led to their integration in computer-assisted pronunciation training (CAPT) tools. However, pronunciation is an area of teaching that has not been sufficiently developed, as there is little empirical evidence assessing the effectiveness o...
Conference Paper
General-purpose state-of-the-art automatic speech recognition (ASR) systems have improved notably in quality over the last decade, making it possible to use them in different practical applications, such as pronunciation assessment. However, the assessment of short words, such as minimal pairs in segmental approaches, remains an important challenge fo...
Article
Full-text available
Learning games have remarkable potential for education. They provide an emergent form of social participation whose usefulness and efficiency in learning processes deserve assessment. This study describes a novel learning game for foreign pronunciation training in which players can challenge each other. Native Spanish speakers perfor...
Article
Over the last few years, we have witnessed a growing interest in computer-assisted pronunciation training (CAPT) tools and the commercial success of foreign language teaching applications that incorporate speech synthesis and automatic speech recognition technologies. However, empirical evidence supporting the pedagogical effectiveness of these sys...
Article
Full-text available
Prosody is a fundamental speech element responsible for communicative functions such as intonation, accent and phrasing, and prosodic impairments of individuals with intellectual disabilities reduce their communication skills. Yet, technological resources have paid little attention to prosody. This study aims to develop an automatic classifier to p...
Conference Paper
Full-text available
In this document, we describe the mobile application Japañol, a learning tool that supports pronunciation training of Spanish as a foreign language (L2) at a segmental level. The tool has been specifically designed for native Japanese speakers and is a branch of a previous gamified CAPT tool, TipTopTalk!. In this case, a predefined c...
Conference Paper
Full-text available
Availability and usability of mobile smart devices and speech technologies ease the development of language learning applications, although many of them do not include pronunciation practice and improvement. A key to success is to choose the correct methodology and provide a sound experimental validation assessment of their pedagogical effectivenes...
Conference Paper
Full-text available
There are many software tools that rely on speech technologies to provide users with L2 pronunciation training in the field of Computer Assisted Pronunciation Training (CAPT) [1]. Currently, the most popular mobile and desktop operating systems grant users free access to several Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) systems....
Article
There are many studies that identify important deficits in the voice production of people with Down syndrome. These deficits affect not only the spectral domain, but also the intonation, accent, rhythm and speech rate. The main aim of this work is the identification of the acoustic features that characterize the speech of people with Down syndrome,...
Article
Full-text available
A minimal pair is a set of two words that differ in only one of the phonemes that make up their oral production, which completely changes their meaning. There are computer programs that use minimal pairs for foreign-language pronunciation training, mainly for English. This article presents a tool...
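The minimal-pair definition in this abstract can be made concrete with a short check. This is a sketch under stated assumptions, not the tool the article presents: words are assumed to arrive as hypothetical phoneme lists, and only same-length pairs differing by a single substitution are counted.

```python
def is_minimal_pair(a: list[str], b: list[str]) -> bool:
    """True if two phoneme sequences have equal length and differ in exactly one phoneme."""
    if len(a) != len(b):
        return False  # this sketch ignores insertions and deletions
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Hypothetical broad transcriptions of English "ship" and "sheep".
ship = ["ʃ", "ɪ", "p"]
sheep = ["ʃ", "iː", "p"]
print(is_minimal_pair(ship, sheep))  # True
print(is_minimal_pair(ship, ship))   # False: identical words are not a pair
```

A real training tool would also need a pronunciation lexicon to obtain these phoneme lists; that part is outside the sketch.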
Conference Paper
Full-text available
Feedback is an important concern in Computer-Assisted Pronunciation Training (CAPT), inasmuch as it bears on a system's capability to correct users' input and promote improved L2 pronunciation performance in the target language. In this paper, we test the use of synthetic voice as a corrective feedback resource. A group of students used a CAPT too...
Article
Full-text available
This paper presents a novel methodology to characterize the style of different speakers or groups of speakers. This methodology uses sequences of prosodic labels (automatic Sp_ToBI labels) to compare and differentiate these speaking styles. A set of metrics based on conditional entropy is used to compute the distance between two speakers or group o...
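To illustrate the kind of quantity this abstract refers to, here is a minimal sketch that estimates the conditional entropy H(X_t | X_{t-1}) of a sequence of prosodic labels from bigram counts. The abstract does not spell out its metrics, so the bigram model, the function name conditional_entropy and the example label sequences are assumptions. A distance between two speakers could then be built from such estimates, but that step is not shown.

```python
import math
from collections import Counter


def conditional_entropy(labels: list[str]) -> float:
    """Estimate H(X_t | X_{t-1}) in bits from the bigrams of a label sequence."""
    bigrams = Counter(zip(labels, labels[1:]))
    contexts = Counter(labels[:-1])  # how often each label precedes another one
    total = sum(bigrams.values())
    h = 0.0
    for (prev, _cur), n in bigrams.items():
        p_joint = n / total           # P(prev, cur)
        p_cond = n / contexts[prev]   # P(cur | prev)
        h -= p_joint * math.log2(p_cond)
    return h


# Hypothetical Sp_ToBI pitch-accent sequences for two speakers.
alternating = ["L*", "H*", "L*", "H*", "L*"]
mixed = ["H*", "L+H*", "H*", "H*", "L*"]
print(conditional_entropy(alternating))  # 0.0: each label fully predicts the next
print(conditional_entropy(mixed))
```

Lower values mean a speaker's label sequences are more predictable from the previous label, which is one way a "style" could show up in ToBI transcriptions.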
Article
Full-text available
This article describes the design, implementation and evaluation of an educational video game that helps individuals with Down syndrome to improve their speech skills, specifically those related to prosody. Special attention has been paid to the design of the user interface, taking into account the cognitive, learning and attentional limitations of...
Conference Paper
Full-text available
This demonstration describes the TipTopTalk! mobile application, a serious game for foreign language (L2) pronunciation training based on the minimal-pairs technique. Multiple Spoken Language Technologies (SLT), such as speech recognition and text-to-speech conversion, are integrated in our system. User interaction consists of a sequence of chall...
Conference Paper
An analysis of the prosodic characteristics of the voice of people with intellectual disability is presented in this paper. A serious game has been developed for training the communicative competences of people with intellectual disability, including those related to prosody. An evaluation of the video game was carried out and, as a result, a cor...
Conference Paper
This work presents an analysis of the results derived from the goodness of pronunciation (GOP) algorithm for phoneme-level pronunciation evaluation on the SAMPLE corpus of non-native speech. This corpus includes several recordings of sentences uttered by different speakers that have been rated in terms of quality by a group of l...
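For reference, the goodness of pronunciation (GOP) score is commonly written as in Witt and Young's formulation below. Whether this work uses exactly this variant is an assumption. Here O^{(p)} is the acoustic segment aligned with phone p, Q is the phone inventory, and NF(O^{(p)}) is the number of frames in that segment. A score near zero means the canonical phone p is also the best-matching phone for the segment.

```latex
\mathrm{GOP}(p) = \frac{1}{NF\!\left(O^{(p)}\right)}
  \left| \log \frac{P\!\left(O^{(p)} \mid p\right)}
                   {\max_{q \in Q} P\!\left(O^{(p)} \mid q\right)} \right|
```

A phone is then typically flagged as mispronounced when its GOP exceeds a phone-specific threshold.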
Conference Paper
Full-text available
We present a foreign language (L2) pronunciation training serious game, TipTopTalk!, based on the minimal-pairs technique. We carried out a three-week test experiment in which participants had to overcome several challenges, including exposure, discrimination and production, while using Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) syste...
Article
Full-text available
"The Magic Stone" is a video game whose main aim is to help people with Down syndrome improve communication skills that have been affected by their disability, especially those related to prosody. The interface of the video game includes a number of elements to motivate users to practice and train their pronunciation. The usability t...
Conference Paper
Full-text available
We present an L2 pronunciation training serious game based on the minimal-pairs technique, incorporating sequences of exposure, discrimination and production, and using text-to-speech and speech recognition systems. We measured the quality of users' production over a period of time in order to assess improvement after using the application. S...
Conference Paper
Full-text available
Swain's (1985) Comprehensible Output Hypothesis holds that input alone may not be enough for second/foreign language (L2) learners to acquire new language forms. The Hypothesis claims that producing an L2 facilitates L2 learning because of the mental processes related to language production. Thus, learners are more likely to notice discrepancie...
Conference Paper
Full-text available
Computer Assisted Pronunciation Training (CAPT) apps are becoming widespread as an aid to learning new languages. However, they are still widely criticized for lacking the direct feedback of a human expert, which is considered irreplaceable. Nevertheless, combining the right learning methodology with a gamification design strategy can increase engagem...
Conference Paper
Full-text available
This paper introduces the architecture and interface of a serious game intended for pronunciation training and assessment for Spanish students of English as a second language. Users face a challenge consisting of pronouncing a battery of minimal-pair words. Android ASR and TTS tools will prove useful in discerning three different pronun...
Conference Paper
Full-text available
Prosodic prominence is an umbrella term encompassing various related but conceptually and functionally different phenomena such as phonological stress, paralinguistic emphasis, lexical, syntactic, semantic or pragmatic salience, to mention a few. Due to the high interest prominence has received from various disciplines, it has been studied from mul...
Article
Full-text available
Following a philosophy of integrating components of multimodal interaction applications with 3D graphical environments, and reusing already defined markup languages for describing graphics and graphical and spoken interactions, we seek a markup language for modeling scenes, behavior and interaction based on the interactive movie metaphor. With the defin...
Article
In this paper we present some experiments on multiclass ToBI pitch accent classification. The system is based on the fusion of pairwise classifiers, which are specialized in the distinction of pairs of prosodic labels. Several machine learning techniques, including neural networks, decision trees and support vector machines, are combined in differe...
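As an illustration of fusing pairwise classifiers, the sketch below uses one-vs-one majority voting, one of the simplest fusion schemes. The abstract says the base classifiers are combined in different ways, so voting is an assumed example rather than the paper's method, and the toy threshold rules on a single hypothetical feature stand in for trained neural networks, trees or SVMs.

```python
from collections import Counter
from typing import Callable, Sequence

# Each pairwise classifier maps a feature vector to one of its two labels.
PairwiseClassifier = Callable[[Sequence[float]], str]


def fuse_by_voting(
    classifiers: dict[tuple[str, str], PairwiseClassifier],
    features: Sequence[float],
) -> str:
    """Return the label that wins the most pairwise (one-vs-one) contests."""
    votes = Counter(clf(features) for clf in classifiers.values())
    # most_common orders equal counts by first occurrence, so ties go to the
    # label that received its first vote earliest.
    return votes.most_common(1)[0][0]


# Toy rules on one hypothetical feature (e.g. an F0 slope); real ones would be trained.
toy = {
    ("H*", "L*"): lambda x: "H*" if x[0] > 0 else "L*",
    ("H*", "L+H*"): lambda x: "L+H*" if x[0] > 1 else "H*",
    ("L*", "L+H*"): lambda x: "L+H*" if x[0] > 0 else "L*",
}
print(fuse_by_voting(toy, [2.0]))   # L+H*
print(fuse_by_voting(toy, [0.5]))   # H*
print(fuse_by_voting(toy, [-1.0]))  # L*
```

With k labels this needs k(k-1)/2 specialized classifiers, which is why pairwise schemes stay manageable for small ToBI inventories.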
Article
Full-text available
One of the goals of the Glissando research project is to enrich a radio news corpus [1] with Sp_ToBI labels. In this paper we present the application of the automatic predictions of a fuzzy classifier to speed up the labeling process. The strategy is proposed after completing the following steps: a) manual annotation of a part of the Glissando cor...
Article
Full-text available
A review of the literature on prosody reveals a lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct data-sets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded...
Article
Full-text available
In this work we present a methodology for modelling intonation from a corpus that operates with alternative types of intonation units. We compare prediction results obtained using a set of different ones. The results make it possible to select the most suitable one depending on the corpus and to obtain information about the relative importance of different pro...
Article
This paper presents an original approach to automatic prosodic labeling. Fuzzy logic techniques are used to represent situations of high uncertainty regarding the category to be assigned to a given prosodic unit. The fuzzy integral technique is used to combine the output of different base classifiers. The resulting fuzzy classifier benefit...
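Fuzzy integrals are a standard way to combine the outputs of several base classifiers. A minimal sketch of fusion with a Sugeno fuzzy integral follows. The Sugeno variant, the cardinality-based fuzzy measure and the classifier names mlp, tree and svm are all assumptions for illustration; the paper may define its fuzzy measure differently (for example, as a λ-measure estimated from data).

```python
from typing import Callable

# A fuzzy measure maps a set of classifier names to [0, 1]; it must be monotone,
# with measure(empty set) == 0 and measure(all classifiers) == 1.
FuzzyMeasure = Callable[[frozenset[str]], float]


def sugeno_integral(supports: dict[str, float], measure: FuzzyMeasure) -> float:
    """Sugeno fuzzy integral of per-classifier supports for a single label."""
    ordered = sorted(supports, key=supports.get, reverse=True)
    best = 0.0
    subset: set[str] = set()
    for name in ordered:
        subset.add(name)  # the top-i classifiers by support
        best = max(best, min(supports[name], measure(frozenset(subset))))
    return best


def fuse(label_supports: dict[str, dict[str, float]], measure: FuzzyMeasure) -> str:
    """Pick the label whose Sugeno integral over all classifiers is largest."""
    return max(label_supports, key=lambda lab: sugeno_integral(label_supports[lab], measure))


# Hypothetical measure: the fraction of the three base classifiers in the subset.
def cardinality_measure(subset: frozenset[str]) -> float:
    return len(subset) / 3


outputs = {
    "H*": {"mlp": 0.9, "tree": 0.6, "svm": 0.2},
    "L*": {"mlp": 0.1, "tree": 0.4, "svm": 0.8},
}
print(fuse(outputs, cardinality_measure))  # H*
```

The per-label integral values (0.6 for H* and 0.4 for L* here) can also be kept as graded memberships instead of taking only the maximum, which suits uncertain prosodic units.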
Article
Full-text available
This article presents a project for the 3D digitization of pieces from the Pintia archaeological site, carried out during 2012. It describes the characteristics of the software used, which generates the models from photographs, presents the results obtained, and discusses the potential use of the virtual models.
Article
Full-text available
A review of the literature on prosody reveals a lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct data-sets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded...
Conference Paper
Full-text available
In this paper, we present an experiment on computer assisted prosodic labeling in which a labeler team validates or corrects ToBI pitch accents automatically predicted by a classifier. The innovative aspect of this automatic system is that it is not a deterministic prediction model: it offers the human transcriber more than one label per word (mult...
Article
This paper presents a system that automatically labels tones and break indices (ToBI) events. The detection (binary classification) of prosodic events has received significantly more attention from researchers than its classification because of the intrinsic difficulty of classification. We focus on the classification problem, identifying eight typ...
Article
Full-text available
A set of tools to analyze inconsistencies observed in a Cat_ToBI labeling experiment is presented. We formalize and use the metrics that are commonly used in inconsistency tests. The metrics are systematically applied to analyze the robustness of every symbol and every pair of transcribers. The results reveal agreement rates for this study that ar...
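Observed agreement and Cohen's kappa are among the metrics commonly used in such inconsistency tests. The sketch below computes both for every pair of transcribers; the choice of these two metrics, the function names and the example labels are assumptions, not a description of the paper's exact toolset.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items that two transcribers labeled identically."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = observed_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both transcribers independently pick the same label.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both used one identical label throughout; kappa is undefined, report 1.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappas(labels_by_transcriber: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Cohen's kappa for every pair of transcribers (names sorted alphabetically)."""
    return {
        (s, t): cohen_kappa(labels_by_transcriber[s], labels_by_transcriber[t])
        for s, t in combinations(sorted(labels_by_transcriber), 2)
    }


# Hypothetical Cat_ToBI-style pitch-accent labels for four words.
labels = {
    "T1": ["H*", "L*", "H*", "L+H*"],
    "T2": ["H*", "L*", "L*", "L+H*"],
}
print(pairwise_kappas(labels))  # kappa for ('T1', 'T2') is 7/11, about 0.636
```

Per-symbol robustness could be examined in the same spirit by restricting the counts to items where at least one transcriber used a given label.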
Article
Full-text available
Until now, speech synthesis has mainly involved reading-style speech. Today, however, text-to-speech systems must provide a variety of styles because users expect these interfaces to do more than just read information. If synthetic voices are to be integrated into future technology, they must simulate the way people talk instead of the way people rea...
Article
Full-text available
This paper presents a technique that allows us to detect similarities among prosodic labels used to describe pitch accents within the ToBI framework. The inter-label proximity is determined empirically as a result of the evidence obtained in contingency tables of inter-transcriber agreement tests and in the confusion matrices used in automatic pros...
Conference Paper
Full-text available
In this paper we present an experimental study on how corpus-based automatic prosodic information labeling can be transferred from a source language to a different target language. The Spanish ESMA corpus is used to train models for the identification of the prominent words. Then, the models are used to identify the accented words of the English Bo...
Conference Paper
Full-text available
ABSTRACT This article studies a prosodic feature of Spanish consisting of the temporary suspension of the breath group through the lengthening of an allophone or the insertion of a pre-occlusive pause within a word. First, the existence of the phenomenon is empirically confirmed in a corpus of Ibero-American Spanish speakers. Then...
Conference Paper
Full-text available
This paper presents an experimental study on how corpus-based automatic prosodic information labeling can be transferred from a source language to a different target language. Tone accent identification models trained for Spanish, using the ESMA corpus, are used to automatically assign tonal accent ToBI labels on the (English) Boston Radio news cor...
Conference Paper
Full-text available
This contribution addresses the ToBI accent recognition problem with the goal of multiclass identification, rather than the more conservative Accent vs. No Accent approach. A neural network and a decision tree are used for automatic recognition of the ToBI accents in the Boston Radio Corpus. Multiclass classification results show the difficulty of the problem a...
Conference Paper
Full-text available
In this paper we present a new approach to the synthesis of filled pauses. The problem is tackled from the point of view of disfluent speech synthesis. Based on the synthetic disfluent speech model, we analyse the features that describe filled pauses and propose a model to predict them. The model was implemented and perceptually evaluated wi...
Article
Full-text available
This paper reports on the results of a pilot study run to assess the labeling consistency of the proposed approach in Sp-ToBI before starting large-scale production of annotations in the Glissando project. This test should serve to refine the model and to keep the annotation conventions consistent across transcription sites. The Sp...
Conference Paper
This paper describes a multimodal architecture to control 3D avatars with speech dialogs and mouse events. We briefly describe the scripting language used to specify the sequences and the components of the architecture supporting the system. Then we focus on the evaluation procedure that is proposed to test the system. The discussion on the evaluat...
Poster
Full-text available
The paper presents the advances in the annotation of the Glissando corpus and discusses results concerning two areas: a) the consistency and reliability of the inventory of tonal accents in Spanish when applied to large sets of data. The paper reports on the results of a pilot study that was run to assess the labeling consistency of the proposed...
Poster
Full-text available
Previous accounts of prosodic phrasing patterns in Catalan (Frota et al., 2007) have shown various cues that are indicative of phrase boundaries. This paper reports on the distributional analysis of the boundary types in the Festcat corpus. The most relevant results are the following: (1) Strategies of phrasing are equivalent in the two voices (male...
Chapter
Full-text available
This chapter describes a framework to integrate voice interaction in 3D worlds allowing users to manage VRML objects by using speech dialogs. We have defined a language named XMMVR to specify in a common program the 3D scenes and the multimodal interaction. XMMVR is based on the theater metaphor adding the possibility to include speech dialogs for...
Conference Paper
Full-text available
Disfluent speech synthesis is necessary in some applications such as automatic film dubbing or spoken translation. This paper presents a model for the generation of synthetic disfluent speech based on inserting each element of a disfluency in a context where it can be considered fluent. Prosody obtained by the application of standard techniqu...
Article
Full-text available
This communication presents ongoing research on the definition of a methodology to compare the intonation of two different corpora. The two corpora compared here are intended to be representative of Spanish and Catalan intonation, respectively. As a consequence, the comparison reported here projects the most relevant differences between the...
Article
Full-text available
With the definition of a mark-up language for specifying scenes, behaviour and multimodal interaction based on the metaphor of the interactive movie, we want to propose a framework to develop applications allowing multimodal interaction with virtual reality worlds. Reusing standardized mark-up languages for describing graphics, vocal interaction and gr...
Conference Paper
Full-text available
Speech synthesis techniques have already reached a high level of naturalness. However, they are often evaluated on text reading tasks. New applications will require conversational speech instead, and disfluencies are crucial in such a style. This paper presents a system to predict filled pauses and synthesise them. Objective results s...