Ignasi Iriondo Sanz

Universitat Ramon Llull | URL · La Salle Engineering

PhD


58
Publications
7,916
Reads
512
Citations
Citations since 2016
8 Research Items
180 Citations


Publications (58)
Article
Full-text available
The concentration of CO2 is relatively large in poultry farms and high accumulations of this gas reduce animal welfare. Good control of its concentration is crucial for the health of the animals. The vocalizations of the chickens can show their level of well-being linked to the presence of carbon dioxide. An audio recording system was implemented a...
Article
Full-text available
Qualitative research activities, including first-day of class surveys and user experience interviews on completion of a subject were carried out to obtain students’ feedback in order to improve the design of the subject ‘Information Systems’ as a part of a general initiative to enhance ICT (Information and Communication Technologies) engineering pr...
Article
Full-text available
The expectations, attitudes, engagement, and motivation of students are key elements when designing learning activities. Several studies have been implemented and different strategies and activities have been analyzed to improve the aforesaid aspects of learning content. In the context of the New Learning Context (NLC), this paper presents the find...
Article
Full-text available
The COVID-19 pandemic significantly disrupted traditional face-to-face teaching worldwide and forced education institutions to adopt new, online teaching formats to enable students to continue with their studies. This research focuses on students’ perceptions of three different teaching modalities: face-to-face (F2F), Emergency Remote Teaching (ERT...
Article
Full-text available
Poultry meat is the world’s primary source of animal protein due to its low cost and is widely eaten across the globe. However, the intensive production required to meet demand causes stress and welfare problems for the animals, which have to be reduced or eradicated for the better health of the birds. In this study, bird welfare is measure...
Article
This work describes the design, implementation and evaluation of a multi-subject learning experience based on the principles of Constructionism, in which the construction of a videogame is the learning artifact that engages students in four different technical and management subjects included in the ICT engineering degree curricula of the School of...
Article
Full-text available
This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the use...
Article
Full-text available
The aim of this article is to classify children’s affective states in a real-life non-prototypical emotion recognition scenario. The framework is the same as that proposed in the Interspeech 2009 Emotion Challenge. We used a large set of acoustic features and five linguistic parameters based on the concept of emotional salience. Features were extra...
Article
Full-text available
The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic an...
Conference Paper
Full-text available
Detection of affective states in speech could improve the way users interact with electronic devices. However, the analysis of speech at the acoustic level alone may not be enough to determine the emotion of a user speaking in a realistic scenario. In this paper we analysed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and lin...
Conference Paper
Full-text available
This paper presents an approach to improve emotion recognition from spontaneous speech. We used a wrapper method to reduce an acoustic set of features and feature-level fusion to merge them with a set of linguistic ones. The proposed system was evaluated with the FAU Aibo Corpus. We considered the same emotion set that was proposed in the Interspee...
Conference Paper
Full-text available
This paper describes three categorical classification approaches to spontaneous children's emotion recognition based on acoustic features from speech. We also present a fourth approach that combines the two best classifiers by stacked generalisation. We used the FAU Aibo Corpus to work under real-life conditions, dealing with spontaneous speech and w...
Article
Full-text available
This paper proposes an approach to transform speech from a neutral style into other expressive styles using both prosody and voice quality (VoQ). The main aim is to validate the usefulness of VoQ in the enhancement of expressive synthetic speech. A Harmonic plus Noise Model (HNM) is used to modify speech following a set of rules extracted from an e...
Conference Paper
Full-text available
This paper describes our participation in the INTERSPEECH 2009 Emotion Challenge [1]. Starting from our previous experience in the use of automatic classification for the validation of an expressive corpus, we have tackled the difficult task of emotion recognition from speech with real-life data. Our main contribution to this work is related to the...
Article
Full-text available
This paper presents an automatic system able to enhance expressiveness in speech corpora recorded from acted or stimulated speech. The system is trained with the results of a subjective evaluation carried out on a reduced set of the original corpus. Once the system has been trained, it is able to check the complete corpus and perform an automatic p...
Article
Full-text available
A novel way to learn and track simultaneously the appearance of a previously non-seen face without intrusive techniques can be found in this article. The presented approach has a causal behaviour: no future frames are needed to process the current ones. The model used in the tracking process is refined with each input frame thanks to a new algorith...
Conference Paper
Full-text available
This paper describes a high-quality Spanish HMM-based speech synthesis of emotional speaking styles. The quality of the HMM-based speech synthesis is enhanced by using the most recent features presented for the Blizzard system (i.e. STRAIGHT spectrum extraction and mixed excitation). Two techniques are evaluated. First, a method simultaneously mode...
Conference Paper
An artificial vision system for vehicles is proposed in this article to alert drivers of potential head-on collisions. It is capable of detecting any type of frontal collision from any type of obstacle that may present itself in a vehicle's path. The system operates based on a sequence of algorithms whose images are recorded on a camera located...
Chapter
Full-text available
The use of speech in human-machine interaction is increasing as the computer interfaces are becoming more complex but also more useable. These interfaces make use of the information obtained from the user through the analysis of different modalities and show a specific answer by means of different media. The origin of the multimodal systems can be...
Conference Paper
Full-text available
TRUE (Testing platfoRm for mUltimedia Evaluation) is an online platform developed to create and perform subjective tests oriented to the evaluation of stimuli of different nature such as audio, video, graphics and text. Due to the high flexibility that the platform offers to researchers, different kinds of tests can be carried out, such as emotion i...
Article
Full-text available
This work presents a new procedure for measuring the voice quality (VoQ) parameters jitter and shimmer. The new procedure takes the prosody of the utterance into account, so that its effect is attenuated before each parameter is measured. The aim, besides performing the measurement...
Conference Paper
Full-text available
In this work, the capability of voice quality parameters to discriminate among different expressive speech styles is analyzed. To that effect, the data distribution of these parameters, directly measured from the acoustic speech signal, is used to train a Linear Discriminant Analysis that conducts an automatic classification. As a result, the most...
Article
Full-text available
This paper presents the validation of the expressiveness of an acted corpus produced to be used in speech synthesis, as this kind of emotional speech can be rather lacking in authenticity. The goal is to obtain a system which is able to prune bad utterances from an expressiveness point of view. The results from a previous subjective test are used...
Conference Paper
Full-text available
This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis. Acted speech can be rather lacking in authenticity, and therefore its expressiveness must be validated. The goal is to obtain an automatic classifier able to prune the bad utterances (those with wrong expressiveness). Fi...
Conference Paper
Full-text available
In this paper a fuzzy system for automatically assessing the students’ teamwork performance is presented. The main goal of this work is to guarantee an equitable assessment of students’ teamwork throughout the course and across the lecturers of the same subject when subjective criteria are considered. The proposed fuzzy system (i) is designed by us...
Conference Paper
Full-text available
Hidden Markov Models based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models where spectrum, pitch and durations of basic speech units are modelled altogether. The aim of this work is to describe a Spanish HMM-TTS system using an external machine learning technique to help improve the expressi...
Conference Paper
Full-text available
This paper presents the validation of the expressiveness of an acted oral corpus produced to be used in speech synthesis. Firstly, an objective validation has been conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test has been perform...
Conference Paper
Full-text available
This paper presents the use of analogical learning, in particular case-based reasoning, for the automatic generation of prosody from text, which is automatically tagged with prosodic features. This is a corpus-based method for quantitative modelling of prosody to be used in a Spanish text to speech system. The main objective is the development of a...
Article
Full-text available
Hidden Markov Models based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models where spectrum, pitch and durations of basic speech units are modelled altogether. The aim of this work is to describe a Spanish HMM-TTS system using CBR as an F0 estimator, analysing its performance objectively and...
Article
Full-text available
Hidden Markov Models based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models where spectrum and prosody of basic speech units are modelled altogether. This paper presents the advances in our Spanish HMM-TTS and a perceptual test is conducted to compare it with an extended PSOLA-based...
Conference Paper
Full-text available
This paper describes a multi-domain text-to-speech (MD-TTS) synthesis strategy for generating speech among different domains and so increasing the flexibility of high quality TTS systems. To that effect, the MD-TTS introduces a flexible TTS architecture that includes an automatic domain classification module, which allows MD-TTS systems to be imple...
Article
Full-text available
This work presents new contributions to the definition of the text-to-speech (TTS) approach called multi-domain synthesis. The proposal seeks a synthetic quality close to that of limited-domain TTS systems while keeping the versatility of general-purpose synthesis. The multi-domain architecture...
Conference Paper
Full-text available
The quality of corpus based text-to-speech systems depends on the accuracy of the unit selection process, which relies on the values of the weights of the cost function. This paper is focused on defining a new framework for the tuning of these weights. We propose a technique for taking into account the subjective perception of speech in the selecti...
Conference Paper
Full-text available
A new algorithm for the incremental learning and non-intrusive tracking of the appearance of a previously non-seen face is presented. The computation is done in a causal fashion: the information for a given frame to be processed is combined only with the one of previous frames. To achieve this aim, a novel way for simultaneous and incremental compu...
Conference Paper
Full-text available
This paper describes an initial approach to emotional speech synthesis in Catalan based on a diphone concatenation TTS system. The main goal of this work is to develop a simple prosodic model for expressive synthesis. This model is obtained from an emotional speech collection artificially generated by means of a copy-prosody experiment. After valid...
Conference Paper
Full-text available
This paper proposes a new method for lip animation of a personalized facial model from auditory speech. It is based on Bayesian estimation and person-specific appearance models (PSFAM). Initially, a video of a speaking person is recorded from which the visual and acoustic features of the speaker and their relationship will be learnt. First, the visua...
Article
A novel way to learn and track simultaneously the appearance of a previously non-seen face without intrusive techniques can be found in this article. The presented approach has a causal behaviour: no future frames are needed to process the current ones. The model used in the tracking process is refined with each input frame thanks to a new algorith...
Conference Paper
Full-text available
This paper presents a new method named text to visual synthesis with appearance models (TEVISAM) for generating videorealistic talking heads. In a first step, the system learns a person-specific facial appearance model (PSFAM) automatically. PSFAM allows modeling all facial components (e.g. eyes, mouth, etc) independently and it will be used to ani...
Article
Full-text available
The work presented in this paper deals with text-to-speech systems based on unit selection. The quality of the synthesis relies on having an accurate unit selection process. Usually, the quality of this procedure can be tuned by adjusting a set of weights that control the selection process. However, in order to achieve a good quality, the tuning pr...
Article
Full-text available
In this paper we present a high-quality text-to-speech system using diphones and triphones. The implemented synthesis system is based on a hybrid model that combines a harmonic plus noise decomposition technique with some features of TD-PSOLA. The analysis and the synthesis processes are pitch-synchronous, so prosodic modifications can be generated...
Conference Paper
Full-text available
This paper describes a 2D realistic talking face. The facial appearance model is constructed with a parameterised 2D sample based model. This representation supports moderated head movements, facial gestures and emotional expressions. Two main contributions for talking heads applications are proposed. First, the image of the lips is synthesized by...
Article
Full-text available
This article presents the implementation and evaluation of a system for the automatic generation of pitch marks, used to label a speech corpus. The system is based on two concepts: the energy of the speech signal and dynamic programming. The evaluation is twofold: with respect to the labelling of a continuous-speech corpus in Catalan and...
Article
Full-text available
This paper describes the methodology used for validating the results obtained in a study about acoustical modelling of emotional expression in Castilian Spanish.
Conference Paper
This paper describes a Unit Selection system based on diphones that was developed by the Speech Technology Group of the Enginyeria Arquitectura La Salle School, Universitat Ramon Llull. This system works with a PSOLA synthesiser for Catalan language which is used in an Oral Synthesised Message Editor (EMOVS) and Windows applications developed using...
Article
Full-text available
The unit selection based text-to-speech (TTS) systems [1] work with large speech corpora labelled with a huge amount of data. Recorded speech is time-aligned at phonetic level by segmentation marks (phoneme boundaries). Although manual phonetic alignment is considered more accurate than automatic methods, it is too time consuming to be commonly use...
Article
Full-text available
This paper presents the text-to-speech (TTS) synthesis system of La Salle (Universitat Ramon Llull, URL) and its adaptation to the Albayzin Evaluation Campaign of FALA2010 conference. The URL-TTS system follows the classical scheme of unit selection TTS synthesis systems. However, it presents two distinguishable particularities: i) prosody predic...
Article
Full-text available
This article presents the use of analogical learning, in particular case-based reasoning, as a tool for the automatic generation of prosody from text that has been automatically tagged with prosodic attributes. It is a corpus-based method for the quantitative modelling of...
Article
Full-text available
This article describes new lines of research on concatenative synthesis for text-to-speech conversion. The TD-PSOLA technique was a major step forward in quality over earlier systems. It has been a valid method for many applications, but it is insufficient for the new needs in the field of...
