
Ignasi Esquerra
Universitat Politècnica de Catalunya | UPC · TALP - Centre for Language and Speech Technologies and Applications
MSc. Engineer
20 publications · 1,327 reads
83 citations
Publications (20)
This paper presents our work on the FESTCAT project, whose main goal was the development of voices for the Festival suite in Catalan. In the first year, we produced the corpus and the speech data needed to build 10 voices using the Clunits (unit selection) and the HTS (Markov models) methods. The resulting voices are freely available on the we...
In this paper we describe the design and production of a Catalan database for building synthetic voices. Two speakers have each recorded 10 hours of speech. The speaker selection and the corpus design aim to provide resources for high-quality synthesis. The resources have been used to build voices for the Festival TTS. Both the...
Segre is a rule-based automatic phonetic transcription system for Catalan, jointly developed by the Universitat Politècnica de Catalunya, the Universitat Autònoma de Barcelona and the Universitat de Barcelona in the framework of the Catalan Reference Centre for Language Engineering (CREL, Centre de Referència en Enginyeria Lingüística). The syntax...
This paper summarizes the text-to-speech system that has been developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and different interfaces so that it is compatible for research, for telephone applications (either CTI boards or standard ISDN PC cards supporting CAPI), and Windows applicat...
Knowledge of phonetic unit frequencies is essential for developing databases for both concatenative synthesis and continuous speech recognition. In the present work, a large corpus of text was processed and phonetically transcribed to obtain allophone and diphone frequencies for the Catalan language. The corpus was acquired from newspaper article...
Different databases of phonetic units are required in multilingual Text-to-Speech systems based on concatenative synthesis. We are currently developing a TTS system able to convert text in either Catalan or Spanish, with some of the modules being shared by the two languages while others are specific to each language. In order to reduce...
The multiwindow approach is a meaningful framework for nonparametric spectral estimation. It also encompasses several conventional methods such as WOSA and the frequency-averaged periodogram. Recently, some authors claimed that the Slepian windows of Thomson's method and other related optimal sets of windows show better performance in terms of resolution,...
The developed tool provides utilities for prosody analysis and labeling of voice signals. It works under Windows 95 and Windows NT environments and uses the Microsoft Win32 application programming interface (API) for audio playback and recording. The application detects the prosody of the speech signal and then the original intonation can be stylized in...
This paper presents the evaluation of Ogmios, the UPC TTS system, carried out within the Blizzard Challenge Initiative, 2007. Ogmios is a unit-selection-based system. Prosodic models are used to select the units using acoustic measures in the target cost, but the selected units are not modified. Most of the modules of Ogmios rely on data driven tech...
This paper presents the UPC TTS system named Ogmios. It was used to generate the voices in UK English and Mandarin Chinese for Blizzard Challenge 2008. Ogmios is a system based on unit-selection using acoustic and phonetic features both in target and concatenation costs. Most of the modules of Ogmios rely on data driven techniques. This evaluation...
In this paper, we present the design of a corpus for speech recognition to be used for the recording of a speech database in Catalan. A previous database in Spanish was the reference in setting the specifications for the characteristics of the sentences and the minimum number of units required. An analysis of unit frequencies was carried out...
The topic of emotional speech synthesis has lately received a lot of interest within the research community, as shown by the number of papers presented at conferences and workshops. In the beginning, synthesizers used rules to perform prosodic and voice quality changes in order to produce different styles of speaking to the synthetic voic...
This paper describes Ogmios, the UPC TTS system that was used in the 2010 Albayzin Evaluation. Ogmios is a concatenation system that builds the synthetic sentence from demiphones selected from the training database. In this evaluation round, the database was provided by the organization and it has been phonetically transcribed and segmented automat...
Study funded by the MEC, Programa Estudio y Análisis. Postprint (published version)