Sérgio Paulo
VoiceInteraction S.A. · Speech Processing Technologies

PhD

About

19
Publications
2,958
Reads
166
Citations

Publications (19)
Article
The demand for subtitling of multimedia content has grown quickly over the last few years, especially after the adoption of the new European audiovisual legislation, which requires multimedia content to be made accessible to all. As a result, TV channels have moved to produce subtitles for a high percentage of their broadcast content. Consequently, the mar...
Conference Paper
Full-text available
This paper describes the data collection, annotation and sharing activities carried out within the FP7 EU-funded SAVAS project. The project aims to collect, share and reuse audiovisual language resources from broadcasters and subtitling companies to develop large vocabulary continuous speech recognisers in specific domains and new languages, with t...
Conference Paper
Full-text available
The subtitling demand has grown quickly over the years. The path of manual subtitling is no longer feasible, due to increased costs and reduced production times. Assisted Subtitling is an emerging technique, consisting of the application of Automatic Speech Recognition (ASR) to automatically generate program transcripts. This paper will report on r...
Conference Paper
Full-text available
Much information of potential relevance to police investigations of organised crime is available in public sources without being recognised and used. Barriers to the simple and efficient exploitation of this information include that not everything is easily searchable, and may be written in a language other than that of the investigator. To help ov...
Conference Paper
Full-text available
This paper describes a new generic text-to-speech synthesis system, developed in the scope of the Tecnovoz Project. Although it was primarily targeted at speech synthesis in European Portuguese, its modular architecture and flexible components allow its use for different languages. We also provide a survey on the development of the language resour...
Conference Paper
Full-text available
In this paper we share our experience and describe the methodologies that we have used in designing and recording large speech databases for applications requiring speech synthesis. Given the growing demand for customized and domain specific voices for use in corpus based synthesis systems, we believe that good practices should be established for t...
Conference Paper
Full-text available
This paper describes the INESC-ID participation in the Blizzard Challenge 2008, which consisted of building the two English voices. We have been developing a new European Portuguese TTS system, called DIXI, for the last two years. This year, the system was already stable enough to be used in the challenge, after a partial adaptation to support synt...
Article
Full-text available
Here, we present the waveform generation module of the Dixi Speech Synthesis System, which was developed in the scope of the Tecnovoz project. It was originally designed for European Portuguese, and is now being adapted to other languages, such as British English. This is the acoustic synthesis module that we will use in the ECESS evaluation campaign. W...
Conference Paper
Full-text available
This paper describes our work integrating automatic speech generation into a virtual environment where autonomous agents are enabled to interact by natural spoken language. The application intends to address bullying problems for children aged 9-12 in the UK and Germany by presenting improvised dramas and by asking the user to act as an "invisible...
Conference Paper
Full-text available
The Multi-Level Alignment System (MuLAS) is the L2F tool for building multi-tier speech corpora with reduced or no human intervention at all. MuLAS automatically combines information coming from external speech annotations, human or machine-generated, with the text-based utterance descriptions that it creates, in order to build more re...
Conference Paper
Full-text available
This paper describes a speech segmentation tool allowing alternative word pronunciations within a WFST framework. Two approaches to word pronunciation graph generation were developed and evaluated. The first approach is grapheme-based where each grapheme is converted into all the phones it can give rise to, in the form of a WFST. Word graphs are ob...
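As a rough sketch of the grapheme-based idea described in this entry, and not the paper's actual WFST implementation, the snippet below enumerates candidate pronunciations by expanding each grapheme into the set of phones it can give rise to (plain enumeration stands in for transducer composition); the grapheme-to-phone table and the example word are invented for illustration.

```python
from itertools import product

# Toy grapheme-to-phone table (invented; SAMPA-like symbols).
# In the WFST approach each grapheme would instead become a small
# transducer, and the word graph would result from composing them.
G2P_CANDIDATES = {
    "c": ["k", "s"],
    "a": ["a", "6"],
    "s": ["s", "z", "S"],
    "o": ["o", "u"],
}

def pronunciation_variants(word):
    """Return every candidate phone sequence for a word by expanding,
    grapheme by grapheme, all the phones each grapheme can give rise to."""
    per_grapheme = [G2P_CANDIDATES.get(g, [g]) for g in word]
    return [" ".join(phones) for phones in product(*per_grapheme)]

if __name__ == "__main__":
    for variant in pronunciation_variants("casa"):
        print(variant)
```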
Conference Paper
Full-text available
The goal of producing a corpus-based synthesizer with the owner's voice can only be achieved if the system can handle recordings with less than ideal characteristics. One of the limitations is that a normal speaker does not always pronounce a word exactly as predicted by the language rules. In this work we compare two methods for handling varia...
Conference Paper
Full-text available
In this paper we propose the use of an HMM-based phonetic aligner together with a speech-synthesis-based one to improve the accuracy of the global alignment system. We also present a phone duration-independent measure to evaluate the accuracy of the automatic annotation tools. In the second part of the paper we propose and evaluate some new co...
Article
This paper describes our efforts in porting our letter-to-sound module from European Portuguese to Mirandese, the second official language in Portugal. We describe the rule formalism and the composition of the various transducers involved in the letter-to-sound conversion.
Conference Paper
Full-text available
This paper describes our efforts in porting our letter-to-sound module from European Portuguese to Mirandese, the second official language in Portugal. We describe the rule formalism and the composition of the various transducers involved in the letter-to-sound conversion. We propose a set of extra SAMPA symbols to be used in the phonetic tr...
Conference Paper
Full-text available
The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers, in forced alignment mode, but the training of the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence a...
Conference Paper
Full-text available
This paper presents the results of our effort in improving the accuracy of a DTW-based automatic phonetic aligner. The adopted model assumes that the phonetic segment sequence is already known and so the goal is only to align the spoken utterance with a reference synthetic signal produced by waveform concatenation without prosodic modifications. In...
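A minimal sketch of the general DTW technique mentioned in this entry, assuming the spoken utterance and the synthetic reference have already been converted to frame-level feature vectors (e.g. MFCC frames); the feature extraction and the paper's specific accuracy improvements are not shown, and the random inputs are placeholders.

```python
import numpy as np

def dtw_align(natural, synthetic):
    """Align two feature sequences (frames x dims) with plain DTW and
    return the accumulated cost plus the frame-to-frame warping path."""
    n, m = len(natural), len(synthetic)
    dist = np.linalg.norm(natural[:, None, :] - synthetic[None, :, :], axis=2)

    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )

    # Backtrack to recover the warping path (pairs of frame indices).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cost, path = dtw_align(rng.normal(size=(40, 13)), rng.normal(size=(35, 13)))
    print(f"DTW cost: {cost:.2f}, path length: {len(path)}")
```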
Conference Paper
Full-text available
The purpose of this work was the development of a set of tools to automate the process of multilevel annotation of speech signals, preserving the alignments of the utterance's different levels of the linguistic representation. Our goal is to build speech databases, using speech from non professional speakers with multilevel relational annotations,...
Article
Full-text available
In this paper we describe our system used for the 2007 Blizzard Challenge TTS evaluation task. Following the rules, we built three voices from the given speech database: a first voice was created from the full data, a second voice was built from the ARCTIC subset data, and a third voice from a self-defined subset. The self-defined subset...


Projects (2)
Project
Crime surveillance (Vigilância da criminalidade)