Xavi Gonzalvo

Google Inc. | Google · Research and machine intelligence

PhD

About

26 Publications
5,124 Reads
439 Citations
Additional affiliations
April 2016 - present
Google
Position
  • Staff research scientist
January 2011 - April 2016
Google
Position
  • Staff research scientist
April 2008 - December 2011
Position
  • Phonetic Arts Ltd

Publications (26)
Article
We present ENERGYNET, a new framework for analyzing and building artificial neural network architectures. Our approach adaptively learns the structure of the networks in an unsupervised manner. The methodology is based upon the theoretical guarantees of the energy function of restricted Boltzmann machines (RBM) with an infinite number of nodes. We p...
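The abstract builds on the energy function of restricted Boltzmann machines. As background only, the standard energy of a binary RBM, E(v, h) = -aᵀv - bᵀh - vᵀWh, and the corresponding free energy (hidden units summed out) can be computed as in this minimal sketch. It is the textbook definition, not the ENERGYNET method; the function names and toy parameters are illustrative.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def rbm_energy(v: Sequence[int], h: Sequence[int], W: Matrix,
               a: Sequence[float], b: Sequence[float]) -> float:
    """Energy E(v, h) = -a.v - b.h - v^T W h of a binary RBM configuration."""
    visible_bias = sum(ai * vi for ai, vi in zip(a, v))
    hidden_bias = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_bias + hidden_bias + interaction)


def rbm_free_energy(v: Sequence[int], W: Matrix,
                    a: Sequence[float], b: Sequence[float]) -> float:
    """Free energy F(v) = -log sum_h exp(-E(v, h)), with h summed out analytically."""
    visible_bias = sum(ai * vi for ai, vi in zip(a, v))
    softplus_terms = 0.0
    for j in range(len(b)):
        # Pre-activation of hidden unit j given the visible vector v.
        x = b[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        softplus_terms += math.log1p(math.exp(x))
    return -visible_bias - softplus_terms


if __name__ == "__main__":
    W = [[2.0], [3.0]]          # 2 visible units, 1 hidden unit (toy values)
    a, b = [0.5, 0.25], [1.0]
    print(rbm_energy([1, 0], [1], W, a, b))   # -3.5
    print(rbm_free_energy([1, 0], W, a, b))
```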
Conference Paper
We present a new theoretical framework for analyzing and learning artificial neural networks. Our approach simultaneously and adaptively learns both the structure of the network and its weights. The methodology is based upon and accompanied by strong data-dependent theoretical learning guarantees, so that the final network architecture pro...
Conference Paper
This paper presents advances in Google's hidden Markov model (HMM)-driven unit selection speech synthesis system. We describe several improvements to the run-time system; these include minimal latency, high quality, and a fast refresh cycle for new voices. Traditionally, unit selection synthesizers are limited in terms of the amount of data they can ha...
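The abstract does not describe the search itself, but unit selection synthesizers classically pick one database unit per target position by minimizing the sum of target costs (how well a unit matches the target specification) and join costs (how smoothly consecutive units concatenate) with dynamic programming. This is a minimal sketch of that generic Viterbi search; the cost functions and the toy pitch-valued units are hypothetical and do not describe Google's system.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")


def select_units(candidates: Sequence[Sequence[U]],
                 target_cost: Callable[[int, U], float],
                 join_cost: Callable[[U, U], float]) -> Tuple[float, List[U]]:
    """Viterbi search for the unit sequence minimizing total target + join cost.

    candidates[t] lists the database units that could realize target position t.
    """
    # best[k]: lowest cost of any path ending in candidate k at the current position.
    best = [target_cost(0, u) for u in candidates[0]]
    backpointers: List[List[int]] = []
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[t]:
            costs = [best[k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[t - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(t, u))
            ptr.append(k_best)
        best = row
        backpointers.append(ptr)
    # Trace back from the cheapest candidate at the final position.
    k = min(range(len(best)), key=best.__getitem__)
    total = best[k]
    path = [k]
    for ptr in reversed(backpointers):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, [candidates[t][k] for t, k in enumerate(path)]


if __name__ == "__main__":
    # Toy example: each unit is represented only by a pitch value in Hz.
    targets = [100, 120, 110]
    cands = [[90, 100], [120, 150], [105, 130]]
    cost, units = select_units(
        cands,
        target_cost=lambda t, u: abs(u - targets[t]),
        join_cost=lambda p, u: abs(p - u) / 10,
    )
    print(cost, units)  # 8.5 [100, 120, 105]
```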
Patent
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding...
Patent
The present disclosure describes example systems, methods, and devices for generating a synthetic speech signal. An example method may include determining a phonemic representation of text. The example method may also include identifying one or more finite-state machines (FSMs) corresponding to one or more phonemes included in the phonemic represen...
Article
Modern Text-To-Speech (TTS) systems increasingly need to deal with multilingual input. Navigation, social and news are all domains with a large proportion of foreign words. However, when typical monolingual TTS voices are used, the synthesis quality on such input is markedly lower. This is because traditional TTS derives pronunciations from a lexic...
Conference Paper
This paper proposes the use of Quantized Hidden Markov Models (QHMMs) for reducing the footprint of a conventional parametric HMM-based TTS system. Previously, this technique was successfully applied to automatic speech recognition on embedded devices without loss of recognition performance. In this paper we investigate the construction of different...
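The abstract does not state which quantization scheme the QHMMs use. As a rough illustration of how quantizing model parameters shrinks a footprint, this sketch applies plain uniform scalar quantization to a vector of Gaussian parameters, so each value is stored as a small integer code plus one shared offset and step. The scheme and the toy values are assumptions and may differ from the paper's.

```python
from typing import List, Sequence, Tuple


def quantize(values: Sequence[float], bits: int) -> Tuple[List[int], float, float]:
    """Uniform scalar quantization of model parameters to `bits`-bit integer codes.

    Returns the codes plus the (offset, step) needed to reconstruct the values.
    """
    levels = (1 << bits) - 1          # highest code index, e.g. 15 for 4 bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in values]
    return codes, lo, step


def dequantize(codes: Sequence[int], lo: float, step: float) -> List[float]:
    """Map integer codes back to approximate parameter values."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    means = [-1.7, 0.3, 2.9, 0.05, -0.6]   # toy Gaussian means of one HMM state
    codes, lo, step = quantize(means, bits=4)
    print(codes)
    print(dequantize(codes, lo, step))     # each value within step / 2 of the original
```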
Conference Paper
This paper describes high-quality Spanish HMM-based speech synthesis of emotional speaking styles. The quality of the HMM-based speech synthesis is enhanced by using the most recent features presented for the Blizzard system (i.e., STRAIGHT spectrum extraction and mixed excitation). Two techniques are evaluated. First, a method simultaneously mode...
Article
This paper is a contribution to the recent advancements in the development of high-quality next generation text-to-speech (TTS) synthesis systems. Two of the hottest research topics in this area are oriented towards the improvement of speech expressiveness and flexibility of synthesis. In this context, this paper presents a new TTS strategy called...
Conference Paper
In this work, the capability of voice quality parameters to discriminate among different expressive speech styles is analyzed. To that end, the distribution of these parameters, measured directly from the acoustic speech signal, is used to train a linear discriminant analysis (LDA) classifier that performs the automatic classification. As a result, the most...
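To make the classification step concrete, here is a minimal linear discriminant analysis classifier with a pooled (shared) covariance matrix, written for two features so the matrix algebra stays explicit. The feature values and style labels are hypothetical stand-ins for the measured voice quality parameters; the paper's parameter set and number of styles are not given in this summary.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]


class TwoFeatureLDA:
    """Gaussian LDA: class means, class priors, one pooled within-class covariance."""

    def fit(self, X: Sequence[Vec], y: Sequence[str]) -> "TwoFeatureLDA":
        groups: Dict[str, List[Vec]] = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.means = {k: (sum(p[0] for p in pts) / len(pts),
                          sum(p[1] for p in pts) / len(pts))
                      for k, pts in groups.items()}
        self.priors = {k: len(pts) / len(X) for k, pts in groups.items()}
        # Pooled within-class covariance (divided by N minus the number of classes).
        s00 = s01 = s11 = 0.0
        for x, label in zip(X, y):
            d0 = x[0] - self.means[label][0]
            d1 = x[1] - self.means[label][1]
            s00, s01, s11 = s00 + d0 * d0, s01 + d0 * d1, s11 + d1 * d1
        dof = len(X) - len(groups)
        s00, s01, s11 = s00 / dof, s01 / dof, s11 / dof
        det = s00 * s11 - s01 * s01
        # Closed-form inverse of the 2x2 covariance matrix.
        self.inv = ((s11 / det, -s01 / det), (-s01 / det, s00 / det))
        return self

    def _score(self, x: Vec, k: str) -> float:
        # Linear discriminant: x^T S^-1 m_k - 0.5 m_k^T S^-1 m_k + log(prior_k)
        m = self.means[k]
        w0 = self.inv[0][0] * m[0] + self.inv[0][1] * m[1]
        w1 = self.inv[1][0] * m[0] + self.inv[1][1] * m[1]
        return (x[0] * w0 + x[1] * w1
                - 0.5 * (m[0] * w0 + m[1] * w1) + math.log(self.priors[k]))

    def predict(self, x: Vec) -> str:
        return max(self.means, key=lambda k: self._score(x, k))


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
    y = ["neutral"] * 4 + ["aggressive"] * 4
    model = TwoFeatureLDA().fit(X, y)
    print(model.predict((0.2, 0.3)), model.predict((5.8, 5.1)))  # neutral aggressive
```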
Conference Paper
Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch, and durations of basic speech units are modelled together. The aim of this work is to describe a Spanish HMM-TTS system that uses an external machine learning technique to help improve the expressi...
Article
Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch, and durations of basic speech units are modelled together. The aim of this work is to describe a Spanish HMM-TTS system using CBR (case-based reasoning) as an F0 estimator, analysing its performance objectively and...
Article
Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models in which the spectrum and prosody of basic speech units are modelled together. This paper presents the advances in our Spanish HMM-TTS system, and a perceptual test is conducted to compare it with an extended PSOLA-based...
Conference Paper
This paper describes a multi-domain text-to-speech (MD-TTS) synthesis strategy for generating speech across different domains, thereby increasing the flexibility of high-quality TTS systems. To that end, MD-TTS introduces a flexible TTS architecture that includes an automatic domain classification module, which allows MD-TTS systems to be imple...
Article
This work presents a text classification system adapted to the needs of multi-domain text-to-speech conversion. The system, an evolution of a previous proposal based on representing texts as a graph of weighted nodes, was developed to improve the efficiency of classif...
Conference Paper
This paper presents the design of spoken dialogue system strategies based on reinforcement learning. Many authors have recently proposed treating the dialogue system as a sequence of states, and the introduction of trial-and-error learning methods to find an optimal dialogue strategy has opened a new area of research. This work proposes some...
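The abstract describes learning a dialogue strategy by trial and error. A standard way to do that is tabular Q-learning; this sketch learns when to keep asking and when to close in a toy two-slot form-filling dialogue. The environment, states, rewards, and hyperparameters are all hypothetical and are not the strategies proposed in the paper.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("ask", "close")
N_SLOTS = 2  # state = number of slots the system has filled so far (toy task)


def step(state: int, action: str) -> Tuple[int, float, bool]:
    """Toy dialogue environment: returns (next_state, reward, done)."""
    if action == "ask":
        return min(state + 1, N_SLOTS), -1.0, False  # every extra turn has a cost
    # Closing the dialogue succeeds only if every slot is filled.
    return state, (10.0 if state == N_SLOTS else -10.0), True


def q_learning(episodes: int = 2000, alpha: float = 0.5, gamma: float = 0.95,
               epsilon: float = 0.2, seed: int = 0) -> Dict[Tuple[int, str], float]:
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _turn in range(20):  # cap the dialogue length
            # Epsilon-greedy exploration: the trial-and-error part.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q


def greedy_policy(Q: Dict[Tuple[int, str], float]) -> Dict[int, str]:
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_SLOTS + 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))  # {0: 'ask', 1: 'ask', 2: 'close'}
```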
Conference Paper
This paper presents an academic tracking support application based on data mining algorithms, neural networks, and fuzzy logic methods. Advances in information technology have led to a dramatic increase in the volume of data handled by a wide range of organizations, academic ones included. For this reason, tools that provide information from d...
Conference Paper
In this paper, a new DS-CDMA protocol oriented toward auditory management is presented. The design principles are set out together with a study of the main design parameters. Channel coding, frequency channelization, and frame structure, among other aspects of our communication system, are described. Moreover, simulation software is developed to help the n...
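The abstract does not detail the protocol, but the core DS-CDMA idea it builds on is that each user multiplies its data bits by its own spreading code, and the receiver recovers one user's bits by correlating the combined signal with that user's code. This sketch illustrates the idea with orthogonal Walsh codes over a noiseless channel; the code family, code length, and data are assumptions, and the paper's channel coding, frequency channelization, and frame structure are not modeled.

```python
from typing import List, Sequence


def walsh_codes(n: int) -> List[List[int]]:
    """Rows of the n x n Sylvester-Hadamard matrix (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def spread(bits: Sequence[int], code: Sequence[int]) -> List[int]:
    """Direct-sequence spreading: each +/-1 data bit becomes len(code) chips."""
    return [b * c for b in bits for c in code]


def despread(signal: Sequence[float], code: Sequence[int]) -> List[int]:
    """Correlate each chip block with the user's code and take the sign."""
    n = len(code)
    bits = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        bits.append(1 if corr > 0 else -1)
    return bits


if __name__ == "__main__":
    codes = walsh_codes(4)
    user_a, user_b = [1, -1, 1], [-1, -1, 1]
    # Both users transmit at once; the channel simply adds their chip streams.
    channel = [x + y for x, y in zip(spread(user_a, codes[1]), spread(user_b, codes[2]))]
    print(despread(channel, codes[1]), despread(channel, codes[2]))  # [1, -1, 1] [-1, -1, 1]
```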
Article
This article presents the use of analogical learning, in particular case-based reasoning, as a tool for automatically generating prosody from text that has been automatically tagged with prosodic attributes. It is a corpus-based method for the quantitative modeling of...
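Case-based reasoning typically retrieves the stored cases most similar to a new input and reuses their solutions. This is a minimal sketch of those two steps for prosody, assuming each case pairs numerically encoded prosodic attributes with an F0 contour and using Euclidean distance with point-by-point averaging; the attribute set and similarity measure used in the article are not given in this summary.

```python
import math
from typing import List, Sequence, Tuple

# A case pairs a vector of numerically encoded prosodic attributes with an F0 contour in Hz.
Case = Tuple[Tuple[float, ...], List[float]]


def retrieve(case_base: Sequence[Case], query: Sequence[float], k: int = 1) -> List[Case]:
    """Retrieve step: the k stored cases whose attributes are closest to the query."""
    return sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]


def reuse(cases: Sequence[Case]) -> List[float]:
    """Reuse step: average the retrieved F0 contours point by point (equal lengths assumed)."""
    return [sum(points) / len(cases) for points in zip(*(c[1] for c in cases))]


if __name__ == "__main__":
    # Hypothetical encoded attributes paired with toy F0 contours.
    case_base: List[Case] = [
        ((1.0, 0.0), [100.0, 120.0, 110.0]),
        ((1.0, 1.0), [110.0, 130.0, 100.0]),
        ((5.0, 5.0), [200.0, 210.0, 190.0]),
    ]
    print(reuse(retrieve(case_base, (1.0, 0.4), k=2)))  # [105.0, 125.0, 105.0]
```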
Article
This work presents an automatic acronym transcriber intended to increase the quality of the synthesis produced by a text-to-speech converter when acronyms appear in the text. Acronyms are transcribed with a decision tree (C4.5 algorithm) built from the training data. The work presents...
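C4.5 grows a decision tree by repeatedly splitting on the attribute with the highest gain ratio (information gain divided by split information). This sketch computes that criterion for categorical attributes; the acronym features and labels are hypothetical, and tree growth and pruning are omitted.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: Sequence[Dict[str, Hashable]], labels: Sequence[Hashable],
               attr: str) -> float:
    """C4.5 split criterion for a categorical attribute: information gain / split info."""
    n = len(labels)
    branches: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches.values())
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical acronym features; label: read as a word vs. spelled letter by letter.
    rows = [{"has_vowel": "yes", "length": "short"}, {"has_vowel": "yes", "length": "long"},
            {"has_vowel": "no", "length": "short"}, {"has_vowel": "no", "length": "long"}]
    labels = ["word", "word", "spell", "spell"]
    for attr in ("has_vowel", "length"):
        print(attr, gain_ratio(rows, labels, attr))  # has_vowel 1.0, then length 0.0
```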

Projects (4)
Project
Adaptive structural learning of neural networks
Archived project
Text-to-Speech startup in Cambridge, UK.