Pierre Lanchantin
University of Cambridge · Department of Engineering

About

58 Publications
10,770 Reads
1,081 Citations

Publications (58)
Article
Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder. One of the main causes of degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer, without the need for ad-hoc tuning of model parameters. The model is not...
Conference Paper
Full-text available
The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis systems. In order to improve the vocoder quality, it is necessary to reconstruct as much of the perceived components of the speech signal as possible. In this paper, we first show that the noise component is currently not accurately modelled in the wid...
Conference Paper
Full-text available
We describe the development of our speech-to-text transcription systems for the 2015 Multi-Genre Broadcast (MGB) challenge. Key features of the systems are: a segmentation system based on deep neural networks (DNNs); the use of HTK 3.5 for building DNN-based hybrid and tandem acoustic models and the use of these models in a joint decoding framework;...
Conference Paper
Full-text available
We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in the transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range from a few minutes long to more than six hours. Lightly supervised decoding is perf...
Conference Paper
Full-text available
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non-speech segmenter. A newly dev...
Conference Paper
Full-text available
This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU–2015, an evaluation focused on speech recognition, speaker diarization, and “lightly supervised” alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks of BBC TV output across four channels, resulting in about 1,600 hours of broadcast aud...
Article
Full-text available
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non-speech segmenter. A newly dev...
Conference Paper
Full-text available
This is a placeholder record for research data underpinning the conference publication "Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition" (Interspeech 2015 conference in Dresden, http://interspeech2015.org/). This record will be updated after publication.
Chapter
Full-text available
The absence of alternatives/variants is a dramatic limitation of text-to-speech (TTS) synthesis compared to the variety of human speech. This chapter introduces the use of speech alternatives/variants in order to improve TTS synthesis systems. Speech alternatives denote the variety of possibilities that a speaker has to pronounce a sentence, depen...
Article
Full-text available
The assignment of prosodic events (accent and phrasing) from the text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in text-to-speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a...
Conference Paper
This paper describes a novel approach for the speaker adaptation of statistical parametric speech synthesis systems based on the interpolation of a set of average voice models (AVM). Recent results have shown that the quality/naturalness of adapted voices depends on the distance from the average voice model used for speaker adaptation. This suggest...
Conference Paper
Full-text available
This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses automatically derived decoding hypotheses using a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality o...
Article
Full-text available
This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses automatically derived decoding hypotheses using a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality o...
Conference Paper
Full-text available
This paper describes some recent results of our collaborative work on developing a speech recognition system for the automatic transcription of media archives from the British Broadcasting Corporation (BBC). The material includes a wide diversity of shows with their associated metadata. The latter are highly diverse in terms of completeness, reliab...
Conference Paper
Full-text available
We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel techn...
Article
Full-text available
This paper addresses the use of speech alternatives to enrich speech synthesis systems. Speech alternatives denote the variety of strategies that a speaker can use to pronounce a sentence, depending on pragmatic constraints, speaking style, and specific strategies of the speaker. During the training, symbolic and acoustic characteristics of a uni...
Data
Full-text available
This paper addresses the use of speech alternatives to enrich speech synthesis systems. Speech alternatives denote the variety of strategies that a speaker can use to pronounce a sentence, depending on pragmatic constraints, speaking style, and specific strategies of the speaker. During the training, symbolic and acoustic characteristics of a uni...
Article
In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants–Fant (LF) model and Gaussian noise. Using the LF model, the base approach used in this presen...
Conference Paper
Full-text available
IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be a...
Conference Paper
Full-text available
This paper assesses the ability of an HMM-based speech synthesis system to model the speech characteristics of various speaking styles. A discrete/continuous HMM is presented to model the symbolic and acoustic speech characteristics of a speaking style. The proposed model is used to model the average characteristics of a speaking style tha...
Conference Paper
Full-text available
In this paper, a method for prosodic break modelling based on segmental-HMMs and Dempster-Shafer fusion for speech synthesis is presented, and the relative importance of linguistic and metric constraints in prosodic break modelling is assessed. A context-dependent segmental-HMM is used to explicitly model the linguistic and the metric constrai...
Conference Paper
Full-text available
Spectral voice conversion is usually performed using a single model selected in order to represent a tradeoff between goodness of fit and complexity. Recently, we proposed a new method for spectral voice conversion, called Dynamic Model Selection (DMS), in which we assumed that the model topology may change over time, depending on the source acoust...
Article
Hidden Markov chains (HMC) are a very powerful tool in hidden data restoration and are currently used to solve a wide range of problems. However, when these data are not stationary, estimating the parameters, which are required for unsupervised processing, poses a problem. Moreover, taking into account correlated non-Gaussian noise is difficult wit...
Article
Statistical methods for voice conversion are usually based on a single model selected in order to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion, called Dynamic Model Selec...
Conference Paper
Full-text available
This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise. This glottal source is first estimated and th...
Article
Full-text available
A major drawback of current Hidden Markov Model (HMM)-based speech synthesis is the monotony of the generated speech, which is closely related to the monotony of the generated prosody. Complementary to model-oriented approaches that aim to increase the prosodic variability by reducing the "over-smoothing" effect, this paper presents a linguistic-or...
Chapter
Full-text available
The main mission of the Institut de recherche et coordination acoustique/musique (Ircam) is musical creation, and artistic creation in general, which notably includes the performing arts such as theatre and film. The institute has long experience in the analysis and synthesis of sound, and of speech in particular. Indeed...
Article
The hidden Markov chain (HMC) model is a pair of random sequences (X,Y), in which X is an unobservable Markov chain and Y is its observable noisy version. Classically, the distribution p(y|x) is simple enough to ensure the Markovianity of p(x|y), which enables the use of different Bayesian restoration techniques. The HMC model has recently been exten...
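To illustrate the kind of Bayesian restoration that the classical HMC permits, here is a minimal Python sketch of maximum posterior mode (MPM) restoration with the forward-backward algorithm. It assumes the classical case only: a discrete chain with known prior and transition probabilities, and Gaussian noise with p(y|x) factorising over sites, not the extended models studied in this work. The function names `mpm_restore`, `gaussian_pdf` and `normalise` are hypothetical.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def normalise(v):
    """Rescale a list of non-negative weights so that it sums to one."""
    s = sum(v)
    return [x / s for x in v]


def mpm_restore(y, prior, trans, means, stds):
    """Hypothetical MPM restoration for a classical HMC with Gaussian noise.

    y            : observed sequence y_1..y_N
    prior        : prior[i] = p(x_1 = i)
    trans        : trans[i][j] = p(x_{t+1} = j | x_t = i)
    means, stds  : parameters of p(y_t | x_t = i) = N(means[i], stds[i]^2)
    Returns (MPM class estimates, posterior marginals p(x_t | y)).
    """
    n, k = len(y), len(prior)
    emis = [[gaussian_pdf(yt, means[i], stds[i]) for i in range(k)] for yt in y]

    # Forward pass: alpha[t][i] = p(x_t = i | y_1..y_t), normalised at each step.
    alpha = [normalise([prior[i] * emis[0][i] for i in range(k)])]
    for t in range(1, n):
        pred = [sum(alpha[t - 1][i] * trans[i][j] for i in range(k)) for j in range(k)]
        alpha.append(normalise([pred[j] * emis[t][j] for j in range(k)]))

    # Backward pass: beta[t][i] proportional to p(y_{t+1}..y_N | x_t = i).
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = normalise([
            sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k))
            for i in range(k)
        ])

    # Posterior marginals, then MPM decision: most probable class at each site.
    post = [normalise([alpha[t][i] * beta[t][i] for i in range(k)]) for t in range(n)]
    states = [max(range(k), key=p.__getitem__) for p in post]
    return states, post
```

For example, `mpm_restore([0.1, -0.2, 3.1, 2.9, 0.0], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.0, 3.0], [1.0, 1.0])` restores a two-class sequence. Normalising at every step is a standard way to avoid numerical underflow on long chains.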
Conference Paper
Full-text available
Speech synthesis by unit selection requires the segmentation of a large, high-quality, single-speaker recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (HMM), can be optimised for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. Firstly, using no text transcriptio...
Article
Full-text available
...a range of tools for creating, accessing and synchronising the data of a speech corpus, but they are rarely integrated into a single platform. In this article, we propose IrcamCorpusTools, an open and easily extensible platform for the creation, analysis and exploitation of speech corpora. It allows...
Article
Full-text available
This paper addresses the problem of unsupervised Bayesian hidden Markov chain restoration. When the hidden chain is stationary, the classical "Hidden Markov Chain" (HMC) model is quite efficient, and associated unsupervised Bayesian restoration methods using the "Expectation-Maximization" (EM) algorithm work well. When the hidden chain is non stati...
Conference Paper
Full-text available
The hidden Markov chains (HMC), which are widely used in different data restoration problems, have recently been generalized to pairwise partially Markov chains (PPMC), in which the distribution of the observed chain conditional on the hidden one is of any form. In particular, long-memory noise cases can be dealt with. The aim of this paper is to p...
Article
Full-text available
Triplet Markov chains (TMC) generalize pairwise Markov chains (PMC), which in turn generalize hidden Markov chains (HMC). Moreover, in an HMC the posterior distribution of the hidden process, which is Markov, can be viewed as a Dempster combination of its prior distribution p with a probability q defined from...
Article
Full-text available
We define a new unsupervised statistical segmentation tool based on a fuzzy hidden Markov tree model. Our fuzzy model combines the probabilistic uncertainty of the observed data with discrete and continuous thematic classes that represent the imprecision of the hidden data. The Bayesian segmentation technique...
Article
Full-text available
This work deals with unsupervised Bayesian hidden Markov chain restoration extended to the non-stationary case. Unsupervised restoration based on "Expectation-Maximization" (EM) or "Stochastic EM" (SEM) estimates considering the "Hidden Markov Chain" (HMC) model is quite efficient when the hidden chain is stationary. However, when the lat...
Article
Full-text available
Hidden Markov fields (HMF) are widely used in image processing. In such models, the hidden random field of interest $X = (X_s)_{s \in S}$ is a Markov field, and the distribution of the observed random field $Y = (Y_s)_{s \in S}$ (conditional on $X$) is given by $p(y \mid x) = \prod_{s \in S} p(y_s \mid x_s)$. The posterior distribution $p(x \mid y)$ is then a Markov distribution, which a...
Article
Full-text available
The triplet Markov chains (TMC) generalize the pairwise Markov chains (PMC), and the latter generalize the hidden Markov chains (HMC). Moreover, in an HMC the posterior distribution of the hidden process can be viewed as a particular case of the so-called "Dempster's combination rule" of its prior Markov distribution p with a probability q defined...
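Dempster's combination rule, mentioned in the abstract above, can be sketched in Python for mass functions on a finite frame. This is a minimal illustration, assuming masses are given as a dict mapping frozensets (focal sets) to weights; the function name `dempster_combine` is hypothetical. When both inputs put mass only on singletons (ordinary probabilities), the rule reduces to a normalised product, so for a single site it reproduces Bayes' rule if q is taken proportional to the likelihood p(y|x).

```python
from itertools import product


def dempster_combine(m1, m2):
    """Hypothetical helper: Dempster's rule for two mass functions.

    m1, m2 map frozenset focal elements to masses summing to one.
    Mass products landing on an empty intersection count as conflict
    and are removed by renormalisation.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if abs(1.0 - conflict) < 1e-12:
        raise ValueError("total conflict: combination is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

For instance, combining a prior `{frozenset({0}): 0.7, frozenset({1}): 0.3}` with `q = {frozenset({0}): 0.25, frozenset({1}): 0.75}` gives the same result as Bayes' rule with likelihoods 0.2 and 0.6, while combining with the vacuous mass `{frozenset({0, 1}): 1.0}` leaves the prior unchanged.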
Article
Full-text available
Triplet Markov chains (TMC) generalize pairwise Markov chains (PMC), which in turn generalize hidden Markov chains (HMC). Moreover, in an HMC the posterior distribution of the hidden process, which is Markov, can be viewed as a Dempster-Shafer fusion (DS fusion) of its distribution p with a probability q defined from...
