Figure 1 - uploaded by Hieu-Thi Luong
Source publication
Speech synthesis is the technology of generating speech from an input. While the term is commonly used to refer to text-to-speech (TTS), there are many types of speech synthesis systems that handle different input interfaces, such as voice conversion (VC), which converts speech of a source speaker to the voice of a target, or video-to-speech, which...
Similar publications
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech with a target voice either from a text input or a reference utterance of an arbitrary source speaker. By using a multi-speaker speech corpus to train all requisite encoders and decoders in the initial training stage, our system can clone unseen voices using untr...
Neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways. However, the exposure bias problem, which arises from the mismatch between the training and inference processes in autoregressive models, remains an issue. It often leads to performance degradation in the face of out-of-domain test data. To address this pr...