Figure 6 - available via license: Creative Commons Attribution-ShareAlike 4.0 International
t-SNE visualization of speaker embeddings for actual and generated male-speaker samples from the VCTK dataset. The left panel shows the actual embedding space of all male speakers in the dataset. The right panel shows the embedding space of the train (green), validation (red; cluster IDs 7, 26, 42), and test (black; cluster IDs 0, 6, 9, 12, 13) samples.
Source publication
The style of speech varies from person to person: every speaker exhibits an individual style determined by language, geography, culture, and other factors. Style is best captured by the prosody of a signal. High-quality multi-speaker speech synthesis that accounts for prosody in a few-shot manner is an area of active r...