Huifang Wang’s scientific contributions


Publications (1)


Figure 1. Voice by Text Domain Interaction (Intelligibility)
Figure 2. Voice by Factor Interaction (Acceptability)
Intelligibility and Acceptability of Short Phrases Generated by Embedded Text-To-Speech Engines
  • Conference Paper
  • Full-text available

January 2001 · 211 Reads · 4 Citations

Huifang Wang

We investigated the intelligibility and acceptability of three formant text-to-speech (TTS) engines suitable for use in devices with embedded speech recognition capability. Listeners transcribed and rated recordings of short phrases from four text domains (U.S. currency, dates, digits and proper names) produced by three commercially-available embedded TTS engines and a human speaker. The human voice received the best intelligibility and acceptability scores, and one of the TTS engines had superior intelligibility and acceptability relative to the two others. The results suggest that the ability to accurately produce names (the least constrained and least accurately transcribed text domain) was the system characteristic that best discriminated among the engines. The intelligibility and acceptability scores were generally consistent. Listeners transcribed shorter phrases more accurately than longer phrases, but acceptability ratings were independent of phrase length.

Citations (1)


... For the voice samples, I went through all the recordings of synthetic voices I had worked with from 2000 to 2017 (Lewis, 2001a-b, 2002, 2004; Polkosky & Lewis, 2002a-c). Previous research has shown that listeners make their quality judgements quickly (Polkosky & Lewis, 2002c; Wang & Lewis, 2001), so the samples were edited to have a length of about 30 seconds. In addition to the synthetic voices, I also put together three samples of professional human voice talents who had either recorded segments for an interactive voice response system or had provided recordings for the production of a synthetic voice. ...

Reference:

Investigating MOS-X Ratings of Synthetic and Human Voices
Intelligibility and Acceptability of Short Phrases Generated by Embedded Text-To-Speech Engines