January 2001
We investigated the intelligibility and acceptability of three formant text-to-speech (TTS) engines suitable for use in devices with embedded speech recognition capability. Listeners transcribed and rated recordings of short phrases from four text domains (U.S. currency, dates, digits, and proper names) produced by three commercially available embedded TTS engines and by a human speaker. The human voice received the best intelligibility and acceptability scores, and one of the TTS engines was superior to the other two on both measures. The results suggest that the ability to accurately produce names (the least constrained and least accurately transcribed text domain) was the system characteristic that best discriminated among the engines. The intelligibility and acceptability scores were generally consistent with each other. Listeners transcribed shorter phrases more accurately than longer phrases, but acceptability ratings were independent of phrase length.