ABSTRACT: The Speech Recognition Group at IBM Research in Yorktown Heights has designed a real-time, isolated-utterance speech recognizer for natural language with a 5,000-word vocabulary. The system is built on an IBM Personal Computer (PC) AT and two IBM Signal Processors realized in VLSI technology. Enrolling a new user takes approximately 20 minutes. The basic vocabulary consists of the most common words in several document collections, such as office memoranda and business letters. The system supports spelling and interactive personalization to augment this vocabulary. The signal processing, vector quantization, and acoustic matching algorithms run on the IBM Signal Processors, which fit inside the PC AT chassis. The PC AT controls the signal processors and implements the decoder's stack search, the language model, and the application-specific interface. Because the design is modular, it can be expanded to a 20,000-word vocabulary by adding two more IBM Signal Processors housed in a PC Expansion Unit.
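The abstract names vector quantization as one of the front-end algorithms running on the signal processors. As a rough illustration only, the sketch below shows generic nearest-prototype vector quantization, which turns each acoustic feature frame into a discrete label for later acoustic matching. The toy codebook, the two-dimensional feature vectors, the squared-Euclidean distance, and the function names `nearest_prototype` and `quantize` are hypothetical assumptions, not details of the IBM implementation.

```python
import math
from typing import List, Sequence

# Hypothetical feature vector type; real front ends produce
# higher-dimensional acoustic features per frame.
Vector = Sequence[float]


def nearest_prototype(frame: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook prototype closest to `frame`.

    Uses squared Euclidean distance (an assumed distortion measure).
    On a tie, the lowest index wins because the comparison is strict.
    """
    best_index, best_dist = -1, math.inf
    for i, proto in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, proto))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map a sequence of feature frames to a sequence of discrete labels."""
    return [nearest_prototype(f, codebook) for f in frames]


if __name__ == "__main__":
    # Hypothetical toy codebook with three prototypes.
    codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 0.0)]
    frames = [(0.1, -0.2), (0.9, 1.2), (4.0, 0.5), (0.6, 0.6)]
    print(quantize(frames, codebook))  # prints [0, 1, 2, 1]
```

In a recognizer of this kind, the resulting label string, rather than the raw feature vectors, would be what the acoustic matcher scores against word models; the codebook size trades label resolution against computation.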
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), May 1986.