Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools
ABSTRACT This paper reports the design, implementation, and evaluation of a high-performance, natural, speaker-independent Arabic continuous speech recognition system. It explores the usefulness of a newly developed speech corpus that is phonetically rich and balanced, presenting a competitive approach to Arabic ASR development compared with state-of-the-art Arabic ASR research. The developed Arabic ASR mainly used the Carnegie Mellon University (CMU) Sphinx tools together with the Cambridge HTK tools. To extract features from speech signals, the Mel-Frequency Cepstral Coefficients (MFCC) technique was applied, producing a set of feature vectors. The system then uses five-state Hidden Markov Models (HMM) with three emitting states for triphone acoustic modeling. The emission probability distribution of the states was best modeled using continuous-density mixtures of 16 Gaussians. The state distributions were tied to 500 senones. The language model contains uni-grams, bi-grams, and tri-grams. The system was trained on 7.0 hours of a phonetically rich and balanced Arabic speech corpus and tested on another hour. For similar speakers but different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers but similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a WER of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a WER of 15.59% and 14.44% with and without diacritical marks, respectively.
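In the results above, accuracy and WER for a given condition add up to more than 100% (for example, 92.67% and 11.27%). One plausible reading, which is an assumption here and is not stated in the abstract, is that "word recognition accuracy" is the percentage of reference words recognized correctly, which ignores inserted words, while WER counts substitutions, deletions, and insertions. The Python sketch below illustrates the two measures under that assumption; the function names and the toy word sequences are hypothetical and do not come from the paper.

```python
def edit_ops(ref, hyp):
    """Count (substitutions, deletions, insertions) in a minimum-edit
    alignment of the hypothesis word list against the reference word list."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)        # match, no edit
            else:
                diag = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)    # deletion
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)     # insertion
            cost[i][j] = min(diag, dele, ins)
    _, subs, dels, inss = cost[n][m]
    return subs, dels, inss


def score(ref, hyp):
    """Return (percent_correct, wer), both as percentages of len(ref)."""
    subs, dels, inss = edit_ops(ref, hyp)
    n = len(ref)
    percent_correct = 100.0 * (n - subs - dels) / n   # insertions ignored
    wer = 100.0 * (subs + dels + inss) / n            # insertions counted
    return percent_correct, wer


# Toy example: one substituted word and one inserted word.
ref = "a b c d".split()
hyp = "a x c d e".split()
print(score(ref, hyp))
```

With one substitution and one insertion against four reference words, the two figures sum to more than 100%, matching the pattern in the reported results.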
ABSTRACT: This paper describes the development of an Arabic broadcast news transcription system. The presented system is a speaker-independent, large-vocabulary, natural Arabic speech recognition system, and it is intended to be a test bed for further research into the open-ended problem of achieving natural language man-machine conversation. The system addresses a number of challenging issues pertaining to the Arabic language, e.g. generation of fully vocalized transcription and a rule-based spelling dictionary. The developed Arabic speech recognition system is based on the Carnegie Mellon University Sphinx tools. The Cambridge HTK tools were also utilized at various testing stages. The system was trained on 7.0 hours of a 7.5-hour Arabic broadcast news corpus and tested on the remaining half hour. The corpus focuses on economics and sports news. At this experimental stage, the Arabic news transcription system uses five-state HMMs for triphone acoustic models, with 8 and 16 Gaussian mixture distributions. The state distributions were tied to about 1680 senones. The language model uses both bi-grams and tri-grams. The test set consisted of 400 utterances containing 3585 words. The Word Error Rate (WER) was initially 10.14%. After extensive testing and tuning of the recognition parameters, the WER was reduced to about 8.61% for non-vocalized text transcription. International Journal of Speech Technology 10(4):183-195.
Conference Proceeding: Human computer interaction using isolated-words speech recognition technology
ABSTRACT: This research paper aims to develop an isolated-word automatic speech recognition (IWASR) system based on vector quantization (VQ). The system receives, analyzes, searches, and matches an input speech signal against the trained set of speech signals stored in the database/codebook, and returns the matching results to users. IWASR is meant to assist customers calling a university's telephone operator by responding to their enquiries conveniently through natural speech. Callers are assisted in selecting the language, the faculty, and the name of the staff member they wish to contact. To extract features from speech signals, the Mel-frequency cepstral coefficients (MFCC) algorithm was applied. Vector quantization was then applied to all feature vectors generated by the MFCC stage. A codebook resulted from training the initial VQ codebook. Experimental results showed that the recognition rate improved as the codebook size increased, and that a codebook of 81 feature vectors gave a recognition rate exceeding 85%. Intelligent and Advanced Systems, 2007. ICIAS 2007. International Conference on; 12/2007
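The matching step described in this abstract can be illustrated with a minimal Python sketch. It assumes one codebook per vocabulary word and picks the word whose codebook gives the lowest average distance to the input feature vectors; this decision rule is a common VQ approach but is an assumption here, since the abstract does not specify it. The function names, the two-dimensional vectors standing in for MFCC features, and the toy codebooks are all hypothetical.

```python
import math


def nearest_distortion(vec, codebook):
    """Euclidean distance from one feature vector to its closest codeword."""
    return min(math.dist(vec, codeword) for codeword in codebook)


def average_distortion(features, codebook):
    """Mean nearest-codeword distance over all feature vectors of an utterance."""
    return sum(nearest_distortion(v, codebook) for v in features) / len(features)


def recognize(features, codebooks):
    """Return the label whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda word: average_distortion(features, codebooks[word]))


# Toy codebooks: two codewords per word, 2-D vectors in place of real MFCCs.
codebooks = {
    "yes": [(0.0, 0.0), (1.0, 1.0)],
    "no": [(5.0, 5.0), (6.0, 6.0)],
}
utterance = [(0.1, 0.2), (0.9, 1.1)]
print(recognize(utterance, codebooks))
```

A larger codebook gives each word more codewords to cover its feature space, which is consistent with the reported trend of recognition rate improving with codebook size.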
Conference Proceeding: Generation of Arabic phonetic dictionaries for speech recognition
ABSTRACT: Phonetic dictionaries are essential components of large-vocabulary, natural language, speaker-independent speech recognition systems. This paper presents a rule-based technique to generate Arabic phonetic dictionaries for a large-vocabulary speech recognition system. The system used classic Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic, and morphologically driven rules. The paper explains these rules in detail and gives their formal mathematical presentation. The rules were used to generate a dictionary for a 5.4-hour corpus of broadcast news. The phonetic dictionary contains 23,841 definitions corresponding to about 14,232 words. The generated dictionary was evaluated on an actual Arabic speech recognition system. The pronunciation rules and the phone set were validated by test cases. The Arabic speech recognition system achieves a word error rate of 11.71% for fully diacritized transcription of about 1.1 hours of Arabic broadcast news. Innovations in Information Technology, 2008. IIT 2008. International Conference on; 01/2009