Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools
Fac. of Comput. Sci. & Inf. Technol., Univ. of Malaya, Kuala Lumpur, Malaysia
DOI: 10.1109/ICCCE.2010.5556829 Conference: Computer and Communication Engineering (ICCCE), 2010 International Conference on
This paper reports the design, implementation, and evaluation of a high-performance, natural, speaker-independent Arabic continuous speech recognition system. It aims to explore the usefulness and success of a newly developed speech corpus, which is phonetically rich and balanced, presenting a competitive approach to developing an Arabic ASR system compared with state-of-the-art Arabic ASR research. The developed Arabic ASR system mainly used the Carnegie Mellon University (CMU) Sphinx tools together with the Cambridge HTK tools. To extract features from the speech signals, the Mel-Frequency Cepstral Coefficients (MFCC) technique was applied, producing a set of feature vectors. Subsequently, the system uses five-state Hidden Markov Models (HMM) with three emitting states for tri-phone acoustic modeling. The emission probability distributions of the states performed best with continuous-density 16-Gaussian mixture distributions, and the state distributions were tied to 500 senones. The language model contains uni-grams, bi-grams, and tri-grams. The system was trained on 7.0 hours of the phonetically rich and balanced Arabic speech corpus and tested on another hour. For similar speakers but different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers but similar sentences, it obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, it obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
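The Word Error Rate figures quoted above follow the standard definition: the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch of that metric, purely illustrative and not the paper's own scoring code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that word recognition accuracy and WER need not sum to 100%, since accuracy conventionally ignores insertions while WER counts them; this is consistent with the paired figures reported above.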
Available from: Mohamed Tahar Ben Othman
- "We have invented a speech recognition engine for Arabic spoken phrases. It is based on the recently developed Google Speech API."
ABSTRACT: Quran memorization and learning are among the best acts of worship that bring one close to Almighty God, because of their numerous benefits as stated in the Quran and Sunnah. We have developed a virtual learning system (Electronic Miqra'ah). Scientists can remotely supervise the registered students. Students of different ages can register from anywhere in the world, provided they have an Internet connection. Students can interact with the scientist in real time, so the scientist can help them memorize (Tahfeez), guide them in error correction, and give them lectures or lessons through virtual learning rooms. The targeted user groups include sighted people, blind people, manually disabled people, and illiterate people. We have developed this system so that it takes commands via voice in addition to normal inputs such as the mouse and keyboard. Users can dictate commands to the system orally, and the system recognizes the spoken phrases and executes them. The system administrators create several virtual learning rooms and register the licensed scientists. Administrators prepare a daily schedule for each room. Students can register to any of these rooms by pronouncing its name. Each student is allocated a portion of time in which he/she can interact directly by voice with the scientist. Other students can listen to the current student's recitation and to the error corrections, guidance, or lessons from the scientists.
IEEE International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Al Madinah, Saudi Arabia, 22-25, Dec 2013; 12/2013
ABSTRACT: This paper describes the preparation, recording, analysis, and evaluation of a new speech corpus for Modern Standard Arabic (MSA). The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) native Arabic speakers from 11 different Arab countries representing three major regions (Levant, Gulf, and Africa). Three hundred and sixty-seven sentences are considered phonetically rich and balanced, and are used for training Arabic Automatic Speech Recognition (ASR) systems. The rich characteristic means the corpus must contain all phonemes of the Arabic language, whereas the balanced characteristic means it must preserve the phonetic distribution of the Arabic language. The remaining 48 sentences are created for testing purposes; they are mostly foreign to the training sentences, with hardly any words in common. In order to evaluate the speech corpus, Arabic ASR systems were developed using the Carnegie Mellon University (CMU) Sphinx 3 tools at both the training and testing/decoding levels. The speech engine uses 3-emitting-state Hidden Markov Models (HMM) for tri-phone-based acoustic models. Based on experimental analysis of about 8 h of training speech data, the acoustic model performed best with a continuous observation probability model of 16 Gaussian mixture distributions, and the state distributions were tied to 500 senones. The language model contains uni-grams, bi-grams, and tri-grams. For the same speakers with different sentences, the Arabic ASR systems obtained an average Word Error Rate (WER) of 9.70%. For different speakers with the same sentences, they obtained an average WER of 4.58%, whereas for different speakers with different sentences, they obtained an average WER of 12.39%. © 2011 Springer Science+Business Media B.V.
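The "rich" and "balanced" properties described above can be stated operationally: richness means every phoneme of the inventory occurs in the corpus, and balance means the corpus's phoneme frequencies stay close to the language's reference distribution. A minimal sketch of such a check, assuming phonemic transcriptions are available; the function names and the idea of a single max-deviation score are illustrative, not taken from the paper:

```python
from collections import Counter

def is_rich(corpus_phonemes, inventory):
    """True if every phoneme of the language inventory occurs in the corpus."""
    return set(inventory) <= set(corpus_phonemes)

def balance_deviation(corpus_phonemes, reference_freqs):
    """Largest absolute gap between corpus relative phoneme frequencies
    and the reference language frequencies (0.0 = perfectly balanced)."""
    counts = Counter(corpus_phonemes)
    total = sum(counts.values())
    return max(
        abs(counts[p] / total - reference_freqs.get(p, 0.0))
        for p in set(counts) | set(reference_freqs)
    )
```

A corpus designer would then select sentences that keep `balance_deviation` below some tolerance while `is_rich` holds.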
12/2011; 46(4):1-34. DOI:10.1007/s10579-011-9166-8
ABSTRACT: The Gaussian mixture model (GMM) is a conventional method for speech recognition, known for its effectiveness and scalability in speech modeling. This paper presents automatic recognition of spoken Arabic digits based on a GMM classifier and the leading feature-extraction approach for speech recognition, Delta-Delta Mel-Frequency Cepstral Coefficients (DDMFCC). With the obtained parameters, the experiments achieve 99.31% correct digit recognition on the dataset, which is very satisfactory compared to previous work on spoken Arabic digit speech recognition.
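The delta and delta-delta (DDMFCC) features mentioned above are conventionally computed as a regression over neighboring MFCC frames, with delta-deltas obtained by applying the same formula to the deltas. A minimal pure-Python sketch of the standard delta regression; the window size N=2 is a common default and not necessarily the setting used in this paper:

```python
def deltas(frames, N=2):
    """Delta features via the standard regression formula:
    delta_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2).
    `frames` is a list of feature vectors (lists of floats), one per time step.
    Apply twice to obtain delta-delta (acceleration) features."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(len(frames)):
        vec = []
        for d in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                # clamp indices at the utterance boundaries
                plus = frames[min(t + n, len(frames) - 1)][d]
                minus = frames[max(t - n, 0)][d]
                num += n * (plus - minus)
            vec.append(num / denom)
        out.append(vec)
    return out
```

A DDMFCC vector is then typically the concatenation of the static MFCCs, their deltas, and the deltas of the deltas for each frame.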
Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2012 IEEE Conference on; 01/2012