Conference Paper

Natural speaker-independent Arabic speech recognition system based on Hidden Markov Models using Sphinx tools

Fac. of Comput. Sci. & Inf. Technol., Univ. of Malaya, Kuala Lumpur, Malaysia
DOI: 10.1109/ICCCE.2010.5556829 Conference: Computer and Communication Engineering (ICCCE), 2010 International Conference on
Source: IEEE Xplore

ABSTRACT This paper reports the design, implementation, and evaluation of a high-performance, natural, speaker-independent Arabic continuous speech recognition system. It explores the usefulness of a newly developed phonetically rich and balanced speech corpus, presenting a competitive approach to building an Arabic ASR system compared with state-of-the-art Arabic ASR research. The developed Arabic ASR system mainly used the Carnegie Mellon University (CMU) Sphinx tools together with the Cambridge HTK tools. To extract features from the speech signals, the Mel-Frequency Cepstral Coefficients (MFCC) technique was applied, producing a set of feature vectors. The system then uses five-state Hidden Markov Models (HMMs) with three emitting states for tri-phone acoustic modeling. The emission probability distributions of the states were best modeled with continuous-density mixtures of 16 Gaussians, and the state distributions were tied to 500 senones. The language model contains uni-grams, bi-grams, and tri-grams. The system was trained on 7.0 hours of the phonetically rich and balanced Arabic speech corpus and tested on another hour. For similar speakers but different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers but similar sentences, it obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, it obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
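
As a rough illustration of the front end described in the abstract, the following is a minimal sketch of MFCC feature extraction with delta and delta-delta coefficients. It uses librosa rather than the Sphinx/HTK feature extractors the authors used, and the 16 kHz sampling rate, 25 ms analysis window, 10 ms frame shift, and 13 base coefficients are assumed typical defaults, not values reported in the paper.

# Minimal MFCC front-end sketch (librosa), not the authors' Sphinx/HTK pipeline.
# Window/shift sizes and coefficient counts below are assumed defaults.
import numpy as np
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=sr)            # load and resample to 16 kHz
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr),                       # 25 ms analysis window
        hop_length=int(0.010 * sr),                  # 10 ms frame shift
    )
    delta = librosa.feature.delta(mfcc)              # first-order dynamics
    delta2 = librosa.feature.delta(mfcc, order=2)    # second-order dynamics
    # One feature vector per frame (here 39-dimensional), as is conventional
    # for HMM acoustic models of the kind described above.
    return np.vstack([mfcc, delta, delta2]).T

features = extract_mfcc("utterance.wav")             # hypothetical input file
print(features.shape)                                # (num_frames, 39)

On the modeling side, the reported acoustic-model size roughly corresponds to setting the number of tied states (senones) to 500 and the final number of Gaussian densities per state to 16 in the SphinxTrain configuration (for example, $CFG_N_TIED_STATES and $CFG_FINAL_NUM_DENSITIES in sphinx_train.cfg); the authors' exact training setup is not given here.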

Related publications:
  • ABSTRACT: There are many Islamic websites and applications on the Internet dedicated to educational services for the Holy Quran and Its Sciences (Quran recitations, interpretations, etc.). Unfortunately, blind and handicapped people cannot use these services: they cannot use the keyboard and the mouse, and the ability to read and write is essential to benefit from them. In this paper, we present an educational environment that allows these people to take full advantage of the scientific materials. This is done by interacting with the system through voice commands, speaking directly without the need to write or to use the mouse. The Google Speech API is used for the speech recognition, with preprocessing and post-processing phases to improve accuracy. For blind people, the responses to these commands are played back through the audio device instead of being displayed as text on the screen; the text is also displayed on the screen to help other people make use of the system. (A sketch of this voice-command flow is given after this list.)
    IEEE International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Al Madinah, Saudi Arabia; 12/2013
  • ABSTRACT: Quran memorization and learning are among the best acts of worship that bring one closer to Almighty God, because of their numerous benefits as stated in the Quran and Sunnah. We have developed a virtual learning system (Electronic Miqra'ah) in which scholars can remotely supervise the registered students. Students of different ages can register from anywhere in the world, given that they have an Internet connection, and can interact with the scholar in real time so that the scholar can help them memorize (Tahfeez), guide them in error correction, and give them lectures or lessons through virtual learning rooms. The targeted groups of users include sighted people, blind people, people with manual disabilities, and illiterate people. We have developed the system so that it takes commands via voice in addition to the normal inputs such as mouse and keyboard: users can dictate commands orally, and the system recognizes the spoken phrases and executes them. The system administrators create several virtual learning rooms, register the licensed scholars, and prepare a daily schedule for each room. Students can register to any of these rooms by pronouncing its name. Each student is allocated a portion of time during which he or she can interact directly by voice with the scholar, while other students can listen to the current student's recitation and to the error corrections, guidance, or lessons from the scholars.
    IEEE International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Al Madinah, Saudi Arabia, 22-25 Dec 2013; 12/2013
  • ABSTRACT: The Gaussian mixture model (GMM) is a conventional method for speech recognition, known for its effectiveness and scalability in speech modeling. This paper presents automatic recognition of spoken Arabic digits based on a GMM classifier and a leading feature-extraction approach for speech recognition, Delta-Delta Mel-frequency cepstral coefficients (DDMFCC). With the best parameters obtained, the experiments achieve 99.31% correct digit recognition on the dataset, which is very satisfactory compared with previous work on spoken Arabic digit recognition. (A sketch of such a GMM classifier is given after this list.)
    Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2012 IEEE Conference on; 01/2012
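
The voice-command flow described in the first related publication above can be approximated as follows. This is a minimal sketch, assuming the community SpeechRecognition Python package as a client for Google's speech recognition; the Arabic command vocabulary, the language code, and the simple post-processing lookup are hypothetical illustrations, not details from that paper.

# Minimal sketch of voice-command capture and recognition via the Google speech
# recognizer exposed by the SpeechRecognition package. The command table and the
# post-processing lookup are hypothetical; the cited system's preprocessing and
# post-processing steps are not described in its abstract.
import speech_recognition as sr

# Hypothetical mapping from spoken Arabic phrases to application commands.
COMMANDS = {
    "اقرأ": "play_recitation",
    "توقف": "stop",
    "التالي": "next_verse",
}

def listen_for_command(language="ar-SA"):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # basic noise calibration
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # speech was not intelligible
    # Post-processing: map the recognized phrase onto a known command, if any.
    return COMMANDS.get(text.strip())

if __name__ == "__main__":
    print(listen_for_command())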
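
The GMM-over-DDMFCC approach summarized in the last item can be sketched as a per-digit classifier. This is a minimal sketch using librosa and scikit-learn; the feature settings, the number of mixture components, and the data layout (a dict from digit label to wav files) are assumptions, not the parameters of the cited paper.

# Minimal per-digit GMM classifier over MFCC + delta + delta-delta features,
# in the spirit of the DDMFCC/GMM approach summarized above. Feature settings,
# mixture sizes, and data layout are assumptions for illustration only.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def ddmfcc(wav_path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    feats = np.vstack([m, librosa.feature.delta(m), librosa.feature.delta(m, order=2)])
    return feats.T  # shape: (frames, 3 * n_mfcc)

def train_digit_gmms(files_per_digit, n_components=16):
    """files_per_digit: dict mapping a digit label to a list of wav paths."""
    models = {}
    for digit, files in files_per_digit.items():
        frames = np.vstack([ddmfcc(f) for f in files])
        models[digit] = GaussianMixture(
            n_components=n_components, covariance_type="diag"
        ).fit(frames)
    return models

def classify(models, wav_path):
    frames = ddmfcc(wav_path)
    # Choose the digit whose GMM assigns the highest average frame log-likelihood.
    return max(models, key=lambda d: models[d].score(frames))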