Research experience
-
Jul 2010–present · Research: Robotic Speech Recognition
University of Auckland · Department of Electrical & Computer Engineering · HealthBots Group · Auckland, New Zealand
Education
-
Jul 2006–Mar 2009 · Ain Shams University
Computer Science · Master · Cairo, Egypt
Awards & achievements
-
Oct 2012 · Award: Best Student Paper Award, International Conference on Social Robotics, China
-
Jul 2010 · Grant: R&D program of the Korea Ministry of Knowledge Economy (MKE) and the Korea Evaluation Institute of Industrial Technology (KEIT)
Other
-
Languages: English, Arabic
Questions and Answers (4)
-
Answer added in Signal Processing: How is information captured from a signal?
Question by Gupta Ashutosh · Amity University
Answer by Abdelaziz Abdelhamid · University of Auckland:
From the speech recognition perspective, the process goes like this. The speaker's brain formulates a message, and the voice articulators (such as the vocal cords, lips, tongue, and nose) then work together to produce the sounds of that message (i.e., its phonemes and words). On the other side, the listener's brain analyzes the incoming signal by converting its frequencies to another scale (i.e., the mel scale), and the warped frequencies are then processed to extract meaningful features. These features (physically, electrical pulses in the brain) are matched through a complicated neural process against already known phones and words, which the brain in turn uses to uncover the full text of the message carried by the incoming signal. There are many more details to this process, but this should serve as a brief, general overview of how humans understand spoken messages.
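As a concrete illustration of the mel warping mentioned in this answer, as it is used in machine speech front ends (not as a model of the brain), here is a minimal Python sketch. It assumes the common formula m = 2595 log10(1 + f/700); other mel formulas exist, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Warp a frequency in Hz onto the mel scale (2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centers(f_min: float, f_max: float, n: int) -> list:
    """Return n frequencies in Hz that are equally spaced on the mel scale."""
    if n < 2:
        raise ValueError("need at least two points")
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n - 1)
    return [mel_to_hz(lo + i * step) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical example: 10 centre frequencies between 0 Hz and 8 kHz.
    for f in mel_spaced_centers(0.0, 8000.0, 10):
        print(round(f, 1))
```

Equal steps in mel map to increasingly wide steps in Hz, reflecting the coarser frequency resolution at high frequencies that the mel scale is meant to capture; filterbank-based features place their filters at centres like these.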
-
Answer added in Digital Signal Processing: Difference between convolution and correlation
Question by Aasma Garg · Indian Institute of Information Technology Allahabad
Answer by Abdelaziz Abdelhamid · University of Auckland:
Simply put, correlation is a measure of the similarity between two signals, whereas convolution is a measure of the effect of one signal on the other.
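A minimal pure-Python sketch of both operations on finite sequences (function names hypothetical) makes the relationship concrete: cross-correlation slides one signal over the other unchanged, whereas convolution time-reverses one of them first, so correlating x with h gives the same output as convolving x with h reversed.

```python
def convolve(x, h):
    """Full linear convolution: y[n] = sum_k x[k] * h[n - k]."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def cross_correlate(x, y):
    """Full cross-correlation r[lag] = sum_n x[n + lag] * y[n].

    Lags run from -(len(y) - 1) to len(x) - 1, so the output has the same
    length and ordering as convolve(x, y).
    """
    out = []
    for lag in range(-(len(y) - 1), len(x)):
        s = 0.0
        for n in range(len(y)):
            i = n + lag
            if 0 <= i < len(x):  # ignore samples outside x
                s += x[i] * y[n]
        out.append(s)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    h = [0.0, 1.0, 0.5]
    print(convolve(x, h))                                  # effect of h on x
    print(cross_correlate(x, h) == convolve(x, h[::-1]))   # same operation, h flipped
    print(cross_correlate(x, x))                           # peaks at lag 0 (perfect alignment)
```

The autocorrelation example peaks where the two copies line up, which is why correlation is used as a similarity measure; convolution is how a linear time-invariant system with impulse response h acts on an input x.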
-
Answer added in Shortest Path Routing Algorithms: What is the latest/best runtime known for computing an all-pairs shortest-paths problem on a weighted network?
Question by Katharina Zweig · Technische Universität Kaiserslautern
Answer by Abdelaziz Abdelhamid · University of Auckland:
I think the Viterbi algorithm can be used in this case. It is very efficient at finding the shortest path (the best decoding hypothesis) in a huge weighted finite-state transducer network in the speech recognition domain. This algorithm can be realized using the token passing paradigm.
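To illustrate the token-passing idea in this answer, here is a minimal Python sketch of Viterbi-style best-path search over a small weighted acyclic graph, with arc costs such as negative log probabilities. The helper and its graph format are hypothetical, not a speech decoder. It finds the best path from a single start state, so an all-pairs result would need one such pass per source; graphs with cycles need a different algorithm (e.g., Bellman-Ford, or Dijkstra for non-negative weights).

```python
import math


def best_path_dag(arcs, start, final, order):
    """Lowest-cost path from `start` to `final` in a weighted acyclic graph.

    arcs:  dict mapping state -> list of (next_state, cost, label)
    order: all states listed in a topological order
    Returns (total_cost, labels); (inf, []) if `final` is unreachable.
    """
    # One token per state: (accumulated cost, previous state, label on incoming arc).
    tokens = {start: (0.0, None, None)}
    for state in order:
        if state not in tokens:
            continue  # no token has reached this state
        cost = tokens[state][0]
        for nxt, w, label in arcs.get(state, []):
            new_cost = cost + w
            # Keep only the best token per state (Viterbi recombination).
            if nxt not in tokens or new_cost < tokens[nxt][0]:
                tokens[nxt] = (new_cost, state, label)
    if final not in tokens:
        return math.inf, []
    # Trace back through the stored predecessors to recover the label sequence.
    labels, state = [], final
    while tokens[state][1] is not None:
        _, prev, label = tokens[state]
        labels.append(label)
        state = prev
    return tokens[final][0], labels[::-1]


if __name__ == "__main__":
    # Hypothetical 4-state graph: paths a-d (5.0), a-c-e (3.0), b-e (3.5).
    arcs = {0: [(1, 1.0, "a"), (2, 2.5, "b")],
            1: [(2, 1.0, "c"), (3, 4.0, "d")],
            2: [(3, 1.0, "e")]}
    print(best_path_dag(arcs, 0, 3, [0, 1, 2, 3]))
```

Because states are visited in topological order, every token entering a state has arrived before that state's own tokens are propagated, so keeping only the best incoming token per state is safe.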
-
Question asked in Speech Processing: Can I use the minimum phone error (MPE) criterion for training HMM acoustic models using the 1-best hypothesis instead of numerator and denominator lattices?
Usually, lattices are used in HMM parameter optimization with the MPE criterion. What about using only one decoding hypothesis instead of lattices?
Asked by Abdelaziz Abdelhamid · University of Auckland
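For context, the MPE objective is commonly written as below, where O_r is the r-th training utterance, s_r its reference transcription, κ an acoustic scale factor, P(s) the language model probability, and A(s, s_r) the raw phone accuracy of hypothesis s against s_r. In lattice-based training the sums over s and s′ range over the hypotheses encoded in the denominator lattice. The second line is a sketch of what this form reduces to if that set shrinks to a single 1-best hypothesis ŝ_r: the posterior fraction becomes exactly 1.

$$
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} \sum_{s} \frac{p_\lambda(O_r \mid s)^{\kappa}\, P(s)^{\kappa}}{\sum_{s'} p_\lambda(O_r \mid s')^{\kappa}\, P(s')^{\kappa}}\, A(s, s_r)
$$

$$
\{s\} = \{\hat{s}_r\} \;\Longrightarrow\; \mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} A(\hat{s}_r, s_r)
$$

Under this form, the 1-best objective changes with λ only when the 1-best hypothesis itself changes, so the smooth posterior weighting that lattice-based MPE differentiates is no longer present.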
Publications (7)
-
Conference Proceeding: On the Robustness of Joint Optimization on Transducer-based Decoding Graphs
Abdelaziz A. Abdelhamid, Waleed H. Abdulla
ABSTRACT: It is our belief that joint optimization of acoustic and language models accounts for the inherent correlation between them and is thus expected to achieve better recognition performance. This approach should be effective in achieving robust speech recognition where the testing conditions differ from those of training. The acoustic and language models are integrated into a unified decoding graph using weighted finite-state transducers. In this paper, we report experimental results of the joint optimization of acoustic and language models on the Resource Management (RM1) continuous speech recognition task. The results show that the proposed joint optimization approach is effective under noisy conditions for unseen test utterances, achieving relative word error rate reductions ranging from 7% to 17% for different noise levels. These results support our expectation about the robustness of the proposed joint optimization approach.
IEEE TENCON, Australia; 04/2013
Source: available from Abdelaziz A. Abdelhamid
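The abstract reports relative word error rate (WER) reductions. As a minimal Python sketch of how these quantities are conventionally computed (function names hypothetical, and no numbers taken from the paper): WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and relative reduction is the WER drop divided by the baseline WER.

```python
def word_errors(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # row for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution (or match if equal)
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit errors divided by the number of reference words."""
    return word_errors(ref, hyp) / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()
    print(wer(ref, hyp))                   # one deletion out of six words
    print(relative_reduction(0.20, 0.17))  # hypothetical baseline and new WERs
```

For instance, a drop from a hypothetical 20% baseline WER to 17% is a 15% relative reduction, even though the absolute drop is only 3 percentage points.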
Conference Proceeding: Speech Decoding Using Lattice Rescoring
Abdelaziz A. Abdelhamid, Waleed H. Abdulla
ABSTRACT: The goal of automatic speech recognition (ASR) is to decode spoken utterances by machine to uncover their information content; therefore, ASR is also called speech decoding. Over the last seven decades, tremendous research efforts have competed to achieve this goal accurately and in real time. One promising approach to achieving that goal is lattice rescoring. Several knowledge sources, such as acoustic, language, and lexical models, are usually incorporated into the speech decoding process and integrated together into a recognition network (also called a decoding graph). Currently, the common approach to building this recognition network is the weighted finite-state transducer (WFST) [1]. Speech decoding can be performed using either a single-pass or a multi-pass approach. Lattice rescoring is a form of multi-pass decoding, in which the lattice is generated in the first pass using simple, low-order knowledge sources, and rescoring is performed in the second pass using higher-order knowledge sources.
WFST and speech decoding
Currently, the WFST is adopted as the best unified approach for integrating speech knowledge sources into one decoding graph [1]. This is due to the efficiency of WFST operations in building compact graphs suited to existing decoding algorithms, such as the Viterbi algorithm. A WFST is defined as a set of states and transitions, where each transition carries an input symbol (i.e., a context-dependent phoneme), an output symbol (i.e., a word), and a weight (i.e., the natural log of a language model probability). When the Viterbi algorithm is applied to a WFST, the result can be either a single-best decoding hypothesis (a single sequence of words) or N-best hypotheses. To output these N-best hypotheses in a rich format, they are usually grouped together in a graph called a lattice.
The algorithm commonly used to generate these hypotheses is Viterbi decoding with beam pruning, which is usually implemented as a token passing mechanism, as shown in Fig. 1. In this figure, the decoding network is traversed by propagating a set of tokens, and at each point of the traversal the acoustic and language model scores are calculated and accumulated in the corresponding token. During decoding, each active transition has an associated token that stores the decoding score up to that transition. At the end of decoding, the final set of tokens is compared with respect to their accumulated scores, and the N-best tokens (those carrying the highest scores) are selected. To generate a lattice during decoding, a separate data structure can be used to store the set of transitions the tokens passed over.
APSIPA Letters; 03/2013
Source: available from Abdelaziz A. Abdelhamid
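The token passing scheme described in this abstract can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' decoder: every arc is assumed to consume exactly one frame (real decoders also handle HMM self-loops and epsilon arcs), scores are log-probabilities (higher is better), and the class and function names, the toy graph, and its scores are hypothetical. The `expansions` list is only a crude stand-in for the separate lattice data structure the abstract mentions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    ilabel: str            # input symbol, e.g. a context-dependent phone
    olabel: Optional[str]  # output symbol, e.g. a word (None = no output)
    weight: float          # log-probability; higher scores are better


@dataclass(frozen=True)
class Token:
    score: float  # accumulated acoustic + language model log score
    words: tuple  # output labels collected so far


def token_passing(arcs, start, finals, frame_loglikes, beam=10.0, n_best=2):
    """Time-synchronous token passing with beam pruning.

    frame_loglikes: one dict per frame mapping ilabel -> acoustic log-likelihood.
    Returns (up to n_best (score, words) pairs at final states,
             list of (frame, arc) expansions made from active tokens).
    """
    out_arcs = {}
    for a in arcs:
        out_arcs.setdefault(a.src, []).append(a)

    active = {start: Token(0.0, ())}
    expansions = []  # transitions that tokens were propagated over
    for t, loglikes in enumerate(frame_loglikes):
        nxt = {}
        for state, tok in active.items():
            for a in out_arcs.get(state, []):
                if a.ilabel not in loglikes:
                    continue
                score = tok.score + a.weight + loglikes[a.ilabel]
                words = tok.words + ((a.olabel,) if a.olabel is not None else ())
                expansions.append((t, a))
                # Viterbi recombination: keep only the best token entering each state.
                if a.dst not in nxt or score > nxt[a.dst].score:
                    nxt[a.dst] = Token(score, words)
        if not nxt:
            return [], expansions
        best = max(tok.score for tok in nxt.values())
        # Beam pruning: discard tokens scoring more than `beam` below the best.
        active = {s: tok for s, tok in nxt.items() if tok.score >= best - beam}

    # Compare the surviving tokens at final states and keep the N best.
    ranked = sorted((tok for s, tok in active.items() if s in finals),
                    key=lambda tok: tok.score, reverse=True)
    return [(tok.score, tok.words) for tok in ranked[:n_best]], expansions


if __name__ == "__main__":
    arcs = [Arc(0, 1, "k", "cat", math.log(0.5)), Arc(0, 2, "k", "cap", math.log(0.5)),
            Arc(1, 3, "t", None, 0.0), Arc(2, 4, "p", None, 0.0)]
    frames = [{"k": -1.0}, {"t": -0.5, "p": -2.0}]
    print(token_passing(arcs, 0, {3, 4}, frames, beam=10.0)[0])  # both words survive
    print(token_passing(arcs, 0, {3, 4}, frames, beam=1.0)[0])   # "cap" is pruned
```

A wider beam keeps more competing tokens (and therefore richer N-best output) at the cost of more computation per frame; a narrow beam is faster but can prune hypotheses that would have been useful for later rescoring.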
Conference Proceeding: Discriminative Training of Context-Dependent Phones on WFST-based Decoding Graphs
Abdelaziz A. Abdelhamid, Waleed H. Abdulla
ABSTRACT: This research proposes a sub-graph extraction method to boost the discriminability of generative acoustic models. The proposed method explicitly formalizes the notion that a word on a reference decoding path may have multiple pronunciations rather than only one. The discriminative training framework is based on the minimum classification error (MCE) criterion for optimizing the parameters of hidden Markov models (HMMs) for context-dependent phones. The primary task examined consists of 12 different large decoding graphs composed from four HMMs with different topologies and three n-gram language models containing 5k, 20k, and 64k words. The experimental results show that the proposed approach outperforms the baseline system based on maximum likelihood estimation (MLE), achieving reductions of 18.7% and 17.4% in word error rate (WER) and triphone error rate (TER), respectively, when tested on the Resource Management (RM1) speech database.
International Conference on Communications, Signal Processing, and their Applications (ICCSPA), 2013; 02/2013
Source: available from Abdelaziz A. Abdelhamid
Conference Proceeding: Optimizing the parameters of decoding graphs using new log-based MCE
A. A. Abdelhamid, W. H. Abdulla
Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific; 01/2012
Source: available from Abdelaziz A. Abdelhamid
Chapter: RoboASR: A Dynamic Speech Recognition System for Service Robots
Abdelaziz A. Abdelhamid, Waleed H. Abdulla, Bruce A. MacDonald
01/2012: pages 485-495, ISBN: 9783642341021